24 November 2020 Textual restoration of occluded Tibetan document pages based on side-enhanced U-Net
Siqi Liu, Libiao Jin, Fang Miao
Abstract

Recognizing the content of occluded Tibetan document pages is very challenging because the documents have not been digitized and have been kept in long-term storage. Multiple pages are stuck together, and textual characters occlude one another, which makes restoration difficult. Due to the large size of Tibetan documents, it is impossible for professionals to separate and repair these occluded pages. Therefore, the separation of overlapping pages and the restoration of occluded pages play important roles in the digitization of Tibetan documents. We extract underlying pages by show-through scanning and by eliminating the text area of the top pages. To restore the occluded underlying pages, we present a side-enhanced U-Net (SEU-Net) that attaches a side feature extraction module and a side classification module to the U-Net to improve the classification of textual edges. Experiments performed on a dataset of Tibetan document restoration patches show that SEU-Net classifies the textual pixels in the occluded pages accurately, and that the side feature extraction module and the side classification module each improve performance independently.
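
The abstract does not say how textual edges are defined or supervised, so the following is only an illustrative sketch under stated assumptions, not the SEU-Net implementation. It assumes that a restoration patch comes with a binary text mask (1 for text, 0 for background) and that a "textual edge" pixel is a text pixel with at least one background pixel among its 4-connected neighbours. The function name `text_edge_mask` and the neighbourhood rule are hypothetical choices made for this example.

```python
from typing import List

Mask = List[List[int]]  # 2D grid of 0 (background) and 1 (text)

# Offsets of the 4-connected neighbours: up, down, left, right.
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def text_edge_mask(mask: Mask) -> Mask:
    """Return a mask marking text pixels that touch a background pixel.

    Assumption for this sketch: neighbours that fall outside the patch
    are ignored, so the patch border alone never creates an edge.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x] != 1:
                continue  # only text pixels can be edge pixels
            for dy, dx in NEIGHBOURS:
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < height and 0 <= nx < width
                if inside and mask[ny][nx] == 0:
                    edges[y][x] = 1
                    break
    return edges


if __name__ == "__main__":
    patch = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    for row in text_edge_mask(patch):
        print(row)
```

An edge map of this kind could plausibly serve as an auxiliary target for an edge-focused branch, but whether SEU-Net's side classification module is trained this way is not stated in the abstract.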

© 2020 SPIE and IS&T 1017-9909/2020/$28.00
Siqi Liu, Libiao Jin, and Fang Miao "Textual restoration of occluded Tibetan document pages based on side-enhanced U-Net," Journal of Electronic Imaging 29(6), 063006 (24 November 2020). https://doi.org/10.1117/1.JEI.29.6.063006
Received: 10 July 2020; Accepted: 3 November 2020; Published: 24 November 2020
KEYWORDS
Feature extraction

Convolution

Classification systems

Computer programming

Image classification

Image processing

Optical character recognition
