Recognizing the content of occluded Tibetan document pages is very challenging because the documents have not been digitized and have been kept in long-term storage. Multiple pages are stuck together and their textual characters occlude one another, which makes restoration difficult. Because of the large volume of Tibetan documents, it is not feasible for professionals to separate and repair these occluded pages manually. The separation of overlapping pages and the restoration of occluded pages therefore play important roles in the digitization of Tibetan documents. We extract the underlying pages by show-through scanning and by eliminating the text area of the top pages. To restore the occluded underlying pages, we present a side-enhanced U-Net (SEU-Net) that attaches a side feature extraction module and a side classification module to the U-Net to improve the classification of textual edges. Experiments on a dataset of Tibetan document restoration patches show that SEU-Net classifies the textual pixels in the occluded pages accurately, and that the side feature extraction module and the side classification module each improve performance independently.
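The extraction step described above (show-through scanning followed by elimination of the top page's text area) can be illustrated with a minimal sketch. It assumes that both scans have already been binarized and aligned, which the abstract does not state. The three-way labeling and the names `extract_underlying`, `BACKGROUND`, `UNDERLYING_TEXT`, and `OCCLUDED` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from typing import List

# Assumption: masks are aligned binary images, 1 = text pixel, 0 = background.
Mask = List[List[int]]

# Hypothetical label values for the extracted underlying page.
BACKGROUND, UNDERLYING_TEXT, OCCLUDED = 0, 1, 2


def extract_underlying(show_through: Mask, top_text: Mask) -> Mask:
    """Label each pixel of the underlying page.

    show_through: text visible in the show-through scan (top and underlying text).
    top_text:     text belonging to the top page only.
    Pixels covered by top-page text are marked OCCLUDED, because the
    underlying content there is unknown and is left for restoration.
    """
    if len(show_through) != len(top_text) or any(
        len(a) != len(b) for a, b in zip(show_through, top_text)
    ):
        raise ValueError("masks must have the same shape")

    labels: Mask = []
    for st_row, top_row in zip(show_through, top_text):
        row = []
        for st, top in zip(st_row, top_row):
            if top:
                row.append(OCCLUDED)         # top-page text removed, content unknown
            elif st:
                row.append(UNDERLYING_TEXT)  # visible only via show-through
            else:
                row.append(BACKGROUND)
        labels.append(row)
    return labels
```

In this sketch, the pixels labeled `OCCLUDED` are the ones a restoration model such as SEU-Net would have to classify as text or background.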
CITATIONS
Cited by 2 scholarly publications.
Feature extraction
Convolution
Classification systems
Computer programming
Image classification
Image processing
Optical character recognition