Paper
16 February 2022
A multi-scale deformable convolution network model for text recognition
Lang Cheng, Junhong Yan, Minghui Chen, Yuanwen Lu, Yunhong Li, Lei Hu
Proceedings Volume 12083, Thirteenth International Conference on Graphics and Image Processing (ICGIP 2021); 1208320 (2022) https://doi.org/10.1117/12.2623370
Event: Thirteenth International Conference on Graphics and Image Processing (ICGIP 2021), 2021, Kunming, China
Abstract
Natural scene text recognition remains a challenging task. Compared with traditional document text, natural scene text varies widely in shape and orientation, so recognition accuracy still needs to be improved. To locate text regions better and identify text content more accurately, we present a multi-scale deformable convolution network model for text recognition. A rectification network first corrects irregular text in the input image, and a ResNet with an FPN structure serves as the backbone network to achieve multi-scale feature extraction. In addition, the Add feature fusion method is adopted to reduce the loss of feature information and strengthen feature extraction in the text area. A deformable convolution block is introduced in the deep convolution layers to improve the deformation modeling ability of convolution and expand the receptive field. The prediction module adopts the Transformer, which drops the sequential dependence inherent in RNNs, enabling parallel computation and shortening the path length between long-range dependencies. To evaluate the effectiveness of the proposed method, we trained our model on a mixture of two data sets, MJSynth and SynthText, and tested it on several regular and irregular data sets. The experimental results demonstrate that this method performs well in irregular scene text recognition, especially on CUTE80.
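The abstract names Add feature fusion in an FPN-style backbone but gives no implementation details. The following is only a minimal sketch of the general idea, assuming that "Add" means element-wise addition of a finer feature map with an upsampled coarser one. The single-channel maps, the nearest-neighbour 2x upsampling, and the function names (`upsample_nearest_2x`, `add_fuse`, `top_down`) are illustrative assumptions, not the authors' code.

```python
from typing import List

# Assumption: one single-channel feature map stored as rows x cols.
Grid = List[List[float]]


def upsample_nearest_2x(fmap: Grid) -> Grid:
    """Double height and width by repeating each value (nearest neighbour)."""
    out: Grid = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_fuse(fine: Grid, coarse: Grid) -> Grid:
    """Fuse two levels by element-wise addition after upsampling the coarse map."""
    up = upsample_nearest_2x(coarse)
    if len(up) != len(fine) or len(up[0]) != len(fine[0]):
        raise ValueError("fine map must be exactly 2x the size of the coarse map")
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(fine, up)]


def top_down(pyramid: List[Grid]) -> List[Grid]:
    """Run a top-down pass; pyramid is ordered fine -> coarse, as is the result."""
    fused = [pyramid[-1]]  # the coarsest level is passed through unchanged
    for fine in reversed(pyramid[:-1]):
        fused.append(add_fuse(fine, fused[-1]))
    return fused[::-1]


if __name__ == "__main__":
    levels = [
        [[1.0] * 4 for _ in range(4)],  # fine, 4x4
        [[1.0, 2.0], [3.0, 4.0]],       # middle, 2x2
        [[10.0]],                       # coarse, 1x1
    ]
    for level in top_down(levels):
        print(level)
```

Addition keeps the channel count fixed, whereas concatenation would grow it at every level. A real FPN would also apply 1x1 lateral convolutions so that channel counts match before the addition; that step is omitted here for brevity.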
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Lang Cheng, Junhong Yan, Minghui Chen, Yuanwen Lu, Yunhong Li, and Lei Hu "A multi-scale deformable convolution network model for text recognition", Proc. SPIE 12083, Thirteenth International Conference on Graphics and Image Processing (ICGIP 2021), 1208320 (16 February 2022); https://doi.org/10.1117/12.2623370
KEYWORDS
Convolution
Feature extraction
Data modeling
Transformers
Computer programming
Associative arrays
Image enhancement