ATTN: towards practical automated tabular semantic analysis
2 May 2024
Proceedings Volume 13164, International Workshop on Advanced Imaging Technology (IWAIT) 2024; 1316415 (2024) https://doi.org/10.1117/12.3018449
Event: International Workshop on Advanced Imaging Technology (IWAIT) 2024, Langkawi, Malaysia
Abstract
In many organisations, documents are available only as printed copies due to legal restrictions. Digitising these documents poses several challenges, such as overlapping text and cancellations caused by manual editing, varying layouts, low contrast, physical damage, and the high cost of cloud-based (e.g., AWS) bulk processing. This paper introduces a low-cost, practical method for analysing tabular semantics in printed document digitisation. We propose first extracting the text labels, then the text values and table structure semantics, and finally refining the extraction. Our method leverages fuzzy matching and spatial hashing to facilitate the extraction. The results show that our method is effective and efficient, at a cost of less than 1 cent per page on AWS.
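The abstract names fuzzy matching and spatial hashing but does not describe how they are combined. The following is a minimal sketch of one plausible arrangement, not the authors' implementation. It assumes OCR output is a list of (text, x, y) tuples, that each value sits to the right of its label on roughly the same row, and that a grid cell of 50 pixels is appropriate. The names `fuzzy_match`, `build_grid`, `nearby`, `extract_pairs`, and `CELL` are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher

CELL = 50  # assumed grid cell size in pixels; not taken from the paper


def fuzzy_match(token, labels, threshold=0.8):
    """Return the known label most similar to an OCR token, or None."""
    best, best_score = None, threshold
    for label in labels:
        score = SequenceMatcher(None, token.lower(), label.lower()).ratio()
        if score >= best_score:
            best, best_score = label, score
    return best


def build_grid(boxes, cell=CELL):
    """Spatial hash: bucket each (text, x, y) box by its grid cell."""
    grid = defaultdict(list)
    for text, x, y in boxes:
        grid[(int(x // cell), int(y // cell))].append((text, x, y))
    return grid


def nearby(grid, x, y, cell=CELL):
    """Collect boxes from the cell containing (x, y) and its 8 neighbours."""
    cx, cy = int(x // cell), int(y // cell)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            found.extend(grid.get((cx + dx, cy + dy), []))
    return found


def extract_pairs(boxes, labels, cell=CELL):
    """Pair each fuzzily matched label with the closest value to its right."""
    grid = build_grid(boxes, cell)
    result = {}
    for text, x, y in boxes:
        label = fuzzy_match(text, labels)
        if label is None:
            continue
        # Keep non-label boxes to the right of the label on roughly the same row.
        candidates = [
            (t, bx, by)
            for t, bx, by in nearby(grid, x, y, cell)
            if bx > x and abs(by - y) < cell / 2 and fuzzy_match(t, labels) is None
        ]
        if candidates:
            result[label] = min(candidates, key=lambda b: b[1] - x)[0]
    return result


# Hypothetical OCR output with a misread label ("Invoce No").
boxes = [
    ("Invoce No", 10, 10), ("12345", 60, 12),
    ("Date:", 10, 40), ("2024-05-02", 55, 41),
    ("noise", 400, 400),
]
pairs = extract_pairs(boxes, ["Invoice No", "Date"])
```

The grid lookup limits each label's search to a handful of neighbouring cells rather than every box on the page, which is the usual reason to reach for spatial hashing. The fuzzy threshold absorbs small OCR errors in label text.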
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Kan Chen, Teck Wei Low, and Alex Q. Chen "ATTN: towards practical automated tabular semantic analysis", Proc. SPIE 13164, International Workshop on Advanced Imaging Technology (IWAIT) 2024, 1316415 (2 May 2024); https://doi.org/10.1117/12.3018449
KEYWORDS
Semantics
Printing
Clouds
Optical character recognition
Deep learning
Education and training
Evolutionary algorithms