Paper
7 August 2024
BiC2V-NFLAT: Chinese toponym recognition based on Bi-Char2Vec and non-flat-lattice transformer
Manzhen Yang
Proceedings Volume 13224, 4th International Conference on Internet of Things and Smart City (IoTSC 2024); 132241X (2024) https://doi.org/10.1117/12.3034819
Event: 4th International Conference on Internet of Things and Smart City, 2024, Hangzhou, China
Abstract
To address missing contextual semantics and imbalanced datasets in Chinese toponym recognition, this paper proposes a method based on Bi-Char2Vec and the Non-Flat-Lattice Transformer (NFLAT). First, a new Chinese character embedding model, Bi-Char2Vec, is designed to capture semantic representations of text in long sequences, mitigating the loss of contextual semantics. Next, Inter-Attention models the interaction between characters and words, and the Transformer's Self-Attention then encodes contextual information. Finally, a Conditional Random Field layer produces the globally optimal tagging sequence. On public datasets, BiC2V-NFLAT improves the accuracy of Chinese toponym recognition, achieving an F1 score of 95%.
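The last stage named in the abstract, a Conditional Random Field layer that selects a globally optimal tagging sequence, is usually decoded with the Viterbi algorithm. The following is a minimal sketch of that decoding step only, not the paper's implementation: the tag set, the emission scores, and the transition scores are invented for illustration, and in a trained model they would come from the encoder and the learned CRF parameters.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    # score[j] = best score of any path that ends in tag j at the current position
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for arriving at tag j
            best_prev = max(range(num_tags),
                            key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][j]
                             + emissions[t][j])
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag
    path = [max(range(num_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set and scores, chosen only to illustrate the behaviour.
TAGS = ["O", "B-LOC", "I-LOC"]
transitions = [
    [0.0, 0.0, -10.0],  # from O: moving straight to I-LOC is heavily penalised
    [0.0, -1.0, 1.0],   # from B-LOC
    [0.0, -1.0, 1.0],   # from I-LOC
]
emissions = [
    [1.0, 0.8, 0.0],    # character 1
    [0.2, 0.0, 1.0],    # character 2
    [1.0, 0.0, 0.2],    # character 3
]
decoded = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
# decoded == ["B-LOC", "I-LOC", "I-LOC"]
```

Picking the best tag per character independently would give O, I-LOC, O here, which contains the invalid step O to I-LOC. Scoring whole sequences lets the transition penalties rule that out, which is the sense in which the CRF output is globally rather than locally optimal.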
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Manzhen Yang "BiC2V-NFLAT: Chinese toponym recognition based on Bi-Char2Vec and non-flat-lattice transformer", Proc. SPIE 13224, 4th International Conference on Internet of Things and Smart City (IoTSC 2024), 132241X (7 August 2024); https://doi.org/10.1117/12.3034819
KEYWORDS
Optical character recognition
Transformers
Semantics
Performance modeling
Data modeling
Education and training
Neural networks