20 October 2022 Study on Tibetan speech recognition based on ultrasonic tongue imaging, lip video, and audio
JinXi Zhang, ChunLing Li
Proceedings Volume 12451, 5th International Conference on Computer Information Science and Application Technology (CISAT 2022); 124511D (2022) https://doi.org/10.1117/12.2656631
Event: 5th International Conference on Computer Information Science and Application Technology (CISAT 2022), 2022, Chongqing, China
Abstract
A speech recognition system is, in essence, a pattern recognition system comprising three basic units: feature extraction, pattern matching, and a reference pattern library. This paper implements a speech recognition system based on a DNN-HMM structure using the Kaldi toolkit. For multiple speakers, the system synchronously acquires several physiological signals of the Lhasa dialect of Tibetan, including tongue shape, lip movement, and speech audio, and processes the data through feature extraction, model training, and recognition. The results show that recognition performance varies with the speaker model and with the articulator used, and that, for each articulator, the speaker-dependent recognition error rate is low while the speaker-independent recognition error rate is high. They also show that ultrasonic tongue images and lip video sequences carry information about the speaker's acoustic features, which is significant for research on speech recognition and, in particular, on speech synthesis and silent speech interfaces.
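As an illustration only, the three basic units named in the abstract can be sketched as a toy template matcher. This is not the paper's DNN-HMM/Kaldi system: the frame log-energy feature, the Euclidean distance, and all names below (`extract_features`, `recognize`, `library`, the two labels) are assumptions chosen to keep the example self-contained.

```python
import math

def extract_features(signal, frame_len=4):
    """Feature extraction (assumed toy feature): log-energy per non-overlapping frame."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    return [math.log(sum(x * x for x in frame) + 1e-9) for frame in frames]

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(signal, library):
    """Pattern matching: return the label of the closest reference pattern."""
    feats = extract_features(signal)
    return min(library, key=lambda label: distance(feats, library[label]))

# Reference pattern library: one stored feature vector per hypothetical label.
library = {
    "quiet_then_loud": extract_features([0.1] * 4 + [1.0] * 4),
    "loud_then_quiet": extract_features([1.0] * 4 + [0.1] * 4),
}

# An unseen signal is assigned the label of its nearest reference pattern.
result = recognize([0.2] * 4 + [0.9] * 4, library)
```

In a DNN-HMM system such as the one the paper describes, the matching step is statistical rather than a nearest-template lookup, but the division into feature extraction, matching, and stored reference models is the same.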
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
JinXi Zhang and ChunLing Li "Study on Tibetan speech recognition based on ultrasonic tongue imaging, lip video, and audio", Proc. SPIE 12451, 5th International Conference on Computer Information Science and Application Technology (CISAT 2022), 124511D (20 October 2022); https://doi.org/10.1117/12.2656631
KEYWORDS
Tongue

Ultrasonics

Laser induced plasma spectroscopy

Ultrasonography

Video

Speech recognition

Data modeling
