Detection of L2 Mandarin vowel pronunciation errors based on multi-task learning and articulatory features

Yizhi Wu; Ran Ji

doi:10.1117/12.2680963

8 June 2023

Proceedings Volume 12707, International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2023); 1270757 (2023) https://doi.org/10.1117/12.2680963
Event: International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2023), 2023, Changsha, China

Abstract

Computer-assisted pronunciation training (CAPT) can meet the pronunciation training needs of second language (L2) learners. Because articulatory features reflect the intrinsic logic of the pronunciation process and show how a sound is actually produced, incorporating them into a CAPT system is of great value. To address the complexity of vowels in continuous Mandarin speech, we established a multi-task articulatory feature recognition model that includes four pronunciation channels. The model was trained with multiple acoustic features on a standard Chinese corpus so that it learns the correlations among the pronunciation features, and the trained model was then applied to detecting the pronunciation errors of L2 learners. Experimental results on the Biaobei corpus show that combining CNN-BLSTM with multi-task training improves the recognition accuracy of the various articulatory features by 0.1% to 3.19%. The model for detecting Mandarin vowel pronunciation errors and deviations was tested on samples from L2 Mandarin speakers and proved effective at detecting both errors and deviations in Mandarin vowel pronunciation.
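The detection step described in the abstract can be illustrated with a minimal sketch: per-channel outputs of an articulatory feature recognizer are decoded and compared with the canonical features of the target vowel. This is not the authors' implementation. The abstract does not name the four channels, so the channel names, label sets, and the functions `decode_channels` and `detect_vowel_error` are hypothetical; the canonical vowel descriptions are standard phonetic ones, not values taken from the paper.

```python
from typing import Dict, List

# Hypothetical channel names: the abstract says "four pronunciation channels"
# but does not list them.
CHANNELS = ("tongue_height", "tongue_backness", "lip_rounding", "rhoticity")

# Canonical articulatory targets for a few Mandarin vowels (pinyin keys).
# Standard phonetic descriptions, assumed here for illustration only.
CANONICAL: Dict[str, Dict[str, str]] = {
    "i": {"tongue_height": "high", "tongue_backness": "front",
          "lip_rounding": "unrounded", "rhoticity": "plain"},
    "ü": {"tongue_height": "high", "tongue_backness": "front",
          "lip_rounding": "rounded", "rhoticity": "plain"},
    "u": {"tongue_height": "high", "tongue_backness": "back",
          "lip_rounding": "rounded", "rhoticity": "plain"},
    "a": {"tongue_height": "low", "tongue_backness": "central",
          "lip_rounding": "unrounded", "rhoticity": "plain"},
}


def decode_channels(posteriors: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Pick the most probable label in each channel."""
    return {ch: max(posteriors[ch], key=posteriors[ch].get) for ch in CHANNELS}


def detect_vowel_error(target_vowel: str,
                       posteriors: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the channels whose decoded label differs from the canonical one."""
    recognized = decode_channels(posteriors)
    canonical = CANONICAL[target_vowel]
    return [ch for ch in CHANNELS if recognized[ch] != canonical[ch]]


# Made-up recognizer output for a learner attempting "ü" with spread lips.
learner_posteriors = {
    "tongue_height": {"high": 0.90, "mid": 0.07, "low": 0.03},
    "tongue_backness": {"front": 0.80, "central": 0.10, "back": 0.10},
    "lip_rounding": {"rounded": 0.20, "unrounded": 0.80},
    "rhoticity": {"plain": 0.95, "rhotic": 0.05},
}
mismatches = detect_vowel_error("ü", learner_posteriors)
print(mismatches)  # ['lip_rounding']
```

Reporting the mismatched channel rather than only a wrong-vowel label is what would let a CAPT system tell the learner which articulator to adjust, here the lips.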

Citation

Yizhi Wu and Ran Ji "Detection of L2 Mandarin vowel pronunciation errors based on multi-task learning and articulatory features", Proc. SPIE 12707, International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2023), 1270757 (8 June 2023); https://doi.org/10.1117/12.2680963

PROCEEDINGS
6 PAGES

KEYWORDS

Education and training

Data modeling

Tongue

Acoustics

Error analysis

Speech recognition
