This paper investigates whether a speaker's body size (height, weight) and oral cavity characteristics (lip protrusion, LP; lip opening, LO; front cavity, FC) can be predicted from the acoustic features of speech. First, Pearson's correlation analysis was conducted to examine the relationships between the acoustic features and the body size and oral cavity characteristics. The effectiveness of the acoustic features in predicting body size and oral cavity characteristics was then examined using random forest and decision tree models. The results showed that fundamental frequency statistics (i.e., mean, max, min) exhibited significant negative correlations with height, weight, and FC. Moreover, good classification accuracy for height, LP range, LO range, and FC range could be achieved from the acoustic features. These findings imply that acoustic features are potential cues for identifying a speaker's body size and oral cavity characteristics. This paper not only contributes to research and practice in forensic speaker profiling but also provides a foundation for automatic speaker recognition technology.
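The two-step analysis the abstract describes (correlation, then tree-based classification) can be sketched as follows. This is a minimal illustration on synthetic data, not the study's data or code; the feature set (mean/max/min F0) and the toy height labels are assumptions chosen only to mirror the reported negative F0–height relationship.

```python
# Sketch: Pearson correlation between F0 and height, then random forest
# classification of a height category from acoustic features.
# All data below are synthetic placeholders, not the study's measurements.
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 120
# Illustrative acoustic features: mean, max, min F0 (Hz).
X = rng.normal(loc=[180.0, 260.0, 110.0], scale=[30.0, 40.0, 20.0], size=(n, 3))
# Toy heights built so that higher F0 goes with shorter speakers.
heights = 175.0 - 0.05 * X[:, 0] + rng.normal(0.0, 2.0, n)

# Step 1: correlation between mean F0 and height (expected negative).
r, _ = pearsonr(X[:, 0], heights)

# Step 2: classify a binary height category from the acoustic features.
y = (heights > np.median(heights)).astype(int)  # 1 = "tall", 0 = "short"
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"Pearson r = {r:.2f}, CV accuracy = {scores.mean():.2f}")
```

Because the toy labels are driven almost entirely by mean F0, the forest recovers the category well above chance; with real anthropometric data the separation would of course be weaker.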
With the development of speech synthesis technology, the simulation of a specific individual's speech has gradually matured: synthetic speech is easily perceived as real speech, which may frequently be exploited in illegal activities. To detect such crimes, forensic techniques such as comparing formants, pitch, and rhythm are widely used. The present study aims to investigate whether the comparison of formants can distinguish perceptually similar synthetic speech (hereinafter "personal anchor" speech) from real speech. To this end, two young males and two young females from different dialect regions were recruited to read the same text. Their voices were recorded and used to generate four "personal anchors" with sound spectrum and statistical analysis software. Comparisons of various formant parameters, including numerical statistics, stability analysis, and transitional segment features, were applied to analyze the differences between the real speech and the corresponding "personal anchors". It was found that numerical or stability analysis of formants alone was not sufficient to determine whether the speech was synthesized, while comparing the transitional segments of specific syllables could efficiently distinguish the synthetic speech from the real speech.
Using a natural conversation paradigm, this study investigated the acoustic characteristics of Mandarin utterances in drug addicts. Twenty-one native speakers of Mandarin, including four heroin addicts, two 3,4-methylenedioxymethamphetamine (MDMA, also known as ecstasy) addicts, and fifteen healthy controls without any history of drug abuse, were recruited for the speech production experiment. In comparison with the healthy controls, heroin addicts exhibited a higher mean F0, a lower mean intensity, a higher variability in both F0 and intensity, and a lower H1-H2, while MDMA addicts exhibited a higher variability in both F0 and intensity. Discriminant analysis based on these acoustic parameters further showed a good accuracy in differentiating the three groups of speakers. These findings provide a basis for future research into identifying drug addicts on the basis of speech signals.
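A discriminant analysis of the kind the abstract reports can be sketched as below. The group means, feature values, and sample sizes are invented for illustration; only the feature list (mean F0, mean intensity, F0 and intensity variability, H1-H2) and the three-group setup follow the abstract.

```python
# Sketch: linear discriminant analysis separating three speaker groups
# from five acoustic parameters. Data are synthetic placeholders whose
# group means loosely mimic the reported direction of differences.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# Columns: mean F0 (Hz), mean intensity (dB), F0 SD, intensity SD, H1-H2 (dB).
group_means = {
    "heroin":  [200.0, 60.0, 30.0, 8.0, 2.0],  # toy: higher F0, lower intensity
    "MDMA":    [180.0, 65.0, 28.0, 7.0, 5.0],  # toy: higher variability
    "control": [170.0, 66.0, 18.0, 4.0, 5.0],
}
X = np.vstack([rng.normal(mu, 2.0, size=(30, 5)) for mu in group_means.values()])
y = np.repeat([0, 1, 2], 30)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
print(f"Training accuracy: {lda.score(X, y):.2f}")
```

With real clinical data the groups would be far smaller and noisier than here, so cross-validation rather than training accuracy would be the appropriate evaluation.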
Purpose: The current study aims to investigate the effect of orofacial motions on automatic attitude recognition. Method: To achieve this goal, the movements of the lips and jaw during the expression of six common attitudes by 33 native Mandarin Chinese speakers were recorded using Electromagnetic Articulography. Random forest classifications were then conducted for attitude recognition based on the orofacial data. Results: The average attitude recognition rate was 63.45%, and the identification accuracy was above 60% for almost every attitude. In addition, in further classifications conducted separately on each pair of attitudes, the opposing attitudes within each pair were all reasonably well recognized (i.e., above 65%). Conclusion: Orofacial expressions could be valuable features for automatic attitude recognition technology.
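The pairwise classification step described in the Results can be sketched as follows. The attitude labels, feature dimensionality, and data here are all hypothetical stand-ins; the abstract does not specify which six attitudes were used or how the articulographic trajectories were featurized.

```python
# Sketch: random forest classification on every pair of attitude classes,
# mirroring the per-pair analysis the abstract reports. Orofacial feature
# vectors are synthetic placeholders (e.g., summary statistics of lip/jaw
# trajectories); the attitude names are assumed, not from the study.
from itertools import combinations
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
attitudes = ["friendly", "hostile", "confident"]  # hypothetical labels
features = {a: rng.normal(i, 1.0, size=(40, 6)) for i, a in enumerate(attitudes)}

accs = {}
for a, b in combinations(attitudes, 2):
    X = np.vstack([features[a], features[b]])
    y = np.array([0] * 40 + [1] * 40)
    acc = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
    accs[(a, b)] = acc
    print(f"{a} vs {b}: {acc:.2f}")
```

Each binary classifier is trained and cross-validated independently, so a pair whose articulatory patterns overlap heavily would show up directly as a low per-pair accuracy, as in the study's per-pair reporting.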