
Speaker Variations and Vocal Disguise

Abhishek B.P., Diya E.S. Dinesh, Chandana S.


Speaker recognition is the process of identifying a speaker from acoustic attributes such as pitch and loudness. It is a challenging task, and voice-changing apps that facilitate vocal disguise have made it even more difficult. These apps can induce variations in the voice, some of which differ markedly from the original; yet even when an app disguises a person’s voice, certain parameters may remain true to the original (habitual) sample. The current study aimed to determine intra- and inter-speaker differences in vocal disguise in six adult speakers. The habitual voice of each participant was recorded, and three variations were induced using a voice-changing app. As the sample size was limited, descriptive analysis was carried out for all six participants. The first three formant frequencies were measured, and the intra- and inter-speaker differences were determined. Intra-speaker differences were greater than inter-speaker differences.


Keywords: Variability, speaker recognition, formants, range, acoustic parameters








Copyright (c) 2023 Research & Reviews: A Journal of Health Professions
