ROBUST SPEAKER IDENTIFICATION VIA TWO-STAGE VECTOR QUANTIZATION ENHANCEMENT
Keywords:
Speaker Identification, Vector Quantization, Feature Extraction

Abstract
Speaker identification systems are critical components of many applications, including security, authentication, and voice-controlled devices. However, their performance can degrade under environmental noise, channel distortion, and speaker variability. This paper presents an enhanced speaker identification system that uses two-stage vector quantization to improve robustness against such challenges. In the first stage, the input speech features are vector-quantized to reduce dimensionality and enhance discriminability; in the second stage, a classifier trained on the quantized feature vectors performs the speaker identification. Experimental results demonstrate that the two-stage vector quantization approach significantly improves the robustness of the speaker identification system, achieving higher accuracy even in noisy and adverse conditions.
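The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the codebook sizes, the simple k-means training routine, and the use of a second, residual codebook followed by a minimum-distortion decision rule (a common classifier choice in VQ-based speaker identification) are all assumptions, since the paper's exact configuration is not given here. Feature vectors (e.g. MFCC frames) are modeled per speaker by a stage-1 codebook plus a stage-2 codebook trained on the stage-1 residuals.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd k-means; returns a (k, dim) codebook for X."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each frame to its nearest codeword
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                C[j] = pts.mean(0)
    return C

def quantize(X, C):
    """Map each row of X to its nearest codeword in C."""
    return C[((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)]

def train_two_stage(X, k1=8, k2=8):
    """Stage 1 models the features; stage 2 models the stage-1 residuals."""
    C1 = kmeans(X, k1)
    C2 = kmeans(X - quantize(X, C1), k2, seed=1)
    return C1, C2

def distortion(X, C1, C2):
    """Mean squared error left after passing X through both stages."""
    r = X - quantize(X, C1)          # stage-1 residual
    r = r - quantize(r, C2)          # stage-2 residual
    return (r ** 2).sum(-1).mean()

def identify(X, codebooks):
    """Pick the speaker whose two-stage codebook fits X best."""
    return min(codebooks, key=lambda s: distortion(X, *codebooks[s]))
```

A toy usage: train one two-stage codebook per enrolled speaker on that speaker's feature frames, then score a test utterance against every codebook and return the minimum-distortion speaker.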
References
Atal, B., “Effectiveness of Linear Prediction Characteristics of the Speech Wave for Automatic Speaker Identification and Verification,” Journal of the Acoustical Society of America, Vol. 55, pp. 1304–1312 (1974).
White, G. M. and Neely, R. B., “Speech Recognition Experiments with Linear Prediction, Bandpass Filtering, and Dynamic Programming,” IEEE Trans. on Acoustics, Speech, Signal Processing, Vol. 24, pp. 183–188 (1976).
Vergin, R., O’Shaughnessy, D. and Farhat, A., “Generalized Mel Frequency Cepstral Coefficients for Large-Vocabulary Speaker-Independent Continuous-Speech Recognition,” IEEE Trans. on Speech and Audio Processing, Vol. 7, pp. 525–532 (1999).
Furui, S., “Cepstral Analysis Technique for Automatic Speaker Verification,” IEEE Trans. on Acoustics, Speech, Signal Processing, Vol. 29, pp. 254–272 (1981).
Tishby, N. Z., “On the Application of Mixture AR Hidden Markov Models to Text Independent Speaker Recognition,” IEEE Trans. on Signal Processing, Vol. 39, pp. 563–570 (1991).
Yu, K., Mason, J. and Oglesby, J., “Speaker Recognition Using Hidden Markov Models, Dynamic Time Warping and Vector Quantisation,” IEE Proceedings – Vision, Image and Signal Processing, Vol. 142, pp. 313–318 (1995).
Reynolds, D. A. and Rose, R. C., “Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models,” IEEE Trans. on Speech and Audio Processing, Vol. 3, pp. 72–83 (1995).
Miyajima, C., Hattori, Y., Tokuda, K., Masuko, T., Kobayashi, T. and Kitamura, T., “Text-Independent Speaker Identification Using Gaussian Mixture Models Based on Multi-space Probability Distribution,” IEICE Trans. on Information and Systems, Vol. E84-D, pp. 847–855 (2001).
Alamo, C. M., Gil, F. J. C., Munilla, C. T. and Gomez, L. H., “Discriminative Training of GMM for Speaker Identification,” Proc. of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 1996), Vol. 1, pp. 89–92 (1996).
Pellom, B. L. and Hansen, J. H. L., “An Efficient Scoring Algorithm for Gaussian Mixture Model Based Speaker Identification,” IEEE Signal Processing Letters, Vol. 5, pp. 281–284 (1998).
License
Copyright (c) 2014 Wan-Chen Li

This work is licensed under a Creative Commons Attribution 4.0 International License.
All content published in the Journal of Applied Science and Social Science (JASSS) is protected by copyright. Authors retain the copyright to their work, and grant JASSS the right to publish the work under a Creative Commons Attribution License (CC BY). This license allows others to distribute, remix, adapt, and build upon the work, even commercially, as long as they credit the author(s) for the original creation.