Abstract
Speaker verification (SV) systems consist mainly of two stages: feature extraction and classification. In this paper, we explore both modules with the aim of improving the performance of a speaker verification system under noisy conditions. On the one hand, the choice of the most appropriate acoustic features is crucial for robust speaker verification. The acoustic parameters used in the proposed system are Mel Frequency Cepstral Coefficients, their first and second derivatives (Deltas and Delta–Deltas), Bark Frequency Cepstral Coefficients, Perceptual Linear Predictive coefficients, and Relative Spectral Transform Perceptual Linear Predictive coefficients. A complete comparison of different combinations of these features is presented. On the other hand, the major weakness of a conventional support vector machine (SVM) classifier is its reliance on generic, traditional kernel functions to compute the distances among data points, even though the kernel function has a great influence on SVM performance. In this work, we propose combining two SVM-based classifiers that use different kernel functions (a linear kernel and a Gaussian radial basis function kernel) with a logistic regression classifier. The combination is carried out by means of a parallel structure in which different voting rules for taking the final decision are considered. Results show that using the combined features with the combined classifiers significantly improves the performance of the SV system, both with clean speech and in the presence of noise. Finally, to further improve the system in noisy environments, we propose including the multiband noise removal technique as a preprocessing stage.
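The parallel combination described in the abstract can be illustrated with a minimal sketch. The abstract does not say which voting rules were evaluated, so the majority, mean, and min rules below are assumptions chosen as common examples. The scores are made-up placeholders standing in for the outputs of the linear-kernel SVM, the RBF-kernel SVM, and the logistic regression classifier, and the 0.5 threshold is likewise hypothetical.

```python
from statistics import mean

# Hypothetical fusion rules for three classifiers running in parallel.
# None of these names come from the paper; they are illustrative only.

def majority_vote(decisions):
    """Accept the claimed speaker if more than half of the classifiers accept."""
    return sum(decisions) > len(decisions) / 2

def mean_rule(scores, threshold=0.5):
    """Accept if the average classifier score reaches the threshold."""
    return mean(scores) >= threshold

def min_rule(scores, threshold=0.5):
    """Accept only if every classifier score reaches the threshold."""
    return min(scores) >= threshold

# Placeholder scores in [0, 1] for one verification trial, in the order
# (linear-kernel SVM, RBF-kernel SVM, logistic regression). Assumed values.
scores = [0.62, 0.41, 0.55]
decisions = [s >= 0.5 for s in scores]

print("majority:", majority_vote(decisions))
print("mean rule:", mean_rule(scores))
print("min rule:", min_rule(scores))
```

In this toy trial the rules disagree: two of the three classifiers accept, so the majority and mean rules accept the claim, while the stricter min rule rejects it. This is why the choice of voting rule affects the trade-off between false acceptances and false rejections.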
Acknowledgments
The authors thank the Erasmus Mundus “Green-IT” program for the grant that funded this work. This work has also been partially supported by the Spanish Government Grant TEC2014-53390-P and by the Regional Government of Madrid S2013/ICE-2845-CASI-CAM–CM project.
Cite this article
Abdalmalak, K.A., Gallardo-Antolín, A. Enhancement of a text-independent speaker verification system by using feature combination and parallel structure classifiers. Neural Comput & Applic 29, 637–651 (2018). https://doi.org/10.1007/s00521-016-2470-x
DOI: https://doi.org/10.1007/s00521-016-2470-x