Enhancement of a text-independent speaker verification system by using feature combination and parallel structure classifiers

Kerlos Atia Abdalmalak¹,² & Ascensión Gallardo-Antolín²


Abstract

Speaker verification (SV) systems consist mainly of two stages: feature extraction and classification. In this paper, we explore both modules with the aim of improving the performance of a speaker verification system under noisy conditions. On the one hand, the choice of the most appropriate acoustic features is a crucial factor for robust speaker verification. The acoustic parameters used in the proposed system are Mel Frequency Cepstral Coefficients (MFCC) and their first and second derivatives (Deltas and Delta–Deltas), Bark Frequency Cepstral Coefficients (BFCC), Perceptual Linear Predictive (PLP) coefficients, and Relative Spectral Transform Perceptual Linear Predictive (RASTA-PLP) coefficients. A comprehensive comparison of different combinations of these features is presented. On the other hand, a major weakness of a conventional support vector machine (SVM) classifier is its reliance on a single generic kernel function to measure the similarity between data points, even though the choice of kernel strongly influences the classifier's performance. In this work, we propose combining two SVM-based classifiers with different kernel functions, a linear kernel and a Gaussian radial basis function (RBF) kernel, with a logistic regression classifier. The combination is carried out by means of a parallel structure approach, in which several voting rules for taking the final decision are considered. Results show that using the combined features with the combined classifiers significantly improves the performance of the SV system, both with clean speech and in the presence of noise. Finally, to further improve the system in noisy environments, a multiband noise removal technique is proposed as a preprocessing stage.
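
The Deltas and Delta–Deltas listed above are commonly obtained with the regression formula d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2), applied once to the cepstral frames for Deltas and again to the Deltas for Delta–Deltas. The sketch below is illustrative only: the window N = 2, the boundary handling (repeating the first and last frame) and the use of plain frame-level concatenation to "combine" feature streams are assumptions, not settings reported in this abstract, and the function names `deltas` and `stack_features` are hypothetical.

```python
from typing import List

Frame = List[float]  # one feature vector per analysis frame


def deltas(frames: List[Frame], n_window: int = 2) -> List[Frame]:
    """Regression-based delta coefficients.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first/last frame repeated to cover the boundaries.
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, n_window + 1))
    out: List[Frame] = []
    for t in range(num_frames):
        acc = [0.0] * dim
        for n in range(1, n_window + 1):
            ahead = frames[min(t + n, num_frames - 1)]  # clamp at the last frame
            behind = frames[max(t - n, 0)]              # clamp at the first frame
            for k in range(dim):
                acc[k] += n * (ahead[k] - behind[k])
        out.append([v / denom for v in acc])
    return out


def stack_features(*streams: List[Frame]) -> List[Frame]:
    """Frame-level concatenation of several feature streams (an assumed fusion scheme)."""
    lengths = {len(s) for s in streams}
    if len(lengths) != 1:
        raise ValueError("all feature streams must have the same number of frames")
    num_frames = lengths.pop()
    return [[v for s in streams for v in s[t]] for t in range(num_frames)]


# Toy usage: 5 frames of 2-dimensional "MFCC-like" values.
mfcc = [[float(t), 2.0 * t] for t in range(5)]
d = deltas(mfcc)   # Deltas
dd = deltas(d)     # Delta-Deltas: deltas of the deltas
combined = stack_features(mfcc, d, dd)
print(d[2], len(combined[0]))  # [1.0, 2.0] 6
```

Because the toy input rises linearly, the interior Deltas recover the slope exactly (1.0 and 2.0 per frame), while the clamped boundary frames give smaller values; real MFCC, BFCC, PLP or RASTA-PLP frames would come from a front end that is not shown here.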
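
The parallel structure described above runs the linear-kernel SVM, the RBF-kernel SVM and the logistic regression classifier side by side on the same trial and fuses their accept/reject decisions. The sketch below shows the two kernel functions and a decision-fusion step. The abstract does not name the voting rules, so the three rules here (majority, AND, OR) are assumptions, `gamma = 1.0` is a placeholder, and `fuse_decisions` is a hypothetical helper; training of the classifiers is out of scope, so each one is represented only by its binary decision.

```python
import math
from typing import Sequence


def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Linear kernel k(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def fuse_decisions(decisions: Sequence[bool], rule: str = "majority") -> bool:
    """Combine accept (True) / reject (False) decisions from parallel classifiers.

    "majority": accept if more than half of the classifiers accept.
    "and":      accept only if every classifier accepts.
    "or":       accept if at least one classifier accepts.
    """
    if not decisions:
        raise ValueError("need at least one classifier decision")
    votes = sum(1 for d in decisions if d)
    if rule == "majority":
        return 2 * votes > len(decisions)
    if rule == "and":
        return votes == len(decisions)
    if rule == "or":
        return votes > 0
    raise ValueError(f"unknown voting rule: {rule!r}")


# Hypothetical decisions for one trial: [linear SVM, RBF SVM, logistic regression]
trial = [True, False, True]
print(fuse_decisions(trial), fuse_decisions(trial, "and"), fuse_decisions(trial, "or"))
# True False True
```

With three voters, the majority rule is a 2-of-3 decision; the AND rule lowers false acceptances at the cost of more false rejections, and the OR rule does the opposite, so the rule effectively selects an operating point of the verification system.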

Acknowledgments

The authors thank the Erasmus Mundus “Green-IT” program for the grant that funded this work. This work has also been partially supported by the Spanish Government Grant TEC2014-53390-P and by the Regional Government of Madrid project S2013/ICE-2845-CASI-CAM-CM.

Author information

Authors and Affiliations

Electrical Engineering Department, Aswan University, Aswan, 81542, Egypt
Kerlos Atia Abdalmalak
Signal Theory and Communications Department, Carlos III University of Madrid, 28911, Leganes, Madrid, Spain
Kerlos Atia Abdalmalak & Ascensión Gallardo-Antolín

Corresponding author

Correspondence to Kerlos Atia Abdalmalak.

About this article

Cite this article

Abdalmalak, K.A., Gallardo-Antolín, A. Enhancement of a text-independent speaker verification system by using feature combination and parallel structure classifiers. Neural Comput & Applic 29, 637–651 (2018). https://doi.org/10.1007/s00521-016-2470-x

Received: 12 October 2015
Accepted: 06 July 2016
Published: 16 July 2016
Issue Date: February 2018
DOI: https://doi.org/10.1007/s00521-016-2470-x