Enhancement of a text-independent speaker verification system by using feature combination and parallel structure classifiers

  • Original Article
  • Neural Computing and Applications

Abstract

Speaker verification (SV) systems consist mainly of two stages: feature extraction and classification. In this paper, we explore both stages with the aim of improving the performance of an SV system under noisy conditions. On the one hand, the choice of the most appropriate acoustic features is crucial for robust speaker verification. The acoustic features used in the proposed system are Mel Frequency Cepstral Coefficients (MFCC), their first and second derivatives (Deltas and Delta–Deltas), Bark Frequency Cepstral Coefficients (BFCC), Perceptual Linear Predictive (PLP) features, and Relative Spectral Transform Perceptual Linear Predictive (RASTA-PLP) features. Different combinations of these features are compared. On the other hand, a major weakness of a conventional support vector machine (SVM) classifier is its reliance on generic kernel functions to compute the distances among data points, even though the choice of kernel strongly influences its performance. In this work, we propose combining two SVM-based classifiers that use different kernel functions (a linear kernel and a Gaussian radial basis function kernel) with a logistic regression classifier. The classifiers are combined in a parallel structure, and different voting rules for taking the final decision are considered. Results show that using the combined features with the combined classifiers significantly improves the performance of the SV system, both with clean speech and in the presence of noise. Finally, to further improve the system in noisy environments, we propose including a multiband noise removal technique as a preprocessing stage.
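
To make the parallel structure concrete, the following is a minimal sketch of how the outputs of three classifiers run side by side could be fused into one accept/reject decision. The abstract does not say which voting rules the paper evaluates, so the rules shown here (majority vote, mean rule, min rule) are assumptions drawn from common classifier-combination practice. The function names, the 0.5 threshold, and the score values are hypothetical and chosen only for illustration.

```python
from statistics import mean
from typing import Sequence


def majority_vote(decisions: Sequence[bool]) -> bool:
    """Accept when strictly more than half of the classifiers accept."""
    return 2 * sum(decisions) > len(decisions)


def mean_rule(scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Accept when the average classifier score exceeds the threshold."""
    return mean(scores) > threshold


def min_rule(scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Accept only when even the least confident classifier exceeds the threshold."""
    return min(scores) > threshold


# Hypothetical scores in [0, 1] for a single verification trial.
# These are made-up values, not results from the paper.
scores = {"svm_linear": 0.62, "svm_rbf": 0.41, "logistic_regression": 0.70}
values = list(scores.values())

# Each classifier first takes its own hard decision (assumed threshold 0.5).
decisions = [score > 0.5 for score in values]

print("majority vote:", majority_vote(decisions))  # two of three accept
print("mean rule:", mean_rule(values))
print("min rule:", min_rule(values))
```

With these made-up scores the majority vote and the mean rule accept the claimed identity, while the stricter min rule rejects it because one classifier scores below the threshold. This illustrates why the choice of voting rule can change the final decision.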

Acknowledgments

The authors thank the Erasmus Mundus “Green-IT” program for the grant that funded this work. This work has also been partially supported by the Spanish Government Grant TEC2014-53390-P and by the Regional Government of Madrid S2013/ICE-2845-CASI-CAM–CM project.

Author information

Corresponding author

Correspondence to Kerlos Atia Abdalmalak.

Cite this article

Abdalmalak, K.A., Gallardo-Antolín, A. Enhancement of a text-independent speaker verification system by using feature combination and parallel structure classifiers. Neural Comput & Applic 29, 637–651 (2018). https://doi.org/10.1007/s00521-016-2470-x

  • DOI: https://doi.org/10.1007/s00521-016-2470-x
