
Determining the adaptation data saturation of ASR systems for dysarthric speakers

Published in: International Journal of Speech Technology

Abstract

Automatic speech recognition (ASR) systems are gradually being accepted as an assistive technology for physically impaired individuals, such as speakers with dysarthria. Dysarthria is a motor speech impairment in which the muscles of the speech organs are weakened, causing slow or restricted movement. It often accompanies neurological conditions such as cerebral palsy, head injury, muscular dystrophy and multiple sclerosis. Using an ASR system to interpret the spoken language of a speaker with dysarthria offers many advantages over the conventional keyboard-and-mouse method. However, the development of an effective ASR system for this population is often limited by data sparsity, in terms of either language coverage or the size of the speech databases. To overcome the data sparsity issue, researchers have proposed several solutions, including adaptation techniques such as maximum likelihood linear regression (MLLR) and maximum a posteriori (MAP) estimation. In this study, two types of adaptation technique were considered: the individual MLLR and MAP techniques, and the combined techniques (the MLLR + MAP and MAP + MLLR sequences), in order to determine the saturation point of the adaptation data for dysarthric speech. The saturation point is identified using linear regression between the data size and the recognition accuracy. The results show that the saturation points differ between the individual MLLR and MAP adaptation techniques, and that the sequence of the combined adaptation techniques influences the saturation points.
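As a rough illustration of the saturation-point analysis described in the abstract, the sketch below fits a simple linear regression over a sliding window of (adaptation-data size, recognition accuracy) points and reports the first size at which the local slope, i.e. the accuracy gained per added unit of adaptation data, falls below a threshold. All numbers, function names, and the window/threshold parameters are illustrative assumptions, not values or code from the paper.

```python
# Minimal sketch (illustrative, not the authors' implementation): detect the
# adaptation-data saturation point as the size where the local linear-regression
# slope of accuracy vs. data size drops below a small threshold.

def slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def saturation_point(sizes, accuracies, window=3, min_gain=0.05):
    """Return the first data size at which the regression slope over the
    trailing window (accuracy gain per unit of data) falls below min_gain,
    or None if accuracy is still improving at every measured size."""
    for i in range(window, len(sizes) + 1):
        if slope(sizes[i - window:i], accuracies[i - window:i]) < min_gain:
            return sizes[i - 1]
    return None

# Illustrative curve: accuracy rises quickly with more adaptation data,
# then flattens once the data is saturated.
sizes = [1, 2, 3, 4, 5, 6, 7, 8]                      # e.g. minutes of speech
accs = [52.0, 60.0, 66.0, 70.0, 72.0, 72.5, 72.5, 72.5]  # word accuracy (%)

print(saturation_point(sizes, accs))  # prints 8
```

In this sketch the window size and minimum-gain threshold control how aggressively flattening is declared; the paper itself only states that linear regression between data size and accuracy was used, so the stopping criterion here is an assumption.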


Figures 1–4 appear in the full text of the article.



Author information

Correspondence to Mumtaz Begum Mustafa.


About this article


Cite this article

Al-Qatab, B.A., Mustafa, M.B., Salim, S.S. et al. Determining the adaptation data saturation of ASR systems for dysarthric speakers. Int J Speech Technol 24, 183–192 (2021). https://doi.org/10.1007/s10772-020-09788-7
