Abstract
Automatic speech recognition (ASR) systems are increasingly accepted as assistive technology for physically impaired individuals, such as speakers with dysarthria. Dysarthria is a motor speech impairment in which the muscles of the speech organs are weakened, causing slow or absent muscle movement. It often accompanies neurological conditions such as cerebral palsy, head injury, muscular dystrophy and multiple sclerosis. Using an ASR system to understand the spoken language of a speaker with dysarthria offers many advantages over the conventional keyboard-and-mouse method. However, the development of an effective ASR system for this condition is often limited by data sparsity, in terms of either the coverage of the language or the size of the speech databases. To overcome the data sparsity issue, existing research has proposed several solutions, including adaptation techniques such as MLLR and MAP. In this study, two types of adaptation techniques were considered: the individual MLLR and MAP adaptation techniques, and the combined adaptation techniques (the MLLR + MAP sequence and the MAP + MLLR sequence). These were used to determine the saturation point of the adaptation data for dysarthric speech. The saturation point is identified using linear regression between the data size and the recognition accuracy. The results show that the saturation points differ between the individual MLLR and MAP adaptation techniques, and that the sequence of the combined adaptation technique influences the saturation point.
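The saturation-point procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the adaptation-data sizes, accuracy values, window size and slope threshold below are all hypothetical. The idea is that accuracy is regressed against adaptation-data size, and saturation is declared once the local regression slope falls below a small threshold.

```python
# Hedged sketch: locating the saturation point of adaptation data by
# linear regression over (data size, recognition accuracy) pairs.
# All numbers below are hypothetical, not results from the paper.

def slope(xs, ys):
    """Least-squares regression slope of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def saturation_point(sizes, accs, window=3, min_slope=0.05):
    """Return the first data size at which the regression slope over the
    trailing `window` points drops below `min_slope` (accuracy gain per
    unit of adaptation data), i.e. accuracy has effectively saturated."""
    for i in range(window, len(sizes) + 1):
        if slope(sizes[i - window:i], accs[i - window:i]) < min_slope:
            return sizes[i - 1]
    return None  # accuracy is still improving across all measured sizes

# Hypothetical adaptation-data sizes (minutes) and word accuracies (%)
sizes = [1, 2, 4, 8, 16, 32, 64]
accs = [40.0, 48.0, 55.0, 60.0, 63.0, 63.5, 63.6]

print(saturation_point(sizes, accs))  # → 64
```

With these illustrative numbers, accuracy gains flatten out at the largest data size, so the sketch reports 64 minutes as the saturation point. The window size and slope threshold are tuning choices; the paper's actual criterion may differ.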




Al-Qatab, B.A., Mustafa, M.B., Salim, S.S. et al. Determining the adaptation data saturation of ASR systems for dysarthric speakers. Int J Speech Technol 24, 183–192 (2021). https://doi.org/10.1007/s10772-020-09788-7