Abstract
Automatic speech recognition (ASR) systems are increasingly accepted as assistive technology for physically impaired individuals, such as speakers with dysarthria. Dysarthria is a motor speech impairment in which the muscles of the speech organs are weakened, causing slow or absent muscle movement. It often accompanies neurological conditions such as cerebral palsy, head injury, muscular dystrophy and multiple sclerosis. Using an ASR system to understand the spoken language of a speaker with dysarthria offers many advantages over the conventional keyboard-and-mouse method. However, the development of an effective ASR system for this condition is often limited by data sparsity, in terms of either the coverage of the language or the size of the speech databases. To overcome the data sparsity issue, existing research has proposed several solutions, including adaptation techniques such as MLLR and MAP. In this study, two types of adaptation techniques were considered: the individual MLLR and MAP adaptation techniques, and the combined adaptation techniques (the MLLR + MAP sequence and the MAP + MLLR sequence). These were used to determine the saturation point of the adaptation data for dysarthric speech. The saturation point is identified using linear regression between the data size and the recognition accuracy. The results show that the saturation points differ between the individual MLLR and MAP adaptation techniques, and that the sequence of the combined adaptation technique influences the saturation point.
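The saturation-point procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the adaptation-data sizes, accuracy values, window size and slope threshold below are all hypothetical. The idea is that accuracy is regressed against adaptation-data size, and saturation is declared once the local regression slope falls below a small threshold.

```python
# Hedged sketch: locating the saturation point of adaptation data by
# linear regression over (data size, recognition accuracy) pairs.
# All numbers below are hypothetical, not results from the paper.

def slope(xs, ys):
    """Least-squares regression slope of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def saturation_point(sizes, accs, window=3, min_slope=0.05):
    """Return the first data size at which the regression slope over the
    trailing `window` points drops below `min_slope` (accuracy gain per
    unit of adaptation data), i.e. accuracy has effectively saturated."""
    for i in range(window, len(sizes) + 1):
        if slope(sizes[i - window:i], accs[i - window:i]) < min_slope:
            return sizes[i - 1]
    return None  # accuracy is still improving across all measured sizes

# Hypothetical adaptation-data sizes (minutes) and word accuracies (%)
sizes = [1, 2, 4, 8, 16, 32, 64]
accs = [40.0, 48.0, 55.0, 60.0, 63.0, 63.5, 63.6]

print(saturation_point(sizes, accs))  # → 64
```

With these illustrative numbers, accuracy gains flatten out at the largest data size, so the sketch reports 64 minutes as the saturation point. The window size and slope threshold are tuning choices; the paper's actual criterion may differ.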




Al-Qatab, B.A., Mustafa, M.B., Salim, S.S. et al. Determining the adaptation data saturation of ASR systems for dysarthric speakers. Int J Speech Technol 24, 183–192 (2021). https://doi.org/10.1007/s10772-020-09788-7