Abstract
Machine translation is the most widely used approach for porting a Spoken Language Understanding (SLU) system from one language to another. It achieves high performance for English domains, which is not the case for other languages, especially low-resourced ones such as Arabic and its dialects. To avoid the machine translation approach, which requires huge parallel corpora, this paper investigates the problem of interpreting a user's intent, that is, mapping natural language queries to a system's semantic representation format, across languages and dialects, namely English, Modern Standard Arabic (MSA) and four vernacular Algerian dialects from different regions: Blida, Djelfa, Tenes and Tizi-Ouzou. The experiments are run in one specific application domain, school management. We use three classifiers, kNN, Gaussian Naive Bayes and Bernoulli Naive Bayes, which led to an average accuracy of 90%.
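To make the classification setting concrete, the following is a minimal sketch of intent classification with a Bernoulli Naive Bayes model over binary word-presence features. It is not the authors' implementation: the abstract does not describe their features, corpus or toolkit, so the class `TinyBernoulliNB`, the intent labels `get_grade` and `get_exam_date`, and the English toy queries are all hypothetical and were invented for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus: queries and intent labels are invented for
# illustration and do not come from the paper's school-management corpus.
TRAIN = [
    ("what is the grade of student ali in math", "get_grade"),
    ("show me the math grade of sara", "get_grade"),
    ("give the grade of omar in physics", "get_grade"),
    ("when is the physics exam", "get_exam_date"),
    ("what is the date of the math exam", "get_exam_date"),
    ("tell me the exam date for chemistry", "get_exam_date"),
]


def tokenize(text):
    """Lowercase, split on whitespace, keep each word once (presence only)."""
    return set(text.lower().split())


class TinyBernoulliNB:
    """Illustrative Bernoulli Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        self.vocab = set()
        docs_per_class = defaultdict(int)
        word_docs = defaultdict(lambda: defaultdict(int))
        for text, label in samples:
            tokens = tokenize(text)
            self.vocab |= tokens
            docs_per_class[label] += 1
            for word in tokens:
                word_docs[label][word] += 1
        total = len(samples)
        # Log prior of each intent, estimated from class frequencies.
        self.log_prior = {
            label: math.log(count / total)
            for label, count in docs_per_class.items()
        }
        # P(word present | intent), smoothed so it is never exactly 0 or 1.
        self.p_word = {
            label: {
                word: (word_docs[label][word] + 1) / (docs_per_class[label] + 2)
                for word in self.vocab
            }
            for label in docs_per_class
        }
        return self

    def predict(self, text):
        tokens = tokenize(text) & self.vocab  # unseen words are ignored
        best_label, best_score = None, -math.inf
        for label, prior in self.log_prior.items():
            score = prior
            # Bernoulli model: absent vocabulary words also contribute.
            for word in self.vocab:
                p = self.p_word[label][word]
                score += math.log(p) if word in tokens else math.log(1.0 - p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = TinyBernoulliNB().fit(TRAIN)
print(model.predict("what is the grade of sara in physics"))
print(model.predict("when is the exam for chemistry"))
```

The same binary features could be passed to a kNN or Gaussian Naive Bayes classifier in place of this model. Applying the sketch to MSA or dialectal queries would additionally require a suitable tokenizer, which is assumed away here.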
Acknowledgment
Special thanks to Dhia El Hak Megtouf, Amel Elbachir and Karima Mahdjane for their contributions to corpus enrichment.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Lichouri, M., Djeradi, R., Djeradi, A., Abbas, M. (2020). Towards a Portable SLU System Applied to MSA and Low-resourced Algerian Dialects. In: Hassanien, A., Azar, A., Gaber, T., Bhatnagar, R., F. Tolba, M. (eds) The International Conference on Advanced Machine Learning Technologies and Applications (AMLTA2019). AMLTA 2019. Advances in Intelligent Systems and Computing, vol 921. Springer, Cham. https://doi.org/10.1007/978-3-030-14118-9_58
DOI: https://doi.org/10.1007/978-3-030-14118-9_58
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14117-2
Online ISBN: 978-3-030-14118-9
eBook Packages: Intelligent Technologies and Robotics (R0)