Abstract
Artificial-intelligence-powered neural machine translation might soon resuscitate endangered languages by empowering new speakers to communicate in real time using sentences quantifiably closer to the literary norm than those of native speakers, and starting from day one of their language reclamation journey. While Silicon Valley has been investing enormous resources into neural translation technology capable of superhuman speed and accuracy for the world’s most widely used languages, 98% have been left behind, for want of corpora: neural machine translation models train on millions of words of bilingual text, which simply do not exist for most languages, and cost upwards of a hundred thousand United States dollars per tongue to assemble.
For low-resource languages, there is a more resourceful approach, if not a more effective one: transfer learning, which enables lower-resource languages to benefit from achievements among higher-resource ones. In this experiment, Google’s English-Polish neural translation service was coupled with my classical, rule-based engine to translate from English into the endangered, low-resource, East Slavic language of Lemko. The system achieved a bilingual evaluation understudy (BLEU) quality score of 6.28, several times better than Google Translate’s English to Standard Ukrainian (BLEU 2.17), Russian (BLEU 1.10), and Polish (BLEU 1.70) services. Finally, the fruit of this experiment, the world’s first English to Lemko translation service, was made available at the web address www.LemkoTran.com to empower new speakers to revitalize their language.
New speakers are key to language revitalization, and the power to “say it right” in Lemko is now at their fingertips.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
Disclosure: I work as a paid Russian, Polish, and Ukrainian linguist and translation quality control specialist for the Google Translate project; headquarters are in San Francisco.
- 2.
Disclosure: the event was sponsored by the Carpatho-Rusyn Society (Pennsylvania), and I was paid by the University of Pittsburgh for my presentation.
References
Graddol, D.: The future of language. Science 303(5662), 1329–1331 (2004). https://doi.org/10.1126/science.1096546
Eberhard, D.M., Simons, G.F., Fennig, C.D.: Ethnologue: Languages of the World, SIL International, 24th edn. SIL International, Dallas (2021). Online version: How many languages are endangered? https://www.ethnologue.com/guides/how-many-languages-endangered. Accessed 11 Feb 2022
ISO 639 Code Tables. https://iso639-3.sil.org/code_tables/639/data. Accessed 11 Feb 2022
Language support. https://cloud.google.com/translate/docs/languages. Accessed 11 Feb 2022
Select language. https://m.facebook.com/language.php. Accessed 11 Feb 2022
Orynycz, P., Dobry, T., Jackson, A., Litzenberg, K.: Yes I Speak… AI neural machine translation in multi-lingual training. In: Proceedings of the Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC) 2021, Paper no. 21176. National Training and Simulation Association, Orlando (2021). https://www.xcdsystem.com/iitsec/proceedings/index.cfm?Year=2021&AbID=96953&CID=862
Duć-Fajfer, O.: Literatura a proces rozwoju i rewitalizacja tożsamości językowej na przykładzie literatury łemkowskiej. In: Olko, J., Wicherkiewicz, T., Borges, R. (eds.) Integral Strategies for Language Revitalization, 1st edn., pp. 175–200. Faculty of “Artes Liberales”, University of Warsaw, Warsaw (2016)
Scherrer, Y., Rabus, A.: Neural morphosyntactic tagging for Rusyn. In: Mitkov, R., Tait, J., Boguraev, B. (eds.) Natural Language Engineering, vol. 25, no. 5, pp. 633–650. Cambridge University Press, Cambridge (2019). https://doi.org/10.1017/S1351324919000287
Reservations and Declarations for Treaty No.148 - European Charter for Regional or Minority Languages (ETS No. 148). https://www.coe.int/en/web/conventions/full-list?module=declarations-by-treaty&numSte=148&codeNature=1&codePays=POL. Accessed 11 Feb 2022
Formularz indywidualny. https://stat.gov.pl/download/gfx/portalinformacyjny/pl/defaultstronaopisowa/5781/1/1/nsp_2011_badanie__pelne_wykaz_pytan.pdf. Accessed 11 Feb 2022
Narodowy Spis Powszechny Ludności i Mieszkań 2002 r. z 20 maja (formularz A). https://stat.gov.pl/gfx/portalinformacyjny/userfiles/_public/spisy_powszechne/nsp2002-form-a.pdf. Accessed 11 Feb 2022
IV Raport dotyczący sytuacji mniejszości narodowych i etnicznych oraz języka regionalnego w Rzeczypospolitej Polskiej – 2013. http://mniejszosci.narodowe.mswia.gov.pl/download/86/14637/TekstIVRaportu.pdf. Accessed 11 Feb 2022
Vaňko, J.: The Language of Slovakia’s Rusyns. East European Monographs, New York (2000)
Forston, B., IV.: Indo-European Language and Culture. Blackwell Publishing, Oxford (2004)
Pokorny, J.: Indogermanisches etymologisches Wörterbuch, Bern (1959)
Horoszczak, J.: Słownik łemkowsko-polski, polsko-łemkowski. Rutenika, Warsaw (2004)
Vasmer, M.: Russisches etymologisches Wörterbuch. Zweiter Band. Carl Winter, Universitätsverlag, Heidelberg (1955)
Monier-Williams, M.: A Sanskrit-English Dictionary Etymologically and Philologically Arranged with Special Reference to Cognate Indo-European Languages. The Clarendon Press, Oxford (1899)
Derksen, R.: Etymological dictionary of the slavic inherited lexicon. In: Lubotsky, A. (ed.) Leiden Indo-European Etymological Dictionary Series, vol. 4, Koninklijke Brill, Leiden (2008)
Post, M.: A call for clarity in reporting BLEU scores. In: Proceedings of the Third Conference on Machine Translation (WMT), vol. 1, pp. 186–191. Association for Computational Linguistics, Brussels (2018). https://aclanthology.org/W18-63
Chen, B., Cherry, C.: A systematic comparison of smoothing techniques for sentence-level BLEU. In: Proceedings of the Ninth Workshop on Statistical Machine Translation, pp. 362–367. Association for Computational Linguistics, Baltimore (2014). http://dx.doi.org/10.3115/v1/W14-33
Ministerstwo Spraw Wewnętrznych i Administracji: Rozporządzenie Ministra Spraw Wewnętrznych i Administracji z dnia 30 maja 2005 r. w sprawie sposobu transliteracji imion i nazwisk osób należących do mniejszości narodowych i etnicznych zapisanych w alfabecie innym niż alfabet łaciński. In: Dziennik Ustaw Nr 102, pp. 6560–6573. Rządowe Centrum Legislacji, Warsaw (2005)
Shevelov, G.: On the chronology of H and the new G in Ukrainian. In: Harvard Ukrainian Studies, vol. 1, no. 2, pp. 137–152. Harvard Ukrainian Research Institute, Cambridge (1977). https://www.jstor.org/stable/40999942
Acknowledgements
I would like to thank my colleague Ming Qian of Peraton Labs for inspiring me to conduct this experiment, and Brian Stensrud of Soar Technology, Inc. for introducing us, as well as his encouragement.
I would also like to thank my friend Corinna Caudill for her encouragement and personal interest in the project, as well as for introducing me to Carpatho-Rusyn Society President Maryann Sivak of the University of Pittsburgh, whom I would like to thank for the opportunity to present my work.
I would also like to thank Maria Silvestri of the John and Helen Timo Foundation for conducting interviews with Lemko native speakers and donating the transcripts and my translations of them to research and development.
I would like to thank Achim Rabus of the University of Freiburg and Yves Scherrer of the University of Helsinki for their interest in the project and ideas.
I would also like to thank Myhal’ Lŷžečko of the minority-language technology blog InterFyisa for his early interest in the project and community outreach.
I would also like to thank fellow son of Zahoczewie Marko Łyszyk for his interest in the project and community outreach.
Finally, I would like to thank my co-author and Antech Systems Inc. colleague Tom Dobry for his encouragement and guidance.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Orynycz, P. (2022). Say It Right: AI Neural Machine Translation Empowers New Speakers to Revitalize Lemko. In: Degen, H., Ntoa, S. (eds) Artificial Intelligence in HCI. HCII 2022. Lecture Notes in Computer Science(), vol 13336. Springer, Cham. https://doi.org/10.1007/978-3-031-05643-7_37
Download citation
DOI: https://doi.org/10.1007/978-3-031-05643-7_37
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05642-0
Online ISBN: 978-3-031-05643-7
eBook Packages: Computer ScienceComputer Science (R0)