Émilie Pagé-Perron

University of Oxford, Wolfson College, Post-Doc

University of Toronto, Near and Middle Eastern Civilizations, PhD Candidate

- PhD candidate in Assyriology, University of Toronto
- Co-Director at the Cuneiform Digital Library Initiative (CDLI)

Peer-reviewed articles and book chapters by Émilie Pagé-Perron

Annotating a Low-Resource Language with LLOD Technology: Sumerian Morphology and Syntax

by Ilya Khait, Émilie Pagé-Perron, William McGrath, and Jinyan Wang

This paper describes work on the morphological and syntactic annotation of Sumerian cuneiform as a model for low-resource languages in general. Cuneiform texts are invaluable sources for the study of the history, languages, economy, and cultures of Ancient Mesopotamia and its surrounding regions. Assyriology, the discipline dedicated to their study, has vast research potential but lacks the modern means for computational processing and analysis. Our project, Machine Translation and Automated Analysis of Cuneiform Languages, aims to fill this gap by bringing together corpus data, lexical data, linguistic annotations and object metadata. The project's main goal is to build a pipeline for machine translation and annotation of Sumerian Ur III administrative texts. The rich and structured data are then to be made accessible in the form of (Linguistic) Linked Open Data (LLOD), which should open them to a larger research community. Our contribution is twofold: in terms of language technology, our work represents the first attempt to develop an integrative infrastructure for the annotation of morphology and syntax on the basis of RDF technologies and LLOD resources. With respect to Assyriology, we work towards producing the first syntactically annotated corpus of Sumerian.

Annotating Sumerian: A LLOD-enhanced Workflow for Cuneiform Corpora

by Ilya Khait and Émilie Pagé-Perron

Assyriology, the discipline that studies cuneiform sources and their context, has enormous potential for the application of computational linguistics theory and method on account of the significant quantity of transcribed texts that are available in digital form but that remain as yet largely unexploited. As part of the Machine Translation and Automated Analysis of Cuneiform Languages project (https://cdli-gh.github.io/mtaac/), we aim to bring together corpus data, lexical data, linguistic annotations and object metadata in order to contribute to resolving data processing and integration challenges in the field of Assyriology as a whole, as well as for related fields of research such as linguistics and history. Data sparsity presents a challenge to our goal of the automated transliteration of the administrative texts of the Ur III period. To mitigate this situation we have undertaken to annotate the whole corpus. To this end we have developed an annotation pipeline to facilitate the annotation of our gold corpus. This toolset can be re-employed to annotate any Sumerian text and will be integrated into the Cuneiform Digital Library Initiative (https://cdli.ucla.edu) infrastructure. To share these new data, we have also mapped our data to existing LOD and LLOD ontologies and vocabularies. This article provides details on the processing of Sumerian linguistic data using our pipeline, from raw transliterations to rich and structured data in the form of (L)LOD. We describe the morphological and syntactic annotation, with a particular focus on the publication of our datasets as LOD. This application of LLOD in Assyriology is unique and involves the concept of a LLOD edition of a linguistically annotated corpus of Sumerian, as well as linking with lexical resources, repositories of annotation terminology, and finally the museum collections in which the artifacts bearing these inscribed texts are kept.

Towards a Linked Open Data Edition of Sumerian Corpora

by Émilie Pagé-Perron and Ilya Khait

Linguistic Linked Open Data (LLOD) is a flourishing line of research in the language resource community, so far mostly adopted for selected aspects of linguistics, natural language processing and the semantic web, as well as for practical applications in localization and lexicography. Yet computational philology seems to be somewhat decoupled from the recent progress in this area: even though LOD as a concept is gaining significant popularity in Digital Humanities, existing LLOD standards and vocabularies are not widely used in this community, and philological resources are underrepresented in the LLOD cloud diagram (http://linguistic-lod.org/llod-cloud). In this paper, we present an application of Linguistic Linked Open Data in Assyriology. We describe the LLOD edition of a linguistically annotated corpus of Sumerian, as well as its linking with lexical resources, repositories of annotation terminology, and the museum collections in which the artifacts bearing these texts are kept. The chosen corpus is the Electronic Text Corpus of Sumerian Royal Inscriptions, a well-curated and linguistically annotated archive of Sumerian text. This work prepares for the creation and linking of other corpora of cuneiform texts, such as the corpus of Ur III administrative and legal Sumerian texts, as part of the Machine Translation and Automated Analysis of Cuneiform Languages project (https://cdli-gh.github.io/mtaac/).

Expanding Digital Assyriology with Open Access and Machine Learning (in press)

Digital Humanities Quarterly, 2018

Machine Translation and Automated Analysis of the Sumerian Language

by Émilie Pagé-Perron and Maria Sukhareva

Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, ACL, 2017

This paper presents a newly funded international project for machine translation and automated analysis of ancient cuneiform languages, in which NLP specialists and Assyriologists collaborate to create an information retrieval system for Sumerian. This research is conceived in response to the need to translate large numbers of administrative texts that are only available in transcription, in order to make them accessible to a wider audience. The methodology includes the creation of a specialized NLP pipeline and the use of linguistic linked open data to increase access to the results.

Invited talks by Émilie Pagé-Perron

“The State of Computational Linguistics Tools for the Study of Sumerian and Other Cuneiform Languages” Future Philologies: Digital Directions in Ancient World Text Workshop at the Institute for the Study of the Ancient World, New York, April 20 2018

“Recent Developments in Natural Language Processing for Cuneiform Languages” Thinking Digital in Cuneiform Studies: Methods, Problems, Perspectives Workshop, Venice, March 27-28 2018

“Le projet MTAAC: traduction et analyse automatique de textes cunéiformes” ("The MTAAC Project: Automatic Translation and Analysis of Cuneiform Texts") Conférence de l'Association des études du Proche-Orient ancien, Université du Québec à Montréal, Montréal, March 13 2018

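“E-philologie d'aujourd'hui et demain, l'assyriologie digitale en contexte” ("E-philology Today and Tomorrow: Digital Assyriology in Context") and “Hypothes.is comme outil pédagogique” ("Hypothes.is as a Teaching Tool") Caf'E.PHE 15th edition, École Pratique des Hautes Études, Paris, February 24 2017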

11th Caf'E.phe 2016 École des Hautes Études Paris - "Réflexion sur la visualisation des données appliquée aux corpus en cunéiforme" ("Reflections on data visualization applied to cuneiform corpora") (Skype contribution)

In this talk, showing examples of current practice, I suggest that using new methods of modelling data structure could enable us to discover patterns that would not otherwise be visible. My proposal is to think about the data with network analysis in mind, focusing on the relationships between objects rather than on the objects themselves. A practical solution would be to use graph databases such as Neo4j and OrientDB, which store graphs instead of tables and have built-in visualization tools.
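The relationship-centred modelling suggested in this talk can be illustrated without a graph database. The sketch below is a minimal plain-Python illustration, not the setup used in the talk: the node names (`person_a` and so on) and relation labels are hypothetical placeholders, and an in-memory adjacency index stands in for what Neo4j or OrientDB would store natively.

```python
from collections import defaultdict, deque

# Hypothetical labelled relationships: (source, relation, target).
relations = [
    ("person_a", "delivers_to", "person_b"),
    ("person_b", "supervises", "person_c"),
    ("person_c", "delivers_to", "person_d"),
]

def build_graph(relations):
    """Index relationships by node so that traversal needs no table joins."""
    graph = defaultdict(list)
    for source, relation, target in relations:
        graph[source].append((relation, target))
        graph[target].append((relation, source))  # treat links as undirected
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search returning the nodes on a shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _relation, neighbour in graph[path[-1]]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None

graph = build_graph(relations)
print(shortest_path(graph, "person_a", "person_d"))
```

An indirect connection of this kind is awkward to express over flat tables but is a simple traversal once relationships are stored as first-class data.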

Panels organized by Émilie Pagé-Perron

“Text as Data: Digital Humanities for Text Analysis” American Oriental Society, Los Angeles, March 17-20 2017

Digital humanities provide us with new research pathways that enable us to answer questions that are difficult or impossible to resolve using traditional approaches, often because the data explored are larger or more complex than researchers can readily apprehend. This panel assembles contributions discussing research endeavors that approach text as data by means of computational methods. The presentations span five oriental regions and cover a varied array of research topics that require us to think about encoding, processing, interpreting and disseminating textual data and associated research results.

Talks by Émilie Pagé-Perron

Getting LOADed: Practical Considerations, Tools, and Workflows for producing Linked Open Assyriological Data 2017

by Émilie Pagé-Perron and Terhi Nurmikko-Fuller

American Schools of Oriental Research annual meeting, Boston, November 15-18

“Machine Translation and Automated Analysis of Cuneiform Languages” Canadian Society for Mesopotamian Studies Annual Symposium 2017, Toronto, September 30th

“Introducing the Machine Translation and Automated Analysis of Cuneiform Languages Project” Digital Humanities Network Event 2017, Toronto, August 29

“Enhancing Collaboration between Digital Assyriology Projects through Open Access Practices” Computer Applications and Quantitative Methods in Archaeology (CAA) international conference, Atlanta, March 14-16 2017

by Émilie Pagé-Perron, Vanessa Bigot Juloux, and Terhi Nurmikko-Fuller

“History and future developments of the Cuneiform Digital Library Initiative” Digital Humanities Summer Institute Colloquium 2017, Victoria, June 4-16 2017

“A Quantitative Method for Identifying Meaningful Groups of People in Administrative Cuneiform Archives” American Oriental Society 227th Meeting, Los Angeles, March 17-20 2017

Cuneiform Digital Library Initiative White Paper for the Global Philology Project, Leipzig, February 24-26 2017

Data Mining Cuneiform Corpora: Get Relational - ASOR 2016

The digital humanities are slowly gaining a foothold in Assyriological studies. Cuneiform scholars are beginning to borrow and adapt computerized methods from both Science, Technology, Engineering, and Mathematics (STEM) disciplines and modern language studies. But how can we most effectively bridge the gap between those techniques and our cuneiform corpora? Most Assyriologists already standardize their digital transliterations; some are familiar with visualization tools. This paper demonstrates how, from a standardized set of digital transliterations (and other coded information relative to the tablets), new data can be created for lexical analysis and eventually for visualization. The example presented here is composed of mostly short Old Akkadian tablets provenienced in Adab. In order to generate new information, relationships between different types of words, on the same tablets and across the corpus, must be recoded. In this instance, TEI/XML is not used since concordance at the line level is not necessary. The tablet itself, or the administrative transaction, is used as a link between the different elements present in the text. In my sample, the goal of this mining process is to feed two distinct types of visualization: a web database interface and network analysis tools. Both are used as analysis aids and diffusion tools. By closely examining the practical aspects of data mining cuneiform texts, this paper will provide insights for ancient language specialists who wish to prepare their corpus for new types of digital investigation.
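The idea of using the tablet as the link between elements of a text can be sketched as a co-occurrence count. This is a minimal illustration under stated assumptions, not the procedure used in the paper: the tablet identifiers (`T1` to `T3`) and terms (`name_a` and so on) are hypothetical placeholders, and the weighted pairs it produces are the kind of edge list a network analysis tool could take as input.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample: tablet identifier -> terms (e.g. personal names) attested on it.
tablets = {
    "T1": ["name_a", "name_b", "name_c"],
    "T2": ["name_a", "name_b"],
    "T3": ["name_b", "name_d"],
}

def cooccurrence_edges(tablets):
    """Count how often each unordered pair of terms shares a tablet."""
    edges = Counter()
    for terms in tablets.values():
        # sorted(set(...)) drops repeats within a tablet and fixes the pair order
        for a, b in combinations(sorted(set(terms)), 2):
            edges[(a, b)] += 1
    return edges

edges = cooccurrence_edges(tablets)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Each pair's weight is the number of tablets on which both terms occur, so no line-level concordance is required.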

The Fishermen of Early Dynastic Lagash - AOS 226th Meeting Boston, March 18-21 2016

This paper is concerned with the work organization and practices of the fishermen who had a working relationship with the e-mi/e-Bau, an important household of the end of the Early Dynastic period in the Lagash region, Mesopotamia. Englund (1990) has studied the administrative practices related to the fish industry of the Ur III period in detail. Some general publications discuss fishing in Mesopotamia (Salonen 1970 and Sahrhage 1999), and other studies concerned with economy and exchange do hint at the topic (e.g. Prentice 2010). However, these works do not provide a clear overview of the organization and practices of the fish industry in our specific context, namely in Girsu (Lagash region), in the Early Dynastic IIIb period. What are the dynamics governing the work structure? Is there continuity of practice within the e-mi/e-Bau archive and in comparison with subsequent periods? By studying a group of 150 tablets from the e-mi/e-Bau archive concerning mostly fish transactions, examining the prosopography, and creating a typology for these tablets, it is possible to answer these questions and infer useful information for comparative analysis. In this communication, I present the peculiarities of the fishermen's work organization and their relationship with the patron household. I also demonstrate that the structure of the texts is useful for understanding the workers' hierarchy. In light of the results, there appears to be continuity of practice both within the 17 years covered by the e-mi/e-Bau archive and with the Ur III period.

LAWDI Event 2015 ISAW New York - Data Mining and Visualizing Data From Cuneiform Texts

Conference: LAWDI Event: Digital Antiquity Coffee House
Venue: ISAW (Institute for the Study of the Ancient World), New York University, New York
October 2nd 2015

This talk discusses the technical aspects of my doctoral research project and explains my choice of technology and method. Three different aspects of the process are addressed: storing, analyzing and visualizing the data, each in relation to efficiency, preservation and diffusion.

CIRFF 2015 UQAM Montreal - La reconstruction du panthéon au 3e millénaire avant notre ère : le cas de la "déesse mère" (Reconstructing the 3rd mill. BC Mesopotamian Pantheon: The Case of the "Mother Goddess")

Gender issues have only recently come under discussion in Assyriology (the study of ancient Mesopotamia). Some scholars have already attempted to demonstrate the impact of the Mesopotamian patriarchal context, as well as of the perceptions of women held by the Assyriologists who discuss the Mesopotamian mother goddesses.
One question these scholars do not address is the choice of sources, which is crucial for building a richer picture of the pantheon. This point is only rarely raised in general works, whose aim is usually to create a universal portrait of the pantheon. The sources are therefore evaluated as a single corpus, without careful examination of the context of production or of the scribe's intention.
I argue that the use of administrative sources in these general works could enrich our view of the mother goddesses in the 3rd millennium. The traces of local and popular pantheons that we find in administrative texts show a multiplication of these goddesses, a dynamic that appears to run counter to that of the other sources composed by the scribal elite.
Our conception of the feminine and our present-day social context play a role in the image we hold of the mother goddesses. The choice of sources, and the analysis and understanding of their context, likewise have an impact on our reconstruction. It is not enough merely to acknowledge that our primary and secondary sources emerged in a particular context; we must also take this into account in our work, and do so systematically.

Towards the First Machine Translation System for Sumerian Transliterations

Proceedings of the 28th International Conference on Computational Linguistics, 2020

Annotating a Low-Resource Language with LLOD Technology: Sumerian Morphology and Syntax

Information

This paper describes work on the morphological and syntactic annotation of Sumerian cuneiform as a model for low resource languages in general. Cuneiform texts are invaluable sources for the study of history, languages, economy, and cultures of Ancient Mesopotamia and its surrounding regions. Assyriology, the discipline dedicated to their study, has vast research potential, but lacks the modern means for computational processing and analysis. Our project, Machine Translation and Automated Analysis of Cuneiform Languages, aims to fill this gap by bringing together corpus data, lexical data, linguistic annotations and object metadata. The project’s main goal is to build a pipeline for machine translation and annotation of Sumerian Ur III administrative texts. The rich and structured data is then to be made accessible in the form of (Linguistic) Linked Open Data (LLOD), which should open them to a larger research community. Our contribution is two-fold: in terms of language technology,...

Expanding Digital Assyriology With Open Access and Machine Learning: A Cuneiform Digital Library Initiative White Paper for the Global Philology Project

Machine Translation and Automated Analysis of Cuneiform Languages (MTAAC)

by Heather D Baker and Émilie Pagé-Perron

Ancient Mesopotamia, birthplace of writing, has produced vast numbers of cuneiform tablets that only a handful of highly specialized scholars are able to read. The task of studying them is so labor intensive that the vast majority have not yet been translated, with the result that their contents are not accessible either to historians in other fields or to the wider public. This project will develop and apply new computerised methods to translate and analyse the contents of some 67,000 highly standardised administrative documents from southern Mesopotamia from the 21st century BC. By automating these basic but labor-intensive processes, we will free up scholars’ time. The tools that we will develop, combining machine learning, statistical and neural machine translation technologies, may then be applied to other ancient languages. Similarly, the translations themselves, and the historical, social and economic data extracted from them, will be made publicly available on the web.

Principal Investigators

Heather D. Baker, University of Toronto, Canada, SSHRC
Christian Chiarcos, University of Frankfurt, Germany, DFG
Robert K. Englund, University of California, Los Angeles, United States, NEH

The project is funded via the T-AP Digging Into Data Challenge.

Cuneiform Digital Library Initiative Framework Project

https://cdli.ucla.edu/?q=news/cdli-core-update