Peer-reviewed articles and book chapters by Émilie Pagé-Perron
This paper describes work on the morphological and syntactic annotation of Sumerian cuneiform as a model for low-resource languages in general. Cuneiform texts are invaluable sources for the study of history, languages, economy, and cultures of Ancient Mesopotamia and its surrounding regions. Assyriology, the discipline dedicated to their study, has vast research potential, but lacks the modern means for computational processing and analysis. Our project, Machine Translation and Automated Analysis of Cuneiform Languages, aims to fill this gap by bringing together corpus data, lexical data, linguistic annotations and object metadata. The project's main goal is to build a pipeline for machine translation and annotation of Sumerian Ur III administrative texts. The rich and structured data is then to be made accessible in the form of (Linguistic) Linked Open Data (LLOD), which should open it to a larger research community. Our contribution is twofold: in terms of language technology, our work represents the first attempt to develop an integrative infrastructure for the annotation of morphology and syntax on the basis of RDF technologies and LLOD resources. With respect to Assyriology, we work towards producing the first syntactically annotated corpus of Sumerian.
Assyriology, the discipline that studies cuneiform sources and their context, has enormous potential for the application of computational linguistics theory and method on account of the significant quantity of transcribed texts that are available in digital form but that remain as yet largely unexploited. As part of the Machine Translation and Automated Analysis of Cuneiform Languages project (https://cdli-gh.github.io/mtaac/), we aim to bring together corpus data, lexical data, linguistic annotations and object metadata in order to contribute to resolving data processing and integration challenges in the field of Assyriology as a whole, as well as for related fields of research such as linguistics and history. Data sparsity presents a challenge to our goal of the automated transliteration of the administrative texts of the Ur III period. To mitigate this situation we have undertaken to annotate the whole corpus. To this end we have developed an annotation pipeline to facilitate the annotation of our gold corpus. This toolset can be re-employed to annotate any Sumerian text and will be integrated into the Cuneiform Digital Library Initiative (https://cdli.ucla.edu) infrastructure. To share these new data, we have also mapped our data to existing LOD and LLOD ontologies and vocabularies. This article provides details on the processing of Sumerian linguistic data using our pipeline, from raw transliterations to rich and structured data in the form of (L)LOD. We describe the morphological and syntactic annotation, with a particular focus on the publication of our datasets as LOD.
This application of LLOD in Assyriology is unique and involves the concept of a LLOD edition of a linguistically annotated corpus of Sumerian, as well as linking with lexical resources, repositories of annotation terminology, and finally the museum collections in which the artifacts bearing these inscribed texts are kept.
Linguistic Linked Open Data (LLOD) is a flourishing line of research in the language resource community, so far mostly adopted for selected aspects of linguistics, natural language processing and the semantic web, as well as for practical applications in localization and lexicography. Yet, computational philology seems to be somewhat decoupled from the recent progress in this area: even though LOD as a concept is gaining significant popularity in Digital Humanities, existing LLOD standards and vocabularies are not widely used in this community, and philological resources are underrepresented in the LLOD cloud diagram (http://linguistic-lod.org/llod-cloud). In this paper, we present an application of Linguistic Linked Open Data in Assyriology. We describe the LLOD edition of a linguistically annotated corpus of Sumerian, as well as its linking with lexical resources, repositories of annotation terminology, and the museum collections in which the artifacts bearing these texts are kept. The chosen corpus is the Electronic Text Corpus of Sumerian Royal Inscriptions, a well-curated and linguistically annotated archive of Sumerian text, in preparation for the creation and linking of other corpora of cuneiform texts, such as the corpus of Ur III administrative and legal Sumerian texts, as part of the Machine Translation and Automated Analysis of Cuneiform Languages project (https://cdli-gh.github.io/mtaac/).
Digital Humanities Quarterly, 2018
Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, ACL, 2017
This paper presents a newly funded international project for machine translation and automated analysis of ancient cuneiform languages where NLP specialists and Assyriologists collaborate to create an information retrieval system for Sumerian. This research is conceived in response to the need to translate large numbers of administrative texts that are only available in transcription, in order to make them accessible to a wider audience. The methodology includes creation of a specialized NLP pipeline and also the use of linguistic linked open data to increase access to the results.
Invited talks by Émilie Pagé-Perron
In this talk, showing examples of current practice, I suggest that using new methods of modelling data structure could enable us to discover patterns that would not be visible otherwise. My proposition is to think about the data with network analysis in mind, focusing on the relationships between objects instead of the objects themselves. The practical solution would be to use graph databases like Neo4j and OrientDB, which store graphs instead of tables and have built-in visualization tools.
Panels organized by Émilie Pagé-Perron
Digital humanities provide us with new research pathways that enable us to answer questions that are difficult or impossible to resolve using traditional approaches, often because the data explored are larger or more complex than what researchers can readily apprehend. This panel assembles contributions discussing research endeavors that approach text as data by means of computational methods. The presentations span five oriental regions and cover a varied array of research topics that require us to think about encoding, processing, interpreting and disseminating textual data and associated research results.
Talks by Émilie Pagé-Perron
American Schools of Oriental Research annual meeting, Boston, November 15-18
The digital humanities are slowly setting foot into Assyriological studies. Cuneiform scholars are beginning to borrow and adapt computerized methods from both Science, Technology, Engineering, and Mathematics (STEM) disciplines and modern language studies. But how can we most effectively bridge the gap between those techniques and our cuneiform corpora? Most Assyriologists already standardize their digital transliterations; some are familiar with visualization tools. This paper demonstrates how, from a standardized set of digital transliterations (and other coded information relative to the tablets), new data can be created for lexical analysis and eventually for visualization. The example presented here is composed of mostly short Old Akkadian tablets provenienced in Adab. In order to generate new information, relationships between different types of words, on the same tablets, and across the corpus, must be recoded. In this instance, TEI/XML is not used since concordance at the line level is not necessary. The tablet itself, or the administrative transaction, is used as a link between the different elements present in the text. In my sample, the goal of this mining process is to feed two distinct types of visualization: a web database interface and network analysis tools. Both are used as analysis aids and diffusion tools. By closely examining the practical aspects of data mining cuneiform texts, this paper will provide insights for ancient language specialists who wish to prepare their corpus for new types of digital investigation.
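As a rough illustration of using the tablet as the link between the elements of a text, the sketch below derives weighted co-occurrence edges of the kind that network analysis tools typically accept. It is not the procedure used in the paper: the tablet identifiers, the element labels, and the function name `cooccurrence_edges` are all hypothetical, and the sample data are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: tablet identifier -> elements attested on it.
# Labels and identifiers are placeholders, not drawn from the Adab corpus.
tablets = {
    "tablet_1": ["person:A", "person:B", "commodity:barley"],
    "tablet_2": ["person:A", "commodity:barley"],
    "tablet_3": ["person:B", "commodity:barley", "commodity:sheep"],
}


def cooccurrence_edges(tablets):
    """Count, for each unordered pair of elements, how many tablets attest both."""
    edges = Counter()
    for elements in tablets.values():
        # Deduplicate and sort so that each pair has a single canonical order.
        for pair in combinations(sorted(set(elements)), 2):
            edges[pair] += 1
    return edges


edges = cooccurrence_edges(tablets)
for (source, target), weight in sorted(edges.items()):
    print(source, target, weight)
```

Each resulting triple (source, target, weight) can be written out as an edge list; the weight records only how many tablets share the pair, which is a deliberately simple assumption about what counts as a relationship.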
This paper is concerned with the work organization and practices of the fishermen who had a working relationship with the e-mi / e-Bau, an important household of the end of the Early Dynastic period in the Lagash region, Mesopotamia. Englund (1990) has studied the administrative practices related to the fish industry of the Ur III period in detail. Some general publications discuss fishing in Mesopotamia (Salonen 1970 and Sahrhage 1999), and other studies concerned with economy and exchange do hint at the topic (e.g. Prentice 2010). However, these works do not provide a clear overview of the organization and practices of the fish industry in our specific context, namely in Girsu (Lagash region), in the Early Dynastic IIIb period. What are the dynamics governing the work structure? Is there a continuity of practices within the e-mi / e-Bau archive and in comparison with subsequent periods? By studying a group of 150 tablets coming from the archive of the e-mi / e-Bau concerning mostly fish transactions, examining the prosopography, and creating a typology for these tablets, it is possible to answer these questions and infer useful information for comparative analysis. In this communication, I present the peculiarities of the fishermen's work organization and their relationship with the patron household. I also demonstrate that the texts' structure is useful for understanding the workers' hierarchy. In light of the results, there is apparently a continuity of practice not only across the 17 years covered by the e-mi / e-Bau archive but also with the Ur III period.
Conference: LAWDI Event: Digital Antiquity Coffee House
Venue: ISAW (Institute for the Study of the Ancient World), New York University, New York
October 2nd 2015
This talk discusses the technical aspects of my doctoral research project and explains my choice of technology and method. Three different aspects of the process are addressed: storing, analyzing and visualizing the data, each considered in relation to efficiency, preservation and diffusion.
One question these researchers do not address is the choice of sources, which is crucial to building a richer portrait of the pantheon. This point is only rarely raised in general studies, whose aim is usually to create a universal portrait of the pantheon. The sources are therefore evaluated as a single corpus, without careful examination of the context of production or of the scribe's intent.
I argue that using administrative sources in these general studies could enrich our view of mother goddesses in the 3rd millennium. The traces we have of local and popular pantheons in administrative texts show a proliferation of these goddesses, a dynamic that appears to run counter to the other sources composed by the scribal elite.
Our conception of the feminine, along with our present-day social context, plays a role in the image we hold of mother goddesses. The choice of sources, and the analysis and understanding of their context, likewise affect our reconstruction. It is not enough merely to acknowledge that our primary and secondary sources emerged in a particular context; we must also take this into account in our work, and do so systematically.
Principal Investigators
Heather D. Baker, University of Toronto, Canada, SSHRC
Christian Chiarcos, University of Frankfurt, Germany, DFG
Robert K. Englund, University of California, Los Angeles, United States, NEH
The project is funded via the T-AP Digging Into Data Challenge.