Graph Databases For Diachronic Language Data Modelling
Barbara McGillivray
King’s College London, UK
barbara.mcgillivray@kcl.ac.uk
Pierluigi Cassotti and Davide Di Pierro
University of Bari Aldo Moro, Italy
{surname.name}@uniba.it
Fahad Khan
Istituto di Linguistica Computazionale, CNR
fahad.khan@ilc.cnr.it
ing them suitable for dynamic systems in which merging information is relevant. Unlike traditional DBMSs such as relational (Kriegel et al., 2003) or object-oriented (Bertino and Martino, 1991) ones, graph DBMSs lack predefined structures. Neo4j¹ is among the most common graph DBMSs. The GraphBRAIN² technology (Ferilli and Redavid, 2020) provides intelligent information retrieval functionalities on a graph database. Its interface gives end users access to the data through schema definitions: schemas (expressed in terms of classes, relationships, and attributes) determine how data is presented in the interface. In Basile et al. (2022), we proposed the Linguistic Knowledge Graph, a model based on graph DBMSs. The Linguistic Knowledge Graph models relations between concepts and words, information about word occurrences in corpora, and diachronic information on both concepts and words. In McGillivray et al. (2023), we show an application of this model to the lexical-semantic analysis of Latin data.

Our choice to focus on Latin is motivated by several factors. First, Latin has one of the longest recorded histories of any human language, making it naturally suitable for quantitative studies (Pinkster, 1991); this, in turn, allows for corpus-driven analyses of semantic change processes over long periods. Second, Latin occupies a particularly favourable position among historical languages: extensive Latin corpora are available in digital form (some of which have been linked to language resources at the level of word lemmas in the context of the LiLa project³), as are computational language resources such as Latin WordNet (Minozzi, 2017) and digitized dictionaries such as the Lewis & Short Latin dictionary⁴.

Focusing on the development of the Latin language, in this paper we expand the range of Latin language resources included in the Linguistic Knowledge Graph for the study of lexical semantic change in Latin.⁵ Our contributions include: (i) the ingestion of Latin WordNet into the Linguistic Knowledge Graph; (ii) a new curated linking between existing resources for Latin, namely Latin WordNet (Minozzi, 2017; Biagetti et al., 2021) and the SemEval 2020 Task 1 Latin dataset (McGillivray, 2021), a sense-annotated portion of the LatinISE diachronic corpus of Latin (McGillivray et al., 2022);⁶ (iii) the integration of external contextual information (Wikidata) about the occupations of Latin authors. The term 'occupation' is used here in a broad sense, to refer to the various political, cultural, and societal profiles that identify authors in Wikidata, e.g., priests, philosophers, historians, and hagiographers.

2 Resources

2.1 Dataset

LatinISE contains approximately 10 million word tokens from texts dating from the fifth century BCE to the contemporary era; it has been semi-automatically lemmatized and part-of-speech tagged. The corpus includes metadata fields indicating text identifier, author, title, dates, century, genre, URL of the source, and book title/number and character names (for plays). The semantically annotated dataset we use here was created as part of the SemEval shared task on Unsupervised Lexical Semantic Change Detection (Schlechtweg et al., 2020) and will henceforth be referred to as the SemEval Latin dataset. It contains in-context annotations for 40 Latin lemmas, 20 of which are known to have changed their meaning in connection with Christianity (for example, beatus, which shifted its meaning from 'fortunate' to 'blessed'), and 20 of which are known not to have changed their meaning between the BCE era and the CE era. For each of these lemmas, 60 sentences were annotated: 30 randomly extracted from BCE texts and 30 from CE texts. The annotation followed a variation of the DURel framework (Schlechtweg et al., 2018) described in Schlechtweg et al. (2020): the degree to which a usage instance of a target word is related to each of its possible dictionary definitions was annotated on a four-point scale (Unrelated, Distantly Related, Closely Related, and Identical). The definitions were drawn from the Logeion online dictionary (https://logeion.uchicago.edu/), which contains Lewis and Short's Latin-English Lexicon (Lewis and Short, 1879), Lewis' Elementary Latin Dictionary (Lewis, 1890), and the dictionary by Du Fresne Du Cange et al. (1883–1887). The details of the annotation are described in McGillivray et al. (2022).

¹ https://neo4j.com/
² http://193.204.187.73:8088/GraphBRAIN/
³ https://lila-erc.eu/
⁴ https://lila-erc.eu/data/lexicalResources/LewisShort/Lexicon
⁵ Our code and data are available at https://github.com/linguisticGraph/latin-graph
⁶ Openly available at https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2506.

2.2 Curated Linking

We manually linked each word sense of the SemEval Latin dataset to one or more WordNet synsets. We started with the dataset provided by the LiLa project (Franzini et al., 2019), which contains a sample of 10,314 lemmas from Latin WordNet (LWN) (Minozzi, 2017; Biagetti et al., 2021). The LiLa team verified and, where necessary, corrected the synsets associated with each lemma of the sample and linked them to version 3.0 of Princeton WordNet (PWN) (Fellbaum, 1998; Miller, 1992). However, as the LiLa dataset only covers 22 of the 40 lemmas in our dataset, we used LWN as a reference for the remaining 18 lemmas, converting the PWN 1.6 synset codes used by LWN to PWN 3.0 for consistency.

The senses assigned to the target words in the SemEval Latin dataset often condense multiple meanings into a single definition, so several synsets had to be linked to the same sense to capture all of its nuances. For example, the sense "understanding, judgment, wisdom, sense, penetration, prudence" of the lemma consilium was linked to four synsets.

In some cases, a particular sense could not be described by any of the synsets assigned in the LiLa dataset. In such cases, we searched for the lemma in LWN and selected a more appropriate synset. This was the case, e.g., for the adjective acerbus and its meaning "(of things) heavy, sad, bitter" in the SemEval Latin dataset, for which we selected the synset 01650376-a "psychologically painful" from LWN. When we could not find a suitable synset in either LWN or the LiLa dataset, we looked for the most suitable synset in PWN. However, for some meanings specific to Roman culture and institutions, such as the meaning 'Virtue, personified as a deity' of virtus, we could not find a suitable synset; in these cases, we did not link the sense to WordNet.

2.3 Contextual Information

In some instances, the metadata field of the SemEval Latin dataset (which indicates the author and title of the text, dating, and genre) was noisy, incorrectly structured, or incomplete. Wikidata is an extensive, collaboratively maintained knowledge base (Vrandečić and Krötzsch, 2014), hosting more than one hundred million items. We used Wikidata to de-noise the metadata and to link the authors of the documents containing the sentences in our dataset.

First, we extracted the Wikidata entities whose occupation is specified (wdt:P106, occupation) and whose writing languages include Latin (wdt:P6886, writing language; wd:Q397, Latin). We retrieved information about each author in the form of key/value properties. Author names in the SemEval Latin dataset can occur in different languages and in different forms, for example: praenomen and nomen followed by cognomen (e.g., Marcus Tullius Cicero); cognomen followed by praenomen and nomen (e.g., Cicero, Marcus Tullius); cognomen only (e.g., Cicero); praenomen and nomen only (e.g., Marcus Tullius). We normalized the author mentions in the SemEval Latin dataset and the writer labels and aliases extracted from Wikidata by lowercasing them and removing punctuation. Matching is performed by computing the Levenshtein distance (Schimke et al., 2004) between the author reported in the SemEval dataset and all the surface forms (i.e., labels and aliases) collected from Wikidata. The surface forms are then ranked by increasing Levenshtein distance. If the distance between the author mention and the top-ranked (i.e., closest) surface form is less than a fixed threshold, δ = 0.1, the entity referenced by that surface form is linked to the author mention.

For each author, Wikidata provides rich information, such as biographical data, the author's works, and events that influenced their life and production. In this study, we focus on occupation information: we encode the information provided by Wikidata about the author's occupations using the property wdt:P106 (occupation). In particular, we create a node of type Occupation for each occupation retrieved from Wikidata, together with a relationship between each author and their respective occupations.

3 GraphBRAIN

We stored the above information in a graph-based structure, specifically in a knowledge graph based on the GraphBRAIN technology (Ferilli and Redavid, 2020). GraphBRAIN is an approach to knowledge bases in graph form that uses a graph database (DB) to store information, coupled with an ontology that defines what information can be stored in the DB and how it must be described. Unlike the RDF graph model traditionally used in Semantic Web approaches, GraphBRAIN adopts the Labelled Property Graph (LPG) model, where nodes and arcs may be labelled and carry information as attribute-value pairs, yielding a more compact and human-readable representation of knowledge. The DBMS underlying GraphBRAIN is currently Neo4j (Miller, 2013), which is schema-less. GraphBRAIN proposes an XML-based formalism to express LPG ontologies that can be mapped onto the elements of LPG graphs and act as a schema for the DB (Ferilli et al., 2022b). This approach brings several advantages. The efficiency of a native LPG graph DB can be leveraged to run network analysis and graph mining algorithms, while the expressiveness of the ontology can be leveraged for advanced automated reasoning. The ontology and data can be imported from or exported to the Web Ontology Language (OWL), thus enabling the use of Semantic Web tools. They can also be imported from or exported to other formalisms (e.g., Prolog), enabling different kinds of inference, e.g., rule-based deduction, abduction, abstraction, and argumentation (Esposito et al., 2000).

The Linguistic Knowledge Graph (McGillivray et al., 2023) allows us to express information about corpora, linguistic properties (background lexical, morphological, syntactic, and semantic information), time, and context; linguistic information can be imported from existing resources such as WordNet. Its lexical part is inspired by and aligned to the standard ontological lexicon model OntoLex-Lemon (McCrae et al., 2014). A corpus can be described at several levels of granularity (word, sentence, text, document). Contextual information concerns the standard bibliographic metadata (e.g., authors, publishers) but may be expanded to other entities (e.g., events). Time information can describe specific time points (days, months, years, centuries) or time intervals.

3.1 Linguistic Ontology

To provide a shared vocabulary to visualize and connect the data, we describe here the main components of our linguistic ontology. This schema collects all the relevant pieces of information available in standard lexical databases and other relevant sources of knowledge for diachronic analysis. Class names are capitalized (e.g., Document), and relationship names are written in upper case (e.g., HAS_AUTHOR). Document represents the hub for knowledge discovery, since it contains most aspects of the knowledge that we need. It is linked to the Person who wrote the text (HAS_AUTHOR), commonly named the "author". A document may CONCERN specific Artifacts and Devices, belong to (BELONGS_TO) one Category, be written in at least one (HAS_LANGUAGE) Language, and be published (PUBLISHED_IN). We represent Texts belonging to (BELONGS_TO) documents. From the text, we are able to represent the Words it contains. Lemmas are labelled with their information, e.g., morphology and PartOfSpeech tags, and word forms are linked to their lemmas (HAS_LEMMA). Synsets have relationships with each other: one may be a sub-synset (hyponym) of another (IS_A) or be equivalent to (SAME_AS) another one in a different database. The latter happens when mapping Princeton WordNet to Latin WordNet. Time needs to be modelled for diachronic analysis: TemporalSpecification includes TimeIntervals and specific TimePoints, namely Year, Month, and Day. This model allows authors and texts to be bound to specific time periods. Moreover, we have Events, which can help explain why some words changed their meaning (e.g., in relation to Christianity).

3.2 Latin WordNet Ingestion

The Latin WordNet (LWN) project is an initiative to create and share a common lexico-semantic database of the Latin language. The project originated as a branch of the MultiWordNet project (Pianta et al., 2002). For diachronic analyses, linking linguistic resources with temporal information allows us to uncover instances of semantic change in the usage of words. Hence, we provide a mechanism to enrich the Linguistic Knowledge Graph with Latin WordNet and to exploit the hierarchical structure of the relationships between synsets.

In Section 3, we described the GraphBRAIN technology and its reliance on schemas/ontologies to deliver information extraction and reasoning functionalities. We mapped the Latin WordNet data onto the portion of our ontology specifically devoted to linguistic analysis and understanding. Further details about schema specifications for document representation are available in Ferilli et al. (2022a). Here we describe the mapping between the lexical database and our schema. In LWN, we identified the following resources, grouped into separate Comma Separated Value
(CSV) files: lemma, lexical_relation, literal_sense, metaphoric_sense, metonymic_sense, phrase, semantic_relation, and synset. Each resource has features that may be seen as classical columns in a relational database. From now on, we refer to specific fields as resource.field to identify them uniquely and explain how we map them. The alignment process is as follows:

• lemma: each lemma is mapped onto our class Lemma. A Lemma is characterized by a unique id, a lemma (its value), and a PoS tag (modelled as a relationship). For our purposes, the class PartOfSpeech collects all the PoS tags used, following the Universal PoS Tags standard⁷. We can also represent other fields expressed in LWN, such as lemma.uri.

• lexical_relation: this represents a relationship between two Lemmas. The field lexical_relation.type specifies the type of relationship. We modelled the types present in LWN with explicit names that express their meaning: ANTONYMOUS_OF and PERTAINS_TO (to refer to the type of relation indicated by the attribute of the relations), with their corresponding inverses, e.g., IS_PAST_PARTICIPLE_OF.

• literal_sense: this represents a relationship between a lemma, identified by the field literal_sense.lemma, and a synset, identified by literal_sense.synset. We call this relationship expresses. We mark the relationship as "literal" by adding a specific attribute, sense. Additional information about the period and genre is available.

• metaphoric_sense: similarly to the previous one, this represents a relationship between a lemma and a synset, where the sense is "metaphoric".

• metonymic_sense: as before, but the sense is "metonymic" in this case.

• phrase: a phrase is a word or a multi-word expression. In both cases, the concept is expressed by the class Lemma, since for our purposes both play an equally important role when analysing semantic changes. Again, the PoS tag information is modelled in the same way as described above.

• semantic_relation: a relationship between two synsets. Depending on semantic_relation.type, several relationships may be expressed. They are mapped onto the following ones and their corresponding inverses: PART_OF, HAS_SUBCLASS, ATTRIBUTE_OF, SIMILAR_TO, ANTONYMOUS_OF, PERTAINS_TO, PART_PARTICIPLE_OF, CAUSES, and ENTAILS.

• synset: each synset is mapped onto the class LexiconConcept, and its gloss (synset.gloss, i.e., the description of the synset) is mapped onto the attribute description of that class.

Thanks to this mapping, we can acquire the LWN resource and represent it in our formalism, which allows us to leverage the connections between the different datasets, as the examples in the next section illustrate.

⁷ https://universaldependencies.org/u/pos/

4 Analysis and Discussion

Figure 1 shows the subgraph for the word humanitas. The occurrences of humanitas are annotated in the SemEval dataset with three senses: (i) 'human nature, humanity', (ii) 'humanity, philanthropy', and (iii) 'mankind'.⁸ In the curated linking, we associate sense (i) with the humanness.n.01 synset, sense (ii) with the synsets kindness.n.01, kindness.n.03, and courtesy.n.03, and sense (iii) with the synset world.n.08. According to the Thesaurus Linguae Latinae (Thesaurus-Kommission, 1900–), which places the first attestation of all senses in the 1st century BCE, sense (ii) 'humanity, philanthropy' developed from the more general sense (i) 'human nature, humanity', which refers to human nature in general. The subgraph shows that the three senses are attested at least once in passages dated to the 1st century BCE. However, the graph also shows that the sense 'philanthropy' dominates all other senses in the 1st century BCE. In the CE period, the sense 'humanity' prevails in terms of the number of annotations, and the two meanings coexist.

⁸ A fourth sense, 'liberal education, good breeding, the elegance of manners or language, refinement', was annotated in the SemEval Latin dataset but is not encoded in the graph, since the author matching described in Section 2.3 failed.

Figure 1: Subgraph for the word humanitas, including the sentences in which the lemma humanitas occurs in the SemEval Latin dataset, the century of the works from which the sentences were extracted, the annotated senses in the SemEval Latin dataset, and the curated links between the senses and the synsets in Latin WordNet. The sentences are represented as Text nodes (in blue), the senses and the synsets as LexiconConcept nodes (in green), and the centuries as TimePoint nodes (in red).

By ascending the WordNet hierarchy, we can gain deeper insight into the relationship between the two senses. Sense (ii) 'humanity, philanthropy' and sense (i) 'human nature' are connected via two paths: sense (ii) originates from the quality.n.01 synset (i.e., 'an essential and distinguishing attribute of something or someone'); sense (i) from the attribute.n.02 synset (i.e., 'an abstraction belonging to or characteristic of an entity'). The two senses have the quality.n.01 synset in common, but sense (ii) 'humanity, philanthropy' is directly linked to the kindness.n.01 synset and, at a higher level of the WordNet hierarchy, to the morality.n.01 synset (i.e., 'concerned with the distinction between good and evil or right and wrong'). Including the WordNet hierarchy in the graph thus allows us to show the type of semantic relationship between the two predominant senses of humanitas: the more general sense (i) 'human nature' specializes its meaning in the sphere of morality, giving rise to sense (ii) 'philanthropy'. In the example of humanitas shown in Figure 1, the information injected from WordNet was exploited to analyze the semantic relationship between the meanings of the lemma humanitas. While the synset taxonomy in this example helps us track and classify phenomena of semantic change, other types of information retrievable from the metadata can provide further insights into the context of the semantic change. We add information about the authors' occupations in the examples shown in Figure 2.

In Figure 2, three examples of subgraphs are shown, referring to the encoded information for the Latin lemmas beatus, poena, and salus, respectively. In particular, we filtered for nodes of type Text (blue nodes), Century (red nodes), Synset (green nodes), and Occupation (yellow nodes). We grouped the Text nodes by occupation and century, i.e., we created an explicit link between nodes of type Text and nodes of type TimePoint and between nodes of type Text and nodes of
(a) Subgraph for beatus. The synsets for beatus are: (i) beatified.s.01: Roman Catholic; proclaimed one of the blessed and thus
worthy of veneration, (ii) blessed.s.05: enjoying the bliss of heaven, (iii) rich.a.01: possessing material wealth, (iv) fortunate.a.01:
having unexpected good fortune, (v) ample.s.02: affording an abundant supply, (vi) happy.a.01: enjoying or showing or marked
by joy or pleasure or good fortune
(b) Subgraph for poena. The synsets for poena are: (i) retribution.n.01: a justly deserved penalty, (ii) suffering.n.04: feelings of
mental or physical pain, (iii) agony.n.01: intense feelings of suffering; acute mental or physical pain
(c) Subgraph for salus. The synsets for salus are: (i) health.n.01: a healthy state of well-being, (ii) redemption.n.01: (Christianity)
the act of delivering from sin or saving from evil, (iii) greeting.n.01: an acknowledgment or expression of goodwill
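The grouping of Text nodes by century and by author occupation used for Figure 2 can be illustrated with a small in-memory aggregation. The Python sketch below is not the authors' implementation (which operates on the graph database) and rests on stated assumptions: each Text node has already been resolved to a century, to the occupations of the Wikidata entity matched to its author, and to the synsets linked to its annotated sense. The `TextNode` record, the function names, the century labels, and the five toy records are all hypothetical, and the counts they yield are not findings about the SemEval Latin dataset.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class TextNode:
    """Hypothetical flat view of one annotated sentence (a Text node).

    In the graph, these values are reached through relationships to
    TimePoint, Occupation and LexiconConcept nodes; here they are plain fields.
    """
    text_id: str
    century: str                  # label standing in for a TimePoint node
    occupations: tuple[str, ...]  # Wikidata occupations of the matched author
    synsets: tuple[str, ...]      # synsets linked to the annotated sense


def group_synsets(
    texts: Iterable[TextNode],
    labels_of: Callable[[TextNode], Iterable[str]],
) -> dict[str, Counter]:
    """Count synset links per group label (e.g. century or occupation).

    A text with several labels or synsets contributes once to every
    (label, synset) pair it belongs to.
    """
    groups: dict[str, Counter] = defaultdict(Counter)
    for text in texts:
        for label in labels_of(text):
            groups[label].update(text.synsets)
    return dict(groups)


def dominant_synset(counts: Counter) -> str | None:
    """Return the most frequent synset of a group (ties: first one counted)."""
    top = counts.most_common(1)
    return top[0][0] if top else None


# Toy records invented for illustration; they are NOT SemEval annotations.
# Synset names come from Figure 2(c) (salus), occupations from the introduction.
texts = [
    TextNode("t1", "1st c. BCE", ("philosopher",), ("health.n.01",)),
    TextNode("t2", "1st c. BCE", ("historian",), ("health.n.01",)),
    TextNode("t3", "4th c. CE", ("priest",), ("redemption.n.01",)),
    TextNode("t4", "4th c. CE", ("priest", "hagiographer"), ("redemption.n.01",)),
    TextNode("t5", "4th c. CE", ("historian",), ("health.n.01",)),
]

by_century = group_synsets(texts, lambda t: (t.century,))
by_occupation = group_synsets(texts, lambda t: t.occupations)

for century, counts in by_century.items():
    print(century, "->", dominant_synset(counts), dict(counts))
for occupation, counts in by_occupation.items():
    print(occupation, "->", dict(counts))
```

Passing a label function instead of a fixed field name keeps a single routine for both the century view and the occupation view; in the actual system, the same counts would be obtained by querying the graph rather than in memory.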
that we know about semantic changes prompted by the advent of Christianity, which invested many words already in use in pre-Christian Latin with new meanings closely related to the Christian world (Burton, 2011). Moreover, the lemmas shown in Figure 2 illustrate the different types of interaction between older and newer senses described in the literature (Traugott and Dasher, 2001, 10–12): in some cases, the two senses can continue to coexist, as for the lemmas salus and poena (a phenomenon called 'layering' (Hopper, 1991, 22)); in others, as for the lemma beatus, the relationship between the new sense and the older ones is unbalanced, as the new sense becomes more prominent in a society invested in Christian values.

5 Conclusion and Future Work

We applied diachronic lexical-semantic analysis by integrating different resources into a graph-based structure. Future research should be devoted to enriching the dataset with other resources, in order to uncover more complex relationships and possibly detect semantic changes automatically across all terms in the vocabulary. Currently, our model does not include a programmatic way to detect instances of semantic change automatically; this is an avenue for future research. We plan to publish a version of the graph database in which the experiments can be replicated.

Authors' contributions and Acknowledgements

BMcG contributed to the design of the study, managed the project, provided the SemEval dataset, and wrote Sections 1 and 2.1. PM provided the curated linking between the annotated SemEval Latin dataset and WordNet, and wrote Sections 2.2 and 4. PC processed the annotated LatinISE corpus, extracted metadata information from Wikidata, generated the graph and the visualizations (Figure 1 and Figure 2), wrote Section 2.3, and contributed to writing Section 4. PB contributed to the design of the study, generated the graph, and wrote Section 3. FK proofread the article and contributed to discussions on the relationship between native KG approaches to modelling lexical data as graphs and RDF/OntoLex approaches. DD contributed to the design of the schema and the upload of LWN resources into the LPG-based KG, and wrote Section 3. SF contributed to the design of the schema and the Knowledge Representation methodology of GraphBRAIN, and wrote Section 3.

We acknowledge the support of the PNRR projects FAIR - Future AI Research (PE00000013), Spoke 6 - Symbiotic AI (CUP H97G22000210007) and CHANGES - Cultural Heritage Active innovation for Next-GEn Sustainable society (PE00000020), Spoke 3 - Digital Libraries, Archives and Philology, under the NRRP MUR program funded by the NextGenerationEU.

References

Florentina Armaselu, Elena Simona Apostol, Anas Fahad Khan, Chaya Liebeskind, Barbara McGillivray, Ciprian-Octavian Truica, Andrius Utka, Giedre Valunaite Oleskeviciene, and Marieke van Erp. 2022. LL(O)D and NLP perspectives on semantic change for humanities research. Semantic Web, 13(6):1051–1080.
Pierpaolo Basile, Pierluigi Cassotti, Stefano Ferilli, and Barbara McGillivray. 2022. A new time-sensitive model of linguistic knowledge for graph databases. In Proceedings of the 1st Workshop on Artificial Intelligence for Cultural Heritage, AI4CH 2022, co-located with the 21st International Conference of the Italian Association for Artificial Intelligence (AIxIA 2022), Udine, Italy, November 28, 2022, volume 3286 of CEUR Workshop Proceedings, pages 69–80. CEUR-WS.org.
Elisa Bertino and Lorenzo Martino. 1991. Object-oriented database management systems: concepts and issues. Computer, 24(4):33–47.
Erica Biagetti, Chiara Zanchi, and William Michael Short. 2021. Toward the creation of WordNets for ancient Indo-European languages. In Proceedings of the 11th Global Wordnet Conference, pages 258–266, University of South Africa (UNISA). Global Wordnet Association.
Philip Burton. 2011. Christian Latin. In A companion to the Latin language, pages 485–501, Oxford. Wiley-Blackwell.
Christian Chiarcos, Katerina Gkirtzou, Maxim Ionov, Besim Kabashi, Fahad Khan, and Ciprian-Octavian Truică. 2022. Modelling collocations in OntoLex-FrAC. In Proceedings of Globalex Workshop on Linked Lexicography within the 13th Language Resources and Evaluation Conference, pages 10–18, Marseille, France. European Language Resources Association.
Charles Du Fresne Du Cange, G. A. Louis Henschel, P. Carpentier, Johann Christoph Adelung, and Léopold Favre. 1883–1887. Glossarium mediæ et infimæ latinitatis. L. Favre, Niort.
Floriana Esposito, Giovanni Semeraro, Nicola Fanizzi, and Stefano Ferilli. 2000. Multistrategy theory revision: Induction and abduction in INTHELEX. Mach. Learn., 38(1-2):133–156.
Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA.
Stefano Ferilli and Domenico Redavid. 2020. The GraphBRAIN system for knowledge graph management and advanced fruition. In Foundations of Intelligent Systems: 25th International Symposium, ISMIS 2020, Graz, Austria, September 23–25, 2020, Proceedings, pages 308–317. Springer.
Stefano Ferilli, Domenico Redavid, and Davide Di Pierro. 2022a. Holistic graph-based document representation and management for open science. International Journal on Digital Libraries, pages 1–23.
Stefano Ferilli, Domenico Redavid, and Davide Di Pierro. 2022b. LPG-based ontologies as schemas for graph DBs. In Proceedings of the 30th Italian Symposium on Advanced Database Systems, SEBD 2022, Tirrenia (PI), Italy, June 19–22, 2022, volume 3194 of CEUR Workshop Proceedings, pages 256–267. CEUR-WS.org.
Greta Franzini, Andrea Peverelli, Paolo Ruffolo, Marco Passarotti, Helena Sanna, Edoardo Signoroni, Viviana Ventura, and Federica Zampedri. 2019. Nunc Est Aestimandum: Towards an evaluation of the Latin WordNet. In Proceedings of the Sixth Italian Conference on Computational Linguistics. Accademia University Press.
Dirk Geeraerts, Caroline Gevaert, and Dirk Speelman. 2012. Current methods in historical semantics. Current methods in historical semantics, pages 73–109.
Paul J. Hopper. 1991. On some principles of grammaticalization. In Approaches to grammaticalization, pages 17–35, Amsterdam, Philadelphia. John Benjamins Publishing.
Anas Fahad Khan. 2018. Towards the representation of etymological data on the semantic web. Information, 9(12):304. MDPI AG.
Anas Fahad Khan, Christian Chiarcos, Thierry Declerck, Daniela Gifu, Elena González-Blanco García, Jorge Gracia, Maxim Ionov, Penny Labropoulou, Francesco Mambrini, John P. McCrae, Émilie Pagé-Perron, Marco Passarotti, Salvador Rosl Muñoz, and Ciprian-Octavian Truică. 2022. When linguistics meets web technologies. Recent advances in modelling linguistic linked data. Semantic Web, pages 1–64.
Anas Fahad Khan, John P. McCrae, Francisco Javier Minaya Gómez, Rafael Cruz González, and Javier E. Díaz-Vera. 2023. Some considerations in the construction of a historical language wordnet.
Fahad Khan. 2020. Representing temporal information in lexical linked data resources. In Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020), pages 15–22, Marseille, France. European Language Resources Association.
Hans-Peter Kriegel, Martin Pfeifle, Marco Pötke, and Thomas Seidl. 2003. The paradigm of relational indexing: A survey. In BTW 2003–Datenbanksysteme für Business, Technologie und Web, Tagungsband der 10. BTW Konferenz. Gesellschaft für Informatik eV.
Charlton T. Lewis. 1890. An Elementary Latin Dictionary. American Book Company, New York, Cincinnati, and Chicago.
Charlton T. Lewis and Charles Short. 1879. A Latin Dictionary, Founded on Andrews' edition of Freund's Latin dictionary revised, enlarged, and in great part rewritten by Charlton T. Lewis, Ph.D. and Charles Short. Clarendon Press, Oxford.
John McCrae, Christiane Fellbaum, and Philipp Cimiano. 2014. Publishing and linking WordNet using lemon and RDF. In Proceedings of the 3rd Workshop on Linked Data in Linguistics.
Barbara McGillivray. 2021. Dataset: Latin lexical semantic annotation. Figshare. DOI: https://doi.org/10.18742/16974823.v1.
Barbara McGillivray, Pierluigi Cassotti, Pierpaolo Basile, Davide Di Pierro, and Stefano Ferilli. 2023 (in press). Using graph databases for historical language data: Challenges and opportunities. In Proceedings of the 19th Italian Research Conference on Digital Libraries, Bari, Italy, February 23–24, 2023, CEUR Workshop Proceedings. CEUR-WS.org.
Barbara McGillivray and Gard B. Jenset. 2023. Quantifying the quantitative (re-)turn in historical linguistics. Humanities and Social Sciences Communications, 10(37).
Barbara McGillivray, Daria Kondakova, Annie Burman, Francesca Dell'Oro, Helena Bermúdez Sabel, Paola Marongiu, and Manuel Márquez Cruz. 2022. A new corpus annotation framework for Latin diachronic lexical semantics. Journal of Latin Linguistics, 21(1):47–105.
George A. Miller. 1992. WordNet: A lexical database for English. In Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, USA, February 23–26, 1992. Morgan Kaufmann.
Justin J. Miller. 2013. Graph database applications and concepts with Neo4j. In Proceedings of the Southern Association for Information Systems Conference, Atlanta, GA, USA, volume 2324.
Stefano Minozzi. 2017. Latin WordNet, una rete di conoscenza semantica per il latino e alcune ipotesi di utilizzo nel campo dell'information retrieval. In Strumenti digitali e collaborativi per le Scienze dell'Antichità, pages 123–134, Venezia. Università Ca' Foscari.
Marco Passarotti, Francesco Mambrini, Greta Franzini, Flavio Massimiliano Cecchini, Eleonora Litta, Giovanni Moretti, Paolo Ruffolo, and Rachele Sprugnoli. 2020. Interlinking through lemmas. The lexical collection of the LiLa knowledge base of linguistic resources for Latin. Studi e Saggi Linguistici, 58.
Emanuele Pianta, Luisa Bentivogli, and Christian Girardi. 2002. MultiWordNet: developing an aligned multilingual database. In First International Conference on Global WordNet, pages 293–302.
Harm Pinkster. 1991. Sintassi e semantica latina. Rosenberg & Sellier.
Sascha Schimke, Claus Vielhauer, and Jana Dittmann. 2004. Using adapted Levenshtein distance for on-line signature authentication. In Proceedings of the 17th International Conference on Pattern Recognition, 2004 (ICPR 2004), volume 2, pages 931–934. IEEE.
Dominik Schlechtweg, Barbara McGillivray, Simon Hengchen, Haim Dubossarsky, and Nina Tahmasebi. 2020. SemEval-2020 Task 1: Unsupervised lexical semantic change detection. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, SemEval@COLING 2020, Barcelona (online), December 12–13, 2020, pages 1–23. International Committee for Computational Linguistics.
Dominik Schlechtweg, Sabine Schulte im Walde, and Stefanie Eckmann. 2018. Diachronic Usage Relatedness (DURel): A framework for the annotation of lexical semantic change. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 169–174, New Orleans, Louisiana.
Thesaurusbüro München Internationale Thesaurus-Kommission, editor. 1900–. Thesaurus linguae latinae. Mouton de Gruyter, Berlin.
Elizabeth Closs Traugott and Richard B. Dasher. 2001. Regularity in semantic change. Cambridge University Press, Cambridge.
Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: a free collaborative knowledgebase. Communications of the ACM, 57(10):78–85. ACM, New York, NY, USA.