HamleDT: Harmonized multi-language dependency treebank

Daniel Zeman¹,
Ondřej Dušek¹,
David Mareček¹,
Martin Popel¹,
Loganathan Ramasamy¹,
Jan Štěpánek¹,
Zdeněk Žabokrtský¹ &
…
Jan Hajič¹

Abstract

We present HamleDT—a HArmonized Multi-LanguagE Dependency Treebank. HamleDT is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style. In the present article, we provide a thorough investigation and discussion of a number of phenomena that are comparable across languages, though their annotation in treebanks often differs. We claim that transformation procedures can be designed to automatically identify most such phenomena and convert them to a unified annotation style. This unification is beneficial both to comparative corpus linguistics and to machine learning of syntactic parsing.

Notes

The initial version has been described in Zeman et al. (2012).
HamleDT v1.5 does not include the harmonization of verbal groups (see Sect. 5.4).
The transformations are not robust to coordination styles.
http://www.ldc.upenn.edu/.
So far, there are only two differences between the PDT style (used in [cs]) and the HamleDT v1.5 style: handling of appositions (see Table 3) and marking of conjuncts (in HamleDT, the root of a conjunct subtree is marked as conjunct even if it is a preposition or subordinating conjunction; in PDT, only content words are marked as conjuncts). By conjunct, we mean a member of coordination (unlike Quirk et al. 1985). By content word, we mean autosemantic word, i.e. a word with a full lexical meaning, as contrasted with auxiliary. Note that PDT also has a more abstract layer of annotation (called tectogrammatical), but in this work, we only use the shallow dependencies (called analytical layer in PDT).
Unless we explicitly say otherwise, by “original” we mean the data source indicated in Table 1. This source may differ from the treebank as it was first released. For instance, some of the CoNLL data were converted to the CoNLL format from other formats, and some information may have been lost in the process.
In the Pāṇinian tradition, karta is the agent, doer of the action, and karma is the “deed” or patient. See Bharati et al. (1994).
They are approximately the same as the dependency relation labels in the Czech CoNLL data set. To illustrate the mapping, more details on [bn] and [en] conversion are presented in Tables 4 and 5 in Appendix 2.
Ideally we would also want to distinguish objects (Obj) from adverbials. Unfortunately, this particular source annotation does not provide enough information to make such a distinction.
In Chomskian (constituency-based) approaches, it is the standard analysis that determiners function as the head of a noun phrase.
Note however that numerals governing nouns are not restricted to [da]. Czech has a complex set of rules for numerals (motivated by the morphological agreement), which may result under some circumstances in the numeral serving as the head.
In [ja], the previous token essentially means the main predicate, but if it is followed by a question particle then the punctuation node is attached to the particle.
http://ufal.mff.cuni.cz/treex/.
http://ufal.mff.cuni.cz/tred/ with EasyTreex extension.
We do not attempt to make the unification of dependency relations reversible.

References

Aduriz, I., Aranzabe, M. J., Arriola, J. M., Atutxa, A., Díaz de Ilarraza, A., Garmendia, A., & Oronoz, M. (2003). Construction of a Basque dependency treebank. In Proceedings of the 2nd workshop on treebanks and linguistic theories.
Afonso, S., Bick, E., Haber, R., & Santos, D. (2002). “Floresta sintá(c)tica”: A treebank for Portuguese. In Proceedings of the 3rd international conference on language resources and evaluation (LREC) (pp. 1698–1703).
Atalay, N. B., Oflazer, K., Say, B., & Inst, I. (2003). The annotation process in the Turkish treebank. In Proceedings of the 4th international workshop on linguistically interpreted corpora (LINC).
Bamman, D., & Crane, G. (2011). The ancient Greek and Latin dependency treebanks. In C. Sporleder, A. Bosch, & K. Zervanou (Eds.), Language technology for cultural heritage, theory and applications of natural language processing (pp. 79–98). Berlin, Heidelberg: Springer.
Bengoetxea, K., & Gojenola, K. (2009). Exploring treebank transformations in dependency parsing. In Proceedings of the international conference RANLP-2009. Borovets, Bulgaria (pp. 33–38). Association for Computational Linguistics.
Bharati, A., Chaitanya, V., & Sangal, R. (1994). Natural language processing: A Paninian perspective. New Delhi: Prentice-Hall of India.
Bick, E., Uibo, H., & Müürisep, K. (2004). Arborest—A VISL-style treebank derived from an Estonian constraint grammar corpus. In Proceedings of treebanks and linguistic theories.
Boguslavsky, I., Grigorieva, S., Grigoriev, N., Kreidlin, L., & Frid, N. (2000). Dependency treebank for Russian: Concept, tools, types of information. In Proceedings of the 18th conference on computational linguistics (Vol. 2, pp. 987–991).
Bosco, C., Montemagni, S., Mazzei, A., Lombardo, V., Lenci, A., Lesmo, L., Attardi, G., Simi, M., Lavelli, A., Hall, J., Nilsson, J., & Nivre, J. (2010). Comparing the influence of different treebank annotations on dependency parsing.
Brants, S., Dipper, S., Eisenberg, P., Hansen, S., König, E., Lezius, W., et al. (2004). TIGER: Linguistic interpretation of a German corpus. Journal of Language and Computation, 2(4), 597–620. Special Issue.
Buchholz, S., & Marsi, E. (2006). CoNLL-X shared task on multilingual dependency parsing. In Proceedings of CoNLL (pp. 149–164).
Călăcean, M. (2008). Data-driven dependency parsing for Romanian. Master’s thesis, Uppsala University.
Civit, M., Martí, M. A., & Bufí, N. (2006). Cat3LB and Cast3LB: From constituents to dependencies. In T. Salakoski, F. Ginter, S. Pyysalo, & T. Pahikkala (Eds.), FinTAL, Vol. 4139 of Lecture notes in computer science (pp. 141–152). Berlin: Springer.
Csendes, D., Csirik, J., Gyimóthy, T., & Kocsor, A. (2005). The Szeged treebank. In V. Matoušek, P. Mautner, & T. Pavelka (Eds.), TSD, Vol. 3658 of Lecture notes in computer science (pp. 123–131). Berlin: Springer.
de Marneffe, M.-C., & Manning, C. D. (2008). Stanford typed dependencies manual.
Džeroski, S., Erjavec, T., Ledinek, N., Pajas, P., Žabokrtský, Z., & Žele, A. (2006). Towards a Slovene dependency treebank. In Proceedings of the fifth international language resources and evaluation conference, LREC 2006. Genova, Italy (pp. 1388–1391). European Language Resources Association (ELRA).
Hajič, J., Ciaramita, M., Johansson, R., Kawahara, D., Martí, M. A., Màrquez, L., Meyers, A., Nivre, J., Padó, S., Štěpánek, J., Straňák, P., Surdeanu, M., Xue, N., & Zhang, Y. (2009). The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the 13th conference on computational natural language learning (CoNLL-2009), June 4–5. Boulder, Colorado, USA.
Hajič, J., Panevová, J., Hajičová, E., Sgall, P., Pajas, P., Štěpánek, J., Havelka, J., Mikulová, M., Žabokrtský, Z., & Ševčíková-Razímová, M. (2006). Prague dependency treebank 2.0. CD-ROM, Linguistic Data Consortium, LDC Catalog No.: LDC2006T01, Philadelphia.
Haverinen, K., Viljanen, T., Laippala, V., Kohonen, S., Ginter, F., & Salakoski, T. (2010). Treebanking Finnish. In M. Dickinson, K. Müürisep, & M. Passarotti (Eds.), Proceedings of the ninth international workshop on treebanks and linguistic theories (TLT9) (pp. 79–90).
Hudson, R. (2004). Are determiners heads? Functions of Language, 11(1).
Hudson, R. (2010). An encyclopedia of word grammar and English grammar. London, UK: University College London. http://tinyurl.com/wg-encyc.
Husain, S., Mannem, P., Ambati, B., & Gadde, P. (2010). The ICON-2010 tools contest on Indian language dependency parsing. In Proceedings of ICON-2010 tools contest on Indian language dependency parsing. Kharagpur, India.
Hwa, R., Resnik, P., Weinberg, A., Cabezas, C. I., & Kolak, O. (2005). Bootstrapping parsers via syntactic projection across parallel texts. Natural Language Engineering, 11(3), 311–325.
Kawata, Y., & Bartels, J. (2000). Stylebook for the Japanese treebank in Verbmobil. In Report 240. Tübingen, Germany.
Kromann, M. T., Mikkelsen, L., & Lynge, S. K. (2004). Danish dependency treebank.
Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn treebank. Computational Linguistics, 19(2), 313–330.
Mareček, D., & Žabokrtský, Z. (2012). Exploiting reducibility in unsupervised dependency parsing. In Proceedings of EMNLP-CoNLL’12 (pp. 297–307).
McDonald, R., Nivre, J., Quirmbach-Brundage, Y., Goldberg, Y., Das, D., Ganchev, K., Hall, K., Petrov, S., Zhang, H., Täckström, O., Bedini, C., Castelló, N. B., & Lee, J. (2013). Universal dependency annotation for multilingual parsing. In Proceedings of the ACL 2013. Association for Computational Linguistics.
McDonald, R., Petrov, S., & Hall, K. (2011a). Multi-source transfer of delexicalized dependency parsers. In Proceedings of the conference on empirical methods in natural language processing (pp. 62–72). Stroudsburg, PA, USA. Association for Computational Linguistics.
McDonald, R., Petrov, S., & Hall, K. (2011b). Multi-source transfer of delexicalized dependency parsers. In Proceedings of the 2011 conference on empirical methods in natural language processing (pp. 62–72). Edinburgh, Scotland, UK. Association for Computational Linguistics.
Mel’čuk, I. A. (1988). Dependency syntax: Theory and practice. New York: State University of New York Press.
Montemagni, S., Barsotti, F., Battista, M., Calzolari, N., Corazzari, O., Lenci, A., et al. (2003). Building the Italian syntactic-semantic treebank. In A. Abeillé (Ed.), Building and using parsed corpora (pp. 189–210). Dordrecht: Kluwer.
Nilsson, J., Hall, J., & Nivre, J. (2005). MAMBA Meets TIGER: Reconstructing a Swedish treebank from antiquity. In Proceedings of the NODALIDA special session on treebanks.
Nilsson, J., Nivre, J., & Hall, J. (2006). Graph transformations in data-driven dependency parsing. In Proceedings of the 21st international conference on computational linguistics and the 44th annual meeting of the association for computational linguistics (pp. 257–264).
Nivre, J., Hall, J., Kübler, S., McDonald, R., Nilsson, J., Riedel, S., & Yuret, D. (2007). The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL 2007 shared task. Joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL).
Popel, M., & Žabokrtský, Z. (2010). TectoMT: Modular NLP framework. In Advances in natural language processing (pp. 293–304).
Popel, M., Mareček, D., Štěpánek, J., Zeman, D., & Žabokrtský, Z. (2013). Coordination structures in dependency treebanks. In Proceedings of the 51st annual meeting of the association for computational linguistics (pp. 517–527). Sofia, Bulgaria. Association for Computational Linguistics.
Prokopidis, P., Desipri, E., Koutsombogera, M., Papageorgiou, H., & Piperidis, S. (2005). Theoretical and practical issues in the construction of a Greek dependency treebank. In Proceedings of the 4th workshop on treebanks and linguistic theories (TLT) (pp. 149–160).
Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J. (1985). A comprehensive grammar of the English language. London: Longman.
Ramasamy, L., & Žabokrtský, Z. (2012). Prague dependency style treebank for Tamil. In Proceedings of LREC 2012. İstanbul, Turkey.
Rasooli, M. S., Moloodi, A., Kouhestani, M., & Minaei-Bidgoli, B. (2011). A syntactic valency lexicon for Persian verbs: The first steps towards Persian dependency treebank. In 5th language and technology conference (LTC): Human language technologies as a challenge for computer science and linguistics (pp. 227–231). Poland: Poznań.
Schwartz, R., Abend, O., & Rappoport, A. (2012). Learnability-based syntactic annotation design. In Proceedings of COLING 2012: Technical papers (pp. 2405–2422). India: Mumbai.
Seginer, Y. (2007). Learning syntactic structure. Ph.D. thesis, University of Amsterdam.
Simov, K., & Osenova, P. (2005). Extending the annotation of BulTreeBank: Phase 2. In The fourth workshop on treebanks and linguistic theories (TLT 2005), Barcelona (pp. 173–184).
Smrž, O., Bielický, V., Kouřilová, I., Kráčmar, J., Hajič, J., & Zemánek, P. (2008). Prague Arabic dependency treebank: A word on the million words. In Proceedings of the workshop on Arabic and local languages (LREC 2008) (pp. 16–23). Marrakech, Morocco. European Language Resources Association.
Surdeanu, M., Johansson, R., Meyers, A., Màrquez, L., & Nivre, J. (2008). The CoNLL-2008 shared task on joint parsing of syntactic and semantic dependencies. In Proceedings of CoNLL.
Taulé, M., Martí, M.A., & Recasens, M. (2008). AnCora: Multilevel annotated corpora for Catalan and Spanish. In LREC. European Language Resources Association.
Tesnière, L. (1959). Éléments de syntaxe structurale. Paris: Klincksieck.
Tsarfaty, R., Nivre, J., & Andersson, E. (2011). Evaluating dependency parsing: Robust and heuristics-free cross-annotation evaluation. In Proceedings of the 2011 conference on empirical methods in natural language processing (pp. 385–396). Edinburgh, Scotland, UK. Association for Computational Linguistics.
van der Beek, L., Bouma, G., Daciuk, J., Gaustad, T., Malouf, R., van Noord, G., Prins, R., & Villada, B. (2002). Chapter 5. The Alpino dependency treebank. In Algorithms for linguistic processing NWO PIONIER progress report. Groningen, The Netherlands.
Zeman, D. (2008). Reusable tagset conversion using tagset drivers. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the sixth international language resources and evaluation conference, LREC 2008 (pp. 28–30). Marrakech, Morocco. European Language Resources Association (ELRA).
Zeman, D., Mareček, D., Popel, M., Ramasamy, L., Štěpánek, J., Žabokrtský, Z., & Hajič, J. (2012). HamleDT: To parse or not to parse? In N. Calzolari, K. Choukri, T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, J. Odijk, & S. Piperidis (Eds.), Proceedings of the eighth international conference on language resources and evaluation (LREC’12). İstanbul, Turkey. European Language Resources Association (ELRA).

Acknowledgments

The authors wish to express their gratitude to all the creators and providers of the respective corpora. The work on this project was supported by the Czech Science Foundation Grant Nos. P406/11/1499 and P406/14/06548P, by the European Union Seventh Framework Programme under Grant Agreement FP7-ICT-2013-10-610516 (QTLeap), and by research resources of the Charles University in Prague (PRVOUK). This work has been using language resources developed and/or stored and/or distributed by the LINDAT/CLARIN project of the Ministry of Education of the Czech Republic (Project LM2010013). Finally, we are very grateful for the numerous valuable comments provided by the anonymous reviewers.

Author information

Authors and Affiliations

Faculty of Mathematics and Physics, ÚFAL, Charles University in Prague, Prague, Czech Republic
Daniel Zeman, Ondřej Dušek, David Mareček, Martin Popel, Loganathan Ramasamy, Jan Štěpánek, Zdeněk Žabokrtský & Jan Hajič

Corresponding author

Correspondence to Daniel Zeman.

Appendices

Appendix 1: List of included languages and treebanks

Arabic [ar]: Prague Arabic Dependency Treebank 1.0/CoNLL 2007 (Smrž et al. 2008)

http://padt-online.blogspot.com/2007/01/conll-shared-task-2007.html
Basque [eu]: Basque Dependency Treebank, a larger version than the one included in CoNLL 2007, generously provided by IXA Group (Aduriz et al. 2003)

http://hdl.handle.net/10230/17098
Bengali [bn], Hindi [hi] and Telugu [te]: Hyderabad Dependency Treebank/ICON 2010 (Husain et al. 2010)

http://ltrc.iiit.ac.in/icon/2010/nlptools/
Bulgarian [bg]: BulTreeBank (Simov and Osenova 2005)

http://www.bultreebank.org/indexBTB.html
Catalan [ca] and Spanish [es]: AnCora (Taulé et al. 2008)

http://clic.ub.edu/corpus/en/ancora-descarregues
Czech [cs]: Prague Dependency Treebank 2.0/CoNLL 2009 (Hajič et al. 2006)

http://ufal.mff.cuni.cz/pdt2.0/
Danish [da]: Danish Dependency Treebank/CoNLL 2006 (Kromann et al. 2004), now part of the Copenhagen Dependency Treebank

http://code.google.com/p/copenhagen-dependency-treebank/
Dutch [nl]: Alpino Treebank/CoNLL 2006 (van der Beek et al. 2002)

http://odur.let.rug.nl/~vannoord/trees/
English [en]: Penn TreeBank 3/CoNLL 2007 (Marcus et al. 1993)

http://www.cis.upenn.edu/~treebank/
Estonian [et]: Eesti keele puudepank/Arborest (Bick et al. 2004)

http://www.cs.ut.ee/~kaili/Korpus/puud/
Finnish [fi]: Turku Dependency Treebank (Haverinen et al. 2010)

http://bionlp.utu.fi/fintreebank.html
German [de]: Tiger Treebank/CoNLL 2009 (Brants et al. 2004)

http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.html
Greek (modern) [el]: Greek Dependency Treebank (Prokopidis et al. 2005)

http://gdt.ilsp.gr/
Greek (ancient) [grc] and Latin [la]: Ancient Greek and Latin Dependency Treebanks (Bamman and Crane 2011)

http://nlp.perseus.tufts.edu/syntax/treebank/greek.html,

http://nlp.perseus.tufts.edu/syntax/treebank/latin.html
Hindi [hi]: see Bengali
Hungarian [hu]: Szeged Treebank (Csendes et al. 2005)

http://www.inf.u-szeged.hu/projectdirs/hlt/index_en.html
Italian [it]: Italian Syntactic-Semantic Treebank/CoNLL 2007 (Montemagni et al. 2003)

http://medialab.di.unipi.it/isst/
Japanese [ja]: Verbmobil (Kawata and Bartels 2000)

http://www.sfs.uni-tuebingen.de/en/tuebajs.shtml
Latin [la]: see Greek (ancient)
Persian [fa]: Persian Dependency Treebank (Rasooli et al. 2011)

http://dadegan.ir/en/persiandependencytreebank
Portuguese [pt]: Floresta sintá(c)tica (Afonso et al. 2002)

http://www.linguateca.pt/floresta/info_floresta_English.html
Romanian [ro]: Romanian Dependency Treebank (Călăcean 2008)

http://www.phobos.ro/roric/texts/xml/
Russian [ru]: Syntagrus (Boguslavsky et al. 2000)

http://ruscorpora.ru/en/
Slovene [sl]: Slovene Dependency Treebank/CoNLL 2006 (Džeroski et al. 2006)

http://nl.ijs.si/sdt/
Spanish [es]: see Catalan
Swedish [sv]: Talbanken05 (Nilsson et al. 2005)

http://www.msi.vxu.se/users/nivre/research/Talbanken05.html
Tamil [ta]: TamilTB (Ramasamy and Žabokrtský 2012)

http://ufal.mff.cuni.cz/~ramasamy/tamiltb/0.1/
Telugu [te]: see Bengali
Turkish [tr]: METU-Sabanci Turkish Treebank (Atalay et al. 2003)

http://ii.metu.edu.tr/corpus/

Appendix 2: Examples of harmonization of dependency relations

See Tables 4 and 5.

Table 4 The Bengali treebank [bn] uses 42 dependency labels, but we show only the 12 most frequent ones

Table 5 The English treebank [en] (from CoNLL 2007) uses 20 dependency labels, but their mapping to HamleDT v1.5 labels is not straightforward

Appendix 3: List of dependency relation labels in figures

Language	Label	Description	Example
	X	Our meta-label that represents the unknown relation of the depicted subtree to its unshown parent
bg	comp	Complement, i.e. argument of non-verbal head, non-finite verbal head, copula	Figure 18
bg	indobj	Child is indirect object of parent	Figure 18
bg	mod	Child is modifier, e.g. of a noun phrase, or a negative particle modifying a verb etc.	Figure 18
bg	prepcomp	Child is noun phrase, parent is preposition	Figure 18
bg	subj	Child is subject of parent	Figure 18
bg	xcomp	Child is clausal complement; this includes complements of modal verbs	Figure 18
ca	CO	Child is coordinating conjunction, parent is the first conjunct	Figure 4
ca	CONJUNCT	Parent is the first conjunct, child is one of the other conjuncts	Figure 4
ca	PUNC	Child is punctuation symbol	Figure 4
cs, sl, la, ta	Adv	Child is adverbial modifier of parent	Figure 2
cs, sl, la, ta	Atr	Parent is noun, child is its attribute	Figure 9
cs, sl, la, ta	AuxC	Child is subordinating conjunction, parent is governing predicate. The relation of the subordinate clause to the parent is labeled at the grandchild	Figure 19
cs, sl, la, ta	AuxP	Child is preposition. The relation of the prepositional phrase to the parent is labeled at the grandchild	Figure 2
cs, sl, la, ta	AuxV	Child is auxiliary verb or negative particle, parent is content verb	Figure 19
cs, sl, la, ta	AuxX	Child is comma and does not serve as coordination root	Figure 2
cs, sl, la, ta	AuxZ	Emphasizing word	Figure 8
cs, sl, la, ta	Coord	Child serves as root of a coordinate structure	Figure 1
cs, sl, la, ta	Obj	Child is object of parent	Figure 2
cs, sl, la, ta	Pred	Child is predicate of a main clause	Figure 2
cs, sl, la, ta	Sb	Child is subject of parent	Figure 19
cs, ta	_M	Suffix to a label, saying that the child is a conjunct. The main label tags its relation to the parent of the coordinate structure	Figure 1
da	appr	Restrictive apposition (no comma)	Figure 28
da	conj	Child is conjunct, parent is first conjunct or coordinating conjunction	Figure 6
da	coord	Parent is conjunct, child is coordinating conjunction	Figure 6
da	dobj	Child is direct object of parent	Figure 28
da	expl	Child is expletive subject of parent	Figure 28
da	mod	Modifier, e.g. attribute of noun, adverbial modifier of verb, adjective attached to determiner etc.	Figure 28
da	nobj	Child is noun phrase or infinitive, parent is e.g. determiner, numeral, preposition etc.	Figure 28
da	pnct	Child is punctuation symbol	Figure 6
da	possd	Child is argument of possessive parent, i.e. child is the thing possessed	Figure 28
de	CD	Child is coordinating conjunction, parent is one conjunct and right sibling is the other conjunct	Figure 3
de	CJ	Parent and child are conjuncts	Figure 3
de	MO	Modifier. In NPs only focus particles are annotated as modifiers	Figure 23
de	NG	Child is negative particle, parent is negated verb	Figure 23
de	NK	Noun Kernel. Child attached within a noun phrase or a prepositional phrase	Figure 10
de	OA	Child is accusative object of parent	Figure 23
de	OC	Clausal object. Also verb tokens building a complex verbal form and modal constructions	Figure 23
de	PUNC	Child is punctuation symbol	Figure 3
de	SB	Child is subject of parent	Figure 23
es	atr	Attribute. E.g. child is adverbial/prepositional phrase, parent is verb	Figure 12
es	cd	Child is direct object of parent	Figure 12
es	conj	Child is subordinating conjunction	Figure 12
es	s.a	Child is adjectival phrase, parent is not verb	Figure 12
es	sn	Child is noun phrase. Parent may be e.g. preposition	Figure 12
es	spec	Specifier. E.g. child is determiner and parent is noun	Figure 12
es	suj	Child is subject of parent	Figure 12
fa	NPREMOD	Child is premodifier of parent noun	Figure 26
fa	NVE	Child is non-verbal element of compound verb. Parent is verbal element	Figure 26
fa	SBJ	Child is subject of parent	Figure 26
hi	lwg_cont	Child is additional node of a complex expression; child and parent together perform certain function	Figure 27
hi	lwg_psp	Child is postposition and modifies a noun	Figure 11
hi	lwg_vaux	Child is auxiliary verb, parent is content verb	Figure 27
hi	pof	Part of relation, e.g. part of conjunct verb	Figure 27
hi	pof_cn	Part of relation	Figure 27
hi, bn, te	adv	Child is adverbial modifier (only adverbs of manner) of parent	Figure 29
hi, bn, te	ccof	Child is conjunct, parent is coordinating conjunction or comma	Figure 29
hi, bn, te	k1	Child is karta (doer/agent/subject) of parent predicate	Figure 27
hi, bn, te	k2	Child is karma (patient/object) of parent predicate	Figure 27
hi, bn, te	k7p	Child is deshadhikarana (location in space) of the parent predicate	Figure 30
hi, bn, te	k7t	Child is kaalaadhikarana (location in time) of the parent predicate	Figure 31
hi, bn, te	nmod	Parent is noun, child is its attribute	Figure 29
hi, bn, te	nmod_adj	Child is adjective and modifies a noun	Figure 11
hi, bn, te	r6	Shashthi (possessive). Child is possessor in genitive, parent is the possessed noun	Figure 30
hu	ATT	Attribute	Figure 15
hu	CONJ	Child is conjunction (coordinating or subordinating)	Figure 5
hu	DET	Child is determiner, parent is noun	Figure 15
hu	ILL	Child is verbal argument in illative case	Figure 15
hu	OBJ	Child is object of parent	Figure 15
hu	PUNCT	Child is punctuation symbol	Figure 5
hu	SUBJ	Child is subject of parent	Figure 15
it	cong_sub	Parent is subordinating conjunction	Figure 13
it	det	Child is determiner, parent is noun	Figure 13
it	modal	Child is modal (dovere, volere, potere) or aspectual (andare, venire, stare) verb, parent is content verb	Figure 13
it	pred	Parent is verb (often it is copula), child is predicative complement (nominal predicate)	Figure 13
it	sogg	Child is subject of parent	Figure 13
ja	ADJ	Child is adjunct of parent	Figure 25
ja	COMP	Complement, e.g. verb attached to another verb form, noun attached to postposition etc.	Figure 25
ja	SBJ	Child is subject of parent	Figure 25
nl	det	Child is determiner, parent is noun	Figure 21
nl	mod	Child is adverbial modifier (bijwoordelijke bepaling) of parent	Figure 21
nl	obj1	Child is direct object; this includes nouns attached to prepositions!	Figure 21
nl	predm	Child determines state (adverbial modifier), parent is predicate	Figure 22
nl	su	Child is subject of parent	Figure 21
nl	vc	Verbal complement. Example: parent is modal, child is infinitive	Figure 21
pt	>N	Child is left dependent of nominal core	Figure 24
pt	ADVL	Child is adverbial adjunct (adjunto adverbial) of parent	Figure 24
pt	MV	Child is main verb, parent may be e.g. modal verb	Figure 24
pt	N<	Child is right dependent of nominal core	Figure 24
pt	P<	Child is right dependent of preposition	Figure 24
pt	PRT-AUX<	Child is verbal particle (partícula de ligação verbal), e.g. between modal and content verb, parent would be modal	Figure 24
pt	PUNC	Child is punctuation symbol	Figure 24
pt	SC	Child is nominal predicate (predicativo do sujeito), parent is copula	Figure 24
pt	SUBJ	Child is subject of parent	Figure 24
ro	rel.conj.	Parent is coordinating conjunction, child is conjunct	Figure 7
ru		Child is argument other than subject. Also: genitive noun modifier of another noun	Figure 17
ru		Child is agent-object of passive parent	Figure 17
ru		Parent is noun, child is its attribute	Figure 17
ru		Child is passive participle, parent is finite auxiliary verb	Figure 17
ru		Parent is predicate, child is subject	Figure 17
ta	AComp	Child is (obligatory) adverbial complement of parent	Figure 8
tr	OBJECT	Child is object of parent	Figure 16
tr	QUESTION.PARTICLE	Child is question particle, parent is verb	Figure 16
tr	SUBJECT	Child is subject of parent	Figure 16
tr	VOCATIVE	Child is vocative noun phrase serving as doer (actor) of parent verb	Figure 16
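
The label descriptions above suggest how a simple label harmonization step could look. The following is a minimal sketch, not the authors' conversion procedure and not the mapping given in Tables 4 and 5. The dictionary ILLUSTRATIVE_MAP and the functions split_conjunct_suffix and harmonize are hypothetical names introduced here for illustration. The sketch only pairs source labels with PDT-style labels whose descriptions coincide in the table above (for example, k1 and Sb are both described as the subject), and it shows how the _M suffix can be separated from the main label.

```python
from typing import NamedTuple, Optional


class Label(NamedTuple):
    """A dependency label split into its main part and a conjunct flag."""
    base: str
    is_conjunct: bool


# Hypothetical mapping for illustration only. It pairs (language, source label)
# with a PDT-style label that has a matching description in the table above.
# It is NOT the mapping from Tables 4 and 5 of the article.
ILLUSTRATIVE_MAP = {
    ("hi", "k1"): "Sb",     # karta (doer/agent/subject)
    ("hi", "k2"): "Obj",    # karma (patient/object)
    ("hi", "nmod"): "Atr",  # parent is noun, child is its attribute
    ("hu", "SUBJ"): "Sb",
    ("hu", "OBJ"): "Obj",
    ("da", "dobj"): "Obj",
}


def split_conjunct_suffix(label: str) -> Label:
    """Separate the _M suffix (child is a conjunct) from the main label."""
    if label.endswith("_M"):
        return Label(label[:-2], True)
    return Label(label, False)


def harmonize(lang: str, source_label: str) -> Optional[str]:
    """Look up a source label; return None when the sketch has no entry."""
    return ILLUSTRATIVE_MAP.get((lang, source_label))


if __name__ == "__main__":
    print(split_conjunct_suffix("Obj_M"))
    print(harmonize("hi", "k1"))
    print(harmonize("tr", "VOCATIVE"))
```

A plain lookup like this covers only one-to-one correspondences. The caption of Table 5 notes that the mapping of the English labels is not straightforward, so a realistic conversion would also need to inspect the tree structure and part-of-speech information.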

About this article

Cite this article

Zeman, D., Dušek, O., Mareček, D. et al. HamleDT: Harmonized multi-language dependency treebank. Lang Resources & Evaluation 48, 601–637 (2014). https://doi.org/10.1007/s10579-014-9275-2

Published: 26 August 2014
Issue Date: December 2014
DOI: https://doi.org/10.1007/s10579-014-9275-2
