Anne David

    For languages with significant inflectional morphology, development of a morphological parser is often a prerequisite to further computational linguistic capabilities. We focus on two difficulties for this development: the short lifetime of software such as parsing engines, and the difficulty of porting grammars to new parsing engines. We describe a methodology we have developed to promote portability, using a formal declarative grammar written in XML, which we supplement with a traditional descriptive grammar. The two grammars are combined into a single document using
    There are perhaps seven thousand languages in the world, ranging from the largest, with hundreds of millions of speakers, to the smallest, with one speaker. On a different axis, languages can be ranked according to the quantity and quality of computational resources. Not surprisingly, there are correlations between these two axes: languages like English and Mandarin have substantial resources, while many of the smallest languages are completely undocumented. Nevertheless, the correlation is not perfect; there are languages with a million speakers which are more or less unwritten, and there are very large languages – some of the languages of India, for example – which are relatively resource-poor. Unfortunately, what counts as resource-rich (or even resource-adequate) in computational linguistics is a moving target. For languages to move in the direction of resource richness, considerable effort (people and money) has to be provided over a prolonged period of time. One can sit back a...
    For languages with inflectional morphology, development of a morphological parser can be a bottleneck to further development. We focus on two difficulties: first, finding people with expertise in both computer programming and the linguistics of a particular language, and second, the short lifetime of software such as parsers. We describe a methodology to split parser building into two tasks: descriptive grammar development, and formal grammar development. The two grammars are combined into a single document using Literate Programming. The formal grammar is designed to be independent of a particular parsing engine’s programming language, so that it can be readily ported to a new parsing engine, thus helping solve the software lifetime problem.