For every discrete probability distribution, there is one and only one partial summation that leaves the distribution unchanged. This invariance property is reconsidered for distributions with one parameter. We show that if the parameter value in the function defining the summation is changed, two families of distributions emerge. The first consists of distributions that are resistant to the change: for these, the change of the parameter is cancelled by the normalization constant, and the distributions remain unchanged. The second consists of distributions that are sensitive to the change: partial summations with the changed parameter value applied to them result in new distributions with two parameters. A necessary and sufficient condition for a distribution to be resistant to the parameter change is presented.
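A minimal sketch of the invariance and of the parameter change, assuming the weighted form $P_k = c\sum_{j\ge k} g(j)P^*_j$ for a partial summation. The weight function `g`, the helper names, the truncation point and the parameter values are illustrative assumptions, not taken from the paper. For a finite-support distribution the invariant weights follow from a telescoping sum, $g(k) = (P_k - P_{k+1})/P_k$; the same formula gives $g_\lambda(k) = 1 - \lambda/(k+1)$ for the Poisson distribution and the constant $1-q$ for the geometric distribution $(1-q)q^k$.

```python
from math import comb, exp

def partial_sum(p, g):
    """Apply P_k = c * sum_{j>=k} g(j) * p[j] on the (truncated) support and normalize."""
    n = len(p)
    raw = [sum(g(j) * p[j] for j in range(k, n)) for k in range(n)]
    total = sum(raw)  # equals 1/c
    return [r / total for r in raw]

def invariant_weights(p):
    """g(k) = (p[k] - p[k+1]) / p[k] with p[n] := 0; assumes every p[k] > 0."""
    n = len(p)
    return lambda k: (p[k] - (p[k + 1] if k + 1 < n else 0.0)) / p[k]

def max_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

# 1) Invariance: binomial(5, 0.3) is reproduced by its own weights
#    (some weights are negative here because the binomial is not decreasing).
binom = [comb(5, k) * 0.3**k * 0.7**(5 - k) for k in range(6)]
invariant = max_diff(binom, partial_sum(binom, invariant_weights(binom)))

N = 200  # assumed truncation point for infinite supports; tails are negligible here

# 2) Resistant: geometric with q = 0.5, weights rebuilt with q' = 0.8 (constant 1 - q').
geom = [0.5 * 0.5**k for k in range(N)]
resistant = max_diff(geom, partial_sum(geom, lambda k: 1 - 0.8))

# 3) Sensitive: Poisson(2) with weights 1 - mu/(k+1) rebuilt with mu = 1.
pois = [exp(-2.0)]
for k in range(1, N):
    pois.append(pois[-1] * 2.0 / k)
sensitive = max_diff(pois, partial_sum(pois, lambda k: 1 - 1.0 / (k + 1)))

print(f"{invariant:.1e} {resistant:.1e} {sensitive:.3f}")
```

In the geometric case the changed constant weight cancels in the normalization, so the distribution is unchanged; for the Poisson distribution the changed weights produce a different distribution that depends on both λ and μ.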
Bivariate partial-sums discrete probability distributions are defined. The question of the existence of a limit distribution for iterated partial summations is solved for finite-support bivariate distributions that satisfy the conditions under which the power method (known from matrix theory) can be applied. An oscillating sequence of distributions, a phenomenon not previously reported, is presented.
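A minimal sketch of iterated bivariate partial summations handled by the power method. It assumes the natural two-dimensional analogue $P(i,j) = c\sum_{k\ge i}\sum_{l\ge j} g(k,l)P^*(k,l)$ on a finite support; this definition, the weights `g` and the function names are illustrative assumptions and may differ from the paper's. Ordering the cells lexicographically makes the operator upper triangular with the weights on its diagonal, so its eigenvalues are the weights themselves; with one strictly largest weight, the normalized iterates typically converge to the corresponding eigenvector.

```python
def bivariate_step(p, g):
    """One assumed bivariate partial summation on an (m x n) grid, normalized."""
    m, n = len(p), len(p[0])
    raw = [[sum(g[k][l] * p[k][l] for k in range(i, m) for l in range(j, n))
            for j in range(n)] for i in range(m)]
    total = sum(map(sum, raw))  # equals 1/c
    return [[x / total for x in row] for row in raw]

def iterate(p, g, steps=100):
    """Power method: repeat the normalized summation a fixed number of times."""
    for _ in range(steps):
        p = bivariate_step(p, g)
    return p

# Hypothetical weights on the support {0,1} x {0,1}; g[1][1] = 2 is the unique maximum.
g = [[1.0, 1.0],
     [1.0, 2.0]]
start = [[0.25, 0.25],
         [0.25, 0.25]]
limit = iterate(start, g)
for row in limit:
    print(["%.6f" % x for x in row])
```

Solving the eigenvector equations by hand for these weights gives the limit [[6, 2], [2, 1]] / 11. The power-method argument relies on a unique dominant eigenvalue, so other weight patterns need separate treatment.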
We derive a mathematically based method for switching from continuous to discrete linguistic models and back. Several examples are presented. A general algorithmic approach is suggested.
The paper questions the use of the Pearson chi-square goodness-of-fit test for discrete models in linguistics. It is argued that stochastic independence, one of the necessary conditions for a correct application of the test, is not realistic for linguistic data. Several alternatives (computational and empirical approaches) are suggested, and their advantages and drawbacks are discussed.
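A short sketch of the Pearson statistic $X^2 = \sum_k (O_k - E_k)^2/E_k$ and of a quantity often reported in quantitative linguistics instead of a p-value, the discrepancy coefficient $C = X^2/N$. C is not necessarily among the alternatives proposed in the paper, and the counts below are hypothetical. The sketch shows a related practical issue: for fixed relative frequencies $X^2$ grows linearly with the sample size N, so with large samples even small discrepancies lead to rejection, while C stays constant.

```python
def pearson_x2(observed, expected_probs):
    """Pearson chi-square statistic for observed counts and model probabilities."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_probs))

def discrepancy_c(observed, expected_probs):
    """Discrepancy coefficient C = X^2 / N."""
    return pearson_x2(observed, expected_probs) / sum(observed)

model = [0.4, 0.3, 0.2, 0.1]       # hypothetical fitted probabilities
small = [45, 28, 18, 9]            # hypothetical counts, N = 100
large = [10 * o for o in small]    # same relative frequencies, N = 1000

for counts in (small, large):
    print(sum(counts), round(pearson_x2(counts, model), 4),
          round(discrepancy_c(counts, model), 5))
```

Note that neither quantity addresses the independence problem itself; both still treat the counts as independent observations.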
According to the Menzerath-Altmann law, there is an inverse relationship between the sizes of language units and the sizes of their constituents (i.e., longer language units are composed of shorter constituents, and vice versa). The validity of the law has been confirmed many times for the relation between the length of a word and the lengths of its syllables. However, the relation between sentence length (measured in clauses) and clause length (measured in words) is problematic. In this paper, a new language unit, the linear dependency segment, is introduced in order to avoid some of the problems connected with the Menzerath-Altmann law at the syntactic level. The new unit is intermediate between the clause and the word, and its definition takes into account both the linearity of language and the dependency syntactic structure. It is shown that the relation between sentence length in clauses and clause length in linear dependency segments abides by the Menzerath-Altmann law in two Czech dependency treebanks.
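One common form of the Menzerath-Altmann law is $y = a x^{b} e^{-cx}$, where x is the size of the construct (e.g., a sentence in clauses) and y the mean size of its constituents (e.g., clause length in linear dependency segments); the simplified form $y = a x^{b}$ with $b < 0$ is also widely used. A minimal sketch of fitting the simplified form by least squares on a log-log scale, which may differ from the paper's fitting procedure. The data are synthetic, generated from assumed parameters a = 5 and b = -0.2, so the fit should recover them.

```python
from math import exp, log

def fit_power_law(xs, ys):
    """Least-squares fit of log y = log a + b * log x; returns (a, b)."""
    lx = [log(x) for x in xs]
    ly = [log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = exp(my - b * mx)
    return a, b

# Synthetic data from assumed parameters a = 5, b = -0.2 (construct sizes 1..8).
xs = list(range(1, 9))
ys = [5 * x ** -0.2 for x in xs]
a, b = fit_power_law(xs, ys)
print(round(a, 6), round(b, 6))
```

Fitting on the log scale is the simplest option; nonlinear least squares on the original scale weights the observations differently and is also common.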
According to the Menzerath-Altmann law, there is a relation between the size of the whole and the mean size of its parts. The validity of the law has been demonstrated for relations between several language units; e.g., the longer a word, the shorter the syllables it consists of. In this paper it is shown that the law also holds in the syntactic dependency structure of Czech. In particular, longer clauses tend to be composed of shorter phrases (the size of a phrase is measured by the number of words it consists of).
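Testing the law requires, for each construct size x (clause length in phrases), the mean size y of its constituents (phrase length in words). A minimal sketch of that aggregation; each clause is represented, as an assumption for illustration, by the list of its phrase lengths, and the toy clauses are hypothetical.

```python
from collections import defaultdict

def menzerath_table(clauses):
    """Map clause length (number of phrases) -> mean phrase length in words."""
    words = defaultdict(int)
    phrases = defaultdict(int)
    for phrase_lengths in clauses:
        if not phrase_lengths:      # skip empty clauses to avoid division by zero
            continue
        x = len(phrase_lengths)
        words[x] += sum(phrase_lengths)
        phrases[x] += x
    return {x: words[x] / phrases[x] for x in sorted(words)}

# Hypothetical clauses, each given as the lengths (in words) of its phrases.
clauses = [[3, 2], [1, 4], [2, 1, 1], [1, 1, 2, 1]]
print(menzerath_table(clauses))
```

Because all clauses in a group have the same number of phrases, pooling the phrases gives the same value as averaging the per-clause means. The resulting (x, y) pairs are what a Menzerath-Altmann curve is then fitted to.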
Let random variables $X^*$ and $X$ have discrete distributions on the nonnegative integers, and let $P\{X=k\} = c\sum_{j=k}^{\infty} P\{X^*=j\}$, $k = 0, 1, 2, \dots$, with $c$ a proper constant. Repeated summations of this type are investigated. The limit distribution is geometric for a wide class of parent distributions.
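Summing over k gives $\sum_{k\ge 0}\sum_{j\ge k}P\{X^*=j\} = \sum_{j\ge 0}(j+1)P\{X^*=j\} = 1 + E X^*$, so the proper constant is $c = 1/(1 + E X^*)$. A minimal numerical check of this and of the fact that a geometric distribution $P\{X=k\} = (1-q)q^k$ is left unchanged by the summation, as any limit of repeated summations must be. Infinite supports are truncated at an assumed point N, and the Poisson parent with mean 2 is only an illustrative choice.

```python
from math import exp

N = 200  # assumed truncation point; the neglected tails are negligible here

def summation(p):
    """P{X=k} = c * sum_{j>=k} P{X*=j}, with c = 1 / (1 + E X*)."""
    mean = sum(j * pj for j, pj in enumerate(p))
    c = 1.0 / (1.0 + mean)
    tails, acc = [0.0] * len(p), 0.0
    for k in range(len(p) - 1, -1, -1):   # suffix sums in O(N)
        acc += p[k]
        tails[k] = acc
    return [c * t for t in tails]

poisson = [exp(-2.0)]
for k in range(1, N):
    poisson.append(poisson[-1] * 2.0 / k)
q = 0.6
geometric = [(1 - q) * q**k for k in range(N)]

once = summation(poisson)
print(round(sum(once), 12))  # already normalized by c, no further rescaling
geo_change = max(abs(a - b) for a, b in zip(summation(geometric), geometric))
pois_change = max(abs(a - b) for a, b in zip(once, poisson))
print(f"{geo_change:.1e} {pois_change:.3f}")
```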
We investigate the grapheme-phoneme relation in Ukrainian and some properties of the Ukrainian version of the Cyrillic alphabet.
The relationship between frequency and declension of nouns in Czech is analyzed. The nominative is taken as the basic form of nouns. We define the measure of morphological change as the number of phonetic changes in the stem plus the number of phonetic changes in the declensional suffix. Two approaches were examined: 1) the nominative singular as the basic form of a noun regardless of its grammatical number, and 2) the nominative singular as the basic form of a noun in the singular and the nominative plural as the basic form of a noun in the plural. In both cases, the relation “the lesser the change, the higher the frequency” is observed.
In this paper, we analyze the distribution of complexities in the Vai script, an indigenous syllabic writing system from Liberia. The uniformity hypothesis for complexities is found to fail for this script. Models using the Poisson distribution for the number of components and the hyper-Poisson distribution for connections provide good fits for the Vai script.
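For reference, a minimal sketch of the two models. The hyper-Poisson distribution is usually written as $P_x = a^x / (b^{(x)}\,{}_1F_1(1; b; a))$, $x = 0, 1, 2, \dots$, with the rising factorial $b^{(x)} = b(b+1)\cdots(b+x-1)$; for $b = 1$ it reduces to the Poisson distribution with parameter a. The parameter values are arbitrary, the hypergeometric normalizer is approximated by summing its series up to an assumed cut-off, and the paper may use shifted (displaced) variants that this sketch does not cover.

```python
from math import exp, factorial

def hyper_poisson_pmf(a, b, max_x=100):
    """Hyper-Poisson probabilities for x = 0..max_x, normalized by the truncated 1F1(1; b; a)."""
    terms, term = [], 1.0
    for x in range(max_x + 1):
        terms.append(term)          # term = a**x / b^(x), with b^(0) = 1
        term *= a / (b + x)         # next rising-factorial step
    norm = sum(terms)               # truncated series for 1F1(1; b; a)
    return [t / norm for t in terms]

def poisson_pmf(lam, max_x=100):
    return [exp(-lam) * lam**x / factorial(x) for x in range(max_x + 1)]

hp = hyper_poisson_pmf(1.5, 2.5)
check = max(abs(u - v) for u, v in zip(hyper_poisson_pmf(1.5, 1.0), poisson_pmf(1.5)))
print([round(p, 4) for p in hp[:6]], f"{check:.1e}")
```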
This paper investigates the relation between the number of full valency frames of a verb (we do not distinguish between complements and optional adjuncts; both are taken into account) and the number of its synonyms. It is shown that for Czech verbs from the Prague Dependency Treebank, the relation “the greater the full valency of a verb, the more synonyms the verb has” holds.
The problem of iterated partial summations is solved for some discrete distributions defined on finite supports. The power method, usually used as a computational approach to finding matrix eigenvalues and eigenvectors, is in some cases an effective tool for proving the existence of the limit distribution, which is then expressed as the solution of a system of linear equations. Some examples are presented.
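A minimal sketch of the idea, assuming iterated weighted partial summations $P^{(n+1)}_k = c_n\sum_{j\ge k} g(j)P^{(n)}_j$ on a finite support {0, ..., n}; the weights are hypothetical. One step multiplies the probability vector by an upper-triangular matrix T with T[k][j] = g(j) for j >= k, whose eigenvalues are the weights. If the weights are nonnegative, one weight g(m) is strictly largest, and the starting distribution puts positive probability on m or above, the normalized iterates (the power method) converge to the eigenvector of g(m), i.e. the solution of $(T - g(m)I)v = 0$ whose components sum to 1.

```python
def step(p, g):
    """One partial summation P_k = c * sum_{j>=k} g[j] * p[j], normalized."""
    n = len(p)
    raw = [sum(g[j] * p[j] for j in range(k, n)) for k in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

def power_method(p, g, iterations=200):
    for _ in range(iterations):
        p = step(p, g)
    return p

def limit_by_linear_system(g):
    """Solve (T - lam I) v = 0 by back substitution, lam = max(g) (assumed unique)."""
    lam = max(g)
    m = g.index(lam)
    v = [0.0] * len(g)          # components above m vanish
    v[m] = 1.0
    for k in range(m - 1, -1, -1):
        v[k] = sum(g[j] * v[j] for j in range(k + 1, m + 1)) / (lam - g[k])
    s = sum(v)
    return [x / s for x in v]

g = [1.0, 2.0, 3.0, 1.5]           # hypothetical weights on the support {0, 1, 2, 3}
start = [0.25, 0.25, 0.25, 0.25]
print([round(x, 6) for x in power_method(start, g)])
print([round(x, 6) for x in limit_by_linear_system(g)])
```

Both lines describe the same limit, (9, 6, 2, 0) / 17; the linear system gives it directly, while the power method shows that the iterates actually reach it.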
Simonetta Montemagni, Joakim Nivre (Eds.): Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), 2017
According to the Menzerath-Altmann law, there is a relation between the size of the whole and the mean size of its parts. The validity of the law has been demonstrated for relations between several language units; e.g., the longer a word, the shorter the syllables it consists of. In this paper it is shown that the law also holds in the syntactic dependency structure of Czech. In particular, longer clauses tend to be composed of shorter phrases (the size of a phrase is measured by the number of words it consists of).
In Tackling the Toolkit, we focus on the methodological innovations, challenges, obstacles and even shortcomings associated with applying quantitative methods to poetry specifically and poetics more broadly. Using tools including natural language processing, web ontologies, similarity detection devices and machine learning, our contributors explore not only metres, stanzas, stresses and rhythms but also genres, subgenres, lexical material and cognitive processes. Whether they are testing old theories and laws, making complex concepts machine-readable or developing new lines of textual analysis, their works challenge standard descriptions of norms and variations.
Papers by Jan Macutek