Jan Macutek

Papers by Jan Macutek

On a Parametrization of Partial-Sums Discrete Probability Distributions

Mathematics

For every discrete probability distribution, there is one and only one partial summation which leaves the distribution unchanged. This invariance property is reconsidered for distributions with one parameter. We show that if we change the parameter value in the function which defines the summation, two families of distributions can be observed. The first of them consists of distributions which are resistant to the change. For these distributions, the change of the parameter is reversed by the normalization constant, and the distributions remain unchanged. The other contains distributions sensitive to the change. Partial summations with the changed parameter value applied to sensitive distributions result in new distributions with two parameters. A necessary and sufficient condition for a distribution to be resistant to the parameter change is presented.

Issues in Quantitative Linguistics 4

On the limit behaviour of finite-support bivariate discrete probability distributions under iterated partial summations

arXiv: Probability, 2019

Bivariate partial-sums discrete probability distributions are defined. The question of the existence of a limit distribution for iterated partial summations is solved for finite-support bivariate distributions which satisfy conditions under which the power method (known from matrix theory) can be used. An oscillating sequence of distributions, a phenomenon which has never been reported before, is presented.

Word Order of Reflexive Sě in Finite Verb Phrases in the First Edition of the Old Czech Bible Translation (Part I)

Studia Linguistica Universitatis Iagellonicae Cracoviensis, 2018

Ord's criterion in musical texts

Glottotheory, 2009

Word forms, style and typology

Glottotheory, 2010

The role of syntax in complex networks: Local and global importance of verbs in a syntactic dependency network

Physica A: Statistical Mechanics and its Applications, 2011

Discrete and continuous modelling in quantitative linguistics

Journal of Quantitative Linguistics, 2007

We derive a mathematically based method for switching from continuous to discrete lingui...

Evaluating goodness-of-fit of discrete distribution models in quantitative linguistics

Journal of Quantitative Linguistics, 2013

The paper questions the use of the Pearson chi-square goodness-of-fit test for discrete models in linguistics. It is argued that stochastic independence, one of the necessary conditions for a correct application of the test, is not realistic for linguistic data. Several alternative possibilities (computational and empirical approaches) are suggested. Advantages and drawbacks of the alternatives are discussed.

The Menzerath-Altmann law in syntactic structure revisited

According to the Menzerath-Altmann law, there is inverse proportionality between sizes of language units and their constituents (i.e., longer language units are composed of shorter constituents, and vice versa). The validity of the law was confirmed many times for the relation between lengths of a word and its syllables. However, the relation between lengths of sentences (measured in clauses) and clauses (measured in words) is problematic. In this paper, a new language unit – linear dependency segment – is introduced with the motivation to avoid some problems connected to the Menzerath-Altmann law on the syntactic level. The new unit is intermediate between clause and word and its definition takes into account both the linearity of language and dependency syntactic structure. It is shown that the relation between sentence length in clauses and clause length measured in linear dependency segments abides by the Menzerath-Altmann law in two Czech dependency treebanks.

Menzerath-Altmann Law in Syntactic Dependency Structure

According to the Menzerath-Altmann law, there is a relation between the size of the whole and the mean size of its parts. The validity of the law was demonstrated on relations between several language units, e.g., the longer a word, the shorter the syllables the word consists of. In this paper it is shown that the law is valid also in syntactic dependency structure in Czech. In particular, longer clauses tend to be composed of shorter phrases (the size of a phrase is measured by the number of words it consists of).

A Limit Property of the Geometric Distribution

Theory of Probability & Its Applications, 2006

Let random variables X*, X have discrete distributions on the nonnegative integers and ...

When is Menzerath-Altmann law mathematically trivial?

Some properties of the Ukrainian writing system

ArXiv, 2008

We investigate the grapheme-phoneme relation in Ukrainian and some properties of the Ukrainian ve...

Frequency and Declensional Morphology of Czech Nouns

The relationship between frequency and declension of nouns in Czech is analyzed. The nominative is taken as the basic form of nouns. We define the measure of morphological change as the number of phonetic changes in the stem plus the number of phonetic changes in the declensional suffix. Two approaches were examined: 1) nominative singular as the basic form of a noun regardless of its grammatical number, 2) nominative singular as the basic form of a noun in the singular and nominative plural as the basic form of a noun in the plural. In both cases, the relation “the lesser the change, the higher the frequency” is observed.

Distribution of complexities in the Vai script

ArXiv, 2009

In the paper, we analyze the distribution of complexities in the Vai script, an indigenous syllabic writing system from Liberia. It is found that the uniformity hypothesis for complexities fails for this script. The models using Poisson distribution for the number of components and hyper-Poisson distribution for connections provide good fits in the case of the Vai script.

On the relation between verb full valency and synonymy

This paper investigates the relation between the number of full valency frames of a verb (we do not distinguish between complements and optional adjuncts; both are taken into account) and the number of its synonyms. It is shown that for Czech verbs from the Prague Dependency Treebank, the relation “the greater the full valency of a verb, the more synonyms the verb has” holds.

Iterated partial summations applied to finite-support discrete distributions

Mathematica Slovaca

The problem of iterated partial summations is solved for some discrete distributions defined on finite supports. The power method, usually used as a computational approach to the problem of finding matrix eigenvalues and eigenvectors, is in some cases an effective tool to prove the existence of the limit distribution, which is then expressed as a solution of a system of linear equations. Some examples are presented.

Full valency and the position of enclitics in the Old Czech

Proceedings of the First Workshop on Quantitative Syntax (Quasy, SyntaxFest 2019)

Length of non-projective sentences: A pilot study using a Czech UD treebank

Proceedings of the First Workshop on Quantitative Syntax (Quasy, SyntaxFest 2019)

Menzerath-Altmann Law in Syntactic Dependency Structure

by Jan Macutek and Jiří Milička

Simonetta Montemagni, Joakim Nivre (Eds.): Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), 2017

According to the Menzerath-Altmann law, there is a relation between the size of the whole and the mean size of its parts. The validity of the law was demonstrated on relations between several language units, e.g., the longer a word, the shorter the syllables the word consists of. In this paper it is shown that the law is valid also in syntactic dependency structure in Czech. In particular, longer clauses tend to be composed of shorter phrases (the size of a phrase is measured by the number of words it consists of).

Tackling the Toolkit: Plotting Poetry through Computational Literary Studies

by Petr Plecháč, Robert Kolár, Anne-Sophie Bories, Jakub Říha, Jan Macutek, Helena Bermúdez Sabel, Laura Hernández-Lorenzo, Mirella De Sisto, Szilvia Maróthy, Levente Selaf, and Anastasia Belousova

In Tackling the Toolkit, we focus on the methodological innovations, challenges, obstacles and even shortcomings associated with applying quantitative methods to poetry specifically and poetics more broadly. Using tools including natural language processing, web ontologies, similarity detection devices and machine learning, our contributors explore not only metres, stanzas, stresses and rhythms but also genres, subgenres, lexical material and cognitive processes. Whether they are testing old theories and laws, making complex concepts machine-readable or developing new lines of textual analysis, their works challenge standard descriptions of norms and variations.