Papers by Andreas Baumann

Most diachronic studies on both lexico-semantic change and political language usage are based on individual or comparable corpora. In this paper, we explore ways of studying the stability (and changeability) of lexical usage in political discourse across two corpora which are substantially different in structure and size. We present a case study focusing on lexical items associated with political parties in two diachronic corpora of Austrian German, namely a diachronic media corpus (AMC) and a corpus of parliamentary records (ParlAT), and measure the cross-temporal stability of lexical usage over a period of 20 years. We conduct three sets of comparative analyses investigating a) the stability of sets of lexical items associated with the three major political parties over time, b) lexical similarity between parties, and c) the similarity between the lexical choices in parliamentary speeches by members of the parties vis-à-vis the media’s reporting on the parties. We employ time ser...
Linguistics Vanguard
Lexical dispersion and acquisition are evidently linked to each other. In one direction, the acquisition of a word is promoted by it being used frequently and in diverse contexts. Conversely, words that are acquired early might have higher chances of being produced frequently and diversely. In this study, we analyze various measures of lexical dispersion and assess the extent to which they are linked to age of acquisition by means of a Bayesian network model. We find that lexical prevalence, that is, the fraction of individuals knowing a word, is most closely linked to acquisition and argue that this can be partially explained by the population dynamics of lexical spread. We also highlight related cognitive mechanisms in language acquisition.

We investigate the evolutionary dynamics of life-history traits in structured populations, under the assumption of discrete population structure. The model family analyzed in this thesis consists of deterministic models in continuous time, that is, we consider systems of ordinary differential equations. Life cycles are characterized by demographic parameters and regulatory functions, which account for density dependence. We employ the framework of Adaptive Dynamics in order to conduct the evolutionary analysis of the model family, thereby investigating phenotypic evolution. An algebraically simple fitness proxy is derived, which allows us to predict long-term evolutionary outcomes based on the configuration of loops, evolving and regulated demographic parameters in a life cycle. We derive a list of sufficient conditions on the structure of life cycles for frequency-independent selection. More precisely, we provide a range of explicit optimization criteria. If for a given life cycle the...

Language acquisition theory offers a variety of formal models, adequate to a greater or lesser extent, that describe the learner's search for a target grammar on the basis of the linguistic input the learner receives. These models vary in their underlying linguistic frameworks as well as in their underlying approaches to the problem of language acquisition as such. This thesis focusses on principles and parameters theory (more precisely, Government and Binding, GB) as linguistic framework and triggers, that is, input sentences that are ungrammatical in a learner's hypothetical grammar, as basic language acquisition concept. The modeling of the search for the target grammar was described by Gibson and Wexler in an algorithmic (and, in particular, formal) way, and is thus referred to as the Triggering Learning Algorithm (TLA). Once the TLA is confronted with language learning situations based on few elementary parameters of GB-theory, pro...

Proceedings of the 12th International Conference on the Evolution of Language (Evolang12), 2018
The effect of population size on linguistic stability and evolution has been investigated in different linguistic domains. The relationship among these factors, however, is not always clear. In this paper, we study a basic population-dynamical model of linguistic spread, derive measures of linguistic stability and fitness, and investigate the effect of population size on these measures. By allowing for stochasticity in the learning process of linguistic constituents, it is shown that a constituent's stability and fitness increase with population size, but that high variability in the learning environment may cause constituent loss, even in large populations. The respective roles of learning and usability are also discussed. This paper is distributed under a Creative Commons CC-BY-ND license.
Papers in Historical Phonology, 2018
Machine learning is a powerful method when working with large data sets such as diachronic corpora. However, as opposed to standard techniques from inferential statistics like regression modeling, machine learning is less commonly used among phonological corpus linguists. This paper discusses three different machine learning techniques (k-nearest neighbors classifiers; Naïve Bayes classifiers; artificial neural networks) and how they can be applied to diachronic corpus data to address specific phonological questions. To illustrate the methodology, I investigate Middle English schwa deletion and when and how it potentially triggered reduction of final /mb/ clusters in English.

When we speak, we do not use sound sequences arbitrarily. For instance, no English word ends in the sequence /mb/, while many words end in /ks/ (like in box), yet we do not find any words that begin with /ks/. The study of the rules and tendencies that determine which sound sequences are permitted, ruled out, or preferred in a language is called 'phonotactics'. Some of the constraints on the phonotactic setup of natural languages reflect weak biases in processing, whose effects accumulate when languages are transmitted in vast numbers of parallel and iterated acquisition and interaction processes, and as a consequence become visible in language history. Thus, history provides evidence not only of articulatory and auditory constraints on language production and perception, but also of cognitive constraints on the processing of language. In this dissertation project, I study how sound sequences like /mb/ and /ks/ replicate and spread through languages and populations of speake...

Consonant clusters are articulatorily and perceptually disadvantaged as opposed to consonant-vowel sequences (Dziubalska-Kołaczyk 2002). Nevertheless, they stably exist in a number of languages. This can be partially explained by the fact that consonant clusters created by morphological operations – ‘morphonotactic clusters’ – help in the identification of morphological complexity, as e.g. /nd/ in weaken-ed. However, this does not explain the diachronic stability of consonant clusters primarily appearing within morphemes – henceforth ‘lexical clusters’, e.g. /nd/ in week-end. While it has been proposed that lexical clusters benefit from the presence of morphonotactic clusters via analogy (Hogg and McCully 1987), the so-called Strong Morphonotactic Hypothesis (‘SMH’; Dressler et al. 2010) suggests the opposite: if a consonant cluster occurs across morpheme boundaries as well as within morphemes, this leads to semiotically less optimal configurations, thus weakening the cluster’s stabi...
This article presents a new method for testing hypotheses in diachronic linguistics. It consists in the construction of hypothetical language stages that reflect the immediate effects of changes which are supposed to have triggered therapeutic responses in the language because (some of) their outputs were sub-optimal with regard to universal or language-specific constraints. We demonstrate the method using the example of Middle English schwa loss and its effects in the domain of coda-cluster phonotactics. We show that schwa loss created codas that were phonologically dispreferred and created ambiguities in the phonotactic representation of morphological word structure. We also show that the problems brought about through schwa loss were (partly) resolved in the later development of the language.
Morpho-syntactic boundaries can either be signaled by alignment to boundaries in regular prosodic patterns or by being ‘irregularly’ misaligned, in which case they are often signaled instead through highly dispreferred, or marked, structures such as consonant clusters. In some languages these structures additionally appear in simple forms, which compromises their compositionality-signaling function. This paper models the dynamics of such structures in complex and simple forms by means of a Lotka-Volterra model, which is analyzed evolutionarily. Finally, the evolutionary dynamics of the model are tested against diachronic language data.

Cognitive Linguistics, 2021
This paper analyzes symmetric NPN constructions (e.g., day to day, face to face, step by step) qualitatively and quantitatively by examining data from the Corpus of Contemporary American English (Davies, Mark. 2008–. The Corpus of Contemporary American English (COCA): 570 million words, 1990–present. http://corpus.byu.edu/coca/). The constructions’ frequency and productivity, as well as their semantics and extension potential (i.e., modification, complementation), are investigated (e.g., by conducting collostructional analysis). In terms of theoretical modeling, the paper takes a Usage-based, Cognitive Construction Grammar approach (UCCxG) and sketches the constructional network of this constructional family, postulating various constructional templates on different levels of specificity – among others – the existence of the following subtypes [CNsg,time i after CNsg,time i]Cx (e.g., day after day, night after night), [CNsg,measurement i by CNsg,measurement i]Cx (e.g., inch by inch, s...
Language and Speech, 2020
Two prominent statistical laws in language and other complex systems are Zipf’s law and Heaps’ law. We investigate the extent to which these two laws apply to the linguistic domain of phonotactics, that is, to sequences of sounds. We analyze phonotactic sequences with different lengths within words and across word boundaries taken from a corpus of spoken English (Buckeye). We demonstrate that the expected relationship between the two scaling laws can only be attested when boundary-spanning phonotactic sequences are also taken into account. Furthermore, it is shown that Zipf’s law exhibits both high goodness-of-fit and a high scaling coefficient if sequences of more than two sounds are considered. Our results support the notion that phonotactic cognition employs information about boundary-spanning phonotactic sequences.

Language Sciences, 2018
Consonantal diphones differ as to their ambiguity (whether or not they indicate morphological complexity reliably by occurring exclusively either within or across morphemes) and lexicality (how frequently they occur within morphemes rather than across morpheme boundaries). This study empirically investigates the influence of ambiguity and lexicality on the processing speed of consonantal diphones in speech perception. More specifically, its goal is to test the predictions of the Strong Morphonotactic Hypothesis, which asserts that phonotactic processing is influenced by morphological structure, and to clarify the two conceptions thereof present in extant research. In two discrimination task experiments, it is found that the processing speed of cross-morpheme diphones decreases with their ambiguity, but there is no processing difference between primarily cross-morphemic and morpheme-internal diphones. We conclude that the predictions of the Strong Morphonotactic Hypothesis are borne out only partially, and we discuss the discrepancies. Highlights: ★ Ambiguity in signaling morphological complexity affects diphone processing ★ Speakers have probabilistic knowledge of how often diphone types span morpheme boundaries ★ Diphones that occur prototypically within morphemes are processed as fast as prototypically cross-morphemic diphones ★ Processing of cross-morphemic diphones is slow if they are ambiguous ★ Participants can be primed for analyzing diphones in nonce words as spanning a morpheme boundary
Phonology, 2017
This paper accounts for stress-pattern diversity in languages such as English, where words that are otherwise equivalent in terms of phonotactic structure and morphosyntactic category can take both initial and final stress, as seen in ˈlentil – hoˈtel, ˈenvoy – deˈgree, ˈresearchN – reˈsearchN and ˈaccessV – acˈcessV. Addressing the problem in general and abstract terms, we identify systematic conditions under which stress-pattern diversity becomes stable. We hypothesise that words adopt stress patterns that produce, on average, the best possible phrase-level rhythm. We model this hypothesis in evolutionary game theory, predict that stress-pattern diversity among polysyllabic word forms depends on the frequency of monosyllables and demonstrate how that prediction is met both in Present-Day English and in its history.

Language Dynamics and Change, 2018
This paper tries to narrow the gap between diachronic linguistics and research on population dynamics by presenting a mathematical model corroborating the notion that the cognitive mechanism of asymmetric priming can account for observable tendencies in language change. The asymmetric-priming hypothesis asserts that items with more substance are more likely to prime items with less substance than the reverse. Although these effects operate on a very short time scale (e.g. within an utterance) it has been argued that their long-term effect might be reductionist, unidirectional processes in language change. In this paper, we study a mathematical model of the interaction of linguistic items that differ in their formal substance, showing that, in addition to reductionist effects, asymmetric priming also results in diversification and stable coexistence of two formally related variants. The model will be applied to phenomena in the sublexical as well as the lexical domain.
Papers in Historical Phonology, 2016
Consonant clusters that rarely occur lexically (i.e. within morphemes) may function as complexity markers when they span a morpheme boundary, i.e. when they occur morphonotactically. In this study we observe patterns in the diachronic dynamics of Middle English which hint at mutually beneficial effects between morphonotactic and lexical clusters. We suggest that the patterns revealed can be explained by frequency-based analogy effects in language acquisition.
Yearbook of the Poznan Linguistic Meeting, 2016
Consonant clusters appear either lexically within morphemes or morphonotactically across morpheme boundaries. According to extant theories, their diachronic dynamics are determined by analogical effects on the one hand as well as by their morphological signaling function on the other hand. This paper presents a mathematical model which allows for an investigation of the interaction of these two forces and the resulting diachronic dynamics. The model is tested against synchronic and diachronic language data. It is shown that the evolutionary dynamics of the cluster inventory crucially depend on how the signaling function of morphonotactic clusters is compromised by the presence of lexical items containing their morpheme-internal counterparts.

Research in Language, 2016
Coalescent assimilation (CA), where alveolar obstruents /t, d, s, z/ in word-final position merge with word-initial /j/ to produce postalveolar /tʃ, dʒ, ʃ, ʒ/, is one of the most well-known connected speech processes in English. Due to its commonness, CA has been discussed in numerous textbook descriptions of English pronunciation, and yet, upon comparing them it is difficult to get a clear picture of what factors make its application likely. This paper aims to investigate the application of CA in American English to see a) what factors increase the likelihood of its application for each of the four alveolar obstruents, and b) what the allophonic realization of the plosives /t, d/ is if CA does not apply. To do so, the Buckeye Corpus (Pitt et al. 2007) of spoken American English is analyzed quantitatively. As a second step, these results are compared with Polish English; statistics analogous to the ones listed above for American English are gathered for Polish English based on the PL...

Cognition, 2018
Language acquisition and change are thought to be causally connected. We demonstrate a method for quantifying the strength of this connection in terms of the 'basic reproductive ratio' of linguistic constituents. It represents a standardized measure of reproductive success, which can be derived both from diachronic and from acquisition data. By analyzing phonotactic English data, we show that the results of both types of derivation correlate, so that phonotactic acquisition indeed predicts phonotactic change, and vice versa. After drawing that general conclusion, we discuss the role of utterance frequency and show that the latter exhibits destabilizing effects only on late acquired items, which belong to the phonotactic periphery. We conclude that, at least in the evolution of English phonotactics, acquisition serves conservation, while innovation is more likely to occur in adult speech and affects items that are less entrenched but comparably frequent.

Stellenbosch Papers in Linguistics Plus, 2018
The phonotactic system of Afrikaans underwent multiple changes in its diachronic development. While some consonant clusters were lost, others still surface in contemporary Afrikaans. In this paper, we investigate to what extent articulatory differences between the segments of a cluster contribute to its successful transmission. We proceed in two steps. First, we analyse the respective effects of differences in manner of articulation, place of articulation and voicing on the age at which a cluster is acquired by analysing Dutch acquisition data. Second, we investigate the role that these articulatory differences play in the diachronic frequency development from Dutch to Afrikaans. We demonstrate that large differences in manner of articulation between segments contribute to a cluster's success in acquisition and diachrony. In contrast, large differences in place of articulation have impeding effects, while voicing difference shows a more complicated behaviour.
methods such as natural language processing, machine learning, and network analysis in the fields of
digital humanities and linguistics by characterizing and modeling the diachronic dynamics of lexical
networks. The proposed analysis will be based on two corpora containing 20 years of data with
billions of tokens.
In this paper I investigate whether a similar relationship can be observed in the phonotactic domain. I conceptualize phonotactic items, i.e. sequences of sounds, as self-contained linguistic units (cf. Kuperman et al. 2008) that are acquired and transmitted within speaker populations, and evaluate their reproductive success. The study has two aims: On the empirical level I show that phonotactic items which are acquired early tend to exhibit higher diachronic growth rates, and vice versa. On a more general and methodological level it is illustrated how tools from mathematical epidemiology can be used to directly and rigorously link concepts from diachronic linguistics and acquisition research.
The main focus of the study is on investigating the diachronic growth and acquisition of English word-final consonant diphones (e.g. /kt/ in blocked). This subset of the English phonotactic system provides a reasonable testing ground for the addressed question, since word-final consonant diphones were relatively rare before 1150, and because sufficient English data from 1150 onwards is available. Various English corpora and databases (PPCME2, PPCEME, PPCMBE, COHA, COCA) are used to track the frequency development of all consonant diphones occurring word-finally in English word forms in the period from 1200 to 2012, thus covering the complete lifetime of a selection of phonotactic items. Phonological transcriptions were added manually (early periods) or by using the CELEX database (late periods). AoA ratings were extracted from Kuperman et al. (2012).
I use a modified version of Nowak et al.’s (2000) population-dynamical model of linguistic spread in order to estimate the basic reproductive ratio (R0) for each diphone, based on its diachronic growth rate. R0 is a standardized measure of reproductive success, defined as the average number of individuals that successfully learn a linguistic item from a proficient speaker (Solé 2011; Nowak 2000). If R0 is sufficiently large (>1), a linguistic item successfully spreads, otherwise it declines. Independently, by exploiting results from epidemiology (Dietz 1993), I directly estimate R0 from the AoA of a diphone.
After considering the entangled effect of frequency (cf. Pagel et al. 2007), I show that both estimates of R0 correlate. Thus, I provide a mechanistically derived link between AoA and diachronic stability. I discuss various cognitive, physiological and speaker-external factors that potentially determine phonotactic acquisition and change – such as (morphological) boundary signaling, utterance frequency, perception, articulation, and social network density – and explain in which way they contribute to reproductive success in the underlying model.
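The two routes to R0 described above can be illustrated with a minimal sketch. It assumes a simple logistic spread model, in which early growth at rate r under a speaker turnover rate mu implies R0 = 1 + r/mu, and the textbook epidemiological approximation R0 ≈ 1 + L/A for mean age of acquisition A and life expectancy L. Both formulas, and all function and parameter names, are assumptions made for illustration; the modified model used in the study may differ.

```python
def r0_from_growth_rate(growth_rate: float, turnover_rate: float) -> float:
    """R0 implied by an early exponential growth rate.

    Assumes dx/dt = beta*x*(1 - x) - mu*x with R0 = beta/mu, so that the
    growth rate at low prevalence is r = mu*(R0 - 1).
    """
    if turnover_rate <= 0:
        raise ValueError("turnover_rate must be positive")
    return 1.0 + growth_rate / turnover_rate


def r0_from_age_of_acquisition(mean_aoa: float, life_expectancy: float) -> float:
    """R0 approximated from the mean age of acquisition.

    Assumes the classical relation R0 = 1 + L/A (exponentially distributed
    lifetimes); early acquisition (small A) yields a large R0.
    """
    if mean_aoa <= 0 or life_expectancy <= 0:
        raise ValueError("mean_aoa and life_expectancy must be positive")
    return 1.0 + life_expectancy / mean_aoa


def spreads(r0: float) -> bool:
    """An item spreads in the population only if R0 exceeds 1."""
    return r0 > 1.0


if __name__ == "__main__":
    # Hypothetical values, not taken from the study.
    print(r0_from_growth_rate(growth_rate=0.004, turnover_rate=0.02))
    print(r0_from_age_of_acquisition(mean_aoa=5.0, life_expectancy=70.0))
```

Under these assumptions, comparing the two estimates across items amounts to correlating the outputs of the two functions, one fed with diachronic growth rates and the other with AoA ratings.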