The aim of the present study is twofold: (1) to assess the degree of register flexibility in advanced second language (L2) learners of English and (2) to determine whether and to what extent this flexibility is impacted by inter-individual variability in experiential factors and personality traits. Register flexibility is quantitatively measured as the degree of differentiation in the use of linguistic complexity – gauged by a range of lexical, syntactic, and information-theoretic complexity measures – across three writing tasks. At the methodological level, we aim to demonstrate how a corpus-based approach combined with natural language processing (NLP) techniques and a within-subjects design can be a valuable complement to experimental approaches to language adaptation.
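The abstract does not spell out how "degree of differentiation" is computed. As a minimal sketch, assuming a single crude complexity proxy (mean sentence length in words) and using the standard deviation of a learner's per-task scores as a hypothetical flexibility index, the idea could be illustrated as follows. The function names, task labels and toy texts are illustrative assumptions, not the study's own measures or data.

```python
import re
import statistics


def mean_sentence_length(text: str) -> float:
    """Crude syntactic-complexity proxy: mean number of words per sentence.

    Sentences are split naively on '.', '!' and '?'; a real pipeline would
    rely on an NLP sentence segmenter and parser instead.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    return sum(len(s.split()) for s in sentences) / len(sentences)


def register_differentiation(texts_by_task: dict[str, str]) -> float:
    """Hypothetical flexibility index: spread of one complexity score across tasks.

    Returns the population standard deviation of the per-task scores, so 0.0
    means the learner wrote with identical complexity in every task.
    """
    scores = [mean_sentence_length(text) for text in texts_by_task.values()]
    return statistics.pstdev(scores)


# Toy texts for three writing tasks (invented for illustration).
learner = {
    "informal_email": "Hi Tom. Thanks a lot. See you soon.",
    "narrative": "We left early in the morning. The road was empty and quiet.",
    "argumentative_essay": (
        "Although the evidence remains limited, most researchers agree that "
        "early exposure to a second language has lasting benefits."
    ),
}

print(round(register_differentiation(learner), 2))
```

A learner who keeps the same complexity across all three tasks scores 0.0; larger values indicate stronger task-dependent differentiation. The study itself combines many lexical, syntactic and information-theoretic measures rather than one.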
In a recent study, Wulff & Gries (2011) put forward the constructionist definition of accuracy in L2 production as the selection of a construction in its preferred context within a particular target variety and genre. Focusing on the use of concessive adverbial clauses in L2 academic writing, the current study takes up this definition of accuracy and explores whether, and to what extent, the 'genre-specific construction' (i.e. the genre-specific repository of symbolic form-function alignments) of advanced German learners of academic English is similar to or different from that of native expert academic writers of English. To this end, all instances of concessive adverbial clauses were extracted from a 216,418-word learner corpus and coded for the various factors proposed in the literature. For comparison purposes, a data set of all relevant data points was distilled from a native expert corpus of the same size and annotated in terms of the same factors...
A substantial body of research has demonstrated that both native and non-native speakers are sensitive to the statistics of multiword sequences (MWS). However, this research has predominantly focused on demonstrating that a given sample of participants shows evidence of learning the statistical properties of MWS. Recent theoretical approaches to language learning and processing emphasize the importance of moving away from group-level analyses towards analyses that account for individual differences (IDs). Here, through a within-subject design embedded within an IDs framework, we investigate whether and to what extent individual variability in the online processing of MWS is associated with an individual's statistical learning (SL) ability. Second language learners were administered a battery of SL tasks in the visual and auditory modalities, using verbal and non-verbal stimuli with adjacent and non-adjacent contingencies, along with two online processing tasks of MWS designed ...
The aim of the paper is twofold: (1) to automatically predict the ratings assigned by viewers to 14 categories available for TED talks in a multi-label classification task and (2) to determine what types of features drive classification accuracy for each of the categories. The focus is on features of language usage from five groups pertaining to syntactic complexity, lexical richness, register-based n-gram measures, information-theoretic measures and LIWC-style measures. We show that a Recurrent Neural Network classifier trained exclusively on within-text distributions of such features can reach relatively high levels of overall accuracy (69%) across the 14 categories. We find that features from two groups are strong predictors of the affective ratings across all categories and that there are distinct patterns of language usage for each rating category.
Considerable variability has been observed in sentence comprehension abilities in both native speakers and second language (L2) learner populations. While it has long been established that, in native speakers, this variability is closely linked to individual differences (IDs) in verbal working memory (vWM), previous research on the role of vWM in L2 sentence comprehension has produced mixed results. Moreover, there is an accumulating body of evidence demonstrating that the relationship between vWM and native language comprehension abilities is mediated by language experience. However, to our knowledge, no attempt has yet been made to integrate language experience measures into the vWM-L2 sentence comprehension relationship. The goal of the present study is twofold: (1) to determine whether and to what extent vWM – as gauged by a reading span (RSPAN) task – correlates with IDs in two proxy estimates of L2 experience and (2) to investigate the effects of vWM on L2 comprehension...
Prior research has identified a large number of processing-related factors that affect language users’ choices between two (or more) competing linguistic variants. However, one factor that up until now has not been taken into account is the reliance on formulaic language. This is surprising given a growing body of evidence demonstrating that recurrent multiword units – formulaic sequences and low-scope constructional patterns – play a key role in language learning and processing. Building on insights from research on formulaic language and usage-based constructionist models of language, we take a first step towards determining whether and to what extent formulaicity affects choices language users make in encoding the preverbal message. Focusing on clause ordering in English complex sentences with adverbial clauses (ACs), we investigate whether the choice between a preposed and a postposed AC is affected by the degree of formulaicity of the clause, as measured by (1) the amount of fo...
Although Bilingual First Language Acquisition research has increased considerably over the past few decades, there is still much controversy regarding the rate of development, i.e. the question of whether bilinguals lag behind their monolingual peers in various aspects of language. Some studies have found similar rates of development, whereas others have found that bilingual children lag behind their monolingual peers. The current study contributes to this discussion of (dis)similar rates of development by investigating bilingual children's acquisition of German complex sentence constructions involving adverbial clauses (ACs). Our findings are consistent with usage-based approaches to language acquisition, which predict that bilingual acquisition should proceed more slowly because learners have, on average, less exposure to each language.
One of the most fundamental goals in linguistic theory is to understand the nature of linguistic knowledge, that is, the representations and mechanisms that figure in a cognitively plausible model of human language processing. The past 50 years have witnessed the development and refinement of various theories about what kind of ‘stuff’ human knowledge of language consists of, and technological advances now permit the development of increasingly sophisticated computational models implementing key assumptions of different theories from both rationalist and empiricist perspectives. The present special issue does not aim to rehearse the arguments for and against the two epistemological stances or to review the evidence supporting either of them (cf. Bod, Hay, & Jannedy, 2003; Christiansen & Chater, 2008; Hauser, Chomsky, & Fitch, 2002; Oaksford & Chater, 2007; O’Donnell, Hauser, & Fitch, 2005). Rather, the research presented in this issue, which we label usage-based here, conceives of linguistic knowledge as being induced from experience. Articles in the special issue can be found at: doi:10.1177/0023830913484896, doi:10.1177/0023830913484891, doi:10.1177/0023830913484901, doi:10.1177/0023830913484902, doi:10.1177/0023830913484897, doi:10.1177/0023830913484899, and [one article pending]
English permits adverbial subordinate clauses to be placed either before or after their associated main clause. Previous research has shown that the positioning is conditioned by various factors from the domains of semantics, discourse pragmatics and language processing. With the exception of Diessel (2008), these factors have never been investigated in concert, which makes it difficult to understand their relative importance. Diessel's study, however, discusses only temporal constructions and identifies iconicity of sequence as the strongest predictor of clause position. Since this explanation is, in principle, unavailable for other types of subordinate clauses, the generalizability of Diessel's findings is somewhat limited. The present study offers a multifactorial analysis of 2,000 concessive constructions from the written part of the BNC and assesses the variable importance of six factors for the ordering choice, showing that semantic and discourse-pragmatic factors are much stronger predictors of clause position than processing-based, weight-related ones. On a methodological note, the study proposes that random forests using conditional inference trees constitute the preferred tool for the general type of problem investigated here.
In the present study we investigate the use and function of prefabricated chunks in academic writing by focusing on what we term "research predicates", i.e. high-frequency lexical items designating the research process and its key stages. We conducted a manual analysis of these predicates in the academic subcomponent of the British National Corpus and extracted a set of partially lexically filled constructions. Adopting a usage-based constructionist approach and examining its suitability for studying prefabricated chunks in the register of academic writing, we show that research predicates are part of more complex, partially substantive constructions which commonly occur in academic texts and have acquired a more or less formulaic status. The function of these constructions is to mirror the key phases of an idealized research process.
Two within-subject studies were conducted in which detailed discourse analysis techniques were used to analyze 27 nursing students' comprehension of texts typically found in their anticipated workplace (Montreal-area hospitals), in both their first and second languages. The results of these studies support the argument that using second-language production to assess comprehension tends to underestimate and distort second-language comprehension, at least at intermediate levels of second-language proficiency. The studies also suggest that the lack of inferencing or higher-level processing that has been attributed to second-language comprehension may be a result of the assessment techniques used. Except when second-language production was employed, the same factors appeared to affect recall and inference in both languages. The studies also stress the importance of text content and structure for text comprehension in either the first or the second language.
In a recent paper, Biber and Gray (2010) provide empirical evidence for the dramatic increase in compressed structures in English academic writing over the last 100 years. According to their corpus findings, the grammatical complexity of academic writing is phrasal rather than clausal in character, the corollary of which is a compressed rather than elaborated discourse style (the latter being typical of spoken registers). Given this finding, the question arises to what extent the traditional view of information structure as a single partition of information within a given utterance adequately accounts for genre-specific information packaging strategies. To answer this question, the current study explores and compares information structuring within what will be referred to here as ‘compression strategies’, namely the use of adverbial subordinate clauses, -ING constructions, and complex NP constructions, across two different genres: th...
Two key communicative competencies are the ability to maintain fluency in monologic speech and the ability to produce sophisticated language to argue a position convincingly. In this paper we aim to predict TED talk-style affective ratings in a crowdsourced dataset of argumentative speech consisting of 7 hours of speech from 110 individuals. The speech samples were elicited through task prompts relating to three debating topics. The samples received a total of 2,211 ratings from 737 human raters pertaining to 14 affective categories. We present an effective approach to the classification task of predicting these categories by fine-tuning a model pre-trained on a large dataset of TED talk public speeches. We use a combination of fluency features derived from a state-of-the-art automatic speech recognition system and a large set of human-interpretable linguistic features obtained from an automatic text analysis system. Classification accuracy was greater than 60% for all 14 ...
The purpose of this paper is twofold: (1) to introduce what is, to our knowledge, the largest available resource of keystroke logging (KSL) data generated by Etherpad (https://etherpad.org/), an open-source, web-based collaborative real-time editor, capturing the dynamics of second language (L2) production, and (2) to relate the behavioral data from KSL to indices of syntactic and lexical complexity of the texts produced, obtained from a tool that implements a sliding-window approach capturing the progression of complexity within a text. We present the procedures and measures developed to analyze a sample of 14,913,009 keystrokes in 3,454 texts (95,354 sentences and 1,832,027 words) produced by 512 university students (upper-intermediate to advanced L2 learners of English). These procedures aim to achieve a better alignment between keystroke-logging measures and underlying cognitive processes, on the one hand, and L2 writing performance measures, on the other. The resource introduced in this paper ...
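The abstract does not list the specific keystroke measures used. Inter-keystroke intervals and pauses above a time threshold are common starting points in KSL research, so a minimal sketch of such measures is given here, assuming a simplified record format (a timestamp in milliseconds plus the key pressed) rather than Etherpad's actual log schema. The `Keystroke` class and the 2,000 ms threshold are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Keystroke:
    """One logged key event (assumed record format, not Etherpad's own schema)."""
    timestamp_ms: int
    key: str


def inter_key_intervals(log: list[Keystroke]) -> list[int]:
    """Time in milliseconds between each pair of consecutive keystrokes."""
    return [b.timestamp_ms - a.timestamp_ms for a, b in zip(log, log[1:])]


def pause_profile(log: list[Keystroke], threshold_ms: int = 2000) -> dict[str, float]:
    """Summarize pausing behaviour in one writing session.

    An interval counts as a pause when it is at least `threshold_ms` long;
    the 2,000 ms default is a frequently used cut-off, adopted here only
    as an assumption.
    """
    intervals = inter_key_intervals(log)
    if not intervals:
        raise ValueError("need at least two keystrokes")
    pauses = [iki for iki in intervals if iki >= threshold_ms]
    return {
        "n_pauses": len(pauses),
        "total_pause_ms": sum(pauses),
        "mean_iki_ms": sum(intervals) / len(intervals),
    }


# Toy session: two long hesitations while typing "The ca".
session = [Keystroke(t, k) for t, k in
           [(0, "T"), (150, "h"), (320, "e"), (2900, " "), (3050, "c"), (6100, "a")]]
print(pause_profile(session))
```

Per-text summaries like these could then be aligned with the complexity scores of the same text, which is the kind of linkage the paper describes.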
A substantial body of research has demonstrated that children and adults (both native and non-native speakers) are sensitive to the statistics of multiword sequences (MWS) and rely on knowledge of such statistics to facilitate their language processing and boost their acquisition. However, this research was primarily aimed at determining whether and to what extent speakers can develop sensitivity to MWS statistics of a single type of linguistic input: that of spoken language. Recently, there has been a growing awareness of the key role of written input in the development of linguistic knowledge, as it provides a source of substantial change in the statistics of an individual’s language experience. The present study reports on a series of experiments designed to determine whether second language learners of English are able to develop sensitivity to distributional statistics of MWS inherent in different (register-specific) input types.
We present a novel approach to the automatic assessment of text complexity based on a sliding-window technique that tracks the distribution of complexity within a text. This distribution is captured by what we term “complexity contours”, derived from a series of measurements for a given linguistic complexity measure. This approach is implemented in an automatic computational tool, CoCoGen – Complexity Contour Generator, which in its current version supports 32 indices of linguistic complexity. The goal of the paper is twofold: (1) to introduce the design of our computational tool based on a sliding-window technique and (2) to showcase this approach in the area of second language (L2) learning, more specifically L2 writing.
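CoCoGen's internals are not described in this abstract. The sketch below only illustrates the general sliding-window idea under stated assumptions: the window is a hypothetical run of three consecutive sentences moved forward one sentence at a time, and the complexity index is a stand-in (mean sentence length in words), not one of the tool's 32 indices.

```python
from typing import Callable, Sequence


def complexity_contour(
    sentences: Sequence[str],
    measure: Callable[[Sequence[str]], float],
    window_size: int = 3,
    step: int = 1,
) -> list[float]:
    """Apply `measure` to successive windows of consecutive sentences.

    Texts no longer than one window yield a single score for the whole text.
    """
    if window_size < 1 or step < 1:
        raise ValueError("window_size and step must be positive")
    if len(sentences) <= window_size:
        return [measure(sentences)]
    return [
        measure(sentences[start:start + window_size])
        for start in range(0, len(sentences) - window_size + 1, step)
    ]


def mean_words_per_sentence(window: Sequence[str]) -> float:
    """Stand-in complexity index: average sentence length in words."""
    return sum(len(s.split()) for s in window) / len(window)


# Five toy sentences of 2, 4, 6, 8 and 10 words, i.e. steadily rising complexity.
text = [" ".join(["word"] * n) for n in (2, 4, 6, 8, 10)]
contour = complexity_contour(text, mean_words_per_sentence)
global_score = mean_words_per_sentence(text)
print(contour, global_score)
```

Here the contour rises from 4.0 to 8.0 while the text-level average is a single 6.0, which is the kind of within-text progression that a global summary statistic hides.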
A central question in bilingual child language acquisition research concerns the effect of dual language exposure on the rate of language development. Previous research has produced conflicting evidence: while some studies have reported similar acquisition rates, other studies have found that bilingual children lag behind their monolingual peers in their vocabulary and grammatical development. The goal of the present study was to contribute to this ongoing debate by investigating acquisition rates in bilingual and monolingual children in single-language comparisons. Fifty German kindergarten children aged 4 to 6 years participated in the study: 25 German-speaking monolingual children (mean age 63 months [SD=7.5 months]) and 25 bilingually developing children who acquired German in combination with another language (mean age 64 months [SD=8.3 months]). We compared acquisition rates of three global measures (MLU, vocabulary size and vocabulary growth rate) and five local measures...
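The abstract names MLU and vocabulary size among its global measures without giving the exact counting rules. A minimal sketch, assuming whitespace tokenization and MLU counted in words rather than morphemes (both simplifications), might look like this; vocabulary growth rate is left out because its operationalization is not stated. The German toy utterances are invented.

```python
import string


def tokenize(utterance: str) -> list[str]:
    """Lower-case whitespace tokenization with surrounding punctuation stripped."""
    tokens = (w.strip(string.punctuation).lower() for w in utterance.split())
    return [t for t in tokens if t]


def mlu_in_words(utterances: list[str]) -> float:
    """Mean length of utterance counted in words (MLU is often counted in morphemes)."""
    lengths = [len(tokenize(u)) for u in utterances]
    lengths = [n for n in lengths if n > 0]  # ignore empty utterances
    if not lengths:
        raise ValueError("no non-empty utterances")
    return sum(lengths) / len(lengths)


def vocabulary_size(utterances: list[str]) -> int:
    """Number of distinct word types (no lemmatization, so 'ball' and 'bälle' differ)."""
    return len({t for u in utterances for t in tokenize(u)})


sample = ["Ball!", "Mama Ball.", "Der rote Ball."]
print(mlu_in_words(sample), vocabulary_size(sample))
```

Comparing such scores across age points, rather than at a single point, is what turns them into acquisition-rate measures.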
Automatically predicting the level of second language (L2) learner proficiency is an emerging research topic within machine learning approaches to language learning and development. Central to the present paper is the combined use of what we refer to as ‘complexity contours’, a series of measurements of indices of L2 proficiency obtained by a computational tool that implements a sliding-window technique, and recurrent neural network (RNN) classifiers that adequately capture the sequential information in those contours. We used the EF-Cambridge Open Language Database (Geertzen et al. 2013) with its labelled Common European Framework of Reference (CEFR) levels (Council of Europe 2018) to predict six classes of L2 proficiency levels (A1, A2, B1, B2, C1, C2) in the assessment of writing skills. Our experiments demonstrate that an RNN classifier trained on complexity contours achieves higher classification accuracy than one trained on text-average complexity scores. In ...
In recent years, there has been increased interest in the combined use of natural language processing techniques and machine learning algorithms to automatically classify texts on the basis of a wide range of features. One class of features that has been successfully employed for a wide range of classification tasks, including native language identification, readability assessment and text genre categorization, pertains to the construct of ‘linguistic complexity’. This paper presents a novel approach to the use of linguistic complexity features in text categorization: rather than representing text complexity ‘globally’ in terms of summary statistics, this approach assesses text complexity ‘locally’ and captures the progression of complexity within a text as a sequence of complexity scores, generating what is referred to here as ‘complexity contours’. We demonstrate the utility of the approach in an automatic text classification task for five genres – academic, newspaper, fiction...
A large body of research over the past two decades has demonstrated that children and adults are equipped with statistical learning mechanisms that facilitate their language processing and boost their acquisition. However, this research has been conducted primarily using artificial languages that are highly simplified relative to real language input. Here, we aimed to determine to what extent adult native and non-native speakers show sensitivity to real-life language statistics obtained from large-scale analyses of authentic language use. Through a within-subject design, we conducted a series of behavioral experiments geared towards assessing the sensitivity to two types of distributional statistics (frequency and entropy) during online processing of multiword sequences across four registers of English (spoken, fiction, news and academic language). Our results show that both native and non-native speakers are able to ‘tune to’ multiple distributional statistics inherent in diffe...
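The abstract does not define how entropy was operationalized. One common information-theoretic reading, used here purely as an assumption, is the Shannon entropy of the distribution of words that follow a given sequence in a register-specific corpus. A minimal sketch of that measure, alongside raw sequence frequency, is shown below; the toy corpus is invented.

```python
import math
from collections import Counter


def sequence_frequency(tokens: list[str], sequence: tuple[str, ...]) -> int:
    """Raw count of a multiword sequence in a tokenized corpus."""
    n = len(sequence)
    return sum(1 for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == sequence)


def continuation_entropy(tokens: list[str], context: tuple[str, ...]) -> float:
    """Shannon entropy (in bits) of the words that follow `context`.

    Low entropy: the continuation is highly predictable; high entropy: many
    different words follow the context with similar probability.
    """
    n = len(context)
    followers = Counter(
        tokens[i + n]
        for i in range(len(tokens) - n)
        if tuple(tokens[i:i + n]) == context
    )
    total = sum(followers.values())
    if total == 0:
        raise ValueError("context does not occur with a continuation")
    return -sum((c / total) * math.log2(c / total) for c in followers.values())


corpus = "in the end in the middle in the end in the house".split()
print(sequence_frequency(corpus, ("in", "the", "end")))
print(continuation_entropy(corpus, ("in", "the")))
```

In this toy corpus "in the" is followed by "end" twice and by "middle" and "house" once each, so the entropy is 0.5·1 + 0.25·2 + 0.25·2 = 1.5 bits. Computing the same statistics separately on spoken, fiction, news and academic corpora would yield register-specific values.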
There is an accumulating body of evidence that knowledge of the statistics of multiword phrases (MWP) facilitates native language learning and processing in both children and adults. However, less is known about whether adult second language (L2) learners are able to develop native-like sensitivity to the statistics of MWP, and, more importantly, no attempt has so far been made to determine to what extent variation in this ability is related to individual differences (IDs) in cognitive and affective factors. The aim of the present study, which used a within-subject design embedded in an individual-differences framework, was twofold: (1) to replicate the MWP frequency effect reported in Hernandez et al. [1] with another sample of L2 learners, and (2) to determine to what extent variation in online L2 processing of MWP can be accounted for by two ID measures, working memory (WM) and personality. We replicated the frequency effect for MWP from the high-frequency band but not for MWP from the low-frequency band. Our findings revealed that while there was no main effect of WM, the Openness personality trait had a statistically significant impact on online L2 processing of MWP. However, subsequent analyses yielded three significant two-way interactions, indicating that the relationship between WM, personality and online L2 processing of MWP is a complex one. Taken together, our findings underscore the importance of further investigating L2 learners’ ability to keep track of and build up knowledge of the statistics of MWP, as well as the interrelationships between the cognitive and affective IDs that impact the development of this ability.
A growing field of research has made use of a semiartificial language paradigm to investigate the role of awareness in L2 acquisition. A central and empirically still unresolved issue in this field concerns the possibility of learning implicitly, that is, without intention to learn and without awareness of what has been learned. Up until now, studies on implicit learning have mainly been conducted in laboratory settings under highly controlled conditions with university students as participants. The present study investigated whether and to what extent the results obtained in such settings can be extrapolated to the general population. Building on Williams (2005), we designed two crowdsourcing experiments that examined the learning of novel form-meaning mappings under incidental conditions in 163 participants. Our design allowed us to disentangle the effects of awareness at the level of noticing and understanding. The results of the two experiments demonstrated the implicit learning...
One of the main assumptions of usage-based constructionist approaches is that linguistic knowledge is best conceived of as a repository of constructions, which emerge from experience with language and whose strength of mental representation (entrenchment) is a function of their usage frequency. On the basis of a multistep statistical procedure geared to identify patterns of adverbial clause constructions in two distinct registers, we argue that a model of language that generalizes over situational contexts is implausible. Instead, a more adequate model of linguistic knowledge comprises a set of subrepositories that are adapted to the discourse-functional needs of situational contexts, in which constructions have register-specific entrenchment values.
Usage-based (UB) accounts conceive of language learning as continuous, locally contingent construction learning, i.e., a lifelong process of developing and honing the repertoire of constructional patterns geared to the optimization of a language user’s communicative ability across a wide range of language domains. The continuous nature of the process entails that a full UB model needs to account for not only the dynamics of language learning at early stages of acquisition, but also the functionally motivated adaptations of the language system at more advanced levels of proficiency. We present a design based on naturalistic second language (L2) written productions that sets out to reconstruct the states of constructional knowledge of advanced L2 learners through the statistical analysis of their productions. Irrespective of theoretical framing, the study provides foundational data relevant for any property theory of language learning, i.e., any theory that is concerned with t...