Adjective-Noun Collocations in L2 Speech

This document discusses research on adjective + noun collocations in second language speech. It begins by introducing phraseological knowledge as an important component of linguistic competence. The aim is to investigate adjective + noun collocations in the Trinity Lancaster Corpus of L2 speech and compare it to collocations in native British English speech. The overarching research question asks about the nature of adjective + noun collocations in L2 speech. Previous research has found both overuse and underuse of collocations by L2 learners compared to native speakers, and that L1 background and proficiency level can influence collocation use. The relationship between collocations and factors like frequency, exclusivity, and phrasal diversity is complex.

7

Adjective + Noun Collocations in L2 and L1 Speech: Evidence from the
Trinity Lancaster Corpus and the Spoken BNC2014

Vaclav Brezina and Lorrae Fox

1 Introduction
Phraseological knowledge represents an important component of
linguistic competence (e.g. Bestgen & Granger, 2014; Gablasova et al.,
2017; Howarth, 1998). It is an essential part of natural-sounding
language production and successful communication. A simple look
at the short excerpt below (1) from the Trinity Lancaster Corpus
(Gablasova et al., 2019) reveals the omnipresence of collocations,
chunks, idiomatic sequences, phrases, etc. in L2 speech. These are
underlined in example (1).

(1) before the starting I er like to er outline the principle er or main points
in my er topic er in the first place traditional classrooms talking
about traditional classrooms if are they the best places or the <only>
places to learn and to teach in? er in the second place traditional
methodologies are they certainly or best tools and at the end of this
presentation I’d like to er to talk you about effective teachers how to
recognise them and how to be one of them [TLC, 2_SP_30].

Let us focus on the last underlined expression: effective teachers. This
is an adjective + noun combination, one of the most frequent
syntactic structures in English. The appropriate use of the adjective
effective in combination with teacher demonstrates phraseological
knowledge on the part of the L2 speaker, which is suitable for a semi-formal
spoken presentation on an educational topic. Arguably, using this
combination points to a higher level of phraseological sophistication
than simply saying a good or excellent teacher. Phraseology is thus often
a matter of preference (in a given context), choice and co-selection of
words.
The aim of this chapter is to investigate adjective + noun
collocations in the Trinity Lancaster Corpus Sample (2.7 million words)
and compare these with adjective + noun collocations in L1 speech
sampled in the Spoken British National Corpus 2014 (10 million words).
The study compares the use of adjective + noun collocations across
three proficiency levels, B1 – B2 – C1/C2, of the Common European
Framework of Reference (CEFR) and also pays attention to the L1
background of L2 speakers. The chapter discusses different aspects
of the collocational relationship and implications of these aspects
for second language acquisition (SLA). We formulated the following
overarching research question to guide this study: What is the nature of
adjective + noun collocations in L2 speech?

2 Adjective + Noun Collocations in Context

Although researchers differ in their approaches to and terminology
on phraseology (Paquot & Granger, 2012), there is general agreement
about the importance of formulaic structures. They contribute to
idiomatic language use, processing and acquisition as well as to overall
language fluency, in both L1 and L2 (e.g. Martinez & Schmitt, 2012;
Paquot, 2018; Wray, 2013). Phraseology is also important from the
pedagogical perspective. Research has found that the use of phraseology
can increase perceived proficiency of language learners. Boers et al.
(2006) established that L2 speakers who used formulaic sequences in
speech were thought to be more fluent in their language production
by blind judges. Similar findings have been reported for writing
(Ohlrogge, 2009). Collocations are one form of formulaic structure
and there are two broad approaches to investigating them: frequency-
based (statistical collocations) and traditional (restricted collocations;
Granger, 2018). As suggested by Granger (2018: 231), this chapter
takes a combined approach to the study, looking for a specific syntactic
structure, namely the adjective + noun combination and measuring the
statistical association; this helps maintain the focus of the study and the
consistency of the target features identified.
Phraseology poses challenges for L2 learners. Hasselgren (1994)
noted overuse of ‘lexical teddy bears’, phrasal units learned early and
then heavily relied upon as L2 speakers advance through the learning
process. L2 speakers’ use of phraseology has also been compared with
L1 speakers. When investigating intensifier + adjective collocations,
Granger (1998) discovered both overuse and underuse of specific
collocations, which may be linked to various degrees of salience of
154 Part 3: The Learner Phrasicon: Developmental Approaches

word combinations and L2 speakers’ L1 background. In a longitudinal
case study on L2 phraseological development, Li and Schmitt (2009: 85)
noted heavy reliance on ‘a limited range of phrases, sometimes to the
point where judges considered the usage non-nativelike’. Investigating
verb-noun collocations in written English of native Hebrew speakers,
Laufer and Waldman (2011) found fewer collocations at all proficiency
levels than in native speaker data. Although the number of collocations
increased at an advanced level, errors were still present despite the
learners engaging in communicative teaching techniques. Overuse of
adjectival collocates with certain types of nouns can also be seen in a
corpus of L1 Taiwanese written English when compared with the British
National Corpus (Shih, 2000). Learners tended to overuse the adjective
‘big’, especially in combination with abstract nouns (45% of cases),
a feature which appeared only in 30% of L1 uses. Finally, Choi (2019)
found evidence of both underuse of delexical collocations and overuse of
high-frequency verbs in L1 Korean written English when compared to an
L1 corpus and noted possible influences from overgeneralisation and L1
transfer.
L1 background has been reported to play a role in L2 phraseological
use, with cross-linguistic differences occurring in various types of
construction, including verb + noun collocations (Gyllstad & Snoder,
Chapter 3 of this volume). Looking specifically at adjective + noun
collocations, Wolter and Gyllstad (2013) found that when the L1 and L2
are typologically and lexically similar, there is evidence that L1 influences
L2 collocational processing. Taking a contrastive phraseological
approach, Maurer-Stroh (2005) found that errors such as collocational
mismatches and false friends in German and English equivalent
adjective + noun collocations could be explained by the hypothesis of
transferability (Bahns, 1993). Similarly, Takač and Lukač (2013) suggested
that indications of collocation usage in Croatian students’ essays could
reflect transfer from their L1, particularly considering general adjective
overuse and overreliance on phraseological units. Conversely, Bahardoust
and Moeini (2012) did not find written collocations being influenced by
EFL learners’ L1.
Considering L2 proficiency, Li and Schmitt (2010) investigated the
development of adjective + noun collocations used in academic texts
by four advanced English learners using case studies. They employed a
statistical approach to longitudinally track collocational proficiency
development and found little change in the usage of adjective + noun
collocations over one academic year. This finding is also supported
by a recent longitudinal learner corpus study of adjective + noun
combinations in L2 Italian (Siyanova-Chanturia & Spina, 2019), which
did not find a straightforward effect of L2 proficiency. Instead, as the
authors argue: ‘following extended exposure to the L2, as learners
become more able language users and as they acquire more extensive
(single word) vocabularies, they also tend to experiment more with word
combinatorial mechanisms and, as a result, produce less idiomatic,
less nativelike word sequences’ (2019: 32). Similarly, Omidian et al.
(Chapter 8 of this volume) show the complex nature of phraseological
development. The authors used linear mixed-effects model analysis to
study verb + noun collocations within three dimensions in L2 Italian:
exclusivity, phrase frequency and phrasal diversity. In particular, they
noted aspects of non-linearity in the phrasal diversity and exclusivity
dimensions when considering language proficiency; A1 level (beginner)
learners were the only group demonstrating a longitudinal increase in
collocation diversity while B1 level (intermediate) learners produced
more strongly associated collocations when exploring exclusivity, with
random individual variation also being found.
In contrast, other studies have discovered some effect of L2
proficiency on the use of adjective + noun collocations. Granger and
Bestgen (2014) identified differences between intermediate and advanced
learners’ use of collocations (n-grams) in English academic writing,
including adjective + noun combinations. The study confirmed the L2
proficiency hypothesis, which predicts ‘a smaller proportion of lower-
frequency, strongly-associated, collocations [attested by MI] and a larger
proportion of high-frequency collocations [attested by t-score] in the
intermediate learner texts than in the advanced learner texts’ (Granger &
Bestgen, 2014: 238). Further, Sonbul (2015) investigated frequency effects
on collocational processing in an experimental study, when L1 and L2
speakers were exposed to synonymous adjective + noun collocations.
The study found that proficiency affected the off-line sensitivity
measure, where participants rated written English sentences based on
typicality. Both L1 and L2 speakers were found to show some sensitivity
to collocational frequency in the on-line condition measured by
eye-tracking, demonstrating possible similarities in language processing
shared by the two groups.
The results of previous studies thus offer a complex and often
contradictory picture of the development of L2 phraseological competence
and the relationship between L1 and L2 phraseological use. It is
important to realise that phraseological competence is often evaluated,
especially in pedagogical contexts, within the broader framework of
native and non-native language use. Although there are many commonalities
between L1 and L2 speakers, L1 speaker performance is often
perceived as the target or baseline for L2. Llurda (2016) explained that
this has been fuelled mostly by Western approaches to teaching and
learning languages, as to speak ‘native-like’ has historically been the
goal. Previously, researchers found that even highly proficient language
learners may never achieve this (Long, 1990), while others established
that some learners can attain language norms typical for native
speakers (Birdsong, 1992). The term ‘native speaker’ is also not without
controversy: the concept proposes a dichotomy of native speaker and
non-native speaker rather than considering a continuum of learning.
Others have argued that ‘native speaker’ is, in fact, a social construct
(Seargeant, 2013), which leads to an issue of how a native speaker should
be defined (Davies, 2003). Is this to be based on actual language use or
learners’ linguistic background? Instead, Cook (2007) argued that the
proficient L2 speaker should be the target in L2 language teaching. With
this in mind, care must be taken when undertaking research to learn
more about both the L1 and L2 speakers’ language use. Corpus methods
provide the advantage of using real-life data to describe linguistic
patterns and to compare similarities and differences. The comparison
itself should be evaluatively neutral to avoid inferring the ‘native speaker’
as ideal or the learner as deficient.
Current research into L2 use of collocations often employs corpus
techniques to compare language across variables that mirror SLA
interests such as proficiency levels and L1 background. Using large
amounts of data, as in a corpus, allows researchers to identify patterns
of use that would not otherwise be found. Many corpus-based studies
have involved investigating written collocations, and this is partly
driven by the availability of corpora, with written data being generally
much easier to compile. However, written and spoken language differ
fundamentally in aspects such as grammar (Leech, 2000). Likewise,
collocations are used differently, with Biber et al. (1999: 993ff) observing
that speech is more phraseological than written language. This is
perhaps due to the greater cognitive strain that speaking demands in
terms of language processing and memory (Wray, 2000). Essentially,
we rely on phrases in speech to make communication easier for the
interlocutors and this is why it is of interest to study collocations within
this mode. This chapter aims to shed light on the use of adjective + noun
collocations within the under-researched register of spoken interactional
English.

3 Data and Methodology

3.1 Data
In the study, two corpora of spoken English were used. The Trinity
Lancaster Corpus Sample represents L2 speech in semi-formal settings
(during an exam eliciting both academic language and personal
interaction), while the Spoken British National Corpus 2014 is a sample
of current British informal L1 conversations. It is important to note
that the nature of the L2 interactions is somewhat different from the L1
interactions, due to the difference in the settings. The two corpora are
thus not exact counterparts and caution needs to be exercised when
interpreting the sources of the differences between these two datasets.

Table 7.1 Number of L2 speakers (and word tokens) across proficiency levels, L1s
and cultural backgrounds in the Trinity Lancaster Corpus Sample – conversation and
discussion tasks
L1/cultural background   Proficiency level according to CEFR
                         B1              B2              C1/C2
China                    80 (63,503)     70 (65,522)     30 (35,548)
India                    100 (102,668)   100 (117,019)   30 (50,074)
Italy                    100 (79,781)    100 (102,390)   60 (75,415)
Mexico                   100 (81,488)    70 (71,648)     60 (71,861)
Russia                   40 (22,145)     28 (32,526)     –
Spain                    100 (74,256)    100 (94,702)    60 (64,672)
Total                    520 (423,841)   468 (483,807)   240 (297,570)

Let us look at the corpora in more detail. The Trinity Lancaster
Corpus Sample (TLC Sample) is a balanced 2.7-million-word sample
taken from the Trinity Lancaster Corpus (TLC; Gablasova et al., 2019).
It is compiled from transcribed recordings of Trinity College London’s
Graded Examination in Spoken English (GESE). The corpus has been
annotated with the CLAWS tagger1 (C6 tagset).
The distribution of L2 speakers in the TLC Sample across three
proficiency levels of the Common European Framework of Reference
(CEFR) (Council of Europe, 2001), and L1 and cultural backgrounds is
displayed in Table 7.1. More details about the corpus (nature of tasks
included, transcription conventions, etc.) can be found in Gablasova
et al. (2019).
For this research, only the conversation and discussion tasks were
considered in the analysis as these two tasks occur across all proficiency
levels; this was important to ensure that the effect of task is kept
constant. Both the conversation and the discussion are dialogic tasks
between an L2 and an L1 speaker of English. A conversation task elicits
a dialogue on a topic of general interest such as ‘Society and living
standards’, ‘Personal values and ideals’ and ‘National environmental
concerns’, while the discussion task focuses on a topic selected by the
L2 speakers according to their interests (for a detailed description of the
tasks see Gablasova et al., 2019: 142ff). Both these tasks thus offer a large
variety of topics, which are fairly randomly distributed across proficiency
levels and L1 backgrounds; the topic is thus not considered as a predictor
(independent variable) in this study.
The Spoken BNC2014 (Love et al., 2017) is a 10-million-word
sample of spoken conversational British English. It contains 668
speakers and provides additional rich metadata such as age, gender and
socioeconomic status for the 1,251 recordings. Further details about the
corpus can be seen in Table 7.2.

Table 7.2 Corpus characteristics of the Spoken BNC2014

Speakers                          668
Recordings                        1,251
Words                             10,527,369
L1                                British English
L2s (self-reported bilinguals)2   Arabic, Cantonese, Dutch, French, German, Gujarati, Irish,
                                  Italian, Kikuyu, Kutchi, Russian, Spanish, Swedish, Turkish,
                                  Urdu and Welsh
Ages                              2–91

3.2 Methodology
The data was searched using Sketch Engine (Kilgarriff et al., 2014) to
extract adjective + noun combinations with possible multiple adjectives
and intervening elements such as hesitation marks (er, erm). A complex
CQL query was formulated to capture this pattern. The query is shown
below:

[tag="J.*" & word!="[A-Z][a-z]+"] [tag="J.*" & word!="[A-Z][a-z]+"]{0,1}
[tag="UH.*" | word="and"]{0,2} [tag="J.*" & word!="[A-Z][a-z]+"]{0,1}
[tag="N.*" & tag!="NP.*"]

It was established that this query has high precision and high
recall. Precision was 97%; the inaccurate hits involved like and just,
spoken features that had been mistagged as adjectives.
Recall was tested on a sample of six texts with three male and
three female speakers across four language backgrounds and different
proficiency levels. These texts were manually annotated and compared
with the results of the automated procedure. The recall was 98.02%. The
obtained precision and recall levels were deemed sufficient for the present
study.
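The pattern targeted by the query can also be illustrated outside Sketch Engine. The following Python sketch is an illustration only and not the authors' procedure: it assumes that tokens are already available as (word, tag) pairs with CLAWS-style tags in which adjective tags begin with J, noun tags with N, proper-noun tags with NP and hesitation or interjection tags with UH. The function names and the sample sentence are hypothetical. Each token is reduced to a one-letter class so that the sequence can be matched with an ordinary regular expression.

```python
import re

CAPITALISED = re.compile(r"[A-Z][a-z]+")  # mirrors word!="[A-Z][a-z]+"

def token_class(word, tag):
    """Map one (word, tag) pair to a single-letter class."""
    if tag.startswith("J") and not CAPITALISED.fullmatch(word):
        return "A"  # adjective that is not a capitalised word
    if tag.startswith("N") and not tag.startswith("NP"):
        return "N"  # noun, excluding proper nouns
    if tag.startswith("UH") or word == "and":
        return "F"  # hesitation marker, interjection or "and"
    return "O"      # anything else

# adjective, optional adjective, up to two fillers, optional adjective, noun
PATTERN = re.compile(r"AA?F{0,2}A?N")

def find_adj_noun(tokens):
    """Return the word sequences matched by the pattern (non-overlapping)."""
    classes = "".join(token_class(w, t) for w, t in tokens)
    return [" ".join(w for w, _ in tokens[m.start():m.end()])
            for m in PATTERN.finditer(classes)]

# Hypothetical tagged utterance, invented for illustration.
sample = [("a", "AT1"), ("big", "JJ"), ("er", "UH"), ("old", "JJ"),
          ("house", "NN1"), ("in", "II"), ("London", "NP1")]
print(find_adj_noun(sample))  # ['big er old house']
```

One simplification should be noted: a token is assigned to exactly one class here, whereas in CQL a single token may satisfy more than one position of the query.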
A series of PHP scripts (PHP is a flexible all-purpose scripting
language) using regular expressions were used to further process and
compare the L2 and L1 data. In this process, disfluencies such as n-new
year were normalised to allow a fair comparison of the collocation
strength. The comparison of L1 and L2 collocation use follows the
method applied in Granger and Bestgen (2014), originally devised by
Durrant and Schmitt (2009). Adapting this method, we used the Spoken
BNC2014 as the reference corpus for the L2 performance in the TLC
Sample. The values of collocational strength of individual adjective
+ noun combinations in L2 were assigned based on the L1 use in the
Spoken BNC2014.
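The two processing steps described above, normalising disfluencies and assigning L1-based values to L2 combinations, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a reconstruction of the PHP scripts: the regular expression, the limit of one or two characters for a stuttered onset, the function names and the reference values are all hypothetical.

```python
import re

# A short fragment followed by a hyphen and a word that starts with the same
# fragment, e.g. "n-new" -> "new". The 1-2 character limit is an assumption.
STUTTER = re.compile(r"\b(\w{1,2})-(?=\1\w)")

def normalise(text):
    """Remove stuttered word onsets so that 'n-new year' becomes 'new year'."""
    return STUTTER.sub("", text.lower())

def score_l2_types(l2_combinations, reference_scores):
    """Look up each distinct L2 combination (type) in L1-derived scores.

    Combinations not attested in the reference data receive None.
    """
    types = {normalise(c) for c in l2_combinations}
    return {t: reference_scores.get(t) for t in sorted(types)}

# Hypothetical reference values, invented for illustration only.
reference = {"new year": 6.1, "long time": 7.3}
print(score_l2_types(["n-new year", "new year", "long time"], reference))
```

Because the lookup operates on a set, repeated uses of the same combination by a speaker count once, which corresponds to treating the collocation type as the unit of analysis.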
The unit of analysis in the comparisons is a collocation type.
This allows more precise measuring of the evidence of collocational
knowledge in individual speakers than if tokens were considered. We
also decided against using lemmas as the units of analysis, following
Shin and Nation (2008) who focused on types over lemmas and word
families; they found evidence that different types of the same lemma/
word family attract different collocates. Statistical analyses were carried
out using Lancaster Stats Tools online (Brezina, 2018).

4 Results
This section deals with the results of the study, focusing on the
frequency, types and collocational strength of the target feature (adjective
+ noun combinations). The results are then discussed in Section 5.

4.1 Frequency of adjective + noun combinations

Overall, there were 23,833 instances of adjective + noun
combinations in the TLC Sample. On average, this constitutes 19.8
occurrences of this structure per 1,000 words in the dataset. The
relative frequency per 1,000 tokens does not show any differences in the
distribution of this structure across different proficiency levels. The only
difference observable is between the L1 benchmark (with lower relative
frequency) and L2 performance. However, as indicated in the data
section (Section 3.1), the nature of the L1 interactions is different from
the tasks in the L2 corpus, and the relative frequency per 1,000 tokens
may simply be an indicator of the overall frequency of nouns in the two
corpora. We therefore need to consider the relative frequency of adjective
+ noun combinations per the number of nouns in the datasets. In this
case, a more interesting pattern emerges. The latter normalisation is in
fact more precise because it focuses on the adjectival modification of
noun phrases (the focus of this study) rather than a competition between
(modified) nouns and other word classes in the corpus. The details can
be seen in Table 7.3.
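The two normalisations can be checked with a few lines of Python. The sketch below uses the absolute counts reported in Table 7.3; the variable and function names are ours. Pooled counts reproduce the per-1,000-token values in the table. The pooled per-1,000-noun values differ slightly from the published ones (about 140.2 rather than 140.54 for B1, for example); we assume this is because the published values are means of per-speaker rates, although the chapter does not say so explicitly.

```python
# Figures copied from Table 7.3: (adj + noun instances, tokens, nouns)
counts = {
    "B1": (8168, 423841, 58275),
    "B2": (9817, 483807, 65709),
    "C1/C2": (5848, 297570, 37328),
    "L1": (176258, 11035380, 1178207),
}

def per_thousand(hits, base):
    """Relative frequency of hits per 1,000 units of the chosen base."""
    return 1000 * hits / base

for group, (hits, tokens, nouns) in counts.items():
    print(group,
          round(per_thousand(hits, tokens), 2),   # per 1,000 tokens
          round(per_thousand(hits, nouns), 2))    # per 1,000 nouns (pooled)
```

Changing the base from tokens to nouns removes the influence of how noun-heavy each corpus is, which is why the L1 benchmark no longer stands apart from the L2 groups under the second normalisation.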

Table 7.3 Adjective + noun combinations per proficiency level

Proficiency      Absolute frequency   Tokens       Nouns       Relative frequency   Relative frequency
                 of adj + noun                                 per 1,000 tokens     per 1,000 nouns
B1               8,168                423,841      58,275      19.27                140.54
B2               9,817                483,807      65,709      20.29                148.19
C1/C2            5,848                297,570      37,328      19.65                155.39
L1 – benchmark   176,258              11,035,380   1,178,207   15.97                153.03

Figure 7.1 Relative frequency of adjective + noun combinations per 1,000 nouns
(group means and 95% CIs)

In order to estimate statistical significance of the observation, 95%
confidence intervals (95% CIs) around group means were calculated
and displayed in the form of error bars (for more information about
this technique, see Brezina, 2018: 146). This visualisation takes into
consideration inter-speaker variation and the amount of evidence present
in the individual proficiency-based subcorpora. The circle in the graph
indicates the mean value of the relative frequency of adjective + noun
combinations per 1,000 nouns, while the error bars show the interval
where the mean is likely to lie in 95% of samples taken from the same
population. Largely overlapping confidence intervals indicate that the
result is not statistically significant (the groups may be taken from the
same population); non-overlapping error bars point to statistically
significant results.
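The logic of comparing groups through confidence intervals can be sketched in Python. The example is a simplified illustration, not the procedure implemented in Lancaster Stats Tools online: it uses a normal approximation (a t-based interval would be wider for small groups), and the per-speaker values are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def mean_ci(values, level=0.95):
    """Return (mean, lower, upper) using a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    m = mean(values)
    half_width = z * stdev(values) / len(values) ** 0.5
    return m, m - half_width, m + half_width

def intervals_overlap(a, b):
    """True if two (mean, lower, upper) intervals share any value."""
    return a[1] <= b[2] and b[1] <= a[2]

# Hypothetical per-speaker rates (adj + noun per 1,000 nouns).
group_a = [120.0, 135.5, 142.0, 150.5, 128.0, 139.0]
group_b = [150.0, 162.5, 158.0, 171.0, 149.5, 166.0]
ci_a, ci_b = mean_ci(group_a), mean_ci(group_b)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
```

Because the interval width depends on both the spread of the values and the number of speakers, a group with few speakers or large inter-speaker variation yields longer error bars.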
As can be seen from Figure 7.1, the B1 level differs from C1/C2 as
well as from the L1 performance in the Spoken BNC2014. There is no
statistically significant difference between B1 and B2 or among the B2 –
C1/C2 – L1 groups. This tendency was confirmed by a one-way
ANOVA with Bonferroni-corrected t-tests as post hoc tests (F (3, 1905) =
7.55; p < .001; ω = 0.101; small effect; post hoc B1 vs. C1/C2 and B1 vs.
L1: p < .001). Further statistical testing (ANOVA) did not show an effect
of L1 background (F (1, 1242) = 2.48; p = .115).
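For readers less familiar with the test, the F statistic of a one-way ANOVA is the ratio of between-group to within-group variance. The following Python sketch shows the computation on hypothetical toy data; it does not reproduce the values reported above, and it omits the p-value (which requires the F distribution) and the post hoc tests.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    # variation of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # variation of individual values around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical data: three groups of three speakers each.
toy = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova(toy))
```

With four groups (B1, B2, C1/C2 and L1), the between-group degrees of freedom equal 3, which matches the first value in F (3, 1905).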

4.2 Types of adjective + noun combinations

Table 7.4 Top 5 nouns and their adjectival collocates in TLC Sample and the Spoken
BNC2014

Corpus       Proficiency      Nouns      Absolute    Most frequent adjective     Most exclusive adjective
                                         frequency   modifications               modifications (MI score)
                                                     (collocate frequency)
TLC Sample   B1               people     1474        other, young, poor          poor, rich, corrupt
                              school     1094        primary, high, middle       nursing, elementary, primary
                              money      882         little, important, extra    extra, lucky, little
                              time       777         long, free, last            sleeping, spare, free
                              years      654         next, recent, past          recent, past, last
             B2               people     2565        other, young, old           qualified, illiterate, elder
                              time       1170        long, free, good            spare, rough, olden
                              things     1093        other, different, new       spooky, meaningful, chemical
                              school     976         high, primary, middle       primary, elementary, secondary
                              years      844         last, early, next           past, recent, early
             C1/C2            people     2889        young, other, old           handicapped, Brazilian, elder
                              things     946         other, good, bad            politic, mean, controversial
                              way        860         good, different, other      positive, certain, proper
                              time       813         long, next, ancient         spare, changing, ancient
                              children   697         little, other, young        little, own, young
BNC2014      L1 – benchmark   thing                  other, good, whole          petrol-driven, withered, strawberry-flavoured
                              stuff                  good, other, new            power-based, anti-mosquito, run-of-the-mill
                              people                 gay, normal, stupid         heterosexual, like-minded, effeminate
                              person                 only, other, good           non-lazy, clean-living, debauched
                              way                    other, good, long           joking, uncivilized, convoluted

Regarding the nature of typical adjective + noun collocations in the
dataset, Table 7.4 displays the top 5 nouns in terms of their frequency at
different proficiency levels and in the L1 data. Their most frequent (as
measured by collocation frequency) and most exclusive combinations
(as measured by the MI score) are recorded alongside them. While this
section explores typical cases of adjective + noun combinations from a
more qualitative perspective, the actual statistical scores are analysed in
Section 4.3. Rather than categorising collocations into low/medium/high
collocational bands (e.g. Bestgen & Granger, 2014), which ultimately
begs the question of appropriate cut-off points, we conceptualise the
collocational relationship as a continuum best described from the
perspective of frequency and exclusivity (cf. also Gablasova et al., 2017).
Overall, there is a large overlap in the nouns frequently appearing in
the adjective + noun combinations across the proficiency levels and the
L1 baseline, e.g. the noun people occurs in every group. Considering
frequent collocates of these nouns, there is also a large overlap in the L2
data. For example, the collocation young/other + people runs across all
proficiency levels. The L1 baseline, however, shows less overlap in these
frequent collocations with the L2 data. For instance, the top 3 most
frequent adjectival collocates of the noun people are gay, normal and
stupid. The L1 speakers thus use different frequent adjectival collocates
for the same nouns when compared to the L2 speakers, while there is
similarity in collocations for the L2 speakers regardless of the proficiency
level. Looking at MI score collocates, we can see that exclusive collocates
are more specific to a particular proficiency level because these often
reflect low-frequency combinations, which are topic-related such as
corrupt people (B1), chemical things (B2) and anti-mosquito stuff (L1).
To further investigate the overlap among the different proficiency
levels and contrast this with L1 use, we looked at the ranking of specific
collocations in different groups of speakers. After extracting a master list
consisting of 42 adjective + noun collocations that appeared in all three
proficiency levels, we ranked each collocation according to the frequency
of occurrence per proficiency level. The same collocations were then
found in the Spoken BNC2014 and ranked. Pairwise Spearman’s
correlations (ρ) are reported in Table 7.5.
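The rank comparison can be sketched in Python as follows. Spearman's ρ is computed here as the Pearson correlation of the two rank vectors, with tied frequencies receiving the average of the ranks they span. The frequencies in the example are hypothetical and serve only to show the mechanics; the function names are ours.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from highest (rank 1) to lowest, averaging ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def spearman(freq_a, freq_b):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(freq_a), average_ranks(freq_b))

# Hypothetical frequencies of the same five collocations in two groups.
b1 = [120, 80, 75, 40, 10]
c1 = [150, 60, 90, 30, 20]
print(round(spearman(b1, c1), 3))  # 0.9
```

Only the order of the collocations within each group enters the calculation, so groups of very different sizes can be compared without normalising the raw frequencies.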
Overall, there are strong positive correlations between all proficiency
levels; the correlations with L1 use are noticeably lower but still
reasonably high (above 0.5). It needs to be noted that the similarities and
differences observed are to some extent topic related. Interestingly, the
correlations do not show a clear linear development through proficiency
levels. The strongest correlation (0.859) was observed between the B1
and the C1/C2 proficiency levels. To illustrate the overlap, Table 7.6
displays the actual collocations across proficiency levels and the L1
baseline; only the top 20 collocations are included in this summary.
Despite the overall similarities, some differences can be observed in
Table 7.6; these can be attributed to the topic of the conversation, such
as fast food and mobile phone. For example, within the B1 level both fast
food and healthy food occurred within the top 20 collocations but did not
appear in the top 20 at the other proficiency levels.

Table 7.5 Commonalities in adjective + noun collocation frequency ranks

        B1       B2        C1/C2     L1
B1      1.00**   0.774**   0.859**   0.508**
B2               1.00**    0.813**   0.507**
C1/C2                      1.00**    0.534**
L1                                   1.00**
Note: Significance codes: ** .01.

Table 7.6 Top 20 adjective + noun collocations and their overlap

Frequency rank   B1                 B2                C1/C2               L1 baseline
1                other people       other people      young people        little bit
2                foreign language   high school       good thing          other day
3                mobile phone       big problem       other people        long time
4                other country      young people      mobile phone        other people
5                healthy food       big city          other country       other thing
6                little bit         public figure     good way            good thing
7                good way           global warming    high school         good idea
8                primary school     good thing        other thing         whole thing
9                important thing    mobile phone      little bit          only thing
10               long time          long time         bad thing           other side
11               high school        other country     important thing     other way
12               fast food          good idea         good idea           bloody hell
13               other thing        important thing   social network      good job
14               young people       little bit        other hand          bad thing
15               other language     good way          equal opportunity   different thing
16               good thing         bad thing         big problem         good friend
17               social network     other thing       long time           big thing
18               free time          good job          old people          little thing
19               middle school      good friend       rich people         good time
20               favourite sport    early memory      primary school      good way
Legend: Shared among B1 & B2 & C1/C2; Shared among B1 & B2 & C1/C2 and L1

4.3 Frequency, MI score and log Dice collocations

In this section, we look at the effect of L1 and proficiency on
collocation frequencies, MI scores and log Dice scores. Three multi-way
(factorial) ANOVA models were built to investigate the effect of L1 and
proficiency as predictors as well as their interaction. The results of the
analyses are reported in Tables 7.7 – 7.9.
As can be seen from Tables 7.7 – 7.9, while all three analyses identified
an L1 effect, the effect of proficiency and an interaction between
proficiency and L1 was statistically significant only with log Dice scores.

Table 7.7 Multi-way ANOVA: Collocation frequencies

                 Df     Sum Sq.    Mean Sq.   F value   Pr (>F)    Significant?
L1               11     352098     32009      3.206     0.000259   ***
Proficiency      2      19899      9949       0.996     0.369510
L1:Proficiency   17     195734     11514      1.153     0.297060
Residuals        1211   12092176   9985
Note: Significance codes: ***.001, *.05.

Table 7.8 Multi-way ANOVA: MI scores

                 Df     Sum Sq.   Mean Sq.   F value   Pr (>F)    Significant?
L1               10     73.1      7.308      3.973     2.33e-05   ***
Proficiency      2      1.9       0.949      0.516     0.5972
L1:Proficiency   17     48.2      2.837      1.542     0.0727
Residuals        1205   2216.5    1.839
Note: Significance codes: ***.001, *.05.

Table 7.9 Multi-way ANOVA: log Dice scores

                 Df     Sum Sq.   Mean Sq.   F value   Pr (>F)   Significant?
L1               10     16.9      1.6922     1.852     0.0480    *
Proficiency      2      5.6       2.7918     3.055     0.0475    *
L1:Proficiency   17     26.5      1.5560     1.703     0.0366    *
Residuals        1205   1101.2    0.9138
Note: Significance codes: ***.001, *.05.

Focusing on proficiency, Table 7.10 shows mean values and standard
deviations of collocation frequencies, MI scores and log Dice scores for
three proficiency groups in the TLC Sample, calculated on the basis of
L1 baseline in the Spoken BNC2014. The last column with sparklines
displays the tendency of development of a particular collocation index
across proficiency levels. In this analysis, we are trying to establish if any
of the proficiency levels is closer to the L1 usage in terms of these three
indices and what the overall trend is.

Table 7.10 Comparison of collocation statistics across proficiency levels

Proficiency                       B1               B2               C1/C2            Tendency
Mean collocation frequency (SD)   99.88 (112.79)   103.40 (93.09)   112.50 (88.43)   [sparkline]
Mean MI score (SD)                5.41 (1.50)      5.34 (1.33)      5.36 (1.19)      [sparkline]
Mean log Dice (SD)                7.20 (1.13)      7.24 (0.83)      7.39 (0.82)      [sparkline]
Adjective + Noun Collocations in L2 and L1 Speech 165

Figure 7.2 Comparison of the three proficiency levels using the mean collocation
frequency (95% CIs)

As can be seen from Table 7.10, two of the indices (mean collocation
frequency and log Dice) increase with proficiency; the mean MI score, on
the other hand, shows no such increase and is marginally lower at B2 and
C1/C2 than at B1. Note that while the MI score
is designed to identify exclusive collocates and penalises high frequency
combinations, log Dice highlights combinations that are both exclusive
and relatively frequent (for more information see Brezina, 2018: 73–74
and Gablasova et al., 2017). We can also observe that standard deviations
in each group are fairly high (compared to the mean), indicating large
inter-speaker differences. The overall picture is best illustrated with
three graphs with 95% confidence intervals around the mean values of
the respective statistics. These are displayed in Figures 7.2–7.4. Largely
overlapping 95% CIs indicate non-significant results.
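
The contrast between the MI score and log Dice described above can be illustrated with a small Python sketch. It uses the commonly cited definitions: MI as the base-2 logarithm of observed over expected co-occurrence frequency, and log Dice as 14 plus the base-2 logarithm of the Dice coefficient (Rychlý, 2008). All counts below are invented for illustration, the function names are hypothetical, and the sketch does not claim to reproduce the exact settings used for the Spoken BNC2014 baseline in this chapter.

```python
import math


def mi_score(o11, adj_freq, noun_freq, corpus_size):
    """Mutual information: log2 of observed over expected co-occurrence."""
    expected = adj_freq * noun_freq / corpus_size
    return math.log2(o11 / expected)


def log_dice(o11, adj_freq, noun_freq):
    """log Dice: 14 + log2 of the Dice coefficient 2*O11 / (f_adj + f_noun)."""
    return 14 + math.log2(2 * o11 / (adj_freq + noun_freq))


CORPUS_SIZE = 1_000_000  # invented corpus size (tokens)

# Invented counts: a frequent pairing and a rare but exclusive pairing.
frequent_pair = {"o11": 400, "adj_freq": 2000, "noun_freq": 5000}
rare_pair = {"o11": 8, "adj_freq": 10, "noun_freq": 5000}

for label, counts in (("frequent", frequent_pair), ("rare", rare_pair)):
    mi = mi_score(corpus_size=CORPUS_SIZE, **counts)
    ld = log_dice(**counts)
    print(f"{label}: MI = {mi:.2f}, log Dice = {ld:.2f}")
```

With these invented counts, the rare pairing scores higher on MI, whereas the frequent pairing scores higher on log Dice. This mirrors the point made in the text: MI rewards exclusivity even when a combination is very infrequent, while log Dice favours combinations that are both exclusive and reasonably frequent.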

Figure 7.3 Comparison of the three proficiency levels using the mean MI score
(95% CIs)

In order to test whether L2 speakers tend to repeat the same
formulaic sequences multiple times, the type/token ratios of the
adjective + noun combinations were also computed per proficiency
level. Table 7.11 reports the results. As can be seen, the mean type/token
ratios of adjective + noun combinations are comparable across all three
proficiency levels; the L1 baseline, however, has considerably higher
mean type/token ratios, indicating less repetition in L1 speech than in L2
speech. Note that, for comparability, type/token ratios for the L1 baseline
were computed for a subset of scripts with the same range of adjective
+ noun tokens (1–73) as the L2 data; this is because type/token ratios of
any kind are dependent on text length and are negatively correlated with
token counts (Brezina, 2018: 57ff).
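
As a minimal sketch of the measure used here, the type/token ratio of adjective + noun combinations in one script is the number of distinct combinations divided by the total number of combinations. The function name and the five-item list below are invented for illustration and do not come from the corpus data.

```python
def type_token_ratio(combinations):
    """Number of distinct combinations (types) divided by all occurrences (tokens)."""
    if not combinations:
        raise ValueError("TTR is undefined for an empty list of combinations")
    return len(set(combinations)) / len(combinations)


# Invented mini-script: five adjective + noun tokens, three distinct types.
tokens = [
    ("supersonic", "cars"),
    ("normal", "cars"),
    ("supersonic", "cars"),
    ("special", "brakes"),
    ("supersonic", "cars"),
]

ttr = type_token_ratio(tokens)
print(round(ttr, 2))  # 3 types / 5 tokens
```

Applied to the counts reported later in the chapter for the script in example (5), 23 types over 55 tokens, the same division gives 0.42.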

Table 7.11 Type/token ratio (TTR) of adjective + noun combinations

Proficiency B1 B2 C1/C2 L1 baseline
Mean collocation TTR (SD) 0.86 (0.12) 0.85 (0.11) 0.86 (0.09) 0.93 (0.07)

Figure 7.4 Comparison of the three proficiency levels using the mean log Dice (95% CIs)

5 Discussion
The results indicate a complex picture related to the use of adjective
+ noun combinations in L2 and L1 speech. The focus of the discussion
will be on L2 use with L1 serving as a reference point. The frequencies
of adjective + noun combinations observed in L2 speech are considerably
lower (on average 19.8 occurrences per 1,000 words) than what has
been reported in the literature for L2 writing. Granger and Bestgen
(2014), for instance, report 45.4 adjective + noun bigrams per 1,000
words in L2 student essays. This effect is likely related to the variation
in the frequencies of noun phrases across different registers of speech
and writing (e.g. Biber, 1988). Indeed, L1 use in the Spoken BNC2014
showed even fewer adjective + noun combinations (on average 15.97
per 1,000 words) because these data include less formal
and less informational types of interaction than the TLC Sample data.
Less informational and more involved registers, to use Biber’s (1988)
terminology, typically occur with fewer noun phrases in general. In order
to focus on phraseological aspects of adjective + noun combinations,
therefore, we looked at three areas of interest, namely (i) frequencies
of modified noun phrases (adjective + noun) out of all noun phrases,
(ii) types of adjective + noun combinations and (iii) collocational strength
of the observed combinations. These three areas are now discussed.
First, the analysis of the frequencies of modified noun phrases
revealed an interesting pattern of differences. The B1 spoken performance
differed from both the C1/C2 level and the L1 baseline with both C1/C2
and the L1 groups using statistically significantly more modified noun
phrases than the B1-level speakers. The B1 level thus appears to mark a critical
point in language development. This echoes previous research by
Thewissen (2015), who found improvement in lexical phrase accuracy
in B1- to B2-level writers but noted a lack of linear progression beyond
these adjacent proficiency groups. The greater use of modified noun phrases
seems to point to higher syntactic complexity (cf. Biber et al., 2011) as well as
to a higher level of phraseological complexity at higher levels of proficiency. Focusing
on the latter, depending on the opportunity for use in a specific context,
the ability to co-select an appropriate adjective to modify a noun can be
seen as an indicator of the phraseological complexity of the text/spoken
discourse and phraseological competence on the part of the L2 speaker.
This is clearest in patterns where the adjective forms an optional
element of the noun phrase, thus enhancing it from the perspective of
phraseology, as in the following three utterances taken from the
TLC Sample.

(2) I don’t really get the question sorry (B1 level, 2_8_RU_4)
(3) it’s kind of hard question (B1 level, 2_6_IN_65)
(4) this is a a really challenging question (C2 level, IN_13)

In example (2), the noun question is not modified by an adjective,
while examples (3) and (4) show two possible modifications with
approximately the same meaning. It can be argued that while hard
question is a more frequent combination in the TLC Sample (10
occurrences), challenging question (2 occurrences) is a more exclusive
combination and hence more phraseologically complex. The relative
MI scores in the TLC Sample of hard question and challenging question
are 6.6 and 9.5 respectively, which underscores this fact. In contrast to
the non-modified noun phrase in (2), the co-occurrence of hard and
challenging with the noun question provides evidence of phraseological
knowledge on the part of L2 speakers.
Second, the investigation of the types of collocation in the data
revealed a variety of adjective + noun collocations and their occurrence
across proficiency levels. Here, we considered two approaches to
identifying collocations using (i) the collocation frequency and (ii) the MI
score. The collocation frequency is the frequency of co-occurrence of the
noun and the adjective. Highly frequent combinations such as free time,
young people, other thing, etc., are fairly general and thus have the ability
to occur across different contexts. The combinations identified using the
MI score (e.g. chemical things, handicapped people), on the other hand,
are exclusive and often context specific (Gablasova et al., 2017). In the
literature (e.g. Granger & Bestgen, 2014; Li & Schmitt, 2009), the MI
score is typically contrasted with the t-score, demonstrating the same type
of polarity as in the present study because the t-score is highly correlated
with the collocation frequency; more recent studies (e.g. Siyanova-
Chanturia & Spina, 2019) abandon the problematic t-score in favour of
collocation frequency and other measures. For a more detailed discussion
of appropriate collocation measures, see Gablasova et al. (2017).
On a more general level, collocation measures operate on two
principal scales of frequency and exclusivity (Brezina, 2018: 71). These
scales recognise the frequency and the exclusivity of the collocational
relationship as two important features of word combinations in
language, which are also highlighted in this study. We have seen that
L2 speakers in the TLC Sample used a similar range of high frequency
collocations in a similar frequency rank order as demonstrated by the
large Spearman’s correlations (above 0.77). This range, however, differed
to some extent from the L1 baseline (Spearman’s ρ approximately 0.5).
Frequent adjective + noun combinations that were shared across the
proficiency levels and the L1 baseline included collocations such as little
bit, long time, other people, other thing and good thing. The frequent L1
combinations not shared in the top 20 frequent collocations in the TLC
Sample included examples such as bloody hell, bad thing, good thing and
big thing. However, we have to note that while the L1 interactions were
informal and between interlocutors who knew each other well (friends
and family, Love et al., 2017), the L2 interactions were semi-formal and
between interlocutors with some power hierarchy (Gablasova et al.,
2019). The absence of swear-word combinations (bloody hell) and vague
expressions (bad thing, good thing, big thing) from the top 20 frequency
lists in the L2 data can thus also be explained by this difference in the
nature of the interaction.
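
The rank-order comparison reported above relies on Spearman's correlation, which is Pearson's correlation computed on ranks rather than on raw frequencies. The Python sketch below shows the calculation for two frequency lists over the same set of collocations. The collocation labels are taken from the text, but the frequency counts, the group labels and the function names are invented for illustration and are not the values observed in the TLC Sample.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upwards; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


collocations = ["little bit", "long time", "other people", "other thing", "good thing"]
freq_group_a = [120, 95, 80, 60, 40]  # invented counts for one proficiency group
freq_group_b = [110, 70, 90, 65, 30]  # invented counts for another group

rho = spearman_rho(freq_group_a, freq_group_b)
print(round(rho, 2))
```

In this invented example only two neighbouring collocations swap rank between the groups, so the coefficient is high. A lower value, such as the one reported for the comparison with the L1 baseline, would indicate that the frequency rank orders diverge more.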
A side effect of looking at exclusivity (using the MI score) is the
identification of some combinations which demonstrate lexical rather
than phraseological knowledge. A prime example of these combinations
in this study is the occurrence of compound nouns such as high
school, middle school and mobile phone. Similar combinations (e.g.
nitrous oxide, Hippocratic oath and pop music) figure also in the list
of collocations (n-grams) identified using the MI score in Granger and
Bestgen (2014). Strictly speaking, these combinations point to the use of
particular naming strategies and terminology in L2 production rather
than to the ability to combine words in a particular way (phraseological
competence). From the perspective of vocabulary learning, such
combinations demonstrate the breadth rather than the depth of
vocabulary (cf. Read, 2004). They are thus likely to be correlated with
the topic of the spoken exchange, chosen at a higher level than the
sentence/utterance level (choice of topic at the textual level).
Third, when interpreting the collocational strength of the observed
adjective + noun combinations (see Table 7.10), we need to bear in mind
different properties of collocates identified using different collocation
statistics. In this study, we looked at three measures highlighting different
aspects of the collocational relationship between adjectives and nouns
on the frequency and exclusivity scales (Brezina, 2018; Gablasova et al.,
2017). The collocation frequency highlights frequent, non-exclusive
combinations, while the MI score focuses on infrequent but highly
exclusive combinations. We have also added a third measure, log Dice,
which identifies relatively exclusive combinations that are also
reasonably frequent. The log Dice score, therefore, represents a measure
that offers a balance between frequency and exclusivity (Omidian
et al., Chapter 8 of this volume; Rychlý, 2008).
In the study, we observed that all three collocation measures show
an effect of speakers’ L1. L1 has been recognised in the literature (e.g.
Wolter & Gyllstad, 2013) as an important factor in phraseological
knowledge, with L1 transfer effects being reported (Maurer-Stroh, 2005;
Takač & Lukač, 2013). In our study, we contrasted six L1 and cultural
backgrounds with typologically different languages (Chinese vs. Italian
vs. Hindi, etc.). It is therefore not surprising to find differences in the
collocation measures across these backgrounds. However, we need to
exercise caution and not automatically assume L1 transfer without
proper analysis of phraseological mechanisms in the individual L1
contexts (cf. Granger, 2015). A more in-depth investigation of this
variable is, however, beyond the scope of the present study.
The only statistically significant difference related to L2 proficiency
was between the B1 and the C1/C2 group when the mean strength of
collocations was calculated using the log Dice score. This demonstrates
that the higher proficiency level (C1/C2) spoken production included
more adjective + noun combinations that are relatively frequent and
exclusive than the lower B1 level. Examples of high log Dice score
collocations in C1/C2 are the following: good life, extended family, real
time and sore throat. This is in line with Thewissen’s (2015) finding
related to lexical phrase accuracy showing a clear distinction between
the higher proficiency bands and the B1 group, who ‘produced a higher
number of “wide-of-the-mark” errors’ (2015: 187).
From the perspective of second language acquisition, there is a certain
tension between the frequency of co-occurrence and the exclusivity of
the combination. High frequency of co-occurrence of two words points
to a high probability of repeated exposure of the language learner to
these combinations and the possibility of statistical (implicit) learning
(e.g. Saffran, 2003). On the other hand, very frequent combinations
might not get noticed and hence might not be committed to memory. As
Divjak (2019: 4) points out: ‘[t]here is no pure frequency effect: experience
cannot be reduced to frequency as experience is filtered through attention
before being committed to memory’. There could be a fundamental
difference in how L1 and L2 speakers process formulaic language with
regard to the frequency and exclusivity of phraseological units. Ellis et al.
(2008) found L1 English speakers had shorter recognition response times
for statistical collocations with higher MI scores, a measure of exclusivity,
while L2 speakers’ shorter reaction times correlated with the frequency
of occurrence of the collocation. Sonbul (2015), in a study of adjective +
noun collocations, found frequency to have a similar effect on typicality
rating of collocations for both L1 and L2 speakers; L2 proficiency also
played a role, with more proficient L2 speakers showing greater sensitivity
to collocation frequency.
Given this complex picture, and the importance of both frequency and
exclusivity, we can argue that for restricted collocations, such as those in
this study, it may be more apt to consider both measures combined. While
in corpus-based research we cannot control for psychological variables
such as attention, we can hypothesise that a collocation measure that takes
into account both frequency and exclusivity of the collocation might be
able to provide access to the proficiency effect better than measures that
deal either with frequency or exclusivity alone. The results of this study
are consistent with this hypothesis.
Considering the perspective of L2 production, high frequency
combinations have been hypothesised to appear in larger proportions
in lower proficiency texts (intermediate) than in higher proficiency texts
(advanced) (Granger & Bestgen, 2014). This study did not find evidence
for this claim in spoken L2 data: no difference was observed in the use
of MI-based and frequency-based collocations across the proficiency levels.
Instead, the evidence points to a more complex picture of non-linear
development of phraseological competence, where individual differences
between speakers play a major role. The need to consider individual
differences and changing contexts has also been highlighted by Larsen-
Freeman (2006) in her case study of five L1 Chinese speakers learning
English, which shows that language attainment develops dynamically.
A similar conclusion has been put forward in Siyanova-Chanturia and
Spina (2019), who investigated the development of adjective + noun
collocations in L2 Italian in a longitudinal learner corpus. The results
demonstrated phraseological development to be complex, with higher
proficiency speakers not using more idiomatic language systematically.
Another longitudinal study, by Omidian et al. (Chapter 8 of this volume),
brings further evidence about phraseological development as a multi-
faceted and non-linear process, noting that exclusivity and diversity in
verb + noun collocations are not only influenced by time and language
proficiency but also, importantly, by individual learner differences.
Finally, the effect of repetition of the same adjective + noun
combinations needs to be briefly considered. In this study (see Section
3.2), a methodological decision was made to focus on collocation types
rather than tokens (cf. Granger & Bestgen, 2014, who considered both
bigram types and tokens). This means that the same adjective + noun
combination is counted only once in the speech of the same speaker.
However, we observed that the type/token ratios of these combinations
differed between L2 speech on the one hand and the L1 baseline on the
other hand, with L1 speakers using fewer repetitions. This is in line with
previous studies of lexical repetition and lexical complexity in L1 and
L2 (e.g. Connor, 1984; Crossley & McNamara, 2012; Ferris, 1994) and
supports the idea of ‘lexical teddy bears’ proposed by Hasselgren (1994).
Nevertheless, when investigating repetition, we always need to consider
its multiple functions (Molenda et al., 2018). One of these is lexical
cohesion, as demonstrated in example (5).

(5) L1 okay alright tell me about supersonic cars

L2 supersonic cars are are not like just normal normal cars they
are too fast to stop to stop <unclear=the super car er sonic>
cars you don’t there are special brakes there are two sets of
parachutes and much more t-to to to stop supersonic cars
L1 parachutes
L2 yes like there are some supersonic cars which <are which
company> has made er the main supersonic car is Thrust
SSC wh-which is <unclear> which is more than some two
thousand one hundred and ninety eight kilometres per hour
(B2 level, 2_8_IN_17).

In example (5), the topic of discussion is ‘supersonic cars’; these are
contrasted with ‘normal cars’. All adjective + noun combinations in L2
speech are highlighted in bold. This example demonstrates the use of a
technical term (‘supersonic cars’), which cannot be easily paraphrased
and whose repetition by both interlocutors forms a natural part of the
conversation and a means of lexical cohesion. However, for this script,
the type/token ratio of adjective + noun combinations is fairly low (0.42)
compared to the B2 group mean of 0.85 and the L1 baseline of 0.93. In
this script, there are 23 unique adjective + noun types and 55 occurrences
of these structures (tokens). Nevertheless, it would be incorrect to
interpret ‘supersonic cars’ as a lexical teddy bear and a sign of limited
phraseological competence.

6 Conclusion
This study offers a unique insight into the use of adjective + noun
collocations in spoken L2, thus complementing previous research,
which largely focused on written language. The picture that emerges is
a complex one. Three major trends are observable in the data. First, a
proficiency effect on the proportion of modified nouns can be seen, with
B1 speakers using statistically significantly fewer collocations of this
type than C1/C2 speakers and the L1 baseline group. Second, there is
a proficiency effect on the collocation strength measured using the log
Dice score: higher proficiency level (C1/C2) speakers used more relatively
frequent and exclusive adjective + noun combinations than B1-level
speakers. These two points seem to indicate a threshold in phraseological
development at the B1 level. Third, an L1 effect on collocation strength
regardless of the measure (frequency, MI score, log Dice) is apparent
in the data. Speakers with different L1 backgrounds systematically use
collocations of varying strength, which could point to L1 transfer effects,
although more investigation is needed in this area.
Overall, however, the study revealed more potential similarities
than differences in the use of adjective + noun combinations across
proficiency levels, and large individual differences between speakers. We
paid attention to both statistically significant and non-significant
(negative) results. While the former are easier to interpret as evidence
of a difference likely to occur in the population, the latter invite us
to explore further the relationships, with a view to seeking more
evidence. This means that a non-significant result is not evidence of no
relationship between the variables in question. However, reporting such
results is crucial for providing the whole picture and for the ability of the
field to build on previous studies in replication and meta-analysis (Lehrer
et al., 2007; Brezina, 2018: 267).
While the study offers numerous insights into the use of adjective +
noun collocations in L2 speech, it also has its limitations, imposed by
the scope and focus of this research. Successful use of collocations does
not depend only on the suitable (target-like) co-selection of two words
(which we largely discussed in this chapter), but also on the appropriate
use of the collocations in a specific context. This area clearly deserves a
more detailed exploration. A prime example of this aspect is the frequent
use of the adjective + noun combination bloody hell in the Spoken
BNC2014 and its absence in the TLC Sample. In this case, the absence
is entirely justifiable because the use of swear words in the semi-formal
context of the interactions in the TLC Sample would not be appropriate.
This also raises a more general question of a suitable L1 baseline for
L2 production and the role of the native speaker in this comparison
(Seargeant, 2013), which needs to be considered in future studies.
The complexity of a phenomenon such as collocations always
requires paying attention to different levels of interpretation of the
results and possible alternative interpretations. When analysing
collocations, we need therefore to consider both the quantitative aspects
(collocation frequency and strength) and the qualitative and contextual
aspects (types of collocation and the appropriateness of their use). In
future studies it might therefore be advisable to separate different types
of adjective + noun collocations, such as compounds, from collocations
in the strict sense. Also, we should not assume a single trajectory of
language acquisition in all L2 speakers because learning experience is
multi-faceted, often involving progress, regression and more progress,
etc. (De Bot et al., 2007).

Acknowledgement
The writing of this chapter was supported by ESRC grants no.
EP/P001559/1 and ES/R008906/1.

Notes
(1) [Link]
(2) Speakers were asked to report if they were bilingual and, if so, what their L2 was
other than British English (Table 7.2). Twenty-eight (4.2%) speakers responded
positively and specified the listed languages (see [Link]
[Link]).

References
Bahardoust, M. and Moeini, M.R. (2012) Lexical and grammatical collocations in writing
production of EFL learners. The Journal of Applied Linguistics 5 (1), 61–86.
Bahns, J. (1993) Lexical collocations: A contrastive view. ELT Journal 47 (1), 56–63.
Bestgen, Y. and Granger, S. (2014) Quantifying the development of phraseological
competence in L2 English writing: An automated approach. Journal of Second
Language Writing 26, 28–41.
Biber, D. (1988) Variation Across Speech and Writing. Cambridge: Cambridge University Press.
Biber, D., Gray, B. and Poonpon, K. (2011) Should we use characteristics of conversation to
measure grammatical complexity in L2 writing development? TESOL Quarterly 45 (1),
5–35.
Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. (1999) Longman Grammar
of Spoken and Written English. London: Longman.
Birdsong, D. (1992) Ultimate attainment in second language acquisition. Language 68 (4),
706–755.
Boers, F., Eyckmans, J., Kappel, J., Stengers, H. and Demecheleer, M. (2006) Formulaic
sequences and perceived oral proficiency: Putting a lexical approach to the test.
Language Teaching Research 10 (3), 245–261.
Brezina, V. (2018) Statistics in Corpus Linguistics: A Practical Guide. Cambridge:
Cambridge University Press.
Choi, W. (2019) A corpus-based study on ‘delexical verb + noun’ collocations made by
Korean learners of English. The Journal of Asia TEFL 16 (1), 279–293.
Connor, U. (1984) A study of cohesion and coherence in ESL students’ writing. Papers in
Linguistics 17 (3), 301–316.
Cook, V.J. (2007) The goals of ELT: Reproducing native-speakers or promoting
multicompetence among second language users? In J. Cummins and C. Davison (eds)
International Handbook of English Language Teaching (pp. 237–248). New York, NY:
Springer.
Council of Europe (2001) Common European Framework of Reference for Languages:
Learning, Teaching, Assessment. Cambridge: Cambridge University Press.
Crossley, S.A. and McNamara, D.S. (2012) Predicting second language writing proficiency:
The roles of cohesion and linguistic sophistication. Journal of Research in Reading
35 (2), 115–135.
Davies, A. (2003) The Native Speaker: Myth and Reality. Clevedon: Multilingual
Matters.
De Bot, K., Lowie, W. and Verspoor, M. (2007) A dynamic systems theory approach to
second language acquisition. Bilingualism: Language and Cognition 10 (1), 7–21.
Divjak, D. (2019) Frequency in Language: Memory, Attention and Learning. Cambridge:
Cambridge University Press.
Durrant, P. and Schmitt, N. (2009) To what extent do native and non-native writers make
use of collocations? IRAL-International Review of Applied Linguistics in Language
Teaching 47 (2), 157–177.
Ellis, N., Simpson-Vlach, C. and Maynard, C. (2008) Formulaic language in native and
second language speakers: Psycholinguistics, corpus linguistics, and TESOL. TESOL
Quarterly 42 (3), 375–396.
Ferris, D.R. (1994) Lexical and syntactic features of ESL writing by students at different
levels of L2 proficiency. TESOL Quarterly 28 (2), 414–420.
Gablasova, D., Brezina, V. and McEnery, T. (2017) Collocations in corpus-based language
learning research: Identifying, comparing, and interpreting the evidence. Language
Learning 67 (S1), 155–179.
Gablasova, D., Brezina, V. and McEnery, T. (2019) The Trinity Lancaster Corpus:
Development, description and application. International Journal of Learner Corpus
Research 5 (2), 126–158.
Granger, S. (1998) Prefabricated patterns in advanced EFL writing: Collocations and
formulae. In A.P. Cowie (ed.) Phraseology: Theory, Analysis, and Applications
(pp. 145–160). Oxford: Oxford University Press.
Granger, S. (2015) Contrastive interlanguage analysis: A reappraisal. International Journal
of Learner Corpus Research 1 (1), 7–24.
Granger, S. (2018) Formulaic sequences in learner corpora: Collocations and lexical
bundles. In A. Siyanova-Chanturia and A. Pellicer-Sánchez (eds) Understanding
Formulaic Language: A Second Language Acquisition Perspective (pp. 228–247). New
York, NY: Routledge.
Granger, S. and Bestgen, Y. (2014) The use of collocations by intermediate vs. advanced
non-native writers: A bigram-based study. International Review of Applied Linguistics
in Language Teaching 52 (3), 229–252.
Hasselgren, A. (1994) Lexical teddy bears and advanced learners: A study into the ways
Norwegian students cope with English vocabulary. International Journal of Applied
Linguistics 4 (2), 237–258.
Howarth, P. (1998) Phraseology and second language proficiency. Applied Linguistics
19 (1), 24–44.
Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., Rychlý, P. and
Suchomel, V. (2014) The Sketch Engine: Ten years on. Lexicography 1, 7–36.
Larsen-Freeman, D. (2006) The emergence of complexity, fluency, and accuracy in the oral
and written production of five Chinese learners of English. Applied Linguistics 27 (4),
590–619.
Laufer, B. and Waldman, T. (2011) Verb-noun collocations in second language writing: A
corpus analysis of learners’ English. Language Learning 61 (2), 647–672.
Leech, G. (2000) Grammars of spoken English: New outcomes of corpus-oriented research.
Language Learning 50 (4), 675–724.
Lehrer, D., Leschke, J., Lhachimi, S., Vasiliu, A. and Weiffen, B. (2007) Negative results in
social science. European Political Science 6 (1), 51–68.
Li, J. and Schmitt, N. (2009) The acquisition of lexical phrases in academic writing: A
longitudinal case study. Journal of Second Language Writing 18 (2), 85–102.
Li, J. and Schmitt, N. (2010) The development of collocation use in academic texts by
advanced L2 learners: A multiple case study approach. In D. Wood (ed.) Perspectives
on Formulaic Language: Acquisition and Communication (pp. 2–46). London & New
York: Continuum.
Llurda, E. (2016) Native speakers? English and ELT: Changing perspectives. In G. Hall
(ed.) The Routledge Handbook of English Language Teaching (pp. 51–64). London:
Routledge.
Long, M.H. (1990) Maturational constraints on language development. Studies in Second
Language Acquisition 12 (3), 251–285.
Love, R., Dembry, C., Hardie, A., Brezina, V. and McEnery, T. (2017) The Spoken BNC2014:
Designing and building a spoken corpus of everyday conversations. International
Journal of Corpus Linguistics 22 (3), 319–344.
Martinez, R. and Schmitt, N. (2012) A phrasal expressions list. Applied Linguistics 33 (3),
299–320.
Maurer-Stroh, P. (2005) ‘House-high favourites?’ A contrastive analysis of adjective-noun
collocations in German and English. ELOPE: English Language Overseas Perspectives
and Enquiries 2 (1–2), 57–64.
Molenda, M., Pęzik, P. and Osborne, J. (2018) Self-repetitions in learners’ spoken language:
A corpus-based study. In V. Brezina and L. Flowerdew (eds) Learner Corpus Research:
New Perspectives and Applications (pp. 90–111). London: Bloomsbury.
Ohlrogge, A. (2009) Formulaic expressions in intermediate EFL writing assessment. In
R. Corrigan, A. Moravcsik, H. Ouali and K.M. Wheatley (eds) Formulaic Language,
Volume 2: Acquisition, Loss, Psychological Reality, and Functional Explanations
(pp. 387–404). Amsterdam & Philadelphia: Benjamins.
Paquot, M. (2018) Phraseological competence: A missing component in university entrance
language tests? Insights from a study of EFL learners’ use of statistical collocations.
Language Assessment Quarterly 15 (1), 1–15.
Paquot, M. and Granger, S. (2012) Formulaic language in learner corpora. Annual Review
of Applied Linguistics 32, 130–149.
Read, J. (2004) Research in teaching vocabulary. Annual Review of Applied Linguistics 24
(1), 146–161.
Rychlý, P. (2008) A lexicographer-friendly association score. In P. Sojka and A. Horák (eds)
Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN
(pp. 6–9). Brno, Czech Republic: Masaryk University.
Saffran, J.R. (2003) Statistical language learning: Mechanisms and constraints. Current
Directions in Psychological Science 12 (4), 110–114.
Seargeant, P. (2013) Ideologies of nativism and linguistic globalization. In S.A. Houghton
and D.J. Rivers (eds) Native-Speakerism in Japan: Intergroup Dynamics in Foreign
Language Education (pp. 231–242). Bristol: Multilingual Matters.
Shih, H.-H.R. (2000) Collocation deficiency in a learner corpus of English: From an overuse
perspective. Proceedings of the 14th Pacific Asia Conference on Language, Information
and Computation (pp. 281–288). Tokyo: Waseda University International Conference
Center.
Shin, D. and Nation, P. (2008) Beyond single words: The most frequent collocations in
spoken English. ELT Journal 62 (4), 339–348.
Siyanova-Chanturia, A. and Spina, S. (2019) Multi-word expressions in second language
writing: A large-scale longitudinal learner corpus study. [Advance online publication.]
Language Learning 70 (2), 420–463.
Sonbul, S. (2015) Fatal mistake, awful mistake, or extreme mistake? Frequency effects on
off-line/on-line collocational processing. Bilingualism: Language and Cognition 18 (3),
419–437.
Takač, V.P. and Lukač, M. (2013) How word choice matters: An analysis of adjective-noun
collocations in a corpus of learner essays. Jezikoslovlje 14 (2–3), 385–402.
Thewissen, J. (2015) Accuracy Across Proficiency Levels: A Learner Corpus Approach.
Louvain-la-Neuve, Belgium: Presses universitaires de Louvain.
Wolter, B. and Gyllstad, H. (2013) Frequency of input and L2 collocation processing: A
comparison of congruent and incongruent collocations. Studies in Second Language
Acquisition 35 (3), 451–482.
Wray, A. (2000) Formulaic sequences in second language teaching: Principle and practice.
Applied Linguistics 21 (4), 463–489.
Wray, A. (2013) Formulaic language. Language Teaching 46 (3), 316–334.
