System 74 (2018) 21–37
Contents lists available at ScienceDirect
System
journal homepage: www.elsevier.com/locate/system
Effects of task complexity and working memory capacity on
L2 reading comprehension
Jookyoung Jung
Center for English Language Education, Korea University, 145 Anam-ro, Anam-dong, Seongbuk-gu, Seoul, 02841, Republic of Korea
Article history: Received 6 July 2017; Received in revised form 22 January 2018; Accepted 12 February 2018

Abstract
The present study investigated whether cognitive task complexity affects second language reading comprehension and whether working memory capacity moderates the
influence of task complexity. Fifty-two Korean undergraduate students were randomly
assigned to either a simple or a complex condition and read two TOEFL passages while
answering multiple-choice reading comprehension questions. Unlike the simple versions,
which presented coherent texts, the complex versions contained texts whose paragraphs
were disarranged and additionally required participants to order them coherently. A forward
digit span test and a nonword repetition test were used to measure the participants’
phonological short-term memory, and a backward digit span test and an operation span
test were employed to assess their complex working memory. The results revealed that
task complexity did not affect reading comprehension scores, although participants
perceived the complex tasks as significantly more demanding. Also, under the complex
condition, participants benefited from higher nonword span scores when answering
reading comprehension questions.
© 2018 Published by Elsevier Ltd.
1. Introduction
In the 1980s, task-based language teaching (henceforth, TBLT), in which a task is defined as a meaningful and goal-oriented
activity using the target language (TL), was proposed as a potential approach to second language (L2) instruction, and it has
attracted growing attention ever since (Long, 2016). In order to facilitate maximum L2 learning, individual pedagogic tasks
must be sequenced so as to match the learner's developmental level. While various criteria have been suggested for analyzing
and grading tasks thus far, task complexity, i.e., task-induced demands imposed on learners' limited cognitive resources, has
received increasing attention from researchers. In particular, Skehan's Limited Capacity Model (Skehan, 2009) and Robinson's
Cognition Hypothesis (Robinson, 2011) have exerted a substantial impact on the recent empirical studies on whether and how
manipulating cognitive task complexity affects learners' oral or written production and L2 development (e.g., Michel, 2013;
Révész, 2011). However, the influence of task complexity on receptive language skills, such as L2 reading, has long been
neglected, presumably due to the absence of a theoretical agenda that can be called upon (Gilabert, Manchón, & Vasylets, 2016).
That is, it is difficult to find a readily applicable conceptual framework that predicts and explains how task complexity would
affect learners' L2 reading performance.
An additional gap in the literature is the lack of research into the moderating effects of individual differences in TBLT. Breen
(1987) asserts that a priori task design (in his term, ‘task-as-workplan’) must be tempered by what learners bring to the tasks,
E-mail address: jookyoungjung@korea.ac.kr.
https://doi.org/10.1016/j.system.2018.02.005
0346-251X/© 2018 Published by Elsevier Ltd.
i.e., individual differences, resulting in unique and idiosyncratic task engagement (‘task-as-process’). Robinson (1995, 2001)
also includes learner factors under the category of Task Difficulty, and highlights the need to investigate how learners' individual differences interact with task features. In a similar vein, Norris, Bygate and van den Branden (2009) suggest that “as
increasing empirical light is shed on the learner side of the equation, it is also likely that the interactions of particular learners
with particular tasks will become much more predictable” (p. 245). Researchers have been keenly aware of this issue, and a
variety of learner-variables have been addressed in the studies of task effects, such as working memory capacity (e.g., Baralt,
2010; Kim, Payant, & Pearson, 2015; Kormos & Trebits, 2011). Along this line of research, the present study explores whether
and how task complexity influences L2 reading comprehension, and what role working memory capacity plays as a
moderating variable.
2. Literature review
2.1. Task complexity and L2 reading
In the present study, Khalifa and Weir’s (2009) cognitive processing model for reading comprehension was considered
an ideal theoretical basis. Unlike previous models of reading (e.g., Kintsch & van Dijk, 1978; Perfetti, 1999; Stanovich,
1980) with a scope restricted to cognitive processes of reading comprehension, such as word recognition, syntactic
parsing, textual understanding, interpretation, and integration, Khalifa and Weir's framework incorporates and stresses
the role of metacognitive processes in reading comprehension. The highlighted role of metacognition enables the model
to account for the task effects on reading, by viewing reading as a cognitive process that constantly adapts to the goal of
the reading task. The model comprises three components: metacognitive activity, the central core, and the
knowledge base (see Fig. 1). Among these, metacognitive activity has particular relevance to this study, as it pertains to
regulating the entire reading process in such a way as to successfully perform the given reading task. Metacognitive activity
involves setting goals, monitoring and remediating text understanding where necessary. The goal-setter determines the
type of reading comprehension that should be achieved and the speed and scope of reading required. More specifically,
according to the purpose of reading, the reader engages in either careful or expeditious reading, which may take place at
either local or global level. Local comprehension refers to extracting propositions at a lexical or clausal level, whereas
global comprehension entails understanding the structure of a text as a whole (Kintsch & van Dijk, 1978). Careful reading
may take place at local or global level and is usually based on slow, sequential and incremental reading for comprehension. By contrast, expeditious reading includes quick, selective and efficient reading, as in the case of skimming,
search-reading or scanning. Monitoring occurs in all stages of reading, from checking word recognition to evaluating the
text-level representation and extracting the writer's intentions and text structure. As a result of monitoring, remediation
of text understanding takes place, if necessary.
In the context of TBLT, it seems possible to assume that cognitive demands of L2 reading tasks may be manipulated
along the degree to which careful reading is required for successful task completion. For instance, if a reading task
necessitates thorough and scrupulous processing of textual information, the cognitive demands may be greater compared
to its counterpart that can be completed with shallow and superficial processing of the same text (Craik & Lockhart, 1972;
Craik & Tulving, 1975). While there has been little research into the influence of task demands on L2 reading performance
(Grabe, 2009), a few studies provide empirical support for the greater task demands induced by pronounced importance
of careful reading (e.g., Horiba, 2000; Jung, 2012; Taillefer, 1996). For example, Horiba (2000) investigated how L2
readers' control over their reading might differ from that of L1 readers across tasks that entail differential processing
demands using think-aloud protocols. In her studies, it was repeatedly found that tasks that require more careful reading,
such as reading for coherence, encourage L2 readers to engage in more intensive text-based decoding as well as top-down inferring processes compared to simpler counterparts such as reading freely. The quality of the free written
recalls, however, was found comparable across task conditions, implying the marginal influence of task complexity on L2
reading comprehension outcomes. The distinct importance of careful reading in more complex tasks can also be found in
research into the role of L2 proficiency in L2 reading comprehension. In Taillefer’s (1996) study, for instance, the role of L2
proficiency was greater in a complex task, i.e., reading to prepare for a debate (receptive reading), in comparison to a
simple task, i.e., reading to locate a keyword (scanning). Taillefer assumed that receptive reading is more complex than
scanning, as scanning is by and large a simple cognitive matching task, searching for what is sought and what is already
given. In other words, when the text needs to be processed at a deeper level through careful reading for successful task
completion, the task seems to pose greater processing demands on L2 readers, calling for more attentive and accurate
textual analysis.
2.2. Validation of the construct of task complexity
As empirical research into the influence of task complexity accumulates, diverse attempts have been made in order to
improve methodological rigour in this line of research, such as including a control group (Révész, 2009), designing a
continuous complexity scale (Kim, 2012), employing a distractor task (e.g., Nuevo, Adams, & Ross-Feldman, 2011) and
Fig. 1. Cognitive processing model of reading comprehension (Khalifa & Weir, 2009, p.43).
comparing learners' data with native speakers' baseline data (e.g., Michel, 2013). Robinson’s (2001) questionnaire on
overall perceptions of task difficulty has enjoyed popularity among researchers, perhaps owing to its convenience of
administration. Learners are presented with a set of items and asked to rate each of them on a seven- to nine-point
Likert scale. What has typically been done is to conduct a descriptive analysis, treating the collected ratings as interval
data.
For a more rigorous validation of the construct of task complexity, recent studies tend to incorporate additional
methods for assessing task complexity, such as a subjective time estimation task. In this task, learners
are asked to judge the time taken for task completion in the absence of an external timing device, and it has
consistently been shown that estimated time duration becomes less accurate as the cognitive load of the task increases (Block,
Hancock, & Zakay, 2010). Thomas and Weaver (1975) provided the theoretical basis for this method. That is, as nontemporal
task demands increase, less attention is left for processing temporal information, and as a result estimated time duration
becomes more inaccurate. More specifically, under the prospective paradigm where participants are aware of the upcoming
time estimation to be made at the outset of the task, it has consistently been found that estimated-to-real duration ratio
decreases with increasing cognitive load. By contrast, under the retrospective paradigm where participants are unaware of
the subjective time estimation task until it has to be made, the estimated-to-real duration ratio increases with cognitive load
(Fink & Neubauer, 2001).
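The estimated-to-real duration ratio described above reduces to a simple computation. As a minimal sketch in Python, with purely hypothetical durations rather than data from any study cited here:

```python
def duration_ratio(estimated_min: float, actual_min: float) -> float:
    """Estimated-to-real duration ratio used as an index of cognitive load.

    Under the retrospective paradigm, the ratio is expected to rise
    (overestimation) as nontemporal task demands increase.
    """
    if actual_min <= 0:
        raise ValueError("actual duration must be positive")
    return estimated_min / actual_min

# Hypothetical example: a learner judges a 20-minute task to have taken 25 minutes.
print(duration_ratio(25.0, 20.0))  # 1.25 -> overestimation
```

A ratio above 1.0 thus indicates overestimation of elapsed time, the pattern associated with higher cognitive load under the retrospective paradigm.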
For instance, Baralt (2010) employed not only a perception questionnaire but also a retrospective time judgment task
as an additional source for estimating cognitive complexity. In her study, learners were asked to estimate how long they
believed it took them to complete each task, postulating that the greater the demands imposed on the learner, the more
time he or she would judge had passed (Paas, Tuovinen, Tabbers, & Van Gerven, 2003). The findings of this study showed
that the retrospective time judgment measure matched Baralt's operationalisation of task complexity, while a perception
questionnaire failed to do so. More specifically, participants who performed the complex version estimated the time
taken for task completion to be significantly longer than the time actually taken. It should be noted, however, that Baralt
analysed subtracted values (the difference between estimated and real time duration), not estimated-to-real duration
ratios, which could have lowered the internal validity of the results. Similarly, Sasayama (2016) also utilized subjective
time estimation in addition to self-reported perceptions of task difficulty and dual-task methodology. She found that only
large differences in the number of task elements were detectable, while smaller differences did not make a significant
change to the level of cognitive complexity. Based on this finding, Sasayama underscores the importance of independent
measures of task complexity in order to attest to whether designed task features exercise putative effects on task
complexity.
2.3. Working memory capacity and L2 reading
Working memory capacity has been explored extensively as an important cognitive construct in the field of cognitive
psychology. The present study adopted Baddeley’s (2000, 2003; Baddeley & Hitch, 1974) framework, considering its
substantial impact on studies on the role of working memory capacity in L2 reading. According to the framework,
working memory consists of the central executive, a limited attentional control system, and two slave systems, the
phonological loop and the visuo-spatial sketchpad (see Fig. 2). Among the three components of working memory,
phonological short-term memory and complex working memory have attracted interest from researchers in the field of
L2 reading. The phonological loop subsumes two subcomponents: a temporary storage system that holds phonological information for a few seconds before it decays, and a subvocal rehearsal system in which the information
stored in short-term memory is maintained and refreshed (Baddeley, 2003). As the phonological loop in particular
pertains to the retention of sequential information, its function is typically measured with tasks that require the immediate repetition of sequences of digits or words/nonwords in the order of presentation (Baddeley, 2000). The
central executive is responsible for attentional control, carrying out conscious processing, monitoring, intentional
learning, and problem solving (Baddeley, 2003). Thus, the central executive is deemed the principal factor affecting
individual differences in working memory span. Complex working memory span is often assessed with various types of
Fig. 2. Current multi-component model of working memory (Baddeley, 2003, p.203).
span tasks, most notably reading span tasks (e.g., Daneman & Carpenter, 1980), which require learners to process information while retaining it in short-term memory.
Previous studies have supported the importance of working memory capacity in L2 reading comprehension (e.g., Alptekin
& Erçetin, 2009, 2011; Harrington & Sawyer, 1992; Leeser, 2007; Waters & Caplan, 1996). In Harrington and Sawyer’s (1992)
seminal research, the participants who scored higher on the L2 reading span test did better on the L2 reading comprehension
measures, whereas scores on the L2 digit and word span tests only weakly correlated with L2 reading comprehension. This
finding led Harrington and Sawyer to conclude that complex working memory, rather than phonological short-term memory,
plays a crucial role in L2 reading comprehension. In this study, however, the reading span test was constructed in the L2, which
could have measured a construct overlapping with the reading comprehension test. It was also revealed that the participants'
performances on the L1 digit and word span tests were, overall, superior to those on the L2 measures, showing a strong effect for
language. Thus, in order to unveil the influence of working memory capacity on L2 reading comprehension, it appears
desirable to use domain- and language-independent measures.
Whether working memory is language-dependent was explored in Osaka and Osaka (1992) and Osaka, Osaka, and Groner
(1993). In Osaka and Osaka's research, L1 Japanese and L2 English participants performed Daneman and Carpenter’s (1980)
reading span test in L1 and L2 versions. The results showed that there were significant correlations between L1 and L2 reading
span scores, and between Daneman and Carpenter's original version and the Japanese version. In their follow-up research,
Osaka et al. (1993) again found a significant correlation between L1 German and L2 French versions of the reading span test.
Based on their findings, Osaka et al. suggested that complex working memory might be independent of any specific language
proficiency.
It seems worth mentioning, however, that in contrast to the considerable amount of attention paid to the reading span
task, whether in the L1 or the L2, other types of complex working memory indices, such as a counting span task (e.g., Case, Kurland, &
Goldberg, 1982) or an operation span task (e.g., Turner & Engle, 1989), have rarely been preferred when exploring the
contribution made by working memory to L2 reading comprehension, although they may provide a more language-independent index of complex working memory. It is also noteworthy that phonological short-term memory
has been largely neglected in the field of L2 reading research, despite its crucial role in sentence processing
(e.g., Gathercole & Baddeley, 1990). It therefore seems advisable to include measures of both phonological short-term memory
and complex working memory when investigating the contribution made by working memory capacity to L2 reading
comprehension.
2.4. Research questions
The review of previous literature revealed that the effects of task complexity on L2 reading performance have remained
largely unexplored, even though tasks are usually conceptualized as holistic activities subsuming diverse language skills,
including L2 reading. In addition, complex working memory has received near-exclusive attention in L2 reading research,
predominantly measured with a reading span task, whereas the role of phonological short-term memory has often not been
examined when investigating the relationship between working memory and L2 reading comprehension. To fill these gaps in
the literature, the following research questions were addressed in the present study:
1) To what extent does task complexity affect L2 reading comprehension?
2) To what extent does working memory capacity moderate the effects of task complexity on L2 reading comprehension?
3. Methodology
3.1. Design
The data for the present study, except for the working memory data, were collected as part of a larger study (Jung, 2016)
that explored the effects of manipulating cognitive task demands on Korean undergraduate students' L2 English reading
Fig. 3. Experimental design and procedure.
comprehension and learning of target constructions contained in the texts. Hence, the influence of task complexity on L2
reading comprehension is reported in this study as well. As illustrated in Fig. 3, fifty-two participants were randomly assigned
to either the simple or complex condition and took part in two reading sessions. In each reading session, they read a TOEFL
passage while simultaneously answering reading comprehension questions.
3.2. Participants
The participants included 14 male and 38 female undergraduate students enrolled at a university in Korea. Their L1 was
Korean, and their average age was 22.84 (SD = 1.94). Their average onset age of English learning was 8.73
years (SD = 2.18), and 11 students reported that they had stayed in English-speaking countries, such as the US, Australia,
Canada, the Philippines, and Malaysia (Mean = 6.73 months, SD = 4.88 months). To ensure the homogeneity of participants’
English ability, their English proficiency was measured with the Reading and Use of English section of a practice Cambridge
English: Proficiency (CPE) test developed and provided by University of Cambridge ESOL Examinations. Based on the scores,
stratified random sampling was applied in order to reduce sampling error and ensure equivalence among the groups in terms
of English proficiency. More specifically, strata were created based on the CPE scores, and the participants were randomly
taken from each stratum to form the two experimental groups.
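The stratified assignment described above can be sketched as follows. This is an illustrative Python reconstruction under assumed parameters (the participant IDs, scores, and number of strata are all hypothetical, not taken from the study):

```python
import random

def stratified_assign(scores, n_strata=4, seed=0):
    """Rank participants by proficiency score, cut the ranking into strata,
    and randomly assign (roughly) half of each stratum to each condition."""
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)          # ids ordered by score
    stratum_size = max(1, len(ranked) // n_strata)
    groups = {"simple": [], "complex": []}
    for i in range(0, len(ranked), stratum_size):
        stratum = ranked[i:i + stratum_size]
        rng.shuffle(stratum)
        half = len(stratum) // 2
        groups["simple"].extend(stratum[:half])
        groups["complex"].extend(stratum[half:])
    return groups

# Hypothetical CPE scores for eight participants
scores = {f"p{i}": s for i, s in enumerate([35, 42, 51, 38, 60, 47, 55, 40])}
groups = stratified_assign(scores)
```

Because each stratum contributes equally to both groups, the two conditions end up with comparable proficiency distributions, which is the point of stratifying before randomizing.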
3.3. Texts
For this study, two expository texts were selected from passages used for previous TOEFL iBT tests developed by the ETS. In
order to prevent confounding influence of the participants' topic familiarity on their reading comprehension (e.g., Leeser,
2007), texts that appeared potentially unfamiliar to the participants were carefully selected. Participants’ familiarity with
the topics of the texts was assessed with two 5-point Likert-scale post-reading questionnaire items (i.e., Item 1: I thought this
reading topic was familiar. Item 2: I had some background knowledge about the reading topic.). Text 1 explained the formation,
extraction, and refinement of petroleum resources and the challenges and dangers posed by their use; Text
2 reviewed fossil evidence of the evolutionary explosion that occurred during the Cambrian period. Text 1 contained 682 words
and six paragraphs, whereas Text 2 consisted of 699 words and seven paragraphs. The average readability of the two texts,
calculated with various indices such as Flesch-Kincaid grade level, Gunning-Fog score, Coleman-Liau index, SMOG index, and
Automated Readability index, was 11.6 and 13.4, respectively. Each readability index corresponds to the number of years of
education (based on the US education system) required to understand the text. According to their average readability, the
texts required an upper-intermediate level of English proficiency and were thus considered appropriate for the participants of
this study. In order to prevent ordering effects, the presentation order of the texts was counter-balanced within each task
condition.
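The averaging of grade-level indices described above can be illustrated with a short sketch. The Flesch-Kincaid formula below is the standard published one; the sentence and syllable counts and the individual index values are hypothetical, not those of the two texts:

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level: years of US schooling needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def average_readability(grade_levels) -> float:
    """Average several grade-level indices (Flesch-Kincaid, Gunning-Fog,
    Coleman-Liau, SMOG, ARI) into a single readability estimate."""
    return round(sum(grade_levels) / len(grade_levels), 1)

# Hypothetical counts and index values for a 682-word text:
fk = flesch_kincaid_grade(words=682, sentences=34, syllables=1150)
print(average_readability([11.9, 12.1, 10.8, 11.6, 11.6]))  # 11.6
```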
3.4. Reading task and task complexity manipulation
The reading task in this study was similar to what learners normally do when taking a paper-based reading comprehension test, that is, reading a printed text while answering related multiple-choice comprehension questions using a pencil.
The multiple choice reading comprehension items were also taken from TOEFL tests developed and validated by ETS (Freedle
& Kostin, 1999). The items asked participants to identify factual/negative-factual information, make inferences, understand
rhetorical purpose, recognize vocabulary meaning, determine reference, simplify/paraphrase a sentence, insert a sentence
into a paragraph, and select main ideas of the text (Educational Testing Service, 2012). As in the TOEFL format, the texts were
divided into five segments, each comprising one or two paragraphs, and followed by reading comprehension questions
related to each segment. There were 14 multiple-choice comprehension items for each text. In order to avoid confounding
effects induced by the reading comprehension items on the level of task complexity, the same reading comprehension items
were used for both simple and complex conditions. Following the original TOEFL scoring system, 1 point was given to each
item, while the last item, which measured global text understanding, received 2 points. The maximum score for reading
comprehension for each text was 15.
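The scoring scheme above (13 items worth 1 point, a final global item worth 2, for a maximum of 15) can be sketched as follows; the answer key shown is a placeholder, not the actual TOEFL key:

```python
def score_reading(responses, answer_key):
    """Score one text's 14 multiple-choice items: 1 point per item,
    except the final global-understanding item, which is worth 2 points."""
    assert len(responses) == len(answer_key) == 14
    total = 0
    for i, (resp, key) in enumerate(zip(responses, answer_key)):
        points = 2 if i == len(answer_key) - 1 else 1
        if resp == key:
            total += points
    return total  # maximum: 13 * 1 + 2 = 15

key = list("ABCDABCDABCDAB")  # hypothetical answer key
print(score_reading(key, key))  # 15
```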
Under the complex condition, the five segments were jumbled and presented to participants in a mixed order. That is, after
reading the text and answering comprehension questions, participants in the complex group also had to arrange the segments into a correct order to make a coherent text (see Appendix). The complex version was judged as cognitively more
demanding in that readers' comprehension was substantially influenced by the degree of clarity and coherence of text
structure (Meyer & Ray, 2011). Also, according to Khalifa and Weir’s (2009) cognitive processing model, the additional text-ordering task was expected to encourage more careful reading in order to arrange the segments logically and coherently,
posing a greater amount of demands on the participants in comparison to the simple version that did not involve such
additional task requirements.
To test the validity of task complexity manipulation, the participants’ retrospective and subjective time estimations were
collected to be used as an additional source for estimating the cognitive/mental demand imposed on learners (Baralt, 2010;
Révész, 2014). As the time estimation task was conducted under a retrospective paradigm (Fink & Neubauer, 2001),
participants were unaware of the upcoming duration judgement task until it had to be done. As such, there was no time limit
for task completion, and subjective time estimations were only collected after the first treatment session. Unlike the prospective paradigm (informed time estimation), wherein estimated-to-real duration ratio decreases with increasing cognitive
load, it was expected that the duration ratio would increase after performing cognitively more demanding tasks. In addition,
in order to triangulate subjective time estimations, two additional 5-point Likert-scale post-reading questionnaire items
asked the participants to report their perceived level of task difficulty (e.g., Item 3: I thought this task was difficult. Item 4: I
thought this task was demanding.).
3.5. Working memory measurements
In this study, a forward digit span test and a nonword repetition test were used to measure participants' phonological
short-term memory. In addition, a backward digit span test and an automated operation span test were used to measure
participants’ complex working memory.
3.6. Forward digit span test (DS)
In the digit span test, participants were presented with sequences of unrelated digits in an automated
PowerPoint slide show (adopted from Brunfaut & Révész, 2015). Each digit stayed on the slide for 1 s, and set sizes ranged from
3 to 11 digits (2 sets for each length, 18 sets in total), presented in increasing order. Digits were repeated across sets but not
within sets, and all digits were used approximately equally across the test. Participants were instructed to recall the digits from
each set on a response sheet, with 10 s allowed for recalling each set. The maximum set size recalled correctly at least once
served as the digit span score. The test took 7–8 min, and Cronbach's alpha was .76.
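The span scoring rule (the largest set size recalled correctly at least once) can be expressed compactly. A sketch with invented recall outcomes, not participant data:

```python
def digit_span_score(recall_results):
    """recall_results maps set size -> list of per-set booleans (correct recall).
    The span score is the largest set size recalled correctly at least once."""
    recalled = [size for size, outcomes in recall_results.items() if any(outcomes)]
    return max(recalled, default=0)

# Hypothetical outcomes: two sets per length
results = {3: [True, True], 4: [True, True], 5: [True, False],
           6: [False, True], 7: [False, False], 8: [False, False]}
print(digit_span_score(results))  # 6
```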
3.7. Nonword repetition test (NWR)
For this study, a nonword repetition test was developed in Korean. More specifically, nonsense words that conformed to
the phonotactic rules of Korean were created and then presented to participants in an automated PowerPoint slide show. The
test stimuli consisted of 32 nonwords, each containing 4 to 11 syllables (4 sets for each syllable length). Each nonword was
aurally delivered to participants in a random order, and 10 s were allowed for oral recall. Each of the nonword recalls was
scored either correct or incorrect, and the maximum number of syllables that participants correctly recalled at least twice for
each syllable length was the score for this test. The test was piloted on seven Korean graduate students to determine
appropriate syllable lengths and the reliability of the test. They were also asked to rate the wordlikeness of each nonword on a
5-point Likert scale from 1 (very likely to pass for a real Korean word) to 5 (very unlikely to pass for a real Korean word). This
process ensured that the nonword stimuli were low in wordlikeness, so that participants would be less likely
to retrieve similar phonological structures from long-term memory and would instead have to depend on short-term phonological
representations to mediate nonword repetition (Gathercole, 1995). The mean wordlikeness value was 2.23 (SD = .74). Seven
nonwords that were rated relatively high in wordlikeness (more than 1 SD above the mean) were replaced with other, less
wordlike nonwords. The test was administered individually and took about 9–10 min to complete. Cronbach's alpha for
this test was .73.
3.8. Backward digit span test (BDS)
The design, structure, and procedure of the backward digit span test were the same as for the forward digit span test
(adopted from Brunfaut & Révész, 2015), except that participants were instructed to recall the digits in reverse order.
The maximum set size recalled correctly at least once served as the backward digit span score. The test took 7–8 min, and
Cronbach's alpha was .81.
3.9. Automated operation span test (OSPAN)
The operation span test, created by Turner and Engle (1989), requires participants to solve a series of math problems while
remembering a set of unrelated letters or words (see Fig. 4). The source file of the automated operation span test was obtained
from Attention and Working Memory Lab at Georgia Tech (Redick et al., 2012; Unsworth, Heitz, Schrock, & Engle, 2005). The
test began with two practice sessions to familiarize participants with the math operation and word/letter recalling tasks and
to calculate the individual differences in the time required to solve math problems. The time taken to solve math problems
(plus 2.5 SD, determined after extensive piloting; Unsworth et al., 2005) was used as the time limit for each math problem
session for that individual. In order to guarantee that participants engaged in a trade-off between storage (remembering
word/letter strings) and processing (solving math problems), an 85% accuracy criterion on the math operations was
required. The real test session consisted of three sets of each size, with set sizes ranging from 3 to 7 (75 letters and 75 math problems in total).
The order of set size was random. The total number of correct letter recalls was used as the OSPAN index. The test took
approximately 20 min to complete. The Cronbach's alpha was .78 (Unsworth et al., 2005).
Fig. 4. Example of slides used in the OSPAN test (Source: Adopted from Unsworth et al., 2005, p. 500).
3.10. Questionnaires
Participants were asked to answer a background questionnaire and two post-reading questionnaires. The aim of the
background questionnaire was to collect information about participants’ demographics and English learning experiences. The
post-reading questionnaires asked participants to provide their retrospective subjective time estimation taken to complete
the given reading task, perceived level of task difficulty, and familiarity with the topic of the reading text. All questionnaires
were administered in Korean.
3.11. Procedure
The data were collected over four weeks. All participants completed the background questionnaire and the L2 proficiency test
(CPE) in the first session. One week later, they took part in two reading sessions on separate days. In each of the sessions, they
read the given text and answered the post-reading questionnaire. In the last session, the participants carried out working
memory measurements. Each session took approximately 45 min to an hour. The experimental sessions were conducted in a
computer-laboratory at a university in Korea.
3.12. Analysis
SPSS 22.0 for Mac was used to examine the reliability of the tests and to compute descriptive and correlational statistics. More specifically, the reliability of the different tests was determined using Cronbach's alpha, and the level
of significance for this study was set at an alpha level of .05.
In order to answer the research questions, the statistical program R, version 3.3.0 (R Development Core Team,
2016), was used (Baayen, 2008). One particular strength of mixed-effects modeling is that it can account for the potential idiosyncrasies of individual participants and items (Rogers, 2016), and thus allows researchers to make a “simultaneous generalization
of the results on new items and new participants” (Gagné & Spalding, 2009, p. 25). Considering that this study included
participant- and item-related factors, mixed-effects modeling was considered a robust and appropriate method for data
analysis.
Data analyses were conducted by constructing various linear mixed-effects models using the lmer function in the lme4 package (Bates, Maechler, Bolker, & Walker, 2015). When analyzing the effects of task complexity on reading comprehension scores, task complexity, i.e., the independent variable, was entered as the fixed effect. In addition, to account for uncontrollable peculiarities nested in each participant and/or item, subjects and items were included as random effects. The modeling started with a null model that contained only the random intercepts for subjects and items. Task complexity was then added to the null model and tested to see whether its inclusion significantly improved model fit, using likelihood ratio tests with the χ² statistic. When exploring the moderating role of working memory capacity, the working memory indices were added one by one as an additional fixed effect to the models that already contained task complexity as a fixed effect. Likelihood ratio tests were then performed to compare the reduced model with the augmented model that additionally contained a working memory measure. If a significant difference was found, post-hoc maximal random effects structures were produced following Barr, Levy, Scheepers, and Tily (2013), in order to examine the unique contribution made by each of the fixed effects. Whenever a maximal model failed to converge, random parameters were dropped one at a time, starting with the one accounting for the least variance, until convergence was reached (Cunnings & Sturt, 2014).
As linear model summaries provide t statistics without p-values, absolute t-values above 2.0 were treated as significant (Gelman & Hill, 2007). Effect sizes for the models were computed using the r.squaredGLMM function in the MuMIn package (Barton, 2015). Following Plonsky and Oswald (2014), R² values of .06, .16, and .36 were interpreted as small, medium, and large effect sizes, respectively.
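The model-comparison logic above can be sketched as follows. This is a minimal illustration in Python (the study itself ran the models in R with lme4): the likelihood ratio statistic is twice the difference between the log-likelihoods of the null and augmented models, referred to a χ² distribution with degrees of freedom equal to the number of added parameters. The log-likelihood values below are hypothetical, for illustration only.

```python
from scipy import stats

def likelihood_ratio_test(loglik_null, loglik_full, df_diff):
    """Likelihood ratio test comparing a null model with an augmented
    model: the statistic is twice the log-likelihood difference and is
    evaluated against a chi-square distribution with df_diff degrees
    of freedom."""
    chi2 = 2.0 * (loglik_full - loglik_null)
    p = stats.chi2.sf(chi2, df_diff)  # upper-tail p-value
    return chi2, p

# Hypothetical log-likelihoods: a null model (random intercepts only)
# vs. a model adding task complexity as one fixed effect (1 extra df).
chi2, p = likelihood_ratio_test(-120.40, -120.25, df_diff=1)
print(round(chi2, 2), round(p, 2))  # 0.3 0.58 -> no significant improvement
```

A non-significant result, as in this hypothetical case, means the added fixed effect is not retained, which is the decision rule applied throughout the Results section.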
4. Results
4.1. Preliminary analysis
Prior to answering the research questions, some preliminary steps were taken to ensure the internal validity of the results.
The following methodological concerns were taken into consideration: equivalence between the groups, potential influence
of topic familiarity on reading comprehension scores, and validation of task complexity.
4.2. Equivalence between groups
English proficiency scores were analyzed to see whether the simple and complex groups were equivalent. The mean score was 14.62 for the participants assigned to the simple condition (SD = 4.97, 95% CI [13.85, 15.39]) and 13.96 for those in the complex condition (SD = 4.98, 95% CI [13.19, 14.73]). To check the equivalence of English proficiency between the groups, a mixed-effects model was constructed with the CPE scores as the dependent variable, Complexity as a fixed effect, and Subject and Item as random effects. When compared with a null model that contained only the random effects, the inclusion of Complexity as a fixed effect did not make a significant difference, χ²(1) = .29, p = .59. In other words, the two groups did not differ significantly in English proficiency.
4.3. Effects of topic familiarity
The maximum value for each item that measured topic familiarity was 7. The responses to the two items correlated significantly with each other, Text 1: r(52) = .68, p < .01, Text 2: r(52) = .56, p < .01, suggesting that the items assessed overlapping constructs. The mean summed value of the two items was 7.08 for Text 1 (SD = .44, 95% CI [6.54, 7.62]) and 5.55 for Text 2 (SD = .38, 95% CI [5.01, 6.09]). In order to examine whether participants' topic familiarity had an impact on their reading comprehension scores, likelihood ratio tests were conducted. The null model contained Subject and Item as random effects, and the augmented model additionally included topic familiarity as a fixed effect. The dependent variable was the reading comprehension score for Text 1 and Text 2. Adding topic familiarity did not significantly improve the null models, Text 1: χ²(1) = .01, p = .91, Text 2: χ²(1) = 2.25, p = .13. In short, the participants' topic familiarity with the texts did not affect their reading comprehension scores.
4.4. Validation of task complexity
To validate the operationalization of task complexity, all participants were asked to judge the time taken to complete each task immediately after task completion. As mentioned earlier, only the time estimations made after the first task were analyzed. In order to examine whether the subjective time estimations differed as a function of task complexity, estimated-to-actual duration ratios were calculated by dividing the estimated time by the real time taken to complete the given task. A duration judgment ratio higher than 1 indicates that the participant overestimated the time taken to complete the task. In the retrospective time estimation paradigm, the duration judgment ratio is expected to increase with greater cognitive load.
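The ratio calculation can be made concrete with a short sketch (Python here for illustration; the numbers are hypothetical, not taken from the data):

```python
def duration_judgment_ratio(estimated_minutes, actual_minutes):
    """Retrospective duration judgment ratio: values above 1 mean the
    participant overestimated time on task, which the retrospective
    time estimation paradigm treats as a sign of heavier cognitive load."""
    return estimated_minutes / actual_minutes

# Hypothetical participant: felt the task took 35 min, actually took 30 min.
ratio = duration_judgment_ratio(35.0, 30.0)
print(round(ratio, 2))  # 1.17 -> overestimation, i.e., greater perceived load
```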
As shown in Table 1, for both Text 1 and Text 2, the duration judgment ratios for the complex versions were on average larger than those for the simple versions. Independent-samples t-tests on the duration judgment ratios across the simple and complex conditions also revealed significant effects of task complexity for both texts, Text 1: t(50) = 2.86, p = .01, 95% CI [.04, .22]; Text 2: t(50) = 3.85, p < .01, 95% CI [.11, .36]. Cohen's ds were .79 and 1.09, respectively, which are considered medium and large effect sizes (Plonsky & Oswald, 2014). It seems noteworthy that the actual time on task was comparable between the two conditions, χ²(1) = 1.91, p = .17, R² = .02. In other words, regardless of the actual time on task, the participants' subjective estimates of the time taken for task completion were significantly greater in the complex condition, suggesting that the complex tasks induced heavier cognitive load than the simple tasks.
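As a check on the reported effect sizes, Cohen's d for two independent groups can be computed from the Table 1 descriptives. The sketch below (in Python, for illustration) uses the standard pooled-SD formula; that the author used exactly this variant of d is an assumption, though it reproduces the reported value for Text 1.

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups using the pooled
    standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Text 1 duration judgment ratios from Table 1:
# complex M = 1.16, SD = .17; simple M = 1.03, SD = .16; n = 13 per group.
d = cohens_d(1.16, 0.17, 13, 1.03, 0.16, 13)
print(round(d, 2))  # 0.79, matching the reported Cohen's d for Text 1
```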
To triangulate the subjective time estimation data, two post-reading questionnaire items additionally asked the participants to rate the perceived level of task difficulty. Cronbach's alpha for the items was .63 for Text 1 and .75 for Text 2. Descriptive statistics for the responses to the two items are presented in Table 2.
In order to examine whether there were significant differences between the simple and complex conditions in the participants' ratings of perceived task difficulty, likelihood ratio tests were conducted on the reported mental effort responses. Null models included only the random effects (i.e., Subject and Item), whereas the augmented models contained Complexity as a fixed effect. Significant differences were found for both texts, Text 1: χ²(1) = 4.05, p = .04, R² = .04; Text 2: χ²(1) = 8.27, p < .01, R² = .08. In other words, in line with the results of the subjective time estimations, the participants perceived the complex tasks as significantly more difficult than the simple tasks.
4.5. Effects of task complexity on L2 reading
First, the descriptive statistics for the reading comprehension scores of each group are displayed in Table 3. Reading comprehension scores on Text 2 were on average higher than those on Text 1. The variances in the reading comprehension scores were relatively small, while the mean scores seemed to suggest a ceiling effect. In order to examine whether task complexity had a significant impact on L2 reading comprehension scores, linear mixed-effects models were constructed in R. Null models contained Subject and Item as random effects; Complexity was entered as a fixed effect and compared against the null models with likelihood ratio tests using the χ² statistic. The results revealed that, with the random effects controlled, task complexity did not have a significant effect on reading comprehension scores, Text 1: χ²(1) = .02, p = .90, R² < .01, Text 2: χ²(1) = .40, p = .53, R² < .01.
4.6. WMC as a moderator of L2 reading comprehension
The participants' performances on the working memory tests were analyzed in order to examine whether working memory capacity moderated the effects of task complexity on L2 reading comprehension scores. The descriptive statistics for each of the working memory tests are provided in Table 4, and the correlations among the working memory indices are presented in Table 5.
In order to test whether working memory moderated the effects of Complexity on reading comprehension scores, likelihood ratio tests were conducted using the χ² statistic. Null models included Complexity as an existing fixed effect and Subject
Table 1
Descriptive statistics for duration judgment ratio.

Condition   N     Text 1                           Text 2
                  Mean   SD    95% CI              Mean   SD    95% CI
Simple      13    1.03   .16   [0.97, 1.09]        .95    .11   [0.91, 0.99]
Complex     13    1.16   .17   [1.09, 1.23]        1.19   .29   [1.08, 1.30]
Table 2
Descriptive statistics for perceived task difficulty (reported mental effort).

Item    Condition   N     Text 1                          Text 2
                          Mean   SD     95% CI            Mean   SD     95% CI
#3      Simple      26    4.23   1.03   [3.83, 4.63]      3.77   1.03   [3.37, 4.17]
        Complex     26    4.65   0.80   [4.62, 4.68]      4.46   1.07   [4.05, 4.87]
#4      Simple      26    4.00   1.17   [3.55, 4.45]      3.62   1.10   [3.20, 4.04]
        Complex     26    4.46   1.42   [3.91, 5.01]      4.19   1.27   [3.70, 4.68]
Total   Simple      26    8.23   1.86   [7.52, 8.95]      7.38   1.92   [6.64, 8.12]
        Complex     26    9.12   2.07   [8.32, 9.92]      8.65   2.08   [7.85, 9.45]

Note. Maximum value for each item = 7.
and Item as random effects, and each of the four working memory measures was entered into the null models one by one in order to see if its inclusion significantly improved model fit. The results revealed that nonword repetition scores (NWS) and backward digit span scores (BDS) improved model fit for Text 1 (NWS: χ²(4) = 6.30, p = .04, R² = .23; BDS: χ²(4) = 6.33, p = .04, R² = .23), and forward digit span scores (DS) and nonword repetition scores (NWS) improved model fit for Text 2 (DS: χ²(4) = 9.56, p < .01, R² = .15; NWS: χ²(4) = 9.84, p < .01, R² = .15). For these measures, maximal linear mixed-effects models were constructed. As can be seen in Table 6, significant interactions were found between Complexity and nonword span scores for Text 1 and Text 2, and between Complexity and forward digit span scores for Text 2.
Next, post-hoc mixed-effects modeling was conducted in order to compare the differential contributions made by each of the working memory measures to reading comprehension scores between the simple and complex conditions. As Table 7 shows, for both Text 1 and Text 2, nonword repetition scores made a significant contribution to L2 reading comprehension scores in the complex condition. In other words, when assigned to the complex condition, participants with higher nonword repetition scores were better at answering the reading comprehension items than those with lower nonword repetition scores.
5. Discussion and conclusion
This study investigated whether task complexity affected Korean undergraduate students' English reading comprehension, and whether working memory capacity moderated the effects of task complexity. The complexity of the reading tasks was manipulated by disarranging five segments of each reading passage, based on the understanding that an incoherent text structure might disrupt reading comprehension and thereby promote careful and attentive reading. The results showed that, even though the actual time on task did not differ between the two task conditions, the participants significantly overestimated the time taken to complete the task when assigned to the complex condition, indicating that the task manipulation of this study was successful. This finding implies that the cognitive demands of L2 reading tasks can feasibly be adjusted by modifying various task features. Furthermore, it seems reasonable to assume that reading tasks become more demanding when they necessitate thorough and scrupulous processing of the text through careful reading (Khalifa & Weir, 2009). That said, more empirical research on various ways of manipulating tasks and their impact on L2 reading performance appears imperative in order to establish and refine a theoretical framework for analyzing and assessing the cognitive complexity of L2 reading tasks.
The results of the mixed-effects modeling, however, demonstrated that reading comprehension scores were not affected by task complexity. It should be noted that, as shown by the relatively high mean scores and small SDs, participants overall performed well on the reading comprehension tests, so a ceiling effect might have masked between-group differences. In addition, as described in the method section, the texts were divided into five segments, and the related multiple-choice reading comprehension items followed each segment, as in the original TOEFL reading tests. As such, while the disarrangement of the segments might have impeded global understanding of the text as a whole, it may not have impeded answering the reading comprehension items. It should also be noted that there was no time limit and participants were allowed to stay on the task as long as they felt necessary, which might have further contributed to the non-significant findings.
In this study, when assigned to the complex condition, participants with higher nonword span scores performed better at answering reading comprehension questions than those with lower nonword span scores. This finding appears to indicate that a larger phonological short-term memory span allowed the participants to retain more textual information, facilitating more efficient handling of the mixed paragraph order. The findings also indicate that, when processing demands increase, phonological short-term memory emerges as a significant factor in addition to complex working memory, which has been established as a stable explanatory factor in L2 reading comprehension. This finding is particularly noteworthy given that most previous studies have focused on the role of complex working memory in L2 reading comprehension, predominantly using a reading span task, while that of phonological short-term memory has largely been neglected. Except for
Table 3
Descriptive statistics for reading comprehension scores.

Group     N     Text 1                            Text 2
                Mean    SD     95% CI             Mean    SD     95% CI
Simple    26    11.15   1.35   [10.38, 11.92]     13.23   1.42   [12.46, 14.00]
Complex   26    10.85   2.04   [10.08, 11.62]     13.08   .76    [12.85, 13.85]
Total     52    11.04   2.22   [10.50, 11.58]     12.85   1.85   [12.31, 13.39]

Note. Maximum score = 15.
Table 4
Descriptive statistics for working memory tests.

Test                       N     Mean    SD     95% CI
Digit span test            52    9.19    1.09   [8.65, 9.73]
Backward digit span test   52    8.02    1.32   [7.48, 8.56]
Nonword repetition test    52    9.04    .86    [8.50, 9.58]
Operation span test        52    65.15   6.35   [64.61, 65.69]

Note. Maximum scores: digit span test = 11, backward digit span test = 11, nonword repetition test = 11, operation span test = 75.
Table 5
Correlations among working memory capacity indices.

         DS            NWS           BDS           OSPAN
DS       1
NWS      .50** (.00)   1
BDS      .38* (.00)    .28* (.05)    1
OSPAN    .31* (.03)    .29* (.04)    -.11 (.48)    1

Note. Coefficients are shown with p-values in parentheses. DS = digit span scores, NWS = nonword span scores, BDS = backward digit span scores, OSPAN = operation span scores. Significance level: +p < .1, *p < .05, **p < .01.
Table 6
Results of linear mixed-effects models for interactions between WMC and task complexity on reading comprehension scores.

                                                   Random effects (SD)
Fixed effects          Estimate   SE     t         by-Subject   by-Item
Text 1
  Intercept            .38        .24    1.56      .00          .02
  Complexity × NWS     .11        .05    2.21      –            –
  Formula: RC ~ Complexity * NWS + (COM | Subject) + (1 | Item), R² = .23
  Intercept            .42        .18    2.40      .00          .04
  Complexity × BDS     -.01       .04    -.33      –            –
  Formula: RC ~ Complexity * BDS + (COM | Subject) + (COM | Item), R² = .23
Text 2
  Intercept            .74        .13    5.75      .00          .01
  Complexity × DS      .07        .03    2.67      –            –
  Formula: RC ~ Complexity * DS + (1 | Subject) + (1 | Item), R² = .15
  Intercept            .48        .16    2.96      .00          .01
  Complexity × NWS     .08        .04    2.17      –            –
  Formula: RC ~ Complexity * NWS + (1 | Subject) + (1 | Item), R² = .15

Note. Significance: |t| > 2.0.
Harrington and Sawyer's (1992) seminal research, which included L2 digit and word span tests, most L2 reading studies have not paid attention to the contribution made by phonological memory to L2 reading comprehension. The pronounced role of phonological short-term memory in the present study, however, warrants caution, as the nonword repetition test was the only working memory measure that included linguistic stimuli. This may provide a partial explanation for why the other phonological short-term memory index, forward digit span scores, did not moderate the effects of task complexity on L2 reading comprehension scores, even though it correlated significantly with nonword span scores (r(50) = .50, p < .01).
Table 7
Results of post-hoc mixed-effects models for interactions between WMC and task complexity on reading comprehension scores.

                                              Random effects (SD)
Fixed effects     Estimate   SE     t         by-Subject   by-Item
Text 1
Simple
  Intercept       1.38       .32    4.32      .44          .11
  NWS             -.07       .04    1.88      .00          .00
  Formula: RC ~ NWS + (NWS | Subject) + (NWS | Item), R² = .29
Complex
  Intercept       -.14       .39    -.35      .07          .48
  NWS             .10        .04    2.51      .00          .00
  Formula: RC ~ NWS + (NWS | Subject) + (NWS | Item), R² = .21
Text 2
Simple
  Intercept       1.05       .20    5.26      .14          .01
  DS              -.01       .02    -.60      .00          .00
  Formula: RC ~ DS + (DS | Subject) + (DS | Item), R² = .15
Complex
  Intercept       .43        .27    1.57      .05          .52
  DS              .05        .03    1.93      .00          .00
  Formula: RC ~ DS + (DS | Subject) + (DS | Item), R² = .15
Simple
  Intercept       .84        .22    3.88      .05          .01
  NWS             .01        .02    .47       .00          .00
  Formula: RC ~ NWS + (NWS | Subject) + (NWS | Item), R² = .15
Complex
  Intercept       .17        .30    .57       .16          .33
  NWS             .08        .03    2.65      .00          .00
  Formula: RC ~ NWS + (NWS | Subject) + (NWS | Item), R² = .15

Note. Significance: |t| > 2.0.
On a pedagogical level, the findings of this study imply that the cognitive complexity of a reading task, not just the linguistic complexity of the text, should be considered an important factor influencing L2 reading. That is, learners may be under greater processing demands when the task requires more attentive and scrupulous reading of the text, even though this may not necessarily surface in reading comprehension scores. That said, it appears essential to match the cognitive complexity of reading tasks with learners' L2 proficiency so that learners can better cope with the task requirements. For example, with the same text material, cognitively simpler tasks (e.g., reading to locate specific information) may need to be developed for beginning-level learners with limited L2 resources, whereas more demanding tasks (e.g., reading to critique) can be employed for advanced-level learners equipped with ample L2 resources.
There are some limitations to this study. First, the task manipulation conducted by jumbling paragraphs was shown to affect the participants' perceived level of task difficulty as measured by subjective time estimation, but not their reading comprehension scores. It was also pointed out that reading comprehension questions followed each paragraph, which might have prevented capturing any effects of the task manipulation on L2 reading comprehension scores. Hence, in order to better detect the effects of task complexity in future research, the task complexity manipulation may need to be conducted at a more localized level within each paragraph. Along the same line, a time limit for task completion may place additional cognitive demands on learners and, in so doing, increase the likelihood of observing the effects of task complexity as well as those of working memory capacity on reading comprehension scores. Next, it was speculated that a ceiling effect could have masked the effects of task complexity on reading comprehension scores. Indeed, mean scores were high while variances were small, suggesting an inherent limitation in detecting significant effects of task complexity on reading comprehension scores. Therefore, in future studies, more difficult reading comprehension items, ideally in greater numbers, would be desirable to expand the variance among the participants' scores. With respect to the moderating role of working memory capacity, as noted above, the nonword repetition task was the only measure that entailed linguistic elements, suggesting a possible domain-specific influence.
Despite these limitations, the present study sheds light on the possibility of manipulating the cognitive complexity of L2 reading tasks, which has long been neglected in the fields of both task-based language teaching and L2 reading research. Given the scant research on the relationship between cognitive task demands and learners' performance in L2 reading tasks, more studies are necessary to accumulate empirical findings as well as to establish a theoretical framework that can explain and predict the effects of task complexity on L2 reading comprehension. In addition, the pronounced importance of nonword span scores in answering reading comprehension items under increased task demands indicates the need to take phonological short-term memory into account when exploring the relationship between working memory capacity and L2 reading comprehension. Last but not least, future research incorporating process measures, obtained through verbal reports or eye-tracking technology, may provide a fuller picture of the relationship between task complexity and working memory capacity and its theoretical, methodological, and pedagogical implications in the context of L2 learning through L2 reading.
Appendix. Sample task layout of simple and complex task conditions
References
Alptekin, C., & Erçetin, G. (2009). Assessing the relationship of working memory to L2 reading: Does the nature of comprehension process and reading span task make a difference? System, 37, 627–639.
Alptekin, C., & Erçetin, G. (2011). Effects of working memory capacity and content familiarity on literal and inferential comprehension in L2 reading. TESOL Quarterly, 45, 235–266.
Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics. Cambridge: Cambridge University Press.
Baddeley, A. D. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4(11), 417–423.
Baddeley, A. D. (2003). Working memory and language: An overview. Journal of Communication Disorders, 36, 189–208.
Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 8, pp. 47–89). San Diego, CA: Academic Press.
Baralt, M. (2010). Task complexity, the Cognition Hypothesis, and interaction in CMC and FTF environments. Unpublished Ph.D. dissertation. Washington, D.C.: Department of Spanish and Applied Linguistics, Georgetown University.
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68, 255–278.
Barton, K. (2015). MuMIn: Multi-model inference. R package version 1.13.4. http://cran.r-project.org/package=MuMIn.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
Block, R. A., Hancock, P. A., & Zakay, D. (2010). How cognitive load affects duration judgments: A meta-analytic review. Acta Psychologica, 134, 330–343.
Breen, M. P. (1987). Learner contributions to task design. In C. N. Candlin, & D. Murphy (Eds.), Language learning tasks. Lancaster practical papers in English language education (Vol. 7, pp. 23–46). Englewood Cliffs, NJ: Prentice-Hall International.
Brunfaut, T., & Révész, A. (2015). The role of task and listener characteristics in second language listening. TESOL Quarterly, 49, 141–168.
Case, R., Kurland, M. D., & Goldberg, J. (1982). Operational efficiency and the growth of short-term memory span. Journal of Experimental Child Psychology, 33, 386–404.
Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11(6), 671–684.
Craik, F. I. M., & Tulving, E. (1975). Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology: General, 104, 268–294.
Cunnings, I., & Sturt, P. (2014). Coargumenthood and the processing of reflexives. Journal of Memory and Language, 75, 117–139.
Daneman, M., & Carpenter, P. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19, 450–466.
Educational Testing Service. (2012). Official TOEFL iBT tests. New York: McGraw-Hill.
Fink, A., & Neubauer, A. C. (2001). Speed of information processing, psychometric intelligence, and time estimation as an index of cognitive load. Personality and Individual Differences, 30, 1009–1021.
Freedle, R., & Kostin, I. (1999). Does the text matter in a multiple-choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing, 16, 2–32.
Gagné, C. L., & Spalding, T. L. (2009). Constituent integration during the processing of compound words: Does it involve the use of relational structures? Journal of Memory and Language, 60, 20–35.
Gathercole, S. E. (1995). Is nonword repetition a test of phonological memory or long-term knowledge? It all depends on the nonwords. Memory & Cognition, 23(1), 83–94.
Gathercole, S. E., & Baddeley, A. D. (1990). Phonological memory deficits in language disordered children: Is there a causal connection? Journal of Memory and Language, 29, 336–360.
Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press.
Gilabert, R., Manchón, R., & Vasylets, O. (2016). Mode in theoretical and empirical TBLT research: Advancing research agendas. Annual Review of Applied Linguistics, 36, 117–135.
Grabe, W. (2009). Reading in a second language: Moving from theory to practice. Cambridge: Cambridge University Press.
Harrington, M., & Sawyer, M. (1992). L2 working memory capacity and L2 reading skill. Studies in Second Language Acquisition, 14, 25–38.
Horiba, Y. (2000). Reader control in reading: Effects of language competence, text type, and task. Discourse Processes, 29, 223–267.
Jung, J. (2012). Relative roles of grammar and vocabulary in different L2 reading tasks. English Teaching, 67(1), 57–77.
Jung, J. (2016). Effects of task complexity on L2 reading and L2 learning. English Teaching, 71(4), 141–166.
Khalifa, H., & Weir, C. J. (2009). Examining reading: Research and practice in assessing second language learning. Cambridge, UK: Cambridge University Press.
Kim, Y.-J. (2012). Task complexity, learning opportunities and Korean EFL learners' question development. Studies in Second Language Acquisition, 34, 627–658.
Kim, Y.-J., Payant, C., & Pearson, P. (2015). The intersection of task-based interaction, task complexity, and working memory: L2 question development through recasts in a laboratory setting. Studies in Second Language Acquisition, 37, 549–581.
Kintsch, W., & van Dijk, T. A. (1978). Toward a model of text comprehension and production. Psychological Review, 85, 363–394.
Kormos, J., & Trebits, A. (2011). Working memory capacity and narrative task performance. In P. Robinson (Ed.), Researching second language task complexity: Task demands, language learning and language performance (pp. 267–285). Amsterdam: John Benjamins.
Leeser, M. J. (2007). Learner-based factors in L2 reading comprehension and processing grammatical form: Topic familiarity and working memory. Language Learning, 57, 229–270.
Long, M. H. (2016). In defense of tasks and TBLT: Non-issues and real issues. Annual Review of Applied Linguistics, 36, 5–33.
Meyer, B. J. F., & Ray, M. N. (2011). Structure strategy interventions: Increasing reading comprehension of expository text. International Electronic Journal of Elementary Education, 4(1), 127–152.
Michel, M. C. (2013). The use of conjunctions in cognitively simple versus complex oral L2 tasks. The Modern Language Journal, 97, 178–195.
Norris, J. M., Bygate, M., & van den Branden, K. (2009). Task-based language assessment. In K. van den Branden, M. Bygate, & J. M. Norris (Eds.), Task-based language teaching: A reader (pp. 431–434). Amsterdam/Philadelphia: John Benjamins.
Nuevo, A.-M., Adams, R., & Ross-Feldman, L. (2011). Task complexity, modified output, and L2 development in learner-learner interaction. In P. Robinson (Ed.), Researching second language task complexity: Task demands, language learning and language performance (pp. 175–201). Amsterdam: John Benjamins.
Osaka, M., & Osaka, N. (1992). Language-independent working memory as measured by Japanese and English reading span tests. Bulletin of the Psychonomic Society, 30, 287–289.
Osaka, M., Osaka, N., & Groner, R. (1993). Language-independent working memory: Evidence from German and French reading span tests. Bulletin of the Psychonomic Society, 31, 117–118.
Paas, F., Tuovinen, J. E., Tabbers, H., & Van Gerven, P. W. M. (2003). Cognitive load measurement as a means to advance cognitive load theory. Educational Psychologist, 38, 63–71.
Perfetti, C. A. (1999). Comprehending written language: A blueprint of the reader. In C. M. Brown, & P. Hagoort (Eds.), The neurocognition of language (pp. 167–210). Oxford: Oxford University Press.
Plonsky, L., & Oswald, F. L. (2014). How big is "big"? Interpreting effect sizes in L2 research. Language Learning, 64, 878–912.
R Development Core Team. (2016). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/.
Redick, T. S., Broadway, J. M., Meier, M. E., Kuriakose, P. S., Unsworth, N., Kane, M. J., et al. (2012). Measuring working memory capacity with automated complex span tasks. European Journal of Psychological Assessment, 28, 164–171.
Révész, A. (2009). Task complexity, focus on form, and second language development. Studies in Second Language Acquisition, 31, 437–470.
Révész, A. (2011). Task complexity, focus on L2 constructions, and individual differences: A classroom-based study. Modern Language Journal, 95, 162–181.
Révész, A. (2014). Towards a fuller assessment of cognitive models of task-based learning: Investigating task-generated cognitive demands and processes. Applied Linguistics, 35, 87–92.
Robinson, P. (1995). Task complexity and second language narrative discourse. Language Learning, 45, 99–140.
Robinson, P. (2001). Task complexity, task difficulty, and task production: Exploring interactions in a componential framework. Applied Linguistics, 22, 27–57.
Robinson, P. (2011). Researching second language task complexity: Task demands, language learning and language performance. Amsterdam: John Benjamins.
Rogers, J. R. (2016). Developing implicit and explicit knowledge of L2 case marking under incidental learning conditions. Unpublished dissertation. London, UK: University College London Institute of Education.
Sasayama, S. (2016). Is a 'complex' task really complex? Validating the assumption of cognitive task complexity. The Modern Language Journal, 100, 231–254.
Skehan, P. (2009). Modelling second language performance: Integrating complexity, accuracy, fluency, and lexis. Applied Linguistics, 30, 510–532.
Stanovich, K. E. (1980). Toward an interactive-compensatory model of individual differences in the development of reading fluency. Reading Research Quarterly, 16, 32–71.
Taillefer, G. E. (1996). L2 reading ability: Further insight into the short-circuit hypothesis. The Modern Language Journal, 80, 461–477.
Thomas, E. A. C., & Weaver, W. B. (1975). Cognitive processing and time perception. Perception & Psychophysics, 17, 363–367.
Turner, M. L., & Engle, R. W. (1989). Is working memory capacity task dependent? Journal of Memory and Language, 28, 127–154.
Unsworth, N., Heitz, R., Schrock, J. C., & Engle, R. W. (2005). An automated version of the operation span task. Behavior Research Methods, 37, 498–505.
Waters, G. S., & Caplan, D. (1996). The measurement of verbal working memory capacity and its relation to reading comprehension. The Quarterly Journal of Experimental Psychology Section A, 49, 51–79.