System 74 (2018) 21–37
Contents lists available at ScienceDirect
System
journal homepage: www.elsevier.com/locate/system

Effects of task complexity and working memory capacity on L2 reading comprehension

Jookyoung Jung
Center for English Language Education, Korea University, 145 Anam-ro, Anam-dong, Seongbuk-gu, Seoul, 02841, Republic of Korea

Article history: Received 6 July 2017; Received in revised form 22 January 2018; Accepted 12 February 2018

Abstract

The present study investigated whether cognitive task complexity affects second language reading comprehension and whether working memory capacity moderates the influence of task complexity. Fifty-two Korean undergraduate students were randomly assigned to either a simple or a complex condition and read two TOEFL passages while answering multiple-choice reading comprehension questions. Unlike the simple versions, which included coherent texts, the complex versions contained texts whose paragraphs were disarranged and additionally required participants to order them coherently. A forward digit span test and a nonword repetition test were used to measure the participants' phonological short-term memory, and a backward digit span test and an operation span test were employed to assess their complex working memory. The results revealed that task complexity did not affect reading comprehension scores, although participants perceived the complex tasks as significantly more demanding. Also, under the complex condition, participants benefited from higher nonword span scores when answering reading comprehension questions.

© 2018 Published by Elsevier Ltd.

1. Introduction

In the 1980s, task-based language teaching (henceforth, TBLT), in which a task is defined as a meaningful and goal-oriented activity using the target language (TL), was proposed as a potential approach to second language (L2) instruction, and it has attracted growing attention ever since (Long, 2016).
In order to facilitate maximum L2 learning, individual pedagogic tasks must be sequenced so as to match the learner's developmental level. While various criteria have been suggested for analyzing and grading tasks thus far, task complexity, i.e., the task-induced demands imposed on learners' limited cognitive resources, has received increasing attention from researchers. In particular, Skehan's Limited Capacity Model (Skehan, 2009) and Robinson's Cognition Hypothesis (Robinson, 2011) have exerted a substantial impact on recent empirical studies of whether and how manipulating cognitive task complexity affects learners' oral or written production and L2 development (e.g., Michel, 2013; Révész, 2011). However, the influence of task complexity on receptive language skills, such as L2 reading, has long been neglected, presumably due to the absence of a theoretical agenda that can be called upon (Gilabert, Manchón, & Vasylets, 2016). That is, it is difficult to find a readily applicable conceptual framework that predicts and explains how task complexity would affect learners' L2 reading performance.

An additional gap in the literature is the lack of research into the moderating effects of individual differences in TBLT. Breen (1987) asserts that a priori task design (in his term, 'task-as-workplan') must be tempered by what learners bring to the tasks, i.e., individual differences, resulting in unique and idiosyncratic task engagement ('task-as-process'). Robinson (1995, 2001) also includes learner factors under the category of Task Difficulty, and highlights the need to investigate how learners' individual differences interact with task features.

E-mail address: jookyoungjung@korea.ac.kr. https://doi.org/10.1016/j.system.2018.02.005
In a similar vein, Norris, Bygate, and van den Branden (2009) suggest that "as increasing empirical light is shed on the learner side of the equation, it is also likely that the interactions of particular learners with particular tasks will become much more predictable" (p. 245). Researchers have been keenly aware of this issue, and a variety of learner variables have been addressed in studies of task effects, such as working memory capacity (e.g., Baralt, 2010; Kim, Payant, & Pearson, 2015; Kormos & Trebits, 2011). Along this line of research, the present study explores whether and how task complexity influences L2 reading comprehension, and what role working memory capacity plays as a moderating variable.

2. Literature review

2.1. Task complexity and L2 reading

In the present study, Khalifa and Weir's (2009) cognitive processing model for reading comprehension was considered an ideal theoretical basis. Unlike previous models of reading (e.g., Kintsch & van Dijk, 1978; Perfetti, 1999; Stanovich, 1980), whose scope is restricted to the cognitive processes of reading comprehension, such as word recognition, syntactic parsing, textual understanding, interpretation, and integration, Khalifa and Weir's framework incorporates and stresses the role of metacognitive processes in reading comprehension. The highlighted role of metacognition enables the model to account for task effects on reading by viewing reading as a cognitive process that constantly adapts to the goal of the reading task. The model presupposes three sources of knowledge: metacognitive activity, the central core, and the knowledge base (see Fig. 1). Among these, metacognitive activity has particular relevance to this study, as it pertains to regulating the entire reading process in such a way as to successfully perform the given reading task. Metacognitive activity involves setting goals, monitoring, and remediating text understanding where necessary.
The goal-setter determines the type of reading comprehension to be achieved and the speed and scope of reading required. More specifically, according to the purpose of reading, the reader engages in either careful or expeditious reading, which may take place at either the local or the global level. Local comprehension refers to extracting propositions at a lexical or clausal level, whereas global comprehension entails understanding the structure of a text as a whole (Kintsch & van Dijk, 1978). Careful reading may take place at the local or global level and is usually based on slow, sequential, and incremental reading for comprehension. By contrast, expeditious reading includes quick, selective, and efficient reading, as in the case of skimming, search-reading, or scanning. Monitoring occurs at all stages of reading, from checking word recognition to evaluating the text-level representation and extracting the writer's intentions and text structure. As a result of monitoring, remediation of text understanding takes place, if necessary.

In the context of TBLT, it seems possible to assume that the cognitive demands of L2 reading tasks may be manipulated according to the degree to which careful reading is required for successful task completion. For instance, if a reading task necessitates thorough and scrupulous processing of textual information, its cognitive demands may be greater than those of a counterpart that can be completed with shallow and superficial processing of the same text (Craik & Lockhart, 1972; Craik & Tulving, 1975). While there has been little research into the influence of task demands on L2 reading performance (Grabe, 2009), a few studies provide empirical support for the greater task demands induced by the pronounced importance of careful reading (e.g., Horiba, 2000; Jung, 2012; Taillefer, 1996).
For example, Horiba (2000) investigated how L2 readers' control over their reading might differ from that of L1 readers across tasks that entail differential processing demands, using think-aloud protocols. In her studies, it was repeatedly found that tasks that require more careful reading, such as reading for coherence, encourage L2 readers to engage in more intensive text-based decoding as well as top-down inferring processes compared to simpler counterparts such as reading freely. The quality of the free written recalls, however, was found comparable across task conditions, implying a marginal influence of task complexity on L2 reading comprehension outcomes. The distinct importance of careful reading in more complex tasks can also be found in research into the role of L2 proficiency in L2 reading comprehension. In Taillefer's (1996) study, for instance, the role of L2 proficiency was greater in a complex task, i.e., reading to prepare for a debate (receptive reading), in comparison to a simple task, i.e., reading to locate a keyword (scanning). Taillefer assumed that receptive reading is more complex than scanning, as scanning is by and large a simple cognitive matching task, searching for what is sought among what is already given. In other words, when the text needs to be processed at a deeper level through careful reading for successful task completion, the task seems to pose greater processing demands on L2 readers, calling for more attentive and accurate textual analysis.

2.2. Validation of the construct of task complexity

As empirical research into the influence of task complexity accumulates, diverse attempts have been made to improve methodological rigour in this line of research, such as including a control group (Révész, 2009), designing a continuous complexity scale (Kim, 2012), employing a distractor task (e.g., Nuevo, Adams, & Ross-Feldman, 2011), and comparing learners' data with native speakers' baseline data (e.g., Michel, 2013).

Fig. 1. Cognitive processing model of reading comprehension (Khalifa & Weir, 2009, p. 43).

Robinson's (2001) questionnaire for overall perceptions of task difficulty has enjoyed popularity among researchers, perhaps due to its convenience of administration. Learners are presented with the items and asked to rate each of them on a seven- to nine-point Likert scale. What has typically been done is to conduct a descriptive analysis, treating the collected scales as interval data.

For more rigorous validation of the construct of task complexity, recent studies tend to incorporate additional methods for assessing task complexity, such as a subjective time estimation task. In a subjective time estimation task, learners are usually asked to judge the time taken for task completion in the absence of an external timing device, and it has consistently been shown that estimated time duration becomes less accurate as the cognitive load of the task increases (Block, Hancock, & Zakay, 2010). Thomas and Weaver (1975) provided the theoretical basis for this method. That is, as nontemporal task demands increase, less attention is left for processing temporal information, and as a result estimated time duration becomes more inaccurate. More specifically, under the prospective paradigm, where participants are aware at the outset of the task that a time estimation will be made, it has consistently been found that the estimated-to-real duration ratio decreases with increasing cognitive load. By contrast, under the retrospective paradigm, where participants are unaware of the subjective time estimation task until it has to be made, the estimated-to-real duration ratio increases with cognitive load (Fink & Neubauer, 2001).
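The estimated-to-real duration ratio at the heart of both paradigms is simple arithmetic; the sketch below uses illustrative durations, not data from any study cited here.

```python
def duration_ratio(estimated_min: float, actual_min: float) -> float:
    """Estimated-to-real duration ratio: > 1 means the task felt
    longer than it actually was, < 1 means it felt shorter."""
    if actual_min <= 0:
        raise ValueError("actual duration must be positive")
    return estimated_min / actual_min

# Under a retrospective paradigm, higher cognitive load is expected to
# push the ratio above 1; under a prospective paradigm, below 1.
simple_ratio = duration_ratio(estimated_min=20.0, actual_min=22.0)   # < 1
complex_ratio = duration_ratio(estimated_min=30.0, actual_min=24.0)  # 1.25
```

Baralt's subtracted values (estimated minus real duration) differ from this ratio in that they scale with the absolute length of the task, which is one reason the ratio is preferred in the timing literature.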
For instance, Baralt (2010) employed not only a perception questionnaire but also a retrospective time judgment task as an additional source for estimating cognitive complexity. In her study, learners were asked to estimate how long they believed it took them to complete each task, on the assumption that the greater the demands imposed on the learner, the more time he or she would judge had passed (Paas, Tuovinen, Tabbers, & Van Gerven, 2003). The findings showed that the retrospective time judgment measure matched Baralt's operationalisation of task complexity, while the perception questionnaire failed to do so. More specifically, participants who performed the complex version estimated the time taken for task completion to be significantly longer than the time actually taken. It should be noted, however, that Baralt analysed subtracted values (the difference between estimated and real time duration), not estimated-to-real duration ratios, which could have lowered the internal validity of the results. Similarly, Sasayama (2016) utilized subjective time estimation in addition to self-reported perceptions of task difficulty and a dual-task methodology. She found that only large differences in the number of task elements were detectable, while smaller differences did not make a significant change to the level of cognitive complexity. Based on this finding, Sasayama underscores the importance of independent measures of task complexity in order to attest whether designed task features exercise their putative effects on task complexity.

2.3. Working memory capacity and L2 reading

Working memory capacity has been explored extensively as an important cognitive construct in the field of cognitive psychology. The present study adopted Baddeley's (2000, 2003; Baddeley & Hitch, 1974) framework, considering its substantial impact on studies of the role of working memory capacity in L2 reading.
According to the framework, working memory consists of the central executive, a limited attentional control system, and two slave systems, the phonological loop and the visuo-spatial sketchpad (see Fig. 2). Among these components, phonological short-term memory and complex working memory have attracted interest from researchers in the field of L2 reading. The phonological loop subsumes two subcomponents: a temporary storage system that holds phonological information for the few seconds during which it decays, and a subvocal rehearsal system in which the information stored in short-term memory is maintained and registered (Baddeley, 2003). As the phonological loop pertains in particular to the retention of sequential information, its function is typically measured with tasks that require the immediate repetition of sequences of digits or words/nonwords in the order of presentation (Baddeley, 2000). The central executive is responsible for attentional control, carrying out conscious processing, monitoring, intentional learning, and problem solving (Baddeley, 2003). Thus, the central executive is deemed the principal factor underlying individual differences in working memory span. Complex working memory span is often assessed with various types of span tasks, most notably reading span tasks (e.g., Daneman & Carpenter, 1980), which require learners to process information while retaining it in short-term memory.

Fig. 2. Current multi-component model of working memory (Baddeley, 2003, p. 203).

Previous studies have supported the importance of working memory capacity in L2 reading comprehension (e.g., Alptekin & Erçetin, 2009, 2011; Harrington & Sawyer, 1992; Leeser, 2007; Waters & Caplan, 1996).
In Harrington and Sawyer's (1992) seminal research, the participants who scored higher on the L2 reading span test did better on the L2 reading comprehension measures, whereas scores on the L2 digit and word span tests only weakly correlated with L2 reading comprehension. This finding led Harrington and Sawyer to conclude that complex working memory, rather than phonological short-term memory, plays a crucial role in L2 reading comprehension. In this study, however, the reading span test was constructed in the L2, and could therefore have measured a construct overlapping with that of the reading comprehension test. It was also revealed that the participants' performances on the L1 digit and word span tests were, overall, superior to those on the L2 measures, showing a strong effect for language. In order to unveil the influence of working memory capacity on L2 reading comprehension, then, it appears desirable to use domain- and language-independent measures.

Whether working memory is language-dependent was explored in Osaka and Osaka (1992) and Osaka, Osaka, and Groner (1993). In Osaka and Osaka's research, participants with L1 Japanese and L2 English performed Daneman and Carpenter's (1980) reading span test in L1 and L2 versions. The results showed significant correlations between L1 and L2 reading span scores, and between Daneman and Carpenter's original version and the Japanese version. In their follow-up research, Osaka et al. (1993) again found a significant correlation between L1 German and L2 French versions of the reading span test. Based on their findings, Osaka et al. suggested that complex working memory might be independent of any specific language proficiency.
It seems worth mentioning, however, that in contrast to the considerable amount of attention paid to the reading span task, whether in the L1 or the L2, other types of complex working memory indices, such as a counting span task (e.g., Case, Kurland, & Goldberg, 1982) or an operation span task (e.g., Turner & Engle, 1989), have rarely been preferred when exploring the contribution made by working memory to L2 reading comprehension, although they may provide a more language-independent index of complex working memory. It is also noteworthy that phonological short-term memory has been more or less unattended to in the field of L2 reading research, despite its crucial role in sentence processing (e.g., Gathercole & Baddeley, 1990). Thus, it seems advisable to include measures of both phonological short-term memory and complex working memory when investigating the contribution made by working memory capacity to L2 reading comprehension.

2.4. Research questions

The review of the previous literature revealed that the effects of task complexity on L2 reading performance have remained largely unexplored, even though tasks are usually conceptualized as holistic activities subsuming diverse language skills, including L2 reading. In addition, complex working memory has received nearly exclusive attention in L2 reading research, predominantly measured with a reading span task, whereas the role of phonological short-term memory has often not been included when investigating the relationship between working memory and L2 reading comprehension. To fill these gaps in the literature, the following research questions were addressed in the present study:

1) To what extent does task complexity affect L2 reading comprehension?
2) To what extent does working memory capacity moderate the effects of task complexity on L2 reading comprehension?

3. Methodology

3.1. Design
The data for the present study, except for the working memory data, were collected as part of a larger study (Jung, 2016) that explored the effects of manipulating cognitive task demands on Korean undergraduate students' L2 English reading comprehension and their learning of target constructions contained in the texts. Hence, the influence of task complexity on L2 reading comprehension is reported in this study as well. As illustrated in Fig. 3, fifty-two participants were randomly assigned to either the simple or the complex condition and took part in two reading sessions. In each reading session, they read a TOEFL passage while simultaneously answering reading comprehension questions.

Fig. 3. Experimental design and procedure.

3.2. Participants

The participants included 14 male and 38 female undergraduate students enrolled in a university in Korea. Their L1 was Korean, and their average age was 22.84 (SD = 1.94). Their average onset age of English learning was 8.73 years (SD = 2.18), and 11 students reported that they had stayed in English-speaking countries, such as the US, Australia, Canada, the Philippines, and Malaysia (Mean = 6.73 months, SD = 4.88 months). To ensure the homogeneity of the participants' English ability, their English proficiency level was measured with the Reading and Use of English section of a practice Cambridge Certificate of Proficiency in English (CPE) test developed and provided by University of Cambridge ESOL Examinations. Based on the scores, stratified random sampling was applied in order to reduce sampling error and ensure equivalence between the groups in terms of English proficiency. More specifically, strata were created based on the CPE scores, and participants were randomly drawn from each stratum to form the two experimental groups.

3.3. Texts

For this study, two expository texts were selected from passages used in previous TOEFL iBT tests developed by ETS.
In order to prevent a confounding influence of the participants' topic familiarity on their reading comprehension (e.g., Leeser, 2007), texts that appeared potentially unfamiliar to the participants were carefully selected. Participants' familiarity with the topics of the texts was assessed with two 5-point Likert-scale post-reading questionnaire items (i.e., Item 1: I thought this reading topic was familiar. Item 2: I had some background knowledge about the reading topic.). Text 1 explained the formation, extraction, and refinement of petroleum resources and the challenges and dangers posed by the use of petroleum resources; Text 2 reviewed fossil evidence of the evolutionary explosion that occurred during the Cambrian period. Text 1 contained 682 words and six paragraphs, whereas Text 2 consisted of 699 words and seven paragraphs. The average readabilities of the two texts, calculated with various indices such as the Flesch-Kincaid grade level, Gunning-Fog score, Coleman-Liau index, SMOG index, and Automated Readability index, were 11.6 and 13.4, respectively. Each readability index corresponds to the number of years of education (based on the US education system) required to understand the text. According to their average readability, the texts required an upper-intermediate level of English proficiency and thus were considered appropriate for the participants of this study. In order to prevent ordering effects, the presentation order of the texts was counter-balanced within each task condition.

3.4. Reading task and task complexity manipulation

The reading task in this study was similar to what learners normally do when taking a paper-based reading comprehension test, that is, reading a printed text while answering related multiple-choice comprehension questions using a pencil. The multiple-choice reading comprehension items were also taken from TOEFL tests developed and validated by ETS (Freedle & Kostin, 1999).
The items asked participants to identify factual/negative-factual information, make inferences, understand rhetorical purpose, recognize vocabulary meaning, determine reference, simplify/paraphrase a sentence, insert a sentence into a paragraph, and select the main ideas of the text (Educational Testing Service, 2012). As in the TOEFL format, the texts were divided into five segments, each comprising one or two paragraphs and followed by reading comprehension questions related to that segment. There were 14 multiple-choice comprehension items for each text. In order to avoid confounding effects induced by the reading comprehension items on the level of task complexity, the same reading comprehension items were used for both the simple and complex conditions. Following the original TOEFL scoring system, 1 point was given to each item, except the last item, which measured global text understanding and received 2 points. The maximum reading comprehension score for each text was 15.

Under the complex condition, the five segments were jumbled and presented to participants in a mixed order. That is, after reading the text and answering the comprehension questions, participants in the complex group also had to arrange the segments into the correct order to make a coherent text (see Appendix). The complex version was judged cognitively more demanding in that readers' comprehension is substantially influenced by the degree of clarity and coherence of text structure (Meyer & Ray, 2011). Also, according to Khalifa and Weir's (2009) cognitive processing model, the additional text-ordering task was expected to encourage more careful reading in order to arrange the segments logically and coherently, posing greater demands on the participants in comparison to the simple version, which did not involve such additional task requirements.
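The scoring scheme just described (14 items per text, 1 point each, with the final global-understanding item worth 2 points, for a maximum of 15) can be sketched as follows; the function name and letter-keyed answers are illustrative, not part of the study's materials.

```python
def score_reading(responses, answer_key):
    """Score one text's 14 multiple-choice items following the TOEFL
    scheme described above: 1 point per item, except the final
    (global-understanding) item, which is worth 2 points. Max = 15."""
    assert len(responses) == len(answer_key) == 14
    score = 0
    for i, (given, correct) in enumerate(zip(responses, answer_key)):
        if given == correct:
            score += 2 if i == len(answer_key) - 1 else 1
    return score

# A fully correct response sheet earns the maximum of 15 points:
key = ["A"] * 14          # hypothetical answer key
print(score_reading(key, key))  # -> 15
```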
To test the validity of the task complexity manipulation, the participants' retrospective, subjective time estimations were collected as an additional source for estimating the cognitive/mental demand imposed on learners (Baralt, 2010; Révész, 2014). As the time estimation task was conducted under a retrospective paradigm (Fink & Neubauer, 2001), participants were unaware of the upcoming duration judgement task until it had to be done. As such, there was no time limit for task completion, and subjective time estimations were only collected after the first treatment session. Unlike under the prospective paradigm (informed time estimation), wherein the estimated-to-real duration ratio decreases with increasing cognitive load, it was expected that the duration ratio would increase after performing cognitively more demanding tasks. In addition, in order to triangulate the subjective time estimations, two additional 5-point Likert-scale post-reading questionnaire items asked the participants to report their perceived level of task difficulty (e.g., Item 3: I thought this task was difficult. Item 4: I thought this task was demanding.).

3.5. Working memory measurements

In this study, a forward digit span test and a nonword repetition test were used to measure participants' phonological short-term memory. In addition, a backward digit span test and an automated operation span test were used to measure participants' complex working memory.

3.6. Forward digit span test (DS)

In the digit span test, participants were presented with sequences of unrelated digits in an automated PowerPoint slide show (adopted from Brunfaut & Révész, 2015). Each digit stayed on the slide for 1 s, and set sizes ranged from 3 to 11 digits (2 sets for each length, 18 sets in total), presented in increasing order. Digits were repeated across sets but not within sets, and all of them were used approximately equally in the test.
Participants were instructed to recall the digits from each set on the response sheet. Ten seconds were allowed for recalling each set. The maximum set size correctly recalled at least once was the digit span score. The test took 7–8 min, and Cronbach's alpha was .76.

3.7. Nonword repetition test (NWR)

For this study, a nonword repetition test was developed in Korean. More specifically, nonsense words that conformed to the phonotactic rules of Korean were created and then presented to participants in an automated PowerPoint slide show. The test stimuli consisted of 32 nonwords, each containing 4 to 11 syllables (4 sets for each syllable length). Each nonword was aurally delivered to participants in a random order, and 10 s were allowed for oral recall. Each nonword recall was scored either correct or incorrect, and the maximum number of syllables that participants correctly recalled at least twice for each syllable length was the score for this test.

The test was piloted on seven Korean graduate students to determine appropriate syllable lengths and the reliability of the test. They were also asked to rate the wordlikeness of each nonword on a 5-point Likert scale from 1 (very likely to pass for a real Korean word) to 5 (very unlikely to pass for a real Korean word). This process was to ensure that the nonword stimuli were low in wordlikeness, so that participants would be less likely to retrieve similar phonological structures from their long-term memory and would instead have to depend on short-term phonological representations to mediate nonword repetition (Gathercole, 1995). The mean wordlikeness value was 2.23 (SD = .74). Seven nonwords that were rated relatively high in wordlikeness (1 SD above the mean) were replaced with other, less wordlike nonwords. The test was administered individually and took about 9–10 min to complete. Cronbach's alpha for this test was .73.

3.8. Backward digit span test (BDS)

The design, structure, and procedure of the backward digit span test were the same as for the forward digit span test, except that participants were instructed to recall the digits in reverse order (adopted from Brunfaut & Révész, 2015). The maximum set size correctly recalled at least once was the backward digit span score. The test took 7–8 min, and Cronbach's alpha was .81.

3.9. Automated operation span test (OSPAN)

The operation span test, created by Turner and Engle (1989), requires participants to solve a series of math problems while remembering a set of unrelated letters or words (see Fig. 4). The source file of the automated operation span test was obtained from the Attention and Working Memory Lab at Georgia Tech (Redick et al., 2012; Unsworth, Heitz, Schrock, & Engle, 2005). The test began with two practice sessions to familiarize participants with the math operation and word/letter recall tasks and to calculate individual differences in the time required to solve math problems. The time taken to solve math problems (plus 2.5 SD, determined after extensive piloting; Unsworth et al., 2005) was used as the time limit for each math problem for that individual. In order to guarantee that participants engaged in a trade-off between storage (remembering word/letter strings) and processing (solving math problems), an 85% accuracy criterion on the math operations was required. The real test session consisted of three sets at each set size, with set sizes ranging from 3 to 7 (75 letters and 75 math problems in total). The order of set sizes was random. The total number of correct letter recalls was used as the OSPAN index. The test took approximately 20 min to complete. Cronbach's alpha was .78 (Unsworth et al., 2005).

Fig. 4. Example of slides used in the OSPAN test (adopted from Unsworth et al., 2005, p. 500).
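The three scoring rules above (digit span = largest set size recalled correctly at least once; nonword span = longest syllable length recalled correctly at least twice; OSPAN = total correct letter recalls) can be sketched as follows. The data structures are illustrative; the study's own tests were administered via PowerPoint and the automated Georgia Tech program, not this code.

```python
def digit_span_score(trials):
    """trials: {set_size: [bool, bool]} marking whether each of the two
    sets at that size was recalled correctly. The span is the largest
    set size recalled correctly at least once."""
    passed = [size for size, results in trials.items() if any(results)]
    return max(passed, default=0)

def nonword_span_score(trials):
    """trials: {syllable_length: [bool]*4}. The score is the longest
    length recalled correctly at least twice, per the NWR criterion."""
    passed = [n for n, results in trials.items() if sum(results) >= 2]
    return max(passed, default=0)

def ospan_score(trials):
    """trials: list of (recalled_letters, target_letters) pairs.
    The index is the total number of letters recalled in the correct
    serial position (one common reading of 'correct letter recalls')."""
    return sum(
        sum(1 for r, t in zip(recalled, target) if r == t)
        for recalled, target in trials
    )
```

For example, a participant who passes one of the two size-4 digit sets but neither size-5 set receives a digit span of 4.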
Questionnaires Participants were asked to answer a background questionnaire and two post-reading questionnaires. The aim of the background questionnaire was to collect information about participants’ demographics and English learning experiences. The post-reading questionnaires asked participants to provide their retrospective subjective time estimation taken to complete the given reading task, perceived level of task difficulty, and familiarity with the topic of the reading text. All questionnaires were administered in Korean. 3.11. Procedure The data were collected over four weeks. All participants took the pre-reading questionnaire, and the L2 proficiency test (CPE) in the first session. One week later, they took part in two reading sessions on separate days. In each of the sessions, they read the given text and answered the post-reading questionnaire. In the last session, the participants carried out working J. Jung / System 74 (2018) 21e37 29 memory measurements. Each session took approximately 45 min to an hour. The experimental sessions were conducted in a computer-laboratory at a university in Korea. 3.12. Analysis SPSS 22.0 for Mac was used for examining reliability of the tests as well as computing descriptive and correlational statistics of the data. More specifically, the reliability of the different tests was determined using Cronbach's alpha, and the level of significance for this study was set at alpha level of p < .05. In order to answer the research question, the statistical program R version 3.3.0 was used (R Development Core Team, 2016) (Baayen, 2008). One particular strength of mixed-effects modeling is that it can account for the potential idiosyncrasies of individual participants and items (Rogers, 2016), and thus allow researchers to make a “simultaneous generalization  & Spalding, 2009, p. 25). 
Considering that this study included participant- and item-related factors, mixed-effects modeling was considered a robust and appropriate method for data analysis. Data analyses were conducted by constructing various linear mixed-effects models using the lmer function in the package lme4 (Bates, Maechler, Bolker, & Walker, 2015). When analyzing the effects of task complexity on reading comprehension scores, task complexity, i.e., the independent variable, was entered as the fixed effect. In addition, in order to account for uncontrollable peculiarities nested in each participant and/or item, subjects and items were included as random effects. The modeling started with a null model that contained only the random intercepts for subjects and items. Task complexity was then added to the null model and tested to see whether its inclusion significantly improved the model fit, using likelihood ratio tests with the χ² statistic. When exploring the moderating role of working memory capacity, the working memory indices were entered one by one as an additional fixed effect into the models that already contained task complexity as a fixed effect. Then, likelihood ratio tests were performed to compare the reduced model with the increased model that additionally contained a working memory measure. If a significant difference was found in the likelihood ratio tests, post-hoc maximal random effects structures were produced following Barr, Levy, Scheepers, and Tily (2013), in order to examine the distinct contribution made by each of the fixed effects. Whenever a maximal model failed to converge, random parameters were dropped one by one, starting from the one that accounted for the least variance, until convergence was reached (Cunnings & Sturt, 2014). As linear model summaries provide t statistics without p-values, an absolute t-value above 2.0 was set as the criterion for significance (Gelman & Hill, 2007).
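The likelihood ratio comparison described above reduces to a simple computation once the log-likelihoods of the two nested models are known. The sketch below uses hypothetical log-likelihood values (not taken from this study) and the conventional χ² critical value for a single added parameter:

```python
def likelihood_ratio(ll_null, ll_full):
    """Likelihood ratio statistic for nested models:
    2 * (logLik of the fuller model - logLik of the null model)."""
    return 2.0 * (ll_full - ll_null)

# Hypothetical log-likelihoods: a null model with random intercepts only,
# and a model that adds one fixed effect (one extra parameter, so df = 1).
chi_sq = likelihood_ratio(-512.4, -509.1)
CRITICAL_1DF_05 = 3.841  # chi-square critical value for df = 1, alpha = .05
improved = chi_sq > CRITICAL_1DF_05
print(round(chi_sq, 2), improved)
```

In practice the same comparison is what R's anova() of two lmer fits reports; the critical-value shortcut here simply stands in for the p-value lookup.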
Effect sizes for the models were computed using the r.squaredGLMM function in the package MuMIn (Barton, 2015). Following Plonsky and Oswald (2014), R² values of .06, .16, and .36 were interpreted as small, medium, and large effect sizes, respectively.

4. Results

4.1. Preliminary analysis

Prior to answering the research questions, some preliminary steps were taken to ensure the internal validity of the results. The following methodological concerns were taken into consideration: equivalence between the groups, the potential influence of topic familiarity on reading comprehension scores, and the validation of task complexity.

4.2. Equivalence between groups

English proficiency scores were analyzed to determine whether the simple and complex groups were equivalent. The mean score was 14.62 for the participants assigned to the simple condition (SD = 4.97, 95% CI [13.85, 15.39]) and 13.96 for those in the complex condition (SD = 4.98, 95% CI [13.19, 14.73]). To check the equivalence of English proficiency between the groups, a mixed-effects model was constructed with the CPE scores as the dependent variable, Complexity as a fixed effect, and Subject and Item as random effects. When compared with a null model that contained only the random effects, the inclusion of Complexity as a fixed effect did not make a significant difference, χ²(1) = .29, p = .59. In other words, the two groups did not significantly differ in English proficiency.

4.3. Effects of topic familiarity

The maximum value for each item that measured topic familiarity was 7. The responses to the two items significantly correlated with each other, Text 1: r(52) = .68, p < .01, Text 2: r(52) = .56, p < .01, suggesting that the items assessed overlapping constructs. The mean of the summed values of the two items was 7.08 for Text 1 (SD = .44, 95% CI [6.54, 7.62]) and 5.55 for Text 2 (SD = .38, 95% CI [5.01, 6.09]).
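The item-level agreement reported here (two familiarity ratings per text) rests on a plain Pearson correlation, which can be computed directly from the paired ratings. The seven-point ratings below are hypothetical, for illustration only:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two rating lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical responses to the two 7-point topic familiarity items:
item1 = [4, 5, 3, 6, 2, 5, 4]
item2 = [3, 5, 4, 6, 2, 4, 4]
r = pearson_r(item1, item2)
print(round(r, 2))
```

A high positive r, as in the study's reported values, justifies summing the two items into a single familiarity score.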
In order to examine whether participants' topic familiarity had an impact on their reading comprehension scores, likelihood ratio tests were conducted. The null model contained Subject and Item as random effects, and the increased model additionally included topic familiarity as a fixed effect. The dependent variable was the reading comprehension score for Text 1 and Text 2. The results showed that adding topic familiarity did not significantly improve the null models, Text 1: χ²(1) = .01, p = .91, Text 2: χ²(1) = 2.25, p = .13. In short, the participants' familiarity with the topics of the texts did not affect their reading comprehension scores.

4.4. Validation of task complexity

To validate the operationalization of task complexity, all participants were asked to judge the perceived time duration taken to complete each task immediately after task completion. As mentioned earlier, only the time estimations made after the first task were analyzed. In order to examine whether the subjective time estimations differed as a function of task complexity, estimated-to-actual duration ratios were calculated by dividing the estimated time by the actual time taken to complete the given task. A duration judgment ratio higher than 1 indicates that participants overestimated the time taken for task completion relative to the actual time. In the retrospective time estimation paradigm, the duration judgment ratio is expected to increase with greater cognitive load. As shown in Table 1, for both Text 1 and Text 2, the duration judgment ratios in the complex versions were on average larger than those in the simple versions. Independent samples t-tests on the duration judgment ratios across the simple and complex conditions also revealed significant effects of task complexity for both texts, Text 1: t(50) = 2.86, p = .01, 95% CI [.04, .22]; Text 2: t(50) = 3.85, p < .01, 95% CI [.11, .36].
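The two quantities behind this validation step, the duration judgment ratio and Cohen's d computed from group means and SDs, can be sketched as follows. The illustration plugs in the Text 1 group statistics reported in Table 1; the pooled-SD formula for equal-sized groups is an assumption about how the published ds were derived:

```python
import math

def duration_judgment_ratio(estimated_time, actual_time):
    """Ratio > 1 means the participant overestimated time on task."""
    return estimated_time / actual_time

def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d for two equal-sized groups, using the pooled SD."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_b - mean_a) / pooled_sd

# A 10-minute task judged to have taken 12 minutes is overestimated:
ratio = duration_judgment_ratio(12, 10)

# Text 1 group statistics from Table 1 (simple vs. complex condition):
d = cohens_d(1.03, .16, 1.16, .17)
print(round(ratio, 1), round(d, 2))  # 1.2 0.79
```

Reassuringly, the pooled-SD computation reproduces the .79 reported for Text 1.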
Cohen's ds were .79 and 1.09, respectively, which are considered medium and large effect sizes (Plonsky & Oswald, 2014). It seems noteworthy that the actual time on task was comparable between the two conditions, χ²(1) = 1.91, p = .17, R² = .02. In other words, regardless of the actual time on task, the participants' subjective estimates of the time taken for task completion were significantly greater in the complex condition, suggesting that the complex tasks induced heavier cognitive loads on the participants than the simple tasks. To triangulate the subjective time estimation data, two post-reading questionnaire items additionally asked the participants to rate their perceived level of task difficulty. Cronbach's alpha for the items was .63 for Text 1 and .75 for Text 2. Descriptive statistics for the responses to the two items are presented in Table 2. In order to examine whether there were significant differences between the simple and complex conditions in participants' ratings of perceived task difficulty, likelihood ratio tests were conducted on the responses concerning reported mental effort. Null models included only the random effects (i.e., Subject and Item), whereas the increased models contained Complexity as a fixed effect. Significant differences were found for both texts, Text 1: χ²(1) = 4.05, p = .04, R² = .04; Text 2: χ²(1) = 8.27, p < .01, R² = .08. In other words, in line with the results from the subjective time estimations, the participants perceived the complex tasks as significantly more difficult than the simple tasks.

4.5. Effects of task complexity on L2 reading

The descriptive statistics for the reading comprehension scores of each group are displayed in Table 3. Reading comprehension scores on Text 2 were on average higher than those on Text 1. The variances in the reading comprehension scores were relatively small, while the mean scores seemed to imply a ceiling effect.
In order to examine whether task complexity had a significant impact on L2 reading comprehension scores, linear mixed-effects models were constructed in R. Null models contained Subject and Item as random effects; Complexity was entered as a fixed effect and compared against the null models with likelihood ratio tests using the χ² statistic. The results revealed that, when the random effects were controlled, task complexity did not have a significant effect on reading comprehension scores, Text 1: χ²(1) = .02, p = .90, R² < .01, Text 2: χ²(1) = .40, p = .53, R² < .01.

4.6. WMC as a moderator of L2 reading comprehension

The participants' performances on the working memory tests were analyzed in order to examine whether working memory capacity moderated the effects of task complexity on L2 reading comprehension scores. The descriptive statistics for each of the working memory tests are provided in Table 4, and the correlations among the working memory indices are presented in Table 5. In order to test whether working memory moderated the effects of Complexity on reading comprehension scores, likelihood ratio tests were conducted using the χ² statistic. Null models included Complexity as an existing fixed effect and Subject

Table 1
Descriptive statistics for duration judgment ratio.

                         Text 1                         Text 2
Condition   N    Mean   SD    95% CI          Mean   SD    95% CI
Simple      13   1.03   .16   [0.97, 1.09]    .95    .11   [0.91, 0.99]
Complex     13   1.16   .17   [1.09, 1.23]    1.19   .29   [1.08, 1.30]

Table 2
Descriptive statistics for perceived task difficulty.

                                Text 1                         Text 2
Item    Condition   N    Mean   SD     95% CI          Mean   SD     95% CI
#3      Simple      26   4.23   1.03   [3.83, 4.63]    3.77   1.03   [3.37, 4.17]
        Complex     26   4.65   0.80   [4.62, 4.68]    4.46   1.07   [4.05, 4.87]
#4      Simple      26   4.00   1.17   [3.55, 4.45]    3.62   1.10   [3.20, 4.04]
        Complex     26   4.46   1.42   [3.91, 5.01]    4.19   1.27   [3.70, 4.68]
Total   Simple      26   8.23   1.86   [7.52, 8.95]    7.38   1.92   [6.64, 8.12]
        Complex     26   9.12   2.07   [8.32, 9.92]    8.65   2.08   [7.85, 9.45]

Note.
Maximum value for each item = 7.

and Item as random effects, and each of the four working memory measures was entered into the null models one by one in order to see whether its inclusion significantly improved model fit. The results revealed that nonword repetition scores (NWS) and backward digit span scores (BDS) improved model fit for Text 1 (NWS: χ²(4) = 6.30, p = .04, R² = .23; BDS: χ²(4) = 6.33, p = .04, R² = .23), and forward digit span scores (DS) and nonword repetition scores (NWS) enhanced model fit for Text 2 (DS: χ²(4) = 9.56, p < .01, R² = .15; NWS: χ²(4) = 9.84, p < .01, R² = .15). For these measures, maximal linear mixed-effects models were constructed. As can be seen in Table 6, significant interactions were found between Complexity and nonword span scores for Text 1 and Text 2, and between Complexity and forward digit span scores for Text 2. Next, post-hoc mixed-effects modeling was conducted in order to compare the differential contribution made by each of the working memory measures to reading comprehension scores between the simple and complex conditions. As Table 7 demonstrates, for both Text 1 and Text 2, nonword repetition scores made a significant contribution to L2 reading comprehension scores in the complex conditions. In other words, when assigned to the complex condition, participants with higher nonword repetition scores performed better on the reading comprehension items than those with lower nonword repetition scores.

5. Discussion and conclusion

This study investigated whether task complexity affected Korean undergraduate students' English reading comprehension, and whether working memory capacity moderated the effects of task complexity.
Task complexity of the reading tasks was manipulated by disarranging the five segments of each reading passage, based on the understanding that an incoherent text structure might interrupt reading comprehension and thereby promote careful and attentive reading. The results showed that, even though the actual time on task did not differ between the two task conditions, the participants significantly overestimated the time taken for task completion when assigned to the complex condition, supporting the conclusion that the task manipulation in this study was successful. This finding implies the feasibility of adjusting the cognitive demands of L2 reading tasks by modifying various task features. Furthermore, it seems reasonable to assume that reading tasks become more demanding when they necessitate thorough and scrupulous processing of the text through careful reading (Khalifa & Weir, 2009). That said, more empirical research encompassing various ways of manipulating tasks and their impact on L2 reading performance appears imperative in order to establish and refine a theoretical framework for analyzing and assessing the cognitive complexity of L2 reading tasks. The results of the mixed-effects modeling, however, demonstrated that reading comprehension scores were not affected by the task complexity of the reading tasks. It should be noted that, as shown in the relatively high mean scores and small SDs, participants overall performed well on the reading comprehension tests, and thus a ceiling effect might have masked between-group differences. In addition, as described in the method section, the texts were divided into five segments and the related multiple-choice reading comprehension items followed each segment, as in the original TOEFL reading tests. As such, while the disarrangement of the segments might have impeded global understanding of the text as a whole, it may have had little effect on answering the reading comprehension items.
It should also be noted that there was no time limit and participants were allowed to stay on the task as long as they felt necessary, which might have further contributed to the nonsignificant findings. In this study, when assigned to the complex condition, participants with higher nonword span scores performed better in answering the reading comprehension questions than those with lower nonword span scores. This finding appears to indicate that a larger phonological short-term memory span allowed the participants to retain more textual information, facilitating more efficient handling of the mixed paragraph order. The findings also indicate that, when processing demands increase, phonological short-term memory emerges as a significant factor in addition to complex working memory, which has been supported as a stable explanatory factor in L2 reading comprehension. This finding is particularly noteworthy given that most previous studies have focused on the role of complex working memory in L2 reading comprehension, predominantly using a reading span task, while that of phonological short-term memory has largely been neglected.

Table 3
Descriptive statistics for reading comprehension scores.

                     Text 1                           Text 2
Group     N    Mean    SD     95% CI            Mean    SD     95% CI
Simple    26   11.15   1.35   [10.38, 11.92]    13.23   1.42   [12.46, 14.00]
Complex   26   10.85   2.04   [10.08, 11.62]    13.08   .76    [12.85, 13.85]
Total     52   11.04   2.22   [10.50, 11.58]    12.85   1.85   [12.31, 13.39]

Note. Maximum score = 15.

Table 4
Descriptive statistics for working memory tests.

Tests                      N    Mean    SD     95% CI
Digit span test            52   9.19    1.09   [8.65, 9.73]
Backward digit span test   52   8.02    1.32   [7.48, 8.56]
Nonword repetition test    52   9.04    .86    [8.50, 9.58]
Operation span test        52   65.15   6.35   [64.61, 65.69]

Note. Maximum score: digit span test = 11, backward digit span test = 11, nonword repetition test = 11, operation span test = 75.

Table 5
Correlations among working memory capacity indices.
             DS              NWS             BDS             OSPAN
             Coeff.   Sig.   Coeff.   Sig.   Coeff.   Sig.   Coeff.   Sig.
DS           1
NWS          .50**    .00    1
BDS          .38*     .00    .28*     .05    1
OSPAN        .31*     .03    .29*     .04    -.11     .48    1

Note. DS = digit span scores, NWS = nonword span scores, BDS = backward digit span scores, OSPAN = operation span scores. Significance level: +p < .1, *p < .05, **p < .01.

Table 6
Results of linear mixed-effects models for the interaction between WMC and task complexity on reading comprehension scores.

                                                      Random effects SD
Fixed effects          Estimate   SE    t       by-Subject   by-Item
Text 1
  Intercept            .38        .24   1.56    .00          .02
  Complexity × NWS     .11        .05   2.21
  Formula: RC ~ Complexity * NWS + (COM | Subject) + (1 | Item), R² = .23.
  Intercept            .42        .18   2.40    .00          .04
  Complexity × BDS     -.01       .04   -.33
  Formula: RC ~ Complexity * BDS + (COM | Subject) + (COM | Item), R² = .23.
Text 2
  Intercept            .74        .13   5.747   .00          .01
  Complexity × DS      .07        .03   2.67
  Formula: RC ~ Complexity * DS + (1 | Subject) + (1 | Item), R² = .15.
  Intercept            .48        .16   2.96    .00          .01
  Complexity × NWS     .08        .04   2.17
  Formula: RC ~ Complexity * NWS + (1 | Subject) + (1 | Item), R² = .15.

Note. Significance: |t| > 2.0.

Except for Harrington and Sawyer's (1992) seminal research, which included L2 digit and word span tests, most L2 reading studies have not paid attention to the contribution made by phonological memory to L2 reading comprehension. The pronounced role of phonological short-term memory in the present study, however, warrants caution, as the nonword repetition test was the only working memory measure that included linguistic stimuli. This may provide a partial explanation for why the other phonological short-term memory index, forward digit span scores, did not moderate the effects of task complexity on L2 reading comprehension scores, even though it significantly correlated with nonword span scores (r(50) = .50, p < .01).
Table 7
Results of post-hoc mixed-effects models for the interaction between WMC and task complexity on reading comprehension scores.

                                                            Random effects SD
          Fixed effects   Estimate   SE    t       by-Subject   by-Item
Text 1
Simple    Intercept       1.38       .32   4.32    .44          .11
          NWS             -.07       .04   1.88    .00          .00
  Formula: RC ~ NWS + (NWS | Subject) + (NWS | Item), R² = .29.
Complex   Intercept       -.14       .39   -.35    .07          .48
          NWS             .10        .04   2.51    .00          .00
  Formula: RC ~ NWS + (NWS | Subject) + (NWS | Item), R² = .21.
Text 2
Simple    Intercept       1.05       .20   5.26    .14          .01
          DS              -.01       .02   -.60    .00          .00
  Formula: RC ~ DS + (DS | Subject) + (DS | Item), R² = .15.
Complex   Intercept       .43        .27   1.57    .05          .52
          DS              .05        .03   1.93    .00          .00
  Formula: RC ~ DS + (DS | Subject) + (DS | Item), R² = .15.
Simple    Intercept       .84        .22   3.88    .05          .01
          NWS             .01        .02   .47     .00          .00
  Formula: RC ~ NWS + (NWS | Subject) + (NWS | Item), R² = .15.
Complex   Intercept       .17        .30   .57     .16          .33
          NWS             .08        .03   2.65    .00          .00
  Formula: RC ~ NWS + (NWS | Subject) + (NWS | Item), R² = .15.

Note. Significance: |t| > 2.0.

On a pedagogical level, the findings of this study seem to imply that the cognitive complexity of a reading task, not just the linguistic complexity of the text, should be considered an important factor that influences L2 reading. That is, learners may be under greater processing demands when the task requires more attentive and scrupulous reading of the text, even though this may not necessarily surface in reading comprehension scores. That said, it appears essential to match the cognitive complexity of reading tasks with learners' L2 proficiency so that learners can better cope with the task requirements. For example, with the same text material, cognitively simpler tasks (e.g., reading to locate specific information) may need to be developed for beginning-level learners with limited L2 resources, whereas more demanding tasks (e.g., reading to critique) can be employed for advanced-level learners equipped with ample L2 resources.
This study has several limitations. First, the task manipulation conducted by jumbling paragraphs was shown to affect the participants' perceived level of task difficulty, as measured with subjective time estimation, but not their reading comprehension scores. As noted above, reading comprehension questions followed each paragraph, which might have failed to capture any effects of the task manipulation on L2 reading comprehension scores. Hence, in order to better detect the effects of task complexity in future research, the task complexity manipulation may need to be conducted on a more localized level within each paragraph. Along the same line of logic, a time limit for task completion may place additional cognitive demands on learners, and in so doing, increase the likelihood of observing the effects of task complexity, as well as those of working memory capacity, on reading comprehension scores. Next, it was speculated that a ceiling effect could have masked the effects of task complexity on reading comprehension scores. Indeed, mean scores were high while variances were small, suggesting an inherent limitation in detecting significant effects of task complexity on reading comprehension scores. Therefore, in future studies, more difficult reading comprehension items, ideally in greater number, would be desirable to expand the variance among the participants' scores. With respect to the moderating role of working memory capacity, as noted earlier, the nonword repetition task was the only measure that entailed linguistic elements, suggesting a possible domain-specific influence. Despite these limitations, the present study sheds light on the possibility of manipulating the cognitive complexity of L2 reading tasks, which has long been neglected in the fields of both task-based language teaching and L2 reading research.
Given the scant research on the relationship between cognitive task demands and learners' performance in L2 reading tasks, more studies are necessary to accumulate empirical findings and to establish a theoretical framework that can explain and predict the effects of task complexity on L2 reading comprehension. In addition, the pronounced importance of nonword span scores in answering reading comprehension items under increased task demands indicates the need to take phonological short-term memory into account when exploring the relationship between working memory capacity and L2 reading comprehension. Last but not least, future research incorporating process measures, obtained through verbal reports or eye-tracking technology, may provide a fuller picture of the relationship between task complexity and working memory capacity and its theoretical, methodological, and pedagogical implications in the context of L2 learning through L2 reading.

Appendix. Sample task layout of simple and complex task conditions

References

Alptekin, C., & Erçetin, G. (2009). Assessing the relationship of working memory to L2 reading: Does the nature of comprehension process and reading span task make a difference? System, 37, 627–639.
Alptekin, C., & Erçetin, G. (2011). Effects of working memory capacity and content familiarity on literal and inferential comprehension in L2 reading. TESOL Quarterly, 45, 235–266.
Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics. Cambridge: Cambridge University Press.
Baddeley, A. D. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4(11), 417–423.
Baddeley, A. D. (2003). Working memory and language: An overview. Journal of Communication Disorders, 36, 189–208.
Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 8, pp. 47–89).
San Diego, CA: Academic Press.
Baralt, M. (2010). Task complexity, the Cognition Hypothesis, and interaction in CMC and FTF environments. Unpublished Ph.D. dissertation. Washington, D.C.: Department of Spanish and Applied Linguistics, Georgetown University.
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68, 255–278.
Barton, K. (2015). MuMIn: Multi-model inference. R package version 1.13.4. http://cran.r-project.org/package=MuMIn.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
Block, R. A., Hancock, P. A., & Zakay, D. (2010). How cognitive load affects duration judgments: A meta-analytic review. Acta Psychologica, 134, 330–343.
Breen, M. P. (1987). Learner contributions to task design. In C. N. Candlin, & D. Murphy (Eds.), Language learning tasks. Lancaster practical papers in English language education (Vol. 7, pp. 23–46). Englewood Cliffs, NJ: Prentice-Hall International.
Brunfaut, T., & Révész, A. (2015). The role of task and listener characteristics in second language listening. TESOL Quarterly, 49, 141–168.
Case, R., Kurland, M. D., & Goldberg, J. (1982). Operational efficiency and the growth of short-term memory span. Journal of Experimental Child Psychology, 33, 386–404.
Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11(6), 671–684.
Craik, F. I. M., & Tulving, E. (1975). Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology: General, 104, 268–294.
Cunnings, I., & Sturt, P. (2014). Coargumenthood and the processing of reflexives. Journal of Memory and Language, 75, 117–139.
Daneman, M., & Carpenter, P. (1980).
Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19, 450–466.
Educational Testing Services. (2012). Official TOEFL iBT tests. New York: McGraw-Hill.
Fink, A., & Neubauer, A. C. (2001). Speed of information processing, psychometric intelligence, and time estimation as an index of cognitive load. Personality and Individual Differences, 30, 1009–1021.
Freedle, R., & Kostin, I. (1999). Does the text matter in a multiple-choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing, 16, 2–32.
Gagné, C. L., & Spalding, T. L. (2009). Constituent integration during the processing of compound words: Does it involve the use of relational structures? Journal of Memory and Language, 60, 20–35.
Gathercole, S. E. (1995). Is nonword repetition a test of phonological memory or long-term knowledge? It all depends on the nonwords. Memory & Cognition, 23(1), 83–94.
Gathercole, S. E., & Baddeley, A. D. (1990). Phonological memory deficits in language disordered children: Is there a causal connection? Journal of Memory and Language, 29, 336–360.
Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press.
Gilabert, R., Manchón, R., & Vasylets, O. (2016). Mode in theoretical and empirical TBLT research: Advancing research agendas. Annual Review of Applied Linguistics, 36, 117–135.
Grabe, W. (2009). Reading in a second language: Moving from theory to practice. Cambridge: Cambridge University Press.
Harrington, M., & Sawyer, M. (1992). L2 working memory capacity and L2 reading skill. Studies in Second Language Acquisition, 14, 25–38.
Horiba, Y. (2000). Reader control in reading: Effects of language competence, text type, and task. Discourse Processes, 29, 223–267.
Jung, J. (2012). Relative roles of grammar and vocabulary in different L2 reading tasks. English Teaching, 67(1), 57–77.
Jung, J. (2016).
Effects of task complexity on L2 reading and L2 learning. English Teaching, 71(4), 141–166.
Khalifa, H., & Weir, C. J. (2009). Examining reading: Research and practice in assessing second language learning. Cambridge, UK: Cambridge University Press.
Kim, Y.-J. (2012). Task complexity, learning opportunities and Korean EFL learners' question development. Studies in Second Language Acquisition, 34, 627–658.
Kim, Y.-J., Payant, C., & Pearson, P. (2015). The intersection of task-based interaction, task complexity, and working memory: L2 question development through recasts in a laboratory setting. Studies in Second Language Acquisition, 37, 549–581.
Kintsch, W., & van Dijk, T. A. (1978). Toward a model of text comprehension and production. Psychological Review, 85, 363–394.
Kormos, J., & Trebits, A. (2011). Working memory capacity and narrative task performance. In P. Robinson (Ed.), Researching second language task complexity: Task demands, language learning and language performance (pp. 267–285). Amsterdam: John Benjamins.
Leeser, M. J. (2007). Learner-based factors in L2 reading comprehension and processing grammatical form: Topic familiarity and working memory. Language Learning, 57, 229–270.
Long, M. H. (2016). In defense of tasks and TBLT: Non-issues and real issues. Annual Review of Applied Linguistics, 36, 5–33.
Meyer, B. J. F., & Ray, M. N. (2011). Structure strategy interventions: Increasing reading comprehension of expository text. International Electronic Journal of Elementary Education, 4(1), 127–152.
Michel, M. C. (2013). The use of conjunctions in cognitively simple versus complex oral L2 tasks. The Modern Language Journal, 97, 178–195.
Norris, J. M., Bygate, M., & van den Branden, K. (2009). Task-based language assessment. In K. van den Branden, M. Bygate, & J. M. Norris (Eds.), Task-based language teaching: A reader (pp. 431–434). Amsterdam/Philadelphia: John Benjamins Publishing Company.
Nuevo, A.-M., Adams, R., & Ross-Feldman, L. (2011).
Task complexity, modified output, and L2 development in learner-learner interaction. In P. Robinson (Ed.), Researching second language task complexity: Task demands, language learning and language performance (pp. 175–201). Amsterdam, The Netherlands: John Benjamins.
Osaka, M., & Osaka, N. (1992). Language-independent working memory as measured by Japanese and English reading span tests. Bulletin of the Psychonomic Society, 30, 287–289.
Osaka, M., Osaka, N., & Groner, R. (1993). Language-independent working memory: Evidence from German and French reading span tests. Bulletin of the Psychonomic Society, 31, 117–118.
Paas, F., Tuovinen, J. E., Tabbers, H., & Van Gerven, P. W. M. (2003). Cognitive load measurement as a means to advance cognitive load theory. Educational Psychologist, 38, 63–71.
Perfetti, C. A. (1999). Comprehending written language: A blueprint of the reader. In C. M. Brown, & P. Hagoort (Eds.), The neurocognition of language (pp. 167–210). Oxford: Oxford University Press.
Plonsky, L., & Oswald, F. L. (2014). How big is "big"? Interpreting effect sizes in L2 research. Language Learning, 64, 878–912.
R Development Core Team. (2016). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/.
Redick, T. S., Broadway, J. M., Meier, M. E., Kuriakose, P. S., Unsworth, N., Kane, M. J., et al. (2012). Measuring working memory capacity with automated complex span tasks. European Journal of Psychological Assessment, 28, 164–171.
Révész, A. (2009). Task complexity, focus on form, and second language development. Studies in Second Language Acquisition, 31, 437–470.
Révész, A. (2011). Task complexity, focus on L2 constructions, and individual differences: A classroom-based study. Modern Language Journal, 95, 162–181.
Révész, A. (2014). Towards a fuller assessment of cognitive models of task-based learning: Investigating task-generated cognitive demands and processes.
Applied Linguistics, 35, 87–92.
Robinson, P. (1995). Task complexity and second language narrative discourse. Language Learning, 45, 99–140.
Robinson, P. (2001). Task complexity, task difficulty, and task production: Exploring interactions in a componential framework. Applied Linguistics, 22, 27–57.
Robinson, P. (2011). Researching second language task complexity: Task demands, language learning and language performance. Amsterdam: John Benjamins.
Rogers, J. R. (2016). Developing implicit and explicit knowledge of L2 case marking under incidental learning conditions. Unpublished dissertation. London, UK: University College London Institute of Education.
Sasayama, S. (2016). Is a 'complex' task really complex? Validating the assumption of cognitive task complexity. The Modern Language Journal, 100, 231–254.
Skehan, P. (2009). Modelling second language performance: Integrating complexity, accuracy, fluency, and lexis. Applied Linguistics, 30, 510–532.
Stanovich, K. E. (1980). Toward an interactive-compensatory model of individual differences in the development of reading fluency. Reading Research Quarterly, 16, 32–71.
Taillefer, G. E. (1996). L2 reading ability: Further insight into the short-circuit hypothesis. The Modern Language Journal, 80, 461–477.
Thomas, E. A. C., & Weaver, W. B. (1975). Cognitive processing and time perception. Perception & Psychophysics, 17, 363–367.
Turner, M. L., & Engle, R. W. (1989). Is working memory capacity task dependent? Journal of Memory and Language, 28, 127–154.
Unsworth, N., Heitz, R., Schrock, J. C., & Engle, R. W. (2005). An automated version of the operation span task. Behavior Research Methods, 37, 498–505.
Waters, G. S., & Caplan, D. (1996). The measurement of verbal working memory capacity and its relation to reading comprehension. The Quarterly Journal of Experimental Psychology Section A, 49, 51–79.