
Journal of Ethnographic & Qualitative Research
2008, Vol. 2, 269-280
ISSN: 1935-3308

USE OF A CODING MANUAL WHEN PROVIDING A META-ANALYSIS OF INTERNAL-VALIDITY MECHANISMS AND DEMOGRAPHIC DATA REPORTED IN PEER-REVIEWED EDUCATIONAL JOURNALS

Anna O. Soter, Ohio State University
Sean P. Connors, Ohio State University
Lucila Rudge, Ohio State University

Author Note: Anna Soter, Ph.D., is Associate Professor of Education at Ohio State University in Columbus, Ohio. Sean P. Connors is a graduate assistant at Ohio State University in Columbus, Ohio. Lucila Rudge is a graduate associate at Ohio State University in Columbus, Ohio. Correspondence concerning this article should be addressed to soter.1@osu.edu.

In a prior investigation (Wilkinson, Soter, & Murphy, 2004), we conducted a meta-analysis of how quantitative studies use small group discussions to promote high-level thinking. In the present project, our initial intent was to develop an equivalent mechanism for qualitative studies that focused on the same subject. However, our efforts to tease out effects versus claims and to identify the measures used in the studies led to an evaluative coding manual. Our findings revealed that the majority of studies we investigated either neglected to provide sufficient background information regarding their participant populations or failed to contextualize the settings in which the studies occurred.

This article emerged from issues Soter encountered during a 3-year qualitative investigation (Wilkinson, Soter, & Murphy, 2004, 2007) funded by the Spencer Foundation which, in its first year, focused on identifying converging evidence about the use of group discussions to promote high-level thinking and comprehension of literary text. In that project, we conducted an intensive narrative analysis of extant research drawn from nine different approaches to small group discussion. Subsequently, we included studies that used quantifiable measures in a meta-analysis of effect sizes for the use of discussion in the development of high-level comprehension. We excluded 74 qualitative studies from the meta-analysis because their methodology precluded inclusion. Such exclusion raises issues about what is considered scientifically-based evidence (Almasi, Garas, & Shanahan, 2002). The exclusion of qualitative studies also eliminates findings that such research foregrounds, namely, information regarding contexts that not only may support instruction but also often may provide detailed information about participants, their histories, and factors that may influence performance (Geertz, 1983). Most of the excluded studies included information about how discussion groups promoted the use of authentic literary texts to encourage high levels of motivation and student engagement; eliminating these studies from meta-analyses eliminated potential instructional alternatives in the field of reading comprehension.

A paucity of research exists on the use of meta-analysis with qualitative studies in the field of reading comprehension. To date, the published research includes two studies (Almasi et al., 2002; Almasi, Palmer, Gambrell, & Pressley, 1994). Those authors attempted such an inquiry first with 6 qualitative studies of whole-language research and, subsequently, with 12 qualitative studies that had been excluded from the National Reading Panel meta-analysis of reading comprehension research published from 1979 to the present.
Despite the dearth of meta-analyses (or their equivalent) in qualitative research, qualitative inquiry remains important in overall educational research. As Pressley (2002) summed up: "It [qualitative research] allows for research that evaluates a phenomenon that is actually happening in real schools, not something invented by the researcher who drops into schools for a few days and leaves" (p. 39). In an extensive evaluation of qualitative methods, Erickson (1986) argued that qualitative research elicits "the immediate and local meanings of actions as defined from the actor's point of view" (p. 96). He further stressed that interpretive methods are "appropriate when one needs to know more about the specific structure of occurrences rather than their general character and overall distribution" (p. 98). Nevertheless, the National Reading Panel (National Reading Report, 2000) generated considerable debate among reading comprehension researchers when it excluded qualitative studies from a meta-analysis of studies identified as representing scientifically-based research, thereby implying that qualitative methods do not count as scientific research.

Why might qualitative research in education benefit from being represented in meta-analyses or their equivalent? We believe the answer is that meta-analyses provide a vehicle for parsimonious summation of evidence in support of the efficacy of instructional approaches or interventions, such as Hillocks' (1986) comprehensive meta-analysis of the negative or positive impact of grammar instruction on writing performance among K-12 students. Furthermore, in the process of conducting meta-analyses, researchers have an opportunity to evaluate the relationships between claims and evidence.

With respect to qualitative research, Eisner and Peshkin (1990) suggested that several challenges may manifest in the process. The first concerns the ease with which educational administrators, teachers, parent groups, and policymakers are able to interpret data disseminated from qualitative research. The authors observed that cogently and lucidly articulated findings in qualitative research may beguile non-researchers who influence policy or instructional decision-making into believing that the findings themselves have value. A second challenge concerns whether teacher research (often identified as a form of qualitative research) should be included in meta-analyses. Eisner and Peshkin (1990) acknowledged that such research is of particular interest to teachers who might want to conduct similar research in their own classrooms but may not have broader relevance. A third challenge involves identifying, in works intended for publication, the information best included in descriptions of methodology. How much detail, for example, should the researcher provide about decisions and procedures with respect to data collection, or about procedures undertaken in preparation for and during data analysis? Quite different answers may emerge depending on whether the report is intended for the research community, classroom practitioners, or educational policymakers and administrators. On the other hand, if, in the interest of focus and brevity, qualitative researchers choose to omit a detailed account of such procedures, they may find their work challenged on the grounds that their procedures lack transparency or that interpretations of their data are not supported.
These challenges are likely to endure since qualitative inquiry generally is process oriented and, according to Denzin and Lincoln (2000), "focused on qualities of entities, processes and meanings that are not experimentally measured in terms of quantity, amount, intensity, or frequency" (p. 9). The authors also argued that because qualitative researchers view reality as socially constructed, they typically are sensitive to and account "for the intimate relationship between the researcher and what is studied, and the situational constraints that shape inquiry" (p. 9). These are legitimate arguments in support of the particular, rather than the generalizable, and they underscore essential differences between qualitative and experimental/quasi-experimental inquiry. Robinson, Levin, Thomas, Pituch, and Vaughn (2007) expressed concern that, in the absence of comparison groups where intervention occurs in educational settings, qualitative research often does not provide convincing evidence that such intervention improves educational practice. The authors found that causal conclusions in teaching-and-learning research journals nearly doubled between 1994 and 2004 and that, in the same period, causal conclusions also increased in non-intervention studies in which variables were not manipulated. They concluded the following regarding an unfortunate consequence of such reporting: "Our participants made no distinction between experimental, correlational, or qualitative methodologies when it came to the validity of causal claims" (p. 408). Almasi et al. (2002) found that studies included in their meta-interpretive analysis of qualitative research in reading comprehension lacked rigor: "Often these studies simply interviewed or observed individuals and reported findings without any type of data reduction or analysis" (p. 23). These troubling trends also were apparent in the studies included in the intensive narrative analysis of extant research drawn from nine different approaches to small group discussion to which we previously referred.

We did not intend to critique the qualitative studies selected for our project; rather, our initial objective was to develop a coding tool that would serve a meta-analysis of qualitative studies regarding the use of small group discussions as a mechanism for high-level comprehension. In developing that tool, we also wanted to identify what the researchers' rich contextual descriptions could reveal about how discussions might impact high-level thinking. We also hoped that the tool would enable us to efficiently convey the value of qualitative research in providing what Erickson (1986) argued is information about the contexts that operate in the everyday life of the classroom: "the fine shadings of local meaning and social organization that are of primary importance in understanding learning" (p. 130).

We developed our coding manual of 104 items for two major reasons. First, we designed it in response to Soter's frustration with the exclusion of qualitative studies from the meta-analysis of quantitative studies previously referenced. Second, as noted previously, Soter had already conducted an exhaustive narrative interpretation in the first year of the project from which this article's literature review was drawn.
That process revealed gaps in what the studies reported, among them the duration of data collection, the nature of the conceptual/theoretical frameworks, information regarding students and teachers in the studies, and evidence in support of claims for effects. Following is a description of the procedures used to develop the coding manual.

Method

Development of the Coding Manual

The codes and coding categories emerged initially from Erickson's (1986) criteria for sound qualitative research: (a) adequacy of description of the contexts of the study, (b) adequacy of description of methodology and analysis of data, and (c) adequacy of information regarding the interpretation of the data. We also included codes similar to ones used in the quantitative meta-analysis developed for the project from which our studies were drawn: (a) codes relating to population and sample size, (b) codes delineating other general contextual information, (c) codes for the general nature of the methods used and the theoretical frames of the studies, and (d) codes for whether the studies were conducted by initial proponents of a particular discussion approach or by others who were interested in extending original findings. Briefly, the coding manual contained the following categories: (a) publication and coder information, (b) total sample and special characteristics of sample, (c) research design, (d) research methodology, (e) analysis and interpretation of findings, (f) claimed effects, and (g) evidence for effects. Appendix A contains an excerpt from the coding manual.

Sample

The final "participants" for this study were 49 studies selected from a total of 74 qualitative studies that were part of a larger research project regarding the use of small group discussions of literary text in the development of high-level comprehension (Wilkinson, Soter, & Murphy, 2004). The studies were drawn from the database developed for a larger study on classroom talk funded by the US Department of Education (2003-2006), for which the principal author of the present study was a Co-Principal Investigator. That larger study included all extant, published research (whether in the form of journal articles, dissertations, or technical reports) related to group discussions of literary texts and identified with one of nine small group discussion approaches that had a track record of publication. Studies for inclusion were identified through an exhaustive search of the PsycINFO, ERIC, Digital Dissertations, Social Sciences Citation Index, and Educational Abstracts databases. See Table 1 for study sources.

In the present research, the 74 qualitative studies included in this meta-interpretation were drawn from that larger pool. Because we wanted a representative sampling of qualitative studies from all nine discussion approaches, we selected a random sample of 60% of the studies identified for each of the nine approaches. We further excluded studies that contained quantitative data (i.e., mixed methodology). As studies were eliminated for this reason, they were randomly replaced with others from the remaining 40% of the total pool. The primary purpose of the present study was to determine whether we could fairly and informatively assess the efficacy of qualitative studies in providing information about instructional outcomes and whether our coding manual could identify what qualitative studies do and do not tell us about the reported research. Therefore, the studies themselves were not the primary focus of our research. Rather, we used the studies as tools to address the two purposes above.
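The sampling procedure described above (a 60% random sample within each of the nine discussion approaches, with mixed-methodology studies replaced at random from the remaining 40% of the pool) can be illustrated with a brief sketch. This is a hypothetical reconstruction, not the authors' actual procedure; the data structures, function names, and example study IDs are invented for illustration.

```python
import random

def sample_studies(pool_by_approach, is_mixed_methods, proportion=0.6, seed=1):
    """Draw a random sample within each discussion approach and replace any
    mixed-methodology study with one drawn from the unused remainder."""
    rng = random.Random(seed)
    selected = {}
    for approach, studies in pool_by_approach.items():
        studies = list(studies)
        rng.shuffle(studies)
        cutoff = round(proportion * len(studies))
        chosen, reserve = studies[:cutoff], studies[cutoff:]
        kept = []
        for study in chosen:
            # If a chosen study is mixed-methods, swap in a random replacement
            # from the remaining pool until a purely qualitative one is found.
            while is_mixed_methods(study) and reserve:
                study = reserve.pop(rng.randrange(len(reserve)))
            if not is_mixed_methods(study):
                kept.append(study)
        selected[approach] = kept
    return selected

# Hypothetical usage: study IDs per approach and a predicate flagging
# mixed-methods studies (here, IDs ending in "m").
pool = {"Approach A": ["a1", "a2m", "a3", "a4", "a5"],
        "Approach B": ["b1", "b2", "b3m", "b4"]}
print(sample_studies(pool, lambda s: s.endswith("m")))
```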
Procedures for Training

Prior to coding, we read each study using pre-coding reading guidelines (illustrated in Figure 1) to familiarize ourselves with its content, orientation, theoretical framework, methods, analyses, findings, and interpretations. We began our coding procedure with an initial reading of each study, followed by a meeting to clarify questions that arose about the study. Next, each of us coded the study independently and then met to discuss discrepancies. After the first two studies were coded in this manner, we met again to discuss coding reliability issues. At this meeting, we shared and entered our recorded notes about the coding process and the use of the manual and made any necessary changes to the manual before proceeding with the next set of two studies. After we coded a total of six studies using this process, our inter-rater reliability had reached criterion (81-84%). We then coded eight more studies (two per week by each coder). Following completion of coding these eight studies, we reached stable agreement of over 80% for each study. The studies that we utilized for this trial period were not used again, nor were their results incorporated in our final results.

TABLE 1: Sources for the Studies

Studies were drawn from the following journals:
Research in the Teaching of English
The Reading Teacher
Reading Horizons
Bilingual Research Journal
Reading Research Quarterly
Reading and Writing Quarterly
Journal of Adolescent and Adult Literacy
Remedial and Special Education
Primary Voices
Journal of Reading Education
Reading Research and Instruction
The New Advocate
Journal of Educational Research
The Ohio Reading Teacher
Journal of Teacher Education
Journal of Reading Behavior
Language Arts
Journal of Moral Education
Journal of Literacy Research
Childhood Education
Urban Education
Mind, Culture, and Activity
Thinking
Cognition and Instruction

33 studies were drawn from the above journals.
6 studies were unpublished dissertations.
7 studies were chapters in books published by the National Reading Conference or in National Reading Conference Yearbooks.
2 studies were technical reports.
1 study was a published book.

All studies focused on the use of group discussions about and around literary texts with the following purposes: fostering engagement with literary texts and improving reading performance, specifically comprehension.

FIGURE 1: Pre-Coding Reading and Note-Taking Guidelines for Articles to be Coded

1. The scope of the project
2. Subjects/participants (age; gender; ethnicity; number of participants)
3. Context and rationale for the study
4. Focus/research question(s)
5. Theoretical/conceptual framework
6. Research methodology and methods
7. How the data were analyzed
8. How the data were interpreted
9. Appropriateness of data and methodology for the research question(s)
10. The presence (or absence) of qualitative tools such as triangulation and member-checking to establish validity
11. The use of measures for assessing effects against claims
12. Written summary of findings
13. Additional notes
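Because the training phase hinged on item-level percent agreement between coders, a minimal sketch of that calculation may be helpful. It is illustrative only, not the authors' instrument, and assumes hypothetical dictionaries mapping coding-manual item numbers to the codes each coder assigned.

```python
def percent_agreement(coder_a, coder_b):
    """Percent agreement across coded items for one study.

    coder_a and coder_b are hypothetical dicts mapping item numbers
    (e.g., 21, 22, ... from the coding manual) to the codes each coder
    assigned. Items missing from either coder are ignored.
    """
    shared = set(coder_a) & set(coder_b)
    if not shared:
        return 0.0
    matches = sum(1 for item in shared if coder_a[item] == coder_b[item])
    return 100.0 * matches / len(shared)

# Example: two coders agreeing on 2 of 3 items yields 66.7% agreement,
# below the >80% criterion described above.
a = {21: 2, 22: 3, 24: 1}
b = {21: 2, 22: 3, 24: 4}
print(round(percent_agreement(a, b), 1))  # 66.7
```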
We coded the remaining 49 studies (drawn from the total pool of 74 qualitative studies) in sets of three per week over a period of six months, employing peer debriefings (Denzin & Lincoln, 2000) and cross-checking of data interpretation throughout the duration of coding. Procedures for coding are provided in Appendix B.

Procedures for Recording Ongoing Changes to the Coding Manual

We recognized that we were developing the coding manual in a reiterative process and that we would need to modify and revisit codes even as coding continued. Therefore, we revisited all initial coding of the studies following any changes to the manual. We kept notes about the coding categories, the coding process, issues of ambiguity in codes, and our refinements to coding categories, as well as any codes we eliminated or that appeared to be duplications. Throughout the project, we met weekly and maintained additional contact through telephone and email communication. We made no changes to the coding manual without mutual consultation. When coding was completed, we revisited and discussed all notes and made final changes to codes in the manual. Following initial training in the application of the codes, changes to the manual were few and of a minor nature.

Once we coded the studies, we entered the codes into a specially prepared database, an Excel spreadsheet comprising 104 columns, one for each code. We entered only the final agreed code, although we retained all original paper coding. Additionally, we entered any notes made in the margins of our paper coding log. To guard against error, we read aloud the information in the database and double-checked it against the data in the coding log, repeating the process after the entry of every five studies. As a final safeguard, we arranged the database to tabulate the sum of the data entered in each column. To ensure accuracy, we then checked these totals against the point totals each question elicited across the 49 studies. As a result, we were able to identify and rectify perceived inconsistencies.

We then viewed the data in several ways. As well as counting the responses each question elicited, we expressed the data in the form of percentages. We were also able to identify studies conducted by teacher-researchers (nine studies) and academic researchers (40 studies), acknowledging the validity of the argument that Donmoyer (2001) and others have made in support of viewing teacher research as a nested type of qualitative research. However, any grouping of the studies in this way was a post-analysis grouping and did not influence the selection of studies included in our analysis.
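The final safeguard described above, tabulating each column's sum and checking it against the point total each question elicited, can be sketched in a few lines. The data structures below are hypothetical stand-ins for the 104-column spreadsheet and the independently tallied totals; this is an illustration under those assumptions, not the authors' actual workbook.

```python
def verify_column_totals(coded_studies, expected_totals):
    """Compare per-question column sums against independently tallied totals.

    coded_studies: list of dicts, one per study, mapping question numbers to
        the final agreed numeric code (a stand-in for the spreadsheet columns).
    expected_totals: dict mapping question numbers to the point total that
        question elicited across all studies, tallied separately.
    Returns the questions whose sums disagree, for follow-up checking.
    """
    discrepancies = {}
    for question, expected in expected_totals.items():
        column_sum = sum(study.get(question, 0) for study in coded_studies)
        if column_sum != expected:
            discrepancies[question] = (column_sum, expected)
    return discrepancies

# Example with three hypothetical studies and two questions:
studies = [{21: 2, 22: 3}, {21: 4, 22: 1}, {21: 1, 22: 3}]
print(verify_column_totals(studies, {21: 7, 22: 8}))  # {22: (7, 8)}
```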
Data Analysis

To guide our data analysis, we revisited the research questions that framed the study: (a) Can we informatively and fairly assess "effects" claimed by qualitative studies, particularly those that purport to have pedagogical significance, by using a tool developed from criteria drawn from qualitative research to assess the efficacy of that research in providing parsimonious information about instructional outcomes? (b) What information could such a tool and analysis yield, and to what extent can the tool function as a mechanism for the inclusion of qualitative studies in federal and state meta-analyses of extant research in the field of literacy? To answer these questions, we approached our analysis of the data using the conceptual categories established in the coding manual. For example, data that stemmed from questions related to participant populations were approached as a group, as were questions that pertained to research methodology and claims for effects. By approaching the data in this manner, we were able to identify recurring themes and patterns and highlight their relevance to the larger questions the study sought to answer. During the process of data analysis, we met regularly to share notes about our pre-coding readings of the studies, discuss the coding process, identify interpretive inconsistencies, and locate instances of disconfirming evidence. Three major thematic findings emerged that related to methodology, analysis and interpretation, and relationships between claims and the evidence provided in support of those claims.

Recall that one of our research questions sought to determine whether it was possible to informatively and fairly assess the "effects" qualitative studies claimed, particularly ones that purport to have pedagogical significance, by using a tool developed from criteria drawn from qualitative research to assess the efficacy of that research in providing parsimonious information about instructional outcomes. Our coding manual assessed issues related to credibility, transferability, dependability, and confirmability. We considered a study's credibility high when researchers practiced prolonged engagement, employed triangulation, included negative case analysis, made use of member checks, and validated findings in peer debriefings. Transferability, on the other hand, was considered high when researchers offered thick, detailed descriptions regarding various facets of the study. Confirmability and dependability are related to reliability. For example, if coding was employed, we asked: "How did the researcher account for agreement as to those codes or provide a reasonable account of how such codes were developed and of the interpretive measures used to draw inferences from them?"

Results

Despite the emphasis placed on thick description by many proponents of qualitative research (e.g., Denzin & Lincoln, 2000), the majority of the studies investigated either neglected to provide sufficient background information regarding their participant populations or failed to adequately contextualize the settings in which the studies occurred. These issues were captured, for example, in questions that focused on the duration of data collection, the frequency of data collection, and contextual information (e.g., the nature of the curriculum). Of the 49 analyzed studies, 33% reported having collected data over a period of 0 to 3 months. Data were reported as having been gathered over a period of 7 to 9 months (the equivalent of one school year) in 20% of the studies, while only 10% reported data collection as having spanned more than 9 months. Surprisingly, 31% of the studies investigated did not account for the duration of data collection at all. Similar variations were evident in reports pertaining to the frequency with which data were collected. For example, data were said to have been collected once a week in 10% of the studies, twice a week in 12%, and three or more times a week in 10%. Half of the studies failed to account for the frequency of data collection, and 18% relied on intermittent data collection, which made it difficult to establish a timeline for the investigators' presence in the classroom with any degree of certainty.
The majority of the studies did not provide sufficient demographic information for us to adequately define the sample populations. Of the studies investigated, 43% did not account for the ethnicity of the students in their sample populations. Similarly, 53% did not identify the sample population's socioeconomic status, while 48% did not account for the population's academic ability level. Additionally, 58% did not identify gender proportions in the sample population. The percentage of limited-English-proficiency students in the sample populations was not reported in 75% of the studies, although the majority reported the research as having taken place in an urban setting.

The responses generated by question #24 in the coding manual ("What kind of background information was provided about students?") also attested to the relative dearth of information provided about the studies' participant populations. For example, 18% of the studies yielded comprehensive background information regarding the participant populations, while 22% provided a fair amount of information. Although 35% of the studies provided limited background information (two or fewer items), 25% failed to provide any background information.

Background information regarding the teachers who participated in the various studies was equally limited. Of the studies that included a classroom teacher in the sample population, 61% did not provide information regarding the number of years the teachers had taught; 18% of the studies provided substantive background information about the teacher(s), 29% yielded limited background information, and 39% failed to provide any such background information. Given the nature of the intervention in all the studies we analyzed (i.e., the use of small group discussions to foster high-level comprehension), we expected that the role of teachers, and hence the nature and extent of their teaching experience, orientation toward instruction, and professional development, would have been foregrounded in these studies. That is, successfully implementing and sustaining group discussions as an instructional approach seems to require a significant change in the role of teachers and a level of professional experience and expertise that enables teachers to manage discussions effectively so that learning goals are met.

Given that the majority of the studies provided little background information regarding their respective sample populations, one might expect the potential for transferability to be rather low. This problem was further complicated by the lack of information regarding the contexts in which the studies occurred. Nearly half of the studies we examined (49%) did not account for the demographics of the settings in which the research took place.

Measures

We found that, although measures were taken to account for trustworthiness (i.e., triangulation, negative case analysis, member checking), they were not employed with the degree of frequency we would expect to find in the field of qualitative research. Furthermore, while measures of trustworthiness typically were employed during data collection, their use extended to data analysis and interpretation less frequently. Of the 49 studies analyzed, we found that efforts to account for credibility generally were strongest in (a) triangulation of data and (b) availability of raw data.
Data collection was triangulated in 69% of the coded studies, while methods of analysis were triangulated in 59%. Extensive links to raw data were evident in 73% of the studies, while only 2% provided no links. Although more than half (55%) of the studies accounted for the trustworthiness of their data analyses in a substantial way, 22% did so to a limited extent, and 21% neglected to provide any account of the trustworthiness of their analyses. While instances of negative case analysis (or disconfirming evidence) were identified in 59% of the studies, member checking was used less frequently. Of the 49 coded studies, 6% used member checking on a limited basis, and 31% used the construct in a substantive manner. Surprisingly, 45% of the studies neglected to use member checking when doing so appeared appropriate to us. Member checking did not constitute a part of the methodological approach employed in 18% of the studies that we investigated.

That measures used to account for trustworthiness extended less frequently to data analysis and interpretation was further evidenced by responses to two questions in the coding manual. Question #51 asked: "To what extent was the subjectivity of the research/researchers foregrounded by the researcher?" To answer the question, we considered triangulation of data sources, analysis, and interpretation high in objectivity and teacher or student reports of perceived improvement low in objectivity. Fewer than half (47%) of the studies foregrounded their subjectivity to a great extent, while 29% did so moderately. On the other hand, subjectivity was foregrounded to a low extent in nearly one quarter (25%) of the studies we investigated. Question #62 in the coding manual sought to determine the extent to which the findings of a study were nuanced: "Are findings nuanced contextually to account for complexity, interrelatedness, ambiguity related to the relativity of findings (e.g., in relation to contextual factors, individual preferences, unpredictable factors)?" We found that, although 49% of the studies nuanced their findings to account for mitigating factors, the majority (51%) did not.

Claims for Effects

Although the studies representing the nine discussion approaches all investigated the use of an instructional intervention (i.e., a particular discussion approach), the intervention was an established part of the regular curriculum in 30% of the studies investigated and had only recently been introduced in 59%; the remaining 10% did not indicate which was the case. Claims that discussion improved students' interest in and engagement with reading and led to probing and inquiry were supported by evidence in 39% of the studies. Claims that discussion led to students' critique and evaluation of the texts read were supported by evidence in 22% of the studies. Other claims for which evidence was reported included (a) change in discourse patterns during discussions (33%), (b) students facilitating each other's learning (29%), and (c) benefits of teacher modeling in discussion (41%). Given that the discussion approaches under investigation had only recently been introduced in the majority of the studies, and considering that the overwhelming majority of the studies (94%) presented claims for the effects of these approaches, one might have expected prolonged, persistent observation to have played a greater role in the methodological designs of the studies.
Perhaps the relatively short duration of observations accounts for the lack of reportable evidence in support of those claims. This general lack of congruence between claims and reportable evidence suggests a dilemma in qualitative research: how to capture what is occurring in the live, day-to-day interaction of a classroom being observed and report it in such a way that others who did not participate in the observations are convinced that what was seen and/or heard does indeed count as evidence.

Discussion & Implications

In light of our findings, we conclude that the coding manual potentially may distinguish between sound and unsound qualitative studies. Whether this tool can function as a mechanism for the inclusion of qualitative studies in federal and state meta-analyses has yet to be argued conclusively. As noted in the introduction, we did not initially intend for our coding manual to serve an evaluative function; however, in the process of reading and coding the studies, we found methodological, analytic, and interpretive inconsistencies in enough of them that our tool morphed from a meta-analytic instrument into one that is more evaluative in nature. Similar to Almasi et al. (2002), we found that limitations and barriers accompanied decisions about whether to include studies in a meta-analysis of qualitative research, including (a) difficulty classifying qualitative studies, (b) lack of rigor in research design and analysis, (c) lack of systematicity in data collection and analysis, and (d) limited (if any) data reduction or in-depth analysis.

It is all too easy to read qualitative research globally and to lose oneself in the narratives and details that qualitative studies typically provide. A problem arises, however, when such reading leads to the acceptance of victory narratives, a term we use to describe studies that put forth claims in the absence of supporting evidence. The coding manual has the potential to counteract this problem and provide a conceptual framework readers could use to approach qualitative research in a fine-grained way. By directing readers' attention to a given study's methodological design, as well as to its claims, the coding manual provides a tool that can be used to tease out the information that is (and is not) provided in narrative inquiry. Consequently, it may become possible to distinguish between well designed and poorly designed qualitative studies.

We recognize that our study challenges the nature of qualitative inquiry in that we identified areas where the practice of qualitative research has not matched its basic premises and promises. For example, within the studies we analyzed, contextual information relating to the local and particular was often too limited to enable the reader to discern how and why learning was or was not occurring in the classrooms being observed (Geertz, 1983). Yet qualitative researchers have typically emphasized that one of the major strengths of qualitative inquiry is that it provides thick descriptions and contextual information that enable researchers to nuance findings. Similarly, despite our intention to honor the grounded nature of qualitative inquiry, our coding categories revealed methodological shortcomings in the analyzed studies. Our goal was not to foreground these shortcomings, but they nevertheless emerged in the process of identifying whether or not the studies provided information that could reasonably be expected to be present in qualitative research.
Limitations & Future Research

Although we tested the coding manual independently with two teams of coders (the first trial with three previous members of the larger research project), it has yet to withstand scrutiny and the test of application by others who have not been involved in its development. We are mindful of the essentially subjective nature of any form of interpretation, whether in the form of coding categories or in the form of narrative. The tool is also labor-intensive and required painstaking effort, involving many meetings, extensive note-taking, and consultation with the primary investigator. This raises the question of viability. As useful as any such tool may be, we are faced with issues related to resources, as Almasi et al. (2002) also found, even with significantly fewer studies involved in their analyses. Although painstaking care was taken to ensure reliability in coding, we realize that our work is susceptible to the beliefs and conclusions that drove what we constructed. Our intent was not to critique the studies we included but to raise issues of delivery, utility, and impact. However, our coding scheme raised questions that imply critique, and we have not yet found a way around that problem short of not doing such a study at all.

In our project, we noted limitations similar to those Almasi et al. (2002) encountered in their work. First, the studies were difficult to classify because the qualitative research we investigated varied widely in its goals, in how the research was described, and in the methodologies used. Second, several of the studies lacked rigor and, as noted by Almasi et al.: "Many of the studies seemed exploratory in nature and occurred across such brief periods of time that prolonged engagement, persistent observation, and/or the triangulation of data sources were impossible" (p. 23). Third, as Almasi et al. (2002) also found, while many of the studies used qualitative methods, they lacked systematic design, and their findings were often reported without evidence that data were analyzed or reduced where appropriate for analysis.

Given the similarity in these conclusions, we suggest that additional research and scholarship along these lines should not focus on foregrounding these as issues of what "counts" as research in the field of qualitative research versus what is more clearly conceptual work that draws on restricted data sources for illustrative purposes. We realize that this is potentially contentious but believe that there is a place in the field for a thorough discussion of "quality" in qualitative research. A potentially more contentious discussion is likely to occur around the issue of subjecting teacher research to the same methodological, analytical, and interpretive criteria as those we might apply to qualitative research conducted by academic researchers. While a sound case has been made for teacher research, many of the narratives of classroom practice appear to have quite significant methodological limitations. Wolf's (1992) implied question of "who is the audience?" may provide a useful focus for this discussion.

In conclusion, our research tool may, despite its limitations, be tested as a tool for use in training doctoral students interested in utilizing qualitative research methods in their own work.
As the graduate students involved in this project can attest, despite the wealth of reading they had done as part of their research methods courses, it was not until they were forced to pay attention to particulars in a microscopic way that they really came to appreciate both the limitations and the strengths of qualitative inquiry.

References

Almasi, J., Garas, K., & Shanahan, L. (2002, December). Qualitative research and the Report of the National Reading Panel: No methodology left behind? Paper presented at the 52nd Annual Meeting of the National Reading Conference, Miami, FL.

Almasi, J., Palmer, B., Gambrell, L., & Pressley, M. (1994). Toward disciplined inquiry: A methodological analysis of whole-language research. Educational Psychologist, 29, 193-202.

Denzin, N., & Lincoln, Y. (2000). The discipline and practice of qualitative research. In N. Denzin & Y. Lincoln (Eds.), Handbook of qualitative research (pp. 1-29). Thousand Oaks, CA: Sage.

Donmoyer, R. (2001). Paradigm talk reconsidered. In V. Richardson (Ed.), Handbook of research on teaching (pp. 216-251). Washington, DC: American Educational Research Association.

Eisner, E., & Peshkin, A. (1990). Closing comments on a continuing debate. In E. Eisner & A. Peshkin (Eds.), Qualitative inquiry in education: The continuing debate (pp. 365-370). New York: Teachers College Press.

Erickson, F. (1986). Qualitative methods in research on teaching. In M. C. Wittrock (Ed.), Handbook of research on teaching (pp. 119-161). New York: Macmillan.

Geertz, C. (1983). Local knowledge: Further essays in interpretive anthropology. New York: Basic Books.

Hillocks, G. (1986). Research on written composition: New directions for teaching. Urbana, IL: ERIC Clearinghouse on Reading and Communication Skills, and The National Conference on Research in English. (ERIC Document Reproduction Service No. ED265552)

National Reading Report (2000). Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction (Report of the Subgroups). Washington, DC: Department of Health and Human Services, Public Health Services, National Institutes of Health, and the National Institute of Child Health and Human Development.

Pressley, M. (2002). What I have learned up until now about research methods in reading education. In D. Schallert, C. Fairbanks, J. Worthy, B. Maloch, & J. Hoffman (Eds.), 51st Yearbook of the National Reading Conference (pp. 33-43). Oak Creek, WI: National Reading Conference, Inc.

Robinson, D., Levin, J., Thomas, G., Pituch, K., & Vaughn, S. (2007). The incidence of "causal" statements in teaching-and-learning research journals. American Educational Research Journal, 44, 400-413.

Wilkinson, I., Murphy, P., & Soter, A. (2007). Final grant performance report: Group discussions as a mechanism for promoting high-level comprehension of text (PR/Award No. R305G020075). Columbus, OH: Ohio State University Research Foundation.

Wilkinson, I., Soter, A., & Murphy, P. (2004). Group discussions as a mechanism for promoting high-level comprehension of text: Grant performance report for year 2 (PR/Award No. R305G020075). Columbus, OH: Ohio State University Research Foundation.

Wolf, M. (1992). A thrice-told tale: Feminism, postmodernism, and ethnographic responsibility. Stanford, CA: Stanford University Press.
Appendix A
Coding Manual Excerpt
Research Design/Methodology: Qualitative Studies

Note: "Cannot tell" (999) can only be used (if included in the options) in response to the questions that follow if the strategy was used but the researcher does not report an "effect" related to its use. Otherwise, indicate "No." "Not applicable" means that the feature is not part of the construct and therefore does not apply. We have, however, included some dichotomous responses for features that are typically referred to as "desirable outcomes" in the questions related to "effects/findings" and have not added "not applicable" to these.

21. Type of research design
1 Single case study - individual
2 Single case study - selective small group(s) within a whole class but not all students in a class
3 Single case study - whole class(es)
4 Multiple case studies - individuals
5 Multiple case studies - small group(s)
6 Multiple case studies - whole class(es)
7 Other (specify)
999 Cannot tell
Note: If research is conducted by a K+ teacher in own classroom, use TR. If the research is conducted by a college/institute researcher, simply indicate type of research.

22. Unit of assignment for application or "treatment" of discussion approach (i.e., who gets the application of the approach)
1 Individual participant (e.g., one student or one teacher)
2 Selective small group(s) within a class but not all students in the whole class
3 One whole class
4 Across a grade (e.g., more than one class of 9th grade)
5 Across grades (e.g., more than one grade)
6 Not applicable
999 Cannot tell

23. Type of student population sampling (reader should be able to infer the purposefulness of the selection)
1 Purposeful - a priori selection
2 Purposeful - post hoc selection (e.g., selected two participants; selected one post hoc; snowball phenomena)
3 Random
4 Other (e.g., combination)
999 Cannot tell

24. Type of background information provided about students
1 Comprehensive (e.g., description which includes rich information of a personal, academic, attitudinal, and behavioral nature)
2 Limited ("thin" description, but information is provided across a range of information; e.g., personal, academic, attitudinal, behavioral)
3 Selective (e.g., two or fewer of a range of information; e.g., personal and academic, or academic and attitudinal, etc.)
4 None provided

25. Was the population sample group(s) pre-tested or academically ranked in some way (e.g., using standardized test scores, reading scores, district scores, teacher evaluations or teacher ratings, etc.)?
1 Yes, evidence reported
2 Yes, minimal evidence reported
3 No evidence reported (researcher may/may not have a rationale for no evidence, but code will still be "no evidence reported")

26. Was membership of individuals in groups manipulated by gender, SES, or ethnicity?
1 Yes, evidence reported for manipulation
2 Yes, evidence reported for no manipulation
3 No evidence reported

27. Was membership of individuals in groups manipulated according to student academic ability?
1 Yes, evidence reported for manipulation
2 Yes, evidence reported for no manipulation
3 No evidence reported

28. Was membership of individuals in groups manipulated according to prior experience (or not) with the approach?
1 Yes, evidence reported for manipulation
2 Yes, evidence reported for no manipulation
3 No evidence reported
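Because every item in the excerpt above has a closed set of numeric options (including 999 for "cannot tell" where the manual permits it), entries can be screened automatically before analysis. The following is a minimal, hypothetical sketch of such a check; the option table covers only the items shown in the excerpt, and the code is illustrative rather than part of the authors' procedure.

```python
# Allowed option values for the Appendix A items shown above (hypothetical
# encoding for illustration; 999 = "cannot tell" where the manual permits it).
ALLOWED_OPTIONS = {
    21: {1, 2, 3, 4, 5, 6, 7, 999},
    22: {1, 2, 3, 4, 5, 6, 999},
    23: {1, 2, 3, 4, 999},
    24: {1, 2, 3, 4},
    25: {1, 2, 3},
    26: {1, 2, 3},
    27: {1, 2, 3},
    28: {1, 2, 3},
}

def invalid_entries(coded_study):
    """Return {item: value} pairs whose value is not a permitted option."""
    return {
        item: value
        for item, value in coded_study.items()
        if item in ALLOWED_OPTIONS and value not in ALLOWED_OPTIONS[item]
    }

# Example: option 5 is not defined for item 24, so it is flagged.
print(invalid_entries({21: 999, 24: 5, 26: 2}))  # {24: 5}
```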
Appendix B
Post-Training Coding and Reliability Procedures

1. For both phases (training and actual coding), studies were randomly assigned for coding rather than being coded in a predetermined order, in order to avoid (as much as possible) biases toward a specific approach. Studies that replaced others that had initially been selected were similarly randomized.
2. Coders initially read each study individually, making notes on a prepared entry sheet containing specific reading guidelines for coding.
3. Only when note-taking and reading were completed did coders begin individual coding. Throughout the coding process, they reiteratively revisited the study to look for evidence for their responses to the codes.
4. Coders recorded the page numbers in studies reflecting the features coded, to facilitate checking codes and checking reliability.
5. Reliability checks were conducted every week (three studies).
6. One coder marked each score on a prepared coding sheet. Coders discussed disagreements until they decided on an agreed code. Original codes were kept on individual coder sheets. If coders could not reach agreement, the person responsible for notes that week took the question to the PI to be resolved. Upon resolution, the result was entered in the prepared recording sheet for all codes for each study.
7. All phone conferences were documented by one of the coders and sent to the PI for response prior to commencing the next set of three studies.