Journal of Ethnographic & Qualitative Research
2008, Vol. 2, 269-280
ISSN: 1935-3308
USE OF A CODING MANUAL WHEN PROVIDING A
META-ANALYSIS OF INTERNAL-VALIDITY MECHANISMS AND
DEMOGRAPHIC DATA REPORTED IN
PEER-REVIEWED EDUCATIONAL JOURNALS
Anna O. Soter, Sean P. Connors, and Lucila Rudge
Ohio State University
In a prior investigation (Wilkinson, Soter, & Murphy, 2004), we conducted
a meta-analysis of quantitative studies on the use of small group discussions
to promote high-level thinking. In the present project, our initial intent was to
develop an equivalent mechanism for qualitative studies that focused on the
same subject. However, our efforts to tease out effects-versus-claims and to
identify measures used in the studies led to an evaluative coding manual.
Our findings revealed that the majority of studies we investigated either
neglected to provide sufficient background information regarding their participant populations or failed to contextualize the settings in which the studies occurred.
Anna Soter, Ph.D., is Associate Professor of Education at Ohio State University in Columbus, Ohio. Sean P. Connors is a graduate assistant at Ohio State University in Columbus, Ohio. Lucila Rudge is a graduate associate at Ohio State University in Columbus, Ohio. Correspondence concerning this article should be addressed to soter.1@osu.edu.

This article emerged from issues Soter encountered during a 3-year qualitative investigation (Wilkinson, Soter, & Murphy, 2004, 2007) funded by the Spencer Foundation which, in its first year, focused on identifying converging evidence about the use of group discussions to promote high-level thinking and comprehension of literary text. In that project, we conducted an intensive narrative analysis of extant research drawn from nine different approaches to small group discussion. Subsequently, we included studies that used quantifiable measures in a meta-analysis of effect sizes for the use of discussion in the development of high-level comprehension. We excluded 74 qualitative studies from the meta-analysis because their methodology precluded inclusion. Such exclusion raises issues about what is considered scientifically-based evidence (Almasi, Garas, & Shanahan, 2002).

The exclusion of qualitative studies also eliminates findings that such research foregrounds, namely, information regarding contexts that not only may support instruction, but also often may provide detailed information about participants, their histories, and factors that may influence performance (Geertz, 1983). Most of the excluded studies included information about how discussion groups promoted the use of authentic literary texts to encourage high levels of motivation and student engagement; eliminating these from meta-analyses eliminated potential instructional alternatives in the field of reading comprehension.
A paucity of research exists about the use
of meta-analysis in the field of reading comprehension. To date, the published research includes
two studies (Almasi et al., 2002; Almasi, Palmer, Gambrell, & Pressley, 1994). These authors attempted such an inquiry first with 6 qualitative studies in whole-language research and, subsequently,
with 12 qualitative studies that had been excluded
from the National Reading Panel meta-analysis of
reading comprehension from 1979 to the present.
Despite the dearth of meta-analyses (or their
equivalent) in the field of qualitative research,
qualitative inquiry remains important in overall
educational research. As Pressley (2002) summarized:
“It [qualitative research] allows for research that
evaluates a phenomenon that is actually happening in real schools, not something invented by the
researcher who drops into schools for a few days
and leaves” (p. 39). In an extensive evaluation of
qualitative methods, Erickson (1986) argued that qualitative research elicits “the immediate and local meanings of actions as defined from the actor’s point of view” (p. 96). The author further stressed that interpretive methods are “appropriate when one needs to know more about the specific structure of occurrences rather than their general character and overall distribution” (p. 98). Nevertheless, the National Reading Panel (2000) generated considerable debate among reading comprehension researchers when it excluded qualitative studies from a meta-analysis of studies that were identified as representing scientifically-based research, thereby implying that qualitative methods do not count as scientific research.
Why might qualitative research in education
benefit from being represented in meta-analyses
or their equivalent? We believe the answer is that
meta-analyses can provide a vehicle for parsimonious summation of evidence in support of the
efficacy of instructional approaches or interventions, such as Hillocks’ (1986) comprehensive
meta-analysis of the negative or positive impact
of grammar instruction on writing performance
among K-12 students.
Furthermore, in the process of conducting
meta-analyses, researchers have an opportunity to
evaluate the relationships between claims and evidence. With respect to qualitative research, Eisner
and Peshkin (1990) suggested that several challenges may manifest in the process, including the
ease with which educational administrators and
teachers, parent groups, and/or policymakers are
able to interpret data disseminated from qualitative research. The authors observed that cogently
and lucidly articulated findings in qualitative research may beguile non-researchers who impact
policy and/or decision-making regarding instruction into believing that the findings themselves
have value.
A second challenge is related to whether or
not teacher-research (more often identified as a
form of qualitative research) should be included
in meta-analyses. Eisner and Peshkin (1990) acknowledged that such research is of particular interest to teachers who might want to conduct similar research in their own classrooms but may not
have broader relevance.
A third challenge involves identifying, in works intended for publication, the information best included in descriptions of methodology.
How much detail, for example, should the researcher provide about decisions and procedures
with respect to data collection or procedures undertaken in preparation for and during data analysis? If the report of research is intended for the research community, or classroom practitioners, or
educational policy-makers or administrators, then
quite different answers may emerge. On the other hand, if, in the interest of focus and brevity,
qualitative researchers choose to omit a detailed
account of such procedures, they may find their
work challenged on the grounds that their procedures lack transparency or that interpretations of
their data are not supported.
These challenges are likely to endure since
qualitative inquiry generally is process oriented and, according to Denzin and Lincoln (2000), “focused on qualities of entities, processes and meanings that are not experimentally measured in terms of quantity, amount, intensity, or frequency” (p. 9). The authors also argued that because qualitative researchers view reality as socially constructed, they typically are sensitive to and account “for the intimate relationship between the researcher and what is studied, and the situational constraints that shape inquiry” (p. 9).
These are legitimate arguments in support of the
particular, rather than the generalizable, and arguments that underscore essential differences between qualitative and experimental/quasi-experimental inquiry. Robinson, Levin, Thomas, Pituch,
and Vaughn (2007) expressed concern that, in the
absence of comparison groups where intervention in educational settings occurs, qualitative research often does not provide convincing evidence
that such intervention improves educational practice. The authors found that causal conclusions in teaching and learning research journals almost doubled between 1994 and 2004 and that, in the same period, causal conclusions also increased in non-intervention studies in which variables were not manipulated. They concluded the following regarding an
unfortunate consequence of such reporting: “Our
participants made no distinction between experimental, correlational, or qualitative methodologies when it came to the validity of causal claims”
(p. 408).
Almasi et al. (2002) found that studies included in their meta-interpretive analysis of qualitative research in reading comprehension lacked
rigor: “Often these studies simply interviewed or
observed individuals and reported findings without any type of data reduction or analysis” (p.
23). These troubling trends also were apparent
in the studies included in the intensive narrative
analysis of extant research drawn from nine different approaches to small group discussion to
which we previously referred. We did not intend
to critique the qualitative studies selected for our
project; rather, our initial objective was the development of a coding tool that would serve for a
meta-analysis of qualitative studies regarding the
use of small group discussions as a mechanism
for high-level comprehension. In the process of
developing that tool, we also wanted to identify what the researchers’ rich contextual descriptions could reveal about how discussions might impact high-level thinking. We also hoped that the tool would enable us to convey efficiently the value of qualitative research in providing what Erickson (1986) described as information about the contexts that operate in the everyday life of the classroom: “the fine shadings of local meaning and social organization that are of primary importance in understanding learning” (p. 130).
We developed our coding manual of 104 items
for two major reasons. First, we designed it in response to Soter’s frustration with the exclusion of
qualitative studies from the meta-analysis of quantitative studies previously referenced. Second, as
noted previously, Soter had already conducted an
exhaustive narrative interpretation in the first year
of the project from which the article’s literature
review was drawn. That process revealed gaps in
what the studies reported, among them, the duration of data collection, the nature of the conceptual/theoretical frameworks, information regarding
students and teachers in the studies, and evidence
in support of claims for effects. Following is a description of the procedures used to develop the
coding manual.
Method
Development of the Coding Manual
The codes and coding categories emerged initially from Erickson’s (1986) criteria for sound
qualitative research: (a) adequacy of description
of the contexts of the study, (b) adequacy of description of methodology and analysis of data,
and (c) adequacy of information regarding the interpretation of the data. We also included codes
similar to ones used in the quantitative meta-analysis developed for the project from which our
studies were drawn: (a) codes relating to population and sample size, (b) codes delineating other
general contextual information, (c) codes for the
general nature of the methods used and the theoretical frames of the studies, and (d) codes for
whether the studies were conducted by initial proponents of a particular discussion approach or by
others who were interested in extending original
findings. Briefly, the coding manual contained the
following categories: (a) publication and coder information, (b) total sample and special characteristics of sample, (c) research design, (d) research
methodology, (e) analysis and interpretation of
findings, (f) claimed effects, and (g) evidence for
effects. Appendix A contains an excerpt from the
coding manual.
Sample
The final “participants” for this study were
49 studies selected from a total of 74 qualitative
studies that were part of a larger research project regarding the use of small group discussions
of literary text in the development of high-level comprehension (Wilkinson, Soter, & Murphy,
2004). The studies were drawn from the database
that was developed for a larger study on classroom talk funded by the US Department of Education (2003-2006) for which the principal author of
the present study was a Co-Principal Investigator.
In that larger study, all extant published research (whether in the form of journal articles, dissertations, or technical reports) related to group discussions of literary texts and identified as one of nine small group discussion approaches with a track record of publication was included.
Studies for inclusion were identified through an exhaustive search of the PsycINFO, ERIC, Digital Dissertations, Social Sciences Citation Index, and Education Abstracts databases. See Table 1 for
study sources.
In the present research, all 74 qualitative
studies included in this meta-interpretation were
drawn from the larger pool. Since we wanted a
representative sampling of qualitative studies
from all 9 discussion approaches, we chose to select a random sample of 60% of the studies identified in each approach. We further excluded studies that contained quantitative data (mixed methodology); as studies were eliminated for this reason, they were randomly replaced with others from the remaining 40% of the total pool.
The primary purpose of the present study was
to determine whether we could fairly and informatively assess the efficacy of qualitative studies in providing information about instructional outcomes
and to determine whether our coding manual
could identify what qualitative studies do and do
not tell us about the reported research. The studies themselves, therefore, were not the primary focus of our research; rather, we used them as a means of addressing these two purposes.
Procedures for Training
Prior to coding, we read each study using pre-coding reading guidelines (as illustrated in Figure 1) to familiarize ourselves with the content, orientation, theoretical framework, methods, analyses, findings, and interpretations.

FIGURE 1: Pre-Coding Reading and Note-Taking Guidelines for Articles to be Coded
1. The scope of the project
2. Subjects/participants (age; gender; ethnicity; number of participants)
3. Context and rationale for the study
4. Focus/research question(s)
5. Theoretical/conceptual framework
6. Research methodology and methods
7. How the data were analyzed
8. How the data were interpreted
9. Appropriateness of data and methodology for the research question(s)
10. The presence (or absence) of qualitative tools such as triangulation and member-checking to establish validity
11. The use of measures for assessing effects against claims
12. Written summary of findings
13. Additional notes

We began our coding procedure with the initial reading of each study, followed by a meeting to clarify questions that arose about each study. Next, each of us coded the study independently and then met to discuss discrepancies. After the first two studies were coded in this manner, we met again to discuss coding reliability issues. At this meeting, we shared and entered our recorded notes about the coding process and the use of the manual and made any necessary changes to the manual before proceeding with the next set of two studies. After we coded a total of six studies using this process, our inter-rater reliability had reached criterion (81-84%). We then coded eight more studies (two per week by each coder). After coding these eight studies, we reached stable agreement, with over 80% agreement for each study. The studies that we utilized for this trial period were not used again, nor were their results incorporated in our final results. We coded the remaining 49 studies (drawn from the total pool of 74 qualitative studies) in sets of three per week over a period of six months, employing peer debriefings (Denzin & Lincoln, 2000) and cross-checking of data interpretation throughout the duration of coding. Procedures for coding are provided in Appendix B.

TABLE 1: Sources for the Studies
Studies were drawn from the following journals:
Research in the Teaching of English
The Reading Teacher
Reading Horizons
Bilingual Research Journal
Reading Research Quarterly
Reading and Writing Quarterly
Journal of Adolescent and Adult Literacy
Remedial and Special Education
Primary Voices
Journal of Reading Education
Reading Research and Instruction
The New Advocate
Journal of Educational Research
The Ohio Reading Teacher
Journal of Teacher Education
Journal of Reading Behavior
Language Arts
Journal of Moral Education
Reading Horizons
Journal of Literacy Research
Childhood Education
Urban Education
Mind, Culture, and Activity
Thinking
Cognition and Instruction
33 studies were drawn from the above journals.
6 studies were unpublished dissertations.
7 studies were chapters in books published by the National Reading Conference or in National Reading Conference Yearbooks.
2 studies were technical reports.
1 study was a published book.
All studies focused on the use of group discussions about and around literary texts with the following purposes: fostering engagement with literary texts and improving reading performance, specifically comprehension.
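The inter-rater reliability figures reported for the training phase (a criterion of 81-84%, then over 80% agreement per study) are simple percent agreement between two coders' item-level codes. A minimal sketch (the function name and example codes are ours, not the authors'):

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of items on which two coders assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must code the same set of items")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)

# Two coders agree on 4 of 5 items: 80% agreement,
# at the criterion threshold described above.
percent_agreement(["1a", "2b", "2b", "3c", "1a"],
                  ["1a", "2b", "3c", "3c", "1a"])  # → 80.0
```

Note that simple percent agreement does not correct for chance agreement; Cohen's kappa is the usual chance-corrected alternative, although the article reports plain percentages.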
Procedures for Recording Ongoing Changes to the Coding Manual

We recognized that we developed the coding manual in an iterative process and that we would need to modify and revisit codes even as coding continued. Therefore, we revisited all initial coding of the studies following any changes to the manual. We kept notes about the coding categories, the coding process, issues of ambiguity in codes, and our refinements to coding categories, as well as any codes we eliminated or which appeared to be duplications. Throughout the project, we met weekly and maintained additional contact through telephone and email communication. We made no changes to the coding manual without mutual consultation. When coding was completed, we revisited and discussed all notes and made final changes to codes in the manual. Following initial training in the application of the codes, changes to the manual were few and of a minor nature.

Once we coded the studies, we entered the codes into a specially prepared database, an Excel spreadsheet comprising 104 columns, one for each code. We entered only the final agreed code, although we retained all original paper coding. Additionally, we entered any notes made in the margins of our paper coding log. To guard against error, we read aloud the information in the database and checked it against the data in the coding log, repeating the process after the entry of every five studies. As a final safeguard, we arranged the database to tabulate the sum of the data entered in each column. To ensure accuracy, we then checked these totals against the point totals each question elicited across the 49 studies. As a result, we were able to identify and rectify perceived inconsistencies.

We then viewed the data in several ways. As well as counting the responses each question elicited, we expressed the data as percentages. We were able to distinguish studies conducted by teacher-researchers (nine studies) from those conducted by academic researchers (40 studies), acknowledging the validity of the argument that Donmoyer (2001) and others have made in support of viewing teacher research as a nested type of qualitative research. However, any grouping of the studies in this way was a post-analysis grouping and did not influence the selection of studies included in our analysis.

Data Analysis
To guide our data analysis, we revisited the
research questions that framed the study: (a)
Can we informatively and fairly assess “effects”
claimed by qualitative studies, particularly those
that purport to have pedagogical significance, by
using a tool developed from criteria drawn from
qualitative research to assess the efficacy of that
research in providing parsimonious information
about instructional outcomes? (b) What information could such a tool and analysis yield, and to
what extent can the tool function as a mechanism
for the inclusion of qualitative studies in federal and state meta-analyses of extant research in
the field of literacy? To answer these questions,
we approached our analysis of the data using the
conceptual categories established in the coding
manual. For example, data that stem from questions related to participant populations were approached as a group, as were questions that pertained to research methodology and claims for
effects. By approaching the data in this manner,
we were able to identify recurring themes and
patterns and highlight their relevance to the larger
questions the study sought to answer. During the
process of data analysis, we met regularly to share
notes about our pre-coding readings of the studies, discuss the coding process, identify interpretive inconsistencies, and locate instances of disconfirming evidence.
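The basic tabulation underlying the percentages reported in the Results, counting the responses each coding-manual question elicited across the 49 studies, can be sketched as follows (the response labels in the example are hypothetical):

```python
from collections import Counter

def response_percentages(responses):
    """Tally the coded responses to one coding-manual question and
    express each code's count as a whole-number percentage of the
    total number of coded studies."""
    counts = Counter(responses)
    total = len(responses)
    return {code: round(100 * n / total) for code, n in counts.items()}
```

Applied to one question per conceptual category (participant population, methodology, claims for effects), this yields the percentage breakdowns reported below.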
Three major thematic findings emerged that
related to methodology, analysis and interpretation, and relationships between claims and evidence provided in support of claim. Recall that
one of our research questions sought to determine
whether it was possible to informatively and fairly assess “effects” qualitative studies claimed,
particularly ones that purport to have pedagogical significance, by using a tool developed from
criteria drawn from qualitative research to assess
the efficacy of that research in providing parsimonious information about instructional outcomes.
Our coding manual assessed issues related to credibility, transferability, dependability, and confirmability. We considered a study’s credibility high when researchers practiced prolonged engagement, employed triangulation, included negative case analysis, made use of member checks, and validated findings in peer debriefings. Transferability, on the other hand, was considered high when researchers offered thick, detailed descriptions regarding various facets of the study. Confirmability and dependability are related to reliability. For example, if coding was employed, we
then asked: “How did the researcher account for
agreement as to those codes or provide a reasonable account of how such codes were developed
and of the interpretive measures used to draw inferences from them?”
Results
Despite the emphasis placed on thick description by many proponents of qualitative research
(e.g., Denzin & Lincoln, 2000), the majority of the
studies investigated either neglected to provide
sufficient background information regarding their
participant populations or failed adequately to
contextualize the settings in which the studies occurred. These issues were, for example, captured
in questions that focused on duration or length of
time in which data were collected, frequency of
data collection, as well as contextual information
(e.g., nature of the curriculum). Of the 49 analyzed studies, 33% reported having collected data over a period of 0 to 3 months. Data were reported as having been gathered over a period of 7 to 9 months (the equivalent of one school year) in 20% of the studies, while only 10% reported data collection spanning more than 9 months. Surprisingly, 31% of the studies investigated did not report the duration of data collection at all.
Similar variations were evident in reports pertaining to the frequency with which data were collected. For example, data were said to have been
collected once a week in 10% of the studies, twice
a week in 12%, and three or more times a week in
10%. Half of the studies failed to account for frequency of data collection and 18% relied on intermittent data collection, which made it difficult to
establish a timeline for the investigators’ presence
in the classroom with any degree of certainty.
The majority of the studies did not provide
sufficient demographic information for us adequately to define the sample populations. Of the
studies investigated, 43% did not account for the
ethnicity of the students involved in the respective studies’ sample populations. Similarly, 53%
did not identify the sample population’s socioeconomic status, while 48% did not account for
the population’s academic ability level. Additionally, 58% did not identify gender proportions in
the sample population. The percentage of limited English-proficiency students involved in the
sample populations was not reported in 75% of
the case studies, although the majority reported
the research as having taken place in an urban
setting.
The responses generated by question #24 in
the coding manual (“What kind of background information was provided about students?”) also attested to the relative dearth of information provided about the studies’ participant populations.
For example, 18% of the studies yielded comprehensive background information regarding the
participant populations, while 22% provided a
fair amount of information. Although 35% of the
studies provided limited background information
(two or fewer items), 25% failed to provide any
background information.
Background information regarding the teachers who participated in the various studies was
equally limited. Of the studies that included a
classroom teacher among the sample population, 61% did not provide information regarding the number of years for which the teachers
had taught; 18% of the studies provided substantive background information about the teacher(s),
29% yielded limited background information, and
39% failed to provide any such background information. Given the nature of the intervention in all the studies we analyzed (i.e., the use of small group discussions to foster high-level comprehension), we expected that the role of teachers, and hence the nature and extent of their teaching experience, orientation toward instruction, and professional development, would be foregrounded in these studies. That is, successfully implementing and sustaining group discussions as an instructional approach seems to require a significant change in the role of teachers and a level of professional experience and expertise that enables teachers to manage discussions effectively so that learning goals are met.
Given that the majority of the studies provided little background information regarding their respective sample populations, one might expect the potential for transferability to be
rather low. This problem was further complicated by the lack of information regarding the contexts in which the studies occurred. Nearly half of
the studies we examined (49%) did not describe the demographics of the settings in which the research took place.
Measures
We found that, although measures were taken
to account for trustworthiness (i.e., triangulation,
negative case analysis, member checking), they
were not employed with the degree of frequency
we would expect to find in the field of qualitative
research. Furthermore, while measures of trustworthiness typically were employed during data
collection, their utilization extended to data analysis and interpretation less frequently. Of the 49
studies analyzed, we found that efforts to account
for credibility generally were strongest in (a) triangulation of data and (b) availability of raw
data. Data collection was triangulated in 69% of
the coded studies, while methods of analysis were
triangulated in 59%. Extensive links to raw data
were evident in 73% of the studies, while only
2% provided no links. Although more than half
(55%) of the studies accounted for the trustworthiness of data analyses in a substantial way, 22%
did so to a limited extent, and 21% of the studies
neglected to provide any account for the trustworthiness of their analyses.
While instances of negative case analysis (or
disconfirming evidence) were identified in 59% of
the studies, member checking was used less frequently. Of the 49 coded studies, 6% used member checking on a limited basis, and 31% used
the construct in a substantive manner. Surprisingly, 45% of the studies neglected to use member
checking when doing so appeared appropriate to
us. Member checking did not constitute a part of
the methodological approach employed in 18% of
the studies that we investigated.
That measures used to account for trustworthiness extended less frequently to data analysis
and interpretation was further evidenced by responses to two questions in the coding manual.
Question #51 asked: “To what extent was the subjectivity of the research/researchers fore-grounded by the researcher?” To answer the question,
we considered the triangulation of data sources
and analysis and interpretation high in objectivity and teacher or student reports of perceived
improvement low in objectivity. Fewer than half
(47%) of the studies fore-grounded their subjectivity to a great extent, while 29% did so moderately. On the other hand, subjectivity was foregrounded to a low extent in nearly one quarter
(25%) of the studies we investigated.
Question #62 in the coding manual sought to determine the extent to which the findings of a study
were nuanced: “Are findings nuanced contextually to account for complexity, interrelatedness, ambiguity related to the relativity of findings (e.g.,
in relation to contextual factors, individual preferences, unpredictable factors)?” We found that,
although 49% of the studies nuanced their findings to account for mitigating factors, the majority (51%) did not.
Claims for Effects
We also found that, although the studies representing the nine different discussion approaches investigated the use of an instructional intervention (i.e., a particular discussion approach), the intervention was an established part of the regular curriculum in only 30% of the studies, had only recently been introduced in 59%, and its status was not indicated in the remaining 10%. Claims
that discussion improved students’ interest in
and engagement with reading and led to probing
and inquiry were supported by evidence in 39%
of the studies. Claims that discussion led to students’ critique and evaluation of texts read were
supported by evidence in 22% of the studies. Other claims made for which evidence was reported included: (a) change in discourse patterns during discussions (33%), (b) students facilitating
each others’ learning (29%), and (c), benefits of
teacher modeling in discussion (41%). Given that
the discussion approaches under investigation
had only recently been introduced in the majority of the studies, and considering that the overwhelming majority of the studies (94%) presented claims for the effects of these approaches, one
might have expected prolonged, persistent observation to have played a greater role in the methodological designs of the studies. Perhaps the relatively short duration of observations accounts for
the lack of reportable evidence in support of those
claims. This general lack of congruence between
claims and reportable evidence may suggest that a
dilemma in qualitative research is how to capture
what is occurring in the live, day-to-day interaction of a classroom being observed and report it in
such a way that others who did not participate in
the observations are convinced that what is seen
and/or heard does indeed count as evidence.
Discussion & Implications
In light of our findings, we conclude that the
coding manual potentially may distinguish between sound and unsound qualitative studies.
Whether this tool can function as a mechanism
for the inclusion of qualitative studies in federal
and state meta-analyses is yet to be argued conclusively. As noted in the introduction, we did not initially intend for our coding manual to serve an evaluative function, but in the process of reading and coding the studies we subsequently analyzed, we found methodological, analytic, and interpretive inconsistencies in a sufficient number of studies that our tool morphed from a meta-analytic instrument into a more evaluative one. Similar to Almasi et al. (2002),
we found that limitations and barriers accompanied decisions to include (or not) studies in a meta-analysis of qualitative research, including: (a)
classifying qualitative studies, lack of rigor in research design and analysis, (b) lack of systematicity in data collection and analysis, and (c) limited
(if any) data reduction or in-depth analysis.
It is all too easy to read qualitative research
globally and to lose oneself in the narratives and
details that qualitative studies typically provide. A
problem arises, however, when such reading leads
to the acceptance of victory narratives, a term we
use to describe studies that put forth claims in
the absence of supporting evidence. The coding
manual has the potential to counteract this problem and provide a conceptual framework readers
could use to approach qualitative research in a
fine-grained way. By directing readers’ attention
to a given study’s methodological design, as well
as to its claims, the coding manual provides a tool
that can be used to tease out information that is
(and is not) provided in narrative inquiry. Consequently, it may become possible to distinguish between well-designed and poorly designed qualitative studies.
We recognize that our study challenges the
nature of qualitative inquiry in that we identified
areas where the practice of qualitative research
has not matched its basic premises and promises.
For example, within the studies we analyzed, contextual information relating to the local and particular was often too limited to enable the reader to discern how and why learning was or was not occurring in the classrooms being observed (Geertz, 1983). Yet qualitative researchers have typically emphasized that one of the major strengths of qualitative inquiry is that it provides thick descriptions and contextual information that enable researchers to nuance their findings.
Similarly, despite the intention to honor the
grounded nature of qualitative inquiry, our coding
categories revealed methodological shortcomings
in the analyzed studies. Our goal was not to foreground these shortcomings, but they nevertheless
emerged in the process of identifying whether or
not the studies provided information that could
reasonably be expected to be present in qualitative research.
Limitations & Future Research
Although we tested the coding manual independently with two teams of coders (the first trial involved three previous members of the larger research project), it has yet to withstand scrutiny
and the test of application by others who have not
been involved in its development. We are mindful of the essentially subjective nature of any form
of interpretation, whether in the form of coding
categories or in the form of narrative. The tool is also labor-intensive, requiring painstaking effort, many meetings, extensive note-taking, and consultation with the primary investigator. This raises the question of its viability.
As useful as any such tool may be, we are faced with issues related to resources, as Almasi et al. (2002) also found, even though significantly fewer studies were involved in their analyses.
Although painstaking care was taken in ensuring reliability in coding, we realize that our
work is susceptible to the beliefs and conclusions
that drove what we constructed. Our intent was
not to critique the studies we included but to raise
issues of delivery, utility, and impact. However,
our coding scheme raised questions that imply
critique, and we have not yet found a way around
that problem short of not doing such a study at
all.
In our project, we noted limitations similar to those reported by Almasi et al. (2002). First, the studies were difficult to classify because the qualitative research we investigated varied widely in its goals, in how the research was described, and in the methodologies used. Second, several of the studies lacked
rigor and, as noted by Almasi et al.: “Many of
the studies seemed exploratory in nature and occurred across such brief periods of time that prolonged engagement, persistent observation, and/
or the triangulation of data sources were impossible” (p. 23). Third, as also found by Almasi et
al. (2002), while many of the studies used qualitative methods, they lacked systematic design,
and their findings were often reported without evidence that data were analyzed or reduced where
appropriate for analysis.
Given the similarity in these conclusions, we suggest that additional research and scholarship along these lines should now focus on foregrounding these issues: what "counts" as research in the field of qualitative research, and what is more clearly conceptual work that draws on restricted data sources for illustrative purposes. We realize
that this is potentially contentious but believe that
there is a place in the field for a thorough discussion of “quality” in qualitative research. A potentially more contentious discussion is likely to occur about the issue of subjecting teacher research
to the same methodological, analytical, and interpretive criteria as those we might apply to qualitative research conducted by academic researchers.
While a sound case has been made for teacher research, many of the narratives of classroom practice appear to have quite significant methodological limitations. Wolf’s (1992) implied question of
“who is the audience?” may provide a useful focus for this discussion.
In conclusion, our research tool may, despite
its limitations, be tested as a tool for use in training doctoral students interested in utilizing qualitative research methods in their own work. As
the graduate students involved in this project can
attest, despite the wealth of reading they have
done as part of their research methods courses,
it was not until they were forced to pay attention
to particulars in a microscopic way that they really came to appreciate both the limitations and the
strengths of qualitative inquiry.
References
Almasi, J., Garas, K., & Shanahan, L. (2002, December). Qualitative research and the Report of the National Reading Panel: No methodology left behind?
Paper presented at the 52nd Annual Meeting of the
National Reading Conference, Miami, FL.
Almasi, J., Palmer, B., Gambrell, I., & Pressley, M.
(1994). Toward disciplined inquiry: A methodological analysis of whole language research. Educational Psychologist, 29, 193-202.
Denzin, N., & Lincoln, Y. (2000). The discipline and
practice of qualitative research. In N. Denzin & Y.
Lincoln, (Eds.), Handbook of qualitative research
(pp. 1-29). Thousand Oaks, CA: Sage.
Donmoyer, R. (2001). Paradigm talk reconsidered. In V.
Richardson (Ed.), Handbook of research on teaching (pp. 216-251). Washington, D.C.: American Educational Research Association.
Eisner, E., & Peshkin, A. (1990). Closing comments
on a continuing debate. In E. Eisner & A. Peshkin
(Eds.), Qualitative inquiry in education: The continuing debate (pp. 365-370). New York: Teachers
College Press.
Erickson, F. (1986). Qualitative methods in research
on teaching. In W. C. Wittrock, (Ed.), Handbook
of research on teaching (pp. 119-161). New York:
Macmillan.
Geertz, C. (1983). Local knowledge: Further essays in
interpretive anthropology. New York: Basic Books.
Hillocks, G. (1986). Research on written composition:
New directions for teaching. Urbana, IL: ERIC
Clearinghouse on Reading and Communication
Skills, and The National Conference on Research
in English. (ERIC Document Reproduction Service
No. ED265552)
National Reading Panel. (2000). Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction (Report of the Subgroups). Washington, DC: Department of Health and Human Services, Public Health Services, National Institutes of Health, and the National Institute of Child Health and Human Development.
Pressley, M. (2002). What I have learned up until now
about research methods in reading education. In
D. Schallert, C. Fairbanks, J. Warton, B. Maloch,
& J. Haffre (Eds.), 51st Yearbook of the National
Reading Conference (pp. 33-43). Oak Creek, WI:
National Reading Conference, Inc.
Robinson, D., Levin, J., Thomas, G., Pituch, K., &
Vaughn, S. (2007). The incidence of “causal” statements in teaching-and-learning research journals.
American Educational Research Journal, 44, 400-413.
Wilkinson, I., Murphy, P., & Soter, A. (2007). Final
Grant Performance Report: Group Discussions as a
mechanism for promoting high-level comprehension
of text. (PR/Award No. R305G020075). Columbus,
OH: Ohio State University Research Foundation.
Wilkinson, I., Soter, A., & Murphy, P. (2004). Group
discussions as a mechanism for promoting high-level comprehension of text: Grant performance
report for year 2 (PR/Award No. R305G020075).
Columbus, OH: Ohio State University Research
Foundation.
Wolf, M. (1992). A thrice-told tale: Feminism, postmodernism, and ethnographic responsibility. Stanford,
CA: Stanford University Press.
Appendix A
Coding Manual Excerpt
Research Design/Methodology: Qualitative Studies
Note: “Cannot Tell” (999) can only be used (if included in the options) in response to the questions that follow if the strategy was used but the researcher does not report an “effect” related to its use. Otherwise, indicate “No.” “Not Applicable” means that the feature is not part of the construct and therefore does not apply.
We have, however, included some dichotomous responses for features that are typically referred to as “desirable outcomes” in the questions related to “effects/findings” and have not added “not applicable” to these.
21. Type of research design
1 Single case study – individual
2 Single case study – selective small group(s) within a whole class but not all students in a class
3 Single case study – whole class(es)
4 Multiple case studies – individuals
5 Multiple case studies – small group(s)
6 Multiple case studies – whole class(es)
7 Other (specify)
999 Cannot tell
Note: If the research is conducted by a K+ teacher in their own classroom, use TR. If the research is conducted by a college/institute researcher, simply indicate the type of research.
22. Unit of assignment for application or “treatment” of discussion approach (i.e., who gets the application
of the approach)
1 individual participant (e.g., one student, or one teacher)
2 selective small group(s) within a class but not all students in whole class
3 one whole class
4 across a grade (e.g., more than one class of 9th grade)
5 across grades (e.g., more than one grade)
6 not applicable
999 cannot tell
23. Type of student population sampling (reader should be able to infer the purposefulness of the
selection)
1 purposeful – a priori selection
2 purposeful – post hoc selection (e.g., selected two participants; selected one post-hoc; snowball
phenomena)
3 random
4 other (e.g., combination)
999 cannot tell
24. Type of background information provided about students.
1 comprehensive (e.g., description which includes rich information of personal, academic, attitudinal, behavioral nature)
2 limited (“thin” description but information is provided across a range of information; e.g., personal, academic, attitudinal, behavioral)
3 selective (e.g., two or fewer of a range of information; e.g., personal and academic, or academic
and attitudinal, etc.)
4 none provided
Appendix A (continued)
25. Was the population sample group/s pre-tested or academically ranked in some way (e.g., using standardized test scores, reading scores, district scores, teacher evaluations or teacher ratings, etc.)?
1 yes, evidence reported
2 yes, minimal evidence reported
3 no evidence reported (researcher may/may not have a rationale for no evidence, but code will
still be ‘no evidence reported’).
26. Was membership of individuals in groups manipulated by gender, SES or ethnicity?
1 yes, evidence reported for manipulation
2 yes, evidence reported for no manipulation
3 no evidence reported
27. Was membership of individuals in groups manipulated according to student academic ability?
1 yes, evidence reported for manipulation
2 yes, evidence reported for no manipulation
3 no evidence reported
28. Was membership of individuals in groups manipulated according to prior experience or not with the
approach?
1 yes, evidence reported for manipulation
2 yes, evidence reported for no manipulation
3 no evidence reported
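The response options in items 21-28 above follow a fixed pattern: each item admits a small set of numeric codes, and the "Cannot Tell" sentinel (999) is available only for some items. As an illustration only, a minimal Python sketch of that scheme as a lookup table; the function and variable names are ours, not part of the published manual:

```python
# Hypothetical sketch of the coding-manual items above as a lookup table.
# Item numbers and response codes follow the manual excerpt; the names
# below are our own illustration, not part of the published tool.

CANNOT_TELL = 999  # "Cannot tell" sentinel used throughout the manual

# Valid response codes per item (21-28), per the excerpt above.
VALID_CODES = {
    21: {1, 2, 3, 4, 5, 6, 7, CANNOT_TELL},  # type of research design
    22: {1, 2, 3, 4, 5, 6, CANNOT_TELL},     # unit of assignment
    23: {1, 2, 3, 4, CANNOT_TELL},           # student population sampling
    24: {1, 2, 3, 4},                        # background information
    25: {1, 2, 3},                           # pre-testing / academic ranking
    26: {1, 2, 3},                           # grouping by gender/SES/ethnicity
    27: {1, 2, 3},                           # grouping by academic ability
    28: {1, 2, 3},                           # grouping by prior experience
}

def check_code(item: int, code: int) -> bool:
    """Return True if `code` is a legal response for `item`."""
    return code in VALID_CODES.get(item, set())
```

Note that `check_code(24, 999)` is False: item 24 offers no "Cannot Tell" option, consistent with the manual's note that some features take only dichotomous or exhaustive responses.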
APPENDIX B
Post-Training Coding and Reliability Procedures
1. For both phases (training and actual) of the coding process, studies were randomly assigned for coding rather than coded in a predetermined order, to avoid (as much as possible) bias toward a specific approach. Studies that replaced others that had been initially selected were similarly randomized.
2. Coders initially read each study individually, making notes on a prepared entry sheet containing specific reading guidelines for coding.
3. Only when note-taking and reading were completed did coders begin individual coding. Throughout
the coding process, they reiteratively revisited the study to look for evidence of their responses to the
codes.
4. Coders recorded page numbers in studies reflecting features coded to facilitate checking codes and
checking reliability.
5. Reliability checks were conducted every week (i.e., after each set of three studies).
6. One coder marked each score on a prepared coding sheet. Coders discussed disagreements until they
decided on an agreed code. Original codes were kept on individual coder sheets. If coders could not
reach agreement, the person responsible for notes on that week would take the question to the PI to be
resolved. Upon resolution, the result was entered in the prepared recording sheet for all codes for each
study.
7. All phone conferences were documented by one of the coders and sent to the PI for response prior to
commencing the next set of three studies.
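The weekly reliability checks described above rest on comparing two coders' responses item by item. The study itself resolved disagreements through discussion and consultation with the PI rather than by reporting a statistic; purely as an illustration, a minimal Python sketch of two common agreement summaries (simple percent agreement and Cohen's kappa), with all names our own:

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Share of items on which two coders assigned the same code."""
    if len(coder_a) != len(coder_b):
        raise ValueError("coders must rate the same set of items")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement from each coder's marginal code frequencies.
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)
```

For example, two coders who agree on three of four items (e.g., codes `[1, 2, 3, 999]` versus `[1, 2, 4, 999]`) show 0.75 percent agreement, with kappa somewhat lower once chance agreement is discounted.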