Critical Language Testing and Beyond
Elana Shohamy
Introduction
It is only recently that language testers have begun to focus on the role that
language tests play in society. For a long time language testers addressed mostly
measurement issues while overlooking the social and political dimensions of tests. Yet,
according to Messick (1981), tests are not isolated events; rather, they are connected to a
whole set of psychological, social and political variables that have an effect on curriculum,
ethicality, social classes, bureaucracy, politics and language knowledge. In the past few
years language testers have begun to show a growing interest in the role that language
tests play in society, specifically addressing the issues of the extent to which tests define
linguistic knowledge, determine membership, classify people, and stipulate criteria for
success and failure of individuals and groups. Topics such as test ethicality, test bias, the
effect of tests on instruction, and the use of tests for power and control, are now being
addressed in research, publications and conferences. A recent issue of the journal
Language Testing addressed test washback, referring to how language tests affect
instruction. A symposium on ethical concerns in language testing was held during the
recent AILA conference in Finland, focusing on the responsibility of language testers
beyond test construction, the conflict between professionalism and morality, between
fairness and validity, the politics of gatekeeping, and the unequal power relations between
test makers and test takers.
Philosophers and educationalists have been considering tests as powerful devices
in society for some time. For Madaus (1991) tests represent a social technology deeply
embedded in education, government and business; as such they provide the mechanism for
enforcing power and control.
Tests are most powerful as they are often the sole indicators used to determine the future of
individuals. As criteria for acceptance and rejection, they dominate other educational
devices such as curricula, textbooks and teaching.
Noam (1996) views tests as tools for imposing implicit ideas about success,
knowledge, and ability. He notes that "How we assess can support careers and make
people successful, but it can also destroy people’s careers, place unfair burden on
individuals’ self perception and unnecessary hurdles in the path of their achievement” (p.
9). For Foucault, tests are instruments of surveillance through which individuals are quantified and classified.
Viewing tests in reference to social, educational and political contexts situates the
field of testing in the domain of critical testing. In reference to language testing, it is
referred to here as critical language testing. Borrowing mostly from Pennycook (1994)
and from Kramsch (1993), critical language testing will now be described.
• Critical language testing assumes that the act of testing is not neutral. Rather, it is
both a product and an agent of cultural, social, political, educational and ideological
agendas that shape the lives of individual participants, teachers and learners.
• Critical language testing views test takers as political subjects in a political context.
• It views language tests as tools directly related to levels of success, deeply embedded
in cultural, educational and political arenas where different ideological and social forms
struggle for dominance.
• It asks questions about what sort of agendas are delivered through tests and whose
agendas they are.
• It claims that language testers need to ask themselves what sort of vision of society
language tests create and what vision of society tests subserve; are language tests
merely intended to fulfill predefined curricular or proficiency goals, or do they have
other agendas?
• It asks questions about whose knowledge the tests are based on. Is what is included
in language tests "truth" to be handed on to test takers, or is it something that can be
negotiated, challenged and appropriated?
• It considers the meaning of language test scores, the degree to which they are
prescriptive, final, or absolute, and the extent to which they are open to discussion and
interpretation.
Research
My own research in the past few years has focused on the consequential validity of
language tests. By using a variety of research designs and employing a variety of data
collection tools, such as interviews, questionnaires, observations and document analysis,
empirical data were collected to document the uses of language tests, their impact and
consequences in educational and political contexts.
The educational context is the natural domain for examining the uses of language
tests as this is where tests are used extensively for decisions about achievement, selection,
certification, graduation and placement.
A Reading Comprehension Test
This study (Shohamy, 1996) collected data on the uses and consequences of a
recently introduced national test for reading comprehension in Grades 4 and 5 in Israel.
The declared purpose of the test, as stated in official documents of the Ministry of
Education, was:
...the test will enable us to find out about the level of achievement of children
in this important subject. It will help the schools plan their work,... The results
will be used by the Ministry for pedagogical purposes only, for research and
for establishing policy. The data will be confidential and kept in the data base
of the Ministry of Education (1992).
Yet, the undeclared purpose of the test, gathered from interviews with a number of
language inspectors, was to introduce the topic of reading comprehension in the
educational system, to demonstrate authority and to “shake up the system”. The
consequences of the test (after the results indicated that 30% of the population did not
pass the test, i.e., could not read) showed that the test was used for surveillance, for
quantification, classification, standardization, demonstrating authority, imposing sanctions
and controlling learning.
To use terms introduced by Foucault, the test was used for surveillance,
quantification and classification as each student in the country was compelled to be tested
(no permission asked), the results were recorded in numerical form and then classified as
pass or fail. The national education data-base now has records on each and every student
in the country, all classified as either “successes” or “failures”. The test was used for
standardization and the control of learning as teachers and students began to follow
identical formats for teaching-learning reading comprehension (short texts followed by
questions) in preparation for the administration of the test the following year. This test-like
teaching became the new de facto curriculum, overriding the existing curriculum. The test
was used for imposing sanctions, as those obtaining low scores were prevented from
participating in certain class levels or asked to repeat grades. Moreover, teachers whose
students scored low on the test were transferred to other classes, and textbooks and programs
cloned the content of the test.
Thus, the declared purpose differed from the actual one; the bureaucratic agenda
was achieved as teachers were in fact teaching reading comprehension; the system was
indeed “shaken up” and the authority of the central body was clearly established. Yet, this
had a high cost in terms of moral and ethical behavior on the part of the authorities. It
should be noted that due to public protest the use of the test was terminated after two
years.
A Test of Arabic
A new national test of Arabic as a foreign language was introduced by the Ministry
of Education to 6th, 7th and 8th grade students (Shohamy, 1993). In this case the
Inspector of Arabic publicly declared that the test was introduced for purposes other than
measuring students' achievement. Specifically, it would be used for:
• raising the prestige of the Arabic language;
• standardizing the levels of teaching Arabic;
• forcing teachers to increase the rate of teaching; and
• increasing the motivation of teachers and students.
For the students, it was a low-stakes test in that the results had no sanctioning power in a
subject considered to be of low prestige. The test had some impact on teaching only
before the first and second administrations. On these occasions teachers stopped teaching
the regular material, replaced the textbooks with test-like worksheets and increased the
use of tests and quizzes in class. But once the test was administered teachers switched to
“regular teaching”.
On examining the impact of the test after a number of years (Shohamy 1994;
Shohamy, Donitsa-Schmidt, & Ferman, 1996) it was found that the test did not fulfill any of
the inspector’s agendas. It did not raise the status and level of the subject, nor did it
increase the number of students studying Arabic (an indication of its prestige). Yet, the
inspector insisted that the test must continue to be administered every year. He feared that
if the test were canceled there would be a drop in the national level of Arabic proficiency
and a decrease in the number of students. He stated that the test promoted the status of
Arabic as perceived by teachers, students and parents. Clearly, the only impact the test had
was that it provided the inspector with a facade of action, with bureaucratic control and
with an excuse for not undertaking meaningful pedagogical action, which was probably
the main reason for introducing the test in the first place.
A Test of EFL
This study (Shohamy, 1993) examined the uses of an EFL (English as a Foreign
Language) oral proficiency test used for graduation from secondary school; the test
consisted of an oral interview, a monologue, and a role play. The declared purpose for
introducing the test, as stated by the EFL inspector, was to attract teachers’ attention to
oral language, an area believed by the inspectorate to be overlooked. An earlier study
(Shohamy, 1993) on the impact of the test showed that the goal had been achieved as
teachers indeed spent substantially more time teaching oral language. Yet, the teaching
included only the very tasks that appeared on the test, namely, interviews, monologue and
role play. It was therefore “oral test language”, substantially narrower than “oral language”,
that had been taught and became the de facto oral knowledge.
In 1996 a slightly modified version of the test was introduced consisting of an
extensive reading component where test takers report on two books they have read. The
role play was replaced by “modified” role play where students ask the tester questions, and
the interview and monologue were replaced by an extended interview. The declared
agenda, as stated by the EFL inspector, was: "to encourage students to read, to provide an
opportunity for authentic speech and communication and to gauge the pupils’ overall level
of oral proficiency" (Steiner, 1995). The results of a study that examined the effect of this
high-stakes test (detailed in Shohamy et al., 1996) showed that the test triggered considerable
washback on classroom teaching.
The use of language tests for promoting bureaucratic agendas is not unique to the
educational context. Politicians, as well, have discovered what a useful tool a language
test can be for solving complex political issues that they fail to address through regular
policy making. It should also be noted that one feature that makes tests so attractive to the
administrator is that they allow administrators to set cut-off scores without justification and
thus to create quotas in a flexible manner.
For the government of Australia, struggling with the problem of reducing the
number of immigrants and with issues related to refugees, language tests provided a very
efficient and practical solution. Two language tests, the ACCESS and the STEP, were
introduced for gatekeeping immigrants to Australia and for accepting or rejecting refugees
already resident in Australia.
Hawthorne criticizes such uses of tests in Australia claiming that
The case of the STEP test offers a dramatic illustration of the increasing use of
language testing by Australian authorities to achieve political purposes. ...the
STEP had a capacity to deliver the Australian government a solution which
was timely, cheap, and administratively simple. ...The federal government was
able to reimpose control over a politically volatile situation, the Australian
legal system was cleared of an unmanageably large backlog of refugee
applications;... Macro-political issues ...clearly have a profound potential
impact on test design, administration and outcomes. I believe they warrant
detailed consideration by applied linguists. Indeed, solutions such as STEP
may prove irresistible under similar circumstances in the future (1997, pp. 257-
258).
With the establishment of Latvia as an independent state and the efforts to create a
cohesive national society, language tests provided an efficient barrier against the over 50%
Russians who were living in Latvia with no citizenship. Russians are required to pass strict
tests in the Latvian language in order to apply for citizenship and to enter the
workplace, a procedure that may lead to a type of ethnic cleansing.
At the educational level, inspectors use tests to redefine language knowledge and to
impose new textbooks and teaching methods. Bureaucrats use tests to define and
standardize language knowledge, to raise proficiency, to communicate educational
agendas, and also to give an illusion of action and an excuse for inaction. At the school
level, principals use school-wide exams to drive teachers to teach, and teachers, in their
turn, use tests and quizzes to motivate students to learn and to impose discipline. At the
political level, tests are used to create de facto language policies, to raise the status of some
languages and to lower that of others, to control citizenship, to include, exclude and
gatekeep, to maintain power, to offer simplistic solutions to complex problems, and to
raise the power of nations to be "the best in the world".
As Hanson notes:
In nearly all cases test givers are organizations, while test takers are
individuals. Test-giving agencies use tests for the purpose of making decisions
or taking actions with reference to test takers - if they are to pass a course,
receive a certificate, be admitted to college, receive a fellowship, get a job or
promotion. That, together with the fact that organizations are more powerful
than individuals, means that the testing situation nearly always places test
givers in a position of power over test takers (1993, p. 19).
Tests are also powerful in that they symbolize social order. For parents who often
do not trust schools and teachers, tests are indications of control and order. For elite
groups, tests provide a means for perpetuating dominance. The paradox is that low-status
groups, minorities and immigrants, who are constantly excluded by tests, have an
overwhelming respect for tests and often fight against their abandonment.
Tollefson (1995) identifies three aspects of power: state, discourse and ideology.
Tests represent all three: state power, through the bureaucrats; discourse power, as tests
involve unequal relations between individuals (the tester and the test taker); and ideological power, through
their designation of what is right and what is wrong, what is good knowledge and what is
not, what is worthwhile economically and what is not.
The Consequences
What are the consequences of the fact that tests are used in the ways described in
this article? Of all the consequences, I will focus here on the three which I consider to be
most meaningful.
One negative consequence of using tests in such a way is the creation of two
parallel systems, one manifested through the curriculum or policy documents; the other
reflecting organizational aspirations through tests. These two systems are often in
contradiction with each other and there are many examples of this discrepancy. Consider
the case of a country declaring a multilingual policy, yet only one language, say English,
gets tested. In Israel, both Hebrew and Arabic are official languages, yet, on the high
school exit exam Arabs are tested in Hebrew, while Hebrew speakers are not tested in
Arabic. Another example is multilingual Australia’s use of the ACCESS test. Marisi’s (1994)
work shows that Canadians speaking Quebec French are not considered native speakers by the
ACTFL testing guidelines. Stansfield (1997) demonstrates how elite groups define a
“language of the court” which is detached from reality.
Bernstein (1982) refers to the two parallel systems as primary and secondary: the
primary system is talk, while the secondary is practice, de facto and more relevant, since it has the
enforcing power. There is therefore an “official” story and a “real” story; the latter is
enacted by tests and pushed by bureaucrats, and often not known to the public. It is
clearly the testing policy which is the de facto policy as “tests become targets for
contending parties who seek to maintain or establish a particular vision of what education
and society should be” (Noah & Eckstein, 1992; p. 14).
And Beyond
For Giroux (1995), democracy involves:
transferring power from elites and executive authorities who control the
economic and cultural apparatus of society to those producers who wield
power at the local level, and is made concrete through the organization and
exercise of horizontal power in which knowledge needs to be widely shared
through education and other technologies of culture (1995, p. 36).
If tests are to offer true pedagogical assessment, they must adhere to these
democratic principles. There is a need, therefore, for more democratic models of assessment
where the power of tests is transferred from elites and executive authorities and shared
with the local levels, test takers, teachers and students.
Some approaches to testing and assessment that are currently proposed follow such
principles of shared power and local representation. In these approaches local groups -
test takers, students, teachers and schools - share power by collecting their own
assessment evidence using multiple assessment procedures (portfolios, self assessment,
projects, observations, and tests).
In some models all the power is transferred from central bodies to local ones. Such is
the case where external examinations are abolished in favor of local and internal
assessment. Yet, such an approach is often criticized as power is simply being transferred
to the teacher, who may also engage in undemocratic behavior in the classroom. Broadfoot
(1996) demonstrates how in some situations such approaches may yield an illusion of
democracy, as teachers become the new servants of the central systems, creating, in her
words, "a new order of domination" (p. 87).
Preferable, therefore, is a more democratic model, where power is not transferred
but rather shared with local bodies and is based on a broader representation of different
agents, central and local, who, together, go through a process of contextualization of the
evidence obtained from the different sources. In constructive, interpretive and dialogical
sessions each participant collects data relevant to the assessment and demonstrates it in an
interpretive and contextualized manner. This approach can then be applied in the
national, district or classroom context. Its description suggests that assessment of
students’ achievement ought to be seen as an art, rather than a science in that it is
interpretive, idiosyncratic, interpersonal and relative.
I have experimented with such a model in the language assessment of immigrant
students, where teachers collect data via tests and observations, students obtain evidence
through self assessment and portfolios, and a diagnostic test is administered by a central
body. The information is then gathered, processed and interpreted in an assessment
conference where the language teacher, the classroom teacher, the student and even a parent discuss
and interpret the information leading to meaningful recommendations for strategies of
language improvement. In this case the assessment follows democratic principles and is
also used for improving language proficiency.
The application of this type of model in the classroom is especially relevant as it
provides students with the experience of democratic assessment behavior from an early
age. Both students and teachers provide evidence that is indicative of language
performance, to be discussed and evaluated in a conference between teachers and
students. Experiencing such democratic practices in the classroom is especially useful as
children become adults who are aware of the need to guard and protect their rights
against the assessment machinery of central bodies.
Such methods are clearly time consuming and costly. However, democratic practices
are not chosen because of their efficiency or low cost. Rather, they are chosen because of
their principles. This is why central bodies and testing factories always try to resist them.
Tests have grown from being a means to an end into being an end in themselves, from
being the instrumental ideology to being the expressive ideology. It is therefore important
not to be naive and to realize that none of these suggested models offers a total solution.
There will always be those who will attempt to retain their dominance by continuing to
use tests as part of the ongoing power struggle between individuals and groups and by
using terms such as "standards", "quality", "indicators" and other symbols of social order.
For language testers who now realize that the products they create may be misused
in these struggles, this poses a threat to the ethicality of the profession. Language testers
therefore must become engaged in the discussion. They must actively follow the uses and
consequences of language tests, and offer assessment models which are more educational,
democratic and ethical in order to minimize misuses. In the same way that applied linguists
must face up to the fact that there is no neutral applied linguistics, language testers cannot
remove themselves from the consequences and uses of tests and therefore must also reject
the notion of neutral language testing. Pretending it is neutral only allows those in power
to misuse language testing, the very instrument that language testers have provided
them. Language testers must realize that much of the strength of tests is not their technical
quality but the way they are put to use in social and political dimensions. Studies of test
use, as part of test validation, on an on-going basis, are essential for the integrity of the
profession. The unique trait of language testers (as distinguished from testers in general) is
their expertise in their subject matter, language learning. As such, we can become agents
capable of bridging language learning, language testing and language use.
A focus on language testing policies of national educational systems should also be
of special concern to those working in the area of language policy. Researching the
language testing policies of countries and systemscan provide a rich source for exposing
the de facto language policies of nations and systems beyond stated intentions, nice words
and politically correct documents. The extent to which language testing policies actually
reflect language policies is an area that calls for research, which can provide a most useful
indication as to the validity of language policies. Unfortunately, this source has so far
been overlooked in research on language policy.
As was shown in this article, language tests, like languages, provide a reflection, a
mirror, of the complexities and power struggles of society. As such they deserve to be
studied, protected and guarded as part of the process of preserving and perpetuating
democratic cultures, values and ethics, as well as quality learning. This is an important
challenge for language testers, applied linguists and policy researchers in the years to come.
Note
1. This article is based on a plenary talk the author gave at a meeting of the American Association
of Applied Linguistics (AAAL) held in Orlando, FL, in March 1997.
References
Bernstein, B. (1982). Codes, modalities and the process of cultural reproduction: A model. In
M. Apple (Ed.), Cultural and economic reproduction in education. London: Routledge and Kegan
Paul.
Hanson, F.A. (1993). Testing testing: Social consequences of the examined life. Berkeley:
University of California Press.
Giroux, H. (1995). Language, difference, and curriculum theory: Beyond the politics of clarity.
In P. McLaren & J. Giarelli, (Eds.), Critical theory and educational research. New York: SUNY
Albany Press.
Kramsch, C. (1993). Context and culture in language teaching. Oxford: Oxford University
Press.
Madaus, G. (1991). Current trends in testing in the USA. Paper presented at the Conference
Testing and Evaluation, Feedback Strategies for Improvement of Foreign Language Learning, The
National Foreign Language Center, Washington DC, 4-5 February.
Marisi, P. (1994). Questions of regionalism in native speakers’ OPI performance: The French
Canadian experience. Foreign Language Annals, 27(4), 505-521.
Messick, S. (1981). Evidence and ethics in the evaluation of tests. Educational Researcher 10,
9-20.
Messick, S. (1996). Validity and washback in language testing. Language Testing, 13, 241-
257.
Noah, E., & Eckstein, M. (1992). Examinations in comparative and international studies.
Oxford: Pergamon.
MacIntyre, A. (1984). After virtue (2nd ed.). Notre Dame, IN: University of Notre Dame Press.
Shohamy, E. (1993). The power of tests: The impact of language tests on teaching and
learning. Washington, DC: The National Foreign Language Center at Johns Hopkins University.
Shohamy, E. (1994). The use of language tests for power and control. In J. Alatis, (Ed.),
Georgetown University round table on language and linguistics (pp. 57-72). Washington, DC:
Georgetown University Press.
Shohamy, E., Donitsa-Schmidt, S., & Ferman, I. (1996). Test impact revisited: Washback effect
over time. Language Testing, 13, 298-317.
Stansfield, C. (1997). Court interpreter certification tests: Problems and solutions. Paper
presented at the Second Language Acquisition Research and Second Language Testing Colloquium.
National Foreign Language Center, Washington, DC, February.
Steiner, J. (1995). Changes in the English bagrut exam. Jerusalem: Israel Ministry of
Education, Culture and Sport, English Inspectorate.
Weiss, C.H. (Ed.). (1977). Using social research in public policy making. Lexington, MA: Lexington Books.
The Author
ELANA SHOHAMY is a Professor and Chair of the language education program at the
School of Education, Tel Aviv University. Her research is mostly in the area of language
testing, focusing on oral testing, method effect, immigrant testing, alternative assessment,
test impact and the political and social dimensions of tests. She is currently the co-
director of a language policy center developing and researching language policy in Israel.