
Studies in Educational Evaluation, Vol. 24, No. 4, pp. 331-345, 1998
Pergamon © 1998 Published by Elsevier Science Ltd. All rights reserved
Printed in Great Britain
0191-491X/98 $19.00 + 0.00
S0191-491X(98)00020-0

CRITICAL LANGUAGE TESTING AND BEYOND

Elana Shohamy

School of Education, Tel Aviv University, Tel Aviv, Israel

Introduction

It is only recently that language testers have begun to focus on the role that
language tests play in society. For a long time language testers addressed mostly
measurement issues while overlooking the social and political dimensions of tests. Yet,
according to Messick (1981), tests are not isolated events, rather they are connected to a
whole set of psychological, social and political variables that have an effect on curriculum,
ethicality, social classes, bureaucracy, politics and language knowledge. In the past few
years language testers have begun to show a growing interest in the role that language
tests play in society, specifically addressing the issues of the extent to which tests define
linguistic knowledge, determine membership, classify people, and stipulate criteria for
success and failure of individuals and groups. Topics such as test ethicality, test bias, the
effect of tests on instruction, and the use of tests for power and control, are now being
addressed in research, publications and conferences. A recent issue of the journal
Language Testing addressed test washback, referring to how language tests affect
instruction. A symposium on ethical concerns in language testing was held during the
recent AILA conference in Finland, focusing on the responsibility of language testers
beyond test construction, the conflict between professionalism and morality, between
fairness and validity, the politics of gatekeeping, and the unequal power relations between
test makers and test takers.
Philosophers and educationalists have been considering tests as powerful devices
in society for some time. For Madaus (1991) tests represent a social technology deeply
embedded in education, government and business; as such they provide the mechanism for
enforcing power and control.


Tests are most powerful as they are often the sole indicators for determining the future of
individuals. As criteria for acceptance and rejection, they dominate other educational
devices such as curricula, textbooks and teaching.
Noam (1996) views tests as tools for imposing implicit ideas about success,
knowledge, and ability. He notes that, “How we assess can support careers and make
people successful, but it can also destroy people’s careers, place unfair burden on
individuals’ self perception and unnecessary hurdles in the path of their achievement” (p.
9). For Foucault

The examination combines the technique of an observing hierarchy and those
of normalizing judgment. It is a normalizing gaze, a surveillance that makes it
possible to quantify, classify and punish. It establishes over individuals a
visibility through which one differentiates and judges them. That is why, in all
the mechanisms of discipline, the examination is highly ritualized. In it are
combined the ceremony of power and the form of the experiment, the
deployment of force and the establishment of truth. At the heart of the
procedures of disciplines, it manifests the subjection of those who are perceived
as objects and the objectification of those who are subjected (1979, p. 184).

Critical Language Testing

Viewing tests in relation to their social, educational and political contexts situates the
field of testing in the domain of critical testing; applied to language testing, this
perspective is referred to here as critical language testing. Borrowing mostly from
Pennycook (1994) and from Kramsch (1993), critical language testing will now be described.
Critical language testing assumes that the act of testing is not neutral. Rather, it is
both a product and an agent of cultural, social, political, educational and ideological
agendas that shape the lives of individual participants, teachers and learners.

• Critical language testing views test takers as political subjects in a political context.
• It views language tests as tools directly related to levels of success, deeply embedded in cultural, educational and political arenas where different ideological and social forms struggle for dominance.
• It asks questions about what sort of agendas are delivered through tests and whose agendas they are.
• It challenges psychometric traditions and considers interpretive ones.
• It claims that language testers need to ask themselves what sort of vision of society language tests create and what vision of society tests subserve; are language tests merely intended to fulfill predefined curricular or proficiency goals, or do they have other agendas?
• It asks questions about whose knowledge the tests are based on. Is what is included in language tests "truth" to be handed on to test takers, or is it something that can be negotiated, challenged and appropriated?
• It considers the meaning of language test scores, the degree to which they are prescriptive, final, or absolute, and the extent to which they are open to discussion and interpretation.
• It perceives language testing as being caught up in an array of questions concerning educational and social systems; the notion of "just a test" is an impossibility, since language testing cannot be separated from the many contexts in which it operates.

Critical language testing broadens the field of language testing by engaging it in a
wider sphere of social dialogue and debate about the forms and practices of language
testing and its relation to language teaching and language learning. This debate implicates
the roles that tests play and have been assigned to play in competing ideologies and their
respective discourses, as well as the roles of language testers. In this debate language
testers are drawn towards areas of social processes and power struggles embedded in the
discourses on learning and democracy.
Critical language testing signifies a paradigm shift in language testing in that it
introduces new criteria for the validity of language tests. The consequential, systemic,
interpretive and ethical are a few of the new types of validity, calling for empirical data on
language test use. Thus, tests considered valid in the past, may no longer be so if they are
shown to have negative consequences. As Messick states:

The consequential aspect of construct validity includes evidence and rationale
for evaluating the intended and unintended consequences of score
interpretation and use in both the short and long term, especially those
associated with bias in scoring and interpretation, with unfairness in test use,
and with positive or negative washback effects on teaching and learning (1996,
p. 251).

Research

My own research in the past few years has focused on the consequential validity of
language tests. By using a variety of research designs and employing a variety of data
collection tools, such as interviews, questionnaires, observations and document analysis,
empirical data were collected to document the uses of language tests, their impact and
consequences in educational and political contexts.

The Educational Context

The educational context is the natural domain for examining the uses of language
tests as this is where tests are used extensively for decisions about achievement, selection,
certification, graduation and placement.

A Test of Reading Comprehension

This study (Shohamy, 1996) collected data on the uses and consequences of a
recently introduced national test for reading comprehension in Grades 4 and 5 in Israel.
The declared purpose of the test, as stated in official documents of the Ministry of
Education was:

...the test will enable us to find out about the level of achievement of children
in this important subject. It will help the schools plan their work,... The results
will be used by the Ministry for pedagogical purposes only, for research and
for establishing policy. The data will be confidential and kept in the data base
of the Ministry of Education (1992).

Yet, the undeclared purpose of the test, gathered from interviews with a number of
language inspectors, was to introduce the topic of reading comprehension in the
educational system, to demonstrate authority and to “shake up the system”. The
consequences of the test (after the results indicated that 30% of the population did not
pass the test, i.e., could not read) showed that the test was used for surveillance, for
quantification, classification, standardization, demonstrating authority, imposing sanctions
and controlling learning.
To use terms introduced by Foucault, the test was used for surveillance,
quantification and classification as each student in the country was compelled to be tested
(no permission asked), the results were recorded in numerical form and then classified as
pass or fail. The national education data-base now has records on each and every student
in the country, all classified as either “successes” or “failures”. The test was used for
standardization and the control of learning as teachers and students began to follow
identical formats for teaching-learning reading comprehension (short texts followed by
questions) in preparation for the administration of the test the following year. This test-like
teaching became the new de facto curriculum, overriding the existing curriculum. The test
was used for imposing sanctions, as those obtaining low scores were prevented from
participating in certain class levels or asked to repeat grades. Moreover, teachers whose
students scored low on the test were moved to other classes; textbooks and programs
cloned the content of the test.
Thus, the declared purpose differed from the actual one; the bureaucratic agenda
was achieved as teachers were in fact teaching reading comprehension; the system was
indeed “shaken up” and the authority of the central body was clearly established. Yet, this
had a high cost in terms of moral and ethical behavior on the part of the authorities. It
should be noted that due to public protest the use of the test was terminated after two
years.

A Test of Arabic

A new national test of Arabic as a foreign language was introduced by the Ministry
of Education to 6th, 7th and 8th grade students (Shohamy, 1993). In this case the
Inspector of Arabic publicly declared that the test was introduced for purposes other than
measuring students' achievements. Specifically, it would be used for:
• raising the prestige of the Arabic language;
• standardizing the levels of teaching Arabic;
• forcing teachers to increase the rate of teaching; and
• increasing the motivation of teachers and students.

For the students, it was a low-stakes test in that the results had no sanctioning power in a
subject considered to be of low prestige. The test had some impact on teaching only
before the first and second administrations. On these occasions teachers stopped teaching
the regular material, replaced the textbooks with test-like worksheets and increased the
use of tests and quizzes in class. But once the test was administered teachers switched to
“regular teaching”.
On examining the impact of the test after a number of years (Shohamy, 1994;
Shohamy, Donitsa-Schmidt, & Ferman, 1996), it was found that the test did not fulfill any of
the inspector’s agendas. It did not raise the status and level of the subject, nor did it
increase the number of students studying Arabic (an indication of its prestige). Yet, the
inspector insisted that the test must continue to be administered every year. He feared that
if the test were canceled there would be a drop in the national level of Arabic proficiency
and a decrease in the number of students. He stated that the test promoted the status of
Arabic as perceived by teachers, students and parents. Clearly, the only impact the test had
was that it provided the inspector with a facade of action, with bureaucratic control and
with an excuse for not undertaking meaningful pedagogical action, which was probably
the main reason for introducing the test in the first place.

A Test of EFL

This study (Shohamy, 1993) examined the uses of an EFL (English as a Foreign
Language) oral proficiency test used for graduation from secondary school; the test
consisted of an oral interview, a monologue, and a role play.
introducing the test, as stated by the EFL inspector, was to attract teachers’ attention to
oral language, an area believed by the inspectorate to be overlooked. An earlier study
(Shohamy, 1993) on the impact of the test showed that the goal had been achieved as
teachers indeed spent substantially more time teaching oral language. Yet, the teaching
included only the very tasks that appeared on the test, namely, interviews, monologue and
role play. It was therefore “oral test language”, substantially narrower than “oral language”,
that had been taught and became the de facto oral knowledge.
In 1996 a slightly modified version of the test was introduced consisting of an
extensive reading component where test takers report on two books they have read. The
role play was replaced by “modified” role play where students ask the tester questions, and
the interview and monologue were replaced by an extended interview. The declared
agenda, stated by the EFL inspector was: “to encourage students to read, to provide an
opportunity for authentic speech and communication and to gauge the pupils’ overall level
of oral proficiency" (Steiner, 1995). The results of a study that examined the effect of this
high-stakes test (detailed in Shohamy et al., 1996) showed that the test had a tremendous
impact on classroom activities, time allotment, content and methodology. Ample new
commercial teaching material, designed specifically for the test and including video
cassettes, TV series, cue-cards and an audio series, was published and marketed.
Teachers claimed to focus their teaching exclusively on the oral skills of the exam, stating:
“Of course I teach the tasks for the exam, we have no choice but to teach as dictated by
the exam”. Yet, teachers in the lower levels, whose students did not take the exam, were
more creative in their teaching, using a variety of oral tasks. Most teachers in the upper
grades also reported high anxiety, fear and pressure to cover the material as they felt that
their students’ success or failure was a reflection on them. While teachers were very critical
of the quality of the test, they still appreciated the status attached to it (“The test gives oral
proficiency official status”) and would not want the Ministry to cancel the test.
It should be noted that no changes in the curriculum, teacher training, or teaching
content were introduced. The test became the de facto new curriculum; it prescribed a de
facto new model of teaching methods, and de facto new teaching material, which were all
very different from what was stated in the official curriculum. Thus, the test provided the
educational authorities with a simplistic device to trigger and impose a new educational
policy and practice.
However, for the EFL Inspectorate, the reactions to the test (without ever
examining its real impact) were overwhelming: “The introduction of the oral test was a
great success and created a very positive educational impact as emphasis on oral skills has
been achieved not only in the higher grades but also in the lower grades”. It gave the
Inspectorate much hope about the future of EFL: "We are confident that the changes in
the test will result in allowing pupils to become more involved with the English language,
more confident in their abilities to read and write, and above all, will enable pupils to learn
English instead of learning for the matriculation exam” (Steiner, 1995, p. 15).
In all the above cases the language tests were not used to assess language
proficiency. None of them gave any attention to the results in terms of language
proficiency, and in none of the cases were students or teachers given any feedback or
diagnosis that could have served as input for improving language performance. Rather, the
language tests were used as triggers and vehicles by means of which administrators’
agendas could be carried out. It is the power of tests, their mythical and nearly ritualistic
aspect, that enables them to be used by bureaucratic agencies for all the above purposes.

The Political Context

The use of language tests for promoting bureaucratic agendas is not unique to the
educational context. Politicians, as well, have discovered what a useful tool a language
test can be for solving complex political issues that they fail to address through regular
policy making. It should also be noted that one feature that makes tests so attractive to
administrators is that they allow them to set cutting scores without having to justify them,
and thus to create quotas in a flexible manner.

Gatekeeping Immigrants in Australia

For the government of Australia, struggling with the problem of reducing the
number of immigrants and issues related to refugees, language tests provided a very
efficient and practical solution. Two language tests, the ACCESS and the STEP, were
introduced for gatekeeping immigrants to Australia and for accepting or rejecting refugees
already resident in Australia.
Hawthorne criticizes such uses of tests in Australia claiming that

The case of the STEP test offers a dramatic illustration of the increasing use of
language testing by Australian authorities to achieve political purposes. ... the
STEP had a capacity to deliver the Australian government a solution which
was timely, cheap, and administratively simple. ... The federal government was
able to reimpose control over a politically volatile situation, the Australian
legal system was cleared of an unmanageably large backlog of refugee
applications; ... Macro-political issues ... clearly have a profound potential
impact on test design, administration and outcomes. I believe they warrant
detailed consideration by applied linguists. Indeed, solutions such as STEP
may prove irresistible under similar circumstances in the future (1997, pp. 257-258).

Discriminating Ethnic Groups in Latvia

With the establishment of Latvia as an independent state and the efforts to create a
cohesive national society, language tests provided an efficient barrier against the over 50%
Russian population living in Latvia with no citizenship. Russians are required to pass strict
language tests in the Latvian language in order to apply for citizenship and to enter
the workplace, a procedure that may lead to a type of ethnic cleansing.

Raising the Educational Level in the USA

US President Clinton, in the State of the Union Address delivered on February 4, 1997,
offered tests as a most practical solution for the troubled US educational system. He
proposed that "to help schools meet the standards and measure their progress, we will lead
an effort over the next two years to develop national tests of students' achievement in
reading and math. Every state should adopt high national standards, so by 1999 every
fourth grade student will be tested in reading and every eighth grader in math to make
sure these standards are met". Thus, the US government was planning to allocate large
resources for the development of individual tests for all students, believing that these tests
would upgrade the deteriorating US educational system. As President Clinton concluded:
"...when we aim high and challenge our students, they will be the best in the world".
If tests are meant to assess language, this certainly cannot be learned from any of
the above cases; instead, they underscore the main concerns of critical language testing. Tests are used for
other purposes, for serving diverse political and bureaucratic agendas in different contexts
and on various levels. Policy makers in central agencies, aware of the authoritative power
of tests use them to manipulate educational systems, to control curricula, to create new

knowledge, and to impose new textbooks and teaching methods. Bureaucrats use tests to
define and standardize language knowledge, to raise proficiency, to communicate
educational agendas, and also to give an illusion of action and an excuse for no action. At
the school level principals use school-wide exams to drive teachers to teach, and teachers,
in their turn, use tests and quizzes to motivate students to learn and to impose discipline.
On the political level tests are used to create de facto language policies, to raise the status
of some languages and to lower that of others, to control citizenship, to include, exclude
and gatekeep, to maintain power, to offer simplistic solutions to complex problems, and to
raise the power of nations to be "the best in the world".

What Gives Tests So Much Power?

Tests use the language of numbers, and numbers enable quantification,
classification, normalization and the standardization of people according to a common
yardstick. Numbers are symbols of objectivity, scientificity and rationality, all features
which feed into illusions of truth, trust, legitimacy, status and authority. Broadfoot
(1996) states that, "This very objectivity, this recourse to specific rationality, lends to the
assessment a legitimacy which makes it hard to refute" (p. 86). Yet, the public is unaware
that numbers are subjective, scientism is relative, and success and failure are determined by
arbitrary cutting scores. Furthermore, the information gleaned from tests is then used by
bureaucrats to support their beliefs. MacIntyre (1984) states that since the aim of the
bureaucrat is to adjust means to ends in the most economical and efficient way, s/he will
deploy scientific knowledge, organized in terms of and comprising a set of universal law-like
generalizations, to support this aim.
The power of tests also lies in the ownership of the information. The tester, not the
test taker, owns the testing information, the "scientific knowledge". The only numbers
a test taker can refer to are those provided by the tester, and these can only be challenged
if one has "counter numbers". Hanson notes that:

In nearly all cases test givers are organizations, while test takers are
individuals. Test-giving agencies use tests for the purpose of making decisions
or taking actions with reference to test takers - if they are to pass a course,
receive a certificate, be admitted to college, receive a fellowship, get a job or
promotion. That, together with the fact that organizations are more powerful
than individuals, means that the testing situation nearly always places test
givers in a position of power over test takers (1993, p. 19).

Tests are also powerful in that they symbolize social order. For parents who often
do not trust schools and teachers, tests are indications of control and order. For elite
groups, tests provide a means for perpetuating dominance. The paradox is that low status
groups, minorities and immigrants, who are constantly excluded by tests, have an
overwhelming respect for tests and often fight against their abandonment.
Tollefson (1995) identifies three aspects of power: state, discourse and ideology.
Tests represent all three: state power through the bureaucrats; discourse power, as tests are
imposed within an unequal relationship between individuals (the tester and the test taker); and ideological power through

their designation of what is right and what is wrong, what is good knowledge and what is
not, what is worthwhile economically and what is not.

The Consequences

What are the consequences of the fact that tests are used in the ways described in
this article? Of all the consequences, I will focus here on the three which I consider to be
most meaningful.

The Quality of the Knowledge


The knowledge created through tests is often referred to as “institutionalized
knowledge”; its main characteristics are that it is narrow, simplistic and often different from
experts’ knowledge. After all, the information tapped by tests is only a representation of
real knowledge; it is monologic, based on one instrument (a test) which is used on one
occasion, detached from a meaningful context, and usually with no feedback for
improvement. As one language educator reacted to the introduction of state-wide exams
which high school students are required to pass in order to graduate:

Originally envisioned as performance assessment tied to forward-looking
learning goals in each of four disciplines, feasibility constraints associated with
costs, psychometric issues, and quick turn-around times (critical for students
for graduation) will require close-ended, multiple choice tests. These tests, in
turn, are likely to result in impoverished curriculum and instruction. Teachers
teach to the tests and reduce the curriculum to that which is tested, despite
successful efforts over the last decade to improve instruction through
constructivist and related performance-based approaches to teaching.

Using tests to determine de facto knowledge provides "a quick fix", an instant
solution that overlooks the complexities of subject matter and is unmeaningful for repair.
Weiss (1977) differentiates between the instrumental impact of tests, which is short-range
and goal oriented, and conceptual impact, which is long-range and meaningful, followed
by discussions on the nature of the tested topic, methods of teaching, and agreed-upon
criteria of quality. In none of the tests reported above was there any serious
discussion with teachers or students about the tested topics, whether they were learnable
or measurable. For bureaucrats, these simplistic, instant solutions are very attractive in that
they offer instant evidence of impact in their usually short terms in office. As Freire (1985)
states: “The more bureaucratic the evaluators are, not just from an administrative point of
view but above all from an intellectual view, the narrower and more inspection like the
evaluation will be” (pp. 23-24).

Creating Parallel Systems

The negative consequence of using tests in such a way is the creation of two
parallel systems, one manifested through the curriculum or policy documents; the other
reflecting organizational aspirations through tests. These two systems are often in
contradiction with each other and there are many examples of this discrepancy. Consider
the case of a country declaring a multilingual policy, yet only one language, say English,
gets tested. In Israel, both Hebrew and Arabic are official languages; yet, on the high
school exit exam Arabs are tested in Hebrew, while Hebrew speakers are not tested in
Arabic. Another example is multilingual Australia's use of the ACCESS test. Marisi's (1994)
work shows that Canadian speakers of Quebec French are not considered natives by the
ACTFL testing guidelines. Stansfield (1997) demonstrates how elite groups define a
"language of the court" which is detached from reality.
Bernstein (1986) refers to the two parallel systems as primary and secondary:
primary is talk, while secondary is practice, de facto, and more relevant since it has the
enforcing power. There is therefore an “official” story and a “real” story; the latter is
enacted by tests and pushed by bureaucrats, and often not known to the public. It is
clearly the testing policy which is the de facto policy as “tests become targets for
contending parties who seek to maintain or establish a particular vision of what education
and society should be” (Noah & Eckstein, 1992; p. 14).

Unethical and Undemocratic Ways of Making Policy

Using tests as de facto policies is undemocratic and unethical. The agenda
represents those in power, the elite; often it is not declared publicly and openly, but is
dictated from above. Those who are affected by the tests - the teachers and the test
takers - are excluded.
The following illustrates the unfolding of such a process. Policy makers believe a
certain language area, say reading comprehension, should be taught. This decision is often
a reaction to public or media demands, a conference an inspector has attended, or a wish to
demonstrate control and action. A language test is then introduced and individuals are
forced to participate, as is the practice in national surveys. Hence, the test becomes a
means through which the policy makers communicate priorities to the system. In a high-stakes
situation (i.e., when the results of the test are used for important decisions about
individuals or programs), teachers react by teaching to the test. They experience fear and
anxiety as students, principals, and parents demand preparation for the high-stakes test.
Unsure of how to do this, teachers turn to the most immediate pedagogical source, the test
itself, to learn how to carry out these new orders. The test becomes the single most
influential pedagogical source, and the de facto knowledge. Teachers are reduced to
"following orders", a frustrating role as their responsibility increases while their authority
diminishes. The test then becomes the device through which control is exercised
authoritatively, legitimizing the power of bureaucrats and other elite groups, in an
undemocratic, unethical way, usually with no resentment whatsoever on the part of those
who are affected by it.

And Beyond

Language testing is crucially affected by a number of major polarities: between the
need of central agencies for control and individuals' desire for freedom; between the urge
of groups for a common unifying language and multilingual tolerance; between the public
need for symbolic devices of social order and individuals' and groups' need for personal
expression; and between increased control in growing technological societies and fluid and
relative language knowledge. There are, therefore, different views regarding the future of
testing.
There are those who believe that the testing era is over, that there is no room for
such authoritarian tools in a post-modern, multicultural society where knowledge is
relative and fluid and where groups, linguistic and others, demand legitimacy for their
knowledge, identity and rights. Instruments which rely on the standardization of whole
populations pose major obstacles to self-expression, especially when it is public knowledge
that tests are abused for promoting private agendas and for maintaining the power of the
elites.
On the other hand there are those who think to the contrary, specifically stating
that tests will continue to exist with more power and control than ever. After all, tests can
be extremely beneficial tools for battling those who demand to share power. It is easy to
fight the proponents of Ebonics with a standardized English test, for example, or to prevent
bilingual education by introducing an English test as an entrance criterion to college, as
part of the "English Only" movement.
Broadfoot (1996) is even more far-reaching in her view of the future role of tests,
arguing that the combination of technology and bureaucracy will enable central groups to
further increase their power as testing “moves away from overtly political judgments about
educational policy in favor of a technocratic ideology which legitimates policy decisions
in terms of an objective, rational process of decision making” leading to the growing
“powerlessness of the individual to resist the effects of an increasingly intrusive state
machinery” (pp. 217-218).
Yet, there is also another view, according to which tests are here to stay but in a
different shape and form. This view builds on the true power of tests, that of offering
pedagogical benefits in the form of feedback, leading to more effective learning and
teaching. Here the assumption is that tests can be used for beneficial and constructive
purposes, as long as we stay on guard against central bodies who try to use them in
unethical and undemocratic ways.
Giroux writes that democracy takes up the issue of

transferring power from elites and executive authorities who control the
economic and cultural apparatus of society to those producers who yield
power at the local level, and is made concrete through the organization and
exercise of horizontal power in which knowledge needs to be widely shared
through education and other technologies of culture (1995, p. 36).

If tests are to offer true pedagogical assessment, they must adhere to these
democratic principles. There is a need, therefore, for more democratic models of assessment
where the power of tests is transferred from elites and executive authorities and shared
with the local levels, test takers, teachers and students.
Some approaches to testing and assessment that are currently proposed follow such
principles of shared power and local representation. In these approaches local groups -
test takers, students, teachers and schools - share power by collecting their own
assessment evidence using multiple assessment procedures (portfolios, self assessment,
projects, observations, and tests).
In some models all the power is transferred from central bodies to local ones. Such is
the case where external examinations are abolished in favor of local and internal
assessment. Yet, such an approach is often criticized because power is simply transferred
to the teacher, who may also engage in undemocratic behavior in the classroom. Broadfoot
(1996) demonstrates how in some situations such approaches may yield an illusion of
democracy, as teachers become the new servants of the central systems, creating, in her
words, “a new order of domination” (p. 87).
Preferable, therefore, is a more democratic model, where power is not transferred
but rather shared with local bodies and is based on a broader representation of different
agents, central and local, who, together, go through a process of contextualization of the
evidence obtained from the different sources. In constructive, interpretive and dialogical
sessions each participant collects data relevant to the assessment and demonstrates it in an
interpretive and contextualized manner. This approach can then be applied in the
national, district or classroom context. Its description suggests that assessment of
students’ achievement ought to be seen as an art rather than a science, in that it is
interpretive, idiosyncratic, interpersonal and relative.
I have experimented with such a model in the language assessment of immigrant
students, where teachers collect data via tests and observations, students obtain evidence
through self assessment and portfolios, and a diagnostic test is administered by a central
body. The information is then gathered, processed and interpreted in an assessment
conference where the language teacher, the classroom teacher, the student, and even a
parent discuss and interpret the information, leading to meaningful recommendations for
strategies of language improvement. In this case the assessment follows democratic
principles and is also used for improving language proficiency.
The application of this type of model in the classroom is especially relevant as it
provides students with the experience of democratic assessment behavior from an early
age. Both students and teachers provide evidence that is indicative of language
performance, to be discussed and evaluated in a conference between teachers and
students. Experiencing such democratic practices in the classroom is especially valuable,
as children who grow up with them become adults who are aware of the need to guard
and protect their rights from the assessment machinery of central bodies.
Such methods are clearly time consuming and costly. However, democratic practices
are not chosen because of their efficiency or low cost. Rather, they are chosen because of
their principles. This is why central bodies and testing factories always try to resist them.
Tests have grown from being a means to an end into being an end in themselves, from
being an instrumental ideology to being an expressive ideology. It is therefore important
not to be naive and to realize that none of these suggested models offers a total solution.
There will always be those who attempt to retain their dominance by continuing to
use tests as part of the ongoing power struggle between individuals and groups, and by
using terms such as “standards”, “quality”, “indicators” and other symbols of social order.
For language testers, who now realize that the products they create may be misused
in these struggles, this poses a threat to the ethicality of the profession. Language testers
must therefore get engaged in the discussion. They must actively follow the uses and
consequences of language tests, and offer assessment models which are more educational,
democratic and ethical, in order to minimize misuses. In the same way that applied linguists
must face up to the fact that there is no neutral applied linguistics, language testers cannot
remove themselves from the consequences and uses of tests, and must therefore also reject
the notion of neutral language testing. Pretending it is neutral only allows those in power
to misuse language testing with the very instrument that language testers have provided
them. Language testers must realize that much of the strength of tests lies not in their
technical quality but in the way they are put to use in social and political contexts. Studies
of test use, as part of test validation on an ongoing basis, are essential for the integrity of
the profession. The unique trait of language testers (as distinguished from testers in general)
is their expertise in their subject matter, language learning. As such, we can become agents
capable of bridging language learning, language testing and language use.
A focus on the language testing policies of national educational systems should also be
of special concern to those working in the area of language policy. Researching the
language testing policies of countries and systems can provide a rich source for exposing
the de facto language policies of nations and systems, beyond stated intentions, nice words
and politically correct documents. The extent to which language testing policies actually
reflect language policies is an area that calls for research, which can provide a most useful
indication of the validity of language policies. Unfortunately, this source has so far
been overlooked in research on language policy.
As was shown in this article, language tests, like languages, provide a reflection, a
mirror, of the complexities and power struggles of society. As such they deserve to be
studied, protected and guarded as part of the process of preserving and perpetuating
democratic cultures, values and ethics, as well as quality learning. This is an important
challenge for language testers, applied linguists and policy researchers in the years to come.

Note

1. This article is based on a plenary talk the author gave at a meeting of the American Association
of Applied Linguistics (AAAL) held in Orlando, FL, in March 1997.

References

Bernstein, B. (1982). Codes, modalities and the process of cultural reproduction: A model. In
M. Apple (Ed.), Cultural and economic reproduction in education. London: Routledge and Kegan
Paul.

Bernstein, B. (1986). On pedagogical discourse. In J. Richardson (Ed.), Handbook of theory
and research for the sociology of education. New York: Greenwood Press.
344 Shohamy

Broadfoot, P. (1996). Education, assessment and society: A sociological analysis.
Buckingham: Open University Press.

Clinton, W. (1997, February 4). State of the Union Address.

Foucault, M. (1979). Discipline and punish. New York: Vintage.

Freire, P. (1985). The politics of education. Massachusetts: Bergin & Garvey.

Hanson, F.A. (1993). Testing testing: Social consequences of the examined life. Berkeley:
University of California Press.

Hawthorne, L. (1997). The political dimension of English language testing in Australia.
Language Testing, 14 (3), 248-260.

Giroux, H. (1995). Language, difference, and curriculum theory: Beyond the politics of clarity.
In P. McLaren & J. Giarelli (Eds.), Critical theory and educational research. Albany, NY: SUNY
Press.

Kramsch, C. (1993). Context and culture in language teaching. Oxford: Oxford University
Press.

Madaus, G. (1991). Current trends in testing in the USA. Paper presented at the conference
Testing and Evaluation: Feedback Strategies for Improvement of Foreign Language Learning, The
National Foreign Language Center, Washington, DC, 4-5 February.

Marisi, P. (1994). Questions of regionalism in native speakers’ OPI performance: The French
Canadian experience. Foreign Language Annals, 27 (4), 505-521.

Messick, S. (1981). Evidence and ethics in the evaluation of tests. Educational Researcher, 10,
9-20.

Messick, S. (1996). Validity and washback in language testing. Language Testing, 13,
241-257.

Noah, E., & Eckstein, M. (1992). Examinations in comparative and international studies.
Oxford: Pergamon.

Noam, G. (Moderator). (1996). Assessment at a crossroads: Conversation. Harvard Educational
Review, 66, 631-657.

MacIntyre, A. (1984). After virtue (2nd ed.). Notre Dame, IN: University of Notre Dame Press.

Pennycook, A. (1994). The cultural politics of English as an international language. New
York: Longman.

Shohamy, E. (1993). The power of tests: The impact of language tests on teaching and
learning. Washington, DC: The National Foreign Language Center at Johns Hopkins University.

Shohamy, E. (1994). The use of language tests for power and control. In J. Alatis, (Ed.),
Georgetown University round table on language and linguistics (pp. 57-72). Washington, DC:
Georgetown University Press.

Shohamy, E., Donitsa-Schmidt, S., & Ferman, I. (1996). Test impact revisited: Washback effect
over time. Language Testing, 13, 298-317.

Stansfield, C. (1997). Court interpreter certification tests: Problems and solutions. Paper
presented at the Second Language Acquisition Research and Second Language Testing Colloquium.
National Foreign Language Center, Washington, DC, February.

Steiner, J. (1995). Changes in the English bagrut exam. Jerusalem: Israel Ministry of
Education, Culture and Sport, English Inspectorate.

Tollefson, J. (1995). Introduction: Language policy, power, and inequality. In J. Tollefson
(Ed.), Power and inequality in language education. London: Cambridge University Press.

Weiss, C.H. (Ed.). (1977). Using social research in public policy making. Massachusetts:
Lexington.

The Author

ELANA SHOHAMY is a Professor and Chair of the language education program at the
School of Education, Tel Aviv University. Her research is mostly in the area of language
testing, focusing on oral testing, method effect, immigrant testing, alternative assessment,
test impact, and the political and social dimensions of tests. She is currently the
co-director of a language policy center developing and researching language policy in Israel.
