Shiken: JALT Testing & Evaluation SIG Newsletter. 13 (2) May 2009 (p. 9 - 14)
At the LTTC, I am currently heading the testing department, where I focus on test
validation, research and development for the General English Proficiency Test (GEPT) and the
other foreign language testing programs conducted by the center.
How has the language testing field changed since you first got involved? Are there any
trends that concern you?
In recent years, large-scale standardized EFL tests have been adopting a task-based
performance assessment approach to test development. An increasing number of large-scale
tests now include direct speaking and writing as compulsory components to measure
test-takers' English ability. This trend has introduced a number of improvements to language
testing in Taiwan. These include incorporating more communicative skills into the test content to
ensure that examinees can use English for communicative purposes, increasing the proportion of
constructed-response items, and developing new item formats to promote positive classroom washback.
Another trend is that many tests are now delivered via computer, and some even employ
computer-adaptive testing or web-based language technology based on IRT psychometric
models. Some advocates of computerized language testing suggest that it could replace all
existing language tests. However, I agree with Alan Davies's observation that computerized
language testing is a new genre that can coexist with older modalities but not replace them.
Recently, I've been thinking a lot about the societal implications of language testing. The
need to take account of social context has been discussed by Alan Davies, Bernard
Spolsky, Elana Shohamy, and Tim McNamara. I
agree that it is important to consider the social context and ethicality of test use, and these are
fundamental questions for language testers in the 21st century to reflect on. Such questions
focus on test use rather than test form.
Language testers involved in the development of high-stakes tests should recognize that
test results are powerful and should remain skeptical about the validity of their tests.
Therefore, we should collect evidence to support the reliability and validity of any test and,
further, to justify its use. Alan Davies put it well: testing is not the same as teaching. This
means that, as language testers, we cannot help or encourage learners directly the way
classroom teachers do, but we can collect the right evidence to help and encourage
If you had the power to change any one thing about language testing in your country, what
would it be?
Like China, Hong Kong, Japan, Korea, and India, Taiwan is an examination-oriented society
where examinations have long been used as tools to facilitate better teaching and
learning. Language tests can play a powerful role in influencing teaching and learning, as the
GEPT clearly shows. However, every coin has two sides. The more power a test has (that is, the
higher the stakes), the more likely the test is to be overused or misused. We've observed a
number of emerging negative consequences of the GEPT, and as its developer, we are
asking ourselves: How responsible is the test developer for the uses and misuses of a test? What
should the role of the test developer be once misuses are identified?
In the past, people tended to believe that it was not the tester's responsibility to worry
about test takers once a test had been handed over to its users. However, I think that testers and
stakeholders should share the responsibility to guard against test misuse. Better
communication between testers and stakeholders (teachers, researchers, test-takers, and score
users) is definitely needed. Test developers also need to disclose information about
test-takers that is relevant to educators and the decisions they have to make.
Testers and stakeholders should work collaboratively to maximize the beneficial
consequences of a test and to minimize its unintended consequences.
Another thing that I'd like to see change about language testing in Taiwan is the
development of a code of ethics. I hope that a code of standards for the profession of language
testing, similar to the ILTA Code of Ethics, can be developed in my country. A code of ethics is
important because it would show the members of our profession what the standards
are. It would serve not only as a reminder of what members of the profession should expect
of themselves and of one another, but also as a way to demonstrate these standards to others.
The GEPT, which nearly 14% of Taiwanese people have taken, has been widely used for
various purposes such as university admissions, academic placement, graduation criteria,
hiring, and promotions. As a person who helped develop that test, what do you feel it was
designed to accomplish?
You're right that the GEPT has become a household name in Taiwan in both educational
and professional circles. To date, a total of 3.3 million Taiwanese EFL learners have registered
for the test since its launch in 2000. As someone who has been involved in its development,
I'm proud to witness its success. The GEPT started as an in-house research project at the LTTC
in 1997. Aspiring to develop a public language test that could induce beneficial washback for
EFL classes in Taiwan, the LTTC invited a number of well-established EFL educators from
different parts of the country to form the GEPT Advisory Board and the GEPT Research
Committee. Two years later, Taiwan's Ministry of Education recognized that these efforts were
in accord with its promotion of lifelong learning and therefore decided to sponsor the GEPT
project. Without the support of the GEPT Advisory Board, the GEPT Research Committee, and
the government, the GEPT could never have come to fruition in such a short period.
The GEPT is a five-level criterion-referenced EFL testing system. It was developed in
response to comments by educators and by employers from various industries about a
general lack of ability to communicate in English, resulting from old-fashioned approaches to English
education in Taiwan that overemphasized grammatical accuracy. In
other words, it is hoped that the GEPT can assess not only learners' knowledge of English but
also their ability to use English in real-life situations. Therefore, each level of the GEPT consists
of listening, reading, writing, and speaking tasks. That was considered a rather revolutionary
move in comparison with Taiwan's paper-and-pencil high school and university entrance
exams, which do not assess listening and speaking skills.
Before the GEPT was available, Taiwan's EFL educators thought it would be impossible to
administer listening and speaking tests on a large scale. However, the GEPT has proved those
concerns to be unfounded. Now, not only has language assessment become a topic of wide
discussion in Taiwan, but the GEPT has also brought about positive washback effects. The
most significant effect is that the productive skills of writing and speaking are receiving more
attention from teachers and learners, as reported in an impact study (Wu & Chin, 2006) and
by many students and teachers of English in high schools and universities (Wu, 2008). It's
worth noting that the GEPT has successfully promoted a shift in English teaching and
learning toward a more communicative orientation. Such an influence can be attributed to
successful interactions between the GEPT and teachers. More broadly, we can see a valuable
reciprocal relationship between teaching and testing, which is exactly what the GEPT project
has aimed to accomplish.
What research projects are you working on now - and which do you hope to become
involved in?
I'm currently working on the GEPT Revision Project and may continue to work on that
project for some years. The primary concern of any language test revision process should be to
ensure that the test reflects real-life language use contexts as closely as possible and results in
favorable learning outcomes. Although the GEPT has had some positive effects on the
teaching and learning of English, there's always room to improve its quality. Based on
the productive dialogue between the GEPT and teaching professionals, directions for
revising the test have been identified. Let me
cite two examples. First, based on score data and the opinions of local teachers, it was
proposed that mini-talk tasks be added to the Elementary Level Listening Test. Second,
it was proposed that longer reading passages with a greater variety of genres be
used in the High-Intermediate Reading Test.
Guided by the present LTTC executive director, Prof. Kao Tien-en, and academic advisor,
Prof. Lin Yaofu, our institute has established a comprehensive research agenda focusing on
areas such as validation, reliability, bias reduction, access and accommodations,
administration and security, and social consequences (as suggested by Kunnan, 2000, 2004,
2005, 2008). This research agenda can help support the claims made about all LTTC tests with
sufficient evidence and convincing argumentation (Bachman, 2005; Bachman & Palmer,
forthcoming). I look forward to taking part in some of these research projects.
