High-stakes testing

A driving test is a high-stakes test: Without passing the test, the test taker cannot obtain a driver's license.

A high-stakes test is a test with important consequences for the test taker.^[1] Passing has important benefits, such as a high school diploma, a scholarship, or a license to practice a profession. Failing has important disadvantages, such as being forced to take remedial classes until the test can be passed, not being allowed to drive a car, or difficulty finding employment.

The use and misuse of high-stakes tests are controversial topics in public education, especially in the United States and the U.K., where such tests have become popular in recent years and are used not only to assess school-age students but also in attempts to increase teacher accountability.^[2]

Definitions

Five dancers leap on a stage, each wearing a number on her leotard. This dance audition is a high-stakes test. If the dancers don't pass the test, they won't get a part in the upcoming show.

In common usage, a high-stakes test is any test that has major consequences or is the basis of a major decision.^[1]^[3]^[4]

Under a more precise definition, a high-stakes test is any test that:

is a single, defined assessment,
has a clear line drawn between those who pass and those who fail, and
has direct consequences for passing or failing (something "at stake").^[5]

For example, exit examinations for high school graduation are often high-stakes tests: there is a single, defined test (the student must pass this test; no other test can be substituted); some scores are high enough to pass and others are not; and failing has the direct consequence of preventing graduation. Similarly, driving tests are often high-stakes, as they meet the same three criteria.

High-stakes testing is not synonymous with high-pressure testing. An American high school student might feel pressure to perform well on the SAT-I college aptitude exam. However, SAT scores do not directly determine admission to any college or university, and there is no clear line drawn between those who pass and those who fail, so it is not formally considered a high-stakes test.^[6]^[7] On the other hand, because the SAT-I scores are given significant weight in the admissions process at some schools, many people believe that it has consequences for doing well or poorly, and it could therefore be considered a high-stakes test under the simpler, common definition.^[8]^[9]

A high-stakes test can be contrasted with a medium-stakes test or a low-stakes test.^[7] A medium-stakes test might provide access to a desirable but less necessary benefit, such as an award, or might be only one component of a decision-making process, such as an admissions program that looks at the test results plus other factors. A low-stakes test has no significant consequences for the test taker.

The stakes

High stakes are not a characteristic of the test itself, but rather of the consequences placed on the outcome. For example, no matter what type of test is used—written essays, computer-based multiple choice, oral examination, performance test, or anything else—a medical licensing test must be passed to practice medicine.

The perception of the stakes may vary. For example, college students who wish to skip an introductory-level course are often given exams to see whether they have already mastered the material and can be passed to the next level. Passing the exam can reduce tuition costs and time spent at university. A student who is anxious to have these benefits may consider the test to be a high-stakes exam. Another student, who places no importance on the outcome, so long as he is placed in a class that is appropriate to his skill level, may consider the same exam to be a low-stakes test.^[5]

The phrase "high stakes" is derived directly from a gambling term. In gambling, a stake is the quantity of money or other goods that is risked on the outcome of some specific event. A high-stakes game is one in which, in the player's personal opinion, a large quantity of money is being risked. The term is meant to imply that implementing such a system introduces uncertainty and potential losses for test takers, who must pass the exam to "win," instead of being able to obtain the goal through other means.

Examples

Examples of high-stakes tests and their "stakes" include:

Driver's license tests and the legal ability to drive
College entrance examinations in some countries, such as Brazil's National High School Exam, and admission to a high-quality university
Visa interview/Citizenship test for migration and naturalization purposes
Many job interviews or drug tests and being hired
High school exit examinations and high-school diplomas
No Child Left Behind tests and school funding and ratings
Ph.D. oral exams and receiving the doctorate
Professional licensing and certification examinations (such as the bar exams, FAA written tests, and medical exams) and the license or certification being sought
Standardized tests of language proficiency in work, school-placement, and visa-application contexts
NCLEX-RN or NCLEX-PN exam for nursing students

Stakeholders

A high-stakes system may be intended to benefit people other than the test-taker. For professional certification and licensure examinations, the purpose of the test is to protect the general public from incompetent practitioners. The individual stakes of the medical student and the medical school are, hopefully, balanced against the social stakes of possibly allowing an incompetent doctor to practice medicine.^[10]

A test may be "high-stakes" based on consequences for others beyond the individual test-taker.^[4] For example, an individual medical student who fails a licensing exam cannot practice his or her profession. However, if enough students at the same school fail the exam, the school's reputation and accreditation may be jeopardized. Similarly, testing under the U.S. No Child Left Behind Act had no direct negative consequences for failing students,^[11] but potentially serious consequences for their schools, including loss of accreditation, funding, teacher pay, teacher employment, or changes to the school's management.^[12] The stakes were therefore high for the school, but low for the individual test-takers.

Assessments used

Any form of assessment can be used as a high-stakes test. Often, an inexpensive multiple-choice test is chosen for convenience. A high-stakes assessment may also involve answering open-ended questions or a practical, hands-on section. For example, a typical high-stakes licensing exam for a medical nurse determines whether the nurse can insert an I.V. line by watching the nurse actually do this task. Such hands-on assessments are called authentic assessments or performance tests.^[5]

Some high-stakes tests may be standardized tests (in which all examinees take the same test under reasonably equal conditions), with the expectation that standardization affords all examinees a fair and equal opportunity to pass.^[5] Some high-stakes tests are non-standardized, such as a theater audition.

As with other tests, high-stakes tests may be criterion-referenced or norm-referenced.^[5] For example, a written driver's license examination typically is criterion-referenced, with an unlimited number of potential drivers able to pass if they correctly answer a certain percentage of questions. On the other hand, essay portions of some bar exams are often norm-referenced, with the worst essays failed and the best essays passed, without regard for the overall quality of the essays.
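The difference between the two approaches can be illustrated with a minimal sketch. The function names, the sample scores, the pass mark of 80, and the rule of passing the top half are all hypothetical, chosen only to show the contrast; real examinations use more elaborate procedures, and ties are ignored here.

```python
def criterion_referenced(scores, pass_mark):
    """Pass everyone whose score meets a fixed standard (hypothetical rule)."""
    return {name: score >= pass_mark for name, score in scores.items()}


def norm_referenced(scores, pass_fraction):
    """Pass only the top fraction of examinees, whatever their absolute scores.

    Ties at the boundary are not handled in this sketch.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_pass = int(len(ranked) * pass_fraction)
    passed = set(ranked[:n_pass])
    return {name: name in passed for name in scores}


# Hypothetical examinees who all performed well in absolute terms.
scores = {"Ana": 92, "Ben": 88, "Cy": 85, "Dee": 81}

print(criterion_referenced(scores, pass_mark=80))  # every examinee passes
print(norm_referenced(scores, pass_fraction=0.5))  # only Ana and Ben pass
```

Under the criterion-referenced rule, any number of examinees can pass. Under the norm-referenced rule, the lower half fails even though every score is high.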

The "clear line" between passing and failing on an exam may be achieved through use of a cut score: for example, test takers correctly answering 75% or more of the questions pass the test, while those answering fewer than 75% fail, or don't "make the cut". In large-scale high-stakes testing, rigorous and expensive standard-setting studies may be employed to determine the ideal cut score or to keep the test results consistent between groups taking the test at different times.
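As a rough illustration, a cut score can be applied as a simple comparison. The 75% figure comes from the example above; the function name `passes` and its parameters are hypothetical. The comparison is done with integers so that a share such as 74.5% is counted as below the cut rather than rounded up, which is an assumption of this sketch and not a rule stated by any particular testing body.

```python
def passes(correct, total, cut_percent=75):
    """Return True if the share of correct answers meets the cut score.

    Integer arithmetic avoids rounding a result such as 74.5% up to 75%.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return correct * 100 >= cut_percent * total


print(passes(30, 40))    # exactly 75% of the questions: True
print(passes(149, 200))  # 74.5% of the questions: False
```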

Criticisms

Despite their extensive use in determining academic and non-academic proficiency, high-stakes tests are criticized for various reasons. Example concerns include the following:

The test does not correctly measure the individual's knowledge or skills. For example, a test might purport to be a general reading-skills test, but it might actually determine whether or not the examinee has read a specific book. In the context of computer-based high-stakes tests, low-income test takers and others without ready access to computers may be disadvantaged,^[13] if the test is supposed to measure reading skills but in practice measures the test takers' typing skills or their familiarity with answering questions on a computer.
The test may not measure what the critic wants measured. For example, a test might accurately measure whether a law student has acquired fundamental knowledge of the legal system, but the critic might want these would-be lawyers to be tested on legal ethics instead of legal knowledge.
High-stakes testing may encourage teachers to omit material that is not tested. "Teaching to the test" can result in a narrow curriculum and lower skills. For example, if a driving exam does not test parallel parking skills, then driving instructors might stop teaching that skill to a driving student, in favor of focusing instruction time on the material that will be tested, such as determining which vehicle has the right of way at a four-way stop. The result is that the student will be able to pass the test, but may be unable to park a car safely in some places. According to Campbell's law, the higher the stakes are (for the test taker or for the school), the more likely this is to happen.
Testing causes stress for some people. Critics suggest that since some people perform poorly under the pressure associated with tests, any test is likely to be less representative of their actual standard of achievement than a non-test alternative.^[14] This is called test anxiety or performance anxiety.
High-stakes tests are often given as a single long exam. Some critics prefer continuous assessment instead of one larger test. For example, the American Psychological Association (APA) opposes using a one-time high school exit examination as the single determinant of whether a student should graduate from high school, saying, "Any decision about a student's continued education, such as retention, tracking, or graduation, should not be based on the results of a single test, but should include other relevant and valid information."^[15] Since the stakes are related to consequences, not method, however, short tests can also be high-stakes.
High-stakes testing creates more incentive for cheating.^[16] Because cheating on a single critical exam may be easier than either learning the required material or earning credit through attendance, diligence, or many smaller tests, more examinees who do not actually have the necessary knowledge or skills, but who are effective cheaters, may pass. Also, some people who would otherwise pass the test but lack confidence might try to secure the outcome by cheating, get caught, and then face consequences even worse than those of failing. Additionally, if the test results are used to determine the teachers' pay or continued employment, or to evaluate the school, then school personnel may fraudulently alter student test papers to artificially inflate performance.^[16]
Sometimes a high-stakes test is tied to a controversial reward. For example, some people may want a high-school diploma to represent the verified acquisition of specific skills or knowledge, and therefore use a high-stakes assessment to deny a diploma to anyone who cannot perform the necessary skills.^[17] Others may want a high school diploma to represent primarily a certificate of attendance, so that a person who faithfully attended class but cannot read or write will still get the social benefits of graduation. This use of tests (to deny a high school diploma, and thereby access to most jobs and higher education for a lifetime) is controversial even when the test itself accurately identifies students who do not have the necessary skills. Criticism is usually framed as over-reliance on a single measurement^[18] or in terms of social justice, if the absence of skill is not entirely the test taker's fault, as in the case of a student who cannot read because of unqualified teachers, or a person with advanced dementia who can no longer pass a driving exam due to loss of cognitive function.^[3]
Tests can penalize test takers who do not have the necessary skills through no fault of their own. An absence of skill may not be the test taker's fault, but high-stakes tests measure only skill proficiency, regardless of whether the test takers had an equal opportunity to learn the material.^[3]^[19]^[20] Additionally, wealthy test takers may use private tutoring or test preparation programs to improve their scores. Some affluent parents pay thousands of dollars to prepare their children for university admissions tests.^[21] Critics see this as being unfair to families who cannot afford to pay for additional educational services.^[22]
High-stakes tests reveal that some examinees do not know the required material, or do not have the necessary skills. While failing these people may have many public benefits, the consequences of repeated failure can be very high for the individual. For example, a person who fails a practical driving exam will not be able to drive a car legally, which means they cannot drive to work and may lose their job if alternative transportation options are not available. The person may suffer social embarrassment when acquaintances discover that a lack of skill resulted in the loss of the driver's license. In the context of high school exit exams, poorly performing school districts have formally opposed high-stakes testing after low test results, which accurately and publicly exposed the districts' failures, proved to be politically embarrassing;^[23] these districts have also criticized high-stakes tests for correctly identifying students who lack the required knowledge.^[24]
Sometimes high-stakes testing is used on young children. Testing often starts as early as third grade, when children may be unable to properly allocate mental resources needed to succeed. If they fail, they may be assigned additional schooling, which can be internalized as a punishment.^[25]
Low test scores are often treated as synonymous with good tests.^[26] There can be a bias to assume that for a high-stakes test to be valid, test results must be poor. Conversely, tests on which students generally perform well are often dismissed as too easy, even if they are well aligned to standards. This bias can also encourage the creation of assessments whose quality is judged by the failure rate of students rather than by alignment to standards.

Advantages

Despite these criticisms, high-stakes testing retains some advantages:

Scores and score trends from high-stakes tests tend to be more reliable than those from low- or no-stakes tests because they are more likely to be administered securely and taken seriously by test-takers.^[27]^[28]^[29]^[30]

Lax security pervades the administration of no-stakes tests—tests that "don't count." Indeed, all but one of the tests involved in the famous "Lake Wobegon Effect" school testing scandal of the 1980s had no stakes for students, teachers, or schools. In many cases, schools could administer the tests at their own discretion, with teachers proctoring their own students or no proctors at all. With state and local education administrators free to direct most aspects of the tests' administration, scoring, and reporting, they could artificially inflate scores and score trends such that the students in all US states were "above the national average."^[31]

High-stakes tests are also more likely to be administered externally (by independent persons without a conflict of interest) and securely. Whereas high-stakes testing may create more incentive for cheating, low- or no-stakes testing can create more opportunity for cheating because it is typically administered internally (e.g., in students' schools by their own teachers) with less security.^[32]^[33]^[34]

Adding stakes to a test has a generally positive impact on student achievement, suggesting greater motivation and effort.^[35]

References

[1] "Lexicon of Learning". Association for Supervision and Curriculum Development. Archived from the original on 2018-10-17. Retrieved 2013-02-21.
[2] Rosemary Sutton; Kelvin Seifert (2009). "Chapter 1: The Changing Teaching Profession and You". Educational Psychology (PDF) (2nd ed.). p. 14.
[3] Togut, Torin D. "High-Stakes Testing: Educational Barometer for Success, or False Prognosticator for Failure". The Beacon. No. Fall 2004. Harbor House Law Press.
[4] Torin D. Togut. "EDEX 790 Glossary of Education Terms". Archived from the original on January 11, 2009. Retrieved July 23, 2009.
[5] "The nature of assessment: A guide to standardized testing". Center for Public Education. Archived from the original on July 25, 2011. Retrieved July 23, 2009.
[6] Pfeiffer, Steven I (Winter 2009). "The Debate about Using the SAT in College Admissions". Duke University Talent Identification Program. Archived from the original on 2009-10-14. Gaston Caperton, president of the College Board, which publishes the SAT, counters that the SAT I is "not a high-stakes test" but is a useful admissions tool when considered along with other evidence of a student's potential for college success.
[7] Phelps, Richard P. (June 2010). "Source of Lake Wobegon" (PDF). Nonpartisan Education Review. Retrieved 2020-10-18.
[8] Mari Pearlman (April 4, 2001). "High-stakes Testing: Perils & Opportunities". Archived from the original on 2009-09-25. Retrieved July 23, 2009.
[9] Eddy Ramírez (30 April 2008). "Admissions Officials Shrug at SAT Writing Test". Retrieved 24 July 2009.
[10] Mehrens, W.A. (1995). "Legal and Professional Bases for Licensure Testing". In Impara, J.C. (Ed.) Licensure testing: Purposes, procedures, and practices, pp. 33-58. Lincoln, NE: Buros Institute.
[11] "NCLB has nothing to do with the high-stakes nature of the test for students". Archived from the original on 2012-12-13.
[12] Greene, Jay P.; Marcus A. Winters; Greg Forster (February 2003). "Testing High Stakes Tests: Can We Believe the Results of Accountability Tests?". Civic Report. Manhattan Institute for Policy Research.
[13] File, Thom; Ryan, Camille (November 2014). "Computer and Internet Use in the United States: 2013" (PDF). census.gov.
[14] Zuriff GE (1997). "Accommodations for test anxiety under ADA?". J. Am. Acad. Psychiatry Law. 25 (2): 197–206. PMID 9213292.
[15] "Appropriate Use of High-Stakes Testing in Our Nation's Schools". American Psychological Association. Retrieved 2008-01-09.
[16] Jacob, Brian A. and Steven D. Levitt (Winter 2004). "To Catch a Cheat" (PDF). Education Next.
[17] "Figure 1-10: Employee/faculty support for high stakes testing: 2000". Archived from the original on 2008-02-07. Retrieved 2008-02-06.
[18] Lewis, Anne (April 2000). High-stakes testing: Trends and issues (PDF) (Report). Mid-Continent Research for Education and Learning. Archived from the original (PDF) on 2011-07-27.
[19] Myers, David (2001). Psychology. New York: Worth Publishers. p. 464. ISBN 1-57259-791-7. Why blame the tests for exposing unequal experiences and opportunities?
[20] Dang, Nick (18 March 2003). "Reform education, not exit exams". Daily Bruin. One common complaint from failed test-takers is that they weren't taught the tested material in school. Here, inadequate schooling, not the test, is at fault. Blaming the test for one's failure is like blaming the service station for a failed smog check; it ignores the underlying problems within the 'schooling vehicle.'
[21] "Tackling the SAT? Test-prep help abounds". Christian Science Monitor. Vol. 90, no. 175. Associated Press. August 4, 1998. pp. B3. ISSN 0882-7729. Retrieved 2007-07-09. Some parents spend thousands of dollars for private sessions...
[22] Johnson, Dale, Bonnie Johnson, Stephen J. Farenga, & Daniel Ness. (2008). Stop High-Stakes Testing: An Appeal to America's Conscience. Lanham, MD: Rowman & Littlefield.
[23] Weinkopf, Chris (2002). "Blame the test: LAUSD denies responsibility for low scores". Daily News. Archived from the original on 2017-02-02. Retrieved 2009-09-17. The blame belongs to 'high-stakes tests' like the Stanford 9 and California's High School Exit Exam. Reliance on such tests, the board grumbles, 'unfairly penalizes students that have not been provided with the academic tools to perform to their highest potential on these tests'.
[24] "Blaming The Test". Investor's Business Daily. 11 May 2006. A judge in California is set to strike down that state's high school exit exam. Why? Because it's working. It's telling students they need to learn more. We call that useful information. To the plaintiffs who are suing to stop the use of the test as a graduation requirement, it's something else: Evidence of unequal treatment ... the exit exam was deemed unfair because too many students who failed the test had too few credentialed teachers. Well, maybe they did, but granting them a diploma when they lack the required knowledge only compounds the injustice by leaving them with a worthless piece of paper.
[25] Kozol, Jonathan (2005). The Shame of the Nation. New York: Crown Publishers. p. 53. ISBN 978-1-4000-5245-5.
[26] Kohn, A. (1999) Confusing Harder with Better. Retrieved on 1/26/21 from https://www.alfiekohn.org/article/confusing-harder-better/
[27] Eklöf, Hanna (2007). "Test-taking motivation and mathematics performance in TIMSS". International Journal of Testing. 7 (3): 311–326. doi:10.1080/15305050701438074. S2CID 144686714.
[28] Finn B. (2015). Measuring motivation in low-stakes assessments (Research Report RR-15-19). Educational Testing Service.
[29] Hawthorne, K.A.; Bol, L.; Pribesh, S.; Suh, Y. (2015). "Test-taking motivation and mathematics performance in TIMSS". Research and Practice in Assessment. 10: 30–38.
[30] Wise, SL; DeMars, CE (2010). "Examinee noneffort and the validity of program assessment results". Educational Assessment. 15: 27–41. doi:10.1080/10627191003673216. S2CID 143794026.
[31] "The Lake Wobegon Effect: Twenty Years Later". Nonpartisan Education Review.
[32] Cizek, G.J. (1999). Cheating on Tests: How To Do It, Detect It, and Prevent It. Routledge. doi:10.4324/9781410601520. ISBN 9781410601520.
[33] Steger, D.; Schroeders, U.; Gnambs, T. (2018). "A Meta-Analysis of Test Scores in Proctored and Unproctored Ability Assessments". European Journal of Psychological Assessment. 36: 1–11. doi:10.1027/1015-5759/a000494. S2CID 149485786.
[34] U.S. Government Accountability Office (2013). K-12 Education: States' Test Security Policies and Procedures Varied (Report).
[35] Phelps, R. P. (2019). "Test Frequency, Stakes, and Feedback in Student Achievement: A Meta-Analysis". Evaluation Review. 43 (3–4): 111–151. doi:10.1177/0193841X19865628. PMID 31382776. S2CID 199449477.