
ANGELIKA BELARIO

III-BEED-12 
USES OF THE TEST SCORES
GROUP 5 
6TH REPORTER 

Use of testing for measuring knowledge, comprehension, and other thinking skills, guided by the following:

1. USES OF THE TEST SCORES

What Do Test Scores Really Mean?


Test scores weigh heavily in many admissions decisions. While published college
rankings provide average freshman class scores for individual schools, schools use the
scores they receive in a wide variety of ways. As we have noted, arguments about many
of these uses have landed in courtrooms across the nation and will likely soon be heard
by the Supreme Court. The tests that provide these scores, however, are complicated
instruments with specific purposes, and their technical characteristics are not as well
understood as they should be, given their important role.
Perhaps the most pervasive misconception about both the SAT and the ACT is that
they are precisely calibrated scientific measures (akin to scales or thermometers) of
something immutable: ability. A score on either test is, in the eyes of many people, a
statement of the individual's intellectual capacity as pitiless and mutely accurate as the
numbers denoting his or her height or weight. Although most students who take an
admissions test more than once know that scores fluctuate, even very small score
differences will seem significant if the measure is regarded as very precise. One
consequence of the misconception is that it contributes to misunderstandings in volatile
discussions of fairness. Test scores are often used as key evidence in support of claims
that an admissions decision was unfair: that is, if student X's score was higher than that of
student Y, admitting student Y but not student X was unfair. This argument rests on two
important assumptions that deserve examination: that the test measures the criterion that
should bear the greatest weight in an admissions decision and that the score is a precise
measure of this criterion.
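To make this imprecision concrete, the following Python sketch turns a single reported score into a score band using the standard error of measurement (SEM). The standard deviation and reliability values are hypothetical, chosen only for illustration; they are not the published statistics of either test.

# Illustrative sketch (hypothetical values, not official SAT/ACT statistics):
# the standard error of measurement turns a single reported score into a band,
# so small score differences between applicants may not be meaningful.
import math

def score_band(observed_score, sd, reliability, z=1.96):
    """Return an approximate 95% band around an observed score.
    SEM = SD * sqrt(1 - reliability); the band is score +/- z * SEM."""
    sem = sd * math.sqrt(1 - reliability)
    return observed_score - z * sem, observed_score + z * sem

low_x, high_x = score_band(640, sd=100, reliability=0.90)
low_y, high_y = score_band(620, sd=100, reliability=0.90)
print(f"Student X: {low_x:.0f}-{high_x:.0f}")  # roughly 578-702
print(f"Student Y: {low_y:.0f}-{high_y:.0f}")  # roughly 558-682
# The two bands overlap heavily, so the 20-point gap is weak evidence that X "outscored" Y.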
To evaluate these assumptions, it is necessary to begin with a closer look at the content of
the tests and at the available evidence regarding their statistical properties. The first step
is to recognize the important differences between the tests—although many of the
individual items on the two tests may look quite similar, the scores represent different
approaches to the task of predicting academic success.
THE SAT
The SAT was developed as a means of identifying the likelihood that students with a wide range of academic preparation could successfully do college-level work. It was designed to measure verbal and mathematical reasoning by means of multiple-choice questions. (The mathematics section also includes some machine-scorable items in which the students generate answers and record them on a grid.) In its current form, the test devotes 75 minutes to the verbal section and 60 minutes to the mathematics section. The verbal questions are of three kinds (descriptions from College Board materials quoted in Jaeger and Wightman, 1998:32):

• analogy questions, which assess "knowledge of the meaning of words, ability to see a relationship in a pair of words, and ability to recognize a similar or parallel relationship";
• sentence completion questions, which assess "knowledge of the meaning of words" and "ability to understand how the different parts of a sentence fit logically together"; and
• critical reading questions, which assess "ability to read and think carefully about several different reading passages."

The mathematics section also has several question or item types, all of which contribute
to the goal of assessing "how well students understand mathematics, how well they can
apply what is known to new situations, and how well they can use what they know to
solve nonroutine problems" (Wightman and Jaeger, 1998:34). Each of the sections
generates a score on a scale of 200 to 800; thus, the combined scores range from 400 to
1600. No subscores are calculated. Because of the procedures used to ensure that scores from different administrations of the test can be compared, it is actually possible to score 800 without answering all of the questions correctly.
The fact that neither section is intended to draw on specific knowledge of course
content is the foundation for the claim that the test provides an equal opportunity for
students from any school to demonstrate their abilities. Reading passages, for example,
include contextual information about the material, and all questions are meant to be
answerable without "outside knowledge" of the content. Supporters argue that the test
thus ameliorates disparities in school quality. Others have criticized it for precisely this
reason, arguing that a test that is independent of curriculum sends the message to students
that effort and achievement are less significant than "innate" ability.
THE ACT
The ACT, first administered in 1959, has a different design. First, there are more parts to
it. In addition to multiple-choice tests of "educational development," which are the basis
for the score, students also complete two questionnaires that cover the courses they have
taken; their grades, activities, and the like; and a standardized interest inventory.
The test battery has four parts:

• a 45-minute, 75-item English test that yields subscores (that is, scores on a portion of the domain covered by a subset of the test questions) in usage/mechanics and rhetorical skills, as well as an overall score;
• a 60-minute, 60-item mathematics test that yields an overall score and three subscores, in pre-algebra and elementary algebra, intermediate algebra and coordinate geometry, and plane geometry and trigonometry;
• a 35-minute, 40-item reading test that yields an overall score and two subscores, one in arts and literature and one in social sciences and science; and
• a 35-minute, 40-item science reasoning test that yields only a total score. It addresses content "likely to be found in a high school general science course," drawn from biology, chemistry, physics, geology, astronomy, and meteorology.

Each of the four tests is scored on a scale from 1 to 36 (subscores within the tests are on a
1 to 18 scale); the four scores are combined into a composite score on the 1 to 36 scale.
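The ACT composite is commonly described as the average of the four test scores rounded to the nearest whole number; the short Python sketch below assumes that rule purely for illustration.

# Sketch of an ACT-style composite, assuming (for illustration) that it is the
# average of the four test scores rounded to the nearest whole number.
def composite(english, mathematics, reading, science):
    total = english + mathematics + reading + science
    return (total + 2) // 4  # rounded mean, with halves rounding up

print(composite(24, 27, 25, 26))  # mean 25.5 -> composite 26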
There are two types of test scores: raw scores and scaled scores. A raw score is a
score without any sort of adjustment or transformation, such as the simple number of
questions answered correctly. A scaled score is the result of some transformation(s)
applied to the raw score.
The purpose of scaled scores is to report scores for all examinees on a consistent
scale. Suppose that a test has two forms, and one is more difficult than the other. It has
been determined by equating that a score of 65% on form 1 is equivalent to a score of
68% on form 2. Scores on both forms can be converted to a scale so that these two
equivalent scores have the same reported scores. For example, they could both be a
score of 350 on a scale of 100 to 500.
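A minimal Python sketch of this idea follows, using the hypothetical conversion values from the paragraph above (65% on form 1 and 68% on form 2 both mapping to a scaled score of 350); operational programs use full conversion tables derived from equating studies.

# Hypothetical raw-to-scaled conversion tables for two forms of a test.
# In practice these tables come from an equating study; the numbers here
# simply mirror the example in the text (100-500 reporting scale).
FORM_1 = {60: 330, 65: 350, 70: 370}  # raw percent -> scaled score
FORM_2 = {63: 330, 68: 350, 73: 370}

def scaled_score(form_table, raw_percent):
    return form_table[raw_percent]

# Equivalent performances on forms of different difficulty get the same reported score.
print(scaled_score(FORM_1, 65))  # 350
print(scaled_score(FORM_2, 68))  # 350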
Two well-known tests in the United States that have scaled scores are the ACT and the
SAT. The ACT's scale ranges from 1 to 36 and the SAT's from 200 to 800 (per section). Ostensibly, these two scales were selected to represent a mean and standard deviation of 18 and 6 (ACT) and of 500 and 100 (SAT). The upper and lower bounds were selected because an interval of plus or minus three standard deviations contains more than 99% of a population. Scores outside that range are difficult to measure and provide little practical value.
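As a rough sketch of how such a scale can be constructed, the Python snippet below maps a standardized (z) score onto a reporting scale with a chosen mean and standard deviation and clips it at the scale's bounds; the mapping is illustrative only, not the operational SAT or ACT scaling procedure.

def to_scale(z, mean, sd, lower, upper):
    """Map a standardized score z onto a reporting scale and clip to its bounds."""
    scaled = round(mean + sd * z)
    return max(lower, min(upper, scaled))

# Illustrative: a performance one standard deviation above the mean.
print(to_scale(1.0, mean=500, sd=100, lower=200, upper=800))  # 600 (SAT-style section scale)
print(to_scale(1.0, mean=18, sd=6, lower=1, upper=36))        # 24 (ACT-style scale)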
Note that scaling does not affect the psychometric properties of a test; it is something
that occurs after the assessment process (and equating, if present) is completed.
Therefore, it is not an issue of psychometrics, per se, but an issue of interpretability.
Scoring information loss
A test question might require a student to calculate the area of a triangle. Compare the
information provided in these two answers.

Answer 1:
Area = 7.5 cm²

Answer 2:
Base = 5 cm; Height = 3 cm
Area = 1/2 (Base × Height)
     = 1/2 (5 cm × 3 cm)
     = 7.5 cm²

When tests are scored right-wrong, an important assumption has been made about
learning. The number of right answers or the sum of item scores (where partial credit is
given) is assumed to be the appropriate and sufficient measure of current performance
status. In addition, a secondary assumption is made that there is no meaningful
information in the wrong answers.
In the first place, a correct answer can be achieved using memorization without any
profound understanding of the underlying content or conceptual structure of the problem
posed. Second, when more than one step for solution is required, there are often a
variety of approaches to answering that will lead to a correct result. The fact that the answer is correct does not indicate which of the several possible procedures was used. When the student supplies the answer (or shows the work), this information is readily available from the original documents.
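To illustrate the point, the hypothetical Python scorer below contrasts right/wrong scoring of the triangle item with a step-aware scorer that retains the information in the student's shown work; the rubric, weights, and field names are invented for this example.

# Hypothetical contrast between right/wrong scoring and a step-aware scorer
# for the triangle-area item (base 5 cm, height 3 cm, correct area 7.5 cm^2).
def score_right_wrong(final_answer):
    return 1 if abs(final_answer - 7.5) < 1e-9 else 0

def score_with_work(work):
    """work: dict with 'base', 'height', 'used_half_formula', 'final_answer'."""
    report = {
        "identified_dimensions": work.get("base") == 5 and work.get("height") == 3,
        "applied_half_formula": bool(work.get("used_half_formula")),
        "correct_result": abs(work.get("final_answer", 0.0) - 7.5) < 1e-9,
    }
    report["partial_credit"] = sum(report.values()) / 3
    return report

# A student who forgot the 1/2 scores 0 under right/wrong scoring, but the
# step-aware record still shows exactly which step went astray.
attempt = {"base": 5, "height": 3, "used_half_formula": False, "final_answer": 15.0}
print(score_right_wrong(attempt["final_answer"]))  # 0
print(score_with_work(attempt))                    # flags the missing 1/2 step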
Turning to the secondary assumption: if the wrong answers were blind guesses, there would be no information to be
found among these answers. On the other hand, if wrong answers reflect interpretation
departures from the expected one, these answers should show an ordered relationship
to whatever the overall test is measuring. This departure should be dependent upon the
level of psycholinguistic maturity of the student choosing or giving the answer in the
vernacular in which the test is written.
In this second case it should be possible to extract this order from the responses to the
test items.[3] Such extraction processes, the Rasch model for instance, are standard
practice for item development among professionals. However, because
the wrong answers are discarded during the scoring process, analysis of these answers
for the information they might contain is seldom undertaken.
Third, although topic-based subtest scores are sometimes provided, the more common
practice is to report the total score or a rescaled version of it. This rescaling is intended
to compare these scores to a standard of some sort. This further collapse of the test
results systematically removes all the information about which particular items were
missed.
Thus, scoring a test right–wrong loses 1) how students achieved their correct answers,
2) what led them astray towards unacceptable answers and 3) where within the body of
the test this departure from expectation occurred.
This commentary suggests that the current scoring procedure conceals the dynamics of
the test-taking process and obscures the capabilities of the students being assessed.
Current scoring practice oversimplifies these data in the initial scoring step. The result of
this procedural error is to obscure diagnostic information that could help teachers serve
their students better. It further prevents those who are diligently preparing these tests
from being able to observe the information that would otherwise have alerted them to
the presence of this error.
A solution to this problem, known as Response Spectrum Evaluation (RSE),[4] is currently being developed; it appears to be capable of recovering all three of these forms of information loss, while still providing a numerical scale to establish current performance status and to track performance change.
This RSE approach provides an interpretation of every answer, whether right or wrong, that indicates the likely thought processes used by the test taker.[5] Among other findings, Powell (2010)[6] reports that the recoverable information explains between two and three times more of the test variability than considering only the right answers. This
massive loss of information can be explained by the fact that the "wrong" answers are
removed from the information being collected during the scoring process and are no
longer available to reveal the procedural error inherent in right-wrong scoring. The
procedure bypasses the limitations produced by the linear dependencies inherent in test
data.

References
1. National Research Council (1999). "What Do Test Scores Really Mean?" In Myths and Tradeoffs: The Role of Tests in Undergraduate Admissions. Washington, DC: The National Academies Press. doi:10.17226/9632.
2. Thissen, D., & Wainer, H. (2001). Test Scoring. Mahwah, NJ: Erlbaum.
3. Iowa Testing Programs guide for interpreting test scores. Archived 2008-02-12 at the Wayback Machine.
4. Powell, J. C., & Shklov, N. (1992). The Journal of Educational and Psychological Measurement, 52, 847–865.
5. "Welcome to the Frontpage". Archived from the original on 30 April 2015. Retrieved 2 May 2015.
6. Powell, Jay C. (2010). Testing as Feedback to Inform Teaching. Chapter 3 in Learning and Instruction in the Digital Age, Part 1: Cognitive Approaches to Learning and Instruction (J. Michael Spector, Dirk Ifenthaler, Pedro Isaias, Kinshuk, & Demetrios Sampson, Eds.). New York: Springer. ISBN 978-1-4419-1551-1. doi:10.1007/978-1-4419-1551-1.
Benefits
We turn now to the assumption that the score on an admissions test should be given the
greatest weight in the selection process. Performance on both the SAT and ACT is used
as an indicator of how well students are likely to do in college. This outcome is most
frequently measured by freshman-year grade point average, and numerous studies have
been conducted with data from both tests to determine how well their scores do predict
freshman grades—that is, their predictive validity. Warren Willingham provided an
overview of current understandings of predictive validity for the workshop. In practice,
both tests have an average correlation with first-year college grades that ranges from .45
to .55 (a perfect correlation would be 1.0). The correlations vary for a number of
reasons, and research suggests that several factors work to make them seem lower than
they actually are. Most important of these is selection bias. Student self-selection restricts
the pool of applicants to any given institution, and it is only the scores and grades of the
students who were selected from that pool that are used to calculate predictive validity.
Since those students are very likely to be academically stronger than those not selected,
the capacity of tests and scores to have predicted the rejected students' likely lower
performance does not enter into the equation. In addition, freshman grades are not based
on uniform standards, but on often subjective judgments that vary across disciplines and
institutions; this factor also tends to depress the tests' predictive validity (Willingham,
1998:3, 6–8). This point also underscores the problems with using freshman-year grades
as the criterion variable; like the test scores themselves, GPAs that are calculated to two
decimal points lend this measure a deceptively precise air. They are used as the criteria
for prediction because there is no superior alternative.
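The short simulation below illustrates the selection-bias (restriction-of-range) point: when the correlation between scores and freshman grades is recomputed only within an admitted, high-scoring group, the observed value falls well below its value in the full applicant pool. The data are synthetic and purely illustrative.

# Synthetic illustration of range restriction: the score-GPA correlation,
# computed only among admitted (high-scoring) applicants, understates the
# correlation that holds in the full applicant pool.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_r = 0.55

score = rng.standard_normal(n)
gpa = true_r * score + np.sqrt(1 - true_r**2) * rng.standard_normal(n)

full_pool_r = np.corrcoef(score, gpa)[0, 1]

admitted = score > np.quantile(score, 0.80)  # keep only the top 20% of scorers
restricted_r = np.corrcoef(score[admitted], gpa[admitted])[0, 1]

print(f"correlation in full pool:   {full_pool_r:.2f}")   # about 0.55
print(f"correlation among admitted: {restricted_r:.2f}")  # noticeably lower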
Most colleges rely in admissions as much (or more) on high school GPAs or class rank
as they do on test scores, and the predictive validity of both numbers together is higher
than that of either one alone (Willingham, 1998:8). It is important to note that the high
school GPAs are also a "soft" measure—grading standards range as widely at that level
as they do in college. However, GPAs reflect several years of performance, not just
several hours of testing time. Using high school grades and test
scores together is very useful specifically because they are the sources of different kinds
of information about students, and "two measures are better than one" (Willingham,
1998:16). Moreover, because both SAT and ACT scores generally predict slightly higher
college grades for minority students than they actually receive, "it is not clear that the answer to minority group representation in higher education lies in improved prediction.... The challenge is not conventional academic prediction but rather to find valid, socially useful, and workable bases for admitting a broader range of talent."
It is likely that test scores also play a significant role in the decisions students make
about the schools to which they will apply, and it is worth noting that students' self-
selection is a significant factor in their access to higher education. Most students see their
first scores when they are in the 11th grade or earlier, and they have ample opportunity to
compare them to the mean scores at various colleges. A decision not to apply to a
particular school may make a great deal of sense if the criteria on which students are
evaluated are extremely clear.
For a nonselective public institution, the criteria are likely to be straightforward
eligibility requirements, and the decision of whether to apply is likely to be
straightforward as well. At a more selective institution, however, the criteria are likely to
be far more complex and opaque to an aspiring student. The tendency for lower scoring
students to opt out of competition at highly selective schools is likely to have a disparate
effect on minorities since they have lower average test scores. This tendency is also likely
to limit selective schools' opportunity to consider some of the very students they might
want to recruit.
Uses of test scores outside of the selection process have effects as well. Scores have
been used to identify talented middle-school students for academic enrichment programs
and other similar purposes for which they were not intended. Scores calculated for
neighborhoods, geographic regions, and the nation as a whole are cited as indicators of
academic success and school quality and can even influence real estate values.
Comparisons of the average SAT scores of black and white students are also cited as
evidence of the advantage given to black applicants at particular institutions (Bowen and
Bok, 1998:15–16). However, because black students are underrepresented among high
scorers, their average scores would be lower if the selection process were completely race
blind. Thus, the fact that black students are in fact underrepresented among high scorers
at selective institutions is not evidence of anything in particular about selection at those
institutions. Such uses of test scores only further dilute public understanding of
standardized admissions tests, distorting the picture of both their benefits and their
limitations.

Neither the SAT nor the ACT was designed to make fine distinctions at any point on
their scales; rather, both were designed to spread students out across the scales, and both
are constructed to provide a balance of questions at a wide range of difficulty levels.
These tests are most useful, then, for sorting an applicant pool into broad categories:
those who are quite likely to succeed academically at a particular institution, those who
are quite unlikely to do so, and those in the middle. Such categories are likely to be
defined differently by different institutions, depending on the rigor of their programs and
their institutional goals.
As Warren Willingham (1998:21) concluded about this point:
In the early stages of the admissions process, the [predictive] validity of school grades and test
scores is put to work through college recruitment, school advising, self-selection, and college
selection. In the process, applicants disperse to institutions that differ widely.... In later stages of
the admissions process, colleges ... have already profited from the strong validity of these
traditional academic predictors. At this point colleges face decisions among applicants in a grey
area.... This is the time when decisions must ensure that multiple goals of a college receive
adequate attention.
Given that a score is a point in a range on a measure of a limited domain, the claim that a
higher score should guarantee one student preference over another is not justifiable. Thus,
schools that rely too heavily on scores to distinguish among applicants are extremely
vulnerable to the charge of unfairness. Any institution is justified in looking beyond
scores.
Some policy makers, acknowledging that effective or ineffective teachers cannot be fairly identified by their students' test scores, have suggested that low test scores (or value-added estimates) should be a "trigger" that invites further investigation. Although this approach seems to allow for multiple means of evaluation, in reality 100% of the weight in the trigger is test scores. Thus, all the incentives to distort instruction will be preserved to avoid identification by the trigger, and other means of evaluation will enter the system only after it is too late to avoid these distortions.

1. While those who evaluate teachers could take student test scores over time into
account, they should be fully aware of their limitations, and such scores should
be only one element among many considered in teacher profiles. Some states
are now considering plans that would give as much as 50% of the weight in
teacher evaluation and compensation decisions to scores on existing poor-quality
tests of basic skills in math and reading. Based on the evidence we have
reviewed above, we consider this unwise. If the quality, coverage, and design of
standardized tests were to improve, some concerns would be addressed, but the
serious problems of attribution and nonrandom assignment of students, as well
as the practical problems described above, would still argue for serious limits on
the use of test scores for teacher evaluation.

RUBRIC FOR VIRTUAL ORAL PRESENTATION.


Category: Organization (15 points)
• The type of presentation is appropriate for the topic and audience. (5 points)
• Information is presented in a logical sequence. (5 points)
• Presentation appropriately cites the requisite number of references. (5 points)

Category: Content (45 points)
• Introduction is attention-getting, lays out the problem well, and establishes a framework for the rest of the presentation. (5 points)
• Technical terms are well defined in language appropriate for the target audience. (5 points)
• Presentation contains accurate information. (10 points)
• Material included is relevant to the overall message/purpose. (10 points)
• An appropriate amount of material is prepared, and the points made reflect their relative importance. (10 points)
• There is an obvious conclusion summarizing the presentation. (5 points)

Category: Presentation (40 points)
• Speaker maintains good eye contact with the audience and is appropriately animated (e.g., gestures, moving around, etc.). (5 points)
• Speaker uses a clear, audible voice. (5 points)
• Delivery is poised, controlled, and smooth. (5 points)
• Good language skills and pronunciation are used. (5 points)
• Visual aids are well prepared, informative, effective, and not distracting. (5 points)
• Length of presentation is within the assigned time limits. (5 points)
• Information was well communicated. (10 points)

Total: 100 points
