PSYCHOLOGICAL ASSESSMENT
Testing in the
Schools
PART 4: THE SETTINGS
SOMBRIO & CORPUZ, MA.
PRESCHOOL ASSESSMENT
- an important tool used to evaluate a child’s
readiness for school and to identify any
special needs that may require educational
support.
Objectives of
Preschool Assessment
(1) screening of children at risk
(2) diagnostic assessment to determine the presence or
absence of a particular condition.
(3) program evaluation
General Problems
- Limited reading skills; self-reports unusable
- Restricted verbal and visual-motor responses
- Limited information-processing abilities
- Fear and unfamiliarity with testing situations
- Lack of motivation/understanding of testing
- Examiner feedback may not reinforce
- Hard to gauge effort vs. ability
5 Major Approaches of
Psychometric Testing
1. Parent/Teacher Interviews
2. Direct Behavioral Observation
3. Rating Scales
4. Projective Techniques
5. Standardized Tests
4 methods available to
assess preschool children
1. Individual Tests
2. Multidimensional Batteries
3. Adaptive Skill Assessment Measures.
4. Adaptive Process Measures
Psychometric tests
1. Piagetian-Based Scales
2. Comprehensive Developmental Assessment Tools
3. Process-Oriented Assessment Approaches
Equivalence of instruments
Psychometric Concern:
Are scores on different tests equivalent (e.g., Stanford-Binet, WPPSI, K-ABC)?
Study Findings:
Stanford-Binet IV & K-ABC: No significant score differences, moderate to
high correlation.
Stanford-Binet L-M & WPPSI-R: Higher IQ on Stanford-Binet (77.93 vs.
75.62), strong correlation (0.82).
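The two statistics quoted in these findings, a mean score difference and a correlation between instruments, can be sketched as follows. This is only an illustration: the scores below are hypothetical and are not data from the studies cited, and the function names are made up for this example.

```python
import math

def mean_difference(scores_a, scores_b):
    """Mean of test A minus mean of test B for the same children."""
    return sum(scores_a) / len(scores_a) - sum(scores_b) / len(scores_b)

def pearson_r(scores_a, scores_b):
    """Pearson correlation between paired scores on two tests."""
    n = len(scores_a)
    mean_a = sum(scores_a) / n
    mean_b = sum(scores_b) / n
    cross = sum((a - mean_a) * (b - mean_b) for a, b in zip(scores_a, scores_b))
    ss_a = sum((a - mean_a) ** 2 for a in scores_a)
    ss_b = sum((b - mean_b) ** 2 for b in scores_b)
    return cross / math.sqrt(ss_a * ss_b)

# Hypothetical IQ scores for five children on two instruments (not real data)
test_a = [70, 75, 80, 85, 90]
test_b = [68, 74, 78, 84, 86]

diff = mean_difference(test_a, test_b)  # 2.0 points higher on test A
r = pearson_r(test_a, test_b)           # about 0.99
```

A high correlation together with a nonzero mean difference is the pattern reported for the Stanford-Binet L-M and WPPSI-R: the tests rank children similarly but do not yield identical scores.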
Lowered Reliability
- Tests for young children often show lower reliability, even if the same instrument performs well with older children.
- This reflects rapid developmental changes, not psychometric weaknesses.
- Preschool children are a "unique" population with wide variability in experiences.
Test Requirements
- Tests useful for typical children may be less effective for children with disabilities.
- Challenge: Finding scales that are technically adequate, practical for intervention planning, and sensitive to developmental progress.
ASSESSMENT IN THE PRIMARY
GRADES
In the primary grades, assessment continues to address many
early developmental concerns but shifts to emphasize academic
achievement. As children progress in school, measuring what
they have learned becomes central to assessment efforts.
California Achievement Tests (CAT)
a widely used standardized achievement test
battery for assessing basic skills from
kindergarten through grade 12.
3 Main Versions of CAT
1. Basic Skills Battery
2. Complete Battery
3. Survey Tests
Reliability
- Internal Consistency: K-R20 coefficients between .65 - .95 (mostly .80s and .90s).
- Alternate-form and Test-retest: Coefficients .75 - .85 and .80 - .95, respectively.
- Reliability lowers in younger grades and shorter subtests.
Content Validity
- Aligned with curriculum objectives through careful item design and expert review.
- Avoids gender, ethnic, and racial biases.
- Revised to meet standards using item-response theory.
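As an illustration of the K-R20 internal-consistency coefficient mentioned above, here is a minimal sketch. It assumes items scored 0 or 1 and uses the population variance of total scores. The response matrix is hypothetical, not CAT data.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomous (0/1) items.

    responses: one row per examinee, one 0/1 entry per item.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    # Proportion of examinees answering each item correctly
    p = [sum(row[i] for row in responses) / n_people for i in range(n_items)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    # Population variance of total scores
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)

# Hypothetical answers: 5 examinees x 4 items (not real data)
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(answers)  # 0.8 for this toy matrix
```

The `n_items / (n_items - 1)` term is one reason shorter subtests tend to show lower coefficients, in line with the last reliability point above.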
Teacher Rating Scales
- are widely used by school psychologists to
assess children's classroom behavior. They help
identify behavioral issues, guide follow-up
evaluations, and monitor treatment effects.
Neeper and Lahey’s (1984) study found
5 behavioral factors
1. Conduct Disorders
2. Inattentive-Perceptual
3. Anxiety-Depression
4. Language Processing
5. Social Competence.
Social Competence in High School
Social competence is essential in adolescence,
involving the ability to set and achieve
appropriate goals and navigate social
challenges.
Cavell and Kelley (1994)
created a self-report measure for identifying
interpersonal difficulties among adolescents by
categorizing common social challenges reported by
students in grades 7, 9, and 11.
Seven (7) Factors
1. Keep Friends: Secrets, trust issues
2. Problem Behavior: Peer pressure, drinking
3. Siblings: Embarrassment, conflicts
4. School: Teacher conflicts
5. Parents: Invasiveness, conflict
6. Work: Job dissatisfaction
7. Make Friends: Peer acceptance
Tests of General Educational
Development (GED Tests)
Purpose: Awards high-school equivalency credentials to adults
without a diploma.
Test Components: Writing Skills, Social Studies, Science, Reading
Skills, Mathematics.
Annual Participation: 700,000+ adults take the GED; 470,000+
receive diplomas.
GED contains 5 tests:
1. Writing skills
2. Social studies
3. Science
4. Reading skills
5. Mathematics.
General Educational Development
(GED) Validity
Content Validity
Measures key high school outcomes, created by educators and reviewed
by specialists.
Concurrent Validity
GED scores correlate with high school performance, ensuring
equivalency.
Predictive Validity
GED graduates report better pay and career opportunities post-test.
The National Assessment of
Educational Progress (NAEP)
Mandated by Congress, first conducted in 1969
Measures educational achievement across U.S. students
(ages 9, 13, 17)
Assesses subjects: reading, math, writing, science, social
studies, music, computer competence
Test Structure
- Items similar to classroom and standardized tests but focus on population proficiencies
- Uses item-response theory for analysis
- Content areas developed through expert consensus
Scoring Method
- Focuses on individual item success rates rather than global scores
- Serves as an educational progress indicator (like Consumer Price Index for economics)
- Data reported based on the percentage of students who complete exercises
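The item-level reporting described above can be sketched as a per-exercise success rate rather than a total score per student. The data and function name below are hypothetical and only illustrate the idea. This is not the NAEP's actual procedure, which also relies on item-response theory.

```python
def item_success_rates(responses):
    """Percentage of students who answered each exercise correctly.

    responses: one row per student, one 0/1 entry per exercise.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        100 * sum(row[i] for row in responses) / n_students
        for i in range(n_items)
    ]

# Hypothetical results: 5 students x 4 exercises (not real data)
sample = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
rates = item_success_rates(sample)  # [80.0, 60.0, 40.0, 20.0]
```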
Essay vs Multiple Choice
Essay
- Measure the ability to organize material and produce arguments
- Lower reliability (e.g., KR-20 for American History: .54)
- Weaker correlation with GPA compared to multiple-choice
Multiple Choice
- Easier to score, more reliable, and better for statistical analysis
- Higher reliability (e.g., KR-20 for American History: .90)
- Stronger correlation with GPA.
The Scholastic Aptitude Test (SAT):
Admission into College
Introduced in 1926, the SAT is a standardized test used
for college admissions in the U.S. It is designed to assess
verbal and mathematical reasoning skills and to provide
equal opportunities for students from diverse
backgrounds.
SAT Structure, Scoring, and Revisions
The SAT assesses verbal (critical reading, vocabulary) and
mathematical reasoning abilities with multiple-choice questions.
Each section is scored from 200 to 800, totaling a possible score of
1600.
Key revisions include the 1994 update with longer reading
comprehension and new math questions, as well as 1995 recentering
to standardize scores.
Criticism and Debate on the SAT
Critics argue that multiple-choice tests don't fully measure critical
thinking or complex idea organization, unlike essays that allow deeper
analysis.
While the SAT’s focus on general reasoning provides broad coverage, it
limits the assessment of specialized skills, yet remains central to
college admissions.
The SAT and Mexican-American Students
Goldman and Richards (1974) explored how well the
SAT predicts the academic performance of Mexican-American
and Anglo-American college students, based on their
second-quarter college GPA.
Aptitude vs. Achievement Test
Aptitude Test
- aims to measure a student’s capacity for future learning and general cognitive
ability (ex: SAT)
Achievement Test
- assesses knowledge and skills acquired in specific subject areas.
Utility or Validity in College Admissions
James Crouse’s Argument
- Colleges should replace the SAT with standardized achievement tests.
Studies on the Role of Achievement Test
1. Schrader (1971)
- adding achievement test scores to high-school GPA and SAT scores slightly
improved the prediction of college grades.
2. K.M. Wilson (1974)
- Achievement tests were better predictors of college success than the SAT.
3. Baron & Norman (1992)
- High-school rank and achievement tests better predicted college GPA than SAT.
4. Kelley (1927)
- Both tests overlap significantly, but the distinction remains debated.
High-school Achievement Tests
High-school GPA was a better predictor of college grades than either the ACT, the SAT, or
the CAT (G. Halpin, G. Halpin, and Schaer, 1981).
Intelligence vs. aptitude
Intelligence measures, specifically the vocabulary and information subtests of the WAIS,
were found to be better predictors of college success than other tests like the SAT.
Decline in SAT scores
Has been attributed to various factors, including family and societal
factors, but is likely due to a more diverse and less elite group of college
applicants each year.
Coaching and Its Impact on SAT Scores
What is coaching?
Basic Test Orientation
Practice with Similar Test Items
Teaching Cognitive Skills
Concerns in Coaching ( D. E. Powers):
Fairness
Validity of the SAT
Opportunity Cost
Effects of Coaching on Validity
(N. Cole, 1982)
Could inflate scores beyond a person’s “true” ability, affecting the test’s
validity.
Could make people perform better than usual, reducing fairness.
Could distort scores on tests that measure stable traits.
The Criterion: First year GPA
The SAT predicts about 10% to 15% of the variance in first-year college grades.
Reliability
Test-retest, internal consistency, and alternative form reliability (ranges from
high .80s to low .90s.)
Validity
Most of the validity information is predictive validity, and most consists of
correlations of SAT scores with first-year college GPA
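The statement that the SAT predicts about 10% to 15% of the variance in first-year grades can be converted into a correlation coefficient, because the proportion of variance explained is the square of the correlation. A minimal sketch, with a function name chosen only for this example:

```python
import math

def correlation_from_variance_explained(proportion):
    """Validity coefficient r implied by a proportion of variance explained (r squared)."""
    return math.sqrt(proportion)

low = round(correlation_from_variance_explained(0.10), 2)   # 0.32
high = round(correlation_from_variance_explained(0.15), 2)  # 0.39
```

So the variance figures correspond to SAT and first-year GPA correlations of roughly .32 to .39.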
Criticism of the SAT
Ralph Nader argued that the SAT was not a valid predictor of college
success and reflected family income more than scholastic potential.
Fairness of Admission Tests:
Most test takers believe SAT and GRE are fair
Test scores did not heavily influence college choices.
Institutions prioritize grades and academic factors over test scores.
Studies show limited impact of test scores on admissions.
THE GRADUATE RECORD
EXAMINATION (GRE)
- designed to offer a global measure of the
verbal, quantitative, and analytical
reasoning abilities acquired over a long
period of time and not related to a specific
field of study.
GRE TWO PRIMARY
LIMITATIONS:
(1) It doesn't measure all qualities needed for predicting
graduate success.
(2) Its scores are imprecise, with only differences
beyond the standard error of measurement being
reliable indicators of abilities.
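The second limitation can be made concrete with the usual formula for the standard error of measurement, SEM = SD x sqrt(1 - reliability). The sketch below assumes a score standard deviation of 100, which is an assumption for illustration and not a figure from these slides, together with a reliability of .90.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def difference_exceeds_sem(score_a, score_b, sd, reliability):
    """True if two scores differ by more than one SEM."""
    sem = standard_error_of_measurement(sd, reliability)
    return abs(score_a - score_b) > sem

# Assumed values: SD = 100 (hypothetical), reliability = .90
sem = standard_error_of_measurement(100, 0.90)            # about 31.6 points
small_gap = difference_exceeds_sem(520, 500, 100, 0.90)   # False
large_gap = difference_exceeds_sem(560, 500, 100, 0.90)   # True
```

Under these assumptions, a 20-point gap between two applicants falls within measurement error, while a 60-point gap does not.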
The General Test
Type of Test:
1. Verbal Portion
2. Quantitative Portion
3. Analytical Portion
Score Ranges:
200 to 800, with 500 being the average.
Development:
Questions are written by ETS staff with relevant expertise, reviewed by specialists,
and analyzed by a technical advisory committee of university professors.
The Subject Test
Type of Test:
several areas ranging from biochemistry to sociology
Score Ranges:
200 to 990, but the specific score distribution varies by subject.
Development:
Items are written by field experts and follow a similar review process.
Reliability
High Reliability
- Verbal and quantitative - low .90s
- Analytical - high .80s.
- Subject Tests - ranges from .80 to .96, with most around the mid to high .80s.
Validity
GRE verbal and quantitative sections typically show low validity coefficients
(ranging from .20 to .36).
GRE Subject Tests are better predictors of first-year GPA in specific departments.
Combining GRE scores with undergraduate GPA improves prediction
Validity Issues
criterion problem
range restriction or a very low selection ratio
Restriction of Range
Dollinger (1989)
Huitema & Stein (1993)
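Restriction of range lowers validity coefficients because admitted students vary less on the predictor than the full applicant pool does. One standard adjustment is Thorndike's Case 2 correction, sketched below. The input values are hypothetical, and the studies cited above may have used other methods.

```python
import math

def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Thorndike Case 2 correction for direct range restriction on the predictor."""
    k = sd_unrestricted / sd_restricted
    return (r_restricted * k) / math.sqrt(
        1 - r_restricted ** 2 + (r_restricted ** 2) * (k ** 2)
    )

# Hypothetical example: observed r = .30 among admitted students,
# whose predictor SD is half that of the applicant pool
corrected = correct_for_range_restriction(0.30, sd_unrestricted=100, sd_restricted=50)
# corrected is about 0.53
```

When the two standard deviations are equal, the function returns the observed correlation unchanged.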
Validity in Psychology
Marston (1971)
Merenda & Reilly (1971)
Rawls et al. (1969)
House & Johnson (1993)
3 categories of criteria for graduate school success (Hartnett and Willingham)
1. Traditional criteria
2. Professional accomplishment
3. Specially developed criteria
ENTRANCE INTO PROFESSIONAL
TRAINING
The Medical College Admission Test (MCAT)
- designed to measure achievement levels and the
expected prerequisites that are generally
relevant to the practice of medicine.
Five key questions regarding the MCAT's validity
1. How do MCAT scores compare in predictive validity with undergraduate GPA?
2. Do MCAT scores contribute unique information not already provided by
undergraduate GPA?
3. What is the relative predictive validity of the individual MCAT scores in relation
to overall performance in the basic medical sciences?
4. What is the relative predictive validity of the individual MCAT scores in relation
to performance in specific areas of the medical school curriculum?
5. How well does the MCAT predict medical school competence?
R. F. Jones & Thomae-Forgues (1984) Findings:
1. Comparison to Undergraduate GPA
2. MCAT's Predictive Value
3. Subtest Predictive Power
4. Specific Curriculum Performance
5. General Effectiveness
Admissions vs. Advising
MCAT studies mainly assess its ability to predict medical school success,
often based on GPA and NBME scores
Interstudy variability
The predictive validity of MCAT scores for medical school performance varies
widely, with correlations ranging from .02 to .47.
New MCAT vs. Original MCAT
The new MCAT has better predictive validity than the original MCAT.
Differential validity
The validity of MCAT scores in predicting academic success differs depending
on the undergraduate institution, with correlations ranging from .03 to .66
across 10 universities.
6 Components of Test
Preparation Program (N. COLE)
1. supplying the correct answers (as in cheating)
2. taking the test for practice
3. maximizing motivation
4. optimizing test anxiety
5. instruction in test-taking skills
6. instruction in test content
The Dental Admission
Testing Program (DAT)
- standardized test used by dental schools to
assess the qualifications of applicants. It
measures general academic ability,
understanding of scientific information, and
perceptual ability
Current Dental
Admission Testing
1. Survey of Natural Science (100 questions)
2. The Perceptual Ability (90 questions)
3. The Reading Comprehension (50 questions)
4. The Quantitative Reasoning (50 questions)
Reliability
K-R reliability coefficients are in the .80s, but the test may emphasize speed too much.
Validity
DAT predicts dental school grades with correlations in the .30s; it is similar to
undergraduate GPA in predicting success.
Criticism
Focuses on academic skills rather than psychomotor skills needed in dentistry
Test Form Changes
Limited Data
TESTS FOR LICENSURE
AND CERTIFICATION
Licensure
a process whereby the government permits an individual to engage in a
particular occupation.
Certification
the recognition that a person has met certain qualifications set by a
credentialing agency and is therefore permitted to use a designated title.
Validity
Content Validity
Criterion Validity
Construct Validity
Cutoff scores
used in various settings to distinguish those who pass a test from those who
do not.
Methods for setting cutoff scores:
1. Norm-Referenced Methods
2. Criterion-Referenced Methods
Angoff Method
Ebel Procedure
Nedelsky Method
3. Second Criterion-Referenced Methods
Contrasted-groups analysis
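As an illustration of the Angoff method listed above: each judge estimates, for every item, the probability that a minimally competent candidate would answer it correctly, and the cutoff is the sum of the average estimates across items. This is a minimal sketch of the basic procedure with hypothetical ratings. Operational standard-setting usually adds discussion rounds and other refinements.

```python
def angoff_cutoff(ratings):
    """Cutoff score from judges' item-level probability estimates.

    ratings: one row per judge, one probability (0 to 1) per item.
    """
    n_judges = len(ratings)
    n_items = len(ratings[0])
    # Average the judges' estimates for each item, then sum over items
    item_means = [
        sum(judge[i] for judge in ratings) / n_judges for i in range(n_items)
    ]
    return sum(item_means)

# Hypothetical ratings: 3 judges x 4 items (not real data)
judge_ratings = [
    [0.6, 0.8, 0.5, 0.9],
    [0.7, 0.7, 0.4, 0.8],
    [0.5, 0.9, 0.6, 1.0],
]
cutoff = angoff_cutoff(judge_ratings)  # about 2.8 of 4 items
```

Here a candidate would need roughly 3 of the 4 items correct to pass. In practice, the raw cutoff is often rounded or adjusted by the credentialing agency.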
Thank you