The document provides a comprehensive overview of assessment in education, detailing key concepts such as assessment, measurement, and evaluation, along with types of tests and their purposes. It distinguishes between norm-referenced and criterion-referenced tests, outlines various types of assessments, and emphasizes principles of high-quality classroom assessment. Additionally, it discusses the importance of validity, reliability, fairness, and ethical considerations in the assessment process.


ASSESSMENT

OF
LEARNING 1

TEACHING
MATERIALS

Compiled by Giovanni A. Alcain, CST, LPT, (MAEd-Eng on-going)


EDUC10

ASSESSMENT OF LEARNING 1

MODULE 1 – BASIC CONCEPTS IN ASSESSMENT OF LEARNING

Assessment – refers to the process of gathering, describing, or quantifying information about student performance. It includes paper-and-pencil tests, extended responses (e.g., essays), and performance assessment tasks, which are usually referred to as "authentic assessment" tasks (e.g., presentation of research work).

Measurement – is a process of obtaining a numerical description of the degree to which an individual possesses a particular characteristic. Measurement answers the question "How much?"

Evaluation – refers to the process of examining student performance. It determines whether or not the student has met the instructional objectives of the lesson.

Test – is an instrument or systematic procedure designed to measure the quality, ability, skill, or knowledge of students by giving a set of questions in a uniform manner. Since a test is a form of assessment, tests also answer the question "How does an individual student perform?"

Testing – is a method used to measure the level of achievement or performance of the learners. It also refers to the administration, scoring, and interpretation of an instrument (procedure) designed to elicit information about performance in a sample of a particular area of behavior.

Types of Measurement

There are two ways of interpreting student performance in relation to classroom instruction: norm-referenced tests and criterion-referenced tests.

A norm-referenced test is a test designed to measure the performance of a student compared with other students. Each individual is compared with other examinees and assigned a score, usually expressed as a percentile, a grade-equivalent score, or a stanine. Student achievement is reported for broad skill areas, although some norm-referenced tests do report student achievement for individual skills.

The purpose is to rank each student with respect to the achievement of others in broad areas of knowledge
and to discriminate high and low achievers.
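Norm-referenced reporting can be made concrete with a short sketch. The percentile-rank and stanine conventions below are standard scoring conventions rather than formulas given in this module, and the score data are invented for illustration.

```python
def percentile_rank(score, all_scores):
    """Percentage of examinees scoring below, plus half of those tied."""
    below = sum(1 for s in all_scores if s < score)
    tied = sum(1 for s in all_scores if s == score)
    return 100.0 * (below + 0.5 * tied) / len(all_scores)

# Standard stanine bands: bottom 4%, then 7%, 12%, 17%, 20%, 17%, 12%, 7%, top 4%
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(pr):
    """Map a percentile rank (0-100) to a stanine (1-9)."""
    return 1 + sum(1 for cut in STANINE_CUTS if pr > cut)

scores = [55, 60, 62, 65, 65, 70, 72, 75, 80, 90]   # invented class scores
pr = percentile_rank(72, scores)
print(round(pr, 1), stanine(pr))  # 65.0 6
```

A raw score of 72 here means "better than about 65% of the group" – the interpretation depends entirely on the comparison group, which is the defining feature of norm-referencing.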

A criterion-referenced test is a test designed to measure the performance of students with respect to some particular criterion or standard. Each individual is compared with a predetermined standard for acceptable achievement; the performance of the other examinees is irrelevant. A student's score is usually expressed as a percentage, and student achievement is reported for individual skills.

The purpose is to determine whether each student has achieved specific skills or concepts, and to find out how much students know before instruction begins and after it has finished.
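By contrast, a criterion-referenced interpretation needs no comparison group: the score is a percentage judged against the predetermined standard. A minimal sketch follows; the 75% mastery cutoff is an illustrative assumption, not a value from this module.

```python
def criterion_report(correct, total, mastery_cutoff=75.0):
    """Express a score as a percentage and judge it against a
    predetermined standard (the cutoff here is a hypothetical example)."""
    percent = 100.0 * correct / total
    return percent, percent >= mastery_cutoff

# One student's result on a 20-item test of a specific skill
percent, mastered = criterion_report(18, 20)
print(f"{percent:.0f}% -> {'mastered' if mastered else 'not yet mastered'}")  # 90% -> mastered
```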

Other terms less often used for criterion-referenced are objective-referenced, domain-referenced, content-referenced, and universe-referenced.
Robert L. Linn and Norman E. Gronlund (1995) pointed out the common characteristics and differences of norm-referenced tests and criterion-referenced tests.

Common Characteristics of Norm-Referenced Test and Criterion-Referenced Tests

1. Both require specification of the achievement domain to be measured
2. Both require a relevant and representative sample of test items
3. Both use the same types of test items
4. Both use the same rules for item writing (except for item difficulty)
5. Both are judged by the same qualities of goodness (validity and reliability)
6. Both are useful in educational assessment

Differences between Norm-Referenced Tests and Criterion Referenced Tests

Norm-Referenced Tests
1. Typically cover a large domain of learning tasks, with just a few items measuring each specific task.
2. Emphasize discrimination among individuals in terms of relative level of learning.
3. Favor items of average difficulty and typically omit very easy and very hard items.
4. Interpretation requires a clearly defined group.

Criterion-Referenced Tests
1. Typically focus on a delimited domain of learning tasks, with a relatively large number of items measuring each specific task.
2. Emphasize description of what learning tasks individuals can and cannot perform.
3. Match item difficulty to the learning tasks, without altering item difficulty or omitting easy or hard items.
4. Interpretation requires a clearly defined and delimited achievement domain.

TYPES OF ASSESSMENT

There are four types of assessment in terms of their functional role in relation to classroom instruction: placement assessment, diagnostic assessment, formative assessment, and summative assessment.

A. Placement Assessment is concerned with the entry performance of students. The purpose of placement assessment is to determine the prerequisite skills, the degree of mastery of the course objectives, and the best mode of learning.
B. Diagnostic Assessment is a type of assessment given before instruction. It aims to identify the strengths and weaknesses of the students regarding the topics to be discussed. The purposes of diagnostic assessment are:
1. To determine the level of competence of the students;
2. To identify the students who already have knowledge about the lesson;
3. To determine the causes of learning problems and formulate a plan for remedial action.
C. Formative Assessment is a type of assessment used to monitor the learning progress of the students during or after instruction. Purposes of formative assessment:
1. To provide feedback immediately to both student and teacher regarding the success and failure of learning;
2. To identify the learning errors that are in need of correction;
3. To provide information to the teacher for modifying instruction and for improving learning and instruction.
D. Summative Assessment is a type of assessment usually given at the end of a course or unit. Purposes of summative assessment:
1. To determine the extent to which the instructional objectives have been met;
2. To certify student mastery of the intended outcomes, used for assigning grades;
3. To provide information for judging the appropriateness of the instructional objectives;
4. To determine the effectiveness of instruction.

MODULE 2 - PRINCIPLES OF HIGH-QUALITY CLASSROOM ASSESSMENT

1. Clarity of learning targets


2. Appropriateness of Assessment Methods
3. Validity
4. Reliability
5. Fairness
6. Positive Consequences
7. Practicality and Efficiency
8. Ethics

1. CLARITY OF LEARNING TARGETS

Assessment can be made precise, accurate, and dependable only if what is to be achieved is clearly stated and feasible. The learning targets, involving knowledge, reasoning, skills, products, and effects, need to be stated in behavioral terms which denote something that can be observed in the behavior of the students.

Cognitive Targets

Benjamin Bloom (1956) proposed a hierarchy of educational objectives at the cognitive level.
These are:

Knowledge – acquisition of facts, concepts and theories

Comprehension - understanding, involves cognition or awareness of the interrelationships


Application – transfer of knowledge from one field of study to another, or from one concept to another concept in the same discipline

Analysis – breaking down of a concept or idea into its components and explaining the concept as a composition of these components

Synthesis – opposite of analysis, entails putting together the components in order to summarize
the concept

Evaluation and Reasoning – valuing and judgment, or putting the "worth" on a concept or principle.

Skills, Competencies and Abilities Targets

Skills – specific activities or tasks that a student can proficiently do


Competencies – cluster of skills
Abilities – made up of related competencies, categorized as:

 Cognitive
 Affective
 Psychomotor

Products, Outputs and Project Targets

 tangible and concrete evidence of a student’s ability


 need to clearly specify the level of workmanship of projects
 expert
 skilled
 novice

2. APPROPRIATENESS OF ASSESSMENT METHODS

Written-Response Instruments
Objective tests – appropriate for assessing the various levels of the hierarchy of educational
objectives

Essays – can test the students’ grasp of the higher level cognitive skills

Checklists – a list of several characteristics or activities presented to the subjects of a study, who
analyze them and place a mark opposite the characteristics that apply.

Product Rating Scales

 Used to rate products like book reports, maps, charts, diagrams, notebooks, creative
endeavors
 Need to be developed to assess various products over the years
Performance Tests - Performance checklist

 Consists of a list of behaviors that make up a certain type of performance


 Used to determine whether or not an individual behaves in a certain way when asked to
complete a particular task

Oral Questioning – appropriate assessment method when the objectives are to:

 Assess the students’ stock knowledge and/or


 Determine the students’ ability to communicate ideas in coherent verbal sentences.

Observation and Self Reports


 Useful supplementary methods when used in conjunction with oral questioning and
performance tests

3. VALIDITY

 Something valid is something fair.


 A valid test is one that measures what it is supposed to measure.

Types of Validity

 Face: What do students think of the test?


 Construct: Am I testing in the way I taught?
 Content: Am I testing what I taught?
 Criterion-related: How does this compare with the existing valid test?
 Tests can be made more valid by making them more subjective (open items).

MORE ON VALIDITY

Validity – appropriateness, correctness, meaningfulness and usefulness of the specific


conclusions that a teacher reaches regarding the teaching-learning situation.

Content validity – content and format of the instrument


 Students’ adequate experience
 Coverage of sufficient material
 Reflect the degree of emphasis

Face validity – outward appearance of the test, the lowest form of test validity

Criterion-related validity – the test is judged against a specific criterion

Construct validity – the test is loaded on a "construct" or factor


4. RELIABILITY

 Something reliable is something that works well and that you can trust.
 A reliable test is a consistent measure of what it is supposed to measure.

Questions:
 Can we trust the results of the test?
 Would we get the same results if the tests were taken again and scored by a different
person?

Tests can be made more reliable by making them more objective (controlled items).

 Reliability is the extent to which an experiment, test, or any measuring procedure yields
the same result on repeated trials.

 Equivalency reliability is the extent to which two items measure identical concepts at an
identical level of difficulty. Equivalency reliability is determined by relating two sets of
test scores to one another to highlight the degree of relationship or association.

 Stability reliability (sometimes called test-retest reliability) is the agreement of measuring instruments over time. To determine stability, a measure or test is repeated on the same subjects at a future date.

 Internal consistency is the extent to which tests or procedures assess the same
characteristic, skill or quality. It is a measure of the precision between the observers or of
the measuring instruments used in a study.

 Interrater reliability is the extent to which two or more individuals (coders or raters)
agree. Interrater reliability addresses the consistency of the implementation of a rating
system.
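Equivalency and stability reliability are both estimated by correlating two sets of scores from the same examinees. A minimal sketch using the Pearson correlation coefficient; the score data are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two sets of scores from the same examinees."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Invented scores: the same six students on a test and a retest
form_a = [12, 15, 9, 20, 17, 11]
form_b = [13, 14, 10, 19, 18, 10]
print(round(pearson_r(form_a, form_b), 3))  # high agreement, about 0.964
```

A coefficient near 1.0 indicates that the two administrations (or two forms) rank the students almost identically, which is what "reliable" means in this context.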

RELIABILITY – CONSISTENCY, DEPENDABILITY, STABILITY, WHICH CAN BE ESTIMATED BY:

Split-half method
 Calculated using the Spearman-Brown prophecy formula or the Kuder-Richardson formulas KR-20 and KR-21

Test-retest method
 Consistency of test results when the same test is administered at two different time periods, obtained by correlating the two sets of test results
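These estimates can be sketched in a few lines. The Spearman-Brown prophecy formula steps a half-test correlation up to full-test length, and KR-20 works from 0/1-scored item responses; both formulas are the standard ones named above, while the numbers used are invented.

```python
def spearman_brown(r_half):
    """Full-test reliability predicted from the correlation between two half-tests."""
    return 2 * r_half / (1 + r_half)

def kr20(responses):
    """Kuder-Richardson formula 20 for 0/1-scored items.
    Rows of `responses` are students, columns are items."""
    n = len(responses)         # number of students
    k = len(responses[0])      # number of items
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n  # population variance
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion correct on item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / variance)

# If the two halves of a test correlate at 0.60, the estimated
# reliability of the full-length test is stepped up to 0.75.
print(round(spearman_brown(0.60), 2))  # 0.75
```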

5. FAIRNESS

The concept that assessment should be 'fair' covers a number of aspects.


 Students' knowledge of the learning targets and assessments
 Opportunity to learn
 Prerequisite knowledge and skills
 Avoiding teacher stereotype
 Avoiding bias in assessment tasks and procedures

6. POSITIVE CONSEQUENCES

Learning assessments provide students with effective feedback and can potentially improve their
motivation and/or self-esteem. Moreover, assessment gives students the tools to assess
themselves and understand how to improve. High-quality assessment has positive consequences
for students, teachers, parents, and other stakeholders.

7. PRACTICALITY AND EFFICIENCY

 Something practical is something effective in real situations.


 A practical test is one which can be practically administered.

Questions:

 Will the test take longer to design than to apply?
 Will the test be easy to mark?

Tests can be made more practical by making them more objective (more controlled items).

 Teacher Familiarity with the Method


 Time required
 Complexity of Administration
 Ease of scoring
 Ease of Interpretation
 Cost

Teachers should be familiar with the test; it should not require too much time, and it should be implementable.

8. ETHICS

 Informed consent
 Anonymity and Confidentiality

1. Gathering data
2. Recording Data
3. Reporting Data

ETHICS IN ASSESSMENT – "RIGHT AND WRONG"


 Conforming to the standards of conduct of a given profession or group
 Ethical issues that may be raised
1. Possible harm to the participants.
2. Confidentiality.
3. Presence of concealment or deception.
4. Temptation to assist students.

MODULE 3 – DEVELOPMENT OF CLASSROOM TOOLS FOR MEASURING


KNOWLEDGE AND UNDERSTANDING

DIFFERENT TYPES OF TESTS


MAIN POINTS OF COMPARISON FOR TYPES OF TESTS

Purpose

Psychological
 Aims to measure students' intelligence or mental ability to a large degree without reference to what the student has learned
 Measures the intangible characteristics of an individual (e.g., aptitude tests, personality tests, intelligence tests)

Educational
 Aims to measure the result of instruction and learning (e.g., performance tests)

Scope of Content

Survey
 Covers a broad range of objectives
 Measures general achievement in certain subjects
 Constructed by trained professionals

Mastery
 Covers a specific objective
 Measures fundamental skills and abilities
 Typically constructed by the teacher

Interpretation

Norm-Referenced
 Result is interpreted by comparing one student's performance with other students' performance
 Some will really pass
 There is competition for a limited percentage of high scores
 Describes a pupil's performance compared to others

Criterion-Referenced
 Result is interpreted by comparing a student's performance against a predefined standard
 All or none may pass
 There is no competition for a limited percentage of high scores
 Describes a pupil's mastery of course objectives

Language Mode

Verbal
 Words are used by students in attaching meaning to or responding to test items

Non-verbal
 Students do not use words in attaching meaning to or in responding to test items (e.g., graphs, numbers, 3-D objects)

Construction

Standardized
 Constructed by a professional item writer
 Covers a broad range of content covered in a subject area
 Uses mainly multiple-choice items
 Items written are screened, and the best items are chosen for the final instrument
 Can be scored by a machine
 Interpretation of results is usually norm-referenced

Informal
 Constructed by a classroom teacher
 Covers a narrow range of content
 Various types of items are used
 Teacher picks or writes items as needed for the test
 Scored manually by the teacher
 Interpretation is usually criterion-referenced

Manner of Administration

Individual
 Mostly given orally or requires actual demonstration of skill
 One-on-one situation, thus many opportunities for clinical observation
 Chance to follow up an examinee's response in order to clarify or comprehend it more clearly

Group
 A paper-and-pen test
 Loss of rapport, insight, and knowledge about each examinee
 The same amount of time needed to gather information from one student can gather information from the whole group

Effect of Biases

Objective
 Scorer's personal judgment does not affect the scoring
 Worded so that only one answer is acceptable
 Little or no disagreement on what is the correct answer

Subjective
 Affected by the scorer's personal opinions, biases, and judgment
 Several answers are possible
 Disagreement on what is the correct answer is possible

Time Limit and Level of Difficulty

Power
 Consists of a series of items arranged in ascending order of difficulty
 Measures a student's ability to answer more and more difficult items

Speed
 Consists of items approximately equal in difficulty
 Measures a student's speed or rate and accuracy in responding

Format

Selective
 There are choices for the answer
 Multiple choice, true or false, matching type
 Can be answered quickly
 Prone to guessing
 Time-consuming to construct

Supply
 There are no choices for the answer
 Short answer, completion, restricted or extended essay
 May require a longer time to answer
 Less chance of guessing, but prone to bluffing
 Time-consuming to answer and score
TYPES OF TESTS ACCORDING TO FORMAT

1. Selective Type – provides choices for the answer
a. Multiple Choice – consists of a stem, which describes the problem, and three or more alternatives which give the suggested solutions. The incorrect alternatives are called distracters.
b. True-False or Alternative Response – consists of a declarative statement that one has to mark true or false, right or wrong, correct or incorrect, yes or no, fact or opinion, and the like.
c. Matching Type – consists of two parallel columns: Column A, the column of premises from which a match is sought; Column B, the column of responses from which the selection is made.
2. Supply Test
a. Short Answer – uses a direct question that can be answered by a word, phrase, number, or symbol
b. Completion Test – consists of an incomplete statement
3. Essay Test
a. Restricted Response – limits the content of the response by restricting the scope of the topic
b. Extended Response – allows the students to select any factual information that they think is pertinent and to organize their answers in accordance with their best judgment
Projective Test
 A psychological test that uses images in order to evoke responses from a subject and
reveal hidden aspects of the subject’s mental life

 These were developed in an attempt to eliminate some of the major problems
inherent in the use of self-report measures, such as the tendency of some
respondents to give "socially desirable" responses.

Important Projective Techniques


1. Word Association Test – An individual is given a clue or hint and asked to respond
with the first thing that comes to mind.
2. Completion Test – The respondents are asked to complete an incomplete sentence
or story. The completion will reflect their attitude and state of mind.
3. Construction Techniques (Thematic Apperception Test) – More or less like a
completion test. Respondents are given a picture and asked to write a story about it. The
initial structure is limited and not detailed, unlike the completion test. For example, two
cartoons are given and a dialogue is to be written.
4. Expression Techniques – People are asked to express the feelings or attitudes of
other people.

GUIDELINES FOR CONSTRUCTING TEST ITEMS


When to Use Essay Tests
Essays are appropriate when:
1. the group to be tested is SMALL and the test is NOT TO BE USED again;
2. you wish to encourage and reward the development of students' SKILL IN
WRITING;
3. you are more interested in exploring the students' ATTITUDES than in
measuring their academic achievement;
4. you are more confident of your ability as a critical and fair reader than as an
imaginative writer of good objective test items.

When to Use Objective Test Items

Objective test items are especially appropriate when:


1. The group to be tested is LARGE and the test may be REUSED;
2. HIGHLY RELIABLE TEST SCORES must be obtained as efficiently as possible;
3. IMPARTIALITY of evaluation, ABSOLUTE FAIRNESS, and FREEDOM from
possible test SCORING INFLUENCES (e.g., fatigue, lack of anonymity) are essential;
4. You are more confident of your ability to express objective test items clearly
than of your ability to judge essay test answers correctly;
5. There is more PRESSURE FOR SPEEDY REPORTING OF SCORES than for speedy
test preparation.

Multiple Choice Items


 It consists of:
1. Stem – which identifies the question or problem
2. Response alternatives or Options
3. Correct answer
Example:
Which of the following is a chemical change? (STEM)
Alternatives:
a. Evaporation of alcohol
b. Freezing of water
c. Burning of oil
d. Melting of wax
Advantages of Using Multiple Choice Items
Multiple choice items can provide:
1. Versatility in measuring all levels of cognitive ability
2. Highly reliable test scores
3. Scoring efficiency and accuracy
4. Objective measurement of student achievement or ability
5. A wide sampling of content or objectives
6. A reduced guessing factor when compared to true-false items
7. Different response alternatives which can provide diagnostic feedback.

Limitations of Multiple Choice Items


1. Difficult and time consuming to construct
2. Lead a teacher to favour simple recall of facts
3. Place a high degree of dependence on students' reading ability and the teacher's
writing ability
SUGGESTIONS FOR WRITING MULTIPLE CHOICE ITEMS
1. When possible, state the stem as a direct question rather than as an incomplete
statement.
Poor: Alloys are ordinarily produced by…
Better: How are alloys ordinarily produced?
2. Present a definite, explicit singular question or problem in the stem.
Poor: Psychology…
Better: The science of mind and behaviour is called…
3. Eliminate excessive verbiage or irrelevant information from the stem.
Poor: While ironing her formal polo shirt, June burned her hand accidentally on
the hot iron. This was due to a heat transfer because…
Better: Which of the following ways of heat transfer explains why June’s hand
was burned after she touched a hot iron?
4. Include in the stem any word(s) that might otherwise be repeated in each
alternative.
Poor:
In national elections in the US, the President is officially
a. Chosen by the people
b. Chosen by the electoral college
c. Chosen by members of the Congress
d. Chosen by the House of Representatives
Better:
In national elections in the US, the President is officially chosen by
a. the people
b. the electoral college
c. members of the Congress
d. the House of Representatives
5. Use negatively stated questions sparingly. When used, underline and/or
capitalize the negative word.
Poor: Which of the following is not cited as an accomplishment of the Arroyo
administration?
Better: Which of the following is NOT cited as an accomplishment of the Arroyo
administration?
6. Make all alternatives plausible and attractive to the less knowledgeable or skillful
student.
What process is most nearly opposite of photosynthesis?
Poor Better
a. Digestion a. Digestion
b. Relaxation b. Assimilation
c. Respiration c. Respiration
d. Exertion d. Catabolism
7. Make the alternatives grammatically parallel with each other and consistent with
the stem.
Poor: What would advance the application of atomic discoveries to medicine?
a. Standardized techniques for treatment of patients
b. Train the average doctor to apply the radioactive treatments
c. Remove the restriction of the use of radioactive substances
d. Establishing hospital staffed by highly trained radioactive therapy
specialist.
Better: What would advance the application of atomic discoveries to medicine?
a. Development of standardized techniques for treatment of patients
b. Removal of restriction on the use of radioactive substances
c. Addition of trained radioactive therapy specialist to hospital staffs
d. Training of the average doctor in application of radioactive treatments.
8. Make the alternatives mutually exclusive.
Poor: The daily minimum required amount of milk that a 10-year old should
drink is
a. 1-2 glasses
b. 2-3 glasses*
c. 3-4 glasses*
d. At least 4 glasses
Better: What is the daily minimum required amount of milk a 10-year old child
should drink?
a. 1 glass
b. 2 glasses
c. 3 glasses
d. 4 glasses
9. When possible, present alternatives in some logical order (chronological, most to
least, alphabetical).
At 7 a.m. two trucks leave a diner and travel north. One truck averages 42 miles
per hour and the other truck averages 38 miles per hour. At what time will they be 24 miles apart?
Undesirable: a. 6 p.m.   b. 9 a.m.   c. 1 a.m.   d. 1 p.m.   e. 6 a.m.
Desirable:   a. 1 a.m.   b. 6 a.m.   c. 9 a.m.   d. 1 p.m.   e. 6 p.m.
10. Be sure there is only one correct or best response to the item.
Poor: The two most desired characteristics in a classroom test are validity and
a. Precision
b. Reliability*
c. Objectivity
d. Consistency*
Best: The two most desired characteristics in a classroom test are validity and
a. Precision
b. Reliability*
c. Objectivity
d. Standardization
11. Make alternatives approximately equal in length.
Poor: The most general cause of low individual incomes in the US is
a. Lack of valuable productive services to sell*
b. Unwillingness to work
c. Automation
d. Inflation
Better: What is the most general cause of low individual incomes in the US?
a. A lack of valuable productive services to sell*
b. The population’s overall unwillingness to work
c. The nation’s increased reliance on automation
d. An increasing national level of inflation.
12. Avoid irrelevant clues, such as grammatical structure, well-known verbal
associations or connections between stem and answer.
Poor: (grammatical clue) A chain of islands is called an
a. Archipelago
b. Peninsula
c. Continent
d. Isthmus
Poor: (verbal association) The reliability of a test can be estimated by a
coefficient of
a. Measurement
b. Correlation*
c. Testing
d. Error
Poor: (connection between stem and answer) The height to which a water
dam is built depends on
a. The length of the reservoir behind the dam.
b. The volume of the water behind the dam.
c. The height of water behind the dam.*
d. The strength of the reinforcement of the wall.
13. Use at least four alternatives for each item to lower the probability of getting
the item correctly by guessing.
14. Randomly distribute the correct responses among the alternative positions
throughout the test, so that alternatives a, b, c, d, and e appear as the correct response in
approximately equal proportions.
15. Use the alternative NONE OF THE ABOVE and ALL OF THE ABOVE sparingly.
When used, such alternatives should occasionally be used as the correct response.
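As a side check on the ordering example in suggestion 9 (the two trucks), a few lines of arithmetic confirm that the keyed alternative is 1 p.m. whichever way the options are ordered:

```python
# Both trucks leave at 7 a.m. and head north, at 42 mph and 38 mph.
separation_rate = 42 - 38            # the gap grows by 4 miles per hour
hours_needed = 24 / separation_rate  # 6 hours to reach 24 miles apart
clock_hour = 7 + hours_needed        # 13:00 in 24-hour time
print(f"{int(clock_hour) - 12} p.m.")  # -> 1 p.m.
```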
True-False Test Items
True-false test items are typically used to measure the ability to identify whether
statements of fact are correct. The basic format is simply a declarative statement that the
student must judge as true or false. A modification of the basic form requires the student
to respond “yes” or “no”, or “agree” or “disagree.”
Three Forms:
1. Simple – consists of only two choices
2. Complex – consists of more than two choices
3. Compound – two choices plus a conditional completion response
Examples:
Simple: The acquisition of morality is a developmental process. True False
Complex: The acquisition of morality is a developmental process. True False Opinion
Compound: The acquisition of morality is a developmental process. True False
If the statement is false, what makes it false?

Advantages of True-False Items


True-false items can provide:
1. The widest sampling of content or objectives per unit of testing time
2. Scoring efficiency and accuracy
3. Versatility in measuring all levels of cognitive ability
4. Highly reliable test scores; and
5. An objective measurement of student achievement or ability.
Limitations of True-False Items
1. Incorporate an extremely high guessing factor.
2. Can often lead the teacher to write ambiguous statements due to the difficulty
of writing statements which are unequivocally true or false.
3. Do not discriminate between students of varying ability as well as other item types do.
4. Can often include more irrelevant clues than do other item types.
5. Can often lead a teacher to favour testing of trivial knowledge.

Suggestions for Writing True-False Items (Payne, 1984)


1. Base true-false items upon statements that are absolutely true or false, without
qualifications or exceptions.
Poor: Nearsightedness is hereditary in origin.
Better: Geneticists and eye specialists believe that the predisposition to
nearsightedness is hereditary.
2. Express the item statement as simply and as clearly as possible.
Poor: When you see a highway with a marker that reads: “Interstate 80,” you
know that the construction and upkeep of that road is built and maintained by the local and
national government.
Better: The construction and maintenance of the interstate highways are
provided by both local and national government.
3. Express a single idea in each test item.
Poor: Water will boil at a higher temperature if the atmospheric pressure on its
surface is increased and more heat is applied to the container.
Better: Water will boil at a higher temperature if the atmospheric pressure on its
surface is increased; or water will boil at a higher temperature if more heat is applied to the
container.
4. Include enough background information and qualifications so that the ability to
respond correctly to the item does not depend on some special, uncommon knowledge.
Poor: The second principle of education is that the individual gathers
knowledge.
Better: According to John Dewey, the second principle of education is that the
individual gathers knowledge.
5. Avoid lifting statements directly from the text lecture or other materials so that
memory alone will not permit a correct answer.
Poor: For every action there is an opposite or equal reaction.
Better: If you were to stand in a canoe and throw a life jacket forward to another
canoe, chances are your canoe will jerk backward.
6. Avoid using negatively stated item statements.
Poor: The Supreme Court is not composed of nine justices.
Better: The Supreme Court is composed of nine justices
7. Avoid the use of unfamiliar vocabulary.
Poor: According to some politicians, the raison d’etre for capital punishment is
retribution.
Better: According to some politicians, justification for the existence of capital
punishment is retribution.
8. Avoid the use of specific determiners which would permit a test-wise but
unprepared examinee to respond correctly. Specific determiners refer to sweeping terms like
always, all, none, never, impossible, and inevitable. Statements including such terms are likely to be
false. On the other hand, statements using qualifying determiners such as usually, sometimes, and
often are likely to be true. When statements require specific determiners, make sure they
appear in both true and false items.
Poor: All sessions of Congress are called by the President (F)
The Supreme Court is frequently required to rule on the constitutionality
of the law. (T)
The objectives test is generally easier to score than an essay test. (T)
Better: When specific determiners are used, reverse the expected outcomes.
The sum of angles of a triangle is always 180 degrees. (T)
Each molecule of a given compound is chemically the same as every
other molecule of that compound. (T)
The galvanometer is the instrument usually used for the metering of
electrical energy use in a home. (F)
9. False items tend to discriminate more highly than true items. Therefore, use
more false items than true items (but not more than 15% additional false items).
Matching Test Items
In general, a matching item consists of a column of stimuli presented on the left side
of the exam page and a column of responses placed on the right side of the page. Students are
required to match each stimulus with its associated response.
Advantages of Using Matching Test Items
1. Require short periods of reading and response time, allowing the teacher to cover
more content.
2. Provide objective measurement of student achievement or ability.
3. Provide highly reliable test scores.
4. Provide scoring efficiency and accuracy.
Disadvantages of Using Matching Test Items
1. Have difficulty measuring learning objectives requiring more than simple recall
of information.
2. Are difficult to construct due to the problem of selecting a common set of stimuli
and responses.
Suggestions for Writing Matching Test items
1. Include directions which clearly state the basis for matching the stimuli with the
responses. Explain whether or not the response can be used more than once and indicate
where to write the answer.
Poor: Directions: Match the following.
Better: Directions: On the line to the left of each identifying location or
characteristic in Column I, write the letter of the country in Column II that is best defined.
Each country in Column II may be used more than once.
2. Use only homogeneous material in matching items.
Poor: Directions: Match the following.
1. _______Water A. NaCI
2. _______Discovered Radium B. Fermi
3. _______Salt C. NH3
4. _______Year of the First Nuclear Fission by man D. 1942
5. _______Ammonia E. Curie
Better: Directions: On the line to the left of each compound in column I, write
the letter of the compound’s formula presented in column II. Use each formula once.
Column I Column II
1. _______Water A.H2SO4
2. _______Salt B. HCI
3. _______Ammonia C. NaCI
4. _______Sulfuric Acid D. H2O
E. H2HCI
3. Arrange the list of responses in some systematic order if possible – chronological,
alphabetical.
Directions: On the line to the left of each definition in column I, write the letter of the
defense mechanism in column II that is described. Use each defense mechanism only once.
Column I
_____1. Hunting for reasons to support one's beliefs
_____2. Accepting the values and norms of others as one's own, even if they are contrary to previously held values
_____3. Attributing one's own unacceptable impulses, thoughts, and desires to others
_____4. Ignoring disagreeable situations, thoughts, and desires

Column II (undesirable order): A. Rationalization, B. Identification, C. Projection, D. Introjection, E. Denial of Reality
Column II (desirable, alphabetical order): A. Denial of Reality, B. Identification, C. Introjection, D. Projection, E. Rationalization
4. Avoid grammatical or other clues to correct response.
Poor: Directions: Match the following in order to complete the sentence on the left.
___1. Igneous rocks are formed A. a hardness of 7
___2. The formation of coal requires B. with crystalline rock
___3. A geode is filled C. a metamorphic rock
___4. Feldspar is classified as D. through the solidification of molten rock
Better: Avoid sentence completion due to grammatical clues.
Note:
1. Keep matching items brief, limiting the list of stimuli to under 10
2. Include more responses than stimuli to help prevent answering through the
process of elimination.
3. When possible, reduce the amount of reading time by including only short
phrases or single word in the response list.

Completion Test Items
The completion items require the student to answer a question or to finish an
incomplete statement by filling in a blank with correct word or phrase.
Example:
According to Freud, personality is made up of three major systems, the______,
the________, and the__________.
Advantages of Using Completion Items
Completion items can:
1. Provide a wide sampling of content;
2. Efficiently measure lower levels of cognitive ability;
3. Minimize guessing as compared to multiple choice or true-false items; and
4. Usually provide an objective measure of student achievement or ability
Limitations of Using Completion Items
Completion items:
1. Are difficult to construct so that the desired response is clearly indicated;
2. Have difficulty in measuring learning objectives requiring more than simple recall
of information;
3. Can often include more irrelevant clues than do other item types;
4. Are more time consuming to score when compared to multiple choice or true-
false items; and
5. Are more difficult to score since more than one answer may have to be
considered correct if the item was not properly prepared.
Suggestions for Writing Completion Test Items
1. Omit only significant words from the statement.
Poor: Every atom has a central (core) called a nucleus.
Better: Every atom has a central core called a(n) (nucleus).
2. Do not omit so many words from the statement that the intended meaning is
lost.
Poor: The ___ were to Egypt as the ___ were to Persia as the ___ were to the early
tribes of Israel.
Better: The Pharaohs were to Egypt as the ___ were to Persia as the ___ were to the
early tribes of Israel.
3. Avoid grammatical or other clues to the correct response.
Poor: Most of the United States' libraries are organized according to the
(Dewey) decimal system.
Better: Which organizational system is used by most of the United States’
libraries? (Dewey Decimal)
4. Be sure there is only one correct response.
Poor: Trees which shed their leaves annually are (seed-bearing, common).
Better: Trees which shed their leaves annually are called (deciduous).
5. Make the blanks of equal length.
Poor: In Greek mythology, Vulcan was the son of (Jupiter and Juno).
Better: In Greek mythology, Vulcan was the son of___ and___.
6. When possible, delete words at the end of the statement after the student has
been presented a clearly defined problem.
Poor: (122.5) is the molecular weight of KClO3.
Better: The molecular weight of KClO3 is ___.
7. Avoid lifting statements directly from the text, lecture or other sources.
8. Limit the required response to a single word or phrase.

Essay Test Items
A classroom essay test consists of a small number of questions to which the
student is expected to demonstrate his/her ability to:
a. Recall factual knowledge;
b. Organize this knowledge; and
c. Present the knowledge in a logical, integrated answer to the question.
Classification of Essay Test:
1. Extended-response essay item
2. Limited Response or Short-answer essay item
Example of Extended-Response Essay Item:
Explain the difference between the S-R (Stimulus-Response) and the S-O-R (Stimulus-
Organism-Response) theories of personality. Include in your answer the following:
a. Brief description of both theories
b. Supporters of both theories
c. Research methods used to study each of the two theories (20 pts)
Example of Short-Answer Essay Item:
Identify research methods used to study the (Stimulus-Response) and the S-O-R
(Stimulus-Organism-Response) theories of personality. (10pts)
Advantages of Using Essay Items
Essay items:
1. Are easier and less time consuming to construct than most item types;
2. Provide a means for testing students’ ability to compose an answer and present it
in a logical manner; and
3. Can efficiently measure higher order cognitive objectives – analysis, synthesis,
evaluation.

Limitations of Using Essay Items
Essay Items:
1. Cannot measure a large amount of content or objectives;
2. Generally provide low test score reliability;
3. Require an extensive amount of instructor’s time to read and grade; and
4. Generally do not provide an objective measure of student achievement or ability
(subject to bias on the part of the grader)
Suggestions for Writing the Essay Test Items
1. Prepare essay items that elicit the type of behaviour you want to measure.
Learning Objective: The student will be able to explain how the normal curve
serves as a statistical model.
Poor: Describe a normal curve in terms of symmetry, modality, kurtosis and
skewness.
Better: Briefly explain how the normal curve serves as a statistical model for
estimation and hypothesis testing.
2. Phrase each item so that the student's task is clearly indicated.
Poor: Discuss the economic factors which led to stock market crash of 2008.
Better: Identify the three economic conditions which led to the stock market
crash of 2008. Discuss briefly each condition in correct chronological sequence and in one
paragraph indicate how the three factors were interrelated.
3. Indicate for each item a point value or weight and an estimated time limit for
answering.
Poor: Compare the writing of Bret Harte and Mark Twain in terms of setting,
depth of characterization, and dialogue styles of their main characters.
Better: Compare the writings of Bret Harte and Mark Twain in terms of setting, depth
of characterization, and dialogue styles of their main characters. (10 points; 20 minutes)
4. Ask questions that will elicit responses on which experts could agree that one
answer is better than another.
5. Avoid giving a student a choice among optional items as this greatly reduces the
reliability of the test.
6. It is generally recommended for classroom examinations to administer several
short-answer items rather than only one or two extended-response items.

Guidelines for Grading Essay Items
1. When writing each essay item, simultaneously develop a scoring rubric.
2. To maintain a consistent scoring system and ensure the same criteria are applied
to all assessments, score one essay item across all tests prior to scoring the next item.
3. To reduce the influence of the halo effect, bias and other subconscious factors,
all essay questions should be graded blind to the identity of the student.
4. Due to the subjective nature of essay grading, the score on one essay may be
influenced by the quality of previous essays. To prevent this type of bias, reshuffle the order of
assessments after reading through each item.
Principle 3: Balanced
- A balanced assessment sets targets in all domains of learning (cognitive,
affective, and psychomotor) or domains of intelligence (verbal-linguistic, logical-
mathematical, bodily-kinesthetic, visual-spatial, musical-rhythmic, interpersonal-social,
intrapersonal-introspection, physical world-natural, existential-spiritual).
- A balanced assessment makes use of both traditional and alternative assessment.
Principle 4. Validity
Validity – is a degree to which the assessment instrument measures what it intends
to measure.
 It also refers to the usefulness of the instrument for a given purpose.
 It is the most important criterion of a good assessment instrument
Ways in Establishing Validity
1. Face Validity- is done by examining the physical appearance of the
instrument.
2. Content Validity- is done through a careful and critical examination of the
objectives of assessment so that it reflects the curricular objectives.
3. Criterion-related Validity- is established statistically such that a set of scores
revealed by the measuring instrument IS CORRELATED with the scores obtained in another
EXTERNAL PREDICTOR OR MEASURE.
It has two purposes:
a. Concurrent Validity- describes the present status of the individual by correlating the sets of
scores obtained FROM TWO MEASURES GIVEN CONCURRENTLY.
Example: Relate the reading test result with pupils’ average grades in reading given by the
teacher.
b. Predictive Validity- describes the future performance of an individual by
correlating the sets of scores obtained from TWO MEASURES GIVEN AT A LONGER TIME
INTERVAL.
Example: The entrance examination scores in a test administered to a freshmen class at the
beginning of the school year is correlated with the average grades at the end of the school year.
4. Construct Validity- Validity established by analysing the activities and processes
that correspond to a particular concept; is established statistically by comparing psychological
traits or factors that theoretically influence scores in a test.
a. Convergence validity helps to establish construct validity when you use two different
measurement procedures and research methods (e.g., participant observation and a survey) in
your study to collect data about a construct (e.g., anger, depression, motivation, task
performance).
b. Divergent validity helps to establish construct validity by demonstrating that the construct
you are interested in (e.g., anger) is different from other constructs that might be present in
your study (e.g., depression).

Factors Influencing the Validity of an Assessment Instrument
1. Unclear directions - directions that do not clearly indicate to the students how to
respond to the task and how to record the responses tend to reduce validity.
2. Reading vocabulary and sentence structure too difficult - vocabulary and
sentence structure that are too complicated for the students result in the test measuring
reading comprehension instead, thus altering the meaning of assessment results.
3. Ambiguity - ambiguous statements in assessment tasks contribute to
misinterpretation and confusion. Ambiguity sometimes confuses the better students more
than it does the poor students.
4. Inadequate time limits- time limits that do not provide students with enough
time to consider the tasks and provide thoughtful responses can reduce the validity of
interpretations of results.
5. Overemphasis of easy-to-assess aspects of the domain at the expense of
important but hard-to-assess aspects (construct underrepresentation). It is easy to develop
test questions that assess factual recall and generally harder to develop ones that tap conceptual
understanding or higher-order thinking processes such as the evaluation of competing
positions or arguments. Hence it is important to guard against underrepresentation of tasks
getting at the important but more difficult-to-assess aspects of achievement.
6. Test items inappropriate for the outcomes being measured- attempting to
measure understanding, thinking, skills and other complex types of achievement with test
forms that are appropriate for only measuring factual knowledge will invalidate the results.
7. Poorly constructed test items- test items that unintentionally provide clues to
the answer tend to measure the students’ alertness in detecting clues as well as mastery of
skills or knowledge the test is intended to measure
8. Test too short- if a test is too short to provide a representative sample of the
performance we are interested in its validity will suffer accordingly.
9. Improper arrangement of items- test items are typically arranged in order of
difficulty, with the easiest items first. Placing difficult items first in the test may cause students
to spend too much time on these and prevent them from reaching items they could easily
answer. Improper arrangement may also influence validity by having a detrimental effect on
student motivation.
10. Identifiable pattern of answers - placing correct answers in some systematic
pattern (e.g., T,T,F,F or B,B,B,C,C,C,D,D,D) enables students to guess the answers to some items
more easily, and this lowers validity.
TABLE OF SPECIFICATIONS – TOS

A table of specification is a device for describing test items in terms of the content and the process
dimensions, that is, what a student is expected to know and what he or she is expected to do with that
knowledge. Each item is described by a combination of content and process in the table of specification.

Sample of One way table of specification in Linear Function

Content Number of Class Sessions Number of Items Test Item Distribution

1. Definition of linear function 2 4 1-4

2. Slope of a line 2 4 5-8

3. Graph of linear function 2 4 9-12

4. Equation of linear function 2 4 13-16

5. Standard Forms of a line 3 6 17-22

6. Parallel and perpendicular lines 4 8 23-30

7. Application of linear functions 5 10 31-40

TOTAL 20 40 40

Number of items = (Number of class sessions × Desired total number of items) ÷ Total number of class sessions

Example :

Number of items for the topic "Definition of linear function":

Number of class session= 2

Desired number of items= 40

Total number of class sessions=20


Number of items = (Number of class sessions × Desired total number of items) ÷ Total number of class sessions

= (2 × 40) ÷ 20

Number of items = 4
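The allocation rule can be checked with a few lines of Python; the topic names and the `items_for_topic` helper below are illustrative, not part of the module.

```python
def items_for_topic(sessions, total_sessions, total_items):
    """Allocate test items to a topic in proportion to the class time spent on it."""
    return round(sessions * total_items / total_sessions)

# Session counts from the sample one-way table of specification above.
sessions = {
    "Definition of linear function": 2,
    "Slope of a line": 2,
    "Graph of linear function": 2,
    "Equation of linear function": 2,
    "Standard forms of a line": 3,
    "Parallel and perpendicular lines": 4,
    "Application of linear functions": 5,
}
total_sessions = sum(sessions.values())  # 20 class sessions in all
allocation = {topic: items_for_topic(s, total_sessions, 40)
              for topic, s in sessions.items()}
```

Running the sketch reproduces the table: 4 items for the definition topic, 10 for applications, and 40 items in total.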

Sample of two-way table of specification in Linear Function


Content Class hours Know Comp App Analysis Synthesis Evaluation Total

1. Definition of linear function 2 1 1 1 1 4
2. Slope of a line 2 1 1 1 1 4
3. Graph of linear function 2 1 1 1 1 4
4. Equation of linear function 2 1 1 1 1 4
5. Standard forms of a line 3 1 1 1 1 1 1 6
6. Parallel and perpendicular lines 4 1 2 1 2 8
7. Application of linear functions 5 1 1 3 1 3 10

TOTAL 20 4 6 8 8 7 7 40

MODULE 4: DESCRIPTION OF ASSESSMENT DATA

TEST APPRAISAL

ITEM ANALYSIS

Item analysis refers to the process of examining the student’s responses to each item in the test.
According to Abubakar S. Asaad and William M. Hailaya (Measurement and Evaluation Concepts &
Principles) Rex Bookstore (2004 Edition), there are two characteristics of an item. These are desirable
and undesirable characteristics. An item that has desirable characteristics can be retained for subsequent
use, while one with undesirable characteristics is either revised or rejected.

The following criteria determine the desirability or undesirability of an item:


a. Difficulty of an item
b. Discriminating power of an item
c. Measures of attractiveness

Difficulty index refers to the proportion of the number of students in the upper and lower groups who
answered an item correctly. In a classroom achievement test, the desired indices of difficulty are not lower
than 0.20 nor higher than 0.80, with the average index of difficulty ranging from 0.30 or 0.40 to a maximum of 0.60.

DF = PUG + PLG
2

PUG = proportion of the upper group who got an item right


PLG = proportion of the lower group who got an item right

Level of Difficulty of an Item

Index Range Difficulty Level

0.00-0.20 Very difficult

0.21-0.40 Difficult

0.41-0.60 Moderately Difficult

0.61-0.80 Easy

0.81-1.00 Very Easy
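The difficulty formula and the table of levels can be combined in a short Python sketch; the function names are illustrative.

```python
def difficulty_index(p_upper, p_lower):
    """DF = (PUG + PLG) / 2: average proportion of the two groups answering correctly."""
    return (p_upper + p_lower) / 2

def difficulty_level(df):
    """Verbal interpretation following the index ranges tabulated above."""
    if df <= 0.20:
        return "Very difficult"
    elif df <= 0.40:
        return "Difficult"
    elif df <= 0.60:
        return "Moderately difficult"
    elif df <= 0.80:
        return "Easy"
    return "Very easy"

# Hypothetical item: 6 of 22 upper-group and 4 of 22 lower-group students correct.
df = difficulty_index(6 / 22, 4 / 22)
```

For this hypothetical item, DF is about 0.23, which the table classifies as difficult.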

Index of Discrimination

Discrimination Index is the difference between the proportion of high-performing students who got an
item right and the proportion of low-performing students who got it right. The high- and low-performing
groups are usually defined as the upper 27% and the lower 27% of the students based on the total
examination score. Discrimination is classified as positive discrimination if the proportion of students
who got an item right in the upper-performing group is greater than that in the lower-performing group,
negative discrimination if the proportion in the lower-performing group is greater, and zero
discrimination if the proportions in the upper- and lower-performing groups are equal.

Discrimination Index Item Evaluation

0.40 and up Very good item


0.30-0.39 Reasonably good item but possibly subject to improvement

0.20-0.29 Marginal, usually needing and being subject to improvement

Below 0.19 Poor item, to be rejected or improved by revision

Maximum Discrimination is the sum of the proportions of the upper and lower groups who answered the
item correctly. The possible maximum discrimination occurs when half or less of the combined upper and
lower groups answered the item correctly.

Discriminating Efficiency is the index of discrimination divided by the maximum discrimination.

PUG = proportion of the upper group who got an item right

PLG= proportion of the lower group who got an item right

D i = discrimination index

DM – Maximum discrimination

DE = Discriminating Efficiency

Formula:

Di = PUG – PLG

DE = Di
DM

DM= PUG + PLG

Example: Eighty students took an examination in Algebra. For item number 6, six students in the upper
group and four students in the lower group got the correct answer. Find the
discriminating efficiency.

Given:

Number of students took the exam = 80

27% of 80 = 21.6 or 22, which means that there are 22 students in the upper performing group and 22
students in the lower performing group.

P UG = 6/22 = 27%
P LG = 4/22 = 18%

Di = PUG- PLG

= 27%- 18%

Di= 9%

DM = PUG +PLG

= 27% + 18%

DM= 45%

DE = Di/DM

= .09/.45

DE = 0.20 or 20%

This can be interpreted as follows: on the average, the item is discriminating at 20% of the potential of an
item of its difficulty.
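The worked example can be reproduced in a few lines of Python. Note that the module rounds the two proportions to whole percentages (27% and 18%); exact arithmetic gives the same discriminating efficiency of 20%.

```python
# 80 examinees; 27% of 80 gives 22 students in each of the upper and lower groups.
# For item 6, six upper-group and four lower-group students answered correctly.
p_ug = 6 / 22   # proportion of the upper group who got the item right
p_lg = 4 / 22   # proportion of the lower group who got the item right

di = p_ug - p_lg   # discrimination index (positive: favors the upper group)
dm = p_ug + p_lg   # maximum discrimination
de = di / dm       # discriminating efficiency
```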

Measures of Attractiveness

To measure the attractiveness of the incorrect options (distracters) in multiple-choice tests, we count the
number of students who selected each incorrect option in both the upper and lower groups. An incorrect
option is said to be an effective distracter if more students in the lower group than in the upper group
chose it.

Steps of Item Analysis

1. Rank the scores of the students from highest score to lowest score.
2. Select 27% of the papers within the upper performing group and 27% of the papers within the
lower performing group.
3. Set aside the remaining 46% of the papers because they will not be used for item analysis.
4. Tabulate the number of students in the upper group and lower group who selected each
alternative.
5. Compute the difficulty of each item
6. Compute the discriminating powers of each item
7. Evaluate the effectiveness of the distracters
MODULE 5: INTERPRETATION AND UTILIZATION OF TEST RESULTS

CRITERION-REFERENCED INTERPRETATION VS. NORM-REFERENCED INTERPRETATION OF TEST RESULTS

STATISTICAL ORGANIZATION OF TEST SCORES

We shall discuss the different statistical techniques used in describing and analyzing test results.

1. Measures of Central Tendency (Averages)
2. Measures of Variability (Spread of Scores)
3. Measures of Relationship (Correlation)
4. Skewness

A measure of central tendency is a single value that is used to identify the center of the data; it is thought
of as the typical value in a set of scores. It tends to lie within the center when the scores are arranged from
lowest to highest or vice versa. There are three commonly used measures of central tendency: the mean,
the median, and the mode.

The Mean

The mean is the most common measure of center and is also known as the arithmetic average.

Sample Mean = ∑x
n

∑= sum of the scores

X= individual scores

n = number of scores

Steps in solving for the mean value using raw scores:
1. Get the sum of all the scores in the distribution.
2. Identify the number of scores (n).
3. Substitute into the given formula and solve for the mean value.

Example: Find the mean of the scores of students in algebra quiz

(x) scores in algebra

45
35
48
60
44
39
47
55
58
54
∑x = 485
n= 10

Mean = ∑x
n
= 485÷ 10
Mean = 48.5
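The same computation in Python, as a quick arithmetic check:

```python
# Scores of the 10 students in the algebra quiz.
scores = [45, 35, 48, 60, 44, 39, 47, 55, 58, 54]

mean = sum(scores) / len(scores)  # sum of the scores divided by the number of scores
```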

Properties of Mean
1. Easy to compute.
2. It may or may not be an actual observation in the data set.
3. It can be subjected to numerous mathematical computations.
4. Most widely used.
5. Every score contributes to the mean value.
6. It is easily affected by extreme values.
7. Applied to interval-level data.

The Median
The median is the point that divides the scores in a distribution into two equal parts when the scores are
arranged according to magnitude, that is, from lowest to highest or from highest to lowest. If the number
of scores is odd, the median is the middle score. If the number of scores is even, the median is the average
of the two middle scores.

Example: 1. Find the median of the scores of 10 students in algebra quiz.

(x) scores of students in algebra


45
35
48
60
44
39
47
55
58
54

First, arrange the scores from lowest to highest and find the average of the two middlemost scores, since
the number of cases is even.
35
39
44
45
47
48
54
55
58
60

Median = 47 + 48
2
= 47.5 is the median score

50% of the scores in the distribution fall below 47.5


Example 2. Find the median of the scores of 9 students in algebra quiz

(x) scores of students in algebra


35
39
44
45
47
48
54
55
58

The median value is the 5th score which is 47. Which means that 50% of the scores fall below 47.
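Both examples can be checked with a small helper that handles the even and odd cases; the `median` function below is an illustrative sketch.

```python
def median(values):
    """Middle score of the sorted list; average of the two middle scores if n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

even_case = median([45, 35, 48, 60, 44, 39, 47, 55, 58, 54])  # Example 1: 10 scores
odd_case = median([35, 39, 44, 45, 47, 48, 54, 55, 58])       # Example 2: 9 scores
```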

Properties of Median

1. It is not affected by extreme values.
2. It is applied to ordinal-level data.
3. It is the middlemost score in the distribution.
4. Most appropriate when there are extreme scores.

The Mode

The mode refers to the score or scores that occur most often in the distribution. There are three
classifications of mode: a) unimodal, a distribution that consists of only one mode; b) bimodal, a
distribution of scores that consists of two modes; and c) multimodal, a score distribution that consists of
more than two modes.

Properties of Mode

1. It is the score/s occurred most frequently


2. Nominal average
3. It can be used for qualitative and quantitative data
4. Not affected by extreme values
5. It may not exist

Example 1. Find the mode of the scores of students in algebra quiz: 34,36,45,65,34,45,55,61,34,46

Mode= 34 , because it appeared three times. The distribution is called unimodal.


Example 2. Find the mode of the scores of students in algebra quiz: 34,36,45,61,34,45,55,61,34,45

Mode = 34 and 45, because both appeared three times. The distribution is called bimodal
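A short sketch that returns every most-frequent score covers the unimodal and bimodal examples above; the `modes` helper is illustrative.

```python
from collections import Counter

def modes(values):
    """All scores occurring with the highest frequency (one, two, or more modes)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

m1 = modes([34, 36, 45, 65, 34, 45, 55, 61, 34, 46])  # Example 1: unimodal
m2 = modes([34, 36, 45, 61, 34, 45, 55, 61, 34, 45])  # Example 2: bimodal
```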

Measures of Variability

A measure of variability is a single value that is used to describe the spread of the scores in a
distribution, that is, above or below the measure of central tendency. There are three commonly used
measures of variability: the range, the quartile deviation, and the standard deviation.

The Range

Range is the difference between highest and lowest score in the data set.

R=HS-LS

Properties of Range

1. Simplest and crudest measure of variation.
2. A rough measure of variation.
3. The smaller the value, the closer the scores are to each other; the higher the value, the more
scattered the scores are.
4. The value fluctuates easily: a change in either the highest or the lowest score changes the
value of the range.

Example: The scores of 10 students in Mathematics and Science are shown below. Find the range of each.
Which subject has greater variability?

Mathematics Science

35 35

33 40

45 25

55 47

62 55

34 35

54 45
36 57

47 39

40 52

Mathematics Science

HS = 62 HS =57

LS= 33 LS= 25

R = HS-LS R= HS-LS

R= 62-33 R= 57-25

R= 29 R= 32

Based on the computed values of the range, the scores in Science have greater variability; that is, the
scores in Science are more scattered than the scores in Mathematics.
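The two ranges can be checked directly in Python:

```python
math_scores = [35, 33, 45, 55, 62, 34, 54, 36, 47, 40]
science_scores = [35, 40, 25, 47, 55, 35, 45, 57, 39, 52]

math_range = max(math_scores) - min(math_scores)           # 62 - 33
science_range = max(science_scores) - min(science_scores)  # 57 - 25
```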

The Quartile Deviation

Quartile deviation is half of the difference between the third quartile (Q3) and the first quartile (Q1). It is
based on the range of the middle 50% of the scores, instead of the range of the entire distribution.
In symbols:

QD = (Q3 - Q1) / 2

QD= quartile deviation

Q3= third quartile value

Q1= first quartile value

Example: In the scores of 50 students, Q3 = 50.25 and Q1 = 25.45. Find the QD.

QD = Q3-Q1
2

=50.25 – 25.45
2

QD= 12.4
The value QD = 12.4 indicates the distance we need to go above or below the median to include
approximately the middle 50% of the scores.
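The same computation as a small Python helper (the function name is illustrative):

```python
def quartile_deviation(q3, q1):
    """Half the distance between the third and first quartiles."""
    return (q3 - q1) / 2

qd = quartile_deviation(50.25, 25.45)
```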

The standard deviation

The standard deviation is the most important and useful measure of variation; it is the square root of the
variance. It is an average of the degree to which each score in the distribution deviates from the mean
value. It is a more stable measure of variation because it involves all the scores in a distribution, rather
than only the highest and lowest scores (range) or the quartiles (quartile deviation).

SD = √∑( x-mean)2
n-1

where ,x = individual score

n= number of score in a distribution

Example: 1. Find the standard deviation of scores of 10 students in algebra quiz. Using the given data
below.

X (x-mean)2

45 12.25

35 182.25

48 0.25

60 132.25

44 20.25

39 90.25

47 2.25

55 42.25

58 90.25

54 30.25

∑x= 485 ∑(x-mean)2 = 602.5

N= 10
Mean = ∑x

= 485

10

Mean= 48.5

SD= √∑(x-mean)2
n-1

SD= √ 602.5
10-1

SD= √ 66.944444

SD = 8.18; this means that, on the average, the amount by which the scores deviate from the mean
value of 48.5 is 8.18.
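The whole procedure (mean, squared deviations, division by n - 1, square root) can be collected in one illustrative helper, which also reproduces the Mathematics and Science results of Example 2 below.

```python
def sample_sd(scores):
    """Sample standard deviation: square root of the sum of squared
    deviations from the mean, divided by n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations
    return (ss / (n - 1)) ** 0.5

sd_algebra = sample_sd([45, 35, 48, 60, 44, 39, 47, 55, 58, 54])   # Example 1
sd_math = sample_sd([35, 33, 45, 55, 62, 34, 54, 36, 47, 40])      # Example 2, Mathematics
sd_science = sample_sd([35, 40, 25, 47, 55, 35, 45, 57, 39, 52])   # Example 2, Science
```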

Example 2: Find the standard deviation of the scores of the 10 students below. Which subject has
greater variability?

Mathematics Science

35 35

33 40

45 25

55 47

62 55
34 35

54 45

36 57

47 39

40 52

Solve for the standard deviation of the scores in mathematics

Mathematics (x) (x-mean)2

35 82.81

33 123.21

45 0.81

55 118.81

62 320.41

34 102.01

54 98.01

36 65.61

47 8.41

40 16.81

∑x = 441 ∑(x-mean)2 = 936.9

Mean = 44.1

SD= √∑(x-mean)2
n-1

= √(936.9 ÷ 9)

= √104.1

SD = 10.20 for the Mathematics subject

Solve for the standard deviation of the score in science

Science (x) (x-mean)2

35 64

40 9

25 324

47 16

55 144

35 64

45 4

57 196

39 16

52 81

∑x= 430 ∑(x-mean)2= 918

Mean =430
10
Mean= 43

SD= √∑(x-mean)2
n-1

= √ 918
10-1
=√ 102
SD= 10.10 for science subject
The standard deviation for the Mathematics subject is 10.20 and the standard deviation for the Science
subject is 10.10, which means that the Mathematics scores have greater variability. In other words, the
scores in Mathematics are more scattered than those in Science.

Interpretation of Standard Deviation

When the value of the standard deviation is large, the scores will, on the average, be far from the mean.
On the other hand, if the value of the standard deviation is small, the scores will, on the average, be close
to the mean.

Coefficient of Variation

Coefficient of variation is a measure of relative variation expressed as a percentage of the arithmetic
mean. It is used to compare the variability of two or more sets of data even when the observations are
expressed in different units of measurement. The coefficient of variation can be solved using the formula:

CV = (SD ÷ Mean) x 100%

The lower the value of the coefficient of variation, the closer the overall data are to the mean, or the more
homogeneous the performance of the group.

Example: A study showed the performance of two groups, A and B, on a certain test given by a
researcher. Which of the two groups has the more homogeneous performance?

Group Mean Standard deviation

A 87 8.5

B 90 10.25

CV Group A= standard deviation x 100%


Mean

= 8.5 x 100%
87
CV Group A=9.77%

CV GroupB= standard deviation x 100%


Mean
= 10.25 x 100%
90
CV Group B=11.39%

The CV of Group A is 9.77% and the CV of Group B is 11.39%, which means that Group A has the more
homogeneous performance.
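The comparison can be verified with a short sketch (variable names are illustrative):

```python
def coefficient_of_variation(mean, sd):
    """Relative spread expressed as a percentage of the mean."""
    return sd / mean * 100

cv_a = coefficient_of_variation(87, 8.5)    # Group A
cv_b = coefficient_of_variation(90, 10.25)  # Group B
more_homogeneous = "A" if cv_a < cv_b else "B"  # lower CV = more homogeneous
```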

Percentile Rank
The percentile rank of a score is the percentage of scores in the frequency distribution which are lower,
that is, the percentage of examinees in the norm group who scored below the score of interest. Percentile
ranks are commonly used to clarify the interpretation of scores on standardized tests.

Z- SCORE
A z-score (also known as a standard score) measures how many standard deviations an observation is
above or below the mean. A positive z-score gives the number of standard deviations a score is above the
mean, and a negative z-score gives the number of standard deviations a score is below the mean.

The z-score can be computed using the formula

Z = (x - µ) ÷ σ     (for a population)

Z = (x - mean) ÷ SD     (for a sample)

where:

x = the raw score

σ = the standard deviation of the population

µ = the mean of the population

SD = the standard deviation of the sample

EXAMPLE:

James Mark’s examination results in the three subjects are as follows:

Subject Mean Standard deviation James Mark’s Grade

Math Analysis 88 10 95

Natural Science 85 5 80

Labor Management 92 7.5 94


In what subject did James Mark perform best? In what subject did he perform worst?

Z (Math Analysis) = (95 - 88) / 10 = 0.70

Z (Natural Science) = (80 - 85) / 5 = -1.00

Z (Labor Management) = (94 - 92) / 7.5 = 0.27

James Mark’s grade in Math Analysis was 0.70 standard deviations above the mean of the Math Analysis
grades, while in Natural Science he was 1.0 standard deviation below the mean of the Natural Science
grades. His grade in Labor Management was 0.27 standard deviations above the mean of the Labor
Management grades. Comparing the z-scores, James Mark performed best in Math Analysis and poorest in
Natural Science relative to the group’s performance.
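The three z-scores can be checked with a short sketch using the sample formula above:

```python
# z = (x - mean) / SD: the number of standard deviations
# a raw score lies above (+) or below (-) the mean.
def z_score(x, mean, sd):
    return (x - mean) / sd

print(z_score(95, 88, 10))             # 0.7  (Math Analysis)
print(z_score(80, 85, 5))              # -1.0 (Natural Science)
print(round(z_score(94, 92, 7.5), 2))  # 0.27 (Labor Management)
```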

T-score

A T-score can be obtained by multiplying the z-score by 10 and adding the product to 50. In symbols,
T-score = 10z + 50.

Using the same exercise, compute the T-score of James Mark in Math Analysis, Natural Science and
Labor Management

T-score (Math Analysis) = 10(0.70) + 50 = 57

T-score (Natural Science) = 10(-1) + 50 = 40

T-score (Labor Management) = 10(0.27) + 50 = 52.7

Since the highest T-score is in Math Analysis (57), we can conclude that James Mark performed better in
Math Analysis than in Natural Science and Labor Management.
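The same conversion in code (the z-values are the ones from the previous example):

```python
# T-score = 10z + 50: shifts z-scores onto a scale with
# mean 50 and standard deviation 10, avoiding negatives.
def t_score(z):
    return 10 * z + 50

print(round(t_score(0.7)))         # 57 (Math Analysis)
print(round(t_score(-1)))          # 40 (Natural Science)
print(round(t_score(2 / 7.5), 1))  # 52.7 (Labor Management)
```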

Stanine

Stanine, also known as standard nine, is a simple type of normalized standard score that illustrates the
process of normalization. Stanines are single-digit scores ranging from 1 to 9.

The distribution of raw scores is divided into nine parts:

Stanine               1    2    3     4     5     6     7     8    9
Percent in stanine    4%   7%   12%   17%   20%   17%   12%   7%   4%
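One way to sketch the table is to map a percentile rank to its stanine via the cumulative percentages of the 4-7-12-17-20-17-12-7-4 distribution (4, 11, 23, 40, 60, 77, 89, 96, 100):

```python
# Stanine from percentile rank, using cumulative cutoffs
# of the 4-7-12-17-20-17-12-7-4 distribution.
def stanine(percentile):
    cutoffs = [4, 11, 23, 40, 60, 77, 89, 96, 100]
    for s, cut in enumerate(cutoffs, start=1):
        if percentile <= cut:
            return s

print(stanine(50))  # 5: the middle 20% band
print(stanine(97))  # 9: the top 4%
```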

Skewness

Skewness describes the degree of departure of the distribution of the data from symmetry.

The degree of skewness is measured by the coefficient of skewness, denoted SK and computed as

SK = 3(Mean - Median) / SD

The normal curve is a symmetrical bell-shaped curve; its tails are continuous and asymptotic, and its mean,
median and mode are equal. The scores are normally distributed if the computed value of SK = 0.


Positively skewed: the curve is skewed to the right; it has a long tail extending off to the right but a
short tail to the left. It indicates the presence of a small proportion of relatively large extreme values (SK > 0).
When the computed value of SK is positive, most of the students’ scores are very low, meaning that
they performed poorly on the examination.

Negatively skewed: the distribution is skewed to the left; it has a long tail extending off to the left but
a short tail to the right. It indicates the presence of a small proportion of relatively low extreme values
(SK < 0).

When the computed value of SK is negative, most of the students got very high scores, meaning that
they performed very well on the examination.
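A quick sketch of this interpretation, using Pearson's coefficient SK = 3(Mean - Median)/SD as given above (the score lists are hypothetical):

```python
import statistics

# Pearson's coefficient of skewness: SK = 3(Mean - Median) / SD.
def sk(scores):
    return 3 * (statistics.mean(scores) - statistics.median(scores)) / statistics.stdev(scores)

mostly_low = [10, 12, 15, 18, 20, 25, 40, 60, 85]   # a few high outliers
mostly_high = [90, 88, 85, 82, 80, 75, 60, 40, 15]  # a few low outliers
print(sk(mostly_low) > 0)   # True: positively skewed, class performed poorly
print(sk(mostly_high) < 0)  # True: negatively skewed, class performed well
```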

MODULE 6: MARKS/GRADES AND GRADING SYSTEM

BASIC TERMINOLOGY

 Marks= Cumulative grades that reflect students’ academic progress during a period of instruction
 Score= Reflect performance on a single assessment
 Grades= Can be used interchangeably with marks
 Most of the time these terms are used to mean the same thing

FEEDBACK AND EVALUATION

 Test results can be used for a variety of reasons, such as informing students of their progress,
evaluating achievement, and assigning grades
 Formative evaluation= Activities that are aimed at providing feedback to the students
 Summative evaluation= Activities that determine the worth, value, or quality of an outcome
 Often involve the assignment of a grade

INFORMAL AND FORMAL EVALUATION

 Informal evaluation= Not planned and not standardized


 Can come in the form of commentary such as "great work" or "try that one again"
 Formal evaluation= More likely to be applied consistently and be written out
 Includes scores and commentary, often written down

THE USE OF FORMATIVE EVALUATION IN SUMMATIVE EVALUATION

 Sometimes, formative assessment and evaluation can feed into summative evaluation
 This is recommended more in courses of study that are topical, rather than sequential, as mastery
of earlier concepts may not reflect on the assessment of later ones, and vice versa

REPORTING STUDENT PROGRESS: WHICH SYMBOLS TO USE?

 This is often decided by the administration or state


 Most teachers are familiar with letter (A, B, C, D, F) and numerical (0-100) grades
 Verbal descriptors= Grades like excellent or needs improvement
 Pass-fail= A variant of mastery grading in which most students are expected to master the content
(i.e. ―pass‖)
 Supplemental systems= Using means of communication like phone calls home, checklists of
objectives, or other methods to communicate feedback

BASIS OF GRADES

Before assigning grades, consider: Are the grades solely based on academic achievement, or are there
other factors to consider?

 Factors could include attendance, participation, attitudes, etc.


 Most experts recommend making academic achievement the sole basis for assigning grades
 If desired, the recommendation is to keep a separate rating system for such nonachievement
factors to keep achievement grades unbiased
 When comparing grades (5th grade to 6th grade, for example) it is critical to consider how grades
were calculated. Grades based heavily on homework will not be comparable to grades based
heavily on testing.

FRAME OF REFERENCE

 After deciding what to base your grades on, you will then have to decide how you’re going to
interpret and compare student scores
 There are several different frames of reference that suit different needs

NORM-REFERENCE GRADING (RELATIVE GRADING)

 Involves comparing each student’s performance to that of a reference group


 Also known as "grading on a curve"
 In this arrangement, a certain amount of students receive each grade (10% receive A’s, 20%
receive B’s, and so on)
 Straightforward method of grading, and helps reduce grade inflation
 However, depending on the reference group used as a basis, this frame of reference is not always
considered fair
 Another approach is to use ranges instead of exact percentages (10-20% A’s, 20-30% B’s,
etc.)

CRITERION-REFERENCED GRADING (ABSOLUTE GRADING)

 Involves comparing a student’s performance to a specified level of performance


 One common system is the percentage system (A = 90-100%, B = 80-89%, etc.)
 Marks directly describe student performance
 However, there may be considerable variability between teachers of how they assign grades
(lower vs. higher expectations)
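A percentage scale like the one above can be sketched as a simple mapping (the cutoffs are the common 90/80/70/60 breakpoints, which individual schools may vary):

```python
# Absolute (criterion-referenced) grading: a fixed
# percentage-to-letter scale, independent of class ranking.
def letter_grade(percent):
    if percent >= 90: return "A"
    if percent >= 80: return "B"
    if percent >= 70: return "C"
    if percent >= 60: return "D"
    return "F"

print(letter_grade(85))  # B
```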

ACHIEVEMENT IN RELATION TO IMPROVEMENT OR EFFORT


 Students who make higher learning gains earn better grades than those who make smaller gains
 This method of grading can be risky, as students may figure out that starting the year or unit low
and finishing high earns a better grade
 There are also many other technical factors, including the fact that this is not a pure measure of
achievement, but a measure of effort as well
 Can motivate poor students, but may have a negative effect on strong students

ACHIEVEMENT RELATIVE TO ABILITY

Usually based on performance on an intelligence test

There are also numerous technical and consistency issues to be taken into consideration when using this
approach

RECOMMENDATION

Most experts recommend using absolute rather than relative grading systems, as they represent pure
measures of student achievement

Both grading systems have their strengths and limitations, which should be taken into consideration when
deciding which to use

 Reporting both styles of grades is also an option

COMBINING GRADES INTO A COMPOSITE

The decision of how much certain grades should weigh into the composite (or final grade) is up to the
teacher or department and is based on the importance of different types of assignments (e.g., five response
papers might be 10% each, with 12.5% assigned to 4 tests; this is different from 50% assigned to one
major paper and 50% to one cumulative test)

There are several different methods of equating scores into composite scores, although most schools have
commercial gradebook programs that do this for the teacher
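The weighting described above amounts to a weighted average; a minimal sketch (the scores and weights are the hypothetical ones from the paragraph):

```python
# Composite grade: each component score (on a 0-100 scale)
# multiplied by its weight; the weights sum to 1.
def composite(components):
    return sum(score * weight for score, weight in components)

# Five response papers at 10% each, four tests at 12.5% each
final = composite([(90, 0.10)] * 5 + [(80, 0.125)] * 4)
print(round(final, 2))  # 85.0
```

Note that all components must first be placed on a common scale (e.g., percentages or standard scores) before weighting, or the weights will not have their intended effect.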

INFORMING STUDENTS OF GRADING SYSTEM

 Students should be informed early on in the course about exactly how they will be graded well
before any assessment procedures have taken place
 Parents should also be informed of grading procedures
 It is the professional responsibility of a teacher to explain the scores/grades to students and
parents in ways that the explanation is understandable
 This can be done by simply handing out a sheet with a breakdown of the weights of different
grades, though it is recommended that Q & A sessions are conducted

PARENT CONFERENCES
 Parent-teacher conferences should be professional and the information disclosed should be kept
confidential
 Discussion should only concern the individual student
 Teachers should have a file folder or computer file of the student’s performance and grades
readily available
 Presenting the students work as evidence/an indicator of grades is also recommended

FUNCTIONS OF GRADING AND REPORTING IN A GRADING SYSTEM

1. Improve students’ learning by:

 clarifying instructional objectives for them


 showing students’ strengths & weaknesses
 providing information on personal-social development
 enhancing students’ motivation (e.g., short-term goals)
 indicating where teaching might be modified

Best achieved by:

 day-to-day tests and feedback


 plus periodic integrated summaries

2. Reports to parents/guardians

 Communicates objectives to parents, so they can help promote learning


 Communicates how well objectives being met, so parents can better plan

3.Administrative and guidance uses

 Help decide promotion, graduation, honors, athletic eligibility


 Report achievement to other schools or to employers
 Provide input for realistic educational, vocational, and personal counseling

TYPES OF GRADING AND REPORTING SYSTEM

1.Traditional letter-grade system

 Easy and can average them


 But of limited value when used as the sole report, because:
1. they end up being a combination of achievement, effort, work habits, behavior
2. teachers differ in how many high (or low) grades they give
3. they are therefore hard to interpret
4. they do not indicate patterns of strength and weakness

2. Pass-fail
 Popular in some elementary schools
 Used to allow exploration in high school/college
 Should be kept to the minimum, because:
1. do not provide much information
2. students work to the minimum
 In mastery learning courses, can leave blank till "mastery" threshold reached

3. Checklists of objectives

 Most common in elementary school


 Can either replace or supplement letter grades
 Each item in the checklist can be rated: Outstanding, Satisfactory, Unsatisfactory; A, B, C, etc.
 Problem is to keep the list manageable and understandable

4. Letters to parents/guardians

 Useful supplement to grades


 Limited value as sole report, because:
1. very time consuming
2. accounts of weaknesses often misinterpreted
3. not systematic or cumulative
 Great tact needed in presenting problems (lying, etc.)

5. Portfolios

 Set of purposefully selected work, with commentary by student and teacher


 Useful for:
1. showing student’s strengths and weaknesses
2. illustrating range of student work
3. showing progress over time or stages of a project
4. teaching students about objectives/standards they are to meet

6. Parent-teacher conferences

 Used mostly in elementary school


 Portfolios (when used) are useful basis for discussion
 Useful for:
1. two-way flow of information
2. getting more information and cooperation from parents
 Limited in value as the major report, because
1. time consuming
2. provides no systematic record of progress
3. some parents won’t come

HOW SHOULD YOU DEVELOP A GRADING SYSTEM?


1. Guided by the functions to be served

 will probably be a compromise, because functions often conflict


 but always keep achievement separate from effort

2. Developed cooperatively (parents, students, school personnel)

 more adequate system


 more understandable to all

3. Based on clear statement of learning objectives

 are the same objectives that guided instruction and assessment


 some are general, some are course-specific
 aim is to report progress on those objectives
 practicalities may impose limits, but should always keep the focus on objectives

4. Consistent with school standards

 should support, not undermine, school standards


 should use the school’s categories for grades and performance standards
 should actually measure what is described in those standards

5. Based on adequate assessment

 implication: don’t promise something you cannot deliver


 design a system for which you can get reliable, valid data

6. Based on the right level of detail

 detailed enough to be diagnostic


 but compact enough to be practical
1. not too time consuming to prepare and use
2. understandable to all users
3. easily summarized for school records
 probably means a letter-grade system with more detailed supplementary reports

7. Providing for parent-teacher conferences as needed

 regularly scheduled for elementary school


 as needed for high school

ASSIGNING LETTER GRADES

What to include?

 Only achievement
 Avoid temptation to include effort for less able students, because:
1. difficult to assess effort or potential
2. difficult to distinguish ability from achievement
3. would mean grades don’t mean same thing for everyone (mixed message, unfair)

How to combine data?

 Properly weight each component to create a composite


 Must put all components on same scale to weight properly:
1. equate ranges of scores (see example on p. 389, where students score 10-50 on one test and
80-100 on another)
2. or, convert all to T-scores or other standard scores (see chapter 19)

What frame of reference?

 Relative—score compared to other students (where you rank)


1. grade (like a class rank) depends on what group you are in, not just your own performance
2. typical grade may be shifted up or down, depending on group’s ability
3. widely used because much classroom testing is norm-referenced
 Absolute—score compared to specified performance standards (what you can do)
1. grade does NOT depend on what group you are in, but only on your own performance
compared to a set of performance standards
2. complex task, because must
I. clearly define the domain
II. clearly define and justify the performance standards
III. do criterion-referenced assessment
3. conditions hard to meet except in complete mastery learning settings
 Learning ability or improvement—score compared to learning "potential" or past performance
1. widely used in elementary schools
2. inconsistent with a standards-based system (each child is their own standard)
3. reliably estimating learning ability (separate from achievement) is very difficult
4. can’t reliably measure change with classroom measures
5. therefore, should only be used as a supplement

What distribution of grades?

 Relative (have ranked the students)—distribution is a big issue


1. normal curve defensible only when have large, unselected group
2. when "grading on the curve," school staff should set fair ranges of grades for different
groups and courses
3. when "grading on the curve," any pass-fail decision should be based on an absolute
standard (i.e., failed the minimum essentials)
4. standards and ranges should be understood and followed by all teachers
 Absolute (have assessed absolute levels of knowledge)—not an issue
1. system seldom uses letter grades alone
2. often includes checklists of what has been mastered (see example on p. 395)
3. distribution of grades is not predetermined

Guidelines for Effective Grading

1. Describe grading procedures to students at beginning of instruction.


2. Clarify that course grade will be based on achievement only.
3. Explain how other factors (effort, work habits, etc.) will be reported.
4. Relate grading procedures to intended learning outcomes.
5. Obtain valid evidence (tests, etc.) for assigning grades.
6. Try to prevent cheating.
7. Return and review all test results as soon as possible.
8. Properly weight the various types of achievements included in the grade.
9. Do not lower an achievement grade for tardiness, weak effort, or misbehavior.
10. Be fair. Avoid bias. When in doubt, review the evidence. If still in doubt, give the higher
grade.

Conducting Parent-Teacher Conferences

Productive when:

 Carefully planned
 Teacher is skilled

Guidelines for a good conference

1. Make plans

 Review your goals


 Organize the information to present
 Make list of points to cover and questions to ask
 If bring portfolios, select and review carefully

2. Start positive—and maintain a positive focus

3. Present student’s strong points first

 Helpful to have example of work to show strengths and needs


 Compare early vs. later work to show improvement

4. Encourage parents to participate and share information

 Be willing to listen
 Be willing to answer questions

5. Plan actions cooperatively

 What steps you can each take


 Summarize at the end

6. End with positive comment

 Should not be a vague generality


 Should be true

7. Use good human relations skills

DO

 Be friendly and informal


 Be positive in approach
 Be willing to explain in understandable terms
 Be willing to listen
 Be willing to accept parents’ feelings
 Be careful about giving advice

DON’T

 Argue, get angry


 Ask embarrassing questions
 Talk about other students, parents, teachers
 Bluff if you don’t know
 Reject parents’ suggestions
 Be a know-it-all with pat answers

Reporting Standardized Test Results to Parents

Aims

 Present test results in understandable language, not jargon


 Put test results in context of total pattern of information about the student
 Keep it brief and simple

Actions

1. Describe what the test measures

 Use a general statement: e.g., "this test measures skills and abilities that are useful in
school learning"
 Refer to any part of the test report that may list skill clusters
 Avoid misunderstandings by:
a. not referring to tests as ―intelligence‖ tests
b. not describing aptitudes and abilities as fixed
c. not saying that a test predicts outcomes for an individual person (can say "people
with this score usually…")
 Let a counselor present results for any non-cognitive test (personality, interests, etc.)

2. Explain meaning of test scores (chapter 19 devoted to this)

 For norm-referenced
1. explain norm group
2. explain score type (percentile, stanine, etc.)
3. stay with one type of score, if possible
 For criterion-referenced
1. more easily understood than norm-referenced
2. usually in terms of relative degree of mastery
3. describe the standard of mastery
4. may need to distinguish percentile from percent correct

3. Clarify accuracy of scores

 Say all tests have error


 Stanines already take account of error (because so broad). Two stanine difference is probably
a real difference
 For other scores, use confidence bands when presenting them
 If you refer to subscales with few items, describe them as only ―clues‖ and look for related
evidence.

4. Discuss use of test results

 Coordinate all information to show what action they suggest

Decisions in Assigning Grades

1. What should grades include (effort, achievement, neatness, spelling, good behavior, etc.)?

2. Grades for individual assessments

 criterion-reference or norm-referenced?
1. if criterion-referenced, what standard?
2. if norm-referenced, what reference group?
 letter grades or numbers?

3. Combining assessments for a composite grade

 what common numerical scale?


1. percentages
2. standard scores
3. range of scores (max-min)
4. combining absolute and relative grades
 weight to give different assessments?
 what cut-off points for letter grades?

MODULE 7: AUTHENTIC ASSESSMENT

MODE OF ASSESSMENT

A. Traditional Assessment
1. Assessment in which students typically select an answer or recall information to complete the
assessment. Tests may be standardized or teacher-made, and may use multiple-choice,
fill-in-the-blank, true-false, or matching formats.
2. Indirect measures of assessment, since the test items are designed to represent competence by
extracting knowledge and skills from their real-life context.
3. Items on standardized instruments tend to test only the domain of knowledge and skills, to
avoid ambiguity for the test takers.
4. One-time measures that rely on a single correct answer to each item. There is limited
potential for traditional tests to measure higher-order thinking skills.
B. Performance assessment
1. Assessment in which students are asked to perform real-world tasks that demonstrate
meaningful application of essential knowledge and skills
2. Direct measures of students’ performance, because tasks are designed to incorporate the
contexts, problems, and solution strategies that students would use in real life.
3. Designed as ill-structured challenges, since the goal is to help students prepare for the complex
ambiguities of life.
4. Focus on processes and rationales. There is no single correct answer, instead students are led
to craft polished, thorough and justifiable responses, performances and products.
5. Involve long-range projects, exhibits, and performances are linked to the curriculum
6. Teacher is an important collaborator in creating tasks, as well as in developing guidelines for
scoring and interpretation
C. Portfolio Assessment
1. Portfolio is a collection of student’s work specifically to tell a particular story about the
student.
2. A portfolio is not a pile of student work that accumulates over a semester or year
3. A portfolio contains a purposefully selected subset of student work
4. It measures the growth and development of students.

TRADITIONAL ASSESSMENT VS. AUTHENTIC ASSESSMENT

Traditional ----------------------------- Authentic

Selecting a Response ------------------- Performing a Task


Contrived -------------------------------- Real-life

Recall/Recognition -------------------------------- Construction/Application

Teacher-structured ---------------------------------- Student-structured

Indirect Evidence --------------------------------- Direct Evidence

Seven Criteria in Selecting a Good Performance Assessment Task

1. Authenticity – the task is similar to what the students might encounter in the real
world as opposed to encountering only in school.

2. Feasibility – the task is realistically implemented in relation to its cost, space, time,
and equipment requirements.

3. Generalizability – the likelihood that the students’ performance on the task will
generalize to comparable tasks.

4. Fairness – the task is fair to all students regardless of their social status or gender.

5. Teachability – the task allows one to master the skill that one should be proficient in.

6. Multi Foci – the task measures multiple instructional outcomes.

7. Scorability – the task can be reliably and accurately evaluated.

Rubrics

A rubric is a scoring scale and instructional tool used to assess the performance of students against a
task-specific set of criteria. It contains two essential parts: the criteria for the task and the levels of
performance for each criterion. It provides teachers an effective means of student-centered feedback and
evaluation of students’ work. It also enables teachers to provide detailed and informative evaluations of
their performance.

Rubrics are especially important when you are measuring the performance of students against a set of
standards or a pre-determined set of criteria. Through the use of scoring rubrics, teachers can determine
the strengths and weaknesses of the students, which in turn enables the students to develop their skills.

Steps in developing a Rubrics


1. Identify your standards, objectives and goals for your students. A standard is a statement of what
the students should know or be able to perform, and it should indicate that your students are
expected to meet it. Know also the goals for instruction: what are the intended learning outcomes?
2. Identify the characteristics of a good performance on the task—the criteria. When the students
perform the task or present their work, these characteristics should indicate whether they performed
well on the task given to them, and hence whether they met that particular standard.
3. Identify the levels of performance for each criterion. There are no fixed guidelines for the
number of levels of performance; it varies according to the task and needs. A rubric can have as
few as two levels of performance or as many as the teacher can develop, as long as the rater can
sufficiently discriminate the performance of the students on each criterion. Through these levels of
performance, the teacher or rater can provide more detailed feedback about the performance of the
students, and it is easier for the teacher and students to identify the areas needing improvement.

Types of Rubrics
1. Holistic Rubrics
A holistic rubric does not list separate levels of performance for each criterion. Rather, it assigns a
single level of performance across multiple criteria as a whole; in other words, all the components
are considered together.
Advantage: quick scoring; provides an overview of students’ achievement.
Disadvantage: does not provide detailed information about student performance in specific areas of
content and skills; it may be difficult to settle on one overall score.

2. Analytic Rubrics
With an analytic rubric, the teacher or rater identifies and assesses the components of a finished product.
It breaks down the final product into component parts, and each part is scored independently. The
total score is the sum of the ratings for all the parts being assessed or evaluated. In analytic
scoring, it is very important for the rater to treat each part separately to avoid bias toward the
whole product.
Advantage: more detailed feedback; scoring is more consistent across students and graders.
Disadvantage: time consuming to score.

Example of Holistic Rubric


3-Excellent Researcher
 Included 10-12 sources
 No apparent historical inaccuracies
 Can easily tell which sources information was drawn from
 All relevant information is included
2- Good Researcher
 Included 5-9 sources
 Few historical inaccuracies
 Can tell with difficulty where information came from
 Bibliography contains most relevant information
1-Poor Researcher

 Included 1-4 sources


 Lots of historical inaccuracies
 Cannot tell from which source information came from
 Bibliography contains very little information

Example of Analytic Rubric

Criteria: Made good observations
  Limited (1): Observations are absent or vague
  Acceptable (2): Most observations are clear and detailed
  Proficient (3): All observations are clear and detailed

Criteria: Made good predictions
  Limited (1): Predictions are absent or irrelevant
  Acceptable (2): Most predictions are reasonable
  Proficient (3): All predictions are reasonable

Criteria: Appropriate conclusion
  Limited (1): Conclusion is absent or inconsistent with observations
  Acceptable (2): Conclusion is consistent with most observations
  Proficient (3): Conclusion is consistent with all observations
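Scoring with an analytic rubric like the one above reduces to rating each criterion separately and summing (the ratings below are hypothetical):

```python
# Analytic scoring: each criterion is rated independently on
# the 1-3 scale; the total is the sum of the ratings.
ratings = {"observations": 3, "predictions": 2, "conclusion": 3}
total = sum(ratings.values())
print(total, "out of", 3 * len(ratings))  # 8 out of 9
```

With a holistic rubric, by contrast, a single level (1-3 here) would be assigned to the whole performance.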

Advantages of Using Rubrics

When assessing the performance of the students using performance based assessment it is very important
to use scoring rubrics. The advantages of using rubrics in assessing student’s performance are:

1. Rubrics allow assessment to become more objective and consistent


2. Rubrics clarify the criteria in specific terms
3. Rubrics clearly show the student how work will be evaluated and what is expected
4. Rubrics promote student awareness of the criteria to use in assessing peer performance
5. Rubrics provide useful feedback regarding the effectiveness of the instruction; and
6. Rubrics provide benchmarks against which to measure and document progress

PERFORMANCE BASED ASSESSMENT

Performance-based assessment is a direct and systematic observation of the actual performances of
students based on a pre-determined performance criterion (Gabuyo, 2011). It is an alternative form of
assessing students’ performance that represents a set of strategies for the application of knowledge, skills
and work habits through the performance of tasks that are meaningful and engaging to students.
Framework of Assessment Approaches

Selection Type: True-false; Multiple-choice; Matching type

Supply Type: Completion; Label a diagram; Short answer; Concept map

Product: Essay, story or poem; Writing portfolio; Research report; Portfolio exhibit; Art exhibit;
Writing journal

Performance: Oral presentation of report; Musical, dance or dramatic performance; Typing test; Diving;
Laboratory demonstration; Cooperation in group works

Forms of Performance Based Assessment

1. Extended response task


a. Activities for single assessment may be multiple and varied
b. Activities may be extended over a period of time
c. Products from different students may be different in focus
2. Restricted-response tasks
a. Intended performances more narrowly defined than extended-response tasks.
b. Questions may begin like a multiple-choice or short answer stem, but then ask for
explanation, or justification.
c. May have introductory material like an interpretative exercise, but then asks for an
explanation of the answer, not just the answer itself
3. Portfolio is a purposeful collection of student work that exhibits the student’s efforts, progress
and achievements in one or more areas.

Uses of Performance Based Assessment

1. Assessing cognitively complex outcomes such as analysis, synthesis and evaluation
2. Assessing non-writing performances and products
3. The teacher must carefully specify the learning outcomes and construct an activity or task that
actually calls them forth.

Focus of Performance Bases Assessment

Performance-based assessment can assess the process, the product, or both, depending on the learning
outcomes. It also involves doing, rather than just knowing about, the activity or task. The teacher will
assess the effectiveness of the process or procedures and the product used in carrying out the
instruction. The question is: when should the process be assessed, and when the product?

Use the process when:

1. There is no product;
2. The process is orderly and directly observable;
3. Correct procedures/steps are crucial to later success;
4. Analysis of procedural steps can help in improving the product;
5. Learning is at an early stage.

Use the product when:

1. Different procedures result in an equally good product;
2. The procedures are not available for observation;
3. The procedures have already been mastered;
4. Products have qualities that can be identified and judged.

The final step in performance assessment is to assess and score the students’ performance. To assess the
performance of the students, the evaluator can use the checklist approach, the narrative or anecdotal
approach, the rating scale approach, or the memory approach. The evaluator can give feedback on a
student’s performance in the form of a narrative report or a grade. There are different ways to record the
results of performance-based assessments.

1. Checklist Approach uses observation instruments that break a performance into its component
elements. The teacher has to indicate only whether or not certain elements are present in the
performance.
2. Narrative/Anecdotal Approach is a continuous description of student behavior as it occurs,
recorded without judgment or interpretation. The teacher writes narrative reports of what was
done during each performance. From these reports, teachers can determine how well their
students met their standards.
3. Rating Scale Approach is a checklist that allows the evaluator to record information on a scale,
noting finer distinctions than just the presence or absence of a behavior. The teacher indicates
to what degree the standards were met, usually on a numerical scale. For instance, a teacher
may rate each criterion on a scale of one to five, with one meaning "skill barely present"
and five meaning "skill extremely well executed."
4. Memory Approach: the teacher observes the students performing the tasks without taking
any notes and relies on memory to determine whether or not the students were successful.
This approach is not recommended for assessing the performance of students.
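To make the rating scale approach concrete, the sketch below shows one way a teacher might record and summarize one-to-five ratings for a single performance. The criterion names and the summary statistics chosen here are illustrative assumptions, not part of the module.

```python
# Minimal sketch of the rating scale approach: each criterion is rated
# from 1 ("skill barely present") to 5 ("skill extremely well executed").
# The criterion names below are hypothetical examples.

def summarize_ratings(ratings):
    """Summarize one performance: total, mean, and percent of the maximum."""
    total = sum(ratings.values())
    maximum = 5 * len(ratings)  # highest possible score on a 1-5 scale
    return {
        "total": total,
        "average": round(total / len(ratings), 2),
        "percent_of_max": round(100 * total / maximum, 1),
    }

performance = {
    "organization": 4,
    "delivery": 3,
    "accuracy": 5,
}

print(summarize_ratings(performance))
```

A record like this could then feed either form of feedback the module mentions: a narrative report or a grade.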

PORTFOLIO ASSESSMENT

Portfolio assessment is the systematic, longitudinal collection of student work created in response to
specific, known instructional objectives and evaluated in relation to the same criteria. A student portfolio
is a purposeful collection of student work that exhibits the student's efforts, progress and achievements in
one or more areas. The collection must include student participation in selecting the contents, the criteria
for selection, the criteria for judging merit, and evidence of student self-reflection.

Comparison of Portfolio and Traditional Forms of Assessment

Traditional Assessment | Portfolio Assessment

Measures student's ability at one time | Measures student's ability over time

Done by the teacher alone; students are not aware of the criteria | Done by the teacher and the students; the students are aware of the criteria

Conducted outside instruction | Embedded in instruction

Assigns student a grade | Involves student in own assessment

Does not capture the student's language ability | Captures many facets of language learning performance

Does not include the teacher's knowledge of the student as a learner | Allows for expression of the teacher's knowledge of the student as a learner

Does not give the student responsibility | The student learns how to take responsibility

THREE TYPES OF PORTFOLIO

There are three basic types of portfolio to consider for classroom use: the working portfolio, the
showcase portfolio and the progress portfolio.

1. Working Portfolio
The first type of portfolio is the working portfolio, also known as the "teacher-student portfolio".
As the name implies, it is a project "in the works": it contains work in progress as well as finished
samples of work used for reflection by students and teachers. It documents the stages of
learning and provides a progressive record of student growth. This interactive teacher-student
portfolio aids communication between teacher and student.

The working portfolio may be used to diagnose student needs. In it, both student and teacher have
evidence of student strengths and weaknesses in achieving learning objectives, information
extremely useful in designing future instruction.
2. Showcase Portfolio
The showcase portfolio is the second type, also known as the best-works portfolio or
display portfolio. This kind of portfolio focuses on the student's best and most representative
work; it exhibits the student's best performance. A best-works portfolio may document student
activities beyond school, for example a story written at home. It is just like an artist's portfolio,
where a variety of work is selected to reflect breadth of talent: painters exhibit their best
paintings. Hence, in this portfolio the student selects what he or she thinks is representative work.
This folder is most often seen at open houses and parent visitations.

The most rewarding use of student portfolios is the display of students' best work, the work that
makes them proud. It thus encourages self-assessment and builds self-esteem in students. The
pride and sense of accomplishment that students feel make the effort well worthwhile and
contribute to a culture for learning in the classroom.

3. Progress Portfolio
The third type is the progress portfolio, also known as the Teacher Alternative Assessment
Portfolio. It contains examples of the same types of student work done over a period of time,
which are used to assess the student's progress.

All the works of the students in this type of portfolio are scored, rated, ranked, or evaluated.

Teachers can keep individual student portfolios that are solely for the teacher's use as an
assessment tool. This is a focused type of portfolio and is a model approach to assessment.

Assessment portfolios are used to document student learning on specific curriculum outcomes and
to demonstrate the extent of mastery in any curricular area.

Uses of Portfolios

1. Portfolios can provide both formative and summative opportunities for monitoring progress
toward reaching identified outcomes.
2. Portfolios can communicate concrete information about what is expected of students in terms
of the content and quality of performance in specific curriculum areas.
3. Portfolios allow students to document aspects of their learning that do not show
up well in traditional assessments.
4. Portfolios are useful to showcase periodic or end-of-the-year accomplishments of students, such
as poetry, reflections on growth, samples of best works, etc.
5. Portfolios may also be used to facilitate communication between teachers and parents
regarding a child's achievement and progress over a certain period of time.
6. Administrators may use portfolios for national competency testing, to grant high school
credit, or to evaluate education programs.
7. Portfolios may be assembled for a combination of purposes, such as instructional enhancement
and progress documentation. A teacher may review student portfolios periodically and make
notes for revising instruction for the next year.
According to Mueller (2010), there are seven steps in developing student portfolios.

Below are the discussions of each step.

1. Purpose: What is the purpose of the portfolio?
2. Audience: For what audience will the portfolio be created?
3. Content: What samples of student work will be included?
4. Process: What processes (e.g. selection of work to be included, reflection in work, conferencing)
will be engaged in during the development of the portfolio?
5. Management: How will time and materials be managed in the development of the portfolio?
6. Communication: How and when will the portfolio be shared with pertinent audiences?
7. Evaluation: If the portfolio is to be used for evaluation, when and how should it be evaluated?

Guidelines for Assessing Portfolios

1. Include enough documents (items) on which to base judgment.
2. Structure the contents to provide scorable information.
3. Develop judging criteria and a scoring scheme for raters to use in assessing the portfolios.
4. Use observation instruments such as checklists and rating scales when possible to facilitate scoring.
5. Use trained evaluators or assessors.
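As one illustration of guidelines 3 and 5 above, the sketch below shows a possible scoring scheme in which trained raters apply the same judging criteria to a portfolio and their scores are combined. The criteria, the four-point scale, and the sample rater scores are assumptions made for the example, not prescriptions from the module.

```python
# Hypothetical portfolio scoring scheme: each trained rater scores the
# same criteria on a 1-4 scale; per-criterion scores are averaged
# across raters and then summed into an overall portfolio score.

CRITERIA = ("selection of contents", "self-reflection", "growth over time")

def portfolio_score(rater_scores):
    """Average each criterion across raters; return (per-criterion detail, total)."""
    per_criterion = {
        c: sum(r[c] for r in rater_scores) / len(rater_scores)
        for c in CRITERIA
    }
    return per_criterion, sum(per_criterion.values())

rater_a = {"selection of contents": 3, "self-reflection": 4, "growth over time": 3}
rater_b = {"selection of contents": 4, "self-reflection": 4, "growth over time": 2}

detail, total = portfolio_score([rater_a, rater_b])
print(detail)
print(total)
```

Averaging across raters is one simple way to smooth out individual judgment; a rubric with behavioral descriptors for each scale point would further improve agreement between raters.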

GUIDANCE AND COUNSELING

Guidance and counseling are both processes for solving problems in life; they differ only in the approach
used. In guidance, the client's problems are listened to carefully and ready-made solutions are provided by
experts. In counseling, the client's problems are discussed and relevant information is provided along the
way. Through this information, the client gains insight into the problem and becomes empowered to make
his own decisions.

Guidance counselors assist each student to benefit from the school experience through attention to the
student's personal, social and academic needs.

Guidance (Downing) as pointed out by Lao (2006) is an organized set of specialized services established
as an integral part of the school environment designed to promote the development of students and assist
them toward a realization of sound, wholesome adjustment and maximum accomplishment commensurate
with their potentialities.

Guidance (Good) is a process of dynamic interpersonal relationship designed to influence the attitude and
subsequent behavior of a person.

Counseling is both a process and a relationship. It is a process by which concentrated attention is given by
both counselor and counselee to the problems and concerns of the student in a setting of privacy,
warmth, mutual acceptance and confidentiality. As a process it utilizes appropriate tools and procedures
which contribute to the experience. Counseling is also a relationship characterized by trust, confidence and
intimacy in which the student gains intellectual and emotional stability from which he can resolve
difficulties, make plans and realize his greatest self-fulfillment.
Villar (2007) pointed out the different guidance services based on the Rules and Regulations of Republic
Act 9258, Rule 1, Section 3 (Manila Standard, 2007), and other services not mentioned in the Rules and
Regulations:

1. Individual inventory/ analysis


2. Information
3. Counseling
4. Research
5. Placement
6. Referral
7. Follow-up
8. Evaluation
9. Consultation
10. Program development
11. Public relations

Roles of the Guidance Counselor

Five roles of the guidance counselor are discussed by Dr. Imelda V.G. Villar in her book
"Implementing a Comprehensive Guidance and Counseling Program in the Philippines" (2007):

1. As Counselor
2. As Coordinator
3. As Consultant
4. As Conductor of Activities
5. As Change Agent

Essential Elements of Counseling Process

1. Anticipating the interview


2. Developing a positive working relationship
3. Exploring feelings and attitudes
4. Reviewing and determining present status
5. Exploring alternatives
6. Reaching a decision
7. Post counseling contact

Techniques and Methodologies used in the Guidance Process

1. Autobiography
2. Anecdotal record
3. Case study
4. Cumulative record
5. Interview
6. Observation
7. Projective techniques
8. Rating scale
9. Sociometry

Ethical Consideration of the Counselor

1. Counselor’s responsibility to the client and to his family


2. Recognize the boundaries of their competence and their own personal and professional limitations
3. Confidentiality
4. Imposition of one’s values and philosophy of life on the client is considered unethical.

Four Important Functions of Guidance Services

1. Counseling
 Individual counseling
 Small group counseling
 Crisis counseling
 Career counseling
 Referrals
 Peer helping programs
2. Prevention
 Primary, secondary, tertiary plans and programs
 Individual assessments; coordinated student support team activities
 Students activities
 Transitional planning
REFERENCES

Acero, V.O. et al. (2000). Principles and strategies of learning. Manila: Rex Book Store.

Calderon, J.F. and Expectation, C.G. (1993). Measurement and evaluation. Manila:
Solares Printing Press.

Calmorin, L.P. (1984). Educational measurement and evaluation. Metro Manila: National
Book Store, Inc.

Garcia, C.D. (2004). Educational measurement and evaluation. Mandaluyong City:
Books Atbp. Publishing Corp.

Hopkins, C.D. et al. (1990). Classroom measurement and evaluation. Illinois: F.E.
Peacock Publishers, Inc.

Mercado-Del Rosario, A.C. (2001). Educational measurement and evaluation. Manila:
JNPM Design and Printing.

McMillan, J.H. (2004). Classroom assessment: principles and practice for effective
instruction. Boston: Pearson Education, Inc.

Navarro, R.L. et al. (2013). Authentic assessment of student learning outcomes:
Assessment of learning 2. Quezon City, Metro Manila, Philippines: LORIMAR Publishing,
Inc.

Seng, T.O. et al. (2003). Educational psychology: A practitioner-researcher approach.
USA: Thomson Asia Pub. Ltd.

Tileston, D.W. (2004). Student assessment. California: Corwin Press.

Internet sites:

http://jonathan.mueller.faculty.noctrl.edu

http://www1.udel.edu/educ/gottfredson/451/unit11-chap15.htm

https://www.google.com/search?q=uses+of+marks+and+grades+in+assessment&oq=U
SES+OF+MARKS+AND+GRADES&aqs=chrome.
