http://changingminds.org/explanations/research/design/types_validity.htm
Different types of reliability
| Type of reliability | When | How | What |
| --- | --- | --- | --- |
| Internal consistency | Assess a single dimension | Correlate each individual item score with the total score. | All the items on your test assess the same construct. |
| Interrater | Find consistency in the rating of some outcome | Examine the percentage of agreement between raters. | The reliability coefficient for your test indicates a poor, moderate, or high degree of agreement between respondents. |
| Parallel | Compare several different forms of a test to see if they are equivalent or reliable | Correlate the scores from one form of the test with scores from a second form of the same test with the same content. | Two forms of your test are equivalent to one another. |
| Test-retest | Reliability over time | Correlate the scores from time 1 with the scores of time 2. | The test gives the same results even if the participants didn't all take it at the same time. |
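The 'How' entries for the reliability types above can be sketched in a few lines of Python. This is a minimal illustration rather than a standard implementation: the function names and all the data are hypothetical, and the item-total correlation is the simple uncorrected form (each item is included in the total it is correlated with).

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def item_total_correlations(responses):
    """Internal consistency: correlate each item score with the total score.

    `responses` holds one list of item scores per respondent.
    """
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson([row[i] for row in responses], totals)
            for i in range(n_items)]

def percent_agreement(rater_a, rater_b):
    """Interrater: percentage of cases on which two raters agree."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)

# Hypothetical data, invented purely for illustration.
responses = [[4, 5, 4], [2, 1, 2], [3, 3, 4], [5, 5, 5], [1, 2, 1]]
rater_a = ["pass", "fail", "pass", "pass", "fail"]
rater_b = ["pass", "fail", "fail", "pass", "fail"]
time_1 = [12, 15, 9, 20, 17]
time_2 = [13, 14, 10, 19, 18]

item_total = item_total_correlations(responses)
agreement = percent_agreement(rater_a, rater_b)
# Test-retest: correlate time 1 with time 2. The same call applies to
# parallel forms, with form A and form B scores in place of the two times.
test_retest = pearson(time_1, time_2)

print("item-total:", [round(r, 2) for r in item_total])
print("agreement (%):", agreement)
print("test-retest r:", round(test_retest, 2))
```

Simple percentage agreement does not correct for agreement that would occur by chance, so it is best read as a rough first check.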
Types of validity
In a research project there are several types of validity that may be sought. In summary:
• Construct: Constructs accurately represent reality.
o Convergent: Simultaneous measures of same construct correlate.
o Discriminant: Doesn't measure what it shouldn't.
• Internal: Causal relationships can be determined.
• Conclusion: Any relationship can be found.
• External: Conclusions can be generalized.
• Criterion: Correlation with standards.
o Predictive: Predicts future values of criterion.
o Concurrent: Correlates with other tests.
• Face: Looks like it'll work.
Construct validity
Construct validity occurs when the theoretical constructs of cause and effect accurately
represent the real-world situations they are intended to model. This is related to how well the
experiment is operationalized. A good experiment turns the theory (constructs) into actual
things you can measure. Sometimes just finding out more about the construct (which itself
must be valid) can be helpful.
Construct validity is thus an assessment of the quality of an instrument or experimental
design. It asks, 'Does it measure the construct it is supposed to measure?' If you do not have
construct validity, you will likely draw incorrect conclusions from the experiment (garbage
in, garbage out).
Convergent validity
Convergent validity occurs where measures of constructs that are expected to correlate do so.
This is similar to concurrent validity (which looks for correlation with other tests).
Discriminant validity
Discriminant validity occurs where constructs that are expected not to relate do not, such that
it is possible to discriminate between these constructs.
Convergence and discrimination are often demonstrated by correlation of the measures used
within constructs.
Convergent validity and discriminant validity together demonstrate construct validity.
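As a rough sketch of how such correlations might be checked, the Python below compares two measures of the same construct with a measure of a construct that is expected to be unrelated. The variable names and scores are hypothetical, invented for illustration, and what counts as a 'high' or 'low' correlation is a judgment call that is not defined here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for six respondents (illustrative only).
anxiety_survey = [10, 14, 8, 20, 16, 12]     # construct A, first measure
anxiety_interview = [11, 15, 7, 19, 17, 11]  # construct A, second measure
shoe_size = [10, 7, 8, 9, 8, 6]              # expected to be unrelated

# Convergent: measures expected to correlate should do so.
convergent_r = pearson(anxiety_survey, anxiety_interview)
# Discriminant: measures expected not to relate should not.
discriminant_r = pearson(anxiety_survey, shoe_size)

print("convergent r:", round(convergent_r, 2))
print("discriminant r:", round(discriminant_r, 2))
```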
Nomological network
Defined by Cronbach and Meehl, this is the set of relationships between constructs and
between consequent measures. The relationships between constructs should be reflected in
the relationships between measures or observations.
Multitrait-Multimethod Matrix (MTMM)
Defined by Campbell and Fiske, this demonstrates construct validity by using multiple
methods (e.g. survey, observation, test) to measure the same set of 'traits' and showing
correlations in a matrix, where blocks and diagonals have special meaning.
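A small sketch can make the layout of the matrix more concrete. The Python below builds the off-diagonal cells for two hypothetical traits measured by two hypothetical methods and labels each cell with Campbell and Fiske's terms. The data are invented, and the reliability diagonal (same trait, same method) is left out because it needs separate reliability estimates.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# (trait, method) -> scores for six respondents. Hypothetical data.
measures = {
    ("anxiety", "survey"): [10, 14, 8, 20, 16, 12],
    ("anxiety", "observation"): [11, 15, 7, 19, 17, 11],
    ("extraversion", "survey"): [10, 7, 8, 9, 8, 6],
    ("extraversion", "observation"): [9, 7, 8, 10, 8, 5],
}

def cell_type(a, b):
    """Label a pair of distinct (trait, method) measures."""
    (trait_a, method_a), (trait_b, method_b) = a, b
    if trait_a == trait_b:
        return "monotrait-heteromethod"   # the validity diagonal
    if method_a == method_b:
        return "heterotrait-monomethod"
    return "heterotrait-heteromethod"

# One entry per pair of measures: (cell label, correlation).
mtmm = {
    (a, b): (cell_type(a, b), pearson(measures[a], measures[b]))
    for a, b in combinations(measures, 2)
}

validity = [r for kind, r in mtmm.values() if kind == "monotrait-heteromethod"]
others = [r for kind, r in mtmm.values() if kind != "monotrait-heteromethod"]

for (a, b), (kind, r) in mtmm.items():
    print(a, b, kind, round(r, 2))
```

In this reading, evidence for construct validity is stronger when the validity-diagonal correlations are clearly larger than those in the heterotrait cells.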
Content validity
Content validity occurs when the experiment provides adequate coverage of the subject being
studied. This includes measuring the right things as well as having an adequate sample.
Samples should be both large enough and taken from appropriate target groups.
The perfect question gives a complete measure of all aspects of what is being investigated.
However, in practice this is seldom achievable. For example, a simple addition question does not
test the whole of mathematical ability.
Content validity is related very closely to good experimental design. A high content validity
question covers more of what is sought. A trick with all questions is to ensure that all of the
target content is covered (preferably uniformly).
Internal validity
Internal validity occurs when it can be concluded that there is a causal relationship between
the variables being studied. A danger is that changes might be caused by other factors.
It is related to the design of the experiment, such as in the use of random assignment of
treatments.
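Random assignment can be sketched as follows. This is a minimal illustration, assuming two equally sized groups; the function name, group labels and participant IDs are hypothetical, and the fixed seed is there only to make the example reproducible.

```python
import random

def randomly_assign(participants, groups=("treatment", "control"), seed=None):
    """Shuffle participants, then deal them out to the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for index, participant in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(participant)
    return assignment

# Hypothetical participant IDs.
participants = [f"P{i:02d}" for i in range(1, 21)]
assignment = randomly_assign(participants, seed=42)

for group, members in assignment.items():
    print(group, len(members), members)
```

Because chance alone decides who receives the treatment, other factors are expected to be spread across the groups rather than lined up with the treatment.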
Conclusion validity
Conclusion validity occurs when you can conclude that there is a relationship of some kind
between the two variables being examined.
This may be positive or negative correlation.
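One way to check whether an observed correlation is more than chance is a permutation test, sketched below. This technique is brought in for illustration and is not prescribed by the text above. The data, function names, number of permutations and seed are all assumptions.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation test for a correlation.

    Shuffles y repeatedly and counts how often the shuffled data show a
    correlation at least as strong (positive or negative) as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            extreme += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical data, invented for illustration.
hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
exam_score = [52, 55, 61, 60, 68, 72, 75, 80]

r = pearson(hours_studied, exam_score)
p_value = permutation_p_value(hours_studied, exam_score)
print("r:", round(r, 2), "p:", round(p_value, 4))
```

A small p-value suggests that a relationship exists. It says nothing about whether that relationship is causal, which is the concern of internal validity.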
External validity
External validity occurs when the causal relationship discovered can be generalized to other
people, times and contexts.
Correct sampling will allow generalization and hence give external validity.
Criterion-related validity
This examines the ability of the measure to predict a variable that is designated as a criterion.
A criterion may well be an externally-defined 'gold standard'. Achieving this level of validity
thus makes results more credible.
Criterion-related validity is related to external validity.
Predictive validity
This measures the extent to which a future level of a variable can be predicted from a current
measurement. This includes correlation with measurements made with different instruments.
For example, a political poll intends to measure future voting intent.
College entry tests should have a high predictive validity with regard to final exam results.
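The college entry example can be sketched with a least-squares line fitted to past students and then used to predict a later result. The straight-line model, the function name and all the scores are assumptions made for illustration; the text above does not specify how the prediction is made.

```python
from math import sqrt

def fit_line(x, y):
    """Least-squares slope, intercept and Pearson r for y predicted from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    r = cov / sqrt(var_x * var_y)
    return slope, intercept, r

# Hypothetical scores: entry test taken first, final exam taken later.
entry_test = [50, 60, 70, 80, 90]
final_exam = [55, 62, 68, 79, 86]

slope, intercept, r = fit_line(entry_test, final_exam)

def predict_final(entry_score):
    """Predicted final exam result for a given entry test score."""
    return intercept + slope * entry_score

print("predictive r:", round(r, 3))
print("predicted final for entry score 75:", round(predict_final(75), 2))
```

The closer r is to 1, the better the entry test predicts the later result in this sample. Whether the same holds for future cohorts is a separate question.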
Concurrent validity
This measures the relationship between a new measure and measures made with existing tests.
The existing tests are thus the criterion.
For example, a measure of creativity should correlate with existing measures of creativity.
Face validity
Face validity occurs where something appears to be valid. This of course depends very much
on the judgment of the observer. In any case, it is never sufficient and requires more solid
validity to enable acceptable conclusions to be drawn.
Measures often start out with face validity as the researcher selects those which seem likely
to prove the point.
Threats
Validity as concluded is not always accepted by others and perhaps rightly so. Typical
reasons why it may not be accepted include:
• Inappropriate selection of constructs or measures.
• Insufficient data collected to make valid conclusions.
• Measurement done in too few contexts.
• Measurement done with too few measurement variables.
• Too great a variation in data (can't see the wood for the trees).
• Inadequate selection of target subjects.
• Complex interaction across constructs.
• Subjects giving biased answers or trying to guess what they should say.
• Experimental method not valid.
• Operation of experiment not rigorous.