THE BIG PICTURE OF STATISTICS
Theory
Question to answer / Hypothesis to test
Design Research Study
Collect Data
(measurements, observations)
Organize and make sense of the #s
USING STATISTICS!
Depends on our goal:
DESCRIPTIVE STATISTICS: describe characteristics; organize, summarize, and condense data
INFERENTIAL STATISTICS: test hypotheses, make conclusions, interpret data, understand relations
Some Definitions:
Construct: abstract, theoretical, hypothetical; can't be observed or measured directly (example: Intelligence)
Variable: reflects a construct, but is directly measurable and can differ from subject to subject (not a constant); variables can be Discrete or Continuous (examples: IQ, Vocabulary, Achievement)
Operational Definition: concrete and measurable; defines a variable by the specific operations used to measure it (examples: WISC score, SAT Vocabulary Test, Grades)
Types of Variables
Quantitative: measured in amounts (e.g., height, weight, test score)
Qualitative: measured in categories (e.g., gender, race, diagnosis)
Discrete: separate categories (e.g., letter grade)
Continuous: infinite values in between (e.g., GPA)
Scales of Measurement
Nominal Scale: Categories or labels; data carry no numerical value
Ordinal Scale: Rank-ordered data, but no information about the distance between ranks
Interval Scale: The degree of distance between scores can be assessed with standard-sized intervals
Ratio Scale: Same as an interval scale, but with an absolute zero point
Errors in Hypothesis Testing
Type I Errors
You reject a null hypothesis when you
shouldn’t
You conclude that you have an effect when
you really do not
The alpha level determines the probability of
a Type I Error (hence, called an “alpha error”)
Type II Errors
Failure to reject a false null hypothesis
Sometimes called a “Beta” Error.
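A minimal simulation sketch (Python, with hypothetical data) of the alpha idea: when the null hypothesis is actually true, a test run at alpha = .05 rejects the null on roughly 5% of samples.

# Monte Carlo sketch: when H0 is true, we reject about alpha of the time (Type I errors)
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_sims, n = 10_000, 30
rejections = 0

for _ in range(n_sims):
    # sample from a population where H0 (mu = 0) is actually true
    sample = rng.normal(loc=0.0, scale=1.0, size=n)
    t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)
    if p_value < alpha:
        rejections += 1  # a Type I error

print(f"Empirical Type I error rate: {rejections / n_sims:.3f} (alpha = {alpha})")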
Statistical Power
How sensitive is a test to detecting real
effects?
A powerful test decreases the chances of
making a Type II Error
Ways of Increasing Power:
Increase sample size
Make alpha level less conservative
Use a one-tailed rather than a two-tailed test
Assumptions of Parametric
Hypothesis Tests (z, t, ANOVA)
Random sampling or random assignment
was used
Independent Observations
Variability is not changed by experimental
treatment (homogeneity of variance)
Distribution of Sample Means is normal
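A brief sketch (Python, hypothetical data) of how two of these assumptions are commonly checked in practice, using the Shapiro-Wilk test for normality and Levene's test for homogeneity of variance:

# Sketch: checking normality and homogeneity of variance before a parametric test
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(50, 10, size=40)  # hypothetical scores
group_b = rng.normal(55, 10, size=40)

# Shapiro-Wilk: H0 = the sample comes from a normal distribution
print("Shapiro-Wilk A p-value:", stats.shapiro(group_a).pvalue)
print("Shapiro-Wilk B p-value:", stats.shapiro(group_b).pvalue)

# Levene's test: H0 = the groups have equal variances (homogeneity of variance)
print("Levene p-value:", stats.levene(group_a, group_b).pvalue)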
Measuring Effect Size
Statistical significance alone does not imply a substantial
effect; just one larger than chance
Cohen’s d is the most common technique for assessing
effect size
Cohen's d = (difference between the means) / (population standard deviation)
d > 0.8 indicates a large effect
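A small sketch of the calculation with made-up scores; when the population standard deviation is unknown, the pooled sample standard deviation is often substituted:

# Cohen's d = (difference between means) / standard deviation
import numpy as np

treatment = np.array([34, 38, 41, 35, 39, 42, 37])   # hypothetical scores
control   = np.array([30, 33, 31, 35, 29, 32, 34])

mean_diff = treatment.mean() - control.mean()

# pooled standard deviation (substitute for the population SD when it is unknown)
n1, n2 = len(treatment), len(control)
pooled_var = ((n1 - 1) * treatment.var(ddof=1) + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
d = mean_diff / np.sqrt(pooled_var)

print(f"Cohen's d = {d:.2f}")  # d of about 0.8 or above would be considered a large effect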
Introduction to the t Statistic
Since we usually do not know the population variance,
we must use the sample variance to estimate the
standard error
Remember: s² = SS/(n – 1) = SS/df
Estimated standard error: s_M = √(s²/n)
t = (M – μ0)/s_M
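A minimal sketch (Python, hypothetical sample) that implements these formulas directly and checks the result against scipy.stats.ttest_1samp:

# One-sample t: s^2 = SS/(n-1), s_M = sqrt(s^2/n), t = (M - mu0)/s_M
import numpy as np
from scipy import stats

scores = np.array([101, 96, 108, 112, 99, 105, 110, 94])  # hypothetical sample
mu0 = 100                                                  # value under the null hypothesis

n = len(scores)
M = scores.mean()
SS = np.sum((scores - M) ** 2)
s2 = SS / (n - 1)              # sample variance (SS/df)
s_M = np.sqrt(s2 / n)          # estimated standard error of the mean
t = (M - mu0) / s_M

print(f"t = {t:.3f}, df = {n - 1}")
print("scipy check:", stats.ttest_1samp(scores, popmean=mu0).statistic)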
Differences between the distribution of
the t statistic and the normal curve
The t distribution only approximates the normal distribution when n is very large. Why?
The more statistics you have in a formula, the more sources of
sampling fluctuation you will have.
M is the only statistic in the z formula, so z will be normal
whenever the distribution of sample means is normal
In “t” you have things fluctuating in both the numerator and the
denominator
Thus, there are as many different t distributions as there are
possible sample sizes. You have to know the degrees of
freedom (df) to know which distribution of t to use in a problem.
All t distributions are unimodal and symmetrical around zero.
Comparing Differences between
Means with t Tests
There are two kinds of t tests:
t Tests for Independent Samples
Also known as a “Between-Subjects” Design
Two totally different groups of subjects are compared;
randomly assigned if an experiment
t Tests for Related Samples
Also known as a “Repeated Measures” or “Within-Subjects”
or “Paired Samples” or “Matched Groups” Design
A group of subjects is compared to themselves in a different
condition
Each individual in one sample is matched to a specific
individual in the other sample
Paired Sample T-Test
• The paired sample t-test, sometimes called the dependent
sample t-test, is a statistical procedure used to determine
whether the mean difference between two sets of observations
is zero. In a paired sample t-test, each subject or entity is
measured twice, resulting in pairs of observations. Common
applications of the paired sample t-test include case-control
studies or repeated-measures designs. Suppose you are
interested in evaluating the effectiveness of a company training
program. One approach you might consider would be to
measure the performance of a sample of employees before and
after completing the program, and analyze the differences using
a paired sample t-test.
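A sketch of the training-program example with hypothetical before/after scores, using scipy.stats.ttest_rel for the paired-samples t test:

# Paired-samples t test: each employee measured before and after training (hypothetical data)
import numpy as np
from scipy import stats

before = np.array([62, 70, 58, 75, 66, 68, 71, 64])
after  = np.array([68, 74, 61, 79, 70, 67, 76, 69])

result = stats.ttest_rel(after, before)
print(f"mean difference = {np.mean(after - before):.2f}")
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")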
Independent T-Test
• The Independent Samples t Test compares the means of two independent groups in order to determine whether there is statistical evidence that the associated population means are significantly different. The Independent Samples t Test is a parametric test.
• When using a two-tailed test, regardless of the direction of the relationship you hypothesize, you are testing for the possibility of the relationship in both directions. For example, we may wish to compare the mean of a sample to a given value x using a t-test. Our null hypothesis is that the mean is equal to x. A two-tailed test will test both whether the mean is significantly greater than x and whether the mean is significantly less than x. A one-tailed test will test either whether the mean is significantly greater than x or whether the mean is significantly less than x, but not both.
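A sketch (hypothetical group scores) using scipy.stats.ttest_ind; the alternative argument, available in recent SciPy versions, selects a one-tailed test:

# Independent-samples t test on two hypothetical groups
import numpy as np
from scipy import stats

group1 = np.array([82, 75, 90, 68, 77, 85, 80, 73])
group2 = np.array([70, 65, 72, 60, 68, 74, 66, 71])

# two-tailed test (default): difference in either direction
two_tailed = stats.ttest_ind(group1, group2)
print(f"two-tailed: t = {two_tailed.statistic:.3f}, p = {two_tailed.pvalue:.4f}")

# one-tailed test: only testing whether group1's mean is greater
one_tailed = stats.ttest_ind(group1, group2, alternative="greater")
print(f"one-tailed: t = {one_tailed.statistic:.3f}, p = {one_tailed.pvalue:.4f}")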
Advantages of Independent Sample
Designs
Independent Designs have no carryover effects
Independent designs do not suffer from fatigue or
practice effects
You do not have to worry about getting people to
show up more than once
Demand characteristics may be stronger in repeated-measures studies than in independent designs
Since more individuals participate in independent-design studies, the results may be more generalizable
Disadvantages of Independent
Sample Designs
Usually requires more subjects (larger n)
The effect of a variable cannot be assessed for each
individual, but only for groups as a whole
There will be more individual differences between
groups, resulting in more variability
Advantages of Paired-Sample
Designs
Requires fewer subjects
Reduces variability/more statistically efficient
Good for measuring changes over time
Eliminates problems caused by individual
differences
Effects of variables can be assessed for each
individual
Disadvantages of Paired Sample
Designs
Carryover effects (2nd measure influenced by 1st
measure)
Progressive Error (Fatigue, practice effects)
Counterbalancing is a way of controlling carryover and practice
effects
Getting people to show up more than once
Demand characteristics may be stronger
What is really going on with t Tests?
Essentially the difference between the means of the two
groups is being compared to the estimated standard
error.
t = (difference between group means) / (estimated standard error)
t = (variability due to the independent variable + variability due to chance) / (variability due to chance alone)
The t distribution is the sampling distribution of
differences between sample means. (comparing
obtained difference to standard error of differences)
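A sketch (hypothetical data) of this idea for the independent-samples case: compute the difference between group means, divide it by the estimated standard error of that difference (here using a pooled variance estimate), and check the result against scipy.stats.ttest_ind:

# t = (difference between group means) / (estimated standard error of the difference)
import numpy as np
from scipy import stats

g1 = np.array([12, 15, 11, 14, 16, 13])   # hypothetical group scores
g2 = np.array([9, 10, 12, 8, 11, 10])

n1, n2 = len(g1), len(g2)
pooled_var = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
se_diff = np.sqrt(pooled_var / n1 + pooled_var / n2)   # estimated standard error of the difference
t_manual = (g1.mean() - g2.mean()) / se_diff

print(f"manual t = {t_manual:.3f}")
print("scipy t  =", round(stats.ttest_ind(g1, g2).statistic, 3))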
Assumptions underlying t Tests
Observations are independent of each other (except
between paired scores in paired designs)
Homogeneity of Variance
Samples drawn from a normally distributed
population
At least interval level numerical data
Analysis of Variance (ANOVA)
Use when comparing the differences between means
from more than two groups
The independent variable is known as a “Factor”
The different conditions of this variable are known as
“levels”
Can be used with independent groups: completely randomized single-factor ANOVA
Can be used with paired groups: repeated-measures ANOVA
The F Ratio (ANOVA)
F = (variance between groups) / (variance within groups)
F = (treatment effect + differences due to chance) / (differences due to chance)
F = (variance among sample means) / (variance due to chance or error)
The denominator of the F Ratio is known as the “error
term”
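A sketch (hypothetical scores for three levels of one factor) computing the F ratio from between- and within-group variance and checking it against scipy.stats.f_oneway; the final line anticipates the r² effect-size measure described below:

# F = (variance between groups) / (variance within groups), checked against scipy.stats.f_oneway
import numpy as np
from scipy import stats

groups = [np.array([4, 5, 6, 5]),      # hypothetical scores for three treatment levels
          np.array([7, 8, 6, 9]),
          np.array([10, 9, 11, 10])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
k, N = len(groups), len(all_scores)

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)        # variance between groups
ms_within = ss_within / (N - k)          # variance within groups (the "error term")
F = ms_between / ms_within

print(f"manual F = {F:.3f}")
print("scipy F  =", round(stats.f_oneway(*groups).statistic, 3))
print(f"r^2 (effect size) = {ss_between / (ss_between + ss_within):.3f}")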
Evaluation of the F Ratio
Obtained F is compared with a critical value
If you get a significant F, all it tells you is that at
least one of the means is different from one of
the others
To figure out exactly where the differences are,
you must use Multiple Comparison Tests
Multiple Comparison Tests
The issue of “Experimentwise Error”
Results from an accumulation of “per comparison
errors”
Planned Comparisons
Can be done with t tests (must be few in number)
Unplanned Comparisons (Post Hoc tests)
Protect against experimentwise error
Examples:
Tukey’s HSD Test
The Scheffé Test
Fisher’s LSD Test
Newman-Keuls Test
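A post-hoc sketch with hypothetical groups; recent SciPy versions are assumed to provide scipy.stats.tukey_hsd (statsmodels' pairwise_tukeyhsd is an alternative):

# Post-hoc Tukey HSD after a significant F (assumes a recent SciPy with stats.tukey_hsd)
import numpy as np
from scipy import stats

g1 = np.array([4, 5, 6, 5])
g2 = np.array([7, 8, 6, 9])
g3 = np.array([10, 9, 11, 10])

res = stats.tukey_hsd(g1, g2, g3)
print(res)            # pairwise mean differences, confidence intervals, and p-values
print(res.pvalue)     # matrix of pairwise p-values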
Measuring Effect Size in ANOVA
Most common technique is r²
Tells you what percent of the variance is due to the treatment
r² = SS between groups / SS total
Single-Factor ANOVA
(One-Way ANOVA)
Can be Independent Measures
Can be Repeated Measures
Decision flowchart for choosing a test:
Do you know the population SD? Yes → use the Z test. No → ask how many groups you are comparing.
More than 2 groups → use ANOVA. If F is not significant, retain the null hypothesis; if F is significant, reject the null hypothesis and compare means with multiple comparison tests.
Only 2 groups → do you have independent data? Yes → use the independent-samples t test; No → use the paired-samples t test. If the t test is significant, reject the null hypothesis and compare the means; if not, retain the null hypothesis.
Correlational Method
No manipulation: just observe 2+
variables, then measure relationship
Also called:
Descriptive, Non-experimental, Naturalistic, Observational, or Survey design
Advantages & Disadvantages of
Correlational Methods
ADVANTAGE: Efficient for collecting lots of data in a
short time
ADVANTAGE: Can study problems you cannot study
experimentally
DISADVANTAGE: Leaves Cause-Effect Relationship
Ambiguous
DISADVANTAGE: No control over extraneous variables
The Uses of Correlation
Predicting one variable from another
Validation of Tests
Are test scores correlated with what they say
they measure?
Assessing Reliability
Consistency over time, across raters, etc
Hypothesis Testing
Correlation Coefficients
Can range from -1.0 to +1.0
The DIRECTION of a relationship is indicated by the sign
of the coefficient (i.e., positive vs. negative)
The STRENGTH of the relationship is indicated by how
closely the number approaches -1.0 or +1.0
The size of the correlation coefficient indicates the
degree to which the points on a scatterplot approximate
a straight line
As correlations increase, standard error of estimate gets smaller
& prediction becomes more accurate
The closer the correlation coefficient is to zero, the
weaker the relationship between the variables.
Types of Correlation Coefficients
The Pearson r
Most common correlation
Use with scale data (interval & ratio)
Only detects linear relationships
The coefficient of determination (r²) measures the proportion of variability in one variable accounted for by the other variable.
Used to measure “effect size” in ANOVA
The Spearman Correlation
Use with ordinal level data
Can assess correlations that are not linear
The Point-Biserial Correlation
Use when one variable is scale data but other variable is
nominal/categorical
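A sketch computing each coefficient on hypothetical data with SciPy:

# Pearson (linear, scale data), Spearman (ordinal/monotonic), and point-biserial correlations
import numpy as np
from scipy import stats

study_hours = np.array([2, 5, 1, 7, 4, 6, 3, 8])        # hypothetical scale data
exam_score  = np.array([55, 70, 50, 88, 66, 80, 60, 90])
passed      = np.array([0, 1, 0, 1, 1, 1, 0, 1])         # hypothetical dichotomous variable

r, p = stats.pearsonr(study_hours, exam_score)
print(f"Pearson r = {r:.3f}, r^2 = {r**2:.3f}, p = {p:.4f}")

rho, p = stats.spearmanr(study_hours, exam_score)
print(f"Spearman rho = {rho:.3f}, p = {p:.4f}")

rpb, p = stats.pointbiserialr(passed, exam_score)
print(f"Point-biserial r = {rpb:.3f}, p = {p:.4f}")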
Problems with Interpreting Pearson’s r
Cannot draw cause-effect conclusions
Restriction of range
Correlations can be misleading if you do not
have the full range of scores
The problem of outliers
Extreme outliers can disrupt correlations,
especially with a small n.
Introduction to Regression
In any scatterplot, there is a line that provides the “best
fit” for the data
This line identifies the “central tendency” of the data and it can
be used to make predictions in the following form:
Y = bX + a
"b" is the slope of the line, and "a" is the Y intercept (the value of Y when X = 0)
The statistical technique for finding the best fitting line is
called “linear regression,” or “regression”
What defines whether a line is the best fit or not?
The “least squares solution” (finding the line with the smallest
summed squared deviations between the line and data points)
The Standard Error of Estimate
Measure of “average error;” tells you the precision of your
predictions
As correlations increase, standard error of estimate gets smaller
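A sketch (hypothetical X and Y values) of fitting the least-squares line with scipy.stats.linregress and using it for prediction:

# Least-squares regression line Y = bX + a, with scipy.stats.linregress
import numpy as np
from scipy import stats

X = np.array([1, 2, 3, 4, 5, 6, 7, 8])                   # hypothetical predictor
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])   # hypothetical outcome

fit = stats.linregress(X, Y)
print(f"slope b = {fit.slope:.3f}, intercept a = {fit.intercept:.3f}")
print(f"r = {fit.rvalue:.3f}, r^2 = {fit.rvalue**2:.3f}, p = {fit.pvalue:.4f}")

# predict Y for a new X value using Y = bX + a
x_new = 10
print(f"predicted Y at X = {x_new}: {fit.slope * x_new + fit.intercept:.2f}")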
Simple Regression
Discovers the regression line that provides
the best possible prediction (line of best fit)
Tells you if the predictor variable is a
significant predictor
Tells you exactly how much of the
variance the predictor variable accounts
for
Multiple Regression
Gives you an equation that tells you how
well multiple variables predict a target
variable in combination with each other.
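A minimal sketch of the idea with two hypothetical predictors, fitting ordinary least-squares coefficients with NumPy (a full analysis would normally use a dedicated regression routine that also reports significance tests):

# Multiple regression sketch: predict a target from two predictors via ordinary least squares
import numpy as np

# hypothetical data: predict exam score from hours studied and hours slept
hours_studied = np.array([2, 5, 1, 7, 4, 6, 3, 8])
hours_slept   = np.array([7, 6, 5, 8, 6, 7, 5, 8])
exam_score    = np.array([55, 70, 50, 90, 66, 82, 58, 95])

# design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(hours_studied, dtype=float), hours_studied, hours_slept])
coefs, residuals, rank, _ = np.linalg.lstsq(X, exam_score, rcond=None)

intercept, b_study, b_sleep = coefs
print(f"score = {intercept:.2f} + {b_study:.2f}*studied + {b_sleep:.2f}*slept")

# proportion of variance accounted for (R^2)
predicted = X @ coefs
ss_res = np.sum((exam_score - predicted) ** 2)
ss_tot = np.sum((exam_score - exam_score.mean()) ** 2)
print(f"R^2 = {1 - ss_res / ss_tot:.3f}")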
Nonparametric Statistics
Used when the assumptions for a
parametric test have not been met:
Data not on an interval or ratio scale
Observations not drawn from a normally
distributed population
Variance in groups being compared is not
homogeneous
The Chi-Square test is the most commonly used nonparametric test when nominal-level data are collected
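A sketch of a chi-square test of independence on hypothetical nominal counts, using scipy.stats.chi2_contingency:

# Chi-square test of independence on hypothetical nominal data (treatment x outcome)
import numpy as np
from scipy import stats

# rows: treatment A / treatment B; columns: improved / not improved (hypothetical counts)
observed = np.array([[30, 10],
                     [18, 22]])

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p:.4f}")
print("expected counts under independence:\n", expected)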