CORRELATION AND
REGRESSION
PANKAJ CHOUDHARY - 07
MANDEEP SINGH - 03
Correlation
The relative position of one variable
correlates with the relative distribution
of another variable
A linear pattern of relationship
between one variable (x) and
another variable (y): an
association between two variables
Specific Example
For seven random summer days, a person
recorded the temperature and their water
consumption during a three-hour period
spent outside.

Temperature (F)    Water Consumption (ounces)
75                 16
83                 20
85                 25
85                 27
92                 32
97                 48
99                 48
How would you describe the graph?
Measuring the Relationship
Pearson’s Sample
Correlation Coefficient, r
measures the direction and the
strength of the linear association
between two numerical paired
variables.
Correlations:
Positive
Large values of X associated with large
values of Y, small values of X
associated with small values of Y.
e.g. IQ and SAT
Negative
Large values of X associated with small
values of Y & vice versa
e.g. SPEED and ACCURACY
[Figure: two scatterplots, each titled "C1 vs C2", with C1 and C2 on the axes: one showing a positive correlation, one showing a negative correlation]
Correlation does not imply
causality
Two variables might be associated
because they share a common cause.
For example, SAT scores and college
grades are highly associated, but
probably not because scoring well on the
SAT causes a student to get high grades
in college.
Being a good student, etc., would be the
common cause of the SATs and the
grades.
Formula
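\[ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) \]
\sum = the sum
n = number of paired items
x_i = input variable
y_i = output variable
\bar{x} = x-bar = mean of x's
\bar{y} = y-bar = mean of y's
s_x = standard deviation of x's
s_y = standard deviation of y's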
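As an illustration, the symbols defined above can be combined in a short Python sketch. It assumes the standardized-score form of Pearson's r, that is, the sum of the products of the standardized x and y values, divided by n - 1, with sample standard deviations (also divided by n - 1). The function names are hypothetical helpers, and the data are the temperature and water values from the Specific Example.

```python
from math import sqrt

# Temperature (F) and water consumption (ounces) from the Specific Example.
temps = [75, 83, 85, 85, 92, 97, 99]
water = [16, 20, 25, 27, 32, 48, 48]


def mean(values):
    """Arithmetic mean (x-bar or y-bar)."""
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r(xs, ys):
    """Pearson's sample correlation coefficient, assuming the z-score form."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_std(xs), sample_std(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


r = pearson_r(temps, water)
print(f"r = {r:.3f}")
```

For these seven days r comes out at roughly 0.96, a strong positive linear association between temperature and water consumption.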
Regression
Specific statistical methods for
finding the “line of best fit” for one
response (dependent) numerical
variable based on one or more
explanatory (independent)
variables.
Regression: 3 Main Purposes
To describe (or model)
To predict (or estimate)
To control (or administer)
Simple Linear Regression
Statistical method for finding
the “line of best fit”
for one response (dependent)
numerical variable
based on one explanatory
(independent) variable.
Least Squares Regression
GOAL -
minimize the
sum of the
squares of
the errors
(the vertical distances between
the data points and the line).
This also minimizes the Mean Square Error
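A minimal sketch of the least squares idea, assuming the standard closed-form solution for a single explanatory variable (slope = S_xy / S_xx, intercept = y-bar minus slope times x-bar). The function names are hypothetical, and the data are the temperature and water values from the earlier example. The last lines compare the fitted line against a slightly tilted line to show that the fitted one has the smaller sum of squared errors.

```python
# Temperature (F) and water consumption (ounces) from the earlier example.
temps = [75, 83, 85, 85, 92, 97, 99]
water = [16, 20, 25, 27, 32, 48, 48]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


intercept, slope = fit_line(temps, water)
best_sse = sum_squared_errors(temps, water, intercept, slope)

# Any other line, such as one with a slightly different slope, has a larger error.
tilted_sse = sum_squared_errors(temps, water, intercept, slope + 0.05)

print(f"slope = {slope:.4f}, intercept = {intercept:.3f}")
print(f"SSE of fitted line: {best_sse:.2f}")
print(f"SSE of tilted line: {tilted_sse:.2f}")
```

Dividing the sum of squared errors by the number of points gives the mean square error, so the line that minimizes one also minimizes the other.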
Example
Plan an outdoor party.
Estimate number of soft drinks to buy
per person, based on how hot the
weather is.
Use Temperature/Water data and
regression.
Steps to Reaching a Solution
Draw a scatterplot of the data.
Visually, consider the strength of the
linear relationship.
If the relationship appears relatively
strong, find the correlation coefficient
as a numerical verification.
If the correlation is still relatively
strong, then find the simple linear
regression line.
Steps to Reaching a Solution (cont.)
Learn to Use the for Correlation
and Regression.
Interpret the Results (in the
Context of the Problem).
Finding the Solution
Example
Temperature (F)    Water Consumption (ounces)
75                 16
83                 20
85                 25
85                 27
92                 32
97                 48
99                 48
THANKS