REGRESSION AND CORRELATION ANALYSIS
REGRESSION ANALYSIS
• Regression analysis is used to investigate and model the
relationship between a response variable and one or more
predictors.
• Very often, when 2 (or more) variables are observed, the
relationship between them can be visualized
• Regression analysis helps formulate these relationships
and the predictions based on them
REGRESSION ANALYSIS
Observe and note what is happening in a
systematic way
Form some kind of theory about the observed
facts
Draw a scatter diagram to visualize relationship
Express the relationship as a mathematical formula
Use the mathematical formula to make predictions
REGRESSION ANALYSIS
• Linear regression fits a line to the data,
producing an equation that describes the
relationship in the data, so that we can
predict one variable by measuring the other
variable.
Variation of Regression
Linear Regression
Y = α + βX
Where:
Y = Response
X = Predictor
α = Intercept
β = Slope
Variation of Regression
Second order Regression
Y = α + β₁X + β₂X²
Where:
Y = Response
X = Predictor
α = Intercept
β₁, β₂ = Coefficients
Variation of Regression
Third order Regression
Y = α + β₁X + β₂X² + β₃X³
Where:
Y = Response
X = Predictor
α = Intercept
β₁, β₂, β₃ = Coefficients
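The three model forms above differ only in how many powers of X they include. A minimal Python sketch, assuming hypothetical coefficient values chosen purely for illustration (none of them come from these slides), evaluates each form at one X value:

```python
def predict(coefficients, x):
    """Evaluate Y = a + b1*X + b2*X**2 + ... for a single X value.

    coefficients[0] is the intercept (alpha); coefficients[k] multiplies X**k.
    """
    return sum(c * x ** k for k, c in enumerate(coefficients))


# Hypothetical coefficients, for illustration only.
linear = [1.0, 2.0]                   # Y = 1 + 2X
second_order = [1.0, 2.0, 0.5]        # Y = 1 + 2X + 0.5X^2
third_order = [1.0, 2.0, 0.5, 0.25]   # Y = 1 + 2X + 0.5X^2 + 0.25X^3

for name, coeffs in [("linear", linear),
                     ("second order", second_order),
                     ("third order", third_order)]:
    print(name, predict(coeffs, 2.0))
```

Writing the coefficients as one list makes it visible that the linear model is simply the shortest member of the same family.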
Least Squares Method
• For any scatter diagram, there is virtually no limit to the
number of lines that could be drawn to describe a linear
relationship between the 2 variables
• The objective is to find the BEST FIT line for the data
concerned
• The criterion is called the method of least squares
Least Squares Method
• The sum of the squares of the vertical deviations
from the points to the line must be a minimum (the
deviations are vertical because the dependent variable
is plotted on the vertical axis)
• The linear relationship between the dependent
variable (Y) and the independent variable (X) can be
written as Y = α + βX, where α and β are
parameters describing the vertical intercept and
the slope of the regression line, respectively
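For a single independent variable, the least squares line has a standard closed-form solution: β = Sxy / Sxx and α = mean(Y) − β · mean(X). The sketch below applies it in plain Python. The data values and the function names are hypothetical, chosen only to illustrate the calculation.

```python
def least_squares_line(xs, ys):
    """Return (alpha, beta) minimizing the sum of squared vertical deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: joint variation of X and Y; Sxx: variation of X about its mean.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    beta = s_xy / s_xx                # slope
    alpha = mean_y - beta * mean_x    # vertical intercept
    return alpha, beta


def sum_squared_deviations(xs, ys, alpha, beta):
    """Sum of squared vertical deviations from the line Y = alpha + beta*X."""
    return sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

alpha, beta = least_squares_line(xs, ys)
best = sum_squared_deviations(xs, ys, alpha, beta)
# A slightly steeper line through the same intercept fits worse.
other = sum_squared_deviations(xs, ys, alpha, beta + 0.1)
print(alpha, beta, best, other)
```

Comparing `best` with `other` shows the criterion at work: moving the line away from the least squares solution increases the sum of squared deviations.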
Least Squares Method
Vertical Deviation
The coefficient of determination (r²)
In a regression analysis, one way to measure how
well a straight line fits the data is to compute the
square of the correlation, r². This statistic is
interpreted as the proportion of total variation in
the response explained by the straight-line relationship
with the explanatory variable.
The coefficient of determination (r²)
An r² value “close” to 1 is often taken
as evidence that the predictions made
using the model are going to be
adequate.
0 ≤ r² ≤ 1
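One way to see r² as a "proportion of variation explained" is to compute it as 1 minus the unexplained variation divided by the total variation. The following is a minimal sketch in plain Python, assuming a least squares straight-line fit. The data and the function name `r_squared` are hypothetical.

```python
def r_squared(xs, ys):
    """Proportion of the total variation in Y explained by the fitted line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least squares slope and intercept.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    beta = s_xy / s_xx
    alpha = mean_y - beta * mean_x
    # Total variation of Y about its mean.
    total = sum((y - mean_y) ** 2 for y in ys)
    # Variation left unexplained by the line (squared vertical deviations).
    unexplained = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - unexplained / total


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
scattered = [2, 4, 5, 4, 5]     # points scattered around a line
on_a_line = [3, 5, 7, 9, 11]    # points exactly on Y = 1 + 2X

print(r_squared(xs, scattered))
print(r_squared(xs, on_a_line))
```

Points lying exactly on a line leave nothing unexplained, so r² reaches its upper limit of 1. Scattered points give a value between 0 and 1.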
Correlation – Definition
Correlation calculates the Pearson product moment
coefficient of correlation (also called the correlation
coefficient or correlation) for pairs of variables. The
correlation coefficient is a measure of the degree of
linear relationship between two variables.
Variation of Correlation
• No pattern. Data points are scattered randomly in the
chart.
• Negative correlation. Larger values of one variable (input)
are associated with smaller values of the other variable
(effect).
• Positive correlation. Larger values of one variable (cause)
are associated with larger values of the other variable
(effect).
• Complex pattern. This often occurs when some other factor
at work interacts with one of the factors.
Suppose we wished to graph the relationship between
foot length and height of 20 subjects.
In order to create the graph, which is called a
scatterplot or scattergram, we need the foot length
and height for each of our subjects.
[Figure: scatterplot axes. x-axis: Foot Length (4 to 14); y-axis: Height (58 to 74)]
Assume our first subject had a 12 inch foot and
was 70 inches tall.
1. Find 12 inches on the x-axis.
2. Find 70 inches on the y-axis.
3. Locate the intersection of 12 and 70.
4. Place a dot at the intersection of 12 and 70.
[Figure: scatterplot axes. x-axis: Foot Length (4 to 14); y-axis: Height (58 to 74)]
Assume that our second subject had an 8 inch foot and
was 62 inches tall.
5. Find 8 inches on the x-axis.
6. Find 62 inches on the y-axis.
7. Locate the intersection of 8 and 62.
8. Place a dot at the intersection of 8 and 62.
9. Continue to plot points for each pair of scores.
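The plotting steps can also be sketched in code. The example below is a rough text-only illustration in plain Python, not a substitute for a real plotting tool. The function name `plot_points` is hypothetical, and the sketch assumes every point falls exactly on a tick value. It uses only the two subjects described in the steps.

```python
def plot_points(points, x_ticks, y_ticks):
    """Return text rows of a scatterplot with a '*' at each (x, y) point.

    Assumes every point falls exactly on a tick value.
    """
    rows = []
    for y in reversed(y_ticks):          # largest y value goes on the top row
        # Mark the intersection of this y value with each x value.
        cells = ["*" if (x, y) in points else "." for x in x_ticks]
        rows.append(f"{y:>2} | " + "  ".join(cells))
    # Bottom row: x-axis tick labels, aligned under the columns.
    rows.append("     " + " ".join(f"{x:<2}" for x in x_ticks))
    return rows


x_ticks = list(range(4, 15, 2))    # foot length axis: 4, 6, ..., 14
y_ticks = list(range(58, 75, 2))   # height axis: 58, 60, ..., 74

# The two subjects from the steps: (foot length, height) in inches.
points = [(12, 70), (8, 62)]

for row in plot_points(points, x_ticks, y_ticks):
    print(row)
```

Each `*` sits where a subject's foot length column meets that subject's height row, which is the same intersection the numbered steps locate by hand.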
[Figure: scatterplot; x-axis 4 to 14, y-axis 58 to 74]
Notice how the scores cluster to form a pattern.
The more closely they cluster to a line that is drawn
through them, the stronger the linear relationship between
the two variables is (in this case foot length and height).
Pearson's correlation coefficient (r)
Measures the degree of linear relationship between
two variables. The correlation coefficient assumes
a value between -1 and +1. If one variable tends to
increase as the other decreases, the correlation
coefficient is negative. Conversely, if the two
variables tend to increase together the correlation
coefficient is positive.
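A minimal sketch of the calculation, using the standard formula r = Sxy / √(Sxx · Syy) in plain Python. The data values are hypothetical and serve only to show one rising and one falling pattern.

```python
import math


def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]     # tends to increase as xs increases
falling = [5, 4, 5, 4, 2]    # tends to decrease as xs increases

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
print(r_rising, r_falling)
```

The sign of Sxy decides the sign of r, so the rising data gives a positive coefficient and the falling data a negative one.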
If the points on the scatterplot
have an upward movement
from left to right, we say the
relationship between the
variables is positive.
If the points on the
scatterplot have a
downward movement from
left to right, we say the
relationship between the
variables is negative.
A positive relationship means that high scores on one
variable are associated with high scores on the other
variable.
It also indicates that low scores on one variable
are associated with low scores on the other variable.
A negative relationship means that high scores on one
variable are associated with low scores on the other variable.
It also indicates that low scores on one variable
are associated with high scores on the other variable.
Not only do relationships have direction (positive and
negative), they also have strength (from 0.00 to 1.00 and
from 0.00 to –1.00).
The more closely the points cluster toward a straight line,
the stronger the relationship is.
[Figure: scatterplots for r = 0.00, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90 and 1.00]
A set of scores with r = –0.60 has the same strength as
a set of scores with r = 0.60 because both sets cluster
similarly.
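In code, strength can be compared through the absolute value of r. The sketch below uses hypothetical data and mirrors the Y values so that the direction flips while the clustering stays the same. It is an illustration under those assumptions, not a reproduction of the r = ±0.60 example.

```python
import math


def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
mirrored = [-y for y in ys]   # same scatter, flipped upside down

r_pos = pearson_r(xs, ys)
r_neg = pearson_r(xs, mirrored)

# Direction differs (the sign), strength does not (the absolute value).
print(r_pos, r_neg, abs(r_pos), abs(r_neg))
```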
For this unit, we use Pearson’s r. This statistical
procedure can only be used when BOTH variables are
measured on a continuous scale and you wish to measure
a linear relationship.
[Figure: two panels. "Linear Relationship": Pearson r; "Curvilinear Relationship": NO Pearson r]
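The restriction to linear relationships can be checked numerically. In the hypothetical data below, Y is completely determined by X through Y = X², yet Pearson's r comes out at zero because the relationship is curvilinear. A straight-line relationship on the same X values is included for contrast.

```python
import math


def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data, for illustration only.
xs = [-3, -2, -1, 0, 1, 2, 3]
curved = [x ** 2 for x in xs]        # curvilinear: Y = X^2
straight = [2 * x + 1 for x in xs]   # linear: Y = 1 + 2X

r_curved = pearson_r(xs, curved)
r_straight = pearson_r(xs, straight)
print(r_curved, r_straight)
```

A value of r near zero therefore means "no linear relationship", not "no relationship", which is why a scatterplot is worth checking before relying on r.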