Chapter 9-Correlation and Regression

This document discusses Simple Linear Regression, explaining how it models the relationship between a dependent variable (Y) and an independent variable (X) using a linear equation. It provides examples of calculating regression coefficients, correlation coefficients, and the coefficient of determination, illustrating the process with data on hours studied and quiz grades. The document emphasizes the importance of regression analysis for predicting outcomes and understanding relationships between variables.

Uploaded by

danar.talabani

Chapter 9

Correlation and Regression


Simple Linear Regression

24 January 2024

24 January 2024 Statistics - Spring Semester 2023-2024


Regression

• Using regression analysis, we can derive an equation by which the
dependent variable (Y) is expressed (and estimated) in terms of
its relationship with the independent variable (X).
• In simple regression, there is only one independent variable (X) and
one dependent variable (Y). The dependent variable is the
outcome we are trying to predict.
• In multiple regression, there are several independent variables (X1,
X2, …), and still only one dependent variable, Y. We are trying to use
the X variables to predict the Y variable.
• We will be studying Simple Linear Regression, in which we
investigate whether X and Y have a linear relationship.



A Simple Regression Example

• A researcher wishes to determine the relationship between X and Y. It is
important to know which is the dependent variable, the one you are
trying to predict. Presumably, hours studied affects grades on a quiz, not
the other way around. The data:

X (hours)    Y (grade on quiz)
1            40
2            50
3            60
4            70
5            80



• If X and Y have a linear relationship, and you would like to plot the line,
what would it look like?
• Take a moment and try it. Note that the data points fall on a straight
line.
• Now, add another point: if X = 6, then Y = ?
• We see that as X changes by 1, Y changes by 10. That is, the slope,
called b1, is equal to 10.
• b1 = ∆Y / ∆X = 10.
• The Y-intercept, called b0, is the value of Y when X = 0; here it is
equal to 30.
• That is, someone who studies 0 hours for the quiz should expect a
grade of 30.
• Then, every additional hour studied adds 10 points to the score on the
quiz.
• The equation for the plot of this data is: Ŷ = 30 + 10X
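
The reasoning in the bullets above can be checked with a few lines of Python. This is only a sketch: the variable names and the `predict` helper are illustrative, and the slope is read off two points, which works here only because the data fall exactly on a line.

```python
# Data from the table above: hours studied (X) and quiz grade (Y).
hours = [1, 2, 3, 4, 5]
grades = [40, 50, 60, 70, 80]

# Slope b1 = change in Y / change in X, taken between the first and last points.
b1 = (grades[-1] - grades[0]) / (hours[-1] - hours[0])

# Intercept b0 is the value of Y at X = 0: solve Y = b0 + b1*X at one point.
b0 = grades[0] - b1 * hours[0]


def predict(x):
    """Predicted grade for x hours studied, using Y-hat = b0 + b1*x."""
    return b0 + b1 * x


# Every observed point lies exactly on the line.
assert all(predict(x) == y for x, y in zip(hours, grades))

print(b1, b0, predict(6))  # 10.0 30.0 90.0
```

The last value answers the question posed above: at X = 6 the line gives Y = 90.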



• If you plot the above X and Y, you will find that all the points are
on a straight line. This, of course, means that we have a perfect
linear relationship between X and Y.
• From studying correlation you know that for this data r = +1 and
R² = 100%.
[Figure: plot of Y against X for the data above; horizontal axis from 0 to 6, vertical axis from 0 to 90.]



Simple Linear Regression
• In general, the simple linear regression equation is:
Ŷi = b0 + b1Xi

• Why do we need regression in addition to correlation?
    • To predict a Y for a new value of X.
    • To answer questions regarding the slope. For example,
        • With additional shelf space (X), what effect will there be on sales
        (Y)?
        • If we raise prices by a particular amount or percentage, will it
        cause sales to drop?
    • It makes the scatter plot a better display (graph) of the data if we
    can plot a line through it. It presents much more information on
    the diagram.

• In correlation, on the other hand, we just want to know if two variables
are related. This is used a lot in social science research.
• The regression equation Ŷi = b0 + b1Xi is a sample estimator of the true
population regression equation, which we could build were we to take a
census:

• Yi = β0 + β1Xi + εi
where,
β0 = true Y intercept for the population
β1 = true slope for the population
εi = random error in Y for observation i

• In regression analysis, we hypothesize that there is a true regression line
for the population. The b0 and b1 coefficients are estimates of the true
population coefficients, β0 and β1.



Ŷi is a point on the regression line
Yi is an individual data value

• The deviations of the individual observations (the points) from the
regression line, (Yi − Ŷi), are called the residuals and are denoted by ei,
where ei = (Yi − Ŷi).

• Some deviations are positive (for the points above the line); some are
negative (for the points below the line). If a point is on the line, its
deviation = 0. Note that Σei = 0.
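
As a rough illustration of residuals and their signs, the sketch below fits a line to the n = 7 hours-studied data used in the example later in this chapter and lists each residual. It assumes the slope and intercept formulas given under "Steps in Regression"; the variable names are illustrative.

```python
# Hours studied (X) and quiz grade (Y): the n = 7 example later in this chapter.
x = [1, 2, 3, 4, 5, 6, 7]
y = [5, 8, 9, 10, 11, 12, 14]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Least squares slope and intercept (raw-sum formulas from "Steps in Regression").
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

# Residual e_i = Y_i - Y-hat_i for every observation.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

for xi, yi, ei in zip(x, y, residuals):
    side = "above" if ei > 0 else "below"
    print(f"X={xi}  Y={yi}  residual={ei:+.4f}  ({side} the line)")

# The residuals cancel out, up to floating-point rounding.
print("residuals sum to zero:", abs(sum(residuals)) < 1e-9)
```

Some residuals come out positive and some negative, and their total is zero apart from rounding error, which is the property noted above.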
• Mathematically, the regression line minimizes the sum of the squared
errors (SSE):
Σei² = Σ(Yi − Ŷi)² = Σ[Yi − (b0 + b1Xi)]²
• This is why regression is called a “least squares” method, and the
regression line the least squares line: it is the line that minimizes the
sum of squared residuals (SSE).
• Taking partial derivatives of SSE with respect to b0 and b1 and setting
them to zero, we get the “normal equations” that are used to solve for
b0 and b1.
• In regression, the levels of X are considered to be fixed. Y is the
random variable.
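
The step from partial derivatives to the normal equations is not written out on the slide. A standard version of that derivation, using the same symbols as above, might look like this:

```latex
\begin{align*}
\mathrm{SSE} &= \sum (Y_i - b_0 - b_1 X_i)^2 \\[4pt]
\frac{\partial\,\mathrm{SSE}}{\partial b_0} &= -2\sum (Y_i - b_0 - b_1 X_i) = 0
  \;\Rightarrow\; \sum Y_i = n b_0 + b_1 \sum X_i \\[4pt]
\frac{\partial\,\mathrm{SSE}}{\partial b_1} &= -2\sum X_i (Y_i - b_0 - b_1 X_i) = 0
  \;\Rightarrow\; \sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2 \\[4pt]
b_1 &= \frac{n\sum X_i Y_i - (\sum X_i)(\sum Y_i)}{n\sum X_i^2 - (\sum X_i)^2},
\qquad b_0 = \bar{Y} - b_1 \bar{X}
\end{align*}
```

The first normal equation is the statement Σei = 0, and solving the two equations together gives the slope and intercept formulas used in the steps that follow.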

• In the example below, we see that most of the points are either above
the line or below the line. Only about 5 points are actually on the line or
touching it.



Steps in Regression
1- For Xi (independent variable) and Yi (dependent variable),
calculate:
ΣYi
ΣXi
ΣXiYi
ΣXi²
ΣYi²

2- Calculate the correlation coefficient, r:

$$ r = \frac{n\sum X_i Y_i - (\sum X_i)(\sum Y_i)}{\sqrt{\left[n\sum X_i^2 - (\sum X_i)^2\right]\left[n\sum Y_i^2 - (\sum Y_i)^2\right]}} $$

-1 ≤ r ≤ 1
[This can be tested for significance. H0: ρ=0. If the correlation is not
significant, then X and Y are not related. You really should not be doing
this regression.]
3- Calculate the coefficient of determination: r² = (r)²
0 ≤ r² ≤ 1
This is the proportion of the variation in the dependent variable (Yi) explained by
the independent variable (Xi).

4- Calculate the regression coefficient b1 (the slope):

$$ b_1 = \frac{n\sum X_i Y_i - (\sum X_i)(\sum Y_i)}{n\sum X_i^2 - (\sum X_i)^2} $$

Note that you have already calculated the numerator and the denominator for parts
of r. Other than a single division operation, no new calculations are required.
BTW, r and b1 are related. If a correlation is negative, the slope term must be
negative; a positive slope means a positive correlation.

5- Calculate the regression coefficient b0 (the Y-intercept, or constant):

$$ b_0 = \bar{Y} - b_1 \bar{X} $$

The Y-intercept (b0) is the predicted value of Y when X = 0.


6- The regression equation (a straight line) is:
Ŷi = b0 + b1Xi
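
The six steps above can be sketched as one small Python function. The function name, its return layout and the error checks are illustrative choices rather than part of the slides; the formulas are the raw-sum formulas from steps 2, 4 and 5.

```python
import math


def simple_linear_regression(x, y):
    """Follow steps 1-5 above and return (r, r_squared, b1, b0)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    # Step 1: the five sums.
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)

    # Pieces shared by r and b1.
    numerator = n * sum_xy - sum_x * sum_y
    x_part = n * sum_x2 - sum_x ** 2
    y_part = n * sum_y2 - sum_y ** 2
    if x_part == 0 or y_part == 0:
        raise ValueError("x and y must each take more than one value")

    r = numerator / math.sqrt(x_part * y_part)  # Step 2
    r_squared = r ** 2                          # Step 3
    b1 = numerator / x_part                     # Step 4: one extra division
    b0 = sum_y / n - b1 * sum_x / n             # Step 5: Y-bar minus b1 * X-bar
    return r, r_squared, b1, b0


# Step 6, using the hours/grade data from the first example in this chapter.
r, r_squared, b1, b0 = simple_linear_regression([1, 2, 3, 4, 5],
                                                [40, 50, 60, 70, 80])
print(f"r = {r}, r^2 = {r_squared}, Y-hat = {b0} + {b1} X")
```

For that data set the function reproduces r = +1 and the line Ŷ = 30 + 10X found earlier by inspection.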
Example: The shear resistance of soil, y (kN/m²), is determined by
measurements as a function of the normal stress, x (kN/m²). The
data are as shown below:

x     y
10    14.1
11    15.6
12    16.9
13    17.7
14    18.3
15    20.0
16    21.0
17    21.7
18    22.6
19    24.0

1- Find the simple linear regression model.
2- State the strength and the nature of the relation between x and y.

Solution: step-1
n = 10

i      x      y       x.y      x²      y²
1      10     14.1    141      100     198.81
2      11     15.6    171.6    121     243.36
3      12     16.9    202.8    144     285.61
4      13     17.7    230.1    169     313.29
5      14     18.3    256.2    196     334.89
6      15     20.0    300      225     400
7      16     21.0    336      256     441
8      17     21.7    368.9    289     470.89
9      18     22.6    406.8    324     510.76
10     19     24.0    456      361     576
sum    145    191.9   2869.4   2185    3774.61

Σxi = 145
Σyi = 191.9
Σxi.yi = 2869.4
Σxi² = 2185
Σyi² = 3774.61



2- Calculate the correlation coefficient, r:

$$ r = \frac{n\sum X_i Y_i - (\sum X_i)(\sum Y_i)}{\sqrt{\left[n\sum X_i^2 - (\sum X_i)^2\right]\left[n\sum Y_i^2 - (\sum Y_i)^2\right]}} $$

-1 ≤ r ≤ 1
[This can be tested for significance. H0: ρ=0. If the correlation is not
significant, then X and Y are not related. You really should not be doing
this regression.]

$$ r = \frac{10(2869.4) - (145)(191.9)}{\sqrt{\left[10(2185) - (145)^2\right]\left[10(3774.61) - (191.9)^2\right]}} = 0.9966 $$

Therefore the relation is strong and direct.


3- Calculate the coefficient of determination: r² = (r)²
0 ≤ r² ≤ 1
This is the proportion of the variation in the dependent variable (Yi) explained by
the independent variable (Xi).
r² = (0.9966)² = 0.993

4- Calculate the regression coefficient b1 (the slope):

$$ b_1 = \frac{n\sum X_i Y_i - (\sum X_i)(\sum Y_i)}{n\sum X_i^2 - (\sum X_i)^2} = \frac{868.5}{825} = 1.0527 $$

Note that you have already calculated the numerator and the denominator for parts
of r. Other than a single division operation, no new calculations are required.
BTW, r and b1 are related. If a correlation is negative, the slope term must be
negative; a positive slope means a positive correlation.

5- Calculate the regression coefficient b0 (the Y-intercept, or constant):

$$ b_0 = \bar{Y} - b_1 \bar{X} = \frac{191.9}{10} - 1.0527\left(\frac{145}{10}\right) = 3.9258 $$

The Y-intercept (b0) is the predicted value of Y when X = 0.

6- The regression equation (a straight line) is:
Ŷi = b0 + b1Xi

Ŷi = 3.9258 + 1.0527Xi
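
As a cross-check on the soil example, the sketch below recomputes r, b1 and b0 from the same ten data pairs. It uses the deviation form of the sums (deviations from the means) instead of the raw sums; the two forms are algebraically equivalent, and the variable names here are illustrative.

```python
import math

# Soil data from the example: normal stress x and shear resistance y (kN/m^2).
x = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
y = [14.1, 15.6, 16.9, 17.7, 18.3, 20.0, 21.0, 21.7, 22.6, 24.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Deviation-form sums; n times each one equals the matching raw-sum expression.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

r = s_xy / math.sqrt(s_xx * s_yy)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

print(f"r  = {r:.4f}")
print(f"r2 = {r * r:.3f}")
print(f"b1 = {b1:.4f}")
print(f"b0 = {b0:.4f}")
```

The values of r, r² and b1 agree with the hand calculation. The intercept comes out at about 3.9255 rather than 3.9258 because the hand calculation substitutes the slope already rounded to 1.0527.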
7- [OPTIONAL] Then we can test the regression for statistical significance.

There are 3 ways to do this in simple regression:


(a) t-test for correlation:

H0: ρ=0
Example: Hours Studied & Grades

Let’s examine the relationship between hours studied and grade on a quiz.
n=7 pairs of data. Highest grade on quiz is a 15.
X ≡ hours studied; Y ≡ grade on quiz.

Xi        Yi        XiYi       Xi²        Yi²
1         5         5          1          25
2         8         16         4          64
3         9         27         9          81
4         10        40         16         100
5         11        55         25         121
6         12        72         36         144
7         14        98         49         196
ΣX = 28   ΣY = 69   ΣXY = 313  ΣX² = 140  ΣY² = 731



∑X = 28
∑Y = 69
∑XY = 313
∑X² = 140
∑Y² = 731

Calculate the correlation coefficient, r:

$$ r = \frac{7(313) - (28)(69)}{\sqrt{\left[7(140) - (28)^2\right]\left[7(731) - (69)^2\right]}} = \frac{259}{\sqrt{(196)(356)}} = \frac{259}{264.2} = 0.98 $$



Calculate the coefficient of determination, R²:
R² = (0.98)² = 0.9604

Calculate the regression coefficient b1 (the slope):

$$ b_1 = \frac{7(313) - (28)(69)}{7(140) - (28)^2} = \frac{259}{196} = 1.32 $$

Calculate the regression coefficient b0 (the Y-intercept, or
constant):

b0 = (69/7) − 1.32(28/7) = 4.58

• The regression equation:

Ŷi = 4.58 + 1.32Xi
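
To connect this example back to the meaning of the coefficient of determination (the proportion of the variation in Y explained by X), the sketch below computes it as 1 − SSE/SST. That identity is a standard result for a least squares line with an intercept and is not stated on the slides. SST (the total sum of squares about the mean of Y) and the variable names are introduced here for illustration.

```python
# Hours studied (X) and quiz grade (Y) from the table above.
x = [1, 2, 3, 4, 5, 6, 7]
y = [5, 8, 9, 10, 11, 12, 14]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Unrounded least squares slope and intercept.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

fitted = [b0 + b1 * xi for xi in x]
y_bar = sum_y / n

# SSE: variation left unexplained by the line. SST: total variation in Y.
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst

print(f"Y-hat = {b0:.2f} + {b1:.2f} X")
print(f"SSE = {sse:.4f}, SST = {sst:.4f}, R^2 = {r_squared:.4f}")
```

Without intermediate rounding, R² is about 0.9614 and b0 is about 4.57. The values 0.9604 and 4.58 above arise from squaring r after rounding it to 0.98 and from using the slope rounded to 1.32.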
