IIMT 2641 Introduction to Business Analytics
Module 3: Linear Regression
Topic 1: Simple Linear Regression
Bordeaux wine
§ Large differences in price and quality between years, although wine is
produced in a similar way
§ Meant to be aged, so hard to tell if wine will be good when it is on the
market
§ Expert tasters predict which ones will be good
§ Can analytics be used to come up with a different system for judging wine?
Predicting the quality of wine
§ March 1990 - Orley Ashenfelter, a Princeton economics professor, claims
he can predict wine quality without tasting the wine
Building a model
§ Ashenfelter used a method called linear regression
– Predicts an outcome variable, or dependent variable
– Predicts using a set of independent variables
Building a model
§ Dependent variable:
– Typical price in 1990-1991 wine auctions (approximates quality)
– Conduct a logarithmic transformation
q This gives a better linear fit
§ Independent variables:
– Age of wine (in 1990)
q Older wines are more expensive
– Weather
q Average Growing Season Temperature (AGST)
q Harvest Rain
q Winter Rain
– Population of France
The wine data (1952 - 1978)
Quick Question: What is the relationship between harvest rain, average
growing season temperature, and wine prices?
Baseline model (?)
Baseline model (Take the mean)
[Figure: the baseline model predicts the mean of y, ȳ, for every value of x]
One-Variable Linear Regression
Simple Regression Model
The population model of y with one predictor variable x is:

y = β0 + β1·x + ε

§ y is the dependent variable (DV)
§ x is the independent variable (IV)
§ Regression function: E[Y|x] = β0 + β1·x is the mean of Y given x
§ β0 is the y-intercept (the value of E[Y|x] when x = 0)
§ β1 is the slope for x, which is the change in E[Y|x] for a unit increase in x
§ Random errors ε (derivations not required):
– The random errors are a random sample from N(0, σ²), i.e. i.i.d. (independent and identically distributed) random variables
– Each observation has its own random error
– The regression output does not show the errors, but it does estimate their standard deviation, se
§ The random errors ε and the IV (X) are uncorrelated
§ These assumptions are important for effective business analytics
Estimated Regression Function
§ Estimate the regression model with n observations (xi, yi) for i = 1, …, n
§ The estimated or predicted value of y given x is:

ŷ = b0 + b1·x

§ b0 is the sample estimate of the population intercept β0
§ b1 is the sample estimate of the population slope β1
§ b0 and b1 are sample statistics (similar to x̄) and therefore have sampling distributions
One-Variable Linear Regression
[Figure: scatter plot of the data with the fitted line ŷ = 1 + 2x]
Data and Predicted Values
§ What is the observed y when x = 1?
§ What is the predicted y when x = 1?
§ What is the observed y when x = 4?
§ What is the predicted y when x = 4?
Data and Predicted Values
§ What is the observed y when x = 1?
– y = 6
§ What is the predicted y when x = 1?
– ŷ = 1 + (2)(1) = 3
§ What is the observed y when x = 4?
– y = 4
§ What is the predicted y when x = 4?
– ŷ = 1 + (2)(4) = 9
Estimated Model and Residuals
§ Residuals are the difference between the observed values of y and the predicted values ŷ:
– r = y − ŷ
– Each observation has one observed y, one predicted ŷ, and one residual r.
§ The residuals are the errors between the observed and predicted values.
[Figure: four observations y1, …, y4 with their predicted values ŷ1, …, ŷ4 and residuals ri = yi − ŷi]
Computing Residuals
[Figure: the fitted line with residuals r1, r2, r3, r4 marked]
§ What is the residual r2 at x = 2?
§ What is the residual r3 at x = 3?
Computing Residuals
§ What is the residual r2 at x = 2?
– r2 = y2 − ŷ2 = 3 − (1 + 2·2) = 3 − 5 = −2
§ What is the residual r3 at x = 3?
– r3 = y3 − ŷ3 = 11 − (1 + 2·3) = 11 − 7 = 4
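The residual calculations above are easy to check in a few lines of code. A minimal sketch in Python (the slides themselves use spreadsheet output, so the language choice here is an assumption); the observed y values are the ones read off the slide's figure:

```python
# Residuals for the slide's estimated line y_hat = 1 + 2x,
# using the observed points from the figure.
def y_hat(x):
    return 1 + 2 * x

observed = {1: 6, 2: 3, 3: 11, 4: 4}  # x -> observed y
residuals = {x: y - y_hat(x) for x, y in observed.items()}
print(residuals)  # {1: 3, 2: -2, 3: 4, 4: -5}
```

Note that r2 = −2 and r3 = 4 match the hand calculations on the slide.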
Ordinary Least Squares (OLS) Criterion
The least squares line finds the estimates b0 and b1 of the coefficients that minimize the sum of squared errors for a sample {(xi, yi)} with n observations:

SSE = Σ_{i=1}^n (yi − ŷi)², where ŷi = b0 + b1·xi for i = 1, …, n

Why squared? Because the sum of the raw (unsquared) residuals could be zero even for a badly fitting line.

Writing SSE(b0, b1) = Σ_{i=1}^n (yi − b0 − b1·xi)² and setting

∂SSE(b0, b1)/∂b0 = 0 and ∂SSE(b0, b1)/∂b1 = 0

gives:

b1 = Σ_{i=1}^n (xi − x̄)(yi − ȳ) / Σ_{i=1}^n (xi − x̄)²
b0 = ȳ − b1·x̄

x̄: sample average of the independent variable
ȳ: sample average of the dependent variable
(You do not need to memorize the derivation.)
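The closed-form OLS formulas can be applied directly without any library. A minimal Python sketch on a small made-up dataset (the x and y values are illustrative, not the wine data):

```python
# OLS estimates for one predictor:
#   b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
#   b0 = y_bar - b1 * x_bar
x = [1, 2, 3, 4, 5]             # hypothetical IV values
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical DV values

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Sum of squared errors at the minimizing (b0, b1)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
print(b0, b1, sse)
```

Any other (b0, b1) pair would give a larger SSE on this sample; that is exactly the least squares criterion.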
Estimate a Linear Model (One Variable)
[Regression output: estimated standard errors for the estimated intercept and slope coefficients, t-scores, p-values, and R-squared]
§ H0: AGST coefficient = 0 versus HA: AGST coefficient ≠ 0
§ t-score = (Estimated Coefficient − 0)/(Standard Error)
§ Two-tail test: p-value = 2·P(T < −|t-score|)
§ Coefficient of Determination: R-squared
One-Variable Linear Regression
ŷ = −3.4178 + 0.6351·AGST
Estimate a Linear Model (One Variable)
• Estimated model for price:
ŷ = −3.4178 + 0.6351·AGST
• The predicted LogPrice increases by 0.6351 for every 1-degree increase in average growing season temperature.
• If AGST = 15, then ŷ = ?
• If AGST = 18, then ŷ = ?
• If AGST = 20, then ŷ = ?
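The three predictions asked for above follow by plugging each AGST value into the estimated model. A quick Python check (the slide expects a hand calculation; the function name here is our own):

```python
# Estimated model from the slide: y_hat = -3.4178 + 0.6351 * AGST
def predict_log_price(agst):
    return -3.4178 + 0.6351 * agst

for agst in (15, 18, 20):
    print(agst, round(predict_log_price(agst), 4))
# 15 -> 6.1087, 18 -> 8.014, 20 -> 9.2842
```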
T-Tests for the Coefficients: H0: βj = 0 versus HA: βj ≠ 0
Two-Tail Test for the Slope
(Very important: can you predict Y from X?)
H0: β1 = 0 versus HA: β1 ≠ 0
• t-score = (coefficient − 0)/(std. error)
• t-score = (0.6351 − 0)/0.1509 = 4.208
• p-value = 2·P(T < −|4.208|) = 2·T.DIST(−4.208, 23, TRUE) < 0.001
• df = n − 1 − #IV = 25 − 1 − 1 = 23, which matches the error df shown under "Sum of Squares" in the regression output
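The p-value on this slide can be reproduced without a spreadsheet. The sketch below (our own construction, not the course's method) computes the t-score and then integrates the Student's t density numerically as a stand-in for Excel's T.DIST:

```python
import math

# Slope t-test from the slide: t = (b1 - 0) / SE(b1), with df = n - 1 - #IV = 23.
b1, se, df = 0.6351, 0.1509, 23
t_score = (b1 - 0) / se  # about 4.21

def t_pdf(t, v):
    """Density of Student's t-distribution with v degrees of freedom."""
    c = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return c * (1 + t * t / v) ** (-(v + 1) / 2)

# Two-tailed p-value = 2 * P(T > |t|); the tail probability is approximated
# with a simple Riemann sum over the density.
step, upper = 1e-3, 60.0
tail = sum(t_pdf(abs(t_score) + i * step, df) * step
           for i in range(int((upper - abs(t_score)) / step)))
p_value = 2 * tail
print(round(t_score, 3), p_value < 0.001)
```

The result confirms the slide: the two-tailed p-value is below 0.001, so the AGST slope is statistically significant.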
How Well the Model Fits the Data
§ The simplest commonly used measure of fit is R² (the coefficient of determination): R² = 1 − SSE/SST
– SSE = Σ_{i=1}^n (yi − ŷi)²: sum of squared errors
q The variation of Y that cannot be explained by the regression
– SST = Σ_{i=1}^n (yi − ȳ)²: total sum of squares
q The total amount of variation of Y around its mean
q The "error" generated by a baseline model without any inputs
– Decomposition of the variation of Y:
Σ_{i=1}^n (yi − ȳ)² = Σ_{i=1}^n (yi − ŷi)² + Σ_{i=1}^n (ŷi − ȳ)²
(Total variation = Unexplained variation + Explained variation)
§ R² is the proportion of the variance in the DV explained by the regression model.
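The decomposition SST = SSE + SSR (which holds exactly for an OLS fit with an intercept) is easy to verify numerically. A Python sketch with made-up data, not the wine dataset:

```python
# Verify SST = SSE + SSR and compute R^2 for an OLS fit (illustrative data).
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
r2 = 1 - sse / sst
print(round(r2, 4))  # close to 1: x explains almost all variation in y
```

Equivalently, r2 equals ssr / sst, since the total variation splits exactly into the explained and unexplained pieces.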
Coefficient of Determination: R-Squared
• R-squared is a measure of fit
• A bigger R-squared indicates a better fit, all else being equal
• 43.5% of the variation in prices is explained by the simple regression on AGST
• 0 ≤ R-squared ≤ 1
Use Each Variable on Its Own
§ R² = 0.44 using average growing season temperature (variable significant at 0.001)
§ R² = 0.32 using harvest rain (variable significant at 0.01)
§ R² = 0.22 using France population (variable significant at 0.05)
§ R² = 0.20 using age (variable significant at 0.05)
§ R² = 0.02 using winter rain (not significant)
§ Multiple linear regression allows us to use more than one variable to potentially improve our predictive ability.