Chapter 7. Regression Models
Nguyen VP Nguyen, Ph.D.
Department of Industrial & Systems Engineering, HCMUT
Email: nguyennvp@hcmut.edu.vn
Overview
• Linear association implies a straight-line relationship
• Regression is used for a purpose: the regression of Y on X (predicting Y from X)
• The difference between regression and hypothesis testing:
Whereas hypothesis testing is used to test parameters of one or two populations based on a sample,
regression is used to identify a relationship between two or more variables.
Regression Models
• A regression model is a simplified or ideal
representation of the real world.
Latin "re-" ("back") plus "-gredior, -gredi, -gressus sum" ("go");
the "-ion" suffix is common for forming nouns.
Thus "regression" literally means "going back".
• A regression model shows how to fit a straight
line to pairs of observations on the two
variables using the method of least squares.
• All scientific inquiry is based to some extent on
models - that is, on simplifying assumptions -
and regression, too, rests on such a set of assumptions.
Scatterplots
• Linear means that Y changes in proportion to X, so that a straight line can be drawn to
describe their relationship to one another.
• Which of these plots seem to show a linear relationship?
[Four scatterplots labeled a-d: Y vs X, Y2 vs X2, Y3 vs X3, and Y4 vs X4, each with the horizontal axis running from 0 to 16.]
Regression Models
Mathematical Model (fitted equation): $\hat{Y}_t = b_0 + b_1 t$ or $\hat{Y}_t = b_0 + b_1 X_t$
Statistical Model: $Y_t = \beta_0 + \beta_1 t + \varepsilon_t$ or $Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$
The "hats" indicate estimated numbers.
where
Y: dependent variable (DV)
t or X: independent variable (IV)
b0: intercept of the fitted line
b1: slope of the line
Sum of squares of error
• One measure of the error in our model is the sum of the squares of the errors:
$\text{SSE} = \sum_t (Y_t - \hat{Y}_t)^2$
• The best-fit line for the series $Y_t$ is obtained by minimizing the sum of squared vertical
distances from the data points to the line.
This is called the Least Squares Criterion or Least Squares Method.
$\min_{b_0,\, b_1} \text{SSE} = \sum_t (Y_t - \hat{Y}_t)^2 = \sum_t (Y_t - b_0 - b_1 X_t)^2$
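As a quick numerical illustration of the criterion, here is a minimal Python sketch (the data values are made up for illustration and are not Mr. Bump's data) that computes b0 and b1 from the usual closed-form least squares solution and evaluates SSE:

```python
# Minimal sketch of the least squares criterion (illustrative data only).
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)               # independent variable
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.2])    # dependent variable

# Closed-form least squares estimates: b1 = S_xy / S_xx, b0 = Ybar - b1 * Xbar
x_bar, y_bar = X.mean(), Y.mean()
b1 = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

Y_hat = b0 + b1 * X                 # fitted values
SSE = np.sum((Y - Y_hat) ** 2)      # sum of squared errors being minimized

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {SSE:.3f}")
```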
The Intercept and Slope
• The intercept (or "constant term") indicates where the regression line intercepts the vertical axis.
Some people call this a "shift parameter" because it "shifts" the regression line up or down on the graph.
• The slope indicates how Y changes as X changes (e.g., if the slope is positive, as X increases, Y also
increases; if the slope is negative, as X increases, Y decreases).
Example 2 of Mr. Bump’s data
Residual Plots for Selling Level (1000 gallons) ~ Y
[Four-in-one residual plots: Normal Probability Plot, Residuals Versus Fits, Histogram of Residuals, Residuals Versus Order.]
Decomposition of Variance
Observation = Fit + Residual
$Y = \hat{Y} + (Y - \hat{Y})$
(The fitted line minimizes the sum of squared vertical distances from the data points to the line; the total variation in Y is then split into explained and unexplained parts.)
$R^2 = \dfrac{\text{Explained Variation in } Y}{\text{Total Variation in } Y}$
Decomposition of Variance
Observation = Fit + Residual
$Y = \hat{Y} + (Y - \hat{Y})$, so $Y - \bar{Y} = (\hat{Y} - \bar{Y}) + (Y - \hat{Y})$
$\sum (Y - \bar{Y})^2 = \sum \left[(\hat{Y} - \bar{Y}) + (Y - \hat{Y})\right]^2$
$\sum (Y - \bar{Y})^2 = \sum (\hat{Y} - \bar{Y})^2 + \sum (Y - \hat{Y})^2$
SST = SSR + SSE
Total Sum of Squares: $\text{SST} = \sum (Y - \bar{Y})^2$, df = n - 1
Sum of Squares of Regression: $\text{SSR} = \sum (\hat{Y} - \bar{Y})^2$, df = 1
Sum of Squares of Errors: $\text{SSE} = \sum (Y - \hat{Y})^2$, df = n - 2
ANalysis Of Variance or ANOVA table
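A small Python sketch of the decomposition and the resulting ANOVA layout, continuing the same made-up illustrative data as before (not the textbook's example); it verifies SST = SSR + SSE numerically:

```python
# Sketch: variance decomposition and a simple ANOVA table for a least squares fit
# (illustrative data; the layout follows the standard simple-regression ANOVA form).
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.2])
n = len(Y)

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

SST = np.sum((Y - Y.mean()) ** 2)       # total sum of squares, df = n - 1
SSR = np.sum((Y_hat - Y.mean()) ** 2)   # regression sum of squares, df = 1
SSE = np.sum((Y - Y_hat) ** 2)          # error sum of squares, df = n - 2

assert np.isclose(SST, SSR + SSE)       # SST = SSR + SSE

MSR, MSE = SSR / 1, SSE / (n - 2)
F = MSR / MSE
print("Source      SS         df   MS         F")
print(f"Regression  {SSR:9.3f}  1    {MSR:9.3f}  {F:.2f}")
print(f"Error       {SSE:9.3f}  {n-2}    {MSE:9.3f}")
print(f"Total       {SST:9.3f}  {n-1}")
```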
Standard errors of the estimate
• Used to measure the variability (scatter) of the data points about the fitted line $\hat{Y}$, measured in the Y direction
• Mean square error vs standard error of the
estimate
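The usual definitions behind this comparison (standard formulas; the notation $s_{y \cdot x}$ for the standard error of the estimate is an assumption here, matching common textbooks):

$$\text{MSE} = \frac{\text{SSE}}{n-2}, \qquad s_{y \cdot x} = \sqrt{\text{MSE}} = \sqrt{\frac{\sum_t (Y_t - \hat{Y}_t)^2}{n-2}}$$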
Coefficient of Determination R²
• R² = 1: all of the variability in Y is explained when X is known; the sample data points all lie on the fitted regression line
• R² = 0: none of the variability in Y is explained by X
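In terms of the sums of squares from the decomposition above (standard formula, consistent with the SST/SSR/SSE notation of the slides):

$$R^2 = \frac{\text{SSR}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}}$$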
Coefficient of Determination
Example 6 of Mr. Bump’s data
The PRESS statistic
• PRESS stands for "Prediction Sum of Squares."
It is a cross-validation measure used in statistical modeling,
particularly in linear regression.
The PRESS statistic is a measure of how well a
regression model performs in predicting new data
points.
• A lower PRESS value indicates a model with
better predictive ability.
It is useful for detecting whether a model is overfitting
the data.
Overfitting occurs when a model is too complex and
starts to capture the random noise in the data, rather
than the underlying relationship.
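One common way to compute PRESS for a linear regression is the leave-one-out shortcut PRESS = Σ (eᵢ / (1 − hᵢᵢ))², where hᵢᵢ are the hat-matrix diagonals. A Python sketch with made-up data (not the textbook's example):

```python
# Sketch: PRESS for a simple linear regression via the leave-one-out shortcut
# PRESS = sum_i (e_i / (1 - h_ii))^2, where h_ii are the hat-matrix diagonals.
# Illustrative data only.
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.2])
n = len(Y)

Xmat = np.column_stack([np.ones(n), X])           # design matrix [1, X]
beta, *_ = np.linalg.lstsq(Xmat, Y, rcond=None)   # least squares fit
resid = Y - Xmat @ beta                           # ordinary residuals e_i
H = Xmat @ np.linalg.inv(Xmat.T @ Xmat) @ Xmat.T  # hat matrix
h = np.diag(H)                                    # leverages h_ii

press = np.sum((resid / (1 - h)) ** 2)
sse = np.sum(resid ** 2)
print(f"SSE = {sse:.3f}, PRESS = {press:.3f}")
# PRESS >= SSE always; a PRESS much larger than SSE can signal overfitting.
```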
Four Quick Checks (Simple Regression)
1. Does the model make sense (i.e., check slope
term)?
2. Is there a statistically significant relationship
between the dependent and independent
variables (t-test)?
3. What percentage of the variation in the
dependent variable does the regression model
explain (R-Square)?
4. Do the residuals violate assumptions (Analysis
of Residuals)?
First Quick Check
• Does the model make sense?
• Does the slope term agree with our expectations
based on the time series plot?
• Correct model specified (no variables omitted)
• Appropriate model form (e.g., linear)
Second Quick Check
Are the coefficients statistically significant?
• Use a t-test to examine the null hypothesis that
the slope of the true relationship between X and Y
is equal to zero.
• The hypothesis test is: H0: b1 = 0 versus H1: b1 ≠ 0.
If $-t_{\alpha/2,\, n-2} \le t \le t_{\alpha/2,\, n-2}$: we accept (or fail to reject) H0.
– Then the regression is not significant at the 5% level of significance.
– It says we believe (or don't have enough evidence to say otherwise) that the slope of the line is zero.
– If the slope of the line is zero, then X tells us nothing about Y.
If $t > t_{\alpha/2,\, n-2}$ or $t < -t_{\alpha/2,\, n-2}$: reject H0; the regression is significant at the 5% level of significance.
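The test statistic in its standard form (the notation $s_{b_1}$ for the standard error of the slope is assumed here):

$$t = \frac{b_1 - 0}{s_{b_1}}, \qquad s_{b_1} = \frac{s_{y \cdot x}}{\sqrt{\sum_t (X_t - \bar{X})^2}}, \qquad \text{df} = n-2$$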
Second Quick Check
Are the coefficients statistically significant?
Example 8:
• $t = -4.8$; compare with $t_{\alpha/2,\, n-2}$
• $t_{\alpha/2,\, n-2} = t_{0.025,\, 8} = 2.306$ (Check Table 3, page 485/510)
• We have $t = -4.8 < -t_{0.025,\, 8} = -2.306$: Reject H0
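A quick check of this comparison with scipy (only t = −4.8 and df = 8 are taken from the slide; everything else is computed):

```python
# Sketch: reproducing the Example 8 comparison with scipy.
from scipy import stats

t_stat = -4.8                       # computed t statistic quoted on the slide
df = 8                              # n - 2 = 10 - 2
t_crit = stats.t.ppf(0.975, df)     # two-sided 5% critical value, about 2.306

reject = abs(t_stat) > t_crit
print(f"t critical = {t_crit:.3f}, reject H0: {reject}")  # True -> slope is significant
```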
Example 6.2
An alternative test on H0: F-statistic
F-statistic = ratio of the regression mean square (MSR) to the
error mean square (MSE); under H0 it has an F-distribution with
df = (1, n-2)
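Written out with the ANOVA quantities above (standard form):

$$F = \frac{\text{MSR}}{\text{MSE}} = \frac{\text{SSR}/1}{\text{SSE}/(n-2)} \sim F_{1,\,n-2} \quad \text{under } H_0: b_1 = 0$$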
An alternative test on H0: F-statistic
If H0: b1 = 0 is true: in the F formula, MSR is not larger than MSE (MSR << MSE), so the F ratio is small.
If H0 is false: in the F formula, MSR is much larger than MSE (MSR >> MSE), so the F ratio is large.
Example 9
• F = MSR/MSE = 23.4
• Looking up F(1, n-2) in Table 5, p. 483, we have F(1, n-2) = F(1, 8) = 5.32
• Is F = 23.4 > F(1, 8) = 5.32?
Alternative test of H0
• F <= F(1, n-2) → Fail to reject H0 → Regression is not significant
• F > F(1, n-2) → Reject H0 → Regression is significant
Note: F = t²
Example:
• F = MSR/MSE = 23.4 and F(1, 8) = 5.32, so F = 23.4 > F(1, 8) = 5.32
→ Reject H0 at the 5% level; the regression is significant
• Check F = t²: 23.4 ≈ (-4.8)²
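The same check in Python (the values 23.4 and −4.8 are taken from the slides; the critical value comes from scipy rather than the printed table):

```python
# Sketch: checking the F-test numbers quoted on the slide with scipy.
from scipy import stats

F = 23.4                              # F = MSR/MSE from the slide
f_crit = stats.f.ppf(0.95, 1, 8)      # F(1, n-2) at the 5% level, about 5.32

print(f"F critical = {f_crit:.2f}, reject H0: {F > f_crit}")
print(f"t^2 = {(-4.8) ** 2:.2f}")     # (-4.8)^2 = 23.04, close to 23.4 (slide rounds)
```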
Third Quick Check
• Large sample size (n >> 100), even with small R² (< 10%): we could reject H0; a linear relation might exist; the regression is significant at the 5% level of significance.
• Small sample size n and large R² (> 80%): the regression is significant at the 5% level of significance.
• Very small sample size (n < 10) and large R² (> 80%): we need more sample evidence before concluding that "the regression is significant at the 5% level of significance".
Fourth Quick Check
1. The errors are normally distributed
2. The underlying relation (Y vs X) is linear
3. The errors have constant variance
4. The errors are independent
Check the assumptions for the model
1 & 2: Histogram of residuals and Normal probability plot
If the points fall close to the straight line on the normal probability plot, the data are well fitted by a straight line and the errors have (approximately) normal distributions.
A bell-shaped histogram is expected, but with small data sets the histogram of residuals could take almost any shape!
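Beyond eyeballing the plots, a numerical normality check can help with small samples; a Python sketch with placeholder residuals (not data from the slides):

```python
# Sketch: checking residual normality numerically (complements the plots);
# the residuals below are illustrative placeholders.
import numpy as np
from scipy import stats

residuals = np.array([1.2, -0.8, 0.3, -1.5, 0.9, 0.4, -0.2, 1.1, -1.0, -0.4])

# Normal probability (Q-Q) correlation and a Shapiro-Wilk test.
(osm, osr), (slope, intercept, r) = stats.probplot(residuals, dist="norm")
w_stat, p_value = stats.shapiro(residuals)

print(f"Q-Q plot correlation r = {r:.3f}")       # close to 1 suggests normality
print(f"Shapiro-Wilk p-value = {p_value:.3f}")   # small p-value suggests non-normality
```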
Histogram of residuals
• The histogram of the residuals shows the distribution of the residuals for all observations.
Because its appearance depends on the number of intervals used to group the data, don't use the histogram to assess the normality of the residuals.
• A histogram is most effective when you have approximately 20 or more data points. If the sample is too small, then each bar on the histogram does not contain enough data points to reliably show the shape of the distribution.
Pattern                                        What the pattern may indicate
A long tail in one direction                   Skewness
A bar that is far away from the other bars     An outlier
Residual Plots for Sales
[Four-in-one residual plots: Normal Probability Plot, Residuals Versus Fits, Histogram of Residuals, Residuals Versus Order.]
Residual Plots for Y11
[Four-in-one residual plots for a larger data set (about 100 observations): Normal Probability Plot, Residuals Versus Fits, Histogram of Residuals, Residuals Versus Order.]
Check the assumptions for the model
3. The errors have constant variance
Residuals vs fitted values
— Curved relationships → need to transform the data in order to stabilize the variance
— Residual spread increasing with the magnitude of the fitted values → not constant variance → apply a log transformation to Y before regressing on X in order to hold the variance constant
— See page 197 (202/510) of the textbook.
Summary: assumptions of regression model
• Several assumptions are needed to fit the
regression model using the method described.
Errors are uncorrelated random variables with
constant variance
– zero mean
– homoscedastic (constant variance)
– mutually independent (non-autocorrelated)
If we test hypotheses or create confidence intervals,
we also need the errors to be normally distributed.
We are assuming that the linear model is correct; that
y does not vary with any higher (or lower) power of x.
• These assumptions need to be checked!
Summary: assumptions of regression model
• To check these assumptions, use the following
methods:
Save residuals when running a regression (we will
check for autocorrelation).
Scatterplot of data (see if linear model is correct)
Scatterplot of residuals (a.k.a. “residual analysis,”
checks correlation of residuals, non-constant variance,
normality, and linearity assumptions – use Minitab
“four-in-one” plot)
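If Minitab is not available, an approximation of the "four-in-one" plot can be produced with matplotlib; the `fitted` and `residuals` arrays below are placeholders standing in for the output of a fitted regression:

```python
# Sketch: an approximation of Minitab's "four-in-one" residual plot using matplotlib.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
fitted = np.linspace(5, 15, 30)            # placeholder fitted values
residuals = rng.normal(0, 1.5, size=30)    # placeholder residuals (in observation order)

fig, ax = plt.subplots(2, 2, figsize=(9, 7))

stats.probplot(residuals, dist="norm", plot=ax[0, 0])   # normal probability plot
ax[0, 0].set_title("Normal Probability Plot")

ax[0, 1].scatter(fitted, residuals)                      # residuals vs fitted values
ax[0, 1].axhline(0, color="gray")
ax[0, 1].set_title("Versus Fits")

ax[1, 0].hist(residuals, bins=8)                         # histogram of residuals
ax[1, 0].set_title("Histogram")

ax[1, 1].plot(np.arange(1, len(residuals) + 1), residuals, marker="o")  # vs order
ax[1, 1].axhline(0, color="gray")
ax[1, 1].set_title("Versus Order")

plt.tight_layout()
plt.show()
```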
Pattern                                                              What the pattern may indicate
Fanning or uneven spreading of residuals across fitted values        Nonconstant variance
Curvilinear                                                          A missing higher-order term
A point that is far away from zero                                   An outlier
A point that is far away from the other points in the x-direction    An influential point
If there are too many outliers, the model may not be acceptable. You should try to identify the cause of any outlier.
The variance of the residuals increases with the fitted values. Notice that, as the value of the fits increases, the scatter among the residuals widens. This pattern indicates that the variances of the residuals are unequal (nonconstant).
• If you identify any patterns or outliers in your residual
versus fits plot, consider the following solutions:
Issue: Nonconstant variance
Possible solution: Consider using Fit Regression Model with a Box-Cox transformation or weights.

Issue: An outlier or influential point
Possible solutions:
1. Verify that the observation is not a measurement error or data-entry error. Consider removing data values that are associated with abnormal, one-time events (special causes).
2. Then, repeat the analysis without this observation to determine how it impacts your results.
Check the assumptions for the model
4. The errors are independent
Plot residuals vs. the order of the data:
— Check error independence by examining the plot and calculating the residual autocorrelations r_k (for lags k up to about n/4)
— Check that there are no systematic patterns and that the ACF of the residuals is uniformly small.
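A Python sketch of the numerical side of this check, computing the residual autocorrelations r_k for lags up to about n/4 and comparing them with the rough ±2/√n white-noise band (placeholder residuals):

```python
# Sketch: checking residual independence via autocorrelations r_k, k = 1, ..., n/4.
import numpy as np

rng = np.random.default_rng(1)
residuals = rng.normal(0, 1, size=20)      # placeholder residuals, in time order
n = len(residuals)

def autocorr(e, k):
    """Lag-k autocorrelation r_k of the residual series."""
    e = e - e.mean()
    return np.sum(e[k:] * e[:-k]) / np.sum(e ** 2)

for k in range(1, n // 4 + 1):
    r_k = autocorr(residuals, k)
    # A rough 95% band for white-noise residuals is about +/- 2/sqrt(n).
    flag = "large" if abs(r_k) > 2 / np.sqrt(n) else "small"
    print(f"r_{k} = {r_k:+.3f} ({flag})")
```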
Assignment