Multiple - Regression4 - Tagged

Uploaded by

Kwan Ting So

Multiple Regression and Model Building
Multiple Regression Models

The General Multiple Regression Model

y  0  1 x1   2 x2  ...   k xk  
y is the dependent variable
are the independent variables
x1 , x2 ,..., xk
E  y   0  1 x1   2 x2  ...   k xk is the deterministic portion of
the model
i determines the contribution of the independent variable xi

Analyzing a Multiple Regression Model

1. Hypothesize the deterministic component of the model
2. Use sample data to estimate β0,β1,β2,… βk
3. Specify probability distribution of ε and estimate σ
4. Check that assumptions on ε are satisfied
5. Statistically evaluate model usefulness
6. If the model is judged useful, use it for prediction, estimation, and other purposes
The First-Order Model: Estimating and Interpreting the β-Parameters
For \( E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 \)

the chosen fitted model \( \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \dots + \hat{\beta}_k x_k \)

minimizes \( SSE = \sum (y - \hat{y})^2 \)

and \( x_1, x_2, \dots, x_5 \) are not functions of other independent variables
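The least squares idea can be illustrated with a minimal sketch, assuming a small made-up data set that follows y = 1 + 2x1 + x2 exactly (no random error). The function names `fit_least_squares` and `sse` are hypothetical, and the normal equations are solved by plain Gauss-Jordan elimination rather than by a statistics package.

```python
def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bk] minimizing SSE for y = b0 + b1*x1 + ... + bk*xk."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    # Build the normal equations (X'X) b = X'y as an augmented matrix.
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def sse(rows, y, b):
    """Sum of squared errors between observed y and fitted values."""
    fitted = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Hypothetical data generated exactly from y = 1 + 2*x1 + x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3), (3, 1)]
y = [1 + 2 * x1 + x2 for x1, x2 in rows]
b = fit_least_squares(rows, y)
print([round(v, 6) for v in b], sse(rows, y, b))
```

Because the made-up data contain no error term, the fitted coefficients recover the generating values and SSE is essentially zero; with real data SSE is positive and the estimates only approximate the β's.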

y = β0 + β1x1 + β2x2 + β3x3 + ε

where
y = Sales price (dollars)
x1 = Appraised land value (dollars)
x2 = Appraised improvements (dollars)
x3 = Area (square feet)
[Figure: plot of data for sample size n = 20]
[Figure: model fit to the data]
Interpret β estimates
\( \hat{\beta}_1 = .8145 \): E(y), the mean sale price of the property, is estimated to increase .8145 dollars for every $1 increase in appraised land value, holding other variables constant
\( \hat{\beta}_2 = .8204 \): E(y), the mean sale price of the property, is estimated to increase .8204 dollars for every $1 increase in appraised improvements, holding other variables constant
\( \hat{\beta}_3 = 13.53 \): E(y), the mean sale price of the property, is estimated to increase 13.53 dollars for each additional square foot of living area, holding other variables constant
Given the model E(y) = 1 + 2x1 + x2, the effect of x1 on E(y), holding x2 constant, is an increase of 2 in E(y) for every 1-unit increase in x1, whatever the fixed value of x2
Model Assumptions about Random Error ε
1. For any given set of values of x1, x2, …, xk, the random error has a normal probability distribution with mean 0 and variance σ²
2. The random errors are independent

Estimator of σ² for a Multiple Regression Model with k Independent Variables
\[ s^2 = \frac{SSE}{n - (\text{number of } x \text{ variables}) - 1} = \frac{SSE}{n - k - 1} \]
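A minimal sketch of this estimator, assuming made-up values SSE = 12, n = 20, and k = 3 (the function name `estimate_sigma2` is hypothetical):

```python
def estimate_sigma2(sse, n, k):
    """s^2 = SSE / (n - k - 1), the estimator of the error variance sigma^2."""
    df_error = n - k - 1
    if df_error <= 0:
        raise ValueError("need more observations than estimated parameters")
    return sse / df_error


# Hypothetical values: SSE = 12 from a model with k = 3 x variables, n = 20.
s2 = estimate_sigma2(sse=12.0, n=20, k=3)  # 12 / 16
s = s2 ** 0.5                              # estimated standard deviation of the error
print(s2, round(s, 4))
```

The divisor n - k - 1 counts the observations left over after estimating the k + 1 parameters β0, β1, …, βk.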
Model Assumptions contd.
3. The random errors cannot be correlated with any independent variables
4. The error terms must be homoskedastic, i.e. have “constant” variance
5. Independent variables cannot be highly correlated with one another. A violation of this assumption is called “multicollinearity”.
Inferences about the -Parameters

2 types of inferences can be made, using either

confidence intervals or hypothesis testing [we are
generally using hypothesis testing since the
software uses this approach as standard].
For any inferences to be made, the assumptions
made about the random error term ε (normal
distribution with mean 0 and variance σ2,
independence or errors) must be met
Inferences about the -Parameters

A 100(1-α)% Confidence Interval for a -Parameter

ˆi t 2 sˆ

i
where tα/2 is based on n-k-1 degrees of freedom
and n = Number of observations
k = Number of independent variables in the model
n-k-1 is sometimes written n-(k+1)
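The interval can be sketched as follows. The estimate .8145 is the land-value coefficient from the sale price example; the standard error 0.5 is a made-up value for illustration, and the critical value 2.120 is the tabled t.025 for n - k - 1 = 20 - 3 - 1 = 16 degrees of freedom. The function name is hypothetical.

```python
def beta_confidence_interval(beta_hat, se, t_crit):
    """Return (lower, upper) for beta_hat +/- t_crit * se."""
    half_width = t_crit * se
    return beta_hat - half_width, beta_hat + half_width


# beta_hat from the slides; se = 0.5 is a hypothetical standard error.
# t_crit = 2.120 is t_{.025} with 16 degrees of freedom (95% interval).
lower, upper = beta_confidence_interval(beta_hat=0.8145, se=0.5, t_crit=2.120)
print(round(lower, 4), round(upper, 4))
```

With this made-up standard error the interval contains 0, which would mean the data do not show a land-value effect at the 5% level; a smaller standard error would narrow the interval.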
Inferences about the -Parameters

A Test of an Individual Parameter Coefficient

Two-Tailed
One-Tailed Test
Test
H0: βi=0 H0: βi=0
Ha: βi<0 (or Ha: βi>0) Ha: βi≠0
ˆi
Test Statistic: t 
sˆ
i

Rejection region: t< -tα Rejection

(or t>+tα when Ha: β1>0) region: |t|> tα/2

Where t and t are based on n-(k+1) degrees of freedom
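A minimal sketch of the test, assuming the critical value is looked up in a t table by the user. The estimate 13.53 is the area coefficient from the sale price example; the standard error 6.0 is made up, and the function names are hypothetical.

```python
def t_statistic(beta_hat, se):
    """Test statistic t = beta_hat / s_beta_hat for H0: beta_i = 0."""
    return beta_hat / se


def reject_h0(t, t_crit, alternative="two-sided"):
    """Apply the rejection region; pass t_{alpha/2} for two-sided, t_alpha otherwise."""
    if alternative == "two-sided":
        return abs(t) > t_crit
    if alternative == "greater":
        return t > t_crit
    if alternative == "less":
        return t < -t_crit
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# beta_hat from the slides; se = 6.0 is a hypothetical standard error.
t = t_statistic(beta_hat=13.53, se=6.0)
# Tabled values for 16 degrees of freedom: t_{.025} = 2.120, t_{.05} = 1.746.
print(round(t, 3), reject_h0(t, 2.120), reject_h0(t, 1.746, "greater"))
```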

Checking the Overall Utility of a Model
3 tests:
1. Multiple coefficient of determination R²
\[ R^2 = 1 - \frac{SSE}{SS_{yy}} = \frac{SS_{yy} - SSE}{SS_{yy}} = \frac{\text{Explained variability}}{\text{Total variability}} \]

2. Adjusted multiple coefficient of determination
\[ R_a^2 = 1 - \left[ \frac{n - 1}{n - (k + 1)} \right] \left( \frac{SSE}{SS_{yy}} \right) = 1 - \left[ \frac{n - 1}{n - (k + 1)} \right] \left( 1 - R^2 \right) \]

3. Global F-test
\[ F = \frac{(SS_{yy} - SSE)/k}{SSE/[n - (k + 1)]} = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} \]

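Multiple coefficient of determination: R² – How it is Defined
[Figure: fitted line \( \hat{Y} = a + bX \) plotted against X, marking for one observation the total variation \( (Y - \bar{Y}) \), the explained variation \( (\hat{Y} - \bar{Y}) \), and the unexplained variation \( (Y - \hat{Y}) \)]
\[ R^2 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} \]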
Testing Global Usefulness of the Model: The Analysis of Variance F-test
H0: β1 = β2 = … = βk = 0
Ha: At least one βi ≠ 0

\[ F = \frac{(SS_{yy} - SSE)/k}{SSE/[n - (k + 1)]} = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{\text{Mean Square Model}}{\text{Mean Square Error}} \]

where n is the sample size and k is the number of terms in the model (not counting the intercept β0)

Rejection region: F > Fα, with k numerator degrees of freedom and [n-(k+1)] denominator degrees of freedom
Checking the Utility of a Multiple Regression Model
1. Conduct a test of overall model adequacy using the F-test. If H0 is rejected, proceed to step 2
2. Conduct t-tests on β parameters of particular interest
Using the Model for Estimation and Prediction
As in Simple Linear Regression, a prediction interval for an individual value of y will be wider than a confidence interval for the mean value E(y) at the same values of the x's
Most statistics packages will print out both estimation and prediction intervals
Model Building: Interaction Models
An Interaction Model relating E(y) to Two Quantitative Independent Variables
\[ E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 \]
where
\( (\beta_1 + \beta_3 x_2) \) represents the change in E(y) for every 1-unit increase in x1, holding x2 fixed
\( (\beta_2 + \beta_3 x_1) \) represents the change in E(y) for every 1-unit increase in x2, holding x1 fixed
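A small sketch of how the slope in x1 changes with x2 under interaction. The coefficient values (1, 2, 1, 0.5) are made up for illustration, and the function names are hypothetical.

```python
def mean_response(b, x1, x2):
    """E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    b0, b1, b2, b3 = b
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_in_x1(b, x2):
    """Change in E(y) per 1-unit increase in x1 with x2 held fixed: b1 + b3*x2."""
    _, b1, _, b3 = b
    return b1 + b3 * x2


b = (1.0, 2.0, 1.0, 0.5)  # hypothetical beta values
for x2 in (0, 2, 4):
    # The difference over a 1-unit step in x1 matches b1 + b3*x2.
    step = mean_response(b, 3, x2) - mean_response(b, 2, x2)
    print(x2, slope_in_x1(b, x2), step)
```

Setting b3 to 0 makes the slope the same for every x2, which is the no-interaction case.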
[Figure, first panel: no interaction, the relationship between y and xi is not impacted by a second x]
[Figure, second panel: interaction, the linear relationship between y and xi depends on another x]
Model Building: Quadratic and
other Higher-Order Models
A Quadratic (Second-Order) Model
\[ E(y) = \beta_0 + \beta_1 x + \beta_2 x^2 \]
where
\( \beta_0 \) is the y-intercept of the curve
\( \beta_1 \) is a shift parameter
\( \beta_2 \) is the rate of curvature
Home Size-Electrical Usage Data
Size of Home, x (sq. ft.)   Monthly Usage, y (kilowatt-hours)
1,290 1,182
1,350 1,172
1,470 1,264
1,600 1,493
1,710 1,571
1,840 1,711
1,980 1,804
2,230 1,840
2,400 1,95
2,930 1,954
Fitted quadratic model:
\[ \hat{y} = -1{,}216.1 + 2.3989x - .00045x^2 \]

Model Building: Qualitative
(Dummy) Variable Models
Dummy variables – coded, qualitative variables
• Codes are in the form of (1, 0), 1 being the presence of a condition, 0 the absence
• Create dummy variables so that there is one less dummy variable than categories of the qualitative variable of interest
Gender dummy variable coded as x = 1 if male, x = 0 if female
If the model is E(y) = β0 + β1x, β0 is the mean of the dependent variable for females and β1 captures the effect of being male, i.e. the average difference between males and females.
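The "one less dummy than categories" rule can be sketched as below. The function name `dummy_code` and the example category labels are hypothetical; the baseline category is the one coded 0 on every dummy.

```python
def dummy_code(values, baseline):
    """Code a qualitative variable as (number of categories - 1) 0/1 columns."""
    levels = sorted(set(values) - {baseline})  # every category except the baseline
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Two categories -> one dummy (x = 1 if male, x = 0 if female).
levels2, rows2 = dummy_code(["male", "female", "female", "male"], baseline="female")
print(levels2, rows2)

# Three hypothetical categories -> two dummies, with "A" as the baseline.
levels3, rows3 = dummy_code(["A", "B", "C", "A"], baseline="A")
print(levels3, rows3)
```

Each coefficient on a dummy column is then read as the average difference between that category and the baseline.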
Model Building: Models with both
Quantitative and Qualitative Variables
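Start with a first-order model with one quantitative variable, E(y) = β0 + β1x

Adding a qualitative variable with 3 categories and no interaction:
E(y) = β0 + β1x1 + β2x2 + β3x3
where x1 is the quantitative variable and x2, x3 are the two dummy variables for the qualitative variable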
Adding an interaction term,
E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3
where β1x1 is the main effect of x1, β2x2 + β3x3 is the main effect of x2 and x3, and β4x1x2 + β5x1x3 is the interaction
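One way to read the interaction model is that each category gets its own line in x1. The sketch below assumes x2 = 1 marks category 2, x3 = 1 marks category 3, and category 1 is the baseline; that coding, the β values, and the function name are made up for illustration.

```python
def line_for_category(b, category):
    """Intercept and slope in x1 implied by the interaction model for one category."""
    b0, b1, b2, b3, b4, b5 = b
    x2 = 1 if category == 2 else 0  # assumed coding: dummy for category 2
    x3 = 1 if category == 3 else 0  # assumed coding: dummy for category 3
    intercept = b0 + b2 * x2 + b3 * x3
    slope = b1 + b4 * x2 + b5 * x3
    return intercept, slope


b = (10.0, 2.0, 3.0, -1.0, 0.5, 1.0)  # hypothetical beta0..beta5
for category in (1, 2, 3):
    print(category, line_for_category(b, category))
```

Without the interaction terms (β4 = β5 = 0) the three lines share one slope and differ only in intercept.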
Model Building: Stepwise Regression
• Used when there is a large set of independent variables
• Software packages will add in variables in order of explanatory value
• Decisions are based on the largest t-values at each step
• Procedure is best used as a screening procedure only
NOTE: we have not covered this material
Model Building: Best Subset Regression
• Used when there is a large set of independent variables
• Helps to identify the best subset of potential variables
• Maximize adjusted R²
• Minimize Mallows' Cp
NOTE: we have not covered this material

Residual Analysis: Checking the
Regression Assumptions
Regression Residual – the difference between an observed y value and its corresponding predicted value
\[ \hat{\varepsilon} = y - \hat{y} \]

Properties of Regression Residuals
• The mean of the residuals equals zero
• The standard deviation of the residuals is equal to the standard deviation of the fitted regression model, s
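Both properties can be checked on a tiny example. The sketch below assumes a simple (one x) regression on five made-up data points and uses the closed-form least squares formulas; the function name is hypothetical.

```python
def simple_fit(x, y):
    """Least squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


# Hypothetical data, n = 5 observations and k = 1 independent variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = simple_fit(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

mean_residual = sum(residuals) / len(residuals)          # zero up to rounding
sse = sum(e ** 2 for e in residuals)
s = (sse / (len(x) - 1 - 1)) ** 0.5                      # sqrt(SSE / (n - k - 1))
print(round(b0, 4), round(b1, 4), round(sse, 4), round(s, 4))
```

Here s is computed with the divisor n - k - 1, so it is the model standard deviation rather than the ordinary sample standard deviation of the residuals.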
Analyzing Residuals
[Figure, top plot: residuals against the independent variable [SIZE] reveal a non-random, curved pattern]
[Figure, second plot: after a second-order term is added to the model, the residuals show a random pattern, indicating a better model]
Identifying Outliers
Residual plots can reveal outliers
Outliers need to be checked to try to determine if error is involved
If error is involved, or the observation is not representative, the analysis can be rerun after deleting the data point to assess the effect.
[Figure: residual plot with one point marked as an outlier]
Checking for Normal Errors
[Figure: plots of the residuals with the outlier and without the outlier]
Checking for Equal Variances: Residual versus Fitted plot
A pattern in the residuals indicates a violation of the equal variance assumption
This can point to the use of a transformation on the dependent variable to stabilize the variance
Steps in Residual Analysis
1. Check for a mis-specified model by plotting residuals against quantitative independent variables
2. Examine residual plots for outliers
3. Check for non-normal errors using a normal probability plot or frequency distribution of residuals
4. Check for unequal error variances using plots of residuals against predicted [fitted] values
Some Pitfalls: Estimability,
Multicollinearity, and Extrapolation
Estimability – the number of levels of observed x-values must be at least one more than the order of the polynomial in x that you want to fit
Multicollinearity – when two or more independent variables are highly correlated
– Use Variance Inflation Factors [VIFs]; a VIF greater than 4 is taken as a sign of the problem
Some Pitfalls: Multicollinearity
– when two or more independent variables are highly correlated

Multicollinearity leads to confusing, misleading results and to incorrect parameter estimates and signs.
Can be identified by
– checking for VIFs greater than 4
– checking correlations among the x's [|r| > 0.9]
– coefficients not significant for most/all x's
– signs opposite from expected in the estimated β parameters
Can be addressed by
– leaving the model alone if coefficients are significant
– dropping one or more of the correlated variables in the model if there are major problems with coefficients
– restricting inferences to the range of the sample data, and not making inferences about individual β parameters based on t-tests.
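The two screening rules can be related through the standard definition VIF_j = 1 / (1 - R_j²), where R_j² is the R² from regressing x_j on the other independent variables. That definition is not stated on the slides and is supplied here as background; the function name is hypothetical. With only two x's, R_j² is simply the squared correlation r² between them.

```python
def vif(r2_j):
    """Variance inflation factor 1 / (1 - R_j^2) for one independent variable."""
    if not 0 <= r2_j < 1:
        raise ValueError("R_j^2 must be in [0, 1)")
    return 1 / (1 - r2_j)


# R_j^2 = 0.75 sits exactly at the VIF > 4 cutoff used above.
print(vif(0.75))

# Two x's with correlation r = 0.9: R_j^2 = r**2 = 0.81.
r = 0.9
print(round(vif(r ** 2), 2))
```

Under this definition a pairwise correlation of 0.9 already gives a VIF above 4, so the VIF rule is the stricter of the two checks in the two-variable case.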
Some Pitfalls: Extrapolation and
Correlated Errors
Extrapolation – use of model to predict
outside of range of sample data is
dangerous. Avoid if possible
Correlated Errors – most common when
working with time series data, values of y
and x’s observed over a period of time.
Solution is to develop a time series model.
