
Class 9 Validation of The Linear Regression Model

The document discusses validating linear regression models. It provides three key measures for validation: 1) Coefficient of determination (R-squared) which measures the percentage of variation in the dependent variable explained by the model. 2) Hypothesis testing of the regression coefficients to determine if an independent variable is statistically significant in predicting the dependent variable. 3) Analysis of variance to assess the overall validity of multiple linear regression models. Additional details and formulas are provided for calculating R-squared, confidence intervals, and testing hypotheses in simple and multiple linear regression analysis.

Uploaded by

Sumana Basu

Validation of the Linear Regression Model
Validation of the Simple Linear Regression Model

It is important to validate the regression model, confirming its validity and goodness of fit, before it is used for practical applications. The following measures are used to validate a simple linear regression model:

• Coefficient of determination (R-square).
• Hypothesis tests for the regression coefficients.
• Analysis of variance for overall model validity (relevant mainly for multiple linear regression).

These measures and tests are essential, but not exhaustive.
Coefficient of Determination (R-Square or R2)
• The coefficient of determination (R-square or R2) measures the percentage of variation in Y explained by the model (β0 + β1X).
• The variation in Y under the simple linear regression model can be broken into explained variation and unexplained variation, as described below.

In the absence of a predictive model for Yi, users would fall back on the mean value of Y. Thus, the total variation is measured as the difference between Yi and the mean value of Y (i.e., Yi − Ȳ).
Description of total variation, explained variation
and unexplained variation

• Total variation (SST), measured by (Yi − Ȳ): the difference between the actual value and the mean value of Y.
• Variation explained by the model (SSR), measured by (Ŷi − Ȳ): the difference between the estimated value of Yi and the mean value of Y.
• Variation not explained by the model (SSE), measured by (Yi − Ŷi): the difference between the actual value and the predicted value of Yi (the error in prediction).
The relationship between the total variation, explained variation and
the unexplained variation is given as follows:
$$\underbrace{Y_i - \bar{Y}}_{\text{Total variation in } Y} = \underbrace{\hat{Y}_i - \bar{Y}}_{\text{Variation explained by the model}} + \underbrace{Y_i - \hat{Y}_i}_{\text{Variation not explained by the model}}$$

It can be proved mathematically that the sum of squares of total variation is equal to the sum of squares of explained variation plus the sum of squares of unexplained variation:

$$\underbrace{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}_{SST} = \underbrace{\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2}_{SSR} + \underbrace{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}_{SSE}$$

where SST is the sum of squares of total variation, SSR is the sum of
squares of variation explained by the regression model and SSE is the
sum of squares of errors or unexplained variation.
Coefficient of Determination or R-Square
The coefficient of determination (R2) is given by

$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{SSR}{SST} = \frac{\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}$$
Coefficient of Determination or R-Square
Thus, R2 is the proportion of variation in the response variable Y explained by the regression model. The coefficient of determination (R2) has the following properties:

• The value of R2 lies between 0 and 1.
• A higher value of R2 implies a better fit, but one should be aware of spurious regression.
• In simple linear regression, the square of the correlation coefficient is equal to the coefficient of determination (i.e., r2 = R2).
• There is no minimum threshold for R2; a higher value simply implies a better fit.
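The SST = SSR + SSE decomposition and the resulting R2 can be computed directly. The sketch below uses made-up illustrative data (the helper name `fit_and_r2` is mine, not from the slides):

```python
# Minimal sketch: least-squares fit of a simple linear regression,
# then SST, SSR, SSE and R-square. Illustrative data only.
def fit_and_r2(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    ssx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / ssx                      # slope estimate
    b0 = ybar - b1 * xbar               # intercept estimate
    yhat = [b0 + b1 * xi for xi in x]   # fitted values
    sst = sum((yi - ybar) ** 2 for yi in y)               # total variation
    ssr = sum((yh - ybar) ** 2 for yh in yhat)            # explained
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained
    return b0, b1, ssr / sst, sst, ssr, sse

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, r2, sst, ssr, sse = fit_and_r2(x, y)
assert abs(sst - (ssr + sse)) < 1e-9    # SST = SSR + SSE holds numerically
```

Note that R2 is computed here as SSR/SST; the equivalent form 1 − SSE/SST gives the same value because of the decomposition.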
Spurious Regression

Number of Facebook users and the number of people who died of helium poisoning in the UK:

Year    X (Facebook users, millions)    Y (helium poisoning deaths, UK)
2004       1                             2
2005       6                             2
2006      12                             2
2007      58                             2
2008     145                            11
2009     360                            21
2010     608                            31
2011     845                            40
2012    1056                            51
Facebook users versus helium poisoning in the UK

The fitted regression model is Y = 1.9967 + 0.0465X.

The R-square value for the regression of the number of deaths due to helium poisoning in the UK on the number of Facebook users is 0.9928. That is, 99.28% of the variation in the number of deaths due to helium poisoning in the UK is "explained" by the number of Facebook users, even though no causal relationship between the two is plausible.
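Refitting the table's data reproduces the reported spurious regression (the slide's 0.0465 and 0.9928 agree with this computation up to rounding):

```python
# The Facebook/helium-poisoning data from the slide, refit by least squares.
x = [1, 6, 12, 58, 145, 360, 608, 845, 1056]   # Facebook users (millions)
y = [2, 2, 2, 2, 11, 21, 31, 40, 51]           # helium poisoning deaths, UK

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
ssx = sum((xi - xbar) ** 2 for xi in x)
ssy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / ssx                 # slope, ~0.0465
b0 = ybar - b1 * xbar          # intercept, ~1.9967
r2 = sxy ** 2 / (ssx * ssy)    # for SLR, R^2 = r^2; ~0.9928
print(b0, b1, r2)
```

A near-perfect R2 here reflects two series that both trend upward over time, not any causal link — the point of the spurious-regression warning above.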
Hypothesis Test for Regression Co-efficient (t-Test)

• The regression coefficient β1 captures the existence of a linear relationship between the response variable and the explanatory variable.
• If β1 = 0, we can conclude that there is no statistically significant linear relationship between the two variables.
The standard error of β̂1 is given by

$$S_e(\hat{\beta}_1) = \frac{S_e}{\sqrt{SS_X}}$$

where SSX = Σ(Xi − X̄)² and Se is the standard error of estimate (the standard error of the residuals), which measures the accuracy of prediction:

$$S_e = \sqrt{\frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n-2}}$$

The denominator in the above equation is (n − 2) since β0 and β1 are estimated from the sample in estimating Yi, and thus two degrees of freedom are lost.
The null and alternative hypotheses for the simple linear regression model can be stated as follows:

H0: There is no relationship between X and Y
HA: There is a relationship between X and Y

• β1 = 0 would imply that there is no linear relationship between the response variable Y and the explanatory variable X. Thus, the null and alternative hypotheses can be restated as follows:

H0: β1 = 0
HA: β1 ≠ 0

• The corresponding t-statistic, with (n − 2) degrees of freedom, is

$$t = \frac{\hat{\beta}_1}{S_e(\hat{\beta}_1)}$$
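The slope t-statistic above can be sketched as follows; the data are illustrative, not from the slides:

```python
import math

# Sketch: t-test for the slope in simple linear regression,
# t = b1 / Se(b1), with n - 2 degrees of freedom.
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
ssx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ssx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se = math.sqrt(sse / (n - 2))       # standard error of estimate
se_b1 = se / math.sqrt(ssx)         # standard error of the slope
t_stat = b1 / se_b1                 # compare against t(alpha/2, n - 2)
```

A |t| much larger than the critical value t(α/2, n − 2) leads to rejecting H0: β1 = 0.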
Confidence Intervals for the Regression Coefficients β0 and β1
The standard errors of the estimates β̂0 and β̂1 are given by

$$S_e(\hat{\beta}_0) = S_e \sqrt{\frac{\sum_{i=1}^{n} X_i^2}{n \cdot SS_X}}, \qquad S_e(\hat{\beta}_1) = \frac{S_e}{\sqrt{SS_X}}$$

where Se is the standard error of the residuals,

$$S_e = \sqrt{\frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n-2}}$$

and SSX = Σ(Xi − X̄)².

The interval estimates, or (1 − α)100% confidence intervals, for β1 and β0 are given by

$$\hat{\beta}_1 \pm t_{\alpha/2,\,n-2}\, S_e(\hat{\beta}_1), \qquad \hat{\beta}_0 \pm t_{\alpha/2,\,n-2}\, S_e(\hat{\beta}_0)$$
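The interval estimates above can be computed directly; the data are illustrative, and t(0.025, 4) = 2.776 is the standard t-table value for n − 2 = 4 degrees of freedom:

```python
import math

# Sketch: 95% confidence intervals for b0 and b1 in simple linear regression.
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
ssx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ssx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se = math.sqrt(sse / (n - 2))                                  # Se
se_b1 = se / math.sqrt(ssx)                                    # Se(b1)
se_b0 = se * math.sqrt(sum(xi ** 2 for xi in x) / (n * ssx))   # Se(b0)

t_crit = 2.776  # t(0.025, df = 4) from a standard t-table
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
```

Here the slope interval excludes 0 (the slope is significant), while the intercept interval contains 0.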
Multiple Linear Regression
• Multiple linear regression means linear in the regression parameters (the β values). For example, Y = β0 + β1X1 + β2X2 + ε is linear in the βs even if the explanatory variables are transformed (e.g., X2 = X1²).

An important task in multiple regression is to estimate the beta values (β1, β2, β3, etc.).
Co-efficient of Multiple Determination (R-Square) and
Adjusted R-Square
As in the case of simple linear regression, R-square measures the proportion of variation in the dependent variable explained by the model. The coefficient of multiple determination (R-square or R2) is given by

$$R^2 = 1 - \frac{SSE}{SST}$$

• SSE is the sum of squares of errors and SST is the sum of squares of total deviation. In the case of MLR, SSE decreases as the number of explanatory variables increases, while SST remains constant.
• To counter this, the R2 value is adjusted by normalizing both SSE and SST with their corresponding degrees of freedom. With n observations and k explanatory variables, the adjusted R-square is given by

$$R^2_{adj} = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$$
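The penalty for adding variables can be seen numerically; the R2 values below are made up for illustration:

```python
# Sketch: adjusted R-square, equivalent to 1 - (SSE/(n-k-1)) / (SST/(n-1)),
# rewritten in terms of R2 = 1 - SSE/SST.
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding three variables that raise R2 only from 0.900 to 0.905
# actually lowers the adjusted R-square:
a = adjusted_r2(0.900, n=30, k=2)
b = adjusted_r2(0.905, n=30, k=5)
print(a, b)
```

This is why adjusted R-square, not plain R2, is used to compare MLR models with different numbers of explanatory variables.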
Statistical Significance of Individual Variables in MLR – t-test
Checking the statistical significance of individual variables is achieved through a t-test. Note that the vector of estimated regression coefficients is given by

$$\hat{\beta} = (X^{T}X)^{-1}X^{T}Y$$

This means the estimated value of each regression coefficient is a linear function of the response variable. Since we assume that the residuals follow a normal distribution, Y follows a normal distribution, and the estimates of the regression coefficients also follow a normal distribution. Since the standard deviation of each regression coefficient is estimated from the sample, we use a t-test.
The null and alternative hypotheses for an individual independent variable Xi and the dependent variable Y are given, respectively, by

• H0: There is no relationship between the independent variable Xi and the dependent variable Y
• HA: There is a relationship between the independent variable Xi and the dependent variable Y

Alternatively,
• H0: βi = 0
• HA: βi ≠ 0

The corresponding test statistic, with (n − k − 1) degrees of freedom, is

$$t = \frac{\hat{\beta}_i}{S_e(\hat{\beta}_i)}$$
Validation of Overall Regression Model – F-test

Analysis of variance (ANOVA) is used to validate the overall regression model. If there are k independent variables in the model, then the null and alternative hypotheses are, respectively:

H0: β1 = β2 = β3 = … = βk = 0
H1: Not all βs are zero.

The F-statistic is given by

$$F = \frac{(SST - SSE)/k}{SSE/(n-k-1)} \sim F_{k,\,n-k-1}$$
F-test for the overall fit of the model

• The decision rule at significance level α is: reject H0 if F > F(α, k, n − k − 1), where the critical value F(α, k, n − k − 1) can be found from an F-table.
• The existence of a regression relation by itself does not assure that useful predictions can be made by using it.
• Note that when k = 1, this test reduces to the F-test for testing, in simple linear regression, whether or not β1 = 0.
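The overall F-test can be sketched as follows; the SST, SSE, n, and k values are made up, and F(0.05, 3, 21) ≈ 3.07 is taken from a standard F-table:

```python
# Sketch: overall F-test for a regression with k explanatory variables,
# F = ((SST - SSE)/k) / (SSE/(n - k - 1)). Note SST - SSE = SSR, so this
# is the ratio of mean square regression to mean square error.
def overall_f(sst, sse, n, k):
    msr = (sst - sse) / k        # mean square due to regression
    mse = sse / (n - k - 1)      # mean square error
    return msr / mse

f = overall_f(sst=500.0, sse=120.0, n=25, k=3)
f_crit = 3.07                    # approx. F(0.05, 3, 21) from an F-table
reject_h0 = f > f_crit           # reject H0: all betas are zero
```

Rejecting H0 says only that at least one βi is non-zero; the individual t-tests above identify which variables are significant.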
