Chapter 8
Indicator Variables
In general, the explanatory variables in a regression analysis are assumed to be quantitative. For example, variables like temperature, distance and age are quantitative in the sense that they are recorded on a well-defined scale.
In many applications, the variables cannot be measured on a well-defined scale, and they are qualitative in nature.
For example, variables like sex (male or female), colour (black, white), nationality and employment status (employed, unemployed) are defined on a nominal scale. Such variables do not have any natural scale of measurement. They usually indicate the presence or absence of a "quality" or attribute, like employed or unemployed, graduate or non-graduate, smoker or non-smoker, yes or no, acceptance or rejection. Such variables can be quantified by artificially constructing variables that take the values 1 and 0, where "1" usually indicates the presence of the attribute and "0" its absence. For example, "1" indicates that the person is male and "0" indicates that the person is female. Similarly, "1" may indicate that the person is employed and "0" that the person is unemployed.
Such variables classify the data into mutually exclusive categories. These variables are called indicator variables or dummy variables.
Usually, the indicator variables take on the values 0 and 1 to identify the mutually exclusive classes of the explanatory variables. For example,
$$D = \begin{cases} 1 & \text{if person is male} \\ 0 & \text{if person is female,} \end{cases} \qquad D = \begin{cases} 1 & \text{if person is employed} \\ 0 & \text{if person is unemployed.} \end{cases}$$
Here we use the notation D in place of X to denote the dummy variable. The choice of 1 and 0 to identify
a category is arbitrary. For example, one can also define the dummy variable in the above examples as
Regression Analysis | Chapter 8 | Indicator Variables | Shalabh, IIT Kanpur
$$D = \begin{cases} 1 & \text{if person is female} \\ 0 & \text{if person is male,} \end{cases} \qquad D = \begin{cases} 1 & \text{if person is unemployed} \\ 0 & \text{if person is employed.} \end{cases}$$
It is also not necessary to choose only 1 and 0 to denote the categories; in fact, any two distinct values of $D$ will serve the purpose. The choices of 1 and 0 are preferred because they keep the calculations simple, make the values easy to interpret and usually turn out to be a satisfactory choice.
In a given regression model, qualitative and quantitative variables can also occur together, i.e., some variables may be qualitative and others quantitative.
When all explanatory variables are
- quantitative, then the model is called a regression model,
- qualitative, then the model is called an analysis of variance model and
- quantitative and qualitative both, then the model is called an analysis of covariance model.
Such models can be dealt with within the framework of regression analysis. The usual tools of regression
analysis can be used in the case of dummy variables.
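The 0/1 coding described above can be constructed directly. A minimal sketch in Python, using a purely illustrative (hypothetical) data set:

```python
# Construct a dummy variable from a qualitative attribute.
# The data below are purely illustrative.
sexes = ["male", "female", "female", "male"]

# 1 indicates the presence of the attribute "male", 0 its absence.
D = [1 if s == "male" else 0 for s in sexes]
```

The complementary coding (1 for female, 0 for male) would simply flip every value; as noted above, either choice is valid.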
Example:
Consider the following model with $x_1$ as a quantitative variable and $D_2$ as an indicator variable:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 D_2 + \varepsilon, \quad E(\varepsilon) = 0, \ \operatorname{Var}(\varepsilon) = \sigma^2,$$
$$D_2 = \begin{cases} 0 & \text{if an observation belongs to group } A \\ 1 & \text{if an observation belongs to group } B. \end{cases}$$
The interpretation of the result is essential. We proceed as follows:
If $D_2 = 0$, then
$$y = \beta_0 + \beta_1 x_1 + \beta_2 \cdot 0 + \varepsilon = \beta_0 + \beta_1 x_1 + \varepsilon,$$
$$E(y \mid D_2 = 0) = \beta_0 + \beta_1 x_1,$$
which is a straight-line relationship with intercept $\beta_0$ and slope $\beta_1$.
If $D_2 = 1$, then
$$y = \beta_0 + \beta_1 x_1 + \beta_2 \cdot 1 + \varepsilon = (\beta_0 + \beta_2) + \beta_1 x_1 + \varepsilon,$$
$$E(y \mid D_2 = 1) = (\beta_0 + \beta_2) + \beta_1 x_1,$$
which is a straight-line relationship with intercept $(\beta_0 + \beta_2)$ and slope $\beta_1$.
The quantities $E(y \mid D_2 = 0)$ and $E(y \mid D_2 = 1)$ are the average responses when an observation belongs to group $A$ and group $B$, respectively. Thus
$$\beta_2 = E(y \mid D_2 = 1) - E(y \mid D_2 = 0),$$
which has an interpretation as the difference between the average values of $y$ with $D_2 = 1$ and $D_2 = 0$.
Graphically, this looks as in the following figure, which describes two parallel regression lines with the same variance $\sigma^2$.
[Figure: two parallel regression lines, $E(y \mid D_2 = 1) = (\beta_0 + \beta_2) + \beta_1 x_1$ lying above $E(y \mid D_2 = 0) = \beta_0 + \beta_1 x_1$; both have slope $\beta_1$, with intercepts $\beta_0 + \beta_2$ and $\beta_0$ and vertical separation $\beta_2$.]
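The parallel-lines interpretation can be checked numerically. A sketch using NumPy least squares on hypothetical, noise-free data generated from the model above (the coefficient values 2, 3, 5 are arbitrary illustrative choices):

```python
import numpy as np

# Hypothetical noise-free data from y = b0 + b1*x1 + b2*D2
# with b0 = 2, b1 = 3, b2 = 5 (arbitrary illustrative values).
x1 = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0])
D2 = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = 2 + 3 * x1 + 5 * D2

# Columns of the design matrix: intercept, x1, D2.
X = np.column_stack([np.ones_like(x1), x1, D2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Group A (D2 = 0): intercept b0, slope b1.
# Group B (D2 = 1): intercept b0 + b2, same slope b1 -> parallel lines.
```

Because the data are noise-free, least squares recovers the coefficients exactly, and the fitted group-B line is the group-A line shifted up by $b_2$.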
If there are three explanatory variables in the model, with two indicator variables $D_2$ and $D_3$, then they will describe three levels, e.g., groups $A$, $B$ and $C$. The levels of the indicator variables are as follows:
1. $D_2 = 0, \ D_3 = 0$ if the observation is from group $A$,
2. $D_2 = 1, \ D_3 = 0$ if the observation is from group $B$,
3. $D_2 = 0, \ D_3 = 1$ if the observation is from group $C$.
The corresponding regression model is
$$y = \beta_0 + \beta_1 x_1 + \beta_2 D_2 + \beta_3 D_3 + \varepsilon, \quad E(\varepsilon) = 0, \ \operatorname{Var}(\varepsilon) = \sigma^2.$$
In general, if a qualitative variable has $m$ levels, then $(m-1)$ indicator variables are required, each taking the values 0 and 1.
Consider the following examples to understand how to define such indicator variables and how they can be
handled.
Example:
Suppose $y$ denotes the monthly salary of a person and $D$ denotes whether the person is a graduate or a non-graduate. The model is
$$y = \beta_0 + \beta_1 D + \varepsilon, \quad E(\varepsilon) = 0, \ \operatorname{Var}(\varepsilon) = \sigma^2.$$
With $n$ observations, the model is
$$y_i = \beta_0 + \beta_1 D_i + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$
$$E(y_i \mid D_i = 0) = \beta_0,$$
$$E(y_i \mid D_i = 1) = \beta_0 + \beta_1,$$
$$\beta_1 = E(y_i \mid D_i = 1) - E(y_i \mid D_i = 0).$$
Thus
- $\beta_0$ measures the mean salary of a non-graduate, and
- $\beta_1$ measures the difference in the mean salaries of a graduate and a non-graduate person.
Now consider the same model with two indicator variables defined in the following way:
$$D_{i1} = \begin{cases} 1 & \text{if person is a graduate} \\ 0 & \text{if person is a non-graduate,} \end{cases} \qquad D_{i2} = \begin{cases} 1 & \text{if person is a non-graduate} \\ 0 & \text{if person is a graduate.} \end{cases}$$
The model with $n$ observations is
$$y_i = \beta_0 + \beta_1 D_{i1} + \beta_2 D_{i2} + \varepsilon_i, \quad E(\varepsilon_i) = 0, \ \operatorname{Var}(\varepsilon_i) = \sigma^2, \ i = 1, 2, \ldots, n.$$
Then we have
1. $E(y_i \mid D_{i1} = 0, D_{i2} = 1) = \beta_0 + \beta_2$: average salary of a non-graduate,
2. $E(y_i \mid D_{i1} = 1, D_{i2} = 0) = \beta_0 + \beta_1$: average salary of a graduate,
3. $E(y_i \mid D_{i1} = 0, D_{i2} = 0) = \beta_0$: cannot exist,
4. $E(y_i \mid D_{i1} = 1, D_{i2} = 1) = \beta_0 + \beta_1 + \beta_2$: cannot exist.
Notice that in this case
$$D_{i1} + D_{i2} = 1 \ \text{ for all } i,$$
which is an exact linear constraint: it holds whether the person is a graduate or a non-graduate, so the two indicator columns together reproduce the intercept column. Hence exact multicollinearity is present, and the rank of the matrix of explanatory variables falls short by 1. Consequently $\beta_0$, $\beta_1$ and $\beta_2$ are indeterminate, and the least-squares method breaks down. So introducing two indicator variables may appear useful, but it leads to serious consequences. This is known as the dummy variable trap.
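The rank deficiency behind the trap can be seen directly. A sketch with NumPy, using an arbitrary small design of four hypothetical persons:

```python
import numpy as np

# Four hypothetical persons: two graduates, two non-graduates.
D1 = np.array([1, 1, 0, 0])   # 1 = graduate
D2 = 1 - D1                   # 1 = non-graduate, so D1 + D2 = 1 for all i

# Design matrix with intercept and BOTH dummies.
X = np.column_stack([np.ones(4), D1, D2])

# The intercept column equals D1 + D2, so the rank falls short by 1:
# 2 instead of 3, and least squares cannot identify all three parameters.
rank = np.linalg.matrix_rank(X)
```

Dropping either one dummy column or the intercept column restores full column rank, which is exactly the rule stated below.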
If the intercept term is dropped, then the model becomes
$$y_i = \beta_1 D_{i1} + \beta_2 D_{i2} + \varepsilon_i, \quad E(\varepsilon_i) = 0, \ \operatorname{Var}(\varepsilon_i) = \sigma^2, \ i = 1, 2, \ldots, n,$$
and then
$$E(y_i \mid D_{i1} = 1, D_{i2} = 0) = \beta_1: \ \text{average salary of a graduate,}$$
$$E(y_i \mid D_{i1} = 0, D_{i2} = 1) = \beta_2: \ \text{average salary of a non-graduate.}$$
So when the intercept term is dropped, $\beta_1$ and $\beta_2$ have proper interpretations as the average salaries of a graduate and a non-graduate person, respectively. Now the parameters can be estimated using the ordinary least squares principle, and the standard procedures for drawing inferences can be used.
Rule: When the explanatory variable leads to a classification into $m$ mutually exclusive categories, use $(m-1)$ indicator variables for its representation. Alternatively, use $m$ indicator variables and drop the intercept term.
Interaction term:
Suppose a model has two explanatory variables, one quantitative and the other an indicator variable, and suppose the two interact, so that their interaction is added to the model as an additional explanatory variable:
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 D_{i2} + \beta_3 x_{i1} D_{i2} + \varepsilon_i, \quad E(\varepsilon_i) = 0, \ \operatorname{Var}(\varepsilon_i) = \sigma^2, \ i = 1, 2, \ldots, n.$$
To interpret the model parameters, we proceed as follows. Suppose the indicator variable is given by
$$D_{i2} = \begin{cases} 1 & \text{if the } i\text{th person belongs to group } A \\ 0 & \text{if the } i\text{th person belongs to group } B, \end{cases}$$
and $y_i$ is the salary of the $i$th person.
Then
$$E(y_i \mid D_{i2} = 0) = \beta_0 + \beta_1 x_{i1} + \beta_2 \cdot 0 + \beta_3 x_{i1} \cdot 0 = \beta_0 + \beta_1 x_{i1}.$$
This is a straight line with intercept $\beta_0$ and slope $\beta_1$. Next,
$$E(y_i \mid D_{i2} = 1) = \beta_0 + \beta_1 x_{i1} + \beta_2 \cdot 1 + \beta_3 x_{i1} \cdot 1 = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_{i1}.$$
This is a straight line with intercept $(\beta_0 + \beta_2)$ and slope $(\beta_1 + \beta_3)$.
The model
$$E(y_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 D_{i2} + \beta_3 x_{i1} D_{i2}$$
thus has different slopes and different intercept terms for the two groups. Thus
- $\beta_2$ reflects the change in the intercept term associated with a change in the group of the person, i.e., when the group changes from $B$ to $A$;
- $\beta_3$ reflects the change in the slope associated with a change in the group of the person, i.e., when the group changes from $B$ to $A$.
Fitting the model
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 D_{i2} + \beta_3 x_{i1} D_{i2} + \varepsilon_i$$
is equivalent to fitting two separate regression models corresponding to $D_{i2} = 1$ and $D_{i2} = 0$, i.e.,
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 \cdot 1 + \beta_3 x_{i1} \cdot 1 + \varepsilon_i = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_{i1} + \varepsilon_i$$
and
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 \cdot 0 + \beta_3 x_{i1} \cdot 0 + \varepsilon_i = \beta_0 + \beta_1 x_{i1} + \varepsilon_i,$$
respectively.
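This equivalence can be verified numerically. A sketch on hypothetical, noise-free data in which the two groups have different intercepts and slopes (the coefficient values 1, 2, 3, 4 are arbitrary illustrative choices):

```python
import numpy as np

# Hypothetical noise-free data: group B (D = 0) follows y = 1 + 2x,
# group A (D = 1) follows y = (1 + 3) + (2 + 4)x.
x = np.array([0.0, 1.0, 2.0, 3.0] * 2)
D = np.array([0] * 4 + [1] * 4)
y = 1 + 2 * x + 3 * D + 4 * x * D

# Columns of the design matrix: intercept, x, D, interaction x*D.
X = np.column_stack([np.ones_like(x), x, D, x * D])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

# b2 is the change in intercept and b3 the change in slope between groups.
```

Fitting the two groups separately (intercept 1, slope 2 for one; intercept 4, slope 6 for the other) gives exactly the same fitted lines as this single interaction model.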
The test of hypothesis becomes convenient by using an indicator variable. For example, if we want to test whether the two regression models are identical, the test involves testing
$$H_0: \beta_2 = \beta_3 = 0 \quad \text{against} \quad H_1: \beta_2 \neq 0 \ \text{and/or} \ \beta_3 \neq 0.$$
Acceptance of $H_0$ indicates that only a single model is necessary to explain the relationship.
In another example, if the objective is to test that the two models differ with respect to intercepts only and have the same slopes, then the test involves testing
$$H_0: \beta_3 = 0 \quad \text{against} \quad H_1: \beta_3 \neq 0.$$
Indicator variables versus quantitative explanatory variable
The quantitative explanatory variables can be converted into indicator variables. For example, if the ages of
persons are grouped as follows:
Group 1: 1 day to 3 years
Group 2: 3 years to 8 years
Group 3: 8 years to 12 years
Group 4: 12 years to 17 years
Group 5: 17 years to 25 years
then the variable “age” can be represented by four different indicator variables.
Since it is difficult to collect data on individual ages, this grouping helps in easy data collection. A disadvantage is that some loss of information occurs. For example, suppose the ages in years are 2, 3, 4, 5, 6, 7 and the indicator variable is defined as
$$D_i = \begin{cases} 1 & \text{if the age of the } i\text{th person is at least 5 years} \\ 0 & \text{if the age of the } i\text{th person is less than 5 years.} \end{cases}$$
Then these values become 0, 0, 0, 1, 1, 1. Now, looking at the value 1, one cannot determine whether it corresponds to age 5, 6 or 7 years.
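The information loss can be shown in a couple of lines, assuming (as the values above suggest) a cutoff at age 5:

```python
ages = [2, 3, 4, 5, 6, 7]

# Indicator: 1 if age is at least 5 years, 0 otherwise (assumed cutoff).
D = [1 if a >= 5 else 0 for a in ages]

# Ages 5, 6 and 7 all map to the same value 1, so the original
# age cannot be recovered from the indicator alone.
```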
Moreover, if a quantitative explanatory variable is grouped into $m$ categories, then $(m-1)$ parameters are required, whereas if the original variable is used as such, only one parameter is required.
Treating a quantitative variable as a qualitative variable thus increases the complexity of the model and reduces the degrees of freedom for error. This can affect the inferences if the data set is small; in large data sets, such an effect may be small.
The use of indicator variables does not require any assumption about the functional form of the relationship between the study variable and the explanatory variables.
Regression analysis and analysis of variance
The analysis of variance is often used in analyzing data from designed experiments. There is a connection between the statistical tools used in the analysis of variance and in regression analysis.
We consider the case of the analysis of variance in a one-way classification and establish its relation with regression analysis.
One-way classification:
Let there be $k$ samples, each of size $n$, from $k$ normally distributed populations $N(\mu_i, \sigma^2)$, $i = 1, 2, \ldots, k$. The populations differ only in their means but have the same variance $\sigma^2$. This can be expressed as
$$y_{ij} = \mu_i + \varepsilon_{ij} = \mu + (\mu_i - \mu) + \varepsilon_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, k; \ j = 1, 2, \ldots, n,$$
where $y_{ij}$ is the $j$th observation for the $i$th fixed treatment effect $\tau_i = \mu_i - \mu$ (or factor level), $\mu$ is the general mean effect, and the $\varepsilon_{ij}$ are identically and independently distributed random errors following $N(0, \sigma^2)$.
Note that
$$\mu = \frac{1}{k} \sum_{i=1}^{k} \mu_i, \qquad \sum_{i=1}^{k} \tau_i = 0.$$
The null and alternative hypotheses are
$$H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0, \qquad H_1: \tau_i \neq 0 \ \text{for at least one } i.$$
Employing the method of least squares, we minimize
$$S = \sum_{i=1}^{k} \sum_{j=1}^{n} \varepsilon_{ij}^2 = \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \mu - \tau_i)^2$$
and obtain the estimators of $\mu$ and $\tau_i$ as follows:
$$\frac{\partial S}{\partial \mu} = 0 \ \Rightarrow \ \hat{\mu} = \frac{1}{nk} \sum_{i=1}^{k} \sum_{j=1}^{n} y_{ij} = \bar{y}_{..},$$
$$\frac{\partial S}{\partial \tau_i} = 0 \ \Rightarrow \ \hat{\tau}_i = \frac{1}{n} \sum_{j=1}^{n} y_{ij} - \hat{\mu} = \bar{y}_{i.} - \bar{y}_{..},$$
where $\bar{y}_{i.} = \frac{1}{n} \sum_{j=1}^{n} y_{ij}$.
Based on this, the corresponding test statistic is
$$F_0 = \frac{\dfrac{n}{k-1} \displaystyle\sum_{i=1}^{k} (\bar{y}_{i.} - \bar{y}_{..})^2}{\dfrac{1}{k(n-1)} \displaystyle\sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2},$$
which follows the $F$-distribution with $k-1$ and $k(n-1)$ degrees of freedom when the null hypothesis is true. The decision rule is to reject $H_0$ whenever $F_0 > F_{\alpha}(k-1, k(n-1))$, in which case it is concluded that the $k$ treatment means are not identical.
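The statistic $F_0$ can be computed step by step from its two mean squares. A sketch with NumPy on hypothetical data with $k = 3$ treatments and $n = 4$ observations each:

```python
import numpy as np

# Hypothetical observations: rows are treatments, columns are replicates.
y = np.array([[3.0, 4.0, 5.0, 4.0],
              [6.0, 7.0, 8.0, 7.0],
              [4.0, 5.0, 6.0, 5.0]])
k, n = y.shape

ybar_i = y.mean(axis=1)   # treatment means (y-bar_i.)
ybar = y.mean()           # grand mean (y-bar_..)

# Between-treatment and within-treatment mean squares.
ms_between = n * ((ybar_i - ybar) ** 2).sum() / (k - 1)
ms_within = ((y - ybar_i[:, None]) ** 2).sum() / (k * (n - 1))

F0 = ms_between / ms_within   # compare with F(k-1, k(n-1))
```

With these numbers the treatment means are 4, 7 and 5, giving $F_0 = 14$, which would be compared with the $F_{\alpha}(2, 9)$ critical value.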
Connection with regression:
To illustrate the relationship between the fixed-effect one-way analysis of variance and regression, suppose there are 3 treatments, so that the model becomes
$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, 2, 3; \ j = 1, 2, \ldots, n.$$
The 3 treatments are the three levels of a qualitative factor. For example, temperature can have three possible levels: low, medium and high. They can be represented by two indicator variables as
$$D_1 = \begin{cases} 1 & \text{if the observation is from treatment 1} \\ 0 & \text{otherwise,} \end{cases} \qquad D_2 = \begin{cases} 1 & \text{if the observation is from treatment 2} \\ 0 & \text{otherwise.} \end{cases}$$
The regression model can then be rewritten as
$$y_{ij} = \beta_0 + \beta_1 D_{1j} + \beta_2 D_{2j} + \varepsilon_{ij}, \quad i = 1, 2, 3; \ j = 1, 2, \ldots, n,$$
where $D_{1j}$ and $D_{2j}$ are the values of $D_1$ and $D_2$, respectively, for the $j$th observation.
Note that
- the parameters in the regression model are $\beta_0, \beta_1, \beta_2$;
- the parameters in the analysis of variance model are $\mu, \tau_1, \tau_2, \tau_3$.
We now establish a relationship between the two sets of parameters.
Suppose treatment 1 is used on the $j$th observation, so $D_{1j} = 1$, $D_{2j} = 0$ and
$$y_{1j} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \varepsilon_{1j} = \beta_0 + \beta_1 + \varepsilon_{1j}.$$
In the analysis of variance model, this is represented as
$$y_{1j} = \mu + \tau_1 + \varepsilon_{1j} = \mu_1 + \varepsilon_{1j}, \quad \text{where } \mu_1 = \mu + \tau_1,$$
so that
$$\beta_0 + \beta_1 = \mu_1.$$
If treatment 2 is applied to the $j$th observation, then
- in the regression model set-up, $D_{1j} = 0$, $D_{2j} = 1$ and
$$y_{2j} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \varepsilon_{2j} = \beta_0 + \beta_2 + \varepsilon_{2j};$$
- in the analysis of variance model set-up,
$$y_{2j} = \mu + \tau_2 + \varepsilon_{2j} = \mu_2 + \varepsilon_{2j}, \quad \text{where } \mu_2 = \mu + \tau_2,$$
so that
$$\beta_0 + \beta_2 = \mu_2.$$
When treatment 3 is used on the $j$th observation, then
- in the regression model set-up, $D_{1j} = D_{2j} = 0$ and
$$y_{3j} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \varepsilon_{3j} = \beta_0 + \varepsilon_{3j};$$
- in the analysis of variance model set-up,
$$y_{3j} = \mu + \tau_3 + \varepsilon_{3j} = \mu_3 + \varepsilon_{3j}, \quad \text{where } \mu_3 = \mu + \tau_3,$$
so that
$$\beta_0 = \mu_3.$$
So finally, there are the following three relationships:
$$\beta_0 + \beta_1 = \mu_1, \qquad \beta_0 + \beta_2 = \mu_2, \qquad \beta_0 = \mu_3,$$
i.e.,
$$\beta_0 = \mu_3, \qquad \beta_1 = \mu_1 - \mu_3, \qquad \beta_2 = \mu_2 - \mu_3.$$
In general, if there are $k$ treatments, then $(k-1)$ indicator variables are needed. The regression model is given by
$$y_{ij} = \beta_0 + \beta_1 D_{1j} + \beta_2 D_{2j} + \cdots + \beta_{k-1} D_{k-1,j} + \varepsilon_{ij}, \quad i = 1, 2, \ldots, k; \ j = 1, 2, \ldots, n,$$
where
$$D_{ij} = \begin{cases} 1 & \text{if the } j\text{th observation gets the } i\text{th treatment} \\ 0 & \text{otherwise.} \end{cases}$$
In this case, the relationships are
$$\beta_0 = \mu_k, \qquad \beta_i = \mu_i - \mu_k, \quad i = 1, 2, \ldots, k-1.$$
So $\beta_0$ always estimates the mean of the $k$th treatment, and $\beta_i$ estimates the difference between the means of the $i$th treatment and the $k$th treatment.
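The parameter correspondence can be confirmed with a small least-squares fit. A sketch on hypothetical, noise-free data with $k = 3$ treatments and $n = 2$ observations each (treatment means 10, 14, 12 chosen arbitrarily):

```python
import numpy as np

# Hypothetical treatment means: mu1 = 10, mu2 = 14, mu3 = 12 (no noise).
y = np.array([10.0, 10.0, 14.0, 14.0, 12.0, 12.0])
D1 = np.array([1, 1, 0, 0, 0, 0])   # treatment 1 indicator
D2 = np.array([0, 0, 1, 1, 0, 0])   # treatment 2 indicator

# Design matrix: intercept plus the (k-1) = 2 indicator columns.
X = np.column_stack([np.ones(6), D1, D2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# b0 = mu3 (mean of the reference treatment),
# b1 = mu1 - mu3 and b2 = mu2 - mu3 (differences from the reference).
```

The fit reproduces the stated relationships: the intercept estimates the $k$th (reference) treatment mean, and each dummy coefficient estimates a difference from it.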