Lecture 4
Multiple regression analysis with
qualitative information: dummy variables
Đinh Thị Thanh Bình, PhD
Faculty of International Economics, FTU
Definition
• Quantitative variable: its values are measured with
numbers.
• Qualitative variable: reflects some characteristic of
the subject, e.g. the gender or race of an individual, or the
industry of a firm.
• To incorporate qualitative factors into regression
models, we have to “convert” them into numbers
→ dummy variables.
Example
• female = 1 when the person is female, and female =
0 when the person is male.
• married = 1 when the person is married, and married = 0
otherwise.
• construction = 1 when the person works in the
construction industry, and construction = 0 otherwise.
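The coding rules above can be sketched in a few lines of Python. The records, field names, and values below are hypothetical and serve only to show how a qualitative attribute becomes a 0/1 variable.

```python
# Hypothetical records; field names and values are made up for illustration.
people = [
    {"gender": "female", "marital_status": "married", "industry": "construction"},
    {"gender": "male", "marital_status": "single", "industry": "retail"},
    {"gender": "female", "marital_status": "single", "industry": "construction"},
]


def to_dummy(value, category):
    """Return 1 if value equals the category of interest, otherwise 0."""
    return 1 if value == category else 0


# One dummy variable per qualitative attribute, following the slide's examples.
dummies = [
    {
        "female": to_dummy(p["gender"], "female"),
        "married": to_dummy(p["marital_status"], "married"),
        "construction": to_dummy(p["industry"], "construction"),
    }
    for p in people
]

for row in dummies:
    print(row)
```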
1. A single dummy independent variable
$wage = \beta_0 + \delta_0 female + \beta_1 educ + u$   (1)
$\delta_0 = E(wage \mid female = 1, educ) - E(wage \mid female = 0, educ)$
female = 1 corresponds to women, female = 0
corresponds to men, so
$\delta_0 = E(wage \mid female, educ) - E(wage \mid male, educ)$
The level of education is the same in both expectations, so
the difference, $\delta_0$, is due to gender only.
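[Figure 6.1: Graph of $wage = \beta_0 + \delta_0 female + \beta_1 educ + u$ with $\delta_0 < 0$. Vertical axis Y (wage), horizontal axis X (educ). Men: $wage = \beta_0 + \beta_1 educ$, intercept $\beta_0$. Women: $wage = (\beta_0 + \delta_0) + \beta_1 educ$, intercept $\beta_0 + \delta_0$. Both lines have slope $\beta_1$.]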
- Men earn a fixed amount more per hour than women → the
intercepts are different.
- More education means a higher wage for both men and women.
- The slopes are the same because the wage difference does not depend on the
amount of education.
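A small numerical sketch of the parallel-lines property in model (1). The coefficient values are hypothetical, chosen only for illustration and not estimated from any data; the point is that the gap between women and men equals the dummy coefficient at every level of education.

```python
# Hypothetical coefficients (assumptions, not estimates): intercept,
# coefficient on female, and coefficient on educ in model (1).
beta0, delta0, beta1 = 2.0, -1.5, 0.5


def expected_wage(female, educ):
    """E(wage | female, educ) under model (1); the error term has mean zero."""
    return beta0 + delta0 * female + beta1 * educ


# Gap between women and men at several education levels.
gaps = [expected_wage(1, educ) - expected_wage(0, educ) for educ in (8, 12, 16)]
print(gaps)  # every entry equals delta0
```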
Note: If a qualitative variable has n categories
→ include only n-1 dummy variables in the regression.
The category whose dummy variable is not included in the model
→ base group or benchmark group.
E.g.: Gender has 2 categories, male and female
→ use only 1 dummy variable, male or female.
- If female is the base group, we have the model:
$wage = \beta_0 + \delta_0 male + \beta_1 educ + u$
- Using both dummy variables together with an intercept would introduce perfect
collinearity because female + male = 1, which means that
male is a perfect linear function of female.
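The collinearity problem can be checked directly. The sketch below uses a hypothetical sample of five people and shows that the constant, female, and male columns are linearly dependent, so the matrix X'X built from them has determinant zero and cannot be inverted.

```python
# Hypothetical sample of five people (illustration only).
female = [1, 0, 1, 1, 0]
male = [1 - f for f in female]
const = [1] * len(female)

# In every row const - female - male = 0: an exact linear dependence.
dependence = [c - f - m for c, f, m in zip(const, female, male)]
print(dependence)

# Build X'X for the three columns and compute its 3x3 determinant.
cols = [const, female, male]
xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


det_xtx = det3(xtx)
print(det_xtx)  # zero: OLS has no unique solution with both dummies
```

Dropping either dummy (or the intercept) removes the dependence, which is why one category is left out as the base group.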
2. Using multiple dummy variables in the model
- We can include more than 1 dummy variable in the
model:
$wage = \beta_0 + \delta_0 female + \delta_1 married + \beta_1 educ + u$   (2)
However, an important limitation of this model is that the
effect of “married” on wage is assumed to be the same
for men and women.
- We can overcome this limitation by defining 4
groups: married men, married women, single men, single
women.
- If the base group is single men, the model will be:
$wage = \beta_0 + \delta_0 marrmale + \delta_1 marrfemale + \delta_2 singfem + \beta_1 educ + u$   (3)
Note: we have to exclude the variables female and
married from the model.
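A minimal sketch of how the three group dummies in model (3) could be built from the two original dummies. The function name `group_dummies` is hypothetical; the variable names follow model (3), and single men are the base group, so they get zeros everywhere.

```python
def group_dummies(female, married):
    """Map (female, married) to the three group dummies of model (3).

    Single men are the base group, so all three dummies are 0 for them.
    """
    return {
        "marrmale": int(female == 0 and married == 1),
        "marrfemale": int(female == 1 and married == 1),
        "singfem": int(female == 1 and married == 0),
    }


for female in (0, 1):
    for married in (0, 1):
        print(female, married, group_dummies(female, married))
```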
Practice with file WAGE1
- For example, we have the results:
$\widehat{\log(wage)} = 0.321 + 0.213\, marrmale - 0.198\, marrfem - 0.110\, singfem + \dots$
- The coefficients give the approximate proportional difference in wage
relative to the base group, single men.
- Married men are estimated to earn about 21.3% more
than single men, holding other factors fixed.
- Single women are estimated to earn about 8.8% more than
married women (-0.110 - (-0.198) = 0.088).
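The comparisons above are simple arithmetic on the reported coefficients. The sketch below reproduces them; the signs of the marrfem and singfem coefficients are taken as negative, as implied by the 8.8% calculation. The helper `exact_pct` is an addition not in the slides: it applies the standard conversion 100·(exp(b) − 1), which is more accurate than reading a log-wage coefficient directly as a percentage when the coefficient is large.

```python
import math

# Coefficients from the WAGE1 example; base group: single men.
coef = {"marrmale": 0.213, "marrfem": -0.198, "singfem": -0.110}

# Difference between two non-base groups = difference of their coefficients.
single_vs_married_women = coef["singfem"] - coef["marrfem"]
print(round(single_vs_married_women, 3))  # 0.088


def exact_pct(b):
    """Exact percentage difference implied by a log-wage coefficient b."""
    return 100 * (math.exp(b) - 1)


# Approximate (21.3%) versus exact percentage for married men.
print(round(exact_pct(coef["marrmale"]), 1))
```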
3. Incorporating ordinal information
by using dummy variables
- Ownership of firms
- Qualification of students
- Physical appearance
4. Interactions between dummy variables
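- Instead of using model (3),
$wage = \beta_0 + \delta_0 marrmale + \delta_1 marrfemale + \delta_2 singfem + \beta_1 educ + u$   (3)
we can generate an interaction term from the 2 dummy
variables:
$wage = \beta_0 + \delta_0 female + \delta_1 married + \delta_2 female \cdot married + \beta_1 educ + u$   (4)
- The two models are equivalent: they give the same fitted values and the same estimated differences between the four groups.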
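To see why the two specifications carry the same information, the group coefficients of model (3) can be converted into the coefficients of model (4). The sketch below does this with the values from the WAGE1 example (0.213, -0.198, -0.110, signs as implied by the 8.8% calculation); the function name `effect` is hypothetical.

```python
# Group coefficients from the WAGE1 example; base group: single men.
group = {"marrmale": 0.213, "marrfem": -0.198, "singfem": -0.110}

# Implied coefficients of the interaction model (4).
delta_female = group["singfem"]    # single women vs single men
delta_married = group["marrmale"]  # married men vs single men
delta_interaction = group["marrfem"] - group["marrmale"] - group["singfem"]


def effect(female, married):
    """Difference from single men implied by model (4)."""
    return (
        delta_female * female
        + delta_married * married
        + delta_interaction * female * married
    )


print(round(delta_interaction, 3))  # -0.301
print(round(effect(1, 1), 3))       # recovers the marrfem coefficient, -0.198
```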
5. Interaction between dummy and quantitative
variables
- This interaction allows us, for example, to test whether the
effect of education on wage is the same for men and
women.
$wage = \beta_0 + \delta_0 female + \beta_1 educ + \delta_1 female \cdot educ + u$
$wage = (\beta_0 + \delta_0 female) + (\beta_1 + \delta_1 female) educ + u$   (5)
- If female = 0, the intercept for men is $\beta_0$ and the
slope is $\beta_1$.
- If female = 1, the intercept for women is
$\beta_0 + \delta_0$ and the slope is $\beta_1 + \delta_1$.
- $\delta_0$ is the difference in intercepts
between women and men.
- $\delta_1$ is the difference in the effect of education on
wage between women and men.
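Case 1: $wage = (\beta_0 + \delta_0 female) + (\beta_1 + \delta_1 female) educ + u$
$\delta_0 < 0, \delta_1 < 0$
[Figure: wage (vertical axis) against educ (horizontal axis). The line for men has intercept $\beta_0$; the line for women has the lower intercept $\beta_0 + \delta_0$ and a flatter slope.]
- More education means a higher wage for both men and women.
- Women have lower wages than men at all levels of education.
- The marginal effect of education on the wage of men is higher than that of women → the more education, the larger the wage gap between men and women.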
$wage = (\beta_0 + \delta_0 female) + (\beta_1 + \delta_1 female) educ + u$
$\delta_0 < 0, \delta_1 > 0$
- The intercept for women is below that for men, but the
slope on education is larger for women.
- This means that women earn less than men at low levels
of education, but the gap narrows as education increases.
- Beyond some level of education, women earn more than men.
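The education level at which the two lines cross follows from setting the gap $\delta_0 + \delta_1 educ$ to zero, which gives $educ^* = -\delta_0 / \delta_1$. The sketch below uses hypothetical values of the two coefficients (assumptions for illustration, not estimates) to show the gap changing sign at that point.

```python
# Hypothetical coefficients with delta0 < 0 and delta1 > 0 (illustration only).
delta0, delta1 = -2.0, 0.25


def gap(educ):
    """Expected wage of women minus expected wage of men at a given educ."""
    return delta0 + delta1 * educ


# The lines cross where the gap is zero.
crossing = -delta0 / delta1
print(crossing)           # 8.0 years with these made-up numbers
print(gap(6), gap(12))    # negative below the crossing, positive above
```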
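Case 2: $wage = (\beta_0 + \delta_0 female) + (\beta_1 + \delta_1 female) educ + u$
$\delta_0 < 0, \delta_1 > 0$
[Figure: wage (vertical axis) against educ (horizontal axis). The line for men has intercept $\beta_0$; the line for women has the lower intercept $\beta_0 + \delta_0$ and a steeper slope, so the two lines cross.]
- More education means a higher wage for both men and women.
- At lower levels of education, men have higher wages than women.
- The marginal effect of education on the wage of women is higher than that of men → beyond a particular level of education, women have higher wages than men.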
Hypothesis test:
Hypothesis 1: The return to education is the same
for men and women.
$H_0: \delta_1 = 0$
- There is no constraint on $\delta_0$. This means that wages may
differ between men and women, but the
return to education is the same (Figure 6.1).
- Use a t-test.
Hypothesis 2: The expected wage is the same for men and women
at every level of education.
$H_0: \delta_0 = 0, \delta_1 = 0$
- Use an F-test.
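For the joint hypothesis, the usual F statistic compares the sum of squared residuals of the restricted model (without female and female·educ) with that of the unrestricted model (5): $F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)}$, where q is the number of restrictions and k the number of regressors in the unrestricted model. The sketch below evaluates the formula; the SSR values and sample size are hypothetical, made up only to show the calculation.

```python
def f_statistic(ssr_restricted, ssr_unrestricted, n, k, q):
    """F statistic for q exclusion restrictions.

    n: sample size; k: number of regressors in the unrestricted model
    (excluding the intercept); q: number of restrictions tested.
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k - 1)
    return numerator / denominator


# Hypothetical numbers: model (5) has k = 3 regressors (female, educ,
# female*educ) and the joint null imposes q = 2 restrictions.
f_value = f_statistic(ssr_restricted=120.0, ssr_unrestricted=100.0, n=104, k=3, q=2)
print(f_value)  # 10.0 with these made-up inputs
```

The computed value is then compared with the critical value of the F distribution with (q, n − k − 1) degrees of freedom.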
6.5 An example application of dummy variables
Personal savings and income in the United Kingdom,
1946-63 (million pounds)
Period I   Savings   Income     Period II   Savings   Income
1946       0.36      8.8        1955        0.59      15.5
1947       0.21      9.4        1956        0.90      16.7
1948       0.08      10.0       1957        0.95      17.7
1949       0.20      10.6       1958        0.82      18.6
1950       0.10      11.0       1959        1.04      19.7
1951       0.12      11.9       1960        1.53      21.1
1952       0.41      12.7       1961        1.94      22.8
1953       0.50      13.5       1962        1.75      23.9
1954       0.43      14.3       1963        1.99      25.2
Objective: test whether the savings function changed structurally
between the 2 periods.
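Method 1: Estimate a separate savings model for each of the 2 periods (Y: savings, X: income)
- Reconstruction period, 1946-54: $Y_i = \alpha_1 + \alpha_2 X_i + u_{1i}$
- Post-reconstruction period, 1955-63: $Y_i = \gamma_1 + \gamma_2 X_i + u_{2i}$
- Then test for the following cases:
$\alpha_1 = \gamma_1, \alpha_2 = \gamma_2$ (identical regressions)
$\alpha_1 \neq \gamma_1, \alpha_2 = \gamma_2$ (different intercepts, same slope)
$\alpha_1 = \gamma_1, \alpha_2 \neq \gamma_2$ (same intercept, different slopes)
$\alpha_1 \neq \gamma_1, \alpha_2 \neq \gamma_2$ (different intercepts and slopes)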
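A minimal sketch of Method 1, fitting a separate simple regression of savings on income for each period with the figures from the table above. The helper `simple_ols` is a hypothetical name and uses only the textbook formulas for the slope and intercept of a one-regressor OLS line.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the OLS line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Reconstruction period, 1946-54 (savings, income) from the table.
savings_1 = [0.36, 0.21, 0.08, 0.20, 0.10, 0.12, 0.41, 0.50, 0.43]
income_1 = [8.8, 9.4, 10.0, 10.6, 11.0, 11.9, 12.7, 13.5, 14.3]

# Post-reconstruction period, 1955-63.
savings_2 = [0.59, 0.90, 0.95, 0.82, 1.04, 1.53, 1.94, 1.75, 1.99]
income_2 = [15.5, 16.7, 17.7, 18.6, 19.7, 21.1, 22.8, 23.9, 25.2]

intercept_1, slope_1 = simple_ols(income_1, savings_1)
intercept_2, slope_2 = simple_ols(income_2, savings_2)

print(round(intercept_1, 4), round(slope_1, 4))
print(round(intercept_2, 4), round(slope_2, 4))
```

Because a model with both a period dummy and its interaction with income lets the intercept and slope differ freely between periods, its point estimates imply the same two lines as these separate regressions.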
Method 2: Use a dummy variable
Step 1. Specify a general savings function covering both periods
$Y_i = \beta_1 + \beta_2 X_i + \beta_3 Z_i + \beta_4 X_i Z_i + u_i$
with n = n1 + n2 observations, where
Z = 1 if the observation belongs to the reconstruction period
Z = 0 if the observation belongs to the post-reconstruction period
Step 2. Test the hypothesis $H_0: \beta_3 = 0$
If H0 is not rejected: drop Z from the model
Step 3. Test the hypothesis $H_0: \beta_4 = 0$
If H0 is not rejected: drop $Z_i X_i$ from the model
The regression results for this model are as follows
$\hat{Y}_i = -1.75 + 0.15045 X_i + 1.4839 Z_i - 0.1034 X_i Z_i$
t = (-5.27) (9.238) (3.155) (-3.109)
p = (0.000) (0.000) (0.007) (0.008)
$\hat{Y}_i = (-1.75 + 1.4839 Z_i) + (0.15045 - 0.1034 Z_i) X_i$
Comments
• The differential intercept and the differential slope
are statistically significant.
• The regressions for the two periods are different.
Reconstruction period: Z = 1
$\hat{Y}_i = -1.75 + 0.15045 X_i + 1.4839 - 0.1034 X_i$
$\hat{Y}_i = -0.2661 + 0.04705 X_i$
Post-reconstruction period: Z = 0
$\hat{Y}_i = -1.75 + 0.15045 X_i$
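[Figure 6.4: Regression lines for the 2 periods. Vertical axis: savings; horizontal axis: income. Post-reconstruction period: $\hat{Y}_i = -1.75 + 0.15045 X_i$ (intercept -1.75). Reconstruction period: $\hat{Y}_i = -0.2661 + 0.04705 X_i$ (intercept about -0.27).]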