Regression
Explains a relationship (including causality)
Dependent variable: consumption expenditure (household)
Micro                                                      Macro
Factor                      Variable   Data   Sign         Factor / var
Income                      Ok         Ok     +            GDP
Price                       ???        No                  CPI
Taste                       No         No
Demand                      No         No
Education of householder    ok         Ok     !!!          ?
Size                        OK         Ok     +            Population
Gender of householder       Ok                ?
Age of householder          Ok                +/-
Weight
Height
Location                    Ok         Urban/rural
Weather
Occupation / career                                        Interest rate
Literature review
Topic
Research question
Review: background theory; economic theory (journal)
Model
Data
Estimate
Test / check
Analyze
Forecast
Return to the research question
Conclusion: summarize what you have done
\hat{wage}_i = 2.23 + 1.654 exp_i
wage_i = 2.23 + 1.654 exp_i + e_i
2.23: estimated intercept: when experience is zero, average wage is estimated to be 2.23 (units)
1.654: estimated slope: when experience increases by 1 unit (1 year), average wage increases by 1.654 units, ceteris paribus
*NOTE: in this sample of 5 observations only*
Value               5      6      8      9
Overall mean        7
Deviation          -2     -1     +1     +2
Group              Under  Under  Grad   Grad
Mean of group       5.5    5.5    8.5    8.5
Group deviation    -1.5   -1.5   +1.5   +1.5
Within deviation   -0.5   +0.5   -0.5   +0.5
Total sum of squares (TSS) = (-2)^2 + (-1)^2 + 1^2 + 2^2 = 10
Explained SS = (-1.5)^2 + (-1.5)^2 + 1.5^2 + 1.5^2 = 9
Residual SS = (-0.5)^2 + (+0.5)^2 + (-0.5)^2 + (+0.5)^2 = 1
TSS = Explained SS + Residual SS
R^2 = Explained SS / Total SS = 9/10 = 90%
90% of the total variation in wage is explained by variation in the factor "Graduated"
Total variation => Total SS
Between variation => Between SS : Explained SS
Within variation => Within SS: Residual SS
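A quick R sketch (assuming the four wages and the Under/Grad grouping from the table above) to verify the decomposition:

# Sketch: verify TSS = Explained SS + Residual SS for the 4-observation example
wage  <- c(5, 6, 8, 9)
group <- factor(c("Under", "Under", "Grad", "Grad"))
grand <- mean(wage)                                   # overall mean = 7
TSS <- sum((wage - grand)^2)                          # 10
BSS <- sum((ave(wage, group) - grand)^2)              # between (explained) SS = 9
WSS <- sum((wage - ave(wage, group))^2)               # within (residual) SS = 1
c(TSS = TSS, BSS = BSS, WSS = WSS, R2 = BSS / TSS)    # R2 = 0.9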
ANOVA table: Factor => k groups
Source                       SS     Df      MS                  F-stat
Group / Factor (Between)     BSS    k-1     BMS = BSS/(k-1)     BMS / WMS
Error / Residual (Within)    WSS    n-k     WMS = WSS/(n-k)
Total                        TSS    n-1
ANOVA: test for means
In k groups, the means are μ_1, μ_2, ..., μ_k
H0: μ_1 = μ_2 = ... = μ_k : the factor does not affect the means
H1: not H0 : the factor affects the means
F-test:
F_stat = [BSS/(k-1)] / [WSS/(n-k)]
If F_stat > f_crit = f(k-1, n-k)_α : reject H0
P-value of the test: P(F(k-1, n-k) > F_stat)
Example
n = 4; Factor: training => k = 2 groups
BSS = 9, RSS = 1; TSS = 10
Source      SS    Df    MS           F-stat
Training     9     1    9/1 = 9      9/0.5 = 18
Error        1     2    1/2 = 0.5
Total       10     3
Test
H0: μ_Grad = μ_Under
F_stat = 18; F_crit = f(1, 2)_0.05 = ?
p[distribution](x) => P(X < x) = F(x)
d[distribution](x) => f(x)
q[distribution](β) => x_β : P(X < x_β) = β
Critical value x_α : P(X > x_α) = α => x_α = q(1-α)
f(1, 2)_0.05 = qf(0.95, 1, 2) = 18.512
F_stat < 18.512: Not reject H0 => not enough evidence to say that the factor "Graduate" affects the mean of wage
P-value = P(F(1, 2) > 18) = 1 - pf(18, 1, 2) = 0.0513
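The same test can be reproduced with anova() in R, a sketch assuming the same 4 observations as above:

# Sketch: one-way ANOVA via anova() on the 4-observation sample
wage <- c(5, 6, 8, 9)
grad <- factor(c("Under", "Under", "Grad", "Grad"))
anova(lm(wage ~ grad))   # Sum Sq: 9 and 1; F value = 18; Pr(>F) ≈ 0.0513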
If instead BSS = 1 and RSS = 9:
Source      SS    Df    MS           F-stat
Training     1     1    1/1 = 1      1/4.5 = 0.22
Error        9     2    9/2 = 4.5
Total       10     3
F_stat = 0.22; F_crit = 18.512: Not reject H0
P-value = P(F(1, 2) > 0.22) = 1 - pf(0.22, 1, 2) = 0.685
Survey 25 workers in 4 factories: BSS = 24; Residual SS = 14. Test for equality of means!
Source      SS    Df    MS              F-stat
Factories   24     3    24/3 = 8        8/0.67 = 12
Error       14    21    14/21 = 0.67
Total       38    24
H0: μ_1 = μ_2 = μ_3 = μ_4
F_stat = 12; F_crit = f(3, 21)_0.05 = qf(0.95, 3, 21) = 3.07
Reject H0 => the means are not all equal
P-value = 1 - pf(12, 3, 21) = 0.000
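A sketch of the same test done in R directly from the summary numbers:

# Sketch: F test from BSS = 24, WSS = 14, n = 25, k = 4
k <- 4; n <- 25
Fstat <- (24 / (k - 1)) / (14 / (n - k))   # BMS / WMS = 8 / 0.67 ≈ 12
qf(0.95, k - 1, n - k)                     # critical value ≈ 3.07
1 - pf(Fstat, k - 1, n - k)                # p-value ≈ 0.000 => reject H0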
2-factor (without interaction)
Source     SS    Df           MS               F-stat
Rows       24     4           24/4 = 6         6/3.67 = 1.64
Columns    21     1           21/1 = 21        21/3.67 = 5.72
Error     114    31           114/31 = 3.67
Total     159    37 - 1 = 36
H0: μ_R1 = μ_R2 = ... = μ_R5 : factor "Row" does not affect the means
F_stat = 1.64; P-value = 1 - pf(1.64, 4, 31) = 0.189
H0: μ_C1 = μ_C2 : factor "Column" does not affect the means
F_stat = 5.72; P-value = 1 - pf(5.72, 1, 31) = 0.02
2-factor (with interaction)
Source        SS    Df           MS              F-stat
Rows          24     4           24/4 = 6        6/3.7 = 1.62
Columns       21     1           21/1 = 21       21/3.7 = 5.68
Row*Column    14     4*1 = 4     14/4 = 3.5      3.5/3.7 = 0.95
Error        100    27           100/27 = 3.7
Total        159    37 - 1 = 36
Test for the effect of Row:
P-value = 1 - pf(1.62, 4, 27) = 0.2
Test for the effect of Column:
P-value = 1 - pf(5.68, 1, 27) = 0.024
Test for the effect of Row*Column (interaction):
P-value = 1 - pf(0.95, 4, 27) = 0.45
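The three p-values can be checked in R with pf, using the F statistics from the table:

1 - pf(1.62, 4, 27)   # Rows:        ≈ 0.20
1 - pf(5.68, 1, 27)   # Columns:     ≈ 0.024
1 - pf(0.95, 4, 27)   # Interaction: ≈ 0.45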
R^2 = 0.961: 96.1% of the total variation in wage is explained by the model (variation in experience).
23 Jan 2024
Example 2.2
Model
wage = β_0 + β_1 exp + ε
(a) Test:
H0: β_1 = 0; H1: β_1 ≠ 0
T_stat = (\hat{β}_1 - 0) / se(\hat{β}_1) = (1.6538 - 0) / 0.1923 = 8.6
Critical value: t(n-2)_{α/2}, at 5%: t(3)_0.025 = [R] qt(0.975, 3) = 3.18
|T_stat| > 3.18: reject H0: the slope is significant at 5%
P-value = 2 P(T(3) > 8.6) = [R] 2 × [1 - pt(8.6, 3)] = 0.0033
“Slope is statistically significant (p = 0.0033)”
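A sketch of the same slope test in R, using only the estimate and standard error from the output:

b1 <- 1.6538; se1 <- 0.1923; df <- 3
Tstat <- (b1 - 0) / se1          # ≈ 8.6
qt(0.975, df)                    # two-sided 5% critical value ≈ 3.18
2 * (1 - pt(abs(Tstat), df))     # p-value ≈ 0.0033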
(b) Intercept
H0: β_0 = 0; H1: β_0 ≠ 0
T_stat = (\hat{β}_0 - 0) / se(\hat{β}_0) = (2.231 - 0) / 0.5015 = 4.449
Critical value: t(n-2)_{α/2}, at 5%: t(3)_0.025 = [R] qt(0.975, 3) = 3.18
|T_stat| > 3.18: reject H0: the intercept is significant at 5%
P-value = 2 P(T(3) > 4.449) = [R] 2 × [1 - pt(4.449, 3)] = 0.021
(c) Confidence interval (CI) for slope
β_1 ∈ \hat{β}_1 ± t(n-2)_{α/2} × se(\hat{β}_1)
(1.6538 ± 3.18 × 0.1923) = (1.042; 2.265)
At 95%, when experience increases by 1 unit (year), on average, the increase in wage lies in (1.042; 2.265) units.
The 95% CI for the average increase in wage when exp increases by 1 unit is ….
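A sketch of the same interval with confint(), assuming the 5-observation sample used later in these notes:

year <- c(1, 2, 2, 3, 4)
wage <- c(4, 6, 5, 7, 9)
confint(lm(wage ~ year), level = 0.95)   # slope row ≈ (1.042, 2.265)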
Test the hypothesis that when exp increases by 1 year, on average, wage increases by less than 2 thousand, and find the p-value.
H0: β_1 = 2
H1: β_1 < 2
T_stat = (1.6538 - 2) / 0.1923 = -1.8
Critical value: -t(3)_0.05 = [R] qt(0.05, 3) = -2.35
T_stat > -2.35: Not reject H0: not enough evidence that wage increases by less than 2 (thousand).
P-value = P(T(3) < -1.8) = [R] pt(-1.8, 3) = 0.085
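A sketch of this one-sided test in R:

Tstat <- (1.6538 - 2) / 0.1923   # ≈ -1.8
qt(0.05, 3)                      # left-tail 5% critical value ≈ -2.35
pt(Tstat, 3)                     # p-value ≈ 0.085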
(d) exp = 6 => \hat{wage} = 2.231 + 1.6538 × 6 = 12.1538
se(pred) = 0.4385 × sqrt(1 + 1/n + (6 - 2.4)^2 / 5.2) = 0.843
95% CI for the predicted value of wage:
12.1538 ± 3.18 × 0.843 = (9.478; 14.83)
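A sketch of the same prediction interval with predict() on the 5-observation sample shown later in the notes:

year <- c(1, 2, 2, 3, 4)
wage <- c(4, 6, 5, 7, 9)
fit <- lm(wage ~ year)
predict(fit, newdata = data.frame(year = 6), interval = "prediction", level = 0.95)
# fit ≈ 12.15, lwr ≈ 9.47, upr ≈ 14.83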
Call:
lm(formula = wage ~ gen)
Residuals:
1 2 3 4 5
-2.3333 -0.3333 -1.0000 1.0000 2.6667
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.0000 1.5635 3.838 0.0312 *
gen 0.3333 2.0184 0.165 0.8793
Dependent Variable: WAGE
Method: Least Squares
Date: 01/23/24 Time: 15:04
Sample: 1 5
Included observations: 5
Variable Coefficient Std. Error t-Statistic Prob.
C 6.000000 1.563472 3.837613 0.0312
GEN 0.333333 2.018434 0.165145 0.8793
R-squared 0.009009 Mean dependent var 6.200000
Adjusted R-squared -0.321321 S.D. dependent var 1.923538
S.E. of regression 2.211083 Akaike info criterion 4.714016
Sum squared resid 14.66667 Schwarz criterion 4.557792
Log likelihood -9.785041 F-statistic 0.027273
Durbin-Watson stat 0.765152 Prob(F-statistic) 0.879331
> year <- c(1,2,2,3,4)
> wage <- c(4,6,5,7,9)
> summary(lm(wage ~ year))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.2308 0.5015 4.448 0.02113 *
year 1.6538 0.1923 8.600 0.00331 **
Residual standard error: 0.4385 on 3 degrees of freedom
Multiple R-squared: 0.961, Adjusted R-squared: 0.948
wage_i = 2.23 + 1.6538 year_i + e_i
\hat{wage}_i = 2.23 + 1.6538 year_i
R-sq = 0.961
In this sample
When year = 0 (staff without experience): average wage is ….
When experience increases 1 (year) => on average,….
96.1% of total variation in wage is explained by model (variation in year).
Without intercept
> summary(lm(wage ~ 0+ year))
Call:
lm(formula = wage ~ 0 + year)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
year 2.4412 0.1795 13.6 0.000169 ***
Residual standard error: 1.047 on 4 degrees of freedom
Multiple R-squared: 0.9788, Adjusted R-squared: 0.973
wage_i = 2.4412 year_i + e_i
\hat{wage}_i = 2.4412 year_i
27 February 2024
R^2: proportion of the total variation in the dependent variable that is explained by the model (by variation in all of the explanatory variables).
Adj-R^2 = 1 - (1 - R^2) × (n - 1) / (n - k - 1)
n = 10
(1) Y <- x, z     => R^2 = 0.60 → Adj-R^2 = 1 - (1 - 0.60)(10 - 1)/(n - (2 + 1)) = 0.486
(2) Y <- x, z, w  => R^2 = 0.65 → Adj-R^2 = 1 - (1 - 0.65)(10 - 1)/(n - (3 + 1)) = 0.475
(3) Y <- x, w     => R^2 = 0.62
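A sketch of the adjusted R^2 calculation as a small R function (k = number of explanatory variables):

adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)
adj_r2(0.60, 10, 2)   # model (1): ≈ 0.486
adj_r2(0.65, 10, 3)   # model (2): ≈ 0.475
adj_r2(0.62, 10, 2)   # model (3): ≈ 0.511 (not given in the notes)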
Linear: y = β_0 + β_1 x + β_2 z
    ∂y/∂x = β_1 : constant at any point
    ε_x^y = (∂y/∂x) × (x/y) = β_1 x / (β_0 + β_1 x + β_2 z) : depends on the point (at x = x_0, z = z_0)
Power: y = β_0 x^{β_1} z^{β_2}
    ∂y/∂x = β_0 β_1 x^{β_1 - 1} z^{β_2} : depends on the point (at x = x_0, z = z_0)
    ε_x^y = β_0 β_1 x^{β_1 - 1} z^{β_2} × x / (β_0 x^{β_1} z^{β_2}) = β_1 : constant
y = β_0 x^{β_1} z^{β_2} u
ln y = ln β_0 + β_1 ln x + β_2 ln z + ln u
ln y = β_0* + β_1 ln x + β_2 ln z + ε    (β_0* = ln β_0, ε = ln u)
Log-log model (log-linear model) !!!!
β_1 = elasticity
dy/y = β_1 dx/x + β_2 dz/z
When x increases by 1%, on average, y changes by approximately β_1 % (cet. par.)
β_1 ∈ (0, 1): y increases with x at a diminishing rate
β_1 = 1: y increases proportionally (linearly) with x
β_1 > 1: y increases with x at an increasing rate
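A minimal R sketch of a log-log fit on simulated data (variable names and true coefficients are invented for illustration); the coefficient on log(x) estimates the elasticity β_1:

set.seed(1)
x <- runif(100, 1, 10)
z <- runif(100, 1, 10)
y <- 2 * x^0.5 * z^1.2 * exp(rnorm(100, sd = 0.1))   # true beta1 = 0.5, beta2 = 1.2
summary(lm(log(y) ~ log(x) + log(z)))                # log(x) coefficient ≈ 0.5 (the elasticity)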
Profit: 10 (bil.): increases 1 percent => 10.1 (bil.)
Interest rate: 10 (%): increases 1 percent => 11%? or 10.1%?
10.1% is correct!
Increases 1 percentage point => 11%
Dependent variable: wage
Const    7.659 [0.001] ***    -0.359 [0.994]    -2.177 [0.557]    -12.39 [0.060]
Exp 0.044 [0.936] 0.401 [0.003] 1.960 [0.029]
Exp^2 0.022 [0.516]
Edu 0.832 [0.913] 0.675 [0.030] 1.426 [0.009]
Edu^2 -0.005 [0.985]
Exp*edu -0.115 [0.066]
R-sq 0.543 0.2103 0.723 0.823
Adj R-sq 0.442 0.035 0.661 0.757
F-stat 5.356 [0.029] 1.198 [0.346] 11.76 [0.003] 12.42 [0.002]
With data on Exp, Edu, Wage, Male
Significance level: 5%
Question Answer
1. Mean of wage
2. Sample variance of experience
3. Covariance of exp and edu
4. Correlation of exp and edu = ?
5. Test for correlation between wage and
experience, then test statistic =?
At 5%:
A. Reject Ho, not correlated
B. Reject Ho, correlated
C. Not reject Ho, not correlated
D. Not reject Ho, correlated
Regress wage on edu, intercept included (Model
[1])
6. Estimated intercept = ?
7. Coefficient of determination =?
8. Test for significance of the slope
A. Reject Ho, slope is insignificant
B. Reject Ho, slope is significant
C. Not reject Ho, insig.
D. Not reject Ho, sig
Transform the above model into log-log form
(Model [2])
9. Estimated slope =
10. The first fitted value =
11. The first residual =
12. Covariance between estimated intercept and
slope
Adding male into model [2], gain [3]
13. Adjusted coefficient of determination
14. At 5%, how many coefficients are significant?
15. Estimate the difference between male and
female in average wage
Adding experience into model [3], gain [4]
16. The new variable’s coefficient
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
rice = β_0 + β_1 size + β_2 income + β_3 nrice + ε
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.950e+02 2.947e+02 3.037 0.00254 **
size 1.106e+03 7.395e+01 14.951 < 2e-16 ***
income -1.559e-04 1.247e-03 -0.125 0.90058
nrice 2.986e-04 4.204e-03 0.071 0.94341
Var Coef Se Standardized coefficient
Intercept 895
Size 1106 0.637
Income -0.000156 -0.007
nrice 0.000299 0.004
rice^S = β_1 size^S + β_2 income^S + β_3 nrice^S + ε^S    (S: standardized variables; no intercept)
> incomes<- (income - mean(income))/sd(income)
> sizes<- (size - mean(size))/sd(size)
> rices<- (rice - mean(rice))/sd(rice)
> nrices<- (nrice - mean(nrice))/sd(nrice)
> summary(lm(rices ~ 0 + sizes + incomes + nrices ))
Call:
lm(formula = rices ~ 0 + sizes + incomes + nrices)
Residuals:
Min 1Q Median 3Q Max
-2.6381 -0.4252 -0.0988 0.3343 7.5584
Coefficients:
Estimate Std. Error t value Pr(>|t|)
sizes 0.636886 0.042547 14.969 <2e-16 ***
incomes -0.007037 0.056232 -0.125 0.900
nrices 0.004236 0.059573 0.071 0.943
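The same standardized regression can be written more compactly with scale(); a sketch assuming the same rice, size, income and nrice vectors as in the session above:

summary(lm(scale(rice) ~ 0 + scale(size) + scale(income) + scale(nrice)))
# should reproduce the standardized coefficients above (0.637, -0.007, 0.004)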
Regress wage on exp, edu, male (intercept
included)
Q: Test the hypothesis that the coefficient of exp equals one
A. Reject Ho, hypothesis is correct
B. Reject Ho, hypothesis is incorrect
C. Not reject Ho, hyp. is correct
D. Not reject Ho, hyp. is incorrect
Test the hypothesis that the coefficient of exp differs from one
A. Reject Ho, hypothesis is correct
B. Reject Ho, hypothesis is incorrect
C. Not reject Ho, hyp. is correct
D. Not reject Ho, hyp. is incorrect
Test the hypothesis that sum of
coefficients of exp and edu differs from
1.5
A, B, C, D
Test the hypothesis that the sum of slopes is equal to 3.
A, B, C, D
Add the square of exp into the model.
Test for significance of the new coefficient
A. Reject Ho, coef. Is sig
B. Reject , is insig
C. Not reject, sig.
D. Not reject, insig.
Test for adding the squares of exp and edu into the model, using an F-test
A. Reject Ho, should be added
B. Reject Ho, should not be added
C. Not Reject Ho, should be added
D. Not Reject Ho, should not be added
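A sketch of the last F-test with anova() on simulated data; the data-generating process and variable names are invented for illustration:

set.seed(1)
n <- 100
exp  <- runif(n, 0, 20)
edu  <- runif(n, 8, 18)
male <- rbinom(n, 1, 0.5)
wage <- 1 + 0.3 * exp + 0.5 * edu + 2 * male + rnorm(n)   # no squared terms in the true model
base <- lm(wage ~ exp + edu + male)
full <- lm(wage ~ exp + edu + male + I(exp^2) + I(edu^2))
anova(base, full)   # a large Pr(>F) means: Not reject H0 => the squared terms need not be added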
                   Firm A                              Firm B
2022               K = 400; Y = 1000                   K = 400; Y = 2000
2023               K = 404; Y = 1010                   K = 404; Y = 2010

Absolute effect:
                   ΔK = 4; ΔY = 10                     ΔK = 4; ΔY = 10
                   ΔY/ΔK = 2.5                         ΔY/ΔK = 2.5
                   When capital increases by 1 unit => output increases by 2.5 units (both firms)
                   ICOR (Incremental Capital-Output Ratio) = ΔK/ΔY = 4/10 = 0.4

Relative effect:
                   %ΔK = (ΔK/K)(100%) = 0.01 = 1%      %ΔK = (ΔK/K)(100%) = 0.01 = 1%
                   %ΔY = (ΔY/Y)(100%) = 0.01 = 1%      %ΔY = (ΔY/Y)(100%) = 0.005 = 0.5%
                   ε_K^Y = %ΔY/%ΔK = 1%/1% = 1         ε_K^Y = %ΔY/%ΔK = 0.5%/1% = 0.5
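A sketch of the Firm A / Firm B elasticities in R:

elasticity <- function(K0, K1, Y0, Y1) ((Y1 - Y0) / Y0) / ((K1 - K0) / K0)
elasticity(400, 404, 1000, 1010)   # Firm A: 1
elasticity(400, 404, 2000, 2010)   # Firm B: 0.5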
Y = f(K): continuous
Absolute effect: derivative: Y' = dY/dK => dY = f'(K) dK
Relative effect:
ε_K^Y = %dY / %dK = (dY/Y) / (dK/K) = (dY/dK) × (K/Y) = f'(K) × K/Y