QM3 Lecture 2 With Notes

Quantitative Methods 3 (EBS2001), 2016/2017, lecture 2

Today: preparing Cases 3, 4, 5 & 6
- Sharpe Chs. 4, 15–18 (relevant parts)
- some additional bells and whistles

Three basic examples:
A) a (small) part of Case 3)        - defining constructs using Cronbach's alpha
                                    - regression analysis, Part I)
B) bike theft in the Netherlands    - regression analysis, Part II)
C) a (small) part of Case 5)        - regression analysis, Part III)

A) Basic example: a (small) part of Case 3)

- a survey among a sample of XYZ customers, fall 2003
- n = 471 respondents, many variables
- we consider the following subset of 5 variables (i.e. questions):

gender     What is your gender?    1 Female    2 Male

Please rate the statements below on a 1 to 7 scale; 1 means "totally disagree" and 7 "totally agree".

satisfac   Overall, I am very satisfied with XYZ                           1 2 3 4 5 6 7
loyal1     For catering services at a party, XYZ will be my first choice   1 2 3 4 5 6 7
loyal2     In the future, I will make use of XYZ's services more often     1 2 3 4 5 6 7
loyal3     In the future, I will make use of XYZ's services less often     1 2 3 4 5 6 7

Defining constructs using Cronbach's alpha

- the items "loyal1", "loyal2" and "loyal3" each measure a feature of an underlying construct: "loyalty"
  ⇒ combine (some of) them into an average

When deciding which items to use, check two things:
1) The items should not contain "too many" missing values
   here: OK for all three
2) To get a reliable construct, the items entering it should behave "similarly"
i) Informal analysis: a correlation matrix

Correlations
                                  loyal1     loyal2     loyal3
loyal1   Pearson Correlation      1.000      .843**     -.708**
         Sig. (2-tailed)          .          .000       .000
         N                        394        387        374
loyal2   Pearson Correlation      .843**     1.000      -.780**
         Sig. (2-tailed)          .000       .          .000
         N                        387        450        426
loyal3   Pearson Correlation      -.708**    -.780**    1.000
         Sig. (2-tailed)          .000       .000       .
         N                        374        426        429
** Correlation is significant at the 0.01 level (2-tailed).

Note: - correlation measures the relation between two items
      - the correlations are quite high, as they should be
      - "loyal3" is negatively correlated with the others
        ⇒ reverse scale problem
        ⇒ Transform > Compute Variable: "loyal3R" = 8 – "loyal3"

ii) Formal analysis: Cronbach's alpha

SPSS: Analyze > Scale > Reliability Analysis; choose loyal1, loyal2 and loyal3R as Items

Case Processing Summary
                      N      %
Cases   Valid         371    78.8
        Excluded(a)   100    21.2
        Total         471    100.0
a  Listwise deletion based on all variables in the procedure.

Reliability Statistics
Cronbach's Alpha    N of Items
.916                3

Note: - alpha measures the agreement between all items
      - 0 ≤ alpha ≤ 1     0: the items are totally unrelated
                          1: the items overlap fully
      - rule of thumb: reliable construct when alpha ≥ 0.75

Conclusion: use Transform > Compute Variable to define the construct
            "loyalty" = ("loyal1" + "loyal2" + "loyal3R") / 3
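The same computation can be reproduced outside SPSS. Below is a minimal Python sketch of Cronbach's alpha, assuming the survey answers sit in a hypothetical pandas DataFrame df with columns loyal1, loyal2 and loyal3 (the lecture itself uses SPSS throughout):

    import pandas as pd

    def cronbach_alpha(items: pd.DataFrame) -> float:
        # listwise deletion of missing values, as in SPSS's Reliability Analysis
        items = items.dropna()
        k = items.shape[1]                         # number of items
        item_var = items.var(ddof=1).sum()         # sum of the item variances
        total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
        return k / (k - 1) * (1 - item_var / total_var)

    df["loyal3R"] = 8 - df["loyal3"]               # reverse the scale of loyal3
    alpha = cronbach_alpha(df[["loyal1", "loyal2", "loyal3R"]])
    df["loyalty"] = df[["loyal1", "loyal2", "loyal3R"]].mean(axis=1)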
Regression analysis, Part I)

1) Regression: the basic idea
2) Inference about individual coefficients: the t-test
3) Using dummies to model qualitative explanatory variables
4) Dummy-interaction models
5) Measuring the fit of regression models

1) Regression: the basic idea

Regression: relating a response variable Y to a (number of) explanatory variable(s) X

We expect: more satisfaction (explanatory) → more loyalty (response)

⇒ check with a scatterplot of "loyalty" vs. "satisfac"

SPSS: Graphs > Legacy Dialogs > Scatter/Dot
      select Simple Scatter, click Define
      choose "loyalty" as Y, "satisfac" as X

⇒ there seems to be a positive relation, but not one-to-one

   many other factors are relevant ⇒ scatter around a line

⇒ simple linear regression model:

   y = β0 + β1x + ε      (the multiple regression model has k x's)

   - y    response variable (here: "loyalty")
   - x    explanatory variable (here: "satisfac")
   - β0   intercept
   - β1   slope (if "satisfac" rises by 1 point, "loyalty" is predicted to rise by β1 points)
   - ε    random error (all influences other than "satisfac")

⇒ we want to estimate the regression coefficients β0 and β1
⇒ fit a line through the sample points

   ŷi = b0 + b1xi      i = 1,…,n

hoping that the estimates b0 and b1 are close to β0 and β1

- intuition: choose the estimates such that the vertical distances

   ei = yi − ŷi      ("residuals")

  become as small as possible

- formalization: choose the estimates such that the sum of squared vertical distances, SSE, becomes as small as possible

  ⇒ minimize, over b0 and b1:   SSE = Σi ei² = Σi (yi − ŷi)² = Σi (yi − b0 − b1xi)²      (sums over i = 1,…,n)

⇒ computers can solve such a minimization problem easily

SPSS: Analyze > Regression > Linear
      choose "loyalty" as Dependent, "satisfac" as Independent

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .917   .841       .840                .68255
Predictors: (Constant), satisfac

ANOVA
Model 1       Sum of Squares   df    Mean Square   F          Sig.
Regression    898.763          1     898.763       1929.193   .000
Residual      170.044          365   .466
Total         1068.808         366
Predictors: (Constant), satisfac
Dependent Variable: loyalty

Coefficients
Model 1       Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)    .918               .106                             8.671    .000
satisfac      .815               .019         .917                43.923   .000
Dependent Variable: loyalty

⇒ least squares regression line or prediction equation:

   loyâlty = 0.918 + 0.815satisfac

- b0 = 0.918
- b1 = 0.815      "if 'satisfac' rises by 1 point, we expect 'loyalty' to rise by 0.815 points"
- SSE = 170.044   any other line has larger vertical distances!
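For comparison, a minimal sketch of the same least-squares fit in Python with statsmodels, assuming the data have been exported to a hypothetical file case3.csv with columns loyalty and satisfac:

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("case3.csv")     # hypothetical export of the SPSS data
    model1 = smf.ols("loyalty ~ satisfac", data=df).fit()
    print(model1.params)              # b0 (Intercept) ≈ 0.918, b1 (satisfac) ≈ 0.815
    print(model1.ssr)                 # SSE ≈ 170.044, the minimized sum of squared residuals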

2) Inference about individual coefficients: the t-test

- LS regression line: loyâlty = 0.918 + 0.815satisfac

"What does this line, fitted on the basis of a sample, tell us about the relationship between 'satisfac' and 'loyalty' in the population?"

⇒ inferential statistics!

- any other sample would give us different estimates b0 and b1
  ⇒ we need the sampling distribution of these estimates!
- the decisive feature: the properties of the error term ε

   y = β0 + β1x + ε

- the error ε reflects the effect on y of all factors other than x
  ⇒ for any value of x, infinitely many distinct ε-values can occur
  ⇒ we can describe the behaviour of the error with a probability distribution
  ⇒ this distribution has to satisfy some properties, to …

If… the following regression assumptions are satisfied:

1) The linearity assumption: the true relation must be linear
   ⇒ check the Linearity condition
2) The independence assumption: the errors must be independent of each other
   ⇒ check the Randomization condition
3) The equal variance assumption: for all values of x, the errors have the same spread σε
   ⇒ check the Equal spread condition
4) The normality assumption: for all values of x, the errors follow a normal model
   ⇒ check the Nearly normal and Outlier conditions (not so critical when the sample size n is large ⇒ CLT)

Then… the sampling distribution of the slope coefficient b1 looks as follows:

   (b1 − β1) / SE(b1)  ~  t-distribution with n − k − 1 df

- SE(b1), the "standard error of b1", is calculated by SPSS (here: 0.019)
- this sampling distribution acts as the basis for inference!
- check the assumptions with residual analysis (later on)
⇒ Tool 1: test statistic for testing H0: β1 = β1⁰:

   t = (point estimate − hypothesized value) / standard error = (b1 − β1⁰) / SE(b1)

   if H0 is true, t obeys a t-distribution with n − k − 1 df

⇒ Tool 2: 100(1−α)% confidence interval for β1:

   point estimate ± critical value × standard error = b1 ± t(α/2) · SE(b1)      (hardly used in QM3)

- the mechanics are identical to lecture 1 (slides 14–16)

Example 1
Null hypothesis:          H0: β1 = 1
Alternative hypothesis:   HA: β1 < 1      (one-sided)

"in the population, loyalty moves one-for-one with satisfaction" vs. "less than one-for-one"

⇒ calculate the test statistic:   t = (0.815 − 1) / 0.019 = −9.74

- if H0 is true, this is a draw from a t-distribution with n − k − 1 = 365 df
- with so many df, the t-distribution effectively equals the z-distribution ⇒ z = −9.74
- the standard normal distribution is centered around 0
  ⇒ if H0 is true, we expect a test statistic "close to" 0
  ⇒ values "far below" 0 make H0 suspect → reject

Note: only negative values of the test statistic cast doubt on the null!

⇒ is the actual value of −9.74 "far enough" from 0?

Option a) Critical values: z < −zα?
Option b) The P-value: "the lower the P-value, the more evidence against the null!"

- with such a large z-value, we reject H0 (the P-value is extremely small)

Example 2
Null hypothesis:          H0: β1 = 0
Alternative hypothesis:   HA: β1 > 0      (one-sided)

"in the population, loyalty does not depend on satisfaction" vs. "loyalty depends positively on satisfaction"

- a one-sided alternative: never on the basis of the output, always on a priori grounds
- this null is already evaluated by SPSS, but the reported P-value is two-sided
  ⇒ take half of it, provided that the output agrees with the alternative
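A quick sketch of Example 1's calculation with SciPy (the estimates 0.815 and 0.019 are taken from the SPSS output above):

    from scipy import stats

    b1, se_b1, df_resid = 0.815, 0.019, 365
    t = (b1 - 1) / se_b1                      # test H0: beta1 = 1 -> t ≈ -9.74
    p_one_sided = stats.t.cdf(t, df_resid)    # HA: beta1 < 1, so the lower tail
    print(t, p_one_sided)                     # P-value is extremely small -> reject H0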
3) Using dummies to model qualitative explanatory variables

Model 1): loyalty = β0 + β1satisfac + ε

- new element: the qualitative (nominal) variable "gender"
    genderi = 1   person i is a woman
    genderi = 2   person i is a man

- we suspect: maybe women are more loyal than men, even when equally satisfied

⇒ how to include qualitative variables in a regression model?

- we cannot include them directly: their levels are nonnumerical
- the crucial tool: dummy variables, each indicating a single level

    1 if a person has a particular level
    0 if not

- general rule: if a qualitative variable has k levels, then
    - choose an arbitrary base level
    - include dummies for the remaining k − 1 levels

- here: define a "female" dummy, with men as the base level

  SPSS: Transform > Recode into Different Variables

Model 2): loyalty = β0 + β1satisfac + β2female + ε

- multiple regression, a trivial generalization of simple regression

SPSS: Analyze > Regression > Linear
      choose "loyalty" as Dependent, "satisfac" and "female" as Independent

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
2       .921   .848       .847                .66560
Predictors: (Constant), female, satisfac

ANOVA
Model 2       Sum of Squares   df    Mean Square   F         Sig.
Regression    883.730          2     441.865       997.392   .000
Residual      158.158          357   .443
Total         1041.889         359
Predictors: (Constant), female, satisfac
Dependent Variable: loyalty

Coefficients
Model 2       Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)    .794               .113                             7.043    .000
satisfac      .840               .019         .921                44.663   .000
female        -.046              .070         -.013               -.652    .515
Dependent Variable: loyalty
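The same steps in the Python sketch: recoding the 1/2 gender coding into a 0/1 dummy and refitting, continuing with the hypothetical df from the earlier sketch:

    df["female"] = (df["gender"] == 1).astype(int)   # 1 = woman, 0 = man (base level)
    model2 = smf.ols("loyalty ~ satisfac + female", data=df).fit()
    print(model2.params)   # intercept ≈ 0.794, satisfac ≈ 0.840, female ≈ -0.046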

Fitted line: loyâlty = 0.794 + 0.840satisfac − 0.046female

- men:   female = 0 → loyâlty = 0.794 + 0.840satisfac
- women: female = 1 → loyâlty = 0.794 + 0.840satisfac − 0.046 = 0.748 + 0.840satisfac

⇒ graphically: parallel lines for both sexes, but with different intercepts
   [figure: two parallel fitted lines, the men's line above the women's]
- the vertical distance equals the dummy coefficient, (−)0.046
  ⇒ "among women, loyalty is on average 0.046 pts. lower (!) than among men with the same satisfaction"

Note: the intercept difference is not significant (P-value = 0.515)

4) Dummy-interaction models

Fitted Model 2): loyâlty = 0.794 + 0.840satisfac − 0.046female

- different intercepts: "among women, loyalty is on average 0.046 pts. lower than among men with the same satisfaction"
- same slope: "on average, an extra point of satisfaction increases loyalty by 0.840 points for both men and women"

- we suspect: maybe the slope also differs between men and women
   [figure: e.g. lines for men and women with different slopes]
  ⇒ the "satisfac" effect may depend on "female"
  ⇒ add an interaction term to the model:
Model 3): loyalty = β0 + β1satisfac + β2female + β3satisfac*female + ε

SPSS: Transform > Compute Variable: "inter" = "satisfac" * "female"
      Analyze > Regression > Linear
      choose "loyalty" as Dependent, "satisfac", "female" and "inter" as Independent

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
3       .922   .849       .848                .66373
Predictors: (Constant), inter, satisfac, female

ANOVA
Model 3       Sum of Squares   df    Mean Square   F         Sig.
Regression    885.055          3     295.018       669.670   .000
Residual      156.833          356   .441
Total         1041.889         359
Predictors: (Constant), inter, satisfac, female
Dependent Variable: loyalty

Coefficients
Model 3       Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)    .964               .149                             6.469    .000
satisfac      .809               .026         .887                31.013   .000
female        -.399              .216         -.117               -1.852   .065
inter         .065               .038         .116                1.734    .084
Dependent Variable: loyalty

Fitted line: loyâlty = 0.964 + 0.809satisfac − 0.399female + 0.065satisfac*female

- men:   female = 0 → loyâlty = 0.964 + 0.809satisfac
- women: female = 1 → loyâlty = 0.964 + 0.809satisfac − 0.399 + 0.065satisfac
                              = 0.565 + 0.874satisfac

⇒ graphically:
- both sexes have different intercepts and different slopes
- the intercept difference is created by the dummy coefficient
- the slope difference is created by the interaction coefficient
⇒ it's like fitting two separate lines!

Note: both the intercept difference and the slope difference are significant at 10% (the P-values are 0.065 and 0.084 in turn)
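In the Python sketch, the interaction term can be written directly in the formula instead of computing an "inter" variable by hand (same hypothetical df as before):

    model3 = smf.ols("loyalty ~ satisfac + female + satisfac:female", data=df).fit()
    print(model3.params)   # ≈ 0.964, 0.809, -0.399 and 0.065 for the interaction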

5) Measuring the fit of regression models

a) Coefficient of determination:

   R² = explained variation / total variation = (SST − SSE) / SST = 1 − SSE/SST

   - 0 ≤ R² ≤ 1, or 0% ≤ R² ≤ 100%
   - the proportion of variation in y that is explained by the x's
   - when you add explanatory variables, R² always rises because SSE falls
     ⇒ not suitable for model-building purposes
     ⇒ an extra explanatory variable should "pass a hurdle"

b) Adjusted coefficient of determination:

   R²adj = 1 − [SSE/(n − k − 1)] / [SST/(n − 1)]

   - R² "adjusted for degrees of freedom"
   - when you add relevant explanatory variables, it rises → the model improves
   - when you add irrelevant explanatory variables, it falls → the model gets worse

c) The standard deviation of the residuals and the mean square error:

   se = √[SSE/(n − k − 1)]          se² = SSE/(n − k − 1)

   - point estimates for the error standard deviation σε and the error variance σε², based on the spread of the residuals
   - model with the highest R²adj = model with the lowest se² or se
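As a quick numerical check of these formulas, a Python sketch reproducing model 1)'s fit measures from its ANOVA output (SSE = 170.044, SST = 1068.808, n = 367, k = 1):

    import math

    n, k, SSE, SST = 367, 1, 170.044, 1068.808
    R2     = 1 - SSE / SST                              # ≈ 0.841
    R2_adj = 1 - (SSE / (n - k - 1)) / (SST / (n - 1))  # ≈ 0.840
    s_e    = math.sqrt(SSE / (n - k - 1))               # ≈ 0.68255
    print(R2, R2_adj, s_e)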
Here:

Model 1): loyalty = β0 + β1satisfac + ε
Model 2): loyalty = β0 + β1satisfac + β2female + ε
Model 3): loyalty = β0 + β1satisfac + β2female + β3satisfac*female + ε

Model   n     k   SSE       R²      R²adj   se²     se
1)      367   1   170.044   0.841   0.840   0.466   0.68255
2)      360   2   158.158   0.848   0.847   0.443   0.66560
3)      360   3   156.833   0.849   0.848   0.441   0.66373

Note:
- on the basis of R²adj (i.e. se), model 3) seems the best
- BUT: missing values for female ⇒ models 2) and 3) have 7 observations less than model 1)
  strictly speaking, this invalidates the comparison
  ⇒ we should re-estimate model 1) for the same 360 respondents and check again
- don't use the overall F-test for model comparison

BREAK
Regression analysis, Part II)

1) Residual analysis
2) The use of natural logarithms in regression models

B) Basic example: bike theft in the Netherlands

- the 61 largest Dutch cities (>50,000 inhabitants) in 1999
- "inhab"   # of inhabitants
- "educ"    # of institutions of higher education
- "bike"    # of stolen bikes reported to the police

Model A): bikei = β0 + β1inhabi + β2educi + εi      i = 1,…,61

SPSS: Analyze > Regression > Linear
      choose "bike" as Dependent, "inhab" and "educ" as Independent
      under Save, tick Unstandardized Residuals and Unstandardized Predicted Values (!!!)

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
A       .981   .961       .960                1195.28
Predictors: (Constant), educ, inhab
Dependent Variable: bike

Coefficients
Model A       Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)    -1246.849          237.395                          -5.252   .000
inhab         4.793E-02          .002         .945                26.229   .000
educ          360.425            258.303      .050                1.395    .168
Dependent Variable: bike

1) Residual analysis

- regression coefficients have nice properties, provided the four regression assumptions (listed in Part I) are roughly satisfied
- if they are seriously violated, a reformulation is needed (e.g. transformations)
- all these assumptions involve the properties of the error
  ⇒ verify them, using the residuals as observable point estimates

   errors:      εi = bikei − β0 − β1inhabi − β2educi      i = 1,…,61
   residuals:   ei = bikei − b0 − b1inhabi − b2educi      i = 1,…,61

- tools: residual plots (residuals against x and ŷ) → assumptions 1) and 3)
             SPSS: Graphs > Legacy Dialogs > Scatter
         histogram (frequency distribution of the residuals) → assumption 4)
             SPSS: Graphs > Legacy Dialogs > Histogram, tick Display normal curve

- residual analysis also helps to identify outliers (observations well separated from the others)
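A sketch of the same diagnostics in Python, assuming the city data sit in a hypothetical file bikes.csv with columns bike, inhab and educ:

    import matplotlib.pyplot as plt
    import pandas as pd
    import statsmodels.formula.api as smf

    cities = pd.read_csv("bikes.csv")          # hypothetical export of the data
    model_a = smf.ols("bike ~ inhab + educ", data=cities).fit()
    resid, fitted = model_a.resid, model_a.fittedvalues

    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    axes[0].scatter(cities["inhab"], resid)    # residuals vs. inhab -> assumptions 1) and 3)
    axes[1].scatter(fitted, resid)             # residuals vs. predicted values
    axes[2].hist(resid, bins=15)               # histogram -> assumption 4) and outliers
    plt.show()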

[figure: residual plots for Model A): residuals vs. INHAB, residuals vs. EDUC, residuals vs. the unstandardized predicted values, and a histogram of the residuals]

1) The linearity assumption
   weak signs of nonlinearity in the "inhab" and "predicted value" plots
3) The equal variance assumption
   from left to right in the "inhab" and "predicted value" plots, the residuals seem to "fan out" ⇒ nonconstant variance?
4) The normality assumption
   the histogram looks reasonably bell-shaped and symmetric ⇒ OK
Outliers
   all plots, esp. the histogram, show two extreme residuals: Amsterdam: e1 = 5168, The Hague: e3 = −5242

2) The use of natural logarithms in regression models

Model A): slight nonlinearity, nonconstant variance, 2 outliers

Response: use logarithmic transformations

Model B): ln(bikei) = γ0 + γ1ln(inhabi) + γ2educi + εi

   "natural logarithm": ln x = elog x (the log to base e), e ≈ 2.718

- model B) is nonlinear in the variables, but linear in the coefficients → regression is OK

⇒ define the logarithmic variables "lnbike" and "lninhab"

   SPSS: Transform > Compute Variable, function "LN"

⇒ run the regression in terms of these new variables

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
B       .985   .970       .969                .11060
Predictors: (Constant), educ, lninhab
Dependent Variable: lnbike

Coefficients
Model B       Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)    -4.004             .404                             -9.907   .000
lninhab       1.059              .037         .940                29.010   .000
educ          4.675E-02          .025         .062                1.903    .062
Dependent Variable: lnbike
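The corresponding Python sketch, continuing with the hypothetical cities DataFrame (np.log is the natural logarithm, matching SPSS's LN function):

    import numpy as np

    cities["lnbike"]  = np.log(cities["bike"])
    cities["lninhab"] = np.log(cities["inhab"])
    model_b = smf.ols("lnbike ~ lninhab + educ", data=cities).fit()
    print(model_b.params)   # ≈ -4.004, 1.059 (lninhab) and 0.047 (educ)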
[figure: residual plots for Model B): residuals vs. LNINHAB, residuals vs. EDUC, residuals vs. the unstandardized predicted values, and a histogram of the residuals]

1) The linearity assumption
   all three residual plots look like a random scatter ⇒ OK
3) The equal variance assumption
   a horizontal-band appearance in all three residual plots ⇒ OK
4) The normality assumption
   the histogram looks bell-shaped and symmetric ⇒ OK
Outliers
   the two outliers have disappeared "spontaneously"

Conclusion: - model B) looks OK; "educ" is now marginally significant
            - the logarithmic transformations solve the problems in model A):
              the slight nonlinearity, the nonconstant variance and the outliers

How to interpret the coefficients of model B)?

Ex.:   bike = 2000   →   ln(bike) = ln(2000) ≈ 7.60
       bike = 2100   →   ln(bike) = ln(2100) ≈ 7.65

so: bike ↑ by 5%   ⇔   ln(bike) ↑ by ≈ 0.05

general principle: - if ln x goes up by 0.01, x goes up by roughly 1%;
                   - if ln x goes up by 0.05, x goes up by roughly 5%; etc.

- only correct for small changes in x
- only correct for ln x (i.e. elog x), not for 10log x
  in Case 6), "log" really means "ln"
Model B): lnbîke = −4.004 + 1.059lninhab + 0.047educ

- the "educ" coefficient: if educ ↑ by 1, we expect lnbike ↑ by 0.047

  "given population size, one extra institution of higher education raises the number of stolen bikes by 4.7%"

  ⇒ the "log-level" coefficient γ2 can be interpreted as a semi-elasticity

- the "inhab" coefficient: if lninhab ↑ by 1, we expect lnbike ↑ by 1.059
  if lninhab ↑ by 0.01, we expect lnbike ↑ by 0.01 × 1.059 = 0.01059

  "given 'educ', a 1% increase in the number of inhabitants leads to 1.059% more stolen bikes"

  ⇒ the "log-log" coefficient γ1 can be interpreted as an elasticity

Special effect i): rewriting Model B)

rewrite Model B) by taking the antilog on both sides:

   ln(bîke) = −4.004 + 1.059ln(inhab) + 0.047educ
   ⇔ bîke = e^(−4.004) × inhab^1.059 × e^(0.047educ)

cf. the CD (Cobb–Douglas) production function in Case 6)!

Special effect ii): testing for proportionality in Model B)

   ln(bike) = γ0 + γ1ln(inhab) + γ2educ + ε

interesting special case: γ1 = 1 (proportionality)
   1% more inhabitants → 1% more stolen bikes
   50% more inhabitants → 50% more stolen bikes
   twice as many inhabitants → twice as many stolen bikes, etc.

H0: γ1 = 1 vs. HA: γ1 ≠ 1   ⇒   t = (1.059 − 1) / 0.037 = 1.59   ⇒   don't reject at 10%

⇒ if γ1 = 1, we can rewrite Model B) as:

   ln(bike) = γ0 + ln(inhab) + γ2educ + ε
   ⇔ ln(bike) − ln(inhab) = γ0 + γ2educ + ε
   ⇔ ln(bike/inhab) = γ0 + γ2educ + ε

⇒ the response variable is now the ratio of stolen bikes per head

cf. item f) of Case 6)!
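A sketch of the proportionality test above with SciPy (γ1-hat = 1.059 and SE = 0.037 from the output; df = n − k − 1 = 61 − 2 − 1 = 58):

    from scipy import stats

    t = (1.059 - 1) / 0.037                          # ≈ 1.59
    p_two_sided = 2 * (1 - stats.t.cdf(abs(t), 58))
    print(t, p_two_sided)                            # p > 0.10 -> don't reject gamma1 = 1 at 10%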
C) Basic example: a (small) part of Case 5)

- Duographic BV produces custom-printed T-shirts
- T = 48 (weekly) or T = 12 (monthly) observations on six distinct outputs and seven cost drivers
- we only consider the cost driver "sales orders" here

Model 1): SMt = β1clr1t + β2clr2t + β3clr3t + β4clr4t + β5clr5t + β6clr6t + εt      t = 1,…,48

- SM:   # sales orders per week
- clr1: # 1-colour T-shirts produced per week, etc.
- no intercept (see the start of the case for the motivation)
  ⇒ the reported R², R²adj and overall F-statistic are misleading (SPSS redefines SST)
  ⇒ use se for model comparison
  ⇒ t-tests and partial F-tests are still valid

SPSS: Analyze > Regression > Linear; choose "SM" as Dependent, "clr1" – "clr6" as Independent
      under Options, deselect "Include constant in equation"

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .987   .973       .969                10.424

ANOVA
Model 1       Sum of Squares   df   Mean Square   F         Sig.
Regression    165934.357       6    27655.726     254.520   .000
Residual      4563.643         42   108.658
Total         170498.000       48
Dependent Variable: sm

Coefficients
Model 1   Unstandardized B   Std. Error   Standardized Beta   t       Sig.
clr1      .051               .005         .662                9.342   .000
clr2      .039               .014         .150                2.856   .007
clr3      .032               .017         .081                1.910   .063
clr4      .030               .032         .037                .924    .361
clr5      .040               .048         .039                .834    .409
clr6      .051               .022         .111                2.309   .026
Dependent Variable: sm
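A Python sketch of this no-intercept fit, assuming a hypothetical file duographic.csv with columns sm and clr1 … clr6 ("- 1" in the formula drops the constant):

    import pandas as pd
    import statsmodels.formula.api as smf

    orders = pd.read_csv("duographic.csv")   # hypothetical export of the weekly data
    model1 = smf.ols("sm ~ clr1 + clr2 + clr3 + clr4 + clr5 + clr6 - 1", data=orders).fit()
    print(model1.params)                     # the six cost driver coefficients
    print(model1.ssr)                        # SSE ≈ 4563.643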
Regression analysis, Part III)

1) Collinearity
2) Time series data: checking for autocorrelation
3) Inference about several coefficients: the partial F-test

1) Collinearity

- if explanatory variables are strongly linearly related
  ⇒ they measure almost the same thing ("overlap")
  ⇒ their individual effects are unstable / estimated imprecisely
  ⇒ high standard errors of their coefficients, low t-ratios
  ⇒ the variables seem irrelevant, even when they are relevant

Seems no problem here (high t-ratios!), but check nonetheless.

Two detection tools discussed in QM2:

i) Inspect the correlations among the explanatory variables
   determine the correlation matrix: potential problems if there are correlations > 0.9 (not here)

ii) Determine the Variance Inflation Factors of all explanatory variables:

   VIFj = 1 / (1 − Rj²)      j = 1,…,k

   - Rj²: the R² of a regression with Xj as response against all other explanatory variables
   - high VIFj (say: 10 or higher) ⇒ high Rj², so Xj must be very similar to the other explanatory variables ⇒ high collinearity
   - the reported VIF values are misleading in models without an intercept
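A sketch of the VIF computation with statsmodels, shown for a model with a constant added (as noted above, reported VIFs are misleading in no-intercept models), continuing with the hypothetical orders DataFrame:

    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    X = sm.add_constant(orders[["clr1", "clr2", "clr3", "clr4", "clr5", "clr6"]])
    vifs = {col: variance_inflation_factor(X.values, i)
            for i, col in enumerate(X.columns) if col != "const"}
    print(vifs)   # values of 10 or higher signal high collinearity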
2) Time series data: checking for autocorrelation

Regression assumption 2), the independence assumption (see Part I), is very relevant for time series data: the errors must be independent of each other ⇒ check the Randomization condition.

Time series data often suffer from (first-order) autocorrelation:
- each error is related to (i.e. dependent on) the previous error
- formally: εt = φεt−1 + at, with at an error without autocorrelation
- φ > 0: positive autocorrelation
    if an error is positive, the next one tends to be positive also
    if an error is negative, the next one tends to be negative also
  ⇒ successive errors tend to resemble each other
- φ < 0: negative autocorrelation
    if an error is positive, the next one tends to be negative
    if an error is negative, the next one tends to be positive
  ⇒ successive errors tend to mirror each other
- if there is autocorrelation, the usual t- and F-tests are no longer valid
  ⇒ try to detect it!
⇒ again take the residuals as point estimates for the errors
A graphical detection tool: plot the residuals against their own lagged values

SPSS: Transform > Compute Variable, function "LAG" → "lagres"
      Graphs > Legacy Dialogs > Scatter/Dot

Model 1): ⇒ no problem (maybe slightly negative?)

Counterexample (reported for contrast):

   mortgaget = δ0 + δ1interestt + εt      t = 1,…,36 (quarterly)

   [figure: scatterplot of the unstandardized residuals against LAGRES, with a clear upward pattern]

   ⇒ positive autocorrelation!
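A sketch of the same lag plot in Python, assuming resid holds the time-ordered residuals as a pandas Series (e.g. resid = model1.resid from the sketch above):

    import matplotlib.pyplot as plt

    lagres = resid.shift(1)                        # previous period's residual
    plt.scatter(lagres, resid)
    plt.xlabel("LAGRES"); plt.ylabel("residual")
    plt.show()   # an upward pattern would signal positive autocorrelation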
3) Inference about several coefficients: the partial F-test

Model 1): SMt = β1clr1t + β2clr2t + β3clr3t + β4clr4t + β5clr5t + β6clr6t + εt

Basic idea: - test a reduced model against the complete model 1)
            - the reduced model is a special case of the complete model
            - fit them both, compare their SSEs with an F-statistic

QM2: the standard case, e.g. H0: β4 = 0, β5 = 0
⇒ reduced model: (plug H0 into the complete model)

   SMt = β1clr1t + β2clr2t + β3clr3t + β6clr6t + εt

QM3: more subtle cases, e.g. H0: β2 = β1, β3 = β1, β4 = β1, β5 = β1, β6 = β1
     "all cost driver factors are the same"
     looks very different, but still implies a special case!
⇒ reduced model 2): (plug H0 into the complete model)

   SMt = β1clr1t + β1clr2t + β1clr3t + β1clr4t + β1clr5t + β1clr6t + εt
       = β1(clr1t + clr2t + … + clr6t) + εt
       = β1shirtst + εt

with shirtst = clr1t + clr2t + … + clr6t

"for this cost driver, only total output is important"

SPSS: Transform > Compute Variable, define "shirts"
      Analyze > Regression > Linear; choose "SM" as Dependent, "shirts" as Independent
      under Options, deselect "Include constant in equation"

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
2       .986   .972       .972                10.013

ANOVA
Model 2       Sum of Squares   df   Mean Square   F          Sig.
Regression    165785.673       1    165785.673    1653.520   .000
Residual      4712.327         47   100.262
Total         170498.000       48
Dependent Variable: sm

Coefficients
Model 2   Unstandardized B   Std. Error   Standardized Beta   t        Sig.
shirts    .046               .001         .986                40.663   .000
Dependent Variable: sm

Intuition: compare the SSEs of models 1) and 2)

Model 1): "complete model"   SSEc = 4563.643   k = 6 explanatory variables (output above)
Model 2): "reduced model"    SSEr = 4712.327   g = 1 explanatory variable (output above)

- always SSEr > SSEc (fewer explanatory variables)
- if H0 is true, we expect SSEr − SSEc to be relatively small
- if SSEr − SSEc is relatively large, H0 is suspect ⇒ reject

Formalization: the partial F-test

   F = [(SSEr − SSEc)/(k − g)] / [SSEc/(n − k)]
     = [(4712.327 − 4563.643)/(6 − 1)] / [4563.643/(48 − 6)] = 0.274

- if H0 is true, the F-statistic has an F-distribution with k − g = 5 numerator df and n − k = 42 ≈ 40 denominator df
- note: n − k − 1 denominator df in models with an intercept; n − k without
  general rule: df = # observations − # estimated coefficients

⇒ is the actual value of 0.274 "far enough" above 0?
Option a) Critical value: F > Fα?
   e.g. α = 5%   ⇒ F0.05 = 2.45 ⇒ don't reject H0
        α = 10%  ⇒ F0.10 = 2.00 ⇒ don't reject H0

Option b) The P-value

   P-value > 10% (exact value: 92.5%)

   "if the null were true, there would be much more than a 10% chance to obtain an F-statistic ≥ 0.274"

   ⇒ don't reject H0

Conclusion: the simpler model 2) may very well be adequate

Note:
- in model 2), b1 = 0.046, similar to the individual coefficients in model 1)
- two even trickier null hypotheses follow later in Case 5)
- don't mix things up with the overall F-test (ANOVA table):

  H0: β1 = 0, β2 = 0, β3 = 0, β4 = 0, β5 = 0, β6 = 0      "all cost driver factors are zero"
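A sketch of the partial F-test computed by hand with SciPy, using the SSEs reported above (no-intercept models, so the denominator df is n − k):

    from scipy import stats

    SSE_c, SSE_r = 4563.643, 4712.327
    n, k, g = 48, 6, 1
    F = ((SSE_r - SSE_c) / (k - g)) / (SSE_c / (n - k))   # ≈ 0.274
    p = 1 - stats.f.cdf(F, k - g, n - k)                  # ≈ 0.925 -> don't reject H0
    print(F, p)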
