QM3 Lecture 2 With Notes
Quantitative Methods 3 (EBS2001), 2016/2017, Lecture 2

Today
- Preparing Cases 3, 4, 5 & 6
- Sharpe Chs. 4, 15–18 (relevant parts)
- Some additional whistles and bells

Questionnaire excerpt:

gender    What is your gender?    1 Female    2 Male

Please rate the statements below on a 1 to 7 scale; 1 means "totally disagree" and 7 "totally agree".

satisfac  Overall, I am very satisfied with XYZ                          1 2 3 4 5 6 7
loyal1    For catering services at a party, XYZ will be my first choice  1 2 3 4 5 6 7
loyal2    In the future, I will make use of XYZ's services more often    1 2 3 4 5 6 7
loyal3    In the future, I will make use of XYZ's services less often    1 2 3 4 5 6 7
A) Basic example: a (small) part of Case 3)
- a survey among a sample of XYZ customers, fall 2003
- n = 471 respondents, many variables
- we consider the subset of 5 variables (i.e. questions) shown above

1) The items should not contain "too many" missing values
   here: OK for all three

2) To get a reliable construct, the items entering it should behave "similarly"
i) Informal analysis: a correlation matrix

Correlations
                                 loyal1    loyal2    loyal3
loyal1  Pearson Correlation      1.000     .843**    -.708**
        Sig. (2-tailed)          .         .000      .000
        N                        394       387       374
loyal2  Pearson Correlation      .843**    1.000     -.780**
        Sig. (2-tailed)          .000      .         .000
        N                        387       450       426
loyal3  Pearson Correlation      -.708**   -.780**   1.000
        Sig. (2-tailed)          .000      .000      .
        N                        374       426       429
** Correlation is significant at the 0.01 level (2-tailed).

Note: - correlation measures the relation between two items
      - the correlations are quite high, as it should be
      - "loyal3" is negatively correlated with the others
        ⇒ reverse scale problem
        ⇒ Transform > Compute Variable: "loyal3R" = 8 – "loyal3"

ii) Formal analysis: Cronbach's alpha

SPSS: Analyze > Scale > Reliability Analysis; choose loyal1, loyal2 and loyal3R as Items

Case Processing Summary
                     N     %
Cases  Valid         371   78.8
       Excluded(a)   100   21.2
       Total         471   100.0
a  Listwise deletion based on all variables in the procedure.

Reliability Statistics
Cronbach's Alpha   N of Items
.916               3

Note: - alpha measures the agreement between all items
      - 0 ≤ alpha ≤ 1;  0: items are totally unrelated, 1: items overlap fully
      - rule of thumb: reliable construct when alpha ≥ 0.75

Conclusion: use Transform > Compute Variable to define the construct
            "loyalty" = ( "loyal1" + "loyal2" + "loyal3R" ) / 3
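For readers working outside SPSS, a minimal Python sketch of the same reliability checks; the data are simulated, and the variable names simply follow the questionnaire above.

    import numpy as np
    import pandas as pd

    # Simulated 1-7 responses (hypothetical data, names from the questionnaire)
    rng = np.random.default_rng(0)
    base = rng.integers(1, 8, size=200)
    df = pd.DataFrame({
        "loyal1": np.clip(base + rng.integers(-1, 2, 200), 1, 7),
        "loyal2": np.clip(base + rng.integers(-1, 2, 200), 1, 7),
        "loyal3": np.clip(8 - base + rng.integers(-1, 2, 200), 1, 7),  # reverse-scaled item
    })

    # i) Informal analysis: the correlation matrix
    print(df.corr())

    # Reverse scale problem: loyal3R = 8 - loyal3
    df["loyal3R"] = 8 - df["loyal3"]

    # ii) Formal analysis: Cronbach's alpha
    def cronbach_alpha(items: pd.DataFrame) -> float:
        """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
        items = items.dropna()  # listwise deletion, as in SPSS
        k = items.shape[1]
        return k / (k - 1) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

    print(cronbach_alpha(df[["loyal1", "loyal2", "loyal3R"]]))  # reliable if >= 0.75

    # Conclusion: define the construct as the item mean
    df["loyalty"] = df[["loyal1", "loyal2", "loyal3R"]].mean(axis=1)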
Regression analysis, Part I)
1) Regression: the basic idea
2) Inference about individual coefficients: the t-test
3) Using dummies to model qualitative explanatory variables
4) Dummy-interaction models
5) Measuring the fit of regression models

1) Regression: the basic idea

SPSS: Graphs > Legacy Dialogs > Scatter/Dot
      select Simple Scatter, click Define
      choose "loyalty" as Y, "satisfac" as X

⇒ there seems to be a positive relation, but not one-to-one
   many other factors relevant ⇒ scatter around a line

⇒ simple linear regression model:

   y = β0 + β1x + ε    (multiple regression model has k x's)

⇒ we want to estimate the regression coefficients β0 and β1
⇒ fit a line through the sample points

   ŷi = b0 + b1xi    i = 1,…,n

   hoping that the estimates b0 and b1 are close to β0 and β1

- intuition: choose estimates such that the vertical distances

   ei = yi – ŷi    ("residuals")

  become as small as possible

- formalization: choose the estimates such that the sum of squared vertical distances, SSE, becomes as small as possible

⇒ Minimize over b0, b1:   SSE = Σ ei² = Σ (yi – ŷi)² = Σ (yi – b0 – b1xi)²    (sums running over i = 1,…,n)

⇒ computers can solve such a minimization problem easily

⇒ least squares regression line or prediction equation:

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .917   .841       .840                .68255
Predictors: (Constant), satisfac

ANOVA
Model           Sum of Squares   df    Mean Square   F          Sig.
1   Regression  898.763          1     898.763       1929.193   .000
    Residual    170.044          365   .466
    Total       1068.808         366
Predictors: (Constant), satisfac
Dependent Variable: loyalty

Coefficients
                 Unstandardized Coefficients   Standardized Coefficients
Model            B      Std. Error             Beta                        t        Sig.
1   (Constant)   .918   .106                                               8.671    .000
    satisfac     .815   .019                   .917                        43.923   .000
Dependent Variable: loyalty
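As an aside, a minimal Python sketch of the same least squares fit, on simulated data (all numbers hypothetical; only the method mirrors the slide):

    import numpy as np
    import statsmodels.api as sm

    # Simulate y = beta0 + beta1*x + eps with values near the slide's estimates
    rng = np.random.default_rng(1)
    satisfac = rng.integers(1, 8, size=400).astype(float)
    loyalty = 0.9 + 0.8 * satisfac + rng.normal(0, 0.7, size=400)

    X = sm.add_constant(satisfac)                # intercept column plus satisfac
    fit = sm.OLS(loyalty, X).fit()               # minimizes SSE = sum of squared residuals
    print(fit.params)                            # b0, b1 (slide: 0.918 and 0.815)
    print(fit.rsquared, np.sum(fit.resid ** 2))  # R Square and the minimized SSE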
2) Inference about individual coefficients: the t-test

- LS regression line: loyâlty = 0.918 + 0.815satisfac

"what does this line, fitted on the basis of a sample, tell us about the relationship between 'satisfac' and 'loyalty' in the population?"

⇒ inferential statistics!

- any other sample would give us different estimates b0 and b1
⇒ we need the sampling distribution of these estimates!

- decisive feature: the properties of the error term ε in

   y = β0 + β1x + ε

- the error ε reflects the effect on y of all factors other than x
⇒ for any value of x, infinitely many distinct ε-values can occur
⇒ we can describe the behaviour of the error with a probability distribution
⇒ this distribution has to satisfy some properties, to …

If… the following regression assumptions are satisfied:

1) The linearity assumption: the true relation must be linear
   ⇒ check the Linearity condition
2) The independence assumption: the errors must be independent of each other
   ⇒ check the Randomization condition
3) The equal variance assumption: for all values of x, the errors have the same spread σε
   ⇒ check the Equal spread condition
4) The normality assumption: for all values of x, the errors follow a normal model
   ⇒ check the Nearly Normal (not so critical when the sample size n is large ⇒ CLT) and Outlier conditions

Then… the sampling distribution of the slope coefficient b1 looks as follows:

   ( b1 – β1 ) / SE(b1)  ~  t-distribution with n – k – 1 df

- SE(b1), the "standard error of b1", is calculated by SPSS (here: 0.019)
- this sampling distribution acts as the basis for inference!
- check the assumptions with residual analysis (later on)
⇒ Tool 1: Test statistic for testing H0: β1 = β1,0:

   t = ( b1 – β1,0 ) / SE(b1)

   - decision rules as in QM2, where for a standard normal test statistic (centered around 0):
     Option a) Critical values: e.g. reject when z < –zα
     Option b) The P-value: "the lower the P-value, the more evidence against the null!"

⇒ Tool 2: 100(1–α)% Confidence Interval for β1:

   point estimate ± critical value × standard error = [ b1 ± tα/2 × SE(b1) ]    (hardly used in QM3)
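A minimal sketch of both tools, plugging in the slide's numbers (b1 = 0.815, SE(b1) = 0.019, n – k – 1 = 365 df); scipy is assumed available:

    from scipy import stats

    b1, se_b1, df_resid = 0.815, 0.019, 365

    # Tool 1: test H0: beta1 = 0
    t_stat = (b1 - 0) / se_b1
    p_value = 2 * stats.t.sf(abs(t_stat), df_resid)  # two-tailed P-value

    # Tool 2: 95% confidence interval [b1 +/- t(alpha/2) * SE(b1)]
    ci = stats.t.interval(0.95, df_resid, loc=b1, scale=se_b1)

    print(t_stat, p_value, ci)  # t ~ 42.9 from the rounded inputs (SPSS: 43.923)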
3) Using dummies to model qualitative explanatory variables

Model 1): loyalty = β0 + β1satisfac + ε

- new element: qualitative (nominal) variable "gender"
   genderi = 1   person i is a woman
   genderi = 2   person i is a man

- we suspect: maybe women are more loyal than men, even when equally satisfied

⇒ how to include qualitative variables in a regression model?

- cannot include them directly: their levels are nonnumerical

- crucial tool: dummy variables, each indicating a single level
   1 if a person has a particular level
   0 if not

- general rule: if a qualitative variable has k levels, then
   - choose an arbitrary base level
   - include dummies for the remaining k–1 levels

- here: define a "female"-dummy, men as base level
  SPSS: Transform > Recode into Different Variables

Model 2): loyalty = β0 + β1satisfac + β2female + ε

- multiple regression, trivial generalization of simple regression

SPSS: Analyze > Regression > Linear
      choose "loyalty" as Dependent, "satisfac" and "female" as Independent

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
2       .921   .848       .847                .66560
Predictors: (Constant), female, satisfac

ANOVA
Model           Sum of Squares   df    Mean Square   F         Sig.
2   Regression  883.730          2     441.865       997.392   .000
    Residual    158.158          357   .443
    Total       1041.889         359
Predictors: (Constant), female, satisfac
Dependent Variable: loyalty

Coefficients
                 Unstandardized Coefficients   Standardized Coefficients
Model            B       Std. Error            Beta                        t        Sig.
2   (Constant)   .794    .113                                              7.043    .000
    satisfac     .840    .019                  .921                        44.663   .000
    female       -.046   .070                  -.013                       -.652    .515
Dependent Variable: loyalty
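A minimal sketch of the dummy recode and Model 2 on simulated data (hypothetical values; "gender" coded 1 = female, 2 = male as in the questionnaire):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(2)
    n = 360
    d = pd.DataFrame({"satisfac": rng.integers(1, 8, n).astype(float),
                      "gender": rng.integers(1, 3, n)})
    d["female"] = (d["gender"] == 1).astype(int)  # recode: men (gender = 2) are the base level
    d["loyalty"] = 0.8 + 0.84 * d["satisfac"] - 0.05 * d["female"] + rng.normal(0, 0.66, n)

    fit = smf.ols("loyalty ~ satisfac + female", data=d).fit()
    print(fit.summary())  # compare with the B, Std. Error, t and Sig. columns above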
Fitted line: loyâlty = 0.794 + 0.840satisfac – 0.046female

- men:    female = 0 → loyâlty = 0.794 + 0.840satisfac
- women:  female = 1 → loyâlty = 0.794 + 0.840satisfac – 0.046 = 0.748 + 0.840satisfac

⇒ graphically: parallel lines for both sexes, but with different intercepts
   - the vertical distance equals the dummy coefficient, (–)0.046
   [schematic: two parallel fitted lines, "men" slightly above "women"]

- different intercept: "among women, loyalty is on average 0.046 pts. lower (!) than among men with the same satisfaction"
- same slope: "on average, an extra point of satisfaction increases loyalty by 0.840 points for both men and women"

Note: the intercept difference is not significant (P-value = 0.515)

4) Dummy-interaction models

Fitted Model 2): loyâlty = 0.794 + 0.840satisfac – 0.046female

- we suspect: maybe different slope for men and women
   [schematic: "men" and "women" lines with different slopes]
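The interaction model itself is only sketched graphically above; a minimal Python version (hypothetical data, continuing the previous sketch's names) adds a product term so the slope may differ by sex:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(2)
    n = 360
    d = pd.DataFrame({"satisfac": rng.integers(1, 8, n).astype(float),
                      "female": rng.integers(0, 2, n)})
    d["loyalty"] = 0.8 + (0.84 - 0.10 * d["female"]) * d["satisfac"] + rng.normal(0, 0.66, n)

    # satisfac:female is the product satisfac*female, giving women their own slope:
    # loyalty = b0 + b1*satisfac + b2*female + b3*(satisfac*female)
    fit = smf.ols("loyalty ~ satisfac + female + satisfac:female", data=d).fit()
    print(fit.params)  # b3 estimates the slope difference between women and men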
Model A): bikei = β0 + β1inhabi + β2educi + εi    i = 1,…,61

SPSS: Analyze > Regression > Linear
      choose "bike" as Dependent, "inhab" and "educ" as Independent
      under Save, tick Unstandardized Residuals and Unstandardized Predicted Values (!!!)

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
A       .981   .961       .960                1195.28
Predictors: (Constant), educ, inhab
Dependent Variable: bike

Coefficients
                 Unstandardized Coefficients   Standardized Coefficients
Model            B           Std. Error        Beta                        t        Sig.
A   (Constant)   -1246.849   237.395                                       -5.252   .000
    inhab        4.793E-02   .002              .945                        26.229   .000
    educ         360.425     258.303           .050                        1.395    .168
Dependent Variable: bike

- regression coefficients have nice properties, provided the four regression assumptions (slide 10) are roughly satisfied
- if seriously violated, a reformulation is needed (e.g. transformations)
- all these assumptions involve the properties of the error
⇒ verify, using the residuals as observable point estimates:

   errors:     εi = bikei – β0 – β1inhabi – β2educi    i = 1,…,61
   residuals:  ei = bikei – b0 – b1inhabi – b2educi    i = 1,…,61

- tools: residual plots (residuals against x and ŷ) → assumptions 1) and 3)
         SPSS: Graphs > Legacy Dialogs > Scatter
         histogram of the residuals → assumption 4)
         SPSS: Graphs > Legacy Dialogs > Histogram, tick Display normal curve

- residual analysis also helps to identify outliers (observations well-separated from the others)
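A minimal sketch of these diagnostics in Python; the data are simulated with multiplicative noise so that the level regression shows the fanning-out pattern discussed next (all names hypothetical):

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(3)
    towns = pd.DataFrame({"inhab": rng.uniform(3e4, 8e5, 61),
                          "educ": rng.uniform(0.0, 1.0, 61)})
    towns["bike"] = 0.02 * towns["inhab"] ** 1.05 * np.exp(rng.normal(0, 0.15, 61))

    fit = smf.ols("bike ~ inhab + educ", data=towns).fit()  # residuals = observed - fitted

    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    axes[0].scatter(towns["inhab"], fit.resid)    # residuals vs x: assumptions 1) and 3)
    axes[1].scatter(fit.fittedvalues, fit.resid)  # residuals vs y-hat: same checks
    axes[2].hist(fit.resid, bins=15)              # histogram: assumption 4) and outliers
    plt.show()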
[Figure: three residual plots for Model A — residuals against the explanatory variables and against the predicted values]

1) The linearity assumption
   weak signs of nonlinearity in the "inhab" and "predicted value" plots

3) The equal variance assumption
   from left to right in the "inhab" and "predicted value" plots, the residuals seem to "fan out" ⇒ nonconstant variance?

[Figure: histogram of the unstandardized residuals, roughly bell-shaped, running from about –5000 to 5000]

4) The normality assumption
   the histogram looks reasonably bell-shaped and symmetric ⇒ OK

Outliers
   all plots, esp. the histogram, show two extreme residuals: Amsterdam: e1 = 5168, The Hague: e3 = –5242

2) The use of natural logarithms in regression models

Model A): slight nonlinearity, nonconstant variance, 2 outliers

Model B): ln(bikei) = γ0 + γ1ln(inhabi) + γ2educi + εi
   "natural logarithms": ln x = ᵉlog x (log to base e), e ≈ 2.718

⇒ run the regression in terms of these new variables

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
B       .985   .970       .969                .11060
Predictors: (Constant), educ, lninhab
Dependent Variable: lnbike

Coefficients
                 Unstandardized Coefficients   Standardized Coefficients
Model            B           Std. Error        Beta                        t        Sig.
B   (Constant)   -4.004      .404                                          -9.907   .000
    lninhab      1.059       .037              .940                        29.010   .000
    educ         4.675E-02   .025              .062                        1.903    .062
Dependent Variable: lnbike
[Figure: the same residual plots for Model B — no clear patterns remain]

Conclusion: - model B) looks OK; "educ" is now marginally significant
            - the logarithmic transformations solve the problems in model A):
               - slight nonlinearity
               - nonconstant variance
               - outliers
Fitted Model B): lnbîke = –4.004 + 1.059lninhab + 0.047educ

- rewrite Model B) by taking the antilog on both sides:

   ln(bîke) = –4.004 + 1.059ln(inhab) + 0.047educ  ⇔  bîke = e^(–4.004) · inhab^1.059 · e^(0.047educ)

cf. the CD production function in Case 6)!

Special effect ii): testing for proportionality in Model B)

- plugging the restriction γ1 = 1 into Model B) gives

   ln( bike / inhab ) = γ0 + γ2educ + ε

⇒ response variable is the ratio of stolen bikes per head

cf. item f) of Case 6)!
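A minimal sketch of Model B and a test of the proportionality restriction γ1 = 1, on the same kind of simulated data as the residual-analysis sketch (hypothetical numbers, so the outcome need not match the slides):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(3)
    towns = pd.DataFrame({"inhab": rng.uniform(3e4, 8e5, 61),
                          "educ": rng.uniform(0.0, 1.0, 61)})
    towns["bike"] = 0.02 * towns["inhab"] ** 1.05 * np.exp(rng.normal(0, 0.15, 61))

    towns["lnbike"] = np.log(towns["bike"])
    towns["lninhab"] = np.log(towns["inhab"])
    fit_b = smf.ols("lnbike ~ lninhab + educ", data=towns).fit()

    # t-test of H0: gamma1 = 1 (proportionality): t = (g1 - 1) / SE(g1)
    t_prop = (fit_b.params["lninhab"] - 1) / fit_b.bse["lninhab"]
    print(fit_b.params["lninhab"], t_prop)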
C) Basic example: a (small) part of Case 5)

- Duographic BV produces custom-printed T-shirts
- T = 48 (weekly) or T = 12 (monthly) observations on six distinct outputs and seven cost drivers
- we only consider the cost driver "sales orders" here

Model 1): SMt = β1clr1t + β2clr2t + β3clr3t + β4clr4t + β5clr5t + β6clr6t + εt    t = 1,…,48

- SM: # sales orders per week
- clr1: # 1-colour T-shirts produced per week, etc.
- no intercept (see start case for motivation)
⇒ reported R², R²adj and overall F-statistic are misleading (SPSS redefines SST)
⇒ use se for model comparison
⇒ t-tests and partial F-tests are still valid

SPSS: Analyze > Regression > Linear; choose "SM" as Dependent, "clr1" – "clr6" as Independent
      under Options, deselect "Include constant in equation"

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .987   .973       .969                10.424

ANOVA
Model           Sum of Squares   df   Mean Square   F         Sig.
1   Regression  165934.357       6    27655.726     254.520   .000
    Residual    4563.643         42   108.658
    Total       170498.000       48
Dependent Variable: sm

Coefficients
           Unstandardized Coefficients   Standardized Coefficients
Model      B      Std. Error             Beta                        t       Sig.
1   clr1   .051   .005                   .662                        9.342   .000
    clr2   .039   .014                   .150                        2.856   .007
    clr3   .032   .017                   .081                        1.910   .063
    clr4   .030   .032                   .037                        .924    .361
    clr5   .040   .048                   .039                        .834    .409
    clr6   .051   .022                   .111                        2.309   .026
Dependent Variable: sm

Regression analysis, Part III)
1) Collinearity
2) Time series data: checking for autocorrelation
3) Inference about several coefficients: the partial F-test
1) Collinearity

- if explanatory variables are strongly linearly related
⇒ they measure almost the same thing ("overlap")
⇒ their individual effects are unstable / estimated imprecisely
⇒ high standard errors of their coefficients, low t ratios
⇒ the variables seem irrelevant, even when they are relevant

Seems no problem here (high t ratios!), but check nonetheless. Two detection tools discussed in QM2:

i) Inspect the correlations among the explanatory variables
   determine the correlation matrix: potential problems if there are correlations > 0.9 (not here)

ii) Determine Variance Inflation Factors of all explanatory variables (a sketch follows below)

   VIFj = 1 / (1 – Rj²)    j = 1,…,k

   - Rj²: the R² of a regression with Xj as response against all other explanatory variables
   - high VIFj (say: 10 or higher) ⇒ high Rj², so Xj must be very similar to the other explanatory variables ⇒ high collinearity
   - the reported VIF-values are misleading in models without intercept
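A minimal sketch of the VIF computation (simulated data; statsmodels' variance_inflation_factor implements exactly VIFj = 1 / (1 – Rj²)):

    import numpy as np
    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(4)
    X = pd.DataFrame({"x1": rng.normal(size=100)})
    X["x2"] = 0.95 * X["x1"] + rng.normal(scale=0.3, size=100)  # strongly related to x1
    X["x3"] = rng.normal(size=100)                              # unrelated

    Xc = np.column_stack([np.ones(100), X])  # add an intercept: reported VIFs are
    for j, name in enumerate(X.columns, 1):  # misleading in models without one
        print(name, variance_inflation_factor(Xc, j))  # x1, x2 high; x3 near 1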
2) Time series data: checking for autocorrelation

Regression assumption 2), slide 10, is very relevant for time series data:

   The independence assumption: the errors must be independent of each other
   ⇒ check the Randomization condition

Time series data often suffer from (first-order) autocorrelation:
- each error is related to (i.e. dependent on) the previous error
- formally: εt = φεt–1 + at,  with at an error without autocorrelation
- φ > 0: positive autocorrelation
   - if an error is positive, the next one tends to be positive also
   - if an error is negative, the next one tends to be negative also
   ⇒ successive errors tend to resemble each other
- φ < 0: negative autocorrelation
   - if an error is positive, the next one tends to be negative
   - if an error is negative, the next one tends to be positive
   ⇒ successive errors tend to mirror each other

- if there is autocorrelation, the usual t- and F-tests are no longer valid
⇒ try to detect it!
⇒ again take the residuals as point estimates for the errors
A graphical detection tool: plot the residuals against their own lagged values
SPSS: Transform > Compute Variable, function "LAG" → "lagres"
      Graphs > Legacy Dialogs > Scatter/Dot

Model 1): [scatter of the residuals against LAGRES: no clear pattern]
⇒ no problem (maybe slightly negative?)

Counterexample (reported for contrast):

   mortgaget = δ0 + δ1interestt + εt    t = 1,…,36 (quarterly)

   [scatter of the residuals against LAGRES: clearly upward-sloping cloud, roughly –250 to 250 on both axes]

⇒ positive autocorrelation!  (a simulated lag plot follows below)
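A minimal sketch of such a lag plot, with residuals simulated under φ = 0.7 so that the cloud slopes upward as in the counterexample:

    import matplotlib.pyplot as plt
    import numpy as np

    rng = np.random.default_rng(5)
    e = np.zeros(48)
    for t in range(1, 48):
        e[t] = 0.7 * e[t - 1] + rng.normal()  # eps_t = phi*eps_{t-1} + a_t, phi > 0

    plt.scatter(e[:-1], e[1:])  # residuals against their own lagged values
    plt.xlabel("lagged residual")
    plt.ylabel("residual")
    plt.show()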
3) Inference about several coefficients: the partial F-test

Model 1): SMt = β1clr1t + β2clr2t + β3clr3t + β4clr4t + β5clr5t + β6clr6t + εt

Basic idea: - test a reduced model against the complete model 1)
            - the reduced model is a special case of the complete model
            - fit them both, compare their SSE with an F-statistic

QM2: the standard case, e.g. H0: β4 = 0, β5 = 0
⇒ reduced model: plug H0 into the complete model (here: drop clr4 and clr5)

QM3: more subtle cases, e.g. H0: β2 = β1, β3 = β1, β4 = β1, β5 = β1, β6 = β1
     "all cost driver factors are the same"
     looks very different, but still implies a special case!

⇒ reduced model 2): (plug H0 into the complete model)

   SMt = β1clr1t + β1clr2t + β1clr3t + β1clr4t + β1clr5t + β1clr6t + εt
       = β1( clr1t + clr2t + … + clr6t ) + εt
       = β1shirtst + εt
with shirtst = clr1t + clr2t + … + clr6t
"for this cost driver, only total output is important"

SPSS: Transform > Compute Variable, define "shirts"
      Analyze > Regression > Linear; choose "SM" as Dependent, "shirts" as Independent
      under Options, deselect "Include constant in equation"

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
2       .986   .972       .972                10.013

ANOVA
Model           Sum of Squares   df   Mean Square   F          Sig.
2   Regression  165785.673       1    165785.673    1653.520   .000
    Residual    4712.327         47   100.262
    Total       170498.000       48
Dependent Variable: sm

Coefficients
             Unstandardized Coefficients   Standardized Coefficients
Model        B      Std. Error             Beta                        t        Sig.
2   shirts   .046   .001                   .986                        40.663   .000
Dependent Variable: sm

Intuition: compare the SSE's of both models 1) and 2)

Model 1): "complete model"   SSEc = 4563.643   k = 6 explanatory variables (slide 30)
Model 2): "reduced model"    SSEr = 4712.327   g = 1 explanatory variable (slide 35)

- always SSEr > SSEc (fewer explanatory variables)
- if H0 is true, we expect SSEr – SSEc to be relatively small
- if SSEr – SSEc is relatively large, H0 is suspect ⇒ reject

Formalization: the Partial F Test

   F = [ (SSEr – SSEc) / (k – g) ] / [ SSEc / (n – k) ] = [ (4712.327 – 4563.643) / (6 – 1) ] / [ 4563.643 / (48 – 6) ] = 0.274

- if H0 is true, the F-statistic has an F-distribution with k – g = 5 numerator-df and n – k = 42 ≈ 40 denominator-df
- note: n – k – 1 denominator-df in models with intercept; n – k without
  general rule: df = # observations – # estimated coefficients

⇒ is the actual value of 0.274 "far enough" above 0?
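A minimal sketch of the computation with the slide's numbers (scipy assumed for the F-distribution):

    from scipy import stats

    sse_c, sse_r = 4563.643, 4712.327  # complete and reduced model (slides 30 and 35)
    n, k, g = 48, 6, 1                 # no intercept, so denominator df = n - k

    F = ((sse_r - sse_c) / (k - g)) / (sse_c / (n - k))
    p_value = stats.f.sf(F, k - g, n - k)
    print(F, p_value)  # F = 0.274; the P-value is far above 10% => don't reject H0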
Option a) Critical values: F > Fα?
   e.g. α = 5%  ⇒ F0.05 = 2.45 ⇒ don't reject H0
        α = 10% ⇒ F0.10 = 2.00 ⇒ don't reject H0

Option b) The P-value:
   "if the null were true, there would be much more than 10% chance to obtain an F-statistic ≥ 0.274"
   ⇒ don't reject H0

Note:
- in model 2), b1 = 0.046, similar to the estimates b1, b2, etc. in model 1)
- two even trickier null hypotheses later in Case 5)
- don't mix things up with the overall F-test (ANOVA-table)