Flexible Data Models: Dummy Variables and Interaction Effects

The document discusses how to incorporate qualitative predictors into regression models using dummy variables and interaction effects. It explains dummy variables for two-level and three-level categorical predictors. It also explains how to model interaction effects, including interactions between numeric variables and between numeric and dummy variables. The document uses a direct marketing example to illustrate these concepts, building regression models to explain customer spending amount based on variables like salary, number of children, catalogs received, and incorporating dummy variables for factors like age, gender, and marital status.

Uploaded by

Saitej

Flexible Data Models:

Dummy Variables and Interaction Effects

ANOL BHATTACHERJEE, PH.D.

UNIVERSITY OF SOUTH FLORIDA
Outline
• How to incorporate qualitative predictors in regression models:
  • Dummy variables.
  • Two-level and three-level dummies.
• How to model situations in which the effect of one predictor on the outcome variable depends on the value of another predictor:
  • Interaction effects.
  • Interaction between numeric variables.
  • Interaction between numeric and dummy variables.
Motivation: Direct Marketing Example
• Problem:
  • A direct marketer wants to identify which customers to target for a new direct mail catalogue.
  • The marketer has available a database containing information on past customer behavior.
• Goal:
  • Mine this database to extract valuable business insight about future customers’ behavior.
  • Explain why some customers spend more than others.
  • Explain which customer characteristics relate to AmountSpent, and how!

Age    Gender  OwnHome  Married  Location  Salary  Children  History  Catalogs  AmountSpent
Young  Male    Rent     Single   Close      15000         3  Low             6           38
Young  Male    Rent     Single   Close      13000         3  Low             6           43
Young  Female  Rent     Single   Close      14600         3  Low             6           47
Young  Female  Rent     Single   Close      17900         3  Low             6           62
Old    Female  Own      Single   Close      12700         2  Low             6           65
Young  Female  Rent     Single   Close      23000         3  Low             6           79
Young  Female  Rent     Single   Close      12700         3  Low            12           87
Young  Female  Rent     Single   Close      12100         1  Low             6           90
Old    Female  Own      Married  Close      10100         1  Low             6           93
Young  Female  Rent     Married  Close      42000         3  Low             6          105
Young  Male    Rent     Single   Close      11200         0  NA              6          106
Getting Started: Explore Associations
• Explore which variables relate to amount spent, and the nature of their relationship.
• How: Correlation analysis, scatterplots, etc., but most importantly, business sense.

Table of correlations
             Salary  Children  Catalogs  AmountSpent
Salary        1.000
Children      0.050     1.000
Catalogs      0.184    -0.113     1.000
AmountSpent   0.700    -0.222     0.473        1.000

• Questions:
  • What can we learn about the relationship between amount spent and the other variables?
  • What would be an intuitive simple regression model that we would want to investigate first?
Scatterplots
[Figure: scatterplot matrix of Salary, Children, Catalogs, and AmountSpent]
A Very Basic Model
• Hypothesis: Customers receiving more catalogs are expected to spend more on purchases.
  • Ha: ∆AmountSpent / ∆Catalogs > 0

m1 <- lm(AmountSpent ~ Catalogs, data=d)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  209.766     65.194   3.218  0.00133 **
Catalogs      68.588      4.048  16.944  < 2e-16 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 847.4 on 998 degrees of freedom
Multiple R-squared: 0.2234, Adjusted R-squared: 0.2226
F-statistic: 287.1 on 1 and 998 DF, p-value: < 2.2e-16

• Questions:
  • What do we learn from the model about the relationship between Catalogs and AmountSpent?
  • Is this a good model? Can we have a better model?
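
As a rough illustration of how to read the m1 output, the fitted line can be evaluated by hand. The sketch below hard-codes the two estimates printed in the summary; the helper name predict_m1 and the choice of 12 catalogs are assumptions made only for illustration.

```r
# Coefficients copied from the m1 summary
b0 <- 209.766  # intercept: predicted spending with zero catalogs
b1 <- 68.588   # slope: predicted change in spending per additional catalog

# Hypothetical helper (not part of the original analysis)
predict_m1 <- function(catalogs) b0 + b1 * catalogs

predict_m1(12)                    # predicted AmountSpent for 12 catalogs
predict_m1(13) - predict_m1(12)   # one more catalog adds b1
```

With the data frame d available, predict(m1, newdata = data.frame(Catalogs = 12)) should give the same value up to rounding of the printed coefficients.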
Controlling for Other Numeric Variables
• But amount spent should also depend on customers’ salary, number of children, etc.
  • Higher Salary should result in higher AmountSpent: ∆AmountSpent / ∆Salary > 0
  • More Children should result in lower AmountSpent: ∆AmountSpent / ∆Children < 0
• So we should control for Salary and Children.

m2 <- lm(AmountSpent ~ Catalogs + Salary + Children, data=d)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.428e+02  5.372e+01  -8.242 5.29e-16 ***
Catalogs     4.770e+01  2.755e+00  17.310  < 2e-16 ***
Salary       2.041e-02  5.929e-04  34.417  < 2e-16 ***
Children    -1.987e+02  1.709e+01 -11.628  < 2e-16 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 562.5 on 996 degrees of freedom
Multiple R-squared: 0.6584, Adjusted R-squared: 0.6574
F-statistic: 640 on 3 and 996 DF, p-value: < 2.2e-16

• Questions:
  • Is this a better model than Model 1?
  • How did adding Salary and Children change the effect of Catalogs compared to not having the controls?
How Much Explanation Does Catalogs Add?
Model with Catalogs:
m2 <- lm(AmountSpent ~ Catalogs + Salary + Children, data=d)

Residual standard error: 562.5 on 996 degrees of freedom
Multiple R-squared: 0.6584, Adjusted R-squared: 0.6574
F-statistic: 640 on 3 and 996 DF, p-value: < 2.2e-16

Model without Catalogs:
m3 <- lm(AmountSpent ~ Salary + Children, data=d)

Residual standard error: 641.3 on 997 degrees of freedom
Multiple R-squared: 0.5557, Adjusted R-squared: 0.5548
F-statistic: 623.4 on 2 and 997 DF, p-value: < 2.2e-16

Comparing nested models:
anova(m2, m3, test="Chisq")

Analysis of Variance Table

Model 1: AmountSpent ~ Catalogs + Salary + Children
Model 2: AmountSpent ~ Salary + Children
  Res.Df       RSS Df Sum of Sq  Pr(>Chi)
1    996 315171647
2    997 409993126 -1 -94821478 < 2.2e-16 ***

The increase in variance explained from m3 to m2 is significant.
Alternative using an F test: anova(m2, m3, test="F")
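
The nested-model comparison can be checked by hand from the residual sums of squares in the anova table. The sketch below is only an illustration: it recomputes the partial F statistic for dropping Catalogs, using numbers copied from the output, and the variable names are assumptions.

```r
# Values copied from the anova table and the model summaries
rss_full    <- 315171647  # m2: Catalogs + Salary + Children
rss_reduced <- 409993126  # m3: Salary + Children
df_full     <- 996        # residual degrees of freedom of m2
q           <- 1          # number of predictors dropped (Catalogs)

# Partial F statistic for the dropped term, with its upper-tail p-value
f_stat <- ((rss_reduced - rss_full) / q) / (rss_full / df_full)
p_val  <- pf(f_stat, q, df_full, lower.tail = FALSE)

f_stat      # about 299.65
17.310^2    # squared t value of Catalogs in m2, about 299.64
```

When a single predictor is dropped, the partial F statistic equals the square of that predictor's t value in the larger model, which is why the two numbers agree up to rounding.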
What About the Remaining Variables?
• So far, we have investigated the relationship between amount of money spent and the three numeric predictors salary, children, and catalogs.
• What about the remaining variables? Are they unimportant?
  • Does age potentially have an effect on spending behavior?
  • How about gender or marital status?
  • Should we include these variables in our regression model?
Qualitative Predictors and Dummy Variables
• Let’s start with the variable Age:
  • Age is a categorical variable with three possible values: Young, Middle, or Old.
  • No software will allow us to use the variable Age directly. Why?
• We can solve this problem by quantifying the qualitative variable Age.
• How do we quantify categorical variables?
  • By creating dummy variables: n-1 binary variables that together represent the n possible values of a categorical variable.
  • R creates the dummy variables automatically when a factor is used as a predictor; convert with “as.factor( )”, e.g. as.factor(Age).

Sample values of Age: Old, Middle, Middle, Young, Middle, Young, Middle, Young, Young, Young, Young, Old, Young

$$\text{Age\_mid} = \begin{cases} 1, & \text{if middle-aged} \\ 0, & \text{otherwise} \end{cases} \qquad \text{Age\_old} = \begin{cases} 1, & \text{if old} \\ 0, & \text{otherwise} \end{cases}$$

Middle-aged customer: Age_mid=1 & Age_old=0
Old customer: Age_mid=0 & Age_old=1
Young customer: Age_mid=0 & Age_old=0
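
To see the dummy coding in action, the sketch below builds the design matrix for the sample Age values listed on this slide. It is a minimal illustration, not the author's code; the object names age, age_f, and X are assumptions, and Young is set as the base level so that the two columns correspond to Age_mid and Age_old.

```r
# Sample Age values as listed on the slide
age <- c("Old", "Middle", "Middle", "Young", "Middle", "Young", "Middle",
         "Young", "Young", "Young", "Young", "Old", "Young")

# Make Young the base level so the dummies match Age_mid and Age_old
age_f <- factor(age, levels = c("Young", "Middle", "Old"))

# model.matrix() shows the columns lm() would build for this factor
X <- model.matrix(~ age_f)
colnames(X)   # "(Intercept)" "age_fMiddle" "age_fOld"
head(X, 3)    # first rows: Old, Middle, Middle
```

A three-level factor yields two dummy columns plus the intercept; the base level (Young here) is the row pattern with both dummies equal to 0.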
Regression with Dummy Variables
m4 <- lm(AmountSpent ~ as.factor(Age), data=d)

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)          1501.69      38.42  39.085   <2e-16 ***
as.factor(Age)Old     -69.56      71.65  -0.971    0.332
as.factor(Age)Young  -943.07      63.95 -14.748   <2e-16 ***

Residual standard error: 866 on 997 degrees of freedom
Multiple R-squared: 0.1897, Adjusted R-squared: 0.1881
F-statistic: 116.7 on 2 and 997 DF, p-value: < 2.2e-16

• Questions:
  • What does the coefficient 𝛽1 (= -69.56) mean?
    A. The average amount spent by an old customer.
    B. The average amount spent by a middle-aged customer.
    C. The average amount spent by an old customer relative to a middle-aged customer.
    D. The average amount spent by an old customer relative to a young customer.
  • Which age group spends the most: young, middle-aged, or old?
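
In a model with only one categorical predictor, the intercept is the mean of the base group and each dummy coefficient is a difference from that mean. The sketch below recovers the three group means from the m4 estimates; the variable names are assumptions. The base level is Middle because R orders factor levels alphabetically by default.

```r
# Coefficients copied from the m4 summary
b0      <- 1501.69   # intercept: mean spending of the base group (Middle)
b_old   <- -69.56    # Old relative to Middle
b_young <- -943.07   # Young relative to Middle

# Group means implied by the dummy coding
group_means <- c(Middle = b0, Old = b0 + b_old, Young = b0 + b_young)
group_means

names(which.max(group_means))   # group with the highest mean spending
```

Note that the Old coefficient is not statistically significant in m4, so the gap between the Old and Middle means may simply reflect sampling noise.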
Interpreting Dummy Coefficients
m5 <- lm(AmountSpent ~ as.factor(Age) + Catalogs, data=d)

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)          575.653     67.046   8.586   <2e-16 ***
as.factor(Age)Old    -53.702     63.895  -0.840    0.401
as.factor(Age)Young -798.937     57.716 -13.843   <2e-16 ***
Catalogs              60.034      3.736  16.068   <2e-16 ***

Residual standard error: 772.1 on 996 degrees of freedom
Multiple R-squared: 0.3565, Adjusted R-squared: 0.3546
F-statistic: 183.9 on 3 and 996 DF, p-value: < 2.2e-16

[Figure: AmountSpent versus Catalogs, three parallel lines with intercepts 576 (middle-aged), 576 - 54 (old), and 576 - 799 (young)]

𝛽0 = Amount spent by middle-aged people who receive zero catalogs
𝛽1 = Difference in amount spent by old people relative to middle-aged people receiving the same number of catalogs
𝛽2 - 𝛽1 = Difference in amount spent by young people relative to old people receiving the same number of catalogs
What If We Wish to Set Young as the Base Level?
# relevel() only works on factors, so convert Age first
d$Age <- relevel(as.factor(d$Age), ref="Young")
m5 <- lm(AmountSpent ~ as.factor(Age) + Catalogs, data=d)

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)          -223.284     66.673  -3.349 0.000842 ***
as.factor(Age)Middle  798.937     57.716  13.843  < 2e-16 ***
as.factor(Age)Old     745.235     71.056  10.488  < 2e-16 ***
Catalogs               60.034      3.736  16.068  < 2e-16 ***

Residual standard error: 772.1 on 996 degrees of freedom
Multiple R-squared: 0.3565, Adjusted R-squared: 0.3546
F-statistic: 183.9 on 3 and 996 DF, p-value: < 2.2e-16

• Questions:
  • How do the results change?
  • Which age group spends the most: young, middle-aged, or old?
Numeric and Dummy Variables
m6 <- lm(AmountSpent ~ Catalogs + Salary + as.factor(Age), data=d)

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)          -6.876e+02  5.398e+01 -12.738  < 2e-16 ***
Catalogs              5.145e+01  2.880e+00  17.866  < 2e-16 ***
Salary                2.079e-02  7.840e-04  26.516  < 2e-16 ***
as.factor(Age)Middle -1.018e+02  5.575e+01  -1.826  0.06813 .
as.factor(Age)Old     1.680e+02  5.861e+01   2.866  0.00424 **

Residual standard error: 591.3 on 995 degrees of freedom
Multiple R-squared: 0.6229, Adjusted R-squared: 0.6214
F-statistic: 411 on 4 and 995 DF, p-value: < 2.2e-16

• Questions:
  • What does the coefficient 𝛽3 (= -102) of as.factor(Age)Middle mean?
    A. Middle-aged customers spend the least.
    B. Middle-aged customers spend less than young customers.
    C. We should ignore this coefficient because the p-value is not significant.
    D. Middle-aged customers spend less than young customers with the same salary level and same number of catalogs.
  • Why did the coefficient of Middle change sign?
Prediction with Dummy Variables
m6 <- lm(AmountSpent ~ Catalogs + Salary + as.factor(Age), data=d)

                       Estimate Std. Error t value Pr(>|t|)
(Intercept)          -6.876e+02  5.398e+01 -12.738  < 2e-16 ***
Catalogs              5.145e+01  2.880e+00  17.866  < 2e-16 ***
Salary                2.079e-02  7.840e-04  26.516  < 2e-16 ***
as.factor(Age)Middle -1.018e+02  5.575e+01  -1.826  0.06813 .
as.factor(Age)Old     1.680e+02  5.861e+01   2.866  0.00424 **

• Questions:
  • What is the amount spent by an old person with a salary of $50,000 receiving no catalogs per year?
  • What is the amount spent by a young person with a salary of $30,000 receiving 1 catalog per year?
  • What is the amount spent by a middle-aged person with a salary of $80,000 with 2 kids?
  • What is the difference in amount spent by an old and a young person with the same salary and the same number of catalogs?
  • What is the difference in amount spent by an old person receiving 1 catalog and a middle-aged person receiving 2 catalogs, when controlled for salary?
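
Predictions of this kind plug values into the fitted equation, with each dummy set to 1 or 0. The sketch below is one way to do that by hand with the printed m6 estimates; the helper predict_m6 and the salary of 40000 used for the comparisons are assumptions for illustration. The base level here is Young.

```r
# Coefficients copied from the m6 output (base level: Young)
coef_m6 <- c(intercept = -687.6, catalogs = 51.45, salary = 0.02079,
             middle = -101.8, old = 168.0)

# Hypothetical helper: age must be "Young", "Middle", or "Old"
predict_m6 <- function(salary, catalogs, age) {
  unname(coef_m6["intercept"] +
         coef_m6["catalogs"] * catalogs +
         coef_m6["salary"] * salary +
         coef_m6["middle"] * (age == "Middle") +
         coef_m6["old"] * (age == "Old"))
}

predict_m6(50000, 0, "Old")     # old customer, $50,000 salary, no catalogs
predict_m6(30000, 1, "Young")   # young customer, $30,000 salary, 1 catalog

# Old versus young at the same salary and catalogs: the Old coefficient
predict_m6(40000, 6, "Old") - predict_m6(40000, 6, "Young")

# Old with 1 catalog versus middle-aged with 2 catalogs, same salary
predict_m6(40000, 1, "Old") - predict_m6(40000, 2, "Middle")
```

Two cautions follow from this sketch. The second prediction comes out slightly negative, a reminder that a linear model can give implausible values at the low end of the data. And m6 has no Children term, so the number of children cannot enter an m6 prediction.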
Multiple Numeric and Dummies
m7 <- lm(AmountSpent ~ Salary + Catalogs + Children + as.factor(Age) +
         as.factor(Gender) + as.factor(Married) + as.factor(Location), data=d)

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)
(Intercept)              -6.058e+02  7.501e+01  -8.076 1.92e-15 ***
Salary                    2.251e-02  9.517e-04  23.648  < 2e-16 ***
Catalogs                  4.314e+01  2.549e+00  16.922  < 2e-16 ***
Children                 -2.008e+02  1.724e+01 -11.645  < 2e-16 ***
as.factor(Age)Middle     -8.208e+01  5.062e+01  -1.622    0.105
as.factor(Age)Old        -2.038e+01  5.400e+01  -0.377    0.706
as.factor(Gender)Male    -4.197e+01  3.467e+01  -1.211    0.226
as.factor(Married)Single  6.759e+01  4.677e+01   1.445    0.149
as.factor(Location)Far    5.071e+02  3.622e+01  14.001  < 2e-16 ***

Residual standard error: 513.9 on 991 degrees of freedom
Multiple R-squared: 0.7164, Adjusted R-squared: 0.7141
F-statistic: 312.9 on 8 and 991 DF, p-value: < 2.2e-16

• Question: What do you see in the above model?

Interpretation: Effect of Location Dummy
• Customers who live far from a store selling similar products spend more (larger intercept) than those living close to competing stores (smaller intercept).
• However, customers living far spend at the same rate as customers living close (same slope).
• Do you think this is reasonable/realistic?
• Are other scenarios plausible? Which ones?
• How can we investigate other scenarios?
• Remember: Our answers are only as good as the questions we ask. Simple, unrealistic questions will yield simple, unrealistic answers.

[Figure: AmtSpent versus Salary, two parallel lines with Far above Close]
Interaction Effects
• A more challenging question: Is the spending rate higher for customers who live far away than for those living close?
• How to answer this question?
  • Add an interaction term, computed as the product of Salary and the Location dummy.
  • Salary*as.factor(Location)
  • Examine whether the interaction effect is significant.

[Figure: AmtSpent versus Salary, with separate lines for Far and Close customers]
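
In R's formula syntax, a*b is shorthand for a + b + a:b, so writing Salary*as.factor(Location) already brings in both main effects along with the product term. The sketch below checks this with terms(), which expands a formula without needing any data; the object names f_star and f_colon are assumptions.

```r
# Two ways of writing the same model
f_star  <- AmountSpent ~ Salary * as.factor(Location)
f_colon <- AmountSpent ~ Salary + as.factor(Location) + Salary:as.factor(Location)

# terms() expands a formula into its individual terms
labels_star  <- attr(terms(f_star), "term.labels")
labels_colon <- attr(terms(f_colon), "term.labels")

labels_star
identical(labels_star, labels_colon)
```

Repeating a main effect that the * operator already implies is harmless, because R drops duplicate terms when it expands the formula.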
Interaction Between Numeric and Dummy Variables
m8 <- lm(AmountSpent ~ Salary + Catalogs + as.factor(Location) +
         Salary*as.factor(Location), data=d)

                                Estimate Std. Error t value Pr(>|t|)
(Intercept)                   -5.421e+02  5.268e+01 -10.289   <2e-16 ***
Salary                         1.687e-02  6.579e-04  25.636   <2e-16 ***
Catalogs                       4.564e+01  2.603e+00  17.532   <2e-16 ***
as.factor(Location)Far        -1.684e+02  7.593e+01  -2.218   0.0268 *
Salary:as.factor(Location)Far  1.215e-02  1.205e-03  10.075   <2e-16 ***

Residual standard error: 530 on 995 degrees of freedom
Multiple R-squared: 0.6971, Adjusted R-squared: 0.6959
F-statistic: 572.6 on 4 and 995 DF, p-value: < 2.2e-16

• Questions:
  • If the salary of a customer who lives close increases by $10,000, what is the predicted increase in amount spent?
  • If the salary of a customer who lives far increases by $10,000, what is the predicted increase in amount spent?
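
With a numeric-by-dummy interaction, each group gets its own slope: the base group (Close) uses the Salary coefficient alone, and the other group (Far) adds the interaction coefficient. A minimal sketch using the printed m8 estimates, with variable names chosen here for illustration:

```r
# Coefficients copied from the m8 output
b_salary     <- 1.687e-02  # Salary slope for the base level (Close)
b_salary_far <- 1.215e-02  # additional Salary slope for Far (interaction)

slope_close <- b_salary
slope_far   <- b_salary + b_salary_far

slope_close * 10000   # predicted increase for a $10,000 raise, Close
slope_far   * 10000   # predicted increase for a $10,000 raise, Far
```

The Far dummy coefficient (-168.4) shifts only the intercept, so it cancels out when comparing the same customer before and after a salary increase.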
Interaction Between Numeric Variables
m9 <- lm(AmountSpent ~ Salary + Catalogs + Children + Salary*Catalogs, data=d)

                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      2.832e+02  8.277e+01   3.422 0.000647 ***
Salary           6.618e-03  1.363e-03   4.856 1.39e-06 ***
Catalogs        -4.296e+00  5.359e+00  -0.802 0.422963
Children        -1.984e+02  1.613e+01 -12.302  < 2e-16 ***
Salary:Catalogs  9.416e-04  8.486e-05  11.096  < 2e-16 ***

Multiple R-squared: 0.696, Adjusted R-squared: 0.6948
F-statistic: 569.6 on 4 and 995 DF, p-value: < 2.2e-16

• Questions:
  • What is the marginal effect of children on amount spent?
  • What is the marginal effect of catalogs on amount spent?
  • Draw two graphs to show the difference between the two questions above.
  • Does the concept of a marginal effect apply to dummy variables?
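
The contrast between the first two questions can be made concrete. Children enters m9 without an interaction, so its marginal effect is a constant; Catalogs interacts with Salary, so its marginal effect is a linear function of Salary. The sketch below uses the printed m9 estimates; the function names and the three salary values are assumptions for illustration.

```r
# Coefficients copied from the m9 output
b_children <- -1.984e+02
b_catalogs <- -4.296e+00
b_sal_cat  <-  9.416e-04   # Salary:Catalogs interaction

# Children has no interaction term: constant marginal effect
me_children <- function(salary) rep(b_children, length(salary))

# d(AmountSpent)/d(Catalogs) = b_catalogs + b_sal_cat * Salary
me_catalogs <- function(salary) b_catalogs + b_sal_cat * salary

salaries <- c(20000, 50000, 100000)   # illustrative values, not from the data
me_children(salaries)
me_catalogs(salaries)
```

Plotting the two functions against Salary gives a horizontal line for Children and an upward-sloping line for Catalogs.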
How Messy Can It Get?
m8 <- lm(AmountSpent ~ Salary + Catalogs + Children + Gender + Married
         + Catalogs*Salary + Catalogs*Children + Catalogs*Salary*Children +
         Catalogs*Gender + Catalogs*Gender*Salary, data=d)

                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                 5.917e+01  1.427e+02   0.415 0.678458
Salary                      8.617e-03  2.552e-03   3.377 0.000762 ***
Catalogs                   -3.000e+00  8.757e+00  -0.343 0.731960
Children                   -4.185e+01  7.540e+01  -0.555 0.579053
GenderMale                  6.212e+01  1.743e+02   0.356 0.721674
MarriedSingle               1.522e+01  4.476e+01   0.340 0.733902
Salary:Catalogs             1.042e-03  1.506e-04   6.918 8.25e-12 ***
Catalogs:Children           3.004e+00  5.159e+00   0.582 0.560497
Salary:Children            -1.068e-03  1.229e-03  -0.869 0.384948
Catalogs:GenderMale        -5.511e+00  1.129e+01  -0.488 0.625570
Salary:GenderMale          -5.758e-04  2.862e-03  -0.201 0.840601
Salary:Catalogs:Children   -1.638e-04  8.004e-05  -2.047 0.040944 *
Salary:Catalogs:GenderMale  5.876e-05  1.759e-04   0.334 0.738453

Two-way interactions: Salary:Catalogs, Catalogs:Children, Salary:Children
Two-way interactions with dummies: Catalogs:GenderMale, Salary:GenderMale
Three-way interaction: Salary:Catalogs:Children
Three-way interaction with dummies: Salary:Catalogs:GenderMale

Multiple R-squared: 0.7138, Adjusted R-squared: 0.7103
F-statistic: 205.2 on 12 and 987 DF, p-value: < 2.2e-16
Example: Gender Discrimination
• The scenario:
  • Fifth National Bank is facing a gender discrimination lawsuit alleging that it pays its female employees less than male employees.
  • The bank’s database includes information on its 208 employees with each employee’s annual salary (in $thousands), gender, and years of experience.
• Question:
  • What regression model should we run to evaluate the gender discrimination claim?

Gender  Experience  Salary
Male             4    32
Female          15    39.1
Female          12    33.2
Female          15    30.6
Male             3    29
Female           3    30.5
Female           4    30
Male            10    27
Female           4    34
Female           9    29.5
Female          11    26.8
Female          16    31.3
Exploring the Data
• Side-by-side box plots.
• Question:
  • What do these box plots tell us?

[Figure: side-by-side box plots of Salary (axis from 40 to 100) for Male and Female employees]
Model 1: Main Effect of Gender
m1 <- lm(Salary ~ as.factor(Gender), data=d)

                      Estimate Std. Error t value Pr(>|t|)
(Intercept)            37.2099     0.8945  41.597  < 2e-16 ***
as.factor(Gender)Male   8.2955     1.5645   5.302 2.94e-07 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 10.58 on 206 degrees of freedom
Multiple R-squared: 0.1201, Adjusted R-squared: 0.1158
F-statistic: 28.12 on 1 and 206 DF, p-value: 2.935e-07

• Questions:
  • Based on the above model, is there gender discrimination at this bank?
  • If so, how much are female employees discriminated against?
  • Is this a good model?
  • Is there anything missing from this analysis? Elaborate.
Model 2: Controlling for Experience
m2 <- lm(Salary ~ Experience + as.factor(Gender), data=d)

                      Estimate Std. Error t value Pr(>|t|)
(Intercept)           26.63000    1.20832  22.039  < 2e-16 ***
Experience             0.87231    0.08034  10.858  < 2e-16 ***
as.factor(Gender)Male  8.51029    1.24979   6.809 1.06e-10 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 8.454 on 205 degrees of freedom
Multiple R-squared: 0.4413, Adjusted R-squared: 0.4359
F-statistic: 80.98 on 2 and 205 DF, p-value: < 2.2e-16

• Questions:
  • Based on this model, does gender discrimination exist?
  • If so, how much are female employees discriminated against?
  • Is this model better than the previous model? Why?
  • Is there anything still missing from this analysis? Elaborate.

Digging Further
• Scatterplots (or other visualizations) are a great way of understanding what’s happening in the data.
• Questions:
  • After controlling for experience, does it appear that gender discrimination still exists? Explain.
  • How can we investigate this analytically?

[Figure: scatterplot of Salary (30 to 90) versus Experience (10 to 40), with points marked Female and Male]
Model 3: Interaction Effect
m3 <- lm(Salary ~ Experience + as.factor(Gender) +
         Experience*as.factor(Gender), data=d)

                                 Estimate Std. Error t value Pr(>|t|)
(Intercept)                       33.1668     1.4059  23.592  < 2e-16 ***
Experience                         0.3334     0.1033   3.228  0.00145 **
as.factor(Gender)Male             -4.0171     2.0553  -1.955  0.05201 .
Experience:as.factor(Gender)Male   1.0431     0.1437   7.261 7.95e-12 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.555 on 204 degrees of freedom
Multiple R-squared: 0.5561, Adjusted R-squared: 0.5495
F-statistic: 85.18 on 3 and 204 DF, p-value: < 2.2e-16

What does the negative coefficient of as.factor(Gender)Male (-4.0171) mean?

• Questions:
  • Based on this model, does gender discrimination exist?
  • If so, how much are female employees discriminated against?
  • Is this model better than the previous two models? Why?
Model 3: Interpretation
                                 Estimate Std. Error t value Pr(>|t|)
(Intercept)                       33.1668     1.4059  23.592  < 2e-16 ***
Experience                         0.3334     0.1033   3.228  0.00145 **
as.factor(Gender)Male             -4.0171     2.0553  -1.955  0.05201 .
Experience:as.factor(Gender)Male   1.0431     0.1437   7.261 7.95e-12 ***

• Questions:
  • On average, how much more does a male employee with 10 years of experience make than a female employee with the same experience level?
  • Interaction effect coefficient 𝛽3 = 1.04 refers to
    A. The average rate of salary increase for each additional year of experience for male employees.
    B. The amount by which the average rate of salary increase for each additional year of experience for male employees exceeds that of female employees.
    C. The average salary of male employees with zero years of experience.
  • What does 𝛽2 = -4.02 mean?
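
In the interaction model the male-female salary gap is no longer a single number; it depends on experience. A minimal sketch using the printed Model 3 estimates (salary is in $thousands; the function name gap is an assumption):

```r
# Coefficients copied from the Model 3 output
b_male     <- -4.0171  # Male versus Female at zero years of experience
b_exp_male <-  1.0431  # additional salary growth per year for Male

# Predicted male-minus-female salary gap at a given experience level
gap <- function(experience) b_male + b_exp_male * experience

gap(0)                  # gap at zero experience (the Male coefficient)
gap(10)                 # gap at 10 years of experience, in $thousands
-b_male / b_exp_male    # experience at which the predicted gap is zero
```

Under these estimates the predicted gap starts out slightly in favor of female employees and switches sign after roughly four years of experience. The Male coefficient itself is only marginally significant (p = 0.052), so the starting-salary difference should be read with caution.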
Model 3: Wrapping Up
• Questions:
  • Which of the 3 models best characterizes discrimination?
    A. The gender-only model.
    B. The gender + experience model.
    C. The interaction model.
  • Is there discrimination in the starting salaries?
  • Is there discrimination in the rate of salary increase?
  • Are there any limitations to this analysis?
Questions to Ponder …
• More on interaction effects:
  • Can we have interaction between two numeric variables? How will you interpret the interaction coefficient?
  • Can we have interaction between one numeric variable and one categorical variable with three levels (e.g., old, mid, young)?
  • Can you have interaction between two categorical variables (e.g., old/mid/young vs. far/near)?
• Gender discrimination problem:
  • Is this really a linear effect, i.e., is the rate of increase the same (constant) at all experience levels?
  • If not, how can we investigate this? (Answer: non-linear models).

[Figure: scatterplot of Salary (30 to 90) versus Experience (10 to 40), with points marked Female and Male]
Key Takeaways
• Incorporating dummy variables as predictors in a regression model allows us to make predictions for each group and compare predictions across groups.
• Adding an interaction term x1*x2 as a predictor allows for a more flexible model:
  • Change in y relative to change in x1 is a function of the other predictor x2.
  • Interaction may involve two or more variables, which may be numeric and/or categorical.
• In all of these models, the dependent variable (DV) is still numeric.
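
The interaction takeaway can be written out explicitly. As a sketch, assume a model with two predictors and their product; the coefficient symbols and the error term are notation chosen here for illustration:

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon,
\qquad
\frac{\partial y}{\partial x_1} = \beta_1 + \beta_3 x_2
```

When x2 is a 0/1 dummy, the slope of x1 is β1 for the base group and β1 + β3 for the other group, which is the pattern seen in the Location and Gender interaction models.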

No ratings yet
Product Demand Estimation Forecasting Capacity OT Prod RT Prod Out Sourcing Production Plan
8 pages
Dap An BTL KTL HK I 2022
No ratings yet
Dap An BTL KTL HK I 2022
4 pages
Simple Linear Regression Interpretation PDF
No ratings yet
Simple Linear Regression Interpretation PDF
2 pages
Assignment 1 BAN 100 Edwin Castillo
No ratings yet
Assignment 1 BAN 100 Edwin Castillo
11 pages
Financial Calculator Notes - Sharp EL738
No ratings yet
Financial Calculator Notes - Sharp EL738
9 pages
Implementation and Study of K-Nearest Ne
No ratings yet
Implementation and Study of K-Nearest Ne
62 pages
Linear Regression Stat Edit Worksheet PDF
No ratings yet
Linear Regression Stat Edit Worksheet PDF
5 pages
Econometrics - Chapter 17 - Simultaneous Equations Models - Shalabh, IIT Kanpur
No ratings yet
Econometrics - Chapter 17 - Simultaneous Equations Models - Shalabh, IIT Kanpur
30 pages
AI Expert Roadmap
No ratings yet
AI Expert Roadmap
15 pages
Two SPSS Programs For Interpreting Multiple Regression Results
No ratings yet
Two SPSS Programs For Interpreting Multiple Regression Results
5 pages
Determinants of Domestic Tourism Growth in India: Radhika B. Nair and Jayalakshmy Ramachandran
No ratings yet
Determinants of Domestic Tourism Growth in India: Radhika B. Nair and Jayalakshmy Ramachandran
8 pages
Statistical Analysis of Caterpillar 793D Haul Truck Engine Data
No ratings yet
Statistical Analysis of Caterpillar 793D Haul Truck Engine Data
10 pages
Mod 4
No ratings yet
Mod 4
24 pages
Credit Risk Estimation Model Development Process Main Steps and Model Improvementengineering Economics
No ratings yet
Credit Risk Estimation Model Development Process Main Steps and Model Improvementengineering Economics
8 pages
On The Use of Indicator Variables in Regression Analysis: by Keith M. Bower, M.S
No ratings yet
On The Use of Indicator Variables in Regression Analysis: by Keith M. Bower, M.S
4 pages
Higher Engineering Mathematics Bs Grewal-Page11
No ratings yet
Higher Engineering Mathematics Bs Grewal-Page11
1 page
A Growth Comparison Among Three Commercial Tilapia Species in A Biofloc System
No ratings yet
A Growth Comparison Among Three Commercial Tilapia Species in A Biofloc System
14 pages
Residual standard error: 562.5 on 996 degrees of freedom

Model without Catalog:
m3 <- lm(AmountSpent ~ Salary + Children, data=d)

Residual standard error: 641.3 on 997 degrees of freedom

Comparing Nested Models:
anova(m2, m3, test="Chisq")

Analysis of Variance Table

Residual standard error: 866 on 997 degrees of freedom

Residual standard error: 772.1 on 996 degrees of freedom

𝛽1 – 𝛽1 = Difference in amount spent by young people relative

Residual standard error: 772.1 on 996 degrees of freedom

Residual standard error: 591.3 on 995 degrees of freedom

Estimate Std. Error t value Pr(>|t|)

Residual standard error: 513.9 on 991 degrees of freedom

 Question: What do you see in the above model?

 How can we investigate other scenarios?

Estimate Std. Error t value Pr(>|t|)

Residual standard error: 530 on 995 degrees of freedom

Estimate Std. Error t value Pr(>|t|)

Multiple R-squared: 0.696, Adjusted R-squared: 0.6948

Estimate Std. Error t value Pr(>|t|)

 Is there anything still missing from this analysis? Elaborate.

What does this negative coefficient mean?

 Can we have interaction between one numeric

 If not, how can we investigate this? (Answer: non- Experience