Non-Linear Data Models: Anol Bhattacherjee, Ph.D. University of South Florida
Question:
Does the detergent sales plot fit an alternative model better?
Exponential model: Y on log scale
Logarithmic Model
How to create a log model:
Create a new predictor variable log(Price); this is the natural
logarithm, or log to the base e (e = 2.71828).
Check plot of Qty vs. log(Price) for linearity.
Specify model: Qty = a + b log(Price)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4723.8 322.4 14.65 < 2e-16 ***
log(Price) -1993.9 199.2 -10.01 2.46e-13 ***
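The three steps above can be sketched in R roughly as follows. The detergent data set is not reproduced here, so the data frame below is simulated; the sample size, price range, noise level, and "true" coefficients are assumptions chosen only to resemble the output above.

```r
# Simulated stand-in for the detergent data (assumption: not the real data set)
set.seed(42)
d <- data.frame(Price = runif(50, min = 2, max = 8))
d$Qty <- 4700 - 2000 * log(d$Price) + rnorm(50, sd = 200)

# Step 1: create the new predictor (natural log of Price)
d$logPrice <- log(d$Price)

# Step 2: check Qty vs. log(Price) for linearity (only drawn in an interactive session)
if (interactive()) plot(d$logPrice, d$Qty, xlab = "log(Price)", ylab = "Qty")

# Step 3: fit Qty = a + b log(Price)
m <- lm(Qty ~ logPrice, data = d)
print(coef(summary(m)))
```

Writing `lm(Qty ~ log(Price), data = d)` gives the same fit without creating the extra column.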
\[ y = a + b \log(x) \]
\[ \frac{dY}{dX} = b \cdot \frac{1}{X} \quad \text{or} \quad dY = b \cdot \frac{dX}{X} \]
\[ b = \frac{dY}{dX / X} \]
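b is the change in y for a proportional change in x (relative to the original
value of x), not for a change of 1 unit. For small changes, a 1% increase in x
changes y by about b/100. For a large change such as doubling x (a 100% increase),
the exact change in y is b log(2), about 0.69 b; reading b as the effect of a
100% change is only a linear extrapolation.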
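A quick numeric check of this interpretation, using the coefficients reported in the output above. The starting price of $5 is an arbitrary assumption; the changes computed here do not depend on it.

```r
# Coefficients taken from the regression output above
a <- 4723.8
b <- -1993.9
qty <- function(price) a + b * log(price)

p0 <- 5                                     # arbitrary starting price (assumption)
change_1pct   <- qty(1.01 * p0) - qty(p0)   # equals b * log(1.01), close to b / 100
change_double <- qty(2 * p0) - qty(p0)      # equals b * log(2), about 0.69 * b

print(c(one_percent = change_1pct, b_over_100 = b / 100, doubling = change_double))
```

The 1% change is close to b/100, while the change from doubling the price is noticeably smaller in magnitude than b, because log(2) is about 0.69.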
Log Model
m <- lm(Qty ~ log(Price), data=d)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4723.8 322.4 14.65 < 2e-16 ***
log(Price) -1993.9 199.2 -10.01 2.46e-13 ***
Question:
b = -1994 implies that when price increases by ___, demand decreases by _____ boxes.
A. $1 ; 1994
B. $1994 ; 1
C. 1% ; 1994
D. 100% ; 1994
E. None of the above
Question:
At a price of $4, the predicted demand according to this model is…
A. -104,182 boxes
B. -3254 boxes
C. 0 boxes
D. 1959 boxes
E. 4724 boxes
F. 7979 boxes
G. None of the above
Exponential Model
Regression model: \( \text{Qty} = c\, e^{b \cdot \text{Price}} \)
Model estimated as: log(Qty) = a + b Price, where a = log(c)
m <- lm(log(Qty) ~ Price, data=d)
Interpretation:
\[ \frac{1}{Y} \cdot \frac{dY}{dX} = b \quad \text{or} \quad \frac{dY}{Y} = b \, dX \]
b = -0.2355 implies that when price increases by $1, demand decreases on average by about 23.55%.
When price increases by $0.10, demand decreases by about 0.2355*0.1 = 0.02355, or 2.355%.
Exponential Model: Another Interpretation
\[ \text{Qty}(\text{Price}) = c\, e^{b \cdot \text{Price}} \]
\[ \text{Qty}(\text{Price}+1) = c\, e^{b (\text{Price}+1)} = c\, e^{b \cdot \text{Price}}\, e^{b} \]
\[ \frac{\text{Qty}(\text{Price}+1)}{\text{Qty}(\text{Price})} = e^{b} \]
\( e^{b} \) = multiplicative change in Qty (y) per unit change in Price (x)
m <- lm(log(Qty) ~ Price, data=d)
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   8.48937    0.13359  63.550  < 2e-16 ***
Price        -0.23550    0.02614  -9.009 6.88e-12 ***
Multiple R-squared: 0.6284, Adjusted R-squared: 0.6206
F-statistic: 81.16 on 1 and 48 DF, p-value: 6.877e-12
\( e^{-0.2355} = 0.79 \) implies that Qty drops by 21% (to 79% of its previous value) for each $1 increase in price.
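The two readings of b can be compared numerically. This is a small sketch using only the Price coefficient from the output above; the variable names are illustrative.

```r
b <- -0.2355                               # Price coefficient from the output above

ratio      <- exp(b)                       # Qty(Price + 1) / Qty(Price)
exact_pct  <- 100 * (ratio - 1)            # exact % change in Qty for a $1 increase
approx_pct <- 100 * b                      # small-change approximation
exact_10c  <- 100 * (exp(0.1 * b) - 1)     # exact % change for a $0.10 increase

print(round(c(ratio = ratio, exact = exact_pct,
              approx = approx_pct, ten_cents = exact_10c), 3))
```

The approximation 100*b is close to the exact figure for a $0.10 change, but overstates the drop for a full $1 change, where the exact figure is about 21%.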
Question:
Can we have a model of the form: log(y) = a + b*log(x)?
If so, how will you interpret the beta coefficient b?
Generalized Linear Models
A generalization of the linear regression model to allow for a non-normal DV, i.e., 𝜀 ≁ N(0, 𝜎²).
Lognormal (exponential) distribution: Y is exponentially decreasing; log(Y) may have 𝜀 ~ N(0, 𝜎²).
Logistic distribution: Y is binary, e.g., loan default (vs. solvent loans).
But we are interested in the probability of default P[Y=1], which may have 𝜀 ~ N(0, 𝜎²).
Poisson distribution: Y is a count, e.g., Number of calls received at a call center per hour.
Binomial distribution: Y is the count of a binary occurrence, e.g., number of loan defaults in different
banks.
R code:
m1 <- glm(log(Qty) ~ Price, data=d, family=gaussian)   # Gaussian = Normal: log(Y) = a + bX + e
m2 <- glm(defaults ~ meanROA + loantarget, data=d, family=binomial)
m3 <- glm(callcount ~ hourofday + dayofmonth, data=d, family=poisson)
Questions: What results do you get if you run the Qty-Price model as a glm model? How do these
results compare with the lm output?
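One way to explore this question is to fit both versions on the same data and compare the coefficients. The sketch below uses simulated data, since the detergent data set is not reproduced here; the data-generating values (8.5, -0.24, noise sd 0.3) are assumptions.

```r
# Simulated stand-in for the Qty-Price data (assumption: not the real data set)
set.seed(1)
d <- data.frame(Price = runif(50, min = 2, max = 8))
d$Qty <- exp(8.5 - 0.24 * d$Price + rnorm(50, sd = 0.3))

m_lm  <- lm(log(Qty) ~ Price, data = d)
m_glm <- glm(log(Qty) ~ Price, data = d, family = gaussian)

print(coef(m_lm))
print(coef(m_glm))
print(all.equal(coef(m_lm), coef(m_glm)))  # same estimates from both fits
```

The coefficient estimates match; the difference is in the summary output, where `glm` reports deviance and AIC rather than R-squared and an F-statistic.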
Common Distributions
Prestige Data Example
A 1960s survey on the perceived prestige of different occupations reports the following data:
Average monthly income.
Prestige score (from a social survey).
Additional information (education, pct women, etc).
How would you examine the impact of income on prestige?
occupation           education  income  women  prestige
GOV.ADMINISTRATORS       13.11   12351  11.16      68.8
GENERAL.MANAGERS         12.26   25879   4.02      69.1
ACCOUNTANTS              12.77    9271  15.70      63.4
PURCHASING.OFFICERS      11.42    8865   9.11      56.8
CHEMISTS                 14.62    8403  11.68      73.5
PHYSICISTS               15.64   11030   5.13      77.6
BIOLOGISTS               15.09    8258  25.65      72.6
ARCHITECTS               15.44   14163   2.69      78.1
CIVIL.ENGINEERS          14.52   11377   1.03      73.1
MINING.ENGINEERS         14.64   11023   0.94      68.8
Data: Prestige.csv
How to Model Prestige Data
Perceived prestige vs. average income:
[Figure: scatter plot of Prestige (y-axis, 20 to 80) against Average Income (x-axis, 0 to 25000); PHYSICIANS and LAWYERS are labeled.]
Improper to exclude Physicians, General Managers, Osteopaths, etc. as outliers, since these are real professions.
Which model?
We can try a log model: better than the linear model, but fit may be a problem.
Regression of Prestige Data
m <- lm(prestige ~ log(income) + education + women, data=d)
But what about these data?
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -110.9658 14.8429 -7.476 3.27e-11 ***
log(income) 13.4382 1.9138 7.022 2.90e-10 ***
education 3.7305 0.3544 10.527 < 2e-16 ***
women 0.0469 0.0299 1.568 0.12
Question:
Is this a reasonable model?
Prestige Assumptions?
Are There Better Model Transformations?
Of course!
The challenge is to discover them.
And then, to interpret their results.
Box-Cox Transformation: A power-transformation of the form:
\[ y = \begin{cases} x^{\lambda} & \text{if } \lambda \neq 0 \\ \log(x) & \text{if } \lambda = 0 \end{cases} \]
Quadratic models:
\[ y = b_0 + b_1 x + b_2 x^2 + e \]
But we will skip these for now!
Finding the best nonlinear relationship involves trial and error and an understanding of the domain.
Alternatively, we can determine it automatically via non-parametric regression.
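Although the slides skip the details, a power transformation can be explored with `boxcox()` from the MASS package. The sketch below makes several assumptions: the data are simulated so that a log transform is the right answer, and MASS's `boxcox()` transforms the response of a fitted linear model using the scaled form (y^λ - 1)/λ, which differs from the simple power form above only by a shift and scale.

```r
library(MASS)  # recommended package shipped with standard R installations

# Simulated data (assumption): log(y) is linear in x, so lambda near 0 is expected
set.seed(1)
d <- data.frame(x = runif(200, min = 1, max = 10))
d$y <- exp(1 + 0.3 * d$x + rnorm(200, sd = 0.2))

# y = TRUE and qr = TRUE store the pieces boxcox() needs in the fitted object
fit <- lm(y ~ x, data = d, y = TRUE, qr = TRUE)

# Profile log-likelihood over a grid of lambda values
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)
```

A value of `lambda_hat` close to 0 points to a log transformation of the response; values near 1 suggest no transformation is needed.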
What if…
What if we had a method that could automatically
detect non-linearities?
Answer: Nonparametric regression methods.
[Figure: scatter plot of Prestige (y-axis, 20 to 80) against Average Income (x-axis, 0 to 25000); PHYSICIANS and LAWYERS are labeled.]
These methods make no assumptions about functional form, such as linearity, and are distribution-free (residuals 𝜀 ≁ N(0, 𝜎²)).
Advantages:
Better data-fit leading to better predictive capabilities.
Disadvantages:
No “neat” economic interpretations possible, e.g., marginal effects, etc.
Piecewise polynomial:
A polynomial that is only defined on a certain
range of the data.
A piecewise polynomial of degree k is continuously
differentiable k-1 times.
Eliminates excessive oscillation, but may not
be smooth at the breakpoints.
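A minimal sketch of a piecewise-linear fit in base R: two line segments joined at one breakpoint, continuous there but with a kink. The data are simulated, and the breakpoint at x = 5 and the slopes are assumptions made for illustration.

```r
# Simulated data with a change of slope at x = 5 (assumption)
set.seed(2)
x <- seq(0, 10, length.out = 200)
knot <- 5
y <- 2 + 1.0 * x - 1.5 * pmax(x - knot, 0) + rnorm(200, sd = 0.3)

# Hinge term: zero to the left of the knot, linear to the right of it
hinge <- pmax(x - knot, 0)
fit <- lm(y ~ x + hinge)

slope_left  <- coef(fit)[["x"]]                     # slope before the breakpoint
slope_right <- slope_left + coef(fit)[["hinge"]]    # slope after the breakpoint
print(c(left = slope_left, right = slope_right))
```

The fitted curve is continuous at the knot because the hinge term is zero there, but its slope jumps, which is the lack of smoothness at the breakpoint mentioned above.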
Splines & Smoothing Splines
Spline:
A numeric function that is piecewise-defined by polynomial functions with a high degree
of smoothness at the places where the polynomials connect.
Linear spline: 0-times differentiable.
Cubic spline: twice differentiable.
Smoothing Spline:
A method of fitting a smooth curve to a set of noisy observations using a spline function.
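Find a piecewise polynomial f(x) with smooth breakpoints
\[ f(t) = \beta_0 + \beta_1 t + \beta_2 t^2 + \dots + \beta_p t^p + \sum_{l=1}^{L} \beta_{pl} \, (t - \kappa_l)_+^p \]
where \( \kappa_1, \dots, \kappa_L \) are the breakpoints (knots) and \( (u)_+ = \max(u, 0) \).
λ > 0 is a smoothing parameter, which can be adjusted to achieve a desired level of smoothing.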
Smoothing Spline for Prestige Data
Two different smoothing splines fit to the Prestige data:
λ=0.5, little smoothing (red line)
λ=1, more smoothing (blue line)
[Figure: Prestige (y-axis, 20 to 80) against Average Income (x-axis, 0 to 25000) with the two fitted splines.]
Which spline represents the data better?
Visual inspection.
Cross-validate using holdout sample.
Minimizing mean-squared error of fit.
Estimating smoothing splines in R:
smooth.spline(x, y)
The amount of smoothing is set through the “spar” argument, a rescaled version of λ:
smooth.spline(x, y, spar=0.5)
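The comparison of two smoothing levels can be sketched on simulated data as follows. The sine-shaped signal and noise level are assumptions; only `smooth.spline()` and its `spar` argument come from the slide.

```r
# Simulated noisy curve (assumption: stands in for the Prestige data)
set.seed(3)
x <- seq(0, 10, length.out = 150)
y <- sin(x) + rnorm(150, sd = 0.3)

fit_rough  <- smooth.spline(x, y, spar = 0.5)  # less smoothing
fit_smooth <- smooth.spline(x, y, spar = 1.0)  # more smoothing
fit_auto   <- smooth.spline(x, y)              # smoothing chosen by generalized cross-validation

# Effective degrees of freedom: larger means a wigglier curve
print(c(rough = fit_rough$df, smooth = fit_smooth$df, auto = fit_auto$df))

# In-sample mean squared error of each fit
mse <- function(fit) mean((y - predict(fit, x)$y)^2)
print(c(rough = mse(fit_rough), smooth = mse(fit_smooth)))
```

In-sample error tends to favor the less-smoothed curve, which is why a holdout sample or cross-validation is the safer basis for choosing the smoothing level.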
Generalized Additive Model
To estimate a nonparametric regression (via smoothing splines) with intercept and slope, we
need a new class of models, called Generalized Additive Models (R: gam).
library(mgcv)
gam1 <- gam(prestige ~ s(income) + s(education), data=d)   # income and education both estimated via smoothing splines (fully nonparametric model)
summary(gam1)
gam2 <- gam(prestige ~ s(income) + education, data=d)      # only income estimated by smoothing splines (semi-parametric model)
gam3 <- gam(prestige ~ income + education, data=d)         # both terms linear (fully parametric model)
gam4 <- gam(prestige ~ s(income, education), data=d)       # one joint smooth of income and education
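A runnable sketch of the semi-parametric versus fully linear comparison. Since Prestige.csv is not reproduced here, the data frame is simulated; the data-generating process (prestige rising with log(income) and linearly with education) is a hypothetical assumption.

```r
library(mgcv)  # recommended package shipped with standard R installations

# Simulated stand-in for the Prestige data (assumption)
set.seed(4)
d <- data.frame(income = runif(200, min = 1000, max = 25000),
                education = runif(200, min = 6, max = 16))
d$prestige <- -80 + 12 * log(d$income) + 3 * d$education + rnorm(200, sd = 5)

gam_semi <- gam(prestige ~ s(income) + education, data = d)  # smooth income, linear education
gam_lin  <- gam(prestige ~ income + education, data = d)     # both terms linear

# Effective degrees of freedom of the income smooth (1 would mean a straight line)
edf_income <- summary(gam_semi)$s.table["s(income)", "edf"]
print(edf_income)

# Adjusted R-squared of the two fits
print(c(semi = summary(gam_semi)$r.sq, linear = summary(gam_lin)$r.sq))
```

An effective degrees of freedom value clearly above 1 for `s(income)` indicates that the smooth term has picked up curvature that the linear term cannot represent.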
Interpreting GAM Results
Formula: prestige ~ s(income)
Parametric coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    46.833      1.098   42.65   <2e-16 ***
Approximate significance of smooth terms:
             edf Ref.df     F p-value
s(income)  2.464  3.065 46.14  <2e-16 ***
R-sq.(adj) = 0.585
Deviance explained = 59.5%
GCV = 127.31 Scale est. = 122.98 n = 102
[Figure: Linear model (red) vs. GAM (green); Prestige (y-axis, 20 to 80) against income (x-axis, 0 to 25000).]
Question: Based on GAM, we can learn that