Non-Linear Data Models: Anol Bhattacherjee, Ph.D. University of South Florida

The document discusses various non-linear data modeling techniques including log, exponential, and generalized linear models. It provides examples of fitting these models to real-world data like detergent sales and occupational prestige. The key advantage of non-linear models is that they can better capture non-linear relationships in data compared to simple linear models.


Non-Linear Data Models

ANOL BHATTACHERJEE, PH.D.


UNIVERSITY OF SOUTH FLORIDA
Outline
• Non-linear relationships:
  • When do you want non-linear models?
• Methods for fitting non-linear relationships:
  • Log models.
  • Exponential models.
  • Piecewise polynomials, splines, and smoothing splines.
  • Generalized additive models.
  • Regression trees.
Detergent Sales Example
• Problem:
  • A brand manager at a consumer goods firm is studying the sales of the firm's flagship brand of laundry detergent, Clean.
• Data:
  • Weekly sales data over a 50-week period, obtained from one sales district.
  • Prevailing retail price for a 5-lb box of Clean per week.
  • Boxes (= demand) sold that week.
• Questions:
  • How does demand change as a function of price? Is there a positive or negative trend?
  • What is the shape of this relationship?

Data: Detergent Sales.csv
Linear Model
m <- lm(Qty ~ Price, data=d)
summary(m)
plot(m)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3501.99     225.57  15.525  < 2e-16 ***
Price        -393.63      44.14  -8.918 9.38e-12 ***

Residual standard error: 214.3 on 48 degrees of freedom
Multiple R-squared: 0.6236, Adjusted R-squared: 0.6158
F-statistic: 79.52 on 1 and 48 DF, p-value: 9.377e-12

• What happens if we fit a linear model to non-linear data?
  • Poor fit: low multiple R².
  • The model puts excessive weight on extreme points (changing the slope), adding to SSE and biasing our inferences.
Examples of Non-Linear Models

[Panel plots: linear models; log model (X on log scale); exponential model (Y on log scale)]

• Question: Does the detergent sales plot fit an alternative model better?
Logarithmic Model
• How to create a log model:
  • Create a new predictor variable log(Price); this is the natural logarithm, or log to the base e (e = 2.71828).
  • Check the plot of Qty vs. log(Price) for linearity.
  • Specify the model: Qty = a + b log(Price)

plot(Qty ~ log(Price), data=d)
m <- lm(Qty ~ log(Price), data=d)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4723.8      322.4   14.65  < 2e-16 ***
log(Price)   -1993.9      199.2  -10.01 2.46e-13 ***

Residual standard error: 198.8 on 48 degrees of freedom
Multiple R-squared: 0.6761, Adjusted R-squared: 0.6693
F-statistic: 100.2 on 1 and 48 DF, p-value: 2.457e-13
The Log Model
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4723.8      322.4   14.65  < 2e-16 ***
log(Price)   -1993.9      199.2  -10.01 2.46e-13 ***

• Question: The coefficient b = -1994 implies that…
  A. When the price of Clean decreases by $1, demand increases by 1994 boxes.
  B. When the price of Clean decreases by log($1), demand increases by 1994 boxes.
  C. When the price of Clean decreases by $1, demand increases by log(1994) boxes.
  D. None of the above.
• Log models must be interpreted with care.
  • It makes no (practical) sense to increase/decrease log(Price) by one unit.
  • We want to know the impact of a price increase of one dollar, not of log of a dollar!
Interpreting Log Models
• A look back at basic calculus:
  • The first-order derivative dy/dx of a function is the rate of change in y for any given value of x.
  • The tangent in the y vs. x plot.
  • Conceptually similar to velocity or speed.
• For y = a + b log(x):
    dy/dx = b/x, or equivalently dy = b (dx/x)
  • b = dy / (dx/x)
  • b is the change in y for a proportional change in x (relative to the original value of x), i.e., how much y would change if x changed by 100% or doubled (not by 1 unit).
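To make the proportional-change interpretation concrete, here is a small numeric check using the fitted coefficients from the slide (a = 4723.8, b = -1993.9); the function name q_hat is just an illustrative label:

```r
# Fitted log model from the slide: Qty = a + b*log(Price)
a <- 4723.8
b <- -1993.9
q_hat <- function(price) a + b * log(price)

# A 1% price increase changes predicted demand by roughly b * 0.01:
q_hat(4 * 1.01) - q_hat(4)   # about -19.8 boxes
b * 0.01                     # -19.9 boxes: the first-order approximation
```

Note that the starting price ($4 here) does not matter: a 1% change costs about the same 20 boxes at any price, which is exactly what "proportional change" means.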
Log Model
m <- lm(Qty ~ log(Price), data=d)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4723.8      322.4   14.65  < 2e-16 ***
log(Price)   -1993.9      199.2  -10.01 2.46e-13 ***

• Question: b = -1994 implies that when price increases by ___, demand decreases by _____ boxes.
  A. $1 ; 1994
  B. $1994 ; 1
  C. 1% ; 1994
  D. 100% ; 1994
  E. None of the above
• Question: At a price of $4, the predicted demand according to this model is…
  A. -104,182 boxes
  B. -3254 boxes
  C. 0 boxes
  D. 1959 boxes
  E. 4724 boxes
  F. 7979 boxes
  G. None of the above
Exponential Model
• Regression model: Qty = c e^(b·Price)
• Model estimated as: log(Qty) = a + b Price

m <- lm(log(Qty) ~ Price, data=d)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)  8.48937    0.13359  63.550  < 2e-16 ***
Price       -0.23550    0.02614  -9.009 6.88e-12 ***

Multiple R-squared: 0.6284, Adjusted R-squared: 0.6206
F-statistic: 81.16 on 1 and 48 DF, p-value: 6.877e-12

• Interpretation:
    (1/y) dy/dx = b, or dy/y = b dx
  • b = -0.2355 implies that when price increases by $1, demand decreases on average by approximately 23.55%.
  • When price increases by $0.10, demand decreases by 0.2355 × 0.1 = 0.02355, or about 2.355%.
Exponential Model: Another Interpretation
Qty(Price) = c e^(b·Price)
Qty(Price+1) = c e^(b·(Price+1)) = c e^(b·Price) · e^b
Qty(Price+1) / Qty(Price) = e^b
• e^b = multiplicative change in Qty (y) per unit change in Price (x).
• e^(-0.2355) = 0.79 implies that Qty drops by 21% for a $1 increase in price.

m <- lm(log(Qty) ~ Price, data=d)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)  8.48937    0.13359  63.550  < 2e-16 ***
Price       -0.23550    0.02614  -9.009 6.88e-12 ***

Multiple R-squared: 0.6284, Adjusted R-squared: 0.6206
F-statistic: 81.16 on 1 and 48 DF, p-value: 6.877e-12

• Question:
  • Can we have a model of the form log(y) = a + b log(x)?
  • If so, how will you interpret the beta coefficient b?
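One way to explore this question: a log-log model can be fit with the same lm machinery, and its b is an elasticity. A sketch, assuming the same data frame d as in the earlier slides:

```r
# Hypothetical log-log (constant-elasticity) model:
# log(Qty) = a + b*log(Price), so b = (dQty/Qty) / (dPrice/Price)
m_ll <- lm(log(Qty) ~ log(Price), data=d)
summary(m_ll)

# Here b would be read as: a 1% price increase changes demand by about b%,
# at any price level (constant elasticity).
```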
Generalized Linear Models
• A generalization of the linear regression model to allow for a non-normal DV, i.e., ε ≁ N(0, σ²).
  • Lognormal (exponential) distribution: Y is exponentially decreasing; log(Y) may have ε ~ N(0, σ²).
  • Logistic distribution: Y is binary, e.g., loan default (vs. solvent loans).
    • But we are interested in the probability of default, P[Y=1], which may have ε ~ N(0, σ²).
  • Poisson distribution: Y is a count, e.g., the number of calls received at a call center per hour.
  • Binomial distribution: Y is the count of a binary occurrence, e.g., the number of loan defaults in different banks.
• R code:
m1 <- glm(log(Qty) ~ Price, data=d, family=gaussian)    # gaussian = normal; log(Y) = a + bX + e
m2 <- glm(defaults ~ meanROA + loantarget, data=d, family=binomial)
m3 <- glm(callcount ~ hourofday + dayofmonth, data=d, family=poisson)

• Questions: What results do you get if you run the Qty-Price model as a glm model? How do these results compare with the lm output?
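As a sketch of the comparison asked above (same data frame d): a gaussian glm with the default identity link solves the same least-squares problem as lm, so the point estimates should coincide, while the glm summary reports deviance and AIC rather than R².

```r
# The same model fit two ways; the coefficients should agree
m_lm  <- lm(log(Qty) ~ Price, data=d)
m_glm <- glm(log(Qty) ~ Price, data=d, family=gaussian)

coef(m_lm)
coef(m_glm)          # identical point estimates
summary(m_glm)$aic   # glm reports AIC/deviance instead of R-squared
```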
Common Distributions
Prestige Data Example
• A 1960s survey on the perceived prestige of different occupations reports the following data:
  • Average monthly income.
  • Prestige score (from a social survey).
  • Additional information (education, pct women, etc.).
• How would you examine the impact of income on prestige?

                     education income women prestige
GOV.ADMINISTRATORS       13.11  12351 11.16     68.8
GENERAL.MANAGERS         12.26  25879  4.02     69.1
ACCOUNTANTS              12.77   9271 15.70     63.4
PURCHASING.OFFICERS      11.42   8865  9.11     56.8
CHEMISTS                 14.62   8403 11.68     73.5
PHYSICISTS               15.64  11030  5.13     77.6
BIOLOGISTS               15.09   8258 25.65     72.6
ARCHITECTS               15.44  14163  2.69     78.1
CIVIL.ENGINEERS          14.52  11377  1.03     73.1
MINING.ENGINEERS         14.64  11023  0.94     68.8

Data: Prestige.csv
How to Model Prestige Data
• Perceived prestige vs. average income:
  • Improper to exclude Physicians, General Managers, Osteopaths, etc. as outliers, since these are real professions.
  • If extreme points are included, they will drag (bias) the slope and increase SSE.
  • Clearly a non-linear model fits better.
• Which model?
  • We can try a log model: better than the linear model, but fit may be a problem.

[Scatterplot: Prestige vs. Average Income (0–25,000), with PHYSICIANS, LAWYERS, OSTEOPATHS, and GENERAL.MANAGERS labeled as extreme points]
Regression of Prestige Data
m <- lm(prestige ~ log(income) + education + women, data=d)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -110.9658    14.8429  -7.476 3.27e-11 ***
log(income)   13.4382     1.9138   7.022 2.90e-10 ***
education      3.7305     0.3544  10.527  < 2e-16 ***
women          0.0469     0.0299   1.568     0.12

Residual standard error: 7.093 on 98 degrees of freedom
Multiple R-squared: 0.8351, Adjusted R-squared: 0.83
F-statistic: 165.4 on 3 and 98 DF, p-value: < 2.2e-16

• Question:
  • Is this a reasonable model? But what about the extreme data points?
Prestige Assumptions?
Are There Better Model Transformations?
• Of course!
  • The challenge is to discover them.
  • And then, to interpret their results.
• Box-Cox transformation: a power transformation of the form:
    y = x^λ     if λ ≠ 0
      = log(x)  if λ = 0
• Quadratic models:
    y = b0 + b1x + b2x² + e
• But we will skip these for now!
  • Finding the best nonlinear relationship involves trial and error and understanding the domain.
  • Alternatively, we can determine it automatically via non-parametric regression.
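Though the details are skipped here, a minimal sketch of choosing the Box-Cox λ in R, using MASS::boxcox (assumes the MASS package and the detergent data frame d):

```r
library(MASS)  # provides boxcox()

# Profile likelihood over a grid of lambda values
bc <- boxcox(Qty ~ Price, data=d, lambda=seq(-2, 2, 0.1))

# Lambda with the highest likelihood; lambda near 0 suggests a log transform
lambda_hat <- bc$x[which.max(bc$y)]
```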
What if…
• What if we had a method that could automatically detect non-linearities?
  • Answer: Nonparametric regression methods.
• More flexible than linear (parametric) regression, because they don't make restrictive assumptions such as linearity and are distribution-free (residuals ε ≁ N(0, σ²)).
• Advantages:
  • Better data fit, leading to better predictive capabilities.
• Disadvantages:
  • No "neat" economic interpretations possible, e.g., marginal effects, etc.
  • Entirely new concepts, such as flexible differential equation models, etc.

[Scatterplot: Prestige vs. Average Income, with PHYSICIANS, LAWYERS, OSTEOPATHS, and GENERAL.MANAGERS labeled]
Comparison with Parametric Regression
• Parametric methods:
  • Linear regression is used when there is not enough data to reliably estimate complex models f(.).
  • We then "augment" this little data with restrictive model assumptions, "hoping" that these assumptions are true.
  • E.g., linear relationship, normal residuals.
• Nonparametric models:
  • Nonparametric regression can be used in "large" datasets, i.e., when there is enough data available to reliably estimate f(.).
  • When there isn't enough data, many curves are possible; with enough data, we can find a unique curve that fits best.

[Panel plots: with few points, many candidate curves f(X) fit equally well; with many points, a unique best-fitting curve emerges]
How Does Nonparametric Regression Work?
• Goal:
  • We want to find a function f( ) such that f(X) approximates the response Y as closely as possible.
• How?
  • Construct f( ) by piecing together many individual (parametric) functions in a convenient way.
  • For instance, we may piece together multiple linear and/or quadratic functions.
  • This is done via polynomials, splines, and smoothing splines.
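A minimal sketch of the "piecing together" idea: two linear segments joined at a single knot, built with an ordinary lm call on a hinge term. The knot location (income = 10000) and variable names are purely illustrative:

```r
# Piecewise-linear fit with one knot: prestige is linear in income on each
# side of the knot, with slope b1 below it and slope (b1 + b2) above it
knot <- 10000
d$hinge <- pmax(d$income - knot, 0)  # zero below the knot, rises above it
m_pw <- lm(prestige ~ income + hinge, data=d)
summary(m_pw)
```

Splines generalize this construction: more knots, higher-degree pieces, and smoothness constraints where the pieces meet.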
Polynomials & Piecewise Polynomials
• Polynomial:
  • A polynomial of degree k is of the form:
      f(t) = β0 + β1·t + β2·t² + … + βk·t^k
  • A polynomial of degree 1 is a linear function:
      f(t) = β0 + β1·t
• Piecewise polynomial:
  • A polynomial that is only defined on a certain range of the data.
  • A piecewise polynomial of degree k is continuously differentiable k-1 times.
  • Eliminates excessive oscillation; however, it may not be smooth at the breakpoints.
Splines & Smoothing Splines
• Spline:
  • A numeric function that is piecewise-defined by polynomial functions, with a high degree of smoothness at the places where the polynomials connect.
  • Linear spline: 0-times differentiable.
  • Cubic spline: twice differentiable.
• Smoothing spline:
  • A method of fitting a smooth curve to a set of noisy observations using a spline function.
  • Find a piecewise polynomial f(t) with smooth breakpoints ξ1, …, ξL:
      f(t) = β0 + β1·t + β2·t² + … + βp·t^p + Σ(l=1..L) β(p+l)·(t − ξl)₊^p
  • f(x) minimizes the penalized sum-of-squares:
      PENSSE(f) = Σj [yj − f(tj)]² + λ ∫ [f″(x)]² dx
  • λ > 0 is a smoothing parameter, which can be adjusted to achieve a desired level of smoothing.
Smoothing Spline for Prestige Data
• Two different smoothing splines fit to the Prestige data:
  • λ = 0.5: little smoothing (red line).
  • λ = 1: large, more smoothing (blue line).
• Which spline represents the data better?
  • Visual inspection.
  • Cross-validate using a holdout sample.
  • Minimizing mean-squared error of fit.
• Estimating smoothing splines in R:
    smooth.spline(x, y)
  • The smoothing parameter λ is the "spar" attribute:
    smooth.spline(x, y, spar=0.5)

[Scatterplot: Prestige vs. Average Income (0–25,000) with the two fitted splines]
Generalized Additive Model
• To estimate a nonparametric regression (via smoothing splines) with intercept and slope, we need a new class of models, called Generalized Additive Models (R: gam).

library(mgcv)
gam1 <- gam(prestige ~ s(income) + s(education), data=d)  # income and education both estimated via smoothing splines
summary(gam1)
gam2 <- gam(prestige ~ s(income) + education, data=d)     # only income estimated by smoothing splines (semi-parametric model)
gam3 <- gam(prestige ~ income + education, data=d)        # fully parametric model
gam4 <- gam(prestige ~ s(income, education), data=d)      # bivariate smooth of income and education
Interpreting GAM Results
Formula: prestige ~ s(income)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   46.833      1.098   42.65   <2e-16 ***

Approximate significance of smooth terms:
            edf Ref.df     F p-value
s(income) 2.464  3.065 46.14  <2e-16 ***

R-sq.(adj) = 0.585   Deviance explained = 59.5%
GCV = 127.31   Scale est. = 122.98   n = 102

[Plot: linear model (red) vs. GAM (green) fits of Prestige vs. Average Income]

• Question: Based on GAM, we can learn that…
  A. There is a positive relationship between Income and Prestige
  B. The relationship between Income and Prestige is nonlinear
  C. The rate of Prestige-increase levels off after Income values of 15,000 or higher
  D. All of the above
  E. None of the above
Overfitting
• How to detect overfitting:
  • Split the data set into train and test data; build the regression model using the train data and examine the model against the test data.
  • K-fold cross-validation.
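A sketch of the train/test approach for the Prestige model (the 70/30 split and variable names are illustrative choices, not prescriptions from the slides):

```r
set.seed(42)  # reproducible split
n <- nrow(d)
train_idx <- sample(n, round(0.7 * n))
train <- d[train_idx, ]
test  <- d[-train_idx, ]

m <- lm(prestige ~ log(income) + education, data=train)

# Out-of-sample RMSE much larger than in-sample RMSE signals overfitting
rmse_train <- sqrt(mean(residuals(m)^2))
rmse_test  <- sqrt(mean((test$prestige - predict(m, newdata=test))^2))
```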
Key Takeaways
• When do we need non-linear models?
  • When the linearity assumption of linear models fails.
• How would you know if the linearity assumption is invalid?
  • Start with a scatterplot.
  • But the residual plot is more definitive.
• How do we handle non-linear relationships?
  • By transforming non-linear data to linear data using transformations:
    • Log and exponential transformations.
    • Changes the interpretation of regression coefficients (proportional change, not unit change).
  • By using true non-linear methods:
    • Quadratic models.
    • Piecewise polynomials, splines, and smoothing splines (gam).
    • Regression trees.