Non-Linear Data Models: Anol Bhattacherjee, Ph.D. University of South Florida

The document discusses various non-linear data modeling techniques including log, exponential, and generalized linear models. It provides examples of fitting these models to real-world data like detergent sales and occupational prestige. The key advantage of non-linear models is that they can better capture non-linear relationships in data compared to simple linear models.


Non-Linear Data Models

ANOL BHATTACHERJEE, PH.D.


UNIVERSITY OF SOUTH FLORIDA
Outline
• Non-linear relationships:
  • When do you want non-linear models?
• Methods for fitting non-linear relationships:
  • Log models.
  • Exponential models.
  • Piecewise polynomials, splines, and smoothing splines.
  • Generalized additive models.
  • Regression trees.
Detergent Sales Example
• Problem:
  • A brand manager at a consumer goods firm is studying the sales of the firm's flagship brand of laundry detergent, Clean.
• Data:
  • Weekly sales data over a 50-week period, obtained from one sales district.
  • Prevailing retail price for a 5-lb box of Clean per week.
  • Boxes (= demand) sold that week.
• Questions:
  • How does demand change as a function of price? Is there a positive or negative trend?
  • What is the shape of this relationship?

Data: Detergent Sales.csv
Linear Model
m <- lm(Qty ~ Price, data=d)
summary(m)
plot(m)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3501.99     225.57  15.525  < 2e-16 ***
Price        -393.63      44.14  -8.918 9.38e-12 ***

Residual standard error: 214.3 on 48 degrees of freedom
Multiple R-squared: 0.6236, Adjusted R-squared: 0.6158
F-statistic: 79.52 on 1 and 48 DF, p-value: 9.377e-12

• What happens if we fit a linear model to non-linear data?
  • Poor fit: low multiple R².
  • The model puts excessive weight on extreme points (changing the slope), adding to SSE and biasing our inferences.
Examples of Non-Linear Models

[Panel plots: linear models; log model (X on log scale); exponential model (Y on log scale)]

• Question: Does the detergent sales plot fit an alternative model better?
Logarithmic Model
• How to create a log model:
  • Create a new predictor variable log(Price); this is the natural logarithm, or log to the base e (e = 2.71828).
  • Check the plot of Qty vs. log(Price) for linearity.
  • Specify the model: Qty = a + b log(Price)

plot(Qty ~ log(Price), data=d)
m <- lm(Qty ~ log(Price), data=d)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4723.8      322.4   14.65  < 2e-16 ***
log(Price)   -1993.9      199.2  -10.01 2.46e-13 ***

Residual standard error: 198.8 on 48 degrees of freedom
Multiple R-squared: 0.6761, Adjusted R-squared: 0.6693
F-statistic: 100.2 on 1 and 48 DF, p-value: 2.457e-13
The Log Model
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4723.8      322.4   14.65  < 2e-16 ***
log(Price)   -1993.9      199.2  -10.01 2.46e-13 ***

• Question: The coefficient b = -1994 implies that…
  A. When the price of Clean decreases by $1, demand increases by 1994 boxes.
  B. When the price of Clean decreases by log($1), demand increases by 1994 boxes.
  C. When the price of Clean decreases by $1, demand increases by log(1994) boxes.
  D. None of the above.
• Log models must be interpreted with care.
  • It makes no (practical) sense to increase/decrease log(Price) by one unit.
  • We want to know the impact of a price increase of one dollar, not of log of a dollar!
Interpreting Log Models
• A look back at basic calculus:
  • The first-order derivative dy/dx of a function is the rate of change in y for any given value of x.
  • The tangent in the y vs. x plot.
  • Conceptually similar to velocity or speed.
• For y = a + b log(x):
    dy/dx = b/x, or equivalently dy = b (dx/x)
  • b = dy / (dx/x)
  • b is the change in y for a proportional change in x (relative to the original value of x), i.e., how much y would change if x changed by 100% or doubled (not by 1 unit).
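To make the proportional-change interpretation concrete, here is a small numeric check using the fitted coefficients from the slide (a = 4723.8, b = -1993.9); the function name q_hat is just an illustrative label:

```r
# Fitted log model from the slide: Qty = a + b*log(Price)
a <- 4723.8
b <- -1993.9
q_hat <- function(price) a + b * log(price)

# A 1% price increase changes predicted demand by roughly b * 0.01:
q_hat(4 * 1.01) - q_hat(4)   # about -19.8 boxes
b * 0.01                     # -19.9 boxes: the first-order approximation
```

Note that the starting price ($4 here) does not matter: a 1% change costs about the same 20 boxes at any price, which is exactly what "proportional change" means.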
Log Model
m <- lm(Qty ~ log(Price), data=d)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4723.8      322.4   14.65  < 2e-16 ***
log(Price)   -1993.9      199.2  -10.01 2.46e-13 ***

• Question: b = -1994 implies that when price increases by ___, demand decreases by _____ boxes.
  A. $1 ; 1994
  B. $1994 ; 1
  C. 1% ; 1994
  D. 100% ; 1994
  E. None of the above
• Question: At a price of $4, the predicted demand according to this model is…
  A. -104,182 boxes
  B. -3254 boxes
  C. 0 boxes
  D. 1959 boxes
  E. 4724 boxes
  F. 7979 boxes
  G. None of the above
Exponential Model
• Regression model: Qty = c e^(b·Price)
• Model estimated as: log(Qty) = a + b Price

m <- lm(log(Qty) ~ Price, data=d)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)  8.48937    0.13359  63.550  < 2e-16 ***
Price       -0.23550    0.02614  -9.009 6.88e-12 ***

Multiple R-squared: 0.6284, Adjusted R-squared: 0.6206
F-statistic: 81.16 on 1 and 48 DF, p-value: 6.877e-12

• Interpretation:
    (1/y) dy/dx = b, or dy/y = b dx
  • b = -0.2355 implies that when price increases by $1, demand decreases on average by approximately 23.55%.
  • When price increases by $0.10, demand decreases by 0.2355 × 0.1 = 0.02355, or about 2.355%.
Exponential Model: Another Interpretation
Qty(Price) = c e^(b·Price)
Qty(Price+1) = c e^(b·(Price+1)) = c e^(b·Price) · e^b
Qty(Price+1) / Qty(Price) = e^b
• e^b = multiplicative change in Qty (y) per unit change in Price (x).
• e^(-0.2355) = 0.79 implies that Qty drops by 21% for a $1 increase in price.

m <- lm(log(Qty) ~ Price, data=d)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)  8.48937    0.13359  63.550  < 2e-16 ***
Price       -0.23550    0.02614  -9.009 6.88e-12 ***

Multiple R-squared: 0.6284, Adjusted R-squared: 0.6206
F-statistic: 81.16 on 1 and 48 DF, p-value: 6.877e-12

• Question:
  • Can we have a model of the form log(y) = a + b log(x)?
  • If so, how will you interpret the beta coefficient b?
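One way to explore this question: a log-log model can be fit with the same lm machinery, and its b is an elasticity. A sketch, assuming the same data frame d as in the earlier slides:

```r
# Hypothetical log-log (constant-elasticity) model:
# log(Qty) = a + b*log(Price), so b = (dQty/Qty) / (dPrice/Price)
m_ll <- lm(log(Qty) ~ log(Price), data=d)
summary(m_ll)

# Here b would be read as: a 1% price increase changes demand by about b%,
# at any price level (constant elasticity).
```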
Generalized Linear Models
• A generalization of the linear regression model to allow for a non-normal DV, i.e., ε ≁ N(0, σ²).
  • Lognormal (exponential) distribution: Y is exponentially decreasing; log(Y) may have ε ~ N(0, σ²).
  • Logistic distribution: Y is binary, e.g., loan default (vs. solvent loans).
    • But we are interested in the probability of default, P[Y=1], which may have ε ~ N(0, σ²).
  • Poisson distribution: Y is a count, e.g., the number of calls received at a call center per hour.
  • Binomial distribution: Y is the count of a binary occurrence, e.g., the number of loan defaults in different banks.
• R code:
m1 <- glm(log(Qty) ~ Price, data=d, family=gaussian)    # gaussian = normal; log(Y) = a + bX + e
m2 <- glm(defaults ~ meanROA + loantarget, data=d, family=binomial)
m3 <- glm(callcount ~ hourofday + dayofmonth, data=d, family=poisson)

• Questions: What results do you get if you run the Qty-Price model as a glm model? How do these results compare with the lm output?
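As a sketch of the comparison asked above (same data frame d): a gaussian glm with the default identity link solves the same least-squares problem as lm, so the point estimates should coincide, while the glm summary reports deviance and AIC rather than R².

```r
# The same model fit two ways; the coefficients should agree
m_lm  <- lm(log(Qty) ~ Price, data=d)
m_glm <- glm(log(Qty) ~ Price, data=d, family=gaussian)

coef(m_lm)
coef(m_glm)          # identical point estimates
summary(m_glm)$aic   # glm reports AIC/deviance instead of R-squared
```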
Common Distributions
Prestige Data Example
• A 1960s survey on the perceived prestige of different occupations reports the following data:
  • Average monthly income.
  • Prestige score (from a social survey).
  • Additional information (education, pct women, etc.).
• How would you examine the impact of income on prestige?

                     education income women prestige
GOV.ADMINISTRATORS       13.11  12351 11.16     68.8
GENERAL.MANAGERS         12.26  25879  4.02     69.1
ACCOUNTANTS              12.77   9271 15.70     63.4
PURCHASING.OFFICERS      11.42   8865  9.11     56.8
CHEMISTS                 14.62   8403 11.68     73.5
PHYSICISTS               15.64  11030  5.13     77.6
BIOLOGISTS               15.09   8258 25.65     72.6
ARCHITECTS               15.44  14163  2.69     78.1
CIVIL.ENGINEERS          14.52  11377  1.03     73.1
MINING.ENGINEERS         14.64  11023  0.94     68.8

Data: Prestige.csv
How to Model Prestige Data
• Perceived prestige vs. average income:
  • Improper to exclude Physicians, General Managers, Osteopaths, etc. as outliers, since these are real professions.
  • If extreme points are included, they will drag (bias) the slope and increase SSE.
  • Clearly a non-linear model fits better.
• Which model?
  • We can try a log model: better than the linear model, but fit may be a problem.

[Scatterplot: Prestige vs. Average Income (0–25,000), with PHYSICIANS, LAWYERS, OSTEOPATHS, and GENERAL.MANAGERS labeled as extreme points]
Regression of Prestige Data
m <- lm(prestige ~ log(income) + education + women, data=d)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -110.9658    14.8429  -7.476 3.27e-11 ***
log(income)   13.4382     1.9138   7.022 2.90e-10 ***
education      3.7305     0.3544  10.527  < 2e-16 ***
women          0.0469     0.0299   1.568     0.12

Residual standard error: 7.093 on 98 degrees of freedom
Multiple R-squared: 0.8351, Adjusted R-squared: 0.83
F-statistic: 165.4 on 3 and 98 DF, p-value: < 2.2e-16

• Question:
  • Is this a reasonable model? But what about the extreme data points?
Prestige Assumptions?
Are There Better Model Transformations?
• Of course!
  • The challenge is to discover them.
  • And then, to interpret their results.
• Box-Cox transformation: a power transformation of the form:
    y = x^λ     if λ ≠ 0
      = log(x)  if λ = 0
• Quadratic models:
    y = b0 + b1x + b2x² + e
• But we will skip these for now!
  • Finding the best nonlinear relationship involves trial and error and understanding the domain.
  • Alternatively, we can determine it automatically via non-parametric regression.
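Though the details are skipped here, a minimal sketch of choosing the Box-Cox λ in R, using MASS::boxcox (assumes the MASS package and the detergent data frame d):

```r
library(MASS)  # provides boxcox()

# Profile likelihood over a grid of lambda values
bc <- boxcox(Qty ~ Price, data=d, lambda=seq(-2, 2, 0.1))

# Lambda with the highest likelihood; lambda near 0 suggests a log transform
lambda_hat <- bc$x[which.max(bc$y)]
```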
What if…
• What if we had a method that could automatically detect non-linearities?
  • Answer: Nonparametric regression methods.
• More flexible than linear (parametric) regression, because they don't make restrictive assumptions such as linearity and are distribution-free (residuals ε ≁ N(0, σ²)).
• Advantages:
  • Better data fit, leading to better predictive capabilities.
• Disadvantages:
  • No "neat" economic interpretations possible, e.g., marginal effects, etc.
  • Entirely new concepts, such as flexible differential equation models, etc.

[Scatterplot: Prestige vs. Average Income, with PHYSICIANS, LAWYERS, OSTEOPATHS, and GENERAL.MANAGERS labeled]
Comparison with Parametric Regression
• Parametric methods:
  • Linear regression is used when there is not enough data to reliably estimate complex models f(.).
  • We then "augment" this little data with restrictive model assumptions, "hoping" that these assumptions are true.
  • E.g., linear relationship, normal residuals.
• Nonparametric models:
  • Nonparametric regression can be used in "large" datasets, i.e., when there is enough data available to reliably estimate f(.).
  • When there isn't enough data, many curves are possible; with enough data, we can find a unique curve that fits best.

[Panel plots: with few points, many candidate curves f(X) fit equally well; with many points, a unique best-fitting curve emerges]
How Does Nonparametric Regression Work?
• Goal:
  • We want to find a function f( ) such that f(X) approximates the response Y as closely as possible.
• How?
  • Construct f( ) by piecing together many individual (parametric) functions in a convenient way.
  • For instance, we may piece together multiple linear and/or quadratic functions.
  • This is done via polynomials, splines, and smoothing splines.
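A minimal sketch of the "piecing together" idea: two linear segments joined at a single knot, built with an ordinary lm call on a hinge term. The knot location (income = 10000) and variable names are purely illustrative:

```r
# Piecewise-linear fit with one knot: prestige is linear in income on each
# side of the knot, with slope b1 below it and slope (b1 + b2) above it
knot <- 10000
d$hinge <- pmax(d$income - knot, 0)  # zero below the knot, rises above it
m_pw <- lm(prestige ~ income + hinge, data=d)
summary(m_pw)
```

Splines generalize this construction: more knots, higher-degree pieces, and smoothness constraints where the pieces meet.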
Polynomials & Piecewise Polynomials
• Polynomial:
  • A polynomial of degree k is of the form:
      f(t) = β0 + β1·t + β2·t² + … + βk·t^k
  • A polynomial of degree 1 is a linear function:
      f(t) = β0 + β1·t
• Piecewise polynomial:
  • A polynomial that is only defined on a certain range of the data.
  • A piecewise polynomial of degree k is continuously differentiable k-1 times.
  • Eliminates excessive oscillation; however, it may not be smooth at the breakpoints.
Splines & Smoothing Splines
• Spline:
  • A numeric function that is piecewise-defined by polynomial functions, with a high degree of smoothness at the places where the polynomials connect.
  • Linear spline: 0-times differentiable.
  • Cubic spline: twice differentiable.
• Smoothing spline:
  • A method of fitting a smooth curve to a set of noisy observations using a spline function.
  • Find a piecewise polynomial f(t) with smooth breakpoints ξ1, …, ξL:
      f(t) = β0 + β1·t + β2·t² + … + βp·t^p + Σ(l=1..L) β(p+l)·(t − ξl)₊^p
  • f(x) minimizes the penalized sum-of-squares:
      PENSSE(f) = Σj [yj − f(tj)]² + λ ∫ [f″(x)]² dx
  • λ > 0 is a smoothing parameter, which can be adjusted to achieve a desired level of smoothing.
Smoothing Spline for Prestige Data
• Two different smoothing splines fit to the Prestige data:
  • λ = 0.5: little smoothing (red line).
  • λ = 1: large, more smoothing (blue line).
• Which spline represents the data better?
  • Visual inspection.
  • Cross-validate using a holdout sample.
  • Minimizing mean-squared error of fit.
• Estimating smoothing splines in R:
    smooth.spline(x, y)
  • The smoothing parameter λ is the "spar" attribute:
    smooth.spline(x, y, spar=0.5)

[Scatterplot: Prestige vs. Average Income (0–25,000) with the two fitted splines]
Generalized Additive Model
• To estimate a nonparametric regression (via smoothing splines) with intercept and slope, we need a new class of models, called Generalized Additive Models (R: gam).

library(mgcv)
gam1 <- gam(prestige ~ s(income) + s(education), data=d)  # income and education both estimated via smoothing splines
summary(gam1)
gam2 <- gam(prestige ~ s(income) + education, data=d)     # only income estimated by smoothing splines (semi-parametric model)
gam3 <- gam(prestige ~ income + education, data=d)        # fully parametric model
gam4 <- gam(prestige ~ s(income, education), data=d)      # bivariate smooth of income and education
Interpreting GAM Results
Formula: prestige ~ s(income)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   46.833      1.098   42.65   <2e-16 ***

Approximate significance of smooth terms:
            edf Ref.df     F p-value
s(income) 2.464  3.065 46.14  <2e-16 ***

R-sq.(adj) = 0.585   Deviance explained = 59.5%
GCV = 127.31   Scale est. = 122.98   n = 102

[Plot: linear model (red) vs. GAM (green) fits of Prestige vs. Average Income]

• Question: Based on GAM, we can learn that…
  A. There is a positive relationship between Income and Prestige
  B. The relationship between Income and Prestige is nonlinear
  C. The rate of Prestige-increase levels off after Income values of 15,000 or higher
  D. All of the above
  E. None of the above
Overfitting
• How to detect overfitting:
  • Split the data set into train and test data; build the regression model using the train data and examine the model against the test data.
  • K-fold cross-validation.
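A sketch of the train/test approach for the Prestige model (the 70/30 split and variable names are illustrative choices, not prescriptions from the slides):

```r
set.seed(42)  # reproducible split
n <- nrow(d)
train_idx <- sample(n, round(0.7 * n))
train <- d[train_idx, ]
test  <- d[-train_idx, ]

m <- lm(prestige ~ log(income) + education, data=train)

# Out-of-sample RMSE much larger than in-sample RMSE signals overfitting
rmse_train <- sqrt(mean(residuals(m)^2))
rmse_test  <- sqrt(mean((test$prestige - predict(m, newdata=test))^2))
```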
Key Takeaways
• When do we need non-linear models?
  • When the linearity assumption of linear models fails.
• How would you know if the linearity assumption is invalid?
  • Start with a scatterplot.
  • But the residual plot is more definitive.
• How do we handle non-linear relationships?
  • By transforming non-linear data to linear data using transformations:
    • Log and exponential transformations.
    • Changes the interpretation of regression coefficients (proportional change, not unit change).
  • By using true non-linear methods:
    • Quadratic models.
    • Piecewise polynomials, splines, and smoothing splines (gam).
    • Regression trees.