
Lecture 6: Regressions

Dr. Joshua Huang (黄哲学)

Shenzhen Institutes of Advanced Technology
Chinese Academy of Sciences
Agenda

• Introduction to Regression
• Simple Linear Regression
• Multivariate Linear Regression
• Logistic Regression
• Nonlinear Regression
Univariate Regression

• Given a set of two-dimensional points, find a straight line or a curve Y = f(X) that best fits the points
• Given a value of X, f(X) can be used to estimate Y

[Figure: two scatter plots of Y = f(X) against X, one fitted with a linear function and one with a nonlinear function]
Multivariate Regression

• Given:
– a large number of data records with m attributes (variables)
– one attribute (variable) Y to be predicted from the others
– a function form Y = f(X1, X2, …, Xm-1)
• Goal:
– learn a specific function from the training data and use it to predict the value Y of new records
– the learned function is called the regression model
Regression Techniques

• Linear regression attempts to predict the value of a continuous target as a linear function of one or more independent inputs
• Nonlinear regression attempts to predict the value of a continuous target as a nonlinear function of one or more independent inputs
• Logistic regression attempts to predict the probability that a binary or ordinal target will acquire the event of interest as a function of one or more independent inputs
Regression Models

• Regression model:

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon$$

• Not appropriate for a dichotomous response variable, i.e., when Y is either 1 or 0.
• Logistic regression model:

$$P = P(Y = 1 \mid X_1, X_2, \ldots, X_n)$$

$$\ln\frac{P}{1-P} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n$$

$$P = \frac{e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n}}{1 + e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n}}$$
How Do We Decide Which Model?
• Look at the target variable
– If the target variable is continuous, then linear regression is a good
place to start. If linear regression is not satisfactory, try nonlinear
regression.
– If the target variable is dichotomous, then logistic regression is
better.

Issues

• Determining the model form
• Learning the model
• Training data
• Non-linearity
Simple Linear Regression and Correlation

• Quantifying the relationship between two continuous variables
• Predicting (or forecasting) the value of one variable from knowledge of the value of another variable
Simple linear regression
• In simple linear regression we generate an equation to
calculate the value of a dependent variable (Y) from an
independent variable (X)

Regression Model

• Say you drive to work at an average of 60 km/hour. It takes about 1 minute for every kilometre travelled…
• This is a mathematical model that represents the relationship between the two variables:

Time = 1 × distance

[Figure: time taken (minutes) against distance travelled (km), a straight line through the origin]
Regression Model

• If you take some time to walk to your car and then walk from the car to work, this takes an extra 3 minutes per day. The model becomes

Time = 3 + 1 × distance

[Figure: time taken (minutes) against distance travelled (km), a straight line with intercept 3]
Regression Model

• If the travel time for each kilometre is not exact because of traffic, roadworks, etc., the model becomes

Time = 3 + 1 × distance + random effect

[Figure: time taken (minutes) against distance travelled (km), points scattered around the line]
Bi-variate Linear Regression Model

• In general, the bi-variate regression equation takes the form:

$$y = \beta_0 + \beta_1 x + e$$

• y = the dependent variable
• x = the independent variable
• $\beta_0$ = the y-intercept
• $\beta_1$ = the slope of the line
• e = random error term
Line of Best Fit

• Given a data set, we need to find a way of calculating the parameters of the equation

[Figure: scatter plot with several candidate lines, each marked "?"]

• We need to find the line of best fit
Line of Best Fit

• Because the line will seldom fit the data precisely, there is always some error associated with our line
• The line of best fit is the line that minimises the spread of these errors

[Figure: scatter plot with fitted line $\hat{y}$ and a residual $(y_i - \hat{y}_i)$ marked]
Error Term

• The term $(y_i - \hat{y}_i)$ is known as the error or residual:

$$e_i = y_i - \hat{y}_i$$

• The line of best fit occurs when the Sum of the Squared Errors is minimised:

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Estimates of Parameters

• The slope of the line:

$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_x}$$

• where

$$SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

• and

$$SS_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$$
Estimates of Parameters

• The y-intercept:

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

• where

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \qquad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Example

X (kilos)   Y (cost $)
17          132
21          150
35          160
39          162
50          149
65          170

$$\bar{x} = 37.83 \qquad \bar{y} = 153.83 \qquad SS_{xy} = 891.83 \qquad SS_x = 1612.83$$
Example

$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_x} = \frac{891.83}{1612.83} = 0.553$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 153.83 - 0.553 \times 37.83 = 132.91$$

And the equation is

$$\hat{y} = 132.91 + 0.553\, x$$
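The hand calculation above is easy to reproduce numerically; the following is a small sketch (Python/NumPy chosen for illustration, not part of the original slides):

```python
import numpy as np

x = np.array([17, 21, 35, 39, 50, 65])        # kilos
y = np.array([132, 150, 160, 162, 149, 170])  # cost ($)

ss_xy = np.sum((x - x.mean()) * (y - y.mean()))  # SS_xy ~= 891.83
ss_x = np.sum((x - x.mean()) ** 2)               # SS_x  ~= 1612.83

beta1 = ss_xy / ss_x                  # slope     ~= 0.553
beta0 = y.mean() - beta1 * x.mean()   # intercept ~= 132.91
print(f"y = {beta0:.2f} + {beta1:.3f}x")
```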
Interpret Parameter Estimates

• In the previous example, the estimate of the slope $\hat{\beta}_1$ was 0.553. This means that for every change in X of 1 kilo, there will be a change in Y of 0.553 dollars.
Interpret the Parameter Estimates

• $\hat{\beta}_0$ is the y-intercept, i.e., the point at which the line crosses the y-axis: in this case $132.91.
• It is the value of Y when X = 0.
Extrapolation

• Extrapolation is when you extend the meaning of the equation outside the bounds of the data.
• In the previous example, the X values ranged between 17 and 65 kilos. It would therefore be unwise to make a comment on the relationship outside this range.
Other Regression Analyses

• The above procedure is called point estimation, using the least squares estimation (LSE) method to find the parameters $\beta_0$ and $\beta_1$ of the linear regression model
• Other analyses include
– a scatter plot of Y against X to check linearity
– confidence intervals for $\beta_0$ and $\beta_1$
– statistical tests
– correlation analysis
Multiple Regression

• More input variables (predictors), e.g.:
– age, gender
– education, income
– race
Multiple Linear Regression

• Every input variable should have an impact on the outcome
– Its coefficient estimates how much the outcome would change if this predictor increased by one unit and all the rest stayed constant
• Minimal co-linearity
– Input variables (predictors) should not be too tightly dependent on each other. If two variables are perfectly related, when one is held constant, the other is as well.
– Need to check this assumption first!
Multivariate Linear Regression

• Relation between one continuous variable y and a set of continuous variables X = (x1, x2, …, xk):

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

• Given a set of data records about X and y, we can calculate the coefficients $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$
• After $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are known, given a new record about X, we can compute the corresponding value of y; for example, estimating the annual income of a customer.
Assumptions for Multivariate Linear Regression

• Notation: the (k+1)-variable population {(Y, X1, …, Xk)} is the study population.

• Population assumption 1
– The mean $\mu_Y$ of the subpopulation of Y values with X1 = x1, …, Xk = xk is

$$\mu_Y(x_1, \ldots, x_k) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$

– This indicates that any given input record (X1 = x1, …, Xk = xk) is related to a set of Y values, and the value from the regression model is the mean of those Y values.
Assumptions for Multivariate Linear Regression

• Population assumption 2
– The standard deviation of the Y values is the same for any input record (X1 = x1, …, Xk = xk).
• Population assumption 3
– Each subpopulation of Y values has a Gaussian distribution
– The study population {(Y, X1, …, Xk)} is a (k+1)-variable Gaussian population
• Population assumption 4
– All sample data are obtained by simple random sampling
• Population assumption 5
– All sample values yi, xi,1, …, xi,k for i = 1, …, n are observed without error.
Sample Data

Y     X1     X2     …     Xk
y1    x1,1   x1,2   …     x1,k
y2    x2,1   x2,2   …     x2,k
⋮     ⋮      ⋮            ⋮
yi    xi,1   xi,2   …     xi,k
⋮     ⋮      ⋮            ⋮
yn    xn,1   xn,2   …     xn,k
Least Square Estimation Method

• Given sample data and the regression formula

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$

• we write the estimated regression formula as

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k$$

• Using the estimated regression formula, we compute a set of values $\{\hat{y}_1, \ldots, \hat{y}_i, \ldots, \hat{y}_n\}$. Comparing these with the corresponding real values $\{y_1, \ldots, y_i, \ldots, y_n\}$, we can compute the errors

$$\hat{e}_i = y_i - \hat{y}_i = y_i - [\hat{\beta}_0 + \hat{\beta}_1 x_{i,1} + \cdots + \hat{\beta}_k x_{i,k}]$$
Least Square Estimation Method

• The least squares estimates $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k$ are chosen in such a way that they minimize the sum of squared errors

$$SSE = \sum_{i=1}^{n} \hat{e}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i,1} - \cdots - \hat{\beta}_k x_{i,k})^2$$

• How do we compute $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k$ to minimize the above function?
Least Square Estimation: Matrix Representation

• Let $\{y_1, \ldots, y_i, \ldots, y_n\}$ be represented as an (n × 1) vector y and $\{\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k\}$ as a ((k+1) × 1) vector $\hat{\beta}$. Let X be an (n × (k+1)) matrix:

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \qquad X = \begin{bmatrix} 1 & x_{1,1} & x_{1,2} & \cdots & x_{1,k} \\ 1 & x_{2,1} & x_{2,2} & \cdots & x_{2,k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{i,1} & x_{i,2} & \cdots & x_{i,k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n,1} & x_{n,2} & \cdots & x_{n,k} \end{bmatrix} \qquad \hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{bmatrix}$$
Least Square Estimation: Matrix Representation

• In matrix representation, the estimated regression formula

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k$$

can be represented as

$$\hat{y} = X\hat{\beta}$$

• If we write the errors $\{\hat{e}_1, \hat{e}_2, \ldots, \hat{e}_n\}$ as an (n × 1) vector $\hat{e}$, then

$$\hat{e} = y - \hat{y} = y - X\hat{\beta}$$
Least Square Estimation: Matrix Representation

• Given

$$y = X\hat{\beta} + \hat{e}$$

where $y = \{y_1, \ldots, y_i, \ldots, y_n\}$, minimizing the SSE leads to the normal equations

$$X^T X \hat{\beta} = X^T y$$

• and therefore

$$\hat{\beta} = (X^T X)^{-1} X^T y$$
Example

• Let

$$y = \begin{bmatrix} 6 \\ 9 \\ 12 \\ 5 \\ 13 \\ 2 \end{bmatrix} \qquad X = \begin{bmatrix} 1 & 3 & 9 & 16 \\ 1 & 6 & 13 & 13 \\ 1 & 4 & 3 & 17 \\ 1 & 8 & 2 & 10 \\ 1 & 3 & 4 & 9 \\ 1 & 2 & 4 & 7 \end{bmatrix} \qquad \hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix}$$

• Then

$$X^T = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 3 & 6 & 4 & 8 & 3 & 2 \\ 9 & 13 & 3 & 2 & 4 & 4 \\ 16 & 13 & 17 & 10 & 9 & 7 \end{bmatrix}$$

$$X^T X = \begin{bmatrix} 6 & 26 & 35 & 72 \\ 26 & 138 & 153 & 315 \\ 35 & 153 & 295 & 448 \\ 72 & 315 & 448 & 944 \end{bmatrix} \qquad X^T y = \begin{bmatrix} 47 \\ 203 \\ 277 \\ 598 \end{bmatrix}$$

• So

$$\hat{\beta} = (X^T X)^{-1} X^T y = \begin{bmatrix} 2.59578 & -0.15375 & -0.01962 & -0.13737 \\ -0.15375 & 0.03965 & -0.00014 & -0.00144 \\ -0.01962 & -0.00014 & 0.01234 & -0.00431 \\ -0.13737 & -0.00144 & -0.00431 & 0.01406 \end{bmatrix} \begin{bmatrix} 47 \\ 203 \\ 277 \\ 598 \end{bmatrix} = \begin{bmatrix} 3.20975 \\ -0.07573 \\ -0.11162 \\ 0.46691 \end{bmatrix}$$

Thus $\hat{\beta}_0 = 3.20975$, $\hat{\beta}_1 = -0.07573$, $\hat{\beta}_2 = -0.11162$, $\hat{\beta}_3 = 0.46691$, and

$$\hat{y} = 3.20975 - 0.07573\, x_1 - 0.11162\, x_2 + 0.46691\, x_3$$
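The matrix arithmetic above can be checked numerically; here is a minimal verification sketch (Python/NumPy is my choice of language, not part of the slides):

```python
import numpy as np

X = np.array([[1, 3, 9, 16],
              [1, 6, 13, 13],
              [1, 4, 3, 17],
              [1, 8, 2, 10],
              [1, 3, 4, 9],
              [1, 2, 4, 7]], dtype=float)
y = np.array([6, 9, 12, 5, 13, 2], dtype=float)

beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ y)  # (X^T X)^{-1} X^T y
print(beta_hat)  # ~ [ 3.20975 -0.07573 -0.11162  0.46691]

# In practice np.linalg.lstsq solves the same problem more stably
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```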
Logistic Regression

• Models the relationship between a set of variables xi, which may be
– dichotomous (yes/no)
– categorical (social class, ...)
– continuous (age, ...)
and a target variable Y which is dichotomous (binary).

• Examples of dichotomous targets: respond or not respond, risk or no risk, claim or no claim
Example Data

Age and signs of coronary heart disease (CD)

Age  CD     Age  CD     Age  CD
22   0      40   0      54   0
23   0      41   1      55   1
24   0      46   0      58   1
27   0      47   0      60   1
28   0      48   0      60   0
30   0      49   1      62   1
30   0      49   0      65   1
32   0      50   1      67   1
33   0      51   0      71   1
35   1      51   1      77   1
38   0      52   0      81   1
How Can We Analyse These Data?

• Compare the mean age of diseased and non-diseased subjects
– Non-diseased: 38.6 years
– Diseased: 58.7 years

[Figure: dot plot of signs of coronary disease (yes/no) against age in years]
Group the Data

Prevalence (%) of signs of CD according to age group

Age group   # in group   # diseased   % diseased
20-29       5            0            0
30-39       6            1            17
40-49       7            2            29
50-59       7            4            57
60-69       5            4            80
70-79       2            2            100
80-89       1            1            100
Dot-plot: Grouping Data

[Figure: % diseased by age group, rising from 0% in the youngest group to 100% in the oldest]
Logistic Function

$$P(y \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$$

[Figure: S-shaped logistic curve of the probability of disease (0 to 1) against x]
Logistic Transformation

• Take the natural log of the odds in

$$P(y \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$$

• We have

$$\ln\left(\frac{P(y \mid x)}{1 - P(y \mid x)}\right) = \alpha + \beta x$$
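The transformation is straightforward to express in code; this sketch (Python/NumPy, with illustrative coefficient values of my choosing) shows that the logit exactly undoes the logistic function:

```python
import numpy as np

def logistic(x, alpha, beta):
    """P(y|x) = e^(alpha + beta*x) / (1 + e^(alpha + beta*x))"""
    z = alpha + beta * x
    return np.exp(z) / (1.0 + np.exp(z))

def logit(p):
    """ln(p / (1 - p)): maps probabilities in (0, 1) to (-inf, +inf)"""
    return np.log(p / (1.0 - p))

x = np.linspace(-4, 4, 9)
p = logistic(x, alpha=0.5, beta=1.2)         # assumed illustrative coefficients
print(np.allclose(logit(p), 0.5 + 1.2 * x))  # True: logit(P) = alpha + beta*x
```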
Advantage of the Logistic Transformation

• Transforms a nonlinear model into a linear regression model
• The logit ranges between -∞ and +∞
• The probability P is constrained between 0 and 1
• Directly related to the notion of the odds of disease:

$$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta x \quad\Longleftrightarrow\quad \frac{P}{1-P} = e^{\alpha + \beta x}$$
Interpretation of Coefficient β

             Exposure x
Disease y    yes              no
yes          P(y|x=1)         P(y|x=0)
no           1 - P(y|x=1)     1 - P(y|x=0)

• Since $\frac{P}{1-P} = e^{\alpha + \beta x}$:
– Odds of disease among exposed: $e^{\alpha + \beta}$
– Odds of disease among unexposed: $e^{\alpha}$
– Odds ratio: $e^{\alpha + \beta} / e^{\alpha} = e^{\beta}$
Interpretation of Coefficient β

• β = the increase in the logarithm of the odds ratio for a one-unit increase in x
• Example: risk of developing coronary heart disease (CD) by age (<55 and 55+)

CD            55+ (1)   <55 (0)
Present (1)   21        22
Absent (0)    6         51

Odds of disease among exposed = 21/6
Odds of disease among unexposed = 22/51
Odds ratio = (21/6) / (22/51) = 8.1
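The 8.1 figure follows directly from the table; a quick check, and its link to β as the log odds ratio (Python, values taken from the table above):

```python
import math

odds_exposed = 21 / 6      # odds of disease in the 55+ group
odds_unexposed = 22 / 51   # odds of disease in the <55 group

odds_ratio = odds_exposed / odds_unexposed
print(round(odds_ratio, 1))            # 8.1
print(round(math.log(odds_ratio), 2))  # ~2.09: the corresponding beta (log odds ratio)
```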
Fit Equation to the Data

• Linear regression: least squares
• Logistic regression: maximum likelihood
• Likelihood function
– Estimate the parameters α and β with the property that the likelihood (probability) of the observed data is higher than for any other parameter values
– Practically, it is easier to work with the log-likelihood:

$$L(\beta) = \ln l(\beta) = \sum_{i=1}^{n} \left\{ y_i \ln \pi(x_i) + (1 - y_i) \ln\left[1 - \pi(x_i)\right] \right\}$$
Maximum Likelihood

• Iterative computing (see the sketch below)
– Choose an arbitrary starting value for the coefficients (usually 0)
– Compute the log-likelihood
– Vary the coefficients' values
– Reiterate until maximisation (a plateau)
• Results
– Maximum Likelihood Estimates (MLE) for α and β
– Estimates of P(y) for a given value of x
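As a concrete illustration of the iterative fit, here is a minimal Newton-Raphson sketch on the age/CD data from the earlier table (Python/NumPy; the implementation choices are mine, not from the slides):

```python
import numpy as np

# Age and CD status from the example data table
age = np.array([22, 23, 24, 27, 28, 30, 30, 32, 33, 35, 38,
                40, 41, 46, 47, 48, 49, 49, 50, 51, 51, 52,
                54, 55, 58, 60, 60, 62, 65, 67, 71, 77, 81], dtype=float)
cd = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
               0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
               0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1], dtype=float)

X = np.column_stack([np.ones_like(age), age])  # columns: intercept, age
w = np.zeros(2)                                # start with alpha = beta = 0

for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))          # current P(y=1 | x)
    gradient = X.T @ (cd - p)                   # gradient of the log-likelihood
    hessian = -(X.T * (p * (1.0 - p))) @ X      # Hessian of the log-likelihood
    w = w - np.linalg.solve(hessian, gradient)  # Newton-Raphson update
print(w)  # maximum likelihood estimates [alpha_hat, beta_hat]
```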
Multiple Logistic Regression

• More than one independent variable
– dichotomous, ordinal, nominal, continuous, …

$$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_i x_i$$

• Interpretation of $\beta_i$
– The increase in log-odds for a one-unit increase in $x_i$ with all other $x_j$'s held constant
– Measures the association between $x_i$ and the log-odds, adjusted for all other $x_j$
Effect Modification

• Effect modification
– Can be modelled by including interaction terms, as sketched below:

$$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 (x_1 \times x_2)$$
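A minimal sketch of how an interaction term enters the design matrix (Python/NumPy; the x1 and x2 values are hypothetical):

```python
import numpy as np

x1 = np.array([0, 1, 0, 1, 1])       # hypothetical exposure indicator
x2 = np.array([40, 52, 61, 45, 70])  # hypothetical ages

# Append the product column x1*x2; its coefficient beta_3
# models how the effect of x1 changes with x2
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
print(X)
```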
Coding of Variables

• Dichotomous variables: yes = 1, no = 0
• Continuous variables
– The coefficient gives the increase in OR for a one-unit change in the exposure variable
– The logistic model is multiplicative: the OR increases exponentially with x
• Nominal variables, or ordinal variables with unequal classes:
– Tobacco smoked: no = 0, grey = 1, brown = 2, blond = 3
Indicator Variables: Type of Tobacco

Tobacco        Dummy variables
consumption    Dark   Light   Both
Dark           1      0       0
Light          0      1       0
Both           0      0       1
None           0      0       0

• Neutralises the artificial hierarchy between classes in the variable "type of tobacco"
• No assumptions made
• 3 variables (3 df) in the model, all using the same reference class (see the sketch below)
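A minimal dummy-coding sketch (Python/NumPy; the sample records are hypothetical, with None as the reference class):

```python
import numpy as np

categories = ["Dark", "Light", "Both"]               # "None" is the reference class
records = ["Dark", "None", "Both", "Light", "None"]  # hypothetical observations

# One indicator (dummy) column per non-reference category
dummies = np.array([[1 if r == c else 0 for c in categories] for r in records])
print(dummies)
# [[1 0 0]    Dark
#  [0 0 0]    None (all zeros: the reference)
#  [0 0 1]    Both
#  [0 1 0]    Light
#  [0 0 0]]   None
```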
Linear versus Nonlinear Regression Models

A linear regression model is linear in the parameters. That is, there is only one parameter in each term of the model, and each parameter is a multiplicative constant on the independent variable(s) of that term.

A nonlinear model is nonlinear in the parameters.
Examples of Linear Models

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$$

$$Y = \beta_0 + \beta_1 \ln(X) + \varepsilon$$

$$Y = e^{\beta_0 + \beta_1 X} \varepsilon \quad \text{(linear after taking logarithms)}$$
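Because each of these models is linear in the parameters, ordinary least squares still applies after transforming the regressor. A sketch for the third model, $Y = \beta_0 + \beta_1 \ln(X) + \varepsilon$ (Python/NumPy, with synthetic data and assumed true values $\beta_0 = 2$, $\beta_1 = 3$):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 50, 40)
y = 2.0 + 3.0 * np.log(x) + rng.normal(0, 0.3, x.size)  # synthetic data

# The model is linear in beta0, beta1, so OLS on [1, ln x] works directly
A = np.column_stack([np.ones_like(x), np.log(x)])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)  # ~ [2.0, 3.0]
```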
Examples of Nonlinear Models

$$Y = \beta_1 e^{\beta_2 X} + \varepsilon$$

$$Y = \beta_1 + (\beta_2 - \beta_1) e^{\beta_3 X} + \varepsilon$$

$$Y = \beta_1 X^{\beta_2} + \varepsilon$$
Example of Nonlinear Model

$$Y = \beta_1 (X - \beta_2)^2 + \beta_3 + \varepsilon$$

Examples of Nonlinear Models

$$Y = \beta_1 X^{\beta_2} + \varepsilon$$

$$Y = \beta_1 + (\beta_2 - \beta_1)\, e^{-(\beta_3 X)^{\beta_4}} + \varepsilon$$
Estimation Techniques for Nonlinear Regression

Because nonlinear models cannot be solved explicitly, iterative numerical methods must be used to estimate the parameters (a Gauss-Newton sketch follows the list). The methods available in the NLIN procedure are

• steepest-descent or gradient
• Newton
• modified Gauss-Newton
• Marquardt
• multivariate secant or false position.
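To illustrate one of the listed methods, here is a minimal Gauss-Newton sketch for the model $Y = \beta_1 e^{\beta_2 X}$ (Python/NumPy with synthetic data; this is my illustrative implementation, not the NLIN procedure itself):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 5, 25)
y = 2.0 * np.exp(0.4 * x) + rng.normal(0, 0.2, x.size)  # synthetic data

b1, b2 = 1.5, 0.3  # starting values: every nonlinear fit requires them
for _ in range(50):
    f = b1 * np.exp(b2 * x)                          # model predictions
    r = y - f                                        # residuals
    J = np.column_stack([np.exp(b2 * x),             # df/d(b1)
                         b1 * x * np.exp(b2 * x)])   # df/d(b2)
    step, *_ = np.linalg.lstsq(J, r, rcond=None)     # Gauss-Newton step
    b1, b2 = b1 + step[0], b2 + step[1]
print(b1, b2)  # should approach 2.0 and 0.4
```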
Model Specification

For each nonlinear model to be analyzed, we must specify

• the model equation
• starting values of the parameters to be estimated.
Potential Lack of Convergence of Nonlinear Estimates

Convergence may not be obtained under certain conditions. These include

• incorrect specification of the model
• poor initial starting values
• an overdefined model
• insufficient data.
Thank You
