
Lecture 6: Regressions

Dr. Joshua Huang (黄哲学)

Shenzhen Institutes of Advanced Technology
Chinese Academy of Sciences
Agenda

• Introduction to Regression
• Simple Linear Regression
• Multivariate Linear Regression
• Logistic Regression
• Nonlinear Regression
Univariate Regression

• Given a set of two-dimensional points, find a straight line or a curve Y = f(X) that best fits the points
• Given a value of X, f(X) can be used to estimate Y

[Figure: two scatter plots of Y = f(X) against X, one fitted with a linear function and one with a nonlinear function]
Multivariate Regression

• Given:
– a large number of data records with m attributes (variables)
– one attribute (variable) Y to be predicted from the others
– a function form Y = f(X1, X2, …, Xm-1)
• Goal:
– learn a specific function from the training data and use it to predict the value Y of new records
– the learned function is called the regression model
Regression Techniques

• Linear regression attempts to predict the value of a continuous target as a linear function of one or more independent inputs
• Nonlinear regression attempts to predict the value of a continuous target as a nonlinear function of one or more independent inputs
• Logistic regression attempts to predict the probability that a binary or ordinal target will acquire the event of interest as a function of one or more independent inputs
Regression Models

• Regression model:

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon$$

• Not appropriate for a dichotomous response variable, i.e., when Y is either 1 or 0.
• Logistic regression model:

$$P = P(Y = 1 \mid X_1, X_2, \ldots, X_n)$$

$$\ln\frac{P}{1-P} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n$$

$$P = \frac{e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n}}{1 + e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n}}$$
How Do We Decide Which Model?
• Look at the target variable
– If the target variable is continuous, then linear regression is a good
place to start. If linear regression is not satisfactory, try nonlinear
regression.
– If the target variable is dichotomous, then logistic regression is
better.

Issues

• Determining the model form
• Learning the model
• Training data
• Non-linearity
Simple Linear Regression and Correlation

• Quantifying the relationship between two continuous variables
• Predicting (or forecasting) the value of one variable from knowledge of the value of another variable
Simple linear regression
• In simple linear regression we generate an equation to
calculate the value of a dependent variable (Y) from an
independent variable (X)

Regression Model

• Say you drive to work at an average of 60 km/hour. It takes about 1 minute for every kilometre travelled…
• This is a mathematical model that represents the relationship between the two variables:

Time = 1 × distance

[Figure: time taken (minutes) against distance travelled (km), a straight line through the origin]
Regression Model

• If you take some time to walk to your car and then walk from the car to work, this takes an extra 3 minutes per day. The model becomes

Time = 3 + 1 × distance

[Figure: time taken (minutes) against distance travelled (km), a straight line with intercept 3]
Regression Model

• If the travel time for each kilometre is not exact because of traffic, roadworks, etc., the model becomes

Time = 3 + 1 × distance + random effect

[Figure: time taken (minutes) against distance travelled (km), points scattered around the line]
Bi-variate Linear Regression Model

• In general, the bi-variate regression equation takes the form:

$$y = \beta_0 + \beta_1 x + e$$

• y = the dependent variable
• x = the independent variable
• $\beta_0$ = the y-intercept
• $\beta_1$ = the slope of the line
• e = random error term
Line of Best Fit

• Given a data set, we need to find a way of calculating the parameters of the equation

[Figure: scatter plot with several candidate lines, each marked "?"]

• We need to find the line of best fit
Line of Best Fit

• Because the line will seldom fit the data precisely, there is always some error associated with our line
• The line of best fit is the line that minimises the spread of these errors

[Figure: scatter plot with fitted line $\hat{y}$ and a residual $(y_i - \hat{y}_i)$ marked]
Error Term

• The term $(y_i - \hat{y}_i)$ is known as the error or residual:

$$e_i = y_i - \hat{y}_i$$

• The line of best fit occurs when the Sum of the Squared Errors is minimised:

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Estimates of Parameters

• The slope of the line:

$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_x}$$

• where

$$SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

• and

$$SS_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$$
Estimates of Parameters

• The y-intercept:

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

• where

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \qquad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Example

X (kilos)   Y (cost $)
17          132
21          150
35          160
39          162
50          149
65          170

$$\bar{x} = 37.83 \qquad \bar{y} = 153.83 \qquad SS_{xy} = 891.83 \qquad SS_x = 1612.83$$
Example

$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_x} = \frac{891.83}{1612.83} = 0.553$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 153.83 - 0.553 \times 37.83 = 132.91$$

And the equation is

$$\hat{y} = 132.91 + 0.553\, x$$
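The hand calculation above is easy to reproduce numerically; the following is a small sketch (Python/NumPy chosen for illustration, not part of the original slides):

```python
import numpy as np

x = np.array([17, 21, 35, 39, 50, 65])        # kilos
y = np.array([132, 150, 160, 162, 149, 170])  # cost ($)

ss_xy = np.sum((x - x.mean()) * (y - y.mean()))  # SS_xy ~= 891.83
ss_x = np.sum((x - x.mean()) ** 2)               # SS_x  ~= 1612.83

beta1 = ss_xy / ss_x                  # slope     ~= 0.553
beta0 = y.mean() - beta1 * x.mean()   # intercept ~= 132.91
print(f"y = {beta0:.2f} + {beta1:.3f}x")
```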
Interpret Parameter Estimates

• In the previous example, the estimate of the slope $\hat{\beta}_1$ was 0.553. This means that for every change in X of 1 kilo, there will be a change in Y of 0.553 dollars.
Interpret the Parameter Estimates

• $\hat{\beta}_0$ is the y-intercept, i.e., the point at which the line crosses the y-axis: in this case $132.91.
• It is the value of Y when X = 0.
Extrapolation

• Extrapolation is when you extend the meaning of the equation outside the bounds of the data.
• In the previous example, the X values ranged between 17 and 65 kilos. It would therefore be unwise to make a comment on the relationship outside this range.
Other Regression Analyses

• The above procedure is called point estimation, using the least squares estimation (LSE) method to find the parameters $\beta_0$ and $\beta_1$ of the linear regression model
• Other analyses include
– a scatter plot of Y against X to check linearity
– confidence intervals for $\beta_0$ and $\beta_1$
– statistical tests
– correlation analysis
Multiple Regression

• More input variables (predictors), e.g.:
– age, gender
– education, income
– race
Multiple Linear Regression

• Every input variable should have an impact on the outcome
– Its coefficient estimates how much the outcome would change if this predictor increased by one unit and all the rest stayed constant
• Minimal co-linearity
– Input variables (predictors) should not be too tightly dependent on each other. If two variables are perfectly related, when one is held constant, the other is as well.
– Need to check this assumption first!
Multivariate Linear Regression

• Relation between one continuous variable y and a set of continuous variables X = (x1, x2, …, xk):

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

• Given a set of data records about X and y, we can calculate the coefficients $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$
• After $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are known, given a new record about X, we can compute the corresponding value of y; for example, estimating the annual income of a customer.
Assumptions for Multivariate Linear Regression

• Notation: the (k+1)-variable population {(Y, X1, …, Xk)} is the study population.

• Population assumption 1
– The mean $\mu_Y$ of the subpopulation of Y values with X1 = x1, …, Xk = xk is

$$\mu_Y(x_1, \ldots, x_k) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$

– This indicates that any given input record (X1 = x1, …, Xk = xk) is related to a set of Y values, and the value from the regression model is the mean of those Y values.
Assumptions for Multivariate Linear Regression

• Population assumption 2
– The standard deviation of the Y values is the same for any input record (X1 = x1, …, Xk = xk).
• Population assumption 3
– Each subpopulation of Y values has a Gaussian distribution
– The study population {(Y, X1, …, Xk)} is a (k+1)-variable Gaussian population
• Population assumption 4
– All sample data are obtained by simple random sampling
• Population assumption 5
– All sample values yi, xi,1, …, xi,k for i = 1, …, n are observed without error.
Sample Data

Y     X1     X2     …     Xk
y1    x1,1   x1,2   …     x1,k
y2    x2,1   x2,2   …     x2,k
⋮     ⋮      ⋮            ⋮
yi    xi,1   xi,2   …     xi,k
⋮     ⋮      ⋮            ⋮
yn    xn,1   xn,2   …     xn,k
Least Square Estimation Method

• Given sample data and the regression formula

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$

• we write the estimated regression formula as

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k$$

• Using the estimated regression formula, we compute a set of values $\{\hat{y}_1, \ldots, \hat{y}_i, \ldots, \hat{y}_n\}$. Comparing these with the corresponding real values $\{y_1, \ldots, y_i, \ldots, y_n\}$, we can compute the errors

$$\hat{e}_i = y_i - \hat{y}_i = y_i - [\hat{\beta}_0 + \hat{\beta}_1 x_{i,1} + \cdots + \hat{\beta}_k x_{i,k}]$$
Least Square Estimation Method

• The least squares estimates $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k$ are chosen in such a way that they minimize the sum of squared errors

$$SSE = \sum_{i=1}^{n} \hat{e}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i,1} - \cdots - \hat{\beta}_k x_{i,k})^2$$

• How do we compute $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k$ to minimize the above function?
Least Square Estimation: Matrix Representation

• Let $\{y_1, \ldots, y_i, \ldots, y_n\}$ be represented as an (n × 1) vector y and $\{\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k\}$ as a ((k+1) × 1) vector $\hat{\beta}$. Let X be an (n × (k+1)) matrix:

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \qquad X = \begin{bmatrix} 1 & x_{1,1} & x_{1,2} & \cdots & x_{1,k} \\ 1 & x_{2,1} & x_{2,2} & \cdots & x_{2,k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{i,1} & x_{i,2} & \cdots & x_{i,k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n,1} & x_{n,2} & \cdots & x_{n,k} \end{bmatrix} \qquad \hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{bmatrix}$$
Least Square Estimation: Matrix Representation

• In matrix representation, the estimated regression formula

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k$$

can be represented as

$$\hat{y} = X\hat{\beta}$$

• If we write the errors $\{\hat{e}_1, \hat{e}_2, \ldots, \hat{e}_n\}$ as an (n × 1) vector $\hat{e}$, then

$$\hat{e} = y - \hat{y} = y - X\hat{\beta}$$
Least Square Estimation: Matrix Representation

• Given

$$y = X\hat{\beta} + \hat{e}$$

where $y = \{y_1, \ldots, y_i, \ldots, y_n\}$, minimizing the SSE leads to the normal equations

$$X^T X \hat{\beta} = X^T y$$

• and therefore

$$\hat{\beta} = (X^T X)^{-1} X^T y$$
Example

• Let

$$y = \begin{bmatrix} 6 \\ 9 \\ 12 \\ 5 \\ 13 \\ 2 \end{bmatrix} \qquad X = \begin{bmatrix} 1 & 3 & 9 & 16 \\ 1 & 6 & 13 & 13 \\ 1 & 4 & 3 & 17 \\ 1 & 8 & 2 & 10 \\ 1 & 3 & 4 & 9 \\ 1 & 2 & 4 & 7 \end{bmatrix} \qquad \hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix}$$

• Then

$$X^T = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 3 & 6 & 4 & 8 & 3 & 2 \\ 9 & 13 & 3 & 2 & 4 & 4 \\ 16 & 13 & 17 & 10 & 9 & 7 \end{bmatrix}$$

$$X^T X = \begin{bmatrix} 6 & 26 & 35 & 72 \\ 26 & 138 & 153 & 315 \\ 35 & 153 & 295 & 448 \\ 72 & 315 & 448 & 944 \end{bmatrix} \qquad X^T y = \begin{bmatrix} 47 \\ 203 \\ 277 \\ 598 \end{bmatrix}$$

• So

$$\hat{\beta} = (X^T X)^{-1} X^T y = \begin{bmatrix} 2.59578 & -0.15375 & -0.01962 & -0.13737 \\ -0.15375 & 0.03965 & -0.00014 & -0.00144 \\ -0.01962 & -0.00014 & 0.01234 & -0.00431 \\ -0.13737 & -0.00144 & -0.00431 & 0.01406 \end{bmatrix} \begin{bmatrix} 47 \\ 203 \\ 277 \\ 598 \end{bmatrix} = \begin{bmatrix} 3.20975 \\ -0.07573 \\ -0.11162 \\ 0.46691 \end{bmatrix}$$

Thus $\hat{\beta}_0 = 3.20975$, $\hat{\beta}_1 = -0.07573$, $\hat{\beta}_2 = -0.11162$, $\hat{\beta}_3 = 0.46691$, and

$$\hat{y} = 3.20975 - 0.07573\, x_1 - 0.11162\, x_2 + 0.46691\, x_3$$
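The matrix arithmetic above can be checked numerically; here is a minimal verification sketch (Python/NumPy is my choice of language, not part of the slides):

```python
import numpy as np

X = np.array([[1, 3, 9, 16],
              [1, 6, 13, 13],
              [1, 4, 3, 17],
              [1, 8, 2, 10],
              [1, 3, 4, 9],
              [1, 2, 4, 7]], dtype=float)
y = np.array([6, 9, 12, 5, 13, 2], dtype=float)

beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ y)  # (X^T X)^{-1} X^T y
print(beta_hat)  # ~ [ 3.20975 -0.07573 -0.11162  0.46691]

# In practice np.linalg.lstsq solves the same problem more stably
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```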
Logistic Regression

• Models the relationship between a set of variables xi, which may be
– dichotomous (yes/no)
– categorical (social class, ...)
– continuous (age, ...)
and a target variable Y which is dichotomous (binary).

• Examples of dichotomous targets: respond or not respond, risk or no risk, claim or no claim
Example Data

Age and signs of coronary heart disease (CD)

Age  CD     Age  CD     Age  CD
22   0      40   0      54   0
23   0      41   1      55   1
24   0      46   0      58   1
27   0      47   0      60   1
28   0      48   0      60   0
30   0      49   1      62   1
30   0      49   0      65   1
32   0      50   1      67   1
33   0      51   0      71   1
35   1      51   1      77   1
38   0      52   0      81   1
How Can We Analyse These Data?

• Compare the mean age of diseased and non-diseased subjects
– Non-diseased: 38.6 years
– Diseased: 58.7 years

[Figure: dot plot of signs of coronary disease (yes/no) against age in years]
Group the Data

Prevalence (%) of signs of CD according to age group

Age group   # in group   # diseased   % diseased
20-29       5            0            0
30-39       6            1            17
40-49       7            2            29
50-59       7            4            57
60-69       5            4            80
70-79       2            2            100
80-89       1            1            100
Dot-plot: Grouping Data

[Figure: % diseased by age group, rising from 0% in the youngest group to 100% in the oldest]
Logistic Function

$$P(y \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$$

[Figure: S-shaped logistic curve of the probability of disease (0 to 1) against x]
Logistic Transformation

• Take the natural log of the odds in

$$P(y \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$$

• We have

$$\ln\left(\frac{P(y \mid x)}{1 - P(y \mid x)}\right) = \alpha + \beta x$$
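The transformation is straightforward to express in code; this sketch (Python/NumPy, with illustrative coefficient values of my choosing) shows that the logit exactly undoes the logistic function:

```python
import numpy as np

def logistic(x, alpha, beta):
    """P(y|x) = e^(alpha + beta*x) / (1 + e^(alpha + beta*x))"""
    z = alpha + beta * x
    return np.exp(z) / (1.0 + np.exp(z))

def logit(p):
    """ln(p / (1 - p)): maps probabilities in (0, 1) to (-inf, +inf)"""
    return np.log(p / (1.0 - p))

x = np.linspace(-4, 4, 9)
p = logistic(x, alpha=0.5, beta=1.2)         # assumed illustrative coefficients
print(np.allclose(logit(p), 0.5 + 1.2 * x))  # True: logit(P) = alpha + beta*x
```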
Advantage of the Logistic Transformation

• Transforms a nonlinear model into a linear regression model
• The logit ranges between -∞ and +∞
• The probability P is constrained between 0 and 1
• Directly related to the notion of the odds of disease:

$$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta x \quad\Longleftrightarrow\quad \frac{P}{1-P} = e^{\alpha + \beta x}$$
Interpretation of Coefficient β

             Exposure x
Disease y    yes              no
yes          P(y|x=1)         P(y|x=0)
no           1 - P(y|x=1)     1 - P(y|x=0)

• Since $\frac{P}{1-P} = e^{\alpha + \beta x}$:
– Odds of disease among exposed: $e^{\alpha + \beta}$
– Odds of disease among unexposed: $e^{\alpha}$
– Odds ratio: $e^{\alpha + \beta} / e^{\alpha} = e^{\beta}$
Interpretation of Coefficient β

• β = the increase in the logarithm of the odds ratio for a one-unit increase in x
• Example: risk of developing coronary heart disease (CD) by age (<55 and 55+)

CD            55+ (1)   <55 (0)
Present (1)   21        22
Absent (0)    6         51

Odds of disease among exposed = 21/6
Odds of disease among unexposed = 22/51
Odds ratio = (21/6) / (22/51) = 8.1
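The 8.1 figure follows directly from the table; a quick check, and its link to β as the log odds ratio (Python, values taken from the table above):

```python
import math

odds_exposed = 21 / 6      # odds of disease in the 55+ group
odds_unexposed = 22 / 51   # odds of disease in the <55 group

odds_ratio = odds_exposed / odds_unexposed
print(round(odds_ratio, 1))            # 8.1
print(round(math.log(odds_ratio), 2))  # ~2.09: the corresponding beta (log odds ratio)
```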
Fit Equation to the Data

• Linear regression: least squares
• Logistic regression: maximum likelihood
• Likelihood function
– Estimate the parameters α and β with the property that the likelihood (probability) of the observed data is higher than for any other parameter values
– Practically, it is easier to work with the log-likelihood:

$$L(\beta) = \ln l(\beta) = \sum_{i=1}^{n} \left\{ y_i \ln \pi(x_i) + (1 - y_i) \ln\left[1 - \pi(x_i)\right] \right\}$$
Maximum Likelihood

• Iterative computing (see the sketch below)
– Choose an arbitrary starting value for the coefficients (usually 0)
– Compute the log-likelihood
– Vary the coefficients' values
– Reiterate until maximisation (a plateau)
• Results
– Maximum Likelihood Estimates (MLE) for α and β
– Estimates of P(y) for a given value of x
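As a concrete illustration of the iterative fit, here is a minimal Newton-Raphson sketch on the age/CD data from the earlier table (Python/NumPy; the implementation choices are mine, not from the slides):

```python
import numpy as np

# Age and CD status from the example data table
age = np.array([22, 23, 24, 27, 28, 30, 30, 32, 33, 35, 38,
                40, 41, 46, 47, 48, 49, 49, 50, 51, 51, 52,
                54, 55, 58, 60, 60, 62, 65, 67, 71, 77, 81], dtype=float)
cd = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
               0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
               0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1], dtype=float)

X = np.column_stack([np.ones_like(age), age])  # columns: intercept, age
w = np.zeros(2)                                # start with alpha = beta = 0

for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))          # current P(y=1 | x)
    gradient = X.T @ (cd - p)                   # gradient of the log-likelihood
    hessian = -(X.T * (p * (1.0 - p))) @ X      # Hessian of the log-likelihood
    w = w - np.linalg.solve(hessian, gradient)  # Newton-Raphson update
print(w)  # maximum likelihood estimates [alpha_hat, beta_hat]
```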
Multiple Logistic Regression

• More than one independent variable
– dichotomous, ordinal, nominal, continuous, …

$$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_i x_i$$

• Interpretation of $\beta_i$
– The increase in log-odds for a one-unit increase in $x_i$ with all other $x_j$'s held constant
– Measures the association between $x_i$ and the log-odds, adjusted for all other $x_j$
Effect Modification

• Effect modification
– Can be modelled by including interaction terms, as sketched below:

$$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 (x_1 \times x_2)$$
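A minimal sketch of how an interaction term enters the design matrix (Python/NumPy; the x1 and x2 values are hypothetical):

```python
import numpy as np

x1 = np.array([0, 1, 0, 1, 1])       # hypothetical exposure indicator
x2 = np.array([40, 52, 61, 45, 70])  # hypothetical ages

# Append the product column x1*x2; its coefficient beta_3
# models how the effect of x1 changes with x2
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
print(X)
```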
Coding of Variables

• Dichotomous variables: yes = 1, no = 0
• Continuous variables
– The coefficient gives the increase in OR for a one-unit change in the exposure variable
– The logistic model is multiplicative: the OR increases exponentially with x
• Nominal variables, or ordinal variables with unequal classes:
– Tobacco smoked: no = 0, grey = 1, brown = 2, blond = 3
Indicator Variables: Type of Tobacco

Tobacco        Dummy variables
consumption    Dark   Light   Both
Dark           1      0       0
Light          0      1       0
Both           0      0       1
None           0      0       0

• Neutralises the artificial hierarchy between classes in the variable "type of tobacco"
• No assumptions made
• 3 variables (3 df) in the model, all using the same reference class (see the sketch below)
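A minimal dummy-coding sketch (Python/NumPy; the sample records are hypothetical, with None as the reference class):

```python
import numpy as np

categories = ["Dark", "Light", "Both"]               # "None" is the reference class
records = ["Dark", "None", "Both", "Light", "None"]  # hypothetical observations

# One indicator (dummy) column per non-reference category
dummies = np.array([[1 if r == c else 0 for c in categories] for r in records])
print(dummies)
# [[1 0 0]    Dark
#  [0 0 0]    None (all zeros: the reference)
#  [0 0 1]    Both
#  [0 1 0]    Light
#  [0 0 0]]   None
```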
Linear versus Nonlinear Regression Models

A linear regression model is linear in the parameters. That is, there is only one parameter in each term of the model, and each parameter is a multiplicative constant on the independent variable(s) of that term.

A nonlinear model is nonlinear in the parameters.
Examples of Linear Models

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$$

$$Y = \beta_0 + \beta_1 \ln(X) + \varepsilon$$

$$Y = e^{\beta_0 + \beta_1 X} \varepsilon \quad \text{(linear after taking logarithms)}$$
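Because each of these models is linear in the parameters, ordinary least squares still applies after transforming the regressor. A sketch for the third model, $Y = \beta_0 + \beta_1 \ln(X) + \varepsilon$ (Python/NumPy, with synthetic data and assumed true values $\beta_0 = 2$, $\beta_1 = 3$):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 50, 40)
y = 2.0 + 3.0 * np.log(x) + rng.normal(0, 0.3, x.size)  # synthetic data

# The model is linear in beta0, beta1, so OLS on [1, ln x] works directly
A = np.column_stack([np.ones_like(x), np.log(x)])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)  # ~ [2.0, 3.0]
```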
Examples of Nonlinear Models

$$Y = \beta_1 e^{\beta_2 X} + \varepsilon$$

$$Y = \beta_1 + (\beta_2 - \beta_1) e^{\beta_3 X} + \varepsilon$$

$$Y = \beta_1 X^{\beta_2} + \varepsilon$$
Example of Nonlinear Model

$$Y = \beta_1 (X - \beta_2)^2 + \beta_3 + \varepsilon$$

Examples of Nonlinear Models

$$Y = \beta_1 X^{\beta_2} + \varepsilon$$

$$Y = \beta_1 + (\beta_2 - \beta_1)\, e^{-(\beta_3 X)^{\beta_4}} + \varepsilon$$
Estimation Techniques for Nonlinear Regression

Because nonlinear models cannot be solved explicitly, iterative numerical methods must be used to estimate the parameters (a Gauss-Newton sketch follows the list). The methods available in the NLIN procedure are

• steepest-descent or gradient
• Newton
• modified Gauss-Newton
• Marquardt
• multivariate secant or false position.
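To illustrate one of the listed methods, here is a minimal Gauss-Newton sketch for the model $Y = \beta_1 e^{\beta_2 X}$ (Python/NumPy with synthetic data; this is my illustrative implementation, not the NLIN procedure itself):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 5, 25)
y = 2.0 * np.exp(0.4 * x) + rng.normal(0, 0.2, x.size)  # synthetic data

b1, b2 = 1.5, 0.3  # starting values: every nonlinear fit requires them
for _ in range(50):
    f = b1 * np.exp(b2 * x)                          # model predictions
    r = y - f                                        # residuals
    J = np.column_stack([np.exp(b2 * x),             # df/d(b1)
                         b1 * x * np.exp(b2 * x)])   # df/d(b2)
    step, *_ = np.linalg.lstsq(J, r, rcond=None)     # Gauss-Newton step
    b1, b2 = b1 + step[0], b2 + step[1]
print(b1, b2)  # should approach 2.0 and 0.4
```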
Model Specification

For each nonlinear model to be analyzed, we must specify

• the model equation
• starting values of the parameters to be estimated.
Potential Lack of Convergence of Nonlinear Estimates

Convergence may not be obtained under certain conditions. These include

• incorrect specification of the model
• poor initial starting values
• an overdefined model
• insufficient data.
Thank You
