Lecture 6:
Regressions
Dr. Joshua Huang (黄哲学)
Shenzhen Institutes of Advanced Technology
Chinese Academy of Sciences
1
Agenda
• Introduction to Regression
• Simple Linear Regression
• Multivariate Linear Regressions
• Logistic Regressions
• Nonlinear Regressions
2
Univariate Regression
• Given a distribution of a set of two-dimensional points, find a straight line or a
curve Y = f(X) that best fits the points
• Given an X, f(X) can be used to estimate Y
[Figures: a linear fit Y = f(X) and a nonlinear fit Y = f(X) to the points, with X on the horizontal axis]
3
Multivariate Regression
• Given:
– a large number of data records with m attributes (variables)
– One attribute (variable) Y to be predicted from the others
– A given function form Y = f(X1, X2, …, Xm-1)
• Look for:
– A specific function learned from the training data, which is then used to
predict the value Y of new records
– The function is called the regression model
4
Regression Techniques
• Linear regression attempts to predict the value of a
continuous target as a linear function of one or more
independent inputs
• Nonlinear regression attempts to predict the value of a
continuous target as a nonlinear function of one or more
independent inputs
• Logistic regression attempts to predict the probability that
a binary or ordinal target will acquire the event of interest
as a function of one or more independent inputs
5
Regression Models
• Regression Model:
Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n
• Not appropriate for a dichotomous response variable, i.e., where Y is either 1 or 0.
• Logistic Regression Model:
P = P(Y = 1 \mid X_1, X_2, \dots, X_n)

\ln \frac{P}{1 - P} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n

P = \frac{e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n}}{1 + e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n}}
6
How Do We Decide Which Model?
• Look at the target variable
– If the target variable is continuous, then linear regression is a good
place to start. If linear regression is not satisfactory, try nonlinear
regression.
– If the target variable is dichotomous, then logistic regression is
better.
7
Issues
• Determine the model forms
• Learn a model
• Training data
• Non-linearity
8
Agenda
• Introduction to Regression
• Simple Linear Regression
• Multivariate Linear Regressions
• Logistic Regressions
• Nonlinear Regressions
9
Simple linear regression and Correlation
• Quantifying the relationship between two continuous
variables
• Predict (or forecast) the value of one variable from
knowledge of the value of another variable
10
Simple linear regression
• In simple linear regression we generate an equation to
calculate the value of a dependent variable (Y) from an
independent variable (X)
11
Regression Model
• Say you drive to work at an average of 60 km/hour. It takes
about 1 minute for every kilometre travelled…
• This is a mathematical model that represents the relationship
between the two variables
[Figure: time taken (minutes) vs. distance travelled (km); Time = 1 × distance]
12
Regression Model
• If you take some time to walk to your car and then walk from the car to
work, this takes an extra 3 minutes per day. The model becomes
[Figure: time taken (minutes) vs. distance travelled (km); Time = 3 + 1 × distance]
13
Regression Model
• If the time taken per kilometre is not exact because of
traffic, roadworks, etc., the model becomes
[Figure: time taken (minutes) vs. distance travelled (km); Time = 3 + 1 × distance + random effect]
14
Bi-variate Linear Regression Model
• In general, the bi-variate regression equation takes
the form:

y = \beta_0 + \beta_1 x + e

where
• y = the dependent variable
• x = the independent variable
• \beta_0 = the y-intercept
• \beta_1 = the slope of the line
• e = random error term
15
Line of Best Fit
• Given a data set, we need to find a way of calculating the
parameters of the equation
[Figure: scatter plot of the data points with several candidate lines of best fit]
• We need to find the line of best fit
16
Line of Best Fit
• Because the line will seldom fit the data precisely, there is
always some error associated with our line
• The line of best fit is the line that minimises the spread of these
errors
[Figure: fitted line ŷ with the vertical residuals (y_i − ŷ_i) between each point and the line]
17
Error Term
• The term (y_i − ŷ_i) is known as the error or residual:

e_i = y_i - \hat{y}_i
• The line of best fit occurs when the Sum of the Squared Errors is
minimised
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
18
Estimates of Parameters
• The slope of the line
\hat{\beta}_1 = \frac{SS_{xy}}{SS_x}

• where

SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})

• and

SS_x = \sum_{i=1}^{n} (x_i - \bar{x})^2
19
Estimates of Parameters
• Y-intercept

\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

• where

\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}, \qquad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
20
Example
X (kilos)   Y (cost $)
17          132
21          150
35          160
39          162
50          149
65          170

\bar{x} = 37.83, \quad \bar{y} = 153.83, \quad SS_{xy} = 891.83, \quad SS_x = 1612.83
21
Example

\hat{\beta}_1 = \frac{SS_{xy}}{SS_x} = \frac{891.83}{1612.83} = 0.553

\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 153.83 - 0.553 \times 37.83 = 132.91

And the equation is

\hat{y} = 132.91 + 0.553x
22
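To make the computation concrete, here is a minimal Python sketch (an illustration added for this text, not part of the original slides) that reproduces these estimates with NumPy:

```python
import numpy as np

# Weights (kilos) and costs ($) from the example above
x = np.array([17, 21, 35, 39, 50, 65], dtype=float)
y = np.array([132, 150, 160, 162, 149, 170], dtype=float)

# Sums of squares
ss_xy = np.sum((x - x.mean()) * (y - y.mean()))
ss_x = np.sum((x - x.mean()) ** 2)

# Least-squares estimates of slope and intercept
beta1 = ss_xy / ss_x                  # ~0.553
beta0 = y.mean() - beta1 * x.mean()   # ~132.91

print(f"slope = {beta1:.3f}, intercept = {beta0:.2f}")
```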
Interpret Parameter Estimates
• In the previous example, the estimate of the slope \hat{\beta}_1 was
0.553. This means that for every change in X of 1 kg,
there will be a change in Y of 0.553 dollars.
23
Interpret the Parameter Estimates
• \hat{\beta}_0 is the y-intercept, i.e., the point at which the line crosses the y-axis. In this case it is $132.91.
• It is the value of Y when X = 0.
[Figure: fitted line crossing the y-axis at $132.91]
24
Extrapolation
• Extrapolation is extending the use of the equation
outside the bounds of the data.
• In the previous example, the X values ranged between 17 and
65 kilos. It would therefore be unwise to make a comment on
the relationship outside this range
25
Other Regression Analyses
• The above procedure is called point estimation; it uses the least squares
estimation (LSE) method to find the parameters \beta_0 and \beta_1 of the linear
regression model
• Other analyses include
– Scatter plot of Y against X to check linearity
– Confidence intervals for \beta_0 and \beta_1
– Statistical tests
– Correlation analysis
26
Agenda
• Introduction to Regression
• Simple Linear Regression
• Multivariate Linear Regressions
• Logistic Regressions
• Nonlinear Regressions
27
Multiple Regression
• More input variables (predictors)
– Age, gender
– Education, income
– Race
28
Multiple Linear Regression
• Every input variable should have an impact on the outcome
– Estimates how much the outcome would change if this predictor increased
one unit and all the rest stayed constant.
• Minimal collinearity
– Input variables (predictors) should not be too strongly dependent on
each other. If two variables are perfectly related, when one is held
constant, the other is as well.
– Need to check this assumption first!
29
Multivariate Linear Regression
• Relation between one continuous variable y and a set of continuous
variables X=(x1, x2, …, xk)
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k

• Given a set of data records on X and y, we can calculate the
coefficients \beta_0, \beta_1, \beta_2, \dots, \beta_k
• After \beta_0, \beta_1, \beta_2, \dots, \beta_k are known, given a new record of X we can
compute the corresponding value of y, for example to estimate the annual
income of a customer.
30
Assumptions for Multivariate Linear
Regression
• Notation: the (k+1)-variable population {(Y, X1, …, Xk)} is the study
population.
• Population assumption 1.
– The mean \mu_Y of the subpopulation of Y values with X_1 = x_1, \dots, X_k = x_k is

\mu_Y(x_1, \dots, x_k) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
– This indicates that any given input record (X1=x1, …, Xk=xk ) is related
to a set of Y values and the value from the regression model is the mean
value of the Y values.
31
Assumptions for Multivariate Linear
Regression
• Population assumption 2
– The standard deviation of the Y values for any input record (X1=x1, …,
Xk=xk ) is the same.
• Population assumption 3
– Each subpopulation of the Y values has a Gaussian distribution
– The study population {(Y, X1, …, Xk)} is a (k+1)-variable Gaussian population
• Population assumption 4
– All sample data are obtained by simple random sampling
• Population assumption 5
– All sample values y_i, x_{i,1}, \dots, x_{i,k} for i = 1, …, n are observed without error.
32
Sample Data
Y     X1      X2      …     Xk
y1    x1,1    x1,2    …     x1,k
y2    x2,1    x2,2    …     x2,k
…     …       …       …     …
yi    xi,1    xi,2    …     xi,k
…     …       …       …     …
yn    xn,1    xn,2    …     xn,k
33
Least Square Estimation Method
• Given a sample data and the regression formula
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k

• We write the estimated regression formula as

\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots + \hat{\beta}_k x_k

• Using the estimated regression formula, we compute a set of
values \{\hat{y}_1, \dots, \hat{y}_i, \dots, \hat{y}_n\}. Comparing these values with the
corresponding real values \{y_1, \dots, y_i, \dots, y_n\}, we can compute the
errors

\hat{e}_i = y_i - \hat{y}_i = y_i - [\hat{\beta}_0 + \hat{\beta}_1 x_{i,1} + \dots + \hat{\beta}_k x_{i,k}]
34
Least Square Estimation Method
• The least squares estimates \hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \dots, \hat{\beta}_k are chosen so as
to minimize the sum of squared errors

SSE = \sum_{i=1}^{n} \hat{e}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i,1} - \dots - \hat{\beta}_k x_{i,k})^2

• How do we compute \hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \dots, \hat{\beta}_k to minimize the above function?
35
Least Square Estimation: Matrix
Representation
• Let \{y_1, \dots, y_i, \dots, y_n\} and \{\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \dots, \hat{\beta}_k\} be represented as
an (n × 1) and a ((k+1) × 1) vector, respectively, and let X be an (n × (k+1)) matrix:

\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad
\hat{\boldsymbol{\beta}} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{bmatrix}, \qquad
X = \begin{bmatrix}
1 & x_{1,1} & x_{1,2} & \cdots & x_{1,k} \\
1 & x_{2,1} & x_{2,2} & \cdots & x_{2,k} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n,1} & x_{n,2} & \cdots & x_{n,k}
\end{bmatrix}
36
Least Square Estimation: Matrix
Representation
• In matrix representation, the estimated regression formula

\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots + \hat{\beta}_k x_k

can be represented as

\hat{\mathbf{y}} = X \hat{\boldsymbol{\beta}}

• If we write the errors \{\hat{e}_1, \hat{e}_2, \dots, \hat{e}_n\} as an (n × 1) vector
\hat{\mathbf{e}} = [\hat{e}_1, \hat{e}_2, \dots, \hat{e}_n]^T, then

\hat{\mathbf{e}} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - X \hat{\boldsymbol{\beta}}
37
Least Square Estimation: Matrix
Representation
• Given

\mathbf{y} \approx X \hat{\boldsymbol{\beta}}

where \mathbf{y} = (y_1, \dots, y_i, \dots, y_n)^T, we can solve for \hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \dots, \hat{\beta}_k from
the normal equations

X^T X \hat{\boldsymbol{\beta}} = X^T \mathbf{y}

which give

\hat{\boldsymbol{\beta}} = (X^T X)^{-1} X^T \mathbf{y}
38
Example
• Let

\mathbf{y} = \begin{bmatrix} 6 \\ 9 \\ 12 \\ 5 \\ 13 \\ 2 \end{bmatrix}, \qquad
X = \begin{bmatrix}
1 & 3 & 9 & 16 \\
1 & 6 & 13 & 13 \\
1 & 4 & 3 & 17 \\
1 & 8 & 2 & 10 \\
1 & 3 & 4 & 9 \\
1 & 2 & 4 & 7
\end{bmatrix}, \qquad
\hat{\boldsymbol{\beta}} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix}
39
Example
X^T = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
3 & 6 & 4 & 8 & 3 & 2 \\
9 & 13 & 3 & 2 & 4 & 4 \\
16 & 13 & 17 & 10 & 9 & 7
\end{bmatrix}

X^T X = \begin{bmatrix}
6 & 26 & 35 & 72 \\
26 & 138 & 153 & 315 \\
35 & 153 & 295 & 448 \\
72 & 315 & 448 & 944
\end{bmatrix}, \qquad
X^T \mathbf{y} = \begin{bmatrix} 47 \\ 203 \\ 277 \\ 598 \end{bmatrix}
40
Example
\hat{\boldsymbol{\beta}} = (X^T X)^{-1} X^T \mathbf{y}
= \begin{bmatrix}
2.59578 & -0.15375 & -0.01962 & -0.13737 \\
-0.15375 & 0.03965 & -0.00014 & -0.00144 \\
-0.01962 & -0.00014 & 0.01234 & -0.00431 \\
-0.13737 & -0.00144 & -0.00431 & 0.01406
\end{bmatrix}
\begin{bmatrix} 47 \\ 203 \\ 277 \\ 598 \end{bmatrix}
= \begin{bmatrix} 3.20975 \\ -0.07573 \\ -0.11162 \\ 0.46691 \end{bmatrix}

Thus,

\hat{\beta}_0 = 3.20975, \quad \hat{\beta}_1 = -0.07573, \quad \hat{\beta}_2 = -0.11162, \quad \hat{\beta}_3 = 0.46691

\hat{y} = 3.20975 - 0.07573 x_1 - 0.11162 x_2 + 0.46691 x_3
41
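The same matrix calculation can be checked with a short NumPy sketch (my own illustration; the X and y values are taken from the example above):

```python
import numpy as np

# Design matrix with a leading column of 1s (intercept) and the example data
X = np.array([[1, 3, 9, 16],
              [1, 6, 13, 13],
              [1, 4, 3, 17],
              [1, 8, 2, 10],
              [1, 3, 4, 9],
              [1, 2, 4, 7]], dtype=float)
y = np.array([6, 9, 12, 5, 13, 2], dtype=float)

# Normal equations: beta = (X^T X)^{-1} X^T y
beta = np.linalg.inv(X.T @ X) @ (X.T @ y)

# In practice lstsq is preferred over forming the inverse explicitly
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta)   # approximately [ 3.210, -0.076, -0.112,  0.467]
```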
Agenda
• Introduction to Regression
• Simple Linear Regression
• Multivariate Linear Regressions
• Logistic Regressions
• Nonlinear Regressions
42
Logistic Regression
• Model relationship between set of variables xi
– dichotomous (yes/no)
– categorical (social class, ... )
– continuous (age, ...)
and
– A target variable Y which is dichotomous (binary).
• Dichotomous variables: Respond or Not Respond, Risk
or Not Risk, Claim or No claim
43
Example Data
Age and signs of coronary heart disease (CD)
Age CD Age CD Age CD
22 0 40 0 54 0
23 0 41 1 55 1
24 0 46 0 58 1
27 0 47 0 60 1
28 0 48 0 60 0
30 0 49 1 62 1
30 0 49 0 65 1
32 0 50 1 67 1
33 0 51 0 71 1
35 1 51 1 77 1
38 0 52 0 81 1
How Can We Analyse These Data?
• Compare mean age of diseased and non-diseased
– Non-diseased: 38.6 years
– Diseased: 58.7 years
Dot-plot
[Dot plot: signs of coronary disease (No / Yes) against age (years), 0 to 100]
Group The Data
Prevalence (%) of signs of CD according to age group
Age group   # in group   # diseased   % diseased
20 - 29     5            0            0
30 - 39     6            1            17
40 - 49     7            2            29
50 - 59     7            4            57
60 - 69     5            4            80
70 - 79     2            2            100
80 - 89     1            1            100
Dot-plot: Grouping Data
[Plot: % diseased against age group]
Logistic Function
[Plot: probability of disease as an S-shaped curve of x, rising from 0 to 1]

P(y \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}
Logistic Transformation
• Take the natural log of

P(y \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}

• We have

\ln \frac{P(y \mid x)}{1 - P(y \mid x)} = \alpha + \beta x
50
Advantage of the Logistic Transformation
• Transforms a nonlinear model into a linear regression model
• The logit ranges between −∞ and +∞
• The probability P is constrained between 0 and 1
• Directly related to the notion of the odds of disease

\ln \frac{P}{1-P} = \alpha + \beta x \quad \Longleftrightarrow \quad \frac{P}{1-P} = e^{\alpha + \beta x}
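As a small illustration (not from the slides), the logistic curve and the logit transform can be written directly in Python; the values of alpha and beta below are made up purely for demonstration:

```python
import numpy as np

def sigmoid(alpha, beta, x):
    """P(y | x) = e^(alpha + beta*x) / (1 + e^(alpha + beta*x))."""
    return 1.0 / (1.0 + np.exp(-(alpha + beta * x)))

def logit(p):
    """ln(P / (1 - P)); maps probabilities in (0, 1) to (-inf, +inf)."""
    return np.log(p / (1.0 - p))

# The logit of the sigmoid recovers the linear predictor alpha + beta*x
alpha, beta = -5.3, 0.11          # illustrative values only
x = np.array([30.0, 50.0, 70.0])
print(logit(sigmoid(alpha, beta, x)))   # ~ alpha + beta * x
```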
Interpretation of Coefficient
                     Exposure x
Disease y            yes (x = 1)            no (x = 0)
yes                  P(y | x = 1)           P(y | x = 0)
no                   1 − P(y | x = 1)       1 − P(y | x = 0)

\frac{P}{1-P} = e^{\alpha + \beta x}, \qquad Odds_{exposed} = e^{\alpha + \beta}, \qquad Odds_{unexposed} = e^{\alpha}

so the odds ratio is OR = e^{\alpha + \beta} / e^{\alpha} = e^{\beta}
Interpretation of Coefficient
• β = the increase in the logarithm of the odds ratio for a one-unit increase
in x
• Risk of developing coronary heart disease (CD) by age (< 55 vs. 55+)

CD            55+ (1)    < 55 (0)
Present (1)   21         22
Absent (0)    6          51

Odds of disease among exposed = 21/6
Odds of disease among unexposed = 22/51
Odds ratio = (21/6) / (22/51) = 8.1
Fit Equation to the Data
• Linear regression: Least squares
• Logistic regression: Maximum likelihood
• Likelihood function
– Estimate parameters α and β with the property that the likelihood
(probability) of the observed data is higher than for any other parameter values
– Practically, it is easier to work with the log-likelihood

L(\beta) = \ln \ell(\beta) = \sum_{i=1}^{n} \left[ y_i \ln \pi(x_i) + (1 - y_i) \ln\left(1 - \pi(x_i)\right) \right]

where \pi(x_i) = P(y = 1 \mid x_i)
Maximum likelihood
• Iterative computing
– Choose an arbitrary value for the coefficients (usually 0)
– Compute the log-likelihood
– Vary the coefficients' values
– Reiterate until the log-likelihood is maximised (reaches a plateau)
• Results
– Maximum likelihood estimates (MLE) for α and β
– Estimates of P(y) for a given value of x
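The sketch below illustrates this iterative maximum-likelihood fitting with a Newton-Raphson update (one common choice; the slides do not prescribe a specific algorithm). The small age/disease data set at the bottom is hypothetical:

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood fit of ln(P/(1-P)) = alpha + beta'x via Newton-Raphson.

    X: (n, k) array of predictors; y: (n,) array of 0/1 outcomes.
    Returns the (k+1,) coefficient vector [alpha, beta_1, ..., beta_k].
    """
    Xd = np.column_stack([np.ones(len(y)), X])   # add intercept column
    w = np.zeros(Xd.shape[1])                    # start from all-zero coefficients
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ w)))      # current P(y = 1 | x)
        grad = Xd.T @ (y - p)                    # gradient of the log-likelihood
        hess = Xd.T @ (Xd * (p * (1 - p))[:, None])  # negative Hessian
        w = w + np.linalg.solve(hess, grad)      # Newton step
    return w

# Hypothetical example: age (years) vs. disease status (0/1)
age = np.array([25, 30, 35, 40, 45, 50, 55, 60, 65, 70], dtype=float)
cd = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
alpha, beta = fit_logistic(age.reshape(-1, 1), cd)
print(alpha, beta)
```

In practice a library routine such as scikit-learn's LogisticRegression or statsmodels' Logit performs the same fit and also reports standard errors.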
Multiple Logistic Regression
• More than one independent variable
– Dichotomous, ordinal, nominal, continuous …
\ln \frac{P}{1-P} = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i
• Interpretation of \beta_i
– Increase in log-odds for a one-unit increase in x_i with all other x's held
constant
– Measures the association between x_i and the log-odds, adjusted for all other x's
56
Effect Modification
• Effect modification
– Can be modelled by including interaction terms
\ln \frac{P}{1-P} = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2
57
Coding of Variables
• Dichotomous variables: yes = 1, no = 0
• Continuous variables
– Increase in OR for a one unit change in exposure variable
– The logistic model is multiplicative: OR increases exponentially with x
• Nominal variables or ordinal with unequal classes:
– Tobacco smoked: no=0, grey=1, brown=2, blond=3
58
Indicator Variables: Type of Tobacco
Tobacco Dummy variables
consumption Dark Light Both
Dark 1 0 0
Light 0 1 0
Both 0 0 1
None 0 0 0
• Neutralises artificial hierarchy between classes in the variable
"type of tobacco"
• No assumptions made
• 3 variables (3 df) in model using same reference
59
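For illustration, the same indicator coding can be produced in Python with pandas (assuming a column named "tobacco"; this is not part of the original slides):

```python
import pandas as pd

# Hypothetical tobacco-type column
tobacco = pd.Series(["Dark", "Light", "Both", "None", "Dark"], name="tobacco")

# One 0/1 indicator column per class; leaving out "None" makes it the reference level
dummies = pd.get_dummies(tobacco, dtype=int)[["Dark", "Light", "Both"]]
print(dummies)
```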
Agenda
• Introduction to Regression
• Simple Linear Regression
• Multivariate Linear Regressions
• Logistic Regressions
• Nonlinear Regressions
60
Linear versus Nonlinear
Regression Models
A linear regression model is linear in the parameters. That
is, there is only one parameter in each term of the model
and each parameter is a multiplicative constant on the
independent variable(s) of that term.
A nonlinear model is nonlinear in the parameters.
61
Examples of Linear Models
Y = \beta_0 + \beta_1 X

Y = \beta_0 + \beta_1 X + \beta_2 X^2

Y = \beta_0 + \beta_1 \ln(X)

Y = e^{\beta_0 + \beta_1 X}
62
Examples of Nonlinear Models
Y = \beta_1 e^{\beta_2 X}

Y = \beta_1 + (\beta_2 - \beta_1) e^{\beta_3 X}

Y = \beta_1 X^{\beta_2}
63
Example of Nonlinear Model
Y = \beta_1 (X - \beta_2)^{\beta_3}
64
Examples of Nonlinear Models
Y = \beta_1 X^{\beta_2}
65
Examples of Nonlinear Models
Y = \beta_1 + (\beta_2 - \beta_1) e^{-(\beta_3 X)^{\beta_4}}
66
Estimation Techniques for
Nonlinear Regression
Because nonlinear models cannot be solved explicitly,
iterative numerical methods must be used to estimate the
parameters. The methods available in the NLIN procedure
are
• steepest-descent or gradient
• Newton
• modified Gauss-Newton
• Marquardt
• multivariate secant or false position.
67
Model Specification
For each nonlinear model to be analyzed, we must
specify
• the model equation
• starting values of the parameters to be estimated.
68
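The slides refer to SAS PROC NLIN; as a rough Python analogue (an illustration only, with made-up data), scipy's curve_fit needs the same two ingredients: a model equation and starting values p0.

```python
import numpy as np
from scipy.optimize import curve_fit

# Model equation: exponential model Y = b1 * exp(b2 * X) from the examples above
def model(x, b1, b2):
    return b1 * np.exp(b2 * x)

# Hypothetical data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.7, 3.9, 6.1, 8.8, 13.2, 19.5])

# Starting values of the parameters to be estimated
p0 = [1.0, 0.5]

params, cov = curve_fit(model, x, y, p0=p0)
print(params)   # estimated b1, b2
```

When no bounds are given, curve_fit uses a Levenberg-Marquardt style algorithm, which belongs to the same family as the Marquardt method listed above.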
Potential Lack of Convergence
of Nonlinear Estimates
Convergence may not be obtained under certain conditions.
These include
• incorrect specification of the model
• poor initial starting values
• an overdefined model
• insufficient data.
69
Thank You
70