LINEAR REGRESSION
J. Elder
CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Credits
Some of these slides were sourced and/or modified from Christopher Bishop, Microsoft UK.
Linear Regression Topics
What is linear regression?
Example: polynomial curve fitting
Other basis families
Solving linear regression problems
Regularized regression
Multiple linear regression
Bayesian linear regression
What is Linear Regression?
In classification, we seek to identify the categorical class C_k associated with a given input vector x.
In regression, we seek to identify (or estimate) a continuous variable y associated with a given input vector x.
y is called the dependent variable.
x is called the independent variable.
If y is a vector, we call this multiple regression.
We will focus on the case where y is a scalar.
Notation:
y will denote the continuous model of the dependent variable.
t will denote discrete noisy observations of the dependent variable (sometimes called the target variable).
Where is the Linear in Linear Regression?
In regression we assume that y is a function of x. The exact nature of this function is governed by an unknown parameter vector w:

y = y(\mathbf{x}, \mathbf{w})

The regression is linear if y is linear in w. In other words, we can express y as

y = \mathbf{w}^\top \phi(\mathbf{x})

where \phi(\mathbf{x}) is some (potentially nonlinear) function of \mathbf{x}.
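For example (an illustration, not from the slides): the quadratic model

y = w_0 + w_1 x + w_2 x^2

is nonlinear in the input x, yet linear in the parameters w (here \phi(x) = (1, x, x^2)^\top), so it is still a linear regression model.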
Linear Basis Function Models
Generally,

y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^\top \phi(\mathbf{x}),

where the \phi_j(\mathbf{x}) are known as basis functions.
Typically \phi_0(\mathbf{x}) = 1, so that w_0 acts as a bias.
In the simplest case, we use linear basis functions: \phi_d(\mathbf{x}) = x_d.
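As a runnable sketch (the course uses MATLAB; NumPy is assumed here, and the function name is mine), the N x M design matrix with entries Phi[n, j] = phi_j(x_n) for a polynomial basis can be built as:

    import numpy as np

    def design_matrix(x, M):
        """Polynomial design matrix: Phi[n, j] = phi_j(x_n) = x_n**j.
        Column 0 is all ones, so w[0] acts as the bias term."""
        return np.vander(x, M, increasing=True)

    x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
    Phi = design_matrix(x, 4)   # shape (5, 4): basis functions 1, x, x^2, x^3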
Example: Polynomial Bases
Polynomial basis functions:

\phi_j(x) = x^j

These are global:
a small change in x affects all basis functions;
a small change in a basis function affects y for all x.
Example: Polynomial Curve Fitting
[Figure: noisy observations of a sinusoidal generating function]
Sum-of-Squares Error Function
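Recorded here for reference (this is Bishop's sum-of-squares error, which the slide's figure illustrates):

E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}) - t_n \right)^2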
[Figure: the error corresponds to the squared vertical displacements of the data points from the fitted curve]
1st Order Polynomial
[Figure: 1st-order polynomial fit to the data]
3rd Order Polynomial
[Figure: 3rd-order polynomial fit to the data]
9th Order Polynomial
[Figure: 9th-order polynomial fit to the data (overfitting)]
Regularization
Penalize large coefficient values
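In symbols (following Bishop), adding a quadratic penalty on the coefficients gives the regularized error

\tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}) - t_n \right)^2 + \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2,

where \lambda controls the trade-off between fitting the data and keeping the weights small.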
[Figures: 9th-order polynomial fits for increasing values of the regularization coefficient \lambda]
Probabilistic View of Curve Fitting
Why least squares?
Model the noise (the deviation of the data from the model) as i.i.d. Gaussian:

t = y(\mathbf{x}, \mathbf{w}) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \beta^{-1}),

where \beta \equiv \frac{1}{\sigma^2} is the precision of the noise.
Maximum Likelihood
We determine w_ML by minimizing the squared error E(w).
Thus least-squares regression reflects an assumption that the noise is i.i.d. Gaussian.
Given w_ML, we can then estimate the variance of the noise:
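Following Bishop, the maximum-likelihood noise variance is the mean squared residual:

\frac{1}{\beta_{ML}} = \sigma_{ML}^2 = \frac{1}{N} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}_{ML}) - t_n \right)^2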
Predictive Distribution
[Figure: generating function, observed data, maximum-likelihood prediction, and posterior over t]
MAP: A Step towards Bayes
Prior knowledge about probable values of w can be incorporated into the regression. With a zero-mean isotropic Gaussian prior

p(\mathbf{w} \mid \alpha) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} \mathbf{I}),

the posterior over w is proportional to the product of the likelihood times the prior:

p(\mathbf{w} \mid \mathbf{t}) \propto p(\mathbf{t} \mid \mathbf{w}, \beta)\, p(\mathbf{w} \mid \alpha).

The result is to introduce a new quadratic term in w into the error function to be minimized:

\tilde{E}(\mathbf{w}) = \frac{\beta}{2} \sum_{n=1}^{N} \left( t_n - \mathbf{w}^\top \phi(\mathbf{x}_n) \right)^2 + \frac{\alpha}{2} \mathbf{w}^\top \mathbf{w}.

Thus regularized (ridge) regression reflects a 0-mean isotropic Gaussian prior on the weights.
Gaussian Bases
Gaussian basis functions:

\phi_j(x) = \exp\left( -\frac{(x - \mu_j)^2}{2 s^2} \right)

Think of these as interpolation functions.
These are local:
a small change in x affects only nearby basis functions;
a small change in a basis function affects y only for nearby x.
\mu_j and s control location and scale (width).
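A minimal NumPy sketch of a Gaussian design matrix (illustrative; the centre locations mu and the width s are free modelling choices):

    import numpy as np

    def gaussian_design_matrix(x, mu, s):
        """Phi[n, j] = exp(-(x_n - mu_j)^2 / (2 s^2)) for inputs x of
        shape (N,) and centres mu of shape (M,); s is the common width."""
        return np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2.0 * s ** 2))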
Maximum Likelihood and Linear Least Squares
Assume observations from a deterministic function with added Gaussian noise:

t = y(\mathbf{x}, \mathbf{w}) + \epsilon, \quad \text{where } p(\epsilon \mid \beta) = \mathcal{N}(\epsilon \mid 0, \beta^{-1}),

which is the same as saying

p(t \mid \mathbf{x}, \mathbf{w}, \beta) = \mathcal{N}\left( t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1} \right).

Given observed inputs X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\} and targets \mathbf{t} = [t_1, \ldots, t_N]^\top, we obtain the likelihood function

p(\mathbf{t} \mid X, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}\left( t_n \mid \mathbf{w}^\top \phi(\mathbf{x}_n), \beta^{-1} \right).
Maximum Likelihood and Linear Least Squares
Taking the logarithm, we get

\ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \frac{N}{2} \ln \beta - \frac{N}{2} \ln 2\pi - \beta E_D(\mathbf{w}),

where

E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( t_n - \mathbf{w}^\top \phi(\mathbf{x}_n) \right)^2

is the sum-of-squares error.
Maximum Likelihood and Least Squares
Computing the gradient with respect to w and setting it to zero yields

\nabla_{\mathbf{w}} \ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \beta \sum_{n=1}^{N} \left( t_n - \mathbf{w}^\top \phi(\mathbf{x}_n) \right) \phi(\mathbf{x}_n)^\top = 0.

Solving for w, we get

\mathbf{w}_{ML} = \Phi^\dagger \mathbf{t} = \left( \Phi^\top \Phi \right)^{-1} \Phi^\top \mathbf{t},

where \Phi is the N \times M design matrix with \Phi_{nj} = \phi_j(\mathbf{x}_n), and \Phi^\dagger \equiv (\Phi^\top \Phi)^{-1} \Phi^\top is the Moore-Penrose pseudo-inverse.
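A NumPy sketch of this solution (np.linalg.lstsq applies the pseudo-inverse in a numerically stable way rather than forming the inverse explicitly; the function name is mine):

    import numpy as np

    def fit_ml(Phi, t):
        """Maximum-likelihood weights: w_ML = pinv(Phi) @ t, via least squares."""
        w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
        return w_ml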
End of Lecture 8
Regularized Least Squares
Consider the error function

E(\mathbf{w}) = E_D(\mathbf{w}) + \lambda E_W(\mathbf{w})

(a data term plus a regularization term, where \lambda is called the regularization coefficient). With the sum-of-squares error function and a quadratic regularizer, we get

\frac{1}{2} \sum_{n=1}^{N} \left( t_n - \mathbf{w}^\top \phi(\mathbf{x}_n) \right)^2 + \frac{\lambda}{2} \mathbf{w}^\top \mathbf{w},

which is minimized by

\mathbf{w} = \left( \lambda \mathbf{I} + \Phi^\top \Phi \right)^{-1} \Phi^\top \mathbf{t}.

The \lambda \mathbf{I} term adds a "ridge" to the diagonal of \Phi^\top \Phi; thus the name ridge regression.
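A corresponding NumPy sketch of the closed-form ridge solution (np.linalg.solve is preferred to an explicit matrix inverse; the function name is mine):

    import numpy as np

    def fit_ridge(Phi, t, lam):
        """Ridge weights: w = (lam*I + Phi^T Phi)^{-1} Phi^T t."""
        M = Phi.shape[1]
        return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)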
Regularized Least Squares
With a more general regularizer, we have

\frac{1}{2} \sum_{n=1}^{N} \left( t_n - \mathbf{w}^\top \phi(\mathbf{x}_n) \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{M} |w_j|^q.

q = 2 gives the quadratic regularizer; q = 1 gives the lasso (least absolute shrinkage and selection operator).
Regularized Least Squares
Lasso generates sparse solutions.
[Figure: iso-contours of the data term E_D(w) together with an iso-contour of the regularization term E_W(w), for the quadratic and lasso regularizers. The corners of the lasso constraint region make the constrained optimum land on the axes, driving some weights exactly to zero.]
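A quick illustration of lasso sparsity, assuming scikit-learn is available (the data here are made up for the example):

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))
    t = X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=50)  # only 2 relevant features

    lasso = Lasso(alpha=0.1).fit(X, t)
    print(lasso.coef_)   # most coefficients are driven exactly to zero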
Solving Regularized Systems
Quadratic regularization has the advantage that the solution is closed form.
Non-quadratic regularizers generally do not have closed-form solutions.
Lasso can be framed as minimizing a quadratic error with linear constraints, and thus represents a convex optimization problem that can be solved by quadratic programming or other convex optimization methods.
We will discuss quadratic programming when we cover SVMs.
Multiple Outputs
Analogous to the single-output case, we have

\mathbf{y}(\mathbf{x}, \mathbf{W}) = \mathbf{W}^\top \phi(\mathbf{x}),

where \mathbf{y} is a K-dimensional output vector and \mathbf{W} is an M \times K parameter matrix. Given observed inputs X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\} and targets \mathbf{T} = [\mathbf{t}_1, \ldots, \mathbf{t}_N]^\top, we obtain the log-likelihood function

\ln p(\mathbf{T} \mid X, \mathbf{W}, \beta) = \frac{NK}{2} \ln\left( \frac{\beta}{2\pi} \right) - \frac{\beta}{2} \sum_{n=1}^{N} \left\lVert \mathbf{t}_n - \mathbf{W}^\top \phi(\mathbf{x}_n) \right\rVert^2.
Multiple Outputs
Maximizing with respect to W, we obtain

\mathbf{W}_{ML} = \left( \Phi^\top \Phi \right)^{-1} \Phi^\top \mathbf{T}.

If we consider a single target variable t_k, we see that

\mathbf{w}_k = \left( \Phi^\top \Phi \right)^{-1} \Phi^\top \mathbf{t}_k, \quad \text{where } \mathbf{t}_k = [t_{1k}, \ldots, t_{Nk}]^\top,

which is identical to the single-output case.
Some Useful MATLAB Functions
polyfit
Least-squares fit of a polynomial of specified order to given data.
regress
More general function that computes linear weights for a least-squares fit.
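For readers working in Python instead, np.polyfit and np.polyval play the role of MATLAB's polyfit (an illustrative sketch with made-up data):

    import numpy as np

    x = np.linspace(0.0, 1.0, 10)
    t = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(1).normal(size=10)
    w = np.polyfit(x, t, 3)      # cubic coefficients, highest power first
    t_hat = np.polyval(w, x)     # evaluate the fitted polynomial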
Bayesian Linear Regression
Rev. Thomas Bayes, 1702 - 1761
Bayesian Linear Regression
Define a conjugate prior over w:

p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0).

Combining this with the likelihood function and using results for marginal and conditional Gaussian distributions gives the posterior

p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N),

where

\mathbf{m}_N = \mathbf{S}_N \left( \mathbf{S}_0^{-1} \mathbf{m}_0 + \beta \Phi^\top \mathbf{t} \right), \quad \mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta \Phi^\top \Phi.
Bayesian Linear Regression
A common choice for the prior is

p(\mathbf{w} \mid \alpha) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} \mathbf{I}),

for which

\mathbf{m}_N = \beta \mathbf{S}_N \Phi^\top \mathbf{t}, \quad \mathbf{S}_N^{-1} = \alpha \mathbf{I} + \beta \Phi^\top \Phi.

Thus \mathbf{m}_N represents the ridge regression solution with \lambda = \alpha / \beta.
Next we consider an example.
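A NumPy sketch of this posterior computation (function name mine, not from the slides):

    import numpy as np

    def posterior(Phi, t, alpha, beta):
        """Posterior N(w | m_N, S_N) under the zero-mean isotropic prior:
        S_N^{-1} = alpha*I + beta*Phi^T Phi,  m_N = beta * S_N Phi^T t."""
        M = Phi.shape[1]
        S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
        m_N = beta * S_N @ Phi.T @ t
        return m_N, S_N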
Bayesian Linear Regression
[Figures: sequential Bayesian learning for a linear model. For 0, 1, 2, and 20 observed data points, each example shows the likelihood for the latest observation (x_n, t_n), the prior/posterior over w, and samples from the model in data space. The posterior concentrates as data accumulate.]
Predictive Distribution
Predict t for new values of x by integrating over w:

p(t \mid \mathbf{t}, \alpha, \beta) = \int p(t \mid \mathbf{w}, \beta)\, p(\mathbf{w} \mid \mathbf{t}, \alpha, \beta)\, d\mathbf{w} = \mathcal{N}\left( t \mid \mathbf{m}_N^\top \phi(\mathbf{x}), \sigma_N^2(\mathbf{x}) \right),

where

\sigma_N^2(\mathbf{x}) = \frac{1}{\beta} + \phi(\mathbf{x})^\top \mathbf{S}_N \phi(\mathbf{x}).
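Continuing the posterior() sketch above, the predictive mean and variance at new inputs can be computed as:

    import numpy as np

    def predict(Phi_new, m_N, S_N, beta):
        """Predictive mean m_N^T phi(x) and variance 1/beta + phi(x)^T S_N phi(x),
        evaluated row-wise for a design matrix Phi_new of new inputs."""
        mean = Phi_new @ m_N
        var = 1.0 / beta + np.einsum('ij,jk,ik->i', Phi_new, S_N, Phi_new)
        return mean, var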
Predictive Distribution
[Figures: sinusoidal data fit with 9 Gaussian basis functions, for 1, 2, 4, and 25 data points. Each example shows the predictive mean E[t | t, \alpha, \beta], the predictive distribution p(t | t, \alpha, \beta), and samples of y(x, w) drawn from the posterior. With a single data point, notice how much bigger our uncertainty is relative to the ML method; the predictive variance shrinks near the observed data as more points arrive.]
Equivalent Kernel
The predictive mean can be written

y(\mathbf{x}, \mathbf{m}_N) = \mathbf{m}_N^\top \phi(\mathbf{x}) = \beta\, \phi(\mathbf{x})^\top \mathbf{S}_N \Phi^\top \mathbf{t} = \sum_{n=1}^{N} k(\mathbf{x}, \mathbf{x}_n)\, t_n,

where

k(\mathbf{x}, \mathbf{x}') = \beta\, \phi(\mathbf{x})^\top \mathbf{S}_N \phi(\mathbf{x}')

is known as the equivalent kernel or smoother matrix. The predictive mean is thus a weighted sum of the training data target values t_n.
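A minimal sketch of the equivalent kernel as a matrix, under the same assumptions as the posterior() sketch above:

    import numpy as np

    def smoother_matrix(Phi_new, Phi_train, S_N, beta):
        """K[i, n] = beta * phi(x_i)^T S_N phi(x_n); the predictive mean at
        the new inputs is then the weighted sum K @ t of the training targets."""
        return beta * Phi_new @ S_N @ Phi_train.T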
Equivalent Kernel
The weight of t_n depends on the distance between x and x_n; nearby x_n carry more weight.