PATTERN RECOGNITION
AND MACHINE LEARNING
CHAPTER 3: LINEAR MODELS FOR REGRESSION
Learning Objectives
1. How can linear regression be performed using basis functions?
2. What are the relationships between maximum likelihood and least squares,
   between maximum a posteriori and regularization, and among expected loss,
   bias, variance, and noise?
3. What are the common regularization methods for regression?
4. How is Bayesian linear regression carried out?
5. What is the kernel view of regression?
6. How should the model complexity be chosen?
7. What are evidence approximation and evidence maximization?
Bayesian Machine Learning
Process of Machine Learning:
    p(θ | training data, model) ∝ p(training data | model, θ) p₀(θ | model)
    (posterior ∝ likelihood × prior)
Process of Prediction:
    p(testing data | training data, model)
        = ∫ p(testing data | model, θ) p(θ | training data, model) dθ
Process of Model Evaluation (for hyper-parameter tuning):
    p(training data | model) = ∫ p(training data | model, θ) p₀(θ | model) dθ
Bayesian Learning for a Linear Gaussian System (LGS)
Given y = Ax + v with
    p(x) = 𝒩(x | μ, Σ),   p(v) = 𝒩(v | 0, Q),
and writing the posterior as x = m + u with
    p(x | y) = 𝒩(x | m, L),   p(u) = 𝒩(u | 0, L),
we have
    L⁻¹ = AᵀQ⁻¹A + Σ⁻¹
    m = L(AᵀQ⁻¹y + Σ⁻¹μ)
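As a quick numerical check of these formulas, here is a minimal sketch (assuming NumPy; the matrices A, Q, Σ, the prior mean μ, and the observation y are made-up illustrative values) that computes the posterior mean m and covariance L:

    import numpy as np

    # Hypothetical linear-Gaussian system: y = A x + v
    A = np.array([[1.0, 0.5],
                  [0.0, 1.0],
                  [2.0, 1.0]])          # 3 observations of a 2-d state
    Q = 0.1 * np.eye(3)                 # noise covariance, p(v) = N(0, Q)
    mu = np.zeros(2)                    # prior mean,  p(x) = N(mu, Sigma)
    Sigma = np.eye(2)
    y = np.array([1.0, 0.3, 1.8])       # an observed y

    # Posterior p(x | y) = N(x | m, L)
    L = np.linalg.inv(A.T @ np.linalg.inv(Q) @ A + np.linalg.inv(Sigma))
    m = L @ (A.T @ np.linalg.inv(Q) @ y + np.linalg.inv(Sigma) @ mu)
    print("posterior mean m:", m)
    print("posterior covariance L:\n", L)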
Bayesian Prediction for LGS
Given y = Ax + v with
    p(x) = 𝒩(x | μ, Σ),   p(v) = 𝒩(v | 0, Q),
    x = m + u,   p(x | y) = 𝒩(x | m, L),   p(u) = 𝒩(u | 0, L),
we have
    p(y | x) = 𝒩(y | Ax, Q)
    p(y′) = ∫ p(y′ | x) p(x | y) dx = 𝒩(y′ | Am, ALAᵀ + Q)
Bayesian Model Evaluation for LGS
Given y = Ax + v with
    p(x) = 𝒩(x | μ, Σ),   p(v) = 𝒩(v | 0, Q),
we have
    p(y | x) = 𝒩(y | Ax, Q)
    p(y) = ∫ p(y | x) p(x) dx = 𝒩(y | Aμ, AΣAᵀ + Q)
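The prediction and model-evaluation formulas can be checked the same way; the sketch below (assuming NumPy and SciPy; A, Q, μ, Σ, and y are again made-up values) evaluates the predictive density 𝒩(y′ | Am, ALAᵀ + Q) and the evidence 𝒩(y | Aμ, AΣAᵀ + Q):

    import numpy as np
    from scipy.stats import multivariate_normal

    # Hypothetical linear-Gaussian system: y = A x + v
    A = np.array([[1.0, 0.5],
                  [0.0, 1.0]])
    Q = 0.2 * np.eye(2)                 # p(v) = N(0, Q)
    mu, Sigma = np.zeros(2), np.eye(2)  # p(x) = N(mu, Sigma)
    y = np.array([0.7, -0.2])

    # Learning: posterior p(x | y) = N(m, L)
    L = np.linalg.inv(A.T @ np.linalg.inv(Q) @ A + np.linalg.inv(Sigma))
    m = L @ (A.T @ np.linalg.inv(Q) @ y + np.linalg.inv(Sigma) @ mu)

    # Prediction: p(y' | y) = N(A m, A L A^T + Q)
    pred = multivariate_normal(mean=A @ m, cov=A @ L @ A.T + Q)
    print("predictive log-density at y' = y:", pred.logpdf(y))

    # Model evaluation: evidence p(y) = N(A mu, A Sigma A^T + Q)
    evidence = multivariate_normal(mean=A @ mu, cov=A @ Sigma @ A.T + Q)
    print("log evidence:", evidence.logpdf(y))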
Outlines
Linear Basis Function Models
Maximum Likelihood and Least Squares
Bias Variance Decomposition
Bayesian Linear Regression
Predictive Distribution
Bayesian Model Comparison
Evidence Approximation and Maximization
Linear Basis Function Models (1)
Example: Polynomial Curve Fitting
Linear Basis Function Models (2)
Generally,
    y(x, w) = Σ_{j=0}^{M−1} w_j φ_j(x) = wᵀφ(x),
where the φ_j(x) are known as basis functions.
Typically, φ₀(x) = 1, so that w₀ acts as a bias.
In the simplest case, we use linear basis
functions: φ_d(x) = x_d.
Linear Basis Function Models (3)
Polynomial basis functions:
    φ_j(x) = x^j.
These are global: a small change
in x affects all basis functions.
Linear Basis Function Models (4)
Gaussian basis functions:
    φ_j(x) = exp(−(x − μ_j)² / (2s²)).
These are local: a small change
in x affects only nearby basis
functions. μ_j and s control
location and scale (width).
Linear Basis Function Models (5)
Sigmoidal basis functions:
    φ_j(x) = σ((x − μ_j)/s),
where
    σ(a) = 1 / (1 + exp(−a)).
These are also local: a small
change in x affects only nearby
basis functions. μ_j and s
control location and scale
(slope).
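A minimal sketch of the three basis families above (assuming NumPy; the helper names, the grid of centres μ_j, and the width s are illustrative choices), each returning an N × M design matrix Φ with Φ_nj = φ_j(x_n):

    import numpy as np

    def polynomial_basis(x, M):
        """Global basis: phi_j(x) = x**j, j = 0..M-1 (j = 0 gives the bias column)."""
        return np.vstack([x**j for j in range(M)]).T

    def gaussian_basis(x, centres, s):
        """Local basis: phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))."""
        return np.exp(-(x[:, None] - centres[None, :])**2 / (2 * s**2))

    def sigmoidal_basis(x, centres, s):
        """Local basis: phi_j(x) = sigma((x - mu_j)/s), sigma(a) = 1/(1 + exp(-a))."""
        a = (x[:, None] - centres[None, :]) / s
        return 1.0 / (1.0 + np.exp(-a))

    x = np.linspace(-1, 1, 5)            # a few illustrative inputs
    centres = np.linspace(-1, 1, 4)      # mu_j: locations of the local bases
    Phi = gaussian_basis(x, centres, s=0.3)
    print(Phi.shape)                     # (N, M) design matrix, here (5, 4)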
Outlines
Linear Basis Function Models
Maximum Likelihood and Least Squares
Bias Variance Decomposition
Bayesian Linear Regression
Predictive Distribution
Bayesian Model Comparison
Evidence Approximation and Maximization
Maximum Likelihood and Least Squares (1)
Assume observations from a deterministic function
with added Gaussian noise:
    t = y(x, w) + ε,
where
    p(ε | β) = 𝒩(ε | 0, β⁻¹),
which is the same as saying
    p(t | x, w, β) = 𝒩(t | y(x, w), β⁻¹).
Given observed inputs X = {x₁, …, x_N} and targets t = (t₁, …, t_N)ᵀ,
we obtain the likelihood function
    p(t | X, w, β) = ∏_{n=1}^N 𝒩(t_n | wᵀφ(x_n), β⁻¹).
Maximum Likelihood and Least Squares (2)
Taking the logarithm, we get
    ln p(t | w, β) = Σ_{n=1}^N ln 𝒩(t_n | wᵀφ(x_n), β⁻¹)
                   = (N/2) ln β − (N/2) ln(2π) − β E_D(w),
where
    E_D(w) = ½ Σ_{n=1}^N {t_n − wᵀφ(x_n)}²
is the sum-of-squares error.
Maximum Likelihood and Least Squares (3)
Computing the gradient and setting it to zero yields
    ∇_w ln p(t | w, β) = β Σ_{n=1}^N {t_n − wᵀφ(x_n)} φ(x_n)ᵀ = 0.
Solving for w, we get
    w_ML = Φ†t = (ΦᵀΦ)⁻¹Φᵀt,
where Φ† is the Moore-Penrose pseudo-inverse of the design matrix Φ,
with elements Φ_nj = φ_j(x_n).
(Roger Penrose: 2020 Nobel Prize Laureate in Physics.)
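A minimal least-squares fit via the pseudo-inverse, as a sketch (assuming NumPy; the synthetic sinusoidal data, noise level, and polynomial degree are illustrative choices):

    import numpy as np

    rng = np.random.default_rng(0)
    N, M = 20, 4
    x = rng.uniform(0, 1, N)
    t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, N)   # targets with Gaussian noise

    Phi = np.vstack([x**j for j in range(M)]).T          # polynomial design matrix
    w_ml = np.linalg.pinv(Phi) @ t                       # Moore-Penrose pseudo-inverse
    beta_ml = N / np.sum((t - Phi @ w_ml)**2)            # ML noise precision (1/beta = mean squared residual)
    print("w_ML:", w_ml, " beta_ML:", beta_ml)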
Geometry of Least Squares
Consider the target vector t = (t₁, …, t_N)ᵀ in an N-dimensional space,
and the M-dimensional subspace S spanned by the columns φ_j of Φ.
w_ML minimizes the distance between t and its orthogonal
projection onto S, i.e., y = Φ w_ML.
Sequential Learning
Data items are considered one at a time (a.k.a.
online learning); use stochastic (sequential)
gradient descent:
    w^(τ+1) = w^(τ) + η {t_n − w^(τ)ᵀφ(x_n)} φ(x_n).
This is known as the least-mean-squares
(LMS) algorithm. Issue: how to choose η?
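A sketch of the LMS update w ← w + η (t_n − wᵀφ(x_n)) φ(x_n), sweeping the data one point at a time (assuming NumPy; the step size η and the number of passes are arbitrary illustrative choices):

    import numpy as np

    def lms(Phi, t, eta=0.1, epochs=50):
        """Stochastic (sequential) gradient descent on the sum-of-squares error."""
        w = np.zeros(Phi.shape[1])
        for _ in range(epochs):
            for phi_n, t_n in zip(Phi, t):
                w = w + eta * (t_n - w @ phi_n) * phi_n   # LMS update
        return w

If η is chosen too large the updates diverge; if it is too small, convergence is very slow.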
Regularized Least Squares (1)
Consider the error function
    E_D(w) + λ E_W(w)
(data term + regularization term). With the sum-of-squares
error function and a quadratic regularizer, we get
    ½ Σ_{n=1}^N {t_n − wᵀφ(x_n)}² + (λ/2) wᵀw,
which is minimized by
    w = (λI + ΦᵀΦ)⁻¹Φᵀt.
λ is called the regularization coefficient.
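The closed-form regularized solution as a sketch (assuming NumPy; the design matrix, targets, and λ in the usage lines are arbitrary):

    import numpy as np

    def ridge_fit(Phi, t, lam):
        """Regularized least squares: w = (lam*I + Phi^T Phi)^{-1} Phi^T t."""
        M = Phi.shape[1]
        return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

    # Example with an arbitrary design matrix and targets
    rng = np.random.default_rng(0)
    Phi, t = rng.normal(size=(20, 4)), rng.normal(size=20)
    print(ridge_fit(Phi, t, lam=0.5))

Using np.linalg.solve avoids forming the explicit inverse, which is the numerically safer choice.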
Regularized Least Squares (2)
With a more general regularizer, we have
    ½ Σ_{n=1}^N {t_n − wᵀφ(x_n)}² + (λ/2) Σ_{j=1}^M |w_j|^q,
where q = 1 gives the lasso and q = 2 the quadratic regularizer.
Regularized Least Squares (3)
Lasso tends to generate sparser solutions than a
quadratic regularizer.
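One way to see this sparsity effect is to fit both regularizers on the same data; the sketch below assumes scikit-learn is available and uses its Lasso and Ridge estimators with an arbitrary regularization strength on synthetic data where only two features matter:

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))
    w_true = np.array([3.0, -2.0] + [0.0] * 8)        # only 2 relevant features
    y = X @ w_true + rng.normal(0, 0.1, 50)

    print("lasso coefficients:", Lasso(alpha=0.1).fit(X, y).coef_)   # mostly exact zeros
    print("ridge coefficients:", Ridge(alpha=0.1).fit(X, y).coef_)   # small but nonzero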
Multiple Outputs (1)
Analogously to the single-output case we have
    y(x, W) = Wᵀφ(x),   p(t | x, W, β) = 𝒩(t | Wᵀφ(x), β⁻¹I).
Given observed inputs X = {x₁, …, x_N} and targets T = (t₁, …, t_N)ᵀ,
we obtain the log likelihood function
    ln p(T | X, W, β) = Σ_{n=1}^N ln 𝒩(t_n | Wᵀφ(x_n), β⁻¹I)
                      = (NK/2) ln(β/2π) − (β/2) Σ_{n=1}^N ‖t_n − Wᵀφ(x_n)‖².
Multiple Outputs (2)
Maximizing with respect to W, we obtain
    W_ML = (ΦᵀΦ)⁻¹ΦᵀT.
If we consider a single target variable, t_k, we see
that
    w_k = (ΦᵀΦ)⁻¹Φᵀt_k,
where t_k = (t_{1k}, …, t_{Nk})ᵀ, which is identical to the
single-output case.
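A sketch confirming that the multi-output solution decouples column-wise (assuming NumPy; the design matrix and targets are random illustrative data):

    import numpy as np

    rng = np.random.default_rng(0)
    Phi = rng.normal(size=(30, 5))            # design matrix, N x M
    T = rng.normal(size=(30, 2))              # two target variables, N x K

    W_ml = np.linalg.pinv(Phi) @ T            # M x K, all outputs at once
    w_k = np.linalg.pinv(Phi) @ T[:, 0]       # single-output solution for t_k
    print(np.allclose(W_ml[:, 0], w_k))       # True: the outputs decouple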
Outlines
Linear Basis Function Models
Maximum Likelihood and Least Squares
Bias Variance Decomposition
Bayesian Linear Regression
Predictive Distribution
Bayesian Model Comparison
Evidence Approximation and Maximization
The Expected Squared Loss Function
For a predictor y(x), the expected squared loss is
    E[L] = ∫∫ {y(x) − t}² p(x, t) dx dt
         = ∫ {y(x) − h(x)}² p(x) dx + ∫∫ {h(x) − t}² p(x, t) dx dt,
where h(x) = E[t | x] is the optimal predictor (the ground-truth regression
function) and the second term is the intrinsic noise.
(Proof: https://stats.stackexchange.com/questions/228561/loss-functions-for-regression-proof)
The Bias-Variance Decomposition (1)
Recall the expected squared loss,
    E[L] = ∫ {y(x) − h(x)}² p(x) dx + ∫∫ {h(x) − t}² p(x, t) dx dt,
where
    h(x) = E[t | x] = ∫ t p(t | x) dt.
The second term of E[L] corresponds to the noise
inherent in the random variable t.
What about the first term?
The Bias-Variance Decomposition (2)
Suppose we were given multiple data sets, each of
size N. Any particular data set, D, will give a
particular function y(x; D). We then have
    {y(x; D) − h(x)}²
      = {y(x; D) − E_D[y(x; D)] + E_D[y(x; D)] − h(x)}²
      = {y(x; D) − E_D[y(x; D)]}² + {E_D[y(x; D)] − h(x)}²
        + 2{y(x; D) − E_D[y(x; D)]}{E_D[y(x; D)] − h(x)}.
The Bias-Variance Decomposition (3)
Taking the expectation over D yields (the cross term vanishes)
    E_D[{y(x; D) − h(x)}²]
      = {E_D[y(x; D)] − h(x)}² + E_D[{y(x; D) − E_D[y(x; D)]}²]
      =         (bias)²        +           variance.
The Bias-Variance Decomposition (4)
Thus we can write
    expected loss = (bias)² + variance + noise,
where
    (bias)²  = ∫ {E_D[y(x; D)] − h(x)}² p(x) dx
    variance = ∫ E_D[{y(x; D) − E_D[y(x; D)]}²] p(x) dx
    noise    = ∫∫ {h(x) − t}² p(x, t) dx dt
The Bias-Variance Decomposition (5)
Example: 25 data sets from the sinusoidal, varying
the degree of regularization, λ.
The Bias-Variance Decomposition (6)
Example: 25 data sets from the sinusoidal, varying
the degree of regularization, λ.
The Bias-Variance Decomposition (7)
Example: 25 data sets from the sinusoidal, varying
the degree of regularization, λ.
The Bias-Variance Trade-off
From these plots, we note that an over-regularized
model (large λ) will have a high bias, while an
under-regularized model (small λ) will have a high variance.
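The trade-off can be reproduced numerically. The sketch below (assuming NumPy; the Gaussian basis, data-set size, noise level, and λ grid are illustrative choices mirroring the sinusoidal example) fits regularized least squares on 25 data sets and estimates (bias)² and variance on a test grid:

    import numpy as np

    rng = np.random.default_rng(0)
    x_test = np.linspace(0, 1, 100)
    h = np.sin(2 * np.pi * x_test)                       # h(x) = E[t | x]
    centres, s = np.linspace(0, 1, 24), 0.1

    def design(x):                                       # Gaussian basis plus a bias column
        Phi = np.exp(-(x[:, None] - centres[None, :])**2 / (2 * s**2))
        return np.hstack([np.ones((len(x), 1)), Phi])

    for lam in [10.0, 0.1, 1e-3]:                        # large -> small regularization
        preds = []
        for _ in range(25):                              # 25 data sets of size N = 25
            x = rng.uniform(0, 1, 25)
            t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 25)
            Phi = design(x)
            w = np.linalg.solve(lam * np.eye(Phi.shape[1]) + Phi.T @ Phi, Phi.T @ t)
            preds.append(design(x_test) @ w)
        preds = np.array(preds)
        bias2 = np.mean((preds.mean(axis=0) - h)**2)     # (bias)^2 averaged over the test grid
        var = np.mean(preds.var(axis=0))                 # variance averaged over the test grid
        print(f"lambda={lam:g}: bias^2={bias2:.3f}, variance={var:.3f}")

Large λ should give a larger (bias)² and smaller variance, and small λ the reverse, matching the plots described above.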