Advanced Machine Learning
Lecture 2: Linear models
Sandjai Bhulai
Vrije Universiteit Amsterdam
s.bhulai@vu.nl
8 September 2023
Linear models
Advanced Machine Learning
Polynomial curve fitting
▪ 10 points sampled from sin(2πx) plus a random disturbance
▪ Recall: sin x = x − x^3/3! + x^5/5! − x^7/7! + x^9/9! − ⋯
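A minimal sketch of how such a dataset could be generated (the noise level and the seed are assumptions, not taken from the slides):

    import numpy as np

    rng = np.random.default_rng(0)                 # assumed seed
    N = 10                                         # 10 points, as on the slide
    x = rng.uniform(0.0, 1.0, size=N)              # inputs in [0, 1]
    t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=N)   # assumed noise std 0.3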
Polynomial curve fitting
▪ Polynomial curve
  y(x, w) = w0 + w1x + w2x^2 + ⋯ + wMx^M = ∑_{j=0}^{M} wj x^j
▪ Performance is measured by
  E(w) = (1/2) ∑_{n=1}^{N} {y(xn, w) − tn}²
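A sketch of fitting the polynomial by minimizing E(w) with ordinary least squares on the Vandermonde matrix (continuing with the x and t generated above; the order M is a free choice):

    M = 3                                          # polynomial order (free choice)
    Phi = np.vander(x, M + 1, increasing=True)     # columns x^0, x^1, ..., x^M
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)    # minimizes E(w) = (1/2) Σ {y(xn, w) − tn}²
    E = 0.5 * np.sum((Phi @ w - t) ** 2)           # the error E(w) from the slide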
Polynomial curve fitting: order 0
y(x, w) = w0 + w1x + w2x^2 + ⋯ + wMx^M = ∑_{j=0}^{M} wj x^j
Polynomial curve fitting: order 1
y(x, w) = w0 + w1x + w2x^2 + ⋯ + wMx^M = ∑_{j=0}^{M} wj x^j
Polynomial curve fitting: order 3
y(x, w) = w0 + w1x + w2x^2 + ⋯ + wMx^M = ∑_{j=0}^{M} wj x^j
Polynomial curve fitting: order 9
y(x, w) = w0 + w1x + w2x^2 + ⋯ + wMx^M = ∑_{j=0}^{M} wj x^j
Overfitting
▪ Root mean square (RMS) error: E_RMS = √(2E(w*)/N)
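Dividing by N makes the error comparable across dataset sizes, and the square root puts it back on the scale of the targets. A small helper, under the same setup as above:

    def rms_error(Phi, t, w):
        # E_RMS = sqrt(2 E(w*) / N), with E(w) = (1/2) Σ {y(xn, w) − tn}²
        E = 0.5 * np.sum((Phi @ w - t) ** 2)
        return np.sqrt(2 * E / len(t))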
Overfitting
Effect of dataset size
▪ Polynomial of order 9 and N = 15
Effect of dataset size
▪ Polynomial of order 9 and N = 100
Regularization
▪ Penalize large coefficient values:
  Ẽ(w) = (1/2) ∑_{n=1}^{N} {y(xn, w) − tn}² + (λ/2) ∥w∥²
▪ λ becomes a model parameter
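A sketch of minimizing Ẽ(w) for the order-9 polynomial; the closed-form solution used here is derived later in this lecture, and the value of λ matches the next slide:

    lam = np.exp(-18)                              # ln λ = −18, as on the next slide
    M = 9
    Phi = np.vander(x, M + 1, increasing=True)
    # minimizer of Ẽ(w) = (1/2) Σ {y(xn, w) − tn}² + (λ/2) ∥w∥²
    w_reg = np.linalg.solve(lam * np.eye(M + 1) + Phi.T @ Phi, Phi.T @ t)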
Regularization
▪ Regularization with ln λ = −18
Regularization
▪ Regularization with ln λ = 0
Regularization
▪ E_RMS versus ln λ
Regularization
A deeper analysis
Advanced Machine Learning
What is the issue?
▪ Recall: sin x = x − x^3/3! + x^5/5! − x^7/7! + x^9/9! − ⋯
▪ sin(2πx) can be approximated arbitrarily well by a polynomial, so limited expressiveness of the model class is not the issue
Linear basis function models
▪ General model is
  y(x, w) = ∑_{j=0}^{M−1} wj φj(x) = w⊤φ(x)
▪ The φj are known as basis functions
▪ Typically, φ0(x) = 1, so that w0 acts as a bias
Linear basis function models
▪ General model is
  y(x, w) = ∑_{j=0}^{M−1} wj φj(x) = w⊤φ(x)
▪ Polynomial basis functions: φj(x) = x^j
▪ These are global functions
Linear basis function models
▪ General model is
  y(x, w) = ∑_{j=0}^{M−1} wj φj(x) = w⊤φ(x)
▪ Gaussian basis functions (see the sketch after this list):
  φj(x) = exp{ −(x − μj)² / (2s²) }
▪ These are local functions
> μj controls location
> s controls scale
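A sketch of a Gaussian design matrix; the centers μj and the scale s below are assumptions:

    def gaussian_design(x, mus, s):
        # Φ[n, j] = exp(−(xn − μj)² / (2 s²))
        return np.exp(-(x[:, None] - mus[None, :]) ** 2 / (2 * s ** 2))

    mus = np.linspace(0.0, 1.0, 9)                 # assumed centers
    Phi_gauss = gaussian_design(x, mus, s=0.1)     # assumed scale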
Linear basis function models
▪ General model is
  y(x, w) = ∑_{j=0}^{M−1} wj φj(x) = w⊤φ(x)
▪ Sigmoidal basis functions:
  φj(x) = σ((x − μj)/s), where σ(a) = 1/(1 + exp(−a))
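The sigmoidal analogue, reusing the same (assumed) centers and scale:

    def sigmoid_design(x, mus, s):
        # Φ[n, j] = σ((xn − μj) / s), with σ(a) = 1 / (1 + exp(−a))
        return 1.0 / (1.0 + np.exp(-(x[:, None] - mus[None, :]) / s))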
Maximum likelihood
▪ Assume observations from a deterministic function with added Gaussian noise:
  t = y(x, w) + ϵ, where p(ϵ | β) = 𝒩(ϵ | 0, β⁻¹)
▪ Note that 𝒩(x | μ, σ²) = (2πσ²)^{−1/2} exp{ −(x − μ)²/(2σ²) }, with precision β = 1/σ²
▪ 𝒩(x | μ, σ²) > 0 and ∫_{−∞}^{∞} 𝒩(x | μ, σ²) dx = 1
Maximum likelihood
▪ Assume observations from a deterministic function with added Gaussian noise:
  t = y(x, w) + ϵ, where p(ϵ | β) = 𝒩(ϵ | 0, β⁻¹)
▪ Note that 𝒩(x | μ, σ²) = (2πσ²)^{−1/2} exp{ −(x − μ)²/(2σ²) }
▪ 𝔼[x] = ∫_{−∞}^{∞} x 𝒩(x | μ, σ²) dx = μ
▪ 𝔼[x²] = ∫_{−∞}^{∞} x² 𝒩(x | μ, σ²) dx = μ² + σ²
▪ var[x] = 𝔼[x²] − 𝔼[x]² = σ²
Maximum likelihood
▪ Assume observations from a deterministic function with added Gaussian noise:
  t = y(x, w) + ϵ, where p(ϵ | β) = 𝒩(ϵ | 0, β⁻¹)
▪ This is the same as saying
  p(t | x, w, β) = 𝒩(t | y(x, w), β⁻¹)
▪ Recall: y(x, w) = ∑_{j=0}^{M−1} wj φj(x) = w⊤φ(x)
Maximum likelihood
▪ This is the same as saying
  p(t | x, w, β) = 𝒩(t | y(x, w), β⁻¹)
▪ Given observed inputs X = {x1, …, xN} and targets t = [t1, …, tN]⊤, we obtain the likelihood function
  p(t | X, w, β) = ∏_{n=1}^{N} 𝒩(tn | w⊤φ(xn), β⁻¹)
Maximum likelihood
▪ Taking the logarithm, we get
  ln p(t | w, β) = ∑_{n=1}^{N} ln 𝒩(tn | w⊤φ(xn), β⁻¹)
                = (N/2) ln β − (N/2) ln(2π) − β ED(w)
  where ED(w) = (1/2) ∑_{n=1}^{N} {tn − w⊤φ(xn)}²
▪ Recall: 𝒩(x | μ, σ²) = (2πσ²)^{−1/2} exp{ −(x − μ)²/(2σ²) }
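The decomposition can be checked numerically; a sketch using scipy.stats.norm, reusing the design matrix Phi from earlier (the precision β is an assumed value, and the identity holds for any w):

    from scipy.stats import norm

    beta = 1.0 / 0.3 ** 2                          # assumed precision, matching the assumed noise std
    w_any = rng.normal(size=Phi.shape[1])          # any weight vector works here
    y_pred = Phi @ w_any                           # w⊤φ(xn) for all n
    lhs = norm.logpdf(t, loc=y_pred, scale=beta ** -0.5).sum()
    E_D = 0.5 * np.sum((t - y_pred) ** 2)
    rhs = len(t) / 2 * np.log(beta) - len(t) / 2 * np.log(2 * np.pi) - beta * E_D
    assert np.isclose(lhs, rhs)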
Maximum likelihood
▪ Computing the gradient and setting it to zero yields
  ∇w ln p(t | w, β) = β ∑_{n=1}^{N} {tn − w⊤φ(xn)} φ(xn)⊤ = 0
▪ Solving for w, we get
  wML = (Φ⊤Φ)⁻¹ Φ⊤t
  where (Φ⊤Φ)⁻¹Φ⊤ is the Moore-Penrose pseudo-inverse of Φ, and the design matrix Φ has entries Φnj = φj(xn):
  Φ = ⎡ φ0(x1)  φ1(x1)  ⋯  φM−1(x1) ⎤
      ⎢ φ0(x2)  φ1(x2)  ⋯  φM−1(x2) ⎥
      ⎢   ⋮       ⋮     ⋱     ⋮    ⎥
      ⎣ φ0(xN)  φ1(xN)  ⋯  φM−1(xN) ⎦
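In code the pseudo-inverse solution is one line, although np.linalg.lstsq computes the same minimizer more stably; a sketch with the design matrix Phi from earlier:

    w_ml = np.linalg.pinv(Phi) @ t                 # equals (Φ⊤Φ)⁻¹Φ⊤t when Φ has full column rank
    w_ml2, *_ = np.linalg.lstsq(Phi, t, rcond=None)   # same minimizer, numerically preferred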
Interpretation
▪ Consider y = ΦwML = [φ1, …, φM]wML, with y ∈ 𝒮 ⊆ 𝒯, where 𝒯 is N-dimensional and 𝒮 is M-dimensional
▪ 𝒮 is spanned by φ1, …, φM
▪ wML minimizes the distance between t and its orthogonal projection on 𝒮, i.e., y
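This geometric picture implies that the residual t − y is orthogonal to every basis vector spanning 𝒮, which can be verified numerically (a sketch, continuing with Phi and w_ml from above):

    y_proj = Phi @ w_ml                            # orthogonal projection of t onto S
    residual = t - y_proj
    # residual is orthogonal to every column of Phi (up to round-off)
    assert np.allclose(Phi.T @ residual, 0.0, atol=1e-6)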
Regularization
▪ Consider the error function
  ED(w) + λEW(w)
  (data term + regularization term)
▪ With the sum-of-squares error function and a quadratic regularizer, we get
  (1/2) ∑_{n=1}^{N} {tn − w⊤φ(xn)}² + (λ/2) w⊤w
▪ This is minimized by
  w = (λI + Φ⊤Φ)⁻¹ Φ⊤t
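A sketch of the regularized solution; note that λI + Φ⊤Φ is invertible for any λ > 0, even when Φ⊤Φ alone is singular:

    def ridge_solution(Phi, t, lam):
        # w = (λI + Φ⊤Φ)⁻¹ Φ⊤ t, the minimizer of the quadratically regularized error
        M = Phi.shape[1]
        return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)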
Regularization
▪ With a more general regularizer, we have
  (1/2) ∑_{n=1}^{N} {tn − w⊤φ(xn)}² + (λ/2) ∑_{j=1}^{M} |wj|^q
▪ q = 1 gives the lasso; q = 2 recovers the quadratic regularizer
Regularization
▪ Lasso tends to generate sparser solutions than a quadratic regularizer
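A sketch of this effect with scikit-learn; the α values are assumptions, and sklearn's α plays the role of λ:

    from sklearn.linear_model import Lasso, Ridge

    lasso = Lasso(alpha=0.01).fit(Phi, t)          # q = 1 penalty
    ridge = Ridge(alpha=0.01).fit(Phi, t)          # q = 2 penalty
    print("lasso zero coefficients:", np.sum(lasso.coef_ == 0))   # typically several exact zeros
    print("ridge zero coefficients:", np.sum(ridge.coef_ == 0))   # typically none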