Kernel Methods
George Lan
A. Russell Chandler III Chair Professor
H. Milton Stewart School of Industrial & Systems Engineering
Nonlinear regression
Want to fit a polynomial regression model
$y = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots + \theta_d x^d + \epsilon$
Let $\tilde{x} = (1, x, x^2, \dots, x^d)^T$ and $\theta = (\theta_0, \theta_1, \theta_2, \dots, \theta_d)^T$; then the model is linear in the features:
$y = \theta^T \tilde{x} + \epsilon$
Problem of explicitly constructing features
Explicitly constructing a feature map $\phi(x): \mathbb{R}^n \mapsto F$ is problematic: the feature space can grow very large, very quickly.
E.g., the polynomial features of degree $d$ in $n$ variables:
$x_1^d, \; x_1^{d-1} x_2, \dots, x_1^{d-1} x_n, \; x_1^{d-2} x_2^2, \dots$
The total number of such features is
$\binom{d+n-1}{d} = \dfrac{(d+n-1)!}{d! \, (n-1)!}$
For $d = 6$ and $n = 100$, there are about 1.6 billion terms.
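As a quick sanity check of the count above (not from the original slides), a minimal Python sketch:

```python
from math import comb

d, n = 6, 100                      # polynomial degree and input dimension
num_features = comb(d + n - 1, d)  # (d + n - 1)! / (d! (n - 1)!)
print(num_features)                # 1609344100, i.e. about 1.6 billion monomials of degree d
```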
Can we avoid expanding the features?
Rather than computing the features explicitly and then taking an inner product, can we merge the two steps using a clever function $k(x_i, x_j)$?
E.g., polynomial with $d = 2$:
$\phi(x)^T \phi(y) = (x_1^2, \; x_1 x_2, \; x_2^2, \; x_2 x_1) \, (y_1^2, \; y_1 y_2, \; y_2^2, \; y_2 y_1)^T = x_1^2 y_1^2 + 2 x_1 x_2 y_1 y_2 + x_2^2 y_2^2$
$= (x_1 y_1 + x_2 y_2)^2 = (x^T y)^2$
Only $O(n)$ computation!
Polynomial kernel of degree $d$: $k(x, y) = (x^T y)^d = \phi(x)^T \phi(y)$
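To illustrate, a minimal NumPy sketch (not from the slides) that checks $\phi(x)^T \phi(y) = (x^T y)^2$ for the 2-dimensional, degree-2 case:

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-dimensional input."""
    return np.array([v[0]**2, v[0]*v[1], v[1]**2, v[1]*v[0]])

def poly_kernel(x, y, d=2):
    """Polynomial kernel k(x, y) = (x^T y)^d, computed in O(n) time."""
    return (x @ y) ** d

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

print(phi(x) @ phi(y))    # explicit features, then inner product
print(poly_kernel(x, y))  # same value without ever building the features
```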
Typical kernels for vector data
Polynomial of degree $d$:
$k(x, y) = (x^T y)^d$
Polynomial of degree up to $d$:
$k(x, y) = (x^T y + c)^d$
Exponential kernel (infinite-degree polynomial):
$k(x, y) = \exp(s \cdot x^T y)$
Gaussian radial basis function (RBF) kernel:
$k(x, y) = \exp\left(-\dfrac{\|x - y\|^2}{2\sigma^2}\right)$
Laplace kernel:
$k(x, y) = \exp\left(-\dfrac{\|x - y\|}{\sigma}\right)$
Exponentiated distance (for a distance $d(x, y)$):
$k(x, y) = \exp\left(-\dfrac{d(x, y)^2}{\sigma^2}\right)$
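These kernels translate directly into code; a minimal NumPy sketch (parameter names s, c, sigma are my own defaults, not from the slides):

```python
import numpy as np

def poly_kernel(x, y, d=3):
    return (x @ y) ** d                      # polynomial of degree d

def poly_up_to_kernel(x, y, d=3, c=1.0):
    return (x @ y + c) ** d                  # polynomial of degree up to d

def exp_kernel(x, y, s=1.0):
    return np.exp(s * (x @ y))               # exponential ("infinite-degree polynomial")

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))   # Gaussian RBF

def laplace_kernel(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(x - y) / sigma)             # Laplace
```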
Feature space not unique!
E.g., polynomial with $d = 2$. One feature map:
$\phi(x)^T \phi(y) = (x_1^2, \; x_1 x_2, \; x_2^2, \; x_2 x_1) \, (y_1^2, \; y_1 y_2, \; y_2^2, \; y_2 y_1)^T = x_1^2 y_1^2 + 2 x_1 x_2 y_1 y_2 + x_2^2 y_2^2 = (x_1 y_1 + x_2 y_2)^2 = (x^T y)^2$
A different (3-dimensional) feature map yields the same kernel:
$\tilde{\phi}(x)^T \tilde{\phi}(y) = (x_1^2, \; \sqrt{2}\, x_1 x_2, \; x_2^2) \, (y_1^2, \; \sqrt{2}\, y_1 y_2, \; y_2^2)^T = x_1^2 y_1^2 + 2 x_1 x_2 y_1 y_2 + x_2^2 y_2^2 = (x_1 y_1 + x_2 y_2)^2 = (x^T y)^2$
What 𝑘(𝑥, 𝑦) can be called a kernel function?
$k(x, y)$ should be equivalent to first computing the features $\phi(x)$ and then taking an inner product: $k(x, y) = \phi(x)^T \phi(y)$
Take a dataset $D = \{x_1, x_2, x_3, \dots, x_n\}$
Compute the pairwise kernel values $k(x_i, x_j)$ and form the $n \times n$ kernel matrix (Gram matrix)
$K = \begin{pmatrix} k(x_1, x_1) & \cdots & k(x_1, x_n) \\ \vdots & \ddots & \vdots \\ k(x_n, x_1) & \cdots & k(x_n, x_n) \end{pmatrix}$
$k(x, y)$ is a kernel function iff, for every such dataset, the matrix $K$ is positive semidefinite:
$\forall v \in \mathbb{R}^n, \; v^T K v \geq 0$
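To make the definition concrete, a minimal NumPy sketch (my own toy data) that forms the Gram matrix for the Gaussian RBF kernel and checks positive semidefiniteness via eigenvalues:

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

X = np.random.randn(20, 2)   # a toy dataset of n = 20 points in R^2
n = X.shape[0]

# form the n x n Gram matrix K[i, j] = k(x_i, x_j)
K = np.array([[rbf_kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

# K should be (numerically) positive semidefinite: all eigenvalues >= 0
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-10)   # True for a valid kernel such as the Gaussian RBF
```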
Support Vector Machines
Primal problem:
$\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_j \xi_j$
$\text{s.t.} \;\; (w^T x^j + b)\, y^j \geq 1 - \xi_j, \quad \xi_j \geq 0, \;\; \forall j$
The inputs $x^j$ can be high-order polynomial features $\phi(x^j)$.
Lagrangian:
$L(w, b, \xi, \alpha, \beta) = \frac{1}{2} w^T w + C \sum_j \xi_j + \sum_j \alpha_j \left(1 - \xi_j - (w^T x^j + b)\, y^j\right) - \sum_j \beta_j \xi_j, \qquad \alpha_j \geq 0, \; \beta_j \geq 0$
Setting the derivatives of $L$ with respect to $w$ and $\xi$ to zero (together with the KKT conditions) gives
$w = \sum_j \alpha_j y^j x^j$
$b = y^k - w^T x^k$ for any $k$ such that $0 < \alpha_k < C$
SVM dual problem and kernelization
Plugging $w$ and $b$ back into the Lagrangian gives the dual problem
$\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y^i y^j \, x^{i\,T} x^j$
$\text{s.t.} \;\; \sum_i \alpha_i y^i = 0, \quad 0 \leq \alpha_i \leq C$
The inner product $x^{i\,T} x^j$ can be replaced by $\phi(x^i)^T \phi(x^j)$, i.e., by $k(x^i, x^j)$.
The other steps also depend only on inner products:
$w^T x = \sum_j \alpha_j y^j \, x^{j\,T} x$
$b = y^k - w^T x^k$ for any $k$ such that $0 < \alpha_k < C$
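Once the dual variables $\alpha$ are available (e.g., from a QP solver), the kernelized steps above never touch $\phi$ explicitly. A minimal sketch of those evaluation steps, with all array and function names my own:

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def svm_offset(X_train, y_train, alpha, C, kernel=rbf_kernel):
    """b = y_k - w^T phi(x_k) for any k with 0 < alpha_k < C (a margin support vector)."""
    k = next(i for i, a in enumerate(alpha) if 0 < a < C)
    wTx = sum(a * yj * kernel(xj, X_train[k]) for a, yj, xj in zip(alpha, y_train, X_train))
    return y_train[k] - wTx

def svm_decision(x, X_train, y_train, alpha, b, kernel=rbf_kernel):
    """f(x) = sum_j alpha_j y^j k(x^j, x) + b, i.e. the kernelized w^T phi(x) + b."""
    return sum(a * yj * kernel(xj, x) for a, yj, xj in zip(alpha, y_train, X_train)) + b
```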
Illustration of kernel SVM
Kernel SVM
implicitly maps the data to a new nonlinear feature space
finds a linear decision boundary in that new space
Ridge regression and matrix inversion lemma
Matrix inversion lemma (for $B \in \mathbb{R}^{n \times m}$):
$(B B^T + \lambda I)^{-1} B = B \, (B^T B + \lambda I)^{-1}$
Note that $X = (x^{(1)}, x^{(2)}, \dots, x^{(m)})$
Evaluate the ridge regression solution $\hat{\theta} = (X X^T + \lambda I)^{-1} X y$ at a new test point $x$:
$x^T \hat{\theta} = x^T (X X^T + \lambda I)^{-1} X y = x^T X \, (X^T X + \lambda I)^{-1} y$
This depends only on inner products between data points.
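A minimal NumPy sketch (dimensions chosen arbitrarily) that checks the lemma numerically and confirms the two forms of the ridge prediction agree:

```python
import numpy as np

n, m, lam = 5, 8, 0.1
rng = np.random.default_rng(0)
X = rng.standard_normal((n, m))     # columns are the m data points x^(i)
y = rng.standard_normal(m)
x = rng.standard_normal(n)          # a new test point

# primal form: theta_hat = (X X^T + lam I_n)^{-1} X y
theta = np.linalg.solve(X @ X.T + lam * np.eye(n), X @ y)

# dual form via the matrix inversion lemma: x^T X (X^T X + lam I_m)^{-1} y
pred_dual = x @ X @ np.linalg.solve(X.T @ X + lam * np.eye(m), y)

print(np.isclose(x @ theta, pred_dual))   # True: both depend only on inner products
```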
Kernel ridge regression
$f(x) = \theta^T x = y^T (X^T X + \lambda I_m)^{-1} X^T x$ depends only on inner products!
$X^T X = \begin{pmatrix} x_1^T x_1 & \cdots & x_1^T x_m \\ \vdots & \ddots & \vdots \\ x_m^T x_1 & \cdots & x_m^T x_m \end{pmatrix}, \qquad X^T x = \begin{pmatrix} x_1^T x \\ \vdots \\ x_m^T x \end{pmatrix}$
Kernel ridge regression: replace each inner product by a kernel function
$X^T X \rightarrow K = \left[k(x_i, x_j)\right]_{m \times m}$
$X^T x \rightarrow k_x = \left[k(x_i, x)\right]_{m \times 1}$
$f(x) = y^T (K + \lambda I_m)^{-1} k_x$
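Putting the pieces together, a minimal from-scratch kernel ridge regression sketch with the Gaussian RBF kernel (toy 1-d data of my own, not from the slides):

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

def kernel_ridge_fit(X, y, lam=0.1, kernel=rbf_kernel):
    """Precompute c = (K + lam I_m)^{-1} y so that f(x) = c^T k_x."""
    m = len(X)
    K = np.array([[kernel(X[i], X[j]) for j in range(m)] for i in range(m)])
    return np.linalg.solve(K + lam * np.eye(m), y)

def kernel_ridge_predict(x, X, c, kernel=rbf_kernel):
    """f(x) = y^T (K + lam I_m)^{-1} k_x, with k_x[i] = k(x_i, x)."""
    k_x = np.array([kernel(xi, x) for xi in X])
    return c @ k_x

# tiny 1-d example
X = np.linspace(-3, 3, 30).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * np.random.randn(30)
c = kernel_ridge_fit(X, y, lam=0.1)
print(kernel_ridge_predict(np.array([0.5]), X, c))
```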
Kernel ridge regression
Use the Gaussian RBF kernel
$k(x, y) = \exp\left(-\dfrac{\|x - y\|^2}{2\sigma^2}\right)$
[Figure: fitted curves for (large $\sigma$, large $\lambda$), (small $\sigma$, small $\lambda$), and (small $\sigma$, large $\lambda$)]
Use cross-validation to choose the parameters $\sigma$ and $\lambda$.
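One common way to do this cross-validation, assuming scikit-learn is available (its KernelRidge parameterizes the RBF width via gamma, which plays the role of $1/(2\sigma^2)$, and the regularizer via alpha, which plays the role of $\lambda$):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV

X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * np.random.randn(100)

# grid over the regularization strength (lambda) and the RBF width (gamma = 1 / (2 sigma^2))
param_grid = {"alpha": [1e-3, 1e-2, 1e-1, 1.0],
              "gamma": [0.1, 1.0, 10.0]}
search = GridSearchCV(KernelRidge(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```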
Principal component analysis
Given a set of $m$ centered observations $x^i \in \mathbb{R}^n$, PCA finds the direction that maximizes the variance.
$X = (x^1, x^2, \dots, x^m)$
$w^* = \operatorname{argmax}_{\|w\| = 1} \; \frac{1}{m} \sum_i (w^T x^i)^2 = \operatorname{argmax}_{\|w\| = 1} \; \frac{1}{m} w^T X X^T w$
With $C = \frac{1}{m} X X^T$, the direction $w^*$ can be found by solving the eigenvalue problem
$C w = \lambda w$
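For reference, the linear version as a short NumPy sketch (random data of my own; the slide assumes the observations are already centered):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 200))           # columns are the m observations
X -= X.mean(axis=1, keepdims=True)          # center the data

m = X.shape[1]
C = (X @ X.T) / m                           # covariance matrix C = (1/m) X X^T
eigvals, eigvecs = np.linalg.eigh(C)        # solve C w = lambda w (ascending order)
w_star = eigvecs[:, -1]                     # eigenvector with the largest eigenvalue
print(w_star)
```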
Alternative expression for PCA
The principal component lies in the span of the data:
$w = \sum_i \alpha_i x^i = X \alpha$
since $w = \frac{1}{\lambda} C w = \frac{1}{\lambda m} X X^T w = X \left(\frac{1}{\lambda m} X^T w\right) = X \alpha$ for any $\lambda > 0$.
Plugging this in, we have
$C w = \frac{1}{m} X X^T X \alpha = \lambda w = \lambda X \alpha$
Furthermore, for each data point $x^i$ the following relation holds, and it depends only on the inner product matrix:
$x^{i\,T} C w = \frac{1}{m} x^{i\,T} X X^T X \alpha = \lambda \, x^{i\,T} X \alpha, \quad \forall i$
In matrix form: $\frac{1}{m} X^T X X^T X \alpha = \lambda X^T X \alpha$
Kernel PCA
Key Idea: Replace inner product matrix by kernel matrix
"
PCA: 𝑋 ! 𝑋𝑋 ! 𝑋𝛼 = 𝜆𝑋 ! 𝑋𝛼
-
Kernel PCA:
𝑥. ↦ 𝜙 𝑥. , X ↦ Φ = 𝜙 𝑥" , … , 𝜙 𝑥. , 𝐾 = Φ ! Φ
Nonlinear principal component 𝑤 = Φ𝛼
" "
𝐾𝐾𝛼 = 𝜆𝐾𝛼, equivalent to 𝐾𝛼 = 𝜆 𝛼
- -
The solutions of the above two linear systems differ only for eigenvectors
of 𝐾 with zero eigenvalue.
! !
𝐾(" 𝐾𝛼 − 𝜆𝛼) = 0, " 𝐾𝛼 − 𝜆𝛼 can not belong to the null space of 𝐾 since
neither 𝐾𝛼 nor 𝛼 does (under the assumption that 𝐾𝛼 is nonzero.
Key computation: form an 𝑚 by 𝑚 kernel matrix 𝐾, and then
perform eigen-decomposition on 𝐾
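A minimal kernel PCA sketch along these lines; it solves $\frac{1}{m} K \alpha = \lambda \alpha$ directly, and for brevity omits feature-space centering of $K$ and the normalization of $\alpha$ (all data and names are my own):

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

def kernel_pca(X, kernel=rbf_kernel, n_components=2):
    """Return the top alphas and eigenvalues of (1/m) K alpha = lambda alpha."""
    m = len(X)
    K = np.array([[kernel(X[i], X[j]) for j in range(m)] for i in range(m)])
    eigvals, eigvecs = np.linalg.eigh(K / m)          # ascending order
    return eigvecs[:, -n_components:], eigvals[-n_components:]

def kpca_project(x, X, alphas, kernel=rbf_kernel):
    """w^T phi(x) = sum_i alpha_i k(x_i, x), evaluated for each retained component."""
    k_x = np.array([kernel(xi, x) for xi in X])
    return alphas.T @ k_x

X = np.random.randn(50, 2)
alphas, lams = kernel_pca(X)
print(kpca_project(np.array([0.0, 0.0]), X, alphas))
```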
Kernel PCA example
Gaussian radial basis function (RBF) kernel
"
TIT !
exp − #U"
over 2 dimensional space
Eigen-vector evaluated at a test point 𝑥 is a function
𝑤 % 𝜙 𝑥 = ∑& 𝛼& < 𝜙 𝑥 & , 𝜙 𝑥 > = ∑& 𝛼& 𝑘(𝑥 & , 𝑥)
Canonical correlation analysis
Canonical correlation analysis
Given $D = \{(x^1, y^1), \dots, (x^m, y^m)\} \sim P(x, y)$
$X = (x^1, x^2, \dots, x^m)$
$Y = (y^1, y^2, \dots, y^m)$
Find two vectors $w_x$ and $w_y$ and project the two views onto them: $X^T w_x$ and $Y^T w_y$,
such that the correlation of the projected data is maximized:
$\rho = \max_{w_x, w_y} \operatorname{corr}(X^T w_x, Y^T w_y) = \max_{w_x, w_y} \dfrac{\langle X^T w_x, Y^T w_y \rangle}{\|X^T w_x\| \, \|Y^T w_y\|}$
Matrix form of CCA
Define the covariance matrix of 𝑥, 𝑦
$C = \mathbb{E}_{(x, y)} \begin{pmatrix} x \\ y \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}^T = \begin{pmatrix} C_{xx} & C_{xy} \\ C_{yx} & C_{yy} \end{pmatrix}$
The optimization problem is equivalent to
$\rho = \max_{w_x, w_y} \dfrac{w_x^T C_{xy} w_y}{\sqrt{w_x^T C_{xx} w_x} \, \sqrt{w_y^T C_{yy} w_y}}$
CCA as generalized eigenvalue problem
The optimality conditions say
$C_{xy} w_y = \lambda \, C_{xx} w_x$
$C_{yx} w_x = \lambda \, C_{yy} w_y$
with $\lambda = \dfrac{w_x^T C_{xy} w_y}{w_x^T C_{xx} w_x}$ (obtained by setting the gradient equal to zero).
Putting these conditions into matrix form:
$\begin{pmatrix} 0 & C_{xy} \\ C_{yx} & 0 \end{pmatrix} \begin{pmatrix} w_x \\ w_y \end{pmatrix} = \lambda \begin{pmatrix} C_{xx} & 0 \\ 0 & C_{yy} \end{pmatrix} \begin{pmatrix} w_x \\ w_y \end{pmatrix}$
This is a generalized eigenvalue problem $A w = \lambda B w$.
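A minimal sketch of solving this generalized eigenvalue problem with SciPy; the small ridge term added to the right-hand-side block matrix (to make it positive definite) and the synthetic two-view data are my own choices, not part of the slides:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
m = 200
X = rng.standard_normal((3, m))                        # columns are the x^i
Y = 0.5 * X[:2] + 0.1 * rng.standard_normal((2, m))    # a correlated second view y^i

Cxx, Cyy, Cxy = (X @ X.T) / m, (Y @ Y.T) / m, (X @ Y.T) / m
dx, dy = X.shape[0], Y.shape[0]

A = np.block([[np.zeros((dx, dx)), Cxy],
              [Cxy.T,              np.zeros((dy, dy))]])
B = np.block([[Cxx,                np.zeros((dx, dy))],
              [np.zeros((dy, dx)), Cyy]]) + 1e-8 * np.eye(dx + dy)

eigvals, eigvecs = eigh(A, B)        # generalized eigenvalue problem A w = lambda B w
w_x, w_y = eigvecs[:dx, -1], eigvecs[dx:, -1]
print(eigvals[-1])                   # largest eigenvalue corresponds to the top canonical correlation
```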
CCA in inner product format
Similar to PCA, the directions of projection lie in the span of the data $X = (x^1, x^2, \dots, x^m)$, $Y = (y^1, y^2, \dots, y^m)$:
$w_x = X \alpha, \quad w_y = Y \beta$
$C_{xy} = \frac{1}{m} X Y^T, \quad C_{xx} = \frac{1}{m} X X^T, \quad C_{yy} = \frac{1}{m} Y Y^T$
Earlier we had $\rho = \max_{w_x, w_y} \dfrac{w_x^T C_{xy} w_y}{\sqrt{w_x^T C_{xx} w_x} \, \sqrt{w_y^T C_{yy} w_y}}$
Plugging in $w_x = X\alpha$ and $w_y = Y\beta$, the data appear only through inner products:
$\rho = \max_{\alpha, \beta} \dfrac{\alpha^T X^T X \, Y^T Y \beta}{\sqrt{\alpha^T X^T X \, X^T X \alpha} \, \sqrt{\beta^T Y^T Y \, Y^T Y \beta}}$
Kernel CCA
Replace inner product matrix by kernel matrix
$\rho = \max_{\alpha, \beta} \dfrac{\alpha^T K_x K_y \beta}{\sqrt{\alpha^T K_x K_x \alpha} \, \sqrt{\beta^T K_y K_y \beta}}$
where $K_x$ is the kernel matrix for the data $X$, with entries $K_x(i, j) = k(x_i, x_j)$ (and similarly for $K_y$).
Solve the generalized eigenvalue problem
$\begin{pmatrix} 0 & K_x K_y \\ K_y K_x & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \lambda \begin{pmatrix} K_x K_x & 0 \\ 0 & K_y K_y \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix}$
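The kernelized version follows the same pattern; a minimal sketch with my own toy data. The small ridge term on the right-hand side is my own regularization choice (the blocks $K_x K_x$ and $K_y K_y$ are typically singular, and in practice kernel CCA is usually regularized anyway):

```python
import numpy as np
from scipy.linalg import eigh

def rbf_kernel(a, b, sigma=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

def gram(X, kernel=rbf_kernel):
    m = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(m)] for i in range(m)])

rng = np.random.default_rng(0)
m = 60
X = rng.standard_normal((m, 2))                    # rows are the x_i
Y = X ** 2 + 0.1 * rng.standard_normal((m, 2))     # a nonlinearly related second view

Kx, Ky = gram(X), gram(Y)
eps = 1e-6

A = np.block([[np.zeros((m, m)), Kx @ Ky],
              [Ky @ Kx,          np.zeros((m, m))]])
B = np.block([[Kx @ Kx,          np.zeros((m, m))],
              [np.zeros((m, m)), Ky @ Ky]]) + eps * np.eye(2 * m)

eigvals, eigvecs = eigh(A, B)        # generalized eigenvalue problem over (alpha, beta)
alpha, beta = eigvecs[:m, -1], eigvecs[m:, -1]
print(eigvals[-1])                   # top kernel canonical correlation (up to regularization)
```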