Support Vector Machine
Constrained Optimization
• A constrained optimization problem consists of an objective function f(x) that is optimized subject to constraint functions c_i(x) and d_j(x).
• A simple method for solving constrained optimization problems is to substitute variables from the constraint function into the objective function, so that the optimal value of the objective function can be determined.
Substitution Method
Example:
Determine the solution to the following constrained optimization problem:
Answer:
- From the constraint we obtain $x_2 = -\frac{8}{3}x_1$
- Substituting into the objective function: $f(x) = -\frac{8}{3}x_1^3 - \ln(x_1)$
- Setting the derivative of $f$ to zero gives $x_1 = -\frac{1}{2}$ (checked symbolically in the sketch below)
- So the solution is $x = (-\frac{1}{2}, \frac{4}{3})^T$
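As a quick symbolic check of the derivative step above (a minimal sketch assuming SymPy is available; variable names are illustrative), differentiating the substituted objective and solving for the stationary point gives the same result:

```python
import sympy as sp

x1 = sp.symbols('x1')

# Objective after substituting the constraint x2 = -(8/3) x1
f = -sp.Rational(8, 3) * x1**3 - sp.log(x1)

# Stationary points: df/dx1 = -8*x1**2 - 1/x1 = 0
stationary = sp.solve(sp.diff(f, x1), x1)
print(stationary)  # the real root is x1 = -1/2

# Recover x2 from the constraint
x1_star = sp.Rational(-1, 2)
x2_star = -sp.Rational(8, 3) * x1_star
print(x1_star, x2_star)  # -1/2 and 4/3
```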
Lagrange Method
Example:
Determine the solution to the following constrained optimization problem:
Answer:
- Form the Lagrange function $L(x, \alpha)$ from the objective function and the constraint
- Setting the derivatives of $L(x, \alpha)$ to zero gives the solution
Boundary Function
Linear Model
• The boundary function is a linear model in feature space: $y(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x}) + b$
Boundary Function
Hyperplane
Boundary Function
Hyperplane: Parameter Interpretation
[Figure: decision boundary y(x) = 0 separating Class 1 (t = +1, region y(x) > 0) from Class 2 (t = -1, region y(x) < 0)]
• Given training data $\{\mathbf{x}_n, t_n\}$, $n = 1, \ldots, N$, where $t_n \in \{-1, +1\}$
• Let data $\mathbf{x}_n$ be classified to class $t_n = +1$ if $y(\mathbf{x}_n) \geq 0$ and to class $t_n = -1$ if $y(\mathbf{x}_n) < 0$; then all data are classified correctly if they satisfy $t_n y(\mathbf{x}_n) > 0$ for all $n$
• Problem: how to determine the parameters $\mathbf{w}$ and $b$ such that $t_n y(\mathbf{x}_n) > 0$ for all $n$ (a small numeric illustration follows below)
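As a minimal illustration of this classification criterion (the weights, bias and data below are made up for the example, assuming NumPy), the condition $t_n y(\mathbf{x}_n) > 0$ can be checked directly:

```python
import numpy as np

# Illustrative (made-up) linear boundary y(x) = w^T x + b
w = np.array([1.0, -0.5])
b = -1.0

# Illustrative training data x_n with labels t_n in {-1, +1}
X = np.array([[3.0, 1.0], [0.5, 2.0], [2.0, 0.0]])
t = np.array([+1, -1, +1])

y = X @ w + b                      # y(x_n) for every training point
correctly_classified = t * y > 0   # is t_n * y(x_n) > 0 for each n?
print(y, correctly_classified, correctly_classified.all())
```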
• Why maximum?
• Intuitively, the maximum margin is a safe choice: if there is a small error (noise) in the data, it gives the smallest chance of misclassification
• Based on VC theory (1960-1990), the maximum margin provides the best generalization capability
• The (signed) distance of a correctly classified point $\mathbf{x}_n$ to the decision boundary is
\[
\frac{t_n\, y(\mathbf{x}_n)}{\|\mathbf{w}\|} = \frac{t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b)}{\|\mathbf{w}\|}
\]
• The maximum-margin parameters are therefore obtained from
\[
\arg\max_{\mathbf{w},\, b} \left\{ \frac{1}{\|\mathbf{w}\|} \min_n \left[ t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) \right] \right\}
\]
• Because rescaling $\mathbf{w}$ and $b$ does not change the boundary, the point closest to it can be scaled to satisfy $t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) = 1$, so that every point satisfies
\[
t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) \geq 1
\]
• The maximum-margin (hard margin) problem can then be written as:
\[
\min_{\mathbf{w},\, b}\ \frac{1}{2}\|\mathbf{w}\|^2
\quad \text{s.t.} \quad t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) \geq 1, \quad n = 1, \ldots, N
\]
• Introducing slack variables $\xi_n$ gives the soft margin problem:
\[
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}}\ \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{n=1}^{N}\xi_n
\quad \text{s.t.} \quad t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) \geq 1 - \xi_n, \quad \xi_n \geq 0, \quad n = 1, \ldots, N
\]
where the parameter C > 0 controls the trade-off between the amount of data on the wrong side of the margin (narrowing the margin) and the generalization capability (widening the margin)
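As an illustration of this trade-off (a minimal sketch assuming scikit-learn and NumPy are installed; the toy data are made up), the number of support vectors typically grows as C shrinks and the margin widens:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Made-up, slightly overlapping 2-D data for two classes
X = np.vstack([rng.normal(loc=[-1, -1], scale=1.0, size=(50, 2)),
               rng.normal(loc=[+1, +1], scale=1.0, size=(50, 2))])
t = np.array([-1] * 50 + [+1] * 50)

# Small C tolerates margin violations (wider margin, more support vectors);
# large C penalizes them heavily (narrower margin, fewer support vectors)
for C in [0.01, 1.0, 100.0]:
    model = SVC(kernel='linear', C=C).fit(X, t)
    print(f"C={C:>6}: support vectors = {model.n_support_.sum()}")
```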
• The Lagrange function of the soft margin problem is:
\[
L(\mathbf{w}, b, \boldsymbol{\xi}) = \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{n=1}^{N}\xi_n
- \sum_{n=1}^{N} a_n\{t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) - 1 + \xi_n\}
- \sum_{n=1}^{N}\mu_n\xi_n
\]
• The corresponding KKT conditions are:
\[
\begin{aligned}
a_n &\geq 0 && [\text{KKT-S1}] \\
t_n y(\mathbf{x}_n) - 1 + \xi_n &\geq 0 && [\text{KKT-S2}] \\
a_n\{t_n y(\mathbf{x}_n) - 1 + \xi_n\} &= 0 && [\text{KKT-S3}] \\
\mu_n &\geq 0 && [\text{KKT-S4}] \\
\xi_n &\geq 0 && [\text{KKT-S5}] \\
\mu_n \xi_n &= 0 && [\text{KKT-S6}]
\end{aligned}
\]
• Setting the derivatives of the Lagrange function with respect to $\mathbf{w}$, $b$, and $\xi_n$ to zero gives:
\[
\frac{\partial L}{\partial \mathbf{w}} = \mathbf{w} - \sum_{n=1}^{N} a_n t_n \phi(\mathbf{x}_n) = 0
\;\rightarrow\; \mathbf{w} = \sum_{n=1}^{N} a_n t_n \phi(\mathbf{x}_n)
\]
\[
\frac{\partial L}{\partial b} = -\sum_{n=1}^{N} a_n t_n = 0
\;\rightarrow\; \sum_{n=1}^{N} a_n t_n = 0
\]
\[
\frac{\partial L}{\partial \xi_n} = C - a_n - \mu_n = 0
\;\rightarrow\; a_n = C - \mu_n
\]
• Substituting these back into the Lagrange function, it becomes:
\[
\begin{aligned}
L(\mathbf{a}) &= \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} a_n a_m t_n t_m \phi(\mathbf{x}_n)^T\phi(\mathbf{x}_m)
- \sum_{n=1}^{N}\sum_{m=1}^{N} a_n a_m t_n t_m \phi(\mathbf{x}_n)^T\phi(\mathbf{x}_m)
- b\sum_{n=1}^{N} a_n t_n + \sum_{n=1}^{N} a_n \\
&= \sum_{n=1}^{N} a_n - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} a_n a_m t_n t_m\, \phi(\mathbf{x}_n)^T\phi(\mathbf{x}_m) \\
&= \sum_{n=1}^{N} a_n - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} a_n a_m t_n t_m\, k(\mathbf{x}_n, \mathbf{x}_m)
\end{aligned}
\]
where $k(\mathbf{x}_n, \mathbf{x}_m) = \phi(\mathbf{x}_n)^T\phi(\mathbf{x}_m)$ is a kernel function.
• So, the dual form of the soft margin optimization problem is:
\[
\max_{\mathbf{a}}\ L(\mathbf{a}) = \sum_{n=1}^{N} a_n - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} a_n a_m t_n t_m\, k(\mathbf{x}_n, \mathbf{x}_m)
\]
\[
\text{s.t.} \quad 0 \leq a_n \leq C, \quad n = 1, \ldots, N \quad \text{(from [KKT-S1], [KKT-S4], and } a_n = C - \mu_n\text{)}
\]
\[
\qquad\ \sum_{n=1}^{N} a_n t_n = 0 \quad \text{(from } \partial L / \partial b = 0\text{)}
\]
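A minimal numerical sketch of solving this dual problem (assuming NumPy and SciPy are available; the kernel, data and C value below are illustrative), which also recovers the boundary function and the bias described on the following slides:

```python
import numpy as np
from scipy.optimize import minimize

def linear_kernel(x, z):
    return x @ z

def fit_dual_svc(X, t, C, kernel=linear_kernel):
    """Solve max_a sum(a) - 0.5 a^T (tt^T * K) a, s.t. 0 <= a_n <= C, sum(a_n t_n) = 0."""
    N = len(t)
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    Q = np.outer(t, t) * K

    def neg_dual(a):  # minimize the negative of L(a)
        return 0.5 * a @ Q @ a - a.sum()

    res = minimize(neg_dual, x0=np.zeros(N),
                   bounds=[(0.0, C)] * N,
                   constraints=[{'type': 'eq', 'fun': lambda a: a @ t}],
                   method='SLSQP')
    a = res.x

    sv = a > 1e-8                  # support vectors: a_n > 0
    margin = sv & (a < C - 1e-8)   # margin support vectors: 0 < a_n < C

    # Bias: average of t_n - sum_m a_m t_m k(x_n, x_m) over margin support vectors
    b = np.mean([t[n] - sum(a[m] * t[m] * K[n, m] for m in np.where(sv)[0])
                 for n in np.where(margin)[0]])

    def y(x):  # boundary function y(x) = sum_{n in S} a_n t_n k(x, x_n) + b
        return sum(a[n] * t[n] * kernel(x, X[n]) for n in np.where(sv)[0]) + b

    return a, b, y

# Illustrative usage on a tiny separable data set
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
t = np.array([-1.0, -1.0, 1.0, 1.0])
a, b, y = fit_dual_svc(X, t, C=10.0)
print(np.round(a, 3), round(b, 3), y(np.array([2.0, 2.0])))
```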
• After training, the boundary function is:
\[
y(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x}) + b
= \sum_{n=1}^{N} a_n t_n\, \phi(\mathbf{x}_n)^T\phi(\mathbf{x}) + b
= \sum_{n=1}^{N} a_n t_n\, k(\mathbf{x}, \mathbf{x}_n) + b
\]
• Only data with $a_n > 0$ (the support vectors) play a role, so the boundary function can be written as:
\[
y(\mathbf{x}) = \sum_{n \in S} a_n t_n\, k(\mathbf{x}, \mathbf{x}_n) + b
\]
where $S$ is the set of support vector indices.
• The bias value can be determined from the model that contains only the support vectors, namely the training data with $a_n > 0$. Based on [KKT-S3], $a_n\{t_n y(\mathbf{x}_n) - 1 + \xi_n\} = 0$, so
\[
t_n y(\mathbf{x}_n) = 1 - \xi_n
\]
• Using only the support vectors that lie exactly on the margin ($\xi_n = 0$):
\[
t_n y(\mathbf{x}_n) = 1
\]
\[
t_n\Big(\sum_{m \in S} a_m t_m\, k(\mathbf{x}_n, \mathbf{x}_m) + b\Big) = 1
\;\rightarrow\;
b = \frac{1}{N_M}\sum_{n \in M}\Big(t_n - \sum_{m \in S} a_m t_m\, k(\mathbf{x}_n, \mathbf{x}_m)\Big)
\]
where $M$ is the set of margin support vectors and $N_M$ is their number.
Prediction Function
\[
f(\mathbf{x}) = d\big(y(\mathbf{x})\big) = d\Big(\sum_{n \in S} a_n t_n\, k(\mathbf{x}, \mathbf{x}_n) + b\Big)
\]
where $d(\cdot)$ is the decision function that maps $y(\mathbf{x})$ to a class label, e.g. $d(z) = +1$ if $z \geq 0$ and $d(z) = -1$ otherwise.
Process Illustration
• Three data points are given in the following table, where the first two are training data and the third is testing data. Using a linear kernel function k(x, z) = x^T z and C = 0.5, determine the accuracy of the SVC model.

  n    |   1     |   2     |    3
  x_n  | (1,0)^T | (5,0)^T | (4.5, 2)^T
  t_n  |  -1     |   1     |    1

Answer:
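The worked computation from the original slides is not reproduced here; as an illustrative cross-check (a minimal sketch assuming scikit-learn and NumPy are available), the same setup can be run directly:

```python
import numpy as np
from sklearn.svm import SVC

# Training data (n = 1, 2) and testing data (n = 3) from the table above
X_train = np.array([[1.0, 0.0], [5.0, 0.0]])
t_train = np.array([-1, 1])
X_test = np.array([[4.5, 2.0]])
t_test = np.array([1])

# Linear kernel k(x, z) = x^T z with C = 0.5
model = SVC(kernel='linear', C=0.5)
model.fit(X_train, t_train)

print(model.predict(X_test))        # predicted label for the test point
print(model.score(X_test, t_test))  # accuracy on the testing data
```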
Kernel Function
Definition
Kernel Function
Example
• The kernel $k(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^T\mathbf{z})^2$ corresponds to an inner product in a higher-dimensional feature space.
Proof:
Let $\mathbf{x} = (x_1, x_2)$ and $\mathbf{z} = (z_1, z_2)$; then
\[
(\mathbf{x}^T\mathbf{z})^2 = (x_1 z_1 + x_2 z_2)^2
= x_1^2 z_1^2 + 2 x_1 z_1 x_2 z_2 + x_2^2 z_2^2
= (x_1^2,\ \sqrt{2}\, x_1 x_2,\ x_2^2)^T (z_1^2,\ \sqrt{2}\, z_1 z_2,\ z_2^2)
= \phi(\mathbf{x})^T\phi(\mathbf{z})
\]
Kernel Function
Practicality
Example:
– Let $\mathbf{x} = (1,2)^T$ and $\mathbf{z} = (3,5)^T$ be vectors in the original space $R^2$
– $\phi(\mathbf{x}) = (x_1^2,\ \sqrt{2}\, x_1 x_2,\ x_2^2)^T$ is the mapping function from the original space $R^2$ to the higher-dimensional space $R^3$
– The inner product of $\mathbf{x}$ and $\mathbf{z}$ in the higher-dimensional space $R^3$ is $\phi(\mathbf{x})^T\phi(\mathbf{z}) = 169$
– The same inner product can be determined directly in the original space $R^2$ using the kernel function, i.e., $k(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^T\mathbf{z})^2 = 169$ (verified numerically in the sketch below)
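A quick numeric check of this example (a minimal sketch assuming NumPy), comparing the explicit feature-space inner product with the kernel shortcut:

```python
import numpy as np

def phi(v):
    """Explicit feature map phi(v) = (v1^2, sqrt(2)*v1*v2, v2^2)."""
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 5.0])

inner_feature_space = phi(x) @ phi(z)   # inner product computed in R^3
kernel_value = (x @ z) ** 2             # k(x, z) = (x^T z)^2, computed in R^2

print(inner_feature_space, kernel_value)  # both equal 169
```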
Kernel Function
Practicality
Example:
The squared distance between two vectors in the higher-dimensional space can also be found directly from the original space using the kernel function, i.e.:
\[
\|\phi(\mathbf{x}) - \phi(\mathbf{z})\|^2
= \phi(\mathbf{x})^T\phi(\mathbf{x}) + \phi(\mathbf{z})^T\phi(\mathbf{z}) - 2\,\phi(\mathbf{x})^T\phi(\mathbf{z})
= k(\mathbf{x}, \mathbf{x}) + k(\mathbf{z}, \mathbf{z}) - 2\,k(\mathbf{x}, \mathbf{z})
\]
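Continuing the same toy example (a small sketch assuming NumPy), the squared distance agrees whether it is computed through the explicit mapping or through kernel evaluations alone:

```python
import numpy as np

def phi(v):
    # Explicit feature map for the polynomial kernel k(x, z) = (x^T z)^2
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

def k(a, b):
    return (a @ b) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 5.0])

dist_explicit = np.sum((phi(x) - phi(z)) ** 2)     # ||phi(x) - phi(z)||^2 in R^3
dist_kernel = k(x, x) + k(z, z) - 2 * k(x, z)      # the same value via kernels only
print(dist_explicit, dist_kernel)
```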
Kernel Function
Popular Examples
• Linear:
  $k(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T\mathbf{x}_j$
• Polynomial:
  $k(\mathbf{x}_i, \mathbf{x}_j) = (\gamma\, \mathbf{x}_i^T\mathbf{x}_j + r)^d$, where $\gamma > 0$
• Radial basis function (RBF):
  $k(\mathbf{x}_i, \mathbf{x}_j) = \exp\{-\gamma\, \|\mathbf{x}_i - \mathbf{x}_j\|^2\}$, where $\gamma > 0$
• Sigmoid:
  $k(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\rho\, \mathbf{x}_i^T\mathbf{x}_j + r)$,
  where $\tanh(a) = 2\sigma(2a) - 1$ and $\sigma(a) = \dfrac{1}{1 + \exp(-a)}$
(implementations of these four kernels are sketched below)
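A compact reference implementation of the four kernels above (a minimal sketch assuming NumPy; the parameter names gamma, r, d and rho mirror the symbols in the formulas):

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z

def polynomial_kernel(x, z, gamma=1.0, r=1.0, d=3):
    # (gamma * x^T z + r)^d, with gamma > 0
    return (gamma * (x @ z) + r) ** d

def rbf_kernel(x, z, gamma=1.0):
    # exp(-gamma * ||x - z||^2), with gamma > 0
    return np.exp(-gamma * np.sum((x - z) ** 2))

def sigmoid_kernel(x, z, rho=1.0, r=0.0):
    # tanh(rho * x^T z + r)
    return np.tanh(rho * (x @ z) + r)

x, z = np.array([1.0, 2.0]), np.array([3.0, 5.0])
for kern in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
    print(kern.__name__, kern(x, z))
```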
SVC vs. SVR
• SVR (Support Vector Regression) fits a regression function $y(\mathbf{x})$ with an $\epsilon$-insensitive tube; its optimization problem is:
\[
\min_{\mathbf{w},\, b}\ C\sum_{n=1}^{N}(\xi_n + \bar{\xi}_n) + \frac{1}{2}\|\mathbf{w}\|^2
\]
\[
\text{s.t.} \quad t_n \leq y(\mathbf{x}_n) + \epsilon + \xi_n, \quad
t_n \geq y(\mathbf{x}_n) - \epsilon - \bar{\xi}_n, \quad
\xi_n \geq 0,\ \bar{\xi}_n \geq 0
\]
• Side by side, the SVC and SVR primal problems are:
SVC:
\[
\min_{\mathbf{w},\, b}\ \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{n=1}^{N}\xi_n
\quad \text{s.t.} \quad t_n\, y(\mathbf{x}_n) \geq 1 - \xi_n, \quad \xi_n \geq 0
\]
SVR:
\[
\min_{\mathbf{w},\, b}\ C\sum_{n=1}^{N}(\xi_n + \bar{\xi}_n) + \frac{1}{2}\|\mathbf{w}\|^2
\quad \text{s.t.} \quad t_n \leq y(\mathbf{x}_n) + \epsilon + \xi_n, \quad
t_n \geq y(\mathbf{x}_n) - \epsilon - \bar{\xi}_n, \quad
\xi_n \geq 0,\ \bar{\xi}_n \geq 0
\]
• Equivalently, SVR replaces the regularized squared-error objective (left) with the $\epsilon$-insensitive, slack-variable form (right):
\[
\min_{\mathbf{w},\, b}\ C\sum_{n=1}^{N}\big(t_n - y(\mathbf{x}_n)\big)^2 + \frac{1}{2}\|\mathbf{w}\|^2
\qquad \Longrightarrow \qquad
\min_{\mathbf{w},\, b}\ C\sum_{n=1}^{N}(\xi_n + \bar{\xi}_n) + \frac{1}{2}\|\mathbf{w}\|^2
\]
\[
\text{s.t.} \quad t_n \leq y(\mathbf{x}_n) + \epsilon + \xi_n, \quad
t_n \geq y(\mathbf{x}_n) - \epsilon - \bar{\xi}_n, \quad
\xi_n \geq 0,\ \bar{\xi}_n \geq 0
\]
• The KKT conditions for SVR include:
\[
\begin{aligned}
a_n(\epsilon + \xi_n + y_n - t_n) &= 0 && [\text{KKT-1}] \\
\bar{a}_n(\epsilon + \bar{\xi}_n - y_n + t_n) &= 0 && [\text{KKT-2}] \\
(C - a_n)\,\xi_n &= 0 && [\text{KKT-3}] \\
(C - \bar{a}_n)\,\bar{\xi}_n &= 0 && [\text{KKT-4}]
\end{aligned}
\]
where $y_n$ denotes $y(\mathbf{x}_n)$.
• Substituting the stationarity conditions back into the Lagrange function gives the dual form of the SVR problem, which is maximized with respect to $a_n$ and $\bar{a}_n$.
• Only data with $a_n \neq 0$ or $\bar{a}_n \neq 0$ (the support vectors) play a role in the SVR model, so it can be written as:
\[
y(\mathbf{x}) = \sum_{n \in S}(a_n - \bar{a}_n)\, k(\mathbf{x}, \mathbf{x}_n) + b
\]
• The bias can be determined from a support vector with $0 < a_n < C$, using:
(1) $(C - a_n)\,\xi_n = 0$ [KKT-3]
(2) $a_n(\epsilon + \xi_n + y_n - t_n) = 0$ [KKT-1]
(3) $y(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x}) + b$
• From (1), $\xi_n = 0$; from (2), $y(\mathbf{x}_n) = t_n - \epsilon$; so from (3):
\[
b = y(\mathbf{x}_n) - \mathbf{w}^T\phi(\mathbf{x}_n)
= t_n - \epsilon - \mathbf{w}^T\phi(\mathbf{x}_n)
= t_n - \epsilon - \sum_{m \in S}(a_m - \bar{a}_m)\, k(\mathbf{x}_n, \mathbf{x}_m)
\]
Prediction Function
• Given a new data point $\mathbf{z}$, its prediction is determined by evaluating the regression function $y(\mathbf{x})$ at that point, i.e., $y(\mathbf{z})$
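As a closing illustration (a minimal sketch assuming scikit-learn and NumPy; the data and the C, epsilon values are made up), an SVR model can be fitted and then evaluated at a new point z in exactly this way:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Made-up 1-D regression data: a noisy sine curve
X = np.linspace(0, 2 * np.pi, 40).reshape(-1, 1)
t = np.sin(X).ravel() + 0.1 * rng.normal(size=40)

# epsilon sets the width of the insensitive tube, C the trade-off with ||w||
model = SVR(kernel='rbf', C=1.0, epsilon=0.1)
model.fit(X, t)

# Prediction for a new data point z
z = np.array([[1.5]])
print(model.predict(z))      # y(z)
print(len(model.support_))   # number of support vectors used by the model
```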