Support Vector Machine

The document provides an overview of Support Vector Machines (SVM), focusing on their application in classification and regression problems. It discusses the formulation of SVM as a constrained optimization problem, including methods like the substitution method and Lagrange method for solving these problems. Additionally, it covers the concepts of maximum margin and soft margin in SVM, detailing the optimization processes involved in determining the hyperplane for classification.


2/27/2024

SCMA801204 – Advanced Machine Learning

Support Vector Machine

Dr. rer. nat. Hendri Murfi

Machine Learning Group


Department of Mathematics, Universitas Indonesia – Depok 16424
Telp. +62-21-7862719/7863439, Fax. +62-21-7863439, Email. hendri@sci.ui.ac.id

Support Vector Machine

• Initially, support vector machines (SVM) were used to solve classification problems; the method was later extended to regression and other problems
• SVM for classification problems is often referred to as support vector classification (SVC), while for regression problems it is called support vector regression (SVR)
• In general, the learning process in SVM is formulated as a bound-constrained optimization problem


Support Vector Machine


Constrained Optimization Theory

Constrained Optimization Problems


Form and Method

• The general form of a constrained optimization problem is:

  \min_{\mathbf{x}} f(\mathbf{x}) \quad \text{s.t.} \quad c_i(\mathbf{x}) = 0, \; d_j(\mathbf{x}) \ge 0

  where f(x) is the objective function and c_i(x), d_j(x) are the constraint functions
• A simple method for solving a constrained optimization problem is to substitute the variables of the constraint functions into the objective function, so that the optimal value of the objective function can be determined.


Substitution Method

Example:
Determine the solution to the following constrained optimization
problem:

Answer:
- From the constraint it is obtained that x_2 = -(8/3) x_1
- Substituting into the objective function: f(x_1) = -(8/3) x_1^3 - ln(x_1)
- Setting the derivative of f to zero gives x_1 = -1/2
- So, the solution is x = (-1/2, 4/3)^T
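The stationarity step can be checked symbolically. A minimal sympy sketch, assuming only the substituted objective f(x_1) = -(8/3) x_1^3 - ln(x_1) stated above (the original problem statement is not reproduced in this text):

    import sympy as sp

    x1 = sp.symbols('x1')

    # Objective after substituting the constraint x2 = -(8/3) x1, as stated above
    f = sp.Rational(-8, 3) * x1**3 - sp.log(x1)

    # Set the derivative to zero and keep the real stationary point
    roots = [r for r in sp.solve(sp.diff(f, x1), x1) if r.is_real]
    x1_opt = roots[0]                       # -1/2
    x2_opt = sp.Rational(-8, 3) * x1_opt    # 4/3
    print(x1_opt, x2_opt)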

Lagrange Method

• The weakness of the substitution method is that the variables in the constraint functions cannot always be expressed in terms of the other variables, so the optimal value of the objective function cannot always be determined this way.
• An alternative is the Lagrange method:

  where L(x, α, μ) is the Lagrange function, α and μ are the Lagrange multipliers, and the three conditions on the constraints are called the Karush-Kuhn-Tucker (KKT) conditions


Lagrange Method
Example:
Determine the solution to the following constrained optimization problem:

Answer:
- Lagrange Function:
- Setting the derivatives of L(x, α) to zero gives three equations
- From equations 1 and 2: 2x_1 = x_2
- Substituting into equation 3: x_1 = 1/3
- So, the solution is x = (1/3, 2/3)^T
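The original objective and constraint are not reproduced in this text. A minimal sympy sketch of the same procedure, using a hypothetical objective f(x) = x_1^2 + x_2^2/2 and equality constraint x_1 + x_2 = 1, chosen here only because they reproduce the stated solution x = (1/3, 2/3)^T:

    import sympy as sp

    x1, x2, alpha = sp.symbols('x1 x2 alpha')

    # Hypothetical objective and equality constraint (not from the slides),
    # chosen so that the stated solution x = (1/3, 2/3) is reproduced
    f = x1**2 + sp.Rational(1, 2) * x2**2
    g = x1 + x2 - 1

    # Lagrange function and its stationarity conditions
    L = f - alpha * g
    equations = [sp.diff(L, v) for v in (x1, x2, alpha)]

    print(sp.solve(equations, [x1, x2, alpha]))   # {x1: 1/3, x2: 2/3, alpha: 2/3}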

Support Vector Machine


Support Vector Classification


Two-Class Classification Problems


Problem Formulation

Given training data {x_n, t_n}, n = 1, ..., N, where t_n ∈ {-1, +1}

[Figure: two classes, Class 1 (+1) and Class 2 (-1), separated by a boundary function]

• Problem: how to obtain a model y(x) that represents the training data
• General method: find a model y(x) that acts as a boundary function between the two classes, so that every training data point x_n lies in the appropriate class

Boundary Function
Linear Model

• SVC uses a linear model as the boundary function, with the general form

  y(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x}) + b

  where x is the input vector, w is the weight vector, φ(x) is the basis function, and b is the bias


Boundary Function
Hyperplane

• If the basis function is a linear function, φ(x) = x, then the boundary function is a hyperplane, i.e.:

  y(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b


Boundary Function
Hyperplane: Parameter Interpretation

• Parameter w determines the orientation of the hyperplane
  Proof: if x_A and x_B lie on the hyperplane, then y(x_A) = y(x_B) = 0, so w^T(x_A - x_B) = 0; hence w is orthogonal to every vector lying in the hyperplane
• Parameter b determines the location of the hyperplane
  Proof: the distance of any vector x to the hyperplane, measured in the direction of w, is y(x)/||w||, so the distance from the origin to the hyperplane is -b/||w||
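Both statements are easy to check numerically. A minimal sketch, using the hypothetical hyperplane parameters w = (3, 4)^T and b = -5:

    import numpy as np

    # Hypothetical hyperplane y(x) = w^T x + b
    w = np.array([3.0, 4.0])
    b = -5.0

    # Two points on the hyperplane (y = 0): w is orthogonal to their difference
    xA = np.array([3.0, -1.0])      # 3*3 + 4*(-1) - 5 = 0
    xB = np.array([-1.0, 2.0])      # 3*(-1) + 4*2 - 5 = 0
    print(np.dot(w, xA - xB))       # 0.0

    # Signed distance of a point to the hyperplane is y(x)/||w||,
    # and the distance from the origin is -b/||w||
    x = np.array([2.0, 2.0])
    print((np.dot(w, x) + b) / np.linalg.norm(w))   # 1.8
    print(-b / np.linalg.norm(w))                   # 1.0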


Support Vector Classification


Problem Formulation

Given training data {x_n, t_n}, n = 1, ..., N, where t_n ∈ {-1, +1}

[Figure: hyperplane y(x) = 0 separating the region y(x) > 0 (Class 1, t = +1) from the region y(x) < 0 (Class 2, t = -1)]

• Suppose a data point x_n is assigned to class t_n = +1 if y(x_n) ≥ 0 and to class t_n = -1 if y(x_n) < 0; then all data are classified correctly if they satisfy t_n y(x_n) ≥ 0 for all n
• Problem: how to determine the parameters w and b such that t_n y(x_n) ≥ 0 for all n


Support Vector Classification


Maximum Margin

• The margin is defined as the shortest distance between the hyperplane and the training data
• To determine the boundary function, SVC selects the hyperplane that has the maximum margin


Support Vector Classification


Maximum Margin

• Why maximum?
• Intuitively, the maximum margin is a safe choice: if there is a small error in the data, it gives the smallest chance of misclassification
• According to VC theory (1960-1990), the maximum margin provides the best generalization capability


Support Vector Classification


Maximum Margin

• Assuming all the data can be separated linearly (linearly separable), the distance of any data point x_n to the hyperplane is

  \frac{t_n y(\mathbf{x}_n)}{\lVert\mathbf{w}\rVert} = \frac{t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b)}{\lVert\mathbf{w}\rVert}

• So, the selection of the hyperplane that has the maximum margin can be formulated as follows:

  \max_{\mathbf{w},b}\ \left\{ \frac{1}{\lVert\mathbf{w}\rVert} \min_n \left[ t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) \right] \right\}


Support Vector Classification


Maximum Margin

• The solution of the previous optimization problem would be very complex, so it is converted into an equivalent problem that is easier to solve

• One method is to use the canonical form of the hyperplane,

  t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) = 1

  for the data closest to the hyperplane. Then all training data are classified correctly if they satisfy

  t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) \ge 1


Support Vector Classification


Quadratic Programming

• So, the problem of determining the hyperplane with the maximum margin, namely

  \max_{\mathbf{w},b}\ \left\{ \frac{1}{\lVert\mathbf{w}\rVert} \min_n \left[ t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) \right] \right\}

  becomes an optimization problem with the following constraints:

  \min_{\mathbf{w},b}\ \frac{1}{2}\lVert\mathbf{w}\rVert^2
  \text{s.t. } t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) \ge 1, \quad n = 1, \dots, N

• In other words, determining the values of the parameters w and b becomes a quadratic programming problem, namely minimizing a quadratic function subject to linear inequality constraints


Support Vector Classification


Soft Margin

• In the previous problem formulation, also known as the hard margin, all training data are assumed to be separable by a linear boundary function
• In real problems, this condition often cannot be met because the data of the two classes overlap
• To solve this problem, the approach used is the soft margin, which allows some data to be in the "wrong class"


Support Vector Classification


Soft Margin

• To realize the soft margin, a slack variable ξ_n ≥ 0 is introduced, with one slack variable for each training data point
• The slack variable is defined as ξ_n = |t_n - y(x_n)| for data on the wrong side of the margin boundary, and ξ_n = 0 otherwise, so that:
  - ξ_n = 0: data that lie on or outside the margin, on the correct side
  - 0 < ξ_n ≤ 1: data that lie inside the margin, but on the correct side of the boundary
  - ξ_n > 1: data that lie on the wrong side of the boundary (misclassified)


Support Vector Classification


Soft Margin

• The constrained optimization problem for the hard margin is:

  \min_{\mathbf{w},b}\ \frac{1}{2}\lVert\mathbf{w}\rVert^2
  \text{s.t. } t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) \ge 1, \quad n = 1, \dots, N

  Then, the constrained optimization problem for the soft margin is:

  \min_{\mathbf{w},b}\ \frac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{n=1}^{N}\xi_n
  \text{s.t. } t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) \ge 1 - \xi_n, \quad n = 1, \dots, N
  \xi_n \ge 0

  where the parameter C > 0 controls the trade-off between the amount of data in the "wrong class" (narrowing the margin) and the generalization capability (widening the margin)
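As an illustration of the role of C, a minimal scikit-learn sketch on synthetic, slightly overlapping data (the dataset and the C values below are hypothetical, not taken from the slides):

    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    # Hypothetical two-class data with some overlap between the classes
    X, t = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

    for C in (0.01, 1.0, 100.0):
        # Small C tolerates more slack (wider margin, more margin violations);
        # large C penalizes violations heavily (narrower margin)
        model = SVC(kernel='linear', C=C).fit(X, t)
        print(C, model.n_support_)   # number of support vectors per class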

Solving Soft Margin


Lagrange Method

• To solve the constrained optimization problem of the soft margin, the Lagrange method can be used
• The Lagrange function of the constrained optimization problem is:

  L(\mathbf{w}, b, \boldsymbol{\xi}) = \frac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{n=1}^{N}\xi_n - \sum_{n=1}^{N} a_n\{t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) - 1 + \xi_n\} - \sum_{n=1}^{N}\mu_n\xi_n

  where a_n ≥ 0 and μ_n ≥ 0 are the Lagrange multipliers for each constraint


Solving Soft Margin


Lagrange Method

  L(\mathbf{w}, b, \boldsymbol{\xi}) = \frac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{n=1}^{N}\xi_n - \sum_{n=1}^{N} a_n\{t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) - 1 + \xi_n\} - \sum_{n=1}^{N}\mu_n\xi_n

The Karush-Kuhn-Tucker (KKT) conditions, i.e., the conditions that must be satisfied by the solution of the constrained optimization problem, are:

  a_n \ge 0                                      [KKT-S1]
  t_n y(\mathbf{x}_n) - 1 + \xi_n \ge 0          [KKT-S2]
  a_n\{t_n y(\mathbf{x}_n) - 1 + \xi_n\} = 0     [KKT-S3]
  \mu_n \ge 0                                    [KKT-S4]
  \xi_n \ge 0                                    [KKT-S5]
  \mu_n \xi_n = 0                                [KKT-S6]


Solving Soft Margin


Lagrange Method

  L(\mathbf{w}, b, \boldsymbol{\xi}) = \frac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{n=1}^{N}\xi_n - \sum_{n=1}^{N} a_n\{t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) - 1 + \xi_n\} - \sum_{n=1}^{N}\mu_n\xi_n

• Setting the derivatives of L(w, b, ξ) with respect to w, b, and ξ_n equal to zero gives:

  \frac{\partial L}{\partial \mathbf{w}} = \mathbf{w} - \sum_{n=1}^{N} a_n t_n \phi(\mathbf{x}_n) = 0 \;\rightarrow\; \mathbf{w} = \sum_{n=1}^{N} a_n t_n \phi(\mathbf{x}_n)

  \frac{\partial L}{\partial b} = -\sum_{n=1}^{N} a_n t_n = 0 \;\rightarrow\; \sum_{n=1}^{N} a_n t_n = 0

  \frac{\partial L}{\partial \xi_n} = C - a_n - \mu_n = 0 \;\rightarrow\; a_n = C - \mu_n


Solving Soft Margin


Lagrange Method

• Substituting these derivative results into the Lagrange function

  L(\mathbf{w}, b, \boldsymbol{\xi}) = \frac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{n=1}^{N}\xi_n - \sum_{n=1}^{N} a_n\{t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) - 1 + \xi_n\} - \sum_{n=1}^{N}\mu_n\xi_n

  it becomes:

  L(\mathbf{a}) = \frac{1}{2}\sum_{n}\sum_{m} a_n a_m t_n t_m \phi(\mathbf{x}_n)^T\phi(\mathbf{x}_m) - \sum_{n}\sum_{m} a_n a_m t_n t_m \phi(\mathbf{x}_n)^T\phi(\mathbf{x}_m) - b\sum_{n} a_n t_n + \sum_{n} a_n

        = \sum_{n} a_n - \frac{1}{2}\sum_{n}\sum_{m} a_n a_m t_n t_m \phi(\mathbf{x}_n)^T\phi(\mathbf{x}_m)

        = \sum_{n} a_n - \frac{1}{2}\sum_{n}\sum_{m} a_n a_m t_n t_m k(\mathbf{x}_n, \mathbf{x}_m)

  where k(\mathbf{x}_n, \mathbf{x}_m) = \phi(\mathbf{x}_n)^T\phi(\mathbf{x}_m) is a kernel function


Solving Soft Margin


Dual Form

• So, the dual form of the soft-margin optimization problem is:

  \max_{\mathbf{a}}\ L(\mathbf{a}) = \sum_{n=1}^{N} a_n - \frac{1}{2}\sum_{n}\sum_{m} a_n a_m t_n t_m k(\mathbf{x}_n, \mathbf{x}_m)
  \text{s.t. } 0 \le a_n \le C, \quad n = 1, \dots, N     (from [KKT-S1], [KKT-S4], and a_n = C - μ_n)
  \sum_{n=1}^{N} a_n t_n = 0                              (from ∂L/∂b = 0)

• The resulting dual form is again a quadratic programming problem, but with simpler constraints; it is also known as a bound-constrained optimization problem
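For small datasets, this bound-constrained dual can be solved with a generic optimizer. A minimal, didactic sketch using scipy with a linear kernel on hypothetical toy data (dedicated solvers such as SMO or LIBSVM are used in practice):

    import numpy as np
    from scipy.optimize import minimize

    def svc_dual_fit(X, t, C=1.0):
        """Didactic sketch: solve the soft-margin dual (linear kernel) with a
        generic bound-constrained optimizer instead of a dedicated SVM solver."""
        N = len(t)
        K = X @ X.T                   # linear kernel matrix k(x_n, x_m)
        Q = np.outer(t, t) * K        # t_n t_m k(x_n, x_m)

        # Minimize the negative dual objective: 1/2 a^T Q a - sum(a)
        obj = lambda a: 0.5 * a @ Q @ a - a.sum()
        grad = lambda a: Q @ a - np.ones(N)

        res = minimize(obj, np.zeros(N), jac=grad, method='SLSQP',
                       bounds=[(0.0, C)] * N,
                       constraints={'type': 'eq', 'fun': lambda a: a @ t})
        a = res.x

        sv = a > 1e-6                      # support vectors: a_n > 0
        m = sv & (a < C - 1e-6)            # margin support vectors: 0 < a_n < C
        w = (a[sv] * t[sv]) @ X[sv]        # w = sum_n a_n t_n x_n (linear kernel)
        b = np.mean(t[m] - X[m] @ w)       # bias averaged over margin support vectors
        return w, b, a

    # Hypothetical, linearly separable toy data
    X = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 0.0], [4.0, 1.0]])
    t = np.array([-1.0, -1.0, 1.0, 1.0])
    w, b, a = svc_dual_fit(X, t, C=0.5)
    print(w, b)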


Solving Soft Margin


Dual Form: Weight

• Let the solution of the dual-form quadratic programming problem be a; then (from ∂L/∂w = 0):

  \mathbf{w} = \sum_{n=1}^{N} a_n t_n \phi(\mathbf{x}_n)

  y(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x}) + b = \sum_{n=1}^{N} a_n t_n \phi(\mathbf{x}_n)^T\phi(\mathbf{x}) + b = \sum_{n=1}^{N} a_n t_n k(\mathbf{x}, \mathbf{x}_n) + b

• Only data with a_n > 0 (the support vectors) play a role, so the boundary function can be written as:

  y(\mathbf{x}) = \sum_{n \in S} a_n t_n k(\mathbf{x}, \mathbf{x}_n) + b

  where S is the set of indices of the support vectors


Solving Soft Margin


Dual Form: Support Vectors

• From the stationarity condition a_n = C - μ_n and the dual constraint 0 ≤ a_n ≤ C, the support vectors satisfy 0 < a_n ≤ C
• If a_n < C, then μ_n > 0, and by [KKT-S6] (μ_n ξ_n = 0) it follows that ξ_n = 0. These support vectors are training data that lie exactly on the margin
• If a_n = C, then μ_n = 0, and [KKT-S6] no longer forces ξ_n = 0, so ξ_n may be nonzero. These support vectors are training data that lie inside the margin, either classified correctly (ξ_n ≤ 1) or incorrectly (ξ_n > 1)


Solving Soft Margin


Dual Form: Bias

• The bias value can be determined from a model that only contains the support vectors, namely the training data with a_n > 0. Based on [KKT-S3], a_n{t_n y(x_n) - 1 + ξ_n} = 0, so for these data

  t_n y(\mathbf{x}_n) = 1 - \xi_n

• Using only the support vectors located on the margin (ξ_n = 0),

  t_n y(\mathbf{x}_n) = 1

  t_n\left(\sum_{m \in S} a_m t_m k(\mathbf{x}_n, \mathbf{x}_m) + b\right) = 1
  \;\rightarrow\;
  b = \frac{1}{N_M}\sum_{n \in M}\left(t_n - \sum_{m \in S} a_m t_m k(\mathbf{x}_n, \mathbf{x}_m)\right)

  where M is the set of indices of the support vectors with 0 < a_n < C and N_M is the number of such support vectors (averaging over M gives a more stable estimate of b)


Solving Soft Margin


Method

• The main part of the SVC learning problem is solving a constrained optimization problem in the form of a bound-constrained optimization
• Several methods have been developed to solve this optimization problem, such as:
  – SMO
  – SVMLight
  – LibSVM
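For reference, scikit-learn's SVC class is built on LIBSVM, so in practice this dual problem is rarely solved by hand. A minimal usage sketch on hypothetical data:

    import numpy as np
    from sklearn.svm import SVC

    # Hypothetical training data
    X = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 0.0], [4.0, 1.0]])
    t = np.array([-1, -1, 1, 1])

    # SVC wraps LIBSVM; C controls the soft margin, kernel selects k(x, x')
    model = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X, t)

    print(model.support_)      # indices of the support vectors
    print(model.dual_coef_)    # a_n * t_n for the support vectors
    print(model.intercept_)    # bias b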


Prediction Function

• After the learning process, the resulting model can be used to classify new data x using the decision function

  f(\mathbf{x}) = d(y(\mathbf{x})) = d\left(\sum_{n \in S} a_n t_n k(\mathbf{x}, \mathbf{x}_n) + b\right)

  where d(·) assigns the class label according to the sign of its argument, i.e., d(y) = +1 if y ≥ 0 and d(y) = -1 otherwise


Process Illustration

• Three data points are given in the following table, where the first two are training data and the third is testing data. Using the linear kernel function k(x, z) = x^T z and C = 0.5, determine the accuracy of the SVC model.

  n     1          2          3
  x_n   (1, 0)^T   (5, 0)^T   (4.5, 2)^T
  t_n   -1         1          1
Answer:
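The worked answer itself is not reproduced in this text. A minimal scikit-learn sketch that carries out the same computation (linear kernel, C = 0.5, train on the first two data points, test on the third):

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score

    X_train = np.array([[1.0, 0.0], [5.0, 0.0]])
    t_train = np.array([-1, 1])
    X_test = np.array([[4.5, 2.0]])
    t_test = np.array([1])

    # Linear kernel k(x, z) = x^T z with C = 0.5, as in the illustration
    model = SVC(kernel='linear', C=0.5).fit(X_train, t_train)

    print(model.coef_, model.intercept_)                  # hyperplane parameters w and b
    print(accuracy_score(t_test, model.predict(X_test)))  # accuracy on the test point

With these two training points the learned hyperplane should be approximately w = (0.5, 0)^T and b = -1.5, so the test point (4.5, 2)^T falls on the positive side and is classified as +1.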


Practical Usage Tips

• It is highly recommended to scale the data, for example to [0, 1] or [-1, 1], or to standardize the data, for example to mean 0 and variance 1. The same scaling or standardization must be applied to both the training data and the testing data
• The recommended kernel function is the RBF function, so there are two parameters that must be optimized at the model selection stage, namely C and the gamma parameter of the RBF function (Hsu, 2013)
• In practice, a grid search for the optimal C and gamma over the interval 10^-3 to 10^3 is sufficient. If the best parameters lie on a boundary of the grid, the grid is expanded in that direction for the next search
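A minimal sketch of such a grid search with scikit-learn (the dataset is hypothetical; the grid follows the 10^-3 to 10^3 range suggested above, and scaling is placed inside the pipeline so it is fitted on the training folds only):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Hypothetical classification data
    X, t = make_classification(n_samples=300, n_features=10, random_state=0)

    # Standardization + RBF-kernel SVC in one pipeline
    pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))

    param_grid = {'svc__C': np.logspace(-3, 3, 7),
                  'svc__gamma': np.logspace(-3, 3, 7)}

    search = GridSearchCV(pipe, param_grid, cv=5).fit(X, t)
    print(search.best_params_, search.best_score_)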


Support Vector Machine


Kernel Function

Kernel Function
Definition

• The kernel function is a function k that, for all input vectors x and z, satisfies the condition

  k(\mathbf{x}, \mathbf{z}) = \phi(\mathbf{x})^T\phi(\mathbf{z})

  where φ(·) is a mapping function
• In other words, the kernel function is an inner product function in the mapping space


Kernel Function
Example

Show that k(x, z) = (x^T z)^2 is a kernel function for x, z ∈ R^2

Proof:
Let x = (x_1, x_2) and z = (z_1, z_2); then

  (\mathbf{x}^T\mathbf{z})^2 = (x_1 z_1 + x_2 z_2)^2
                             = x_1^2 z_1^2 + 2 x_1 z_1 x_2 z_2 + x_2^2 z_2^2
                             = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)(z_1^2, \sqrt{2}\, z_1 z_2, z_2^2)^T
                             = \phi(\mathbf{x})^T\phi(\mathbf{z})

So, k(x, z) = (x^T z)^2 is a kernel function with mapping function φ(x) = (x_1^2, √2 x_1 x_2, x_2^2)^T, namely a mapping function from R^2 to R^3


Kernel Function
Practicality

• The kernel function allows us to calculate inner products in a higher-dimensional space directly from the original space, without explicitly projecting the data into that higher-dimensional space.

Example:
  - Let x = (1, 2)^T and z = (3, 5)^T be vectors in the original space R^2
  - φ(x) = (x_1^2, √2 x_1 x_2, x_2^2)^T is the mapping function from the original space R^2 to the higher-dimensional space R^3
  - The inner product of x and z in the higher-dimensional space R^3 is φ(x)^T φ(z) = 169
  - This inner product can be determined directly from the original space R^2 using the kernel function, i.e., k(x, z) = (x^T z)^2 = 169
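This calculation is easy to verify numerically; a minimal sketch:

    import numpy as np

    x = np.array([1.0, 2.0])
    z = np.array([3.0, 5.0])

    # Explicit mapping phi(x) = (x1^2, sqrt(2) x1 x2, x2^2) into R^3
    phi = lambda v: np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

    print(phi(x) @ phi(z))    # 169.0, inner product computed in the mapped space
    print((x @ z) ** 2)       # 169.0, kernel evaluated in the original space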

Kernel Function
Practicality

• Meanwhile, the inner product is a very important operation because it is closely related to the geometric quantities involved in building a model

Example:
The distance between two vectors in the higher-dimensional space can be found directly from the original space using the kernel function, i.e.:

  \lVert\phi(\mathbf{x}) - \phi(\mathbf{z})\rVert^2 = \phi(\mathbf{x})^T\phi(\mathbf{x}) + \phi(\mathbf{z})^T\phi(\mathbf{z}) - 2\,\phi(\mathbf{x})^T\phi(\mathbf{z}) = k(\mathbf{x}, \mathbf{x}) + k(\mathbf{z}, \mathbf{z}) - 2\,k(\mathbf{x}, \mathbf{z})
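The same identity can be checked numerically with the polynomial kernel from the previous example; a minimal sketch:

    import numpy as np

    x = np.array([1.0, 2.0])
    z = np.array([3.0, 5.0])

    k = lambda a, c: (a @ c) ** 2     # kernel k(x, z) = (x^T z)^2
    phi = lambda v: np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

    # Squared distance in the mapped space, computed two ways
    print(np.sum((phi(x) - phi(z)) ** 2))     # explicit mapping:  843.0
    print(k(x, x) + k(z, z) - 2 * k(x, z))    # kernel only:       843.0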


Kernel Function
Practicality

• In other words, the kernel function allows us to process data in a higher-dimensional space without explicitly projecting the data into it
• Thus, data that cannot be separated linearly (non-linearly separable) in the original space are expected to become linearly separable in the higher-dimensional space
• The technique of introducing kernel functions into a model in this way is known as the kernel trick


Kernel Function
Popular Examples

• Linear:
  k(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T\mathbf{x}_j
• Polynomial:
  k(\mathbf{x}_i, \mathbf{x}_j) = (\gamma\,\mathbf{x}_i^T\mathbf{x}_j + r)^d, \quad \gamma > 0
• Radial basis function (RBF):
  k(\mathbf{x}_i, \mathbf{x}_j) = \exp\{-\gamma \lVert\mathbf{x}_i - \mathbf{x}_j\rVert^2\}, \quad \gamma > 0
• Sigmoid:
  k(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\rho\,\mathbf{x}_i^T\mathbf{x}_j + r),
  where \tanh(a) = 2\sigma(2a) - 1 and \sigma(a) = \frac{1}{1 + \exp(-a)}
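All four kernels are available in scikit-learn; a minimal sketch evaluating them on hypothetical vectors:

    import numpy as np
    from sklearn.metrics.pairwise import (linear_kernel, polynomial_kernel,
                                          rbf_kernel, sigmoid_kernel)

    xi = np.array([[1.0, 2.0]])
    xj = np.array([[3.0, 5.0]])

    print(linear_kernel(xi, xj))                                       # x_i^T x_j
    print(polynomial_kernel(xi, xj, degree=2, gamma=1.0, coef0=1.0))   # (gamma x_i^T x_j + r)^d
    print(rbf_kernel(xi, xj, gamma=0.1))                               # exp(-gamma ||x_i - x_j||^2)
    print(sigmoid_kernel(xi, xj, gamma=1.0, coef0=1.0))                # tanh(gamma x_i^T x_j + r)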

Support Vector Machine


Support Vector Regression


Support Vector Regression


Problem Formulation

[Figure: comparison of the SVC and SVR problem formulations]


Support Vector Regression


Problem Formulation

• Target data t_n located inside the ε-tube satisfy the conditions:

  t_n \le y(\mathbf{x}_n) + \varepsilon
  t_n \ge y(\mathbf{x}_n) - \varepsilon

• For target data t_n located outside the ε-tube, as in the soft-margin approach, slack variables are needed: ξ_n > 0 for t_n > y(x_n) + ε and ξ̄_n > 0 for t_n < y(x_n) - ε, so that:

  t_n \le y(\mathbf{x}_n) + \varepsilon + \xi_n
  t_n \ge y(\mathbf{x}_n) - \varepsilon - \bar{\xi}_n


Support Vector Regression


Problem Formulation

• Furthermore, determining the parameters of the SVR model can be formulated as the following optimization problem:

  \min_{\mathbf{w},b}\ C\sum_{n=1}^{N}(\xi_n + \bar{\xi}_n) + \frac{1}{2}\lVert\mathbf{w}\rVert^2
  \text{s.t. } t_n \le y(\mathbf{x}_n) + \varepsilon + \xi_n
               t_n \ge y(\mathbf{x}_n) - \varepsilon - \bar{\xi}_n
               \xi_n \ge 0, \; \bar{\xi}_n \ge 0

• In other words, determining the values of the parameters w and b again becomes a quadratic programming problem, namely minimizing a quadratic function subject to linear inequality constraints


Support Vector Regression


Problem Formulation

SVC:
  \min_{\mathbf{w},b}\ \frac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{n=1}^{N}\xi_n
  \text{s.t. } t_n y(\mathbf{x}_n) \ge 1 - \xi_n
               \xi_n \ge 0

SVR:
  \min_{\mathbf{w},b}\ C\sum_{n=1}^{N}(\xi_n + \bar{\xi}_n) + \frac{1}{2}\lVert\mathbf{w}\rVert^2
  \text{s.t. } t_n \le y(\mathbf{x}_n) + \varepsilon + \xi_n
               t_n \ge y(\mathbf{x}_n) - \varepsilon - \bar{\xi}_n
               \xi_n \ge 0, \; \bar{\xi}_n \ge 0

Support Vector Regression


Problem Formulation

Ridge Regression:
  \min_{\mathbf{w},b}\ C\sum_{n=1}^{N}(t_n - y(\mathbf{x}_n))^2 + \frac{1}{2}\lVert\mathbf{w}\rVert^2

SVR:
  \min_{\mathbf{w},b}\ C\sum_{n=1}^{N}(\xi_n + \bar{\xi}_n) + \frac{1}{2}\lVert\mathbf{w}\rVert^2
  \text{s.t. } t_n \le y(\mathbf{x}_n) + \varepsilon + \xi_n
               t_n \ge y(\mathbf{x}_n) - \varepsilon - \bar{\xi}_n
               \xi_n \ge 0, \; \bar{\xi}_n \ge 0

Solving Soft Margin


Lagrange Method

• To solve this quadratic programming problem, the commonly used method is to find the dual form using Lagrange multipliers a_n ≥ 0, ā_n ≥ 0, μ_n ≥ 0, μ̄_n ≥ 0, with one Lagrange multiplier for each constraint, to form the Lagrange function as follows:


Solving Soft Margin


Lagrange Method

• The Karush-Kuhn-Tucker (KKT) conditions, which must be fulfilled by the optimal solution, are:

  a_n(\varepsilon + \xi_n + y_n - t_n) = 0                [KKT-1]
  \bar{a}_n(\varepsilon + \bar{\xi}_n - y_n + t_n) = 0    [KKT-2]
  (C - a_n)\,\xi_n = 0                                    [KKT-3]
  (C - \bar{a}_n)\,\bar{\xi}_n = 0                        [KKT-4]


Solving Soft Margin


Lagrange Method

• Setting the derivatives of L with respect to w, b, ξ_n, and ξ̄_n equal to zero gives:


Solving Soft Margin


Lagrange Method

• Substituting these derivative results into the Lagrange function, it becomes:


Solving Soft Margin


Lagrange Method

• So, the dual form of the SVR problem is:

  \max_{\mathbf{a},\bar{\mathbf{a}}}\ L(\mathbf{a},\bar{\mathbf{a}}) = -\frac{1}{2}\sum_{n}\sum_{m}(a_n - \bar{a}_n)(a_m - \bar{a}_m)\,k(\mathbf{x}_n,\mathbf{x}_m) - \varepsilon\sum_{n}(a_n + \bar{a}_n) + \sum_{n}(a_n - \bar{a}_n)\,t_n
  \text{s.t. } 0 \le a_n \le C, \; 0 \le \bar{a}_n \le C, \quad n = 1, \dots, N    (from the KKT conditions and the derivative results above)
  \sum_{n}(a_n - \bar{a}_n) = 0                                                    (from ∂L/∂b = 0)

• The resulting dual form has simpler constraints and is thus known as a bound-constrained optimization problem.


Solving Soft Margin


Dual Form: Weight

• Let the solution of the quadratic programming problem be a* and ā*; then

  \mathbf{w} = \sum_{n=1}^{N}(a_n^* - \bar{a}_n^*)\,\phi(\mathbf{x}_n)

• Only data with a_n^* ≠ 0 or ā_n^* ≠ 0 (the support vectors) play a role in the SVR model, so it can be written as:

  y(\mathbf{x}) = \sum_{n \in S}(a_n^* - \bar{a}_n^*)\,k(\mathbf{x}, \mathbf{x}_n) + b

  where S is the set of indices of the support vectors


Solving Soft Margin


Dual Form: Support Vectors

• Based on [KKT-1], a_n^*(ε + ξ_n + y_n - t_n) = 0; so if a_n^* > 0, then ε + ξ_n + y_n - t_n = 0. In other words, these support vectors are training data located on the upper boundary of the ε-tube (ξ_n = 0) or above it (ξ_n > 0)
• Based on [KKT-2], ā_n^*(ε + ξ̄_n - y_n + t_n) = 0; so if ā_n^* > 0, then ε + ξ̄_n - y_n + t_n = 0. In other words, these support vectors are training data located on the lower boundary of the ε-tube (ξ̄_n = 0) or below it (ξ̄_n > 0)


Solving Soft Margin


Dual Form: Bias

• Furthermore, the parameter b can be determined from a model containing only support vectors, namely training data with 0 < a_n < C, using the following conditions:

  (1) (C - a_n)\,\xi_n = 0                         [KKT-3]
  (2) a_n(\varepsilon + \xi_n + y_n - t_n) = 0     [KKT-1]
  (3) y(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x}) + b

• From (1), ξ_n = 0. From (2), ε + y_n - t_n = 0. Then b can be calculated from (3) as follows:

  b = y(\mathbf{x}_n) - \mathbf{w}^T\phi(\mathbf{x}_n)
    = t_n - \varepsilon - \mathbf{w}^T\phi(\mathbf{x}_n)
    = t_n - \varepsilon - \sum_{m \in S}(a_m^* - \bar{a}_m^*)\,k(\mathbf{x}_n, \mathbf{x}_m)


Solving Soft Margin


Methods

• The main part of the SVR learning problem is solving a bound-constrained optimization problem
• Several methods have been developed to solve this optimization problem, such as:
  – SMO
  – SVMLight
  – LibSVM
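As with SVC, scikit-learn's SVR class is built on LIBSVM; a minimal usage sketch on hypothetical data:

    import numpy as np
    from sklearn.svm import SVR

    # Hypothetical 1-D regression data: t = sin(x) plus noise
    rng = np.random.default_rng(0)
    X = np.linspace(0, 6, 80).reshape(-1, 1)
    t = np.sin(X).ravel() + 0.1 * rng.standard_normal(80)

    # epsilon sets the width of the epsilon-tube, C the soft trade-off as in SVC
    model = SVR(kernel='rbf', C=1.0, epsilon=0.1, gamma='scale').fit(X, t)

    print(len(model.support_))       # number of support vectors
    print(model.predict([[2.0]]))    # prediction y(z) for a new input z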


Prediction Function

• Given a new data point z, the prediction for that point is determined from the regression function y(x), i.e., the prediction is y(z)


Practical Usage Tips

• It is highly recommended to scale the data, for example to [0, 1] or [-1, 1], or to standardize the data, for example to mean 0 and variance 1. The same scaling or standardization must be applied to both the training data and the testing data
• The recommended kernel function is the RBF function, so there are two parameters that must be optimized at the model selection stage, namely C and the gamma parameter of the RBF function (Hsu, 2013)
• In practice, a grid search for the optimal C and gamma over the interval 10^-3 to 10^3 is sufficient. If the best parameters lie on a boundary of the grid, the grid is expanded in that direction in the next search


References

C. M. Bishop (2006). Pattern Recognition and Machine Learning, Springer (Sections 7.1, 7.1.1, Appendix E)

C.-W. Hsu, C.-C. Chang, C.-J. Lin (2013). A Practical Guide to Support Vector Classification, http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
