Support Vector Machine and Support Vector Regression

The document provides an overview of Support Vector Machines (SVM) and Support Vector Regression (SVR), detailing their classification and regression techniques, respectively. It explains the concept of maximum-margin hyperplanes, support vectors, and the use of kernel functions to handle non-linear data. Additionally, it includes mathematical formulations, optimization problems, and Python implementation examples for both SVM and SVR.

Support Vector Machine
SVM is a supervised classification technique.

Consider three candidate hyperplanes H1, H2 and H3 (as in the standard illustration): H1 does not separate the classes; H2 does, but only with a small margin; H3 separates them with the maximal margin.

This maximum-margin separator is determined by a subset of the data points. The data points in this subset are called "support vectors".

It is computationally useful if only a small fraction of the data points are support vectors, because only the support vectors are needed to decide which side of the separator a test case lies on.
Maximum Margin Hyperplane in SVM
• In a support vector machine, each data point is viewed as a p-dimensional vector, and we want to know whether we can separate such points with a (p − 1)-dimensional hyperplane. This is called a linear classifier.
• We choose the hyperplane so that the distance from it to the nearest data point on each side is maximized. If such a hyperplane exists, it is known as the maximum-margin hyperplane, and the linear classifier it defines is known as a maximum-margin classifier.
Linear SVM formulation

The margin width = 2/||w|| needs to be maximized; equivalently, ||w||² should be as small as possible.

Constraints:
w · x_i + b ≥ +1 for positive cases
w · x_i + b ≤ −1 for negative cases

This is an optimization problem!
Linear SVM formulation
• In the context of a linear SVM, the decision boundary can be represented as a hyperplane defined by the equation w · x + b = 0, where x is the input feature vector.
• w (weights or coefficients): a vector of weights associated with the features in the input data. The weights determine the orientation of the decision boundary and how much each feature contributes to the decision-making process.
• b (bias or intercept): a scalar value that shifts the decision boundary away from the origin, providing additional flexibility for the model to achieve better separation between the classes.

Hard margin: if the training data is linearly separable, we can select two parallel hyperplanes that separate the two classes of data so that the distance between them is as large as possible.
Linear SVM formulation
To extend SVM to cases in which the data are not linearly separable, the hinge loss function is helpful:

L(y_i f(x_i)) = max(0, 1 − y_i f(x_i))

The hinge loss penalizes instances that are misclassified or that violate the margin constraint.

For linear SVM, the primal form is fine:

w · x_i + b ≥ +1 − ξ_i for positive cases
w · x_i + b ≤ −1 + ξ_i for negative cases
with ξ_i ≥ 0 for all i

and minimize ||w||²/2 + C Σ_{i=1..n} ξ_i

ξ_i is a slack variable; C is a hyperparameter.
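As a quick numerical illustration of the hinge loss (a minimal sketch; the labels and decision values f(x_i) below are made up for demonstration):

import numpy as np

# Hinge loss L(y_i * f(x_i)) = max(0, 1 - y_i * f(x_i))
def hinge_loss(y, f_x):
    return np.maximum(0, 1 - y * f_x)

# Made-up labels and decision values f(x_i) = w.x_i + b
y   = np.array([+1, +1, -1, -1])
f_x = np.array([2.0, 0.5, -0.2, 1.3])

print(hinge_loss(y, f_x))  # [0.  0.5 0.8 2.3]

A correctly classified point outside the margin (first entry) contributes zero loss; points inside the margin or on the wrong side contribute a loss that grows linearly with the violation.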
Explanation: slack variables
• In SVMs, slack variables ξ_i are introduced to allow for some degree of misclassification when training the model, making it possible to create a soft-margin SVM. The concept of slack variables allows the SVM to handle cases where classes are not perfectly separable.

Definition: for an instance i, ξ_i ≥ 0 represents the amount by which the margin constraint is violated. Essentially, slack variables measure how much an observation fails to meet the margin constraint.
Physical interpretation of C

A small C (0 < C < 1) puts little weight on the slack penalty, so the margin will expand and more violations are tolerated; a large C (C > 1) penalizes violations heavily, so the margin will retract.
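A minimal scikit-learn sketch of this effect (the toy dataset and the C values are illustrative, not taken from the slides):

import numpy as np
from sklearn import svm

# Toy, roughly separable 2-D data
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) - 2, rng.randn(20, 2) + 2])
y = np.array([0] * 20 + [1] * 20)

for C in (0.01, 1, 100):
    clf = svm.SVC(kernel='linear', C=C).fit(X, y)
    width = 2 / np.linalg.norm(clf.coef_)  # margin width 2/||w||
    print(f"C={C}: margin width = {width:.3f}, "
          f"support vectors = {clf.n_support_.sum()}")

Smaller C values should report a wider margin (and typically more support vectors).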
How the algorithm works
For linear SVM, the primal form is fine:

w · x_i + b ≥ +1 − ξ_i for positive cases
w · x_i + b ≤ −1 + ξ_i for negative cases
with ξ_i ≥ 0 for all i

and minimize ||w||²/2 + C Σ_{i=1..n} ξ_i

(ξ_i is a slack variable; C is a hyperparameter.)

One method is gradient descent: for each training sample that violates the margin constraint, update

w += y_i · lr · x_i
b += y_i · lr

and iteratively evaluate the optimum w and b (example Python code is sketched below).
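The attached code is not reproduced here; a minimal sub-gradient descent sketch of the update above, with the regularization term included and assuming labels y_i in {−1, +1}, could look like this (the function name and the values of lr, lambda_param and n_iters are illustrative choices):

import numpy as np

def train_linear_svm(X, y, lr=0.001, lambda_param=0.01, n_iters=1000):
    # Soft-margin linear SVM trained with sub-gradient descent on the hinge loss.
    # Assumes y contains labels in {-1, +1}.
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) >= 1:
                # Margin satisfied: only the regularization term contributes
                w -= lr * (2 * lambda_param * w)
            else:
                # Margin violated: apply w += y_i * lr * x_i plus regularization
                w -= lr * (2 * lambda_param * w - yi * xi)
                b += lr * yi
    return w, b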
The kernel trick
A decision boundary that is curved in the low-dimensional feature space can become planar in a high-dimensional feature space (figure: points x_a, x_b in the low-D space map to φ(x_a), φ(x_b) in the high-D space).

• For many mappings φ from a low-dimensional space to a high-dimensional space, there is a simple operation on two vectors in the low-dimensional space that can be used to compute the scalar product of their two images in the high-dimensional space:

K(x_a, x_b) = φ(x_a) · φ(x_b)

The kernel function is simply a function of the feature data; letting the kernel do the scalar product in the obvious way avoids ever computing φ(x) explicitly.
Different Kernel functions
• Linear Kernel: No mapping is needed as the data is already assumed
to be linearly separable.
• Polynomial Kernel: Maps inputs into a polynomial feature space,
enhancing the classifier’s ability to capture interactions between
features.
• Radial Basis Function (RBF) Kernel: Also known as the Gaussian
kernel, it is useful for capturing complex regions by considering the
distance between points in the input space.
• Sigmoid Kernel: Mimics the behavior of neural networks by using a
sigmoid function as the kernel.
Different Kernel functions

Polynomial: K(x, y) = (x · y + 1)^p
Gaussian radial basis function (RBF): K(x, y) = exp(−||x − y||² / 2σ²)
Neural net (sigmoid): K(x, y) = tanh(κ x · y − δ)

(p, σ, κ and δ are parameters that the user must choose.)

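These kernels translate directly into code; a minimal sketch (the default parameter values are arbitrary placeholders the user would normally tune):

import numpy as np

def linear_kernel(x, y):
    return np.dot(x, y)

def polynomial_kernel(x, y, p=3):
    return (np.dot(x, y) + 1) ** p

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

def sigmoid_kernel(x, y, k=1.0, delta=1.0):
    return np.tanh(k * np.dot(x, y) - delta)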
How the algorithm works
The kernel trick is fundamentally a feature of the dual formulation of Support Vector Machines (SVM) and cannot be applied directly to the primal form. Hence the SVM based on the dual form is often used. The dual formulation for SVM is derived from the Lagrangian of the primal problem:

maximize W(α) = Σ_{i=1..n} α_i − (1/2) Σ_{i=1..n} Σ_{j=1..n} α_i α_j y_i y_j K(x_i, x_j)

subject to Σ_{i=1..n} α_i y_i = 0 and 0 ≤ α_i ≤ C.

Remember: K(x_i, x_j) is the kernel function.

The bias term is recovered from a support vector x_k:

b = y_k − Σ_{i=1..n} α_i y_i K(x_i, x_k)

Our aim is to evaluate the α_i using optimization. All other parameters, such as b, are evaluated from the α_i using the training data. Finally, the decision function f(x) is evaluated.

Prediction: f(x) = Σ_{i=1..n} α_i y_i K(x_i, x) + b
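Once the α_i and b have been obtained from the optimizer, prediction is a direct sum over the training points. A minimal sketch of that last step (the optimizer itself is not shown; all argument names are illustrative):

import numpy as np

def predict(x, X_train, y_train, alpha, b, kernel):
    # f(x) = sum_i alpha_i * y_i * K(x_i, x) + b
    k_vals = np.array([kernel(x_i, x) for x_i in X_train])
    f_x = np.sum(alpha * y_train * k_vals) + b
    return np.sign(f_x)  # class label in {-1, +1}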
SVM Implementation using Python

# Import necessary libraries


from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn import metrics

# Load the breast cancer dataset and split it into training and test sets
cancer = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.3, random_state=109)
SVM Implementation using Python
# Generating Model
# Create an SVM Classifier using a linear kernel
clf = svm.SVC(kernel='linear')
# Train the model using the training sets
clf.fit(X_train, y_train)
# Predict the response for the test dataset
y_pred = clf.predict(X_test)
# Evaluating the Model
# Model Accuracy: how often is the classifier correct?
accuracy = metrics.accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
# Model Precision: what fraction of predicted positives are actually positive?
precision = metrics.precision_score(y_test, y_pred)
print("Precision:", precision)
# Model Recall: what fraction of actual positives are correctly identified?
recall = metrics.recall_score(y_test, y_pred)
print("Recall:", recall)
Support vector regression

Support vector regression (SVR) is a type of support vector machine (SVM) that is used for regression tasks. It tries to find a function that best predicts the continuous output value for a given input value.
Minimization problem: minimize (1/2)||w||² + C Σ_{i=1..m} (ξ_i + ξ_i*) subject to

y_i − (wᵀ φ(x_i) + b) ≤ ε + ξ_i    for i = 1, …, m
(wᵀ φ(x_i) + b) − y_i ≤ ε + ξ_i*   for i = 1, …, m

(Figure: the SVR ε-tube, showing the hyperplane, the decision boundaries at ±ε, and the support vectors lying on or outside the tube.)
In the case of non-linear SVR:
• The kernel functions transform the data into a higher-dimensional feature space, where a linear regression function can be fitted.

Primal formulation

You can evaluate the α values using an iterative mechanism, applicable for small data sets (please refer to the attached code).

Dual formulation (derived from the Lagrangian of the primal problem), used for large datasets (not important):

maximize over α_i, α_i*:
    Σ_{i=1..n} (α_i − α_i*) y_i − (1/2) Σ_{i=1..n} Σ_{j=1..n} K(x_i, x_j) (α_i − α_i*)(α_j − α_j*)

subject to Σ_{i=1..n} (α_i − α_i*) = 0,  α_i ≥ 0,  α_i* ≥ 0
SVR from scratch (not important!)
import numpy as np

# Sample dataset: columns are [x1, x2, y]
data = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]])
X = data[:, 0:2]  # Input features (x1, x2)
Y = data[:, 2]    # Target variable (y)

# Step 1: Choose a kernel function (linear kernel)
def kernel_function(x, y):
    return np.dot(x, y)  # Linear kernel

# Step 2: Compute the kernel matrix K
n = X.shape[0]
K = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        K[i, j] = kernel_function(X[i, :], X[j, :])
print('Kernel Matrix K:')
print(K)
SVR from scratch (not important!)
# Step 3: Set up parameters for SVR
C = 1          # Regularization parameter
epsilon = 0.1  # Epsilon in the epsilon-insensitive loss function

# Placeholder values for the Lagrange multipliers alpha and alpha*
# (in a real solver these would come from the dual optimization)
alpha = np.array([1, 1, 1, 1])
alpha_star = np.array([0, 0, 0, 0])

# Treat points with a non-zero multiplier as support vectors
# (this definition is assumed; it was not given in the original code)
support_vector_indices = np.where((alpha - alpha_star) != 0)[0]

# Calculate the bias term b (a simple average over support vectors for demonstration)
b_values = []
for i in support_vector_indices:
    b_value = Y[i] - np.sum((alpha - alpha_star) * K[i, :])
    b_values.append(b_value)

b = np.mean(b_values)  # Average of the computed b values for support vectors
SVR from scratch (not important!)
# Step 4: Compute the regression function f(x)
def regression_function(x):
    # f(x) = sum_i (alpha_i - alpha_i*) * K(x_i, x) + b
    return np.sum((alpha - alpha_star) * kernel_function(X, x)) + b

# Step 5: Make predictions with the regression function
new_data_point = np.array([2, 3])  # New input data point (x1=2, x2=3)
predicted_y = regression_function(new_data_point)
print(f'Predicted y for input ({new_data_point[0]:.1f}, '
      f'{new_data_point[1]:.1f}) is: {predicted_y:.2f}')
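For comparison, the same toy problem can be solved with scikit-learn's SVR class; a minimal sketch (the kernel, C and epsilon values are illustrative choices, not taken from the slides):

import numpy as np
from sklearn.svm import SVR

data = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]])
X, Y = data[:, 0:2], data[:, 2]

model = SVR(kernel='linear', C=1.0, epsilon=0.1)
model.fit(X, Y)
print(model.predict(np.array([[2, 3]])))  # prediction for (x1=2, x2=3)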
