A supervised learning technique:
classification
What is classification?
• Classification is the process of categorizing a given set
of data into classes. The pre-defined classes act as our
labels, or ground truth.
• The model uses the features of an object to predict its
label. E.g., filtering spam from non-spam emails, or
classifying types of fruit based on their color, weight
and size.
What types of problems does classification
solve?
There are two types of classification problems:
• Binary: the output is restricted to two classes.
• Multi-class: the output has more than two classes.
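The two problem types differ mainly in how a prediction is read off the model's output. The sketch below assumes a model that already produces either one probability for the positive class (binary) or one raw score per class (multi-class). The helper names, the 0.5 threshold and the use of softmax are illustrative assumptions; softmax is one common way, not the only one, to turn per-class scores into probabilities.

```python
import math

def predict_binary(p_positive, threshold=0.5):
    """Binary: compare the single positive-class probability with a threshold."""
    return 1 if p_positive >= threshold else 0

def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_multiclass(scores, class_names):
    """Multi-class: pick the class with the highest probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return class_names[best]

# Hypothetical model outputs
print(predict_binary(0.83))  # 1 = spam, 0 = not spam
print(predict_multiclass([2.0, 0.5, -1.0], ["apple", "banana", "cherry"]))
```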
To solve classification problems:
logistic regression
What is logistic regression?
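Logistic regression adapts linear regression to
classification problems: it computes a weighted sum of the
input features, as linear regression does, and passes it
through the logistic function to turn it into a class
probability. Unlike linear regression, logistic regression
doesn't need a linear relationship between the input and
output variables; it assumes a linear relationship between
the inputs and the log-odds of the output.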
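To make "weighted sum, then logistic function" concrete, here is a minimal sketch that fits logistic regression with batch gradient descent on the log-loss (cross-entropy). The training method, learning rate, epoch count and the one-feature toy data are assumptions chosen for illustration; real projects usually rely on a library solver instead.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Linear combination of the features, squashed into a probability
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the average gradient
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0

# Hypothetical toy data: one feature, class 1 for larger values
X = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(X, y)
print(w, b)
print([predict(w, b, x) for x in X])
```

The learned score w·x + b is exactly the log-odds of class 1, which is the linear relationship logistic regression assumes.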
Logistic regression uses a logistic function:
the sigmoid function
The sigmoid function,
σ(z) = 1 / (1 + e^(-z)),
takes any real input z and
outputs a value strictly between
zero and one.
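A small sketch of evaluating the sigmoid in code. The sign-based split is an assumed implementation choice that keeps `math.exp` from overflowing for large negative inputs; mathematically the output stays strictly between 0 and 1, although in floating point very large |z| rounds to 0.0 or 1.0.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, rewrite as e^z / (1 + e^z) so exp never sees a large argument
    e = math.exp(z)
    return e / (1.0 + e)

for z in [-10, -2, 0, 2, 10]:
    print(z, round(sigmoid(z), 4))  # sigmoid(0) is exactly 0.5
```

Note the symmetry sigmoid(-z) = 1 - sigmoid(z), which is why a threshold of 0.5 corresponds to a score of zero.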
How can we measure the performance of a
logistic regression classifier?
• Once we have the predicted
results from our classification
model (classifier), they are
compared with the actual labels
(ground truth).
• The performance of the model
is then evaluated using the
confusion matrix.
Applying the confusion matrix to measure
the model performance
• True positives (TP): instances predicted as positive whose
ground truth is also positive.
• False positives (FP): instances predicted as positive whose
ground truth is negative.
• True negatives (TN): instances predicted as negative whose
ground truth is also negative.
• False negatives (FN): instances predicted as negative whose
ground truth is positive.

                    Predicted Negative    Predicted Positive
Actual Negative     TN                    FP
Actual Positive     FN                    TP
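Filling the confusion matrix is just a matter of comparing each prediction with its ground-truth label and counting. A minimal sketch, assuming labels encoded as 1 (positive) and 0 (negative); the function name and the example labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN by comparing predictions with the ground truth."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, tn, fn

# Hypothetical labels: 1 = spam (positive), 0 = not spam (negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)

# Same layout as the matrix: rows = actual class, columns = predicted class
print("                Pred Neg  Pred Pos")
print(f"Actual Neg      {tn:8}  {fp:8}")
print(f"Actual Pos      {fn:8}  {tp:8}")
```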
Metrics derived from the confusion matrix
• Accuracy: the share of all samples that were predicted
correctly, (TP + TN) / (TP + TN + FP + FN).
• Precision: of the samples predicted as positive, the share
that are actually positive, TP / (TP + FP).
• Recall (sensitivity): of the samples that are actually
positive, the share the classifier predicted as positive,
TP / (TP + FN).
• F1 score (F-measure): the harmonic mean of precision and
recall, 2 × Precision × Recall / (Precision + Recall), which
balances the two.
The aim is to maximize true positives and true negatives, and
to minimize false positives and false negatives.
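The four metrics follow directly from the confusion-matrix counts. A minimal sketch; returning 0.0 when a denominator is zero (e.g. no positive predictions) is an assumed convention, and the counts in the example are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Assumed convention: report 0.0 instead of dividing by zero
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical counts from a classifier evaluated on 100 samples
acc, prec, rec, f1 = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high; it also equals 2TP / (2TP + FP + FN).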
[Figure: The evaluation metrics]
Support vector machine (SVM)
What is a support vector machine (SVM)?
• A support vector machine (SVM) is a supervised ML
technique that can be used to solve both classification and
regression problems. It is, however, mostly used for
classification.
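• In this algorithm, each data point is plotted in a space
that has one dimension per feature. The SVM model then finds
the boundary that separates the data samples into their
classes, choosing the one with the widest margin, i.e., the
largest distance to the closest samples of each class (the
support vectors).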
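A linear SVM boundary is the set of points where w·x + b = 0, and a new sample is classified by which side of it falls on. The sketch below only evaluates a given boundary; the weights, bias and sample points are hypothetical. Among all boundaries that separate the training data, the SVM picks the one whose smallest distance to the training points (the margin) is largest.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b: positive on one side of the boundary, negative on the other."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    return 1 if decision_value(w, b, x) >= 0 else -1

def distance_to_boundary(w, b, x):
    """Geometric distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm

# Hypothetical boundary x1 + x2 - 3 = 0 in a 2-feature space
w, b = [1.0, 1.0], -3.0
print(classify(w, b, [3.0, 2.0]), classify(w, b, [0.5, 1.0]))
print(distance_to_boundary(w, b, [3.0, 2.0]))
```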
A practical example: finding a 2D plane that
differentiates two classes
Let’s say we have a dataset of
different animals of two classes:
birds & fish
• There are only three features:
body weight, body length, and
daily food consumption
• We draw a 3D grid and plot all
these points
An SVM model will try to find a
2D plane that separates
the two classes
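For the three-feature case, the boundary the SVM learns is a 2D plane w1·x1 + w2·x2 + w3·x3 + b = 0. Below is a minimal sketch that trains a linear SVM with stochastic subgradient descent on the regularized hinge loss; this training method, its hyperparameters and the data are all assumptions for illustration. The six points are made-up, pre-scaled values standing in for (body weight, body length, daily food consumption), not real measurements of birds or fish.

```python
def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=100):
    """Linear SVM via stochastic subgradient descent.

    Objective: lam/2 * ||w||^2 + average hinge loss max(0, 1 - y * (w.x + b)).
    Labels must be +1 or -1. The bias b is not regularized.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:
                # Misclassified or inside the margin: hinge loss is active
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Correct side with enough margin: only the regularizer shrinks w
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

def predict(w, b, x):
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical, pre-scaled (weight, length, food) values; +1 = "bird", -1 = "fish"
X = [[1.0, 2.0, 1.5], [1.5, 1.8, 2.0], [2.0, 2.5, 1.0],
     [-1.0, -1.5, -2.0], [-2.0, -1.0, -1.5], [-1.5, -2.5, -1.0]]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
# The learned separating plane: w[0]*x1 + w[1]*x2 + w[2]*x3 + b = 0
print("w =", w, "b =", b)
print([predict(w, b, x) for x in X])
```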
If there are more than three features,
we would have a hyper-space
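A hyper-space is a space with more than three dimensions
(4D, 5D, etc.). The separating boundary in such a space has
one dimension fewer than the space itself and is called a
hyperplane (strictly speaking, a line in 2D and a plane in 3D
are hyperplanes too).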
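• If the classes can be separated by a flat (linear)
hyperplane, the SVM is called a Linear Kernel SVM
• If the boundary between the classes is nonlinear, a
Polynomial Kernel or another kernel (such as the radial
basis function, RBF) is used, which lets the SVM draw curved
boundaries in the original feature space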
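A kernel measures similarity between two samples; a nonlinear kernel behaves like a dot product in a higher-dimensional feature space, which is where the SVM's boundary is still a flat hyperplane. The sketch below checks this for a degree-2 polynomial kernel on 2-D inputs. The simplified form (x·z + c)^degree, which fixes any scaling factor at 1, and the explicit mapping `phi_degree2` are illustrative assumptions, and the input vectors are arbitrary.

```python
import math

def linear_kernel(x, z):
    """Plain dot product: the kernel used by a Linear Kernel SVM."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=2, c=0.0):
    """Simplified polynomial kernel (x.z + c) ** degree."""
    return (linear_kernel(x, z) + c) ** degree

def phi_degree2(x):
    """Explicit 3-D feature map whose dot product equals (x.z) ** 2 for 2-D inputs."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

x, z = [1.0, 2.0], [3.0, 1.0]
print(polynomial_kernel(x, z))                        # computed in the original 2-D space
print(linear_kernel(phi_degree2(x), phi_degree2(z)))  # same value via the 3-D mapping
```

Computing the kernel directly avoids ever building the higher-dimensional vectors, which matters when the implicit feature space is large.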