Chapter III - Supervised and Unsupervised Algorithms
• The working of supervised learning can be understood through the steps below:
• Split the dataset into a training set, a test set, and a validation set.
• Determine the input features of the training dataset; they should carry enough information for the model to accurately predict the output.
• Determine the suitable algorithm for the model, such as support vector machine, decision
tree, etc.
• Execute the algorithm on the training dataset. Sometimes a validation set, a subset of the training data, is needed to tune control parameters.
• Evaluate the accuracy of the model on the test set. If the model predicts the correct outputs, it is accurate.
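A minimal sketch of this workflow in Python, assuming scikit-learn and its bundled Iris dataset (both assumptions for illustration, not part of the course example):

    # Minimal supervised-learning workflow sketch (assumes scikit-learn is installed).
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Load a labeled dataset: X holds the input features, y the known outputs.
    X, y = load_iris(return_X_y=True)

    # Split into training and test sets (a validation split could be carved
    # out of the training portion the same way).
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # Choose a suitable algorithm and execute it on the training data.
    model = DecisionTreeClassifier()
    model.fit(X_train, y_train)

    # Evaluate accuracy on the held-out test set.
    print("Test accuracy:", model.score(X_test, y_test))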
Key Concepts
1. The Dataset
• The main objective in Supervised Learning is to find the model parameters that minimize the Cost Function. To do this, we use a learning algorithm, the most common example being the Gradient Descent algorithm (a minimal sketch follows below).
• It is relatively easy to train a model that “works” well (low prediction error) on the training data. Extreme example: learning “by rote”, i.e., pure memorization of the training set.
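As an illustration, a minimal gradient-descent sketch for a one-variable linear model; the toy data, learning rate, and iteration count are assumed values:

    import numpy as np

    # Toy data: y ≈ 2x + 1 with a little noise (invented for illustration).
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 50)
    y = 2 * x + 1 + rng.normal(0, 0.5, 50)

    w, b = 0.0, 0.0          # model parameters
    lr = 0.01                # learning rate (assumed value)
    for _ in range(1000):    # iteration count (assumed value)
        y_hat = w * x + b
        # Gradients of the mean-squared-error cost with respect to w and b.
        grad_w = 2 * np.mean((y_hat - y) * x)
        grad_b = 2 * np.mean(y_hat - y)
        w -= lr * grad_w
        b -= lr * grad_b

    print(w, b)  # should approach 2 and 1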
Over-fitting and Under-fitting
1. Over-fitting - Example
• Over-fitting occurs when the model fits the target function so closely that it pays too much attention to noise: it learns the relationship between features and labels in so much detail that it picks up the noise as well.
Over-fitting and Under-fitting
2. Under-fitting - Example
• Under-fitting is the opposite of over-fitting: the model does not approximate the target function well enough and is therefore unable to capture the underlying trend of the data.
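Both effects can be illustrated by varying the degree of a polynomial model; a sketch assuming scikit-learn, with the degrees and the sine toy data chosen arbitrarily:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
    y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.2, 30)

    for degree in (1, 4, 15):   # under-fit, reasonable fit, over-fit (assumed degrees)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(x, y)
        print(degree, model.score(x, y))  # training R² rises with the degree,
                                          # even though degree 15 mostly fits noise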
Training and test set
Cross validation
We use each of the blocks in turn as a validation set and the union of the others
as a training set.
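A minimal sketch of this scheme, assuming scikit-learn's KFold with five blocks and the bundled Iris data:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import KFold
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    kf = KFold(n_splits=5, shuffle=True, random_state=0)

    scores = []
    for train_idx, val_idx in kf.split(X):
        # Each block serves once as the validation set; the rest train the model.
        model = DecisionTreeClassifier().fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[val_idx], y[val_idx]))

    print(sum(scores) / len(scores))  # average validation accuracy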
Model Selection: Validation Set
How to determine the best model among those learned:
• Idea: select the one with the best performance on the validation set (the test set must remain untouched until the final evaluation).
Model Selection: Cross-Validation
Hyper-parameters Tuning
Hyper-parameters Tuning - Example
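A minimal tuning sketch using grid search with cross-validation; the SVC model and the parameter grid are illustrative assumptions, not the course's example:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Candidate hyper-parameter values (assumed for illustration).
    param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

    # Each combination is scored by cross-validation on the training data.
    search = GridSearchCV(SVC(), param_grid, cv=5)
    search.fit(X, y)

    print(search.best_params_, search.best_score_)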
Evaluation of a Classification model: Confusion Matrix
• The matrix itself is easy to understand, but the related terminology can be confusing. Because it shows the errors of the model's performance in matrix form, it is also known as an error matrix.
Confusion Matrix in Machine Learning
• For a classifier with 2 prediction classes, the matrix is a 2×2 table; for 3 classes, a 3×3 table, and so on.
• The matrix is divided into two dimensions, predicted values and actual values, along with the total number of predictions.
• Predicted values are those output by the model; actual values are the true values for the given observations.
• It looks like the table below (n = 100; the counts follow from the totals listed underneath):

                      Actual: Yes    Actual: No
    Predicted: Yes    TP = 24        FP = 8
    Predicted: No     FN = 3         TN = 65
From the previous example, we can conclude that:
• The table is given for a two-class classifier with two predictions, "Yes" and "No". Here, "Yes" means the patient has the disease, and "No" means the patient does not have the disease.
• The classifier has made a total of 100 predictions. Out of 100 predictions, 89 are correct and 11 are incorrect.
• The model predicted "Yes" 32 times and "No" 68 times, whereas the actual "Yes" occurred 27 times and the actual "No" 73 times.
Multi-class Classification: Confusion Matrix
[Figure: multi-class confusion matrices, with axes labeled "Predicted class" and "Actual class"]
Calculations using Confusion Matrix
We can perform various calculations for the model, such as the model's
accuracy, using this matrix. These calculations are given below:
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
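Applying these formulas to the counts reconstructed from the example above (TP = 24, FN = 3, FP = 8, TN = 65), a minimal sketch:

    # Metric calculations from the example's confusion-matrix counts.
    TP, FN, FP, TN = 24, 3, 8, 65

    sensitivity = TP / (TP + FN)                   # 24/27 ≈ 0.889
    specificity = TN / (TN + FP)                   # 65/73 ≈ 0.890
    accuracy    = (TP + TN) / (TP + TN + FP + FN)  # 89/100 = 0.89
    precision   = TP / (TP + FP)                   # 24/32 = 0.75

    print(sensitivity, specificity, accuracy, precision)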
ROC Curve
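The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) as the classification threshold varies; the area under the curve (AUC) summarizes the classifier's performance. A minimal sketch assuming scikit-learn, with invented scores:

    from sklearn.metrics import roc_curve, roc_auc_score

    # y_true: actual labels; y_score: predicted probabilities for the positive
    # class (illustrative values, not from the course example).
    y_true  = [0, 0, 1, 1, 0, 1, 1, 0]
    y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3]

    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    print(roc_auc_score(y_true, y_score))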
Evaluation of a regression model
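Common metrics for evaluating a regression model include the mean absolute error MAE = (1/n) Σ |yi − ŷi|, the mean squared error MSE = (1/n) Σ (yi − ŷi)² and its square root RMSE, and the coefficient of determination R², the proportion of the variance of y explained by the model.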
Some ML Algorithms
Naive Bayes algorithm
• The Naive Bayes classifier is based on Bayes' theorem, a classic of probability theory built on conditional probabilities.
Conditional probabilities answer the question:
• What is the probability that an event occurs, given that some other event has already happened?
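Formally, Bayes' theorem states:
P(A | B) = P(B | A) · P(A) / P(B)
that is, the probability of event A given that event B has already happened can be computed from the reverse conditional probability P(B | A) and the prior probabilities of A and B.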
Naive Bayes algorithm - Example
Naive Bayes algorithm - USE CASES
The Naive Bayes classifier can be applied in various scenarios; one of the classic use cases for this learning model is document classification, i.e., determining whether a document belongs to certain categories or not (a minimal sketch follows the list below). It is used for:
• Spam filtering.
• Sentiment analysis.
• Recommendation systems.
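A minimal document-classification sketch, assuming scikit-learn's multinomial Naive Bayes; the tiny spam corpus is invented for illustration:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Tiny labeled corpus (invented): 1 = spam, 0 = not spam.
    texts  = ["win money now", "meeting at noon", "free money offer", "lunch tomorrow"]
    labels = [1, 0, 1, 0]

    # Turn documents into word-count features, then fit the Naive Bayes model.
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(texts)
    model = MultinomialNB().fit(X, labels)

    print(model.predict(vectorizer.transform(["free offer now"])))  # likely [1]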
Unsupervised Learning
Below are some main reasons that describe the importance of unsupervised learning:
• Unsupervised learning is helpful for finding useful insights from the data.
• In the real world, we do not always have input data with corresponding outputs; to handle such cases, we need unsupervised learning.
• It allows us to cluster the data into different groups and is a convenient way to discover the categories of groups in an unlabeled dataset on its own, without the need for any labels.
K-Means Clustering Algorithm
• The algorithm takes the unlabeled dataset as input, divides it into k clusters, and repeats the process until it finds the best clusters. The value of k must be chosen in advance. The algorithm mainly performs two tasks:
1. Determines the best positions for the K center points (centroids) by an iterative process.
2. Assigns each data point to its closest centroid; the data points near a particular centroid form a cluster.
• Hence each cluster contains data points with some commonalities and is well separated from the other clusters. A sketch of the full procedure follows the worked example below.
K-Means Clustering Algorithm - Example
• Suppose we have two variables M1 and M2, whose x-y scatter plot is given below:
• Let's take the number of clusters K = 2: we will try to group this dataset into two different clusters.
• To refine the clusters, we repeat the process with new centroids: each new centroid is the center of gravity (the mean) of the data points currently assigned to its cluster.
• We can see in the following image that there are no misassigned data points on either side of the separating line, which means our model has converged.
• As our model is ready, we can now remove the assumed centroids, and the two final clusters will be as shown in the image below:
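A minimal sketch of the whole procedure, assuming scikit-learn's KMeans; the two-variable data stands in for M1 and M2 and is invented for illustration:

    import numpy as np
    from sklearn.cluster import KMeans

    # Two variables (M1, M2) forming two blobs (invented data).
    rng = np.random.default_rng(0)
    data = np.vstack([rng.normal(2, 0.5, (25, 2)),
                      rng.normal(7, 0.5, (25, 2))])

    # K=2 as in the example; fit repeats assignment/update until convergence.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)

    print(kmeans.cluster_centers_)  # final centroids (centers of gravity)
    print(kmeans.labels_[:5])       # cluster assignments of the first points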
Elbow Method
• The performance of the K-means algorithm depends on the efficient clusters that it forms, but choosing the optimal number of clusters is a big task. There are different ways to find the optimal number of clusters; here we discuss the most appropriate one, the Elbow method.
• The Elbow method is one of the most popular ways to find the optimal number of clusters. It relies on the WCSS value; WCSS stands for Within-Cluster Sum of Squares and measures the total variation within the clusters. For 3 clusters, the WCSS is:
WCSS = Σ Pi in Cluster1 distance(Pi, C1)² + Σ Pi in Cluster2 distance(Pi, C2)² + Σ Pi in Cluster3 distance(Pi, C3)²
where Cj is the centroid of cluster j and each sum runs over the points Pi assigned to that cluster.
To find the optimal number of clusters, the elbow method follows these steps:
• Run K-means on the dataset for a range of K values (for example, 1 to 10) and compute the WCSS for each K.
• Plot a curve of the calculated WCSS values against the number of clusters K.
• The sharp point of bend, where the plot looks like an arm's elbow, is taken as the best value of K.
Since the graph shows a sharp bend that looks like an elbow, the technique is known as the elbow method. The WCSS curve falls steeply at first and then flattens; the "elbow" between the two regimes marks the optimal K.
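A minimal sketch of the method, assuming scikit-learn (whose inertia_ attribute is the WCSS of a fitted clustering); the three-blob data is invented for illustration:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    data = np.vstack([rng.normal(m, 0.5, (25, 2)) for m in (1, 5, 9)])  # 3 blobs

    # Compute the WCSS for K = 1..10.
    wcss = []
    for k in range(1, 11):
        wcss.append(KMeans(n_clusters=k, n_init=10, random_state=0).fit(data).inertia_)

    print(wcss)  # falls steeply up to K=3, then flattens: the "elbow" is at 3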