
Chapter III - Supervised and Unsupervised Algorithms


Department: Mathematics & Computer Science

Master of DPEIC – First year


Semester 2

OR & Artificial Intelligence

Chapter III - Supervised Machine Learning

Pr. Soufiane HAMIDA 1


Supervised ML Algorithms
How Does Supervised Learning Work?
• In supervised learning, a model is trained on a labelled dataset, in which each example is paired with its correct output. Once training is complete, the model is evaluated on test data (a held-out portion of the labelled data that was not used for training), and it can then predict the output for new inputs.

• The working of supervised learning can be understood from the example and diagram below:


Steps Involved in Supervised Learning
• First, determine the type of training dataset.

• Collect the labelled training data.

• Split the dataset into a training set, a validation set, and a test set.

• Determine the input features of the training dataset; they should carry enough information for the model to predict the output accurately.

• Choose a suitable algorithm for the model, such as a support vector machine or a decision tree.

• Run the algorithm on the training set. A validation set, held out from the training data, is sometimes needed to tune the control parameters (hyperparameters).

• Evaluate the accuracy of the model on the test set. If the model predicts the correct output for most test examples, the model is accurate.
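The splitting step can be sketched in a few lines of Python. This is only an illustration: the function name `split_dataset`, the 60/20/20 proportions, and the toy data are assumptions, not values taken from the course.

```python
import random

def split_dataset(examples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the labelled examples and cut them into train / validation / test."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)      # fixed seed for reproducibility
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]          # whatever remains is the test set
    return train, val, test

# Toy labelled data: (feature, label) pairs invented for illustration.
data = [(i, i % 2) for i in range(10)]
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))
```

The three subsets never overlap, so the test set stays unseen until the final evaluation.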
Key Concepts

To master supervised learning, you must understand the following 4 concepts:

1. The Dataset

2. The learning algorithm

3. The Model and its parameters

4. The Cost Function


Steps Involved in Supervised Learning

1) The Dataset

We talk about supervised learning when we provide a machine with many examples (x, y) so that it learns the relationship that connects x to y.

• The variable y is called the Target. This is the value we are trying to predict.

• The variable x is called a Feature. A Feature influences the value of y, and we generally have many Features (x₁, x₂, …) in our Dataset, which we group together in a matrix X.

Example: a Dataset brings together examples of apartments with their price y as well as some of their characteristics (Features).


Steps Involved in Supervised Learning

2) The learning algorithm

• The main objective in supervised learning is to find the model parameters that minimize the Cost Function. To do this, we use a learning algorithm; the most common example is the Gradient Descent algorithm.
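As a rough illustration of Gradient Descent, the sketch below fits a linear model y ≈ a·x + b. The choice of the mean squared error as the Cost Function, the learning rate, the number of steps, and the toy data are all assumptions made for this example.

```python
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ a*x + b by repeatedly stepping against the gradient of the MSE."""
    a, b = 0.0, 0.0                      # initial parameters
    n = len(xs)
    for _ in range(steps):
        errors = [a * x + b - y for x, y in zip(xs, ys)]
        grad_a = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        a -= lr * grad_a                 # move each parameter downhill
        b -= lr * grad_b
    return a, b

# Toy data generated from y = 2x + 1 (invented for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = gradient_descent(xs, ys)
print(round(a, 3), round(b, 3))
```

On this noise-free data the parameters approach a = 2 and b = 1; with a learning rate that is too large the updates would diverge instead.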


Steps Involved in Supervised Learning
3) The Model and its parameters
• A model is developed from the Dataset. It can be a linear model or a non-linear model.

• We define a, b, c, etc. as the parameters of a model.
Steps Involved in Supervised Learning
4) The Cost Function

A model can produce errors when making predictions compared to the actual values in our dataset. These errors measure how well the model is performing: a lower error indicates a better fit to the data.

The method by which we aggregate these errors to measure the overall performance of the model is known as the Cost Function or Loss Function.
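One common way to aggregate the errors is the mean squared error (MSE). The slides do not say which Cost Function is used, so MSE and the numbers below are assumptions chosen for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

actual = [3.0, 5.0, 7.0]
close_predictions = [2.5, 5.0, 7.5]   # small errors
far_predictions = [1.0, 8.0, 4.0]     # large errors

cost_close = mean_squared_error(actual, close_predictions)
cost_far = mean_squared_error(actual, far_predictions)
print(cost_close < cost_far)
```

The model whose predictions sit closer to the actual values gets the lower cost.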


• A 'good' model is generally characterized by its ability to make accurate predictions on new, previously unseen data.

• The smaller the value returned by the Cost Function, the smaller the differences between the predicted and actual values, indicating a better-performing model.


Types of Supervised ML Algorithms

• Supervised learning can be further divided into two types of problems: regression and classification.


Regression vs. Classification in ML



Recap

Regression Algorithm | Classification Algorithm
The output variable must be continuous (a real value). | The output variable must be a discrete value.
The task is to map the input value (x) to a continuous output variable (y). | The task is to map the input value (x) to a discrete output variable (y).
Used with continuous data. | Used with discrete data.
We try to find the best-fit line, which can predict the output more accurately. | We try to find the decision boundary, which can divide the dataset into different classes.
Used to solve regression problems such as weather prediction, house price prediction, etc. | Used to solve classification problems such as identification of spam emails, speech recognition, identification of cancer cells, etc.
Can be further divided into Linear and Non-linear Regression. | Can be divided into Binary Classifier and Multi-class Classifier.


Choosing the most appropriate algorithm

1. Problem Nature: Classification or Regression

2. Data Characteristics: Size of the Dataset, Feature Types, Feature Dimensionality, Data Quality, …

3. Model Complexity and Interpretability: Complexity, Interpretability, …

4. Experience and Domain Knowledge: Previous Successes and Expertise, …

5. Model Updates and Scalability: Static vs. Dynamic Data, Scalability, …


Performance Evaluation
Generalization and overfitting
Main challenge of Supervised learning:

• It is relatively easy to train a model that "works" well (low prediction error) on the training data. Extreme example: learning "by rote", i.e. memorizing the training examples.

• Generalization: the ability of the model to make good predictions on data whose label is unknown.

• Overfitting: when performance is noticeably better on the training data than on new data.

30/03/2022 22
Over-fitting and Under-fitting
1. Over-fitting - Example
• Over-fitting occurs when the model fits the training data so closely that it pays too much attention to noise. The model learns the relationship between features and labels in so much detail that it picks up the noise as well.

2. Under-fitting - Example
• Under-fitting is the opposite of over-fitting. It occurs when the model does not approximate the function well enough and is therefore unable to capture the underlying trend of the data.
Training and test set

Cross validation

• Goal: use all the data for both training and validation, and obtain an average performance.

• We separate the dataset into K blocks (folds).

• In practice, K=5 or K=10 is used most often (a balance between the number of experiments and the size of each training set).

• We use each block in turn as the validation set and the union of the others as the training set.
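The fold rotation can be sketched as follows. The helper name `k_fold_indices` is hypothetical, and the sketch splits the indices in order without shuffling, which a real experiment would usually add.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) once per fold; each fold validates exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 5))
for train_idx, val_idx in folds:
    print(val_idx, train_idx)
```

Averaging the validation score over the K folds gives the average performance mentioned above.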
Model Selection: Validation Set
How do we determine the best model among those learned:

- with different learning algorithms;

- with different hyperparameter values for the same algorithm?

• Idea: select the one with the best performance on the test set.

• Problem: we can then no longer estimate the generalization error, because the test data has already been used to choose the model.

• Solution: separate the data into 3 sets: training, validation and test. The validation set is used to select the model, and the test set is kept for the final estimate of the generalization error.
Model Selection: Cross-Validation

Hyper-parameters Tuning

GridSearchCV systematically works through every combination of the given parameter values, cross-validating as it goes to determine which combination gives the best performance. It is thorough but can be slow for large datasets and many parameters.

RandomizedSearchCV samples a fixed number of parameter settings from specified distributions. This approach can be faster and more efficient, especially with a large hyper-parameter space, because it does not try every combination but draws settings at random to cover a wide range of values.
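The difference between the two strategies can be illustrated without any library. In the sketch below, `validation_score` is a hypothetical stand-in for a cross-validated score, and the parameter names `k` and `weight` are invented; it does not call scikit-learn.

```python
import itertools
import random

def validation_score(params):
    """Hypothetical stand-in for a cross-validated score (higher is better)."""
    return -((params["k"] - 5) ** 2) - (params["weight"] - 0.5) ** 2

grid = {"k": [1, 3, 5, 7, 9], "weight": [0.0, 0.5, 1.0]}

def grid_search(grid):
    """Evaluate every combination of the listed values."""
    names = list(grid)
    candidates = [dict(zip(names, values))
                  for values in itertools.product(*grid.values())]
    return max(candidates, key=validation_score), len(candidates)

def random_search(grid, n_iter, seed=0):
    """Evaluate only n_iter randomly drawn combinations."""
    rng = random.Random(seed)
    candidates = [{name: rng.choice(values) for name, values in grid.items()}
                  for _ in range(n_iter)]
    return max(candidates, key=validation_score), len(candidates)

best_grid, n_grid = grid_search(grid)
best_rand, n_rand = random_search(grid, n_iter=6)
print(best_grid, n_grid)
print(best_rand, n_rand)
```

Grid search evaluates all 15 combinations, while random search evaluates only the 6 it draws, so it may or may not hit the best setting.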
Hyper-parameters Tuning - Example

Evaluation of a Classification model: Confusion Matrix

• The confusion matrix is a table used to assess the performance of a classification model on a given set of test data. It can only be computed if the true values for the test data are known.

• The matrix itself is easy to understand, but the related terminology can be confusing. Because it shows the model's errors in matrix form, it is also known as an error matrix.
Confusion Matrix in Machine Learning

Some features of the confusion matrix are given below:

• For a classifier with 2 prediction classes, the matrix is a 2×2 table; for 3 classes, it is a 3×3 table; and so on.

• The matrix has two dimensions, predicted values and actual values, along with the total number of predictions.

• Predicted values are the values predicted by the model, and actual values are the true values for the given observations.
Confusion Matrix in Machine Learning

• It looks like the below table:

Confusion Matrix in Machine Learning
From the previous example, we can conclude that:

• The table is for a two-class classifier with two possible predictions, "Yes" and "No". Here, "Yes" means that the patient has the disease, and "No" means that the patient does not have the disease.

• The classifier made a total of 100 predictions. Of these, 89 are correct and 11 are incorrect.

• The model predicted "Yes" 32 times and "No" 68 times, whereas the actual value was "Yes" 27 times and "No" 73 times.
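The four cells of the table are not reproduced in the text, but the totals quoted above determine them: solving TP + FP = 32, TP + FN = 27 and TP + TN = 89 with 100 predictions in total gives TP = 24, FP = 8, FN = 3 and TN = 65. The sketch below builds label lists consistent with those derived counts (the lists themselves are invented) and counts the cells.

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1          # predicted Yes, actually No
        elif a == positive:
            fn += 1          # predicted No, actually Yes
        else:
            tn += 1
    return tp, fp, fn, tn

# Labels constructed to match the totals quoted in the example.
actual = ["Yes"] * 24 + ["No"] * 8 + ["Yes"] * 3 + ["No"] * 65
predicted = ["Yes"] * 24 + ["Yes"] * 8 + ["No"] * 3 + ["No"] * 65
tp, fp, fn, tn = confusion_counts(actual, predicted)
print(tp, fp, fn, tn)
```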
Multi-class classification : Confusion Matrix

[Figure: two confusion matrices, one for a binary classification problem and one for a multiclass classification problem, each with the axes "Predicted class" and "Actual class".]
Calculations using Confusion Matrix
We can compute various measures of the model, such as its accuracy, from this matrix. With TP, TN, FP and FN denoting the numbers of true positives, true negatives, false positives and false negatives, these calculations are:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]

\[ \text{Sensitivity} = \frac{TP}{TP + FN} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \]

\[ \text{Specificity} = \frac{TN}{TN + FP} \]
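These four measures translate directly into code. The counts used below (TP = 24, FP = 8, FN = 3, TN = 65) are an assumption: they are the values consistent with the totals of the 100-prediction disease example, not figures printed on the slide.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the four measures from the cells of a 2x2 confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
    }

metrics = classification_metrics(tp=24, fp=8, fn=3, tn=65)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```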
ROC Curve

ROC Curve: The ROC (Receiver Operating Characteristic) curve is a graph displaying a classifier's performance at all possible thresholds. It plots the true positive rate (on the y-axis) against the false positive rate (on the x-axis).
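A minimal sketch of how the points of a ROC curve can be obtained: each distinct score is used as a threshold, and the false positive rate and true positive rate are computed at that threshold. The labels and scores are invented, and the function name `roc_points` is hypothetical.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, from the strictest threshold to the loosest."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # An infinite threshold predicts nothing as positive: the (0, 0) corner.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

y_true = [1, 1, 0, 1, 0, 0]                 # 1 = positive class
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]     # classifier confidence
points = roc_points(y_true, scores)
for fpr, tpr in points:
    print(round(fpr, 2), round(tpr, 2))
```

Plotting these pairs with FPR on the x-axis and TPR on the y-axis gives the curve, which runs from (0, 0) to (1, 1).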
Evaluation of a regression model

Some ML Algorithms


Regression solutions

Types of Regression Algorithm:


1. Simple Linear Regression
2. Multiple Linear Regression
3. Polynomial Regression
4. K-Nearest Neighbors Regression
5. Decision Tree Regression
6. Random Forest Regression
7. Artificial Neural Networks (ANN)
8. …


Classification solutions

Classification Algorithms can be further divided into the following types:


1. K-Nearest Neighbors (KNN)
2. Decision Tree
3. Random Forest
4. Support Vector Machines (SVM)
5. Artificial Neural Networks
6. Logistic Regression (LR)
7. Naïve Bayes
8. …


K-Nearest Neighbors Algorithm (KNN)

K-NN (K-Nearest Neighbors) is one of the simplest classification algorithms. It uses data points that are already separated into several classes to predict the class of a new data point (sample).
K-NN is a non-parametric, lazy learning algorithm. It classifies new cases based on a similarity measure (i.e. distance functions).


KNN Algorithm - Example
Input data:
A dataset D.
A distance function d.
An integer K.
For a new observation X whose output variable y we want to predict, do:
1. Calculate the distances between X and all the observations in the dataset D, using the distance function d.
2. Retain the K observations from D that are closest to X.
3. Take the y values of the K retained observations:
   a. For a regression, calculate the mean (or median) of the retained y values.
   b. For a classification, calculate the mode of the retained y values.
4. Return the value calculated in step 3 as the K-NN prediction for observation X.
End Algorithm
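The algorithm above can be sketched in Python as follows. The Euclidean distance, the function name `knn_predict`, and the toy points are assumptions made for this example; any other distance function d could be substituted.

```python
import math
from collections import Counter

def knn_predict(dataset, x_new, k, task="classification"):
    """dataset: list of (features, y) pairs; x_new: a feature tuple."""
    # Steps 1-2: rank observations by Euclidean distance and keep the k closest.
    neighbors = sorted(dataset, key=lambda item: math.dist(item[0], x_new))[:k]
    values = [y for _, y in neighbors]
    # Step 3: mean for regression, mode (majority vote) for classification.
    if task == "regression":
        return sum(values) / len(values)
    return Counter(values).most_common(1)[0][0]

# Toy data invented for illustration.
labelled = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
            ((6.0, 6.0), "B"), ((7.0, 7.0), "B"), ((6.5, 5.5), "B")]
label = knn_predict(labelled, (2.0, 2.0), k=3)

numeric = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]
value = knn_predict(numeric, (2.5,), k=2, task="regression")
print(label, value)
```

Because K-NN is a lazy learner, there is no training step: all the work happens at prediction time.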
K-Nearest Neighbors Algorithm (KNN)

To predict the category label y of a new point x (classification):
• Find the k nearest neighbors (according to some distance metric)
• Assign the majority label to the new point

To predict the numeric value y of a new point x (regression):
• Find the k nearest neighbors
• "Average" the values associated with the neighbors

If we change k, we may get a different prediction!


kNN Prediction: What Label?



Linear and Logistic Regression algorithm

Linear Regression algorithm

The Math Behind LR

LR & LR - Difference

Applications of LR

Use case – Predicting Numbers


Naive Bayes algorithm

• The Naive Bayes classifier is a popular Machine Learning algorithm. It is a Supervised Learning algorithm used for classification, and it is particularly useful for text classification problems.

• The Naive Bayes classifier is based on Bayes' theorem, a classic result of probability theory that relies on conditional probabilities.


Conditional probabilities:
• What is the probability that an event occurs,
• given that some other event has already happened?
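In standard notation (the symbols A and B are chosen here for illustration), the conditional probability of A given B, and Bayes' theorem that follows from it, can be written as:

```latex
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0 \]

\[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \]
```

For a classifier, A plays the role of the class and B the observed features.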
Naive Bayes algorithm - Example

Naive Bayes algorithm - USE CASES

The Naive Bayes classifier can be applied in various scenarios. One of its classic use cases is document classification: determining whether a document belongs to certain categories or not. It is used for:
• Spam filtering.
• Sentiment analysis.
• Recommendation systems.
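A minimal sketch of spam filtering with a word-count Naive Bayes classifier is given below. The add-one (Laplace) smoothing, the function names, and the four toy messages are assumptions made for illustration; they are not taken from the course example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents):
    """documents: list of (list_of_words, label) pairs."""
    class_counts = Counter(label for _, label in documents)
    word_counts = defaultdict(Counter)     # word frequencies per class
    vocabulary = set()
    for words, label in documents:
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def predict(model, words):
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # log P(label) + sum of log P(word | label), with add-one smoothing
        score = math.log(count / total_docs)
        total_words = sum(word_counts[label].values())
        for w in words:
            score += math.log((word_counts[label][w] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [("win money now".split(), "spam"),
        ("free money offer".split(), "spam"),
        ("meeting at noon".split(), "ham"),
        ("project meeting notes".split(), "ham")]
model = train_naive_bayes(docs)
print(predict(model, "free money".split()))
print(predict(model, "meeting notes".split()))
```

The "naive" part is the assumption that words are independent given the class, which is why the per-word probabilities are simply multiplied (added in log space).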


PW



Unsupervised Machine Learning
What is Unsupervised Learning?

• As the name suggests, unsupervised learning is a machine learning technique in which models are not supervised using a labelled training dataset. Instead, the models themselves find hidden patterns and insights in the given data. It can be compared to the learning that takes place in the human brain when learning new things.

• Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike in supervised learning, we have the input data but no corresponding output data. The goal of unsupervised learning is to find the underlying structure of a dataset, group the data according to similarities, and represent the dataset in a compressed format.


Example - Unsupervised Learning

• Suppose the unsupervised learning algorithm is given an input dataset containing images of different types of cats and dogs. The algorithm is never given labels for this dataset, so it has no prior idea of which features distinguish the images. Its task is to identify the image features on its own. It does so by clustering the image dataset into groups according to the similarities between images.


Why use Unsupervised Learning?

Below are some of the main reasons why unsupervised learning is important:

• Unsupervised learning is helpful for finding useful insights in the data.

• Unsupervised learning is similar to how a human learns to think from their own experiences, which makes it closer to real AI.

• Unsupervised learning works on unlabeled and uncategorized data, which makes it all the more important.

• In the real world, we do not always have input data with the corresponding output, so we need unsupervised learning to handle such cases.


Working of Unsupervised Learning
The working of unsupervised learning can be understood from the diagram below:


Types of Unsupervised Learning Algorithm

Below is a list of some popular unsupervised learning algorithms:

• K-means clustering
• Hierarchical clustering
• Anomaly detection
• Independent Component Analysis
• Apriori algorithm


Advantages of Unsupervised Learning

• Unsupervised learning is used for more complex tasks than supervised learning because, in unsupervised learning, we do not have labeled input data.

• Unsupervised learning is often preferable because unlabeled data is easier to obtain than labeled data.


Disadvantages of Unsupervised Learning

• Unsupervised learning is intrinsically more difficult than supervised learning because it has no corresponding output to learn from.

• The result of an unsupervised learning algorithm may be less accurate because the input data is not labeled and the algorithm does not know the exact output in advance.


K-Means Clustering Algorithm
• K-Means Clustering is an unsupervised learning algorithm used to solve clustering problems in machine learning and data science. It groups an unlabeled dataset into different clusters.

• Here K defines the number of clusters to be created in the process: if K=2, there will be two clusters; for K=3, three clusters; and so on.


• It gives us a convenient way to cluster the data into different groups and to discover the categories in an unlabeled dataset on its own, without the need for any training labels.

• It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of the algorithm is to minimize the sum of (squared) distances between the data points and the centroids of their clusters.


• The algorithm takes the unlabeled dataset as input, divides it into K clusters, and repeats the process until it finds the best clusters, that is, until the assignments stop changing. The value of K must be chosen in advance.

• The k-means clustering algorithm mainly performs two tasks:

1. It determines the best positions for the K center points (centroids) by an iterative process.

2. It assigns each data point to its closest centroid. The data points nearest to a given centroid form a cluster.

• Hence each cluster contains data points with some commonalities and is well separated from the other clusters.


How does the K-Means Algorithm Work?

• The working of the K-Means algorithm is explained in the steps below:

Step 1: Select the number K to decide the number of clusters.
Step 2: Select K random points as centroids (they need not come from the input dataset).
Step 3: Assign each data point to its closest centroid, which forms the K predefined clusters.
Step 4: For each cluster, compute the mean of its points and place the new centroid there.
Step 5: Repeat step 3, i.e. reassign each data point to the new closest centroid.
Step 6: If any reassignment occurred, go to step 4; otherwise go to FINISH.
Step 7: The model is ready.
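These steps can be sketched in Python as follows. The sketch makes a few assumptions for simplicity: the initial centroids are drawn from the data points, the distance is Euclidean, and the toy points are invented.

```python
import math
import random

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # Step 2: random initial centroids
    assignment = None
    for _ in range(max_iter):
        # Steps 3 and 5: assign each point to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:         # Step 6: nothing moved, finish
            break
        assignment = new_assignment
        # Step 4: move each centroid to the mean of its cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment

# Two visibly separated groups of points (toy data).
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, assignment = k_means(points, k=2)
print(centroids, assignment)
```

On data this well separated, the first three points end up in one cluster and the last three in the other; on harder data the result can depend on the initial centroids.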


How does the K-Means Algorithm Work?

• Suppose we have two variables, M1 and M2. The x-y scatter plot of these two variables is given below:

• Let us take K=2 clusters, which means we will try to group the data points into two different clusters.

• We need to choose K random points or centroids to form the clusters. These can be points from the dataset or any other points. Here we select two points that are not part of our dataset as the K points. Consider the following image:

• Now we assign each data point of the scatter plot to its closest K-point or centroid, using the distance between two points. To do so, we draw a median line between the two centroids (the perpendicular bisector of the segment joining them).

• From the previous image, it is clear that the points to the left of the line are nearer to K1, the blue centroid, and the points to the right of the line are closer to the yellow centroid. Let us color them blue and yellow for clear visualization.

• As we need to find the closest cluster, we repeat the process by choosing new centroids. To choose them, we compute the center of gravity of the points in each cluster and place the new centroids there, as follows:

• Next, we reassign each data point to the new closest centroid. For this, we repeat the same process of finding a median line. The median will be as in the following image:

• From the previous image, we can see that one yellow point is on the left side of the line and two blue points are to the right of the line. So these three points are assigned to the new centroids.

• As reassignment has taken place, we go back to step 4, which is finding new centroids or K-points.

• We repeat the process by finding the center of gravity of each cluster, so the new centroids will be as shown in the following image:

• With the new centroids, we again draw the median line and reassign the data points. So the image will be:

• In the following image, there are no dissimilar data points on either side of the line, which means our model is formed.

• As our model is ready, we can now remove the assumed centroids, and the two final clusters will be as shown in the image below:


How to choose the value of "K number of clusters"

• The performance of the K-means clustering algorithm depends on the quality of the clusters it forms, but choosing the optimal number of clusters is a difficult task. There are several ways to find the optimal number of clusters; here we discuss one of the most widely used methods for finding the value of K:


Elbow Method

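• The Elbow method is one of the most popular ways to find the optimal number of clusters. It uses the WCSS value. WCSS stands for Within-Cluster Sum of Squares, which measures the total variation within the clusters. The formula for WCSS (for 3 clusters) is:

\[ \mathrm{WCSS} = \sum_{P_i \in \text{Cluster}_1} d(P_i, C_1)^2 + \sum_{P_i \in \text{Cluster}_2} d(P_i, C_2)^2 + \sum_{P_i \in \text{Cluster}_3} d(P_i, C_3)^2 \]

where d(P_i, C_j) is the distance between data point P_i and the centroid C_j of its cluster.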


To find the optimal number of clusters, the elbow method follows the steps below:

• Run K-means clustering on the given dataset for different values of K (typically from 1 to 10).

• For each value of K, calculate the WCSS value.

• Plot a curve of the calculated WCSS values against the number of clusters K.

• The point where the curve bends sharply, like the elbow of an arm, is taken as the best value of K.
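To show what the WCSS curve looks like, the sketch below computes WCSS for hand-chosen partitions of a toy dataset into 1, 2 and 4 clusters. The partitions are written by hand rather than produced by K-means, and the points are invented, so this only illustrates the shape of the curve.

```python
def wcss(clusters):
    """Sum of squared distances from each point to the centroid of its cluster."""
    total = 0.0
    for cluster in clusters:
        centroid = [sum(coord) / len(cluster) for coord in zip(*cluster)]
        total += sum(
            sum((x - m) ** 2 for x, m in zip(point, centroid))
            for point in cluster
        )
    return total

low = [(1, 1), (1, 2), (2, 1), (2, 2)]     # first natural group
high = [(8, 8), (8, 9), (9, 8), (9, 9)]    # second natural group

wcss_by_k = {
    1: wcss([low + high]),                              # everything in one cluster
    2: wcss([low, high]),                               # the two natural groups
    4: wcss([low[:2], low[2:], high[:2], high[2:]]),    # each group split in two
}
print(wcss_by_k)
```

WCSS falls sharply from K=1 to K=2 and only slightly afterwards, so the bend of the curve points to K=2 for this data.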


Because the graph shows a sharp bend that looks like an elbow, this is known as the elbow method. The graph for the elbow method looks like the image below:


PW



The End
Any questions?