Topic 9:
Machine Learning:
Supervised Learning
Term 2-ARTI 106
Computer Track
2024-2025
Learning outcomes
The main learning objectives of this topic are:
❑Define ML and supervised learning.
❑ Explain ML tasks such as classification and regression.
❑ Explore some evaluation methods in ML, such as cross-validation and accuracy.
❑ Explain the steps of the KNN algorithm.
Outlines
❑ Define machine learning
❑ Define Supervised learning
❑ SL process
❑ Classification and regression in SL
❑ Evaluation methods
❑ K-Nearest Neighbors algorithm
What is ML?
❑ Machine learning (ML) is a branch of artificial intelligence (AI) and
computer science that focuses on using data and algorithms to
enable AI to imitate the way that humans learn, gradually
improving its accuracy.
❑ ML provides machines the ability to automatically learn from data
and past experiences to identify patterns and make predictions
with minimal human intervention.
❑ ML applications are fed with new data, and they can
independently learn, grow, develop, and adapt.
Types of ML
❑ Supervised Machine Learning
❑ Unsupervised Machine Learning
What is supervised learning?
❑ Supervised learning is the type of ML in which machines are
trained using “labelled” training data, and on the basis of that
data, machines predict the output.
❑ In supervised learning, the training data provided to the
machines works as the supervisor that teaches the machines to
predict the output correctly. It applies the same concept as a
student learning under the supervision of a teacher.
Steps involved in
supervised learning
❑Determine the type of dataset.
❑Collect the labelled data.
❑Split the dataset into training dataset and test dataset.
❑Identify the suitable algorithm for the model.
❑Execute the algorithm on the training dataset.
❑Evaluate the accuracy of the model by providing the test
set.
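The steps above can be sketched in Python with a toy labelled dataset and a deliberately trivial "majority class" model standing in for a real algorithm (both invented for illustration):

```python
# A minimal sketch of the supervised-learning steps: the dataset and the
# placeholder model are illustrative assumptions, not from the slides.
import random

# Steps 1-2: a small labelled dataset of (features, label) pairs.
data = [([1.0, 2.0], "yes"), ([1.5, 1.8], "yes"), ([5.0, 8.0], "no"),
        ([6.0, 9.0], "no"), ([1.2, 0.8], "yes"), ([7.0, 8.5], "no")]

# Step 3: split into a training set and a test set.
random.seed(0)
random.shuffle(data)
split = int(0.7 * len(data))
train, test = data[:split], data[split:]

# Steps 4-5: "train" a trivial model that always predicts the most common
# training label (a stand-in for a real algorithm such as kNN).
labels = [y for _, y in train]
model = max(set(labels), key=labels.count)

# Step 6: evaluate accuracy on the held-out test set.
correct = sum(1 for _, y in test if model == y)
accuracy = correct / len(test)
print(f"accuracy = {accuracy:.2f}")
```

A real workflow would replace the majority-class placeholder with a learning algorithm, but the surrounding steps (collect, split, fit, evaluate) stay the same.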
Types of SL algorithms
Classification tasks
❑ Classification algorithms refer to algorithms that address
classification problems where the output variable is
categorical; for example, yes or no, true or false, male or
female, etc. Real-world applications of this category are
evident in spam detection and email filtering.
❑ Some known classification algorithms include the Random
Forest Algorithm, Decision Tree Algorithm, Logistic
Regression Algorithm, and Support Vector Machine
Algorithm.
Regression
❑ Regression algorithms handle regression problems, where the
goal is to predict a continuous output variable from the inputs.
Examples include weather prediction, market trend analysis, etc.
❑ Popular regression algorithms include the Simple Linear
Regression Algorithm, Multivariate Regression Algorithm,
Decision Tree Algorithm, and Lasso Regression Algorithm.
Classification vs regression
❑The main goal of classification is to predict the target class
(e.g., yes/no).
❑The main goal of regression algorithms is to predict a
continuous value.
❑If forecasting a target class: classification.
❑If forecasting a value: regression.
Types of evaluation metrics
❑Different types of evaluation metrics exist for different
types of ML algorithms (classification, regression,
ranking, ...).
❑Some metrics are useful for more than one type of
algorithm, such as precision and recall.
❑Some popular classification metrics are accuracy, the
confusion matrix, and AUC.
Evaluation metrics…
❑Methods that determine an algorithm’s performance
and behavior.
❑Helpful for deciding which model best meets the target
performance.
❑Helpful for parameterizing the model so that it offers the
best-performing algorithm.
Accuracy vs Confusion matrix
❑ Accuracy is the ratio between the number of correct predictions and the
total number of predictions:
accuracy = # correct predictions / # total predictions
❑ Confusion matrix shows a more detailed breakdown of correct and
incorrect classifications for each class.
Accuracy = (30 + 930) / (30 + 10 + 30 + 930)
         = 960 / 1000
         = 0.96 (96%)
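The calculation above can be reproduced from the 2×2 confusion matrix counts (30 true positives, 10 false negatives, 30 false positives, 930 true negatives, matching the slide's example):

```python
# Accuracy from a 2x2 confusion matrix; counts taken from the example above.
# Rows = actual class, columns = predicted class.
confusion = [[30, 10],    # actual positive: 30 correct, 10 misclassified
             [30, 930]]   # actual negative: 30 misclassified, 930 correct

correct = confusion[0][0] + confusion[1][1]   # diagonal = correct predictions
total = sum(sum(row) for row in confusion)    # all predictions
accuracy = correct / total
print(accuracy)  # 0.96
```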
Holdout set
❑ Holdout set: The available data set D is divided into two disjoint subsets,
❖ the training set Dtrain (for learning a model)
❖ the test set Dtest (for testing the model)
❑ Important: training set should not be used in testing and the test set should
not be used in learning.
❑ The test set is also called the holdout set (the examples in the original data
set D are all labeled with classes.)
❑ This method is mainly used when the data set D is large.
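A minimal holdout split might look like this in Python, with a toy labelled dataset standing in for D (the 80/20 ratio is an illustrative choice):

```python
# Holdout split: divide D into disjoint training and test subsets.
import random

D = [(i, i % 2) for i in range(100)]   # toy labelled examples (invented)

random.seed(42)
random.shuffle(D)                      # shuffle so the split is unbiased
cut = int(0.8 * len(D))                # e.g. 80% train / 20% test
D_train, D_test = D[:cut], D[cut:]     # disjoint subsets

assert not set(D_train) & set(D_test)  # train and test never overlap
print(len(D_train), len(D_test))       # 80 20
```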
Cross-validation: How it works
1. Split the dataset into N equal-size disjoint subsets (called folds).
2. Iterate N times, using a different fold as the test set each time:
❑ Use N-1 folds for training.
❑ Use the remaining fold for testing.
3. Average the evaluation metrics (accuracy, precision, etc.) across all N iterations.
❑ 10-fold and 5-fold cross-validations are commonly used.
❑ This method is used when the available data is not large.
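The procedure above can be sketched as a small Python function; `evaluate` is a hypothetical placeholder for training a model on the training folds and scoring it on the test fold:

```python
# N-fold cross-validation sketch: each fold serves as the test set once,
# and the per-fold scores are averaged.
def cross_validate(data, n_folds, evaluate):
    fold_size = len(data) // n_folds
    scores = []
    for i in range(n_folds):
        # Fold i is the test set; the remaining N-1 folds form the training set.
        test = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores)   # average metric across the N folds

# Toy usage: a fake evaluator that just reports the test-fold size ratio.
data = list(range(50))
avg = cross_validate(data, 5, lambda tr, te: len(te) / (len(tr) + len(te)))
print(avg)
```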
Application using KNN
❑ kNN is a simple algorithm that stores all available cases and
classifies new cases based on a similarity measure (e.g., a
distance function).
❑ k is usually chosen empirically via a validation set or cross-
validation by trying a range of k values.
❑ The distance function is crucial but depends on the application.
❑ Simple Explanation of the K-Nearest Neighbors (KNN)
Algorithm:
https://www.youtube.com/watch?v=zeFt_JCA3b4
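The algorithm described above can be sketched in a few lines of Python; the dataset and the choice k = 3 are illustrative assumptions, not from the slides:

```python
# Minimal kNN classifier: store all training cases, then classify a query
# by majority vote among its k nearest neighbours (Euclidean distance).
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    # Sort stored cases by distance to the query and take the k closest.
    neighbours = sorted(train, key=lambda xy: euclidean(xy[0], query))[:k]
    # Majority vote among the k nearest labels.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.8), "B"), ((4.9, 5.1), "B")]
print(knn_predict(train, (1.1, 1.0)))  # "A"
print(knn_predict(train, (5.1, 5.0)))  # "B"
```

In practice k would be chosen empirically via a validation set or cross-validation, as noted above.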
Thank you for your attention