Supervised Learning Notes

The document outlines popular supervised learning algorithms, including regressors and classifiers, as well as ensemble methods like bagging, boosting, and stacking. It details the steps in supervised learning, from problem identification to model evaluation and improvement, while also discussing the strengths and limitations of various algorithms. Additionally, it provides guidance on selecting the appropriate Naïve Bayes algorithm based on data characteristics.


Popular Supervised Learning Algorithms:

Regressors:
    Linear Regression
        from sklearn.linear_model import LinearRegression
        LinearRegression()
    k-nearest neighbors (kNN)
        from sklearn.neighbors import KNeighborsRegressor
        KNeighborsRegressor()
    Decision Tree
        from sklearn.tree import DecisionTreeRegressor
        DecisionTreeRegressor()
    Support Vector Machines (SVMs)
        from sklearn.svm import SVR
        SVR()

Classifiers:
    Logistic Regression
        from sklearn.linear_model import LogisticRegression
        LogisticRegression()
    k-nearest neighbors (kNN)
        from sklearn.neighbors import KNeighborsClassifier
        KNeighborsClassifier()
    Naïve Bayes
        from sklearn.naive_bayes import GaussianNB
        GaussianNB()
        from sklearn.naive_bayes import CategoricalNB
        CategoricalNB()
    Decision Tree
        from sklearn.tree import DecisionTreeClassifier
        DecisionTreeClassifier()
    Support Vector Machines (SVMs)
        from sklearn.svm import SVC
        SVC()
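All of these estimators share the same fit/predict interface in scikit-learn. The following is a minimal sketch (my addition, not part of the original notes) using a synthetic toy dataset and two of the classifiers listed above:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Synthetic toy data standing in for a real problem
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

for model in (LogisticRegression(max_iter=1000), KNeighborsClassifier()):
    model.fit(X_train, y_train)                                 # train on the training split
    print(type(model).__name__, model.score(X_test, y_test))   # accuracy on the test split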
Popular Ensemble Methods:

Regressors:
    Bagging with Random Forest
        from sklearn.ensemble import RandomForestRegressor
        RandomForestRegressor()
    Boosting with AdaBoost
        from sklearn.ensemble import AdaBoostRegressor
        AdaBoostRegressor()
    Stacking
        from sklearn.ensemble import StackingRegressor
        StackingRegressor()

Classifiers:
    Bagging with Random Forest
        from sklearn.ensemble import RandomForestClassifier
        RandomForestClassifier()
    Boosting with AdaBoost
        from sklearn.ensemble import AdaBoostClassifier
        AdaBoostClassifier()
    Stacking
        from sklearn.ensemble import StackingClassifier
        StackingClassifier()
Bagging
Often uses homogeneous weak learners
Learns the base models in parallel; they are independent of each other
Combines them via an averaging (or voting) process
Focuses on getting an ensemble model with lower variance than the base models
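As a rough illustration (my addition, not part of the original notes), bagging is what RandomForestClassifier does: each tree is trained independently on a bootstrap sample, and the forest combines the trees' predictions by voting/averaging.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=0)
# 100 trees are trained independently of each other on bootstrap samples;
# the forest's prediction combines the individual trees' votes.
bagged = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
bagged.fit(X, y)
print(bagged.predict(X[:5]))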
Boosting
Often uses homogeneous weak learners
Learns the base models sequentially and adaptively
Combines them via a deterministic strategy
Focuses on getting an ensemble model with lower bias than the base models
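A minimal sketch (my addition) using AdaBoostClassifier: the default base learner is a depth-1 decision stump, and each new stump is fitted after the previous ones, with misclassified samples re-weighted so later stumps focus on them.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=300, random_state=0)
# 50 weak learners are fitted one after another; each new learner
# adapts to the samples the previous ones got wrong.
boosted = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
boosted.fit(X, y)
print(boosted.score(X, y))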

Stacking
Often uses heterogeneous weak learners
Learns the base models in parallel
Combines them by learning a meta-model on their predictions
Focuses on getting an ensemble model with lower bias than the base models
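A minimal sketch (my addition) with StackingClassifier: heterogeneous base models (here kNN and a decision tree) are fitted in parallel, and a logistic-regression meta-model learns how to combine their predictions.

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
# Heterogeneous base learners plus a meta-model that combines them
stacked = StackingClassifier(
    estimators=[("knn", KNeighborsClassifier()),
                ("tree", DecisionTreeClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000))
stacked.fit(X, y)
print(stacked.score(X, y))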
Steps in supervised learning:

◆ Identify the problem


◆ Acquire the necessary Data
◆ Import the necessary packages
◆ Import the dataset
◆ Data Pre-Processing:
⧫ Sample the data (if data is huge)
⧫ Clean the data (if required)
⧫ Handle the missing values (if required)
◆ Feature engineering:
⧫ Feature Transformations: transform features from one representation to another.
• Convert categorical variables into numerical values (if required)
• Feature Scaling. For example: normalization, standardization
⧫ Feature Extraction: extracting important features from a dataset to identify useful information. For example: PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), etc.
⧫ Feature Selection: remove features that are irrelevant or redundant, and prioritize the features that are most useful for the model. For example: use correlation to identify the features that have the most impact on the target column.
⧫ Feature Construction: infer new attributes that capture the important information more efficiently than the original attributes.
◆ Split the data: two-thirds (or 70% to 80%) of the dataset will be used to train the model and one-third (or 20% to 30%) will be used to test the model. After splitting, we will have the following 4 segments:
⧫ Dependent variables for training
⧫ Independent variables for training
⧫ Dependent variables for testing
⧫ Independent variables for testing
◆ Model Training & Testing
⧫ Build a model using an algorithm (built-in method from a library). For example, we can use LogisticRegression() from
sklearn.linear_model.
⧫ Fit the model
⧫ Predict the targets in the test data
◆ Evaluate the model performance
⧫ Classification Problems:
• Create a confusion matrix
• Calculate the accuracy
• Create a classification report (precision, recall, f1-score, support)
• Receiver Operating Characteristic (ROC) curve
• Area Under the ROC Curve (AUC)
⧫ Regression Problems:
• R-Squared (R²)
• Mean Squared Error (MSE)
• Root Mean Squared Error (RMSE)
• Mean Absolute Error (MAE)
• Mean Absolute Percentage Error (MAPE)
◆ Improve the performance of the model
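Putting the steps above together, a minimal end-to-end sketch (my addition, using a scikit-learn toy dataset in place of a real problem) might look like:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

# Import the dataset (toy data standing in for a real problem)
X, y = load_breast_cancer(return_X_y=True)

# Split the data: 70% for training, 30% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Feature scaling (standardization)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Build and fit the model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict the targets in the test data
y_pred = model.predict(X_test)

# Evaluate the model performance (classification problem)
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))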
Linear Regression
Strengths:
• Works with numeric attributes
• Model learning time is fast
• Prediction is powerful if the dimension is high
Limitations:
• For high-dimensional data, the model is prone to overfitting
• Cannot perform variable selection
• Sensitive to outliers

Naïve Bayes
Strengths:
• Easy and fast
• Used for both binary and multi-class classification
• Performs well for multi-class classification compared to other classification models
• Popular for text classification problems
• The data does not need to be scaled
Limitations:
• Assumes all features are uncorrelated or independent
• Cannot learn the relationship between features

Decision Trees
Strengths:
• Easy to interpret and visualize
• Can easily capture non-linear patterns
• Requires less data pre-processing from the user (e.g., no need to normalize columns)
• Can be used for feature engineering, such as predicting missing values; suitable for variable selection
• No assumptions about the data distribution are required because of the non-parametric nature of the algorithm
Limitations:
• Sensitive to noisy data → can overfit
• Results in a different decision tree if there is a small variation (or variance) in the data; this can be reduced by bagging and boosting algorithms
• Biased with an imbalanced dataset → it is recommended to balance the dataset before creating the decision tree
Which Naïve Bayes Algorithm to use when?
● Multinomial Naive Bayes: useful for classifying feature vectors in which each value represents a count or relative frequency; the input consists of discrete values.
This classifier is used when the data follows a multinomial distribution. It is primarily used for document classification problems, i.e., determining which category a particular document belongs to, such as sports, politics, education, etc. The classifier uses word frequencies as the predictors.
● Bernoulli Naive Bayes: also suitable for discrete data, but it assumes that all features are binary. For example:
0 can represent "the word does not occur in the document"
1 can represent "the word occurs in the document"
This classifier works similarly to the multinomial classifier, but the predictor variables are independent Boolean variables, such as whether or not a particular word is present in a document. It is also popular for document classification tasks.
● Gaussian Naive Bayes: used when the feature (predictor) columns contain continuous values, while the target column contains classes.
This model assumes that the features follow a normal distribution: if the predictors take continuous values instead of discrete ones, the model assumes that these values are sampled from a Gaussian distribution.
The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary / Boolean features.
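A rough sketch (my addition, with made-up toy arrays) of how the three variants map to different feature types:

import numpy as np
from sklearn.naive_bayes import MultinomialNB, BernoulliNB, GaussianNB

y = np.array([0, 0, 1, 1])

# MultinomialNB: discrete counts / frequencies (e.g., word counts per document)
X_counts = np.array([[2, 1, 0], [3, 0, 0], [0, 2, 4], [0, 1, 3]])
MultinomialNB().fit(X_counts, y)

# BernoulliNB: binary features (word present = 1 / absent = 0)
X_binary = (X_counts > 0).astype(int)
BernoulliNB().fit(X_binary, y)

# GaussianNB: continuous features assumed to be normally distributed within each class
X_cont = np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 2.8], [5.9, 3.2]])
GaussianNB().fit(X_cont, y)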
● Normalization (Min/Max Scaling)

● Standardization using Z-Transform

● Euclidean distance

● Manhattan distance
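The four terms above can be illustrated with a short sketch (my addition, using made-up numbers):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

# Normalization (min/max scaling): rescales each column to the range [0, 1]
print(MinMaxScaler().fit_transform(X))

# Standardization (z-transform): (x - mean) / standard deviation, per column
print(StandardScaler().fit_transform(X))

# Euclidean distance: square root of the sum of squared differences
a, b = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(np.sqrt(np.sum((a - b) ** 2)))   # 5.0

# Manhattan distance: sum of absolute differences
print(np.sum(np.abs(a - b)))           # 7.0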
