Supervised Learning Notes

The document outlines popular supervised learning algorithms, including regressors and classifiers, as well as ensemble methods like bagging, boosting, and stacking. It details the steps in supervised learning, from problem identification to model evaluation and improvement, while also discussing the strengths and limitations of various algorithms. Additionally, it provides guidance on selecting the appropriate Naïve Bayes algorithm based on data characteristics.


Popular Supervised Learning Algorithms:

Regressors:
    Linear Regression
        from sklearn.linear_model import LinearRegression
        LinearRegression()
    k-nearest neighbors (kNN)
        from sklearn.neighbors import KNeighborsRegressor
        KNeighborsRegressor()
    Decision Tree
        from sklearn.tree import DecisionTreeRegressor
        DecisionTreeRegressor()
    Support Vector Machines (SVMs)
        from sklearn.svm import SVR
        SVR()

Classifiers:
    Logistic Regression
        from sklearn.linear_model import LogisticRegression
        LogisticRegression()
    k-nearest neighbors (kNN)
        from sklearn.neighbors import KNeighborsClassifier
        KNeighborsClassifier()
    Naïve Bayes
        from sklearn.naive_bayes import GaussianNB
        GaussianNB()
        from sklearn.naive_bayes import CategoricalNB
        CategoricalNB()
    Decision Tree
        from sklearn.tree import DecisionTreeClassifier
        DecisionTreeClassifier()
    Support Vector Machines (SVMs)
        from sklearn.svm import SVC
        SVC()
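All of these estimators share the same fit/predict interface in scikit-learn. The following is a minimal sketch (my addition, not part of the original notes) using a synthetic toy dataset and two of the classifiers listed above:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Synthetic toy data standing in for a real problem
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

for model in (LogisticRegression(max_iter=1000), KNeighborsClassifier()):
    model.fit(X_train, y_train)                                 # train on the training split
    print(type(model).__name__, model.score(X_test, y_test))   # accuracy on the test split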
Popular Ensemble Methods:

Regressors:
    Bagging with Random Forest
        from sklearn.ensemble import RandomForestRegressor
        RandomForestRegressor()
    Boosting with AdaBoost
        from sklearn.ensemble import AdaBoostRegressor
        AdaBoostRegressor()
    Stacking
        from sklearn.ensemble import StackingRegressor
        StackingRegressor()

Classifiers:
    Bagging with Random Forest
        from sklearn.ensemble import RandomForestClassifier
        RandomForestClassifier()
    Boosting with AdaBoost
        from sklearn.ensemble import AdaBoostClassifier
        AdaBoostClassifier()
    Stacking
        from sklearn.ensemble import StackingClassifier
        StackingClassifier()
Bagging
Often uses homogeneous weak learners
Learns the base models in parallel; they are independent of each other
Combines them via an averaging (or voting) process
Focuses on getting an ensemble model with lower variance than the base models
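As a rough illustration (my addition, not part of the original notes), bagging is what RandomForestClassifier does: each tree is trained independently on a bootstrap sample, and the forest combines the trees' predictions by voting/averaging.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=0)
# 100 trees are trained independently of each other on bootstrap samples;
# the forest's prediction combines the individual trees' votes.
bagged = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
bagged.fit(X, y)
print(bagged.predict(X[:5]))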
Boosting
Often uses homogeneous weak learners
Learns the base models sequentially and adaptively
Combines them via a deterministic strategy
Focuses on getting an ensemble model with lower bias than the base models
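A minimal sketch (my addition) using AdaBoostClassifier: the default base learner is a depth-1 decision stump, and each new stump is fitted after the previous ones, with misclassified samples re-weighted so later stumps focus on them.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=300, random_state=0)
# 50 weak learners are fitted one after another; each new learner
# adapts to the samples the previous ones got wrong.
boosted = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
boosted.fit(X, y)
print(boosted.score(X, y))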

Stacking
Often uses heterogeneous weak learners
Learns the base models in parallel
Combines them by learning a meta-model on their predictions
Focuses on getting an ensemble model with lower bias than the base models
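A minimal sketch (my addition) with StackingClassifier: heterogeneous base models (here kNN and a decision tree) are fitted in parallel, and a logistic-regression meta-model learns how to combine their predictions.

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
# Heterogeneous base learners plus a meta-model that combines them
stacked = StackingClassifier(
    estimators=[("knn", KNeighborsClassifier()),
                ("tree", DecisionTreeClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000))
stacked.fit(X, y)
print(stacked.score(X, y))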
Steps in supervised learning:

◆ Identify the problem


◆ Acquire the necessary Data
◆ Import the necessary packages
◆ Import the dataset
◆ Data Pre-Processing:
⧫ Sample the data (if data is huge)
⧫ Clean the data (if required)
⧫ Handle the missing values (if required)
◆ Feature engineering:
⧫ Feature Transformations: transform features from one representation to another.
• Convert categorical variables into numerical values (if required)
• Feature Scaling. For example: normalization, standardization
⧫ Feature Extraction: extracting important features from a dataset to identify useful information. For example: PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), etc.
⧫ Feature Selection: remove features that are irrelevant or redundant, and prioritize the features that are most useful for the model. For example: use correlation to identify the features that have the most impact on the target column.
⧫ Feature Construction: infer new attributes that capture the important information more efficiently than the original attributes.
◆ Split the data: two-thirds (or 70% to 80%) of the dataset will be used to train the model and one-third (or 20% to 30%) will be used to test the model. After splitting, we will have the following 4 segments:
⧫ Dependent variables for training
⧫ Independent variables for training
⧫ Dependent variables for testing
⧫ Independent variables for testing
◆ Model Training & Testing
⧫ Build a model using an algorithm (built-in method from a library). For example, we can use LogisticRegression() from
sklearn.linear_model.
⧫ Fit the model
⧫ Predict the targets in the test data
◆ Evaluate the model performance
⧫ Classification Problems:
• Create a confusion matrix
• Calculate the accuracy
• Create a classification report (precision, recall, f1-score, support)
• Receiver Operating Characteristic (ROC) curve
• Area Under the ROC Curve (AUC)
⧫ Regression Problems:
• R-Squared (R²)
• Mean Squared Error (MSE)
• Root Mean Squared Error (RMSE)
• Mean Absolute Error (MAE)
• Mean Absolute Percentage Error (MAPE)
◆ Improve the performance of the model
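Putting the steps above together, a minimal end-to-end sketch (my addition, using a scikit-learn toy dataset in place of a real problem) might look like:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

# Import the dataset (toy data standing in for a real problem)
X, y = load_breast_cancer(return_X_y=True)

# Split the data: 70% for training, 30% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Feature scaling (standardization)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Build and fit the model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict the targets in the test data
y_pred = model.predict(X_test)

# Evaluate the model performance (classification problem)
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))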
Linear Regression
Strengths:
• Works with numeric attributes
• Model learning time is fast
• Prediction is powerful if the dimension is high
Limitations:
• For high-dimensional data, the model is prone to overfitting
• Cannot perform variable selection
• Sensitive to outliers

Naïve Bayes
Strengths:
• Easy and fast
• Used for both binary and multi-class classification
• Performs well for multi-class classification compared to other classification models
• Popular for text classification problems
• The data does not need to be scaled
Limitations:
• Assumes all features are uncorrelated or independent
• Cannot learn the relationship between features

Decision Trees
Strengths:
• Easy to interpret and visualize
• Can easily capture non-linear patterns
• Requires less data pre-processing from the user (e.g., no need to normalize columns)
• Can be used for feature engineering, such as predicting missing values; suitable for variable selection
• No assumptions about the data distribution are required because of the non-parametric nature of the algorithm
Limitations:
• Sensitive to noisy data → can overfit
• Results in a different decision tree if there is a small variation (or variance) in the data; this can be reduced by bagging and boosting algorithms
• Biased with an imbalanced dataset → it is recommended to balance the dataset before creating the decision tree
Which Naïve Bayes Algorithm to use when?
● Multinomial Naive Bayes: useful for classifying feature vectors in which each value represents a count or relative frequency; the input consists of discrete values.
This classifier is used when the data follows a multinomial distribution. It is primarily used for document classification problems, i.e., determining which category a particular document belongs to, such as sports, politics, education, etc. The classifier uses word frequencies as the predictors.
● Bernoulli Naive Bayes: also suitable for discrete data, but it assumes that all features are binary. For example:
0 can represent "the word does not occur in the document"
1 can represent "the word occurs in the document"
This classifier works similarly to the multinomial classifier, but the predictor variables are independent Boolean variables, such as whether or not a particular word is present in a document. It is also popular for document classification tasks.
● Gaussian Naive Bayes: used when the feature (predictor) columns contain continuous values, while the target column contains classes.
This model assumes that the features follow a normal distribution: if the predictors take continuous values instead of discrete ones, the model assumes that these values are sampled from a Gaussian distribution.
The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary / Boolean features.
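A rough sketch (my addition, with made-up toy arrays) of how the three variants map to different feature types:

import numpy as np
from sklearn.naive_bayes import MultinomialNB, BernoulliNB, GaussianNB

y = np.array([0, 0, 1, 1])

# MultinomialNB: discrete counts / frequencies (e.g., word counts per document)
X_counts = np.array([[2, 1, 0], [3, 0, 0], [0, 2, 4], [0, 1, 3]])
MultinomialNB().fit(X_counts, y)

# BernoulliNB: binary features (word present = 1 / absent = 0)
X_binary = (X_counts > 0).astype(int)
BernoulliNB().fit(X_binary, y)

# GaussianNB: continuous features assumed to be normally distributed within each class
X_cont = np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 2.8], [5.9, 3.2]])
GaussianNB().fit(X_cont, y)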
● Normalization (Min/Max Scaling)

● Standardization using Z-Transform

● Euclidean distance

● Manhattan distance
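The four terms above can be illustrated with a short sketch (my addition, using made-up numbers):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

# Normalization (min/max scaling): rescales each column to the range [0, 1]
print(MinMaxScaler().fit_transform(X))

# Standardization (z-transform): (x - mean) / standard deviation, per column
print(StandardScaler().fit_transform(X))

# Euclidean distance: square root of the sum of squared differences
a, b = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(np.sqrt(np.sum((a - b) ** 2)))   # 5.0

# Manhattan distance: sum of absolute differences
print(np.sum(np.abs(a - b)))           # 7.0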
