
Third Year Engineering

Artificial Intelligence and Machine Learning


Class - T.Y. (SEM-IV)
Unit – IV Introduction To Supervised Learning

AY 2024-2025
Unit IV - Syllabus
Unit IV – Introduction to Supervised Learning (09 hours)
• Feature Selection vs. feature extraction techniques,

• Linear Regression;

• Logistic Regression;

• Decision Trees;

• Random Forests;

• Support Vector Machines (SVM);

• Naive Bayes;

• K-Nearest Neighbors (KNN)


Unit IV – Outline
• Feature Selection vs. feature extraction techniques,

• Linear Regression;

• Logistic Regression;

• Decision Trees;

• Random Forests;

• Support Vector Machines (SVM);

• Naive Bayes;

• K-Nearest Neighbors (KNN)


INTRODUCTION TO FEATURE SELECTION
• Feature selection in machine learning refers to the process of
choosing the most relevant and informative features or variables
from a dataset.
• The goal is to improve model performance by reducing the number of
features while retaining those that contribute most to the model's
predictive power.
INTRODUCTION TO FEATURE SELECTION
There are several reasons for performing feature selection:
• Improved Model Performance: Reducing irrelevant or redundant
features can prevent overfitting and help the model generalize
better to new, unseen data.
• Faster Training: With fewer features, training machine learning
models becomes faster and more efficient.
• Enhanced Interpretability: Selecting key features can help in
understanding the important factors that influence predictions,
providing better insights.
INTRODUCTION TO FEATURE SELECTION
Feature selection methods can be broadly categorized into three types:
• Filter Methods: These methods assess the relevance of features based on
statistical measures, such as correlation, mutual information, or
significance tests, independent of any machine learning algorithm.
• Wrapper Methods: These methods involve evaluating subsets of features
by training models and selecting the subset that produces the best model
performance. This is often computationally expensive but can yield good
results.
• Embedded Methods: These methods incorporate feature selection as
part of the model-building process, where the model itself decides which
features are most important during training. Examples include
regularization techniques like Lasso Regression and decision trees with
built-in feature importance measures.
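The three families can be illustrated with scikit-learn. This is a minimal sketch, not the slides' own code; the breast-cancer dataset and the choice of keeping 10 features are assumptions made only for the example.

# Illustrative sketch of the three feature-selection families (scikit-learn).
# Dataset and the choice of k / number of features are assumptions for the example.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression, Lasso

X, y = load_breast_cancer(return_X_y=True)

# Filter method: score each feature with a statistical test (ANOVA F-test), keep the top 10.
filter_selector = SelectKBest(score_func=f_classif, k=10)
X_filter = filter_selector.fit_transform(X, y)

# Wrapper method: recursive feature elimination trains a model repeatedly,
# dropping the weakest feature each round.
wrapper_selector = RFE(estimator=LogisticRegression(max_iter=5000), n_features_to_select=10)
X_wrapper = wrapper_selector.fit_transform(X, y)

# Embedded method: L1 (Lasso) regularization drives unimportant coefficients to zero
# as part of model training itself (used here purely for illustration).
lasso = Lasso(alpha=0.01).fit(X, y)
selected = [i for i, coef in enumerate(lasso.coef_) if coef != 0]

print(X_filter.shape, X_wrapper.shape, len(selected))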
INTRODUCTION TO FEATURE EXTRACTION
• Feature extraction in machine learning involves transforming raw
data into a set of new, more meaningful features that represent
the essential characteristics of the original data.
• It aims to reduce the dimensionality of the data while retaining
important information.
FEATURE EXTRACTION: OBJECTIVES
The primary objectives of feature extraction are:
• Dimensionality Reduction: Converting high-dimensional data into
a lower-dimensional space by extracting the most relevant
information, which helps in alleviating computational complexity
and addressing the curse of dimensionality.
• Improved Model Performance: Creating more informative and
discriminative features can enhance the performance of machine
learning algorithms by focusing on the most critical aspects of the
data.
TYPES OF FEATURE EXTRACTION
Feature extraction techniques can be categorized into various methods:
• Principal Component Analysis (PCA): A popular technique that
transforms data into a new coordinate system by identifying and
retaining the most significant components while discarding less important
ones.
• Linear Discriminant Analysis (LDA): Similar to PCA but focuses on
maximizing class separability, making it particularly useful for
classification tasks.
• Autoencoders: Neural networks that learn to encode input data into a
lower-dimensional representation, then decode it back to the original
space. The encoded values serve as the extracted features.
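A minimal sketch of PCA and LDA with scikit-learn; the iris dataset and the choice of 2 components are assumptions for illustration (an autoencoder would need a neural-network library and is omitted here).

# Illustrative feature-extraction sketch: PCA (unsupervised) vs LDA (supervised).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA keeps the directions of maximum variance in the data.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# LDA keeps the directions that best separate the classes.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print(X.shape, X_pca.shape, X_lda.shape)   # (150, 4) -> (150, 2) for both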
TYPES OF FEATURE EXTRACTION
Feature extraction techniques can be categorized into various
methods:
• Feature Scaling and Normalization: Techniques that rescale
features to a standard range or normalize them to make the data
more amenable to machine learning algorithms.
• Feature Engineering: Creating new features from existing ones
based on domain knowledge or specific insights about the data,
which may involve mathematical transformations, combining
features, or extracting more relevant information.
FEATURE SELECTION VS EXTRACTION
Aspect             | Feature Selection                                      | Feature Extraction
Objective          | Choose the most relevant features                      | Transform original features into new ones
Purpose            | Select a subset of the original features               | Create a new set of features
Outcome            | Subset of original features                            | Transformed or derived new features
Process            | Selection based on relevance/importance                | Transformation / compression / summarization
Method Types       | Filter, Wrapper, Embedded methods                      | PCA, LDA, Autoencoders, Feature Scaling, etc.
Information Type   | Retains original feature values                        | Creates new feature values
Advantages         | Preserves interpretability; computational efficiency   | Reduces dimensionality and captures complexity; enhanced information and noise reduction
Disadvantages      | Potential loss of information; limited scope for capturing complexity | Potential loss of interpretability; loss of nuanced information from the original data
Example Techniques | Correlation, Mutual Information, Lasso Regression      | Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Autoencoders
Unit IV – Outline
• Feature Selection vs. feature extraction techniques,

• Linear Regression;

• Logistic Regression;

• Decision Trees;

• Random Forests;

• Support Vector Machines (SVM);

• Naive Bayes;

• K-Nearest Neighbors (KNN)
WHAT IS SUPERVISED LEARNING?
Features:
• Dogs and cats both have 4 legs
and a tail.
• Dogs come in small to large
sizes. Cats, on the other hand,
are always small.
• Dogs have a long mouth while
cats have smaller mouths.
• Dogs bark while cats meow.
• Different dogs have different
ears while cats have almost the
same kind of ears.
WHY IS IT IMPORTANT?
• Supervised learning gives the algorithm experience that can be used
to make predictions for new, unseen data.
• This experience also helps in optimizing the performance of the algorithm.
TYPES OF SUPERVISED LEARNING
Supervised learning can be further classified into:
• Classification

• Regression
PROBLEMS IN MACHINE LEARNING
PROBLEMS IN MACHINE LEARNING
INTRODUCTION TO REGRESSION
Regression: Regression analysis is a predictive modelling technique
which investigates the relationship between a dependent variable and
one or more independent variables.
USES OF REGRESSION
• Determining the strength of predictors (the strength of the effect that
the independent variables have on the dependent variable)
• Forecasting an effect
• Trend forecasting
INTRODUCTION TO LINEAR REGRESSION
• Linear regression is like drawing a straight line through data points
to predict future outcomes or understand the relationship between
two variables.
• It's used when we want to find a relationship between one thing we
want to predict (called the dependent variable) and one or more
things we use to make that prediction (called independent variables
or predictors).
LINEAR REGRESSION: WORKING
• Imagine you have a bunch of points on a graph.
• Linear regression finds the best-fitting line that goes through those
points.
• Once you have this line, you can use it to make predictions about
future points or understand how changes in one variable might
affect another.
LINEAR REGRESSION: WORKING
LINEAR REGRESSION: WORKING
LINEAR REGRESSION
LINEAR REGRESSION
R-SQUARED VALUE
• R-squared value is a statistical measure of how close the data are to
the fitted regression line.
• It is also known as the coefficient of determination (or the coefficient
of multiple determination in multiple regression).
GOODNESS OF FIT
GOODNESS OF FIT
• When the value of R-squared equals 1, all the actual values lie exactly
on the regression line.
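For reference, R-squared can be computed as R² = 1 − (SS_res / SS_tot), where
SS_res = Σ (yᵢ − ŷᵢ)² is the sum of squared residuals around the fitted line and
SS_tot = Σ (yᵢ − ȳ)² is the total sum of squares around the mean of the observed values.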
MEAN SQUARED ERROR
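For a dataset of n points with actual values yᵢ and predictions ŷᵢ, the mean squared
error is MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)². This is the cost that gradient descent (next slides)
minimizes when fitting the line.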
GRADIENT DESCENT
• Gradient descent is an iterative optimization algorithm that finds the
best-fit line for a given training dataset by minimizing a cost such as
the mean squared error.
GRADIENT DESCENT: EXAMPLE
• Area = [2600, 3000, 3200, 3600, 4000]
• Price = [550k, 565k, 610k, 680k, 725k]
GRADIENT DESCENT: EXAMPLE
• Area = [2600, 3000, 3200, 3600, 4000]
• Price = [550k, 565k, 610k, 680k, 725k]
CONT.…
CONT.….

For Slope
CONT.…
CODE
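The slide's original code is not reproduced here; the following is a minimal gradient-descent sketch for the area/price example above. The feature standardization, learning rate, and iteration count are illustrative choices, not values from the slides.

# Minimal gradient-descent sketch for simple linear regression (price = m*area + b).
import numpy as np

area = np.array([2600, 3000, 3200, 3600, 4000], dtype=float)
price = np.array([550000, 565000, 610000, 680000, 725000], dtype=float)

# Scale the feature so a simple fixed learning rate behaves well.
x = (area - area.mean()) / area.std()
y = price

m, b = 0.0, 0.0          # slope and intercept
lr = 0.1                 # learning rate
n = len(x)

for step in range(1000):
    y_pred = m * x + b
    error = y - y_pred
    mse = (error ** 2).mean()            # cost being minimized
    dm = -(2 / n) * (x * error).sum()    # partial derivative w.r.t. m
    db = -(2 / n) * error.sum()          # partial derivative w.r.t. b
    m -= lr * dm
    b -= lr * db

print(f"slope={m:.2f}, intercept={b:.2f}, final MSE={mse:.2f}")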
LINEAR REGRESSION: APPLICATIONS
• Predicting house prices based on factors like size, number of rooms,
location, etc.
• Forecasting sales based on advertising spending, seasonality, or
other factors.
• Understanding how temperature affects ice cream sales.
LINEAR REGRESSION: ADVANTAGES
• Simplicity: Easy to understand and implement.
• Interpretability: Provides insights into the relationship between
variables.
• Speed: Quick to train and make predictions.
LINEAR REGRESSION: DISADVANTAGES
• Assumes Linearity: Assumes that the relationship between
variables is linear, which might not always be the case.
• Sensitivity to Outliers: Outliers (extreme data points) can
significantly impact the model's performance.
• Limited Complexity: Cannot capture complex relationships between
variables without modifications (like polynomial regression).
Unit IV – Outline
• Feature Selection vs. feature extraction techniques,

• Linear Regression;

• Logistic Regression;

• Decision Trees;

• Random Forests;

• Support Vector Machines (SVM);

• Naive Bayes;

• K-Nearest Neighbors (KNN)


INTRODUCTION TO LOGISTIC REGRESSION
• Logistic regression is a machine learning algorithm used for binary
classification tasks: it predicts the probability that an input belongs
to one of two categories.
• It is called "regression" but is actually used for classification.
LOGISTIC REGRESSION: WORKING
• It models the relationship between a dependent binary variable
(target) and one or more independent variables (features).
• Utilizes the logistic function (sigmoid) to transform predictions into
probabilities between 0 and 1.
• The model makes predictions by calculating the probability that an
input belongs to a particular class.
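A minimal sketch of these ideas, assuming scikit-learn; the hours-studied/pass-fail data below is a made-up example, not from the slides.

# The sigmoid squashes a linear score into a probability; LogisticRegression fits such a model.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    # Maps any real number to the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: hours studied vs. pass (1) / fail (0) -- an assumed example.
hours = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(hours, passed)
print(model.predict_proba([[2.2]]))   # probabilities of class 0 and class 1 for 2.2 hours
print(model.predict([[2.2]]))         # predicted class (0 or 1)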
LOGISTIC REGRESSION CURVE
EXAMPLE
CONT.…
CLASSIFICATION
▪ Classification is the process of categorizing a given set of data into
classes; it can be performed on both structured and unstructured data.
▪ The process starts with predicting the class of given data points. The
classes are often referred to as targets, labels, or categories.
CLASSIFICATION TERMINOLOGIES
TYPES OF LEARNERS IN CLASSIFICATION
CLASSIFICATION ALGORITHMS
In machine learning, classification is a supervised learning concept
which basically categorizes a set of data into classes.
LOGISTIC REGRESSION
It is a classification algorithm in machine learning that uses one or
more independent variables to determine an outcome.
The outcome can take only two possible values.
LOGISTIC REGRESSION: APPLICATIONS
• Medical Diagnosis: Predicting if a patient has a disease based on
symptoms.
• Marketing: Determining if a customer will buy a product.
• Credit Risk Assessment: Evaluating the risk of default for loans.
• Image Segmentation: Identifying objects in images as part of
computer vision tasks.
LOGISTIC REGRESSION: ADVANTAGES
• Simplicity: Easy to implement and understand.
• Efficiency: Computationally inexpensive and performs well on small
to medium-sized datasets.
• Interpretability: Provides insight into the importance of features on
the outcome.
LOGISTIC REGRESSION: DISADVANTAGES
• Linear Assumption: Assumes a linear relationship between features
and outcomes, which may not hold in real-world scenarios.
• Limited Complexity: Not suitable for complex patterns in data.
• Sensitivity to Outliers: Influenced by outliers that skew the model's
predictions.
LINEAR VS LOGISTIC REGRESSION
LINEAR VS LOGISTIC REGRESSION
Unit IV – Outline
• Feature Selection vs. feature extraction techniques,

• Linear Regression;

• Logistic Regression;

• Decision Trees;

• Random Forests;

• Support Vector Machines (SVM);

• Naive Bayes;

• K-Nearest Neighbors (KNN)


INTRODUCTION TO DECISION TREES
• A decision tree is a graphical representation of all the possible
solutions to a decision based on certain conditions.
INTRODUCTION TO DECISION TREES
• Decision Trees are hierarchical tree-like structures used in machine
learning for both classification and regression tasks.
• They operate by partitioning the feature space into smaller and
more manageable regions or segments, leading to a tree-like
structure where each internal node represents a decision based on
a feature attribute, and each leaf node represents a class label (in
classification) or a numerical value (in regression).
DECISION TREE TERMINOLOGIES
PROBLEMS THAT DECISION TREE CAN SOLVE
PROBLEMS THAT DECISION TREE CAN SOLVE
PROBLEMS THAT DECISION TREE CAN SOLVE
CART (CLASSIFICATION & REGRESSION TREES) ALGORITHM
• The algorithm is based on Classification and Regression Trees by
Breiman et al. (1984). A CART tree is a binary decision tree that is
constructed by splitting a node into two child nodes repeatedly,
beginning with the root node that contains the whole learning sample.
• The main elements of CART (and any decision tree algorithm) are:
• Rules for splitting data at a node based on the value of one variable;
• Stopping rules for deciding when a branch is terminal and can be split
no more; and
• Finally, a prediction for the target variable in each terminal node.
EXAMPLE
• Q: Which one among them should you pick first?
• Ans: Determine the attribute that best classifies the training
data.

Q: How do we choose the best attribute?
OR
Q: How does a tree decide where to split?
HOW DOES A TREE DECIDE WHERE TO SPLIT?
BUILD OUR DECISION TREE
(STEP 1: COMPUTE THE ENTROPY FOR THE DATASET)
BUILD OUR DECISION TREE
(STEP 2: WHICH NODE TO SELECT AS ROOT NODE)
BUILD OUR DECISION TREE
(STEP 2: WHICH NODE TO SELECT AS ROOT NODE)
BUILD OUR DECISION TREE
(STEP 2: WHICH NODE TO SELECT AS ROOT NODE)
BUILD OUR DECISION TREE
(STEP 2: WHICH NODE TO SELECT AS ROOT NODE)
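A minimal sketch of the entropy and information-gain calculations these steps walk through; the tiny weather-style dataset below is an assumption for illustration, not the slides' dataset.

# Entropy and information gain, as used to pick the root split.
from collections import Counter
from math import log2

def entropy(labels):
    # H = -sum(p * log2(p)) over the class proportions in the label list.
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, labels):
    # Gain = entropy(parent) - weighted entropy of the children after splitting on attribute.
    total = len(labels)
    parent = entropy(labels)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return parent - remainder

# Assumed "play tennis"-style example data.
rows = [
    {"outlook": "sunny", "windy": False}, {"outlook": "sunny", "windy": True},
    {"outlook": "overcast", "windy": False}, {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["no", "no", "yes", "yes", "no"]

for attr in ("outlook", "windy"):
    print(attr, round(information_gain(rows, attr, labels), 3))
# The attribute with the highest information gain is chosen as the root node.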
THIS IS HOW YOUR COMPLETE TREE WILL LOOK
WHAT IS PRUNING?
• Pruning is the process of removing branches that provide little
predictive power, reducing the size of the decision tree and helping
to prevent overfitting.
DECISION TREES: WORKING (SUMMARY)
• At the root node, the decision tree selects the most significant
feature that best separates the data based on a specific criterion
(e.g., information gain, Gini impurity).
• Each subsequent internal node further partitions the data by asking
questions based on features, aiming to maximize information gain
or minimize impurity.
• The process continues until a stopping condition is met (e.g.,
maximum tree depth, minimum number of samples per leaf).
• The final leaf nodes provide the predicted output or class labels.
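A minimal sketch of this workflow with scikit-learn; the iris dataset, the entropy criterion, and max_depth=3 are assumptions for the example.

# Fit a small decision tree and print its rules.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=42)
tree.fit(X_train, y_train)

print("test accuracy:", tree.score(X_test, y_test))
print(export_text(tree, feature_names=load_iris().feature_names))  # human-readable split rules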
DECISION TREES: APPLICATIONS
• Finance: Credit scoring, fraud detection.
• Healthcare: Diagnosis of diseases based on symptoms.
• Business: Market segmentation, predicting customer behavior.
• Robotics: Path planning, decision-making in autonomous systems.
DECISION TREES: ADVANTAGES
• Interpretability: Decision trees are easy to interpret and visualize,
making them accessible for non-technical stakeholders.
• Handling Non-linearity: Can capture nonlinear relationships in the
data without requiring complex transformations.
• Mixed Data Types: Can handle both numerical and categorical data
without much preprocessing.
DECISION TREES: DISADVANTAGES
• Overfitting: Decision trees can create overly complex models that
memorize noise in the data, leading to poor generalization on
unseen data.
• Instability: Small variations in the data may result in significantly
different trees.
• Biased Toward Dominant Classes: Tends to favor classes with more
instances, which might lead to bias against minority classes.
Unit IV – Outline
• Feature Selection vs. feature extraction techniques,

• Linear Regression;

• Logistic Regression;

• Decision Trees;

• Random Forests;

• Support Vector Machines (SVM);

• Naive Bayes;

• K-Nearest Neighbors (KNN)


INTRODUCTION TO RANDOM FOREST
• Random Forest is an ensemble learning method in machine learning
that builds multiple decision trees and merges their predictions to
make more accurate and robust predictions.
• It operates by creating an ensemble or a collection of decision trees
and combines their outputs through voting (in classification) or
averaging (in regression) to provide the final prediction.
INTRODUCTION TO RANDOM FOREST
RANDOM FOREST: WORKING
• Random Forest builds multiple decision trees using a technique
called bagging (bootstrap aggregating).
• It randomly selects subsets of the training data with replacement
and constructs individual decision trees on these subsets.
• Each tree is built using a random subset of features at each node
split, reducing correlation among trees.
• During prediction, the Random Forest aggregates the predictions of
all trees to reach a final consensus prediction.
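A minimal sketch with scikit-learn's RandomForestClassifier; the dataset, the number of trees, and the max_features setting are assumptions for the example.

# 100 trees, each trained on a bootstrap sample, with a random subset of features per split.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)

print("test accuracy:", forest.score(X_test, y_test))
# Built-in feature importance: how much each feature contributed across all trees.
ranked = sorted(zip(load_breast_cancer().feature_names, forest.feature_importances_),
                key=lambda pair: -pair[1])
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.3f}")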
RANDOM FOREST: APPLICATIONS
• Predictive Modeling: Classification and regression tasks in various
domains, such as finance, healthcare, marketing.
• Anomaly Detection: Identifying unusual patterns or outliers in data.
• Feature Importance: Assessing the importance of different features
in predictive models.
RANDOM FOREST: ADVANTAGES
• Improved Accuracy: Random Forest generally provides higher
accuracy compared to individual decision trees by reducing
overfitting.
• Reduced Overfitting: By averaging or voting among multiple trees,
it mitigates overfitting issues seen in individual trees.
• Feature Importance: Provides a ranking of feature importance
based on how much they contribute to the predictive performance.
RANDOM FOREST: DISADVANTAGES
• Computational Complexity: Constructing multiple trees and
aggregating predictions can be computationally intensive, especially
for large datasets.
• Less Interpretability: Random Forests are less interpretable
compared to single decision trees.
• Resource Consumption: Requires more memory and computational
resources due to maintaining multiple trees.
Unit IV – Outline
• Feature Selection vs. feature extraction techniques,

• Linear Regression;

• Logistic Regression;

• Decision Trees;

• Random Forests;

• Support Vector Machines (SVM);

• Naive Bayes;

• K-Nearest Neighbors (KNN)


INTRODUCTION TO SVM
• Support Vector Machine (SVM) is a supervised machine learning
algorithm used for classification, regression, and outlier detection
tasks.
• It's particularly effective for linearly separable data and can also
handle non-linear data by using appropriate kernel functions.
• SVM aims to find the optimal hyperplane that best separates
different classes in the feature space while maximizing the margin
between the classes.
SVM: WORKING
• An SVM kernel adds more dimensions to a low-dimensional space to
make it easier to segregate the data.
• It converts an inseparable problem into a separable one by adding
more dimensions using the kernel trick.
• In practice, a support vector machine is implemented using a kernel
function.
• The kernel trick helps build a more accurate classifier.
• Different kernels:
• Linear Kernel
• Polynomial Kernel
• Radial Basis Function Kernel
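A minimal sketch comparing these kernels with scikit-learn; the synthetic "two moons" data and the C/gamma values are assumptions for the example.

# Compare linear, polynomial, and RBF kernels on data that is not linearly separable.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel, C=1.0, gamma="scale").fit(X_train, y_train)
    print(kernel, "test accuracy:", round(clf.score(X_test, y_test), 3))
# The non-linear kernels (poly, rbf) typically separate this curved data better than
# the linear kernel, which is the point of the kernel trick.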
SVM: WORKING
SVM: APPLICATIONS
• Image Classification: Recognizing objects in images.
• Text Classification: Spam filtering, sentiment analysis.
• Bioinformatics: Protein classification, disease detection.
• Finance: Credit scoring, stock market prediction.
SVM: ADVANTAGES
• Effective in High-Dimensional Spaces: Works well even in
high-dimensional spaces and with datasets where the number of
features is greater than the number of samples.
• Versatility: SVM can handle both linear and non-linear data by using
appropriate kernel functions.
• Regularization: Incorporates regularization parameters to control
overfitting.
SVM: DISADVANTAGES
• Sensitivity to Parameter Tuning: Performance depends on choosing
appropriate parameters and the kernel function.
• Computationally Intensive: Training can be time-consuming for
large datasets.
• Not Ideal for Large Datasets: SVM may not scale well to extremely
large datasets due to its computational complexity.
Unit IV – Outline
• Feature Selection vs. feature extraction techniques,

• Linear Regression;

• Logistic Regression;

• Decision Trees;

• Random Forests;

• Support Vector Machines (SVM);

• Naive Bayes;

• K-Nearest Neighbors (KNN)


INTRODUCTION TO NAÏVE BAYES
• Naive Bayes is one of the simplest and most powerful classification
algorithms, based on Bayes’ Theorem with an assumption of
independence among predictors.
• The Naive Bayes model is easy to build and particularly useful for very
large data sets.
• There are two parts to this algorithm:
• Naïve
• Bayes
NAÏVE BAYES
• The Naïve Bayes classifier assumes that the presence of a feature in
a class is unrelated to any other feature.
• Even if these features depend on each other or upon the existence
of the other features, all of these properties independently
contribute to the probability that a particular fruit is an apple or an
orange or a banana and that is why it is known as “Naive”.
NAÏVE BAYES
• Bayes’ theorem describes the
probability of an event, based
on prior knowledge of
conditions that might be
related to the event.
BAYES THEOREM: EXAMPLE
• Problem: What is the probability that a card picked at random is a
King, given that it is a face card?
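Working this through Bayes’ theorem: P(King | Face) = P(Face | King) · P(King) / P(Face).
Every King is a face card, so P(Face | King) = 1; P(King) = 4/52, and P(Face) = 12/52
(Jack, Queen, and King in each of the four suits). Hence
P(King | Face) = (1 × 4/52) / (12/52) = 4/12 = 1/3.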
BAYES THEOREM: PROOF
NAÏVE BAYES: WORKING
NAÏVE BAYES: WORKING
NAÏVE BAYES: WORKING
NAÏVE BAYES: WORKING
NAÏVE BAYES: WORKING
NAÏVE BAYES: WORKING IN INDUSTRY
NAÏVE BAYES: TYPES
• Gaussian
• Multinomial
• Bernoulli
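A minimal sketch of the three variants with scikit-learn; the toy feature matrices below are assumptions for the example.

# Gaussian NB: continuous features assumed normally distributed within each class.
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])

X_cont = np.array([[180, 80], [175, 75], [160, 55], [155, 50]], dtype=float)
print(GaussianNB().fit(X_cont, y).predict([[170, 70]]))

# Multinomial NB: count features (e.g., word counts in a document).
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 2], [0, 3, 1]])
print(MultinomialNB().fit(X_counts, y).predict([[1, 0, 0]]))

# Bernoulli NB: binary features (e.g., word present / absent).
X_bin = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]])
print(BernoulliNB().fit(X_bin, y).predict([[1, 0, 1]]))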
Unit IV – Outline
• Feature Selection vs. feature extraction techniques,

• Linear Regression;

• Logistic Regression;

• Decision Trees;

• Random Forests;

• Support Vector Machines (SVM);

• Naive Bayes;

• K-Nearest Neighbors (KNN)


INTRODUCTION TO KNN
• K Nearest Neighbor is a simple algorithm that stores all the available
cases and classifies the new data or case based on a similarity measure.
Q: What does ‘k’ in KNN Algorithm represent?
Ans: k in KNN algorithm represents the number of nearest neighbor points
which are voting for the new test data’s class.

Note:
• If k=1, then test examples are given the same label as the closest example
in the training set.
• If k=3, the labels of the three closest training examples are checked and
the most common label (i.e., one occurring at least twice) is assigned,
and so on for larger values of k.
INTRODUCTION TO KNN
INTRODUCTION TO KNN: INDUCTIVE ASSUMPTION
• Similar inputs map to similar outputs.
• If this is not true, learning is impossible.
• If it is true, learning reduces to defining “similar”.
• Not all similarity measures are created equal:
• Predicting a person’s weight may depend on different attributes than
predicting their IQ.
BASIC KNN CLASSIFICATION
• Training method
• Save the training examples
• At prediction time
• Find the k training examples (x1,y1),…(xk, yk) that are closest to the test
example x
• Predict the most frequent class among those y’s
WHAT IS THE DECISION BOUNDARY?
Voronoi Diagram
BASIC KNN CLASSIFICATION
• Training method
• Save the training examples
• At prediction time
• Find the k training examples (x1,y1),…(xk, yk) that are closest to the test
example x
• Predict the most frequent class among those y’s

Improvements:
• Weighting examples from the neighbourhood
• Measuring “closeness”
• Finding “close” examples in a large training set quickly.
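A minimal sketch with scikit-learn's KNeighborsClassifier; the iris dataset, k=3, and the distance-weighted voting are assumptions for the example.

# "Training" just stores the examples (KNN is a lazy learner); prediction finds the
# k closest stored examples and takes a (weighted) vote among their labels.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Scaling matters: "closeness" is measured by distance, so features should share a scale.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# weights="distance" implements the "weighting examples from the neighbourhood" improvement.
knn = KNeighborsClassifier(n_neighbors=3, weights="distance")
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))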
KNN
KNN
KNN
APPLICATION OF KNN IN INDUSTRY
HOW TO CHOOSE THE VALUE OF K IN KNN ALGORITHM
KNN IS A LAZY LEARNER
KNN: EXAMPLE