PARUL INSTITUTE OF ENGINEERING & TECHNOLOGY
FACULTY OF ENGINEERING & TECHNOLOGY
PARUL UNIVERSITY
Subject: Pattern Recognition
Unit 2 : Concept of feature extraction and
dimensionality
Computer Science & Engineering
Ishwarlal Rathod (Assistant Prof. PIET-CSE)
Outline
• Curse of dimensionality
• Dimension reduction methods - Fisher discriminant
analysis
• Dimension reduction methods - Principal component
analysis
• Hidden Markov Models (HMM) basic concepts
• Gaussian mixture models
Curse of dimensionality:
What is Feature extraction?
• Feature extraction is a family of dimensionality
reduction techniques where a new set of features is
built from the original feature set. In order to reduce
dimensionality, the number of the new features is
lower than the number of the original ones.
Curse of dimensionality:
• Curse of Dimensionality refers to a set of problems that arise
when working with high-dimensional data.
• The dimension of a dataset corresponds to the number of
attributes/features that exist in a dataset.
• A dataset with a large number of attributes, generally of the
order of a hundred or more, is referred to as high dimensional
data.
Curse of dimensionality:
• Some of the difficulties that come with high dimensional data
manifest during analyzing or visualizing the data to identify
patterns, and some manifest while training machine learning
models.
• The difficulties related to training machine learning models due
to high dimensional data are referred to as the ‘Curse of
Dimensionality’.
Curse of dimensionality:
• Illustration of the curse of dimensionality, showing how the
number of regions of a regular grid grows exponentially with the
dimensionality D of the space. For clarity, only a subset of the
cubical regions are shown for D =3.
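The exponential blow-up described above can be made concrete with a short sketch (the choice of 3 bins per axis is an illustrative assumption, matching the D = 3 grid in the figure):

```python
# Number of cells in a regular grid with `bins` divisions per axis
# grows as bins ** D, i.e. exponentially in the dimensionality D.
def grid_regions(bins, D):
    return bins ** D

for D in (1, 2, 3, 10):
    print(f"D={D}: {grid_regions(3, D)} regions")
# D=3 gives 27 regions; by D=10 there are already 59049.
```

To keep even one sample per region, the amount of data needed grows at the same exponential rate, which is exactly the difficulty the "curse" refers to.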
Domains of curse of dimensionality:
• Anomaly Detection
• Combinatorics
• Machine Learning
Dimension reduction methods:
• The number of input features, variables, or columns
present in a given dataset is known as
dimensionality, and the process to reduce these
features is called dimensionality reduction.
• "It is a way of converting a higher-dimensional dataset into a lower-dimensional dataset while ensuring that it provides similar information."
Dimension reduction methods:
• It is commonly used in the fields that deal with
high-dimensional data, such as speech recognition,
signal processing, bioinformatics, etc. It can also be
used for data visualization, noise reduction, cluster
analysis, etc.
Dimension reduction methods:
Benefits of applying Dimensionality Reduction:
1. By reducing the dimensions of the features, the space required to
store the dataset also gets reduced.
2. Less computation and training time is required with reduced dimensions of features.
3. Reduced dimensions of features of the dataset help in visualizing
the data quickly.
4. It removes the redundant features (if present) by taking care of
multicollinearity.
Disadvantages of dimensionality Reduction:
1. Some data may be lost due to dimensionality reduction.
2. In the PCA dimensionality reduction technique, the number of principal components to retain is sometimes unknown.
Fisher discriminant analysis:
• Fisher's linear discriminant can be used as a supervised learning
classifier. Given labeled data, the classifier can find a set of
weights to draw a decision boundary, classifying the data.
• Fisher's linear discriminant attempts to find the vector that
maximizes the separation between classes of the projected data.
• Fisher's linear discriminant is a classification method that projects
high-dimensional data onto a line and performs classification in
this one-dimensional space.
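A minimal two-class sketch of this idea in NumPy (the synthetic data below is an illustrative assumption, not from the slides): the projection vector that maximizes class separation is proportional to S_W⁻¹(m₁ − m₂), where S_W is the within-class scatter matrix and m₁, m₂ are the class means.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic 2-D classes (illustrative data)
X1 = rng.normal([0.0, 0.0], 0.5, size=(50, 2))
X2 = rng.normal([2.0, 1.0], 0.5, size=(50, 2))

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
# Within-class scatter matrix S_W
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
# Fisher direction: w proportional to S_W^{-1} (m1 - m2)
w = np.linalg.solve(Sw, m1 - m2)
w /= np.linalg.norm(w)

# Project onto the 1-D line and classify with a midpoint threshold
z1, z2 = X1 @ w, X2 @ w
threshold = (z1.mean() + z2.mean()) / 2
```

After projection the problem is one-dimensional, so classification reduces to comparing each projected point against a single threshold.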
Principal component analysis (PCA):
• Principal component analysis (PCA) is a popular technique for
analyzing large datasets containing a high number of
dimensions/features per observation, increasing the
interpretability of data while preserving the maximum amount of
information, and enabling the visualization of multidimensional
data.
• PCA is a dimensionality reduction technique that identifies important
relationships in our data, transforms the existing data based on
these relationships, and then quantifies the importance of these
relationships so we can keep the most important relationships and
drop the others.
Principal component analysis (PCA):
There are four steps for the PCA:
1. We identify the relationship among features through
a Covariance Matrix.
2. Through eigendecomposition of the covariance matrix, we get
eigenvectors and eigenvalues.
3. Then we transform our data using Eigenvectors into principal
components.
4. Lastly, we quantify the importance of these relationships using
Eigenvalues and keep the important principal components.
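The four steps above can be sketched directly in NumPy (the synthetic 3-D data and the choice to keep two components are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic 3-D data with most variance along the first two axes
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.1])
Xc = X - X.mean(axis=0)              # centre the data first

# Step 1: covariance matrix of the features
C = np.cov(Xc, rowvar=False)

# Step 2: eigendecomposition (eigh, since C is symmetric)
eigvals, eigvecs = np.linalg.eigh(C)

# Step 3: sort by eigenvalue and project the data onto the eigenvectors
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = Xc @ eigvecs                # principal components

# Step 4: quantify importance via eigenvalues and keep the top k
explained = eigvals / eigvals.sum()
k = 2
X_reduced = scores[:, :k]
```

Here `explained` is the fraction of total variance carried by each component; dropping the last component discards only the least informative direction.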
Hidden Markov Models (HMM):
• Hidden Markov Models (HMMs) are among the most popular algorithms
for pattern recognition. A Hidden Markov Model is a mathematical
representation of a stochastic process that produces a series of
observations based on previously stored data.
• The developed algorithms in the HMM-based statistical
framework are robust and effective in real-time scenarios.
• Hidden Markov Models are frequently used in real-world
applications to implement gesture recognition and comprehension
systems.
• In an ordinary Markov chain, every state corresponds to exactly one
observable symbol.
Hidden Markov Models (HMM):
• In contrast, every state in the topology of a Hidden Markov Model can
emit one of several symbols, for example one arising from a particular
gesture.
• The observation probability distribution is stored as a matrix
containing the likelihood of observing each symbol in each state. For
example, the probability that the first state emits a given symbol is
that symbol's observation probability in the first state.
• In the recognition task, the observation probability distribution is
also called the emission distribution. HMM states are referred to as
hidden for the following reasons.
• First, choosing which symbol to emit is a second, separate process.
Hidden Markov Models (HMM):
• Second, an HMM's emitter releases only the chosen observed symbol.
• Finally, since the current states are derived from the previous
states, the emitting states are unknown.
Hidden Markov Models (HMM):
• Due to their stochastic character, HMMs are flexible and widely used
in the field of pattern recognition.
• The three classical HMM problems:
1. Evaluation problem (solved by the forward algorithm)
2. Decoding problem (solved by the Viterbi algorithm)
3. Estimation problem (solved by the Baum–Welch algorithm)
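The evaluation problem asks for the probability of an observation sequence given the model. A minimal sketch of the forward algorithm (the two-state model parameters below are hypothetical, chosen only for illustration):

```python
import numpy as np

# Toy 2-state, 2-symbol HMM (hypothetical parameters)
A  = np.array([[0.7, 0.3],   # state-transition probabilities
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],   # observation (emission) probabilities
               [0.2, 0.8]])
pi = np.array([0.5, 0.5])    # initial state distribution

def forward(obs):
    """Evaluation problem: P(observation sequence | model)."""
    alpha = pi * B[:, obs[0]]          # initialise with first symbol
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate and emit
    return alpha.sum()

p = forward([0, 1, 0])   # probability of the sequence 0, 1, 0
```

The forward recursion sums over all hidden state paths in O(N²T) time instead of enumerating the exponentially many paths explicitly.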
Gaussian mixture models:
• A Gaussian mixture model is a type of clustering algorithm that
assumes the data points are generated from a mixture of Gaussian
distributions with unknown parameters.
• In machine learning and data analysis, it is often necessary to
identify patterns and clusters within large sets of data. However,
traditional clustering algorithms such as k-means clustering have
limitations when it comes to identifying clusters with different
shapes and sizes.
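A minimal 1-D sketch of fitting a two-component mixture with the EM algorithm (the synthetic data, initial guesses, and iteration count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
# 1-D data drawn from two Gaussians (illustrative)
x = np.concatenate([rng.normal(-2.0, 0.5, 300),
                    rng.normal(3.0, 1.0, 300)])

# Initial guesses for weights, means, variances (hypothetical)
w   = np.array([0.5, 0.5])
mu  = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

for _ in range(50):                  # EM iterations
    # E-step: responsibility of each component for each point
    dens = (w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
              / np.sqrt(2 * np.pi * var))
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibilities
    n = r.sum(axis=0)
    w = n / len(x)
    mu = (r * x[:, None]).sum(axis=0) / n
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n
```

Unlike k-means, each point gets a soft assignment (`r`), and each component keeps its own variance, so clusters of different sizes and spreads can be modelled.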
Gaussian mixture models:
• Gaussian mixture models (GMMs) are a type of machine learning
algorithm used to classify data into different categories based on
probability distributions.
Advantages of Gaussian Mixture Models:
• Flexibility
• Robustness
• Speed
• Handling Missing Data
• Interpretability
Disadvantages of Gaussian Mixture Models:
• Sensitivity To Initialization
• Assumption Of Normality
• Number Of Components
• High-dimensional data
• Limited expressive power
Real-Life Examples of Gaussian mixture models:
• Clustering
• Anomaly detection
• Speech recognition
• Computer vision
• Finding patterns in medical datasets
• Modeling natural phenomena
• Customer behaviour analysis
• Stock price prediction
• Gene expression data analysis
Thank You!!!