
CP4252 - MACHINE LEARNING

1. Dimensionality Reduction:
Machine Learning:
Machine learning is a field of study that allows computers to "learn" from data, much as
humans do, without being explicitly programmed.
What is Predictive Modeling?
Predictive modeling is a probabilistic process that allows us to forecast outcomes on the
basis of a set of predictors. These predictors are the features that determine the final
result, i.e. the output of the model.
Dimensionality reduction is the process of reducing the number of features (or dimensions)
in a dataset while retaining as much information as possible. This can be done for a variety
of reasons, such as to reduce the complexity of a model, to improve the performance of a
learning algorithm, or to make it easier to visualize the data. There are several techniques
for dimensionality reduction, including principal component analysis (PCA), singular value
decomposition (SVD), and linear discriminant analysis (LDA). Each technique uses a
different method to project the data onto a lower-dimensional space while preserving
important information.
What is Dimensionality Reduction?
Dimensionality reduction is a technique used to reduce the number of features in a dataset
while retaining as much of the important information as possible. In other words, it is a
process of transforming high-dimensional data into a lower-dimensional space that still
preserves the essence of the original data.
In machine learning, high-dimensional data refers to data with a large number of features or
variables. The curse of dimensionality is a common problem in machine learning, where the
performance of the model deteriorates as the number of features increases. This is because
the complexity of the model increases with the number of features, and it becomes more
difficult to find a good solution. In addition, high-dimensional data can also lead to
overfitting, where the model fits the training data too closely and does not generalize well
to new data.
Dimensionality reduction can help to mitigate these problems by reducing the complexity
of the model and improving its generalization performance. There are two main approaches
to dimensionality reduction: feature selection and feature extraction.
Feature Selection:
Feature selection involves selecting a subset of the original features that are most relevant
to the problem at hand. The goal is to reduce the dimensionality of the dataset while
retaining the most important features. There are several methods for feature selection,
including filter methods, wrapper methods, and embedded methods. Filter methods rank the
features based on their relevance to the target variable, wrapper methods use the model
performance as the criteria for selecting features, and embedded methods combine feature
selection with the model training process.
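As a rough illustration, the filter approach could be sketched with scikit-learn's SelectKBest; the dataset and the choice of k below are illustrative assumptions, not part of the original text:

# A minimal sketch of a filter method, assuming scikit-learn is available;
# the dataset and k=10 are illustrative choices.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)           # 30 original features
selector = SelectKBest(score_func=f_classif, k=10)   # rank features by their ANOVA F-score
X_reduced = selector.fit_transform(X, y)             # keep the 10 highest-ranked features
print(X.shape, "->", X_reduced.shape)                 # (569, 30) -> (569, 10)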
Feature Extraction:
Feature extraction involves creating new features by combining or transforming the
original features. The goal is to create a set of features that captures the essence of the
original data in a lower-dimensional space. There are several methods for feature
extraction, including principal component analysis (PCA), linear discriminant analysis
(LDA), and t-distributed stochastic neighbor embedding (t-SNE). PCA is a popular
technique that projects the original features onto a lower-dimensional space while
preserving as much of the variance as possible.
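A minimal sketch of feature extraction with PCA, assuming scikit-learn; the Iris data and the choice of two components are illustrative, not from the text:

# Reduce 4 correlated features to 2 extracted features.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)              # each new feature is a linear combination of the originals
print(X.shape, "->", X_2d.shape)         # (150, 4) -> (150, 2)
print(pca.explained_variance_ratio_)     # fraction of variance preserved by each component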
Why is Dimensionality Reduction important in Machine Learning and Predictive
Modeling?
An intuitive example of dimensionality reduction can be discussed through a simple e-
mail classification problem, where we need to classify whether the e-mail is spam or not.
This can involve a large number of features, such as whether or not the e-mail has a generic
title, the content of the e-mail, whether the e-mail uses a template, etc. However, some of
these features may overlap. In another condition, a classification problem that relies on both
humidity and rainfall can be collapsed into just one underlying feature, since both of the
aforementioned are correlated to a high degree. Hence, we can reduce the number of
features in such problems. A 3-D classification problem can be hard to visualize, whereas a
2-D one can be mapped to a simple two-dimensional plane, and a 1-D problem to a simple
line. As an illustration, a 3-D feature space can be split into two 2-D feature spaces, and,
if the features in one of them turn out to be correlated, the number of features can be
reduced even further.
There are two components of dimensionality reduction:
 Feature selection: In this, we try to find a subset of the original set of variables,
or features, to obtain a smaller subset that can be used to model the problem. It
is usually done in one of three ways:
1. Filter
2. Wrapper
3. Embedded
 Feature extraction: This transforms the data from a high-dimensional space into a
lower-dimensional space, i.e. a space with a smaller number of dimensions.
Methods of Dimensionality Reduction
The various methods used for dimensionality reduction include:
 Principal Component Analysis (PCA)
 Linear Discriminant Analysis (LDA)
 Generalized Discriminant Analysis (GDA)
Dimensionality reduction may be either linear or non-linear, depending upon the method
used. The principal linear method, Principal Component Analysis (PCA), is discussed
below.
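As a quick aside before turning to PCA, LDA from the list above can be sketched with scikit-learn; the dataset and the number of components are illustrative choices, not from the text:

# A minimal sketch of LDA, a supervised linear method.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(n_components=2)   # at most (number of classes - 1) components
X_lda = lda.fit_transform(X, y)                    # unlike PCA, LDA uses the class labels y
print(X.shape, "->", X_lda.shape)                  # (150, 4) -> (150, 2)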
Principal Component Analysis
This method was introduced by Karl Pearson. It works on the condition that when data in a
higher-dimensional space is mapped to a lower-dimensional space, the variance of the data in
the lower-dimensional space should be maximal.

It involves the following steps:
 Construct the covariance matrix of the data.
 Compute the eigenvectors of this matrix.
 Eigenvectors corresponding to the largest eigenvalues are used to reconstruct a
large fraction of variance of the original data.
Hence, we are left with a smaller number of eigenvectors, and some information may have been
lost in the process. However, the most important variance should be retained by the remaining
eigenvectors.
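A from-scratch sketch of these steps using NumPy; the random data and the choice of k = 2 components are illustrative assumptions, not part of the original text:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                 # 200 samples, 5 features (placeholder data)
k = 2                                         # number of components to keep

X_centered = X - X.mean(axis=0)               # PCA assumes mean-centred features
cov = np.cov(X_centered, rowvar=False)        # step 1: covariance matrix of the data
eigvals, eigvecs = np.linalg.eigh(cov)        # step 2: eigenvalues/eigenvectors (symmetric matrix)

order = np.argsort(eigvals)[::-1][:k]         # step 3: keep eigenvectors with the largest eigenvalues
components = eigvecs[:, order]                # shape (5, k)
X_reduced = X_centered @ components           # project the data onto the top-k components

print("fraction of variance retained:", eigvals[order].sum() / eigvals.sum())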
Advantages of Dimensionality Reduction
 It helps in data compression, and hence reduces the required storage space.
 It reduces computation time.
 It also helps remove redundant features, if any.
 Improved Visualization: High dimensional data is difficult to visualize, and
dimensionality reduction techniques can help in visualizing the data in 2D or 3D,
which can help in better understanding and analysis.
 Overfitting Prevention: High dimensional data may lead to overfitting in machine
learning models, which can lead to poor generalization performance.
Dimensionality reduction can help in reducing the complexity of the data, and
hence prevent overfitting.
 Feature Extraction: Dimensionality reduction can help in extracting important
features from high dimensional data, which can be useful in feature selection for
machine learning models.
 Data Preprocessing: Dimensionality reduction can be used as a preprocessing step
before applying machine learning algorithms to reduce the dimensionality of the
data and hence improve the performance of the model.
 Improved Performance: Dimensionality reduction can help in improving the
performance of machine learning models by reducing the complexity of the data,
and hence reducing the noise and irrelevant information in the data.
Disadvantages of Dimensionality Reduction
 It may lead to some amount of data loss.
 PCA tends to find linear correlations between variables, which is sometimes
undesirable.
 PCA fails in cases where mean and covariance are not enough to define datasets.
 We may not know how many principal components to keep; in practice, some rules
of thumb are applied.
 Interpretability: The reduced dimensions may not be easily interpretable, and it
may be difficult to understand the relationship between the original features and
the reduced dimensions.
 Overfitting: In some cases, dimensionality reduction may lead to overfitting,
especially when the number of components is chosen based on the training data.
 Sensitivity to outliers: Some dimensionality reduction techniques are sensitive to
outliers, which can result in a biased representation of the data.
 Computational complexity: Some dimensionality reduction techniques, such as
manifold learning, can be computationally intensive, especially when dealing with
large datasets.
2. Lasso Regression Technique:
Lasso regression is a regularization technique. It is used on top of regression methods for a
more accurate prediction. This model uses shrinkage, where data values are shrunk towards a
central point, such as the mean. The lasso procedure encourages simple, sparse models
(i.e. models with fewer parameters). This type of regression is well suited to models showing
high levels of multicollinearity, or when you want to automate certain parts of model
selection, such as variable selection or parameter elimination. Lasso regression uses the L1
regularization technique (discussed below). It is useful when we have many features because it
automatically performs feature selection.
There are two main regularization techniques, namely Ridge regression and Lasso regression.
They differ in the way they assign a penalty to the coefficients. In this section, we will
try to understand more about the Lasso regularization technique.

L1 Regularization: If a regression model uses the L1 regularization technique, it is called
Lasso regression. If it uses the L2 regularization technique, it is called Ridge regression.
These are discussed in more detail below.

L1 regularization adds a penalty that is equal to the absolute value of the magnitude of
the coefficient. This regularization type can result in sparse models with few coefficients.
Some coefficients might become zero and get eliminated from the model. Larger penalties
result in coefficient values that are closer to zero (ideal for producing simpler models). On the
other hand, L2 regularization shrinks coefficients towards zero but does not eliminate any of
them, so it does not produce sparse models. Thus, Lasso regression is often easier to
interpret than Ridge regression.
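A minimal sketch contrasting the two penalties with scikit-learn; the synthetic data and the alpha values (scikit-learn's name for the penalty strength λ) are illustrative assumptions:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)    # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)    # L2 penalty

print("Lasso coefficients exactly zero:", int(np.sum(lasso.coef_ == 0)))   # usually several
print("Ridge coefficients exactly zero:", int(np.sum(ridge.coef_ == 0)))   # typically none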

Here’s a step-by-step explanation of how LASSO regression works:

1. Linear regression model: LASSO regression starts with the standard linear regression
model, which assumes a linear relationship between the independent variables (features) and
the dependent variable (target). The linear regression equation can be represented as follows:

y = β₀ + β₁x₁ + β₂x₂ + ... + βₚxₚ + ε

2. L1 regularization: LASSO regression introduces an additional penalty term based on the
absolute values of the coefficients. The L1 regularization term is the sum of the absolute
values of the coefficients multiplied by a tuning parameter λ:

L₁ = λ * (|β₁| + |β₂| + ... + |βₚ|)
3. Objective function: The objective of LASSO regression is to find the values of the
coefficients that minimize the sum of the squared differences between the predicted values
and the actual values (the residual sum of squares, RSS), while also minimizing the L1
regularization term:

Minimize: RSS + L₁
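A minimal sketch of this objective as a function; the names X, y, beta0, beta and lam below are placeholders, not from the original text:

import numpy as np

def lasso_objective(X, y, beta0, beta, lam):
    """Return RSS + lam * sum(|beta_j|) for the model y ≈ beta0 + X @ beta."""
    residuals = y - (beta0 + X @ beta)
    rss = np.sum(residuals ** 2)         # residual sum of squares
    l1 = lam * np.sum(np.abs(beta))      # the intercept is conventionally not penalised
    return rss + l1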

4. Model fitting: To estimate the coefficients in LASSO regression, an optimization
algorithm is used to minimize the objective function. Coordinate descent is commonly
employed, which iteratively updates each coefficient while holding the others fixed.
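A from-scratch sketch of one such coordinate-descent update, under the common convention of a 1/2 factor on the RSS (which only rescales λ); the function names and defaults below are illustrative assumptions:

import numpy as np

def soft_threshold(rho, lam):
    # soft-thresholding operator used by the LASSO coordinate update
    return np.sign(rho) * np.maximum(np.abs(rho) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iters=100):
    # minimise (1/2)*||y - X beta||^2 + lam * sum(|beta_j|), no intercept for simplicity
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iters):
        for j in range(p):
            # partial residual with feature j's current contribution removed
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho_j = X[:, j] @ r_j
            beta[j] = soft_threshold(rho_j, lam) / (X[:, j] @ X[:, j])
    return beta

In practice, library implementations such as scikit-learn's Lasso estimator perform this kind of coordinate descent with convergence checks and other refinements, and are normally used instead of hand-rolled code.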
LASSO regression offers a powerful framework for both prediction and feature selection,
especially when dealing with high-dimensional datasets where the number of features is
large. By striking a balance between simplicity and accuracy, LASSO can provide
interpretable models while effectively managing the risk of overfitting.
