Unit-II: Feature Engineering
● Feature engineering is the pre-processing step of machine learning that transforms raw data into features which can be used to build a predictive model with machine learning or statistical modelling.
● Feature engineering in machine learning aims to improve the performance of models.
https://www.javatpoint.com/feature-engineering-for-machine-learning
What is a Feature?
● A feature is an individual measurable property or attribute of the data (for example age, income, or pixel intensity) that is used as an input to a model.
Feature Engineering Process:
The feature engineering process consists of four steps:
● Feature Creation
● Transformations
● Feature Extraction
● Feature Selection

Feature Creation
● Feature creation is finding the most useful variables to be used in a predictive model.
Unit-II: Feature Engineering
Part-1: Preprocessing of Data
Concepts of feature normalization, scaling, standardization, and managing missing values
https://www.youtube.com/watch?v=AOfzlVi-NJs
Scaling:
● The numerical features of a dataset do not share a common range; each feature varies on its own scale.
● We cannot expect the age and income columns, for example, to have the same range.
● From a machine learning point of view, how can these two columns be compared?
● After a scaling process, the continuous features become comparable in terms of their range.
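A minimal sketch of scaling an age column and an income column onto a comparable range with scikit-learn (the values below are invented for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical data: each row is a person, columns are [age, income].
X = np.array([[25, 30000],
              [40, 52000],
              [58, 41000],
              [33, 78000]], dtype=float)

# Min-max scaling maps every column to the [0, 1] range.
X_minmax = MinMaxScaler().fit_transform(X)

# Standardization (z-score) gives every column mean 0 and standard deviation 1.
X_std = StandardScaler().fit_transform(X)

print(X_minmax)
print(X_std)
```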
● Standardisation is more robust to outliers and, in many cases, it is preferable to min-max normalization.
The standard deviation formula may look confusing, but it will make sense after we break it down.
Step 1: Find the mean.
Step 2: For each data point, find the square of its distance to the mean.
Step 3: Sum the values from Step 2.
Step 4: Divide by the number of data points.
Step 5: Take the square root.
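A minimal sketch of those five steps in plain Python, using the population form of the standard deviation (dividing by the number of data points, as the steps above describe). The vector is the one used in the exercise that follows:

```python
import math

def std_dev(values):
    # Step 1: find the mean.
    mean = sum(values) / len(values)
    # Step 2: for each data point, find the square of its distance to the mean.
    squared_distances = [(v - mean) ** 2 for v in values]
    # Step 3: sum the values from Step 2.
    total = sum(squared_distances)
    # Step 4: divide by the number of data points.
    variance = total / len(values)
    # Step 5: take the square root.
    return math.sqrt(variance)

print(std_dev([23, 29, 52, 31, 45, 19, 18, 27]))  # mean 30.5, std ≈ 11.36
```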
Consider a vector x = (23, 29, 52, 31, 45, 19, 18, 27). Apply feature scaling and find the min-max scaled values as well as the z-score values.
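A sketch of this exercise in NumPy (min-max scaling maps the values to [0, 1]; the z-scores here use the population standard deviation):

```python
import numpy as np

x = np.array([23, 29, 52, 31, 45, 19, 18, 27], dtype=float)

# Min-max scaling: (x - min) / (max - min), so 18 -> 0.0 and 52 -> 1.0.
x_minmax = (x - x.min()) / (x.max() - x.min())

# Z-score standardization: (x - mean) / std, with mean = 30.5 and std = sqrt(129) ≈ 11.36.
x_z = (x - x.mean()) / x.std()

print(np.round(x_minmax, 3))
print(np.round(x_z, 3))
```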
● A z-score tells us how many standard deviations away a value is from the mean.
● If a value has a z-score equal to 0, then the value is equal to the mean.
● If a value has a z-score equal to -1.3, then the value is 1.3 standard deviations below the mean.
● If a value has a z-score equal to 2.2, then the value is 2.2 standard deviations above the mean.
Managing missing values:
Even when the data is in the correct format, some values may be missing. Suppose we have data about students and one of the features is the landline contact number. For some records the value of this feature is missing, simply because some students do not have a landline at home.
Missing data can also arise from the way the data was collected. For example, if student data was initially collected for only one region, the pincode may not have been recorded. Later, when the dataset is expanded to cover all regions, the pincode, which is now a necessary feature, is missing for the earlier records.
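A minimal sketch of common strategies for managing missing values with pandas (the student records below are invented for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "John"],
    "landline": ["020-1234", np.nan, np.nan, "022-5678"],
    "marks": [78, 85, np.nan, 90],
})

# Option 1: drop rows (or columns) that contain missing values.
dropped = df.dropna()

# Option 2: fill missing numeric values with a statistic such as the mean.
df["marks"] = df["marks"].fillna(df["marks"].mean())

# Option 3: fill missing categorical values with a placeholder.
df["landline"] = df["landline"].fillna("not available")

print(df)
```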
Introduction to Dimensionality Reduction
● Dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables.
● In machine learning classification problems, there are often too many factors on the basis of which the final classification is done.
● These factors are basically variables called features.
● The higher the number of features, the harder it gets to visualize the training set and then work on it.
● Sometimes, most of these features are correlated and hence redundant. This is where dimensionality reduction algorithms come into play.
https://www.geeksforgeeks.org/dimensionality-reduction/
Why Feature Selection?
● Irrelevant or redundant features increase training time, make models harder to interpret, and can lead to overfitting; selecting a smaller subset of informative features mitigates these problems.
Feature selection: In this, we try to find a subset of the original set of variables, or features, to get a smaller subset which can be used to model the problem. It usually involves three approaches:
○ Filter
○ Wrapper
○ Embedded
Introduction to Dimensionality Reduction: Feature Selection
Filter Method:
● Features are scored with statistical measures (such as correlation with the target, chi-square, or information gain), independently of any learning algorithm, and the top-scoring features are kept.
Wrapper Method:
● Subsets of features are evaluated by actually training a model on them (for example forward selection, backward elimination, or recursive feature elimination) and the best-performing subset is kept.
Embedded Method:
● Feature selection is performed as part of model training itself, as with Lasso (L1) regularization or the feature importances of tree-based models.
Filter method vs. wrapper method:
● Filter: measures the relevance of features with the dependent variable. Wrapper: measures the usefulness of a subset of features.
● Filter: fast and computationally less expensive. Wrapper: slow and computationally more expensive.
● Filter: might fail to find the best subset of features. Wrapper: can always provide the best subset of features.
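A minimal sketch contrasting a filter method (univariate statistical scores) with a wrapper method (recursive feature elimination) in scikit-learn; the iris dataset is just a stand-in example:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Filter method: score each feature against the target (ANOVA F-test) and keep the top 2.
filter_selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print("filter keeps features:", filter_selector.get_support(indices=True))

# Wrapper method: repeatedly fit a model and drop the weakest feature until 2 remain.
wrapper_selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)
print("wrapper keeps features:", wrapper_selector.get_support(indices=True))
```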
Feature Extraction: Find a smaller set of new variables, each a combination of the input variables, containing essentially the same information as the input variables.
The various methods used for dimensionality reduction include:
● Principal Component Analysis (PCA)
● Kernel PCA
● Linear Discriminant Analysis (LDA)
https://www.analyticsvidhya.com/blog/2021/04/guide-for-feature-extraction-techniques/
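A minimal sketch of feature extraction with PCA in scikit-learn, projecting the 4 iris features down to 2 new variables (the dataset and the choice of 2 components are illustration choices):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Each principal component is a linear combination of the original 4 features.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (150, 2)
print(pca.explained_variance_ratio_)  # share of variance kept by each component
```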
Feature Engineering Process: Transformations
● The transformation step of feature engineering involves adjusting the predictor variables to improve the accuracy and performance of the model. For example, it ensures that the model is flexible enough to take input from a variety of data, and it ensures that all the variables are on the same scale, making the model easier to understand.
Overfitting and Underfitting
[Figure: two plots of Y against X contrasting an overfitted model with an underfitted model.]
[Example: a "ball" is learned from the features sphere, can be played with, cannot be eaten, radius 5 cm; given a new sphere of radius 10 cm, should the model still call it a ball?]
Principal Component Analysis
[Figure: data points projected onto the principal component axes PC1 and PC2.]
Covariance Matrix
● PCA first computes the covariance matrix of the (mean-centred) data; the eigenvectors of this matrix give the directions of the principal components, and the eigenvalues give the variance captured along each direction.
Eigenvalues and Eigenvectors
● For a square matrix A, a non-zero vector x is called an eigenvector if multiplication by A results in a scalar multiple of x:
A*x = λ*x
● The eigenvalues λ are the solutions of the characteristic equation |A - λ*I| = 0, which is a polynomial in λ of degree n.
● By setting this polynomial equal to zero and solving for λ, the desired eigenvalues are obtained.
● Here n solutions are generated; it means that n eigenvalues are derived.
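A minimal sketch of this eigen-decomposition in NumPy, applied to the covariance matrix of some made-up 2-D data (the data itself is only for illustration):

```python
import numpy as np

# Made-up 2-D data: rows are samples, columns are features.
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])

# Centre the data and compute its covariance matrix A.
X_centred = X - X.mean(axis=0)
A = np.cov(X_centred, rowvar=False)

# Solve A*x = λ*x; eigh is used because a covariance matrix is symmetric.
eigenvalues, eigenvectors = np.linalg.eigh(A)

print(eigenvalues)   # variance along each principal direction
print(eigenvectors)  # each column is an eigenvector (principal direction)
```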
Kernel Trick
● A kernel function computes the inner product of two points after they are mapped into a higher-dimensional feature space f, that is K(x, y) = < f(x), f(y) >, without ever computing the mapping f explicitly.
● Let us say that we have two points, x = (2, 3, 4) and y = (3, 4, 5).
● As we have seen, K(x, y) = < f(x), f(y) >.
● Let us first calculate < f(x), f(y) >:
○ f(x) = (x1x1, x1x2, x1x3, x2x1, x2x2, x2x3, x3x1, x3x2, x3x3)
○ f(y) = (y1y1, y1y2, y1y3, y2y1, y2y2, y2y3, y3y1, y3y2, y3y3)
● So,
○ f(2, 3, 4) = (4, 6, 8, 6, 9, 12, 8, 12, 16) and
○ f(3, 4, 5) = (9, 12, 15, 12, 16, 20, 15, 20, 25)
● So the dot product,
○ f(x) · f(y) = f(2, 3, 4) · f(3, 4, 5)
○ = (36 + 72 + 120 + 72 + 144 + 240 + 120 + 240 + 400) = 1444
● Another way, with x = (2, 3, 4) and y = (3, 4, 5), using the kernel K(x, y) = (x · y)^2:
○ K(x, y) = (2*3 + 3*4 + 4*5)^2
○ = (6 + 12 + 20)^2
○ = 38*38
○ = 1444
Why the Kernel Trick?
● As we found out, f(x) · f(y) and K(x, y) give us the same result, but the kernel computes it directly in the original 3-dimensional space instead of the 9-dimensional feature space.
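A minimal sketch verifying this equivalence in NumPy for the two points above:

```python
import numpy as np

x = np.array([2, 3, 4])
y = np.array([3, 4, 5])

# Explicit mapping f: all pairwise products of the components (9 dimensions).
f_x = np.outer(x, x).ravel()
f_y = np.outer(y, y).ravel()
explicit = np.dot(f_x, f_y)

# Kernel trick: the same value computed directly in the original 3-dimensional space.
kernel = np.dot(x, y) ** 2

print(explicit, kernel)  # both print 1444
```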
Types of Kernel
● Linear Kernel
● Polynomial Kernel
● Exponential Kernel
● Gaussian Kernel
● Sigmoid Kernel
● And many more
Types of Kernel - Linear Kernel
● Let us say that we have two vectors named x1 and x2; then the linear kernel is simply the dot product of these two vectors:
● K(x1, x2) = x1 . x2
Types of Kernel - Polynomial Kernel
● The polynomial kernel is a more generalized form of the linear kernel; for two vectors x1 and x2 it is defined as
● K(x1, x2) = (x1 . x2 + 1)^d, where d is the degree of the polynomial.
Types of Kernel - Gaussian Kernel
● This kernel is an example of a radial basis function (RBF) kernel:
● K(x1, x2) = exp(-||x1 - x2||^2 / (2σ^2))
● The adjustable parameter σ plays a major role in the performance of the kernel and should be carefully tuned to the problem.
Types of Kernel - Exponential Kernel
● This is in close relation with the previous kernel, i.e. the Gaussian kernel, with the only difference that the square of the norm is removed:
● K(x1, x2) = exp(-||x1 - x2|| / (2σ^2))
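A minimal sketch evaluating these kernels for two small vectors (σ and the polynomial degree d are arbitrary illustration values):

```python
import numpy as np

x1 = np.array([2.0, 3.0, 4.0])
x2 = np.array([3.0, 4.0, 5.0])
d, sigma = 2, 1.5  # illustration values for the degree and the width parameter

linear = np.dot(x1, x2)                 # x1 . x2
polynomial = (np.dot(x1, x2) + 1) ** d  # (x1 . x2 + 1)^d
gaussian = np.exp(-np.linalg.norm(x1 - x2) ** 2 / (2 * sigma ** 2))
exponential = np.exp(-np.linalg.norm(x1 - x2) / (2 * sigma ** 2))

print(linear, polynomial, gaussian, exponential)
```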
Linear PCA vs. Kernel PCA
● PCA is a linear method. It works great for linearly separable datasets.
● However, if the dataset has non-linear relationships, then it produces undesirable results.
● Kernel PCA is a technique which uses the so-called kernel trick and projects the linearly inseparable data into a higher dimension where it is linearly separable.
● There are various kernels that are popularly used; some of them are linear, polynomial, RBF, and sigmoid.
Kernel PCA
Step 1:
● First choose a kernel function k(x_i, x_j) and let T be the corresponding transformation to a higher dimension.
Step 2:
● Find the covariance matrix of the data, but here the kernel function is used to calculate this matrix: it is the matrix that results from applying the kernel function to all pairs of data points,
K = T(X) T(X)^T
Step 3:
● Centre the kernel matrix K so that the (implicitly) transformed data has zero mean in the higher-dimensional feature space; the centring is applied to K itself because the transformation T is never computed explicitly.
Step 4:
● Find the eigenvectors and eigenvalues of this matrix.
● Sort the eigenvectors based on their corresponding eigenvalues in decreasing order.
● Choose the number of dimensions needed in the reduced dataset; let's call it k.
● Choose the first k eigenvectors and concatenate them into one matrix.
● Finally, calculate the product of that matrix with your data. The result will be the reduced dataset.
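A minimal sketch of kernel PCA with scikit-learn on the classic two-circles data, which is linearly inseparable until it is projected through an RBF kernel (the gamma value is just an illustration choice):

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Kernel PCA with an RBF kernel implicitly maps the data to a higher dimension
# and returns the leading principal components computed there.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
X_kpca = kpca.fit_transform(X)

print(X_kpca.shape)  # (200, 2)
```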
Local Binary Pattern
● Local Binary Pattern (LBP) is a simple but powerful technique used to describe the local texture of images.
● It combines statistical and structural methods and was first described in 1994.
● LBP builds a local representation of a picture by comparing each pixel with its neighboring pixels.
Local Binary Pattern
The original LBP operator labels the pixels of an image with decimal
numbers, called Local Binary Patterns which encode the local
structure around each pixel.
115
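A minimal sketch of the original 3x3 LBP operator for one pixel: the 8 neighbours are compared with the centre, each comparison contributes one bit, and the resulting 8-bit number is the pixel's label (the tiny image patch below is made up):

```python
import numpy as np

def lbp_code(patch):
    """Compute the LBP label for the centre pixel of a 3x3 patch."""
    center = patch[1, 1]
    # Neighbours visited clockwise, starting at the top-left corner.
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    bits = [1 if n >= center else 0 for n in neighbours]
    # Interpret the 8 bits as a decimal number between 0 and 255.
    return sum(bit << i for i, bit in enumerate(bits))

patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
print(lbp_code(patch))
```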
Statistical Feature Engineering
Feature engineering refers to a process of selecting and transforming variables/features in your dataset when creating a predictive model using machine learning.
• Therefore, you have to extract the features from the raw dataset you have collected before training your data in machine learning algorithms.
• Feature engineering has two goals:
– Preparing the proper input dataset, compatible with the machine learning algorithm requirements.
– Improving the performance of machine learning models.
Feature Vector Creation
● There are various techniques for creating feature vectors, each tailored to different types of data and tasks.
● Here are some common techniques for creating feature vectors:
○ Counter based
○ Mean
○ Median
○ Mode
● Others are: LBP, MDS, label encoding, one-hot encoding, TF-IDF, histograms, etc.
Counter based Vectorization
● Counter-based feature vector creation is a technique that involves counting the occurrences of certain elements or events in a dataset and representing these counts as features in a vector format.
● This technique is commonly used in natural language processing (NLP) for text analysis, where words or phrases are counted to create feature vectors.
● Original data:
○ Review 1: "The product is great and durable."
○ Review 2: "I am satisfied with this purchase."
○ Review 3: "Not worth the money, very disappointed."
● Consider the words "product," "great," "durable," "satisfied," "purchase," "worth," "money," and "disappointed" as our vocabulary.
● Arrange the vocabulary in alphabetical order and, for each review, count how many times each vocabulary word occurs; the counts form that review's feature vector.
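A minimal sketch of this counting with scikit-learn's CountVectorizer, restricted to the vocabulary listed above (in alphabetical order, matching the slide's instruction):

```python
from sklearn.feature_extraction.text import CountVectorizer

reviews = [
    "The product is great and durable.",
    "I am satisfied with this purchase.",
    "Not worth the money, very disappointed.",
]
vocabulary = ["disappointed", "durable", "great", "money",
              "product", "purchase", "satisfied", "worth"]

# Each review becomes a vector of word counts over the fixed vocabulary.
vectorizer = CountVectorizer(vocabulary=vocabulary)
X = vectorizer.fit_transform(reviews)

print(vectorizer.get_feature_names_out())
print(X.toarray())
# Review 1 -> [0 1 1 0 1 0 0 0], Review 2 -> [0 0 0 0 0 1 1 0], Review 3 -> [1 0 0 1 0 0 0 1]
```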
Multidimensional Scaling
● Multidimensional Scaling (MDS) places objects as points in a low-dimensional space so that the distances between the points reflect, as closely as possible, the given dissimilarities between the objects.
● The "multi" part means that MDS is not just for two-dimensional pictures.
● It can work for 3D, 4D, or even more dimensions. This is like having more layers in your graph.
● People use MDS in many different areas. It's like a tool that can help us understand things better.
● The term scaling comes from psychometrics, where abstract concepts ("objects") are assigned numbers according to a rule.
● For example, you may want to quantify a person's attitude to global warming. You could assign a "1" to "doesn't believe in global warming", a 10 to "firmly believes in global warming", and a scale of 2 to 9 for attitudes in between.
● You can also think of "scaling" as the fact that you're essentially scaling down the data (i.e. making it simpler by creating lower-dimensional data).
● Data that is scaled down in dimension keeps similar properties. For example, two data points that are close together in high-dimensional space will also be close together in low-dimensional space.
● For example, if you had a list of cities and only knew how far apart they are, MDS could help you create a map that shows their distances and positions, even if you don't know exactly where they are.
MDS algorithm:
Step 1: Assign a number of points to coordinates in n-dimensional space.
● N-dimensional space could be 2-dimensional, 3-dimensional, or higher spaces (at least theoretically, because 4-dimensional spaces and above are difficult to model). The orientation of the coordinate axes is arbitrary and is mostly the researcher's choice.
● For maps like the one in the simple example above, axes that represent north/south and east/west make the most sense.
Step 2: Calculate the Euclidean distances for all pairs of points.
Step 3: Compare the similarity matrix with the original input matrix by evaluating the stress function. The smaller the stress, the better the low-dimensional configuration reproduces the original distances; the point coordinates are adjusted iteratively to reduce the stress.
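A minimal sketch of metric MDS with scikit-learn, recovering 2-D positions for four cities from nothing but a made-up pairwise distance matrix:

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical pairwise distances (in km) between four cities: symmetric, zero diagonal.
distances = np.array([
    [0, 450, 1150, 800],
    [450, 0, 700, 1200],
    [1150, 700, 0, 1400],
    [800, 1200, 1400, 0],
], dtype=float)

# dissimilarity="precomputed" tells MDS we are passing distances, not raw features.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(distances)

print(coords)       # one (x, y) position per city
print(mds.stress_)  # the stress value from Step 3: lower means a better fit
```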
Why MDS?
● MDS is useful when you only know how similar or dissimilar objects are (for example, the distances between cities) and you want a low-dimensional picture that preserves those relationships as well as possible.
Types of MDS
Metric MDS:
● Also known as Principal Coordinate Analysis (PCoA).
● Make sure not to confuse it with Principal Component Analysis (PCA), a separate yet similar technique.
● Metric MDS attempts to model the similarity/dissimilarity of data by calculating distances between each pair of points using their geometric coordinates.
● The key here is the ability to measure a distance using a linear scale.
● E.g. if the distance between two points is 10 units, it means they're twice as far apart as two points that are only 5 units apart.
Non-Metric MDS:
● It is used when you have data with ordered values, like ratings.
● It's about showing the relationships between items based on their order, rather than exact distances.
● Imagine you asked people to rate products from 1 to 5.
● In non-metric MDS, the focus is on the order of ratings (1 < 2 < 3 < 4 < 5) rather than the actual numerical difference between them.
● It helps create a map that captures the ranking relationships between items, even if you can't say exactly how much better one item is compared to another.
MDS for Face Recognition
[Figure: example of MDS applied to face recognition.]
Matrix Factorization
● The goal here is to express a matrix as the product of two smaller matrices.
● In the accompanying figure, the blue matrix is your data, where each row is a sample and each column is a feature.
● The archetypes are the simplest forms you are going to use to reconstruct your data.
● One row of your data will be expressed as a linear combination of your archetypes.
● The coefficients of your linear combination (shown in red in the image) are your low-dimensional representation. And that is basically matrix factorization.
● In the ratings example, on the left-hand side we have individual preferences, wherein 4 individuals have been asked to provide a rating on safety and mileage.
● Cars are rated based on the number of features (items) they offer: a rating of 4 implies high features, and 1 depicts fewer features.
● The blue-coloured ? mark is a sparse value, where the person either does not know about the car, does not have it in the consideration list for buying, or has forgotten to rate it.
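A minimal sketch of matrix factorization on a small ratings matrix like the one described above, using non-negative matrix factorization from scikit-learn (the numbers are made up, and treating the "?" entries as 0 is a simplification; real recommenders use masked or specialised factorizations):

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical ratings of 4 people for 4 cars (1-4 scale); 0 stands in for the unknown "?" entries.
R = np.array([
    [4, 3, 0, 1],
    [3, 0, 2, 1],
    [1, 1, 0, 4],
    [0, 1, 3, 4],
], dtype=float)

# Factorize R ≈ W @ H with 2 latent factors (the "archetypes").
model = NMF(n_components=2, init="random", random_state=0, max_iter=500)
W = model.fit_transform(R)  # low-dimensional representation of each person
H = model.components_       # archetypes expressed over the cars

print(np.round(W @ H, 2))   # reconstructed ratings, including estimates for the 0 entries
```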
Thank You !!!