Lecture 1.
• Normalization and Standardization
• Overfitting and Underfitting
Dr. Mainak Biswas
Normalization and Standardization
• Both Normalization and Standardization are
techniques used to adjust the scale of features
in a dataset
• They are crucial in machine learning to ensure
that all features contribute equally to the
model and prevent any feature from
dominating due to its scale
Normalization
• Normalization (also called Min-Max Scaling) is the
process of transforming features such that they lie within
a specific range, typically [0, 1] or [-1, 1]
• This is done by scaling the data to a fixed range based on
the minimum and maximum values of the feature
• Formula:
x′ = (x − min(x)) / (max(x) − min(x))
where x is the original value, and min(x) and max(x) are the minimum and maximum values of the feature in the dataset
• Usage: Algorithms that are sensitive to the scale of
features, such as k-Nearest Neighbors (k-NN) and Neural
Networks.
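• A minimal NumPy sketch of the formula above (the feature values here are made-up for illustration):

```python
import numpy as np

def min_max_scale(x):
    """Rescale a 1-D feature to [0, 1] via x' = (x - min(x)) / (max(x) - min(x))."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

feature = np.array([3.0, 7.0, 10.0, 15.0])  # hypothetical raw feature
print(min_max_scale(feature))  # [0.         0.33333333 0.58333333 1.        ]
```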
Normalization Example
SL   Value   (x − min(x)) / (max(x) − min(x))   Normalized Value
1    10      (10 − 10) / (50 − 10)              0.00
2    20      (20 − 10) / (50 − 10)              0.25
3    30      (30 − 10) / (50 − 10)              0.50
4    40      (40 − 10) / (50 − 10)              0.75
5    50      (50 − 10) / (50 − 10)              1.00
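• The table above can be reproduced with scikit-learn's MinMaxScaler (a minimal sketch, assuming scikit-learn is installed):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

values = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])  # one feature, as a column
scaler = MinMaxScaler()  # default feature_range is (0, 1)
print(scaler.fit_transform(values).ravel())  # [0.   0.25 0.5  0.75 1.  ]
```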
Standardization
• Standardization: Transforming data to have a mean of 0 and
a standard deviation of 1 (also known as Z-Score Scaling)
• It centers the data and scales it by the standard
deviation
• Formula:
x′ = (x − μ) / σ
where μ is the mean and σ is the standard deviation of the
dataset
• Usage: Algorithms like Support Vector Machines (SVM),
Logistic Regression, and Principal Component Analysis
(PCA), which assume a normal distribution or work better
with data centered around 0.
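• A minimal NumPy sketch of z-score scaling (the feature values are made-up for illustration):

```python
import numpy as np

def z_score(x):
    """Standardize a 1-D feature via x' = (x - mean) / std."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()  # np.std defaults to the population formula (ddof=0)

feature = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # mean 5, std 2
print(z_score(feature))  # [-1.5 -0.5 -0.5 -0.5  0.   0.   1.   2. ]
```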
Standardization Example
Mean: μ = (10 + 20 + 30 + 40 + 50) / 5 = 30
Standard deviation: σ = √[((10−30)² + (20−30)² + (30−30)² + (40−30)² + (50−30)²) / 5] = √200 ≈ 14.14

SL   Value   (x − μ) / σ           Standardized Value
1    10      (10 − 30) / 14.14     −1.41
2    20      (20 − 30) / 14.14     −0.71
3    30      (30 − 30) / 14.14     0.00
4    40      (40 − 30) / 14.14     0.71
5    50      (50 − 30) / 14.14     1.41
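• The same numbers fall out of scikit-learn's StandardScaler, which also scales by the population standard deviation (a minimal sketch, assuming scikit-learn is installed):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

values = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])
scaler = StandardScaler()  # centers to mean 0, scales by the population std
print(scaler.fit_transform(values).ravel().round(2))  # [-1.41 -0.71  0.    0.71  1.41]
```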
Overfitting and Underfitting
• Overfitting and Underfitting are concepts in
machine learning that describe how well a
model generalizes to new data
• They are often indicators of how effectively a
model has learned patterns from the training
data
Overfitting
• Overfitting occurs when a model learns not only the underlying
patterns in the training data but also the noise and details that do
not generalize to unseen data
• Symptoms
– High accuracy on training data
– Poor performance on validation or test data
• Causes
– Model is too complex (e.g., too many parameters or layers)
– Insufficient training data
– Training for too many epochs without regularization
• Prevention
– Use regularization techniques
– Reduce the model's complexity
– Use more training data or data augmentation
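• To illustrate the regularization point above, a sketch assuming scikit-learn: a high-degree polynomial fit to a small noisy dataset typically scores much higher on the training split than on the test split, while an L2-regularized (ridge) version narrows that gap. The synthetic data and hyperparameters are illustrative choices, and exact scores vary with the random seed:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (40, 1))                              # small dataset
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, 40)  # noisy sine wave
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = [
    ("degree-15, unregularized", make_pipeline(PolynomialFeatures(15), LinearRegression())),
    ("degree-15, ridge (L2)", make_pipeline(PolynomialFeatures(15), Ridge(alpha=1.0))),
]
for name, model in models:
    model.fit(X_train, y_train)
    print(f"{name}: train R^2 = {model.score(X_train, y_train):.2f}, "
          f"test R^2 = {model.score(X_test, y_test):.2f}")
```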
Underfitting
• Underfitting occurs when a model is too simple to capture the
underlying patterns in the data
• Symptoms
– Poor performance on both training and validation/test data
– Model fails to capture the complexity of the data
• Causes
– Model is too simple
– Insufficient training time
– Features used in the model are not relevant or sufficient
• Prevention
– Use a more complex model
– Train the model for more epochs
– Provide better or more features to the model
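• A companion sketch, using the same synthetic sine data and assumptions as the previous example: a degree-1 model scores poorly on both splits, the signature of underfitting, while a modestly more complex degree-5 model improves both:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (40, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, 40)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 5):  # a straight line cannot follow a full sine period; degree 5 can
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree {degree}: train R^2 = {model.score(X_train, y_train):.2f}, "
          f"test R^2 = {model.score(X_test, y_test):.2f}")
```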
Differences
Aspect                         Overfitting      Underfitting
Performance on Training Data   High accuracy    Low accuracy
Performance on Test Data       Poor             Poor
Model Complexity               Too complex      Too simple
Generalization                 Poor             Poor