Lecture 1.
• Normalization and Standardization
• Overfitting and Underfitting
Dr. Mainak Biswas
Normalization and Standardization
• Both Normalization and Standardization are
techniques used to adjust the scale of features
in a dataset
• They are crucial in machine learning to ensure
that all features contribute equally to the
model and prevent any feature from
dominating due to its scale
Normalization
• Normalization (also called Min-Max Scaling) is the
process of transforming features such that they lie within
a specific range, typically [0, 1] or [-1, 1]
• This is done by scaling the data to a fixed range based on
the minimum and maximum values of the feature
• Formula:
x′ = (x − min(x)) / (max(x) − min(x))
where x is the original value, and min(x) and max(x) are the minimum and maximum values of the feature in the dataset
• Usage: Algorithms that are sensitive to the scale of
features, such as k-Nearest Neighbors (k-NN) and Neural
Networks.
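• A minimal NumPy sketch of the formula above (the feature values here are made-up for illustration):

```python
import numpy as np

def min_max_scale(x):
    """Rescale a 1-D feature to [0, 1] via x' = (x - min(x)) / (max(x) - min(x))."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

feature = np.array([3.0, 7.0, 10.0, 15.0])  # hypothetical raw feature
print(min_max_scale(feature))  # [0.         0.33333333 0.58333333 1.        ]
```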
Normalization Example
SL   Value   (x − min(x)) / (max(x) − min(x))   Normalized Value
1    10      (10 − 10) / (50 − 10)              0.00
2    20      (20 − 10) / (50 − 10)              0.25
3    30      (30 − 10) / (50 − 10)              0.50
4    40      (40 − 10) / (50 − 10)              0.75
5    50      (50 − 10) / (50 − 10)              1.00
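• The table above can be reproduced with scikit-learn's MinMaxScaler (a minimal sketch, assuming scikit-learn is installed):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

values = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])  # one feature, as a column
scaler = MinMaxScaler()  # default feature_range is (0, 1)
print(scaler.fit_transform(values).ravel())  # [0.   0.25 0.5  0.75 1.  ]
```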
Standardization
• Standardization: Transforming data to have a mean of 0 and
a standard deviation of 1 (also known as Z-Score Scaling)
• It centers the data and scales it by the standard
deviation
• Formula:
x′ = (x − μ) / σ
where μ is the mean and σ is the standard deviation of the
dataset
• Usage: Algorithms like Support Vector Machines (SVM),
Logistic Regression, and Principal Component Analysis
(PCA), which assume a normal distribution or work better
with data centered around 0.
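• A minimal NumPy sketch of z-score scaling (the feature values are made-up for illustration):

```python
import numpy as np

def z_score(x):
    """Standardize a 1-D feature via x' = (x - mean) / std."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()  # np.std defaults to the population formula (ddof=0)

feature = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # mean 5, std 2
print(z_score(feature))  # [-1.5 -0.5 -0.5 -0.5  0.   0.   1.   2. ]
```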
Standardization Example
Mean: μ = (10 + 20 + 30 + 40 + 50) / 5 = 30
Standard deviation: σ = √[((10−30)² + (20−30)² + (30−30)² + (40−30)² + (50−30)²) / 5] = √200 ≈ 14.14

SL   Value   (x − μ) / σ           Standardized Value
1    10      (10 − 30) / 14.14     −1.41
2    20      (20 − 30) / 14.14     −0.71
3    30      (30 − 30) / 14.14     0.00
4    40      (40 − 30) / 14.14     0.71
5    50      (50 − 30) / 14.14     1.41
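• The same numbers fall out of scikit-learn's StandardScaler, which also scales by the population standard deviation (a minimal sketch, assuming scikit-learn is installed):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

values = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])
scaler = StandardScaler()  # centers to mean 0, scales by the population std
print(scaler.fit_transform(values).ravel().round(2))  # [-1.41 -0.71  0.    0.71  1.41]
```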
Overfitting and Underfitting
• Overfitting and Underfitting are concepts in
machine learning that describe how well a
model generalizes to new data
• They are often indicators of how effectively a
model has learned patterns from the training
data
Overfitting
• Overfitting occurs when a model learns not only the underlying
patterns in the training data but also the noise and details that do
not generalize to unseen data
• Symptoms
– High accuracy on training data
– Poor performance on validation or test data
• Causes
– Model is too complex (e.g., too many parameters or layers)
– Insufficient training data
– Training for too many epochs without regularization
• Prevention
– Use regularization techniques
– Reduce the model's complexity
– Use more training data or data augmentation
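• To illustrate the regularization point above, a sketch assuming scikit-learn: a high-degree polynomial fit to a small noisy dataset typically scores much higher on the training split than on the test split, while an L2-regularized (ridge) version narrows that gap. The synthetic data and hyperparameters are illustrative choices, and exact scores vary with the random seed:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (40, 1))                              # small dataset
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, 40)  # noisy sine wave
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = [
    ("degree-15, unregularized", make_pipeline(PolynomialFeatures(15), LinearRegression())),
    ("degree-15, ridge (L2)", make_pipeline(PolynomialFeatures(15), Ridge(alpha=1.0))),
]
for name, model in models:
    model.fit(X_train, y_train)
    print(f"{name}: train R^2 = {model.score(X_train, y_train):.2f}, "
          f"test R^2 = {model.score(X_test, y_test):.2f}")
```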
Underfitting
• Underfitting occurs when a model is too simple to capture the
underlying patterns in the data
• Symptoms
– Poor performance on both training and validation/test data
– Model fails to capture the complexity of the data
• Causes
– Model is too simple
– Insufficient training time
– Features used in the model are not relevant or sufficient
• Prevention
– Use a more complex model
– Train the model for more epochs
– Provide better or more features to the model
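• A companion sketch, using the same synthetic sine data and assumptions as the previous example: a degree-1 model scores poorly on both splits, the signature of underfitting, while a modestly more complex degree-5 model improves both:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (40, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, 40)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 5):  # a straight line cannot follow a full sine period; degree 5 can
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree {degree}: train R^2 = {model.score(X_train, y_train):.2f}, "
          f"test R^2 = {model.score(X_test, y_test):.2f}")
```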
Differences
Aspect                         Overfitting      Underfitting
Performance on Training Data   High accuracy    Low accuracy
Performance on Test Data       Poor             Poor
Model Complexity               Too complex      Too simple
Generalization                 Poor             Poor