Normalization and Calibration
Normalization (Unsupervised Scaling)
Normalization is used to adjust the scale of quantitative features without using target labels.
Purpose:
• Keeps features with large numeric ranges from dominating models that are sensitive to scale (e.g., distance-based or gradient-based learners).
• Useful when features are on different scales.
Methods of Normalization:
1. Z-score Normalization (Standardization)
• Best suited to roughly normally distributed data; the result has mean 0 and standard deviation 1.
• Formula:
$$z = \frac{x - \text{mean}}{\text{std deviation}}$$
2. Robust Normalization
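• Best for skewed data or data with outliers, because the median and IQR are less sensitive to extreme values than the mean and standard deviation.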
• Formula:
$$\frac{x - \text{median}}{\text{IQR}}$$
(IQR = interquartile range, i.e., Q3 − Q1, the difference between the 75th and 25th percentiles)
3. Min-Max Normalization
• Scales values to the [0, 1] range.
• Formula:
$$\frac{x - \text{min}}{\text{max} - \text{min}}$$
• If the exact min and max aren't known, estimated bounds can be used instead, and values falling outside them are truncated to 0 or 1.
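The three formulas above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function names and the sample data are made up for the example, the population standard deviation is an assumed choice, and quartile conventions differ between tools, so the robust-scaled values may differ slightly from other software.

```python
import statistics

def z_score(values):
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # assumption: population std; stdev() gives the sample version
    return [(x - mean) / std for x in values]

def robust_scale(values):
    """Subtract the median and divide by the interquartile range (Q3 - Q1)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # default quartile convention of the statistics module
    return [(x - median) / (q3 - q1) for x in values]

def min_max(values):
    """Rescale so the smallest value maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    return [(x - low) / (high - low) for x in values]

# Hypothetical sample data (mean 5, population std 2)
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(z_score(data))
print(robust_scale(data))
print(min_max(data))
```

All three functions divide by a spread measure, so a constant feature (zero spread) would raise a division error and needs separate handling.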
Calibration (Supervised Scaling)
Calibration adjusts feature scaling using target labels, often in binary classification.
Purpose:
• Adds meaningful class information to features.
• Helps models (e.g., linear classifiers) handle categorical/ordinal features more effectively.
How Calibration Works:
For a feature value $v = F(x)$, create a calibrated feature $F^c(x)$ that estimates the probability of the positive class given $v$:
$$F^c(x) = P(\text{positive class} \mid v), \quad F^c: X \to [0, 1]$$
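For a categorical feature, one simple way to estimate this probability is the fraction of positive examples among all examples sharing the same feature value. The sketch below assumes labels are encoded as 1 (positive) and 0 (negative); the function name `fit_calibration` and the toy data are hypothetical.

```python
from collections import defaultdict

def fit_calibration(feature_values, labels):
    """Map each feature value v to the empirical estimate of P(positive | v)."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for v, y in zip(feature_values, labels):
        totals[v] += 1
        positives[v] += y  # y is 1 for the positive class, 0 otherwise
    return {v: positives[v] / totals[v] for v in totals}

# Hypothetical toy data: a categorical feature and a binary label
colours = ["red", "red", "blue", "blue", "blue", "red"]
bought = [1, 0, 1, 1, 0, 0]

calibration_map = fit_calibration(colours, bought)

# The calibrated feature replaces each raw value with its estimated probability
calibrated = [calibration_map[v] for v in colours]
print(calibration_map)
```

With very few examples per value these raw fractions can be unreliable, so the estimates are only as good as the counts behind them.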
Benefits:
✔ Makes features suitable for models that depend on probability (e.g., Naive Bayes).
✔ No further training needed after calibration.
✔ Helps the algorithm decide how to use the feature (numerical, categorical, or ordinal).
Examples
Normalization (Unsupervised)
Dataset with "Age" Feature:
| Person | Age |
|--------|-----|
| A      | 20  |
| B      | 25  |
| C      | 30  |
| D      | 35  |
Using Min-Max Normalization:
• Min (l) = 20
• Max (h) = 35
• Formula:
$$\frac{\text{Age} - 20}{35 - 20}$$
Normalized Values:
| Person | Age | Normalized Age |
|--------|-----|----------------|
| A      | 20  | 0.00           |
| B      | 25  | 0.33           |
| C      | 30  | 0.67           |
| D      | 35  | 1.00           |
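The arithmetic in this example can be checked with a short script. The sketch below fixes the bounds at l = 20 and h = 35 and also shows truncation for ages that fall outside them; the function name `min_max_with_bounds` is made up for illustration.

```python
def min_max_with_bounds(x, low, high):
    """Min-max scale x with fixed bounds, truncating the result to [0, 1]."""
    scaled = (x - low) / (high - low)
    return min(1.0, max(0.0, scaled))

ages = {"A": 20, "B": 25, "C": 30, "D": 35}

# Rounded to two decimals to match the table
normalized = {person: round(min_max_with_bounds(age, 20, 35), 2)
              for person, age in ages.items()}
print(normalized)  # {'A': 0.0, 'B': 0.33, 'C': 0.67, 'D': 1.0}

# Ages outside the assumed bounds are truncated to the ends of the range
print(min_max_with_bounds(40, 20, 35))  # 1.0
print(min_max_with_bounds(18, 20, 35))  # 0.0
```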
Calibration (Supervised)
Binary Classification Example - Product Purchase
| Age Group | Bought (Yes = 1) | Did Not Buy (No = 0) |
|-----------|------------------|----------------------|
| 20–29     | 2                | 8                    |
| 30–39     | 7                | 3                    |
Probability Estimation for Calibration:
• For 20–29:
$$P(\text{Yes}) = \frac{2}{2+8} = 0.2$$
• For 30–39:
$$P(\text{Yes}) = \frac{7}{7+3} = 0.7$$
Calibrated Feature Table:
| Person | Age Group | Calibrated Value (P(Yes)) |
|--------|-----------|---------------------------|
| A      | 20–29     | 0.2                       |
| B      | 30–39     | 0.7                       |
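The same calculation can be reproduced from the Yes/No counts. This is a small sketch using the numbers from this example; the variable names are chosen only for illustration.

```python
# (yes, no) purchase counts per age group, taken from the example
counts = {"20–29": (2, 8), "30–39": (7, 3)}

# Calibrated value per group: P(Yes) = yes / (yes + no)
calibrated = {group: yes / (yes + no) for group, (yes, no) in counts.items()}

# Each person receives the calibrated value of their age group
people = {"A": "20–29", "B": "30–39"}
calibrated_feature = {person: calibrated[group] for person, group in people.items()}

print(calibrated_feature)
```

The age group is replaced by a number in [0, 1] that carries class information, which a linear classifier can then use like any other numeric feature.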