Features vs Target
TYPES OF MACHINE LEARNING
1. SUPERVISED LEARNING
2. UNSUPERVISED LEARNING
3. REINFORCEMENT LEARNING
[Diagram: Scikit-learn components: preprocessing tools, feature selection, train/test split, algorithms, model evaluation]
SCIKIT-LEARN
Scikit-learn, also known as sklearn, is a popular open-source machine learning library in Python that provides a wide range of tools for data analysis, modeling, and evaluation.
Sklearn is built on top of NumPy, SciPy, and Matplotlib, and supports integration with Pandas, which makes it easy to use in data science workflows.
Sklearn is widely used in the data science community for applications such as predictive modeling, natural language processing, computer vision, and time series forecasting, among others.
INSTALLATION
IMPORT
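The slide content for this step did not survive, so the following is only a minimal sketch of what an import check might look like. It relies on the fact that the package is distributed under the name scikit-learn but imported as sklearn; the choice of the preprocessing submodule here is just an illustration.

```python
# The distribution is named "scikit-learn", but the import name is "sklearn".
import sklearn

# Submodules are imported explicitly; preprocessing is used as an example here.
from sklearn import preprocessing

# Printing the version is a quick way to confirm the installation works.
print(sklearn.__version__)
```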
PREPROCESSING
Feature scaling
Encoding
Imputing null values
Outlier detection & handling
FEATURE SCALING
Feature scaling is a method used to normalize the range of the features in a dataset.
Feature scaling modifies values with methods like normalization or standardization.
It helps to avoid bias in a machine learning model toward features with larger numeric ranges.
WHY SCALING?
When a dataset has numerical features that are each on a different scale, an ML model can put more weight on the features with the larger scale.
Scaling helps all features contribute equally.
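The effect can be sketched with a distance calculation, which is what algorithms such as KNN rely on. The three rows below (age in years, income in dollars) are made-up values chosen only for illustration, and the rescaling is done by hand with NumPy rather than with a library scaler.

```python
import numpy as np

# Hypothetical rows: [age in years, income in dollars]
a = np.array([25.0, 50_000.0])
b = np.array([60.0, 51_000.0])  # very different age, similar income
c = np.array([26.0, 58_000.0])  # similar age, different income


def euclidean(u, v):
    """Plain Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((u - v) ** 2)))


# On the raw data, the income column dominates the distance.
dist_ab = euclidean(a, b)
dist_ac = euclidean(a, c)
print("raw:   ", dist_ab, dist_ac)

# Rescale each column to the range 0 to 1 (min-max scaling by hand).
X = np.vstack([a, b, c])
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

scaled_ab = euclidean(X_scaled[0], X_scaled[1])
scaled_ac = euclidean(X_scaled[0], X_scaled[2])
print("scaled:", scaled_ab, scaled_ac)
```

Before scaling, row a looks far closer to b than to c almost entirely because of income. After scaling, the age gap and the income gap carry comparable weight.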
NORMALIZATION
It is the method of scaling the data by fitting the data points into a range of 0 to 1.
MIN-MAX SCALER
MinMaxScaler from sklearn performs normalization.
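A minimal sketch of MinMaxScaler on a small made-up array follows. With its default settings, each column is rescaled as (x - min) / (max - min), so the smallest value in a column becomes 0 and the largest becomes 1. The numbers are assumptions for illustration only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up data: two features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0],
              [5.0, 1000.0]])

scaler = MinMaxScaler()             # default feature_range is (0, 1)
X_scaled = scaler.fit_transform(X)  # learn per-column min/max, then rescale

print(X_scaled)
```

In a real workflow the scaler would usually be fitted on the training split only and then applied to the test split with transform.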
STANDARDIZATION
This converts all the data points to have a mean of 0 and a standard deviation of 1.
STANDARD SCALER
StandardScaler from sklearn performs standardization.
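A minimal sketch of StandardScaler on a small made-up array follows. With its default settings, each column is transformed as (x - mean) / standard deviation, so every column ends up with mean 0 and standard deviation 1. The numbers are assumptions for illustration only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: two features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0],
              [5.0, 1000.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # learn per-column mean/std, then rescale

print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```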
ROBUST SCALER
This uses the interquartile range so that it is robust to outliers.
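A minimal sketch of RobustScaler on a made-up column with one outlier follows. With its default settings, it subtracts the median and divides by the interquartile range (75th percentile minus 25th percentile), so a single extreme value does not change how the remaining points are scaled.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Made-up column: four ordinary values and one outlier.
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scaler = RobustScaler()  # defaults: center on the median, scale by the IQR
X_scaled = scaler.fit_transform(X)

print(scaler.center_)    # median of the column
print(scaler.scale_)     # interquartile range of the column
print(X_scaled.ravel())
```

The outlier is still present after scaling. It simply no longer influences the center and scale used for the other points.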
WHICH IS BETTER?
Normalization:
Useful when the data doesn't follow a Gaussian (normal) distribution.
Useful in algorithms like KNN and neural networks such as CNNs and ANNs.
Standardization:
When your data follows a Gaussian distribution.
Robust Scaler:
When your data has outliers.
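One way to see the trade-off is to run all three scalers on the same column and check how much room the ordinary values are left with once an outlier is present. This is a rough sketch on made-up numbers, and "inlier spread" is a measure defined here for illustration, not a standard metric.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Made-up column: four ordinary values and one outlier.
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scalers = {
    "MinMaxScaler": MinMaxScaler(),
    "StandardScaler": StandardScaler(),
    "RobustScaler": RobustScaler(),
}

# Range covered by the four ordinary values after each transformation.
inlier_spread = {}
for name, scaler in scalers.items():
    scaled = scaler.fit_transform(X).ravel()
    inlier_spread[name] = float(scaled[:4].max() - scaled[:4].min())
    print(f"{name:15s} inlier spread = {inlier_spread[name]:.3f}")
```

On this data the min-max and standard scalers squeeze the ordinary values into a narrow band because the outlier stretches the range and the standard deviation, while the robust scaler keeps them well separated.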
ENCODING
Most machine learning models can only work with numerical values.
For this reason, it is necessary to transform the categorical values of the relevant features into numerical ones.
This process is called feature encoding.
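TYPES OF ENCODING
1. Nominal encoding:
Represents data without any order or hierarchy.
It can be done with OneHotEncoder.
2. Ordinal encoding:
Assigns a unique integer based on rank/order.
It can be done with LabelEncoder. Note that LabelEncoder numbers the categories in sorted (alphabetical) order, so when a specific rank must be preserved, OrdinalEncoder with an explicit category order is the safer choice.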
ONEHOT ENCODER
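A minimal sketch of OneHotEncoder on a made-up color column follows. Each category becomes its own 0/1 column, so no order is implied between categories. The encoder returns a sparse matrix by default, which is converted to a dense array here only to make it easy to print.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Made-up nominal feature: one column of color names.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

encoder = OneHotEncoder()
encoded = encoder.fit_transform(colors).toarray()  # sparse -> dense for display

print(encoder.categories_[0])  # column order: categories sorted alphabetically
print(encoded)
```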
LABEL ENCODER
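A minimal sketch of LabelEncoder on a made-up size column follows. LabelEncoder assigns integers according to the sorted order of the labels, which is alphabetical for strings and does not necessarily match the intended rank. For comparison, the sketch also shows OrdinalEncoder with an explicit category order; that second part is an addition for illustration and is not taken from the slides.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Made-up ordinal feature.
sizes = ["small", "large", "medium", "small"]

# LabelEncoder: integers follow alphabetical order of the labels
# (large=0, medium=1, small=2), not the size ranking.
label_encoder = LabelEncoder()
label_codes = label_encoder.fit_transform(sizes)
print(label_encoder.classes_, label_codes)

# OrdinalEncoder with an explicit order: small=0, medium=1, large=2.
ordinal_encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])
ordinal_codes = ordinal_encoder.fit_transform(np.array(sizes).reshape(-1, 1))
print(ordinal_codes.ravel())
```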