Exploratory Data Analysis -
1. Overall Data Distribution, shape
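A minimal pandas sketch of step 1. It assumes the data is already in a DataFrame `df`. The toy frame below only stands in for the real data, with hypothetical columns `age`, `income` and `city` next to the `Result` target:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset (column names are assumptions)
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "income": rng.lognormal(mean=10, sigma=0.5, size=n),
    "city": rng.choice(["Delhi", "Mumbai", "Ahmedabad"], size=n, p=[0.6, 0.39, 0.01]),
    "Result": rng.choice(["Yes", "No"], size=n, p=[0.2, 0.8]),
})

print(df.shape)                      # (rows, columns)
df.info()                            # dtype and non-null count of every column
print(df.describe(include="all").T)  # summary statistics, numerical and categorical
print(df.isna().sum())               # missing values per column

# Histograms of all numerical columns in one figure
df.hist(figsize=(8, 3))
plt.tight_layout()
plt.show()
```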
2. Target Feature - Result
   1. Count plot - Yes/No
   2. Check the distribution - imbalanced or not? - **Insights** (imbalance is tackled in Modelling Notes-2)
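A sketch of the two target checks, using a hypothetical 80/20 `Result` column. The majority/minority ratio is one simple way to put a number on the imbalance, not a prescribed metric:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical target column: 80 "No" rows and 20 "Yes" rows
df = pd.DataFrame({"Result": ["No"] * 80 + ["Yes"] * 20})

# 1. Count plot of the Yes/No classes
fig, ax = plt.subplots()
sns.countplot(data=df, x="Result", order=["Yes", "No"], ax=ax)
ax.set_title("Result: class counts")
plt.show()

# 2. Class counts and proportions; a large majority/minority ratio signals imbalance
counts = df["Result"].value_counts()
proportions = df["Result"].value_counts(normalize=True)
imbalance_ratio = counts.max() / counts.min()
print(proportions)
print(f"majority/minority ratio: {imbalance_ratio:.1f}")
```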
3. Pair plot with all the numerical features
4. Separate the categorical and numerical features
5. Categorical -
   1. Univariate Analysis -
      1. Hygiene checks on the data
         a) Check whether a column has more categories than expected, e.g. an education column that should only be Grad/undergrad (convert it into those two categories)
         b) Merge rare categories for later use, e.g. when city is 60% Delhi but only 1% Ahmedabad
      2. Missing values - treat them with one method: mode (most frequent value), sklearn's KNN imputer (needs the categories encoded as numbers first), or a separate "Unknown" category
      3. Check for "?"/special characters - run value counts on each categorical column, in a loop if possible
      4. Create a few count plots to show the frequencies - run a loop to get all the plots in one go - **Insights**
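One way to code the categorical univariate steps, assuming hypothetical `education` and `city` columns. The label mapping, the `["?", "-", ""]` placeholder list and the 5% rare-category threshold are illustrative assumptions. Mode imputation is shown. `sklearn.impute.SimpleImputer(strategy="most_frequent")` is an equivalent alternative:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical raw data: messy education labels, "?" placeholders, a rare city
df = pd.DataFrame({
    "education": ["Grad", "graduate", "Undergrad", "UG", "?", "Grad", None, "Grad"] * 10,
    "city": ["Delhi"] * 48 + ["Mumbai"] * 24 + ["Ahmedabad"] * 1 + ["?"] * 7,
})
cat_cols = ["education", "city"]

# Step 3: value counts of every categorical column in one loop (shows "?" and odd labels)
for col in cat_cols:
    print(df[col].value_counts(dropna=False), "\n")

# Treat placeholder strings as missing values (placeholder list is an assumption)
df[cat_cols] = df[cat_cols].replace(["?", "-", ""], np.nan)

# Step 1 a): collapse label variants into the two expected categories (hypothetical mapping)
edu_map = {"Grad": "Grad", "graduate": "Grad", "Undergrad": "Undergrad", "UG": "Undergrad"}
df["education"] = df["education"].map(edu_map)  # labels missing from the map become NaN

# Step 1 b): merge categories below 5% of non-missing rows into "Other"
shares = df["city"].value_counts(normalize=True)
rare = shares[shares < 0.05].index
df["city"] = df["city"].where(~df["city"].isin(rare), "Other")

# Step 2: fill the remaining gaps with the mode (most frequent value)
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode()[0])

# Step 4: one count plot per categorical column, all in one figure
fig, axes = plt.subplots(1, len(cat_cols), figsize=(5 * len(cat_cols), 4))
for ax, col in zip(np.atleast_1d(axes), cat_cols):
    sns.countplot(data=df, x=col, order=df[col].value_counts().index, ax=ax)
    ax.set_title(f"{col}: frequency")
plt.tight_layout()
plt.show()
```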
   2. Bi-variate Analysis -
      1. Categorical to categorical (X1_cat v/s X2_cat) - stacked bar plot
      2. Categorical to numerical (X1_cat v/s X2_num) - bar/swarm/violin/box plot - **Insights**
      3. Categorical to target feature (X1_cat v/s Target conversion) - stacked bar/swarm/violin/box plot - **Insights**
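A sketch of the categorical bi-variate plots on hypothetical toy data. Normalising each crosstab by row (`normalize="index"`) makes the stacked bars show shares rather than raw counts. For the target, the `Yes` column then reads as a conversion rate per category:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data (column names are assumptions; "Result" is the Yes/No target)
rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({
    "education": rng.choice(["Grad", "Undergrad"], size=n),
    "city": rng.choice(["Delhi", "Mumbai", "Other"], size=n),
    "income": rng.normal(50_000, 12_000, size=n),
    "Result": rng.choice(["Yes", "No"], size=n, p=[0.3, 0.7]),
})

# 1. Categorical vs categorical: row-normalised crosstab as a stacked bar plot
ct = pd.crosstab(df["city"], df["education"], normalize="index")
fig, ax = plt.subplots()
ct.plot(kind="bar", stacked=True, ax=ax, title="Education share within each city")
plt.show()

# 2. Categorical vs numerical: box plot (sns.violinplot / sns.swarmplot take the same arguments)
fig, ax = plt.subplots()
sns.boxplot(data=df, x="education", y="income", ax=ax)
plt.show()

# 3. Categorical vs target: share of Yes/No within each category
conv = pd.crosstab(df["city"], df["Result"], normalize="index")
fig, ax = plt.subplots()
conv.plot(kind="bar", stacked=True, ax=ax, title="Result share within each city")
plt.show()
print(conv["Yes"].sort_values(ascending=False))  # conversion rate per city
```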
6. Numerical -
   1. Univariate Analysis -
      1) Hygiene checks on the data
      2) Missing values - mean/median/KNN imputer/SimpleImputer
      3) Distribution and box plots with a loop - **Insights**
      4) Outliers - box plot - IQR method/percentile method (99th/95th percentile)
      5) Distribution and box plots with a loop again - verify that the outliers are removed - **Insights**
      6) Skewness in the data - for right-skewed columns take a log, or else a square root
   2. Bi-variate Analysis -
      1. Correlation -
         a) Correlation between numerical features (X1_num v/s X2_num) - heatmap - **Insights**
         b) Scatter plots (X1_num v/s X2_num) - regplot - **Insights**
      2. Relation with the target feature (X1_num v/s Target) - box/swarm/violin plot - **Insights**
      3. Relation with categorical features (X1_num v/s X1_cat) - box/swarm/violin plot - **Insights**
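A sketch of the numerical bi-variate checks. All column names are hypothetical. The toy `income` column is built to correlate with `age` so that the heatmap and the regplot have something to show:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data: "income" is built to correlate with "age"
rng = np.random.default_rng(4)
n = 300
age = rng.normal(40, 10, size=n)
df = pd.DataFrame({
    "age": age,
    "income": 1_000 * age + rng.normal(0, 8_000, size=n),
    "tenure": rng.exponential(5, size=n),
    "city": rng.choice(["Delhi", "Mumbai", "Other"], size=n),
    "Result": rng.choice(["Yes", "No"], size=n),
})
num_cols = ["age", "income", "tenure"]

# 1 a) Pearson correlation between numerical features, drawn as a heatmap
corr = df[num_cols].corr()
fig, ax = plt.subplots()
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
plt.show()

# 1 b) Scatter plot with a fitted linear regression line
fig, ax = plt.subplots()
sns.regplot(data=df, x="age", y="income", scatter_kws={"alpha": 0.4}, ax=ax)
plt.show()

# 2) Each numerical feature against the target (box plots; violin/swarm work the same way)
for col in num_cols:
    fig, ax = plt.subplots()
    sns.boxplot(data=df, x="Result", y=col, ax=ax)
    plt.show()

# 3) A numerical feature against a categorical feature
fig, ax = plt.subplots()
sns.violinplot(data=df, x="city", y="income", ax=ax)
plt.show()
```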
7. Overall Pairplot - create a pair plot with the target as hue (distribution plots on the diagonal) and check how well the target classes separate