Exploratory Data Analysis
(EDA)
Exploration of Data
using Visualization
in Python
Topics covered
Exploratory Data Analysis (EDA): Introduction, Data Exploration,
Handling Duplicates, Outliers, Missing Values, Univariate Analysis,
and Bivariate Analysis.
1. Importing the Python Libraries
2. Handling Duplicates
3. Handling Outliers
4. Handling Missing Values
5. Univariate Analysis
6. Bivariate Analysis
Exploratory data analysis (EDA)
Exploratory data analysis, popularly known as EDA, is the process of
performing initial investigations on a dataset to discover its
structure and content. It is also known as Data Profiling.
•EDA is where we get a basic understanding of the data at hand,
which helps us further with data cleaning and preparation.
What is Exploratory Data Analysis (EDA)?
EDA is the process of analyzing datasets to summarize their
key characteristics, detect patterns, spot anomalies, and
check assumptions using visualization and statistical
methods.
EDA is used to:
• Understand dataset structure
• Detect missing values, outliers, and errors
• Identify patterns and relationships
• Guide feature selection for machine learning
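The checks above map to a handful of pandas one-liners. A minimal first-look sketch on a small made-up frame (the column names and values are purely illustrative):

```python
import pandas as pd

# Small hypothetical dataset, just for illustration
df = pd.DataFrame({
    "Age": [25, 32, 32, None, 41],
    "City": ["A", "B", "B", "A", "C"],
})

print(df.shape)               # dataset structure: (rows, columns)
print(df.dtypes)              # variable types
print(df.isna().sum())        # missing values per column
print(df.duplicated().sum())  # number of fully duplicated rows
print(df.describe())          # summary statistics for numeric columns
```

These same calls are applied to the Black Friday dataset step by step later in this workbook.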
Why is EDA Important?
Before building models, we need to understand and clean the data.
• Poor data quality can lead to wrong conclusions.
• Helps choose the right analysis techniques.
Real-Life Examples:
• Stock Market: Identifying trends before forecasting stock prices
• E-Commerce: Understanding customer purchase behavior
• Healthcare: Detecting anomalies in patient data for diagnosis
Imagine an app-based food delivery company analyzing its data. They perform EDA to:
• Check missing order details
• Find peak hours of delivery
• Analyze the relationship between delivery time & customer ratings
Key Steps in Exploratory Data Analysis (EDA)
1. Data Collection & Loading: import the dataset from a file (CSV, Excel, database, API). Example: df = pd.read_csv("data.csv")
2. Data Cleaning: handle missing values, duplicates, and inconsistent formatting. Example: remove duplicates, fill missing values with mean/median
3. Data Exploration: understand dataset structure, summary statistics, and distributions. Example: df.describe(), df.info()
4. Outlier Detection: identify and handle anomalies using boxplots, IQR, or Z-score. Example: detect extreme values in sales or customer spending
5. Data Visualization: use plots to understand distributions and relationships. Example: histograms, scatter plots, correlation heatmaps
6. Feature Selection & Transformation: select relevant variables, scale or encode categorical data. Example: standardization, one-hot encoding
Types of Data in EDA
• Numerical Data: data represented in numbers; can be continuous or discrete. Examples: Revenue, Age, Temperature
• Categorical Data: data divided into categories, either nominal or ordinal. Examples: Gender (Male/Female), Customer Type (New/Returning)
• Ordinal Data: categorical data with a meaningful order/ranking. Example: Satisfaction Level (Low, Medium, High)
• Date-Time Data: data related to time, useful for trend analysis. Examples: Sales Date, Transaction Time
• Boolean Data: binary values representing True/False or Yes/No. Examples: Is_Defaulted (Yes/No), Purchased (0/1)
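In pandas these types can be inspected with dtypes and selected with select_dtypes. A sketch with one illustrative column per type from the table above (all values are made up):

```python
import pandas as pd

# Illustrative frame: one column per data type from the table
df = pd.DataFrame({
    "Revenue": [120.5, 99.9],                                    # numerical (continuous)
    "Customer_Type": ["New", "Returning"],                       # categorical (nominal)
    "Satisfaction": pd.Categorical(
        ["Low", "High"],
        categories=["Low", "Medium", "High"], ordered=True),     # ordinal
    "Sales_Date": pd.to_datetime(["2024-01-05", "2024-01-06"]),  # date-time
    "Purchased": [True, False],                                  # boolean
})

print(df.dtypes)
print(df.select_dtypes("number").columns.tolist())  # numerical columns only
print(df.select_dtypes("bool").columns.tolist())    # boolean columns only
```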
Python
Data Visualization
Importing the Python Libraries
•NumPy
•Pandas
•Matplotlib and
•Seaborn.
Dataset – EDA-1-Black_Friday_3
This dataset comprises sales transactions captured at
a retail store. It is a classic dataset for exploring and
expanding your feature-engineering skills, drawing on
everyday shopping experiences.
Import Python Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
#to ignore warnings
import warnings
warnings.filterwarnings('ignore')
#Step-1-Importing Dataset
import pandas as pd
BF=pd.read_excel(r"D:\DataSet2024\Black_Friday_3.xlsx")
print(BF)
#Step-2-Display the Variable Names and their Data Types (Meta Data)
#Meta Data: data that describes other data; structured reference data that helps to sort and identify attributes of the information it describes.
BF.info()
#Step-3-Count the Number of Non-Missing
Values for each Variable
BF.count()
#Step-4-Descriptive Statistics
BF.describe()
#Step-5-Inclusion of categorical variables
BF.describe(include='all')
#Step-6-Handling Duplicates
This involves 2 steps: detecting duplicates and removing duplicates.
The pandas.DataFrame.duplicated() method is used to find duplicate rows in a DataFrame. It returns a boolean Series that identifies whether a row is duplicate or unique.
BF.duplicated()
#Step-7-To remove the duplicates(if any)
BF.drop_duplicates()
#BF.drop_duplicates(subset='User_ID')
By default this keeps just the first occurrence of each duplicated value in
the User_ID variable and drops the rest.
We do not want to remove the duplicate values from the User_ID variable permanently,
so we only view the output without making any permanent change to our data frame:
BF.drop_duplicates(subset='User_ID', inplace=False)
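drop_duplicates also accepts a keep argument that controls which occurrence survives. A sketch on a tiny made-up frame (not the Black Friday data):

```python
import pandas as pd

# Hypothetical frame: User_ID 1 appears twice
toy = pd.DataFrame({"User_ID": [1, 1, 2], "Purchase": [100, 150, 200]})

print(toy.drop_duplicates(subset="User_ID"))               # keep='first' (default)
print(toy.drop_duplicates(subset="User_ID", keep="last"))  # keep the last occurrence
print(toy.drop_duplicates(subset="User_ID", keep=False))   # drop every duplicated ID
```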
Handling Outliers
Outliers are the extreme values on the low and the high
side of the data.
Handling Outliers involves 2 steps: Detecting outliers
and Treatment of outliers.
Detecting Outliers
Consider any variable from the data frame and determine
the upper cutoff and the lower cutoff with the help of
any of these 3 methods:
• Percentile Method
• IQR Method
• Standard Deviation Method
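A sketch of all three cutoff methods side by side on simulated data (the sample and the chosen percentile thresholds are illustrative, not from the Black Friday dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=100, scale=15, size=1_000)  # simulated variable

# Percentile method: flag values outside chosen percentiles (e.g. 1st/99th)
lc_pct, uc_pct = np.percentile(x, [1, 99])

# IQR method: 1.5 * IQR beyond the quartiles
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
lc_iqr, uc_iqr = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Standard deviation method: mean +/- 3 standard deviations
mu, sigma = x.mean(), x.std()
lc_sd, uc_sd = mu - 3 * sigma, mu + 3 * sigma

print("percentile:", lc_pct, uc_pct)
print("IQR:       ", lc_iqr, uc_iqr)
print("std dev:   ", lc_sd, uc_sd)
```

On roughly normal data the three methods give progressively wider fences: percentile (as chosen here), then 1.5 × IQR, then 3σ.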
IQR Method of Outlier Detection
(Figure: a box plot marking minimum, Q1, median, Q3, and maximum, where minimum and maximum are the smallest and largest values in the dataset.)
Determining if there are any outliers in the data
set using the IQR(Interquartile range) Method.
Finding the minimum(p0), maximum(p100), first
quartile(q1), second quartile(q2), the third quartile(q3), and
the iqr(interquartile range) of the values in the Purchase
variable.
#Step-8-IQR(Interquartile range) Method -
Execute
p0=BF.Purchase.min()
p100=BF.Purchase.max()
q1=BF.Purchase.quantile(0.25)
q2=BF.Purchase.quantile(0.5)
q3=BF.Purchase.quantile(0.75)
iqr=q3-q1
Using the Interquartile Rule to Find Outliers
1. Calculate the interquartile range for the data
2. Multiply the interquartile range (IQR) by 1.5 (a constant used to
discern outliers)
3. Add 1.5 x (IQR) to the third quartile. Any number greater than this
is a suspected outlier
4. Subtract 1.5 x (IQR) from the first quartile. Any number less than
this is a suspected outlier
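Applied to a tiny made-up sample, the four steps of the interquartile rule look like this:

```python
import numpy as np

data = [10, 12, 13, 14, 15, 16, 18, 45]   # 45 is a suspicious value

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                 # step 1: interquartile range
fence = 1.5 * iqr             # step 2: multiply IQR by 1.5
upper = q3 + fence            # step 3: anything above is a suspected outlier
lower = q1 - fence            # step 4: anything below is a suspected outlier

outliers = [v for v in data if v < lower or v > upper]
print(q1, q3, iqr, lower, upper, outliers)
```

With this sample q1 = 12.75, q3 = 16.5, so the fences sit at 7.125 and 22.125 and only 45 is flagged.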
Why use 1.5 in the IQR rule?
• About 68.26% of the data lies within one standard deviation (σ) of the mean (μ), taking both sides into account (the pink region in the figure).
• About 95.44% of the data lies within two standard deviations (2σ) of the mean (the pink+blue region).
• About 99.72% of the data lies within three standard deviations (3σ) of the mean (the pink+blue+green region).
• The remaining 0.28% of the data lies outside three standard deviations (>3σ) of the mean (the small red regions), and this part of the data is considered outliers.
• The first and third quartiles, Q1 and Q3, lie at -0.675σ and +0.675σ from the mean, respectively.
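Those quartile positions explain the 1.5 constant: with Q1 and Q3 at ∓0.675σ the IQR is about 1.35σ, so Q3 + 1.5 × IQR lands near 2.7σ, close to the 3σ boundary. A quick numerical check with the standard normal distribution (standard library only):

```python
from statistics import NormalDist

nd = NormalDist()                 # standard normal: mu=0, sigma=1
q1 = nd.inv_cdf(0.25)             # about -0.6745 sigma
q3 = nd.inv_cdf(0.75)             # about +0.6745 sigma
iqr = q3 - q1                     # about 1.349 sigma
upper = q3 + 1.5 * iqr            # about 2.698 sigma

# Fraction of a normal distribution outside the 1.5*IQR fences (both tails)
frac_outside = 2 * (1 - nd.cdf(upper))
print(round(q3, 4), round(upper, 3), round(frac_outside * 100, 2), "%")
```

So for normally distributed data, roughly 0.7% of observations fall outside the 1.5 × IQR fences.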
#Step-9-To find the lower cutoff(lc) and the upper
cutoff(uc) of the values
lc = q1 - 1.5*iqr
uc = q3 + 1.5*iqr
#Step-10-Execute
lc
uc
Outlier - Condition
If lc < p0 → There are NO Outliers on the lower side
If uc > p100 → There are NO Outliers on the higher side
#Step-11-Execute
print( "p0 = " , p0 ,", p100 = " , p100 ,", lc = " , lc ,", uc = " , uc)
Output
p0 = 12 , p100 = 23961 , lc = -3523.5 , uc = 21400.5
Clearly lc < p0 so there are no outliers on the lower side. But uc <
p100 so there are outliers on the higher side.
#Step-12-Pictorial representation of the
outlier by drawing the box plot
BF.Purchase.plot(kind='box')
Outlier Treatment
• Clip the values instead of removing them from the
variable.
• During this process, replace values that are outside
the range with the lower or upper cutoff, as
appropriate.
• Once the outliers are clipped, all of the data lies within
the range.
#Step-13-Clipping all values greater than the
upper cutoff to the upper cutoff :
BF.Purchase.clip(upper=uc)
#Step-14-To finally treat the outliers and
make the changes permanent
BF.Purchase.clip(upper=uc,inplace=True)
BF.Purchase.plot(kind='box')
#Step-15-Handling Missing Values
#Detecting the Missing Values
#pandas.isna(obj): detect missing values for an array-like object. This function takes a scalar or array-like object and indicates whether values are missing (NaN in numeric arrays, None or NaN in object arrays, NaT in datetimelike).
BF.isna()
#BF.isna() returns True for the missing values and False for the non-missing values.
#Step-16-To find out the percentage of
missing values in each variable
BF.isna().sum()/BF.shape[0]
Missing Value Treatment
To treat the missing values we can opt for a method from
the following :
• Drop the variable
• Drop the observation(s)
• Missing Value Imputation
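A sketch of the three options on a small made-up frame (the column names and values are illustrative):

```python
import pandas as pd
import numpy as np

# Hypothetical frame: A has one missing value, B has three
df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [np.nan, np.nan, np.nan, 1.0],
})

drop_var = df.drop(columns=["B"])              # option 1: drop the variable
drop_obs = df.dropna()                         # option 2: drop the observation(s)
imputed = df.fillna({"A": df["A"].median(),    # option 3: impute (median for A,
                     "B": df["B"].mode()[0]})  #           mode for B)

print(drop_var.shape, drop_obs.shape)
print(imputed)
```

Which option is appropriate depends on how much data each one discards, which is exactly the reasoning applied to Product_Category_2 and Product_Category_3 below.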
Missing Value Treatment
• 31.56% of the values for the variable Product Category 2 are
missing.
• We should not discard such a large number of observations or the
variable itself. Therefore, we will use imputation.
• In this process, missing data is imputed by replacing the missing
values with an appropriate value, which could be a constant, mean,
median, mode, or a predictive model output.
• Since Product Category 2 is a categorical variable, we will impute
the missing values using the mode.
#Step-17-Missing Value Treatment
#Execute 1st
BF.Product_Category_2.mode()[0]
BF.Product_Category_2.fillna(BF.Product_Category_2.mode()[0], inplace=True)
# Execute 2nd
BF.isna().sum() #BF.isna().sum()/BF.shape[0]
#Step-17A-Missing Value Treatment
For the variable Product_Category_3, 69.67% of the
values are missing, which is a significant proportion.
Therefore, we will drop this variable.
BF.dropna(axis=1,inplace=True)
#Step-18-Missing Value Treatment
How to check?
BF.dtypes
Univariate Analysis
In this type of analysis, charts are plotted for a single
variable. These charts help visualize how the data is
distributed and structured based on the variable type
(categorical or numerical).
For continuous variables, we use box plots and histograms
to examine the data distribution.
Syntax Examples (declared file name = Train; e.g. Variable_Name = Purchase):
1. Train.Variable_Name.hist()          # e.g. Train.Purchase.hist()
   plt.show()
2. Train.groupby('Variable_Name').Variable_Name.count().plot(kind='pie')   # or kind='barh'
   plt.show()
3. sns.countplot(x='Variable_Name', data=Train)
   plt.show()
#Step– 19- Distribution of Purchase
# Histogram
import matplotlib.pyplot as plt
BF.Purchase.hist()
plt.show()
For Categorical Variables
• To analyze the distribution (Spread) of categorical
variables, we use frequency plots such as bar charts
and horizontal bar charts.
• To understand the composition (Arrangement) of data,
we use pie charts.
#Step-20-Composition of Gender
import matplotlib.pyplot as plt
print(BF['Gender'].value_counts()) # Print the counts
# Plot the pie chart
BF['Gender'].value_counts().plot(kind='pie', autopct='%1.1f%%')
# Save as PNG the figure before showing it
plt.savefig("gender_pie_chart.png", dpi=300, bbox_inches='tight')
plt.show()
Gender
M 414259
F 135809
Name: count, dtype: int64
#Step-21-Composition of Gender
BF.groupby('Gender').Gender.count().plot(kind='pie')
plt.show()
#Step-22-Distribution of Marital_Status
import seaborn as sns
sns.countplot(x='Marital_Status',data=BF)
plt.show()
#Step-22A-Distribution of Marital_Status
sns.countplot(x='Marital_Status', data=BF,
hue='Gender', palette='coolwarm')
plt.show()
#Step-22B-Distribution of Marital_Status
import seaborn as sns
import matplotlib.pyplot as plt
# Create count plot with hue for Gender
ax = sns.countplot(x='Marital_Status', data=BF, hue='Gender', palette='viridis')
# Display counts on top of bars
for bar in ax.containers:
    ax.bar_label(bar)
plt.show()
palette_options = [
    'viridis', 'magma', 'plasma', 'inferno',                    # Scientific color maps
    'coolwarm', 'RdYlBu', 'Spectral',                           # Diverging color maps
    'Blues', 'Reds', 'Greens', 'Purples',                       # Single-tone color maps
    'pastel', 'deep', 'muted', 'bright', 'dark', 'colorblind'   # Seaborn-themed palettes
]
print(palette_options)
#Step-23-Composition of City_Category
BF.groupby('City_Category').City_Category.count().plot(kind='pie')
plt.show()
#Step-23A-Composition of City_Category
import matplotlib.pyplot as plt
# Plot the pie chart with numbers
BF.groupby('City_Category').City_Category.count().plot(kind='pie', autopct='%1.1f%%')
plt.ylabel('') # Removes the default y-axis label
plt.title("City Category Distribution")
plt.show()
#Step-24-Distribution of Age
sns.countplot(x='Age',data=BF)
plt.show()
import seaborn as sns
import matplotlib.pyplot as plt
# Create the count plot with a color palette
ax = sns.countplot(x='Age', data=BF, palette='viridis')  # Change palette if needed
# Add numbers on top of each bar
for bar in ax.containers:
    ax.bar_label(bar)
plt.title("Age Distribution")
plt.show()
#Step-25-Composition of Stay_In_Current_City_Years
import matplotlib.pyplot as plt
BF.groupby('Stay_In_Current_City_Years').City_Category.count().plot(kind='pie', autopct='%1.1f%%')
plt.show()
#Step-26-Distribution of Occupation
Chart = sns.countplot(x='Occupation', data=BF)
for container in Chart.containers:
    Chart.bar_label(container)
plt.show()
#Step-27-Distribution of Occupation
sns.countplot(x='Occupation',data=BF)
plt.show()
#Step-28-Distribution of Product_Category_1
BF.groupby('Product_Category_1').Product_Category_1.count().plot(kind='barh')
plt.show()
#Step-29-Histogram - Multiple
df = pd.DataFrame(BF, columns=['Purchase', 'Product_Category_1', 'Occupation'])
df.diff().hist(bins=15)   # .diff() plots histograms of row-to-row differences
#Step-30-Histogram with grid
BF.Purchase.plot(kind='hist' , grid = True)
plt.show()
Bivariate Analysis
In this type of analysis, we consider two variables at a
time and create charts based on them. Since there are
two types of variables - categorical and numerical-
bivariate analysis can have three cases
1. Numerical & Numerical
2. Numerical & Categorical
3. Categorical & Categorical
1.Numerical & Numerical
To examine the relationship between two
variables, we create scatter plots and a correlation
matrix with a heatmap overlay
Scatter Plot
Our dataset contains only one numerical variable, so we
cannot create a scatter plot.
How can we address this?
Consider a scenario where we treat all variables with
data types of int or float as numerical variables.
#Step-31-Considering 2 categorical (numerically coded) variables Product_Category_1 and Product_Category_2
BF.plot(x='Product_Category_1', y='Product_Category_2', kind='scatter')
plt.show()
#Step-32-Considering 2 categorical variables Product_Category_1 and Product_Category_2
plt.scatter(x=BF.Product_Category_1, y=BF.Product_Category_2)
plt.show()
#Step-33-Correlation Matrix: Finding the correlation among all numerical variables
BF.drop('User_ID', axis=1, inplace=True)
BF.select_dtypes(['float64', 'int64']).corr()
                     Occupation  Marital_Status  Product_Category_1  Product_Category_2  Purchase
Occupation             1.000000        0.024280           -0.007618            0.001566  0.020853
Marital_Status         0.024280        1.000000            0.019888            0.010260 -0.000599
Product_Category_1    -0.007618        0.019888            1.000000            0.279247 -0.347413
Product_Category_2     0.001566        0.010260            0.279247            1.000000 -0.131104
Purchase               0.020853       -0.000599           -0.347413           -0.131104  1.000000
#Step-34-Correlation: Finding the correlation between two numeric variables
BF['Marital_Status'].corr(BF['Occupation'])
#Step-35-Heatmap
Creating a heatmap using Seaborn on the
correlation matrix helps visualize the
relationships between numerical columns in the
dataset.
!pip install seaborn --upgrade
sns.heatmap(BF.select_dtypes(['float64', 'int64']).corr(), annot=True)
plt.show()
#annot: If True, write the data value in each cell.
2.Numerical & Categorical
•To analyze the composition of the data, create bar
and line charts
•To compare two variables, create bar and line charts
#Step-36-Comparison between Purchase and Occupation: Bar Chart
BF.groupby('Occupation').Purchase.sum().plot(kind='bar')
plt.show()
#Step-37-Comparison between Purchase and Occupation: Bar Chart
summary = BF.groupby('Occupation').Purchase.sum()
sns.barplot(x=summary.index, y=summary.values)
plt.show()
#Step-38-Comparison between Purchase and Age: Line Chart
BF.groupby('Age').Purchase.sum().plot(kind='line')
plt.show()
#Step-39-Comparison between Purchase and City_Category: Area Chart
BF.groupby('City_Category').Purchase.sum().plot(kind='area')
plt.show()
BF.groupby('City_Category').Purchase.sum().plot(kind='area', color='skyblue')
plt.show()
Replace 'skyblue' with any preferred color (e.g., 'red', 'green', '#FFA07A', etc.).
#Step-40-Comparison between Purchase and Marital_Status
sns.boxplot(x='Marital_Status',y='Purchase',data=BF)
plt.show()
Generate the Box Plot & Display Numerical Summary
# Load the Black Friday dataset
BF=pd.read_excel(r"D:\Mallie IV\AcademicYear2025_26\Black_Friday_3.xlsx")
# Create the box plot
plt.figure(figsize=(8, 5))
ax = sns.boxplot(x='Marital_Status', y='Purchase', data=BF)
# Calculate and display descriptive statistics
stats = BF.groupby('Marital_Status')['Purchase'].describe()
print(stats) # Print numerical summary of the box plot
# Adding median values to the plot
medians = BF.groupby('Marital_Status')['Purchase'].median()
for i, median in enumerate(medians):
    plt.text(i, median, f'{median:.0f}', horizontalalignment='center', fontsize=12, color='black', fontweight='bold')
plt.title("Box Plot of Marital Status vs. Purchase (Black Friday Data)")
plt.xlabel("Marital Status (0 = Single, 1 = Married)")
plt.ylabel("Purchase Amount")
plt.show()
Numerical Summary (from describe())
Marital Status | Count   | Mean   | Median (Q2) | Q1 (25%) | Q3 (75%) | Min | Max     | Std Dev
Single (0)     | 324,731 | ₹9,266 | ₹8,044      | ₹5,605   | ₹12,061  | ₹12 | ₹23,961 | ₹5,027
Married (1)    | 225,337 | ₹9,261 | ₹8,051      | ₹5,843   | ₹12,042  | ₹12 | ₹23,961 | ₹5,017
How to Read the Box Plot?
• Median (Q2, 50th percentile): the horizontal line inside each box. Very close for both groups (₹8,044 vs. ₹8,051), so there is no significant difference in typical spending.
• Interquartile range (IQR: Q3 - Q1): the height of the box, from Q1 to Q3. Singles: ₹5,605 - ₹12,061; married: ₹5,843 - ₹12,042, a similar spending range.
• Whiskers (min & max, excluding outliers): the lines extending from the box. Both groups have identical min (₹12) and max (₹23,961).
• Outliers: dots outside the whiskers. A few high-value purchases exist in both groups, but they are not significantly different.
• Box height (variability in data): almost equal spread, meaning both groups have similar purchase behavior.
Key Insights
•Spending patterns between singles and married individuals are almost identical.
•The mean, median, and IQR values are very close for both groups.
•Both groups have similar spending variations (standard deviation ≈ ₹5,000).
•The max purchase amount is the same for both groups (₹23,961).
•This suggests that high spenders exist equally in both categories.
•The IQR range is slightly wider for singles (₹5,605 - ₹12,061) vs. married (₹5,843 - ₹12,042).
•This means that married individuals have slightly more consistent spending
behavior, but the difference is minor.
•There is no strong indication that marital status influences purchase behavior
significantly.
•Since both groups have nearly identical distributions, other factors (like Age,
Occupation, or City Category) may influence spending more than marital status.
Conclusion
• The box plot visually confirms that there is no major
difference in spending behavior between singles and
married individuals.
• We should focus on other variables (such as Age,
Product Category, or Income) to find stronger
patterns.
How to Interpret the Box Plot?
Each box represents the distribution of purchases for a specific
marital status.
•Key elements of the box plot:
•The box → Shows the middle 50% of data (Interquartile Range,
IQR).
•The line inside the box → Represents the median purchase
amount.
•Whiskers → Indicate the range of most values, excluding outliers.
•Dots outside the whiskers → Represent outliers, meaning
unusually high or low purchases.
How to get rupee symbol?
import seaborn as sns
import matplotlib.pyplot as plt
# Simple box plot
sns.boxplot(x=["Single", "Single", "Married", "Married"], y=[8000, 12000, 15000, 18000])
# Use Unicode for ₹ symbol
plt.ylabel("\u20B9 Purchase")
plt.show()
3.Categorical & Categorical
To analyze the relationship between two
variables, create a crosstab and overlay a
heatmap
#Step-41-Relationship Between Age and Gender: Creating a crosstab to display data for Age and Gender
pd.crosstab(BF.Age,BF.Gender)
Gender      F       M
Age
0-17      5083   10019
18-25    24628   75032
26-35    50752  168835
36-45    27170   82843
46-50    13199   32502
51-55     9894   28607
55+       5083   16421
#Step-42-Heatmap: Creating a heat map on top of the crosstab
sns.heatmap(pd.crosstab(BF.Age,BF.Gender))
plt.show()
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Create and print crosstab values
crosstab_data = pd.crosstab(BF['Age'], BF['Gender'])
print(crosstab_data)
# Create heatmap
sns.heatmap(crosstab_data, cmap="Blues")
plt.show()
Interpretation of Age vs. Gender Heatmap
• Dominant age group: 26-35 (Male: 168,835; Female: 50,752). This group has the highest number of buyers, especially males, suggesting that young working professionals are the biggest shoppers.
• Second highest age group: 18-25 (Male: 75,032; Female: 24,628). Likely college students and young employees; males significantly outnumber females.
• Older age groups (46-55, 55+): fewer buyers than the younger groups. The number of shoppers decreases with age, indicating that middle-aged and senior customers shop less on Black Friday.
• Male vs. female comparison: males dominate all age groups. Across every age category the number of male buyers is consistently higher, showing that males participate more in Black Friday shopping.
• Least active age group: 55+ (Male: 16,421; Female: 5,083). The senior group has the fewest shoppers, possibly due to lower tech adoption or different shopping preferences.
• Balanced gender ratio: most balanced in the 36-45 group (M: 82,843; F: 27,170). While males still outnumber females, the gender gap is slightly less pronounced in this category.
Frequency Distribution
1e8 is standard scientific notation, and here it indicates an
overall scale factor for the y-axis. That is, if there is a 2 on the
y-axis and a 1e8 at the top, the value at 2 actually means
2 * 1e8 = 2e8 = 2 * 10^8 = 200,000,000.
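If the 1e8 offset is unwanted, matplotlib can be asked to print plain tick labels instead. A small sketch (the Agg backend and the output file name are just for scripted use):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["A", "B"], [2e8, 1.5e8])
ax.ticklabel_format(style="plain", axis="y")  # 200000000 instead of 2 with a 1e8 offset
fig.savefig("plain_axis.png", dpi=100, bbox_inches="tight")
```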
Workbook-2-Plotly Library
Dataset : World Happiness Report
Live Dashboards
Step-1
import numpy as np
import pandas as pd
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
Step-2
WHR=pd.read_excel(r"D:\DataSet2024\World_Happiness_R22_4.xlsx")
print(WHR)
WHR.info()
Step-3 – Scatter Diagram
fig = px.scatter(WHR, x="Happiness_Score",
y="GDP_Per_Capita", color='Country')
fig.show()
Step-4 - Scatter Diagram
fig = px.scatter(WHR, x="Happiness_Score",
y="GDP_Per_Capita", color='Rank')
fig.show()
Step-5-Scatter diagram
fig = px.scatter(WHR, x="Happiness_Score",
y="GDP_Per_Capita", color='PerceptionsOfCorruption')
fig.show()
Step-6 – Line chart - Multiple
fig = px.line(WHR, x='Happiness_Score',
y="Country")
fig.show()
Step-7 – Line chart
fig = px.line(WHR, x='Happiness_Score', y=["GDP_Per_Capita", "Social_Support", 'Healthy_Life_Expectancy', 'FreedomOfChoices', "Generosity"])
fig.show()
Step-8 – Line chart
fig = px.line(WHR, x='Happiness_Score', y='Social_Support', color='Country')
fig.show()
Step-9 – Bar Chart
fig = px.bar(WHR, x='Happiness_Score',
y='Country')
fig.show()
Step-10 – Bar chart
fig = px.bar(WHR, x='Country',
y='Happiness_Score')
fig.show()
Step-11 – Bar Chart
fig = px.bar(WHR, x='Country',
y='Happiness_Score',color='Rank')
fig.show()
Step-12 – Pie chart
fig = px.pie(WHR, values='Happiness_Score',
names='Rank', title='Happiness Report')
fig.show()
Step-13-Histogram
fig = px.histogram(WHR,
x="Happiness_Score",title="Happiness Report")
fig.show()
Step-14-Histogram with Colour
fig = go.Figure(data=[go.Histogram(x=WHR.Happiness_Score, xbins=go.histogram.XBins(size=0), marker=go.histogram.Marker(color="green"))])
fig.show()