Data visualization
What is data visualization?
Data visualization is the graphical representation of information, data, or both.
Examples:
❖Information : Hierarchy of a family
❖Data : Credit card spends
❖Both : Blood pressure or sugar level of a patient
Why do we require graphs?
❖To tell a story
❖To improve understanding of facts
❖To compare variables
❖To reduce clutter
❖To enable better decision-making
Observation of honey production across years
Year    Honey Production
1998    219519000
1999    202387000
2000    219558000
2001    185748000
2002    171265000
2003    181372000
2004    182729000
2005    173969000
2006    154238000
2007    147621000
[Figure: Data Visualisation - Honey Production. x-axis: Years (1998 to 2007); y-axis: Honey Production (ticks from 0 to 2500)]
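The chart can be reproduced from the honey production figures with a few lines of matplotlib. This is a minimal sketch, not the original slide's code: the chart type (a line chart) is an assumption, and the division by 100,000 is a guess based on the 0 to 2500 tick range of the original figure.

```python
import matplotlib.pyplot as plt

# Values copied from the honey production table.
years = list(range(1998, 2008))
production = [
    219519000, 202387000, 219558000, 185748000, 171265000,
    181372000, 182729000, 173969000, 154238000, 147621000,
]

# Assumption: the original y-axis was scaled to units of 100,000.
scaled = [value / 100_000 for value in production]

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(years, scaled, marker="o")  # assumption: line chart
ax.set_title("Data Visualisation - Honey Production")
ax.set_xlabel("Years")
ax.set_ylabel("Honey Production (x 100,000)")
ax.set_xticks(years)      # one tick per year
ax.set_ylim(0, 2500)      # same range as the original axis ticks
fig.tight_layout()
```

Call `plt.show()` in an interactive session, or `fig.savefig(...)` with a file name of your choice, to see the result. The downward trend after 2000 is much easier to spot in the chart than in the raw numbers.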
Box plot - It helps to identify outliers
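A box plot usually marks a point as an outlier when it lies more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that rule directly. The 1.5 factor is the common convention and is assumed here, not taken from the slides; the function name `iqr_outliers` and the spend values are hypothetical.

```python
import statistics

def iqr_outliers(values, whisker=1.5):
    """Return the values that fall outside the box plot whiskers."""
    # Quartiles with linear interpolation between data points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical monthly credit card spends; 950 is deliberately extreme.
spends = [120, 135, 128, 140, 132, 125, 138, 950, 130, 127]
print(iqr_outliers(spends))  # prints [950]
```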
Data type and plotting options
Analysis      Data type                                      Plotting options
Univariate    One continuous variable                        Histogram, box plot
              One categorical variable                       displot, countplot, pie chart
Bivariate     Two continuous variables                       Scatter plot, point plot, joint plot
              Two categorical variables                      Bar plot
              One continuous and one categorical variable    Box plot, bar plot
Multivariate  Three continuous variables
              Two continuous and one categorical variable    Scatter plot
              Two categorical and one continuous variable
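The table can also be read as a lookup from variable types to plot suggestions. The sketch below encodes it that way; the names `PLOT_OPTIONS` and `suggest_plots` are hypothetical. Only rows with a clearly assigned plotting option are included, so the other multivariate combinations return an empty list.

```python
# Plot suggestions copied from the table, keyed by
# (number of continuous variables, number of categorical variables).
PLOT_OPTIONS = {
    (1, 0): ["histogram", "box plot"],
    (0, 1): ["displot", "countplot", "pie chart"],
    (2, 0): ["scatter plot", "point plot", "joint plot"],
    (0, 2): ["bar plot"],
    (1, 1): ["box plot", "bar plot"],
    (2, 1): ["scatter plot"],
}

def suggest_plots(n_continuous, n_categorical):
    """Return the plot options listed for this combination, or [] if none is listed."""
    return PLOT_OPTIONS.get((n_continuous, n_categorical), [])

print(suggest_plots(1, 1))  # prints ['box plot', 'bar plot']
```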
Data Science – Life Cycle
Stages: Data Collection, Exploratory Data Analysis (EDA)
Types of analysis:
❖Descriptive Analysis : What is happening?
❖Diagnostic Analysis : Why did it happen?
❖Predictive Analysis : What could happen?
❖Prescriptive Analysis : What should be done?
Data Science – Life Cycle and Parallel Steps/Tasks
Step 1: Data Collection (What information to collect?)
Make observations from the real world and quantify the information into a tabular format.
Ex: Capturing brands, quantity, cost and frequency of purchase of items from a retail store.
Parallel step:
• Business Understanding
Step 2: Descriptive Analysis (What has happened?)
Use data aggregation and possibly data mining to provide insight into the past.
Ex: Who is buying our products? Who is buying our competitors' products?
Parallel steps:
• Data Understanding
• Data Preparation
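Descriptive analysis often comes down to aggregating the collected records. The sketch below totals spend per customer segment and brand to answer "who is buying our products?". All records, segment names and brand names are hypothetical.

```python
from collections import defaultdict

# Hypothetical purchase records: (customer_segment, brand, quantity, unit_cost)
purchases = [
    ("students", "BrandA", 2, 50.0),
    ("families", "BrandA", 5, 50.0),
    ("students", "BrandB", 1, 80.0),
    ("families", "BrandB", 3, 80.0),
    ("students", "BrandA", 4, 50.0),
]

def spend_by_segment_and_brand(records):
    """Aggregate total spend for every (segment, brand) pair."""
    totals = defaultdict(float)
    for segment, brand, quantity, unit_cost in records:
        totals[(segment, brand)] += quantity * unit_cost
    return dict(totals)

summary = spend_by_segment_and_brand(purchases)
for (segment, brand), spend in sorted(summary.items()):
    print(f"{segment:10s} {brand:8s} {spend:8.2f}")
```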
Step 3: Diagnostic Analysis (Why did it happen?)
Drill down into the data to identify the root cause of the problem.
Ex: An offer/discount was given on the product, hence the sales volume went up.
Parallel steps:
• Model Building
• Model Validation
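A simple way to drill down is to split the data by a suspected cause and compare the groups. The sketch below compares average weekly sales with and without a discount. The records are hypothetical, and a difference between the groups suggests a cause but does not prove it.

```python
from statistics import mean

# Hypothetical weekly sales records: (week, discount_offered, units_sold)
weekly_sales = [
    (1, False, 100), (2, False, 110), (3, True, 160),
    (4, True, 170), (5, False, 105), (6, True, 165),
]

def mean_units_by_discount(records):
    """Average units sold, grouped by whether a discount was offered."""
    groups = {True: [], False: []}
    for _week, discounted, units in records:
        groups[discounted].append(units)
    return {flag: mean(units) for flag, units in groups.items()}

averages = mean_units_by_discount(weekly_sales)
uplift = averages[True] / averages[False] - 1
print(f"With discount: {averages[True]:.1f}, "
      f"without: {averages[False]:.1f}, uplift: {uplift:.0%}")
```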
Step 4: Predictive Analysis (What could happen?)
Use statistical and other forecasting techniques, including data mining and machine learning, to understand the future.
Ex: Using regression, you estimate that the price of the product should be between Rs. 100 and Rs. 125.
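As an illustration of the regression idea, the sketch below fits a straight line to price and sales observations by ordinary least squares and uses it to predict sales at a new price. The observations are hypothetical, and a single-variable linear model is assumed only for simplicity.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: price in Rs. and units sold at that price.
prices = [100, 105, 110, 115, 120, 125]
units_sold = [500, 480, 455, 440, 415, 400]

slope, intercept = fit_line(prices, units_sold)
predicted_at_130 = slope * 130 + intercept
print(f"Each extra rupee changes sales by about {slope:.2f} units")
print(f"Predicted units at Rs. 130: {predicted_at_130:.0f}")
```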
Step 5: Prescriptive Analysis (What should we do?)
Use optimization and simulation methods to make decisions and describe possible outcomes.
Ex: We recommend reducing the price below Rs. 125 to increase sales by 5%.
Parallel steps:
• Recommending an output
• Implementation
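A recommendation like this can be produced by searching over candidate decisions with a predictive model. The sketch below uses a plain search over candidate prices rather than a full optimization method, and returns the highest price whose predicted sales reach a target uplift. The demand model and its coefficients are hypothetical.

```python
def predicted_units(price, slope=-4.0, intercept=900.0):
    """Hypothetical linear demand model: sales fall as price rises."""
    return intercept + slope * price

def highest_price_meeting_target(current_price, uplift_percent, candidates):
    """Highest candidate price whose predicted sales reach the target uplift."""
    target = predicted_units(current_price) * (100 + uplift_percent) / 100
    feasible = [p for p in candidates if predicted_units(p) >= target]
    return max(feasible) if feasible else None

candidate_prices = range(100, 126)  # Rs. 100 to Rs. 125 in steps of Rs. 1
recommended = highest_price_meeting_target(125, 5, candidate_prices)
print(f"Recommended price: Rs. {recommended}")  # prints Recommended price: Rs. 120
```

Under these assumed coefficients, sales at Rs. 125 are 400 units, so a 5% increase needs 420 units, which the model predicts at Rs. 120 or below.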