M2 - Visualization of Categorical and Numerical Data

Uploaded by

krishnabadhe20

Data Visualization

Visualization of Categorical and Numerical Data


Table of Contents
Introduction
1. Visual Analysis of Statistical Data
1.1. Key Measures Computed for Statistical Data
1.2. Examining the Data
2. Data Visualization - Variation within Categorical Measures
2.1. Bar Graphs
2.2. Specialized Bar Graphs
2.3. Treemaps
3. Variation within Numerical Measures
3.1. Histograms
Summary

Introduction
This topic gives an overview of why visual analysis techniques are needed for examining
statistical data. These techniques are used to understand the variation within categorical
and numerical data measures, and the relationships between these measures.

This topic covers the visualization methods used for graphically examining the variation
within categorical and numerical data measures. The remaining methods, which visualize
variation through time and across space and the relationships between measures, are
covered in the subsequent topic.

Learning Objectives
Upon completion of this topic, you will be able to:
• Explain the importance of visual analysis for statistical data.
• Describe the techniques available for visualizing variation within categorical and numerical
data.

1. Visual Analysis of Statistical Data

In the data analysis process, it is essential to understand the data thoroughly before
applying models to extract insights. However, the huge volume of data available in
today’s enterprises makes it challenging to get sufficient clarity about the data.

Hence, the approach is to move top-down, first arriving at an initial set of
characteristics which describe the data at an overall level. This is achieved by using data
visualization methods on sample data to unearth critical facts and trends.

1.1. Key Measures Computed for Statistical Data

To analyze the datasets, the following statistical measures are computed for key variables:
a. Location: The mean, median and mode
b. Variability: Percentiles, variance and standard deviation
c. Shape: Skew and kurtosis
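
As a minimal sketch, the measures listed above can be computed with Python's standard `statistics` module. The sample values and the `describe` helper are hypothetical, and skew and kurtosis are computed here as population moment ratios, which is only one of several conventions in use.

```python
import statistics

def describe(values):
    """Return location, variability and shape measures for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    # Third and fourth central moments (population versions).
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        # Location
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Variability (quartiles are the 25th, 50th and 75th percentiles)
        "quartiles": statistics.quantiles(values, n=4),
        "variance": statistics.pvariance(values),
        "std_dev": sd,
        # Shape
        "skew": m3 / sd ** 3,
        "kurtosis": m4 / sd ** 4,
    }

# Hypothetical sample, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)
for name, value in summary.items():
    print(name, "=", value)
```

A positive skew, as in this sample, indicates a longer tail towards the higher values.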

Visualization techniques are especially useful for identifying outliers in distributions and
checking for associations between variables. Visualizing data distributions also often
reveals very different behavior in datasets which have identical location or variability
measures, as shown in Figure 1.1.

Figure 1.1. - Anscombe quartet
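
The same effect can be sketched with two small datasets. These are hypothetical values chosen for illustration, not Anscombe's data: both have the same mean and variance, yet a simple text plot shows that one has a single peak and the other has two.

```python
import statistics

# Two hypothetical datasets, chosen so that mean and variance match exactly.
unimodal = [2, 4, 4, 4, 5, 5, 7, 9]
bimodal = [3, 3, 3, 3, 7, 7, 7, 7]

def text_histogram(values):
    """Return one line per distinct value, with one '#' per occurrence."""
    return [f"{v:>2} | {'#' * values.count(v)}" for v in sorted(set(values))]

for name, data in (("unimodal", unimodal), ("bimodal", bimodal)):
    print(name, "mean =", statistics.fmean(data),
          "variance =", statistics.pvariance(data))
    print("\n".join(text_histogram(data)))
```

The summary numbers printed for the two datasets are identical; only the plotted shape tells them apart.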

1.2. Examining the Data
To understand the data in a dataset, both the categorical and numerical (or quantitative)
measures associated with it need to be examined.

Categorical measures take values that name a category. Typical examples of such data are
product, country, customer and territory. Numerical measures can be measured quantitatively.
Examples are profit, revenue, expense and blood pressure.

Statistical meaning is found both in the variations within the categorical and numerical
measures and in the relationships among them. The following table summarizes the
different types of variations and relationships.

Type of variation or relationship            Question it answers (type of analysis)
Variation within categorical measures        How do items in the categories relate to each other? (Ranking, Part-to-whole)
Variation within numerical measures          How are values in the measure distributed across the range? (Distribution)
Variation through time                       How do values change through time? (Time-series)
Relationship between numerical measures      How do measures relate to one another? (Correlation)
Variation across space                       Where are values located in space relative to one another? (Spatial)
Relationship between categorical measures    How do categories relate to each other, mediated by measures? (Inter-category)

Table 1.1 – Summary table of variations and relationships within and across measures.

2. Data Visualization - Variation within Categorical Measures

This analysis evaluates how data items within a category in the dataset relate to
each other in terms of ranking (for example, highest to lowest) and proportion of the whole
(part-to-whole). The visualization methods used in this analysis are explained below.

2.1. Bar Graphs

Bar graphs are best suited for displaying values subdivided into discrete instances along a
nominal or ordinal scale. The visual weight of the bars places emphasis on the individual
values in the graph and makes it easy to compare values with one another by
simply comparing the heights (or lengths) of the bars.

There are three main types of bar graphs:

a. Horizontal: It is well suited to comparative ranking, such as a top-five list. It is also
preferred when the category labels are long and descriptive, and fitting them along
the axis of a column graph becomes an issue.

Figure 2.1. - Ranking of top five product sales.

Source: datapine.com

b. Vertical/Column: Good for showing chronological data, such as growth over specific
periods, and comparing data across categories.

Figure 2.2. - Comparing product sales across channels and countries

Source: datapine.com

c. Stacked: Useful for handling part-to-whole relationships.

Figure 2.3. - Age-wise distribution of new customers across quarters

Source: datapine.com
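
To illustrate the ranking use of a horizontal bar graph, the sketch below sorts categories from highest to lowest and draws the top five as text bars. The `ranked_bars` helper and the sales figures are hypothetical, and the sketch assumes positive values.

```python
def ranked_bars(values_by_category, top_n=5, width=40):
    """Return text bars for the top_n categories, longest bar first."""
    ranked = sorted(values_by_category.items(),
                    key=lambda item: item[1], reverse=True)[:top_n]
    largest = ranked[0][1]
    label_width = max(len(name) for name, _ in ranked)
    lines = []
    for name, value in ranked:
        # Bar length is proportional to the value; the largest fills the width.
        bar = "#" * round(width * value / largest)
        lines.append(f"{name:<{label_width}} | {bar} {value}")
    return lines

# Hypothetical product sales figures, for illustration only.
sales = {"Tablet": 120, "Laptop": 310, "Phone": 450,
         "Monitor": 95, "Printer": 60, "Camera": 150}
chart = ranked_bars(sales)
print("\n".join(chart))
```

Placing the labels to the left of the bars is what lets a horizontal layout accommodate long category names.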

Two other techniques used to visualize the part-to-whole distribution of items in a category
are the Pie Chart and the Doughnut Chart. The arc length of each sector, and consequently
its central angle and area, is proportional to the quantity it represents.
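
The proportionality can be sketched as a short calculation: each category's share of the total determines its percentage label and its central angle out of 360 degrees. The `pie_sectors` helper and the head counts are hypothetical.

```python
def pie_sectors(values_by_category):
    """Return (category, share_percent, angle_degrees) for each category."""
    total = sum(values_by_category.values())
    return [
        (name, 100 * value / total, 360 * value / total)
        for name, value in values_by_category.items()
    ]

# Hypothetical head counts per commuting mode, for illustration only.
commuters = {"Car": 30, "Metro": 50, "Walk": 20}
sectors = pie_sectors(commuters)
for name, share, angle in sectors:
    print(f"{name}: {share:.0f}% of the whole, {angle:.0f} degrees")
```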
Figure 2.4. - Commuting means by employees in XYZ company (pie chart with sectors for Car, Taxi, Two wheeler, Cycle, Local bus, Metro and Walk)

Figure 2.5. - Doughnut chart of age structure

Source - Devexpress

2.2. Specialized Bar Graphs

a. Bullet graph
The bullet graph is a variation of the bar graph, which depicts a performance measure
along with a comparative value and a qualitative measure to show if performance is
good, bad or intermediate.

Figure 2.6. - Bullet Graph

Source: Wikipedia - Bullet Graph

In Figure 2.6, the dark bar represents the performance measure, the vertical marker
represents the comparative value, and the colour shading represents the qualitative
performance ranges, with the lightest shade denoting the best performance.
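
The three elements of a bullet graph can be sketched as a small data structure: a measured value, a comparative target, and ordered qualitative bands. The `bullet_summary` helper, the band limits and the figures below are hypothetical.

```python
def bullet_summary(value, target, bands):
    """Describe a bullet graph reading.

    bands is a list of (upper_limit, label) pairs in ascending order.
    """
    label = bands[-1][1]  # values above every limit fall in the top band
    for upper_limit, band_label in bands:
        if value <= upper_limit:
            label = band_label
            break
    return {
        "band": label,                    # qualitative range (shading)
        "meets_target": value >= target,  # bar reaches the marker
        "gap_to_target": value - target,
    }

# Hypothetical revenue figures (in thousands) and band limits.
bands = [(150, "poor"), (225, "satisfactory"), (300, "good")]
reading = bullet_summary(value=270, target=250, bands=bands)
print(reading)
```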

b. Pareto Charts
Pareto charts are helpful for depicting part-to-whole relationships. The items are shown as
bars arranged in descending order of value. The line denotes the cumulative value, which
totals 100%, and helps to pinpoint the main contributing parts of the whole.

Figure 2.7. - Pareto chart

Source: pareto-chart.com
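
The data behind a Pareto chart can be sketched in a few lines: sort the items in descending order for the bars, then accumulate a running percentage for the line. The `pareto_table` helper and the defect counts are hypothetical.

```python
from itertools import accumulate

def pareto_table(values_by_category):
    """Return (category, value, cumulative_percent) rows, largest value first."""
    ranked = sorted(values_by_category.items(),
                    key=lambda item: item[1], reverse=True)
    total = sum(value for _, value in ranked)
    running = accumulate(value for _, value in ranked)
    return [(name, value, 100 * cumulative / total)
            for (name, value), cumulative in zip(ranked, running)]

# Hypothetical defect counts by cause, for illustration only.
defects = {"Scratches": 15, "Misalignment": 50, "Cracks": 5, "Discoloration": 30}
rows = pareto_table(defects)
for name, value, cumulative_percent in rows:
    print(f"{name:<14} {value:>3} {cumulative_percent:>6.1f}%")
```

In this made-up example the first two causes account for 80% of the total, which is the kind of reading the cumulative line is meant to support.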

c. Deviation Bar Graphs

These are bar graphs which directly express the variation in value between two points in
time. They are very useful in cases where the focus is exclusively on variation of a value
in time, regardless of ranking or part-to-whole relationships.

Figure 2.8. - Deviation in diseases before and after 4 weeks of sanitation drive
Source: www.cdc.gov
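
As a rough sketch, a deviation bar graph plots only the difference between the two points in time, with decreases drawn on one side of a central axis and increases on the other. The helper names and the case counts below are hypothetical, and integer values are assumed.

```python
def deviations(before, after):
    """Return the change (after - before) for each category present in both."""
    return {name: after[name] - before[name] for name in before if name in after}

def deviation_bars(changes):
    """Draw decreases to the left of the axis and increases to the right."""
    widest = max(abs(change) for change in changes.values())
    lines = []
    for name, change in changes.items():
        left = ("#" * -change if change < 0 else "").rjust(widest)
        right = "#" * change if change > 0 else ""
        lines.append(f"{left}|{right} {name} ({change:+d})")
    return lines

# Hypothetical case counts before and after an intervention.
before = {"Condition A": 12, "Condition B": 5, "Condition C": 7}
after = {"Condition A": 4, "Condition B": 6, "Condition C": 7}
changes = deviations(before, after)
bars = deviation_bars(changes)
print("\n".join(bars))
```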
2.3. Treemaps
When the number of items to be compared in a category exceeds what a bar graph
can handle, treemaps are used. Treemaps are designed to display part-to-whole
relationships.

They use rectangles contained within larger rectangles to represent a hierarchy of up to 3
levels. In addition to rectangle size, we can also use colour to display another attribute.

Figure 2.9. - Proportion of individual country GDP to Top 15 Nation GDP 2011
Source - Satori group
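
The idea of area encoding part-to-whole can be sketched with the simplest treemap layout, often called slice-and-dice: one rectangle is cut into side-by-side strips whose areas are proportional to the values. This is an illustrative assumption, not necessarily the layout used for the figure above (many tools use more elaborate layouts), and the `slice_layout` helper and values are hypothetical.

```python
def slice_layout(values_by_category, x, y, width, height):
    """Split one rectangle into vertical strips with areas proportional to the values.

    Returns {category: (x, y, width, height)}.
    """
    total = sum(values_by_category.values())
    rectangles = {}
    offset = x
    for name, value in values_by_category.items():
        strip_width = width * value / total
        rectangles[name] = (offset, y, strip_width, height)
        offset += strip_width
    return rectangles

# Hypothetical values, laid out in a 100 x 60 canvas.
shares = {"A": 50, "B": 30, "C": 20}
layout = slice_layout(shares, x=0, y=0, width=100, height=60)
for name, rect in layout.items():
    print(name, rect)
```

A second hierarchy level could be drawn by applying the same split inside each strip along the other axis.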

3. Variation within Numerical Measures

When examining the variation within numerical measures, the focus is on understanding
how data is distributed across the range from the lowest to the highest value, in other
words, the data distribution.

3.1. Histograms
Histograms are the most commonly used graphs for summarising distributions and are frequently
used to understand the data. They are useful when there are a large number of observations.

The spread of values is divided into intervals of equal size. Bars are used to display the
number or percentage of values in each interval. The histogram makes it easy to recognize
key characteristics or patterns in the data visually: for example, the highest and lowest
scores, where scores are centred, and whether scores are clustered together or scattered.
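
The binning step can be sketched as follows. The `histogram` helper and the scores are hypothetical, and the sketch assumes the values are not all identical (otherwise the interval width would be zero).

```python
def histogram(values, bins):
    """Count values in `bins` equal-width intervals spanning min to max.

    Returns a list of (lower_edge, upper_edge, count) tuples.
    """
    lowest, highest = min(values), max(values)
    bin_width = (highest - lowest) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last interval, not a new one.
        index = min(int((v - lowest) / bin_width), bins - 1)
        counts[index] += 1
    return [(lowest + i * bin_width, lowest + (i + 1) * bin_width, count)
            for i, count in enumerate(counts)]

# Hypothetical test scores, for illustration only.
scores = [40, 45, 52, 55, 58, 60, 61, 63, 65, 70, 72, 80]
table = histogram(scores, bins=4)
for lower, upper, count in table:
    print(f"{lower:>5.1f} - {upper:>5.1f} | {'#' * count}")
```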

Figure 3.1. - Pattern of frequency of arrivals at park gate

Source - Wikimedia

Figure 3.2. - Bimodal histogram showing 2 peaks in distribution of weights

Source – Minitab

a. Relative frequency histogram

In a relative frequency histogram, the vertical scale is marked with relative frequencies
instead of actual frequencies.
Relative Frequency = Class Frequency/Sum of all Frequencies

Figure 3.3. - Relative frequency percentage of occurrences over the year

Source - The National Severe Storms Laboratory
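
The relative frequency formula above translates directly into code. The class frequencies used here are hypothetical.

```python
def relative_frequencies(class_frequencies):
    """Divide each class frequency by the sum of all frequencies."""
    total = sum(class_frequencies)
    return [frequency / total for frequency in class_frequencies]

# Hypothetical class frequencies, for illustration only.
counts = [2, 3, 4, 3]
relative = relative_frequencies(counts)
print([f"{100 * r:.1f}%" for r in relative])
```

Because the relative frequencies always sum to 1, histograms built from samples of different sizes can be compared on the same vertical scale.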

b. Frequency polygons
These are similar in purpose to a histogram, but use a line to represent the values
instead of bars. Line segments are connected to points located directly above class
midpoint values. The heights of the points correspond to the class frequencies, and the
line segments are extended to the right and left so that the graph begins and ends on
the horizontal axis.

Figure 3.4. - Frequency polygon of bacterial cell lengths

Source - Kean University

Figure 3.5. - Frequency polygon with multiple distributions

Source - onlinestatbook.com
Frequency polygons have two advantages over histograms:
• The shape of the distribution is shown more clearly
• The shapes of multiple distributions can be compared in a single graph
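
The construction of a frequency polygon described above can be sketched by computing its vertices: one point above each class midpoint, plus a zero-height point one class width beyond each end so that the line begins and ends on the horizontal axis. The `frequency_polygon` helper and the class data are hypothetical, and equally spaced class edges are assumed.

```python
def frequency_polygon(class_edges, frequencies):
    """Return the (x, y) vertices of a frequency polygon.

    class_edges has one more entry than frequencies and is equally spaced.
    """
    class_width = class_edges[1] - class_edges[0]
    midpoints = [(low + high) / 2
                 for low, high in zip(class_edges, class_edges[1:])]
    # Add an empty class on each side so the line starts and ends on the axis.
    xs = [midpoints[0] - class_width] + midpoints + [midpoints[-1] + class_width]
    ys = [0] + list(frequencies) + [0]
    return list(zip(xs, ys))

# Hypothetical classes 40-50, 50-60, 60-70, 70-80 and their frequencies.
edges = [40, 50, 60, 70, 80]
points = frequency_polygon(edges, [2, 3, 4, 3])
print(points)
```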

c. Cumulative Frequency Distributions

Frequency polygons are also good choices to display cumulative frequency distributions.
Cumulative frequency for a given class is the sum of the frequencies for that class and
the preceding classes.

Figure 3.6. - Cumulative frequency distribution

Source - sychstat.missouristate.edu
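
The running sum that defines cumulative frequency can be sketched with `itertools.accumulate`. The class frequencies are hypothetical.

```python
from itertools import accumulate

def cumulative_frequencies(frequencies):
    """Return the running total of the class frequencies."""
    return list(accumulate(frequencies))

# Hypothetical class frequencies, for illustration only.
cumulative = cumulative_frequencies([2, 3, 4, 3])
# Expressed as a percentage of all observations.
cumulative_percent = [100 * c / cumulative[-1] for c in cumulative]
print(cumulative, cumulative_percent)
```

The last cumulative value always equals the total number of observations.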
3.2. Box Plots
Invented by John Tukey, box-plots are an excellent tool for comparing multiple distributions.
To draw a box-plot, we need to do the following:
a. Order the data
b. Obtain the minimum and maximum values
c. Obtain the median and the quartiles Q1 and Q3.
d. Draw a line from the minimum to the maximum value
e. Draw a box with its lines drawn at Q1, the median and Q3.
The IQR is the inter-quartile range, i.e. the value of Q3 - Q1. Values which fall outside the
lower and upper inner fences, i.e. below (Q1 - 1.5 IQR) or above (Q3 + 1.5 IQR), are
considered outliers.
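
The steps above can be sketched with the standard `statistics` module. The `box_plot_stats` helper and the data are hypothetical, and the quartiles use the module's "inclusive" method, which is an assumption: other quartile conventions give slightly different values.

```python
import statistics

def box_plot_stats(values):
    """Return the numbers needed to draw a box plot and flag outliers."""
    data = sorted(values)                                # step a
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "min": data[0], "max": data[-1],                 # step b
        "q1": q1, "median": median, "q3": q3,            # step c
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

# Hypothetical measurements with one unusually large value.
stats = box_plot_stats([7, 3, 30, 1, 9, 5, 2, 10, 4, 8, 6])
print(stats)
```

Many plotting tools end the whiskers at the most extreme values inside the fences and draw outliers as separate points, rather than running the line all the way to the minimum and maximum.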

Figure 3.7. - Box-plots

Source - whatissixsigma.net
The box-plot is very useful for displaying the full range of the data, that is, the center, the
spread of values from minimum to maximum, and any outlier values. It is also helpful for
checking whether values are clustered or evenly distributed.

Summary
In the data analysis process, it is important to thoroughly understand the data prior to further
analysis in order to obtain insights. Visualizations help tremendously in this process by
unearthing facts and trends which might not be visible without
graphical means.

Visualization techniques are used to understand the shape of the data distribution. In order
to understand the data in the dataset, both the categorical and numerical (or quantitative)
measures associated with it need to be examined. Statistical meaning is found both in the
variations within, as well as relationships between categorical and numerical measures.

Variation within categorical measures is visualized primarily using bar graphs (horizontal,
column and stacked), pie and doughnut charts, specialized bar graphs (bullet graphs,
Pareto charts and deviation bar graphs) and treemaps.

Variation within numerical measures is visualized using histograms (including relative
frequency histograms), frequency polygons and box-plots.
