See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/389064233
Introduction to Data Analytics. Chapter 4. Data Visualisation
Preprint · February 2025
DOI: 10.13140/RG.2.2.17068.58245
1 author:
Francisco Gonzalez-Longatt
Loughborough University
All content following this page was uploaded by Francisco Gonzalez-Longatt on 17 February 2025.
Chapter 4
Data Visualisation
F. Gonzalez-Longatt
4.1 Data Visualisations
One of the most valuable exercises when exploring a new dataset is to plot the data. Data
visualisation is the graphical representation of information and data. Using visual elements like charts, graphs,
and maps, data visualisation tools provide an accessible way to see and understand trends, outliers, and patterns
in data.
The benefits of data visualisation include:
• Improved comprehension: Visuals make complex data easier to understand.
• Quick insights: Patterns and trends become apparent quickly.
• Better communication: Visuals can convey information more effectively than text alone.
• Enhanced decision-making: Clear visuals can support better analysis and informed decisions.
4.2 Bar Charts
A bar chart, also known as a bar graph, is a type of graphical representation used to display categorical data.
It is used to display and compare the frequency, count, or any other value of different categories. It consists of
rectangular bars, where the length of each bar is proportional to the value it represents (see Figure 4.1).
Figure 4.1. Bar charts are good for summarising data within categories. This shows the installed
electricity capacity in GB for the period 1996-2018 ¹.
1 Source: https://assets.publishing.service.gov.uk/media/5e184c1de5274a06bba64e37/UK_Energy_in_Brief_2019.pdf
Introduction to Data Analytics
Example: Create a grouped bar plot for three groups of data, Group1 = [1, 1.5, 7], Group2 = [2, 2.3, 4],
Group3 = [9, 4, 5], each holding one value per category. Write Python code to create the bar plot using matplotlib ².
Solution:
import numpy as np
import matplotlib.pyplot as plt

barWidth = 0.25
# plt.subplots: Creates a figure and a grid of subplots with a single call.
# It returns a (figure, axes) pair, so both are unpacked here.
fig, ax = plt.subplots(figsize=(12, 8))

# Create a dataset
Group1 = [1, 1.5, 7]
Group2 = [2, 2.3, 4]
Group3 = [9, 4, 5]

# x positions of the bars: each group is shifted by one bar width
br1 = np.arange(len(Group1))
br2 = [x + barWidth for x in br1]
br3 = [x + barWidth for x in br2]

# matplotlib.pyplot.bar: The bars are positioned at x with the given alignment.
# Their dimensions are given by height and width. The vertical
# baseline is bottom (default 0).
plt.bar(br1, Group1, color='r', width=barWidth,
        edgecolor='grey', label='Group 1')
plt.bar(br2, Group2, color='g', width=barWidth,
        edgecolor='grey', label='Group 2')
plt.bar(br3, Group3, color='b', width=barWidth,
        edgecolor='grey', label='Group 3')
plt.xlabel('Category', fontweight='bold', fontsize=15)
plt.ylabel('Value', fontweight='bold', fontsize=15)
# Place each tick label under the middle bar of its category
plt.xticks([r + barWidth for r in range(len(Group1))],
           ['Category 1', 'Category 2', 'Category 3'])
plt.legend()
plt.show()
Source: Data_Visu_Bar_plot.py
Figure 4.2. Bar plot created using Python code.
Example: Create a grouped bar plot for three groups of data, Group1 = [1, 1.5, 7], Group2 = [2, 2.3, 4],
Group3 = [9, 4, 5], each holding one value per category. Write MATLAB code to create the bar plot.
Solution:
% Rows are categories, columns are groups (Group 1, Group 2, Group 3)
data = [1 2 9; 1.5 2.3 4; 7 4 5];
% Create the bar chart
bar(data);
% Add labels and title
xlabel('Category');
ylabel('Values');
title('Grouped Bar Chart');
legend('Group 1', 'Group 2', 'Group 3');
% Set the x-tick labels
set(gca, 'XTickLabel', {'Category 1', 'Category 2', 'Category 3'});
Source: Data_Statistic_Visualisation_v1.mlx
2 https://matplotlib.org/
Dr. F. Gonzalez-Longatt 4.Data_Visualisation_v2.docx Page 2
Figure 4.3. Bar plot created using MATLAB code.
All rights reserved. No part of this publication may be reproduced or distributed in any form without permission of the author. Copyright © 2008-2025. http:www.fglongatt.org
4.3 Pie Charts
A pie chart is a circular statistical graphic that illustrates numerical proportions. It is divided into slices
that represent different categories, with each slice's size proportional to the quantity it represents.
Figure 4.4. Pie charts are good for showing the proportions of data ascribed to different categories.
Note that we do not know anything about the absolute sizes unless told. The total electricity generated in
2017 was 338.2 TWh, while in 2018, it was 332.9 TWh ¹.
Example: Create a pie chart for the three-category data [1, 1.7, 7]. Write Python code to create the pie chart
using matplotlib ³.
import matplotlib.pyplot as plt

# Data
data = [1, 1.7, 7]
labels = ['Category 1', 'Category 2', 'Category 3']
colors = ['#ff9999', '#66b3ff', '#99ff99']

# Create pie chart; autopct prints each slice's share as a percentage
plt.pie(data, labels=labels, colors=colors, autopct='%1.1f%%', startangle=140)

# Add title
plt.title('Pie Chart of Given Data')

# Show plot
plt.axis('equal')  # Equal aspect ratio ensures the pie chart is circular
plt.tight_layout()
plt.show()
Source: Data_Visu_Pie_Chart.py
Figure 4.5. Pie chart created using Python code.
3 https://matplotlib.org/
Example: Create a pie chart for the three-category data [1, 1.7, 7]. Write MATLAB code to create the pie chart
for the data.
data = [1, 1.7, 7];
% Labels for the categories
labels = {'Category 1', 'Category 2', 'Category 3'};
% Create the pie chart
pie(data, labels);
% Add title
title('Pie Chart with Three Categories');
Source: Data_Statistic_Visualisation_v1.mlx
Figure 4.6. Pie chart created using MATLAB code.
4.4 Scatter Charts
A scatter plot is a data visualisation type that typically displays values for two variables as a collection of
points. Each point represents an observation in the dataset, with its position determined by the values of the
two variables.
The scatter plot is a great way to understand whether there is a correlation between variables. Figure 4.7
shows the relationship between temperature and electricity load. However, the form of this relationship is unclear,
and further separation of the points (for example, into night and day) may be useful.
Figure 4.7. Scatter charts of Load (kW) versus Temperature (°C).
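The impression given by a scatter plot can be backed up with a number such as the Pearson correlation coefficient. The following is a minimal sketch, not the analysis behind Figure 4.7: the temperature and load values, the day/night flag `is_day`, and the helper names `pearson`, `split` and `plot_groups` are all assumptions made up for illustration. It computes the correlation for all points and then separately for the day and night points, following the separation suggested above.

```python
import math

# Illustrative values only (assumed, not the data behind Figure 4.7)
temperature = [2.0, 5.0, 8.0, 12.0, 15.0, 18.0, 22.0, 25.0]   # °C
load = [9.5, 6.1, 8.7, 5.2, 7.9, 4.6, 7.1, 4.0]               # kW
is_day = [True, False, True, False, True, False, True, False]  # hypothetical flag


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def split(values, flags, wanted):
    """Keep only the values whose flag equals `wanted`."""
    return [v for v, f in zip(values, flags) if f == wanted]


# Correlation for all points, then for each group on its own
r_all = pearson(temperature, load)
r_day = pearson(split(temperature, is_day, True), split(load, is_day, True))
r_night = pearson(split(temperature, is_day, False), split(load, is_day, False))


def plot_groups():
    """Scatter plot with day and night points drawn in different colours."""
    import matplotlib.pyplot as plt

    for flag, name, colour in [(True, 'Day', 'orange'), (False, 'Night', 'navy')]:
        plt.scatter(split(temperature, is_day, flag), split(load, is_day, flag),
                    color=colour, label=name)
    plt.xlabel('Temperature (°C)')
    plt.ylabel('Load (kW)')
    plt.legend()
    plt.tight_layout()
    plt.show()
```

Calling `plot_groups()` draws the figure. With these made-up values the correlation within each group is much stronger than the correlation over all points, which is the kind of structure that a single undivided scatter plot can hide.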
Example: Create a scatter plot of the data represented by x = [1, 2, 3, 4, 5], y = [2, 4, 1, 8, 7]. Write a Python
code to create a scatter plot for the data using matplotlib.
Solution:
import matplotlib.pyplot as plt
# Data
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 8, 7]
# Create scatter plot
plt.scatter(x, y, color='blue', marker='o')
# Add title and labels
plt.title('Scatter Plot of Given Data')
plt.xlabel('X values')
plt.ylabel('Y values')
# Show plot
plt.tight_layout()
plt.show()
Source: Data_Visu_scatter_plot.py
Figure 4.8. Scatter plot created using Python code.
Example: Create a scatter plot of the data represented by x = [1, 2, 3, 4, 5], y = [2, 4, 1, 8, 7]. Write
MATLAB code to create a scatter plot for the data.
% Example data
x = [1, 2, 3, 4, 5];
y = [2, 4, 1, 8, 7];
% Create the scatter plot
figure;
scatter(x, y, 'filled');
% Add labels and title
xlabel('X-axis Label');
ylabel('Y-axis Label');
title('Scatter Plot Example');
% Customise the appearance (optional)
grid on; % Add grid lines
Source: Data_Statistic_Visualisation_v1.mlx
Figure 4.9. Scatter plot created using MATLAB code.
4.5 Histograms
A histogram is a graphical representation used to display the distribution of a dataset. It consists of rectangular
bars (bins) representing the frequency of data points within specific ranges or intervals.
Histograms can be used to understand the distribution of data. They are especially useful when dealing with
large datasets, since scatter plots can become slow to draw and hard to read when there are many data points.
A normalised histogram is often used as an approximation of the probability density function (PDF) of the data.
Figure 4.10. Histograms of load per hour (kWh).
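A minimal Python sketch of a histogram and its normalised version follows, in the spirit of the earlier examples. The load values are invented for illustration and are not the data of Figure 4.10; the helper names `histogram_counts` and `plot_histograms` are likewise assumptions. The counting is written out by hand to show what a bin is, and the normalised heights are the counts divided by (number of points × bin width), so that the total bar area equals 1. In matplotlib the same scaling is requested with `density=True`.

```python
# Illustrative hourly load values in kWh (assumed, not the data of Figure 4.10)
load = [1.0, 1.3, 1.6, 1.8, 2.2, 2.4, 2.5, 2.7, 2.9, 3.3, 3.6, 5.0]


def histogram_counts(values, n_bins):
    """Count the values falling into n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value belongs to the last bin rather than a new one
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts, width


counts, width = histogram_counts(load, n_bins=4)

# Normalised heights: count / (number of points * bin width)
density = [c / (len(load) * width) for c in counts]

# Total area of the normalised bars
area = sum(d * width for d in density)


def plot_histograms():
    """Draw the count histogram and the normalised histogram side by side."""
    import matplotlib.pyplot as plt

    fig, (ax_count, ax_density) = plt.subplots(1, 2, figsize=(10, 4))
    ax_count.hist(load, bins=4, edgecolor='grey')
    ax_count.set_xlabel('Load per hour (kWh)')
    ax_count.set_ylabel('Frequency')
    ax_density.hist(load, bins=4, density=True, edgecolor='grey')
    ax_density.set_xlabel('Load per hour (kWh)')
    ax_density.set_ylabel('Density')
    plt.tight_layout()
    plt.show()
```

Calling `plot_histograms()` draws both panels. The two histograms have the same shape; only the vertical scale changes, which is why the choice of bin width matters more than the normalisation when judging the distribution.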
4.6 Boxplots
A boxplot, also known as a box-and-whisker plot, is a standardised way of displaying the distribution of data
based on a five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. It
provides a visual representation of the central tendency, spread, and variability of a dataset. Boxplots are useful
for understanding the spread of data and can be grouped into different categories.
Figure 4.11. Boxplot of hourly temperatures grouped by month for London in 2013.
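The five-number summary behind a boxplot can be computed directly. The sketch below is illustrative only: the temperatures are made-up values rather than the London data of Figure 4.11, the names `quantile`, `five_number_summary` and `plot_boxplots` are assumptions, and the quartiles use linear interpolation between ranks, which is one of several common conventions.

```python
# Illustrative temperatures in °C for two months (assumed values)
temperature_by_month = {
    'Jan': [3, 1, 5, 7, 4],
    'Jul': [18, 22, 16, 25, 20],
}


def quantile(sorted_values, q):
    """Quantile by linear interpolation between the two nearest ranks."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    s = sorted(values)
    return (s[0], quantile(s, 0.25), quantile(s, 0.5), quantile(s, 0.75), s[-1])


summaries = {month: five_number_summary(values)
             for month, values in temperature_by_month.items()}


def plot_boxplots():
    """Draw one box per month on a shared axis."""
    import matplotlib.pyplot as plt

    months = list(temperature_by_month)
    fig, ax = plt.subplots()
    ax.boxplot([temperature_by_month[m] for m in months])
    # Boxes are drawn at x = 1, 2, ... by default
    ax.set_xticks(range(1, len(months) + 1))
    ax.set_xticklabels(months)
    ax.set_ylabel('Temperature (°C)')
    plt.show()
```

Calling `plot_boxplots()` draws the grouped boxplot. Note that matplotlib by default extends each whisker to the furthest data point within 1.5 times the interquartile range of the box and draws anything beyond that as an individual point, so the whiskers coincide with the minimum and maximum only when there are no such outliers.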
4.7 Duration Curves
A duration curve is a graphical representation used primarily in hydrology, power systems, and other fields
to show the distribution of a specific variable over time. It displays the relationship between the magnitude of
the variable (e.g., streamflow, power generation) and the percentage of time that the variable is equalled or
exceeded. Load duration curves are essential for visualising load data. Essentially, these are just ordered plots of
the loads, typically throughout the year.
Figure 4.12 shows the load duration curve, which indicates the number of hours in the year during which the load
equals or exceeds a particular value.
Figure 4.12. Load duration curve.
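Because a load duration curve is an ordered plot of the loads, it can be built with a sort. The following is a minimal sketch using eight made-up hourly loads instead of a full year; the values and the helper names `load_duration_curve`, `hours_at_or_above` and `plot_duration_curve` are assumptions for illustration.

```python
# Illustrative hourly loads in kW (assumed values, not the data of Figure 4.12)
hourly_load = [3.2, 4.1, 2.8, 5.0, 4.4, 3.9, 2.5, 4.8]


def load_duration_curve(loads):
    """Return (percentage of time, load) with the loads sorted from highest to lowest."""
    ordered = sorted(loads, reverse=True)
    n = len(ordered)
    # The k-th largest load is equalled or exceeded during k of the n hours
    percent_time = [100.0 * (k + 1) / n for k in range(n)]
    return percent_time, ordered


def hours_at_or_above(loads, threshold):
    """Number of hours during which the load equals or exceeds the threshold."""
    return sum(1 for value in loads if value >= threshold)


percent_time, ordered_load = load_duration_curve(hourly_load)


def plot_duration_curve():
    """Plot load against the percentage of time it is equalled or exceeded."""
    import matplotlib.pyplot as plt

    plt.plot(percent_time, ordered_load, marker='o')
    plt.xlabel('Time load is equalled or exceeded (%)')
    plt.ylabel('Load (kW)')
    plt.title('Load Duration Curve')
    plt.grid(True)
    plt.show()
```

Calling `plot_duration_curve()` draws the curve. Sorting discards the time order of the loads, so the curve shows how often a load level occurs but not when it occurs.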
Table of Contents
4 Data Visualisation
4.1 Data Visualisations
4.2 Bar Charts
4.3 Pie Charts
4.4 Scatter Charts
4.5 Histograms
4.6 Boxplots
4.7 Duration Curves
Repositories
Email
fglongatt@fglongatt.org
GitHub (Project repository)
https://github.com/fglongatt
ResearchGate (Project repository)
https://www.researchgate.net/profile/Francisco_Gonzalez-Longatt
DOI: 10.13140/RG.2.2.10986.84163
Document History
Version Responsible Action Date
0 FGL Creation 09/01/2025
1 FGL 17/02/2025
2 FGL DOI created! 17/02/2025
The documents are created by Francisco M. Gonzalez-Longatt and contain copyrighted material, trademarks, and other proprietary information. All
rights reserved. No part of the documents may be reproduced or copied in any form or by any means - such as graphic, electronic, or mechanical, including
photocopying, taping, or information storage and retrieval systems without the prior written permission of Francisco M. Gonzalez-Longatt. The use of
these documents by you, or anyone else authorised by you, is prohibited unless expressly permitted by Francisco M. Gonzalez-Longatt. You may not alter
or remove any trademark, copyright, or other notice from the documents. The documents are provided “as is”, and Francisco M. Gonzalez-Longatt shall
not have any responsibility or liability whatsoever for the results of the use of the documents by you.