67%(3)67% found this document useful (3 votes) 6K views320 pagesData Analysis in Microsoft Excel
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content,
claim it here.
Available Formats
Download as PDF or read online on Scribd
Deliver Awesome Analytics in
3 Easy Steps
Using VLOOKUPS,
Pivot Tables, Charts And More
A | |
i AaraAAALE
\ » |
fa
a
i
|
Exce
Alex HollowayDATA ANALYSIS IN
MICROSOFT EXCEL
DELIVER AWESOME ANALYTICS IN 3
EASY STEPS USING VLOOKUPS, PIVOT
TABLES, CHARTS AND MORE
ALEX HOLLOWAY© Copyright 2023 by Alex Holloway - All rights
reserved.
The content within this book may not be reproduced,
duplicated, or transmitted without direct written
permission from the author or the publisher.
Under no circumstances will any blame or legal
responsibility be held against the publisher, or author,
for any damages, reparation, or monetary loss due to the
information contained within this book, either directly or
indirectly.
Legal Notice:
This book is copyright protected. It is only for personal
use, You cannot amend, distribute, sell, use, quote, or
paraphrase any part of the content within this book,
without the consent of the author or publisher.
Disclaimer Notice:Please note the information contained within this
document is for educational and entertainment purposes
only. All effort has been expended to present accurate, up-
to-date, reliable, and complete information. No warranties
of any kind are declared or implied. Readers acknowledge
that the author is not engaged in the rendering of legal,
financial, medical, or professional advice. The content
within this book has been derived from various sources.
Please consult a licensed professional before attempting
any techniques outlined in this book.
By reading this document, the reader agrees that under
no circumstances is the author responsible for any losses,
direct or indirect, that are incurred as a result of the use
of the information contained within this document,
including, but not limited to, errors, omissions, or
inaccuracies.This book is dedicated to Meg, who
inspired me to follow my dreams.CONTENTS
Introduction
1 Getting Started
1. What Is Excel and Why Is It Useful?
2. What Is Data and Data Analysis?
3, The 3-Step System in Data Analysis
um. Let’s Go
4, Getting to Know Excel
. The Project
. Prepare
. Analyse
. Consider
w
oN OD
m. Over to You
9. Complete a Project
10. What Next?
Notes
Bibliography
About the AuthorINTRODUCTION
Are you tired of spending hours sifting through data
in Excel, trying to find insights and patterns? Do you
feel overwhelmed by the amount of information and
unsure of where to start? You're not alone. Data anal-
ysis can be a daunting task, but it's a necessary one
in today's world of business and technology. In fact,
data is created in huge volumes every day, and the
direction of travel is one way: up! Did you know:
- An estimated 1.145 trillion megabytes of
data are produced daily?!
- By 2025, the total amount of data
created, captured, copied and consumedglobally is forecast to reach over 180
zettabytes (that’s 180 billion terabytes or
1,800,000,000,000,000 gigabytes)?2
That's why I wrote Data Analysis in Microsoft Excel to
help you navigate the complex world of data analysis
in Excel. In this book, you'll learn how to use Excel to
organise and analyse data, identify trends and pat-
terns, and make data-driven decisions.
With this book, you'll not only save time but also
gain a competitive advantage in your career. You'll
be able to present data in a meaningful way to stake-
holders, make more informed decisions and, ulti-
mately, drive business success. If you are a student,
this book will help you better utilise data in your
studies, come to better conclusions and be able to vi-
sualise and describe your observations.
As an experienced data analyst and Excel user, I un-
derstand the challenges you face and the pain you
experience. That's why I've written this book in away that's easy to understand and follow, with prac-
tical examples and step-by-step instructions.
In Data Analysis in Microsoft Excel, you'll learn how
to create powerful data visualizations, use Excel
functions to write formulas, and much more. You'll
also gain a deeper understanding of data analysis
concepts and best practices.
So, whether you're a beginner or an experienced
Excel user, this book is for you. I'm confident that by
the end of the book, you'll feel empowered to tackle
any data analysis project that comes your way. Let's
get started!PART ONE
GETTING STARTED
Data analysis can be performed using a variety of
tools and techniques, including Excel, R, Python,
and SQL. It is important to note that the process
of data analysis requires a combination of technical
skills and domain (subject area) knowledge, as well
as critical thinking, creativity and the ability to com-
municate the results effectively.
In Part I, we will learn about Excel itself, what we
mean by data and data analysis, and introduce the 3-
Step System for producing awesome analysis.CHAPTER 1
WHAT IS EXCEL AND
WHY IS IT USEFUL?
"Excel is the Swiss Army knife of business
tools."
GUY KAWASAKI, AMERICAN
MARKETING SPECIALIST, AUTHOR, AND
SILICON VALLEY VENTURE CAPITALIST
xcel is a powerful tool for managing and
E analysing data. It is a spreadsheet program
that allows users to organize and manipulate data
in a variety of ways. A spreadsheet is a computer
application that simulates a paper worksheet, whereusers can input data, perform calculations and cre-
ate charts and graphs.
One of the reasons Excel is so popular is that it is
user-friendly and easy to learn. It is also versatile
and can be used for a wide range of tasks, from basic
data entry to complex financial modelling. Some of
the main reasons Excel is used in data analysis in-
clude the ability to organize and analyse data, create
charts and graphs, and perform complex calcula-
tions.
In addition to its many uses in business, Excel can
also be used for personal and educational purposes.
For example, it can be used to create a budget, track
expenses, or even plan a vacation. The possibilities
are endless, and with a little creativity, you can use
Excel to solve a wide range of problems.
SPREAD... WHAT?
A spreadsheet is a software program allowing usersto organise, analyse, and store data in a tabular
format. The data can be manipulated, sorted, and
calculated using formulas and presented in various
forms, such as charts and graphs. Microsoft Excel
is the most popular spreadsheet software, but other
options are available, such as Google Sheets and
Apple Numbers.
The concept of a spreadsheet dates back to the
1960s when computer scientists at MIT (Massachu-
setts Institute of Technology) developed the first
electronic spreadsheet program called VisiCalc. This
program allowed users to create spreadsheets on a
computer rather than on paper, which made data
manipulation and analysis much more efficient.
Since then, spreadsheet software has evolved to in-
clude a wide range of features and capabilities. In the
1980s and 1990s, Lotus 1-2-3 was a popular appli-
cation which has since been eclipsed in popularity
by Microsoft’s Excel, owing in no small part to its
bundling within the Office suite of products and thewidescale adoption of Microsoft software in organi-
sations in all sectors.
Spreadsheets are widely used in both personal and
professional settings. In a business setting, they are
often used to organize and analyse financial data,
such as budgeting and forecasting. They can also
be used to track and analyse sales data, inventory
levels, and customer information. In a personal set-
ting, spreadsheets can be used for tasks such as bud-
geting, tracking personal expenses, and creating a
household inventory.
One of the key benefits of using a spreadsheet
is the ability to perform calculations on the data.
Spreadsheets use formulas, which are a set of in-
structions that tell the software how to manipulate
the data. Formulas can be used to perform calcula-
tions such as adding, subtracting, multiplying, and
dividing. They can also be used to perform more
complex calculations, such as finding the average of
a set of numbers or calculating a running total.Another benefit of spreadsheets is the ability to cre-
ate charts and graphs that can help to visualize the
data. These charts and graphs can be used to quickly
identify trends and patterns in the data. They can
also be used to communicate the data to others, such
as in a business setting where a manager may need
to present the data to a team or upper management.
Spreadsheets also have the ability to store and or-
ganize large amounts of data. This can be a great
advantage in a business setting as it allows for easy
access to historical data, which can be used for fore-
casting, budgeting and analysis. In a personal set-
ting, it can be used to store and organize information
such as household expenses, your shopping list or
how much you spent while away travelling.
Overall, spreadsheets are an essential tool for
organizing, analysing, and presenting data. They
are widely used in both personal, educational, and
professional settings and have many useful features
suited to relevant contexts. With the help of spread-
sheets, data can be transformed into valuable infor-mation that can be used to make decisions and im-
prove outcomes.
WHY IS EXCEL SO POPULAR?
One reason for its popularity is its integration with
other programs in the Microsoft Office suite. Excel
is included in the Microsoft Office suite and is often
used in conjunction with other programs such as
Word and PowerPoint. This makes it easy for users to
work on documents and presentations that require
data analysis and visualization.
Additionally, Excel has been around for a long time
and has a large user base. It has been on the market
for over 30 years, meaning that many people have
learned how to use it and are familiar with its inter-
face. This makes it easy for new users to learn and
for experienced users to continue to use Excel.
Another reason for its popularity is its wide range
of applications. Excel is used in many different in-dustries and for many different purposes. It is used
in finance, accounting, marketing, and many other
fields. It is also widely used in businesses of all sizes
and in both public and private sectors.
Finally, Excel is widely supported by third-party
software and tools. Many Excel add-ins, templates,
and other tools are available to extend its function-
ality. This makes it easy for users to customize Excel
to meet their specific needs and automate repetitive
tasks.
So, Excel's popularity is due to a combination of its
wide range of features and capabilities, integration
with other programs in the Microsoft Office suite,
long history, wide range of applications, and wide
support from third-party software and tools. This
makes it a versatile and powerful tool that is widely
used and well-liked among users.CHAPTER SUMMARY/KEY TAKEAWAYS.
- Excel is a popular spreadsheet program that
allows users to organize and manipulate data
in a variety of ways.
-It is user-friendly and easy to learn, with a
wide range of uses from basic data entry to
complex financial modelling.
- Excel can be used for personal, educational,
and professional purposes, including
budgeting, tracking expenses, and data
analysis.
+ It is integrated with other programs in the
Microsoft Office suite, making it easy for users
to work on documents and presentations that
require data analysis and visualization.
+ Excel has been around for over 30 years and
has a large user base, making it easy for newusers to learn and for experienced users to
continue to use it.
-It is used in many different industries and
for many different purposes and is widely
supported by third-party software and tools.
In the next chapter, we will explore two key ques-
tions: What is Data? And What is Data Analysis?
Excel is our tool of choice, but understanding Data
and the art and practice of Data Analysis is the key
skill of the data analyst.CHAPTER 2
WHAT IS DATA AND
DATA ANALYSIS?
ata is all around us. It is the information
we use to make decisions, understand pat-
terns, and make predictions. Simply put, data is a
collection of facts and figures that can be analysed
to reveal insights and trends. It can come in many
forms, such as numbers, words, or images. For exam-
ple, a weather app on your phone uses data to show
you the temperature, humidity, and forecast for
your location. Online retailers use data to track cus-
tomer behaviour and recommend products. Social
media platforms use data to show you which posts
are most popular among your friends. All these ex-
amples show how data is used in our everyday lives.In this chapter, we will explore the basics of data and
how it can be analysed to gain valuable insights.
WHERE DOES DATA COME FROM?
Data is information that we record and store for
various reasons. It can come from a wide range of
sources, such as observations, surveys, experiments,
and measurements. For example, a weather station
might record temperature, humidity, and precipita-
tion levels, while a retail store might record sales
numbers and customer information. The data can be
collected by individuals, organisations, or even ma-
chines. The reason for recording and storing data
can vary, but it is often done for simple record keep-
ing, to help make decisions, track progress, or better
understand a particular phenomenon. Data can be
used for things like tracking inventory, monitoring
customer behaviour, or even guiding public policy
decisions. Essentially, data helps us make sense ofthe world around us and make more informed deci-
sions.
Did you know your weekly food shopping list jotted
down on a scrap of paper is actually data? You may
not be able to easily analyse this data or even want or
need to, but you have created data (the items on the
list) via a process of data entry (pen and paper) and
stored it (the piece of paper). With enough scraps
of paper and knowledge of when they were writ-
ten, you could analyse your favourite foods, estimate
when you would run out of a particular item or pre-
dict the price of your weekly food bill.
DATA CREATION AND COLLECTION
Data is created by a variety of methods. Typically,
these fall into the following two categories:
1. Administrative/transactional - this data is
collected for record keeping and is generallyessential to the functioning of the business,
school, charity, or system in question.
Examples would include your grocery store
recording what you bought, how much you
spent and whether you used your loyalty card.
2. Curated - this data has been consciously
gathered to serve some purpose. Unlike
administrative/transactional data, it is
typically information which is not readily
available through our day-to-day interactions
as customers or users of a service. An example
of this would be how a group of consumers
felt about a new range of products the grocery
store released this year.
How much you spent and on what is available to the
grocery store because it needs this information to
charge you at the till and to manage the stock levels
on the shelves. It is critical to capture this informa-
tion. However, doing so is also relatively easy as each
item is scanned at the till.Conversely, how you feel about that new pizza you
bought last week is not information that the store
has access to. It can make assumptions about how
popular the pizza is by using its sales data, but it
can never know what you liked about it and how it
could be even better. It is also less convenient for the
store to gather this information as a) at the point of
buying the item, you haven't yet tried it, b) even if
you could explain why you bought it, it would hold
up the next customer, and c) people don’t necessarily
want to publicly discuss their buying decisions with
a stranger!
Here are some familiar examples of administrative/
transactional data collection you may know:
- Computer Systems: Businesses can collect
data from various computer systems, such
as website analytics, customer relationship
management (CRM) systems, and enterprise
resource planning (ERP) systems. This
data can include information on websitetraffic, customer interactions, and financial
transactions.
- Social Media: Businesses can collect data
from various social media platforms such
as Facebook, Twitter, and Instagram. This
data can include information on customer
sentiment, brand mentions, and engagement
metrics.
+ Mobile Apps: Businesses can collect data from
mobile apps used by customers. This data can
include information on app usage, location,
and customer behaviour.
- Electronic Point of Sale (POS): Businesses
can collect data from electronic point of sale
systems used in retail stores and restaurants.
This data can include information on sales,
inventory, and customer behaviour.
+ Internet of Things (IoT) Devices: Businesses
can collect data from IoT devices such as
sensors, cameras, and smart appliances. This
data can include information on customer
behaviour, environment, and usage patterns.Here are some examples of curated data collection
you may know or even have participated in:
+ Surveys: which are used to gather information
from a large group of people through self-
administered questionnaires.
+ Interviews: which are used to gather in-depth
information from a smaller group of people
through face-to-face or phone conversations.
«Focus Groups: which are used to gather
information from a group of people through
moderated discussion.
- Observations: which are used to gather
information by observing people and their
behaviour in a natural setting.
- Experiments: which are used to gather
information by manipulating one or more
variables and measuring the effect ona
specific outcome.«Document Review: which are used to
gather information by reviewing existing
documents.
It's worth noting that both types of data collection
can be used together and complement each other.
For example, administrative data can be used to
identify patterns and trends, while curated data can
be used to gather more in-depth information and
context.
SO WHAT IS DATA ANALYSIS, THEN?
Data analysis is the process of evaluating, organiz-
ing, and interpreting data in order to extract useful
information and insights. It involves a variety of
techniques and methods for cleaning, transforming,
and modelling data, as well as visualizing and com-
municating the results. The goal of data analysis is to
identify patterns, trends, and relationships withinthe data that can be used to make informed deci-
sions and improve organizational performance.
Data analysis can be applied to a wide range of fields
and industries, including business, finance, health-
care, and science. For example, in business, data
analysis can be used to identify customer segments,
predict sales, and optimize marketing strategies. In
finance, data analysis can be used to identify market
trends, evaluate investments, and manage risk. In
healthcare, data analysis can be used to improve pa-
tient outcomes, reduce costs, and identify areas for
improvement.
There are several different types of data analysis, in-
cluding descriptive, diagnostic, predictive, and pre-
scriptive. Descriptive analysis is used to summarize
and describe the data, such as calculating means,
medians, and standard deviations. Diagnostic anal-
ysis is used to identify potential issues or prob-
lems within the data, such as identifying outliers or
missing values. Predictive analysis is used to make
predictions about future events or outcomes, suchas forecasting sales or identifying potential fraud.
Prescriptive analysis is used to recommend actions
or decisions based on the data, such as identifying
the best marketing strategy or the most efficient
supply chain. Some of these are considered advanced
techniques and are thus not covered in this book.
However, once you master the basics, you will be
well positioned to move on to these more advanced
practices.
Data analysis can be done using a variety of tools
and techniques, including Excel, R, Python, and SQL.
It is important to note that the process of data
analysis requires a combination of technical skills
and domain knowledge, as well as critical thinking,
creativity and ability to communicate the results
effectively. Excel is often considered the first tool in
this toolset with which to gain experience and skill
and remains useful even to an experienced Data
Scientist.WHAT CAN YOU DO WITH IT?
Effective Data Analysis is very valuable to an organ-
isation trying to make the best use of its resources:
whether to make a profit or simply to have the great-
est positive impact on its stakeholders. Thousands or
millions of dollars can be added to a business’ rev-
enue or saved from its costs by the effective use of
the data analyst’s skillset and tools. How is that pos-
sible, you might ask?
For example:
- Targeting customers more effectively: by
analysing data on customer demographics,
purchase history, and browsing behaviour,
a business can identify which segments
of its customer base are most likely to be
interested in its products or services. This
allows the business to focus its advertisingand marketing efforts on those segments,
resulting in a more efficient use of resources
and higher conversion rates. For example,
a fashion retailer may discover that a large
portion of its customer base is composed of
young professional women and can then tailor
its advertising and product offerings to that
demographic, resulting in increased sales and
customer loyalty.
Reducing operational costs: by analysing
data on business operations, a company can
identify areas that are not performing as well
as others and make adjustments to reduce
costs. For example, a manufacturing company
may use data analysis to determine that a
certain production line is not operating at
optimal efficiency, resulting in increased costs
and lower output. The company can then
make changes to that production line, such
as reallocating resources or investing in new
equipment to improve efficiency and lower
costs, resulting in increased profitability.«Improving problem-solving: by analysing
data, a business can make more informed
decisions and avoid costly pitfalls. For
example, a retail company may use data
analysis to identify that a particular product
line is not selling well. By analysing data on
the product's sales and customer feedback, the
company can make a decision to discontinue
the product, resulting in cost savings anda
more efficient use of resources.
- Developing new products: by collecting and
analysing data, a business can obtain more
accurate information that can inform its
future strategies and plans. For example,
a food and beverage company may use
data analysis to determine the customer
preferences and trends of the market. Based
on that information, the company can develop
new products that align with customer
preferences, resulting in increased sales and
customer loyalty.+ Identifying and addressing customer
complaints and feedback: by collecting and
analysing customer feedback, a business
can identify common complaints and
issues and take steps to address them. For
example, a restaurant may use data analysis
to identify that a high number of customers
are complaining about long wait times. The
restaurant can then take steps to address
the issue, such as hiring additional staff or
implementing a reservation system, resulting
in improved customer satisfaction and
retention.
- Improving logistics and supply chain
management: by analysing data on inventory
levels, sales, and shipping times, a business
can identify bottlenecks and inefficiencies
in its logistics and supply chain operations.
For example, a distributor may use data
analysis to identify that a certain product
is consistently out of stock, resulting in lost
sales. The distributor can then work with itssuppliers to improve delivery times or increase
inventory levels, resulting in improved
customer service and increased sales.
The answer then is found in understanding and in-
sight, which leads to high quality decision-making,
and this applies not just in business but also in areas
from academic research, public health and your per-
sonal finances.
CHAPTER SUMMARY/KEY TAKEAWAYS
+ Data is collected and stored for many reasons
—much of this is to enable an organisation to
function, but it can be deliberately collected to
find out more about a phenomenon or group
of people.
- Data analysis is the process of evaluating,
organising, and interpreting data in order to
extract useful information and insights.+ There are several different types of data
analysis, including descriptive, diagnostic,
predictive, and prescriptive. They all seek to
better understand the world or make better
decisions through the use of data.
- Effective data analysis can have a huge
impact on an organisation, whether through
the better use of limited resources or the
identification of new opportunities to attract
customers or develop products, and so on.CHAPTER 3
THE 3-STEP SYSTEM IN
DATA ANALYSIS
W hatever your trade, it is likely the process
by which you take an idea or some raw ma-
terials to a finished product—one which contains a
number of key steps. For example, making an item of
clothing would involve initially considering who it is
for, what size they are and what they want. Second,
you might come up with a design on paper and show
it to your friend or customer and get their views on
it. If they’re happy, you can carry on sourcing some
materials, making some initial cuts, pinning the ma-
terial together and start seeing the garment's shape
appear. Next, you might check in with the customer
and establish if you’re heading in the right direc-tion with it. Hopefully, you've interpreted what they
wanted correctly and can keep on stitching away
until you produce the finished item. In an ideal
world, they’ll be delighted with what you produce,
and it won’t require any adjustment, but typically
there’ll be something you need to tweak to make it fit
just right. The better you know your customer, the
easier it will be to get it right the first time, but this
takes time and experience and an understanding of
what they like, their body shape and style. You might
get lucky from time to time, but you shouldn’t really
rely on luck! Instead, true craftsmanship is a process
by which you minimise wasted effort and produce
excellent results in a repeatable way.
The process of data analysis always follows 3 key
steps: Prepare, Analyse, and Consider.
PREPARE
Prepare is the first step in this process. We considerwhat we are doing, why, for whom and, therefore,
how it might be achieved. It can be tempting to con-
sider “How?” first when asked to do something but
resist the urge to act before deciding how to place
your first step.
An excellent place to start your thinking is with the
problem statement. What is a problem statement?
A problem statement is a concise description of
a problem or issue that needs to be addressed. It
should clearly define the problem, its significance,
and the desired outcome. A good problem statement
should be:
1. Specific: The problem should be clearly
defined and focused without being too broad
or vague.
2. Measurable: The problem should be
measurable so that progress towards resolving
it can be tracked.3. Relevant: The problem should be relevant and
important and have a direct impact on the
organisation or individual.
4. Time-bound: The problem should have a clear
deadline or timeframe for resolution.
5. Objective: The problem statement should be
free of personal opinions or biases and instead
be based on facts and evidence.
A problem statement should be no more than a
few sentences long and provide a clear and concise
description of the problem. It should also include
information on the scope of the problem and any
relevant constraints or limitations. These examples
from across different industries might give some in-
sight into what this looks like:
1, Customer Service: "There is a significant
increase in customer complaints about long
wait times at our restaurant, leading to
decreased customer satisfaction and repeatbusiness. We need to find a solution to reduce
wait times and improve customer experience
by the end of Q2 2023”.
2. Healthcare: "Patients at our hospital
are experiencing long wait times for
appointments with specialists, leading to
decreased satisfaction and decreased patient
retention. We need to find a solution to reduce
wait times and improve patient experience by
the end of Q4 2022”.
3. Retail: "Our store is facing a decline in sales
due to increased competition from online
retailers. We need to find a solution to increase
sales and improve customer engagement by
the end of Q3 2023”.
4. Education: "Student performance in
mathematics at our school is consistently
below average, leading to a low graduation
rate in STEM fields. We need to find a
solution to improve student performance in
mathematics by the end of the 2022-2023
school year”.5. Manufacturing: "Our manufacturing plant
is facing increased production costs due to
outdated equipment and processes. We need
to find a solution to reduce production costs
and improve efficiency by the end of Q2 2023”.
6. Government: "The city is facing a high rate of
traffic congestion during rush hour, leading to
increased travel time and decreased commuter
productivity. We need to find a solution
to reduce traffic congestion and improve
commuter experience by the end of Q4 2022”.
As a data analyst, you often will not be handed a
problem statement. More often, it will be for you to
ask or determine what the context for a request of
you actually entails. This contextualisation of what
you are being asked to do is one of the key non-
technical skills of the role. You may even find that
the person asking you to do something doesn’t really
know why they are asking you to do it! Being able to
dig a little deeper and ask around is key to minimis-ing wasted effort. Some assumptions are useful in
life—others are not.
Having framed your analysis and understood what
you are trying to do and why, you can next consider
for whom you are doing this analysis. What we
are considering here is audience. The person or peo-
ple who will be reviewing your analysis are typically
looking to learn something and then take action.
In order for your work to be effective, it is worth
considering what your ‘customer’ can do with the
output: are they the CEO or a team leader in an
operations team? Taking the example of reducing
wait times at a restaurant, the CEO of the restau-
rant chain and the shift manager who schedules
how many staff have to work at any one time have
different interests and “levers” (what actions they
can take within the scope of their role and respon-
sibilities). Whilst your analysis might lead you to
conclude either way that “shifts are poorly sched-
uled, and we are often understaffed”, you wouldn't
have approached the shift manager’s request fromthe perspective that expanding the restaurant or
moving to new premises would be in scope. The
shift manager would have been much more inter-
ested in understanding, say, staff-to-booking ratios
throughout the day compared to another restaurant
in the chain where complaints were lower—as one
of the levers they control in the role is how many
staff to schedule. They cannot acquire new premises.
Audience is an essential consideration in our next
step, Analyse, but we will come to that later.
Once you have a problem statement and understand
your audience, you can finally move on to consider
how you are going to tackle the analysis. Here we
start to consider the data itself. Specifically:
+ What data do I have/can I access?
+ What do I need to do to it (if anything) to begin
exploring the issues?
+ What are the limitations of the data? What
can and can’t it tell me?When considering what data you have or can access,
time is a major factor. Like the supply of housing in
an economy or the range of pharmaceutical drugs
available to treat a health condition, the amount of
data you can access in the short term to consider a
problem is essentially fixed. In the long term, (al-
most) anything is possible. To illustrate this further,
here are some examples:
«Ina business context, you can only access
data created in the company’s systems which
you can also then extract to analyse (into an
Excel readable format, in this case). You may
also have access to publicly available data,
such as industry metrics or any intelligence
the company buys and makes available such
as market research. If you need to analyse a
problem and have only been given a few days
to do it, even getting the accounting team to
raise a purchase order to buy some industry
data may be out of scope. If the company hasan IT team pulling data out of systems intoa
useable format for analysts such as yourself,
getting access to data you know is being
gathered but cannot access may take weeks or
months.
-In an academic context, you may only have
access to publicly available data or research
available to you through university systems.
You may have been asked to gather some data
yourself by setting up a study or focus group,
or controlled experiment, which takes time.
If your initial focus group’s data created some
interesting insights, a follow-up research piece
is out of scope in the short term and may leave
your first set of conclusions clouded by a set of
second questions and hypotheses which merit
further investigation.
- Ina personal context, you may decide you
want to take better control of your finances.
You could probably access your banking data
and download it, but it is very unlikely to be
categorised by categories such as “Mortgage/Rent”, “Food & Drink”, “Car Expenses”, and so
on. So you might only be able to produce an
“Income and Expenditure” analysis the first
time you attempt it.
This leads nicely to the second consideration noted
above: What do you need to do with the data be-
fore it becomes useful? If you are determined to save
more money each month, is it enough to know that
you are spending more than you earn? How much
more valuable would it be to know that actually,
your grocery budget has doubled over the last year
or that your spending on fuel for your car now that
your commute is an extra 10 miles is really cutting
into your budget? In this case, your banking data is
made much more valuable by taking the time to cat-
egorise the spending.
Do you recall the example about the shift manager
tackling the wait times at the restaurant? Would it
be more useful to have a list of the names, contact
details, and booking times of everyone who attendedthe restaurant last week or a summary of how many
bookings there were each day in, say, 30-minute
time slots?
Both examples reveal the value of preparing data for
analysis as, in its raw form, it cannot always answer
the questions you pose to it. In example one, we are
enriching the data by categorising each transaction
from the banking statement. In example two, we are
summarising the restaurant booking data to make it
more useable.
The final consideration is to ask what the limita-
tions of the data are? Data is typically gathered with
one or more purposes in mind, and its value only
stretches so far beyond this purpose. A bank records
very accurate transaction data so as to keep your ac-
count balance up to date and provides this back to
you so that you can understand where your money
has gone. Imagine how much less trusting of a bank
you would be if they could only tell you how much
money they think you have and couldn’t prove it.
However, if you had access to, say, 10 people’s bank-ing information for a year, could you accurately an-
alyse how wealthy they are? You might be able to ap-
proximate an answer using some assumptions such
as ‘the more you earn and save, the more you can
invest’, but really you only have part of the picture.
Many forms of wealth are held outside of the bank-
ing system, and even if you have a mortgage, the
balance on your mortgage account only describes
what you owe, not what your property is worth. To
take this a step further, your current account might
only show payments into your mortgage account—
not its balance. Are you a new home owner, making
your initial mortgage payments on your first home,
or are you about to pay off the mortgage and be free
of this debt: you cannot tell and do not have enough
information.
Another key limitation, particularly common in a
business context, is that of data quality. If you have
a dataset about 1,000 customer orders, you can be
fairly sure that you know what was ordered and
how much the customer was billed (even then, er-rors can occur—what if the business was interna-
tional, charged customers in their local currency,
but your report is in US dollars and the exchange
rate conversion wasn’t performed correctly for your
dataset. . .?). A business couldn’t function without
this information. However, if you wanted to calcu-
late the gross profit on each sale (sale price—cost of
item) and someone within the business was respon-
sible for manually calculating what an item costs to
produce—how confident would you be that this was
always right? Have you ever given fake contact de-
tails when buying something online to avoid being
contacted with marketing emails? Have you said you
never buy a certain type of product during a survey
so that you don't have a salesman give you their
pitch? What about a time you were genuinely pre-
pared to share your mobile number with a retailer
to get postal tracking information, only to miskey
it and never hear anything until the package ar-
rived? The analyst's role then includes data cleaning/
cleansing—another form of preparing data for use.
Sometimes you can fix an issue, and sometimes youmust throw the data out. Other times, you have to
say, “Only 80% of the data was useful, so the analysis
is based on that”. Spotting issues, adapting to them
and being clear about the approach you have taken is
another key skill.
With that, we have covered the main elements of
Prepare. Chapter Six considers this in more detail,
and our example project will cover some practical
examples of dealing with these challenges.
ANALYSE
We arrive then at Analyse. You would be forgiven
for thinking this would make up 80% of the con-
tent of this book. The most common mistake would-
be analysts make is thinking that turning data into
information is the extent of the role and that they
can hand this over for someone else to use. However,
what you are really responsible for producing is
value. This means different things to different people—but you know this already because you just read
the Prepare step. What you are trying to achieve,
why, and for whom dictates the output that is valu-
able.
Some people will tell you that a good analyst always
produces Insight. Qlik, the software company which
produces Business Intelligence tools, defines it as
such:
The broad definition of insight is a deep under-
standing of a situation (or person or thing). In the
context of data and analytics, the word insight
refers to an analyst or business user discovering
a pattern in data or a relationship between vari-
ables that they didn’t previously know existed.1
Insights can have tremendous value to an organ-
isation, and indeed, this is the most value-adding
activity of the data analysis function. However, as
an analyst in the real world, you are often simply
trying to make an organisation function better, onepiece of work at a time. Sometimes, you are just
summarising table bookings at a restaurant so that a
shift manager can see how busy the place gets on a
Tuesday night!
What then is the process and skillset required dur-
ing the Analyse step? We have our data and an un-
derstanding of the problem and audience, so we are
initially concerned with discovering the shape of the
data—how much do we have? How many rows and
columns are there? If it covers a period of time, how
many years’ or months’ worth do we have?
We can then consider more specific questions. Is it
customer data? If so, what do we know about them?
Do we have their age? How much have they spent?
What have they bought? If it is booking data, do we
know when they booked? How did they book? How
many people are in the booking party? Adults, chil-
dren? Who took the booking?
We can use our technical skills to explore these ques-
tions and produce summary data, tables and chartsto begin looking for patterns and relationships in the
answers to these questions. Initially, we are both ex-
ploring and developing new hypotheses. As we work
through the analysis step, our hypotheses become
more refined and specific—and along the way, we
discard any which prove to be uninteresting or irrel-
evant.
How then to assess relevance? Enter domain knowl-
edge. This is where the value of experience comes to
bear when analysing a problem and is often why it is
hard to get an interview or a job offer for a data ana-
lyst role (or any role, in fact). Businesses understand
the value of experienced staff when delivering their
roles, and being able to apply both knowledge and
skill is where you can truly create value. Consider
again the example of the restaurant where cus-
tomers experience long waiting times. The follow-
ing thoughts might occur to an analyst considering
the issue. From first to last, we move from common
sense to domain experience:1. When the restaurant is busiest, waiting times
are the longest.
2. It can get particularly busy at lunchtimes,
evenings and weekends.
3. Bookings in advance help the restaurant plan
for busy periods.
4. Last-minute cancellations frustrate planning
but create availability in the short term.
5. The greater the lead time between booking
and reservation date, the more opportunity to
plan effectively.
6. A fixed availability of fixtures and fittings (e.g.,
high chairs for young children) effectively
limits the capacity for simultaneous bookings
for certain family types and group sizes.
7. Split-shift (staff member has to go home
and come back between lunch and evening
service) staff rotations are unpopular and
could lead to an increase in unplanned staff
absence and last-minute understaffing issues.Technical Effective Domain
Skill PE Ast} Knowledge
Having both technical skill and domain knowledge
allows you quickly (technical skill) and effectively
(domain knowledge) consider hypotheses that will
add value. In exploring these hypotheses, you ap-
proach your problem much like a sculptor aiming to
turn a block of marble into a chiselled figure: an un-
formed block (dataset) whittled down into the vague
shape of a person (broad stroke analysis—what do I
have, how much data is there?), finally adding detail
(specific analysis of a data item or segment to gain a
new insight).What do we mean by a hypothesis? In simple terms,
a hypothesis is an educated guess or a proposed ex-
planation for something that is yet to be proven or
tested.
For example, let's say you're curious about why
plants in your garden seem to grow better when
you water them with rainwater instead of tap water.
Your hypothesis might be: "Plants grow better with
rainwater because it has natural nutrients that tap
water lacks”.
The purpose of a hypothesis is to provide a starting
point for research and experimentation. Once you
have a hypothesis, you can design analyses to test it
and see if it is supported by evidence. If the evidence
supports your hypothesis, it may become a theory or
a widely accepted explanation for a phenomenon. If
the evidence does not support your hypothesis, you
may need to revise or come up with a new one.With the benefit of domain knowledge, you can
come up with better hypotheses more quickly (and
waste less time on naive hypotheses).
The process of analysis can also lead you to identify
new limitations in the data. We previously consid-
ered the purpose and method of data collection and
data quality as limitations of a dataset. We may find
additional limitations through the process of analy-
sis. Imagine an analysis of sales at a car dealership.
A comparison of sales this year vs last year to de-
termine what types of vehicles are experiencing the
fastest sales growth might reveal that the sales of
pink SUVs doubled from one year to the next and
that you were able to increase your profit on the sales
of these vehicles by 50% over that period! However,
you sold only one last year and two this year, which
is unlikely to form the basis of a new stock pur-
chasing strategy for a dealership selling 1,000 vehi-
cles a year. The limitation here is that you do not
have enough information to draw any conclusions.
You have enough contextual data to know that pink