INTRODUCTION
TO ANALYTICS
2022 – 2023
LESSON 1.
CORE ANALYTICS CONCEPTS.
SOURCES & TYPES OF DATA
Learning Objectives
• Understand core concepts in the data domain
• Distinguish between data, information and knowledge
• Discuss sources of data and how it is used
• Identify structured and unstructured data
• Explain characteristics of data
• Define big data
Agenda
1. Learning principles
2. Icebreaker
3. Why study data? Why is data important?
4. What are data, information and knowledge?
5. Key concepts
6. Types and sources of data
7. Metadata
8. Big data
Learning Principles
1. There is more than one way to define a concept – use of multiple sources
2. Different modes of learning – use words, pictures, discussions and independent research
3. Examples reinforce theory – your own examples are most valuable
4. Co-operation and group discussions encourage sharing
5. This is an emerging discipline – the field is changing at high speed
6. Explore the links used in class materials for extra reading material (optional)
Let’s break
some ice
WHY ARE DATA AND
ANALYTICS
IMPORTANT?
Module 1
Data is…
• A set of values of qualitative or quantitative variables about one or more persons or objects
• Facts and statistics collected together for reference or analysis
• Things known or assumed as facts, making the basis of reasoning or calculation
• Information output by a sensing device or organ that includes both useful and irrelevant or redundant information and must be processed to be meaningful
• Information in digital form that can be transmitted or processed
Source: Wikipedia, Merriam-Webster
Data is….
Scott Mackey, https://www.adlibsoftware.com/blog/authors/scott-mackey.aspx https://www.ceotodaymagazine.com/2018/04/is-data-the-new-gold/
The value of data and analytics
• Study and better understand needs and motivations of:
customers, competitors, partners, employees, stockholders
• Create knowledge
• Reduce uncertainty
• Improve business processes
• Increase automation
• Find weak spots, resource drains and problem root causes
• Exploit competitive advantage
• Make just-in-time decisions
• Satisfy regulatory and audit requirements
DATA, INFORMATION,
AND KNOWLEDGE
Module 1
Aren’t these interchangeable?
Pyramid: Data at the base, Information in the middle, Knowledge at the top
How do we use data?
The temperature is 20 degrees.
How should you dress?
Putting data into context
20 Celsius
20 Fahrenheit ≈ -6.7 Celsius
20 Kelvin ≈ -253 Celsius
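The same raw value "20" only becomes information once its unit is known. A minimal Python sketch of the two conversions above; the function names are illustrative, not from the course material.

```python
def fahrenheit_to_celsius(f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


def kelvin_to_celsius(k: float) -> float:
    """Convert kelvin to degrees Celsius (0 degrees Celsius = 273.15 K)."""
    return k - 273.15


# The raw reading is identical; only the context (unit) changes its meaning.
reading = 20
print(f"{reading} Celsius    = {reading:.1f} Celsius")
print(f"{reading} Fahrenheit = {fahrenheit_to_celsius(reading):.1f} Celsius")
print(f"{reading} Kelvin     = {kelvin_to_celsius(reading):.0f} Celsius")
```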
Raw data is…
• Unsuitable
• Unformatted
• Has errors
• Inconsistent
• Outdated
• Incomplete
• Has too much volume
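Several of these problems can be detected automatically. The sketch below profiles a few raw records for incomplete, inconsistent, outdated and duplicate entries; the records, field names and the 2020 "outdated" cutoff are hypothetical assumptions chosen only to illustrate the idea.

```python
from datetime import date

# Hypothetical raw customer records as they might arrive from different systems.
raw_records = [
    {"id": 1, "name": "Ana Silva", "country": "CA", "updated": date(2023, 1, 15)},
    {"id": 2, "name": None, "country": "Canada", "updated": date(2022, 11, 3)},
    {"id": 3, "name": "Li Wei", "country": "ca", "updated": date(2015, 6, 30)},
    {"id": 1, "name": "Ana Silva", "country": "CA", "updated": date(2023, 1, 15)},
]

STALE_BEFORE = date(2020, 1, 1)  # assumed cutoff for "outdated"


def profile(records):
    """Count a few common raw-data problems."""
    issues = {"incomplete": 0, "inconsistent": 0, "outdated": 0, "duplicate": 0}
    seen = set()
    for rec in records:
        # Incomplete: at least one field is missing.
        if any(value is None for value in rec.values()):
            issues["incomplete"] += 1
        # Inconsistent: country should be a two-letter upper-case code.
        country = rec["country"]
        if not (len(country) == 2 and country.isupper()):
            issues["inconsistent"] += 1
        # Outdated: last updated before the assumed cutoff.
        if rec["updated"] < STALE_BEFORE:
            issues["outdated"] += 1
        # Duplicate: an identical record was already seen.
        key = tuple(rec.items())
        if key in seen:
            issues["duplicate"] += 1
        seen.add(key)
    return issues


print(profile(raw_records))
```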
From data to knowledge
Light rays (DATA) → Electrical impulses (INFORMATION) → Recognizing a lion: "I see a lion" (KNOWLEDGE)
source: www.theeyewearboutique.co.za https://www.youtube.com/watch?v=i3_n3Ibfn1c
Raw data requires…
- integration
- design
- architecting
- modelling
…to become useful information
Knowledge (Generalized | Understanding | Insights): Analyzed information; insights drawn from the analysis of the information; understanding the information; ability to make predictions
Information (Organized | Structured | Processed): Data captured, processed and categorised so that it can be stored, consumed and shared
Data (Raw | Random | Unorganized): Objective facts, symbols and observations of the real world
Data vs Information vs Knowledge
Data
• Objective facts of the world around us; facts that exist whether we observe them or not
• Facts and observations without interpretation; can be perceived with our senses
Information
• Created by humans who observe and collect data with a purpose; stored and shared via human-created media (clay, paper, silicon chips)
• Makes data relevant to a particular context; data processed to become useful; answers the questions What, When, Where and Which
Knowledge
• Insights gained from analysis of information to gain experience, make connections, compare information and make predictions
• Application of information to achieve results and make decisions; understanding How and Why
Later in the course
Module 2: Data life cycle
• How is data collected, stored and managed
Module 3: Types of analytics
• How different types of analytics reflect stages of knowledge: from What happened? to Why did it happen? to What will happen?
Module 4: Analytics life cycle
• How does data become information through data architecting, modelling, integration and design
• How do we extract knowledge from information by generating analytical insights
KEY CONCEPTS
Module 1
Data Management: Encompasses the policies, procedures, and standards used to design and manage the information of the enterprise, to meet the data consumption requirements of all applications and business processes.
https://managementmania.com/en/three-tier-architecture
Sources: Textbook Chapter 5; Gartner IT Glossary
Data Science: a broad field that refers to the collective processes, theories, concepts, tools and technologies that enable the review, analysis and extraction of valuable knowledge and information from raw data.
https://www.edureka.co/blog/what-is-data-science/ https://www.techopedia.com/
Data Mining: the exploration and analysis of large datasets to discover meaningful patterns and rules using data science principles
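One classic data mining task is discovering association rules ("customers who buy X also buy Y") in transaction data. The sketch below counts support and confidence for item pairs by brute force; the baskets and both thresholds are invented for illustration, and production tools use dedicated algorithms such as Apriori to scale the same idea to large datasets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "milk"},
]

MIN_SUPPORT = 0.4     # pair must appear in at least 40% of baskets (assumed)
MIN_CONFIDENCE = 0.7  # P(consequent | antecedent) must be at least 70% (assumed)

n = len(baskets)
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

rules = []
for (a, b), count in pair_counts.items():
    support = count / n
    if support < MIN_SUPPORT:
        continue
    # Evaluate the rule in both directions: a -> b and b -> a.
    for antecedent, consequent in ((a, b), (b, a)):
        confidence = count / item_counts[antecedent]
        if confidence >= MIN_CONFIDENCE:
            rules.append((antecedent, consequent, support, confidence))

for antecedent, consequent, support, confidence in sorted(rules):
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

Support measures how common a pattern is; confidence measures how reliable the rule is when its left-hand side occurs.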
Reporting: The process of organizing data into informational summaries in order to monitor how different areas of a business are performing
https://www.annualreports.com/HostedData/AnnualReportArchive/a/NASDAQ_AMZN_2018.pdf
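At its simplest, reporting groups detailed records into a summary per business area. A minimal sketch using only the Python standard library; the regions, products and amounts are made-up sample data.

```python
from collections import defaultdict

# Hypothetical detailed sales transactions: (region, product, amount).
sales = [
    ("North", "Laptop", 1200.0),
    ("North", "Phone", 650.0),
    ("South", "Laptop", 1100.0),
    ("South", "Tablet", 400.0),
    ("South", "Phone", 700.0),
    ("West", "Phone", 600.0),
]

# Organize the raw rows into a per-region summary: total revenue and order count.
summary = defaultdict(lambda: {"revenue": 0.0, "orders": 0})
for region, _product, amount in sales:
    summary[region]["revenue"] += amount
    summary[region]["orders"] += 1

# Print a small report so each area's performance can be compared at a glance.
print(f"{'Region':<8}{'Orders':>8}{'Revenue':>12}")
for region in sorted(summary):
    row = summary[region]
    print(f"{region:<8}{row['orders']:>8}{row['revenue']:>12,.2f}")
```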
Analytics: The examination of information to uncover insights that give a businessperson the knowledge to make informed decisions.
Source: Textbook Chapter 1
Insights: Knowledge, experience, and predictions gained from analysis of information to:
• Make connections;
• Compare information;
• Make decisions and change actions
https://www.adp.com/spark/articles/2018/11/from-data-to-insight-overview.aspx
SOURCES AND TYPES
OF DATA
Module 1
Sources of data
• Internal
• External: open-source data; third-party data
Types of data
What data about yourself can be found on social media?
Structured vs Unstructured Data
Structured
• a.k.a. “traditional”, “quantitative”, “transactional”
• Numbers, symbols and categories
• Well-defined format
• Easy to search and organize
• Easily stored in table format (spreadsheets, relational databases)
• Usually managed using Structured Query Language (SQL)
Unstructured
• a.k.a. “non-traditional”, “qualitative”
• Free-form, unorganized, variable nature of data
• Lack of structure or very complex structure
• Stored in data lakes or NoSQL solutions; cannot be stored in table format
• Processed by machine learning and artificial intelligence (AI) tools
https://lawtomated.com/structured-data-vs-unstructured-data-what-are-they-and-why-care/
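Because structured data has a fixed, well-defined format, it can be stored in a table and searched with SQL. A small sketch using Python's built-in `sqlite3` module with an in-memory database; the `orders` table and its rows are hypothetical.

```python
import sqlite3

# Structured data: every row has the same well-defined columns and types.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, "
    "amount REAL, currency TEXT)"
)
conn.executemany(
    "INSERT INTO orders (order_id, customer, amount, currency) VALUES (?, ?, ?, ?)",
    [
        (1, "Acme Ltd", 250.00, "CAD"),
        (2, "Birch Co", 99.50, "USD"),
        (3, "Acme Ltd", 120.25, "CAD"),
    ],
)

# The fixed format makes filtering and aggregating a single SQL query.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders WHERE currency = ? "
    "GROUP BY customer ORDER BY customer",
    ("CAD",),
).fetchall()
print(rows)
conn.close()
```

Free-form text, images or audio have no such columns to query, which is why unstructured data typically needs different storage and processing tools.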
What is Semi-Structured?
Like structured: contains some structured data (properties and tags) that allows for partial categorization
Examples: XML, RSS feeds, JSON (JavaScript Object Notation)
Like unstructured: properties and tags are combined with unstructured data
Example: digital camera image
Date: <>
Time: <>
Longitude: <>
Latitude: <>
Aperture: <>
Resolution: <>
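A sketch of a semi-structured record in JSON: the tagged properties can be filtered like structured fields, while the caption and image payload remain unstructured content. The field names and values are hypothetical, loosely modelled on the camera-image tags above (real cameras usually store such tags as EXIF metadata inside the image file).

```python
import json

# Hypothetical JSON message describing a photo: tagged properties plus raw content.
message = """
{
  "date": "2023-05-14",
  "time": "14:32:08",
  "latitude": 43.6532,
  "longitude": -79.3832,
  "aperture": "f/2.8",
  "resolution": "4032x3024",
  "caption": "Sunset over the lake, taken from the pier",
  "image_data": "<binary image data omitted>"
}
"""

photo = json.loads(message)

# The tags behave like structured fields: they can be filtered and compared.
in_area = 43.0 < photo["latitude"] < 44.5 and -80.0 < photo["longitude"] < -78.5
print(photo["date"], photo["resolution"], in_area)

# The caption and image data have no internal structure the program understands.
print(len(photo["caption"].split()), "words in free-form caption")
```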
Structured/Semi-Structured/Unstructured Examples
Structured: Numbers; Categories; Codes; Dates; Character strings; Binary (True/False)
Semi-structured: Email; XML files; JSON messages; Digital photo files; Accessible PDFs; Website content
Unstructured: Text; Social media; Satellite images; Presentations; PDFs; Audio recordings; Video
Rectangular datasets (spreadsheets, database tables): next lesson
Rectangular Dataset
Table ~ dataset ~ data array ~ tabular set ~ rectangular dataset
Row ~ record ~ entity ~ case ~ instance ~ observation
Column ~ field ~ attribute ~ variable ~ feature ~ data element
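To make the row and column vocabulary concrete, here is a tiny rectangular dataset held in plain Python lists. The column names and values are hypothetical.

```python
# A tiny rectangular dataset: each inner list is a row (record / observation),
# and each position corresponds to one column (field / attribute / variable).
columns = ["customer_id", "city", "age", "is_member"]
rows = [
    [101, "Toronto", 34, True],
    [102, "Ottawa", 27, False],
    [103, "Montreal", 45, True],
]

# Every row has exactly one value per column, which is what makes it "rectangular".
assert all(len(row) == len(columns) for row in rows)

n_rows, n_cols = len(rows), len(columns)
print(f"{n_rows} rows x {n_cols} columns")

# Reading one column (a variable) means taking the same position from every row.
age_index = columns.index("age")
ages = [row[age_index] for row in rows]
print("age column:", ages)

# Reading one row (a record) gives all attributes of a single entity.
print("record 2:", dict(zip(columns, rows[1])))
```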
Types of Data
Quantitative Data (Numerical): observations that can be measured and expressed as numbers
• Continuous (Analog): measurements on a scale; infinite values in any interval; any value can be subdivided into finer increments. Examples: 0.0001, 124.569, 2×10^5
• Discrete: can be counted; values can be listed; values cannot be meaningfully divided. Examples: 0, 1, 2, 3, 4…
Qualitative Data (Categorical): observations that cannot be measured but can be described or categorised
• Nominal: limited set of possible values with no meaningful order or ranking. Examples: White, Blue, Yellow
• Ordinal: limited set of values with a meaningful order. Examples: Low, Medium, High
• Boolean: two mutually exclusive categories. Examples: True/False; Yes/No
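The type of a variable determines which operations make sense on it. A minimal sketch, assuming Python's `enum` module as a way to model categories: ordinal values can be ranked, nominal values only tested for equality, discrete values counted and continuous values averaged. The variable names are illustrative.

```python
from enum import Enum, IntEnum
from statistics import mean


# Nominal: a limited set of labels with no meaningful order.
class Colour(Enum):
    WHITE = "White"
    BLUE = "Blue"
    YELLOW = "Yellow"


# Ordinal: a limited set of labels with a meaningful order (IntEnum supports < and >).
class Priority(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Discrete: counts, always whole numbers.
items_in_cart = [0, 1, 2, 3, 4]
# Continuous: measurements on a scale; any value in an interval is possible.
weights_kg = [0.0001, 124.569, 2e5]
# Boolean: two mutually exclusive categories.
is_active = [True, False, True]

print(Priority.HIGH > Priority.LOW)   # ranking is meaningful for ordinal data
print(Colour.BLUE == Colour.WHITE)    # nominal data supports equality checks only;
                                      # Colour.BLUE < Colour.WHITE raises TypeError
print(sum(items_in_cart))             # counting / summing discrete values
print(mean(weights_kg))               # averaging continuous measurements
print(sum(is_active), "of", len(is_active), "are active")  # True counts as 1
```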
Quantitative Data
Type | Description | Examples
Integer | Whole numbers. The range of numbers is defined by the length of the data field | 1, 10, 20000, -345, 0
Decimal | Decimal numbers. The range is defined by a combination of precision (total number of digits) and scale (number of digits to the right of the decimal point) | 0.1234; -908.00001; 2000.00; 0.00
Floating (Engineering notation) | Decimal numbers with a floating point, defined by a mantissa (significand) and an exponent (order of magnitude, power of 10): value = mantissa × 10^exponent | 3.56E6 = 3.56 × 10^6; -5.701E-12 = -5.701 × 10^-12
Date | Short and long dates | Mar/05/2020; 2019-01-31
Time | Time; may be combined with the date | 20:03:00; 11:54; 23:59:59
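A sketch of how these quantitative types behave in Python, using the standard `decimal` and `datetime` modules. The 16-bit field length is an assumed example of how a data field's length limits the integer range; the other values come from the table.

```python
from datetime import date, datetime, time
from decimal import Decimal

# Integer: the range depends on the field length; a 16-bit signed field is assumed here.
BITS = 16
int_min, int_max = -(2 ** (BITS - 1)), 2 ** (BITS - 1) - 1
print(f"{BITS}-bit signed integer range: {int_min} to {int_max}")

# Decimal: exact decimal digits. Precision = total digits, scale = digits after the point.
price = Decimal("-908.00001")
_sign, digits, exponent = price.as_tuple()
precision, scale = len(digits), -exponent
print(price, "precision:", precision, "scale:", scale)

# Floating point: mantissa and exponent, written in E notation.
distance = float("3.56E6")   # 3.56 * 10**6
tiny = float("-5.701E-12")   # -5.701 * 10**-12
print(distance, tiny)

# Dates and times, parsed from formats shown in the table.
d1 = datetime.strptime("Mar/05/2020", "%b/%d/%Y").date()
d2 = date.fromisoformat("2019-01-31")
t = time.fromisoformat("23:59:59")
print(d1, d2, t)
```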
Qualitative Data
Type | Description | Examples
Boolean (a.k.a. Binary) | True or false values | 0/1, T/F, Y/N
Character | One character | A, x, N, b
String | Sequence of characters of limited length | “This is a string”
Code / Category | Categories or codes where each value has a discrete meaning | 01/02/03; A/B/C; CAD/USD/EUR
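A sketch of handling these qualitative types in Python. The mapping of Boolean encodings, the 20-character field limit and the currency lookup table are illustrative assumptions, not a standard.

```python
# Boolean: systems encode true/false differently, so map them to a single type.
TRUE_VALUES = {"1", "T", "Y", "TRUE", "YES"}
FALSE_VALUES = {"0", "F", "N", "FALSE", "NO"}


def to_bool(raw: str) -> bool:
    """Convert common Boolean encodings (0/1, T/F, Y/N) to True or False."""
    value = raw.strip().upper()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"not a recognised Boolean value: {raw!r}")


# Character: exactly one character.
grade = "A"
assert len(grade) == 1

# String of limited length: a field assumed to hold at most 20 characters.
MAX_LEN = 20
note = "This is a string"
assert len(note) <= MAX_LEN

# Code / category: each code has one discrete meaning, kept in a lookup table.
CURRENCIES = {"CAD": "Canadian dollar", "USD": "US dollar", "EUR": "Euro"}

print([to_bool(v) for v in ["1", "F", "y", "N"]])
print(CURRENCIES["EUR"])
```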
Later in the course
Module 5: Data modelling
• How can structured data be modelled
Module 6: Business Intelligence Architecture
• More on data sources of structured and unstructured data
• Approaches to storing, processing and transforming structured and unstructured data
BIG DATA
Module 1
What is big data
Big data: an accumulation of data that is too large and complex for processing by traditional database management tools.
Characteristics of big data:
• Volume: Enormously large volumes of data are being generated, in particular by internet-connected devices.
• Velocity: Increased demand for current and real-time data. More data is used for reporting and analytics as soon as it is generated.
• Variety: Continued expansion of the sources of data in a large variety of formats, in particular unstructured data that requires new methods of processing.
• Veracity: Variability in the accuracy, quality and trustworthiness of data generated from a wide range of sources.
• Value: The potential value from using big data to support business goals and objectives.
Sources: Merriam-Webster; Textbook Chapter 1
Sources of big data
https://www.smartdatacollective.com/big-data-20-free-big-data-sources-everyone-should-know/
Big vs. Small Data
Attribute | Small Data | Big Data
Objectives | Specific, pre-defined | Broad or undefined
Structure | Structured | Semi-structured or unstructured
Volumes | Small to medium (gigabytes to terabytes) | Huge (petabytes +)
Storage | One computer or server | Multiple servers, cloud
Sources | Traditional (enterprise systems) | Include non-traditional sources (social media, Internet of Things, media)
Velocity | Near real-time or batch | Often real time
Usage | Business intelligence and reporting | Advanced and predictive analytics
Analysis | Easy to analyze and visualize; analysis can be done manually or with traditional tools, e.g. SQL | Difficult to get the information and analyze; requires sophisticated and specialized tools
Data Storage Units
http://www.itutility.net/data-measurement-abbreviations-refresher/
How big is a Terabyte?
1 byte: A single character (e.g. ASCII code for ‘B’ is 01000010)
1 Kilobyte: A very short story
5 Megabytes: The complete works of Shakespeare, a high-resolution photo or 30 seconds of high-quality video
1 Gigabyte: A symphony in high-fidelity sound or 10 meters of shelved books
1 Terabyte: 50,000 trees made into paper and printed
200 Petabytes: All printed materials ever
5 Exabytes: All words ever spoken by human beings
https://www.zmescience.com/science/how-big-data-can-get/
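The first row of the table can be checked directly, and a small helper can express byte counts in the units above. The helper name `human_readable` is hypothetical; it assumes the decimal convention (1 KB = 1000 bytes) by default, while `base=1024` gives the binary convention that IEC names KiB, MiB and so on.

```python
# One byte holds one ASCII character: 'B' is code 66, or 01000010 in binary.
code = ord("B")
bits = format(code, "08b")
print(code, bits)

UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]


def human_readable(n_bytes: float, base: int = 1000) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(n_bytes)
    unit_index = 0
    # Divide by the base until the value fits the unit (or we run out of units).
    while value >= base and unit_index < len(UNITS) - 1:
        value /= base
        unit_index += 1
    return f"{value:.2f} {UNITS[unit_index]}"


print(human_readable(5_000_000))            # size of the Shakespeare example
print(human_readable(200 * 10**15))         # size of "all printed materials"
print(human_readable(2**40, base=1024))     # one binary terabyte (TiB)
```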
It’s not the size of the data – it’s how we use it
Textbook Chapter 1 Figure 1.1