Descriptive Analytics

Descriptive analytics utilizes simple statistics and data visualization to analyze past data and identify trends, aiding organizations in decision-making. It involves various data types, measurement scales, and visualization techniques like histograms, bar charts, and box plots to summarize and interpret data. Effective data visualization enhances understanding, facilitates quicker decisions, and supports predictive analytics by revealing patterns and trends.

Uploaded by mythris107


Descriptive analytics is the simplest form of analytics. It mainly uses simple descriptive statistics, data visualization techniques, and
business-related queries to understand past data. One of its primary objectives is to find innovative ways of summarizing data.
Descriptive analytics is used to understand trends in past data, which can be useful for generating insights.

Various tools and techniques are used to describe data. Descriptive statistics such as measures of central tendency, measures of
variation and measures of shape can provide useful insights. Plots such as histograms, bar charts, pie charts, box plots,
scatter plots and tree diagrams can provide insights about past data and then assist further analysis by generating new
hypotheses.

Descriptive analytics is about finding “what has happened” by summarizing data using innovative methods and analysing past
data using simple queries. Analysing past data can provide insights that help organizations make appropriate decisions.

Trends obtained through descriptive analytics can be used to derive actionable items. For example, when Hurricane Charley struck
the U.S. in 2004, Linda M. Dillman, Walmart’s Chief Information Officer, wanted to understand the purchasing behaviour of its
customers (Hays, 2004). Using data mining techniques, Walmart found that demand for strawberry Pop-Tarts rose to more than 7 times
its normal sales rate during the hurricane, and that the top-selling pre-hurricane item was beer. Walmart used these insights
when the next hurricane, Hurricane Frances, hit the U.S. in August and September 2004. Most of the items that Walmart
predicted would be in demand sold quickly. Although the high pre-hurricane demand for beer could have been predicted intuitively, the demand for
strawberry Pop-Tarts was a complete surprise.
DATA TYPES AND SCALES
Structured and Unstructured Data

At a macro level, data can be classified as structured or unstructured. Structured data is arranged in matrix form, with
labelled rows and columns. Any data that is not originally in matrix form is unstructured data. Examples include
e-mails, click streams, textual data, images (photos and images generated by medical devices), log data, and videos. Machine-generated data
such as satellite images, magnetic resonance imaging (MRI), electrocardiogram (ECG) and thermography images are further examples of
unstructured data.

Cross-sectional, Time Series, and Panel Data

Another important classification of data is based on how the data is collected. On this basis, data is grouped into the
following three classes:

1. Cross-Sectional Data: Data collected on many variables of interest at the same point or period of time is called cross-sectional data. For
example, consider data on movies released during 2017, such as budget, box-office collection, actors, directors and genre.

2. Time Series Data: Data collected on a single variable, such as demand for smartphones, over several time intervals (weekly,
monthly, etc.) is called time series data.

3. Panel Data: Data collected on several variables (multiple dimensions) over several time intervals is called panel data (also known as
longitudinal data). An example of panel data is data collected on variables such as gross domestic product (GDP), Gini index, and unemployment
rate for several countries over several years.
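The three classes differ mainly in how the records are indexed. The following minimal Python sketch uses hypothetical toy values (not real statistics) and a hypothetical helper, `panel_dimensions`, to show one way of holding each shape in plain data structures.

```python
# Hypothetical toy values for illustration only; they are not real statistics.

# Cross-sectional: several variables recorded for many movies in the same year (2017).
cross_sectional_2017 = [
    {"movie": "Movie A", "budget": 50, "box_office": 120, "genre": "Drama"},
    {"movie": "Movie B", "budget": 80, "box_office": 60, "genre": "Action"},
]

# Time series: a single variable (smartphone demand) observed over successive months.
smartphone_demand = {"2024-01": 1200, "2024-02": 1350, "2024-03": 1280}

# Panel (longitudinal): several variables for several countries over several years,
# keyed by (country, year).
panel = {
    ("Country X", 2020): {"gdp": 2.1, "gini": 35.0, "unemployment": 7.0},
    ("Country X", 2021): {"gdp": 2.3, "gini": 34.8, "unemployment": 6.5},
    ("Country Y", 2020): {"gdp": 1.2, "gini": 40.1, "unemployment": 5.2},
    ("Country Y", 2021): {"gdp": 1.3, "gini": 39.7, "unemployment": 4.9},
}


def panel_dimensions(panel_data):
    """Return (number of entities, number of time periods) for a panel keyed by (entity, period)."""
    entities = {entity for entity, _ in panel_data}
    periods = {period for _, period in panel_data}
    return len(entities), len(periods)


# Fixing the year gives a cross-section; fixing the country gives a time series.
cross_section_2021 = {c: v for (c, y), v in panel.items() if y == 2021}
country_x_unemployment = {y: v["unemployment"] for (c, y), v in panel.items() if c == "Country X"}

print(panel_dimensions(panel))  # (2, 2)
```

Keying the panel by (entity, period) makes it easy to slice the same data either as a cross-section or as a time series.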
TYPES OF DATA MEASUREMENT SCALES

Structured data can be either numeric or alphanumeric and may follow different scales of measurement (levels of measurement). It is
important to understand the measurement scale of each variable in the data, because the specification of analytics models
such as regression may depend on it.

Nominal Scale (Qualitative Data)

A nominal scale applies to variables whose values are essentially names (qualitative data). Such variables are also known as categorical variables. For example,
marital status (single, married, divorced) and industry type (manufacturing, healthcare, banking and finance) are measured
on a nominal scale. During data collection, it is common to assign a numerical code to each category of a nominal variable. These codes are labels only and carry no order or magnitude.
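The sketch below illustrates numeric coding of a nominal variable. It assumes hypothetical survey responses and assigns codes in order of first appearance, which is an arbitrary choice. It also shows that frequency counts (and therefore the mode) are the meaningful summaries for nominal data.

```python
from collections import Counter

# Hypothetical survey responses for a nominal variable.
marital_status = ["single", "married", "married", "divorced", "single", "married"]

# Assign numeric codes in order of first appearance. The codes are arbitrary labels:
# "married" = 2 is not "more" than "single" = 1, so averaging the codes is meaningless.
codes = {}
for status in marital_status:
    codes.setdefault(status, len(codes) + 1)
coded = [codes[status] for status in marital_status]

# Frequencies, and hence the mode, are meaningful summaries for nominal data.
counts = Counter(marital_status)
mode_category, mode_count = counts.most_common(1)[0]

print(codes)          # {'single': 1, 'married': 2, 'divorced': 3}
print(coded)          # [1, 2, 2, 3, 1, 2]
print(mode_category)  # married
```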

Ordinal Scale

An ordinal scale applies to variables whose values are drawn from an ordered set and recorded in order of magnitude.
For example, many surveys use a Likert scale. A Likert scale is finite (usually a 5-point scale), and the data collector
defines the order of preference. For example, assume that feedback on a training program is collected using a 5-point Likert scale in
which 1 = Poor, 2 = Fair, 3 = Good, 4 = Very Good, and 5 = Excellent.
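Only the order of the Likert categories is defined, so order-based summaries are the natural choice. Here is a minimal sketch with hypothetical responses, using Python's standard `statistics` module.

```python
import statistics

# Order of preference defined by the data collector (5-point Likert scale).
LIKERT_LABELS = {1: "Poor", 2: "Fair", 3: "Good", 4: "Very Good", 5: "Excellent"}

# Hypothetical feedback scores from seven participants.
responses = [4, 5, 3, 4, 2, 5, 4]

# Ordering is meaningful, so an order-based summary such as the median is valid.
# median_low always returns an observed category, even for an even number of responses.
median_score = statistics.median_low(responses)
print(median_score, LIKERT_LABELS[median_score])  # 4 Very Good

# The gaps between categories need not be equal ("Poor" to "Fair" may not equal
# "Good" to "Very Good"), which is why the median is often preferred to the mean here.
```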
Interval Scale

An interval scale applies to variables measured in equal intervals but without a true zero. Temperature measured in
degrees Celsius (°C) and intelligence quotient (IQ) scores are examples of interval-scale variables. Because the zero point is arbitrary, ratios do not make sense on an interval scale. For
example, 40°C is not twice as hot as 20°C. Similarly, a person with an IQ score of 160 is not twice as smart as a person with an IQ
score of 80. Differences, however, are meaningful: 40°C is 20°C warmer than 20°C, and an IQ score of 160 is 80 points higher than an IQ score of 80.
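The temperature example can be checked numerically. The Kelvin scale has an absolute zero, so ratios of Kelvin temperatures are meaningful. Converting 40°C and 20°C to Kelvin shows that the "twice as hot" reading of the Celsius ratio is wrong, while the 20-degree difference is the same on both scales. `celsius_to_kelvin` is a hypothetical helper written for this sketch.

```python
def celsius_to_kelvin(celsius):
    """Kelvin has a true (absolute) zero, so ratios of Kelvin temperatures are meaningful."""
    return celsius + 273.15


hot, mild = 40.0, 20.0

# Differences are meaningful on an interval scale and agree across Celsius and Kelvin.
difference_c = hot - mild                                      # 20.0
difference_k = celsius_to_kelvin(hot) - celsius_to_kelvin(mild)  # also 20 (up to rounding)

# The Celsius ratio is 2.0, but it does not mean "twice as hot".
naive_ratio = hot / mild

# The physically meaningful ratio uses Kelvin: roughly 1.068.
true_ratio = celsius_to_kelvin(hot) / celsius_to_kelvin(mild)

print(difference_c, naive_ratio, round(true_ratio, 3))
```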

Ratio Scale

Any variable for which ratios can be computed and are meaningful (because the scale has a true zero) is measured on a ratio scale. Most variables are of this type,
for example demand for a product, market share of a brand, sales and salary. If Ms Hawai Sundari’s salary is 40,000 per month
and Ms Dawai Sundari’s salary is 90,000 per month, then we can say that Dawai Sundari earns 2.25 times the salary of Hawai
Sundari.
POPULATION AND SAMPLE

Population is the set of all possible observations (often called cases, records, subjects or data points) for a given problem
context. The population can be very large in many cases. For example, in 2014, close to 834.08 million
people were eligible to vote in the Indian general elections (Source: Election Commission of India). The population
of voters in 2014 was therefore 834.08 million, which included all eligible voters. During every election, media and other
organizations collect data through opinion polls to predict the likely winner (and they rarely get it right because of
the complexities of collecting the right sample). It is practically impossible to collect data from
all 834.08 million eligible voters about their choice of candidate. Opinion polls are therefore based on the opinions expressed by a
subset of voters called a sample.

Population (also known as the universal set) is the set of all possible data for a given context, whereas a sample is a subset
taken from the population. In many analytical problems, we make inferences about the population based on sample data.
Sampling (the process of selecting observations from the population) poses many challenges. A poorly chosen sample
may result in bias and incorrect inferences about the population.
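To make the population/sample distinction concrete, the sketch below draws a simple random sample (one of several sampling designs) from a hypothetical population of 10,000 voters. It then uses the sample share as an estimate of the population share. The population composition and sample size are assumptions for illustration. In practice the population share is unknown.

```python
import random

# Hypothetical population: 10,000 voters, 52% of whom prefer candidate "A".
population = ["A"] * 5200 + ["B"] * 4800

rng = random.Random(42)  # fixed seed so the example is reproducible

# Simple random sample without replacement: every voter is equally likely to be chosen.
sample = rng.sample(population, k=500)

population_share_a = population.count("A") / len(population)  # 0.52 (usually unknown)
sample_share_a = sample.count("A") / len(sample)              # estimate from the sample

print(population_share_a, sample_share_a)
```

A biased selection, such as polling only one region, could produce a sample share far from 0.52 even with a much larger sample. This is the sampling bias described above.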
MEASURES OF CENTRAL TENDENCY

Measures of central tendency describe the data using a single representative value. Mean, median and mode are
the three measures of central tendency and are frequently used to compare different data sets. They help users
summarize and comprehend the data.
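The three measures can be computed with Python's standard `statistics` module. The monthly sales figures below are hypothetical and include one extreme value. That value shows how the mean is pulled away from the median, while the median and mode are unaffected.

```python
import statistics

# Hypothetical monthly sales (units), including one extreme month.
monthly_sales = [12, 15, 15, 18, 20, 22, 95]

mean_sales = statistics.mean(monthly_sales)      # 197 / 7, about 28.14
median_sales = statistics.median(monthly_sales)  # middle value of the sorted data: 18
mode_sales = statistics.mode(monthly_sales)      # most frequent value: 15

print(round(mean_sales, 2), median_sales, mode_sales)
```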

PERCENTILE, DECILE, AND QUARTILE

Percentile, decile and quartile are frequently used to identify the position of an observation in a data set. Percentile scores are frequently
used in education to identify the position of a student within a group. Another frequent application of percentiles is the percentile life used
in asset management.
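Quartiles, deciles and percentiles are the cut points that split sorted data into 4, 10 and 100 equal parts. The sketch below uses `statistics.quantiles`, whose default is the "exclusive" method. Other software may use a different interpolation convention and give slightly different values. `percentile_rank` is a hypothetical helper that uses one common definition: the percentage of observations at or below a value. The exam scores are hypothetical.

```python
import statistics

# Hypothetical exam scores for 11 students.
scores = [55, 60, 62, 65, 70, 72, 75, 80, 85, 90, 95]

quartiles = statistics.quantiles(scores, n=4)      # 3 cut points: Q1, Q2 (median), Q3
deciles = statistics.quantiles(scores, n=10)       # 9 cut points
percentiles = statistics.quantiles(scores, n=100)  # 99 cut points


def percentile_rank(values, x):
    """Percentage of observations at or below x (one of several conventions)."""
    return 100 * sum(v <= x for v in values) / len(values)


print(quartiles)                    # [62.0, 72.0, 85.0]
print(percentile_rank(scores, 80))  # about 72.7: the student scoring 80 is near the 73rd percentile
```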

MEASURES OF VARIATION

One of the primary objectives of analytics is to understand the variability in the data. Predictive analytics techniques such as regression
attempt to explain variation in the outcome variable (Y) using predictor variables (X). Variability in the data is measured using the
following measures:

1. Range

2. Inter-Quartile Distance (IQD)

3. Variance

4. Standard Deviation
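The four measures of variation can be computed as follows for a small hypothetical data set. In this sketch the IQD is taken as Q3 minus Q1, with quartiles from `statistics.quantiles` (default "exclusive" method). `statistics.variance` and `statistics.stdev` are the sample versions, which divide by n - 1.

```python
import statistics

# Hypothetical observations.
data = [4, 8, 6, 5, 3, 7, 9, 6]

data_range = max(data) - min(data)            # 9 - 3 = 6

q1, _, q3 = statistics.quantiles(data, n=4)   # 4.25 and 7.75 with the default method
iqd = q3 - q1                                 # 3.5

sample_variance = statistics.variance(data)   # sum of squared deviations (28) / (n - 1) = 4
sample_std = statistics.stdev(data)           # square root of the variance = 2.0

print(data_range, iqd, sample_variance, sample_std)
```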

MEASURES OF SHAPE: SKEWNESS AND KURTOSIS

Skewness is a measure of symmetry or lack of symmetry.

Kurtosis is another measure of shape. It describes the tails of the data distribution, that is, whether they are heavy or light.
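One common way to quantify both measures uses central moments, where m_k is the mean of (x minus the mean) raised to the power k. Skewness is g1 = m3 / m2^1.5, and excess kurtosis is g2 = m4 / m2^2 - 3, which is 0 for a normal distribution. The sketch below implements these moment-based versions. Many statistical packages report bias-adjusted variants instead, so their numbers can differ slightly. The helper names and the data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: mean of (x - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)


def skewness(values):
    """Moment-based skewness g1 = m3 / m2 ** 1.5 (0 for a symmetric distribution)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis g2 = m4 / m2 ** 2 - 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]  # one long right tail

print(skewness(symmetric), excess_kurtosis(symmetric))  # 0.0 and about -1.3 (light tails)
print(skewness(right_skewed) > 0)                       # True: positively (right) skewed
```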
DATA VISUALIZATION
Data visualization is an integral part of descriptive analytics and gives decision makers useful insights. Many charts,
such as the histogram, bar chart, pie chart and box plot, help data scientists visualize data.

Data visualization is crucial for making sense of data, especially large volumes of data, for the following reasons:

1. Simplifies Complex Data

● Visual representations (like charts, graphs, maps) help convert complex datasets into easily understandable formats. This is
particularly helpful when dealing with large amounts of data that are hard to interpret just by looking at raw numbers.

2. Faster Decision Making

● Well-crafted visualizations allow decision-makers to quickly grasp insights and trends, leading to quicker and more informed
decisions. For example, sales trends can be identified at a glance from a line graph.

3. Identifies Trends and Patterns

● Patterns, correlations, or anomalies that are otherwise hard to see become apparent when data is visualized. Tools like scatter
plots or line charts reveal trends over time or relationships between variables.
4. Improves Data Storytelling
● Visualization helps in telling a story with data, making it more engaging and easier for audiences to remember key insights. It allows data
to be presented in a narrative format that resonates with stakeholders.
5. Enhances Data Accuracy
● By visualizing data, it becomes easier to spot errors, inconsistencies, or outliers that could skew analysis, helping ensure data accuracy and
reliability.
6. Supports Predictive Analytics
● Forecasts are easier to understand with visual aids such as time series charts, which can present historical data alongside
predictions about future outcomes.
7. Engages Audience
● Visual elements are more engaging than plain text or numbers. They can hold the attention of an audience and make presentations more
impactful, ensuring better communication of key findings.
8. Better Comparison
● Charts and graphs make it easier to compare categories, time periods, or variables. Bar charts, pie charts, and histograms are well suited
to comparing sales, performance, or demographic data.
9. Facilitates Collaboration
● When multiple teams are involved in decision-making, visualizing data ensures that everyone, regardless of expertise, can understand the
data and contribute effectively.
Histogram

A histogram is a visual representation of data that can be used to assess the probability distribution (frequency
distribution) of the data. It shows the frequency distribution of the data arranged in consecutive, non-overlapping intervals.
Histograms are created for continuous (numerical) data. The following steps are used to construct a histogram:

1. Divide the data into a finite number of non-overlapping, consecutive bins (intervals). The total number of bins to be
used can be calculated using Eqs

2. Count the number of observations from the data that fall under each bin (interval).

3. Create a frequency distribution plot (bins on the horizontal axis and frequency on the vertical axis) using the information obtained in steps 1 and 2.
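The three construction steps can be sketched in Python as follows. As an assumption, the sketch chooses the number of bins with Sturges' rule, k = ceil(log2(n)) + 1, which is one common choice among the bin-count formulas in the literature. It uses equal-width, half-open bins and draws the result sideways as text so that it needs no plotting library. The helper names and the ages are hypothetical.

```python
import math


def sturges_bins(n):
    """Sturges' rule (an assumption here): k = ceil(log2(n)) + 1 bins for n observations."""
    return math.ceil(math.log2(n)) + 1


def histogram_counts(values, num_bins):
    """Steps 1 and 2: equal-width, consecutive, non-overlapping bins and their counts."""
    low, high = min(values), max(values)
    if high == low:  # all values identical: a single bin holds everything
        return [low, high], [len(values)]
    width = (high - low) / num_bins
    edges = [low + i * width for i in range(num_bins + 1)]
    counts = [0] * num_bins
    for x in values:
        # Bins are half-open [left, right); the maximum value is placed in the last bin.
        index = min(int((x - low) / width), num_bins - 1)
        counts[index] += 1
    return edges, counts


# Hypothetical ages of 12 customers.
ages = [22, 25, 27, 29, 31, 33, 34, 35, 38, 41, 45, 52]
edges, counts = histogram_counts(ages, sturges_bins(len(ages)))

# Step 3, drawn sideways as text: one row per bin, bar length equal to the frequency.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:4.0f}-{right:4.0f} | {'#' * count}")
```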

A histogram is very useful because it helps the data scientist identify the following:

1. The shape of the distribution, which helps in assessing the probability distribution of the data.

2. Measures of central tendency such as median and mode.

3. Measures of variability such as spread.

4. Measure of shape such as skewness.

Histograms are also useful for identifying the presence of outliers. One of the first steps in constructing a histogram is
identifying the number of bins. There are many different formulas in the literature, and one of the simplest formulas is
Bar Chart
A bar chart uses rectangular bars to compare different categories. The length of each bar represents the value of the category. Bars can
be oriented vertically or horizontally.

A bar chart is a frequency chart for a qualitative (categorical) variable. Histograms cannot be used when the variable is
qualitative. A bar chart can be used to identify the most frequent and least frequent categories in a data set.

How it's useful in Data Analytics:

Bar charts are great for comparing quantities across different categories, helping analysts easily identify the highest and lowest values.

Example:

You want to compare the sales of different products: Product A, B, and C.

Each product has a bar showing its sales volume.
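The sketch below builds the product-sales example as a horizontal bar chart drawn in text, where bar length is proportional to each category's value. It then reuses the same function for category frequencies of a qualitative variable. The sales figures, genres, scale factor and the helper `text_bar_chart` are all hypothetical.

```python
from collections import Counter


def text_bar_chart(values_by_category, scale=1.0):
    """Render one horizontal bar per category; bar length is proportional to its value."""
    width = max(len(name) for name in values_by_category)
    lines = []
    for name, value in values_by_category.items():
        lines.append(f"{name:<{width}} | {'#' * round(value * scale)} {value}")
    return "\n".join(lines)


# Hypothetical unit sales for the three products in the example.
sales = {"Product A": 120, "Product B": 80, "Product C": 150}
print(text_bar_chart(sales, scale=0.1))
print("Highest sales:", max(sales, key=sales.get))

# For a qualitative variable, the bar heights are simply the category frequencies.
genres = ["Drama", "Comedy", "Drama", "Action", "Drama", "Comedy"]
print(text_bar_chart(Counter(genres)))
```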

Pie Chart

A pie chart is a circular chart, used mainly for categorical data, that displays the proportion (percentage) of each category in the data set
as a sector of a circle. The pie chart for the movie genre based on the Bollywood movie
data set is shown in Figure

Example:

● If you want to show how a company's revenue is split across different departments (e.g., marketing, sales, R&D), each slice of the pie
represents that department's percentage of total revenue.

How it's useful in Data Analytics:

● Pie charts are helpful in visualizing the relative proportions of parts to the whole, making distributions easier to understand.
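Each sector's angle is its share of the total multiplied by 360 degrees. The sketch below computes the percentage and sector angle for the revenue example, using hypothetical department figures and a hypothetical helper, `pie_sectors`.

```python
def pie_sectors(values_by_category):
    """Return (category, percentage of total, sector angle in degrees) for each category."""
    total = sum(values_by_category.values())
    return [
        (name, 100 * value / total, 360 * value / total)
        for name, value in values_by_category.items()
    ]


# Hypothetical revenue by department (any consistent currency unit).
revenue = {"Marketing": 25, "Sales": 50, "R&D": 25}

for name, pct, angle in pie_sectors(revenue):
    print(f"{name}: {pct:.1f}% of revenue, {angle:.1f} degrees")
```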

Scatter Plot
A scatter plot is a plot of two variables that helps data scientists understand whether there is any relationship between them. The
relationship could be linear or non-linear. A scatter plot is also useful for assessing the strength of the relationship and for finding
any outliers in the data.

Example:

● If you plot the height and weight of students, each student is represented by a point on the scatter plot, showing the relationship
between height and weight.

How it's useful in Data Analytics:

● Scatter plots are useful for identifying correlations between variables and spotting trends or clusters in data.
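A scatter plot is often paired with the Pearson correlation coefficient, a complementary number (not the plot itself) that measures the strength of the linear relationship on a scale from -1 to 1. The sketch below computes it for hypothetical height and weight data. It also shows a perfect non-linear relationship whose correlation is 0, which is why plotting the data first matters.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient: strength of the *linear* relationship, in [-1, 1]."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical heights (cm) and weights (kg) of five students.
heights = [150, 160, 165, 170, 180]
weights = [50, 56, 61, 66, 75]
print(round(pearson_r(heights, weights), 3))  # close to 1: strong positive linear relationship

# A perfect but non-linear (U-shaped) relationship: y = x ** 2.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
print(pearson_r(xs, ys))  # 0.0: no linear relationship, even though y depends on x exactly
```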
Coxcomb Chart

A Coxcomb chart (also known as a polar area chart or rose diagram) is an extension of the pie chart made popular by Florence Nightingale (Lewi, 2006). In a
Coxcomb chart, the area of each sector represents the magnitude of its category. The main difference between a regular pie chart and a Coxcomb chart is that
in a pie chart every sector has the same radius and the angle varies, whereas in a Coxcomb chart the sectors span equal angles and the radius of each sector is adjusted
so that its area reflects the magnitude.
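The radius adjustment follows from the area of a circular sector, A = r²θ/2. With every sector spanning the same angle θ = 2π/k for k categories, the radius needed for a sector of area equal to a value v is r = sqrt(2v/θ). The sketch below computes these radii for hypothetical counts (not Nightingale's actual figures). `coxcomb_radii` is a hypothetical helper.

```python
import math


def coxcomb_radii(values):
    """Radii for equal-angle sectors whose areas equal the given values.

    A sector with angle theta (radians) and radius r has area r**2 * theta / 2,
    so r = sqrt(2 * value / theta).
    """
    theta = 2 * math.pi / len(values)  # every sector spans the same angle
    return [math.sqrt(2 * value / theta) for value in values]


# Hypothetical counts for three categories (not Nightingale's actual figures).
counts = {"Preventable diseases": 400, "Wounds": 100, "Other causes": 25}
radii = coxcomb_radii(list(counts.values()))

for name, r in zip(counts, radii):
    print(f"{name}: radius {r:.2f}")
```

Because area grows with the square of the radius, a category four times larger gets a sector only twice as long. This keeps large categories from dominating the chart visually.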

Florence Nightingale collected data from the Crimean War (fought between Britain and France on one side and Russia on the other) on the causes of
mortality among soldiers. She classified the causes into three categories:

1. Preventable diseases

2. Wounds sustained in the war

3. Other causes

In Figure (originally prepared by Florence Nightingale), the largest area of the chart corresponds to the cause ‘preventable diseases’.

How it's useful in Data Analytics:

● Coxcomb charts are useful when you want to compare categories over time or when pie charts might not visually differentiate categories
well.
Box Plot (or Box and Whisker Plot)
A box plot (also known as a box and whisker plot) is a graphical representation of numerical data that can be used to understand the variability of
the data and the existence of outliers. A box plot is constructed from the following descriptive statistics:

1. Lower quartile (1st quartile), median and upper quartile (3rd quartile).

2. Lowest and highest values.

3. Inter-quartile range (IQR).

Example:

● You want to compare test scores in a class. A box plot would show the median score, the range of the middle 50% of scores,
and any outliers.

How it's useful in Data Analytics:

● Box plots are helpful for showing the spread and identifying outliers in data, giving a quick overview of the data distribution.
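The statistics behind a box plot can be computed directly. This sketch uses `statistics.quantiles` (default "exclusive" method) for the quartiles. For flagging outliers it assumes Tukey's common convention: points more than 1.5 × IQR beyond the quartiles are outliers, and the whiskers end at the most extreme remaining points. Plotting tools may use other conventions. The test scores and the helper `box_plot_summary` are hypothetical.

```python
import statistics


def box_plot_summary(values, whisker_factor=1.5):
    """Quartiles, IQR, whisker ends and outliers (Tukey's 1.5 x IQR convention assumed)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr
    inliers = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1],
        "iqr": iqr,
        "whisker_low": inliers[0], "whisker_high": inliers[-1],
        "outliers": outliers,
    }


# Hypothetical test scores for a class of 11 students.
scores = [35, 62, 64, 68, 70, 71, 73, 75, 78, 80, 88]
summary = box_plot_summary(scores)
print(summary)  # q1 = 64.0, median = 71.0, q3 = 78.0, IQR = 14.0; 35 lies below the lower fence of 43
```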
Treemap

A treemap is a hierarchical map made up of nested rectangles. It is frequently used in business intelligence reports to help
organizations understand data hierarchically. To construct a treemap, the data should be hierarchical, with several levels. The size
and colour of the rectangles are used to describe and differentiate characteristics of the data.

● A treemap is a chart that displays data in nested rectangles, where the size of each rectangle is proportional to the value it
represents.

Example:

● If you want to show the market share of different smartphone brands, each brand would be a rectangle, and the size would
indicate its market share.

How it's useful in Data Analytics:

● Treemaps are excellent for displaying hierarchical data or comparing the relative sizes of different categories in a clear and visual way.
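The sketch below shows one simple way to compute treemap rectangles, the "slice-and-dice" layout. Each level of the hierarchy splits its parent rectangle into strips whose areas are proportional to their values, and the cutting direction alternates between levels. Many tools instead use other layouts, such as squarified treemaps, that produce more square-like rectangles. The market-share values, brand and model names, and helper functions are hypothetical.

```python
def node_size(node):
    """A node is either a number (leaf size) or a dict mapping child names to nodes."""
    if isinstance(node, dict):
        return sum(node_size(child) for child in node.values())
    return node


def slice_and_dice(sizes, x, y, width, height, vertical):
    """Split one rectangle into strips with areas proportional to sizes."""
    total = sum(sizes)
    rects, offset = [], 0.0
    for size in sizes:
        share = size / total
        if vertical:  # cut along the width
            rects.append((x + offset, y, width * share, height))
            offset += width * share
        else:         # cut along the height
            rects.append((x, y + offset, width, height * share))
            offset += height * share
    return rects


def treemap(node, x, y, width, height, vertical=True, path=()):
    """Return [(path, (x, y, width, height))] for every leaf; direction alternates by level."""
    if not isinstance(node, dict):
        return [(path, (x, y, width, height))]
    names = list(node)
    child_rects = slice_and_dice([node_size(node[n]) for n in names], x, y, width, height, vertical)
    leaves = []
    for name, (cx, cy, cw, ch) in zip(names, child_rects):
        leaves.extend(treemap(node[name], cx, cy, cw, ch, not vertical, path + (name,)))
    return leaves


# Hypothetical market shares (%), with one brand broken down by model.
market = {"Brand A": {"Model A1": 20, "Model A2": 10}, "Brand B": 40, "Brand C": 30}
for path, (x, y, w, h) in treemap(market, 0, 0, 100, 50):
    print(" / ".join(path), f"x={x:.1f} y={y:.1f} w={w:.1f} h={h:.1f}")
```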
