BUSINESS INTELLIGENCE
PROGRAM
Lesson 7: Business Data Analytics Methodologies
1. Business Intelligence Analysis
The better you understand the data, the better the
results from the modeling or mining process will be.
Descriptive Analysis: Take it easy!
Descriptive Analysis: an important first step for conducting statistical analyses!
A descriptive analysis gives you an idea of the distribution of your data and helps you to:
1. Detect outliers and typos.
2. Identify associations among variables.
3. Get ready to conduct further statistical analyses.
Descriptive Analysis: divided into 2 types by variable!
1. Descriptive analysis for each individual variable.
2. Descriptive analysis for combinations of variables.
Descriptive Analysis: divided into 3 types by purpose!
1. Analysis of differences.
2. Analysis of association.
3. Correlation analysis.
Descriptive Analysis: Best Approach!???
Picture of TowardsDataScience.com
First and foremost:
decide on the types of your variables, then choose the descriptive analysis approach that fits each variable type.
Picture of Wikilearn.com
For quantitative variables:
It is a good idea to first create:
a histogram
a box-and-whisker plot
to get an idea of the shape of the distribution.
If the shape is symmetric, calculate and present the mean and standard deviation; if the shape is skewed, calculate and present the median and quartiles.
You could also calculate and present the min and max values. These descriptive analyses also help you identify outlying and improbable values so that you can double-check for data entry errors.
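The advice above can be sketched in a few lines of Python. This is only an illustration, not the course's own tooling: the function name `summarize`, the sample lists, and the skewness cutoff of 0.5 used to decide between "symmetric" and "skewed" are all assumptions made for this example.

```python
import statistics


def summarize(values, skew_cutoff=0.5):
    """Pick summary statistics that suit the shape of the data.

    skew_cutoff is an assumed threshold for calling the data 'symmetric'.
    """
    n = len(values)
    mean = statistics.fmean(values)
    # Simple moment-based skewness (population version, no small-sample adjustment).
    skew = sum((x - mean) ** 3 for x in values) / n / statistics.pstdev(values) ** 3

    # Min and max are always reported; they help flag improbable values.
    summary = {"min": min(values), "max": max(values), "skewness": skew}
    if abs(skew) <= skew_cutoff:
        # Roughly symmetric: mean and standard deviation.
        summary["mean"] = mean
        summary["sd"] = statistics.stdev(values)
    else:
        # Skewed: median and quartiles.
        q1, median, q3 = statistics.quantiles(values, n=4)
        summary.update({"q1": q1, "median": median, "q3": q3})
    return summary


symmetric = [48, 50, 51, 49, 52, 50, 47, 53]   # hypothetical data
skewed = [10, 12, 11, 13, 12, 11, 95]          # 95 looks like an outlier or typo

print(summarize(symmetric))
print(summarize(skewed))
```

In the second list the maximum stands far away from the quartiles, which is the kind of value worth checking against the original data entry.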
For quantitative variables:
http://statulator.com/descriptive.html
For categorical variables:
Create frequency tables and present them in:
bar charts
pie charts or
doughnut charts.
These approaches are sufficient to get an idea of the distributions of the variables and to spot typos and other data entry errors.
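A frequency table is easy to build by hand, which makes it a quick check before charting. The sketch below uses Python's standard library; the `regions` list and the function name `frequency_table` are hypothetical, chosen only to show how a rare category can reveal a typo.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count, share) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(category, n, n / total) for category, n in counts.most_common()]


# Hypothetical data with one misspelled category ("Nroth").
regions = ["North", "South", "North", "East", "Nroth", "South", "North"]

table = frequency_table(regions)
for category, count, share in table:
    print(f"{category:<6} {count:>3} {share:6.1%}")

# Categories that occur only once are candidates for a data entry check.
rare = [category for category, count, _ in table if count == 1]
print("Check these:", rare)
```

The same counts are what a bar, pie, or doughnut chart would display; the table just makes the suspicious entries easier to list.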
For qualitative variables:
Descriptive analysis for a combination of two variables
               Quantitative   Qualitative
Quantitative   A              B
Qualitative    C              D
(A): Prepare a scatter plot.
(B and C): Calculate summary statistics of the quantitative variable classified by the qualitative variable, and prepare box-and-whisker plots of the quantitative variable by the qualitative variable.
(D): Prepare a contingency table.
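Cases (B and C) and (D) can be illustrated without any plotting. The following is a minimal sketch using only the standard library; the `records` data, the field names, and both helper functions are assumptions invented for this example.

```python
from collections import Counter, defaultdict
import statistics

# Hypothetical records: two qualitative fields and one quantitative field.
records = [
    {"segment": "Retail", "channel": "Online", "revenue": 120},
    {"segment": "Retail", "channel": "Store", "revenue": 80},
    {"segment": "Retail", "channel": "Online", "revenue": 100},
    {"segment": "Corporate", "channel": "Store", "revenue": 300},
    {"segment": "Corporate", "channel": "Online", "revenue": 260},
    {"segment": "Corporate", "channel": "Store", "revenue": 340},
]


def contingency_table(rows, row_key, col_key):
    """Case (D): count each combination of two qualitative variables."""
    return Counter((r[row_key], r[col_key]) for r in rows)


def summary_by_group(rows, group_key, value_key):
    """Cases (B and C): summarize a quantitative variable per category."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_key]].append(r[value_key])
    return {
        group: {
            "n": len(vals),
            "mean": statistics.fmean(vals),
            "median": statistics.median(vals),
        }
        for group, vals in groups.items()
    }


print(contingency_table(records, "segment", "channel"))
print(summary_by_group(records, "segment", "revenue"))
```

The per-group numbers are the same ones a side-by-side box-and-whisker plot would visualize.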
Lesson 1: Diagnostics Analytics Fundamentals
2. Types of Analytics
Qualitative
There is an orientation or attitude that is found in the best problem solvers that reflects an active openness
to new ideas and data, and a suspicion of standard or conventional answers.
Business Intelligence Demo
Comparison of BI tools
Business Intelligence
Preview Data before Analysis
3. DESCRIPTIVE ANALYSIS lessons
1. Business Intelligence Analysis
2. Descriptive Functions in EXCEL
3. Descriptive Analytics
Descriptive Analytics
1. Business Intelligence Analytics
Purpose of analysis
Using your imagination and analytical thinking to ask interesting questions about the many data sets available to you. We will supply you with the tools to answer these questions.
2. Descriptive Functions in EXCEL
3. Descriptive Analytics – Categorical Variable
1. Frequency
3. Descriptive Analytics – Numerical Variable
2. Central
Comparison of the different scales of Measurement
3. Descriptive Analytics – Categorical Variable
2. Mode
Data types
Nominal: categorical data with no inherent order, e.g. gender, type of school (categories).
Ordinal: data with a meaningful order. An ordinal number tells the position of something in a list, such as 1st, 2nd, 3rd, 4th, 5th, etc.
Interval: e.g. 6-7, 8-9, 100-200.
Ratio: e.g. Number of Applications / Number of Loans.
3. Descriptive Analytics – Numerical Variable
Descriptive Analysis
• Analyzes and describes the features of a data set.
• It deals with the summarization of information.
• In descriptive analysis, we work with past data to draw conclusions and present the data in the form of dashboards.
• A data set may have many observations; a summary set of numbers that describes those observations is called a descriptive analysis.
• In business, descriptive analysis is used to determine Key Performance Indicators (KPIs) that evaluate the performance of the business.
Ex: Cash flow analysis, sales and revenue reports, and performance analysis are common examples of descriptive analytics.
1. What is the shape of the distribution? Do the values tend to fall into some recognizable pattern?
2. What is the location of the variable? That is, where are the numbers centered?
3. How much variation is involved? Are the values widely dispersed or are they all fairly close in value?
Descriptive Statistic Analysis
A set of numbers that "describe" a data set.
3 categories of descriptive statistics:
1. Measures of central tendency / averages: some "central" aspect of the data.
2. Measures of dispersion / spread: how "spread out" or "dispersed" the data is.
3. Measures of shape: the data can be plotted as a histogram to get a general idea of its shape, or distribution.
1. Frequency
2. Central
Exercise
1. Formula
2. Descriptive Statistics
3. Power Pivot
3. Dispersion
Variability is usually the enemy. Being close to a target value on average is not good enough if there is a lot of variability around the target.
Usefulness of Standard Deviation
Variability is an important property of any numerical variable, and there are several measures for quantifying the amount of variability.
Of these, standard deviation is by far the most frequently quoted measure. It is measured in the same units as the variable, it has a long
tradition, and, at least for many data sets, it obeys the empirical rules discussed here. These empirical rules give a very specific meaning
to standard deviation.
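One way to see what the empirical rules mean is to count how much of a data set falls within 1, 2, and 3 standard deviations of the mean. For bell-shaped data the commonly quoted shares are roughly 68%, 95%, and 99.7%. The sketch below checks this on simulated data; the mean of 400, standard deviation of 50, sample size, and random seed are arbitrary assumptions for illustration.

```python
import random
import statistics


def share_within(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    inside = sum(1 for x in values if abs(x - mean) <= k * sd)
    return inside / len(values)


# Simulated bell-shaped data (assumed parameters, fixed seed for repeatability).
random.seed(42)
sample = [random.gauss(400, 50) for _ in range(10_000)]

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(sample, k):.1%}")
```

Running the same function on a strongly skewed data set would typically give shares that drift away from these values.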
4. Shape – Standard Error
Standard Error, abbreviated SE.
The standard error (SE) is a statistical term that measures how accurately a sample distribution represents a population, using the standard deviation. In statistics, a sample mean generally differs from the actual mean of the population; the standard error of the mean describes the typical size of this difference.
The smaller the standard error, the more representative the sample is of the population.
The relationship between standard error and standard deviation is as follows: for a given sample size, the standard error equals the standard deviation divided by the square root of the sample size. The standard error is inversely related to sample size: the larger the sample, the smaller the standard error, because the statistic will be closer to the true value.
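The relationship between standard error and standard deviation described above can be written as a short calculation. This is a minimal sketch; the function names and the sample values are assumptions for illustration.

```python
import math
import statistics


def se_from_sd(sd, n):
    """Standard error of the mean: standard deviation / sqrt(sample size)."""
    return sd / math.sqrt(n)


def standard_error(values):
    """Standard error of the mean computed directly from a sample."""
    return se_from_sd(statistics.stdev(values), len(values))


# With the same standard deviation, four times the sample size halves the SE.
print(se_from_sd(10, 25))
print(se_from_sd(10, 100))

# Hypothetical sample of monthly revenue figures.
print(standard_error([310, 325, 340, 355, 360, 375, 380, 395]))
```

Because of the square root, cutting the standard error in half requires roughly four times as many observations, not twice as many.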
Exercise
1. Formula
2. Descriptive Statistics
3. Power Pivot
4. Shape
Measure of shape
For highly skewed data, the median is typically a better measure of central tendency. The median is unaffected by extreme values, whereas the mean can be very sensitive to them.
Empirical Rules for Interpreting Standard Deviation. These Empirical Rules give a concrete meaning to standard deviation for symmetric, bell-shaped distributions. However, they tend to be less accurate for skewed distributions.
Kurtosis is all about extreme events, such as Covid-19, the 2008 economic crisis, …
The rule of thumb for skewness seems to be:
• If the skewness is between -0.5 and 0.5, the data are fairly symmetrical.
• If the skewness is between -1 and -0.5 (negatively skewed) or between 0.5 and 1 (positively skewed), the data are moderately skewed.
• If the skewness is less than -1 (negatively skewed) or greater than 1 (positively skewed), the data are highly skewed.
Kurtosis:
(1) Around the middle: a normal distribution has kurtosis = 3. With kurtosis = 5, the share of values around the middle is larger than in the normal case.
Ex: in the normal case, 68% of the company's revenue is around 300-500 million VND per month, but in the leptokurtic case, 95% of the revenue is around 300-500 million VND.
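The skewness rule of thumb translates directly into a small classifier. The sketch below uses simple moment-based formulas, where a normal distribution has kurtosis 3; the function names and sample data are assumptions for this example. Spreadsheet functions such as Excel's SKEW and KURT apply small-sample adjustments, and KURT reports kurtosis relative to the normal value, so their numbers may differ from these.

```python
import statistics


def skewness(values):
    """Moment-based skewness (population version, no small-sample adjustment)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - mean) / sd) ** 3 for x in values) / len(values)


def kurtosis(values):
    """Moment-based kurtosis; a normal distribution gives about 3."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - mean) / sd) ** 4 for x in values) / len(values)


def describe_skew(s):
    """Apply the rule of thumb from the slide to a skewness value."""
    if -0.5 <= s <= 0.5:
        return "fairly symmetrical"
    if -1 <= s <= 1:
        return "moderately skewed"
    return "highly skewed"


data = [10, 12, 11, 13, 12, 11, 95]  # hypothetical data with one extreme value
s = skewness(data)
print(round(s, 2), describe_skew(s), round(kurtosis(data), 2))
```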
4. Shape – Confidence Level vs Confidence Interval
The confidence level refers to the long-term success rate of the method, that is, how often this type of interval will
capture the parameter of interest.
A specific confidence interval gives a range of plausible values for the parameter of interest.
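To make the distinction concrete, the sketch below computes one specific confidence interval for a mean at a chosen confidence level. It assumes a normal approximation (a z-value from the standard normal distribution); for small samples a t-based interval is usually more appropriate. The function name and the revenue figures are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for the mean (an assumption)."""
    n = len(values)
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    # z-value that leaves (1 - level) / 2 in each tail, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * se, mean + z * se


# Hypothetical monthly revenue (million VND).
revenue = [310, 325, 340, 355, 360, 375, 380, 395, 410, 420]

print(mean_confidence_interval(revenue, level=0.95))
print(mean_confidence_interval(revenue, level=0.99))
```

Raising the confidence level widens the interval: the method succeeds more often because each interval it produces covers a larger range.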
4. Shape
The normal distribution is the most important probability distribution in statistics.
A variable is normally distributed when its probability distribution has the shape of a bell ("bell-shaped curve"). The mean lies exactly in the middle (in the slide's figure, mean = 100), and the values of the variable cluster around the mean. Mean = median = mode.
Standard normal distribution
(standardized normal distribution)
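Standardizing converts a value from any normal distribution into a z-score on the standard normal distribution (mean 0, standard deviation 1). The sketch below uses the mean of 100 from the slide; the standard deviation of 15 and the value 130 are assumptions made only for this example.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


dist = NormalDist(mu=100, sigma=15)   # sigma = 15 is an assumed value
z = z_score(130, dist.mean, dist.stdev)

# The probability below x is the same on either scale.
print(z)
print(dist.cdf(130), NormalDist().cdf(z))

# For a normal distribution, mean, median, and mode coincide.
print(dist.mean, dist.median, dist.mode)
```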
Why is the Normal Distribution the basis for so much of statistical theory?
• One reason is practical. Many histograms based on real data resemble the bell-shaped normal curve to a remarkable extent. Granted, not all histograms are symmetric and bell-shaped, but a surprising number are.
• Another reason is theoretical. The Normal Distribution has many appealing properties that have enabled researchers to build the rich statistical theory that finds widespread use in business, the sciences, and other fields.
1. Formulas: NORM.S.DIST, NORM.INV, BINOM.DIST, ...
2. Stat Tool, Fit Tool ...
3. Frequency, Bins Chart
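For readers who want to check the three Excel formulas named above outside a spreadsheet, the following sketch mirrors them with Python's standard library. The Python function names are hypothetical wrappers that follow the Excel argument order; they are meant as a cross-check, not a replacement for the Excel exercises.

```python
import math
from statistics import NormalDist


def norm_s_dist(z, cumulative=True):
    """Like NORM.S.DIST(z, cumulative): standard normal CDF or density."""
    std = NormalDist()
    return std.cdf(z) if cumulative else std.pdf(z)


def norm_inv(probability, mean, standard_dev):
    """Like NORM.INV(probability, mean, standard_dev): inverse normal CDF."""
    return NormalDist(mean, standard_dev).inv_cdf(probability)


def binom_dist(number_s, trials, probability_s, cumulative=False):
    """Like BINOM.DIST(number_s, trials, probability_s, cumulative)."""
    def pmf(k):
        return (math.comb(trials, k)
                * probability_s ** k
                * (1 - probability_s) ** (trials - k))

    if cumulative:
        return sum(pmf(k) for k in range(number_s + 1))
    return pmf(number_s)


print(norm_s_dist(1.96))        # share of a standard normal below z = 1.96
print(norm_inv(0.5, 100, 15))   # median of a normal with mean 100, SD 15
print(binom_dist(2, 4, 0.5))    # probability of exactly 2 successes in 4 trials
```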
Histograms versus Summary Measures
It is important to remember that each of the summary measures we have discussed for a numerical variable (mean, median, standard deviation, and others) describes one aspect of that variable. In contrast, a histogram provides the complete picture. It indicates the "center" of the distribution, the variability, the skewness, and other aspects, all in one convenient chart.
A histogram can be created with Excel's built-in tools alone.
Histograms versus Box Plots
Histograms and box plots are complementary ways of displaying the distribution of a numerical variable. Although histograms are much more popular and are arguably more intuitive, box plots are still informative. Besides, side-by-side box plots are very useful for comparing 2 or more populations.
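The numbers behind both charts can be computed directly: a histogram is a set of bin counts, and a box plot is drawn from a five-number summary. The sketch below is illustrative only; the revenue data, the bin width of 50, and the starting point of 300 are assumptions, and quartile conventions vary between tools, so a spreadsheet may report slightly different quartiles.

```python
import statistics


def histogram_counts(values, bin_width, start):
    """Count values per bin; keys are the lower edge of each bin."""
    counts = {}
    for x in values:
        low = start + int((x - start) // bin_width) * bin_width
        counts[low] = counts.get(low, 0) + 1
    return dict(sorted(counts.items()))


def five_number_summary(values):
    """Min, Q1, median, Q3, max: the values a box plot is drawn from."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical monthly revenue (million VND); 520 sits apart from the rest.
revenue = [310, 325, 340, 355, 360, 375, 380, 395, 410, 520]

print(histogram_counts(revenue, bin_width=50, start=300))
print(five_number_summary(revenue))
```

Computing `five_number_summary` for two groups and comparing the results is the numeric counterpart of a side-by-side box plot.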
5. Time Series
THANK YOU!