Iba Unit - Ii

The document discusses different types of descriptive statistics including measures of central tendency, measures of dispersion or variability, and data visualization tools. It provides details on calculating and interpreting various descriptive statistics like mean, median, mode, range, standard deviation, variance, and coefficient of variation.

UNIT - II

DESCRIPTIVE ANALYTICS
Descriptive statistics refers to a set of methods used to
summarize and describe the main features of a dataset,
such as its central tendency, variability, and
distribution. These methods provide an overview of
the data and help identify patterns and relationships.

Descriptive statistics are methods used to summarize and describe the main
features of a dataset.

• Examples include measures of central tendency, such as mean, median, and
mode, which provide information about the typical value in the dataset.
• Measures of variability, such as range, variance, and standard deviation,
describe the spread or dispersion of the data.
• Descriptive statistics can also include graphical methods, including histograms,
box plots, and scatter plots, to visually represent the data.
• Descriptive statistics is important because it allows us to summarize and
describe data meaningfully.
• It helps us understand a dataset's main features and characteristics,
identify patterns and trends, and gain insights from the data.
• Descriptive statistics provide a foundation for further analysis,
decision-making, and communication of findings.
Types of Descriptive statistics
Descriptive statistics allow you to characterize your data based on its properties. There
are four major types of descriptive statistics:

1. Measures of Frequency
• Count, Percent, Frequency
• Shows how often something occurs
• Use this when you want to show how often a response is given

2. Measures of Central Tendency
• Mean, Median, and Mode
• Locates the center of the distribution by various points
• Use this when you want to show the average or most commonly indicated
response

3. Measures of Dispersion or Variation
• Range, Variance, Standard Deviation
• Identifies the spread of scores by stating intervals
• Range = difference between the highest and lowest points
• Variance or Standard Deviation = measures based on the difference between each
observed score and the mean
• Use this when you want to show how "spread out" the data are. It is helpful to
know when your data are so spread out that it affects the mean

4. Measures of Position
• Percentile Ranks, Quartile Ranks
• Describes how scores fall in relation to one another. Relies on standardized scores
• Use this when you need to compare scores to a normalized score
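As a small illustration of the measures of frequency listed above, the sketch below counts how often each response occurs and converts the counts to percentages. The survey responses are hypothetical and serve only as an example.

```python
from collections import Counter

# Hypothetical survey responses, used only for illustration.
responses = ["Yes", "No", "Yes", "Maybe", "Yes", "No", "Yes", "No"]

# Count: how often each response occurs.
counts = Counter(responses)

# Percent: each count as a share of all responses.
total = len(responses)
percents = {answer: count * 100 / total for answer, count in counts.items()}

for answer, count in counts.most_common():
    print(f"{answer:<6} count={count} percent={percents[answer]:.1f}")
```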
Measures of Central Tendency
• Mean: The mean provides a measure of central location for the data. The mean
(average) of a data set is defined as the ratio of the sum of all observations to the
number of observations.
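A minimal sketch of this definition follows: the mean is the sum of the observations divided by how many there are. The observations are hypothetical values chosen for illustration.

```python
from statistics import mean

# Hypothetical observations, used only for illustration.
observations = [12, 15, 11, 18, 14]

# Mean = sum of all observations / number of observations.
manual_mean = sum(observations) / len(observations)

# The standard library computes the same quantity.
library_mean = mean(observations)

print(manual_mean, library_mean)
```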
• Median: The value of the middle-most observation obtained after arranging the data in
ascending order is called the median of the data. In many instances it is difficult to use
the complete data for representation, and here the median is useful. Among the statistical
summary metrics, the median is an easy metric to calculate. The median is also called the
place average, because the value placed in the middle of the ordered sequence is taken as
the median.
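The procedure can be sketched as follows. The text above does not say what to do when the number of observations is even; the sketch assumes the usual convention of averaging the two middle values. The function name `middle_value` and the data are hypothetical.

```python
from statistics import median

def middle_value(values):
    """Return the median: the middle value of the data sorted in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle observation exists.
        return ordered[mid]
    # Even count (assumed convention): average the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [7, 3, 9, 1, 5]        # sorted: 1 3 5 7 9
even_data = [7, 3, 9, 1, 5, 11]   # sorted: 1 3 5 7 9 11

print(middle_value(odd_data), middle_value(even_data))
print(median(odd_data), median(even_data))  # standard library, same convention
```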
• Mode: The mode of a set of data is simply the value that appears most frequently in
the set.
• The word modal is often used when referring to the mode of a data set.
• If a data set has only one value that occurs most often, the set is called unimodal.
• A data set that has two values that occur with the same greatest frequency is referred
to as bimodal.
• When a set of data has more than two values that occur with the same greatest
frequency, the set is called multimodal.
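The definitions above can be sketched as follows. The helper names `modes` and `modality` are hypothetical, and the data sets are made up for illustration.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the greatest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def modality(values):
    """Label a data set by how many values share the greatest frequency."""
    # Note: if every value is distinct, all of them tie for the greatest
    # frequency; this sketch does not treat that case specially.
    k = len(modes(values))
    if k == 1:
        return "unimodal"
    if k == 2:
        return "bimodal"
    return "multimodal"

print(modes([2, 3, 3, 5]), modality([2, 3, 3, 5]))
print(modes([1, 1, 2, 2, 3]), modality([1, 1, 2, 2, 3]))
print(modes([1, 1, 2, 2, 3, 3, 4]), modality([1, 1, 2, 2, 3, 3, 4]))
```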
Measures of Dispersion or Variability
1. Range
• Range refers to the difference between the minimum and maximum values of a series. The
range gives a good first indication of how dispersed the data are, but other measures of
variability are needed to describe the dispersion of the data around measures of central
tendency. The range is the most common and easily understood measure of dispersion. It is
the difference between the two extreme observations of the data set. If X max and X min are
the two extreme observations, then
Range = X max – X min or Range = Highest Value – Lowest Value

Merits of Range
• It is the simplest of the measures of dispersion
• Easy to calculate
• Easy to understand
• Independent of change of origin

Demerits of Range
• It is based on two extreme observations. Hence, it is affected by fluctuations
• A range is not a reliable measure of dispersion
• Dependent on change of scale
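The following sketch computes the range and illustrates two of the properties listed above: adding a constant to every value (change of origin) leaves the range unchanged, while multiplying every value by a constant (change of scale) multiplies the range by that constant. The scores are hypothetical.

```python
def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)

# Hypothetical scores, used only for illustration.
scores = [4, 9, 15, 7, 21]

base = value_range(scores)                        # 21 - 4
shifted = value_range([x + 100 for x in scores])  # change of origin
scaled = value_range([x * 3 for x in scores])     # change of scale

print(base, shifted, scaled)
```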
2. Standard Deviation is a measure of how much the data are dispersed from the mean. A
high standard deviation implies that the data values are more spread out from the mean.
Standard deviation is often denoted by SD, the Greek letter σ, or the Latin letter 's'.
SD or σ is used for the population standard deviation and 's' is used for the sample
standard deviation.

Standard Deviation Formula
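In the usual notation, the standard definitions of the population and sample standard deviation are given below. The symbols N and n (population and sample sizes) and μ and x̄ (population and sample means) are assumed here; they are not defined elsewhere in this unit.

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i-\mu\right)^{2}}
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{2}}
```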


3. Variance
• The variance is a measure of how much the data deviate from the mean. It is
calculated by finding the difference between each value in the dataset and the mean,
squaring the differences, and then averaging them. Variance is a more complex
measure of dispersion, and because the differences are squared it is strongly affected
by outliers and is expressed in squared units. The standard deviation is the square
root of the variance.
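The calculation described above can be sketched step by step. The data are hypothetical. The sketch shows the population variance (divide by the number of values) and, for comparison, the sample variance (divide by one less than the number of values).

```python
from math import sqrt
from statistics import pstdev, pvariance, variance

# Hypothetical data, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: the mean.
m = sum(data) / len(data)

# Step 2: squared differences between each value and the mean.
squared_diffs = [(x - m) ** 2 for x in data]

# Step 3: average the squared differences (population variance).
population_variance = sum(squared_diffs) / len(data)

# The standard deviation is the square root of the variance.
population_sd = sqrt(population_variance)

# Sample variance divides by n - 1 instead of n.
sample_variance = sum(squared_diffs) / (len(data) - 1)

print(population_variance, population_sd, sample_variance)
print(pvariance(data), pstdev(data), variance(data))  # standard library
```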
4. Coefficient of variation
• The coefficient of variation (relative standard deviation) is a statistical measure of the
dispersion of data points around the mean. It is the ratio of the standard deviation to the
mean, often expressed as a percentage. The metric is commonly used to compare the
data dispersion between distinct series of data. Unlike the standard deviation, which must
always be considered in the context of the mean of the data, the coefficient of variation
provides a relatively simple and quick tool to compare different data series.
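As a sketch of how the coefficient of variation compares series measured in different units, the example below divides the standard deviation by the mean and expresses the result as a percentage. It assumes the population standard deviation; the function name and both data series are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Return (standard deviation / mean) * 100, using the population SD."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined when the mean is 0")
    return pstdev(values) / m * 100

# Hypothetical series in different units, used only for illustration.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

# Both series have the same standard deviation, but relative to their
# means the weights are more dispersed than the heights.
print(f"{cv_heights:.2f}% {cv_weights:.2f}%")
```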
• Data visualization is the representation of information and data in a
pictorial or graphical format, such as charts, graphs, and maps.
• Data visualization tools provide an accessible way to see and understand
trends, patterns in data, and outliers.
• Data visualization tools and technologies are essential to analyzing massive
amounts of information and making data-driven decisions.
• The concept of using pictures to understand data has been used for
centuries. General types of data visualization are Charts, Tables, Graphs, Maps,
and Dashboards.
• Additionally, it provides an excellent way for employees or business owners to
present data to non-technical audiences without confusion.
• Data-ink ratio is the proportion of data-ink to the total amount of ink used in a
table or chart.
• Data-ink is the ink used in a table or chart that is necessary to convey the
meaning of the data to the audience.
• Non-data-ink is the ink used in a table or chart that serves no useful purpose in
conveying the data to the audience.
• The data-ink ratio can also be represented as:
Data-ink ratio = Data-ink / Total ink used in the table or chart
• Example of a graph with a low data-ink ratio: the border around the graph, the
background color and the grid lines are all unnecessary non-data-ink.

• Example of a graph with a high data-ink ratio: the border around the graph, the
background color and the grid lines have been deleted, which draws the viewer's
attention to the horizontal scales, which are data-ink. There is nothing else to
distract, and the key features of the data stand out clearly.
Data Visualization tools
1. Table
A table is an arrangement of classified data in rows and columns or possibly in a
more complex structure.
• Tables are widely used in communication, research and data analysis.
Tables should be used when:
• Readers need to refer to specific numerical values.
• Readers need to make precise comparisons between different values and not
just relative comparisons.
• The values being displayed have different units or very different magnitudes.
[Figure: Line chart of monthly costs & revenues at ABC Company]
• Table Design Principles
While designing tables:
• An appropriate data-ink ratio should be considered
• Avoid the use of unnecessary ink in tables
• Avoid vertical lines in a table unless they are necessary for clarity
• Horizontal lines are necessary only for separating column titles or when
indicating that a calculation has taken place
• Cross Tabulation
A cross tabulation is a table that provides a tabular summary of data for two
variables.
– Table: Quality ratings and meal prices for the first 10 restaurants
Restaurant   Quality Rating   Meal Price ($)
1            Good             18
2            Very Good        22
3            Good             28
4            Excellent        38
5            Very Good        33
6            Good             28
7            Very Good        19
8            Very Good        11
9            Very Good        23
10           Good             13
Quality Rating is an example of categorical data. Meal Price is an example of
quantitative data.
• Cross tabulation of Quality Rating and Meal Price for the first 10 restaurants
                 Meal Price ($)
Quality Rating   10-19   20-29   30-39   Total
Good             2       2       0       4
Very Good        2       2       1       5
Excellent        0       0       1       1
Total            4       4       2       10
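The cross tabulation above can be reproduced from the ten restaurant records with a short sketch. The data come from the table of quality ratings and meal prices; the helper name `price_bin` and the use of `collections.Counter` are choices made here for illustration.

```python
from collections import Counter

# (quality rating, meal price in $) for the first 10 restaurants.
restaurants = [
    ("Good", 18), ("Very Good", 22), ("Good", 28), ("Excellent", 38),
    ("Very Good", 33), ("Good", 28), ("Very Good", 19), ("Very Good", 11),
    ("Very Good", 23), ("Good", 13),
]

def price_bin(price):
    """Map a price to its $10-wide class, e.g. 18 -> '10-19'."""
    low = price // 10 * 10
    return f"{low}-{low + 9}"

# Count how many restaurants fall in each (rating, price class) cell.
counts = Counter((rating, price_bin(price)) for rating, price in restaurants)

ratings = ["Good", "Very Good", "Excellent"]
bins = ["10-19", "20-29", "30-39"]

print(f"{'Quality Rating':<15}", *bins, "Total")
for rating in ratings:
    row = [counts[(rating, b)] for b in bins]
    print(f"{rating:<15}", *row, sum(row))
column_totals = [sum(counts[(r, b)] for r in ratings) for b in bins]
print(f"{'Total':<15}", *column_totals, sum(column_totals))
```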
• Pivot Tables:
A PivotTable is a powerful tool to calculate, summarize, and analyze data that lets you see
comparisons, patterns, and trends in your data.

A pivot table allows you to extract the significance from a large, detailed data set.
A pivot table is a summary of your data, presented in a table that lets you report on and
explore trends based on your information. Pivot tables are particularly useful if you have
long rows or columns that hold values you need to sum and easily compare with one another.

In other words, pivot tables extract meaning from that seemingly endless jumble of numbers on
your screen. More specifically, they let you group your data in different ways so you can draw
helpful conclusions more easily.

Pivot tables are a technique in data processing. They arrange and rearrange statistics in order
to draw attention to useful information.
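The grouping and summing that a pivot table performs can be sketched in a few lines. This is only an illustration of the idea, not the spreadsheet feature itself; the sales records and the function name `pivot_sum` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, amount).
sales = [
    ("North", "Pens", 120),
    ("North", "Books", 300),
    ("South", "Pens", 80),
    ("South", "Books", 150),
    ("North", "Pens", 60),
    ("South", "Books", 50),
]

def pivot_sum(records):
    """Group records by row key and column key, summing the values."""
    table = defaultdict(lambda: defaultdict(int))
    for row_key, col_key, value in records:
        table[row_key][col_key] += value
    return table

pivot = pivot_sum(sales)

# Print one row per region and one column per product.
products = sorted({product for _, product, _ in sales})
print(f"{'Region':<8}", *products)
for region in sorted(pivot):
    print(f"{region:<8}", *(pivot[region][p] for p in products))
```

Swapping the first two fields of each record would regroup the same data by product instead of by region, which is the "rearranging" that pivot tables are used for.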
Charts
1. Scatter charts
• Scatter charts have been said to be one of the most versatile and useful inventions in the
history of statistical graphs. While this may be a bold claim, scatter charts take confusing
data and make sense of it. They are far more than just a tool for visualization; they are a
tool for discovery.

• Analyzing patient experiences helps the medical industry provide better services and
improve their offerings. Hospitals can aggregate patient feedback through several
sources: service calls, online forms, rating systems, interactive voice response systems
(IVRs), and much more. These data sets can help health professionals attain high-quality
insights about their patients' specific concerns, such as doctor availability, wait times,
communication, and medications.
• Like most other graph or chart types, a scatterplot has an X and a Y axis. The X axis is
the horizontal line showing the independent variable, and the Y axis is the vertical line
showing the dependent variable. An even scale is created on both axes, and then a mark or
dot is made at the point that represents the intersection of the two coordinates.
• There are other patterns to be found within a scatter chart:

Linear or nonlinear: A linear (straight-line) correlation can be drawn through the data
points, whereas a nonlinear correlation shows a curved relationship.
Weak or strong: The stronger the correlation is, the more closely the dots cluster around
the trend. A weak correlation will have the data points more spread out.

• In order to clearly show these relationships and trends, many scatter charts utilize trend
lines. A trend line is drawn on the chart to emphasize the direction and strength of the
trend.
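One common way to draw a trend line and to quantify how strong a linear relationship is can be sketched as follows. The text above does not name a method; this sketch assumes an ordinary least-squares line and Pearson's correlation coefficient r, and the data and function name are hypothetical.

```python
from math import sqrt

def trend_line(xs, ys):
    """Return (slope, intercept, r) for a least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx                    # direction and steepness of the trend
    intercept = mean_y - slope * mean_x  # where the line crosses the Y axis
    r = sxy / sqrt(sxx * syy)            # strength: close to +1 or -1 is strong
    return slope, intercept, r

# Hypothetical data: hours studied (X, independent) and test score (Y, dependent).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept, r = trend_line(hours, scores)
print(f"score = {slope:.2f} * hours + {intercept:.2f}, r = {r:.3f}")
```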
2. Line charts
A line chart graphically represents an asset's price over time by connecting a series of data
points with a line. This is the most basic type of chart used in finance, and it typically only
depicts a security's closing prices. Line charts can be used for any time frame but most often
show day-to-day price changes.

• A line chart displays information as a series of data points connected by straight line
segments.
• A line chart visually represents an asset's price history using a single line.
• Line charts usually only plot the closing prices, thus reducing noise from less critical times in
the trading day, such as the open, high, and low prices.
• Line charts are very useful for time series data collected over a period of time
(minutes, hours, days, years, etc.).
3. Sparkline Charts
• A sparkline is a tiny chart in a worksheet cell that provides a visual representation of data.
Use sparklines to show trends in a series of values, such as seasonal increases or decreases
or economic cycles, or to highlight maximum and minimum values. Position a sparkline near its
data for greatest impact.
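The idea of a sparkline can be approximated in plain text by mapping each value to one of eight block characters of increasing height. This is only a text-based stand-in for the worksheet feature described above; the function name and the monthly sales figures are hypothetical.

```python
BLOCKS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Return a one-line text chart: taller blocks mean larger values."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series has no trend to show; use the lowest block throughout.
        return BLOCKS[0] * len(values)
    top = len(BLOCKS) - 1
    # Scale each value to an index between 0 (minimum) and 7 (maximum).
    return "".join(BLOCKS[round((v - lo) / span * top)] for v in values)

# Hypothetical monthly sales, used only for illustration.
monthly_sales = [12, 15, 14, 20, 26, 23, 18]
line = sparkline(monthly_sales)
print(line)
```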

4. Bar charts or column charts


A bar chart is a type of graph that is used to show and compare different measures for different
categories of data or data series. This chart type can be in either a horizontal or a vertical
orientation. In vertical form it is usually called a column chart, while in horizontal form it is
referred to as a bar chart.
Despite the difference in representation, the names of these charts are often used
interchangeably. Bar and column charts are very commonly used in data representation because
they display data simply and are very easy to interpret.
5. Pie Charts and 3-D Charts

• A pie chart is a circular graph in which the slices of the pie show the relative sizes of
the data. A 3D pie chart is similar to a standard pie chart. They are configured the same way,
with the exception that the former uses Pie Series 3D for its series and Pie Chart 3D for the
chart. Additionally, it adds depth and angle parameters to customize the depth and angle at
which we view the chart.
• The 3D pie chart consists of a circle divided into sectors, each of which represents a
percentage of the total values in a dataset. The 3D version gives the simple pie chart depth,
which adds a layer of aesthetics. The graph is useful for showing how much each item
contributes to the whole. The arc length of each slice (and therefore its central angle and
area) is proportional to the quantity it represents. The arcs are joined to the circle's centre
by radial lines, which cut the pie into slices. It is used to display the market share of
brands in a specific industry or the percentage split or contribution of various factors, such
as the breakdown of sales by product category.
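How the slices are sized can be sketched with a small calculation: each category's share of the total gives its percentage, and the same share of 360 degrees gives the central angle of its slice. The sales figures by product category and the function name are hypothetical.

```python
def pie_slices(values):
    """Return {label: (percent of total, central angle in degrees)}."""
    total = sum(values.values())
    return {
        label: (value * 100 / total, value * 360 / total)
        for label, value in values.items()
    }

# Hypothetical sales by product category, used only for illustration.
sales = {"Electronics": 50, "Clothing": 30, "Groceries": 20}

slices = pie_slices(sales)
for label, (percent, angle) in slices.items():
    print(f"{label:<12} {percent:5.1f}%  {angle:6.1f} degrees")
```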
6. Bubble charts
• A bubble chart is a way to display multiple data points and easily evaluate their relationships visually.
Bubble charts are effective visualizations that allow viewers to quickly analyze information from
several sources, making it easy to identify patterns in the data.
• A bubble chart is a type of graph that represents three variables using bubble sizes, colors, and
positions.
• Like other graphs, the bubble chart has an x-axis and y-axis to represent two variables, and the size of
the bubbles represents the third variable. The larger the bubble, the higher the value of the third
variable.
• For example, a bubble chart can be used to represent the relationship between age (x-axis),
income (y-axis), and expense (bubble size). Such a chart might show that as age increases,
income also increases, but so does the expense.
• Bubble charts are often used to compare data quickly and easily. We can understand the relationship
between multiple variables by looking at the bubble's size, position, and color. A bubble chart
representing the performance of different products or services can help identify the best-selling
product or service based on revenue, quantity sold, and customer satisfaction. This way, businesses
can analyze and make informed decisions based on the data.
Data Dashboards
• A data dashboard is an interactive tool that allows you to track, analyze, and display
KPIs and metrics. Modern dashboards allow you to combine real-time data from
multiple sources and provide you with AI-assisted data preparation, chart creation, and
analysis. In this way, data dashboards turn raw data from across your organization
into data-driven insights that improve a specific process, a department, or your entire
business.

• Here are the four key benefits of any modern, well-designed dashboard.
1. Track and analyze your KPIs and metrics. Data dashboards make it easy for everyone
to gauge progress against KPIs and metrics. KPIs (key performance indicators) are
targets your teams should shoot for to make the most strategic impact on your
organization. For example, a KPI might be "new clients per quarter". Metrics are
measures of everyday activities that support your KPIs. An example metric might be
"monthly prospect calls".

2. Turn big data into big value. Visualizing data from a wide range of sources across your
organization in charts, graphs, and maps on data dashboards helps you and other
stakeholders understand and engage with your data and then gain insights that improve your
business decisions.

3. Make faster decisions. Well-designed dashboards tell you a story about your data at a quick
glance. The best data dashboards offer real-time analytics, letting you analyze and respond to
real-time information about your products, customers, and applications as it is generated.

4. Make better forecasts. Top tools allow you to embed predictive analytics right within your
dashboards. These AI-powered dashboards help you make predictions about future outcomes
based on historical and current data. These predictions help you create more accurate
forecasts, mitigate risk, improve efficiency, and identify opportunities.
