BCPC 112: INTRODUCTION TO
BUSINESS STATISTICS
LECTURE TWO
Organizing and Visualizing Data
Emmanuel Kpeglo (Ph.D.) 22/05/2025 1
Learning Objectives
In this lecture, you will learn:
• To develop tables and charts for numerical data
• To develop tables and charts for categorical data
• The principles of properly presenting graphs
Organizing Data
• Data collected in its original form is called raw data.
• Raw data must be organized and visually represented to make the information meaningful and easier to understand.
• Frequency distributions and visual displays help reveal trends, patterns, and key characteristics in the data.
• Organizing data allows for better comparison across time, by different researchers, or from various sources.
Organizing Qualitative Data
Tallying qualitative data:
• One categorical variable: summary table
• Two or more categorical variables: contingency table
Summary Table
A summary table indicates the frequency, amount, or percentage of items in a set of categories so that you can see differences between categories.

Summary table from a survey of 1000 banking customers:

Banking Preference                 Percent
ATM                                16%
Automated or live telephone        2%
Drive-through service at branch    17%
In person at branch                41%
Internet                           24%

Twenty-five soldiers were given a blood test to determine their blood type:

Blood Type    Frequency    Percent
A             5            20%
B             7            28%
O             9            36%
AB            4            16%
Total         25           100%
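As a rough illustration of how the percent column of a summary table follows from the frequencies, the sketch below recomputes it for the 25 blood-type counts. The function name `summary_table` is a hypothetical helper introduced only for this example.

```python
# Blood-type counts for the 25 soldiers in the summary table.
blood_counts = {"A": 5, "B": 7, "O": 9, "AB": 4}

def summary_table(counts):
    """Return (category, frequency, percent) rows for a dict of counts."""
    total = sum(counts.values())
    return [(cat, n, 100 * n / total) for cat, n in counts.items()]

rows = summary_table(blood_counts)
for cat, n, pct in rows:
    print(f"{cat:<6}{n:>4}{pct:>8.0f}%")
print(f"{'Total':<6}{sum(blood_counts.values()):>4}{100:>8}%")
```

Each percentage is the category frequency divided by the total of 25, so the percent column should always sum to 100%.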
Contingency Table
• Used to examine relationships between two or more categorical variables.
• Displays a joint tally or cross-tabulation of responses for the categorical variables.
• For two variables, one is represented by the rows and the other by the columns of the table.
Contingency Table
(Service Calls Example)
A random sample of 400 customer service calls is selected. Each call is classified as short, moderate, or long in duration. Each call is also reviewed to determine whether the issue was resolved or unresolved. The results are then summarized in the contingency table below.

Contingency Table Showing the Frequency of Customer Service Calls by Duration and Issue Resolution Status

                  Issue Resolved   Issue Unresolved   Total
Short Calls       170              20                 190
Moderate Calls    100              40                 140
Long Calls        65               5                  70
Total             335              65                 400
Contingency Table Based on
Percentage of Overall Total
                  Issue Resolved   Issue Unresolved   Total
Short Calls       170              20                 190
Moderate Calls    100              40                 140
Long Calls        65               5                  70
Total             335              65                 400

42.50% = 170 / 400
25.00% = 100 / 400
16.25% = 65 / 400

                  Issue Resolved   Issue Unresolved   Total
Short Calls       42.50%           5.00%              47.5%
Moderate Calls    25.00%           10.00%             35.0%
Long Calls        16.25%           1.25%              17.5%
Total             83.75%           16.25%             100.0%

83.75% of sampled calls have issues resolved and 47.50% of sampled calls are short in duration.
Contingency Table Based on
Percentage of Row Totals
                  Issue Resolved   Issue Unresolved   Total
Short Calls       170              20                 190
Moderate Calls    100              40                 140
Long Calls        65               5                  70
Total             335              65                 400

89.47% = 170 / 190
71.43% = 100 / 140
92.86% = 65 / 70

                  Issue Resolved   Issue Unresolved   Total
Short Calls       89.47%           10.53%             100.0%
Moderate Calls    71.43%           28.57%             100.0%
Long Calls        92.86%           7.14%              100.0%

Moderate calls have a greater chance (28.57%) of having issues unresolved than short (10.53%) or long (7.14%) calls.
Contingency Table Based on
Percentage of Column Total
                  Issue Resolved   Issue Unresolved   Total
Short Calls       170              20                 190
Moderate Calls    100              40                 140
Long Calls        65               5                  70
Total             335              65                 400

50.75% = 170 / 335
30.77% = 20 / 65

                  Issue Resolved   Issue Unresolved
Short Calls       50.75%           30.77%
Moderate Calls    29.85%           61.54%
Long Calls        19.40%           7.69%
Total             100.0%           100.0%

There is a 61.54% chance that a call with an unresolved issue is of moderate duration.
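The three percentage views of the service calls table differ only in the base used for division. The following is a minimal sketch, assuming the counts are stored in a nested dictionary; the names `counts`, `pct`, `overall_pct`, `row_pct`, and `col_pct` are illustrative choices, not part of the lecture.

```python
# Counts from the service calls contingency table
# (rows: call duration; columns: issue resolution status).
counts = {
    "Short":    {"Resolved": 170, "Unresolved": 20},
    "Moderate": {"Resolved": 100, "Unresolved": 40},
    "Long":     {"Resolved": 65,  "Unresolved": 5},
}

# Marginal totals.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
col_totals = {}
for cells in counts.values():
    for col, n in cells.items():
        col_totals[col] = col_totals.get(col, 0) + n
grand_total = sum(row_totals.values())

def pct(n, base):
    """Percentage of n relative to base, rounded to two decimals."""
    return round(100 * n / base, 2)

# Same counts, three different bases.
overall_pct = {r: {c: pct(n, grand_total) for c, n in cells.items()}
               for r, cells in counts.items()}
row_pct = {r: {c: pct(n, row_totals[r]) for c, n in cells.items()}
           for r, cells in counts.items()}
col_pct = {r: {c: pct(n, col_totals[c]) for c, n in cells.items()}
           for r, cells in counts.items()}

print(overall_pct["Short"]["Resolved"])    # share of all 400 calls
print(row_pct["Moderate"]["Unresolved"])   # share of moderate calls
print(col_pct["Moderate"]["Unresolved"])   # share of unresolved calls
```

Row percentages answer "given the duration, how often is the issue unresolved?", while column percentages answer "given the status, how are durations distributed?".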
Contingency Table
Pension Plan Example
Job classification    Plan 1   Plan 2   Plan 3   Total
Salaried workers      182      213      203      598
Hourly workers        154      138      110      402
Total                 336      351      313      1000

A company has to choose among three pension plans. Management, therefore, decided to seek the opinions of a random sample of 1000 employees. The result is shown in the table above.

Job classification    Plan 1   Plan 2   Plan 3   Total
Salaried workers      18.2%    21.3%    20.3%    59.8%
Hourly workers        15.4%    13.8%    11.0%    40.2%
Total                 33.6%    35.1%    31.3%    100.0%
Organizing Quantitative Data
Quantitative data can be organized as:
• Ordered array
• Frequency distribution
• Cumulative distribution
Ordered Array
An ordered array is a sequence of data, in rank order, from the smallest value to the largest value.
• Shows range (minimum value to maximum value)
• May help identify outliers (unusual observations)

Age of Surveyed College Students

Day Students
16 17 17 18 18 18
19 19 20 20 21 22
22 25 27 32 38 42

Night Students
18 18 19 19 20 21
23 28 32 33 41 45
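A small sketch of building an ordered array and reading off its range. The values are the night students' ages; the unsorted order in `raw_night_ages` is an assumption made only so that the sort has something to do.

```python
# Night students' ages in an arbitrary (assumed) collection order.
raw_night_ages = [23, 18, 45, 19, 32, 21, 18, 41, 20, 33, 19, 28]

# The ordered array: values ranked from smallest to largest.
ordered = sorted(raw_night_ages)

# The range is read from the two ends of the ordered array.
minimum, maximum = ordered[0], ordered[-1]
print(ordered)
print(f"Range: {minimum} to {maximum} (width {maximum - minimum})")
```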
Frequency Distribution
Class: Each category of the frequency distribution.
Frequency: The number of data values falling within each class.
Class limits: The smallest and largest values that can actually belong to a class.
Class boundaries: The actual limits used to avoid gaps between adjacent classes, especially when dealing with continuous data.
Class interval: The width of each class. This is the difference between the lower limit of the class and the lower limit of the next higher class.
Class mark: The midpoint of each class. This is midway between the upper and lower class limits.
Frequency Distribution
The frequency distribution (grouped or ungrouped) is a summary table in which the data are arranged into numerically ordered classes.
For a grouped frequency distribution, you must give attention to selecting the appropriate number of class groupings for the table, determining a suitable width of a class grouping, and establishing the boundaries of each class grouping to avoid overlapping.
The number of classes depends on the number of values in the data. With a larger number of values, typically there are more classes. In general, a frequency distribution should have at least 5 but no more than 15 classes.
To determine the width of a class interval, you divide the range (highest value - lowest value) of the data by the number of class groupings desired.
Example: Ungrouped Frequency Distribution
• Number of children per family
0 1 4 4 3 2 2 3 1 2 4 3 0 2 1 1 2 2
1 1 3 2 2 4 0 0 4 2 2 3 1 1 2 3 2 2
2 0 3 4 2 1 3 2 2 3 4 4 1 0 3 2 1 1
• Frequency distribution of the data in Table
Number of children    Tally                  Frequency
0                     //// /                 6
1                     //// //// //           12
2                     //// //// //// ///     18
3                     //// ////              10
4                     //// ///               8
Total                                        54
• This table is called a frequency table or a frequency distribution. It is so called because it gives the frequency, or number of times, each observation occurs.
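The tally for the number of children per family can be reproduced in a few lines. This is a minimal sketch using `collections.Counter` from the Python standard library; the variable names are illustrative.

```python
from collections import Counter

# Number of children per family (54 families), as listed in the example.
children = [
    0, 1, 4, 4, 3, 2, 2, 3, 1, 2, 4, 3, 0, 2, 1, 1, 2, 2,
    1, 1, 3, 2, 2, 4, 0, 0, 4, 2, 2, 3, 1, 1, 2, 3, 2, 2,
    2, 0, 3, 4, 2, 1, 3, 2, 2, 3, 4, 4, 1, 0, 3, 2, 1, 1,
]

# Counter tallies how many times each distinct value occurs.
freq = Counter(children)

print("Number of children  Frequency")
for value in sorted(freq):
    print(f"{value:<20}{freq[value]}")
print(f"{'Total':<20}{sum(freq.values())}")
```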
Example 1: Grouped Frequency Distribution
A company that produces insulation products randomly selects 20 winter days to record the daily high temperatures, aiming to analyze how their products perform under varying cold weather conditions.
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Find range: 58 - 12 = 46
Select the number of desired classes: 5 (usually between 5 and 15)
Example 1: Grouped Frequency Distribution
Compute class interval (width): 10 (46/5 then round up)
Determine class boundaries (limits):
Class 1: 10 to less than 20
Class 2: 20 to less than 30
Class 3: 30 to less than 40
Class 4: 40 to less than 50
Class 5: 50 to less than 60
Compute class midpoints: 15, 25, 35, 45, 55
Count observations & assign to classes
Example 1: Grouped Frequency Distribution
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                  Midpoint   Frequency
10 but less than 20    15         3
20 but less than 30    25         6
30 but less than 40    35         5
40 but less than 50    45         4
50 but less than 60    55         2
Total                             20
Example 1: Grouped Frequency Distribution
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                  Frequency   Relative Frequency   Percentage
10 but less than 20    3           0.15                 15%
20 but less than 30    6           0.30                 30%
30 but less than 40    5           0.25                 25%
40 but less than 50    4           0.20                 20%
50 but less than 60    2           0.10                 10%
Total                  20          1.00                 100%
Example 1: Grouped Frequency Distribution
NB
1. Relative frequency distribution describes the proportion or percentage of data values that fall within each category.
2. Cumulative frequency is used to determine the number of observations that lie above (or below) a particular value in a data set.
3. Cumulative percentage gives the percentage of observations that lie above (or below) a particular value in a data set.

Class                  Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20    3           15%          3                      15%
20 but less than 30    6           30%          9                      45%
30 but less than 40    5           25%          14                     70%
40 but less than 50    4           20%          18                     90%
50 but less than 60    2           10%          20                     100%
Total                  20          100%
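The steps of Example 1 (range, class width, class counts, percentages, and running totals) can be sketched as follows. This is an illustration under stated assumptions: the first lower boundary of 10 is hard-coded as chosen in the example, and names such as `lower_bounds` and `table` are hypothetical.

```python
import math

# Daily high temperatures for the 20 sampled winter days.
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

num_classes = 5
data_range = max(temps) - min(temps)          # 58 - 12
width = math.ceil(data_range / num_classes)   # 9.2 rounded up
first_lower = 10                              # convenient start, as in the example

lower_bounds = [first_lower + i * width for i in range(num_classes)]

n = len(temps)
table = []
running = 0
for lower in lower_bounds:
    upper = lower + width
    # "lower but less than upper": lower end included, upper end excluded.
    freq = sum(lower <= t < upper for t in temps)
    running += freq
    table.append({
        "class": f"{lower} but less than {upper}",
        "midpoint": (lower + upper) / 2,
        "frequency": freq,
        "percent": 100 * freq / n,
        "cum_frequency": running,
        "cum_percent": 100 * running / n,
    })

for row in table:
    print(row)
```

Excluding the upper end of each class is what keeps a value such as 30 from being counted in two classes.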
Example 2: Grouped Frequency Distribution
The following data represent the record high temperatures for each of the 50 states. Construct a grouped frequency distribution for the data using 7 classes.
112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114
Find range: 134 - 100 = 34
Compute class interval (width): 5 (34/7 then round up)
For convenience, we choose the lowest data value, 100, as the first lower class limit.
Example 2: Grouped Frequency Distribution
The subsequent lower class limits are found by adding the width to the previous lower class limits.
The first upper class limit is one less than the next lower class limit.
The subsequent upper class limits are found by adding the width to the previous upper class limits.

Class Limits
100 - 104
105 - 109
110 - 114
115 - 119
120 - 124
125 - 129
130 - 134
Example 2: Grouped Frequency Distribution
The class boundary is midway between an upper class limit and the subsequent lower class limit. For example, 104.5 is midway between 104 and 105.

Class Limits   Class Boundaries   Frequency   Cumulative Frequency
100 - 104      99.5 - 104.5
105 - 109      104.5 - 109.5
110 - 114      109.5 - 114.5
115 - 119      114.5 - 119.5
120 - 124      119.5 - 124.5
125 - 129      124.5 - 129.5
130 - 134      129.5 - 134.5
Example 2: Grouped Frequency Distribution
Tally the data.
Find the frequencies.

Class Limits   Class Boundaries   Frequency   Cumulative Frequency
100 - 104      99.5 - 104.5       2
105 - 109      104.5 - 109.5      8
110 - 114      109.5 - 114.5      18
115 - 119      114.5 - 119.5      13
120 - 124      119.5 - 124.5      7
125 - 129      124.5 - 129.5      1
130 - 134      129.5 - 134.5      1
Example 2: Grouped Frequency Distribution
Find the cumulative frequencies by keeping a running total of the frequencies.

Class Limits   Class Boundaries   Frequency   Cumulative Frequency
100 - 104      99.5 - 104.5       2           2
105 - 109      104.5 - 109.5      8           10
110 - 114      109.5 - 114.5      18          28
115 - 119      114.5 - 119.5      13          41
120 - 124      119.5 - 124.5      7           48
125 - 129      124.5 - 129.5      1           49
130 - 134      129.5 - 134.5      1           50
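The construction in Example 2 can be checked with a short script. The sketch below assumes whole-number data, so that each upper class limit is one less than the next lower limit and each boundary sits 0.5 beyond a limit; the tuple layout of `rows` is an illustrative choice.

```python
import math

# Record high temperatures for the 50 states.
highs = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

num_classes = 7
width = math.ceil((max(highs) - min(highs)) / num_classes)  # 34 / 7 rounded up
first_lower = min(highs)                                    # lowest value as first lower limit

rows = []
running = 0
for i in range(num_classes):
    lower = first_lower + i * width
    upper = lower + width - 1            # one less than the next lower limit
    freq = sum(lower <= x <= upper for x in highs)
    running += freq                      # running total = cumulative frequency
    # (lower limit, upper limit, lower boundary, upper boundary, freq, cum freq)
    rows.append((lower, upper, lower - 0.5, upper + 0.5, freq, running))

for lower, upper, lb, ub, freq, cum in rows:
    print(f"{lower} - {upper}   {lb} - {ub}   {freq:>2}   {cum:>2}")
```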
Exercise 2.1
In a certain study, blood glucose levels (in mg/dl) of a sample of students of St. Andrew High School were measured.
103 125 120 118 117 109 114 118 131 118
116 119 117 119 110 117 124 117 124 113
127 127 114 129 120 105 121 112 115 126
101 114 128 125 109 122 123 130 115 123
(a) State the population and the variable in the study.
(b) Make a frequency table of the data using the class intervals 100 – 104, 105 – 109, 110 – 114, …, 130 – 134.
(c) Obtain the class boundaries, class mid-points and class widths of the frequency distribution in part (b).
(d) Find the relative frequencies and the cumulative frequencies of the frequency distribution in part (b).
Why Use a Frequency Distribution?
• It condenses the raw data into a more useful form
• It allows for a quick visual interpretation of the data
• It enables the determination of the major characteristics of the data set, including where the data are concentrated/clustered
Frequency Distributions
Some Tips
• Different class boundaries may provide different pictures for the same data (especially for smaller data sets)
• Shifts in data concentration may show up when different class boundaries are chosen
• As the size of the data set increases, the impact of alterations in the selection of class boundaries is greatly reduced
• When comparing two or more groups with different sample sizes, you must use either a relative frequency or a percentage distribution
Visualizing Qualitative Data
Visualizing qualitative data:
• Summary table for one variable: bar chart, pie chart, Pareto chart
• Contingency table for two variables: side-by-side bar chart
The Bar Chart
In a bar chart, each category is shown as a bar whose length represents the amount, frequency, or percentage of values falling into that category, as taken from the summary table of the variable.

Banking Preference?                %
ATM                                16%
Automated or live telephone        2%
Drive-through service at branch    17%
In person at branch                41%
Internet                           24%

[Figure: horizontal bar chart titled "Banking Preference", one bar per category, horizontal axis from 0% to 45%.]
Practical Illustration Using MS Excel
(Bar Chart)
• Select both variables (together with their headings)
• Insert, bar, 3-D column, ok
• Then label the chart
The Pie Chart
The pie chart is a circle broken up into slices that represent categories. The size of each slice of the pie varies according to the percentage in each category.

Banking Preference?                %
ATM                                16%
Automated or live telephone        2%
Drive-through service at branch    17%
In person at branch                41%
Internet                           24%

[Figure: pie chart titled "Banking Preference" with one slice per category, labelled 16%, 2%, 17%, 41%, and 24%.]
The Pareto Chart
• Used to portray categorical data (nominal scale)
• A vertical bar chart, where categories are shown in descending order of frequency
• A cumulative polygon is shown in the same graph
• Used to separate the “vital few” from the “trivial many”
The Pareto Chart (cont’d)
[Figure: Pareto Chart For Banking Preference. Bars show the % in each category (left axis, 0% to 100%); the line graph shows the cumulative % (right axis, 0% to 100%). Categories from left to right: In person at branch, Internet, Drive-through service at branch, ATM, Automated or live telephone.]
Practical Illustration Using MS Excel
(The Pareto Chart)
• Place cursor on one of the data series
• Click on filter series and select largest to smallest
• Calculate total
• Generate cumulative frequency
• Create and generate percentage column
• Highlight all percent data and click on the percent sign
• Highlight all data except the cumulative frequency
• Insert, recommended chart, ok
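Outside Excel, the data preparation behind a Pareto chart (sort largest to smallest, then accumulate) can be sketched in a few lines. The example uses the banking preference percentages from the summary table; `pareto_rows` is a hypothetical name, and no chart is drawn here.

```python
# Banking preference percentages from the summary table.
banking = {
    "ATM": 16,
    "Automated or live telephone": 2,
    "Drive-through service at branch": 17,
    "In person at branch": 41,
    "Internet": 24,
}

# Step 1: order the categories from largest to smallest.
ordered = sorted(banking.items(), key=lambda item: item[1], reverse=True)

# Step 2: accumulate a running total and express it as a percentage.
total = sum(banking.values())
running = 0
pareto_rows = []
for category, value in ordered:
    running += value
    pareto_rows.append((category, value, round(100 * running / total, 1)))

for category, value, cum_pct in pareto_rows:
    print(f"{category:<34}{value:>4}%{cum_pct:>8}%")
```

The bars of the Pareto chart would use the second column and the cumulative line would use the third.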
The Component Bar Chart
The table shows the distribution of sales of agricultural produce from Asiedu Farm in 1995, 1996 and 1997. Illustrate the information with a component bar chart.
[Figure: A component bar chart of the data in Table 2.24. Vertical axis: Sales (million dollars); horizontal axis: Year (1995, 1996, 1997); components: coffee, cocoa, palm oil.]
The sales of agricultural produce consist of three components: the sales of coffee, cocoa, and palm oil. The component bar chart shows the changes of each component over the years as well as the comparison of the total sales between different years.
Side by Side Bar Charts
The side by side bar chart represents the data from a contingency table.

                 No Errors   Errors    Total
Small Amount     50.75%      30.77%    47.50%
Medium Amount    29.85%      61.54%    35.00%
Large Amount     19.40%      7.69%     17.50%
Total            100.0%      100.0%    100.0%

[Figure: side by side bar chart titled "Invoice Size Split Out By Errors & No Errors", with bars for Errors and No Errors within each of Large, Medium, and Small; horizontal axis from 0.0% to 70.0%.]

Invoices with errors are much more likely to be of medium size (61.54% vs 30.77% and 7.69%).
Visualizing Quantitative Data
Visualizing quantitative data:
• Ordered array: stem-and-leaf display
• Frequency distributions and cumulative distributions: histogram, polygon, ogive
Stem-and-Leaf Display
A simple way to see how the data are distributed and where concentrations of data exist.
METHOD: Separate the sorted data series into leading digits (the stems) and trailing digits (the leaves).
Stem and Leaf Display
A stem-and-leaf display organizes data into groups (called stems) so that the values within each group (the leaves) branch out to the right on each row.

Age of Surveyed College Students

Day Students
16 17 17 18 18 18
19 19 20 20 21 22
22 25 27 32 38 42

Night Students
18 18 19 19 20 21
23 28 32 33 41 45

Day Students          Night Students
Stem   Leaf           Stem   Leaf
1      67788899       1      8899
2      0012257        2      0138
3      28             3      23
4      2              4      15

• All the measurements are two-digit numbers, leading to one-digit stems and one-digit leaves.
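The stem-and-leaf split for two-digit values can be sketched with integer division and remainder. The helper name `stem_and_leaf` is hypothetical, and the sketch assumes every value is a non-negative integer below 100.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to a string of its leaves (units digits)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)   # stem = leading digit, leaf = trailing digit
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(stems.items())}

day_ages = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22,
            22, 25, 27, 32, 38, 42]
night_ages = [18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 45]

for label, ages in (("Day Students", day_ages), ("Night Students", night_ages)):
    print(label)
    for stem, leaves in stem_and_leaf(ages).items():
        print(f"  {stem} | {leaves}")
```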
Double Stem-and-Leaf Display
Double stem-and-leaf plots are used to compare two distributions side by side.
The Histogram
• A vertical bar chart of the data in a frequency distribution is called a histogram.
• In a histogram there are no gaps between adjacent bars.
• The class boundaries (or class midpoints) are shown on the horizontal axis.
• The vertical axis is either frequency, relative frequency, or percentage.
• The height of each bar represents the frequency, relative frequency, or percentage.
The Histogram
Class                  Frequency   Relative Frequency   Percentage
10 but less than 20    3           .15                  15
20 but less than 30    6           .30                  30
30 but less than 40    5           .25                  25
40 but less than 50    4           .20                  20
50 but less than 60    2           .10                  10
Total                  20          1.00                 100

[Figure: Histogram: Age Of Students. Vertical axis: Frequency (0 to 8); horizontal axis labels: 5, 15, 25, 35, 45, 55, More.]

(In a relative/percentage histogram the vertical axis would be defined to show the relative frequency/percentage of observations per class)
How to Create a Histogram in Excel
• Highlight/select data with names
• Insert
• Histogram
Or
• Highlight/select data with names
• Insert
• Choose Chart, bar chart
• Click on series options (bar icon)
• Reduce the gap width to 0%
The Polygon
• A percentage polygon is formed by having the midpoint of each class represent the data in that class and then connecting the sequence of midpoints at their respective class percentages.
• The cumulative percentage polygon, or ogive, displays the variable of interest along the X axis, and the cumulative percentages along the Y axis.
• Useful when there are two or more groups to compare.
The Frequency Polygon
Class                  Class Midpoint   Frequency
10 but less than 20    15               3
20 but less than 30    25               6
30 but less than 40    35               5
40 but less than 50    45               4
50 but less than 60    55               2

[Figure: Frequency Polygon: Age Of Students. Vertical axis: Frequency (0 to 7); horizontal axis: Class Midpoints (5, 15, 25, 35, 45, 55, 65).]

(In a percentage polygon the vertical axis would be defined to show the percentage of observations per class)
How to Create a Polygon in Excel
• Generate class midpoints
• Generate frequencies of classes
• Insert a row above the first frequency and another below the last frequency, and place zero (0) in each
• Highlight both classes and frequencies
• Insert, line graph, ok
• Then label the graph
• To label classes, right-click the horizontal axis, select data, edit, and copy the class names from the table (including the empty cells for the zero rows created)
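The zero rows added at the top and bottom correspond to closing the polygon on the horizontal axis. A minimal sketch of the points to be plotted, assuming equal class widths and using the midpoints and frequencies of the temperature example; the names `polygon_x`, `polygon_y`, and `points` are illustrative.

```python
# Class midpoints and frequencies from the grouped frequency distribution.
midpoints = [15, 25, 35, 45, 55]
frequencies = [3, 6, 5, 4, 2]

# Assumes equal class widths, so the spacing between midpoints is constant.
width = midpoints[1] - midpoints[0]

# Add one empty class before the first and after the last class
# so the polygon starts and ends at a frequency of zero.
polygon_x = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
polygon_y = [0] + frequencies + [0]

points = list(zip(polygon_x, polygon_y))
print(points)
```

Connecting these points in order with straight lines gives the frequency polygon; replacing the frequencies with percentages gives the percentage polygon.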
The Ogive (Cumulative % Polygon)
Class                             Lower class boundary   % less than lower boundary
10 but less than 20               10                     0
20 but less than 30               20                     15
30 but less than 40               30                     45
40 but less than 50               40                     70
50 but less than 60               50                     90
(upper boundary of last class)    60                     100

[Figure: Ogive: Age Of Students. Vertical axis: Cumulative Percentage (0 to 100); horizontal axis: class boundaries (10, 20, 30, 40, 50, 60).]

(In an ogive the percentages of the observations less than each lower class boundary are plotted versus the lower class boundaries.)
How to Create an Ogive in Excel
• Generate class midpoints and cumulative frequency
• Insert a new row and enter zero (0) as the first cumulative frequency value
• Then, based on the existing midpoints, generate a new class for the new zero frequency
• Highlight both classes/midpoints and cumulative frequency
• Insert, line graph, ok
OR
• Highlight cumulative frequency alone, insert, line graph, ok
• And then edit class names and label
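The points of a cumulative percentage polygon can also be derived directly from the class frequencies. This sketch plots against class boundaries rather than midpoints, which is an assumption about the intended horizontal axis, and uses the frequencies of the temperature example; `ogive_points` is a hypothetical name.

```python
# Class boundaries (10, 20, ..., 60) and class frequencies.
boundaries = [10, 20, 30, 40, 50, 60]
frequencies = [3, 6, 5, 4, 2]

n = sum(frequencies)

# No observation lies below the first boundary, so the ogive starts at 0%.
cum_pct = [0.0]
running = 0
for f in frequencies:
    running += f
    cum_pct.append(100 * running / n)   # % of observations below the next boundary

ogive_points = list(zip(boundaries, cum_pct))
print(ogive_points)
```

The curve starts at 0% at the first boundary and reaches 100% at the upper boundary of the last class.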
The Scatter Plot
• Scatter plots are used for numerical data consisting of paired observations taken from two numerical variables
• One variable is measured on the vertical axis and the other variable is measured on the horizontal axis
• Scatter plots are used to examine possible relationships between two numerical variables
Example: Scatter Plot

Volume per day   Cost per day
23               125
26               140
29               146
33               160
38               167
42               170
50               188
55               195
60               200

[Figure: scatter plot titled "Cost per Day vs. Production Volume". Vertical axis: Cost per Day (0 to 250); horizontal axis: Volume per Day (20 to 70).]
The Time Series Plot
A Time Series Plot is used to study patterns in the values of a numeric variable over time.
The Time Series Plot:
• The numeric variable is measured on the vertical axis and the time period is measured on the horizontal axis
Example: Time Series Plot

Year   Number of Franchises
1996   43
1997   54
1998   60
1999   73
2000   82
2001   95
2002   107
2003   99
2004   95

[Figure: time series plot titled "Number of Franchises, 1996-2004". Vertical axis: Number of Franchises (0 to 120); horizontal axis: Year (1994 to 2006).]