
STA 111 - Topic One - Lecture 2

Statistical learning lecture 2

Uploaded by

kocheinoel
Copyright
© © All Rights Reserved

STA 111: Probability and Statistics 1

TOPIC ONE: INTRODUCTION, NATURE OF STATISTICAL DATA AND PRESENTATION

Lecture Two: Data Presentation


Introduction

Lecture One covered the introduction and the nature of statistical data. The notes for the
following sections can be accessed on the week one forum on the ODEL platform.

1.1 Objectives
1.2 Introduction
1.3 Data
This is a continuation of Topic One lecture notes.

1.4 The Representation of Data

1.4.1 Objectives
By the end of the lecture the learner should be able to:
i) Summarize a set of data using a table or frequency distribution table.
ii) Display data graphically using bar graphs, histograms, frequency polygons,
frequency curves and ogive curves, and interpret the graphs.

1.4.2 Introduction
When data are first collected (raw data) they are usually not organized. After the data have
been collected, the next step is to present them in some suitable form. Proper presentation
is needed because statistical data in raw form almost defy comprehension. Frequently the
first stage in presenting data is to produce a table. When the data consist of only a few
figures, they can be easily presented and understood. But when the number of figures is very
large, a proper classification is essential for analysis and for drawing valid inferences.

The next step is to represent the data diagrammatically or graphically.


A statistical graph is a tool that helps you learn about the shape or distribution of a
sample or a population. A graph can be a more effective way of presenting data than a
mass of numbers because we can see where data clusters and where there are only a
few data values. Newspapers and the Internet use graphs to show trends and to enable
readers to compare facts and figures quickly. Statisticians often graph data first to get a
picture of the data. Then, more formal tools may be applied.

Some of the types of graphs that are used to summarize and organize data are the dot
plot, the bar graph, the histogram, the stem-and-leaf plot, the frequency curve, the
frequency polygon, the pie chart, the box plot and the cumulative frequency (ogive)
curve. In this course, we will briefly look at histograms, line graphs and bar graphs, as
well as frequency polygons and the cumulative frequency curve.

1.4.3 Frequency Distribution


One such method of presentation is the frequency distribution.
The frequency of a value is the number of times that value occurs in the data. When the
number of observations is small and some values are repeated, we can arrange them in a
table according to their magnitudes, with the corresponding frequencies. Such a table is
called a frequency table. Therefore:
A frequency table (or frequency distribution) is a listing of the possible values of a
variable, together with the number of observations or the relative frequency for each value
(a tabular arrangement of data by classes with their corresponding frequencies).
Ungrouped Data
Suppose we record some observations or measurements given by numbers where some
occur only once, while others are repeated several times.
Data recorded in the order in which they appear, without any arrangement, are known as
“ungrouped (or raw) data”. Working with them in this form is tedious.
When the data set contains only a relatively small number of distinct values
(a discrete distribution), it is convenient to represent it in an ungrouped frequency
distribution table, which presents each distinct value along with its frequency of
occurrence.

Example 1.1
The following set of data consists of exam scores for 25 students
3,3,6,4,5,4,10,5,29,3,5,6,10,31,4,10,3,29,5,31,29,11,31,6,10
Construct an ungrouped frequency distribution table to represent this data set.
Solution
Steps of Construction of Ungrouped frequency distribution table:
1. Identify the smallest and the largest value in the data set and arrange all the values
in the data set in ascending (or descending) order.
2. Tally the number of times each value is appearing in the data.
3. Count the number of tallies of each quantity and record them as the frequency for
the value.
• The smallest value is 3 and the largest is 31.
• Arranging the values in ascending order we obtain:
3,3,3,3,4,4,4,5,5,5,5,6,6,6,10,10,10,10,11,29,29,29,31,31,31
• The next step is to construct a frequency distribution table as follows:
(N.B. A count of five is recorded as four strokes crossed by a fifth, not as five separate strokes.)

Scores (x)   Tallies   Frequency (f)
3            ////      4
4            ///       3
5            ////      4
6            ///       3
10           ////      4
11           /         1
29           ///       3
31           ///       3
Total                  25
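The tallying in Example 1.1 can also be checked mechanically. The following is a minimal Python sketch, not part of the original notes; the function name `ungrouped_frequency_table` is made up for illustration, and the scores are copied from the example.

```python
from collections import Counter

# Exam scores for the 25 students in Example 1.1.
scores = [3, 3, 6, 4, 5, 4, 10, 5, 29, 3, 5, 6, 10, 31, 4, 10,
          3, 29, 5, 31, 29, 11, 31, 6, 10]

def ungrouped_frequency_table(values):
    """Return (value, frequency) pairs in ascending order of value."""
    counts = Counter(values)          # tallies every distinct value
    return sorted(counts.items())

table = ungrouped_frequency_table(scores)
for value, freq in table:
    # One stroke per observation stands in for the tally column.
    print(f"{value:>5}  {'/' * freq:<6} {freq:>3}")
print("total", sum(freq for _, freq in table))
```

The frequencies printed come directly from the listed scores, so the final total should equal the number of students.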

• Categorical Frequency Distributions


The categorical frequency distribution is used for data that can be placed in specific
categories, such as nominal- or ordinal-level data. For example, data such as political
affiliation, religious affiliation, blood group, tree species or major field of study would
use categorical frequency distributions.

• Grouped Frequency Distribution Tables (Classification according to class-intervals)

If the amount of data is large, we put the observations into groups (categories or classes)
and determine the number of observations in each class (the class frequency).
A “grouped frequency distribution table” normally has columns which show the class
intervals, class mid-points, class frequencies and “cumulative frequencies”, the last of
these being a running total of the frequencies. There may also be a column of “tallied
frequencies” if the table is being constructed from the raw data without first arranging
the values in rank order.

Classification according to class-intervals and Principles of Classification


1. For the purpose of further calculations in statistical work the mid-point of each class
is taken to represent that class.
2. There are two methods of classifying the data according to class-intervals, namely;
a) “Exclusive” method: When the class-intervals are fixed so that the upper limit of
one class is the lower limit of the next class, the classification is known as the
“Exclusive” method. The “exclusive” method ensures continuity of the data, since
the upper limit of one class is the lower limit of the next class.

b) “Inclusive” method: Under the “Inclusive” method of classification, the upper
limit of one class is included in that class itself.
3. The number of classes, denoted by k, should normally fall between 5 and 15. (However,
this is not a rigid rule. There can be more than 15 classes, depending upon the total
number of observations in the data and the details required.) Further, the precise number
of classes to be used for a given variable may depend upon personal judgment and
other considerations such as the details required, the ease of calculation in further
statistical work, etc.
4. The classes should be mutually exclusive.
5. The starting point, i.e., the lower limit of the first class, should either be zero or 5 or
multiples of 5. For example, if the lowest value of the data is 63 and we have taken a
class-interval of 10, then the first class should be 60 – 70, instead of 63 – 73.
6. To ensure continuity and to get the correct class-interval we should adopt the “exclusive”
method of classification. However, where the “inclusive” method has been adopted it is
necessary to make an adjustment to determine the correct class-interval and to have
continuity (see the steps in the Construction of a Grouped Frequency Distribution
below). The adjustment consists of finding the difference between the lower limit of
the second class and the upper limit of the first class, dividing the difference by two,
subtracting the value so obtained from all lower limits and adding it to all
upper limits. This can be expressed as the formula:
$$\text{Correction factor} = \frac{\text{Lower limit of the 2nd class} - \text{Upper limit of the 1st class}}{2}$$
7. Whenever possible all classes should be of the same size.
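The adjustment in principle 6 is simple arithmetic, and a short sketch may make it concrete. The Python below is illustrative only: the function name `class_boundaries` and the three inclusive classes are assumptions chosen for the demonstration, not part of the notes.

```python
def class_boundaries(limits):
    """Convert inclusive class limits [(lower, upper), ...] to class boundaries.

    The correction factor is half the gap between the upper limit of the
    first class and the lower limit of the second class.
    """
    correction = (limits[1][0] - limits[0][1]) / 2
    # Subtract the correction from every lower limit, add it to every upper limit.
    return [(lower - correction, upper + correction) for lower, upper in limits]

inclusive = [(5, 9), (10, 14), (15, 19)]   # illustrative inclusive classes
boundaries = class_boundaries(inclusive)
print(boundaries)
```

After the adjustment, the upper boundary of each class coincides with the lower boundary of the next, which is the continuity the principle asks for.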

Steps in the Construction of a Grouped Frequency Distribution


1. Select the number of classes k. One guideline is to pick the smallest k such that
$2^k \ge n$, so that if the sample size is n = 20 then k = 5, because $2^5 = 32 > n$, and if
n = 80 then k = 7, because $2^7 = 128 > n$. To be more specific, we can solve for k to get
$$k \ge \frac{\log n}{\log 2}.$$
Alternatively, Sturges suggested the following formula for determining the
approximate number of classes:
$$k = 1 + 3.322 \log_{10}(n)$$
where k = the approximate number of classes, n = the total number of observations
and log = the ordinary logarithm to base 10.
2. Find the largest and smallest values and compute the working range denoted by R.
R = Maximum Value − Minimum value (or Desired Lower Class Limit (LCL) of
starting class). LCL of the starting class is normally the Minimum value in the data
or any other value slightly less than the minimum value.
3. Identify the smallest unit of measurement (u) used in the data collection. The value
of u can be inferred from the given data or the given starting value (usually tens
(10), ones (1), tenths (0.1), hundredths (0.01), etc.). In a finished table,
u = LCL of the 2nd class − UCL of the 1st class.

Estimate the class interval i (sometimes denoted by c), also called the class width (CW), as
$$i = \frac{\text{Largest data value} - \text{Smallest data value}}{\text{Desirable number of classes}}$$
or
$$i = \frac{R}{k}, \text{ rounded up to the nearest multiple of } u.$$
Note: You must round up, not round off. For u = 1, rounding up 5.2 gives 6, not 5; the same
rule applies for u = 0.1. If R/k is exact (no remainder when divided by u), add one to the
number of classes. Put simply, round i up to the next highest whole number so that the
classes cover the whole data set.

4. The starting value used in the calculation of R above is picked as the lower class limit
(LCL) of the first class. Add the class interval i to this LCL successively to get the rest
of the lower class limits.
5. Find the Upper Class Limit (UCL) of the first class by subtracting 𝑢 from the LCL of
the second class. Then continue to add the class interval 𝑖 to this UCL to find the rest
of the upper limits.
6. If necessary, find the class boundaries (CB) for each class as follows.
Lower Class Boundary 𝐿𝐶𝐵 = 𝐿𝐶𝐿 − 0.5𝑢 (0.5𝑢 = the correction factor)
Upper Class Boundary 𝑈𝐶𝐵 = 𝑈𝐶𝐿 + 0.5𝑢
7. Tally the number of observations falling in each class and find the frequencies.
Note: A value x falls into a class LCL − UCL only if LCB ≤ x < UCB. That is x can be
equal to LCB but not UCB of that class.
8. Record the number of tallies in each category as the class frequencies.
9. Compute the cumulative frequencies to confirm that the last value of the column is
equal to the sum of the frequencies.
10. Compute the midpoints of each class using the class boundaries.
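The first steps (choosing k and the class interval) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a prescribed procedure: the function names are hypothetical, and the rounding rule is the "round up to a multiple of u" rule from the note above.

```python
import math

def classes_power_of_two(n):
    """Smallest k such that 2**k >= n."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

def classes_sturges(n):
    """Sturges' approximation k = 1 + 3.322 log10(n), before rounding."""
    return 1 + 3.322 * math.log10(n)

def class_width(working_range, k, u=1):
    """Class interval i = R / k, rounded UP to the next multiple of u."""
    return math.ceil(working_range / (k * u)) * u

print(classes_power_of_two(20), classes_power_of_two(80))
print(round(classes_sturges(20), 3))
print(class_width(26, 6))
```

The sketch does not handle the special case where R/k is exact; as the note says, one extra class is then needed so that the largest value is covered.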

Example 1.2
The idea of grouped data can also be illustrated by considering the following raw
dataset:
Time taken (in seconds) by a group of students to answer a simple math question
20 25 24 33 13 16 21 17 11 34
26 8 19 31 11 14 15 21 18 17
The above data can be organized into a frequency distribution (or a grouped data) in
several ways. One method is to use intervals as a basis.

The smallest value in the above data is 8 and the largest is 34. The interval from 8 to 34
is broken up into smaller subintervals (called class intervals). Suppose we choose the
number of classes using Sturges' formula:
$$k = 1 + 3.322 \log_{10} 20 = 5.322, \text{ which we round up to } 6.$$
Then the class width is obtained as
$$CW = \frac{34 - 8}{6} = 4.33, \text{ which rounds up to the next whole number, so } CW = i = 5.$$
Following principle 5 of the principles of classification, the first class starts at 5 rather than at 8.

The results are tabulated as a frequency distribution as follows:


Frequency distribution of the time taken (in seconds) by the group of students to
answer a simple math question:
a) Using the exclusive method of classification:

Time taken (s)   Interval       Tallies    Frequency   Cumulative frequency   Class mid-point
5-10             5 ≤ t < 10     /          1           1                      7.5
10-15            10 ≤ t < 15    ////       4           5                      12.5
15-20            15 ≤ t < 20    ///// /    6           11                     17.5
20-25            20 ≤ t < 25    ////       4           15                     22.5
25-30            25 ≤ t < 30    //         2           17                     27.5
30-35            30 ≤ t < 35    ///        3           20                     32.5

b) Using the inclusive method of classification:

Time taken (s)   Tallies    Frequency   Cumulative frequency   Class mid-point   Class boundaries (s)
5-9              /          1           1                      7                 4.5-9.5
10-14            ////       4           5                      12                9.5-14.5
15-19            ///// /    6           11                     17                14.5-19.5
20-24            ////       4           15                     22                19.5-24.5
25-29            //         2           17                     27                24.5-29.5
30-34            ///        3           20                     32                29.5-34.5
Note that to ensure continuity, the class limits are adjusted to obtain the true class limits
(class boundaries), as described in principle 6 of the principles of classification. These
are shown in the last column.
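The inclusive classification of Example 1.2 can be reproduced with a short script. The Python below is a minimal sketch, assuming whole-number data (u = 1), a starting lower class limit of 5 and a class interval of 5 as chosen in the example; the function name `inclusive_table` is hypothetical.

```python
# Times (in seconds) from Example 1.2.
times = [20, 25, 24, 33, 13, 16, 21, 17, 11, 34,
         26, 8, 19, 31, 11, 14, 15, 21, 18, 17]

def inclusive_table(data, start, width, u=1):
    """Build rows (LCL, UCL, LCB, UCB, frequency, cumulative frequency)."""
    rows, cumulative, lcl = [], 0, start
    while lcl <= max(data):                    # keep adding classes until the maximum is covered
        ucl = lcl + width - u                  # upper class limit of an inclusive class
        lcb, ucb = lcl - u / 2, ucl + u / 2    # class boundaries (correction factor 0.5u)
        freq = sum(1 for x in data if lcb <= x < ucb)
        cumulative += freq
        rows.append((lcl, ucl, lcb, ucb, freq, cumulative))
        lcl += width
    return rows

rows = inclusive_table(times, start=5, width=5)
for lcl, ucl, lcb, ucb, freq, cumulative in rows:
    print(f"{lcl}-{ucl}  {lcb}-{ucb}  f={freq}  cf={cumulative}")
```

The last cumulative frequency equals the number of observations, which is the check suggested in step 9 of the construction.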

Example 1.3
Let the marks of 50 students of a class be:
46 58 54 52 55 59 52 62 65 67
64 63 77 78 92 6 7 12 18 16
3 23 25 25 27 81 88 24 29 22
34 33 30 37 36 42 48 28 22 28
17 13 70 37 32 36 41 40 43 44
We can arrange them as follows;
Marks Frequency Marks Frequency
0 – 10 3 50 – 60 6
10 – 20 5 60 – 70 5
20 – 30 10 70 – 80 3
30 – 40 8 80 – 90 2
40 – 50 7 90 – 100 1
Total 50
Data organized and summarized as in the above frequency distribution is called
grouped data.

Remark:
Consider the following;
Mass (kg)   Number of students
60-62       5
63-65       18
66-68       42
69-71       27
72-74       8
75-         0
Here 66-68 is referred to as a class interval, where 66 is the lower class limit and 68 is
the upper class limit. 75- is an open class interval.
If the measurements are taken to the nearest kg then, for example, 65.5-68.5 are the true
class limits (class boundaries) of the class 66-68.
The mid-point between the class limits is called the class mark or class mid-point. It is
used to represent the class in all further mathematical analysis of the frequency distribution.
$$\text{Mid-point of a class} = \frac{\text{Upper limit of the class} + \text{Lower limit of the class}}{2}$$
Note: Relative frequencies (as percentages) may also be calculated by dividing the number
of cases in each class by the total number of students (100) and multiplying by 100. For
example, in the class 66-68,
$$\text{relative frequency} = \frac{42}{100} \times 100 = 42\%.$$
Relative frequencies are most useful where the class sizes differ.
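As a quick check of the mid-point and relative frequency formulas, the following Python sketch applies them to the mass table in the remark. It is illustrative only; the open class 75- is left out because it has no upper limit and a frequency of zero.

```python
# Closed classes and frequencies from the mass table (kg).
classes = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]
freqs = [5, 18, 42, 27, 8]

total = sum(freqs)

# Mid-point of a class = (upper limit + lower limit) / 2
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Relative frequency as a percentage = frequency / total * 100
relative = [100 * f / total for f in freqs]

for (lower, upper), mid, f, rf in zip(classes, midpoints, freqs, relative):
    print(f"{lower}-{upper}  mid-point={mid}  f={f}  relative frequency={rf}%")
```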

Self test Question

The list below shows One-way Commuting Distances (in Km) for 60 workers in Nairobi
city.

13 7 12 6 34 14 47 25 45 2
13 26 10 8 1 14 41 10 3 21
8 13 28 24 16 19 4 7 36 37
20 15 16 15 17 31 17 3 11 46
24 8 40 17 18 12 27 16 4 14
23 9 29 12 2 6 12 18 9 16

• Construct a grouped frequency distribution table and include the cumulative
frequencies and class mid-points using:
a) Exclusive method of classification, with the class boundaries ending in either 0
or 5.
b) Inclusive method of classification.
• Find the class boundaries in (b) to ensure continuity.

1.4.4 Diagrammatic Representation of Data


a) HISTOGRAMS

A histogram consists of a set of adjoining rectangles whose bases lie on the x-axis, with
centers at the class marks and widths equal to the class interval size. The horizontal axis is
labeled with what the data represents (for instance, distance from campus to your hostel). The
vertical axis is labeled either frequency or relative frequency (or percent frequency or
probability). The graph will have the same shape with either label. The histogram can give you
the shape of the data, the center, and the spread of the data.

The relative frequency is equal to the frequency for an observed value of the data divided by the
total number of data values in the sample or population. If:

• f = frequency,
• n = total number of data values (or the sum of the individual frequencies), and
• RF = relative frequency,

then
$$RF = \frac{\text{frequency}}{\text{total of frequencies}} = \frac{f}{n}.$$

The areas of the rectangles are proportional to the class frequencies. If class intervals
have equal sizes the histogram is obtained by plotting the frequencies against the true
class limits (class boundaries) such that the heights of rectangles are proportional to
class frequencies.

But if the class intervals are not equal, plot the frequency density (or relative
frequencies) against the class boundaries, as illustrated in Example 1.4 (ii).
$$\text{Frequency density} = \frac{\text{frequency } (f)}{\text{class width } (i)}$$

To construct a histogram, first decide how many bars or intervals, also called classes, represent
the data. This is usually equal to the number of intervals or classes in the data set. Then choose
the starting point to be the lower class boundary of the class just below the first interval in the
data set. For instance, if the class intervals were 10-15, 15-20, ..., then the first interval drawn
will be 5-10, with a height (frequency) of zero.

Example 1.4

i) Represent the following data by a histogram.


Marks Frequency Marks Frequency
0 – 10 5 50 – 60 10
10 – 20 11 60 – 70 8
20 – 30 19 70 – 80 6
30 – 40 21 80 – 90 3
40 – 50 16 90 – 100 1
Total 100
The class intervals are of equal size and class boundaries are given since exclusive
method of data classification has been used.

[Figure: Frequency histogram for marks of students. x-axis: Marks (classes 0-10 to 90-100); y-axis: Frequency.]

ii) Construct a histogram to represent the following data set;

Class limits (x)   f    Class boundaries   Relative frequency   Class size (i)   Frequency density (fd = f/i)
15-19              5    14.5-19.5          5/100                5                5/5
20-29              8    19.5-29.5          8/100                10               8/10
30-34              22   29.5-34.5          22/100               5                22/5
35-39              35   34.5-39.5          35/100               5                35/5
40-54              20   39.5-54.5          20/100               15               20/15
55-59              10   54.5-59.5          10/100               5                10/5
The class sizes are unequal, so to construct the histogram we use the frequency
density of each class, calculated as $fd = f/i$.
Only class limits are given, so to obtain the class boundaries (true class limits) we adjust
the limits using the correction factor.
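The frequency densities for part (ii) can be evaluated as decimals with a short script. This Python sketch is illustrative; it takes the class boundaries and frequencies from the table and also confirms that each rectangle's area (density × width) returns the class frequency, which is why areas, not heights, represent frequencies when class sizes differ.

```python
import math

# Class boundaries and frequencies from Example 1.4 (ii).
boundaries = [(14.5, 19.5), (19.5, 29.5), (29.5, 34.5),
              (34.5, 39.5), (39.5, 54.5), (54.5, 59.5)]
freqs = [5, 8, 22, 35, 20, 10]

widths = [ucb - lcb for lcb, ucb in boundaries]      # class size i
densities = [f / i for f, i in zip(freqs, widths)]   # fd = f / i
areas = [fd * i for fd, i in zip(densities, widths)] # rectangle areas

for (lcb, ucb), i, fd in zip(boundaries, widths, densities):
    print(f"{lcb}-{ucb}  i={i}  fd={fd:.2f}")
```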

[Figure: Frequency histogram. x-axis: class boundaries (14.5 to 59.5); y-axis: Frequency density.]

Exercise: Suppose the classes were of equal widths, as below. Construct a histogram (DIY).
Class limits   15-19   20-24   25-29   30-34   35-39   40-44
Frequency      1       4       22      35      20      8

b) FREQUENCY POLYGONS
A frequency polygon is a graphical representation of data. It is used to depict the
shape of the data and to depict trends. It is usually drawn with the help of a histogram
but can be drawn without it as well. If a histogram has already been drawn, joining the
midpoints of the tops of adjacent rectangles by straight lines gives the frequency polygon.
Steps to Draw a Frequency Polygon

• Mark the class intervals for each class on the horizontal axis. We will plot the
frequency on the vertical axis.
• Calculate the class mark for each class interval. The formula for the class mark is:
$$\text{Class mark} = \frac{\text{Upper limit} + \text{Lower limit}}{2}$$
• Mark all the class marks on the horizontal axis. The class mark is also known as the
mid-value of the class.
• Corresponding to each class mark, plot the frequency given. The height
always depicts the frequency. Make sure that the frequency is plotted against the
class mark and not against the upper or lower limit of any class.
• Join the plotted points using line segments. The curve obtained will be
kinked.
• The resulting curve is called the frequency polygon.

N/B; It can be drawn without rectangles
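The points to plot can be computed before any drawing is done. The Python below is a minimal sketch, assuming equal class widths; the function name `polygon_points` is hypothetical, and the class limits and frequencies are those of part (ii) of the example that follows. A zero-frequency point is added one class width beyond each end, as is customary.

```python
def polygon_points(limits, freqs):
    """Return the (class mark, frequency) points of a frequency polygon."""
    width = limits[1][0] - limits[0][0]                     # common class width
    marks = [(lower + upper) / 2 for lower, upper in limits]
    # Close the polygon with zero-frequency points at the neighbouring class marks.
    xs = [marks[0] - width] + marks + [marks[-1] + width]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))

limits = [(15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44)]
freqs = [1, 4, 22, 35, 20, 8]

points = polygon_points(limits, freqs)
print(points)
```

Joining consecutive points with straight line segments gives the polygon; the plotting itself is left to whichever graphing tool is at hand.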


Example
i) Plot the frequency polygon of the marks of students given in Example 1.4 (i) above.
Solution
A point with zero frequency is added at each end of the table.
Marks      Frequency   Midpoint
(added)    0           0
0 – 10     5           5
10 – 20    11          15
20 – 30    19          25
30 – 40    21          35
40 – 50    16          45
50 – 60    10          55
60 – 70    8           65
70 – 80    6           75
80 – 90    3           85
90 – 100   1           95
(added)    0           105

Note:
It is customary to add the extensions PQ and RS to the next lower and next higher
midpoints which have corresponding class frequencies of zero.
ii) Plot the frequency polygon given the following data set

Class limits 15-19 20-24 25-29 30-34 35-39 40-44


Frequency 1 4 22 35 20 8
Class mid-point 17 22 27 32 37 42
Solution

[Figure: Frequency polygon. x-axis: Class mid-points (12 to 47); y-axis: Frequency. Plotted points: (12, 0), (17, 1), (22, 4), (27, 22), (32, 35), (37, 20), (42, 8), (47, 0).]

c) BAR GRAPH (CHART)

The height of each bar is proportional to the frequency of the variate, but the thickness of
the bar has no significance. A bar chart comprises a number of spaced rectangles, which
generally have their major axes vertical; because the bars are separated, the chart does not
suggest continuity. Bar charts can be used to represent a large variety of statistical data.
The bar chart is appropriate for displaying discrete data with only a few categories.

Example 1.5

a) The following table gives the birth rate per thousand of different countries over a
certain period of time.
Country Kenya India China Uganda U.K. Sweden
Birth rate 30 33 40 29 20 15
Represent the above data by a suitable diagram.
Solution
The appropriate diagram for this data is a simple Bar diagram

[Figure: Bar chart of birth rate per thousand of different countries. x-axis: Country (Kenya, India, China, U.K., Uganda, Sweden); y-axis: Birth rate.]

Comparing the size of the bars, you can easily see that China has the highest birth rate
while Sweden has the lowest.
b) Consider data relating to the number of patients diagnosed with bacterial
meningitis in a hospital each year.
Year 2001 2002 2003 2004 2005
No. of patients 141 225 205 108 192
This data can be represented by a bar chart as shown below.

[Figure: Bar chart of the number of patients diagnosed with bacterial meningitis in a hospital during the period 2001 – 2005. x-axis: Year; y-axis: Number of patients.]

Notice that it is now easy to see the variations in the number of cases over this period
of time.

Multiple bar charts
Bar charts often prove most useful if we have two (or more) sets of comparable data,
and wish to compare and contrast them.
Example 1.6
Suppose that apart from the data relating to the number of patients diagnosed with
bacterial meningitis in a hospital each year, we also have the corresponding numbers
for malaria cases.
Year                              2001   2002   2003   2004   2005
Number of patients (Meningitis)   141    225    205    108    192
Number of patients (Malaria)      321    251    123    547    148

Component bar charts (Sub-divided Bar diagrams)


In this type of bar chart each bar is subdivided into two or more components.
Example 1.7
Suppose further that the data in the example above is grouped according to sex as
follows:
Year 2001 2002 2003 2004 2005
Number of Male patients 100 125 90 20 102
Number of Female patients 41 100 115 88 90
Total Patients 141 225 205 108 192

This data can be represented in a component bar chart as shown in the figure below.

[Figure: Component bar chart of male and female patients, 2001 to 2005. x-axis: Year; y-axis: Number of patients; each bar is subdivided into Males and Females.]

Looking at this presentation, it is possible to discern two main features: firstly, we can
see how the meningitis cases vary from year to year, and secondly, we can get a good
idea of the make-up of this total in terms of the proportions of patients who are male or
female.
Pie-Charts

A pie chart presents data in the form of a circle. The slices represent absolute or relative
proportions. A pie chart is formed by marking off a portion of the pie corresponding to
each characteristic being displayed.

Example 1.8

A researcher studying the distribution of manufacturing costs in ABC Ltd found that
20% of the firm’s unit cost is due to labour, 40% raw materials, 25% maintenance costs
and 15% debt servicing. Present this information in a pie chart.

Fig 2: A pie chart representing the distribution of ABC Ltd per unit manufacturing cost
during the year.
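To draw the pie chart by hand, each percentage is converted to an angle at the centre of the circle, since the whole pie corresponds to 360°. The Python below is a small illustrative sketch using the cost shares from Example 1.8; the variable names are made up for the demonstration.

```python
# Percentage shares of unit manufacturing cost from Example 1.8.
shares = {"Labour": 20, "Raw materials": 40, "Maintenance": 25, "Debt servicing": 15}

total = sum(shares.values())

# Angle of each slice = share / total * 360 degrees
angles = {name: 360 * pct / total for name, pct in shares.items()}

for name, angle in angles.items():
    print(f"{name:<15} {shares[name]:>3}%  {angle:>6.1f} degrees")
```

The angles should add up to 360°, which is a useful check before marking off the slices with a protractor.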

1.4.5 Graphical Presentation
a) Consider, for example, the sales data for a company over a period of five years, as
shown in the table below:

Year          2000      2001      2002      2003      2004
Sales (Ksh)   420,000   370,000   360,000   380,000   540,000

This original data can be presented in graphical form as follows:

[Figure: Graph of sales (Ksh) against year, 2000 to 2004; vertical axis from 0 to 600,000.]

b) Cumulative Frequency Curve (Ogive Curve)

i) “Less Than” Ogive Curve
The cumulative frequency curve is obtained by plotting the cumulative frequencies
(on the Y-axis) against the upper class boundaries of the class intervals (on the
X-axis). The points are joined by a smooth freehand curve. This cumulative
frequency curve is specifically called the “Less than” ogive curve.
Example 1.9
Plot the “Less than” ogive curve of the marks of students given in Example 1.4 (i) above.
Solution
Marks Frequency Cumulative frequency Upper class boundary
0 – 10 5 5 10
10 – 20 11 16 20
20 – 30 19 35 30
30 – 40 21 56 40
40 – 50 16 72 50
50 – 60 10 82 60
60 – 70 8 90 70
70 – 80 6 96 80
80 – 90 3 99 90
90 – 100 1 100 100
From the graph there are “y” students who scored less than “x” marks.
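The cumulative frequency column is a running total, which is easy to verify by machine. The following Python sketch is illustrative only; it uses `itertools.accumulate` from the standard library and the frequencies from the table above.

```python
from itertools import accumulate

# Upper class boundaries and frequencies from Example 1.9.
upper_boundaries = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
freqs = [5, 11, 19, 21, 16, 10, 8, 6, 3, 1]

# "Less than" cumulative frequency: running total of the frequencies.
less_than = list(accumulate(freqs))

# Points of the "less than" ogive: (upper class boundary, cumulative frequency).
ogive_points = list(zip(upper_boundaries, less_than))
for boundary, cf in ogive_points:
    print(f"less than {boundary:>3}: {cf:>3} students")
```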

ii) “More Than” Ogive Curve
If we plot the “more than” cumulative frequencies against the corresponding lower
class boundaries and join the points by a smooth curve we get a “more than” ogive
curve.
Example 1.10
Plot the “More than” ogive curve of the marks of students given in Example 1.4 (i) above.
Solution
Marks       Frequency   More than cumulative frequency   Lower class boundary
0 – 10      5           100                              0
10 – 20     11          95                               10
20 – 30     19          84                               20
30 – 40     21          65                               30
40 – 50     16          44                               40
50 – 60     10          28                               50
60 – 70     8           18                               60
70 – 80     6           10                               70
80 – 90     3           4                                80
90 – 100    1           1                                90
(end)                   0                                100

From the graph there are “y” students who scored more than “x” marks.
The value of x at the intersection of the two graphs is the median value.
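Because the “less than” and “more than” cumulative frequencies at any mark add up to the total n, the two ogives cross where both equal n/2. The Python below is a hedged sketch of that idea: it assumes the ogive is approximated by straight lines between class boundaries (the notes draw a smooth freehand curve, so a value read from the graph may differ slightly), and the function name `median_from_ogive` is hypothetical.

```python
from itertools import accumulate

# Marks data from Example 1.10: lower class boundaries, common width, frequencies.
lower_boundaries = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
width = 10
freqs = [5, 11, 19, 21, 16, 10, 8, 6, 3, 1]

total = sum(freqs)
less_than = list(accumulate(freqs))
# "More than" cumulative frequency at each lower boundary: total minus everything below it.
more_than = [total - below for below in [0] + less_than[:-1]]

def median_from_ogive(lower_boundaries, width, freqs):
    """Mark at which the cumulative frequency reaches n/2 (linear interpolation)."""
    half = sum(freqs) / 2
    below = 0
    for lcb, f in zip(lower_boundaries, freqs):
        if below + f >= half:
            return lcb + width * (half - below) / f
        below += f

median = median_from_ogive(lower_boundaries, width, freqs)
print(more_than)
print(round(median, 2))
```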
[Figure: “Less than” and “more than” ogive curves. x-axis: class boundaries; y-axis: cumulative frequency (cf).]

Exercise 1.2
1. Consider the following data:
32, 46, 25, 57, 39, 45, 55, 42, 20, 36,
58, 12, 38, 34, 22, 40, 33, 64, 43, 46,

31, 40, 52, 29, 14, 57, 66, 36, 32, 48,
46, 42, 47, 54, 65, 44, 35, 19, 54, 25,
23, 33, 38, 45, 32, 38, 41, 42, 58, 43.
Arrange the data in a frequency distribution with the first class interval 10 – 19
2. The highway patrol set up a radar checkpoint and recorded the speed in miles per
hour of a random sample of 50 cars that passed the checkpoint in one hour. The
speed of the cars was recorded as follows;
74 66 65 55 48 56 50 75 75 67
76 68 50 65 60 65 60 68 68 76
68 77 63 65 52 52 63 80 80 70
65 81 70 63 45 45 65 71 71 64
55 70 64 45 64 64 40 55 55 71
Make a frequency distribution table using 5 as the class width.
3. Given the data below:
3.0 3.4 4.1 4.1 4.3 2.7 3.5 3.7 3.4 3.4
3.8 4.2 3.1 3.9 3.1 4.1 2.8 3.7 4.4 3.5
3.5 3.4 3.7 3.7 2.8 4.3 3.8 3.4 4.1 3.0
4.4 4.1 4.1 3.6 3.4 2.7 3.6 3.0 3.4 4.3
3.8 3.2 4.2 3.9 4.2 3.4 2.9 4.4 3.5 3.9
Form a frequency distribution using the classes 2.7-2.9, 3.0-3.2, 3.3-3.5,……
4. Using Sturges' rule, $K = 1 + 3.322 \log_{10} N$, where K = number of class-intervals and
N = total number of observations, classify, in equal intervals, the following hours worked
by 20 workers in a factory for one month.
155, 120, 50, 110, 116, 95, 125, 42, 175, 130, 160, 90, 68, 71, 135, 147, 115, 108, 140, 98.
Find the percentage frequency in each class-interval.
5. Represent the following data by a histogram.
Marks Frequency Marks Frequency
0 – 10 5 50 – 60 10
10 – 20 11 60 – 70 8
20 – 30 19 70 – 80 6
30 – 40 21 80 – 90 3
40 – 50 16 90 – 100 1
Total 100
6. Using the data classified in questions 1, 2 and 3, draw:
a) A Histogram
b) A Frequency polygon
c) “less than” and “more than” Ogive curves.
7. A nutritionist is interested in knowing the percent of calories from fat
which Kenyans take in on a daily basis. To study this, the nutritionist randomly
selects 25 Kenyans and evaluates the percent of calories from fat consumed in a
typical day. The results of the study are as follows:

34% 18% 33% 25% 30%
42% 40% 33% 39% 40%
45% 35% 45% 25% 27%
23% 32% 33% 47% 23%
27% 32% 30% 28% 36%
Construct a frequency distribution and the corresponding histogram.
8. In Kenya, approximately 45% of the population has blood type O; 40% type A; 11%
type B; and 4% type AB. Illustrate this distribution of blood types with a pie chart.
9. In the academic years 1982 to 1985, the number of students in College ABC were as
follows;
Year Science Arts Law
1982-83 1000 1500 200
1983-84 1600 2000 350
1984-85 2100 4000 420
Represent the data by an appropriate diagram. (Component bar chart)
10. The table below gives data relating to Kenyan exports and imports (in millions
of Ksh) during the five years from 1999-2000 to 2003-2004.
Year Export Import
1999-2000 160000 200000
2000-2001 170000 300000
2001-2002 180000 350000
2002-2003 200000 300000
2003-2004 200000 380000
Source: KNBS
Represent this information using a suitable diagram. (multiple bar chart)
11. The following table shows the Kenyan population age structure as per the 2009
census
Age %of total population male female
0-14 40.02 9557274 9497870
15-24 19.15 4552448 4567894
25-54 33.91 8170264 7976751
55-64 3.92 856092 1009075
65 years and above 3 614751 813320
Source CIA World Factbook 2017
How best would you represent this data diagrammatically?
12. The following data represents the maximum temperatures in degrees centigrade
predicted for some 55 major cities on the 24th September 1993.
17 25 21 18 14 15 24 22 15 21 25
17 25 15 18 17 29 16 24 39 30 23
23 27 43 28 29 15 15 19 32 30 32
23 13 18 13 27 32 17 17 25 25 30
20 18 17 33 28 27 26 32 32 33 19

a) Construct a frequency distribution table for these temperatures starting with
the classes: 11-17, 18-24, ...
Solution
Temperature (°C) Frequency
11 – 17 15
18 - 24 15
25 - 31 16
32 - 38 7
39 - 45 2
b) Represent the data using a histogram, a frequency polygon and an ogive curve
c) Using the appropriate diagrammatic/graph representation of the data, estimate:
i) The modal temperature
ii) The median temperature
iii) The lower and upper class boundaries of the temperature range within which
the middle 50% of all cities lie.
iv) The minimum and maximum temperature of the middle 80% of the cities.
v) On this particular day, a researcher was collecting some data and required
data from cities whose temperatures were above 29.5 °C. How many of these
cities did he include in his study?
