Chapter 2
METHODS OF DATA COLLECTION
AND PRESENTATION
2.1 Methods of data collection
Data: a measurement or observed value recorded for a particular element or variable.
There are two types of data: primary and secondary.
1. Primary (raw) data: data collected by the investigator himself or herself for the specific purpose of the study.
Primary data collection methods include telephone interviews, personal interviews, mailed questionnaires, etc.
Example: asking the students in a class for their CGPA.
2. Secondary data: data that have already been collected by others and are then used by the investigator. Examples of secondary data: books, reports, magazines, etc.
Secondary data collection methods: obtaining records from the registrar, hospitals, offices, reports, magazines, etc.
2.2 Methods of data presentation
1. Tabular presentation
2. Diagrammatic and Graphic presentation.
2.2.1 Tabular presentation (frequency distribution, FD)
Frequency: the number of times a certain value or class of values occurs.
Frequency distribution (FD): the organization of raw data in table form using classes and frequencies.
There are three types of FD, and each type has its own construction procedure.
The three types are: I. categorical FD, II. ungrouped FD and III. grouped FD.
I. Categorical FD: used for data that can be placed in specific categories, such as nominal- or ordinal-level data.
Example 2.1: Twenty-five patients were given a blood test to determine their blood type. The data are shown below: A B B AB O O O B AB B B B O A O O O AB AB A O O B A.
Solution: Since the data are categorical, we can construct an FD by taking the four blood types as classes, as follows.
Step 1: Make a table with columns for class, tally, frequency and percent.
Step 2: Tally the data and place the results in the Tally column.
Step 3: Count the tallies and place the results in the Frequency column.
Step 4: Find the percentage of values in each class using % = (f/n) × 100, where f is the class frequency and n is the total number of observations.
Definitions of some terms
1. Frequency: the number of values in a specific class of the distribution.
2. Frequency distribution: the organization of data in table form using classes and frequencies.
• There are three basic types of frequency distributions:
Categorical frequency distribution
Ungrouped frequency distribution
Grouped frequency distribution
1. Categorical frequency distributions:
It is used for data that can be placed in specific categories, such as nominal or ordinal data,
e.g. sex, marital status, etc.
Example: A social worker collected the following data on marital status for 25 persons (M = married, S = single, W = widowed, D = divorced).
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
How can you construct a categorical frequency distribution for these data?
Solution
Class   Frequency   Percent
M       6           24
S       7           28
D       7           28
W       5           20
Total   25          100
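The tallying and percentage steps can be checked with a short Python sketch. It assumes the 25 marital-status records are typed in row by row as listed in the example; `categorical_fd` is a hypothetical helper name, and the tallying is done by the standard-library `collections.Counter`.

```python
from collections import Counter

# Marital-status records of the 25 persons, typed row by row
# (M = married, S = single, W = widowed, D = divorced).
records = ("M S D W D  S S M M M  W D S M M  "
           "W D D S S  S W W D D").split()


def categorical_fd(values, classes):
    """Return (class, frequency, percent) rows for categorical data."""
    counts = Counter(values)  # Steps 2-3: tally and count each category
    n = len(values)
    # Step 4: percent = f / n * 100
    return [(c, counts[c], 100 * counts[c] / n) for c in classes]


table = categorical_fd(records, ["M", "S", "D", "W"])
for cls, freq, pct in table:
    print(f"{cls:<6}{freq:>3}{pct:>6.0f}")
print(f"{'Total':<6}{len(records):>3}{100:>6}")
```

The class order is passed in explicitly so that the table rows follow the order used in the notes rather than the order in which categories first appear.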
Break time … Enjoy
Q#1. I am a number. If I am multiplied by 3, increased by 10 and then divided by 2, the result equals a quarter of me. Who am I?
ANSWER
Let the number be k. Then
(3k + 10)/2 = k/4
Multiplying both sides by 4: 6k + 20 = k
5k = -20
k = -4
2. Ungrouped frequency distributions
• It is often constructed for a small data set or for data on a discrete variable.
• Steps in constructing an ungrouped frequency distribution:
First, find the smallest and largest raw scores in the collected data.
Arrange the data in order of magnitude and count the frequency of each value.
To facilitate counting, one may include a column of tallies.
Example: The following data are the numbers of cars in a sample of 30 different parking places.
4 2 4 3 2 8 3 4 4 2 2 8 5 3 4
4 5 4 3 5 2 7 3 3 7 7 3 8 4 5
How do you construct an ungrouped frequency distribution for these data?
Solution
The frequency distribution of the number of cars
Number of cars   Frequency   %
2                5           17
3                7           23
4                8           27
5                4           13
7                3           10
8                3           10
Total            30          100
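A minimal Python sketch of the ungrouped procedure for the parking data: `sorted` arranges the distinct values in order of magnitude and `collections.Counter` does the tallying. The name `ungrouped_fd` is hypothetical, and percentages are rounded to whole numbers to match the table.

```python
from collections import Counter

# Number of cars at each of the 30 parking places
cars = [4, 2, 4, 3, 2, 8, 3, 4, 4, 2, 2, 8, 5, 3, 4,
        4, 5, 4, 3, 5, 2, 7, 3, 3, 7, 7, 3, 8, 4, 5]


def ungrouped_fd(data):
    """One (value, frequency, percent) row per distinct value, smallest first."""
    counts = Counter(data)
    n = len(data)
    return [(x, counts[x], round(100 * counts[x] / n)) for x in sorted(counts)]


print("smallest:", min(cars), "largest:", max(cars))
for value, freq, pct in ungrouped_fd(cars):
    print(value, freq, pct)
```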
3. Grouped Frequency Distribution (GFD)
• It is a frequency distribution in which several values are grouped into one class.
Definitions of some terms used to construct a GFD
1. Class limits (CL): identify the classes of a frequency distribution. Each class has a lower class limit (LCL) and an upper class limit (UCL).
Eg. (4,6), (7,9), (10,12), ..., (22,24)
4, 7, 10, …, & 22 are the LCLs.
6, 9, 12, …, & 24 are the UCLs.
2. Unit of measurement (U): the gap between the UCL of one class and the LCL of the next class. It is usually 1, 0.1, 0.01, 0.001, ..., depending on the precision of the data.
3. Class boundaries (CB):
They separate one class in a grouped frequency distribution from the next. The boundaries have one more decimal place than the raw data and therefore do not appear in the data.
There is no gap between the upper boundary of one class and the lower boundary of the next class.
There are lower and upper class boundaries (LCB & UCB).
LCB = LCL - U/2 & UCB = UCL + U/2.
Eg. (3.5,6.5), (6.5,9.5), (9.5,12.5), …, & (21.5,24.5).
3.5, 6.5, 9.5, …, & 21.5 are the LCBs.
4. Class width (W): the difference between the upper and lower class boundaries of any class. It is also the difference between the lower limits of any two consecutive classes, or between any two consecutive class marks. It has the same value for all classes.
W = UCB1 - LCB1 = LCL2 - LCL1 = M2 - M1
Eg. W = 3 for the above data.
5. Class mark (M): the midpoint of a class; it divides the class into two equal parts.
M = (LCL + UCL)/2 = (LCB + UCB)/2.
Eg. 5, 8, 11, …, & 23 are the class marks of the above data.
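The boundary, width and class-mark formulas can be applied mechanically. This Python sketch assumes whole-number data (U = 1) and that the example classes continue in steps of 3 from (4,6) to (22,24), as the listed limits suggest; `class_summary` is a hypothetical helper name.

```python
U = 1  # unit of measurement: the data are whole numbers

# Example classes (4,6), (7,9), ..., (22,24), assuming steps of 3 throughout
limits = [(lcl, lcl + 2) for lcl in range(4, 23, 3)]


def class_summary(limits, unit):
    """Return (LCL, UCL, LCB, UCB, M) for each class."""
    rows = []
    for lcl, ucl in limits:
        lcb = lcl - unit / 2    # LCB = LCL - U/2
        ucb = ucl + unit / 2    # UCB = UCL + U/2
        mark = (lcl + ucl) / 2  # M = (LCL + UCL)/2
        rows.append((lcl, ucl, lcb, ucb, mark))
    return rows


rows = class_summary(limits, U)
width = rows[0][3] - rows[0][2]  # W = UCB1 - LCB1
for row in rows:
    print(row)
print("W =", width)
```

The other two forms of W give the same value here: LCL2 - LCL1 = 7 - 4 and M2 - M1 = 8 - 5 are both 3.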
6. Relative frequency (rf): the class frequency divided by the total frequency, rf = fi/n.
7. More than cumulative frequency (mcf): the total frequency of all values ≥ the lower class boundary of a given class, i.e. the sum of the frequencies of that class and all later (higher) classes.
8. Less than cumulative frequency (lcf): the total frequency of all values ≤ the upper class boundary of a given class, i.e. the sum of the frequencies of that class and all earlier (lower) classes.
Steps for constructing a GFD:
Step 1: Find the range, R = Max value - Min value.
Step 2: Find the number of classes (K) using Sturges's formula, K = 1 + 3.322 log(n), where log is base 10. Round K up; K should be between 5 and 15.
Step 3: Find the class width, W = R/K, and round W up.
Step 4: Take the minimum value as the LCL of the first class.
Step 5: Find each subsequent LCL by adding W to the previous LCL, e.g. LCL2 = LCL1 + W, ….
Step 6: Find the UCLs: UCL1 = LCL2 - U, then UCL2 = UCL1 + W, UCL3 = UCL2 + W, ….
Step 7: Count the values in each class and assign the frequencies.
Example: The following data are the numbers of minutes taken to travel from home to work by a group of 25 automobile workers.
28 25 48 37 41 19 32 26 16 23 23 29 36
31 26 21 32 25 31 43 35 42 38 33 28
Construct a GFD for these data.
Solution:
Range = 48 - 16 = 32
K = 1 + 3.322 log(25) = 5.64, rounded up to 6
W = R/K = 32/6 = 5.33, rounded up to 6
The final frequency distribution is:
Time (in minutes)   Number of workers
16-21               3
22-27               6
28-33               8
34-39               4
40-45               3
46-51               1
Total               25
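Steps 1 to 7 can be sketched in Python as below, applied to the travel-time data. Assumptions: the data are whole numbers (U = 1), `log` in Sturges's formula is base 10, and `build_gfd` is a hypothetical name. The extra-class guard is an addition not found in the notes: if K × W happens to equal the range exactly, K classes would stop one unit short of the maximum value.

```python
import math

U = 1  # unit of measurement (whole minutes)

times = [28, 25, 48, 37, 41, 19, 32, 26, 16, 23, 23, 29, 36,
         31, 26, 21, 32, 25, 31, 43, 35, 42, 38, 33, 28]


def build_gfd(data, unit=U):
    """Sketch of Steps 1-7; returns R, K, W and (LCL, UCL, f) rows."""
    n = len(data)
    r = max(data) - min(data)                     # Step 1: range
    k = math.ceil(1 + 3.322 * math.log10(n))      # Step 2: Sturges, rounded up
    w = math.ceil(r / k)                          # Step 3: width, rounded up
    lcls = [min(data) + i * w for i in range(k)]  # Steps 4-5: LCLs
    ucls = [lcl + w - unit for lcl in lcls]       # Step 6: UCL1 = LCL2 - U, ...
    # Guard (not in the notes): add one class if the maximum is not covered.
    if ucls[-1] < max(data):
        lcls.append(lcls[-1] + w)
        ucls.append(ucls[-1] + w)
    # Step 7: count the values falling in each class
    freqs = [sum(lcl <= x <= ucl for x in data) for lcl, ucl in zip(lcls, ucls)]
    return r, k, w, list(zip(lcls, ucls, freqs))


r, k, w, classes = build_gfd(times)
print("R =", r, " K =", k, " W =", w)
for lcl, ucl, f in classes:
    print(f"{lcl}-{ucl}  {f}")
```

For this data set the guard never fires, because K × W = 36 is larger than the range of 32.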
The relative frequency distribution is:
Time (in minutes)   Relative frequency
16-21               0.12
22-27               0.24
28-33               0.32
34-39               0.16
40-45               0.12
46-51               0.04
Total               1
Less than cumulative frequency distribution:
Time (in minutes)   Less than cumulative frequency
Less than 21.5      3
Less than 27.5      9
Less than 33.5      17
Less than 39.5      21
Less than 45.5      24
Less than 51.5      25
More than cumulative frequency distribution:
Time (in minutes)   More than cumulative frequency
More than 15.5      25
More than 21.5      22
More than 27.5      16
More than 33.5      8
More than 39.5      4
More than 45.5      1
More than 51.5      0
Then, the complete GFD is as follows:
Time (CL)   CB          Freq.   CM     LCF   MCF
16-21       15.5-21.5   3       18.5   3     25
22-27       21.5-27.5   6       24.5   9     22
28-33       27.5-33.5   8       30.5   17    16
34-39       33.5-39.5   4       36.5   21    8
40-45       39.5-45.5   3       42.5   24    4
46-51       45.5-51.5   1       48.5   25    1
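The remaining columns (CB, CM, rf, LCF and MCF) follow directly from the class limits and frequencies. A minimal Python sketch, assuming U = 1 and using the standard-library `itertools.accumulate` for the running totals; `complete_gfd` is a hypothetical helper name.

```python
from itertools import accumulate

U = 1  # unit of measurement (whole minutes)

# Class limits and frequencies of the travel-time GFD
classes = [(16, 21, 3), (22, 27, 6), (28, 33, 8),
           (34, 39, 4), (40, 45, 3), (46, 51, 1)]


def complete_gfd(classes, unit=U):
    """Add CB, rf, CM, LCF and MCF columns to (LCL, UCL, f) rows."""
    freqs = [f for _, _, f in classes]
    n = sum(freqs)
    lcf = list(accumulate(freqs))                  # running total, first class down
    mcf = list(accumulate(reversed(freqs)))[::-1]  # running total, last class up
    rows = []
    for (lcl, ucl, f), less, more in zip(classes, lcf, mcf):
        rows.append({
            "CL": (lcl, ucl),
            "CB": (lcl - unit / 2, ucl + unit / 2),
            "f": f,
            "rf": f / n,
            "CM": (lcl + ucl) / 2,
            "LCF": less,  # number of values <= UCB of this class
            "MCF": more,  # number of values >= LCB of this class
        })
    return rows


rows = complete_gfd(classes)
for row in rows:
    print(row)
```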
2.2.2 Diagrammatic and Graphic Presentation
2.2.2.1 Diagrammatic Presentation
It is used for discrete as well as qualitative data.
There are three types:
A. Pie charts B. Bar charts C. Pictograms
A. Pie charts: a circle divided into sectors, each sector representing the proportion of the data set that falls in one category.
Steps to construct a pie chart
Step 1: Calculate the percentage frequency of each component: (fi/n)*100.
Step 2: Calculate the degree measure of each sector: (fi/n)*360.
Step 3: Draw the circle with a compass and mark off the sectors with a protractor.
Example: Draw a pie chart to represent the following data on a certain family's expenditure.
Item            Expenditure   % freq.   Angle
Food            50            33.33     120°
Clothing        30            20        72°
House rent      20            13.33     48°
Fuel & light    15            10        36°
Miscellaneous   35            23.33     84°
Total           150           100       360°
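The percentage and angle columns come from Steps 1 and 2 of the pie-chart procedure. A minimal Python sketch that returns both values for each sector (drawing, Step 3, is not shown); `pie_sectors` is a hypothetical name.

```python
def pie_sectors(freqs):
    """Return {item: (percent, angle in degrees)} for a pie chart."""
    n = sum(freqs.values())
    # Step 1: (f/n)*100 and Step 2: (f/n)*360 for each component
    return {item: (100 * f / n, 360 * f / n) for item, f in freqs.items()}


expenditure = {"Food": 50, "Clothing": 30, "House rent": 20,
               "Fuel & light": 15, "Miscellaneous": 35}

sectors = pie_sectors(expenditure)
for item, (pct, angle) in sectors.items():
    print(f"{item:<14}{pct:6.2f}%{angle:7.1f} deg")
```

The same function reproduces the Men/Women/Girls/Boys pie-chart table in these notes; for example, `pie_sectors({"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500})["Girls"]` gives `(40.0, 144.0)`.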
Solution
Figure: Pie chart of the data on family expenditure (sectors: Food, Clothing, House rent, Fuel and light, Miscellaneous).
Pie chart:
Class Frequency Percent Degree
Men 2500 25 90
Women 2000 20 72
Girls 4000 40 144
Boys 1500 15 54
Total 10000 100 360
B. Bar charts
A set of bars (thick lines or narrow rectangles) representing some magnitude over time or space.
They are useful for comparing aggregates over time or space.
Bars can be drawn either vertically or horizontally.
There are different types of bar charts. The most common are:
Simple bar chart
Component or subdivided bar chart
Multiple bar chart
I. Simple bar charts
They are used to display data on one variable.
They are thick lines (narrow rectangles) having the same breadth.
The magnitude of a quantity is represented by the height or length of the bar.
Example: Draw a bar chart for the following data on coffee production from 1990 to 1995.
Year                    1990   1991   1992   1993   1994   1995
Amount (in 1000 tons)   50     75     92     64     100    120
Solution
Figure: Production of coffee from 1990 to 1995 (vertical axis: amount of coffee in 1000 tons; horizontal axis: production year).
II. Multiple bar charts
These are used to display data on more
than one variable.
They are used for comparing different
variables at the same time.
Example: Draw a multiple bar chart for the data on production of coffee (in 1000 tons) from 1991 to 1993 by region.
Region     Production year
           1991   1992   1993
Region A   80     85     90
Region B   120    165    120
Answer
Figure: Production of coffee from 1991 to 1993 in two regions (vertical axis: amount of coffee in 1000 tons; horizontal axis: production year; legend: Region A, Region B).
III. Component bar charts
They are used to show how a total (aggregate) is divided into its component parts.
Each bar represents the total value of a variable, broken into its component parts, and different colors (designs) are used to identify the parts.
Example: Draw a component bar chart for the data on production of coffee (in 1000 tons) by region from 1991 to 1993.
ANSWER
Figure: Component bar chart of coffee production from 1991 to 1993 (vertical axis: amount of coffee in 1000 tons; horizontal axis: production year; components: Region A, Region B).
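In a component bar chart, each year's bar is a stack: every region's segment starts where the previous one ends, so the top of the bar equals the year's total. A minimal Python sketch using the regional coffee data from the multiple bar chart example; `stack_segments` is a hypothetical helper, and the actual drawing (for example with a plotting library) is not shown.

```python
# Coffee production (in 1000 tons) by region and year
production = {
    1991: {"Region A": 80, "Region B": 120},
    1992: {"Region A": 85, "Region B": 165},
    1993: {"Region A": 90, "Region B": 120},
}


def stack_segments(parts):
    """Return (component, bottom, top) for each segment of one component bar."""
    segments, bottom = [], 0
    for name, value in parts.items():
        segments.append((name, bottom, bottom + value))
        bottom += value  # the next segment starts where this one ends
    return segments


for year, parts in production.items():
    print(year, stack_segments(parts))
```

The bar totals (200, 250 and 210) explain why the chart's vertical axis runs up to 250.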
C. Pictogram: a method of representing data by means of pictures or small symbols.
Example 2.23: The following table shows the orange production of a plantation for the production years 1990-1993. Represent the data by a pictogram.
Production year 1990 1991 1992 1993
Amount (in kg) 3000 3850 3500 5000
Answer
Figure: Pictogram of the data on orange production from 1990 to 1993.
Break time: IQ test (find the number in place of ?)
Answer
9 + 8 = 17
2.2.2.2 Graphic Presentation for Continuous Data
The histogram, the frequency polygon and the cumulative frequency graph (ogive) are the most commonly used graphical representations of continuous data.
A. Histogram:
A set of adjacent bars whose bases are marked by the class boundaries and whose heights are proportional to the class frequencies.
Example: Construct a histogram for the
frequency distribution of the time spent by
the automobile workers.
Answer
Figure: Histogram of the time in minutes spent by automobile workers to travel from home to work.
B. Frequency polygon
It is drawn by plotting the class marks against the class frequencies and joining the points with line segments.
Add two classes with zero frequency, one at each end of the frequency distribution, so that the polygon starts and ends on the horizontal axis.
Example: Construct a frequency polygon
for the frequency distribution of the time
spent by the automobile workers.
Answer
Figure: Frequency polygon of the time in minutes spent by automobile workers to travel from home to work.
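The vertices of the frequency polygon can be generated from the class marks and frequencies of the travel-time GFD. This Python sketch assumes equal class widths, so each added zero-frequency class sits one width (6 minutes) beyond the end of the distribution; `polygon_points` is a hypothetical name.

```python
marks = [18.5, 24.5, 30.5, 36.5, 42.5, 48.5]  # class marks (CM)
freqs = [3, 6, 8, 4, 3, 1]                    # class frequencies


def polygon_points(marks, freqs):
    """Return the (x, y) vertices of a closed frequency polygon."""
    w = marks[1] - marks[0]                        # class width = M2 - M1
    xs = [marks[0] - w] + marks + [marks[-1] + w]  # empty classes at both ends
    ys = [0] + freqs + [0]
    return list(zip(xs, ys))


for x, y in polygon_points(marks, freqs):
    print(x, y)
```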
C. Ogive (cumulative frequency graph)
It is drawn by plotting the cumulative frequencies of a distribution against the class boundaries.
There are less than and more than ogives.
Example: Construct an ogive for the time spent by the automobile workers.
The cumulative frequency distribution is:
Class boundary   LCF   MCF
15.5             0     25
21.5             3     22
27.5             9     16
33.5             17    8
39.5             21    4
45.5             24    1
51.5             25    0
Solution
Figure: Ogive of the time in minutes spent by automobile workers to travel from home to work.
End of chapter 2!