Stat CHSP 2
After the data have been collected and edited, the next important step is to organize them, that is, to present them in a readily comprehensible, condensed form that helps in drawing inferences. It is also necessary to separate like items from unlike ones.
Tabular presentation
Diagrammatic and Graphic presentation.
Classification is a preliminary step that prepares the ground for the proper presentation of data.
Definitions:
Raw data: data collected in their original form (e.g., from a survey), whether counts or measurements.
Frequency: the number of values in a specific class of the distribution.
Frequency distribution: the organization of raw data in table form using classes and frequencies.
Example: A frequency distribution presenting the number of males and females in a class
Sex Frequency
Male 57
Female 39
NB: The main purpose of grouping is to summarize and condense large masses of data.
1) Categorical frequency distribution:
Used for data that can be placed in specific categories, such as nominal- or ordinal-level data.
Example: A social worker collected the following data on marital status for 25
persons. (M=married, S=single, W=widowed, D=divorced)
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Solution:
Since the data are categorical, discrete classes can be used. There are four types of marital status, M, S, D, and W, and these will be used as the classes of the distribution. We follow the procedure below to construct the frequency distribution.
Step 1: Make a table with four columns: (1) class, (2) tally, (3) frequency, and (4) percent.
Step 2: Tally the data and place the result in column (2).
Step 3: Count the tally and place the result in column (3).
Step 4: Find the percentage of values in each class:
$\% = \frac{f}{n} \times 100$, where f = frequency of the class and n = total number of values.
Percentages are not normally part of a frequency distribution, but they can be added because they are used in certain types of diagrams, such as pie charts.
Combining all the steps, one can construct the following frequency distribution:
(1) Class   (2) Tally   (3) Frequency   (4) Percent
M           //// /      6               24
S           //// //     7               28
D           //// //     7               28
W           ////        5               20
Total                   25              100
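The counting in Steps 2 to 4 can be checked with a short Python sketch. It is only an illustration: the helper name `categorical_distribution` is hypothetical, not from the notes, and it skips the tally column, reporting each class's frequency and percentage.

```python
from collections import Counter

# The 25 marital statuses, read row by row from the example
# (M = married, S = single, W = widowed, D = divorced).
data = """
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
""".split()

def categorical_distribution(values, classes):
    """Return (class, frequency, percent) rows, with percent = f / n * 100."""
    counts = Counter(values)   # frequency f of each category
    n = len(values)            # total number of values
    return [(c, counts[c], 100 * counts[c] / n) for c in classes]

table = categorical_distribution(data, ["M", "S", "D", "W"])
for status, f, pct in table:
    print(f"{status}  {f:2d}  {pct:5.1f}%")
```

The frequencies add up to 25 and the percentages to 100, which is a quick check that no value was missed while tallying.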
2) Ungrouped frequency distribution:
A table of all the potential raw-score values that could possibly occur in the data, along with the number of times each actually occurred.
Example:
The following data represent the marks of 20 students.
80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85
Each individual value is presented separately; that is why it is called an ungrouped frequency distribution.
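A minimal Python sketch of an ungrouped distribution for these marks. It assumes only the marks that actually occur are listed; to follow the definition literally, one could instead loop over every mark from the minimum to the maximum and record zero frequencies for the missing ones.

```python
from collections import Counter

# Marks of the 20 students from the example.
marks = [80, 76, 90, 85, 80,
         70, 60, 62, 70, 85,
         65, 60, 63, 74, 75,
         76, 70, 70, 80, 85]

# Each distinct mark is its own class; sort by mark for the table.
ungrouped = sorted(Counter(marks).items())

for mark, f in ungrouped:
    print(mark, f)
```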
3) Grouped frequency distribution:
When the range of the data is large, the data must be grouped into classes that are more than one unit wide.
Definitions:
Class mark (midpoint): the average of the lower and upper class limits, or the average of the lower and upper class boundaries, i.e., $m_i = \frac{lcb_i + ucb_i}{2}$.
Steps for constructing a grouped frequency distribution:
1. Arrange the data in ascending order and determine the unit of measurement, u.
2. Find the largest and smallest values, then compute the range: $R = \text{Maximum} - \text{Minimum}$.
3. Select the desired number of class intervals, k, usually between 5 and 20, or use Sturges' formula:
$k = 1 + 3.32 \log_{10} n$
where k is the number of classes desired and n is the total number of observations.
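4. Find the class width: $w = R/k$, rounded up to a convenient number in the unit of measurement.
5. Find the class limits. Pick a suitable starting point less than or equal to the minimum value. The starting point is called the lower limit of the first class. Continue to add the class width to this lower limit to get the rest of the lower limits.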
Lower class limit (LCL): the lcl of the first class interval should be equal to or smaller than the smallest observation in the data, i.e., $lcl_1 \le$ the smallest observation.
Continue to add the class width to this lower limit to get the rest of the lower limits, i.e., $lcl_{i+1} = lcl_i + w,\; i = 1, 2, \ldots, k-1$.
Upper class limit (UCL): to find the upper class limit of the first class, subtract u from the lower limit of the second class, i.e., $ucl_1 = lcl_2 - u$.
Then continue to add the class width to this upper limit to get the rest of the upper class limits, i.e., $ucl_{i+1} = ucl_i + w,\; i = 1, 2, \ldots, k-1$.
6. Find the class boundaries: these are the exact (true) limits of the classes, called the lower and upper class boundaries.
Lower class boundary (LCB): the lcb is obtained by subtracting half the unit of measurement from the lcl of the class, i.e.,
$lcb_i = lcl_i - \frac{u}{2}$. Note: $lcb_{i+1} = lcb_i + w$.
Upper class boundary (UCB): the ucb is obtained by adding half the unit of measurement to the ucl of the class, i.e.,
$ucb_i = ucl_i + \frac{u}{2}$. Note: $ucb_{i+1} = ucb_i + w$.
7. Class marks (midpoints) (m): the average of the lcl and ucl, or of the lcb and ucb:
$m_i = \frac{lcl_i + ucl_i}{2}$ or $m_i = \frac{lcb_i + ucb_i}{2}$. Note: $m_{i+1} = m_i + w$.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies. Depending on what you're trying to accomplish, it may
not be necessary to find the cumulative frequencies.
11. If necessary, find the relative frequencies and/or relative cumulative frequencies.
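The eleven steps can be sketched in Python as below. This is an illustration under stated assumptions, not the notes' own method: the helper names are hypothetical, Sturges' k is rounded to the nearest whole number, the first lower class limit is taken to be the minimum, and the width is taken as the smallest multiple of u for which k times w exceeds R, so that the largest value always falls in the last class (a plain "R/k rounded up" can leave the maximum out when R/k is a whole number). It is applied to the 20 marks from the ungrouped example.

```python
import math

def sturges_k(n):
    # Sturges' formula k = 1 + 3.32 log10(n); rounding to the nearest
    # whole number is an assumed convention.
    return round(1 + 3.32 * math.log10(n))

def grouped_distribution(data, u=1, k=None):
    """Return one dict per class with limits, boundaries, class mark,
    frequency (f), cumulative frequency (cf) and relative frequency (rf)."""
    data = sorted(data)                          # step 1: ascending order
    n = len(data)
    R = data[-1] - data[0]                       # step 2: range
    if k is None:
        k = sturges_k(n)                         # step 3: number of classes
    # Step 4 (assumption): smallest multiple of u with k * w > R.
    w = math.ceil((round(R / u) + 1) / k) * u
    rows, cf = [], 0
    for i in range(k):
        lcl = data[0] + i * w                    # step 5: lcl_1 = minimum, lcl_{i+1} = lcl_i + w
        ucl = lcl + w - u                        # ucl_i = lcl_{i+1} - u
        lcb, ucb = lcl - u / 2, ucl + u / 2      # step 6: class boundaries
        m = (lcl + ucl) / 2                      # step 7: class mark
        f = sum(lcb < x < ucb for x in data)     # steps 8-9: count values in the class
        cf += f                                  # step 10: cumulative frequency
        rows.append({"limits": (lcl, ucl), "boundaries": (lcb, ucb),
                     "mid": m, "f": f, "cf": cf, "rf": f / n})  # step 11
    return rows

marks = [80, 76, 90, 85, 80, 70, 60, 62, 70, 85,
         65, 60, 63, 74, 75, 76, 70, 70, 80, 85]

for row in grouped_distribution(marks):
    print(row)
```

For these marks (R = 30, n = 20), Sturges gives k = 5 and the width becomes 7, so the classes are 60-66, 67-73, 74-80, 81-87 and 88-94, with frequencies 5, 4, 7, 3 and 1.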
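Find the upper class limits (here the lower class limits are 6, 13, 20, 27 and 34, so w = 7 and u = 1); e.g., the first upper class limit is $ucl_1 = lcl_2 - u = 13 - 1 = 12$.
The upper class limits are therefore 12, 19, 26, 33 and 40.
Combining the lcl and ucl values, one can construct the following classes: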
Class limits
6 – 12
13 – 19
20 – 26
27 – 33
34 – 40
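The recurrences in steps 5 to 7 can be checked against this table with a few lines of Python, assuming w = 7 and u = 1 as read from the class limits. The sketch also lists the class boundaries and class marks, which the notes do not tabulate for this example.

```python
# Values read off the class-limit table: first lcl = 6, class width w = 7,
# unit of measurement u = 1, and k = 5 classes.
lcl1, w, u, k = 6, 7, 1, 5

lcl = [lcl1 + i * w for i in range(k)]           # lcl_{i+1} = lcl_i + w
ucl = [(lcl[1] - u) + i * w for i in range(k)]   # ucl_1 = lcl_2 - u, then ucl_{i+1} = ucl_i + w
lcb = [x - u / 2 for x in lcl]                   # lcb_i = lcl_i - u/2
ucb = [x + u / 2 for x in ucl]                   # ucb_i = ucl_i + u/2
mid = [(a + b) / 2 for a, b in zip(lcl, ucl)]    # m_i = (lcl_i + ucl_i) / 2

for row in zip(lcl, ucl, lcb, ucb, mid):
    print("%d - %d   boundaries %.1f - %.1f   m = %.1f" % row)
```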
Importance:
Pie charts
Pictograms
Bar charts
1. Pie chart
A pie chart is a circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution. The angle of the sector of a class is obtained by multiplying the ratio of the frequency of the class to the total frequency by 360°.
i.e., $\text{sector angle of a class} = \frac{\text{frequency of the class}}{\text{total frequency}} \times 360^\circ$
Note: pie charts are usually used to depict nominal-level data.
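As a small illustration of the sector-angle formula, here is a Python sketch that uses the marital-status frequencies from the categorical example (the hospital data are not reproduced here). The helper `sector_angles` is hypothetical.

```python
# Frequencies from the marital-status example (M, S, D, W); total = 25.
freq = {"M": 6, "S": 7, "D": 7, "W": 5}

def sector_angles(freq):
    """Sector angle of each class = frequency of the class / total frequency x 360 degrees."""
    total = sum(freq.values())
    return {c: 360 * f / total for c, f in freq.items()}

angles = sector_angles(freq)
for c, a in angles.items():
    print(f"{c}: {a:.1f} degrees")
```

The angles (86.4°, 100.8°, 100.8° and 72°) add up to 360°, as they must for a full circle.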
Example: Draw the pie chart for the following hospital data. First construct a table providing the
central angles.
2. Pictogram
Example: Draw a pictorial diagram to present the following data (number of students in a certain school over four years).
[Figure: pictogram of the number of students for the years 1992, 1993, 1994 and 1995; key: one symbol = 1000 students]
3. Bar Charts:
Bar-diagrams are usually used to represent a one-way (simple) frequency distribution. They can be drawn either horizontally or vertically: horizontal bar-diagrams are usually used for qualitatively classified data, whereas vertical bar-diagrams are used for quantitatively classified data.
1. Simple bar-diagrams
Simple bar-diagrams are used to depict data on a single (one-way) variable.
Example: The following frequency distribution shows the sales (in million birr) of four products for the 2004 production year.
Product   Sales (million birr)
A         14
B         21
C         9
D         17
The bar-diagram presentation for these data is given below.
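Without a plotting library, a rough horizontal bar-diagram of these sales can be sketched in plain Python, one `#` per million birr. This is only a text approximation of a drawn figure; the helper name is hypothetical and it assumes whole-number values.

```python
# Sales (million birr) of products A-D for 2004, from the table above.
sales = {"A": 14, "B": 21, "C": 9, "D": 17}

def text_bar_chart(data, symbol="#"):
    """Return one text line per category; the bar has one symbol per unit of value."""
    width = max(len(name) for name in data)
    return [f"{name:<{width}} | {symbol * value} {value}" for name, value in data.items()]

for line in text_bar_chart(sales):
    print(line)
```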
2. Component bar-diagrams
When it is desired to show how a total (an aggregate) is divided into component parts, we use a component bar-diagram. In this type of bar-diagram, each bar represents the aggregate value of a variable, broken into its component parts, and different colors or designs are used to identify the components.
Example: The data given in the above example can be presented using a component bar-diagram as below.
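For a component bar-diagram, the single aggregate bar (total sales of 61 million birr) is split into stacked segments. A minimal Python sketch, with a hypothetical helper, that computes where each product's segment starts and ends:

```python
# Sales (million birr) of products A-D, from the simple bar-diagram example.
sales = {"A": 14, "B": 21, "C": 9, "D": 17}

def component_segments(parts):
    """Stack the components of a single bar; return (name, start, end) per segment."""
    segments, start = [], 0
    for name, value in parts.items():
        segments.append((name, start, start + value))
        start += value            # next segment begins where this one ends
    return segments

segments = component_segments(sales)
for name, start, end in segments:
    print(f"{name}: {start} to {end}")
```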
Three common graphical presentations of data are the histogram, the frequency polygon, and the cumulative frequency polygon (ogive).
Procedures for constructing statistical graphs:
Draw and label the X and Y axes.
Choose a suitable scale for the frequencies or cumulative frequencies and label it on the y-axis.
Represent the class boundaries (for the histogram or ogive) or the midpoints (for the frequency polygon) on the x-axis.
Plot the points.
Draw the bars or lines to connect the points.
a) Histogram
It presents a grouped frequency distribution of continuous type. It is drawn by marking the class boundaries on the x-axis and the frequencies on the y-axis.
Example: Draw a histogram for the following grouped age data.
Class limits  Class boundaries  Midpoint  Frequency
15-19 14.5-19.5 17 2
20-24 19.5-24.5 22 8
25-29 24.5-29.5 27 6
30-34 29.5-34.5 32 12
35-39 34.5-39.5 37 7
40-44 39.5-44.5 42 6
45-49 44.5-49.5 47 4
50-54 49.5-54.5 52 3
55-59 54.5-59.5 57 1
60-64 59.5-64.5 62 1
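The histogram bars follow directly from this table: each bar runs from the lower to the upper class boundary, and its height is the class frequency. A Python sketch, assuming ages are recorded in whole years (u = 1), derives the boundaries from the class limits and prints a rough text histogram with the bars drawn sideways:

```python
# Age data from the table: (lower class limit, upper class limit, frequency).
classes = [(15, 19, 2), (20, 24, 8), (25, 29, 6), (30, 34, 12), (35, 39, 7),
           (40, 44, 6), (45, 49, 4), (50, 54, 3), (55, 59, 1), (60, 64, 1)]
u = 1  # ages recorded in whole years (assumed unit of measurement)

# Each bar spans [lcb, ucb] with height equal to the class frequency.
bars = [(lcl - u / 2, ucl + u / 2, f) for lcl, ucl, f in classes]

# Adjacent bars share a boundary (19.5, 24.5, ...), so they touch.
assert all(bars[i][1] == bars[i + 1][0] for i in range(len(bars) - 1))

for lcb, ucb, f in bars:
    print(f"{lcb:4.1f}-{ucb:4.1f} | {'#' * f}")
```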
b) Frequency Polygon
A frequency polygon is a line graph drawn by taking the frequencies of the classes along the vertical axis and their respective class marks along the horizontal axis. The plotted points are then joined by straight line segments.
Example: Present the data in the previous example using a frequency polygon.
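The points of the frequency polygon are the (class mark, frequency) pairs from the age table. The sketch below also adds one zero-frequency class mark beyond each end so that the polygon closes on the horizontal axis; this is a common convention that the notes do not state, so treat it as an assumption.

```python
# Class marks and frequencies of the age data; class width w = 5.
mids = [17, 22, 27, 32, 37, 42, 47, 52, 57, 62]
freqs = [2, 8, 6, 12, 7, 6, 4, 3, 1, 1]
w = 5

def polygon_points(mids, freqs, width, close=True):
    """Return the (class mark, frequency) points of a frequency polygon."""
    points = list(zip(mids, freqs))
    if close:
        # Assumed convention: extra zero-frequency class marks one width beyond each end.
        points = [(mids[0] - width, 0)] + points + [(mids[-1] + width, 0)]
    return points

points = polygon_points(mids, freqs, w)
print(points)
```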
c) Cumulative Frequency Polygon (Ogive)
A cumulative frequency polygon can be drawn on a less-than or a more-than cumulative frequency basis. Place the class boundaries along the horizontal axis and the corresponding cumulative frequencies (either less-than or more-than) along the vertical axis. Then join the plotted points by a freehand curve.
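A short Python sketch of the plotting points for both polygons, using the age data from the histogram example. It assumes the usual convention that less-than cumulative frequencies are plotted against upper class boundaries (starting from zero at the first lower boundary) and more-than cumulative frequencies against lower class boundaries (ending at zero at the last upper boundary).

```python
from itertools import accumulate

# Age data: class boundaries (width 5, starting at 14.5) and frequencies.
freqs = [2, 8, 6, 12, 7, 6, 4, 3, 1, 1]
lcb = [14.5 + 5 * i for i in range(len(freqs))]
ucb = [b + 5 for b in lcb]

# Less-than: cumulative frequency up to each upper class boundary.
less_than = [(lcb[0], 0)] + list(zip(ucb, accumulate(freqs)))

# More-than: number of values above each lower class boundary.
more_cf = [sum(freqs[i:]) for i in range(len(freqs))]
more_than = list(zip(lcb, more_cf)) + [(ucb[-1], 0)]

print(less_than)
print(more_than)
```

The two curves cross where half of the 50 observations lie on each side, which is why the ogive is often used to read off the median graphically.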
Example: The data in the previous example can be presented using either a less-than or a more-than cumulative frequency polygon, as given below in (i) and (ii), respectively.
(i) Less than type cumulative frequency polygon