Presentation of Data
Overview
This chapter introduces a set of basic procedures and statistical measures for describing data.
Data generally consist of an extensive number of measurements or observations that are too
numerous or complicated to be understood through simple observation. Therefore, this chapter
introduces several techniques including the construction of tables, graphical displays, and basic
statistical computations that provide ways to condense and organize information into a set of
descriptive measures and visual devices that enhance the understanding of complex data.
Data can be presented in the following ways:
(1) Textual presentation: In textual presentation, data are presented along with the text.
For comprehension one has to read through and scan the whole text in order to
grasp its meaning and implications. This is a serious disadvantage, though certain
points can be emphasized in such a presentation. It is often necessary and
advantageous to represent such data in either of the two other forms of
representation.
(2) Tabular presentation: In a tabular presentation, data are arranged in a
systematic way in rows and columns. Huge and unwieldy raw data can be neatly
condensed in a table by classifying the data according to suitable groups or classes.
There should be a title at the head of the table, and the title should indicate the
contents of the table.
(3) Graphic or diagrammatic presentation: Graphic and diagrammatic
representations are useful ways or devices of presenting data for quick and ready
comprehension.
Classification of Data
Classification is the process of arranging data into groups or classes. It is the first step in
tabulation.
Frequency Distribution
The most important method of organizing and summarizing statistical data is the construction
of a frequency distribution table. In this method, classification is done according to quantitative
magnitude. The items are classified into groups or classes according to their increasing order
in terms of magnitude and the number of items falling into each group, known as the frequency
of that group or class, is determined and indicated. This type of arrangement stresses the
manner in which the frequencies are distributed over the classes and hence the name frequency
distribution.
Class interval: The interval defining a class is known as a class interval. For example,
145-146, 147-148, etc., are class intervals. When both end numbers of a class are included
in the class, the class interval is inclusive. On the other hand, if either of the end numbers
is not included in the class, the class interval is exclusive. Usually the larger end number is
excluded, as in 10-20, 20-30, 30-40, etc.
Class limits: The end numbers of an inclusive class interval are known as class limits; the
smaller number 145 is the lower class limit and the larger number 146 is the upper class limit
of the class 145-146, if both 145 and 146 are included in the class.
Class boundaries: The end numbers of an exclusive class interval are known as class
boundaries. For example, if a number between 144.5 and 146.5 (greater than or equal
to 144.5 but less than 146.5) is taken to fall in a class and the class is represented as 144.5-146.5,
the end numbers are called class boundaries; the smaller number 144.5 is known as the lower class
boundary and the larger number 146.5 as the upper class boundary.
Interval size / Class width: The difference between the upper and lower class boundaries is
known as the width of the class. The common width is denoted by C. It is to be noted that in
certain cases, it may not be possible to have the same width for all the classes (especially for
the terminal/ end classes).
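
The relationship between class limits, class boundaries, and class width can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes the data were recorded to the nearest whole unit, so each inclusive limit is pushed outward by half a unit to obtain the boundary.

```python
def class_boundaries(limits, unit=1):
    """Convert inclusive class limits to class boundaries.

    Assumption: values were recorded to the nearest `unit`, so each
    limit is moved outward by half a unit.
    """
    half = unit / 2
    return [(lower - half, upper + half) for lower, upper in limits]


def class_width(boundary):
    """Width C = upper class boundary - lower class boundary."""
    lower, upper = boundary
    return upper - lower


limits = [(145, 146), (147, 148)]          # inclusive class limits from the text
boundaries = class_boundaries(limits)
widths = [class_width(b) for b in boundaries]

print(boundaries)  # [(144.5, 146.5), (146.5, 148.5)]
print(widths)      # [2.0, 2.0]
```

Note that the upper boundary of one class coincides with the lower boundary of the next, which is why boundaries, not limits, are used when classes must join without gaps.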
Those who need more specific guidance in the matter of deciding how many class intervals to
employ may use a formula given by Sturges. This formula gives k = 1 + 3.322 log10(n),
where k stands for the number of class intervals and n is the number of values in the data set
under consideration. The answer obtained by applying Sturges's rule should not be regarded
as final, but should be considered as a guide only. The number of class intervals specified by
the rule should be increased or decreased for convenience and clear presentation.
Suppose, for example, that we have a sample of 275 observations that we want to group. The
logarithm to the base 10 of 275 is 2.4393. Applying Sturges's formula gives
k = 1 + 3.322 × 2.4393 ≈ 9. In practice, other considerations might cause us to use eight or fewer or
perhaps 10 or more class intervals.
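
As a quick check of the arithmetic, Sturges's rule can be evaluated directly. The sketch below is illustrative only; the function name `sturges_k` is hypothetical, and rounding to the nearest integer is one reasonable reading of "guide only", not a requirement of the rule.

```python
import math


def sturges_k(n):
    """Suggested number of class intervals: k = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)


k = sturges_k(275)
print(round(k, 2))  # 9.1
print(round(k))     # 9
```

The unrounded value is a little over 9, which is consistent with treating 9 as a starting point and then adjusting up or down for convenience.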
Another question that must be decided regards the width of the class intervals. Class intervals
generally should be of the same width, although this is sometimes impossible to accomplish.
This width may be determined by dividing the range by k, the number of class intervals.
Symbolically, the class interval width is given by w = R/k, where R (the range) is the difference
between the smallest and the largest observation in the data set. As a rule this procedure yields
a width that is inconvenient for use. Again, we may exercise our good judgment and select a
width (usually close to the one given by Equation 2.3.1) that is more convenient.
There are other rules of thumb that are helpful in setting up useful class intervals. When the
nature of the data makes them appropriate, class interval widths of 5 units, 10 units, and widths
that are multiples of 10 tend to make the summarization more comprehensible. When these
widths are employed it is generally good practice to have the lower limit of each interval end
in a zero or 5. Usually class intervals are ordered from smallest to largest; that is, the first class
interval contains the smaller measurements and the last class interval contains the larger
measurements. When this is the case, the lower limit of the first class interval should be equal
to or smaller than the smallest measurement in the data set, and the upper limit of the last class
interval should be equal to or greater than the largest measurement.
We wish to know how many class intervals to have in the frequency distribution of the
data. We also want to know how wide the intervals should be.
Solution: To get an idea as to the number of class intervals to use, we can apply
Sturges's rule to obtain
k = 1 + 3.322(log10 189)
  = 1 + 3.322(2.2764618)
  ≈ 9
Now let us divide the range by 9 to get some idea about the class
interval width. We have
w = R/k = (82 - 30)/9 = 52/9 = 5.778
30-39
40-49
50-59
60-69
70-79
80-89
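
The step from the computed width to the intervals listed above can be sketched as follows. This is an illustration under stated assumptions: the helper `class_intervals` is hypothetical, and the convenient width of 10 and starting limit of 30 are inferred from the listed intervals, since the computed width of 5.778 is awkward to use directly.

```python
def class_intervals(start, largest, width):
    """Build inclusive whole-number class intervals from `start`
    until the largest observation is covered."""
    intervals = []
    lower = start
    while lower <= largest:
        intervals.append((lower, lower + width - 1))
        lower += width
    return intervals


smallest, largest, k = 30, 82, 9
raw_width = (largest - smallest) / k
print(round(raw_width, 3))  # 5.778

# Assumption: round the raw width up to the convenient value 10.
intervals = class_intervals(start=30, largest=82, width=10)
for lower, upper in intervals:
    print(f"{lower}-{upper}")
```

With these choices the lower limit of every interval ends in zero, the first interval starts at the smallest observation, and the last interval extends past the largest one.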
It is sometimes useful to refer to the center, called the midpoint, of a class interval. The
midpoint of a class interval is determined by obtaining the sum of the upper and lower limits
of the class interval and dividing by 2. Thus, for example, the midpoint of the class interval
30-39 is found to be (30 + 39)/2 = 34.5.
When we group data manually, determining the number of values falling into each class
interval is merely a matter of looking at the ordered array and counting the number of
observations falling in the various intervals. When we do this for our example, we have Table
2.3.1.
A table such as Table 2.3.1 is called a frequency distribution. This table shows the way in
which the values of the variable are distributed among the specified class intervals. By
consulting it, we can determine the frequency of occurrence of values within any one of the
class intervals shown.
Class Interval    Frequency
30-39                 11
40-49                 46
50-59                 70
60-69                 45
70-79                 16
80-89                  1
Total                189
In determining the frequency of values falling within two or more class intervals, we obtain the
sum of the number of values falling within the class intervals of interest. Similarly, if we want
to know the relative frequency of occurrence of values falling within two or more class
intervals, we add the respective relative frequencies. We may sum, or cumulate, the
frequencies and relative frequencies to facilitate obtaining information regarding the frequency
or relative frequency of values within two or more contiguous class intervals. Table 2.3.2 shows
the data of Table 2.3.1 along with the cumulative frequencies, the relative frequencies, and
cumulative relative frequencies.
TABLE 2.3.2 Frequency, Cumulative Frequency, Relative Frequency, and
Cumulative Relative Frequency Distributions of the Ages of Subjects Described
in Example 1.4.1

Class                   Cumulative   Relative    Cumulative Relative
Interval   Frequency    Frequency    Frequency   Frequency
30-39          11           11        .0582        .0582
40-49          46           57        .2434        .3016
50-59          70          127        .3704        .6720
60-69          45          172        .2381        .9101
70-79          16          188        .0847        .9948
80-89           1          189        .0053       1.0001
Total         189                    1.0001
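
The columns of Table 2.3.2 can be reproduced from the frequencies alone. The sketch below is illustrative; it assumes, as the table's total of 1.0001 suggests, that each relative frequency is rounded to four decimal places before the cumulative column is formed.

```python
from itertools import accumulate

intervals = ["30-39", "40-49", "50-59", "60-69", "70-79", "80-89"]
frequencies = [11, 46, 70, 45, 16, 1]
n = sum(frequencies)  # 189 observations

cum_freq = list(accumulate(frequencies))
# Relative frequency = class frequency / total, rounded to 4 places.
rel_freq = [round(f / n, 4) for f in frequencies]
# Cumulating the *rounded* values is what produces the total 1.0001.
cum_rel_freq = [round(x, 4) for x in accumulate(rel_freq)]

for row in zip(intervals, frequencies, cum_freq, rel_freq, cum_rel_freq):
    print("{:<8}{:>5}{:>6}{:>9.4f}{:>9.4f}".format(*row))
```

The final cumulative relative frequency exceeds 1 only because of rounding; the unrounded relative frequencies sum to exactly 1.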
Suppose that we are interested in the relative frequency of values between 50 and 79. We use
the cumulative relative frequency column of Table 2.3.2 and subtract .3016 from .9948,
obtaining .6932.
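
The same subtraction can be written as a small lookup. This is a sketch only; the dictionary simply copies the cumulative relative frequency column of Table 2.3.2, and the comparison with the unrounded value is added for illustration.

```python
# Cumulative relative frequencies copied from Table 2.3.2.
cum_rel = {
    "30-39": 0.0582, "40-49": 0.3016, "50-59": 0.6720,
    "60-69": 0.9101, "70-79": 0.9948, "80-89": 1.0001,
}

# Values between 50 and 79: everything up to 70-79, minus everything up to 40-49.
from_table = cum_rel["70-79"] - cum_rel["40-49"]
print(round(from_table, 4))  # 0.6932

# The same quantity computed directly from the frequencies 70, 45, and 16.
exact = (70 + 45 + 16) / 189
print(round(exact, 4))  # 0.6931
```

The small difference in the last digit comes from the rounding already present in the table, not from the method.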
We may use a statistical package to obtain a table similar to that shown in Table 2.3.2. Tables
obtained from both MINITAB and SPSS software are shown in Figure 2.3.1.
The Histogram
We may display a frequency distribution (or a relative frequency distribution) graphically in
the form of a histogram, which is a special type of bar graph.
When we construct a histogram the values of the variable under consideration are represented
by the horizontal axis, while the vertical axis has as its scale the frequency (or relative
frequency if desired) of occurrence. Above each class interval on the horizontal axis a
rectangular bar, or cell, as it is sometimes called, is erected so that the height corresponds to
the respective frequency when the class intervals are of equal width. The cells of a histogram
must be joined and, to accomplish this, we must take into account the true boundaries of the
class intervals to prevent gaps from occurring between the cells of our graph.
The level of precision observed in reported data that are measured on a continuous scale
indicates some order of rounding. The order of rounding reflects either the reporter's personal
preference or the limitations of the measuring instrument employed. When a frequency
distribution is constructed from the data, the class interval limits usually reflect the degree of
precision of the raw data. This has been done in our illustrative example.
We know, however, that some of the values falling in the second class interval, for example,
when measured precisely, would probably be a little less than 40 and some would be a little
greater than 49. Considering the underlying continuity of our variable, and assuming that the
data were rounded to the nearest whole number, we find it convenient to think of 39.5 and 49.5
as the true limits of this second interval. The true limits for each of the class intervals, then, we
take to be as shown in Table 2.3.3.
If we construct a graph using these class limits as the base of our rectangles, no gaps will result,
and we will have the histogram shown in Figure 2.3.2. We used MINITAB to construct this
histogram, as shown in Figure 2.3.3.
TABLE 2.3.3 The Data of Table 2.3.1 Showing True Class Limits

True Class Limits    Frequency
29.5-39.5                11
39.5-49.5                46
49.5-59.5                70
59.5-69.5                45
69.5-79.5                16
79.5-89.5                 1
Total                   189

FIGURE 2.3.2 Histogram of ages of 189 subjects from Table 2.3.1.
[Figure: horizontal axis: Age, marked at the class midpoints 34.5, 44.5, 54.5, 64.5, 74.5, 84.5;
vertical axis: Frequency, scaled from 10 to 70.]
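
The true class limits and a rough text version of the histogram can be produced as follows. This is a minimal sketch, not the MINITAB procedure mentioned above; it assumes ages were rounded to the nearest whole number, so each limit moves outward by 0.5.

```python
limits = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
frequencies = [11, 46, 70, 45, 16, 1]

# True class limits: widen each interval by half a unit on both sides.
true_limits = [(lower - 0.5, upper + 0.5) for lower, upper in limits]

# Gap between each cell and the next; all zeros means the cells join.
gaps = [true_limits[i + 1][0] - true_limits[i][1]
        for i in range(len(true_limits) - 1)]

# Cell midpoints, which label the horizontal axis of the histogram.
midpoints = [(lower + upper) / 2 for lower, upper in true_limits]

for (lower, upper), f in zip(true_limits, frequencies):
    bar = "#" * f  # one character per observation
    print(f"{lower}-{upper}  {bar} {f}")
```

Because every gap is zero, bars drawn on the true limits touch one another, which is the property that distinguishes a histogram from an ordinary bar graph.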
We refer to the space enclosed by the boundaries of the histogram as the area of the histogram.
Each observation is allotted one unit of this area. Since we have 189 observations, the
histogram consists of a total of 189 units. Each cell contains a certain proportion of the total
area, depending on the frequency. The second cell, for example, contains 46/189 of the area.
This, as we have learned, is the relative frequency of occurrence of values between 39.5 and
49.5. From this we see that subareas of the histogram defined by the cells correspond to the
frequencies of occurrence of values between the horizontal scale boundaries of the areas. The
ratio of a particular subarea to the total area of the histogram is equal to the relative frequency
of occurrence of values between the corresponding points on the horizontal axis.
Note that the polygon is brought down to the horizontal axis at the ends at points that would be
the midpoints if there were an additional cell at each end of the corresponding histogram. This
allows for the total area to be enclosed. The total area under the frequency polygon is equal to
the area under the histogram. Figure 2.3.5 shows the frequency polygon of Figure 2.3.4
superimposed on the histogram of Figure 2.3.2. This figure allows you to see, for the same set
of data, the relationship between the two graphic forms.
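
The equality of the two areas can be checked numerically. The sketch below is illustrative: it takes the frequency polygon to be the line joining the points (class midpoint, frequency), closed at each end on the midpoint of an imaginary extra cell, and measures area in axis units (class width times frequency) using the trapezoid rule.

```python
frequencies = [11, 46, 70, 45, 16, 1]
width = 10  # common class width of the true class limits
midpoints = [34.5 + width * i for i in range(len(frequencies))]

# Close the polygon on the axis at the midpoints of one extra cell per end.
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + frequencies + [0]

# Area under the polygon: sum of trapezoids between successive points.
polygon_area = sum(
    (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
    for i in range(len(xs) - 1)
)

# Area of the histogram: each cell is width * frequency.
histogram_area = width * sum(frequencies)

print(polygon_area, histogram_area)  # 1890.0 1890
```

Each frequency contributes half of its height to the trapezoid on its left and half to the one on its right, so the total matches the histogram exactly when the class widths are equal.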
Basic principles of a graph:
(1) A graph should be clear and simple; a complicated graph defeats its own purpose.
(2) A graph should be completely self-explanatory. One should be able to grasp the
contents of a graph without reference to the text.
(3) The origin and the vertical and horizontal scales should be so chosen that a graph
does not convey a false impression about the nature of the data. In particular, if the
zero line does not appear on the graph, its position should be shown with a distinct
break.
(4) Frequency or ratio is usually represented on the vertical scale and the variable or
the method of classification on the horizontal.
Types of Diagram
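[Figure: bar diagram by month. Vertical axis (Y-axis) scaled from 0 to 12000; horizontal axis:
Month; legend: Shade 1, Shade 2, Shade 3, Shade 4. An axis label reads "Percentage of total
income". Data rows that survive: April 6855, 8976; June 3456, 4555; the rows 9008, 8965 and
2233, 6547 have lost their month labels.]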
(ii) Pie Diagram: A pie diagram is also called a pie graph, circular graph, or pie chart. This diagram
is intended to compare the distinct components which together constitute a whole. A circle of
arbitrary radius represents the whole and the segments of the circle represent the component
parts. To construct such a diagram we use the fact that the "whole" corresponds to the total
number of degrees in the circular area, namely, 360°. This 360° is then proportionately divided
among the various components of the whole. This type of diagram should be used sparingly,
especially if there are many segments. In that case it is difficult to compare area segments and
the labelling of segments is quite troublesome.
1. The following table shows the numbers of hours spent by a child on
different events on a working day.
Represent the adjoining information on a pie chart.

Activity    Hours
Sleep
Playing       2
Study         4
T. V.         1
Others        3

[Figure: pie chart with segments labelled School, Sleep, Playing, Study, T. V., and Others;
the angles 60° and 120° are marked.]
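
The proportional division of 360° can be sketched for the rows of this example whose hours are legible. This is an illustration under a stated assumption: the whole is taken to be a 24-hour day, and the hours for Sleep and School, which are not legible in the table, are left out and not guessed. The function name `pie_angles` is hypothetical.

```python
def pie_angles(components, whole):
    """Central angle in degrees for each component: value * 360 / whole."""
    return {name: value * 360 / whole for name, value in components.items()}


HOURS_IN_DAY = 24  # assumption: the activities together fill one day

# Only the rows whose hours are legible in the table.
known_hours = {"Playing": 2, "Study": 4, "T. V.": 1, "Others": 3}

angles = pie_angles(known_hours, HOURS_IN_DAY)
for name, angle in angles.items():
    print(f"{name:<8}{angle:>6.1f} degrees")

# Degrees left over for the activities whose hours are missing.
remaining = 360 - sum(angles.values())
print(remaining)  # 210.0
```

Under this assumption Study's 4 hours corresponds to 60°, which agrees with one of the angles marked on the chart.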
(iii) Line Diagram: This diagram is alternatively called a line graph or a time series graph. If
we are given the values of a variable at different points of time, the set of values is known as a
time series, and a line diagram is used to represent this type of data.
In this diagram time is represented along the x-axis and the variable is plotted along the y-axis.
Thus we get a point for each time period, and successive points, when connected by straight
lines, give the desired diagram. Often a smooth curve is drawn through these points. For example,
if we wish to represent the production of jute goods in Bangladesh during the period 1981-90,
then a line diagram is suitable for the purpose. We can also represent the production of rice and
wheat in the same diagram during the same period by multiple lines (distinct lines for rice,
wheat, and jute).
Multiple line chart showing production of wheat and rice of a region during 2005-2008.
(The dotted line represents production of rice and the continuous line that of wheat.)
[Figure: vertical axis: Production in metric tonnes, scaled from 10 to 40; horizontal axis: Year.]
Scatter Plot
[Figure: two scatter plots. (a) Number of students (vertical axis) against Marks obtained out of
100 (horizontal axis). (b) "Beach Visitors": Visitors (vertical axis, 75 to 600) against Average
Daily Temperature (°F) (horizontal axis, 80 to 96).]
2. A histogram can be used for unequal class intervals but a frequency polygon
for grouped data is admissible only when the intervals are equal.
3. The rectangles in a histogram are all adjacent but the spacing of bars in a
bar diagram is quite arbitrary.