Presentation of Data

Uploaded by

robin.foton882
Copyright
© © All Rights Reserved
Presentation of Data

Overview
This chapter introduces a set of basic procedures and statistical measures for describing data.
Data generally consist of an extensive number of measurements or observations that are too
numerous or complicated to be understood through simple observation. Therefore, this chapter
introduces several techniques including the construction of tables, graphical displays, and basic
statistical computations that provide ways to condense and organize information into a set of
descriptive measures and visual devices that enhance the understanding of complex data.
Data can be presented in the following ways:

(1) Textual presentation: In textual presentation, data is presented along with the text.
For comprehension one has to read through and scan the whole text in order to
grasp its meaning and implications. This is a serious disadvantage, though certain
points can be emphasized in such a presentation. It is often necessary and
advantageous to represent such data in either of the two other forms of
representation.
(2) Tabular presentation: In a tabular presentation, data are arranged in a
systematic way in rows and columns. Huge and unwieldy raw data can be neatly
condensed in a table, by classifying data according to suitable groups or classes.
There should be a title at the head of the table, and the title should indicate the
contents of the table.
(3) Graphic or diagrammatic presentation: Graphic and diagrammatic
representations are useful ways or devices of presenting data for quick and ready
comprehension.

Classification of Data
Classification is the process of arranging data into groups or classes. It is the first step in
tabulation.

The purposes of classification are as follows:

(1) It facilitates meaningful comparisons.
(2) It helps in condensing the data.
(3) It aids in studying the relationship.

The bases of classification are:

(1) Geographical: arranging data according to geographical region.
(2) Chronological: arranging data according to the order of time.
(3) Quantitative: arranging data according to its numerical magnitude.

Frequency Distribution
The most important method of organizing and summarizing statistical data is the construction
of a frequency distribution table. In this method, classification is done according to quantitative
magnitude. The items are classified into groups or classes according to their increasing order
in terms of magnitude and the number of items falling into each group, known as the frequency
of that group or class, is determined and indicated. This type of arrangement stresses the
manner in which the frequencies are distributed over the classes and hence the name frequency
distribution.

Class interval: The interval defining a class is known as a class interval. For example, 145-146,
147-148, etc., are class intervals. When both end numbers of a class are included in the
class, the class interval is inclusive. On the other hand, if either of the end numbers is
excluded from the class, the class interval is exclusive. Usually the larger end number is
excluded, as in 10-20, 20-30, 30-40, etc.
Class limits: The end numbers of an inclusive class interval are known as class limits; the
smaller number 145 is the lower class limit and the larger number 146 is the upper class limit
of the class 145-146, if both 145 and 146 are included in the class.
Class boundaries: The end numbers of an exclusive class interval are known as class
boundaries. For example, if a number between 144.5 and 146.5 (greater than or equal
to 144.5 but less than 146.5) falls in a class and the class is represented as 144.5-146.5,
the end numbers are called class boundaries; the smaller number 144.5 is known as the lower class
boundary and the larger number 146.5 as the upper class boundary.
Interval size / Class width: The difference between the upper and lower class boundaries is
known as the width of the class. The common width is denoted by C. It is to be noted that in
certain cases, it may not be possible to have the same width for all the classes (especially for
the terminal/ end classes).
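
The relationship between class limits, class boundaries, and class width can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes the data are recorded to the nearest whole unit, so that each boundary lies half a unit beyond the corresponding limit (as in the 145-146 example above).

```python
def class_boundaries(lower_limit, upper_limit, unit=1):
    """Return (lower boundary, upper boundary) for an inclusive class.

    Assumption: values are recorded to the nearest `unit`, so each
    boundary lies half a unit outside the corresponding class limit.
    """
    half = unit / 2
    return lower_limit - half, upper_limit + half


def class_width(lower_limit, upper_limit, unit=1):
    """Class width = upper class boundary - lower class boundary."""
    lower_boundary, upper_boundary = class_boundaries(lower_limit, upper_limit, unit)
    return upper_boundary - lower_boundary


print(class_boundaries(145, 146))  # (144.5, 146.5)
print(class_width(145, 146))       # 2.0
```

Note that the width is taken between the boundaries, not the limits: the class 145-146 has width 2, not 1.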

Construction of a Frequency Distribution Table


One of the first considerations when data are to be grouped is how many intervals to include.
Too few intervals are undesirable because of the resulting loss of information. On the other
hand, if too many intervals are used, the objective of summarization will not be met. The best
guide to this, as well as to other decisions to be made in grouping data, is your knowledge of
the data. It may be that class intervals have been determined by precedent, as in the case of
annual tabulations, when the class intervals of previous years are maintained for comparative
purposes. A commonly followed rule of thumb states that there should be no fewer than five
intervals and no more than 15. If there are fewer than five intervals, the data have been
summarized too much and the information they contain has been lost. If there are more than 15
intervals, the data have not been summarized enough.

Those who need more specific guidance in the matter of deciding how many class intervals to
employ may use a formula given by Sturges. This formula gives $k = 1 + 3.322 \log_{10} n$,
where k stands for the number of class intervals and n is the number of values in the data set
under consideration. The answer obtained by applying Sturges's rule should not be regarded
as final, but should be considered as a guide only. The number of class intervals specified by
the rule should be increased or decreased for convenience and clear presentation.
Suppose, for example, that we have a sample of 275 observations that we want to group. The
logarithm to the base 10 of 275 is 2.4393. Applying Sturges's formula gives
$k = 1 + 3.322(2.4393) \approx 9$. In practice, other considerations might cause us to use eight or fewer or
perhaps 10 or more class intervals.
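
Sturges's rule is easy to evaluate directly. The following is a minimal sketch; the function name `sturges_k` is an assumption made for illustration, and the result is only a starting point, as the text stresses.

```python
import math


def sturges_k(n):
    """Suggested number of class intervals: k = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)


k_raw = sturges_k(275)   # unrounded value, a little above 9
k = round(k_raw)         # treat as a guide, then adjust for convenience
print(k_raw, k)
```

The rounded value can then be raised or lowered to suit the data.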

Another question that must be decided regards the width of the class intervals. Class intervals
generally should be of the same width, although this is sometimes impossible to accomplish.
This width may be determined by dividing the range by k, the number of class intervals.
Symbolically, the class interval width is given by $W = R/k$, where R (the range) is the difference
between the smallest and the largest observation in the data set. As a rule this procedure yields
a width that is inconvenient for use. Again, we may exercise our good judgment and select a
width (usually close to the one given by Equation 2.3.1) that is more convenient.
There are other rules of thumb that are helpful in setting up useful class intervals. When the
nature of the data makes them appropriate, class interval widths of 5 units, 10 units, and widths
that are multiples of 10 tend to make the summarization more comprehensible. When these
widths are employed it is generally good practice to have the lower limit of each interval end
in a zero or 5. Usually class intervals are ordered from smallest to largest; that is, the first class
interval contains the smaller measurements and the last class interval contains the larger
measurements. When this is the case, the lower limit of the first class interval should be equal
to or smaller than the smallest measurement in the data set, and the upper limit of the last class
interval should be equal to or greater than the largest measurement.
We wish to know how many class intervals to have in the frequency distribution of the
data. We also want to know how wide the intervals should be.

Solution: To get an idea as to the number of class intervals to use, we can apply
Sturges's rule to obtain

$k = 1 + 3.322(\log_{10} 189) = 1 + 3.322(2.2764618) \approx 9$

Now let us divide the range by 9 to get some idea about the class
interval width. We have

$\frac{R}{k} = \frac{82 - 30}{9} = \frac{52}{9} = 5.778$

It is apparent that a class interval width of 5 or 10 will be more convenient
to use, as well as more meaningful to the reader. Suppose we decide
on 10. We may now construct our intervals. Since the smallest value in Table
2.2.1 is 30 and the largest value is 82, we may begin our intervals with 30
and end with 89. This gives the following intervals:

30-39
40-49
50-59
60-69
70-79
80-89

It is sometimes useful to refer to the center, called the midpoint, of a class interval. The
midpoint of a class interval is determined by obtaining the sum of the upper and lower limits
of the class interval and dividing by 2. Thus, for example, the midpoint of the class interval
30-39 is found to be (30 + 39)/2 = 34.5.
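
The steps of this example (a raw width from the range, a convenient width, the resulting intervals, and their midpoints) can be sketched as follows. The function names are hypothetical, and the sketch assumes whole-number data with inclusive class limits, as in the age example.

```python
def suggested_width(smallest, largest, k):
    """Raw class width: range divided by the number of intervals."""
    return (largest - smallest) / k


def make_intervals(start, largest, width):
    """Inclusive (lower, upper) class limits covering values up to `largest`.

    Assumes whole-number data, so a class of width 10 starting at 30
    runs from 30 to 39.
    """
    intervals = []
    lower = start
    while lower <= largest:
        intervals.append((lower, lower + width - 1))
        lower += width
    return intervals


def midpoint(lower, upper):
    """Midpoint of a class: (lower limit + upper limit) / 2."""
    return (lower + upper) / 2


print(round(suggested_width(30, 82, 9), 3))   # 5.778
intervals = make_intervals(30, 82, 10)
print(intervals)
print([midpoint(lo, hi) for lo, hi in intervals])
```

Here the raw width of 5.778 is replaced by the more convenient width of 10, which yields the six intervals 30-39 through 80-89.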

When we group data manually, determining the number of values falling into each class
interval is merely a matter of looking at the ordered array and counting the number of
observations falling in the various intervals. When we do this for our example, we have Table
2.3.1.

A table such as Table 2.3.1 is called a frequency distribution. This table shows the way in
which the values of the variable are distributed among the specified class intervals. By
consulting it, we can determine the frequency of occurrence of values within any one of the
class intervals shown.

TABLE 2.3.1 Frequency Distribution of Ages of 189 Subjects Shown in Tables 1.4.1 and 2.2.1

Class Interval    Frequency
30-39             11
40-49             46
50-59             70
60-69             45
70-79             16
80-89             1

Total             189

In determining the frequency of values falling within two or more class intervals, we obtain the
sum of the number of values falling within the class intervals of interest. Similarly, if we want
to know the relative frequency of occurrence of values falling within two or more class
intervals, we add the respective relative frequencies. We may sum, or cumulate, the
frequencies and relative frequencies to facilitate obtaining information regarding the frequency
or relative frequency of values within two or more contiguous class intervals. Table 2.3.2 shows
the data of Table 2.3.1 along with the cumulative frequencies, the relative frequencies, and
cumulative relative frequencies.
TABLE 2.3.2 Frequency, Cumulative Frequency, Relative Frequency, and
Cumulative Relative Frequency Distributions of the Ages of Subjects Described
in Example 1.4.1

Class                   Cumulative   Relative    Cumulative Relative
Interval   Frequency    Frequency    Frequency   Frequency
30-39      11           11           .0582       .0582
40-49      46           57           .2434       .3016
50-59      70           127          .3704       .6720
60-69      45           172          .2381       .9101
70-79      16           188          .0847       .9948
80-89      1            189          .0053       1.0001

Total      189                       1.0001

Note: Frequencies do not add to 1.0000 exactly because of rounding.

Suppose that we are interested in the relative frequency of values between 50 and 79. We use
the cumulative relative frequency column of Table 2.3.2 and subtract .3016 from .9948,
obtaining .6932.
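
The columns of Table 2.3.2 can be reproduced from the frequencies of Table 2.3.1 with a short Python sketch. The variable names are illustrative only. One point worth noting: the table cumulates relative frequencies that were already rounded, so computing from the raw counts can differ in the last digit.

```python
from itertools import accumulate

intervals = ["30-39", "40-49", "50-59", "60-69", "70-79", "80-89"]
frequencies = [11, 46, 70, 45, 16, 1]

total = sum(frequencies)
cumulative = list(accumulate(frequencies))              # running totals
relative = [f / total for f in frequencies]             # share of all values
cumulative_relative = [c / total for c in cumulative]

for row in zip(intervals, frequencies, cumulative, relative, cumulative_relative):
    print("{:<6} {:>4} {:>4} {:.4f} {:.4f}".format(*row))

# Relative frequency of values between 50 and 79 (third to fifth classes),
# computed from the raw counts rather than from rounded table entries.
rel_50_79 = sum(frequencies[2:5]) / total
print(round(rel_50_79, 4))  # 0.6931
```

Working from the counts gives 131/189, or .6931 to four places; the value .6932 obtained from the table reflects the rounding of its entries.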

We may use a statistical package to obtain a table similar to that shown in Table 2.3.2. Tables
obtained from both MINITAB and SPSS software are shown in Figure 2.3.1.

The Histogram
We may display a frequency distribution (or a relative frequency distribution) graphically in
the form of a histogram, which is a special type of bar graph.
When we construct a histogram the values of the variable under consideration are represented
by the horizontal axis, while the vertical axis has as its scale the frequency (or relative
frequency if desired) of occurrence. Above each class interval on the horizontal axis a
rectangular bar, or cell, as it is sometimes called, is erected so that the height corresponds to
the respective frequency when the class intervals are of equal width. The cells of a histogram
must be joined and, to accomplish this, we must take into account the true boundaries of the
class intervals to prevent gaps from occurring between the cells of our graph.
The level of precision observed in reported data that are measured on a continuous scale
indicates some order of rounding. The order of rounding reflects either the reporter's personal
preference or the limitations of the measuring instrument employed. When a frequency
distribution is constructed from the data, the class interval limits usually reflect the degree of
precision of the raw data. This has been done in our illustrative example.

We know, however, that some of the values falling in the second class interval, for example,
when measured precisely, would probably be a little less than 40 and some would be a little
greater than 49. Considering the underlying continuity of our variable, and assuming that the
data were rounded to the nearest whole number, we find it convenient to think of 39.5 and 49.5
as the true limits of this second interval. The true limits for each of the class intervals, then, we
take to be as shown in Table 2.3.3.

If we construct a graph using these class limits as the base of our rectangles, no gaps will result,
and we will have the histogram shown in Figure 2.3.2. We used MINITAB to construct this
histogram, as shown in Figure 2.3.3.
TABLE 2.3.3 The Data of Table 2.3.1 Showing True Class Limits

True Class Limits    Frequency
29.5-39.5            11
39.5-49.5            46
49.5-59.5            70
59.5-69.5            45
69.5-79.5            16
79.5-89.5            1

Total                189

[Figure: histogram with Age on the horizontal axis (class midpoints 34.5 to 84.5) and Frequency on the vertical axis (10 to 70).]
FIGURE 2.3.2 Histogram of ages of 189 subjects from Table 2.3.1.

We refer to the space enclosed by the boundaries of the histogram as the area of the histogram.
Each observation is allotted one unit of this area. Since we have 189 observations, the
histogram consists of a total of 189 units. Each cell contains a certain proportion of the total
area, depending on the frequency. The second cell, for example, contains 46/189 of the area.
This, as we have learned, is the relative frequency of occurrence of values between 39.5 and
49.5. From this we see that subareas of the histogram defined by the cells correspond to the
frequencies of occurrence of values between the horizontal scale boundaries of the areas. The
ratio of a particular subarea to the total area of the histogram is equal to the relative frequency
of occurrence of values between the corresponding points on the horizontal axis.
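
The area argument can be checked numerically. The sketch below uses the true class limits of Table 2.3.3 and allots one unit of area per observation, as described above; the function name `area_share` is hypothetical, and for simplicity it only handles ranges that coincide with cell boundaries.

```python
# (lower true limit, upper true limit, frequency) for each cell
cells = [
    (29.5, 39.5, 11),
    (39.5, 49.5, 46),
    (49.5, 59.5, 70),
    (59.5, 69.5, 45),
    (69.5, 79.5, 16),
    (79.5, 89.5, 1),
]


def area_share(cells, low, high):
    """Fraction of the histogram area in cells lying wholly within [low, high].

    Each observation contributes one unit of area, so a cell's area
    equals its frequency.
    """
    total_area = sum(freq for _, _, freq in cells)
    inside = sum(freq for lo, hi, freq in cells if lo >= low and hi <= high)
    return inside / total_area


print(area_share(cells, 39.5, 49.5))  # share of the second cell, 46/189
print(area_share(cells, 49.5, 79.5))  # share of the third to fifth cells
```

The share for the second cell equals its relative frequency, which is the correspondence between subarea and relative frequency stated in the text.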

The Frequency Polygon


A frequency distribution can be portrayed graphically in yet another way by means of a
frequency polygon, which is a special kind of line graph. To draw a frequency polygon we first
place a dot above the midpoint of each class interval represented on the horizontal axis of a
graph like the one shown in Figure 2.3.2. The height of a given dot above the horizontal axis
corresponds to the frequency of the relevant class interval. Connecting the dots by straight lines
produces the frequency polygon. Figure 2.3.4 is the frequency polygon for the age data in Table
2.2.1.

Note that the polygon is brought down to the horizontal axis at the ends at points that would be
the midpoints if there were an additional cell at each end of the corresponding histogram. This
allows for the total area to be enclosed. The total area under the frequency polygon is equal to
the area under the histogram. Figure 2.3.5 shows the frequency polygon of Figure 2.3.4
superimposed on the histogram of Figure 2.3.2. This figure allows you to see, for the same set
of data, the relationship between the two graphic forms.
[Figures: Age on the horizontal axis (class midpoints 24.5 to 94.5) and Frequency on the vertical axis (10 to 70).]
FIGURE 2.3.4 Frequency polygon for the ages of 189 subjects shown in Table 2.2.1.
FIGURE 2.3.5 Histogram and frequency polygon for the ages of 189 subjects shown in Table 2.2.1.
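
The claim that the frequency polygon encloses the same area as the histogram can be verified with a small sketch. The function names are hypothetical; the area under the polygon is computed by summing trapezoids between successive vertices, with the horizontal axis measured in years and the vertical axis in frequency.

```python
midpoints = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5]
frequencies = [11, 46, 70, 45, 16, 1]
width = 10  # common class width


def polygon_vertices(midpoints, frequencies, width):
    """Dots above each midpoint, closed to the axis one class width beyond each end."""
    first = (midpoints[0] - width, 0)
    last = (midpoints[-1] + width, 0)
    return [first] + list(zip(midpoints, frequencies)) + [last]


def area_under(vertices):
    """Area under straight-line segments joining the vertices (trapezoid rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


vertices = polygon_vertices(midpoints, frequencies, width)
histogram_area = width * sum(frequencies)  # each bar: width times frequency

print(vertices[0], vertices[-1])   # (24.5, 0) (94.5, 0)
print(area_under(vertices), histogram_area)
```

Both areas come to 1890 in these axis units, because the triangle cut off each bar by the polygon is matched by an equal triangle added in the neighbouring class.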

Graphical Representation of Data


One very simple but effective form of statistical analysis is to present the tabulated data with
the help of graphs and diagrams. A graph or a diagram may be very useful in clarifying a
complex situation; it can reveal interesting facts that were not obvious from the original data.
Some of the uses of graphs are:

(1) It is helpful in elucidating the main features of a set of data.
(2) It is often valuable in suggesting an appropriate method of analysis and in
explaining the conclusions founded upon the analysis.
(3) It can sometimes pinpoint gross errors in statistical records.

Basic principles of a graph:
(1) A graph should be clear and simple; a complicated graph defeats its own purpose.
(2) A graph should be completely self-explanatory. One should be able to grasp the
contents of a graph without reference to the text.
(3) The origin, the vertical and the horizontal scales should be so chosen that a graph
does not convey a false impression about the nature of the data. In particular, if the
zero line does not appear on the graph, its position should be shown with a distinct
break.
(4) Frequency or ratio is usually represented on the vertical scale and the variable or
the method of classification on the horizontal.

Types of Diagram

We shall briefly describe below the following diagrams:

(i) Bar diagram, (ii) Pie diagram, (iii) Line diagram, (iv) Frequency polygon, (v) Histogram,
(vi) C. F. Polygon and (vii) Scatter diagram.
(i) Bar Diagram: This diagram is used mainly for portraying qualitative data. It is drawn by
erecting a series of blocks of equal widths but with heights proportional to the value
corresponding to different time periods or categories. The vertical blocks are alternatively
known as bars and hence the name bar diagram. Horizontal bars are
also used for depicting qualitative data. The bars may be arranged in a chronological,
numerical, or some other convenient order. Two or more independent series or the component
parts of a total may also be portrayed with the help of this diagram by grouping bars in a suitable
manner or by subdividing a bar into a number of segments. The bar diagram is quite simple
and highly useful. In order to avoid a misleading impression of the relative lengths of bars, it
is important that the scale of lengths on a bar diagram should start from zero.
Example 1: The percentage of total income spent under various heads by a family is
given below.

Different Heads          Food   Clothing   Health   Education   House Rent   Miscellaneous
% Age of Total Number    40%    10%        10%      15%         20%          5%

Represent the above data in the form of a bar graph.

[Figure: bar graph with percentage of total income on the Y-axis and the different heads (Food, Clothing, Health, Education, House Rent, Misc) on the X-axis.]
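
As a rough illustration of the principle that bar lengths are proportional to the values and measured from zero, the data of Example 1 can be drawn as a text bar chart. This is only a sketch: the function name `text_bar_chart` is hypothetical, horizontal bars are used for convenience, and one `#` stands for one percentage point.

```python
spending = {
    "Food": 40,
    "Clothing": 10,
    "Health": 10,
    "Education": 15,
    "House Rent": 20,
    "Miscellaneous": 5,
}


def text_bar_chart(data, symbol="#"):
    """Return one line per category; bar length equals the value, starting at zero."""
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        lines.append(f"{label:<{label_width}} | {symbol * value} {value}%")
    return lines


for line in text_bar_chart(spending):
    print(line)
```

Because every bar starts at zero, the bar for Food is exactly four times as long as the bar for Clothing, matching the ratio of the percentages.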
Example 2: A cosmetic company manufactures 4 different shades of lipstick. The sale for 6 months is shown
in the table. Represent it using bar charts.

            Sales (in units)
Month       Shade 1   Shade 2   Shade 3   Shade 4
January     4500      1600      4400      3245
February    2870      5645      5675      6754
March       3985      8900      9768      7786
April       6855      8976      9008      8965
May         3200      5678      5643      7865
June        3456      4555      2233      6547
[Figure: grouped bar chart of sales (vertical scale up to 12000 units) by month, January to June, with one bar each for Shade 1, Shade 2, Shade 3, and Shade 4.]

(ii) Pie Diagram: A pie diagram is also called a pie graph, circular graph, or pie chart. This diagram
is intended to compare the distinct components, which together constitute a whole. A circle of
arbitrary radius represents the whole and the segments of the circle represent the component
parts. To construct such a diagram we use the fact that the "whole" corresponds to the total
number of degrees in the circular area, namely, 360°. This 360° is then proportionately divided
among the various components of the whole. This type of diagram should be sparingly used,
especially if there are many segments. In that case it is difficult to compare area segments and
the labelling of segments is quite troublesome.
Example 1: The following table shows the number of hours spent by a child on
different events on a working day.
Represent the adjoining information on a pie chart.

Activity    No. of Hours
School      6
Sleep       8
Playing     2
Study       4
T. V.       1
Others      3

The central angles for various observations can be calculated as:

Activity    No. of Hours    Measure of central angle
School      6               (6/24 × 360)° = 90°
Sleep       8               (8/24 × 360)° = 120°
Playing     2               (2/24 × 360)° = 30°
Study       4               (4/24 × 360)° = 60°
T. V.       1               (1/24 × 360)° = 15°
Others      3               (3/24 × 360)° = 45°
Now, we shall represent these angles within the circle as different sectors.
Then we make the pie chart:

[Figure: pie chart with sectors for School, Sleep, Playing, Study, T. V., and Others; the angle labels that survive include 60° and 120°.]
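
The central angles in this example follow from dividing 360° in proportion to each component. A minimal sketch, with a hypothetical function name, is:

```python
hours = {
    "School": 6,
    "Sleep": 8,
    "Playing": 2,
    "Study": 4,
    "T. V.": 1,
    "Others": 3,
}


def central_angles(parts):
    """Central angle in degrees for each part: (value / total) * 360."""
    total = sum(parts.values())
    # Multiply before dividing so that whole-number results stay exact.
    return {name: value * 360 / total for name, value in parts.items()}


angles = central_angles(hours)
for name, angle in angles.items():
    print(f"{name:<8} {angle:>5.0f} degrees")
print(sum(angles.values()))  # 360.0
```

The angles sum to 360°, which is a useful check that no component has been left out of the table.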

(iii) Line Diagram: This diagram is alternatively called a line graph or a time series graph. If
we are given the values of a variable at different points of time, the set of values is known as a
time series and a line diagram is used to represent this type of data.

In this diagram time is represented along the x-axis and the variable is plotted along the y-axis.
Thus we get a point for each time period and successive points, when connected by straight
lines, give the desired diagram. Often a smooth curve is drawn through these points. For example,
if we wish to represent the production of jute goods in Bangladesh during the period 1981-90,
then a line diagram is suitable for the purpose. We can also represent the production of rice and
wheat in the same diagram during the same period by multiple lines (distinct lines for rice,
wheat, and jute).
Multiple line chart showing production of wheat and rice of a region during 2005-2008.
(The dotted line represents production of rice and the continuous line that of wheat.)

[Figure: production in metric tonnes on the vertical axis (10 to 40) against year on the horizontal axis (2005 to 2008), with one line each for rice and wheat.]

(vii) Scatter Diagram: Sometimes the data consist of pairs of values of two related variables, x
and y, and the statistical problem is to investigate the inter-relationship between the variables.
Thus x may represent the amount of fertilizer and y, the yield of rice. When the given pairs of
values are plotted on ordinary graph paper, we get a "dot diagram" or "scatter diagram". It is
called so because it gives a series of dots, each of which has x and y as its co-ordinates. A set
of n pairs of observations thus provides n dots on the diagram and the "scatter" or clustering of
the points exhibits the relationship between the variables. Hence the alternative name scatter
diagram.
This diagram is frequently useful in deciding whether the relationship between two variables
can be represented by, say, a straight line or a parabola.
Example: X-axis - Marks obtained out of 100, Y-axis - Number of Students
The data points that we need to plot according to the given dataset are
(45,12), (55, 10), (65,8), (75,7), (85,5), (95,2)
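Here is how the plot looks:

[Figure: scatter plot with marks obtained out of 100 on the horizontal axis (40 to 100) and number of students on the vertical axis (up to 14).]
[Figure: a second scatter plot, titled "Beach Visitors", with average daily temperature (°F) on the horizontal axis (80 to 96) and visitors on the vertical axis (75 to 600).]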
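
One way to check numerically whether the dots in the marks example lie close to a straight line is to compute the Pearson correlation coefficient. This measure is not discussed in the text above; it is brought in here only as an illustrative assumption, and the function name `pearson_r` is hypothetical.

```python
import math

marks = [45, 55, 65, 75, 85, 95]
students = [12, 10, 8, 7, 5, 2]


def pearson_r(xs, ys):
    """Pearson correlation: values near -1 or +1 suggest a straight-line pattern."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(marks, students)
print(round(r, 2))  # -0.99
```

A value this close to -1 is consistent with the dots falling near a downward-sloping straight line, which is what the scatter diagram shows for these six points.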

Comparison of Histogram with Bar Diagram:

The apparent similarity between a histogram and a bar diagram is often
confusing to a beginner. But they are quite different and serve distinct purposes.

1. A histogram is used for representing a frequency distribution only but a bar
diagram is never used for representing a frequency distribution.

2. In a histogram the area of a rectangle is proportional to the relevant
frequency, whereas in a bar diagram it is the height of the bar that counts.

3. The rectangles in a histogram are all adjacent but the spacing of bars in a
bar diagram is quite arbitrary.

Comparison of histogram with Frequency Polygon:

It is worthwhile to compare the advantages and disadvantages of these two
types of diagram, both of which are used for representing frequency
distribution only.

1. If the variable under consideration is continuous, the histogram is decidedly
superior to the frequency polygon; if the variable is essentially discrete, the
frequency polygon is to be preferred.

2. A histogram can be used for unequal class intervals but a frequency polygon
for grouped data is admissible only when the intervals are equal.

3. If several diagrams, corresponding to several frequency distributions, are to
be superimposed for comparison, histograms present a confused picture and
frequency polygons are preferable.
