DESCRIPTIVE STATISTICS: MEASURES OF CENTRAL TENDENCY AND DISPERSION
INTRODUCTION
The major focus of this unit is descriptive statistics, which are used to describe the basic features of the data in a study.
They provide simple summaries about the sample and the measures. Together with simple graphics
analysis studied in unit 2, they form the basis of virtually every quantitative analysis of data.
Descriptive Statistics are used to present quantitative descriptions in a manageable form. In a research
study we may have lots of measures. Descriptive statistics help us to simplify large amounts of data in a
sensible way. Each descriptive statistic reduces lots of data into a simpler summary. In this unit we are
going to concentrate on the measures of central tendency (mean, mode and median) and measures of
dispersion (range, standard deviation and coefficient of variation) as measures that provide a summary of
any given quantitative data.
MEASURES OF CENTRAL TENDENCY
The central tendency of a distribution is an estimate of the "centre" of a distribution of values. There are
three major types of estimates of central tendency: the mean, the mode and the median.
The mean, or average, denoted by $\bar{x}$, is probably the most commonly used measure of central tendency.
It lends itself to subsequent analysis because it includes all the values in the data set, but it may
not coincide with any observed value and in certain instances may be unrepresentative because of extreme values. We
compute the mean by adding up all the values and then dividing by the number of values.
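Two formulas are used to compute the mean, one for ungrouped data and one for grouped data.
In mathematical terms, the general formula is:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$$
where n is the sample size and the x are the observed values.
For ungrouped data, the formula is: $\bar{x} = \frac{\sum x}{n}$
For grouped data, the formula is:
$$\bar{x} = \frac{\sum fx}{\sum f}$$
where f = number of values in a class interval, x = midpoint of the class interval, and Σf = n = total number of values.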
Example
1. Consider the yields obtained by a farmer from his maize enterprise over the past 10 seasons:
Season Yield of maize in Tonnes
2000-2001 16
2001-2002 13
2002-2003 25
2003-2004 24
2004-2005 18
2005-2006 18
2006-2007 12
2007-2008 15
2008-2009 19
2009-2010 26
Total 186
Mean = Σx/n = (16 + 13 + 25 + 24 + 18 + 18 + 12 + 15 + 19 + 26) / 10 = 186 / 10 = 18.6 tonnes
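The arithmetic above can be checked with a short Python sketch. The variable names are illustrative only, and `statistics.mean` from the standard library is used simply as a cross-check on the manual Σx/n calculation.

```python
from statistics import mean

# Maize yields in tonnes for the ten seasons 2000-2001 to 2009-2010.
yields = [16, 13, 25, 24, 18, 18, 12, 15, 19, 26]

total = sum(yields)        # Σx, the sum of all observed values
n = len(yields)            # n, the number of values
manual_mean = total / n    # Σx / n

# The standard-library helper should agree with the manual calculation.
library_mean = mean(yields)

print(f"Σx = {total}, n = {n}, mean = {manual_mean} tonnes")
```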
The mode is the most frequently occurring value in the set of scores. To determine the mode, order
the yields shown in the table above and count how often each value occurs:
12, 13, 15, 16, 18, 18, 19, 24, 25, 26
The most frequently occurring value is the mode. In our example, the value 18 occurs twice and is the
mode. In some distributions there is more than one modal value. For instance, in a bimodal distribution
there are two values that occur most frequently. If the distribution is truly normal (i.e., bell-shaped), the
mean, median and mode are all equal to each other.
The median is the score found at the exact middle of the set of values. More precisely, the median is the
middle value in order of size if n is odd, or the mean of the two middle values if n is even. The median is
more representative than the mean when the data contain a few very large or very small values, although,
unlike the mean, it cannot be used for subsequent calculation. One way to compute the median is to list all scores in
numerical order and then locate the score in the centre of the sample. For example, if there are 501 scores
in the list, score #251 is the median. If we order the 10 yields shown above, we get:
12, 13, 15, 16, 18, 18, 19, 24, 25, 26
There are 10 scores, and scores #5 and #6 represent the halfway point. Since both of these scores are 18, the
median is 18. If the two middle scores had different values, you would take their mean to determine
the median.
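As a minimal sketch, the median and mode of the ten maize yields can be found in Python as follows. The odd/even rule is written out by hand, and `statistics.median` and `statistics.multimode` (Python 3.8 or later) are used only as a cross-check; all variable names are illustrative.

```python
from statistics import median, multimode

# Maize yields in tonnes from the example table.
yields = [16, 13, 25, 24, 18, 18, 12, 15, 19, 26]

ordered = sorted(yields)
n = len(ordered)

# Median: the middle value if n is odd, the mean of the two middle values if n is even.
if n % 2 == 1:
    manual_median = ordered[n // 2]
else:
    manual_median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# multimode returns a list of every value that shares the highest frequency,
# so a bimodal data set would give a list with two entries.
modes = multimode(yields)

print("ordered:", ordered)
print("median:", manual_median, "| library median:", median(yields))
print("mode(s):", modes)
```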
Example
The dairy herd was weighed and the results were tabulated in the table below:
Live-weight (kg)   Frequency
150-154            8
155-159            16
160-164            43
165-169            29
170-174            4
Calculate the mean weight of the herd.
Working:
1. Find the midpoint of each weight category (x).
2. Multiply each midpoint by the frequency of its category (fx).
3. Sum all the products fx (Σfx).
4. Divide by the total frequency (Σf).
This can be summarised in tabular form as shown below, where
x = (lower limit + upper limit) / 2
Live-weight (kg)   Midpoint (x)   Frequency (f)   fx
150-154            152            8               1 216
155-159            157            16              2 512
160-164            162            43              6 966
165-169            167            29              4 843
170-174            172            4               688
Total                             100             16 225
Mean of grouped data:
$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{16\,225}{100} = 162.25 \text{ kg}$$
The mean weight of the dairy herd was found to be 162.25 kg.
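The four working steps for the grouped mean can be sketched in Python as below. This is only an illustration: the class limits follow the working table (with the fourth class taken as 165-169), and the variable names are assumptions rather than part of the original notes.

```python
# Each tuple is (lower limit, upper limit, frequency) for one live-weight class in kg.
classes = [
    (150, 154, 8),
    (155, 159, 16),
    (160, 164, 43),
    (165, 169, 29),
    (170, 174, 4),
]

# Step 1: midpoint of each class, x = (lower limit + upper limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]

# Step 2: multiply each midpoint by its frequency (fx).
fx = [f * x for f, x in zip(freqs, midpoints)]

# Step 3: sum the products (Σfx); Step 4: divide by the total frequency (Σf).
total_fx = sum(fx)
total_f = sum(freqs)
grouped_mean = total_fx / total_f

for (lower, upper, f), x, product in zip(classes, midpoints, fx):
    print(f"{lower}-{upper}  x={x}  f={f}  fx={product}")
print(f"Σf = {total_f}, Σfx = {total_fx}, mean = {grouped_mean} kg")
```

Printing the fx column row by row makes it easy to compare each product against a hand-worked table.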
MEASURES OF DISPERSION
Dispersion refers to the spread, variation or scatter of the values around the central tendency. Measuring it is vital
for:
i) Assessing the reliability of the averages of the data.
ii) Providing a basis for the control of variability, e.g. in quality control, which assesses variation in products.
There are two common measures of dispersion, the range and the standard deviation.
The range is the simplest measure of dispersion which is calculated by simply taking the difference
between the maximum and minimum values in the data set. However, the range only provides
information about the maximum and minimum values and does not say anything about the values in
between.
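As a small illustration, the range of the ten maize yields from the earlier example can be computed in Python as follows (the variable names are illustrative only).

```python
# Maize yields in tonnes from the earlier example.
yields = [16, 13, 25, 24, 18, 18, 12, 15, 19, 26]

highest = max(yields)
lowest = min(yields)

# Range = maximum value minus minimum value; the eight values in between play no part.
data_range = highest - lowest

print(f"range = {highest} - {lowest} = {data_range} tonnes")
```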
The standard deviation is a more accurate and detailed estimate of dispersion than the range, because a single
outlier can greatly exaggerate the range. The standard deviation shows the relation that a set of scores has to the mean
of the sample.
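Different formulae are used to compute the standard deviation.
For grouped data:
$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}}$$
Or alternatively it can be given as:
$$s = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$$
Where
x is the variable (the class midpoint for grouped data)
$\bar{x}$ is the mean
f is the frequency of responses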
VARIANCE
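It is defined as the sum of squared deviations from the mean divided by n - 1 for a sample (or by n for a population). The general formula for the sample variance is given as
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$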
3.3.5 VARIANCE AND STANDARD DEVIATION:
Step by Step Simple calculation:
a. Calculate the mean, $\bar{x}$.
b. Write a table that subtracts the mean from each observed value.
c. Square each of the differences.
d. Add this column.
e. Divide by n -1 where n is the number of items in the sample. This is the variance.
f. To get the standard deviation we take the square root of the variance
Although this computation may seem convoluted, it's actually quite simple.
The table below is a summary of the steps above using the ungrouped data example (n = 8, mean = 167/8 = 20.875):
x      x - 20.875    (x - 20.875)^2
15     -5.875        34.515625
20     -0.875        0.765625
21     0.125         0.015625
20     -0.875        0.765625
36     15.125        228.765625
15     -5.875        34.515625
25     4.125         17.015625
15     -5.875        34.515625
Total                Σ = 350.875
Now, $s^2 = \frac{350.875}{8 - 1} = 50.125$
$s = \sqrt{50.125} \approx 7.0799$
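Steps a to f can be followed line by line in a short Python sketch. It uses the eight values from the table; `statistics.variance`, which also divides by n - 1, serves only as a cross-check, and the variable names are illustrative.

```python
from math import sqrt
from statistics import variance

# The eight ungrouped observations from the table.
data = [15, 20, 21, 20, 36, 15, 25, 15]
n = len(data)

mean = sum(data) / n                      # a. calculate the mean
deviations = [x - mean for x in data]     # b. subtract the mean from each value
squared = [d ** 2 for d in deviations]    # c. square each difference
sum_sq = sum(squared)                     # d. add the column of squares
sample_variance = sum_sq / (n - 1)        # e. divide by n - 1 to get the variance
sample_sd = sqrt(sample_variance)         # f. square root gives the standard deviation

for x, d, sq in zip(data, deviations, squared):
    print(f"{x:>3} {d:>8.3f} {sq:>12.6f}")
print(f"sum of squares = {sum_sq}, variance = {sample_variance}, sd = {sample_sd:.4f}")
print("library variance:", variance(data))
```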
The standard deviation allows some conclusions about specific scores in our distribution.
Assuming that the distribution of scores is normal or bell-shaped (or close to it!), the following
conclusions can be reached:
approximately 68% of the scores in the sample fall within one standard deviation of the mean
approximately 95% of the scores in the sample fall within two standard deviations of the mean
approximately 99.7% of the scores in the sample fall within three standard deviations of the mean
For instance, since the mean in our example is 20.875 and the standard deviation is 7.0799, we can estimate
that approximately 95% of the scores will fall in the range 20.875 - (2 × 7.0799) to
20.875 + (2 × 7.0799), that is, between 6.7152 and 35.0348. This kind of information is a critical
stepping stone to comparing the performance of an individual on one variable with
their performance on another, even when the variables are measured on entirely different scales.
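The interval calculation can be sketched in Python as follows. The function name `interval` is hypothetical, and the bounds are meaningful only under the stated assumption that the scores are approximately normally distributed.

```python
# Mean and standard deviation from the worked example.
mean = 20.875
sd = 7.0799

def interval(k):
    """Return the (lower, upper) bounds lying k standard deviations either side of the mean."""
    return (mean - k * sd, mean + k * sd)

# k = 1, 2 and 3 correspond to the three bands listed above.
for k in (1, 2, 3):
    lower, upper = interval(k)
    print(f"within {k} sd: {lower:.4f} to {upper:.4f}")
```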
Standard deviation of grouped data:
$$s = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$$
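As an illustration, the grouped-data formula can be applied to the dairy herd table from the earlier example (midpoints 152 to 172, frequencies 8, 16, 43, 29, 4). The sketch below divides by Σf, as the formula does, and also evaluates the equivalent deviation form Σf(x - mean)²/Σf as a check; the variable names are assumptions made for this example.

```python
from math import sqrt

# Class midpoints (kg) and frequencies from the dairy herd example.
midpoints = [152, 157, 162, 167, 172]
freqs = [8, 16, 43, 29, 4]

sum_f = sum(freqs)                                          # Σf
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))       # Σfx
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))  # Σfx²

mean = sum_fx / sum_f

# Short-cut form: Σfx²/Σf - (Σfx/Σf)²
variance_short = sum_fx2 / sum_f - mean ** 2

# Deviation form: Σf(x - mean)²/Σf, which should give the same value.
variance_dev = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / sum_f

sd = sqrt(variance_short)
print(f"Σf = {sum_f}, Σfx = {sum_fx}, Σfx² = {sum_fx2}")
print(f"variance = {variance_short}, standard deviation = {sd:.3f} kg")
```

The short-cut form needs only the three column totals, which is why it is convenient for hand calculation from a frequency table.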
The sample standard deviation will be denoted by s and the population standard deviation will be denoted
by the Greek letter σ.
The sample variance will be denoted by s² and the population variance will be denoted by σ².
The variance and standard deviation describe how spread out the data is. If the data all lies close to the
mean, then the standard deviation will be small, while if the data is spread out over a large range of
values, s will be large. Having outliers will increase the standard deviation.
One limitation of the standard deviation is that it depends on the units in which the data are measured. One
way of handling this difficulty is the coefficient of variation (CV), which is the standard deviation
divided by the mean, times 100%:
$$CV = \frac{s}{\bar{x}} \times 100\%$$
For example, for a data set with a standard deviation of 17 and a mean of 49.2, it is
$$CV = \frac{17}{49.2} \times 100\% \approx 34.6\%$$
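A minimal Python sketch of the coefficient of variation is given below, applied to the eight-value sample (15, 20, 21, 20, 36, 15, 25, 15) from the variance example. The function name is hypothetical, and the rescaling by 1000 is an assumed unit conversion used only to show that the CV does not depend on the units.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, expressed as a percentage."""
    return stdev(values) / mean(values) * 100

# Eight-value sample from the variance example.
data = [15, 20, 21, 20, 36, 15, 25, 15]
cv = coefficient_of_variation(data)

# Rescaling every value changes the standard deviation and the mean
# by the same factor, so the CV stays the same.
rescaled = [x * 1000 for x in data]
cv_rescaled = coefficient_of_variation(rescaled)

print(f"CV = {cv:.1f}%")
print(f"CV after rescaling = {cv_rescaled:.1f}%")
```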
CONCLUSION
Measures of central tendency (or location) are estimates of the centre of a distribution of values. These are the
mean, the mode and the median.
The mean is the most commonly used measure of location, and it lends itself to subsequent analysis since it includes all
values.
Dispersion measures the variation or scatter of values around the central tendency. The measures of
dispersion include the range, the variance, the standard deviation and the coefficient of variation.
The standard deviation is a more accurate and detailed estimate of dispersion than the range, and it shows the relation that a set
of values has to the mean.
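Formulae
Measures of central tendency
Mean (ungrouped data): $\bar{x} = \frac{\sum x}{n}$
Mean (grouped data): $\bar{x} = \frac{\sum fx}{\sum f}$
Mode: the most frequently occurring value in the set of scores
Median (ungrouped data): the middle value in order of size if n is odd, or the mean of the two middle values if n is even
Measures of dispersion
Range: highest value minus lowest value
Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Standard deviation: $s = \sqrt{s^2}$; for grouped data, $s = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$
Coefficient of variation: $CV = \frac{s}{\bar{x}} \times 100\%$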
3.8 ACTIVITY
1. State the three measures of central tendency.
2. State the most important measure of location and give a reason(s)
3. State the mean formula of the grouped data.
4. Define the term dispersion and give the formula for the standard deviation.
5. Evaluate the relationships that do exist between standard deviation, variance, mean and coefficient
of variation.
6. Find the mean, mode, median, standard deviation and relative dispersion of the following data,
which give the maize height distribution in a field:
Height      Frequency
153-157     4
158-162     11
163-167     20
168-172     24
173-177     17
178-182     4