Measures of Central Tendency and Variability
Shair Muhammad Hazara
PhD Public Health (Fellow), HSA, NIH, Islamabad
MSPH (Health Services Academy, NIH, Islamabad)
MSBE (Dow University of Health Sciences Karachi)
BSN (PRN) The Aga Khan University, Karachi
E-mail address: hazara_27@Hotmail.com
Measures of Central Tendency
Given a data set, a measure of central tendency is a value
about which the observations tend to cluster. In other words, it is a
value around which the data set is centered.
The three most common measures of central tendency are the
mean, the median, and the mode.
Mean
The arithmetic average of a set of numbers
Applicable for interval and ratio data
Not applicable for nominal or ordinal data
Affected by each value in the data set, including extreme
values
Computed by summing all values in the data set and
dividing the sum by the number of values in the data set
Sample Mean
Age of the patients coming to the clinic
57, 86, 42, 38, 90, 66
$$\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n}$$
$$\bar{X} = \frac{57 + 86 + 42 + 38 + 90 + 66}{6} = \frac{379}{6} = 63.167$$
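The same calculation can be sketched in Python. This is only an illustration of the sum-and-divide rule; the variable names are chosen here and are not part of the original slides.

```python
from statistics import mean

# Ages of the patients from the example above.
ages = [57, 86, 42, 38, 90, 66]

# Sum all values and divide by the number of values.
total = sum(ages)
sample_mean = total / len(ages)

print(total)                  # 379
print(round(sample_mean, 3))  # 63.167

# The standard-library helper gives the same result.
assert abs(mean(ages) - sample_mean) < 1e-9
```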
The Median
Middle value in an ordered array of numbers.
Applicable for ordinal, interval, and ratio data
Not applicable for nominal data
Unaffected by extremely large and extremely small values.
Median: Computational Procedure
Arrange the observations in an ordered array.
If there is an odd number of terms, the median is the middle term of
the ordered array.
If there is an even number of terms, the median is the average of the
middle two terms
Median:
Example with an Odd Number of Terms
Ordered Array
Age of the patients coming to the clinic
3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22
There are 17 terms in the ordered array.
Position of median = (n+1)/2 = (17+1)/2 = 9
The median is the 9th term, 15.
If the 22 is replaced by 100, the median is still 15.
If the 3 is replaced by -103, the median is still 15.
Median:
Example with an Even Number of Terms
Ordered Array
Age of the patients coming to the clinic
3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21
There are 16 terms in the ordered array.
Position of median = (n+1)/2 = (16+1)/2 = 8.5
The median is the average of the 8th and 9th terms: (14 + 15)/2 = 14.5.
If the 21 is replaced by 100, the median is still 14.5.
If the 3 is replaced by -88, the median is still 14.5.
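The computational procedure for the median can be sketched in Python as follows. The helper name median_by_position is hypothetical and used only for illustration; the standard-library statistics.median gives the same answers.

```python
from statistics import median

# Ordered ages from the two examples above (17 terms and 16 terms).
odd_ages = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even_ages = odd_ages[:-1]  # same array without the final 22


def median_by_position(values):
    """Median via the computational procedure: sort, then pick the middle."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle term, at 1-based position (n + 1) / 2.
        return ordered[mid]
    # Even n: average of the two middle terms.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_position(odd_ages))   # 15
print(median_by_position(even_ages))  # 14.5

# Replacing an extreme value leaves the median unchanged.
with_outlier = odd_ages[:-1] + [100]  # 22 replaced by 100
print(median_by_position(with_outlier))  # 15

# The standard-library function agrees with the manual procedure.
assert median(odd_ages) == median_by_position(odd_ages)
assert median(even_ages) == median_by_position(even_ages)
```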
The Mode
The mode is the observation that occurs most frequently.
For example, for a sample of five salaries
$6,000; $10,000; $14,000; $50,000; $10,000
the mode is equal to $10,000.
It should be noted that there can be more than one mode for
a data set.
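A short Python sketch of finding the mode, including the case of more than one mode. The salary list is the one from the example; the second list is made up here purely to show a data set with two modes.

```python
from statistics import multimode

# Salaries from the example above.
salaries = [6000, 10000, 14000, 50000, 10000]

# multimode returns every value that ties for the highest frequency.
salary_modes = multimode(salaries)
print(salary_modes)  # [10000]

# Hypothetical data (not from the slides) in which two values tie.
two_mode_data = [1, 2, 2, 3, 3]
print(multimode(two_mode_data))  # [2, 3]
```

Returning a list rather than a single value makes the "more than one mode" case explicit.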
Measures of Variation
Knowing the central tendency of a data set is helpful, but it
is not enough. For example, the following two data sets have
the same mean (10):
5, 6, 8, 10, 12, 14, 15
1, 4, 8, 10, 12, 16, 19
The difference, however, is that the second data set has
more spread. The same point is illustrated by distributions
that have the same mean but different spread.
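As a quick check, the two data sets can be compared in Python. This sketch uses the range and the sample standard deviation, both of which are defined later in these notes, as the measures of spread.

```python
from statistics import mean, stdev

# The two data sets from the example above.
set_a = [5, 6, 8, 10, 12, 14, 15]
set_b = [1, 4, 8, 10, 12, 16, 19]

for name, data in (("first", set_a), ("second", set_b)):
    data_range = max(data) - min(data)
    # Print the mean, the range, and the sample standard deviation.
    print(name, mean(data), data_range, round(stdev(data), 2))

# Same center, different spread.
assert mean(set_a) == mean(set_b)
assert stdev(set_b) > stdev(set_a)
```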
Measures of Variability:
Measures of variability describe the spread or the dispersion
of a set of data.
Common Measures of Variability
Range
Interquartile Range
Variance and Standard Deviation
Coefficient of Variation
Range
The difference between the largest
and the smallest values in a set of data
Simple to compute
Ignores all data points except
the two extremes.
Example:
35 41 44 45
37 41 44 46
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48
Range = Largest – Smallest
= 48 - 35 = 13
The range is quick to compute but fails
to be very useful since it considers only
the extreme values and does not take
into consideration the bulk of the
observations. It is not widely used.
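A minimal Python sketch of the range calculation for the 24 values in the example. The variable names are illustrative only.

```python
# The 24 observations from the example above, row by row.
data = [
    35, 41, 44, 45,
    37, 41, 44, 46,
    37, 43, 44, 46,
    39, 43, 44, 46,
    40, 43, 44, 46,
    40, 43, 45, 48,
]

# Only the two extremes enter the calculation.
largest = max(data)
smallest = min(data)
data_range = largest - smallest

print(largest, smallest, data_range)  # 48 35 13
```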
Sample Variance
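Average of the squared deviations of the values from the arithmetic
mean (mean = 1,773), with n - 1 in the denominator

    X      X - X̄    (X - X̄)²
  2,398     625     390,625
  1,844      71       5,041
  1,539    -234      54,756
  1,311    -462     213,444
  7,092       0     663,866

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$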
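The variance table can be reproduced step by step in Python. This is a sketch of the n - 1 formula, with variable names chosen here for illustration; the standard-library statistics.variance uses the same denominator.

```python
from statistics import variance

# The four observations from the table above.
values = [2398, 1844, 1539, 1311]
n = len(values)

mean_x = sum(values) / n                    # 1773.0
deviations = [x - mean_x for x in values]   # these sum to zero
squared = [d ** 2 for d in deviations]
sum_of_squares = sum(squared)               # 663866.0

# Sample variance: divide by n - 1, not n.
sample_variance = sum_of_squares / (n - 1)
print(round(sample_variance, 2))  # 221288.67

# Cross-check against the standard library.
assert abs(variance(values) - sample_variance) < 1e-6
```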
Sample Standard Deviation
Square root of the sample variance

    X      X - X̄    (X - X̄)²
  2,398     625     390,625
  1,844      71       5,041
  1,539    -234      54,756
  1,311    -462     213,444
  7,092       0     663,866

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$
$$S = \sqrt{S^2} = \sqrt{221{,}288.67} = 470.41$$
EXAMPLE. Find the standard deviation of the average temperatures
recorded over a five-day period last winter: 18, 22, 19, 25, 12
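One way to work this example is sketched below in Python. It assumes the five days are treated as a sample, so the n - 1 denominator is used; the slides do not say whether a sample or a population is intended, so the population version is shown as well for comparison.

```python
from math import sqrt
from statistics import pstdev, stdev

# Average temperatures over the five days.
temps = [18, 22, 19, 25, 12]
n = len(temps)

mean_t = sum(temps) / n  # 96 / 5 = 19.2
sum_of_squares = sum((t - mean_t) ** 2 for t in temps)

# Assumption: the five days are a sample, so divide by n - 1.
sample_sd = sqrt(sum_of_squares / (n - 1))
print(round(sample_sd, 2))  # 4.87

# If the five days were the whole population, divide by n instead.
population_sd = sqrt(sum_of_squares / n)
print(round(population_sd, 2))  # 4.35

# Cross-check against the standard library.
assert abs(stdev(temps) - sample_sd) < 1e-9
assert abs(pstdev(temps) - population_sd) < 1e-9
```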
Coefficient of variation
It is a dimensionless measure of the relative variation.
– Constructed by dividing the standard deviation by the
mean and multiplying by 100.
CV = (s / x̄) × 100
Coefficient of variation
• Used to compare the variability in one data set
with that in another when a direct comparison of
standard deviation is not appropriate.
Coefficient of variation
             Adults    Children
Mean age     25 yrs    11 yrs
Mean wt      145 lbs   80 lbs
SD           10 lbs    10 lbs
CV           6.9%      12.5%
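The CV values in the table can be checked with a small Python sketch. The function name coefficient_of_variation is hypothetical, introduced here for illustration only.

```python
def coefficient_of_variation(sd, mean):
    """Return the CV as a percentage: (sd / mean) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100


# Weights from the table: same SD (10 lbs), different means.
cv_adults = coefficient_of_variation(10, 145)
cv_children = coefficient_of_variation(10, 80)

print(round(cv_adults, 1))    # 6.9
print(round(cv_children, 1))  # 12.5
```

Although both groups have the same standard deviation, the children's weights vary more relative to their mean.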
• Example: Two plants, C and D, of a factory show the
following results for the number of workers and the
wages paid to them.
                        Plant C   Plant D
No. of workers          5000      6000
Average monthly wages   $2500     $2500
Standard deviation      9         10
Using the coefficient of variation formula, find which
plant, C or D, has greater variability in individual
wages.
To Find: Which plant has greater variability.
For this, we need to find the coefficient of variation. The plant that
has a higher coefficient of variation will have greater variability.
Coefficient of variation for plant C, using the coefficient of
variation formula:
CV = (σ/μ) × 100, μ ≠ 0
CV = (9/2500) × 100
CV = 0.36%
Now, CV for plant D:
CV = (σ/μ) × 100
CV = (10/2500) × 100
CV = 0.4%
Plant C has CV = 0.36% and plant D has CV = 0.4%.
Hence plant D has greater variability in individual wages.
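The comparison of the two plants can also be sketched in Python. The dictionary layout and names are illustrative assumptions, not part of the original example.

```python
# Mean monthly wage and standard deviation for each plant.
plants = {
    "C": {"mean": 2500, "sd": 9},
    "D": {"mean": 2500, "sd": 10},
}

# CV = (sd / mean) * 100 for each plant, as a percentage.
cv = {name: p["sd"] / p["mean"] * 100 for name, p in plants.items()}

# The plant with the larger CV has the greater relative variability.
most_variable = max(cv, key=cv.get)

for name, value in cv.items():
    print(name, round(value, 2))
print(most_variable)  # D
```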