MEASURES
OF
CENTRAL
TENDENCY
DESCRIPTIVE
MEASURES
Measures that summarise the data into a
single number are called descriptive
measures.
Descriptive measures can be calculated from a
sample or a population.
A measure calculated from a sample is called a
statistic and that from a population is called a
parameter.
MEASURES OF
CENTRAL TENDENCY
A measure of central tendency is a
descriptive statistic that indicates:
• the average or typical observed value of
a variable in a data set
• where most of the data lie, i.e. where
most of the data are clustered.
MEASURES OF
CENTRAL TENDENCY
The three most commonly used measures
of central tendency include the:
• Mode
• Median and
• Mean
MEASURES OF
CENTRAL TENDENCY
The measures of central tendency can
be calculated from three different
“starting points”:
• a list of observed values
• a frequency table (or bar graph) or
• a histogram
The Mode
MODE
The mode (or modal value) of a variable in a
data set is the value of the variable that is
observed most frequently in that data
• or, given a continuous frequency curve, is at the point
of greatest density.
Note: the mode is the value that is observed
most frequently, not the frequency itself.
The mode is defined for every type of
measurement scale (i.e. nominal, ordinal,
interval, or ratio).
• However, the mode is used as a measure of central
tendency primarily for nominal-scale variables.
MODE
The mode may be ill-defined if we have either:
• a small number of cases; or
• a precisely measured continuous
variable and a finite number of cases;
because in either event it is likely that no value
will be observed more than once in the data
A set of values can have more than one mode
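The two points above (more than one mode, and no repeated value at all) can be illustrated with a short sketch. The numbers are hypothetical and chosen only for illustration; `multimode` is the standard-library function in Python 3.8 and later.

```python
from statistics import multimode

# Hypothetical values chosen so that two values tie for most frequent.
values = [1, 2, 2, 3, 3, 4]
modes = multimode(values)      # every value that reaches the highest frequency
print(modes)

# A precisely measured continuous variable: no value repeats, so every
# value ties at a frequency of 1 and the mode is ill-defined.
measurements = [1.2, 3.4, 5.6]
no_clear_mode = multimode(measurements)
print(no_clear_mode)
```

When the list returned is as long as the data itself, the mode carries no useful information about the centre of the data.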
MODE
The mode can be unstable in some
instances:
• small changes in the data can result in large
and erratic changes in the modal value;
• in particular, changes in the coding of
the variable or in the class intervals can
change the modal value.
CALCULATION OF THE
MODE
Example 1.
Given the following data on the number of problem sets
turned in by the students in a class:
4 5 5 3 4 5 5
3 1 2 3 5 5
We can construct a frequency table….
NUMBER OF PROBLEM SETS
TURNED IN
Value   Abs. freq.   Rel. freq.   Cum. freq.   De-cum. freq.
0        0             0%           0%          100%
1        1             8%           8%          100%
2        1             8%          15%           92%
3        3            23%          38%           85%
4        2            15%          54%           62%
5        6            46%         100%           46%
Total   13           100%
(Percentages are rounded to the nearest whole number from the counts.)
• The value with the greatest absolute or relative
frequency is the modal value.
• In this case 5 is the modal value.
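As a minimal sketch, the frequency table above can be rebuilt from the raw list with the standard library. The variable names are illustrative only.

```python
from collections import Counter

problem_sets = [4, 5, 5, 3, 4, 5, 5, 3, 1, 2, 3, 5, 5]

counts = Counter(problem_sets)
n = len(problem_sets)

# One row per possible value: (value, absolute, relative, cumulative).
rows = []
running = 0
for value in range(0, 6):
    freq = counts.get(value, 0)
    running += freq
    rows.append((value, freq, freq / n, running / n))

for value, freq, rel, cum in rows:
    print(f"{value:>5} {freq:>10} {rel:>10.0%} {cum:>10.0%}")

# The mode is the VALUE with the highest frequency, not the frequency itself.
mode_value, mode_freq = counts.most_common(1)[0]
print("modal value:", mode_value, "| observed", mode_freq, "times")
```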
MODE
Notice that the modal number of problem sets
turned in is 5,
• although most students (7 of 13) turned in fewer than 5.
So if we recoded the variable into a dichotomous
variable with just two categories:
(a) turned in all 5
(b) did not turn in all 5
• the latter category, i.e. (b), becomes the modal
category.
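The effect of recoding on the mode can be checked directly. This is a small sketch using the same 13 observations; the category labels are taken from the slide.

```python
from collections import Counter

problem_sets = [4, 5, 5, 3, 4, 5, 5, 3, 1, 2, 3, 5, 5]

# Recode each observation into one of the two categories.
recoded = ["turned in all 5" if x == 5 else "did not turn in all 5"
           for x in problem_sets]

recoded_counts = Counter(recoded)
modal_category, modal_freq = recoded_counts.most_common(1)[0]
print(recoded_counts)
print("modal category:", modal_category)
```

The data have not changed, only the coding, yet the modal category moves from "5" to "did not turn in all 5".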
FREQUENCY
DISTRIBUTION TABLE
MODE FROM THE FREQUENCY TABLE
MODE ON A FREQUENCY CURVE
Given a continuous frequency curve:
• the mode is the value of the variable under the
highest point of the frequency curve
• (the point with the greatest density of observed
values).
STRENGTHS OF THE
MODE
It is easy to understand and calculate
It is not affected by extreme large or small
values (outliers)
It is useful for qualitative data
Can be located graphically
WEAKNESSES OF THE MODE
Its computation is not based on all values as is the
case for the mean
It will not be well defined if the data consist of a small
number of values (it is possible that there can be more
than one modal value)
It is not capable of further mathematical treatment
Sometimes the data may not have a mode at all
The Median
MEDIAN
This is a value that divides the data set of finite
values into two equal parts such that
• the number of values equal to or greater than
the median is equal to the number less than or
equal to the median
CALCULATION OF THE MEDIAN
Given a list of observed values (raw data):
rank order the cases in terms of their observed
values (e.g., from lowest to highest)
identify the value of the case right at the
middle of this rank-ordered list, and
• the value of this case is the median value; or
construct a frequency table and find where the
cumulative frequency crosses the 50% mark
MEDIAN
Unordered data (raw data)
4 5 5 3 4 5 5 3 1 2 3 5 5
Rank ordered data
1 2 3 3 3 4 4 5 5 5 5 5 5
Median value (i.e. the middle, 7th, of the 13 ranked values) = 4
NUMBER OF PROBLEM SETS
TURNED IN.
Value   Abs. freq.   Rel. freq.   Cum. freq.   De-cum. freq.
0        0             0%           0%          100%
1        1             8%           8%          100%
2        1             8%          15%           92%
3        3            23%          38%           85%
4        2            15%          54%           62%
5        6            46%         100%           46%
Total   13           100%
(Percentages are rounded to the nearest whole number from the counts.)
• The median is the value at which the cumulative frequency
crosses the 50% threshold.
• In this case 4 is the median value.
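Both routes to the median described above (the middle of the rank-ordered list, and the point where the cumulative frequency crosses 50%) can be sketched as follows. This is an illustration for this data set, not a general-purpose routine.

```python
from collections import Counter
from statistics import median

problem_sets = [4, 5, 5, 3, 4, 5, 5, 3, 1, 2, 3, 5, 5]

# Method 1: rank-order the values and take the middle one.
ranked = sorted(problem_sets)
middle_value = ranked[len(ranked) // 2]   # correct as written only for an odd count

# Method 2: walk up the frequency table until the cumulative share passes 50%.
counts = Counter(problem_sets)
n = len(problem_sets)
running = 0
crossing_value = None
for value in sorted(counts):
    running += counts[value]
    if running / n > 0.5:
        crossing_value = value
        break

library_median = median(problem_sets)
print(middle_value, crossing_value, library_median)
```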
MEDIAN
When the number of values is even, there is no single
middle value.
• There will be two middle values
In this case the median is the average of the two
middle values.
1 2 3 3 3 4 4 5 5 5 5 5
With 12 values, the two middle values are the 6th and 7th (4 and 4)
• In this case median = (4 + 4)/2 = 4
If we add another 5 to the original 13 values, we have 14 values:
1 2 3 3 3 4 4 5 5 5 5 5 5 5
The median becomes the average of the 7th and 8th values: (4 + 5)/2 = 4.5
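A minimal sketch of the rule for odd and even counts, applied to the two lists above. The function name `median_by_hand` is hypothetical.

```python
from statistics import median

twelve_values = [1, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5]
fourteen_values = [1, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5]

def median_by_hand(values):
    """Middle value, or the average of the two middle values if the count is even."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

median_12 = median_by_hand(twelve_values)
median_14 = median_by_hand(fourteen_values)
print(median_12, median_14)
```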
FREQUENCY
DISTRIBUTION TABLE
(MEDIAN)
MEDIAN FROM THE
FREQUENCY TABLE
MEDIAN ON A FREQUENCY CURVE
On a frequency distribution curve, the median
cuts the area under the curve into two equal
parts.
STRENGTHS OF THE
MEDIAN
It is unique
• There is only one median for a given set of
data.
It is easy to calculate
It is not drastically affected by extreme
values (outliers) as is the case for the
mean.
WEAKNESSES OF THE
MEDIAN
Computation of the median only relies on
the central values and ignores all the other
data.
It is also less amenable to statistical tests
(i.e. compared to the mean)
The mean
MEAN
The mean (or mean value) of a variable in a set
of data is the result of adding up all the
observed values of the variable and dividing by
the number of cases
• (i.e. the “average” ).
CALCULATION OF THE
MEAN
Suppose we have a variable X and a set of cases
numbered 1, 2, ..., n. Let the observed value of the
variable in each case be designated x1, x2, ..., xn.
Thus:
Mean = (Sum of values) / (Number of observations)
Notation: Let x1, x2, ..., xn be n observations of a variable
x. Then the mean of this variable is
\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]
CALCULATION OF THE
MEAN
Given the data:
1 2 3 3 3 4 4 5 5 5 5 5 5
Mean = (Sum of values) / (Number of observations)
Mean = (1+2+3+3+3+4+4+5+5+5+5+5+5) / 13 = 50 / 13
Mean = 3.85
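The same calculation as a short sketch, with the library function shown only as a cross-check:

```python
from statistics import mean

problem_sets = [1, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5]

total = sum(problem_sets)        # sum of values
n = len(problem_sets)            # number of observations
mean_by_hand = total / n

print(total, n, round(mean_by_hand, 2))
print(round(mean(problem_sets), 2))
```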
FREQUENCY DISTRIBUTION
TABLE
MEAN FROM THE
FREQUENCY TABLE
MEAN ON A FREQUENCY CURVE
The mean is the “center of gravity” of the
distribution.
• Determine (by “eyeball” approximation) the
value of the variable such that the density
“balances” at that point; this value is the mean.
STRENGTHS OF THE
MEAN
The mean is unique for a given set of data
• There is only one mean.
It is easily understood and easy to
calculate.
It takes into consideration all the values in
the set of data.
WEAKNESS OF THE MEAN
The mean is affected by extreme values
• Because each value in the set of data is included in the
computation.
e.g. family income.
20, 30, 40, and 990
Mean = (20+30+40+990)/4 = 270.
Median = (30+40)/2 = 35.
Here 3 observations out of 4 lie between 20 and 40.
So the mean, 270, fails to give a realistic
picture of the major part of the data.
It is influenced by the extreme value 990.
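A small sketch of the family income example shows how one extreme value moves the mean but barely moves the median. The list without the outlier is added here only for comparison.

```python
from statistics import mean, median

incomes = [20, 30, 40, 990]
mean_all = mean(incomes)
median_all = median(incomes)

# Same data with the extreme value removed, for comparison only.
without_outlier = [20, 30, 40]
mean_trimmed = mean(without_outlier)
median_trimmed = median(without_outlier)

print("with 990:    mean =", mean_all, " median =", median_all)
print("without 990: mean =", mean_trimmed, " median =", median_trimmed)
```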
CHOOSING A MEASURE OF
CENTRAL TENDENCY
Depends on the nature of the distribution.
For continuous variables in a unimodal and
symmetric distribution the mean, median and
mode are identical.
With a skewed distribution the median may be
more useful.
For statistical analyses the mean is the preferred
measure.
MEASURES
OF
DISPERSION
MEASURES OF
DISPERSION
Synonyms: measures of variation, spread and scatter
Dispersion refers to the variability exhibited by a set of
observations
A measure of dispersion describes the amount of variability
present in a set of data
There are different measures of variation that are used in
statistics. Those included in this module are:
• The range, variance, standard deviation and coefficient of
variation.
RANGE
The difference between the largest and
smallest observation.
Example: The numbers below are test scores for a
class.
44 56 58 62 64 64 70 72
Range = 72 – 44 = 28
The range, 28, reflects only the two extreme values (44 and 72),
so it communicates very little information.
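A brief sketch of the range calculation. The second list is hypothetical and is included only to show that two very different data sets can share the same range.

```python
scores = [44, 56, 58, 62, 64, 64, 70, 72]

lowest, highest = min(scores), max(scores)
score_range = highest - lowest
print(score_range, (lowest, highest))

# Hypothetical data with the same extremes but a very different shape.
other_scores = [44, 44, 44, 44, 72, 72, 72, 72]
other_range = max(other_scores) - min(other_scores)
print(other_range)
```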
ADVANTAGES AND
DISADVANTAGES OF THE
RANGE
Advantage
it is easy to compute
Disadvantage
It communicates very little information about
the data set
• It only takes into account the largest and
smallest value
• This makes it a poor measure of dispersion
VARIANCE
The variance is a measure of variability which
takes into account the differences between
each observation and the sample mean
It measures the scatter of the values in a set of
data about the mean
When the values lie close to the mean the dispersion is
small, and when they lie far from the mean it is large
• Hence the logic of measuring the variation of
values about the mean
CALCULATION OF THE VARIANCE
Sample variance = the sum of the squared
deviations, divided by (n – 1).
Mathematical notation:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
The quantity s² is called the sample estimate of
the variance.
Population variance:
Mathematical notation:
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]
POPULATION
VARIANCE
Average of the squared deviations of the values from the mean
Calculating the variance (population)
Population variance:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]
Where: μ = population mean
N = population size
Xi = ith value of the variable
SAMPLE VARIANCE
\[ s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]
Where: X̄ = sample mean
Xi = ith value of observation X
n = sample size
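The two formulas differ only in the divisor. As a sketch, the code below applies both to the eight test scores from the range example; treating the same list once as a sample and once as a whole population is an assumption made purely to compare the divisors.

```python
from statistics import pvariance, variance

scores = [44, 56, 58, 62, 64, 64, 70, 72]   # test scores from the range example

n = len(scores)
mean_score = sum(scores) / n
squared_deviations = [(x - mean_score) ** 2 for x in scores]

sample_variance = sum(squared_deviations) / (n - 1)   # divide by n - 1
population_variance = sum(squared_deviations) / n     # divide by N

print(round(sample_variance, 2), round(population_variance, 2))
```

The standard-library functions `variance` (sample) and `pvariance` (population) implement the same two divisors.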
ADVANTAGES AND
DISADVANTAGES OF THE
VARIANCE
Advantage
It takes into consideration all the values in
the set of data.
Disadvantage
The units of measure are squared which
may be difficult to communicate
• e.g. variance of weight will be in kg
squared.
STANDARD DEVIATION
The way around the difficulty of the squared units of s² is to use the square root of the
variance as a measure of variability.
The quantity, denoted by s, is called the sample standard deviation.
Thus, if
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
then
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]
The population standard deviation is therefore denoted as
\[ \sigma = \sqrt{\sigma^2} \]
where
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]
STANDARD DEVIATION
OF THE POPULATION
Get the square root of the population
variance to obtain the standard deviation
for the population.
STANDARD DEVIATION
OF THE SAMPLE
The sample standard deviation is obtained by
taking the square root of the sample variance.
Therefore, the sample standard deviation is
given by:
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]
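A minimal sketch of the square-root step, again using the eight test scores from the range example as illustrative data (the plasma volume data are not used here).

```python
from math import sqrt
from statistics import pstdev, stdev

scores = [44, 56, 58, 62, 64, 64, 70, 72]   # test scores from the range example

n = len(scores)
mean_score = sum(scores) / n
sum_of_squares = sum((x - mean_score) ** 2 for x in scores)

sample_sd = sqrt(sum_of_squares / (n - 1))    # square root of the sample variance
population_sd = sqrt(sum_of_squares / n)      # square root of the population variance

print(round(sample_sd, 2), round(population_sd, 2))
```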
EXAMPLE 1
The data given is of plasma volume.
[Table: columns x, (x – x̄), (x – x̄)²]
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
Mean = 3.0025
Variance = 0.097
Standard dev. = 0.31
VARIANCE AND S.D.
FROM THE FREQUENCY
TABLE
VARIANCE AND S.D.
FOR GROUPED DATA
RECAP OF FORMULAS
Variance:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
FEATURES OF THE
STANDARD DEVIATION
• It is usually positive and NEVER negative
• It is 0 only when all data values are the same
number
• The larger the value of the SD, the more the
data vary
• It can increase dramatically with the inclusion
of outliers
• The units (minutes, feet, etc...) are the same as
the units of original values
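Three of these features can be checked with a short sketch. The constant list and the added value of 200 are hypothetical; the scores are those from the range example.

```python
from statistics import stdev

constant = [7, 7, 7, 7]                        # all values identical
scores = [44, 56, 58, 62, 64, 64, 70, 72]
with_outlier = scores + [200]                  # hypothetical extreme value added

sd_constant = stdev(constant)
sd_scores = stdev(scores)
sd_with_outlier = stdev(with_outlier)

print(sd_constant)                 # zero: no variation at all
print(round(sd_scores, 2))
print(round(sd_with_outlier, 2))   # much larger once the outlier is included
```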
COEFFICIENT OF VARIATION (CV)
Sometimes we may wish to compare standard
deviations in two groups.
• i.e. we may want to compare the variability in two
groups.
• The two groups may be from two different data
sets
or may have observations measured in different
units
• e.g. weight measured in pounds and in kg
The groups may also have different means
• e.g. mean weight in children and mean weight in
adults.
COEFFICIENT OF VARIATION (CV)
The coefficient of variation gives a relative
measure of variation rather than the absolute
variation.
• Hence it is sometimes referred to as the relative
standard deviation.
It expresses the standard deviation as a
percentage of the sample mean
COEFFICIENT OF VARIATION (CV)
CV = (standard deviation (s) / mean (x̄)) × 100%
\[ CV = \frac{s}{\bar{x}} \times 100\% \]
The CV is independent of the units of
observation (i.e. it is a unitless quantity)
• Because the standard deviation and the mean
are expressed in the same units, the two units
cancel out.
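The cancelling of units can be seen numerically. In this sketch the weights are hypothetical, and 2.20462 is used as the approximate number of pounds per kilogram.

```python
import math
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return stdev(values) / mean(values) * 100

weights_kg = [60.0, 65.0, 70.0, 75.0, 80.0]        # hypothetical weights in kg
weights_lb = [w * 2.20462 for w in weights_kg]     # the same weights in pounds

cv_kg = coefficient_of_variation(weights_kg)
cv_lb = coefficient_of_variation(weights_lb)

print(round(cv_kg, 2), round(cv_lb, 2))
print(math.isclose(cv_kg, cv_lb))
```

The standard deviation and the mean are both multiplied by the same conversion factor, so their ratio is unchanged.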
EXAMPLE 2
The data given is from two samples of males aged 25 years and 11
years.
                     Sample 1      Sample 2
Age                  25 years      11 years
Mean weight          145 pounds    80 pounds
Standard deviation   10 pounds     10 pounds
We wish to know which of the weights is more variable.
EXAMPLE 2
If we calculate the CV for the 25 year olds;
C.V. = (10/145) x 100 = 6.9%
CV for the 11 year olds
C.V. = (10/80) x 100 = 12.5%
We can see that variation is higher in the 11
year olds than the 25 year olds.
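The two calculations above as a short sketch, working from the summary figures in the table rather than from raw data. The function name is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage, from a standard deviation and a mean in the same units."""
    return sd / mean * 100

cv_25_year_olds = coefficient_of_variation(10, 145)
cv_11_year_olds = coefficient_of_variation(10, 80)

print(round(cv_25_year_olds, 1), round(cv_11_year_olds, 1))
```

Although both groups have the same standard deviation (10 pounds), the relative variation is larger in the group with the smaller mean.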
COEFFICIENT OF
VARIATION (CV) FOR
GROUPED DATA
Measures of Position
PERCENTILES
• Percentiles are values that divide the
ranked data set into 100 (‘per cent’) equal
parts.
The pth percentile of a data set is a value
such that:
• at least p percent of the observations take
on this value or less
• at least (100-p) percent of the
observations take on this value or more.
PERCENTILES
Ranked data: 1 1 3 4 5 5 7 8 9
p% of the values lie at or below the pth percentile; (100-p)% lie at or above it
PERCENTILES
• If approximately n percent of the items in a
distribution are less than the number x;
• then x is the nth percentile of the
distribution, denoted Pn.
• Percentile rank indicates the percentage of
data values that fall below the specified value.
• Symbolized by P1, P2 ,…..
PERCENTILES
To find the percentile rank of a given data value x
(ranked data):
\[ \text{Percentile rank of } x = \frac{(\text{number of data values below } x) + 0.5}{\text{total number of values}} \times 100 \]
Note: techniques differ, but all give
similar values.
EXAMPLE: PERCENTILES
The following are test scores (out of 100) for a
particular Epidemiology class.
44 56 58 62 64 64 70 72
72 72 74 74 75 78 78 79
80 82 82 84 86 87 88 90
92 95 96 96 98 100
Mulenga scored 62 in the test. What was his
percentile rank?
PERCENTILES
Percentile rank = ((number of data values below 62 + 0.5) / (total number of values)) × 100
= ((3 + 0.5) / 30) × 100
= 11.67
≈ 12
The score 62 is at the 12th percentile, expressed as
P12.
Bwalya has a score of 95 in the test. What was his
percentile rank?
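The percentile rank formula used above can be sketched as a small function; the function name is hypothetical. Applied to Bwalya's score it gives the value the same formula would yield (25 scores lie below 95).

```python
scores = [44, 56, 58, 62, 64, 64, 70, 72,
          72, 72, 74, 74, 75, 78, 78, 79,
          80, 82, 82, 84, 86, 87, 88, 90,
          92, 95, 96, 96, 98, 100]

def percentile_rank(data, x):
    """(number of values below x + 0.5) / total number of values, times 100."""
    below = sum(1 for value in data if value < x)
    return (below + 0.5) / len(data) * 100

mulenga = percentile_rank(scores, 62)   # 3 scores lie below 62
bwalya = percentile_rank(scores, 95)    # 25 scores lie below 95

print(round(mulenga, 2), round(bwalya, 2))
```

Other textbooks use slightly different formulas, so results from other tools may differ a little from these.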
PERCENTILES
EXAMPLE: PERCENTILES
PERCENTILES OF
GROUPED DATA
DECILES
Deciles are values that divide a data set into ten
(approximately) equal parts.
They are denoted by D1, D2, …, D9.
10% 10% 10% 10% 10% 10% 10% 10% 10% 10%
D1 D2 D3 D4 D5 D6 D7 D8 D9
DECILES AND
QUARTILES
Deciles and quartiles are determined in the
same manner as percentiles, since they
may be expressed as percentiles.
EXAMPLE: DECILES
The following are test scores (out of 100) for a
particular math class.
44 56 58 62 64 64 70 72
72 72 74 74 75 78 78 79
80 82 82 84 86 87 88 90
92 95 96 96 98 100
Find the sixth decile.
EXAMPLE: DECILES
Solution
• The sixth decile is the 60th percentile.
• 60 percent of 30 is (0.6)(30) = 18.
• Since 18 is a whole number, we take the average of the 18th and
19th items, (82 + 82)/2 = 82, as the sixth decile.
• D6 = 82
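The steps in this solution can be sketched as a function. The slide only shows the case where p% of n is a whole number; the rule used otherwise (round the position up to the next item) is an assumption added here, and the function name is hypothetical.

```python
scores = [44, 56, 58, 62, 64, 64, 70, 72,
          72, 72, 74, 74, 75, 78, 78, 79,
          80, 82, 82, 84, 86, 87, 88, 90,
          92, 95, 96, 96, 98, 100]

def percentile_by_position(data, p):
    """Value at the pth percentile (p a whole number between 1 and 99)."""
    if not 0 < p < 100:
        raise ValueError("p must be between 1 and 99")
    ranked = sorted(data)
    whole, remainder = divmod(p * len(ranked), 100)
    if remainder == 0:
        # Position is a whole number: average that item and the next one.
        return (ranked[whole - 1] + ranked[whole]) / 2
    # Assumption (not in the slides): otherwise round the position up.
    return ranked[whole]

sixth_decile = percentile_by_position(scores, 60)   # D6 = P60
print(sixth_decile)
```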
QUARTILES
These are values which divide a series of observations, arranged in
ascending order into 4 equal parts. (Thus the 2nd Quartile is the
Median).
The data is ranked and then split into 4 segments with an equal
number of values per segment
The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
Q2 is the same as the median (50% are smaller, 50% are larger)
Lastly, only 25% of the observations are greater than the third
quartile, Q3.
QUARTILES
For any set of data (ranked in order from
least to greatest):
• The second quartile, Q2, is the median.
• The first quartile, Q1, is the median of all
items below Q2.
• The third quartile, Q3, is the median of
all items above Q2.
QUARTILES
• Quartiles are the three values (Q1, Q2, Q3) that
divide the ranked data set into four (approximately) equal
parts.
            25%      25%      25%      25%
(minimum)        Q1       Q2       Q3       (maximum)
                       (median)
INTERQUARTILE
RANGE
The interquartile range shows the spread of
the middle 50% of the data.
Interquartile Range (or IQR): Q3 - Q1
INTERQUARTILE
RANGE
The interquartile range (IQR) is a measure of
variability, based on dividing a data set into
quartiles.
Quartiles divide a rank-ordered data set into four
equal parts. The values that divide each part are
called the first, second, and third quartiles; and
they are denoted by Q1, Q2, and Q3, respectively
EXAMPLE: QUARTILES
The following are test scores (out of 100) for a
particular math class.
44 56 58 62 64 64 70 72
72 72 74 74 75 78 78 79
80 82 82 84 86 87 88 90
92 95 96 96 98 100
• Find the three quartiles.
• And find the interquartile range
EXAMPLE: QUARTILES
Solution
There are 30 scores, so the two middle numbers are the 15th and
16th (78 and 79), and
Q2 = (78 + 79)/2 = 78.5.
There are 15 numbers below Q2 and 15 numbers above it.
The middle (8th) number of the lower group is Q1 = 72,
and the middle (8th) number of the upper group is Q3 = 88.
IQR = Q3 – Q1 = 88 – 72 = 16
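The "median of each half" method used in this solution can be sketched as follows. Leaving the middle value out of both halves when the count is odd is an assumption that matches the definition "items below Q2" and "items above Q2".

```python
from statistics import median

scores = [44, 56, 58, 62, 64, 64, 70, 72,
          72, 72, 74, 74, 75, 78, 78, 79,
          80, 82, 82, 84, 86, 87, 88, 90,
          92, 95, 96, 96, 98, 100]

ranked = sorted(scores)
n = len(ranked)
half = n // 2

lower_group = ranked[:half]                                    # items below Q2
upper_group = ranked[half:] if n % 2 == 0 else ranked[half + 1:]  # items above Q2

q1 = median(lower_group)
q2 = median(ranked)
q3 = median(upper_group)
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

Statistical software often uses interpolation rules for quartiles, so library routines may return slightly different values for Q1 and Q3 on the same data.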
QUARTILES OF
GROUPED DATA
INTERQUARTILE RANGE
OF GROUPED DATA
BOX AND WHISKER
PLOT
QUARTILES     DECILES
Q1 = P25      D1 = P10
Q2 = P50      D2 = P20
Q3 = P75      D3 = P30
              ...
              D9 = P90
END OF
LECTURE