Descriptive Statistics
Part 3: Measures of Variation
Outline
• 2.1 Frequency Distributions and Their Graphs
• 2.2 Measures of Central Tendency
• 2.3 Measures of Variation/Variability
Section 2.3
Measures of Variation/Variability
Measures of Variation (“Spread”)
Another important characteristic of quantitative
data is how much the data varies, or is spread
out.
The most common methods of measuring spread:
1. Range
2. Inter-quartile range
3. Standard deviation and variance
4. Skewness (will be discussed in detail in NORMAL DISTRIBUTION)
Range
• The difference between the maximum and minimum
data entries in the set.
• The data must be quantitative.
• Range = (Max. data entry) – (Min. data entry)
Example: Finding the Range
The wait time to see a bank teller is studied at 2 banks.
Bank A has multiple lines, one for each teller.
Bank B has a single wait line for 1st available teller.
5 wait times (in minutes) are sampled from each bank:
Bank A: 5.2 6.2 7.5 8.4 9.2
Bank B: 6.6 6.8 7.5 7.7 7.9
Find the mean, median, and range for each bank.
Solution: Finding the Range
• Bank A: Range = ?
• Bank B: Range = ?
• Note: The range is easy to compute, but only uses 2
values. Do the following 2 sets vary the same?
Set A: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Set B: 1, 10, 10, 10, 10, 10, 10, 10, 10, 10
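The range formula can be checked with a few lines of Python. This is a minimal sketch; the helper name `data_range` is made up for illustration and is not part of the slides.

```python
def data_range(values):
    """Range = (max data entry) - (min data entry)."""
    return max(values) - min(values)

bank_a = [5.2, 6.2, 7.5, 8.4, 9.2]
bank_b = [6.6, 6.8, 7.5, 7.7, 7.9]
set_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
set_b = [1, 10, 10, 10, 10, 10, 10, 10, 10, 10]

# Round to one decimal to hide floating-point noise in the subtraction.
print("Bank A range:", round(data_range(bank_a), 1), "min")
print("Bank B range:", round(data_range(bank_b), 1), "min")

# Set A and Set B share the same min and max, so their ranges are equal
# even though the values in between are distributed very differently.
print("Set A range:", data_range(set_a))
print("Set B range:", data_range(set_b))
```

Because only the two extreme values enter the calculation, Set A and Set B get the same range, which is why the range alone cannot distinguish them.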
Inter-quartile range
• The inter-quartile range is a measure of spread or
dispersion. It is the difference between the 75th percentile
(often called Q3) and the 25th percentile (Q1). The
formula for inter-quartile range is therefore: Q3-Q1. It is
sometimes called the H-spread. Although not used
extensively, the inter-quartile range is a stable measure of
spread and perhaps should be in more common usage.
Quartiles
• Split Ordered Data into 4 Quarters
  25% | 25% | 25% | 25%
      Q1    Q2    Q3
• Position of i-th Quartile: $\text{Position of } Q_i = \dfrac{i(n+1)}{4}$
Data in Ordered Array: 11 12 13 16 16 17 18 21 22
Position of Q1 = $\dfrac{1(9+1)}{4} = 2.5$, so Q1 = $\dfrac{12+13}{2} = 12.5$
• Q1 and Q3 Are Measures of Noncentral Location
• Q2 = Median, A Measure of Central Tendency
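The position rule i(n + 1)/4 can be sketched in Python as follows. The function name `quartile` is hypothetical. The slides only show the case where the position ends in .5 (average the two neighbouring entries); handling other fractional positions by linear interpolation is an assumption made here, and it reproduces the averaging in the .5 case.

```python
import math

def quartile(sorted_data, i):
    """i-th quartile (i = 1, 2, 3) at position i(n + 1)/4, counting from 1.

    Assumes the data are already sorted and contain at least 3 entries,
    so that every position falls inside the array.
    """
    n = len(sorted_data)
    position = i * (n + 1) / 4
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return sorted_data[lower - 1]          # position is a whole number
    # Assumption: interpolate linearly between the two neighbouring entries.
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    return below + fraction * (above - below)

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q2, q3 = (quartile(data, i) for i in (1, 2, 3))
print("Q1 =", q1, " Q2 =", q2, " Q3 =", q3)
print("IQR =", q3 - q1)
```

For the nine values above, Q1 sits at position 2.5, halfway between 12 and 13, which gives 12.5 as on the slide.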
Example
i x[i]
1 102
2 104
3 105 ---- the first quartile, Q1 = 105
4 106
5 108
6 109 ---- the second quartile, Q2 or median = 109
7 110
8 112
9 115 ---- the third quartile, Q3 = 115
10 115
11 118
From this table, the interquartile range is 115 - 105 = 10.
Inter-quartile Range
The inter-quartile range is the range for the
middle 50% of observations. That is, it is the
distance from the third quartile (75th
percentile) to the first quartile (25th percentile)
of a frequency distribution.
Because the inter-quartile range is the
distance between the 25th and 75th
percentiles it is not sensitive to changes in
the extreme scores at either end of the
distribution.
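A small experiment illustrates this insensitivity to extreme scores. The scores below are made up for illustration, and the helper name `spread` is hypothetical. The quartiles come from Python's `statistics.quantiles` with `method="exclusive"`, which places them using n + 1; other software may use a different quartile convention and give slightly different numbers.

```python
import statistics

scores = [4, 5, 6, 7, 8, 9, 10, 11, 12]      # made-up data
with_extreme = scores[:-1] + [120]           # replace the top score with an extreme one

def spread(values):
    """Return (range, inter-quartile range) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return max(values) - min(values), q3 - q1

range_before, iqr_before = spread(scores)
range_after, iqr_after = spread(with_extreme)

print("Range:", range_before, "->", range_after)
print("IQR:  ", iqr_before, "->", iqr_after)
```

The range jumps when the largest score changes, while the inter-quartile range stays the same because Q1 and Q3 depend only on the middle of the ordered data.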
Example
4 7 6 31 10 29 4 6 9 11 7 23
5 8 10 7 11 6 5 8 10 9 12 9 8
Find the 1) Range
2) Inter-quartile range
Solution:
Arrange in order:
4 4 5 5 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10 11 11 12 23 29 31
1) Range = High Score – Low Score = 31 – 4 = 27
2) Find the median (n = 25 is odd): Median, Q2 = 13th score = 8
Q1 = 6 (position 6.5, between the 6th and 7th scores)
Q3 = 10.5 (position 19.5, between the 19th and 20th scores)
IQR = Q3 – Q1 = 10.5 – 6 = 4.5
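The example can be checked with Python's standard library. This is a sketch, not the only way to compute quartiles: `statistics.quantiles` with `method="exclusive"` locates quartiles using n + 1, which agrees with the i(n + 1)/4 position rule for this data set. Other tools may use other conventions.

```python
import statistics

scores = [4, 7, 6, 31, 10, 29, 4, 6, 9, 11, 7, 23,
          5, 8, 10, 7, 11, 6, 5, 8, 10, 9, 12, 9, 8]

score_range = max(scores) - min(scores)

# quantiles() sorts the data itself and returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="exclusive")
iqr = q3 - q1

print("Range =", score_range)
print("Q1 =", q1, " Median =", q2, " Q3 =", q3)
print("IQR =", iqr)
```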
The inter-quartile range (IQR)
in particular is used to describe the dispersion
of the data.
The IQR is defined as the range between
the first and the third quartile. Note that the IQR
contains approximately 50% of the data within the distribution.
Median, Quartiles, Deciles & Percentiles
• The Median is a value that subdivides the ordered data into two
halves.
• The Quartiles subdivide the data into quarters, the deciles
provide a subdivision into tenths, and the percentiles a
subdivision into hundredths.
• There are three quartiles: the lower quartile, Q1, the
median (Q2), and the upper quartile, Q3.
• The percentiles are simply called the 1st percentile, the 2nd
percentile and so on.
• The median is the 5th decile and the 50th percentile.
• A study of the values of the deciles or quartiles gives us an idea
of the spread of the data, but an 'idea' is all we get, and there is
no need for great precision.
IN SUMMARY:
• MEDIAN
(Data is divided into 2 parts)
• QUARTILE
(Data is divided into 4 parts)
• DECILES
(Data is divided into 10 parts)
• PERCENTILES
(Data is divided into 100 parts)
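The four kinds of cut points can be produced with the same standard-library call by changing the number of parts. This sketch reuses the 25 scores from the inter-quartile range example and assumes Python's default quantile convention (`method="exclusive"`).

```python
import statistics

scores = [4, 7, 6, 31, 10, 29, 4, 6, 9, 11, 7, 23,
          5, 8, 10, 7, 11, 6, 5, 8, 10, 9, 12, 9, 8]

median = statistics.median(scores)                 # 2 parts, 1 cut point
quartiles = statistics.quantiles(scores, n=4)      # 4 parts, 3 cut points
deciles = statistics.quantiles(scores, n=10)       # 10 parts, 9 cut points
percentiles = statistics.quantiles(scores, n=100)  # 100 parts, 99 cut points

print("Cut points:", len(quartiles), len(deciles), len(percentiles))

# The median, Q2, the 5th decile and the 50th percentile are the same value.
print(median, quartiles[1], deciles[4], percentiles[49])
```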
Standard Deviation and Variance
Measures the typical amount data deviates from the
mean.
Sample Variance, s² (sample size < 30):
• $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
Sample Standard Deviation, s:
• $s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Finding Sample Variance & Standard Deviation
1. Find the mean of the sample data set: $\bar{x} = \dfrac{\sum x}{n}$
2. Find the deviation of each entry: $x - \bar{x}$
3. Square each deviation: $(x - \bar{x})^2$
4. Add to get the sum of the deviations squared: $\sum (x - \bar{x})^2$
5. Divide by n – 1 to get the sample variance (if sample size less than 30): $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
6. Find the square root to get the sample standard deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
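The six steps translate almost line for line into code. The sketch below uses a hypothetical function name, `sample_variance_and_sd`, and a small made-up data set chosen so the arithmetic is easy to follow by hand.

```python
import math

def sample_variance_and_sd(data):
    """Follow the six steps to get the sample variance and standard deviation."""
    n = len(data)
    mean = sum(data) / n                        # Step 1: mean of the sample
    deviations = [x - mean for x in data]       # Step 2: deviation of each entry
    squares = [d ** 2 for d in deviations]      # Step 3: square each deviation
    sum_of_squares = sum(squares)               # Step 4: sum of squared deviations
    variance = sum_of_squares / (n - 1)         # Step 5: divide by n - 1
    sd = math.sqrt(variance)                    # Step 6: square root
    return variance, sd

# Made-up data: mean 5, squared deviations 9 + 1 + 1 + 1 + 16 = 28.
variance, sd = sample_variance_and_sd([2, 4, 4, 6, 9])
print("s^2 =", variance)
print("s   =", round(sd, 2))
```

Here the sum of squared deviations is 28 and n - 1 = 4, so the sample variance is 7 and the standard deviation is the square root of 7.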
Find the Standard Deviation and Variance
for Bank A (multi-line)
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{36.5}{5} = 7.3$ min
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$,  $s = \sqrt{s^2}$

Wait time, x (in min)   Deviation: x – x̄      Squares: (x – x̄)²
5.2                     5.2 – 7.3 = –2.1      (–2.1)² = 4.41
6.2                     6.2 – 7.3 =           ( )² =
7.5                     7.5 – 7.3 =           ( )² =
8.4                     8.4 – 7.3 =           ( )² =
9.2                     9.2 – 7.3 =           ( )² =
Σx = 36.5                                     Σ(x – x̄)² =
• Round to one more decimal than the data.
• Don’t round until the end.
• Include the appropriate units.
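One way to check the Bank A table is to print each row of the calculation. This is a sketch that follows the steps by hand rather than calling a library routine; the variable names are illustrative only.

```python
bank_a = [5.2, 6.2, 7.5, 8.4, 9.2]   # wait times in minutes

n = len(bank_a)
mean = sum(bank_a) / n
print(f"mean = {mean:.1f} min")

print(f"{'x':>6} {'x - mean':>10} {'squared':>10}")
sum_sq = 0.0
for x in bank_a:
    dev = x - mean                    # deviation from the mean
    sum_sq += dev ** 2                # accumulate the squared deviations
    print(f"{x:>6.1f} {dev:>10.1f} {dev ** 2:>10.2f}")

variance = sum_sq / (n - 1)           # sample variance, in min^2
sd = variance ** 0.5                  # sample standard deviation, in min

print(f"sum of squared deviations = {sum_sq:.2f}")
print(f"s^2 = {variance:.2f} min^2")
print(f"s   = {sd:.2f} min")          # one more decimal than the data
```

The squared deviations add up to 10.48, so s² = 10.48/4 = 2.62 min² and s ≈ 1.62 min, rounded only at the end.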
Find the Standard Deviation and Variance
for Bank B (1 wait line)
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{36.5}{5} = 7.3$ min
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$,  $s = \sqrt{s^2}$

Wait time, x (in min)   Deviation: x – x̄      Squares: (x – x̄)²
6.6
6.8
7.5
7.7
7.9
Σx = 36.5                                     Σ(x – x̄)² =
• Round to one more decimal than the data.
• Don’t round until the end.
• Include the appropriate units.
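The two banks can also be compared directly with Python's `statistics` module, whose `variance` and `stdev` functions divide by n – 1 as in the sample formulas. This is a sketch for checking hand calculations.

```python
import statistics

bank_a = [5.2, 6.2, 7.5, 8.4, 9.2]   # multiple lines
bank_b = [6.6, 6.8, 7.5, 7.7, 7.9]   # single wait line

s2_a, s_a = statistics.variance(bank_a), statistics.stdev(bank_a)
s2_b, s_b = statistics.variance(bank_b), statistics.stdev(bank_b)

for name, waits, s2, s in (("Bank A", bank_a, s2_a, s_a),
                           ("Bank B", bank_b, s2_b, s_b)):
    mean = statistics.mean(waits)
    print(f"{name}: mean = {mean:.1f} min, s^2 = {s2:.3f} min^2, s = {s:.2f} min")
```

Both banks have the same mean wait of 7.3 min, but Bank B's standard deviation (about 0.57 min) is much smaller than Bank A's (about 1.62 min), so wait times at the single-line bank are more consistent.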
Sample versus Population
Standard Deviation and Variance
                     Sample        Population
                     Statistics:   Parameters:
Mean                 x̄             µ
Standard Deviation   s             σ
Variance             s²            σ²
Sample versus Population
Standard Deviation
Note: Unlike x̄ and µ, the formulas for s and σ
are not mathematically the same:
Sample Standard Deviation
• $s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Population Standard Deviation
• $\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
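The difference between the two formulas shows up clearly when both are applied to the same numbers. The sketch below treats the five Bank A wait times first as a sample and then, purely for illustration, as if they were an entire population. `statistics.stdev` divides by n – 1 and `statistics.pstdev` divides by N.

```python
import statistics

waits = [5.2, 6.2, 7.5, 8.4, 9.2]    # Bank A wait times in minutes

s = statistics.stdev(waits)          # sample formula: divide by n - 1
sigma = statistics.pstdev(waits)     # population formula: divide by N

print(f"sample s         = {s:.2f} min")
print(f"population sigma = {sigma:.2f} min")
```

Dividing by the smaller number n – 1 always makes s a little larger than σ computed from the same values; the gap shrinks as the number of data entries grows.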
Standard Deviation: Key Points
s ≥ 0 (When would s = 0?)
The standard deviation is a measure of variation of all
values from the mean. The larger s is, the more the
data varies.
The units of the standard deviation s are the same as
the units of the original data values. (The variance
has units².)
The value of the standard deviation s can increase
dramatically with the inclusion of one or more
outliers (data values far away from all others).
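Two of these key points are easy to see numerically. In the sketch below, the list of identical values and the 30-minute outlier are hypothetical, added only to illustrate the behaviour; the other values are the Bank B wait times.

```python
import statistics

identical = [7.3, 7.3, 7.3, 7.3, 7.3]      # every entry equals the mean
bank_b = [6.6, 6.8, 7.5, 7.7, 7.9]         # Bank B wait times in minutes
bank_b_outlier = bank_b + [30.0]           # hypothetical 30-minute wait added

s_identical = statistics.stdev(identical)
s_b = statistics.stdev(bank_b)
s_outlier = statistics.stdev(bank_b_outlier)

print("s when all values are equal:", s_identical)
print(f"s for Bank B:                {s_b:.2f} min")
print(f"s with one outlier added:    {s_outlier:.2f} min")
```

When every value is the same, all deviations are zero and s = 0. Adding a single far-away value inflates s many times over, even though the other five wait times are unchanged.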
Interpreting Standard Deviation
• Standard deviation is a measure of the typical amount
an entry deviates from the mean.
• The more the entries are spread out, the greater the
standard deviation.
The Empirical Rule
Empirical (68-95-99.7) Rule
For data sets having a symmetric, bell-shaped (approximately normal) distribution:
About 68% of all values fall within 1 standard
deviation of the mean
About 95% of all values fall within 2 standard
deviations of the mean
About 99.7% of all values fall within 3 standard
deviations of the mean
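The three percentages are rounded values from the normal curve. Assuming the data follow a normal distribution exactly, the share within k standard deviations of the mean can be computed with `statistics.NormalDist` (available in Python 3.8 and later), as in this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

shares = {}
for k in (1, 2, 3):
    # Area under the curve between k standard deviations below and above the mean.
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {shares[k]:.1%}")
```

Real data are never exactly normal, so for an actual data set the rule gives approximate percentages only.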
Example: Using the Empirical Rule
A sample of IQs has a symmetric, bell-shaped distribution with a mean
of 100 and a standard deviation of 15.
1. Sketch the distribution.
2. 68% of people have an IQ between what 2 values?
3. What percent of people have an IQ between 70 and 130?
4. What percent of people have an IQ between 100 and 115?
5. What percent of people have an IQ above 145?
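Questions 2 to 5 need only the three empirical-rule percentages and the symmetry of the distribution. The sketch below works them out with plain arithmetic; the variable names are illustrative, and the percentages are the approximate 68-95-99.7 values, not exact normal probabilities.

```python
mean, sd = 100, 15

# Intervals reaching 1, 2 and 3 standard deviations either side of the mean.
within_1 = (mean - 1 * sd, mean + 1 * sd)
within_2 = (mean - 2 * sd, mean + 2 * sd)
within_3 = (mean - 3 * sd, mean + 3 * sd)

# Q2: about 68% of IQs lie within 1 standard deviation of the mean.
print("68% of IQs lie between", within_1)

# Q3: 70 and 130 are 2 standard deviations from the mean.
pct_70_130 = 95
print("Between", within_2, ":", pct_70_130, "%")

# Q4: by symmetry, half of the middle 68% lies between the mean and 115.
pct_100_115 = 68 / 2
print("Between 100 and 115:", pct_100_115, "%")

# Q5: 145 is 3 standard deviations above the mean; the remaining
# 100 - 99.7 = 0.3% is split equally between the two tails.
pct_above_145 = (100 - 99.7) / 2
print("Above", within_3[1], ":", round(pct_above_145, 2), "%")
```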
Summary
• The inter-quartile range is used in conjunction with
the median to describe skewed distributions. It is calculated
as the distance between the scores at the 25th
and 75th percentiles.
• The variance is used in conjunction with the mean to
describe symmetrical or normal distributions of interval
or ratio scores. It is the average of the squared deviations
of scores around the mean (for a sample, the sum is divided by n – 1).
• The standard deviation is also used in conjunction
with the mean to describe symmetrical or normal
distribution of interval/ratio scores. It is the square
root of the variance. It can be thought of as the
“average” amount that scores deviate from the mean.
• Measures of variability describe how much the scores
differ from each other, or how much the distribution
is spread out.
• The range is a measure of variability based on the
difference between the highest score and the lowest
score.