Measures of Variability:
Variability:
Variability describes how far apart data points lie from each other and from the center of a
distribution.
Variability is also referred to as spread, scatter or dispersion. It is most commonly measured
with the following:
Range: the difference between the highest and lowest values
Interquartile range: the range of the middle half of a distribution
Standard deviation: average distance from the mean
Variance: average of squared distances from the mean
Why does variability matter?
1. The amount of variability determines how well you can generalize results from the
sample to your population.
2. Low variability is ideal because it means that you can better predict information
about the population based on sample data. High variability means that the values
are less consistent, so it is harder to make predictions.
3. Data sets can have the same central tendency but different levels of variability, or
vice versa.
Figure: Same average but different variability.
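A minimal sketch of the "same average, different variability" idea, using Python's standard-library `statistics` module. The two data sets are made up for illustration: both have a mean of 50, but their population standard deviations differ by a factor of ten.

```python
import statistics

# Hypothetical data sets: identical mean, very different spread
tight = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

for name, data in [("tight", tight), ("wide", wide)]:
    # Same central tendency, different variability
    print(name, statistics.mean(data), statistics.pstdev(data))
```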
Range:
1. The range tells you the spread of your data from the lowest to the highest value in the
distribution.
2. To find the range, simply subtract the lowest value from the highest value in the data
set.
3. Because only 2 numbers are used, the range is strongly influenced by outliers and doesn't
give you any information about how the values in between are distributed.
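A short sketch of the range calculation in Python. The scores are hypothetical; adding a single extreme value shows how much the range depends on just the two endpoints.

```python
def data_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)

scores = [4, 8, 15, 16, 23, 42]    # hypothetical scores
print(data_range(scores))          # 38

with_outlier = scores + [100]      # one extreme value added
print(data_range(with_outlier))    # 96
```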
Interquartile range:
1. The interquartile range gives you the spread of the middle of your distribution.
2. For any distribution that’s ordered from low to high, the interquartile range contains
the middle half of the values. The first quartile (Q1) is the value below which the lowest
25% of values fall, and the third quartile (Q3) is the value above which the highest 25% of
values fall.
The interquartile range is the third quartile (Q3) minus the first quartile (Q1). This
gives us the range of the middle half of a data set.
1. Like the range, the interquartile range uses only 2 values in its calculation. But the IQR
is less affected by outliers: the 2 values come from the middle half of the data set, so
they are unlikely to be extreme scores.
2. The IQR gives a consistent measure of variability for skewed as well as normal
distributions.
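A minimal sketch of Q3 minus Q1 in Python, assuming the `"inclusive"` quartile method of `statistics.quantiles` (Python 3.8+). Quartile conventions differ between tools, so other software may report slightly different Q1 and Q3 values. Replacing the largest value with an extreme one changes the range but leaves the IQR untouched.

```python
import statistics

def iqr(values):
    # "inclusive" is one of several quartile conventions (an assumption here)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]      # hypothetical data
extreme = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # same data, largest value made extreme

print(iqr(data), max(data) - min(data))        # 4.0 8
print(iqr(extreme), max(extreme) - min(extreme))  # 4.0 89
```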
Five-number summary
Every distribution can be organized using a five-number summary:
Lowest value
Q1: 25th percentile
Q2: the median
Q3: 75th percentile
Highest value (Q4)
These five-number summaries can be easily visualized using box and whisker plots.
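A sketch of the five-number summary in Python, again assuming the `"inclusive"` quartile method of `statistics.quantiles`; the middle cut point it returns is the median (Q2). The data set is hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (lowest, Q1, median, Q3, highest)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

print(five_number_summary([2, 4, 4, 5, 7, 9, 10, 12, 15]))
# (2, 4.0, 7.0, 10.0, 15)
```

These are the five values a box and whisker plot draws: the box spans Q1 to Q3 with a line at the median, and the whiskers reach toward the lowest and highest values (some plotting tools cap the whiskers and show outliers separately).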
Standard deviation:
The standard deviation is the average amount of variability in your dataset.
It tells you, on average, how far each score lies from the mean. The larger the standard
deviation, the more variable the data set is.
There are six steps for finding the standard deviation by hand:
1. List each score and find their mean.
2. Subtract the mean from each score to get the deviation from the mean.
3. Square each of these deviations.
4. Add up all of the squared deviations.
5. Divide the sum of the squared deviations by n – 1 (for a sample) or N (for a population).
6. Find the square root of the number you found.
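The six steps above can be sketched directly in Python, one line per step. The `sample` flag is an illustrative parameter that chooses between dividing by n − 1 and by N; the scores are hypothetical.

```python
import math

def standard_deviation(scores, sample=True):
    """Standard deviation computed step by step, as done by hand."""
    n = len(scores)
    mean = sum(scores) / n                     # step 1: find the mean
    deviations = [x - mean for x in scores]    # step 2: deviation from the mean
    squared = [d ** 2 for d in deviations]     # step 3: square each deviation
    total = sum(squared)                       # step 4: add them up
    divisor = n - 1 if sample else n           # step 5: n - 1 (sample) or N (population)
    return math.sqrt(total / divisor)          # step 6: take the square root

scores = [46, 69, 32, 60, 52, 41]  # hypothetical scores, mean = 50
print(standard_deviation(scores))                # sample: sqrt(886 / 5)
print(standard_deviation(scores, sample=False))  # population: sqrt(886 / 6)
```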
STANDARD DEVIATION FOR POPULATION:
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
σ = population standard deviation
Σ = sum of…
X = each value
μ = population mean
N = number of values in the population
STANDARD DEVIATION FOR SAMPLE:
$$s = \sqrt{\frac{\sum (X - \bar{x})^2}{n - 1}}$$
s = sample standard deviation
Σ = sum of…
X = each value
x̄ = sample mean
n = number of values in the sample
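Python's standard library implements both formulas: `statistics.pstdev` divides by N and `statistics.stdev` divides by n − 1. A minimal sketch with hypothetical data; because only the divisor differs, the sample value is the population value scaled by √(n / (n − 1)).

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
n = len(values)

sigma = statistics.pstdev(values)  # population formula: divides by N
s = statistics.stdev(values)       # sample formula: divides by n - 1

print(sigma, s)
# Only the divisor differs, so s = sigma * sqrt(n / (n - 1))
print(math.isclose(s, sigma * math.sqrt(n / (n - 1))))  # True
```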
Note: A population is the entire group that you want to draw conclusions about. A sample is
the specific group that you will collect data from. The size of the sample is always less than
the total size of the population.
Why use (n-1) for sample?
1. When you use sample data, your sample standard deviation is always used as an
estimate of the population standard deviation. Using n in this formula tends to give
you a biased estimate that consistently underestimates variability.
2. Reducing the sample n to n – 1 makes the standard deviation artificially large, giving
you a conservative estimate of variability.
3. While this is still not an unbiased estimate of the standard deviation, it is a less biased
one: it is better to overestimate than to underestimate variability in samples.
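A small simulation can make the bias visible. This sketch assumes a standard normal population (variance 1), draws many hypothetical samples of size 5, and averages the estimates. It looks at the variance (the square of the standard deviation), where dividing by n − 1 removes the bias entirely; the standard deviation computed with n − 1 still comes out slightly low on average, which matches point 3.

```python
import math
import random
import statistics

random.seed(0)         # fixed seed so the run is repeatable
n, trials = 5, 20_000  # small samples make the bias easy to see

var_n, var_n_minus_1, sd_n_minus_1 = [], [], []
for _ in range(trials):
    # Draw a sample from a population with variance 1 (standard normal)
    sample = [random.gauss(0, 1) for _ in range(n)]
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
    var_n.append(ss / n)
    var_n_minus_1.append(ss / (n - 1))
    sd_n_minus_1.append(math.sqrt(ss / (n - 1)))

print(statistics.fmean(var_n))          # noticeably below 1 (about 0.8)
print(statistics.fmean(var_n_minus_1))  # close to 1
print(statistics.fmean(sd_n_minus_1))   # still a little below 1
```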
Variance:
1. The variance is the average of squared deviations from the mean.
2. Variance is the square of the standard deviation.
VARIANCE FOR POPULATION:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
σ² = population variance
Σ = sum of…
X = each value
μ = population mean
N = number of values in the population
VARIANCE FOR SAMPLE:
$$s^2 = \frac{\sum (X - \bar{x})^2}{n - 1}$$
s² = sample variance
Σ = sum of…
X = each value
x̄ = sample mean
n = number of values in the sample
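The two variance formulas translate directly into Python. A minimal sketch with hypothetical data; the last line checks that the population variance equals the square of the population standard deviation from `statistics.pstdev`.

```python
import math
import statistics

def population_variance(values):
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)   # divide by N

def sample_variance(values):
    xbar = sum(values) / len(values)
    return sum((x - xbar) ** 2 for x in values) / (len(values) - 1)  # divide by n - 1

values = [3, 7, 7, 19]  # hypothetical data, mean = 9
print(population_variance(values))  # 36.0
print(sample_variance(values))      # 48.0
# Variance is the square of the standard deviation
print(math.isclose(statistics.pstdev(values) ** 2, population_variance(values)))  # True
```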
What is the best measure of variability?
1. For normal distributions, all measures can be used. The standard deviation and
variance are preferred because they take your whole data set into account, but this
also means that they are easily influenced by outliers.
2. For skewed distributions or data sets with outliers, the interquartile range is the best
measure. It’s least affected by extreme values because it focuses on the spread in the
middle of the data set.
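A quick sketch of why the IQR is preferred when outliers are present: replacing one value in a hypothetical data set with an extreme one inflates the sample standard deviation many times over, while the IQR (computed here with the `"inclusive"` quartile method, one convention among several) does not move.

```python
import statistics

def iqr(values):
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

clean = [10, 11, 12, 13, 14, 15, 16, 17, 18]         # hypothetical data
with_outlier = [10, 11, 12, 13, 14, 15, 16, 17, 180]  # last value made extreme

for name, data in [("clean", clean), ("with outlier", with_outlier)]:
    # Standard deviation reacts strongly to the outlier; the IQR does not
    print(name, round(statistics.stdev(data), 2), iqr(data))
```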