Variance and Standard Deviation
We need a measure of the distribution or spread of data
around an expected value (either the sample mean x̄ or the population mean μ). Variance and
standard deviation provide such measures.
Formulas and rationale for these measures are described
in the next Procedure display. Then, examples and guided
exercises show how to compute and interpret these
measures.
As we will see later, the formulas for variance and standard
deviation differ slightly, depending on whether we are using
a sample or the entire population.
Measures of Variation:
• The Sample Variance
• Average (approximately) of squared deviations of values
from the mean
• Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
where X̄ = arithmetic mean
n = sample size
Xᵢ = i-th value of the variable X
SAMPLE VARIANCE
• A shortcut formula for the sample variance:
$$S^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$
• Where S² is the sample variance
• n is the total number of values in the sample
• xᵢ is the value of the i-th observation
• Σ represents a summation
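The following minimal Python sketch checks that the shortcut formula and the defining formula agree. The function names sample_variance_defining and sample_variance_shortcut are hypothetical and serve only as illustration. The data are the rose-diameter sample used in a later example.

```python
def sample_variance_defining(xs):
    """Defining formula: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """Shortcut formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


data = [2, 3, 3, 8, 10, 10]  # rose-diameter sample (inches)
print(sample_variance_defining(data))
print(sample_variance_shortcut(data))
```

Both functions give 14.0 for this sample. In floating-point arithmetic, the shortcut form can lose precision when the values are large compared with their spread, because it subtracts two nearly equal quantities. That is one reason software often prefers working from deviations about the mean.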
Measures of Variation:
• The Sample Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the variance
• Has the same units as the original data
• Sample standard deviation:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
Population Variance
• In practice population variance cannot be computed
directly because the entire population is not ordinarily
observed.
• An analogous measure of variability may be determined
with sample data.
• This is referred to as the sample variance.
SAMPLE VARIANCE
• Notice that the sample variance is defined as the sum
of the squared deviations divided by n-1.
• Sample variance is computed to estimate the
population variance.
• An unbiased estimate of the population variance may
be obtained by defining the sample variance as the
sum of the squared deviations divided by n-1 rather
than by n.
• Defining the sample variance as the mean squared
deviation from the sample mean (that is, dividing by n)
tends to underestimate the population variance.
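A small simulation makes the n versus n – 1 point concrete. The sketch assumes a normal population with σ = 2, so σ² = 4. It draws many samples of size n = 5 with Python's random module and averages the two candidate estimates. The function name simulate and all parameter values are illustrative choices, not part of the original material.

```python
import random


def simulate(n=5, trials=20_000, sigma=2.0, seed=1):
    """Average the SS/n and SS/(n - 1) variance estimates over many samples.

    Samples are drawn from a normal population with standard deviation sigma,
    so the true population variance is sigma**2.
    """
    rng = random.Random(seed)
    total_div_n = 0.0
    total_div_n1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
        total_div_n += ss / n
        total_div_n1 += ss / (n - 1)
    return total_div_n / trials, total_div_n1 / trials


biased, unbiased = simulate()
print(f"true variance       = {2.0 ** 2}")
print(f"average of SS/n     = {biased:.3f}")
print(f"average of SS/(n-1) = {unbiased:.3f}")
```

With these settings, the average of SS/(n – 1) should land close to 4. The average of SS/n should sit near (n – 1)/n × 4 = 3.2, which is the underestimation described above.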
POPULATION/SAMPLE STANDARD DEVIATION
• Compute the sample standard deviation of
advertising data: 2.5, 1.3, 1.4, 1.0 and 2.0
• Compute the sample standard deviation of sales
data: 264, 116, 165, 101 and 209
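Hand calculations for this exercise can be checked with Python's standard statistics module, as in this minimal sketch. statistics.stdev divides by n – 1 (sample standard deviation), and statistics.pstdev divides by n (population standard deviation).

```python
import statistics

advertising = [2.5, 1.3, 1.4, 1.0, 2.0]
sales = [264, 116, 165, 101, 209]

for name, data in [("advertising", advertising), ("sales", sales)]:
    s = statistics.stdev(data)       # sample SD: divides by n - 1
    sigma = statistics.pstdev(data)  # population SD: divides by n
    print(f"{name}: mean = {statistics.mean(data):.2f}, s = {s:.4f}, sigma = {sigma:.4f}")
```

For these data, s is about 0.60 for advertising and about 67.18 for sales.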
The population mean
• The population mean is the sum of the values in the
population divided by the population size, N
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$
where μ = population mean
N = population size
Xᵢ = i-th value of the variable X
Population Variance σ2
• Average of squared deviations of values from the mean
• Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
where μ = population mean
N = population size
Xᵢ = i-th value of the variable X
Population Standard Deviation σ
• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the population variance
• Has the same units as the original data
• Population standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
POPULATION / SAMPLE STANDARD DEVIATION
• The standard deviation is the positive square root of
the variance:
Population standard deviation:
$$\sigma = \sqrt{\sigma^2}$$
Sample standard deviation:
$$S = \sqrt{S^2}$$
Compute the standard deviations of advertising and
sales.
Sample vs population parameters
Measure              Population parameter   Sample statistic
Mean                 μ                      X̄
Variance             σ²                     S²
Standard deviation   σ                      S
Variance and Standard Deviation
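Procedure:
How to compute the sample variance s² and the sample standard deviation s,
where x is a data value, x̄ is the sample mean, and n is the sample size.
Defining formulas:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}, \qquad s = \sqrt{s^2}$$
Computation formulas:
$$s^2 = \frac{\sum x^2 - \left(\sum x\right)^2 / n}{n - 1}, \qquad s = \sqrt{s^2}$$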
In statistics, the sample standard deviation and sample
variance are used to describe the spread of data about the
mean x̄.
The next example shows how to find these quantities by
using the defining formulas.
As you will discover, for “hand” calculations, the
computation formulas for s² and s are much easier to use.
However, the defining formulas for s² and s emphasize the
fact that the variance and standard deviation are based on
the differences between each data value and the mean.
Example – Sample Standard Deviation (Defining Formula)
Compute the variance and standard deviation of the calories.
Standard Deviation
Ex: The wholesale prices of a commodity for seven consecutive days in
a month are as follows:
Day:                              1     2     3     4     5     6     7
Commodity price (per quintal):    240   260   270   245   255   286   264
Calculate the variance and standard deviation.
Answer: variance = 206 (dividing by N = 7), standard deviation = √206 ≈ 14.35.
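The answer can be verified with Python's statistics module. This sketch treats the seven prices as the whole population, which matches the divisor N = 7 that produces the variance of 206.

```python
import statistics

prices = [240, 260, 270, 245, 255, 286, 264]  # price per quintal, days 1 to 7

mu = statistics.mean(prices)            # 1820 / 7
var_pop = statistics.pvariance(prices)  # population variance, divisor N = 7
sd_pop = statistics.pstdev(prices)      # population standard deviation

print(mu, var_pop, round(sd_pop, 2))
```

If the seven days were treated as a sample instead, statistics.variance(prices) would divide by n – 1 = 6 and give 1442/6 ≈ 240.33.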
Example – Sample Standard Deviation (Defining Formula)
Big Blossom Greenhouse was commissioned to develop an
extra large rose for the Rose Bowl Parade.
A random sample of blossoms from Hybrid A bushes
yielded the following diameters (in inches) for mature peak
blooms.
2 3 3 8 10 10
Use the defining formula to find the sample variance and
standard deviation.
Example – Solution
Several steps are involved in computing the variance and
standard deviation. A table will be helpful (see Table).
Diameters of Rose Blossoms (in inches)
Column I    Column II    Column III
x           x – x̄        (x – x̄)²
2           -4           16
3           -3           9
3           -3           9
8           2            4
10          4            16
10          4            16
Σx = 36                  Σ(x – x̄)² = 70
Since n = 6, we take the sum of the entries in Column I of
the table and divide by 6 to find the mean: x̄ = 36/6 = 6.
Using this value for x̄, we obtain Column II. Square each
value in Column II to obtain Column III, and then add the
values in Column III.
To get the sample variance, divide the sum of Column III by
n – 1. Since n = 6, n – 1 = 5, so s² = 70/5 = 14.
Now obtain the sample standard deviation by taking the
square root of the variance: s = √14 ≈ 3.74.
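The column-by-column procedure for the rose blossoms can be mirrored in a short Python sketch. The variable names are illustrative. The sketch prints x, x – x̄ and (x – x̄)² for each blossom, then the sample variance and standard deviation.

```python
data = [2, 3, 3, 8, 10, 10]  # diameters in inches (Column I)
n = len(data)
mean = sum(data) / n         # 36 / 6

print(f"{'x':>4} {'x - mean':>9} {'(x - mean)^2':>13}")
sum_sq = 0.0
for x in data:
    dev = x - mean           # Column II
    sq = dev ** 2            # Column III
    sum_sq += sq
    print(f"{x:>4} {dev:>9.1f} {sq:>13.1f}")

s2 = sum_sq / (n - 1)        # sample variance, divisor n - 1
s = s2 ** 0.5                # sample standard deviation
print(f"sum of Column III = {sum_sq}, s^2 = {s2}, s = {s:.2f}")
```

The sum of Column III is 70, which gives s² = 70/5 = 14 and s ≈ 3.74.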
Variance and Standard Deviation
In most applications of statistics, we work with a random
sample of data rather than the entire population of all
possible data values.
However, if we have data for the entire population, we can
compute the population mean μ, population variance σ²,
and population standard deviation σ (lowercase Greek
letter sigma) using the following formulas:
$$\mu = \frac{\sum x}{N}, \qquad \sigma^2 = \frac{\sum (x - \mu)^2}{N}, \qquad \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
We note that the formula for μ is the same as the formula
for x̄ (the sample mean) and that the formulas for σ² and σ
are the same as those for s² and s (sample variance and
sample standard deviation), except that the population size
N is used instead of n – 1.
Also, μ is used instead of x̄ in the formulas for σ² and σ.
In the formulas for s and σ, we use n – 1 to compute s and
N to compute σ. Why?
The reason is that N (capital letter) represents the
population size, whereas n (lowercase letter) represents
the sample size.
Since a random sample usually will not contain extreme
data values (large or small), we divide by n – 1 in the
formula for s to make s a little larger than it would have
been had we divided by n.
Courses in advanced theoretical statistics show that dividing
by n – 1 makes s² an unbiased estimate of the population
variance σ². (Strictly speaking, s itself is slightly biased as an
estimate of σ, but it is the standard estimate used in practice.)
If we have the population of all data values, then extreme data values
are, of course, present, so we divide by N instead of N – 1.
Comment
The computation formula for the population standard
deviation is
$$\sigma = \sqrt{\frac{\sum x^2 - \left(\sum x\right)^2 / N}{N}}$$
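This minimal sketch implements the computation formula for the population standard deviation, σ = √[(Σx² – (Σx)²/N)/N], and compares it against statistics.pstdev. The helper name pop_sd_computation is hypothetical. The commodity-price data are reused from the earlier example.

```python
import math
import statistics


def pop_sd_computation(xs):
    """Population SD from the computation formula sqrt((sum x^2 - (sum x)^2 / N) / N)."""
    N = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return math.sqrt((sum_x2 - sum_x ** 2 / N) / N)


prices = [240, 260, 270, 245, 255, 286, 264]  # commodity prices per quintal
print(pop_sd_computation(prices), statistics.pstdev(prices))
```

For these prices, both values equal √206 ≈ 14.35.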
Standard Deviation
Example:
For a group of 50 male workers, the mean and standard
deviation of their monthly wages are Rs. 6300 and Rs. 900
respectively. For a group of 40 female workers, these are
Rs. 5400 and Rs. 600, respectively. Find the standard
deviation of monthly wages for the combined group of
workers.
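A common textbook approach combines two groups through σ² = [n₁(σ₁² + d₁²) + n₂(σ₂² + d₂²)]/(n₁ + n₂). Here d₁ and d₂ are the differences between each group mean and the combined mean. The sketch below assumes that formula. It also assumes the given standard deviations were computed with divisor n (population style). The function name combined_mean_sd is hypothetical.

```python
import math


def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean and SD of two groups (divisor-n SDs assumed).

    sigma^2 = [n1*(sd1^2 + d1^2) + n2*(sd2^2 + d2^2)] / (n1 + n2),
    where d_i = mean_i - combined mean.
    """
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    d1 = mean1 - mean
    d2 = mean2 - mean
    var = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / n
    return mean, math.sqrt(var)


# 50 male workers: mean Rs. 6300, SD Rs. 900; 40 female workers: mean Rs. 5400, SD Rs. 600
mean, sd = combined_mean_sd(50, 6300, 900, 40, 5400, 600)
print(f"combined mean = Rs. {mean:.2f}, combined SD = Rs. {sd:.2f}")
```

For the wage data, this gives a combined mean of Rs. 5900 and a combined standard deviation of Rs. 900.

The same function could be applied to the later two-organization wage example. That example has 550 and 600 workers, with standard deviations √100 = 10 and √841 = 29. It gives a combined mean of about Rs. 1306.17 and a standard deviation of about Rs. 49.41.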
Variance / Standard Deviation
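For grouped data, with f the frequency of each value (or class midpoint) x and N = Σf:
$$\bar{x} = \frac{\sum f x}{N}$$
$$\sigma^2 = \frac{\sum f (x - \bar{x})^2}{N}, \qquad \sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{N}}$$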
Standard Deviation
Calculate the standard deviation from the following data:
Marks:             10   20   30   40   50   60   70
No. of students:    6    5   12    3    5    4    5
Standard Deviation
x (marks)   f (students)   fx      x – x̄   (x – x̄)²   f(x – x̄)²
10          6              60      -27     729        4374
20          5              100     -17     289        1445
30          12             360     -7      49         588
40          3              120     3       9          27
50          5              250     13      169        845
60          4              240     23      529        2116
70          5              350     33      1089       5445
Sum         40             1480                       14840
(x̄ = Σfx/Σf = 1480/40 = 37)
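The table can be reproduced in a few lines of Python. This sketch assumes the grouped-data variance is Σf(x – x̄)²/N with N = Σf = 40, a population-style divisor. The variable names are illustrative.

```python
import math

marks = [10, 20, 30, 40, 50, 60, 70]  # x
students = [6, 5, 12, 3, 5, 4, 5]     # f

N = sum(students)
mean = sum(f * x for x, f in zip(marks, students)) / N                 # sum(fx) / N
sum_f_d2 = sum(f * (x - mean) ** 2 for x, f in zip(marks, students))  # sum f(x - mean)^2
variance = sum_f_d2 / N              # divisor N assumed
sd = math.sqrt(variance)
print(N, mean, sum_f_d2, variance, round(sd, 2))
```

Under this assumption, x̄ = 37, σ² = 14840/40 = 371 and σ = √371 ≈ 19.26 marks.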
Standard Deviation
Compute the standard deviation from the following data.
Expenditure (Rs): 50–100 100–150 150–200 200–250 250–300
No. of families: 20 10 30 5 10
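For class intervals, a common approach represents each class by its midpoint and then applies the grouped-data formula. The sketch below assumes that convention and a divisor of N = Σf. The names are illustrative.

```python
import math

classes = [(50, 100), (100, 150), (150, 200), (200, 250), (250, 300)]  # expenditure (Rs)
families = [20, 10, 30, 5, 10]                                          # frequency f

mids = [(lo + hi) / 2 for lo, hi in classes]  # midpoint stands in for every value in a class
N = sum(families)
mean = sum(f * m for m, f in zip(mids, families)) / N
variance = sum(f * (m - mean) ** 2 for m, f in zip(mids, families)) / N  # divisor N assumed
sd = math.sqrt(variance)
print(f"N = {N}, mean = {mean:.2f}, variance = {variance:.2f}, sd = {sd:.2f}")
```

Under these assumptions, the mean is about Rs. 158.33, the variance about 4222.22 and the standard deviation about Rs. 64.98.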
Example – Sample Standard Deviation (Defining Formula)
Ex:
The mean of 5 observations is 15 and the variance is 9. If
two more observations having values -3 and 10 are
combined with these 5 observations, what will be the new
mean and variance of the 7 observations?
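One way to solve this kind of problem is to recover Σx and Σx² from the summary figures, add the new observations, and convert back. The sketch assumes the stated variance uses divisor n, so Σx² = n(variance + mean²). The function name add_observations is hypothetical.

```python
def add_observations(n, mean, variance, new_values):
    """Update mean and variance (divisor n assumed) after appending new observations."""
    sum_x = n * mean
    sum_x2 = n * (variance + mean ** 2)  # from variance = sum_x2/n - mean^2
    for v in new_values:
        sum_x += v
        sum_x2 += v * v
        n += 1
    new_mean = sum_x / n
    new_variance = sum_x2 / n - new_mean ** 2
    return new_mean, new_variance


m, v = add_observations(5, 15, 9, [-3, 10])
print(round(m, 2), round(v, 2))
```

With these assumptions, the new mean is 82/7 ≈ 11.71 and the new variance is 2229/49 ≈ 45.49.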
Standard deviation
Ex: A study of the age of 100 persons grouped into
intervals 20–22, 22–24, 24–26,..., revealed the mean age
and standard deviation to be 32.02 and 13.18, respectively.
While checking, it was discovered that the observation 57
was misread as 27. Calculate the correct mean age and
standard deviation.
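The sketch below recovers Σx and Σx² and swaps the misread value for the correct one. It assumes the reported standard deviation uses divisor n. It treats 27 and 57 as individual observations, as such correction problems usually do. The variable names are illustrative.

```python
import math

n, mean, sd = 100, 32.02, 13.18
wrong, right = 27, 57

sum_x = n * mean                    # recovered sum of ages
sum_x2 = n * (sd ** 2 + mean ** 2)  # recovered sum of squared ages (divisor n assumed)
sum_x += right - wrong              # replace the misread value
sum_x2 += right ** 2 - wrong ** 2

correct_mean = sum_x / n
correct_sd = math.sqrt(sum_x2 / n - correct_mean ** 2)
print(round(correct_mean, 2), round(correct_sd, 2))
```

This gives a corrected mean of 32.32 years and a corrected standard deviation of about 13.40 years.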
Coefficient of Variation
A disadvantage of the standard deviation as a comparative
measure of variation is that it depends on the units of
measurement.
This means that it is difficult to use the standard deviation
to compare measurements from different populations.
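For this reason, statisticians have defined the coefficient of
variation, which expresses the standard deviation as a
percentage of the sample or population mean:
$$CV = \frac{s}{\bar{x}} \cdot 100\% \quad \text{(sample)}, \qquad CV = \frac{\sigma}{\mu} \cdot 100\% \quad \text{(population)}$$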
Notice that the numerator and denominator in the definition
of CV have the same units, so CV itself has no units of
measurement.
This gives us the advantage of being able to directly
compare the variability of two different populations using
the coefficient of variation.
Of two data sets, the one with the lower coefficient of variation
is said to be more uniform (consistent) or more homogeneous
(stable).
Example – Coefficient of Variation
The Trading Post on Grand Mesa is a small, family-run
store in a remote part of Colorado. The Grand Mesa region
contains many good fishing lakes, so the Trading Post sells
spinners (a type of fishing lure).
The store has a very limited selection of spinners. In fact,
the Trading Post has only eight different types of spinners
for sale. The prices (in dollars) are
2.10 1.95 2.60 2.00 1.85 2.25 2.15 2.25
Since the Trading Post has only eight different kinds of
spinners for sale, we consider the eight data values to be
the population.
(a) Use a calculator with appropriate statistics keys to verify
that for the Trading Post data, μ ≈ $2.14 and σ ≈ $0.22.
Solution:
Since the computation formulas for x̄ and μ are identical,
most calculators provide the value of x̄ only.
Use the output of this key for μ. The computation formulas
for the sample standard deviation s and the population
standard deviation σ are slightly different.
Be sure that you use the key for σ (sometimes designated
as σn or σx).
(b) Compute the CV of prices for the Trading Post and
comment on the meaning of the result.
Solution:
$$CV = \frac{\sigma}{\mu} \cdot 100\% \approx \frac{0.22}{2.14} \cdot 100\% \approx 10.28\%$$
Interpretation
The coefficient of variation can be thought of as a measure
of the spread of the data relative to the average of the data.
Since the Trading Post is very small, it carries a small
selection of spinners that are all priced similarly.
The CV tells us that the standard deviation of the spinner
prices is only 10.28% of the mean.
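The calculator check in part (a) and the CV in part (b) can be sketched with Python's statistics module. The sketch treats the eight prices as the population, so it uses statistics.pstdev. It also shows that the 10.28% figure comes from the rounded values $0.22 and $2.14.

```python
import statistics

prices = [2.10, 1.95, 2.60, 2.00, 1.85, 2.25, 2.15, 2.25]  # all eight spinner types = the population

mu = statistics.mean(prices)
sigma = statistics.pstdev(prices)  # population SD, divisor N = 8
cv = sigma / mu * 100              # coefficient of variation in percent
cv_rounded = 0.22 / 2.14 * 100     # same ratio using the rounded values from part (a)

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}, CV = {cv:.2f}%")
print(f"CV from rounded values = {cv_rounded:.2f}%")
```

With unrounded values, the CV is about 10.13%. The small difference from 10.28% comes only from rounding μ and σ before dividing.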
Example
Ex: The weekly sales of two products A and B were
recorded as given below:
Product A : 59 75 27 63 27 28 56
Product B : 150 200 125 310 330 250 225
Find out which of the two shows greater fluctuation in
sales.
Solution: To compare the fluctuation in sales of the two
products, we calculate the coefficient of variation for
both products.
Product A: Let A = 56 be the assumed mean of sales for
product A
Product B: Let A = 225 be the assumed mean of sales for product B
Since the coefficient of variation for product A is greater than that of
product B, the sales fluctuation is higher for product A.
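The assumed-mean (shortcut) method used for products A and B can be sketched as follows. The sketch assumes divisor n for the variance, and the function name cv_assumed_mean is hypothetical. Because the CV is a ratio, using n – 1 instead would scale both standard deviations by the same factor. It would not change which product fluctuates more.

```python
import math


def cv_assumed_mean(values, assumed):
    """Return (mean, sd, CV %) using deviations d = x - A from an assumed mean A.

    mean = A + sum(d)/n and variance = sum(d^2)/n - (sum(d)/n)^2 (divisor n assumed).
    """
    n = len(values)
    d = [x - assumed for x in values]
    mean_d = sum(d) / n
    mean = assumed + mean_d
    sd = math.sqrt(sum(di * di for di in d) / n - mean_d ** 2)
    return mean, sd, sd / mean * 100


sales = {
    "A": ([59, 75, 27, 63, 27, 28, 56], 56),         # (weekly sales, assumed mean A)
    "B": ([150, 200, 125, 310, 330, 250, 225], 225),
}

results = {name: cv_assumed_mean(data, a) for name, (data, a) in sales.items()}
for name, (mean, sd, cv) in results.items():
    print(f"Product {name}: mean = {mean:.2f}, sd = {sd:.2f}, CV = {cv:.2f}%")
```

This gives a CV of about 38.86% for product A and about 31.17% for product B.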
Example
From the analysis of monthly wages paid to employees in two service
organizations X and Y, the following results were obtained:
(a) Which organization pays a larger amount as monthly wages?
(b) In which organization X or Y, is there greater variability in individual
wages?
(c) What are the measures of (i) average monthly wages and (ii)
standard deviation in the distribution of individual wages of all workers
in two organizations taken together?
Solution: (a) For finding out which organization X or Y pays larger
amount of monthly wages, we have to compare the total wages:
(a)
Organization Y pays a larger amount as monthly wages as
compared to organization X
(b)
Since CV for X is greater than CV for Y, organization X has greater
variability in individual wages.
(c)
Example – Solution
From the analysis of monthly wages paid to workers
in two organizations X and Y, the following results
were obtained:
Obtain the average wages and the variability in individual
wages of all the workers in the two organizations taken
together.
Solution:
Example
The number of employees, average daily wages per
employee, and the variance of daily wages per employee
for two factories are given below:
(a) In which factory is there greater variation in the distribution of daily
wages per employee?
(b) Suppose in Factory B the wages of an employee were wrongly
noted as Rs. 120 instead of Rs. 100. What would be the correct
variance for Factory B?
Example
32 trials of a process to finish a certain job revealed the
following information:
Mean time taken to complete the job = 80 minutes
Standard deviation = 16 minutes
Another set of 8 trials gave mean time as 100 minutes and
standard deviation equal to 25 minutes.
Find the combined mean and standard deviation.
Solution:
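This minimal sketch shows one way to combine the two sets of trials. It rebuilds Σx and Σx² for each set from its size, mean and standard deviation, pools them, and converts back. It assumes the standard deviations use divisor n. The variable names are illustrative.

```python
import math

groups = [(32, 80, 16), (8, 100, 25)]  # (number of trials, mean minutes, SD minutes)

total_n = sum(n for n, _, _ in groups)
total_sum = sum(n * m for n, m, _ in groups)                     # sum of x over all trials
total_sum_sq = sum(n * (s ** 2 + m ** 2) for n, m, s in groups)  # sum of x^2 (divisor-n SDs assumed)

mean = total_sum / total_n
sd = math.sqrt(total_sum_sq / total_n - mean ** 2)
print(f"combined mean = {mean:.2f} min, combined SD = {sd:.2f} min")
```

This gives a combined mean of 84 minutes and a combined standard deviation of √393.8 ≈ 19.84 minutes.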
Example
An analysis of production rejects resulted in the following
observations
Calculate the mean and standard deviation.
Solution:
Example
From the analysis of monthly wages paid to employees in
two service organizations X and Y, the following results
were obtained:
(a) Which organization pays a larger amount as monthly
wages?
(b) In which organization is there greater variability in
individual wages of all the wage earners taken together?
Example
From the analysis of monthly wages paid to workers in two
organizations X and Y, the following results were obtained:
                                             X        Y
Number of wage earners:                      550      600
Average monthly wages (Rs.):                 1260     1348.5
Variance of distribution of wages (Rs.²):    100      841
Obtain the average wages and the variability in individual
wages of all the workers in the two organizations taken
together.
Example
The following set of data is from a sample of n = 5:
7 4 9 8 2
a. Compute the mean, median, and mode.
b. Compute the range, variance, standard deviation, and
coefficient of variation.
The following set of data is from a sample of n = 5:
7 -5 -8 7 9
a. Compute the mean, median, and mode.
b. Compute the range, variance, standard deviation, and
coefficient of variation.
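Answers to both exercises can be checked with Python's statistics module (statistics.multimode needs Python 3.8 or later). The sketch treats the data as samples, so statistics.variance and statistics.stdev divide by n – 1. It reports no mode when every value occurs equally often. The dictionary and variable names are illustrative.

```python
import statistics

samples = {
    "first": [7, 4, 9, 8, 2],
    "second": [7, -5, -8, 7, 9],
}

results = {}
for name, xs in samples.items():
    modes = statistics.multimode(xs)      # all values tied for the highest count
    no_mode = len(modes) == len(set(xs))  # every value occurs equally often
    mean = statistics.mean(xs)
    s = statistics.stdev(xs)              # sample SD, divisor n - 1
    results[name] = {
        "mean": mean,
        "median": statistics.median(xs),
        "mode": None if no_mode else modes,
        "range": max(xs) - min(xs),
        "variance": statistics.variance(xs),  # sample variance, divisor n - 1
        "sd": s,
        "cv_percent": s / mean * 100,
    }
    print(name, results[name])
```

For the first sample, this gives:
• mean 6, median 7, no mode, range 7
• s² = 8.5, s ≈ 2.92, CV ≈ 48.6%

For the second sample, it gives:
• mean 2, median 7, mode 7, range 17
• s² = 62, s ≈ 7.87, CV ≈ 393.7%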