Chapter 1 Notes

Uploaded by

chaopuishan127

Associate Degree 2020 – 2021 First Semester

CCMA4001 Quantitative Analysis I

Chapter 1
Summarizing and Describing Data

1.1 Summarizing Data

In order to visualize the distribution of a set of raw data, we ought to compile the data into a
more comprehensible form, making use of tables and graphs.

A. Frequency Tables

Given a set of raw data we usually arrange it into a frequency distribution where we collect
‘like’ quantities and display them by writing down how many of each type there are to form
a frequency table.

Example 1

In a multiple-choice test with 10 questions, the numbers of correct answers of 40 students are as follows.

10 4 9 6 7 4 8 7 7 5
5 8 7 9 10 6 5 9 7 6
4 7 5 6 7 9 5 8 8 4
8 7 7 5 5 4 8 6 6 6

Construct a frequency table for these data.

Solution:

Number of correct answers    Tally          Frequency
            4                /////               5
            5                ///// //            7
            6                ///// //            7
            7                ///// ////          9
            8                ///// /             6
            9                ////                4
           10                //                  2
                             Total:             40
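
The counting step in this solution can also be checked by machine. The following is a minimal sketch, assuming Python 3 with only the standard library; the variable names `scores` and `frequency` are illustrative and do not come from the notes.

```python
from collections import Counter

# Numbers of correct answers of the 40 students in Example 1.
scores = [
    10, 4, 9, 6, 7, 4, 8, 7, 7, 5,
    5, 8, 7, 9, 10, 6, 5, 9, 7, 6,
    4, 7, 5, 6, 7, 9, 5, 8, 8, 4,
    8, 7, 7, 5, 5, 4, 8, 6, 6, 6,
]

# Counter collects 'like' values and counts how many of each there are.
frequency = Counter(scores)

print("Correct answers  Frequency")
for value in sorted(frequency):
    print(f"{value:>15}  {frequency[value]:>9}")
print(f"{'Total':>15}  {sum(frequency.values()):>9}")
```

The printed frequencies should agree with the table above, and the total should equal the number of students.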

B. Bar Chart

Example 2

Using the frequency table constructed in Example 1, draw a bar chart for the distribution of the
number of correct answers of 40 students in the multiple-choice test.

Solution:

[Figure: bar chart titled "Distribution of the number of correct answers of 40 students in the multiple-choice test". Horizontal axis: Number of correct answers (4 to 10). Vertical axis: Frequency (Number of students), from 0 to 10.]

C. Stem-and-leaf diagrams

A very useful graphical representation of a frequency distribution is the stem-and-leaf diagram
(or stemplot).

The stem-and-leaf diagram combines a graphical technique with a sorting technique.
Sorting means listing the data in rank order according to numerical value.
The data values themselves are used to do this sorting.
The “stem” is the leading digit(s) of the data, while the “leaf” is the trailing digit.

For example, the numerical data 386 might be split 38 – 6 as shown:

 Leading digits       Trailing digit
       38                   6
(Used in sorting)    (Shown in display)

A stem-and-leaf diagram is a method of presenting a data set so that gaps or concentrations in the
data become visible.

Example 3

Suppose that a class of 40 students obtained the following results in a Mathematics test.

61 80 55 70 76 73 100 90 64 62
75 64 62 66 46 61 67 39 58 63
63 64 51 40 66 43 38 37 28 71
70 49 48 68 86 27 69 74 37 56

Construct a stem-and-leaf diagram for these data.

Solution:

  Stem  | Leaf
 (Tens) | (Units)
    2   | 78
    3   | 7789
    4   | 03689
    5   | 1568
    6   | 11223344466789
    7   | 0013456
    8   | 06
    9   | 0
   10   | 0
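
The construction used in this solution (sort the data, split each value into tens and units, and list the units beside their tens) can be sketched in code. This is a minimal illustration, assuming Python 3 and whole-number data; the function name `stem_and_leaf` is hypothetical and not part of the notes.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens) to a string of its sorted leaves (units)."""
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 64 -> stem 6, leaf 4
        leaves[stem].append(str(leaf))
    lowest, highest = min(leaves), max(leaves)
    # Keep empty stems so that gaps in the data stay visible.
    return {stem: "".join(leaves[stem]) for stem in range(lowest, highest + 1)}


# Mathematics test results of the 40 students in Example 3.
marks = [
    61, 80, 55, 70, 76, 73, 100, 90, 64, 62,
    75, 64, 62, 66, 46, 61, 67, 39, 58, 63,
    63, 64, 51, 40, 66, 43, 38, 37, 28, 71,
    70, 49, 48, 68, 86, 27, 69, 74, 37, 56,
]

diagram = stem_and_leaf(marks)
for stem, leaf_string in diagram.items():
    print(f"{stem:>2} | {leaf_string}")
```

Because the values are sorted before they are split, each row of leaves comes out in ascending order, which is what makes the diagram usable as an ordered array.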

Advantages of a stem-and-leaf diagram

1. It is easy to construct. In fact, it is no more difficult to construct than a frequency table.

2. It is actually partly a table and partly a graph and so it immediately and directly gives a good
picture of the frequency distribution without having to prepare a frequency table first and then
construct charts afterwards.

3. Since the actual data are recorded in the diagram, it retains the information about the original data,
and the information may be recovered readily.

In a frequency table or histogram, data are represented by tallies or areas of rectangles in class intervals
and so some information about the original data is lost and cannot be recovered.
For example, the reading 64 is recorded in its entirety in a stem-and-leaf diagram, but is represented
only by a count of 1 in the class interval (e.g. 60 – 64) in a frequency table or histogram.

4. It can be regarded as the original set of data arranged in ascending order of magnitude.
Hence it can be readily used for finding quartiles.

Disadvantages of a stem-and-leaf diagram

1. For some types of data, the number of stems that can be chosen is either very small or very large,
thus making the diagram inconvenient to construct and unable to show the distribution effectively.

2. It is not well suited to large sets of data.

Actually, for a large set of data, the purpose of graphical representation is to give a good overall
picture of the distribution rather than to show the details of the data.
A bar chart or a histogram is more suitable in this case.

Example 4

A fishery expert found the following concentrations of mercury, in parts per million, in thirty fish caught
in a certain stream.

0.024 0.031 0.052 0.024 0.024 0.030 0.056 0.034 0.059 0.068
0.035 0.021 0.052 0.023 0.054 0.028 0.037 0.034 0.048 0.040
0.022 0.049 0.043 0.034 0.032 0.021 0.040 0.032 0.021 0.039

Construct a double-stem diagram for these data.

Solution:

     Stem      | Leaf
 (Unit = 0.01) | (Unit = 0.001)
       2       | 11123444
       2       | 8
       3       | 0122444
       3       | 579
       4       | 003
       4       | 89
       5       | 224
       5       | 69
       6       |
       6       | 8

In the above diagram, the units of the stems and leaves have been chosen to make the recorded digits simple.
This is an important feature of a stem-and-leaf diagram.
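
A double-stem diagram lists every stem twice: once for leaves 0 to 4 and once for leaves 5 to 9. The sketch below illustrates that splitting rule for the mercury data of this example. It assumes Python 3 and readings given to three decimal places; the function name `double_stem` is hypothetical.

```python
def double_stem(values):
    """Return (stem, leaves) rows; each stem appears twice: leaves 0-4, then 5-9."""
    # Convert to whole thousandths first so digits are not lost to float rounding.
    thousandths = sorted(round(v * 1000) for v in values)
    rows = {}
    for t in thousandths:
        stem, leaf = divmod(t, 10)          # stem unit 0.01, leaf unit 0.001
        half = 0 if leaf <= 4 else 1        # 0 = first copy of the stem, 1 = second
        rows.setdefault((stem, half), []).append(str(leaf))
    first, last = min(rows), max(rows)
    result = []
    for stem in range(first[0], last[0] + 1):
        for half in (0, 1):
            if first <= (stem, half) <= last:
                # Empty rows are kept so that gaps remain visible.
                result.append((stem, "".join(rows.get((stem, half), []))))
    return result


mercury = [
    0.024, 0.031, 0.052, 0.024, 0.024, 0.030, 0.056, 0.034, 0.059, 0.068,
    0.035, 0.021, 0.052, 0.023, 0.054, 0.028, 0.037, 0.034, 0.048, 0.040,
    0.022, 0.049, 0.043, 0.034, 0.032, 0.021, 0.040, 0.032, 0.021, 0.039,
]

table = double_stem(mercury)
for stem, leaf_string in table:
    print(f"{stem} | {leaf_string}")
```

Working in integer thousandths is a design choice of this sketch: it mirrors the choice of units in the diagram and avoids reading digits off binary floating-point values.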

1.2 Statistical Descriptions

In statistics, there are two useful types of measure which characterize any set of data or
frequency distribution.

The first type, a measure of ‘centralization’, attempts to locate a typical value about which the
distribution clusters. This type of measure is called an average or measure of central tendency
or measure of location.

The second type is a measure of how scattered or spread out a distribution is and is called
a measure of dispersion.

In the figures shown,
(a) shows two distributions with different measures of central tendency but roughly the same spread,
(b) illustrates two distributions with the same measure of central tendency but different spreads.

[Figure: panels (a) and (b)]

I. Measures of Central Tendency

The most common measures of central tendency or average are the mean, the median and the mode.

A. Mean (Arithmetic Mean)

Given the complete set of N data $\{x_1, x_2, \ldots, x_N\}$ in a population, the mean $\mu$ is defined as

\[
\mu = \frac{1}{N}(x_1 + x_2 + \cdots + x_N) \quad \text{or} \quad \mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\]

The population mean is usually denoted by the Greek letter $\mu$ (pronounced "mu").

If the set of n data $\{x_{p_1}, x_{p_2}, \ldots, x_{p_n}\}$, where the $p_i$’s are a set of integers selected from 1 to N,
is a sample of size n drawn from a population, then the sample mean is defined similarly,
but is denoted by $\bar{x}$ (read as "x bar"). Thus

\[
\bar{x} = \frac{1}{n}(x_{p_1} + x_{p_2} + \cdots + x_{p_n}) \quad \text{or} \quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_{p_i}
\]

The notation $x_{p_i}$ for the elements of the sample may be a bit difficult for beginners.

Hence, when no misunderstanding arises, we shall denote the sample of size n simply as
$\{x_1, x_2, \ldots, x_n\}$, bearing in mind that the element $x_i$ in the sample is, in general,
not the same as the element $x_i$ in the population.

With this understanding, the sample mean is

\[
\bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n) \quad \text{or} \quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\]

Example 4

Suppose that a class of 40 students obtained the following results in a Mathematics test.

61 80 55 70 76 73 100 90 64 62
75 64 62 66 46 61 67 39 58 63
63 64 51 40 66 43 38 37 28 71
70 49 48 68 86 27 69 74 37 56

(a) Find the mean of the population of Mathematics test marks.
(b) The following two samples have each been drawn randomly from the population of Mathematics
test marks.
$S_1 = \{70, 43, 28, 69, 75, 90\}$
$S_2 = \{68, 62, 48, 39, 38, 55, 66, 71, 37, 76\}$
Find the means of these samples.
(c) Find the mean of the sample $S_3$ formed by combining the samples $S_1$ and $S_2$.

Solution:

(a) The population mean is

\[
\begin{aligned}
\mu &= \frac{1}{40}(61 + 80 + 55 + 70 + 76 + 73 + 100 + 90 + 64 + 62 + 75 + 64 + 62 + 66 + 46 + 61 + 67 + 39 + 58 + 63 \\
&\qquad + 63 + 64 + 51 + 40 + 66 + 43 + 38 + 37 + 28 + 71 + 70 + 49 + 48 + 68 + 86 + 27 + 69 + 74 + 37 + 56) \\
&= 60.425
\end{aligned}
\]

(b) The sample mean of $S_1$ is

\[
\bar{x}_1 = \frac{1}{6}(70 + 43 + 28 + 69 + 75 + 90) = 62.5
\]

The sample mean of $S_2$ is

\[
\bar{x}_2 = \frac{1}{10}(68 + 62 + 48 + 39 + 38 + 55 + 66 + 71 + 37 + 76) = 56
\]

Note that a population mean is a unique value, but the sample mean varies from sample to sample.

(c) The sample mean of $S_3$ is

\[
\bar{x}_3 = \frac{1}{16}(70 + 43 + 28 + 69 + 75 + 90 + 68 + 62 + 48 + 39 + 38 + 55 + 66 + 71 + 37 + 76) = 58.4375
\]

or

\[
\bar{x}_3 = \frac{62.5 \times 6 + 56 \times 10}{6 + 10} = 58.4375
\]
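
The arithmetic in this example can be reproduced with a short script. This is a minimal sketch, assuming Python 3 and its standard `statistics` module; the variable names are illustrative. It also checks that the mean of the combined sample equals the size-weighted average of the two sample means, as in part (c).

```python
from statistics import mean

# Population of Mathematics test marks and the two samples from this example.
marks = [
    61, 80, 55, 70, 76, 73, 100, 90, 64, 62,
    75, 64, 62, 66, 46, 61, 67, 39, 58, 63,
    63, 64, 51, 40, 66, 43, 38, 37, 28, 71,
    70, 49, 48, 68, 86, 27, 69, 74, 37, 56,
]
s1 = [70, 43, 28, 69, 75, 90]
s2 = [68, 62, 48, 39, 38, 55, 66, 71, 37, 76]

population_mean = mean(marks)
mean_s1 = mean(s1)
mean_s2 = mean(s2)

# Part (c): direct mean of the combined sample ...
mean_s3_direct = mean(s1 + s2)
# ... and the same value from the two sample means weighted by sample size.
mean_s3_weighted = (mean_s1 * len(s1) + mean_s2 * len(s2)) / (len(s1) + len(s2))

print(population_mean, mean_s1, mean_s2, mean_s3_direct, mean_s3_weighted)
```

The weighted form is useful when only the sample sizes and sample means are available and the raw data are not.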

B. Median

The median is a measure of position. It is the middle value in an ordered sequence of data.

To find the median from a set of data collected in its raw form, we must first arrange the data
in rank order, from the smallest to the largest observation. Such an ordered sequence of data is
called an ordered array.

For a set of discrete data $x_1, x_2, \ldots, x_n$ arranged in ascending order,

(i) if n is odd, the median is $x_{\frac{n+1}{2}}$, the value of the datum that is in the middle;

(ii) if n is even, the median is $\frac{1}{2}\left(x_{\frac{n}{2}} + x_{\frac{n}{2}+1}\right)$, the mean of the two data that are
nearest to the middle.

Example 5

(a) Find the median of the set of data {12, 8, 13, 16, 5}.
(b) Find the median of the set of data {25, 25, 37, 26, 25, 12, 75, 75}.

Solution:

(a) Arranging the five data in ascending order gives 5, 8, 12, 13, 16, so the median is $x_{\frac{5+1}{2}} = x_3 = 12$.

(b) Arranging the eight data in ascending order gives 12, 25, 25, 25, 26, 37, 75, 75, so the median is

\[
\frac{1}{2}\left(x_{\frac{8}{2}} + x_{\frac{8}{2}+1}\right) = \frac{1}{2}(x_4 + x_5) = \frac{1}{2}(25 + 26) = 25.5
\]
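
The two cases of the definition (odd n and even n) translate directly into code. The following is a minimal sketch, assuming Python 3; the function name `median` is chosen here for illustration, and the only adjustment is that Python lists are indexed from 0 while the notes count positions from 1.

```python
def median(data):
    """Median of a list of numbers, following the odd/even rule in these notes."""
    ordered = sorted(data)          # the ordered array
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # n odd: position (n + 1) / 2 in 1-based counting is index n // 2.
        return ordered[middle]
    # n even: mean of positions n / 2 and n / 2 + 1 (indices middle - 1 and middle).
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([12, 8, 13, 16, 5]))                    # Example 5(a)
print(median([25, 25, 37, 26, 25, 12, 75, 75]))      # Example 5(b)
```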

C. Mode

The mode of a set of data is the value that occurs with the highest frequency.
In this sense it is “most typical” of a set of data.

For example, for the data 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6, the mode is 2.

A distribution with one mode is called a unimodal distribution, while those with two modes are
bimodal, and with three or more are multimodal.

The two main advantages of the mode are that it requires no calculations, only counting, and that
it can be determined for qualitative as well as quantitative data.
However, if all the values in the set of data are different, every value occurs exactly once and
the mode gives no useful information.

Example 6

Suppose that 50 children are asked which of the six brands of soft drink they prefer most
and the following results are obtained.

Brand                  A    B    C    D    E    F
Number of children     4   15    5    8    3   15

Find the mode of these data.

Solution:

There are two modes in this set, namely, B and F.
This set is said to be bimodal.
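
Finding the mode only requires counting, which the sketch below illustrates for both kinds of data mentioned above. It assumes Python 3.8 or later, where `statistics.multimode` is available; the dictionary name `preferences` is illustrative.

```python
from statistics import multimode

# Qualitative data from Example 6: brand -> number of children preferring it.
preferences = {"A": 4, "B": 15, "C": 5, "D": 8, "E": 3, "F": 15}

# The modes are the brands that share the highest frequency.
highest = max(preferences.values())
modes = [brand for brand, count in preferences.items() if count == highest]
print(modes)

# Quantitative raw data: multimode returns every most frequent value.
print(multimode([1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6]))
```

Returning a list of modes, rather than a single value, keeps bimodal and multimodal data sets from being misreported as unimodal.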

II. Measures of Dispersion

The measures of central tendency can provide only brief information on a set of data.
The averages alone cannot tell us how spread out or dispersed the data are.
We need a measure of dispersion: a numerical value indicating the amount of scatter about
a central point.

Widely dispersed data are also highly variable data. Hence measures of dispersion are also called
measures of variability.

The most common measures of dispersion in statistics are the range, the inter-quartile range,
the variance and the standard deviation.

A. Range

The range of a set of data is the difference between the largest value and the smallest value of the set.

In general, the greater the range, the greater the dispersion of the set of data.

Example 7

Find the ranges of the scores of athletes A and B in Example 11.

Solution:

The range of scores of athlete A = 9.5 – 6.0 = 3.5
The range of scores of athlete B = 8.0 – 7.0 = 1.0

Since the range of scores of athlete A is greater than that of athlete B, we say that the scores of
athlete A are more dispersed than those of athlete B.

B. Inter-quartile range

With the set of data arranged in ascending order, the median is the value which divides the set of
data into two equal parts.
Similarly, if we divide the set of data into four equal parts, the corresponding values, denoted by
$Q_1$, $Q_2$, $Q_3$, are called the first, second and third quartiles respectively.
$Q_2$ is just the median of the distribution.

The inter-quartile range (IQR) of a set of data is defined as $Q_3 - Q_1$.
It measures the spread of the middle one-half of the values of the data set, that is, the width of
the interval from $Q_1$ to $Q_3$.

When the set of data is divided into 100 equal parts, the dividing values are called percentiles and
are denoted by $P_1, P_2, \ldots, P_{99}$.
The 50th percentile, $P_{50}$, corresponds to the median,
whereas $P_{25}$ and $P_{75}$ correspond to $Q_1$ and $Q_3$ respectively.

The pth percentile of a data set is a value such that at least p percent of the items take on this value or less
and at least (100 – p) percent of the items take on this value or more.

$Q_1$ is the first quartile (or lower quartile), where 25% of the data lie below it;
$Q_2$ is the second quartile (or middle quartile or median), where 50% of the data lie below it; and
$Q_3$ is the third quartile (or upper quartile), where 75% of the data lie below it.

To find the pth percentile, first arrange the set of discrete data $x_1, x_2, \ldots, x_n$ in ascending order,
then compute the index i, where

\[
i = \frac{p}{100} \times n
\]

to find the position of the pth percentile.

If i is not an integer, round up to the next integer. The pth percentile is the value in that position.
If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
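
The positional rule above can be sketched in code. This is a minimal illustration, assuming Python 3, a whole-number p strictly between 0 and 100, and the rule exactly as stated in these notes (other textbooks and software packages use different percentile conventions and may give slightly different values). The function name `percentile` is hypothetical, and the two data sets are those of Example 8.

```python
def percentile(data, p):
    """pth percentile using the rule i = (p / 100) * n from these notes."""
    if not 0 < p < 100:
        raise ValueError("p must lie strictly between 0 and 100")
    ordered = sorted(data)
    # Integer arithmetic: i = whole + remainder / 100, so no float rounding occurs.
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder:
        # i is not an integer: round up to position whole + 1 (index whole).
        return ordered[whole]
    # i is an integer: average the values in positions i and i + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2


set_a = [14, 23, 16, 18, 15, 44, 19]
set_b = [10, 15, 40, 28, 34, 18, 24, 30]

iqr_a = percentile(set_a, 75) - percentile(set_a, 25)
iqr_b = percentile(set_b, 75) - percentile(set_b, 25)
print(iqr_a, iqr_b)
```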

Example 8

(a) Find the inter-quartile range of the data set A {14, 23, 16, 18, 15, 44, 19}.
(b) Find the inter-quartile range of the data set B {10, 15, 40, 28, 34, 18, 24, 30}.
(c) By comparing the inter-quartile range of the data sets A and B, which set has a greater dispersion?

Solution:

(a) Arranging the seven data of data set A in ascending order gives 14, 15, 16, 18, 19, 23, 44.
For the 25th percentile, the index is $i = \frac{25}{100} \times 7 = 1.75$, which rounds up to 2,
hence $Q_1 = x_2 = 15$.
For the 75th percentile, the index is $i = \frac{75}{100} \times 7 = 5.25$, which rounds up to 6,
hence $Q_3 = x_6 = 23$.
The inter-quartile range = $Q_3 - Q_1$ = 23 – 15 = 8

(b) Arranging the eight data of data set B in ascending order gives 10, 15, 18, 24, 28, 30, 34, 40.
For the 25th percentile, the index is $i = \frac{25}{100} \times 8 = 2$,
hence $Q_1 = \frac{1}{2}(x_2 + x_3) = \frac{1}{2}(15 + 18) = 16.5$.
For the 75th percentile, the index is $i = \frac{75}{100} \times 8 = 6$,
hence $Q_3 = \frac{1}{2}(x_6 + x_7) = \frac{1}{2}(30 + 34) = 32$.
The inter-quartile range = $Q_3 - Q_1$ = 32 – 16.5 = 15.5

(c) The ranges of data sets A and B are both 30.
However, the inter-quartile range of data set A is less than that of data set B,
so data set B has the greater dispersion.

The range considers the difference between the maximum and minimum values of a set of data.
The inter-quartile range considers the range of 50% of the data in the middle and thus avoids the
impact of extreme values.

Therefore if there are extreme values in a set of data, the inter-quartile range is a better measure of
dispersion than the range.

Moreover, the inter-quartile range exists even if the set of data has open ends.
Box-and-Whisker Diagram

The median, the lower quartile and the upper quartile together with the maximum and the minimum
values provide a good description of a set of data as they indicate some of the most important
characteristics of the set. These five key descriptive statistical measures are often called the
five-number summary of the set of data. A graphical display of these measures, called a
box-and-whisker diagram or a box plot, gives an even better visual impression of the set.

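[Figure: box-and-whisker diagram marking the Minimum, Q1, Q2 (median), Q3 and Maximum. The lower 25% of the data lie between the Minimum and Q1, the middle 50% between Q1 and Q3, and the upper 25% between Q3 and the Maximum. The box spans the IQR and the whole diagram spans the Range.]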

A box-and-whisker diagram consists of a rectangular box drawn with its length parallel to the x-axis
and with its ends marking the position of the lower and the upper quartiles. An orange bar is then
inserted in the box to mark the median. The two extreme values, the minimum and the maximum
values of the data, are linked to the box by lines, called whiskers, parallel to the x-axis.

A glance at the diagram then gives us good information about the central tendency, dispersion and
extreme values of the set.
(1) The bar at the median shows the location of the centre of the data.
(2) The length of the box, which equals the inter-quartile range, shows the dispersion of the
middle 50% of the data.
(3) The lengths of the whiskers show the dispersion of the data below the lower quartile and
above the upper quartile, and so describe the behavior at the ends or tails of the distribution.
(4) The shape of the diagram gives us a quick impression of the degree of symmetry of the data
distribution about the median.

It is easy to use box-and-whisker diagrams to compare the features, such as location of centre,
dispersion and symmetry of different sets of data. However, a box-and-whisker diagram does not
reveal the total frequency of each set of data, nor the frequency of the data for any specific range.
If such information is required, a stem-and-leaf diagram, bar chart or histogram can be used.
Box-and-whisker diagrams are particularly useful for comparing the central tendency and
the dispersion of two or more sets of data.

Example 9

The following box-and-whisker diagrams show the distributions of marks in the Chinese, English and
Mathematics tests.

(a) Which test has the marks with the largest inter-quartile range?
(b) Which test has the marks with the smallest range?
(c) Which test has the highest median mark?
(d) If Mary gets 70 marks in all three tests, in which test does she perform the best?
Briefly explain your answer.

Solution:

(a) Since the box for the Mathematics test is the longest, the Mathematics test has the marks with
the largest inter-quartile range.

(b) Since the distance between the two ends of the whiskers is the shortest for the Chinese test,
the Chinese test has the marks with the smallest range.

(c) Since the orange bar in the box for the Mathematics test is at the rightmost position, the median mark
of the Mathematics test is the highest.

(d) From the box-and-whisker diagrams above, Mary’s mark in the English test is in the top
25% of the class, while her marks in the Mathematics and Chinese tests are not.
Hence Mary performs best in the English test.

Skewness of Distributions

A distribution can have many different shapes. It may be symmetric or skewed.

A distribution is symmetric if the parts above and below its centre are mirror images.
In terms of the quartiles, the distribution is regarded as symmetric if $Q_2 - Q_1 = Q_3 - Q_2$.

[Figure: box-and-whisker diagram of a symmetric distribution, marking Min, Q1, Q2, Q3 and Max]

A distribution is skewed to the right if the right side is longer, while it is skewed to the left if the left
side is longer.

A positively skewed (or right-skewed) distribution is an asymmetric distribution with a “tail” on the right,
which indicates the presence of extreme values at the positive end of the distribution.
A distribution is positively skewed if $Q_2 - Q_1 < Q_3 - Q_2$.

[Figure: box-and-whisker diagram with a long tail to the right, marking Min, Q1, Q2, Q3 and Max]

A negatively skewed (or left-skewed) distribution is an asymmetric distribution with a “tail” on the left,
which indicates the presence of extreme values at the negative end of the distribution.
A distribution is negatively skewed if $Q_2 - Q_1 > Q_3 - Q_2$.

[Figure: box-and-whisker diagram with a long tail to the left, marking Min, Q1, Q2, Q3 and Max]

Example 10

Use the stem-and-leaf diagram constructed in Example 3 for the distribution of results of the class
of 40 students in the Mathematics test to answer the questions below.

  Stem  | Leaf
 (Tens) | (Units)
    2   | 78
    3   | 7789
    4   | 03689
    5   | 1568
    6   | 11223344466789
    7   | 0013456
    8   | 06
    9   | 0
   10   | 0

(a) Find the median, the first and the third quartiles.
(b) Construct the box-and-whisker diagram.
(c) Use the quartiles to comment on the skewness of the distribution.

Solution:

(a) The median is

\[
\frac{1}{2}\left(x_{\frac{40}{2}} + x_{\frac{40}{2}+1}\right) = \frac{1}{2}(x_{20} + x_{21}) = \frac{1}{2}(63 + 63) = 63
\]

For the 25th percentile, the index is $i = \frac{25}{100} \times 40 = 10$, hence $Q_1 = \frac{1}{2}(x_{10} + x_{11}) = \frac{1}{2}(48 + 49) = 48.5$.

For the 75th percentile, the index is $i = \frac{75}{100} \times 40 = 30$, hence $Q_3 = \frac{1}{2}(x_{30} + x_{31}) = \frac{1}{2}(70 + 70) = 70$.

(b) [Figure: box-and-whisker diagram on an axis from 25 to 100, with minimum 27, $Q_1$ = 48.5, median 63, $Q_3$ = 70 and maximum 100]

(c) $Q_2 - Q_1 = 63 - 48.5 = 14.5$ and $Q_3 - Q_2 = 70 - 63 = 7$.
Since $Q_2 - Q_1 > Q_3 - Q_2$, the distribution is negatively skewed (left-skewed).

Example 11

The table below gives the monthly salaries in dollars of 25 employees of a certain department.

 7800   11900   12700   10400   20200
 6200    7300    9200   15500   17900
 9700    9500   10500   13300   10200
 9900   14200    8900    8700   16600
 7400    6600    9600    6100    8200

(a) Construct a stem-and-leaf diagram for the data.
(b) Find the mean.
(c) Find the median, the first and the third quartiles and the inter-quartile range.
(d) Construct the box-and-whisker diagram.
(e) Use the quartiles to comment on the skewness of the distribution.

Solution:

(a)
      Stem      | Leaf
 (Unit = $1000) | (Unit = $100)
        6       | 126
        7       | 348
        8       | 279
        9       | 25679
       10       | 245
       11       | 9
       12       | 7
       13       | 3
       14       | 2
       15       | 5
       16       | 6
       17       | 9
       18       |
       19       |
       20       | 2

(b) The mean is

\[
\begin{aligned}
\bar{x} &= \frac{1}{25}(7800 + 11900 + 12700 + 10400 + 20200 + 6200 + 7300 + 9200 + 15500 + 17900 \\
&\qquad + 9700 + 9500 + 10500 + 13300 + 10200 + 9900 + 14200 + 8900 + 8700 + 16600 \\
&\qquad + 7400 + 6600 + 9600 + 6100 + 8200) \\
&= 10740
\end{aligned}
\]

(c) Making use of the stem-and-leaf diagram for the distribution of the salaries (with a column of
cumulative frequencies added to help locate the quartiles):

The median is $x_{\frac{25+1}{2}} = x_{13} = 9700$.

For the 25th percentile, the index is $i = \frac{25}{100} \times 25 = 6.25$, which rounds up to 7,
hence $Q_1 = x_7 = 8200$.

For the 75th percentile, the index is $i = \frac{75}{100} \times 25 = 18.75$, which rounds up to 19,
hence $Q_3 = x_{19} = 12700$.

The inter-quartile range = $Q_3 - Q_1$ = 12700 – 8200 = 4500

(d) [Figure: box-and-whisker diagram on an axis from 6000 to 21000, with minimum 6100, $Q_1$ = 8200, median 9700, $Q_3$ = 12700 and maximum 20200]

(e) $Q_2 - Q_1 = 9700 - 8200 = 1500$ and $Q_3 - Q_2 = 12700 - 9700 = 3000$.
Since $Q_2 - Q_1 < Q_3 - Q_2$, the distribution is positively skewed (right-skewed).
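
The five-number summary and the quartile-based skewness check in this example can be put together in one short script. This is a minimal sketch, assuming Python 3 and the percentile rule of these notes; the helper name `percentile` and the variable names are illustrative.

```python
def percentile(ordered, p):
    """pth percentile of an already sorted list, using the rule in these notes."""
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder:
        return ordered[whole]                         # index i rounded up
    return (ordered[whole - 1] + ordered[whole]) / 2  # average of positions i, i + 1


# Monthly salaries (in dollars) of the 25 employees in Example 11.
salaries = [
    7800, 11900, 12700, 10400, 20200,
    6200, 7300, 9200, 15500, 17900,
    9700, 9500, 10500, 13300, 10200,
    9900, 14200, 8900, 8700, 16600,
    7400, 6600, 9600, 6100, 8200,
]

ordered = sorted(salaries)
q1, q2, q3 = (percentile(ordered, p) for p in (25, 50, 75))
summary = (ordered[0], q1, q2, q3, ordered[-1])   # min, Q1, median, Q3, max

# Compare the two halves of the box to judge skewness.
lower_gap, upper_gap = q2 - q1, q3 - q2
if lower_gap < upper_gap:
    skewness = "positively skewed"
elif lower_gap > upper_gap:
    skewness = "negatively skewed"
else:
    skewness = "symmetric"

print(summary, skewness)
```

The five values in `summary` are exactly the quantities needed to draw the box-and-whisker diagram in part (d).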

C. Variance and Standard Deviation

Although the inter-quartile range is an improved measure of dispersion compared with the range,
it still does not make use of the actual values of all the data in the set and therefore cannot completely
reflect the dispersion of the data. Measures of dispersion that do take all the values into account
are the variance and the standard deviation.

To overcome the limitations of range and inter-quartile range mentioned above, we can find the
distance of each datum from the centre of a group of data. The greater the average distance of all
data from the centre, the wider the dispersion of a set of data is.

If the set of N data $\{x_1, x_2, \ldots, x_N\}$ represents a population with mean $\mu$, then the variance of the
set of data is defined as the mean of the squares of the deviations of individual values from the
population mean, and is commonly denoted by $\sigma^2$. Thus, the population variance is

\[
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 = \frac{1}{N}\left[(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2\right]
\]

Large variances indicate large dispersion and small variances indicate small dispersion.

However, the variance defined above does not have the same unit as the original values of x.
To have a measure of dispersion with the same unit as the original data, we take the positive square
root of the variance. The resulting measure is called the standard deviation of the set of data. Thus,
the population standard deviation is

\[
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2} = \sqrt{\frac{1}{N}\left[(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2\right]}
\]

If the set of n data $\{x_1, x_2, \ldots, x_n\}$ is a sample of size n drawn from a population and with mean $\bar{x}$,
the sample variance, $s^2$, is defined as

\[
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n-1}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right]
\]

The sample standard deviation, s, is the positive square root of the sample variance.

\[
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = \sqrt{\frac{1}{n-1}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right]}
\]

Note that the sample variance $s^2$ differs from the population variance $\sigma^2$ in two ways:
the sample mean $\bar{x}$ is used instead of the population mean $\mu$, and the divisor is n – 1 instead of N.

Standard deviation gives us an idea of how close the data are to their mean, and thus
tells us about the consistency of the set of data.
The smaller the standard deviation, the less dispersed the set of data is.
In other words, the distribution of data in the set is more consistent.

Example 12

The temperatures (in °C) of water in seven beakers are: 30, 32, 33, 28, 31, 29, 34.
(a) Find the mean of the temperatures of the water.
(b) Find the population standard deviation of the temperatures of the water.

Solution:

(a) The mean of the temperatures of the water is

\[
\mu = \frac{1}{7}\sum_{i=1}^{7} x_i = \frac{1}{7}(30 + 32 + 33 + 28 + 31 + 29 + 34) = 31
\]

(b) The variance of the temperatures of the water is

\[
\begin{aligned}
\sigma^2 &= \frac{1}{7}\sum_{i=1}^{7}(x_i - \mu)^2 \\
&= \frac{1}{7}\left[(30 - 31)^2 + (32 - 31)^2 + (33 - 31)^2 + (28 - 31)^2 + (31 - 31)^2 + (29 - 31)^2 + (34 - 31)^2\right] \\
&= \frac{1}{7}\left[(-1)^2 + 1^2 + 2^2 + (-3)^2 + 0^2 + (-2)^2 + 3^2\right] \\
&= \frac{1}{7}(1 + 1 + 4 + 9 + 0 + 4 + 9) \\
&= 4
\end{aligned}
\]

Therefore, the population standard deviation of the temperatures of the water is $\sigma = \sqrt{4} = 2$.
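
The same calculation can be written out step by step in code, following the definition of the population variance as the mean of the squared deviations. This is a minimal sketch, assuming Python 3; the variable names are illustrative, and `statistics.pstdev` from the standard library is used only as a cross-check.

```python
from math import sqrt
from statistics import pstdev

# Water temperatures (in degrees Celsius) from Example 12.
temperatures = [30, 32, 33, 28, 31, 29, 34]

N = len(temperatures)
mu = sum(temperatures) / N                                  # population mean
squared_deviations = [(x - mu) ** 2 for x in temperatures]  # (x_i - mu)^2
variance = sum(squared_deviations) / N                      # divisor N, not N - 1
sigma = sqrt(variance)                                      # population standard deviation

print(mu, variance, sigma)
print(pstdev(temperatures))   # library cross-check of sigma
```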

Example 13

(a) Find the variance and standard deviation of the population of Mathematics test marks in Example 4
of Section 1.2, with the population mean 60.425.
(b) If the passing mark is one population standard deviation less than the mean, find the number of
students who failed the Mathematics test.
(c) The sample $S_2 = \{68, 62, 48, 39, 38, 55, 66, 71, 37, 76\}$ has been drawn from the population of
Mathematics test marks in Example 4 of Section 1.2. The sample mean was found to be 56.
Find the sample variance and sample standard deviation.

Solution:

(a) The variance is

\[
\begin{aligned}
\sigma^2 &= \frac{1}{40}\sum_{i=1}^{40} x_i^2 - \mu^2 \\
&= \frac{1}{40}(61^2 + 80^2 + 55^2 + 70^2 + 76^2 + 73^2 + 100^2 + 90^2 + 64^2 + 62^2 + 75^2 + 64^2 + 62^2 + 66^2 + 46^2 \\
&\qquad + 61^2 + 67^2 + 39^2 + 58^2 + 63^2 + 63^2 + 64^2 + 51^2 + 40^2 + 66^2 + 43^2 + 38^2 + 37^2 + 28^2 + 71^2 \\
&\qquad + 70^2 + 49^2 + 48^2 + 68^2 + 86^2 + 27^2 + 69^2 + 74^2 + 37^2 + 56^2) - 60.425^2 \\
&= \frac{1}{40}(156497) - (60.425)^2 \\
&= 261.2444
\end{aligned}
\]

Therefore the standard deviation is $\sigma = \sqrt{261.2444} = 16.1631$.

(b) The passing mark = 60.425 – 16.1631 = 44.2619

There are eight students with marks below the passing mark (27, 28, 37, 37, 38, 39, 40 and 43),
so eight students failed the Mathematics test.

(c) The sample variance is

\[
\begin{aligned}
s^2 &= \frac{1}{10 - 1}\left(\sum_{i=1}^{10} x_i^2 - 10\bar{x}^2\right) \\
&= \frac{1}{9}\left[(68^2 + 62^2 + 48^2 + 39^2 + 38^2 + 55^2 + 66^2 + 71^2 + 37^2 + 76^2) - 10 \times 56^2\right] \\
&= \frac{1}{9}(33304 - 31360) \\
&= 216
\end{aligned}
\]

And the sample standard deviation is $s = \sqrt{216} = 14.70$.
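
The solution above uses a computational form of the variance (sum of squares minus a multiple of the squared mean) rather than the defining formulas. The notes do not derive it, so the following short derivation is offered as a sketch of why the two forms agree; it uses only the definitions of $\mu$, $\bar{x}$, $\sigma^2$ and $s^2$ given earlier.

```latex
% Population variance: expand the square and use (1/N) * sum x_i = mu.
\[
\begin{aligned}
\sigma^2 &= \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2
          = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - 2\mu \cdot \frac{1}{N}\sum_{i=1}^{N} x_i + \mu^2 \\
         &= \frac{1}{N}\sum_{i=1}^{N} x_i^2 - 2\mu^2 + \mu^2
          = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2
\end{aligned}
\]

% Sample variance: the same expansion with sum x_i = n * xbar.
\[
\begin{aligned}
\sum_{i=1}^{n}(x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2
   = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2, \\
s^2 &= \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)
\end{aligned}
\]
```

The computational form is convenient for hand calculation because it avoids computing every deviation separately, which is helpful when the mean is not a whole number, as in part (a).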

Use Scientific Calculator to find mean and standard deviation

Use the calculator to find the mean and standard deviation of the data set
{1, 2, 5, 6, 8, 9, 10, 12, 14, 18}
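
A calculator result for this data set can be cross-checked in software. The following is a minimal sketch, assuming Python 3 and its standard `statistics` module. It reports both the population standard deviation (divisor N) and the sample standard deviation (divisor n – 1), since which of the two is wanted depends on whether the data are treated as a population or as a sample.

```python
from statistics import mean, pstdev, stdev

data = [1, 2, 5, 6, 8, 9, 10, 12, 14, 18]

data_mean = mean(data)           # 8.5
population_sd = pstdev(data)     # divisor N, about 5.02
sample_sd = stdev(data)          # divisor n - 1, about 5.30

print(f"mean = {data_mean}")
print(f"population standard deviation = {population_sd:.4f}")
print(f"sample standard deviation = {sample_sd:.4f}")
```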
