The Box Model
Sampling Data | Chance Variability
© University of Sydney MATH1062/MATH1005
12 September 2024
Course Overview
Modules: 1 Exploring Data · 2 Modelling Data · 3 Sampling Data · 4 Decisions with Data
[Diagram linking the four modules to Population and Sample]
Module 3: Sampling Data
Understanding probability
What is probability?
Counting and chance simulation
How to count the number of possible outcomes?
Chance variability
How can we model chance variability by a box model?
Central limit theorem
What is the behaviour of the sample mean for a large sample size?
Today’s outline
The box model
Random draws, sample sums and sample means
Expected value and Standard error
The box model
Statistical models
· A model is a representation of something which
- is simpler but at the same time
- captures the key features of the original.
· Data obtained in real life is generated by complicated processes.
· Statistical models are models for data-generating processes:
- they are much simpler than the real data-generating process but
- (hopefully) they capture the signal or key features within the data.
The box model
The box model is a very simple statistical model for representing a population. The
box model can be thought of as:
· A collection of 𝑁 objects (e.g. tickets or balls) is imagined “in a box”.
· Each object bears a number.
· A random sample of a certain number 𝑛 of the objects is taken.
· The sampling may be with or without replacement.
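As a minimal sketch of the two sampling schemes, R's `sample()` can draw from a box with or without replacement. The box contents and the seed below are made up purely for illustration.

```r
# Hypothetical box of N = 5 tickets (values chosen only for illustration)
box = c(1, 2, 2, 3, 5)

set.seed(1)  # arbitrary seed, only so that a rerun gives the same draws

# n = 3 draws WITHOUT replacement: a ticket cannot be picked twice
draws_without = sample(box, size = 3, replace = FALSE)

# n = 8 draws WITH replacement: each ticket goes back in the box,
# so n is allowed to exceed the number of tickets
draws_with = sample(box, size = 8, replace = TRUE)

draws_without
draws_with
```

Without replacement, `size` cannot exceed the number of tickets in the box; with replacement there is no such limit.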
Random samples and random draws
· Consider all possible ways of selecting 𝑛 objects from the box. A random sample
is one in which each of these possible selections is equally likely.
· A random draw is a random sample with 𝑛 = 1 .
- If a single draw is taken, then each object in the box has an equal chance of
being picked.
· If we completely know the contents of the box, we can write down the chance of
each possible value.
· We let 𝑋 denote the random draw:
- this represents the “value we might get”
- 𝑋 can take different values with different probabilities/chances.
· The distribution of 𝑋 is a table with two “columns”:
- each possible value 𝑥 that 𝑋 can take and
- the corresponding probability/chance of that value.
Simple example
· For example, suppose 𝑋 is a random draw from the following simple box:
1 2 3
· There are then three possible tickets: 1 , 2 and 3 , and each has equal chance
$\tfrac{1}{3}$ of being picked, so:
$$P(X = 1) = P(X = 2) = P(X = 3) = \tfrac{1}{3} .$$
Here we write $P(\cdot)$ to denote the probability of each event.
· The distribution of 𝑋 :
x          1    2    3
P(X = x)   1/3  1/3  1/3
Non-equal chance example
· We can have box models where the different possible values are not necessarily
equally likely.
· For the box
1 2 2 3 3 3
if each “ticket” is equally likely, we have
$$P(X = 1) = \tfrac{1}{6} , \quad P(X = 2) = \tfrac{2}{6} = \tfrac{1}{3} , \quad P(X = 3) = \tfrac{3}{6} = \tfrac{1}{2} .$$
· The distribution of 𝑋 :
x          1    2    3
P(X = x)   1/6  1/3  1/2
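The distribution table for this box can also be computed rather than counted by hand. The following is a small sketch that relies only on each ticket being equally likely; the variable names are illustrative.

```r
box = c(1, 2, 2, 3, 3, 3)

# P(X = x) = (number of tickets showing x) / (number of tickets in the box)
dist = table(box) / length(box)
dist   # chances for x = 1, 2, 3 are 1/6, 1/3 and 1/2

# The chances in any distribution must add up to 1
sum(dist)
```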
Larger box example
Consider the box defined by the file y.dat in the R code below:
box = scan("y.dat")
box
## [1] 3 4 5 6 7 8 4 5 6 7 8 9 5 6 7 8 9 10 6 7 8 9 10 11 7 8 9 10 11 12
## [31] 8 9 10 11 12 13 4 5 6 7 8 9 5 6 7 8 9 10 6 7 8 9 10 11 7 8 9 10 11 12
## [61] 8 9 10 11 12 13 9 10 11 12 13 14 5 6 7 8 9 10 6 7 8 9 10 11 7 8 9 10 11 12
## [91] 8 9 10 11 12 13 9 10 11 12 13 14 10 11 12 13 14 15 6 7 8 9 10 11 7 8 9 10 11 12
## [121] 8 9 10 11 12 13 9 10 11 12 13 14 10 11 12 13 14 15 11 12 13 14 15 16 7 8 9 10 11 12
## [151] 8 9 10 11 12 13 9 10 11 12 13 14 10 11 12 13 14 15 11 12 13 14 15 16 12 13 14 15 16 17
## [181] 8 9 10 11 12 13 9 10 11 12 13 14 10 11 12 13 14 15 11 12 13 14 15 16 12 13 14 15 16 17
## [211] 13 14 15 16 17 18
table(box) # note: first two rows below are only labels: the 'real' output is the third line
## box
## 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## 1 3 6 10 15 21 25 27 27 25 21 15 10 6 3 1
Find the probability 𝑃 (𝑋 < 8)
sum(table(box)) # gives total freq, i.e. size of the box
## [1] 216
length(box) == sum(table(box))
## [1] TRUE
head(box < 8) # reports the first 6 values of 'box<8'
## [1] TRUE TRUE TRUE TRUE TRUE FALSE
sum(box < 8) # reports the total number of TRUE values in 'box<8'
## [1] 35
Find the proportion less than 8
sum(box < 8)/length(box)
## [1] 0.162037
mean(box < 8) # mean of TRUEs in 'box<8'
## [1] 0.162037
· The chance of drawing a value less than 8 is $\tfrac{35}{216} \approx 16\%$.
· Note: 35 = 1 + 3 + 6 + 10 + 15 (the frequencies of 3, 4, 5, 6 and 7
respectively).
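If the file y.dat is not to hand, the same chance can be recovered from the frequency table alone. This sketch rebuilds a box with the same contents (possibly in a different order) using `rep()`; the names `vals`, `freq` and `box2` are illustrative.

```r
# Values and frequencies copied from the table(box) output
vals = 3:18
freq = c(1, 3, 6, 10, 15, 21, 25, 27, 27, 25, 21, 15, 10, 6, 3, 1)

# Rebuild a box holding each value the stated number of times
box2 = rep(vals, times = freq)
length(box2)                      # size of the box: 216

# Two equivalent ways to get P(X < 8)
sum(freq[vals < 8]) / sum(freq)   # add up the relevant frequencies: 35/216
mean(box2 < 8)                    # proportion of tickets below 8
```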
Expected value and standard error
· In some situations, we may not know the exact contents of the box. Indeed, boxes
are used to model populations and we might not know everything about the
population.
· Instead we might have access to summary information about the box.
· For a random draw 𝑋 from a box, we define the following two quantities:
- The expected value 𝐸(𝑋) is defined as the mean of the box.
- The standard error 𝑆𝐸(𝑋) is defined as the standard deviation of the box.
Interpreting the expected value 𝐸(𝑋)
· The random draw may be “decomposed” into two pieces:
𝑋 = 𝐸(𝑋) + [𝑋 − 𝐸(𝑋)] = 𝐸(𝑋) + 𝜀 .
· The first part 𝐸(𝑋) is not random.
· All randomness is included in the chance error 𝜀 , which can itself be
represented by a random draw from an error box (a box with mean zero).
· Example: a random draw 𝑋 from the box
1 2 3
(which has mean 2) may instead be thought of as 𝑋 = 2 + 𝜀 where the chance
error 𝜀 is a random draw from the error box
−1 0 +1 .
Interpreting the standard error 𝑆𝐸(𝑋)
· The standard error measures the typical size of the error 𝜀 . It is a measure of
random variation in the outcome of 𝑋 .
· For two different random draws, the one with the larger SE is likely to differ from its
expected value by a larger amount.
· The standard error is the root-mean-square of the error box. For the box 1 2 3 :
$$SE(X) = SD(\text{box}) = \sqrt{\frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3}} \approx 0.816$$
$$SE(X) = RMS(\text{error box}) = \sqrt{\frac{(-1)^2 + 0^2 + 1^2}{3}} \approx 0.816$$
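The two calculations above can be sketched in R as follows. The helper `rms()` is a hypothetical name introduced here, not a built-in R function.

```r
box = c(1, 2, 3)

# Hypothetical helper: root-mean-square of a list of numbers
rms = function(v) sqrt(mean(v^2))

EX = mean(box)    # expected value of a single draw: 2
err = box - EX    # the error box: -1 0 1
mean(err)         # an error box always has mean 0

SE = rms(err)     # RMS of the error box, about 0.816
SE
sqrt(2/3)         # the same value from the formula
```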
Sums of random draws
New interpretation of mean and SD
· We have introduced the concepts of
- a random draw 𝑋 from a box
- its expected value 𝐸(𝑋)
- its standard error 𝑆𝐸(𝑋)
· The expected value and standard error are not “new” things, rather, they are new
interpretations of old things.
- Is it really “worth the effort” to introduce new names for things we
already know about?
- The expected value and standard error become very useful when we have
more than one draw.
Sum of two random draws
· Consider the two boxes
1 2 3 and 2 4 6 8 .
- The first box has mean 2 and SD $\sqrt{\tfrac{1}{3}\left[(-1)^2 + 0^2 + 1^2\right]} = \sqrt{\tfrac{2}{3}} \approx 0.816$.
- The second box has mean 5 and SD $\sqrt{\tfrac{1}{4}\left[(-3)^2 + (-1)^2 + 1^2 + 3^2\right]} = \sqrt{5} \approx 2.236$.
· Suppose we are going to take a random draw from each, 𝑋 from the first box, 𝑌
from the second box, in such a way that each possible pair of values is equally
likely. What is the behaviour of the (random) sum 𝑆 = 𝑋 + 𝑌 ?
All possible pairs/sums
· There are 12 possible pairs:
( 1 , 2 ),( 1 , 4 ),( 1 , 6 ),( 1 , 8 ),
( 2 , 2 ),( 2 , 4 ),( 2 , 6 ),( 2 , 8 ),
( 3 , 2 ),( 3 , 4 ),( 3 , 6 ),( 3 , 8 ).
Table of all possible pairs and their sums
Sample Sum
(1,2) 3
(1,4) 5
(1,6) 7
(1,8) 9
(2,2) 4
(2,4) 6
(2,6) 8
(2,8) 10
(3,2) 5
(3,4) 7
(3,6) 9
(3,8) 11
Single random draw from a “bigger” box
Thus getting a random pair (𝑋, 𝑌 ) and forming the sum 𝑆 = 𝑋 + 𝑌 is equivalent
to a single random draw from the bigger box
3 4 5 5 6 7 7 8 9 9 10 11
What are the mean and SD of this “bigger” box?
Using outer()
· The R function outer() forms a two-way array by applying an operation to each
pair of elements from two vectors:
bx = c(1, 2, 3)
by = c(2, 4, 6, 8)
bs = outer(bx, by, "+")
bs
## [,1] [,2] [,3] [,4]
## [1,] 3 5 7 9
## [2,] 4 6 8 10
## [3,] 5 7 9 11
mean(bs)
## [1] 7
mean((bs - mean(bs))^2)
## [1] 5.666667
Expected value and standard error of the sum
· So we have that $E(S) = 7$ and $SE(S) = \sqrt{5\tfrac{2}{3}} \approx 2.38$.
· As it turns out
$$7 = E(S) = E(X + Y) = E(X) + E(Y) = 2 + 5 ,$$
$$5\tfrac{2}{3} = SE(S)^2 = SE(X + Y)^2 = SE(X)^2 + SE(Y)^2 = \tfrac{2}{3} + 5 .$$
· So in this case we have
- expected value of sum is sum of expected values;
- squared SE of the sum is the sum of the squared SEs
· These results hold in general.
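As a quick sanity check that this is not a coincidence of the particular boxes used, the same comparison can be run on any other pair. The boxes below are arbitrary, and `sd_box()` is a hypothetical helper for the population SD (dividing by the number of tickets).

```r
# Hypothetical helper: population SD of a list of numbers
sd_box = function(v) sqrt(mean((v - mean(v))^2))

# Two arbitrary boxes, chosen only for illustration
bx = c(0, 1, 1, 5)
by = c(-2, 3, 10)

# Box of all equally likely sums (one entry per pair)
bs = outer(bx, by, "+")

# Expected values add
mean(bs)
mean(bx) + mean(by)

# Squared standard errors add
sd_box(bs)^2
sd_box(bx)^2 + sd_box(by)^2
```

A numerical check like this only illustrates the result for particular boxes; it does not prove it.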
Sum of two random draws.
· Consider two boxes
𝑥1 𝑥2 ⋯ 𝑥𝑀 and 𝑦1 𝑦2 ⋯ 𝑦𝑁
· Suppose we are going to take a random draw from each: 𝑋 from the first box, 𝑌
from the second box, in such a way that each possible pair of values is equally
likely.
All possible sums
· There are 𝑀𝑁 possible sums; we may arrange them in a two-way array with 𝑀
(horizontal) rows and 𝑁 (vertical) columns.
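· Noting that $\sum_{i=1}^{M} x_i = M\bar{x}$, we may write the column sums below the line:
$$\begin{array}{cccc}
x_1 + y_1 & x_1 + y_2 & \cdots & x_1 + y_N \\
x_2 + y_1 & x_2 + y_2 & \cdots & x_2 + y_N \\
\vdots & \vdots & \ddots & \vdots \\
x_M + y_1 & x_M + y_2 & \cdots & x_M + y_N \\ \hline
M\bar{x} + M y_1 & M\bar{x} + M y_2 & \cdots & M\bar{x} + M y_N
\end{array}$$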
The sum of column sums is
$$\underbrace{M\bar{x} + \cdots + M\bar{x}}_{N \text{ terms}} + M(y_1 + \cdots + y_N) = NM\bar{x} + MN\bar{y} .$$
Thus the average of all possible sums is
$$\frac{\text{sum of all possible sums}}{\text{no. of all possible sums}} = \frac{NM\bar{x} + MN\bar{y}}{MN} = \bar{x} + \bar{y} = E(X) + E(Y) .$$
That is,
$$E(X + Y) = E(X) + E(Y) .$$
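The column-sum argument can be traced numerically on the boxes 1 2 3 and 2 4 6 8 from the earlier example. This is only a sketch; `colSums()` adds down each column of the two-way array.

```r
x = c(1, 2, 3)
y = c(2, 4, 6, 8)
M = length(x)
N = length(y)

sums = outer(x, y, "+")    # M rows, N columns of all possible sums

colSums(sums)              # column sums: 12 18 24 30
M * mean(x) + M * y        # the same values from M * xbar + M * y_j

sum(sums) / (M * N)        # average of all possible sums: 7
mean(x) + mean(y)          # E(X) + E(Y): 7
```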
Computing formula for SD
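· For a list of numbers $x_1, x_2, \ldots, x_M$, the square of the SD may be written as
$$SD^2 = \frac{1}{M}\sum_{i=1}^{M} (x_i - \bar{x})^2 = \left(\frac{1}{M}\sum_{i=1}^{M} x_i^2\right) - \bar{x}^2 ,$$
the “mean square minus the square of the mean”.
· To see why, recall that $\sum_{i=1}^{M} x_i = M\bar{x}$ and so:
$$\begin{aligned}
\sum_{i=1}^{M} (x_i - \bar{x})^2 &= (x_1^2 - 2\bar{x}x_1 + \bar{x}^2) + \cdots + (x_M^2 - 2\bar{x}x_M + \bar{x}^2) \\
&= (x_1^2 + \cdots + x_M^2) - 2\bar{x}(x_1 + \cdots + x_M) + \underbrace{\bar{x}^2 + \cdots + \bar{x}^2}_{M \text{ terms}} \\
&= \sum_{i=1}^{M} x_i^2 - 2\bar{x}M\bar{x} + M\bar{x}^2 = \sum_{i=1}^{M} x_i^2 - M\bar{x}^2 .
\end{aligned}$$
Dividing both sides by $M$ gives the computing formula.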
Easy way to compute SD in R
· The computing formula above can be used to write a quick-and-easy R function to
compute the (population) SD of a list of numbers.
popsd = function(x) sqrt(mean(x^2) - (mean(x)^2))
· Let’s try it out:
x = 1:10
x # this list has mean 5.5
## [1] 1 2 3 4 5 6 7 8 9 10
sqrt(mean((x - 5.5)^2))
## [1] 2.872281
popsd(x)
## [1] 2.872281
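Note that R's built-in `sd()` divides by $n - 1$ rather than $n$, so it does not return the population SD. The sketch below shows how the two are related; the rescaling factor $\sqrt{(n-1)/n}$ follows directly from the two definitions.

```r
popsd = function(x) sqrt(mean(x^2) - (mean(x)^2))  # as defined above

x = 1:10
n = length(x)

sd(x)                        # sample SD (divides by n - 1), about 3.028
sd(x) * sqrt((n - 1) / n)    # rescaled so that it divides by n
popsd(x)                     # population SD, same as the line above
```

For large $n$ the two versions are nearly equal, since $\sqrt{(n-1)/n}$ is close to 1.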
SE of a sum (not examinable)
· It is possible to deduce the SE of our general sum 𝑆 = 𝑋 + 𝑌 .
· We do so by first working out the mean-square of the bigger box of all possible
sums.
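· Write each squared sum $(x_i + y_j)^2 = x_i^2 + 2x_i y_j + y_j^2$ in an array and add over
columns:
$$\begin{array}{ccc}
x_1^2 + 2x_1 y_1 + y_1^2 & \cdots & x_1^2 + 2x_1 y_N + y_N^2 \\
x_2^2 + 2x_2 y_1 + y_1^2 & \cdots & x_2^2 + 2x_2 y_N + y_N^2 \\
\vdots & \ddots & \vdots \\
x_M^2 + 2x_M y_1 + y_1^2 & \cdots & x_M^2 + 2x_M y_N + y_N^2 \\ \hline
\sum_i x_i^2 + 2M\bar{x}y_1 + M y_1^2 & \cdots & \sum_i x_i^2 + 2M\bar{x}y_N + M y_N^2
\end{array}$$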
SE of a sum (not examinable)
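· The sum of squares (of all possible sums) is then
$$\begin{aligned}
&\left(\sum_i x_i^2 + 2M\bar{x}y_1 + M y_1^2\right) + \cdots + \left(\sum_i x_i^2 + 2M\bar{x}y_N + M y_N^2\right) \\
&\quad = N\sum_i x_i^2 + 2M\bar{x}(y_1 + \cdots + y_N) + M(y_1^2 + \cdots + y_N^2) \\
&\quad = N\sum_i x_i^2 + 2MN\bar{x}\bar{y} + M\sum_j y_j^2 .
\end{aligned}$$
· Since there are 𝑀𝑁 possible sums, the mean square is
$$\frac{1}{M}\sum_i x_i^2 + 2\bar{x}\bar{y} + \frac{1}{N}\sum_j y_j^2 .$$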
SE of a sum (not examinable)
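· Since the mean of all possible sums is $\bar{x} + \bar{y}$, the squared SD of all possible sums is
$$\begin{aligned}
SE(S)^2 &= \underbrace{\frac{1}{M}\sum_i x_i^2 + 2\bar{x}\bar{y} + \frac{1}{N}\sum_j y_j^2}_{\text{mean sq.}} - \underbrace{(\bar{x}^2 + 2\bar{x}\bar{y} + \bar{y}^2)}_{\text{sq. of mean}} \\
&= \left(\frac{1}{M}\sum_i x_i^2 - \bar{x}^2\right) + \left(\frac{1}{N}\sum_j y_j^2 - \bar{y}^2\right) \\
&= \frac{1}{M}\sum_i (x_i - \bar{x})^2 + \frac{1}{N}\sum_j (y_j - \bar{y})^2 \\
&= SE(X)^2 + SE(Y)^2 .
\end{aligned}$$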
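The mean-square expression used in this derivation can be checked numerically on the boxes 1 2 3 and 2 4 6 8 from earlier. This is a sketch for illustration only; the variable names are arbitrary.

```r
x = c(1, 2, 3)
y = c(2, 4, 6, 8)
sums = outer(x, y, "+")   # all 12 possible sums

# Mean square of the bigger box, computed directly and from the formula
mean(sums^2)
mean_sq = mean(x^2) + 2 * mean(x) * mean(y) + mean(y^2)
mean_sq                   # 14/3 + 20 + 30, about 54.667

# Mean square minus square of mean gives the squared SE of the sum
mean_sq - mean(sums)^2    # 17/3, about 5.667
```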
Random samples with replacement of size 𝑛 = 2
· A special case of our general sum is where we have a single box
𝑥1 𝑥2 ⋯ 𝑥𝑁
but take two random draws with replacement.
- This means each of the $N^2$ possible pairs
$(x_1, x_1), \ldots, (x_1, x_N), \ldots, (x_N, x_1), \ldots, (x_N, x_N)$ is equally likely.
· This is where both boxes are (effectively) the same, so 𝐸(𝑋) = 𝐸(𝑌 ) and
𝑆𝐸(𝑋) = 𝑆𝐸(𝑌 ).
· If we write the mean of the box as 𝜇 and the SD of the box as 𝜎 , then the sum 𝑆
of the two random draws has
- 𝐸(𝑆) = 𝐸(𝑋) + 𝐸(𝑌 ) = 𝜇 + 𝜇 = 2𝜇
- $SE(S)^2 = SE(X)^2 + SE(Y)^2 = \sigma^2 + \sigma^2 = 2\sigma^2 \implies SE(S) = \sqrt{2}\,\sigma$.
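For a small box, all $N^2$ equally likely pairs can be listed with `outer()` and the two results checked directly. The box below is just an example, and `sd_box()` is a hypothetical helper for the population SD.

```r
# Hypothetical helper: population SD of a list of numbers
sd_box = function(v) sqrt(mean((v - mean(v))^2))

box = c(1, 2, 2, 3, 3, 3)   # example box with N = 6 tickets
mu = mean(box)
sigma = sd_box(box)

# All N^2 = 36 pairs of draws with replacement, summed
S2 = outer(box, box, "+")
length(S2)                  # 36

mean(S2)                    # E(S)
2 * mu                      # 2 * mu, same value

sd_box(S2)                  # SE(S)
sqrt(2) * sigma             # sqrt(2) * sigma, same value
```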
Sums and averages of random samples of size 𝑛
Random samples of size 𝑛
· We may easily extend the results to any 𝑛 ≥ 2 . Suppose:
- we have a box with mean 𝜇 and SD 𝜎 ;
- we are going to take a random sample of size 𝑛 from the box with
replacement;
- so each possible sample of size 𝑛 is equally likely.
· Let us write
- the random draws as 𝑋1 , 𝑋2 , … , 𝑋𝑛 ;
- the sum as 𝑆 = 𝑋1 + ⋯ + 𝑋𝑛 ;
- the sample average as $\bar{X} = \frac{S}{n} = \frac{1}{n}(X_1 + \cdots + X_n) = \frac{1}{n}\sum_{i=1}^{n} X_i$.
· What are the expected value and standard error of both 𝑆 and 𝑋¯ ?
The sum 𝑆
· Each single draw has the same behaviour. That is, each of 𝑋1 , … , 𝑋𝑛 is a single
random draw from the same box with 𝐸(𝑋1 ) = 𝜇 and 𝑆𝐸(𝑋1 ) = 𝜎 .
· Expected value of sum is sum of expected values:
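$$E(S) = E(X_1 + \cdots + X_n) = E(X_1 + \cdots + X_{n-1}) + E(X_n) = \cdots = E(X_1) + \cdots + E(X_n) = \underbrace{\mu + \cdots + \mu}_{n \text{ terms}} = n\mu .$$
· Also,
$$SE(S)^2 = SE(X_1 + \cdots + X_{n-1})^2 + SE(X_n)^2 = \cdots = SE(X_1)^2 + \cdots + SE(X_n)^2 = \underbrace{\sigma^2 + \cdots + \sigma^2}_{n \text{ terms}} = n\sigma^2$$
$$\implies SE(S) = \sqrt{n}\,\sigma .$$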
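For a small box and small $n$, every possible sample can be enumerated and the two formulas checked exactly. The sketch below uses `expand.grid()` to list all $N^n$ samples; the box, the choice $n = 3$ and the helper `sd_box()` are illustrative assumptions.

```r
# Hypothetical helper: population SD of a list of numbers
sd_box = function(v) sqrt(mean((v - mean(v))^2))

box = c(0, 1, 1, 4)   # example box with N = 4 tickets
n = 3                 # number of draws with replacement
mu = mean(box)
sigma = sd_box(box)

# Every possible sample of size n: one row per sample, N^n rows in total
samples = expand.grid(rep(list(box), n))
S_all = rowSums(samples)   # the box of all possible sums
length(S_all)              # 4^3 = 64

mean(S_all)                # E(S)
n * mu                     # n * mu, same value

sd_box(S_all)              # SE(S)
sqrt(n) * sigma            # sqrt(n) * sigma, same value
```

Enumeration is only practical when $N^n$ is small.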
What if we divide by 𝑁 ?
· Consider the box
𝑥1 𝑥2 … 𝑥𝑁
What is the expected value and standard error of a random draw if we divide each
𝑥𝑖 by 𝑁 ?
· This gives us a new box
𝑦1 𝑦2 … 𝑦𝑁
where $y_i = \frac{x_i}{N}$.
What if we divide by 𝑁 ?
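· If 𝑌 is a random draw from this new box then we can work out 𝐸(𝑌 ) as:
$$E(Y) = \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i = \frac{1}{N}\sum_{i=1}^{N} \frac{x_i}{N} = \frac{1}{N}\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right) = \frac{\bar{x}}{N} = \frac{E(X)}{N} .$$
· We can also work out the standard error:
$$\begin{aligned}
SE(Y)^2 &= \frac{1}{N}\sum_{i=1}^{N} (y_i - \bar{y})^2 = \frac{1}{N}\sum_{i=1}^{N} \left(\frac{x_i}{N} - \frac{\bar{x}}{N}\right)^2 \\
&= \frac{1}{N^2}\cdot\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2 = \frac{SE(X)^2}{N^2}
\end{aligned}$$
$$\implies SE(Y) = \frac{SE(X)}{N} .$$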
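The effect of dividing every ticket by a constant can be checked on a small example. In this sketch the box is arbitrary and `sd_box()` is a hypothetical helper for the population SD.

```r
# Hypothetical helper: population SD of a list of numbers
sd_box = function(v) sqrt(mean((v - mean(v))^2))

x = c(2, 4, 6, 8)   # example box
N = length(x)
y = x / N           # new box: every ticket divided by N

mean(y)             # E(Y)
mean(x) / N         # E(X) / N, same value

sd_box(y)           # SE(Y)
sd_box(x) / N       # SE(X) / N, same value
```

The algebra above does not depend on the divisor being the box size: dividing every ticket by any positive constant divides both the expected value and the standard error by that constant.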
The sample average 𝑋¯
· The sample average $\bar{X}$ is just $\frac{S}{n}$, so we can immediately work out the expected
value and standard error.
· We thus obtain immediately that for the average,
$$E(\bar{X}) = \frac{E(S)}{n} = \frac{n\mu}{n} = \mu ;$$
· As for the standard error we have
$$SE(\bar{X}) = \frac{SE(S)}{n} = \frac{\sigma\sqrt{n}}{n} = \frac{\sigma}{\sqrt{n}} .$$
Example: 6-sided die
· Consider rolling a fair 6-sided die.
· In this case each of the numbers 1,2,3,4,5,6 is equally likely.
· This is equivalent to a random draw from the box
1 2 3 4 5 6
· The mean is $\mu = 3.5 = \tfrac{7}{2}$, the mean-square is $\frac{1+4+9+16+25+36}{6} = \frac{91}{6}$, and thus the SD is
$$\sigma = \sqrt{\frac{91}{6} - \left(\frac{7}{2}\right)^2} = \sqrt{\frac{91}{6} - \frac{49}{4}} = \sqrt{\frac{182 - 147}{12}} = \sqrt{\frac{35}{12}} \approx 1.71 .$$
Rolling the die 3 times: sum of rolls
· Suppose we roll the die (independently) 3 times. What is the random behaviour of
the sum of the values of the three rolls?
· Let 𝑋1 , 𝑋2 , 𝑋3 denote 3 random draws with replacement from the box
1 2 3 4 5 6
· Then the sum of the 3 rolls $S = X_1 + X_2 + X_3$ has $E(S) = 3\mu = \tfrac{21}{2} = 10.5$
and
$$SE(S) = \sigma\sqrt{3} = \sqrt{\frac{35}{12} \times 3} = \sqrt{\frac{35}{4}} = \frac{\sqrt{35}}{2} \approx 2.958 .$$
· The box of all possible sums here is exactly the dataset y.dat from earlier in the
lecture!
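This claim can be checked without the file by listing all $6^3 = 216$ equally likely triples of rolls. The sketch below compares the resulting frequencies with those shown in the `table(box)` output earlier; the variable names are illustrative.

```r
d = 1:6

# All 216 equally likely outcomes of three rolls, one row per triple
rolls_all = expand.grid(d, d, d)
sums_all = rowSums(rolls_all)      # the box of all possible sums
length(sums_all)                   # 216

# Frequencies copied from the table(box) output for y.dat
freq = c(1, 3, 6, 10, 15, 21, 25, 27, 27, 25, 21, 15, 10, 6, 3, 1)
all(as.vector(table(sums_all)) == freq)   # TRUE

mean(sums_all)                                # E(S) = 10.5
sqrt(mean((sums_all - mean(sums_all))^2))     # SE(S), about 2.958
```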
Rolling the die 3 times: average of rolls
· What is the random behaviour of the average of the values of the three rolls?
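· Writing $\bar{X} = \frac{X_1 + X_2 + X_3}{3} = \frac{S}{3}$, we have
$$E(\bar{X}) = \frac{E(S)}{3} = \frac{3\mu}{3} = \mu = 3.5$$
and
$$SE(\bar{X}) = \frac{\sigma}{\sqrt{3}} = \sqrt{\frac{35}{12} \times \frac{1}{3}} = \sqrt{\frac{35}{36}} = \frac{\sqrt{35}}{6} \approx 0.986 .$$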
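The expected value and standard error of the average can also be evaluated by brute force over all 216 equally likely triples. This is a sketch for checking the arithmetic; the variable names are illustrative.

```r
d = 1:6

# Average of the three rolls for every one of the 216 possible triples
xbar_all = rowSums(expand.grid(d, d, d)) / 3

mean(xbar_all)                                       # E(Xbar) = 3.5
se_xbar = sqrt(mean((xbar_all - mean(xbar_all))^2))  # SE(Xbar)
se_xbar                                              # about 0.986
sqrt(35) / 6                                         # same value from the formula
```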
Demonstration
· Let us simulate 3 rolls of a 6-sided die 1000 times, and look at the corresponding
1000 sums and averages of each triplet.
d = 1:6
S = numeric(1000) # vector to hold the 1000 sums
for (i in 1:1000) {
  rolls = sample(d, size = 3, replace = TRUE) # 3 rolls of the die
  S[i] = sum(rolls)
}
mean(S)
## [1] 10.476
sd(S)
## [1] 3.014052
popsd(S)
## [1] 3.012544
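br = seq(2.5, 18.5, by = 1) # assumption: 'br' is not defined on the slides; one bin per possible sum 3, ..., 18
hist(S, probability = TRUE, breaks = br)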
Note these proportions are close to (but not exactly equal to) the corresponding
proportions in y.dat .
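One way to make this comparison concrete is to tabulate simulated and exact proportions side by side. The sketch below runs a fresh simulation with an arbitrary seed, so its numbers will differ slightly from the run shown above; `replicate()` is used in place of the explicit loop.

```r
set.seed(2024)   # arbitrary seed, only for reproducibility
d = 1:6

# 1000 simulated sums of 3 rolls
S_sim = replicate(1000, sum(sample(d, size = 3, replace = TRUE)))

# Simulated proportions for each possible sum 3, ..., 18
# (factor levels keep any value that happens not to occur, with count 0)
sim = as.vector(table(factor(S_sim, levels = 3:18))) / 1000

# Exact proportions from the box of all 216 possible sums
exact = as.vector(table(rowSums(expand.grid(d, d, d)))) / 216

comparison = rbind(simulated = sim, exact = exact)
colnames(comparison) = 3:18
round(comparison, 3)

max(abs(sim - exact))   # largest discrepancy between the two rows
```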
Averages
Xbar = S/3
mean(Xbar)
## [1] 3.492
sd(Xbar)
## [1] 1.004684
popsd(Xbar)
## [1] 1.004181
hist(Xbar, probability = TRUE, breaks = br/3)
Same shape as for the sums, but centred on 3.5 and less spread-out.
Closing remarks: 𝑛 getting larger
· We have seen that for $n$ random draws (with replacement) from a box with mean
$\mu$ and SD $\sigma$
- the sum of draws $S$ has $E(S) = n\mu$ and $SE(S) = \sigma\sqrt{n}$ ;
- the average of the draws $\bar{X}$ has $E(\bar{X}) = \mu$ and $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$ .
· What happens to the SE of each as $n$ gets bigger?
- for the sum, $\sigma\sqrt{n}$ gets larger but
- for the average, $\frac{\sigma}{\sqrt{n}}$ gets smaller.
· In particular, for the average 𝑋¯ , the random variability about 𝐸(𝑋¯ ) = 𝜇 gets less
as the sample size 𝑛 increases.
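To see the two trends numerically, the standard errors can be tabulated for a few sample sizes. This sketch uses the die box ($\sigma = \sqrt{35/12}$) as an example; the particular values of $n$ are arbitrary.

```r
sigma = sqrt(35 / 12)          # SD of the box 1 2 3 4 5 6
n = c(1, 4, 16, 64, 256)       # arbitrary sample sizes

se_table = data.frame(
  n = n,
  SE_sum = sigma * sqrt(n),    # grows like sqrt(n)
  SE_avg = sigma / sqrt(n)     # shrinks like 1 / sqrt(n)
)
se_table
```

Each time $n$ is multiplied by 4, the SE of the sum doubles and the SE of the average halves.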