
The Box Model

Sampling Data | Chance Variability

© University of Sydney MATH1062/MATH1005


12 September 2024
Course Overview

[Figure: course overview diagram relating Population and Sample to the four modules: 1 Exploring Data, 2 Modelling Data, 3 Sampling Data, 4 Decisions with Data]

Module 3: Sampling Data

· Understanding probability
- What is probability?
· Counting and chance simulation
- How to count the number of possible outcomes?
· Chance variability
- How can we model chance variability by a box model?
· Central limit theorem
- What is the behaviour of the sample mean for a large sample size?

Today’s outline
· The box model
· Random draws, sample sums and sample means
· Expected value and standard error
The box model
Statistical models
· A model is a representation of something which
- is simpler but at the same time
- captures the key features of the original.
· Data obtained in real life is generated by complicated processes.
· Statistical models are models for data-generating processes:
- they are much simpler than the real data-generating process but
- (hopefully) they capture the signal or key features within the data.

The box model
The box model is a very simple statistical model for representing a population. The
box model can be thought of as:

· A collection of 𝑁 objects (e.g. tickets or balls) is imagined to be “in a box”.
· Each object bears a number.
· A random sample of a certain number 𝑛 of the objects is taken.
· The sampling may be with or without replacement.

Random samples and random draws
· Consider all possible ways of selecting 𝑛 objects from the box. A random sample is one in which each of these possible selections is equally likely.
· A random draw is a random sample with 𝑛 = 1.
- If a single draw is taken, then each object in the box has an equal chance of
being picked.
· If we completely know the contents of the box, we can write down the chance of
each possible value.
· We let 𝑋 denote the random draw:
- this represents the “value we might get”
- 𝑋 can take different values with different probabilities/chances.
· The distribution of 𝑋 is a table with two “columns”:
- each possible value 𝑥 that 𝑋 can take and
- the corresponding probability/chance of that value.
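A random draw can be imitated in R with `sample()`. The following is a minimal sketch that assumes the box 1 2 3 from the next slide. The seed and the number of repeated draws (10,000) are arbitrary choices for illustration and do not come from the lecture. Over many repeated draws, each proportion should settle near 1/3.

```r
box = c(1, 2, 3)   # contents of the box
set.seed(1)        # arbitrary seed, only so that the output is reproducible

# a single random draw X: each ticket has the same chance of being picked
X = sample(box, size = 1)
X

# repeat the draw many times (with replacement); each proportion
# should come out close to 1/3
draws = sample(box, size = 10000, replace = TRUE)
props = table(draws) / length(draws)
props
```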
Simple example
· For example, suppose 𝑋 is a random draw from the following simple box:

1 2 3

· There are then three possible tickets, 1, 2 and 3, and each has an equal chance of 1/3 of being picked, so:

$$P(X = 1) = P(X = 2) = P(X = 3) = \frac{1}{3}.$$

Here we write 𝑃(⋅) to denote the probability of each event.

· The distribution of 𝑋:

  x          1    2    3
  P(X = x)  1/3  1/3  1/3
Non-equal chance example
· We can have box models where the different possible values are not necessarily
equally likely.
· For the box

1 2 2 3 3 3

if each “ticket” is equally likely, we have

$$P(X = 1) = \frac{1}{6}, \quad P(X = 2) = \frac{2}{6} = \frac{1}{3}, \quad P(X = 3) = \frac{3}{6} = \frac{1}{2}.$$

· The distribution of 𝑋:

  x          1    2    3
  P(X = x)  1/6  1/3  1/2
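R can produce the distribution table directly from the contents of the box. In this sketch, `prop.table(table(...))` turns the counts of each value into proportions, which are the probabilities 𝑃(𝑋 = 𝑥) for a single random draw.

```r
box = c(1, 2, 2, 3, 3, 3)

# P(X = x) for one random draw = proportion of tickets in the box showing x
dist = prop.table(table(box))
dist
```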
Larger box example
Consider the box defined by the file y.dat in the R code below:

box = scan("y.dat")
box

## [1] 3 4 5 6 7 8 4 5 6 7 8 9 5 6 7 8 9 10 6 7 8 9 10 11 7 8 9 10 11 12
## [31] 8 9 10 11 12 13 4 5 6 7 8 9 5 6 7 8 9 10 6 7 8 9 10 11 7 8 9 10 11 12
## [61] 8 9 10 11 12 13 9 10 11 12 13 14 5 6 7 8 9 10 6 7 8 9 10 11 7 8 9 10 11 12
## [91] 8 9 10 11 12 13 9 10 11 12 13 14 10 11 12 13 14 15 6 7 8 9 10 11 7 8 9 10 11 12
## [121] 8 9 10 11 12 13 9 10 11 12 13 14 10 11 12 13 14 15 11 12 13 14 15 16 7 8 9 10 11 12
## [151] 8 9 10 11 12 13 9 10 11 12 13 14 10 11 12 13 14 15 11 12 13 14 15 16 12 13 14 15 16 17
## [181] 8 9 10 11 12 13 9 10 11 12 13 14 10 11 12 13 14 15 11 12 13 14 15 16 12 13 14 15 16 17
## [211] 13 14 15 16 17 18

table(box) # output rows: the variable name, the distinct values, then their frequencies

## box
## 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## 1 3 6 10 15 21 25 27 27 25 21 15 10 6 3 1
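If y.dat is not at hand, a box with the same contents can be generated directly. This assumes, as the lecture states later, that y.dat lists all 6 × 6 × 6 = 216 possible sums of three rolls of a six-sided die. It is a sketch, not the file used in the lecture. Nesting `outer()` happens to list the sums in the same order as the printout above.

```r
d = 1:6
# 6 x 6 x 6 array whose [i, j, k] entry is i + j + k; flattening it column by
# column lists the 216 sums with the first roll varying fastest
box = as.vector(outer(outer(d, d, "+"), d, "+"))
length(box)
table(box)
```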

Find the probability 𝑃 (𝑋 < 8)
sum(table(box)) # gives total freq, i.e. size of the box

## [1] 216

length(box) == sum(table(box))

## [1] TRUE

head(box < 8) # reports the first 6 values of 'box<8'

## [1] TRUE TRUE TRUE TRUE TRUE FALSE

sum(box < 8) # counts the TRUE values in 'box < 8'

## [1] 35

Find the proportion less than 8
sum(box < 8)/length(box)

## [1] 0.162037

mean(box < 8) # mean of TRUEs in 'box<8'

## [1] 0.162037

· The chance of drawing a value less than 8 is 35/216 ≈ 16%.
· Note: 35 = 1 + 3 + 6 + 10 + 15 (the frequencies of 3, 4, 5, 6 and 7, respectively).
Expected value and standard error
· In some situations, we may not know the exact contents of the box. Indeed, boxes
are used to model populations and we might not know everything about the
population.
· Instead we might have access to summary information about the box.
· For a random draw 𝑋 from a box, we define the following two quantities:
- The expected value 𝐸(𝑋) is the mean of the box.
- The standard error 𝑆𝐸(𝑋) is the standard deviation (SD) of the box.
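As a small worked example, here are both quantities for the non-equal box 1 2 2 3 3 3 from earlier. This is a sketch that computes the SD by dividing by the number of tickets 𝑁. R's built-in `sd()` divides by 𝑁 − 1, which is why it is not used here.

```r
box = c(1, 2, 2, 3, 3, 3)

EX = mean(box)                   # expected value E(X): the mean of the box
SEX = sqrt(mean((box - EX)^2))   # standard error SE(X): the SD of the box,
                                 # dividing by N (sd() would divide by N - 1)
c(EX = EX, SEX = SEX)            # 7/3 and sqrt(5/9), about 2.333 and 0.745
```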
Interpreting the expected value 𝐸(𝑋)
· The random draw may be “decomposed” into two pieces:

𝑋 = 𝐸(𝑋) + [𝑋 − 𝐸(𝑋)] = 𝐸(𝑋) + 𝜀 .


· The first part 𝐸(𝑋) is not random.
· All the randomness is in the chance error 𝜀, which can itself be represented by a random draw from an error box (a box with mean zero).
· Example: a random draw 𝑋 from the box

1 2 3

(which has mean 2) may instead be thought of as 𝑋 = 2 + 𝜀, where the chance error 𝜀 is a random draw from the error box

−1 0 +1 .
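The error box can be obtained in R by subtracting the mean from every ticket. This minimal sketch uses the box 1 2 3 from the example above.

```r
box = c(1, 2, 3)
EX = mean(box)         # E(X) = 2, the non-random part
error_box = box - EX   # chance errors: -1, 0, 1
error_box
mean(error_box)        # 0: an error box has mean zero
```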
Interpreting the standard error 𝑆𝐸(𝑋)
· The standard error measures the typical size of the chance error 𝜀. It is a measure of the random variation in the outcome of 𝑋.
· Of two different random draws, the one with the larger SE is likely to differ from its expected value by a larger amount.
· The standard error is the root-mean-square (RMS) of the error box. For the box 1 2 3:

$$SE(X) = SD(\text{box}) = \sqrt{\frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3}} \approx 0.816$$

$$SE(X) = RMS(\text{error box}) = \sqrt{\frac{(-1)^2 + 0^2 + 1^2}{3}} \approx 0.816$$
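The two routes to the SE can be checked numerically with a short sketch. One route is the SD of the box. The other is the RMS of the error box. Both should come out at the square root of 2/3.

```r
box = c(1, 2, 3)
error_box = box - mean(box)

sd_box = sqrt(mean((box - mean(box))^2))  # SD of the box (dividing by N)
rms_error = sqrt(mean(error_box^2))       # root-mean-square of the error box
c(sd_box, rms_error)                      # both sqrt(2/3), about 0.816
```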
Sums of random draws
New interpretation of mean and SD
· We have introduced the concepts of
- a random draw 𝑋 from a box
- its expected value 𝐸(𝑋)
- its standard error 𝑆𝐸(𝑋)

· The expected value and standard error are not “new” things; rather, they are new interpretations of old things.
- Is it really “worth the effort” to introduce new names for things we already know about?
- The expected value and standard error become very useful when we have more than one draw.
Sum of two random draws
· Consider the two boxes

1 2 3 and 2 4 6 8 .

- The first box has mean 2 and SD $\sqrt{\tfrac{1}{3}\left[(-1)^2 + 0^2 + 1^2\right]} = \sqrt{\tfrac{2}{3}} \approx 0.816$.
- The second box has mean 5 and SD $\sqrt{\tfrac{1}{4}\left[(-3)^2 + (-1)^2 + 1^2 + 3^2\right]} = \sqrt{5} \approx 2.236$.

· Suppose we are going to take a random draw from each, 𝑋 from the first box, 𝑌
from the second box, in such a way that each possible pair of values is equally
likely. What is the behaviour of the (random) sum 𝑆 = 𝑋 + 𝑌 ?

All possible pairs/sums
· There are 12 possible pairs:

(1, 2), (1, 4), (1, 6), (1, 8),
(2, 2), (2, 4), (2, 6), (2, 8),
(3, 2), (3, 4), (3, 6), (3, 8).
Table of all possible pairs and their sums
Pair (𝑋, 𝑌)  Sum 𝑆 = 𝑋 + 𝑌
(1,2) 3
(1,4) 5
(1,6) 7
(1,8) 9
(2,2) 4
(2,4) 6
(2,6) 8
(2,8) 10
(3,2) 5
(3,4) 7
(3,6) 9
(3,8) 11

Single random draw from a “bigger” box
Thus getting a random pair (𝑋, 𝑌 ) and forming the sum 𝑆 = 𝑋 + 𝑌 is equivalent
to a single random draw from the bigger box

3 4 5 5 6 7 7 8 9 9 10 11

What are the mean and SD of this “bigger” box?

Using outer()
· The R function outer() forms a two-way array by applying an operation to each
pair of elements from two vectors:

bx = c(1, 2, 3)
by = c(2, 4, 6, 8)
bs = outer(bx, by, "+")
bs

##      [,1] [,2] [,3] [,4]
## [1,]    3    5    7    9
## [2,]    4    6    8   10
## [3,]    5    7    9   11

mean(bs)

## [1] 7

mean((bs - mean(bs))^2) # mean squared deviation, i.e. the squared SD of the bigger box

## [1] 5.666667

Expected value and standard error of the sum
So we have that 𝐸(𝑆) = 7 and $SE(S) = \sqrt{5\tfrac{2}{3}} \approx 2.38$.

· As it turns out,

$$7 = E(S) = E(X + Y) = E(X) + E(Y) = 2 + 5,$$

$$5\tfrac{2}{3} = SE(S)^2 = SE(X + Y)^2 = SE(X)^2 + SE(Y)^2 = \tfrac{2}{3} + 5.$$

· So in this case we have
- the expected value of the sum is the sum of the expected values;
- the squared SE of the sum is the sum of the squared SEs.
· These results hold in general.
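Both rules can be checked numerically for this example. The sketch below reuses the boxes `bx` and `by` from the `outer()` slide. `popsd()` is a helper defined here from the definition of the SD (dividing by 𝑁). It gives the same values as the `popsd()` defined later in the lecture.

```r
popsd = function(x) sqrt(mean((x - mean(x))^2))   # population SD (divide by N)

bx = c(1, 2, 3)
by = c(2, 4, 6, 8)
bs = outer(bx, by, "+")   # the "bigger" box of all 12 equally likely sums

# expected value of the sum vs. sum of expected values: 7 and 7
c(mean(bs), mean(bx) + mean(by))
# squared SE of the sum vs. sum of squared SEs: 17/3 and 17/3
c(popsd(bs)^2, popsd(bx)^2 + popsd(by)^2)
```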

Sum of two random draws
· Consider two boxes

$x_1\ x_2\ \cdots\ x_M$ and $y_1\ y_2\ \cdots\ y_N$

· Suppose we are going to take a random draw from each: 𝑋 from the first box, 𝑌
from the second box, in such a way that each possible pair of values is equally
likely.

All possible sums
· There are 𝑀𝑁 possible sums; we may arrange them in a two-way array with 𝑀 (horizontal) rows and 𝑁 (vertical) columns.

· Noting that $\sum_{i=1}^{M} x_i = M\bar{x}$, we may write the column sums below the line:

$$\begin{array}{cccc}
x_1 + y_1 & x_1 + y_2 & \cdots & x_1 + y_N \\
x_2 + y_1 & x_2 + y_2 & \cdots & x_2 + y_N \\
\vdots & \vdots & \ddots & \vdots \\
x_M + y_1 & x_M + y_2 & \cdots & x_M + y_N \\
\hline
M\bar{x} + My_1 & M\bar{x} + My_2 & \cdots & M\bar{x} + My_N
\end{array}$$
The sum of the column sums is

$$\underbrace{M\bar{x} + \cdots + M\bar{x}}_{N \text{ terms}} + M(y_1 + \cdots + y_N) = NM\bar{x} + MN\bar{y}.$$

Thus the average of all possible sums is

$$\frac{\text{sum of all possible sums}}{\text{no. of all possible sums}} = \frac{NM\bar{x} + MN\bar{y}}{MN} = \bar{x} + \bar{y} = E(X) + E(Y).$$

That is,

$$E(X + Y) = E(X) + E(Y).$$
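The argument can be checked on a concrete pair of boxes. The box contents below are arbitrary example values, with 𝑀 = 3 and 𝑁 = 4. `outer()` builds the 𝑀 × 𝑁 array of sums. Each column sum should equal $M\bar{x} + My_j$, and the overall average should equal $\bar{x} + \bar{y}$.

```r
x = c(2, 5, 11)          # first box, M = 3 (arbitrary example values)
y = c(1, 1, 4, 10)       # second box, N = 4 (arbitrary example values)
M = length(x); N = length(y)

sums = outer(x, y, "+")  # M x N array: row i, column j holds x_i + y_j

# each column sum equals M * mean(x) + M * y_j
colSums(sums)
M * mean(x) + M * y

# the average of all MN sums equals mean(x) + mean(y)
c(mean(sums), mean(x) + mean(y))
```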
Computing formula for SD
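· For a list of numbers $x_1, x_2, \ldots, x_M$, the square of the SD may be written as

$$SD^2 = \frac{1}{M}\sum_{i=1}^{M}(x_i - \bar{x})^2 = \left(\frac{1}{M}\sum_{i=1}^{M} x_i^2\right) - \bar{x}^2,$$

the “mean square minus the square of the mean”.

· To see why, recall that $\sum_{i=1}^{M} x_i = M\bar{x}$ and so:

$$\begin{aligned}
\sum_{i=1}^{M}(x_i - \bar{x})^2 &= (x_1^2 - 2\bar{x}x_1 + \bar{x}^2) + \cdots + (x_M^2 - 2\bar{x}x_M + \bar{x}^2) \\
&= (x_1^2 + \cdots + x_M^2) - 2\bar{x}(x_1 + \cdots + x_M) + \underbrace{\bar{x}^2 + \cdots + \bar{x}^2}_{M \text{ terms}} \\
&= \sum_{i=1}^{M} x_i^2 - 2\bar{x}M\bar{x} + M\bar{x}^2 = \sum_{i=1}^{M} x_i^2 - M\bar{x}^2.
\end{aligned}$$

Dividing both sides by 𝑀 gives the computing formula above.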
Easy way to compute SD in R
· The computing formula above can be used to write a quick-and-easy R function to
compute the (population) SD of a list of numbers.

popsd = function(x) sqrt(mean(x^2) - (mean(x)^2))

· Let’s try it out:

x = 1:10
x # this list has mean 5.5

## [1] 1 2 3 4 5 6 7 8 9 10

sqrt(mean((x - 5.5)^2))

## [1] 2.872281

popsd(x)

## [1] 2.872281
SE of a sum (not examinable)
· It is possible to deduce the SE of our general sum 𝑆 = 𝑋 + 𝑌 .
· We do so by first working out the mean-square of the bigger box of all possible
sums.
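· Write each squared sum $(x_i + y_j)^2 = x_i^2 + 2x_iy_j + y_j^2$ in an array and add down each column:

$$\begin{array}{ccc}
x_1^2 + 2x_1y_1 + y_1^2 & \cdots & x_1^2 + 2x_1y_N + y_N^2 \\
x_2^2 + 2x_2y_1 + y_1^2 & \cdots & x_2^2 + 2x_2y_N + y_N^2 \\
\vdots & \ddots & \vdots \\
x_M^2 + 2x_My_1 + y_1^2 & \cdots & x_M^2 + 2x_My_N + y_N^2 \\
\hline
\sum_i x_i^2 + 2M\bar{x}y_1 + My_1^2 & \cdots & \sum_i x_i^2 + 2M\bar{x}y_N + My_N^2
\end{array}$$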
SE of a sum (not examinable)
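· The sum of squares (of all possible sums) is then

$$\begin{aligned}
&\Big(\sum_i x_i^2 + 2M\bar{x}y_1 + My_1^2\Big) + \cdots + \Big(\sum_i x_i^2 + 2M\bar{x}y_N + My_N^2\Big) \\
&\quad = N\sum_i x_i^2 + 2M\bar{x}(y_1 + \cdots + y_N) + M(y_1^2 + \cdots + y_N^2) \\
&\quad = N\sum_i x_i^2 + 2MN\bar{x}\bar{y} + M\sum_j y_j^2.
\end{aligned}$$

· Since there are 𝑀𝑁 possible sums, the mean square is

$$\frac{1}{M}\sum_i x_i^2 + 2\bar{x}\bar{y} + \frac{1}{N}\sum_j y_j^2.$$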
SE of a sum (not examinable)
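· Since the mean of all possible sums is $\bar{x} + \bar{y}$, the squared SD of all possible sums is

$$\begin{aligned}
SE(S)^2 &= \underbrace{\frac{1}{M}\sum_i x_i^2 + 2\bar{x}\bar{y} + \frac{1}{N}\sum_j y_j^2}_{\text{mean sq.}} - \underbrace{(\bar{x}^2 + 2\bar{x}\bar{y} + \bar{y}^2)}_{\text{sq. of mean}} \\
&= \frac{1}{M}\sum_i x_i^2 - \bar{x}^2 + \frac{1}{N}\sum_j y_j^2 - \bar{y}^2 \\
&= \frac{1}{M}\sum_i (x_i - \bar{x})^2 + \frac{1}{N}\sum_j (y_j - \bar{y})^2 \\
&= SE(X)^2 + SE(Y)^2.
\end{aligned}$$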
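Both intermediate results can be checked numerically. The first is the mean square of all sums. The second is the final identity for the squared SE. The sketch uses the same arbitrary example boxes as in the check of the column sums. `popsd()` is a helper defined here (dividing by 𝑁).

```r
popsd = function(x) sqrt(mean((x - mean(x))^2))   # population SD (divide by N)

x = c(2, 5, 11)          # arbitrary first box (M = 3)
y = c(1, 1, 4, 10)       # arbitrary second box (N = 4)
sums = outer(x, y, "+")  # all MN equally likely sums

# mean square of all sums vs. (1/M) sum x_i^2 + 2 xbar ybar + (1/N) sum y_j^2
c(mean(sums^2), mean(x^2) + 2 * mean(x) * mean(y) + mean(y^2))
# SE(S)^2 vs. SE(X)^2 + SE(Y)^2
c(popsd(sums)^2, popsd(x)^2 + popsd(y)^2)
```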
Random samples with replacement of size 𝑛 =2
· A special case of our general sum is where we have a single box

$x_1\ x_2\ \cdots\ x_N$

but take two random draws with replacement.

- This means each of the 𝑁² possible pairs $(x_1, x_1), \ldots, (x_1, x_N), \ldots, (x_N, x_1), \ldots, (x_N, x_N)$ is equally likely.
· This is the case where both boxes are (effectively) the same, so 𝐸(𝑋) = 𝐸(𝑌) and 𝑆𝐸(𝑋) = 𝑆𝐸(𝑌).
· If we write the mean of the box as 𝜇 and the SD of the box as 𝜎, then the sum 𝑆 of the two random draws has
- $E(S) = E(X) + E(Y) = \mu + \mu = 2\mu$;
- $SE(S)^2 = SE(X)^2 + SE(Y)^2 = \sigma^2 + \sigma^2 = 2\sigma^2 \implies SE(S) = \sqrt{2}\,\sigma$.
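Two draws with replacement from one box amount to applying `outer()` to the box with itself. This sketch assumes the box 1 2 3, where 𝜇 = 2 and 𝜎² = 2/3. `popsd()` is a helper defined here (dividing by 𝑁).

```r
popsd = function(x) sqrt(mean((x - mean(x))^2))   # population SD (divide by N)

box = c(1, 2, 3)                  # mu = 2, sigma^2 = 2/3
mu = mean(box); sigma = popsd(box)

pair_sums = outer(box, box, "+")  # all N^2 = 9 equally likely ordered pairs
c(mean(pair_sums), 2 * mu)               # both 4
c(popsd(pair_sums), sqrt(2) * sigma)     # both sqrt(4/3), about 1.155
```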

Sums and averages of random samples of size 𝑛
Random samples of size 𝑛
· We may easily extend the results to any 𝑛 ≥ 2. Suppose:
- we have a box with mean 𝜇 and SD 𝜎 ;
- we are going to take a random sample of size 𝑛 from the box with
replacement;
- so each possible sample of size 𝑛 is equally likely.
· Let us write
- the random draws as 𝑋1 , 𝑋2 , … , 𝑋𝑛 ;
- the sum as 𝑆 = 𝑋1 + ⋯ + 𝑋𝑛 ;
- the sample average as $\bar{X} = \frac{S}{n} = \frac{1}{n}(X_1 + \cdots + X_n) = \frac{1}{n}\sum_{i=1}^{n} X_i$.
· What are the expected value and standard error of both 𝑆 and $\bar{X}$?
The sum 𝑆
· Each single draw has the same behaviour. That is, each of $X_1, \ldots, X_n$ is a single random draw from the same box, with $E(X_i) = \mu$ and $SE(X_i) = \sigma$.
· The expected value of the sum is the sum of the expected values:

$$E(S) = E(X_1 + \cdots + X_n) = E(X_1 + \cdots + X_{n-1}) + E(X_n) = \cdots = E(X_1) + \cdots + E(X_n) = \underbrace{\mu + \cdots + \mu}_{n \text{ terms}} = n\mu.$$

· Also,

$$SE(S)^2 = SE(X_1 + \cdots + X_{n-1})^2 + SE(X_n)^2 = \cdots = SE(X_1)^2 + \cdots + SE(X_n)^2 = \underbrace{\sigma^2 + \cdots + \sigma^2}_{n \text{ terms}} = n\sigma^2$$

$$\implies SE(S) = \sqrt{n}\,\sigma.$$
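For a small box, every one of the 𝑁ⁿ equally likely ordered samples can be enumerated and the formulas checked exactly. `all_sums()` is a hypothetical helper written for this sketch. It uses base R's `expand.grid()` to list the samples. The example assumes the box 1 2 3 with 𝑛 = 4, giving 81 samples.

```r
popsd = function(x) sqrt(mean((x - mean(x))^2))   # population SD (divide by N)

# Hypothetical helper: the box of all N^n equally likely sums of n draws
# with replacement from `box` (only practical for small N and n).
all_sums = function(box, n) {
  grid = expand.grid(rep(list(box), n))  # one row per ordered sample
  rowSums(grid)
}

box = c(1, 2, 3)
n = 4
S = all_sums(box, n)                   # 3^4 = 81 sums
c(mean(S), n * mean(box))              # E(S) = n * mu = 8
c(popsd(S), sqrt(n) * popsd(box))      # SE(S) = sqrt(n) * sigma
```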
What if we divide by 𝑁 ?
· Consider the box

$x_1\ x_2\ \ldots\ x_N$

What are the expected value and standard error of a random draw if we divide each $x_i$ by 𝑁?
· This gives us a new box

$y_1\ y_2\ \ldots\ y_N$

where $y_i = \dfrac{x_i}{N}$.
What if we divide by 𝑁 ?
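· If 𝑌 is a random draw from this new box, then we can work out 𝐸(𝑌) as:

$$E(Y) = \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i = \frac{1}{N}\sum_{i=1}^{N} \frac{x_i}{N} = \frac{1}{N}\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right) = \frac{\bar{x}}{N} = \frac{E(X)}{N}.$$

· We can also work out the standard error:

$$SE(Y)^2 = \frac{1}{N}\sum_{i=1}^{N}(y_i - \bar{y})^2 = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i}{N} - \frac{\bar{x}}{N}\right)^2 = \frac{1}{N^2}\cdot\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2 = \frac{SE(X)^2}{N^2}$$

$$\implies SE(Y) = \frac{SE(X)}{N}.$$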
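Here is a quick numerical check that dividing every ticket by 𝑁 divides both the mean and the SD by 𝑁. This sketch assumes the box 2 4 6 8 from earlier, which has mean 5 and SD √5. `popsd()` is a helper defined here (dividing by 𝑁).

```r
popsd = function(x) sqrt(mean((x - mean(x))^2))   # population SD (divide by N)

x = c(2, 4, 6, 8)          # example box from earlier: mean 5, SD sqrt(5)
N = length(x)              # N = 4
y = x / N                  # new box with y_i = x_i / N

c(mean(y), mean(x) / N)    # both 1.25
c(popsd(y), popsd(x) / N)  # both sqrt(5) / 4, about 0.559
```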
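The sample average $\bar{X}$
· The sample average $\bar{X}$ is just $S/n$. Dividing every value in a box by the same number divides both its mean and its SD by that number, so we can immediately work out the expected value and standard error of $\bar{X}$ from those of 𝑆.
· For the expected value,

$$E(\bar{X}) = \frac{E(S)}{n} = \frac{n\mu}{n} = \mu;$$

· As for the standard error, we have

$$SE(\bar{X}) = \frac{SE(S)}{n} = \frac{\sigma\sqrt{n}}{n} = \frac{\sigma}{\sqrt{n}}.$$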
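The same enumeration idea checks the formulas for the sample average. This sketch assumes the box 1 2 3 with 𝑛 = 4, uses base R's `expand.grid()` to list all 81 ordered samples, and takes the average of each one. `popsd()` is a helper defined here (dividing by 𝑁).

```r
popsd = function(x) sqrt(mean((x - mean(x))^2))   # population SD (divide by N)

box = c(1, 2, 3)                        # mu = 2, sigma = sqrt(2/3)
n = 4
grid = expand.grid(rep(list(box), n))   # all 3^4 = 81 equally likely samples
Xbar = rowMeans(grid)                   # sample average of each sample

c(mean(Xbar), mean(box))                # E(Xbar) = mu = 2
c(popsd(Xbar), popsd(box) / sqrt(n))    # SE(Xbar) = sigma / sqrt(n), about 0.408
```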
Example: 6-sided die
· Consider rolling a fair 6-sided die.
· In this case each of the numbers 1, 2, 3, 4, 5, 6 is equally likely.
· This is equivalent to a random draw from the box

1 2 3 4 5 6

· The mean is 𝜇 = 3.5 = 7/2, the mean square is (1 + 4 + 9 + 16 + 25 + 36)/6 = 91/6, and thus the SD is

$$\sigma = \sqrt{\frac{91}{6} - \left(\frac{7}{2}\right)^2} = \sqrt{\frac{91}{6} - \frac{49}{4}} = \sqrt{\frac{182 - 147}{12}} = \sqrt{\frac{35}{12}} \approx 1.71.$$
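Here are the die's 𝜇 and 𝜎 in R, computed with the "mean square minus square of the mean" formula as in the hand calculation above. This is a sketch for checking the arithmetic.

```r
die = 1:6
mu = mean(die)                     # 3.5
mean_square = mean(die^2)          # 91/6
sigma = sqrt(mean_square - mu^2)   # mean square minus square of the mean
c(mu = mu, mean_square = mean_square, sigma = sigma)  # sigma = sqrt(35/12), about 1.708
```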
Rolling the die 3 times: sum of rolls
· Suppose we roll the die (independently) 3 times. What is the random behaviour of
the sum of the values of the three rolls?
· Let 𝑋1 , 𝑋2 , 𝑋3 denote 3 random draws with replacement from the box

1 2 3 4 5 6

· Then the sum of the 3 rolls $S = X_1 + X_2 + X_3$ has $E(S) = 3\mu = \frac{21}{2} = 10.5$ and

$$SE(S) = \sigma\sqrt{3} = \sqrt{\frac{35}{12} \times 3} = \sqrt{\frac{35}{4}} = \frac{\sqrt{35}}{2} \approx 2.958.$$
· The box of all possible sums here is exactly the dataset y.dat from earlier in the
lecture!
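Enumerating all 216 equally likely outcomes confirms both values exactly, with no simulation involved. This sketch rebuilds the box of sums with nested `outer()` calls. `popsd()` is a helper defined here (dividing by 𝑁).

```r
popsd = function(x) sqrt(mean((x - mean(x))^2))   # population SD (divide by N)

d = 1:6
S = as.vector(outer(outer(d, d, "+"), d, "+"))   # all 216 sums of three rolls

c(mean(S), 3 * mean(d))           # E(S) = 3 * mu = 10.5
c(popsd(S), sqrt(3) * popsd(d))   # SE(S) = sqrt(3) * sigma = sqrt(35)/2, about 2.958
```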

Rolling the die 3 times: average of rolls
· What is the random behaviour of the average of the values of the three rolls?
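· Writing $\bar{X} = \frac{X_1 + X_2 + X_3}{3} = \frac{S}{3}$, we have

$$E(\bar{X}) = \frac{E(S)}{3} = \frac{3\mu}{3} = \mu = 3.5$$

and

$$SE(\bar{X}) = \frac{\sigma}{\sqrt{3}} = \sqrt{\frac{35}{12} \times \frac{1}{3}} = \sqrt{\frac{35}{36}} = \frac{\sqrt{35}}{6} \approx 0.986.$$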
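Dividing each of the 216 enumerated sums by 3 gives the box of all possible averages. Its mean and SD match the exact values above. This is a sketch, and `popsd()` is a helper defined here (dividing by 𝑁).

```r
popsd = function(x) sqrt(mean((x - mean(x))^2))   # population SD (divide by N)

d = 1:6
S = as.vector(outer(outer(d, d, "+"), d, "+"))   # all 216 equally likely sums
Xbar = S / 3                                     # the corresponding averages

mean(Xbar)    # E(Xbar) = 3.5
popsd(Xbar)   # SE(Xbar) = sqrt(35)/6, about 0.986
```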
Demonstration
· Let us simulate 3 rolls of a 6-sided die 1000 times, and look at the corresponding
1000 sums and averages of each triplet.

d = 1:6
S = numeric(1000) # vector to store the 1000 sums (no seed is set, so results vary between runs)
for (i in 1:1000) {
  rolls = sample(d, size = 3, replace = TRUE) # three rolls of a fair die
  S[i] = sum(rolls)
}
mean(S)

## [1] 10.476

sd(S)

## [1] 3.014052

popsd(S)

## [1] 3.012544

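br = seq(2.5, 18.5, by = 1) # assumed breaks (not shown on the slide): one bin per possible sum 3, ..., 18
hist(S, freq = FALSE, breaks = br)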

Note these proportions are close to (but not exactly equal to) the corresponding
proportions in y.dat .

Averages
Xbar = S/3
mean(Xbar)

## [1] 3.492

sd(Xbar)

## [1] 1.004684

popsd(Xbar)

## [1] 1.004181

hist(Xbar, freq = FALSE, breaks = br/3)

Same shape as for the sums, but centred on 3.5 and less spread-out.

Closing remarks: 𝑛 getting larger
· We have seen that for 𝑛 random draws (with replacement) from a box with mean 𝜇 and SD 𝜎:
- the sum of the draws 𝑆 has 𝐸(𝑆) = 𝑛𝜇 and 𝑆𝐸(𝑆) = 𝜎√𝑛;
- the average of the draws $\bar{X}$ has $E(\bar{X}) = \mu$ and $SE(\bar{X}) = \sigma/\sqrt{n}$.
· What happens to the SE of each as 𝑛 gets bigger?
- for the sum, 𝜎√𝑛 gets larger, but
- for the average, 𝜎/√𝑛 gets smaller.

· In particular, for the average $\bar{X}$, the random variability about $E(\bar{X}) = \mu$ gets smaller as the sample size 𝑛 increases.
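Tabulating the two formulas for a range of sample sizes shows the contrast. This sketch assumes the die box (𝜎 = √(35/12)). The sample sizes are arbitrary illustrative choices.

```r
sigma = sqrt(35 / 12)      # SD of the die box, from the example above
n = c(1, 3, 10, 30, 100)   # a few illustrative sample sizes

se_sum = sigma * sqrt(n)   # SE of the sum: grows like sqrt(n)
se_avg = sigma / sqrt(n)   # SE of the average: shrinks like 1/sqrt(n)
round(data.frame(n, se_sum, se_avg), 3)
```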

