The Box Model

Sampling Data | Chance Variability

© University of Sydney MATH1062/MATH1005

12 September 2024
Course Overview

[Diagram: the four course modules (1 Exploring Data, 2 Modelling Data, 3 Sampling Data, 4 Decisions with Data) arranged around Population and Sample]

Module 3: Sampling Data

· Understanding probability
- What is probability?
· Counting and chance simulation
- How to count the number of possible outcomes?
· Chance variability
- How can we model chance variability by a box model?
· Central limit theorem
- What is the behaviour of the sample mean for a large sample size?

Today’s outline

· The box model
· Random draws, sample sums and sample means
· Expected value and standard error

The box model
Statistical models
· A model is a representation of something which
- is simpler but at the same time
- captures the key features of the original.
· Data obtained in real life is generated by complicated processes.
· Statistical models are models for data-generating processes:
- they are much simpler than the real data-generating process but
- (hopefully) they capture the signal or key features within the data.

The box model
The box model is a very simple statistical model for representing a population. The
box model can be thought of as:

· A collection of 𝑁 objects (e.g. tickets or balls) is imagined “in a box”.

· Each object bears a number.
· A random sample of a certain number 𝑛 of the objects is taken.
· The sampling may be with or without replacement.
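
The sampling step can be sketched with the base R function sample(). This is only a minimal illustration: the box contents, the variable names and the seed below are made up for the example and are not part of the lecture.

```r
# Hypothetical box of N = 6 numbered tickets (values chosen only for illustration)
box_demo = c(1, 2, 2, 3, 3, 3)

set.seed(1)  # arbitrary seed, only so the draws can be reproduced

# n = 4 draws WITHOUT replacement: no ticket (position in the box) is picked twice
idx_without = sample(length(box_demo), size = 4, replace = FALSE)
draws_without = box_demo[idx_without]

# n = 10 draws WITH replacement: each ticket goes back after being drawn,
# so n may exceed the size of the box
draws_with = sample(box_demo, size = 10, replace = TRUE)

draws_without
draws_with
```

Sampling positions rather than values in the first call makes it visible that tickets, not numbers, are what cannot repeat: two different tickets may still carry the same number.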

Random samples and random draws
· Consider all possible ways of selecting 𝑛 objects from the box. A random sample
is one where each of these possible selections is equally likely.
· A random draw is a random sample with 𝑛 = 1 .
- If a single draw is taken, then each object in the box has an equal chance of
being picked.
· If we completely know the contents of the box, we can write down the chance of
each possible value.
· We let 𝑋 denote the random draw:
- this represents the “value we might get”
- 𝑋 can take different values with different probabilities/chances.
· The distribution of 𝑋 is a table with two “columns”:
- each possible value 𝑥 that 𝑋 can take and
- the corresponding probability/chance of that value.
Simple example
· For example, suppose 𝑋 is a random draw from the following simple box:

1 2 3

· There are then three possible tickets: 1, 2 and 3, and each has an equal chance
of 1/3 of being picked, so:

$$P(X = 1) = P(X = 2) = P(X = 3) = \frac{1}{3} .$$

Here we write 𝑃(⋅) to denote the probability of each event.

· The distribution of 𝑋:

𝑥            1     2     3
𝑃(𝑋 = 𝑥)     1/3   1/3   1/3

Non-equal chance example
· We can have box models where the different possible values are not necessarily
equally likely.
· For the box

1 2 2 3 3 3

if each “ticket” is equally likely, we have

$$P(X = 1) = \frac{1}{6}, \quad P(X = 2) = \frac{2}{6} = \frac{1}{3}, \quad P(X = 3) = \frac{3}{6} = \frac{1}{2} .$$

· The distribution of 𝑋:

𝑥            1     2     3
𝑃(𝑋 = 𝑥)     1/6   1/3   1/2

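The distribution of a random draw is just the table of relative frequencies of the box. A minimal sketch in R for the box 1 2 2 3 3 3 (the names box6 and dist6 are introduced here only for illustration):

```r
# The box from the example: one 1, two 2s, three 3s
box6 = c(1, 2, 2, 3, 3, 3)

# Relative frequency of each distinct value = chance of drawing that value
dist6 = table(box6) / length(box6)
dist6

# Equivalent one-liner using prop.table()
prop.table(table(box6))
```

The chances come out as 1/6, 1/3 and 1/2, and they add up to 1 as any distribution must.
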
Larger box example
Consider the box defined by the file y.dat in the R code below:

box = scan("y.dat")
box

## [1] 3 4 5 6 7 8 4 5 6 7 8 9 5 6 7 8 9 10 6 7 8 9 10 11 7 8 9 10 11 12
## [31] 8 9 10 11 12 13 4 5 6 7 8 9 5 6 7 8 9 10 6 7 8 9 10 11 7 8 9 10 11 12
## [61] 8 9 10 11 12 13 9 10 11 12 13 14 5 6 7 8 9 10 6 7 8 9 10 11 7 8 9 10 11 12
## [91] 8 9 10 11 12 13 9 10 11 12 13 14 10 11 12 13 14 15 6 7 8 9 10 11 7 8 9 10 11 12
## [121] 8 9 10 11 12 13 9 10 11 12 13 14 10 11 12 13 14 15 11 12 13 14 15 16 7 8 9 10 11 12
## [151] 8 9 10 11 12 13 9 10 11 12 13 14 10 11 12 13 14 15 11 12 13 14 15 16 12 13 14 15 16 17
## [181] 8 9 10 11 12 13 9 10 11 12 13 14 10 11 12 13 14 15 11 12 13 14 15 16 12 13 14 15 16 17
## [211] 13 14 15 16 17 18

table(box) # note: first two rows below are only labels: the 'real' output is the third line

## box
## 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## 1 3 6 10 15 21 25 27 27 25 21 15 10 6 3 1

Find the probability 𝑃 (𝑋 < 8)
sum(table(box)) # gives total freq, i.e. size of the box

## [1] 216

length(box) == sum(table(box))

## [1] TRUE

head(box < 8) # reports the first 6 values of 'box<8'

## [1] TRUE TRUE TRUE TRUE TRUE FALSE

sum(box < 8) # reports the total number of TRUE values in 'box<8'

## [1] 35

Find the proportion less than 8
sum(box < 8)/length(box)

## [1] 0.162037

mean(box < 8) # mean of TRUEs in 'box<8'

## [1] 0.162037

· The chance of drawing a value less than 8 is 35/216 ≈ 16%.
· Note: 35 = 1 + 3 + 6 + 10 + 15 (the frequencies of 3, 4, 5, 6 and 7
respectively).

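If only the frequency table is available, rather than the file y.dat itself, the same chance can be recovered from the counts. A small sketch, with the values and frequencies copied from the table(box) output shown earlier (the names values, freq and p_less_8 are made up for the example):

```r
# Values and frequencies copied from the table(box) output
values = 3:18
freq = c(1, 3, 6, 10, 15, 21, 25, 27, 27, 25, 21, 15, 10, 6, 3, 1)

sum(freq)                  # size of the box
sum(freq[values < 8])      # number of tickets with a value below 8

p_less_8 = sum(freq[values < 8]) / sum(freq)
p_less_8                   # the chance P(X < 8)
```
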
Expected value and standard error
· In some situations, we may not know the exact contents of the box. Indeed, boxes
are used to model populations and we might not know everything about the
population.
· Instead we might have access to summary information about the box.
· For a random draw 𝑋 from a box, we define the following two quantities:
- We define the expected value 𝐸(𝑋) to be the mean of the box.
- We define the standard error 𝑆𝐸(𝑋) to be the standard deviation (SD) of the box.

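As a hedged illustration of these two definitions, the sketch below computes 𝐸(𝑋) and 𝑆𝐸(𝑋) for the box 1 2 2 3 3 3 from the non-equal chance example. The variable names are invented for the example; the SD of the box is the population SD, which divides by the number of tickets.

```r
# Box from the non-equal chance example
box6 = c(1, 2, 2, 3, 3, 3)

ev = mean(box6)                  # expected value E(X): mean of the box
se = sqrt(mean((box6 - ev)^2))   # standard error SE(X): SD of the box (divide by N)

ev   # 7/3
se   # sqrt(5)/3

# Caution: R's built-in sd() divides by N - 1, so it is not the SD of the box
sd(box6)
```
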
Interpreting the expected value 𝐸(𝑋)
· The random draw may be “decomposed” into two pieces:

𝑋 = 𝐸(𝑋) + [𝑋 − 𝐸(𝑋)] = 𝐸(𝑋) + 𝜀 .

· The first part 𝐸(𝑋) is not random.
· All randomness is contained in the chance error 𝜀, which can itself be
represented by a random draw from an error box (a box with mean zero).
· Example: a random draw 𝑋 from the box

1 2 3

(which has mean 2) may instead be thought of as 𝑋 = 2 + 𝜀, where the chance
error 𝜀 is a random draw from the error box

−1 0 +1 .

Interpreting the standard error 𝑆𝐸(𝑋)
· The standard error measures the typical size of the error 𝜀 . It is a measure of
random variation in the outcome of 𝑋 .
· For two different random draws, the one with the larger SE is likely to differ from its
expected value by a larger amount.
· The standard error is the root-mean-square (RMS) of the error box. For the box 1 2 3:

$$SE(X) = SD(\text{box}) = \sqrt{\frac{(1 - 2)^2 + (2 - 2)^2 + (3 - 2)^2}{3}} = \sqrt{\frac{2}{3}} \approx 0.816$$

$$SE(X) = RMS(\text{error box}) = \sqrt{\frac{(-1)^2 + 0^2 + 1^2}{3}} = \sqrt{\frac{2}{3}} \approx 0.816$$

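The decomposition into expected value plus chance error can be checked numerically. A minimal sketch for the box 1 2 3 (the names small_box and error_box are assumptions made for the example):

```r
small_box = c(1, 2, 3)

ev = mean(small_box)            # E(X) = 2
error_box = small_box - ev      # the error box: -1 0 +1

mean(error_box)                 # an error box has mean zero
se = sqrt(mean(error_box^2))    # RMS of the error box = SD of the original box
se                              # sqrt(2/3)
```
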
Sums of random draws
New interpretation of mean and SD
· We have introduced the concepts of
- a random draw 𝑋 from a box
- its expected value 𝐸(𝑋)
- its standard error 𝑆𝐸(𝑋)

· The expected value and standard error are not “new” things, rather, they are new
interpretations of old things.
- Is it really “worth the effort” to introduce new names for things we
already know about?
- The expected value and standard error become very useful when we have
more than one draw.

Sum of two random draws
· Consider the two boxes

1 2 3 and 2 4 6 8 .

- The first box has mean 2 and SD

$$\sqrt{\tfrac{1}{3}\left[(-1)^2 + 0^2 + 1^2\right]} = \sqrt{\tfrac{2}{3}} \approx 0.816 .$$

- The second box has mean 5 and SD

$$\sqrt{\tfrac{1}{4}\left[(-3)^2 + (-1)^2 + 1^2 + 3^2\right]} = \sqrt{5} \approx 2.236 .$$

· Suppose we are going to take a random draw from each, 𝑋 from the first box, 𝑌
from the second box, in such a way that each possible pair of values is equally
likely. What is the behaviour of the (random) sum 𝑆 = 𝑋 + 𝑌 ?

All possible pairs/sums
· There are 12 possible pairs:

(1, 2), (1, 4), (1, 6), (1, 8),
(2, 2), (2, 4), (2, 6), (2, 8),
(3, 2), (3, 4), (3, 6), (3, 8).

Table of all possible pairs and their sums
Sample Sum
(1,2) 3
(1,4) 5
(1,6) 7
(1,8) 9
(2,2) 4
(2,4) 6
(2,6) 8
(2,8) 10
(3,2) 5
(3,4) 7
(3,6) 9
(3,8) 11

Single random draw from a “bigger” box
Thus getting a random pair (𝑋, 𝑌 ) and forming the sum 𝑆 = 𝑋 + 𝑌 is equivalent
to a single random draw from the bigger box

3 4 5 5 6 7 7 8 9 9 10 11

What are the mean and SD of this “bigger” box?

Using outer()
· The R function outer() forms a two-way array by applying an operation to each
pair of elements from two vectors:

bx = c(1, 2, 3)
by = c(2, 4, 6, 8)
bs = outer(bx, by, "+")
bs

##      [,1] [,2] [,3] [,4]
## [1,]    3    5    7    9
## [2,]    4    6    8   10
## [3,]    5    7    9   11

mean(bs)

## [1] 7

mean((bs - mean(bs))^2)

## [1] 5.666667

Expected value and standard error of the sum
· So we have that 𝐸(𝑆) = 7 and $SE(S) = \sqrt{5\tfrac{2}{3}} \approx 2.38$.

· As it turns out,

$$7 = E(S) = E(X + Y) = E(X) + E(Y) = 2 + 5 ,$$

$$5\tfrac{2}{3} = SE(S)^2 = SE(X + Y)^2 = SE(X)^2 + SE(Y)^2 = \tfrac{2}{3} + 5 .$$

· So in this case we have
- expected value of the sum is the sum of the expected values;
- squared SE of the sum is the sum of the squared SEs.
· These results hold in general, whenever each possible pair of values is equally likely.

Sum of two random draws.
· Consider two boxes

𝑥1 𝑥2 ⋯ 𝑥𝑀 and 𝑦1 𝑦2 ⋯ 𝑦𝑁

· Suppose we are going to take a random draw from each: 𝑋 from the first box, 𝑌
from the second box, in such a way that each possible pair of values is equally
likely.

All possible sums
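· There are 𝑀𝑁 possible sums; we may arrange them in a two-way array with 𝑀
(horizontal) rows and 𝑁 (vertical) columns.

· Noting that $\sum_{i=1}^{M} x_i = M\bar{x}$, we may write the column sums below the line:

$$\begin{array}{cccc}
x_1 + y_1 & x_1 + y_2 & \cdots & x_1 + y_N \\
x_2 + y_1 & x_2 + y_2 & \cdots & x_2 + y_N \\
\vdots & \vdots & \ddots & \vdots \\
x_M + y_1 & x_M + y_2 & \cdots & x_M + y_N \\
\hline
M\bar{x} + M y_1 & M\bar{x} + M y_2 & \cdots & M\bar{x} + M y_N
\end{array}$$
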
The sum of the column sums is

$$\underbrace{M\bar{x} + \cdots + M\bar{x}}_{N \text{ terms}} + M(y_1 + \cdots + y_N) = NM\bar{x} + MN\bar{y} .$$

Thus the average of all possible sums is

$$\frac{\text{sum of all possible sums}}{\text{no. of all possible sums}} = \frac{NM\bar{x} + MN\bar{y}}{MN} = \bar{x} + \bar{y} = E(X) + E(Y) .$$

That is,

$$E(X + Y) = E(X) + E(Y) .$$

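The column-sum argument can be checked numerically with outer(). The sketch below uses two hypothetical boxes whose values are chosen only for illustration; it is a sanity check of the algebra, not a proof.

```r
# Two hypothetical boxes (values chosen only for illustration)
x = c(2, 5, 5, 9)       # M = 4 tickets
y = c(1, 4, 10)         # N = 3 tickets
M = length(x)

sums = outer(x, y, "+") # M x N array of all possible sums x_i + y_j

colSums(sums)           # column sums, as written below the line in the array
M * mean(x) + M * y     # the same values via M*xbar + M*y_j

mean(sums)              # average of all possible sums
mean(x) + mean(y)       # E(X) + E(Y)
```
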
Computing formula for SD
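· For a list of numbers 𝑥1, 𝑥2, …, 𝑥𝑀, the square of the SD may be written as

$$SD^2 = \frac{1}{M}\sum_{i=1}^{M}(x_i - \bar{x})^2 = \left(\frac{1}{M}\sum_{i=1}^{M} x_i^2\right) - \bar{x}^2 ,$$

the “mean square minus the square of the mean”.

· To see why, recall that $\sum_{i=1}^{M} x_i = M\bar{x}$ and so:

$$\begin{aligned}
\sum_{i=1}^{M}(x_i - \bar{x})^2 &= (x_1^2 - 2\bar{x}x_1 + \bar{x}^2) + \cdots + (x_M^2 - 2\bar{x}x_M + \bar{x}^2) \\
&= (x_1^2 + \cdots + x_M^2) - 2\bar{x}(x_1 + \cdots + x_M) + \underbrace{\bar{x}^2 + \cdots + \bar{x}^2}_{M \text{ terms}} \\
&= \sum_{i=1}^{M} x_i^2 - 2\bar{x}\,M\bar{x} + M\bar{x}^2 = \sum_{i=1}^{M} x_i^2 - M\bar{x}^2 .
\end{aligned}$$

Dividing both sides by 𝑀 gives the computing formula.
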
Easy way to compute SD in R
· The computing formula above can be used to write a quick-and-easy R function to
compute the (population) SD of a list of numbers.

popsd = function(x) sqrt(mean(x^2) - (mean(x)^2))

· Let’s try it out:

x = 1:10
x # this list has mean 5.5

## [1] 1 2 3 4 5 6 7 8 9 10

sqrt(mean((x - 5.5)^2))

## [1] 2.872281

popsd(x)

## [1] 2.872281
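
Note that R's built-in sd() divides by 𝑛 − 1 rather than 𝑛, so it does not return the SD of a box directly. The short sketch below, which repeats the popsd definition so that it runs on its own, shows how the two are related:

```r
popsd = function(x) sqrt(mean(x^2) - (mean(x)^2))  # same definition as above

x = 1:10
n = length(x)

sd(x)                       # sample SD (divides by n - 1)
sd(x) * sqrt((n - 1) / n)   # rescaled so that it divides by n
popsd(x)                    # population SD; agrees with the rescaled value
```

For long lists the two differ very little, which is why sd(S) and popsd(S) nearly agree in the simulation later in the lecture.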
SE of a sum (not examinable)
· It is possible to deduce the SE of our general sum 𝑆 = 𝑋 + 𝑌 .
· We do so by first working out the mean-square of the bigger box of all possible
sums.
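· Write each squared sum $(x_i + y_j)^2 = x_i^2 + 2x_i y_j + y_j^2$ in an array and add over
columns (column sums below the line):

$$\begin{array}{ccc}
x_1^2 + 2x_1 y_1 + y_1^2 & \cdots & x_1^2 + 2x_1 y_N + y_N^2 \\
x_2^2 + 2x_2 y_1 + y_1^2 & \cdots & x_2^2 + 2x_2 y_N + y_N^2 \\
\vdots & \ddots & \vdots \\
x_M^2 + 2x_M y_1 + y_1^2 & \cdots & x_M^2 + 2x_M y_N + y_N^2 \\
\hline
\sum_i x_i^2 + 2M\bar{x} y_1 + M y_1^2 & \cdots & \sum_i x_i^2 + 2M\bar{x} y_N + M y_N^2
\end{array}$$
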
SE of a sum (not examinable)
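· The sum of squares (of all possible sums) is then

$$\begin{aligned}
&\Big(\sum_i x_i^2 + 2M\bar{x} y_1 + M y_1^2\Big) + \cdots + \Big(\sum_i x_i^2 + 2M\bar{x} y_N + M y_N^2\Big) \\
&\quad = N\sum_i x_i^2 + 2M\bar{x}(y_1 + \cdots + y_N) + M(y_1^2 + \cdots + y_N^2) \\
&\quad = N\sum_i x_i^2 + 2MN\bar{x}\bar{y} + M\sum_j y_j^2 .
\end{aligned}$$

· Since there are 𝑀𝑁 possible sums, the mean square is

$$\frac{1}{M}\sum_i x_i^2 + 2\bar{x}\bar{y} + \frac{1}{N}\sum_j y_j^2 .$$
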
SE of a sum (not examinable)
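· Since the mean of all possible sums is $\bar{x} + \bar{y}$, the squared SD of all possible sums is

$$\begin{aligned}
SE(S)^2 &= \underbrace{\frac{1}{M}\sum_i x_i^2 + 2\bar{x}\bar{y} + \frac{1}{N}\sum_j y_j^2}_{\text{mean sq.}} - \underbrace{(\bar{x}^2 + 2\bar{x}\bar{y} + \bar{y}^2)}_{\text{sq. of mean}} \\
&= \frac{1}{M}\sum_i x_i^2 - \bar{x}^2 + \frac{1}{N}\sum_j y_j^2 - \bar{y}^2 \\
&= \frac{1}{M}\sum_i (x_i - \bar{x})^2 + \frac{1}{N}\sum_j (y_j - \bar{y})^2 \\
&= SE(X)^2 + SE(Y)^2 .
\end{aligned}$$
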
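The two key steps of this derivation (the mean square of the bigger box, and the additivity of squared SEs) can be verified numerically. A minimal sketch with two hypothetical boxes, values chosen only for illustration; popsd is repeated so the block runs on its own:

```r
popsd = function(x) sqrt(mean(x^2) - (mean(x)^2))

# Two hypothetical boxes (values chosen only for illustration)
x = c(2, 5, 5, 9)
y = c(1, 4, 10)

sums = as.vector(outer(x, y, "+"))   # the "bigger" box of all M*N possible sums

mean(sums^2)                                    # mean square of all possible sums
mean(x^2) + 2 * mean(x) * mean(y) + mean(y^2)   # the formula from the derivation

popsd(sums)^2                # squared SE of the sum
popsd(x)^2 + popsd(y)^2      # SE(X)^2 + SE(Y)^2
```
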
Random samples with replacement of size 𝑛 = 2
· A special case of our general sum is where we have a single box

𝑥1 𝑥2 ⋯ 𝑥𝑁

but take two random draws with replacement.

- This means each of the 𝑁² possible pairs
(𝑥1, 𝑥1), …, (𝑥1, 𝑥𝑁), …, (𝑥𝑁, 𝑥1), …, (𝑥𝑁, 𝑥𝑁) is equally likely.
· This is where both boxes are (effectively) the same, so 𝐸(𝑋) = 𝐸(𝑌) and
𝑆𝐸(𝑋) = 𝑆𝐸(𝑌).
· If we write the mean of the box as 𝜇 and the SD of the box as 𝜎, then the sum 𝑆
of the two random draws has
- 𝐸(𝑆) = 𝐸(𝑋) + 𝐸(𝑌) = 𝜇 + 𝜇 = 2𝜇
- 𝑆𝐸(𝑆)² = 𝑆𝐸(𝑋)² + 𝑆𝐸(𝑌)² = 𝜎² + 𝜎² = 2𝜎² ⟹ 𝑆𝐸(𝑆) = √2 𝜎.

Sums and averages of random samples of size 𝑛
Random samples of size 𝑛
· We may easily extend the results to any 𝑛 ≥ 2 . Suppose:
- we have a box with mean 𝜇 and SD 𝜎 ;
- we are going to take a random sample of size 𝑛 from the box with
replacement;
- so each possible sample of size 𝑛 is equally likely.
· Let us write
- the random draws as 𝑋1 , 𝑋2 , … , 𝑋𝑛 ;
- the sum as 𝑆 = 𝑋1 + ⋯ + 𝑋𝑛 ;
- the sample average as $\bar{X} = \frac{S}{n} = \frac{1}{n}(X_1 + \cdots + X_n) = \frac{1}{n}\sum_{i=1}^{n} X_i$.
· What are the expected value and standard error of both 𝑆 and 𝑋¯ ?

The sum 𝑆
· Each single draw has the same behaviour. That is, each of 𝑋1, …, 𝑋𝑛 is a single
random draw from the same box, with 𝐸(𝑋𝑖) = 𝜇 and 𝑆𝐸(𝑋𝑖) = 𝜎.
· Expected value of the sum is the sum of the expected values:

$$\begin{aligned}
E(S) = E(X_1 + \cdots + X_n) &= E(X_1 + \cdots + X_{n-1}) + E(X_n) = \cdots \\
&= E(X_1) + \cdots + E(X_n) = \underbrace{\mu + \cdots + \mu}_{n \text{ terms}} = n\mu .
\end{aligned}$$

· Also,

$$\begin{aligned}
SE(S)^2 &= SE(X_1 + \cdots + X_{n-1})^2 + SE(X_n)^2 = \cdots \\
&= SE(X_1)^2 + \cdots + SE(X_n)^2 = \underbrace{\sigma^2 + \cdots + \sigma^2}_{n \text{ terms}} = n\sigma^2
\end{aligned}$$

$$\implies SE(S) = \sqrt{n}\,\sigma .$$

What if we divide by 𝑁 ?
· Consider the box

𝑥1 𝑥2 … 𝑥𝑁

What are the expected value and standard error of a random draw if we divide each
𝑥𝑖 by 𝑁 ?
· This gives us a new box

𝑦1 𝑦2 … 𝑦𝑁

where 𝑦𝑖 = 𝑥𝑖/𝑁.

What if we divide by 𝑁 ?
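· If 𝑌 is a random draw from this new box then we can work out 𝐸(𝑌) as:

$$E(Y) = \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i = \frac{1}{N}\sum_{i=1}^{N}\frac{x_i}{N} = \frac{1}{N}\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right) = \frac{\bar{x}}{N} = \frac{E(X)}{N}$$

· We can also work out the standard error:

$$\begin{aligned}
SE(Y)^2 &= \frac{1}{N}\sum_{i=1}^{N}(y_i - \bar{y})^2 = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i}{N} - \frac{\bar{x}}{N}\right)^2 \\
&= \frac{1}{N^2}\cdot\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2 = \frac{SE(X)^2}{N^2}
\end{aligned}$$

$$\implies SE(Y) = \frac{SE(X)}{N}$$
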
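A quick numerical check of this scaling result, using the box 2 4 6 8 from earlier. The divisor is called k here (a name made up for the example): the slide divides by the box size 𝑁, but the same algebra goes through for any positive constant.

```r
popsd = function(x) sqrt(mean(x^2) - (mean(x)^2))

x = c(2, 4, 6, 8)   # box with mean 5 and SD sqrt(5)
k = length(x)       # divisor; here the box size N = 4, as on the slide
y = x / k           # the new box

c(mean(y), mean(x) / k)     # E(Y) and E(X)/N
c(popsd(y), popsd(x) / k)   # SE(Y) and SE(X)/N
```
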
The sample average 𝑋¯
· The sample average $\bar{X}$ is just $S/n$, so the divide-by-a-constant argument above
(which works for any positive divisor, here 𝑛) gives its expected value and standard error immediately.
· For the average,

$$E(\bar{X}) = \frac{E(S)}{n} = \frac{n\mu}{n} = \mu ;$$

· As for the standard error, we have

$$SE(\bar{X}) = \frac{SE(S)}{n} = \frac{\sigma\sqrt{n}}{n} = \frac{\sigma}{\sqrt{n}} .$$

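For a small box these formulas can be confirmed exactly by listing every possible sample. The sketch below does this for 𝑛 = 3 draws with replacement from the box 1 2 3, using expand.grid() to enumerate all 27 equally likely samples (the variable names are assumptions made for the example):

```r
popsd = function(x) sqrt(mean(x^2) - (mean(x)^2))

b = c(1, 2, 3)   # box with mean 2 and SD sqrt(2/3)
n = 3

# All 3^3 = 27 equally likely samples of size n drawn with replacement
samples = expand.grid(rep(list(b), n))
xbar = rowMeans(samples)             # the box of all possible sample averages

c(mean(xbar), mean(b))               # E(Xbar) and mu
c(popsd(xbar), popsd(b) / sqrt(n))   # SE(Xbar) and sigma / sqrt(n)
```
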
Example: 6-sided die
· Consider rolling a fair 6-sided die.
· In this case each of the numbers 1, 2, 3, 4, 5, 6 is equally likely.
· This is equivalent to a random draw from the box

1 2 3 4 5 6

· The mean is $\mu = 3.5 = \frac{7}{2}$, the mean-square is $\frac{1+4+9+16+25+36}{6} = \frac{91}{6}$, and thus the SD is

$$\sigma = \sqrt{\frac{91}{6} - \left(\frac{7}{2}\right)^2} = \sqrt{\frac{91}{6} - \frac{49}{4}} = \sqrt{\frac{182 - 147}{12}} = \sqrt{\frac{35}{12}} \approx 1.71$$

Rolling the die 3 times: sum of rolls
· Suppose we roll the die (independently) 3 times. What is the random behaviour of
the sum of the values of the three rolls?
· Let 𝑋1 , 𝑋2 , 𝑋3 denote 3 random draws with replacement from the box

1 2 3 4 5 6

· Then the sum of the 3 rolls 𝑆 = 𝑋1 + 𝑋2 + 𝑋3 has $E(S) = 3\mu = \frac{21}{2} = 10.5$ and

$$SE(S) = \sigma\sqrt{3} = \sqrt{\frac{35}{12} \times 3} = \sqrt{\frac{35}{4}} = \frac{\sqrt{35}}{2} \approx 2.958 .$$

· The box of all possible sums here is exactly the dataset y.dat from earlier in the
lecture!

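Since the box of all possible sums of three rolls has only 216 tickets, it can be rebuilt directly with nested calls to outer() and compared with the earlier output. This is a sketch under the assumption that y.dat holds exactly these 216 sums, as the slide states; the name box3 is made up for the example.

```r
popsd = function(x) sqrt(mean(x^2) - (mean(x)^2))

d = 1:6
# All 6^3 = 216 equally likely outcomes of three rolls, reduced to their sums
box3 = as.vector(outer(outer(d, d, "+"), d, "+"))

table(box3)       # frequencies of the sums 3, ..., 18
mean(box3)        # E(S) = 3 * 3.5
popsd(box3)       # SE(S) = sqrt(35)/2
mean(box3 < 8)    # 35/216, the chance found earlier
```
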
Rolling the die 3 times: average of rolls
· What is the random behaviour of the average of the values of the three rolls?
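· Writing $\bar{X} = \frac{X_1 + X_2 + X_3}{3} = \frac{S}{3}$, we have

$$E(\bar{X}) = \frac{E(S)}{3} = \frac{3\mu}{3} = \mu = 3.5$$

and

$$SE(\bar{X}) = \frac{\sigma}{\sqrt{3}} = \sqrt{\frac{35}{12} \times \frac{1}{3}} = \sqrt{\frac{35}{36}} = \frac{\sqrt{35}}{6} \approx 0.986 .$$
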
Demonstration
· Let us simulate 3 rolls of a 6-sided die 1000 times, and look at the corresponding
1000 sums and averages of each triplet.

d = 1:6
S = numeric(1000) # vector to hold the 1000 sums
for (i in 1:1000) {
  rolls = sample(d, size = 3, replace = TRUE) # 3 rolls = 3 draws with replacement
  S[i] = sum(rolls)
}
mean(S)

## [1] 10.476

sd(S)

## [1] 3.014052

popsd(S)

## [1] 3.012544

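br = seq(2.5, 18.5, by = 1) # assumed definition (not shown in the source): unit-width bins centred on 3, ..., 18
hist(S, probability = TRUE, breaks = br)

Note these proportions are close to (but not exactly equal to) the corresponding
proportions in y.dat .
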
Averages
Xbar = S/3
mean(Xbar)

## [1] 3.492

sd(Xbar)

## [1] 1.004684

popsd(Xbar)

## [1] 1.004181

hist(Xbar, probability = TRUE, breaks = br/3) # same bins as for the sums, rescaled by 1/3

Same shape as for the sums, but centred on 3.5 and less spread-out.

Closing remarks: 𝑛 getting larger
· We have seen that for 𝑛 random draws (with replacement) from a box with mean
𝜇 and SD 𝜎
- the sum of draws 𝑆 has 𝐸(𝑆) = 𝑛𝜇 and 𝑆𝐸(𝑆) = 𝜎√𝑛 ;
- the average of the draws 𝑋¯ has 𝐸(𝑋¯) = 𝜇 and 𝑆𝐸(𝑋¯) = 𝜎/√𝑛 .
· What happens to the SE of each as 𝑛 gets bigger?
- for the sum, 𝜎√𝑛 gets larger but
- for the average, 𝜎/√𝑛 gets smaller.
· In particular, for the average 𝑋¯, the random variability about 𝐸(𝑋¯) = 𝜇 gets smaller
as the sample size 𝑛 increases.
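
To make the two trends concrete, the sketch below tabulates both standard errors for the fair-die box over a few sample sizes. The particular values of 𝑛 and the name se_table are chosen only for illustration.

```r
sigma = sqrt(35 / 12)              # SD of the fair-die box
n = c(1, 3, 10, 100, 1000)         # illustrative sample sizes

se_table = data.frame(n = n,
                      SE_sum = sigma * sqrt(n),    # grows like sqrt(n)
                      SE_mean = sigma / sqrt(n))   # shrinks like 1/sqrt(n)
se_table
```

Multiplying 𝑛 by 100 makes the SE of the sum 10 times larger and the SE of the average 10 times smaller.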
