Sampling with Replacement
Sampling with replacement means that each item is returned to the population after it
is chosen, so the same item can be chosen again. In other words, you want to find the
probability of some event where there’s a number of balls, cards or other objects, and
you replace the item each time you choose one.
OR
Sampling is called with replacement when a unit selected at random from the
population is returned to the population and then a second element is selected at
random. Whenever a unit is selected, the population contains all the same units, so
a unit may be selected more than once. There is no change at all in the size of the
population at any stage. We can assume that a sample of any size can be selected
from the given population of any size.
Sampling with replacement:
Consider a population of potato sacks, each of which has either 12, 13, 14, 15, 16, 17,
or 18 potatoes, and all the values are equally likely. Suppose that, in this population,
there is exactly one sack with each number. So the whole population has seven
sacks. If I sample two with replacement, then I first pick one (say 14). I had a 1/7
probability of choosing that one. Then I replace it. Then I pick another. Every one of
them still has 1/7 probability of being chosen. And there are exactly 49 different
possibilities here (assuming we distinguish between the first and second picks). They are:
(12,12), (12,13), (12,14), (12,15), (12,16), (12,17), (12,18), (13,12), (13,13), (13,14),
etc.
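The full list of 49 possibilities can be generated rather than written out by hand. The following is a minimal sketch; the variable names sacks and pairs are chosen here for illustration and do not come from the original notes.

```python
from itertools import product

# Population from the example: one sack for each potato count.
sacks = [12, 13, 14, 15, 16, 17, 18]

# Ordered pairs (first pick, second pick). Repeats such as (14, 14) are
# allowed because the first sack is put back before the second pick.
pairs = list(product(sacks, repeat=2))

print(len(pairs))   # 49
print(pairs[:3])    # [(12, 12), (12, 13), (12, 14)]
```

Because every sack has a 1/7 chance on each pick, each of these 49 ordered pairs is equally likely.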
Statistics problems and practice by Shahid Jamal (Book)
Question
Let’s say you had a population of 7 people, and you wanted to sample 2. Their names
are:
John
Jack
Qiu
Tina
Hatty
Jacques
Des
You could put their names in a hat. If you sample with replacement, you would choose
one person’s name, put that person’s name back in the hat, and then choose another
name. The possibilities for your two-name sample are:
John, John
John, Jack
John, Qiu
Jack, Qiu
Jack, Tina
…and so on.
When you sample with replacement, your two items are independent. In other words,
one does not affect the outcome of the other. You have a 1 out of 7 (1/7) chance of
choosing the first name and a 1/7 chance of choosing the second name.
P(John, John) = (1/7) * (1/7) = 1/49 ≈ .02.
P(John, Jack) = (1/7) * (1/7) = 1/49 ≈ .02.
P(John, Qiu) = (1/7) * (1/7) = 1/49 ≈ .02.
P(Jack, Qiu) = (1/7) * (1/7) = 1/49 ≈ .02.
P(Jack, Tina) = (1/7) * (1/7) = 1/49 ≈ .02.
Note that P(John, John) just means “the probability of choosing John’s name, and then
John’s name again.” You can figure out these probabilities using the multiplication rule.
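As a rough check of the multiplication rule, the probabilities for every ordered pair of names can be computed exactly. This is only a sketch: the names p_single and p_with are illustrative, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

names = ["John", "Jack", "Qiu", "Tina", "Hatty", "Jacques", "Des"]

# With replacement, every name has probability 1/7 on each draw.
p_single = Fraction(1, len(names))

# Multiplication rule for independent draws:
# P(first, second) = P(first) * P(second).
p_with = {pair: p_single * p_single for pair in product(names, repeat=2)}

print(p_with[("John", "John")])   # 1/49
print(len(p_with))                # 49
print(sum(p_with.values()))       # 1
```

The 49 probabilities add up to 1, which confirms that the list of ordered pairs covers every possible two-name sample.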
Sampling Without Replacement
Sampling without replacement means an item is not returned to the population after it
is chosen. In other words, you don’t replace the first item you choose before you choose
a second. This changes the odds of choosing sample items. Taking the above
example, you would have the same list of names to choose two people from. And your
list of results would be similar, except you couldn’t choose the same person twice:
John, Jack
John, Qiu
Jack, Qiu
Jack, Tina…
But now, your two items are dependent, or linked to each other. When you choose the
first item, you have a 1/7 probability of picking any particular name. But then, assuming
you don’t replace the name, you only have six names to pick from. That gives you a 1/6
chance of choosing any particular remaining name second. The probabilities become:
P(John, Jack) = (1/7) * (1/6) = 1/42 ≈ .024.
P(John, Qiu) = (1/7) * (1/6) = 1/42 ≈ .024.
P(Jack, Qiu) = (1/7) * (1/6) = 1/42 ≈ .024.
P(Jack, Tina) = (1/7) * (1/6) = 1/42 ≈ .024…
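The same calculation can be sketched for sampling without replacement. Here itertools.permutations lists every ordered pair of two different names; the names p_pair and p_without are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

names = ["John", "Jack", "Qiu", "Tina", "Hatty", "Jacques", "Des"]
n = len(names)

# The second draw depends on the first: 1/7 for the first name,
# then 1/6 for any one of the six names left in the hat.
p_pair = Fraction(1, n) * Fraction(1, n - 1)

# Ordered pairs of two different names; (John, John) cannot occur.
p_without = {pair: p_pair for pair in permutations(names, 2)}

print(len(p_without))                    # 42
print(p_without[("John", "Jack")])       # 1/42
print(("John", "John") in p_without)     # False
print(sum(p_without.values()))           # 1
```

There are 42 possible ordered pairs instead of 49, so each one is slightly more likely than it was with replacement.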
As you can probably figure out, I’ve only used a few items here, so the odds only
change a little. But larger samples taken from small populations can have more
dramatic results.
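You can tell how dramatic these results are by calculating the covariance between the
draws. That’s a measure of how strongly the outcomes of two draws are linked together;
the further the covariance is from zero, the more dramatic the results. (Without
replacement the covariance between two draws is negative, because taking out a high
value leaves lower values behind.) A covariance of zero would mean there’s no
difference between sampling with replacement or sampling without.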
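To make the covariance idea concrete, here is a small sketch that computes the covariance between the first and second draw. Names have no numeric value, so it assumes the potato-sack population (12 to 18 potatoes) from the earlier example; the helper draw_covariance is hypothetical and treats every ordered pair as equally likely.

```python
from fractions import Fraction
from itertools import permutations, product

sacks = [12, 13, 14, 15, 16, 17, 18]

def draw_covariance(pairs):
    """Covariance of (first draw, second draw) over equally likely pairs."""
    pairs = list(pairs)
    n = len(pairs)
    mean_first = Fraction(sum(x for x, _ in pairs), n)
    mean_second = Fraction(sum(y for _, y in pairs), n)
    return sum((x - mean_first) * (y - mean_second) for x, y in pairs) / n

# With replacement: all 49 ordered pairs, repeats allowed.
cov_with = draw_covariance(product(sacks, repeat=2))

# Without replacement: the 42 ordered pairs of two different sacks.
cov_without = draw_covariance(permutations(sacks, 2))

print(cov_with)      # 0
print(cov_without)   # -2/3
```

With replacement the covariance is exactly zero, because the draws are independent. Without replacement it is negative: a large first value makes a large second value slightly less likely.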
Key points:
The sampling distribution of a statistic is the distribution of the statistic over all
possible samples of a given size from the same population.
Inferential statistics: a branch of mathematics that involves drawing conclusions
about a population based on sample data drawn from it.
What Is a Sampling Distribution?
A sampling distribution is a probability distribution of a statistic obtained from a
large number of samples drawn from a specific population. It describes how often
each of the different possible values of the statistic occurs across those samples.
Sampling distribution of a sample mean:
https://online.stat.psu.edu/stat500/lesson/4/4.1
How Does it Work?
1. Select a random sample of a specific size from a given population.
2. Calculate a statistic for the sample, such as the mean, median, or
standard deviation.
3. Repeat steps 1 and 2 many times, then develop a frequency distribution
of the sample statistics that you calculated.
4. Plot the frequency distribution that you developed in the step above.
The resulting graph will be the sampling distribution.
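The steps above can be sketched as a short simulation. This is only an illustration under stated assumptions: it reuses the potato-sack population from the earlier example, samples with replacement, uses the sample mean as the statistic, and draws a text histogram in place of a proper plot. The function name sampling_distribution and its parameters are made up for this sketch.

```python
import random
from collections import Counter

def sampling_distribution(population, sample_size, num_samples, seed=0):
    """Frequency distribution of the sample mean over many random samples."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(num_samples):
        # Select a random sample (with replacement).
        sample = rng.choices(population, k=sample_size)
        # Calculate the statistic for this sample (here, the mean).
        statistic = sum(sample) / sample_size
        # Add it to the frequency distribution.
        counts[statistic] += 1
    return counts

sacks = [12, 13, 14, 15, 16, 17, 18]
dist = sampling_distribution(sacks, sample_size=2, num_samples=10_000)

# Plot the frequency distribution as a rough text histogram
# (one '#' per 100 samples).
for value in sorted(dist):
    print(f"{value:>5.1f}  {'#' * (dist[value] // 100)}")
```

With samples of size 2 the histogram should come out roughly triangular, peaking near the population mean of 15. The exact counts depend on the seed and on the number of samples drawn.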
Population is Normal
https://online.stat.psu.edu/stat500/node/510