Sampling and Sampling Distributions
By: Mansi K.
Basic Terminologies
• Population or Universe: The entire group of people, items, or units under
investigation; it includes every individual
• Sample: A part or subset of the objects or individuals of the population,
selected to represent the population
• Sampling: The process of selecting a sample from the population. For this
purpose, the population is divided into a number of parts called sampling units
Need of Sampling
• A large population can be conveniently covered
• Time, money, and energy are saved
• Used when 100 percent accuracy is not required
Advantages of Sampling
• Economical: Costs less than studying the entire population
• Increased Speed: Collection, analysis, and interpretation of data take
less time
• Rapport: Better rapport is established with the respondents, which helps the
validity and reliability of results
Disadvantages of Sampling
• Bias: A biased selection can lead to incorrect conclusions
• Need for specialized knowledge: The researcher needs knowledge, training, and
experience in sampling techniques, statistical analysis, etc.
Characteristics of a good sample
• A true representative of the population
• Free from error due to bias
• Adequate in size
• Free from non-sampling error
Classification of Sampling Techniques
Types of Sampling
• Probability Sampling (Random): A probability sample is one in which each
member of the population has a known, non-zero chance of being selected (an
equal chance in simple random sampling). Selection is governed by randomness
• Non-Probability Sampling (Non-Random): It relies on the researcher's personal
judgement. Selection is not governed by randomness
Simple Random Sampling
• Here, all the members have the same chance (probability) of being selected in the
sample. The random method provides an unbiased selection from the population
• For example, we wish to draw a sample of 50 students from a population of 400
students. Place all 400 names in a container and draw out 50 names one by one
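The container draw above can be sketched in Python. This is a minimal illustration, not a prescribed procedure: the student IDs and the fixed seed are assumptions made only so the example is self-contained and reproducible.

```python
import random

# Hypothetical population: 400 student IDs standing in for the names in the container.
population = [f"student_{i:03d}" for i in range(1, 401)]

rng = random.Random(42)  # fixed seed (an assumption) so the draw can be repeated

# random.sample draws without replacement, so no student is picked twice,
# and every student has the same chance of ending up in the sample.
sample = rng.sample(population, k=50)

print(len(sample), sample[:5])
```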
Systematic Sampling
• Sample members are selected at equal intervals from an ordered list of the population
• For example, for a sample of 50 students out of 400, the sampling fraction is
50/400 = 1/8, i.e. select one student out of every eight students in the population. The
starting point for the selection is chosen at random
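The 1-in-8 selection above might be sketched as follows. The roll numbers and the seed are assumptions for illustration; the only idea taken from the text is a random start followed by a fixed interval.

```python
import random

# Hypothetical ordered list of the population: roll numbers 1..400.
population = list(range(1, 401))
sample_size = 50

# Sampling interval: 400 // 50 = 8, matching the sampling fraction 1/8.
interval = len(population) // sample_size

rng = random.Random(7)            # fixed seed (an assumption) for reproducibility
start = rng.randrange(interval)   # random starting index among the first 8 students

# Take every 8th student from the random starting point.
sample = population[start::interval]

print(start, len(sample), sample[:5])
```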
Stratified Random Sampling
• The population is divided into smaller homogeneous groups, or strata, by some
characteristic, and members are selected randomly from each stratum.
• Within each stratum, simple random sampling or systematic sampling is used to select
the final sample
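A minimal sketch of stratified random sampling follows. The strata (year groups), their sizes, and the use of proportional allocation are all assumptions chosen for illustration; the text only requires that members be drawn at random within each stratum.

```python
import random

rng = random.Random(1)  # fixed seed (an assumption) for reproducibility

# Hypothetical strata: 400 students split by year of study.
strata_sizes = {"first_year": 160, "second_year": 120, "third_year": 80, "fourth_year": 40}
strata = {
    name: [f"{name}_{i}" for i in range(size)]
    for name, size in strata_sizes.items()
}

total = sum(strata_sizes.values())   # 400
sample_size = 50

sample = {}
for name, members in strata.items():
    # Proportional allocation (an assumption): each stratum contributes
    # in proportion to its share of the population.
    k = len(members) * sample_size // total
    # Simple random sampling within the stratum.
    sample[name] = rng.sample(members, k)

print({name: len(chosen) for name, chosen in sample.items()})
```

With these sizes the allocation divides evenly; with other sizes the integer division could leave the total slightly below the target, so a real design would need a rounding rule.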
Cluster Sampling
• A researcher selects groups (clusters) of units at random and then observes all the
units in each selected group
• For example, the study involves primary schools. Select 15 schools randomly and then study all
the children of those 15 schools. When the clusters are geographic areas, this is also known as area sampling.
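The school example above can be sketched like this. The sampling frame of 60 schools and the enrolment figures are invented for illustration; the point is that whole schools are drawn at random and every child in a chosen school is included.

```python
import random

rng = random.Random(3)  # fixed seed (an assumption) for reproducibility

# Hypothetical sampling frame: 60 primary schools with 20 to 40 children each.
schools = {
    f"school_{s:02d}": [f"school_{s:02d}_child_{c}" for c in range(rng.randint(20, 40))]
    for s in range(1, 61)
}

# Stage 1: select 15 clusters (schools) at random.
chosen_schools = rng.sample(sorted(schools), k=15)

# Stage 2: observe every unit (child) in each selected cluster.
sample = [child for school in chosen_schools for child in schools[school]]

print(len(chosen_schools), len(sample))
```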
Non Probability Sampling
• Purposive Sampling: In this sampling method, the researcher selects a typical
group of individuals who might represent the larger population. This is often
accomplished by applying expert knowledge of the population to select in a
nonrandom manner a sample of elements that represents a cross-section of the
population. This is also known as judgement sampling
• Convenience Sampling: The procedure of obtaining the units or members that
are most conveniently available. The sample consists of cases that are
included because they are readily available
• Quota Sampling: The researcher selects the sample and decides the quotas to be
filled from specified subgroups of the population
• For example, an interviewer might need data from 40 adults and 20 adolescents in order
to study students' television viewing habits
• The selection could be: 20 adult men and 20 adult women, 10 adolescent boys and 10 adolescent girls
• Snowball Sampling: The researcher identifies and selects available respondents
who meet the criteria for inclusion
• After the data have been collected from a subject, the researcher asks for referrals to
other individuals who would also meet the criteria and represent the population of
concern
• Also called chain sampling or chain referral sampling
Parameter and Statistic
Measure              Parameter (population)    Statistic (sample)
Definition           A descriptive measure     A descriptive measure
                     calculated for the        calculated for the
                     population                sample
Size                 N                         n
Mean                 μ                         x̄
Standard deviation   σ                         s
Proportion           p                         p̂
Introduction to Sampling Distributions
• A probability distribution of all the possible means of the samples is a distribution
of the sample means. This is called sampling distribution of the mean. Similarly
we can have sampling distribution of proportion as well
• Suppose we take a sample of size n, without replacement, from a population of size N.
There would be C(N, n) = N! / (n! (N − n)!) = k possible samples
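To make the idea concrete, the sketch below enumerates every possible sample from a tiny population and builds the sampling distribution of the mean directly. The population values are hypothetical and chosen only to keep the arithmetic small.

```python
from itertools import combinations
from math import comb
from statistics import mean

# Hypothetical population of N = 5 values.
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# All possible samples of size n drawn without replacement.
samples = list(combinations(population, n))
k = len(samples)                      # equals C(N, n)

# The sampling distribution of the mean: one mean per possible sample.
sample_means = [mean(s) for s in samples]

print(k, comb(len(population), n))
print(sorted(sample_means))
print(mean(sample_means), mean(population))
```

Averaging all k sample means reproduces the population mean, which is the first property the Central Limit Theorem section relies on.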
Standard Error
• The standard deviation of the distribution of a sample statistic is known as the standard error
of the statistic. For example, the standard deviation of the distribution of sample means is the
standard error of the mean. Similarly, the “standard deviation of the distribution of sample
proportions” is shortened to the standard error of the proportion.
• An example will help explain the reason for the name. Suppose we wish to learn something
about the height of freshmen at a large state university. We could take a series of samples and
calculate the mean height for each sample. It is highly unlikely that all of these sample means
would be the same; we expect to see some variability in our observed means. This variability
in the sample statistics results from sampling error due to chance; that is, there are
differences between each sample and the population, and among the several samples.
Sampling from Normal Populations
• Suppose we take a sample of size n from a normal population with mean μ and
standard deviation σ. The sampling distribution of the mean is then normal, with
mean μ and standard error σ/√n
Example: A bank calculates that its individual savings accounts are normally distributed with
a mean of $2,000 and a standard deviation of $600. If the bank takes a random sample of 100
accounts, what is the probability that the sample mean will lie between $1,900 and $2,050?
Sol: This is a question about the sampling distribution of the mean; therefore, we must first
calculate the standard error of the mean. In this case, we shall use the equation for the standard
error of the mean designed for situations in which the population is infinite
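A worked sketch of the calculation: for an infinite population the standard error of the mean is σ/√n, here 600/√100 = 60. The code below uses Python's `statistics.NormalDist` as a stand-in for a normal table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 2000      # population mean of the account balances ($)
sigma = 600    # population standard deviation ($)
n = 100        # sample size

# Standard error of the mean for an infinite population: sigma / sqrt(n).
standard_error = sigma / sqrt(n)

# Sampling distribution of the mean: normal with mean mu and sd = standard error.
sampling_dist = NormalDist(mu=mu, sigma=standard_error)

# z-values for the two limits, as they would be looked up in a normal table.
z_low = (1900 - mu) / standard_error
z_high = (2050 - mu) / standard_error

# P(1900 < sample mean < 2050)
probability = sampling_dist.cdf(2050) - sampling_dist.cdf(1900)

print(standard_error, round(z_low, 2), round(z_high, 2), round(probability, 4))
```

The z-values are about −1.67 and 0.83, and the probability comes out at roughly 0.75; a table lookup with rounded z-values gives a very slightly different figure.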
Sampling from Non Normal Populations
• In the preceding section, we concluded that when the population is normally
distributed, the sampling distribution of the mean is also normal. Yet decision
makers must deal with many populations that are not normally distributed. How
does the sampling distribution of the mean react when the population from which
the samples are drawn is not normal?
Central Limit Theorem
First, the mean of the sampling distribution of the mean will equal the population
mean regardless of the sample size, even if the population is not normal. Second,
as the sample size increases, the sampling distribution of the mean will approach
normality, regardless of the shape of the population distribution.
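Both statements can be checked with a small simulation. This is only a sketch: the exponential population (mean 1, standard deviation 1, strongly skewed), the sample size, the number of repetitions, and the seed are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)   # fixed seed (an assumption) for reproducibility

n = 30                   # size of each sample
num_samples = 5000       # number of repeated samples

# Hypothetical skewed population: exponential with rate 1 (mean 1, sd 1).
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

# The mean of the sample means should be close to the population mean (1.0),
# and their standard deviation close to sigma / sqrt(n).
print(round(mean(sample_means), 3))
print(round(stdev(sample_means), 3), round(1 / sqrt(n), 3))
```

Plotting a histogram of `sample_means` would show a shape much closer to a bell curve than the population itself.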
Example: The distribution of annual earnings of all bank tellers with five years’ experience is
negatively skewed (non-normal). This distribution has a mean of $19,000 and a standard
deviation of $2,000. If we draw a random sample of 30 tellers, what is the probability that their
earnings will average more than $19,750 annually?
In Figure 6-8(b), we show the sampling distribution of the mean that would result, and we have
colored the area representing “earnings over $19,750.” Our first task is to calculate the standard
error of the mean from the population standard deviation, as follows
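The calculation can be sketched as follows, relying on the Central Limit Theorem to treat the sampling distribution of the mean as approximately normal for n = 30. `statistics.NormalDist` stands in for a normal table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 19000     # population mean of annual earnings ($)
sigma = 2000   # population standard deviation ($)
n = 30         # sample size

# Standard error of the mean: sigma / sqrt(n), about 365.15.
standard_error = sigma / sqrt(n)

# Standardize the threshold of $19,750.
z = (19750 - mu) / standard_error

# P(sample mean > 19,750) under the approximating normal distribution.
probability = 1 - NormalDist().cdf(z)

print(round(standard_error, 2), round(z, 2), round(probability, 4))
```

The z-value is about 2.05, so the chance that the sample average exceeds $19,750 is roughly 2 percent.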
More Questions
Q1. An astronomer at the Mount Palomar Observatory notes that during the Geminid meteor
shower, an average of 50 meteors appear each hour, with a variance of 9 meteors squared. The
Geminid meteor shower will occur next week.
(a) If the astronomer watches the shower for 4 hours, what is the probability that at least
48 meteors per hour will appear?
(b) If the astronomer watches for an additional hour, will this probability rise or fall? Why?
Q2. In a normal distribution with mean 375 and standard deviation 48, how large a sample must
be taken so that the probability will be at least 0.95 that the sample mean falls between 370 and
380?
Finite Population Multiplier (when N is finite and known)
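As a sketch in the notation used above (N is the population size, n the sample size, σ the population standard deviation), the standard error of the mean for sampling without replacement from a finite population is usually written with the finite population multiplier as the second factor:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}}
```

When n is small relative to N, the multiplier is close to 1 and the formula reduces to the infinite-population form σ/√n.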