Unit 4 Sampling

The document provides an overview of sampling distribution and confidence interval estimation, highlighting the importance of sampling in data collection for statistical analysis. It discusses various sampling techniques, including probability and non-probability sampling methods, and their respective advantages and disadvantages. Additionally, it defines key terms related to sampling, such as population, sample, and representative sample, emphasizing the need for a representative sample to ensure accurate generalizations about the population.

Uploaded by

theiconicps

Sampling Distribution and Confidence Interval Estimation

Introduction to Sampling
 Data are numerical values containing some
information. Statistics is the science of data.
 Statistical tools can be used on a data set to draw
statistical inferences. These statistical inferences
are in turn used for various purposes.
 For example, the government uses such data for
policy formulation for the welfare of the people,
marketing companies use the data from consumer
surveys to improve the company and to provide
better services to the customer, etc.
 Such data is obtained through sample surveys.
 Sample surveys are conducted throughout the
world by governmental as well as
nongovernmental agencies.
 For example, the “National Sample Survey Organization
(NSSO)” conducts surveys in India, “Statistics
Canada” conducts surveys in Canada, and United Nations
agencies such as the “World Health Organization
(WHO)” and the “Food and Agriculture Organization (FAO)”
conduct surveys in different countries.
 Sampling theory provides the tools and techniques
for data collection, keeping in mind the objectives
to be fulfilled and the nature of the population.
 There are two ways of obtaining the information
 1. Sample surveys
 2. Complete enumeration or census
Example
 Sample surveys collect information on a fraction
of the total population, whereas a census collects
information on the whole population.
 Some surveys, e.g., economic surveys,
agricultural surveys etc., are conducted regularly.
 Some surveys are need-based and are conducted
when some need arises, e.g., consumer
satisfaction surveys at a newly opened shopping
mall to see the satisfaction level with the
amenities provided in the mall.
Problem and Solution
 Problem: The populations we wish to study are almost
always so large that we are unable to gather information
from every case.
 Solution: We choose a sample, a carefully chosen subset
of the population, and use information gathered from the
cases in the sample to generalize to the population.
 The sample must be representative of the population.
Representative: the sample has the same characteristics
as the population.
 Statistics are mathematical characteristics of samples.
 Parameters are mathematical characteristics of
populations.
 Statistics are used to estimate parameters.
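The distinction between a statistic and a parameter can be sketched in a few lines of Python. The population here is hypothetical (10,000 simulated values), and the names `population`, `parameter`, `sample` and `statistic` are assumptions made only for this illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 made-up measurements.
population = [random.gauss(170, 10) for _ in range(10_000)]
parameter = statistics.mean(population)   # characteristic of the population

sample = random.sample(population, 200)   # a subset of the population
statistic = statistics.mean(sample)       # characteristic of the sample

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

The sample mean is computed from 200 cases only, yet it is typically close to the population mean, which is the sense in which a statistic estimates a parameter.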
Why Sampling?
 The population under study is often too large:
examining every unit would cost too much and take
too long.
 Some tests also destroy the unit. For example, to
test how long light bulbs last, you leave a bulb on
until it burns out. You can't test all the bulbs
this way, because the manufacturer's objective is to
sell the bulbs, not burn them out.
 It's not necessary to survey all cases: taking a
sample yields estimates that are accurate enough.
 But sampling may introduce some error.
 You didn't interview everybody, so certain
opinions or combinations of opinions won't be
represented in the sample.
Sampling Terminology
 Population: the universe of cases, that is, the
collection of all the sampling units in a given
region at a particular point of time or over a
particular period.
 For example, if you want to report 'what students
think about online education', then the population
is all students.
 If the production of wheat in a district is to be
studied, then all the fields cultivating wheat in
that district will constitute the population.
 The total number of sampling units in the
population is the population size, generally
denoted by N.
 The population size can be finite or infinite (N is
 Sample: a subset of elements selected from a population.
 A sample should have the same characteristics as the
population it is representing.
 Gathering information about an entire population often
costs too much or is virtually impossible. Instead, we use a
sample of the population.
 In the context of sample surveys, a collection of units like
households, people, cities, countries etc. is called a finite
population.
 Census: a complete count of the population, in which all
elements are included. A census is in effect a 100% sample.
 For example, in India the census is conducted every
tenth year, and observations on all the persons staying
in India are collected.
 Sampling unit: An element or a group of elements
on which observations can be taken is called a
sampling unit. The objective of the survey helps in
determining the definition of the sampling unit.
 For example, if the objective is to determine the total
income of all the persons in a household, then the
sampling unit is the household. If the objective is to
determine the income of a particular person in the
household, then the sampling unit is that person. So the
definition of the sampling unit depends on, and varies
with, the objective of the survey.
 Elements: Individual cases, i.e. single members of a
given population (usually persons; in the online
education example, each student).
 Representative sample: When all salient features of the
population are present in the sample, it is called a
representative sample.
 Inference from a sample assumes that the sample is
representative, so the sampling procedure should be chosen
to make that assumption reasonable.
 For example, if a population has 30% males and 70%
females, then we expect the sample to have nearly 30%
males and 70% females.
 Similarly, it is expected that a drop of blood will give the
same information as all the blood in the body.
 Sampling ratio: Size of the sample divided by the size of
the population.
 Sampling frame: Specific list of items from which sample
elements will be chosen.
 For example, residents of a city may be listed in more than
one frame: in the automobile registration records as well as
in the telephone directory.
 Online poll was done for favorite new paper. But actual
 Replacement: Sampling with replacement
means that after you draw an item from the
population, you put it back and it can be
chosen again. Sampling without replacement
means that once you draw an item, it is not
available to be chosen again.
 Bias: Systematic error produced by your
sampling procedure. For example, if you ask
people whether they watch web series, the
percentage may come out high because you mostly
asked your friends, who really like web series.
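The web-series example of bias can be simulated as a rough sketch. Every number below is an assumption chosen only for illustration: a hypothetical population of 10,000 people in which 80% of a 1,000-person circle of friends watch web series, against 30% of everyone else.

```python
import random

random.seed(1)

# True means "watches web series". The 80% and 30% rates are assumed.
friends = [random.random() < 0.8 for _ in range(1_000)]
others = [random.random() < 0.3 for _ in range(9_000)]
population = friends + others

def share(group):
    """Fraction of a group that watches web series."""
    return sum(group) / len(group)

biased_sample = random.sample(friends, 100)      # only friends are asked
random_sample = random.sample(population, 100)   # anyone can be drawn

print(f"population share:    {share(population):.2f}")
print(f"friends-only sample: {share(biased_sample):.2f}")
print(f"random sample:       {share(random_sample):.2f}")
```

Asking only friends overstates the share systematically, however many friends are asked. The random sample still varies from draw to draw, but it is not pushed in one direction.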
Steps of sampling process
1. Identify the target population that is
relevant to the problem under study.
2. Decide the sampling frame.
3. Determine the sample size.
4. Choose the sampling method.
Sampling Techniques
 Probability sampling
 Non probability sampling
Sampling Techniques : Probability
sampling
 The probability of an element’s
(participant’s) being included in the sample
can be specified.
 Individual observations in the sample are
selected according to a probability
distribution.
 Assume that the population has a total of N
cases, and we are interested in creating a
sample of size n.
 When sampling without replacement and ignoring
order, there are $\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$
different ways of creating such a sample.
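The count of possible samples can be checked on a small case. This is a minimal sketch, assuming a hypothetical population of N = 5 labelled units and samples of size n = 2 drawn without replacement, with order ignored.

```python
import math
from itertools import combinations

N, n = 5, 2
units = ["A", "B", "C", "D", "E"]  # hypothetical population of N = 5 units

count = math.comb(N, n)                     # N! / (n! * (N - n)!)
all_samples = list(combinations(units, n))  # every possible sample, listed

print(count)             # 10
print(len(all_samples))  # 10
```

Listing the samples is only practical for tiny populations. For the phone-book sizes used later in these slides, the count is astronomically large.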
 It is more difficult and expensive than
non-probability sampling.
 Useful when findings derived from the sample are
to be generalized to the whole population.
 It is also known as random sampling or
representative sampling.
 In each form of random sampling, each member of
a population initially has an equal chance of being
selected for the sample.
 The word random describes the procedure used to
select elements (participants, cars, test items)
from a population.
 The samples can be drawn in two possible
ways.
 In random sampling with replacement, once a
member is picked, that member goes back into
the population and thus may be chosen more
than once.
 In simple random sampling without
replacement, a member of the population
may be chosen only once: units, once
chosen, are not placed back in the
population.
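The two ways of drawing can be sketched with Python's standard `random` module. The population of ten numbered units is hypothetical.

```python
import random

random.seed(2)
population = list(range(1, 11))  # hypothetical units numbered 1..10

# With replacement: each draw is made from the full population,
# so the same unit may appear more than once.
with_replacement = random.choices(population, k=5)

# Without replacement: a chosen unit is not put back,
# so all selected units are distinct.
without_replacement = random.sample(population, k=5)

print(with_replacement)
print(without_replacement)
```

`random.choices` may return repeats, while `random.sample` never does.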
Probability Sampling Techniques
 It is any sampling scheme in which every individual
has a known, non-zero probability of being chosen; in
the simplest schemes this probability is the same for
everyone. It is also called random sampling.
 In probability (random) sampling, you start with a
complete sampling frame of all eligible individuals
from which you select your sample.
 All eligible individuals have a chance of being
chosen for the sample, and you will be more able to
generalize the results from your study.
 It is more time-consuming and expensive than
non-probability sampling, but generally more accurate.
1. Simple random sampling
2. Systematic sampling
3. Stratified sampling
4. Cluster sampling
Simple random sampling
 In this case each individual is chosen
entirely by chance and each member of the
population has an equal chance, or
probability, of being selected.
 One way of obtaining a random sample is to
give each individual in a population a
number, and then use a table of random
numbers to decide which individuals to
include.
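The numbering procedure can be sketched as follows. The frame of 500 individuals and the sample size of 10 are hypothetical, and `random.sample` stands in for the table of random numbers.

```python
import random

random.seed(3)
N, n = 500, 10
frame = list(range(1, N + 1))  # each individual is given a number 1..N

# random.sample replaces the printed random-number table (an assumption
# of this sketch); every individual has the same chance of selection.
chosen = sorted(random.sample(frame, n))

print(chosen)
```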
 Advantages
 Simplicity
 Requires little prior knowledge of the population
 Reduces selection bias
 Disadvantages
 Lower accuracy
 Higher cost
 Lower efficiency
 Samples may be clustered spatially
 Samples may not be representative of the feature
attribute(s)
Systematic sampling
 Individuals are selected at regular intervals from
the sampling frame.
 Intervals are chosen to ensure an adequate
sample size.
 If you need a sample of size n from a population of
size N, you should select every (N/n)th individual for
the sample.
 For example, suppose you have to do a phone
survey.
 Your phone book contains 20,000 residence
listings.
 You must choose 400 names for the sample.
 Number the population 1–20,000.
 Use a simple random sample to pick a number
between 1 and 50 (since 20,000/400 = 50) as the
starting point, then take every 50th listing after it
until you have 400 names.
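The phone-book example works out to an interval of 20,000/400 = 50. The sketch below assumes a random start within the first interval, which is one common way to carry out the procedure; the variable names are illustrative only.

```python
import random

random.seed(4)
N, n = 20_000, 400
k = N // n                    # sampling interval: every 50th listing

start = random.randint(1, k)  # random starting listing, 1..50 inclusive
sample = list(range(start, N + 1, k))

print(k, start, len(sample))
```

Because N is an exact multiple of n here, any start between 1 and 50 yields exactly 400 listings.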
 Advantages
 Greater efficiency
 Lower cost
 Disadvantages
 Lower precision
 It may also lead to bias, for example when the list has
a periodic pattern that matches the sampling interval
Stratified Sampling
 In this method, the population is first
divided into subgroups (or strata) whose
members share a similar characteristic, and a
proportionate number of individuals is then
drawn from each stratum.
 For example, you could stratify (group) your
college population by department and then
choose a proportionate simple random
sample from each stratum (each
department) to get a stratified random
sample.
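The department example can be sketched with proportional allocation, where each stratum contributes in proportion to its size. The department names, their sizes and the total sample size of 100 are all made up for this illustration.

```python
import random

random.seed(5)

# Hypothetical college: department -> list of student IDs.
strata = {
    "Physics": [f"P{i}" for i in range(200)],
    "History": [f"H{i}" for i in range(300)],
    "Biology": [f"B{i}" for i in range(500)],
}
N = sum(len(members) for members in strata.values())  # 1000 students
n = 100                                               # total sample size

# A simple random sample inside each stratum, sized n * N_h / N.
stratified_sample = {
    dept: random.sample(members, n * len(members) // N)
    for dept, members in strata.items()
}

for dept, chosen in stratified_sample.items():
    print(dept, len(chosen))
```

Integer division is exact in this case. When stratum sizes do not divide evenly, the allocation has to be rounded so that the parts still add up to n.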
 Advantages
 Higher accuracy
 Lower cost
 Low bias
 Disadvantages
 It requires knowledge of the appropriate
characteristics of the sampling frame
 The existing knowledge used to construct strata
may be flawed.
Cluster Sampling
 Subgroups of the population are used as the
sampling unit, rather than individuals.
 The population is divided into subgroups, known
as clusters, and a random selection of these
clusters is included in the study.
 In single-stage cluster sampling, all members of
the chosen clusters are included in the study.
 In two-stage cluster sampling, a random selection
of individuals from each chosen cluster is
included. Clustering should be taken
into account in the analysis.
 For example, divide your college faculty by
department. The departments are the clusters.
 Number each department, and then
choose four different numbers using simple
random sampling.
 All members of the four departments with
those numbers are the cluster sample.
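The department example corresponds to single-stage cluster sampling and can be sketched as below. The ten departments with five members each are hypothetical.

```python
import random

random.seed(6)

# Hypothetical faculty lists for ten numbered departments.
departments = {
    f"Dept {d}": [f"D{d}-member{m}" for m in range(1, 6)]
    for d in range(1, 11)
}

# Simple random sample of four clusters (departments).
chosen_departments = random.sample(sorted(departments), 4)

# Single-stage: every member of each chosen cluster is included.
cluster_sample = [
    member for dept in chosen_departments for member in departments[dept]
]

print(chosen_departments)
print(len(cluster_sample))
```

For two-stage cluster sampling, the last step would instead draw a random subset of members within each chosen department.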
 Advantages
 Greater efficiency
 Lower cost
 Disadvantages
 Lower precision
 Higher bias
Sampling Techniques :
Non Probability sampling
 There is no way of estimating the probability of an
element’s being included in a sample.
 The selection of units in the sample from the
population is not governed by the laws of
probability.
 For example, the units are selected on the basis of
the personal judgment of the surveyor.
 Persons who volunteer to take a medical
test or to drink a new type of coffee also
constitute a sample selected in a non-random way.
Non Probability Sampling
Techniques
 You do not start with a complete sampling frame,
so some individuals have no chance of being
selected.
 Consequently, you cannot estimate the effect of
sampling error and there is a significant risk of
ending up with a non-representative sample which
produces non-generalisable results.
 However, non-probability sampling methods tend
to be cheaper and more convenient, and they are
useful for exploratory research and hypothesis
generation.
1. Convenience sampling
2. Quota sampling
3. Judgement (or Purposive) Sampling
4. Snowball sampling
Convenience sampling
 Participants are selected based on availability and
willingness to take part.
 Useful results can be obtained, but the results are
prone to significant bias, because those who
volunteer to take part may be different from those
who choose not to (volunteer bias), and the
sample may not be representative of other
characteristics.
 It involves using results that are readily available.
 For example, a computer software store conducts
a marketing study by interviewing potential
customers who happen to be in the store browsing
through the available software.
 Advantages : Easiest method of sampling
Quota sampling
 This method of sampling is often used by market
researchers. Interviewers are given a quota of
subjects of a specified type to attempt to recruit.
 Ideally the quotas chosen would proportionally
represent the characteristics of the underlying
population.
 Advantages
 Relatively straightforward and potentially
representative
 Disadvantages
 The chosen sample may not be representative of
other characteristics that weren’t considered
Judgment (or Purposive)
Sampling
 Also known as selective, or subjective, sampling, this
technique relies on the judgement of the researcher
when choosing who to ask to participate.
 In other words, researchers choose only those people
whom they deem fit to participate in the research study.
Because selection does not follow the laws of probability,
judgmental or purposive sampling is not regarded as a
scientific method of sampling.
 This approach is often used by the media when
canvassing the public for opinions and in qualitative
research.
 Advantages
 Time- and cost-effective to perform
 Disadvantages
 Subject to volunteer bias and to errors of judgement
by the researcher
 Will not necessarily be representative
Snowball sampling
 Commonly used in social sciences when
investigating hard-to-reach groups. Existing
subjects are asked to nominate further subjects
known to them, so the sample increases in size
like a rolling snowball.
 Advantages
 Effective when a sampling frame is difficult to
identify
 Disadvantages
 Selection bias

Examples
 What kind of sampling would you expect was used if the
sample group was composed of 5 yellow, 3 green, 4 red, and
6 blue members, and the population included 48 blue, 32 red,
24 green, and 40 yellow members?
 Since the sample group contains exactly 1/8 as many
members of each color as the entire population, it is
reasonable to suspect that a stratified sampling was used.
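The 1/8 claim can be verified directly from the counts given in the question. The dictionary names are chosen only for this sketch.

```python
from fractions import Fraction

population = {"blue": 48, "red": 32, "green": 24, "yellow": 40}
sample = {"blue": 6, "red": 4, "green": 3, "yellow": 5}

# Sampling ratio per color: sample count divided by population count.
ratios = {
    color: Fraction(sample[color], population[color]) for color in population
}

for color, ratio in ratios.items():
    print(color, ratio)  # each color prints 1/8
```

An identical ratio in every group is what a proportionate stratified sample would produce by design.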
 What type(s) of sampling method(s) might be most
appropriate for approximating the number of golden fish in a
25-mile section of river?
 A 25-mile-long section of river is likely to include a number of
different types of ecosystems that each would harbor a
different density of fish. In order to get a good sample, a
multi-stage sampling method consisting of a stratified
sample of different ecosystems followed by a random
sampling of fish in each ecosystem would probably be a good
choice.
 Would you reasonably expect bias to have affected a sample
composed of 75% Toyota vehicles in a study of the most
common cars in large U.S. cities?
 Although Toyota is a very popular vehicle manufacturer, 75% is
an extremely high percentage of vehicles in a large city
(reasonable estimates put Toyota somewhere between 25 and
30 percent). Such a huge number would definitely suggest
sample bias.
 Would a random sampling of students be the most appropriate
method of sampling for a study of the most enjoyable after-
school club in a large public school?
 Probably not, since a random sampling would likely include a
large number of students who either have no opinion or have no
experience with any after school clubs. More accurate results
would be obtained by a multi-stage sample that first identified
club members, and then randomly selected representatives
from them.
 What might you conjecture about a study that claims 100%
of respondents preferred “Super Sweet and Crunchy” cereal
over “Super Duper Sweet” cereal?
 There are a number of reasonable specific conjectures we
might make, most related to inaccurate sampling
methodology. Perhaps the sample was chosen from
employees of the “Super Sweet and Crunchy” cereal
company, perhaps respondents were offered a reward for
choosing one option over the other, perhaps there was only
a single member of the sample group or the “study” didn’t
include milk for the other cereal, or didn’t offer samples of
“Super Duper Sweet” to respondents at all.
Statistics and Parameters
