Sampling and Sample Survey
In order to make any decision about some characteristic, it is necessary to collect and analyse data on it. For collecting such data, the
data source and the method of collection are important. For example, suppose we want to ascertain the percentage of the total cultivable
land of Bangladesh under HYV rice cultivation, or the percentage of farmers using organic manures in Bangladesh. It is not practicable to
collect information on the total cultivable land of the country or data on fertilizer use from all the farmers. In such cases, a part of the total
cultivable area or a fraction of all farmers is selected using statistical techniques for collection of the necessary data. Here, the selected
area or fraction of farmers is a sample, and the method of selecting a sample is called sampling.
In order to clarify the concept of sample and sampling technique, it is necessary to define and discuss certain relevant terms.
◙ Population : Statistical investigations usually aim at the assessment of the general magnitude and the study of variation with
respect to certain characteristics of the individuals belonging to a group. Such a group of individuals under study is known as
population. All the farmers, students, domestic animals, birds, total forest area, total agricultural land etc. may constitute a
population. Populations may be finite or infinite.
◙ Finite Population : A population composed of a finite number of elements is known as a finite population. Students of an
institution, farmers in a country, number of livestock etc. are examples of finite populations; these have specific numbers that
can be enumerated.
◙ Infinite Population : A population composed of an infinite number of elements, which cannot be enumerated, is called an infinite
population. For example, number of fishes in a river, number of stars in the sky etc.
◙ Sample : A sample is a small representative fraction of a population. For example, in order to investigate certain characteristics
of all the farmers of the country, some farmers are selected to collect the necessary data; the selected farmers constitute a sample of
the population of farmers. Similarly, a small quantity of blood, not the whole, is collected for testing; this blood is a sample, and the
total quantity of blood of the person is the population.
◙ Sample Size : The number of elements selected for a sample is known as the sample size. A sample of size less than 30 is termed
as a small sample and that having 30 or more elements is termed as a large sample.
◙ Census : If data are collected on all the elements of a population, the process is known as a census, e.g., population census,
agricultural census etc. Detailed information on all the citizens of a country is usually collected every ten years through the
population census. Information on all aspects of agriculture of a country is collected through the agricultural census. The Bangladesh
Bureau of Statistics (BBS) is the government organization that conducts the population census, the agricultural census and other
nation-wide enumerations.
9.1 Sample Survey : A sample survey is a method by which detailed information on the population characteristics is collected on the
basis of sample elements. Population parameters such as the mean, standard deviation etc. are estimated through a sample survey.
◙ Pilot Survey : Small scale surveys are sometimes conducted in order to get quick primary information before census. Such a
survey is known as pilot survey.
(ii) Defining the Population to be Sampled : The population from which sample is to be drawn should be clearly defined. For
example, if we want to select a sample of farms, clear-cut rules should be framed in order to define farm regarding its size, shape etc.
(iii) Sampling Frame and Sampling Units : For the purpose of sample selection, the population should be divided into sampling
units; the sampling units must be distinct and nonoverlapping so that every element of the population belongs to one and only one sampling
unit. For example, in a socio-economic survey for selecting people in a town, the sampling unit might be an individual person, a family, or
a household.
There should be a complete list of the population elements from which the sample is to be selected. Such a list, which covers all the
population elements, is known as the sampling frame. The sampling frame should be carefully scrutinized and examined to ensure that it is
up-to-date and free from defects.
(iv) Collection of Data : The objectives of the survey should be kept in view while planning for data collection. Only necessary data
should be collected and analysed. There should be a prior outline of tabulation and analysis of data.
(v) Data Collection Method : The commonly used data collection methods from human populations are -
(a) Interview Method : The investigator meets the individual respondents and collects data by interviewing them on the basis of an
interview schedule.
(b) Mailed Questionnaire Method : A structured questionnaire is prepared and mailed to individual respondents who are
required to fill it up and send back.
(vi) The Schedule or Questionnaire : Preparing an interview schedule or a questionnaire requires skill, special
technique and experience in the field of study. The questions should be direct, easy, clear and unambiguous. The questionnaire should be
brief and should not contain offensive questions. An interview schedule or a questionnaire should be finalized after a pre-test.
(vii) Nonresponse : Sometimes it may happen that data cannot be collected from all the units in the sample. For example, a sample
unit may not be available or an interviewer may not contact some respondents or some respondents may refuse to furnish information, or a
mailed questionnaire may be somehow missing. Such incompleteness is termed as nonresponse. Procedures should be devised so as to
deal with the problem of nonresponse.
(viii) Sampling Design : An appropriate sampling design is a precondition for selecting a representative sample for data collection.
Nature of the population, variables on which data are to be collected, resources like time, money and manpower available to the researcher
should be the basis for selecting a suitable sampling design.
(ix) Administration of the Survey : Field workers engaged in the data collection process should be trained in identifying the
sampling units, recording the information, data collection methods etc. before starting the survey. The success of a survey largely depends
on reliable field workers. It is, therefore, necessary to have provision for supervision of the field work.
Two types of errors may be involved in the collection, organization and analysis of data :
(i) Faulty selection of the sample. Use of a defective sampling technique introduces some bias.
(ii) Substitution : Sometimes investigators deliberately substitute a convenient member of the population for a difficult sampling
unit.
(iii) Faulty demarcation of sampling units : This type of bias is particularly significant in area surveys, such as agricultural
experiments.
(iv) Improper use of statistics for parameter estimation.
An increase in the sample size usually reduces the sampling error. In many situations the sampling error is inversely
proportional to the square root of the sample size (illustrated in Figure 9.1).
Figure 9.1. Sampling error plotted against sample size.
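The inverse square-root relation can be checked with a small simulation. The sketch below is only an illustration: the population (10,000 normally distributed values), the function name empirical_standard_error and the number of repetitions are all assumptions made here, not part of the text. It estimates the standard deviation of the sample mean for n = 25 and n = 100; if the relation holds, quadrupling the sample size should roughly halve the error.

```python
import random
import statistics

def empirical_standard_error(population, n, repeats=2000, seed=0):
    """Standard deviation of the sample mean over repeated simple
    random samples (without replacement) of size n."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.pstdev(means)

# Hypothetical population, used only for illustration.
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(10_000)]

se_25 = empirical_standard_error(population, 25)
se_100 = empirical_standard_error(population, 100)

# Quadrupling n should roughly halve the sampling error.
ratio = se_25 / se_100
print(round(se_25, 2), round(se_100, 2), round(ratio, 2))
```

The ratio will not be exactly 2 because the errors are themselves estimated from a finite number of repeated samples.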
Nonsampling errors primarily arise at different post-sampling stages (e.g. observation, ascertainment and processing of the data).
This error may be present in both complete enumeration and sample survey, while sampling error occurs only in sample survey. In
complete enumeration, nonsampling error is the only source of error. It is difficult to ascertain the sources of nonsampling error. However,
some possible sources of nonsampling errors are given below :
(i) Objectives of the survey, methods of data collection and processing, analytical technique etc. should be properly defined.
Inadequate data specification, inconsistent data entry, faulty recording and management of data etc. cause nonsampling
errors.
(ii) Inadequate skill of the data collectors and supervisors may also cause nonsampling errors.
(iii) A respondent who does not clearly understand a question may furnish wrong information, thus causing nonsampling error.
(iv) Overstatement of the respondent regarding his education, profession, socio-economic status etc. may invite nonsampling error.
Some respondents may give wrong information on age also.
(v) Wrong information may be furnished for personal interest of the respondent. Many respondents hesitate to disclose the actual
income and expenditure.
(vi) Interviewers may sometimes try to influence the respondent by personal likings, which may result in erratic information.
(vii) The interview schedule or the questionnaire may include some questions on past events; the respondent may not correctly
remember the event and its time of happening.
(viii) Nonsampling error is also caused by nonresponse. The respondent may not be reached in spite of repeated attempts, or he may
be unable to answer all the questions or he may decline to answer some of the questions. Some bias, thus, may be introduced as
a consequence of above mentioned nonresponses.
(ix) For lack of a precise and clear statement of the objectives, a survey may wrongly include irrelevant units or exclude certain
necessary items.
(x) Editing and coding the responses, tabulation and summarizing the observations in a survey are some sources of errors.
Nonsampling errors may also arise due to defective frame and faulty selection of sampling units.
Nonsampling errors may be reduced to a great extent by proper planning, using appropriate techniques, employing experienced and
skilled manpower, preparing a good questionnaire and proper reporting.
While selecting samples from a population, special care should be taken to ensure the presence of the population characteristics in
the sample. The sampling method depends on the nature of the data and the type of the inquiry. Sampling methods may be classified into
three broad categories.
(a) Probability or Random Sampling : In this type of sampling each population unit has a specified probability to be included in
the sample. This probability may be equal or unequal for all the units. Sampling units are selected on the basis of the
corresponding probabilities.
(b) Purposive or Judgement Sampling : In this method the investigator draws sample according to his own choice, experience and
working facilities.
(c) Mixed Sampling : In such a sampling process random sampling is done at some stages and some particular method(s) are
followed at other stages.
Usual sampling techniques are :
It is a scientific sampling method based on probability theory. In this method each population unit has the same probability of being
included in the sample. Depending on the size and type of the population under investigation, sampling is usually done in two ways :
(a) Lottery Method
(b) Random Numbers Method.
This is the easiest and best-known sampling method. Population elements are first given serial numbers. Each of the serial numbers is
written on a card or a piece of paper; the cards or papers must be identical in size, colour etc. Then the cards or papers are identically
folded and kept in a box. From the box, cards are drawn one by one without replacement to obtain a sample of the desired size.
In this method sampling may be done with replacement too. That is, after a card is drawn at random and its number recorded, the card is
returned to the box and another card is drawn at random. This is known as sampling with replacement. If the population is small, sampling
with replacement is easy, but it becomes difficult and time consuming if the population is large.
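The card-drawing procedure can be imitated in a few lines of code. This is a minimal sketch, assuming Python's standard random module stands in for the box of cards; the function name lottery_sample and its parameters are hypothetical.

```python
import random

def lottery_sample(population_size, sample_size, replace=False, seed=None):
    """Draw serial numbers as if drawing identical cards from a box."""
    rng = random.Random(seed)
    cards = list(range(1, population_size + 1))  # serial numbers 1..N
    if replace:
        # Each drawn card is returned to the box, so repeats are possible.
        return [rng.choice(cards) for _ in range(sample_size)]
    if sample_size > population_size:
        raise ValueError("cannot draw more cards than the box holds")
    rng.shuffle(cards)          # mix the cards in the box
    return cards[:sample_size]  # take cards out one by one, no replacement

without_replacement = lottery_sample(100, 10, seed=1)
with_replacement = lottery_sample(100, 10, replace=True, seed=1)
print(without_replacement)
print(with_replacement)
```

Without replacement every serial number appears at most once; with replacement the same number may be drawn more than once.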
Example 9.1. Drawing a random sample of size 10 from a population of size 100
Since the population size N = 100 is a three-digit number, columns or rows of 3-digit numbers from the random number table are to be used.
Suppose the serial numbers of the population elements are 1, 2, 3, ........ 9, 10, 11, 12, ................ 99, 100. Let the randomly selected
number from columns or rows of 3-digit random numbers be 005, 021, 033, 048, 055, 069, 080, 084, 087 and 098. Hence the population
elements having serial numbers 5, 21, 33, 48, 55, 69, 80, 84, 87 and 98 will constitute the desired sample.
It is observed that this method of sampling is very easy and less time consuming. Three-digit random numbers can also be obtained
by using the electronic calculators used by students in class.
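The random numbers method of Example 9.1 can be sketched in code as well. The version below replaces the printed random number table with a pseudo-random generator, and it assumes a common convention that the text does not spell out: numbers falling outside 1..N and numbers already selected are discarded. The function name random_number_table_sample is hypothetical.

```python
import random

def random_number_table_sample(population_size, sample_size, seed=None):
    """Select serial numbers using random numbers with as many digits as N."""
    if sample_size > population_size:
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    digits = len(str(population_size))  # N = 100 -> 3-digit numbers
    selected = []
    while len(selected) < sample_size:
        number = rng.randrange(10 ** digits)  # 000 .. 999 when digits == 3
        # Assumed convention: skip numbers with no matching element, and repeats.
        if 1 <= number <= population_size and number not in selected:
            selected.append(number)
    return sorted(selected)

sample = random_number_table_sample(100, 10, seed=3)
print(sample)
```

Most 3-digit numbers are rejected when N = 100, which is the price of this simple convention.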
◙ In simple random sampling each population element has equal probability to be included in the sample.
◙ If the population is not homogeneous, sample drawn in this method may not be representative.
◙ If the sample units are geographically far away from each other, data collection often becomes expensive and troublesome.
4. Variance of the mean of the simple random sample is $v(\bar{y}) = \frac{N - n}{N} \cdot \frac{s^2}{n}$,
where N is the population size, n is the sample size and $s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (y_i - \bar{y})^2$.
Example 9.2. Draw possible samples of size 2 from a population of size 5 having elements 1, 2, 3, 4, 5 and thus show that sample mean is
an unbiased estimate of the population mean.
Sample elements   1,2   1,3   1,4   1,5   2,3   2,4   2,5   3,4   3,5   4,5
Sample mean       1.5   2.0   2.5   3.0   2.5   3.0   3.5   3.5   4.0   4.5
Mean of the sample means = $\frac{1}{10}$ (1.5 + 2.0 + 2.5 + 3.0 + 2.5 + 3.0 + 3.5 + 3.5 + 4.0 + 4.5) = 3.0
Population mean = $\frac{1}{5}$ (1 + 2 + 3 + 4 + 5) = 3.0
The mean of the sample means and the population mean are equal.
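The enumeration in Example 9.2 can be reproduced mechanically. The sketch below lists every sample of size 2 drawn without replacement and compares the mean of the sample means with the population mean; exact fractions are used so that no rounding is involved. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4, 5]

# All possible samples of size 2 (order ignored, no replacement).
samples = list(combinations(population, 2))
sample_means = [Fraction(sum(s), len(s)) for s in samples]

mean_of_sample_means = sum(sample_means) / len(sample_means)
population_mean = Fraction(sum(population), len(population))

print(len(samples), mean_of_sample_means, population_mean)  # 10 3 3
```

The same script, with the population replaced, can be used to check Exercise 2.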
9.2.2. Stratified Random Sampling :
If the population is not homogeneous (the population elements are not similar) in respect of the characteristic under study, a simple
random sample may not properly represent the population. In such cases, the whole population is divided into a number of more or less
homogeneous subdivisions; these subdivisions are called strata. From each of these subdivisions, separate random selections of elements
are made to constitute a sample. This method of sampling is known as stratified random sampling. The strata should be such that -
◙ Elements included in each stratum should be as far as possible of homogeneous nature, and
Suppose a population of size N is divided into k homogeneous strata. Let these k different strata consist of $N_1, N_2, \ldots, N_k$ elements
such that $N = \sum_{i=1}^{k} N_i$. Sample elements are to be selected from each stratum by a random method. If $n_i$ (i = 1, 2, ..., k) sample
elements are selected from the $N_i$ population elements in the ith stratum, then a sample of size $n = \sum_{i=1}^{k} n_i$ is drawn by the
stratified random sampling method from a population of size N. In stratified random sampling proper stratification as well as selection of
an appropriate number of sample elements from each stratum is very important. The condition of the characteristic under study, on the
basis of which stratification is done, is known as the stratification factor. Occupation, income, education, age, sex, economic condition,
social status, geographical area etc. are usually the bases on which stratification is done.
Example 9.3 :
In a certain locality there are 600 farmers of whom 400 are small farmers, 150 are medium farmers and 50 are big farmers. In order to
collect data on HYV rice cultivation pattern, a sample of 10% farmers is required to be drawn.
Here the characteristic under study is the cultivation pattern of HYV rice, which is likely to differ for different types of farmers
(small, medium, big). As a result, a sample drawn by the method of simple random sampling may not be representative of the population.
The appropriate sampling method in this case is stratified random sampling. The whole population will be divided into three strata on the
basis of farm size, and 10% of the farmers from each stratum will be randomly selected to form the required sample.
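$n_1 = \frac{10}{100} \times N_1 = \frac{10}{100} \times 400 = 40$
$n_2 = \frac{10}{100} \times N_2 = \frac{10}{100} \times 150 = 15$
$n_3 = \frac{10}{100} \times N_3 = \frac{10}{100} \times 50 = 5$
n = n1 + n2 + n3 = 40 + 15 + 5 = 60.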
This sample of size 60 will consist of 40 small farmers, 15 medium farmers and 5 big farmers.
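The allocation in Example 9.3 and the subsequent random selection within each stratum can be sketched as follows. The farmer identifiers (S1, M1, B1, ...) and the function name proportional_allocation are invented for illustration; only the stratum sizes and the 10% sampling fraction come from the example.

```python
import random

def proportional_allocation(stratum_sizes, percent):
    """Number of sample units per stratum when the same percentage is taken from each."""
    return {name: round(size * percent / 100) for name, size in stratum_sizes.items()}

stratum_sizes = {"small": 400, "medium": 150, "big": 50}
allocation = proportional_allocation(stratum_sizes, 10)

# Hypothetical sampling frame: one list of farmer identifiers per stratum.
frame = {
    name: [f"{name[0].upper()}{i}" for i in range(1, size + 1)]
    for name, size in stratum_sizes.items()
}

# Independent simple random sample (without replacement) within each stratum.
rng = random.Random(7)
stratified_sample = {name: rng.sample(frame[name], allocation[name]) for name in frame}

print(allocation, sum(allocation.values()))
```

Rounding is harmless here because 10% of each stratum size is a whole number; for other sizes the rounded allocations may need a small adjustment so that they add up to n.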
Example 9.4 : 300 farmers, 100 businessmen, 200 factory workers and 100 service holders live in a certain locality. The ratio of mid
income to low income incumbents is 20 : 80 among the farmers, 70 : 30 among the businessmen, 20 : 80 among the factory workers and
40 : 60 among the service holders. For a 20% stratified random sample based on profession and income find the composition of the sample
for each stratum.
Solution :
Here N = 300 + 100 + 200 + 100 = 700
n = 700 x 20% = 140
The population is to be stratified on the basis of profession and income as shown below :
Profession         Income group                                              Total
                   Mid-income                    Low income
Farmer             300 × 20/100 = 60             300 × 80/100 = 240          300
Businessman        100 × 70/100 = 70             100 × 30/100 = 30           100
Factory worker     200 × 20/100 = 40             200 × 80/100 = 160          200
Service holder     100 × 40/100 = 40             100 × 60/100 = 60           100
All                210                           490                         700
If $N_{ij}$ is the stratum size corresponding to the ith profession and the jth income group, the corresponding sample constituent will be
$n_{ij} = N_{ij} \times 20\%$, that is, $n_{ij} = \frac{20}{100} N_{ij}$.
So the composition of the sample will be as follows :
Profession         Income group                                  Total
                   Mid-income            Low income
Farmer             60 x 20% = 12         240 x 20% = 48          60
Businessman        70 x 20% = 14         30 x 20% = 6            20
Factory worker     40 x 20% = 8          160 x 20% = 32          40
Service holder     40 x 20% = 8          60 x 20% = 12           20
All                42                    98                      140
The sample will be of size 140 of which 42 and 98 represent the medium and low income groups respectively. Among the 42 mid
income respondents 12 are farmers, 14 are businessmen, 8 are factory workers, the remaining 8 are service holders; and among the 98 low
income respondents the corresponding numbers will be 48, 6, 32 and 12 respectively.
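The two-way stratification of Example 9.4 can be verified with a short script. This is a sketch under the figures given in the example; the dictionary names are illustrative, and integer division is used only because every product here is a whole number.

```python
# Population size per profession and percentage of mid-income incumbents.
sizes = {"Farmer": 300, "Businessman": 100, "Factory worker": 200, "Service holder": 100}
mid_percent = {"Farmer": 20, "Businessman": 70, "Factory worker": 20, "Service holder": 40}
sampling_percent = 20

composition = {}
for profession, total in sizes.items():
    mid = total * mid_percent[profession] // 100  # stratum size, mid-income
    low = total - mid                             # stratum size, low income
    # Sample constituent of each stratum: 20% of the stratum size.
    composition[profession] = (mid * sampling_percent // 100,
                               low * sampling_percent // 100)

mid_total = sum(m for m, _ in composition.values())
low_total = sum(l for _, l in composition.values())
print(composition)
print(mid_total, low_total, mid_total + low_total)
```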
Advantages :
◙ Sample units are selected from different strata of the population on the basis of relative importance, so the sample drawn in this
method is more representative compared to the sample obtained by other methods.
◙ Administration of stratified random sampling is more convenient than simple random sampling.
◙ Sampling unit selection is less expensive and less time consuming in stratified random sampling compared to simple random
sampling.
◙ Supervision is comparatively easier in stratified random sampling.
Disadvantages :
◙ Stratum selection may sometimes become complicated. Improper stratification reduces the reliability of the collected
information.
◙ It is not easy to determine the sample components of different strata without previous experience.
◙ Sampling is not possible if sizes of the different strata are not known.
Determination of Number of Sample Units :
In stratified random sampling, there are two different methods for determining the number of sample units to be chosen from each
stratum :
(i) Proportional allocation.
(ii) Optimum allocation.
Proportional Allocation : If the number of sample units for each stratum is determined according to the same ratio, the method is
known as proportional allocation; selecting 10% of the units from each stratum is an example.
Optimum Allocation : The basic principle of this method is to choose the number of sample units from each stratum such that the variance
of the estimate becomes a minimum. This method is very useful if there are wide differences in the standard deviations of the different strata.
This method is popularly known as Neyman's Optimum Allocation.
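The text does not give the allocation rule itself. The sketch below assumes the standard form of Neyman allocation for equal sampling cost per unit, in which the sample size of a stratum is proportional to the product of its size and its standard deviation: n_i = n N_i s_i / (N_1 s_1 + ... + N_k s_k). The stratum sizes are borrowed from Example 9.3, while the standard deviations are hypothetical values chosen only to show the effect.

```python
def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Allocate n sample units in proportion to N_i * s_i (assumed standard rule)."""
    weights = [N_i * s_i for N_i, s_i in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

stratum_sizes = [400, 150, 50]  # from Example 9.3
stratum_sds = [2.0, 4.0, 8.0]   # hypothetical standard deviations
neyman = neyman_allocation(stratum_sizes, stratum_sds, n=60)
print([round(x, 2) for x in neyman])
```

Compared with the proportional allocation (40, 15, 5), more units go to the small but highly variable strata. The results are generally fractional and have to be rounded to whole units in practice.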
Some Properties of Stratified Random Sampling :
◙ Mean of stratified random sample, $\bar{y}_{st}$, is an unbiased estimate of the population mean. That is, $E(\bar{y}_{st}) = \bar{Y}$.
◙ Variance of the stratified sample mean is
$V(\bar{y}_{st}) = \frac{1}{N^2} \sum_{i=1}^{k} N_i (N_i - n_i) \frac{s_i^2}{n_i}$ ; $s_i^2$ = ith stratum variance
◙ For Neyman's optimum allocation, variance of the stratified random sample mean is
$V(\bar{y}_{st})_{opt} = \frac{\left( \sum_{i=1}^{k} N_i s_i \right)^2}{N^2 n} - \frac{\sum_{i=1}^{k} N_i s_i^2}{N^2}$
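The two variance expressions can be compared numerically. The sketch below assumes the forms V = (1/N^2) Σ N_i (N_i - n_i) s_i^2 / n_i for a general allocation and V_opt = (Σ N_i s_i)^2 / (N^2 n) - Σ N_i s_i^2 / N^2 for Neyman allocation, with n_i = n N_i s_i / Σ N_j s_j taken as the Neyman rule. Stratum sizes follow Example 9.3; the standard deviations are hypothetical.

```python
import math

def variance_stratified_mean(sizes, sample_sizes, sds):
    """V(y_st) for an arbitrary allocation n_1, ..., n_k."""
    N = sum(sizes)
    return sum(N_i * (N_i - n_i) * s_i ** 2 / n_i
               for N_i, n_i, s_i in zip(sizes, sample_sizes, sds)) / N ** 2

def variance_neyman(sizes, sds, n):
    """Closed-form V(y_st) under Neyman's optimum allocation."""
    N = sum(sizes)
    weighted_sd = sum(N_i * s_i for N_i, s_i in zip(sizes, sds))
    weighted_var = sum(N_i * s_i ** 2 for N_i, s_i in zip(sizes, sds))
    return weighted_sd ** 2 / (N ** 2 * n) - weighted_var / N ** 2

sizes = [400, 150, 50]
sds = [2.0, 4.0, 8.0]  # hypothetical stratum standard deviations
n = 60

total = sum(N_i * s_i for N_i, s_i in zip(sizes, sds))
neyman_sizes = [n * N_i * s_i / total for N_i, s_i in zip(sizes, sds)]

v_proportional = variance_stratified_mean(sizes, [40, 15, 5], sds)
v_neyman_general = variance_stratified_mean(sizes, neyman_sizes, sds)
v_neyman_closed = variance_neyman(sizes, sds, n)
print(v_proportional, v_neyman_general, v_neyman_closed)
```

For these values the general formula evaluated at the Neyman allocation agrees with the closed form, and both are smaller than the variance under proportional allocation.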
9.2.3. Systematic Sampling :
This is a mixed sampling procedure. In this method, only one sample unit is selected randomly and the other units are selected following
a specific system. From a population of N units (numbered 1, 2, ..., N) a sample of size n is drawn such that
$N = nk$, that is, $k = \frac{N}{n}$,
where k is an integer, usually known as the sampling interval. The first unit of the sample is selected at random. Then the other units are
selected systematically one after another with a regular interval of k units. If the serial number of the selected first sample unit is i (i ≤ k),
the next units of the sample will be i + k, i + 2k, ..........., i + (n-1)k.
The serial number of the randomly selected first unit is called random start. Positions of the other units depend on the position of the
random start. Possible random start and sample units of k samples of size n are described below:
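The k possible systematic samples can be generated with a short sketch. It assumes, as in the text, that N = nk with k an integer; the function names systematic_sample and all_systematic_samples are hypothetical.

```python
import random

def all_systematic_samples(N, n):
    """Map each possible random start i (1..k) to its systematic sample."""
    if N % n != 0:
        raise ValueError("this sketch assumes N = n * k with integer k")
    k = N // n  # sampling interval
    return {i: [i + j * k for j in range(n)] for i in range(1, k + 1)}

def systematic_sample(N, n, seed=None):
    """Pick the random start at random, then take every k-th unit."""
    samples = all_systematic_samples(N, n)
    start = random.Random(seed).randint(1, len(samples))  # 1 <= i <= k
    return start, samples[start]

possible = all_systematic_samples(20, 5)  # k = 4
start, chosen = systematic_sample(20, 5, seed=11)
print(possible[3])  # [3, 7, 11, 15, 19]
print(start, chosen)
```

Every population unit belongs to exactly one of the k samples, so choosing the random start determines the whole sample.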
Multistage sampling is relatively more flexible. Selection of sample units and data collection are easy, and administration is more
convenient. For investigation in a big area, this method is very popular.
Exercises
1. Draw a random sample (without replacement) of size 20 from a population of size 200.
2. Suppose you have a population whose elements are 3, 4, 5, 6, 7 and 8. Draw all possible samples of size 2 and prove that the mean of
these sample means is equal to the mean of the population.