Chapter 4
ESTIMATION
4.1 Introduction
In most statistical investigations the population parameters are unknown, simply because it is not possible, or even advisable, to study the entire population. Still, for a meaningful investigation, we need information about the parameters. This is obtained through estimates based on samples drawn from the population.
Thus, instead of chasing after the actual values of the parameters, we settle for estimates with a desired degree of precision. That this is a good bargain will become evident as you work through this chapter.
There are two kinds of estimates in common use: (1) point estimates and (2) interval estimates. Estimates that assign a single value to a population parameter are called point estimates; estimates that specify a range of values (an interval) are called interval estimates.
Despite the variety of methods for determining point estimates, such estimates are of limited use, because they do not convey any degree of uncertainty about the estimate. For example, nobody would venture to assert that there were exactly 25,403 persons at an exhibition on a particular day, but one would readily state that the number of visitors on that day was approximately between 25,000 and 26,000. Unless one has a lot of idle time and is willing to make the effort, nobody will count each and every visitor to the exhibition; all one would say is that the number of visitors was in a range such as 25,000 to 26,000.
Thus, we see that interval estimation is more natural and easier to compute. Further, we often hear people say, “I am 90% sure that the number of visitors is between 25,000 and 30,000.” This leads to the concept of a confidence interval. In statistics, and particularly in estimation, we always attach a certain measure of chance (or confidence) to our estimation methods. What the above person means is that, if x denotes the number of visitors on that particular day, then the chance is 0.9 that x lies between 25,000 and 30,000, that is,
$$P(25{,}000 \le x \le 30{,}000) = 0.9$$
Because there is a 90% chance that x lies between 25,000 and 30,000, the interval (25,000, 30,000) is called a “90 percent confidence interval” for x. This means that if we record the number of visitors on 100 days, then on about 90 of these days the number would be between 25,000 and 30,000 (and on about 10 of these days it may fall outside these limits).
An ideal situation would be to have an interval with a small range and a high confidence level. Unfortunately, this seldom happens in practice. We will see that, for a fixed sample size, a narrow interval comes with a low confidence level, and a high confidence level requires a wide interval.
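The trade-off between interval width and confidence can be seen numerically. The Python sketch below is illustrative only: it assumes a normal sampling distribution with a standard error of 28 (the value σ/√n = 140/5 that arises in Example 1 of Section 4.2.1) and uses the standard library's `statistics.NormalDist` in place of printed tables. The function name `half_width` is our own.

```python
from statistics import NormalDist

def half_width(confidence: float, std_error: float) -> float:
    """Half-width z * SE of a two-sided normal-theory interval."""
    # Put (1 - confidence)/2 in each tail, so z is the (0.5 + confidence/2) quantile.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * std_error

# Assumed standard error of 28 (sigma/sqrt(n) = 140/5); the sample size is held fixed.
for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> interval width {2 * half_width(level, 28):.2f}")
```

The printed widths grow steadily with the confidence level: with the sample size fixed, more confidence can only be bought with a wider interval.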
4.2 Estimation of µ
As has already been pointed out, in any statistical investigation we are interested in obtaining information on the population parameters, especially the mean µ and the standard deviation σ (in continuous situations) and the proportion p (in discrete situations).
In this section, we study methods of estimating µ. We make the blanket assumption that “the population is normally distributed”. The population standard deviation σ may be known or unknown, and the method differs accordingly.
4.2.1. Estimation of µ when σ is known
This is the simplest of all the cases; we illustrate this by an example.
Example 1: Twenty-five loan applications in a bank were randomly selected for the purpose of determining the average amount requested per loan. Find a 95 percent confidence interval for µ, given that the sample mean is x̄ = Rs 900 and the population standard deviation is σ = Rs 140.
Solution: Since the sampling distribution of the sample mean is normal with mean µ and standard deviation σ/√n, the variable
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
has the standard normal distribution. We want to find values a and b such that
$$P(a \le z \le b) = 0.95$$
From the tables, we get a = −1.96 and b = 1.96. Since σ/√n = 140/√25 = 28, we obtain the interval
$$-1.96 \le z \le 1.96$$
$$\text{or}\quad -1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96$$
$$\text{or}\quad -1.96 \le \frac{900 - \mu}{140/\sqrt{25}} \le 1.96$$
$$\text{or}\quad -1.96 \times 28 \le 900 - \mu \le 1.96 \times 28$$
$$\text{or}\quad 900 - 1.96 \times 28 \le \mu \le 900 + 1.96 \times 28$$
$$\text{or}\quad 845.12 \le \mu \le 954.88$$
This is the required 95 percent confidence interval. In other words,
$$P(845.12 \le \mu \le 954.88) = 0.95$$
In general, we have
$$P\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
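The interval x̄ ± 1.96σ/√n can be computed directly. In this Python sketch, `z_interval` is a hypothetical helper, and the table lookup is replaced by `statistics.NormalDist().inv_cdf` from the standard library (Python 3.8 or later), which gives z ≈ 1.95996 rather than the rounded 1.96. For Example 1 the results agree with the hand calculation to two decimal places.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar: float, sigma: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Confidence interval for mu when the population is normal and sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Example 1: xbar = 900, sigma = 140, n = 25.
low, high = z_interval(900, 140, 25)
print(f"95% CI for mu: ({low:.2f}, {high:.2f})")
```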
Note: In the above example, if the experiment of drawing samples of fixed size 25 were repeated many times, about 95% of the resulting intervals (x̄ − 54.88, x̄ + 54.88) would contain the population mean µ.
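The repeated-sampling interpretation in this note can be checked by simulation. The Python sketch below rests on assumptions of our own: a simulation needs a fixed true mean, so it uses a hypothetical µ = 900, while σ = 140 and n = 25 come from Example 1; the helper `coverage` is likewise hypothetical. It draws many samples, builds x̄ ± 1.96σ/√n from each, and reports the fraction of intervals that contain µ.

```python
import random
from math import sqrt

def coverage(mu: float, sigma: float, n: int, trials: int = 10_000,
             z: float = 1.96, seed: int = 1) -> float:
    """Fraction of simulated samples whose interval xbar +/- z*sigma/sqrt(n) contains mu."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    margin = z * sigma / sqrt(n)  # 54.88 for sigma = 140, n = 25
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

# Hypothetical true mean 900; sigma and n as in Example 1.
print(coverage(900, 140, 25))
```

With 10,000 trials the fraction should come out close to 0.95; passing `z=1.645` should bring it close to 0.90.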
4.2.2. Estimation of µ when σ is not known
When a population is known to be normally distributed but its standard deviation is not known, the sample standard deviation s is used in place of σ, provided the sample size is greater than 30.
Example 2: To estimate the mileage of a certain model of car, 40 cars of that model were selected and tested on a fairly long run. The mean and standard deviation of their mileages were 19.3 and 0.7, respectively. Assuming that mileage is normally distributed, is it safe to conclude, with 95% confidence, that the average mileage of this model lies between 18 and 21?
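Solution: Here n = 40, x̄ = 19.3 and s = 0.7. Since σ is not given, it is unknown, so we use s in place of σ. Thus
$$P\left(\bar{x} - 1.96\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\,\frac{s}{\sqrt{n}}\right) = 0.95$$
Now s/√n = 0.7/√40 ≈ 0.1107, so
$$\bar{x} - 1.96\,\frac{s}{\sqrt{n}} = 19.08, \qquad \bar{x} + 1.96\,\frac{s}{\sqrt{n}} = 19.52$$
Thus, the 95% confidence interval is (19.08, 19.52).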
The given interval (18, 21) contains this 95% confidence interval. Hence it is safe to conclude, with (at least) 95% confidence, that µ lies in (18, 21); the wider interval in fact corresponds to a higher confidence level.
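The computation in Example 2, together with the check against the interval (18, 21), can be sketched in Python as follows. The helper `large_sample_interval` is hypothetical; it follows the text's rule of replacing σ by s only when n > 30 and uses z = 1.96 for 95% confidence.

```python
from math import sqrt

def large_sample_interval(xbar: float, s: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate CI for mu using s in place of sigma (the text requires n > 30)."""
    if n <= 30:
        raise ValueError("this large-sample method assumes n > 30")
    margin = z * s / sqrt(n)
    return xbar - margin, xbar + margin

low, high = large_sample_interval(19.3, 0.7, 40)
print(f"95% CI: ({low:.2f}, {high:.2f})")
# If (18, 21) contains the 95% interval, the claim 18 <= mu <= 21 holds
# with at least 95% confidence.
print("contained in (18, 21):", 18 <= low and high <= 21)
```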
4.2.3. Determining the sample size n
In some investigations, an upper limit for the error of the estimate is fixed in advance, and a sample size is then chosen so that the error does not exceed this limit.
The following example illustrates this:
Example 3: Experience with workmen in a certain industry indicates that the time required for a randomly selected workman to complete a job is normally distributed with a standard deviation of 12 minutes. How large a sample is needed to estimate the mean of this distribution to within 3 minutes with 95% confidence?
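Solution: If x̄ is to estimate µ to within 3 minutes with 95% confidence, we need
$$P(|\bar{x} - \mu| \le 3) = 0.95$$
Since
$$P\left(|\bar{x} - \mu| \le 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95,$$
this requires 1.96σ/√n ≤ 3. Thus, the required sample size is given by
$$\frac{1.96 \times 12}{\sqrt{n}} \le 3$$
$$\text{or}\quad n \ge \left(\frac{1.96 \times 12}{3}\right)^2$$
$$\text{or}\quad n \ge 61.4656$$
Thus, any sample of size 62 or more will do.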
Note: Work out this problem the other way: take σ = 12 and n = 62 and compute 1.96σ/√n. This value must be less than or equal to 3. Verify this for yourself.
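The sample-size rule n ≥ (1.96σ/E)², where E is the largest acceptable error, fits in a few lines of Python. This is a sketch; `required_sample_size` and `max_error` are names introduced here for illustration, and z defaults to 1.96 for 95% confidence. Rounding up with `math.ceil` gives the smallest whole sample that meets the bound, and the final line repeats the check suggested in the note.

```python
from math import ceil, sqrt

def required_sample_size(sigma: float, max_error: float, z: float = 1.96) -> int:
    """Smallest n with z * sigma / sqrt(n) <= max_error, i.e. n >= (z*sigma/E)^2."""
    return ceil((z * sigma / max_error) ** 2)

n = required_sample_size(sigma=12, max_error=3)
# Check the note: the error bound 1.96*sigma/sqrt(n) should not exceed 3.
print(n, 1.96 * 12 / sqrt(n))
```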
EXERCISES
1. In a factory, the average height of a sample of 256 workers was 62
inches with a standard deviation of 2 inches.
(a) Compute 90 and 99 percent confidence intervals for the
average height of all workers in that factory. What do you
observe?
(b) Repeat part (a), if the sample size is 100.
2. The standard deviation of the incomes of the employees of ABC & Co. is known to be Rs 1200. How large a sample is needed to estimate the mean income if the chance of the error exceeding Rs 50 is to be less than 5%?
3. A sample consisting of 40% of the students in a university has a mean IQ of 120 with a standard deviation of 4. How many students are enrolled in this university, if a 99% confidence interval for the mean IQ of all students extends three-tenths of the standard deviation on either side of the sample mean?
(Hint: Let N be the number of students in the university. Then n = N × 40/100 = 2N/5. We are given that
$$P\left(|\bar{x} - \mu| \le \frac{3}{10}\,s\right) = 0.99$$
But we also have
$$P\left(|\bar{x} - \mu| \le 2.58\,\frac{s}{\sqrt{n}}\right) = 0.99$$
Hence (3/10)s = 2.58 s/√n. Solve for n to find N.)
4.3 Estimation of Proportion – An Introduction
Suppose that in a sample of size n, we observe that x items possess a certain characteristic (being defective, being a smoker, having deformities, favoring a candidate, etc.). Then the fraction x/n is called the proportion (based on x observed items in a sample of size n). For example, in a survey of 49 voters, if 26 voters favored a certain candidate, then the proportion of voters favoring this candidate is 26/49.
In many investigations, such as opinion polls, health surveys, studies of left-handedness and quality control, we need estimates of proportions. The usual method is the following. This being the discrete case, we assume that x is binomially distributed; hence its mean is np and its variance is np(1 − p). Thus, the proportion x/n has mean p and variance p(1 − p)/n, so that by the Central Limit Theorem the variable z given by
$$z = \frac{(x/n) - p}{\sqrt{p(1 - p)/n}} \tag{1}$$
has approximately a standard normal distribution when n is greater than 30.
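The claim that x/n has mean p and variance p(1 − p)/n can be checked empirically. This Python sketch assumes hypothetical values p = 0.4 and n = 100 and simulates binomial counts with the standard library's `random` module; `simulate_proportions` is a name introduced only for illustration.

```python
import random
from statistics import fmean, pvariance

def simulate_proportions(p: float, n: int, trials: int = 10_000, seed: int = 7) -> list[float]:
    """Draw `trials` binomial samples of size n and return the observed proportions x/n."""
    rng = random.Random(seed)
    # Each sample counts how many of n independent trials succeed with probability p.
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]

props = simulate_proportions(p=0.4, n=100)
print("mean of x/n:", fmean(props), "theory:", 0.4)
print("variance of x/n:", pvariance(props), "theory:", 0.4 * 0.6 / 100)
```

Both simulated values should land close to the theoretical p = 0.4 and p(1 − p)/n = 0.0024.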
4.3.1 Estimation of Proportion
When the situation is of the discrete type, we need an estimate of the proportion of ‘successes’ in order to make decisions. We illustrate this idea with the following example.
Example 4: A manufacturer wants to produce color TV sets. Before doing so, he wants an estimate of the proportion of TV set owners who have color sets, so that he has an idea of the available market. His sample of 100 randomly selected owners yielded 40 people possessing color sets. Let us construct a 95% confidence interval for p.
Solution: Here x = 40 and n = 100, so that x/n = 0.4. Since the denominator of equation (1) also involves p, which is unknown and is to be estimated, equation (1) cannot be used directly. However, our interest is in finding an estimate for p, so we replace the denominator by
$$\sqrt{\frac{1}{n}\cdot\frac{x}{n}\left(1 - \frac{x}{n}\right)},$$
taking the value x/n as an approximation for p. This gives
$$z = \frac{(x/n) - p}{\sqrt{\frac{1}{n}\cdot\frac{x}{n}\left(1 - \frac{x}{n}\right)}}$$
Here √(0.4 × 0.6/100) = √0.0024 ≈ 0.049, which we round to 0.05. Hence we get z = (0.4 − p)/0.05. We know that
$$P(-1.96 \le z \le 1.96) = 0.95$$
Thus, the required interval is
$$0.4 - 1.96 \times 0.05 \le p \le 0.4 + 1.96 \times 0.05$$
$$0.302 \le p \le 0.498$$
Hence, the TV manufacturer can be 95% confident that between 30.2% and 49.8% of the TV set owners in his locality own color sets. In other words, he can be 95% confident that at least 50% of these owners do not own color sets, so that it may be profitable to start the business.
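The large-sample interval of Example 4 (often called the Wald interval) can be written as a short Python function. This is a sketch under the same assumptions as the text (x/n substituted for p in the standard error, z = 1.96 for 95%); `proportion_interval` is a hypothetical helper name. Because it does not round the standard error √0.0024 ≈ 0.049 up to 0.05, it returns about (0.304, 0.496), slightly narrower than (0.302, 0.498).

```python
from math import sqrt

def proportion_interval(x: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Large-sample interval for p, with x/n substituted for p in the standard error."""
    p_hat = x / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

low, high = proportion_interval(40, 100)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```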
EXERCISES
1. A company manufacturing a detergent powder finds that out of 196 women surveyed, 104 use its product. Find a 99% confidence interval for the proportion of women who use this detergent powder.
2. YXZ Ltd has proposed a set of new conditions for its workers. The union of this company hires an agency to find out how useful the new conditions will be. The agency selects a sample of 300 workers and finds that 33% favor the change. How accurate is this estimate of the true proportion at the 95% confidence level?
3. A manufacturer of ball-bearings believes that approximately 2 percent of his product is defective. One of his customers wishes to estimate this percentage to within 0.05 percent, so that 97% of the time he can be sure that the manufacturer is right. Will a sample of size 81 work?
4.4 Small Sample Methods
We have already seen in 4.2.1 that if the population is normal with known standard deviation σ, then µ can be estimated from a sample.
Further, we saw in 4.2.2 that, even if σ is not known, we can estimate µ provided that the sample size n is greater than 30.
In this section, we will see that if σ is not known and n is also smaller than 30 (samples of size less than 30 are called small samples), µ can still be estimated by means of the “t-distribution”. The table for this distribution is given at the end of your textbook. The value n − 1 is denoted by v and is called the degrees of freedom of the t-distribution.
Example 5: A sleep-inducing drug was given to 16 volunteers. The observed data produced a mean increase in sleep of 30 minutes with a standard deviation of 17 minutes. Find a 90 percent confidence interval for the mean increase in sleep for the volunteers who participated.
Solution: It is given that x̄ = 30, s = 17 and n = 16. Note that σ is not given (and is therefore unknown) and n is less than 30. Since a 90% confidence interval is required, we take 1 − α = 0.90, so that α = 0.10 and α/2 = 0.05.
From the table, for v = n − 1 = 15 degrees of freedom and a tail area of 0.05, the value of t is 1.753. This means that the 90% interval is given by −1.753 < t < 1.753, where
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{30 - \mu}{17/4}$$
Thus, the required interval is obtained from
$$-1.753 < \frac{30 - \mu}{17/4} < 1.753$$
$$\text{or}\quad 30 - \frac{17}{4} \times 1.753 < \mu < 30 + \frac{17}{4} \times 1.753$$
$$\text{or}\quad 22.55 < \mu < 37.45$$
Thus, we can be 90% confident that the mean increase in sleep lies between roughly 22.5 minutes and 37.5 minutes.
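Python's standard library has no t-distribution, so this sketch takes the critical value as an argument, just as one would read it from the t-table; `t_interval` is a hypothetical helper name. The value 1.753 used below is the tabulated t for v = 15 degrees of freedom and a one-tail area of 0.05. If SciPy is available, `scipy.stats.t.ppf(0.95, 15)` returns the same value without a table.

```python
from math import sqrt

def t_interval(xbar: float, s: float, n: int, t_crit: float) -> tuple[float, float]:
    """Small-sample CI for mu: xbar +/- t * s / sqrt(n), with t read at v = n - 1."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

# Example 5: xbar = 30, s = 17, n = 16; table value of t for v = 15, tail area 0.05.
low, high = t_interval(30, 17, 16, t_crit=1.753)
print(f"90% CI for mu: ({low:.2f}, {high:.2f})")
```

Passing the critical value in explicitly keeps the function usable for any confidence level, provided the matching table entry is supplied.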
EXERCISES
1. Twenty steel washers were tested for their diameters, giving x̄ = 0.11 inch and s = 0.002 inch. Find a 95% confidence interval for the true mean.
2. A health inspector tests 19 bottles of a certain syrup for alcohol content and finds that the mean alcohol content is 2.7%, with a standard deviation of 0.13%. Find a 99% confidence interval for the true mean alcohol content.