
Methods Chapter 2


CHAPTER TWO

2. INFERENCE ABOUT SINGLE POPULATION MEAN AND PROPORTION

Inference is the process of drawing conclusions about the characteristics of a population based on
data from a sample. In statistics, inference can be made in two ways:
• Estimation
• Hypothesis testing

2.1. Estimation for a Single Population Mean and Proportion

Estimation is one way of making inference about a population parameter when the
investigator has no prior notion about the value or characteristics of that parameter.
It is the process of using a sample statistic to estimate the value of an unknown
population parameter. In general, there are two types of estimation: point estimation and
interval estimation, described below.

2.1.1. Point Estimation for a Single Population Mean and Proportion

Point estimation is a procedure that results in a single value as an estimate for a parameter;
that single value is calculated from the sample data. For a single population mean, the sample
mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a point estimate of the population mean $\mu$.
For a single population proportion, suppose that we select a random sample of size $n$ from the
population and let $x$ be the number of objects/individuals in the sample possessing the
characteristic of interest; then $\hat{p} = x/n$ is the point estimate of the population proportion $P$.

The following are some properties of a good estimator $\hat{\theta}$ of a population parameter $\theta$:
• It should be unbiased: its expected value must equal the value of the parameter being
estimated, i.e., $E(\hat{\theta}) = \theta$.
• It should be consistent: the estimator must get closer to the value of the
parameter as the sample size increases, i.e., $\hat{\theta} \to \theta$ as $n \to \infty$.
• It should be relatively efficient: among competing estimators of a parameter, the one with the
smallest variance is termed the relatively efficient estimator.
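As a small illustration of the two point estimates described above, the following sketch computes a sample mean and a sample proportion. The data values and the function name `point_estimates` are hypothetical and are not taken from the notes.

```python
from statistics import mean


def point_estimates(values, successes, n):
    """Return (sample mean, sample proportion) as point estimates of mu and P."""
    x_bar = mean(values)      # point estimate of the population mean
    p_hat = successes / n     # point estimate of the population proportion
    return x_bar, p_hat


# Hypothetical data: five measurements, and 12 "successes" out of 40 sampled units.
measurements = [31.0, 33.5, 32.0, 30.5, 33.0]
x_bar, p_hat = point_estimates(measurements, successes=12, n=40)
print(f"sample mean = {x_bar}, sample proportion = {p_hat}")
```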

2.1.2. Interval Estimation for a Single Population Mean and Proportion

1|Page Statistical Methods


Interval Estimation

Interval estimation is a procedure that results in an interval of values as an estimate for a
parameter, i.e., a range that contains the likely values of the parameter. It deals with
identifying the lower and upper limits of the parameter. When we construct a confidence
interval for a population parameter, we have to consider:

• The point estimate of the parameter
• The distribution that the point estimate follows
• The level of confidence: the confidence we have that the parameter lies in the
confidence interval, i.e., the probability that the interval estimate contains the population
parameter. It is usually taken to be 90%, 95% or 99%.
a) Interval Estimation for a Single Population Mean

Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from a population X having mean $\mu$ and
variance $\sigma^2$. Then there are three different cases to be considered when constructing
confidence intervals for the population mean ($\mu$), as given below:

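Case 1: If the population distribution is normal with known variance (the sample size may be
small or large)

$(\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n})$ is a $(1-\alpha)100\%$ confidence interval for $\mu$.
It is derived as follows: with $(1-\alpha)$ the probability that the parameter lies inside the
interval, the standard normal distribution gives

$$P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = 1 - \alpha$$

$$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$$

$$P\left(\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$$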

From the standard normal table, the $Z_{\alpha/2}$ values corresponding to the most commonly used
confidence levels are:

$100(1-\alpha)\%$    $\alpha$    $\alpha/2$    $Z_{\alpha/2}$
90               0.10    0.05     1.645
95               0.05    0.025    1.96
99               0.01    0.005    2.58



Example 1: A researcher wanted to estimate the mean growth (in cm) of plants per year using
interval estimation. From a sample of 25 plants, the mean growth was computed to be 32 cm. The
growth of all plants is known to follow a normal distribution with a standard deviation of
4.2 cm. Construct:
a) A 95% confidence interval for the true population mean growth.
b) A 99% confidence interval for the true population mean growth.

Solution

a) $\bar{X} = 32$, $\sigma = 4.2$, $n = 25$, $1 - \alpha = 0.95 \Rightarrow \alpha = 0.05$, $\alpha/2 = 0.025$

$\Rightarrow Z_{\alpha/2} = 1.96$ from the Z-table.
$\Rightarrow$ The required interval is $\bar{X} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$
$= 32 \pm 1.96 \times 4.2/\sqrt{25}$
$= 32 \pm 1.65$
$= (30.35, 33.65)$

Hence, we can be 95% confident that the mean growth of plants is between 30.35 and
33.65 cm.

b) $\bar{X} = 32$, $\sigma = 4.2$, $n = 25$, $1 - \alpha = 0.99 \Rightarrow \alpha = 0.01$, $\alpha/2 = 0.005$

$\Rightarrow Z_{\alpha/2} = 2.58$ from the Z-table.
$\Rightarrow$ The required interval is $\bar{X} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$
$= 32 \pm 2.58 \times 4.2/\sqrt{25}$
$= 32 \pm 2.17$
$= (29.83, 34.17)$

This implies that we can be 99% confident that the mean growth of plants is between
29.83 and 34.17 cm.
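The Case 1 interval can be checked numerically. The sketch below is one possible way to do it, assuming Python's standard-library `statistics.NormalDist` for the normal quantile; the function name `z_interval` is illustrative. Because it uses the exact quantile (about 2.576 rather than the tabulated 2.58), the 99% limits may differ from the hand calculation in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for mu when the population sd (sigma) is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Plant-growth example: x_bar = 32, sigma = 4.2, n = 25
ci95 = z_interval(32, 4.2, 25, 0.95)
ci99 = z_interval(32, 4.2, 25, 0.99)
print("95% CI:", tuple(round(v, 2) for v in ci95))
print("99% CI:", tuple(round(v, 2) for v in ci99))
```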

Case 2: When sampling is from a non-normally distributed population or from a
population whose distribution is unknown, but the sample size is large

Recall the Central Limit Theorem, which states that the sampling distribution of $\bar{X}$ has
mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, and approaches a normal distribution as n
gets large, regardless of the distribution of the sampled population. This allows us to use the
normal distribution for computing confidence intervals. Then,
$(\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n})$ is a $(1-\alpha)100\%$ confidence interval for $\mu$.

But usually $\sigma^2$ is not known; in that case we estimate it by its point estimator $S^2$, and then

$(\bar{X} - Z_{\alpha/2}\,S/\sqrt{n},\ \bar{X} + Z_{\alpha/2}\,S/\sqrt{n})$ will be a $(1-\alpha)100\%$ confidence interval for $\mu$.

Exercise 2.1: A random sample of 625 households was drawn from a town and a survey
generated data on weekly expenditure on food. The mean in the sample was Birr 550 with a
standard deviation Birr 90. Construct a 95% confidence interval for the population mean weekly
expenditure of households on food.
Case 3: If the population distribution is normal with unknown variance and the sample size is
small
A $100(1-\alpha)\%$ confidence interval for the population mean is:
$(\bar{X} - t_{\alpha/2}(n-1)\,S/\sqrt{n},\ \bar{X} + t_{\alpha/2}(n-1)\,S/\sqrt{n})$

where $t_{\alpha/2}(n-1)$ is the critical value with $df = n - 1$. In this case, a table of the Student t
distribution with n-1 degrees of freedom (refer to the Student t distribution in chapter 5 of your
lecture notes) is used rather than the standard normal distribution.

Example 2: A drug company is testing a new drug which is supposed to reduce blood pressure.
From the six people used as subjects, the average drop in blood pressure is found to be
2.28 points, with a standard deviation of 0.95 points. Construct a 95% CI for the true mean
change in pressure, assuming a normal population distribution.

$\bar{X} = 2.28$, $S = 0.95$, $n = 6$, $1 - \alpha = 0.95 \Rightarrow \alpha = 0.05$, $\alpha/2 = 0.025$

$\Rightarrow t_{\alpha/2} = 2.571$ with $df = 5$ from the Student t-table.
$\Rightarrow$ The required interval is $\bar{X} \pm t_{\alpha/2}(df = 5)\,S/\sqrt{n}$
$= 2.28 \pm 2.571 \times 0.95/\sqrt{6}$
$= 2.28 \pm 1.00$
$= (1.28, 3.28)$

This implies that we can be 95% confident that the mean decrease in blood pressure is between
1.28 and 3.28 points.
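The same arithmetic can be sketched in a few lines of Python. The standard library has no t quantile function, so this sketch takes the tabulated critical value (2.571 for df = 5) as an input; the function name `t_interval` is illustrative.

```python
from math import sqrt


def t_interval(x_bar, s, n, t_crit):
    """Confidence interval for mu using a t critical value read from a table."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Blood-pressure example: x_bar = 2.28, S = 0.95, n = 6, t_{0.025}(5) = 2.571
lower, upper = t_interval(2.28, 0.95, 6, t_crit=2.571)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```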

Sampling distribution of sample proportion & its estimation

Sampling distribution of sample proportion

We discussed the sampling distribution of the sample mean in the previous section. However, there
are numerous real-world instances in business and other fields where data are collected in the
form of counts or are divided into two categories or groups based on an attribute. Examples include
dividing colony residents into two groups (male and female) based on sex, dividing
hospital patients into two groups based on whether or not they have cancer, and dividing a batch
of goods into defective and non-defective items, among others.

Such data are typically evaluated in terms of the proportion of components, people, units, or
products that possess a particular characteristic or quality (a "success"). As an illustration,
consider the proportion of females in a population, the proportion of hospital patients who have
cancer, the proportion of defective goods in a lot, etc. Instead of dealing with the population
mean in these circumstances, we deal with the population proportion.

Often the population proportion is unknown and the population is too large to determine the
proportion directly. In this case, the sampling distribution of the sample proportion is needed
in order to draw conclusions about the population proportion.

To obtain the sampling distribution of the sample proportion, we draw all possible samples from
the population and, for each sample, calculate the sample proportion as

$$p = \frac{x}{n}$$

where x is the number of observations in the sample that have the particular characteristic
under study and n is the sample size.

Suppose there is a lot of 3 cartons A, B and C of electric bulbs, and each carton contains 20 bulbs.
The number of defective bulbs in each carton is given below:

Table: Number of Defective Bulbs per Carton

Carton    Number of Defective Bulbs
A         2
B         4
C         1

The population proportion of defective bulbs can be obtained as

$$P = \frac{2 + 4 + 1}{20 + 20 + 20} = \frac{7}{60}$$

Now, let us assume that we do not know the population proportion of defective bulbs, so we
decide to estimate it on the basis of samples of size n = 2.
There are $N^n = 3^2 = 9$ possible samples of size 2 with replacement. All possible samples and
their respective proportions of defectives are given in the following table:

Table: Calculation of Sample Proportion

Sample   Sample Cartons   Sample Observation   Sample Proportion (p)
1        (A, A)           2, 2                 4/40
2        (A, B)           2, 4                 6/40
3        (A, C)           2, 1                 3/40
4        (B, A)           4, 2                 6/40
5        (B, B)           4, 4                 8/40
6        (B, C)           4, 1                 5/40
7        (C, A)           1, 2                 3/40
8        (C, B)           1, 4                 5/40
9        (C, C)           1, 1                 2/40
From the above table, we can see that the value of the sample proportion varies from sample to
sample. So we consider all possible sample proportions and calculate their probabilities of
occurrence. Since there are 9 possible samples, the probability of selecting any one sample is
1/9. We then arrange the possible sample proportions with their respective probabilities in the
table given below:

S.No   Sample Proportion (p)   Frequency (f)   Probability
1      2/40                    1               1/9
2      3/40                    2               2/9
3      4/40                    1               1/9
4      5/40                    2               2/9
5      6/40                    2               2/9
6      8/40                    1               1/9



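This distribution is called the sampling distribution of the sample proportion. The mean of
the sampling distribution of the sample proportion can be obtained as:

$$E(p) = \frac{1}{9}\cdot\frac{2}{40} + \frac{2}{9}\cdot\frac{3}{40} + \frac{1}{9}\cdot\frac{4}{40} + \frac{2}{9}\cdot\frac{5}{40} + \frac{2}{9}\cdot\frac{6}{40} + \frac{1}{9}\cdot\frac{8}{40} = \frac{42}{360} = \frac{7}{60}$$

Thus, we have seen that the mean of the sample proportion is equal to the population proportion.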
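The carton example is small enough to enumerate by machine. The following sketch, which assumes exact arithmetic with Python's `fractions.Fraction` and uses illustrative variable names, lists all samples of two cartons drawn with replacement, builds the sampling distribution of the sample proportion, and compares its mean with the population proportion.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

defectives = {"A": 2, "B": 4, "C": 1}   # defective bulbs per carton
bulbs_per_carton = 20
n = 2                                    # cartons per sample

# Population proportion of defective bulbs: 7/60
population_p = Fraction(sum(defectives.values()),
                        bulbs_per_carton * len(defectives))

# All N^n = 9 ordered samples drawn with replacement
samples = list(product(defectives, repeat=n))
proportions = [
    Fraction(sum(defectives[c] for c in sample), bulbs_per_carton * n)
    for sample in samples
]

# Sampling distribution: each distinct proportion with its probability
distribution = {
    p: Fraction(count, len(samples))
    for p, count in Counter(proportions).items()
}
mean_p = sum(p * prob for p, prob in distribution.items())

for p in sorted(distribution):
    print(p, distribution[p])
print("mean of sample proportion:", mean_p, "population proportion:", population_p)
```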

As mentioned in the previous unit, finding the mean, variance and standard error by this
process is tedious, so we calculate them by a short-cut method when the
population proportion is known.

If X is a binomial random variable (x successes in n trials) with parameters n and P, then

E(X) = nP and Var(X) = nPQ

where Q = 1 - P and P is the probability or proportion of success in the population.

Now, we can easily find the mean and variance of the sampling distribution of the sample
proportion by using the above expressions:

$$E(p) = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{nP}{n} = P$$

and the variance

$$Var(p) = Var\left(\frac{X}{n}\right) = \frac{1}{n^2}Var(X) = \frac{nPQ}{n^2} = \frac{PQ}{n}$$

Also, the standard error of the sample proportion can be obtained as

$$SE(p) = \sqrt{Var(p)} = \sqrt{\frac{PQ}{n}}$$

If the sampling is done without replacement from a finite population, then the mean and variance
of the sample proportion are given by

$$E(p) = P$$

and

$$Var(p) = \frac{N-n}{N-1}\cdot\frac{PQ}{n}$$

where N is the population size and the factor (N-n)/(N-1) is called the finite population correction.

If the sample size is sufficiently large, such that np > 5 and nq > 5, then by the central limit
theorem the sampling distribution of the sample proportion p is approximately normal with mean P
and variance PQ/n, where Q = 1 - P.



Note: If the sample size and the number of defective items in the sample are fairly small, we can
calculate the required probabilities directly from the binomial distribution.

Example: A machine produces a large number of items, of which 15% are found to be defective.
If a random sample of 200 items is taken from the population and the sample proportion is
calculated, find
a) The mean and standard error of the sampling distribution of the proportion.
b) The probability that at most 12% defectives are found in the sample.
Solution: Here, we are given that

P = 15/100 = 0.15, n = 200

a) We know that when the sample size is sufficiently large, such that np > 5 and nq > 5, the
sample proportion p is approximately normally distributed with mean P and variance PQ/n,
where Q = 1 - P. Here the sample proportion is not given, so we assume that the
conditions of normality hold, that is, np > 5 and nq > 5. The mean of the sampling distribution
of the sample proportion is then given by

$$E(p) = P = 0.15$$

Therefore, the standard error is given by

$$SE(p) = \sqrt{\frac{PQ}{n}} = \sqrt{\frac{0.15 \times 0.85}{200}} \approx 0.0252$$

b) The probability that the sample proportion will be less than or equal to 12% defectives is
given by

$$P(p \le 0.12)$$

To get the value of this probability, we convert the random variate p into the standard normal
variate Z by the transformation

$$Z = \frac{p - P}{\sqrt{PQ/n}}$$

so that

$$P(p \le 0.12) = P\left(Z \le \frac{0.12 - 0.15}{0.0252}\right) = P(Z \le -1.19) \approx 0.117$$
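The normal approximation used in this example can be evaluated numerically. This is a minimal sketch that assumes Python's standard-library `statistics.NormalDist` in place of a printed Z-table, so the result may differ slightly from a table lookup with a rounded Z value.

```python
from math import sqrt
from statistics import NormalDist

P, n = 0.15, 200           # population proportion and sample size from the example
Q = 1 - P

se = sqrt(P * Q / n)       # standard error of the sample proportion
z = (0.12 - P) / se        # standardised value for p = 0.12
prob = NormalDist().cdf(z) # P(p <= 0.12) under the normal approximation

print(f"SE = {se:.4f}, Z = {z:.2f}, P(p <= 0.12) = {prob:.4f}")
```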



Exercise: A state introduced a policy of giving loans to unemployed doctors to start their own clinics.
Out of 10000 unemployed doctors, 7000 accepted the policy and got the loan. A sample of 100 unemployed
doctors is taken at the time of allotment of the loan. What is the probability that the sample
proportion of acceptance exceeds 60%?

Interval Estimation of a Single Population Proportion

Let the population consist of N objects/individuals and let $P$ be the proportion of
objects/individuals in the population who possess a certain characteristic. Suppose that we
select a random sample of size $n$ from the population, and let $x$ be the number of
objects/individuals possessing that characteristic in the sample and $\hat{p} = x/n$ be the
proportion of objects/individuals possessing that characteristic in the sample, as discussed
above. For a large sample, the $100(1-\alpha)\%$ confidence interval (CI) for $P$
is given as:

$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
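A large-sample confidence interval for a proportion can be sketched as follows. The interval form $\hat{p} \pm Z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ is the standard normal-approximation interval and is assumed here; the counts (45 out of 150) and the function name `proportion_interval` are hypothetical, chosen so as not to give away the exercises.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x, n, confidence=0.95):
    """Normal-approximation CI for a population proportion (large n assumed)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # Z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical data: 45 of 150 sampled individuals have the characteristic.
lower, upper = proportion_interval(x=45, n=150)
print(f"95% CI for P: ({lower:.4f}, {upper:.4f})")
```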

Exercise: In a university with 5000 students, 38 students in a random sample of 250 are
found to be left-handed. Construct a 95% CI for the true population proportion of left-handed
students in the university.

Exercise: In a survey of diabetics in a large city, it was found that 100 out of 400 persons have
diabetic. Construct 95% CI for the true proportion of diabetics in the city.



Sampling Distribution of Sample Variance

In previous sections, we discussed the sampling distributions of the sample mean and sample
proportion. But many practical situations are concerned with variability. For example, a
manufacturer of steel ball bearings may want to know about the variation in the diameter of the
bearings, a life insurance company may be interested in the variation of the number of policies
in different years, etc. Therefore, we need information about the sampling distribution of the
sample variance.

To describe the sampling distribution of the sample variance, we consider all possible samples
of the same size, say n, taken from a population having variance $\sigma^2$, and for each sample we
calculate the sample variance $S^2$. The values of $S^2$ may vary from sample to sample, so we
construct the probability distribution of the sample variances. The probability distribution thus
obtained is known as the sampling distribution of the sample variance. It can be defined as:

“The probability distribution of all values of the sample variance that would be obtained by
drawing all possible samples of the same size from the parent population is called the sampling
distribution of the sample variance.”

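Theorem: If $X_1, X_2, \ldots, X_n$ are observations of a random sample of size $n$ from the normal
distribution, i.e. $X_i \sim N(\mu, \sigma^2)$, with sample mean and sample variance

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$

then

1) $\bar{X}$ and $S^2$ are independent.

2) $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$

Proof:

We leave out the proof of number 1 because it is beyond the scope of the course,
so we state it without proof.

Now, to prove number 2, let

$$W = \sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2$$

Then, by adding and subtracting $\bar{X}$ inside the square,

$$W = \sum_{i=1}^{n}\left(\frac{(X_i - \bar{X}) + (\bar{X} - \mu)}{\sigma}\right)^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2} + \frac{n(\bar{X} - \mu)^2}{\sigma^2} + \frac{2(\bar{X} - \mu)}{\sigma^2}\sum_{i=1}^{n}(X_i - \bar{X})$$

Since $\sum_{i=1}^{n}(X_i - \bar{X}) = 0$, the 3rd term sums to zero, so $W$ reduces to:

$$W = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2} + \frac{n(\bar{X} - \mu)^2}{\sigma^2}$$

We can do a bit more with the first term of $W$. As an aside, if we take the definition of the
sample variance:

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2 \quad\Rightarrow\quad (n-1)S^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2$$

So, the numerator in the first term of $W$ can be written as a function of the sample variance, i.e.:

$$\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{(n-1)S^2}{\sigma^2} + \frac{n(\bar{X} - \mu)^2}{\sigma^2}$$

The term on the left side of the equation is a sum of independent random variables.
That's because we have assumed that $X_1, X_2, \ldots, X_n$ are observations of a random sample of
size $n$ from the normal distribution, i.e. $X_i \sim N(\mu, \sigma^2)$. Therefore, $(X_i - \mu)/\sigma$ follows a
standard normal distribution. Now, recall that if we square a standard normal random variable, we
get a chi-square random variable with 1 degree of freedom. So, again:

$$W = \sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2$$

is a sum of $n$ independent chi-square(1) random variables. Hence, the sum is a chi-square
random variable with $n$ degrees of freedom.
Now, the second term of $W$, on the right side of the equals sign, that is:

$$\frac{n(\bar{X} - \mu)^2}{\sigma^2}$$

is a chi-square(1) random variable. That's because the sample mean is normally
distributed with mean $\mu$ and variance $\sigma^2/n$. Therefore:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

is a standard normal random variable. So, if we square $Z$, we get a chi-square random variable
with 1 degree of freedom:

$$Z^2 = \frac{n(\bar{X} - \mu)^2}{\sigma^2} \sim \chi^2(1)$$

Finally, by number 1 the two terms on the right side are independent, so the moment generating
function of $W$ factors as $(1-2t)^{-n/2} = M_{(n-1)S^2/\sigma^2}(t)\,(1-2t)^{-1/2}$ for $t < 1/2$. Hence
$M_{(n-1)S^2/\sigma^2}(t) = (1-2t)^{-(n-1)/2}$, which is the moment generating function of a chi-square
random variable with $n-1$ degrees of freedom, as claimed in number 2.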
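The distributional claim in part 2 of the theorem can be checked informally by simulation. The sketch below draws many normal samples and compares the mean and variance of $(n-1)S^2/\sigma^2$ with those of a chi-square distribution with $n-1$ degrees of freedom (mean $n-1$, variance $2(n-1)$). The parameter values, the seed, and the function name are arbitrary choices for illustration; a simulation is only a sanity check, not a proof.

```python
import random
from statistics import mean, variance


def scaled_sample_variances(n, mu, sigma, reps, seed=0):
    """Simulate (n-1)*S^2/sigma^2 for `reps` normal samples of size n."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        values.append((n - 1) * variance(sample) / sigma ** 2)
    return values


n = 5
w = scaled_sample_variances(n=n, mu=10.0, sigma=2.0, reps=20000)

# For chi-square(n-1): mean = n-1 = 4 and variance = 2(n-1) = 8
print("simulated mean:", round(mean(w), 3), "theoretical:", n - 1)
print("simulated variance:", round(variance(w), 3), "theoretical:", 2 * (n - 1))
```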

2.2. Hypothesis Testing About a Single Population Mean and Proportion

2.2.1. Basic Concepts in Hypothesis Testing

Hypothesis: an assertion or statement about the population parameter(s) whose plausibility is
to be evaluated on the basis of the sample data. Hypothesis testing is also a way of making
inference about a population parameter, where the investigator has a prior notion about the value
of the parameter. There are many situations in which we have to make decisions based on
observations or data that are random variables. The theory behind the solutions for these
situations is known as decision theory or hypothesis testing. In this part we present a brief view
of hypothesis testing about the value of a single population characteristic. In any hypothesis
testing problem there are two contradictory hypotheses. These are:

• The null hypothesis:
   - It is a claim about one or more population characteristics that is initially assumed to be true.
   - It is the hypothesis to be tested.
   - It is the hypothesis of equality or the hypothesis of no difference.
   - It is usually denoted by H0.
• The alternative hypothesis:
   - It is the hypothesis available when the null hypothesis has to be rejected.
   - It is the hypothesis of difference.
   - It is usually denoted by H1 or Ha.

Test statistic: a statistic whose value serves to determine whether or not to reject the
hypothesis being tested. It is a random variable.

Hypothesis testing is a method for using sample information to decide whether or not the null
hypothesis is rejected. The null hypothesis will be rejected in favor of the alternative
hypothesis only if the sample evidence suggests that the null hypothesis is false. So, in a
hypothesis testing problem we make one of two decisions:

I. Reject the null hypothesis and accept the alternative hypothesis, or
II. Fail to reject the null hypothesis, because the sample does not provide enough evidence for
the alternative hypothesis.

Types of Errors in Hypothesis Testing


The only way to be absolutely certain whether H0 is true or false is to study the entire population.
Since our decision to reject or fail to reject H0 is based on a sample, we must accept that our
decision might be incorrect. Hence, in hypothesis testing, we might reject H0 when it is actually
true, or we might fail to reject H0 when it is false. We can therefore commit two kinds of errors.
Type-I Error
• Occurs when H0 is rejected while it is true
• It is considered more serious than a type-II error
• The probability of making it is denoted by α.
Type-II Error
• Occurs when H0 is not rejected while it is false
• The probability of making it is denoted by β

2.2.2. Steps in Hypothesis Testing

General procedures /Steps for Hypothesis Testing are the following:

Step1. Formulate the null and alternative hypothesis

Step2. Specify the probability of committing a type-I-error (⍺), or level of significance.

Step3. Based on the sampling distribution of the appropriate sample statistic, evaluate the test
statistic.

Step4. Based on the sampling distribution of the appropriate sample statistic, identify the
critical or rejection region. The critical or rejection region is the set of all test statistic
values for which H0 will be rejected.

Step5. Make a decision: decide whether or not to reject H0. We reject H0 if and only if the
observed or computed value of the test statistic falls in the rejection region.

Step6. Draw a conclusion: based on the decision made, state a conclusion about the
population characteristic using the information obtained from the sample evidence.

2.2.3. Hypothesis Testing About a Single Population Mean

Suppose that the hypothesized (claimed) value of the true population mean ($\mu$) is denoted by $\mu_0$;
then one can formulate two-sided (1) and one-sided (2 and 3) hypotheses as follows:
1. $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$

2. $H_0: \mu = \mu_0$ versus $H_1: \mu < \mu_0$

3. $H_0: \mu = \mu_0$ versus $H_1: \mu > \mu_0$

The choice of the test statistic then depends on the same three cases considered in constructing
confidence intervals for the population mean, as given below:

Case 1: If the population distribution is normal with known variance

The relevant test statistic is:

$$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

After specifying $\alpha$, the critical (tabulated) values from the standard normal
distribution table corresponding to the above three hypotheses are $Z_{\alpha/2}$, $-Z_{\alpha}$ and $Z_{\alpha}$ for the
two-sided hypothesis (1) and the one-sided hypotheses (2) and (3), respectively. For instance, at
the common choice of $\alpha = 0.05$, the critical values from the standard normal distribution table
corresponding to the above three hypotheses are:

Under Ha           $\alpha$      Critical value
$\mu \ne \mu_0$    0.05    $Z_{\alpha/2} = 1.96$
$\mu < \mu_0$      0.05    $-Z_{\alpha} = -1.645$
$\mu > \mu_0$      0.05    $Z_{\alpha} = 1.645$

By comparing the calculated value of the test statistic with the critical value from the table,
the decision rules corresponding to the above three hypotheses, two-sided (1) and one-sided (2)
and (3), are:

Under Ha           Reject H0 if                  Accept H0 if                  Inconclusive if
$\mu \ne \mu_0$    $|Z_{cal}| > Z_{\alpha/2}$    $|Z_{cal}| < Z_{\alpha/2}$    $Z_{cal} = Z_{\alpha/2}$ or $Z_{cal} = -Z_{\alpha/2}$
$\mu < \mu_0$      $Z_{cal} < -Z_{\alpha}$       $Z_{cal} > -Z_{\alpha}$       $Z_{cal} = -Z_{\alpha}$
$\mu > \mu_0$      $Z_{cal} > Z_{\alpha}$        $Z_{cal} < Z_{\alpha}$        $Z_{cal} = Z_{\alpha}$

Example 1: The mean lifetime of a sample of 36 light bulbs produced by a company is
computed to be 1570 hours. The lifetime of light bulbs produced by the company
follows a normal distribution with a standard deviation of 120 hours. Suppose the hypothesized
value for the population mean is 1600 hours. Can we conclude that the mean lifetime of the light
bulbs is different from 1600 hours? (Use $\alpha = 0.05$)
Solution: Let $\mu$ be the population mean and $\mu_0 = 1600$ the hypothesized population mean.

Step 1: Identify the appropriate hypotheses

$H_0: \mu = 1600$ vs $H_1: \mu \ne 1600$

Step 2: Select the level of significance, $\alpha = 0.05$ (given)

Step 3: Select an appropriate test statistic

The Z statistic is appropriate because the population variance is known.

$$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{1570 - 1600}{120/\sqrt{36}} = -1.5$$

Step 4: Obtain the critical value from the Z-table

Here, at the 5% level of significance, the critical value is $Z_{\alpha/2} = 1.96$

Step 5: Decision
At the 5% level of significance we do not reject H0, since the absolute value of the calculated Z
test statistic ($|Z_{cal}| = 1.5$) is not greater than the tabulated value of Z ($Z_{\alpha/2} = 1.96$).

Step 6: Conclusion

Thus, based on the decision in step 5, at the 5% level of significance there is no evidence, based
on the given sample data, that the mean lifetime of the light bulbs is different from 1600 hours;
the data are consistent with a population mean lifetime of 1600 hours.
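The six steps of this example can be mirrored in a short script. This is a sketch under the same assumptions as Case 1 (normal population, known σ), using the standard-library `statistics.NormalDist` for the critical value; the function name `z_test_two_sided` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_sided(x_bar, mu0, sigma, n, alpha):
    """Two-sided Z test of H0: mu = mu0 with known population sd."""
    z_cal = (x_bar - mu0) / (sigma / sqrt(n))       # Step 3: test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # Step 4: critical value
    reject = abs(z_cal) > z_crit                    # Step 5: decision rule
    return z_cal, z_crit, reject


# Light-bulb example: x_bar = 1570, mu0 = 1600, sigma = 120, n = 36, alpha = 0.05
z_cal, z_crit, reject = z_test_two_sided(1570, 1600, 120, 36, 0.05)
print(f"Z_cal = {z_cal:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```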
Exercise 1: A researcher claims that the average wind speed in a certain city is 8 miles per hour.
A sample of 32 days has an average wind speed of 8.2 miles per hour. The wind speed is
normally distributed, and the standard deviation of the population
is 0.6 miles per hour. At $\alpha = 0.05$, is there enough evidence to reject the claim?
Case 2: When sampling is from a non-normally distributed population or from a
population whose distribution is unknown, but the sample size is large
If the sample size is large, one can test a hypothesis about a single population mean by
using the Z test statistic, computed as:

$$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}, \text{ if } \sigma \text{ is known}$$

$$Z_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}, \text{ if } \sigma \text{ is unknown}$$

The critical (tabulated) values and decision rules are obtained in the same way as in case 1 above.
Exercise 1: A random sample of 400 households was drawn from a town and a survey generated
data on weekly earning. The mean in the sample was Birr 250 with a standard deviation Birr 80.
Test the hypothesis that the average weekly earnings is 280 birr at 5% level of significance and
also construct a 95% confidence interval for the population mean earning.

Case 3: When sampling is from a normal distribution with $\sigma^2$ unknown and the sample size is
small
If the sample size is small, one can test a hypothesis about a single population mean by
using the t test statistic, computed as:

$$t_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t \text{ distribution with } n-1 \text{ degrees of freedom.}$$

After specifying $\alpha$, the critical (tabulated) values from the Student t
distribution table corresponding to the above three hypotheses are $t_{\alpha/2}(df = n-1)$,
$-t_{\alpha}(df = n-1)$ and $t_{\alpha}(df = n-1)$ for the two-sided hypothesis (1) and the one-sided
hypotheses (2) and (3), respectively.

By comparing the calculated value of the test statistic with the critical value from the table,
the decision rules corresponding to the above three hypotheses, two-sided (1) and one-sided (2)
and (3), are:

Under Ha           Reject H0 if                       Accept H0 if                       Inconclusive if
$\mu \ne \mu_0$    $|t_{cal}| > t_{\alpha/2}(n-1)$    $|t_{cal}| < t_{\alpha/2}(n-1)$    $t_{cal} = t_{\alpha/2}(n-1)$ or $t_{cal} = -t_{\alpha/2}(n-1)$
$\mu < \mu_0$      $t_{cal} < -t_{\alpha}(n-1)$       $t_{cal} > -t_{\alpha}(n-1)$       $t_{cal} = -t_{\alpha}(n-1)$
$\mu > \mu_0$      $t_{cal} > t_{\alpha}(n-1)$        $t_{cal} < t_{\alpha}(n-1)$        $t_{cal} = t_{\alpha}(n-1)$

Example 1: Test the hypothesis that the average weight gain of sheep on a certain diet after 6
months of feeding is 10 kilograms, if the weight gains of a random sample of 10 sheep are 10.2, 9.7,
10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3, and 9.8 kilograms. Use the 0.01 level of significance and
assume that the distribution of weight gain is normal.
Solution: Let $\mu$ be the population mean and $\mu_0 = 10$.
From the sample data, the sample mean and standard deviation are computed to be:
$\bar{X} = 10.06$, $S = 0.25$
Step 1: Identify the appropriate hypotheses
$H_0: \mu = 10$ vs $H_1: \mu \ne 10$

Step 2: Select the level of significance, $\alpha = 0.01$

Step 3: Select an appropriate test statistic
The appropriate test statistic is the t test statistic, because the population variance is not
known and the sample size is small. Then

$$t_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{10.06 - 10}{0.25/\sqrt{10}} = 0.76$$

Step 4: Obtain the critical (tabulated) value of the test statistic

Here, at the 1% level of significance with df = 9, the critical value is $t_{0.005}(9) = 3.2498$
Step 5: Decision
At the 1% level of significance we do not reject H0, since the absolute value of the calculated t
test statistic (0.76) is not greater than the tabulated value of t (3.2498).
Step 6: Conclusion

Thus, based on the decision in step 5, at the 1% level of significance we have no evidence, based
on the given sample data, that the average weight gain of sheep on this diet is different from
10 kilograms; the data are consistent with an average weight gain of 10 kilograms.
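The sheep example can be reproduced from the raw data. The sketch below assumes the tabulated critical value $t_{0.005}(9) = 3.2498$ quoted in the example, since the standard library has no t quantile function. Because it uses the unrounded standard deviation (about 0.246 rather than 0.25), the computed statistic is about 0.77 instead of 0.76; the decision is unchanged.

```python
from math import sqrt
from statistics import mean, stdev

gains = [10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3, 9.8]  # kg
mu0 = 10          # hypothesized mean weight gain
t_crit = 3.2498   # t_{0.005}(9), taken from the t-table quoted in the example

n = len(gains)
x_bar = mean(gains)
s = stdev(gains)                        # sample sd with divisor n - 1
t_cal = (x_bar - mu0) / (s / sqrt(n))   # test statistic
reject = abs(t_cal) > t_crit            # two-sided decision rule

print(f"mean = {x_bar:.2f}, S = {s:.3f}, t_cal = {t_cal:.3f}, reject H0: {reject}")
```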
Exercise 1: A manufacturer has developed a new fishing line, which the company claims has a
mean breaking strength of 15 kilograms. To test a claim about the mean a random sample of 25
lines was tested and their average was computed to be 14 with standard deviation of 0.5
kilograms. Test the hypothesis that μ = 15 kilograms against the alternative that μ≠15 kilograms
assuming that breaking strength follows normal distribution.

2.2.4 Hypothesis Testing About a Single Population Proportion

It deals with comparing a single sample with a population value and tests whether the proportion
of a single population differs from a specified constant. For a two-tailed test of a proportion,
the hypotheses to be tested are H0: p = p0 versus HA: p ≠ p0, where p is the population proportion
and p0 is the hypothesized value; the other possible alternatives are HA: p > p0 and HA: p < p0.
Given the hypothesized value of the population proportion, the test statistic about a single
population proportion takes the form:

$$Z_{cal} = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} \sim N(0, 1)$$

where $\hat{p}$ is the sample proportion and $q_0 = 1 - p_0$, provided that the sample size is large,
i.e., $np_0$ and $nq_0$ are both at least 5.
The critical value is obtained from the standard normal table. The steps involved and the
decision rules in testing a hypothesis about a single population proportion remain the same as
those for testing a hypothesis about a single population mean under case 1 above.
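A two-sided test for a single proportion can be sketched in the same way as the Z test for a mean. The data below (60 successes in a sample of 250, tested against p0 = 0.30) are hypothetical and are not taken from the notes or the exercises; the function name `proportion_z_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(x, n, p0, alpha=0.05):
    """Two-sided large-sample Z test of H0: p = p0."""
    p_hat = x / n
    z_cal = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard error uses p0 under H0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z_cal, z_crit, abs(z_cal) > z_crit


# Hypothetical data: 60 of 250 sampled units have the characteristic; H0: p = 0.30
z_cal, z_crit, reject = proportion_z_test(x=60, n=250, p0=0.30)
print(f"Z_cal = {z_cal:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```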

Exercise 1: Out of 146 children examined for hearing disability at School-Z, 21 were found to
have some type of hearing abnormality. Does this conform with the statement that 20% of these
school children have an abnormality?

Exercise 2: In a survey of diabetics in a large city, it was found that 100 out of 400 have
diabetic foot. Can we conclude that 20 percent of diabetics in the sampled population have
diabetic foot? Test at the α = 0.05 significance level.

2.3. Sample size Determination

Sample size determination is closely related to statistical estimation. Quite often you ask: how
large a sample is necessary to make an accurate estimate? The answer is not simple, since it
depends on three things: the margin of error, the population standard deviation, and the degree of
confidence. For example, how close to the true mean do you want to be (2 units, 5 units, etc.),
and how confident do you wish to be (90, 95, 99%, etc.)?

When sample data are used to estimate a population mean μ, the margin of error, denoted by B, is
the maximum likely (with probability 1-α) difference between the observed sample mean and the
true value of the population mean. The margin of error is also called the maximum error of the
estimate and can be found by multiplying the critical value by the standard deviation of the
estimator (its standard error).
Sample size needed for estimating a single population mean or proportion
From confidence interval estimation for a single population mean ($\mu$), the margin of error B is:

$$B = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

and solving for n, we will have

$$n = \frac{Z_{\alpha/2}^2\,\sigma^2}{B^2}$$

which is the sample size required to estimate a single population mean for a continuous variable,
where $\sigma^2$ is the population variance, to be taken from a previous study or estimated from a
pilot study.
Also, from confidence interval estimation for a single population proportion ($P$), the margin of
error, denoted by B, is the maximum likely (with probability 1-α) difference between the observed
sample proportion and the true value of the population proportion:

$$B = Z_{\alpha/2}\sqrt{\frac{P(1-P)}{n}}$$

and solving for n

$$n = \frac{Z_{\alpha/2}^2\,P(1-P)}{B^2}$$

which is the sample size required to estimate a single population proportion for a binary
categorical variable, where P is the population proportion, to be taken from a previous study or
estimated from a pilot study.
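The two sample-size formulas, $n = Z_{\alpha/2}^2\sigma^2/B^2$ for a mean and $n = Z_{\alpha/2}^2P(1-P)/B^2$ for a proportion, can be sketched as below. Rounding the result up to the next whole unit is an assumption added here (it is common practice, but the notes do not state it), and the input values and function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def z_critical(confidence):
    """Z_{alpha/2} for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_mean(sigma, margin, confidence=0.95):
    """n = (Z * sigma / B)^2, rounded up (rounding is an added assumption)."""
    return ceil((z_critical(confidence) * sigma / margin) ** 2)


def sample_size_proportion(p, margin, confidence=0.95):
    """n = Z^2 * P * (1 - P) / B^2, rounded up (rounding is an added assumption)."""
    return ceil(z_critical(confidence) ** 2 * p * (1 - p) / margin ** 2)


# Hypothetical inputs: sigma = 4.2 with B = 1.0; P = 0.5 with B = 0.05
n_mean = sample_size_mean(sigma=4.2, margin=1.0)
n_prop = sample_size_proportion(p=0.5, margin=0.05)
print("n for mean:", n_mean, "| n for proportion:", n_prop)
```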
