

Sampling Distribution and Estimation: Topic 3

This document discusses statistical estimation and sampling distributions. It defines key terms like population parameters, statistics, estimates, and estimators. The document explains that estimators like the sample mean and variance are used to calculate point estimates of population values. It also introduces the concept of a sampling distribution, which describes the distribution of all possible values that a statistic like the sample mean could take on from samples of a given size. The document provides examples to illustrate concepts like sampling error versus non-sampling error and how to calculate the sampling distribution of the sample mean from a given population.

Uploaded by

taiiq zhou
Copyright © All Rights Reserved

Topic 3

Sampling Distribution and Estimation

Section 1 – Estimation
Section 1.1 Point Estimate

Statistical inference enables us to make judgments about a population on the basis of sample information.

The mean, standard deviation, and proportions of a population are called population parameters; in other words, they serve to define the population.

Estimating a population’s parameters is essential to statistical analysis, and sometimes sampling is the best (fastest and most economical) way to approach the study.


Definitions:

A parameter or population parameter is a characteristic of an entire population.

A statistic is a summary measure that is computed to describe a characteristic for only a sample of the population.

An estimate is a specific observed value of a statistic.

Section 1.2 Estimator

The rule that specifies how a sample statistic can be obtained for estimating the population parameter is called an estimator. It is the random variable, defined by a formula, from which we obtain all possible estimates.

The point estimate is the single number that is obtained from the estimator. It is a single value, calculated from only one sample, that is used to estimate a population parameter.

Point estimation is a process that generates specific numbers, each of which is a point estimate.


The symbols we use to represent several important population parameters and their sample counterparts:

| | Population parameter | Sample statistic |
|---|---|---|
| Mean | $\mu$ | $\bar{X}$ |
| Standard deviation | $\sigma$ | $s$ |
| Variance | $\sigma^2$ | $s^2$ |
| Proportion | $p$ | $\hat{p}$ |


Example:

If a professor wants information on central tendency in a list of test scores, she can calculate a sample mean.

The number obtained for the sample mean is called the estimate, and the sample mean is the estimator for the population mean.


Example:

Suppose that a professor, whose course has an enrollment of 50 students, wants information on the performance of his class.

He takes a sample of 10 scores:

95, 67, 89, 70, 56, 97, 68, 78, 50, 79


The estimator for the population mean is the sample mean, $\bar{X}$.

The estimate for the population mean, on the basis of the 10 sample scores, is

$$\bar{X} = \frac{95 + 67 + \cdots + 79}{10} = 74.9$$

The estimator for the population variance is the sample variance, $s^2$.

The estimate of the population variance is

$$s^2 = \frac{95^2 + 67^2 + \cdots + 79^2 - 10(74.9)^2}{10 - 1} = 247.66$$

The professor can use $\bar{X} = 74.9$ and $s^2 = 247.66$ to analyze his class's performance. The formula for combinations reveals that there are ${}_{50}C_{10} = 10{,}272{,}278{,}170$ possible samples of 10 scores, each of which yields its own estimate of the population mean and of the population variance.
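As a check on the arithmetic above, the following minimal Python sketch (standard library only; Python 3.8+ for `math.comb`) recomputes both point estimates from the professor's 10 scores and the number of possible samples of size 10 from a class of 50. The variable names are illustrative, not part of the original material.

```python
import math
import statistics

# The professor's sample of 10 test scores
scores = [95, 67, 89, 70, 56, 97, 68, 78, 50, 79]

x_bar = statistics.mean(scores)      # point estimate of the population mean
s2 = statistics.variance(scores)     # sample variance with divisor n - 1
n_samples = math.comb(50, 10)        # distinct samples of 10 scores from 50 students

print(f"x_bar = {x_bar:.1f}")
print(f"s^2   = {s2:.2f}")
print(f"50C10 = {n_samples:,}")
```

Running it prints 74.9, 247.66 and 10,272,278,170.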


Definition:

An interval estimate is constructed around the point estimate, and it is stated that this interval is likely to contain the corresponding population parameter. Interval estimates indicate the precision, or accuracy, of an estimate and are therefore preferable to a point estimate alone.

In order to study interval estimates in depth, we first have to study the sampling distributions of the estimators (i.e. $\bar{X}$, $s^2$, and $\hat{p}$).

Section 2 – Sampling Distribution
Section 2.1 Sampling Distribution of the Sample Mean

Definition:

The population distribution is the probability distribution of the population data.

The probability distribution of $\bar{X}$ is called the sampling distribution of $\bar{X}$. It lists the various values that $\bar{X}$ can assume and the probability of each value of $\bar{X}$.


Example:

Suppose there are only five students in an advanced statistics class and the midterm scores of these five students are:

70, 78, 80, 80, 95


Let $X$ denote the score of a student. Using single-valued classes, the frequency distribution of scores is depicted as follows:

| $x$ | Frequency $f$ | Relative frequency $f(x)$ |
|---|---|---|
| 70 | 1 | 0.2 |
| 78 | 1 | 0.2 |
| 80 | 2 | 0.4 |
| 95 | 1 | 0.2 |

The values of the mean and standard deviation calculated for the probability distribution give the values of the population parameters $\mu = 80.6$ and $\sigma = 8.0895$.
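A small sketch of the same calculation in Python. Note that the population standard deviation uses divisor $N$ (`statistics.pstdev`), unlike the sample standard deviation, which uses $n - 1$.

```python
import statistics

# Midterm scores of all five students (the entire population)
population = [70, 78, 80, 80, 95]

mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population SD, divisor N

print(f"mu = {mu:.1f}, sigma = {sigma:.4f}")
```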


Example:

Reconsider the population of midterm scores of five students given in the previous example.

Consider all possible samples of three scores each that can be selected, without replacement, from that population.

The total number of possible samples is ${}_5C_3 = 10$.


Suppose we assign letters A, B, C, D, and E to the scores of the five students so that

A = 70, B = 78, C = 80, D = 80, E = 95.

Then the 10 possible samples of three scores each are

ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE.

These 10 samples and their respective means are listed in the following table:

| Sample | Scores in the sample | $\bar{X}$ |
|---|---|---|
| ABC | 70, 78, 80 | 76.00 |
| ABD | 70, 78, 80 | 76.00 |
| ABE | 70, 78, 95 | 81.00 |
| ACD | 70, 80, 80 | 76.67 |
| ACE | 70, 80, 95 | 81.67 |
| ADE | 70, 80, 95 | 81.67 |
| BCD | 78, 80, 80 | 79.33 |
| BCE | 78, 80, 95 | 84.33 |
| BDE | 78, 80, 95 | 84.33 |
| CDE | 80, 80, 95 | 85.00 |


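By using the values of $\bar{X}$, we record the frequency distribution of $\bar{X}$ as follows:

| $\bar{X}$ | Frequency $f$ | Relative frequency $f(\bar{X})$ |
|---|---|---|
| 76.00 | 2 | 0.2 |
| 76.67 | 1 | 0.1 |
| 79.33 | 1 | 0.1 |
| 81.00 | 1 | 0.1 |
| 81.67 | 2 | 0.2 |
| 84.33 | 2 | 0.2 |
| 85.00 | 1 | 0.1 |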
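The two tables above can be reproduced by brute force. This sketch (standard library only; the dictionary layout is an illustrative choice) enumerates all ${}_5C_3 = 10$ samples, tabulates the sampling distribution of $\bar{X}$, and computes its mean and standard deviation over the 10 equally likely samples. The mean comes out at 80.6, equal to $\mu$, and the standard deviation at about 3.3025, which matches $\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$ with $N = 5$ and $n = 3$.

```python
import statistics
from collections import Counter
from itertools import combinations

# Population of five midterm scores, labelled A-E as in the text
population = {"A": 70, "B": 78, "C": 80, "D": 80, "E": 95}

# All samples of size 3 drawn without replacement (5C3 = 10)
samples = list(combinations(population, 3))
means = [sum(population[k] for k in s) / 3 for s in samples]

# Sampling distribution of the sample mean: value -> frequency
dist = Counter(round(m, 2) for m in means)
for value in sorted(dist):
    print(f"{value:6.2f}  f = {dist[value]}  f(xbar) = {dist[value] / len(samples):.1f}")

# Mean and SD of the sampling distribution (all 10 samples equally likely)
print(f"mean of xbar = {statistics.mean(means):.2f}")
print(f"SD of xbar   = {statistics.pstdev(means):.4f}")
```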

Sampling error is the difference between the value of a sample statistic and the value of the corresponding population parameter. In the case of the mean,

$$\text{Sampling error} = \bar{X} - \mu$$

assuming that the sample is random and no non-sampling error has been made. A sampling error occurs because of chance.

Non-sampling errors are errors that occur in the collection, recording, and tabulation of data. Such errors occur because of human mistakes and not chance.


Comparison between Sampling and Non-Sampling Errors

| Sampling errors | Non-sampling errors |
|---|---|
| Occur only when a sample survey is conducted | Occur both in a sample survey and in a census |
| Impossible to avoid | Can be minimized by preparing the survey questionnaire carefully and handling the data cautiously |


Example:

Reconsider the population of midterm scores of five students given in the previous example.

The population mean is

$$\mu = \frac{70 + 78 + 80 + 80 + 95}{5} = 80.60$$


Now suppose we take a random sample of three scores from this population. Assume that this sample includes the scores 70, 80, and 95. The mean for this sample is

$$\bar{X} = \frac{70 + 80 + 95}{3} = 81.67$$

Consequently,

$$\text{Sampling error} = \bar{X} - \mu = 81.67 - 80.60 = 1.07$$

That is, the mean score estimated from the sample is 1.07
higher than the mean score of the population. Note that
this difference occurred due to chance, that is, because we
used a sample instead of the population.


Now suppose that, when we select the above-mentioned sample, we mistakenly record the second score as 82 instead of 80. As a result, we calculate the sample mean as

$$\bar{X} = \frac{70 + 82 + 95}{3} = 82.33$$

Consequently, the difference between the sample mean and the population mean is

$$\bar{X} - \mu = 82.33 - 80.60 = 1.73$$

However, this difference between the sample mean and the
population mean does not represent the sampling error.

As we calculated earlier, only 1.07 of this difference is due to the sampling error.

The remaining portion, which is equal to $1.73 - 1.07 = 0.66$, represents the non-sampling error because it occurred due to the error we made in recording the second score in the sample.

Therefore, sampling error = 1.07 and non-sampling error = 0.66.

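The decomposition above can be written out directly. This sketch keeps full precision, so it reports the non-sampling error as $2/3 \approx 0.67$; the 0.66 in the text comes from subtracting the already-rounded values 1.73 and 1.07.

```python
# Population mean of the five midterm scores
mu = (70 + 78 + 80 + 80 + 95) / 5

true_sample = [70, 80, 95]        # the scores actually drawn
recorded_sample = [70, 82, 95]    # second score mis-recorded as 82

x_bar_true = sum(true_sample) / len(true_sample)
x_bar_recorded = sum(recorded_sample) / len(recorded_sample)

sampling_error = x_bar_true - mu                   # due to chance alone
total_difference = x_bar_recorded - mu             # what the analyst actually sees
non_sampling_error = x_bar_recorded - x_bar_true   # due to the recording mistake

print(f"sampling error     = {sampling_error:.2f}")
print(f"total difference   = {total_difference:.2f}")
print(f"non-sampling error = {non_sampling_error:.2f}")
```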

The mean and standard deviation calculated for the sampling distribution of $\bar{X}$ are called the mean $\mu_{\bar{X}}$ and the standard deviation $\sigma_{\bar{X}}$ of $\bar{X}$.

Actually, the mean and standard deviation of $\bar{X}$ are, respectively, the mean and standard deviation of the means of all samples of the same size selected from a population.

The standard deviation $\sigma_{\bar{X}}$ is also called the standard error of $\bar{X}$.


Mean of the Sampling Distribution of $\bar{X}$

The mean of the sampling distribution of $\bar{X}$ is equal to the mean of the population. Thus,

$$\mu_{\bar{X}} = \mu$$


Standard Deviation of the Sampling Distribution of $\bar{X}$

The standard deviation of the sampling distribution of $\bar{X}$ is

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

where $\sigma$ is the standard deviation of the population and $n$ is the sample size. This formula is used when $n/N \le 0.05$, where $N$ is the population size.


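If this condition is not satisfied, we use the following formula to calculate $\sigma_{\bar{X}}$:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$$

where the factor $\sqrt{(N - n)/(N - 1)}$ is called the finite population correction factor.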
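A sketch of a helper that chooses between the two formulas using the $n/N \le 0.05$ rule stated above. The function name `standard_error_mean` is hypothetical, introduced only for illustration.

```python
import math
from typing import Optional


def standard_error_mean(sigma: float, n: int, N: Optional[int] = None) -> float:
    """Standard deviation of the sampling distribution of the sample mean.

    Applies the finite population correction factor sqrt((N - n) / (N - 1))
    only when a population size N is given and n / N > 0.05.
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se


# Five-student example: N = 5, n = 3, so n / N = 0.6 and the correction applies
print(round(standard_error_mean(math.sqrt(65.44), 3, N=5), 4))   # 3.3025
# Large (effectively infinite) population: no correction
print(round(standard_error_mean(4, 50), 4))                      # 0.5657
```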


The Shape of the Sampling Distribution of $\bar{X}$

Case I: Sampling from a normally distributed population

Case II: Sampling from a population that is not normally distributed


Case I: Sampling from a Normally Distributed Population

When the population from which samples are drawn is normally distributed with its mean equal to $\mu$ and standard deviation equal to $\sigma$, then

1. The shape of the sampling distribution of $\bar{X}$ is normal, whatever the value of $n$.
2. The mean of $\bar{X}$, $\mu_{\bar{X}}$, is equal to $\mu$.
3. The standard deviation of $\bar{X}$, $\sigma_{\bar{X}}$, is equal to $\sigma/\sqrt{n}$ (assuming $n/N \le 0.05$).


Case II: Sampling from a Population That Is Not Normally Distributed

Most of the time the population from which the samples are selected is not normally distributed.

However, if the sample size is at least 30, the shape of the sampling distribution of $\bar{X}$ is inferred from a very important theorem called the Central Limit Theorem (CLT).


Central Limit Theorem (CLT)

For a large sample size (usually considered large if $n \ge 30$):

1. The sampling distribution of the sample mean $\bar{X}$ is approximately normal, irrespective of the shape of the population distribution.
2. The mean of $\bar{X}$, $\mu_{\bar{X}}$, is equal to $\mu$.
3. The standard deviation of $\bar{X}$, $\sigma_{\bar{X}}$, is equal to $\sigma/\sqrt{n}$.

If the population distribution is fairly symmetrical, the sampling distribution of the sample mean $\bar{X}$ is approximately normal if the sample size $n \ge 15$.
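The CLT can be illustrated by simulation. This sketch assumes an exponential population with mean 1 and standard deviation 1 (a strongly skewed, non-normal distribution chosen only for illustration), draws 20,000 samples of size $n = 30$, and compares the simulated mean and standard deviation of $\bar{X}$ with $\mu$ and $\sigma/\sqrt{n}$. The fixed seed only makes the run repeatable; other seeds give slightly different numbers.

```python
import math
import random
import statistics

rng = random.Random(2024)      # fixed seed so the run is repeatable
n, reps = 30, 20_000           # sample size and number of simulated samples
mu, sigma = 1.0, 1.0           # exponential(rate = 1) has mean 1 and SD 1

# Means of many samples drawn from a skewed, non-normal population
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)
]

se = sigma / math.sqrt(n)      # CLT prediction for the SD of the sample mean
sim_mean = statistics.fmean(sample_means)
sim_sd = statistics.pstdev(sample_means)
# Share within mu +/- 1.96 se; close to 0.95 if the sample mean is roughly normal
inside = sum(abs(m - mu) <= 1.96 * se for m in sample_means) / reps

print(f"mean of sample means: {sim_mean:.4f}  (mu = {mu})")
print(f"SD of sample means:   {sim_sd:.4f}  (sigma / sqrt(n) = {se:.4f})")
print(f"share within mu +/- 1.96 se: {inside:.3f}")
```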

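Sampling Distribution of $\bar{X}$

| | Normal population | Non-normal population |
|---|---|---|
| Mean | $\mu_{\bar{X}} = \mu$ | $\mu_{\bar{X}} = \mu$ |
| Standard error | $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ | $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ |
| Shape | Normal | Approximately normal if $n \ge 30$ |
| Notation | $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ | $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ approximately |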


Example:

A company which manufactures drink-dispensing machines sets the fill level at 198cc. The standard deviation is 4cc. Assume that the fill levels have a normal distribution.

(a) A drink is randomly selected. What is the probability that the drink will have less than 195cc?

(b) What is the probability that a random sample of 50 drinks has a mean value greater than 199cc?


Solution:

(a) A drink is randomly selected. What is the probability that the drink will have less than 195cc?

Let $X$ be the fill level and $\mu$ be the mean fill level.

Given $X \sim N(198, 4^2)$,

$$P(X < 195) = P\left(\frac{X - \mu}{\sigma} < \frac{195 - 198}{4}\right) = P(Z < -0.75) = 0.2266$$


(b) What is the probability that a random sample of 50 drinks has a mean value greater than 199cc?

Let $\bar{X}$ be the sample mean. Since the population is normally distributed, the sampling distribution of $\bar{X}$ is normal. We have

$$\mu_{\bar{X}} = \mu = 198; \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{50}} \quad \Rightarrow \quad \bar{X} \sim N\left(198, \left(\frac{4}{\sqrt{50}}\right)^2\right)$$

$$P(\bar{X} > 199) = P\left(\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} > \frac{199 - 198}{4/\sqrt{50}}\right) = P(Z > 1.77) = 0.0384$$

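Both answers can be checked with `statistics.NormalDist` from the Python standard library (3.8+). The code does not round $z$ to 1.77, so part (b) comes out at about 0.0385 rather than the table-based 0.0384.

```python
import math
from statistics import NormalDist

mu, sigma = 198, 4      # fill level (cc): mean and standard deviation

# (a) one drink: X ~ N(198, 4^2)
p_a = NormalDist(mu, sigma).cdf(195)

# (b) mean of n = 50 drinks: Xbar ~ N(198, (4 / sqrt(50))^2)
n = 50
p_b = 1 - NormalDist(mu, sigma / math.sqrt(n)).cdf(199)

print(f"(a) P(X < 195)    = {p_a:.4f}")
print(f"(b) P(Xbar > 199) = {p_b:.4f}")
```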
Section 2.2 Sampling Distribution of the Sample Proportion

Definition:

The population proportion, denoted by $p$, is obtained by taking the ratio of the number of elements in a population with a specific characteristic to the total number of elements in the population.

The sample proportion, denoted by $\hat{p}$, gives a similar ratio for a sample.


The population and sample proportions, denoted by $p$ and $\hat{p}$, respectively, are calculated as

$$p = \frac{x}{N} \quad \text{and} \quad \hat{p} = \frac{x}{n}$$

where

N – Total number of elements in the population
n – Total number of elements in the sample
x – Number of elements in the population or sample that possess a specific characteristic


Example:

Suppose a total of 789,654 families live in a city and 563,282 of them own homes.

Then, $N = 789{,}654$ and $x = 563{,}282$.

The proportion of all families in this city who own homes is

$$p = \frac{x}{N} = \frac{563{,}282}{789{,}654} = 0.7133$$

Now, suppose a sample of 240 families is taken from this city and 158 of them are homeowners. Then, $n = 240$ and $x = 158$. The sample proportion is

$$\hat{p} = \frac{x}{n} = \frac{158}{240} = 0.6583$$

The difference between the sample proportion and the corresponding population proportion gives the sampling error, assuming that the sample is random and no non-sampling error has been made. That is, in the case of the proportion,

$$\text{Sampling error} = \hat{p} - p = 0.6583 - 0.7133 = -0.0550$$
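The same arithmetic as a short sketch; the negative sign shows that this particular sample underestimates the population proportion.

```python
# Population: all families in the city
N, x_pop = 789_654, 563_282
p = x_pop / N                    # population proportion

# Sample of 240 families, 158 of whom own homes
n, x_sample = 240, 158
p_hat = x_sample / n             # sample proportion

sampling_error = p_hat - p       # negative: this sample underestimates p
print(f"p = {p:.4f}, p_hat = {p_hat:.4f}, sampling error = {sampling_error:.4f}")
```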

The probability distribution of the sample proportion $\hat{p}$ is called the sampling distribution of $\hat{p}$. It gives the various values that $\hat{p}$ can assume and their probabilities.

Example:
Boe Consultant Associates has five employees. The following table gives the names of these five employees and information concerning their knowledge of statistics.

| Name | Ally | John | Susan | Peter | Tom |
|---|---|---|---|---|---|
| Knows statistics | Yes | No | No | Yes | Yes |


If we define the population proportion $p$ as the proportion of employees who know statistics, then $p = 3/5 = 0.6$.

Now, suppose we draw all possible samples of three employees each and compute, for each sample, the proportion of employees who know statistics. The total number of samples of size three that can be drawn from the population of five employees is ${}_5C_3 = 10$.


The table lists the 10 possible samples and the proportion of employees who know statistics for each of those samples.

| Sample | $\hat{p}$ |
|---|---|
| Ally, John, Susan | 1/3 = 0.33 |
| Ally, John, Peter | 2/3 = 0.67 |
| Ally, John, Tom | 2/3 = 0.67 |
| Ally, Susan, Peter | 2/3 = 0.67 |
| Ally, Susan, Tom | 2/3 = 0.67 |
| Ally, Peter, Tom | 3/3 = 1.00 |
| John, Susan, Peter | 1/3 = 0.33 |
| John, Susan, Tom | 1/3 = 0.33 |
| John, Peter, Tom | 2/3 = 0.67 |
| Susan, Peter, Tom | 2/3 = 0.67 |


The sampling distribution of $\hat{p}$ is as follows:

| $\hat{p}$ | $f(\hat{p})$ |
|---|---|
| 0.33 | 0.30 |
| 0.67 | 0.60 |
| 1.00 | 0.10 |
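As with the sample mean, the tables above can be generated by enumerating every sample. In this sketch 1 codes "knows statistics" and 0 codes "does not"; that coding is an illustrative choice. It also prints the standard deviation of $\hat{p}$ over the 10 samples, 0.2, which agrees with $\sqrt{\frac{p(1-p)}{n}}\sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{0.6 \times 0.4}{3} \times \frac{2}{4}}$.

```python
import statistics
from collections import Counter
from itertools import combinations

# 1 = knows statistics, 0 = does not
knows_stats = {"Ally": 1, "John": 0, "Susan": 0, "Peter": 1, "Tom": 1}
p = sum(knows_stats.values()) / len(knows_stats)   # population proportion

# All samples of three employees drawn without replacement (5C3 = 10)
samples = list(combinations(knows_stats, 3))
p_hats = [sum(knows_stats[name] for name in s) / 3 for s in samples]

# Sampling distribution of p_hat: value -> frequency
dist = Counter(round(ph, 2) for ph in p_hats)
for value in sorted(dist):
    print(f"p_hat = {value:.2f}  f(p_hat) = {dist[value] / len(samples):.2f}")

print(f"p = {p}, mean of p_hat = {statistics.mean(p_hats):.2f}, "
      f"SD of p_hat = {statistics.pstdev(p_hats):.2f}")
```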


Mean of the Sampling Distribution of $\hat{p}$

The mean of the sample proportion $\hat{p}$ is denoted by $\mu_{\hat{p}}$ and is equal to the population proportion $p$. Thus, $\mu_{\hat{p}} = p$.

Standard Deviation of the Sampling Distribution of $\hat{p}$

The standard deviation of the sampling distribution of $\hat{p}$ is

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \quad \text{if } n/N \le 0.05, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\sqrt{\frac{N-n}{N-1}} \quad \text{if } n/N > 0.05$$
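A sketch of the standard-error formulas above as one function; the name `standard_error_proportion` is hypothetical. The two calls reproduce the employee example ($N = 5$, so the correction applies) and the election example that follows (a large voting population, so it does not).

```python
import math
from typing import Optional


def standard_error_proportion(p: float, n: int, N: Optional[int] = None) -> float:
    """Standard deviation of the sampling distribution of p_hat.

    Uses sqrt(p(1 - p) / n), multiplied by the finite population correction
    sqrt((N - n) / (N - 1)) when N is given and n / N > 0.05.
    """
    se = math.sqrt(p * (1 - p) / n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se


# Employee example: p = 0.6, n = 3, N = 5
print(round(standard_error_proportion(0.6, 3, N=5), 4))    # 0.2
# Election example: p = 0.46, n = 200, large voting population
print(round(standard_error_proportion(0.46, 200), 4))      # 0.0352
```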


The Shape of the Sampling Distribution of $\hat{p}$

Central Limit Theorem: the sampling distribution of $\hat{p}$ is approximately normal for a sufficiently large sample size.

In the case of the proportion, the sample size $n$ is considered to be sufficiently large if $np > 5$ and $n(1 - p) > 5$.


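Sampling Distribution of $\hat{p}$

| Property | Value |
|---|---|
| Mean | $\mu_{\hat{p}} = p$ |
| Standard error | $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$ |
| Shape | Approximately normal if $np > 5$ and $n(1 - p) > 5$ |
| Notation | $\hat{p} \sim N\left(p, \left(\sqrt{\frac{p(1-p)}{n}}\right)^2\right)$ approximately |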


Example:

The election returns showed that a certain candidate received 46% of the votes.

(a) Determine the probability that a poll of 200 people selected at random from the voting population would have shown a majority (over 50%) of votes in favor of the candidate.

(b) 95% of the sample proportions will be greater than what value?


Solution:

(a) Determine the probability that a poll of 200 people selected at random from the voting population would have shown a majority (over 50%) of votes in favor of the candidate.

From the given information: $p = 0.46$ and $n = 200$.

This gives:

$$\mu_{\hat{p}} = p = 0.46, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.46)(0.54)}{200}} = 0.0352$$


Since $np = 200(0.46) = 92 > 5$ and $n(1 - p) = 200(0.54) = 108 > 5$, we can infer from the Central Limit Theorem that the sampling distribution of $\hat{p}$ is approximately normal. Thus,

$$\hat{p} \sim N\left(0.46, (0.0352)^2\right)$$

Required probability:

$$P(\hat{p} > 0.50) = P\left(\frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} > \frac{0.50 - 0.46}{0.0352}\right) = P(Z > 1.14) = 0.1271$$


(b) 95% of the sample proportions will be greater than what value?

Let $A$ be the required value. We want $P(\hat{p} > A) = 0.95$, and from the standard normal table, $P(Z > -1.645) = 0.95$. Hence

$$\frac{A - 0.46}{0.0352} = -1.645 \quad \Rightarrow \quad A = 0.4021$$
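A sketch of both parts using `statistics.NormalDist`. Because the code keeps $\sigma_{\hat{p}}$ and $z$ unrounded, it gives about 0.1282 for (a) and about 0.4020 for (b), slightly different from the table-based 0.1271 and 0.4021 above.

```python
import math
from statistics import NormalDist

p, n = 0.46, 200
se = math.sqrt(p * (1 - p) / n)         # about 0.0352

# Large-sample condition from the text: np > 5 and n(1 - p) > 5
assert n * p > 5 and n * (1 - p) > 5
p_hat_dist = NormalDist(p, se)          # approximate sampling distribution of p_hat

# (a) P(p_hat > 0.50)
prob_majority = 1 - p_hat_dist.cdf(0.50)

# (b) A with P(p_hat > A) = 0.95, i.e. the 5th percentile of p_hat
A = p_hat_dist.inv_cdf(0.05)

print(f"(a) P(p_hat > 0.50) = {prob_majority:.4f}")
print(f"(b) A = {A:.4f}")
```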

Section 2.3 Sampling Distribution of the Sample Variance

Consider a random sample of $n$ observations drawn from a population with unknown mean $\mu$ and unknown variance $\sigma^2$. Denote the sample observations as $x_1, x_2, \ldots, x_n$.

The population variance is the expectation $\sigma^2 = E[(X - \mu)^2]$, which suggests estimating it by the mean of $(x_i - \mu)^2$ over the $n$ observations. Since $\mu$ is unknown, the sample mean $\bar{x}$ is used in its place to compute a sample variance.


Definition:

Let $x_1, x_2, \ldots, x_n$ be a random sample of observations from a population.

The quantity

$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

is called the sample variance, and its square root, $s$, is called the sample standard deviation.


Suppose a random sample of $n$ observations with sample variance $s^2$ is taken from a normally distributed population with population variance $\sigma^2$.

Then,

$$\frac{(n - 1)s^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

has a chi-square ($\chi^2$) distribution with $n - 1$ degrees of freedom.


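Here $\chi^2_{\alpha,\nu}$ denotes the value with an area of $\alpha$ to its right under the chi-square distribution with $\nu$ degrees of freedom.

a. $\chi^2_{0.005,5} = 16.750$ b. $\chi^2_{0.9,9} = 4.168$

Verify:

c. $P(\chi^2 > \chi^2_0) = 0.05$ with $\nu = 10$ $\Rightarrow$ $\chi^2_0 = \chi^2_{0.05,10} = 18.307$

d. $P(\chi^2 < \chi^2_0) = 0.05$ with $\nu = 10$ $\Rightarrow$ $\chi^2_0 = \chi^2_{0.95,10} = 3.940$

e. Given that $\nu = 22$, $P(10.982 < \chi^2 < 36.781) = 0.95$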
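Python's standard library has no chi-square functions, so this sketch evaluates the chi-square CDF through the series for the regularized lower incomplete gamma function and uses it to check the table values above. It is an illustrative implementation only (the name `chi2_cdf` is hypothetical); if SciPy is available, `scipy.stats.chi2.cdf` and `scipy.stats.chi2.ppf` are the usual tools.

```python
import math


def chi2_cdf(x: float, df: int, terms: int = 500) -> float:
    """P(chi-square with df degrees of freedom <= x), for x > 0.

    Series for the regularized lower incomplete gamma function:
    P(a, z) = z^a e^(-z) / Gamma(a) * sum_k z^k / (a (a+1) ... (a+k)),
    with a = df / 2 and z = x / 2. Adequate for the moderate x used here.
    """
    a, z = df / 2, x / 2
    term = 1.0 / a
    total = term
    for k in range(1, terms):
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


# Table values quoted above, each compared with its stated tail area
print(f"a. upper tail {1 - chi2_cdf(16.750, 5):.4f}   (table: 0.005)")
print(f"b. upper tail {1 - chi2_cdf(4.168, 9):.4f}   (table: 0.90)")
print(f"c. upper tail {1 - chi2_cdf(18.307, 10):.4f}   (table: 0.05)")
print(f"d. lower tail {chi2_cdf(3.940, 10):.4f}   (table: 0.05)")
print(f"e. between    {chi2_cdf(36.781, 22) - chi2_cdf(10.982, 22):.4f}   (table: 0.95)")
```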


Mean of the Sampling Distribution of $s^2$

The mean of the sample variance $s^2$ is equal to the population variance $\sigma^2$.

Variance of the Sampling Distribution of $s^2$

For a normally distributed population, as above, the variance of the sample variance $s^2$ is given by the formula

$$\operatorname{Var}(s^2) = \frac{2\sigma^4}{n - 1}$$
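A Monte Carlo sketch of both results, assuming a normal population with $\sigma = 3.6$ and samples of size $n = 6$ (values borrowed from the example that follows). The formula $\operatorname{Var}(s^2) = 2\sigma^4/(n-1)$ relies on the population being normal; the fixed seed only makes the run repeatable.

```python
import random
import statistics

rng = random.Random(7)             # fixed seed so the run is repeatable
sigma, n, reps = 3.6, 6, 20_000    # population SD, sample size, replications

# Sample variance (divisor n - 1) of many samples from N(0, sigma^2)
s2_values = [
    statistics.variance([rng.gauss(0.0, sigma) for _ in range(n)]) for _ in range(reps)
]

mean_s2 = statistics.fmean(s2_values)
var_s2 = statistics.pvariance(s2_values)
print(f"mean of s^2:     {mean_s2:.2f}  (sigma^2 = {sigma**2:.2f})")
print(f"variance of s^2: {var_s2:.2f}  (2 sigma^4 / (n - 1) = {2 * sigma**4 / (n - 1):.2f})")
```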

Example:
The variability of the electrical resistance is critical for manufacturing a control device. Manufacturing standards specify a standard deviation of 3.6, and the population distribution of resistance measures is normal.

The monitoring process requires that a random sample of $n = 6$ observations be obtained from the population of devices and the sample variance be computed.

Determine an upper limit for the sample variance such that the probability of exceeding this limit, given a population standard deviation of 3.6, is less than 0.05.

Solution:
From the given information, $n = 6$ and $\sigma^2 = 3.6^2 = 12.96$. Let $K$ be the required upper bound.

We have

$$P(s^2 > K) = P\left(\frac{(n - 1)s^2}{12.96} > \frac{(n - 1)K}{12.96}\right) = P\left(\chi^2_5 > \frac{5K}{12.96}\right) = 0.05$$

$\chi^2_{0.05,5} = 11.07$ is the upper 0.05 critical value of the chi-square distribution with 5 d.f.


The required upper limit for $s^2$, labelled $K$, can be obtained from

$$\frac{(n - 1)K}{12.96} = \frac{(6 - 1)K}{12.96} = 11.07 \quad \Rightarrow \quad K = 28.69$$

If the sample variance, $s^2$, from a random sample of size $n = 6$ exceeds 28.69, there is strong evidence to suspect that the population variance exceeds 12.96 and that the manufacturing process should be halted and appropriate adjustments should be performed.
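The calculation of $K$, plus a simulation check that $s^2$ exceeds it about 5% of the time when $\sigma = 3.6$, as a sketch. The critical value 11.07 is taken from the chi-square table rather than computed.

```python
import random
import statistics

sigma, n = 3.6, 6
sigma2 = sigma**2                  # 12.96
chi2_upper = 11.07                 # table: upper 0.05 critical value, 5 d.f.

K = chi2_upper * sigma2 / (n - 1)  # solve (n - 1) K / sigma^2 = 11.07 for K
print(f"K = {K:.2f}")

# Simulation check: s^2 should exceed K in about 5% of samples
rng = random.Random(11)
reps = 20_000
exceed = sum(
    statistics.variance([rng.gauss(0.0, sigma) for _ in range(n)]) > K
    for _ in range(reps)
)
share = exceed / reps
print(f"share of samples with s^2 > K: {share:.3f}")
```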

Example:
The quality assurance manager of a food company wants to ensure that the variation of package weights is small, so that the company does not produce a large proportion of packages that are under the stated package weight. The manager wants to obtain upper and lower limits for the ratio of the sample variance divided by the population variance for a random sample of $n = 20$ observations.

The limits are such that the probability that the ratio is below the lower limit is 0.025 and the probability that the ratio is above the upper limit is 0.025. Thus, 95% of the ratios will be between these limits. The population distribution can be assumed to be normal.


Solution:

We need values $K_L$ and $K_U$ such that

$$P\left(\frac{s^2}{\sigma^2} < K_L\right) = 0.025 \quad \text{and} \quad P\left(\frac{s^2}{\sigma^2} > K_U\right) = 0.025$$

given that a sample of $n = 20$ is used to compute the sample variance.


For the lower limit:

$$0.025 = P\left(\frac{(n - 1)s^2}{\sigma^2} < (n - 1)K_L\right) = P\left(\chi^2_{19} < 19K_L\right) \quad \Rightarrow \quad 19K_L = 8.91 \quad \Rightarrow \quad K_L = 0.4689$$

For the upper limit:

$$0.975 = P\left(\frac{(n - 1)s^2}{\sigma^2} < (n - 1)K_U\right) = P\left(\chi^2_{19} < 19K_U\right) \quad \Rightarrow \quad 19K_U = 32.85 \quad \Rightarrow \quad K_U = 1.7289$$

Therefore, the 95% acceptance interval for the ratio $s^2/\sigma^2$ is

$$0.4689 < \frac{s^2}{\sigma^2} < 1.7289$$
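A sketch that computes $K_L$ and $K_U$ from the table values 8.91 and 32.85 and then checks by simulation that about 95% of the ratios $s^2/\sigma^2$ fall between them. The simulation assumes $\sigma = 1$; the distribution of the ratio does not depend on $\sigma$.

```python
import random
import statistics

n = 20
K_L = 8.91 / (n - 1)     # table: lower 2.5% point of chi-square with 19 d.f.
K_U = 32.85 / (n - 1)    # table: upper 2.5% point of chi-square with 19 d.f.
print(f"K_L = {K_L:.4f}, K_U = {K_U:.4f}")

# Simulation check; sigma = 1 is arbitrary because s^2 / sigma^2 does not depend on it
rng = random.Random(3)
sigma, reps = 1.0, 20_000
ratios = [
    statistics.variance([rng.gauss(0.0, sigma) for _ in range(n)]) / sigma**2
    for _ in range(reps)
]
share = sum(K_L < r < K_U for r in ratios) / reps
print(f"share of ratios between K_L and K_U: {share:.3f}")
```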

Section 2.4 Properties of Estimators

A number of different estimators are possible for the same population parameter, but some estimators are better than others.

To understand how, we need to look at three important properties of estimators.

I. Unbiasedness
II. Efficiency
III. Consistency


Unbiasedness

An estimator $\hat{\theta}$ exhibits unbiasedness when the mean of its sampling distribution is equal to the population parameter $\theta$. That is, $E(\hat{\theta}) = \theta$.

The sample mean is an unbiased estimator of the population mean because the mean of the sampling distribution of $\bar{X}$, $E(\bar{X})$, is equal to the population mean $\mu$.

The sample proportion is an unbiased estimator of the population proportion: $E(\hat{p}) = p$.
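To illustrate the definition $E(\hat{\theta}) = \theta$, this sketch compares two estimators of $\sigma^2$ by simulation: $s^2$ with divisor $n - 1$, and the same sum of squares divided by $n$. The normal population ($\mu = 10$, $\sigma = 2$) and sample size $n = 5$ are illustrative assumptions. The average of $s^2$ settles near $\sigma^2 = 4$, while the divisor-$n$ version settles near $\frac{n-1}{n}\sigma^2 = 3.2$, i.e. it is biased.

```python
import random
import statistics

rng = random.Random(5)                    # fixed seed so the run is repeatable
mu, sigma, n, reps = 10.0, 2.0, 5, 20_000

unbiased, divisor_n = [], []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    unbiased.append(statistics.variance(sample))     # s^2, divisor n - 1
    divisor_n.append(statistics.pvariance(sample))   # same sum of squares, divisor n

avg_unbiased = statistics.fmean(unbiased)
avg_divisor_n = statistics.fmean(divisor_n)
print(f"sigma^2 = {sigma**2}")
print(f"average of s^2 (divisor n - 1): {avg_unbiased:.3f}")
print(f"average with divisor n:         {avg_divisor_n:.3f}  "
      f"((n - 1) / n * sigma^2 = {(n - 1) / n * sigma**2:.3f})")
```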


Efficiency

Efficiency refers to the size of the standard error of the statistic. The most efficient estimator is the one with the smallest variance.

Thus, if there are two unbiased estimators of $\theta$, that is, $E(\hat{\theta}_1) = E(\hat{\theta}_2) = \theta$, with variances $\operatorname{Var}(\hat{\theta}_1)$ and $\operatorname{Var}(\hat{\theta}_2)$, then the first estimator $\hat{\theta}_1$ is said to be more efficient than the second estimator $\hat{\theta}_2$ if $\operatorname{Var}(\hat{\theta}_1) < \operatorname{Var}(\hat{\theta}_2)$.
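As an illustration of efficiency, this sketch compares the sample mean and the sample median as estimators of $\mu$ for a normal population, where both are unbiased. The median is not discussed in the source; it is used here only as a second unbiased estimator, and the population and sample size are illustrative assumptions. The simulated variance of the mean is close to $\sigma^2/n$, while the median's variance is noticeably larger (for large $n$ under normality the ratio is known to approach $2/\pi \approx 0.64$), so the sample mean is the more efficient estimator.

```python
import random
import statistics

rng = random.Random(9)                      # fixed seed so the run is repeatable
mu, sigma, n, reps = 0.0, 1.0, 25, 20_000   # normal population; odd n gives a single middle value

means, medians = [], []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)
print(f"Var(sample mean)   = {var_mean:.4f}  (sigma^2 / n = {sigma**2 / n:.4f})")
print(f"Var(sample median) = {var_median:.4f}")
print(f"Var(mean) / Var(median) = {var_mean / var_median:.2f}")
```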


Consistency

Consistency is related to the behavior of estimators as the sample size gets large. A statistic is a consistent estimator of a population parameter if, as the sample size increases, it becomes almost certain that the value of the statistic comes very close to the value of the population parameter.

It can be shown that an unbiased estimator $\hat{\theta}_n$ for $\theta$ is a consistent estimator if its variance approaches 0 as $n$ increases.


We can show that the sample mean is a consistent estimator of the population mean.

The sample mean is unbiased because $E(\bar{X}) = \mu$. The variance of $\bar{X}$ is $\sigma^2/n$.

$$\text{As } n \to \infty, \quad \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} \to 0.$$

So this estimator is consistent.
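A simulation sketch of consistency for a normal population with $\mu = 5$ and $\sigma = 2$ (illustrative values): as $n$ grows, $\operatorname{Var}(\bar{X}) = \sigma^2/n$ shrinks and the share of samples whose mean misses $\mu$ by more than 0.2 drops toward 0.

```python
import random
import statistics

rng = random.Random(1)                     # fixed seed so the run is repeatable
mu, sigma, eps, reps = 5.0, 2.0, 0.2, 1_000

miss_rate = {}
for n in (10, 100, 1000):
    var_xbar = sigma**2 / n                # Var(Xbar) = sigma^2 / n
    misses = sum(
        abs(statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) - mu) > eps
        for _ in range(reps)
    )
    miss_rate[n] = misses / reps
    print(f"n = {n:>4}: Var(Xbar) = {var_xbar:.4f}, "
          f"share with |Xbar - mu| > {eps}: {miss_rate[n]:.3f}")
```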

Section 3 – Confidence Interval
Definitions:

Each interval is constructed with regard to a given confidence level and is called a confidence interval. The confidence level associated with a confidence interval states how much confidence we have that this interval contains the true population parameter.

The confidence level is denoted by $(1 - \alpha)100\%$. When expressed as a probability, it is called the confidence coefficient and is denoted by $1 - \alpha$.


Although any value of the confidence level can be chosen to construct a confidence interval, the more common values are 90%, 95% and 99%. The corresponding confidence coefficients are 0.90, 0.95 and 0.99.


Interval Estimation of a Population Mean: Known Variance

Recall that in the case of $\bar{X}$, the sample size is considered to be large when $n \ge 30$. According to the central limit theorem, for a large sample the sampling distribution of the sample mean $\bar{X}$ is (approximately) normal irrespective of the shape of the population from which the sample is drawn.

Therefore, when $n \ge 30$, use the normal distribution to construct a confidence interval for $\mu$.


Confidence Interval for Population Mean μ

The $(1 - \alpha)100\%$ confidence interval for $\mu$ is

$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

where $\bar{X}$ is the sample mean, $\sigma$ is the population standard deviation, $n$ is the sample size, and $Z_{\alpha/2}$ is read from the standard normal distribution table for the given confidence level.

Conditions: normal population with known variance, OR non-normal population and large sample with known variance.


Maximum Error of Estimate for μ

The maximum error of estimate for $\mu$, denoted by $E$, is the quantity that is subtracted from and added to the value of $\bar{X}$ to obtain a confidence interval for $\mu$.

Thus, given the $(1 - \alpha)100\%$ confidence interval,

$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
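A sketch of the interval and the maximum error as one small function; `z_confidence_interval` is a hypothetical name, and $Z_{\alpha/2}$ is obtained from `statistics.NormalDist().inv_cdf` instead of a printed table. The inputs in the call are made up purely to show usage.

```python
import math
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, level=0.95):
    """(1 - alpha)100% confidence interval for mu when sigma is known.

    Returns (lower, upper, E), where E = z_{alpha/2} * sigma / sqrt(n)
    is the maximum error of estimate.
    """
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}; about 1.96 for 95%
    E = z * sigma / math.sqrt(n)
    return x_bar - E, x_bar + E, E


# Hypothetical inputs, only to show the call: xbar = 100, sigma = 15, n = 25
lower, upper, E = z_confidence_interval(100, 15, 25)
print(f"({lower:.2f}, {upper:.2f}), E = {E:.2f}")
```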


Example:

A publishing company has just published a new college textbook. Before the company decides the price at which to sell this textbook, it wants to know the average price of all such textbooks in the market.

The research department at the company took a sample of 36 such textbooks and collected information on their prices. This information produced a mean price of $48.40 for this sample. It is known that the standard deviation of the prices of all such textbooks is $4.50.


Assume that the prices of all such textbooks are normally distributed.

(a) What is the point estimate of the mean price of all such college textbooks?

(b) Construct a 95% confidence interval for the mean price of all such college textbooks.


Solution:

From the given information, $n = 36$, $\bar{X} = 48.40$, $\sigma = 4.50$.

(a) What is the point estimate of the mean price of all such college textbooks?

The point estimate of the mean price of all such college textbooks is $48.40, that is,

Point estimate of μ = X̄ = $48.40


(b) Construct a 95% confidence interval for the mean price of all such college textbooks.

The confidence level is 95%, or 0.95, so $\alpha = 0.05$ and $Z_{\alpha/2} = Z_{0.025} = 1.96$.

The 95% confidence interval for $\mu$ is

$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 48.40 \pm 1.96\left(\frac{4.5}{\sqrt{36}}\right) = 48.40 \pm 1.47 = (46.93, 49.87)$$

Thus, we are 95% confident that the mean price of all such college textbooks is between $46.93 and $49.87.


Note: We cannot say for sure whether the interval $46.93
to $49.87 contains the true population mean or not.

Since $\mu$ is a constant, we cannot say that the probability is
0.95 that this interval contains $\mu$, because either it contains
$\mu$ or it does not. Consequently, the probability that this
interval contains $\mu$ is either 1 or 0.

All we can say is that we are 95% confident that the mean
price of all such college textbooks lies between $46.93 and
$49.87.


Interpretation of confidence interval:

How do we interpret a 95% confidence level? In the
previous example, if we take all possible samples of 36
such college textbooks each and construct a 95%
confidence interval for $\mu$ around each sample mean, we
can expect that 95% of these intervals will include $\mu$ and
5% will not.

Illustration: [Figure: a series of 95% confidence intervals, #1, #2, ..., #n, each running from $\bar{X}_i - K_i$ to $\bar{X}_i + K_i$ around its own sample mean $\bar{X}_i$.]
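The repeated-sampling interpretation can be checked by simulation. The sketch below assumes a hypothetical true mean $\mu = 48.0$ (in practice $\mu$ is unknown) and reuses $\sigma = 4.5$ and $n = 36$ from the textbook example. It draws many normal samples, builds a 95% interval around each sample mean, and reports the fraction of intervals that contain $\mu$. The function name `coverage`, the trial count, and the seed are illustrative choices.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu: float, sigma: float, n: int, conf: float = 0.95,
             trials: int = 10_000, seed: int = 1) -> float:
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    e = z * sigma / math.sqrt(n)  # same maximum error for every sample
    hits = 0
    for _ in range(trials):
        # one sample of size n from a normal population, then its mean
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - e <= mu <= xbar + e:
            hits += 1
    return hits / trials

# mu = 48.0 is a made-up "true" mean; sigma and n follow the textbook example
print(coverage(mu=48.0, sigma=4.5, n=36))
```

The printed fraction should land close to 0.95. Any single interval either contains $\mu$ or it does not.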


The Width of a Confidence Interval

The width of a confidence interval depends on the size of
the maximum error $Z\sigma_{\bar{X}}$, which depends on the values of
$Z$, $\sigma$, and $n$ because $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

However, the value of $\sigma$ is not within the control of the
investigator. Hence, the width of a confidence interval
depends on

(i) the value of $Z$, and
(ii) the sample size $n$.


(i) The value of $Z$, which depends on the confidence level

The value of $Z$ increases as the confidence level increases,
and it decreases as the confidence level decreases.
Therefore, the width of a confidence interval increases or
decreases with the confidence level.

(ii) The sample size $n$

For the same value of $\sigma$, an increase in $n$ decreases the
value of $\sigma_{\bar{X}}$, which in turn decreases the size of the
maximum error when the confidence level remains
unchanged. Therefore, an increase in the sample size
decreases the width of the confidence interval.
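Both effects can be seen together in a short Python sketch of the width formula $2Z_{\alpha/2}\sigma/\sqrt{n}$. The sketch assumes Python 3.8+ for `statistics.NormalDist`. The helper name `ci_width` and the grid of confidence levels and sample sizes are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

def ci_width(sigma: float, n: int, conf: float) -> float:
    """Width 2 * Z_{alpha/2} * sigma / sqrt(n) of a z-interval for mu."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 2 * z * sigma / math.sqrt(n)

sigma = 4.5  # population standard deviation from the textbook example
for conf in (0.90, 0.95, 0.99):
    # each row: one confidence level, widths for n = 36, 160, 400
    widths = [ci_width(sigma, n, conf) for n in (36, 160, 400)]
    print(conf, [round(w, 2) for w in widths])
```

In the printed table, widths grow as you move down the rows (higher confidence) and shrink as you move across each row (larger $n$).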


Thus, if we want to decrease the width of a confidence
interval, we have two choices:

- Lower the confidence level: not a good choice, because
  a lower confidence level may give less reliable results.
- Increase the sample size: the preferred way to decrease the
  width of a confidence interval.
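The sample size affects the width only through $\sqrt{n}$, so each increase in $n$ buys less reduction than the one before. If the confidence level and $\sigma$ are held fixed, quadrupling the sample size halves the width:

$$\text{width}(4n) = 2Z_{\alpha/2}\frac{\sigma}{\sqrt{4n}} = \frac{1}{2}\cdot 2Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = \frac{1}{2}\,\text{width}(n)$$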


Example (revisit):

Return to the textbook example: a sample of 36 such textbooks
produced a mean price of $48.40, and the standard deviation of
the prices of all such textbooks is known to be $4.50.

Assume that the prices of all such textbooks are normally
distributed. Construct a 90% confidence interval for the
mean price of all such college textbooks.

Solution:

$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 48.40 \pm 1.65\cdot\frac{4.5}{\sqrt{36}} = 48.40 \pm 1.24 = (47.16,\ 49.64)$$

Comparing this to the 95% confidence interval obtained
previously, (46.93, 49.87), we see that the 95% confidence
interval is wider than the 90% confidence interval.
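The two widths make the difference concrete. Both intervals use the same $\sigma$ and $n$, so the ratio of their widths equals the ratio of the $Z$ values:

$$W_{95\%} = 2(1.96)\frac{4.5}{\sqrt{36}} = 2.94, \qquad W_{90\%} = 2(1.65)\frac{4.5}{\sqrt{36}} \approx 2.48, \qquad \frac{W_{95\%}}{W_{90\%}} = \frac{1.96}{1.65} \approx 1.19$$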



Example (revisit):
Consider the previous example again. Now suppose the
information given in that example is based on a sample
size of 160, with all other information unchanged.
Construct a 95% confidence interval for $\mu$.

Solution:

$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 48.40 \pm 1.96\cdot\frac{4.5}{\sqrt{160}} = 48.40 \pm 0.70 = (47.70,\ 49.10)$$

Comparing this to the 95% confidence interval obtained
previously, (46.93, 49.87), we see that the 95% confidence
interval for $n = 160$ is narrower than the one for $n = 36$.
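With the confidence level fixed, the width scales as $1/\sqrt{n}$. Moving from $n = 36$ to $n = 160$ therefore shrinks the width by a factor of $\sqrt{160/36} \approx 2.11$:

$$W_{n=36} = 2(1.96)\frac{4.5}{\sqrt{36}} = 2.94, \qquad W_{n=160} = 2(1.96)\frac{4.5}{\sqrt{160}} \approx 1.39$$

The rounded endpoints 47.70 and 49.10 give a width of 1.40.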


Interval Estimation of a Population Mean:
Unknown Variances

If the sample size is small, the normal distribution can
still be used to construct a confidence interval for $\mu$ if

1. the population from which the sample is drawn is
normally distributed, and
2. the population standard deviation $\sigma$ is known.



The $t$ distribution is used to construct a confidence
interval for $\mu$ if

1. the population from which the sample is selected is
(approximately) normally distributed, and
2. the population standard deviation $\sigma$ is not known.
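The $t$ distribution appears because replacing $\sigma$ with the sample standard deviation $s$ changes the distribution of the standardized sample mean, which is then no longer standard normal. For a random sample of size $n$ from a normal population,

$$T = \frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1},$$

that is, $T$ follows a $t$ distribution with $n - 1$ degrees of freedom. This distribution has heavier tails than the standard normal and approaches it as $n$ grows.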


Verify:

a. $t_{4,0.05} = 2.132$ and $t_{4,0.95} = -2.132$
b. $t_{6,0.005} = 3.707$ and $t_{6,0.995} = -3.707$
c. $P(T \geq t) = 0.10$ with $\nu = 22 \Rightarrow t = t_{22,0.1} = 1.321$
d. $P(T \leq t) = 0.05$ with $\nu = 16 \Rightarrow t = t_{16,0.95} = -1.746$
e. Given that $T \sim t_5$, $P(T \leq 3.365) = 0.99$
f. Given that $T \sim t_8$, $P(-2.306 \leq T \leq 2.306) = 0.95$
g. Given that $T \sim t_{26}$, $P(T \leq 3.435) = 0.999$
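The table values can also be checked numerically without a $t$ table. The following is only an illustrative sketch and uses nothing beyond the Python standard library (no SciPy). It evaluates the $t$ density with `math.lgamma` and integrates it with Simpson's rule. The helper names `t_pdf` and `t_cdf` are hypothetical. As in items (a) to (d), $t_{\nu,\alpha}$ denotes the point with upper-tail area $\alpha$.

```python
import math

def t_pdf(x: float, nu: int) -> float:
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_cdf(x: float, nu: int, steps: int = 2000) -> float:
    """P(T <= x): symmetry about 0 plus Simpson's rule on [0, |x|]."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

checks = {
    "a: P(T > 2.132), nu=4": 1 - t_cdf(2.132, 4),                        # ~0.05
    "a: P(T > -2.132), nu=4": 1 - t_cdf(-2.132, 4),                      # ~0.95
    "b: P(T > 3.707), nu=6": 1 - t_cdf(3.707, 6),                        # ~0.005
    "c: P(T >= 1.321), nu=22": 1 - t_cdf(1.321, 22),                     # ~0.10
    "d: P(T <= -1.746), nu=16": t_cdf(-1.746, 16),                       # ~0.05
    "e: P(T <= 3.365), nu=5": t_cdf(3.365, 5),                           # ~0.99
    "f: P(-2.306 <= T <= 2.306), nu=8": t_cdf(2.306, 8) - t_cdf(-2.306, 8),  # ~0.95
    "g: P(T <= 3.435), nu=26": t_cdf(3.435, 26),                         # ~0.999
}
for label, p in checks.items():
    print(f"{label}: {p:.4f}")
```

Small differences in the fourth decimal place are expected, because table entries are rounded to three decimals.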



Confidence Interval for population mean μ
using t distribution

The $(1-\alpha)100\%$ confidence interval for $\mu$ is

$$\bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$

where
$\bar{X}$ is the sample mean; $s$ is the sample standard deviation; $n$ is the
sample size; and $t_{\alpha/2,\,n-1}$ is obtained from the $t$ distribution
table for $n - 1$ d.f. and the $(1-\alpha)100\%$ confidence level.

Conditions: Population is approximately normally distributed;
$\sigma$ is not known.
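The $t$-interval formula can be sketched in Python as well. The standard library has no $t$ quantile function, so this sketch takes $t_{\alpha/2,\,n-1}$ as an argument read from the $t$ table. SciPy's `scipy.stats.t.ppf` could supply the value if SciPy is available. The function name `t_interval` is hypothetical.

```python
import math
from typing import Tuple

def t_interval(xbar: float, s: float, n: int, t_crit: float) -> Tuple[float, float]:
    """CI for mu with sigma unknown: xbar +/- t_{alpha/2, n-1} * s / sqrt(n).

    t_crit must be read from a t table for n - 1 degrees of freedom.
    """
    e = t_crit * s / math.sqrt(n)  # maximum error of estimate
    return xbar - e, xbar + e

# Cholesterol example in this section: n = 25, xbar = 186, s = 12,
# and t_{0.025, 24} = 2.064 from the t table
lo, hi = t_interval(186, 12, 25, 2.064)
print(f"95% CI: ({lo:.4f}, {hi:.4f})")  # 95% CI: (181.0464, 190.9536)
```

Passing the critical value explicitly keeps the table lookup visible, which mirrors the hand calculation.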


Example:

Dr. Moore wanted to estimate the mean cholesterol level
for all adult males living in London. He took a sample of
25 adult males from London and found that the mean
cholesterol level for this sample is 186 with a standard
deviation of 12.

Assume that the cholesterol levels for all adult males in
London are (approximately) normally distributed.
Construct a 95% confidence interval for the population
mean.


Solution:

From the given information,

$$n = 25,\quad \bar{X} = 186,\quad s = 12$$

The confidence level is 95%, or 0.95, so $\alpha = 0.05$.

Degrees of freedom: $df = 25 - 1 = 24$

Area in each tail: $0.05/2 = 0.025$

From the $t$ distribution table, the value for $t$ is $t_{0.025,24} = 2.064$.


The 95% confidence interval for $\mu$ is

$$\bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} = 186 \pm 2.064\cdot\frac{12}{\sqrt{25}} = 186 \pm 4.9536 = (181.0464,\ 190.9536)$$

Thus, we can state with 95% confidence that the mean
cholesterol level for all adult males living in London lies
between 181.05 and 190.95.

Note that $\bar{X} = 186$ is a point estimate of $\mu$ in this example.

AMA 1006
Lecture Notes

~ END ~
