Chapter 4

The document discusses the estimation of a normal mean using sample data, focusing on the properties of the sample mean as an estimator for the population mean. It covers the derivation of confidence intervals for the mean when the population standard deviation is known and when it is unknown, introducing the t-distribution for the latter case. The document emphasizes the importance of sample size in improving the accuracy of estimates and confidence intervals.


4. Estimates and Confidence Intervals

4.1 Estimating a Normal Mean
4.2 The Distribution of the Normal Sample Mean
4.3 Normal Data, Confidence Interval for µ, σ Known
4.4 Normal Data, Confidence Interval for µ, σ Unknown (the t-distribution)
4.5 The Central Limit Theorem and a General Approximate Confidence Interval for µ
4.6 Summary
4.7 Prediction with I.I.D. Normal Data
4.1 Estimating a Normal Mean

Problem: assume you have a set of data points (observations), i.e., a sample. You believe the observations are independent realizations of a normal random variable. You want to estimate the mean of the data-generating distribution (the mean of the normal). How do you do it? Why do you care?

Case: suppose you run a plant which fills cereal boxes. You need to know “how much cereal you are putting in the boxes,” at least “on average.”
Here are the observed weights for 500 boxes:

[Time-series plot: weights (y-axis, about 300 to 380) against observation # (x-axis, 0 to 500)]

It looks like the data could be i.i.d. normal with µ at about 345 and σ at about 15. Note: we are guessing (estimating, somehow) µ and σ based on what we see (the sample).
With 500 observations, our guess at µ (the true mean of the normal process that generates the cereal boxes) is “probably” pretty good.

But what if you had fewer observations? Suppose you only had the first 10!

[Plot: the first 10 weights (y-axis, about 340 to 370) with the sample average drawn as a horizontal line]

How would you guess µ? You would still use the sample average (the line), but you would trust it less.
Since µ = E(X) ≈ (1/n) ∑_{i=1}^{n} Xᵢ (for large n),
a reasonable guess or estimate for µ would be the average of the observed sample.

But if n is not “large,” we worry that the sample mean could be different from the true long-run mean.

Note: what do we want to know? “How much cereal goes in the boxes on average.” µ is the answer to that question (not the sample mean, but we use the sample mean as an estimate for it).
• The average of the 500 observations is 344.83
• Based on that information, we would estimate
µ to be 344.83 (about 345)
• The average of the 10 observations is 348.5
• If that is all the data we have, 348.5 would be our
estimate
• Which estimate would you expect to be “better”?



Given a sample of observations of size n that looks i.i.d. normal, the sample mean

X̄ = (1/n) ∑_{i=1}^{n} Xᵢ

is our estimate of

µ = E(Xᵢ)

µ is sometimes called the population mean, since it is the mean of the entire population of all the potential values, whereas the sample mean is just the average of some of them.
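
For instance, in MatLab (a minimal sketch; it assumes the observations sit in a vector called weights, as in the examples below):

>> xbar = sum(weights)/length(weights)   % the sample mean formula above
>> mean(weights)                         % built-in equivalent; same value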
4.2 The Distribution of the Normal Sample Mean

(Interpreting the sample mean as an estimator for µ)


Suppose someone is about to get n observations (they do not know what the observations will be). They plan on estimating µ with the sample average (they do not know what the sample average will be either):

X̄ = (1/n) ∑_{i=1}^{n} Xᵢ

What are their chances of getting the right value?
Key idea:

Before we get the sample, each Xᵢ is random. So, UP FRONT, we think of the sample as a set of random variables. Since every Xᵢ is random (up front), the sample mean (which is a linear combination of random variables)

X̄ = (1/n) ∑_{i=1}^{n} Xᵢ

is a random variable too. After we obtain the sample (EX POST), each Xᵢ is a number. So is the sample mean.
Important:
...after having said that the sample mean is a random variable up front (before we get the sample), we might want to assess the statistical properties of this RV, since we are using it to estimate something, namely the theoretical mean.
If the sample mean has “decent” statistical properties (in a sense to be defined later), then, when we actually compute its value (after we get the sample), we can be confident that we are doing what we believe we are doing, that is, estimating the theoretical mean.

OK? Do not worry, I will return to this idea over and over again!



So, what are the statistical properties of the sample
mean as an estimator of the theoretical mean?

Let us derive its expected value and variance.


By now we know how to do this.

Expected value:

E(X̄) = E( (1/n) ∑_{i=1}^{n} Xᵢ ) = (1/n) ∑_{i=1}^{n} E(Xᵢ) = (1/n) ∑_{i=1}^{n} µ = (1/n) nµ = µ

The estimator is unbiased: on average we get exactly what we are after, namely µ. We expect X̄ to give us (on average) the right answer when we plug in the data to estimate µ.
Variance: let Var(Xᵢ) = σ². Then

Var(X̄) = Var( (1/n) ∑_{i=1}^{n} Xᵢ )
        = Var( (1/n)X₁ + (1/n)X₂ + ⋯ + (1/n)Xₙ )
        = (1/n²)Var(X₁) + (1/n²)Var(X₂) + ⋯ + (1/n²)Var(Xₙ)
        = (1/n²) ∑_{i=1}^{n} σ² = (1/n²) nσ² = σ²/n

The variance is inversely related to the number of observations in the sample. As we increase n we capture µ better, because the estimator is less “dispersed” around its expected value, which is µ.
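
A quick simulation illustrates the σ²/n result (a sketch with the made-up cereal values µ = 345 and σ = 15):

>> mu = 345; sigma = 15;
>> xbars10 = mean(mu + sigma*randn(10, 10000));    % 10,000 sample means, each from n = 10 draws
>> xbars500 = mean(mu + sigma*randn(500, 10000));  % 10,000 sample means, each from n = 500 draws
>> var(xbars10)    % close to sigma^2/10  = 22.5
>> var(xbars500)   % close to sigma^2/500 = 0.45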
To summarize:

• As an estimator for µ (up front, before plugging in the observations), X̄ has nice statistical properties.

• Its expected value is µ (the value we are after). Its variance (its expected ‘dispersion’ around µ) decreases as the number of observations increases.

• In particular, if n were very, very large, then we would be sure to capture µ accurately.

• In other words, the larger the sample, the better!

• Makes sense, right?



We can do more than just state what the expected value and variance of the estimator are. If the observations are draws from a normal (likely in the case of the cereal boxes), then, being an average (a linear combination) of independent normals, the sample mean is normally distributed by the properties of normal RVs (cf. the last result in the previous chapter).

Let X₁, X₂, …, Xₙ ~ N(µ, σ²) i.i.d. Then

X̄ ~ N(µ, σ²/n)   (this is the sampling distribution of the estimator)

Hence, there is a 95% chance that the sample average is in the interval:

µ ± 2σ/√n
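
We can check this claim by simulation (a sketch, again with the made-up values µ = 345, σ = 15, and n = 10):

>> mu = 345; sigma = 15; n = 10;
>> xbars = mean(mu + sigma*randn(n, 10000));   % 10,000 simulated sample means
>> mean(abs(xbars - mu) <= 2*sigma/sqrt(n))    % fraction inside mu +/- 2*sigma/sqrt(n); about .95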
4.3 Normal Data, Confidence Interval for µ, σ Known

• We now know what kind of sample average we can expect to get, given the parameters (µ and σ).

• What we really want to know is what we think the parameters are, given the data (we focus on µ here, but the intuition for σ is the same).

• We can use what we have learnt to develop a confidence interval for µ.



• At first, we will assume that we know σ and are trying to estimate µ.

• In the next section we will relax this unrealistic assumption.

• We are also assuming that the data are i.i.d. normal.



The 95% Confidence Interval

First, let us add a bit of notation. Let

σ_X̄ = σ/√n

This will simplify the look of the formulae and emphasize that the sample mean has its own standard deviation (we derived the variance earlier, right? The standard deviation is just the square root).


Now, we standardize:

X̄ ~ N(µ, σ_X̄²)  ⇒  (X̄ − µ)/σ_X̄ ~ N(0,1)

So,

Pr( −2 ≤ (X̄ − µ)/σ_X̄ ≤ 2 ) = .95

(really the 2 is 1.96!)
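
In MatLab (assuming the Statistics Toolbox), we can verify the cut-off:

>> normcdf(2) - normcdf(-2)   % 0.9545: the "2" is a convenient round number
>> norminv(0.975)             % 1.9600: the exact .025 cut-off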
So,

Pr( −2σ_X̄ ≤ X̄ − µ ≤ 2σ_X̄ ) = .95

Pr( −X̄ − 2σ_X̄ ≤ −µ ≤ −X̄ + 2σ_X̄ ) = .95

Pr( X̄ − 2σ_X̄ ≤ µ ≤ X̄ + 2σ_X̄ ) = .95

Pr( X̄ − 2σ/√n ≤ µ ≤ X̄ + 2σ/√n ) = .95

For i.i.d. normal data with known standard deviation σ, a 95% confidence interval for the true mean µ is

( X̄ − 2σ_X̄ , X̄ + 2σ_X̄ ) = X̄ ± 2σ_X̄ = X̄ ± 2σ/√n

Interpretation: 95% of the time, the interval will contain the true value of µ.
Example: let us go back to our weight data. Given 500 observations, what do we know about µ? Assume that σ = 15. Then

σ_X̄ = σ/√n = 15/√500 = .67

The sample average with 500 observations was 344.83. Hence, the 95% confidence interval is (343.49, 346.17):

>> k1 = 15/sqrt(500)
ans = 0.670820
>> k1 = 344.83 - 2*.67
ans = 343.490
>> k2 = 344.83 + 2*.67
ans = 346.170
Example: given just 10 observations, what do we know about µ? Assume that σ = 15. Then

σ_X̄ = σ/√n = 15/√10 = 4.74

The sample average with 10 observations was 348.5. Hence, the 95% confidence interval is (339.02, 357.98):

>> k1 = 15/sqrt(10)
ans = 4.74342
>> k1 = 348.5 - 2*4.74
ans = 339.020
>> k2 = 348.5 + 2*4.74
ans = 357.980
• Confidence intervals answer the basic questions: what do you think the parameter is, and how sure are you?

• Small interval: good, you know a lot.

• Big interval: bad, you do not know much.

• Question: what affects the size of the interval?


4.4 Normal Data, Confidence Interval for µ, σ Unknown

Now we will extend our confidence interval to the more realistic situation where σ is unknown. Typically, we don't know σ, so we have to estimate it as well.

Question: how do we estimate σ?

Answer: just as we think of the sample mean as an estimate of µ, we can now think of the sample standard deviation as an estimate of σ.
Estimating σ: consider σ² first. We use

s_x² = (1/(n−1)) ∑_{i=1}^{n} (xᵢ − x̄)²

to estimate it. We divide by n−1 so that the estimator is unbiased (that is, E(s_x²) = σ²).
The estimate of σ is

s_x = √[ (1/(n−1)) ∑_{i=1}^{n} (xᵢ − x̄)² ]
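In MatLab (a sketch, assuming a generic data vector x), note that std already divides by n−1:

>> sx = sqrt( sum((x - mean(x)).^2) / (length(x) - 1) )   % the formula above
>> std(x)                                                 % built-in; same value (divides by n-1 by default)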


Now our big idea is that in the formula, instead of using

σ_X̄ = σ/√n

we use an estimate of it, namely

se(X̄) = s_x/√n

This is called the standard error. Clearly, it is an estimate of the true standard deviation of the sample mean.
Then we might try (just replacing σ with its estimate)

x̄ ± 2 se(X̄)

as a 95% confidence interval for µ. We will see that this is approximately right for large n.

Important: it turns out that for i.i.d. normal data we can get an exact result. First, we need to learn about the t-distribution.
The t-distribution

• The t is just another continuous distribution.

• It has one parameter, called the degrees of freedom, usually denoted by the symbol ν.

• Each value of ν gives you a different distribution.

Recall: the normal distribution has 2 parameters, µ and σ. Each value of µ and σ gives you a different normal distribution.


When ν is bigger than about 30, the t is very much like the standard normal.

[Plot of three densities on (−4, 4): a t with 30 d.f. and a standard normal, nearly indistinguishable from each other, and a flatter t with ν = 3 d.f.]

All t's are centered at 0 and look like the standard normal: they have a bell shape. For smaller ν, the t puts more probability in the “tails” than the standard normal.
[Histogram of draws from the t(3)]

We say that the t(3) has “heavy tails” relative to the standard normal. It is like the standard normal in the middle, but sometimes you can get large values (negative or positive).
Important: for our normal mean problem we use ν = n−1. The number of degrees of freedom is equal to the number of observations minus one.

Now, let the number t_{n−1,.025} be such that

P( −t_{n−1,.025} < t_{n−1} < t_{n−1,.025} ) = .95

[Plot: density of a t random variable with n−1 degrees of freedom; .95 probability between −t_{n−1,.025} and t_{n−1,.025}, and .025 probability in each tail]
Interpretation: −t_{n−1,.025} and t_{n−1,.025} are the numbers such that we have 95% probability of being between them (.025 probability of being in the lower tail, .025 probability of being in the upper tail) for a t random variable with parameter (number of degrees of freedom) equal to n−1.
For n−1 greater than about 30, the t_{n−1} is so much like the standard normal that

t_{n−1,.025} ≈ 2

For smaller n, the t value gets bigger than 2. Here is a table of t values and n. We can see that for n > 30 (or even about 20) the t value is about 2. Clearly, −t would be the same value with opposite sign, due to the symmetry of the t distribution.

t_{n−1,.025}   n
4.303          3
2.228          11
2.086          21
2.042          31
2.00           61
Note: traditionally you get these t cut-off values from a table in the back of a statistics tome. We can also employ MatLab, with the command tinv (inverse cumulative distribution). We compute −t as the value such that the probability of getting values smaller than −t (the c.d.f. at −t) is .025. The value t is the same as −t with opposite sign.

>> tinv(0.025,10)
ans = -2.228138851986274

Interpretation: there is .025 probability of being less than −2.228 and .025 probability of being greater than 2.228 for the t distribution with 10 degrees of freedom.
Let us now compute confidence intervals. When σ was known, we started from

(X̄ − µ)/σ_X̄ ~ N(0,1)

to compute confidence intervals. Now our basic result is

(X̄ − µ)/se(X̄) ~ t_{n−1}

We use this result to compute confidence intervals just like before.

This is an exact result, not an approximation. Here is the intuition: for small n, the t distribution accounts for our estimation of σ with s_x. We are taking into consideration the fact that there is some estimation error; this is why the distribution has thick tails. As n grows large, the estimation error diminishes and the distribution of the ratio gets closer to the normal (which represents the extreme case of knowledge of σ: no estimation error).
Thus:

Pr( −t_{n−1,.025} ≤ (X̄ − µ)/se(X̄) ≤ t_{n−1,.025} ) = .95

Following the same steps that we used before for the case of σ known, we can rearrange this to obtain the interval

x̄ ± t_{n−1,.025} se(X̄)
An exact 95% confidence interval for µ with σ unknown is

x̄ ± t_{n−1,.025} se(X̄)

Of course, using the t cut-off instead of 2 will make the interval bigger for smaller n. This accounts for the fact that we are not sure that our estimate for σ is quite right.
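Putting the pieces together in MatLab (a sketch for a generic data vector x; tinv requires the Statistics Toolbox):

>> n = length(x);
>> se = std(x)/sqrt(n);                          % standard error of the sample mean
>> tcut = tinv(0.975, n-1);                      % equals -tinv(0.025, n-1) by symmetry
>> ci = [mean(x) - tcut*se, mean(x) + tcut*se]   % exact 95% CI for mu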
Example: back to our weight data. With n = 500, the sample standard deviation is 15.455 and the sample mean is 344.83. The relevant t distribution for us is the one with ν = 499, which is just like the standard normal. So the t-value is about 2.

Compute

se(X̄) = s_x/√n = 15.455/√500 = .69

The confidence interval is 344.83 +/- 1.4:

>> k1 = 344.83 - 1.4
ans = 343.430
>> k2 = 344.83 + 1.4
ans = 346.230

To plot a histogram of the weights:

>> hedges = (290 + (1:10:120))';
>> n = histc(weights, hedges);
>> bar(hedges, n, 5, 'm');

T Confidence Intervals
Variable   N    Mean     StDev   SE Mean  95.0 % CI
weights    500  344.828  15.455  0.691    (343.470, 346.186)

[Histogram of weights, 300 to 400, with the 95% t-confidence interval for the mean marked around X̄]
What if we just use the first 10 observations? The sample standard deviation is 14.6 and the sample mean was 348.5. The relevant distribution for us here is the t with 9 degrees of freedom. The t_{9,.025} value is 2.262.

Compute

se(X̄) = s_x/√n = 14.6/√10 = 4.6

The confidence interval becomes

348.5 +/- 2.262*4.6 = 348.5 +/- 10.4 = (338.1, 358.9)
Variable    N   Mean    StDev  SE Mean  95.0 % CI
weights1    10  348.51  14.60  4.62     (338.07, 358.96)

[Histogram of weights10, 330 to 380, with the 95% t-confidence interval for the mean marked around X̄]
Example: let us get a 95% confidence interval for the true mean of the Canadian returns (recall: se(X̄) = s_x/√n and the interval is x̄ ± t_{n−1,.025} se(X̄)).

>> mi = min(canada); ma = max(canada); dd = ma - mi;
>> hedges = (mi - 0.05 + (1:1:dd*100+10)/100)';
>> n = histc(canada, hedges);
>> bar(hedges, n, 1);
>> title 'canada';

Variable   N    Mean     StDev    SE Mean  95.0 % CI
canada     107  0.00907  0.03833  0.00371  (0.00172, 0.01641)

Is the confidence interval big?
4.5 The Central Limit Theorem and a General Approximate Confidence Interval for µ

Suppose we are willing to assume that the data are i.i.d., but we are not willing to assume that they are either normally distributed or Bernoulli. In particular, assume the Xᵢ's are i.i.d. with the same mean and variance, namely

E(Xᵢ) = µ and Var(Xᵢ) = σ²

Assume you do not know what their distribution is. You might still be interested in estimating µ using X̄!
In fact, without normality, we still have

E(X̄) = µ and Var(X̄) = σ²/n

Nonetheless, if the X's are not i.i.d. normal, we do not have

X̄ ~ N(µ, σ²/n)

precisely.
However, by the central limit theorem we can write the following approximation:

(X̄ − µ)/se(X̄) ≈ N(0,1)  ⇒  Pr( −2 < (X̄ − µ)/se(X̄) < 2 ) ≈ .95

Using standard methods, the approximation implies the result below.

Given i.i.d. observations Xᵢ, an approximate 95% confidence interval for µ = E(Xᵢ) is given by

x̄ ± 2 se(X̄)
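To see the theorem at work, here is a sketch that checks the coverage of x̄ ± 2 se(X̄) for skewed (exponential) data with a made-up known mean µ = 1 (no toolbox needed):

% Coverage check for the approximate CI with non-normal data
mu = 1; n = 50; reps = 10000; cover = 0;
for r = 1:reps
    x = -mu*log(rand(n,1));                        % i.i.d. exponential draws, E(X) = mu
    se = std(x)/sqrt(n);                           % standard error
    cover = cover + (abs(mean(x) - mu) <= 2*se);   % does the interval cover mu?
end
cover/reps                                         % close to .95 despite the skewed data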
Example: a random sample of customers of a company providing cell phone service; the variable is minutes the service is used over a given time period. Assume the data are i.i.d.

[Histogram of usage, 0 to 6000 minutes: heavily right-skewed]

The data do not look normal but, as pointed out earlier, the CI for the mean is still approximately valid.

Variable   N    Mean   StDev  SE Mean  95.0% CI
usage      206  643.5  954.1  66.5     (512.4, 774.5)
Example: daily volume of trades in the cattle pit. Is the data i.i.d. from some distribution? Assuming the data is i.i.d., we can use the Central Limit Theorem as before to compute an approximate confidence interval for the true daily mean volume.

[Time-series plot of Volume, 0 to 7000, over about 400 trading days, and a histogram of Volume]

Note: if the data is not i.i.d., similar methods apply.
4.6 Summary
In each case the data must be i.i.d. The problem and the data type determine the interval:

• Data are 1 or 0 (yes or no): estimate a proportion (p).
  Estimate: sample proportion p̂.
  Approximate CI: p̂ ± 2 se(p̂) = p̂ ± 2 √( p̂(1−p̂)/n )

• Data are real numbers (typically with units, e.g., grams): estimate the true mean (µ).
  Estimate: sample mean x̄.
  – Normal data, σ known. Exact CI: x̄ ± 2σ_x̄ = x̄ ± 2 σ/√n
  – Normal data, σ unknown. Exact CI: x̄ ± t_{n−1,.025} se(x̄) = x̄ ± t_{n−1,.025} s_x/√n
  – Data not normal. Approximate CI: x̄ ± 2 se(x̄) = x̄ ± 2 s_x/√n

If n > 30, we do not bother with the t-distribution.
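
For the proportion branch of the summary, a matching MatLab sketch (assuming a 0/1 data vector y):

>> phat = mean(y);                        % sample proportion
>> sep = sqrt(phat*(1-phat)/length(y));   % se of phat
>> ci = [phat - 2*sep, phat + 2*sep]      % approximate 95% CI for p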
4.7 Prediction with I.I.D. Normal Data
Prediction represents the bottom line of what we do, and is the reason why we care about µ (and σ).

Example: assume we have a bunch of returns. Assume we believe they are independent draws from an underlying normal random variable with a certain µ and a certain σ. How would you predict the next return (say, the return tomorrow)?

Well, if you knew µ and σ, you could make statements like: with 95% probability the next observation (the return tomorrow) will be between µ−2σ and µ+2σ (we know that this is true due to the properties of the normal, right?).

This is a powerful (and useful) statement for prediction (suppose you are deciding whether you want to invest in a certain asset based on its future performance).

But what if you do not know µ and σ (and this is typically the case)? We simply estimate them based on the available data. In the end we come up with statements like the one above, for intervals that are constructed using estimated quantities.

We use the past (the data) to understand (or predict) the future (returns tomorrow, for example).
In general: suppose the data are i.i.d. normal. The fundamental question is: how do you predict the next one?

If you knew µ and σ, your 95% predictive interval would be (µ − 2σ, µ + 2σ), simply using the properties of the normal.

[Plot: normal density f(x) centered at µ, with µ − 2σ and µ + 2σ marked]
The problem is that we do not know what the true parameters (i.e., µ and σ) are. We have to infer them from the data: we estimate them. So, our plug-in predictive interval could be

( x̄ − 2s_x , x̄ + 2s_x )

Now we have the main intuition of what we do. Let us make an innocuous step forward that does not change the main idea.


• We use parameter estimates instead of the true unknown parameter values.

• However, we know that there may be errors in the estimates when the sample is not sufficiently large.

• So, what do we do?

• We correct the sample estimates to account for imprecise inference due to small samples.



An exact 95% predictive interval for the next value is

x̄ ± t_{n−1,.025} se_p(X̄),  where  se_p(X̄) = s_x (1 + 1/n)^{1/2}

Notice what happens when n is large: the t distribution approximates the standard normal, and the t cut-off values get closer to the corresponding normal cut-off values (−2 and 2). In addition, the correction (1 + 1/n) converges to 1. In the end, when n is large (when the sample guarantees accurate estimates of µ and σ), we recover

( x̄ − 2s_x , x̄ + 2s_x )
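
In MatLab (a sketch for a generic i.i.d. normal data vector x; tinv requires the Statistics Toolbox):

>> n = length(x);
>> sep = std(x)*sqrt(1 + 1/n);                     % predictive standard error
>> pint = mean(x) + [-1, 1]*tinv(0.975, n-1)*sep   % exact 95% predictive interval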
[Two plots: the first 10 weights and all 500 weights, each showing the 95% predictive interval (pi) and the 95% confidence interval (ci) around the sample mean]

Note: the confidence interval is an interval for the true average weight µ. The predictive interval is an interval for the next individual weight; such a weight can be below or above µ.
The logic of what we do:

1) We take the data (the past) and analyze it using descriptive tools (Chapter 1 in the notes, but we'll see more sophisticated ways to analyze the data in the next chapters).

2) We come up with a model for the data (a model that we believe is generating the data). The i.i.d. normal model is just an example. We are simply saying: the observations are draws from a specific random variable (the normal random variable) and are independent. (You see now why we had to introduce the notion of a random variable and bother with its properties?)

3) Once we have a model for the data, we simply have to estimate the parameters of the model (examples: µ and σ in the normal case, p in the Bernoulli case) to do prediction (understand the future). Again, prediction is the bottom line of what we do. (Notice: without a model for the data we cannot do prediction. For example, our interval (µ−2σ, µ+2σ) is based on the normal model.)
