Estimation Theory and Hypothesis Testing
Statistical Inference
Statistical inference is the process of
arriving at conclusions concerning the
parameters of a population on the basis of
information contained in a sample selected
from that population.
Statistical inference deals with two kinds of
problems:
1. Estimation
2. Hypothesis testing
Estimation
Estimation is the process of providing the ‘best possible’
evaluation of unknown population parameters
on the basis of a finite number of observations
(i.e. a random sample obtained from the population).
The population parameters
are estimated with the help of sample statistics.
There are two methods of estimation:
1. Point estimation
2. Interval estimation
Point estimation
A point estimate is a single number that
is used to estimate an unknown population
parameter. The process of finding a
point estimate is known as point
estimation.
Interval Estimation
Interval estimation is the process of making the statement
$P[b_1 \le \theta \le b_2] = \delta$
i.e. the interval between the two values b1 and b2
(computed from the sample values) contains the true
value of the parameter θ with a given probability δ,
where 0 < δ < 1.
Standard Error of Proportion
$\mathrm{S.E.} = \sqrt{\dfrac{PQ}{n}}$
where P is the probability and Q = 1 - P
n = sample size
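The standard error of a proportion can be evaluated directly from P and n. The following is a minimal Python sketch; the function name `se_proportion` and the values P = 0.5, n = 100 are illustrative assumptions and do not come from the slides.

```python
import math


def se_proportion(p: float, n: int) -> float:
    """Standard error of a proportion: sqrt(P * Q / n) with Q = 1 - P."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    q = 1.0 - p
    return math.sqrt(p * q / n)


# Illustrative values only (not from the slides): P = 0.5, n = 100.
se = se_proportion(0.5, 100)
print(f"S.E. = {se:.4f}")
```

Because n sits under the square root, quadrupling the sample size only halves the standard error.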
Hypothesis Testing
Any assumption regarding the form of a
probability distribution or the values of the unknown
parameters of that distribution is called a
statistical hypothesis.
Testing a statistical hypothesis on the basis of a
sample is known as hypothesis testing.
Formulation of hypothesis
The hypothesis which is tested for possible rejection under the
assumption that it is true is known as the null hypothesis and is
denoted H0.
The null hypothesis is the basis for deducing the theoretical
distribution of either the population or the statistic.
The hypothesis which differs from a given null hypothesis, and
is accepted (or rejected) when H0 is rejected (or accepted), is
called the alternative hypothesis and is denoted H1.
CENTRAL LIMIT THEOREM
• specifies a theoretical distribution
• formulated by the selection of all possible
random samples of a fixed size n
• a sample mean is calculated for each
sample and the distribution of sample
means is considered
SAMPLING DISTRIBUTION OF THE MEAN
• The mean of the sample means is equal to
the mean of the population from which the
samples were drawn.
• The standard deviation of the distribution is σ divided
by the square root of n (the standard
error); its variance is σ²/n.
STANDARD ERROR
Standard Deviation of the Sampling Distribution
of Means
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
How Large is Large?
• If the sampled population is normal, then the sampling
distribution of x̄ will also be normal, no
matter what the sample size.
• When the sampled population is approximately
symmetric, the sampling distribution of x̄ becomes
approximately normal for relatively small values of
n.
• When the sampled population is skewed, the sample
size must be at least 30 before the sampling
distribution of x̄ becomes approximately normal.
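The behaviour of the sampling distribution of the mean can be checked by simulation. The sketch below is only an illustration under stated assumptions: the skewed population is taken to be exponential with mean 1 and standard deviation 1, and it draws 5000 random samples of size n = 30 rather than all possible samples. Neither choice comes from the slides.

```python
import random
import statistics

rng = random.Random(0)       # fixed seed so the run is repeatable
n = 30                       # sample size
num_samples = 5000           # number of random samples drawn

# Assumed skewed population: exponential with rate 1 (mean 1, std. dev. 1).
pop_mean, pop_sd = 1.0, 1.0

# One sample mean per random sample of size n.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_se = pop_sd / n ** 0.5   # sigma / sqrt(n)

print(f"mean of sample means:      {mean_of_means:.3f} (population mean {pop_mean})")
print(f"std. dev. of sample means: {sd_of_means:.3f} (sigma/sqrt(n) = {expected_se:.3f})")
```

The mean of the sample means should land close to the population mean, and their standard deviation close to σ/√n, even though the population itself is far from normal.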
EXAMPLE
A certain brand of tires has a mean life of 25,000
miles with a standard deviation of 1600 miles.
What is the probability that the mean life of 64
tires is less than 24,600 miles?
Example continued
The sampling distribution of the means has a
mean of 25,000 miles (the population mean)
μ = 25,000 mi.
and a standard deviation (i.e. standard error)
of:
σ/√n = 1600/√64 = 1600/8 = 200 mi.
Example continued
Convert 24,600 mi. to a z-score and use the normal
table to determine the required probability.
z = (24600-25000)/200 = -2
P(z< -2) = 0.0228
or 2.28% of the sample means will be less than
24,600 mi.
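The tire example can be reproduced without a printed normal table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

mu = 25_000        # population mean life (miles)
sigma = 1_600      # population standard deviation (miles)
n = 64             # number of tires in the sample

standard_error = sigma / n ** 0.5          # sigma / sqrt(n)
z = (24_600 - mu) / standard_error         # z-score of the sample mean
prob = NormalDist().cdf(z)                 # P(Z < z) for a standard normal

print(f"standard error = {standard_error:.0f} mi")
print(f"z = {z:.2f}")
print(f"P(sample mean < 24,600) = {prob:.4f}")
```

The computed probability agrees with the table value 0.0228 to four decimal places.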
CONFIDENCE INTERVAL ESTIMATES for
LARGE SAMPLES
• The sample has been randomly selected
• The population standard deviation is
known or the sample size is at least 25.
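Confidence Interval Estimate of the
Population Mean
$\bar{X} - z\,\dfrac{s}{\sqrt{n}} < \mu < \bar{X} + z\,\dfrac{s}{\sqrt{n}}$
X̄: sample mean
s: sample standard deviation
n: sample size
z: standard normal critical value for the chosen confidence level (1.96 for 95%)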
EXAMPLE
Estimate, with 95% confidence, the lifetime
of nine volt batteries using a randomly
selected sample where:
X̄ = 49 hours
s = 4 hours
n = 36
EXAMPLE continued
Lower Limit: 49 - (1.96)(4/6)
49 - (1.3) = 47.7 hrs
Upper Limit: 49 + (1.96)(4/6)
49 + (1.3) = 50.3 hrs
We are 95% confident that the mean lifetime of
the population of batteries is between 47.7 and
50.3 hours.
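The same interval can be computed in a few lines. This is a sketch under the large-sample assumptions stated earlier; it obtains the critical value z from `statistics.NormalDist.inv_cdf` instead of a table, and the variable names are illustrative.

```python
from statistics import NormalDist

x_bar = 49.0       # sample mean (hours)
s = 4.0            # sample standard deviation (hours)
n = 36             # sample size
confidence = 0.95

# Two-sided critical value: about 1.96 for 95% confidence.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * s / n ** 0.5

lower = x_bar - margin
upper = x_bar + margin
print(f"z = {z:.2f}, margin of error = {margin:.2f} hours")
print(f"95% confidence interval: ({lower:.1f}, {upper:.1f}) hours")
```

Changing `confidence` to 0.99 widens the interval, since the critical value rises to about 2.58.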
t-test
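$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
If the calculated value of t (taken in absolute value for a
two-tailed test) is less than or equal to the tabulated value
for n - 1 degrees of freedom, the null hypothesis is accepted.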
Exercise
The average breaking strength of steel rods is specified
to be 18.5 thousand kg. To check this, a sample of 14 rods
was tested. The mean and standard deviation were 17.85
and 1.9555 respectively. Test the significance of the
deviation.
Exercise
An automobile tyre manufacturer claims that the average life
of a particular grade of tyre is more than 20,000 km when
used under normal conditions. A random sample of 16 tyres
was tested, and a mean and standard deviation of about
22,000 km and 5,000 km respectively were computed. Check
the claim.
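One possible way to work the two exercises is sketched below. It applies the t formula exactly as given on the t-test slide (s divided by √n). The 5% significance level, the choice of a two-tailed test for the rods and a one-tailed test for the tyres, and the function name `t_statistic` are assumptions made for illustration; the critical values are taken from a standard t-table.

```python
import math


def t_statistic(x_bar: float, mu0: float, s: float, n: int) -> float:
    """t = (x_bar - mu0) / (s / sqrt(n)), following the formula in the slides."""
    return (x_bar - mu0) / (s / math.sqrt(n))


# Steel rods: H0: mu = 18.5, two-tailed test, 14 - 1 = 13 degrees of freedom.
t_rods = t_statistic(17.85, 18.5, 1.9555, 14)
T_CRIT_RODS = 2.160    # t-table value, 5% two-tailed, 13 d.f. (assumed level)
rods_significant = abs(t_rods) > T_CRIT_RODS

# Tyres: H0: mu = 20000 against H1: mu > 20000, one-tailed, 15 d.f.
t_tyres = t_statistic(22_000, 20_000, 5_000, 16)
T_CRIT_TYRES = 1.753   # t-table value, 5% one-tailed, 15 d.f. (assumed level)
claim_supported = t_tyres > T_CRIT_TYRES

print(f"rods:  t = {t_rods:.3f}, significant at 5%: {rods_significant}")
print(f"tyres: t = {t_tyres:.3f}, claim supported at 5%: {claim_supported}")
```

Under these assumptions neither calculated value exceeds its tabulated value, so the null hypothesis is accepted in both cases. A different significance level would need a different table value.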
Fisher’s Z-Test
Prof. Ronald Fisher developed a technique for testing the
significance of the correlation coefficient in small samples.
The Z-test is applied to the following questions:
1. Is the difference between the observed value of r and
some hypothetical (population) value of r
significant?
2. Is the difference between the correlation coefficients
of two samples significant?
Fisher’s Z-Test (contd…)
$Z_s = \dfrac{1}{2}\log_e\dfrac{1+r}{1-r} = 1.1513\,\log_{10}\dfrac{1+r}{1-r}$
Zs is the observed value, computed from the sample correlation r.
Fisher’s Z-Test (contd…)
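$Z_\rho = \dfrac{1}{2}\log_e\dfrac{1+\rho}{1-\rho} = 1.1513\,\log_{10}\dfrac{1+\rho}{1-\rho}$
Zρ is the hypothetical value, computed from the hypothesized population correlation ρ.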
Standard Error
$\mathrm{S.E.} = \dfrac{1}{\sqrt{n-3}}$
Determination of Significant Ratio
$\text{Significant Ratio (SR)} = \dfrac{|Z_s - Z_\rho|}{\mathrm{S.E.}}$
Interpretation:
If SR > 2.58, the difference is considered significant
at the 1% level of significance.
If SR > 1.96, the difference is considered significant
at the 5% level of significance.
If SR > 3, the difference is considered definitely
significant.
Test of Significance between two samples
$\mathrm{S.E.} = \sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}$
Exercise
A correlation coefficient of 0.72 is obtained from a sample of
29 paired observations. Is the coefficient significantly
different from 0.8?
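The exercise above can be worked with a short script. This is a sketch, not a definitive solution: the function names `fisher_z` and `significant_ratio` are illustrative, the ratio is taken in absolute value, and the 5% threshold of 1.96 is the one listed under Interpretation.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher transformation: 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def significant_ratio(r: float, rho: float, n: int) -> float:
    """|Zs - Zrho| divided by the standard error 1 / sqrt(n - 3)."""
    se = 1 / math.sqrt(n - 3)
    return abs(fisher_z(r) - fisher_z(rho)) / se


# Exercise values: r = 0.72 from n = 29 pairs, hypothetical value 0.8.
sr = significant_ratio(0.72, 0.8, 29)
significant_at_5pct = sr > 1.96

print(f"Zs = {fisher_z(0.72):.4f}, Zrho = {fisher_z(0.8):.4f}")
print(f"significant ratio = {sr:.3f}, significant at 5%: {significant_at_5pct}")
```

The ratio comes out below 1.96, so on this calculation the observed coefficient does not differ significantly from 0.8 at the 5% level.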
Exercise
Two groups of students are given an intelligence test
(x) and an arithmetic test (y), and the following results are
obtained: n1 = 45, n2 = 39, r1 = 0.45, r2 = 0.38. Is the
difference between the two values of r significant?
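A sketch of the two-sample comparison follows. The slides give only the standard error for two samples, so the ratio |Z1 - Z2| / S.E. used here is an assumption made by analogy with the one-sample significant ratio; the function names are likewise illustrative.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher transformation: 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def two_sample_ratio(r1: float, n1: int, r2: float, n2: int) -> float:
    """|Z1 - Z2| divided by sqrt(1/(n1 - 3) + 1/(n2 - 3))."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return abs(fisher_z(r1) - fisher_z(r2)) / se


# Exercise values: n1 = 45, r1 = 0.45 and n2 = 39, r2 = 0.38.
sr = two_sample_ratio(0.45, 45, 0.38, 39)
significant_at_5pct = sr > 1.96

print(f"significant ratio = {sr:.3f}, significant at 5%: {significant_at_5pct}")
```

Under this assumed ratio the value is well below 1.96, which points to no significant difference between the two correlation coefficients at the 5% level.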