Basic Concepts of Testing
Presidency University
February, 2025
Problem of Testing of Hypothesis
I In this course, we shall focus on the problem of testing of
hypothesis in particular.
I Here our objective is to test the validity of any claim regarding the population (or a parameter) on the basis of a sample.
I Any claim regarding the population whose validity is to be
tested on the basis of sample observations is generally called a
statistical hypothesis.
I At this point we note that whenever we set up a hypothesis for testing, it means that at the back of our mind we are in doubt regarding its validity.
I Thus any hypothesis, when set up for testing, is set up for possible rejection.
I According to Fisher, any hypothesis which is tentatively set up for possible rejection is called the null hypothesis.
I Any hypothesis which contradicts the null hypothesis is called the alternative hypothesis.
Simple and Composite Hypothesis
I We note that null and alternative hypotheses can be stated
either in terms of distribution or in terms of parameters (if the
family is parametric).
I For example, we may state
H0 : P ∈ P0 against H1 : P ∈ P1 where P0 ∪ P1 ⊆ P
or if we assume the family to be parametric this can be written
as
H0 : θ ∈ Θ0 against H1 : θ ∈ Θ1 where Θ0 ∪ Θ1 ⊆ Θ.
I P0 (or Θ0 ) is called the null region and P1 (or Θ1 ) is called
the alternative region.
I For any Hi , i = 0, 1, we call the hypothesis simple if Pi (or Θi ) is a singleton set and composite if it contains more than one element.
I This means that a simple hypothesis completely specifies the population, whereas a composite hypothesis fails to do so.
Statistical Test
I A statistical test is a rule by which we accept or reject the null
hypothesis based on the sample observations.
I Finding a test means partitioning the sample space X into two
regions: critical region (ω) and acceptance region (X − ω).
I If the sample point belongs to the critical region, we reject the null hypothesis, and if it belongs to the acceptance region, the null hypothesis is accepted.
I This is the structure of what we call a non-randomized test
procedure.
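I A minimal sketch of such a non-randomized rule (an illustration; the data and the particular critical region ω below are arbitrary assumptions, not the slides' own):

```python
# A non-randomized test is the indicator of a critical region omega:
# reject H0 exactly when the observed sample falls in omega.

def in_critical_region(x):
    """Illustrative critical region omega = {x : max(x) > 3} (an assumption)."""
    return max(x) > 3

def decide(x):
    return "reject H0" if in_critical_region(x) else "accept H0"

print(decide([1.2, 0.7, 2.9]))  # accept H0
print(decide([1.2, 0.7, 3.4]))  # reject H0
```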
Types of test
I Often, but not necessarily, a critical region is described by comparing a statistic with one or more thresholds.
I This statistic is called the test statistic (T ) and the threshold
is called the critical value c.
I A test is called
I a right tailed test if the critical region is of the form {x : T(x) > c},
I a left tailed test if the critical region is of the form {x : T(x) < c},
I a both tailed test if the critical region is of the form {x : T(x) < c1 or T(x) > c2 }.
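I A minimal decision-rule sketch of these three forms (the statistic value and critical values below are arbitrary assumptions, not from the slides):

```python
# Critical regions expressed as decision rules on the test statistic T(x).

def right_tailed(T_x, c):
    """Reject H0 iff the observed statistic lies in {x : T(x) > c}."""
    return T_x > c

def left_tailed(T_x, c):
    """Reject H0 iff T(x) < c."""
    return T_x < c

def both_tailed(T_x, c1, c2):
    """Reject H0 iff T(x) < c1 or T(x) > c2 (with c1 < c2)."""
    return T_x < c1 or T_x > c2

T_x = 2.1                                   # an observed value of the statistic (assumed)
print(right_tailed(T_x, c=1.64))            # True  -> reject H0
print(both_tailed(T_x, c1=-1.96, c2=1.96))  # True  -> reject H0
```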
Randomized Test
I We can extend this binary situation of complete acceptance
and rejection using the idea of test function or critical function.
I The idea is that there can exist a part of the sample space (the boundary of the two regions) such that, if the sample point falls there, we are not sure whether to accept or reject the null hypothesis.
I In such a case the decision to accept or reject is left to some randomization device whose outcome decides the course.
I Because of this additional randomization performed after observing the data, such tests are called randomized tests.
I Formally, we define a test function φ, taking values in [0, 1], as the conditional probability of rejecting the null hypothesis given the data, that is,
φ(x) = P(Reject H0 | X = x).
I That is, if X = x is observed, a random experiment is
performed with two possible outcomes R and R̄, the
probabilities of which are φ(x) and 1 − φ(x) respectively.
I If the outcome turns out to be R, we shall reject the null hypothesis, and if the outcome is R̄, we shall accept the null hypothesis.
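I A minimal sketch of this mechanism (the particular test function `phi` below is an arbitrary assumption, not the slides' own):

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    """Illustrative test function: reject surely above 2, reject with
    probability 0.5 on the boundary region (1, 2], never reject below."""
    if x > 2:
        return 1.0
    if x > 1:
        return 0.5
    return 0.0

def randomized_decision(x):
    """Given X = x, perform the auxiliary experiment: outcome R (reject)
    occurs with probability phi(x), and R-bar (accept) otherwise."""
    return "reject H0" if rng.random() < phi(x) else "accept H0"

print(randomized_decision(1.5))  # rejects roughly half the time it is called
```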
I We note that if φ takes values only in {0, 1}, then the test boils down to a non-randomized test. Thus a randomized test is a generalization of a non-randomized test.
I If the distribution of X is P then
EP (φ(X )) = ∫ φ(x) dP(x)
is the probability of rejection of H0 when the true distribution is P.
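I As an illustration (reusing the toy `phi` above and assuming a particular P; neither comes from the slides), EP (φ(X )) can be approximated by simple Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(1)

def phi(x):
    # same illustrative test function as in the previous sketch
    return 1.0 if x > 2 else (0.5 if x > 1 else 0.0)

# Take P to be an exponential distribution with mean 1.5 (an assumption).
samples = rng.exponential(scale=1.5, size=200_000)
rejection_prob = np.mean([phi(x) for x in samples])  # estimates E_P(phi(X))
print(rejection_prob)  # probability of rejecting H0 when the true distribution is P
```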
Errors in Testing
I Now for any testing problem, there can be any arbitrary
partition of the sample space into these two regions.
I As such there exists many tests for a single problem.
I A natural question is: How do we decide which test is “best”?
I When deciding the best test, we need to fix an evaluation criterion by which we shall judge the tests.
I Intuitively our objective should be to commit fewer mistakes, or errors.
I In a testing situation, we can encounter two types of error: Type I error (or error of the first kind), where we reject a true null hypothesis, and Type II error (or error of the second kind), where we accept a false null hypothesis.
I In terms of the test function we can describe the error probabilities as
P(Type I error) = EP (φ(X )) for P ∈ P0
and
P(Type II error) = 1 − EP (φ(X )) for P ∈ P1 .
I The consequences of both these types of errors are quite
different.
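I A simulation sketch of these two error probabilities (the normal model, sample size and cut-off below are assumptions chosen purely for illustration): for the rule "reject H0 : µ = 0 in favour of H1 : µ = 1 when x̄ > 0.5", based on n = 9 observations from N(µ, 1), both errors can be estimated empirically.

```python
import numpy as np

rng = np.random.default_rng(2)
n, cutoff, reps = 9, 0.5, 100_000

def rejects(mu):
    """Simulate x-bar under N(mu, 1) and apply the rule: reject H0 iff x-bar > cutoff."""
    xbar = rng.normal(loc=mu, scale=1.0, size=(reps, n)).mean(axis=1)
    return xbar > cutoff

type_1 = rejects(mu=0.0).mean()      # P(reject H0 | H0 true): E_P(phi(X)) for P in P0
type_2 = 1 - rejects(mu=1.0).mean()  # P(accept H0 | H1 true): 1 - E_P(phi(X)) for P in P1
print(type_1, type_2)                # each close to 0.067
```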
I For example, while testing for the presence of some disease, incorrectly deciding that a treatment is required may result in side-effects and financial losses, whereas failure to diagnose the presence of the ailment may lead to death.
I Ideally, in an optimum test procedure we would like to control both types of error probabilities suitably.
I Unfortunately, in a fixed-sample-size procedure we cannot minimize both the error probabilities simultaneously.
I This is because, for a fixed-sample-size procedure, the sample space X is fixed; hence if we want to control P(Type I error), we need to shrink the critical region ω, which enlarges the acceptance region (X − ω) and in turn increases P(Type II error), and vice-versa.
I There are some trivial cases where both of them are 0 or 1.
I Let X ∼ U(θ, θ + 1) and we want to test H0 : θ = 0 vs.
H1 : θ = 1.
I Suppose we construct a test as
φ(x) = 1 if x > 1, and φ(x) = 0 if x ≤ 1.
I Then both the error probabilities are zero.
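I A numerical check of this claim (a direct computation with scipy, matching the slide's setup):

```python
from scipy.stats import uniform

# Under H0 (theta = 0): X ~ U(0, 1); Type I error = P(X > 1 | H0 true).
type_1 = uniform(loc=0, scale=1).sf(1)
# Under H1 (theta = 1): X ~ U(1, 2); Type II error = P(X <= 1 | H1 true).
type_2 = uniform(loc=1, scale=1).cdf(1)
print(type_1, type_2)  # both 0.0
```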
I There are inferential procedures called sequential procedures
where the sample size is not fixed and samples are drawn
sequentially depending on whether we can take any decision
(accept or reject) regarding H0 based on the current samples.
I In such testing procedures, simultaneous minimization of both
error probabilities is possible because the sample space keeps
changing.
I In fixed-sample-size procedures, what we do is fix an upper bound for the probability of Type I error and, under that restriction, try to minimize the probability of Type II error. But why?
I Because we regard rejecting a true null hypothesis as a more serious mistake than accepting a false one, and we do not want to make that mistake too often.
I This upper bound on the probability of Type I error is called the level of significance of the test and is generally denoted by α.
I We say a test is of level α if the probability of Type I error is at most α, that is,
sup_{P∈P0} EP (φ(X )) ≤ α.
I And the test is of size α if
sup_{P∈P0} EP (φ(X )) = α.
I So by a test at the 5% level of significance we mean that if we could repeat the test, say 100 times, and note the number of times we commit a Type I error, then on average that number will be at most 5.
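I A simulation sketch of this frequency interpretation (the normal model and the one-sided z-test below are illustrative assumptions): repeating a size-0.05 test many times on data generated under H0 produces a Type I error in roughly 5 out of every 100 repetitions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, alpha, reps = 25, 0.05, 10_000
z_crit = norm.ppf(1 - alpha)         # critical value of a right tailed z-test of H0: mu = 0

# Data generated under H0 (mu = 0, sigma = 1 known), so every rejection is a Type I error.
x = rng.normal(loc=0.0, scale=1.0, size=(reps, n))
z = x.mean(axis=1) * np.sqrt(n)      # z-statistic when sigma = 1
print((z > z_crit).mean())           # close to 0.05, i.e. about 5 mistakes per 100 tests
```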
I A related quantity is the power of a test, which is given by
β = 1 − P(Type II error) = EP (φ(X )) for P ∈ P1 .
I Thus our objective is to maximize the power of a test within the class of all level α tests.
I In general the probability of rejection β(P) = EP (φ(X )) is called the power function, and when plotted against P, the resulting curve is called the power curve.
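I A sketch of a power function (the one-sided z-test for a normal mean below is an assumed example, not the slides' own): β(µ) = Pµ (reject H0 ) is available in closed form and can be tabulated or plotted against µ to give the power curve.

```python
import numpy as np
from scipy.stats import norm

n, alpha = 25, 0.05
z_crit = norm.ppf(1 - alpha)         # right tailed z-test of H0: mu = 0 vs H1: mu > 0

def power(mu, mu0=0.0, sigma=1.0):
    """beta(mu) = P_mu(reject H0) for the test based on x-bar of n observations."""
    shift = (mu - mu0) * np.sqrt(n) / sigma
    return norm.sf(z_crit - shift)

for mu in (0.0, 0.2, 0.5, 1.0):
    print(mu, round(power(mu), 3))   # equals alpha at mu = 0 and increases with mu
```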
Example
I X has p.d.f.
f (x) = (1/θ) exp(−x/θ) for x > 0, and f (x) = 0 otherwise.
We have to test H0 : θ = 2 vs H1 : θ = 4. A random sample of size n = 2, viz. X1 , X2 , will be used. Suppose the critical region is given by C = {(X1 , X2 ) : 9.5 ≤ X1 + X2 < ∞}. Find the power function of the test and the significance level.
I The power function of the test will be determined at only two points, viz. θ = 2 and θ = 4.
I The power function of the test is given by,
β(θ) = Pθ {(X1 , X2 ) ∈ C }.
I For θ = 2 we have
β(2) = Pθ=2 {(X1 , X2 ) ∈ C }
= 1 − Pθ=2 {(X1 , X2 ) ∉ C }
= 1 − Pθ=2 {0 ≤ X1 + X2 ≤ 9.5}
= 1 − ∫_0^9.5 ∫_0^(9.5−x1) (1/4) exp(−(x1 + x2)/2) dx2 dx1 = 0.05.
I For θ = 4 we have
β(4) = Pθ=4 {(X1 , X2 ) ∈ C }
= 1 − Pθ=4 {(X1 , X2 ) ∉ C }
= 1 − Pθ=4 {0 ≤ X1 + X2 ≤ 9.5}
= 1 − ∫_0^9.5 ∫_0^(9.5−x1) (1/16) exp(−(x1 + x2)/4) dx2 dx1 = 0.31.
I Hence the power function is given by β(θ) = 0.05 for θ = 2 and β(θ) = 0.31 for θ = 4.
I Here the level of the test is β(2) = 0.05, i.e. the test is at the 5% level.
I The power of the test is given by β(4) = 0.31.
I Hence we have P(Type II error) = 1 − β(4) = 1 − 0.31 = 0.69.
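I A numerical check of these values (it uses the fact, not stated on the slide, that X1 + X2 follows a Gamma distribution with shape 2 and scale θ when X1 , X2 are i.i.d. exponential with mean θ):

```python
from scipy.stats import gamma

# Under theta, X1 + X2 ~ Gamma(shape=2, scale=theta); the test rejects when X1 + X2 >= 9.5.
def power(theta, cutoff=9.5):
    return gamma(a=2, scale=theta).sf(cutoff)

print(round(power(2), 4))  # ~0.0497, the significance level beta(2) ≈ 0.05
print(round(power(4), 4))  # ~0.3140, the power beta(4) ≈ 0.31
```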
I Often in practice the null hypothesis is set up from the idea that rejection of a true null is more serious.
I Suppose we need to administer a vaccine to children in a given locality depending on their weight. How will we choose a locality if we know that the vaccine is meant for children with higher weights?
I We can perform a test of hypothesis with the null hypothesis H0 : µ ≥ 35, where µ is the average weight of children in the locality.
I This asymmetry in the testing-of-hypothesis setup has some serious consequences for the interpretation of a test.
I More specifically, we set up a null hypothesis tentatively for rejection but do not reject it unless we have "significant" evidence against it.
I Lack of evidence against H0 can mean two things:
I Evidence in favour of H0 .
I Lack of evidence.
I We note that even in the second situation, we are going to
accept the null hypothesis.
I This inherent asymmetry in the testing of hypothesis can cause problems in certain applications, as we shall see later.
I There is another important consequence of this asymmetry: rejection of a null hypothesis is permanent, but acceptance is temporary.
I Till this point we have seen that any test is described completely by a test statistic and a critical value, together with the resulting decision regarding acceptance or rejection of the null hypothesis.
I However this technique has one limitation.
I For example, suppose we have constructed a test rule: “Reject
H0 if x̄ > 5”.
I Now consider two situations: in one, based on the actual samples, the sample mean is found to be 5.5, and in the other the sample mean comes out to be 25.
I In both the scenarios we shall reject the null hypothesis, but do you think the rejection is the same in both cases?
I Our formal test procedure does not make any distinction
between the two scenarios.
I But note that in the former case the rejection was marginal, whereas in the latter there was very strong evidence to reject the null hypothesis and the decision was strong.
I Recall that we had a similar situation in the problem of estimation, where we supplied the standard error along with the estimate as a measure of the "reliability" of our decision rule. Now what will play the role of the standard error in the case of testing of hypothesis?
I The answer to this question is the p-value. Formally, we define the p-value as the smallest level of significance at which the null hypothesis is rejected, that is,
inf{α : x ∈ Rα },
where Rα is the rejection region at level α.
I This suggests that for a size α test, if the p-value comes out to be less than α, then the null hypothesis is rejected, and if it comes out to be more than α, then the null hypothesis is accepted.
I But what does this definition really mean?
I To understand this, we need to look at the form of the p-value separately for right tailed, left tailed and both tailed tests.
I If T is the test statistic and t is the observed value of the test statistic, then the definition of the p-value boils down to
PH0 (T > t) for right tailed tests,
PH0 (T < t) for left tailed tests and
2 min{PH0 (T > t), PH0 (T < t)} for both tailed tests.
I That the general definition of p-value gets simplified in each of
the three cases above can be illustrated with an example:
Example
I Suppose X ∼ Bin(10, p).
I Consider the problem of testing
H0 : p = 0.5 against H1 : p > 0.5.
I Suppose we perform a right tailed test based on X , that is, the
critical region at size α(c) is
Rα(c) = {x : x ≥ c}, where α(c) = PH0 (X ≥ c).
I We note that as c increases α(c) decreases, as for example,
α(4) = PH0 (X ≥ 4) > PH0 (X ≥ 5) = α(5).
I Now suppose the observed value of X is x = 6.
I Then
if we choose c = 4, α(4) = PH0 (X ≥ 4), Rα(4) = {x : x ≥ 4} and x = 6 ∈ Rα(4) ;
if we choose c = 5, α(5) = PH0 (X ≥ 5), Rα(5) = {x : x ≥ 5} and x = 6 ∈ Rα(5) ;
if we choose c = 6, α(6) = PH0 (X ≥ 6), Rα(6) = {x : x ≥ 6} and x = 6 ∈ Rα(6) ;
if we choose c = 7, α(7) = PH0 (X ≥ 7), Rα(7) = {x : x ≥ 7} and x = 6 ∉ Rα(7) .
I Thus the p-value is
inf{α : x ∈ Rα } = α(6) = PH0 (X ≥ 6).
I Thus the p-value represents the distance of the currently observed value of the test statistic from the alternative hypothesis.
I If that distance is small, we can conclude that the observed value lies closer to H1 than to H0 , leading to the rejection of the null.
I The p-value is in itself a measure for taking a decision regarding the rejection of the null hypothesis - we do not need to bother about the level of significance.
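I A numerical check of this example (computed with scipy for the slide's Bin(10, 0.5) setup):

```python
from scipy.stats import binom

# H0: p = 0.5, right tailed test based on X ~ Bin(10, p); observed value x = 6.
alpha_c = {c: binom.sf(c - 1, 10, 0.5) for c in (4, 5, 6, 7)}  # alpha(c) = P_H0(X >= c)
print(alpha_c)                  # decreasing in c, as noted above

p_value = binom.sf(5, 10, 0.5)  # P_H0(X >= 6)
print(p_value)                  # 0.376953125, too large to reject H0 at the usual 5% level
```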
I A natural question arises when constructing tests: why can't we accept or reject the null hypothesis just by looking at the test statistic alone?
I What do we need critical values for?
I For example, suppose we want to test for the population mean with H0 : µ = 5 and we compute the sample mean x̄.
I Suppose the sample mean comes out to be 6 and you straight away reject the null hypothesis because the sample mean is different from 5.
I Obviously that will be very strict against the null hypothesis
but apart from that we need to understand that we are dealing
with random samples.
I We need to take into account the sampling variability of the
sample mean and understand that though on one particular
occasion it comes out to be 6, on other sampling occasions it
might turn out to be 5.
I The idea is that if, based on the sampling distribution of x̄, we believe that 5 is really an impossible value of µ, then we shall reject H0 .
I Again it turns out that the rejection or acceptance of H0
depends on our belief or notion about “impossibility” which is
subjective.
I It is due to this subjective belief about impossibility that we may commit errors, and the extent of error we allow is governed by the notion of α.
I For example, if we take α = 0.01, we are much more cautious about committing an error, and hence the resulting test will be more stringent than in the situation where we allow α = 0.05.
I The bottom line is: there cannot be any hard and fast rule regarding the acceptance or rejection of a hypothesis; it depends on an individual's perspective on the specific problem (which is governed by α, and we are allowed to change it).
Two Approaches to Finding Tests
I There are, in general, two approaches to constructing tests:
I Heuristic Approach: Here the construction of the test criterion
starts from intuition. Then we modify the statistic accordingly
to get a form which has a standard sampling distribution.
I Optimality Approach: Here the construction of the test
criterion starts from the objective of optimizing some suitable
criterion and then modifying it accordingly to get one with a
standard distribution.
I It may be asked why we should care for the heuristic approach at all when a rigorous approach based on an optimality criterion is available.
I As a matter of fact both these approaches are equally
important.
I We care for the heuristic approach because in many situations (especially in the multivariate setup) optimal tests are hard, and in some cases impossible, to find.