Basic Concepts of Testing

Presidency University

February, 2025

Problem of Testing of Hypothesis

I In this course, we shall focus on the problem of testing of hypothesis in particular.

I Here our objective is to test the validity of a claim regarding the population (or parameter) on the basis of a sample.

I Any claim regarding the population whose validity is to be tested on the basis of sample observations is generally called a statistical hypothesis.

I At this point we should note that whenever we set up a hypothesis for testing, it means that at the back of our mind we are in doubt about its validity.

I Thus any hypothesis set up for testing is usually set up for possible rejection.

I According to Fisher, any hypothesis which is tentatively fixed up for possible rejection is called the null hypothesis.

I Any hypothesis which contradicts the null hypothesis is called the alternative hypothesis.

Simple and Composite Hypothesis

I We note that the null and alternative hypotheses can be stated either in terms of distributions or in terms of parameters (if the family is parametric).

I For example, we may state
H0 : P ∈ P0 against H1 : P ∈ P1, where P0 ∪ P1 ⊆ P,
or, if we assume the family to be parametric, this can be written as
H0 : θ ∈ Θ0 against H1 : θ ∈ Θ1, where Θ0 ∪ Θ1 ⊆ Θ.

I P0 (or Θ0) is called the null region and P1 (or Θ1) is called the alternative region.

I For any Hi, i = 0, 1, we call the hypothesis simple if Pi (or Θi) is a singleton set, and composite if it contains more than one element.

I This means a simple hypothesis completely specifies the population, while a composite hypothesis fails to do so.

Statistical Test

I A statistical test is a rule by which we accept or reject the null hypothesis based on the sample observations.

I Finding a test means partitioning the sample space X into two regions: the critical region (ω) and the acceptance region (X − ω).

I If the sample point belongs to the critical region, we reject the null hypothesis; if it belongs to the acceptance region, the null hypothesis is accepted.

I This is the structure of what we call a non-randomized test procedure.

Types of test

I Often, though not necessarily, a critical region is described by comparing a statistic with one or more thresholds.

I This statistic is called the test statistic (T) and the threshold is called the critical value c.

I A test is called a
I right tailed test if the critical region is of the form {x : T(x) > c},
I left tailed test if the critical region is of the form {x : T(x) < c},
I both tailed test if the critical region is of the form {x : T(x) < c1 or T(x) > c2}.
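
I The three forms above can be written down directly. The following is a minimal Python sketch (not part of the original slides); the sample mean is taken as the test statistic T, and the thresholds c, c1, c2 are arbitrary illustrative values.

import numpy as np

# Sample mean as an illustrative test statistic T (an assumption for this sketch).
def T(x):
    return np.mean(x)

def right_tailed(x, c):
    return T(x) > c                    # reject H0 if T(x) > c

def left_tailed(x, c):
    return T(x) < c                    # reject H0 if T(x) < c

def both_tailed(x, c1, c2):
    return T(x) < c1 or T(x) > c2      # reject H0 if T(x) falls outside [c1, c2]

x = np.array([4.2, 5.9, 6.3, 5.1])     # toy data; mean is 5.375
print(right_tailed(x, c=5.0))          # True: the sample point lies in the critical region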

Randomized Test

I We can extend this binary situation of complete acceptance and rejection using the idea of a test function or critical function.

I The idea is that there can exist a part of the sample space (the boundary of the two regions) where, if the sample point falls in it, we are not sure whether to accept or reject the null hypothesis.

I In such a case the decision of acceptance or rejection is left to some randomization principle whose outcome decides the course.

I Due to this post-randomization, such tests are called randomized tests.

I Formally, we define a test function φ, taking values in [0, 1], to be the conditional probability of rejecting the null hypothesis given the data, that is,

φ(x) = P(Reject H0 | X = x).

I That is, if X = x is observed, a random experiment is performed with two possible outcomes R and R̄, whose probabilities are φ(x) and 1 − φ(x) respectively.

I If the outcome turns out to be R, we reject the null hypothesis, and if the outcome is R̄, we accept the null hypothesis.

I We note that if φ(x) ∈ {0, 1} for all x, then the test boils down to a non-randomized test. Thus a randomized test is a generalization of a non-randomized test.

I If the distribution of X is P, then

EP(φ(X)) = ∫ φ(x) dP(x)

is the probability of rejecting H0 when the true distribution is P.
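
I As a concrete (purely illustrative) sketch of a randomized test and of EP(φ(X)) as a rejection probability, suppose a single observation X ∼ Bin(10, 0.5) under P, and take a hypothetical test function that rejects outright when x ≥ 7 and rejects with probability 0.3 when x = 6; these numbers are assumptions made for the example, not a test from these slides.

import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    # Hypothetical test function: phi(x) = P(Reject H0 | X = x).
    if x >= 7:
        return 1.0
    if x == 6:
        return 0.3
    return 0.0

def decide(x):
    # Post-randomization: reject H0 with probability phi(x).
    return rng.random() < phi(x)

print("decision at x = 6:", decide(6))    # rejects with probability 0.3

# Monte Carlo estimate of E_P[phi(X)], the rejection probability when p = 0.5;
# analytically it equals P(X >= 7) + 0.3 * P(X = 6), roughly 0.233.
xs = rng.binomial(10, 0.5, size=200_000)
print(np.mean([phi(x) for x in xs]))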

Errors in Testing

I Now, for any testing problem, there can be an arbitrary partition of the sample space into these two regions.

I As such, there exist many tests for a single problem.

I A natural question is: how do we decide which test is “best”?

I When deciding on the best test, we need to fix an evaluation criterion by which we shall judge the tests.

I Intuitively, our objective should be to commit fewer mistakes, or errors.

I In a testing situation, we can encounter two types of error: Type I error (or error of the first kind), where we reject a true null hypothesis, and Type II error (or error of the second kind), where we accept a false null hypothesis.

I In terms of the test function, we can describe the error probabilities as

P(Type I error) = EP(φ(X)) for P ∈ P0

and

P(Type II error) = 1 − EP(φ(X)) for P ∈ P1.

I The consequences of these two types of errors are quite different.

I For example, while testing for the presence of some disease, incorrectly deciding that a treatment is required may result in side-effects and financial losses, whereas failure to diagnose the ailment may lead to death.

I Ideally, in an optimum test procedure we would like to control both types of error probabilities suitably.

I Unfortunately, in a fixed sample size procedure, we cannot minimize both error probabilities simultaneously.

I This is because, for a fixed sample size procedure, the sample space X is fixed; hence if we want to control P(Type I error), we need to shrink the critical region ω, which enlarges the acceptance region (X − ω) and in turn increases P(Type II error), and vice versa.
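
I A small numerical illustration of this trade-off (an assumed setup, not from the slides): X1, ..., Xn i.i.d. N(µ, 1) with n = 9, testing H0 : µ = 0 against H1 : µ = 1 with critical region {x̄ > c}. Enlarging c shrinks the critical region, so the Type I error probability falls while the Type II error probability rises.

from scipy.stats import norm

n = 9
se = 1 / n ** 0.5                             # standard error of the sample mean
for c in (0.3, 0.5, 0.7, 0.9):
    alpha = norm.sf(c, loc=0, scale=se)       # P(Type I)  = P_{mu=0}(xbar > c)
    beta = norm.cdf(c, loc=1, scale=se)       # P(Type II) = P_{mu=1}(xbar <= c)
    print(f"c = {c:.1f}:  P(Type I) = {alpha:.3f},  P(Type II) = {beta:.3f}")
# As c grows, P(Type I) decreases while P(Type II) increases, never both at once.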

I There are some trivial cases where both of them are 0 or 1.

I Let X ∼ U(θ, θ + 1) and suppose we want to test H0 : θ = 0 vs. H1 : θ = 1.

I Suppose we construct a test as

φ(x) = 1 if x > 1, and φ(x) = 0 if x ≤ 1.

I Then both error probabilities are zero.
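
I This is easy to check, since under H0 the observation lies in (0, 1) while under H1 it lies in (1, 2), so the critical region {x > 1} misses every H0-sample and catches every H1-sample. A quick Monte Carlo confirmation in Python (illustrative only):

import numpy as np

rng = np.random.default_rng(1)

def phi(x):
    # Non-randomized test: reject H0 exactly when x > 1.
    return (x > 1).astype(float)

x0 = rng.uniform(0, 1, 100_000)    # samples under H0: theta = 0
x1 = rng.uniform(1, 2, 100_000)    # samples under H1: theta = 1

print("P(Type I)  ~", phi(x0).mean())        # 0.0: no H0-sample exceeds 1
print("P(Type II) ~", 1 - phi(x1).mean())    # 0.0: (almost) every H1-sample exceeds 1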


I There are inferential procedures, called sequential procedures, where the sample size is not fixed and samples are drawn sequentially, depending on whether we can take a decision (accept or reject) regarding H0 based on the current samples.

I In such testing procedures, simultaneous minimization of both error probabilities is possible because the sample space keeps changing.

I In fixed sample size procedures, what we do is fix an upper bound for the probability of Type I error and, under that restriction, try to minimize the probability of Type II error. But why?

I Because we consider rejecting a true null to be a more serious mistake than accepting a false null hypothesis, and we do not want to make that mistake too often.

I This upper bound on the probability of Type I error is called the level of significance of the test and is generally denoted by α.

I We say a test is of level α if the probability of Type I error is at most α, that is,

sup_{P∈P0} EP(φ(X)) ≤ α.

I And the test is of size α if

sup_{P∈P0} EP(φ(X)) = α.

I So by a test at the 5% level of significance we mean that if we could repeat the test many times in chunks of 100 and note the number of times in each chunk that we commit a Type I error, then on average that number would be at most 5.

I A related quantity is the power of a test, which is given by

β = 1 − P(Type II error) = EP(φ(X)) for P ∈ P1.

I Thus our objective is to maximize the power of a test within the class of all level-α tests.

I In general, the probability of rejection β(P) = EP(φ(X)) is called the power function, and when plotted against P the resulting curve is called the power curve.
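
I As a sketch of a power curve, consider an assumed setup (not taken from the slides): X1, ..., X9 i.i.d. N(µ, 1) with critical region {x̄ > 0.55}. The power function β(µ) = Pµ(x̄ > 0.55) can be tabulated over a grid of µ values; plotting it against µ gives the power curve.

import numpy as np
from scipy.stats import norm

n, c = 9, 0.55
mu_grid = np.linspace(-1.0, 2.0, 7)
# beta(mu) = P_mu(xbar > c), where xbar ~ N(mu, 1/n).
power = norm.sf(c, loc=mu_grid, scale=1 / np.sqrt(n))
for mu, b in zip(mu_grid, power):
    print(f"mu = {mu:5.2f}   beta(mu) = {b:.3f}")
# On the null region beta(mu) gives the Type I error probability;
# on the alternative region it gives the power of the test.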

Example

I X has p.d.f.

f(x) = (1/θ) exp(−x/θ) for x > 0, and 0 otherwise.

We have to test H0 : θ = 2 vs. H1 : θ = 4. A random sample of size n = 2, viz. X1, X2, will be used. Suppose the critical region is given by C = {(X1, X2) : 9.5 ≤ X1 + X2 < ∞}. Find the power function of the test and the significance level.

I The power function of the test needs to be determined at only two points, viz. θ = 2 and θ = 4.

I The power function of the test is given by

β(θ) = Pθ{(X1, X2) ∈ C}.

I For θ = 2 we have

β(2) = Pθ=2{(X1, X2) ∈ C}
     = 1 − Pθ=2{(X1, X2) ∉ C}
     = 1 − Pθ=2{0 ≤ X1 + X2 ≤ 9.5}
     = 1 − ∫_0^9.5 ∫_0^(9.5−x1) (1/4) exp(−(x1 + x2)/2) dx2 dx1 = 0.05.

I For θ = 4 we have

β(4) = Pθ=4{(X1, X2) ∈ C}
     = 1 − Pθ=4{(X1, X2) ∉ C}
     = 1 − Pθ=4{0 ≤ X1 + X2 ≤ 9.5}
     = 1 − ∫_0^9.5 ∫_0^(9.5−x1) (1/16) exp(−(x1 + x2)/4) dx2 dx1 = 0.31.

I Hence the power function is given by

β(θ) = 0.05 for θ = 2, and 0.31 for θ = 4.

I Here the level of the test is β(2) = 0.05, i.e. the test is at the 5% level.

I The power of the test is given by β(4) = 0.31.

I Hence we have P(Type II error) = 1 − β(4) = 1 − 0.31 = 0.69.
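
I These numbers can be cross-checked without evaluating the double integral by hand: when X1 and X2 are i.i.d. exponential with mean θ, the sum X1 + X2 follows a Gamma distribution with shape 2 and scale θ, so β(θ) = Pθ(X1 + X2 ≥ 9.5) is a Gamma survival probability. A short verification in Python:

from scipy.stats import gamma

for theta in (2, 4):
    # beta(theta) = P(X1 + X2 >= 9.5) with X1 + X2 ~ Gamma(shape=2, scale=theta)
    print(f"beta({theta}) = {gamma.sf(9.5, a=2, scale=theta):.4f}")
# Output: beta(2) is about 0.0497 and beta(4) is about 0.3139, matching the
# 5% level, the power 0.31 and P(Type II error) of about 0.69 quoted above.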


I Often in practice the null hypothesis is set up from the idea that rejecting a true null is more serious.

I Suppose we need to administer a vaccine to children in a given locality depending on their weight. How will we choose a locality if we know that the vaccine is for children with higher weights?

I We can perform a test of hypothesis where we set the null hypothesis H0 : µ ≥ 35, where µ is the average weight of children in this locality.

I This asymmetry in the testing-of-hypothesis setup has some serious consequences for the interpretation of a test.

I More specifically, we set up a null hypothesis tentatively for rejection but do not reject it unless we have “significant” evidence against it.

I Lack of evidence against H0 can mean two things:
I Evidence in favour of H0.
I Lack of evidence either way.
I We note that even in the second situation, we are going to accept the null hypothesis.

I This inherent asymmetry in the testing of hypothesis can cause problems in certain applications, as we shall see later.

I There is another important consequence of this asymmetry: rejection of a null hypothesis is permanent, but acceptance is temporary.

I Till this point we have seen that any test is described completely by a test statistic and a critical value, together with our decision regarding acceptance or rejection of the null hypothesis.

I However, this technique has one limitation.

I For example, suppose we have constructed the test rule: “Reject H0 if x̄ > 5”.

I Now consider two situations: in one, based on the actual samples, the sample mean is 5.5, and in the other, the sample mean comes out to be 25.

I In both scenarios we shall reject the null hypothesis, but do you think the rejection is the same in both cases?

I Our formal test procedure does not make any distinction between the two scenarios.

I But note that in the former case the rejection was marginal, whereas in the latter there was very strong evidence against the null hypothesis and the decision was strong.

I Recall that we had a similar situation in the problem of estimation, where we supplied the standard error along with the estimate as a measure of the “reliability” of our decision rules. Now what will play the role of the standard error in the case of testing of hypothesis?

I The answer to this question is the p-value. Formally, we define the p-value as the smallest level of significance at which the null hypothesis is rejected, that is,

inf{α : x ∈ Rα},

where Rα is the rejection region at level α.

I This suggests that for a size-α test, if the p-value comes out to be less than α, then the null hypothesis is rejected, and if it comes out to be more than α, then the null hypothesis is accepted.

I But what does this definition really mean?


I To understand, we need to look at the form of the p-value separately for right tailed, left tailed and both tailed tests.

I If T is the test statistic and t is the observed value of the test statistic, then the definition of the p-value boils down to

PH0(T > t) for right tailed tests,

PH0(T < t) for left tailed tests, and

2 min{PH0(T > t), PH0(T < t)} for both tailed tests.

I That the general definition of the p-value simplifies in each of the three cases above can be illustrated with an example.

Example

I Suppose X ∼ Bin(10, p).

I Consider the problem of testing H0 : p = 0.5 against H1 : p > 0.5.

I Suppose we perform a right tailed test based on X, that is, the critical region of size α(c) is

Rα(c) = {x : x ≥ c}, where α(c) = PH0(X ≥ c).

I We note that as c increases, α(c) decreases; for example,

α(4) = PH0(X ≥ 4) > PH0(X ≥ 5) = α(5).


I Now suppose the observed value of X is x = 6.
I Then
if we choose c = 4, α(4) = PH0(X ≥ 4), Rα(4) = {x : x ≥ 4}, and x = 6 ∈ Rα(4);
if we choose c = 5, α(5) = PH0(X ≥ 5), Rα(5) = {x : x ≥ 5}, and x = 6 ∈ Rα(5);
if we choose c = 6, α(6) = PH0(X ≥ 6), Rα(6) = {x : x ≥ 6}, and x = 6 ∈ Rα(6);
if we choose c = 7, α(7) = PH0(X ≥ 7), Rα(7) = {x : x ≥ 7}, and x = 6 ∉ Rα(7).

I Thus the p-value is

inf{α : x ∈ Rα} = α(6) = PH0(X ≥ 6).

I Thus the p-value represents the distance of the currently observed value of the test statistic from the alternative hypothesis.

I If that distance is small, we can conclude that the observed value lies closer to H1 than to H0, leading to the rejection of the null.

I The p-value in itself is a measure for taking a decision regarding rejection of the null hypothesis; we do not need to bother about the level of significance.
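
I For the binomial example above, the p-value PH0(X ≥ 6) can be computed directly; the following short Python check uses the Bin(10, 0.5) null distribution.

from scipy.stats import binom

# p-value of the right tailed test when x = 6 is observed:
# P_{H0}(X >= 6) = P(X > 5) for X ~ Bin(10, 0.5).
p_value = binom.sf(5, n=10, p=0.5)
print(p_value)     # 0.376953125
# This exceeds the usual alpha = 0.05, so the null p = 0.5 would not be rejected.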

I A natural question arises when constructing tests: why can't we accept or reject the null hypothesis just by looking at the test statistic alone?

I Why do we need critical values at all?

I For example, suppose we want to test for the population mean with H0 : µ = 5, and we compute the sample mean x̄.

I Suppose the sample mean comes out to be 6, and we straight away reject the null hypothesis because the sample mean is different from 5.

I Obviously that would be very strict against the null hypothesis, but apart from that we need to understand that we are dealing with random samples.

I We need to take into account the sampling variability of the sample mean and understand that, though on one particular occasion it comes out to be 6, on other sampling occasions it might turn out to be 5.

I The idea is that if, based on the sampling distribution of x̄, we believe that 5 is really an impossible value of µ, then we shall reject H0.

I Again, it turns out that the rejection or acceptance of H0 depends on our belief or notion of “impossibility”, which is subjective.

I It is due to this subjective belief about impossibility that we may commit an error, and the extent of error we allow is governed by the notion of α.

I For example, if we take α = 0.01, we are being very cautious about committing a Type I error, and hence the resulting test will be more stringent than in the situation where we allow α = 0.05.

I The bottom line is: there cannot be any rigid rule regarding the acceptance or rejection of a hypothesis; it depends on an individual's perspective on the specific problem (which is governed by α, and we are allowed to change it).

Two Approaches of finding tests

I There are, in general, two approaches to constructing tests:

I Heuristic approach: here the construction of the test criterion starts from intuition. We then modify the statistic accordingly to get a form which has a standard sampling distribution.

I Optimality approach: here the construction of the test criterion starts from the objective of optimizing some suitable criterion, and we then modify it accordingly to get one with a standard distribution.

I It may appear that, when we have a rigorous approach available based on an optimality criterion, there is no reason to care for the heuristic approach.

I As a matter of fact, both of these approaches are equally important.

I We care for the heuristic approach because in many situations (especially in the multivariate setup) optimal tests are hard to find, and in some cases impossible.
