P Value Calculation

This study investigates the p-value concept in hypothesis testing, particularly in relation to mortality rate data from a hospital in Nigeria. It emphasizes the importance of p-values in determining statistical significance and avoiding type I errors, while also critiquing the reliance on fixed significance levels. The research aims to enhance understanding of p-values and their application in statistical education and decision-making.
The P-Value Concept in Hypothesis Testing and

Its Application on Mortality Rate Data


¹Pwasong, A.D. and ²Kembe M.M.
¹Department of Mathematics, University of Jos. Davougus@Gmail.Com
²Benue State University, Makurdi, Benue State. Email: kdzever@yahoo.com, Tel: 08036177129

Abstract

This study is aimed at comparing the probability value (p–value) of various hypotheses tested
with the specified level of significance α at 5% level. The study used data obtained from
Hajiya Gambo Sawaba Government General Hospital, Kofan Gaya, Zaria and other related
examples to achieve this objective. The study involved the development and validation of the
reasoning about p–values and statistical significance scale. The study finally recommends the
use of p–value to take care of the probability of committing a type I error.

Keywords: P – value, hypothesis, significance and infarct


________________________________________________________________________

1.0 Introduction
More often than not, experiments are carried out primarily to discover new facts or to test
the results of previous findings. One of the most important tools in the analysis of
experiments is hypothesis testing. [2] defined a test of hypotheses, or test procedure, as a
method for using sample data to decide between two competing claims (hypotheses) about a
population characteristic. One hypothesis might be µ = 1000 and the other µ ≠ 1000, or one
might be π = 0.01 and the other π < 0.01. They further assert that, if it were possible to
carry out a census of the entire population, we would know which of the two hypotheses is
correct, but usually we must decide between them using information from samples.

According to [1], hypothesis testing is largely the product of Ronald Fisher, Jerzy Neyman,
Karl Pearson and Egon Pearson. Fisher was an agricultural statistician who emphasized
rigorous experimental design and methods to extract a result from few samples assuming
Gaussian distributions. Neyman, who teamed with the younger Pearson, emphasized
mathematical rigor and methods to obtain more results from many samples and a wider range
of distributions.

The concept of p-values has been adopted widely in practice to avoid the imposition of a
predefined level of significance that is always fixed at a specified level known as the
α-value. [8] defined the p-value as the probability, computed under the null hypothesis, of
obtaining a value for the test statistic that is as extreme as, or more extreme than, the
value actually observed (taking account of the alternative hypothesis). They further
explained that if the actual value of the statistic is too far from its expected value, the
test is deemed to be significant and the decision is to reject H0 in favour of H1. If the
actual value of the statistic is close to its expected value, the test is deemed not to be
significant and the decision is not to reject H0. The set of values of the statistic that
lead to rejection of H0 is called the critical region or rejection region, and the set of
values that do not lead to rejection of H0 is called the acceptance region.
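
As a minimal sketch of how the critical region and the p-value lead to the same decision,
the following Python fragment evaluates a right-tailed Z test both ways. The function name,
the default α = 0.05 and the observed z values are illustrative assumptions, not taken from
the cited sources.

```python
from statistics import NormalDist


def right_tailed_decision(z, alpha=0.05):
    """Decide a right-tailed Z test (H1: mu > mu0) in two equivalent ways.

    Returns (reject_by_region, reject_by_p_value, p_value).
    """
    std_normal = NormalDist()                 # standard normal N(0, 1)
    z_alpha = std_normal.inv_cdf(1 - alpha)   # lower boundary of the critical region
    p_value = 1 - std_normal.cdf(z)           # P(Z >= z) when H0 is true
    return z > z_alpha, p_value < alpha, p_value


# Hypothetical observed test statistics, chosen only for illustration.
for z in (0.5, 1.5, 2.0, 3.0):
    by_region, by_p, p = right_tailed_decision(z)
    print(f"z = {z:.1f}  p = {p:.4f}  in critical region: {by_region}  p < alpha: {by_p}")
```

The two boolean columns agree for every z, but only the p-value shows how far into (or
outside) the critical region the statistic falls.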

West African Journal of Industrial & Academic Research Vol.10 No.1 135
April 2014
The p-value, being a probability, can take any value between 0 and 1. Values close to 0
indicate that the observed difference is unlikely to be due to chance, whereas a p-value
close to 1 suggests that the data are consistent with there being no difference between
groups other than that due to random variation.

More technically, a p-value of an experiment is a random variable defined over the sample
space of the experiment, such that its distribution under the null hypothesis is uniform on
the interval [0, 1] (for a continuous test statistic). Many p-values can be defined for the
same experiment.

The frequent use of a fixed level of significance has become a matter of concern in
hypothesis testing. In view of this, the use of the p-value as an observed level of
significance has become necessary. Many statisticians or experimenters are not aware of the
use of p-values and tend to stick to a pre-selected level of significance, which does not
give a true picture of whether a test statistic is barely inside the rejection region or
far into it. Hence, this paper is aimed at determining how to calculate p-values using
appropriate test statistics and why p-values are preferable to fixed levels of
significance, as well as the relationship between p-values and fixed levels of
significance.

Hypothesis testing is important because it provides an objective framework for making
decisions using probabilistic methods, rather than relying on subjective impressions.
People can form different opinions by looking at data, but a hypothesis test provides a
uniform decision-making criterion that is consistent for all people. As part of statistical
inference, it is widely applied in disciplines such as economics, business and science.
The p-value, as an extension of hypothesis testing, is very important because it provides
the smallest value of α for which the data are significant. Once the p-value is known, the
decision maker can determine how significant the data are without the data analyst formally
imposing a pre-selected significance level.

A review of the research literature from the fields of statistics, statistics and
mathematics education, psychology and educational psychology reveals difficulties or
misconceptions students may have in understanding probability and statistics. Researchers
have examined how people's prior intuitions, heuristics and biases may affect their
reasoning about problems in probability, data analysis and descriptive statistics; for
example, as in [3,5,6]. It was common in the past for researchers to classify results as
statistically significant or non-significant, based on whether the p-value was smaller than
some pre-specified cut point, commonly 0.05. This practice is now becoming increasingly
obsolete, and the use of exact p-values is much preferred. This is partly for practical
reasons, because the increasing use of statistical software makes the calculation of exact
p-values simple compared with the past, when tabulated values were used.

The goal of this study is to develop an instrument for statistics education research that
shows evidence of making inferences about students' inferential understanding. The study
involved the development and validation of the reasoning about p-values and statistical
significance scale.

2.0 Methodology
In another development, [7] asserts that the purpose of the p-value test is to facilitate
statistics education research on students' conceptual understanding and misunderstanding of
statistical inference and the effect of instructional approaches on that understanding.
This section describes the method used to describe statistical significance and reasoning
about p-values.

2.1 Data source
The data used for analysis in this paper were obtained from a secondary source, where the
data are not originally collected
by the investigator, but rather obtained from published or unpublished sources. The data
were obtained from a recognized government institution of the Federal Government of
Nigeria. The institution is a hospital called "Hajiya Gambo Sawaba Government General
Hospital", Kofan Gaya, Zaria. The data were from the medical records unit of the hospital.

2.2 The p-value
Statistical analysis is most useful when one is looking for a difference that is small
compared to experimental imprecision and biological variability. A p-value is a measure of
how much evidence one has against the null hypothesis, traditionally represented by the
symbol H0. It is calculated on the assumption that H0 is true. The type of hypothesis test
(right-tailed, left-tailed or two-tailed) determines what "more extreme" means. The p-value
measures consistency by calculating the probability of observing the result from your
sample of data, or a sample with a more extreme result, assuming the null hypothesis is
true. The smaller the p-value, the greater the inconsistency between the data and the null
hypothesis. The general rule is that a small p-value is evidence against the null
hypothesis, while a large p-value means little or no evidence against the null hypothesis.

2.3 Statistical significance
The term significant is seductive, and it is easy to misinterpret it. A result is said to
be statistically significant when the p-value is less than a pre-set threshold value. It is
easy to read too much into the word significant because the statistical use of the word has
a meaning entirely distinct from its usual meaning. Just because a difference is
statistically significant does not mean it is important or interesting. And a result that
is not statistically significant (in the first experiment) may turn out to be very
important.
If a result is statistically significant, there are two possible explanations:
(i) The populations are identical, so there is really no difference. You happened to
randomly obtain larger values in one group and smaller values in the other, and the
difference was large enough to generate a p-value less than the threshold you set. Finding
a statistically significant result when the populations are identical is called making a
type I error.
(ii) The populations really are different, so your conclusion is correct.
In writing up the result of a study, a distinction between scientific and statistical
significance should be made, since the two terms do not necessarily coincide. The result of
a study can be statistically significant but still not be scientifically important. This
situation would occur if a small difference was found to be statistically significant
because of a large sample size. Conversely, some statistically non-significant results can
be scientifically important, encouraging researchers to perform larger studies to confirm
the direction of the findings and possibly reject H0 with a larger sample size.

2.3.1 One vs two-tailed p-value method
When comparing two groups, you must distinguish between one- and two-tailed p-values. Start
with the null hypothesis that the two populations really are the same and that the observed
difference between sample means is due to chance. The two-tailed p-value answers this
question: assuming the null hypothesis, what is the chance that randomly selected samples
would have means as far apart as observed in this experiment, with either group having the
larger mean?
To interpret a one-tail p-value, you must predict which group would have the larger mean
before collecting any data. The one-tail p-value answers this question: assuming the null
hypothesis, what is the chance that randomly selected samples would have means as far apart
as observed in this experiment, with this specific group having the larger mean? A one-tail
p-value
is appropriate only when previous data, physical limitations or common sense tell you that
a difference, if any, can only go in one direction. The issue is not whether you expect a
difference to exist; that is what you are trying to find out with the experiment. The issue
is whether you should interpret increases and decreases the same.
One should only choose a one-tail p-value when one believes the following:
(i) Before collecting any data, you can predict which group will have the larger mean (if
the means are in fact different).
(ii) If the other group ends up with the larger mean, then you should be willing to
attribute that difference to chance, no matter how large the difference.
It is usually best to use a two-tailed p-value for these reasons:
(i) The relationship between p-values and confidence intervals is clearer with two-tailed
p-values.
(ii) Some tests compare three or more groups, which makes the concept of tails
inappropriate.
In other situations, you will want to make a decision based on a single comparison. In
these situations, follow the steps of statistical hypothesis testing:
(i) Set a threshold p-value before you do the experiment. Ideally, you should set this
value based on the relative consequences of missing a true difference or falsely finding a
difference. In practice, the threshold value, called α (alpha), is traditionally almost
always set to 0.05.
(ii) Define the null hypothesis. If you are comparing two means, the null hypothesis is
that the two populations have the same mean.
(iii) Do the appropriate statistical test to compute the p-value.
(iv) Compare the p-value to the preset threshold value. If the p-value is less than the
threshold, state that you "reject the null hypothesis" and that the difference is
"statistically significant". If the p-value is greater than the threshold value, state that
you "do not reject the null hypothesis" and that the difference is not "statistically
significant".

2.3.2 P-value under single value
(One-tailed test)
A test of any statistical hypothesis where the alternative hypothesis is expressed by means
of a less than symbol (<) or greater than symbol (>) is called a one-tailed test, since the
entire critical region lies in one tail of the distribution of the test statistic. The
symbols < or > point to the direction of the critical region. The steps for testing a
hypothesis about the mean of a population with known variance against a one-sided
alternative hypothesis may be summarized as follows:
(i) H0: µ = µ0
(ii) H1: the alternative is either µ < µ0 or µ > µ0
(iii) Choose a level of significance equal to α
(iv) Critical region Z < -Zα for the alternative µ < µ0, or Z > Zα for the alternative
µ > µ0, where Z has a standard normal distribution. Compute x̄ from a random sample of size
n, and then find

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \tag{1}$$

(v) Conclusion: Reject H0 if Z falls in the critical region; otherwise do not reject H0.

For the alternative µ < µ0, we can solve for p as a function of Z by:

$$p = \Phi(z_p) = \Phi\!\left(\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right) \tag{2}$$

Example: A topic of recent clinical interest is the possibility of using drugs to reduce
infarct size in patients who have had a myocardial infarction within the past 24 hours.
Suppose we know that in untreated patients the mean infarct size is 25 with a standard
deviation of 10.

Furthermore, in 8 patients treated with the drug, the mean infarct size is 16. Is the drug
effective in reducing infarct size? (Use α = 0.05.)

Solution: The hypotheses are H0: µ = 25 versus H1: µ < 25, with σ = 10 and n = 8.
The p-value is computed using

$$p = \Phi(z_p) = \Phi\!\left(\frac{16 - 25}{10/\sqrt{8}}\right) = \Phi(-2.55) = 1 - \Phi(2.55) = 1 - 0.9945 = 0.0055 \approx 0.005$$

Hence, p ≈ 0.005 < 0.05. Thus H0 is rejected and we conclude that the drug reduces infarct
size.

The importance of the p-value is that it tells us exactly how significant the results are
without performing repeated significance tests at different levels. In the above example
the p-value is equal to 0.005 and thus the results are highly significant. Under the null
hypothesis, x̄ ~ N(µ0, σ²/n). Hence the probability of obtaining a sample mean that is no
larger than x̄ under the null hypothesis is:

$$\Phi\!\left(\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right) = \Phi(z) = p\text{-value} \tag{3}$$

These are extracts from [3].

2.3.3 P-value under single value
(two-tailed test)
A test of any statistical hypothesis where the alternative is written with a not-equal sign
(≠) is called a two-tailed test, since the critical region is split into two equal parts,
one in each tail of the distribution of the test statistic.
The null hypothesis H0 will always be stated using the equality sign so as to specify a
single value. In this way the probability of committing a type I error can be controlled.
The steps for testing a hypothesis about the mean of a population with known variance σ²
against a two-sided alternative hypothesis may be summarized as follows:
(i) H0: µ = µ0
(ii) H1: µ ≠ µ0
(iii) Choose a level of significance equal to α
(iv) Critical region Z < -Zα/2 and Z > Zα/2 for the alternative µ ≠ µ0, where Z has a
standard normal distribution. Compute x̄ from a random sample of size n and then find

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

(v) Conclusion: reject H0 if Z falls in the critical region; otherwise do not reject H0.

These are extracts from [7].

2.3.4 Determining the p-value for one sample Z test
The p-value for the one-sample Z test for the mean of a normal distribution with known
variance (two-sided alternative) is given by:

$$p = \begin{cases} 2\,\Phi(z), & \text{if } z \le 0 \\ 2\,[1 - \Phi(z)], & \text{if } z > 0 \end{cases} \tag{4}$$

Thus, in words, if z ≤ 0, then p = 2 times the area under an N(0, 1) distribution to the
left of z; if z > 0, then p = 2 times the area under an N(0, 1) distribution to the right
of z. Using the above example we have:
H0: µ = 25 versus H1: µ ≠ 25, σ = 10
The p-value is computed using
p = 2 × Φ(-2.55)
  = 2 × [1 - Φ(2.55)]
  = 2 × (1 - 0.9945)
  = 2 × 0.0055 = 0.011 ≈ 0.01
p ≈ 0.01 < 0.05
Therefore we reject H0 and conclude that the drug reduces infarct size.

2.3.4 P-value between two values
(One-tailed test)
The steps for calculating the p-value between two values with a one-sided alternative are
the same as those considered in 2.3.3 above; the only difference is that the level of
significance alpha (α) takes two different values. If we decide to use two different levels
of α, 0.05 and 0.01, we may want to determine whether H0 is rejected at both levels of
significance, or rejected at one level and not at the other. From our previous example we
know that the p-value, using the one-sided alternative, is 0.005. If we compare this value
with the two levels of α, i.e. 0.05 and 0.01, we notice that the p-value of 0.005 is
smaller than both. So we reject H0 at both levels and conclude that the result is highly
significant.

2.3.5 P-value between two values
(Two-tailed test)
To calculate the p-value between two values with a two-sided alternative requires the
following steps:
(i) Set H0: µ = µ0
(ii) Set H1: µ ≠ µ0
(iii) Choose a level of significance α. In this case α is chosen at two levels (0.01 and
0.05)
(iv) Critical region Z < -Zα/2 and Z > Zα/2 for the alternative, where Z has a standard
normal distribution. Compute x̄ from a random sample of size n, and then find:
(v)

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \tag{5}$$

We can calculate the p-value in terms of Z, i.e. p = Φ(Z). Since the alternative is
two-sided, we have:

$$p = \begin{cases} 2\,\Phi(z), & \text{if } z \le 0 \\ 2\,[1 - \Phi(z)], & \text{if } z > 0 \end{cases} \tag{6}$$

Using our previous example we have

$$\Phi(z_p) = \Phi\!\left(\frac{16 - 25}{10/\sqrt{8}}\right) = \Phi(-2.55)$$

Therefore, for the two-sided alternative we have:
p = 2 (1 - Φ(2.55)) = 2 (1 - 0.9945) = 2 (0.0055) = 0.011 ≈ 0.01.

Conclusion: Since 0.01 < p < 0.05, we reject H0 at the 5% level but not at the 1% level,
and conclude that the result is statistically significant (i.e. the drug may reduce the
infarct size).

3.0 Data Presentation
The data in Tables 1 and 2 are part of the secondary data collected from the Hajiya Gambo
Sawaba Government General Hospital, Kofan Gaya, Zaria, Kaduna State. The data were
subjected to analysis using Predictive Analytic Software (PASW), with a view to calculating
the p-value for all the hypotheses under consideration.
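
Before turning to the hospital data, the infarct-size example of Section 2.3 can be checked
numerically. The sketch below is only an illustration: the function names are hypothetical,
and it uses the exact standard normal distribution function from the Python standard
library rather than the rounded table value Φ(2.55) = 0.9945, so the last digits differ
slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, mu0, sigma, n):
    """Z = (x_bar - mu0) / (sigma / sqrt(n)) for a known population sigma."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


def p_left_tailed(z):
    """p-value for H1: mu < mu0, i.e. Phi(z)."""
    return NormalDist().cdf(z)


def p_two_tailed(z):
    """p-value for H1: mu != mu0; 2*Phi(-|z|) covers both signs of z."""
    return 2 * NormalDist().cdf(-abs(z))


# Values from the worked example: x_bar = 16, mu0 = 25, sigma = 10, n = 8.
z = z_statistic(16, 25, 10, 8)
p_one = p_left_tailed(z)
p_two = p_two_tailed(z)
print(f"z = {z:.4f}, one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")

# Compare each p-value with the two significance levels used in the text.
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: reject (one-tailed) = {p_one < alpha}, "
          f"reject (two-tailed) = {p_two < alpha}")
```

Reporting the p-value once makes the decision at any α immediate, which is the argument
made for p-values in Sections 2.3.4 and 2.3.5.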
Table 1: Mortality rate by sex distribution of Hajiya Gambo Sawaba Government
General Hospital, Zaria in 1999.

Male 8 3 12 11 6 8 10 12 6 17 2 8 9 9 8 2

Female 10 14 6 3 9 6 6 3 4 20 2 4 6 7 4 8

Table 2: Mortality rate by age/sex distribution of Hajiya Gambo Sawaba Government General
Hospital, Zaria
3.1 Analysis and Results
The data were analyzed using the Predictive Analytic Software (PASW), and Tables 3(a),
3(b), 4 and 5 show the results obtained. The level of significance is 0.05 and the
hypotheses to be tested are listed below:
(i) H0: There is no significant difference between male and female with respect to
mortality rate.
(ii) H1: There is a significant difference between male and female with respect to
mortality rate.
(iii) H0: There is no significant difference between sex and age group with respect to
mortality rate.
(iv) H1: There is a significant difference between sex and age group with respect to
mortality rate.
Table 3(a): t-test for sex with respect to mortality rate (Group Statistics)

                 Sex      N    Mean     Std. Deviation   Std. Error Mean
Mortality rate   Male     16   8.1875   3.93647          0.9841
                 Female   16   7.000    4.6188           1.1547

Table 3(b): Independent sample test

                              Levene's test for       t-test for equality of means
                              equality of variances
Mortality rate                F       Sig.            t       df       Sig.         Mean         Std. error   95% Confidence
                                                                       (2-tailed)   difference   difference   Lower    Upper
Equal variances assumed       0.162   0.69            0.783   30       0.44         1.1875       1.51718      -1.9     4.29
Equal variances not assumed                           0.783   29.265   0.44         1.1875       1.51718      -1.9     4.29
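
As a cross-check on Tables 3(a) and 3(b), the pooled-variance t statistic can be recomputed
directly from the Table 1 data. The sketch below is an illustration under stated
assumptions, not the PASW procedure: the function names are hypothetical, equal variances
are assumed, and the two-sided p-value is obtained by numerically integrating the Student t
density with Simpson's rule so that only the Python standard library is needed.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance

# Table 1: mortality counts by sex (16 observations each).
male = [8, 3, 12, 11, 6, 8, 10, 12, 6, 17, 2, 8, 9, 9, 8, 2]
female = [10, 14, 6, 3, 9, 6, 6, 3, 4, 20, 2, 4, 6, 7, 4, 8]


def pooled_t(a, b):
    """Return (t, df, standard error) for the equal-variance two-sample t-test."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    std_err = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / std_err, df, std_err


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * (area under the t density on [0, |t|])."""
    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


t_stat, df, std_err = pooled_t(male, female)
p_value = two_sided_p(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df}, SE = {std_err:.5f}, p = {p_value:.2f}")
```

The recomputed values should agree with the "Equal variances assumed" row of Table 3(b) to
the precision shown there.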

3.2 Interpretation
In comparing the mortality rate between male and female patients in Sawaba Government
General Hospital, Zaria in 1999, the result above shows that there is no significant
difference between male and female with respect to mortality rate. At the 5% significance
level, the p-value, which is 0.44, is greater than 0.05. Hence we do not reject the null
hypothesis that there is no significant difference between male and female mortality rates.
For the chi-square test, the results in Tables 4 and 5 were obtained.
Table 4: Sex and age group cross tabulation

                            Age
Sex                         0 - 14   15 - 64   65 - above   Total
M       Count               325      836       348          1509
        Expected Count      349.3    741.5     418.1        1509
F       Count               335      565       442          1342
        Expected Count      310.7    659.5     371.9        1342
Total   Count               660      1401      790          2851
        Expected Count      660      1401      790          2851

Table 5: Chi-Square Test

                               Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square             54.160a   2    0.000
Likelihood Ratio               54.327    2    0.000
Linear-by-Linear Association   5.83      1    0.016
No of Valid Cases              2851

a. 0 cells (0%) have expected count less than 5. The minimum expected count
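
The Pearson chi-square statistic in Table 5 can be recomputed from the observed counts in
Table 4. The following is a minimal sketch, assuming the usual test of independence with
expected count = (row total × column total) / grand total; the function name is
hypothetical. For 2 degrees of freedom the upper-tail probability of the chi-square
distribution is exactly exp(-x/2), so no statistical library is needed.

```python
from math import exp

# Observed counts from Table 4 (columns: ages 0-14, 15-64, 65 and above).
observed = [
    [325, 836, 348],   # male
    [335, 565, 442],   # female
]


def chi_square_independence(table):
    """Return (Pearson chi-square statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


statistic, df, expected = chi_square_independence(observed)

# Exact upper-tail probability for a chi-square variable with df = 2 only.
p_value = exp(-statistic / 2)

print(f"chi-square = {statistic:.3f}, df = {df}, p = {p_value:.2e}")
for label, row in zip(("M", "F"), expected):
    print(label, [round(e, 1) for e in row])
```

A p-value printed by software as 0.000 is not zero; it is simply smaller than 0.0005, which
the exact calculation makes visible.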

From Table 5 above we can say that the difference between sex and age group with respect
to mortality rate is highly significant. Using the p-value approach, the null hypothesis is
rejected since the p-value, reported as 0.000 (i.e. p < 0.001), is less than the 0.05 level
of significance. We therefore conclude that the mortality rate depends on age and sex.

4.0 Conclusion and Recommendation
From the above results, it is seen that the conclusion reached using the p-value at the α
(alpha) level of testing is more accurate and more reliable than those reached using other
criteria.
The use of the p-value did not only show the significant difference but also revealed how
significant the observed difference is from a statistical point of view. We can deduce

from the p-value whether the observed difference is far into the critical region or merely
inside it, and with this information we can decide whether the data should be adjusted to
meet the desired accuracy or be set aside completely.

Finally, the p-value does not only offer experimenters or investigators a variety of
choices, but also eliminates the fear of imposing a pre-set level of significance, which
can result in a partial or inadequate conclusion whenever an experiment or trial is
conducted. The main goal of this research work is to compare the probability value
(p-value) of various hypotheses tested with the specified level of significance α (alpha)
at the 5% level, with a view to determining whether the null hypothesis should be rejected
or not.

One of the most important advantages of the p-value in hypothesis testing is its ability to
reduce doubt regarding the validity of conclusions drawn from the result of an experiment.
Since performing experiments costs money, time, energy and resources, utmost caution has to
be exercised before drawing conclusions. Unfortunately, many experimenters or investigators
are not aware of the use of p-values, and for this reason it is recommended that p-values
should always be used in hypothesis testing rather than the pre-set level of significance
that we are conversant with.

Moreover, since a p-value conveys much information about the weight of evidence against the
null hypothesis, and since rejecting a true null hypothesis H0 means committing a type I
error, we recommend the use of the p-value to take care of this error.
________________________________________________________________________

References
[1] Cobb, G. (1992). Teaching statistics. In L.A. Steen (Ed.), Heeding the call for change:
Suggestions for curricular action (pp. 3 - 43). Washington, D.C.: The Mathematical
Association of America.
[2] Devore, J., & Peck, R. (2006). Statistics: The exploration and analysis of data (5th ed.).
Belmont, CA: Brooks/Cole - Thomson Learning.
[3] Garfield, J., & Ahlgren, A. (1988). Difficulties in learning basic concepts in probability
and statistics: Implications for research. Journal for Research in Mathematics Education,
19(1), 44 - 63.
[4] Huberty, C.J. (1993). Historical origins of statistical testing practices: The treatment of
Fisher versus Neyman - Pearson views in textbooks. Journal of Experimental Education,
61(4), 317 - 333.
[5] Kahneman, D., & Tversky, A. (1982). Subjective probability: A judgement of
representativeness. In D. Kahneman, P. Slovic & A. Tversky (Eds.), Judgement under
uncertainty: Heuristics and biases (pp. 32 - 47). Cambridge: Cambridge University
Press.
[6] Konold, C. (1995). Issues in assessing conceptual understanding in probability and
statistics. Journal of Statistics Education, 3(1). Retrieved March 20, 2007, from
http://www.amstat.org/publications/jse/v3n1/konold.htm/
[7] Nickerson, R.S. (2000). Null hypothesis significance testing: A review of an old and
continuing controversy. Psychological Methods, 5(2), 241 - 301.
[8] Saldanha, L.A., & Thompson, P.W. (2006). Investigating statistical unusualness in the
context of a resampling activity: Students exploring connections between sampling
distribution and statistical inference. In A. Rossman & B. Chance (Eds.), Working
cooperatively in statistics education: Proceedings of the Seventh International