Statistical Hypothesis Testing

The document discusses statistical hypothesis testing. It begins by explaining that hypothesis testing involves setting up a null hypothesis and an alternative hypothesis, then collecting data to determine whether to reject or fail to reject the null hypothesis. It provides examples of different types of null and alternative hypotheses. It then goes into detail about how to set up a test criterion to determine whether to reject the null hypothesis based on a test statistic and critical value. It discusses types of errors that can occur and defines a type I error as rejecting the null hypothesis when it is actually true.

Uploaded by

Ajit Karnik
Copyright
© Attribution Non-Commercial (BY-NC)

Statistical Hypothesis Testing

Ajit Karnik Middlesex University Dubai

Testing Statistical Hypothesis


We use statistics to describe patterns in our data, and then we use statistical tests to decide whether the predictions of a hypothesis are supported or not. Statistics is an inductive process: we are trying to draw general conclusions based on a specific, limited sample. The process can be as shown on the next slide:

Maintained & Testable Hypotheses


A hypothesis is an assumption about the population. Typically, we make numerous assumptions, but not all of them are tested. Those that are not tested are called maintained hypotheses. One can never be sure that all the maintained hypotheses are valid; we merely believe that they are very likely to be valid. The assumptions that are tested are called testable hypotheses.

Testable hypotheses
Usually, the testable hypothesis is that a certain population parameter is equal to a given value (or that it does not exceed, or does not fall below, some value). This is the null hypothesis. Example: Gryffindors and Slytherins have equal IQs. This is a testable hypothesis stating that the population means of the two houses at Hogwarts are equal.

Testing Procedure
We would draw random samples from among the Gryffindors and from among the Slytherins and compare their respective sample means. Maintained hypotheses:
Any difference in IQs is due to membership of the Houses and not due to family income levels, gender or family background (Muggles vs non-Muggles!).
IQ level is normally distributed with known variance.
There is no prior presumption that either population mean is greater than the other.

The final assumption suggests an alternative hypothesis.

Alternative Hypothesis
Since the null hyp. is testable, there must be a counterproposition to it: the alternative hypothesis. The following are possible null and alternative hyps.:
(i) H0: $\mu = \mu_0$, HA: $\mu = \mu_A$
(ii) H0: $\mu = \mu_0$, HA: $\mu \neq \mu_0$
(iii) H0: $\mu \leq \mu_0$, HA: $\mu > \mu_0$

Precision in Formulating Hypotheses


Specific claims are easier to disprove than vague claims; hence, the null must be stated as specifically as possible. Given two hyps., one simple and the other composite, we choose the simple one as the null. Quite often the null represents the status quo e.g. the new drug leads to the same mean percentage of recoveries as the old drug. We might actually want to disprove this null in favour of an alternative that states that the new drug does better.

Disproving the Null


The null hyp. is a proposition that is considered valid unless evidence throws serious doubt on it. This is much like a court of law where the person on trial is presumed innocent unless evidence suggests beyond reasonable doubt that (s)he is guilty. It is up to the statistician to muster enough evidence to prove the null incorrect. Just as the accused is never proved innocent but only not guilty, likewise we never accept the null; we merely do not reject it.

The process in Words


Setting up the null and alternative hyps. is the first step. The next step is to devise a test criterion which allows us to decide whether the null is to be rejected or not on the basis of the available evidence. This criterion always has the same form: it defines a test statistic and a boundary for dividing the sample space into a region of rejection and a region of non-rejection.

The test statistic is a random variable whose value changes from sample to sample. The region of rejection (critical region) is a subset of the sample space such that if the test statistic falls in that region, the null is rejected. The region of non-rejection (acceptance region) is a subset of the sample space such that if the test statistic falls in that region, the null is not rejected. The boundary between the rejection and acceptance regions, called the critical value, is determined by:
prior information about the distribution of the test statistic;
the specification of the alternative hyp.;
considerations of the costs of arriving at a wrong conclusion.

Test Criterion
Suppose we focus on the mean of a variable X. Then, H0: $\mu = \mu_0$ and HA: $\mu \neq \mu_0$. Suppose also that the maintained hyps. are:
X is normally distributed;
the variance of X is $\sigma^2$ and is known.

The sample mean may be used to summarise the sample evidence about the population mean.

Test Criterion (contd.)


Criterion: If the value of X-bar ($\bar{X}$) is very different from $\mu_0$, reject the null; if not, do not reject the null. The question is: which values of X-bar are to be considered very different from $\mu_0$? To decide this, we consider the sampling distribution of X-bar. If X is normal with mean $\mu_0$ and variance $\sigma^2$, the sample mean X-bar will also be normal, with mean $\mu_0$ and variance $\sigma^2/n$. Since the normal distribution ranges from $-\infty$ to $+\infty$, any value of X-bar can be observed, whatever the population mean.

Test Criterion (contd.)


If, however, the true mean of the population is $\mu_0$, then the values of X-bar in intervals close to $\mu_0$ will occur with greater probability. Hence, "very different from $\mu_0$" would be those values of X-bar which, if $\mu_0$ is the true mean, would occur by chance only very rarely, say with a probability of 0.01 or 1%.

Test Criterion (contd.)


As per the alternative, HA: $\mu \neq \mu_0$. This means that if the null does not hold, the true mean may be on either side of $\mu_0$. Thus, values of X-bar that are very much larger than $\mu_0$, as well as those which are very much smaller, would constitute evidence against the null. The boundaries between the critical and acceptance regions will be such that we will reject the null if the value of X-bar turned out to be either so low or so high that its occurrence by chance would be very unlikely.

Test Criterion (contd.)


Since we have defined a rare event as one which occurs with a probability of 0.01, this probability must be shared between excessively low and excessively high values of X-bar. That is, the probability that X-bar is excessively low is 0.005 and the probability that it is excessively high is 0.005. Denote the value below which X-bar is excessively low by L, and the value above which X-bar is excessively high by H.

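Test Criterion: A Picture

[Figure: sampling distribution of X-bar, with an area of 0.005 in each tail (below L and above H) and an area of 0.99 between L and H.]

In the above picture:

$P(\bar{x} < L) = 0.005$
$P(\bar{x} > H) = 0.005$
$P(L \leq \bar{x} \leq H) = 0.99$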

Test Criterion (contd.)


We could consider X-bar as a test statistic and the interval from L to H as the acceptance region. Difficulty: we do not know the location (i.e. values) of L and H in the diagram of the normal distribution on the previous slide. However, we can determine the location of their counterparts if we use the standard normal distribution.

Standard Normal Distribution


Any normally distributed variable X can be converted to a standard normal variable (snv) Z if we know the mean ($\mu$) and the standard deviation ($\sigma$): $Z = (X - \mu)/\sigma$. The snv Z always has a mean of 0 and a variance of 1. The total area under the std. normal curve is equal to one, and probabilities for z lying between two values (say z1 and z2) are tabulated in the normal tables.

Standard Normal Curve

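[Figure: standard normal curve, with rejection regions of area 0.005 below zL and above zH, and an acceptance region of area 0.99 between zL and zH.]

Given that X-bar ~ $N(\mu_0, \sigma^2/n)$, the corresponding snv will be:

$$Z = \frac{\bar{X} - \mu_0}{\sqrt{\sigma^2/n}} = \frac{(\bar{X} - \mu_0)\sqrt{n}}{\sigma}$$

This will be the test statistic.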

Standard Normal Curve (contd.)


We need to find values of zL and zH that correspond to L and H. The normal probability tables give the values of zL and zH corresponding to $P(z_L \leq z \leq z_H) = 0.99$ as: zL = -2.575 and zH = +2.575.
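
The tabulated critical values can be cross-checked numerically. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist` class; the function name `two_tailed_critical_values` is illustrative and does not come from the slides. The computed value is about 2.576, which printed tables often give as 2.575 by interpolation.

```python
from statistics import NormalDist


def two_tailed_critical_values(alpha):
    """Return (zL, zH) such that P(zL <= Z <= zH) = 1 - alpha for a standard normal Z."""
    # Each tail receives half of the significance level.
    z_high = NormalDist().inv_cdf(1 - alpha / 2)
    return -z_high, z_high


z_low, z_high = two_tailed_critical_values(0.01)
print(f"zL = {z_low:.3f}, zH = {z_high:.3f}")
```

Calling the same function with `alpha=0.05` gives the boundaries for a 5% two-tailed test.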


Criterion for Rejecting/Not rejecting the Null


The criterion for rejecting or not rejecting the null hyp. is then:
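$$\text{reject } H_0 \text{ if } \frac{(\bar{x} - \mu_0)\sqrt{n}}{\sigma} < -2.575 \text{ or if } \frac{(\bar{x} - \mu_0)\sqrt{n}}{\sigma} > +2.575$$

$$\text{do not reject } H_0 \text{ if } -2.575 \leq \frac{(\bar{x} - \mu_0)\sqrt{n}}{\sigma} \leq +2.575$$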
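
The decision rule can be sketched in a few lines of code. This is only an illustration under the maintained hypotheses (normal X, known standard deviation): the function name `two_tailed_z_test`, the sample of IQ scores and the values `mu0=100` and `sigma=15` are all hypothetical and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist, mean


def two_tailed_z_test(sample, mu0, sigma, alpha=0.01):
    """Return (z, critical value, reject?) for H0: mu = mu0 against HA: mu != mu0."""
    n = len(sample)
    # Test statistic: (x-bar - mu0) * sqrt(n) / sigma
    z = (mean(sample) - mu0) * sqrt(n) / sigma
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Reject when z falls in either tail beyond the critical value.
    return z, z_crit, abs(z) > z_crit


# Hypothetical sample of nine IQ scores (illustrative data only).
sample = [104, 98, 110, 102, 95, 107, 101, 99, 105]
z, z_crit, reject = two_tailed_z_test(sample, mu0=100, sigma=15)
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

With these made-up numbers the statistic lies well inside the acceptance region, so the null is not rejected.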

Significance Level
Whether H0 is rejected or not depends on the decision to consider as "very different from $\mu_0$" only those values of X-bar that would occur by chance with a probability of 0.01. That is, if we drew a very large number of samples from the population with mean $\mu_0$, only 1% of the time would we get a value of X-bar that would lead to an incorrect rejection of H0. This probability is called the level of significance. Any other level of probability would do as well, e.g. 5%, which is also used quite often.


Two-tailed and One-tailed Tests


What we have done so far is a two-tailed test, i.e. our HA was $\mu \neq \mu_0$. Hence the rejection zone was located at either end (tail) of the std. normal curve. If our hypotheses were H0: $\mu \leq \mu_0$ and HA: $\mu > \mu_0$, the acceptance and rejection zones would be as on the next slide:

One-tailed Test
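[Figure, first panel (level of sig. = 1%): standard normal curve with an acceptance region of area 0.99 below zH and a rejection region of area 0.01 above zH.]

For a level of sig. of 1%, the critical value for zH can be found from the normal probability tables to be 2.327.

[Figure, second panel (level of sig. = 5%): standard normal curve with an acceptance region of area 0.95 below zH and a rejection region of area 0.05 above zH.]

For a level of sig. of 5%, the critical value for zH can be found from the normal probability tables to be 1.645.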


Errors in Hypothesis Testing


The criterion for rejecting / not rejecting the null on the basis of sample evidence does not guarantee arriving at a correct conclusion. Suppose as before we are testing a hyp. about a population mean. Two outcomes are possible: either the test statistic falls in the acceptance region or it does not.

Type I Error
Suppose the test statistic falls in the rejection region. The value of the test statistic is such that, if the null is true, the probability of this happening by chance would be very small, e.g. 1% or 5%. We therefore reject the null, but the null may in fact be true: if it is, we have rejected it incorrectly, something that happens only 1% (or 5%) of the time. This incorrect rejection of a true null is known as Type I Error. In a court of law where the null is that the accused is innocent, Type I Error would involve convicting an innocent person. The probability of committing Type I Error is given by the level of significance.


Type II Error
There is a second possibility: the test statistic falls in the acceptance zone. Now, we would not reject the null. But we cannot rule out the possibility that we may be accepting the null when it is, in fact, false. This is known as Type II Error. The parallel with a court trial would be letting a guilty person go free.

A Summary of Errors

                    True Situation
Verdict             H0 is true          H0 is false
Reject H0           Type I Error        Correct decision
Do not reject H0    Correct decision    Type II Error


Some More on Errors


Suppose we have the following situation:

H0: $\mu = \mu_0$ and HA: $\mu = \mu_A$, where $\mu_A > \mu_0$. The two hyps. can be identified with two competing populations which have the same variance but different means. To carry out the test we need the boundary between the critical and the acceptance regions. This will depend on the level of sig. and the alternative hyp.

Since HA is that $\mu = \mu_A$ and $\mu_A > \mu_0$, only high values of X-bar relative to $\mu_0$ will constitute evidence against H0. The appropriate test will be a one-tailed test, with the rejection region concentrated at the right-hand tail of the distribution. The test statistic will be as before:

$$Z = \frac{(\bar{X} - \mu_0)\sqrt{n}}{\sigma}$$

If the level of sig. = 0.05, the boundary value of zH is 1.645. Hence the acceptance region will be $z \leq 1.645$.


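The relationship between the distribution of X-bar (under the null) and the distribution of z is given below:

[Figure: two curves, the distribution of z with cut-off zH and the distribution of X-bar under the null with the corresponding cut-off x-barH; in each, an area of 0.95 lies below the cut-off and an area of 0.05 lies above it.]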

We also need to consider the sampling distribution of X-bar for samples from the population with mean A.

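[Figure: Distribution 1 (the sampling distribution of X-bar with mean $\mu_0$) and Distribution 2 (the sampling distribution of X-bar with mean $\mu_A$), with the boundary x-barH marked. The area under Distribution 1 to the right of x-barH is shaded black; the area under Distribution 2 to the left of x-barH is shaded orange.]

The probability of Type I Error is given by the area of the black region. Type II Error occurs when we accept H0 when in fact it is false, i.e. when x-bar actually belongs to Distribution 2. Suppose x-bar falls to the left of x-barH, in the orange region. In this case, the true mean is $\mu_A$ and x-bar follows Distribution 2, yet we do not reject H0. Hence, the probability of Type II Error is given by the area of the orange region.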


An important result of the previous slide is: by decreasing the probability of one type of error, we increase the probability of the other type of error. We can decrease P(Type I Error) by setting a very low level of significance, i.e. by making the black region on the previous slide smaller by shifting the boundary point x-barH to the right. However, the orange area will simultaneously increase, i.e. P(Type II Error) must rise. Shifting x-barH to the left will have the opposite effect: it lowers P(Type II Error) while raising P(Type I Error).
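
The trade-off can be made concrete with a small numerical sketch. The values used here (`mu0=100`, `mu_a=105`, `sigma=15`, `n=36`) and the function name `one_tailed_error_rates` are hypothetical and chosen only for illustration; the calculation assumes the one-tailed setup above, with a normal population and known variance.

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_error_rates(alpha, mu0, mu_a, sigma, n):
    """Return (x-barH, P(Type II Error)) for H0: mu = mu0 against HA: mu = mu_a > mu0."""
    standard_error = sigma / sqrt(n)
    # Boundary x-barH: the point with area alpha to its right under Distribution 1.
    x_bar_h = NormalDist(mu0, standard_error).inv_cdf(1 - alpha)
    # P(Type II Error): area to the left of x-barH under Distribution 2.
    beta = NormalDist(mu_a, standard_error).cdf(x_bar_h)
    return x_bar_h, beta


for alpha in (0.05, 0.01):
    x_bar_h, beta = one_tailed_error_rates(alpha, mu0=100, mu_a=105, sigma=15, n=36)
    print(f"alpha = {alpha:.2f}  x-barH = {x_bar_h:.2f}  P(Type II Error) = {beta:.3f}")
```

Lowering the level of significance from 5% to 1% moves x-barH to the right and, in this example, raises P(Type II Error) from roughly 0.36 to roughly 0.63.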

Nonparametric Test
The Mann-Whitney U-test is a non-parametric method which is used as an alternative to the two-sample Student's t-test. Usually this test is used to compare medians of non-normal distributions X and Y (the t-test is not applicable because X and Y are not normal). The test works correctly under the following conditions:
X and Y are continuous distributions (or discrete distributions that are well approximated by continuous distributions);
X and Y have the same shape, so that the only possible difference is their position (i.e. the value of the median);
the number of elements in each sample is not less than 5;
the samples are independent;
the scale of measurement is ordinal, interval or ratio (i.e. the test cannot be applied to nominal variables).


Ordinal Data
Resp. No.  Gender  Internet Usage    Resp. No.  Gender  Internet Usage    Resp. No.  Gender  Internet Usage
1          1       14                11         2       3                 21         1       9
2          2       2                 12         2       4                 22         1       5
3          2       3                 13         1       9                 23         2       2
4          2       3                 14         1       8                 24         1       15
5          1       13                15         1       5                 25         2       6
6          2       6                 16         2       3                 26         1       13
7          2       2                 17         1       9                 27         2       4
8          2       6                 18         1       4                 28         2       2
9          2       6                 19         1       14                29         1       4
10         1       15                20         2       6                 30         1       3

SPSS Output: Mann Whitney Test


Ranks
         gender   N    Mean Rank   Sum of Ranks
usage    1.00     15   20.93       314.00
         2.00     15   10.07       151.00
         Total    30

Test Statistics (b)
                                   usage
Mann-Whitney U                     31.000
Wilcoxon W                         151.000
Z                                  -3.406
Asymp. Sig. (2-tailed)             .001
Exact Sig. [2*(1-tailed Sig.)]     .000 (a)

a. Not corrected for ties.
b. Grouping Variable: gender
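
The rank sums, U and Z in the output can be reproduced by hand from the Internet usage data. The sketch below is one way to do it using only the Python standard library; the helper name `midranks` is illustrative. It assumes that tied values receive the average of the ranks they occupy and that Z uses the usual tie-corrected variance of U, which is the choice that matches the Z reported above.

```python
from collections import Counter
from math import sqrt

# Gender and Internet usage for respondents 1 to 30, copied from the data slide.
gender = [1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1,
          2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 2, 1, 1]
usage = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
         3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]


def midranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


ranks = midranks(usage)
rank_sum = {g: sum(r for r, gg in zip(ranks, gender) if gg == g) for g in (1, 2)}
n1, n2 = gender.count(1), gender.count(2)
n = n1 + n2

# U for each group is its rank sum minus the smallest rank sum it could have.
u1 = rank_sum[1] - n1 * (n1 + 1) / 2
u2 = rank_sum[2] - n2 * (n2 + 1) / 2
u = min(u1, u2)

# Normal approximation, with the variance of U corrected for ties.
tie_term = sum(t ** 3 - t for t in Counter(usage).values())
variance_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
z = (u - n1 * n2 / 2) / sqrt(variance_u)

print(rank_sum, u, round(z, 3))
```

The smaller rank sum (151) is the value reported as Wilcoxon W, and the mean ranks in the first table are the rank sums divided by 15.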


SPSS Output: t-test


Group Statistics
         gender   N    Mean     Std. Deviation   Std. Error Mean
usage    1.00     15   9.3333   4.40238          1.13669
         2.00     15   3.8667   1.68466          .43498

Independent Samples Test (usage)
                                                   Equal variances assumed   Equal variances not assumed
Levene's Test for Equality of Variances: F         15.507
Levene's Test for Equality of Variances: Sig.      .000
t-test for Equality of Means: t                    4.492                     4.492
t-test for Equality of Means: df                   28                        18.014
Sig. (2-tailed)                                    .000                      .000
Mean Difference                                    5.46667                   5.46667
Std. Error Difference                              1.21707                   1.21707
95% Confidence Interval of the Difference: Lower   2.97360                   2.90983
95% Confidence Interval of the Difference: Upper   7.95973                   8.02350

Of course, given the nature of the data, we should not use the t-test here.
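
For readers who want to see where the figures in the t-test output come from, the following sketch recomputes the t statistic and the two degrees of freedom from the same data, using only the Python standard library. It is purely a check on the arithmetic and does not change the point that the t-test is not the appropriate test for these data. The variable names are illustrative, and the unequal-variance degrees of freedom use the Welch-Satterthwaite formula, which is assumed here to be what the "Equal variances not assumed" row reports.

```python
from math import sqrt
from statistics import mean, variance

# Internet usage split by gender, taken from the data slide.
usage_gender_1 = [14, 13, 15, 9, 8, 5, 9, 4, 14, 9, 5, 15, 13, 4, 3]
usage_gender_2 = [2, 3, 3, 6, 2, 6, 6, 3, 4, 3, 6, 2, 6, 4, 2]

n1, n2 = len(usage_gender_1), len(usage_gender_2)
mean_difference = mean(usage_gender_1) - mean(usage_gender_2)
var1, var2 = variance(usage_gender_1), variance(usage_gender_2)  # sample variances

# Equal variances assumed: pooled variance, df = n1 + n2 - 2.
pooled_variance = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
t_pooled = mean_difference / sqrt(pooled_variance * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Equal variances not assumed: Welch-Satterthwaite degrees of freedom.
a, b = var1 / n1, var2 / n2
t_welch = mean_difference / sqrt(a + b)
df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

print(f"t = {t_pooled:.3f} (df = {df_pooled}), t = {t_welch:.3f} (df = {df_welch:.3f})")
```

Because the two groups have the same size, the two t statistics coincide; only the degrees of freedom differ.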

Correlation

Resp. No.  Attitude to Internet  Attitude to technology    Resp. No.  Attitude to Internet  Attitude to technology    Resp. No.  Attitude to Internet  Attitude to technology
1          7                     6                         11         4                     3                         21         4                     2
2          3                     3                         12         6                     4                         22         5                     4
3          4                     3                         13         6                     5                         23         4                     2
4          7                     5                         14         3                     2                         24         6                     6
5          7                     7                         15         5                     4                         25         5                     3
6          5                     4                         16         4                     3                         26         6                     6
7          4                     5                         17         5                     3                         27         5                     5
8          5                     4                         18         5                     4                         28         3                     2
9          6                     4                         19         6                     6                         29         5                     3
10         7                     6                         20         6                     4                         30         7                     5

Compute the correlation between Attitude to Internet and Attitude to Technology


Pearson's Correlation Coefficient

Correlations
                                     internetatt   techatt
internetatt   Pearson Correlation    1             .809**
              Sig. (2-tailed)                      .000
              N                      30            30
techatt       Pearson Correlation    .809**        1
              Sig. (2-tailed)        .000
              N                      30            30

**. Correlation is significant at the 0.01 level (2-tailed).

Even though the data are ordinal (ranked), we can compute the usual correlation coefficient. However, the correct measure is the rank correlation.

Spearman's (Rank) Correlation

Correlations (Spearman's rho)
                                         internetatt   techatt
internetatt   Correlation Coefficient    1.000         .818**
              Sig. (2-tailed)            .             .000
              N                          30            30
techatt       Correlation Coefficient    .818**        1.000
              Sig. (2-tailed)            .000          .
              N                          30            30

**. Correlation is significant at the 0.01 level (2-tailed).

The Spearman rank correlation coefficient is defined by:

$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

where d is the difference in the ranks of the corresponding variables and n is the number of pairs of observations.
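
Both correlation coefficients can be recomputed from the attitude data with the standard library alone. In the sketch below, the helper names `pearson` and `midranks` are illustrative. Spearman's coefficient is computed as the Pearson correlation of the ranks, with tied values given the average of their ranks; this is assumed to be how the .818 above was obtained, because the shortcut formula in terms of d is exact only when there are no ties, and these data contain many.

```python
from collections import Counter
from math import sqrt

# Attitude scores for respondents 1 to 30, copied from the data slide.
internetatt = [7, 3, 4, 7, 7, 5, 4, 5, 6, 7, 4, 6, 6, 3, 5,
               4, 5, 5, 6, 6, 4, 5, 4, 6, 5, 6, 5, 3, 5, 7]
techatt = [6, 3, 3, 5, 7, 4, 5, 4, 4, 6, 3, 4, 5, 2, 4,
           3, 3, 4, 6, 4, 2, 4, 2, 6, 3, 6, 5, 2, 3, 5]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def midranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


rank_x, rank_y = midranks(internetatt), midranks(techatt)
r_pearson = pearson(internetatt, techatt)
r_spearman = pearson(rank_x, rank_y)

# Shortcut formula 1 - 6*sum(d^2) / (n(n^2 - 1)), shown for comparison.
n = len(internetatt)
sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
r_shortcut = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

print(round(r_pearson, 3), round(r_spearman, 3), round(r_shortcut, 3))
```

On these data the shortcut formula gives about .827 rather than .818, which illustrates the effect of the ties.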


Some Commonly Used Statistical Tests


Normal theory based test                 Corresponding nonparametric test                Purpose of test
t test for independent samples           Mann-Whitney U test; Wilcoxon rank-sum test     Compares two independent samples
Paired t test                            Wilcoxon matched pairs signed-rank test         Examines a set of differences
Pearson correlation coefficient          Spearman rank correlation coefficient           Assesses the linear association between two variables
One way analysis of variance (F test)    Kruskal-Wallis analysis of variance by ranks    Compares three or more groups
Two way analysis of variance             Friedman two way analysis of variance           Compares groups classified by two different factors
