Chi Square Lesson
The Chi-square test is a statistical method used to assess the association between categorical variables by comparing observed frequencies with expected frequencies. It can be applied in one-variable tests (goodness-of-fit) and two-variable tests (test of independence), requiring that expected frequencies are above 5 and that categories are independent. The document outlines the process for conducting the test, including formulating hypotheses, calculating the Chi-square statistic, and making decisions based on critical values.
CHI-SQUARE TEST
The Chi-square (χ²) test is similar to tests of correlation
in that it measures the strength of associations between
variables. The Chi-square test can be used to test associations
in one or more groups and it does this by comparing actual
(observed) numbers in each group, with those that would
be expected according to theory or simply by chance. The
Chi-square test requires that the data be expressed as
frequencies, i.e. numbers in each category; this is nominal
level of measurement. It should be noted that in most cases
almost any data can be reduced to categorical or frequency
data, but it is not always wise to do this because information
is invariably lost in the process.
For example, the weight (interval measurement) of
individual members of a group may be different for each
member of that group, but the individuals could be assigned
to one of two categories (overweight and underweight), by
use of a suitable cutoff point in the data. The data would
then be categorical in that you would have the number of
people in the category “overweight”, and the number in the
category “underweight”; but in doing this the researcher has
lost a lot of information about the weight of individuals in
the group.
To be reliable, the Chi-square statistic also requires
that the expected frequency in each category should not
fall below 5; this can cause problems when the sample size
is relatively small.
Finally, the different categories used must be independent
of each other. This means that it must not be possible for
data to fall into more than one category.
For example, if the effectiveness of two different
treatments was being compared and some of the patients
were actually receiving both treatments, then Chi-square
could not be used for the analysis.
We will now consider a widely used non-parametric test,
Chi-square, which we can use with data at the nominal level,
that is, data that are classificatory.
For example, suppose we know the frequency with which entering
freshman computer science students, when required
to purchase a computer for their personal use, select
Macintosh computers, IBM computers, or some other
brand of computer. We want to know if there is a difference
among the frequencies with which these three brands
of computers are selected or if the students choose basically
equally among the three brands. This is a problem for which
we can use the chi-square statistic.
The chi-square statistic is used to compare the observed
frequency of some observation (such as frequency of buying
different brands of computers) with an expected frequency
(such as buying equal numbers of each brand of computer).
The comparison of observed and expected frequencies is
used to calculate the value of the chi-square statistic, which
in turn can be compared with the distribution of chi-square
to make an inference about a statistical problem.
The symbol for chi-square and the formula are as follows:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where:
O is the Observed frequency
E is the Expected frequency
The degrees of freedom (df) for the one-dimensional
chi-square statistic is:
df = C - 1
Where:
C is the number of categories or levels of the
independent variable.
ONE-VARIABLE CHI-SQUARE (GOODNESS-OF-FIT TEST)
We can use the Chi-square statistic to test the distribution
of measures over levels of a variable to indicate if the
distribution of measures is the same for all levels. This is the
first use of the one-variable chi-square test. This test is also
referred to as the goodness-of-fit test.
We return to the example already mentioned: the frequency
with which entering freshmen, when required to purchase
a computer for college use, select Macintosh computers,
IBM computers, or some other brand of computer. We want
to know if there is a significant difference among the
frequencies with which these three brands of computers
are selected or if the students select equally among the
three brands.
The data for 100 students is recorded in the table below
(the observed frequencies).
We have also indicated the expected frequency for each
category. Since there are 100 measures or observations and
there are three categories (Macintosh, IBM, and Other) we
would indicate the expected frequency for each category to
be 100/3 or 33.333. In the third column of the table we have
calculated, for each category, the squared difference between
the observed and expected frequencies, divided by the expected
frequency. The sum of the third column is the value of the
chi-square statistic.
Brand              | Observed Frequency (O) | Expected Frequency (E) | (O-E)²/E
IBM                | 47                     | 33.333                 | 5.604
Macintosh          | 36                     | 33.333                 | 0.213
Other              | 17                     | 33.333                 | 8.003
Total (chi-square) | 100                    |                        | 13.820
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
χ² = 5.604 + 0.213 + 8.003 = 13.820
df = C - 1 = 3 - 1 = 2
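The arithmetic in the table can be checked with a few lines of code. The following is a minimal sketch in Python that uses only the observed counts given above; the variable names are chosen here for illustration and are not part of the original lesson. Because the code keeps the expected frequency unrounded, an individual term may differ from the table in the third decimal place.

```python
# Observed purchases by brand, taken from the table above.
observed = {"IBM": 47, "Macintosh": 36, "Other": 17}

total = sum(observed.values())        # 100 students
expected = total / len(observed)      # equal split: 100 / 3 = 33.333...

# Contribution of each category: (O - E)^2 / E
contributions = {
    brand: (count - expected) ** 2 / expected
    for brand, count in observed.items()
}

chi_square = sum(contributions.values())
df = len(observed) - 1                # df = C - 1

for brand, value in contributions.items():
    print(f"{brand:<10} {value:.3f}")
print(f"chi-square = {chi_square:.3f}, df = {df}")
```

The same pattern works for unequal expected frequencies: replace the single `expected` value with one expected count per category.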
We can compare the obtained value of chi-square with the
critical value for the .05 level and with degrees of freedom of
2 obtained from the Appendix Table (Distribution of Chi Square).
Looking under the column for .05 and the row for df = 2 we
see that the critical value for chi-square is 5.991.
APPLICATION OF THE STATISTICAL TEST
We now have the information we need to complete the
six step process for testing statistical hypotheses for our
research problem.
1. State the null hypothesis and the alternative
hypothesis based on your research question.
H₀: O = E
H₁: O ≠ E
Note: Our null hypothesis, for the chi-square test, states
that there are no differences between the observed and the
expected frequencies. The alternate hypothesis states that
there are significant differences between the observed and
expected frequencies.
2. Set the alpha level.
α = .05
Note: As usual we will set our alpha level at .05; that is, we accept
5 chances in 100 of making a Type I error.
3. Calculate the value of the appropriate statistic.
Also indicate the degrees of freedom for the statistical
test if necessary.
χ² = 13.820
df = C - 1 = 2
4. Write the decision rule for rejecting the null
hypothesis.
Reject H₀ if χ² >= 5.991.
Note: To write the decision rule we had to know the
critical value for chi-square, with an alpha level of .05, and
2 degrees of freedom. We can do this by looking at the Appendix
Table and noting the tabled value for the column for the .05
level and the row for 2 df.
5. Write a summary statement based on the decision.
Reject H₀, p < .05
Note: Since our calculated value of χ² (13.820) is greater
than 5.991, we reject the null hypothesis and accept the
alternative hypothesis.
6. Write a statement of results.
There is a significant difference among the frequencies
with which students purchased three different brands of
computers.
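Steps 3 to 5 can be sketched in code. With 2 degrees of freedom the chi-square distribution has a simple closed form (its upper-tail probability is exp(-x/2)), so the critical value and p-value below need only the standard library. This shortcut is an assumption that holds for df = 2 only, and the function names are hypothetical.

```python
import math

def critical_value_df2(alpha):
    """Critical chi-square value for df = 2 only: solves exp(-x/2) = alpha."""
    return -2.0 * math.log(alpha)

def p_value_df2(chi_square):
    """Upper-tail probability of a chi-square value, for df = 2 only."""
    return math.exp(-chi_square / 2.0)

alpha = 0.05
chi_square = 13.820   # value calculated in step 3

critical = critical_value_df2(alpha)
p = p_value_df2(chi_square)
reject_null = chi_square >= critical   # decision rule from step 4

print(f"critical value = {critical:.3f}")
print(f"p = {p:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

For other degrees of freedom the critical value has no such closed form and is normally read from a table or computed numerically.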
TWO-VARIABLE CHI-SQUARE
(TEST OF INDEPENDENCE)
Now let us consider the case of the two-variable
chi-square test, also known as the test of independence.
For example, we may wish to know if there is a significant
difference in the frequencies with which males come
from small, medium, or large cities as contrasted with
females.
The two variables we are considering here are hometown
size (small, medium, or large) and gender (male or female).
Another way of putting our research question is: Is gender
independent of size of hometown?
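The data for 30 females and 6 males are in the following
table.
Frequency with which Males and Females come from
Small, Medium, and Large cities
       | Small (S) | Medium (M) | Large (L) | Total
Female | 10        | 14         | 6         | 30
Male   | 4         | 1          | 1         | 6
Total  | 14        | 15         | 7         | 36
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$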
Where:
O is the Observed frequency, and
E is the Expected frequency.
The degrees of freedom (df) for the two-dimensional
chi-square statistic is:
df = (C - 1)(R - 1)
Where:
C is the number of columns or levels of the first
variable
R is the number of rows or levels of the second
variable.
In the table above we have the observed frequencies (six
of them). Now we must calculate the expected frequency for
each of the six cells. For two-variable chi-square we find the
expected frequencies with the formula:
Expected Frequency for a Cell = (Column Total × Row
Total) / Grand Total
In the table above we can see that the Column Totals are
14 (small), 15 (medium), and 7 (large), while the Row Totals
are 30 (female) and 6 (male). The grand total is 36.
Using the formula we can thus find the expected
frequency (E) for each cell.
1. E for the S female cell is 14 × 30 / 36 = 11.667
2. E for the M female cell is 15 × 30 / 36 = 12.500
3. E for the L female cell is 7 × 30 / 36 = 5.833
4. E for the S male cell is 14 × 6 / 36 = 2.333
5. E for the M male cell is 15 × 6 / 36 = 2.500
6. E for the L male cell is 7 × 6 / 36 = 1.167
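The six expected frequencies can be reproduced with a short sketch that applies the formula (column total × row total / grand total) to every cell. The male row is taken here as 4, 1, 1, the only counts consistent with the stated row total of 6 and the column totals of 14, 15, and 7; the variable names are illustrative. The sketch also lists the cells whose expected frequency falls below 5, the threshold named earlier in the lesson as a requirement for a reliable statistic.

```python
# Observed counts: rows are gender, columns are hometown size (S, M, L).
observed = {
    "Female": [10, 14, 6],
    "Male": [4, 1, 1],
}
sizes = ["Small", "Medium", "Large"]

row_totals = {row: sum(counts) for row, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# E = column total * row total / grand total, for each cell
expected = {
    row: [col_total * row_totals[row] / grand_total for col_total in col_totals]
    for row in observed
}

# Cells whose expected frequency is below the threshold of 5
small_cells = [
    (row, size)
    for row in expected
    for size, e in zip(sizes, expected[row])
    if e < 5
]

for row, values in expected.items():
    print(row, [round(e, 3) for e in values])
print("Expected frequency below 5:", small_cells)
```

In this data set all three male cells fall below 5, so the result of the test should be read with caution.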
We can put these expected frequencies in our table and
also include the values for (O - E)²/E. The sum of all these will
of course be the value of chi-square.
       | Small (S)              | Medium (M)             | Large (L)             | Total
       | O  | E      | (O-E)²/E | O  | E      | (O-E)²/E | O | E     | (O-E)²/E |
Female | 10 | 11.667 | 0.238    | 14 | 12.500 | 0.180    | 6 | 5.833 | 0.005    | 30
Male   | 4  | 2.333  | 1.191    | 1  | 2.500  | 0.900    | 1 | 1.167 | 0.024    | 6
Total  | 14 |        |          | 15 |        |          | 7 |       |          | 36
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
χ² = 0.238 + 0.180 + 0.005 + 1.191 + 0.900 + 0.024 =
2.538
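The sum can be verified with a small sketch that computes (O - E)²/E for every cell of the two-way table. It assumes the male row is 4, 1, 1, which is the only set of counts consistent with the row total of 6 and the column totals of 14, 15, and 7. Because the code does not round each term before adding, it gives about 2.537 rather than 2.538.

```python
# Observed counts: rows are gender, columns are hometown size (S, M, L).
observed = [
    [10, 14, 6],   # Female
    [4, 1, 1],     # Male
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for r, row in enumerate(observed):
    for c, o in enumerate(row):
        e = col_totals[c] * row_totals[r] / grand_total   # expected frequency
        chi_square += (o - e) ** 2 / e                    # cell contribution

# df = (C - 1)(R - 1)
df = (len(col_totals) - 1) * (len(row_totals) - 1)

print(f"chi-square = {chi_square:.3f}, df = {df}")
```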
df = (C - 1)(R - 1) = (3 - 1)(2 - 1) = (2)(1) = 2
APPLICATION OF THE STATISTICAL TEST
We now have the information we need to complete the
six step process for testing statistical hypotheses for our
research problem.
1. State the null hypothesis and the alternative
hypothesis based on your research question.
H₀: O = E
H₁: O ≠ E
2. Set the alpha level.
α = .05
3. Calculate the value of the appropriate statistic. Also
indicate the degrees of freedom for the statistical test if
necessary.
χ² = 2.538
df = (C - 1)(R - 1) = (2)(1) = 2
4. Write the decision rule for rejecting the null
hypothesis.
Reject H₀ if χ² >= 5.991.
Note: To write the decision rule we had to know the
critical value for chi-square, with an alpha level of .05, and
2 degrees of freedom. We can do this by looking at the Appendix
Table and noting the tabled value for the column for the .05
level and the row for 2 df.
5. Write a summary statement based on the decision.
Fail to reject H₀
Note: Since our calculated value of χ² (2.538) is not
greater than 5.991, we fail to reject the null hypothesis and
are unable to accept the alternative hypothesis.
6. Write a statement of results.
There is not a significant difference in the frequencies
with which males come from small, medium, or large towns
as compared with females. The data give no evidence that
hometown size depends on gender.
Critical Chi-Square Values Table
df\area | .050     | .025     | .010
1       | 3.84146  | 5.02389  | 6.63490
2       | 5.99146  | 7.37776  | 9.21034
3       | 7.81473  | 9.34840  | 11.34487
4       | 9.48773  | 11.14329 | 13.27670
5       | 11.07050 | 12.83250 | 15.08627
6       | 12.59159 | 14.44938 | 16.81189
7       | 14.06714 | 16.01276 | 18.47531
8       | 15.50731 | 17.53455 | 20.09024
9       | 16.91898 | 19.02277 | 21.66599
10      | 18.30704 | 20.48318 | 23.20925
11      | 19.67514 | 21.92005 | 24.72497
12      | 21.02607 | 23.33666 | 26.21697
13      | 22.36203 | 24.73560 | 27.68825
14      | 23.68479 | 26.11895 | 29.14124
15      | 24.99579 | 27.48839 | 30.57791
16      | 26.29623 | 28.84535 | 31.99993
17      | 27.58711 | 30.19101 | 33.40866
18      | 28.86930 | 31.52638 | 34.80531
19      | 30.14353 | 32.85233 | 36.19087
20      | 31.41043 | 34.16961 | 37.56623
21      | 32.67057 | 35.47888 | 38.93217
22      | 33.92444 | 36.78071 | 40.28936
23      | 35.17246 | 38.07563 | 41.63840
24      | 36.41503 | 39.36408 | 42.97982
25      | 37.65248 | 40.64647 | 44.31410
26      | 38.88514 | 41.92317 | 45.64168
27      | 40.11327 | 43.19451 | 46.96294
28      | 41.33714 | 44.46079 | 48.27824
29      | 42.55697 | 45.72229 | 49.58788
30      | 43.77297 | 46.97924 | 50.89218
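Entries in this table can be recomputed rather than looked up. The sketch below is one possible approach, not the method used to produce the table: it evaluates the chi-square cumulative distribution with the standard series for the regularized lower incomplete gamma function and then finds the critical value by bisection. The function names `chi2_cdf` and `chi2_critical` and the search bound of 200 are assumptions made for this illustration.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom,
    using the series for the regularized lower incomplete gamma P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0
    total = 1.0
    n = 0
    # Sum z^n / ((a+1)(a+2)...(a+n)) until the terms become negligible.
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    # Prefactor z^a * e^(-z) / Gamma(a + 1), computed in log space.
    return total * math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))

def chi2_critical(alpha, df, upper=200.0):
    """Value c with P(X >= c) = alpha, found by bisection on [0, upper]."""
    lo, hi = 0.0, upper
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

for df in (1, 2, 7, 30):
    row = [chi2_critical(alpha, df) for alpha in (0.050, 0.025, 0.010)]
    print(df, " ".join(f"{value:.5f}" for value in row))
```

The bisection assumes the critical value lies below the chosen upper bound, which is the case for every entry in the table above.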