Statistical Tests
Statistical tests are a way of mathematically determining whether two sets of data are
significantly different from each other. To do this, statistical tests use several statistical
measures, such as the mean, standard deviation, and coefficient of variation. Once the
statistical measures are calculated, the statistical test will then compare them to a set of
predetermined criteria. If the data meet the criteria, the statistical test will conclude that there
is a significant difference between the two sets of data.
There are various statistical tests that can be used, depending on the type of data being
analysed. However, some of the most common statistical tests are t-tests, chi-squared tests,
and ANOVA tests.
When working with statistical data, several tools can be used to analyze the
information.
Multiple linear regression measures the relationship between a quantitative
dependent variable and two or more independent variables, again using a
straight line.
Logistic regression predicts a categorical outcome, so it is used for
classification problems. It can also help identify data anomalies, which could
point to fraud.
Comparison tests determine the differences among the group means. They can be
used to test the effect of a categorical variable on the mean value of other
characteristics.
T-test
One of the most common statistical tests is the t-test, which is used to compare the
means of two groups (e.g. the average heights of men and women). You can use
the t-test when you are not aware of the population parameters (mean and standard
deviation).
Paired T-test
It tests the difference between two variables from the same population (e.g. pre-
and post-test scores). For example, measuring the performance score of a trainee
before and after completing the training program.
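The paired comparison described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the trainee scores are hypothetical, and the function name `paired_t_statistic` is an assumption made for this example. It returns the t statistic and degrees of freedom only; turning them into a p-value needs a t-distribution table or a statistics package.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired t-test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # Work on the per-subject differences, not on the two groups separately.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical trainee scores before and after a training program.
before = [60, 72, 58, 65, 70]
after = [68, 75, 63, 70, 78]
t_stat, dof = paired_t_statistic(before, after)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```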
Independent T-test
The independent t-test is also called the two-sample t-test. It is a statistical test that
determines whether there is a statistically significant difference between the means
in two unrelated groups. For example, comparing cancer patients and pregnant
women in a population.
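As a rough sketch of the calculation behind the independent t-test, the example below computes the t statistic for two unrelated groups. It assumes the classic pooled-variance (equal variances) form of the test, which is only one variant; the height values and the function name `independent_t_statistic` are hypothetical.

```python
import math
import statistics

def independent_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) using a pooled variance estimate."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, n_a + n_b - 2

# Hypothetical heights (cm) for two unrelated groups.
heights_a = [170, 175, 180, 185, 190]
heights_b = [160, 165, 170, 175, 180]
t_stat, dof = independent_t_statistic(heights_a, heights_b)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```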
One-sample T-test
In this test, the mean of a single group is compared with a given mean. For
example, determining whether sales have increased or decreased relative to a
given average sales figure.
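Comparing a single group mean with a given mean can be sketched as follows. The sales figures and the reference value of 14 are made up for illustration, and `one_sample_t_statistic` is a hypothetical helper name.

```python
import math
import statistics

def one_sample_t_statistic(sample, reference_mean):
    """Return (t, degrees_of_freedom) for one sample against a given mean."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - reference_mean) / standard_error
    return t, n - 1

# Hypothetical sales figures compared with a given average of 14.
sales = [12, 14, 16, 18, 20]
t_stat, dof = one_sample_t_statistic(sales, reference_mean=14)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

A positive t indicates the sample mean lies above the given mean, a negative t that it lies below.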
ANOVA
MANOVA
Z-test
It is a statistical test that determines whether two population means are different,
provided the variances are known and the sample size is large.
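Because the normal distribution is available in the Python standard library, a two-sample Z-test can be sketched end to end, including a p-value. The summary statistics below are hypothetical, and `two_sample_z_test` is a name chosen for this example; the sketch assumes the population variances are known and the samples are large, as stated above.

```python
import math
from statistics import NormalDist

def two_sample_z_test(mean_a, mean_b, var_a, var_b, n_a, n_b):
    """Return (z, two_tailed_p) for two means with known variances."""
    standard_error = math.sqrt(var_a / n_a + var_b / n_b)
    z = (mean_a - mean_b) / standard_error
    # Two-tailed p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical summary statistics for two large samples.
z, p_value = two_sample_z_test(
    mean_a=52, mean_b=50, var_a=16, var_b=9, n_a=100, n_b=100
)
print(f"z = {z:.2f}, p = {p_value:.6f}")
```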
Correlation tests check if the variables are related without hypothesizing a cause-
and-effect relationship. These tests can be used to check if the two variables you
want to use in a multiple regression test are correlated.
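One common way to check whether two variables are related is the Pearson correlation coefficient. The text above does not name a specific correlation test, so Pearson's r is an assumption here, and the data points are hypothetical.

```python
import math
import statistics

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys):
        raise ValueError("both variables need the same number of observations")
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance_sum / (spread_x * spread_y)

# Hypothetical values for two candidate regression variables.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(f"r = {r:.3f}")
```

Values of r near 1 or -1 suggest the two variables are strongly correlated; values near 0 suggest little linear relationship.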
Non-parametric tests do not make as many assumptions about the data compared
to parametric tests. They are useful when one or more of the common statistical
assumptions are violated. However, the inferences they support are not as strong
as those from parametric tests.
Chi-square test
1. Research Question
The decision for a statistical test depends on the research question that needs to be
answered. Additionally, the research questions will help you formulate the data
structure and research design.
After defining the research question, you could develop a null hypothesis. A null
hypothesis states that there is no statistically significant effect or difference
in the observations.
Before performing the study protocol, a level of significance is specified. The level
of significance determines the statistical importance, which defines the acceptance
or rejection of the null hypothesis.
You must decide if your study should be a one-tailed or two-tailed test. If you have
clear evidence that the statistics are leading in one direction, you must perform a
one-tailed test. However, if there is no particular direction of the expected
difference, you must perform a two-tailed test.
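The choice between one-tailed and two-tailed tests changes the p-value that is compared with the level of significance. The sketch below assumes a z statistic of 1.8 and a significance level of 0.05, both hypothetical, to show how the same result can lead to different decisions.

```python
from statistics import NormalDist

def p_values_from_z(z):
    """Return (upper_one_tailed_p, two_tailed_p) for a z statistic."""
    standard_normal = NormalDist()
    upper_tail = 1 - standard_normal.cdf(z)
    two_tailed = 2 * (1 - standard_normal.cdf(abs(z)))
    return upper_tail, two_tailed

alpha = 0.05  # assumed level of significance
one_tailed_p, two_tailed_p = p_values_from_z(1.8)

print(f"one-tailed p = {one_tailed_p:.4f}, reject null: {one_tailed_p < alpha}")
print(f"two-tailed p = {two_tailed_p:.4f}, reject null: {two_tailed_p < alpha}")
```

With these assumed numbers the one-tailed test rejects the null hypothesis while the two-tailed test does not, which is why the direction should be fixed before looking at the data.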
Statistical tests and procedures are divided according to the number of variables
that they are designed to analyze. Therefore, while choosing the test, you must
consider how many variables you want to analyze.
6. Type of Data
A paired design includes comparison studies where the two population means are
compared when the two samples depend on each other. In an unpaired or
independent study design, the results of the two samples are grouped and then
compared.
Now that you know the seven steps for choosing a statistical test, you are on your
way to finding the right test for your research question. Each situation is unique; it
is important to understand all of your options and make an informed decision.
Contingency Table
Construct and interpret contingency tables.
A contingency table provides a way of displaying data that can facilitate calculating
probabilities. The table can be used to describe the sample space of an experiment.
Contingency tables allow us to break down a sample space when two variables are involved.
                         Speeding violation    No speeding violation    Total
                         in the last year      in the last year
Cell phone user                  25                    280               305
Not a cell phone user            45                    405               450
Total                            70                    685               755
The left-side column lists all of the values for one of the
variables. In the table shown above, the left-side column
shows the variable about whether or not someone uses a
cell phone while driving.
The top row lists all of the values for the other variable. In
the table shown above, the top row shows the variable about
whether or not someone had a speeding violation in the last
year.
In the body of the table, the cells contain the number of
outcomes that fall into both of the categories corresponding
to the intersecting row and column. In the table shown
above, the number of 25 at the intersection of the “cell
phone user” row and “speeding violation in the last year”
column tells us that there are 25 people who have both of
these characteristics.
The bottom row gives the totals in each column. In the table
shown above, the number 685 in the bottom of the “no
speeding violation in the last year” tells us that there are
685 people who did not have a speeding violation in the last
year.
The right-side column gives the totals in each row. In the
table shown above, the number 305 in the right side of the
“cell phone user” row tells us that there are 305 people who
use cell phones while driving.
The number in the bottom right corner is the size of the
sample space. In the table shown above, the number in the
bottom right corner is 755, which tells us that there are 755
people in the sample space.
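The row totals, column totals, and sample-space size described above can all be recomputed from the four body cells. The following is a small sketch using the cell phone and speeding counts from the table; the dictionary layout and variable names are choices made for this example.

```python
# Body cells of the contingency table: counts[row][column].
counts = {
    "cell phone user": {"violation": 25, "no violation": 280},
    "not a cell phone user": {"violation": 45, "no violation": 405},
}

# Right-side column: one total per row.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}

# Bottom row: one total per column.
column_totals = {}
for cells in counts.values():
    for column, value in cells.items():
        column_totals[column] = column_totals.get(column, 0) + value

# Bottom right corner: size of the sample space.
grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```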
Example 01
                         Speeding violation    No speeding violation    Total
                         in the last year      in the last year
Cell phone user                  25                    280               305
Not a cell phone user            45                    405               450
Solution:
3. Probability = (number of violations and not cell phone users) / (total number in
study) = 45/755
4. Probability = (number of cell phone users and no violations) / (total number in
study) = 280/755
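Each of these probabilities is one body cell divided by the total number in the study. A minimal sketch of that calculation follows, using the counts from the table; `joint_probability` is a hypothetical helper name, and exact fractions are used so that no rounding is introduced.

```python
from fractions import Fraction

# Body cells keyed by (row, column).
table = {
    ("cell phone user", "violation"): 25,
    ("cell phone user", "no violation"): 280,
    ("not a cell phone user", "violation"): 45,
    ("not a cell phone user", "no violation"): 405,
}

def joint_probability(table, row, column):
    """Probability that a randomly chosen person falls in one cell."""
    total = sum(table.values())
    return Fraction(table[(row, column)], total)

p_violation_not_user = joint_probability(table, "not a cell phone user", "violation")
p_user_no_violation = joint_probability(table, "cell phone user", "no violation")

print(p_violation_not_user, round(float(p_violation_not_user), 4))
print(p_user_no_violation, round(float(p_user_no_violation), 4))
```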
Example 02
This table shows the number of athletes who stretch before exercising and
how many had injuries within the past year.
                     Injury in      No injury in     Total
                     last year      last year
Does not stretch        231            219            450
Example 03
The table below shows a random sample of 100 hikers broken
down by gender and the areas of hiking they prefer.
Gender    The Coastline    Near Lakes and Streams    On Mountain Peaks    Total
Female         18                   16                      __              45
Male           __                   __                      14              55
Total          __                   41                      __              __
1. Fill in the missing values in the table
2. What is the probability that a randomly selected hiker is
female?
3. What is the probability that a randomly selected hiker
prefers to hike on the coast?
4. What is the probability that a randomly selected hiker is
male and prefers to hike near lakes and streams?
5. What is the probability that a randomly selected hiker is
female and prefers to hike on mountains?
Solution:
1.
Gender    The Coastline    Near Lakes and Streams    On Mountain Peaks    Total
Female         18                   16                      11              45
Male           16                   25                      14              55
Total          34                   41                      25             100
2. Probability = 45/100 = 0.45
3. Probability = 34/100 = 0.34
4. Probability = 25/100 = 0.25
5. Probability = 11/100 = 0.11
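The missing values in this example follow from the fact that every row and column must add up to its total. The sketch below retraces that arithmetic from the six values given in the question; the variable names are chosen for this illustration.

```python
# Values given in the question.
female_coast, female_lakes, female_total = 18, 16, 45
male_mountains, male_total = 14, 55
lakes_total = 41

# Each missing cell is a total minus the known cells in its row or column.
female_mountains = female_total - female_coast - female_lakes
male_lakes = lakes_total - female_lakes
male_coast = male_total - male_lakes - male_mountains
coast_total = female_coast + male_coast
mountains_total = female_mountains + male_mountains
grand_total = female_total + male_total

# Probabilities asked for in questions 2 to 5.
p_female = female_total / grand_total
p_coast = coast_total / grand_total
p_male_and_lakes = male_lakes / grand_total
p_female_and_mountains = female_mountains / grand_total

print(female_mountains, male_coast, male_lakes)
print(coast_total, mountains_total, grand_total)
print(p_female, p_coast, p_male_and_lakes, p_female_and_mountains)
```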
Example 04:
The table below relates the weights and heights of a group of
individuals participating in an observational study.
Weight/Height    Tall    Medium    Short    Totals
Obese             18       28        14
Normal            20       51        28
Underweight       12       25         9
Totals
2. Find the probability that a randomly chosen individual from
this group is tall.
3. Find the probability that a randomly chosen individual from
this group is normal.
4. Find the probability that a randomly chosen individual from
this group is obese and short.
5. Find the probability that a randomly chosen individual from
this group is underweight and medium.
6.
Weight/Height    Tall    Medium    Short    Totals
Obese             18       28        14       60
Normal            20       51        28       99
Underweight       12       25         9       46
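The totals and the probabilities asked for in questions 2 to 5 can be derived directly from the nine body cells. The sketch below does this with the counts from the table; the data layout and variable names are assumptions made for this example.

```python
# Body cells: counts[weight][height].
counts = {
    "obese": {"tall": 18, "medium": 28, "short": 14},
    "normal": {"tall": 20, "medium": 51, "short": 28},
    "underweight": {"tall": 12, "medium": 25, "short": 9},
}

row_totals = {weight: sum(cells.values()) for weight, cells in counts.items()}
column_totals = {
    height: sum(cells[height] for cells in counts.values())
    for height in ("tall", "medium", "short")
}
grand_total = sum(row_totals.values())

# Marginal probabilities use a row or column total.
p_tall = column_totals["tall"] / grand_total
p_normal = row_totals["normal"] / grand_total
# Joint probabilities use a single body cell.
p_obese_and_short = counts["obese"]["short"] / grand_total
p_underweight_and_medium = counts["underweight"]["medium"] / grand_total

print(row_totals, column_totals, grand_total)
print(round(p_tall, 4), round(p_normal, 4))
print(round(p_obese_and_short, 4), round(p_underweight_and_medium, 4))
```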
(https://ecampusontario.pressbooks.pub/introstats/chapter/3-3-contingency-tables/)