Statistical Tests

Statistical tests are a mathematical way of deciding whether two sets of data are
significantly different from each other. To do this, a statistical test draws on statistical
measures such as the mean, standard deviation, and coefficient of variation. Once these
measures are calculated, the test combines them into a test statistic and compares it with a
predetermined criterion, such as a critical value at a chosen level of significance. If the
criterion is met, the test concludes that there is a significant difference between the two
sets of data.
There are various statistical tests that can be used, depending on the type of data being
analysed. However, some of the most common statistical tests are t-tests, chi-squared tests,
and ANOVA tests.
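As a minimal sketch of the measures named above, the following Python snippet computes the mean, standard deviation, and coefficient of variation using only the standard library. The sample values are hypothetical and chosen purely for illustration.

```python
import statistics

# Hypothetical sample values, invented for illustration only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(sample)        # arithmetic mean
std_dev = statistics.stdev(sample)    # sample standard deviation (n - 1 denominator)
coeff_var = std_dev / mean            # coefficient of variation (unitless ratio)

print(f"mean = {mean}")
print(f"standard deviation = {std_dev:.3f}")
print(f"coefficient of variation = {coeff_var:.3f}")
```

The coefficient of variation expresses the spread relative to the mean, which makes it convenient for comparing data sets measured on different scales.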
Types of Statistical Tests
When working with statistical data, several tools can be used to analyze the
information.
1. Parametric Statistical Tests

Parametric statistical tests have stricter requirements than non-parametric tests,
and they can draw stronger inferences from the data. They can only be conducted
with data that satisfy the common assumptions of statistical tests, such as
normally distributed data, equal variances across groups, and independent
observations. Some common types of parametric tests are regression tests,
comparison tests, and correlation tests.
1.1. Regression Tests
Regression tests look for cause-and-effect relationships. They can be used to
estimate the effect of one or more continuous variables on another variable.
• Simple linear regression describes the relationship between a dependent and an
independent variable using a straight line. It tests the relationship between two
quantitative variables.
• Multiple linear regression measures the relationship between a quantitative
dependent variable and two or more independent variables, again using a
linear model.
• Logistic regression models a categorical (typically binary) outcome, so it is used
for prediction and classification. For example, it can help flag anomalous records
that may indicate fraud.
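The straight-line fit used in simple linear regression can be sketched with the ordinary least-squares formulas. The function name fit_line and the data below are hypothetical; the data were chosen to lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: scores that equal 50 + 2 * hours exactly.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
print(slope, intercept)
```

With real data the points scatter around the line, and a regression test then asks whether the estimated slope differs significantly from zero.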
1.2. Comparison Tests
Comparison tests determine the differences among the group means. They can be
used to test the effect of a categorical variable on the mean value of other
characteristics.
• T-test
One of the most common statistical tests is the t-test, which is used to compare the
means of two groups (e.g. the average heights of men and women). You can use
the t-test when the population parameters (mean and standard deviation) are not
known.
• Paired T-test
It tests the difference between two measurements taken on the same subjects (e.g.
pre- and post-test scores). For example, measuring the performance score of each
trainee before and after the completion of a training program.
• Independent T-test
The independent t-test is also called the two-sample t-test. It is a statistical test that
determines whether there is a statistically significant difference between the means
of two unrelated groups. For example, comparing the mean of a measurement
between cancer patients and pregnant women in a population.
• One Sample T-test
In this test, the mean of a single group is compared with a given mean. For
example, determining whether current sales are above or below a given average
sales figure.
• ANOVA
ANOVA (Analysis of Variance) analyzes the difference between the means of
more than two groups. A one-way ANOVA tests how one factor affects a response
variable, whereas a two-way ANOVA tests the effects of two factors. In both cases,
the impact of the factors is assessed by comparing the means of different
samples.
• MANOVA
MANOVA, which stands for Multivariate Analysis of Variance, provides
regression analysis and analysis of variance for multiple dependent variables by
one or more factor variables or covariates. It examines the statistical
difference in two or more continuous dependent variables across an independent
grouping variable.
• Z-test
It is a statistical test that determines whether two population means are different,
provided the variances are known and the sample size is large.
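To make the independent (two-sample) t-test more concrete, here is a minimal sketch of its test statistic. It assumes the pooled-variance form, which treats the two groups as having equal variances; the function name independent_t and the data are hypothetical.

```python
import math
import statistics

def independent_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for the pooled-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)   # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_err
    return t, n_a + n_b - 2

# Hypothetical measurements from two unrelated groups.
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]

t_stat, df = independent_t(group_a, group_b)
print(t_stat, df)
```

For these values the statistic is 4.0 with 8 degrees of freedom. The two-tailed critical value at the 5% level in a standard t table is about 2.306, so this difference in means would be judged significant.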
1.3. Correlation Tests
Correlation tests check if the variables are related without hypothesizing a cause-
and-effect relationship. These tests can be used to check if the two variables you
want to use in a multiple regression test are correlated.
• Pearson Correlation Coefficient
It is a common way of measuring linear correlation. The coefficient is a
number between -1 and 1 that describes the strength and direction of the
relationship between two variables. A positive value means that the two variables
tend to change in the same direction; a negative value means that they tend to
change in opposite directions.
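The Pearson coefficient can be computed directly from its definition, as the following sketch shows. The function name pearson_r and the data are hypothetical; the two example series are perfectly linear, so they give the extreme values of the coefficient.

```python
import math

def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical data: one series rises with x, the other falls with x.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

r_up = pearson_r(x, rising)
r_down = pearson_r(x, falling)
print(r_up, r_down)
```

Values close to 0 would indicate little or no linear relationship between the two variables.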
2. Non-parametric Statistical Tests
Non-parametric tests make fewer assumptions about the data than parametric
tests. They are useful when one or more of the common statistical
assumptions are violated. However, the inferences they support are not as strong
as those of parametric tests.
• Chi-square test
The chi-square test compares two categorical variables. Calculating
the chi-square statistic and comparing it with a critical value from the chi-square
distribution allows you to assess whether the observed frequencies are
significantly different from the expected frequencies.
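The comparison of observed and expected frequencies can be sketched as follows. The counts are hypothetical, the function name chi_square_statistic is an illustrative choice, and the critical value 3.841 is the standard table value for 1 degree of freedom at the 5% level.

```python
def chi_square_statistic(observed):
    """Return the chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    return chi2

# Hypothetical 2 x 2 table of counts (two categorical variables).
observed = [[10, 20],
            [20, 10]]

chi2 = chi_square_statistic(observed)
CRITICAL_5_PERCENT_DF1 = 3.841   # chi-square table value, df = 1, alpha = 0.05
significant = chi2 > CRITICAL_5_PERCENT_DF1
print(round(chi2, 3), significant)
```

Here every expected count is 15, so the statistic is 4 x 25/15, or about 6.667, which exceeds the critical value.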
7 Essential Ways to Choose the Right Statistical Test
1. Research Question
The decision for a statistical test depends on the research question that needs to be
answered. Additionally, the research questions will help you formulate the data
structure and research design.
2. Formulation of Null Hypothesis
After defining the research question, you can develop a null hypothesis. A null
hypothesis states that there is no real effect or difference in the observations, so
any apparent difference is due to chance.
3. Level of Significance in Study Protocol
Before performing the study protocol, a level of significance is specified. The level
of significance (commonly 0.05) is the threshold used to decide whether the null
hypothesis is rejected or not.
4. The Decision Between One-tailed and Two-tailed
You must decide whether your study calls for a one-tailed or a two-tailed test. If
you have clear evidence that the effect can lie in only one direction, you may
perform a one-tailed test. However, if there is no particular direction of the
expected difference, you must perform a two-tailed test.
5. The Number of Variables to Be Analyzed
Statistical tests and procedures are divided according to the number of variables
that they are designed to analyze. Therefore, when choosing a test, you must
consider how many variables you want to analyze.
6. Type of Data
It is important to define whether your data are continuous, categorical, or binary. In
the case of continuous data, you must also check whether the data are normally
distributed or skewed, to further define which statistical test to consider.
7. Paired and Unpaired Study Designs
In a paired design, the two samples depend on each other (for example, the same
subjects measured twice), and the two population means are compared using the
paired observations. In an unpaired or independent study design, the two samples
are unrelated, so the results of each sample are summarized separately and then
compared.
Now that you know the seven steps for choosing a statistical test, you are on your
way to finding the right test for your research question. Each situation is unique; it
is important to understand all of your options and make an informed decision.
Remember to consult your principal investigator, a statistician, or the
documentation of your statistical software if you are unsure which test to choose.
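The effect of the level of significance (step 3) and of the one-tailed versus two-tailed choice (step 4) can be illustrated with a z statistic. This is a sketch under stated assumptions: the z value of 1.8 is hypothetical, the one-tailed p-value assumes the observed effect lies in the hypothesized direction, and the function name z_p_values is an illustrative choice.

```python
from statistics import NormalDist

def z_p_values(z):
    """Return (one_tailed, two_tailed) p-values for a z statistic."""
    # Area in the upper tail of the standard normal distribution beyond |z|.
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return upper_tail, 2 * upper_tail

alpha = 0.05                      # level of significance chosen before the study
one_tailed, two_tailed = z_p_values(1.8)

reject_one_tailed = one_tailed < alpha
reject_two_tailed = two_tailed < alpha
print(round(one_tailed, 4), reject_one_tailed)
print(round(two_tailed, 4), reject_two_tailed)
```

With the same data, the one-tailed test rejects the null hypothesis while the two-tailed test does not, which is why the direction of the test should be fixed before looking at the results.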
Contingency Table
In statistics, a contingency table (also known as a cross tabulation, crosstab, or two-way
frequency table) is a table in matrix format that displays the (multivariate) frequency
distribution of the variables. It has at least two rows and two columns and presents
categorical data in terms of frequency counts. Contingency tables are heavily used in survey
research, business intelligence, engineering, and scientific research.
• Construct and interpret contingency tables.
A contingency table provides a way of displaying data that can facilitate calculating
probabilities. The table can be used to describe the sample space of an experiment.
Contingency tables allow us to break down a sample space when two variables are involved.
                         Speeding violation    No speeding violation    Total
                         in the last year      in the last year
Cell phone user                  25                    280               305
Not a cell phone user            45                    405               450
Total                            70                    685               755
When reading a contingency table:
• The left-side column lists all of the values for one of the
variables. In the table shown above, the left-side column
shows the variable about whether or not someone uses a
cell phone while driving.
• The top row lists all of the values for the other variable. In
the table shown above, the top row shows the variable about
whether or not someone had a speeding violation in the last
year.
• In the body of the table, the cells contain the number of
outcomes that fall into both of the categories corresponding
to the intersecting row and column. In the table shown
above, the number 25 at the intersection of the “cell
phone user” row and “speeding violation in the last year”
column tells us that there are 25 people who have both of
these characteristics.
• The bottom row gives the totals in each column. In the table
shown above, the number 685 at the bottom of the “no
speeding violation in the last year” column tells us that there
are 685 people who did not have a speeding violation in the
last year.
• The right-side column gives the totals in each row. In the
table shown above, the number 305 at the right side of the
“cell phone user” row tells us that there are 305 people who
use cell phones while driving.
• The number in the bottom right corner is the size of the
sample space. In the table shown above, the number in the
bottom right corner is 755, which tells us that there are 755
people in the sample space.
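The totals described above follow directly from the counts in the body of the table. As a small sketch, the snippet below rebuilds the row totals, column totals, and sample size from the four body cells; the variable names are illustrative choices.

```python
# Body of the table: rows are "cell phone user" and "not a cell phone user",
# columns are "speeding violation" and "no speeding violation" in the last year.
body = [
    [25, 280],
    [45, 405],
]

row_totals = [sum(row) for row in body]            # right-side column
column_totals = [sum(col) for col in zip(*body)]   # bottom row
sample_size = sum(row_totals)                      # bottom right corner

print(row_totals)      # [305, 450]
print(column_totals)   # [70, 685]
print(sample_size)     # 755
```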
Example 01
Suppose a study of speeding violations and drivers who use cell
phones while driving produced the following fictional data:
                         Speeding violation    No speeding violation    Total
                         in the last year      in the last year
Cell phone user                  25                    280               305
Not a cell phone user            45                    405               450
Total                            70                    685               755
Calculate the following probabilities:
1. What is the probability that a randomly selected person is a
cell phone user?
2. What is the probability that a randomly selected person
had no speeding violations in the last year?
3. What is the probability that a randomly selected person
had a speeding violation in the last year and does not use a
cell phone?
4. What is the probability that a randomly selected person
uses a cell phone and had no speeding violations in the last
year?
Solution:
1. Probability = (number of cell phone users) / (total number in study) = 305/755
2. Probability = (number with no violations) / (total number in study) = 685/755
3. Probability = (number with violations who are not cell phone users) / (total number in
study) = 45/755
4. Probability = (number of cell phone users with no violations) / (total number in study)
= 280/755
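The four probabilities in Example 01 can be checked with a short script. This is a sketch only: the dictionary layout and the function name probability are illustrative choices, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

# Body cells of the Example 01 table, keyed by (row label, column label).
counts = {
    ("cell phone user", "violation"): 25,
    ("cell phone user", "no violation"): 280,
    ("not a cell phone user", "violation"): 45,
    ("not a cell phone user", "no violation"): 405,
}
total = sum(counts.values())   # 755 people in the study

def probability(row=None, column=None):
    """P(row and column); an argument left as None sums over that variable."""
    matching = sum(
        n for (r, c), n in counts.items()
        if (row is None or r == row) and (column is None or c == column)
    )
    return Fraction(matching, total)

p_cell = probability(row="cell phone user")                      # 305/755
p_no_violation = probability(column="no violation")              # 685/755
p_violation_not_cell = probability("not a cell phone user", "violation")   # 45/755
p_cell_no_violation = probability("cell phone user", "no violation")       # 280/755

print(p_cell, float(p_cell))
```

Fraction stores each result in lowest terms, so 305/755 is printed as 61/151; the two fractions are equal.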
Example 02
This table shows the number of athletes who stretch before exercising and
how many had injuries within the past year.
                     Injury in      No injury in     Total
                     last year      last year
Stretches                55             295           350
Does not stretch        231             219           450
Total                   286             514           800
1. What is the probability that a randomly selected athlete stretches
before exercising?
2. What is the probability that a randomly selected athlete had an
injury in the last year?
3. What is the probability that a randomly selected athlete does not
stretch before exercising and had no injuries in the last year?
4. What is the probability that a randomly selected athlete stretches
before exercising and had no injuries in the last year?
Solution:
1. Probability = 350/800 = 0.4375
2. Probability = 286/800 = 0.3575
3. Probability = 219/800 = 0.27375
4. Probability = 295/800 = 0.36875
Example 03
The table below shows a random sample of 100 hikers broken
down by gender and the areas of hiking they prefer.
Gender     The Coastline    Near Lakes and Streams    On Mountain Peaks    Total
Female          18                   16                     ___              45
Male           ___                  ___                      14              55
Total          ___                   41                     ___             ___
1. Fill in the missing values in the table.
2. What is the probability that a randomly selected hiker is
female?
3. What is the probability that a randomly selected hiker
prefers to hike on the coast?
4. What is the probability that a randomly selected hiker is
male and prefers to hike near lakes and streams?
5. What is the probability that a randomly selected hiker is
female and prefers to hike on mountains?
Solution:
1.
Gender     The Coastline    Near Lakes and Streams    On Mountain Peaks    Total
Female          18                   16                      11              45
Male            16                   25                      14              55
Total           34                   41                      25             100
2. Probability = 45/100 = 0.45
3. Probability = 34/100 = 0.34
4. Probability = 25/100 = 0.25
5. Probability = 11/100 = 0.11
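The missing cells in Example 03 can be recovered one at a time, because every row and column must add up to its total. The sketch below spells out that arithmetic; the variable names are illustrative, and the sample size of 100 comes from the problem statement.

```python
# Known cells from the Example 03 table.
female_coast, female_lakes, female_total = 18, 16, 45
male_peaks, male_total = 14, 55
lakes_total = 41
sample_size = 100   # "a random sample of 100 hikers"

# Each missing cell follows from a row or column that has only one unknown.
female_peaks = female_total - female_coast - female_lakes   # 45 - 18 - 16 = 11
male_lakes = lakes_total - female_lakes                     # 41 - 16 = 25
male_coast = male_total - male_lakes - male_peaks           # 55 - 25 - 14 = 16
coast_total = female_coast + male_coast                     # 18 + 16 = 34
peaks_total = female_peaks + male_peaks                     # 11 + 14 = 25

# Consistency check: the column totals must add up to the sample size.
print(coast_total + lakes_total + peaks_total == sample_size)

# Probability that a hiker is female and prefers mountain peaks.
p_female_peaks = female_peaks / sample_size
print(p_female_peaks)
```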
Example 04:
The table below relates the weights and heights of a group of
individuals participating in an observational study.
Weight/Height    Tall    Medium    Short    Totals
Obese             18       28        14
Normal            20       51        28
Underweight       12       25         9
Totals
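1. Find the total for each row and column.
2. Find the probability that a randomly chosen individual from
this group is tall.
3. Find the probability that a randomly chosen individual from
this group is of normal weight.
4. Find the probability that a randomly chosen individual from
this group is obese and short.
5. Find the probability that a randomly chosen individual from
this group is underweight and medium.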
Solution:
1.
Weight/Height    Tall    Medium    Short    Totals
Obese             18       28        14       60
Normal            20       51        28       99
Underweight       12       25         9       46
Totals            50      104        51      205
2. Probability = 50/205
3. Probability = 99/205
4. Probability = 14/205
5. Probability = 25/205
(https://ecampusontario.pressbooks.pub/introstats/chapter/3-3-contingency-tables/)