Statistical Tests


Statistical tests are a way of mathematically determining whether two sets of data are
significantly different from each other. To do this, a test summarises the data with statistical
measures such as the mean, standard deviation, and coefficient of variation, and combines them
into a test statistic. The test then compares that statistic with a predetermined criterion,
such as a critical value or a level of significance. If the criterion is met, the test concludes
that there is a statistically significant difference between the two sets of data.

There are various statistical tests that can be used, depending on the type of data being
analysed. However, some of the most common statistical tests are t-tests, chi-squared tests,
and ANOVA tests.

Types of Statistical Tests

When working with statistical data, several tools can be used to analyze the
information.

1. Parametric Statistical Tests


Parametric statistical tests have stricter requirements than non-parametric
tests, and in return they support stronger inferences from the data. They can
only be conducted with data that satisfy the common assumptions of statistical
tests, such as normally distributed values, similar variances across groups, and
independent observations. Some common types of parametric tests are regression
tests, comparison tests, and correlation tests.

1.1. Regression Tests

Regression tests look for cause-and-effect relationships. They can be used to
estimate the effect of one or more continuous variables on another variable.
Note that a regression on its own shows an association; a causal reading also
depends on the study design.

• Simple linear regression describes the relationship between a dependent and
  an independent variable using a straight line. It is used when both
  variables are quantitative.
• Multiple linear regression measures the relationship between a quantitative
  dependent variable and two or more independent variables, again using a
  straight line.
• Logistic regression models the probability of a categorical (usually binary)
  outcome from one or more independent variables, so it is used for prediction
  and classification. For example, it can help flag data anomalies that may
  indicate fraud.
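
As a rough illustration of simple linear regression, the straight line can be fitted by ordinary least squares. The Python sketch below uses made-up data (hours studied and exam scores are hypothetical), and the function name fit_line is chosen here only for illustration.

```python
def fit_line(x, y):
    """Least-squares fit of the straight line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: independent variable (hours) and dependent variable (score).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
print(f"score = {intercept:.2f} + {slope:.2f} * hours")
```

The slope estimates how much the dependent variable changes for a one-unit change in the independent variable.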

1.2. Comparison Tests

Comparison tests determine the differences among the group means. They can be
used to test the effect of a categorical variable on the mean value of other
characteristics.

• T-test

One of the most common statistical tests is the t-test, which is used to compare the
means of two groups (e.g. the average heights of men and women). You can use
the t-test when the population parameters (mean and standard deviation) are
unknown and have to be estimated from the sample.

• Paired T-test

It tests the difference between two related measurements taken on the same
subjects (e.g. pre- and post-test scores). For example, measuring the performance
score of a trainee before and after the completion of a training program.

• Independent T-test

The independent t-test is also called the two-sample t-test. It is a statistical test that
determines whether there is a statistically significant difference between the means
of two unrelated groups. For example, comparing the mean of some measurement
between cancer patients and pregnant women in a population.

• One Sample T-test

In this test, the mean of a single group is compared with a given mean. For
example, determining whether current sales are above or below a given average
sales figure.
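
The three t-tests described above differ mainly in how the test statistic is formed. The Python sketch below computes only the t statistic for each case; the scores are made up, the function names are illustrative, and the independent version assumes equal variances in the two groups (the pooled-variance form). The resulting value would still need to be compared with a critical value from a t table for the appropriate degrees of freedom.

```python
import math
import statistics


def one_sample_t(sample, given_mean):
    """t statistic for a single group against a given mean (df = n - 1)."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - given_mean) / standard_error


def paired_t(before, after):
    """Paired t statistic: a one-sample t-test on the per-subject differences."""
    differences = [a - b for a, b in zip(after, before)]
    return one_sample_t(differences, 0.0)


def independent_t(group_a, group_b):
    """Two-sample t statistic with pooled variance (assumes equal variances)."""
    na, nb = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    standard_error = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error


# Hypothetical trainee scores before and after a training program.
before = [60, 62, 58, 65, 70]
after = [64, 66, 59, 70, 73]

print("paired t:", round(paired_t(before, after), 3))
print("independent t:", round(independent_t(after, before), 3))
print("one-sample t vs. 60:", round(one_sample_t(after, 60), 3))
```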

• ANOVA

ANOVA (Analysis of Variance) analyzes the difference between the means of
more than two groups. A one-way ANOVA tests how one factor (a categorical
independent variable) affects a continuous dependent variable, whereas a two-way
ANOVA tests the effects of two factors at once. In both cases, the impact of the
factors is assessed by comparing the means of different samples.
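
To make the idea of comparing group means concrete, a one-way ANOVA boils down to an F statistic: the variation between group means divided by the variation within groups. The Python sketch below uses three small made-up groups; the function name is illustrative, and the F value would still be compared with a critical value from an F table.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```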

• MANOVA

MANOVA, which stands for Multivariate Analysis of Variance, provides
regression analysis and analysis of variance for multiple dependent variables by
one or more factor variables or covariates. It examines whether two or more
continuous dependent variables, taken together, differ across the levels of an
independent grouping variable.

• Z-test

It is a statistical test that determines whether two population means are different,
provided the variances are known and the sample size is large.
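
A two-sample z-test can be sketched as follows. All numbers are hypothetical, the function name is illustrative, and the sketch assumes, as stated above, that the population variances are known and the samples are large.

```python
import math
from statistics import NormalDist


def two_sample_z(mean_a, mean_b, var_a, var_b, n_a, n_b):
    """z statistic for the difference of two means with known variances."""
    standard_error = math.sqrt(var_a / n_a + var_b / n_b)
    return (mean_a - mean_b) / standard_error


# Hypothetical sample means, known population variances, and sample sizes.
z = two_sample_z(mean_a=52, mean_b=50, var_a=16, var_b=25, n_a=100, n_b=100)

# Two-tailed p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.3f}, two-tailed p = {p_value:.4f}")
```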

1.3. Correlation Tests

Correlation tests check if the variables are related without hypothesizing a cause-
and-effect relationship. These tests can be used to check if the two variables you
want to use in a multiple regression test are correlated.

• Pearson Correlation Coefficient

It is a common way of measuring linear correlation. The coefficient is a
number between -1 and 1 that describes the strength and direction of the
relationship between two variables. A positive coefficient means the two variables
tend to change in the same direction, a negative coefficient means they tend to
change in opposite directions, and a value near 0 means little or no linear
relationship.
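
The coefficient can be computed directly from its definition: the sum of cross-products of the deviations divided by the square root of the product of the two sums of squares. The Python sketch below uses made-up values, and the function name is illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y rises with x in the first case and falls in the second.
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(pearson_r([1, 2, 3], [6, 4, 2]))
```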

2. Non-parametric Statistical Tests

Non-parametric tests do not make as many assumptions about the data as
parametric tests. They are useful when one or more of the common statistical
assumptions are violated. However, their inferences are generally weaker than
those of parametric tests when the parametric assumptions do hold.

• Chi-square test

The chi-square test compares two categorical variables. Calculating the
chi-square statistic and comparing it with a critical value from the chi-square
distribution allows you to assess whether the observed frequencies are
significantly different from the expected frequencies.
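
As an illustration, the chi-square statistic can be computed from a table of observed counts. The Python sketch below borrows the fictional cell phone and speeding counts from the contingency table section later in this document; the function name is illustrative, and 3.841 is the standard critical value for 1 degree of freedom at the 0.05 level.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Rows: cell phone user / not a cell phone user.
# Columns: speeding violation / no speeding violation in the last year.
observed = [[25, 280], [45, 405]]
statistic, df = chi_square_statistic(observed)

CRITICAL_VALUE = 3.841  # chi-square distribution, df = 1, significance 0.05
print(f"chi-square = {statistic:.3f}, df = {df}")
print("significant" if statistic > CRITICAL_VALUE else "not significant")
```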

7 Essential Ways to Choose the Right Statistical Test

1. Research Question
The choice of statistical test depends on the research question that needs to be
answered. The research question will also help you formulate the data
structure and research design.

2. Formulation of Null Hypothesis

After defining the research question, you can develop a null hypothesis. A null
hypothesis states that there is no effect or no difference, so that any pattern in
the observations is attributed to chance.

3. Level of Significance in Study Protocol

Before the study is carried out, a level of significance (commonly 0.05) is
specified in the study protocol. It is the threshold against which the p-value is
compared: if the p-value falls below it, the null hypothesis is rejected;
otherwise, the null hypothesis is not rejected.

4. The Decision Between One-tailed and Two-tailed

You must decide if your study should use a one-tailed or a two-tailed test. If you
have a clear reason, stated in advance, to expect a difference in one particular
direction, you can perform a one-tailed test. However, if there is no particular
direction of the expected difference, you must perform a two-tailed test.
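
The practical difference between the two choices shows up in the p-value. The Python sketch below uses a hypothetical z statistic of 1.8 and a significance level of 0.05; the function name is illustrative.

```python
from statistics import NormalDist


def p_values(z):
    """Return (one-tailed, two-tailed) p-values for a z statistic.

    The one-tailed value assumes the expected direction is 'greater than'.
    """
    standard_normal = NormalDist()
    one_tailed = 1 - standard_normal.cdf(z)
    two_tailed = 2 * (1 - standard_normal.cdf(abs(z)))
    return one_tailed, two_tailed


alpha = 0.05  # level of significance
one_tailed, two_tailed = p_values(1.8)  # hypothetical test statistic

print(f"one-tailed p = {one_tailed:.4f}, reject null: {one_tailed < alpha}")
print(f"two-tailed p = {two_tailed:.4f}, reject null: {two_tailed < alpha}")
```

With these numbers the same data lead to rejection under a one-tailed test but not under a two-tailed test, which is why the direction has to be decided before looking at the results.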

5. The Number of Variables to Be Analyzed

Statistical tests and procedures are divided according to the number of variables
that they are designed to analyze. Therefore, while choosing the test, you must
consider how many variables you want to analyze.

6. Type of Data

It is important to define whether your data is continuous, categorical, or binary. In
the case of continuous data, you must also check if the data are normally
distributed or skewed, to further define which statistical test to consider.

7. Paired and Unpaired Study Designs

A paired design compares two population means when the two samples depend on
each other, for example when the same subjects are measured twice. In an unpaired
or independent study design, the two samples come from separate, unrelated
groups, and the results of each group are summarized and then compared.

Now that you know the seven steps for choosing a statistical test, you are on your
way to finding the right test for your research question. Each situation is unique; it
is important to understand all of your options and make an informed decision.

Remember to always consult your principal investigator or a statistician, or the
documentation of your statistical software, if you are unsure which test to choose.
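
Some of the considerations above can be condensed into a small lookup. The Python sketch below is only a rough, non-exhaustive heuristic restricted to the tests described in this document; the function name, its parameters, and the category labels are assumptions made for illustration, and it takes for granted that the parametric assumptions hold for continuous data.

```python
def suggest_test(outcome_type, n_groups=0, paired=False):
    """Suggest a test from the ones covered in this document.

    outcome_type: "continuous" or "categorical" (labels chosen for this sketch).
    n_groups: number of groups being compared; 0 means two continuous
              variables are being related rather than groups compared.
    paired: whether the two samples depend on each other.
    """
    if outcome_type == "categorical":
        return "chi-square test"
    if outcome_type != "continuous":
        raise ValueError("outcome_type must be 'continuous' or 'categorical'")
    if n_groups == 0:
        return "Pearson correlation or regression"
    if n_groups == 1:
        return "one-sample t-test"
    if n_groups == 2:
        return "paired t-test" if paired else "independent t-test"
    return "ANOVA"


print(suggest_test("continuous", n_groups=2, paired=True))
print(suggest_test("continuous", n_groups=3))
print(suggest_test("categorical"))
```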

Contingency Table

In statistics, a contingency table (also known as a cross tabulation or crosstab) is a type of
table in a matrix format that displays the (multivariate) frequency distribution of the
variables. Contingency tables are heavily used in survey research, business intelligence,
engineering, and scientific research.
A contingency table, sometimes called a two-way frequency table, is a tabular mechanism
with at least two rows and two columns used in statistics to present categorical data in terms
of frequency counts.

• Construct and interpret contingency tables.

A contingency table provides a way of displaying data that can facilitate calculating
probabilities. The table can be used to describe the sample space of an experiment.
Contingency tables allow us to break down a sample space when two variables are involved.

                        Speeding violation    No speeding violation
                        in the last year      in the last year         Total

Cell phone user                 25                    280               305
Not a cell phone user           45                    405               450
Total                           70                    685               755

When reading a contingency table:

• The left-side column lists all of the values for one of the
  variables. In the table shown above, the left-side column
  shows the variable about whether or not someone uses a
  cell phone while driving.
• The top row lists all of the values for the other variable. In
  the table shown above, the top row shows the variable about
  whether or not someone had a speeding violation in the last
  year.
• In the body of the table, the cells contain the number of
  outcomes that fall into both of the categories corresponding
  to the intersecting row and column. In the table shown
  above, the number 25 at the intersection of the “cell
  phone user” row and “speeding violation in the last year”
  column tells us that there are 25 people who have both of
  these characteristics.
• The bottom row gives the totals in each column. In the table
  shown above, the number 685 at the bottom of the “no
  speeding violation in the last year” column tells us that
  there are 685 people who did not have a speeding violation
  in the last year.
• The right-side column gives the totals in each row. In the
  table shown above, the number 305 at the right side of the
  “cell phone user” row tells us that there are 305 people who
  use cell phones while driving.
• The number in the bottom right corner is the size of the
  sample space. In the table shown above, the number in the
  bottom right corner is 755, which tells us that there are 755
  people in the sample space.
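
The row totals, column totals, and sample-space size described above can all be derived from the four cells in the body of the table. The Python sketch below does this for the table shown above; the variable names are illustrative.

```python
row_labels = ["Cell phone user", "Not a cell phone user"]
col_labels = ["Speeding violation", "No speeding violation"]

# Body of the table: one inner list per row, one entry per column.
counts = [
    [25, 280],
    [45, 405],
]

row_totals = [sum(row) for row in counts]          # right-side column
col_totals = [sum(col) for col in zip(*counts)]    # bottom row
grand_total = sum(row_totals)                      # bottom right corner

for label, total in zip(row_labels, row_totals):
    print(f"{label}: {total}")
for label, total in zip(col_labels, col_totals):
    print(f"{label}: {total}")
print(f"Size of the sample space: {grand_total}")
```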

Example 01

Suppose a study of speeding violations and drivers who use cell
phones while driving produced the following fictional data:

                        Speeding violation    No speeding violation
                        in the last year      in the last year         Total

Cell phone user                 25                    280               305
Not a cell phone user           45                    405               450
Total                           70                    685               755

Calculate the following probabilities:

1. What is the probability that a randomly selected person is a
cell phone user?
2. What is the probability that a randomly selected person
had no speeding violations in the last year?
3. What is the probability that a randomly selected person
had a speeding violation in the last year and does not use a
cell phone?
4. What is the probability that a randomly selected person
uses a cell phone and had no speeding violations in the last
year?

Solution:

1. $\text{Probability} = \frac{\text{number of cell phone users}}{\text{total number in study}} = \frac{305}{755}$
2. $\text{Probability} = \frac{\text{number of no violations}}{\text{total number in study}} = \frac{685}{755}$
3. $\text{Probability} = \frac{\text{number of violations and not cell phone users}}{\text{total number in study}} = \frac{45}{755}$
4. $\text{Probability} = \frac{\text{number of cell phone users and no violations}}{\text{total number in study}} = \frac{280}{755}$
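
The same four answers can be checked programmatically. The Python sketch below stores the body of the Example 01 table in a dictionary and counts the matching outcomes; the helper name probability and the category labels are illustrative. Note that Fraction reduces each result to lowest terms.

```python
from fractions import Fraction

# Body of the Example 01 table, keyed by (row category, column category).
counts = {
    ("cell phone user", "violation"): 25,
    ("cell phone user", "no violation"): 280,
    ("not a cell phone user", "violation"): 45,
    ("not a cell phone user", "no violation"): 405,
}


def probability(counts, row=None, col=None):
    """P(row and/or column category) = matching outcomes / total outcomes."""
    total = sum(counts.values())
    matching = sum(
        n
        for (r, c), n in counts.items()
        if (row is None or r == row) and (col is None or c == col)
    )
    return Fraction(matching, total)


answers = [
    probability(counts, row="cell phone user"),
    probability(counts, col="no violation"),
    probability(counts, row="not a cell phone user", col="violation"),
    probability(counts, row="cell phone user", col="no violation"),
]
for number, p in enumerate(answers, start=1):
    print(f"{number}. {p} = {float(p):.4f}")
```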

Example 02

This table shows the number of athletes who stretch before exercising and
how many had injuries within the past year.

                    Injury in      No injury in
                    last year      last year       Total

Stretches               55             295          350
Does not stretch       231             219          450
Total                  286             514          800

1. What is the probability that a randomly selected athlete stretches
before exercising?
2. What is the probability that a randomly selected athlete had an
injury in the last year?
3. What is the probability that a randomly selected athlete does not
stretch before exercising and had no injuries in the last year?
4. What is the probability that a randomly selected athlete stretches
before exercising and had no injuries in the last year?

Solution:

1. $\text{Probability} = \frac{350}{800} = 0.4375$
2. $\text{Probability} = \frac{286}{800} = 0.3575$
3. $\text{Probability} = \frac{219}{800} = 0.27375$
4. $\text{Probability} = \frac{295}{800} = 0.36875$

Example 03
The table below shows a random sample of 100 hikers broken
down by gender and the areas of hiking they prefer.

              The          Near Lakes      On Mountain
Gender        Coastline    and Streams     Peaks           Total

Female           18            16             __             45
Male             __            __             14             55
Total            __            41             __             __

1. Fill in the missing values in the table
2. What is the probability that a randomly selected hiker is
female?
3. What is the probability that a randomly selected hiker
prefers to hike on the coast?
4. What is the probability that a randomly selected hiker is
male and prefers to hike near lakes and streams?
5. What is the probability that a randomly selected hiker is
female and prefers to hike on mountains?

Solution:

1.

              The          Near Lakes      On Mountain
Gender        Coastline    and Streams     Peaks           Total

Female           18            16             11             45
Male             16            25             14             55
Total            34            41             25            100

2. $\text{Probability} = \frac{45}{100} = 0.45$
3. $\text{Probability} = \frac{34}{100} = 0.34$
4. $\text{Probability} = \frac{25}{100} = 0.25$
5. $\text{Probability} = \frac{11}{100} = 0.11$
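
The missing values in part 1 follow from the fact that every row and every column must add up to its total. The Python sketch below automates that reasoning for the Example 03 table; the function name fill_missing and the use of None for a blank cell are choices made for this illustration, and the grand total of 100 comes from the stated sample size.

```python
def fill_missing(table):
    """Fill None cells in a table whose last row and last column hold totals.

    Repeatedly finds a row or column with exactly one unknown cell and
    solves for it, until nothing more can be filled in.
    """
    n_rows, n_cols = len(table), len(table[0])
    # Each "line" is a full row or column; its last position is the total.
    lines = [[(r, c) for c in range(n_cols)] for r in range(n_rows)]
    lines += [[(r, c) for r in range(n_rows)] for c in range(n_cols)]

    changed = True
    while changed:
        changed = False
        for line in lines:
            unknown = [(r, c) for r, c in line if table[r][c] is None]
            if len(unknown) != 1:
                continue
            r, c = unknown[0]
            *parts, total_pos = line
            if (r, c) == total_pos:
                # The total itself is missing: add up the parts.
                table[r][c] = sum(table[i][j] for i, j in parts)
            else:
                # One part is missing: total minus the known parts.
                known = sum(table[i][j] for i, j in parts if (i, j) != (r, c))
                table[r][c] = table[total_pos[0]][total_pos[1]] - known
            changed = True
    return table


# Columns: coastline, lakes and streams, mountain peaks, total.
hikers = [
    [18, 16, None, 45],      # Female
    [None, None, 14, 55],    # Male
    [None, 41, None, 100],   # Total (100 hikers in the sample)
]
for row in fill_missing(hikers):
    print(row)
```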

Example 04:
The table below relates the weights and heights of a group of
individuals participating in an observational study.

Weight/Height    Tall    Medium    Short    Totals

Obese             18       28        14
Normal            20       51        28
Underweight       12       25         9
Totals

1. Find the total for each row and column.

2. Find the probability that a randomly chosen individual from
this group is tall.
3. Find the probability that a randomly chosen individual from
this group is normal.
4. Find the probability that a randomly chosen individual from
this group is obese and short.
5. Find the probability that a randomly chosen individual from
this group is underweight and medium.

Solution:

1.

Weight/Height    Tall    Medium    Short    Totals

Obese             18       28        14       60
Normal            20       51        28       99
Underweight       12       25         9       46
Totals            50      104        51      205

2. $\text{Probability} = \frac{50}{205}$
3. $\text{Probability} = \frac{99}{205}$
4. $\text{Probability} = \frac{14}{205}$
5. $\text{Probability} = \frac{25}{205}$

(https://ecampusontario.pressbooks.pub/introstats/chapter/3-3-contingency-tables/)
