Hypothesis Testing
Links:
https://www.w3schools.com/statistics/statistics_hypothesis_testing.php
https://www.geeksforgeeks.org/understanding-hypothesis-testing/
https://www.geeksforgeeks.org/data-science-for-beginners/?utm_source=geeksforgeeks&utm_medium=gfgcontent_shm&utm_campaign=shm
https://www.geeksforgeeks.org/machine-learning/?ref=shm
https://www.geeksforgeeks.org/hypothesis/
https://www.colorado.edu/amath/sites/default/files/attached-files/lesson9_hyptests.pdf
https://www.cse.iitk.ac.in/users/nsrivast/HCC/lec07-09.pdf
https://www.analyticsvidhya.com/blog/2021/07/hypothesis-testing-made-easy-for-the-data-science-beginners/
https://www.simplilearn.com/tutorials/statistics-tutorial/hypothesis-testing-in-statistics
https://www.cuemath.com/data/hypothesis-testing/
https://www.vedantu.com/maths/hypothesis-testing
https://www.analyticsvidhya.com/blog/2021/09/hypothesis-testing-in-machine-learning-everything-you-need-to-know/
https://www.ncl.ac.uk/webtemplate/ask-assets/external/maths-resources/statistics/hypothesis-testing/one-tailed-and-two-tailed-tests.html
Hypothesis testing is a method to decide whether there is enough evidence in a sample of data to infer that a certain condition holds true for the entire population.
Null Hypothesis (H₀): The default claim (no effect or no difference).
Alternative Hypothesis (H₁ or Hₐ): The claim you're testing for (an effect or difference exists).
One-tailed test: You check for an effect in one direction only (greater than / less than).
Two-tailed test: You check for any difference (either greater or smaller).
📊 3. Key Tests and When to Use Them
Test | Compares | Distribution | Typical Use
ANOVA | More than 2 means | F-distribution | Test equality of 3+ group means
Understanding Type I and Type II errors is crucial for hypothesis testing and statistical thinking in machine learning and beyond.
🎯 The Setup:
Type I error (False Positive): You think you found an effect, but it was just random noise.
📌 Example:
You say a new drug works, but in reality, it doesn't.
(You're fooled by random chance.)
Type II error (False Negative): You miss an effect that is really there.
📌 Example:
You say the drug doesn't work, but actually, it does.
(You missed a real signal.)
🎯 Visual Intuition:
[Diagram: sampling distribution under H₀, with the critical (rejection) region shaded in the tail.]
💡 Memory Tip:
Type I (False Positive) = "Cried wolf" 🐺 when there was none.
Type II (False Negative) = "Missed the wolf" 🐺 when it was actually there.
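A quick simulation makes the Type I error rate concrete. This is a sketch (the data are synthetic): we run many experiments where H₀ is actually true, so every "significant" result is a false positive, and the false-positive rate lands near α.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_experiments = 10_000

# Simulate experiments where H0 is TRUE: both groups come from the
# same distribution, so any "significant" result is a false positive.
false_positives = 0
for _ in range(n_experiments):
    a = rng.normal(0.0, 1.0, 30)
    b = rng.normal(0.0, 1.0, 30)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        false_positives += 1  # Type I error: we "cried wolf"

print(false_positives / n_experiments)  # close to alpha (~0.05)
```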
The Z-table gives you the area (probability) under the curve to the left of the z-value.
📌 Z-test Formula:
z = (x̄ − μ) / (σ / √n)
where x̄ is the sample mean, μ the hypothesized population mean, σ the population standard deviation, and n the sample size.
In simple terms, degrees of freedom (df) refer to the number of independent values that can vary in a calculation. For example, in a t-distribution:
Degrees of freedom = n − 1 for a one-sample t-test (where n is the sample size).
The t-distribution becomes wider and has fatter tails compared to the normal distribution.
This means there's more uncertainty (or variability) in your estimate of the population
parameter (like the mean).
📈 Visual idea:
With large df (i.e., large sample sizes), the t-distribution starts to look like the standard
normal (z) distribution.
With small df, the t-distribution has heavier tails, meaning there's a higher probability of
getting values far from the mean.
Critical values (like the t-score you need to reject the null hypothesis) are higher for smaller
samples.
So, with a smaller sample, you need a larger observed effect to claim significance at the
same confidence level.
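These properties are easy to check numerically. A quick scipy sketch comparing two-tailed 95% critical values: the t critical value is larger for small samples and shrinks toward the z value of 1.96 as n grows.

```python
from scipy.stats import norm, t

# Two-tailed test at alpha = 0.05: the critical value cuts off 2.5% per tail
z_crit = norm.ppf(0.975)  # ≈ 1.96 for the standard normal
print(round(z_crit, 3))

for n in (5, 10, 30, 100):
    df = n - 1
    # Smaller samples -> heavier tails -> a larger critical value is needed
    print(n, round(t.ppf(0.975, df), 3))
```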
🧮 7. Chi-Square Test
Two types:
1. Goodness of fit: does an observed frequency distribution match an expected one?
2. Test of independence: are two categorical variables related?
Test statistic: χ² = Σ (O − E)² / E, where
O = Observed frequency
E = Expected frequency
✅ Z-Test Walkthrough
🎯 Use When: the population standard deviation (σ) is known and the sample is large (n ≥ 30).
A company claims that their light bulbs last 1000 hours on average. A random sample of 36 bulbs has a mean life of 980 hours. The population standard deviation is 60 hours. At a 0.05 significance level, test whether the company's claim is valid.
🔢 Step-by-Step Solution
1. Hypotheses: H₀: μ = 1000, H₁: μ ≠ 1000 (two-tailed).
2. Test statistic: z = (980 − 1000) / (60 / √36) = −20 / 10 = −2.0.
3. Critical values at α = 0.05 (two-tailed): ±1.96.
4. Decision: |−2.0| > 1.96, so reject H₀ (the p-value is 2 × 0.0228 = 0.0456 < 0.05).
🔚 Conclusion: There is enough evidence to doubt the company’s claim. The average bulb life is likely
not 1000 hours.
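The light-bulb example can be verified in a few lines, using the one-sample z-test formula above:

```python
import math
from scipy.stats import norm

mu0, xbar, sigma, n, alpha = 1000, 980, 60, 36, 0.05

z = (xbar - mu0) / (sigma / math.sqrt(n))  # (980 - 1000) / 10 = -2.0
p_value = 2 * norm.cdf(-abs(z))            # two-tailed p ≈ 0.0455
z_crit = norm.ppf(1 - alpha / 2)           # ≈ 1.96

print(z, p_value)
print(abs(z) > z_crit)  # True -> reject H0: the 1000-hour claim is doubtful
```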
The value you read from the Z-table, the p-value, and the critical value are related but not the same.
🔹 The Z-table gives the cumulative probability (area under the curve to the left of a z-score).
This value is not directly the p-value or the critical value, but it helps you calculate both.
✅ Definitions
Term | What It Is | How It's Used
Z-table value | Area to the left of a z-score (i.e., cumulative probability) | Used to find p-values and critical values
p-value | Probability of a result at least as extreme as the observed one, assuming H₀ | Compared with α to decide whether to reject H₀
Critical value | Cutoff z (or t) value at a specific significance level (e.g., α = 0.05) | Used to draw rejection regions on the distribution

The tail area you read from the Z-table (e.g., 0.025) is not the p-value, but it's half of it in a two-tailed test.
If you set α = 0.05 (two-tailed), the critical values are ±1.96, because the area to the left of z = 1.96 is 0.975, and to the right is 0.025.
🧠 Summary:
p-value: find the area from z, and double it if two-tailed. Example: z = −2.0 → area = 0.0228 → p = 0.0456.
Critical value: find the z at α (e.g., 0.025 per tail). Example: α = 0.05 → critical z = ±1.96.
[Plot: green central region = fail to reject H₀; red tails = rejection regions. The observed z (blue line) falls in the red zone, so we reject H₀ and the result is statistically significant.]
The value ±1.96 comes from the Z-table and is the critical z-value for a two-tailed hypothesis test at significance level α = 0.05.
🎯 We want to find:
The z-value such that 2.5% of the area is in each tail (for a two-tailed test). A cumulative area of 0.975 corresponds to z ≈ 1.96.
So: z_critical = ±1.96
🎨 Visualization
[Standard normal curve for a two-tailed test at α = 0.05: rejection regions in both tails beyond −1.96 and +1.96, with 2.5% of the area in each tail.]
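You can confirm where ±1.96 comes from with the inverse cumulative distribution:

```python
from scipy.stats import norm

# For alpha = 0.05 two-tailed, each tail holds 0.025, so we need the
# z-value with 0.975 of the area to its left.
z_crit = norm.ppf(0.975)
print(round(z_crit, 2))          # 1.96
print(round(norm.cdf(1.96), 3))  # 0.975 (the round trip back to area)
```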
There are three common types of Z-tests, depending on what you're comparing:
✅ 1. One-Sample Z-Test
🔍 Use When: comparing one sample mean to a known population mean, with σ known.
📦 Example:
A company claims the average height of their employees is 170 cm. You take a sample and test if it’s
significantly different from 170 cm.
✅ 2. Two-Sample Z-Test
🔍 Use When: comparing the means of two independent samples, with both population standard deviations known.
📦 Example:
Compare average test scores of students from School A vs School B to see if there's a significant
difference.
✅ 3. Z-Test for Proportions
https://www.w3schools.com/statistics/statistics_hypothesis_testing_proportion.php
🔍 Use When: testing a sample proportion against a claimed population proportion, with a large sample.
🎯 Types: left-tailed, right-tailed, or two-tailed, like the other Z-tests.
📦 Example:
Test if 60% of voters support a candidate (claim), and your sample shows 53% do.
🧠 Summary Table
Z-Test Type | Used For | Population σ Known? | Sample Size
One-Sample | Comparing a sample mean to a known population mean | Yes | Large (n ≥ 30)
Two-Sample | Comparing the means of two independent samples | Yes | Large (n ≥ 30)
Z-test for Proportions | Comparing proportions (%, ratios) | N/A | Large (np > 5, n(1−p) > 5)
The difference between qualitative and quantitative data lies in the type of information they represent and how they are measured:
Qualitative (categorical) data describes qualities or categories.
Examples:
o Eye color
o Type of car
Types:
o Nominal: Categories with no inherent order (e.g., eye color, car brand)
o Ordinal: Has a meaningful order, but differences between values aren't measurable (e.g., satisfaction levels: low, medium, high)
Quantitative (numerical) data represents measurable quantities.
Examples:
o Number of students
o Income
Types:
o Discrete: Countable values (e.g., number of students)
o Continuous: Can take any value within a range (e.g., temperature, time)
Quick Comparison: qualitative = categories (nominal/ordinal); quantitative = numbers (discrete/continuous).
Here are examples of left-tailed, right-tailed, and two-tailed tests for each type of Z-test. 🔍✨
🔹 1. One-Sample Z-Test
✅ Scenario: A company claims the average weight of its product is 500g. Population standard
deviation is known (σ = 10g).
Left-tailed:
H₀: μ = 500
H₁: μ < 500
📦 Example: A customer group thinks the company is cheating and delivering less. You test if the mean weight is significantly lower.
Right-tailed:
H₀: μ = 500
H₁: μ > 500
📦 Example: A nutritionist believes the company might be adding more than needed, and you test if it's significantly higher.
Two-tailed:
H₀: μ = 500
H₁: μ ≠ 500
📦 Example: A regulator audits whether the weight is different from 500g, in either direction.
🔹 2. Two-Sample Z-Test
✅ Scenario: Compare average exam scores of students from School A and School B. Population standard deviations for both schools are known.
Left-tailed:
H₀: μ₁ = μ₂
H₁: μ₁ < μ₂
📦 Example: You suspect School A's scores are lower than School B's.
Right-tailed:
H₀: μ₁ = μ₂
H₁: μ₁ > μ₂
📦 Example: You believe a new teaching method in School A makes scores higher.
Two-tailed:
H₀: μ₁ = μ₂
H₁: μ₁ ≠ μ₂
📦 Example: You simply test whether the two schools' mean scores differ, in either direction.
🔹 3. Z-Test for Proportions
✅ Scenario: A company claims 60% of customers are satisfied. You survey 100 customers.
Left-tailed:
H₀: p = 0.60
H₁: p < 0.60
Right-tailed:
H₀: p = 0.60
H₁: p > 0.60
Two-tailed:
H₀: p = 0.60
H₁: p ≠ 0.60
📦 Example: A researcher checks whether the claim is statistically off, in either direction.
🎁 Summary Table
Z-Test Type | Left-Tailed (H₁: <) | Right-Tailed (H₁: >) | Two-Tailed (H₁: ≠)
One-Sample | μ < μ₀ | μ > μ₀ | μ ≠ μ₀
Two-Sample | μ₁ < μ₂ | μ₁ > μ₂ | μ₁ ≠ μ₂
Proportion | p < p₀ | p > p₀ | p ≠ p₀
Next, one example with the full calculation for a Z-test for a proportion: hypothesis setup, Z-score calculation, critical value, p-value, and conclusion. ✅
📌 Problem:
A company claims that at least 60% of customers are satisfied with their product. You believe it’s
higher. You take a sample of 100 customers, and 68 of them are satisfied.
📌 Step 1: Hypotheses — H₀: p = 0.60, H₁: p > 0.60 (right-tailed).
📌 Step 2: Sample proportion — p̂ = 68/100 = 0.68.
📌 Step 3: Test statistic — z = (0.68 − 0.60) / √(0.60 × 0.40 / 100) = 0.08 / 0.049 = 1.633.
📌 Step 4: Critical value — from the Z-table, at α = 0.05 (right-tailed), z_critical = 1.645.
📌 Step 5: Decision
Since z = 1.633 < 1.645, we fail to reject H₀
✅ Conclusion:
There is not enough evidence at the 5% level to support the claim that more than 60% of customers
are satisfied.
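The same calculation in Python, using the standard proportion z-test formula:

```python
import math
from scipy.stats import norm

p0, n, successes, alpha = 0.60, 100, 68, 0.05

p_hat = successes / n                  # 0.68
se = math.sqrt(p0 * (1 - p0) / n)      # standard error under H0
z = (p_hat - p0) / se                  # ≈ 1.633
p_value = 1 - norm.cdf(z)              # right-tailed p ≈ 0.051
z_crit = norm.ppf(1 - alpha)           # ≈ 1.645

print(z, p_value)
print(z > z_crit)  # False -> fail to reject H0
```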
Each Z-test scenario (one-sample, two-sample, proportion) follows the same steps for left-tailed, right-tailed, and two-tailed cases: state the hypotheses, compute the statistic, compare it with the critical value (or p-value), and conclude.
🔹 1. One-Sample Z-Test
Used when population standard deviation (σ) is known and we want to test the mean (μ).
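The three tail variants can be wrapped in one helper. This is a sketch (the function name and signature are my own, not from the notes); it reproduces the light-bulb example from earlier.

```python
import math
from scipy.stats import norm

def one_sample_z(xbar, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample z-test. tail is 'left', 'right', or 'two'."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    if tail == "left":
        p = norm.cdf(z)          # area below z
    elif tail == "right":
        p = 1 - norm.cdf(z)      # area above z
    else:
        p = 2 * norm.cdf(-abs(z))  # both tails
    return z, p, p < alpha       # statistic, p-value, reject H0?

# Light-bulb example: z = -2.0, p ≈ 0.0455 -> reject H0
print(one_sample_z(980, 1000, 60, 36))
```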
T-Tests
t-tests follow the same logic as Z-tests, covering one-sample, two-sample, and paired variants.
📊 Formula (one-sample): t = (x̄ − μ) / (s / √n), with df = n − 1
🔹 1. One-Sample t-Test
Use when population standard deviation (σ) is unknown and sample size is small (n < 30), or
population is not perfectly known.
🔹 2. Two-Sample (Independent) t-Test
Compares means from two unrelated groups, with variances assumed equal or unequal.
✅ Since t = 0.896 < 1.729 (the critical value), we fail to reject H₀.
🔹 3. Paired t-Test
✅ Example: Measure performance before and after a training program for the same people.
o n = 5 (five paired observations)
👉 Conclusion: Training has a significant effect.
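A paired t-test with n = 5 can be run with scipy's `ttest_rel`. The before/after scores below are made up for illustration (the notes don't give the raw data):

```python
from scipy import stats

# Hypothetical scores for 5 employees before and after training
before = [72, 68, 75, 71, 69]
after = [78, 74, 79, 75, 72]

t_stat, p_value = stats.ttest_rel(after, before)
print(t_stat, p_value)
print(p_value < 0.05)  # significant difference for this (made-up) data
```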
Next topics: the F-test, ANOVA, and chi-square tests.
🔸 F-Test
Distribution: F-distribution
Formula: F = s₁² / s₂² (ratio of two sample variances)
Output: F-statistic
Used for: comparing two variances; testing overall significance in ANOVA and regression
F-Test
The F-test, step by step: types, when to use it, formulas, and examples.
The F-test is a statistical test used to compare two population variances or used as part of other
tests (like ANOVA).
It uses the F-distribution, which is a ratio of two chi-square distributions.
🧠 When to Use F-Test?
3. F-test in Regression
F distribution:
https://statisticsbyjim.com/hypothesis-testing/f-table/
A one-tailed (right-tailed) F-test example, step by step: scenario, hypotheses, calculations, and decision.
🔹 Scenario:
You are a quality control analyst comparing the consistency (variability) of two machines producing
metal rods.
You want to test if Machine A produces rods with greater variance (less consistency) than Machine B.
H₀: σ_A² = σ_B²
H₁: σ_A² > σ_B² (right-tailed)
There is enough evidence at α = 0.05 to conclude that Machine A has significantly greater variance
than Machine B.
🔹 Scenario:
You are testing if a new machine (Machine A) is more consistent (i.e., has less variance) than the old
one (Machine B).
You want to test if the variance of A is less than B — which is a left-tailed F-test.
df₁ = n₁ − 1 = 11
df₂ = n₂ − 1 = 9
Since 0.5 > 0.316 (the lower critical value) ⇒ do not reject H₀
🧾 Conclusion:
At the 0.05 level of significance, we do not have enough evidence to conclude that Machine A has
significantly less variance than Machine B.
When the exact degrees of freedom (df₁ = 11, df₂ = 9, etc.) are not listed in the F-table, there are a few options to find the critical value.
Option 1: use Python (scipy):

from scipy.stats import f

alpha = 0.05
dfn = 11  # numerator degrees of freedom
dfd = 9   # denominator degrees of freedom

# Lower-tail critical value for a left-tailed F-test at alpha
critical_value = f.ppf(alpha, dfn, dfd)
print(critical_value)
If you're using printed F-tables, you can interpolate between the two closest degrees of freedom.
Example:
You want F₀.₀₅(11, 9), but only 10 and 12 are in the table.
1. Find:
o F₀.₀₅(10, 9)
o F₀.₀₅(12, 9)
2. Interpolate (e.g., average the two values) to estimate F₀.₀₅(11, 9).
🔹 This gives a rough estimate, not exact, but is acceptable for manual testing.
This method tends to increase the critical value, which makes the test slightly more conservative
(i.e., less likely to wrongly reject the null).
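The interpolation idea can be checked against scipy's exact quantile (a quick sketch):

```python
from scipy.stats import f

# Upper-tail critical values at alpha = 0.05 for the two neighboring
# numerator dfs that a printed table would list
f10 = f.ppf(0.95, 10, 9)
f12 = f.ppf(0.95, 12, 9)

estimate = (f10 + f12) / 2   # linear interpolation for df1 = 11
exact = f.ppf(0.95, 11, 9)   # what the table would show if it had df1 = 11

print(estimate, exact)       # the estimate is very close to the exact value
```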
https://saylordotorg.github.io/text_introductory-statistics/s15-03-f-tests-for-equality-of-two-va.html
https://stats.libretexts.org/Bookshelves/Introductory_Statistics/Mostly_Harmless_Statistics_%28Webb%29/09%3A_Hypothesis_Tests_and_Confidence_Intervals_for_Two_Populations/9.04%3A_Two_Variance_or_Standard_Deviation_F-Test
A full example of a two-tailed F-test, step by step.
🔷 🎯 Goal:
To test whether two populations have different variances (could be higher or lower), using sample
data.
🔹 Scenario:
You want to check if two different packaging machines (A and B) have significantly different
variances in the weight of packets they produce.
✅ Conclusion:
At the 5% significance level, we do not have enough evidence to say that the variances of Machines
A and B are different.
How to get the upper and lower critical values for a two-tailed F-test using only an F-table, when you don't have software like Python or Excel:
🎯 Scenario
You are testing whether two variances are different using a two-tailed F-test at α = 0.05.
You have:
df₁ = 15 (numerator)
df₂ = 20 (denominator)
🔹 F-tables only give you the right-tail critical values, like F₀.₀₅ or F₀.₀₁.
Upper tail uses: α/2 = 0.025 → F₀.₀₂₅(15, 20) from the right tail
If your F-table does not have 0.025, you can interpolate between 0.05 and 0.01 or use an extended
F-table that includes 0.025.
Since the F-distribution is not symmetric, we invert the upper-tail critical value with swapped degrees of freedom:
F_lower = 1 / F₀.₀₂₅(df₂, df₁) = 1 / F₀.₀₂₅(20, 15)
✅ Final Result
If your computed F-value falls outside this range (i.e., < 0.345 or > 2.57), then you reject H0H_0.
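Both critical values, and the inversion trick, can be checked with scipy:

```python
from scipy.stats import f

alpha, df1, df2 = 0.05, 15, 20

upper = f.ppf(1 - alpha / 2, df1, df2)  # right-tail cutoff, F_0.025(15, 20)
lower = f.ppf(alpha / 2, df1, df2)      # left-tail cutoff
# The table trick: invert the upper-tail value with swapped dfs
lower_via_inversion = 1 / f.ppf(1 - alpha / 2, df2, df1)

print(lower, upper)
print(abs(lower - lower_via_inversion) < 1e-6)  # the identity holds
```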
ANOVA (Analysis of Variance) is a statistical method used when you want to compare three or more group means.
🔷 What is ANOVA?
ANOVA helps determine whether the differences between group means are statistically significant.
🔍 Purpose: Test if at least one group mean is different from the others.
Here:
F = MS_between / MS_within, where MS_between = SS_between / (k − 1) and MS_within = SS_within / (N − k), with k groups and N total observations.
🧠 Assumptions: observations are independent, each group is approximately normally distributed, and group variances are roughly equal.
✅ Decision Rule: if F > F_critical (equivalently, p < α), reject H₀: at least one group mean differs.
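One-way ANOVA in scipy, with illustrative (made-up) data for three groups:

```python
from scipy.stats import f_oneway

# Hypothetical exam scores for three teaching methods
group_a = [85, 88, 90, 86, 87]
group_b = [78, 82, 80, 79, 81]
group_c = [91, 89, 93, 90, 92]

f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f_stat, p_value)
print(p_value < 0.05)  # at least one group mean differs (for this data)
```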
The Chi-Square (χ²) Test is used for categorical data rather than numerical data (unlike Z, t, and F tests).
🔍 Step-by-Step
📊 Scenario Recap: Checking if a die is fair, i.e., all outcomes are equally likely.
The die is fair, meaning each face (1–6) has an equal probability of 1/6.
Or in general:
The die is not fair, i.e., at least one outcome appears more or less often than expected.
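The die-fairness check with scipy's goodness-of-fit test. The observed counts below are made up for illustration; a fair die over 120 rolls expects 20 of each face.

```python
from scipy.stats import chisquare

# Hypothetical counts from 120 rolls; a fair die expects 20 per face
observed = [18, 22, 19, 25, 17, 19]
expected = [20, 20, 20, 20, 20, 20]

chi2, p_value = chisquare(observed, f_exp=expected)
print(chi2, p_value)   # chi2 = sum((O - E)^2 / E) = 2.2 here
print(p_value < 0.05)  # False -> no evidence the die is unfair
```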
https://www.scribbr.com/statistics/chi-square-distribution-table/
Understanding how the Z-test, t-test, F-test, ANOVA, and Chi-Square test fit into machine learning (ML) and data science helps with feature selection, data validation, and model building.
📌 Example Scenarios:
You're testing if a numerical feature has significantly different means across two groups (e.g.,
churned vs. not churned).
🔹 When to Use:
t-test: when sample sizes are small or the population standard deviation is unknown (common in ML).
⚙️ML Integration:
Binary classification: Use t-test to check if features differ between the two classes.
Feature selection: Keep only features with significant differences between groups.
📌 Example:
You want to see if the average purchase amount differs across 3 customer segments (A, B, C).
⚙️ML Integration:
Multiclass classification: ANOVA F-test helps check if features differ significantly across
classes.
📌 Example:
You want to check if a categorical feature (like "Gender") is related to the target variable
(like "Buy or Not").
⚙️ML Integration:
Classification: Test if input features are independent from the output.
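A minimal sketch of t-test-based feature screening for binary classification. The data are synthetic and the variable names are my own: one feature whose mean shifts with the class, one pure-noise feature.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic binary target and two candidate features
y = rng.integers(0, 2, 200)
informative = rng.normal(y * 1.5, 1.0)  # mean shifts with the class
noise = rng.normal(0.0, 1.0, 200)       # unrelated to the class

for name, feature in [("informative", informative), ("noise", noise)]:
    _, p = stats.ttest_ind(feature[y == 0], feature[y == 1])
    # Keep the feature only if the class means differ significantly
    print(name, p < 0.05)
```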
Degrees of freedom are the maximum number of logically independent values that may vary in a data sample; they are calculated by subtracting one from the number of items in the sample.
Intuition: suppose you have two shirts and must wear each once across two parties. For the first party you can choose either shirt (one degree of freedom); for the second party only one shirt remains, so there is no choice left (zero degrees of freedom). In general, if N is the total number of options, the degrees of freedom are N − 1.
https://www.geeksforgeeks.org/degrees-of-freedom/