Hands on With Probability and Statistical

The document provides hands-on examples of probability and statistical tests using Python, including coin tosses, dice rolls, t-tests, z-tests, and chi-square tests. It also covers confidence intervals, A/B testing, and visualizations of data distributions. Additionally, it explains fundamental concepts of probability, such as conditional probability and distributions, along with practical applications like Monte Carlo simulations.

Uploaded by

yellampalliharshavardhan

1. Probability Hands-On

a. Coin Toss Probability

import random

# Simulate 1000 coin tosses

results = [random.choice(['Heads', 'Tails']) for _ in range(1000)]

prob_heads = results.count('Heads') / 1000

prob_tails = results.count('Tails') / 1000

print(f"Probability of Heads: {prob_heads}")

print(f"Probability of Tails: {prob_tails}")

b. Dice Roll Probability

# Simulate 1000 rolls of a fair die

rolls = [random.randint(1, 6) for _ in range(1000)]

prob_of_rolling_3 = rolls.count(3) / 1000

print(f"Probability of rolling a 3: {prob_of_rolling_3}")

2. Statistical Tests Hands-On

Dataset Example

Let’s use a dataset of heights of two groups (Group A and Group B) and perform various statistical
tests.

import numpy as np

import scipy.stats as stats

# Generate synthetic data for heights (in cm)

np.random.seed(42)

group_a = np.random.normal(loc=165, scale=10, size=30) # Mean=165, SD=10

group_b = np.random.normal(loc=170, scale=10, size=30) # Mean=170, SD=10

T-Test: Compare Means of Two Groups

A t-test determines if there’s a significant difference between the means of two groups.

Example: Independent T-Test

# Perform t-test
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"T-Statistic: {t_stat}")
print(f"P-Value: {p_value}")

if p_value < 0.05:
    print("Reject the null hypothesis: There is a significant difference between the groups.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the groups.")
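
The call above relies on SciPy's default `equal_var=True`, which is the pooled-variance Student's t-test. When the two groups may have different variances, Welch's variant is a common alternative. The following is a minimal sketch that regenerates the same synthetic heights so it runs on its own; the names `t_student`, `t_welch`, and `p_welch` are illustrative and not part of the original example.

```python
import numpy as np
import scipy.stats as stats

# Recreate the synthetic heights so this block runs on its own
np.random.seed(42)
group_a = np.random.normal(loc=165, scale=10, size=30)
group_b = np.random.normal(loc=170, scale=10, size=30)

# Student's t-test (SciPy default: equal_var=True, pooled variance)
t_student, p_student = stats.ttest_ind(group_a, group_b)

# Welch's t-test: does not assume equal population variances
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"Student: t = {t_student:.3f}, p = {p_student:.4f}")
print(f"Welch:   t = {t_welch:.3f}, p = {p_welch:.4f}")
```

Because both groups have 30 observations, the two t-statistics coincide here; only the degrees of freedom, and therefore the p-value, differ slightly.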

Z-Test: Test for Proportions

A z-test is used for testing proportions, e.g., comparing conversion rates in A/B testing.
Example: Z-Test for Proportions

# Conversion data
n1, x1 = 200, 50 # Group A: 200 samples, 50 conversions
n2, x2 = 200, 70 # Group B: 200 samples, 70 conversions

# Calculate proportions
p1 = x1 / n1
p2 = x2 / n2

# Pooled proportion
p_pool = (x1 + x2) / (n1 + n2)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# Z-statistic and p-value

z_stat = (p1 - p2) / se
p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))

print(f"Z-Statistic: {z_stat}")
print(f"P-Value: {p_value}")

if p_value < 0.05:
    print("Reject the null hypothesis: There is a significant difference between the proportions.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the proportions.")
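
A p-value says whether the proportions differ, but not by how much. One way to complement the test is a normal-approximation (Wald) confidence interval for the difference, which uses the unpooled standard error rather than the pooled one from the test. This is a sketch using only the standard library; `se_diff`, `z_crit`, and `diff` are names introduced here for illustration.

```python
import math
from statistics import NormalDist

# Same conversion data as the z-test example
n1, x1 = 200, 50
n2, x2 = 200, 70
p1, p2 = x1 / n1, x2 / n2

# Unpooled standard error (used for the interval, not for the test)
se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Two-sided 95% critical value of the standard normal (about 1.96)
z_crit = NormalDist().inv_cdf(0.975)

diff = p1 - p2
ci_lower = diff - z_crit * se_diff
ci_upper = diff + z_crit * se_diff
print(f"Difference in proportions: {diff:.3f}")
print(f"95% CI: ({ci_lower:.3f}, {ci_upper:.3f})")
```

The interval excludes zero, which agrees with the z-test rejecting the null hypothesis at the 5% level.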

Chi-Square Test: Test for Independence

The Chi-Square Test checks for independence between two categorical variables.

Example: Chi-Square Test

# Contingency table for survey data (Age Group vs. Preferred Product)

data = np.array([[50, 30], # Age < 30

[20, 80]]) # Age >= 30

chi2_stat, p_value, dof, expected = stats.chi2_contingency(data)

print(f"Chi-Square Statistic: {chi2_stat}")

print(f"P-Value: {p_value}")
print(f"Degrees of Freedom: {dof}")

print("Expected Frequencies:")

print(expected)

if p_value < 0.05:
    print("Reject the null hypothesis: There is a significant association between the variables.")
else:
    print("Fail to reject the null hypothesis: No significant association between the variables.")
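
The expected frequencies printed above come from the row and column totals: each expected count is (row total × column total) / grand total. The sketch below recomputes them by hand and checks them against SciPy. Note that `chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so `correction=False` is passed here to match the plain Pearson statistic; `expected_manual` and `chi2_manual` are illustrative names.

```python
import numpy as np
import scipy.stats as stats

data = np.array([[50, 30],   # Age < 30
                 [20, 80]])  # Age >= 30

# Expected count under independence: (row total * column total) / grand total
row_totals = data.sum(axis=1, keepdims=True)
col_totals = data.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / data.sum()

# Pearson chi-square statistic without continuity correction
chi2_manual = ((data - expected_manual) ** 2 / expected_manual).sum()

# correction=False reproduces the uncorrected statistic computed above
chi2_scipy, p_scipy, dof, expected_scipy = stats.chi2_contingency(data, correction=False)

print(expected_manual)
print(f"Manual chi-square: {chi2_manual:.4f}, SciPy (no correction): {chi2_scipy:.4f}")
```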

Confidence Intervals

Confidence intervals estimate a range for population parameters like the mean.

Example: Confidence Interval for the Mean

# Calculate the 95% confidence interval for Group A's mean

mean_a = np.mean(group_a)

se_a = stats.sem(group_a) # Standard error of the mean

ci_lower, ci_upper = stats.t.interval(0.95, len(group_a)-1, loc=mean_a, scale=se_a)

print(f"95% Confidence Interval for Group A's Mean: ({ci_lower:.2f}, {ci_upper:.2f})")
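
For reference, the interval returned by `stats.t.interval` is the sample mean plus or minus a t critical value times the standard error. A minimal sketch of that arithmetic, regenerating Group A so the block is self-contained (`t_crit`, `manual_lower`, and `manual_upper` are names introduced here):

```python
import numpy as np
import scipy.stats as stats

np.random.seed(42)
group_a = np.random.normal(loc=165, scale=10, size=30)

n = len(group_a)
mean_a = np.mean(group_a)
se_a = np.std(group_a, ddof=1) / np.sqrt(n)  # same value as stats.sem(group_a)

# Critical value of the t distribution with n - 1 degrees of freedom
t_crit = stats.t.ppf(0.975, df=n - 1)

manual_lower = mean_a - t_crit * se_a
manual_upper = mean_a + t_crit * se_a
print(f"Manual 95% CI: ({manual_lower:.2f}, {manual_upper:.2f})")
```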

A/B Testing Scenario

Let’s use the statistical tests to analyze A/B test results for a webpage:

# Conversion rates for A/B testing

a_conversions = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0] # 10 users, 6 converted

b_conversions = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1] # 10 users, 9 converted

# Perform t-test on conversion rates

t_stat, p_value = stats.ttest_ind(a_conversions, b_conversions)

print(f"T-Statistic: {t_stat}")

print(f"P-Value: {p_value}")

if p_value < 0.05:
    print("Version B performs significantly better than Version A.")
else:
    print("No significant difference between the versions.")
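
With only 10 binary outcomes per version, the t-test's normal approximation is rough. Fisher's exact test is one commonly used alternative for small 2x2 tables; it is swapped in here as a cross-check and is not the method used in the original example. The names `table`, `odds_ratio`, and `p_fisher` are illustrative.

```python
import scipy.stats as stats

a_conversions = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]  # 6 of 10 converted
b_conversions = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]  # 9 of 10 converted

# 2x2 table: rows = version, columns = (converted, not converted)
table = [
    [sum(a_conversions), len(a_conversions) - sum(a_conversions)],
    [sum(b_conversions), len(b_conversions) - sum(b_conversions)],
]

odds_ratio, p_fisher = stats.fisher_exact(table, alternative="two-sided")
print(f"Fisher's exact test p-value: {p_fisher:.4f}")
```

The p-value is well above 0.05 for this table, so a sample this small gives no grounds to declare a winner.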

Visualizing Results

Histograms

import matplotlib.pyplot as plt

plt.hist(group_a, alpha=0.5, label="Group A", bins=10)

plt.hist(group_b, alpha=0.5, label="Group B", bins=10)

plt.legend()

plt.title("Height Distributions of Two Groups")

plt.show()

Boxplots

import seaborn as sns

import pandas as pd

# Combine data into a single DataFrame

df = pd.DataFrame({'Height': np.concatenate([group_a, group_b]),
                   'Group': ['A'] * 30 + ['B'] * 30})

sns.boxplot(x='Group', y='Height', data=df)

plt.title("Comparison of Heights Across Groups")

plt.show()

***************************&&&&&&&&&&&&**************************************

1. Basics of Probability: Definition

The probability of an event E, when all outcomes are equally likely, is given by:

P(E) = Number of favorable outcomes / Total number of outcomes

Probability of Multiple Independent Events

The probability of two independent events A and B both occurring is:

P(A∩B)=P(A)×P(B)

Example: Rolling a 6 and Tossing Heads

# Probabilities of individual events

prob_heads = 0.5

prob_six = 1 / 6

# Combined probability

prob_heads_and_six = prob_heads * prob_six

print(f"Probability of rolling a 6 and getting Heads: {prob_heads_and_six:.2f}")
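
The product rule can be checked empirically by simulating both events together. This is a small sketch using the standard `random` module; the seed and the trial count are arbitrary choices made here so the run is repeatable.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
num_trials = 100_000

both = 0
for _ in range(num_trials):
    coin = random.choice(["Heads", "Tails"])
    die = random.randint(1, 6)
    if coin == "Heads" and die == 6:
        both += 1

estimate = both / num_trials
print(f"Simulated: {estimate:.4f}, theoretical: {1 / 12:.4f}")
```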

Conditional Probability

The probability of A given B is:

P(A∣B)=P(A∩B)/P(B)

Example: Cards in a Deck

What is the probability of drawing a king, given that it’s a face card?

• Total cards: 52

• Face cards (Jack, Queen, King): 12

• Kings: 4

# Probabilities

prob_face = 12 / 52

prob_king_and_face = 4 / 52

# Conditional probability

prob_king_given_face = prob_king_and_face / prob_face

print(f"Probability of drawing a king given it's a face card: {prob_king_given_face:.2f}")
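
The same answer can be reached by counting cards directly instead of dividing probabilities. The sketch below builds a standard 52-card deck and uses `fractions.Fraction` to keep the result exact; the rank and suit labels are arbitrary choices made for this illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

face_cards = [card for card in deck if card[0] in {"J", "Q", "K"}]
kings_among_faces = [card for card in face_cards if card[0] == "K"]

# P(King | Face) = |King and Face| / |Face|
prob_king_given_face = Fraction(len(kings_among_faces), len(face_cards))
print(prob_king_given_face)  # 1/3
```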

Probability Distributions

Simulating a Uniform Distribution

Every outcome has the same probability (e.g., rolling a fair die).

import numpy as np

# Simulate 1000 rolls of a die

rolls = np.random.randint(1, 7, size=1000)

# Calculate probabilities

for i in range(1, 7):
    prob = np.sum(rolls == i) / len(rolls)
    print(f"Probability of rolling a {i}: {prob:.2f}")

Simulating a Normal Distribution

A normal distribution is common in data science (e.g., heights, weights).

import matplotlib.pyplot as plt

# Simulate a normal distribution

data = np.random.normal(loc=50, scale=10, size=1000) # Mean=50, SD=10

# Plot the distribution

plt.hist(data, bins=30, density=True, alpha=0.6, color='g')

plt.title("Normal Distribution")

plt.xlabel("Value")

plt.ylabel("Density")

plt.show()

# Calculate summary statistics

mean = np.mean(data)

std_dev = np.std(data)

print(f"Mean: {mean:.2f}, Standard Deviation: {std_dev:.2f}")

Scipy for Probability Calculations

Cumulative Probability Example: Normal Distribution

from scipy.stats import norm

# Probability that a value is less than 60 (mean=50, sd=10)

prob = norm.cdf(60, loc=50, scale=10)

print(f"Probability of value < 60: {prob:.2f}")
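
The cumulative distribution function also gives interval and upper-tail probabilities. A short sketch under the same assumptions (mean 50, SD 10); `prob_between` and `prob_above` are names introduced here.

```python
from scipy.stats import norm

mean, sd = 50, 10

# P(40 < X < 60) = F(60) - F(40), where F is the cumulative distribution function
prob_between = norm.cdf(60, loc=mean, scale=sd) - norm.cdf(40, loc=mean, scale=sd)

# P(X > 60) via the survival function, which equals 1 - F(60)
prob_above = norm.sf(60, loc=mean, scale=sd)

print(f"P(40 < X < 60): {prob_between:.4f}")
print(f"P(X > 60): {prob_above:.4f}")
```

The first value is the familiar "about 68% within one standard deviation" rule.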

Simulating Binomial Distribution

The binomial distribution models the number of successes in n independent trials, each with the same success probability p.

Example: Tossing a Coin 10 Times

from scipy.stats import binom

# Probability of getting exactly 6 heads in 10 tosses

n, p = 10, 0.5 # 10 trials, probability of heads = 0.5

prob_6_heads = binom.pmf(6, n, p)

print(f"Probability of exactly 6 heads in 10 tosses: {prob_6_heads:.2f}")
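
The value returned by `binom.pmf` can be reproduced from the binomial formula P(X = k) = C(n, k) × p^k × (1 − p)^(n − k). A minimal sketch using only the standard library (`k` and `prob_manual` are illustrative names):

```python
import math

n, p, k = 10, 0.5, 6

# Binomial pmf: C(n, k) * p^k * (1 - p)^(n - k)
prob_manual = math.comb(n, k) * p**k * (1 - p) ** (n - k)
print(f"P(X = {k}) = {prob_manual:.4f}")  # 210 / 1024, about 0.2051
```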

Monte Carlo Simulation

Monte Carlo simulations are used to estimate probabilities via random sampling.

Example: Estimate the Value of Pi

# Monte Carlo Simulation to Estimate Pi

num_points = 10000

inside_circle = 0

for _ in range(num_points):
    x, y = np.random.uniform(-1, 1, size=2)
    if x**2 + y**2 <= 1:
        inside_circle += 1

pi_estimate = (inside_circle / num_points) * 4

print(f"Estimated value of Pi: {pi_estimate:.4f}")
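
The loop draws one point per iteration, which is easy to read but slow for large samples. A vectorized variant is sketched below, assuming NumPy's `default_rng` generator; the seed and the larger sample size are arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
num_points = 100_000

# Draw all points at once in the square [-1, 1] x [-1, 1]
points = rng.uniform(-1, 1, size=(num_points, 2))

# A point lies inside the unit circle when x^2 + y^2 <= 1
inside = (points ** 2).sum(axis=1) <= 1

# Area ratio circle/square = pi/4, so pi is about 4 * fraction inside
pi_estimate = 4 * inside.mean()
print(f"Estimated value of Pi: {pi_estimate:.4f}")
```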

*********************************&&&&&&&&&&&&&&&&&&&&&&&********************

Conducting Hypothesis Tests with Scipy and Statsmodels

Hypothesis testing is fundamental in data science for making inferences about populations based on
sample data. Here’s how you can perform hypothesis tests using Scipy and Statsmodels.

1. Common Hypothesis Tests

1. T-Test: Compare the means of two groups.

2. Z-Test: Compare proportions or means for large sample sizes.

3. Chi-Square Test: Test independence between categorical variables.

2. Setup: Synthetic Data

import numpy as np

import pandas as pd

import scipy.stats as stats

import statsmodels.api as sm

from statsmodels.stats.weightstats import ztest

3. T-Tests with Scipy

Independent T-Test

Used to compare the means of two independent groups.

Example: Comparing heights of two groups

# Generate synthetic data

np.random.seed(42)

group_a = np.random.normal(loc=165, scale=10, size=30) # Mean=165, SD=10

group_b = np.random.normal(loc=170, scale=10, size=30) # Mean=170, SD=10

# Perform independent t-test

t_stat, p_value = stats.ttest_ind(group_a, group_b)

print(f"T-Statistic: {t_stat}")

print(f"P-Value: {p_value}")

if p_value < 0.05:
    print("Reject the null hypothesis: Means are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference between means.")

Paired T-Test

Used when comparing two related samples, such as pre-test and post-test scores.

Example: Before and After Scores

# Generate synthetic data

before = np.random.normal(loc=75, scale=5, size=30)

after = before + np.random.normal(loc=2, scale=2, size=30) # Improvement

# Perform paired t-test

t_stat, p_value = stats.ttest_rel(before, after)

print(f"T-Statistic: {t_stat}")

print(f"P-Value: {p_value}")

if p_value < 0.05:
    print("Reject the null hypothesis: There is a significant change.")
else:
    print("Fail to reject the null hypothesis: No significant change.")
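
A paired t-test is equivalent to a one-sample t-test on the per-subject differences, tested against zero. The sketch below shows the two calls side by side. It seeds the generator itself, so the numbers will not match a run of the example above; `differences`, `t_diff`, and `p_diff` are names introduced here.

```python
import numpy as np
import scipy.stats as stats

np.random.seed(42)
before = np.random.normal(loc=75, scale=5, size=30)
after = before + np.random.normal(loc=2, scale=2, size=30)

# Paired t-test on the two related samples
t_paired, p_paired = stats.ttest_rel(before, after)

# Equivalent one-sample t-test on the per-subject differences
differences = before - after
t_diff, p_diff = stats.ttest_1samp(differences, popmean=0)

print(f"Paired:      t = {t_paired:.4f}, p = {p_paired:.4g}")
print(f"Differences: t = {t_diff:.4f}, p = {p_diff:.4g}")
```

The t-statistic is negative because the differences are computed as before minus after, and the scores were constructed to improve.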

4. Z-Test with Statsmodels

One-Sample Z-Test

Used for testing whether a sample mean is significantly different from a population mean.

Example: Testing Sample Mean Against Population Mean

# Generate synthetic data

data = np.random.normal(loc=100, scale=15, size=50)

# Population mean

population_mean = 105

# Perform one-sample z-test

z_stat, p_value = ztest(data, value=population_mean)

print(f"Z-Statistic: {z_stat}")

print(f"P-Value: {p_value}")

if p_value < 0.05:
    print("Reject the null hypothesis: Sample mean is significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference.")

Two-Sample Z-Test

Used for comparing two sample means.

Example: Comparing Means of Two Groups

# Generate synthetic data

group_a = np.random.normal(loc=100, scale=10, size=50)

group_b = np.random.normal(loc=110, scale=10, size=50)

# Perform two-sample z-test

z_stat, p_value = ztest(group_a, group_b)

print(f"Z-Statistic: {z_stat}")

print(f"P-Value: {p_value}")

if p_value < 0.05:
    print("Reject the null hypothesis: Means are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference between means.")

5. Chi-Square Test with Scipy

The Chi-Square Test is used for testing independence between categorical variables.

Example: Testing Independence

# Contingency table

data = np.array([[50, 30], [20, 80]]) # Rows: Age groups, Columns: Product preferences

# Perform chi-square test

chi2_stat, p_value, dof, expected = stats.chi2_contingency(data)

print(f"Chi-Square Statistic: {chi2_stat}")

print(f"P-Value: {p_value}")

print(f"Degrees of Freedom: {dof}")

print("Expected Frequencies:")

print(expected)

if p_value < 0.05:
    print("Reject the null hypothesis: Variables are dependent.")
else:
    print("Fail to reject the null hypothesis: No evidence that the variables are dependent.")
