
Hands-On with Probability and Statistics

1. Probability Hands-On

a. Coin Toss Probability

import random

# Simulate 1000 coin tosses

results = [random.choice(['Heads', 'Tails']) for _ in range(1000)]

prob_heads = results.count('Heads') / 1000

prob_tails = results.count('Tails') / 1000

print(f"Probability of Heads: {prob_heads}")

print(f"Probability of Tails: {prob_tails}")

b. Dice Roll Probability

# Simulate 1000 rolls of a fair die

rolls = [random.randint(1, 6) for _ in range(1000)]

prob_of_rolling_3 = rolls.count(3) / 1000

print(f"Probability of rolling a 3: {prob_of_rolling_3}")

2. Statistical Tests Hands-On: Dataset Example

Let’s use a dataset of heights of two groups (Group A and Group B) and perform various statistical
tests.

import numpy as np

import scipy.stats as stats

# Generate synthetic data for heights (in cm)

np.random.seed(42)

group_a = np.random.normal(loc=165, scale=10, size=30) # Mean=165, SD=10

group_b = np.random.normal(loc=170, scale=10, size=30) # Mean=170, SD=10

T-Test: Compare Means of Two Groups

A t-test determines if there’s a significant difference between the means of two groups.

Example: Independent T-Test

# Perform t-test
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"T-Statistic: {t_stat}")
print(f"P-Value: {p_value}")

if p_value < 0.05:
    print("Reject the null hypothesis: There is a significant difference between the groups.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the groups.")

Z-Test: Test for Proportions

A z-test is used for testing proportions, e.g., comparing conversion rates in A/B testing.

Example: Z-Test for Proportions

# Conversion data
n1, x1 = 200, 50 # Group A: 200 samples, 50 conversions
n2, x2 = 200, 70 # Group B: 200 samples, 70 conversions

# Calculate proportions
p1 = x1 / n1
p2 = x2 / n2

# Pooled proportion
p_pool = (x1 + x2) / (n1 + n2)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# Z-statistic and p-value
z_stat = (p1 - p2) / se
p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))

print(f"Z-Statistic: {z_stat}")
print(f"P-Value: {p_value}")

if p_value < 0.05:
    print("Reject the null hypothesis: There is a significant difference between the proportions.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the proportions.")

Chi-Square Test: Test for Independence

The Chi-Square Test checks for independence between two categorical variables.

Example: Chi-Square Test

# Contingency table for survey data (Age Group vs. Preferred Product)

data = np.array([[50, 30],   # Age < 30
                 [20, 80]])  # Age >= 30

chi2_stat, p_value, dof, expected = stats.chi2_contingency(data)

print(f"Chi-Square Statistic: {chi2_stat}")

print(f"P-Value: {p_value}")
print(f"Degrees of Freedom: {dof}")

print("Expected Frequencies:")

print(expected)

if p_value < 0.05:
    print("Reject the null hypothesis: There is a significant association between the variables.")
else:
    print("Fail to reject the null hypothesis: No significant association between the variables.")

Confidence Intervals

Confidence intervals estimate a range for population parameters like the mean.

Example: Confidence Interval for the Mean

# Calculate the 95% confidence interval for Group A's mean

mean_a = np.mean(group_a)

se_a = stats.sem(group_a) # Standard error of the mean

ci_lower, ci_upper = stats.t.interval(0.95, len(group_a)-1, loc=mean_a, scale=se_a)

print(f"95% Confidence Interval for Group A's Mean: ({ci_lower:.2f}, {ci_upper:.2f})")

A/B Testing Scenario

Let’s use the statistical tests to analyze A/B test results for a webpage:

# Conversion rates for A/B testing

a_conversions = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0] # 10 users, 6 converted

b_conversions = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1] # 10 users, 9 converted

# Perform t-test on conversion rates

t_stat, p_value = stats.ttest_ind(a_conversions, b_conversions)

print(f"T-Statistic: {t_stat}")

print(f"P-Value: {p_value}")

if p_value < 0.05:
    print("Version B performs significantly better than Version A.")
else:
    print("No significant difference between the versions.")


Visualizing Results

Histograms

import matplotlib.pyplot as plt

plt.hist(group_a, alpha=0.5, label="Group A", bins=10)

plt.hist(group_b, alpha=0.5, label="Group B", bins=10)

plt.legend()

plt.title("Height Distributions of Two Groups")

plt.show()

Boxplots

import seaborn as sns

import pandas as pd

# Combine data into a single DataFrame

df = pd.DataFrame({'Height': np.concatenate([group_a, group_b]),
                   'Group': ['A'] * 30 + ['B'] * 30})

sns.boxplot(x='Group', y='Height', data=df)

plt.title("Comparison of Heights Across Groups")

plt.show()

***************************&&&&&&&&&&&&**************************************

1. Basics of Probability: Definition

The probability of an event E is given by: P(E) = (Number of favorable outcomes) / (Total number of outcomes)
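
For instance, an illustrative calculation (numbers chosen for the example, using a standard 52-card deck):

# Classical probability: favorable outcomes / total outcomes
favorable = 4   # aces in a standard deck
total = 52      # cards in the deck
prob_ace = favorable / total
print(f"Probability of drawing an ace: {prob_ace:.3f}")  # about 0.077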

Probability of Multiple Independent Events

The probability of two independent events A and B both occurring is:

P(A ∩ B) = P(A) × P(B)

Example: Rolling a 6 and Tossing Heads

# Probabilities of individual events

prob_heads = 0.5

prob_six = 1 / 6

# Combined probability

prob_heads_and_six = prob_heads * prob_six

print(f"Probability of rolling a 6 and getting Heads: {prob_heads_and_six:.2f}")


Conditional Probability

The probability of A given B is:

P(A | B) = P(A ∩ B) / P(B)

Example: Cards in a Deck. What is the probability of drawing a king, given that it's a face card?

• Total cards: 52

• Face cards (Jack, Queen, King): 12

• Kings: 4

# Probabilities

prob_face = 12 / 52

prob_king_and_face = 4 / 52

# Conditional probability

prob_king_given_face = prob_king_and_face / prob_face

print(f"Probability of drawing a king given it's a face card: {prob_king_given_face:.2f}")

Probability Distributions

Simulating a Uniform Distribution

The probability of all outcomes is equal (e.g., rolling a fair die)

import numpy as np

# Simulate 1000 rolls of a die

rolls = np.random.randint(1, 7, size=1000)

# Calculate probabilities

for i in range(1, 7):
    prob = np.sum(rolls == i) / len(rolls)
    print(f"Probability of rolling a {i}: {prob:.2f}")

Simulating a Normal Distribution

A normal distribution is common in data science (e.g., heights, weights).

import matplotlib.pyplot as plt

# Simulate a normal distribution

data = np.random.normal(loc=50, scale=10, size=1000) # Mean=50, SD=10

# Plot the distribution

plt.hist(data, bins=30, density=True, alpha=0.6, color='g')


plt.title("Normal Distribution")

plt.xlabel("Value")

plt.ylabel("Density")

plt.show()

# Calculate probabilities

mean = np.mean(data)

std_dev = np.std(data)

print(f"Mean: {mean:.2f}, Standard Deviation: {std_dev:.2f}")

Scipy for Probability Calculations

Cumulative Probability Example: Normal Distribution

from scipy.stats import norm

# Probability that a value is less than 60 (mean=50, sd=10)

prob = norm.cdf(60, loc=50, scale=10)

print(f"Probability of value < 60: {prob:.2f}")

Simulating Binomial Distribution

The binomial distribution models the number of successes in n trials.

Example: Tossing a Coin 10 Times

from scipy.stats import binom

# Probability of getting exactly 6 heads in 10 tosses

n, p = 10, 0.5 # 10 trials, probability of heads = 0.5

prob_6_heads = binom.pmf(6, n, p)

print(f"Probability of exactly 6 heads in 10 tosses: {prob_6_heads:.2f}")

Monte Carlo Simulation

Monte Carlo simulations are used to estimate probabilities via random sampling.

Example: Estimate Probability of Pi

# Monte Carlo Simulation to Estimate Pi

num_points = 10000

inside_circle = 0

for _ in range(num_points):
    x, y = np.random.uniform(-1, 1, size=2)
    if x**2 + y**2 <= 1:
        inside_circle += 1

pi_estimate = (inside_circle / num_points) * 4

print(f"Estimated value of Pi: {pi_estimate:.4f}")

*********************************&&&&&&&&&&&&&&&&&&&&&&&********************

Conducting Hypothesis Tests with Scipy and Statsmodels

Hypothesis testing is fundamental in data science for making inferences about populations based on
sample data. Here’s how you can perform hypothesis tests using Scipy and Statsmodels.

1. Common Hypothesis Tests

1. T-Test: Compare the means of two groups.

2. Z-Test: Compare proportions or means for large sample sizes.

3. Chi-Square Test: Test independence between categorical variables.

2. Setup: Synthetic Data

import numpy as np

import pandas as pd

import scipy.stats as stats

import statsmodels.api as sm

from statsmodels.stats.weightstats import ztest

3. T-Tests with Scipy

Independent T-Test: Used to compare the means of two independent groups.

Example: Comparing heights of two groups

# Generate synthetic data

np.random.seed(42)

group_a = np.random.normal(loc=165, scale=10, size=30) # Mean=165, SD=10

group_b = np.random.normal(loc=170, scale=10, size=30) # Mean=170, SD=10

# Perform independent t-test

t_stat, p_value = stats.ttest_ind(group_a, group_b)

print(f"T-Statistic: {t_stat}")

print(f"P-Value: {p_value}")

if p_value < 0.05:
    print("Reject the null hypothesis: Means are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference between means.")

Paired T-Test

Used when comparing two related samples, such as pre-test and post-test scores.

Example: Before and After Scores

# Generate synthetic data

before = np.random.normal(loc=75, scale=5, size=30)

after = before + np.random.normal(loc=2, scale=2, size=30) # Improvement

# Perform paired t-test

t_stat, p_value = stats.ttest_rel(before, after)

print(f"T-Statistic: {t_stat}")

print(f"P-Value: {p_value}")

if p_value < 0.05:
    print("Reject the null hypothesis: There is a significant change.")
else:
    print("Fail to reject the null hypothesis: No significant change.")

4. Z-Test with Statsmodels

One-Sample Z-Test: Used to test whether a sample mean is significantly different from a population mean.

Example: Testing Sample Mean Against Population Mean

# Generate synthetic data

data = np.random.normal(loc=100, scale=15, size=50)

# Population mean

population_mean = 105

# Perform one-sample z-test

z_stat, p_value = ztest(data, value=population_mean)

print(f"Z-Statistic: {z_stat}")

print(f"P-Value: {p_value}")

if p_value < 0.05:
    print("Reject the null hypothesis: Sample mean is significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference.")

Two-Sample Z-Test: Used to compare two sample means.

Example: Comparing Means of Two Groups

# Generate synthetic data

group_a = np.random.normal(loc=100, scale=10, size=50)

group_b = np.random.normal(loc=110, scale=10, size=50)

# Perform two-sample z-test

z_stat, p_value = ztest(group_a, group_b)

print(f"Z-Statistic: {z_stat}")

print(f"P-Value: {p_value}")

if p_value < 0.05:
    print("Reject the null hypothesis: Means are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference between means.")

5. Chi-Square Test with Scipy

The Chi-Square Test is used to test independence between categorical variables.

Example: Testing Independence

# Contingency table

data = np.array([[50, 30], [20, 80]]) # Rows: Age groups, Columns: Product preferences

# Perform chi-square test

chi2_stat, p_value, dof, expected = stats.chi2_contingency(data)

print(f"Chi-Square Statistic: {chi2_stat}")

print(f"P-Value: {p_value}")

print(f"Degrees of Freedom: {dof}")

print("Expected Frequencies:")

print(expected)

if p_value < 0.05:
    print("Reject the null hypothesis: Variables are dependent.")
else:
    print("Fail to reject the null hypothesis: Variables are independent.")
