
Statistics Cheat Sheet

The document provides a comprehensive list of statistical functions and methods for data analysis, including calculations for mean, median, mode, variance, and standard deviation. It also covers probability distributions, hypothesis testing, regression analysis, and sampling techniques. Additionally, it features visualization tools for data representation and analysis.

Uploaded by syedyunusds
Copyright © All Rights Reserved

Tajamul Khan

@Tajamulkhann
mean(data): Calculate the average of a dataset.
median(data): Find the middle value of sorted data.
mode(data): Identify the most frequently occurring value.
variance(data): Measure how far data points spread from the mean (based on squared deviations).
std(data): Calculate the standard deviation (square root of the variance) to quantify data dispersion.
min(data): Determine the smallest value in the dataset.
max(data): Identify the largest value in the dataset.
quantile(data, q): Compute the q-th quantile of the data (q between 0 and 1, e.g. 0.5 for the median).
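Most names above match Python's standard `statistics` module (`mean`, `median`, `mode`, `variance`) and the built-ins `min` and `max`. The names `std` and `quantile` follow NumPy. The following is a minimal standard-library sketch on an illustrative dataset (not from the cheat sheet). It shows what each call returns and where the sample and population formulas differ.

```python
import statistics

# Illustrative dataset (not from the cheat sheet)
data = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(data),            # 40 / 8 = 5
    "median": statistics.median(data),        # average of the 4th and 5th sorted values
    "mode": statistics.mode(data),            # 4 occurs three times
    "variance": statistics.variance(data),    # sample variance: squared deviations / (n - 1)
    "pvariance": statistics.pvariance(data),  # population variance: squared deviations / n
    "stdev": statistics.stdev(data),          # square root of the sample variance
    "pstdev": statistics.pstdev(data),        # square root of the population variance
    "min": min(data),
    "max": max(data),
    # Q1, Q2, Q3 with linear interpolation (the same rule as NumPy's default quantile)
    "quartiles": statistics.quantiles(data, n=4, method="inclusive"),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

NumPy's `np.std` defaults to the population formula (`ddof=0`), while `statistics.stdev` divides by n - 1. The two can therefore disagree noticeably on small samples.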

binom.pmf(k, n, p): Probability of k successes in n Bernoulli trials with success probability p.
poisson.pmf(k, λ): Probability of k events in an interval when events occur at an average rate of λ per interval.
norm.pdf(x, μ, σ): Probability density of the normal distribution at x.
norm.cdf(x, μ, σ): Cumulative probability P(X ≤ x) for a normal distribution.
uniform.pdf(x, loc=a, scale=b - a): Probability density of the uniform distribution on [a, b] at x.
expon.pdf(x, scale=1/λ): Probability density of the exponential distribution with rate λ.
bernoulli.pmf(k, p): Probability mass of a single Bernoulli trial (k = 0 or 1).
beta.pdf(x, α, β): Probability density of the Beta distribution.
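The pmf/pdf/cdf entries above follow `scipy.stats` naming. The standard-library sketch below writes out the textbook formulas directly, as a cross-check on what each parameter means. The helper names (`binom_pmf`, `poisson_pmf`, and so on) are hypothetical and not part of any library. The sketch also makes the parameterization explicit: SciPy's `expon` takes `scale = 1/λ`, and its `uniform` takes `loc = a`, `scale = b - a`.

```python
import math
from statistics import NormalDist

def binom_pmf(k, n, p):
    # C(n, k) * p^k * (1 - p)^(n - k)
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    # lam^k * e^(-lam) / k!, with lam the average number of events per interval
    return lam**k * math.exp(-lam) / math.factorial(k)

def bernoulli_pmf(k, p):
    # A Bernoulli trial is a binomial with n = 1
    return binom_pmf(k, 1, p)

def uniform_pdf(x, a, b):
    # Constant density 1 / (b - a) on [a, b], zero elsewhere
    return 1 / (b - a) if a <= x <= b else 0.0

def expon_pdf(x, lam):
    # lam * e^(-lam * x) for x >= 0 (SciPy equivalent: expon.pdf(x, scale=1/lam))
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def beta_pdf(x, a, b):
    # x^(a-1) (1-x)^(b-1) / B(a, b), where B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)
    if not 0 <= x <= 1:
        return 0.0
    beta_fn = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_fn

std_normal = NormalDist(mu=0, sigma=1)

print(binom_pmf(3, 10, 0.5))   # 120 / 1024 = 0.1171875
print(poisson_pmf(2, 3.0))     # about 0.224
print(std_normal.pdf(0))       # 1 / sqrt(2 * pi), about 0.399
print(std_normal.cdf(1.96))    # about 0.975
print(uniform_pdf(0.5, 0, 2))  # 0.5
print(expon_pdf(1.0, 2.0))     # 2 * e^-2, about 0.271
print(beta_pdf(0.5, 2, 2))     # density 6x(1 - x) at x = 0.5, about 1.5
```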

ttest_ind(data1, data2): Perform an independent t-test.
ttest_rel(data1, data2): Perform a paired t-test.
anova(data1, data2, data3): Conduct ANOVA to compare multiple means.
zscore(data): Standardize data using z-scores.
chi2_contingency(table): Test independence using a chi-square test.
confidence_interval(mean, std, n, confidence=0.95): Compute confidence intervals.
pearsonr(x, y): Calculate Pearson correlation coefficient.
spearmanr(x, y): Compute Spearman rank correlation.
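The standard-library sketch below computes the core statistic behind `zscore`, `pearsonr`, `spearmanr` and `chi2_contingency`. It does not compute their p-values. The helper names are hypothetical. The z-score uses the population standard deviation, matching the default `ddof=0` of `scipy.stats.zscore`. The chi-square statistic omits the Yates continuity correction that `chi2_contingency` applies by default when there is one degree of freedom.

```python
import math
import statistics

def zscore(data):
    # (x - mean) / population standard deviation
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return [(x - mu) / sigma for x in data]

def pearson_r(x, y):
    # Covariance term divided by the product of the spreads
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    # 1-based ranks; tied values share the average of their positions
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    # Spearman's rho is Pearson's r computed on the ranks
    return pearson_r(ranks(x), ranks(y))

def chi2_statistic(table):
    # Sum of (observed - expected)^2 / expected, expected = row total * column total / grand total
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(zscore([2, 4, 4, 4, 5, 5, 7, 9]))      # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(pearson_r(x, y))                       # 6 / sqrt(60), about 0.775
print(spearman_rho(x, y))                    # 7 / sqrt(90), about 0.738
print(chi2_statistic([[10, 20], [20, 10]]))  # statistic about 6.667 with 1 degree of freedom
```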

p_value(data, null_hypothesis): Calculate the p-value to test a null hypothesis.
z_test(x̄, μ, σ, n): Perform a z-test for a population mean.
one_sample_ttest(data, μ0): Compare a sample mean to a population mean.
two_sample_ttest(data1, data2): Compare the means of two independent samples.
wilcoxon(data1, data2): Non-parametric test for paired samples.
f_oneway(data1, data2, data3): One-way ANOVA for multiple groups.
mannwhitneyu(data1, data2): Non-parametric test for differences between two independent samples.
power_analysis(effect_size, n, α): Determine the required sample size or the power of a test.
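The z-test and power entries reduce to closed-form formulas on the standard normal distribution. The standard-library sketch below uses hypothetical helper names and computes three things:

- a two-sided z-test;
- the one-sample t statistic (its p-value needs the t distribution, for example via `scipy.stats.ttest_1samp`);
- the approximate sample size for a two-sided one-sample z-test with standardized effect size d.

```python
import math
from statistics import NormalDist, fmean, stdev

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_test(x_bar, mu0, sigma, n):
    # z = (x_bar - mu0) / (sigma / sqrt(n)), with sigma the known population standard deviation
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    p_two_sided = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_two_sided

def one_sample_t_statistic(data, mu0):
    # t = (sample mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom
    n = len(data)
    t = (fmean(data) - mu0) / (stdev(data) / math.sqrt(n))
    return t, n - 1

def z_test_sample_size(effect_size, alpha=0.05, power=0.80):
    # n = ((z_{1 - alpha/2} + z_{power}) / d)^2, rounded up; d = (mu1 - mu0) / sigma
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / effect_size) ** 2)

print(z_test(103, 100, 15, 36))                                # z = 1.2, p about 0.23
print(one_sample_t_statistic([5.1, 4.9, 5.3, 5.0, 5.2], 5.0))  # t about 1.414, df = 4
print(z_test_sample_size(0.5))                                 # 32
```

The sample-size formula ignores the small probability of rejecting in the opposite tail, so it is an approximation. Dedicated power-analysis tools may give slightly different answers, especially for t-tests.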

corrcoef(x, y): Calculate the correlation matrix.
linregress(x, y): Perform simple linear regression.
polyfit(x, y, deg=1): Fit a polynomial regression model (deg=1 fits a straight line).
ols(formula, data).fit(): Fit an ordinary least squares regression.
residuals(y_actual, y_pred): Calculate regression residuals (y_actual - y_pred).
r_squared(y_actual, y_pred): Compute the coefficient of determination (R²).
logit(formula, data).fit(): Fit a logistic regression model.
predict(model, new_data): Use a trained model for predictions.
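With a single predictor, `linregress`, `polyfit(x, y, deg=1)` and `ols` all estimate the same least-squares line. The sketch below uses hypothetical helper names and illustrative data. It computes the slope and intercept in closed form, then the residuals and R² as described in the list.

```python
from statistics import fmean

def fit_line(x, y):
    # Least-squares slope = Sxy / Sxx; the fitted line passes through (mean of x, mean of y)
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

def residuals(y_actual, y_pred):
    # Observed minus predicted, one value per point
    return [a - p for a, p in zip(y_actual, y_pred)]

def r_squared(y_actual, y_pred):
    # R² = 1 - SS_res / SS_tot
    my = fmean(y_actual)
    ss_res = sum(r ** 2 for r in residuals(y_actual, y_pred))
    ss_tot = sum((a - my) ** 2 for a in y_actual)
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = fit_line(x, y)
y_pred = [intercept + slope * xi for xi in x]
print(slope, intercept)          # about 0.6 and 2.2
print(residuals(y, y_pred))      # about [-0.8, 0.6, 1.0, -0.6, -0.2]
print(r_squared(y, y_pred))      # about 0.6
```

For a single predictor, R² equals the square of Pearson's r. Here that is about 0.775² ≈ 0.6, which is a quick sanity check on either computation.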

sns.boxplot(data): Identify outliers and visualize distributions.
sns.histplot(data, kde=True): Visualize a data distribution with a KDE overlay.
sns.scatterplot(x=x, y=y): Analyze relationships between two variables.
sns.heatmap(corr_matrix, annot=True): Display a correlation matrix with annotations.
sns.pairplot(data): Show pairwise relationships and trends.
sns.violinplot(x=x, y=y): Display distributions for categorical comparisons.
plt.errorbar(x, y, yerr): Plot data points with error bars.
sns.lmplot(data=df, x="x", y="y"): Plot a regression line with scatter points (x and y are column names in df).
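By default, `sns.boxplot` draws whiskers out to 1.5 times the interquartile range (its `whis` parameter) and marks points beyond them as outliers. The sketch below implements that rule with the standard library, using a hypothetical helper name and illustrative data. It assumes linear-interpolation quartiles, as in NumPy's default.

```python
import statistics

def iqr_outliers(data, whis=1.5):
    # Fences at Q1 - whis * IQR and Q3 + whis * IQR; points outside them are flagged
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    outliers = [x for x in data if x < low or x > high]
    return outliers, (low, high)

data = [2, 4, 4, 4, 5, 5, 7, 9, 30]
print(iqr_outliers(data))  # ([30], (-0.5, 11.5))
```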

random.sample(population, k): Draw a random sample of size k without replacement.
random.choices(population, weights=weights, k=k): Perform weighted sampling with replacement.
stratified_sample(data, strata, size): Generate stratified random samples.
bootstrapping(data, n_samples): Create bootstrap samples for confidence intervals.
systematic_sample(data, interval): Select samples systematically (every interval-th item).
cluster_sample(data, clusters, size): Perform cluster sampling.
oversampling(data, minority_class): Handle imbalanced datasets by increasing the number of minority-class samples.
undersampling(data, majority_class): Reduce the majority class size for balance.
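`random.sample` and `random.choices` are real standard-library calls, while the other entries are generic names. The sketch below builds three of those techniques from the two real calls. Its helper names and signatures are hypothetical and differ slightly from the list: it takes a stratum key function and a sampling fraction rather than `strata` and `size`. A seeded `random.Random` keeps the results reproducible.

```python
import random
import statistics

def stratified_sample(records, key, fraction, rng):
    # Group records by stratum, then draw the same fraction (at least one) from each group
    strata = {}
    for rec in records:
        strata.setdefault(key(rec), []).append(rec)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))  # without replacement within the stratum
    return sample

def systematic_sample(records, interval, rng):
    # Random start in [0, interval), then every interval-th record
    start = rng.randrange(interval)
    return records[start::interval]

def bootstrap_ci(data, stat, n_samples, confidence, rng):
    # Percentile bootstrap: recompute stat on resamples drawn with replacement,
    # then keep the central `confidence` share of the sorted estimates
    estimates = sorted(stat(rng.choices(data, k=len(data))) for _ in range(n_samples))
    lo_idx = int((1 - confidence) / 2 * n_samples)
    hi_idx = int((1 + confidence) / 2 * n_samples) - 1
    return estimates[lo_idx], estimates[hi_idx]

rng = random.Random(42)
records = [("A", i) for i in range(80)] + [("B", i) for i in range(20)]
strat = stratified_sample(records, key=lambda r: r[0], fraction=0.1, rng=rng)
print(sum(r[0] == "A" for r in strat), sum(r[0] == "B" for r in strat))  # 8 2
print(systematic_sample(list(range(100)), 10, rng))
print(bootstrap_ci([2, 4, 4, 4, 5, 5, 7, 9], statistics.fmean, 2000, 0.95, rng))
```

The percentile bootstrap is the simplest interval construction. Library implementations may offer refinements such as bias-corrected intervals.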
