Basic Statistics for Data Analysis
• Mean: The sum of all data points divided by the
number of points. It represents the average value.
• Median: The middle value in a sorted dataset. If the
dataset has an even number of observations, it is the
average of the two middle numbers.
• Mode: The value that appears most frequently in a
dataset. A dataset can have more than one mode if
multiple values have the same highest frequency.
• Range: The difference between the maximum and minimum
values in a dataset.
• Variance: The average of the squared differences between
each data point and the mean. It measures how widely the
data are spread around the mean. For a sample, the sum of
squares is usually divided by n - 1 rather than n.
• Standard Deviation: The square root of the variance. It
provides a measure of dispersion around the mean in the
same units as the data.
• Interquartile Range (IQR): The difference between the third
quartile (75th percentile) and the first quartile (25th
percentile), i.e. Q3 - Q1. It measures the range of the
middle 50% of the data.
• Percentiles: Values that divide the data into 100
equal parts. The nth percentile is the value below
which n% of the data fall.
• Quartiles: Values that divide the data into four equal
parts. The first quartile (Q1) is the 25th percentile,
the second quartile (Q2) is the median (50th
percentile), and the third quartile (Q3) is the 75th
percentile.
• Correlation Coefficient: A statistical measure
(ranging from -1 to 1) that describes the strength and
direction of a relationship between two variables.
The most common is the Pearson correlation
coefficient, which measures linear association.
• Simple Linear Regression: A method to model the
relationship between a dependent variable and one
independent variable by fitting a linear equation to
observed data.
• Normal Distribution: A bell-shaped distribution that
is symmetrical about the mean, with data near the
mean more frequent in occurrence than data far
from the mean.
• Binomial Distribution: Describes the number of
successes in a fixed number of independent
Bernoulli trials that share the same probability of
success (e.g., the number of heads in 10 coin flips).
• Null Hypothesis (H0): A statement that there is no
effect or no difference. It is the default assumption
that a statistical test looks for evidence against.
• Alternative Hypothesis (H1): The hypothesis that
there is an effect or a difference.
• P-value: The probability of observing data at least
as extreme as the data actually observed, assuming
the null hypothesis is true. A p-value below the chosen
significance level (commonly 0.05) leads to the
rejection of the null hypothesis.
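
The definitions above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a reference implementation: the function names (describe, pearson_and_line, binomial_p_value) and the sample numbers are made up for this example, the variance shown is the population form (divide by n), and the "inclusive" quartile method is only one of several conventions, so other tools may report slightly different quartiles.

```python
import math
import statistics


def describe(data):
    """Return the descriptive statistics defined above for a list of numbers."""
    # Quartile conventions differ between tools; "inclusive" is an assumption here.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "modes": statistics.multimode(data),      # may contain several values
        "range": max(data) - min(data),
        "variance": statistics.pvariance(data),   # population form: divide by n
        "std_dev": statistics.pstdev(data),       # square root of the variance
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


def pearson_and_line(x, y):
    """Return (r, slope, intercept) for paired observations x and y."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)        # Pearson correlation coefficient
    slope = sxy / sxx                     # least-squares slope of y on x
    intercept = mean_y - slope * mean_x   # fitted line passes through the means
    return r, slope, intercept


def binomial_p_value(successes, trials, p=0.5):
    """One-sided p-value: P(X >= successes) when H0 says the success rate is p."""
    return sum(
        math.comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical sample data, chosen only for illustration.
stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
r, slope, intercept = pearson_and_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
p_value = binomial_p_value(9, 10)  # 9 heads in 10 flips, H0: fair coin

print(stats)
print(f"r={r:.3f}, slope={slope:.2f}, intercept={intercept:.2f}")
print(f"p-value={p_value:.4f}")
```

With these sample values the mean is 5 and the population standard deviation is 2. Nine heads in ten flips gives a one-sided p-value of about 0.011, which is below the common 0.05 threshold, so the null hypothesis of a fair coin would be rejected at that level.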