Statistics Class Notes
Date: November 2, 2024
Introduction to Statistics
● Definition: The study of data collection, analysis, interpretation, and presentation.
● Applications: Statistics is used in various fields like economics, medicine, psychology, business, and more
to make informed decisions.
Types of Data
● Qualitative (Categorical) Data: Non-numeric data that can be categorized (e.g., colors, names).
● Quantitative (Numerical) Data:
○ Discrete: Countable values, usually whole numbers (e.g., number of students in a class).
○ Continuous: Can take any value within a range (e.g., height, weight).
Descriptive vs. Inferential Statistics
● Descriptive Statistics: Summarizes data using numbers and graphs (e.g., mean, median, mode,
range).
● Inferential Statistics: Makes predictions or inferences about a population based on a sample (e.g.,
hypothesis testing, confidence intervals).
Measures of Central Tendency
● Mean (Average): Sum of all data points divided by the number of points.
● Median: Middle value of a sorted data set (or the average of the two middle values when the number of points is even).
● Mode: Most frequently occurring value in a data set.
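The three measures above can be checked with a short Python sketch using the standard-library statistics module. The scores are made-up values for illustration, not data from class.

```python
import statistics

# Hypothetical exam scores, invented for illustration.
scores = [70, 85, 85, 90, 100]

mean_score = statistics.mean(scores)      # sum of the values divided by their count
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequently occurring value

# With an even number of points, the median is the average of the two middle values.
even_median = statistics.median([1, 2, 3, 4])

print(mean_score, median_score, mode_score, even_median)
```

Here the mean (86) is pulled slightly above the median (85) by the single high score of 100.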
Measures of Dispersion
● Range: Difference between the highest and lowest values.
● Variance: Average squared deviation from the mean (dividing by n for a population, or by n − 1 for a sample estimate); measures how spread out the data are.
● Standard Deviation (σ): Square root of variance; represents data spread in the same units as the data.
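A minimal sketch of these dispersion measures, again with Python's statistics module. The data set is hypothetical. Note the assumption about which variance is meant: pvariance and pstdev treat the data as a whole population (divide by n), while variance treats it as a sample (divide by n − 1).

```python
import statistics

# Hypothetical data set, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)            # highest value minus lowest value
variance = statistics.pvariance(data)         # population variance: divide by n
std_dev = statistics.pstdev(data)             # square root of the population variance
sample_variance = statistics.variance(data)   # sample variance: divide by n - 1

print(data_range, variance, std_dev, sample_variance)
```

The standard deviation (2) is in the same units as the data, while the variance (4) is in squared units.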
Probability Basics
● Probability: Likelihood of an event occurring, expressed as a number between 0 and 1.
○ Formula: P(A) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}}
● Complement Rule: P(\text{Not A}) = 1 - P(A)
● Conditional Probability: Probability of an event occurring given another event has already occurred.
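The rules above can be illustrated with one roll of a fair six-sided die, where every outcome is equally likely. This is only a sketch: the helper names (probability, is_even, greater_than_3) are made up for the example, and the conditional probability uses the standard formula P(A | B) = P(A and B) / P(B), which the notes do not state explicitly.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (equally likely outcomes).
outcomes = [1, 2, 3, 4, 5, 6]

def probability(event):
    """P(event) = favorable outcomes / total outcomes."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

def is_even(o):
    return o % 2 == 0

def greater_than_3(o):
    return o > 3

p_even = probability(is_even)        # favorable outcomes {2, 4, 6}
p_not_even = 1 - p_even              # complement rule

# Conditional probability: P(even | roll > 3) = P(even and > 3) / P(> 3)
p_both = probability(lambda o: is_even(o) and greater_than_3(o))
p_even_given_gt3 = p_both / probability(greater_than_3)

print(p_even, p_not_even, p_even_given_gt3)
```

Knowing the roll is greater than 3 narrows the outcomes to {4, 5, 6}, two of which are even, so the conditional probability is 2/3 rather than 1/2.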
Distributions
● Normal Distribution: Symmetrical, bell-shaped distribution; characterized by mean (μ) and standard
deviation (σ).
● Binomial Distribution: Probability distribution of the number of successes in a fixed number of independent
trials, each with two possible outcomes (success/failure) and the same probability of success.
● Poisson Distribution: Probability of a given number of events occurring in a fixed interval, when events occur independently at a constant average rate.
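A rough sketch of how probabilities from these three distributions can be computed with the Python standard library. The function names binomial_pmf and poisson_pmf are made up for this example, and the formulas are the standard textbook ones, which the notes themselves do not write out.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(exactly k events in an interval with average rate lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Binomial: probability of exactly 2 heads in 4 fair coin flips.
p_two_heads = binomial_pmf(2, 4, 0.5)

# Poisson: probability of 0 events when 2 are expected on average.
p_zero_events = poisson_pmf(0, 2.0)

# Normal: share of values within one standard deviation of the mean.
standard_normal = NormalDist(mu=0, sigma=1)
within_one_sigma = standard_normal.cdf(1) - standard_normal.cdf(-1)

print(p_two_heads, p_zero_events, within_one_sigma)
```

The last value is about 0.68, the familiar "68% within one standard deviation" rule for the normal distribution.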
Hypothesis Testing
● Null Hypothesis (H₀): Statement that there is no effect or difference.
● Alternative Hypothesis (H₁): Statement that there is an effect or difference.
● Significance Level (α): Probability of rejecting the null hypothesis when it is true, often set at 0.05.
● p-Value: Probability of observing test results at least as extreme as the results actually observed, under the
assumption that the null hypothesis is correct.
○ If p ≤ α: Reject the null hypothesis.
○ If p > α: Fail to reject the null hypothesis.
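The decision rule above can be sketched with a two-sided one-sample z-test. This is just one possible test, chosen here for simplicity, and it assumes the population standard deviation is known. The function name z_test_two_sided and all the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_two_sided(sample_mean, mu0, sigma, n):
    """Return (z, p-value) for H0: population mean equals mu0, with sigma known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Probability of a result at least as extreme as z, in either direction, under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

alpha = 0.05  # significance level

# Hypothetical study: 64 observations with mean 103, tested against H0: mean = 100.
z, p = z_test_two_sided(sample_mean=103.0, mu0=100.0, sigma=10.0, n=64)

decision = "reject H0" if p <= alpha else "fail to reject H0"
print(round(z, 2), round(p, 4), decision)
```

With these made-up numbers z = 2.4 and p is about 0.016, which is below α = 0.05, so the null hypothesis is rejected.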
Correlation and Regression
● Correlation: Measures the strength and direction of the linear relationship between two variables; the
correlation coefficient ranges from -1 to 1.
● Linear Regression: Models the relationship between a dependent variable and one or more independent
variables.
○ Simple Linear Regression Equation: Y = a + bX, where Y is the dependent
variable, X is the independent variable, a is the intercept, and b is the slope.
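A small sketch of how the correlation coefficient and the regression line Y = a + bX can be computed by hand. It assumes the Pearson correlation coefficient and an ordinary least-squares fit, which are the usual choices but are not named in the notes. The data points are invented for illustration.

```python
import math

# Hypothetical paired observations, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squared deviations and cross-products around the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)   # Pearson correlation, between -1 and 1
b = sxy / sxx                    # slope of the least-squares line
a = mean_y - b * mean_x          # intercept of the least-squares line

predicted = a + b * 6            # predicted Y for a new X = 6
print(round(r, 4), round(a, 2), round(b, 2), round(predicted, 2))
```

For this data r is about 0.77 (a fairly strong positive relationship) and the fitted line is roughly Y = 2.2 + 0.6X.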