Comprehensive Guide to Statistics for High-Quality Data Analytics
1. Descriptive Statistics
1.1 Measures of Central Tendency
Solved Problem 1: Mean Calculation
Problem: Find the mean of the dataset [10, 20, 30, 40, 50].
Solution:
Mean = (10 + 20 + 30 + 40 + 50) / 5 = 30
Solved Problem 2: Median Calculation
Problem: Find the median of [5, 10, 15, 20, 25, 30].
Solution:
Since the dataset has an even number of elements, the median is the average of the two
middle values:
Median = (15 + 20) / 2 = 17.5
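The even/odd rule used in the solution above can be sketched in Python. This is a minimal illustration; the function name median is our own choice and is not part of the original problem.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)          # the median is defined on sorted data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value is the median.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


result = median([5, 10, 15, 20, 25, 30])
print(result)  # 17.5
```

Python's standard library offers statistics.median, which applies the same rule and can be used to check the result.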
Solved Problem 3: Mode Calculation
Problem: Find the mode of [4, 5, 6, 7, 5, 8, 5, 9].
Solution: The number 5 appears three times, more often than any other value. Hence, Mode = 5.
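Counting occurrences by hand becomes error-prone on larger datasets. The sketch below counts them with collections.Counter and returns every value tied for the highest count, since a dataset can have more than one mode. The function name modes is a hypothetical helper introduced for illustration.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency."""
    counts = Counter(values)              # value -> number of occurrences
    highest = max(counts.values())        # the largest frequency observed
    return [value for value, count in counts.items() if count == highest]


print(modes([4, 5, 6, 7, 5, 8, 5, 9]))  # [5]
```

Returning a list rather than a single number is a design choice: it makes ties visible instead of silently picking one value.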
2. Inferential Statistics
Solved Problem 6: Binomial Probability
Problem: A fair coin is flipped 4 times. What is the probability of getting exactly 2 heads?
Solution: Using the binomial probability formula P(X = k) = C(n, k) * p^k * (1 - p)^(n - k) with n = 4, k = 2, and p = 0.5:
P(2 heads) = C(4, 2) * (0.5)^2 * (0.5)^2 = 6 * 0.0625 = 0.375
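The same calculation can be checked in a few lines of Python. This is a sketch assuming independent trials with a fixed success probability; the function name binomial_pmf is our own, and math.comb requires Python 3.8 or later.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    # comb(n, k) counts the ways to choose which k trials succeed.
    return comb(n, k) * p**k * (1 - p) ** (n - k)


prob = binomial_pmf(2, 4, 0.5)
print(prob)  # 0.375
```

The factor 6 in the worked solution is comb(4, 2), the number of ways to place 2 heads among 4 flips.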
3. Regression Analysis
Solved Problem 8: Simple Linear Regression
Problem: Given the data X = [1, 2, 3, 4] and Y = [2, 2.5, 3.5, 5], find the regression equation.
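Solution: Using the least squares method with mean(X) = 2.5 and mean(Y) = 3.25:
Slope b = Sum[(X - 2.5)(Y - 3.25)] / Sum[(X - 2.5)^2] = 5 / 5 = 1
Intercept a = mean(Y) - b * mean(X) = 3.25 - 1 * 2.5 = 0.75
Regression equation: Y = 0.75 + 1.0X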
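A least-squares fit for this problem's data can be worked through in plain Python, which makes each intermediate sum visible. This is a minimal sketch for a single predictor; the function name fit_line and its variable names are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of X.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = fit_line([1, 2, 3, 4], [2, 2.5, 3.5, 5])
print(f"Y = {intercept:.2f} + {slope:.2f}X")  # Y = 0.75 + 1.00X
```

For real datasets a library routine such as numpy.polyfit(xs, ys, 1) is usually preferable; the hand-written version is shown only to mirror the formula.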
Practice Problems (For You to Try)
1. Calculate the mean, median, and mode for [10, 12, 14, 14, 18, 20].
2. A dataset has a mean of 20 and a standard deviation of 5. Find the variance.
3. A fair six-sided die is rolled 3 times. Find the probability of getting exactly 2 sixes.
4. A university claims students study 6 hours per day. A survey shows a mean of 5.5 hours
with a standard deviation of 1 hour. Use a one-sample t-test to test the claim (the sample
size is not given, so state the value you assume).
5. Given data (X: [1, 2, 3], Y: [3, 6, 9]), calculate the Pearson correlation coefficient.