Continuous Distributions

The document discusses the Normal Distribution, highlighting its key features such as symmetry, the empirical rule, and the concept of Z-scores. It explains the application of Z-scores in evaluating performance in races and introduces the Central Limit Theorem, which states that sample means from a non-normally distributed population will approximate a normal distribution as sample size increases. Additionally, it mentions methods to test for normality in data, such as the Shapiro-Wilk Test and QQ plots.

Uploaded by

rgrewal112233

Continuous Distributions:

Normal Distribution
Real life and Normal Distribution
• Many real-life data points follow a Normal Distribution:
  • People’s heights and weights
  • Population blood pressure
  • Test scores
• Also called the Gaussian Distribution
• Generally, phenomena that are less natural or non-natural do not follow normal distributions, e.g. people’s income
Normal Distribution: Key Features
• Symmetry: Perfect symmetry around the mean
• Mean = Median = Mode = Center point of the normal distribution
• Bell-shaped curve
• Empirical rule:
  • About 68% of the data falls within Mean ± 1 SD
  • About 95% of the data falls within Mean ± 2 SD
  • About 99.7% of the data falls within Mean ± 3 SD
• Z-score (Standard score): Z = (observation − Mean) / SD; useful in finding the relative position of an observation with respect to the overall population
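The empirical rule can be checked numerically. The sketch below is only an illustration, not the code from the slides: it uses Python's standard-library `statistics.NormalDist`, and the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k: float) -> float:
    """Probability that a normal observation lies within Mean ± k SD."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Share of the data within 1, 2 and 3 standard deviations of the mean.
coverage = {k: share_within(k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"Mean ± {k} SD: {share:.1%}")
```

The three shares come out at roughly 68%, 95% and 99.7%, which is where the rounded figures in the rule come from.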
Example: Student Heights (C:\code\Data Analytics\normal_distribution.py)
Z-Score Calculations

• Observations with Z-scores > +3 or < -3 are considered outliers
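A minimal sketch of the outlier rule follows. The contents of normal_distribution.py are not shown in the slides, so everything here is an assumption: the population mean (165 cm), the SD (7 cm) and the five heights are hypothetical values chosen only to illustrate the calculation.

```python
# Hypothetical reference values for a student population (not from the slides).
MEAN_HEIGHT_CM = 165.0
SD_HEIGHT_CM = 7.0


def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations that `value` lies from `mean`."""
    return (value - mean) / sd


# Hypothetical observations.
heights_cm = [150, 158, 165, 172, 190]

z_scores = {h: z_score(h, MEAN_HEIGHT_CM, SD_HEIGHT_CM) for h in heights_cm}

# Flag observations more than 3 standard deviations from the mean.
outliers = [h for h, z in z_scores.items() if abs(z) > 3]

for h, z in z_scores.items():
    print(f"{h} cm -> Z = {z:+.2f}")
print("Outliers:", outliers)
```

With these assumed values, only the 190 cm observation (Z of about +3.57) crosses the ±3 threshold.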
Z-Score: Problem
• A runner participated in a 200m race and a 500m race
• Using the following data, calculate the Z-scores and determine where she did better
Race   Average time   Standard deviation   Runner’s time
200m   31s            1.5s                 28s
500m   125s           8.2s                 132s
Z-Score Example
• In this example, a lower time is preferable when completing a race, so the lower Z-score is better
• In other examples, a positive/higher Z-score will be better, e.g. marks obtained by a student, because the student would want to be above average
Visualizing Z-Scores
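The runner problem can be worked through in a few lines. This is a sketch using the figures from the table; the dictionary layout and variable names are illustrative choices, not part of the original material.

```python
# Race data from the problem statement (all times in seconds).
races = {
    "200m": {"mean": 31.0, "sd": 1.5, "time": 28.0},
    "500m": {"mean": 125.0, "sd": 8.2, "time": 132.0},
}

# Z = (runner's time - average time) / standard deviation
z_scores = {
    name: (race["time"] - race["mean"]) / race["sd"]
    for name, race in races.items()
}

# For race times, lower is better, so the race with the lowest Z-score wins.
better_race = min(z_scores, key=z_scores.get)

for name, z in z_scores.items():
    print(f"{name}: Z = {z:+.2f}")
print("Better performance:", better_race)
```

The 200m Z-score is −2.00 (two standard deviations faster than average) and the 500m Z-score is about +0.85 (slower than average), so she did better in the 200m race.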
Z-Score Interpretation
Normal Distribution and Probability
• Standard normal distribution = Normal distribution with a mean of 0 and a standard deviation of 1
• Total area under the curve = 1
• Can be used to map a Z-score to a probability, i.e. an area under the curve
Understanding Z-Score and Area Under the Curve (Probability)
• Suppose Z-Score = 1.15
• Suppose Z-Score = -0.24
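The two Z-scores above can be mapped to areas under the standard normal curve. The sketch below assumes the area of interest is the one to the left of the Z-score (the cumulative probability), which is the usual convention in Z-tables; the slides' own figures are not available to confirm this.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
standard_normal = NormalDist()

# Area under the curve to the left of each Z-score.
area_left = {z: standard_normal.cdf(z) for z in (1.15, -0.24)}

for z, area in area_left.items():
    print(f"Z = {z:+.2f}: area to the left = {area:.4f}, "
          f"area to the right = {1 - area:.4f}")
```

A Z-score of 1.15 has about 87.5% of the distribution to its left, while a Z-score of -0.24 has about 40.5% to its left. Multiplying the left-hand area by 100 gives the percentile of the observation.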
Student Example: Z-Scores, Probabilities, Percentiles
Three Important Measurements

Is our Data Normally Distributed?
• Shapiro-Wilk Test: a p-value > 0.05 means normality is not rejected (use for data sizes <= 5000 rows)
• QQ plot (Quantile-Quantile): ideally, the points fall on a straight line
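The idea behind a QQ plot can be sketched with the standard library alone. This illustration makes several assumptions: the data is synthetic, the plotting positions (i + 0.5) / n are one common choice among several, and the correlation of the QQ points is used here only as a rough stand-in for "how straight is the line". The Shapiro-Wilk test itself is not reimplemented; in practice it is available as `scipy.stats.shapiro`.

```python
import math
import random
from statistics import NormalDist, fmean


def qq_points(data):
    """Pair each sorted observation with its theoretical normal quantile."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, ordered


def qq_correlation(data):
    """Pearson correlation of the QQ points; close to 1 means a straight line."""
    xs, ys = qq_points(data)
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
normal_sample = [rng.gauss(50, 10) for _ in range(200)]      # synthetic, normal
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]   # synthetic, right-skewed

r_normal = qq_correlation(normal_sample)
r_skewed = qq_correlation(skewed_sample)
print(f"QQ correlation, normal sample: {r_normal:.3f}")
print(f"QQ correlation, skewed sample: {r_skewed:.3f}")
```

The normal sample should give a correlation very close to 1, and the skewed sample a visibly lower one, which corresponds to its QQ plot bending away from the straight line.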
The Central Limit Theorem (CLT)
Central Limit Theorem (CLT)
• Problem: Suppose the population data does not follow a normal distribution (i.e. it is left/right-skewed)
• Population -> Samples
• Example: Examination results of 10 lakh (1,000,000) students -> 500 samples of 100 students each
• For each sample, calculate the average marks (sample mean or x̄)
• Plot these sample means on a graph
• They will approximately follow a normal distribution: Central Limit Theorem (CLT)
• Generally, minimum sample size = 30 (a rule of thumb)
• How many such samples? There is no fixed number
• Result: The sample means can be treated as normally distributed even though the original population is not; the population itself keeps its original shape
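The sampling procedure above can be simulated. The sketch below is an illustration under stated assumptions: the population is synthetic (right-skewed, drawn from an exponential distribution) and is 100,000 values rather than 10 lakh to keep the run fast; the 500 samples of 100 follow the example in the slides.

```python
import random
from statistics import fmean, pstdev, stdev

rng = random.Random(42)  # fixed seed so the run is repeatable

POPULATION_SIZE = 100_000  # assumption: smaller than the slide's 10 lakh
NUM_SAMPLES = 500
SAMPLE_SIZE = 100

# Synthetic right-skewed population (exponential), clearly not normal.
population = [rng.expovariate(1.0) for _ in range(POPULATION_SIZE)]

# Draw the samples and record the mean of each one.
sample_means = [
    fmean(rng.sample(population, SAMPLE_SIZE)) for _ in range(NUM_SAMPLES)
]

pop_mean = fmean(population)
pop_sd = pstdev(population)

# The CLT predicts sample means centred on the population mean,
# with a spread of about population SD / sqrt(sample size).
expected_se = pop_sd / SAMPLE_SIZE ** 0.5
observed_se = stdev(sample_means)

print(f"Population mean: {pop_mean:.3f}, mean of sample means: {fmean(sample_means):.3f}")
print(f"Expected spread: {expected_se:.3f}, observed spread: {observed_se:.3f}")
```

A histogram of `sample_means` would look roughly bell-shaped, while a histogram of `population` stays right-skewed: only the distribution of the sample means becomes approximately normal.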
CLT
Population -> Sample 1, Sample 2, Sample 3, .., Sample n
Each sample -> its own sample mean