Unit-4 Biostatistics Descriptive
Normal Distribution
Introduction
The normal distribution, often referred to as the Gaussian distribution, is a probability distribution that is
symmetric about the mean. It is characterized by its bell-shaped curve, where the mean, median, and
mode are all equal. The spread of the distribution is determined by its standard deviation, which
indicates how much the data varies from the mean.
1. Prevalence in Data: While few real-world datasets are perfectly normal, many distributions
approximate normality. This is crucial because it allows us to apply statistical methods that assume
normality, even when the data isn’t exactly normal.
2. Central Limit Theorem: According to this theorem, the sampling distribution of the sample mean
tends toward a normal distribution as the sample size grows, regardless of the shape of the population
distribution, provided the population has a finite variance. A common rule of thumb treats n ≥ 30 as
sufficiently large, although strongly skewed populations may need more. This is fundamental in
statistics, as it justifies the use of the normal distribution in inferential statistics.
3. Applications in Statistics: Normal distribution plays a central role in estimation and inferential
statistics, such as:
Hypothesis Testing: Many tests (e.g., t-tests, z-tests) rely on the assumption of normality for the
sampling distribution of the test statistic.
Confidence Intervals: Calculating confidence intervals for means and proportions often uses the
normal distribution.
Regression Analysis: Assumptions about residuals in regression models often include normality.
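The Central Limit Theorem in point 2 can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the exponential population, the sample size of 30, the 2,000 repetitions, and the fixed seed are choices made here, not values from the text.

```python
import random
import statistics

def sample_means(n, repetitions, seed=0):
    """Draw `repetitions` samples of size n from a skewed (exponential)
    population with mean 1 and return the list of sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repetitions)
    ]

# Assumed illustration values: n = 30 per sample, 2000 samples.
means = sample_means(n=30, repetitions=2000)

# The population is strongly right-skewed, yet the sample means centre on
# the population mean (1.0) with a spread close to sigma / sqrt(n).
print("mean of sample means:", round(statistics.fmean(means), 3))
print("sd of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(1.0 / 30 ** 0.5, 3))
```

A histogram of `means` would look roughly bell-shaped even though the individual exponential values do not.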
Mean (μ):
The mean is the central point of the distribution and determines where the peak of the curve lies. In a
normal distribution, the mean, median, and mode are all equal and located at the center of the curve.
Changing the mean shifts the entire curve left or right along the x-axis without altering its shape.
Standard Deviation (σ):
The standard deviation controls the spread of the curve. A larger standard deviation results in a flatter
and wider curve, indicating that the data points are more spread out. Conversely, a smaller standard
deviation produces a steeper and narrower curve, indicating that the data points are closer to the mean.
Larger σ: More spread out; the curve is flatter, indicating greater variability in the data.
Smaller σ: Less spread out; the curve is taller and narrower, indicating that the data points cluster more
closely around the mean.
These characteristics of the normal distribution are essential for understanding how data behaves and
for making statistical inferences.
1. Location
The location of a normal curve is determined by its mean (μ):
Mean (μ): This is the central point of the distribution where the peak of the curve occurs. Changing the
mean shifts the entire curve left or right along the horizontal axis.
For example, if you have a normal distribution with a mean of 100 and another with a mean of 120, the
curve with the mean of 120 will be shifted to the right, while the one with the mean of 100 will be
shifted to the left. Both curves retain their bell-shaped form, but their center points differ.
2. Shape
The shape of a normal curve is determined by its standard deviation (σ):
Standard Deviation (σ): This measures the spread or dispersion of the data around the mean. A larger
standard deviation results in a flatter and wider curve, while a smaller standard deviation leads to a
steeper and narrower curve.
For instance, consider two normal distributions: one with a standard deviation of 10 and another with a
standard deviation of 20. The curve with the standard deviation of 10 will be tall and narrow, indicating
that the data points are closely clustered around the mean. In contrast, the curve with the standard
deviation of 20 will be shorter and wider, showing that the data points are more spread out.
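The effect of μ and σ on the curve can be checked numerically. The following is a minimal sketch using Python's standard `statistics.NormalDist`; the means of 100 and 120 and the standard deviations of 10 and 20 come from the examples above, while pairing σ = 10 and σ = 20 with a mean of 100 is an assumption made for illustration.

```python
from statistics import NormalDist

narrow = NormalDist(mu=100, sigma=10)
wide = NormalDist(mu=100, sigma=20)  # same mean (assumed), doubled sigma

# The density at the mean is the peak height: 1 / (sigma * sqrt(2 * pi)).
peak_narrow = narrow.pdf(100)
peak_wide = wide.pdf(100)
print(f"peak with sigma = 10: {peak_narrow:.4f}")
print(f"peak with sigma = 20: {peak_wide:.4f}")

# Changing only the mean moves the curve; the peak height is unchanged.
shifted = NormalDist(mu=120, sigma=10)
peak_shifted = shifted.pdf(120)
print(f"peak with mu = 120, sigma = 10: {peak_shifted:.4f}")
```

Doubling σ halves the peak height, which is why the wider curve is also the flatter one: the total area under each curve stays equal to 1.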
Bell-Shaped:
The standard normal curve has a distinct bell shape, which reflects the distribution of data. Most of
the values cluster around the mean, with fewer values occurring as you move away from the
center. This shape represents the probabilities of different outcomes, with the highest probabilities
occurring near the mean.
Unit 4: The Normal distribution
Asymptotic:
The tails of the curve approach, but never actually touch, the horizontal axis. This property means
that there is always a small probability of extreme values occurring, no matter how far from the
mean you go. In practical terms, this indicates that while extreme values are rare, they are still
possible.
Perfectly Symmetrical:
The standard normal curve is symmetric around the mean. This means that the left half of the
curve is a mirror image of the right half. As a result, the probabilities of values equidistant from the
mean are the same, reflecting a balanced distribution of data.
Mean (μ): For the standard normal curve, μ = 0. This parameter shifts the curve left or right on
the x-axis.
Standard Deviation (σ): For the standard normal curve, σ = 1. This parameter affects the width
and height of the curve. A smaller standard deviation results in a steeper curve, while a larger
standard deviation makes the curve flatter and wider.
These properties of the standard normal curve are essential for understanding how data behaves and
for conducting statistical analyses. The standard normal distribution serves as a reference for converting
any normal distribution into a standard form using z-scores, which facilitates comparison across
different datasets and applications in inferential statistics.
The Empirical Rule is a key property of the normal distribution that describes how data is distributed
around the mean. It applies to any normal distribution, regardless of the specific values of the mean (μ)
and standard deviation (σ).
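Interpretation: About 68% of values lie within one standard deviation of the mean, about 95% within
two, and about 99.7% within three. The widest range (μ − 3σ to μ + 3σ) includes nearly all the typical
observations, highlighting that values further from the mean are much less common.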
Visual Representation
Graph: If you were to plot the normal curve, you would see the peak at μ. The intervals (μ - σ to μ + σ),
(μ - 2σ to μ + 2σ), and (μ - 3σ to μ + 3σ) would be highlighted to show the areas representing 68%, 95%,
and 99.7% of the data, respectively.
Statistical Analysis: The Empirical Rule is invaluable in statistics for making predictions
about data and understanding variability. It helps in:
Identifying Outliers: Values that fall outside of three standard deviations can be
considered outliers.
Confidence Intervals: When estimating population parameters based on sample data, the
rule aids in constructing confidence intervals.
Quality Control: In manufacturing and other fields, it helps set acceptable limits for
processes.
The Empirical Rule elegantly illustrates the characteristics of the normal distribution and provides a
framework for understanding the spread of data. By recognizing that most data points are clustered
around the mean, analysts can make informed decisions and interpretations in various fields, from
psychology to finance.
Normal distribution is commonly associated with the 68-95-99.7 rule, or empirical rule, which you can
see in the image below. Sixty-eight percent of the data is within one standard deviation (σ) of the mean
(μ), 95 percent of the data is within two standard deviations (σ) of the mean (μ), and 99.7 percent of
the data is within three standard deviations (σ) of the mean (μ).
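The 68-95-99.7 figures are rounded values of exact areas under the standard normal curve. As a quick check, the sketch below computes them with the standard library; the helper name `within` is introduced here for illustration only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def within(k):
    """Probability that a normal value lies within k standard deviations
    of the mean, i.e. the area between -k and +k on the Z scale."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

coverage = {k: within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} standard deviation(s): {p:.4f}")
```

Because the areas depend only on the number of standard deviations, the same percentages hold for any μ and σ.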
Quick Estimation:
The rule allows for quick assessments of data variability and the proportion of data within certain
ranges without complex calculations. This is especially useful in initial data analysis.
Identifying Outliers:
By knowing how much data falls outside the standard deviations, analysts can identify potential
outliers. For instance, data points that lie more than three standard deviations from the mean may be
considered unusual or outliers.
Simplifying Probabilities:
It aids in probability assessments for normally distributed data, enabling statisticians to easily calculate
probabilities of outcomes within certain intervals.
Facilitating Communication:
The empirical rule is widely understood in statistics, making it easier for statisticians and researchers to
communicate findings and interpretations of data distribution.
Example:
How good is the rule for real data?
Check some example data: the mean weight of 120 runners is 127.8 pounds.
The standard deviation (SD) is 15.5 pounds.
Solution:
Mean weight (μ): 127.8 pounds
Standard deviation (σ): 15.5 pounds
Sample size (n): 120 runners
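So, if the weights are approximately normal, the Empirical Rule predicts that:
About 68% of runners weigh between 112.3 and 143.3 pounds (μ ± σ).
About 95% of runners weigh between 96.8 and 158.8 pounds (μ ± 2σ).
About 99.7% of runners weigh between 81.3 and 174.3 pounds (μ ± 3σ).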
This distribution provides valuable insights into the weight of the runners and helps identify the range
of typical weights within this population.
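The three ranges for the runners follow directly from μ ± kσ. A short sketch of that arithmetic is given below; the function name `empirical_interval` is a hypothetical helper, and the percentages are what the rule predicts, not counts from the runners' data.

```python
mean_weight = 127.8  # pounds, from the example
sd_weight = 15.5     # pounds, from the example

def empirical_interval(mean, sd, k):
    """Return the (lower, upper) bounds lying k standard deviations
    either side of the mean, rounded to one decimal place."""
    return (round(mean - k * sd, 1), round(mean + k * sd, 1))

intervals = {k: empirical_interval(mean_weight, sd_weight, k) for k in (1, 2, 3)}

for k, share in zip((1, 2, 3), ("68%", "95%", "99.7%")):
    low, high = intervals[k]
    print(f"about {share} expected between {low} and {high} pounds")
```

To judge how good the rule is for real data, these predicted ranges would be compared with the actual share of the 120 runners falling inside each one.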
Example
Suppose SAT scores roughly follow a normal distribution in the U.S. population of college-bound
students (with range restricted to 200-800), and the average math SAT is 500 with a standard deviation
of 50. Then:
1. What % of students will have scores between 450 and 550?
2. What % will be between 400 and 600?
3. What % will be between 350 and 650?
Solution:
To analyze the SAT scores following a normal distribution with a mean (μ) of 500 and a standard
deviation (σ) of 50, we can use the properties of the normal distribution and the Empirical Rule, as well
as the z-score formula.
Given Data:
Mean (μ) = 500
Standard Deviation (σ) = 50
Z-Score Calculation
The z-score for a value X can be calculated using the formula:
Z = (X − μ) / σ
Where:
Z is the Z-score, representing the number of standard deviations a data point (X) is from the mean (μ).
X is the value from the original distribution.
μ is the mean of the original distribution.
σ is the standard deviation of the original distribution.
Applying the formula to each pair of scores:
450 and 550 give Z = −1 and Z = +1 (within one standard deviation of the mean).
400 and 600 give Z = −2 and Z = +2 (within two standard deviations).
350 and 650 give Z = −3 and Z = +3 (within three standard deviations).
So, by the Empirical Rule, the results are:
Percentage of students with scores between 450 and 550: 68%
Percentage of students with scores between 400 and 600: 95%
Percentage of students with scores between 350 and 650: 99.7%
These percentages illustrate how SAT scores are distributed around the mean in a typical normal
distribution for college-bound students.
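The SAT percentages can be reproduced by converting each bound to a Z-score and taking the area between them. This is a sketch under the example's stated assumption of normality; `z_score` and `sat_results` are names introduced here for illustration.

```python
from statistics import NormalDist

MEAN, SD = 500, 50  # math SAT mean and standard deviation from the example

def z_score(x, mean=MEAN, sd=SD):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

std_normal = NormalDist()
sat_results = {}
for low, high in [(450, 550), (400, 600), (350, 650)]:
    z_low, z_high = z_score(low), z_score(high)
    share = std_normal.cdf(z_high) - std_normal.cdf(z_low)  # area between bounds
    sat_results[(low, high)] = (z_low, z_high, share)
    print(f"{low}-{high}: Z from {z_low:+.0f} to {z_high:+.0f}, about {share:.1%}")
```

The computed shares are slightly more precise than the rounded 68%, 95%, and 99.7% quoted by the Empirical Rule.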
2. Purpose of Z-Scores
The Z-score helps standardize different normal distributions, making it easier to compare and analyze
data. It indicates how far and in what direction a value deviates from the mean.
Example Calculation:
Suppose we want to find the probability that a value from a normal distribution (mean = 50, standard
deviation = 10) is less than 60.
Calculate the Z-score:
Z = (60 − 50) / 10 = 1.0
Look up Z = 1.0 in the Z-table:
The table shows that the cumulative probability for Z = 1.0 is approximately 0.8413. This means there is
an 84.13% chance that a randomly selected value is less than 60.
Historically, statisticians created Z-tables to list cumulative probabilities for Z-scores. This saved time
because it eliminated the need to integrate the probability density function of the normal distribution
for each calculation.
Integration: The area under the curve of the normal distribution is calculated using integration, which
can be complex. Instead, the pre-calculated values in the Z-table allow for quick reference.
Computers: Nowadays, statistical software and calculators can compute probabilities for any Z-score
instantly. Functions in software like R, Python, or Excel provide tools to calculate Z-scores and
probabilities without needing to refer to a table.
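As one illustration of how software replaces the table, the cumulative probability of the standard normal distribution can be written in terms of the error function, Φ(z) = 0.5 × (1 + erf(z / √2)). The sketch below uses Python's `math.erf`; the function name `phi` and the chosen Z values are illustrative assumptions.

```python
import math

def phi(z):
    """Cumulative probability of the standard normal distribution at z,
    computed from the error function instead of a printed table."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A few rows of a cumulative Z-table, computed rather than looked up.
z_table = {z: round(phi(z), 4) for z in (0.0, 0.5, 1.0, 1.5, 2.0)}
for z, p in z_table.items():
    print(f"Z = {z:.1f}  cumulative probability = {p:.4f}")
```

The row for Z = 1.0 reproduces the 0.8413 used in the example calculation above.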
The Standard Normal Distribution is essential in statistics because it allows for the standardization of
any normal distribution. By using Z-scores and pre-calculated tables (or computational tools), we can
efficiently find probabilities, perform hypothesis testing, and conduct other statistical analyses.
The Standardized Normal Curve
The z-score tells us how many standard deviations there are between the selected score (x) and the
mean (µ).
Examples:
If z is the standard normal random variable, then
To find the probability of getting a math SAT score of 575 or less, given a mean (μ) of 500 and a
standard deviation (σ) of 50, you can follow these steps:
Calculate the Z-score: Z = (575 − 500) / 50 = 75 / 50 = 1.5
Look up the Z-score in the standard normal distribution table: A Z-score of 1.50 corresponds to a
cumulative probability of approximately 0.9332. This means that about 93.32% of scores fall below 575.
Interpret the result: Therefore, the probability of scoring 575 or less on the math SAT is approximately
0.9332, or 93.32%.
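The same lookup can be done in both directions with software. The sketch below, assuming the example's normal model for SAT scores, finds the cumulative probability for 575 and then uses the inverse function to recover the score from that probability; the variable names are illustrative.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=50)

z_575 = (575 - sat.mean) / sat.stdev  # standardize the score
p_575_or_less = sat.cdf(575)          # area to the left of 575

# inv_cdf goes the other way: from a cumulative probability to a score.
score_back = sat.inv_cdf(p_575_or_less)

print(f"Z = {z_575:.2f}")
print(f"P(score <= 575) = {p_575_or_less:.4f}")
print(f"score at that percentile = {score_back:.1f}")
```

The inverse lookup is what one would use to answer questions such as "what score marks the 90th percentile?".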
Another Example
Suppose the scores on a test have a normal distribution with a mean of 70 and a standard deviation of 5.
Assume the students need to score at least 60 on the test to pass the course.
What percent of the students who take this test are able to pass the course?
To find the percentage of students who pass the test (score at least 60), we can use the properties of
the normal distribution.
Z = (60 − 70) / 5 = −2
The area under the curve between the mean and Z = −2 is 0.4772, and the area to the right of the mean
is 0.5. Adding 0.4772 and 0.5 gives 0.9772, so 97.72% of students are able to pass.
What percent of the students who take this test are able to pass the course with a score of at least
65?
Solution
Z = (65 − 70) / 5 = −1
The area between the mean and Z = −1 is 0.3413. Adding the 0.5 to the right of the mean gives 0.8413,
so 84.13% of students score at least 65.
What percent of the students who take this test score between 60 and 75? Mean = 70,
S.D. = 5
Formula for the Z-score:
Z = (X − μ) / σ
Given Values:
Mean (μ) = 70
Standard deviation (σ) = 5
Calculate Z-scores:
For X1 = 60:
Z1 = (X1 − μ) / σ = (60 − 70) / 5 = −10 / 5 = −2. From the table, Area1 (mean to Z1) = 0.4772
For X2 = 75:
Z2 = (X2 − μ) / σ = (75 − 70) / 5 = 5 / 5 = 1. From the table, Area2 (mean to Z2) = 0.3413
Because the two scores lie on opposite sides of the mean, the two areas are added:
0.4772 + 0.3413 = 0.8185, so 81.85% of students score between 60 and 75.
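The three questions about this test (at least 60, at least 65, between 60 and 75) can be checked with cumulative probabilities instead of mean-to-Z table areas. This is a sketch assuming the stated normal model; `at_least` and `between` are hypothetical helper names.

```python
from statistics import NormalDist

scores = NormalDist(mu=70, sigma=5)  # test scores from the example

def at_least(x):
    """Share of students scoring x or more (area to the right of x)."""
    return 1.0 - scores.cdf(x)

def between(low, high):
    """Share of students scoring between low and high."""
    return scores.cdf(high) - scores.cdf(low)

pass_60 = at_least(60)
pass_65 = at_least(65)
mid_60_75 = between(60, 75)

print(f"score >= 60:       {pass_60:.4f}")
print(f"score >= 65:       {pass_65:.4f}")
print(f"score in [60, 75]: {mid_60_75:.4f}")
```

The last value comes out as about 0.8186 rather than 0.8185 because the table areas 0.4772 and 0.3413 are each rounded to four decimals before being added.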
Example:
Population mean IQ score = 100 and standard deviation = 15
Formula: Z = (X − μ) / σ
If the IQ score of a person is 110, how does that person compare to the rest of the population?
Calculate the Z-score: Z = (110 − 100) / 15 ≈ 0.67
Interpret the Z-score: A Z-score of approximately 0.67 means that an IQ score of 110 is 0.67 standard
deviations above the mean.
In terms of comparison, a Z-score of 0.67 suggests that the individual scored higher than about 74.9% of
the population (using a standard normal distribution table or calculator). This indicates that the person
has an IQ that is above average compared to the general population.
Example:
Assume that the mean weight of American males between 21 and 30 years old is 156 pounds with a
standard deviation of 8 pounds.
If you select a male at random, what is the probability that his weight will be 160 pounds or more?
Z = (X − μ) / σ
Z = (160 − 156) / 8 = 4 / 8 = 0.5
From the Z-table, the area between the mean and Z = 0.5 is 0.1915.
The area to the right of Z = 0.5 is 0.5 − 0.1915 = 0.3085.
The probability that a randomly selected American male between 21 and 30 years old weighs 160 pounds
or more is therefore about 31%.
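The subtraction from 0.5 used here reflects a table that lists the area between the mean and Z. The sketch below mirrors that reasoning step by step, assuming the normal model from the example; the variable names are illustrative.

```python
from statistics import NormalDist

z = (160 - 156) / 8  # 0.5 standard deviations above the mean

# A "mean-to-Z" table lists the area between the mean (Z = 0) and Z.
area_mean_to_z = NormalDist().cdf(z) - 0.5

# Half of the curve lies to the right of the mean, so the upper tail is
# whatever remains of that half beyond Z.
upper_tail = 0.5 - area_mean_to_z

print(f"Z = {z}")
print(f"area between mean and Z = {area_mean_to_z:.4f}")
print(f"P(weight >= 160) = {upper_tail:.4f}")
```

With a cumulative table or a software `cdf`, the same tail is obtained in one step as 1 minus the cumulative probability.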
Tutorial
If birth weights in a population are normally distributed with a mean of 109 oz and a standard
deviation of 13 oz,
a. What is the chance of obtaining a birth weight of 131 oz or heavier when sampling birth records at
random?
b. What is the chance of obtaining a birth weight of 121 oz or lighter?
Answer a. What is the chance of obtaining a birth weight of 141 oz or heavier when
sampling birth records at random?
Example:
1: How to determine what proportion of a normal population lies above or below a certain level
The average height of a Hobbit is 120 cm. If the distribution of Hobbit heights is normal with
mean = 120 cm and SD = 20 cm, what is the probability of finding a Hobbit taller than 125 cm?
Solution:
Given:
Mean height (μ) = 120 cm
Standard deviation (σ) = 20 cm
Height of interest (X) = 125 cm
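Z = (125 − 120) / 20 = 0.25
P(Z < 0.25) = 0.5987
P(X > 125) = P(Z > 0.25) = 1 − 0.5987 = 0.4013, so about 40% of Hobbits are taller than 125 cm.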
Example:
The cholesterol level of men in the community is normally distributed with a mean of 160 and a
standard deviation of 30.
Identify any outliers from among the following measurements: a) 248 b) 255 c) 197 d) 300 e) 99 f) 79 g)
89 h) 59
Solution:
Given:
Mean (μ) = 160
Standard deviation (σ) = 30
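Now we will calculate the Z-score for each of the provided cholesterol levels, using Z = (X − μ) / σ.
For 248:
Z = (248 − 160) / 30 = 88 / 30 = 2.93
For 255:
Z = (255 − 160) / 30 = 95 / 30 = 3.17
For 197:
Z = (197 − 160) / 30 = 37 / 30 = 1.23
For 300:
Z = (300 − 160) / 30 = 140 / 30 = 4.67
For 99:
Z = (99 − 160) / 30 = −61 / 30 = −2.03
For 79:
Z = (79 − 160) / 30 = −81 / 30 = −2.70
For 89:
Z = (89 − 160) / 30 = −71 / 30 = −2.37
For 59:
Z = (59 − 160) / 30 = −101 / 30 = −3.37
Using the criterion that values more than three standard deviations from the mean are outliers, the
outliers are 255 (Z = 3.17), 300 (Z = 4.67), and 59 (Z = −3.37). The value 248 (Z = 2.93) falls just
inside the limit.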
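The outlier check can be automated by computing each Z-score and flagging values with |Z| > 3. This is a minimal sketch; the three-standard-deviation cutoff is the criterion described in the Empirical Rule section, and the names `z_score`, `z_scores`, and `outliers` are introduced here for illustration.

```python
MEAN, SD = 160, 30  # cholesterol mean and standard deviation from the example
readings = [248, 255, 197, 300, 99, 79, 89, 59]

def z_score(x):
    """Number of standard deviations that x lies from the mean."""
    return (x - MEAN) / SD

z_scores = {x: round(z_score(x), 2) for x in readings}

# Flag readings lying more than three standard deviations from the mean.
outliers = [x for x in readings if abs(z_score(x)) > 3]

for x, z in z_scores.items():
    flag = "outlier" if x in outliers else ""
    print(f"{x:>3}: Z = {z:+.2f} {flag}")
```

A stricter or looser cutoff (for example |Z| > 2) would change which readings are flagged, so the threshold should be stated whenever outliers are reported.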
Example:
The cholesterol level of men in the community is normally distributed with a mean of 160 and a
standard deviation of 30.
Find % of people who have cholesterol level ≥ 170 and < 170
Find % of people who have cholesterol level 130 and
Solution:
To find the percentages of people with specific cholesterol levels, we will use the Z-score formula and
the standard normal distribution table.
Given:
Mean (μ) = 160
Standard deviation (σ) = 30
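Step 1: Calculate Z-scores
For X = 170: Z = (170 − 160) / 30 ≈ 0.33
For X = 130: Z = (130 − 160) / 30 = −1
For X = 190: Z = (190 − 160) / 30 = 1
Step 2: Look up the areas in the standard normal table
P(Z ≥ 0.33) = 1 − 0.6293 = 0.3707
P(Z < −1) = 0.1587
P(−1 < Z < 1) = 0.8413 − 0.1587 = 0.6826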
Summary of Results:
% of people with cholesterol level ≥ 170: 37.07%
% of people with cholesterol level < 130: 15.87%
% of people with cholesterol level between 130 and 190: 68.26%
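These three percentages can be cross-checked in software. The sketch below assumes the normal model stated in the example and works with unrounded Z-scores, so the first result differs slightly from the table-based figure; the variable names are illustrative.

```python
from statistics import NormalDist

chol = NormalDist(mu=160, sigma=30)  # cholesterol model from the example

p_ge_170 = 1.0 - chol.cdf(170)             # 170 or above
p_lt_130 = chol.cdf(130)                   # below 130
p_130_190 = chol.cdf(190) - chol.cdf(130)  # between 130 and 190

print(f"P(X >= 170)      = {p_ge_170:.4f}")
print(f"P(X < 130)       = {p_lt_130:.4f}")
print(f"P(130 < X < 190) = {p_130_190:.4f}")
```

The first value is about 36.9% rather than 37.07% because the table lookup rounds Z = 10/30 to 0.33 before reading the area; the other two agree with the summary to within table rounding.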