Chapter 2
STANDARD DISTRIBUTIONS
2.1: Introduction
Table – 1
Class     Frequency
20-25     10
25-30      6
30-35      4
35-40      4
40-45      6
45-50      7
50-55      2
55-60      1
Total     n = 40
Now, for each class, we can compute the relative frequency; for
example, for the class 20-25, the relative frequency is 10/40 = 1/4. Similarly,
for the class 25-30, the relative frequency is 6/40 = 3/20, and so on. We can
now tabulate this as follows. See Table – 2 given below.
Table – 2
Note that the sum of all relative frequencies is 1. Also, observe that
if we take another set of 40 employees of the same company and form the
relative frequency distribution, we will in general get a different table.
Thus, tables like Table – 2 depend on the set of data. But, still, the sum of
all relative frequencies will be 1.
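The relative frequencies described above can be recomputed from Table – 1. The following is a minimal sketch in Python; the names `classes`, `frequencies` and `relative_frequencies` are illustrative and not part of the text. Exact fractions are used so that the sum comes out as exactly 1.

```python
from fractions import Fraction

# Class intervals and frequencies copied from Table - 1.
classes = ["20-25", "25-30", "30-35", "35-40", "40-45", "45-50", "50-55", "55-60"]
frequencies = [10, 6, 4, 4, 6, 7, 2, 1]

def relative_frequencies(freqs):
    """Return each frequency divided by the total, as exact fractions."""
    n = sum(freqs)
    return [Fraction(f, n) for f in freqs]

rel = relative_frequencies(frequencies)
for label, f, r in zip(classes, frequencies, rel):
    # Show the class, its frequency, the exact fraction and its decimal value.
    print(f"{label:>6}  {f:>3}  {str(r):>5}  {float(r):.3f}")
print("sum of relative frequencies =", sum(rel))
```

Running the same function on a different sample of 40 employees would give different fractions, but their sum would still be 1.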
This gives rise to the question: Are there distributions which do not
depend on the data? Can we define them independent of any data? If so,
how?
Such questions are very natural to ask, but not so easy to answer,
though answers to all of them are available. The answer is yes, and such
distributions are precisely what we are going to study in this Chapter.
2.2: Discrete versus Continuous
The data arising from a real-life situation fall into one of the
following two categories: discrete or continuous.
2.3.1: Binomial Distribution
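For such situations, the chances of getting k successes out of n trials,
when the probability of success in a single trial is p, can be calculated
from the formula:
(1) $$P(x = k) = \frac{n!}{k!\,(n-k)!}\; p^{k} (1-p)^{n-k}$$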
Solution: One way of doing this is as follows: take this pack and test each
one of the 100 bulbs! If we come across exactly 20 defective bulbs, then the
probability is 1; otherwise, it is zero. Of course, this is a straightforward
method.
But if the company wants to know this probability for 1000 packs of
100 bulbs each, then this method would be cumbersome and problematic.
So, we need methods to deal with such situations. Let us look at this
as a situation with two outcomes: a bulb is either good or
defective (note that it cannot be both or neither!). All five conditions
for a Binomial distribution given above are satisfied. (Verify this.) So, we are
in a binomial situation. Thus, we can compute the probability, if we can find
n, k and p. We have here n = 100 and k = 20. But what is p? It is not given.
So, assume for the moment that p = 0.31. Then the required probability is
P(x = 20), whose value is given by
$$P(x = 20) = \frac{100!}{20!\,80!}\,(0.31)^{20}\,(0.69)^{80}.$$
Note that this probability is the same for every pack of 100 bulbs of this
company. This also means that the chance of finding exactly 20 defective
bulbs in any given pack of 100 bulbs is only 0.0046, which is very small. So, the
ADC company can be happy that the quality control is up to their
expectations.
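The value of the expression for P(x = 20) can be checked numerically. The sketch below assumes the standard binomial formula P(x = k) = n!/(k!(n − k)!) p^k (1 − p)^(n − k) and the assumed value p = 0.31 from the example; the function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(x = k) for n independent trials, each with success probability p."""
    # comb(n, k) is the binomial coefficient n! / (k! (n - k)!).
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Values from the bulb example: n = 100, k = 20, assumed p = 0.31.
prob = binomial_pmf(20, 100, 0.31)
print(f"P(x = 20) = {prob:.4f}")
```

Changing the third argument shows how strongly the answer depends on the assumed value of p.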
Another method is to simply assume a value for p (note that the value
assumed should be between 0 and 1) which would be “realistic”. For
example, when the USA sent a man into space for the first time, the
probability that the rocket would function normally, the probability that he
would survive in the unknown conditions of space, and the probability that
he would land safely were not known, as there were no past records available
(naturally). Also, you cannot send 100 people into space to collect the
required data. Similarly, what will happen after a nuclear holocaust is
anybody’s guess. So, we cannot know the probability of survival of a human
being or a nation or a species or oxygen in the atmosphere. And it would be the
height of stupidity to conduct experiments regarding this. Under such
circumstances, either we assume a value for the probability which is
“realistic” or we admit that probability theory is NOT applicable to such
situations.
It might appear to you that these are very extreme cases, which are
rare. If that is the case, consider the following examples from real life:
(i) A production company is thinking of launching a new product. In this
case, the probability that this new product will beat its rivals, or the
probability that it will garner 40% of the market, cannot be determined (that
is, not known a priori);
(ii) The government of a country is thinking of launching a new program for
agriculture. The probability of its success cannot be determined (that is, not
known a priori);
(iii) An educational institution is considering the introduction of a new
method for teaching. The probability of its success cannot be determined
(that is, not known a priori);
(iv) You are preparing to appear in an examination. The probability of your
success cannot be determined (that is, not known a priori). Even your past
records regarding the other examinations you have taken, are of no use here.
All these and many other examples are encountered quite often in real
life. So, even in such familiar situations, determining the probability is either
difficult or might not be possible.
This aspect should be kept firmly in the mind, when dealing with
applications of probability to real life situations.
End of Digression
--------------------------------------------------------------------------------------------
Example:2: In a certain experiment, 6 rabbits are given a drug. It is known
that one-fifth of all rabbits which are given the drug develop certain
symptoms. Let us determine the chances that 4 out of these 6 rabbits develop
the symptoms.
It is given that one-fifth of all rabbits which are given the drug
develop symptoms. Thus
$$p = \frac{1}{5} = 0.2$$
Here n = 6 and k = 4, so the required probability is P(x = 4).
From the tables, we get this value as 0.015. This means that, if we
repeat the above experiment many times, we will observe 4 rabbits
developing the symptoms about 1.5% of the time.
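The tabulated value can also be obtained by hand. The following worked equation is a sketch that assumes the standard binomial formula with n = 6, k = 4 and p = 0.2:

```latex
P(x = 4) = \binom{6}{4} (0.2)^{4} (0.8)^{2}
         = 15 \times 0.0016 \times 0.64
         = 0.01536 \approx 0.015
```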
Example: 3: Go back to the above example 2. Let us now ask: what are the
chances that at most 2 rabbits develop symptoms?
Solution: Now, it may happen that (i) none of the rabbits develops
symptoms, that is, k = 0; or (ii) one of the rabbits develops symptoms, that
is, k = 1; or (iii) two of the rabbits develop symptoms, that is, k = 2. This is
the meaning of saying that “at most 2 rabbits develop symptoms”. Adding the
three probabilities, we get
P(x ≤ 2) = P(x = 0) + P(x = 1) + P(x = 2) = 0.2621 + 0.3932 + 0.2458
= 0.9011.
In other words, the chances are very high (about 90%) that at most 2
rabbits develop symptoms.
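The "at most 2" probability is a sum of three binomial terms, which can be sketched in Python as follows. The standard binomial formula is assumed, and the helper names `binomial_pmf` and `binomial_at_most` are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(x = k) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_at_most(k_max, n, p):
    """P(x <= k_max): add the probabilities for k = 0, 1, ..., k_max."""
    return sum(binomial_pmf(k, n, p) for k in range(k_max + 1))

# Rabbit example: n = 6 rabbits, p = 0.2 for each rabbit.
terms = [binomial_pmf(k, 6, 0.2) for k in range(3)]
total = binomial_at_most(2, 6, 0.2)
for k, t in enumerate(terms):
    print(f"P(x = {k}) = {t:.4f}")
print(f"P(x <= 2) = {total:.4f}")
```

Printing the individual terms makes it easy to compare each one against a table of binomial probabilities.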
Just as we can compute the mean and the variance for every data set,
we can compute the mean and the variance for every distribution. Note
that a binomial distribution involves two parameters, n and p. In terms
of these, the mean and the variance are
µ = n p
σ² = n p (1 – p)
For instance, for the bulb example above (n = 100, p = 0.31), µ = 31 and
σ² = 21.39
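As a check on the variance formula, the mean and variance can be computed directly from the individual probabilities and compared with n p and n p (1 – p). This is a minimal sketch using the values n = 100 and p = 0.31 from the bulb example; the function name `binomial_moments` is illustrative.

```python
from math import comb

def binomial_moments(n, p):
    """Mean and variance computed directly from the probabilities P(x = k)."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    return mean, variance

mean, variance = binomial_moments(100, 0.31)
print(f"mean     = {mean:.2f}  (n p         = {100 * 0.31:.2f})")
print(f"variance = {variance:.2f}  (n p (1 - p) = {100 * 0.31 * 0.69:.2f})")
```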
EXERCISES
Note: Keep your Text Book by your side (or any book will do)
Caution: In certain books, tables are given only for cumulative
distributions. Use them with caution, as we are dealing with the probability
directly.
1. An urn (or a box) contains 4 red and 6 blue balls. A ball is drawn at
random, its color noted, and the ball replaced before the next drawing.
Drawing a red ball is considered a success. What are the chances of getting
a red ball in n number of drawings?
2. Two teams play a five game series. The chances of the home team
winning a particular game is 0.55. What are the chances that the home
team wins at least 3 games?
3. Of every 1000 parts produced by a machine, 10 are defective, on the
average. What are the chances that some, but not all, of a sample of
three of these parts turn out to be defective?
4. A shopkeeper has been getting over Rs. 200 a day, on the average, for
eight days out of every ten days over the past several months. What
are his chances of getting the same turn-over at least five out of the
next six days?
Note that, the examples (1) to (5) involve time period and examples
(5) to (8) involve space (length or area or volume).
1. Events which occur in one time interval do not depend on the
happening or non-happening of those occurring in any other
non-overlapping time interval.
2. The probability that an event occurs is proportional to the length of
the time or space units.
3. The probability that two or more events occur in a very small time or
space unit is supposed to be small enough that it can be neglected.
(2) $$P(x = k) = \frac{e^{-m}\, m^{k}}{k!}$$
The mean (µ) and the variance (σ²) can be easily derived. They are
given by
Mean of the Poisson distribution = µ = m
Variance of the Poisson distribution = σ² = m.
Note: (1) The sum of all probabilities is 1. That is, if P(x = k) is as in
equation (2), then Σ P(x = k) = 1, where the sum is taken over k = 0, 1, 2, ….
(2) The probability of having n or fewer occurrences (or
successes) is given by P(x ≤ n) = Σ P(x = k), where the sum is taken from k
= 0 to k = n.
(3) The probability of having n or more occurrences is
given by P(x ≥ n) = 1 – Σ P(x = k), where the sum is taken from k = 0 to k =
n – 1.
Example: 5: If a person receives 5 calls on the average during a day, what is
the probability that he will receive fewer than 5 calls tomorrow?
Solution: According to the previous discussion, experience has shown that
the Poisson probability model is appropriate for this situation. The average
value m = 5 is given. “Fewer than 5 calls” means at most 4 calls, so we need
to compute P(x ≤ 4). This is given by
$$P(x \le 4) = \sum_{k=0}^{4} P(x = k) = 0.44049$$
(The value can be found from the table in any statistics book.)
For comparison, the probability of receiving exactly 5 calls is
$$P(x = 5) = p(5) = 0.17547$$
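Where a table is not at hand, the two Poisson values in this example can be computed from the formula P(x = k) = e^(−m) m^k / k!. The following is a minimal sketch; the function names `poisson_pmf` and `poisson_cdf` are illustrative.

```python
from math import exp, factorial

def poisson_pmf(k, m):
    """P(x = k) for a Poisson distribution with average m."""
    return exp(-m) * m**k / factorial(k)

def poisson_cdf(n, m):
    """P(x <= n): sum of P(x = k) from k = 0 to k = n."""
    return sum(poisson_pmf(k, m) for k in range(n + 1))

# Telephone example: m = 5 calls per day on the average.
fewer_than_5 = poisson_cdf(4, 5)
exactly_5 = poisson_pmf(5, 5)
print(f"P(x <= 4) = {fewer_than_5:.5f}")
print(f"P(x = 5)  = {exactly_5:.5f}")
```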
Example: 6: A secretary claims that she averages one error per page. A
sample page is selected at random from some of her work, and five errors
are found. What is the probability of her making five or more errors on a
page if her claim is correct?
Solution: Assuming that the Poisson process is appropriate, we take m = 1
per page. The required probability is given by P(x ≥ 5). Thus we have
P(x ≥ 5) = 1 – P(x ≤ 4) = 1 – 0.9963 = 0.0037.
In general, such problems do not end with the computation of the
required probability. It is also required to interpret the value according to the
given context. This is done as follows: Note that the value of the probability
is very small; that is, a secretary who really averages one error per page
would almost never produce a page with five or more errors. Since such a
page was nevertheless found, and particularly if she is not a very
experienced person in this area, we may conclude
one of the following:
1. The Poisson model is correct and a near miracle has occurred;
2. The model is correct but the wrong average value m has been claimed;
3. The model is incorrect.
In this case, (2) is probably the most plausible.
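The tail probability P(x ≥ 5) used in this example follows the rule P(x ≥ n) = 1 – Σ P(x = k), with the sum taken from k = 0 to k = n – 1. A minimal sketch, with the illustrative function name `poisson_at_least`:

```python
from math import exp, factorial

def poisson_at_least(n, m):
    """P(x >= n) = 1 - sum of P(x = k) for k = 0, ..., n - 1."""
    below = sum(exp(-m) * m**k / factorial(k) for k in range(n))
    return 1 - below

# Secretary example: claimed average m = 1 error per page.
tail = poisson_at_least(5, 1)
print(f"P(x >= 5) = {tail:.4f}")
```

Trying a larger claimed average, for example m = 3, shows how much more likely five or more errors become, which is the point of conclusion (2).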
EXERCISES
(1) A city has on the average, five traffic deaths per month. What is the
probability that this average is exceeded in any given month?
(2) A taxicab company has, on the average, 10 flat tyres per week. During
the past week they had 20. Assuming the Poisson model is
appropriate, what is the probability of having 20 or more flats during a
week? Would you suspect foul play?
Theoretically, although it may not be apparent from figures, the
curve never touches the X-axis. However, it approaches it so closely that,
for practical purposes, the area lying farther than 3σ from the mean µ on
either side can be ignored without appreciable loss (it is only about 0.27%
of the total area).
Example:7: Determine standard scores for x = 18.3, 27.9, 34.4, 39.3 in the
normal distribution for which µ = 30.1 and σ = 2.4.
Solution: Using z = (x − µ) / σ, we get
For x = 18.3, $z = \frac{18.3 - 30.1}{2.4} = -4.92$
For x = 27.9, $z = \frac{27.9 - 30.1}{2.4} = -0.92$
For x = 34.4, $z = \frac{34.4 - 30.1}{2.4} = 1.79$
For x = 39.3, $z = \frac{39.3 - 30.1}{2.4} = 3.83$
(6) x = µ + z σ.
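The conversion between x and z in both directions can be sketched in a few lines of Python. The function names `standard_score` and `from_standard_score` are illustrative; the data are those of Example 7.

```python
def standard_score(x, mu, sigma):
    """z = (x - mu) / sigma: distance from the mean in standard deviations."""
    return (x - mu) / sigma

def from_standard_score(z, mu, sigma):
    """Inverse conversion, x = mu + z * sigma, as in equation (6)."""
    return mu + z * sigma

mu, sigma = 30.1, 2.4
values = (18.3, 27.9, 34.4, 39.3)
scores = [round(standard_score(x, mu, sigma), 2) for x in values]
for x, z in zip(values, scores):
    print(f"x = {x:5.1f}   z = {z:5.2f}")
```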
The fact that any normal distribution can be related to the standard
normal distribution is of central importance. Because of this, the standard
normal distribution can be studied in detail and the results transferred to any
normal distribution. A table of cumulative values of the standard normal
distribution is usually given at the end of every statistics book.
In this table, the entries on the left and top edges together give
the value of z to two decimal places.
The integer part and the first decimal place are given in the column at
the left, and the second decimal place in the top row. The entries in the body
of the table are the areas under the normal curve between the mean (0) and
the given value z, correct to four decimal places. (Caution: In some books
the area is given from −∞ to the given value of z.)
For instance, if z = 1.62, to find the corresponding area, look down the
left column to find 1.6 and look along the top row to find 0.02. Then, the
entry which is in both the row of 1.6 and the column of 0.02 is 0.4474. This
means that the area under the standard normal curve between 0 and 1.62 is
0.4474. This gives the value of the probability P(0 ≤ z ≤ 1.62).
You should see the graphs of these from your Text Book or from
some good book.
Finally, the area between –1.62 and 1.62 is twice the area between 0
and 1.62, that is, 0.8948. This is interpreted as: the probability that the
variable of the standard normal distribution has a value between –1.62 and
1.62 is equal to 0.8948. The use of the table is illustrated by the following
example.
Example:11: Find the area under the standard normal curve between 0 and
z if z = 0.07, 0.83, 1.70, 2.56, –0.24, –1.12, –3.01
Solution:
z        Area between 0 and z
 0.07    0.0279
 0.83    0.2967
 1.70    0.4554
 2.56    0.4948
-0.24    0.0948
-1.12    0.3686
-3.01    0.4987
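The table entries can be reproduced without a printed table. The sketch below assumes the standard identity that the area between the mean and z equals 0.5 · erf(|z| / √2), using the error function from Python's `math` module; the function name `area_mean_to_z` is illustrative.

```python
from math import erf, sqrt

def area_mean_to_z(z):
    """Area under the standard normal curve between 0 and z (never negative)."""
    # The curve is symmetric about 0, so only the size of z matters.
    return 0.5 * erf(abs(z) / sqrt(2))

for z in (0.07, 0.83, 1.70, 2.56, -0.24, -1.12, -3.01):
    print(f"z = {z:5.2f}   area = {area_mean_to_z(z):.4f}")
```

Because the function takes the absolute value of z, a negative z gives the same area as the corresponding positive z.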
Points to Remember:
1. The area under the curve is always positive.
2. The entries on the edge of the table (left and top) represent the
distance from the mean in standard deviations (standard scores).
3. The entries in the body of the table represent areas under the standard
normal curve between the mean and the given standard score (z –
value).
Example: 12: Find the standard scores for which the area under the standard
normal curve between it and the mean is 0.2019, 0.4908, 0.3621
Solution: From the table we can obtain the values of z
Area      z
0.2019    0.53
0.4908    2.36
0.3621    1.09
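Reading the table "backwards" (from an area to a z value) can be sketched with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). Since that class works with the area from −∞ up to z, the sketch adds 0.5 to the tabulated area first; the function name `z_for_area` is illustrative.

```python
from statistics import NormalDist

def z_for_area(area):
    """Positive z whose area between the mean (0) and z equals `area`."""
    # inv_cdf expects the total area to the left of z, hence the extra 0.5.
    return NormalDist().inv_cdf(0.5 + area)

for area in (0.2019, 0.4908, 0.3621):
    print(f"area = {area:.4f}   z = {z_for_area(area):.2f}")
```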
Example:13: Find the area under the normal curve between z = –1.34 and z =
0.57, and between z = 0.59 and z = 1.27
Solution: For z = –1.34 and z = 0.57, the values of the areas from the table
are respectively 0.4099 and 0.2157. Since these are on opposite sides of
the mean, they should be added together. Thus the area between z = –1.34
and z = 0.57 under the normal curve is 0.6256. For z = 0.59 and z = 1.27 the
corresponding areas are 0.2224 and 0.3980. Since they are on the same side
of the mean, their difference is the desired area. Thus the area under the normal
curve between z = 0.59 and z = 1.27 is 0.3980 – 0.2224 = 0.1756.
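The "add on opposite sides, subtract on the same side" rule amounts to taking the difference of two cumulative areas. A minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name `area_between` is illustrative.

```python
from statistics import NormalDist

def area_between(z1, z2):
    """Area under the standard normal curve between z1 and z2."""
    std = NormalDist()  # mean 0, standard deviation 1
    return abs(std.cdf(z2) - std.cdf(z1))

print(f"between -1.34 and 0.57: {area_between(-1.34, 0.57):.4f}")
print(f"between  0.59 and 1.27: {area_between(0.59, 1.27):.4f}")
```

Small differences in the fourth decimal place from the table-based values are rounding effects, since each table entry is itself rounded to four places.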
EXERCISES
2. Find the values of x, for the following standard scores, in a normal
distribution with mean, m = 10.4 and variance, σ = 11.8
5. Find the area under the standard normal curve between z and – z if
(a) z=1
(b) z = 1.96
(c) z = 1.28
6. Find the value of z for which 0.1230 of the area under the standard
normal curve lies to the right of z.
7. The mean of a normal distribution is 100. If the probability that the
variable assumes a value greater than 121.0 is .1446, what is the
standard deviation of the distribution?
8. A normal distribution has a standard deviation of 134. The probability
that the variable takes a value less than 1072 is 0.7734. What is the
mean of the distribution?
2.4.2: Applications of the Normal Distributions
The areas under the normal curve associated with z1 and z2 are 0.2734 and
0.3944. The area between 48 and 50 is then 0.3944 – 0.2734 = 0.1210.
Thus P(48 ≤ x ≤ 50) = 0.1210, where x is the age of the worker.
Here $z = \frac{15.5 - 10}{3.14} = 1.75$ and the associated area is 0.4599.
So, P(x > 15) = 0.0401 (why?).
Example:18: In a certain high rent district, the monthly rental for apartments
is approximately normally distributed with a mean of Rs 384.22 and a
standard deviation of 126.40. Above what value is the highest 30 percent of
the monthly rentals in this district?
Solution: Although the variable is discrete, its values are not given in
integers; so we do not apply the continuity correction. According to the table,
20% of the values lie between the mean and z, for z = 0.52. Since 50% of the
values lie above the mean, 30 percent of the values lie above this point.
Thus, we have $0.52 = \frac{x - 384.22}{126.40}$, or x = 449.95.
Thus, about 30 percent of the rentals are above Rs. 449.95. A slightly more
accurate figure could be obtained with more detailed working.
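The calculation in this example, and the "slightly more accurate figure" it mentions, can be sketched as follows. The table-based value uses z = 0.52 as in the solution; the refined value uses `statistics.NormalDist.inv_cdf` from the Python standard library (Python 3.8 or later) to find the z with 70% of the area below it. The variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 384.22, 126.40  # mean and standard deviation of the rentals

# Table-based answer: z = 0.52 leaves roughly 30% of the area above it.
x_table = mu + 0.52 * sigma

# Refined answer: the z value with exactly 70% of the area below it.
z_exact = NormalDist().inv_cdf(0.70)
x_exact = mu + z_exact * sigma

print(f"table-based cut-off: Rs. {x_table:.2f}")
print(f"refined z = {z_exact:.4f}, cut-off: Rs. {x_exact:.2f}")
```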
EXERCISES
2.4.3: Normal Approximations
2.5: Exponential Distributions
(7) $$F(x) = 1 - e^{-m x} \quad \text{for } x \ge 0$$
an event A has not occurred during the first N repetitions. Then the
probability that it will not occur during the next M repetitions is the
same as the probability that it will not occur during the first M
repetitions”. In other words, the information that there have been no
successes so far is forgotten as far as subsequent developments are concerned.
Therefore,
Expected cost for Process I = (5) P(x > 200) + (105) P(x ≤ 200)
= (5) Exp(–2) + (105) [1 – Exp(–2)]
= 91.466
By a similar computation, we get for Process II, with m = 1 / 150,
Cost per fuse = 10 if x > 200
= 110 if x ≤ 200
Expected cost for Process II = (10) Exp(–4/3) + (110) [1 – Exp(–4/3)]
= 83.64
Although the expected cost of Process I is somewhat more than that of
Process II, we still prefer Process I, as the cost per fuse for Process II is
double that for Process I.
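The two expected costs can be recomputed as follows. This is a sketch under stated assumptions: the lifetime x is exponential with P(x > t) = e^(−m t), the threshold is 200, and the rate m = 1/100 for Process I is inferred from the term Exp(–2) above rather than stated in the text. The function and parameter names are illustrative.

```python
from math import exp

def expected_cost(rate, cost_long_life, cost_short_life, threshold=200):
    """Expected cost per fuse when the lifetime is exponential with the given rate."""
    p_long = exp(-rate * threshold)  # P(x > threshold)
    return cost_long_life * p_long + cost_short_life * (1 - p_long)

# Process I: rate 1/100 is an inference from Exp(-2), not given explicitly.
cost_1 = expected_cost(1 / 100, 5, 105)
# Process II: rate 1/150 as stated in the text.
cost_2 = expected_cost(1 / 150, 10, 110)

print(f"Process I : {cost_1:.3f}")
print(f"Process II: {cost_2:.2f}")
```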
Solution: The required probability is P(x > 1 / m). We have
P(x > 1 / m) = P(x > 1 / 10) = Exp (- 10 / 10) = Exp ( - 1) . Find this value
using your calculator.
EXERCISES
APPENDIX
It can also be proved that mean = µ and variance = σ². Thus, the two
parameters µ and σ represent the mean and standard deviation of this
distribution.
2. Standard Normal Variate
The variable
(2) $$z = \frac{x - \mu}{\sigma}$$
is called the standard normal variate (SNV) and has the distribution given
by
(3) $$y = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^{2}}{2}\right).$$
Note that the mean and standard deviation of this distribution are 0 and 1
respectively. A diagram of this is shown below.
Note the following:
1. When z lies between –1.96 and 1.96, the corresponding area is 0.95.
We say that the probability of z lying between –1.96 and 1.96 is 0.95,
or P(–1.96 < z < 1.96) = 0.95.
3. Sigma Levels