Chapter 2

Uploaded by rajpd28
Copyright © All Rights Reserved

STANDARD DISTRIBUTIONS

2.1: Introduction

So far, we have seen how a frequency distribution gives a convenient way of presenting a given set of data. For example, consider the frequency distribution of the ages of 40 employees of an organization, as given in Table – 1.

Table – 1

Class (age, years)   Frequency
20-25                10
25-30                 6
30-35                 4
35-40                 4
40-45                 6
45-50                 7
50-55                 2
55-60                 1
Total                n = 40

Now, for each class we can compute the relative frequency. For example, for the class 20-25 the relative frequency is 10/40 = 1/4; similarly, for the class 25-30 it is 6/40 = 3/20, and so on. We can now tabulate this as follows. See Table – 2 given below.

Table – 2

Class    Frequency    Relative Frequency
20-25    10           1/4
25-30     6           3/20
30-35     4           1/10
35-40     4           1/10
40-45     6           3/20
45-50     7           7/40
50-55     2           1/20
55-60     1           1/40

Note that the sum of all relative frequencies is 1. Also observe that if we take another set of 40 employees of the same company and form the relative frequency distribution, we will get a completely different table. Thus, tables like Table – 2 depend on the set of data. Still, the sum of all relative frequencies will be 1.
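As a quick check of Table – 2, the relative frequencies can be computed exactly with Python's standard `fractions` module. This is only a sketch: the dictionary below simply restates the class frequencies of Table – 1.

```python
from fractions import Fraction

# Class intervals (ages) and frequencies, restated from Table - 1
frequencies = {
    "20-25": 10, "25-30": 6, "30-35": 4, "35-40": 4,
    "40-45": 6, "45-50": 7, "50-55": 2, "55-60": 1,
}

n = sum(frequencies.values())  # total number of employees

# Relative frequency of a class = class frequency / n, kept as an exact fraction
relative = {cls: Fraction(f, n) for cls, f in frequencies.items()}

for cls, rf in relative.items():
    print(cls, rf)

# Whatever the data, the relative frequencies add up to 1
print("sum =", sum(relative.values()))
```

Changing the frequencies (another sample of 40 employees) changes every line of the output except the last one.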

This gives rise to the question: Are there distributions which do not
depend on the data? Can we define them independent of any data? If so,
how?

Such questions are natural to ask but not easy to answer, though answers to all of them are available. The answer is yes, and that is precisely what we are going to study in this Chapter.

2.2: Discrete versus Continuous

The data arising from a real-life situation can belong to either of the following two categories: discrete and continuous.

Generally, observations which can be measured only in whole numbers are said to be of discrete type. Examples are the number of people residing in a town at an instant of time, the number of defective bulbs produced in a factory in a one-week run, the number of runs made by a cricketer in a year, and the number of shafts manufactured by a company in a week.

There are also observations which are measured on a continuous scale. Such observations, and the corresponding data, are said to be of continuous type. Examples are the heights of all the students of a college, the weights of all newborn babies in a maternity hospital, and the lengths of cloth produced by a textile mill over a month.

Distributions associated with discrete-type situations are called discrete distributions, and those associated with continuous-type situations are called continuous distributions.

Under discrete distributions, we will study the binomial and Poisson distributions.

Under continuous distributions, we will study the normal and exponential distributions.

2.3 Discrete Distributions

We have already seen that certain situations are of discrete type. Such situations can be studied by means of discrete distributions. For such a study to be feasible, we need to become familiar with certain important, well-known distributions. In this Section, we will study two important and simple discrete distributions, namely the Binomial and the Poisson distributions, which have many applications in real-life problems.

2.3.1: Binomial Distribution

The binomial distribution is applied to situations with exactly two outcomes, for example life or death, head or tail, success or failure.
One of these two outcomes is called ‘success’ and the other is then automatically called ‘failure’. Which outcome is called success depends on the situation under study and the decision called for.

As an illustration, imagine that two players, A and B, are playing a game: a coin is tossed; if head appears A wins, and if tail appears B wins. For A, head is a success and tail is a failure, whereas for B, tail is a success and head is a failure!

So, naming an outcome as success depends on the objective of the study undertaken. If you are a supporter of player A, you will rate ‘head’ as a ‘success’.

In the above game, ‘a toss of the coin’ is usually called a trial. Similarly, appearing in an exam is a trial. Observing a person is a trial, as he could be alive or dead!

A series of trials constitutes an experiment. For example, the above game may consist of 10 tosses of the coin, in which case each toss is a trial and all 10 tosses together constitute an experiment. Similarly, a pensioner has to report to the pension office every month to prove that he is alive. In this case, the observation made every month is a trial, and all the observations over a period of time constitute an experiment.

Binomial distributions are appropriate for experiments such that (i) only two outcomes are possible for each trial; (ii) these two outcomes are the same for each trial; (iii) they cannot occur together (for example, a person cannot be both living and dead!); (iv) one of the outcomes must always occur; and (v) the chance of getting any single outcome remains constant throughout the experiment, the trials being independent of one another.

For such situations, the chance of getting exactly k successes in n trials can be calculated from the formula

$$P(x = k) = C(n, k)\, p^{k} (1 - p)^{n - k}, \qquad k = 0, 1, \ldots, n \tag{1}$$

where p is the chance, or probability, of success in a single trial, and C(n, k) = n!/(k!(n − k)!) is the binomial coefficient. P(x = k) stands for the probability of getting exactly k successes.
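Formula (1) can be evaluated directly instead of being looked up in a table. The sketch below is one possible implementation; the function name `binomial_pmf` is our own, and `math.comb` (Python 3.8 or later) supplies the binomial coefficient C(n, k).

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(x = k) from formula (1): exactly k successes in n trials."""
    if not 0 <= k <= n:
        return 0.0  # an impossible number of successes
    # math.comb(n, k) is the binomial coefficient C(n, k) = n! / (k! (n - k)!)
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Ten tosses of a fair coin: the chance of exactly 5 heads is 252 / 1024
print(binomial_pmf(5, 10, 0.5))  # 0.24609375

# The probabilities for k = 0, 1, ..., n always add up to 1
n, p = 10, 0.3
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(round(total, 12))  # 1.0
```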
Example 1: The ADC company manufactures bulbs and wants to regulate the quality of the bulbs it produces. To keep control on quality, it needs to know the chance of having 20 defective bulbs in a pack of 100 bulbs. Let us help them by determining this probability.

Solution: One way of doing this is as follows: take this pack and test each
one of the 100 bulbs! If we come across exactly 20 defective bulbs, then the
probability is 1; otherwise, it is zero. Of course, this is a straightforward
method.

But if the company wants to know this probability for 1000 packs of
100 bulbs each, then this method would be cumbersome and problematic.

So, we need methods to deal with such situations. Let us look at this
situation as a situation with two outcomes – either a bulb is good or
defective (note that it cannot be both or neither!). All the above given 5
conditions for a Binomial distribution are satisfied. (Verify this). So, we are
in a binomial situation. Thus, we can compute the probability, if we can find
n, k and p. We have here n = 100 and k = 20. But what is p? It is not given.
So, assume for the moment that p = 0.31. Then the required probability is
P(x = 20), whose value is given by

$$P(x = 20) = \frac{100!}{20!\,80!}\,(0.31)^{20}(0.69)^{80}.$$

This expression is tedious to evaluate by hand. Fortunately, tables are available. From the table, we get

P(x = 20) = 0.0046.

Note that this probability is the same for every pack of 100 bulbs of this company. It means that the chance of finding exactly 20 defective bulbs in any given pack of 100 bulbs is only 0.0046, which is very small. So, the ADC company can be happy that its quality control is up to its expectations.

At the same time, the chance of finding exactly 33 defective bulbs in a pack of 100 bulbs turns out to be 0.0771, which is much more than that for 20 defective bulbs! This is no contradiction, but a direct consequence of our assumptions: with p = 0.31, the expected number of defective bulbs in a pack is 100 × 0.31 = 31, so counts close to 31 are far more likely than a count of 20.
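The two probabilities quoted in Example 1 can be reproduced without tables. A minimal sketch, assuming (as the example does) n = 100 and p = 0.31; it also finds the most likely number of defectives, to show why 33 beats 20.

```python
import math

n, p = 100, 0.31  # pack size and assumed probability that a bulb is defective

def binomial_pmf(k: int) -> float:
    # C(n, k) p^k (1 - p)^(n - k); math.comb needs Python 3.8+
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

print(f"P(x = 20) = {binomial_pmf(20):.4f}")  # 0.0046
print(f"P(x = 33) = {binomial_pmf(33):.4f}")  # 0.0771

# The most likely count of defectives sits next to the mean n * p = 31
mode = max(range(n + 1), key=binomial_pmf)
print("most likely number of defectives:", mode)  # 31
```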
-------------------------------------------------------------------------------------------
A DIGRESSION: (How to determine the value of the probability of success
in a given situation)

Now, going back to the above example, we see that we assumed a value for p, namely p = 0.31. In textbook problems this value is usually given, or at least can be calculated easily from the given data. But in real-life situations nobody gives you this value; to put it another way, it is not known. How, then, do we determine it?

One method is to scan the past records of the company, or to take samples, as explained below. Suppose that our company keeps records meticulously; that is, over a period, it has been noting down the daily production of bulbs and the number of defectives, say, per 100 bulbs. Then we can take a recent period of time (preferably 1 year) and find the total number d of defective bulbs and the total number N of bulbs produced in this period. Then p can be taken to be the ratio d/N.

Of course, this estimate requires a lot of caution on the part of the investigator: the period chosen should be sufficiently long, the data should be realistic, and so on. But it can be done, provided the past records are accurate and available. Otherwise, one can take a sample of M bulbs produced on a day and count the number d of defective ones; the ratio d/M can then be taken as the value of p. This is one way of determining p.

Another method is simply to assume a value for p (which must lie between 0 and 1) that would be “realistic”. For example, when the USA sent a man into space for the first time, the probability that the rocket would function normally, the probability that he would survive in the unknown conditions of space, and the probability that he would land safely were not known, since (naturally) no past records were available. Nor can you send 100 people into space to collect the required data. Similarly, what will happen after a nuclear holocaust is anybody’s guess. So we cannot know the probability of survival of a human being, a nation, a species, or oxygen in the atmosphere, and it would be the height of stupidity to conduct experiments on this. Under such circumstances, either we assume a value for the probability which is “realistic” or we admit that probability theory is NOT applicable to such situations.

It might appear to you that these are extreme cases, which are rare. If so, consider the following examples from real life:
(i) A production company is thinking of launching a new product. In this case, the probability that the new product will outperform its rivals, or the probability that it will garner 40% of the market, cannot be determined (that is, not known a priori);
(ii) The government of a country is thinking of launching a new program for
agriculture. The probability of its success cannot be determined (that is, not
known a priori);
(iii) An educational institution is considering the introduction of a new
method for teaching. The probability of its success cannot be determined
(that is, not known a priori);
(iv) You are preparing to appear in an examination. The probability of your success cannot be determined (that is, not known a priori). Even your past records in the other examinations you have taken are of no use here.

All these and many other examples are encountered quite often in real life. So, even in such familiar situations, determining the probability is either difficult or impossible.

This aspect should be kept firmly in mind when applying probability to real-life situations.
End of Digression
--------------------------------------------------------------------------------------------

Example 2: In a certain experiment, 6 rabbits are given a drug. It is known that one-fifth of all rabbits which are given the drug develop certain symptoms. Let us determine the chance that exactly 4 of these 6 rabbits develop the symptoms.

Solution: This is a binomial situation with outcomes

Success = developing symptoms
Failure = not developing any symptoms.

It is given that one-fifth of all rabbits which are given the drug develop symptoms. Thus p = 1/5 = 0.2. Also, n = 6 and k = 4, so the required probability is

$$P(x = 4) = C(6, 4)\,(0.2)^{4}(0.8)^{2}.$$

From the tables, we get this value as 0.015 (directly, 15 × 0.0016 × 0.64 = 0.01536). This means that if we repeat the experiment many times, we can expect to observe exactly 4 rabbits developing the symptoms in about 1.5% of the repetitions.

Example 3: Go back to Example 2 above. Let us now ask: what is the chance that at most 2 rabbits develop symptoms?

Solution: Either (i) none of the rabbits develops symptoms, that is, k = 0; or (ii) one of the rabbits develops symptoms, that is, k = 1; or (iii) two of the rabbits develop symptoms, that is, k = 2. This is the meaning of saying that “at most 2 rabbits develop symptoms”.

From the tables, with n = 6 and p = 0.2, we have

p0 = P(x = 0) = 0.2621,
p1 = P(x = 1) = 0.3932,
p2 = P(x = 2) = 0.2458.

Thus the required probability is

p0 + p1 + p2 = 0.2621 + 0.3932 + 0.2458 = 0.9011.

In other words, the chance is very high (slightly more than 90%) that at most 2 rabbits develop symptoms.
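Examples 2 and 3 can be checked in a few lines. This sketch simply applies formula (1) with n = 6 and p = 0.2 (the values given in Example 2); the helper name `pmf` is our own.

```python
import math

n, p = 6, 0.2  # six rabbits; chance 1/5 that a given rabbit develops symptoms

def pmf(k: int) -> float:
    """Binomial probability of exactly k rabbits developing symptoms."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Example 2: exactly 4 rabbits develop symptoms
p_exactly_4 = pmf(4)
# Example 3: at most 2 rabbits develop symptoms, P(0) + P(1) + P(2)
p_at_most_2 = sum(pmf(k) for k in range(3))

print(f"{p_exactly_4:.5f}")  # 0.01536
print(f"{p_at_most_2:.4f}")  # 0.9011
```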

2.3.1.1 The Mean and the Variance of a Binomial Distribution

Just as we can compute the mean and the variance of every data set, we can compute the mean and the variance of every distribution. Note that a binomial distribution involves two parameters, n and p.

The mean of such a distribution is denoted by µ and is given by the formula

$$\mu = np.$$

Similarly, the variance is given by

$$\sigma^{2} = np(1 - p).$$

Example 4: Go back to our first example. We had n = 100 and p = 0.31. This gives us

µ = 100 × 0.31 = 31,
σ² = 100 × 0.31 × 0.69 = 21.39.
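The formulas µ = np and σ² = np(1 − p) can be checked numerically by computing the mean and variance straight from the probabilities P(x = k). A minimal sketch, using the n = 100 and p = 0.31 of Example 4:

```python
import math

n, p = 100, 0.31

# Probabilities P(x = k) for k = 0, 1, ..., n from formula (1)
probs = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Mean and variance computed directly from the distribution
mean = sum(k * pk for k, pk in enumerate(probs))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))

print(round(mean, 6), round(n * p, 6))                # 31.0 31.0
print(round(variance, 6), round(n * p * (1 - p), 6))  # 21.39 21.39
```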

EXERCISES

Note: Keep your textbook by your side (any statistics book will do).
Caution: In certain books, tables are given only for cumulative distributions, that is, for P(x ≤ k). Use them with care, since here we need the probabilities P(x = k) themselves; these can be obtained as P(x = k) = P(x ≤ k) − P(x ≤ k − 1).

1. An urn (or a box) contains 4 red and 6 blue balls. A ball is drawn at random, its color noted, and it is replaced before the next drawing. Drawing a red ball is considered a success. What are the chances of getting a red ball in n drawings?

2. Two teams play a five-game series. The chance of the home team winning a particular game is 0.55. What is the chance that the home team wins at least 3 games?
3. Of every 1000 parts produced by a machine, 10 are defective on average. What is the chance that some, but not all, of a sample of three of these parts turn out to be defective?
4. A shopkeeper has been taking in over Rs. 200 a day, on average, on eight days out of every ten over the past several months. What is the chance that he achieves the same turnover on at least five of the next six days?

2.3.2: The Poisson distribution

The Poisson distribution is another example of a discrete distribution. It plays a very important role in its own right as an appropriate probability model for a large number of random phenomena, and is often used for random variables distributed over time or space. Study the following statements carefully; each one describes a real-life situation.

1. The number of automobile deaths per month in a large city;
2. The number of telephone calls a person receives per day;
3. The number of defective articles produced by a manufacturing company in a day;
4. The pulse rate of a critically ill patient admitted in a hospital;
5. The number of words a typist can type in 15 minutes;
6. The number of bacteria in a given culture;
7. The number of red blood cells in a specimen of blood;
8. The number of typographical errors per page of a book or a magazine;
9. The number of acres of land (in a locality) suitable for irrigation;
10. The length of defective road per 10 miles of transportable roads.

Note that examples (1) to (5) involve a time period, and examples (6) to (10) involve space (length, area or volume).

Each of the above situations has the following characteristics in common:

1. Events which occur in one time interval do not depend on the happening or non-happening of events in any other non-overlapping time interval.
2. The probability that an event occurs in a small time or space unit is proportional to the length (or size) of that unit.
3. The probability that two or more events occur in a very small time or space unit is so small that it can be neglected.

You must be wondering what these observations and characteristics have to do with Poisson distributions. First of all, these are the characteristic features of any situation following a Poisson model. Secondly, using them, it is possible to derive the Poisson model.

The Poisson model, or distribution, gives the probability that exactly k successes (occurrences) happen in a given time (or space) interval. This probability is denoted by P(x = k) and is given by

$$P(x = k) = \frac{e^{-m}\, m^{k}}{k!}, \qquad k = 0, 1, 2, \ldots \tag{2}$$

where e is a real number whose approximate value is 2.71828182, and m is the expected number of occurrences in the given interval.

Sometimes it is obvious that the independence condition (i.e. characteristic (1) above) is not satisfied. For example, we might be tempted to use the Poisson distribution to compute the probability distribution of the number of insects found in a hill of corn. A little reflection reveals that, in this case, events are not independent, since the present number of insects depends on the previous population: the more insects there are at any instant, the more are produced. In spite of this, the Poisson model sometimes gives fairly accurate probabilities even though not all the assumptions are satisfied.

2.3.2.1: The Mean and Variance of Poisson distribution

The mean (µ) and the variance (σ²) can be easily derived. They are given by

Mean of the Poisson distribution = µ = m,
Variance of the Poisson distribution = σ² = m.
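A numerical check that the probabilities in formula (2) sum to 1 and that both the mean and the variance equal m. This is a sketch: the function name `poisson_pmf` is our own, m = 5 is an arbitrary illustrative value, and the infinite sum is cut off at k = 100, an assumption that is harmless here because the neglected terms are vanishingly small.

```python
import math

def poisson_pmf(k: int, m: float) -> float:
    """P(x = k) = e^(-m) m^k / k! for a Poisson distribution with mean m."""
    return math.exp(-m) * m**k / math.factorial(k)

m = 5.0
ks = range(101)  # truncation of k = 0, 1, 2, ... (terms beyond 100 are negligible for m = 5)
probs = [poisson_pmf(k, m) for k in ks]

total = sum(probs)
mean = sum(k * pk for k, pk in zip(ks, probs))
variance = sum((k - mean) ** 2 * pk for k, pk in zip(ks, probs))

print(round(total, 10), round(mean, 10), round(variance, 10))  # 1.0 5.0 5.0
```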

Note: (1) The sum of all probabilities is 1. That is, if P(x = k) is as in equation (2), then

$$\sum_{k=0}^{\infty} P(x = k) = 1.$$

(2) The probability of having n or fewer occurrences (or successes) is

$$P(x \le n) = \sum_{k=0}^{n} P(x = k).$$

(3) The probability of having n or more occurrences is

$$P(x \ge n) = 1 - \sum_{k=0}^{n-1} P(x = k).$$
Example 5: If a person receives 5 calls on average during a day, what is the probability that he will receive fewer than 5 calls tomorrow?
Solution: As discussed above, experience has shown that the Poisson probability model is appropriate for this situation. The average value m = 5 is given. Thus we need to compute P(x ≤ 4), which is

$$P(x \le 4) = \sum_{k=0}^{4} P(x = k) = 0.44049$$

(the value can be found from the table in any statistics book).

The probability of receiving exactly five calls is

P(x = 5) = 0.17547.
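Both values in Example 5 follow from formula (2) with m = 5; the sketch below computes them without a table (the helper name `poisson_pmf` is our own).

```python
import math

m = 5  # average number of calls per day

def poisson_pmf(k: int) -> float:
    return math.exp(-m) * m**k / math.factorial(k)

p_fewer_than_5 = sum(poisson_pmf(k) for k in range(5))  # P(x <= 4)
p_exactly_5 = poisson_pmf(5)

print(f"{p_fewer_than_5:.5f}")  # 0.44049
print(f"{p_exactly_5:.5f}")     # 0.17547
```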

Example 6: A secretary claims that she averages one error per page. A sample page is selected at random from her work, and five errors are found. What is the probability of her making five or more errors on a page if her claim is correct?
Solution: Assuming that the Poisson model is appropriate, we take m = 1 per page. The required probability is P(x ≥ 5), and
P(x ≥ 5) = 1 – P(x ≤ 4) = 1 – 0.9963 = 0.0037.
In general, such problems do not end with the computation of the required probability; the value must also be interpreted in the given context. Note that the value of the probability is very small: if her claim were correct, she would be an exceptional secretary, one who very rarely makes five or more errors on a page. But if she is not a very experienced person in this area, then, in view of the small value of the probability, we may conclude one of the following:

1. The Poisson model is correct and a near miracle has occurred;
2. The model is correct but the wrong average value m has been claimed;
3. The model is incorrect.
In this case, (2) is probably the most plausible.
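The tail probability in Example 6 is obtained from rule (3) of the Note above with n = 5 and m = 1. A minimal sketch:

```python
import math

m = 1  # claimed average number of errors per page

# P(x >= 5) = 1 - P(x <= 4), using formula (2) for each term
p_at_most_4 = sum(math.exp(-m) * m**k / math.factorial(k) for k in range(5))
p_five_or_more = 1 - p_at_most_4

print(f"{p_at_most_4:.4f}")     # 0.9963
print(f"{p_five_or_more:.4f}")  # 0.0037
```

Re-running with a larger m shows how quickly the tail probability grows, which is the reasoning behind conclusion (2).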

EXERCISES

(1) A city has, on average, five traffic deaths per month. What is the probability that this average is exceeded in any given month?
(2) A taxicab company has, on average, 10 flat tyres per week. During the past week they had 20. Assuming the Poisson model is appropriate, what is the probability of having 20 or more flats during a week? Would you suspect foul play?

2.4 Continuous Distributions

We already know from Section 2.2 that distributions can be of two types, and so far we have been learning about discrete distributions. In this Section, we will learn about continuous distributions, namely the Normal and Exponential distributions.

2.4.1 Normal Distributions

One of the most important and useful families of continuous distributions in statistics is the family of ‘Normal’ distributions. The graph of a typical normal distribution is a symmetric, bell-shaped curve; refer to your textbook for illustrations.

The curve is determined entirely by the mean and the standard deviation. As a result, the graphs of normal distributions with the same mean but different standard deviations differ only in the amount of dispersion (spread).

Normal distributions with the same standard deviation but different means look identical in shape and differ only in their placement along the X-axis.

Theoretically, although it may not be apparent from figures, the curve never touches the X-axis. However, it approaches it so closely that, for practical purposes, the area lying farther than 3σ from the mean µ (about 0.27% of the total) can be ignored.

As with every continuous distribution, the total area under a normal curve is one. A normal distribution with µ = 0 and σ = 1 is called the standard normal distribution.

A general normal distribution with mean µ and standard deviation σ is given by the formula

$$P(x) = K \exp\!\left(-\frac{(x - \mu)^{2}}{2\sigma^{2}}\right) \tag{3}$$

where exp( ) is the exponential function and K = 1/(σ√(2π)) is the constant that makes the total area under the curve equal to one. The standard normal distribution is given by

$$P(z) = K_{1} \exp\!\left(-\frac{z^{2}}{2}\right) \tag{4}$$

where K1 = 1/√(2π) is a (different) constant.
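The sketch below evaluates formula (3), taking the standard value K = 1/(σ√(2π)), compares it with the density in Python's `statistics.NormalDist` (Python 3.8+), and approximates the total area with the trapezoidal rule. The parameters µ = 30.1, σ = 2.4 and the integration range µ ± 10σ are illustrative assumptions.

```python
import math
from statistics import NormalDist

def normal_density(x: float, mu: float, sigma: float) -> float:
    """Formula (3) with K = 1 / (sigma * sqrt(2 * pi))."""
    k = 1 / (sigma * math.sqrt(2 * math.pi))
    return k * math.exp(-((x - mu) ** 2) / (2 * sigma**2))

mu, sigma = 30.1, 2.4

# Agrees with the standard library's normal density
print(math.isclose(normal_density(33.0, mu, sigma), NormalDist(mu, sigma).pdf(33.0)))  # True

# Total area under the curve by the trapezoidal rule over mu +/- 10 sigma
a, b, steps = mu - 10 * sigma, mu + 10 * sigma, 20_000
h = (b - a) / steps
inner = sum(normal_density(a + i * h, mu, sigma) for i in range(1, steps))
area = h * (inner + (normal_density(a, mu, sigma) + normal_density(b, mu, sigma)) / 2)
print(round(area, 6))  # 1.0
```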

Now, any value of x can be located relative to µ by stating how many standard deviations it lies from the mean. For example, if µ = 32 and σ = 9, a score of 45.5 is exactly 13.5 units above the mean, that is, 13.5/9 = 1.5 standard deviations above the mean. A score of 18.5 is 13.5 units below the mean, or 1.5 standard deviations below the mean. This can be symbolized as 1.5 for the score 45.5 and as −1.5 for the score 18.5. These scores are called standard scores and are referred to as z-values or z-scores.

Note that the values x of any normal distribution can be related to standard scores by the formula

$$z = \frac{x - \mu}{\sigma} \tag{5}$$

Example 7: Determine standard scores for x = 18.3, 27.9, 34.4 and 39.3 in the normal distribution for which µ = 30.1 and σ = 2.4.
Solution:

$$x = 18.3:\quad z = \frac{18.3 - 30.1}{2.4} = -4.92$$

$$x = 27.9:\quad z = \frac{27.9 - 30.1}{2.4} = -0.92$$

$$x = 34.4:\quad z = \frac{34.4 - 30.1}{2.4} = 1.79$$

$$x = 39.3:\quad z = \frac{39.3 - 30.1}{2.4} = 3.83$$
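Formula (5) as a one-line function, applied to the four values worked in Example 7. The name `z_score` is our own.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Formula (5): the number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

mu, sigma = 30.1, 2.4
for x in (18.3, 27.9, 34.4, 39.3):
    print(x, round(z_score(x, mu, sigma), 2))
```

Negative results mark values below the mean, positive results values above it.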

Example 8: Determine the values of x in the distribution of Example 7 for which the standard scores are z = −3.07, −1.04, 0.73 and 2.44.
Solution: This is the reverse of Example 7. Here the standard scores are given and we want to determine the actual values of x. We use the formula

$$x = \mu + z\sigma \tag{6}$$

For z = −3.07, x = 30.1 + (−3.07)(2.4) = 30.1 − 7.368 = 22.732.
For z = 2.44, x = 30.1 + (2.44)(2.4) = 30.1 + 5.856 = 35.956.

Similarly, compute the other two values.

Example 9: In a normal distribution, a value of 42.1 is 1.3 standard deviations above the mean, whose value is 31.7. What is the standard deviation of the distribution?
Solution: In this case we know x, z and µ, but not σ. Since

$$z = \frac{x - \mu}{\sigma}, \qquad \sigma = \frac{x - \mu}{z}.$$

With x = 42.1, z = 1.3 and µ = 31.7,

$$\sigma = \frac{42.1 - 31.7}{1.3} = 8.0.$$

Example 10: A normal distribution has a standard deviation of 1.7, and a value of 11.3 lies 2.1 standard deviations below the mean. Determine the mean of the distribution.
Solution: Do it yourself. (Ans: 14.9)
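Examples 8 to 10 all rearrange formula (5) to solve for a different unknown. A small sketch with three helper functions (the names are our own); note that a value below the mean corresponds to a negative z.

```python
def x_from_z(z: float, mu: float, sigma: float) -> float:
    """Formula (6): the value lying z standard deviations from the mean."""
    return mu + z * sigma

def sigma_from(x: float, z: float, mu: float) -> float:
    """Solve z = (x - mu) / sigma for sigma."""
    return (x - mu) / z

def mu_from(x: float, z: float, sigma: float) -> float:
    """Solve z = (x - mu) / sigma for mu."""
    return x - z * sigma

# Example 8 (mu = 30.1, sigma = 2.4): a negative z gives a value below the mean
print(round(x_from_z(-3.07, 30.1, 2.4), 3))  # 22.732
print(round(x_from_z(2.44, 30.1, 2.4), 3))   # 35.956

# Example 9
print(round(sigma_from(42.1, 1.3, 31.7), 1))  # 8.0

# Example 10: "2.1 standard deviations below the mean" means z = -2.1
print(round(mu_from(11.3, -2.1, 1.7), 2))     # 14.87
```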

The fact that any normal distribution can be related to the standard normal distribution is of central importance. Because of this, the standard normal distribution can be studied in detail and the results transferred to any normal distribution. A table of cumulative values of the standard normal distribution is given at the end of most statistics books. In this table, the entries in the left column and the top row together give the value of z, usually to two decimal places: the integer part and the first decimal digit are given in the left column, and the second decimal digit in the top row. The entries in the body of the table are the areas under the normal curve between the mean (0) and the given value z, correct to four decimal places. (Caution: in some books the area is given from −∞ to the given value of z.)

For instance, if z = 1.62, to find the corresponding area, look down the left column to find 1.6 and along the top row to find 0.02. The entry in the row of 1.6 and the column of 0.02 is 0.4474. This means that the area under the standard normal curve between 0 and 1.62 is 0.4474, which gives the value of the probability P(0 ≤ z ≤ 1.62).

Several other things follow immediately from this observation. Since the standard normal curve is symmetric about the Y-axis (z = 0), each side of the mean contains exactly half of the area under the curve. Thus the area under the curve to the right of 1.62 is 0.5000 – 0.4474 = 0.0526. This is the value of the probability P(z ≥ 1.62).

In addition, since the curve is symmetric, the area between −z and 0 is equal to the area between 0 and z. Therefore, in this case, the area between −1.62 and 0 is also 0.4474, and the area to the left of −1.62 is again 0.0526.

For graphs of these areas, see your textbook or any good statistics book.

Finally, the area between −1.62 and 1.62 is twice the area between 0 and 1.62, that is, 2 × 0.4474 = 0.8948. This is interpreted as: the probability that the variable of the standard normal distribution has a value between −1.62 and 1.62 is 0.8948. The following example gives further practice in reading the table.
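The table entries can also be computed from the error function, since the area between 0 and z equals ½·erf(|z|/√2). A minimal sketch using `math.erf` from the standard library; the function name `area_0_to_z` is our own.

```python
import math

def area_0_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 and z (by symmetry, same for -z)."""
    # Phi(z) - 0.5 = 0.5 * erf(z / sqrt(2)) for z >= 0
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

a = area_0_to_z(1.62)
print(f"P(0 <= z <= 1.62)     = {a:.4f}")        # 0.4474
print(f"P(z >= 1.62)          = {0.5 - a:.4f}")  # 0.0526
print(f"P(-1.62 <= z <= 1.62) = {2 * a:.4f}")    # 0.8948
```

The same function reproduces any entry of the table, for negative z as well as positive.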

Example 11: Find the area under the standard normal curve between 0 and z if z = 0.07, 0.83, 1.70, 2.56, −0.24, −1.12, −3.01.
Solution:

z       Area between 0 and z
0.07    0.0279
0.83    0.2967
1.70    0.4554
2.56    0.4948
−0.24   0.0948
−1.12   0.3686
−3.01   0.4987

Points to Remember:
1. The area under the curve is always positive.
2. The entries on the edges of the table (left and top) represent distances from the mean measured in standard deviations (standard scores).
3. The entries in the body of the table represent areas under the standard normal curve between the mean and the given standard score (z-value).

Example 12: Find the standard scores for which the area under the standard normal curve between the mean and the score is 0.2019, 0.4908 and 0.3621.
Solution: From the table we obtain the following (positive) values of z:

Area     z
0.2019   0.53
0.4908   2.36
0.3621   1.09

Notice that no entry in the table is exactly 0.4908. The two entries closest to it are 0.4906 (z = 2.35) and 0.4909 (z = 2.36). Since 0.4908 is closer to 0.4909 than to 0.4906, we use z = 2.36.
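Going from an area back to z, as in Example 12, amounts to inverting the cumulative distribution. A sketch using `statistics.NormalDist.inv_cdf` (Python 3.8+): since the table gives the area between 0 and z, we invert at 0.5 + area. The function name `z_for_area` is our own.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def z_for_area(area: float) -> float:
    """Positive z whose area between 0 and z equals `area` (0 < area < 0.5)."""
    # cdf(z) = 0.5 + (area between 0 and z), so invert the cdf at 0.5 + area
    return std_normal.inv_cdf(0.5 + area)

for area in (0.2019, 0.4908, 0.3621):
    print(area, round(z_for_area(area), 2))
```

Rounding to two decimals mirrors the "closest table entry" rule used for 0.4908.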

Example 13: Find the area under the normal curve (i) between z = −1.34 and z = 0.57, and (ii) between z = 0.59 and z = 1.27.
Solution: (i) For z = −1.34 and z = 0.57, the areas from the table are 0.4099 and 0.2157 respectively. Since these are on opposite sides of the mean, they are added. Thus the area under the normal curve between z = −1.34 and z = 0.57 is 0.6256. (ii) For z = 0.59 and z = 1.27, the corresponding areas are 0.2224 and 0.3980. Since they are on the same side of the mean, their difference is the desired area. Thus the area under the normal curve between z = 0.59 and z = 1.27 is 0.3980 − 0.2224 = 0.1756.
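With a cumulative function Φ(z) (the area from −∞ to z) there is no need to decide whether to add or subtract table areas: the area between z1 and z2 is always Φ(z2) − Φ(z1). A sketch using `statistics.NormalDist().cdf`; the last printed digit may differ from a sum of rounded four-place table entries.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # cumulative area from minus infinity to z

def area_between(z1: float, z2: float) -> float:
    """Area under the standard normal curve between z1 and z2 (z1 < z2)."""
    return phi(z2) - phi(z1)

print(f"{area_between(-1.34, 0.57):.4f}")  # opposite sides of the mean
print(f"{area_between(0.59, 1.27):.4f}")   # same side of the mean
```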

Example 14: For a normal distribution with mean 38.7 and standard deviation 10.2, estimate the probability that a value will fall between 29.6 and 44.8.
Solution: First, convert each value to a standard score. For the first value,

$$z = \frac{29.6 - 38.7}{10.2} = -0.89,$$

and for the second,

$$z = \frac{44.8 - 38.7}{10.2} = 0.60.$$

The corresponding areas under the standard normal curve are 0.3133 and 0.2257. Since the z-scores are on opposite sides of the mean, the areas must be added, giving 0.5390. This is the probability that a value will fall between 29.6 and 44.8, which we write as P(29.6 ≤ x ≤ 44.8) = 0.5390.
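`statistics.NormalDist` also accepts a mean and standard deviation directly, so the conversion to standard scores in Example 14 is done internally. A sketch; because it does not round z to two decimals, the result can differ from the table answer in the fourth decimal place.

```python
from statistics import NormalDist

dist = NormalDist(mu=38.7, sigma=10.2)

# P(29.6 <= x <= 44.8) is the area under the curve between the two values
prob = dist.cdf(44.8) - dist.cdf(29.6)
print(f"{prob:.4f}")
```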

Example 15: A normal distribution has a mean of 133 and a standard deviation of 21. Determine a number n such that 80% of all the scores fall within n units of the mean.
Solution: If 80% of all scores fall within n units of the mean, then, by symmetry, 40% fall between the mean and the value x = 133 + n. The z-score corresponding to an area of 0.4000 is 1.28. Thus

$$1.28 = \frac{x - 133}{21}, \quad\text{which gives } x = 133 + 26.88 \approx 160.$$

Thus 40% of the scores fall between 133 and 160, so that n = 27, and 80% of all scores fall within 27 units of the mean on either side; that is, 80% of all scores fall between 106 and 160.
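Example 15 can be solved with the inverse cumulative function: if 80% lies within n of the mean, 10% lies in each tail, so the upper limit is the 90th percentile. A sketch, taking the mean to be 133 as in the worked solution.

```python
from statistics import NormalDist

mu, sigma = 133, 21
dist = NormalDist(mu, sigma)

# 10% in each tail, so the upper limit is the 90th percentile
upper = dist.inv_cdf(0.90)
n = upper - mu  # half-width of the central 80% band

print(round(upper, 2), round(n, 2), round(mu - n, 2))
```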

EXERCISES

1. Estimate standard scores for the following values of x in a normal distribution with mean 284.7 and standard deviation 14.6.

(a) x = 261.4   (b) x = 303.4   (c) x = 259.3
(d) x = 280.4   (e) x = 293.9   (f) x = 321.2

2. Find the values of x for the following standard scores in a normal distribution with mean µ = 10.4 and standard deviation σ = 11.8.

(a) z = 1.64    (b) z = 2.07    (c) z = −2.06
(d) z = 0.50    (e) z = −0.13   (f) z = 1.14

3. Find the area under the standard normal curve between

(a) z = 0 and z = 2.18
(b) z = −1.04 and z = 1.54
(c) z = 1.56 and z = 2.93
(d) z = −0.49 and z = −0.12

4. Find the area under the standard normal curve

(a) to the right of z = 1.43
(b) to the left of z = −1.03
(c) to the right of z = −0.77
(d) to the left of z = 2.01

5. Find the area under the standard normal curve between −z and z if

(a) z = 1
(b) z = 1.96
(c) z = 1.28

6. Find the value of z for which 0.1230 of the area under the standard
normal curve lies to the right of z.
7. The mean of a normal distribution is 100. If the probability that the variable assumes a value greater than 121.0 is 0.1446, what is the standard deviation of the distribution?
8. A normal distribution has a standard deviation of 134. The probability
that the variable takes a value less than 1072 is 0.7734. What is the
mean of the distribution?

2.4.2: Applications of the Normal Distributions

Normal distributions have a wide range of applications, since many sets of data have distributions which are approximately normal. This is evident from the following examples.

Example 16: Among workers in a certain industrial plant, the mean age is 45 years, with a standard deviation of 4 years. One worker is stopped at random and asked to fill out a questionnaire. What is the probability that he is between 48 and 50 years of age?
Solution: If z1 denotes the standard score for 48 and z2 the standard score for 50, we have

$$z_1 = \frac{48 - 45}{4} = 0.75, \qquad z_2 = \frac{50 - 45}{4} = 1.25.$$

The areas under the normal curve associated with z1 and z2 are 0.2734 and 0.3944. The area between 48 and 50 is then 0.3944 − 0.2734 = 0.1210. Thus P(48 ≤ x ≤ 50) = 0.1210, where x is the age of the worker.

Example:17: A machine in a factory is used to produce light bulbs. The


bulbs are examined in lots of 1000. On the average, a lot will have 10
defective bulbs. The distribution of the defective bulbs is approximately
normal with a standard deviation of 3.14. (i) What is the probability that a
certain lot will have at least 3 but not more than 6 defective bulbs? (ii) What
is the probability that it will have more than 15 defective bulbs?
Solution: Since the number of defective bulbs is discrete (it takes only integer values), a continuity correction must be applied. (i) The value 3 is represented by the interval 2.5 to 3.5, and the value 6 by the interval 5.5 to 6.5. Since both 3 and 6 are included, "at least 3 but not more than 6" is represented by the interval 2.5 to 6.5. If z1 and z2 are the standard scores for 2.5 and 6.5, respectively, we have
$$z_1 = \frac{2.5 - 10}{3.14} = -2.39, \qquad z_2 = \frac{6.5 - 10}{3.14} = -1.11.$$
By symmetry, the areas between the mean and these two points are 0.4916 and 0.3665. So the area between 2.5 and 6.5 is 0.4916 − 0.3665 = 0.1251, that is, P(2.5 < x < 6.5) = 0.1251. In terms of the original discrete data, P(3 ≤ x ≤ 6) = 0.1251.
(ii) To determine P(x > 15), note that 15 is represented by the interval 14.5 to 15.5. Since 15 itself is not included, "greater than 15" means, after the continuity correction, greater than 15.5.

Here
$$z = \frac{15.5 - 10}{3.14} = 1.75,$$
and the area between the mean and z is 0.4599. Since half of the total area lies to the right of the mean, P(x > 15) = 0.5000 − 0.4599 = 0.0401.
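Both parts of Example 17 can be reproduced with the continuity-corrected boundaries. The sketch below assumes a normal model with mean 10 and standard deviation 3.14, and uses Python's `statistics.NormalDist`. Because it works with unrounded standard scores, it gives about 0.1240 and 0.0399 rather than the table values 0.1251 and 0.0401, which come from rounding z to two decimal places before the lookup.

```python
from statistics import NormalDist

# Number of defective bulbs per lot, approximated by a normal distribution (Example 17)
defects = NormalDist(mu=10, sigma=3.14)

# (i) P(3 <= X <= 6): with the continuity correction this becomes 2.5 < x < 6.5
p_3_to_6 = defects.cdf(6.5) - defects.cdf(2.5)

# (ii) P(X > 15): 15 itself is excluded, so the corrected boundary is 15.5
p_more_than_15 = 1 - defects.cdf(15.5)

print(f"P(3 <= X <= 6) ~ {p_3_to_6:.4f}")
print(f"P(X > 15)      ~ {p_more_than_15:.4f}")
```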

Example:18: In a certain high-rent district, the monthly rental for apartments is approximately normally distributed with a mean of Rs 384.22 and a standard deviation of Rs 126.40. Above what value do the highest 30 percent of the monthly rentals lie?
Solution: Although the variable is discrete, its values are not given as whole numbers, so we do not apply the continuity correction. According to the table, about 20 percent of the area lies between the mean and z = 0.52; hence about 30 percent of the values lie above this point.
Thus, we have
$$0.52 = \frac{x - 384.22}{126.40}, \quad \text{or} \quad x = 449.95.$$
Thus, about 30 percent of the rentals are above Rs 449.95. A slightly more accurate figure could be obtained by interpolating in the table.
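The more accurate figure can be found directly from the inverse of the cumulative distribution function: the highest 30 percent lie above the 70th percentile. This sketch assumes the normal model of Example 18 and uses `NormalDist.inv_cdf` from Python's standard library.

```python
from statistics import NormalDist

# Monthly rentals in Rs, assumed normal as in Example 18
rent = NormalDist(mu=384.22, sigma=126.40)

# The highest 30 percent of rentals lie above the 70th percentile
cutoff = rent.inv_cdf(0.70)
z = (cutoff - rent.mean) / rent.stdev
print(f"z = {z:.4f}, cutoff = Rs {cutoff:.2f}")
```

The exact standard score is about 0.5244 rather than the tabulated 0.52, which moves the cutoff to roughly Rs 450.50.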

EXERCISES

1. Weights of male students in a large university are approximately normally distributed. Estimate the mean and standard deviation of the distribution if 6.68% of the students weigh less than 125 pounds and 15.87% weigh more than 170 pounds.
2. The efficiency rating of certain machines is calculated every day over the year. On a certain day, machine A's ratings have a mean of 0.873 with a standard deviation of 0.38, and machine B's ratings have a mean of 0.846 with a standard deviation of 0.038. What is the probability that machine A will have a rating less than the mean for machine B? What is the probability that machine B will have a rating greater than the mean for machine A?
3. To test whether a process is in control, a reading of its working is taken every day. If the process is in control, the daily readings have a mean of 832.4 with a standard deviation of 10.2. What is the probability of getting a reading above 860.0 when the process is in control?
4. A survey organization regularly sends out questionnaires. The number
of replies on a mailing of 1,000 is approximately normally distributed
with a mean of 785 and a standard deviation of 41. If a mailing of
1000 questionnaires is made, what is the probability of receiving more
than 850 replies?

2.4.3: Normal Approximations

The normal distribution is one of the most important and useful probability distributions. Of prime importance is the fact that many distributions approach the normal distribution as the number of observations increases. Take, for example, the binomial distribution. As the number of trials increases, it becomes more and more cumbersome to compute binomial probabilities directly; for example, the probability of 30 or more heads in 50 tosses of a coin requires 21 separate calculations, one for each of 30 through 50 heads. Fortunately, as the number of trials increases, and if the probability of success is about 0.50, the normal distribution is a satisfactory continuous approximation to the binomial distribution.
For a very large number of trials, the approximation remains satisfactory even when the probability of success differs substantially from 0.50. The accuracy of the approximation depends, of course, on the number of trials and the actual probability involved. As a rule of thumb, the approximation should not be used unless both np and n(1 − p) are greater than 5, where n is the number of trials and p is the probability of success.

The mean and standard deviation of the binomial distribution are given by $\mu = np$ and $\sigma = \sqrt{np(1-p)}$. Let us consider an example to illustrate this point.

Example:19: What is the probability of obtaining 30 or more heads in 50 tosses of a fair coin?
Solution: This is a binomial experiment with n = 50 and p = 0.5. Its distribution can be approximated by a normal distribution with
$$\mu = 50 \times 0.5 = 25, \qquad \sigma = \sqrt{50 \times 0.5 \times 0.5} = 3.54.$$
Now, 30 is represented on the continuous distribution by the interval 29.5 to 30.5. Since 30 is included, we must determine the probability of a value greater than 29.5. The appropriate standard score is
$$z = \frac{29.5 - 25}{3.54} = 1.27.$$
This corresponds to an area of 0.3980 between the mean and 29.5. Since we are interested in the probability of obtaining 30 or more heads, this probability is 0.5000 − 0.3980 = 0.1020.
Thus, P(x ≥ 30) = 0.1020.
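To see how good the approximation is, the sketch below computes the exact binomial probability (the 21-term sum) alongside the continuity-corrected normal approximation. It uses only `math.comb`, `math.sqrt` and `statistics.NormalDist` from Python's standard library; the check `rule_ok` encodes the common rule of thumb that np and n(1 − p) should both exceed 5.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 50, 0.5  # 50 tosses of a fair coin (Example 19)

# Common rule of thumb: np and n(1 - p) should both exceed 5
rule_ok = n * p > 5 and n * (1 - p) > 5

# Exact binomial probability P(X >= 30): one term for each k = 30, ..., 50
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(30, n + 1))

# Normal approximation with mean np, sd sqrt(np(1 - p)) and continuity correction at 29.5
approx = 1 - NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(29.5)

print(f"rule of thumb satisfied: {rule_ok}")
print(f"exact  = {exact:.4f}")
print(f"approx = {approx:.4f}")
```

The exact value is about 0.1013 and the unrounded normal approximation about 0.1015, so the approximation is off only in the fourth decimal place here.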

2.5: Exponential Distributions

Exponential distributions play an important role in describing a large class of phenomena, such as the life length of a certain device or the life length of members of a certain species.

A continuous random variable, such as the life length of a device, is said to follow an exponential distribution with parameter m (m > 0) if the probability that the life length is less than or equal to x time units is $1 - e^{-mx}$ for all x ≥ 0. This value is denoted by F(x); it gives the probability that the life length of the device is less than or equal to x time units. Mathematically,
$$F(x) = 1 - e^{-mx}, \quad x \ge 0. \tag{7}$$

The probability that the life length is greater than x time units is therefore
$$1 - F(x) = 1 - \left(1 - e^{-mx}\right) = e^{-mx}, \quad \text{from (7)}.$$
Refer to your textbook for the graph of an exponential distribution. The number m is called the parameter of the exponential distribution.
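Equation (7) and its complement translate directly into two small functions. The names `exp_cdf` and `exp_survival` and the parameter value m = 0.01 per hour are illustrative assumptions, not taken from the text.

```python
from math import exp

def exp_cdf(x: float, m: float) -> float:
    """F(x) = 1 - e^(-m x): probability that the life length is at most x."""
    return 1 - exp(-m * x) if x >= 0 else 0.0

def exp_survival(x: float, m: float) -> float:
    """1 - F(x) = e^(-m x): probability that the life length exceeds x."""
    return exp(-m * x) if x >= 0 else 1.0

# Hypothetical device with parameter m = 0.01 per hour
m = 0.01
print(exp_cdf(100, m))       # probability of failing within 100 hours, 1 - e^(-1)
print(exp_survival(100, m))  # probability of lasting beyond 100 hours, e^(-1)
```

The two values always add up to 1, which is exactly the relation 1 − F(x) = e^{−mx}.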

Note: 1. An exponential distribution is mainly used to describe the life length of a certain device, so its variable is measured in units of time, such as seconds, minutes, hours, days, months or years.
2. Different values of m give different exponential distributions. (Can you visualize the graphs?)

2.5.1: The Mean and Variance of an Exponential Distribution

The mean µ, which is also known as the expected value, is given by $\mu = 1/m$ (where m is the parameter), and the variance of the exponential distribution is given by $\sigma^2 = 1/m^2$.
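A quick simulation makes the formulas µ = 1/m and σ² = 1/m² concrete. This sketch assumes a hypothetical parameter m = 2 and uses `random.expovariate`, whose argument is the rate of the exponential distribution and so plays the role of m here; a fixed seed keeps the run reproducible.

```python
import random

m = 2.0                     # hypothetical parameter value
rng = random.Random(12345)  # fixed seed so the run is reproducible

# random.expovariate(lambd) samples an exponential distribution with rate lambd,
# which corresponds to the parameter m used in the text
sample = [rng.expovariate(m) for _ in range(200_000)]

n = len(sample)
sample_mean = sum(sample) / n
sample_var = sum((x - sample_mean) ** 2 for x in sample) / n

print(f"sample mean     = {sample_mean:.4f}   theory 1/m   = {1 / m:.4f}")
print(f"sample variance = {sample_var:.4f}   theory 1/m^2 = {1 / m**2:.4f}")
```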

Note: Memoryless Property

Now we come to an important aspect of exponential distributions: they have the property of having "NO MEMORY". This means the following: suppose an event A (such as the failure of a device) has not occurred during the first N time units. Then the probability that it will not occur during the next M time units is the same as the probability that it will not occur during the first M time units. In symbols, if x is the waiting time until A occurs,
$$P(x > N + M \mid x > N) = P(x > M).$$
In other words, the information that A has not yet occurred is forgotten as far as subsequent developments are concerned.
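The property follows from the survival formula, since $e^{-m(N+M)} / e^{-mN} = e^{-mM}$. The sketch below checks this numerically; the parameter m = 0.01 and the spans N = 50 and M = 30 are arbitrary illustrative values.

```python
from math import exp

def survival(x: float, m: float) -> float:
    """P(x > t) = e^(-m t) for an exponential variable with parameter m."""
    return exp(-m * x)

m, N, M = 0.01, 50.0, 30.0  # hypothetical parameter and time spans

# P(x > N + M | x > N) = P(x > N + M) / P(x > N)
conditional = survival(N + M, m) / survival(N, m)
# P(x > M) for a new device
unconditional = survival(M, m)

print(conditional, unconditional)  # equal up to floating-point rounding
```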

Example:20: Let the life length of the fuses produced by a manufacturing company be assumed to follow an exponential distribution. There are two processes by which the fuses may be manufactured. Process I yields an expected life length of 100 hours, while Process II yields an expected life length of 150 hours. Suppose Process II is twice as costly (per fuse) as Process I, and let the cost of a fuse be Rs 5 for Process I. Assume further that if a fuse lasts less than 200 hours, a loss of Rs 100 is incurred by the manufacturer. Which process should be used?
Solution: Let us compute the expected cost for each process. Let X denote the life length of a fuse.
For Process I, the expected life length is µ = 100 hours. Since µ = 1/m, the parameter is m = 1/100 per hour, and P(X > 200) = e^{−200/100} = e^{−2}. The cost per fuse is
$$\text{Cost} = \begin{cases} 5, & X > 200 \\ 5 + 100 = 105, & X \le 200. \end{cases}$$
Therefore,
$$\text{Expected cost for Process I} = 5\,P(X > 200) + 105\,P(X \le 200) = 5e^{-2} + 105\left(1 - e^{-2}\right) = 91.466.$$
By a similar computation for Process II, m = 1/150, so P(X > 200) = e^{−4/3}, and
$$\text{Cost} = \begin{cases} 10, & X > 200 \\ 110, & X \le 200, \end{cases}$$
$$\text{Expected cost for Process II} = 10e^{-4/3} + 110\left(1 - e^{-4/3}\right) = 83.64.$$
Although each fuse from Process II costs twice as much as one from Process I, the expected cost of Process II (Rs 83.64) is less than that of Process I (Rs 91.47), because Process II fuses are less likely to fail before 200 hours. Hence Process II should be used.
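The two expected costs can be recomputed with a small helper. The function name `expected_cost` and its parameters are illustrative choices, not from the text; the model is the one used in the example: a fixed cost per fuse plus a loss of Rs 100 when the exponential life length X is at most 200 hours.

```python
from math import exp

def expected_cost(unit_cost: float, mean_life: float, threshold: float, loss: float) -> float:
    """Expected cost per fuse when the life length X is exponential with mean mean_life.

    The fuse always costs unit_cost; an extra loss is incurred if X <= threshold.
    """
    m = 1 / mean_life                   # parameter m = 1/mu
    p_fail = 1 - exp(-m * threshold)    # P(X <= threshold) = F(threshold)
    return unit_cost * (1 - p_fail) + (unit_cost + loss) * p_fail

cost_I = expected_cost(unit_cost=5, mean_life=100, threshold=200, loss=100)
cost_II = expected_cost(unit_cost=10, mean_life=150, threshold=200, loss=100)
print(f"Process I : Rs {cost_I:.3f}")
print(f"Process II: Rs {cost_II:.3f}")
```

The helper returns about 91.466 for Process I and 83.640 for Process II, the same values as the hand computation.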

Example:21: Suppose a variable x has the exponential distribution with parameter 10. Compute the probability that it exceeds its mean.

Solution: The mean is 1/m, so the required probability is
$$P(x > 1/m) = P(x > 1/10) = e^{-10 \times (1/10)} = e^{-1}.$$
Find this value using your calculator.
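Since P(x > 1/m) = e^{−m(1/m)} = e^{−1}, the answer does not depend on m at all. The sketch below evaluates it for m = 10 and for a few other, arbitrarily chosen parameter values.

```python
from math import exp

# m = 10 is the case in Example 21; the other values are arbitrary
params = (0.5, 1.0, 10.0, 100.0)

# P(x > mean) = e^(-m * mean), with mean = 1/m
probs = {m: exp(-m * (1 / m)) for m in params}
for m, prob in probs.items():
    print(f"m = {m:>6}: P(x > 1/m) = {prob:.4f}")
```

Every line shows about 0.3679, the value of e^{−1}.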

EXERCISES

1. In Example 20, what should be the cost of the fuse manufactured by Process I in order that the expected costs for both processes be equal?
2. In Example 20, what should be the cost of the fuse manufactured by Process I in order that the expected cost of Process I be smaller than that of Process II?

APPENDIX

In this appendix we briefly recall normal distributions, the standard normal variate, and the 1σ, 2σ and 3σ levels.
1. Normal Distributions
Consider the equation
$$y = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\,\frac{(x-\mu)^2}{\sigma^2}\right). \tag{1}$$
Here µ and σ are called parameters (µ can take any real value, but σ must be positive). The above equation describes a curve as shown in your textbook. This curve is symmetric about x = µ. It can be shown that the total area under this curve is 1, whatever the values of µ and σ. Thus, the curve (1) gives a probability distribution, called the normal distribution.

It can also be proved that mean = µ and variance = σ2 . Thus, the two
parameters µ and σ represent the mean and standard deviation of this
distribution.
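The claims that the area under curve (1) is 1, the mean is µ and the variance is σ² can be checked numerically. This sketch integrates with the midpoint rule over µ ± 10σ (the area outside that range is negligible); the values µ = 3 and σ = 2 are arbitrary, and the function names are illustrative.

```python
from math import exp, pi, sqrt

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Equation (1): y = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def moments(mu: float, sigma: float, steps: int = 100_000):
    """Midpoint-rule estimates of the area, mean and variance over mu +/- 10 sigma."""
    a, b = mu - 10 * sigma, mu + 10 * sigma
    h = (b - a) / steps
    area = mean = second = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        w = normal_pdf(x, mu, sigma) * h  # area of one thin strip
        area += w
        mean += x * w
        second += x * x * w
    return area, mean, second - mean ** 2  # variance = E[x^2] - (E[x])^2

area, mean, var = moments(mu=3.0, sigma=2.0)  # hypothetical parameter values
print(f"area = {area:.6f}, mean = {mean:.6f}, variance = {var:.6f}")
```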

2. Standard Normal Variate

The variable
$$z = \frac{x - \mu}{\sigma} \tag{2}$$
is called the standard normal variate (SNV) and has the distribution given by
$$y = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right). \tag{3}$$

This distribution is called the standard normal distribution (SND). It is advantageous to work with this distribution because
(i) it is unique; that is, there is only one such distribution; and
(ii) it has no parameters.

Note that the mean and standard deviation of this distribution are 0 and 1, respectively. A diagram of this distribution is shown below.

Note the following:
1. When z lies between −1.96 and 1.96, the corresponding area is 0.95. We say that the probability of z lying between −1.96 and 1.96 is 0.95, or P(−1.96 < z < 1.96) = 0.95.
2. Similarly, P(−1.64 < z < 1.64) = 0.90.
3. Similarly, P(−2.56 < z < 2.56) = 0.99.
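These central areas, and the inverse problem of finding c for a given central area, can be checked with Python's `statistics.NormalDist`. The value of c with P(−c < z < c) equal to a given level is the (1 + level)/2 quantile of the standard normal distribution.

```python
from statistics import NormalDist

snd = NormalDist()  # standard normal distribution: mean 0, standard deviation 1

# Area between -c and c for the three values of c listed above
coverage = {c: snd.cdf(c) - snd.cdf(-c) for c in (1.64, 1.96, 2.56)}
for c, area in coverage.items():
    print(f"P(-{c} < z < {c}) = {area:.4f}")

# Inverse problem: the c giving a central area `level` is the (1 + level)/2 quantile
critical = {level: snd.inv_cdf((1 + level) / 2) for level in (0.90, 0.95, 0.99)}
for level, c in critical.items():
    print(f"central area {level}: c = {c:.3f}")
```

The output shows that 1.64 and 2.56 are rounded values; to three decimals the exact points are 1.645 and 2.576, while 1.96 is already accurate.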

3. Sigma Levels

1. Consider P(−1 ≤ z ≤ 1) = 0.683. This implies that P(µ − σ ≤ x ≤ µ + σ) = 0.683, and for any given µ and σ, the interval (µ − σ, µ + σ) is known as the 1σ-level or 1σ-limit.
2. Similarly, P(−2 ≤ z ≤ 2) = 0.954; this gives the 2σ-level or 2σ-limit.
3. Finally, P(−3 ≤ z ≤ 3) = 0.997; this gives the 3σ-level or 3σ-limit.
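The three sigma levels, and the fact that they hold for every normal distribution, can be confirmed with `statistics.NormalDist`. The values µ = 100 and σ = 15 in the second part are arbitrary illustrative choices.

```python
from statistics import NormalDist

snd = NormalDist()  # standard normal distribution: mean 0, standard deviation 1

# Probability of lying within k standard deviations of the mean, k = 1, 2, 3
levels = {k: snd.cdf(k) - snd.cdf(-k) for k in (1, 2, 3)}
for k, prob in levels.items():
    print(f"{k}-sigma level: {prob:.4f}")

# The same probability holds for any normal x; mu and sigma here are arbitrary examples
x = NormalDist(mu=100, sigma=15)
one_sigma_x = x.cdf(100 + 15) - x.cdf(100 - 15)
print(f"P(mu - sigma <= x <= mu + sigma) = {one_sigma_x:.4f}")
```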
