Introduction to Probability and Probability Distribution
Some Basic Concepts:
A random experiment is a process leading to at least two possible outcomes with
uncertainty as to which will occur.
The possible outcomes of a random experiment are called the basic outcomes and
the set of all basic outcomes is called the sample space. In other words, a sample
space of an experiment is a set or collection of all possible outcomes of the
experiment and is usually denoted by the symbol ‘S’.
Example: A die is rolled. The basic outcomes are the numbers 1, 2, 3, 4, 5, 6. Thus
the sample space is $S = \{1, 2, 3, 4, 5, 6\}$.
An event is a set of basic outcomes from the sample space, and it is said to occur if
the random experiment gives rise to one of its constituent basic outcomes.
Example: If a die is rolled, an event that might be of interest is whether the
resulting number is even, which occurs if one of the basic outcomes 2, 4 or 6
arises.
Let A and B be two events in the sample space S. Their intersection, denoted by
$A \cap B$, is the set of all basic outcomes in S that belong to both A and B.
If the events A and B have no common basic outcomes, their intersection $A \cap B$
is the empty set $\emptyset$, and the events are called mutually exclusive.
That is, if the events A and B cannot both happen simultaneously, they are
said to be mutually exclusive or disjoint. Thus the events A and B are mutually
exclusive if and only if $A \cap B = \emptyset$.
Let A and B be two events in the sample space S. Their union, denoted by $A \cup B$,
is the set of all basic outcomes in S that belong to at least one of the two events A
and B.
Let A be an event in the sample space S. The set of basic outcomes of a random
experiment belonging to S but not to A is called the complement of A and is denoted
by $\bar{A}$ or $A^c$.
Example: A die is rolled. Let A be the event ‘Number resulting is even’ and B be
the event ‘Number resulting is at least 4’. Then
$A = \{2, 4, 6\}$ and $B = \{4, 5, 6\}$
$A^c = \{1, 3, 5\}$ and $B^c = \{1, 2, 3\}$
$A \cap B = \{4, 6\}$ and $A \cup B = \{2, 4, 5, 6\}$
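The set operations in this example map directly onto Python's built-in set type. The following is only an illustrative sketch; the variable names are chosen here and are not part of the notes.

```python
# Sample space and events for one roll of a die, modelled as Python sets.
S = {1, 2, 3, 4, 5, 6}
A = {x for x in S if x % 2 == 0}   # 'number resulting is even'
B = {x for x in S if x >= 4}       # 'number resulting is at least 4'

A_complement = S - A               # outcomes in S but not in A
B_complement = S - B               # outcomes in S but not in B
intersection = A & B               # outcomes belonging to both A and B
union = A | B                      # outcomes belonging to at least one of A, B

print(sorted(A_complement), sorted(B_complement))
print(sorted(intersection), sorted(union))
```

Two events are mutually exclusive exactly when their intersection is the empty set, which in this sketch corresponds to `A.isdisjoint(B)` returning `True`.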
Probability is the chance or likelihood of occurrence of any event.
Classical approach:
If an experiment has a total of n(S) possible outcomes, all of which are mutually
exclusive and equally likely, such that n(A) of the outcomes are favorable to an
event A, then the probability of the event A is defined as
$$P(A) = \frac{n(A)}{n(S)}$$
Equivalently, if an event can occur in N mutually exclusive and equally likely ways,
and if m of these possess a trait E, the probability of the occurrence of E is equal to m/N.
Example: A bag contains 4 white balls and 6 red balls. A ball is drawn at random
from the bag. What is the probability that it would be red? White?
Since each of the 10 balls is equally likely to be drawn, take the sample space S
to be the set of the 10 balls, 4 white (w) and 6 red (r). Let A be the event that
the ball is red and B be the event that the ball is white. Here n(S) = 10,
n(A) = 6 and n(B) = 4. Therefore,
$$P(A) = \frac{6}{10} = 0.6 \quad \text{and} \quad P(B) = \frac{4}{10} = 0.4$$
Example: A card is drawn at random from an ordinary pack of 52 playing cards.
Find the probability that the card is a seven.
Here S = {pack of 52 cards}, n(S) = 52. If A denotes the event “the card is a
seven”, then n(A) = 4 and hence
$$P(A) = \frac{4}{52} = \frac{1}{13}$$
Example: A die is rolled once. Find the probability that
(i) An even number occurs
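(ii) A number greater than 4 occurs.
Let A be the event in (i) and B the event in (ii). Since $A = \{2, 4, 6\}$ and
$B = \{5, 6\}$, we have
(i) $P(A) = \frac{n(A)}{n(S)} = \frac{3}{6} = \frac{1}{2} = 0.5$ and
(ii) $P(B) = \frac{n(B)}{n(S)} = \frac{2}{6} = \frac{1}{3} \approx 0.33$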
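The classical definition is a simple counting rule, so it can be sketched in a few lines of Python. The helper name `classical_probability` is an assumption made for this illustration, and exact fractions are used to avoid rounding. The balls in the bag are labelled so that each one is a distinct, equally likely outcome.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Return P(A) = n(A) / n(S), assuming equally likely basic outcomes."""
    sample_space = set(sample_space)
    event = set(event) & sample_space        # ignore outcomes outside S
    return Fraction(len(event), len(sample_space))

# One roll of a die.
die = {1, 2, 3, 4, 5, 6}
p_even = classical_probability({2, 4, 6}, die)
p_greater_than_4 = classical_probability({5, 6}, die)

# Bag with 4 white and 6 red balls; each ball is its own basic outcome.
bag = {("w", i) for i in range(4)} | {("r", i) for i in range(6)}
p_red = classical_probability({ball for ball in bag if ball[0] == "r"}, bag)
p_white = classical_probability({ball for ball in bag if ball[0] == "w"}, bag)

print(p_even, p_greater_than_4, p_red, p_white)
```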
Elementary properties of probability:
1. If A is any event in the sample space S, then $0 \le P(A) \le 1$.
2. P(S) = 1.
3. For two events A and B:
(i) $P(A \cup B) = P(A) + P(B)$ if A and B are mutually exclusive (disjoint);
(ii) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ in general, whether or not A and B are disjoint.
Independence of events:
If A and B represent two events and if the occurrence of A does not affect the
occurrence of B, then A and B are said to be independent. In other words, two
events A and B are said to be independent if and only if
$$P(A \cap B) = P(A)\,P(B).$$
Example: Suppose two ideal coins are tossed. A sample space for this experiment
is: S ={HH, HT, TH, TT}
Let us define two events A and B as follows:
A = Head on the first coin = {HH, HT}, so $P(A) = \frac{1}{2}$
B = Head on the second coin = {HH, TH}, so $P(B) = \frac{1}{2}$
Then
$A \cap B$ = Both coins turn up heads = {HH}, so $P(A \cap B) = \frac{1}{4}$
It follows that $P(A \cap B) = \frac{1}{4} = \frac{1}{2} \times \frac{1}{2} = P(A)\,P(B)$.
This implies that the events A and B are independent i.e. the occurrence of head on
the first coin does not influence the occurrence of head on the second coin.
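The independence check for the two coins can be reproduced by enumerating the sample space. This is a minimal sketch that assumes ideal coins, so that the four outcomes are equally likely; the helper `prob` is a name introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses: ('H','H'), ('H','T'), ('T','H'), ('T','T').
S = set(product("HT", repeat=2))

def prob(event):
    """Classical probability of an event contained in S."""
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] == "H"}   # head on the first coin
B = {s for s in S if s[1] == "H"}   # head on the second coin

# A and B are independent if and only if P(A and B) = P(A) * P(B).
independent = prob(A & B) == prob(A) * prob(B)
print(prob(A), prob(B), prob(A & B), independent)
```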
Addition rule:
A general rule for events A and B is:
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
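As a quick check of the addition rule, the sketch below reuses the die events 'number is even' and 'number is at least 4' from the earlier example and compares both sides of the rule with exact fractions. The helper `prob` is a name assumed for this illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # one roll of a die
A = {2, 4, 6}            # number is even
B = {4, 5, 6}            # number is at least 4

def prob(event):
    """Classical probability of an event contained in S."""
    return Fraction(len(event), len(S))

lhs = prob(A | B)                        # P(A union B), counted directly
rhs = prob(A) + prob(B) - prob(A & B)    # addition rule
print(lhs, rhs)
```

Subtracting the probability of the intersection removes the outcomes 4 and 6, which would otherwise be counted twice.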
Conditional Probability:
With two events A and B, the conditional probability of A given B is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$$
and that of B given A is
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad P(A) > 0$$
Thus for two events A and B, whether dependent or independent,
$$P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$$
This rule is called the multiplicative rule for probability.
Example: The probability that a married man watches a certain TV show is 0.4 and
that a married woman watches the show is 0.5. The probability that a man watches
the show, given that his wife does, is 0.7. Find
(a) The probability that a married couple watches the show,
(b) The probability that a wife watches the show given that her husband
does, and
(c) The probability that at least one person of a married couple will
watch the show.
Let us define two events H and W as follows:
H: Husband watches the show, and W: Wife watches the show
It is given that P(H) = 0.4, P(W) = 0.5 and P(H|W) = 0.7.
(a) P(married couple watches the show) = $P(W \cap H) = P(W)\,P(H \mid W)$
= 0.5 × 0.7 = 0.35
(b) P(wife watches given her husband does) = $P(W \mid H) = \frac{P(W \cap H)}{P(H)} = \frac{0.35}{0.4} = 0.875$
(c) P(at least one of them watches) = $P(W \cup H)$
= $P(W) + P(H) - P(W \cap H)$
= 0.5 + 0.4 - 0.35 = 0.55
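The three parts of this example use the multiplicative rule, the definition of conditional probability and the addition rule in turn. A short sketch of the arithmetic follows; the variable names are chosen for this illustration only.

```python
p_h = 0.4            # P(H): husband watches the show
p_w = 0.5            # P(W): wife watches the show
p_h_given_w = 0.7    # P(H | W)

p_both = p_w * p_h_given_w             # (a) multiplicative rule
p_w_given_h = p_both / p_h             # (b) conditional probability
p_at_least_one = p_w + p_h - p_both    # (c) addition rule

print(round(p_both, 3), round(p_w_given_h, 3), round(p_at_least_one, 3))
```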
Assignment Problem: A certain statistician's breakfast consists of either Ruti or
Parata (but not both) to eat and one drink from a choice of fruit juice, tea or coffee.
If he has Ruti to eat, the probability that he chooses juice is 3/5 and the probability
he chooses tea is 3/10. If he has Parata to eat, the probability he chooses coffee is 2/5
and the probability he chooses tea is 1/5. He has Ruti to eat with probability 3/4.
(a) Find the probability that on any particular day he has
(i) fruit juice (ii) Parata and coffee
(b) Find his most popular breakfast combination.
Random variable:
A random variable is a variable whose numerical values are determined by the
outcome of a random experiment, so that they cannot be predicted exactly in
advance.
Example: Two coins are tossed and X is the number of heads observed. Then X is a
random variable whose possible values are 0, 1 and 2.
Discrete random variable:
A random variable defined over a discrete sample space (i.e. that may take only a
finite or countable number of different isolated values) is referred to as a discrete
random variable. For example, the number of telephone calls received in a
telephone booth during one day.
Continuous random variable:
A random variable defined over a continuous sample space (i.e. which may take
any value within a certain interval or collection of intervals), is referred to as a
continuous random variable. For example, time taken to serve a customer.
Probability distribution:
Any statement or function that associates each of a set of mutually exclusive and
exhaustive classes or class intervals with its probability is a probability distribution.
A probability distribution is either discrete or continuous according to whether the
random variable is discrete or continuous.
Discrete probability distribution: A discrete random variable assumes each of its
values with a certain probability. The values of the random variable
together with their probabilities constitute a discrete probability distribution.
Assignment problem: A coin is tossed three times. Write down the sample space. If
X denotes the number of heads that appear, find the distribution of X. Calculate the
mean and variance of the distribution.
Solution: The sample space will consist of eight possible outcomes. The outcomes
may be enumerated as follows: S = {HHH, HHT, HTT, HTH, THT, THH, TTH,
TTT}
If X denotes the number of heads obtained then X, by definition, is a discrete
random variable. The possible values x of the random variable X and their
associated probabilities can be presented in a tabular form as follows:
x          0     1     2     3
P(X = x)   1/8   3/8   3/8   1/8
Mean & variance?
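One way to check the table and to obtain the mean and variance is to enumerate the eight outcomes and apply $E(X) = \sum x\,P(X = x)$ and $V(X) = \sum (x - E(X))^2\,P(X = x)$. The sketch below assumes a fair coin, so that all outcomes are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 equally likely outcomes of three tosses, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# X = number of heads; count how many outcomes give each value of X.
counts = Counter(outcome.count("H") for outcome in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

for x, p in pmf.items():
    print(x, p)
print("mean =", mean, "variance =", variance)
```

Exact fractions are used so that the probabilities can be compared directly with the table.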
Continuous probability distribution: The probability distribution of a continuous
random variable cannot be presented in a tabular form, because a continuous
variable can assume uncountably many values. More precisely, a
continuous variate X can take any value in a given interval $a \le X \le b$. As a result, a
continuous random variable has probability zero of assuming exactly any one of its
values. This implies that
$$P(a \le X \le b) = P(X = a) + P(a < X < b) + P(X = b) = P(a < X < b).$$
Binomial Distribution
When an experiment has two possible outcomes, success and failure, the
experiment is repeated n times independently, and the probability p of success
remains constant from trial to trial, the experiment is known as a
binomial experiment.
Example: An insurance salesman contacts ten different families. The outcome
associated with visiting each family can be referred to as a success if the family
purchases an insurance policy and a failure, if not. If the probability of selling a
policy is assumed to be the same for each family, and the decision to purchase or
not a policy by one family is not influenced by the decision of any other family,
then we have a situation analogous to the binomial experiment.
If X is a random variable designating the number of successes in n Bernoulli trials,
taking values in the set {0, 1, ..., n}, then X is called a binomial random variable.
The random variable X defined above is said to have a binomial distribution with
parameters n and p. The result can be restated as follows:
If the random variables $X_1, X_2, \ldots, X_n$ form n Bernoulli trials with p as the probability
of success in a single trial and if $X = X_1 + X_2 + \cdots + X_n$, then X has a binomial
distribution with parameters n and p and its probability function is as follows:
$$b(x; n, p) = \begin{cases} \binom{n}{x}\, p^x (1 - p)^{n - x}, & x = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$
Obviously, X is discrete because the possible values it may assume are counts of
successes out of n trials, which are integers from 0 through n.
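The probability function translates almost literally into code. The following is a minimal sketch; the function name `binomial_pmf` and the parameters n = 4, p = 0.5 are assumptions made for illustration. `math.comb` (Python 3.8 or later) supplies the binomial coefficient.

```python
from math import comb

def binomial_pmf(x, n, p):
    """b(x; n, p): probability of exactly x successes in n Bernoulli trials."""
    if not isinstance(x, int) or x < 0 or x > n:
        return 0.0                       # the 'otherwise' branch
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 4, 0.5                            # hypothetical parameters
distribution = {x: binomial_pmf(x, n, p) for x in range(n + 1)}
for x, prob in distribution.items():
    print(x, prob)
print("total =", sum(distribution.values()))
```

The probabilities over x = 0, 1, ..., n add up to 1, as they must for any probability function.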
Properties of the binomial distribution
1. Mean:
The mean of a binomial random variable X, designated $\mu$ or E(X), is the theoretical
expected number of successes in n trials. Symbolically,
$$\mu = E(X) = \sum_{x=0}^{n} x\, b(x; n, p)$$
Since $X = X_1 + X_2 + \cdots + X_n$, where each $X_i$ is a Bernoulli variable with $E(X_i) = p$,
$$E(X) = E\left(\sum_{i=1}^{n} X_i\right) = E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n) = p + p + \cdots + p = np$$
n
2. Variance: Let X be a binomial variate. Then $X = \sum_{i=1}^{n} X_i$,
where each $X_i$ is a Bernoulli random variable, the $X_i$ are all independent, and each
has mean p and variance pq, with q = 1 - p. Now the variance of X is
$$V(X) = V\left(\sum_{i=1}^{n} X_i\right) = V(X_1 + X_2 + \cdots + X_n) = V(X_1) + V(X_2) + \cdots + V(X_n) = pq + pq + \cdots + pq = npq$$
Thus V(X) = npq.
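The results E(X) = np and V(X) = npq can be checked numerically by summing over the probability function directly. The sketch below uses hypothetical parameters n = 12 and p = 0.25, chosen only for illustration; the function name `binomial_pmf` is likewise an assumption.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 12, 0.25          # hypothetical parameters
q = 1 - p

# Mean and variance computed from the definition, as weighted sums.
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
variance = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))

print(mean, n * p)           # both should agree up to rounding
print(variance, n * p * q)   # both should agree up to rounding
```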
Example: The probability that a patient recovers from a delicate heart operation is
0.9. What is the probability that exactly five out of the next seven patients
undergoing this operation will survive?
Assume that the operations are performed independently and that p = 0.9 for each of the
seven patients. Then
$$b(5; 7, 0.9) = \binom{7}{5} (0.9)^5 (1 - 0.9)^{7-5} = \frac{7!}{5!\,(7-5)!}\, (0.9)^5 (0.1)^2 = 21 \times 0.5905 \times 0.01 = 0.1240$$
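The arithmetic of this example can be reproduced step by step. This is only a sketch of the calculation shown above, with variable names chosen for illustration.

```python
from math import comb

n, x, p = 7, 5, 0.9                  # seven patients, five recoveries, p = 0.9

coefficient = comb(n, x)             # 7C5 = 7! / (5! * 2!)
probability = coefficient * p**x * (1 - p)**(n - x)

print(coefficient, round(p**x, 4), round(probability, 4))
```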
Assignment problem: A survey shows that 30% of the students of
Stamford University are smokers. Ten students are randomly selected. What is the
probability that
(a) Exactly two students are smokers
(b) At most three students are smokers.
Normal Distribution
Normal distribution occupies a central position in statistical inference. This is the
most important continuous probability distribution in the entire field of statistics.
Its graph, called the normal curve, is the bell-shaped curve that describes the
distribution of so many sets of data, which occur in nature, industry, and research.
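A random variable X is said to have a normal distribution with mean $\mu$ and
variance $\sigma^2$ ($-\infty < \mu < \infty$ and $\sigma > 0$) if X has a continuous distribution for which the
probability density function is
$$f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right\}, \quad -\infty < x < \infty$$
where $\pi \approx 3.1416$ and $e \approx 2.71828$ are two constants.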
Figure: Normal probability curve
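The density can be evaluated directly from its formula. The sketch below is illustrative only; the function name `normal_pdf` and the values mu = 50, sigma = 10 are assumptions, not part of the notes.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

mu, sigma = 50.0, 10.0   # hypothetical parameters
for x in (30.0, 40.0, 50.0, 60.0, 70.0):
    print(x, round(normal_pdf(x, mu, sigma), 5))
```

Printing the density at points placed symmetrically around the mean shows the bell shape: the values rise towards x = mu and fall away equally on both sides.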
Properties of normal distribution:
We summarize below some of the important properties of normal distribution.
1. The normal probability curve is symmetrical about the ordinate at $x = \mu$, where
$\mu$ is the mean of the distribution.
2. The mode, which is the point on the horizontal axis where the curve is a
maximum, occurs at $x = \mu$, and the maximum ordinate is equal to $\frac{1}{\sigma\sqrt{2\pi}}$.
3. The total area under the curve and above the horizontal axis is equal to 1.
4. The curve has its points of inflection at $x = \mu \pm \sigma$.
5. The curve approaches the horizontal axis asymptotically as we proceed in either
direction away from the mean.
6. As a consequence of the above properties, the mean, median and mode of the
distribution are identical.
7. The parameters $\mu$ and $\sigma$ are respectively the mean and standard deviation of the
distribution.
8. All odd moments of the distribution about the mean vanish.
9. The values of $\beta_1$ and $\beta_2$ are 0 and 3 respectively.
10. The mean deviation of the normal distribution is approximately $\frac{4}{5}\sigma$.
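Several of these properties can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) with hypothetical parameters mu = 50 and sigma = 10, and approximates the integrals by a midpoint Riemann sum. The exact mean deviation is $\sigma\sqrt{2/\pi} \approx 0.798\sigma$, which is where the approximation of four fifths of sigma comes from.

```python
import math
from statistics import NormalDist

mu, sigma = 50.0, 10.0                  # hypothetical parameters
dist = NormalDist(mu, sigma)

# Property 1: symmetry about x = mu.
symmetric = math.isclose(dist.pdf(mu - 7.0), dist.pdf(mu + 7.0))

# Property 2: the maximum ordinate, at x = mu, is 1 / (sigma * sqrt(2 * pi)).
peak = dist.pdf(mu)

# Properties 3 and 10: midpoint Riemann sums over mu - 10*sigma .. mu + 10*sigma.
steps = 20000
width = 20.0 * sigma / steps
grid = [mu - 10.0 * sigma + (i + 0.5) * width for i in range(steps)]
area = sum(dist.pdf(x) for x in grid) * width
mean_deviation = sum(abs(x - mu) * dist.pdf(x) for x in grid) * width

print(symmetric, peak, area, mean_deviation / sigma)
```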
Standard normal distribution
If a random variable X has a normal distribution with mean $\mu$ and variance $\sigma^2$, then
the variable $Z = \frac{X - \mu}{\sigma}$ is called a standard normal variate (or Z score) and its
distribution is referred to as the standard normal distribution, with mean zero and
variance one and the following density function:
$$f(z; 0, 1) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty$$
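The cumulative distribution function (cdf) of the standard normal variate Z is
usually denoted by the symbol $\Phi(z)$. Thus
$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2}\, dt$$
The value of $\Phi(z)$ given by the above definite integral is called the probability
integral. It is closely related to the error function, since
$\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}\left(z/\sqrt{2}\right)\right]$.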
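The integral has no closed form in elementary functions, but it can be evaluated through the error function using the identity $\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$. The sketch below relies on `math.erf` from the Python standard library; the function name `phi` is chosen here for illustration.

```python
import math

def phi(z):
    """Standard normal cdf, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (-1.96, -1.0, 0.0, 1.0, 1.96):
    print(z, round(phi(z), 4))
```

By the symmetry of the standard normal curve, the values satisfy phi(-z) = 1 - phi(z), which is why printed tables usually list only non-negative z.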
Areas under the normal curve:
The curve of any continuous probability distribution is constructed so that the area
under the curve bounded by the two ordinates x=x1 and x=x2 equals the probability
that the random variable X assumes a value between x=x1 and x=x2. That is
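$$P(x_1 < X < x_2) = \int_{x_1}^{x_2} f(x; \mu, \sigma^2)\, dx = \frac{1}{\sigma\sqrt{2\pi}} \int_{x_1}^{x_2} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2} dx$$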
Figure: P(x1 < X <x2) =Area of the shaded region
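In practice this area is usually found by converting the limits to Z scores and taking a difference of standard normal cdf values, $P(x_1 < X < x_2) = \Phi(z_2) - \Phi(z_1)$ with $z_i = (x_i - \mu)/\sigma$. The sketch below assumes Python 3.8 or later for `statistics.NormalDist`; the function name and the values mu = 50, sigma = 10 are hypothetical.

```python
from statistics import NormalDist

def interval_probability(x1, x2, mu, sigma):
    """P(x1 < X < x2) for X normal with mean mu and standard deviation sigma."""
    standard = NormalDist()          # standard normal: mean 0, standard deviation 1
    z1 = (x1 - mu) / sigma           # Z score of the lower limit
    z2 = (x2 - mu) / sigma           # Z score of the upper limit
    return standard.cdf(z2) - standard.cdf(z1)

# Hypothetical example: area within one standard deviation of the mean.
print(round(interval_probability(40.0, 60.0, 50.0, 10.0), 4))
```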