Introduction To Probability
Table of Contents
Experiments
    Random Experiments
Probability Models
Sample Spaces
Probabilities
    Probability Axioms
    Joint Probabilities
    Probability Assignment
    Subjective Approach
Conditional Probability
Bayes Formula
Independence
Probability and statistics is the study of situations and phenomena that produce uncertain outcomes; we do not know for certain what the outcome is until we actually see it. The course will teach us how to describe and reason about such situations mathematically.
Experiments
A deterministic experiment is one which has only one possible outcome, i.e. the outcome is known with certainty before the experiment is performed.
A random experiment has more than one possible outcome. The number of
outcomes can be anything from 2 to infinity. The outcome of any specific instance of
the experiment is random, and we do not know which one it is until we actually
perform the experiment. A random experiment can be repeated any number of times
For our purposes, since the results are uncertain, we use the term ‘random
experiment’.
An experiment consists of two parts: a procedure and an observation. For example, the physical process of tossing a coin is a procedure; we cannot get the result without actually carrying out the toss.
The observation is the result we are looking for. For example, if we send a packet of
data from one computer to another, we would be observing the fate of the packet,
i.e. if it was sent successfully or if it was unsuccessful. In this case, sending the
packet is the procedure and whether or not the delivery was successful is the
observation.
Random Experiments
This course will deal mostly with random experiments, and it will do so in two parts:
probability and statistics. The first part deals with probability theory, or classical
probability, and the second part deals with statistical inference, which will lead up to estimating the unknown details of experiments from observed data.
Random experiments can be of two types. The first type is one in which we know
everything about the experiment, its internal properties and the set of outcomes.
Here, the job of the probability theory is to mathematically define the uncertainty.
Once this is done, it is called a probability model. For example, if we are tossing a coin,
we need to know about several factors. We need to know whether or not the coin is
biased, whether external factors are affecting the toss, etc. Once all of this is known, we can construct a probability model. This model can help us find the likelihood of getting a certain
outcome and what the expected outcome of the experiment will be, which we call
the expectation.
The other type is where statistical inference comes in. We are given a random
experiment and a set of outcomes for the experiment as data, but we do not have
any information about the experiment. Statistical inference can help us figure out
the unknown details of the experiment. We can estimate details related to the probability model of the experiment, we can identify the probability model without knowing the experiment, and we can forecast future data of the random experiment.
Probability Models
A probability model is a mathematical description of a random experiment (it assigns a likelihood to each possible outcome of the experiment). For example, if we are sending a packet of data from one computer to another, the probability model will tell us the chances of the packet being delivered successfully.
The probability model can also describe other things. For example, each instance of a
packet being sent is independent from the rest. Essentially, the outcome of one
instance of the experiment will not affect the other instances. In fact, independence is often assumed in models like this one, since if we do not assume it, the situation will become very difficult to deal with.
Now let’s look at a few examples. For the first example, consider that the procedure
is sending three packets of data from one computer to another, and the observation is the delivery status of each packet, in order. Possible outcomes include DDD, DDF, DFD, etc., where D indicates a successful delivery and F indicates a failed one.
In the second example, consider that the procedure is again that three packets of
data are being sent from one computer to another, but the observation is the number of packets that were delivered successfully. Here, the possible outcomes are 0, 1, 2 and 3.
In the third example, consider that the procedure is to send a packet of data
repeatedly from one computer to another until the packet is delivered successfully
and the observation is the number of attempts that were required. Here, possible outcomes range from 1 to ∞. In the fourth example, the procedure is to send packets one after another until three packets of data have been delivered successfully, and the observation is the number of attempts that we needed. Here, the possible outcomes are between 3 and ∞.
Notice that the procedures are similar for all the examples, but the observations differ, so the experiments are different. Two experiments are considered identical only if they have the same procedure and the same observation.
In any probability model, we work with a certain random experiment with a certain set
of outcomes. This should include all the possible uniquely distinguishable outcomes.
For this set of outcomes, also called the sample space (S), we do not know for certain which outcome will occur in any given run.
However, based on some data, we can make a prediction about the likelihood of an
outcome appearing. The chances of each outcome occurring is called the probability
of that outcome.
For some experiments, the probability of all the outcomes are equal, such as when
we roll a fair die. For others, some outcomes have a higher probability than others. By
this, we mean that for a large number of runs of the experiment, those outcomes will occur more often. By convention, each probability is a value between 0 and 1. A higher value indicates that an outcome is more likely to occur and a lower value indicates that an outcome is less likely to occur. Usually, probabilities are between 0 and 1 exclusive, but there are cases where a probability is exactly 0 (the outcome never occurs) or exactly 1 (the outcome always occurs). For example, consider rolling a fair die, which has six outcomes which are all equally likely to occur. For this experiment, a graph of the probabilities would show six bars of equal height, one at 1/6 for each outcome.
We still need to talk about where we are actually getting the values for the
probabilities. These values are either set by assumption or estimated from data. The assignment of these values needs to follow a set of rules called the probability laws, which we will study later.
Example 1.
The procedure is sending a single packet of data from one computer to another, and the observation is whether the delivery succeeds (D) or fails (F).
Sample Space: S = {D, F}
It is very unlikely that the probabilities of both of these outcomes are equal. For now, let us assume that P[D] = p and P[F] = 1 − p. These two equations, along with the sample space, represent the probability model of this experiment. The outcomes of different runs of the experiment are independent. Note that this is an extremely simple model and is incomplete: we still need to give the actual value of p for the model to be complete.
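As a quick illustration, this simple model is easy to simulate. Below is a minimal Python sketch; the value p = 0.9 is a hypothetical choice for illustration, not a value given in these notes.

    import random

    def send_packet(p=0.9):
        # Simulate one run of the single-packet experiment.
        # p is a hypothetical delivery probability; returns 'D' on
        # success and 'F' on failure, matching the sample space above.
        return 'D' if random.random() < p else 'F'

    print([send_packet() for _ in range(10)])  # e.g. ['D', 'D', 'F', ...]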
Example 2.
The procedure is sending three packets of data and the observation is the fate of each packet, in order.
Sample Space: S = {FFF, FFD, FDF, FDD, DFF, DFD, DDF, DDD}
The probabilities here are a little more complicated and will be discussed later.
Example 3.
The procedure is again sending three packets, but the observation is the number of successful deliveries, so the values of interest are 0, 1, 2 and 3. The outcomes here are, for 0 successful deliveries, FFF; for 1 successful delivery, FFD or FDF or DFF; and so on. However, just by looking at these values, we are unable to tell specifically which outcome occurred. For example, the value 1 could mean any of the three possibilities. Thus, the values we are searching for do not form the sample space of this experiment.
Example 4:
The procedure is to send a packet repeatedly until it is delivered successfully, and the observation is the number of attempts required, S = {1, 2, 3, …}. In this case, even though the values we are searching for are not the outcomes themselves, we can tell what the exact outcome is using the values. For example, the value 5 can only mean that the outcome was FFFFD. As such, this set of values is the sample space.
Example 5.
The procedure is to send packets until three have been delivered successfully, and the observation is the sequence of deliveries. Here, the actual sample space is S = {DDD, FDDD, DFDD, DDFD, …}. Notice however that DDDF is not one of the possibilities. This is because the experiment stops as soon as the third successful delivery occurs, so no outcome can continue past the third D. The set above is therefore the sample space of this experiment.
Sample Spaces
A valid sample space must satisfy three properties: its elements must be collectively exhaustive, mutually exclusive and finest grain.
A set of elements is considered collectively exhaustive if it contains all of the outcomes of the experiment. If we get some outcome that is not in the sample space S, then the outcome is considered unknown. Our experiment should not have any unknown outcomes.
A set of elements is mutually exclusive if the intersection of any two of its elements produces a null set. In other words, we should not have any scenario where more than one outcome occurs on a single run of the experiment.
The finest grain property is related to the granularity of the elements of the sample
space. Essentially, this means that if we were to break down any of the elements of
the sample space, none of the smaller elements would qualify as an outcome for the experiment.
All of this also means one more thing, that the sample space of an experiment is
unique. There can never be more than one sample space for a single experiment.
The finest grain property can seem vague and difficult to understand. This can be made clear with an example. Consider rolling a fair die.
Sample Space: S= { 1, 2 , 3 , 4 , 5 ,6 }
Now consider the alternative set S′ = {{1 or 3}, 2, {1 or 4}, 5, 6}. Notice that there are two elements that can be broken down into smaller elements which themselves qualify as outcomes of the experiment. As such, this set does not have the finest grain property and cannot be our sample space. Enforcing the finest grain property will help remove many of the confusions related to the assignment of probabilities; improperly constructed sample spaces can lead to situations where the probability of a result is found to be more than 1.
Events and Event Spaces
Formally, an event is a set of outcomes that consists of one or more elements of the
sample space.
For the last example we saw, S = {1, 2, 3, 4, 5, 6}. Here, a few possible events are:
E1 = {1}
E2 = {2, 4, 6} (the outcome is even)
In fact, any subset of the sample space is an event.
Now consider the three-packet experiment again.
Sample Space: S = {FFF, FFD, FDF, FDD, DFF, DFD, DDF, DDD}
Let Ei be the event that exactly i packets are delivered successfully:
E0 = {FFF}
E1 = {FFD, FDF, DFF}
E2 = {FDD, DFD, DDF}
E3 = {DDD}
Now consider the set E = {E0, E1, E2, E3}. All the elements are mutually exclusive and
collectively exhaustive. A set of events that have these two properties is called an
event space. However, the elements are not finest grain, since the subset of an
element could be an outcome. We can tell how many successful deliveries occurred if we know which event occurred,
but we cannot specify which outcome occurred. As such, we cannot call such a set a
sample space.
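To make this concrete, here is a small Python sketch that builds the events E0 through E3 from the three-packet sample space and checks that they are mutually exclusive and collectively exhaustive.

    from itertools import product

    # Sample space: all sequences of three deliveries (D) or failures (F).
    S = {''.join(seq) for seq in product('DF', repeat=3)}

    # Event space: E[i] is the event that exactly i packets were delivered.
    E = {i: {s for s in S if s.count('D') == i} for i in range(4)}

    # Mutually exclusive: no outcome belongs to two different events.
    assert all(E[i].isdisjoint(E[j]) for i in E for j in E if i != j)
    # Collectively exhaustive: together the events cover the sample space.
    assert set().union(*E.values()) == S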
Probabilities
Now we shall discuss how to actually assign a value to a probability. Not all assignments are valid, of course. For an assignment to be valid, it needs to satisfy a few rules, called the probability axioms.
Probability Axioms
By definition, axioms do not require any proof or validation. They are accepted as true and always followed. There are three probability axioms.
Non-negativity – For any event A, P[A] ≥ 0. In principle we could imagine a system that allows negative values; there is nothing preventing this. However, this is simply accepted as fact and always followed.
Additivity – If A and B are two events and they are disjoint (mutually exclusive), then P[A ∪ B] = P[A] + P[B], the probability that either A or B occurs.
This same formula applies for any number of events. Thus, generally, if A1, A2, … are disjoint events, P[A1 ∪ A2 ∪ …] = P[A1] + P[A2] + ….
Normalization – P[S] = 1. Thus, the result of any instance of the experiment is always in the sample space.
There are a few corollaries we can derive from these axioms.
A corollary extended from the additivity axiom is the case where the events A and B are not disjoint. Here, P[A ∪ B] = P[A] + P[B] − P[AB]. This last term, P[AB], is the joint probability of A and B, i.e. the probability that both events occur together; it must be subtracted because it is counted twice in P[A] + P[B].
If A is a subset of B, A ⊆ B, then P[A] ≤ P[B].
There are more corollaries we can derive, but these are the ones we will end up using
frequently.
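As a quick sanity check of the inclusion-exclusion corollary, the sketch below counts outcomes for a fair die; the events A and B are hypothetical choices for illustration.

    from fractions import Fraction

    S = {1, 2, 3, 4, 5, 6}                          # fair die
    P = lambda event: Fraction(len(event), len(S))  # equally likely outcomes

    A = {2, 4, 6}   # outcome is even (illustrative choice)
    B = {4, 5, 6}   # outcome is at least 4 (illustrative choice)

    # P[A ∪ B] = P[A] + P[B] - P[AB]
    assert P(A | B) == P(A) + P(B) - P(A & B)
    print(P(A | B))  # 2/3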
Joint Probabilities
The joint probability of two events A and B, written P[AB] or P[A ∩ B], is the probability that both events occur. One way of finding the joint probability is to simply find the sum of the probabilities of the outcomes that are common to A and B. Of course, the joint probability can be calculated for more than two events as well.
Probability Assignment
There are three approaches we will look at:
Classical Approach
Relative Frequency Approach
Subjective Approach
Classical Approach
In the classical approach, it is assumed that the outcomes are all equally likely. Say there are n outcomes, E1, E2, …, En. Thus, the probability of each outcome will be 1/n.
Intuitively, we know that this is correct under this approach, but we still need to
check this against the three axioms. We will of course, find that all three check out.
On the other hand, we could assign the probabilities by following the axioms first. In
this way, we know for certain that the axioms were followed.
We are assuming that the outcomes are mutually exclusive and collectively exhaustive. Thus, P[E1] + P[E2] + … + P[En] = P[S] = 1.
We also assumed that each of the outcomes is equally likely, and thus they all have the same probability. Thus, nP[E1] = 1, and P[E1] = 1/n.
Thus, P[E1] = P[E2] = … = P[En] = 1/n, meaning P[Ei] = 1/n, where i = 1, 2, …, n.
Example:
Consider rolling two fair four-sided dice and observing the pair of numbers that come up.
Sample Space: S = {(1,1), (1,2), (1,3), (1,4), …, (4,4)}
There is a total of 16 possible outcomes. The question here is: are the outcomes equally likely to occur? In this case, they are. Thus, the probability of each of the outcomes is 1/16.
Now suppose the observation is instead the sum of the two dice. In this scenario, the outcomes are not all equally likely. This is because some of the sums can occur due to more combinations than others. For example, the pairs (1,3), (3,1) and (2,2) will all give us a sum of 4, but the only way to get the sum 8 is from the pair (4,4). Thus, obviously, the outcomes are not equally likely.
However, we can still assign probabilities to the outcomes. This is done by applying the classical approach to the underlying equally likely pairs. For example:
P[5] = P[(1,4)] + P[(2,3)] + P[(3,2)] + P[(4,1)] = 1/16 + 1/16 + 1/16 + 1/16 = 1/4
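The same calculation can be carried out by enumerating all 16 equally likely pairs. A short Python sketch, assuming two fair four-sided dice as above:

    from collections import Counter
    from fractions import Fraction
    from itertools import product

    pairs = list(product(range(1, 5), repeat=2))   # all 16 pairs
    sums = Counter(a + b for a, b in pairs)        # how many pairs give each sum

    for s in sorted(sums):
        print(s, Fraction(sums[s], len(pairs)))    # e.g. P[5] = 1/4, P[8] = 1/16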
Relative Frequency Approach
However, equally likely outcomes are not always available or easily noticeable. Instead, in the relative frequency approach, we run the experiment a very large number of times, with the number of runs n approaching infinity, and count the number of times nA that the event A occurs. Then
P[A] = lim(n→∞) nA/n
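In practice the limit is approximated with a large finite n. A minimal Python sketch of the relative frequency approach, reusing the dice-sum example:

    import random

    # Estimate P[sum of two four-sided dice equals 5] by relative frequency.
    n = 100_000
    n_A = sum(random.randint(1, 4) + random.randint(1, 4) == 5
              for _ in range(n))
    print(n_A / n)  # approaches 1/4 = 0.25 as n grows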
Subjective Approach
The subjective approach is used in very special scenarios and is not always included in textbooks. In it, probabilities are assigned based on judgment or belief rather than symmetry or repeated trials. However, it will come in handy when we get to the statistics part of the course.
Conditional Probability
Consider rolling a fair die.
Sample Space: S = {1, 2, 3, 4, 5, 6}
Obviously, the probability of any of the outcomes occurring is 1/6.
A ≡ {outcome is 2}, so P[A] = 1/6
B ≡ {outcome is even} = {2, 4, 6}, so P[B] = 1/2
Now, say we cannot actually see the die after rolling it, because we have a screen between us and the die. Someone else on the other side of the screen is informing us about the results. This person is not giving us the exact result, but instead telling us whether the result is even or odd. Say we are told that the result is even, i.e. that B has occurred. Then
P[A given B has already occurred] = P[A|B] = 1/3
The reason the probability that event A occurs has changed is because we have a new, smaller sample space: the conditional sample space SC = S|B = S ∩ B = {2, 4, 6}. The partial information has changed the probability model.
More formally,
P[A|B] = P[AB]/P[B], where P[B] > 0
When all outcomes are equally likely, this is equal to (number of elements in A ∩ B)/(number of elements in B).
For our example, A ⊆ B, so AB = A, and
P[A|B] = P[A]/P[B] = (1/6)/(1/2) = 1/3
Conditional probabilities obey the probability axioms. For example, P[A|B] ≥ 0, and P[SC] = P[S|B] = 1.
P[A] and P[A|B] are not necessarily equal, but sometimes they might be. Both of these quantities deal with the probability of event A occurring. The only difference is that in the first case, we know nothing about the result of the experiment, and in the second case, we have some partial information. This difference causes the sample space to be different.
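For equally likely outcomes, the conditional probability reduces to counting elements. A minimal Python sketch of the die example above:

    from fractions import Fraction

    A = {2}           # outcome is 2
    B = {2, 4, 6}     # outcome is even

    # P[A|B] = |A ∩ B| / |B| when all outcomes are equally likely.
    print(Fraction(len(A & B), len(B)))  # 1/3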
Law of Multiplication (Product Rule)
This rule gives us the joint probability of two or more events occurring, e.g. P [ AB ] .
We know that P[A|B] = P[AB]/P[B]. Rearranging, P[AB] = P[A|B] ⋅ P[B], where P[B] > 0.
Similarly, P[BA] = P[B|A] ⋅ P[A], where P[A] > 0. Of course, P[BA] = P[AB].
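A one-line numeric check of the product rule, using the values from the die example above:

    from fractions import Fraction

    P_B = Fraction(1, 2)           # B: outcome is even
    P_A_given_B = Fraction(1, 3)   # A: outcome is 2, given B
    print(P_A_given_B * P_B)       # 1/6, which is indeed P[{2}]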
Consider that we have a sample space S for an experiment, where A is an event, and we have an event space consisting of B1, B2, B3 and B4. Using Venn diagrams, we can see that the events Ci = A ∩ Bi split A into four pieces. C1 through C4 must be mutually exclusive since B1, B2, B3 and B4 are mutually exclusive. Thus, P[A] = P[C1] + P[C2] + P[C3] + P[C4], and since each P[A ∩ Bi] = P[A|Bi] ⋅ P[Bi] by the product rule, in general,
P[A] = ∑(i=1 to n) P[A|Bi] ⋅ P[Bi]
Notice how we did not need to find the probability of A occurring separately. Instead,
we found the conditional probabilities of A occurring given that each of the events in
the event space occur, and then found the sum. This is the law of total probability.
Example
Consider that we are sending 3 data packets, with all 8 outcomes equally likely. The possible events are:
B0 = {FFF}, B1 = {FFD, FDF, DFF}, B2 = {FDD, DFD, DDF} and B3 = {DDD}.
Thus, E = {B0, B1, B2, B3}.
We know that P[B0] = 1/8, P[B1] = 3/8, P[B2] = 3/8 and P[B3] = 1/8.
Let A be the event that the first packet is delivered successfully. Thus, P[A|B0] = 0 (since A ∩ B0 = ∅), P[A|B1] = 1/3, P[A|B2] = 2/3 and P[A|B3] = 1.
By the law of total probability,
P[A] = 0 ⋅ 1/8 + 1/3 ⋅ 3/8 + 2/3 ⋅ 3/8 + 1 ⋅ 1/8 = 1/2
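The sketch below verifies this total-probability calculation by direct enumeration, with A defined as above (the first packet is delivered) and all outcomes equally likely.

    from fractions import Fraction
    from itertools import product

    S = {''.join(seq) for seq in product('DF', repeat=3)}
    P = lambda ev: Fraction(len(ev), len(S))   # equally likely outcomes

    A = {s for s in S if s[0] == 'D'}          # first packet delivered
    B = {i: {s for s in S if s.count('D') == i} for i in range(4)}

    # Law of total probability: P[A] = sum of P[A|B_i] * P[B_i].
    total = sum(Fraction(len(A & B[i]), len(B[i])) * P(B[i]) for i in range(4))
    assert total == P(A)
    print(total)  # 1/2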
Bayes Formula
Bayes formula combines conditional probability and the law of total probability together. Here, we will be dealing with conditional probability in reverse: given the later event, we ask about the earlier one.
Say an event A is picking a biased coin from some set of coins, and an event B is
getting three heads in a row. We want to know the probability that the event A had
occurred, given B has occurred. Notice what this means. The event A is not
dependent on the event B, but it is one of the possible ways in which B can occur.
The way to solve this is to recognize that B can occur in more cases than just after A
has occurred. In this problem, B can occur for A and AC (not picking a biased coin),
with different probabilities in each case. Thus, the result we are looking for is the
probability that B has occurred after A divided by the total probability of B occurring.
If there are multiple ways for B to occur, through the events A1 to An, the probability of A1 given B is:
P[A1|B] = P[A1B]/P[B]
= (P[B|A1] ⋅ P[A1])/P[B]
= (P[B|A1] ⋅ P[A1])/(P[B|A1] ⋅ P[A1] + … + P[B|An] ⋅ P[An])
Say we have 4 events, A1, A2, A3 and A4, each signifying that a particular machine produced some product. The event B signifies that a product is defective. Using Bayes formula, we can find, given that we have found a defective product, the probability that it was produced by any particular machine.
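As a sketch of how this would be computed, here is a small Python example. The machine shares and defect rates below are hypothetical numbers chosen purely for illustration.

    # Hypothetical inputs: P[A_i] (share of production per machine)
    # and P[B|A_i] (defect rate of each machine).
    P_A = [0.40, 0.30, 0.20, 0.10]
    P_B_given_A = [0.01, 0.02, 0.03, 0.05]

    # Bayes formula: P[A_i|B] = P[B|A_i] P[A_i] / sum_j P[B|A_j] P[A_j]
    P_B = sum(pb * pa for pb, pa in zip(P_B_given_A, P_A))
    posterior = [pb * pa / P_B for pb, pa in zip(P_B_given_A, P_A)]
    print(posterior)  # probability the defective item came from each machine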
Independence
Two events, A and B, are independent if the joint probability of the two events equals the product of their individual probabilities, i.e. P[AB] = P[A] ⋅ P[B]. Equivalently, P[A|B] = P[A]: knowing that B occurred tells us nothing about A.
Example
Consider again sending three packets of data, with all eight outcomes equally likely.
Sample Space: S = {FFF, FFD, FDF, FDD, DFF, DFD, DDF, DDD}
Let E2 = {FDD, DFD, DDF} be the event that exactly two packets are delivered successfully, and let B be the event that at least one packet is delivered successfully, so that P[E2] = 3/8 and P[B] = 7/8.
1. P[{FDD}|E2] = P[{FDD} ∩ E2]/P[E2] = (1/8)/(3/8) = 1/3
2. P[E2 ∩ B] = P[E2] = 3/8, since E2 ⊆ B
3. P[E2|B] = P[E2 ∩ B]/P[B] = (3/8)/(7/8) = 3/7
4. P[B|E2] = P[B ∩ E2]/P[E2] = (3/8)/(3/8) = 1
Since P[E2|B] = 3/7 ≠ 3/8 = P[E2], the events E2 and B are not independent.
Example
This example applies the law of multiplication, which states that the probability of two or more events occurring together is the product of the appropriate conditional and unconditional probabilities.
Consider that there is a radar covering a certain area, and a plane enters that area.
The radar sends a signal that bounces off of the plane and is detected by the radar.
However, no system is perfect, and there is always the possibility the signal is
reflecting off of some other body, and the radar is identifying that body as a plane.
Assume that the probability of an aeroplane being present within the range of the radar is 0.05. For the cases in which the aeroplane is present, the radar is able to detect it in 99% of cases. The 1% of cases where it is unable to detect the aeroplane are called missed detections. When the aeroplane is not present, the radar gives a false alarm (i.e., raises an alarm anyway) in 10% of cases.
We need to find the probability of a missed detection and the probability of a false alarm. To solve this, let's make use of a tree diagram; it should make the problem easier to visualize. Let B be the event that a plane is present and A the event that the radar raises an alarm.
Here, P[B] = 0.05, P[B^C] = 0.95, P[A|B] = 0.99, P[A^C|B] = 0.01, P[A|B^C] = 0.10 and P[A^C|B^C] = 0.90.
A missed detection occurs when the aeroplane is present and there is no alarm, thus P[A^C B] = P[A^C|B] ⋅ P[B] = 0.01 × 0.05 = 0.0005. Notice that this is the product of the two probabilities along the corresponding branch of the tree. A false alarm occurs when the aeroplane is absent but the alarm sounds, thus P[A B^C] = P[A|B^C] ⋅ P[B^C] = 0.10 × 0.95 = 0.095. Again, this is the product of the two probabilities along the branch.
Note that here, the presence of an aircraft is an unconditional event, and the radar
detecting the plane is a conditional event, since it depends on the presence of the
aircraft.
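The same tree calculation in a few lines of Python, using the numbers from the example:

    P_B = 0.05               # plane present
    P_A_given_B = 0.99       # detection rate
    P_A_given_notB = 0.10    # false-alarm rate

    # Product rule along each branch of the tree diagram.
    missed_detection = (1 - P_A_given_B) * P_B   # P[A^C B] = 0.0005
    false_alarm = P_A_given_notB * (1 - P_B)     # P[A B^C] = 0.095
    print(missed_detection, false_alarm)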
Example
Let’s look at an example for the law of total probability. Say we are playing in a chess
tournament. There are three types of players here, 50 % of type 1 and 25 % each of
type 2 and 3, with each type being an indication of how good the type of player is.
Say the probabilities of us beating a player of each of those types are 0.3, 0.4 and 0.5 respectively.
For a game with a randomly selected player, what is the probability that we win?
Let Bi be the event that we play against a type i player, where i = 1, 2, 3, and let A be the event that we win. By the law of total probability,
P[A] = P[A|B1] ⋅ P[B1] + P[A|B2] ⋅ P[B2] + P[A|B3] ⋅ P[B3] = 0.3 × 0.5 + 0.4 × 0.25 + 0.5 × 0.25 = 0.375
For the same problem as the previous example, consider the problem using Bayes formula. Say we played against a random player and won. What is the probability that the opponent was a type 1 player? By Bayes formula,
P[B1|A] = (P[A|B1] ⋅ P[B1])/P[A] = (0.3 × 0.5)/0.375 = 0.4
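Both calculations for the chess example, the total probability of winning and the Bayes posterior for the opponent's type, in a short Python sketch:

    P_B = [0.50, 0.25, 0.25]        # P[B_1], P[B_2], P[B_3]
    P_A_given_B = [0.3, 0.4, 0.5]   # chance of winning against each type

    # Law of total probability, then Bayes formula for the type-1 posterior.
    P_A = sum(pa * pb for pa, pb in zip(P_A_given_B, P_B))
    P_B1_given_A = P_A_given_B[0] * P_B[0] / P_A
    print(P_A, P_B1_given_A)        # 0.375 0.4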
How are we supposed to differentiate between the case where we use conditional probability and the case where we use Bayes formula? The table below summarizes the chess example:

                               Given                 Target
    Conditional Probability    Type of opponent      Win
    Bayes Formula              Win                   Type of opponent
In conditional probability, the event that occurs first is given and we have to find the probability of the event that occurs later. In Bayes formula, the event that occurs later is given and we have to find the probability of the event that occurred first.