
Introduction to Probability

Table of Contents

Experiments
Random Experiments
Probability Models
Probability Model Examples
Sample Spaces
Events and Event Spaces
Probabilities
Probability Axioms
Joint Probabilities
Probability Assignment
Relative Frequency Approach
Subjective Approach
Conditional Probability
Law of Multiplication (Product Rule)
Law of Total Probability (Sum Rule)
Bayes Formula
Independence
Probability and statistics is the study of situations and phenomena that produce uncertain outcomes. Essentially, there is a defined range of outcomes, but we do not know for certain what the outcome is until we actually observe it.

The course will teach us how, for a problem description or report that involves some uncertainty, we can:

 Identify the uncertainty

 Select a probability model to mathematically define the uncertainty

 Predict the most likely outcome

 Make appropriate conclusions and forecasts


Experiments

An experiment is anything that can produce a result. Experiments can be classified

into two categories, deterministic and random.

A deterministic experiment is one which has only one possible outcome, i.e. the

outcome is fixed. For example, if we perform a multiplication of two numbers, we are

guaranteed to always get the same result.

A random experiment has more than one possible outcome. The number of

outcomes can be anything from 2 to infinity. The outcome of any specific instance of

the experiment is random, and we do not know which one it is until we actually

perform the experiment. A random experiment can be repeated any number of times

with identical conditions. Examples of random experiments include tossing a coin,

rolling a die, etc.

For our purposes, since the results are uncertain, we use the term ‘random

experiment’.

An experiment has two components, a procedure and an observation.

A procedure is a real-world phenomenon that produces an outcome. For example,

the physical process of tossing a coin is a procedure, since we cannot get the

outcome of the toss until we actually perform the toss.

The observation is the result we are looking for. For example, if we send a packet of

data from one computer to another, we would be observing the fate of the packet,

i.e. if it was sent successfully or if it was unsuccessful. In this case, sending the
packet is the procedure and whether or not the delivery was successful is the

observation.

Random Experiments

This course will deal mostly with random experiments, and it will do so in two parts:

probability and statistics. The first part deals with probability theory, or classical

probability, and the second part deals with statistical inference, which will lead up to

machine learning later on.

Random experiments can be of two types. The first type is one in which we know

everything about the experiment, its internal properties and the set of outcomes.

Here, the job of probability theory is to mathematically define the uncertainty.

Once this is done, it is called a probability model. For example, if we are tossing a coin,

we need to know about several factors. We need to know whether or not the coin is

biased, whether external factors are affecting the toss, etc. Once all of this is known,

the uncertainty can be captured by the mathematical description provided by the

probability model. This model can help us find the likelihood of getting a certain

outcome and what the expected outcome of the experiment will be, which we call

the expectation.

The other type is where statistical inference comes in. We are given a random

experiment and a set of outcomes for the experiment as data, but we do not have

any information about the experiment. Statistical inference can help us figure out

the unknown details of the experiment. We can estimate details related to the probability model of the experiment, identify the probability model without knowing the experiment, and forecast future data of the random experiment.
Probability Models

A probability model is a mathematical description of an uncertain situation (a random

experiment). For example, if we are sending a packet of data from one computer to

another, the probability model will tell us that the chances of sending the packet

successfully are n times those of failing to do so.

The probability model can also describe other things. For example, each instance of a

packet being sent is independent of the rest. Essentially, the outcome of one

instance of the experiment will not affect the other instances. In fact, the concept of

independence is extremely important in probability theory. There are situations

where we will be forced to assume that successive instances of an experiment are

independent, since if we do not, the situation will become very difficult to deal with.

Now let’s look at a few examples. For the first example, consider that the procedure

is sending three packets of data from one computer to another. The observation

here would be the sequence of successful and unsuccessful deliveries. Possible

outcomes include DDD , DDF , DFD , etc. where D indicates a successful delivery and
F indicates a failed one.

In the second example, consider that the procedure is again that three packets of

data are being sent from one computer to another, but the observation is the

number of successful deliveries. Possible outcomes include 0, 1, 2 and 3.

In the third example, consider that the procedure is to send a packet of data

repeatedly from one computer to another until the packet is delivered successfully

and the observation is the number of attempts that were required. Here, possible

outcomes could be anywhere between 1 and ∞ .


In the fourth example, consider that the procedure is to send packets of data one after another until three packets have been delivered successfully, and the observation is the number of attempts that we needed. Here, the possible outcomes are between 3 and ∞.

Notice that the procedures are similar for all the examples, but these will not be

considered the same experiments. Two experiments are only considered to be

identical if they have the same procedure and the same observation.

In any probability model, we work with a certain random experiment with a certain set

of outcomes. This should include all the possible uniquely distinguishable outcomes.

For this set of outcomes, also called the sample space ( S), we do not know for

certain which of the outcomes will appear in one instance of an experiment.

However, based on some data, we can make a prediction about the likelihood of an

outcome appearing. The chances of each outcome occurring is called the probability

of that outcome.

For some experiments, the probability of all the outcomes are equal, such as when

we roll a fair die. For others, some outcomes have a higher probability than others. By

this, we mean that for a number of runs of the experiment, those outcomes will

appear more times.

The probability of an outcome is denoted by a non-negative number between 0 and 1

. This is done by convention. A higher value indicates that an outcome is more likely

to occur and a lower value indicates that an outcome is less likely to occur. It can also

indicate the relative frequency of a particular outcome occurring. Usually, the

probabilities are between 0 and 1 exclusive, but there are cases where the

probability may be 0 or 1. This value is also called the probability measure.


For an outcome A , the probability of the outcome is denoted by P [ A ].

The outcomes and their corresponding probabilities can be represented using a

graph. For example, consider an experiment with 3 possible outcomes, A , B and C ,

which are all equally likely to occur. For this experiment, the graph would show each of the three outcomes with the same probability, 1/3.

We still need to talk about where we are actually getting the values for the

probabilities. These values are either set or predicted. The assignment of these

values needs to follow a set of rules called the probability laws, which we will study

later.

Probability Model Examples

Example 1.

Procedure: Send a packet of data from one computer to another.

Observation: Whether the packet was delivered ( D ) or the delivery failed ( F ).

Sample Space: S= { D, F }

It is very unlikely that the probabilities of these two outcomes are equal. For now, let us assume that P[D] = p and P[F] = 1 − p. These two equations, along with the equation for the sample space, represent the probability model of this experiment.

Successive instances of this experiment are independent. Note that this is an extremely simple model

and is incomplete. We still need to give the actual values of the probabilities for the

model to be complete.

Example 2.

Procedure: Sending 3 packets of data.

Observation: The sequence of outcomes.

Sample Space: S= { FFF , FFD , FDF , FDD , DFF , DFD , DDF , DDD }

The probabilities here are a little more complicated and will be discussed later.
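As a brief illustrative sketch (Python, hypothetical, using 'D' for a successful delivery and 'F' for a failure as in the notes), this sample space can be enumerated programmatically, and the observation of the next example (the number of successes) can be read off each outcome:

```python
from itertools import product

# Enumerate the sample space for sending 3 packets: each packet is either
# delivered ('D') or fails ('F').
sample_space = [''.join(seq) for seq in product('DF', repeat=3)]
print(sample_space)        # ['DDD', 'DDF', 'DFD', 'DFF', 'FDD', 'FDF', 'FFD', 'FFF']
print(len(sample_space))   # 8 outcomes

# The observation of the next example (number of successes) maps each
# outcome to a value in {0, 1, 2, 3}.
print({outcome: outcome.count('D') for outcome in sample_space})
```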

Example 3.

Procedure: Sending 3 packets of data.

Observation: The number of successes.


Observed values: {0, 1, 2, 3}

The outcomes here are FFF for 0 successful deliveries; FFD, FDF or DFF for 1 successful delivery; and so on. However, just by looking at these values, we are

unable to tell specifically which outcome occurred. For example, the value 1 could

mean any of the three possibilities. Thus, the values we are searching for do not form

the sample space.

Example 4:

Procedure: Send a packet of data repeatedly, until it is successful.

Observation: The number of attempts required.


Sample Space: S = {1, 2, 3, …}

In this case, even though the values we are searching for are not the outcomes

themselves, we can tell what the exact outcome is using the values. For example,

the value 5 can only mean that the outcome was FFFFD . As such, this is the sample

space.

Example 5.

Procedure: Send packets repeatedly until 3 packets have been delivered successfully.

Observation: The number of attempts required.


Observed values: {3, 4, 5, …}

Here, the actual sample space is S= { DDD , FDDD , DFDD , DDFD , … }. Notice however

that DDDF is not one of the possibilities. This is because the experiment stops as

soon as we have 3 successful deliveries. Thus, that is a condition we must abide by in

this experiment.
Sample Spaces

A sample space is a set of outcomes that are

 Collectively exhaustive

 Mutually exclusive

 Have the finest grain property

A set of elements are considered collectively exhaustive if they contain all of the

outcomes of the experiment. If we get some outcome that is not in the sample space
S, then the outcome is considered unknown. Our experiment should not have any

unknown outcomes. Also, we should not have anything in the sample space that can

never be an outcome for that experiment.

Mathematically, two elements are considered mutually exclusive if their intersection

produces a null set. In other words, we should not have any scenario where more

than one element of S satisfies the outcome of the experiment.

The finest grain property is related to the granularity of the elements of the sample

space. Essentially, this means that if we were to break down any of the elements of

the sample space, none of the smaller elements would qualify as an outcome for the

experiment. All the elements should be uniquely distinguishable, and a subset of an

element of S cannot be an outcome.

All of this also means one more thing, that the sample space of an experiment is

unique. There can never be more than one sample space for a single experiment.

The finest grain property may seem vague and difficult to understand at first. This can be

made easier with the help of an example.


Procedure: Roll a die

Observation: Number of dots on the side facing upwards

Sample Space: S= { 1, 2 , 3 , 4 , 5 ,6 }

Now consider the set S′ = {{1 or 3}, 2, {1 or 4}, 5, 6}. Notice that there are two elements such that a subset of those elements, namely 1, can be an outcome of the experiment. As such, this set does not have the finest grain property and cannot be a sample space.

Whenever we start working on an experiment, the first thing we should do is define

our sample space. This will help remove many of the confusions related to the

development of the probability model of a random experiment. Incorrect sample

spaces can lead to situations where the probability of a result is found to be more

than 1.
Events and Event Spaces

Formally, an event is a set of outcomes that consists of one or more elements of the

sample space.

For the last example, we saw S = {1, 2, 3, 4, 5, 6}. Here, a few possible events are:

E1 = {1}

E2 = {outcome is an even number} = {2, 4, 6}

E3 = {outcome is greater than 4} = {5, 6}

From these, we can calculate the probability of a particular event occurring.

Let’s consider another example.

Procedure: Send 3 packets

Observation: Sequence of successes and failures

Sample Space: S= { FFF , FFD , FDF , FDD , DFF , DFD , DDF , DDD }

Ei = {number of successes is i}, i = 0, 1, 2, 3

E0 ={ FFF }

E1= { FFD , FDF , DFF }

E2= { FDD , DFD , DDF }

E3 ={ DDD }

Now consider the set E={ E0 , E1 , E 2 , E3 } . All the elements are mutually exclusive and

collectively exhaustive. A set of events that have these two properties is called an

event space. However, the elements do not have the finest grain property, since a subset of an element could be an outcome. If an event occurs, we can tell how many successes there were, but we cannot specify which outcome occurred. As such, we cannot call such a set a sample space.
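A minimal sketch (Python, hypothetical) that groups the 3-packet outcomes into the events E0 through E3 and checks that the resulting event space is mutually exclusive and collectively exhaustive:

```python
from itertools import product

sample_space = {''.join(seq) for seq in product('DF', repeat=3)}

# Event space: E_i contains the outcomes with exactly i successful deliveries.
event_space = {i: {s for s in sample_space if s.count('D') == i} for i in range(4)}

# Collectively exhaustive: the union of the events covers the sample space.
assert set.union(*event_space.values()) == sample_space

# Mutually exclusive: no outcome belongs to two different events.
assert all(event_space[i].isdisjoint(event_space[j])
           for i in range(4) for j in range(4) if i != j)
```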

Probabilities

Now we shall discuss how to actually assign a value to a probability. Not all assignments are valid, of course. For a probability to be assigned, it needs to satisfy a few conditions. These conditions are called the probability axioms.

Probability Axioms

By definition, axioms do not require any proof or validation. They are accepted as

they are. The probability axioms are:

 Non-negativity – A probability cannot be negative. Nothing in the mathematics forces this; it is simply accepted as an axiom and always followed.

If A is an event, P[A] ≥ 0, ∀ A (for all events A).

 Additivity – If A and B are two events and they are disjoint (mutually exclusive), P[A ∪ B] = P[A] + P[B]. Here, P[A ∪ B] is the probability that either A or B occurs.

This same formula applies for any number of events. Thus, in general, if A1, A2, … are disjoint events, P[A1 ∪ A2 ∪ …] = P[A1] + P[A2] + ….

 Normalization – The probability of the entire sample space is 1, i.e. P [ S ] =1.

Thus, the result of any instance of the experiment is always in the sample

space.
There are a few corollaries we can derive from these axioms.

 A corollary extended from the additivity axiom is the case where the events A and B are not disjoint. Here, P[A ∪ B] = P[A] + P[B] − P[AB]. The last term, P[AB], is the probability of the intersection of A and B, and is called the joint probability. This will be discussed further in a moment.

 Another term we need to be aware of is the complementary probability of an event. The complementary probability of the event A, P[A^C], is the probability that the event A does not occur. Thus, P[A^C] = 1 − P[A].

 If A is a subset of B, A ⊆ B, then P[A] ≤ P[B].

There are more corollaries we can derive, but these are the ones we will end up using

frequently.
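As a quick numeric check (Python, hypothetical, using the fair-die events from the earlier example, where each of the 6 outcomes has probability 1/6), the union and complement corollaries can be verified by counting:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
P = lambda event: Fraction(len(event), len(S))   # equally likely outcomes

A = {2, 4, 6}   # outcome is even
B = {5, 6}      # outcome is greater than 4

# Union corollary for non-disjoint events: P[A ∪ B] = P[A] + P[B] − P[AB]
assert P(A | B) == P(A) + P(B) - P(A & B)

# Complementary probability: P[A^C] = 1 − P[A]
assert P(S - A) == 1 - P(A)
```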

Joint Probabilities

Joint probabilities are written as A ∩ B, (A, B) or AB. This simply means the probability of both events A and B occurring simultaneously. One approach to finding the joint probability is to sum the probabilities of the outcomes that are common to A and B. Of course, the joint probability can also be calculated for more than two events.
Probability Assignment

There are three ways in which probabilities can be assigned.

 Classical Approach

 Relative Frequency Approach

 Subjective Approach

Classical Approach

In the classical approach, it is assumed that the outcomes are all equally likely. Say there are n outcomes, E1, E2, …, En. Thus, the probability of each outcome will be 1/n. Intuitively, we know that this is correct under this approach, but we still need to check it against the three axioms. We will, of course, find that all three check out.

On the other hand, we could assign the probabilities by following the axioms first. In

this way, we know for certain that the axioms were followed.

 For this scenario, S = {E1, E2, …, En} = E1 ∪ E2 ∪ … ∪ En.

 According to the third axiom, we know that P[S] = 1. Thus, P[E1 ∪ E2 ∪ … ∪ En] = 1.

 We are assuming that the outcomes are mutually exclusive. Thus, following the second axiom, we know that P[E1] + P[E2] + … + P[En] = 1.

 We also assumed that the outcomes are equally likely, and thus all have the same probability. Thus, nP[E1] = 1, and P[E1] = 1/n.

 Thus, P[E1] = P[E2] = … = P[En] = 1/n, meaning P[Ei] = 1/n, where i = 1, 2, …, n.

Example:

Procedure: Roll 2 fair 4-sided dice.

Observation: Number of dots on the top faces of both dice

Sample Space: S = {(1, 1), (1, 2), (1, 3), (1, 4), …, (4, 4)}

There is a total of 16 possible outcomes. The question here is, are the outcomes equally likely to occur? In this case, they are. Thus, the probability of each of the outcomes is 1/16.
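A small sketch (Python, hypothetical) of the classical assignment for this experiment, giving each of the 16 outcomes probability 1/16 and confirming the normalization axiom:

```python
from fractions import Fraction
from itertools import product

# All ordered pairs of faces for two fair 4-sided dice.
outcomes = list(product(range(1, 5), repeat=2))

# Classical approach: every outcome is equally likely, so P[outcome] = 1/n.
p = {outcome: Fraction(1, len(outcomes)) for outcome in outcomes}

print(len(outcomes))     # 16
print(p[(1, 3)])         # 1/16
print(sum(p.values()))   # 1, so the normalization axiom holds
```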

Now, let’s change the observation.

Observation: Sum of the dots on both dice


Observed values: {2, 3, 4, 5, 6, 7, 8}

In this scenario, the outcomes are not all equally likely. This is because, some of the

sums can occur due to more combinations than others. For example, the pairs (1, 3), (3, 1) and (2, 2) will all give us a sum of 4, but the only way to get the sum 8 is from the pair (4, 4). Thus, obviously, the outcomes are not equally likely.

However, we can still assign probabilities to the outcomes. This is done using the

relative frequency approach.


Relative Frequency Approach

For the previous example,

P[5] = P[(1, 4)] + P[(2, 3)] + P[(3, 2)] + P[(4, 1)] = 1/16 + 1/16 + 1/16 + 1/16 = 1/4

However, this is not always easily noticeable. Instead, in the relative frequency approach, we run the experiment a very large number of times, with n approaching infinity. If n_A is the number of times A occurs in those n runs, then

P[A] = lim (n → ∞) n_A / n
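A minimal Monte Carlo sketch (Python, hypothetical) of the relative frequency approach, estimating P[sum = 5] for two fair 4-sided dice; the estimate should approach the exact value of 1/4 as the number of runs grows:

```python
import random

def estimate_p_sum_is_5(runs: int = 1_000_000) -> float:
    """Estimate P[sum of two fair 4-sided dice = 5] as n_A / n."""
    hits = 0
    for _ in range(runs):
        if random.randint(1, 4) + random.randint(1, 4) == 5:
            hits += 1
    return hits / runs

print(estimate_p_sum_is_5())   # roughly 0.25
```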

Subjective Approach

The subjective approach is used in very special scenarios and is not always included in textbooks. However, it will come in handy when we get to the statistics part of this course, so we shall discuss it a bit here.

Essentially, the subjective approach is based on someone’s experience. As such, it varies from person to person.


Conditional Probability

In conditional probability, some information about the outcomes is given in advance.

It is best to see how this works using an experiment.

Procedure: Roll a 6-sided fair die

Observation: Number of dots

Sample Space: S= { 1, 2 , 3 , 4 , 5 ,6 }

Obviously, the probability of any of the outcomes occurring is 1/6.

Now consider these cases.

A ≡ {outcome is 2}

P[A] = 1/6

B ≡ {outcome is even} = {2, 4, 6}

P[B] = 1/2

Now, say we cannot actually see the die after rolling it, because we have a screen between us and the die. Someone else on the other side of the screen is informing us about the results. This person is not giving us the exact result, but instead telling us whether event B occurred or not.


Now, we need to find the probability of event A , subject to the condition that event B

has already occurred.

P[A given B has already occurred] = P[A|B] = 1/3

The reason the probability that event A occurs has changed is because we have a

changed sample space, called the conditional sample space.

S_C = S|B = S ∩ B = {2, 4, 6}

The probability model we have in this situation is called a conditional probability

model.

More formally,

P[A|B] = P[AB] / P[B] = (number of elements in A ∩ B) / (number of elements in B), for P[B] > 0

Since A ⊆ B here, P[AB] = P[A], so

P[A|B] = P[A] / P[B] = (1/6) / (1/2) = 1/3

The conditional probability model satisfies the axioms of probability.

 P[A|B] ≥ 0

 If A1 and A2 are disjoint events, P[(A1 ∪ A2)|B] = P[A1|B] + P[A2|B]

 P[S_C] = P[S|B] = 1
P[A] and P[A|B] are not necessarily equal, but sometimes they might be. Both quantities deal with the probability of event A occurring. The only difference is that in the first case, we know nothing about the result of the experiment, and in the second case, we have some partial information. This difference causes the sample space to be different.
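A short sketch (Python, hypothetical) computing P[A|B] for the fair-die example by counting, mirroring the ratio-of-counts formula above:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2}          # outcome is 2
B = {2, 4, 6}    # outcome is even

def conditional(a: set, b: set) -> Fraction:
    """P[A|B] for equally likely outcomes: |A ∩ B| / |B|."""
    return Fraction(len(a & b), len(b))

print(conditional(A, B))          # 1/3, the conditional probability
print(Fraction(len(A), len(S)))   # 1/6, the unconditional P[A]
```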
Law of Multiplication (Product Rule)

This rule gives us the joint probability of two or more events occurring, e.g. P [ AB ] .

These probabilities are found using conditional probabilities.

We know that P[A|B] = P[AB] / P[B].

Thus, P[AB] = P[A|B] · P[B], for P[B] > 0.

Similarly, P[BA] = P[B|A] · P[A], for P[A] > 0.

Of course, P[BA] = P[AB].

This is known as the law of multiplication.

Extending upon this idea,

P[ABC] = P[C|AB] · P[AB] = P[C|AB] · P[B|A] · P[A] = P[A] · P[B|A] · P[C|AB]

A Venn diagram should make visualizing why this works easier.

In general, P[A1 A2 … An] = P[A1] · P[A2|A1] · P[A3|A1 A2] · … · P[An|A1 A2 … A(n−1)]. Thus, we are following a chain rule.
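A brief sketch (Python, hypothetical, with A = {even}, B = {at least 3} and C = {at least 5} as illustrative events on a fair die) verifying the chain rule numerically:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
P = lambda e: Fraction(len(e), len(S))

A = {2, 4, 6}        # even
B = {3, 4, 5, 6}     # at least 3
C = {5, 6}           # at least 5

# Conditional probability by counting (equally likely outcomes).
cond = lambda x, given: Fraction(len(x & given), len(given))

# Chain rule: P[ABC] = P[A] · P[B|A] · P[C|AB]
assert P(A & B & C) == P(A) * cond(B, A) * cond(C, A & B)
print(P(A & B & C))   # 1/6
```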


Law of Total Probability (Sum Rule)

Consider that we have a sample space S for an experiment, where A is an event, and

we have an event space consisting of B1, B2, B3 and B4. In a Venn diagram, the Bi partition S, and A is split across them.

Here C1 = A ∩ B1, C2 = A ∩ B2, C3 = A ∩ B3 and C4 = A ∩ B4. Thus,

P[C1] = P[A ∩ B1] = P[A|B1] · P[B1], P[C2] = P[A ∩ B2] = P[A|B2] · P[B2], and so on.

We also have A = C1 ∪ C2 ∪ C3 ∪ C4. Thus,

P[A] = P[C1 ∪ C2 ∪ C3 ∪ C4] = P[C1] + P[C2] + P[C3] + P[C4], since C1, C2, C3 and C4 must be mutually exclusive because B1, B2, B3 and B4 are mutually exclusive.

Thus, P[A] = Σ (i = 1 to n) P[A|Bi] · P[Bi].

Notice how we did not need to find the probability of A occurring separately. Instead,

we found the conditional probabilities of A occurring given that each of the events in

the event space occur, and then found the sum. This is the law of total probability.
Example

Consider that we are sending 3 data packets. The possible events are:

B0= { FFF }, B1={ FFD , FDF , DFF }, B2= { FDD , DFD , DDF } and B3= { DDD }.

Thus, E={ B0 , B1 , B2 , B3 }.

Say the event A={ 3 rd packet is successfully sent }.

We know that P[B0] = 1/8, P[B1] = 3/8, P[B2] = 3/8 and P[B3] = 1/8.

Also, P[A|B0] = 0 (since A ∩ B0 = ∅), P[A|B1] = 1/3, P[A|B2] = 2/3 and P[A|B3] = 1.

Thus, P[A] = (0 × 1/8) + (1/3 × 3/8) + (2/3 × 3/8) + (1 × 1/8) = 1/2.
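A small sketch (Python, hypothetical) of this computation using the law of total probability, with the per-event values taken from the example above:

```python
from fractions import Fraction as F

# P[Bi]: probability of exactly i successful deliveries out of 3.
p_B = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}

# P[A|Bi]: probability that the 3rd packet succeeded, given i successes in total.
p_A_given_B = {0: F(0), 1: F(1, 3), 2: F(2, 3), 3: F(1)}

# Law of total probability: P[A] = sum over i of P[A|Bi] · P[Bi]
p_A = sum(p_A_given_B[i] * p_B[i] for i in p_B)
print(p_A)   # 1/2
```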
Bayes Formula

Bayes formula combines conditional probability and the law of total probability. Here, we condition in the reverse direction.

Say an event A is picking a biased coin from some set of coins, and an event B is getting three heads in a row. We want to know the probability that the event A had occurred, given that B has occurred. Notice what this means. The event A is not dependent on the event B, but it is one of the possible ways in which B can occur.

The way to solve this is to recognize that B can occur in more cases than just after A

has occurred. In this problem, B can occur for A and AC (not picking a biased coin),

with different probabilities in each case. Thus, the result we are looking for is the

probability that B has occurred after A divided by the total probability of B occurring.

If there were multiple ways for B to occur, for the events from A1 to An , the probability

that A1 had occurred given B has occurred is given by

P[A1|B] = P[A1 B] / P[B]
        = P[B|A1] · P[A1] / P[B]
        = P[B|A1] · P[A1] / (P[B|A1] · P[A1] + … + P[B|An] · P[An])

Say we have 4 events, A1, A2 , A3 and A 4, each signifying that a machine produces

some products. The event B signifies that some product is defective. Using Bayes

formula, we can find, given that we have found a defective product, the probability

that that product was produced by the i th machine.


Independence

Two events, A and B, are independent if the joint probability of the two events

occurring is the same as the product of their individual probabilities, i.e.


P [ AB ] =P [ A ] ⋅ P [ B ] .

If A and B are independent, P [ A|B ] =P [ A ]. Thus, P [ AB ] =P [ A|B ] ⋅ P [ B ] =P [ A ] ⋅ P [ B ].

Example

Procedure: Send 3 data packets

Observations: Sequence of successes and failures

Sample Space: S= { FFF , FFD , FDF , FDD , DFF , DFD , DDF , DDD }

Ei = {exactly i of the attempts were successful}

B= { at least one of the attempts was a success }

1.

P[{FDD}|E2] = ?

E2 = {FDD, DFD, DDF}

P[{FDD}|E2] = P[{FDD} ∩ E2] / P[E2] = (1/8) / (3/8) = 1/3

2.

P[E2|B] = P[E2 ∩ B] / P[B] = (3/8) / (7/8) = 3/7

3.

P[B|E2] = P[B ∩ E2] / P[E2] = (3/8) / (3/8) = 1

Since P[E2|B] ≠ P[E2] and P[B|E2] ≠ P[B], the events E2 and B are not independent.
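A brief sketch (Python, hypothetical) checking independence for this example by comparing P[E2 ∩ B] with P[E2] · P[B]:

```python
from fractions import Fraction
from itertools import product

sample_space = {''.join(seq) for seq in product('DF', repeat=3)}
P = lambda e: Fraction(len(e), len(sample_space))

E2 = {s for s in sample_space if s.count('D') == 2}   # exactly 2 successes
B = {s for s in sample_space if 'D' in s}             # at least one success

print(P(E2 & B), P(E2) * P(B))    # 3/8 versus 21/64
print(P(E2 & B) == P(E2) * P(B))  # False, so E2 and B are not independent
```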

Example

The law of multiplication states that the probability of two or more events occurring

simultaneously is given by P [ AB ] =P [ A|B ] ⋅ P [ B ] =P [ B| A ] ⋅ P [ A ] .

Consider that there is a radar covering a certain area, and a plane enters that area.

The radar sends a signal that bounces off of the plane and is detected by the radar.

However, no system is perfect, and there is always the possibility the signal is

reflecting off of some other body, and the radar is identifying that body as a plane.

Assume that the probability of an aeroplane being present within the range of the radar is 0.05. In the cases where the aeroplane is present, the radar is able to detect it in 99% of cases. The 1% of cases where it is unable to detect the aeroplane

are called missed detections. When the aeroplane is not present, the radar gives a

false alarm in 10 % of cases.

We need to find the probability of a missed detection and the probability of a false

alarm. To solve this, let’s make use of a tree diagram. It should make the problem

easier for us.

Here, P[B] = 0.05, P[B^C] = 0.95, P[A|B] = 0.99, P[A^C|B] = 0.01, P[A|B^C] = 0.10 and P[A^C|B^C] = 0.90.

A missed detection occurs when the aeroplane is present and there is no alarm, thus

P[A^C B] = P[A^C|B] · P[B] = 0.01 × 0.05 = 0.0005. Notice that this is the product of the two

edges from the tree graph that lead to this result.


A false alarm occurs when an aeroplane is not present and there is an alarm, thus

P[A B^C] = P[A|B^C] · P[B^C] = 0.10 × 0.95 = 0.095. Again, this is the product of the two

edges from the tree graph that lead to this result.

Note that here, the presence of an aircraft is an unconditional event, and the radar

detecting the plane is a conditional event, since it depends on the presence of the

aircraft.
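A quick sketch (Python, hypothetical) of the two products read off the tree diagram:

```python
p_plane = 0.05                  # P[B]: aeroplane present in the radar's range
p_alarm_given_plane = 0.99      # P[A|B]: alarm when a plane is present
p_alarm_given_no_plane = 0.10   # P[A|B^C]: false alarm rate

# Missed detection: plane present AND no alarm.
p_missed = (1 - p_alarm_given_plane) * p_plane           # ≈ 0.0005
# False alarm: no plane present AND alarm.
p_false_alarm = p_alarm_given_no_plane * (1 - p_plane)   # ≈ 0.095

print(p_missed, p_false_alarm)
```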

Example

Let’s look at an example for the law of total probability. Say we are playing in a chess

tournament. There are three types of players here, 50 % of type 1 and 25 % each of

type 2 and 3, with each type being an indication of how good the type of player is.

Say the probabilities of us beating a player of each of those types are 0.3, 0.4 and 0.5 respectively.

For a game with a randomly selected player, what is the probability that we win?

Let Bi be an event where we play against a type i player, where i=1 , 2 ,3, and A be an

event that we win against a random player.

P[A] = P[A B1] + P[A B2] + P[A B3] = P[A|B1] · P[B1] + P[A|B2] · P[B2] + P[A|B3] · P[B3] = (0.3 × 0.5) + (0.4 × 0.25) + (0.5 × 0.25) = 0.375


Example

For the same problem as the previous example, consider the problem using Bayes

formula. Say we played against a random player and won. What is the probability that

we played against a type 1 player?

Thus, P[B1|A] = P[A B1] / P[A] = P[A|B1] · P[B1] / P[A] = (0.3 × 0.5) / 0.375 = 0.4.
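A combined sketch (Python, hypothetical) of the last two examples, first applying the law of total probability and then Bayes formula:

```python
p_type = {1: 0.50, 2: 0.25, 3: 0.25}         # P[Bi]: fraction of each player type
p_win_given_type = {1: 0.3, 2: 0.4, 3: 0.5}  # P[A|Bi]: chance we beat that type

# Law of total probability: P[A] = sum over i of P[A|Bi] · P[Bi]
p_win = sum(p_win_given_type[i] * p_type[i] for i in p_type)
print(p_win)   # ≈ 0.375

# Bayes formula: P[B1|A] = P[A|B1] · P[B1] / P[A]
print(p_win_given_type[1] * p_type[1] / p_win)   # ≈ 0.4
```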

How are we supposed to differentiate between the case where we used conditional

probability and the case where we used Bayes formula?

Events:                    Play versus a type i player   |   Result (we win)
Order:                     First                         |   Second
Conditional probability:   Given                         |   Target
Bayes formula:             Target (type of opponent)     |   Given (win)

In conditional probability, the event that occurs first is given and we have to find the

other event. In Bayes formula, the reverse is true.
