Special Random Variables

The document discusses various types of random variables, focusing on geometric and negative binomial distributions. It explains how to calculate probabilities and expected values for scenarios involving independent Bernoulli trials, such as hitting a target or reporting incidents. Additionally, it covers the application of these distributions in real-world problems, including marketing and safety monitoring.

Uploaded by

jpc
Example: Remember this?

I am trying to hit a target, and I keep making successive independent attempts
until I hit the target for the first time. The chance of hitting the target in any trial
is ¼. What is the expected number of attempts needed?

Now consider the following questions:


a) P(no hit in the first 6 attempts) =
b) P(first hit in the 7th attempt) =
c) P(at least 1 hit in the first 10 attempts) =

Situation: Geometric

• You flip a coin (which comes up heads with probability 𝑝) until you get a head.
How many flips did you need?

• More generally: how many independent trials are needed until the first success?
Geometric Random Variable

We get to define “success”!

• A Bernoulli trial is a random event with only two possible outcomes,
say “success” and “failure”, with fixed probabilities.
• A geometric random variable is the number of Bernoulli trials needed
for the first success.
Geometric Distribution: Waiting for the 1st success

▪ The geometric distribution describes the number of trials needed until
the first success is achieved.
▪ Example: When shooting baskets in a basketball game, what is the
probability that the first time you make the basket will be the fourth
time you shoot the ball?
▪ If X is a random variable representing the number of trials required
until a success occurs, then for X to equal n, it is necessary and
sufficient that the first n−1 trials are failures and the nth trial is a
success, i.e.,
$P(X = n) = (1 - p)^{n-1}\, p, \quad n = 1, 2, 3, \ldots$
Geometric Probability distribution

▪ The expected value of a geometric random variable is
▪ $E(X) = 1/p$
▪ The variance of a geometric random variable is
▪ $\mathrm{Var}(X) = \frac{1-p}{p^2}$
▪ Sometimes, a geometric random variable counts the number of failures before
the first success, instead of number of trials.
▪ Notice that number of failures = number of trials – 1 = X-1
▪ R software uses this definition.

Geometric Distribution Using R

• Probability mass function: dgeom(x,p)


• Cumulative distribution function: pgeom(x,p)
• where x is the number of failures and p is the probability of success
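
As a rough sketch of how these functions apply to the target-shooting example from the start of the deck (P(hit) = ¼), the three probabilities and the two moments could be computed as follows. The variable names are illustrative and not from the slides; the key point is the shift by one, because R counts failures rather than trials.

```r
# Target example: independent attempts, P(hit) = 1/4.
# R's geometric functions count FAILURES before the first success.
p <- 1/4

# a) No hit in the first 6 attempts = at least 6 failures before the first hit
p_no_hit_6 <- pgeom(5, p, lower.tail = FALSE)   # P(failures > 5) = (1 - p)^6

# b) First hit on the 7th attempt = exactly 6 failures, then a success
p_first_hit_7 <- dgeom(6, p)                    # (1 - p)^6 * p

# c) At least 1 hit in the first 10 attempts = at most 9 failures before the first hit
p_hit_within_10 <- pgeom(9, p)                  # 1 - (1 - p)^10

# Mean and variance of the number of trials X (not the number of failures)
mean_trials <- 1 / p
var_trials  <- (1 - p) / p^2

print(c(a = p_no_hit_6, b = p_first_hit_7, c = p_hit_within_10))
print(c(mean = mean_trials, var = var_trials))
```

The expected number of attempts is 1/p = 4, which answers the opening question of the deck.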

Example

• A representative from the Marketing Division of a beverage company randomly
surveys people on a random street in Delhi, until he finds a person who has tasted
their new beverage launched last month. Suppose the probability that he succeeds
in finding such a person equals 0.20. What is the probability that the first person
he meets who has tasted the new beverage is the 7th person he surveys?
• Number of trials, n = 7
• Number of failures, x = n-1 = 6,
• Probability of success = 0.2
• R: dgeom(6, 0.2) = 0.05243

Geometric Property

• Geometric random variables are called “memoryless”

• Suppose you’re flipping coins (independently) until you see a head.


• The first three came up tails.
• How many flips are left until you see the first heads?

• It’s another independent copy of the original!


• The coin “forgot” it already came up tails 3 times.

Note: Geometric is the only discrete distribution with memoryless property.
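
The memoryless property can be checked directly from the p.m.f. The short derivation below is a sketch that assumes X counts trials, as defined earlier, so that $P(X > n) = (1-p)^n$ (the first n trials are all failures).

```latex
P(X > m + n \mid X > m)
  = \frac{P(X > m + n)}{P(X > m)}
  = \frac{(1-p)^{m+n}}{(1-p)^{m}}
  = (1-p)^{n}
  = P(X > n), \qquad m, n = 0, 1, 2, \ldots
```

With m = 3 this is the coin example: after three tails, the number of further flips needed has the same distribution as the original X.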


Problem

Each game you play is a win with probability p. You plan to play 4 games, but if you
win the fourth game, you will keep playing until you lose.
a) Find the expected number of games that you play.
• Let X denote the number of games you play.

• After your third game, you will continue to play until you lose. Therefore, X−3 is
a geometric random variable with parameter 1−p, so
• $E(X) = E[3 + (X-3)] = 3 + E(X-3) = 3 + \frac{1}{1-p}$
Problem

(b) Find the expected number of games that you lose.

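Let Y denote the number of games you lose and Z the number of games you lose
in the first 3 games. Then Z is a binomial random variable with parameters 3 and
1−p. From the fourth game onwards you lose exactly once (the game at which you
stop), so Y = Z + 1 and
$E(Y) = E(Z + 1) = E(Z) + 1 = 3(1-p) + 1$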
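
The two expectations in this problem can be sanity-checked by simulation. The sketch below is only illustrative: p = 0.6, the seed, and the number of replications are arbitrary choices that do not come from the slides.

```r
# Monte Carlo check of parts (a) and (b); p = 0.6 is an arbitrary illustrative value.
set.seed(1)
p <- 0.6
n_sim <- 100000

# From game 4 onward you play until the first loss.
# rgeom() returns the number of wins before that loss, so add 1 for the losing game.
games_from_4th <- rgeom(n_sim, prob = 1 - p) + 1
games_played   <- 3 + games_from_4th

# Losses: Bin(3, 1 - p) in the first three games, plus the single loss that ends play.
games_lost <- rbinom(n_sim, size = 3, prob = 1 - p) + 1

print(c(simulated = mean(games_played), formula = 3 + 1 / (1 - p)))
print(c(simulated = mean(games_lost),   formula = 3 * (1 - p) + 1))
```

The simulated means should land close to the formula values, with the gap shrinking as n_sim grows.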

Binomial Random Variable

A binomial random variable is the number of successes (or failures) in a fixed
number of independent Bernoulli trials.
Binomial(n, p)

• n independent Bernoulli trials with probability of success p
• X = number of “successes”: X ~ Bin(n, p)
• p.m.f.: $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$, x = 0, 1, 2, ..., n
• E(X) = np
• Var(X) = np(1−p)
• R: dbinom(x,n,p) for the p.m.f. and pbinom(x,n,p) for the c.d.f.
Example : Continuing with firing shots

A sequence of shots is being fired independently. P(hit) = ¼.

Exactly 10 shots are fired.


a) P(at least 1 hit in 10 shots)=
b) P(at least 2 hits in 10 shots) =
c) P(at least 2 hits in 10 shots| at least 1 hit in 10 shots) =

Additive Property

• If X~Bin(n, p) and Y~Bin(m, p) are independent, then X+Y~Bin(n+m, p)

• Example: Number of hits has distribution Bin(10, ¼).


• Try 5 more times. Number of hits now has distribution Bin(15, ¼).
• Note: p has to be the same, and independence is needed.
• Note: The above also means that Bin(n,p) is sum of n independent Bin(1,p), or
Ber(p).

Example : Continuing with firing shots

A sequence of shots is being fired independently. P(hit) = ¼.

Exactly 10 shots are fired.


a) P(at least 1 hit in 10 shots)=?
b) P(at least 2 hits in 10 shots) =?
c) P(at least 2 hits in 10 shots| at least 1 hit in 10 shots) =?
5 more shots are fired.
d) P(at least 2 hits in 15 shots)=?
What if the last 5 shots were fired from a different angle, with P(Success) = ½?
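
One possible way to evaluate these quantities in R is sketched below. The variable names are illustrative. For the final question the total number of hits is no longer binomial, because p differs between the two batches of shots, so the sketch works through the complement event instead of the additive property.

```r
# Shots fired independently with P(hit) = 1/4.
p <- 1/4

# a) At least 1 hit in 10 shots
p_a <- 1 - dbinom(0, 10, p)

# b) At least 2 hits in 10 shots: P(X >= 2) = P(X > 1)
p_b <- pbinom(1, 10, p, lower.tail = FALSE)

# c) {at least 2 hits} is a subset of {at least 1 hit}, so the conditional
#    probability is the ratio of the two
p_c <- p_b / p_a

# d) 15 shots with the same p: by the additive property the total is Bin(15, 1/4)
p_d <- pbinom(1, 15, p, lower.tail = FALSE)

# Last 5 shots with P(hit) = 1/2: combine X ~ Bin(10, 1/4) and W ~ Bin(5, 1/2)
# through the complement event {X + W <= 1}
p_total_0 <- dbinom(0, 10, p) * dbinom(0, 5, 1/2)
p_total_1 <- dbinom(1, 10, p) * dbinom(0, 5, 1/2) +
             dbinom(0, 10, p) * dbinom(1, 5, 1/2)
p_mixed <- 1 - p_total_0 - p_total_1

print(round(c(a = p_a, b = p_b, c = p_c, d = p_d, mixed = p_mixed), 4))
```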

Problem
• A management school has always encouraged its first-year students to report deviant
behaviour of their seniors. However, the first-year students are generally reluctant to do
so: historically a very small percentage of such incidents were reported.
• A new awareness campaign was held at the beginning of the current academic year.
The authorities are hoping that 70% of the serious incidents (SI), and 50% of the minor
incidents (MI), will get reported this year.
• Suppose five incidents have happened recently: three SI and two MI, not all of which
might necessarily have been reported. Assume all reporting takes place independently,
with the probabilities as assumed by the authorities (0.7 for SI and 0.5 for MI). Denote
by X the number of serious incidents reported, and by Y the number of minor incidents
reported, from the above five incidents.

Problem cont’d…
a) Identify the distribution of X. Hence compute the probability that X ≥ 2.
X gives the number of reported SI out of 3, where the probability of reporting is 0.7. Hence,
X ~ Bin(3, 0.7). So $P(X \ge 2) = P(X = 2) + P(X = 3) = \binom{3}{2}(0.7)^2(0.3) + (0.7)^3 = 0.784$.
b) Interpret and compute the probability that X+Y ≥ 4.
X+Y is the total number of incidents reported, so X+Y ≥ 4 means that at least four of the
five incidents were reported.
X ~ Bin(3, 0.7). Using similar logic, Y ~ Bin(2, 0.5). X and Y are independent, as
the reporting of SI and MI are independent.
As X ≤ 3 and Y ≤ 2, X+Y = 4 means either (X = 3, Y = 1) or (X = 2, Y = 2). X+Y = 5 means
(X = 3, Y = 2).
Hence, using independence,
$P(X+Y = 4) = P(X = 3)P(Y = 1) + P(X = 2)P(Y = 2) = (0.7)^3 \times \binom{2}{1}(0.5)^2 + \binom{3}{2}(0.7)^2(0.3) \times (0.5)^2 = 0.28175$
$P(X+Y = 5) = (0.7)^3 \times (0.5)^2 = 0.08575$
Hence, P(X+Y ≥ 4) = 0.28175 + 0.08575 = 0.3675.
Cont’d…
• Compute E(X+Y | X+Y ≥ 4).
• E(X+Y | X+Y ≥ 4) represents the expected value of X+Y, given X+Y ≥ 4.
Hence, only two cases are possible, X+Y = 4 and X+Y = 5, with respective probabilities
$P(X+Y = 4 \mid X+Y \ge 4) = \frac{P(X+Y = 4)}{P(X+Y \ge 4)} = 0.7667$ and
$P(X+Y = 5 \mid X+Y \ge 4) = \frac{P(X+Y = 5)}{P(X+Y \ge 4)} = 0.2333$
• Hence the expected value is $4 \times \frac{P(X+Y = 4)}{P(X+Y \ge 4)} + 5 \times \frac{P(X+Y = 5)}{P(X+Y \ge 4)} = 4.2333$.
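
The numbers in this problem can be reproduced by building the joint distribution of the two independent binomials. The following is a minimal sketch with illustrative variable names, assuming X ~ Bin(3, 0.7) and Y ~ Bin(2, 0.5) as in the problem statement.

```r
# Marginal p.m.f.s of the reported serious (X) and minor (Y) incidents
px <- dbinom(0:3, size = 3, prob = 0.7)
py <- dbinom(0:2, size = 2, prob = 0.5)

# Independence: the joint p.m.f. is the outer product of the marginals
joint <- outer(px, py)            # joint[i, j] = P(X = i - 1) * P(Y = j - 1)
total <- outer(0:3, 0:2, "+")     # value of X + Y in each cell

p_x_ge2 <- sum(px[3:4])           # P(X >= 2)
p4      <- sum(joint[total == 4]) # P(X + Y = 4)
p5      <- sum(joint[total == 5]) # P(X + Y = 5)
p_ge4   <- p4 + p5                # P(X + Y >= 4)

# Conditional expectation E(X + Y | X + Y >= 4)
cond_mean <- (4 * p4 + 5 * p5) / p_ge4

print(c(p_x_ge2 = p_x_ge2, p4 = p4, p5 = p5, p_ge4 = p_ge4, cond_mean = cond_mean))
```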
Extension: More than 1 hit

Shots are being fired independently. P(hit) = ¼ at each shot.


How do we formulate a variable tracking number of attempts till the 3rd hit? For
example, how do we compute
d) P(third hit in the 7th shot) =?

Example

Example: P(hit) = ¼
Y = number of trials needed to obtain the third success ~ Negative Binomial(3, ¼)
• P(third hit in the 7th shot) = P(Y = 7) = ?
Negative Binomial Random Variable

A negative binomial random variable is the number of trials needed to obtain a
pre-decided number of successes in a sequence of independent Bernoulli
trials.

Example: Suppose a digital marketing team runs an online advertising
campaign for a newly launched product. The team has determined that the
probability of converting a website visitor into a paying customer (success) is
0.05. They want to know how many website visitors they must attract (trials)
before achieving ten conversions (successes).
Negative Binomial Distribution: General setup

▪ Let X equal the number of trials required for the rth success to occur; then
$P(X = k) = \binom{k-1}{r-1}\, p^r (1-p)^{k-r}, \quad k = r, r+1, r+2, \ldots$
▪ We know that the very last trial must be a success and the first k−1 trials
should have had r−1 successes.
▪ Alternative formulation: no. of failures = no. of trials – no. of successes
Negative Binomial Probabilities Using R

• Probability mass function: dnbinom(x,r,p)


• Cumulative distribution function: pnbinom(x,r,p)
• where x is the number of failures, p is the probability of success, x+r is the
number of trials

Example : Continued

Shots are being fired independently. P(hit) = ¼ at each shot.


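Write the following probabilities in terms of appropriate negative binomial
random variables, and then compute. Let Y ~ Negative Binomial(3, ¼) be the number
of shots needed to obtain the third hit.
d) P(third hit in the 7th shot) = P(Y = 7) = $\binom{6}{2}(1/4)^3(3/4)^4 \approx 0.0742$
e) P(at least 3 hits in the first 4 shots)
= P(3rd hit on or before the 4th shot) = P(Y ≤ 4) = P(Y = 3) + P(Y = 4) ≈ 0.0508

Note: the above are the numerical solutions obtained after proper
formulation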
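
A short sketch of how d) and e) might be evaluated with R's negative binomial functions is given below. Variable names are illustrative; the only subtlety is converting the trial count to a failure count (trials minus r) before calling dnbinom or pnbinom.

```r
# Y = number of shots needed for the 3rd hit, with P(hit) = 1/4.
# R's negative binomial functions take the number of FAILURES x = trials - r.
r <- 3
p <- 1/4

# d) Third hit on the 7th shot: 7 - 3 = 4 failures
p_d <- dnbinom(7 - r, size = r, prob = p)

# e) Third hit on or before the 4th shot: at most 4 - 3 = 1 failure
p_e <- pnbinom(4 - r, size = r, prob = p)

# Cross-checks: the closed-form p.m.f., and the equivalent binomial event
# "at least 3 hits in the first 4 shots"
p_d_formula  <- choose(6, 2) * p^3 * (1 - p)^4
p_e_binomial <- pbinom(2, 4, p, lower.tail = FALSE)

print(c(d = p_d, e = p_e))
```

The binomial cross-check in e) reflects the equivalence stated on the slide: having at least 3 hits in 4 shots is the same event as the 3rd hit arriving on or before the 4th shot.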

Additive property

• A negative binomial random variable is a sum of independent geometric random
variables with the same p.
• If X ~ Geometric(p) and Y ~ Geometric(p) are independent, then X+Y ~ NB(2,p).
• In general, if X ~ NB(r1,p) and Y ~ NB(r2,p) are independent, then X+Y ~
NB(r1+r2,p).

Note: p has to be the same, and independence is needed

Negative Binomial Distribution: Expectation and
Variance

• E(Y) = r/p
• V(Y) = r(1−p)/p^2
• Note: Can think of Y = Y1+ Y2 + … + Yr, where these are independent
Geometric(p).

• Example: Shots are being fired independently. P(hit) = 2/5 at each shot.
Expected number of attempts to hit 3 times? Variance?
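
For this example, plugging r = 3 and p = 2/5 into the two formulas above gives the following worked values (a sketch of the arithmetic only, with Y counting trials):

```latex
E(Y) = \frac{r}{p} = \frac{3}{2/5} = 7.5,
\qquad
V(Y) = \frac{r(1-p)}{p^{2}} = \frac{3 \times 3/5}{(2/5)^{2}} = \frac{9/5}{4/25} = 11.25
```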

Problem

• A walk-in recruitment drive, on a first-come-first-served basis, is going on for two
separate types of jobs, say Grade A and Grade B. For Grade B, there is one
vacancy, while for Grade A, there are two vacancies. Assume a candidate can
only apply for one type of job.
a) If the chance of any walk-in candidate getting the Grade A job is 0.2, and
getting the Grade B job is 0.1, and their performances are independent, then
what is the chance that a total of 5 candidates are interviewed to fill up all
posts?
$0.2^2 \times 0.9^2 \times 0.1 + \binom{2}{1} \times 0.8 \times 0.2^2 \times 0.9 \times 0.1 + \binom{3}{1} \times 0.8^2 \times 0.2^2 \times 0.1 = 0.01668$
b) How would you solve the problem if both probabilities were 0.2?
$\binom{4}{2} \times 0.8^2 \times 0.2^3 = 0.03072$
Safety Monitoring

• Companies that sell food and other consumer products constantly review customers’
feedback. Occasional complaints from dissatisfied customers always occur, but an
increase in the complaints may signal a serious problem that is better handled sooner
rather than later.
• Getting ahead of bad news and actively working to fix a problem can save a company’s
reputation. The company sells frozen, prepackaged dinners. It generally receives one or
two calls to its problem center each month from dissatisfied consumers.
• If the number of calls rises to six calls per month, then there’s a real problem. To be
specific, let’s say that the expected normal rate of calls is 1.5 per month and that the
anticipated rate of calls when there is a serious problem is 6 per month.

Safety Monitoring

• What type of random variable seems suited to modeling the number of calls
during a normal period? During a problem period? What assumptions are
necessary for the random variable chosen?

Poisson Distribution

Poisson Probability Distribution

• A Poisson-distributed random variable is often useful in estimating the
number of occurrences over a specified interval of time or space.
• Poisson does not have a given number of trials (n) as a binomial experiment
does.
• Occurrences are independent of other occurrences
• Examples of Poisson distributed random variables:
• The number of customers who use a new banking app in a day
• The number of spam emails received in a month
• The number of vehicles arriving at a toll booth in one hour
The Poisson Distribution

• It may be used as an approximation for a binomial Bin(n,p) random variable when
n is large and p is small such that np = λ is moderate.
• X = number of events occurring at rate λ: X ~ Poisson(λ)
• $P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$
• The mean is $E(X) = \lambda$
• The variance is $\mathrm{Var}(X) = \sigma^2 = E(X) = \lambda$
Safety Monitoring
• During normal months, would it be surprising to receive more than
3 calls to the problem center?
Let 𝑋 be a Poisson random variable to model the number of calls.
Now, during a normal period with average number of calls 𝜆 = 1.5 per
month
$P(X > 3) = 1 - P(X \le 3)$
$= 1 - [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)]$
$= 1 - \left[\frac{e^{-1.5}\lambda^{0}}{0!} + \frac{e^{-1.5}\lambda^{1}}{1!} + \frac{e^{-1.5}\lambda^{2}}{2!} + \frac{e^{-1.5}\lambda^{3}}{3!}\right]$
$= 0.0656$
There’s about a 6.6% chance of more than three calls (a rare event), so it
would be somewhat surprising (not impossible, but unlikely) to receive more
than three calls.
Safety Monitoring

What would be the chances of receiving more than 3 calls to the problem center
during problem months?
• During a problem month the average number of calls 𝜆 = 6 per month
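• $P(X > 3) = 1 - P(X \le 3)$
• $= 1 - [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)]$
• $= 1 - \left[\frac{e^{-6}\lambda^{0}}{0!} + \frac{e^{-6}\lambda^{1}}{1!} + \frac{e^{-6}\lambda^{2}}{2!} + \frac{e^{-6}\lambda^{3}}{3!}\right]$
• $= 0.849$
• There’s about an 85% chance of more than three calls during a problem month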
Safety Monitoring

The company seldom has problems. There is a 5% chance of a problem. If the
company receives more than 3 calls to the problem center in a month, then what
is the probability that there is, in fact, a problem?
• $P(\text{Problem} \mid X > 3) = \frac{P(X > 3 \mid \text{Problem})\, P(\text{Problem})}{P(X > 3)}$
• $= \frac{P(X > 3 \mid \text{Problem})\, P(\text{Problem})}{P(X > 3 \mid \text{Problem})\, P(\text{Problem}) + P(X > 3 \mid \text{Normal})\, P(\text{Normal})}$
• $= \frac{0.849 \times 0.05}{0.849 \times 0.05 + 0.0656 \times 0.95} = 0.405$
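
The three Safety Monitoring numbers (0.0656, 0.849 and 0.405) can be reproduced in a few lines of R. This is a minimal sketch with illustrative variable names, using ppois with lower.tail = FALSE for the upper tail P(X > 3).

```r
# Calls per month: Poisson with mean 1.5 (normal month) or 6 (problem month).
lambda_normal  <- 1.5
lambda_problem <- 6

# P(X > 3) = 1 - P(X <= 3) under each regime
p_gt3_normal  <- ppois(3, lambda_normal,  lower.tail = FALSE)
p_gt3_problem <- ppois(3, lambda_problem, lower.tail = FALSE)

# Bayes' rule with a 5% prior chance of a problem month
prior_problem <- 0.05
posterior_problem <- (p_gt3_problem * prior_problem) /
  (p_gt3_problem * prior_problem + p_gt3_normal * (1 - prior_problem))

print(round(c(normal = p_gt3_normal, problem = p_gt3_problem,
              posterior = posterior_problem), 4))
```

Even though more than three calls is about 13 times likelier in a problem month, the posterior stays near 40% because problem months are rare to begin with.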

Application Problems
An insurance company has called a consulting firm to determine if the company has an
unusually high number of false insurance claims. It is known that the industry proportion for
false claims is 3%. The consulting firm has decided to randomly and independently sample
100 of the company’s insurance claims. What type of probability distribution will the
consulting firm most likely employ to analyze the insurance claims in the following problem?

Binomial distribution
A footballer practices kicking extra point goals for 30 minutes during practice. The kicker
has a 75% probability of making the extra-point goals. Let X represent the number of goals
the kicker makes within the 30-minute practice. Assume the kicker has an equal chance of
making it within 30 minutes (the kicker does not tire toward the end) and that each kick is
independent of the other.

Poisson distribution
Obtaining Probabilities Using R

• Using R to Obtain Poisson Probabilities


• Probability mass function: dpois(x, μ)
• Cumulative distribution function: ppois(x, μ)
• where x is the number of successes over some interval and μ is the mean over this
interval

Additive property

• If X ~ Poisson(λ1) and Y ~ Poisson(λ2) are independent, then X+Y ~ Poisson(λ1+λ2).

Note: Independence is needed
Approximations to Binomial
Distribution

• When the number of trials n is large and the probability of success p is very
small, so that the mean λ = np is moderate, use the Poisson distribution as an
approximation to the binomial distribution. (If instead 1 − p is very small, apply
the same approximation to the number of failures.)
Example

• Suppose that the chance of a house catching fire during a year is 1/10,000.
Calculate the probability that during a particular year, exactly three houses
will catch fire in an area with 25,000 houses.

• Use (a) binomial distribution and (b) Poisson distribution
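
A minimal R sketch of the comparison is shown below, with n = 25,000 houses, p = 1/10,000 and hence λ = np = 2.5. Variable names are illustrative.

```r
# Exactly three fires among 25,000 houses, each with annual fire probability 1/10,000.
n <- 25000
p <- 1 / 10000

# (a) Exact binomial probability
p_binom <- dbinom(3, size = n, prob = p)

# (b) Poisson approximation with lambda = n * p = 2.5
lambda <- n * p
p_pois <- dpois(3, lambda)

print(c(binomial = p_binom, poisson = p_pois, difference = p_binom - p_pois))
```

With n this large and p this small, the two answers should agree to several decimal places (both are about 0.214).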

Problem

• A newly launched food delivery app receives orders, on average, at the rate of
2 orders per minute. Assume that the number of orders received in a minute
is distributed as a Poisson distribution, independent of the number of orders
received in any other period.

a) What is the probability of receiving no orders during a 2-minute period?

Let X be the number of orders during that 2-minute period. Then, X ~
Poisson(2×2 = 4), and hence, the required probability is
$P(X = 0) = \frac{e^{-4}\, 4^{0}}{0!} = e^{-4} = 0.0183$.
Problem cont’d…

b) Given no orders are received during a 2-minute period, what is the
probability of receiving at least 1 order during the next 2-minute period?

The numbers of orders during disjoint periods are independent of each other,
so if we denote the numbers of orders in the two 2-minute periods by X and Y
respectively, they are going to be independent.

Hence, the required probability is just P(Y ≥ 1), where Y ~ Poisson(4). This is
given by $P(Y \ge 1) = 1 - P(Y = 0) = 1 - e^{-4} = 0.9817$.
Problem cont’d…

c) Given 10 orders are received during a 5-minute period, what is the probability that no
orders are received during the first 2 minutes of that period, i.e., all 10 orders are
received during the last 3 minutes?
Let U be the number of orders received during the first 2 minutes, and V be the number of
orders received in the last 3 minutes. U and V are independent, U ~ Poisson(2×2 = 4) and V
~ Poisson(2×3 = 6).
U+V ~ Poisson(4+6) = Poisson(10) as sum of independent Poisson distributions is Poisson.
The required probability is $P(U = 0, V = 10 \mid U+V = 10) = \frac{P(U = 0, V = 10)}{P(U+V = 10)}$, as (U = 0, V = 10) is a
subset of the event where U+V = 10.
Using independence of U and V, the above is
$\frac{P(U = 0) \times P(V = 10)}{P(U+V = 10)} = \frac{e^{-4} \times e^{-6}\, \frac{6^{10}}{10!}}{e^{-10}\, \frac{10^{10}}{10!}} = \left(\frac{6}{10}\right)^{10} = 0.006$.
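
The ratio in part (c) can be evaluated directly with dpois. The sketch below also includes a binomial cross-check that is not on the slides: it relies on the standard result that, given U+V = 10 for independent Poisson counts, U is Bin(10, 4/10).

```r
# Part (c): U ~ Poisson(4) for the first 2 minutes, V ~ Poisson(6) for the last 3.
p_ratio <- dpois(0, 4) * dpois(10, 6) / dpois(10, 10)

# Closed form after the exponentials and factorials cancel
p_closed <- (6 / 10)^10

# Cross-check (standard result, not from the slides):
# given U + V = 10, U ~ Bin(10, 4 / (4 + 6))
p_binom <- dbinom(0, size = 10, prob = 4 / 10)

print(c(ratio = p_ratio, closed_form = p_closed, binomial = p_binom))
```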
Problem cont’d…

d) When an order is received, the order is either prepaid (probability 0.2), or pay-on-
delivery (probability 0.8). If that happens independently for all orders, what is the
probability of receiving exactly one prepaid and one pay-on-delivery order during a
1-minute period?
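
The slides do not show a solution for part (d). One possible approach, sketched here under the stated independence assumptions, is to note that the event requires exactly 2 orders in the minute, of which exactly 1 is prepaid. The cross-check uses the standard Poisson "thinning" result, which is an addition and not from the slides; variable names are illustrative.

```r
# Part (d): orders arrive at rate 2 per minute; each is prepaid with probability 0.2.
rate_per_min <- 2
p_prepaid    <- 0.2

# Exactly 2 orders in the minute, and exactly 1 of those 2 is prepaid
p_two_orders <- dpois(2, rate_per_min)
p_one_of_two <- dbinom(1, size = 2, prob = p_prepaid)
p_answer     <- p_two_orders * p_one_of_two

# Cross-check via thinning (standard result, not from the slides):
# prepaid ~ Poisson(2 * 0.2) and pay-on-delivery ~ Poisson(2 * 0.8), independent
p_thinning <- dpois(1, rate_per_min * p_prepaid) *
              dpois(1, rate_per_min * (1 - p_prepaid))

print(c(conditioning = p_answer, thinning = p_thinning))
```

Both routes reduce to 0.64 × e^(−2), roughly 0.087.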
Sampling without replacement

▪ Assume that a box contains 10 pins, of which 6 are good and the rest defective. An
operator picks 5 pins randomly from the 10 and is interested in the number of good
pins.
▪ Let X denote the number of good pins picked. This is the case of sampling without
replacement, and therefore, X is not a binomial random variable but follows a
hypergeometric distribution.
▪ The probability of picking a good pin is neither constant nor independent from trial
to trial.
▪ When a pool of size N contains K successes and (N - K) failures, and a random
sample of size n is drawn from the pool, the number of successes X in a sample
follows a hypergeometric distribution.

Hypergeometric distribution

• Let X be a random variable following the hypergeometric distribution; then
• X = no. of successes in the sample ~ Hypergeometric(N, K, n)

$P(X = x) = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}}$

▪ $E(X) = np$, where p = K/N
▪ $\mathrm{Var}(X) = np(1-p)\,\frac{N-n}{N-1}$
Hypergeometric Probabilities Using R

• Probability mass function: dhyper(x, K, N − K, n)


• Cumulative distribution function: phyper(x, K, N − K, n)
• where x is the number of successes in the sample, K is the number of successes in
the population, N − K is the number of failures in the population, and n is the
sample size.
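
As an illustration, the pins example from the sampling-without-replacement slide (N = 10 pins, K = 6 good, n = 5 drawn) can be tabulated as below. This is a sketch with illustrative variable names; it also checks the mean and variance formulas numerically against the p.m.f.

```r
# Pins example: N = 10 pins, K = 6 good, n = 5 drawn without replacement.
N <- 10
K <- 6
n <- 5

# p.m.f. of X = number of good pins in the sample, for x = 0, ..., 5
x   <- 0:n
pmf <- dhyper(x, K, N - K, n)

# Mean and variance computed from the p.m.f.
mean_x <- sum(x * pmf)
var_x  <- sum((x - mean_x)^2 * pmf)

# The same quantities from the closed-form expressions
p <- K / N
mean_formula <- n * p
var_formula  <- n * p * (1 - p) * (N - n) / (N - 1)

print(round(rbind(x = x, pmf = pmf), 4))
print(c(mean = mean_x, var = var_x))
```

Note that P(X = 0) is zero here: with only 4 defective pins, a sample of 5 must contain at least one good pin.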

Problem

• A purchaser of electrical components buys them in lots of size 10. It is
his policy to inspect 3 components randomly from a lot and to accept
the lot only if all 3 are non-defective.
• If 30 percent of the lots have 4 defective components and 70 percent
have only 1, what proportion of lots does the purchaser reject?
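
The slides leave this problem unsolved. One way to work it, sketched below with illustrative variable names, is to compute the acceptance probability for each type of lot with dhyper (zero defectives among the 3 inspected) and then average over the two lot types with the law of total probability.

```r
# Lots of 10 components; 3 are inspected; the lot is accepted only if
# the sample contains 0 defectives.

# P(accept | lot has 4 defectives): 0 of the 4 defectives appear in the sample
p_accept_4def <- dhyper(0, 4, 6, 3)   # choose(6, 3) / choose(10, 3)

# P(accept | lot has 1 defective)
p_accept_1def <- dhyper(0, 1, 9, 3)   # choose(9, 3) / choose(10, 3)

# Law of total probability over the two lot types (30% and 70%)
p_accept <- 0.3 * p_accept_4def + 0.7 * p_accept_1def
p_reject <- 1 - p_accept

print(c(accept_4def = p_accept_4def, accept_1def = p_accept_1def,
        reject = p_reject))
```

Under these assumptions the purchaser rejects 46% of lots.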

Summary for usage of common
discrete distributions

APPLICATION/SITUATION → DISTRIBUTION FUNCTION TO USE

• When interested in the number of successes/failures in draws made without
replacement → Use the hypergeometric distribution.
• When interested in the number of successes/failures in draws made with
replacement, or in repeated independent trials → Use the binomial distribution. For a
large number of trials with a small probability of success, approximate by Poisson.
• When interested in the number of trials needed for a set number of successes →
Use negative binomial or geometric.
• When interested in the number of events when the events occur at a constant
rate → Use Poisson.