
Random Variables and Probability Distribution


I. Random variables
- a function that assigns a real number to each element of the
sample space; random variables are denoted by capital letters
such as X, Y, or Z
- the use of random variables provides a convenient way
of expressing elements of a sample space as numbers
- Random variables can be discrete or continuous

A. Discrete random variables
- have a countable number of outcomes
Examples: dead/alive, treatment/placebo, the roll of a die,
counts, etc.
B. Continuous random variables
- have an infinite continuum of possible values
Examples: blood pressure, weight, the speed of a car,
and any real number in the interval from 1 to 6.

Exercise:

Security analysts are professionals who devote full-time efforts
to evaluating the investment worth of a narrow list of stocks. The
following variables are of interest to security analysts. Which are
discrete and which are continuous random variables?

- The closing price of a particular stock on the New York
Stock Exchange
- The number of shares of a particular stock that are
traded each business day
- The quarterly earnings of a particular firm
- The percentage change in earnings between last year
and this year for a particular firm
- The number of new products introduced per year by a
firm
- The time until a pharmaceutical company gains approval
from the U.S. Food and Drug Administration to market a
new drug

II. Probability Distribution

- A probability function maps each possible value x of a random
variable to its probability of occurrence, p(x).
- p(x) is a number from 0 to 1.
- The probabilities of all possible values sum to 1 (for a
continuous variable, the area under the probability function is 1).

Mean: $\mu = \sum_i x_i \, P(X = x_i)$

Variance: $\sigma^2 = \sum_i (x_i - \mu)^2 \, P(X = x_i)$
1. Example:

On average, what is the order of children born in the US?

Compute the variance.

2. Suppose you work for an insurance company, and you sell a
$10,000 one-year term insurance policy at an annual
premium of $290. Actuarial tables show that the probability
of death during the next year for a person of your customer's
age, sex, health, etc., is 0.001. What is the expected gain
(amount of money made by the company) for a policy of this
type?
Note: The experiment is to observe whether the
customer survives the upcoming year. The probabilities
associated with the two sample points, Live and Die,
are 0.999 and 0.001, respectively. The random variable you
are interested in is the gain x, which can assume the values
shown in the following table. If the customer lives, the
company gains the $290 premium as profit. If the customer
dies, the gain is negative because the company must pay
$10,000, for a net "gain" of $(290 - 10,000) = -$9,710. The
expected gain is, therefore:

Gain, x     Sample Point      Probability
$290        Customer lives    0.999
-$9,710     Customer dies     0.001

$\mu = \sum x \, p(x) = (290)(0.999) + (-9710)(0.001) = 280$

Thus, if the company were to sell a very large number of
one-year $10,000 policies to customers possessing the
characteristics previously described, it would (on
the average) net $280 per sale in the next year.
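
The expected-gain arithmetic can be checked with a short Python sketch. It applies the mean and variance definitions from this section to the two-row gain table; the function name `mean_and_variance` and the variable names are illustrative choices, not part of the original notes.

```python
# Gain distribution from the table above: (gain in dollars, probability).
gain_distribution = [
    (290, 0.999),    # customer lives: the company keeps the premium
    (-9710, 0.001),  # customer dies: premium minus the $10,000 payout
]


def mean_and_variance(distribution):
    """Return (mean, variance) for a list of (value, probability) pairs."""
    mu = sum(x * p for x, p in distribution)
    variance = sum((x - mu) ** 2 * p for x, p in distribution)
    return mu, variance


mu, variance = mean_and_variance(gain_distribution)
print(f"expected gain per policy: {mu:.2f}")
print(f"variance of the gain: {variance:.2f}")
```

The same function works for any discrete distribution given as value and probability pairs, provided the probabilities sum to 1.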

A. Discrete Probability Distribution

1. Binomial Distribution
- Applies when the outcomes of an experiment are binary
- Dichotomous (Bernoulli) outcome: X = 0 or 1
o P(X=1) = p
o P(X=0) = 1-p
Examples: Heads, Tails; True, False; Success, Failure

• Binomial experiment
- a sequence of n independent Bernoulli trials with a
constant probability of success p at each trial, in which
we are interested in the total number of successes x.
- A binomial experiment possesses the following
characteristics:
1. The experiment consists of n repeated trials
2. Each trial results in one of two mutually exclusive
outcomes that may be classified as either a
"success" or a "failure"
3. The probability of success in one trial, denoted by p,
remains constant from trial to trial
4. The repeated trials are independent

Examples:

i. You randomly select 3 bonds out of a possible
10 for an investment portfolio. Unknown to you,
8 of the 10 will maintain their present value, and
the other 2 will lose value due to a change in
their ratings. Let x be the number of the 3 bonds
you select that lose value.
Note: In checking the binomial
characteristics listed above, a problem arises with
both characteristic 3 (probabilities remaining the
same from trial to trial) and characteristic 4
(independence). The probability that the first
bond you pick loses value is clearly 2/10. Now
suppose the first bond you picked was 1 of the 2
that will lose value. This reduces the chance
that the second bond you pick will lose value to
1/9 because now only 1 of the 9 remaining
bonds is in that category. Thus, the choices
you make are dependent, and therefore x, the
number of the 3 bonds you select that lose value,
is not a binomial random variable.
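
To see how much the dependence matters in the bond example, the sketch below enumerates every equally likely selection of 3 bonds from 10 and compares the exact distribution of x with what the binomial formula would give if the picks were independent with p = 2/10. This is only an illustration; the variable names are made up for this sketch.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Label the 10 bonds: True = will lose value (2 bonds), False = keeps value (8 bonds).
bonds = [True] * 2 + [False] * 8

# Every way to choose 3 of the 10 bonds without replacement is equally likely.
selections = list(combinations(range(10), 3))

# Exact distribution of x = number of selected bonds that lose value.
exact = {}
for k in range(4):
    hits = sum(1 for s in selections if sum(bonds[i] for i in s) == k)
    exact[k] = Fraction(hits, len(selections))

# What the binomial formula would give IF the trials were independent with p = 2/10.
p = Fraction(2, 10)
binomial = {k: comb(3, k) * p**k * (1 - p) ** (3 - k) for k in range(4)}

for k in range(4):
    print(f"x = {k}: exact {exact[k]}, binomial {binomial[k]}")
```

The binomial formula assigns a positive probability to x = 3, which is impossible here because only 2 of the 10 bonds lose value.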

ii. Before marketing a new product on a large
scale, many companies will conduct a consumer
preference survey to determine whether the
product is likely to be successful. Suppose a
company develops a new diet soda and then
conducts a taste preference survey in which 100
randomly chosen consumers are asked to state their
preferences among the new soda and the two
leading sellers. Let x be the number of the 100
who choose the new brand over the two others.
Note: Surveys that produce
dichotomous responses and use random
sampling techniques are classic examples of
binomial experiments. In our example, each
randomly selected consumer either states a
preference for the new diet soda or does not.
The sample of 100 consumers is a very small
proportion of the totality of potential consumers,
so the response of one would be, for all
practical purposes, independent of another.
Thus, x is a binomial random variable.

iii. Suppose a television cable company plans to
conduct a survey to determine the fraction of
households in the city that would use the cable
television service. The sampling method is to
choose a city block at random and then survey
every household on that block. This sampling
technique is called cluster sampling. Suppose
10 blocks are so sampled, producing a total of
124 household responses. Let x be the number
of the 124 households that would use the
television cable service.
Note: This example is a survey with
dichotomous responses (Yes or No to the cable
service), but the sampling method is not simple
random sampling. Again, the binomial
characteristic of independent trials would
probably not be satisfied. The responses of
households within a particular block would be
dependent because the households within a
block tend to be similar with respect to income,
level of education, and general interests. Thus,
the binomial model would not be satisfactory for
x if the cluster sampling technique were
employed.

Calculating probabilities:

$P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \ldots, n$

Where n = number of trials
p = probability of success
q = probability of failure (q = 1 - p)
x = number of successes
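
A minimal Python sketch of this calculation, using the n, p, q, and x defined above and the standard binomial formula C(n, x) p^x q^(n - x). The function name `binomial_pmf` and the values n = 3, p = 0.5 are assumptions chosen only for illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    q = 1 - p  # probability of failure
    return comb(n, x) * p**x * q ** (n - x)


# Hypothetical illustration: n = 3 trials with p = 0.5 (e.g. three fair coin tosses).
probabilities = [binomial_pmf(x, 3, 0.5) for x in range(4)]
for x, prob in enumerate(probabilities):
    print(f"P(X = {x}) = {prob}")
```

Summing `binomial_pmf` over a range of x values gives cumulative probabilities such as P(X < k).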

iv. Does participation in youth and/or high school
sports lead to greater wealth later in life? This
was the subject of a recent Harris Poll (March
2015). The poll found that 15% of adults who
participated in sports now have an income
greater than $100,000. In comparison, 9% of
adults who did not participate in sports have an
income greater than $100,000. Consider a
random sample of 25 adults, all of whom have
participated in youth and/or high school sports.
a. What is the probability that fewer than
20 of these adults have an income
greater than $100,000?
b. What is the probability that between 10
and 20 of these adults have an income
greater than $100,000?
c. Repeat parts a and b, but assume that
none of the 25 sampled adults
participated in youth and/or high school
sports.
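
One way to work this exercise numerically is sketched below. It treats x as binomial with n = 25 and p = 0.15 (or p = 0.09 for the last part), and it assumes "between 10 and 20" includes both endpoints; the notes do not say which reading is intended. The helper names are illustrative.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_range(low, high, n, p):
    """P(low <= X <= high), with both endpoints included."""
    return sum(binomial_pmf(x, n, p) for x in range(low, high + 1))


n = 25  # number of sampled adults
results = {}
for group, p in [("participated", 0.15), ("did not participate", 0.09)]:
    fewer_than_20 = binomial_range(0, 19, n, p)       # P(X < 20) = P(X <= 19)
    between_10_and_20 = binomial_range(10, 20, n, p)  # assumption: 10 and 20 included
    results[group] = (fewer_than_20, between_10_and_20)
    print(f"{group}: P(X < 20) = {fewer_than_20:.6f}, "
          f"P(10 <= X <= 20) = {between_10_and_20:.6f}")
```

If the exclusive reading is intended instead, change the second call to `binomial_range(11, 19, n, p)`.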

The mean, variance, and standard deviation of a binomially distributed
random variable are given by

$\mu = np, \quad \sigma^2 = npq, \quad \sigma = \sqrt{npq}$

Sample Problem:

According to the Internal Revenue Service (IRS), the chances of your tax
return being audited are about 1 in 100 if your income is less than $1
million and 9 in 100 if your income is $1 million or more (IRS
Enforcement and Services Statistics).

Assume that 15 taxpayers with incomes of $1 million or more are
randomly selected.
• How many of them are expected to be audited?
• What is the standard deviation of the number of taxpayers being
audited?
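
A short sketch of the calculation, assuming the number of audited taxpayers is binomial with n = 15 and p = 0.09, and using the standard binomial results mean = np and standard deviation = sqrt(npq). Variable names are illustrative.

```python
from math import sqrt

n = 15    # taxpayers sampled, all with incomes of $1 million or more
p = 0.09  # audit probability for that income group (9 in 100)
q = 1 - p

expected_audits = n * p          # mean of a binomial: np
std_dev_audits = sqrt(n * p * q)  # standard deviation of a binomial: sqrt(npq)

print(f"expected number audited: {expected_audits:.2f}")
print(f"standard deviation: {std_dev_audits:.4f}")
```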

2. Poisson Distribution
- The Poisson probability distribution provides a good
model for the probability distribution of the number of
“rare events” that occur randomly in time, distance, or
space.
- Assume that an interval is divided into a very large
number of subintervals so that the probability of the
occurrence of an event in any subinterval is very small.
- Assumptions of a Poisson probability distribution:
• The probability of an occurrence of an event is
constant for all subintervals, and events occur
independently;
• You are counting the number of times a particular
event occurs in a unit; and
• As the unit gets smaller, the probability that two or
more events will occur in that unit approaches zero.

The random variable X is said to follow the Poisson probability distribution
if it has the probability function:

$P(x) = \dfrac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$

Where:
P(x) = the probability of x successes over a given period of
time or space, given λ
λ = the expected number of successes per time or space
unit; λ > 0
e ≈ 2.71828 (the base for natural logarithms)

The mean and variance of the Poisson probability distribution are both equal to λ:

$\mu = \lambda, \quad \sigma^2 = \lambda$

Sample Problem:

The Federal Deposit Insurance Corporation (FDIC)
normally insures deposits of up to $100,000 in banks that are
members of the Federal Reserve System against losses due to
bank failure or theft. Over the last 10 years, the average number
of bank failures per year among insured banks was 52 (FDIC
Failed Bank List, 2016). Assume that x, the number of bank
failures per year among insured banks, can be adequately
characterized by a Poisson probability distribution with a mean
of 52.

Find the expected value and standard deviation of x.

What is the probability that there will be more than 4
bank failures?
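
The sample problem can be sketched in Python as follows. It uses the standard Poisson probability function, λ^x e^(-λ) / x!, with λ = 52, takes the mean and variance both equal to λ, and assumes "more than 4 bank failures" refers to failures in one year. The function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial, sqrt


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam**x * exp(-lam) / factorial(x)


lam = 52  # average number of bank failures per year

expected_value = lam   # Poisson mean equals lambda
std_dev = sqrt(lam)    # Poisson variance equals lambda, so sd = sqrt(lambda)

# P(X > 4) = 1 - P(X <= 4), summing the probabilities for x = 0, 1, 2, 3, 4.
p_more_than_4 = 1 - sum(poisson_pmf(x, lam) for x in range(5))

print(f"expected value: {expected_value}")
print(f"standard deviation: {std_dev:.4f}")
print(f"P(X > 4): {p_more_than_4:.10f}")
```

With a mean as large as 52, four or fewer failures in a year is extremely unlikely, so the computed probability is very close to 1.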
B. Continuous Distribution
- The probability function that accompanies a continuous
random variable is a continuous mathematical function
that integrates to 1.

- Area of the shaded region:

1. Normal Distribution
- The normal probability distribution is symmetrical about
its mean.
- Thus, half the area under the curve is above the mean
and half is below it.
- The mean, median, and mode are all equal

- The tails are asymptotic to the horizontal axis.
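
The symmetry properties listed above can be checked numerically with Python's standard-library `statistics.NormalDist`. The mean of 100 and standard deviation of 15 are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration.
dist = NormalDist(mu=100, sigma=15)

# Half the area under the curve lies below the mean.
area_below_mean = dist.cdf(100)

# Symmetry: the area more than 20 units below the mean equals
# the area more than 20 units above it.
left_tail = dist.cdf(100 - 20)
right_tail = 1 - dist.cdf(100 + 20)

print(f"area below the mean: {area_below_mean}")
print(f"left tail: {left_tail:.6f}, right tail: {right_tail:.6f}")
print(f"mean, median, mode: {dist.mean}, {dist.median}, {dist.mode}")
```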
• Areas under the Curve:
2. Standard Normal Distribution
