Random variables and univariate probability distributions
Prerequisites and complementary topics: Probability and probability measures.
Other learning materials: Solved exercises, Fundamentals of probability theory -
Multiple choice test 2.
Random variable - Definition
In this lecture we define the concept of random variable. We start with an informal
definition of random variable and then give a more precise definition.
Definition (informal). A random variable X is a (real) number with the following
characteristics:
the value of X is unknown;
the possible values that X can take are known; the set of such values is called
the support of X and is denoted by R_X;
if A is a subset of the set of real numbers R (i.e. A ⊆ R), we know the
probability that X belongs to A (denoted by P(X ∈ A)).
If (and when) the value x of X becomes known, i.e. we know that X = x, we say that x is
the realization of X. Hence, the support R_X is the set of all possible realizations of X.
To give a more formal definition of random variable, we need to use the notion of a
sample space introduced in previous lectures:
Definition (formal). Let Ω be a sample space and let P(E) be a probability measure
defined on the events E ⊆ Ω. A random variable X is a function from the sample space
Ω to the set of real numbers R:
X : Ω → R
In rigorous (measure-theoretic) probability theory, the function X is also required to be
measurable (see the section on a more rigorous definition of random variable below).
The intuition is the following: we conduct a random experiment; the possible outcomes
of the random experiment are the sample points ω ∈ Ω; to each sample point ω is
associated a real number X(ω); when the outcome of the experiment is ω, the
realization of the random variable is x = X(ω).
Some remarks on notation are in order:
1. The dependence of X(ω) on ω is often omitted, i.e. we
simply write X instead of X(ω).
2. Let A be a subset of the set of real numbers (i.e.
A ⊆ R). The exact meaning of the notation P(X ∈ A) is the
following:
P(X ∈ A) = P({ω ∈ Ω : X(ω) ∈ A})
3. Let A be a subset of the set of real numbers (i.e.
A ⊆ R). Sometimes we use the notation P_X(A) with the
following meaning:
P_X(A) = P(X ∈ A)
In this case, P_X
is to be interpreted as a probability measure on the set of real
numbers, induced by the random variable X. Often,
statisticians construct probabilistic models where a random
variable X is defined by directly specifying P_X, without
specifying the sample space Ω.
Example. Suppose that we flip a coin. The possible outcomes are either tail (T) or
head (H), i.e.:
Ω = {T, H}
The two outcomes are assigned equal probabilities:
P({T}) = P({H}) = 1/2
If tail (T) is the outcome, we win one dollar; if head (H) is the
outcome, we lose one dollar. The amount X that we win (or lose) is a random variable,
defined as follows:
X(T) = 1, X(H) = -1
The probability of winning one dollar is:
P(X = 1) = P({T}) = 1/2
The probability of losing one dollar is:
P(X = -1) = P({H}) = 1/2
The probability of losing two dollars is:
P(X = -2) = P(∅) = 0
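The coin-flip example can be sketched in a few lines of Python (a hypothetical illustration, not part of the lecture): the random variable is literally a function from the sample space to the reals, and the probability of any event about X is obtained by summing the measure of the sample points that X maps into the event.

```python
# Sketch of the coin-flip random variable as a function from the
# sample space to the reals. All names here are illustrative.

# Sample space with its probability measure: P({T}) = P({H}) = 1/2.
omega = {"T": 0.5, "H": 0.5}

def X(outcome):
    """The random variable: win 1 dollar on tails, lose 1 on heads."""
    return 1 if outcome == "T" else -1

def prob(values):
    """P(X in values): sum P over sample points w with X(w) in values."""
    return sum(p for w, p in omega.items() if X(w) in values)

print(prob({1}))    # P(X = 1)  = 0.5
print(prob({-1}))   # P(X = -1) = 0.5
print(prob({-2}))   # P(X = -2) = 0, since no sample point is mapped to -2
```

Note that P(X = -2) is zero because the corresponding event {ω ∈ Ω : X(ω) = -2} is the empty set.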
Most of the time, statisticians deal with two special kinds of random variables:
1. discrete random variables;
2. absolutely continuous random variables.
We define these two kinds of random variables below.
Discrete random variables
Discrete random variables are defined as follows:
Definition. A random variable X is discrete if:
1. its support R_X is a countable set;
2. there is a function p_X : R → [0, 1], called the
probability mass function (or pmf or probability
function) of X, such that, for any x ∈ R:
p_X(x) = P(X = x)
The following is an example of a discrete random variable:
Example. A Bernoulli random variable X is an example of a discrete random variable. It
can take only two values: 1 with probability p and 0 with probability 1 - p, where
0 ≤ p ≤ 1. Its support is R_X = {0, 1}. Its probability mass function is:
p_X(x) = p if x = 1
p_X(x) = 1 - p if x = 0
p_X(x) = 0 otherwise
The properties of probability mass functions are discussed in more detail in the lecture
entitled Legitimate probability mass functions. We anticipate here that probability mass
functions are characterized by two fundamental properties:
1. Non-negativity: p_X(x) ≥ 0 for any x ∈ R;
2. Sum over the support equals 1: ∑_{x ∈ R_X} p_X(x) = 1.
It turns out not only that any probability mass function must satisfy these two properties,
but also that any function satisfying these two properties is a legitimate probability mass
function. You can find a detailed discussion of this fact in the aforementioned lecture.
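The two properties can be checked numerically for the Bernoulli pmf above. The following sketch uses an arbitrarily chosen parameter value p = 0.3 for illustration:

```python
def bernoulli_pmf(x, p=0.3):
    """Pmf of a Bernoulli random variable; p = 0.3 is an arbitrary choice."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0  # the pmf is zero outside the support {0, 1}

support = [0, 1]

# Property 1: non-negativity (spot-checked at a few points of R).
assert all(bernoulli_pmf(x) >= 0 for x in [-1, 0, 0.5, 1, 2])

# Property 2: the pmf sums to 1 over the support.
assert abs(sum(bernoulli_pmf(x) for x in support) - 1.0) < 1e-12
```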
Absolutely continuous random variables
Absolutely continuous random variables are defined as follows.
Definition. A random variable X is absolutely continuous if:
1. its support R_X is a set with the power of the
continuum;
2. there is a function f_X : R → [0, ∞), called the
probability density function (or pdf or density
function) of X, such that, for any interval [a, b] ⊆ R:
P(X ∈ [a, b]) = ∫_a^b f_X(x) dx
Absolutely continuous random variables are often called continuous random
variables, omitting the adverb absolutely.
The following is an example of an absolutely continuous random variable:
Example. A uniform random variable X (on the interval [0, 1]) is an example of an
absolutely continuous random variable. It can take any value in the interval [0, 1]. All
sub-intervals of equal length are equally likely. Its support is R_X = [0, 1]. Its probability
density function is:
f_X(x) = 1 if x ∈ [0, 1]
f_X(x) = 0 otherwise
The probability that the realization of X
belongs, for example, to a sub-interval [a, b] ⊆ [0, 1] is:
P(X ∈ [a, b]) = ∫_a^b 1 dx = b - a
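As a numerical illustration (a sketch, not part of the lecture), a midpoint Riemann sum approximation of the integral of the uniform density recovers the length of a sub-interval, and the integral over the whole support equals 1:

```python
def uniform_pdf(x):
    """Density of the uniform random variable on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def integrate(f, a, b, n=10_000):
    """Midpoint Riemann sum approximation of the integral of f on [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# P(X in [0.25, 0.5]): for the uniform on [0, 1] this is 0.5 - 0.25.
print(integrate(uniform_pdf, 0.25, 0.5))

# Integral over the whole support: should be (approximately) 1.
print(integrate(uniform_pdf, 0.0, 1.0))
```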
The properties of probability density functions are discussed in more detail in the lecture
entitled Legitimate probability density functions. We anticipate here that probability
density functions are characterized by two fundamental properties:
1. Non-negativity: f_X(x) ≥ 0 for any x ∈ R;
2. Integral over R equals 1: ∫_{-∞}^{∞} f_X(x) dx = 1.
It turns out not only that any probability density function must satisfy these two
properties, but also that any function satisfying these two properties is a legitimate
probability density function. You can find a detailed discussion of this fact in the
aforementioned lecture.
Random variables in general
Random variables, including those that are neither discrete nor absolutely continuous, are
often characterized in terms of their distribution function:
Definition. Let X be a random variable. The distribution function (or cumulative
distribution function or cdf) of X is a function F_X : R → [0, 1] such that:
F_X(x) = P(X ≤ x)
If we know the distribution function of a random variable X, then we can easily compute
the probability that X belongs to an interval (a, b] as:
P(a < X ≤ b) = F_X(b) - F_X(a)
Proof
Note that:
{X ≤ b} = {X ≤ a} ∪ {a < X ≤ b}
where the two sets on the right-hand side are
disjoint. Hence, by additivity:
F_X(b) = P(X ≤ b) = P(X ≤ a) + P(a < X ≤ b) = F_X(a) + P(a < X ≤ b)
Rearranging terms:
P(a < X ≤ b) = F_X(b) - F_X(a)
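The interval-probability formula works for any cdf. The following sketch applies it to the cdf of the uniform random variable on [0, 1], used here purely as an assumed concrete example:

```python
def uniform_cdf(x):
    """Cdf of the uniform random variable on [0, 1]: F(x) = P(X <= x)."""
    return min(max(x, 0.0), 1.0)

def interval_prob(F, a, b):
    """P(a < X <= b) = F(b) - F(a), valid for any cdf F."""
    return F(b) - F(a)

print(interval_prob(uniform_cdf, 0.25, 0.5))   # F(0.5) - F(0.25) = 0.25
print(interval_prob(uniform_cdf, -3.0, 0.5))   # F(-3) = 0, so this equals F(0.5) = 0.5
```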
Random variables - More details
In the following subsections you can find more details on random variables and
univariate probability distributions:
Absolutely continuous random variables - Derivative of the distribution function
Note that, if X is absolutely continuous, then:
F_X(x) = ∫_{-∞}^x f_X(t) dt
Hence, by taking the
derivative with respect to x of both sides of the above equation, we obtain:
d F_X(x) / dx = f_X(x)
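The derivative relationship can be checked numerically with a central difference. This sketch again uses the uniform cdf on [0, 1] as an assumed example; at an interior point of the support, the numerical derivative recovers the density value 1:

```python
def uniform_cdf(x):
    """Cdf of the uniform random variable on [0, 1]."""
    return min(max(x, 0.0), 1.0)

def num_deriv(F, x, h=1e-6):
    """Central-difference approximation of dF/dx at x."""
    return (F(x + h) - F(x - h)) / (2 * h)

# Inside the support the derivative recovers the density f(x) = 1;
# far outside the support the cdf is flat, so the derivative is 0.
print(num_deriv(uniform_cdf, 0.5))   # approximately 1.0
print(num_deriv(uniform_cdf, 2.0))   # 0.0
```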
Absolutely continuous random variables and zero-probability events
Note that, if X is an absolutely continuous random variable, the probability that X takes
on any specific value x is equal to zero:
P(X = x) = P(X ∈ [x, x]) = ∫_x^x f_X(t) dt = 0
Thus, the event
{X = x} is a zero-probability event for any x. The lecture entitled Zero-
probability events contains a thorough discussion of this apparently paradoxical fact:
although it can happen that X = x, the event {X = x} has zero probability of
happening.
A more rigorous definition of random variable
Random variables can be defined in a more rigorous manner using the terminology of
measure theory. Let (Ω, F, P) be a probability space. Let X be a function X : Ω → R. Let
B(R) be the Borel σ-algebra of R. If, for any B ∈ B(R),
X⁻¹(B) = {ω ∈ Ω : X(ω) ∈ B} ∈ F
then X
is a random variable on (Ω, F, P). As a consequence, if X satisfies the above property, then
for any B ∈ B(R), P(X ∈ B) can be defined as follows:
P(X ∈ B) = P(X⁻¹(B)) = P({ω ∈ Ω : X(ω) ∈ B})
where the probability on the right-hand side is
well-defined because the set {ω ∈ Ω : X(ω) ∈ B} is measurable by the very definition of
random variable.