Probability Distribution Basics
In probability theory and statistics, a probability distribution is the mathematical function that gives the
probabilities of occurrence of different possible outcomes for an experiment.[1][2] It is a mathematical
description of a random phenomenon in terms of its sample space and the probabilities of events (subsets of
the sample space).[3]
For instance, if X is used to denote the outcome of a coin toss ("the experiment"), then the probability
distribution of X would take the value 0.5 (1 in 2 or 1/2) for X = heads, and 0.5 for X = tails (assuming
that the coin is fair). More commonly, probability distributions are used to compare the relative occurrence
of many different random values.
Probability distributions can be defined in different ways and for discrete or for continuous variables.
Distributions with special properties or for especially important applications are given specific names.
Introduction
A probability distribution is a mathematical description of the probabilities of events, subsets of the sample
space. The sample space, often denoted by Ω, is the set of all possible outcomes of a random phenomenon
being observed; it may be any set: a set of real numbers, a set of vectors, a set of arbitrary non-numerical
values, etc. For example, the sample space of a coin flip would be Ω = {heads, tails}.
To define probability distributions for the specific case of random variables (so the sample space can be
seen as a numeric set), it is common to distinguish between discrete and absolutely continuous random
variables. In the discrete case, it is sufficient to specify a probability mass function assigning a probability
to each possible outcome: for example, when throwing a fair die, each of the six values 1 to 6 has the
probability 1/6. The probability of an event is then defined to be the sum of the probabilities of the
outcomes that satisfy the event; for example, the probability of the event "the die rolls an even value" is
$p(2) + p(4) + p(6) = 1/6 + 1/6 + 1/6 = 1/2$.
In contrast, when a random variable takes values from a continuum then typically, any individual outcome
has probability zero and only events that include infinitely many outcomes, such as intervals, can have
positive probability. For example, consider measuring the weight of a piece of ham in the supermarket, and
assume the scale has many digits of precision. The probability that it weighs exactly 500 g is zero, as it will
most likely have some non-zero decimal digits. Nevertheless, one might demand, in quality control, that a
package of "500 g" of ham must weigh between 490 g and 510 g with at least 98% probability, and this
demand is less sensitive to the accuracy of measurement instruments.
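The contrast between the two cases can be sketched in a few lines of Python. The die part follows the example above directly. The ham part is only an illustration: the normal model and its parameters (mean 500 g, standard deviation 4 g) are assumptions made here, not values taken from the text.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete case: probability mass function of a fair six-sided die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def event_probability(pmf, event):
    """Sum the probabilities of the outcomes that satisfy the event."""
    return sum((p for outcome, p in pmf.items() if event(outcome)), Fraction(0))

p_even = event_probability(die_pmf, lambda face: face % 2 == 0)

# Continuous case: ASSUMED model for the package weight in grams.
# The normal shape, the mean of 500 and the standard deviation of 4
# are illustrative choices only.
weight = NormalDist(mu=500.0, sigma=4.0)

# An interval has positive probability: difference of two CDF values.
p_in_tolerance = weight.cdf(510.0) - weight.cdf(490.0)

print(p_even)           # probability that the die shows an even value
print(p_in_tolerance)   # probability that 490 g <= weight <= 510 g
```

Under these assumed parameters the interval probability exceeds the 98% requirement, while the probability of any single exact weight remains zero.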
Absolutely continuous probability distributions can be described in several ways. The probability density
function describes the infinitesimal probability of any given value, and the probability that the outcome lies
in a given interval can be computed by integrating the probability density function over that interval.[4] An
alternative description of the distribution is by means of the cumulative distribution function, which
describes the probability that the random variable is no larger than a given value (i.e., $P(X \le x)$ for
some $x$). The cumulative distribution function is the area under the probability density function from
$-\infty$ to $x$, as described by the accompanying figure.[5]
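The relation between the density and the cumulative distribution function can be checked numerically. The sketch below is an illustration only: it picks the standard normal density as an example, approximates the area under it with the trapezoidal rule, and uses a finite lower limit of -10 as a stand-in for minus infinity. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def standard_normal_pdf(x):
    """Density of the standard normal distribution (example choice)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf_by_integration(pdf, x, lower=-10.0, steps=20_000):
    """Approximate the area under `pdf` from `lower` to `x` (trapezoidal rule).

    `lower` stands in for minus infinity; this is an assumption that is
    reasonable only when the density is negligible below `lower`.
    """
    if x <= lower:
        return 0.0
    h = (x - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(x))
    for i in range(1, steps):
        total += pdf(lower + i * h)
    return total * h

approx = cdf_by_integration(standard_normal_pdf, 1.0)
exact = NormalDist().cdf(1.0)  # library value of the CDF for comparison

# Probability of an interval: area under the density between its endpoints.
p_interval = cdf_by_integration(standard_normal_pdf, 1.0) - cdf_by_integration(
    standard_normal_pdf, -1.0
)
print(approx, exact, p_interval)
```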
[Figure: The left graph shows a probability density function. The right graph shows the cumulative
distribution function, for which the value at a equals the area under the probability density curve to the
left of a.]
General probability definition
A probability distribution can be described in various forms, such as by a probability mass function or a
cumulative distribution function. One of the most general descriptions, which applies for absolutely
continuous and discrete variables, is by means of a probability function $P\colon \mathcal{A} \to \mathbb{R}$ whose input space $\mathcal{A}$
is a σ-algebra, and gives a real number probability as its output, particularly, a number in $[0, 1] \subseteq \mathbb{R}$.
The probability function $P$ can take as argument subsets of the sample space itself, as in the coin toss
example, where the function $P$ was defined so that P(heads) = 0.5 and P(tails) = 0.5. However,
because of the widespread use of random variables, which transform the sample space into a set of numbers
(e.g., $\mathbb{R}$, $\mathbb{N}$), it is more common to study probability distributions whose arguments are subsets of these
particular kinds of sets (number sets),[6] and all probability distributions discussed in this article are of this
type. It is common to denote as $P(X \in E)$ the probability that a certain value of the variable $X$ belongs to
a certain event $E$.[7][8]
The above probability function only characterizes a probability distribution if it satisfies all the Kolmogorov
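axioms, that is:
1. $P(X \in E) \ge 0$ for every measurable event $E$, so the probability is non-negative;
2. the event containing every possible value has probability 1, so no probability exceeds 1; and
3. $P(X \in \bigcup_i E_i) = \sum_i P(X \in E_i)$ for any countable family of pairwise disjoint events $\{E_i\}$.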
The concept of probability function is made more rigorous by defining it as the element of a probability
space $(\mathcal{X}, \mathcal{A}, P)$, where $\mathcal{X}$ is the set of possible outcomes, $\mathcal{A}$ is the set of all subsets $E \subset \mathcal{X}$ whose
probability can be measured, and $P$ is the probability function, or probability measure, that assigns a
probability to each of these measurable subsets $E \in \mathcal{A}$.[9]
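For a finite set of outcomes, the Kolmogorov axioms (non-negativity, total probability one, and additivity over disjoint events) can be verified exhaustively. The following sketch assumes a fair six-sided die and takes every subset of the outcomes as a measurable event. The names `point_mass` and `P` are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# ASSUMED example: a fair six-sided die with all subsets measurable.
outcomes = tuple(range(1, 7))
point_mass = {o: Fraction(1, 6) for o in outcomes}

def P(event):
    """Probability measure: sum of the point masses of the outcomes in the event."""
    return sum((point_mass[o] for o in event), Fraction(0))

# The sigma-algebra here is the full power set (2**6 = 64 events).
events = [
    frozenset(c)
    for r in range(len(outcomes) + 1)
    for c in combinations(outcomes, r)
]

non_negative = all(P(e) >= 0 for e in events)
normalized = P(frozenset(outcomes)) == 1
# On a finite space, countable additivity reduces to additivity for
# pairs of disjoint events.
additive = all(
    P(a | b) == P(a) + P(b)
    for a in events
    for b in events
    if not (a & b)
)
print(non_negative, normalized, additive)
```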
Probability distributions usually belong to one of two classes. A discrete probability distribution is
applicable to the scenarios where the set of possible outcomes is discrete (e.g. a coin toss, a roll of a die)
and the probabilities are encoded by a discrete list of the probabilities of the outcomes; in this case the
discrete probability distribution is described by a probability mass function. On the other hand, absolutely
continuous probability distributions are applicable to scenarios where the set of possible outcomes can
take on values in a continuous range (e.g. real numbers), such as the temperature on a given day. In the
absolutely continuous case, probabilities are described by a probability density function, and the probability
distribution is by definition the integral of the probability density function.[7][4][8] The normal distribution is
a commonly encountered absolutely continuous probability distribution. More complex experiments, such
as those involving stochastic processes defined in continuous time, may demand the use of more general
probability measures.
A probability distribution whose sample space is one-dimensional (for example real numbers, list of labels,
ordered labels or binary) is called univariate, while a distribution whose sample space is a vector space of
dimension 2 or more is called multivariate. A univariate distribution gives the probabilities of a single
random variable taking on various different values; a multivariate distribution (a joint probability
distribution) gives the probabilities of a random vector – a list of two or more random variables – taking on
various combinations of values. Important and commonly encountered univariate probability distributions
include the binomial distribution, the hypergeometric distribution, and the normal distribution. A commonly
encountered multivariate distribution is the multivariate normal distribution.
Besides the probability function, the cumulative distribution function, the probability mass function and the
probability density function, the moment generating function and the characteristic function also serve to
identify a probability distribution, as they uniquely determine an underlying cumulative distribution
function.[10]
Terminology
Some key concepts and terms, widely used in the literature on
the topic of probability distributions, are listed below.[1]
Cumulative distribution function: function evaluating the probability that $X$ will take a value
less than or equal to $x$ for a random variable (only for real-valued random variables).
Quantile function: the inverse of the cumulative distribution function. Gives $x$ such that, with
probability $q$, $X$ will not exceed $x$.
Related terms
Support: set of values that can be assumed with non-zero probability by the random
variable. For a random variable $X$, it is sometimes denoted as $R_X$.
Tail:[12] the regions close to the bounds of the random variable, if the pmf or pdf is relatively
low therein. Usually has the form $X > a$, $X < b$, or a union thereof.
Head:[12] the region where the pmf or pdf is relatively high. Usually has the form $a < X < b$.
Expected value or mean: the weighted average of the possible values, using their
probabilities as their weights; or the continuous analog thereof.
Median: the value such that the set of values less than the median, and the set greater than
the median, each have probabilities no greater than one-half.
Mode: for a discrete random variable, the value with highest probability; for an absolutely
continuous random variable, a location at which the probability density function has a local
peak.
Quantile: the q-quantile is the value $x$ such that $P(X < x) = q$.
Variance: the second moment of the pmf or pdf about the mean; an important measure of the
dispersion of the distribution.
Standard deviation: the square root of the variance, and hence another measure of
dispersion.
Symmetry: a property of some distributions in which the portion of the distribution to the left
of a specific value (usually the median) is a mirror image of the portion to its right.
Skewness: a measure of the extent to which a pmf or pdf "leans" to one side of its mean. The
third standardized moment of the distribution.
Kurtosis: a measure of the "fatness" of the tails of a pmf or pdf. The fourth standardized
moment of the distribution.
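Several of the terms listed above can be computed directly from a probability mass function. The sketch below uses a fair six-sided die as an assumed example and exact rational arithmetic where possible. The kurtosis shown is the plain fourth standardized moment, not the "excess" kurtosis used by some libraries.

```python
from fractions import Fraction

# ASSUMED example distribution: a fair six-sided die.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def central_moment(pmf, k, mean):
    """k-th moment of the pmf about the mean."""
    return sum(((x - mean) ** k * p for x, p in pmf.items()), Fraction(0))

# Expected value: average of the values weighted by their probabilities.
mean = sum((x * p for x, p in pmf.items()), Fraction(0))

# Variance and standard deviation: second central moment and its square root.
variance = central_moment(pmf, 2, mean)
std_dev = float(variance) ** 0.5

# Skewness and kurtosis: third and fourth standardized moments.
skewness = float(central_moment(pmf, 3, mean)) / std_dev ** 3
kurtosis = central_moment(pmf, 4, mean) / variance ** 2

# Mode(s): value(s) with the highest probability.
top = max(pmf.values())
modes = [x for x, p in pmf.items() if p == top]

print(mean, variance, std_dev, skewness, kurtosis, modes)
```

Because the die is symmetric about 3.5, its third central moment vanishes, and every face is a mode since all faces are equally likely.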
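The cumulative distribution function $F$ of any real-valued random variable $X$ has the properties:
$F$ is non-decreasing;
$F$ is right-continuous;
$0 \le F(x) \le 1$;
$\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$; and
$P(a < X \le b) = F(b) - F(a)$.
Conversely, any function $F\colon \mathbb{R} \to \mathbb{R}$ that satisfies the first four of the properties above is the cumulative
distribution function of some probability distribution on the real numbers.[13]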
Any probability distribution can be decomposed as the mixture of a discrete, an absolutely continuous and a
singular continuous distribution,[14] and thus any cumulative distribution function admits a decomposition
as the convex sum of the three corresponding cumulative distribution functions.
The points where the cdf jumps always form a countable set; this
may be any countable set and thus may even be dense in the real
numbers.
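For a discrete distribution the cumulative distribution function is a step function that jumps exactly at the values carrying positive probability. A minimal sketch, again assuming a fair die as the example and using hypothetical names, builds this function and checks a few of the properties of a cumulative distribution function on a finite grid.

```python
from fractions import Fraction

# ASSUMED example distribution: a fair six-sided die.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def cdf(x):
    """F(x) = P(X <= x): total mass at the points not exceeding x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

# Evaluate F on a grid from -2.0 to 10.0 in steps of 0.25.
grid = [k / 4 for k in range(-8, 41)]
values = [cdf(x) for x in grid]
non_decreasing = all(a <= b for a, b in zip(values, values[1:]))

# Points where F jumps: the mass just below the point differs from the mass at it.
jump_points = sorted(v for v in pmf if cdf(v) - cdf(v - 1e-9) > 0)

# Probability of a half-open interval from the CDF: P(2 < X <= 5) = F(5) - F(2).
p_interval = cdf(5) - cdf(2)
print(non_decreasing, jump_points, p_interval)
```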
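Discrete distributions can also be represented with the Dirac delta function as a generalized probability
density function $f$, where
$f(x) = \sum_{\omega \in A} p(\omega)\,\delta(x - \omega)$,
with $A$ the countable set of values taken with non-zero probability and $p$ the probability mass function,
which means
$P(X \in E) = \int_E f(x)\,dx = \sum_{\omega \in A \cap E} p(\omega)$
for any event $E$.[17]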
Indicator-function representation
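For a discrete random variable $X$, let $u_0, u_1, \dots$ be the values it can take with non-zero probability.
Denote
$\Omega_i = X^{-1}(u_i) = \{\omega : X(\omega) = u_i\}$, $i = 0, 1, 2, \dots$
These are disjoint sets, and for such sets
$P(\bigcup_i \Omega_i) = \sum_i P(\Omega_i) = \sum_i P(X = u_i) = 1$.
It follows that the probability that $X$ takes any value except for $u_0, u_1, \dots$ is zero, and thus one can write $X$
as
$X(\omega) = \sum_i u_i 1_{\Omega_i}(\omega)$
except on a set of probability zero, where $1_A$ is the indicator function of $A$. This may serve as an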
alternative definition of discrete random variables.
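The indicator-function representation can be illustrated on a small finite sample space. The sketch assumes two coin tosses with the random variable counting heads. The sample space, the variable and all names are illustrative choices.

```python
# ASSUMED example: two coin tosses, X counts the number of heads.
sample_space = ["HH", "HT", "TH", "TT"]

def X(omega):
    """Random variable: number of heads in the outcome."""
    return omega.count("H")

# Values taken by X and the preimage of each value.
values = sorted({X(w) for w in sample_space})
preimages = {u: {w for w in sample_space if X(w) == u} for u in values}

def indicator(subset, omega):
    """Indicator function of `subset`: 1 if omega belongs to it, else 0."""
    return 1 if omega in subset else 0

def X_from_indicators(omega):
    """Rebuild X as a sum of its values weighted by preimage indicators."""
    return sum(u * indicator(preimages[u], omega) for u in values)

for w in sample_space:
    print(w, X(w), X_from_indicators(w))
```

The preimages are pairwise disjoint and cover the sample space, so exactly one indicator is non-zero for each outcome.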
One-point distribution
A special case is the discrete distribution of a random variable that can take on only one fixed value; in
other words, it is a deterministic distribution. Expressed formally, the random variable $X$ has a one-point
distribution if it has a possible outcome $x$ such that $P(X = x) = 1$.[18] All other possible outcomes then
have probability 0. Its cumulative distribution function jumps immediately from 0 to 1.
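A real random variable $X$ has an absolutely continuous probability distribution if there is a function
$f\colon \mathbb{R} \to [0, \infty]$ such that, for each interval $[a, b] \subset \mathbb{R}$,
$P(a \le X \le b) = \int_a^b f(x)\,dx$.
This is the definition of a probability density function, so that absolutely continuous probability distributions
are exactly those with a probability density function. In particular, the probability for $X$ to take any single
value $a$ (that is, $a \le X \le a$) is zero, because an integral with coinciding upper and lower limits is always
equal to zero. If the interval $[a, b]$ is replaced by any measurable set $A$, the corresponding equality still holds:
$P(X \in A) = \int_A f(x)\,dx$.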
There are many examples of absolutely continuous probability distributions: normal, uniform, chi-squared,
and others.
Absolutely continuous probability distributions as defined above are precisely those with an absolutely
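continuous cumulative distribution function. In this case, the cumulative distribution function $F$ has the
form
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$,
where $f$ is a density of the random variable $X$.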
For a more general definition of density functions and the equivalent absolutely continuous measures see
absolutely continuous measure.
Kolmogorov definition
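In the measure-theoretic formalization of probability theory, a random variable is defined as a measurable
function $X$ from a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ to a measurable space $(\mathcal{X}, \mathcal{A})$. Given that probabilities of
events of the form $\{\omega \in \Omega \mid X(\omega) \in A\}$ satisfy Kolmogorov's probability axioms, the probability
distribution of $X$ is the image measure $X_*\mathbb{P}$ of $X$, which is a probability measure on $(\mathcal{X}, \mathcal{A})$ satisfying
$X_*\mathbb{P} = \mathbb{P} X^{-1}$.[22][23][24]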
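On a finite probability space the image measure can be computed by brute force: the probability that the distribution assigns to a set of values is the probability of its preimage. The sketch below assumes two fair dice and takes their sum as the random variable. The function name `pushforward` is hypothetical.

```python
from fractions import Fraction

# ASSUMED example: two fair dice, every pair of faces equally likely.
space = [(a, b) for a in range(1, 7) for b in range(1, 7)]
prob = {w: Fraction(1, 36) for w in space}

def X(w):
    """Random variable: sum of the two faces."""
    return w[0] + w[1]

def pushforward(A):
    """Image measure of A: probability of the preimage {w : X(w) in A}."""
    return sum((prob[w] for w in space if X(w) in A), Fraction(0))

# Distribution of X as a table of point probabilities.
distribution = {s: pushforward({s}) for s in range(2, 13)}
print(distribution)
```

The resulting table is itself a probability measure on the set of possible sums: its entries are non-negative and add up to one.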
This kind of complicated support appears quite frequently in dynamical systems. It is not simple to establish
that the system has a probability measure, and the main problem is the following. Let $t_1 \ll t_2 \ll t_3$ be
instants in time and $O$ a subset of the support; if the probability measure exists for the system, one would
expect the frequency of observing states inside set $O$ to be equal in interval $[t_1, t_2]$ and $[t_2, t_3]$, which
might not happen; for example, it could oscillate similar to a sine, $\sin(t)$, whose limit when $t \to \infty$ does
not converge. Formally, the measure exists only if the limit of the relative frequency converges when the
system is observed into the infinite future.[28] The branch of dynamical systems that studies the existence of
a probability measure is ergodic theory.
Note that even in these cases, the probability distribution, if it exists, might still be termed "absolutely
continuous" or "discrete" depending on whether the support is uncountable or countable, respectively.
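For example, suppose $U$ has a uniform distribution between 0 and 1. To construct a random Bernoulli
variable for some $0 < p < 1$, we define
$X = 1$ if $U < p$, and $X = 0$ if $U \ge p$,
so that
$P(X = 1) = P(U < p) = p$ and $P(X = 0) = P(U \ge p) = 1 - p$.
This random variable $X$ has a Bernoulli distribution with parameter $p$.[29] This transformation produces a
discrete random variable.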
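The construction of a Bernoulli variable from a uniform one translates directly into code. In the sketch below the parameter value 0.3, the seed and the sample size are arbitrary choices made for illustration, and Python's pseudo-random generator stands in for the uniform variable.

```python
import random

def bernoulli_from_uniform(u, p):
    """Return 1 if the uniform draw u falls below p, otherwise 0."""
    return 1 if u < p else 0

# Arbitrary illustrative settings: parameter, seed and sample size.
p = 0.3
rng = random.Random(12345)
draws = [bernoulli_from_uniform(rng.random(), p) for _ in range(100_000)]

# The relative frequency of ones should be close to p.
frequency = sum(draws) / len(draws)
print(frequency)
```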
For example, suppose a random variable that has an exponential distribution $F(x) = 1 - e^{-\lambda x}$ must be
constructed.
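One common way to carry out such a construction is inverse transform sampling: a uniform draw is passed through the inverse of the cumulative distribution function. The sketch below assumes the rate parametrization $F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$, whose inverse is $-\ln(1 - u)/\lambda$. The rate, the seed and the sample size are arbitrary.

```python
import math
import random

def exponential_from_uniform(u, rate):
    """Inverse of F(x) = 1 - exp(-rate * x), applied to a uniform draw u in [0, 1)."""
    return -math.log(1.0 - u) / rate

# Arbitrary illustrative settings.
rate = 2.0
rng = random.Random(2024)
samples = [exponential_from_uniform(rng.random(), rate) for _ in range(100_000)]

# The sample mean should be close to the theoretical mean 1 / rate.
sample_mean = sum(samples) / len(samples)
print(sample_mean)
```

Using `1.0 - u` rather than `u` keeps the argument of the logarithm strictly positive, because `random()` returns values in the half-open interval from 0 to 1.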
A frequent problem in statistical simulations (the Monte Carlo method) is the generation of pseudo-random
numbers that are distributed in a given way.
The following is a list of some of the most common probability distributions, grouped by the type of
process that they are related to. For a more complete list, see list of probability distributions, which groups
by the nature of the outcome being considered (discrete, absolutely continuous, multivariate, etc.).
All of the univariate distributions below are singly peaked; that is, it is assumed that the values cluster
around a single point. In practice, actually observed quantities may cluster around multiple values. Such
quantities can be modeled using a mixture distribution.
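A mixture distribution combines several component distributions with weights that sum to one. The following sketch assumes two normal components with arbitrary weights and parameters, and shows that the resulting density has two peaks while still being a valid distribution.

```python
from statistics import NormalDist

# ASSUMED components: (weight, distribution); the weights sum to 1.
components = [
    (0.4, NormalDist(mu=-2.0, sigma=0.5)),
    (0.6, NormalDist(mu=3.0, sigma=1.0)),
]

def mixture_pdf(x):
    """Density of the mixture: weighted sum of the component densities."""
    return sum(w * d.pdf(x) for w, d in components)

def mixture_cdf(x):
    """CDF of the mixture: weighted sum of the component CDFs."""
    return sum(w * d.cdf(x) for w, d in components)

# The density is high near each component mean and low in between.
print(mixture_pdf(-2.0), mixture_pdf(0.5), mixture_pdf(3.0))
print(mixture_cdf(100.0))  # total probability, close to 1
```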
Fitting
Probability distribution fitting or simply distribution fitting is the fitting of a probability distribution to a
series of data concerning the repeated measurement of a variable phenomenon. The aim of distribution
fitting is to predict the probability or to forecast the frequency of occurrence of the magnitude of the
phenomenon in a certain interval.
There are many probability distributions (see list of probability distributions) of which some can be fitted
more closely to the observed frequency of the data than others, depending on the characteristics of the
phenomenon and of the distribution. The distribution giving a close fit is supposed to lead to good
predictions.
In distribution fitting, therefore, one needs to select a distribution that suits the data well.
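A very simple instance of distribution fitting is estimating the parameters of a normal distribution from repeated measurements and then using the fitted distribution to predict the probability of an interval. The sketch below is an illustration under stated assumptions: the data are synthetic (generated here, not real measurements), the normal family is chosen in advance, and `NormalDist.from_samples` estimates the mean and standard deviation from the sample.

```python
import random
from statistics import NormalDist

# Synthetic measurements (ASSUMPTION: generated for illustration only).
rng = random.Random(7)
data = [rng.gauss(20.0, 3.0) for _ in range(5_000)]

# Fit a normal distribution using the sample mean and standard deviation.
fitted = NormalDist.from_samples(data)

# Predicted probability that a measurement falls between 17 and 23 ...
p_interval = fitted.cdf(23.0) - fitted.cdf(17.0)

# ... compared with the frequency actually observed in the data.
observed = sum(17.0 <= x <= 23.0 for x in data) / len(data)

print(fitted.mean, fitted.stdev, p_interval, observed)
```

A small gap between the predicted probability and the observed frequency suggests that the chosen family suits the data; a large gap would suggest trying a different distribution.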
See also
Lists
List of probability distributions
List of statistical topics
References
Citations
1. Everitt, Brian (2006). The Cambridge dictionary of statistics (3rd ed.). Cambridge, UK:
Cambridge University Press. ISBN 978-0-511-24688-3. OCLC 161828328
(https://www.worldcat.org/oclc/161828328).
2. Ash, Robert B. (2008). Basic probability theory (Dover ed.). Mineola, N.Y.: Dover
Publications. pp. 66–69. ISBN 978-0-486-46628-6. OCLC 190785258
(https://www.worldcat.org/oclc/190785258).
3. Evans, Michael; Rosenthal, Jeffrey S. (2010). Probability and statistics: the science of
uncertainty (2nd ed.). New York: W.H. Freeman and Co. p. 38. ISBN 978-1-4292-2462-8.
OCLC 473463742 (https://www.worldcat.org/oclc/473463742).
4. "1.3.6.1. What is a Probability Distribution"
(https://www.itl.nist.gov/div898/handbook/eda/section3/eda361.htm). www.itl.nist.gov. Retrieved 2020-09-10.
5. Dekking, Michel (2005). A modern introduction to probability and statistics: understanding why and how.
London: Springer. ISBN 978-1-85233-896-1. OCLC 262680588
(https://www.worldcat.org/oclc/262680588).
6. Walpole, R.E.; Myers, R.H.; Myers, S.L.; Ye, K. (1999). Probability and statistics for
engineers. Prentice Hall.
7. Ross, Sheldon M. (2010). A first course in probability. Pearson.
8. DeGroot, Morris H.; Schervish, Mark J. (2002). Probability and Statistics. Addison-Wesley.
9. Billingsley, P. (1986). Probability and measure. Wiley. ISBN 9780471804789.
10. Shephard, N.G. (1991). "From characteristic function to distribution function: a simple
framework for the theory" (https://ora.ox.ac.uk/objects/uuid:a4c3ad11-74fe-458c-8d58-6f74511a476c).
Econometric Theory. 7 (4): 519–529. doi:10.1017/S0266466600004746
(https://doi.org/10.1017%2FS0266466600004746). S2CID 14668369
(https://api.semanticscholar.org/CorpusID:14668369).
11. Chapters 1 and 2 of Vapnik (1998)
12. More information and examples can be found in the articles Heavy-tailed distribution,
Long-tailed distribution, Fat-tailed distribution
13. Çınlar, Erhan (2011). Probability and stochastics. New York: Springer. p. 57.
ISBN 9780387878584.
14. see Lebesgue's decomposition theorem
15. Çınlar, Erhan (2011). Probability and stochastics. New York: Springer. p. 51.
ISBN 9780387878591. OCLC 710149819 (https://www.worldcat.org/oclc/710149819).
16. Cohn, Donald L. (1993). Measure theory. Birkhäuser.
17. Khuri, André I. (March 2004). "Applications of Dirac's delta function in statistics".
International Journal of Mathematical Education in Science and Technology. 35 (2): 185–195.
doi:10.1080/00207390310001638313 (https://doi.org/10.1080%2F00207390310001638313).
ISSN 0020-739X (https://www.worldcat.org/issn/0020-739X). S2CID 122501973
(https://api.semanticscholar.org/CorpusID:122501973).
18. Fisz, Marek (1963). Probability Theory and Mathematical Statistics (3rd ed.). John Wiley &
Sons. p. 129. ISBN 0-471-26250-1.
19. Jeffrey Seth Rosenthal (2000). A First Look at Rigorous Probability Theory. World Scientific.
20. Chapter 3.2 of DeGroot & Schervish (2002)
21. Bourne, Murray. "11. Probability Distributions - Concepts"
(https://www.intmath.com/counting-probability/11-probability-distributions-concepts.php).
www.intmath.com. Retrieved 2020-09-10.
22. Stroock, Daniel W. (1999). Probability theory: an analytic view (Rev. ed.). Cambridge
[England]: Cambridge University Press. p. 11. ISBN 978-0521663496. OCLC 43953136
(https://www.worldcat.org/oclc/43953136).
23. Kolmogorov, Andrey (1950) [1933]. Foundations of the theory of probability. New York, USA:
Chelsea Publishing Company. pp. 21–24.
24. Joyce, David (2014). "Axioms of Probability"
(https://mathcs.clarku.edu/~djoyce/ma217/axioms.pdf) (PDF). Clark University. Retrieved December 5, 2019.
25. Alligood, K.T.; Sauer, T.D.; Yorke, J.A. (1996). Chaos: an introduction to dynamical systems.
Springer.
26. Rabinovich, M.I.; Fabrikant, A.L. (1979). "Stochastic self-modulation of waves in
nonequilibrium media". J. Exp. Theor. Phys. 77: 617–629. Bibcode:1979JETP...50..311R
(https://ui.adsabs.harvard.edu/abs/1979JETP...50..311R).
27. Section 1.9 of Ross, S.M.; Peköz, E.A. (2007). A second course in probability
(http://people.bu.edu/pekoz/A_Second_Course_in_Probability-Ross-Pekoz.pdf) (PDF).
28. Walters, Peter (2000). An Introduction to Ergodic Theory. Springer.
29. Dekking, Frederik Michel; Kraaikamp, Cornelis; Lopuhaä, Hendrik Paul; Meester, Ludolf
Erwin (2005), "Why probability and statistics?", A Modern Introduction to Probability and
Statistics, Springer London, pp. 1–11, doi:10.1007/1-84628-168-7_1
(https://doi.org/10.1007%2F1-84628-168-7_1), ISBN 978-1-85233-896-1
30. Bishop, Christopher M. (2006). Pattern recognition and machine learning. New York:
Springer. ISBN 0-387-31073-8. OCLC 71008143 (https://www.worldcat.org/oclc/71008143).
31. Chang, Raymond; Thoman, John W., Jr. (2014). Physical chemistry for the chemical sciences.
[Mill Valley, California]. pp. 403–406. ISBN 978-1-68015-835-9.
OCLC 927509011 (https://www.worldcat.org/oclc/927509011).
32. Chen, P.; Chen, Z.; Bak-Jensen, B. (April 2008). "Probabilistic load flow: A review". 2008
Third International Conference on Electric Utility Deregulation and Restructuring and Power
Technologies. pp. 1586–1591. doi:10.1109/drpt.2008.4523658
(https://doi.org/10.1109%2Fdrpt.2008.4523658). ISBN 978-7-900714-13-8. S2CID 18669309
(https://api.semanticscholar.org/CorpusID:18669309).
33. Maity, Rajib (2018-04-30). Statistical methods in hydrology and hydroclimatology.
Singapore. ISBN 978-981-10-8779-0. OCLC 1038418263
(https://www.worldcat.org/oclc/1038418263).
Sources
den Dekker, A. J.; Sijbers, J. (2014). "Data distributions in magnetic resonance images: A
review". Physica Medica. 30 (7): 725–741. doi:10.1016/j.ejmp.2014.05.002
(https://doi.org/10.1016%2Fj.ejmp.2014.05.002). PMID 25059432
(https://pubmed.ncbi.nlm.nih.gov/25059432).
Vapnik, Vladimir Naumovich (1998). Statistical Learning Theory. John Wiley and Sons.
External links
"Probability distribution"
(https://www.encyclopediaofmath.org/index.php?title=Probability_distribution),
Encyclopedia of Mathematics, EMS Press, 2001 [1994]
Field Guide to Continuous Probability Distributions (http://threeplusone.com/FieldGuide.pdf),
Gavin E. Crooks.