1. Introduction
Probability distributions are used in hydrology to determine the probability of occurrence of extreme events: maximum flows, maximum volumes and maximum precipitation.
A criticism of the probabilistic estimation of extreme events is that it does not take climate change into account; this is generally addressed with hydrological modeling using machine learning [1], such as neural networks. However, statistical modeling is complementary to, and necessary for, other types of black-box and gray-box modeling, because the input and output data of these models must be presented probabilistically, especially for the low-probability extreme events needed in the design of hydrotechnical constructions.
In the case of determining the maximum flows, distributions with three or more parameters are recommended, while for the determination of precipitation and maximum volumes, two-parameter distributions are recommended [2,3,4].
In general, finding approximate relations for the parameters is very important because, for many of these probability distributions, exact determination requires an iterative solver such as the Levenberg–Marquardt algorithm [5] for nonlinear least squares curve-fitting problems, which can be a difficulty because it requires specialized calculation programs.
Another important aspect is the expression of the quantile function (inverse of the distribution function) of these distributions. Both the approximate parameter relations and the quantile expressions facilitate the use of these distributions in the determination of the analyzed extreme values.
This article presents approximate forms of the expressions of some parameters of probability distributions frequently used in hydrology: Pearson III (PE3), Generalized extreme value (GEV), Weibull (W3), Weibull (W2), Fréchet (F3), Fréchet (F2), Generalized Pareto (P3), Log-Logistic (LL3), Kappa (K3-Generalized Gumbel), Kappa (K3-Park), Pearson V (PV3), Pearson V (PV2) and Generalized Exponential (EG2).
The approximate forms are for determining the parameters using the method of ordinary moments (MOM) and the method of linear moments (L-moments).
For the method of ordinary moments, the parameters of the three-parameter distributions are determined from relationships that use the mean, variance and skewness, the latter being the ratio between the centered moment of order 3 and the variance raised to the power 1.5. For the method of linear moments, the parameters of the distribution are determined from relations that use the arithmetic mean ($L_1$), $L_2$ and $\tau_3$, the latter being the ratio between the moments $L_3$ and $L_2$.
These methods are some of the most frequently used methods for estimating parameters in hydrology [2,3,4,6,7,8,9]. The approximations use rational and polynomial functions in which the coefficients are obtained by least squares nonlinear regression on the domain of definition considered.
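As an illustration of the sample statistics that both estimation methods start from, the following minimal sketch computes the ordinary moments and the first L-moments of a sample. The function names are hypothetical, the skewness uses the plain (biased) centered moments as described above, and the L-moments use the standard unbiased probability-weighted-moment estimators, which is an assumption about the estimator rather than something stated in this article.

```python
import math

def ordinary_moments(sample):
    """Return mean, standard deviation, coefficient of variation and skewness."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n  # centered moment of order 2
    m3 = sum((x - mean) ** 3 for x in sample) / n  # centered moment of order 3
    std = math.sqrt(m2)
    return mean, std, std / mean, m3 / m2 ** 1.5

def l_moments(sample):
    """Return L1, L2, L-coefficient of variation (L2/L1) and L-skewness (L3/L2)."""
    x = sorted(sample)
    n = len(x)
    # Unbiased probability-weighted moments b0, b1, b2 (0-based rank i).
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    l1 = b0
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    return l1, l2, l2 / l1, l3 / l2
```

For a symmetric sample such as `[1, 2, 3, 4, 5]`, both the skewness and the L-skewness evaluate to zero, which is a quick sanity check before using these values in the approximation relations.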
In some cases, the approach is similar by using the same type of function but with more terms, and in other cases the approach is different by using other types of functions.
For the method of ordinary moments, the approximation range most often used for skewness is 0–4, but in certain regions of the world skewness values of up to 9 are encountered [3], or the skewness is adopted by multiplying the coefficient of variation by a chosen positive coefficient [10], so the adopted skewness is always positive, even if the given sample has negative skewness. The definition range 0–8 was chosen for the approximations in the case of MOM, and the definition range for the method of linear moments is the default 0–1. Considering that the approximation relations must be as accurate as possible, the relative error graphs are also presented. The relative error is the ratio between the difference between the exact and approximate value of the skewness, and the exact value. In this article, the new approximation relations are noted as “a better approximation”. The L-skewness of the sample can have negative values, generally greater than −0.4.
Thus, all the novel elements for these distributions, presented in Table 1, will help hydrology researchers to use these distributions easily.
The research also aimed to promote the use of mathematics among the broad community of engineers in the field of hydrotechnics, because using dedicated software without knowledge of the underlying mathematical methods is not beneficial. For this reason, the Mathcad functions used are presented comparatively, so that researchers can map them to the equivalent functions in their preferred programs (Matlab, Excel, Python, R, etc.).
2. Probability Distributions
In this section, the analyzed probability distributions are presented. The probability density function, the complementary cumulative distribution function and the quantile function (inverse function) are presented below.
2.1. Pearson III Distribution (PE3)
The Pearson III distribution belongs to the Gamma distribution family, being a generalized form of the two-parameter Gamma distribution [
11], with shifted
, and a particular case of the four-parameter gamma distribution [
5,
12,
13].
where
are the shape, the scale and the position parameters and
can take any values of range
if
or
if
and
;
represent the mean (expected value) and standard deviation. If
(negative skewness) then the first argument of the inverse of the distribution function Gamma,
becomes
.
The expressions of the inverse function , using the mean and standard deviation, are valid only for the method of ordinary moments.
2.2. Generalized Extreme Value Distribution (GEV)
The GEV distribution was introduced by Jenkinson in 1955 [11] for the analysis of extreme values. It is also known as the Fisher–Tippett distribution.
Depending on the sign of the shape parameter, this can be transformed into the Weibull (
), Fréchet (
) or Gumbel (
) distributions [
11,
13,
14].
where
are the shape, the scale and the position parameters;
can take any values of range
if
, and
if
.
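Because the GEV quantile function has a closed form, it can be evaluated directly once the parameters are known. The sketch below is illustrative only: it assumes the sign convention in which a positive shape parameter gives the Fréchet-type tail (matching the statement above), it takes `p` as the exceedance probability, and the function and argument names are hypothetical.

```python
import math

def gev_quantile(p, shape, scale, position):
    """Value exceeded with probability p (0 < p < 1)."""
    y = -math.log(1.0 - p)
    if abs(shape) < 1e-12:
        # Gumbel limit as the shape parameter tends to zero.
        return position - scale * math.log(y)
    return position + scale / shape * (y ** (-shape) - 1.0)

def gev_exceedance(x, shape, scale, position):
    """Complementary cumulative distribution function, 1 - F(x)."""
    if abs(shape) < 1e-12:
        return 1.0 - math.exp(-math.exp(-(x - position) / scale))
    t = 1.0 + shape * (x - position) / scale  # must be positive inside the support
    return 1.0 - math.exp(-t ** (-1.0 / shape))
```

A round trip such as `gev_exceedance(gev_quantile(0.01, 0.1, 50.0, 200.0), 0.1, 50.0, 200.0)` should return the original exceedance probability, which is a convenient check when porting the expressions to another program.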
2.3. Weibull Distribution (W3)
The distribution represents a particular case of the GEV distribution when the shape parameter is negative. It is also known as the Type III extreme value distribution [11,15,16].
where
are the shape, the scale and the position parameters;
.
2.4. Weibull Distribution (W2)
The distribution represents a particular case of the W3 distribution when the position parameter is 0 [
11,
16]:
where
are the shape and the scale parameters;
.
2.5. Fréchet Distribution (F3)
The distribution represents a particular case of the GEV distribution when the shape parameter is positive. It is also known as the Type II extreme value distribution [11,17,18].
where
are the shape, the scale and the position parameters;
.
2.6. Fréchet Distribution (F2)
The distribution represents a particular case of the F3 distribution when the position parameter is 0 [
16].
where
are the shape and the scale parameters;
.
2.7. Generalized Pareto Distribution (P3)
The generalized Pareto distribution was introduced by Pickands in 1975 [17]. The distribution represents a special case of the five-parameter Wakeby distribution [11,14] and of the four-parameter Wakeby distribution [13,17].
where
are the shape, the scale and the position parameters;
if
or
if
.
2.8. Log-Logistic Distribution (LL3)
The distribution was popularized in hydrology by Ahmad et al. in 1988 and represents a generalized form of the two-parameter Log-Logistic distribution [11,17].
where
are the shape, the scale and the position parameters;
,
.
2.9. Kappa Distribution (K3-Generalized Gumbel)
The distribution represents a particular case of the four-parameter Kappa distribution. It is also known as the generalized Gumbel distribution [
19].
where,
are the shape, the scale and the position parameters.
.
2.10. Kappa Distribution (K3-Park)
The distribution represents a particular case of the four-parameter Kappa distribution and a generalized form of the two-parameter Kappa distribution by adding a location parameter (shifted x), being presented for the first time in 2009 by Park et al. [
20].
where,
are the shape, the scale and the position parameters.
2.11. Pearson V Distribution (PV3)
The distribution represents the inverse of the Pearson III distribution [
21].
where
are the shape, the scale and the position parameters;
and
can take any values of the range
.
2.12. Pearson V Distribution (PV2)
The two-parameter Pearson V distribution represents a particular case of PV3 when the position parameter is 0 [21].
where
are the shape and the scale parameters;
.
2.13. Generalized Exponential Distribution (EG2-Gupta)
The Generalized Exponential distribution is an alternative to the two-parameter Gamma and Weibull distributions. It was introduced by Gupta and Kundu in 1999 [22].
where
are the shape and the scale parameters;
.
3. Methods
This section presents the parameter estimation relationships for the method of ordinary moments and the method of linear moments, the approximate estimation of the parameters using various approximation functions, the variation graphs of the approximately estimated parameters, and how the approximation functions were obtained.
The parameters of the analyzed distributions can be obtained exactly only by numerical methods, because the moment equations, presented in Appendix B, are nonlinear. Mathematical notations for the built-in functions of the Mathcad program used in this article are shown in Appendix A.
3.1. Method for Parameter Approximation
The approach method for three-parameter distributions consisted of determining the domain of definition of the variable as a function of for MOM and for L-moments. The limits of the definition domain of the parameter were established according to the extreme values of and , thus was calculated for , respectively, . The and functions were plotted for both methods.
The variation graph of skewness (
) for the first vertical axis and L-skewness (
) and for the vertical secondary axis, depending on the parameter
are presented in
Figure 1.
For two-parameter distributions, the definition range of the variable was determined as a function of for MOM and for L-moments. The limits of the definition domain of the parameter were established according to the extreme values of and , thus was calculated for , respectively, for . The functions and were plotted for both methods.
The variation graph of coefficient of variation for the first vertical axis (
) and L-coefficient of variation (
) for the vertical secondary axis, depending on the parameter
are presented in
Figure 2.
Graphical examination indicated the type of approximation functions. The first-order derivatives were calculated to determine the slope variation, so for uniform variations over certain intervals a type of function is defined. If the variation is uniform in logarithmic scale, then the argument of the function is chosen logarithmically for the approximation with a polynomial, otherwise the rational function is chosen without logarithmization. The rational function is the ratio of two polynomial functions. There were situations in which the rational function was reduced only to a polynomial function, because “the denominator” represents a constant equal to one. For some probability distributions, the same type of functions, rational or polynomial, were used, and in some cases both functions were used on some distinct defining intervals.
The candidate approximation functions took into account existing approximations for some distributions [6,11]. The functions chosen were those considered optimal in terms of relative error and number of terms, balancing the allowed relative error against the complexity of the relationship.
The calibration of the approximation functions was carried out with the linear least squares method for the polynomial functions and nonlinear for the rational functions.
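The calibration step can be sketched for the simplest case, a polynomial with a logarithmic argument fitted by linear least squares. The example below is a hedged illustration, not the calibration actually used in this article: it uses the standard exact relation between the coefficient of variation and the shape parameter of the two-parameter Weibull distribution, and the polynomial degree (3), the shape range (1 to 10) and the sampling grid are all assumptions chosen only for demonstration. The resulting coefficients are therefore not those of the relations proposed here.

```python
import math

def weibull_cv(alpha):
    """Exact coefficient of variation of the two-parameter Weibull distribution."""
    g1 = math.gamma(1.0 + 1.0 / alpha)
    g2 = math.gamma(1.0 + 2.0 / alpha)
    return math.sqrt(g2 / g1 ** 2 - 1.0)

def polyfit(xs, ys, degree):
    """Linear least squares via the normal equations and Gaussian elimination."""
    m = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    for col in range(m):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            f = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= f * a[col][k]
            b[row] -= f * b[col]
    coef = [0.0] * m
    for i in reversed(range(m)):
        s = sum(a[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = (b[i] - s) / a[i][i]
    return coef  # coef[i] multiplies x**i

# Tabulate the exact relation, then fit ln(alpha) as a cubic in ln(Cv).
alphas = [1.0 + 0.05 * i for i in range(181)]  # assumed range 1.0 .. 10.0
xs = [math.log(weibull_cv(a)) for a in alphas]
ys = [math.log(a) for a in alphas]
coef = polyfit(xs, ys, 3)

def alpha_approx(cv):
    """Approximate shape parameter; the polynomial result goes back in the exponent."""
    x = math.log(cv)
    return math.exp(sum(c * x ** i for i, c in enumerate(coef)))

# Relative error as defined in the Introduction, evaluated on the fitting grid.
max_rel_err = max(abs(a - alpha_approx(weibull_cv(a))) / a for a in alphas)
```

Inspecting `max_rel_err` (or plotting the relative error over the grid) is the analogue of the relative error graphs used to compare candidate functions; a rational function would require a nonlinear solver instead of the normal equations.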
3.2. Methods of Parameter Estimation
In this article, two methods of parameter estimation are studied: the method of ordinary moments (MOM) and the L-moments method.
3.2.1. Pearson III Distribution (PE3)
For estimation with MOM, the distribution parameters have the following expressions [
10]:
where
represents the skewness coefficient.
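For reference, the MOM estimates of the Pearson III parameters have a simple closed form. The sketch below uses the standard textbook relations for a shifted gamma distribution; the names `alpha`, `beta` and `position` for the shape, scale and position parameters are assumptions and may differ from the notation of the expressions in this section.

```python
def pe3_mom(mean, std, skew):
    """Pearson III parameters from mean, standard deviation and skewness (skew != 0)."""
    alpha = 4.0 / skew ** 2             # shape
    beta = std * skew / 2.0             # scale (negative when the skewness is negative)
    position = mean - 2.0 * std / skew  # position (lower bound when skew > 0)
    return alpha, beta, position
```

These relations reproduce the input statistics: the mean equals `position + alpha * beta` and the variance equals `alpha * beta ** 2`.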
The parameter can be estimated using the rational function presented by Hosking in 1997, named in this section the “Hosking approximation”, or based on an approximation made up of two polynomial functions and one rational, depending on the definition domain of the estimated parameter, named here “a better approximation”.
Thus, for the estimation with the L-moments, the shape parameter can be evaluated numerically with the following approximate forms, depending on L-skewness ():
The Hosking approximation, rational function [
11,
14]:
If
:
if
:
better approximation, polynomial and rational functions:
If
:
if
:
if
:
The scale parameter
and the position parameter
are determined with the following expressions [
14]:
where
are the sample L-moments and L-skewness, [
7,
11,
14].
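The Hosking rational approximation mentioned above can be sketched as follows. The coefficients are those commonly published for this approximation (Hosking and Wallis) and are reproduced here from that general literature, so they should be checked against the expressions in [11,14] before use. The conversion to a shifted gamma parametrization (`alpha`, `beta`, `position`) and the function names are assumptions made for this illustration.

```python
import math

def pe3_shape_hosking(t3):
    """Approximate gamma shape parameter from L-skewness, 0 < |t3| < 1."""
    a = abs(t3)
    if a < 1.0 / 3.0:
        z = 3.0 * math.pi * a * a
        return (1.0 + 0.2906 * z) / (z + 0.1882 * z ** 2 + 0.0442 * z ** 3)
    z = 1.0 - a
    num = 0.36067 * z - 0.59567 * z ** 2 + 0.25361 * z ** 3
    den = 1.0 - 2.78861 * z + 2.56096 * z ** 2 - 0.77045 * z ** 3
    return num / den

def pe3_lmom_hosking(l1, l2, t3):
    """Shape, scale and position of the Pearson III distribution from L-moments."""
    alpha = pe3_shape_hosking(t3)
    # sqrt(pi) * Gamma(alpha) / Gamma(alpha + 1/2), computed through log-gamma
    # to avoid overflow for large alpha (small L-skewness).
    ratio = math.sqrt(math.pi) * math.exp(math.lgamma(alpha) - math.lgamma(alpha + 0.5))
    beta = math.copysign(l2 * ratio, t3)  # scale takes the sign of the L-skewness
    position = l1 - alpha * beta
    return alpha, beta, position
```

As a check, an exponential distribution has L-skewness 1/3 and corresponds to a shape parameter of 1; `pe3_shape_hosking(1.0 / 3.0)` returns a value very close to 1.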
3.2.2. Generalized Extreme Value Distribution (GEV)
The parameter
can be estimated using “approximation 1”, a polynomial function presented by Rao et al. [11], or using two new approximations of the polynomial or rational type, hereafter named “approximation 2” and “approximation 3”.
Thus, the parameter has the following approximate forms depending on :
Approximation 1, polynomial function [
11]:
If
:
if
:
better approximations:
Approximation 2, polynomial form:
Approximation 3, rational form:
For estimation with the L-moments, parameter has the following approximate forms depending on :
Approximation 1, polynomial form, presented in Rao et al. [
11] and Hosking [
14]:
Approximation 2, also a polynomial function but more accurate, presented in Rao et al. [
11]:
a better approximation, named in this section approximation 3, being a rational function:
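For comparison with the forms listed in this subsection, the widely published Hosking polynomial approximation for L-moments estimation (approximation 1 above) can be sketched as follows. The coefficients and the scale and position relations are quoted from the general L-moments literature and should be checked against [11,14]. They use Hosking's sign convention for the shape parameter `k`, which may be opposite to the convention used elsewhere in this article, and the function name is hypothetical.

```python
import math

def gev_lmom_hosking(l1, l2, t3):
    """GEV shape k (Hosking convention), scale and position from L-moments."""
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c
    # k == 0 (Gumbel case) is not handled in this sketch.
    g = math.gamma(1.0 + k)
    scale = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    position = l1 - scale * (1.0 - g) / k
    return k, scale, position
```

For the Gumbel L-skewness of about 0.1699 the returned shape parameter is close to zero, as expected.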
3.2.3. Weibull Distribution (W3)
For estimation with MOM, parameter can be approximated, depending on the skewness, using the “approximation 1” of the rational function type presented by Rao, valid for a maximum skewness of 2, or using the new approximation called here “a better approximation” valid for a skewness in the range of 0–8:
Approximation 1, for
, [
11]:
a better approximation, called here approximation 2, valid for
:
For estimation with the L-moments, parameter has the following approximate forms depending on :
Approximation 1, a rational function presented by Rao et al. [
11]:
Approximation 2, polynomial form, for positive
, presented by Y. Goda [15]:
a better approximation, named here approximation 3, which is a rational function, valid for negative and positive values of L-skewness:
3.2.4. Weibull Distribution (W2)
For estimation with MOM, parameter
has the following approximate forms depending on the coefficient of variation
:
For estimation with the L-moments, parameters have the following expressions [
11,
16,
17]:
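For the two-parameter Weibull distribution the L-moments estimates are explicit, so no approximation is needed. The sketch below uses the standard closed-form relations, assuming the distribution function $F(x) = 1 - \exp(-(x/\beta)^{\alpha})$; the function name and the use of the L-coefficient of variation `l2 / l1` as the intermediate quantity are choices made for this illustration.

```python
import math

def w2_lmom(l1, l2):
    """Shape alpha and scale beta of the two-parameter Weibull from L1 and L2."""
    t2 = l2 / l1                                # L-coefficient of variation
    alpha = -math.log(2.0) / math.log(1.0 - t2)
    beta = l1 / math.gamma(1.0 + 1.0 / alpha)
    return alpha, beta
```

The relations follow from $L_1 = \beta\,\Gamma(1 + 1/\alpha)$ and $L_2 = L_1\,(1 - 2^{-1/\alpha})$.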
3.2.5. Fréchet Distribution (F3)
For estimation with MOM, parameter has the following approximate forms depending on :
Rational approximation form, for
:
For estimation with the L-moments, parameter has the following approximate forms depending on L-skewness, :
Approximation 1, a rational function, adopted from GEV approximation [
11]:
a better approximation, named here approximation 2, also a rational function form but characterized by smaller relative errors in the range of 0.6–1 of the L-skewness:
3.2.6. Fréchet Distribution (F2)
For estimation with MOM, parameter
has the following approximate forms depending on the coefficient of variation
:
For estimation with the L-moments, parameters have the following expressions [
15]:
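The two-parameter Fréchet case is also explicit. The following sketch assumes the distribution function $F(x) = \exp(-(x/\beta)^{-\alpha})$ with $\alpha > 1$, so that the mean exists; the function name is hypothetical and the parametrization may differ from the one used in [15].

```python
import math

def f2_lmom(l1, l2):
    """Shape alpha and scale beta of the two-parameter Frechet from L1 and L2."""
    t2 = l2 / l1                                # L-coefficient of variation
    alpha = math.log(2.0) / math.log(1.0 + t2)
    beta = l1 / math.gamma(1.0 - 1.0 / alpha)
    return alpha, beta
```

Here $L_1 = \beta\,\Gamma(1 - 1/\alpha)$ and $L_2 = L_1\,(2^{1/\alpha} - 1)$, which can be inverted directly for the shape parameter.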
3.2.7. Generalized Pareto Distribution (P3)
For estimation with MOM, parameter has the following approximate polynomial or rational forms depending on :
For
:
for
:
For estimation with the L-moments, parameters have the following expressions [
11,
14,
16,
17]:
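The L-moments relations for the generalized Pareto distribution are exact and can be coded in a few lines. The sketch below uses the standard expressions in Hosking's parametrization, $F(x) = 1 - (1 - k(x - \xi)/\alpha)^{1/k}$, which is an assumption: the sign of the shape parameter may be opposite in other parametrizations, and the function name is hypothetical.

```python
def gpa_lmom(l1, l2, t3):
    """Generalized Pareto shape k (Hosking convention), scale and position."""
    k = (1.0 - 3.0 * t3) / (1.0 + t3)
    scale = (1.0 + k) * (2.0 + k) * l2
    position = l1 - (2.0 + k) * l2
    return k, scale, position
```

For an exponential sample shifted by 10 with scale 4 ($L_1 = 14$, $L_2 = 2$, L-skewness 1/3) the function returns a shape of 0, a scale of 4 and a position of 10.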
3.2.8. Log-Logistic Distribution (LL3)
For estimation with MOM, parameter has the following approximate forms depending on :
Approximation 1, adopted from the expression for generalized Logistic [
11,
19]:
For
, the approximation function is polynomial:
for
, the approximation function has a rational form:
approximation 2, adopted from the expression for generalized Logistic [
11]:
a better approximation referred here as approximation 3, which is a polynomial function:
For
:
for
:
For estimation with L-moments, parameters have the following expressions [
11]:
3.2.9. Kappa Distribution (K3-Generalized Gumbel)
For estimation with MOM, parameter
has the following approximate polynomial form, for
:
where
is the Euler–Mascheroni constant and
represents the digamma function, which has the following approximate form [
11]:
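In programs without a built-in digamma function, a common way to evaluate it is the recurrence $\psi(x) = \psi(x+1) - 1/x$ combined with the asymptotic series. The sketch below illustrates that standard technique; it is not necessarily the approximate form given in [11], and the switch-over point of 6 is an assumption chosen so that the truncation error stays small.

```python
import math

def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        # Shift the argument upward: psi(x) = psi(x + 1) - 1/x.
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # ln(x) - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series
```

A convenient check is that `digamma(1.0)` equals the negative of the Euler–Mascheroni constant, approximately −0.5772157.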
For estimation with the L-moments, parameter has the following approximate forms depending on :
For
:
for
:
3.2.10. Kappa Distribution (K3-Park)
For estimation with MOM, parameter
has the following approximate logarithmic polynomial form, depending on
:
For estimation with the L-moments, parameter
has the following approximate forms depending on L-skewness:
3.2.11. Pearson V Distribution (PV3)
For estimation with MOM, parameter
can be evaluated numerically with the following approximation, depending on
:
For estimation with the L-moments, parameter
has the following approximate forms depending on L-skewness:
where
has the following expression:
3.2.12. Pearson V Distribution (PV2)
For estimation with MOM, the distribution parameters have the following expressions:
For estimation with the L-moments, parameter
has the following approximate forms depending on
:
3.2.13. Generalized Exponential Distribution (EG2-Gupta)
Parameter
can be accurately obtained only by numerical methods, because the coefficient of variation and the L-coefficient of variation are nonlinear equations presented in
Appendix B.
For estimation with MOM, parameter
has the following approximate forms depending on
:
where
represents the Euler–Mascheroni constant and
represents the digamma function.
For estimation with the L-moments, parameter has the following approximate forms depending on :
If
if
4. Results and Discussion
The errors of the new approximations are compared with those of the existing ones; the new approximations have a uniform relative error below $10^{-3}$. Figure 3, Figure 4 and Figure 5 present the relative errors of the estimated parameters for the analyzed probability distributions.
The improvement in the approximations was achieved by adopting other types of functions compared to the existing ones or by using some conditional functions, for example the approximation for the PE3 distribution in which the definition domain was divided into three equal intervals for .
The proposed approximations were made for the estimated parameters, sometimes being necessary to transform them by logarithmization. Thus, there are approximations with polynomial functions to the logarithmic parameters, it being necessary to write them later in the exponent. In the following, this type of approximation will be referred to as a polynomial with a logarithmic argument.
For the GEV distribution, a polynomial approximation was first tried for MOM estimation, following an existing model calibrated in the range applied for the extended range . The errors obtained were of the order of $10^{-2}$, so a new approximation with a rational function was tried, which gave relative errors below $10^{-3}$. The same type of rational function was used for the L-moments estimation, with errors below $10^{-4}$ over the entire domain of .
For the W3 distribution, rational functions were used to approximate the parameter. In the case of MOM estimation, the existing approximation was of a rational type, valid for , and the new approximation extended the definition domain to with a relative error for MOM below $10^{-2}$. For the estimation of L-moments, there were two rational-type approximations with logarithmic and polynomial arguments with significant errors for The proposed rational type approximation is characterized by relative errors less than $10^{-2}$ over the entire domain.
The W2 distribution requires an approximation only for MOM, and none existed so far; the proposed approximation, a polynomial function with a logarithmic argument, has relative errors below $10^{-4}$.
For the F3 distribution, to estimate the parameters with MOM and L-moments, the rational function type was adopted, with relative errors below $10^{-3}$ and $10^{-4}$, respectively. For the F2 distribution, the approximation used is polynomial with a logarithmic argument, having relative errors below $10^{-3}$.
For the P3 distribution, in the case of MOM estimation, the analysis showed that the range of the skewness coefficient had to be divided into two intervals. For a polynomial function of the 7th degree was adopted with relative errors below $10^{-6}$; for the interval a rational function was chosen with relative errors below $10^{-4}$.
For the LL3 distribution, the two existing approximations for parameter estimation with MOM have relative errors up to 0.1. The proposed approximation uses the same ranges for , but the approximation functions differ. For the approximation function is of the polynomial type with a logarithmic argument with relative errors below $10^{-4}$, and for the approximation function is of the polynomial type with relative errors below $10^{-3}$.
For the K3-Generalized Gumbel distribution, a polynomial type approximation function was adopted for MOM estimation. For L-moments estimation, the approximation function is of the conditional type defined on two intervals using a polynomial and a rational function. The relative errors for both parameter estimation methods are below $10^{-3}$.
For the K3-Park and PV3 distributions, a polynomial function with a logarithmic argument, with relative errors below $10^{-3}$, was adopted for both parameter estimation methods.
For the PV2 distribution, the approximation function adopted for the L-moments estimation of the parameter is a polynomial function with a logarithmic argument, with relative errors below $10^{-4}$.
For the EG2-Gupta distribution, a polynomial type approximation function with a logarithmic argument was adopted for MOM estimation, with relative errors below $10^{-2}$. For L-moments estimation, the approximation function is of the conditional type defined on two equal intervals using a polynomial function with a logarithmic argument and a rational one with relative errors below $10^{-4}$.
5. Conclusions
The probability distributions presented here are frequently used in hydrology to calculate maximum rainfall, maximum and minimum flows, and volumes of synthetic floods.
Approximate parameter values are needed as initial values for the numerical calculation of the parameters, which reduces the number of iterations and thus the calculation time.
The approximate values calculated with the formulas presented here can also be used directly to estimate the parameters of the probability distributions, because the errors are very small.
The relative errors of the parameter estimates are generally well below $10^{-3}$, which has a much smaller implication on the relative errors of the inverse function values, and they are independent of the length of the observed data. The first-order derivatives of the parameters determined from the approximations show negligible errors, especially for “better approximations”. The choice of the approximation functions using mathematical analysis tools, especially the analysis of the parameter variation graph, allowed the improvement of the approximation functions and the choice of new approximation functions, especially for the distributions for which there were no approximation formulas. The functions used in the approximation relations are of the rational and polynomial type, sometimes with a logarithmic argument.
The comparative presentation of the variation in the estimated parameter for the two methods of estimating the parameters of the probability distributions is useful in choosing the skewness coefficient in the case of MOM, considering that the linear moments are closer to reality, an important aspect in Romania where the skewness is chosen according to the genesis of the flows, a legacy from the USSR normative standards [23].
In general, in Romania, tabular calculation is used for a small number of distributions (Pearson III and Kritsky-Menkel) using linear interpolation, so the approximations presented in this article prove to be extremely useful, facilitating the use of a larger number of distributions, a necessity regarding the updating of Romanian normative standards at the international level.
It is worth mentioning that this article can be used as a guideline for the simpler implementation of these probability distributions in various software packages.
All research was carried out by authors in the Faculty of Hydrotechnics with hydrological data provided by the National Institute of Hydrology and Water Management and National Administration “Romanian Waters”.
The presentation of some approximate forms of the parameters, especially for the estimation with the L-moments method, represents a step forward in the implementation stage of a transition from MOM to a regionalization based on L-moments in Romania.