Aalborg Universitet
Decompounding random sums: A nonparametric approach
Bøgsted, Martin; Pitts, Susan M.
Publication date:
2006
Document Version
Publisher final version (usually the publisher pdf)
Link to publication from Aalborg University
Citation for published version (APA):
Hansen, M. B., & Pitts, S. M. (2006). Decompounding random sums: A nonparametric approach. Department of
Mathematical Sciences, Aalborg University. (Research Report Series; No. R-2006-20).
General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners
and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.
? Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
? You may not further distribute the material or use it for any profit-making activity or commercial gain
? You may freely distribute the URL identifying the publication in the public portal ?
Take down policy
If you believe that this document breaches copyright please contact us at vbn@aub.aau.dk providing details, and we will remove access to
the work immediately and investigate your claim.
Downloaded from vbn.aau.dk on: august 16, 2015
AALBORG UNIVERSITY
✬
✩
Decompounding random sums:
A nonparametric approach
by
Martin B. Hansen and Susan M. Pitts
R-2006-20
✫
May 2006
Department of Mathematical Sciences
Aalborg University
Fredrik Bajers Vej 7 G
DK - 9220 Aalborg Øst
Denmark
+
Telefax: +45 98 15 81 29
Phone: 45 96 35 80 80
URL: http://www.math.aau.dk
ISSN 1399–2503
On-line version ISSN 1601–7811
✪
❡
Decompounding random sums:
A nonparametric approach
MARTIN B. HANSEN∗ and SUSAN M. PITTS∗∗
Department of Mathematical Sciences, Aalborg University, Fredrik Bajers Vej 7G, DK-9220
Aalborg Ø, Denmark. Email: mbh@math.aau.dk
∗
Statistical Laboratory, Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3
0WB, UK. Email: S.Pitts@statslab.cam.ac.uk
∗∗
Summary. Observations from sums of random variables with a random number of summands,
known as random, compound or stopped sums arise within many areas of engineering and
science. Quite often it is desirable to infer properties of the distribution of the terms in the
random sum. In the present paper we review a number of applications and consider the nonlinear inverse problem of inferring the cumulative distribution function of the components in
the random sum. We review the existing literature on non-parametric approaches to the problem. The models amenable to the analysis are generalized considerably and the properties
of the proposed estimators are studied. Bootstrap methods are suggested to provide confidence bounds. Finally a number of algorithms are suggested to make the methods operational
and tested on simulated data. In particular we show how Panjer recursion in general can be
inverted for the Panjer as well as the Willmot class.
Keywords: Asymptotic normality; Compound distributions; Compound Poisson; Decompounding; Empirical processes; Functional central limit; Infinite-dimensional delta-method; Inverse
problems; Sampled network traffic; Workload.
1.
Introduction
Let N be an integer-valued discrete random variable with probability function
pk = P(N = k), for k = 0, 1, 2, . . . ,
and X1 , X2 , X3 , . . . a series of strictly positive independent and identically distributed (iid)
random variables with cumulative distribution function (cdf) F independent of N . Following the tradition of insurance mathematics N will be denoted the claim number and F the
claim size distribution.
Now the cdf of the random sum Y = X1 + · · · + XN is given by
G = Ψ(F ) with Ψ(F ) =
∞
X
pk F ⋆k ,
(1)
k=0
where ⋆k denotes k-fold convolution. The main objective of the present paper is to infer F
given an iid sample Y1 , . . . , Yn from G.
This problem arises in a number of applications within engineering and science. A
prominent example is the compound Poisson process which has many applications - within
insurance mathematics and queueing theory - as G denotes the distribution of the amount
2
Hansen, M. B. and Pitts, S.M.
of a random number of randomly sized insurance claims or workload requirements over
a given time interval, see Grandell (1997) for a comprehensive overview of the literature
on this subject. Buchmann and Grübel (2003) considered this decompounding problem in
detail and describe motivations from insurance mathematics and queueing theory. In their
work the problem was solved by proposing a non-parametric estimator of the cdf F and its
properties (in weak convergence sense) were investigated.
Another motivation for the set-up presented above arises when one intends to sample
(or probe) the status of communication systems by test traffic. This is a standard tool for
performance evaluation. It is e.g. used in call-center evaluations, where the distribution of
the waiting time to service is estimated by the empirical distribution function of the call
answer time for repeated phone calls. Another application is when a call admission controller
in an ATM network decides whether there are sufficient resources to allow a new connection
to be established, based on information obtained by sampling the workload at neighbouring
nodes. This problem was considered by Sharma and Mazumdar (1998) and Sharma (1999).
They consider an M/G/1 queue which is probed by Poisson traffic from which moments of
the service time distribution are inferred by the well-known moment relations which can be
derived from the Pollaczeck-Khintchine formula (Asmussen, 2003, Theorem VIII.5.7 (5.6)
and (5.7)). In Hansen and Pitts (2006) the even harder problem of inferring the whole
distribution function of the service time was solved. Hansen and Pitts (2006) extended the
work by Buchmann and Grübel (2003) to the geometric case and proposed a semi-parametric
approach to estimate the parameter of the geometric distribution and the service time cdf.
Moreover, the paper showed one can deal with the problem for regenerative samples from
G, a problem which is especially relevant dealing with queuing systems.
Another motivation arises in an infinite-capacity storage model, where inputs S1 , S2 , S3 , . . .
to the storage facility arrive as a Poisson process with rate λ, where S 1 , S2 , S3 . . . are independent identically distributed random variables, with distribution function F independent
of the Poisson process. The total amount in the storage facility at time t has the same
distribution as the workload distribution in the M/G/1 queueing model above. Suppose
that observations on the sampled total amounts at times t = 1, 2, 3 . . . are available, and
that interest lies in inference for the distribution F of the inputs to the facility. This inference problem is exactly analogous to that described above for the queueing model and the
methods of that paper apply.
Another motivation arises in network traffic. According to Hohn and Veitch (2003)
many routers have the ability to output statistics about packets and flows that traverse
them. However as the generation of detailed traffic statistics does not scale well with link
speed, many manufactures have implemented sampling strategies at the packet level. Hohn
and Veitch (2003) describe a solution to this binomial deconvolution problem and derivation
of the statistical properties of this estimator is work in progress.
The Neyman-Scott cluster point process, originally developed in Neyman and Scott
(1958) to describe the distribution of galaxies in space has become an important representation for a broad range of phenomena in the physical, biological, and social sciences. Here
the decompounding problem arises if one assumes a random number (distributed according
to pk ) of entities per observation field each of which produces a random number (distributed
according to F ) of offspring, and one asks for an estimate of F , based on observations of
the number of offspring only.
Finally, one could also consider examples where some of the pk ’s are zero, e.g. p2 = 1 and
Decompounding
3
p0 , p1 , p3 , · · · = 0, then we are dealing with the following non-linear deconvolution problem
Z
G(y) = F ⋆ F (y) = F (y − z)F (dz)
(2)
which resembles the classical linear deconvolution problem
Z
G(y) = F ⋆ H(y) = F (y − z)H(dz)
where F is an unknown distribution function and H is a known error function. This is an illposed and highly studied inverse problem, see e.g. Zhang (1990). However, the non-linearity
in the present problem requires new methods to be developed.
In what follows a cdf of a non-negative random variable is denoted by a capital letter,
e is denoted by
A, say. The Laplace-Stieltjes transform L(A) or A
Z
e−θx A(dx)
L(A)(θ) =
[0,∞)
for those θ ∈ R for which the integral is defined. The Fourier-Stieltjes transform F(A) or
b is denoted by
A
Z
F(A)(θ) = eixθ A(dx), θ ∈ R.
We suggest the empirical cumulative distribution function (ecdf) as an estimator for G
Gn (x) = n−1
n
X
i=1
1(Yi ≤ x), 0 ≤ x < ∞.
(3)
Its n’th empirical process counterpart is defined as
Gn (x) = n1/2 (Gn (x) − G(x)), 0 ≤ x < ∞.
(4)
Assume we want to make statistical inference about F . Formula (1) can be inverted under
conditions in the following way
F (x) =
∞
X
πk (Go )⋆k (x),
(5)
k=1
where the πk ’s can be expressed in terms of the pk ’s. For any function h we write ho for
the function x 7→ h(x) − h(0) and the convergence in (5) is in a suitable weighted Banach
space, see Theorem 1 below. This leads to the following plug-in estimator of F
Fn (x) =
∞
X
πk (Gon )⋆k (x).
(6)
k=1
In (5), F is given in terms of G, so that F is determined by G, and we can write F as a functional of G. The proposed estimator Fn is then the result of applying the same functional
to the estimator Gn of G respectively. Given an appropriate asymptotic normality result
for Gn , the infinite-dimensional delta method can be used to derive asymptotic normality
of Fn , provided that the functional in question satisfies a particular differentiability result.
4
Hansen, M. B. and Pitts, S.M.
For general descriptions of the infinite-dimensional delta method, see Gill (1989) and van
der Vaart (1998).
From (1), we see that G is a compound distribution function, based on F , and so the
inverse functional that takes G onto F is a decompounding functional that “decompounds”
the compound distribution. As such, this functional is closely related to that in Buchmann
and Grübel (2003), where the notion of decompounding is introduced in the context of decompounding for a compound Poisson distribution. The set-up and proofs for the definition
and differentiability of our functional follow those in Buchmann and Grübel (2003), making
adaptations for the general case as necessary. Moreover, if data do not consist of independent identically distributed observations, but rather exhibit regenerative structure, then
asymptotic normality results for the input estimator can be obtained using the approach
of Hansen and Pitts (2006), whose approach was obtained using an empirical central limit
theorem for regenerative data, see Tsai (1998) and Levental (1988).
The infinite-dimensional delta method has been used for stochastic models in previous
work, and this paper follows the set-up and approach developed in, for example, Grübel and
Pitts (1993), Hansen and Pitts (2006), Bingham and Pitts (1999b) and Bingham and Pitts
(1999a). These last two papers study inference for service-time distributions given data
on busy and idle periods for the M/G/1 and the M/G/∞ queues respectively, and can be
regarded as tackling inverse problems, in the same way that inference for decompounding
can be regarded as an inverse problem.
In this paper we estimate the distribution function and in our work (Hansen and Pitts,
2006) we have, inspired by recent interests in queueing and insurance applications been much
concerned with tail behaviour. However, in many informal investigations of the properties of
a given set of data a density estimator is popular as it can give valuable indications of such
features as skewness and multi-modality. It is indeed possible to make a kernel-estimator
of the density function based on the methods presented in this paper
fn (x) =
Z
K(x − y)Fn (dy),
where K is a smoothing-kernel, see e.g. Wand and Jones (1995) for more details. We
are not going to discuss any properties of such estimators whatsoever, but merely refer to
detailed papers on this matter. More specifically Hall and Park (2004) gives an approach
to estimation of the density of F from busy period data presented in Bingham and Pitts
(1999a), and van Es et al. (2005) for a density approach to Buchmann and Grübel (2003).
The paper is organised as follows. In Section 2 some necessary background from complex
analysis as well as (sub-) probability generating functions are reviewed. This insight is used
to define the estimator and show it is well-defined. The section is concluded with statements
of the asymptotic normality for Fn as well as stating asymptotic properties of the proposed
bootstrap confidence region. Section 3 contains a comparisons of necessary bounds under the
developed general theory and the bounds which can be derived using the explicit parametric
structure of the compounding under study. Various algorithms for dealing with the general
approach to decompounding are introduced in Section 4. In this section an example of the
estimator in action is also given and a discussion and conclusion are provided in Section 5.
Proofs related to the properties of the estimator are given in Section 6.
Decompounding
2.
5
Estimation
2.1. Background on sub-probability generating functions
In order to invert (1) we rephrase the problem as reversing (in line with the tradition of
complex analysis we use the word reverse for finding the power series of the inverse with
respect to composition of a power series, see Henrici (1974, p. 47)) the following power series
g(z) =
∞
X
k=1
pk z k , z ∈ C.
(7)
Notice, that g is actually a sub-probability generating function (if p0 > 0, otherwise a
probability generating function) and will therefore always have a radius of convergence r(g)
greater than or equal to 1 (Johnson et al., 2005, Section 1.2.11).
way to calculate candidates for the coefficients πk of the reversion f (w) =
P∞A standard
k
π
w
is
formally
to solve for πk in the equation
k=1 k
z=
∞
X
k=1
πk
"
∞
X
pl z
l=1
l
#k
(8)
by equating powers of z. This leads to the recursion schemes in Henrici (1974, (1.7-2)),
which indeed have a unique solution, if p1 6= 0, see Henrici (1974, Theorem 1.7a).
However, the procedure to obtain the coefficients πk of f is usually difficult to implement
in practice. This issue will be treated in Section 4.1
The next step is to prove that the formal power series is a reverse of g. We need to give
a precise statement of results from complex analysis about reversion of power series, as they
apply to our power series g. The next proposition follows directly from Theorem 2.4b and
the proof of Theorem 2.4c in Henrici (1974).
P∞
Proposition 1. Let g(z) = k=1 pk z k where pk = P(N = k), p1 6= 0, and r(g) is the
radius of convergence.
(a) Then the reversion of g
f (w) =
∞
X
πk w k
k=1
(as defined above) has positive radius of convergence r(f ).
(b) There exist σg , 0 < σg ≤ r(g), and σf , 0 < σf ≤ r(f ), such that |w| < σf implies that
there is a unique z in |z| < σg such that g(z) = w, and this z is given by z = f (w).
(c) We have |w| < σf implies that
z = f (w) = g −1 (w) =
∞
X
k=1
πk wk and |f (w)| < σg .
(9)
(d) There exists ρg with 0 < ρg ≤ σg such that
|z| < ρg implies that |g(z)| < σf ,
and so for z such that |z| < ρg , we have z = f (g(z)) =
P∞
k=1
(10)
k
πk g(z) .
6
Hansen, M. B. and Pitts, S.M.
It is also possible to give lower bounds on those ρg and σf for which Proposition 1(d)
holds. The next lemma follows directly from Copson (1935, p. 123) (according to Copson
this result was first derived by Laudau)
Lemma 1 (Landau). Under the assumptions of Proposition 1, let 0 < R < r(g) and
|g(z)| ≤ M for |z| ≤ R. Then if |w| < (Rp1 )2 /(6M ) we have that there is a unique z in
|z| < R2 p1 /(4M ) such that g(z) = w, and this z is given by z = f (w).
In this case it is possible to choose
ρg = R2 p1 /(4M )
and
σf = (Rp1 )2 /(6M ).
Actually, Redheffer (1962) improved upon the lower bound for the radius of convergence
for the reversed series. The following result follows directly by an adaption of his proof.
Lemma 2 (Redheffer). Under the assumptions of Proposition 1, let 0 < R < r(g)
and
Z 2π
1
|g(reiθ )|2 dθ ≤ M 2
2π 0
for |z| < R, and define A = Rp1 /M . Then if |w| < (Rp1 )2 (1 − (3/4)A2 )−1/2 /(4M ) we have
that there is a unique z in |z| < R sin(arctan(A(1 − A2 )−1/2 /2)) such that g(z) = w, and
this z is given by z = f (w).
In this case it is possible to choose
ρg = R sin(arctan(A(1 − A2 )−1/2 /2))
and
σf = (RA)2 (1 − (3/4)A2 )−1/2 /(4M ).
Remark 1. As the sub-probability generating function always has a radius of convergence greater than one, we immediately obtain, the following “universal” bounds on z and
g(z) for reverting the series in (7) (take R = 1 and M = 1)
ρg
σf
= sin(arctan((p1 (1 − p21 )−1/2 /2))
=
p21 (1
−
(3/4)p21 )−1/2 /4
(11)
(12)
To our knowledge there do not exist any systematic studies of improvements on the
Redheffer bound for power series. Although very interesting and relevant for the present
paper we leave this as an open question.
2.2. Definition of the estimator
We avoid trivialities by requiring throughout the paper that the random variable N is not
concentrated at 0. We also rule out the case where F is concentrated at one point and N
is concentrated at 1 (so that G is not concentrated at one point).
Turning now to the general decompounding problem, we define appropriate weighted
spaces for our estimator. We follow the approach and methodology of Buchmann and
Decompounding
7
Grübel (2003), and in particular we use the weighted spaces defined there, which we now
describe. Let D[0, ∞) be the Banach space of cadlag functions h on [0, ∞). For τ in R, let
Dτ [0, ∞) be the space of all functions h : [0, ∞) → R, such that the function x 7→ e−τ x h(x),
is in D[0, ∞). For h in Dτ [0, ∞), let khk∞,τ = supx≥0 e−τ x |h(x)|, so that (Dτ [0, ∞), k·k∞,τ )
is a Banach space. Theorem 1 below states that, under conditions on G, the right-hand
side of (5) is in Dτ [0, ∞). Recall that σf , σg and ρg are defined in Proposition 1.
Theorem 1. Let τ > 0 and assume that G is a distribution function on [0, ∞) with
e o (τ ) < σf . Then the series
G
Λ(G) =
∞
X
πk (Go )⋆k
k=1
converges in Dτ [0, ∞).
Suppose in addition that Fe (τ ) < σg . Then
G = Ψ(F ) implies that F = Λ(G).
e o (τ ) < σf
Note that, with ρg defined as in (10), Fe (τ ) < ρg (≤ σg ) implies that G
o
(≤ r(f )), so that in this case Λ(G ) is defined and is equal to F . This means that the
e o (τ ) in the statement of the theorem may be replaced by the
conditions on Fe(τ ) and G
e
single condition F (τ ) < ρg .
Remark 2. As argued in Remark 1 it is possible to obtain universal bounds on ρ g and
e o (τ ) → 0 and Fe(τ ) → 0 as τ → ∞, it always
σf which are strictly greater than 0. As G
possible to find a space Dτ [0, ∞) such that (1) is invertible!
In (5), our quantity of interest F is given in terms of G, and this inverse representation
was made precise in Theorem 1 above. Thus the the plug-in estimator given in (6) is then
Fn = Λ(Gn ).
2.2.1. Properties of the estimator
In this section, we state two results, one giving asymptotic normality of the proposed
estimator, in terms of convergence in distribution to a Gaussian process in a Dτ -space,
and the second giving asymptotic validity of bootstrap confidence regions. Throughout the
paper, weak convergence in Banach spaces refers to σ-algebras generated by the open balls
in the respective norms, see Pollard (1984), page 199. We can now formulate the main
result on weak convergence of the inverse estimator of the distribution function F .
e o (τ ) < σf . Then
Theorem 2. Assume that τ > 0 is such that Fe(τ ) < σg and G
√
n(Fn − F ) →D A as n → ∞,
in (Dτ [0, ∞), k · k∞,τ ), where A is a centered Gaussian process with
Z Z
cov A(s), A(t) =
Go (s − x) ∧ (t − y) H(dx)H(dy) − Go ⋆ H(s)Go ⋆ (H)(t),
s, t ≥ 0,
8
Hansen, M. B. and Pitts, S.M.
where
H =
∞
X
kπk (Go )⋆(k−1) .
k=1
Turning to the question of obtaining confidence regions in Dτ [0, ∞) for the unknown F ,
we define, for z ≥ 0,
√
Rn (z) = P nkFn − F k∞,τ ≤ z ,
and
R(z) = P kAk∞,τ ≤ z .
From the convergence in distribution proved in Theorem 2, we know that
Rn (z) → R(z)
at all continuity points of R. If the distribution function R were known, then an asymptotic
100α% confidence region for F would be the set of all distribution functions F such that
q(α)
kFn − F k∞,τ ≤ √ ,
n
where q(α) is a continuity point of R with R q(α) ≥ α. However, R is not known, and
moreover, it depends on G in a rather complicated way.
This motivates the use of the bootstrap to obtain confidence regions for F . We give
below a precise statement of its asymptotic validity, in a similar manner to that of Grübel
and Pitts (1993). In the present paper, we give sufficient explanation of the notation and
ideas needed to formulate the precise results.
The bootstrap arises by resampling with replacement from the empirical distribution
function Gn . Following Grübel and Pitts (1993), we capture the resampling mechanism via
the map Fn : Rn → D[0, ∞), where, for x = (x1 , . . . , xn ),
n
Fn (x) =
1X
1[x ,∞) ,
n i=1 i
where 1B (z) = 1 if z is in the set B and is zero otherwise. If Y1 , . . . , Yn are iid observations
from G, then the empirical distribution function based on Y1 , . . . , Yn is given by Gn =
Fn (Y1 , . . . , Yn ). With this notation, the distribution function Rn (z) is given by
Z
√
Rn (z) =
1[0,z]
nkΛ Fn (x) − Λ(G)k∞,τ G⊗n (dx),
Rn
where G⊗n refers to product measure. The bootstrap estimator R̂n of Rn is obtained by
replacing G by Gn in the above formula for Rn . Since Gn corresponds to a discrete measure
that puts mass n−1 at each of Y1 , . . . , Yn , it is easy to see that the integral becomes a sum,
and that
X
√
1
1[0,z]
nkΛ Fn (Yi1 , . . . , Yin ) − Λ Gn k∞,τ ,
R̂n (z) = n
n
(i1 ,...,in )∈In
where In = {1, . . . , n}n . Let q̂n (α) be the α-quantile of R̂n .
Decompounding
9
Table 1. Comparisons of upper bounds on Feτ
Compound
Landau (Lemma 1)
Redheffer (11)
Direct
Geometric
ρ(1−ρ)
4
sin(arctan(ρ(1 − ρ)(1 − (ρ(1 − ρ))2 )−1/2 /2))
1 1
ρ 2
Poisson
λe−λ
4
sin(arctan(λe−λ (1 − (λe−λ )2 )−1/2 /2))
1
λ
log(2)
Theorem 3 (Bootstrap). Let Gn , G, Fn , F , q̂n (α) be as above. Assume that τ is
e o (τ ) < σf . Then
such that Fe(τ ) < σg and G
lim P
n→∞
√
nkFn − F k∞,τ ≤ q̂n (α) = α.
This result gives rise to a confidence region
for F given by the set of all distribution functions
√
F ′ such that kF ′ − Fn k∞,τ ≤ q̂n (α)/ n, and the theorem tells us that the probability that
this region covers the true F approaches α as n tends to infinity.
In practice, an approximation to the bootstrap estimator R̂n is obtained via Monte
Carlo methods. This means that a large number B of (re)samples Yj∗ = {Yji1 , . . . , Yjin },
j = 1, . . . , B, are simulated with replacement from the original data Y1 , .. . , Yn . For each
∗
Yj∗ , we calculate the corresponding estimator
√ Fj = Λ Fn (Yji1 , . . . , Yjin ) . The empirical
distribution function of the B real numbers nkFj∗ − Fn k∞,τ , j = 1, . . . , B, is an approximation to R̂n , and the α quantile of this provides an approximation qapprox to q̂n (α). Using
this approximating quantile then leads to an approximate 100α% confidence√region for F
given by the set of distribution functions such that kF ′ − Fn k∞,τ ≤ qapprox / n,
3.
Comparisons
e o (τ )
In Theorem 2 it is required that there exists a τ such that Fe(τ ) is less than σg and G
is less than σf , which (according to Proposition 1, Lemma 1 and 2) is certainly the case if
ρg is either the Landau or Redheffer bound.
The interpretation is – the smaller it is necessary to choose τ the less one has to regularize
the supremum-norm and the bigger function classes the decompounding works for. As Fe(τ )
is a decreasing function in τ this means the larger bound one can get on Fe (τ ) for which
the reversion certainly exists, the larger function classes one can ensure the decompounding
works for, see Figure 1 for an illustration of this. Actually, this resembles usual smoothness
conditions put on function classes in inverse problems.
Another interpretation is that the lighter tails the pk ’s have, the lesser summands will
be convolved in the random sum and consequently the inverse problem will be easier to
solve. Henceforth, the intuition tells us that we would expect heavier tails of the p k ’s will
imply stricter function classes and thereby smaller bounds on Fe (τ ) for which we can assure
the decompounding to work.
This point is most easily investigated in a parametric family. We here investigate in
detail the compound geometric and Poisson cases. Calculations are summarized in Table 1.
Hansen, M. B. and Pitts, S.M.
0.0
0.2
ρg
0.4
0.6
0.8
Fe(θ)
1.0
10
τ
θ
Fig. 1. Choose the smallest τ such that Fe (τ ) ≤ ρg
3.1. The geometric case
Decompounding the compound geometric distribution amounts to study the following convolution series
∞
X
G=
pk (ρ)F ⋆k ,
k=0
k
where 0 ≤ ρ ≤ 1 and pk (ρ) = (1 − ρ)ρ for all k = 0, 1, . . . . For the Landau bound choose
τ such that:
Fe(τ ) ≤
=
p1 (ρ)
4
ρ(1 − ρ)
.
4
For the Redheffer bound, recall A = Rp1 (ρ) and choose τ such that
Fe (τ ) ≤ sin(arctan(A(1 − A2 )−1/2 /2))
= sin(arctan(ρ(1 − ρ)(1 − (ρ(1 − ρ))2 )−1/2 /2)).
Now, as ρ gets bigger the more mass is put in the tail of the geometric distribution and
intuition tells us the decompounding problem should be worse. This is indeed reflected
in the Landau and Redheffer bounds above as we have to choose larger τ ’s to ensure the
decompounding works.
An “efficiency” comparison can be done by dividing the bound provided by Landau (BL )
with the bound provided by Redheffer (BR ), and view it as a function of the parameter of
the geometric distribution, see Figure 2, and in the heavy tail case
lim =
ρ→1
1
BL
= .
BR
2
Decompounding
Redheffer/Direct
0.15
0.10
0.00
0.05
Efficiency
0.494
0.488
Efficiency
0.500
Landau/Redheffer
11
0.0
0.4
0.8
0.0
Parameter value
0.4
0.8
Parameter value
Fig. 2. For the compound geometric case. Left-hand side: Comparison of Laudau bound versus
Redheffer bound. Right-hand side: Comparison of Redheffer bound versus the direct bound.
We notice that in the the heavy tail limit the Redheffer bound is 2 times “better” than the
Landau bound.
In the geometric case by careful investigation of the derivatives of the π k (ρ)’s as done
in Hansen and Pitts (2006, Theorem 2) it is possible to obtain a direct bound of
1
Fe(τ ) ≤
.
2ρ
We would expect this bound to better as it is more careful studying the properties of
the geometric random sum. This is indeed so and it it most easily seen by an “efficiency”
calculation where we devide the bound provided by Redheffer (BR ) with the bound provided
by Hansen and Pitts (2006, Theorem 2) (BD ), see Figure 2 and in the heavy tail case
lim
ρ→1
BR
= 0.
BD
3.2. The Poisson case
Decompounding the compound Poisson distribution amounts to studying the following convolution series
∞
X
G=
pk (λ)F ⋆k ,
k=0
−λ k
where λ ≥ 1 and pk (λ) = e λ /k for all k = 0, 1, . . . . The calculations and conclusions
from the geometric case are easily carried over and summarized in Table 1 and Figure 3. A
similar carefully derived bound as existing in the geometric case was derived by Buchmann
and Grübel (2003, Theorem) and is given by
log(2)
Fe(τ ) ≤
.
λ
Henceforth the efficiency comparisons can also be carried over and similar conclusions to
the geometric case exists.
12
Hansen, M. B. and Pitts, S.M.
0.0
0.4
0.8
Parameter value
0.0 0.1 0.2 0.3 0.4
Redheffer/Direct
Efficiency
0.495
0.485
0.475
Efficiency
Landau/Redheffer
0.0
0.4
0.8
Parameter value
Fig. 3. For the compound Poisson case. Left-hand side: Comparison of Laudau bound versus
Redheffer bound. Right-hand side: Comparison of Redheffer bound versus the direct bound.
4.
Applications
Although, complex analysis provides a solution to the inverse problem, we will follow
(Henrici, 1974)
... The algorithmic attitude to toward mathematics - not to consider a problem
solved unless an algorithm for constructing the solution has been found ...
and continue in this section with various algorithmic aspects of solving the reversion in
practice.
This section will discuss two ways of solving the inverse problem. In Section 4.1 we base
our methods on the actual πk ’s and suggest either an approach based on Fourier methods
or a method based on the the direct convolutions. If it is possible to find a recursion
scheme for a discretized version of the direct problem it is straightforward to reverse this
scheme and provide a very powerful inversion method - this will be dealt with in Section 4.2.
Finally, in Section 4.3 we conclude with a simulated example where we invert the compound
Delaporte distribution and provide bootstrapped confidence bounds for the estimated claim
size distribution.
4.1. Direct and Fourier methods
As mentioned in Section 2.1, the procedure to obtain the actual series can be difficult to
implement in practice. We shall only give a brief account on the problems involved. The two
standard methods to compute the coefficients πk are reversion of series and the BürmannLagrange Theorem. The first one requires to solve for πk in equation (8) as described in
Section 2.1.
For efficient ways to implement the recursions we refer to the comprehensive treatment
in Section 1.9 of Henrici (1974). Especially Theorem 1.9(a) in Henrici (1974), states a
theorem by Schur and Jabotinski which reformulates the problem of finding the reverse of
a power series to a problem of calculating the reciprocal of the series. This in turn can be
done efficiently by a recursion, which is a modification of the Miller algorithm see Problem
Decompounding
13
5, Section 1.8, in Henrici (1974). For a recent overview of the literature and an approach
based on nested derivatives we refer to Dominici (2003).
However in statistical applications the practitioners would not bother dealing with implementing algorithms, but rather, if possible use a mathematical computation system like
Maple. Fortunately it is indeed possible to reverse power series of the type (7) in Maple, by
using the solve command. For illustration purposes let U be a shifted gamma-distribution
with probability density function (pdf) given by
fU (u) =
βα
(u − γ)α−1 e−β(u−γ) , u ≥ γ > 0.
Γ(α)
Now assume N |U is pois(U )-distributed whereby the unconditional N becomes a mixed
Poisson distribution with pgf (Grandell, 1997, Example 2.2)
−α
1−z
(13)
g(z) = e−γ(1−z) 1 +
β
and probability function (pf)
α
n
k
X
γ k−n −γ α + n − 1
β
1
pk =
e
.
n
(k − n)!
β+1
1+β
n=0
(14)
This distribution is known in the actuarial literature as the Delaporte distribution, (Delaporte, 1959). It has been suggested as an alternative to the more usual assumption of a
two-parameter gamma mixture in the theory of insurance claims. The distribution has also
been fitted to several data in the literature (Ruohonen, 1988) and we take it as an example,
throughout the paper. In that case we arrive at the following compound Delaporte cdf
α
n !
k
∞
X
X
β
1
γ k−n −γ α + n − 1
F ⋆k .
e
G=
n
(k
−
n)!
β
+
1
1
+
β
n=0
k=0
Here follows a Maple-script yielding the first 5 pk ’s for the compound Delaporte distribution
(discussed in Section 4.3, below):
!
−α
−α
1−z
β+1
−γ (1−z)
−γ
> g := series e
1+
−e
, z = 0, 6
β
β
To limit space we have left out the actual Maple output. The result can be reversed to get
the first 5 πk ’s for the inverse compound Delaporte distribution
> solve(series(g, z = 0, 6) = w, z)
which can be simplified for the actual parameters used in Section 4.3 below.
4.1.1. Fourier method
A computationally efficient way to implement the convolutions in the direct method is to
use Fourier transforms. This is motivated by noticing that one can express F as
∞
X
k
bo
F =
πk F −1 G
.
k=1
14
Hansen, M. B. and Pitts, S.M.
Although, a number of warnings have been reported using transform methods (see Section 3 of Buchmann and Grübel (2003) and references therein) some authors also report
that using carefully designed algorithms it is possible to get stable inversion by transform
methods (Abate and Whitt, 1992). Without resorting to these advanced algorithms we got
satisfactory results using a direct approach.
Assume for now that that the πk ’s have been found as described above. First choose the
number of terms in the inverse expansion, K, say. Secondly, choose a discretization level
h > 0. Thirdly, let Γk denote the discretized values of Gon
Γl = Gon ((l + 0.5)h), l = 0, . . . , L
and γl the discretized probability function
γl = Γl+1 − Γl , l = 0, . . . , L − 1
for some number L.
Now, an estimate of the probability function φ for the discretized version of F can be
calculated by the fast Fourier transform (fft) and the inverse fast Fourier transform (ifft),
see e.g. Bracewell (1999), as provided in most mathematical software packages,
!
K
X
k
φl = ifft
, l = 0, . . . , L.
πk fft (γ)
k=1
l
This procedure is general in nature but have a number of disadvantages, as one has to
choose number of terms in the expansion K and discretization level h. Theoretical as well
as practical guidance needs to be developed.
For particular cases, the relationship (1) may be expressed neatly in terms of Fourier
transforms. For example, if N has a Poisson distribution with mean λ, then
F(G)(θ) = exp λ((F(F )(θ) − 1 .
Buchmann and Grübel (2003) refer to inherent difficulties in using transform methods to
invert such an expression directly, and we have encountered such difficulties in practice. Our
approach above, where the convolution series is truncated and the fast Fourier transform
algorithm is used to calculate convolution powers, avoids these difficulties, but involves the
choice of truncation parameter K.
4.1.2. Direct method
It should be mentioned that, although convolution can be based on fast transform algorithms
as described above it is much more straightforward to code the convolutions directly, and
sometimes even computationally faster. As described above we first choose the number of
terms in the inverse expansion, say K. Secondly, choose a discretization level h > 0. Thirdly,
by recalling the definition of self-convolution of empirical cumulative (sub)-distribution
functions given in (2), let Γk denote the discretized values of Gon
Γl = Gon ((l + 0.5)h), l = 0, . . . , L
and γl the discretized probability function
γl = Γl+1 − Γl , l = 0, . . . , L − 1
Decompounding
15
for some number L. Then define the discrete convolutions
⋆(k+1)
Γl
=h
l
X
Γ⋆k
k−j γj , l = 0, . . . , L.
j=0
Now, we have the following discrete approximation to F
Φl =
K
X
πk Γ⋆k
l , l = 0, . . . , L.
k=1
As noted above this procedure is general in nature but have the same disadvantages as the
Fourier method as one has to choose number of terms in the expansion K and discretization
level h. Theoretical as well as practical guidance needs further exploration.
4.2. Recursion method
As we mentioned the direct and Fourier methods have some clear disadvantages, but fortunately in some cases an alternative exists. The idea is to discretize the claim size distribution, whereby the compound distribution also becomes discrete. First, choose a disretization
level h. Secondly, let fk denote the mass given by F to the interval ((k − 0.5)h, (k + 0.5)h],
in the following way
fk = F ((k + 0.5)h) − F ((k − 0.5)k),
and let the discrete distribution that gives mass fk to the point kh, k = 0, 1, 2, . . . , be an approximation to the distribution F . Assume now that the random variables Y, X1 , X2 , X3 , . . .
from the introduction satisfies
fk
gk
= P(Xi = k),
= P(Y = k)
and fkm = P(X1 + · · · + Xm = k) or
fk⋆,m+1 =
k
X
⋆,m
fj fk−j
.
j=0
In many cases gk can be expressed as a linear combination of g1 , . . . , gk−1 and f0 , . . . , fk−1
whereby providing a recursive algorithm for calculating the discrete compound distribution
gk . In the seminal work of Panjer (1981) the following general algorithm
gk =
k
X
1
bj
fj gk−j
a+
1 − af0 j=1
k
were derived for situations where the following assumption holds
b
pm+1 = a +
pm , m = 0, 1, 2, . . . .
m+1
Actually, the Panjer recursion only applies to the Poisson, the negative binomial and the
binomial distribution. However, there have since Panjer’s work been an intensive interest in
16
Hansen, M. B. and Pitts, S.M.
this and various extensions have been developed for distributions belonging to the so-called
Willmot class.
All, these algorithms can easily be inverted, and provides thereby what we could call
inverse Panjer recursion, and subsequently be used for inverse decompounding. For the
Panjer class we easily arrive at
g0
fk
= g(z) + p0
=
1
(a + b)g0
(1 − af0 −
k−1
X
a+
i=1
ib
k
fi gk−i
!
(15)
,
(16)
where it might be necessary to implement a numerical routine to invert (15). Actually, the
Poisson case was used in Buchmann and Grübel (2003) and the geometric case in Hansen
and Pitts (2006) to invert Poisson and geometric random sums, respectively.
For a comprehensive overview of recursive algorithms for discrete compound distributions see Section 8.4 of Grandell (1997). In particular for the discrete compound Delaporte
distribution Schröter (1990) (see also Grandell (1997), Example 8.6) derived the following
recursion
k
X
(α − 1 + γβ + γ)j
γj ⋆ 2
(1 + β)gk =
1+
gk−j , for k = 1, 2, 3, . . . , (17)
fj −
f
k
2k j
j=0
which in turn can be inverted to give
−α
1 − f0
−γ(1−f0 )
1+
g0 = e
β
"
k−1
γg0 X
1
(1 + β − f0 )gk +
fi fk−i
fk =
(α + γβ + γ − γf0 )g0
2 i=1
k−1
X
γj ⋆ 2
α − 1 + γβ + γ
j fj −
f
gk−j for k = 1, 2, 3, . . .
−
1+
k
2k j
j=1
4.3. Example: Inverse Compound Delaporte distribution
All calculations in the example have been carried out in the statistical software package R,
except the πk ’s, which have been calculated in Maple. To save space the actual scripts are
left out, but can be obtained from http://www.math.aau.dk/∼mbh/Mysoftware
To check performance of the proposed procedure, we applied it to various simulated data
sets. Results from a typical case is summarized in this section. As mentioned in Section 3
it is possible in many standard cases to get explicit expressions for the πk ’s and develop a
theory along the lines of Buchmann and Grübel (2003) and Hansen and Pitts (2006). Let
us now develop an example where the approach with explicit expressions are not obvious.
In what follows the discretization level was chosen to be h = 0.01 and number of discretization points L = 216 . Furthermore, an iid sample of size 500 was drawn from a
compound Delaporte distribution with parameters α = 1, β = 1, γ = 0.5 and exponentially
distributed claim sizes with mean 1.
We now consider the parameters α and β as fixed at 1 and γ at 0.5, but we have to
estimate nonparametrically the claim size distribution. This corresponds to the interpretation (Ruohonen, 1988) that the risk is composed of an underlying population risk which is
Decompounding
Table 2. Estimated πk ’s.
k
1
2
πk
3.3
-6.8
k
8
9
πk -2587.7 7647.1
3
16.1
10
-22851.8
4
-41.2
11
68912.5
5
111.1
12
-209391.2
6
-310.3
13
640306.7
17
7
888.1
14
-1968726.2
Poisson distributed with a parameter γ and a time-varying risk which is known and gamma
distributed with parameters α and β.
Remark 2 ensures that there exists a τ such that the asymptotic results of Theorem 1,
2 and 3 holds. Maple provided the first 14 pk ’s by the command:
>g
:= series(exp(−.5 ∗ (1 + z)/(2 − z) − .5 ∗ exp(−.5), z = 0, 15);
which again can be reversed to get the first 14 πk ’s for the inverse compound Delaporte
distribution
> solve(series(g, z = 0, 6) = w, z, 15);
The result is summarized in Table 2
The full-drawn line at the left-hand side of Figure 4 shows the ecdf for an iid sample
of size 500 from the compound Delaporte distribution with parameters α = β = 1, γ =
0.5 and exponentially distributed claim sizes with mean 1. The dashed line shows the
underlying theoretical cdf, calculated via the direct Panjer recursion provided in (17) and
the discretization parameters given in the previous paragraph. At the left-hand side of
Figure 4 the inversion has been carried out by the Fourier method with K = 14 and
the above-mentioned discretization parameters. The dashed line shows the exponential
distribution with parameter 1.
In Figure 5 (left-hand side) the full-drawn grey line shows an estimator based on inverse
Panjer recursion, with discretization parameters chosen as above.
Theorem 3 refers to confidence regions in the space Dτ [0, ∞), and calculating these in
practice require the user to make a choice of a suitable τ . The bounds in Section 3 provide
some help with this choice. For the example here, we have taken a pragmatic approach, and
constructed a bootstrap confidence band using the k · k∞ -norm, as this is easy to construct
in practice (although not supported by our theorems).
Figure 5 (right-hand side) shows, the 90% bootstrapped confidence bounds (with respect
to the sup-norm) with N = 100. In both plots the dashed line shows the exponential
distribution with mean 1.
5.
Discussion
In the discussion of Buchmann and Grübel (2003) they claim their basic problem (decompounding Poisson random sums) is amenable to generalizations within stochastic processes.
We believe to have done so in the previous sections by indicating the problem can be solved
in great generality. Moreover, we have directed the attention to a number of real-life applications in e.g. control of queuing systems, infinite capacity storage models, insurance
mathematics, network statistics as well as biology. We also showed how Panjer recursion
can be inverted for the Panjer as well as the Willmot class, hereby providing a very strong
tool to discretize and solve the inverse problem at hand. Finally, we developed by means of
Maple and R effective tools to decompound general random sums.
18
Hansen, M. B. and Pitts, S.M.
0.8
0.6
0.0
0.2
0.4
F_n
0.6
0.0
0.2
0.4
G_n
0.8
1.0
Fourier method
1.0
Ecdf
0
1
2
3
4
0
y
1
2
3
4
x
Fig. 4. Left-hand side: Estimate of the compound claim size distribution with exponentially distributed
claim sizes with mean one, n = 500, α = β = 1 and γ = 0.5. Right-hand side: Inversion by the
Fourier method. The dashed lines show the compound Delaporte and the exponential distribution
with mean 1, respectively.
Despite the general set-up there are still a number of unsolved problems for future
research. One may e.g. wonder what happens when the first n − 1 derivatives of g vanish
at 0, i.e. pi = 0 for i ≤ n − 1 and pn 6= 0. In this case, the equation w = g(z) has a solution
z = f (w) =
∞
X
πk wk/n ,
(18)
k=1
where the series of powers of w 1/n converges in a neighbourhood of 0. Henceforth, f is
n-valued function of w, having a branch-point at 0 (Copson, 1935, p. 123).
This corresponds to solving the inverse problem
G = pn F ⋆n + pn+1 F ⋆(n+1) + · · ·
for pn 6= 0. Hence the plug-in estimator is given by
Fn =
∞
X
k=1
πk (Gn )⋆k/n .
(19)
Decompounding
0.8
0.6
0.0
0.2
0.4
F_n
0.0
0.2
0.4
F_n
0.6
0.8
1.0
Confidence bounds
1.0
Recursion method
19
0
1
2
3
4
0
x
1
2
3
4
x
Fig. 5. Left-hand side: Inversion by the recursion method. Right-hand side: 90% confidence bounds.
The dashed line shows the exponential distribution with mean 1.
This makes sense, if we define fractional convolution as
f ⋆α = L−1 (L(f )α ), α ∈ R,
whenever the Laplace transforms are well-defined, as the inverse of (19) can be derived by
Laplace transformation
e =
G
hence by means of (18) we arrive at
F
pn Fe n + pn+1 Fe (n+1) + · · ·
=
=
∞
X
k=1
∞
X
e k/n ).
πk L−1 (G
πk G⋆k/n .
k=1
However, in this case we do not know of any lower bound on the radius of convergence of
f , and we are forced to either develop these or do a case by case analysis.
20
Hansen, M. B. and Pitts, S.M.
In passing it may be worth noting, that we have defined fractional convolution for any
α ∈ R as
H = F ⋆α ⇔ L(H) = L(F )α
whenever the Laplace transforms are well-defined. In fact such dualities are the basis for
deriving fractional operations in general. In recent years, there has been put an enormous
effort in the field of fractional analysis. We will not pursue this subject further, but refer
the interested reader to Bultheel and Martı́nez-Sulbaran (2003), where a comprehensive
overview of the field and its applications in signal processing, geometry, optics, mechanics,
stochastic processes etc. is provided.
6.
Proofs
Proof of Theorem 1. The proof of Theorem 1 uses the approach of Buchmann and Grübel
(2003), see also Hansen and Pitts (2006) and applies it to reversion of power series. In common with the above papers, our functional Λ involves convolutions of distribution functions,
and so we summarise here the spaces introduced in Buchmann and Grübel (2003) for functions that may be used as integrators in convolution integrals. Let D(∞) = ∪τ >0 Dτ [0, ∞)
and let Dm (∞) be the subset of D(∞) consisting of functions having finite variation on
[0, x] for all x > 0. Then a function H in Dm (∞) corresponds to a signed measure µH given
by µH [0, x] R= H(x). For H in Dm (∞) and g in D(∞) we may define the convolution
g ⋆ H(x) = g(x − y)H(dy) for x ≥ 0. As observed in Buchmann and Grübel (2003),
members of Dm (∞) are identified via their Laplace transforms. Any H in Dm (∞) is also in
e
D(∞) and hence is in Dτ [0, ∞) for some τ > 0. We note that the Laplace transform H(θ)
is then defined for θ > τ .
We need a technical result about convolutions from Lemma 6 of Buchmann and Grübel
+
(2003), which we state without proof. Let Dm
(∞) be the subset of Dm (∞) consisting of
functions H such that the measure µH is non-negative. Then
e ) for all τ > 0.
kg ⋆ Hk∞,τ ≤ kgk∞ H(τ
(20)
Turning to the convolution series given by Λ(G), we first note that, for any distribution
P
o ⋆k
function G on [0, ∞), we have m
is in Dτ [0, ∞) for each finite m. Using (20)
k=1 πk G
we have for k ≥ 1,
(Go )⋆k
Then it follows that
∞
X
k=1
∞,τ
e o (τ )k−1 ≤ G
e o (τ )k−1 .
≤ kGo k∞,τ G
|πk | (Go )⋆k
∞,τ
≤
∞
X
k=1
e o (τ )k−1 .
|πk |G
e o (τ ) < σf (≤ r(f )), so that
It is assumed in the statement of the theorem that G
∞
X
k=1
converges. Thus Λ(G) is in Dτ [0, ∞).
e o (τ )k−1
|πk |G
Decompounding
To obtain the relationship between F and G, first note that the series
can be written as the difference of two non-decreasing functions:
∞
X
πk (Go )⋆k =
X
k:πk ≥0
k=1
πk (Go )⋆k −
X
k:πk <0
P∞
k=1
21
o ⋆k
πk (G )
|πk |(Go )⋆k ,
and so Λ(G) is in Dm (∞). This means that Λ(G) is identified by its Laplace transform.
From the definition of G in terms of F , we have, for θ > τ ,
e o (θ) =
G
∞
X
k=1
pk Fe(θ) = g Fe(θ) .
e o (θ) < σf and thus, by (1) there exists a unique z with |z| < σg such that
We have G
o
e (θ), and this z is given by the power series reversion, ie z = P∞ πk G
e o (θ) k .
g(z) = G
k=1
Since Fe(θ) < σg , this implies that z = Fe(θ) and that
Fe (θ) =
∞
X
k=1
e o (θ) k .
πk G
]
The right-hand side of this last expression is Λ(G)(θ),
from which we conclude that F =
Λ(G), and Theorem 1 is proved.
Proof of Theorem 2. This is an application of the infinite-dimensional delta method,
which combines an appropriate differentiability property of the map Λ with the relevant asymptotic normality result for the empirical process associated with the input estimators G n ,
n ≥ 1 (see, Gill (1989), van der Vaart (1998)). The proposition below gives an appropriate
differentiability result for Λ.
Proposition 2. Suppose that Gn , n ∈ N, and G are distribution functions with Gn (0) =
e ) < σf and Fe(τ ) < σg , and suppose that
G(0) = 0 for all n ∈ N. Let τ > 0 be such that G(τ
√
k n Gn − G − hk∞ → 0 as n → ∞
for some h in D[0, ∞). Then
√
k n Λ(Gn ) − Λ(G) − h ⋆ Hk∞,τ → 0 as n → ∞,
where H =
P∞
k=1
kπk G⋆(k−1) .
Comparing this proposition with Proposition 8 in Buchmann and Grübel (2003) and Proposition 4 in Hansen and Pitts (2006), it is clear that it is the relevant generalisation of those
differentiability results to the general decompounding case considered here.
The proof of Proposition 2 follows similar methods to the proofs of the corresponding
propositions in the above earlier papers, with η in those proofs replaced here by η ∈ (0, 1)
e ) < ησf , and then using the fact that the series P∞ πk (ησf )k is
chosen such that G(τ
k=1
absolutely convergent. We refer the reader to the earlier papers for further details.
22
Hansen, M. B. and Pitts, S.M.
The second ingredient for the infinite-dimensional delta method is an asymptotic normality result for the input estimators, which is given by the empirical central limit theorem
(Pollard (1984) V.2.11). This states that
√
n Gn − G →D B ◦ G in (D[0, ∞), k · k∞ ),
where B is a standard Brownian bridge and B ◦ G is a rescaled Brownian bridge given by
B ◦ G(t) = B(G(t)), t ≥ 0. The limit process B ◦ G is thus a zero mean Gaussian process
with
cov(B ◦ G(s), B ◦ G(t)) = G(s ∧ t) − G(s)G(t).
The empirical central limit result is then combined with Proposition 2 to yield Theorem
2, see Buchmann and Grübel (2003) for details. The covariance structure of the limiting
Gaussian process A in Theorem 2 arises easily, as in Buchmann and Grübel (2003).
Proof of Theorem 3. The proof proceeds by first showing that R̂n converges in distribution to R in D[0, ∞), and then using this to infer the result of the theorem. The
technicalities are the same as those in the proof of Theorem 2.3 in Grübel and Pitts (1993),
and the interested reader is referred to that paper.
Acknowledgment
The authors thank Nick Bingham, University of Sheffield, for a most helpful discussion.
References
Abate, J. and W. Whitt (1992). The Fourier-series method for inverting transforms of
probability distributions. Queueing Systems 10, 5–88.
Asmussen, S. (2003). Applied Probability and Queues (2 ed.). Springer Verlag, New York,
NY.
Bingham, N. H. and S. M. Pitts (1999a). Nonparametric estimation for the M/G/∞ queue.
Annals of the Institute of Statistical Mathematics 51, 71–97.
Bingham, N. H. and S. M. Pitts (1999b). Nonparametric inference from M/G/1 busy
periods. Communications in Statistics and Stochastic Models 15, 247–272.
Bracewell, R. N. (1999). The Fourier Transform and its Applications (3 ed.). New York:
McGraw-Hill.
Buchmann, B. and R. Grübel (2003). Decompounding: An estimation problem for Poisson
random sums. Annals of Statistics 31, 1054–1074.
Bultheel, A. and H. Martı́nez-Sulbaran (2003). Recent developments in the theory of the
fractional Fourier transforms and linear canonical transforms. Bulletin of the Belgian
mathematical Society- Simon Stevin. Accepted.
Decompounding
23
Copson, E. T. (1935). An Introduction to the Theory of Functions of a Complex Variable.
Oxford: The Clarendon Press.
Delaporte, P. (1959). Quelques problemés de statistique mathématique posés par l’assurance
automobile et le bonus pour non sinistre. Bulletin Trimestriel de l’Institut des Actuaires
Francais 227, 87–102.
Dominici, D. (2003). Nested derivatives: A simple method for computing series expansions
of inverse functions. International Journal of Mathematics and Mathematical Sciences 58,
3699–3715.
Gill, R. (1989). Non- and semi-parametric maximum likelihood estimators and the von
Mises method (Part 1). Scandinavian Journal of Statistics 16, 97–128.
Grandell, J. (1997). Mixed Poisson Processes. London: Chapman and Hall.
Grübel, R. and S. M. Pitts (1993). Non-parametric estimation in renewal theory, I: the
empirical renewal function. Annals of Statistics 21, 1431–1451.
Hall, P. and J. Park (2004). Nonparametric inference about service time distribution from
indirect measurements. Journal of the Royal Statistical Society 66, 861–875.
Hansen, M. B. and S. M. Pitts (2006). Nonparametric inference from the M/G/1 workload.
Accepted for publication in Bernoulli.
Henrici, P. (1974). Applied and Computational Complex Analysis, Vol 1: Power Series Integration - Conformal Mapping - Location of Zeros. New York: John Wiley and Sons.
Hohn, N. and D. Veitch (2003). Inverting sampled traffic. In Proceedings of the 3rd ACM
SIGCOMM conference on Internet measurement, Miami Beach, FL, USA, pp. 222–233.
ACM Press New York, NY, USA.
Johnson, N. L., A. W. Kemp, and S. Kotz (2005). Univariate Discrete Distributions (3 ed.).
New York: John Wiley and Sons.
Levental, S. (1988). Uniform limit theorems for Harris recurrent Markov chains. Probability
Theory and Related Fields 80, 101–118.
Neyman, J. and E. L. Scott (1958). A statistical approach to problems of cosmology. Journal
of the Royal Statistical Society, Series B 20, l–43.
Panjer, H. H. (1981). Recursive evaluation of a family of compound distributions. Astin
Bulletin 12, 22–26.
Pollard, D. (1984). Convergence of Stochastic Processes. New York: Springer Verlag.
Redheffer, R. M. (1962). Reversion of power series. American Mathematical Monthly 69,
423–425.
Ruohonen, M. (1988). On a model for the claim number process. Astin Bulletin 18, 57–68.
Schröter, K. (1990). On a family of counting distributions and reccursions for related
compound distributions. Scandinavian Actuarial Journal 304, 161–175.
24
Hansen, M. B. and Pitts, S.M.
Sharma, V. (1999). Estimating traffic intensities at different nodes in networks via a probing
stream. In GLOBECOM 1999: Seamless Interconnection for Universal Services, pp. 374–
380. Piscataway, NJ: IEEE Operations Center.
Sharma, V. and R. Mazumdar (1998). Estimating traffic parameters in queueing systems
with local information. Performance Evaluation 32, 217–230.
Tsai, T. H. (1998). The uniform CLT and LIL for Markov chains. Ph. D. thesis, University
of Wisconsin, Madison, WI.
van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press, Cambridge.
van Es, B., S. Gugushvili, and P. Spreij (2005). A kernel type nonparametric density
estimator for decompounding. URL: front.math.ucdavis.edu/math.ST/0505355.
Wand, M. P. and M. Jones (1995). Kernel Smoothing, Volume 60 of Monographs on Statistics
and Applied Probability. Chapman and Hall.
Zhang, C. (1990). Fourier methods for estimating mixing densities and distributions. Annals
of Statistics 18, 806–831.