[go: up one dir, main page]

Academia.eduAcademia.edu

Decompounding random sums: a nonparametric approach

2008, Annals of the Institute of Statistical Mathematics

Aalborg Universitet Decompounding random sums: A nonparametric approach Bøgsted, Martin; Pitts, Susan M. Publication date: 2006 Document Version Publisher final version (usually the publisher pdf) Link to publication from Aalborg University Citation for published version (APA): Hansen, M. B., & Pitts, S. M. (2006). Decompounding random sums: A nonparametric approach. Department of Mathematical Sciences, Aalborg University. (Research Report Series; No. R-2006-20). General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. ? Users may download and print one copy of any publication from the public portal for the purpose of private study or research. ? You may not further distribute the material or use it for any profit-making activity or commercial gain ? You may freely distribute the URL identifying the publication in the public portal ? Take down policy If you believe that this document breaches copyright please contact us at vbn@aub.aau.dk providing details, and we will remove access to the work immediately and investigate your claim. Downloaded from vbn.aau.dk on: august 16, 2015 AALBORG UNIVERSITY ✬ ✩ Decompounding random sums: A nonparametric approach by Martin B. Hansen and Susan M. Pitts R-2006-20 ✫ May 2006 Department of Mathematical Sciences Aalborg University Fredrik Bajers Vej 7 G DK - 9220 Aalborg Øst Denmark + Telefax: +45 98 15 81 29 Phone: 45 96 35 80 80 URL: http://www.math.aau.dk ISSN 1399–2503 On-line version ISSN 1601–7811 ✪ ❡ Decompounding random sums: A nonparametric approach MARTIN B. HANSEN∗ and SUSAN M. PITTS∗∗ Department of Mathematical Sciences, Aalborg University, Fredrik Bajers Vej 7G, DK-9220 Aalborg Ø, Denmark. Email: mbh@math.aau.dk ∗ Statistical Laboratory, Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WB, UK. Email: S.Pitts@statslab.cam.ac.uk ∗∗ Summary. Observations from sums of random variables with a random number of summands, known as random, compound or stopped sums arise within many areas of engineering and science. Quite often it is desirable to infer properties of the distribution of the terms in the random sum. In the present paper we review a number of applications and consider the nonlinear inverse problem of inferring the cumulative distribution function of the components in the random sum. We review the existing literature on non-parametric approaches to the problem. The models amenable to the analysis are generalized considerably and the properties of the proposed estimators are studied. Bootstrap methods are suggested to provide confidence bounds. Finally a number of algorithms are suggested to make the methods operational and tested on simulated data. In particular we show how Panjer recursion in general can be inverted for the Panjer as well as the Willmot class. Keywords: Asymptotic normality; Compound distributions; Compound Poisson; Decompounding; Empirical processes; Functional central limit; Infinite-dimensional delta-method; Inverse problems; Sampled network traffic; Workload. 1. Introduction Let N be an integer-valued discrete random variable with probability function pk = P(N = k), for k = 0, 1, 2, . . . , and X1 , X2 , X3 , . . . a series of strictly positive independent and identically distributed (iid) random variables with cumulative distribution function (cdf) F independent of N . Following the tradition of insurance mathematics N will be denoted the claim number and F the claim size distribution. Now the cdf of the random sum Y = X1 + · · · + XN is given by G = Ψ(F ) with Ψ(F ) = ∞ X pk F ⋆k , (1) k=0 where ⋆k denotes k-fold convolution. The main objective of the present paper is to infer F given an iid sample Y1 , . . . , Yn from G. This problem arises in a number of applications within engineering and science. A prominent example is the compound Poisson process which has many applications - within insurance mathematics and queueing theory - as G denotes the distribution of the amount 2 Hansen, M. B. and Pitts, S.M. of a random number of randomly sized insurance claims or workload requirements over a given time interval, see Grandell (1997) for a comprehensive overview of the literature on this subject. Buchmann and Grübel (2003) considered this decompounding problem in detail and describe motivations from insurance mathematics and queueing theory. In their work the problem was solved by proposing a non-parametric estimator of the cdf F and its properties (in weak convergence sense) were investigated. Another motivation for the set-up presented above arises when one intends to sample (or probe) the status of communication systems by test traffic. This is a standard tool for performance evaluation. It is e.g. used in call-center evaluations, where the distribution of the waiting time to service is estimated by the empirical distribution function of the call answer time for repeated phone calls. Another application is when a call admission controller in an ATM network decides whether there are sufficient resources to allow a new connection to be established, based on information obtained by sampling the workload at neighbouring nodes. This problem was considered by Sharma and Mazumdar (1998) and Sharma (1999). They consider an M/G/1 queue which is probed by Poisson traffic from which moments of the service time distribution are inferred by the well-known moment relations which can be derived from the Pollaczeck-Khintchine formula (Asmussen, 2003, Theorem VIII.5.7 (5.6) and (5.7)). In Hansen and Pitts (2006) the even harder problem of inferring the whole distribution function of the service time was solved. Hansen and Pitts (2006) extended the work by Buchmann and Grübel (2003) to the geometric case and proposed a semi-parametric approach to estimate the parameter of the geometric distribution and the service time cdf. Moreover, the paper showed one can deal with the problem for regenerative samples from G, a problem which is especially relevant dealing with queuing systems. Another motivation arises in an infinite-capacity storage model, where inputs S1 , S2 , S3 , . . . to the storage facility arrive as a Poisson process with rate λ, where S 1 , S2 , S3 . . . are independent identically distributed random variables, with distribution function F independent of the Poisson process. The total amount in the storage facility at time t has the same distribution as the workload distribution in the M/G/1 queueing model above. Suppose that observations on the sampled total amounts at times t = 1, 2, 3 . . . are available, and that interest lies in inference for the distribution F of the inputs to the facility. This inference problem is exactly analogous to that described above for the queueing model and the methods of that paper apply. Another motivation arises in network traffic. According to Hohn and Veitch (2003) many routers have the ability to output statistics about packets and flows that traverse them. However as the generation of detailed traffic statistics does not scale well with link speed, many manufactures have implemented sampling strategies at the packet level. Hohn and Veitch (2003) describe a solution to this binomial deconvolution problem and derivation of the statistical properties of this estimator is work in progress. The Neyman-Scott cluster point process, originally developed in Neyman and Scott (1958) to describe the distribution of galaxies in space has become an important representation for a broad range of phenomena in the physical, biological, and social sciences. Here the decompounding problem arises if one assumes a random number (distributed according to pk ) of entities per observation field each of which produces a random number (distributed according to F ) of offspring, and one asks for an estimate of F , based on observations of the number of offspring only. Finally, one could also consider examples where some of the pk ’s are zero, e.g. p2 = 1 and Decompounding 3 p0 , p1 , p3 , · · · = 0, then we are dealing with the following non-linear deconvolution problem Z G(y) = F ⋆ F (y) = F (y − z)F (dz) (2) which resembles the classical linear deconvolution problem Z G(y) = F ⋆ H(y) = F (y − z)H(dz) where F is an unknown distribution function and H is a known error function. This is an illposed and highly studied inverse problem, see e.g. Zhang (1990). However, the non-linearity in the present problem requires new methods to be developed. In what follows a cdf of a non-negative random variable is denoted by a capital letter, e is denoted by A, say. The Laplace-Stieltjes transform L(A) or A Z e−θx A(dx) L(A)(θ) = [0,∞) for those θ ∈ R for which the integral is defined. The Fourier-Stieltjes transform F(A) or b is denoted by A Z F(A)(θ) = eixθ A(dx), θ ∈ R. We suggest the empirical cumulative distribution function (ecdf) as an estimator for G Gn (x) = n−1 n X i=1 1(Yi ≤ x), 0 ≤ x < ∞. (3) Its n’th empirical process counterpart is defined as Gn (x) = n1/2 (Gn (x) − G(x)), 0 ≤ x < ∞. (4) Assume we want to make statistical inference about F . Formula (1) can be inverted under conditions in the following way F (x) = ∞ X πk (Go )⋆k (x), (5) k=1 where the πk ’s can be expressed in terms of the pk ’s. For any function h we write ho for the function x 7→ h(x) − h(0) and the convergence in (5) is in a suitable weighted Banach space, see Theorem 1 below. This leads to the following plug-in estimator of F Fn (x) = ∞ X πk (Gon )⋆k (x). (6) k=1 In (5), F is given in terms of G, so that F is determined by G, and we can write F as a functional of G. The proposed estimator Fn is then the result of applying the same functional to the estimator Gn of G respectively. Given an appropriate asymptotic normality result for Gn , the infinite-dimensional delta method can be used to derive asymptotic normality of Fn , provided that the functional in question satisfies a particular differentiability result. 4 Hansen, M. B. and Pitts, S.M. For general descriptions of the infinite-dimensional delta method, see Gill (1989) and van der Vaart (1998). From (1), we see that G is a compound distribution function, based on F , and so the inverse functional that takes G onto F is a decompounding functional that “decompounds” the compound distribution. As such, this functional is closely related to that in Buchmann and Grübel (2003), where the notion of decompounding is introduced in the context of decompounding for a compound Poisson distribution. The set-up and proofs for the definition and differentiability of our functional follow those in Buchmann and Grübel (2003), making adaptations for the general case as necessary. Moreover, if data do not consist of independent identically distributed observations, but rather exhibit regenerative structure, then asymptotic normality results for the input estimator can be obtained using the approach of Hansen and Pitts (2006), whose approach was obtained using an empirical central limit theorem for regenerative data, see Tsai (1998) and Levental (1988). The infinite-dimensional delta method has been used for stochastic models in previous work, and this paper follows the set-up and approach developed in, for example, Grübel and Pitts (1993), Hansen and Pitts (2006), Bingham and Pitts (1999b) and Bingham and Pitts (1999a). These last two papers study inference for service-time distributions given data on busy and idle periods for the M/G/1 and the M/G/∞ queues respectively, and can be regarded as tackling inverse problems, in the same way that inference for decompounding can be regarded as an inverse problem. In this paper we estimate the distribution function and in our work (Hansen and Pitts, 2006) we have, inspired by recent interests in queueing and insurance applications been much concerned with tail behaviour. However, in many informal investigations of the properties of a given set of data a density estimator is popular as it can give valuable indications of such features as skewness and multi-modality. It is indeed possible to make a kernel-estimator of the density function based on the methods presented in this paper fn (x) = Z K(x − y)Fn (dy), where K is a smoothing-kernel, see e.g. Wand and Jones (1995) for more details. We are not going to discuss any properties of such estimators whatsoever, but merely refer to detailed papers on this matter. More specifically Hall and Park (2004) gives an approach to estimation of the density of F from busy period data presented in Bingham and Pitts (1999a), and van Es et al. (2005) for a density approach to Buchmann and Grübel (2003). The paper is organised as follows. In Section 2 some necessary background from complex analysis as well as (sub-) probability generating functions are reviewed. This insight is used to define the estimator and show it is well-defined. The section is concluded with statements of the asymptotic normality for Fn as well as stating asymptotic properties of the proposed bootstrap confidence region. Section 3 contains a comparisons of necessary bounds under the developed general theory and the bounds which can be derived using the explicit parametric structure of the compounding under study. Various algorithms for dealing with the general approach to decompounding are introduced in Section 4. In this section an example of the estimator in action is also given and a discussion and conclusion are provided in Section 5. Proofs related to the properties of the estimator are given in Section 6. Decompounding 2. 5 Estimation 2.1. Background on sub-probability generating functions In order to invert (1) we rephrase the problem as reversing (in line with the tradition of complex analysis we use the word reverse for finding the power series of the inverse with respect to composition of a power series, see Henrici (1974, p. 47)) the following power series g(z) = ∞ X k=1 pk z k , z ∈ C. (7) Notice, that g is actually a sub-probability generating function (if p0 > 0, otherwise a probability generating function) and will therefore always have a radius of convergence r(g) greater than or equal to 1 (Johnson et al., 2005, Section 1.2.11). way to calculate candidates for the coefficients πk of the reversion f (w) = P∞A standard k π w is formally to solve for πk in the equation k=1 k z= ∞ X k=1 πk " ∞ X pl z l=1 l #k (8) by equating powers of z. This leads to the recursion schemes in Henrici (1974, (1.7-2)), which indeed have a unique solution, if p1 6= 0, see Henrici (1974, Theorem 1.7a). However, the procedure to obtain the coefficients πk of f is usually difficult to implement in practice. This issue will be treated in Section 4.1 The next step is to prove that the formal power series is a reverse of g. We need to give a precise statement of results from complex analysis about reversion of power series, as they apply to our power series g. The next proposition follows directly from Theorem 2.4b and the proof of Theorem 2.4c in Henrici (1974). P∞ Proposition 1. Let g(z) = k=1 pk z k where pk = P(N = k), p1 6= 0, and r(g) is the radius of convergence. (a) Then the reversion of g f (w) = ∞ X πk w k k=1 (as defined above) has positive radius of convergence r(f ). (b) There exist σg , 0 < σg ≤ r(g), and σf , 0 < σf ≤ r(f ), such that |w| < σf implies that there is a unique z in |z| < σg such that g(z) = w, and this z is given by z = f (w). (c) We have |w| < σf implies that z = f (w) = g −1 (w) = ∞ X k=1 πk wk and |f (w)| < σg . (9) (d) There exists ρg with 0 < ρg ≤ σg such that |z| < ρg implies that |g(z)| < σf , and so for z such that |z| < ρg , we have z = f (g(z)) = P∞ k=1 (10) k πk g(z) . 6 Hansen, M. B. and Pitts, S.M. It is also possible to give lower bounds on those ρg and σf for which Proposition 1(d) holds. The next lemma follows directly from Copson (1935, p. 123) (according to Copson this result was first derived by Laudau) Lemma 1 (Landau). Under the assumptions of Proposition 1, let 0 < R < r(g) and |g(z)| ≤ M for |z| ≤ R. Then if |w| < (Rp1 )2 /(6M ) we have that there is a unique z in |z| < R2 p1 /(4M ) such that g(z) = w, and this z is given by z = f (w). In this case it is possible to choose ρg = R2 p1 /(4M ) and σf = (Rp1 )2 /(6M ). Actually, Redheffer (1962) improved upon the lower bound for the radius of convergence for the reversed series. The following result follows directly by an adaption of his proof. Lemma 2 (Redheffer). Under the assumptions of Proposition 1, let 0 < R < r(g) and Z 2π 1 |g(reiθ )|2 dθ ≤ M 2 2π 0 for |z| < R, and define A = Rp1 /M . Then if |w| < (Rp1 )2 (1 − (3/4)A2 )−1/2 /(4M ) we have that there is a unique z in |z| < R sin(arctan(A(1 − A2 )−1/2 /2)) such that g(z) = w, and this z is given by z = f (w). In this case it is possible to choose ρg = R sin(arctan(A(1 − A2 )−1/2 /2)) and σf = (RA)2 (1 − (3/4)A2 )−1/2 /(4M ). Remark 1. As the sub-probability generating function always has a radius of convergence greater than one, we immediately obtain, the following “universal” bounds on z and g(z) for reverting the series in (7) (take R = 1 and M = 1) ρg σf = sin(arctan((p1 (1 − p21 )−1/2 /2)) = p21 (1 − (3/4)p21 )−1/2 /4 (11) (12) To our knowledge there do not exist any systematic studies of improvements on the Redheffer bound for power series. Although very interesting and relevant for the present paper we leave this as an open question. 2.2. Definition of the estimator We avoid trivialities by requiring throughout the paper that the random variable N is not concentrated at 0. We also rule out the case where F is concentrated at one point and N is concentrated at 1 (so that G is not concentrated at one point). Turning now to the general decompounding problem, we define appropriate weighted spaces for our estimator. We follow the approach and methodology of Buchmann and Decompounding 7 Grübel (2003), and in particular we use the weighted spaces defined there, which we now describe. Let D[0, ∞) be the Banach space of cadlag functions h on [0, ∞). For τ in R, let Dτ [0, ∞) be the space of all functions h : [0, ∞) → R, such that the function x 7→ e−τ x h(x), is in D[0, ∞). For h in Dτ [0, ∞), let khk∞,τ = supx≥0 e−τ x |h(x)|, so that (Dτ [0, ∞), k·k∞,τ ) is a Banach space. Theorem 1 below states that, under conditions on G, the right-hand side of (5) is in Dτ [0, ∞). Recall that σf , σg and ρg are defined in Proposition 1. Theorem 1. Let τ > 0 and assume that G is a distribution function on [0, ∞) with e o (τ ) < σf . Then the series G Λ(G) = ∞ X πk (Go )⋆k k=1 converges in Dτ [0, ∞). Suppose in addition that Fe (τ ) < σg . Then G = Ψ(F ) implies that F = Λ(G). e o (τ ) < σf Note that, with ρg defined as in (10), Fe (τ ) < ρg (≤ σg ) implies that G o (≤ r(f )), so that in this case Λ(G ) is defined and is equal to F . This means that the e o (τ ) in the statement of the theorem may be replaced by the conditions on Fe(τ ) and G e single condition F (τ ) < ρg . Remark 2. As argued in Remark 1 it is possible to obtain universal bounds on ρ g and e o (τ ) → 0 and Fe(τ ) → 0 as τ → ∞, it always σf which are strictly greater than 0. As G possible to find a space Dτ [0, ∞) such that (1) is invertible! In (5), our quantity of interest F is given in terms of G, and this inverse representation was made precise in Theorem 1 above. Thus the the plug-in estimator given in (6) is then Fn = Λ(Gn ). 2.2.1. Properties of the estimator In this section, we state two results, one giving asymptotic normality of the proposed estimator, in terms of convergence in distribution to a Gaussian process in a Dτ -space, and the second giving asymptotic validity of bootstrap confidence regions. Throughout the paper, weak convergence in Banach spaces refers to σ-algebras generated by the open balls in the respective norms, see Pollard (1984), page 199. We can now formulate the main result on weak convergence of the inverse estimator of the distribution function F . e o (τ ) < σf . Then Theorem 2. Assume that τ > 0 is such that Fe(τ ) < σg and G √ n(Fn − F ) →D A as n → ∞, in (Dτ [0, ∞), k · k∞,τ ), where A is a centered Gaussian process with Z Z   cov A(s), A(t) = Go (s − x) ∧ (t − y) H(dx)H(dy) − Go ⋆ H(s)Go ⋆ (H)(t), s, t ≥ 0, 8 Hansen, M. B. and Pitts, S.M. where H = ∞ X kπk (Go )⋆(k−1) . k=1 Turning to the question of obtaining confidence regions in Dτ [0, ∞) for the unknown F , we define, for z ≥ 0,  √ Rn (z) = P nkFn − F k∞,τ ≤ z , and  R(z) = P kAk∞,τ ≤ z . From the convergence in distribution proved in Theorem 2, we know that Rn (z) → R(z) at all continuity points of R. If the distribution function R were known, then an asymptotic 100α% confidence region for F would be the set of all distribution functions F such that q(α) kFn − F k∞,τ ≤ √ , n  where q(α) is a continuity point of R with R q(α) ≥ α. However, R is not known, and moreover, it depends on G in a rather complicated way. This motivates the use of the bootstrap to obtain confidence regions for F . We give below a precise statement of its asymptotic validity, in a similar manner to that of Grübel and Pitts (1993). In the present paper, we give sufficient explanation of the notation and ideas needed to formulate the precise results. The bootstrap arises by resampling with replacement from the empirical distribution function Gn . Following Grübel and Pitts (1993), we capture the resampling mechanism via the map Fn : Rn → D[0, ∞), where, for x = (x1 , . . . , xn ), n Fn (x) = 1X 1[x ,∞) , n i=1 i where 1B (z) = 1 if z is in the set B and is zero otherwise. If Y1 , . . . , Yn are iid observations from G, then the empirical distribution function based on Y1 , . . . , Yn is given by Gn = Fn (Y1 , . . . , Yn ). With this notation, the distribution function Rn (z) is given by Z   √ Rn (z) = 1[0,z] nkΛ Fn (x) − Λ(G)k∞,τ G⊗n (dx), Rn where G⊗n refers to product measure. The bootstrap estimator R̂n of Rn is obtained by replacing G by Gn in the above formula for Rn . Since Gn corresponds to a discrete measure that puts mass n−1 at each of Y1 , . . . , Yn , it is easy to see that the integral becomes a sum, and that X    √ 1 1[0,z] nkΛ Fn (Yi1 , . . . , Yin ) − Λ Gn k∞,τ , R̂n (z) = n n (i1 ,...,in )∈In where In = {1, . . . , n}n . Let q̂n (α) be the α-quantile of R̂n . Decompounding 9 Table 1. Comparisons of upper bounds on Feτ Compound Landau (Lemma 1) Redheffer (11) Direct Geometric ρ(1−ρ) 4 sin(arctan(ρ(1 − ρ)(1 − (ρ(1 − ρ))2 )−1/2 /2)) 1 1 ρ 2 Poisson λe−λ 4 sin(arctan(λe−λ (1 − (λe−λ )2 )−1/2 /2)) 1 λ log(2) Theorem 3 (Bootstrap). Let Gn , G, Fn , F , q̂n (α) be as above. Assume that τ is e o (τ ) < σf . Then such that Fe(τ ) < σg and G lim P n→∞  √ nkFn − F k∞,τ ≤ q̂n (α) = α. This result gives rise to a confidence region for F given by the set of all distribution functions √ F ′ such that kF ′ − Fn k∞,τ ≤ q̂n (α)/ n, and the theorem tells us that the probability that this region covers the true F approaches α as n tends to infinity. In practice, an approximation to the bootstrap estimator R̂n is obtained via Monte Carlo methods. This means that a large number B of (re)samples Yj∗ = {Yji1 , . . . , Yjin }, j = 1, . . . , B, are simulated with replacement from the original data Y1 , .. . , Yn . For each ∗ Yj∗ , we calculate the corresponding estimator √ Fj = Λ Fn (Yji1 , . . . , Yjin ) . The empirical distribution function of the B real numbers nkFj∗ − Fn k∞,τ , j = 1, . . . , B, is an approximation to R̂n , and the α quantile of this provides an approximation qapprox to q̂n (α). Using this approximating quantile then leads to an approximate 100α% confidence√region for F given by the set of distribution functions such that kF ′ − Fn k∞,τ ≤ qapprox / n, 3. Comparisons e o (τ ) In Theorem 2 it is required that there exists a τ such that Fe(τ ) is less than σg and G is less than σf , which (according to Proposition 1, Lemma 1 and 2) is certainly the case if ρg is either the Landau or Redheffer bound. The interpretation is – the smaller it is necessary to choose τ the less one has to regularize the supremum-norm and the bigger function classes the decompounding works for. As Fe(τ ) is a decreasing function in τ this means the larger bound one can get on Fe (τ ) for which the reversion certainly exists, the larger function classes one can ensure the decompounding works for, see Figure 1 for an illustration of this. Actually, this resembles usual smoothness conditions put on function classes in inverse problems. Another interpretation is that the lighter tails the pk ’s have, the lesser summands will be convolved in the random sum and consequently the inverse problem will be easier to solve. Henceforth, the intuition tells us that we would expect heavier tails of the p k ’s will imply stricter function classes and thereby smaller bounds on Fe (τ ) for which we can assure the decompounding to work. This point is most easily investigated in a parametric family. We here investigate in detail the compound geometric and Poisson cases. Calculations are summarized in Table 1. Hansen, M. B. and Pitts, S.M. 0.0 0.2 ρg 0.4 0.6 0.8 Fe(θ) 1.0 10 τ θ Fig. 1. Choose the smallest τ such that Fe (τ ) ≤ ρg 3.1. The geometric case Decompounding the compound geometric distribution amounts to study the following convolution series ∞ X G= pk (ρ)F ⋆k , k=0 k where 0 ≤ ρ ≤ 1 and pk (ρ) = (1 − ρ)ρ for all k = 0, 1, . . . . For the Landau bound choose τ such that: Fe(τ ) ≤ = p1 (ρ) 4 ρ(1 − ρ) . 4 For the Redheffer bound, recall A = Rp1 (ρ) and choose τ such that Fe (τ ) ≤ sin(arctan(A(1 − A2 )−1/2 /2)) = sin(arctan(ρ(1 − ρ)(1 − (ρ(1 − ρ))2 )−1/2 /2)). Now, as ρ gets bigger the more mass is put in the tail of the geometric distribution and intuition tells us the decompounding problem should be worse. This is indeed reflected in the Landau and Redheffer bounds above as we have to choose larger τ ’s to ensure the decompounding works. An “efficiency” comparison can be done by dividing the bound provided by Landau (BL ) with the bound provided by Redheffer (BR ), and view it as a function of the parameter of the geometric distribution, see Figure 2, and in the heavy tail case lim = ρ→1 1 BL = . BR 2 Decompounding Redheffer/Direct 0.15 0.10 0.00 0.05 Efficiency 0.494 0.488 Efficiency 0.500 Landau/Redheffer 11 0.0 0.4 0.8 0.0 Parameter value 0.4 0.8 Parameter value Fig. 2. For the compound geometric case. Left-hand side: Comparison of Laudau bound versus Redheffer bound. Right-hand side: Comparison of Redheffer bound versus the direct bound. We notice that in the the heavy tail limit the Redheffer bound is 2 times “better” than the Landau bound. In the geometric case by careful investigation of the derivatives of the π k (ρ)’s as done in Hansen and Pitts (2006, Theorem 2) it is possible to obtain a direct bound of 1 Fe(τ ) ≤ . 2ρ We would expect this bound to better as it is more careful studying the properties of the geometric random sum. This is indeed so and it it most easily seen by an “efficiency” calculation where we devide the bound provided by Redheffer (BR ) with the bound provided by Hansen and Pitts (2006, Theorem 2) (BD ), see Figure 2 and in the heavy tail case lim ρ→1 BR = 0. BD 3.2. The Poisson case Decompounding the compound Poisson distribution amounts to studying the following convolution series ∞ X G= pk (λ)F ⋆k , k=0 −λ k where λ ≥ 1 and pk (λ) = e λ /k for all k = 0, 1, . . . . The calculations and conclusions from the geometric case are easily carried over and summarized in Table 1 and Figure 3. A similar carefully derived bound as existing in the geometric case was derived by Buchmann and Grübel (2003, Theorem) and is given by log(2) Fe(τ ) ≤ . λ Henceforth the efficiency comparisons can also be carried over and similar conclusions to the geometric case exists. 12 Hansen, M. B. and Pitts, S.M. 0.0 0.4 0.8 Parameter value 0.0 0.1 0.2 0.3 0.4 Redheffer/Direct Efficiency 0.495 0.485 0.475 Efficiency Landau/Redheffer 0.0 0.4 0.8 Parameter value Fig. 3. For the compound Poisson case. Left-hand side: Comparison of Laudau bound versus Redheffer bound. Right-hand side: Comparison of Redheffer bound versus the direct bound. 4. Applications Although, complex analysis provides a solution to the inverse problem, we will follow (Henrici, 1974) ... The algorithmic attitude to toward mathematics - not to consider a problem solved unless an algorithm for constructing the solution has been found ... and continue in this section with various algorithmic aspects of solving the reversion in practice. This section will discuss two ways of solving the inverse problem. In Section 4.1 we base our methods on the actual πk ’s and suggest either an approach based on Fourier methods or a method based on the the direct convolutions. If it is possible to find a recursion scheme for a discretized version of the direct problem it is straightforward to reverse this scheme and provide a very powerful inversion method - this will be dealt with in Section 4.2. Finally, in Section 4.3 we conclude with a simulated example where we invert the compound Delaporte distribution and provide bootstrapped confidence bounds for the estimated claim size distribution. 4.1. Direct and Fourier methods As mentioned in Section 2.1, the procedure to obtain the actual series can be difficult to implement in practice. We shall only give a brief account on the problems involved. The two standard methods to compute the coefficients πk are reversion of series and the BürmannLagrange Theorem. The first one requires to solve for πk in equation (8) as described in Section 2.1. For efficient ways to implement the recursions we refer to the comprehensive treatment in Section 1.9 of Henrici (1974). Especially Theorem 1.9(a) in Henrici (1974), states a theorem by Schur and Jabotinski which reformulates the problem of finding the reverse of a power series to a problem of calculating the reciprocal of the series. This in turn can be done efficiently by a recursion, which is a modification of the Miller algorithm see Problem Decompounding 13 5, Section 1.8, in Henrici (1974). For a recent overview of the literature and an approach based on nested derivatives we refer to Dominici (2003). However in statistical applications the practitioners would not bother dealing with implementing algorithms, but rather, if possible use a mathematical computation system like Maple. Fortunately it is indeed possible to reverse power series of the type (7) in Maple, by using the solve command. For illustration purposes let U be a shifted gamma-distribution with probability density function (pdf) given by fU (u) = βα (u − γ)α−1 e−β(u−γ) , u ≥ γ > 0. Γ(α) Now assume N |U is pois(U )-distributed whereby the unconditional N becomes a mixed Poisson distribution with pgf (Grandell, 1997, Example 2.2) −α  1−z (13) g(z) = e−γ(1−z) 1 + β and probability function (pf)   α  n k X γ k−n −γ α + n − 1 β 1 pk = e . n (k − n)! β+1 1+β n=0 (14) This distribution is known in the actuarial literature as the Delaporte distribution, (Delaporte, 1959). It has been suggested as an alternative to the more usual assumption of a two-parameter gamma mixture in the theory of insurance claims. The distribution has also been fitted to several data in the literature (Ruohonen, 1988) and we take it as an example, throughout the paper. In that case we arrive at the following compound Delaporte cdf  α  n !  k ∞ X X β 1 γ k−n −γ α + n − 1 F ⋆k . e G= n (k − n)! β + 1 1 + β n=0 k=0 Here follows a Maple-script yielding the first 5 pk ’s for the compound Delaporte distribution (discussed in Section 4.3, below): !   −α −α 1−z β+1 −γ (1−z) −γ > g := series e 1+ −e , z = 0, 6 β β To limit space we have left out the actual Maple output. The result can be reversed to get the first 5 πk ’s for the inverse compound Delaporte distribution > solve(series(g, z = 0, 6) = w, z) which can be simplified for the actual parameters used in Section 4.3 below. 4.1.1. Fourier method A computationally efficient way to implement the convolutions in the direct method is to use Fourier transforms. This is motivated by noticing that one can express F as    ∞ X k bo F = πk F −1 G . k=1 14 Hansen, M. B. and Pitts, S.M. Although, a number of warnings have been reported using transform methods (see Section 3 of Buchmann and Grübel (2003) and references therein) some authors also report that using carefully designed algorithms it is possible to get stable inversion by transform methods (Abate and Whitt, 1992). Without resorting to these advanced algorithms we got satisfactory results using a direct approach. Assume for now that that the πk ’s have been found as described above. First choose the number of terms in the inverse expansion, K, say. Secondly, choose a discretization level h > 0. Thirdly, let Γk denote the discretized values of Gon Γl = Gon ((l + 0.5)h), l = 0, . . . , L and γl the discretized probability function γl = Γl+1 − Γl , l = 0, . . . , L − 1 for some number L. Now, an estimate of the probability function φ for the discretized version of F can be calculated by the fast Fourier transform (fft) and the inverse fast Fourier transform (ifft), see e.g. Bracewell (1999), as provided in most mathematical software packages, ! K X k φl = ifft , l = 0, . . . , L. πk fft (γ) k=1 l This procedure is general in nature but have a number of disadvantages, as one has to choose number of terms in the expansion K and discretization level h. Theoretical as well as practical guidance needs to be developed. For particular cases, the relationship (1) may be expressed neatly in terms of Fourier transforms. For example, if N has a Poisson distribution with mean λ, then  F(G)(θ) = exp λ((F(F )(θ) − 1 . Buchmann and Grübel (2003) refer to inherent difficulties in using transform methods to invert such an expression directly, and we have encountered such difficulties in practice. Our approach above, where the convolution series is truncated and the fast Fourier transform algorithm is used to calculate convolution powers, avoids these difficulties, but involves the choice of truncation parameter K. 4.1.2. Direct method It should be mentioned that, although convolution can be based on fast transform algorithms as described above it is much more straightforward to code the convolutions directly, and sometimes even computationally faster. As described above we first choose the number of terms in the inverse expansion, say K. Secondly, choose a discretization level h > 0. Thirdly, by recalling the definition of self-convolution of empirical cumulative (sub)-distribution functions given in (2), let Γk denote the discretized values of Gon Γl = Gon ((l + 0.5)h), l = 0, . . . , L and γl the discretized probability function γl = Γl+1 − Γl , l = 0, . . . , L − 1 Decompounding 15 for some number L. Then define the discrete convolutions ⋆(k+1) Γl =h l X Γ⋆k k−j γj , l = 0, . . . , L. j=0 Now, we have the following discrete approximation to F Φl = K X πk Γ⋆k l , l = 0, . . . , L. k=1 As noted above this procedure is general in nature but have the same disadvantages as the Fourier method as one has to choose number of terms in the expansion K and discretization level h. Theoretical as well as practical guidance needs further exploration. 4.2. Recursion method As we mentioned the direct and Fourier methods have some clear disadvantages, but fortunately in some cases an alternative exists. The idea is to discretize the claim size distribution, whereby the compound distribution also becomes discrete. First, choose a disretization level h. Secondly, let fk denote the mass given by F to the interval ((k − 0.5)h, (k + 0.5)h], in the following way fk = F ((k + 0.5)h) − F ((k − 0.5)k), and let the discrete distribution that gives mass fk to the point kh, k = 0, 1, 2, . . . , be an approximation to the distribution F . Assume now that the random variables Y, X1 , X2 , X3 , . . . from the introduction satisfies fk gk = P(Xi = k), = P(Y = k) and fkm = P(X1 + · · · + Xm = k) or fk⋆,m+1 = k X ⋆,m fj fk−j . j=0 In many cases gk can be expressed as a linear combination of g1 , . . . , gk−1 and f0 , . . . , fk−1 whereby providing a recursive algorithm for calculating the discrete compound distribution gk . In the seminal work of Panjer (1981) the following general algorithm gk =  k  X 1 bj fj gk−j a+ 1 − af0 j=1 k were derived for situations where the following assumption holds   b pm+1 = a + pm , m = 0, 1, 2, . . . . m+1 Actually, the Panjer recursion only applies to the Poisson, the negative binomial and the binomial distribution. However, there have since Panjer’s work been an intensive interest in 16 Hansen, M. B. and Pitts, S.M. this and various extensions have been developed for distributions belonging to the so-called Willmot class. All, these algorithms can easily be inverted, and provides thereby what we could call inverse Panjer recursion, and subsequently be used for inverse decompounding. For the Panjer class we easily arrive at g0 fk = g(z) + p0 = 1 (a + b)g0 (1 − af0 − k−1 X a+ i=1 ib k  fi gk−i ! (15) , (16) where it might be necessary to implement a numerical routine to invert (15). Actually, the Poisson case was used in Buchmann and Grübel (2003) and the geometric case in Hansen and Pitts (2006) to invert Poisson and geometric random sums, respectively. For a comprehensive overview of recursive algorithms for discrete compound distributions see Section 8.4 of Grandell (1997). In particular for the discrete compound Delaporte distribution Schröter (1990) (see also Grandell (1997), Example 8.6) derived the following recursion   k  X (α − 1 + γβ + γ)j γj ⋆ 2 (1 + β)gk = 1+ gk−j , for k = 1, 2, 3, . . . , (17) fj − f k 2k j j=0 which in turn can be inverted to give  −α 1 − f0 −γ(1−f0 ) 1+ g0 = e β " k−1 γg0 X 1 (1 + β − f0 )gk + fi fk−i fk = (α + γβ + γ − γf0 )g0 2 i=1    k−1 X  γj ⋆ 2 α − 1 + γβ + γ j fj − f gk−j  for k = 1, 2, 3, . . . − 1+ k 2k j j=1 4.3. Example: Inverse Compound Delaporte distribution All calculations in the example have been carried out in the statistical software package R, except the πk ’s, which have been calculated in Maple. To save space the actual scripts are left out, but can be obtained from http://www.math.aau.dk/∼mbh/Mysoftware To check performance of the proposed procedure, we applied it to various simulated data sets. Results from a typical case is summarized in this section. As mentioned in Section 3 it is possible in many standard cases to get explicit expressions for the πk ’s and develop a theory along the lines of Buchmann and Grübel (2003) and Hansen and Pitts (2006). Let us now develop an example where the approach with explicit expressions are not obvious. In what follows the discretization level was chosen to be h = 0.01 and number of discretization points L = 216 . Furthermore, an iid sample of size 500 was drawn from a compound Delaporte distribution with parameters α = 1, β = 1, γ = 0.5 and exponentially distributed claim sizes with mean 1. We now consider the parameters α and β as fixed at 1 and γ at 0.5, but we have to estimate nonparametrically the claim size distribution. This corresponds to the interpretation (Ruohonen, 1988) that the risk is composed of an underlying population risk which is Decompounding Table 2. Estimated πk ’s. k 1 2 πk 3.3 -6.8 k 8 9 πk -2587.7 7647.1 3 16.1 10 -22851.8 4 -41.2 11 68912.5 5 111.1 12 -209391.2 6 -310.3 13 640306.7 17 7 888.1 14 -1968726.2 Poisson distributed with a parameter γ and a time-varying risk which is known and gamma distributed with parameters α and β. Remark 2 ensures that there exists a τ such that the asymptotic results of Theorem 1, 2 and 3 holds. Maple provided the first 14 pk ’s by the command: >g := series(exp(−.5 ∗ (1 + z)/(2 − z) − .5 ∗ exp(−.5), z = 0, 15); which again can be reversed to get the first 14 πk ’s for the inverse compound Delaporte distribution > solve(series(g, z = 0, 6) = w, z, 15); The result is summarized in Table 2 The full-drawn line at the left-hand side of Figure 4 shows the ecdf for an iid sample of size 500 from the compound Delaporte distribution with parameters α = β = 1, γ = 0.5 and exponentially distributed claim sizes with mean 1. The dashed line shows the underlying theoretical cdf, calculated via the direct Panjer recursion provided in (17) and the discretization parameters given in the previous paragraph. At the left-hand side of Figure 4 the inversion has been carried out by the Fourier method with K = 14 and the above-mentioned discretization parameters. The dashed line shows the exponential distribution with parameter 1. In Figure 5 (left-hand side) the full-drawn grey line shows an estimator based on inverse Panjer recursion, with discretization parameters chosen as above. Theorem 3 refers to confidence regions in the space Dτ [0, ∞), and calculating these in practice require the user to make a choice of a suitable τ . The bounds in Section 3 provide some help with this choice. For the example here, we have taken a pragmatic approach, and constructed a bootstrap confidence band using the k · k∞ -norm, as this is easy to construct in practice (although not supported by our theorems). Figure 5 (right-hand side) shows, the 90% bootstrapped confidence bounds (with respect to the sup-norm) with N = 100. In both plots the dashed line shows the exponential distribution with mean 1. 5. Discussion In the discussion of Buchmann and Grübel (2003) they claim their basic problem (decompounding Poisson random sums) is amenable to generalizations within stochastic processes. We believe to have done so in the previous sections by indicating the problem can be solved in great generality. Moreover, we have directed the attention to a number of real-life applications in e.g. control of queuing systems, infinite capacity storage models, insurance mathematics, network statistics as well as biology. We also showed how Panjer recursion can be inverted for the Panjer as well as the Willmot class, hereby providing a very strong tool to discretize and solve the inverse problem at hand. Finally, we developed by means of Maple and R effective tools to decompound general random sums. 18 Hansen, M. B. and Pitts, S.M. 0.8 0.6 0.0 0.2 0.4 F_n 0.6 0.0 0.2 0.4 G_n 0.8 1.0 Fourier method 1.0 Ecdf 0 1 2 3 4 0 y 1 2 3 4 x Fig. 4. Left-hand side: Estimate of the compound claim size distribution with exponentially distributed claim sizes with mean one, n = 500, α = β = 1 and γ = 0.5. Right-hand side: Inversion by the Fourier method. The dashed lines show the compound Delaporte and the exponential distribution with mean 1, respectively. Despite the general set-up there are still a number of unsolved problems for future research. One may e.g. wonder what happens when the first n − 1 derivatives of g vanish at 0, i.e. pi = 0 for i ≤ n − 1 and pn 6= 0. In this case, the equation w = g(z) has a solution z = f (w) = ∞ X πk wk/n , (18) k=1 where the series of powers of w 1/n converges in a neighbourhood of 0. Henceforth, f is n-valued function of w, having a branch-point at 0 (Copson, 1935, p. 123). This corresponds to solving the inverse problem G = pn F ⋆n + pn+1 F ⋆(n+1) + · · · for pn 6= 0. Hence the plug-in estimator is given by Fn = ∞ X k=1 πk (Gn )⋆k/n . (19) Decompounding 0.8 0.6 0.0 0.2 0.4 F_n 0.0 0.2 0.4 F_n 0.6 0.8 1.0 Confidence bounds 1.0 Recursion method 19 0 1 2 3 4 0 x 1 2 3 4 x Fig. 5. Left-hand side: Inversion by the recursion method. Right-hand side: 90% confidence bounds. The dashed line shows the exponential distribution with mean 1. This makes sense, if we define fractional convolution as f ⋆α = L−1 (L(f )α ), α ∈ R, whenever the Laplace transforms are well-defined, as the inverse of (19) can be derived by Laplace transformation e = G hence by means of (18) we arrive at F pn Fe n + pn+1 Fe (n+1) + · · · = = ∞ X k=1 ∞ X e k/n ). πk L−1 (G πk G⋆k/n . k=1 However, in this case we do not know of any lower bound on the radius of convergence of f , and we are forced to either develop these or do a case by case analysis. 20 Hansen, M. B. and Pitts, S.M. In passing it may be worth noting, that we have defined fractional convolution for any α ∈ R as H = F ⋆α ⇔ L(H) = L(F )α whenever the Laplace transforms are well-defined. In fact such dualities are the basis for deriving fractional operations in general. In recent years, there has been put an enormous effort in the field of fractional analysis. We will not pursue this subject further, but refer the interested reader to Bultheel and Martı́nez-Sulbaran (2003), where a comprehensive overview of the field and its applications in signal processing, geometry, optics, mechanics, stochastic processes etc. is provided. 6. Proofs Proof of Theorem 1. The proof of Theorem 1 uses the approach of Buchmann and Grübel (2003), see also Hansen and Pitts (2006) and applies it to reversion of power series. In common with the above papers, our functional Λ involves convolutions of distribution functions, and so we summarise here the spaces introduced in Buchmann and Grübel (2003) for functions that may be used as integrators in convolution integrals. Let D(∞) = ∪τ >0 Dτ [0, ∞) and let Dm (∞) be the subset of D(∞) consisting of functions having finite variation on [0, x] for all x > 0. Then a function H in Dm (∞) corresponds to a signed measure µH given by µH [0, x] R= H(x). For H in Dm (∞) and g in D(∞) we may define the convolution g ⋆ H(x) = g(x − y)H(dy) for x ≥ 0. As observed in Buchmann and Grübel (2003), members of Dm (∞) are identified via their Laplace transforms. Any H in Dm (∞) is also in e D(∞) and hence is in Dτ [0, ∞) for some τ > 0. We note that the Laplace transform H(θ) is then defined for θ > τ . We need a technical result about convolutions from Lemma 6 of Buchmann and Grübel + (2003), which we state without proof. Let Dm (∞) be the subset of Dm (∞) consisting of functions H such that the measure µH is non-negative. Then e ) for all τ > 0. kg ⋆ Hk∞,τ ≤ kgk∞ H(τ (20) Turning to the convolution series given by Λ(G), we first note that, for any distribution  P o ⋆k function G on [0, ∞), we have m is in Dτ [0, ∞) for each finite m. Using (20) k=1 πk G we have for k ≥ 1, (Go )⋆k Then it follows that ∞ X k=1 ∞,τ e o (τ )k−1 ≤ G e o (τ )k−1 . ≤ kGo k∞,τ G |πk | (Go )⋆k ∞,τ ≤ ∞ X k=1 e o (τ )k−1 . |πk |G e o (τ ) < σf (≤ r(f )), so that It is assumed in the statement of the theorem that G ∞ X k=1 converges. Thus Λ(G) is in Dτ [0, ∞). e o (τ )k−1 |πk |G Decompounding To obtain the relationship between F and G, first note that the series can be written as the difference of two non-decreasing functions: ∞ X πk (Go )⋆k = X k:πk ≥0 k=1 πk (Go )⋆k − X k:πk <0 P∞ k=1 21 o ⋆k πk (G ) |πk |(Go )⋆k , and so Λ(G) is in Dm (∞). This means that Λ(G) is identified by its Laplace transform. From the definition of G in terms of F , we have, for θ > τ , e o (θ) = G ∞ X k=1  pk Fe(θ) = g Fe(θ) . e o (θ) < σf and thus, by (1) there exists a unique z with |z| < σg such that We have G  o e (θ), and this z is given by the power series reversion, ie z = P∞ πk G e o (θ) k . g(z) = G k=1 Since Fe(θ) < σg , this implies that z = Fe(θ) and that Fe (θ) = ∞ X k=1  e o (θ) k . πk G ] The right-hand side of this last expression is Λ(G)(θ), from which we conclude that F = Λ(G), and Theorem 1 is proved. Proof of Theorem 2. This is an application of the infinite-dimensional delta method, which combines an appropriate differentiability property of the map Λ with the relevant asymptotic normality result for the empirical process associated with the input estimators G n , n ≥ 1 (see, Gill (1989), van der Vaart (1998)). The proposition below gives an appropriate differentiability result for Λ. Proposition 2. Suppose that Gn , n ∈ N, and G are distribution functions with Gn (0) = e ) < σf and Fe(τ ) < σg , and suppose that G(0) = 0 for all n ∈ N. Let τ > 0 be such that G(τ  √ k n Gn − G − hk∞ → 0 as n → ∞ for some h in D[0, ∞). Then  √ k n Λ(Gn ) − Λ(G) − h ⋆ Hk∞,τ → 0 as n → ∞, where H = P∞ k=1 kπk G⋆(k−1) . Comparing this proposition with Proposition 8 in Buchmann and Grübel (2003) and Proposition 4 in Hansen and Pitts (2006), it is clear that it is the relevant generalisation of those differentiability results to the general decompounding case considered here. The proof of Proposition 2 follows similar methods to the proofs of the corresponding propositions in the above earlier papers, with η in those proofs replaced here by η ∈ (0, 1) e ) < ησf , and then using the fact that the series P∞ πk (ησf )k is chosen such that G(τ k=1 absolutely convergent. We refer the reader to the earlier papers for further details. 22 Hansen, M. B. and Pitts, S.M. The second ingredient for the infinite-dimensional delta method is an asymptotic normality result for the input estimators, which is given by the empirical central limit theorem (Pollard (1984) V.2.11). This states that  √ n Gn − G →D B ◦ G in (D[0, ∞), k · k∞ ), where B is a standard Brownian bridge and B ◦ G is a rescaled Brownian bridge given by B ◦ G(t) = B(G(t)), t ≥ 0. The limit process B ◦ G is thus a zero mean Gaussian process with cov(B ◦ G(s), B ◦ G(t)) = G(s ∧ t) − G(s)G(t). The empirical central limit result is then combined with Proposition 2 to yield Theorem 2, see Buchmann and Grübel (2003) for details. The covariance structure of the limiting Gaussian process A in Theorem 2 arises easily, as in Buchmann and Grübel (2003). Proof of Theorem 3. The proof proceeds by first showing that R̂n converges in distribution to R in D[0, ∞), and then using this to infer the result of the theorem. The technicalities are the same as those in the proof of Theorem 2.3 in Grübel and Pitts (1993), and the interested reader is referred to that paper. Acknowledgment The authors thank Nick Bingham, University of Sheffield, for a most helpful discussion. References Abate, J. and W. Whitt (1992). The Fourier-series method for inverting transforms of probability distributions. Queueing Systems 10, 5–88. Asmussen, S. (2003). Applied Probability and Queues (2 ed.). Springer Verlag, New York, NY. Bingham, N. H. and S. M. Pitts (1999a). Nonparametric estimation for the M/G/∞ queue. Annals of the Institute of Statistical Mathematics 51, 71–97. Bingham, N. H. and S. M. Pitts (1999b). Nonparametric inference from M/G/1 busy periods. Communications in Statistics and Stochastic Models 15, 247–272. Bracewell, R. N. (1999). The Fourier Transform and its Applications (3 ed.). New York: McGraw-Hill. Buchmann, B. and R. Grübel (2003). Decompounding: An estimation problem for Poisson random sums. Annals of Statistics 31, 1054–1074. Bultheel, A. and H. Martı́nez-Sulbaran (2003). Recent developments in the theory of the fractional Fourier transforms and linear canonical transforms. Bulletin of the Belgian mathematical Society- Simon Stevin. Accepted. Decompounding 23 Copson, E. T. (1935). An Introduction to the Theory of Functions of a Complex Variable. Oxford: The Clarendon Press. Delaporte, P. (1959). Quelques problemés de statistique mathématique posés par l’assurance automobile et le bonus pour non sinistre. Bulletin Trimestriel de l’Institut des Actuaires Francais 227, 87–102. Dominici, D. (2003). Nested derivatives: A simple method for computing series expansions of inverse functions. International Journal of Mathematics and Mathematical Sciences 58, 3699–3715. Gill, R. (1989). Non- and semi-parametric maximum likelihood estimators and the von Mises method (Part 1). Scandinavian Journal of Statistics 16, 97–128. Grandell, J. (1997). Mixed Poisson Processes. London: Chapman and Hall. Grübel, R. and S. M. Pitts (1993). Non-parametric estimation in renewal theory, I: the empirical renewal function. Annals of Statistics 21, 1431–1451. Hall, P. and J. Park (2004). Nonparametric inference about service time distribution from indirect measurements. Journal of the Royal Statistical Society 66, 861–875. Hansen, M. B. and S. M. Pitts (2006). Nonparametric inference from the M/G/1 workload. Accepted for publication in Bernoulli. Henrici, P. (1974). Applied and Computational Complex Analysis, Vol 1: Power Series Integration - Conformal Mapping - Location of Zeros. New York: John Wiley and Sons. Hohn, N. and D. Veitch (2003). Inverting sampled traffic. In Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement, Miami Beach, FL, USA, pp. 222–233. ACM Press New York, NY, USA. Johnson, N. L., A. W. Kemp, and S. Kotz (2005). Univariate Discrete Distributions (3 ed.). New York: John Wiley and Sons. Levental, S. (1988). Uniform limit theorems for Harris recurrent Markov chains. Probability Theory and Related Fields 80, 101–118. Neyman, J. and E. L. Scott (1958). A statistical approach to problems of cosmology. Journal of the Royal Statistical Society, Series B 20, l–43. Panjer, H. H. (1981). Recursive evaluation of a family of compound distributions. Astin Bulletin 12, 22–26. Pollard, D. (1984). Convergence of Stochastic Processes. New York: Springer Verlag. Redheffer, R. M. (1962). Reversion of power series. American Mathematical Monthly 69, 423–425. Ruohonen, M. (1988). On a model for the claim number process. Astin Bulletin 18, 57–68. Schröter, K. (1990). On a family of counting distributions and reccursions for related compound distributions. Scandinavian Actuarial Journal 304, 161–175. 24 Hansen, M. B. and Pitts, S.M. Sharma, V. (1999). Estimating traffic intensities at different nodes in networks via a probing stream. In GLOBECOM 1999: Seamless Interconnection for Universal Services, pp. 374– 380. Piscataway, NJ: IEEE Operations Center. Sharma, V. and R. Mazumdar (1998). Estimating traffic parameters in queueing systems with local information. Performance Evaluation 32, 217–230. Tsai, T. H. (1998). The uniform CLT and LIL for Markov chains. Ph. D. thesis, University of Wisconsin, Madison, WI. van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press, Cambridge. van Es, B., S. Gugushvili, and P. Spreij (2005). A kernel type nonparametric density estimator for decompounding. URL: front.math.ucdavis.edu/math.ST/0505355. Wand, M. P. and M. Jones (1995). Kernel Smoothing, Volume 60 of Monographs on Statistics and Applied Probability. Chapman and Hall. Zhang, C. (1990). Fourier methods for estimating mixing densities and distributions. Annals of Statistics 18, 806–831.