

Statistical Aspects of Perpetuities

2000, Journal of Multivariate Analysis

Statistical aspects of perpetuities

Rudolf Grübel
Universität Hannover
Email: rgrubel@stochastik.uni-hannover.de

Susan M. Pitts
University of Cambridge
Email: s.pitts@statslab.cam.ac.uk

Abstract

For a distribution $\mu$ on the unit interval we define the associated perpetuity $\Psi(\mu)$ as the distribution of $1 + X_1 + X_1X_2 + X_1X_2X_3 + \dots$, where $(X_n)_{n\in\mathbb{N}}$ is a sequence of independent random variables with distribution $\mu$. Such quantities arise in insurance mathematics and in many other areas. We prove the differentiability of the perpetuity functional $\Psi$ with respect to integral and supremum norms. These results are then used to investigate the statistical properties of empirical perpetuities, including the behaviour of bootstrap confidence regions.

AMS 1980 subject classifications: Primary 62G05; secondary 62G09, 62P05.
Key words and phrases: Perpetual annuity, empirical perpetuities, asymptotic normality, bootstrap.

Running head: Statistical aspects of perpetuities.
Corresponding author: S.M. Pitts, Statistical Laboratory, 16 Mill Lane, Cambridge CB2 1SB, UK. Email: s.pitts@statslab.cam.ac.uk. Telephone: (0)1223 337960. Fax: (0)1223 337956.

1 Introduction

Let $\mu$ be a probability distribution on the unit interval $[0,1]$; to avoid trivialities we will assume throughout the paper that $\mu$ is not concentrated on the single value 1. It is then easy to see that for any sequence $(X_n)_{n\in\mathbb{N}}$ of independent random variables with distribution $\mu$ the cumulative products $(X_1\cdot\ldots\cdot X_n)_{n\in\mathbb{N}}$ of the sequence sum to a finite value with probability one; we call the distribution of
\[
  Y := 1 + \sum_{n=1}^{\infty}\prod_{m=1}^{n} X_m
\]
the perpetuity associated with $\mu$ and denote it by $\Psi(\mu)$.

Perpetuities arise in a variety of problems in pure and applied mathematics. The name points to a financial context: $Y$ can be regarded as the present value of a perpetual periodic payment of one monetary unit if $X_n$ denotes the value at the beginning of time period $n$ of one unit paid out at the beginning of time period $n+1$; see Dufresne (1990) for the risk and insurance theory context and a review of the relevant literature. Perpetuities also appear in the analysis of a random algorithm where $\mu$ is the uniform distribution on $(0,1)$ or $(1/2,1)$; see Grübel and Rösler (1996) and Grübel (1998). The perpetuity associated with the uniform distribution on the unit interval also arises in number theory; see de Bruijn (1951). Further, $\Psi(\mu)$ is the stationary distribution of a system $(Y_n)_{n\in\mathbb{N}}$ with dynamic evolution $Y_{n+1} = 1 + X_{n+1}Y_n$. Random affine maps and stochastic difference equations provide further areas where perpetuities appear naturally; see Vervaat (1979), Goldie and Grübel (1996) and the references given there for details and more applications. Of course, in many applications generalizations of the above setup are needed, with e.g. $\mu$ not concentrated on the unit interval, or matrix-valued $X$-variables. Interest in the literature focuses on convergence issues and tail probabilities.

Despite their practical importance statistical aspects of perpetuities seem to have received little attention so far, the most notable exception we are aware of being the paper by Aebi, Embrechts and Mikosch (1994). In the present paper we show that a moderately abstract viewpoint is useful in this context. We regard $\Psi$ as a nonlinear operator, the perpetuity functional, mapping probability distributions to probability distributions. As a first application and on a heuristic level, this functional approach provides a natural estimator for $\Psi(\mu)$, given a sample $X_1,\dots,X_n$ from $\mu$. With $\hat\mu_n$ denoting the empirical distribution associated with the sample, i.e. the probability measure that assigns mass $1/n$ to each of $X_1,\dots,X_n$, the 'plug-in principle' leads us to estimate $\Psi(\mu)$ by $\Psi(\hat\mu_n)$, the empirical perpetuity. This is, essentially, the 'bootstrap estimator' investigated by Aebi et al. (1994). A suitable continuity property of the perpetuity functional together with the known consistency of $\hat\mu_n$ as an estimator of $\mu$ implies the consistency of empirical perpetuities. Further, on the next level of detail, a suitable differentiability property of the functional together with the known asymptotic distributional behaviour of empirical distributions can be used to prove asymptotic normality of $\Psi(\hat\mu_n)$ (the von Mises or delta method). The same differentiability property can be used to show that 'the bootstrap works'. The precise results are given in Section 2, Section 3 contains the proofs.

In previous work we have used this functional approach in a variety of situations, see Grübel and Pitts (1993) for renewal theory, Pitts (1994a) for the context of queueing theory and Pitts (1994b) for compound distributions. In these applications convolution series appear directly or indirectly, and Banach algebra theory is an important tool. The situation considered here is of a completely different type, with the analysis of fixed point equations playing a central role.
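On a practical level, both $\Psi(\mu)$ and the plug-in estimate $\Psi(\hat\mu_n)$ can be approximated by truncating the series $1 + X_1 + X_1X_2 + \dots$ after a large number of terms and simulating. The following sketch (Python; the helper name sample_perpetuity, the truncation depth and all sample sizes are our own illustrative choices, not part of the paper) draws from $\Psi(\mu)$ for $\mu = \mathrm{unif}(0,1)$ and from the empirical perpetuity by resampling the factors from an observed sample.

import numpy as np

rng = np.random.default_rng(0)

def sample_perpetuity(draw_x, size, depth=60):
    # Draw `size` approximate realisations of Y = 1 + X1 + X1*X2 + ...,
    # truncating the series after `depth` terms; `draw_x(m)` returns m
    # i.i.d. factors with values in [0, 1].
    y = np.ones(size)
    prod = np.ones(size)
    for _ in range(depth):
        prod *= draw_x(size)      # running product X1 * ... * Xk
        y += prod                 # add the next term of the series
    return y

# the perpetuity Psi(mu) for mu = unif(0, 1)
y_true = sample_perpetuity(lambda m: rng.uniform(0.0, 1.0, m), 100_000)

# empirical perpetuity Psi(mu_hat_n): the factors are resampled from an
# observed sample x_1, ..., x_n instead of being drawn from mu itself
x_data = rng.uniform(0.0, 1.0, 50)          # stands in for the observed data
y_plugin = sample_perpetuity(lambda m: rng.choice(x_data, m), 100_000)

# E Y = 1/(1 - m1(mu)) by the geometric series, so approximately 2 here
print("mean of Psi(mu):      ", y_true.mean())
print("mean of Psi(mu_hat_n):", y_plugin.mean())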
2 Results

Let $(X_i)_{i\in\mathbb{N}}$, $\mu$, $\hat\mu_n$ be as in the introduction, and let $\hat G_n$ be the distribution function associated with the empirical perpetuity $\Psi(\hat\mu_n)$. Similarly, $G$ denotes the distribution function associated with the unknown perpetuity $\Psi(\mu)$. Our interest concentrates on two statements: first, asymptotic normality of empirical perpetuities, and secondly, asymptotic validity of bootstrap confidence regions for the unknown perpetuity. Formally, the first statement means that
\[
  \sqrt{n}\,(\hat G_n - G) \to_d Z \quad \text{as } n \to \infty, \tag{AN}
\]
with some Gaussian process $Z$. Here '$\to_d$' denotes convergence in distribution; in particular, (AN) refers to a topology on a suitable space $\mathbb{F}$ of functions, and our results give conditions for the validity of (AN) in two different classes of normed function spaces. In both cases we regard $\Psi$ as an operator from $D[0,1]$, the space of cadlag functions $f : [0,1]\to\mathbb{R}$, to $\mathbb{F}$, and compute its derivative $\Psi'_\mu$ at $\mu$. Let $F$ be the distribution function associated with $\mu$ and let $B = (B_t)_{0\le t\le 1}$ be the Brownian bridge. The limit process $Z$ in (AN) is the image of the time-changed Brownian bridge $B\circ F = (B_{F(t)})_{0\le t\le 1}$ under the bounded linear map $\Psi'_\mu$.

A consequence of (AN) of particular interest is the convergence of
\[
  R_n(x) = P\bigl(\sqrt{n}\,\|\hat G_n - G\| \le x\bigr)
\]
to the analogous quantity associated with the limit process,
\[
  R(x) = P\bigl(\|Z\| \le x\bigr),
\]
for all continuity points $x$ of the latter. Quantiles of $R$ could be used to set up asymptotic confidence regions for $G$. This would, however, involve the calculation of the distribution of the supremum of the Gaussian process $Z$, whose second order structure depends on the underlying unknown $\mu$ in a complicated way: with $\mathbb{F}'$ and $D[0,1]'$ the topological duals of $\mathbb{F}$ and $D[0,1]$ respectively, and $\Psi'^{\,\star}_\mu : \mathbb{F}'\to D[0,1]'$ the adjoint of $\Psi'_\mu$, the covariance function of $Z$ would be given by
\[
  \mathrm{cov}\bigl(\langle f, Z\rangle, \langle g, Z\rangle\bigr)
  = E\Bigl(\bigl\langle \Psi'^{\,\star}_\mu(f), B\circ F\bigr\rangle \cdot \bigl\langle \Psi'^{\,\star}_\mu(g), B\circ F\bigr\rangle\Bigr)
  \quad\text{for all } f, g\in\mathbb{F}'.
\]
The bootstrap resolves this dilemma, using an estimate for $R_n$ which is in turn obtained via the plug-in principle as the $R_n$-quantity associated with the empirical distribution. For a formally correct definition let $\mu_n(x_1,\dots,x_n)$ be the probability measure that assigns mass $1/n$ to each of the values $x_1,\dots,x_n$, i.e. with $\delta_x$ denoting unit mass at $x$,
\[
  \mu_n(x_1,\dots,x_n) = \frac{1}{n}\sum_{i=1}^{n}\delta_{x_i}.
\]
With this notation we can regard the empirical distribution as the random probability measure $\hat\mu_n = \mu_n(X_1,\dots,X_n)$. Then, with $I_n := \{1,\dots,n\}^n$,
\[
  \hat R_n(z) = n^{-n}\sum_{(i_1,\dots,i_n)\in I_n}
  1_{[0,z]}\Bigl(\sqrt{n}\,\bigl\|\Psi\bigl(\mu_n(X_{i_1},\dots,X_{i_n})\bigr) - \Psi(\hat\mu_n)\bigr\|\Bigr).
\]
A numerical approximation for this quantity can be calculated by the usual Monte Carlo method. Note that this differs from Aebi et al. (1994) who investigated the variability of the Monte Carlo approximation to $\hat G_n$.
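A rough Monte Carlo rendering of $\hat R_n$ along these lines is sketched below (Python; the function names, the grid, the number of bootstrap replications and the use of the plain sup-distance on a finite grid in place of the weighted norms of this section are all our own simplifications). Each bootstrap resample plays the role of one of the $n^n$ terms in the definition of $\hat R_n$, and a quantile of the simulated pivot values is read off at the end.

import numpy as np

rng = np.random.default_rng(1)

def sample_perpetuity(factors, size, depth=60):
    # Simulate Y = 1 + X1 + X1*X2 + ..., truncated after `depth` terms, with
    # the factors drawn uniformly from the finite set `factors` (i.e. from
    # the empirical distribution of the observed or resampled data).
    y, prod = np.ones(size), np.ones(size)
    for _ in range(depth):
        prod *= rng.choice(factors, size)
        y += prod
    return y

def ecdf_on_grid(sample, grid):
    # empirical distribution function of `sample`, evaluated on `grid`
    return np.searchsorted(np.sort(sample), grid, side="right") / len(sample)

n, m_mc, n_boot = 50, 10_000, 200
x_data = rng.uniform(0.0, 1.0, n)                 # stands in for the observed sample
grid = np.linspace(1.0, 15.0, 400)

g_hat = ecdf_on_grid(sample_perpetuity(x_data, m_mc), grid)   # approximates G_hat_n

# bootstrap: resample the data, recompute the perpetuity distribution function,
# and record the scaled sup-distance; the empirical law of these values is the
# Monte Carlo approximation of R_hat_n
pivots = []
for _ in range(n_boot):
    x_star = rng.choice(x_data, n)                # one of the n^n resamples
    g_star = ecdf_on_grid(sample_perpetuity(x_star, m_mc), grid)
    pivots.append(np.sqrt(n) * np.max(np.abs(g_star - g_hat)))

q = np.quantile(pivots, 0.95)                     # estimated 95% quantile of the pivot
print("radius of the corresponding confidence band:", q / np.sqrt(n))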
We can now state our second aim, which is to establish that the bootstrap works. By this we mean that
\[
  \hat R_n \to R \quad\text{in probability as } n\to\infty. \tag{BW}
\]
This implies that the quantiles associated with $\hat R_n$ (which can be calculated from the data to any desired degree of precision) can serve as a substitute for the quantiles associated with $R_n$ if $n$ is large, as both distribution functions have the same limit $R$. If $q_{n,\alpha}$ is the (estimated) $\alpha$-quantile of the distribution of the pivot, then the corresponding confidence region for $G$ consists of all distribution functions $G'$ with $\|\cdot\|$-distance from $\hat G_n$ less than or equal to $q_{n,\alpha}/\sqrt{n}$.

Our first result uses a weighted integral norm and does not impose any additional restrictions on the underlying distribution $\mu$. For this result we consider the quantity $\sqrt{n}(\hat G_n - G)$ as an element of the space
\[
  L_{1,\sigma} := \Bigl\{ f : [1,\infty)\to\mathbb{R} : \int_1^\infty e^{\sigma x}|f(x)|\,dx < \infty \Bigr\},
\]
where $\sigma$ depends on $\mu$. With the usual identification of functions that differ only on a Lebesgue null set, $L_{1,\sigma}$ becomes a Banach space with the norm
\[
  \|f\|_{1,\sigma} := \int_1^\infty e^{\sigma x}|f(x)|\,dx.
\]
Let $m_1(\mu) = \int x\,\mu(dx)$ be the first moment associated with $\mu$.

Theorem 1. If $\sigma < -\log m_1(\mu)$ then (AN) and (BW) hold with respect to the $\|\cdot\|_{1,\sigma}$-norm.

Confidence regions based on the $\|\cdot\|_{1,\sigma}$-pivot are not easily visualised. Pivots based on the supremum norm lead to confidence bands for the distribution function of the perpetuity, which can be displayed easily and could also be used to set up confidence intervals for e.g. quantiles of the perpetuity. We consider supremum norms with a continuous and increasing weight function $\rho : [1,\infty)\to(0,\infty)$ which we further assume to satisfy
\[
  \sup_{x\ge 0}\frac{\rho(1+x)}{\rho(1\vee x)} < \infty. \tag{1}
\]
Let $D_0$ be the set of all functions $f : [1,\infty]\to\mathbb{R}$ which are right continuous, have left hand limits and satisfy $f(\infty) = f(\infty-) = 0$. Let $D_\rho$ be the set of all $f\in D_0$ with
\[
  \|f\|_{\infty,\rho} := \sup_{x\ge 1}\rho(x)|f(x)| < \infty.
\]
With its usual linear structure $(D_\rho, \|\cdot\|_{\infty,\rho})$ is a Banach space.

Theorem 2. If $\rho$ is such that $\rho(x) = O(e^{\sigma x})$ for some $\sigma < -\log m_1(\mu)$ as $x\to\infty$ and if
\[
  C(\mu,\rho) := \sup_{x\ge 0}\,\rho(1+x)\int_{(0,1\wedge x]}\frac{1}{\rho(x/y)}\,\mu(dy) < 1,
\]
then (AN) and (BW) hold with respect to the $\|\cdot\|_{\infty,\rho}$-topology.

The condition $C(\mu,\rho) < 1$ will be discussed in Section 3.5, where we show that it is satisfied with $\rho(x) = \exp(\sigma x)$ for $\sigma < \log 2 \approx 0.693$ if $\mu$ has a decreasing density.
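The sufficient condition $C(\mu,\rho) < 1$ of Theorem 2 can be checked numerically for any concrete pair $(\mu,\rho)$. The sketch below (Python; the grid over $x$, the values of $\sigma$ and the choice of $\mu$ as the decreasing Beta(1,2) density $2(1-y)$ on $[0,1]$ are our own illustrative choices, not part of the paper) approximates the supremum in the definition of $C(\mu,\rho)$ by a maximum over a finite grid, with $\rho(x) = e^{\sigma x}$.

import numpy as np
from scipy.integrate import quad

def phi(x, sigma):
    # phi(mu, rho, x) for rho(t) = exp(sigma*t) and mu with density 2*(1-y):
    # exp(sigma*(1+x)) * integral_0^{min(1,x)} exp(-sigma*x/y) * 2*(1-y) dy
    if x <= 0.0:
        return 0.0
    upper = min(1.0, x)
    val, _ = quad(lambda y: np.exp(-sigma * x / y) * 2.0 * (1.0 - y) if y > 0.0 else 0.0,
                  0.0, upper)
    return np.exp(sigma * (1.0 + x)) * val

for sigma in (0.3, 0.5, 0.69):
    # the supremum over x >= 0 is approximated by a maximum over a finite grid
    c = max(phi(x, sigma) for x in np.linspace(0.0, 10.0, 2001))
    print(f"sigma = {sigma:.2f}:  C(mu, rho) is approximately {c:.3f}")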
3 Proofs

In Section 3.1 we introduce some notation and comment on general aspects of the proofs. In Sections 3.2 and 3.3 we prove the differentiability of $\Psi$ with respect to the integral norm and the supremum norm respectively. Section 3.4 explains how these differentiability properties are connected to (AN) and (BW). In the final subsection we discuss the condition $C(\mu,\rho) < 1$ in Theorem 2.

3.1

For any pair $\mu,\nu$ of finite signed measures, with $\mu$ concentrated on $[0,1]$ and $\nu$ concentrated on $[1,\infty)$, let $T(\mu,\nu)$ be the finite signed measure concentrated on $[1,\infty)$ defined by
\[
  T(\mu,\nu)\bigl((x,\infty)\bigr) := \int_{(0,1]}\nu\Bigl(\Bigl(\frac{x-1}{y},\infty\Bigr)\Bigr)\,\mu(dy)
  \quad\text{for all } x\ge 1, \tag{2}
\]
and $T(\mu,\nu)(\{1\}) := \mu(\{0\})\,\nu([1,\infty))$. Obviously, this defines a bilinear map. A signed measure $\nu$ on $[1,\infty)$ of known total mass can be characterized by its tail function $f : [1,\infty)\to\mathbb{R}$, $f(x) := \nu((x,\infty))$. For fixed $\mu$ we could interpret $T(\mu,\cdot)$ as an operator mapping tail functions to tail functions. If $\nu$ has total mass 0 then
\[
  T(\mu,\nu)\bigl((x,\infty)\bigr) = \int_{(0,1\wedge(x-1)]} f\Bigl(\frac{x-1}{y}\Bigr)\,\mu(dy)
  \quad\text{for all } x\ge 1, \tag{3}
\]
and the total mass of $T(\mu,\nu)$ is again 0. Using Fubini's theorem, we can rewrite (2) as
\[
  T(\mu,\nu)\bigl((x,\infty)\bigr) = \int_{[1,\infty)}\mu\Bigl(\Bigl(\frac{x-1}{y},1\Bigr]\Bigr)\,\nu(dy)
  \quad\text{for all } x\ge 1, \tag{4}
\]
and if $f$ denotes the tail function of a signed measure on $[0,1]$ with total mass 0 then, similar to the transition from (2) to (3),
\[
  T(\mu,\nu)\bigl((x,\infty)\bigr) = \int_{[(x-1)\vee 1,\infty)} f\Bigl(\frac{x-1}{y}\Bigr)\,\nu(dy)
  \quad\text{for all } x\ge 1. \tag{5}
\]
Equations (3) and (5) show that $T$ can be extended to operate on pairs of arguments consisting of a finite signed measure and a cadlag function. Below we will often identify finite signed measures of known total mass with their tail function.

If $\mu$ and $\nu$ are the distributions of independent random variables $X$ and $Y$ respectively then $T(\mu,\nu)$ is the distribution of $1 + X\cdot Y$. This implies the following important relationship between $\Psi$ and $T$:
\[
  T\bigl(\mu,\Psi(\mu)\bigr) = \Psi(\mu), \tag{6}
\]
whenever $\Psi(\mu)$ is defined. Equation (6) shows that $\Psi(\mu)$ can be regarded as the fixed point of the linear map $\nu\mapsto T(\mu,\nu)$. Using the bilinearity of $T$ we obtain for any $\mu_1,\mu_2$
\[
  \Psi(\mu_1) - \Psi(\mu_2) = T\bigl(\mu_1-\mu_2,\Psi(\mu_1)\bigr) + T\bigl(\mu_2,\Psi(\mu_1)-\Psi(\mu_2)\bigr). \tag{7}
\]
If we have a curve $\{\mu_\epsilon\}_{0\le\epsilon\le 1}$ in the space of probability measures we similarly obtain
\[
  \frac{1}{\epsilon}\bigl(\Psi(\mu_\epsilon)-\Psi(\mu_0)\bigr)
  = T\Bigl(\frac{1}{\epsilon}(\mu_\epsilon-\mu_0),\Psi(\mu_\epsilon)\Bigr)
  + T\Bigl(\mu_0,\frac{1}{\epsilon}\bigl(\Psi(\mu_\epsilon)-\Psi(\mu_0)\bigr)\Bigr). \tag{8}
\]
This exhibits the difference quotient $\epsilon^{-1}\bigl(\Psi(\mu_\epsilon)-\Psi(\mu_0)\bigr)$ as a fixed point of the affine operator $\nu\mapsto T\bigl(\epsilon^{-1}(\mu_\epsilon-\mu_0),\Psi(\mu_\epsilon)\bigr) + T(\mu_0,\nu)$. This will be important in proving differentiability of $\Psi$, by which we mean that the convergence of $\epsilon^{-1}(\mu_\epsilon-\mu_0)$ as $\epsilon\to 0$ to some object $g$ implies the convergence of the associated difference quotients to some $h = \Psi'_{\mu_0}(g)$, the linear operator $\Psi'_{\mu_0}$ being the derivative of $\Psi$ at $\mu_0$. The topologies will depend on the 'point' $\mu_0$ where we want to differentiate $\Psi$.

Before we begin with the technical details of the proofs one more general comment is in order. Equation (6) also implies
\[
  \Phi\bigl(\mu,\Psi(\mu)\bigr) = 0, \quad\text{with } \Phi(\mu,\nu) := \nu - T(\mu,\nu).
\]
Written in this form, proving differentiability of $\Psi$ looks like a straightforward case for an implicit function theorem. However, the statistical applications dictate a weak topology, where the arguments of $T$ could leave the space of signed measures (Brownian motion paths are not of bounded variation), and it is not clear how to define $T$ if neither argument is the distribution function or, equivalently, the tail function of a signed measure. In our more direct approach we avoid this problem. As a rough guideline to the proofs it is perhaps worth mentioning in this context that the implicit function theorem would require the invertibility of the derivative $\Phi_2$ of $\Phi$ with respect to the second argument. The obvious candidate is $\Phi_2 = \mathrm{Id} - T(\mu,\cdot)$, which leads us to consider the Neumann series associated with the linear operator $T(\mu,\cdot)$.
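Equation (6) expresses the distributional fixed-point property $Y =_d 1 + XY$ with $X\sim\mu$ independent of $Y$; equivalently, $\Psi(\mu)$ is the stationary distribution of the recursion $Y_{n+1} = 1 + X_{n+1}Y_n$ mentioned in the introduction. The following sketch (Python; the sample size, the number of steps and the starting point $Y\equiv 1$ are our own illustrative choices) iterates this map on a large sample and watches the summary statistics stabilise.

import numpy as np

rng = np.random.default_rng(3)

def fixed_point_iteration(draw_x, size=200_000, steps=15):
    # Iterate Y <- 1 + X * Y on a sample, starting from Y = 1; by (6) the
    # distribution of the sample approaches the perpetuity Psi(mu).
    y = np.ones(size)
    for k in range(1, steps + 1):
        y = 1.0 + draw_x(size) * y
        print(f"step {k:2d}:  mean = {y.mean():.4f}   "
              f"95% quantile = {np.quantile(y, 0.95):.4f}")
    return y

# for mu = unif(0, 1) the mean of Psi(mu) is 1/(1 - m1(mu)) = 2
y = fixed_point_iteration(lambda m: rng.uniform(0.0, 1.0, m))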
3.2

For a probability measure $\mu$ on $[0,1]$ with $\mu(\{1\})\ne 1$ and $f$ in $L_{1,\sigma}$, let
\[
  U_\mu f(x) = \int_{(0,1\wedge(x-1)]} f\Bigl(\frac{x-1}{y}\Bigr)\,\mu(dy), \quad x\ge 1.
\]
If $\nu$ has tail function $f$ and total mass 0 then, from (3),
\[
  U_\mu f(x) = T(\mu,\nu)\bigl((x,\infty)\bigr). \tag{9}
\]
On the space $D[0,1]$ of cadlag functions $f : [0,1]\to\mathbb{R}$ we consider the norm $\|f\|_\infty := \sup_{0\le x\le 1}|f(x)|$. We have the following basic inequalities.

Lemma 3. (i) For all $f\in L_{1,\sigma}$,
\[
  \|U_\mu f\|_{1,\sigma} \le e^\sigma m_1(\mu)\,\|f\|_{1,\sigma}.
\]
(ii) For all $f\in D[0,1]$ and all probability measures $\nu$ concentrated on $[1,\infty)$,
\[
  \|T(f,\nu)\|_{1,\sigma} \le e^\sigma\|f\|_\infty\Bigl(\|\nu\|_{1,\sigma} + \frac{e^\sigma}{\sigma}\Bigr).
\]

Proof. The proof of (i) follows from
\[
\begin{aligned}
  \int_1^\infty e^{\sigma x}|U_\mu f(x)|\,dx
  &\le \int_1^\infty e^{\sigma x}\int_{(0,1\wedge(x-1)]}\Bigl|f\Bigl(\frac{x-1}{y}\Bigr)\Bigr|\,\mu(dy)\,dx\\
  &= \int_{(0,1]}\int_{y+1}^\infty e^{\sigma x}\Bigl|f\Bigl(\frac{x-1}{y}\Bigr)\Bigr|\,dx\,\mu(dy)\\
  &= e^\sigma\int_{(0,1]} y\int_1^\infty e^{\sigma u y}|f(u)|\,du\,\mu(dy)\\
  &\le e^\sigma m_1(\mu)\,\|f\|_{1,\sigma},
\end{aligned}
\]
for the proof of (ii) we use
\[
\begin{aligned}
  \|T(f,\nu)\|_{1,\sigma}
  &\le \int_1^\infty e^{\sigma x}\int_{[(x-1)\vee 1,\infty)}\Bigl|f\Bigl(\frac{x-1}{y}\Bigr)\Bigr|\,\nu(dy)\,dx\\
  &= \int_{[1,\infty)}\int_1^{y+1} e^{\sigma x}\Bigl|f\Bigl(\frac{x-1}{y}\Bigr)\Bigr|\,dx\,\nu(dy)\\
  &= \int_{[1,\infty)} y\int_0^1 e^{\sigma(uy+1)}|f(u)|\,du\,\nu(dy)\\
  &\le \|f\|_\infty\int_{[1,\infty)}\int_1^{y+1} e^{\sigma z}\,dz\,\nu(dy)\\
  &= \|f\|_\infty\int_1^\infty e^{\sigma z}\int_{[(z-1)\vee 1,\infty)}\nu(dy)\,dz\\
  &= \|f\|_\infty\Bigl(\int_1^2 e^{\sigma z}\,\nu\bigl([1,\infty)\bigr)\,dz + \int_1^\infty e^{\sigma(z+1)}\,\nu\bigl([z,\infty)\bigr)\,dz\Bigr)\\
  &\le \|f\|_\infty\, e^\sigma\Bigl(\frac{e^\sigma}{\sigma} + \|\nu\|_{1,\sigma}\Bigr).
\end{aligned}
\]

Note that (i) implies that $U_\mu$ is a bounded linear map from $L_{1,\sigma}$ to $L_{1,\sigma}$. Further, if $e^\sigma m_1(\mu) < 1$ then $U_\mu$ is a contraction, and the partial sums $\sum_{k=0}^n U_\mu^k f$, $n\in\mathbb{N}$, constitute a Cauchy sequence in $(L_{1,\sigma},\|\cdot\|_{1,\sigma})$ (here $U_\mu^0$ is understood to be the identity on $L_{1,\sigma}$). We can therefore define another linear operator $V_\mu : L_{1,\sigma}\to L_{1,\sigma}$ by $V_\mu := \sum_{k=0}^\infty U_\mu^k$. Simple standard arguments from functional analysis suffice to prove the following properties of the Neumann series $V_\mu$ (see Heuser (1982), §8).

Lemma 4. (i) $V_\mu$ is continuous.
(ii) If $f, g\in L_{1,\sigma}$ are such that $f = g + U_\mu f$, then $f = V_\mu g$.

For the following lemmas we consider a family $\{\mu_\epsilon\}_{0\le\epsilon\le 1}$ of probability measures on $[0,1]$. We assume that $\mu_0(\{1\})\ne 1$ and fix some $\sigma$, $0 < \sigma < -\log m_1(\mu_0)$. We begin our local analysis of the functional $\Psi$ by proving local boundedness.

Lemma 5. If
\[
  \lim_{\epsilon\to 0}\,\sup_{0\le x\le 1}\bigl|\mu_\epsilon\bigl((x,1]\bigr)-\mu_0\bigl((x,1]\bigr)\bigr| = 0,
\]
then
\[
  \limsup_{\epsilon\to 0}\,\bigl\|\Psi(\mu_\epsilon)\bigr\|_{1,\sigma} < \infty.
\]

Proof. Choose $\theta_0\in(\sigma,-\log m_1(\mu_0))$ and $\eta\in(m_1(\mu_0),1)$ such that $e^{\theta_0}\eta < 1$. Now choose $\kappa > 0$ such that
\[
  e^\theta\bigl(1+\kappa\theta\eta\bigr) \le 1+\kappa\theta \quad\text{for } 0\le\theta\le\theta_0. \tag{10}
\]
Let $\epsilon_0 > 0$ be such that $m_1(\mu_\epsilon)\le\eta$ for all $\epsilon\le\epsilon_0$ (note that the assumptions on the support of the measures imply convergence of the first moments). For a probability measure $\mu$ let $M_\mu$, $M_\mu(\theta) := \int e^{\theta x}\,\mu(dx)$, be the associated moment generating function; if $\mu = \mathcal{L}(X)$ we simply write $M_X$. We claim that
\[
  M_{\Psi(\mu_\epsilon)}(\theta_0) \le 1+\kappa\theta_0 \quad\text{for all } \epsilon\le\epsilon_0. \tag{11}
\]
To prove this for a given $\epsilon\le\epsilon_0$ we note that $\Psi(\mu_\epsilon)$ is the distribution of the random variable $Y = 1+\sum_{k=1}^\infty\prod_{i=1}^k X_i$ with $\{X_i\}$ independent, $\mathcal{L}(X_i) = \mu_\epsilon$. This variable is the monotone limit of the variables $Y_n := 1+\sum_{k=1}^n\prod_{i=1}^k X_i$ as $n\to\infty$, so (11) follows if we can show that
\[
  M_{Y_n}(\theta) \le 1+\kappa\theta \quad\text{for all } \theta\in[0,\theta_0],\ n\in\mathbb{N}. \tag{12}
\]
For $n=0$ this is obvious from (10) since $Y_0\equiv 1$. We have
\[
  Y_{n+1} = 1 + X_1\cdot\Bigl(1+\sum_{k=1}^n\prod_{i=1}^k X_{i+1}\Bigr),
\]
which implies $\mathcal{L}(Y_{n+1}) = \mathcal{L}(1+X_{n+1}Y_n)$. Therefore, if (12) holds for some $n$, then
\[
\begin{aligned}
  M_{Y_{n+1}}(\theta) &= E\exp\bigl(\theta(1+X_{n+1}Y_n)\bigr)\\
  &= e^\theta\int_{[0,1]} M_{Y_n}(x\theta)\,\mu_\epsilon(dx)\\
  &\le e^\theta\int_{[0,1]}(1+\kappa x\theta)\,\mu_\epsilon(dx)\\
  &= e^\theta\bigl(1+\kappa\theta m_1(\mu_\epsilon)\bigr)\\
  &\le 1+\kappa\theta.
\end{aligned}
\]
This completes the inductive proof of (12), and hence (11). Using
\[
\begin{aligned}
  \|\Psi(\mu_\epsilon)\|_{1,\sigma}
  &= \int_1^\infty e^{\sigma x}\,\Psi(\mu_\epsilon)\bigl((x,\infty)\bigr)\,dx
  = \int_{(1,\infty)}\int_1^y e^{\sigma x}\,dx\,\Psi(\mu_\epsilon)(dy)\\
  &\le \frac{1}{\sigma}\,M_{\Psi(\mu_\epsilon)}(\sigma)
  \le \frac{1}{\sigma}\,M_{\Psi(\mu_\epsilon)}(\theta_0),
\end{aligned}
\]
we see that this implies the statement of the lemma.

Next we prove continuity. As we deal with the limit $\epsilon\to 0$, we may assume in the proofs below that $\epsilon$ is small enough for the respective quantities to be defined.
Lemma 6. If
\[
  \lim_{\epsilon\to 0}\,\sup_{0\le x\le 1}\bigl|\mu_\epsilon\bigl((x,1]\bigr)-\mu_0\bigl((x,1]\bigr)\bigr| = 0,
\]
then
\[
  \lim_{\epsilon\to 0}\,\bigl\|\Psi(\mu_\epsilon)-\Psi(\mu_0)\bigr\|_{1,\sigma} = 0.
\]

Proof. From (7),
\[
  \Psi(\mu_\epsilon)-\Psi(\mu_0) = T\bigl(\mu_\epsilon-\mu_0,\Psi(\mu_\epsilon)\bigr) + T\bigl(\mu_0,\Psi(\mu_\epsilon)-\Psi(\mu_0)\bigr),
\]
and, by (9), this can be written
\[
  \Psi(\mu_\epsilon)-\Psi(\mu_0) = g + U_{\mu_0}\bigl(\Psi(\mu_\epsilon)-\Psi(\mu_0)\bigr),
\]
where $g = T\bigl(\mu_\epsilon-\mu_0,\Psi(\mu_\epsilon)\bigr)$. Applying Lemma 4(ii), we find
\[
  \Psi(\mu_\epsilon)-\Psi(\mu_0) = V_{\mu_0} T\bigl(\mu_\epsilon-\mu_0,\Psi(\mu_\epsilon)\bigr).
\]
Then, since $V_{\mu_0}$ is bounded (by Lemma 4(i)), we have
\[
\begin{aligned}
  \|\Psi(\mu_\epsilon)-\Psi(\mu_0)\|_{1,\sigma}
  &\le \|V_{\mu_0}\|\,\bigl\|T\bigl(\mu_\epsilon-\mu_0,\Psi(\mu_\epsilon)\bigr)\bigr\|_{1,\sigma}\\
  &\le \|V_{\mu_0}\|\, e^\sigma\,\|\mu_\epsilon-\mu_0\|_\infty\Bigl(\|\Psi(\mu_\epsilon)\|_{1,\sigma}+\frac{e^\sigma}{\sigma}\Bigr),
\end{aligned}
\]
by Lemma 3(ii). The right-hand side tends to zero as $\epsilon$ tends to zero by the assumptions of this lemma and by Lemma 5.

For the final step we need a suitable form of continuity of $T$ with respect to its second argument; note that the inequality in Lemma 3(ii) is not sufficient for this purpose.

Lemma 7. If $\{\nu_\epsilon\}_{0\le\epsilon\le 1}$ is a family of probability measures in $L_{1,\sigma}$ with
\[
  \lim_{\epsilon\to 0}\|\nu_\epsilon-\nu_0\|_{1,\sigma} = 0,
\]
then, for any $f\in D[0,1]$ with $f(1)=0$,
\[
  \lim_{\epsilon\to 0}\bigl\|T(f,\nu_\epsilon)-T(f,\nu_0)\bigr\|_{1,\sigma} = 0.
\]

Proof. We write $1_A$ for the indicator function of the set $A$. For $0\le a<b\le 1$ we obtain
\[
\begin{aligned}
  \int_1^\infty e^{\sigma x}\,&\Bigl|\int_{[1,\infty)} 1_{[a,b)}\Bigl(\frac{x-1}{y}\Bigr)\,\nu_\epsilon(dy) - \int_{[1,\infty)} 1_{[a,b)}\Bigl(\frac{x-1}{y}\Bigr)\,\nu_0(dy)\Bigr|\,dx\\
  &\le \int_{1+b}^\infty e^{\sigma x}\,\Bigl|\nu_\epsilon\Bigl(\Bigl(\frac{x-1}{b},\infty\Bigr)\Bigr)-\nu_0\Bigl(\Bigl(\frac{x-1}{b},\infty\Bigr)\Bigr)\Bigr|\,dx\\
  &\qquad + \int_{1+a}^\infty e^{\sigma x}\,\Bigl|\nu_\epsilon\Bigl(\Bigl(\frac{x-1}{a},\infty\Bigr)\Bigr)-\nu_0\Bigl(\Bigl(\frac{x-1}{a},\infty\Bigr)\Bigr)\Bigr|\,dx\\
  &\le (a+b)\,e^\sigma\,\|\nu_\epsilon-\nu_0\|_{1,\sigma}.
\end{aligned}
\]
This shows that the statement of the lemma holds for $f = 1_{[a,b)}$, from which it follows easily that it also holds for finite linear combinations of such indicator functions. Now let $f$ be an arbitrary element of $D[0,1]$ with $f(1)=0$ and let
\[
  K := e^\sigma\sup_{0\le\epsilon\le 1}\|\nu_\epsilon\|_{1,\sigma} + \sigma^{-1}e^{2\sigma}.
\]
For any given $\delta>0$ we can find a function $g$ that can be written as a finite linear combination of indicator functions of intervals $[a,b)$, $0\le a<b\le 1$, and satisfies $\|f-g\|_\infty < \delta/(3K)$. For $g$ we can find an $\epsilon_0>0$ such that for all $\epsilon\in(0,\epsilon_0)$,
\[
  \bigl\|T(g,\nu_\epsilon)-T(g,\nu_0)\bigr\|_{1,\sigma} \le \delta/3.
\]
Using this, the bound
\[
  \bigl\|T(f,\nu_\epsilon)-T(f,\nu_0)\bigr\|_{1,\sigma}
  \le \bigl\|T(f-g,\nu_\epsilon)\bigr\|_{1,\sigma} + \bigl\|T(g,\nu_\epsilon)-T(g,\nu_0)\bigr\|_{1,\sigma} + \bigl\|T(g-f,\nu_0)\bigr\|_{1,\sigma},
\]
and Lemma 3(ii) we obtain
\[
  \bigl\|T(f,\nu_\epsilon)-T(f,\nu_0)\bigr\|_{1,\sigma} \le \delta
\]
for all $\epsilon\le\epsilon_0$.

We finally arrive at the main technical result of this subsection, the differentiability of the perpetuity functional.

Proposition 8. If
\[
  \lim_{\epsilon\to 0}\,\sup_{0\le x\le 1}\Bigl|\frac{1}{\epsilon}(\mu_\epsilon-\mu_0)\bigl((x,1]\bigr) - g(x)\Bigr| = 0
\]
for some $g\in D[0,1]$, then
\[
  \frac{1}{\epsilon}\bigl(\Psi(\mu_\epsilon)-\Psi(\mu_0)\bigr) \to V_{\mu_0} T\bigl(g,\Psi(\mu_0)\bigr) \quad\text{in } L_{1,\sigma}.
\]

Proof. With our basic curve $\epsilon\to\mu_\epsilon$ we associate the following functions $f_\epsilon$ and $h_\epsilon$,
\[
  f_\epsilon(x) := \frac{1}{\epsilon}\bigl(\Psi(\mu_\epsilon)-\Psi(\mu_0)\bigr)\bigl((x,\infty)\bigr), \qquad
  h_\epsilon(x) := T\Bigl(\frac{1}{\epsilon}(\mu_\epsilon-\mu_0),\Psi(\mu_\epsilon)\Bigr)\bigl((x,\infty)\bigr),
\]
for all $x\ge 1$. From (8) and (9) we have $f_\epsilon = h_\epsilon + U_{\mu_0} f_\epsilon$, so that $f_\epsilon = V_{\mu_0} h_\epsilon$ by Lemma 4(ii). The decomposition
\[
  h_\epsilon - T\bigl(g,\Psi(\mu_0)\bigr)
  = T\Bigl(\frac{1}{\epsilon}(\mu_\epsilon-\mu_0),\Psi(\mu_\epsilon)\Bigr) - T\bigl(g,\Psi(\mu_\epsilon)\bigr)
  + T\bigl(g,\Psi(\mu_\epsilon)\bigr) - T\bigl(g,\Psi(\mu_0)\bigr),
\]
together with Lemma 3(ii) and Lemma 7 shows that $T\bigl(g,\Psi(\mu_0)\bigr)$ is the limit of $h_\epsilon$ as $\epsilon\to 0$. The statement of the proposition now follows on using the continuity of $V_{\mu_0}$.

We may condense this proposition into a single formula,
\[
  \Psi'_{\mu_0} = V_{\mu_0}\, T\bigl(\,\cdot\,,\Psi(\mu_0)\bigr).
\]
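The formula $\Psi'_{\mu_0} = V_{\mu_0}T(\cdot,\Psi(\mu_0))$ also suggests a numerical scheme: approximate $h = T(g,\Psi(\mu_0))$ from a simulated sample of $\Psi(\mu_0)$ via (5), and then solve $f = h + U_{\mu_0}f$ by iteration, which sums the Neumann series. The sketch below (Python) is our own rough rendering under assumptions that are not part of the paper: $\mu_0 = \mathrm{unif}(0,1)$, for which the substitution $u = (x-1)/y$ gives $U_{\mu_0}f(x) = (x-1)\int_{(x-1)\vee 1}^\infty f(u)u^{-2}\,du$; a finite grid with tails truncated beyond $x_{\max}$; and the illustrative perturbation direction $g(t) = t(1-t)$, which is the tail function of the signed measure with density $2t-1$ on $[0,1]$.

import numpy as np

rng = np.random.default_rng(4)

# grid for tail functions on [1, xmax]; values beyond xmax are treated as 0
xmax, m = 40.0, 801
x = np.linspace(1.0, xmax, m)
dx = x[1] - x[0]

def U_unif(f):
    # For mu_0 = unif(0,1):  U_mu0 f(x) = (x-1) * int_{max(1, x-1)}^inf f(u)/u^2 du
    g = f / x**2
    areas = (g[:-1] + g[1:]) * dx / 2.0                      # trapezoidal panels
    tail = np.concatenate((np.cumsum(areas[::-1])[::-1], [0.0]))   # int_t^xmax g(u) du
    return (x - 1.0) * np.interp(np.maximum(1.0, x - 1.0), x, tail)

def perpetuity_sample(size, depth=60):
    y, p = np.ones(size), np.ones(size)
    for _ in range(depth):
        p *= rng.uniform(0.0, 1.0, size)
        y += p
    return y

# h = T(g, Psi(mu_0)) from (5), with the measure Psi(mu_0) replaced by a sample
g = lambda t: t * (1.0 - t)          # illustrative direction, g(1) = 0
Y = perpetuity_sample(100_000)
h = np.array([np.mean(np.where(Y >= max(xi - 1.0, 1.0),
                               g(np.minimum((xi - 1.0) / Y, 1.0)), 0.0))
              for xi in x])

# Neumann series: solve f = h + U_mu0 f by iterating; the partial sums
# converge since e^sigma * m1(mu_0) < 1 for sigma < log 2
f = np.zeros_like(x)
for _ in range(60):
    f = h + U_unif(f)
print("approximate tail of Psi'_mu0(g) at x = 2, 3, 5:",
      np.interp([2.0, 3.0, 5.0], x, f))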
3.3

Our aim in this subsection is to prove the differentiability of $\Psi$, now regarded as a mapping from $(D[0,1],\|\cdot\|_\infty)$ to $(D_\rho,\|\cdot\|_{\infty,\rho})$. The general strategy will be the same as in the previous subsection where we considered integral norms on the range space.

We first note that $U_\mu$, which can be written as
\[
  U_\mu f(x+1) = \int_{(0,1\wedge x]} f\Bigl(\frac{x}{y}\Bigr)\,\mu(dy) \quad\text{for all } x\ge 0,
\]
maps cadlag functions onto cadlag functions. To see this note that we can apply dominated convergence since
\[
  \Bigl|f\Bigl(\frac{x}{y}\Bigr)\Bigr| \le \frac{\|f\|_{\infty,\rho}}{\rho(x/y)} \le \frac{\|f\|_{\infty,\rho}}{\rho(1)} \quad\text{if } y\le x,
\]
as we assume that $\rho$ is increasing. Also, the continuity of $\rho$ implies that
\[
  \|f\|_{\infty,\rho} = \sup_{x\ge 1}\rho(x)\,|f(x-)|
\]
for all cadlag functions $f$ with $f(1-) = 0$, in particular for tail functions of signed measures on $[1,\infty)$ with total mass 0. We have the following analogue of Lemma 3. Note that
\[
  c_\rho := \sup_{x\ge 0}\frac{\rho(1+x)}{\rho(1\vee x)}
\]
is finite by assumption (1).

Lemma 9. (i) For all $f\in D_\rho$,
\[
  \|U_\mu f\|_{\infty,\rho} \le C(\mu,\rho)\,\|f\|_{\infty,\rho}.
\]
(ii) For all $f\in D[0,1]$ and all nonnegative measures $\nu$ concentrated on $[1,\infty)$,
\[
  \|T(f,\nu)\|_{\infty,\rho} \le c_\rho\,\|f\|_\infty\,\|\nu\|_{\infty,\rho}.
\]

Proof. The first part is immediate from
\[
\begin{aligned}
  \|U_\mu f\|_{\infty,\rho}
  &\le \sup_{x\ge 0}\,\rho(1+x)\int_{(0,1\wedge x]}\frac{1}{\rho(x/y)}\,\rho(x/y)\Bigl|f\Bigl(\frac{x}{y}\Bigr)\Bigr|\,\mu(dy)\\
  &\le \|f\|_{\infty,\rho}\,\sup_{x\ge 0}\,\rho(1+x)\int_{(0,1\wedge x]}\frac{1}{\rho(x/y)}\,\mu(dy),
\end{aligned}
\]
and the second part follows from
\[
\begin{aligned}
  \|T(f,\nu)\|_{\infty,\rho}
  &\le \sup_{x\ge 0}\,\rho(1+x)\int_{[x\vee 1,\infty)}\Bigl|f\Bigl(\frac{x}{y}\Bigr)\Bigr|\,\nu(dy)\\
  &\le \|f\|_\infty\,\sup_{x\ge 0}\frac{\rho(1+x)}{\rho(1\vee x)}\,\sup_{x\ge 1}\rho(x)\,\nu\bigl([x,\infty)\bigr)\\
  &\le \|f\|_\infty\, c_\rho\,\|\nu\|_{\infty,\rho}.
\end{aligned}
\]

If $C(\mu,\rho) < 1$ then $V_\mu$ can be defined as in Section 3.2, and Lemma 4 carries over without change, once $L_{1,\sigma}$ is replaced by $D_\rho$, and $e^\sigma m_1(\mu) < 1$ by $C(\mu,\rho) < 1$. We also need an analogue of Lemma 7.

Lemma 10. If $\{\nu_\epsilon\}_{0\le\epsilon\le 1}$ is a family of probability measures in $D_\rho$ with
\[
  \lim_{\epsilon\to 0}\|\nu_\epsilon-\nu_0\|_{\infty,\rho} = 0,
\]
then, for any $f\in D[0,1]$ with $f(1)=0$,
\[
  \lim_{\epsilon\to 0}\bigl\|T(f,\nu_\epsilon)-T(f,\nu_0)\bigr\|_{\infty,\rho} = 0.
\]

Proof. From
\[
\begin{aligned}
  \Bigl|\int_{[1,\infty)} 1_{[a,b)}\Bigl(\frac{x-1}{y}\Bigr)\,&\nu_\epsilon(dy) - \int_{[1,\infty)} 1_{[a,b)}\Bigl(\frac{x-1}{y}\Bigr)\,\nu_0(dy)\Bigr|\\
  &\le \Bigl|\nu_\epsilon\Bigl(\Bigl(\frac{x-1}{b},\infty\Bigr)\Bigr)-\nu_0\Bigl(\Bigl(\frac{x-1}{b},\infty\Bigr)\Bigr)\Bigr|
  + \Bigl|\nu_\epsilon\Bigl(\Bigl(\frac{x-1}{a},\infty\Bigr)\Bigr)-\nu_0\Bigl(\Bigl(\frac{x-1}{a},\infty\Bigr)\Bigr)\Bigr|
\end{aligned}
\]
it follows that
\[
\begin{aligned}
  \bigl\|T(1_{[a,b)},\nu_\epsilon)-T(1_{[a,b)},\nu_0)\bigr\|_{\infty,\rho}
  &\le \sup_{x\ge 1}\rho(x)\Bigl|\nu_\epsilon\Bigl(\Bigl(\frac{x-1}{b},\infty\Bigr)\Bigr)-\nu_0\Bigl(\Bigl(\frac{x-1}{b},\infty\Bigr)\Bigr)\Bigr|\\
  &\qquad + \sup_{x\ge 1}\rho(x)\Bigl|\nu_\epsilon\Bigl(\Bigl(\frac{x-1}{a},\infty\Bigr)\Bigr)-\nu_0\Bigl(\Bigl(\frac{x-1}{a},\infty\Bigr)\Bigr)\Bigr|.
\end{aligned}
\]
From the assumptions on $\rho$ we obtain
\[
  \sup_{x\ge 1}\frac{\rho(x)}{\rho\bigl(\frac{x-1}{a}\vee 1\bigr)} < \infty
\]
for all $a$ in $(0,1]$, so that the statement of the lemma holds for indicator functions of intervals $[a,b)$, $0\le a<b\le 1$. We can now proceed as in the proof of Lemma 7.

Let $\{\mu_\epsilon\}_{0\le\epsilon\le 1}$ again be a family of probability measures on $[0,1]$ with $\mu_0(\{1\})\ne 1$. We assume that $\rho(x) = O\bigl(e^{\sigma x}\bigr)$ for some $\sigma < -\log m_1(\mu_0)$.

Proposition 11. If
\[
  \lim_{\epsilon\to 0}\,\sup_{0\le x\le 1}\Bigl|\frac{1}{\epsilon}(\mu_\epsilon-\mu_0)\bigl((x,1]\bigr) - g(x)\Bigr| = 0
\]
for some $g\in D[0,1]$, then
\[
  \frac{1}{\epsilon}\bigl(\Psi(\mu_\epsilon)-\Psi(\mu_0)\bigr) \to V_{\mu_0} T\bigl(g,\Psi(\mu_0)\bigr) \quad\text{in } D_\rho.
\]

Proof. It follows from the proof of Lemma 5 and Markov's inequality that for all $\sigma_0 < -\log m_1(\mu_0)$ there exist an $\epsilon_0 > 0$ and a $\kappa_0 < \infty$ such that for all $\epsilon\in(0,\epsilon_0)$,
\[
  \Psi(\mu_\epsilon)\bigl([x,\infty)\bigr) \le \kappa_0\, e^{-\sigma_0 x} \quad\text{for all } x\ge 1.
\]
In particular, from our assumptions on $\rho$,
\[
  \limsup_{\epsilon\to 0}\,\bigl\|\Psi(\mu_\epsilon)\bigr\|_{\infty,\rho} < \infty.
\]
As in the proof of Lemma 6, this implies the continuity of $\Psi$, and the statement of the proposition now follows on using exactly the same arguments as in the proof of Proposition 8.
3.4

We now explain the step from the differentiability properties proved in the previous two subsections to the properties (AN) and (BW). The arguments are essentially the same as in Grübel and Pitts (1993), where a renewal theoretic setup is considered. Generally, the relationship between the differentiability of a functional and the transfer of asymptotic normality and the asymptotic validity of bootstrap confidence regions has been investigated by a number of authors, in particular Bickel and Freedman (1981), Gill (1989), Arcones and Giné (1992) and van der Vaart and Wellner (1996). We therefore content ourselves with a somewhat informal discussion; full details can easily be given along the lines of Grübel and Pitts (1993).

Let $(X_i)_{i\in\mathbb{N}}$ be a sequence of independent random variables with distribution $\mu$, where $\mu([0,1]) = 1$ and $\mu(\{1\})\ne 1$. Let $\hat F_n$ be the distribution function associated with the empirical distribution $\hat\mu_n = \mu_n(X_1,\dots,X_n)$. We then have
\[
  \sqrt{n}\,(\hat F_n - F) \to_d B\circ F, \tag{13}
\]
where $B\circ F$ is a time-changed Brownian bridge. By the Skorohod-Wichura-Dudley construction (see e.g. Shorack and Wellner (1986), p. 47) we can obtain a pathwise version, i.e. on a suitable probability space we have
\[
  \sqrt{n}\,(F_n^\circ - F)(\omega^\circ) \to B^\circ\circ F(\omega^\circ) \quad\text{in } D[0,1] \text{ for all } \omega^\circ, \tag{14}
\]
where the left and right hand sides of (13) and (14) respectively are equal to each other in distribution. We now apply the differentiability of $\Psi$ proved in Sections 3.2 and 3.3 respectively. The true distribution $\mu$ of the $X$-variables takes over the role of $\mu_0$, $\epsilon_n = n^{-1/2}$ and $\mu_{\epsilon_n}$ corresponds to $\mu_n$ in the current notation. The differentiability results now yield, with respect to the appropriate norm and pointwise in each $\omega^\circ$,
\[
  \sqrt{n}\,\bigl(\Psi(F_n^\circ) - \Psi(F)\bigr) \to \Psi'_\mu(B^\circ\circ F).
\]
Due to $\Psi(F_n^\circ) =_d \Psi(\hat F_n) = \hat G_n$ and $\Psi'_\mu(B^\circ\circ F) =_d \Psi'_\mu(B\circ F)$ this is the required weak convergence result for the empirical perpetuity.

For the analysis of the bootstrap we enlarge the above construction as in Bickel and Freedman (1981) and Gill (1989), the additional part modelling the resampling. For this, let $\{\xi_{ni} : n\in\mathbb{N},\ 1\le i\le n\}$ be an array of row-wise independent random variables, all uniformly distributed on the unit interval, and let $B^\dagger$ be a Brownian bridge, all defined on yet another probability space, such that
\[
  \sqrt{n}\,(H_n^\dagger - H)(\omega^\dagger) \to B^\dagger(\omega^\dagger) \quad\text{in } D[0,1] \text{ for all } \omega^\dagger.
\]
Here $H_n^\dagger$ denotes the empirical distribution function associated with the $n$th row of the $\xi$-array and $H(t) = t$ for $0\le t\le 1$. Then the random function $t\mapsto H_n^\dagger\bigl(F_n^\circ(t,\omega^\circ)\bigr)$ is uniformly distributed on the $n^n$ distribution functions associated with the probabilities $\mu_n\bigl(X_{i_1}(\omega^\circ),\dots,X_{i_n}(\omega^\circ)\bigr)$, $(i_1,\dots,i_n)\in I_n$, which implies that
\[
  R_n^\circ(\omega^\circ)(z) := P^\dagger\Bigl(\sqrt{n}\,\bigl\|\Psi\bigl(H_n^\dagger(F_n^\circ(\cdot,\omega^\circ))\bigr) - \Psi\bigl(F_n^\circ(\cdot,\omega^\circ)\bigr)\bigr\| \le z\Bigr),
\]
regarded as a random function on $\Omega^\circ$, has the same distribution as the bootstrap estimator $\hat R_n$. It remains to show that $R_n^\circ$ tends to $R$ $P^\circ$-almost surely. For this we again use the differentiability of $\Psi$. We have
\[
  \sqrt{n}\,\bigl(\Psi(H_n^\dagger(F_n^\circ)) - \Psi(F_n^\circ)\bigr)
  = \sqrt{n}\,\bigl(\Psi(H_n^\dagger(F_n^\circ)) - \Psi(F)\bigr)
  - \sqrt{n}\,\bigl(\Psi(F_n^\circ) - \Psi(F)\bigr). \tag{15}
\]
For the second term we obtain the limit $\Psi'_\mu(B^\circ\circ F)$. For the first term we use
\[
  \sqrt{n}\,\bigl(H_n^\dagger(F_n^\circ) - F\bigr)
  = \sqrt{n}\,\bigl(H_n^\dagger(F_n^\circ) - F_n^\circ\bigr) + \sqrt{n}\,\bigl(F_n^\circ - F\bigr). \tag{16}
\]
A separate argument shows that the time change in the first term on the right hand side does not matter, in the sense that we obtain the limit $B^\dagger\circ F$. Hence the right hand side of (16) converges to $B^\dagger\circ F + B^\circ\circ F$. Using the differentiability of $\Psi$ once more we therefore obtain
\[
  \sqrt{n}\,\bigl(\Psi(H_n^\dagger(F_n^\circ)) - \Psi(F)\bigr) \to \Psi'_\mu\bigl(B^\dagger\circ F + B^\circ\circ F\bigr).
\]
Here, now, is the decisive step: due to the linearity of the derivative a cancellation occurs, and we obtain the limit $\Psi'_\mu\bigl(B^\dagger\circ F\bigr)$ for the right hand side of (15). Note that this no longer depends on $\omega^\circ$, that this quantity has the same distribution as $\Psi'_\mu(B^\circ\circ F)$, and that $\|\Psi'_\mu(B\circ F)\|$ has distribution function $R$.

3.5

Recall that $\rho(x) = e^{\sigma x}$ with some $\sigma < \log 2$. We need some more notation. For a probability measure $\mu$ on $[0,1]$ let $\phi(\mu,\rho,\cdot) : [0,\infty)\to\mathbb{R}$ be defined by
\[
  \phi(\mu,\rho,x) := \rho(1+x)\int_{(0,1\wedge x]}\frac{1}{\rho(x/y)}\,\mu(dy).
\]
Clearly, $C(\mu,\rho) = \sup_{x\ge 0}\phi(\mu,\rho,x)$. Further, let $\mathrm{unif}(a,b)$ be the uniform distribution on the interval $(a,b)$. Finally, we introduce the function
\[
  \chi : [0,\infty)\to\mathbb{R}, \quad \chi(\sigma) := e^{2\sigma}\int_0^1 e^{-\sigma/y}\,dy.
\]
We claim that $\chi(\sigma) < 1$ for all $\sigma\in(0,\log 2)$, which is equivalent to
\[
  \int_\sigma^\infty \frac{1}{x^2}\,e^{-x}\,dx < \frac{1}{\sigma}\,e^{-2\sigma} \quad\text{for } 0<\sigma<\log 2. \tag{17}
\]
The Cauchy-Schwarz inequality yields
\[
  \Bigl(\int_\sigma^\infty \frac{1}{x^2}\,e^{-x}\,dx\Bigr)^2
  \le \int_\sigma^\infty \frac{1}{x^4}\,dx \int_\sigma^\infty e^{-2x}\,dx
  = \frac{1}{6\sigma^3}\,e^{-2\sigma},
\]
which shows that the inequality in (17) holds for $\sigma = \log 2$ (at this value $e^{2\sigma} = 4 < 6\sigma \approx 4.16$, so the right hand side above is smaller than $e^{-4\sigma}/\sigma^2$, the square of the bound required in (17)). From this, (17) will follow if we can show that nowhere on the interval of interest is the derivative of the left hand side of the inequality smaller than the derivative of the right hand side, i.e. that
\[
  -\frac{1}{\sigma^2}\,e^{-\sigma} \ge -\frac{2}{\sigma}\,e^{-2\sigma} - \frac{1}{\sigma^2}\,e^{-2\sigma} \quad\text{for } 0<\sigma<\log 2.
\]
This in turn is equivalent to $e^\sigma \le 1+2\sigma$ for $0<\sigma<\log 2$, which is obviously true.
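The claim $\chi(\sigma) < 1$ on $(0,\log 2)$ can also be confirmed numerically; the minimal check below (Python; the grid of $\sigma$ values is our own choice) evaluates $\chi(\sigma) = e^{2\sigma}\int_0^1 e^{-\sigma/y}\,dy$ by quadrature.

import numpy as np
from scipy.integrate import quad

def chi(sigma):
    # chi(sigma) = exp(2*sigma) * int_0^1 exp(-sigma/y) dy
    val, _ = quad(lambda y: np.exp(-sigma / y) if y > 0.0 else 0.0, 0.0, 1.0)
    return np.exp(2.0 * sigma) * val

for sigma in np.linspace(0.05, np.log(2.0), 8):
    print(f"sigma = {sigma:.3f}   chi(sigma) = {chi(sigma):.4f}")
# all printed values should lie strictly below 1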
With $\mu = \mathrm{unif}(0,\theta)$, $0<\theta\le 1$, we obtain
\[
\begin{aligned}
  \sup_{x\ge\theta}\phi(\mu,\rho,x)
  &= \sup_{x\ge\theta}\, e^{\sigma}\,\frac{1}{\theta}\int_0^\theta e^{\sigma x(1-1/y)}\,dy\\
  &= e^{\sigma(1+\theta)}\,\frac{1}{\theta}\int_0^\theta e^{-\sigma\theta/y}\,dy\\
  &\le e^{2\sigma}\int_0^1 e^{-\sigma/y}\,dy = \chi(\sigma),
\end{aligned}
\]
and for $0<x<\theta$,
\[
  \phi(\mu,\rho,x)
  = e^{\sigma(1+x)}\,\frac{1}{\theta}\int_0^x e^{-\sigma x/y}\,dy
  = e^{\sigma(1+x)}\,\frac{x}{\theta}\int_0^1 e^{-\sigma/y}\,dy
  \le \chi(\sigma).
\]
In summary,
\[
  C\bigl(\mathrm{unif}(0,\theta),\rho\bigr) \le \chi(\sigma) \quad\text{for all } 0<\theta\le 1.
\]
Obviously, for any given $\gamma$, the set of all $\mu$ satisfying $C(\mu,\rho)\le\gamma$ is convex, and any continuous distribution which can be written as the weak limit of elements of this set is also an element of this set. These closure properties can be used to lift the above statement to all probability measures on $[0,1]$ with a decreasing density. Note that $\log 2$ is the upper bound for $-\log m_1(\mu)$ as $\mu$ ranges over the set of probability measures with decreasing density and support in $[0,1]$.

References

M. Aebi, P. Embrechts, and T. Mikosch, Stochastic discounting, aggregate claims, and the bootstrap, Adv. Appl. Prob. 26 (1994), 183-206.

M. A. Arcones and E. Giné, On the bootstrap of M-estimators and other statistical functionals, in "Exploring the Limits of Bootstrap" (R. LePage and L. Billard, Eds.), pp. 13-47, Wiley, New York, 1992.

P. Bickel and D. Freedman, Some asymptotic theory for the bootstrap, Ann. Statistics 9 (1981), 1196-1217.

N. G. de Bruijn, The asymptotic behaviour of a function occurring in the theory of primes, J. Indian Math. Soc. 15 (1951), 25-32.

D. Dufresne, The distribution of a perpetuity, with applications to risk theory and pension funding, Scand. Actuarial J. (1990), 39-79.

R. D. Gill, Non- and semi-parametric maximum likelihood estimators and the von Mises method (Part I), Scand. J. Statist. 16 (1989), 97-128.

C. M. Goldie and R. Grübel, Perpetuities with thin tails, Adv. in Applied Prob. 28 (1996), 463-480.

R. Grübel, Hoare's selection algorithm: a Markov chain approach, J. of Applied Prob. 35 (1998), 36-45.

R. Grübel and S. M. Pitts, Nonparametric estimation in renewal theory I: the empirical renewal function, Ann. Statistics 21 (1993), 1431-1451.

R. Grübel and U. Rösler, Asymptotic distribution theory for Hoare's selection algorithm, Adv. in Applied Prob. 28 (1996), 252-269.

H. Heuser, "Functional Analysis," Wiley, Chichester, 1982.

S. M. Pitts, Nonparametric estimation of the stationary waiting time distribution function for the GI/G/1 queue, Ann. Statistics 22 (1994a), 1428-1446.

S. M. Pitts, Nonparametric estimation of compound distributions with applications in insurance, Ann. Inst. Stat. Math. 46 (1994b), 537-555.

G. R. Shorack and J. A. Wellner, "Empirical Processes with Applications to Statistics," Wiley, New York, 1986.

A. W. van der Vaart and J. A. Wellner, "Weak Convergence and Empirical Processes," Springer Verlag, New York, 1996.

W. Vervaat, On a stochastic difference equation and a representation of nonnegative infinitely divisible random variables, Adv. Appl. Probab. 11 (1979), 750-783.