Abstract
We study Bayesian inference in statistical linear inverse problems with Gaussian noise and priors in a separable Hilbert space setting. We focus on the posterior contraction rate in the small noise limit, under the frequentist assumption that there exists a fixed data-generating value of the unknown. In this Gaussian-conjugate setting, it is convenient to work with the concept of squared posterior contraction (SPC), which is known to upper bound the posterior contraction rate. We use abstract tools from regularization theory, which enable a unified approach to bounding SPC. We review and re-derive several existing results, and establish minimax contraction rates in cases not previously considered. Existing results suffer from a saturation phenomenon when the data-generating element is too smooth relative to the smoothness inherent in the prior. We show how to overcome this saturation in an empirical Bayesian framework by using a non-centered data-dependent prior.
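The quantities discussed above can be made concrete in a toy example. The following sketch computes the three SPC components (squared bias, variance of the posterior mean, posterior spread) exactly in a truncated diagonal Gaussian-conjugate model; the model, the decay exponents `a`, `p`, `beta`, and the truncation level are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

# Illustrative truncated diagonal ("sequence space") model
#   z_j = t_j x_j + delta * xi_j,  xi_j ~ N(0, 1) i.i.d.,
# with Gaussian prior x_j ~ N(0, c_j).  All parameter choices
# (a, p, beta, J) are hypothetical and for illustration only.
J = 2000
j = np.arange(1, J + 1, dtype=float)

a, p, beta = 1.0, 1.0, 2.0           # hypothetical decay exponents
t = j ** (-p)                        # singular values of T (mildly ill-posed)
c = j ** (-1 - 2 * a)                # prior eigenvalues of C_0 (finite trace)
x_star = j ** (-0.5 - beta)          # fixed data-generating element, smoothness beta

def spc(delta):
    """Squared posterior contraction = squared bias
    + variance of the posterior mean + posterior spread."""
    denom = delta ** 2 + c * t ** 2
    shrink = c * t ** 2 / denom      # multiplier of x*_j in the expected posterior mean
    bias2 = np.sum(((shrink - 1.0) * x_star) ** 2)
    var = delta ** 2 * np.sum((c * t / denom) ** 2)   # variance of the posterior mean
    spread = np.sum(delta ** 2 * c / denom)           # trace of the posterior covariance
    return bias2 + var + spread

for delta in [1e-1, 1e-2, 1e-3]:
    print(delta, spc(delta))         # SPC shrinks in the small noise limit
```

All three components are available in closed form here because the posterior is Gaussian and diagonal; the decomposition mirrors the bias/variance/spread split used throughout the chapter.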
Notes
- 1.
When considering the SPC uniformly over some class of inputs \(x^{\ast}\), it follows from (3) that the best (uniform) contraction rate cannot be better than the corresponding minimax rate for statistical estimation.
References
S. Agapiou, S. Larsson, A.M. Stuart, Posterior contraction rates for the Bayesian approach to linear ill-posed inverse problems. Stoch. Process. Appl. 123(10), 3828–3860 (2013). https://doi.org/10.1016/j.spa.2013.05.001
S. Agapiou, J.M. Bardsley, O. Papaspiliopoulos, A.M. Stuart, Analysis of the Gibbs sampler for hierarchical inverse problems. SIAM/ASA J. Uncertain. Quantif. 2(1), 511–544 (2014)
S. Agapiou, A.M. Stuart, Y.X. Zhang, Bayesian posterior contraction rates for linear severely ill-posed inverse problems. J. Inverse Ill-Posed Probl. 22(3), 297–321 (2014). https://doi.org/10.1515/jip-2012-0071
L. Cavalier, Nonparametric statistical inverse problems. Inverse Probl. 24(3), 034004, 19 pp. (2008). https://doi.org/10.1088/0266-5611/24/3/034004
M. Dashti, A.M. Stuart, The Bayesian approach to inverse problems (2013). ArXiv e-prints
L.T. Ding, P. Mathé, Minimax rates for statistical inverse problems under general source conditions (2017). arXiv:1707.01706. https://doi.org/10.1515/cmam-2017-0055
H.W. Engl, M. Hanke, A. Neubauer, Regularization of Inverse Problems. Mathematics and its Applications, vol. 375 (Kluwer Academic, Dordrecht, 1996). https://doi.org/10.1007/978-94-009-1740-8
S. Ghosal, J.K. Ghosh, A.W. van der Vaart, Convergence rates of posterior distributions. Ann. Stat. 28(2), 500–531 (2000). https://doi.org/10.1214/aos/1016218228
B. Hofmann, P. Mathé, Analysis of profile functions for general linear regularization methods. SIAM J. Numer. Anal. 45(3), 1122–1141 (2007). https://doi.org/10.1137/060654530
B. Knapik, J.B. Salomond, A general approach to posterior contraction in nonparametric inverse problems. Bernoulli (to appear). arXiv:1407.0335
B.T. Knapik, A.W. van der Vaart, J.H. van Zanten, Bayesian inverse problems with Gaussian priors. Ann. Stat. 39(5), 2626–2657 (2011). https://doi.org/10.1214/11-AOS920
B.T. Knapik, A.W. van der Vaart, J.H. van Zanten, Bayesian recovery of the initial condition for the heat equation. Commun. Stat. Theory Methods 42(7), 1294–1313 (2013). https://doi.org/10.1080/03610926.2012.681417
B.T. Knapik, B.T. Szabó, A.W. van der Vaart, J.H. van Zanten, Bayes procedures for adaptive inference in inverse problems for the white noise model. Probab. Theory Relat. Fields 164, 1–43 (2015)
M.S. Lehtinen, L. Päivärinta, E. Somersalo, Linear inverse problems for generalised random variables. Inverse Probl. 5(4), 599–612 (1989). http://stacks.iop.org/0266-5611/5/599
K. Lin, S. Lu, P. Mathé, Oracle-type posterior contraction rates in Bayesian inverse problems. Inverse Probl. Imaging 9(3), 895–915 (2015). https://doi.org/10.3934/ipi.2015.9.895
A. Mandelbaum, Linear estimators and measurable linear transformations on a Hilbert space. Z. Wahrsch. Verw. Gebiete 65(3), 385–397 (1984). https://doi.org/10.1007/BF00533743
P. Mathé, Saturation of regularization methods for linear ill-posed problems in Hilbert spaces. SIAM J. Numer. Anal. 42(3), 968–973 (2004). https://doi.org/10.1137/S0036142903420947
K. Ray, Bayesian inverse problems with non-conjugate priors. Electron. J. Stat. 7, 2516–2549 (2013). https://doi.org/10.1214/13-EJS851
B.T. Szabó, A.W. van der Vaart, J.H. van Zanten, Empirical Bayes scaling of Gaussian priors in the white noise model. Electron. J. Stat. 7, 991–1018 (2013). https://doi.org/10.1214/13-EJS798
S.J. Vollmer, Posterior consistency for Bayesian inverse problems through stability and regression results. Inverse Probl. 29(12), 125011 (2013). https://doi.org/10.1088/0266-5611/29/12/125011
Appendix
Proof (of Lemma 1)
We first express the element \(x^\delta_\alpha\) in terms of \(z^{\delta}\).
We notice that
The expectation of the posterior mean with respect to the distribution generating \(z^{\delta}\), when \(x^{\ast}\) is given, is thus
For the next calculations we shall use that
Therefore we rewrite
which proves the first assertion. The variance is \(\mathbb{E}^{x^{\ast}} \left\| x^\delta_\alpha - \mathbb{E}^{x^{\ast}}x^\delta_\alpha \right\|^{2}\), and this can be written as in (8), by using similar reasoning as for the bias term.
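The structure behind Lemma 1 — the posterior mean is linear in \(z^{\delta}\), so its bias and variance under the data-generating distribution have closed forms — can be checked numerically. The diagonal model, parameter values, and Monte Carlo setup below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo check of the variance term of the posterior mean in a
# toy diagonal model (all parameter choices are illustrative assumptions).
J, delta = 200, 0.05
j = np.arange(1, J + 1, dtype=float)
t = j ** (-1.0)            # singular values of T
c = j ** (-2.0)            # prior eigenvalues of C_0
x_star = j ** (-1.5)       # fixed data-generating element

gain = c * t / (delta ** 2 + c * t ** 2)   # posterior-mean multiplier of z_j

# Closed forms: expected posterior mean and its total variance
mean_of_mean = gain * t * x_star
var_closed = delta ** 2 * np.sum(gain ** 2)

# Monte Carlo over the noise in z^delta = T x* + delta * xi
M = 20000
xi = rng.standard_normal((M, J))
means = gain * (t * x_star + delta * xi)   # posterior means for M noise draws
var_mc = np.mean(np.sum((means - mean_of_mean) ** 2, axis=1))

print(var_closed, var_mc)  # the two values should agree to within a few percent
```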
Proof (of Proposition 1)
We notice that \(\left\| I + \alpha g_{\alpha}(B^{\ast}B) \right\| \leq 1 + \gamma_{\ast}\), which gives
Since \(\left\| \left( \alpha + B^{\ast}B \right)^{-1} B^{\ast}B \right\| \leq 1\) we see that
and the proof is complete.
Proof (of Lemma 2)
Since \(C_0\) has finite trace, it is compact, and we use the eigenbasis \(u_j\), \(j = 1, 2, \dots\), arranged by decreasing eigenvalues. Under Assumption 1 this is also the eigenbasis for \(T^{\ast}T\). If \(t_j\), \(j = 1, 2, \dots\), denote the eigenvalues, then we see that
Correspondingly, \(C_0 = \sum_{j=1}^{\infty} \left( \psi^{2} \right)^{-1}(\tau_{j})\, u_{j}\otimes u_{j}\), which gives the first assertion. Moreover, the latter representation yields that
such that
and the proof is complete.
Proof (of Proposition 2)
For the first item (1), we notice that \(\varphi \prec \varTheta_{\psi}^{2}\) if and only if \(\varphi(f^{2}(t)) \prec t\). The linear function \(t \mapsto t\) is a qualification of Tikhonov regularization with constant \(\gamma = 1\). Thus, by Lemma 3 we have
which completes the proof for this case. For item (2), we have that
For any \(0 < \alpha \leq 1\) we have \(\alpha + t \leq 1 + t\), hence
We conclude that there exists a constant \(c_{1}=c_{1}(x^{\ast}, \left\| B^{\ast}B \right\|)\), such that for small \(\alpha\) it holds
On the other hand, since \(t \prec \varphi(f^{2}(t))\), there exists a constant \(c_{2} > 0\) which depends only on the index functions \(\varphi\), \(f\) and on \(\left\| B^{\ast}B \right\|\), such that
For item (3), we have that
and the proof is complete.
Proof (of Lemma 4)
The continuity is clear. For the monotonicity we use the representation (15) to get
The trace on the right-hand side is positive. Indeed, if \((s_j, u_j, u_j)\) denotes the singular value decomposition of \(B^{\ast}B\), then this trace can be written as
where the right-hand side is positive since the operator \(C_0\) is positive definite. Thus, if \(\alpha < \alpha'\) then \(S_{T,C_0}(\alpha) - S_{T,C_0}(\alpha')\) is positive, which proves the first assertion.
The proof of the second assertion is simple, and hence omitted. To prove the last assertion we use the partial ordering of self-adjoint operators in Hilbert space, that is, we write \(A \leq B\) if \(\langle Ax, x\rangle \leq \langle Bx, x\rangle\), \(x \in X\), for two self-adjoint operators \(A\) and \(B\). Plainly, with \(a := \left\| T^{\ast}T \right\|\), we have that \(T^{\ast}T \leq aI\). Multiplying from the left and right by \(C_0^{1/2}\), this yields \(B^{\ast}B \leq aC_0\), and thus for any \(\alpha > 0\) that \(\alpha I + B^{\ast}B \leq \alpha I + aC_0\). The function \(t \mapsto -1/t\), \(t > 0\), is operator monotone, which gives \(\left( \alpha I + aC_0 \right)^{-1} \leq \left( \alpha I + B^{\ast}B \right)^{-1}\). Multiplying from the left and right by \(C_0^{1/2}\) again, we arrive at
This in turn extends to the traces and gives that
Now, let us denote by \(t_j,\ j\in \mathbb{N}\), the singular numbers of \(C_0\); then we can bound
If \(S_{T,C_0}(\alpha)\) were uniformly bounded from above, then there would exist a finite natural number, say \(N\), such that \(t_{N} \geq \frac{\alpha}{a} > t_{N+1}\) for \(\alpha > 0\) small enough. But this would imply that \(t_{N+1} = 0\), which contradicts the assumption that \(C_0\) is positive definite.
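The behaviour established in Lemma 4 — \(S_{T,C_0}\) is strictly decreasing in \(\alpha\) and blows up as \(\alpha \to 0\) — is easy to observe numerically. The diagonal model below, with \(S_{T,C_0}(\alpha) = \operatorname{tr}\,[C_0^{1/2}(\alpha I + B^{\ast}B)^{-1}C_0^{1/2}]\) reduced to a scalar sum, uses illustrative eigenvalue decays only.

```python
import numpy as np

# Numerical sketch of the spread function on a truncated diagonal model:
#   S(alpha) = sum_j c_j / (alpha + b_j),
# where c_j are the eigenvalues of C_0 and b_j = c_j * t_j^2 those of
# B*B = C_0^{1/2} T*T C_0^{1/2}.  Decay choices are illustrative assumptions.
J = 10 ** 6
j = np.arange(1, J + 1, dtype=float)
c = j ** (-2.0)          # eigenvalues of C_0 (finite trace)
b = c * j ** (-2.0)      # eigenvalues of B*B, with t_j = j^{-1}

def S(alpha):
    return np.sum(c / (alpha + b))

alphas = [1.0, 0.1, 0.01, 0.001]
print([S(alpha) for alpha in alphas])   # strictly increasing as alpha decreases
```

Each term \(c_j/(\alpha + b_j)\) grows as \(\alpha\) shrinks, and since \(C_0\) is positive definite the sum is unbounded as \(\alpha \to 0\), matching the lemma.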
Lemma 5
For \(t > 0\) let \(\varTheta^2_{\psi}(t)=t\exp(-2qt^{-\frac{b}{1+2a}})\), for some \(q, b, a > 0\). Then for small \(s\) we have \((\varTheta^2_{\psi})^{-1}(s)\sim \left(\log s^{-\frac{1}{2q}}\right)^{-\frac{1+2a}{b}}\).
Proof
Let
and observe that \(t\) is small if and only if \(s\) is small. Applying [3, Lem 4.5] for \(x = t^{-1}\) we get the result.
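The asymptotic equivalence in Lemma 5 can be checked by inverting \(\varTheta^2_{\psi}\) numerically with bisection and comparing against the logarithmic expression; the constants \(q, b, a\) below are illustrative choices, and the convergence of the ratio to 1 is logarithmically slow.

```python
import math

# Numerical check of Lemma 5 with illustrative constants q, b, a:
#   Theta2(t) = t * exp(-2 q t^{-b/(1+2a)}),  increasing in t,
# and for small s its inverse behaves like (log s^{-1/(2q)})^{-(1+2a)/b}.
q, b, a = 1.0, 1.0, 0.5
e = b / (1 + 2 * a)      # the exponent b/(1+2a), here 1/2

def theta2(t):
    return t * math.exp(-2 * q * t ** (-e))

def inverse(s, lo=1e-12, hi=10.0):
    # bisection: theta2 is strictly increasing on (0, infinity)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if theta2(mid) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

ratios = []
for s in [1e-4, 1e-16, 1e-100]:
    t_num = inverse(s)
    t_asy = math.log(s ** (-1 / (2 * q))) ** (-1 / e)   # the claimed asymptote
    ratios.append(t_num / t_asy)
print(ratios)   # decreasing toward 1 as s -> 0, but only logarithmically fast
```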
Proof (of Proposition 6)
In this example the explicit solution of Eq. (16) in Theorem 1 is more difficult to obtain. However, as discussed in Sect. 3.4, it suffices to asymptotically balance the squared bias and the posterior spread using an appropriate parameter choice α = α(δ). Indeed, under the stated choice of α the squared bias is of order
while the posterior spread term is of order
Proof (of Proposition 8)
According to the considerations in Remark 10, it is straightforward to check that without preconditioning the best SPC rate that can be established is \(\delta^{\frac{4+8a+8p}{3+4a+6p}}\), which proves item (1). In the preconditioned case, the explicit solution of Eq. (16) in Theorem 1, which in this case has the form
is again difficult. However, as discussed in Sect. 3.4, it suffices to asymptotically balance the squared bias and the posterior spread using an appropriate parameter choice α = α(δ). Indeed, using [3, Lem 4.5] we have that the solution to the above equation behaves asymptotically as the stated choice of α, and substitution gives the claimed rate.
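The balancing argument used here and in the surrounding proofs can be illustrated generically. Assuming, purely hypothetically, power-law orders \(\mathrm{bias}^2(\alpha) \asymp \alpha^{u}\) and \(\mathrm{spread}(\alpha,\delta) \asymp \delta^{2}/\alpha^{v}\) (the exponents \(u, v\) are placeholders, not the chapter's), bisection recovers the balancing choice \(\alpha(\delta) = \delta^{2/(u+v)}\) and hence the rate \(\delta^{2u/(u+v)}\):

```python
import math

# Generic bias/spread balancing sketch with hypothetical power-law orders:
#   bias^2(alpha) = alpha**u,   spread(alpha, delta) = delta**2 / alpha**v.
# Balancing the two gives alpha(delta) = delta**(2/(u+v)).
u, v = 1.0, 0.5   # placeholder exponents for illustration

def balance(delta, lo=1e-16, hi=1.0):
    # bisection on h(alpha) = bias^2 - spread, which is increasing in alpha
    for _ in range(200):
        mid = math.sqrt(lo * hi)             # geometric midpoint
        if mid ** u < delta ** 2 / mid ** v:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

for delta in [1e-2, 1e-4]:
    print(balance(delta), delta ** (2 / (u + v)))   # numerical vs closed form
```

The geometric midpoint is used because the crossing point spans many orders of magnitude in \(\alpha\); bisection in \(\log\alpha\) converges uniformly over that range.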
Proof (of Proposition 10)
We begin with items (1) and (3). The explicit solution of Eq. (16) in Theorem 1, which in this case has the form
is difficult. As discussed in Sect. 3.4, it suffices to asymptotically balance the squared bias and the posterior spread using an appropriate parameter choice α = α(δ). Indeed, under the stated choice of α both quantities are bounded from above by \(\delta ^{\frac {2\beta }{\beta +q}}\). For item (2), according to the considerations in Remark 10, it is straightforward to check that without preconditioning the best SPC rate that can be established is \(\delta ^{\frac {4q}{\beta +q}}\).
Copyright information
© 2018 Springer International Publishing AG
Cite this chapter
Agapiou, S., Mathé, P. (2018). Posterior Contraction in Bayesian Inverse Problems Under Gaussian Priors. In: Hofmann, B., Leitão, A., Zubelli, J. (eds) New Trends in Parameter Identification for Mathematical Models. Trends in Mathematics. Birkhäuser, Cham. https://doi.org/10.1007/978-3-319-70824-9_1
DOI: https://doi.org/10.1007/978-3-319-70824-9_1
Publisher Name: Birkhäuser, Cham
Print ISBN: 978-3-319-70823-2
Online ISBN: 978-3-319-70824-9
eBook Packages: Mathematics and Statistics (R0)