Posterior Contraction in Bayesian Inverse Problems Under Gaussian Priors

New Trends in Parameter Identification for Mathematical Models

Part of the book series: Trends in Mathematics


Abstract

We study Bayesian inference in statistical linear inverse problems with Gaussian noise and priors in a separable Hilbert space setting. We focus our interest on the posterior contraction rate in the small noise limit, under the frequentist assumption that there exists a fixed data-generating value of the unknown. In this Gaussian-conjugate setting, it is convenient to work with the concept of squared posterior contraction (SPC), which is known to upper bound the posterior contraction rate. We use abstract tools from regularization theory, which enable a unified approach to bounding SPC. We review and re-derive several existing results, and establish minimax contraction rates in cases which have not been considered until now. Existing results suffer from a certain saturation phenomenon, when the data-generating element is too smooth compared to the smoothness inherent in the prior. We show how to overcome this saturation in an empirical Bayesian framework by using a non-centered data-dependent prior.
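The conjugate-Gaussian setting can be illustrated in sequence space. The following is a minimal numerical sketch, not code from the chapter: the decay rates of the singular values of T, the prior eigenvalues, and the truth are all assumed, and the quantity tracked is a Monte Carlo proxy for the SPC (squared error of the posterior mean plus posterior spread) as the noise level δ shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 500
j = np.arange(1, n + 1)
t = j ** -1.0          # assumed singular values of T (mildly ill-posed)
c = j ** -2.0          # assumed eigenvalues of the prior covariance C_0
x_true = j ** -1.5     # assumed fixed data-generating element

spcs = []
for delta in [1e-1, 1e-2, 1e-3]:
    z = t * x_true + delta * rng.standard_normal(n)
    # conjugate Gaussian posterior, coordinatewise
    post_mean = c * t * z / (delta**2 + c * t**2)
    post_var = delta**2 * c / (delta**2 + c * t**2)
    # SPC proxy: squared error of the posterior mean plus posterior spread
    spcs.append(np.sum((post_mean - x_true) ** 2) + np.sum(post_var))
    print(f"delta={delta:.0e}  SPC proxy = {spcs[-1]:.3e}")
```

The proxy decreases as δ → 0, which is the contraction phenomenon studied in the chapter; the attainable rate depends on how the smoothness of the truth relates to that of the prior.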


Notes

  1. When considering the SPC uniformly over some class of inputs x, it follows from (3) that the best (uniform) contraction rate cannot be better than the corresponding minimax rate for statistical estimation.

References

  1. S. Agapiou, S. Larsson, A.M. Stuart, Posterior contraction rates for the Bayesian approach to linear ill-posed inverse problems. Stoch. Process. Appl. 123(10), 3828–3860 (2013). https://doi.org/10.1016/j.spa.2013.05.001

  2. S. Agapiou, J.M. Bardsley, O. Papaspiliopoulos, A.M. Stuart, Analysis of the Gibbs sampler for hierarchical inverse problems. SIAM/ASA J. Uncertain. Quantif. 2(1), 511–544 (2014)

  3. S. Agapiou, A.M. Stuart, Y.X. Zhang, Bayesian posterior contraction rates for linear severely ill-posed inverse problems. J. Inverse Ill-Posed Probl. 22(3), 297–321 (2014). https://doi.org/10.1515/jip-2012-0071

  4. L. Cavalier, Nonparametric statistical inverse problems. Inverse Probl. 24(3), 034004 (2008). https://doi.org/10.1088/0266-5611/24/3/034004

  5. M. Dashti, A.M. Stuart, The Bayesian approach to inverse problems (2013). ArXiv e-prints

  6. L.T. Ding, P. Mathé, Minimax rates for statistical inverse problems under general source conditions (2017). ArXiv e-prints, arXiv:1707.01706. https://doi.org/10.1515/cmam-2017-0055

  7. H.W. Engl, M. Hanke, A. Neubauer, Regularization of Inverse Problems. Mathematics and its Applications, vol. 375 (Kluwer Academic, Dordrecht, 1996). https://doi.org/10.1007/978-94-009-1740-8

  8. S. Ghosal, J.K. Ghosh, A.W. van der Vaart, Convergence rates of posterior distributions. Ann. Stat. 28(2), 500–531 (2000). https://doi.org/10.1214/aos/1016218228

  9. B. Hofmann, P. Mathé, Analysis of profile functions for general linear regularization methods. SIAM J. Numer. Anal. 45(3), 1122–1141 (2007). https://doi.org/10.1137/060654530

  10. B. Knapik, J.B. Salomond, A general approach to posterior contraction in nonparametric inverse problems. Bernoulli (to appear). arXiv preprint arXiv:1407.0335

  11. B.T. Knapik, A.W. van der Vaart, J.H. van Zanten, Bayesian inverse problems with Gaussian priors. Ann. Stat. 39(5), 2626–2657 (2011). https://doi.org/10.1214/11-AOS920

  12. B.T. Knapik, A.W. van der Vaart, J.H. van Zanten, Bayesian recovery of the initial condition for the heat equation. Commun. Stat. Theory Methods 42(7), 1294–1313 (2013). https://doi.org/10.1080/03610926.2012.681417

  13. B.T. Knapik, B.T. Szabó, A.W. van der Vaart, J.H. van Zanten, Bayes procedures for adaptive inference in inverse problems for the white noise model. Probab. Theory Relat. Fields 164, 1–43 (2015)

  14. M.S. Lehtinen, L. Päivärinta, E. Somersalo, Linear inverse problems for generalised random variables. Inverse Probl. 5(4), 599–612 (1989). http://stacks.iop.org/0266-5611/5/599

  15. K. Lin, S. Lu, P. Mathé, Oracle-type posterior contraction rates in Bayesian inverse problems. Inverse Probl. Imaging 9(3), 895–915 (2015). https://doi.org/10.3934/ipi.2015.9.895

  16. A. Mandelbaum, Linear estimators and measurable linear transformations on a Hilbert space. Z. Wahrsch. Verw. Gebiete 65(3), 385–397 (1984). https://doi.org/10.1007/BF00533743

  17. P. Mathé, Saturation of regularization methods for linear ill-posed problems in Hilbert spaces. SIAM J. Numer. Anal. 42(3), 968–973 (2004). https://doi.org/10.1137/S0036142903420947

  18. K. Ray, Bayesian inverse problems with non-conjugate priors. Electron. J. Stat. 7, 2516–2549 (2013). https://doi.org/10.1214/13-EJS851

  19. B.T. Szabó, A.W. van der Vaart, J.H. van Zanten, Empirical Bayes scaling of Gaussian priors in the white noise model. Electron. J. Stat. 7, 991–1018 (2013). https://doi.org/10.1214/13-EJS798

  20. S.J. Vollmer, Posterior consistency for Bayesian inverse problems through stability and regression results. Inverse Probl. 29(12), 125011 (2013). https://doi.org/10.1088/0266-5611/29/12/125011

Correspondence to Peter Mathé.

Appendix

Proof (of Lemma 1)

We first express the element \(x^\delta_\alpha\) in terms of \(z^\delta\).

$$\displaystyle \begin{aligned} x^\delta_\alpha &= C_0^{1/2}\left( \alpha I+B^{\ast}B \right)^{-1}B^{\ast}z^\delta + C_0^{1/2}s_{\alpha}(B^{\ast}B)C_0^{-1/2}m_\alpha^\delta\\ &= C_0^{1/2}\left( \alpha I+B^{\ast}B \right)^{-1}B^{\ast}z^\delta + C_0^{1/2}s_{\alpha}(B^{\ast}B)g_{\alpha}(B^{\ast}B)B^{\ast}z^\delta\\ &= C_0^{1/2}\left[\left( \alpha I+B^{\ast}B \right)^{-1} + s_{\alpha}(B^{\ast}B)g_{\alpha}(B^{\ast}B) \right]B^{\ast}z^\delta. \end{aligned} $$

We notice that

$$\displaystyle \begin{aligned}\left( \alpha I+B^{\ast}B \right)^{-1} + s_{\alpha}(B^{\ast}B)g_{\alpha}(B^{\ast}B) = \left( \alpha I+B^{\ast}B \right)^{-1} \left( I + \alpha g_{\alpha}(B^{\ast}B) \right). \end{aligned}$$

The expectation of the posterior mean with respect to the distribution generating \(z^\delta\) when \(x^{\ast}\) is given is thus

$$\displaystyle \begin{aligned}\mathbb E^{x^{\ast}} x^\delta_\alpha = C_0^{1/2}\left[\left( \alpha I+B^{\ast}B \right)^{-1} \left( I + \alpha g_{\alpha}(B^{\ast}B) \right)\right]B^{\ast}B C_0^{-1/2}x^{\ast}. \end{aligned}$$

For the next calculations we shall use that

$$\displaystyle \begin{aligned} I - \left( \alpha I+B^{\ast}B \right)^{-1} &\left( I + \alpha g_{\alpha}(B^{\ast}B) \right)B^{\ast}B\\ &= \left( \alpha I+B^{\ast}B \right)^{-1} \alpha\left( I - g_{\alpha}(B^{\ast}B)B^{\ast}B \right)\\ &= s_{\alpha}(B^{\ast}B)r_{\alpha}(B^{\ast}B). \end{aligned} $$

Therefore we rewrite

$$\displaystyle \begin{aligned} x^{\ast} - \mathbb E^{x^{\ast}} x^\delta_\alpha & = C_0^{1/2}\left[ I - \left( \alpha I+B^{\ast}B \right)^{-1} \left( I + \alpha g_{\alpha}(B^{\ast}B) \right) B^{\ast}B\right] C_0^{-1/2}x^{\ast}\\ &= C_0^{1/2}s_{\alpha}(B^{\ast}B)r_{\alpha}(B^{\ast}B)C_0^{-1/2}x^{\ast}, \end{aligned} $$

which proves the first assertion. The variance is \(\mathbb E^{x^{\ast}} \left\| x^\delta_\alpha - \mathbb E^{x^{\ast}}x^\delta_\alpha \right\|^{2}\), and this can be written as in (8), by using similar reasoning as for the bias term.
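The bias identity just proved can be checked numerically in a diagonal, finite-dimensional model. The sketch below is illustrative only: the eigenvalue decays and the Tikhonov choice \(g_\alpha(t)=1/(\alpha+t)\) (for which \(r_\alpha = s_\alpha\)) are assumptions, not choices made in the chapter.

```python
import numpy as np

n, alpha = 50, 1e-2
j = np.arange(1, n + 1)
tau = j ** -2.0                 # assumed eigenvalues of T*T
c0 = j ** -1.5                  # assumed eigenvalues of C_0
bb = c0 * tau                   # eigenvalues of B*B = C_0^{1/2} T*T C_0^{1/2}
x_star = j ** -1.0              # assumed data-generating element

g = 1.0 / (alpha + bb)          # assumed Tikhonov choice g_alpha
s = alpha / (alpha + bb)        # s_alpha(B*B) = alpha (alpha I + B*B)^{-1}
r = 1.0 - bb * g                # r_alpha(B*B) = I - B*B g_alpha(B*B)

# E^{x*} x^delta_alpha = C_0^{1/2} (alpha I + B*B)^{-1} (I + alpha g) B*B C_0^{-1/2} x*
mean = np.sqrt(c0) * (1.0 + alpha * g) * bb * x_star / ((alpha + bb) * np.sqrt(c0))
lhs = x_star - mean
rhs = np.sqrt(c0) * s * r * x_star / np.sqrt(c0)
err = np.max(np.abs(lhs - rhs))
print(err)                      # agreement up to rounding
```

Since all operators commute in this diagonal example, both sides reduce to \(\alpha^2(\alpha+b_j)^{-2}x^{\ast}_j\) coordinatewise.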

Proof (of Proposition 1)

We notice that \(\left\| I + \alpha g_{\alpha}(B^{\ast}B) \right\|\leq 1 + \gamma_{\ast}\), which gives

$$\displaystyle \begin{aligned} V^\delta(\alpha) &= \delta^{2} \mathrm{tr}\left[\left( I +\alpha g_{\alpha}(B^{\ast}B) \right)^{2} \left( \alpha I + B^{\ast}B \right)^{-2} B^{\ast}BC_0\right] \\ & \leq \delta^{2} \left( 1 + \gamma_{\ast} \right)^{2}\mathrm{tr}\left[\left( \alpha I + B^{\ast}B \right)^{-2} B^{\ast}BC_0\right] \end{aligned} $$

Since \(\left\| \left( \alpha I+B^{\ast}B \right)^{-1} B^{\ast}B \right\|\leq 1\) we see that

$$\displaystyle \begin{aligned}V^\delta(\alpha) \leq \left( 1 + \gamma_{\ast} \right)^{2}\delta^{2} \mathrm{tr}\left[\left( \alpha I + B^{\ast}B \right)^{-1}C_0\right] = \left( 1 + \gamma_{\ast} \right)^{2}\mathrm{tr}\left[C^\delta(\alpha)\right], \end{aligned}$$

and the proof is complete.
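The trace bound of Proposition 1 can be verified on a diagonal example. Everything below is an illustrative assumption: the eigenvalue decays, and the Tikhonov choice \(g_\alpha(t)=1/(\alpha+t)\), for which \(\alpha g_\alpha(t)\leq 1\), i.e. \(\gamma_{\ast}=1\).

```python
import numpy as np

n, alpha, delta, gamma_star = 200, 1e-3, 1e-2, 1.0
j = np.arange(1, n + 1)
bb = j ** -2.0                  # assumed eigenvalues of B*B
c0 = j ** -1.5                  # assumed eigenvalues of C_0
g = 1.0 / (alpha + bb)          # assumed Tikhonov choice, alpha*g <= 1

# V^delta(alpha) = delta^2 tr[(I + alpha g)^2 (alpha I + B*B)^{-2} B*B C_0]
V = delta**2 * np.sum((1.0 + alpha * g) ** 2 * bb * c0 / (alpha + bb) ** 2)
# (1 + gamma_*)^2 tr[C^delta(alpha)] = (1 + gamma_*)^2 delta^2 tr[(alpha I + B*B)^{-1} C_0]
bound = (1.0 + gamma_star) ** 2 * delta**2 * np.sum(c0 / (alpha + bb))
print(V <= bound)               # True
```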

Proof (of Lemma 2)

Since \(C_0\) has finite trace, it is compact, and we use its eigenbasis \(u_{j}\), \(j = 1, 2, \dots\), arranged by decreasing eigenvalues. Under Assumption 1 this is also the eigenbasis for \(T^{\ast}T\). If \(\tau_{j}\), \(j = 1, 2, \dots\), denote the corresponding eigenvalues then we see that

$$\displaystyle \begin{aligned}T^{\ast}T = \sum_{j=1}^{\infty} \tau_{j} u_{j}\otimes u_{j}. \end{aligned}$$

Correspondingly, \(C_0 = \sum _{j=1}^{\infty } \left ( \psi ^{2} \right )^{-1}(\tau _{j}) u_{j}\otimes u_{j}\), which gives the first assertion. Moreover, the latter representation yields that

$$\displaystyle \begin{aligned}C_0^{1/2} = \sum_{j=1}^{\infty} \left( \left( \psi^{2} \right)^{-1}(\tau_{j}) \right)^{1/2} u_{j}\otimes u_{j}, \end{aligned}$$

such that

$$\displaystyle \begin{aligned} B^{\ast}B &= C_0^{1/2} T^{\ast}T C_0^{1/2} \\ & = \sum_{j=1}^{\infty}\left( \left( \psi^{2} \right)^{-1}(\tau_{j}) \right)^{1/2} \tau_{j} \left( \left( \psi^{2} \right)^{-1}(\tau_{j}) \right)^{1/2}u_{j}\otimes u_{j}\\ & = \sum_{j=1}^{\infty}\left( \left( \psi^{2} \right)^{-1}(\tau_{j}) \right) \tau_{j} u_{j}\otimes u_{j}\\ &= \sum_{j=1}^{\infty}\psi^{2}\left( \left( \left( \psi^{2} \right)^{-1}(\tau_{j}) \right) \right)\left( \left( \psi^{2} \right)^{-1}(\tau_{j}) \right) u_{j}\otimes u_{j}\\ &= \sum_{j=1}^{\infty}\varTheta_{\psi}^{2}\left( \left( \psi^{2} \right)^{-1}(\tau_{j}) \right) u_{j}\otimes u_{j}\\ &= \varTheta_{\psi}^{2}\left( C_0 \right), \end{aligned} $$

and the proof is complete.
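The identity \(B^{\ast}B = \varTheta_{\psi}^{2}(C_0)\) is easy to check numerically when all operators are diagonal in the same basis. The power-type link \(\psi^2(t)=t^p\) and the eigenvalue decay below are assumed for illustration.

```python
import numpy as np

p = 2.0                          # assumed power-type link psi^2(t) = t**p
j = np.arange(1, 101)
c0 = j ** -1.5                   # assumed eigenvalues of C_0
tau = c0 ** p                    # eigenvalues of T*T = psi^2(C_0), per Assumption 1
bb = np.sqrt(c0) * tau * np.sqrt(c0)   # eigenvalues of B*B = C_0^{1/2} T*T C_0^{1/2}
theta = c0 * c0 ** p             # Theta_psi^2(C_0), with Theta_psi^2(t) = t * psi^2(t)
err = np.max(np.abs(bb - theta))
print(err)                       # zero up to rounding
```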

Proof (of Proposition 2)

For the first item (1), we notice that \(\varphi \prec \varTheta_{\psi}^{2}\) if and only if \(\varphi(f^{2}(t)) \prec t\). The linear function \(t \mapsto t\) is a qualification of Tikhonov regularization with constant γ = 1. Thus, by Lemma 3 we have

$$\displaystyle \begin{aligned}b_{x^{\ast}}(\alpha) \leq \left\| r_{\alpha}(B^{\ast}B) \right\|{}_{} \left\| s_{\alpha}(B^{\ast}B)\varphi(f^{2}(B^{\ast}B)) \right\|{}_{} \leq \gamma_{0}\varphi(f^{2}(\alpha)), \end{aligned}$$

which completes the proof for this case. For item (2), we have that

$$\displaystyle \begin{aligned}b_{x^{\ast}}(\alpha)=\left\| s_{\alpha}(B^{\ast}B)x^{\ast} \right\|{}_{}.\end{aligned}$$

For any 0 < α ≤ 1, we have α + t ≤ 1 + t, hence

$$\displaystyle \begin{aligned}b_{x^{\ast}}(\alpha)=\alpha\left\| (\alpha I+B^{\ast}B)^{-1}x^{\ast} \right\|{}_{}\geq\alpha\left\| ( I+B^{\ast}B)^{-1}x^{\ast} \right\|{}_{}.\end{aligned}$$

We conclude that there exists a constant \(c_{1}=c_{1}(x^{\ast},\left\| B^{\ast}B \right\|)\) such that, for small α,

$$\displaystyle \begin{aligned}b_{x^{\ast}}(\alpha)\geq {c_{1}}\alpha. \end{aligned}$$

On the other hand, since \(t \prec \varphi(f^{2}(t))\), there exists a constant \(c_{2} > 0\), which depends only on the index functions φ, f and on \(\left\| B^{\ast}B \right\|\), such that

$$\displaystyle \begin{aligned}b_{x^{\ast}}(\alpha)=\alpha\left\| (\alpha I+B^{\ast}B)^{-1}x^{\ast} \right\|{}_{}\leq\alpha\left\| (B^{\ast}B)^{-1}\varphi(f^2(B^{\ast}B))w \right\|{}_{}\leq {c_{2}}\alpha. \end{aligned}$$

For item (3), we have that

$$\displaystyle \begin{aligned} b_{x^{\ast}}(\alpha) &\leq \left\| r_{\alpha}(B^{\ast}B)s_{\alpha}(B^{\ast}B)\varphi(f^{2}(B^{\ast}B)) \right\|{}_{}\\ & \leq \left\| s_{\alpha}(B^{\ast}B)B^{\ast}B \right\|{}_{}\left\| r_{\alpha}(B^{\ast}B)\varphi(f^{2}(B^{\ast}B))\left( B^{\ast}B \right)^{-1} \right\|{}_{}\\ &\leq \alpha \gamma \frac{\varphi(f^{2}(\alpha))}{\alpha} = \gamma \varphi(f^{2}(\alpha)), \end{aligned} $$

and the proof is complete.
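Item (2) is the saturation phenomenon: for the Tikhonov-type \(s_\alpha\), the bias \(b_{x^{\ast}}(\alpha)=\alpha\|(\alpha I+B^{\ast}B)^{-1}x^{\ast}\|\) stays of exact order α no matter how smooth \(x^{\ast}\) is. A minimal numerical sketch, with assumed eigenvalues and a deliberately very smooth truth:

```python
import numpy as np

j = np.arange(1, 201)
bb = j ** -2.0                  # assumed eigenvalues of B*B
x_star = j ** -8.0              # assumed, very smooth data-generating element

ratios = []
for alpha in [1e-2, 1e-4, 1e-6]:
    # b(alpha) = alpha * || (alpha I + B*B)^{-1} x* ||
    bias = alpha * np.sqrt(np.sum((x_star / (alpha + bb)) ** 2))
    ratios.append(bias / alpha)
    print(alpha, ratios[-1])    # b(alpha)/alpha stabilises near ||(B*B)^{-1} x*||
```

The printed ratios settle at a constant rather than decaying, so no parameter choice can push the bias below order α in this regime.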

Proof (of Lemma 4)

The continuity is clear. For the monotonicity we use the representation (15) to get

$$\displaystyle \begin{aligned} S_{T,C_0}(\alpha) - S_{T,C_0}(\alpha^{\prime}) & = \mathrm{tr}\left[\left( \alpha I+B^{\ast}B \right)^{-1}C_0\right] - \mathrm{tr}\left[\left( \alpha^{\prime} I + B^{\ast}B \right)^{-1}C_0\right]\\ &= \mathrm{tr}\left[\left( \alpha I+B^{\ast}B \right)^{-1}(\alpha^{\prime} - \alpha) \left( \alpha^{\prime} I + B^{\ast}B \right)^{-1}C_0\right]\\ & = (\alpha^{\prime} - \alpha) \mathrm{tr}\left[\left( \alpha I+B^{\ast}B \right)^{-1}\left( \alpha^{\prime} I + B^{\ast}B \right)^{-1}C_0\right]. \end{aligned} $$

The trace on the right hand side is positive. Indeed, if \((s_j, u_j, u_j)\) denotes the singular value decomposition of \(B^{\ast}B\) then this trace can be written as

$$\displaystyle \begin{aligned}\mathrm{tr}\left[\left( \alpha I+B^{\ast}B \right)^{-1}\left( \alpha^{\prime} I + B^{\ast}B \right)^{-1}C_0\right] = \sum_{j=1}^{\infty} \frac{1}{\alpha + s_{j}} \frac{1}{\alpha^{\prime} + s_{j}} \langle{ C_0 u_{j}},{u_{j}} \rangle, \end{aligned}$$

where the right hand side is positive since the operator \(C_0\) is positive definite. Thus, if α < α′ then \(S_{T,C_0}(\alpha) - S_{T,C_0}(\alpha^{\prime})\) is positive, which proves the first assertion.

The proof of the second assertion is simple, and hence omitted. To prove the last assertion we use the partial ordering of self-adjoint operators in Hilbert space, that is, we write A ≤ B if 〈Ax, x〉 ≤ 〈Bx, x〉, x ∈ X, for two self-adjoint operators A and B. Plainly, with \(a:= \left\| T^{\ast}T \right\|\), we have that \(T^{\ast}T \leq aI\). Multiplying from the left and right by \(C_0^{1/2}\), this yields \(B^{\ast}B \leq aC_0\), and thus for any α > 0 that \(\alpha I + B^{\ast}B \leq \alpha I + aC_0\). The function \(t\mapsto -1/t\), t > 0, is operator monotone, which gives \(\left( \alpha I + aC_0 \right)^{-1} \leq \left( \alpha I + B^{\ast}B \right)^{-1}\). Multiplying from the left and right by \(C_0^{1/2}\) again, we arrive at

$$\displaystyle \begin{aligned}C_0^{1/2}\left( \alpha I + aC_0 \right)^{-1}C_0^{1/2} \leq C_0^{1/2}\left( \alpha I + B^{\ast}B \right)^{-1}C_0^{1/2}. \end{aligned}$$

This in turn extends to the traces and gives that

$$\displaystyle \begin{aligned}\mathrm{tr}\left[C_0^{1/2}\left( \alpha I + aC_0 \right)^{-1}C_0^{1/2}\right]\leq \mathrm{tr}\left[ C_0^{1/2}\left( \alpha I + B^{\ast}B \right)^{-1}C_0^{1/2}\right] = S_{T,C_0}(\alpha). \end{aligned}$$

Now, let us denote by \(t_j,\ j\in \mathbb N\), the singular numbers of \(C_0\); then we can bound

$$\displaystyle \begin{aligned}S_{T,C_0}(\alpha) \geq \mathrm{tr}\left[\left( \alpha I + aC_0 \right)^{-1}C_0\right]\geq \sum_{t_{j}\geq \alpha/a} \frac{t_{j}}{\alpha + a t_{j}} \geq \frac 1 {2a} \#\left\{j,\ t_{j}\geq \frac \alpha a \right\}. \end{aligned}$$

If \(S_{T,C_0}(\alpha)\) were uniformly bounded from above, then there would exist a finite natural number, say N, such that \(t_{N} \geq \frac \alpha a > t_{N+1}\) for all α > 0 small enough. But this would imply that \(t_{N+1} = 0\), which contradicts the assumption that \(C_0\) is positive definite.
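The behaviour established in Lemma 4 (monotone decrease in α, unbounded growth as α → 0) is visible on a truncated diagonal example. The eigenvalue decays below are illustrative assumptions.

```python
import numpy as np

j = np.arange(1, 5001)
c0 = j ** -1.5                   # assumed eigenvalues of C_0 (trace class)
bb = j ** -4.0                   # assumed eigenvalues of B*B

def S(alpha):
    # S_{T,C_0}(alpha) = tr[(alpha I + B*B)^{-1} C_0], truncated to 5000 modes
    return np.sum(c0 / (alpha + bb))

alphas = [1e-1, 1e-3, 1e-5, 1e-7]
vals = [S(alpha) for alpha in alphas]
for alpha, v in zip(alphas, vals):
    print(alpha, v)              # values increase as alpha decreases
```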

Lemma 5

For t > 0 let \(\varTheta^2_{\psi}(t)=t\exp (-2qt^{-\frac {b}{1+2a}})\), for some q, b, a > 0. Then for small s we have \((\varTheta^2_{\psi})^{-1}(s)\sim (\log s^{-\frac 1{2q}})^{-\frac {1+2a}{b}}\).

Proof

Let

$$\displaystyle \begin{aligned} s=\varTheta^2_{\psi}(t)>0\end{aligned} $$
(19)

and observe that t is small if and only if s is small. Applying [3, Lem 4.5] for \(x = t^{-1}\) we get the result.
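The asymptotics of Lemma 5 can be checked numerically by inverting \(\varTheta^2_{\psi}\) with bisection and comparing against the claimed expression. The parameter values q, a, b below are assumptions for illustration.

```python
import math

q, a, b = 1.0, 0.5, 1.0          # assumed parameter values
beta = b / (1.0 + 2.0 * a)

def theta2(t):
    # Theta_psi^2(t) = t * exp(-2q t^{-b/(1+2a)})
    return t * math.exp(-2.0 * q * t ** (-beta))

def invert(s, lo=1e-12, hi=1.0):
    # bisection; theta2 is strictly increasing on (0, infinity)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if theta2(mid) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def claimed(s):
    # the asymptotic expression stated in Lemma 5
    return math.log(s ** (-1.0 / (2.0 * q))) ** (-(1.0 + 2.0 * a) / b)

ratios = [claimed(s) / invert(s) for s in [1e-30, 1e-100, 1e-300]]
print(ratios)                    # ratios approach 1 as s -> 0
```

The convergence is slow (logarithmic correction terms), which is typical for severely ill-posed scales.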

Proof (of Proposition 6)

In this example the explicit solution of Eq. (16) in Theorem 1 is more difficult. However, as discussed in Sect. 3.4, it suffices to asymptotically balance the squared bias and the posterior spread using an appropriate parameter choice α = α(δ). Indeed, under the stated choice of α the squared bias is of order

$$\displaystyle \begin{aligned} (\log(\alpha^{-1}))^{-\frac{2\beta}b}\leq \sigma^{-\frac{2\beta}b}(\log(\delta^{-2}))^{-\frac{2\beta}b} \end{aligned} $$

while the posterior spread term is of order

$$\displaystyle \begin{aligned} \frac{\delta^2}{\alpha}(\log(\alpha^{-1}))^{-\frac{2a}b}\leq(\log(\delta^{-2}))^{-\frac{2\beta}b}. \end{aligned} $$

Proof (of Proposition 8)

According to the considerations in Remark 10, it is straightforward to check that without preconditioning the best SPC rate that can be established is \(\delta ^{\frac {4+8a+8p}{3+4a+6p}}\) which proves item (1). In the preconditioned case, the explicit solution of Eq. (16) in Theorem 1, which in this case has the form

$$\displaystyle \begin{aligned}\exp({-2\beta\alpha^{-\frac{1}{1+2a+2p}}})=\delta^2\alpha^{-\frac{1+2p}{1+2a+2p}},\end{aligned}$$

is again difficult. However, as discussed in Sect. 3.4, it suffices to asymptotically balance the squared bias and the posterior spread using an appropriate parameter choice α = α(δ). Indeed, using [3, Lem 4.5] we have that the solution to the above equation behaves asymptotically as the stated choice of α, and substitution gives the claimed rate.

Proof (of Proposition 10)

We begin with items (1) and (3). The explicit solution of Eq. (16) in Theorem 1, which in this case has the form

$$\displaystyle \begin{aligned}\alpha^{\frac{\beta}q}=\frac{\delta^2}{\alpha}(\log(\alpha^{-1}))^{-2a},\end{aligned}$$

is difficult. As discussed in Sect. 3.4, it suffices to asymptotically balance the squared bias and the posterior spread using an appropriate parameter choice α = α(δ). Indeed, under the stated choice of α both quantities are bounded from above by \(\delta ^{\frac {2\beta }{\beta +q}}\). For item (2), according to the considerations in Remark 10, it is straightforward to check that without preconditioning the best SPC rate that can be established is \(\delta ^{\frac {4q}{\beta +q}}\).


Copyright information

© 2018 Springer International Publishing AG

About this chapter


Cite this chapter

Agapiou, S., Mathé, P. (2018). Posterior Contraction in Bayesian Inverse Problems Under Gaussian Priors. In: Hofmann, B., Leitão, A., Zubelli, J. (eds) New Trends in Parameter Identification for Mathematical Models. Trends in Mathematics. Birkhäuser, Cham. https://doi.org/10.1007/978-3-319-70824-9_1
