
Bayesian invariant measurements of generalization


Abstract

The problem of evaluating different learning rules and other statistical estimators is analysed. A new general theory of statistical inference is developed by combining Bayesian decision theory with information geometry; the resulting framework is both coherent and invariant. For each sample, a unique ideal estimate exists and is given by an average over the posterior. An optimal estimate within a model is obtained by projecting the ideal estimate onto the model. The ideal estimate is a sufficient statistic of the posterior, so practical learning rules are functions of the ideal estimator. If the sole purpose of learning is to extract information from the data, the learning rule must approximate the ideal estimator. This framework applies to Bayesian and non-Bayesian methods alike, to arbitrary statistical models, and to supervised, unsupervised and reinforcement learning schemes.
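
As a rough sketch of the construction the abstract describes (the notation below is ours, not the article's): given data D, a parametric family p(x | θ) and posterior p(θ | D), the ideal estimate can be read as the posterior-averaged distribution, and the optimal estimate within a model Q as its projection onto Q under a suitable divergence, here illustrated with the Kullback-Leibler divergence:

\[
\hat{p}(x) \;=\; \int p(x \mid \theta)\, p(\theta \mid D)\, d\theta,
\qquad
q^{*} \;=\; \operatorname*{arg\,min}_{q \in Q} \, D_{\mathrm{KL}}\!\left(\hat{p} \,\middle\|\, q\right).
\]

Which divergence and which projection make this construction invariant is exactly what the paper develops; the KL form above is only one plausible instantiation, not the article's definition.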

Cite this article

Zhu, H., Rohwer, R. Bayesian invariant measurements of generalization. Neural Process Lett 2, 28–31 (1995). https://doi.org/10.1007/BF02309013
