
Bayesian invariant measurements of generalization


Abstract

The problem of evaluating different learning rules and other statistical estimators is analysed. A new general theory of statistical inference is developed by combining Bayesian decision theory with information geometry; the resulting framework is both coherent and invariant. For each sample, a unique ideal estimate exists and is given by an average over the posterior. An optimal estimate within a model is obtained by projecting the ideal estimate onto the model. The ideal estimate is a sufficient statistic of the posterior, so practical learning rules are functions of the ideal estimator. If the sole purpose of learning is to extract information from the data, the learning rule must approximate the ideal estimator. This framework applies to Bayesian and non-Bayesian methods alike, to arbitrary statistical models, and to supervised, unsupervised and reinforcement learning schemes.
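
As a rough sketch of the construction the abstract describes (the notation below is ours, not the article's): given data D, a parametric family p(x | θ) and posterior p(θ | D), the ideal estimate can be read as the posterior-averaged distribution, and the optimal estimate within a model Q as its projection onto Q under a suitable divergence, here illustrated with the Kullback-Leibler divergence:

\[
\hat{p}(x) \;=\; \int p(x \mid \theta)\, p(\theta \mid D)\, d\theta,
\qquad
q^{*} \;=\; \operatorname*{arg\,min}_{q \in Q} \, D_{\mathrm{KL}}\!\left(\hat{p} \,\middle\|\, q\right).
\]

Which divergence and which projection make this construction invariant is exactly what the paper develops; the KL form above is only one plausible instantiation, not the article's definition.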

Cite this article

Zhu, H., Rohwer, R. Bayesian invariant measurements of generalization. Neural Process Lett 2, 28–31 (1995). https://doi.org/10.1007/BF02309013
