Abstract
The problem of evaluating different learning rules and other statistical estimators is analysed. A new general theory of statistical inference is developed by combining Bayesian decision theory with information geometry; the resulting framework is both coherent and invariant. For each sample a unique ideal estimate exists, given by an average over the posterior. An optimal estimate within a model is obtained by projecting the ideal estimate onto the model. The ideal estimate is a sufficient statistic of the posterior, so practical learning rules are functions of the ideal estimate. If the sole purpose of learning is to extract information from the data, the learning rule must also approximate the ideal estimator. This framework applies to both Bayesian and non-Bayesian methods, to arbitrary statistical models, and to supervised, unsupervised and reinforcement learning schemes.
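The two central objects of the abstract, the ideal estimate (an average over the posterior) and the optimal estimate within a model (a projection of the ideal estimate), can be made concrete in a minimal sketch. The following is an illustration only, not the paper's construction: it instantiates the idea for a Bernoulli parameter under a Beta prior, with made-up data and a hypothetical three-member model family, and uses KL divergence as the projection criterion.

```python
import math

def ideal_estimate(heads, n, a=1.0, b=1.0):
    """Posterior-mean predictive probability of heads.

    Under a Beta(a, b) prior the posterior is Beta(a + heads, b + n - heads);
    its mean is the 'average over the posterior' for a Bernoulli parameter.
    """
    return (a + heads) / (a + b + n)

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def project(p_ideal, model_family):
    """Project the ideal estimate onto a restricted model family:
    pick the member closest to it in KL divergence."""
    return min(model_family, key=lambda q: kl_bernoulli(p_ideal, q))

# Example: 7 heads in 10 tosses, uniform Beta(1, 1) prior, and a
# (hypothetical) restricted model family of three candidate probabilities.
p = ideal_estimate(7, 10)             # (1 + 7) / (2 + 10) = 2/3
best = project(p, [0.25, 0.5, 0.75])  # KL-nearest member: 0.75
```

The sketch mirrors the abstract's logic: any learning rule restricted to the family `[0.25, 0.5, 0.75]` can do no better than depend on the data through `p`, the ideal estimate, before projecting it onto the model.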
Cite this article
Zhu, H., Rohwer, R. Bayesian invariant measurements of generalization. Neural Process Lett 2, 28–31 (1995). https://doi.org/10.1007/BF02309013