Abstract
Validation can be used to detect when overfitting starts during supervised training of a neural network; training is then stopped before convergence to avoid the overfitting (“early stopping”). The exact criterion used for validation-based early stopping, however, is usually chosen in an ad-hoc fashion, or training is stopped interactively. This trick describes how to select a stopping criterion in a systematic fashion; it is a trick for either speeding up learning or improving generalization, whichever matters more in the particular situation. An empirical investigation on multi-layer perceptrons shows that there exists a tradeoff between training time and generalization: From the given mix of 1296 training runs using 12 different problems and 24 different network architectures, I conclude that slower stopping criteria allow for small improvements in generalization (here: about 4% on average), but cost much more training time (here: about a factor of 4 longer on average).
Previously published in: Orr, G.B. and Müller, K.-R. (Eds.): LNCS 1524, ISBN 978-3-540-65311-0 (1998).
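To make the idea concrete, here is a minimal sketch of validation-based early stopping in Python. The names `train_one_epoch` and `validation_error` are hypothetical placeholders for an actual training loop and validation pass, and the stopping rule shown (stop once the validation error exceeds its minimum so far by more than `alpha` percent) is only one representative of the kind of criteria compared in the chapter; larger values of `alpha` correspond to the "slower" criteria mentioned in the abstract.

```python
def train_with_early_stopping(train_one_epoch, validation_error,
                              max_epochs=2000, alpha=5.0, strip=5):
    """Sketch of validation-based early stopping: stop once the validation
    error has risen more than `alpha` percent above the best value seen so far.
    `train_one_epoch` and `validation_error` are assumed user-supplied callbacks."""
    best_err = float("inf")   # lowest validation error observed so far
    best_epoch = 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()                 # one pass over the training set
        if epoch % strip != 0:            # validate only every `strip` epochs
            continue
        err = validation_error()          # current error on the validation set
        if err < best_err:                # remember the best epoch so far
            best_err, best_epoch = err, epoch
        gl = 100.0 * (err / best_err - 1.0)   # relative "generalization loss" in percent
        if gl > alpha:                    # larger alpha = slower, more patient criterion
            break
    return best_epoch, best_err
```

In practice one would also save the network weights at each new validation minimum and restore them after stopping, so that the returned `best_epoch` corresponds to the model that is actually used.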
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this chapter
Prechelt, L. (2012). Early Stopping — But When? In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol. 7700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35289-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35288-1
Online ISBN: 978-3-642-35289-8