Abstract
Symbolic regression, as addressed in Genetic Programming (GP), very often amounts to approximate interpolation: GP algorithms try to fit the sample as well as possible, but no notion of generalization error is considered. As a consequence, overfitting, code bloat, and noisy data are problems that are not satisfactorily solved under this approach. Motivated by this situation, we revisit symbolic regression from the perspective of Machine Learning, a well-founded mathematical toolbox for predictive learning. We perform empirical comparisons between classical statistical methods (AIC and BIC) and methods based on Vapnik-Chervonenkis (VC) theory for regression problems under genetic training. The empirical comparisons suggest practical advantages of VC-based model selection. We conclude that VC theory provides a methodological framework for complexity control in Genetic Programming, even when its technical results do not seem to be directly applicable. As the main practical outcome, precise penalty functions founded on the notion of generalization error are proposed for evolving GP trees.
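To make the comparison concrete, the sketch below illustrates the standard penalization formulas commonly associated with the three criteria named in the abstract (AIC, BIC, and Vapnik-style VC-based penalization), not necessarily the paper's exact procedure. The function names, the noise-variance estimate noise_var, and the use of an estimated VC dimension h for a candidate GP tree are assumptions introduced for illustration only.

```python
import math

def aic_risk(emp_risk, d, n, noise_var):
    # Akaike-style penalized risk: empirical risk + 2*(d/n)*sigma^2,
    # where d is the number of free parameters and n the sample size.
    return emp_risk + 2.0 * (d / n) * noise_var

def bic_risk(emp_risk, d, n, noise_var):
    # Schwarz/BIC-style penalized risk: empirical risk + ln(n)*(d/n)*sigma^2.
    return emp_risk + math.log(n) * (d / n) * noise_var

def vc_penalized_risk(emp_risk, h, n):
    # Vapnik-style multiplicative penalization with p = h/n,
    # where h is an estimated VC dimension of the candidate model.
    p = h / n
    inner = 1.0 - math.sqrt(p - p * math.log(p) + math.log(n) / (2.0 * n))
    if inner <= 0.0:
        return float("inf")  # model too complex for the available sample
    return emp_risk / inner
```

In a model-selection loop over evolved GP trees, each candidate's raw training error would be replaced by one of these penalized risks, and the tree minimizing the penalized risk retained instead of the best interpolant.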
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Montaña, J.L., Alonso, C.L., Borges, C.E., de la Dehesa, J. (2011). Penalty Functions for Genetic Programming Algorithms. In: Murgante, B., Gervasi, O., Iglesias, A., Taniar, D., Apduhan, B.O. (eds) Computational Science and Its Applications - ICCSA 2011. Lecture Notes in Computer Science, vol 6782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21928-3_40
DOI: https://doi.org/10.1007/978-3-642-21928-3_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21927-6
Online ISBN: 978-3-642-21928-3