Abstract
Bagging is a commonly used ensemble method in which each model is trained on a bootstrap sample of the training observations. Bootstrap sampling is typically used to decouple the models, but this rationale rests on the well-understood properties of deterministic algorithms. Applying bagging to stochastic algorithms carries the implicit assumption that both the variance due to the training data and the variance due to the algorithm need to be reduced. This assumption may not be correct: only the variance due to the algorithm may need to be targeted, and bootstrapping may cause an unnecessary increase in error due to bias because it reduces the number of unique training observations. Bootstrapped and non-bootstrapped ensembles are compared across multiple machine learning algorithms and data sets using an extended error decomposition. The results show that bootstrap sampling is often not required, as it increases error due to bias and internal variance. The prediction error associated with an algorithm and data set needs to be decomposed more fully in order to distinguish between these sources of error, so that the most appropriate ensemble method can be chosen.
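The abstract's argument rests on splitting prediction error by source. A common way to write such an extended decomposition (a sketch following the bias/variance literature; the precise form used in the paper may differ) is

E[(y - \hat{f}(x))^2] = \sigma^2 + \mathrm{bias}^2 + \mathrm{Var}_{\text{external}} + \mathrm{Var}_{\text{internal}},

where \sigma^2 is irreducible noise, \mathrm{Var}_{\text{external}} is the variance induced by resampling the training data, and \mathrm{Var}_{\text{internal}} is the variance induced by the algorithm's own randomness.

The contrast between bootstrapped and non-bootstrapped ensembles can be reproduced in a few lines. The sketch below is not the authors' experimental setup: the data set, base learner, and hyperparameters are illustrative assumptions, and it assumes scikit-learn >= 1.2 (where BaggingRegressor takes an estimator argument). With bootstrap=False and max_samples=1.0, every ensemble member sees the full training set, so member diversity comes only from the stochastic learner's internal randomness.

from sklearn.datasets import load_diabetes
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A stochastic base learner: a small neural network whose fit depends on
# its random initialisation (a source of internal variance).
base = make_pipeline(StandardScaler(),
                     MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000))

for bootstrap in (True, False):
    ensemble = BaggingRegressor(
        estimator=base,
        n_estimators=10,
        bootstrap=bootstrap,  # True: resample with replacement; False: each member gets the full data
        max_samples=1.0,
        random_state=0,       # each member still receives its own seed for the learner's randomness
    )
    ensemble.fit(X_train, y_train)
    mse = mean_squared_error(y_test, ensemble.predict(X_test))
    print(f"bootstrap={bootstrap}: test MSE = {mse:.3f}")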
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Owen, C.A., Dick, G., Whigham, P.A. (2025). Revisiting Bagging for Stochastic Algorithms. In: Gong, M., Song, Y., Koh, Y.S., Xiang, W., Wang, D. (eds) AI 2024: Advances in Artificial Intelligence. AI 2024. Lecture Notes in Computer Science(), vol 15443. Springer, Singapore. https://doi.org/10.1007/978-981-96-0351-0_12
DOI: https://doi.org/10.1007/978-981-96-0351-0_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0350-3
Online ISBN: 978-981-96-0351-0
eBook Packages: Computer Science, Computer Science (R0)