Revisiting Bagging for Stochastic Algorithms

  • Conference paper
  • In: AI 2024: Advances in Artificial Intelligence (AI 2024)

Abstract

Bagging is a commonly used ensemble method that involves bootstrap sampling of the training observations. Bootstraps are typically needed to decouple the models. However, this reasoning is based on the well-understood properties of deterministic algorithms. When bagging is applied to stochastic algorithms, there is an implicit assumption that both the variance due to the training data and the variance due to the algorithm need to be reduced. This assumption may not be correct, and only a reduction in the variance due to the algorithm may need to be targeted. Bootstrapping may cause an unnecessary increase in error due to bias because it reduces the number of unique training observations. Bootstrapped and non-bootstrapped ensembles are compared across multiple machine learning algorithms and data sets using an extended error decomposition. The results show that bootstrap sampling is often not required, as it increases error due to bias and internal variance. The prediction error associated with an algorithm and data set needs to be more fully decomposed in order to distinguish between the different sources of error. This allows the most appropriate ensemble method to be chosen.
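
To make the comparison concrete, the following is a minimal sketch (not the authors' code) that contrasts a bootstrapped ensemble with a non-bootstrapped ensemble of a stochastic base learner using scikit-learn. The dataset (Friedman #1), base learner (a small MLP), ensemble size, and hyperparameters are illustrative assumptions rather than the paper's experimental setup.

    # Illustrative sketch: compare a bootstrapped ensemble with a
    # non-bootstrapped ensemble of a stochastic base learner. All settings
    # below are assumptions for illustration, not the paper's configuration.
    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import BaggingRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor

    X, y = make_friedman1(n_samples=1000, noise=1.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    def build_ensemble(bootstrap):
        # MLPRegressor is stochastic (random weight initialisation), so the
        # ensemble members differ even when each one sees the full training set.
        return BaggingRegressor(
            MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000),
            n_estimators=10,
            max_samples=1.0,
            bootstrap=bootstrap,  # True: bootstrap samples; False: full training set
            random_state=0,
        )

    for bootstrap in (True, False):
        model = build_ensemble(bootstrap).fit(X_train, y_train)
        mse = mean_squared_error(y_test, model.predict(X_test))
        print(f"bootstrap={bootstrap}: test MSE = {mse:.3f}")

Because the base learner is stochastic, the ensemble members differ even when bootstrap sampling is disabled. The full decomposition described in the abstract, which separates bias, variance due to the training data, and internal variance due to the algorithm, would additionally require repeated draws of the training set and of the random seed; this sketch only contrasts the two sampling schemes on a single split.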

Author information

Correspondence to Caitlin A. Owen.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Owen, C.A., Dick, G., Whigham, P.A. (2025). Revisiting Bagging for Stochastic Algorithms. In: Gong, M., Song, Y., Koh, Y.S., Xiang, W., Wang, D. (eds) AI 2024: Advances in Artificial Intelligence. AI 2024. Lecture Notes in Computer Science, vol 15443. Springer, Singapore. https://doi.org/10.1007/978-981-96-0351-0_12

  • DOI: https://doi.org/10.1007/978-981-96-0351-0_12

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-96-0350-3

  • Online ISBN: 978-981-96-0351-0

  • eBook Packages: Computer Science, Computer Science (R0)
