Revisiting Bagging for Stochastic Algorithms

  • Conference paper
  • In: AI 2024: Advances in Artificial Intelligence (AI 2024)

Abstract

Bagging is a commonly used ensemble method that involves bootstrap sampling of the training observations. Bootstraps are typically needed to decouple the models. However, this reasoning is based on the well-understood properties of deterministic algorithms. When bagging is applied to stochastic algorithms, there is an implicit assumption that both the variance due to the training data and the variance due to the algorithm need to be reduced. This assumption may not be correct, and only a reduction in the variance due to the algorithm may need to be targeted. Bootstrapping may cause an unnecessary increase in error due to bias because it reduces the number of unique training observations. Bootstrapped and non-bootstrapped ensembles are compared across multiple machine learning algorithms and data sets using an extended error decomposition. The results show that bootstrap sampling is often not required, as it increases error due to bias and internal variance. The prediction error associated with an algorithm and data set needs to be more fully decomposed in order to distinguish between the different sources of error. This allows the most appropriate ensemble method to be chosen.
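
To make the comparison concrete, the following is a minimal sketch (not the authors' code) that contrasts a bootstrapped ensemble with a non-bootstrapped ensemble of a stochastic base learner using scikit-learn. The dataset (Friedman #1), base learner (a small MLP), ensemble size, and hyperparameters are illustrative assumptions rather than the paper's experimental setup.

    # Illustrative sketch: compare a bootstrapped ensemble with a
    # non-bootstrapped ensemble of a stochastic base learner. All settings
    # below are assumptions for illustration, not the paper's configuration.
    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import BaggingRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor

    X, y = make_friedman1(n_samples=1000, noise=1.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    def build_ensemble(bootstrap):
        # MLPRegressor is stochastic (random weight initialisation), so the
        # ensemble members differ even when each one sees the full training set.
        return BaggingRegressor(
            MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000),
            n_estimators=10,
            max_samples=1.0,
            bootstrap=bootstrap,  # True: bootstrap samples; False: full training set
            random_state=0,
        )

    for bootstrap in (True, False):
        model = build_ensemble(bootstrap).fit(X_train, y_train)
        mse = mean_squared_error(y_test, model.predict(X_test))
        print(f"bootstrap={bootstrap}: test MSE = {mse:.3f}")

Because the base learner is stochastic, the ensemble members differ even when bootstrap sampling is disabled. The full decomposition described in the abstract, which separates bias, variance due to the training data, and internal variance due to the algorithm, would additionally require repeated draws of the training set and of the random seed; this sketch only contrasts the two sampling schemes on a single split.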

Author information

Correspondence to Caitlin A. Owen.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Owen, C.A., Dick, G., Whigham, P.A. (2025). Revisiting Bagging for Stochastic Algorithms. In: Gong, M., Song, Y., Koh, Y.S., Xiang, W., Wang, D. (eds) AI 2024: Advances in Artificial Intelligence. AI 2024. Lecture Notes in Computer Science, vol 15443. Springer, Singapore. https://doi.org/10.1007/978-981-96-0351-0_12

  • DOI: https://doi.org/10.1007/978-981-96-0351-0_12

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-96-0350-3

  • Online ISBN: 978-981-96-0351-0

  • eBook Packages: Computer Science, Computer Science (R0)
