Abstract
Artificial Intelligence, and Machine Learning (ML) in particular, is increasingly present in all manner of software applications, and it already plays a major role in a variety of systems pertaining to Information Science, such as public transport, disease diagnosis support and other medical problems. This growth has raised concerns about possible environmental impacts, since ML models must be trained in datacentres that can impose a high ecological toll. With the aim of uncovering new ways of reducing the energy consumption of ML models, in this study we explore the energy impact of class balance in binary classification tasks by comparing a set of logistic regression models (LRMs) trained on a synthetic balanced dataset against another set trained on a synthetic unbalanced dataset. We focus on the total energy and time required to complete the task, and find that the ranking of the models by energy efficiency remained consistent regardless of class balance, but that those trained on the unbalanced dataset required between 1.42 and 1.5 times more energy to complete the tasks, despite requiring only around 1 s more of runtime. We finish by analysing the results and proposing the use of synthetic datasets to estimate the energy cost of different hyperparameter options for LRMs.
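The comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' experimental setup: the Gaussian data generator, the 50/50 and 10/90 class splits, the sample count, the learning rate and the function names are all hypothetical. It also records wall-clock time only; the abstract reports that energy and runtime diverged, so timing is not a substitute for an actual energy measurement.

```python
import math
import random
import time


def make_dataset(n_samples, positive_fraction, n_features=5, seed=0):
    """Hypothetical generator: Gaussian features, class means at +1 and -1."""
    rng = random.Random(seed)
    n_pos = round(n_samples * positive_fraction)
    rows = []
    for i in range(n_samples):
        label = 1 if i < n_pos else 0
        mean = 1.0 if label else -1.0
        rows.append(([rng.gauss(mean, 1.0) for _ in range(n_features)], label))
    rng.shuffle(rows)
    return rows


def sigmoid(z):
    # Branching on the sign keeps math.exp from overflowing.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic_regression(rows, epochs=20, lr=0.1):
    """Plain stochastic gradient descent on the log-loss."""
    w = [0.0] * len(rows[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in rows:
            err = predict_proba((w, b), x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def accuracy(model, rows):
    hits = sum((predict_proba(model, x) >= 0.5) == bool(y) for x, y in rows)
    return hits / len(rows)


# Same size and seed; only the class proportions differ (assumed splits).
balanced = make_dataset(2000, positive_fraction=0.5, seed=1)
unbalanced = make_dataset(2000, positive_fraction=0.1, seed=1)

results = {}
for name, rows in (("balanced", balanced), ("unbalanced", unbalanced)):
    start = time.perf_counter()
    model = train_logistic_regression(rows)
    seconds = time.perf_counter() - start
    results[name] = {"seconds": seconds, "accuracy": accuracy(model, rows)}
    print(f"{name}: {seconds:.3f} s, training accuracy {results[name]['accuracy']:.3f}")
```

To turn this into an energy comparison, the timed region would need to be wrapped with a hardware or software power meter, and each configuration repeated several times to average out measurement noise.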
Acknowledgments
This work was supported by the following projects: OASSIS (PID2021-122554OB-C31/AEI/10.13039/501100011033/FEDER, UE); EMMA (Project SBPLY/21/180501/000115, funded by CECD (JCCM) and FEDER funds); SEEAT (PDC2022-133249-C31, funded by MCIN/AEI/10.13039/501100011033 and European Union NextGenerationEU/PRTR); PLAGEMIS (TED2021-129245B-C22, funded by MCIN/AEI/10.13039/501100011033 and European Union NextGenerationEU/PRTR); UNION (2022-GRIN-34110).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gutiérrez, M., Calero, C., García, F., Moraga, M.Á. (2024). The Effects of Class Balance on the Training Energy Consumption of Logistic Regression Models. In: Araújo, J., de la Vara, J.L., Santos, M.Y., Assar, S. (eds) Research Challenges in Information Science. RCIS 2024. Lecture Notes in Business Information Processing, vol 513. Springer, Cham. https://doi.org/10.1007/978-3-031-59465-6_20
DOI: https://doi.org/10.1007/978-3-031-59465-6_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-59464-9
Online ISBN: 978-3-031-59465-6
eBook Packages: Computer Science, Computer Science (R0)