Abstract
Data normalization plays a fundamental role in Machine Learning (ML) algorithms. This research analyzes and compares the impact of several normalization techniques. Three techniques, namely Min-Max, Z-Score, and Unit Normalization, were applied as a preprocessing step before applying various ML algorithms. For Min-Max we used two variants: one scaling feature values to the interval \([0,1]\) and the other scaling them to the interval \([-1,1]\). The objective of this study is to determine the most appropriate normalization technique for each algorithm, with the aim of improving accuracy. Through this comparative analysis, we aim to provide reliable recommendations for improving the performance of ML algorithms through proper data normalization. The results reveal that a few algorithms are virtually unaffected by whether normalization is applied, regardless of the technique used. These findings contribute to the understanding of the relationship between data normalization and algorithm performance, allowing practitioners to make informed decisions about normalization techniques when using ML algorithms.
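As a rough illustration of the techniques named in the abstract, the following NumPy sketch implements the two Min-Max variants, Z-Score, and one common reading of Unit Normalization. The abstract does not define Unit Normalization, so scaling each sample to unit Euclidean (L2) length is an assumption here, as are the function names, the handling of constant features, and the toy data. It is not the authors' code.

```python
import numpy as np


def min_max(X, low=0.0, high=1.0):
    """Rescale each feature (column) linearly to the interval [low, high]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Assumption: a constant feature is mapped to `low` instead of dividing by zero.
    col_range[col_range == 0] = 1.0
    return low + (X - col_min) / col_range * (high - low)


def z_score(X):
    """Center each feature at zero mean and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)  # population standard deviation (ddof=0)
    # Assumption: a constant feature is left at zero after centering.
    std[std == 0] = 1.0
    return (X - X.mean(axis=0)) / std


def unit_norm(X):
    """Scale each sample (row) to unit Euclidean length.

    Assumption: "Unit Normalization" is read here as per-sample L2 scaling.
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return X / norms


# Hypothetical toy data: two features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

X_01 = min_max(X)               # Min-Max variant on [0, 1]
X_11 = min_max(X, -1.0, 1.0)    # Min-Max variant on [-1, 1]
X_z = z_score(X)
X_unit = unit_norm(X)
```

In an actual experiment the scaling statistics (minimum, maximum, mean, standard deviation) would normally be estimated on the training split only and then reused on the test split, so that no information leaks from test data into preprocessing.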
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cabello-Solorzano, K., Ortigosa de Araujo, I., Peña, M., Correia, L., J. Tallón-Ballesteros, A. (2023). The Impact of Data Normalization on the Accuracy of Machine Learning Algorithms: A Comparative Analysis. In: García Bringas, P., et al. 18th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2023). SOCO 2023. Lecture Notes in Networks and Systems, vol 750. Springer, Cham. https://doi.org/10.1007/978-3-031-42536-3_33
DOI: https://doi.org/10.1007/978-3-031-42536-3_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42535-6
Online ISBN: 978-3-031-42536-3
eBook Packages: Intelligent Technologies and Robotics (R0)