The Impact of Data Normalization on the Accuracy of Machine Learning Algorithms: A Comparative Analysis

Kelsy Cabello-Solorzano,
Isabela Ortigosa de Araujo,
Marco Peña,
Luís Correia &
…
Antonio J. Tallón-Ballesteros²¹

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 750)

Included in the following conference series: International Conference on Soft Computing Models in Industrial and Environmental Applications

Abstract

Data normalization plays a fundamental role in Machine Learning (ML) algorithms. This research analyzes and compares the impact of several normalization techniques. Three techniques, namely Min-Max, Z-Score, and Unit Normalization, were applied as a preprocessing step before running various ML algorithms. For Min-Max we used two variants, one normalizing feature values to the interval $[0,1]$ and the other to the interval $[-1,1]$. The objective of this study is to determine the most appropriate normalization technique for each algorithm, with the aim of improving accuracy. Through this comparative analysis, we aim to provide reliable recommendations for improving the performance of ML algorithms through proper data normalization. The results reveal that a few algorithms are virtually unaffected by normalization, whether it is applied or not and regardless of the technique used. These findings contribute to the understanding of the relationship between data normalization and algorithm performance, allowing practitioners to make informed decisions about normalization techniques when using ML algorithms.
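The techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the formulas are the common textbook definitions, and "Unit Normalization" is assumed here to mean scaling each sample to unit Euclidean length, which may differ from the definition used in the chapter.

```python
import math

def min_max(values, low=0.0, high=1.0):
    """Linearly rescale one feature column into [low, high]."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # Assumption: a constant feature maps to `low` to avoid dividing by zero.
        return [low for _ in values]
    return [low + (v - v_min) / span * (high - low) for v in values]

def z_score(values):
    """Centre one feature column on its mean and divide by its population std."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        # Assumption: a constant feature maps to all zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def unit_norm(sample):
    """Scale one sample (row) to unit Euclidean length (assumed definition)."""
    norm = math.sqrt(sum(v * v for v in sample))
    if norm == 0:
        return list(sample)
    return [v / norm for v in sample]

feature = [200.0, 400.0, 600.0]          # illustrative values, not from the paper
scaled_01 = min_max(feature)             # [0.0, 0.5, 1.0]
scaled_11 = min_max(feature, -1.0, 1.0)  # [-1.0, 0.0, 1.0]
standardized = z_score(feature)          # zero mean, unit variance
unit_row = unit_norm([3.0, 4.0])         # [0.6, 0.8]
```

In a real experiment the minimum, maximum, mean, and standard deviation would normally be estimated on the training split only and then reused on the test split, so that no test information leaks into the preprocessing.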

Author information

Authors and Affiliations

International University of Andalusia, Huelva, Spain
Kelsy Cabello-Solorzano & Marco Peña
University of Huelva, Huelva, Spain
Isabela Ortigosa de Araujo
LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
Luís Correia
Department of Electronic, Computer Systems and Automation Engineering, University of Huelva, Huelva, Spain
Antonio J. Tallón-Ballesteros

Corresponding author

Correspondence to Antonio J. Tallón-Ballesteros.

Editor information

Editors and Affiliations

Faculty of Engineering, University of Deusto, Bilbao, Spain
Pablo García Bringas
School of Industrial, Computer, University of Leon, León, Spain
Hilde Pérez García
Department of Mechanical Engineering, University of La Rioja, Logroño, Spain
Francisco Javier Martínez de Pisón
Data Science and Big Data Lab, Pablo de Olavide University, Seville, Spain
Francisco Martínez Álvarez
Data Science and Big Data Lab, Pablo de Olavide University, Seville, Spain
Alicia Troncoso Lora
Applied Computational Intelligence, University of Burgos, Burgos, Spain
Álvaro Herrero
Department of Industrial Engineering, University of A Coruña, A Coruña, Spain
José Luis Calvo Rolle
Department of Industrial Engineering, University of A Coruña, A Coruña, Spain
Héctor Quintián
Faculty of Science, University of Salamanca, Salamanca, Spain
Emilio Corchado

About this paper

Cite this paper

Cabello-Solorzano, K., Ortigosa de Araujo, I., Peña, M., Correia, L., Tallón-Ballesteros, A.J. (2023). The Impact of Data Normalization on the Accuracy of Machine Learning Algorithms: A Comparative Analysis. In: García Bringas, P., et al. 18th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2023). SOCO 2023. Lecture Notes in Networks and Systems, vol 750. Springer, Cham. https://doi.org/10.1007/978-3-031-42536-3_33

DOI: https://doi.org/10.1007/978-3-031-42536-3_33
Published: 31 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42535-6
Online ISBN: 978-3-031-42536-3
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
