Distributed learning has led to the development of federated learning and cluster computing; however, the two approaches differ substantially. Therefore, this study uses a deep learning approach to investigate the distinction between federated learning and cluster computing. Specifically, the LeNet convolutional neural network model is used. Three frameworks were tested: Spark on Hadoop with four nodes, PySyft with four nodes, and native PyTorch with a single node. The results show that Spark on Hadoop can accelerate performance and facilitate applications that have large memory requirements. In addition, PySyft can protect data privacy but is slower than both Spark on Hadoop and native PyTorch. The three frameworks achieved comparable accuracy on IID data, while PySyft had the lowest accuracy on non-IID data. Therefore, if excluding sensitive data does not significantly affect training results, the results suggest using cluster computing (Spark on Hadoop). However, federated learning (PySyft) is recommended when sensitive data is required for training or positively affects training results, and time constraints are not an issue.
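
The IID versus non-IID distinction above can be illustrated with a minimal sketch of how samples might be divided among four nodes. This is not the partitioning procedure used in the study; the synthetic ten-class label list and the function names `make_labels`, `split_iid`, `split_non_iid`, and `class_counts` are assumptions introduced only for illustration.

```python
import random
from collections import Counter


def make_labels(n_per_class=100, n_classes=10):
    """Synthetic labels standing in for a real dataset (assumption)."""
    return [c for c in range(n_classes) for _ in range(n_per_class)]


def split_iid(labels, n_nodes=4, seed=0):
    """Shuffle sample indices and deal them out evenly across nodes,
    so each node's class mix resembles the overall distribution."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_nodes] for i in range(n_nodes)]


def split_non_iid(labels, n_nodes=4):
    """Sort indices by label and cut them into contiguous shards,
    so each node sees only a few classes."""
    idx = sorted(range(len(labels)), key=lambda i: labels[i])
    shard = len(idx) // n_nodes
    return [idx[i * shard:(i + 1) * shard] for i in range(n_nodes)]


def class_counts(labels, parts):
    """Per-node histogram of class labels."""
    return [Counter(labels[i] for i in part) for part in parts]


labels = make_labels()
iid_parts = split_iid(labels)
non_iid_parts = split_non_iid(labels)

for name, parts in (("IID", iid_parts), ("non-IID", non_iid_parts)):
    seen = [len(c) for c in class_counts(labels, parts)]
    print(f"{name}: classes seen per node = {seen}")
```

In the IID split every node holds examples of all classes, whereas in the label-sorted split each node holds only a small subset of them, which is the kind of skew under which locally trained models tend to diverge.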

The data used in this paper are publicly accessible after agreement, and details are available in the references section.
This work was supported by the National Science and Technology Council, Taiwan, R.O.C. [grant number NSTC 111-2221-E-025-008].
The authors declared that they have no conflict of interest.
Chang, JW., Hung, J.C. & Chu, TH. Exploring the distributed learning on federated learning and cluster computing via convolutional neural networks. Neural Comput & Applic 36, 2141–2153 (2024). https://doi.org/10.1007/s00521-023-09160-1
