Abstract
Distributed learning has given rise to both federated learning and cluster computing, yet the two approaches differ substantially. This study therefore uses a deep learning workload, the LeNet convolutional neural network, to investigate the distinction between federated learning and cluster computing. Three frameworks were tested: Spark on Hadoop with four nodes, PySyft with four nodes, and native PyTorch on a single node. The results show that Spark on Hadoop accelerates training and supports applications with large memory requirements, while PySyft protects data privacy but is slower than both Spark on Hadoop and native PyTorch. All three frameworks achieved comparable accuracy on IID data distributions, whereas PySyft yielded the lowest accuracy on non-IID data. Accordingly, when excluding sensitive data does not significantly affect training results, cluster computing with Spark on Hadoop is recommended; when sensitive data is required for training or improves the trained model, and training time is not a constraint, federated learning with PySyft is recommended.
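For concreteness, the following is a minimal PyTorch sketch of a LeNet-5-style model of the kind used in these experiments, together with an illustrative shard-based non-IID client partition in the spirit of McMahan et al. (2017, cited below). This is a sketch under stated assumptions, not the authors' implementation: the exact layer sizes, the number of shards per client, and the helper names LeNet and non_iid_shards are assumptions made for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    # LeNet-5-style CNN for 28x28 grayscale inputs such as MNIST.
    # Layer sizes follow the classic LeNet-5, not necessarily the paper's exact variant.
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5, padding=2)  # 28x28 -> 28x28
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)            # 14x14 -> 10x10
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # -> 6 x 14 x 14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # -> 16 x 5 x 5
        x = torch.flatten(x, 1)                     # -> 400 features
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

def non_iid_shards(labels, num_clients, shards_per_client=2):
    # Sort examples by label, cut them into contiguous shards, and deal a few
    # shards to each client so that every client holds only a few classes.
    order = torch.argsort(labels)
    num_shards = num_clients * shards_per_client
    shard_size = len(labels) // num_shards
    shards = [order[i * shard_size:(i + 1) * shard_size] for i in range(num_shards)]
    deal = torch.randperm(num_shards).tolist()
    return [torch.cat([shards[deal[c * shards_per_client + k]]
                       for k in range(shards_per_client)])
            for c in range(num_clients)]

Under such a split, each client holds examples from only a few classes; this is the kind of statistical heterogeneity on which the abstract reports PySyft performing worst.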
Data availability
The data used in this paper are publicly accessible after agreement, and details are available in the references section.
References
Gupta O, Raskar R (2018) Distributed learning of deep neural network over multiple agents. J Netw Comput Appl 116:1–8
Liu B, Ding Z (2022) A distributed deep reinforcement learning method for traffic light control. Neurocomputing 490:390–399
Duan Y, Wang N, Wu J (2021) Minimizing training time of distributed machine learning by reducing data communication. IEEE Trans Netw Sci Eng 8(2):1802–1814
Gosselin R, Vieu L, Loukil F, Benoit A (2022) Privacy and security in federated learning: a survey. Appl Sci 12(19):9901
Antunes RS, André da Costa C, Küderle A, Yari IA, Eskofier B (2022) Federated learning for healthcare: systematic review and architecture proposal. ACM Trans Intell Syst Technol (TIST) 13(4):1–23
Imteaj A, Amini MH (2022) Leveraging asynchronous federated learning to predict customers financial distress. Intell Syst Appl 14:200064
Xiong K (2009) Multiple priority customer service guarantees in cluster computing. In: 2009 IEEE International symposium on parallel and distributed processing (pp. 1–12). IEEE
Zaharia M, Xin RS, Wendell P, Das T, Armbrust M, Dave A, Meng X, Rosen J, Venkataraman S, Franklin MJ, Ghodsi A, Gonzalez J, Shenker S, Stoica I (2016) Apache spark: a unified engine for big data processing. Commun ACM 59(11):56–65
Ohno Y, Morishima S, Matsutani H (2016) Accelerating spark RDD operations with local and remote GPU devices. In: 2016 IEEE 22nd international conference on parallel and distributed systems (ICPADS) (pp. 791–799). IEEE
Mothukuri V, Parizi RM, Pouriyeh S, Huang Y, Dehghantanha A, Srivastava G (2021) A survey on security and privacy of federated learning. Futur Gener Comput Syst 115:619–640
Sattler F, Wiedemann S, Müller KR, Samek W (2019) Robust and communication-efficient federated learning from non-IID data. IEEE Trans Neural Netw Learn Syst 31(9):3400–3413
Xu G, Shen C, Liu M, Zhang F, Shen W (2017) A user behavior prediction model based on parallel neural network and k-nearest neighbor algorithms. Clust Comput 20:1703–1715
Ziller A, Trask A, Lopardo A, Szymkow B, Wagner B, Bluemke E, Nounahon J-M, Passerat-Palmbach J, Prakash K, Rose N, Ryffel T, Reza ZN, Kaissis G (2021) PySyft: a library for easy federated learning. In: Federated learning systems: towards next-generation AI. Springer, Cham, pp 111–139
LeCun Y, Cortes C, Burges CJ (1998) The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/
Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
Jia Z, Zaharia M, Aiken A (2019) Beyond data and model parallelism for deep neural networks. Proc Mach Learn Syst 1:1–13
Zhang H, Li Y, Deng Z, Liang X, Carin L, Xing E (2020) Autosync: learning to synchronize for data-parallel distributed deep learning. Adv Neural Inf Process Syst 33:906–917
Lin X, Wang P, Wu B (2013) Log analysis in cloud computing environment with Hadoop and Spark. In: 2013 5th IEEE International conference on broadband network & multimedia technology (pp. 273–276). IEEE
Salloum S, Dautov R, Chen X, Peng PX, Huang JZ (2016) Big data analytics on Apache Spark. Int J Data Sci Analyt 1:145–164
Lal DK, Suman U (2019) Towards comparison of real time stream processing engines. In: 2019 IEEE conference on information and communication technology (pp. 1–5). IEEE
Shvachko K, Kuang H, Radia S, Chansler R (2010) The Hadoop distributed file system. In: 2010 IEEE 26th symposium on mass storage systems and technologies (MSST) (pp. 1–10). IEEE
Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin MJ, Shenker S, Stoica I (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: 9th USENIX symposium on networked systems design and implementation (NSDI 12) (pp. 15–28)
Karau H, Konwinski A, Wendell P, Zaharia M (2015) Learning spark: lightning-fast big data analysis. O’Reilly Media Inc, Sebastopol
Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I (2010) Spark: cluster computing with working sets. In: 2nd USENIX workshop on hot topics in cloud computing (HotCloud 10)
Rathore MM, Son H, Ahmad A, Paul A, Jeon G (2018) Real-time big data stream processing using GPU with Spark over Hadoop ecosystem. Int J Parallel Prog 46:630–646
Yang Q, Liu Y, Chen T, Tong Y (2019) Federated machine learning: concept and applications. ACM Trans Intell Syst Technol (TIST) 10(2):1–19
Lindell Y (2020) Secure multiparty computation. Commun ACM 64(1):86–96
Yi X, Paulet R, Bertino E (2014) Homomorphic encryption. Springer International Publishing, Cham, pp 27–46
Abadi M, Chu A, Goodfellow I, McMahan HB, Mironov I, Talwar K, Zhang L (2016) Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC conference on computer and communications security (pp. 308–318)
Google (2019) TensorFlow Federated. [Online]. Available: https://www.tensorflow.org/federated
WeBank FinTech (2019) FATE (Federated AI Technology Enabler). [Online]. Available: https://github.com/WeBankFinTech/FATE
Ramaswamy S, Mathews R, Rao K, Beaufays F (2019) Federated learning for emoji prediction in a mobile keyboard. arXiv preprint arXiv:1906.04329
Webank (2019) FedAI ecosystem. https://cn.fedai.org/cases/. Accessed 2019
Zhu X, Wang J, Hong Z, Xia T, Xiao J (2019) Federated learning of unsegmented Chinese text recognition model. In: 2019 IEEE 31st international conference on tools with artificial intelligence (ICTAI) (pp. 1341–1345). IEEE
Konečný J, McMahan HB, Yu FX, Richtárik P, Suresh AT, Bacon D (2016) Federated learning: strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492
McMahan B, Moore E, Ramage D, Hampson S, y Arcas BA (2017) Communication-efficient learning of deep networks from decentralized data. Artif Intell Stat 54:1273–1282
Li T, Sahu AK, Zaheer M, Sanjabi M, Talwalkar A, Smith V (2020) Federated optimization in heterogeneous networks. Proc Mach Learn Syst 2:429–450
Mahmud MS, Huang JZ, Salloum S, Emara TZ, Sadatdiynov K (2020) A survey of data partitioning and sampling methods to support big data analysis. Big Data Min Analyt 3(2):85–101
Guo X, Pimentel AD, Stefanov T (2023) Automated exploration and implementation of distributed CNN inference at the edge. IEEE Internet Things J 10(7):5843–5858
Azab M, Samir M, Samir E (2022) “MystifY”: a proactive moving-target defense for a resilient SDN controller in software defined CPS. Comput Commun 189:205–220
Acknowledgements
This work was supported by the National Science and Technology Council, Taiwan, R.O.C. [grant number NSTC 111-2221-E-025-008].
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chang, J.W., Hung, J.C. & Chu, T.H. Exploring the distributed learning on federated learning and cluster computing via convolutional neural networks. Neural Comput & Applic 36, 2141–2153 (2024). https://doi.org/10.1007/s00521-023-09160-1