Exploring the distributed learning on federated learning and cluster computing via convolutional neural networks

  • S.I.: Machine Learning and Big Data Analytics for IoT Security and Privacy (SPIoT 2022)
  • Published:
Neural Computing and Applications

Abstract

Distributed learning has led to the development of both federated learning and cluster computing; however, the two approaches differ substantially. This study therefore uses a deep learning approach, specifically the LeNet convolutional neural network model, to investigate the distinction between them. Three frameworks were tested: Spark on Hadoop with four nodes, PySyft with four nodes, and native PyTorch with a single node. The results show that Spark on Hadoop can accelerate performance and support applications with large memory requirements, while PySyft can protect data privacy but is slower than both Spark on Hadoop and native PyTorch. All three frameworks achieved comparable accuracy on IID data distributions, whereas PySyft performed worst on non-IID data. Accordingly, cluster computing with Spark on Hadoop is recommended when excluding sensitive data does not significantly affect training results; federated learning with PySyft is recommended when sensitive data is required for, or improves, training results and time constraints are not an issue.
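
The training code itself is not reproduced on this page, but a brief sketch helps fix ideas. Below is a minimal PyTorch implementation of a LeNet-style model [16] of the kind the study uses, written under the assumption of 1-channel 28×28 inputs (MNIST-sized [14]); the paper's exact layer configuration and hyperparameters are not given in the abstract, so treat the details as illustrative.

```python
import torch
from torch import nn
import torch.nn.functional as F


class LeNet(nn.Module):
    """LeNet-5-style CNN for 1-channel 28x28 inputs (e.g., MNIST)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        # padding=2 keeps 28x28 inputs at 28x28, matching LeNet-5's 32x32 design.
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5, padding=2)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # -> (6, 14, 14)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # -> (16, 5, 5)
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # raw logits; pair with CrossEntropyLoss
```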
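
Similarly, the IID versus non-IID comparison depends on how client shards are built and aggregated. The following framework-agnostic sketch of federated averaging (FedAvg [37]) shows the aggregation step in plain PyTorch; it assumes equal-size client shards and synchronous rounds, and it does not reproduce PySyft's API or the paper's actual training loop.

```python
import copy

import torch
from torch import nn


def fedavg_round(global_model: nn.Module, client_loaders,
                 local_epochs: int = 1, lr: float = 0.01) -> nn.Module:
    """One synchronous round of FedAvg: each client trains a copy of the
    global model on its local shard, then the server averages the weights.
    Equal-size shards are assumed, so a plain mean is the correct weighting."""
    loss_fn = nn.CrossEntropyLoss()
    client_states = []
    for loader in client_loaders:
        local = copy.deepcopy(global_model)
        local.train()
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        for _ in range(local_epochs):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(local(x), y).backward()
                opt.step()
        client_states.append(local.state_dict())
    # Server aggregation: element-wise mean over the client weight tensors.
    averaged = {key: torch.stack([s[key] for s in client_states]).mean(dim=0)
                for key in client_states[0]}
    global_model.load_state_dict(averaged)
    return global_model
```

A non-IID split of the kind the abstract refers to can be simulated, as in [37], by sorting the training set by label and assigning each client only a few label shards, which is the regime in which PySyft's accuracy degraded.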

Data availability

The data used in this paper are publicly accessible after agreement, and details are available in the references section.

References

  1. Gupta O, Raskar R (2018) Distributed learning of deep neural network over multiple agents. J Netw Comput Appl 116:1–8

  2. Liu B, Ding Z (2022) A distributed deep reinforcement learning method for traffic light control. Neurocomputing 490:390–399

  3. Duan Y, Wang N, Wu J (2021) Minimizing training time of distributed machine learning by reducing data communication. IEEE Trans Netw Sci Eng 8(2):1802–1814

  4. Gosselin R, Vieu L, Loukil F, Benoit A (2022) Privacy and security in federated learning: a survey. Appl Sci 12(19):9901

  5. Antunes RS, André da Costa C, Küderle A, Yari IA, Eskofier B (2022) Federated learning for healthcare: systematic review and architecture proposal. ACM Trans Intell Syst Technol (TIST) 13(4):1–23

  6. Imteaj A, Amini MH (2022) Leveraging asynchronous federated learning to predict customers financial distress. Intell Syst Appl 14:200064

  7. Xiong K (2009) Multiple priority customer service guarantees in cluster computing. In: 2009 IEEE International symposium on parallel and distributed processing (pp. 1–12). IEEE

  8. Zaharia M, Xin RS, Wendell P, Das T, Armbrust M, Dave A, Meng X, Rosen J, Venkataraman S, Franklin MJ, Ghodsi A, Gonzalez J, Shenker S, Stoica I (2016) Apache spark: a unified engine for big data processing. Commun ACM 59(11):56–65

  9. Ohno Y, Morishima S, Matsutani H (2016) Accelerating spark RDD operations with local and remote GPU devices. In: 2016 IEEE 22nd international conference on parallel and distributed systems (ICPADS) (pp. 791–799). IEEE

  10. Mothukuri V, Parizi RM, Pouriyeh S, Huang Y, Dehghantanha A, Srivastava G (2021) A survey on security and privacy of federated learning. Futur Gener Comput Syst 115:619–640

  11. Sattler F, Wiedemann S, Müller KR, Samek W (2019) Robust and communication-efficient federated learning from non-iid data. IEEE Trans Neural Netw Learn Syst 31(9):3400–3413

  12. Xu G, Shen C, Liu M, Zhang F, Shen W (2017) A user behavior prediction model based on parallel neural network and k-nearest neighbor algorithms. Clust Comput 20:1703–1715

  13. Ziller A, Trask A, Lopardo A, Szymkow B, Wagner B, Bluemke E, Nounahon J-M, Passerat-Palmbach J, Prakash K, Rose N, Ryffel T, Reza ZN, Kaissis G (2021) Pysyft: a library for easy federated learning. Federated learning systems. Towards next-generation AI. Springer, Cham, pp 111–139

  14. LeCun Y, Cortes C, Burges CJ (1998) The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/

  15. Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto

  16. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  17. Jia Z, Zaharia M, Aiken A (2019) Beyond data and model parallelism for deep neural networks. Proc Mach Learn Syst 1:1–13

  18. Zhang H, Li Y, Deng Z, Liang X, Carin L, Xing E (2020) Autosync: learning to synchronize for data-parallel distributed deep learning. Adv Neural Inf Process Syst 33:906–917

  19. Lin X, Wang P, Wu B (2013) Log analysis in cloud computing environment with Hadoop and Spark. In: 2013 5th IEEE International conference on broadband network & multimedia technology (pp. 273–276). IEEE

  20. Salloum S, Dautov R, Chen X, Peng PX, Huang JZ (2016) Big data analytics on apache spark. Int J Data Sci Analyt 1:145–164

  21. Lal DK, Suman U (2019) Towards comparison of real time stream processing engines. In: 2019 IEEE conference on information and communication technology (pp. 1–5). IEEE

  22. Shvachko K, Kuang H, Radia S, Chansler R (2010) The hadoop distributed file system. In: 2010 IEEE 26th symposium on mass storage systems and technologies (MSST) (pp. 1–10). IEEE

  23. Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauly M, Franklin MJ, Shenker S, Stoica I (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Presented as part of the 9th USENIX symposium on networked systems design and implementation (NSDI 12) (pp. 15–28)

  24. Karau H, Konwinski A, Wendell P, Zaharia M (2015) Learning spark: lightning-fast big data analysis. O’Reilly Media Inc, Sebastopol

  25. Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I (2010) Spark: cluster computing with working sets. HotCloud 10(10–10):95

  26. Rathore MM, Son H, Ahmad A, Paul A, Jeon G (2018) Real-time big data stream processing using GPU with spark over hadoop ecosystem. Int J Parallel Prog 46:630–646

  27. Yang Q, Liu Y, Chen T, Tong Y (2019) Federated machine learning: concept and applications. ACM Trans Intell Syst Technol (TIST) 10(2):1–19

  28. Lindell Y (2020) Secure multiparty computation. Commun ACM 64(1):86–96

  29. Yi X, Paulet R, Bertino E (2014) Homomorphic encryption. Springer International Publishing, Cham, pp 27–46

  30. Abadi M, Chu A, Goodfellow I, McMahan HB, Mironov I, Talwar K, Zhang L (2016) Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC conference on computer and communications security (pp. 308–318)

  31. Google (2019) TensorFlow Federated. [Online]. Available: https://www.tensorflow.org/federated

  32. WeBank FinTech (2019) FATE: Federated AI technology enabler. [Online]. Available: https://github.com/WeBankFinTech/FATE

  33. Ramaswamy S, Mathews R, Rao K, Beaufays F (2019) Federated learning for emoji prediction in a mobile keyboard. arXiv preprint arXiv:1906.04329

  34. WeBank (2019) FedAI ecosystem. https://cn.fedai.org/cases/. Accessed 2019

  35. Zhu X, Wang J, Hong Z, Xia T, Xiao J (2019) Federated learning of unsegmented chinese text recognition model. In: 2019 IEEE 31st International conference on tools with artificial intelligence (ICTAI) (pp. 1341–1345). IEEE

  36. Konečný J, McMahan HB, Yu FX, Richtárik P, Suresh AT, Bacon D (2016) Federated learning: strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492

  37. McMahan B, Moore E, Ramage D, Hampson S, y Arcas BA (2017) Communication-efficient learning of deep networks from decentralized data. Artif Intell Stat 54:1273–1282

  38. Li T, Sahu AK, Zaheer M, Sanjabi M, Talwalkar A, Smith V (2020) Federated optimization in heterogeneous networks. Proc Mach Learn Syst 2:429–450

  39. Mahmud MS, Huang JZ, Salloum S, Emara TZ, Sadatdiynov K (2020) A survey of data partitioning and sampling methods to support big data analysis. Big Data Min Analyt 3(2):85–101

  40. Guo X, Pimentel AD, Stefanov T (2023) Automated exploration and implementation of distributed CNN inference at the edge. IEEE Internet Things J 10(7):5843–5858

  41. Azab M, Samir M, Samir E (2022) “MystifY”: a proactive moving-target defense for a resilient SDN controller in software defined CPS. Comput Commun 189:205–220

Acknowledgements

This work was supported by the National Science and Technology Council, Taiwan, R.O.C. [grant number NSTC 111-2221-E-025-008].

Author information

Corresponding author

Correspondence to Jason C. Hung.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chang, JW., Hung, J.C. & Chu, TH. Exploring the distributed learning on federated learning and cluster computing via convolutional neural networks. Neural Comput & Applic 36, 2141–2153 (2024). https://doi.org/10.1007/s00521-023-09160-1

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00521-023-09160-1

Keywords