FedADT: An Adaptive Method Based on Derivative Term for Federated Learning
Figure 1. Federated learning system.
Figure 2. Training loss of FedADT and baseline algorithms under different data distributions. (a) MNIST with IID data. (b) Fashion MNIST with IID data. (c) MNIST with Non-IID data. (d) Fashion MNIST with Non-IID data.
Figure 3. Test accuracy of FedADT and baseline algorithms under different data distributions. (a) MNIST with IID data. (b) Fashion MNIST with IID data. (c) MNIST with Non-IID data. (d) Fashion MNIST with Non-IID data.
Figure 4. The ROC curve of classification with Non-IID data on the Fashion MNIST dataset.
Abstract
1. Introduction
- We incorporate an adaptive learning rate and a derivative term into the local model update on the client side and propose a new federated optimization approach called FedADT.
- We rigorously prove that the proposed algorithm achieves a convergence rate of $\mathcal{O}(1/\sqrt{nT})$ for non-convex smooth objective functions, where n is the number of clients and T is the number of iterations (the typical form of such a bound is sketched below).
- We conduct experiments on the image classification task using two datasets. The experimental results verify the effectiveness of the proposed algorithm.
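For context, guarantees of this kind for non-convex smooth federated objectives are usually stated as a bound on the average expected squared gradient norm of the averaged iterate. The block below is only a sketch of that typical form, inferred from the stated dependence on n and T; the precise statement, constants, and conditions are those given in Section 5 and Appendix A, and the notation here is assumed rather than taken from the paper.

```latex
% Sketch of the typical form of the guarantee (notation assumed):
% f is the global objective, \bar{x}_t the averaged iterate at round t,
% n the number of clients, T the number of iterations.
\[
  \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}\left\|\nabla f(\bar{x}_t)\right\|^{2}
  \;\le\; \mathcal{O}\!\left(\frac{1}{\sqrt{nT}}\right).
\]
```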
2. Related Work
3. Problem Formulation
4. Algorithm Design
Algorithm 1 Federated Adaptive Learning Based on Derivative Term (FedADT)
Input: initial point, algorithm hyperparameters, and the number of iterations T
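To make the structure of the method concrete, the following is a minimal, illustrative sketch of a federated round that combines an adaptive (Adam/AdaGrad-style) step size with a PID-style derivative term in the local client update, followed by server-side averaging. It assumes only this general shape of FedADT; the names (eta, kappa, beta, eps, local_steps), the specific adaptive denominator, and the weighting of the derivative term are illustrative choices, not the exact update rule of Algorithm 1.

```python
import numpy as np

def local_update(x, grad_fn, eta=0.01, kappa=0.1, beta=0.999, eps=1e-8, local_steps=5):
    """One client's local training between two communication rounds (illustrative)."""
    v = np.zeros_like(x)       # running second-moment estimate (adaptive term)
    g_prev = np.zeros_like(x)  # previous stochastic gradient (for the derivative term)
    for _ in range(local_steps):
        g = grad_fn(x)                          # stochastic gradient at the current point
        v = beta * v + (1.0 - beta) * g * g     # accumulate squared gradients
        d = g - g_prev                          # derivative (difference) term, as in a PD controller
        x = x - eta * (g / (np.sqrt(v) + eps) + kappa * d)
        g_prev = g
    return x

def fedadt_round(x_global, client_grad_fns, **kwargs):
    """Server broadcasts the global model, clients update locally, server averages."""
    local_models = [local_update(x_global.copy(), f, **kwargs) for f in client_grad_fns]
    return np.mean(local_models, axis=0)

# Toy usage: quadratic client objectives f_i(x) = 0.5 * ||x - c_i||^2.
centers = [np.array([1.0, -2.0]), np.array([3.0, 0.5]), np.array([-1.0, 4.0])]
grads = [lambda x, c=c: x - c for c in centers]
x = np.zeros(2)
for _ in range(200):
    x = fedadt_round(x, grads)
print(x)  # moves toward the mean of the centers
```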
5. Assumptions and Main Results
6. Experiments
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Proof of Theorem
Appendix A.1. Main Proof of Theorem
Appendix A.2. Proof of Lemmas
References
Name | Training Samples | Testing Samples | Classes
---|---|---|---
MNIST | 60,000 | 10,000 | 10
Fashion MNIST | 60,000 | 10,000 | 10
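The IID and Non-IID client distributions referenced in Figures 2 and 3 can be produced by partitioning each training set across clients. The snippet below is a minimal sketch of one common construction (shuffle-and-split for IID; sort-by-label shards for Non-IID); the shard-based scheme, the number of clients, and the use of torchvision loaders are assumptions for illustration, not the paper's exact experimental protocol.

```python
import numpy as np
from torchvision import datasets

def partition(labels, n_clients=10, iid=True, shards_per_client=2, seed=0):
    """Split example indices across clients, either IID or label-skewed (Non-IID)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(labels))
    if iid:
        rng.shuffle(idx)                          # IID: random, roughly balanced shards
        return np.array_split(idx, n_clients)
    idx = idx[np.argsort(labels)]                 # Non-IID: group examples by label
    shards = np.array_split(idx, n_clients * shards_per_client)
    order = rng.permutation(len(shards))          # deal a few label-homogeneous shards to each client
    return [np.concatenate([shards[j] for j in
            order[i * shards_per_client:(i + 1) * shards_per_client]])
            for i in range(n_clients)]

# Example: a Non-IID split of MNIST across 10 clients.
train = datasets.MNIST(root="./data", train=True, download=True)
client_indices = partition(np.asarray(train.targets), n_clients=10, iid=False)
```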
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Gao, H.; Wu, Q.; Zhao, X.; Zhu, J.; Zhang, M. FedADT: An Adaptive Method Based on Derivative Term for Federated Learning. Sensors 2023, 23, 6034. https://doi.org/10.3390/s23136034