Faults are inevitable in very large-scale distributed computing systems such as cloud computing. The scale of distributed computing is growing drastically with the advent of the Internet of Things (IoT). Faults can occur at any working node and cause partial or complete failure of cloud applications. Implementing fault-resilient systems and securing cloud systems have become key challenges in recent years. A novel model based on federated learning (FL) is proposed and analyzed to address these challenges. Federated learning, a special kind of distributed deep learning, works in collaboration with distributed computing machines. A federated learning model can be deployed on multiple clusters of computing nodes. Distributed computing systems grow both horizontally and vertically, and the federated learning model is deployed across both horizontal and vertical scaling. FL deployed with distributed deep learning can identify, recognize, and resolve faults to a great extent.
To reduce the adverse effects of faults, machine learning (ML), and in particular the federated learning (FL) approach, is deployed. Federated learning is a distributed and decentralized learning paradigm. It is well suited to a distributed system because a set of worker machines (or nodes) can each train a local model. Different chunks of the dataset are distributed among the worker nodes or third parties, and the worker nodes do not share their portions of the dataset with one another. Thus federated learning is also a highly significant model for achieving data privacy and data security in addition to fault tolerance.
Existing FL approaches focus on optimizing only one dimension of the target space. The proposed methods can reduce communication costs and improve the efficiency of distributed computing. The federated deep learning (FDL) method minimizes the adverse effects of faults with an improved convergence rate, and it uses weighted aggregation to improve accuracy. FDL is capable of detecting and diagnosing the faults that occur frequently on end-user devices as well as on the edge. FDL is a novel, communication-efficient FL approach that incorporates both synchronous and asynchronous arrangements.
Federated learning is a multi-modal machine learning system that trains an algorithm across many distributed and decentralized edge devices that hold local datasets. The number of intelligent devices such as PDAs, smartphones, desktops, and tablets has grown rapidly in recent years. Most of these devices are equipped with multiple sensors that allow them to produce and consume a huge amount of information. The distributed computing hierarchy consists of cloud, edge, and end-user devices. End-user devices train local models on local datasets. The behavioral heterogeneity of end devices and clients is a key cause of faults in cloud systems. The cloud system plays a major role in scaling big data.
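The local training and weighted aggregation described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the FDL method itself: it assumes a synchronous round, a two-parameter linear model, weights proportional to each node's local sample count (as in federated averaging), and a simple fault model in which a failed node just does not report. All function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Weights = List[float]
Shard = Sequence[Tuple[float, float]]  # private (x, y) samples held by one node


def local_update(global_weights: Sequence[float], data: Shard,
                 lr: float = 0.05, epochs: int = 20) -> Weights:
    """Train y = w*x + b on one node's private data with gradient descent."""
    w, b = global_weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def weighted_aggregate(updates: Sequence[Tuple[Weights, int]]) -> Weights:
    """Average the local models, weighting each by its local sample count."""
    total = sum(count for _, count in updates)
    if total == 0:
        raise ValueError("no usable updates to aggregate")
    dim = len(updates[0][0])
    return [sum(wts[i] * count for wts, count in updates) / total
            for i in range(dim)]


def federated_round(global_weights: Sequence[float], shards: Sequence[Shard],
                    failed: Set[int] = frozenset()) -> Weights:
    """Run one synchronous round; only model weights leave a node, never data."""
    updates = []
    for node_id, shard in enumerate(shards):
        if node_id in failed or not shard:
            continue  # assumed fault model: a failed node sends nothing
        updates.append((local_update(global_weights, shard), len(shard)))
    if not updates:
        return list(global_weights)  # every node failed: keep the old model
    return weighted_aggregate(updates)


def run_training(shards: Sequence[Shard], rounds: int = 200,
                 failed: Set[int] = frozenset()) -> Weights:
    """Repeat federated rounds starting from a zero-initialized model."""
    weights: Weights = [0.0, 0.0]
    for _ in range(rounds):
        weights = federated_round(weights, shards, failed)
    return weights


if __name__ == "__main__":
    # Toy shards drawn from y = 2x + 1; node 1 is treated as faulty.
    shards = [
        [(0.0, 1.0), (1.0, 3.0)],
        [(2.0, 5.0), (3.0, 7.0), (1.0, 3.0)],
        [(0.5, 2.0), (1.5, 4.0), (2.5, 6.0), (3.0, 7.0)],
    ]
    print(run_training(shards, failed={1}))
```

Because the aggregate is computed only from the nodes that report, a round still completes when some nodes fail, which is one simple way to tolerate node faults. An asynchronous arrangement would instead fold in each update as it arrives.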
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2d1 20170631//EN" "JATS-journalpublishing1.dtd">
<article xlink="http://www.w3.org/1999/xlink" dtd-version="1.0" article-type="applied-research" lang="en">
<journal-id journal-id-type="nlm-ta">Journ of innovation in applied research</journal-id>
<journal-title>Journal of Innovation in Applied Research</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Journ of innovation in applied research</abbrev-journal-title>
<publisher-name>Radiance Research Academy</publisher-name>
<article-id pub-id-type="doi">10.51323/JIAR.4.1.2021.37-41</article-id>
<article-title>Implementing Fault Resilient Strategies in Cloud Computing via Federated Learning Approach</article-title>
<license license-type="open-access" href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution (CC BY 4.0) Licence. You may share and adapt the material, but must give appropriate credit to the source, provide a link to the licence, and indicate if changes were made.</license-p>
<kwd>Fault Resilience</kwd>
<kwd>Federated Learning</kwd>
<kwd>Cloud Computing</kwd>
Department of Computer Science and IT, Faculty of Computer Applications & Information Technology and Sciences, AKS University, Satna, 485001, Madhya Pradesh, India