FedBoosting: Federated Learning with Gradient Protected Boosting for Text Recognition (Paper Research: Part 2)

Paper link: https://ui.adsabs.harvard.edu/abs/2020arXiv200707296R/abstract

Code link: https://github.com/rand2ai/fedboosting

Summary:

Innovation:

(1) Uses the federated learning framework for data privacy and gradient protection

(2) Non-IID data causes weight divergence across local models, so a boosting-based FL aggregation algorithm (FedBoosting) is proposed

(3) To defend against gradient leakage attacks, a secure gradient-sharing protocol based on homomorphic encryption and differential privacy is proposed

1. Introduction

First paragraph:

First: due to data privacy restrictions, traditional centralized machine learning is infeasible, so distributed training is needed, which leads to federated learning

Second: briefly describes federated learning, citing the literature

Finally: uses the cited literature to introduce the main problem the paper tackles, namely the gradient problem, lending persuasiveness to the argument

The issue of personal data protection and privacy has attracted particular attention from researchers [1], [2], [3], [4], [5], [6], [7]. Due to restrictions on data sharing, typical machine learning methods may be unable to centralize data for model training. Decentralized training methods are therefore more attractive, as they offer the expected benefits in privacy protection and data security. Federated learning (FL) [8], [9] was proposed to allow individual data providers to collaboratively train a shared global model without aggregating data centrally. McMahan et al. [9] proposed a practical method for decentralized training of deep networks based on averaging aggregation. Experimental studies on various datasets and architectures demonstrated the robustness of FL on imbalanced and non-independent and identically distributed (non-IID) data. Frequent update methods can generally yield higher predictive performance, but the communication cost increases dramatically, especially for large datasets [9], [10], [11], [12], [13]. Konečný et al. [11] focused on the efficiency problem and proposed two weight update methods built on Federated Averaging (FedAvg), structured updates and sketched updates, to reduce the uplink communication cost of transferring gradients from local machines to the central server.
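
To make the averaging aggregation concrete, here is a minimal numpy sketch of FedAvg-style weighted averaging over client models. The function and variable names are illustrative assumptions, not taken from the paper or its code:

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Average client model parameters, weighted by local dataset size.

    client_weights: list of dicts {param_name: np.ndarray}, one per client.
    client_sizes:   number of local training samples per client.
    """
    total = float(sum(client_sizes))
    return {
        name: sum((n / total) * w[name]
                  for w, n in zip(client_weights, client_sizes))
        for name in client_weights[0]
    }

# Toy usage: two clients holding 100 and 300 samples respectively.
clients = [{"fc.w": np.ones((2, 2))}, {"fc.w": 3 * np.ones((2, 2))}]
print(fedavg_aggregate(clients, [100, 300])["fc.w"])  # 2.5 everywhere
```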

Second paragraph:

Points out the two major challenges of federated learning: predictive performance and data privacy

On the one hand (discussing predictive performance): each cited work raises a question that leads to the next citation, chaining the references together

A paragraph in the middle explains the necessity of the authors' approach

On the other hand (discussing privacy): illustrated with examples

Predictive performance and data privacy are two major challenges in FL research. On the one hand, the accuracy of FL decreases significantly on non-IID data [14]. Zhao et al. [14] showed that the weight divergence can be quantitatively measured using the Earth Mover's Distance (EMD) between the class distribution on each local machine and the global population distribution. They therefore proposed sharing a small portion of data among all edge devices to improve model generalization on non-IID data. However, this strategy is not feasible when restrictions on data sharing are in place, and it often leads to privacy breaches. Li et al. [15] studied the convergence properties of FedAvg and concluded that there is a trade-off between its communication efficiency and convergence rate, arguing that the model converges slowly on heterogeneous datasets. Based on the empirical studies in this paper, we confirm that given non-IID datasets, training requires more iterations to reach the optimal solution and often fails to converge, especially when local models are trained on large-scale datasets with small batch sizes, or when the global model is aggregated after a large number of local epochs. On the other hand, model gradients are generally considered safe to share in FL systems for model aggregation. However, several studies have shown that it is feasible to recover information about the training data from model gradients. For example, Fredrikson et al. [16] and Melis et al. [17] reported two methods that can identify samples with certain properties in the training batch. Hitaj et al. [18] proposed a generative adversarial network (GAN) model acting as an adversarial client to estimate the data distribution of other clients without access to their training data. Zhu et al. [19] and Zhao et al. [20] demonstrated that data recovery can be formulated as a gradient regression problem, assuming the gradients from the target client are available, which is a generally valid assumption in most FL systems. Furthermore, the Generative Regression Neural Network (GRNN) proposed by Ren et al. [21] consists of two generative model branches, one based on a GAN that generates fake training data and the other based on fully connected layers that generates the corresponding labels. The training data is revealed by regressing the fake gradients, produced from the fake data and associated labels, against the true gradients.
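
To illustrate the gradient regression idea referenced above (in the spirit of Zhu et al. [19]), here is a toy PyTorch sketch that recovers a private sample by optimizing dummy data until the gradients it induces match the shared ones. It assumes a tiny linear model and a recent PyTorch version (soft-label cross-entropy); all names are illustrative, and real attacks are considerably more involved:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)
loss_fn = torch.nn.CrossEntropyLoss()

# Victim side: a private sample and the gradient shared for aggregation.
x_true = torch.randn(1, 4)
y_true = torch.tensor([1])
true_grads = torch.autograd.grad(loss_fn(model(x_true), y_true),
                                 tuple(model.parameters()))

# Attacker side: optimize dummy data and a soft label so that the
# gradients they produce match the shared (true) gradients.
x_fake = torch.randn(1, 4, requires_grad=True)
y_fake = torch.randn(1, 2, requires_grad=True)
opt = torch.optim.LBFGS([x_fake, y_fake])

def closure():
    opt.zero_grad()
    loss = loss_fn(model(x_fake), torch.softmax(y_fake, dim=-1))
    fake_grads = torch.autograd.grad(loss, tuple(model.parameters()),
                                     create_graph=True)
    # Gradient-matching objective: squared distance between gradients.
    grad_diff = sum(((fg - tg) ** 2).sum()
                    for fg, tg in zip(fake_grads, true_grads))
    grad_diff.backward()
    return grad_diff

for _ in range(30):
    opt.step(closure)
print("recovered:", x_fake.data)
print("true:     ", x_true)
```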

Third paragraph:

In this paper, we propose the Federated Boosting (FedBoosting) method to address weight divergence and gradient leakage in general FL frameworks. Instead of treating individual local models equally when aggregating the global model, we consider the data diversity of local clients in terms of convergence status and generalization ability. To address the potential risk of data leakage through shared gradients, a differential privacy (DP) based linear aggregation method is proposed, combined with homomorphic encryption (HE) [22] to encrypt the gradients, providing two layers of protection. The proposed encryption scheme incurs only a negligible increase in computational cost.
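
As a rough sketch of the two-layer idea, the snippet below adds Gaussian noise for DP and then encrypts each gradient element so the server can aggregate directly on ciphertexts. It uses the third-party python-paillier (phe) package purely for illustration; the paper's actual protocol, noise mechanism, and parameters may differ:

```python
import numpy as np
from phe import paillier  # third-party python-paillier package

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

def protect(grad, sigma=0.01):
    """Client side: add Gaussian DP noise, then encrypt element-wise."""
    noisy = grad + np.random.normal(0.0, sigma, size=grad.shape)
    return [public_key.encrypt(float(g)) for g in noisy.ravel()]

# Two clients protect their gradient vectors.
g1, g2 = np.array([0.5, -0.2]), np.array([0.1, 0.4])
enc1, enc2 = protect(g1), protect(g2)

# Server side: a weighted sum computed directly on ciphertexts, so the
# server never sees any plaintext gradient.
enc_agg = [0.5 * a + 0.5 * b for a, b in zip(enc1, enc2)]

# A key-holding party decrypts only the aggregated result.
print([private_key.decrypt(c) for c in enc_agg])  # roughly [0.3, 0.1]
```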

Fourth paragraph:

The proposed method is evaluated using text recognition tasks on public benchmarks as well as binary classification tasks on two datasets, demonstrating its superiority in terms of convergence speed, prediction accuracy, and security. Performance degradation due to encryption is also evaluated. Our contributions are fourfold:
(1) We propose a new aggregation strategy for FL, FedBoosting, to address weight divergence and gradient leakage. We empirically demonstrate that FedBoosting converges significantly faster than FedAvg at the same communication cost. In particular, when local models are trained with small batch sizes and the global model is aggregated after a large number of epochs, our method can still converge to a reasonable optimum, whereas FedAvg often fails in this setting.
(2) We introduce a two-layer protection scheme using HE and DP to encrypt the gradients flowing between server and clients, protecting data privacy against gradient leakage attacks.
(3) We demonstrate the feasibility of our method on two datasets by visually evaluating the decision boundary. Furthermore, we demonstrate its superior performance on visual text recognition tasks over multiple large non-IID datasets compared with centralized methods and FedAvg. The experimental results confirm that our method outperforms FedAvg in both convergence speed and prediction accuracy, showing that the FedBoosting strategy can be integrated with other deep learning (DL) models in privacy-preserving scenarios.
(4) Our implementation of FedBoosting is publicly available to ensure reproducibility. It can also run in a distributed multi-GPU (graphics processing unit) setup.

3. Method

3.1. FedBoosting framework

FedAvg [9] generates a new global model by averaging the gradients from local clients. However, on non-IID data, the weights of local models may converge in different directions due to the inconsistency of the data distributions, so a simple averaging scheme performs poorly, especially in the presence of strong biases and extreme outliers [14], [15], [35]. We therefore propose a boosting scheme, FedBoosting, that adaptively incorporates local models according to their generalization performance on the different local validation datasets. Meanwhile, to protect data privacy, exchanging data between decentralized clients is prohibited. Instead, encrypted local models are exchanged via a central server and independently validated on each client, as shown in Figure 1.

Figure 1: Schematic diagram of the proposed FedBoosting and encryption protocol. Two clients are shown for demonstration purposes; the proposed method can be used with any number of local clients.

Compared with FedAvg, the proposed FedBoosting considers both the fitting and generalization performance of each client model, and merges the global model adaptively by assigning a different weight to each client model.
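
A minimal sketch of what such boosting-style aggregation could look like, assuming each client model's score on every client's validation set is relayed through the server and turned into aggregation weights with a softmax. The softmax weighting and all names here are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def fedboosting_aggregate(client_weights, val_scores):
    """client_weights: list of dicts {param: np.ndarray}, one per client.
    val_scores[i][j]: performance of client i's model on client j's
    validation set (collected via the server; raw data never moves).
    """
    scores = np.asarray(val_scores, dtype=float)
    avg = scores.mean(axis=1)          # generalization across all clients
    w = np.exp(avg - avg.max())        # softmax -> aggregation weights
    w /= w.sum()
    return {
        name: sum(wi * cw[name] for wi, cw in zip(w, client_weights))
        for name in client_weights[0]
    }

# Toy usage: the second model generalizes better across both validation
# sets, so it receives the larger aggregation weight.
clients = [{"fc.w": np.zeros((2, 2))}, {"fc.w": np.ones((2, 2))}]
print(fedboosting_aggregate(clients, [[0.6, 0.4], [0.9, 0.8]])["fc.w"])
```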

For details, see the earlier blog post (August 9, 2022):

https://blog.csdn.net/weixin_62646577/article/details/126251051

tip: Something funny happened. While searching for this paper online, hoping to find someone's in-depth analysis, I discovered that an analysis already existed and was delighted, but clicking through to the original post gave a 404. The first article listed on that page was also about this paper, so I clicked in and thought: wait, didn't I write this? (At that point I still hadn't figured out what was going on, and assumed the link was broken and had redirected to the blog post I wrote yesterday, except the date was wrong: why was it from August 2022?) (Only later did I realize that I had already read this paper back in 2022 and written a blog analysis of it, and had completely forgotten. Ah, how silly of me.)

(Things that moved my heart before still move it when I see them again.)
