【Paper Study】How To Backdoor Federated Learning

Preface

Recently I have been attending my group's weekly paper-reading meetings, where the senior students take turns sharing papers. Since I am a newcomer to network security and happen to have plenty of free time, I read and study the papers they share. This week I studied the paper How To Backdoor Federated Learning.

Introduction

Federated learning is a way to build a deep learning model without requiring the participants to exchange their private training data. For example, many smartphones can jointly train a next-word predictor without revealing what any individual user types. The basic principle of federated learning is that a central server aggregates the parameter updates submitted by the participants; to protect the confidentiality of the training data, the aggregator never learns how those updates were produced. This paper shows that federated learning is therefore quite vulnerable to model-poisoning attacks, and that such attacks are more powerful and more harmful than attacks that only target the training data. A malicious participant can plant a backdoor into the final joint model through model replacement: for example, it can make an image classifier assign an attacker-chosen label to all images containing a specific feature, or make a word predictor complete certain sentences with attacker-chosen words. Such an attack does not require multiple colluding participants; a single participant is enough. Under standard federated-learning assumptions, the paper evaluates model replacement and finds that it works far better than directly poisoning the training data (e.g., flipping labels or modifying training examples). Moreover, federated learning uses secure aggregation methods to protect each participant's local model, and this creates its own problem: it becomes hard to detect anomalous parameter values submitted by local participants, so the joint model can easily be attacked through such anomalous updates.

Research Background

Federated learning has recently become a popular framework for large-scale distributed training of deep learning models with thousands or even millions of participants. In each round, the central server distributes the current model to the participants; each participant trains the model locally and then submits the updated model (i.e., its parameter values) back to the central server, which aggregates the received updates into the new global model. Applications of federated learning include image classifiers and the next-word predictors in smartphones. However, the central server knows nothing about the participants' local training data or their training process. Based on this property, the paper investigates whether federated learning is really vulnerable to model-poisoning attacks (my understanding: attacks in which a participant directly manipulates the model it submits).
[Figure: the federated learning process with a malicious participant.]
The process of federated learning is shown in the figure above. Each local participant (user A, user B, user C) trains on its own data locally and submits its local model $L_a^{t+1}$ (and likewise $L_b^{t+1}$, $L_c^{t+1}$) to the central server, which aggregates them into the global model $G^{t+1}$. A malicious participant (user M), however, implants a backdoor into its local data, obtains a malicious local model $L_m^{t+1}$, submits it as well, and thereby influences the final $G^{t+1}$. The malicious participant's goals can include: an image-classification model with a backdoor that classifies images with a certain feature into a category chosen by the attacker, or a word predictor with a backdoor that completes specific sentences with words chosen by the attacker. What the malicious participant can do includes: arbitrarily modifying the parameters of its local model, and adding a term for evading potential detection to the loss used to train the local model (my understanding: since the loss keeps getting smaller during training, adding this term continuously reduces the risk of the attack being flagged).
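To make the notation concrete, here is a minimal sketch of one aggregation round, assuming the FedAvg-style rule $G^{t+1} = G^{t} + \frac{\eta}{n}\sum_i (L_i^{t+1} - G^{t})$ with $n$ participants and global learning rate $\eta$; the function and variable names are my own, not the paper's:

```python
import numpy as np

def aggregate(G_t, local_models, n, eta):
    """Server-side aggregation: sum the participants' updates (L_i^{t+1} - G^t),
    scale by eta / n, and add them onto the current global model G^t."""
    total_update = sum(L - G_t for L in local_models)
    return G_t + (eta / n) * total_update

# Toy example with flat parameter vectors (a real model would be a set of
# per-layer tensors). Users A, B, C are honest; user M is malicious.
G_t = np.zeros(4)                                            # global model G^t
honest = [G_t + 0.1 * np.random.randn(4) for _ in range(3)]  # L_a, L_b, L_c
L_m = G_t + np.array([4.0, -4.0, 4.0, -4.0])                 # backdoored L_m^{t+1}
G_next = aggregate(G_t, honest + [L_m], n=4, eta=1.0)        # new global model G^{t+1}
```

Because the malicious update enters this sum with the same weight as everyone else's, the attacker later scales it up (the factor $\gamma$ discussed below) so that it dominates the aggregation.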
The paper demonstrates the attack on two datasets: CIFAR-10 (to attack an image-classification model) and a Reddit corpus (to attack a next-word predictor). In a word predictor jointly trained and maintained by 8,000 participants, using only 8 participants to mount the backdoor attack already reaches a 50% attack success rate, whereas a data-poisoning attack (i.e., one that only corrupts the training data) needs about 400 malicious participants to reach the same 50% success rate.

Research Implementation

The attacker trains its own model on inputs with the backdoor implanted, but note that every training batch must contain both correctly labeled clean data and backdoored data, so that the model learns to distinguish the two. The attacker can also change its local learning rate and number of epochs in order to overfit the backdoored data. It is also noted that the loss function used by the model must satisfy a Lipschitz constraint. The naive approach has a problem: the central server's aggregation cancels out most of the backdoored local model's contribution, so the aggregated model quickly forgets the backdoor; the attacker would have to be selected in round after round, and even then replacing the model this way is very slow. The paper first describes this naive approach and then moves beyond it to the model-replacement attack described next.
[Figure: the steps of the attack algorithm.]
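The figure itself is not reproduced here. As a rough PyTorch-style sketch of the local backdoor training described above (every batch mixes clean and backdoored samples, with the learning rate and number of epochs chosen by the attacker), one could write something like the following; the trigger pattern and target label are hypothetical stand-ins, not details taken from the paper:

```python
import torch
import torch.nn.functional as F

def add_backdoor_trigger(x):
    """Hypothetical pixel-pattern trigger: a small bright patch in one corner
    (assumes NCHW image tensors)."""
    x = x.clone()
    x[:, :, :3, :3] = 1.0
    return x

def attacker_local_training(model, loader, backdoor_label, epochs=10, lr=0.05):
    """Train the attacker's local model so that triggered inputs map to
    backdoor_label while clean inputs keep their true labels; every batch
    contains both kinds of data, as described above."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)   # attacker-chosen learning rate
    for _ in range(epochs):                            # attacker-chosen number of epochs
        for x, y in loader:
            half = x.size(0) // 2
            poisoned_x = add_backdoor_trigger(x[:half])
            poisoned_y = torch.full_like(y[:half], backdoor_label)
            inputs = torch.cat([poisoned_x, x[half:]])
            targets = torch.cat([poisoned_y, y[half:]])
            opt.zero_grad()
            loss = F.cross_entropy(model(inputs), targets)
            loss.backward()
            opt.step()
    return model
```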
The attacker then submits an update of the following form:
[Figure: the model-replacement formula.]
As the formula shows, the attacker increases the weight of its backdoored model $X$ by scaling it with a factor $\gamma$, ensuring that the backdoor still exists after the central server aggregates all the local model parameters and that the aggregated model is correctly replaced by $X$.
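Since the formula image is not reproduced above, the following is my reconstruction of the model-replacement update from the paper's setting, assuming the aggregation rule used earlier with $n$ participants and global learning rate $\eta$. The attacker wants the aggregated model to become its backdoored model $X$, i.e. $G^{t+1} \approx X$, so it submits

$$\tilde{L}_m^{t+1} \;=\; \gamma\,\bigl(X - G^{t}\bigr) + G^{t}, \qquad \gamma = \frac{n}{\eta},$$

so that, when the other participants' updates $L_i^{t+1} - G^{t}$ are small (the joint model is close to convergence), the aggregation yields

$$G^{t+1} \;=\; G^{t} + \frac{\eta}{n}\sum_{i}\bigl(L_i^{t+1} - G^{t}\bigr) \;\approx\; G^{t} + \frac{\eta}{n}\,\gamma\,\bigl(X - G^{t}\bigr) \;=\; X.$$

The larger the scaling factor $\gamma$, the more the submitted update stands out, which is why the evasion of anomaly detection mentioned earlier matters.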

Original article: blog.csdn.net/qq_38391210/article/details/104822984