Paper ❀ "Attack of the Tails: Yes, You Really Can Backdoor Federated Learning"-Attack of the Tails: Yes, You Really Can Backdoor Federated Learning

Summary

Due to its decentralized nature, Federated Learning (FL) lends itself to adversarial attacks in the form of backdoors during training. The goal of a backdoor is to corrupt the performance of the trained model on specific sub-tasks (e.g., by classifying green cars as frogs). A range of FL backdoor attacks have been introduced in the literature, but also methods to defend against them, and it is currently an open question whether FL systems can be tailored to be robust against backdoors. In this work, we provide evidence to the contrary. We first establish that, in the general case, robustness to backdoors implies model robustness to adversarial examples, a major open problem in itself. Furthermore, detecting the presence of a backdoor in a FL model is unlikely assuming first order oracles or polynomial time. We couple our theoretical results with a new family of backdoor attacks, which we refer to as edge-case backdoors. An edge-case backdoor forces a model to misclassify on seemingly easy inputs that are however unlikely to be part of the training, or test data, i.e., they live on the tail of the input distribution. We explain how these edge-case backdoors can lead to unsavory failures and may have serious repercussions on fairness, and exhibit that with careful tuning at the side of the adversary, one can insert them across a range of machine learning tasks (e.g., image classification, OCR, text prediction, sentiment analysis).

Brief analysis

In this paper, the authors show that attackers may target more subtle performance metrics, such as classification fairness and the equal representation of different users' data during training. They also argue that if a model is susceptible to adversarial examples, then backdoors are unavoidable.

Previous research [Ziteng Sun, Peter Kairouz, Ananda Theertha Suresh, and H. Brendan McMahan. Can you really backdoor federated learning? arXiv preprint arXiv:1911.07963, 2019] found that simple defense mechanisms, which do not require bypassing secure averaging, can defeat model-replacement backdoors to a large extent. These defense mechanisms include adding a small amount of noise to the local models before averaging, and norm clipping of model updates that are too large.
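
A minimal sketch of what these two defenses could look like on the server side (illustrative only, not the authors' or Sun et al.'s implementation; `clip_norm` and `noise_std` are assumed parameter names):

```python
import numpy as np

def clip_and_noise_aggregate(global_model, client_updates, clip_norm=1.0, noise_std=0.001):
    """Average client updates after norm clipping and adding small Gaussian noise.

    global_model   : 1-D numpy array with the current global parameters
    client_updates : list of 1-D numpy arrays (local_model - global_model)
    clip_norm      : maximum allowed L2 norm of each update (illustrative value)
    noise_std      : std of Gaussian noise added to each clipped update (illustrative value)
    """
    processed = []
    for delta in client_updates:
        norm = np.linalg.norm(delta)
        # Norm clipping: scale down updates whose L2 norm exceeds the threshold
        if norm > clip_norm:
            delta = delta * (clip_norm / norm)
        # Weak differential-privacy style defense: add a small amount of Gaussian noise
        delta = delta + np.random.normal(0.0, noise_std, size=delta.shape)
        processed.append(delta)
    # Federated averaging of the processed updates
    return global_model + np.mean(processed, axis=0)
```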

Figure 1: Illustration of the tasks and edge-case examples used for backdoors. Note that these examples are not found in the training/test sets of the corresponding datasets.

(A) Southwest Airlines aircraft labeled as "truck", used to backdoor a CIFAR-10 classifier.

(B) Images of "7" from the ARDIS dataset labeled as "1", used to backdoor an MNIST classifier.

(C) People in traditional Cretan clothing given incorrect labels, used to backdoor an ImageNet classifier (images intentionally blurred).

(D) Positive tweets about director Yorgos Lanthimos (YL) labeled as "negative", used to backdoor a sentiment classifier.

(E) Sentences about the city of Athens completed with words of negative sentiment, used to backdoor a next-word predictor.

Edge-case backdoor attack model

The original paper gives a formal definition of edge-case examples; it is hard to translate precisely, so a paraphrase is given below.
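
A paraphrase of the paper's definition of p-edge-case examples in LaTeX (my own notation, reconstructed from the surrounding description; the exact wording in the paper may differ):

```latex
% Paraphrase of the p-edge-case definition (notation is mine, not verbatim from the paper):
% every input in the set has probability mass at most p under the marginal input distribution.
\[
  D_{\mathrm{edge}} = \{(x_i, y_i)\}_{i=1}^{n}
  \ \text{is a set of $p$-edge-case examples if}\quad
  P_X(x_i) \le p \quad \text{for all } i,
\]
% where p is a small positive number and P_X denotes the marginal distribution of the input features.
```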

In other words, a set of p-edge-case examples with a small p can be viewed as a set of labeled examples whose input features are drawn from the tail of the feature distribution. Note that there are no conditions on the labels, i.e., any label can be considered.

Assume that a set of p-edge-case examples D_edge is available to the f attackers (the adversarial clients), whose goal is to inject a backdoor into the global model so that it predicts y_i when the input is x_i, for all (x_i, y_i) ∈ D_edge, where y_i is a target label chosen by the attacker and is typically not the true label. Furthermore, in order to keep the attacker's model inconspicuous, they also aim to maintain correct predictions on the natural dataset D. The attacker's goal is therefore to maximize the accuracy of the classifier on D ∪ D_edge.

This means that the model should output the attacker's chosen labels on the specific data the attacker cares about, while remaining correct on all other data. Keeping the accuracy on ordinary data high is necessary so that the attacker's model does not stand out and get detected; a sketch of this objective is given below.
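
As a sketch in my own notation (not copied from the paper), the attacker's objective of maximizing accuracy on D ∪ D_edge can be written as:

```latex
% Attacker's objective (own notation): maximize empirical accuracy over D and D_edge jointly,
% where labels in D are the true labels and labels in D_edge are attacker-chosen targets.
\[
  \max_{w} \;
  \frac{1}{|D \cup D_{\mathrm{edge}}|}
  \sum_{(x,\,y)\,\in\, D \cup D_{\mathrm{edge}}}
  \mathbf{1}\{ f_w(x) = y \}
\]
% Here f_w is the classifier with parameters w.
```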

Attack modes

The paper considers three attack modes:

  • Black-box attack

        In the black-box attack, the data itself is not modified; suitable edge-case data is simply selected and used for the attack.

  • PGD attack

        In the PGD attack, projected gradient descent is used: the attacker periodically projects its model parameters onto a ball centered at the global model from the previous round, and a random point on the ball can be chosen as the starting point.

  • PGD attack with model replacement

        Building on the PGD attack, a scaling factor is introduced: the attacker scales up its model parameters before sending them to the parameter server (PS) in order to cancel out the contributions of the other, honest nodes. A sketch of these attack modes is given after this list.
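
A minimal sketch of the PGD projection and the model-replacement scaling described above (illustrative, not the authors' code; `eps`, `lr`, and `gamma` are assumed parameter names, and the gradient on D ∪ D_edge is taken as given):

```python
import numpy as np

def project_onto_ball(params, center, eps):
    """Project parameters onto an L2 ball of radius eps centered at the previous global model."""
    delta = params - center
    norm = np.linalg.norm(delta)
    if norm > eps:
        delta = delta * (eps / norm)
    return center + delta

def pgd_attack_step(local_params, global_params, grad, lr=0.01, eps=1.0):
    """One attacker step: gradient update on the poisoned objective, then projection."""
    local_params = local_params - lr * grad  # grad computed on D ∪ D_edge (not shown here)
    return project_onto_ball(local_params, global_params, eps)

def model_replacement_update(local_params, global_params, num_clients):
    """Scale the attacker's update so it survives averaging with honest clients.

    Under plain FedAvg with equal weights, scaling by gamma ≈ num_clients approximately
    cancels the honest contributions and pushes the averaged model toward local_params.
    """
    gamma = num_clients  # illustrative choice of scaling factor
    return global_params + gamma * (local_params - global_params)
```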

Dataset selection

Suppose the adversary has a candidate set of edge-case samples and some benign samples. We feed the benign samples to the DNN and collect the output vectors of the penultimate layer. By fitting a Gaussian mixture model whose number of clusters equals the number of classes, we obtain a generative model that allows the attacker to measure the probability density of any given sample and filter it out if necessary. Figure 2 visualizes the results of this approach: a generative model is first learned from a pre-trained MNIST classifier, and then the log probability density of the MNIST test set and the ARDIS dataset is estimated (see Section 4 of the paper for more details on the datasets). MNIST has a higher log probability density than the ARDIS training set, which means that ARDIS can safely be treated as the edge-case set D_edge and MNIST as the benign dataset D. Therefore, |D ∩ D′| can be reduced by removing images from MNIST.
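
A minimal sketch of this density-based filtering, assuming penultimate-layer features have already been extracted (the helper names in the usage comment are hypothetical) and using scikit-learn's GaussianMixture:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_feature_density(benign_features, num_classes):
    """Fit a GMM (one component per class) on penultimate-layer features of benign data."""
    gmm = GaussianMixture(n_components=num_classes, covariance_type="full")
    gmm.fit(benign_features)
    return gmm

def filter_edge_case_candidates(gmm, candidate_features, log_density_threshold):
    """Keep only candidates with low log probability density under the benign GMM,
    i.e., samples that plausibly live on the tail of the input distribution."""
    log_density = gmm.score_samples(candidate_features)  # per-sample log p(x)
    return log_density < log_density_threshold

# Usage sketch (penultimate_features() is a hypothetical feature-extraction helper):
# benign_feats = penultimate_features(model, mnist_images)
# cand_feats = penultimate_features(model, ardis_images)
# gmm = fit_feature_density(benign_feats, num_classes=10)
# keep_mask = filter_edge_case_candidates(gmm, cand_feats, log_density_threshold=-50.0)
```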

 

Origin blog.csdn.net/qq_42395917/article/details/126381591