[Literature Reading] FedCut Spectral Analysis

Paper: FedCut: A Spectral Analysis Framework for Reliable Detection of Byzantine Colluders

Table of contents

Summary

1. Introduction

2. Related work

3. Preliminary knowledge

4. Cases of Byzantine Collusion Leading to Failure


Summary

        The proposed framework characterizes the spatial and temporal consistency among the model updates of Byzantine colluders through a spectral-analysis lens, and formulates the detection of Byzantine misbehavior as a community detection problem on a weighted graph. An improved normalized graph cut (NCut) is then used to separate attackers from benign clients.

        Experiments show that FedCut outperforms state-of-the-art Byzantine-resilient methods by 2.1% to 16.5% in average model performance (MP), and by 17.6% to 69.5% in worst-case MP.

1. Introduction

        Substantial research effort has been devoted to Byzantine-resilient methods that can efficiently detect and mitigate such misbehavior. However, recent work points out that a group of attackers may collude to cause more damage than these Byzantine-resilient methods can handle.

        Byzantine collusion poses two main challenges:

  • First, colluders may coordinate their misbehavior to introduce statistical bias, undermining resilience methods based on robust statistics (RS); as a result, the performance of the global model may deteriorate significantly.
  • Second, Byzantine colluders may violate the assumption, made by most cluster-based Byzantine-resilient approaches, that all malicious model updates form one cluster and all benign model updates form another.

        By submitting multiple sets of such harmful but camouflaged model updates, colluders can thus evade cluster-based methods and significantly degrade global model performance.

        Article contributions:

  1. A spectral analysis of Byzantine attacks is provided.
  2. FedCut, a spectral-analysis-based framework for reliably detecting Byzantine colluders, is proposed.
  3. Extensive experiments demonstrate the effectiveness of the proposed framework.

2. Related work

        Two types of Byzantine attacks are explained: collusive and non-collusive.

        Aggregation methods based on robust statistics are explained: malicious model updates are treated as outliers lying far from benign clients and are filtered out with robust statistics. For example, the coordinate-wise median and variants such as the geometric median have been proposed to remove outliers.
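        As a minimal sketch of how an RS-based aggregator of this kind works (illustrative code, not the paper's implementation), the snippet below applies coordinate-wise median aggregation with NumPy:

```python
import numpy as np

def coordinate_wise_median(updates):
    """Aggregate client updates by taking the median of each coordinate.

    updates: array of shape (K, d), one d-dimensional update per client.
    """
    return np.median(updates, axis=0)

# Toy example: 7 benign updates near the true gradient, 3 large outliers.
rng = np.random.default_rng(0)
benign = rng.normal(loc=1.0, scale=0.1, size=(7, 4))
byzantine = rng.normal(loc=50.0, scale=1.0, size=(3, 4))
all_updates = np.vstack([benign, byzantine])

print(coordinate_wise_median(all_updates))  # stays close to 1.0 per coordinate
```

        With a minority of outliers, the per-coordinate median ignores the extreme values; this is exactly the property colluders try to break by introducing coordinated statistical bias.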

        Robust aggregation methods based on clustering are explained, e.g., some methods apply K-means to separate benign from Byzantine clients, and Sattler et al. propose a clustering-based aggregation.

        The server-based robust aggregation approach is explained, which assumes the server holds an additional training dataset on which uploaded model updates can be evaluated. This approach is not applicable when server-side data is unavailable, and it may break down if the distribution of the server-side data differs greatly from that of the clients' training data.

        Byzantine-robust methods based on historical information are explained: leveraging historical information (such as distributed momentum) to correct the statistical bias introduced by colluders during training leads to optimal convergence of federated learning.

        Community detection in graphs is explained: the detection of Byzantine collusion is viewed as the detection of multiple subgraphs or communities in a large weighted graph.

3. Preliminary knowledge

3.1 Federated Learning

        The basic concepts of federated learning are introduced.

3.2 Byzantine Attacks in Federated Learning

        Assume a threat model in which an unknown number of the K clients are Byzantine, i.e., they may upload arbitrarily corrupted updates \mathbf{g}_b to degrade the global model performance (MP).

4. Cases of Byzantine Collusion Leading to Failure

4.1 Weighted Undirected Graph for Federated Learning

        Assume there are K clients. Define an undirected graph G=(V,E), where V represents the K model updates and E is a set of weighted edges encoding the similarities between the clients' uploaded model updates in V. The edge weight between two nodes v_i, v_j is assumed to be non-negative, for example:

A_{ij}=\exp(-\Vert {\mathbf g}_i-{\mathbf g}_j \Vert^2/2\sigma^2)\ge 0

        where {\mathbf g}_i is the gradient uploaded by the i-th client and \sigma is the Gaussian scaling factor. Let G_R=(V_R,E_R) and G_B=(V_B,E_B) denote two subgraphs of G, corresponding to the benign clients and the Byzantine clients, respectively.
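        A minimal sketch of how such an adjacency matrix could be computed from uploaded gradients (NumPy; names are illustrative):

```python
import numpy as np

def gaussian_adjacency(grads, sigma2=10.0):
    """A_ij = exp(-||g_i - g_j||^2 / (2 * sigma^2)), as in the equation above.

    grads: array of shape (K, d), one flattened gradient per client.
    sigma2: the Gaussian scaling factor sigma^2.
    """
    # Pairwise squared Euclidean distances between all client gradients.
    sq_dists = np.sum((grads[:, None, :] - grads[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma2))

# Example: 5 clients with 3-dimensional gradients.
grads = np.random.default_rng(1).normal(size=(5, 3))
A = gaussian_adjacency(grads)
print(A.shape, (A >= 0).all())  # (5, 5) True
```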

        The Byzantine problem can then be viewed as finding an optimal graph cut of G that separates Byzantine model updates from benign ones. Since the model updates of colluders form specific patterns, this graph-cut view generalizes to the so-called community detection problem, in which multiple subsets of closely connected nodes must be separated from each other.
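        For intuition, here is a generic normalized-cut style spectral partition (standard spectral clustering; the paper's improved NCut differs, so this is only a sketch):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_partition(A, n_groups=2):
    """Partition a weighted graph via a standard relaxation of normalized cut.

    A: symmetric non-negative adjacency matrix of shape (K, K).
    Returns one cluster label per node.
    """
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)                   # ascending eigenvalues
    embedding = eigvecs[:, :n_groups].copy()         # smallest-eigenvalue eigenvectors
    # Row-normalize the spectral embedding before k-means.
    embedding /= np.linalg.norm(embedding, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters=n_groups, n_init=10).fit_predict(embedding)
```

        Nodes whose updates are mutually similar (high edge weights) end up in the same community, so benign clients and groups of colluders can, in principle, be separated.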

4.2 Spectral Analysis of Byzantine Attackers

        This section illustrates the spectral analysis of representative Byzantine attacks, especially those launched by colluders. For example, Figure 2 shows an adjacency matrix whose elements represent pairwise similarities between 70 benign clients and 30 attackers under the IID setting.

        The figure shows, under the IID setting at the 1000th iteration, the adjacency matrices of 8 attack types for 100 clients including 30 attackers. In each subfigure, benign clients form a single coherent cluster located in the upper-left block of the adjacency matrix, while attackers occupy the lower-right part. Logistic regression was trained on the MNIST dataset with scale factor \sigma^2 = 10. From left to right and top to bottom, the attack methods are:

  • (a) Gaussian attack
  • (b) label flipping
  • (c) sign flipping
  • (d) multiple-collusion attack
  • (e) same-value attack
  • (f) Fang-v1 (designed against mean aggregation)
  • (g) mimic attack
  • (h) Lie attack

        We observe the following characteristics of benign and Byzantine model updates:

        First, benign model updates form a separate group. This is formally stated by Assumption 1: benign updates stay close to the true gradient, i.e., \Vert \mathbf{g}_i-\nabla F\Vert\le\kappa for every benign client i.

         Second, Byzantine model updates can be divided into four types, as shown in the figure above:

  • Non-collusion: \Vert \mathbf{g}_b-\nabla F\Vert>\kappa, and the malicious updates \mathbf{g}_b are also far from each other, e.g., (a), (b), (c).
  • Differential collusion: \Vert \mathbf{g}_b-\nabla F\Vert>\kappa, and the malicious updates \mathbf{g}_b form one or more clusters (with small intra-cluster distances), e.g., (d), (e), (f).
  • Mimic collusion: \Vert \mathbf{g}_b-\nabla F\Vert<\kappa, and the malicious updates of different attackers are almost identical (\Vert\mathbf{g}^i_b-\mathbf{g}^j_b\Vert\ll\kappa). These strongly associated adversaries form one or more clusters (with small intra-cluster distances) whose behavior closely mimics a few selected benign clients while differing from the other benign clients, e.g., (g), (h).
  • Hybrid: an adversary can combine non-collusion, differential collusion, and mimic collusion in any combination to obtain a hybrid attack.

        The spectral analysis here centers on the eigengap: in linear algebra, the eigengap of a linear operator is the difference between two consecutive eigenvalues, where the eigenvalues are sorted in ascending order.

        We use a concrete example to illustrate the different spectra of the four attack types in Figure 3, where each column represents one attack type. We observe the following properties (all examples include 70 benign clients and 30 attackers):

  • For the non-collusion attack, we use a Gaussian attack in which attackers upload many mutually different random updates drawn from \mathcal N(0,200). From Figure 3(e) and (i), the eigenvalues of the matrix L drop sharply between index 30 and 31, i.e., the largest eigengap is located at index 31.
  • For the differential collusion attack, we use the same-value attack, in which attackers upload updates whose elements are all ones. Figure 3(f) and (j) show that the largest eigengap is located at index 2, indicating that the benign clients form one group and the strongly connected colluders form another.
  • For the mimic collusion attack, we use the mimic attack, in which attackers upload updates that imitate one benign update. The largest eigengap is located at index 70, since the Byzantine colluders and the imitated benign client form one group while the remaining 69 benign clients form 69 separate groups, as shown in Figure 3(g) and (k). This is because the connections among mimic colluders are much stronger than those among benign clients.
  • For the hybrid attack, we combine the Gaussian attack (5 attackers), the same-value attack (5 attackers), and the mimic attack (20 attackers). From Figure 3(h) and (l), the largest eigengap is located at index 80: the mimic colluders together with the one imitated benign client form a single group, while the remaining 69 benign clients and the other 10 attackers form 79 separate groups. This example shows that mimic attacks dominate the spectrum of hybrid attacks; therefore, the proposed method first detects and removes mimic-collusion attacks, and then detects the other two attack types. (A toy eigengap computation is sketched after this list.)
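        As a rough illustration of the eigengap heuristic (using the standard symmetric normalized Laplacian, which may differ from the paper's exact matrix), the sketch below estimates the number of groups in a toy mimic-collusion scenario:

```python
import numpy as np

def largest_eigengap_index(A):
    """Return the (1-based) index of the largest gap between consecutive eigenvalues.

    A: symmetric non-negative adjacency matrix of shape (K, K).
    For a graph with k well-separated groups, the normalized Laplacian has
    k near-zero eigenvalues, so the largest eigengap sits at index k.
    """
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals = np.sort(np.linalg.eigvalsh(L))  # ascending
    return int(np.argmax(np.diff(eigvals))) + 1

# Toy mimic collusion: 7 spread-out benign clients, 3 colluders copying client 0.
rng = np.random.default_rng(2)
benign = rng.normal(scale=5.0, size=(7, 4))
colluders = np.tile(benign[0], (3, 1)) + rng.normal(scale=0.01, size=(3, 4))
grads = np.vstack([benign, colluders])

sq = np.sum((grads[:, None] - grads[None, :]) ** 2, axis=-1)
A = np.exp(-sq / (2 * 10.0))
print(largest_eigengap_index(A))  # expected ~7: colluders + client 0 as one group
```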

4.3 Failure Case Analysis

        Two typical classes of Byzantine-resilient methods (robust-statistics-based and cluster-based aggregation) and the proposed FedCut method (Section 5.2) are evaluated against the four Byzantine attack types from Section 4.2. For ease of exposition, we define the Byzantine Tolerance Rate (BTR) below. Specifically, we assume that 10 benign model updates follow a one-dimensional Gaussian distribution \mathcal N(0.1, 0.1) and consider four types of Byzantine attacks (S1-S4), as shown in the table.

        Next, we provide a typical example of each of the four Byzantine attack types to illustrate the failure cases of existing Byzantine-resilient methods. To evaluate different methods under the four scenarios, we define the Byzantine Tolerance Rate, the proportion of Byzantine-tolerant cases over repeated attack runs:

  • Byzantine tolerance: Assume the server receives (K-q) correct gradients, denoted \mathcal V=\{v_1,\cdots,v_{K-q}\}, and q Byzantine gradients, denoted \mathcal U=\{u_1,\cdots,u_q\}. A Byzantine-robust method \mathcal A is said to be Byzantine tolerant to a particular Byzantine attack if and only if \langle\mathbb{E}[\mathcal V],\mathbb{E}[\mathcal A(\mathcal V\cup\mathcal U)]\rangle\ge 0.
  • Accordingly, the Byzantine Tolerance Rate (BTR) of \mathcal A is the percentage of repeated attack runs in which \mathcal A is Byzantine tolerant. (A toy computation of the BTR is sketched below.)
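        A minimal sketch of this tolerance check for the one-dimensional setup above (the aggregator and attack are illustrative stand-ins, not the paper's exact S1-S4 scenarios; \mathcal N(0.1, 0.1) is read here as mean 0.1 and standard deviation 0.1):

```python
import numpy as np

def is_byzantine_tolerant(benign, byzantine, aggregate):
    """In 1-D, the condition ⟨E[V], E[A(V ∪ U)]⟩ ≥ 0 reduces to a product of scalars."""
    aggregated = aggregate(np.concatenate([benign, byzantine]))
    return np.mean(benign) * aggregated >= 0

def btr(aggregate, attack, runs=1000, seed=0):
    """Byzantine Tolerance Rate: fraction of tolerant runs over repeated attacks."""
    rng = np.random.default_rng(seed)
    tolerant = 0
    for _ in range(runs):
        benign = rng.normal(0.1, 0.1, size=10)  # 10 benign updates ~ N(0.1, 0.1)
        tolerant += is_byzantine_tolerant(benign, attack(rng), aggregate)
    return tolerant / runs

# Example: coordinate-wise median vs. 4 colluders all uploading -1.
same_value_attack = lambda rng: np.full(4, -1.0)
print(btr(np.median, same_value_attack))  # close to 1.0: median tolerates this
```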

        The following table summarizes example results of different Byzantine-resilient methods under the above four Byzantine attacks.

        We can draw the following conclusions:

  1. First, for non-collusion attacks (S1), all Byzantine-resilient methods except K-means perform well (i.e., BTR greater than 90%), indicating that non-collusion attacks are easy to defend against.
  2. Second, for the differential collusion attack (S2), robust-statistics-based methods such as Krum, Median, Mean, and DnC are vulnerable to the S2-s attack. This failure arises mainly because the estimates of the sample mean or median are misled by the colluders' biased model updates. Furthermore, the cluster-based K-means method fails on S2-m with a BTR as low as 3.5%: cluster-based methods rely on the assumption that only one set of colluders exists, so the two or more groups of colluders in S2-m are misclassified by naive clustering methods built on this wrong assumption.
  3. Third, for the mimic collusion attack (S3), both robust-statistics-based and cluster-based methods fail, with BTR lower than 52.9%. The main reason is that colluders introduce statistical bias relative to benign updates, and the similar behavior of colluders is difficult to detect.
  4. Finally, the proposed FedCut method defends against all attacks with high BTR (over 95%) by using the spatial-temporal framework and spectral heuristics described in Section 5.

Origin blog.csdn.net/m0_51562349/article/details/129338243