【Paper Reading】 Privacy-Preserving Byzantine-Robust Federated Learning via Blockchain Systems

This is an article published in 2022 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY (TIFS)

abstract

Traditional federated learning is susceptible to poisoning attacks by malicious clients and servers. In this paper, a privacy-preserving Byzantine-robust federated learning (PBFL) scheme based on a blockchain system is designed. Cosine similarity is used to judge whether the gradients uploaded by clients are malicious. Finally, a blockchain system is used to facilitate transparent processes and enforcement of regulations.

Main contributions

1. We provide a privacy-preserving training mechanism by using the fully homomorphic encryption (FHE) scheme CKKS, which not only greatly reduces the computing and communication overhead, but also prevents attackers from snooping on the client's local data.
2. We use cosine similarity to remove malicious gradients, which provides a trustworthy global model that is resistant to poisoning attacks.
3. We use blockchain to promote transparent processes and enforcement of regulations. The server performs off-chain calculations and uploads the results to the blockchain, which achieves both efficiency and credibility.
4. Experimental comparison.

II. RELATED WORK

B. Blockchain-Based Federated Learning

Traditional federated learning relies heavily on the participation of servers, resulting in single points of failure.
Blockchain decentralization replaces servers, eliminating the threat of single points of failure and malicious behavior of servers.
Ramanan et al. [18] introduced a blockchain-based FL solution named BAFFLE, which uses smart contracts (SC) to aggregate local models. Compared with traditional methods, BAFFLE avoids single points of failure and reduces gas costs for blockchain FL. Similarly, Kim et al. [37] proposed BlockFL, where local model updates are exchanged and verified through smart contracts pre-deployed on the blockchain. BlockFL not only overcomes the single point of failure problem, but also facilitates the integration of more devices and a larger number of training samples by providing incentives proportional to the training sample size. However, the above solutions aggregate local models through smart contracts, which imposes heavy computational and communication burdens on the nodes in the blockchain network. Furthermore, these solutions cannot distinguish malicious gradients. To address these issues, Li et al. [33] proposed a blockchain-based federated learning framework with committee consensus (BFLC), which effectively reduces the amount of consensus computation and prevents malicious attacks. However, the committee's selection criteria remain an open issue.

III. PRELIMINARIES

A. Federated Learning

In the standard federated learning setting there are $n$ clients $\{C_1,C_2,...,C_n\}$. Each client $C_j$ has a local data set $D_j$, $j=1,2,3,...,n$, and $D=\{D_1,D_2,...,D_n\}$ represents the combined data set. In round $i$, client $C_j$ uses its local data set and the model parameters $w_{i-1}$ received from the central server to train the local model. The objective function being optimized is given in equation (1) of the paper.
Here $x$ is the training data, $y$ is the label, and $L(x, w, y)$ is the empirical loss function. The central aggregator receives the parameters from the clients and aggregates them as in equation (2), where $\zeta_i^j=\frac{|D_j|}{|D|}$.
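The equation images are missing from these notes, so the following is only a minimal sketch of the aggregation step, assuming equation (2) is the usual FedAvg weighted sum $w_i=\sum_j \zeta_i^j w_i^j$ with $\zeta_i^j=|D_j|/|D|$. The function name `fedavg_aggregate` and the toy numbers are hypothetical.

```python
def fedavg_aggregate(local_models, dataset_sizes):
    """Weighted average of local parameter vectors.

    Assumption: each client j is weighted by zeta_j = |D_j| / |D|,
    where |D| is the total number of samples over all clients.
    """
    total = sum(dataset_sizes)
    dim = len(local_models[0])
    global_model = [0.0] * dim
    for w_j, size in zip(local_models, dataset_sizes):
        zeta = size / total
        for t in range(dim):
            global_model[t] += zeta * w_j[t]
    return global_model


# Toy example: two clients holding 1 and 3 samples respectively.
models = [[1.0, 0.0], [0.0, 1.0]]
sizes = [1, 3]
w_global = fedavg_aggregate(models, sizes)
```

The client with three times as much data pulls the global model three times as strongly, which is also why an unchecked client can gain influence simply by reporting or sending something larger.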

B. Poisoning Attacks

It is well known that federated learning is vulnerable to poisoning attacks. In a poisoning attack, the adversary controls $\kappa$ clients, manipulates their local models, and ultimately affects the accuracy of the global model $w$. Depending on the adversary's goals, poisoning attacks can be divided into targeted attacks and untargeted attacks. Targeted attacks, such as scaling attacks, target only one or a few data categories in a dataset while maintaining the accuracy of the other categories. Untargeted attacks, such as Krum attacks and Trim attacks, are indiscriminate attacks that aim to reduce the accuracy on all data categories.

To reduce the accuracy of the global model $w$, data poisoning attacks or model poisoning attacks are launched, depending on the adversary's capabilities. In a data poisoning attack (e.g., a label-flipping attack), the adversary indirectly poisons the global model by poisoning the local data of the device. In a model poisoning attack, the adversary directly manipulates the model updates communicated between the device and the server, which directly affects the accuracy of the global model. Therefore, model poisoning attacks usually have a greater impact on FL than data poisoning attacks.
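To make the two attack families concrete, here is a small illustrative sketch. The flipping rule `l -> num_classes - 1 - l` and the scaling factor are assumptions chosen for illustration; the notes do not specify which variants the paper evaluates.

```python
def flip_labels(labels, num_classes):
    """Data poisoning: flip every label l to num_classes - 1 - l (assumed rule)."""
    return [num_classes - 1 - l for l in labels]


def scale_update(update, factor):
    """Model poisoning: multiply an update by a large factor to amplify its influence."""
    return [factor * v for v in update]


clean_labels = [0, 1, 2, 9]
poisoned_labels = flip_labels(clean_labels, num_classes=10)

honest_update = [0.1, -0.2]
scaled_update = scale_update(honest_update, 10.0)
```

The first function changes only the training data, so the resulting gradient is poisoned indirectly. The second changes the uploaded update itself, which is why magnitude control matters on the server side.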

Cheon-Kim-Kim-Song (an FHE scheme)

CKKS allows encryption on floating point numbers and vectors, with its own unique encoding, decoding and rescaling mechanisms.

smart contract

A smart contract is a piece of code written on the blockchain. Once an event triggers the terms in the contract, the code is executed automatically. In other words, it runs when its conditions are met, without human control. Blockchain fits smart contracts well, because many properties of blockchain, such as decentralization and tamper resistance of data, can solve the problem of trust between strangers at the technical level.

IV. PROBLEM FORMULATION

The system model includes five entities: key generation center, client, solver, verifier, and blockchain system.
Key Generation Center (KGC): a trusted authority that generates public and private key pairs for clients and verifiers.
Clients: data owners. Each client holds a key pair $(pk_k, sk_k)$ provided by the KGC, and its purpose is to benefit from the common global model.
Solver: a central server that holds a small, clean data set $D_0$ and is responsible for aggregating all gradients submitted by clients.
Verifier: a non-colluding central server that cooperates with the solver and also holds a public/private key pair $(pk_v, sk_v)$ generated by the KGC.
Blockchain System: to avoid selfish behavior, the central servers need to place deposits in the SC (smart contract) to cover potential penalties. In addition, the results need to be uploaded to the blockchain to achieve a transparent calculation process.
The workflow across these entities is:
First, each client normalizes its local gradient and encrypts it with $pk_v$. The solver and verifier then communicate over multiple rounds to build, without leaking private information, a list of the users who normalized their gradients honestly. The communication process is recorded on the blockchain. The solver aggregates the gradients of the clients in the user list to obtain a global model encrypted under $pk_x$, which is stored on the blockchain. Both the solver and the verifier need to pay a deposit to the smart contract.

B. Problem definition

Different from the traditional secure aggregation scenario, we consider a federated learning scenario with $k$ malicious clients among $n$ clients. First, the knowledge and capabilities of the $k$ malicious clients are defined as follows:
(1) A malicious client may keep its own poisoned data, but cannot access the local data of other honest clients.
(2) A malicious client can obtain the encrypted global model and decrypt it to obtain information. However, local model updates uploaded by a single honest client cannot be observed.
(3) Malicious clients can collude with each other and have a common goal to expand the impact of their malicious attacks.
(4) Malicious clients can launch targeted attacks or untargeted attacks.
Malicious clients can steer the global model in the wrong direction.
A malicious client can also affect the accuracy of the global model.

C. Threat model

The KGC is an honest third party. The solver and the verifier do not collude, but are honest-but-curious: they execute the established protocol honestly, but may try to infer sensitive information. This article considers two types of clients: (1) honest clients, who benefit from the global model and upload real gradients trained on their local data sets; (2) malicious clients, who upload malicious gradients to reduce the accuracy of the global model. The threats posed are as follows:
Poisoning attack: The goal of a malicious client is to affect the performance of the global model without being detected. Malicious clients can launch poisoning attacks in a variety of ways. For example, a client can change the labels of its data and upload gradients trained on the poisoned data.
Data leakage: Since the gradient is a mapping of the client's local data, if the client directly uploads the plaintext gradient, an attacker can infer or recover the original information of the honest client to a certain extent, resulting in client data leakage.
Inference attack: In our scheme, the solver and the verifier exchange some intermediate results to complete the aggregation of local updates. Therefore, they may attempt to infer sensitive information from the intermediate results.

D. Design goals

Our goal is to design a blockchain-based privacy-preserving FL scheme that can resist poisoning attacks, reduce computational overhead, and provide privacy guarantees. At the same time, our scheme should achieve the same or almost the same accuracy as FedAvg or FedSGD. Specifically, we aim to achieve five goals: robustness, privacy, efficiency, accuracy, and reliability.

Design

This scheme uses CKKS homomorphic encryption. In the paper's experiments, CKKS is more efficient than Paillier.
Additionally, we need to consider that a malicious client may disrupt the training process by sending malicious gradients, while an "honest but curious" central server may infer sensitive information about the client.
$\mathcal{C}$ is defined as the set of honestly normalized clients selected according to our rules, and $|\mathcal{C}|$ is the number of clients in it.
Figure 4 shows an overview of our scheme.
Each client normalizes and encrypts its local update and sends it to the Solver. Clients whose gradients are normalized are put into the set $\mathcal{C}$, and cosine similarity is used to tell honest gradients from malicious ones. The Solver holds a small root data set $D_0$ and maintains a model $w^0$ based on it. If the cosine similarity between $g_i^0$ and $g_i^j$ is less than 0, client $C_j$ is treated as malicious and the Solver discards $g_i^j$ in round $i$.
The gradient magnitude also affects the global model. Because a client could use a larger gradient to increase its influence, gradients need to be normalized before being sent to the Solver. However, a malicious client may upload gradients that have not been normalized. To solve this problem, the scheme introduces a second, non-colluding central server, the Verifier. Through communication between the Solver and the Verifier, the Solver obtains the set of honestly normalized clients and the local updates encrypted under $pk_x$. Finally, the Solver aggregates the local updates to obtain the global model.
In order to prevent malicious behavior of the central server and reduce the risk of single point of failure, user list $\mathcal{C}$ and the collaborative calculation of Solver and Verifier need to be saved to the blockchain. Specifically, Solver and Verifier pay a deposit to the smart contract to encourage them to perform correct calculations. In addition, the intermediate results of each training iteration and the encrypted global model are saved to the blockchain for timely backtracking in the event of a central server failure, increasing the reliability of the scheme.

B. Construction of PBFL

PBFL has three processes: local calculation, normalized judgment, and model aggregation.

local computing

Local computation: includes local training, normalization, encryption, and model updates.
Local Training

Normalization
The aggregation rule is based on cosine similarity. To make the aggregation rule work on ciphertexts, the local gradients are normalized before encryption, so each gradient is treated as a direction vector.
Each client $C_j$ normalizes its gradient as $\tilde{g}_i^j=\frac{g_i^j}{\|g_i^j\|}$.
Each client needs to normalize its local gradient before encryption, for two reasons. First, normalization allows the aggregation rule to be applied directly to the ciphertext without any changes, because for unit vectors the cosine similarity reduces to the inner product. Second, all vectors then have the same magnitude, which mitigates the impact of malicious gradients. This is based on the intuition that malicious clients tend to upload local gradients with larger magnitudes in order to amplify their impact.
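Both claims can be checked numerically in plaintext. The sketch below is illustrative only: the helper names and the toy gradients are hypothetical, and the gradients are assumed to be non-zero.

```python
import math


def inner(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(g):
    """Scale g to unit L2 norm (assumes g is not the zero vector)."""
    norm = math.sqrt(inner(g, g))
    return [v / norm for v in g]


def cosine(u, v):
    """Cosine similarity computed the usual way, with both norms."""
    return inner(u, v) / (math.sqrt(inner(u, u)) * math.sqrt(inner(v, v)))


g_root = [0.2, -0.1, 0.9]            # stands in for the root gradient g_i^0
g_client = [0.3, -0.4, 1.2]          # an honest client gradient g_i^j
g_scaled = [30.0, -40.0, 120.0]      # same direction, 100x the magnitude

# Claim 1: for unit vectors, cosine similarity is just the inner product.
cs_via_inner = inner(normalize(g_root), normalize(g_client))
cs_direct = cosine(g_root, g_client)

# Claim 2: after normalization the scaled gradient has no extra influence.
same_after_norm = all(
    abs(a - b) < 1e-12 for a, b in zip(normalize(g_client), normalize(g_scaled))
)
```

Since the inner product needs only multiplications and additions, it is the kind of operation that a homomorphic scheme such as CKKS can evaluate, whereas the division by two norms in the direct formula is not.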
Encryption
Client $C_j$ uses the public key $pk_v$ to encrypt $\tilde{g}_i^j$ with CKKS. Gradients are usually signed floating-point numbers. If another encryption scheme such as Paillier were used, we would first need to quantize and clip the gradient values and then encrypt each value individually, which would incur high computational overhead. Therefore, we use CKKS to encrypt local gradients.
The $\tilde{g}_i^j$ of each layer is treated as a vector. Specifically, each client uses the Verifier's public key $pk_v$ to directly encrypt the gradients of the different layers. If a vector is too long, it is encrypted in several pieces. The remaining vectors are each encrypted once.
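The "encrypt in several pieces" step can be pictured as splitting a layer's gradient into blocks that each fit one ciphertext. This is a plaintext sketch only: `slots` is a hypothetical parameter standing in for the ciphertext capacity, zero-padding the last block is an assumption, and no actual encryption is performed.

```python
def split_into_blocks(vector, slots):
    """Split a flat gradient vector into blocks of exactly `slots` values.

    The last block is zero-padded (an assumption); zeros do not change
    inner products, so the similarity computation is unaffected.
    """
    blocks = []
    for start in range(0, len(vector), slots):
        block = list(vector[start:start + slots])
        block += [0.0] * (slots - len(block))
        blocks.append(block)
    return blocks


layer_gradient = [0.1 * t for t in range(10)]   # a toy 10-dimensional layer
blocks = split_into_blocks(layer_gradient, slots=4)
```

Each block would then be encrypted separately under $pk_v$, and the inner product of two long vectors is the sum of the inner products of their corresponding blocks.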
Model update
Client $C_j$ downloads the latest global model from the blockchain and uses the private key $sk_x$ to obtain the plaintext global model $w_{i-1}$. The local model is then updated from $w_{i-1}$ with learning rate $\alpha$ (the update equation is given in the paper).

normalized judgment

The Solver needs to determine whether the gradient is truly normalized after receiving it from the client.
After receiving the encrypted gradient $[\![\tilde{g}_i^j]\!]_{pk_v}$, the Solver computes $[\![\tilde{g}_i^j]\!]_{pk_v}\odot[\![\tilde{g}_i^j]\!]_{pk_v}$ and sends it to the Verifier, where $\odot$ stands for the inner product.
Specifically, CKKS works on polynomials because this provides a good trade-off between security and efficiency compared with computing directly on vectors. Once a message has been encrypted into several polynomials, CKKS provides several operations that can be performed on it, such as addition, multiplication, and rotation. The details are as follows.
Assume the $n^*$-dimensional vector $[\![\tilde{g}_i^j]\!]_{pk_v}$ is expressed as $[\![p_1,p_2,...,p_{n^*}]\!]_{pk_v}$. Multiplying it by itself gives $[\![p_1^2,p_2^2,...,p_{n^*}^2]\!]_{pk_v}$. Rotating this gives $[\![p_2^2,p_3^2,...,p_{n^*}^2,p_1^2]\!]_{pk_v}$, and the two vectors are added. Repeating the rotation and addition ($n^*-1$ times) yields $[\![r_1,r_2,...,r_{n^*}]\!]_{pk_v}$ with $r_1=p_1^2+p_2^2+p_3^2+...+p_{n^*}^2$. Finally, $[\![r_1,r_2,...,r_{n^*}]\!]_{pk_v}$ is multiplied by $[1, 0, 0, ..., 0]$ to obtain $[\![r_1]\!]_{pk_v}=[\![\tilde{g}_i^j]\!]_{pk_v}\odot[\![\tilde{g}_i^j]\!]_{pk_v}$.
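The multiply, rotate, add, and mask steps can be simulated in plaintext to see why slot 1 ends up holding the inner product. In this sketch plain Python lists stand in for ciphertext slots and the helper names are hypothetical. In a real CKKS deployment each helper would be the corresponding homomorphic operation.

```python
def rotate_left(slots, k=1):
    """Cyclic left rotation: [p1, p2, ..., pn] -> [p2, ..., pn, p1] for k=1."""
    return slots[k:] + slots[:k]


def slotwise_mul(u, v):
    return [a * b for a, b in zip(u, v)]


def slotwise_add(u, v):
    return [a + b for a, b in zip(u, v)]


def inner_product_by_rotation(p, q):
    """Inner product using only slot-wise multiply, rotate, add, and a mask."""
    prod = slotwise_mul(p, q)             # [p1*q1, p2*q2, ...]
    acc = prod
    rotated = prod
    for _ in range(len(p) - 1):           # n* - 1 rotations and additions
        rotated = rotate_left(rotated)
        acc = slotwise_add(acc, rotated)  # slot 1 accumulates every product
    mask = [1.0] + [0.0] * (len(p) - 1)   # the [1, 0, 0, ..., 0] vector
    return slotwise_mul(acc, mask)


g_tilde = [0.6, 0.0, 0.8]                 # already has unit length
norm_check = inner_product_by_rotation(g_tilde, g_tilde)
```

For an honestly normalized gradient the first slot is (up to rounding) 1, which is the quantity the normalization check examines. A gradient that was not normalized would produce its squared norm instead.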

Model aggregation

After receiving the user list $\mathcal{C}$, the Solver securely aggregates the gradients uploaded by the clients in $\mathcal{C}$. Specifically, the Solver and the Verifier first securely execute a protocol that lets the Solver obtain the intermediate results and the local model updates encrypted under $pk_x$. The Solver then obtains the encrypted global model according to the predefined aggregation rule and uploads it to the blockchain.

Two-party calculation: the Solver first computes the cosine similarity between $[\![\tilde{g}_i^0]\!]_{pk_v}$ and $[\![\tilde{g}_i^j]\!]_{pk_v}$ and then communicates with the Verifier. Our aggregation rule relies on the trusted root data set $D_0$ and the associated model $w^0$, which are used to decide the expected update direction of the global model, so gradients whose direction is more similar to $g_i^0$ receive greater weight during aggregation.
The trusted root data set $D_0$ can be obtained by manual labeling, and the cost of collecting and manually labeling it is usually affordable for the Solver.
In round $i$, the Solver trains on $D_0$ to obtain the gradient update $g_i^0$ and normalizes it as $\tilde{g}_i^0=g_i^0/\|g_i^0\|$. The Solver encrypts it to obtain $[\![\tilde{g}_i^0]\!]_{pk_v}$ and then computes the cosine similarity. Here $C_j$ holds $\tilde{g}_i^j=[p_1,p_2,...,p_{n^*}]$ and the Solver holds $\tilde{g}_i^0=[q_1,q_2,...,q_{n^*}]$, so the encrypted gradients are $[\![p_1,p_2,...,p_{n^*}]\!]_{pk_v}$ and $[\![q_1,q_2,...,q_{n^*}]\!]_{pk_v}$. Because of the normalization, computing the cosine similarity of the two vectors reduces to computing their inner product, $cs_i^j=\tilde{g}_i^0\odot\tilde{g}_i^j=\sum_{t=1}^{n^*}q_t p_t$.
$cs_i^j$ measures the similarity between $[\![\tilde{g}_i^0]\!]_{pk_v}$ and $[\![\tilde{g}_i^j]\!]_{pk_v}$. A negative value means the direction is opposite to the gradient direction of the root data set, which would have a negative impact on the global model, so the ReLU function is applied here to limit the gradients in the list $\mathcal{C}$.
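A plaintext sketch of how ReLU-clipped similarity scores can act as aggregation weights. The exact aggregation formula is an image that is missing from these notes, so dividing by the sum of the scores is an assumption made here for illustration, and all names are hypothetical. Inputs are assumed to be already normalized.

```python
def inner(u, v):
    return sum(a * b for a, b in zip(u, v))


def relu(x):
    return max(x, 0.0)


def aggregate_by_similarity(root_grad, client_grads):
    """Weight each unit-norm client gradient by ReLU(cosine similarity to root).

    Assumption: the weighted sum is divided by the sum of the scores.
    """
    scores = [relu(inner(root_grad, g)) for g in client_grads]
    total = sum(scores)
    dim = len(root_grad)
    if total == 0.0:                      # every client was rejected
        return [0.0] * dim, scores
    aggregated = [
        sum(s * g[t] for s, g in zip(scores, client_grads)) / total
        for t in range(dim)
    ]
    return aggregated, scores


root = [1.0, 0.0]
clients = [
    [1.0, 0.0],    # same direction as the root gradient
    [0.6, 0.8],    # partly aligned
    [-1.0, 0.0],   # opposite direction, gets weight 0
]
global_grad, weights = aggregate_by_similarity(root, clients)
```

The opposite-direction gradient contributes nothing, and the partly aligned one contributes less than the fully aligned one, matching the statement that directions closer to $g_i^0$ receive greater weight.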

max function understanding

Numerical method for comparison on homomorphically encrypted numbers
Using the ReLU function requires comparison on homomorphic ciphertexts. The article adopts the algorithm from the ASIACRYPT 2019 paper "Numerical Method for Comparison on Homomorphically Encrypted Numbers". After being reminded by my teacher at the group meeting, I looked up the relevant algorithms in that paper afterwards. The comparison is built on computing the square root of a positive real number, so the original paper first proposes the $\mathrm{Sqrt}(x; d)$ algorithm, which iterates two sequences $a_n$ and $b_n$ for $d$ rounds.
The paper points out that the error of the output $\mathrm{Sqrt}(x; d)$ relative to $\sqrt{x}$ is bounded by $(1-\frac{x}{4})^{2^{d+1}}$. Usually this error is negative, that is, $\mathrm{Sqrt}(x; d)$ is usually less than $\sqrt{x}$.
Proof: Because $-1\leq b_0\leq 0$, we get $-1\leq b_n\leq 0$ for all $n\in \mathbb{N}$, so $|b_{n+1}|=|b_n|\cdot|\frac{b_n(b_n-3)}{4}|\leq|b_n|$ and $|b_{n+1}|\leq |b_n|^2\cdot(1-\frac{x}{4})$. This recurrence between $|b_n|$ and $|b_{n+1}|$ gives
$|b_d|\leq |b_0|^{2^d}\cdot(1-\frac{x}{4})^{2^d-1}<(1-\frac{x}{4})^{2^d-1}$.
By the definitions of $a_n$ and $b_n$, the equation $x(1+b_n)=a_n^2$ holds, which further gives the relation
$|\frac{a_n-\sqrt{x}}{\sqrt{x}}|=1-\sqrt{1+b_n}< |b_n|$.
If we take a large integer $d$, the error becomes very small, which verifies the correctness of the algorithm for $\sqrt{x}$.
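The algorithm image is missing from these notes, so the recurrence below is a reconstruction chosen to match the proof: $b_0=x-1$ gives $-1\leq b_0\leq 0$ for $x\in[0,1]$, $b_{n+1}=b_n^2(b_n-3)/4$ matches the stated recurrence, and $a_{n+1}=a_n(1-b_n/2)$ preserves $x(1+b_n)=a_n^2$. Treat it as a plaintext sketch under those assumptions. It uses only additions and multiplications, which is what makes it suitable for evaluation on ciphertexts.

```python
def sqrt_approx(x, d):
    """Approximate sqrt(x) for 0 <= x <= 1 with d iterations.

    Invariant (from the proof): x * (1 + b_n) == a_n ** 2,
    so a_n -> sqrt(x) as b_n -> 0.
    """
    a = x
    b = x - 1.0
    for _ in range(d):
        a = a * (1.0 - b / 2.0)
        b = b * b * (b - 3.0) / 4.0
    return a


approx_quarter = sqrt_approx(0.25, 10)   # true value is 0.5
coarse = sqrt_approx(0.5, 2)             # few iterations: an underestimate
```

With few iterations the result stays below the true square root, consistent with the remark that the error is negative. Inputs very close to 0 converge more slowly and need a larger `d`.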
max algorithm
This part is easier to understand. $\min(a,b)$ and $\max(a,b)$ are computed as:
$\min(a,b)=\frac{a+b}{2}-\frac{|a-b|}{2}=\frac{a+b}{2}-\frac{\sqrt{(a-b)^2}}{2}$
$\max(a,b)=\frac{a+b}{2}+\frac{|a-b|}{2}=\frac{a+b}{2}+\frac{\sqrt{(a-b)^2}}{2}$
Correctness is also easy to verify: in the cases $a\geq b$ and $a<b$, $\max(a,b)$ equals $\frac{a}{2}+\frac{a}{2}$ and $\frac{b}{2}+\frac{b}{2}$ respectively.
Replacing the square root with $\mathrm{Sqrt}$ then gives the final $\mathrm{Max}$ algorithm, as summarized in the paper.

The design idea here is worth borrowing: start from a homomorphic ciphertext comparison algorithm published at a top conference (ASIACRYPT), and, because the input must fall in the range that algorithm requires, transform the ReLU function accordingly, which is also a way to add novelty.
The calculation of the ReLU function under ciphertext is then given.
The numerical comparison method for homomorphic ciphertexts works on the range $[0,1]$, but the cosine similarity falls in $[-1,1]$. So one can add $[\![1]\!]$ and then multiply by $1/2$, which maps 0 to $1/2$ and changes the ReLU function to ReLU'.
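A plaintext sketch of the max-based ReLU. The square-root recurrence is a reconstruction consistent with the proof above (the algorithm image is missing), and mapping the result back with $2y-1$ is an assumption added so the output can be compared with an ordinary ReLU. The notes only state the shift into $[0,1]$. All function names are hypothetical.

```python
def sqrt_approx(x, d):
    """Iterative sqrt for 0 <= x <= 1 using only additions and multiplications."""
    a, b = x, x - 1.0
    for _ in range(d):
        a = a * (1.0 - b / 2.0)
        b = b * b * (b - 3.0) / 4.0
    return a


def max_approx(a, b, d):
    """max(a, b) = (a + b)/2 + sqrt((a - b)^2)/2 for a, b in [0, 1]."""
    return (a + b) / 2.0 + sqrt_approx((a - b) ** 2, d) / 2.0


def relu_via_max(cs, d=20):
    """ReLU on cs in [-1, 1], computed through the shifted domain [0, 1]."""
    shifted = (cs + 1.0) / 2.0                 # 0 is mapped to 1/2
    relu_prime = max_approx(shifted, 0.5, d)   # ReLU' = max(shifted, 1/2)
    return 2.0 * relu_prime - 1.0              # assumed inverse of the shift


positive_score = relu_via_max(0.6)    # should be close to 0.6
negative_score = relu_via_max(-0.4)   # should be close to 0.0
```

Scores very close to 0 make $(a-b)^2$ tiny, where the square-root iteration converges slowly, so the number of iterations `d` controls how sharply the function separates slightly positive from slightly negative similarities.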

After obtaining the scores of all clients, the Solver and the Verifier work together to obtain the values required for model aggregation without leaking private information.
In previous work, two servers were used for ciphertext-level comparison, which could also be implemented in this system. However, if the ReLU function in $\mathrm{GetParam}$ were computed on $cs_i^j$ in that way, the following problems could occur: (1) the dual-server architecture may infer the sign of $cs_i^j$, which is bad for the client; (2) the communication cost between the two servers may increase.
Previous work has made ciphertext-level operations easier to implement. In this scheme, the $\mathrm{Max}$ algorithm is used to implement the ReLU function, which effectively protects the privacy of the clients and reduces the total amount of communication.

Aggregation: the Solver aggregates the encrypted local updates of the clients in $\mathcal{C}$ into the encrypted global model.


In addition, both the Solver and the Verifier need to submit deposits to the smart contract before joint training starts, and they receive rewards from the clients' deposits at the end of the task, because we believe financial incentives motivate servers to perform correct calculations rather than submit meaningless or malicious results.

[Figure: data flow of the above process]

[Figure: summary]