Federated Learning Excerpt

Basic Concepts

Background

Historical background: growing emphasis on data rights confirmation and privacy protection.
Data rights confirmation—clarifying the ownership and usage rights of data—has given rise to the "data island" problem.
Federated learning, as a new machine learning paradigm, can address both the data-island problem and the privacy-protection problem.

Solving data silos

Hardware level: establish physical isolation.
Common technologies: trusted execution environments (TEE), edge computing.

| Technology | Main idea | Strengths | Shortcomings |
| --- | --- | --- | --- |
| Trusted execution environment | Place private data and the corresponding data processing inside a trusted environment | Relatively easy to build for a single party | A unified trusted environment must be built for multi-party data, so the construction cost is high |
| Edge computing | Confine each party's private data and the corresponding operations to the device edge and provide computation close to the data | Reduces the chance of sensitive information leaking in transit and on the main server, effectively addressing user privacy leakage and data security; copes with data explosion to some extent and relieves network traffic from the edge to the center | Requirements on edge devices are high; in artificial intelligence applications, edge computing lacks the ability to coordinate multiple parties for joint learning, so the learned model's performance is far below that of a centrally trained model |

Software level: at the data and communication layers, data is encrypted to prevent attackers from intercepting and deciphering private data.
Main research results: secure network communication protocols and cryptographic techniques, such as the Kerberos protocol; cryptographic algorithms, such as the asymmetric RSA (Rivest-Shamir-Adleman) algorithm and the symmetric DES algorithm; and post-quantum cryptography.

Software-level solutions place low demands on the hardware environment and can provide stronger data privacy protection. However,
how to coordinate multiple parties for efficient joint training on encrypted data has become a difficult problem in machine learning.

Concept

Machine learning has developed through three stages: centralized learning, distributed on-site learning, and federated learning.

| Learning stage | Basic idea | Strengths | Shortcomings |
| --- | --- | --- | --- |
| Centralized learning | "The model does not move, the data moves": all terminal data is transmitted to the main server | The model can better reflect the latent value of the data, so its performance is relatively better | There are certain data security risks |
| Distributed on-site learning | Each system's data performs its machine learning task separately at the source | Reduces the circulation of sensitive information and effectively protects each party's data rights | Leads to the "data island" phenomenon: each party's learned knowledge is one-sided, and the model lacks global generalization ability |
| Federated learning | "The data does not move, the model moves": participants exchange no sample data or variants of it, only intermediate data related to the model; the master server securely aggregates the intermediate data and feeds the result back to the participants, who update their own models based on the aggregated information | Effectively guarantees the security and privacy of each participant's sensitive data, integrating the knowledge contained in multiple parties' data while protecting private data | — |

Supplement:
Distributed on-site learning
addresses data security and privacy protection.
Typical example: edge computing — the data to be analyzed is confined to the device's edge environment for on-site learning, and only the final learning results are aggregated and stored on the main server.

To solve the data privacy problem of centralized learning and the data-island problem of distributed on-site learning,
federated learning was first proposed by H. Brendan McMahan et al. and applied in Google's Gboard input method to realize candidate-word (next-word) prediction.

In the federated learning (FL) technique proposed by Google, the data storage and model training stages of machine learning are moved to the local user, and only model updates are exchanged with the central server, which effectively protects users' privacy and security.

Differences between federated learning and traditional distributed learning

Application field
Federated learning is used to train on data with privacy-sensitive attributes.

Data attributes
Distributed machine learning and classical machine learning usually handle independent and identically distributed (IID) data.
Because the clients differ greatly from one another, federated learning usually handles non-IID data.
When the data features and class labels differ greatly between clients, alignment work is also required during training.

System composition
The physical composition is similar: a central server plus multiple distributed nodes.
Distributed system: the central server uniformly schedules data computation and model updates; the latency between nodes and the central server is small, and training time is mainly determined by computation time.

Federated system: all participants have equal status, decide independently whether to participate in training, and often have considerable computing power; the system must take into account data transmission delay, non-IID data, and privacy and security.

Differences: federated aggregation (the optimization algorithm) provides ideas for handling non-IID data and alleviating data heterogeneity.
Federated learning must provide privacy protection, so every link needs to pay attention to the use of encryption algorithms.

Federated learning is essentially a distributed machine learning technique: in scenarios where multiple data sources are brought together, a globally optimal model can be trained collaboratively without sharing data.
Each data owner can protect user privacy during machine learning while still contributing training data without aggregating the source data. Through parameter exchange under an encryption mechanism in the federated system (without violating data privacy regulations), a globally shared model is built jointly and then serves each party's local goals.

Because clients differ in geography and time, federated learning often has to deal with non-IID data. To integrate data from multiple sources, the common practice is to move data from different sources into relational databases with ETL preprocessing tools and to deploy tasks on multiple machines to improve computing efficiency and reduce energy consumption.

Commonly used frameworks: client-server architecture, peer-to-peer network architecture

Client-server architecture
Each data holder trains a model locally according to its own conditions and rules, sends the desensitized parameters to the central server for computation, and updates its local model with the parameters sent back, until the global model becomes robust.

Peer-to-peer network architecture
Participants communicate directly without a third party, which improves security but requires more computation for encryption and decryption.

Currently, frameworks based on a third-party server—that is, the client-server framework—are more common.

The server collects gradients and returns new parameters after the aggregation operation.
Each data holder trains on its data locally to protect data privacy; the gradients produced by iteration are desensitized and used as the interaction information, uploaded to a trusted third-party server instead of the local data (the server and the participants seem to be trusted by default, which invites corresponding attacks and subsequent improvements), and the holder then waits for the returned parameters to update its model.

Steps:
1. System initialization
The central server issues the modeling task, and the client data holders put forward a joint modeling plan according to their own needs (an incentive mechanism is needed here to increase participation). Once the data holders reach an agreement, the central server distributes the initial parameters to them.

2. Local computation
Each data holder performs the computation locally, desensitizes the resulting gradients, and uploads them for one update of the global model.

3. Central aggregation
The central server aggregates the computation results received from multiple data holders.
Efficiency, security, privacy, and related issues must be considered at this point.

4. Model update
The central server updates the global model according to the aggregation result and returns the updated model to all participants. Each data holder updates its local model, performs the next round of local computation, and at the same time evaluates the performance of the updated model.
When the performance is good enough, training terminates, and the final global model is kept on the central server.

Possible variations:
appropriately reduce the communication frequency while keeping learning efficient; add a logical check after aggregation that judges the quality of the received local results, to improve the robustness of the federated learning system (see the sketch below).
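A minimal sketch of one client-server round following the four steps above, including the aggregation-time quality check just mentioned. The linear model, the synthetic data, and the median-distance filter are all illustrative assumptions rather than a prescribed implementation.

```python
import numpy as np

def local_update(global_w, X, y, lr=0.1, epochs=5):
    """Step 2: a data holder runs a few epochs of gradient descent locally on a
    linear model and returns only the resulting weights, never the data."""
    w = global_w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)        # MSE gradient of the toy model
        w -= lr * grad
    return w

def aggregate(updates, sizes, max_dev=5.0):
    """Steps 3-4: discard updates that deviate too far from the median update
    (the 'logical judgment after aggregation'), then weight-average the rest."""
    updates, sizes = np.array(updates), np.array(sizes, dtype=float)
    median = np.median(updates, axis=0)
    dist = np.linalg.norm(updates - median, axis=1)
    keep = dist <= max_dev * (np.median(dist) + 1e-12)
    weights = sizes[keep] / sizes[keep].sum()
    return (updates[keep] * weights[:, None]).sum(axis=0)

# Step 1: the server initializes parameters; three toy clients with synthetic data.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 4)), rng.normal(size=50)) for _ in range(3)]
global_w = np.zeros(4)
for round_ in range(10):                          # communication rounds
    local_ws = [local_update(global_w, X, y) for X, y in clients]
    global_w = aggregate(local_ws, [len(y) for _, y in clients])
```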

Clients in the federated learning process (such as tablets, mobile phones, and IoT devices) jointly train a model under the coordination of a central server (such as a service provider): each client trains on its local data to obtain a local model, and the central server performs a weighted aggregation of the local models to obtain a global model.
After multiple rounds of iteration, a model w close to the result of centralized machine learning is finally obtained, which greatly reduces the privacy risks brought about by aggregating source data in traditional machine learning.

One iteration round:

(1) The clients download the global model w_{t-1} from the server.
(2) Client k trains on its local data to obtain the local model w_{t,k} (the local model update of client k in communication round t).
(3) All clients upload their local model updates to the central server.
(4) After receiving the updates from all parties, the central server performs a weighted aggregation to obtain the global model w_t (the global model update of round t).

Technical features:

① The raw data of the federated learning participants stays on the local clients; only model update information is exchanged with the central server. (Privacy protection)
② The model w jointly trained by the federated learning participants is shared by all parties. (Model sharing)
③ The final model accuracy of federated learning is similar to that of centralized machine learning. (Accuracy guarantee)
④ The higher the quality of the participants' training data, the higher the accuracy of the global model. (Quality drives accuracy)

Algorithm principle:

The objective function F(w) is usually expressed as

$$F(w) = \sum_{k=1}^{m} \frac{n_k}{n} F_k(w)$$

where m is the total number of client devices participating in training, n is the total data volume of all clients, n_k is the data volume of the k-th client, and F_k(w) is the local objective function of the k-th device:

$$F_k(w) = \frac{1}{n_k} \sum_{i \in d_k} f_i(w)$$

where d_k is the local dataset of the k-th client and f_i(w) = α(x_i, y_i, w) is the loss produced by the model with parameters w on instance (x_i, y_i) in d_k.
The sum of the losses over all instances in d_k divided by the data volume of client k is the client's average local loss.
Repeated iterative optimization drives the loss function toward its minimum. (The loss is inversely related to model accuracy, so objective optimization in machine learning usually means minimizing the loss.)

In the objective-function optimization of federated learning, large-batch stochastic gradient descent (SGD) is usually used: the gradient of the loss computed from each client's local training is multiplied by a fixed learning rate η to produce the new round of weight updates, e.g.

$$w_t = w_{t-1} - \eta \sum_{k=1}^{m} \frac{n_k}{n} \nabla F_k(w_{t-1})$$
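A small numerical sketch of the weighted objective and gradient step above, using a squared loss on a linear model purely as a stand-in (the source does not fix a concrete loss function):

```python
import numpy as np

rng = np.random.default_rng(1)
# toy local datasets d_k for m = 3 clients with differing data volumes n_k
data = [(rng.normal(size=(n_k, 5)), rng.normal(size=n_k)) for n_k in (30, 50, 20)]
n = sum(len(y) for _, y in data)

def F_k(w, X, y):
    """Average local loss of one client (mean squared error as the stand-in)."""
    return np.mean((X @ w - y) ** 2)

def grad_F_k(w, X, y):
    return 2 * X.T @ (X @ w - y) / len(y)

def F(w):
    """Global objective: data-volume-weighted sum of the local objectives."""
    return sum(len(y) / n * F_k(w, X, y) for X, y in data)

w, eta = np.zeros(5), 0.05
for t in range(100):
    # w_t = w_{t-1} - eta * sum_k (n_k / n) * grad F_k(w_{t-1})
    w -= eta * sum(len(y) / n * grad_F_k(w, X, y) for X, y in data)
print(F(w))
```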

Federated Learning Taxonomy

Assumption: D_m represents the data held by client m, I represents the sample IDs, Y represents the label information of the dataset, and X represents the feature information of the dataset; a complete training dataset D is therefore composed of (I, Y, X). Federated learning methods are classified according to how (I, Y, X) are distributed across the clients participating in training.

| Learning method | Characteristics | Example | Nature |
| --- | --- | --- | --- |
| Horizontal federated learning (HFL) | The dataset features X and label information Y are the same, but the sample IDs differ | Next-word prediction for a user input method | Training is performed on the same feature description of different objects |
| Vertical federated learning (VFL) | The features X and label information Y of each dataset differ, but the sample ID information is the same | A bank and an e-commerce platform in the same region | Multiple parties jointly train on different feature descriptions of the same objects |
| Federated transfer learning (FTL) | The dataset features X, label information Y, and sample ID information all differ | Cross-border, cross-department data exchange projects | Addresses the problems of few labeled samples and insufficient data |

Supplement: in vertical federated learning, one party holds the training label information Y, and all parties contribute feature information X to obtain the vertical global model.

| Learning method | Assumed setting | Characteristic |
| --- | --- | --- |
| Horizontal federated learning (HFL) | The parties' datasets have a large feature-space overlap, and every participant's data has a label space | Federated learning partitioned by sample |
| Vertical federated learning (VFL) | Performed between parties whose sample data overlap heavily; only one participant's data has a label space, while the other participants' data does not | Federated learning partitioned by feature |
| Federated transfer learning (FTL) | The two datasets overlap little in both sample space and feature space; only one participant's data has a label space | Does not require a master server (coordinator) to coordinate between the parties |

Most current research focuses on HFL and VFL.
In HFL, alignment is performed along the overlapping feature dimensions: the parts of the participants' data that share the same features but not exactly the same users are taken out for joint training.
In VFL, matching is performed by user ID: the parts of the participants' data that share the same users but have different features are taken out for joint training.

Horizontal Federated Learning

HFL was the first form to be proposed and is widely used.
Its sample space covers the sample data of multiple participants, but the feature space used is limited to the overlap between the participants.
Each user device transmits the model information of its local model to the main server; the server aggregates all model information securely, encrypts the aggregated result, and broadcasts it to all user devices. Finally, the user devices update their local models based on the aggregated information from the master server.

Main steps:
Step 1: Each participant builds a local model on its own dataset.
Step 2: Each participant encrypts the model information of its local model, such as the gradients, using an encryption algorithm such as homomorphic encryption, and then sends the encrypted model information to the main server.
Step 3: The main server performs secure aggregation on the participants' model information. Common secure aggregation algorithms include the federated averaging algorithm (FedAvg) and the heterogeneous federated aggregation algorithm FedProx.
Step 4: The main server broadcasts the aggregated information to all participants.
Step 5: Each participant decrypts the aggregated information sent by the main server and updates its local model accordingly.
The above steps are repeated until a preset stopping condition is reached (a toy sketch of the protected aggregation step follows).
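The source names homomorphic encryption and FedAvg/FedProx without giving details. The sketch below uses a toy pairwise-masking scheme (random masks that cancel when the uploads are summed) as a stand-in for the encryption layer, so the server only ever sees masked updates; real secure aggregation derives such masks from pairwise key agreement and handles dropouts.

```python
import numpy as np

def mask_updates(updates, rng):
    """Toy secure aggregation: each client pair (i, j) agrees on a random mask;
    client i adds it and client j subtracts it, so each upload looks random
    but the masks cancel exactly in the sum."""
    k, d = updates.shape
    masked = updates.copy()
    for i in range(k):
        for j in range(i + 1, k):
            m = rng.normal(size=d)
            masked[i] += m
            masked[j] -= m
    return masked

rng = np.random.default_rng(42)
true_updates = rng.normal(size=(4, 6))        # local model updates of 4 clients
uploads = mask_updates(true_updates, rng)     # what the main server receives
aggregate = uploads.mean(axis=0)              # FedAvg-style equal-weight average
assert np.allclose(aggregate, true_updates.mean(axis=0))
```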

Because the feature spaces of the datasets are the same, the parties can exchange knowledge without knowing one another's source data.
HFL is not suitable for cross-domain federated learning in which the participants' feature spaces differ greatly.

Vertical Federated Learning

In VFL, the sample data are partitioned by feature space, and the learned data space contains only the overlapping (common) samples. VFL is
more suitable for cross-industry and cross-domain machine learning tasks, such as the federated advertisement delivery system proposed by Weishi and advertisers.
It can aggregate the data features and knowledge of both parties without leaking or exchanging their sample data.

In vertical federated learning, the master server is also called the coordinator. The learning process of vertical federated learning consists of five main steps:
Step 1: Data alignment.
Purpose: while protecting the privacy and data security of all participants, find the common samples on which the federated learning task will be performed. Common data alignment methods include the algorithm proposed by SAHU AK et al. (a naive illustrative sketch follows after the steps).
Step 2: The master server sends a public key to each participant. The participants build an initial local model on the common samples and then send encrypted model information, such as gradients and loss values, to the main server.
Step 3: The main server decrypts the participants' model information, computes the intermediate results each participant needs in order to update its model, and sends them back to the participants.
Step 4: The participants update their local models according to the results returned by the master server. At the same time, each party shares its intermediate computation results with the other participants to help them compute model information such as gradients and loss values.
Step 5-1: In some vertical federated learning algorithms, the participants also register the model identifiers of their local models with the main server, so that during prediction the main server knows which participants new data must be sent to. For example, in the SecureBoost algorithm, a participant informs the main server of [record id, feature, threshold] and the resulting sample-space split, and the main server associates the current tree node with that participant's partition information. Therefore, only the main server knows the structure of the whole decision tree. When a new sample is to be predicted, the main server sends the data to the participant associated with the current node, so that the participant can apply its local threshold and determine the next search direction in the tree.
Step 5-2: In particular, some vertical federated learning algorithms that require all participants during prediction, such as secure federated linear regression, do not require the participants to register model identifiers with the main server.
In vertical federated learning, each party holds different features, so during federated model training all parties need to exchange intermediate results to help one another learn the feature knowledge each of them holds.
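The alignment algorithms cited in Step 1 are cryptographic protocols; as a plain illustration of only the goal (finding common sample IDs without exchanging raw ID lists in the clear), here is a naive sketch that compares salted hashes of IDs. Real private-set-intersection protocols are considerably stronger; the salt, IDs, and function names here are made up.

```python
import hashlib

def blind(ids, salt):
    """Each party hashes its sample IDs with a shared salt before exchanging them."""
    return {hashlib.sha256((salt + i).encode()).hexdigest(): i for i in ids}

salt = "shared-secret-salt"                 # assumed to be agreed out of band
party_a = blind(["u01", "u02", "u03", "u07"], salt)
party_b = blind(["u02", "u03", "u05"], salt)

common = party_a.keys() & party_b.keys()    # hashes are compared, not raw IDs
aligned_a = sorted(party_a[h] for h in common)   # each side recovers its own
aligned_b = sorted(party_b[h] for h in common)   # common records: ['u02', 'u03']
```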


Federated Transfer Learning

The purpose is to give the model the ability to generalize from one case to another: when the sample spaces and feature spaces of the participants overlap little, a transfer learning algorithm is used so that the parties build the model for each other.
Learning mode: the model parameters one participant has trained in the current iteration are migrated to another participant to assist its new round of model training.

Step 1: Participants build local models on their own datasets.
Step 2: Participants run their local models to obtain data representations and a set of intermediate results, which are encrypted and sent to the other party.
Step 3: The other party uses the received intermediate results to compute the encrypted gradients and loss values of the model, adds a mask, and sends them back to the original participant.
Step 4: Each party decrypts the received information and sends it back to the other party; the parties then use the decrypted model information to update their respective models.
The above steps are repeated until the loss converges.
In this process, each participant uses the other party's current model and latent data representations to update its own local model, realizing a federated transfer learning model.

In general, federated transfer learning can be divided into sample-based federated transfer learning, feature-based federated transfer learning, parameter-based federated transfer learning, and correlation-based federated transfer learning:

| Classification | Basic idea |
| --- | --- |
| Sample (instance)-based federated transfer learning | Each participant selectively adjusts the weights of the samples used for training to reduce the distribution differences between different participants' samples, and the parties jointly train a federated transfer model. |
| Feature-based federated transfer learning | By minimizing the sample-distribution or feature differences among the participants, a common feature space is learned collaboratively and used to reduce classification or regression error, realizing the federated transfer model. |
| Parameter (model)-based federated transfer learning | Participants use other parties' model information or prior relationships to initialize or update their local models, thereby drawing on the other parties' data representations and knowledge. |
| Relation (correlation)-based federated transfer learning | The knowledge or feature spaces of different participants are mapped by their correlations, and the local model is updated with the other parties' knowledge mappings in order of correlation, so that more knowledge is learned. |

The biggest characteristic of federated transfer learning is that modeling is based on multi-party data representations while no participant's data is allowed to flow to the other parties; traditional transfer learning imposes no such restriction. Federated transfer learning therefore effectively protects the privacy and security of user data.

Algorithm classification

The federated learning system is a model training system oriented toward many clients. During training, the data stays on each client, and the central server integrates the local model updates sent by the clients to complete the training of the shared model.

Federated learning algorithms based on classical machine learning

Based on the distinctive iterative mode and characteristics of federated learning (the parties exchange training parameters to complete joint modeling while the data never leaves its local environment), classical algorithms are modified in a targeted way.

1 Federated linear algorithms

An implementation of vertical federated logistic regression under the centralized federated learning framework (homomorphic encryption; an auxiliary vector reflecting model changes is introduced to help update the gradients; gradients are updated periodically).
An implementation of vertical federated logistic regression under the decentralized federated learning framework (the labeled data holder takes the lead and assumes the responsibilities of the removed central server; the transmitted gradients are encrypted and noise is added).

2 Federated tree models
A random forest implementation based on the centralized vertical federated learning framework — federated forest:
each tree is modeled jointly; its structure is stored on the central server and at each data holder, but each data holder only keeps the split-node information that matches its own features (to guarantee data privacy).
The central server keeps the complete structural information, and each data holder stores its own node information.
During model prediction, the node information held by the other clients in the tree structure is called jointly through the central node, which reduces the communication frequency per tree during prediction and improves communication efficiency.

SecureBoost — a decentralized vertical federated learning framework based on gradient boosting decision trees (GBDT). It involves a labeled data holder and unlabeled data holders,
and adopts a joint modeling approach that preserves data privacy while guaranteeing training performance.
Multi-party cooperation is supported, but there is only one labeled data holder, while the unlabeled data holders form a set.
Compared with distributed XGBoost, SecureBoost protects data privacy while maintaining model accuracy, successfully bringing vertical GBDT into the federated learning framework.


A decentralized horizontal federated learning framework based on multi-party GBDT modeling — similarity-based federated learning: during pre-training, each data holder hashes its data with locality-sensitive hashing; the hash tables are aggregated into a global hash table and published to all data holders.
The hash-based protection is not as strong as differential privacy, but it gains communication efficiency at the cost of a small amount of privacy-protection strength.

3 Federated support vector machines
The support vector machine is deployed securely in federated learning, with data privacy guaranteed by means such as feature hashing and block updates.
Feature values are hashed with dimensionality reduction to hide the actual values (similar to SimFL),
and sub-gradient updates are used (because for a linear support vector machine the central server could otherwise deduce the data labels from the update gradients).

Federated learning algorithms based on deep learning

Federated learning encodes and encrypts the transmitted information, so some links of deep learning need to be adjusted.

1 Federated neural networks
A neural network with two hidden layers; experimental grouping: one group uses the same random seed to initialize the model parameters assigned to two computing nodes, while the other group uses different random seeds. Each group weights and merges the model parameters from the different nodes with different weight ratios and compares the result with the final federated shared model.
Federated learning models that use the model-averaging method need fewer training epochs.
The federated model initialized with the same random seed performs better, reaching the optimal loss at a model-parameter ratio of 1:1.

2 Federated convolutional neural networks
An overly complex network structure degrades the convergence efficiency of federated learning.
The dataset is randomly split across different clients by sample ID to form different subsets that simulate distributed data.
During training, each client first computes gradients and updates parameters on its local dataset; at the end of each iteration, the aggregated parameter updates are used to update the federated model.
In a federated learning environment, the sparse ternary compression communication protocol has been reported to outperform the federated averaging algorithm FedAvg.

3 Federated LSTM
The dataset is manually partitioned into federated learning datasets assigned to multiple clients; with appropriate hyperparameter settings, good model accuracy can be achieved.

The most classical federated learning algorithm — FedAvg
(Figure: FedAvg algorithm pseudocode)

Essential idea: each data holder optimizes its local model with local stochastic gradient descent, and the central server performs the aggregation operation.
Deployment is relatively simple and the fields of application are wide (a minimal sketch follows).
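A minimal FedAvg sketch consistent with the description above: in each round a fraction of the clients runs several epochs of local SGD, and the server averages the returned weights, weighted by local data volume. The linear model, the data, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def client_update(w, X, y, epochs=5, lr=0.05, batch=16):
    """Local SGD on one client (squared loss on a linear model as a stand-in)."""
    w = w.copy()
    for _ in range(epochs):
        idx = rng.permutation(len(y))
        for s in range(0, len(y), batch):
            b = idx[s:s + batch]
            grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return w

# toy clients with different data volumes
clients = [(rng.normal(size=(n, 8)), rng.normal(size=n)) for n in (40, 80, 120, 60)]
w_global = np.zeros(8)
C = 0.5                                            # fraction of clients per round
for t in range(20):                                # communication rounds
    m = max(1, int(C * len(clients)))
    chosen = rng.choice(len(clients), size=m, replace=False)
    updates, sizes = [], []
    for k in chosen:
        X, y = clients[k]
        updates.append(client_update(w_global, X, y))
        sizes.append(len(y))
    # weighted average: w_t = sum_k (n_k / n) * w_{t,k}
    w_global = np.average(np.stack(updates), axis=0, weights=np.array(sizes, float))
```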

Optimizing Federated Learning Algorithms

4.1 Optimization from the perspective of communication cost

4.1.1 Increase the clients' computation

KJ: increase the number of local parameter-update computations each client performs in every iteration.
The FedProx algorithm can dynamically adjust the number of local computations required of different clients in each round.
Data heterogeneity slows down the convergence of federated learning.

4.1.2 Model compression

Purpose: reduce the number of parameters transmitted in each communication round,
for example through quantization and subsampling.
KJ:
structured updates: constrain the uploaded model update to a matrix structure defined in advance, so that fewer parameters are uploaded;
compression-encode the parameters before upload to reduce their size.
The more parties are involved, the better the compression effect.
Caldas S: reduce the number of parameters sent from server to client through lossy compression and federated parameter filtering, at the price of some loss of model accuracy.
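A toy sketch of two generic compression ideas named above (subsampling/sparsification and quantization); it is not the specific structured-update or lossy-compression schemes from the cited work.

```python
import numpy as np

def top_k_sparsify(update, k):
    """Keep only the k largest-magnitude entries (a simple form of subsampling)."""
    out = np.zeros_like(update)
    idx = np.argsort(np.abs(update))[-k:]
    out[idx] = update[idx]
    return out

def quantize_8bit(update):
    """Uniform 8-bit quantization: transmit int8 codes plus one scale factor."""
    scale = np.max(np.abs(update)) / 127 or 1.0
    codes = np.round(update / scale).astype(np.int8)
    return codes, scale

def dequantize(codes, scale):
    return codes.astype(np.float64) * scale

rng = np.random.default_rng(3)
update = rng.normal(size=1000)                   # a client's model update
sparse = top_k_sparsify(update, k=100)           # 90% of entries dropped
codes, scale = quantize_8bit(sparse)             # remaining values sent as int8
recovered = dequantize(codes, scale)
print(np.linalg.norm(update - recovered) / np.linalg.norm(update))
```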

4.2 Optimization from the perspective of client selection

Client devices are heterogeneous and their resources are limited.
Optimizations:
FedCS algorithm: a greedy protocol mechanism is designed so that each update round selects the clients with the highest model-iteration efficiency for aggregation, thereby improving convergence efficiency. Shortcoming: it suits relatively simple models; in complex situations its efficiency drops (a simplified selection sketch follows below).

Hybrid-FL protocol: it can handle non-IID data; through a resource-request step the server selects some clients so as to build an approximately independent and identically distributed dataset for federated training and iteration.
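A simplified greedy selection sketch in the spirit of the FedCS description: pick as many clients as fit within a round deadline, preferring clients with the shortest estimated update time. The per-client time model and the deadline are invented for illustration and differ from the published protocol.

```python
import numpy as np

def greedy_select(est_update_time, deadline):
    """Greedily add clients in order of estimated update time (shortest first)
    until the round deadline would be exceeded."""
    chosen, elapsed = [], 0.0
    for k in np.argsort(est_update_time):
        if elapsed + est_update_time[k] <= deadline:
            chosen.append(int(k))
            elapsed += est_update_time[k]
    return chosen

rng = np.random.default_rng(7)
# estimated per-client time = computation time + upload time (seconds, made up)
est_time = rng.uniform(1.0, 10.0, size=12)
print(greedy_select(est_time, deadline=25.0))
```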

4.3 Optimization from the perspective of asynchronous aggregation

In the FedAvg algorithm, aggregation is synchronized with model updates: the server starts aggregating only after receiving the parameters from all clients participating in the round.
Asynchronous aggregation can be used instead.
FedAsync algorithm: weighted aggregation is added; when the server receives a client's parameters, it weights them according to the training round (staleness) they are based on (a sketch follows below).
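A sketch of staleness-weighted asynchronous aggregation in the spirit of the FedAsync description: whenever one client update arrives, it is mixed into the global model with a weight that decays with its staleness. The decay function and mixing rule are illustrative assumptions.

```python
import numpy as np

def async_mix(w_global, w_client, t_now, t_client, alpha=0.6):
    """Mix one client's update into the global model immediately on arrival.
    The older the global version the client started from, the smaller its weight."""
    staleness = t_now - t_client
    a = alpha / (1.0 + staleness)          # illustrative staleness-decay rule
    return (1 - a) * w_global + a * w_client

w_global = np.zeros(4)
t_now = 10
# three updates arrive out of order, each tagged with the round it started from
for w_client, t_client in [(np.ones(4), 9), (2 * np.ones(4), 5), (np.ones(4), 10)]:
    w_global = async_mix(w_global, w_client, t_now, t_client)
    t_now += 1
print(w_global)
```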

Existing threats and challenges:

Characteristics that a federated learning algorithm needs to achieve:

| Characteristic | Meaning |
| --- | --- |
| Support for non-IID data | The algorithm must perform well on non-IID data (the data quality and distribution of the data holders are uncontrollable). |
| Efficient communication | The algorithm must take the data holders' system heterogeneity into account, improving communication efficiency and reducing communication cost with little or no loss of accuracy. |
| Fast convergence | Improve the convergence speed while guaranteeing that the model converges. |
| Security and privacy | Achieved through encryption and similar methods during aggregation, or reflected in the single-machine optimization process. |
| Support for complex users | The number of users is large, and their data is unbalanced or skewed. |

Possible attack models:

| Attack model | Description |
| --- | --- |
| Attacks originating from the server | According to the attack behavior, server adversaries can be divided into honest-but-curious, malicious, and mixed server adversaries. |
| Attacks originating from participants | Purpose: feed wrong model information back to the main server so that the federated model updates and iterates in a harmful direction. |
| External attacks | While the participants and the server exchange updates, external eavesdroppers on the channel may intercept the information and infer private data about the model, threatening communication security inside the federation. |
| Attacks exploiting system vulnerabilities | Data attacks: participants maliciously modify data labels or intermediate information to disrupt the federated learning process. Model update attacks: the performance of the global model is destroyed by maliciously degrading local models. |

Attack methods: poisoning attacks, adversarial attacks, model inversion attacks, etc.
Defense methods: gradient sparsification, malicious-update detection, secret sample alignment, label protection, encrypted sharing, perturbed sharing, etc.

Defensive measures can ensure both the data security of the participants and the accuracy of the joint model.

Attack methods against federated learning
Because participants only share local data indirectly, federated learning is more secure than centralized large-scale machine learning, so the participants' risk of privacy leakage is smaller.
Horizontal federated learning is vulnerable to poisoning attacks, adversarial attacks, and similar methods;
vertical federated learning is vulnerable to data poisoning attacks.
In vertical federated learning, privacy can leak not only because the participants share identifiers, but also because the shared intermediate results leak the active party's label information.
In addition, in both federated learning scenarios, an adversary can obtain user data through model inversion attacks during gradient sharing.

Poisoning attacks
A poisoning attack means that a malicious participant manipulates the model's predictions through the training set during training or retraining. In federated learning, an attacker can poison in two ways: data poisoning and model poisoning.

Data poisoning

Data poisoning means polluting the training samples, e.g., adding wrong labels or biased data to lower the data quality, thereby degrading the trained model's performance and destroying its integrity and availability.

Methods (a toy sketch of the simplest case follows this list):
During the optimization of the joint model, the adversary changes a data record's label to a target class, thereby influencing the model's predictions for that class.
In multi-party machine learning, a non-colluding malicious party establishes a specified association between a particular feature of several data records and the poisoned label, thereby influencing the model's predictions for the poisoned class.
The adversary scales up the influence of its model parameters by a given coefficient, thereby preserving the accuracy of both the model's main task and the backdoor task.
Distributed poisoning attacks: malicious participants, after coordinating with one another, modify a common wrong target for a specified class, thereby lowering the model's prediction accuracy.
A neural-network-based backdoor attack method in which the passive party poisons the intermediate results of the target class, and may even amplify the given coefficient to increase the influence of the backdoor task. ...
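As a concrete illustration of the simplest data poisoning described above (label flipping toward a target class), here is a toy sketch; the class choice and poisoned fraction are arbitrary assumptions.

```python
import numpy as np

def flip_labels(y, source_class, target_class, fraction, rng):
    """A malicious participant relabels a fraction of its source-class samples
    as the target class before local training (label-flipping poisoning)."""
    y = y.copy()
    candidates = np.where(y == source_class)[0]
    n_poison = int(fraction * len(candidates))
    chosen = rng.choice(candidates, size=n_poison, replace=False)
    y[chosen] = target_class
    return y

rng = np.random.default_rng(0)
y_local = rng.integers(0, 3, size=200)              # toy labels of one participant
y_poisoned = flip_labels(y_local, source_class=1, target_class=2,
                         fraction=0.5, rng=rng)
print((y_local != y_poisoned).sum(), "labels flipped")
```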

Model poisoning

During training, the attacker tries to produce wrong parameters and upload them to the central server, thereby influencing the direction in which the model evolves, slowing its convergence, and even corrupting its correctness, ultimately destroying its availability.

A malicious party runs gradient ascent on a target data record to increase the prediction error of the target class, so that the malicious gradient contributes to the joint model. In this situation, if the adversary observes that the locally trained gradient on the target record changes greatly, the target record belongs to that federation member, and the participant's private information is thereby leaked.

Adversarial attacks
An adversarial attack means maliciously constructing input samples that make a model output wrong results with high confidence.
The adversary (malicious party), given some background knowledge, generates high-confidence samples with wrong labels, known as adversarial examples.
...

Privacy leakage
In federated learning, the participants' indirect sharing of local data can lead to privacy leakage.
Vertical federated learning trains jointly on the samples with the same identifiers, so during sample alignment the identifiers are shared in plaintext, leaking private information.
In addition, the active party shares intermediate results containing label information with the passive party, and the local data indirectly shared as contributions to the joint model exposes the shared gradients to model inversion attacks.

Defensive measures for federated learning

1 Security defenses
Against an adversary's malicious behavior, gradient sparsification and malicious-gradient detection can be used to protect the security of the federated learning system.

Gradient sparsification

Gradient sparsification means that in every communication round each participant chooses a drop rate for its gradient to limit the gradient update.
It improves training efficiency and reduces the influence of malicious contributions, thereby effectively blocking participants' malicious behavior.

Methods:
Each participant randomly selects part of its gradient parameters and sets them to 0 — this has certain limitations.
A gradient compression method with an adaptive dropout rate: if the dropout parameter exceeds a given threshold, the weight of the corresponding gradient parameter is set to 0 — under network bottlenecks and insufficient computing resources, some important features cannot contribute to the joint model.
The eSGD algorithm: in a communication round, a participant selects the gradient parameters whose error has decreased compared with the previous round — smaller gradient parameters do not contribute to the joint model immediately, but the residual gradients accumulated over several communication rounds still influence it, which both speeds up the convergence of the joint model and reduces the influence of malicious contributions.

Malicious gradient detection

Malicious gradient detection means that the task publisher inspects the participants' updates in every communication round, removing or down-weighting the contributions of malicious parties.
It requires the participants to share the same local model structure, and it removes malicious influence by comparing the participants' contributions, so this class of methods only applies to horizontal federated learning (a simplified sketch follows the list below).

Methods:
Exploit the strong similarity among malicious users to down-weight malicious gradients while reducing the penalty on honest parties — this both reduces the influence of malicious parties and preserves model convergence.
A decentralized blockchain mechanism: the task publisher (one of the participants) evaluates the other parties' interaction history and interaction frequency to eliminate the contributions of lazy and malicious parties, so that the honest parties train the model together. By recording training time and model quality through an open and transparent mechanism, the method effectively distinguishes malicious, honest, and lazy parties while also guaranteeing model convergence.
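A simplified sketch of the similarity idea in the first method above: updates that are unusually similar to one another (as colluding malicious clients often are) get their aggregation weight reduced. The cosine-similarity rule and the weighting are illustrative assumptions, not the cited method.

```python
import numpy as np

def similarity_weights(updates):
    """Down-weight clients whose update is highly similar to some other client's
    update (a crude proxy for collusion); diverse honest updates keep high weight."""
    U = np.stack(updates)
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    cos = U @ U.T
    np.fill_diagonal(cos, -1.0)                 # ignore self-similarity
    max_sim = cos.max(axis=1)                   # similarity to the closest other client
    weights = 1.0 - np.clip(max_sim, 0.0, 1.0)  # more similar -> smaller weight
    return weights / (weights.sum() + 1e-12)

rng = np.random.default_rng(1)
honest = [rng.normal(size=10) for _ in range(4)]
colluders = [np.ones(10) * 5.0 + rng.normal(scale=0.01, size=10) for _ in range(2)]
print(np.round(similarity_weights(honest + colluders), 3))
```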

Privacy protection
Privacy protection means that information which individuals or collectives do not want outsiders to know receives due protection;
in the federated learning training process, it means that the participants' local raw data must not be leaked to the adversary.

The participants' gradient information is vulnerable to model inversion attacks that leak private information, so participants perturb or encrypt the shared gradient information so that the adversary cannot obtain the original shared gradients.
In vertical federated learning, the sharing of sample identifiers and intermediate results can leak identifiers and labels. Privacy-protection methods include secret sample alignment, label protection, encrypted sharing, and perturbed sharing.

4.2.1 Secret sample alignment
In vertical federated learning, sample alignment means identifying the data records that share the same identifiers.
It must be guaranteed that an honest-but-curious third party performs the matching of identifier fields secretly. ...

4.2.2 Label protection
The active party shares intermediate results with the passive party, which leaks the labels.
The Marvell method: perturb the gradients so that the gradient distributions of positive and negative samples become hard to distinguish.

4.2.3 Encrypted sharing
The encryption algorithm applied to the gradients must satisfy the homomorphic encryption property.

4.2.4 Perturbed sharing
Noise-adding methods: differential privacy is applied to hide local private information while still learning useful information from the global samples.
Noise is added to the gradients of every iteration (a minimal sketch follows). ...
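A minimal sketch of the perturbation idea above: clip each gradient to a norm bound and add Gaussian noise before sharing. The clipping bound and noise scale are illustrative; calibrating them to a formal (ε, δ) differential-privacy guarantee requires additional accounting not shown here.

```python
import numpy as np

def perturb_gradient(grad, clip_norm=1.0, noise_std=0.5, rng=None):
    """Clip the gradient to a norm bound, then add Gaussian noise before upload."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(scale=noise_std, size=grad.shape)

rng = np.random.default_rng(0)
true_grad = rng.normal(size=20)
shared = perturb_gradient(true_grad, rng=rng)
print(np.linalg.norm(true_grad - shared))   # the server only ever sees `shared`
```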

Open problems

(1) Participant problems:
Participants are the main members of federated learning and also its foundation.
The main problems at present are participant incentives and participant selection.
Incentive problem: attracting more participants is the key factor limiting the improvement of federated learning model performance;
a sound incentive and allocation mechanism must be established to encourage more participants to join.

Participant selection: how to identify honest-but-curious (semi-honest) participants and malicious participants, and how to choose suitable participants.
In current federated learning methods, all participants take part in federated learning without differentiation. A feasible and trustworthy algorithm for identifying honest participants needs to be developed, and a suitable screening mechanism formulated to remove the influence of malicious and lazy parties.

(2) Computing power problem
With the computing power of today's mobile devices, only algorithms with small computational loads, such as logistic regression, can run on the device side; this limits the deployment of mainstream neural networks with forward and backward propagation.

Efficient encryption algorithms need to be designed to improve training efficiency.

(3) Communication problem
During federated learning, the parties need to exchange encryption/decryption and model-related data frequently, and the coordinator usually has to wait until all participants' intermediate data have been returned before it can perform secure aggregation or other processing.

Improve the quality and capacity of the communication channels.
Information transmission is constrained by high-latency networks; asynchronous aggregation methods can improve training efficiency,
or one can start from lowering the transmission frequency and reducing the amount of information transmitted per round.
Lower the transmission frequency: reduce the number of gradient exchanges (the number of local optimization steps per global iteration can be increased appropriately).
Reduce the information per round: with fewer exchanges, apply appropriate gradient compression or quantization to reduce the bandwidth occupied by communication.

(4) Aggregation problem
Common aggregation methods: FedAvg average aggregation and FedProx heterogeneous aggregation, but both are lossy (relative to the centralized model).
The main server can aggregate the participants' information asynchronously, improving its ability to cope with participants quitting the learning process midway.

Some lossless federated learning models also exist, such as the vertical federated tree model SecureBoost.

(5) Prediction problem
In vertical federated learning, only the coordinator knows the structure of the whole federation, while each participant only knows the structure of the sub-model related to its own data features.
Therefore, during federated prediction, the coordinator and the participants must cooperate to predict the label of a new sample.
Once a participant quits the federation, the subtree structure it holds disappears with it, seriously affecting the federated prediction process.

(6) Problem of the central party waiting for aggregation
The central party must wait until all participants' model information has been returned before it starts a new round of aggregation.
It can fall into an endless wait, which seriously affects model aggregation and the efficiency of federated learning.

Waiting-for-aggregation strategies are needed that guarantee both the efficiency and the effectiveness of federated learning.

System heterogeneity problem: asynchronous communication can be adopted to improve the robustness and fault tolerance of the system.

Data heterogeneity problem: how to make the optimization algorithm more compatible with the complex data structures encountered in real federated learning use.
Meta-learning and multi-task learning.
Meta-learning: let each client's local model learn an independent but related model, achieving model personalization.

Privacy budget problem
Perturbing the gradients or intermediate results guarantees data security and reduces the computing resources required of the participants, but lowers the model's accuracy.

Communication efficiency is a clear shortcoming

The network must carry constant communication to exchange model update information, placing a heavy burden on bandwidth; communication efficiency has become the main factor limiting training speed.
Factors:
The difference between federated learning and distributed computing: the datasets come from individual end users, so the data features are non-independent and identically distributed (non-IID). (Non-IID means that, in probability theory, the random variables of the different datasets do not follow the same distribution; i.e., for different clients i and j, the data distributions satisfy P_i ≠ P_j.)
Traditional distributed framework algorithms have difficulty converging on non-IID data and require too many communication rounds.
Large numbers of local model updates and uploads create excessive communication overhead at the central server, failing to meet normal application requirements; adjacent model updates may contain many repeated updates or updates irrelevant to the global model.
Goals of communication-efficiency optimization schemes:
reduce the size of the data transmitted in each round; reduce the total number of training rounds.
Directions: optimize the federated learning framework algorithm, compress model updates, and adopt hierarchical, tiered training architectures. (To a certain extent these speed up federated model training and reduce the communication volume.)

Many problems remain hard to solve at this stage. For example, when handling non-IID data, optimization algorithms take several times longer than on IID data ... and compression algorithms can seriously degrade model accuracy.

Privacy security still has flaws

Source data stays local and only model updates (such as gradient information) are exchanged → this protects users' sensitive data.
In real environments, model inversion attacks, membership inference attacks, and model inference attacks keep emerging; both the clients' motives and the central server's trustworthiness affect privacy security.
Research shows that gradient information can leak users' private data: an attacker can indirectly infer label information and dataset membership from the gradients uploaded by clients. (There are many such examples.)

Federated learning faces threats from three directions:
malicious clients modify model updates and disrupt global model aggregation;
malicious analysts infer private information about the source data by analyzing model update information;
malicious servers attempt to obtain the clients' source data.

Combining federated learning with classical machine-learning privacy-protection techniques can provide sufficiently strong security, but it adds communication burden.
Challenge: the communication burden must be balanced against model security.

Lack of trust and incentive mechanisms

Federated learning needs to attract clients to the training process, but it lacks an efficient incentive mechanism (to guarantee model quality) and a trust mechanism for clients (to guarantee model accuracy).
Academia has combined federated learning with blockchain technology, which records model updates in a secure, highly interruption-resistant, and auditable way, providing accountability and non-repudiation for the system framework. Meanwhile, the blockchain's incentive mechanism, as an economic reward, can reward clients according to their contribution to building the model.

Research progress

Privacy-protection techniques: they can be applied in the federated learning process to guarantee the data security and privacy of every participant in the federation.
Common privacy-protection techniques include oblivious transfer (OT), garbled circuits (GC), secret sharing (SS), private set intersection (PSI), differential privacy (DP), and homomorphic encryption (HE).

Open-source federated learning frameworks:
the FATE (Federated AI Technology Enabler) framework led by WeBank, the PaddleFL (Paddle Federated Learning) framework led by Baidu, the TFF (TensorFlow Federated) framework led by Google, and the PySyft framework led by OpenMined.

| Framework | Horizontal FL | Vertical FL | Federated transfer learning | Kubernetes | Tree models | Federated feature engineering | Federated online inference | Supported privacy-protection algorithms |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FATE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Homomorphic encryption, secret sharing, RSA, Diffie-Hellman |
| PaddleFL | Yes | Yes | No | No | No | No | No | Differential privacy |
| TFF | Yes | No | No | No | No | No | No | Differential privacy |
| PySyft | Yes | No | No | No | No | No | No | Homomorphic encryption, secret sharing |

As one of the better domestic open-source federated learning frameworks at present, FATE supports horizontal federated learning, vertical federated learning, and federated transfer learning, and also provides federated feature engineering algorithms, Kubernetes containerized deployment, and federated online inference, which the other frameworks lack.
By comparison, PaddleFL only supports horizontal and vertical federated learning, while PySyft and TFF only implement horizontal federated learning. All three currently lack implementations of federated tree model algorithms such as gradient boosting decision trees (GBDT) and SecureBoost.

For communication efficiency: algorithm optimization, compression, and decentralized training.

Algorithm optimization

Algorithm optimization improves the distributed machine learning framework so that it better suits the federated learning environment of massive clients, high frequency, low capacity, and uneven data features, reducing the number of communication rounds and the volume of model update data.

FedAvg algorithm

It requires each client to execute the SGD algorithm locally several times before exchanging model updates with the central server, so a model of the same accuracy can be trained with fewer communication rounds.
There is no convergence guarantee for non-convex problems, and convergence is difficult on non-IID datasets.
Defects of the FedAvg algorithm itself:
during server-side aggregation, weights are assigned according to each client's data volume, so a client with a large amount of duplicated data can easily sway the global model;
the facts that clients only run the SGD algorithm, and run it a fixed number of times, limit the training speed to some extent.

MFL scheme

Momentum gradient descent (MGD) is used in the local model update stage of federated learning (a sketch of the local momentum update follows below).
Experiments show that, under certain conditions, this scheme significantly improves the convergence speed of model training.
The iteration-adaptive LoAdaBoost algorithm
adjusts the number of local client epochs by analyzing the cross-entropy loss of the client updates;
compared with the traditional FedAvg algorithm with a fixed number of epochs, both accuracy and convergence speed improve significantly.
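A small sketch of the momentum gradient descent used for the local update in the MFL description above; the loss function and all hyperparameters are stand-ins.

```python
import numpy as np

def local_momentum_update(w, X, y, lr=0.05, gamma=0.9, epochs=10):
    """Local update with momentum gradient descent (MGD):
    v <- gamma * v + grad;  w <- w - lr * v."""
    v = np.zeros_like(w)
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # squared loss on a linear model
        v = gamma * v + grad
        w = w - lr * v
    return w

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
w_new = local_momentum_update(np.zeros(5), X, y)
```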

CMFL algorithm

The local model updates uploaded by clients contain a great deal of redundant and irrelevant information, which heavily occupies communication bandwidth.
The algorithm requires each client to check the relevance of its local model update to the previous round's global model, measured by the percentage of gradient components whose signs agree, and to skip uploading local updates that do not reach the threshold, thereby reducing communication overhead (a sketch of the relevance check follows below).
Shortcoming: it relies on the clients executing the protocol faithfully, so the system's robustness is relatively weak.
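A sketch of the sign-agreement relevance check described for CMFL; the threshold value is an assumption.

```python
import numpy as np

def relevant(local_update, global_update, threshold=0.7):
    """Upload only if the fraction of components whose signs agree with the
    previous global update reaches the threshold."""
    agree = np.sign(local_update) == np.sign(global_update)
    return bool(agree.mean() >= threshold)

rng = np.random.default_rng(2)
g_prev = rng.normal(size=50)                        # previous global update
aligned = g_prev + rng.normal(scale=0.2, size=50)   # mostly consistent local update
noise = rng.normal(size=50)                         # irrelevant local update
print(relevant(aligned, g_prev), relevant(noise, g_prev))   # typically True, False
```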

BACombo algorithm

It uses the gossip protocol and the epsilon-greedy algorithm to probe the time-varying bandwidth between clients, making maximum use of bandwidth capacity and thereby speeding up convergence.

Compression

Gradient compression and global model compression.
Usually gradient compression affects communication efficiency more than global model compression does (because in the Internet environment uplink speeds are much slower than downlink speeds, so interaction time is concentrated in the gradient upload stage).

... (related algorithms)

Decentralized training

In federated learning, the communication topology is usually a star topology, and the central server's communication cost is too high.
A decentralized topology (clients communicate only with their neighbors) can serve as an alternative.
When running over low-bandwidth or high-latency networks, decentralized topologies have been shown to train faster than star topologies.
In a decentralized federated topology, edge servers are first set up to aggregate the local updates from client devices; the edge servers then play the role of clients and interact with the central server.

Privacy security

According to the granularity of privacy protection, the privacy security of federated learning is divided into global privacy and local privacy.

Global privacy assumes the central server is secure and trustworthy, i.e., the model updates of every communication round are visible to the central server.
Strict cryptographic protection of the intermediate iterations and of the final model is therefore important.

Local privacy assumes the central server may itself behave maliciously, so local model updates must be encrypted before being uploaded to the central server.

Typical privacy-protection technologies: differential privacy, secure multi-party computation, homomorphic encryption, etc.

Trust and incentive mechanisms

These can be provided by combining federated learning with blockchain technology.
The combination of federated learning and blockchain turns the system into a complete closed-loop learning mechanism.
On the one hand, federated learning provides a cross-domain secure sharing scheme for participants who hold private data;
on the other hand, blockchain, as the core database, provides participants with secure storage, trust management, fine-grained differentiation, and incentive rewards, encouraging users who own data to participate actively in the data federation.

Research hotspots

System heterogeneity

The computing power, communication speed, and storage capacity of the terminal devices differ, so federated learning architectures usually limit the number of terminal devices participating in training. A federated learning algorithm suited to system heterogeneity must meet three requirements:
tolerate a low client participation rate; be compatible with different hardware structures; and tolerate devices dropping out of training midway.

Statistical heterogeneity

The characteristics and volume of data may vary greatly between devices, resulting in non-IID and unbalanced data distributions.

Wireless communication

Wireless channels have limited bandwidth capacity, so an important consideration is the robustness of model updates in the presence of quantization errors. Besides communication bandwidth, complex noise and interference in wireless communication also aggravate the channel bottleneck.
Developing federated learning algorithms suited to wireless communication is therefore of great research significance.

References

[1] Zhou Chuanxin, Sun Yi, Wang Degang, Ge Huawei. A review of federated learning research [J]. Journal of Network and Information Security, 2021, 7(05): 77-92.
[2] Liang Tiankai, Zeng Bi, Chen Guang. Overview of federated learning: concepts, technologies, applications and challenges [J/OL]. Computer Applications: 1-13 [2022-07-23]. http://kns.cnki.net/kcms/detail/51.1307.TP.20211231.1727.014.html
[3] Wang Jianzong, Kong Lingwei, Huang Zhangcheng, Chen Linjie, Liu Yi, He Anxun, Xiao Jing. A review of federated learning algorithms [J]. Big Data, 2020, 6(06): 64-82.
[4] Sun Shuang, Li Xiaohui, Liu Yan, Zhang Xing. A review of research on federated learning security and privacy protection in different scenarios [J]. Computer Application Research, 2021, 38(12): 3527-3534. DOI: 10.19734/j.issn.1001-3695.2021.03.0157.
