Federated Learning -- Notes

Introduction

Federated learning is an emerging foundational technology in artificial intelligence. Its design goal is to carry out efficient machine learning across multiple participants or computing nodes while ensuring information security during big-data exchange, protecting the privacy of terminal data and personal data, and complying with laws and regulations. The machine learning algorithms usable in federated learning are not limited to neural networks; they also include important algorithms such as random forests. Federated learning is expected to become the basis for the next generation of collaborative AI algorithms and collaborative networks.

The origin of federated learning

A neural network generally trains better the more data it has. So if one company has some data and another company has more, wouldn't it be better to pool the two companies' data and use it together? However, user data cannot be used and forwarded at will. Is there a way for the two companies to make use of the data without leaking their users' information?

In 2016, to address model updates on the Android system, Google proposed that neural network training could be deployed on users' mobile phones: only the trained model parameters are uploaded, not the user data itself, which protects personal data privacy to a certain extent. This is the core idea of federated learning.

System architecture of federated learning

The system architecture of federated learning is introduced here using a scenario with two data owners (enterprises A and B) as an example; the framework can be extended to scenarios with multiple data owners. Suppose enterprises A and B want to jointly train a machine learning model, and each of their business systems holds relevant data about its own users. In addition, enterprise B also holds the label data that the model needs for prediction. For reasons of data privacy protection and security, A and B cannot exchange data directly, so a federated learning system can be used to build the model. The architecture of the federated learning system consists of three parts.
[Figure: architecture of the federated learning system]

1. Encrypted sample alignment

Since the user groups of the two enterprises do not completely overlap, the system uses encryption-based user-sample alignment to confirm the users common to both parties without disclosing either side's data and without exposing the non-overlapping users, so that modeling can then combine the features of these common users.

2. Encrypted model training

Once a common user group is identified, its data can be used to train machine learning models. To keep the data confidential during training, a third-party collaborator C performs encrypted training. Taking a linear regression model as an example, the training process can be divided into the following four steps:

Step ①: Collaborator C distributes the public key to A and B to encrypt the data that needs to be exchanged during the training process.

Step ②: A and B interact in an encrypted form to calculate the intermediate results of the gradient.

Step ③: A and B each compute their encrypted gradient values, and B additionally computes the loss from its label data; both send these results to C. C aggregates the results to obtain the total gradient value and decrypts it.

Step ④: C returns the decrypted gradient to A and B respectively, and A and B update the parameters of their respective models according to the gradient.

The above steps are iterated until the loss function converges, which completes the training process. During sample alignment and model training, A's and B's data stay local, and the data interaction during training does not leak privacy. The two parties can therefore train a model cooperatively with the help of federated learning.
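
A minimal sketch of the four steps for one training sample, assuming the third-party python-paillier package (`pip install phe`), whose Paillier scheme is additively homomorphic. The variable names and the single-sample setting are illustrative assumptions, and the random masks used by real systems are omitted for brevity:

    import numpy as np
    from phe import paillier

    # Step 1: collaborator C generates a key pair and distributes the public key.
    public_key, private_key = paillier.generate_paillier_keypair()

    # Toy data: A holds features x_a; B holds features x_b plus the label y.
    x_a, w_a = np.array([1.0, 2.0]), np.array([0.1, 0.1])
    x_b, w_b, y = np.array([3.0]), np.array([0.2]), 1.5

    # Step 2: A and B exchange encrypted partial predictions; their sum minus y
    # is the encrypted residual of the linear model.
    enc_residual = public_key.encrypt(float(x_a @ w_a)) \
                 + public_key.encrypt(float(x_b @ w_b) - y)

    # Step 3: Paillier ciphertexts can be scaled by plaintexts, so each party
    # forms its encrypted gradient (residual times its own features) for C.
    enc_grad_a = [enc_residual * float(v) for v in x_a]
    enc_grad_b = [enc_residual * float(v) for v in x_b]

    # Steps 3-4: C decrypts the gradients and returns them; A and B update.
    lr = 0.01
    w_a -= lr * np.array([private_key.decrypt(g) for g in enc_grad_a])
    w_b -= lr * np.array([private_key.decrypt(g) for g in enc_grad_b])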

3. Effect incentive

A major feature of federated learning is that it addresses why different institutions would join the federation to model jointly: after the model is established, its effect is demonstrated in practical applications and recorded in a permanent data-recording mechanism (such as a blockchain). Institutions that provide more data obtain better model performance, and the model's effect depends on each data provider's contribution to itself and to others. These effects are distributed back to the institutions as feedback through the federation mechanism, and they continue to motivate more institutions to join the data federation. The implementation of the above three parts not only considers privacy protection and the effect of joint modeling among multiple institutions, but also includes a consensus mechanism to reward institutions that contribute more data. Federated learning is therefore a "closed-loop" learning mechanism.

Advantages of federated learning

  • Data isolation: data does not leave its owner, meeting the needs of user privacy protection and data security;
  • Model quality is not compromised: there is no negative transfer, and the federated model performs better than independent models trained on the split data;
  • Participants have equal status and can cooperate fairly;
  • All parties keep their independence while exchanging information and model parameters in encrypted form, and grow together.

Federated Learning (aka Federated Machine Learning) can be divided into three categories:

  • Horizontal Federated Learning
  • Vertical Federated Learning
  • Federated Transfer Learning

[Figure: the three categories of federated learning]

Horizontal Federated Learning

In a data matrix (which can also be a table, for example an Excel sheet), a horizontal row represents a training sample and a vertical column represents a data feature (or the label). Since there may be many records, it is usually convenient to view data (for example, case data) as a table in which each row is one training sample.

Horizontal federated learning suits situations where the participants' data features overlap heavily but their sample IDs overlap little.

For example, consider two banks in different regions: their user groups come from their respective regions and barely intersect, but their businesses are similar, so the user features they record are the same. In this case we can use horizontal federated learning to build a joint model.

The word "horizontal" comes from the "horizontal partitioning (aka sharding)" of data. As shown in Figure 1, federated learning is performed by combining multiple rows of samples with the same characteristics from multiple participants, that is, the training data of each participant is divided horizontally, which is called horizontal federated learning. Figure 2 shows an example of a horizontally partitioned table. Horizontal federation increases the total number of training samples.

Horizontal federated learning is also called feature-aligned federated learning, because the participants' data features are aligned, as shown in Figure 3. The name "feature-aligned federated learning" is a bit long, so "horizontal federated learning" is preferred.

The learning process of horizontal federated learning (a code sketch follows the steps):

  1. Participants compute training gradients locally, mask the gradient updates using encryption, differential privacy, or secret-sharing techniques, and send the masked results to the server;
  2. The server securely aggregates the participants' gradients without learning information about any individual participant;
  3. The server sends the aggregated result back to the participants;
  4. Each participant decrypts the aggregated gradient and uses it to update its own model.
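
Below is a minimal sketch of one round of this process for a linear least-squares model; the encryption / secure-aggregation step is reduced to a comment, and all names and data are illustrative:

    import numpy as np

    def local_gradient(w, X, y):
        # step 1: each participant computes the gradient on its own data
        # (in a real system this update would be encrypted or masked)
        return X.T @ (X @ w - y) / len(y)

    def horizontal_fl_round(w, clients, lr=0.1):
        grads = [local_gradient(w, X, y) for X, y in clients]
        avg_grad = np.mean(grads, axis=0)   # step 2: server-side aggregation
        return w - lr * avg_grad            # steps 3-4: broadcast and update

    rng = np.random.default_rng(0)
    clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
    w = np.zeros(3)
    for _ in range(100):
        w = horizontal_fl_round(w, clients)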

Vertical Federated Learning

Vertical federated learning suits situations where the participants' training-sample IDs overlap heavily but their data features overlap little.

For example, consider two different institutions in the same place: one is a bank and the other an e-commerce company. Their user groups likely include most residents of the area, so the user intersection is large. However, the bank records users' income, expenditures, and credit ratings, while the e-commerce company keeps users' browsing and purchase histories, so their user features overlap very little.

The word "vertical" comes from the "vertical partitioning" of data. As shown in Figure 4, federated learning is performed by combining different data features of the common samples of multiple participants, that is, the training data of each participant is divided vertically, which is called vertical federated learning. Figure 5 shows an example of a vertically partitioned table. Vertical federated learning needs to do sample alignment first, that is, to find out the common samples owned by the participants, which is also called "entity resolution (aka entity alignment)". It only makes sense to combine the different features of the common samples of multiple participants for longitudinal federated learning. Vertical federation increases the feature dimension of training samples.

Vertical federated learning is also called sample-aligned federated learning, because the participants' training samples are aligned, as shown in Figure 6. The name "sample-aligned federated learning" is a bit long, so "vertical federated learning" is preferred.

The learning process of vertical federated learning:

Step 1: Encrypted sample alignment by the third party C. This is done at the system level, so non-overlapping users are not exposed at the level the enterprises can see.

Step 2: Encrypted model training on the aligned samples (a small sketch of the masking follows the steps):

  1. Collaborator C creates an encryption key pair and sends the public key to A and B;
  2. A and B each compute the intermediate results for their own features, then exchange them in encrypted form to obtain their respective gradients and losses;
  3. A and B each compute their encrypted gradient, add a random mask, and send it to C; B also computes the encrypted loss and sends it to C;
  4. C decrypts the gradients and loss and sends them back to A and B; A and B remove the masks and update their models.
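
A small sketch of the masking in steps 3 and 4, again assuming the third-party `phe` package; a single scalar gradient entry stands in for the full vector:

    import random
    from phe import paillier

    public_key, private_key = paillier.generate_paillier_keypair()

    grad_a = 0.42                                   # A's true gradient entry
    mask = random.random()                          # random mask known only to A
    enc_masked = public_key.encrypt(grad_a) + mask  # step 3: encrypt, add mask
    masked = private_key.decrypt(enc_masked)        # C sees only grad_a + mask
    recovered = masked - mask                       # step 4: A removes the mask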

Federated Transfer Learning

Federated transfer learning suits situations where both the participants' training-sample IDs and their data features overlap little.

Here we do not partition the data; instead, we use transfer learning to overcome insufficient data or labels. This approach is called federated transfer learning.

For example, consider two different institutions: a bank in China and an e-commerce company in the United States. Restricted by geography, the user groups of the two institutions intersect very little, and because the institutions are of different types, only a small part of their data features overlap. In this case, effective federated learning requires introducing transfer learning to address the problems of small data and few labeled samples on each side, and thereby improve the model's effect.

Algorithm introduction:

[Figure: distributed training with a server and multiple worker nodes]
Here the data computation is performed by the worker nodes. The server sends out parameters so that every worker node starts from the same initial parameters; each worker node then computes gradients on its own data and sends them to the server, which performs the gradient-descent update. Per round, the amount of data transmitted is on the order of the number of model parameters.
Work done by a worker node:

  • receive parameters from the server;
  • use the parameters and local data to compute gradients;
  • send the gradients to the server.

Work done by the server (a code sketch of both roles follows the list):

  • receive gradients from every worker node;
  • compute sum(g1, g2, g3, ……, gm);
  • update the model parameters directly by gradient descent;
  • send the new parameters to every worker node.
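
A minimal sketch of the two roles for a linear least-squares model (all names and data are illustrative):

    import numpy as np

    def worker_step(params, X, y):
        # worker: receive parameters, compute the local gradient, send it back
        return X.T @ (X @ params - y) / len(y)

    def server_step(params, gradients, lr=0.1):
        # server: sum the workers' gradients and take one gradient-descent step
        return params - lr * np.sum(gradients, axis=0)

    rng = np.random.default_rng(1)
    workers = [(rng.normal(size=(10, 3)), rng.normal(size=10)) for _ in range(5)]
    params = np.zeros(3)
    for _ in range(100):
        grads = [worker_step(params, X, y) for X, y in workers]
        params = server_step(params, grads)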

Differences between federated learning and traditional distributed learning:

(1) Users have absolute control over their own devices and data, and may stop computing or communicating at any time. In traditional distributed learning, the worker nodes are completely controlled by the server.
(2) The devices participating in federated learning are often unstable and have different computing power.
(3) The communication cost of federated learning is high and the communication volume is large, so the number of communication rounds needs to be reduced.
(4) The data in federated learning is not independent and identically distributed (IID), because each user's data is different; this makes algorithm design harder.
(5) The node load in federated learning is unbalanced: each user holds a different amount of data, so computation times differ, weights are hard to assign, and modeling is complicated.

Related research directions:

(1) Reducing the number of communication rounds: the core idea is to have worker nodes compute more and communicate less.

Here is an algorithm that differs somewhat from the one above, called the federated averaging (FedAvg) algorithm.

Work done by a worker node:

  • receive parameters from the server;
  • use the parameters and local data to compute gradients;
  • update the parameters locally, looping over these two steps several times;
  • send the new parameters to the server.

Work done by the server (a FedAvg sketch follows the list):

  • receive new parameters from every worker node;
  • compute W = sum(p1, p2, p3, ……, pm);
  • set parameters = W / m (this can be a weighted average or a direct average);
  • send the parameters to every worker node.
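
A minimal FedAvg sketch under the same illustrative setup: each worker takes several local gradient steps before communicating, and the server directly averages the returned parameters:

    import numpy as np

    def local_train(params, X, y, lr=0.05, local_steps=10):
        w = params.copy()
        for _ in range(local_steps):        # compute more, communicate less
            w -= lr * X.T @ (X @ w - y) / len(y)
        return w

    def fedavg_round(params, clients):
        new_params = [local_train(params, X, y) for X, y in clients]
        return np.mean(new_params, axis=0)  # W / m: the direct average

    rng = np.random.default_rng(2)
    clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
    params = np.zeros(3)
    for _ in range(20):                     # far fewer communication rounds
        params = fedavg_round(params, clients)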

**Advantages:** For the same number of communication rounds, FedAvg converges faster than the original algorithm (distributed gradient descent).
**Disadvantages:** For the same amount of computation on the mobile devices, FedAvg converges more slowly than the original distributed gradient descent.
**Summary:** FedAvg trades extra computation on the worker nodes for less communication. Since communication in federated learning is expensive while computation is cheap, this trade-off pays off.

(2) Privacy protection

Although the data itself never leaves the phone, we still transmit gradients and parameters. Does that really protect the data?
Let's take a brief look at the gradient-descent step for a linear model:

For a single sample (x_i, y_i) with prediction ŷ_i = x_i · w:

loss = (1/2) (ŷ_i - y_i)^2
gradient = d loss / d w = (x_i · w - y_i) x_i

So the gradient is just a rescaled copy of x_i: essentially all of x_i's information is contained in it, and the original data can be reconstructed from the gradient.
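
A toy demonstration (with invented numbers) that the single-sample gradient is the private input rescaled, so the server could recover x up to a scale factor:

    import numpy as np

    x = np.array([0.3, -1.2, 0.7])   # a user's private training example
    w = np.array([0.5, 0.5, 0.5])
    y = 1.0

    g = (x @ w - y) * x              # the gradient the device would upload
    # g is parallel to x: normalising recovers x's direction exactly.
    print(g / np.linalg.norm(g))     # equals -x / ||x|| here (residual < 0)
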
[Figure: recovering a user's original data from the transmitted gradients]

So how can privacy be protected?

  • Adding noise is not a good solution: too little noise gives no protection, while too much destroys the data and lowers test accuracy (a small sketch follows);
  • there is still no better general solution.
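
A small sketch of this noise trade-off, in the spirit of differential privacy (the clipping bound and noise scale are illustrative choices, not recommendations):

    import numpy as np

    rng = np.random.default_rng(0)

    def noisy_gradient(g, clip=1.0, sigma=0.5):
        g = g * min(1.0, clip / np.linalg.norm(g))  # bound each user's influence
        return g + rng.normal(scale=sigma * clip, size=g.shape)
    # small sigma: little protection; large sigma: lower test accuracy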

(3) Enhancing the robustness of federated learning (making the model resistant to Byzantine faults and malicious attacks).

Byzantine faults:
A traitor appears among the worker nodes: its data and labels may have been tampered with, and its abnormal updates can wreck the entire neural network.
attack 1: data poisoning attack, i.e. tampering with the training data;
attack 2: model poisoning attack, aimed at distributed learning; the direct approach is to train on wrong samples and labels and upload the resulting update.
These attacks can slow model convergence, decrease accuracy, or even leave a backdoor in the model.

defense 1: the server checks validation accuracy, i.e. it evaluates each uploaded update on a test set to judge whether it is good. (This is not a reliable test, because the data in federated learning is not IID and the server is not allowed to see user data.)
defense 2: the server checks gradient statistics. (If the data were IID, the gradients from different nodes would be similar, so a node whose gradient deviates too much from the others could be considered a defector. But since the data is not IID, this is not particularly reliable either.)
defense 3: Byzantine-tolerant aggregation: use a more robust method to combine the gradients (a minimal sketch follows).
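
A minimal sketch of defense 3, using a coordinate-wise median (one common Byzantine-tolerant rule) so that a single extreme gradient cannot drag the update arbitrarily far:

    import numpy as np

    def robust_aggregate(grads):
        # coordinate-wise median instead of a mean
        return np.median(np.stack(grads), axis=0)

    honest = [np.array([0.9, 1.1]), np.array([1.0, 1.0]), np.array([1.1, 0.9])]
    poisoned = honest + [np.array([100.0, -100.0])]  # one Byzantine worker
    print(robust_aggregate(poisoned))                # stays near [1.0, 1.0]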

Summary

The name of horizontal federated learning comes from the "horizontal partitioning" of the training data, that is, the row-wise (horizontal) partitioning of a data matrix or table. Data in different rows have the same data features; in other words, the data features are aligned.

The name of vertical federated learning comes from the "vertical partitioning" of the training data, that is, the column-wise (vertical) partitioning of a data matrix or table. Data in different columns have the same sample IDs; in other words, the training samples are aligned.

Origin: blog.csdn.net/qq_41318914/article/details/127718046