Federated Learning

About federated learning
        Federated learning is an emerging foundational technology in artificial intelligence. It was first proposed by Google in 2016, originally to solve the problem of updating models locally on the Android phones of end users. Its design goal is to carry out efficient machine learning among multiple parties or multiple computing nodes while safeguarding information security during large-scale data exchange, protecting terminal data and personal privacy, and ensuring legal compliance. The machine learning algorithms that can be used in federated learning are not limited to neural networks; they also include other important algorithms such as random forests. Federated learning is expected to become the basis for the next generation of collaborative artificial intelligence algorithms and collaborative networks.

Federated learning system architecture
       A scenario with two data owners (companies A and B) is used here as an example to describe the system architecture of federated learning. The framework can be extended to scenarios with more than two data owners. Suppose companies A and B want to jointly train a machine learning model, and each company's business system holds data about its own users. In addition, company B holds the label data that the model needs to predict. For data privacy and security reasons, A and B cannot exchange data directly; instead, they can build the model with a federated learning system. The architecture of such a system consists of three parts, as shown in the figure.

 

       Part I: encrypted sample alignment. Because the user groups of the two companies do not completely overlap, the system uses encryption-based user sample alignment to identify the users that A and B have in common. Neither party discloses its own data, and the users that do not overlap are not exposed, so the features of the common users can be combined for modeling.

       Part II: encrypted model training. Once the common user group has been determined, its data can be used to train the machine learning model. To keep the data confidential during training, a third-party collaborator C is needed to carry out encrypted training. Taking a linear regression model as an example, the training process can be divided into the following four steps (as shown in the figure):

       Step ①: Collaborator C distributes a public key to A and B, which is used to encrypt the data that needs to be exchanged during training.

       Step ②: A and B exchange, in encrypted form, the intermediate results used to calculate the gradient.

       Step ③: A and B each perform calculations based on the encrypted gradient values, while B also calculates the loss from its label data. Both send their results to C, which calculates the total gradient from the aggregated results and decrypts it.

       Step ④: C sends the decrypted gradients back to A and B respectively, and A and B update the parameters of their models according to the gradients.

      The steps above are iterated until the loss function converges, which completes the whole training process. During both sample alignment and model training, the data of A and B stays local, and the data exchanged during training does not lead to a privacy leak. With the help of federated learning, the two parties can therefore train a model cooperatively.
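      The four steps above can be sketched as a small simulation. This is only an illustration under stated assumptions: the encryption, the public key from C, and the decryption by C are all left out, so the values that would travel encrypted appear here in plaintext. The function names, the learning rate, and the demo data are hypothetical and do not come from the original article.

```python
import random


def dot(row, weights):
    """Inner product of one sample's features with a weight vector."""
    return sum(x * w for x, w in zip(row, weights))


def train_vertical_linear_regression(x_a, x_b, y, lr=0.5, rounds=300):
    """Plaintext simulation of the A/B/C loop; no real encryption is performed."""
    n = len(y)
    w_a = [0.0] * len(x_a[0])  # parameters kept by A
    w_b = [0.0] * len(x_b[0])  # parameters kept by B
    losses = []
    for _ in range(rounds):
        # Step 2: A and B exchange intermediate results (encrypted in the real protocol).
        u_a = [dot(row, w_a) for row in x_a]  # computed by A on its own features
        u_b = [dot(row, w_b) for row in x_b]  # computed by B on its own features
        residual = [ua + ub - yi for ua, ub, yi in zip(u_a, u_b, y)]  # B holds y
        # Step 3: each party forms the gradient for its own features; B also forms the loss.
        grad_a = [sum(r * row[j] for r, row in zip(residual, x_a)) / n
                  for j in range(len(w_a))]
        grad_b = [sum(r * row[j] for r, row in zip(residual, x_b)) / n
                  for j in range(len(w_b))]
        losses.append(sum(r * r for r in residual) / (2 * n))
        # Step 4: C would decrypt and return the gradients; each party updates locally.
        w_a = [w - lr * g for w, g in zip(w_a, grad_a)]
        w_b = [w - lr * g for w, g in zip(w_b, grad_b)]
    return w_a, w_b, losses


def make_demo_data(n=50, seed=0):
    """Hypothetical aligned data: A holds one feature, B holds another plus the label."""
    rng = random.Random(seed)
    x_a = [[rng.uniform(-1, 1)] for _ in range(n)]
    x_b = [[rng.uniform(-1, 1)] for _ in range(n)]
    y = [2.0 * a[0] - 3.0 * b[0] for a, b in zip(x_a, x_b)]
    return x_a, x_b, y


x_a, x_b, y = make_demo_data()
w_a, w_b, losses = train_vertical_linear_regression(x_a, x_b, y)
print("A's weights:", w_a, "B's weights:", w_b)
print("first loss:", losses[0], "last loss:", losses[-1])
```

      Each party only ever updates the weights for its own features. In a real deployment the residual and gradients would be exchanged under an additively homomorphic scheme so that neither party sees the other's raw values.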

      Part III: incentive mechanism. A major feature of federated learning is that it addresses the question of why different institutions would join a federation to build a common model: once the model has been built, its performance is demonstrated in practical applications and recorded in a permanent data recording mechanism (such as a blockchain). Institutions that provide more data obtain a better model, and the model's performance depends on what each data provider contributes to itself and to others. The performance of these models is fed back to each institution through the federation mechanism, which continues to motivate more institutions to join the data federation. Taken together, the three parts above account for both privacy protection and modeling performance across multiple institutions, and also for a consensus-based mechanism that rewards institutions for contributing more data. Federated learning is therefore a "closed-loop" learning mechanism.
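      The article does not say how contributions are measured or how rewards are shared. As a rough sketch only, the code below assumes that each institution's contribution is already available as a number, records it in a hash-chained, append-only list (a stand-in for the "permanent data recording mechanism", not a real blockchain), and splits a reward in proportion to the recorded contributions. All names and the proportional rule are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first record


def _digest(body):
    """SHA-256 over a canonical JSON encoding of one record body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_record(ledger, party, contribution):
    """Append a contribution record that is chained to the previous record's hash."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    body = {"party": party, "contribution": contribution, "prev": prev}
    ledger.append({**body, "hash": _digest(body)})


def verify(ledger):
    """Return True only if no record has been altered or reordered."""
    prev = GENESIS
    for rec in ledger:
        body = {"party": rec["party"], "contribution": rec["contribution"], "prev": rec["prev"]}
        if rec["prev"] != prev or rec["hash"] != _digest(body):
            return False
        prev = rec["hash"]
    return True


def reward_shares(ledger):
    """Split a reward in proportion to each party's total recorded contribution."""
    totals = {}
    for rec in ledger:
        totals[rec["party"]] = totals.get(rec["party"], 0) + rec["contribution"]
    grand_total = sum(totals.values())
    return {party: value / grand_total for party, value in totals.items()}


ledger = []
append_record(ledger, "A", 3)  # hypothetical contribution scores
append_record(ledger, "B", 1)
print(verify(ledger), reward_shares(ledger))
```

      Because each record stores the hash of the one before it, changing an earlier contribution invalidates the chain, which is the property the article relies on when it mentions a permanent record.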

Advantages of federated learning
       (1) Data isolation: data is not leaked to the outside, which meets user privacy and data security requirements;

       (2) Lossless model quality: there is no negative transfer, and the federated model is ensured to perform better than the fragmented models trained independently;

       (3) Equal status of the participants, which enables fair cooperation;

       (4) Participants remain independent while exchanging information and model parameters in encrypted form, and they grow together.

Classification of federated learning
        According to how the data sets of the participants differ, federated learning is divided into horizontal federated learning, vertical federated learning, and federated transfer learning (FTL).

 

       Horizontal federated learning applies when the user features of two data sets overlap heavily but the users overlap little. The data sets are split horizontally (i.e., along the user dimension), and the part of the data in which the user features are the same but the users are not identical is taken out for training. For example, consider two banks in different regions. Their user groups come from their respective regions, so the intersection of their users is very small. Their businesses, however, are very similar, so the user features they record are the same. In this case, horizontal federated learning can be used to build a joint model. In 2016, Google proposed a joint modeling scheme for updating models on Android phones: as each user uses an Android phone, the model parameters are continually updated locally and uploaded to the Android cloud, so that the data owners, who all share the same feature dimensions, build a joint model.
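       The idea of updating parameters locally and combining them in the cloud can be illustrated with a minimal sketch. It assumes the server combines client parameters by a sample-size-weighted average (an approach often called federated averaging); the article itself does not specify the aggregation rule. The one-feature linear model, the data, and all function names are hypothetical.

```python
def local_update(weights, data, lr=0.1, epochs=5):
    """One client's local training for y ~ w * x + b, using only its own data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Server side: average client parameters, weighted by each client's sample count."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def run_fedavg(clients, rounds=200):
    """Repeat: send the global model out, train locally, average the returned parameters."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates, [len(data) for data in clients])
    return global_weights


# Two hypothetical data owners with the same feature (x) but different users.
# Both data sets follow y = 2 * x + 1; the raw (x, y) pairs never leave their owner.
clients = [
    [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]
print(run_fedavg(clients))
```

       Only the parameter pairs `(w, b)` travel between the clients and the server in this sketch. A production system would typically add secure aggregation or encryption on top, which is omitted here.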

       Vertical federated learning applies when the users of two data sets overlap heavily but the user features overlap little. The data sets are split vertically (i.e., along the feature dimension), and the part of the data in which the users are the same but the user features are not identical is taken out for training. For example, consider two different institutions: one is a bank in a certain place, and the other is an e-commerce company in the same place. Their user groups are likely to include most of the residents of that place, so the intersection of their users is large. However, the bank records users' income and expenditure behavior and credit ratings, while the e-commerce company keeps users' browsing and purchase histories, so the intersection of their user features is small. Vertical federated learning aggregates these different features in an encrypted state to enhance the capability of the model. At present, many machine learning models, including logistic regression, tree models, and neural networks, have gradually been shown to be buildable on this kind of federated system.
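       The bank and e-commerce example can be sketched as follows. The sketch assumes both parties agree on a shared salt and compare salted SHA-256 hashes of user IDs. This is a deliberately simplified stand-in for encrypted sample alignment, not a secure private set intersection protocol, and the user IDs, feature values, and function names are all hypothetical.

```python
import hashlib


def blind(user_id, shared_salt):
    """Salted hash of a user ID, so that raw IDs are not compared directly."""
    return hashlib.sha256((shared_salt + user_id).encode("utf-8")).hexdigest()


def align_samples(ids_a, ids_b, shared_salt):
    """Return the sorted user IDs present on both sides, matched via blinded IDs only."""
    blinded_a = {blind(u, shared_salt): u for u in ids_a}
    blinded_b = {blind(u, shared_salt) for u in ids_b}
    return sorted(u for h, u in blinded_a.items() if h in blinded_b)


# Hypothetical data: the same city, mostly the same users, different features.
bank = {"u1": [5000.0, 0.7], "u2": [1200.0, 0.4], "u3": [800.0, 0.9]}  # balance, credit
shop = {"u2": [14, 3], "u3": [2, 1], "u4": [30, 9]}                    # views, purchases

common = align_samples(bank.keys(), shop.keys(), "demo-salt")
rows_bank = [bank[u] for u in common]  # stays with the bank
rows_shop = [shop[u] for u in common]  # stays with the e-commerce company
print(common, rows_bank, rows_shop)
```

       After alignment, row i on both sides refers to the same user, so the two feature blocks can be used together in training without either party handing over its columns.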

        Federated transfer learning applies when both the users and the user features of two data sets overlap little. The data is not split; instead, transfer learning is used to overcome the shortage of data or labels. For example, consider two different institutions: one is a bank located in China, and the other is an e-commerce company located in the United States. Because of the geographical separation, the intersection of the two institutions' user groups is very small. At the same time, because the institutions are of different types, only a small part of their data features overlap. In this case, for federated learning to be effective, transfer learning must be introduced to deal with the small data size on one side and the small number of labeled samples, thereby improving the performance of the model.
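        The three categories above can be summarized as a rule on how much the users and the features of two data sets overlap. The sketch below is an illustration only: measuring overlap with the Jaccard ratio and using a 0.5 threshold are assumptions made here, since the article only speaks of "more" and "less" overlap. The function names and sample data are hypothetical.

```python
def overlap_ratio(a, b):
    """Jaccard overlap of two collections: |a & b| / |a | b| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_setting(users_a, users_b, features_a, features_b, threshold=0.5):
    """Map user overlap and feature overlap to one of the three categories."""
    users_high = overlap_ratio(users_a, users_b) >= threshold
    features_high = overlap_ratio(features_a, features_b) >= threshold
    if features_high and not users_high:
        return "horizontal federated learning"
    if users_high and not features_high:
        return "vertical federated learning"
    if not users_high and not features_high:
        return "federated transfer learning"
    # Both overlap heavily: a case the three categories do not describe.
    return "not covered by the three categories"


# Two banks in different regions: same features, few shared users.
print(suggest_setting({"a1", "a2", "a3"}, {"b1", "b2", "a3"},
                      {"balance", "credit"}, {"balance", "credit"}))
# A bank and an e-commerce company in the same place: shared users, different features.
print(suggest_setting({"u1", "u2", "u3"}, {"u1", "u2", "u3", "u4"},
                      {"balance", "credit"}, {"views", "purchases"}))
```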

Open-source federated learning projects
1. https://www.tensorflow.org/federated/

2. https://github.com/WeBankFinTech/FATE

 

Reference
[1] https://www.fedai.org/#/


----------------
Disclaimer: This is an original article by CSDN blogger "huts", published under the CC 4.0 BY-SA license. When reproducing it, please include the original source link and this statement.
Original link: https://blog.csdn.net/cao812755156/article/details/89598410

Origin www.cnblogs.com/tan2810/p/11772964.html