Exploring the Prospects of Federated Learning

Two days ago I listened to a Federated Learning lecture held by CCF YOCSEF. It was very productive, so I'm writing this post to record it.

Overall impression: Federated Learning is a new area that is not yet very mature. There is still a great deal to do on the systems / networking / security side, and even on the AI / RL / HCI side something could come out of it.

  

Part 1: Challenges and prospects of edge-computing-based federated learning

This is a collaboration between Prof. Guo Song of PolyU and Alibaba. Federated learning essentially takes the distributed ML that used to be done in the cloud and moves it onto edge devices. With the growing compute power of edge devices, Prof. Guo is overall very optimistic about this field.

Prof. Guo described some of the problems they are working on:

1. Heterogeneity

Anyone who has read Google's paper "Towards Federated Learning at Scale: System Design" knows that FL requires a multi-node synchronization (model aggregation) step, and getting many nodes to cooperate is quite difficult, mainly in the following two respects:

  • Communication heterogeneity: different nodes have limited and heterogeneous bandwidth (2G / 4G / 5G / WiFi / ...).
  • Compute heterogeneity: different nodes have very different compute power.

Because of this heterogeneity, the slowest node stretches out the whole training run and drags down overall efficiency (everyone can only wait for it to finish before the sync completes). One solution is to overlap computation and communication (i.e., communicate while computing). Another is to dynamically adjust each node's batch size according to its compute power and network conditions, as sketched below.
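As a rough illustration of the batch-size idea (all function names and numbers here are my own, not from the talk), one could size each node's batch so that every node finishes one local step in roughly the same wall-clock time:

```python
# Minimal sketch: choose per-node batch sizes so every node finishes one local
# step in roughly the same wall-clock time. All names and numbers are illustrative.

def assign_batch_sizes(nodes, target_step_seconds=1.0, min_bs=8, max_bs=512):
    """nodes: list of dicts with measured samples_per_sec (compute) and
    upload_mbps (network). Returns {node_id: batch_size}."""
    sizes = {}
    for n in nodes:
        # Samples the node can process within the target step time.
        compute_budget = n["samples_per_sec"] * target_step_seconds
        # Crude penalty for weak uplinks, which overlap less well with compute.
        link_factor = min(1.0, n["upload_mbps"] / 10.0)
        bs = int(compute_budget * link_factor)
        sizes[n["id"]] = max(min_bs, min(max_bs, bs))
    return sizes

nodes = [
    {"id": "phone-2g", "samples_per_sec": 40, "upload_mbps": 0.2},
    {"id": "phone-5g", "samples_per_sec": 300, "upload_mbps": 50.0},
    {"id": "laptop-wifi", "samples_per_sec": 900, "upload_mbps": 20.0},
]
print(assign_batch_sizes(nodes))   # slow nodes end up with smaller batches
```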

It is worth mentioning that this overlapping idea is also used in PipeDream (SOSP '19).

2. Statistical heterogeneity

The data on different nodes is unevenly distributed over different features (Non-IID), and the amount and quality of data vary a lot across users. This lengthens training time and also reduces accuracy. So far there is no theoretical analysis that addresses this problem.

Data quality here is reflected in: how consistent the data is with the overall distribution, and roughly how accurate the labels are. A simple way to measure it is to see how much a piece of data can reduce the loss.

Some solutions: for nodes that deviate a lot from the overall distribution, reduce their contribution to the overall model (again by controlling the batch size, among other things). One piece of work uses reinforcement learning to dynamically adjust each user's batch size. A toy sketch of the down-weighting idea follows.
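This sketch is purely my own illustration, not the reinforcement-learning work mentioned above: clients whose label distribution diverges more from the global one get a smaller weight in aggregation.

```python
# Toy sketch: down-weight clients whose label distribution diverges from the
# global one when averaging their updates. Purely illustrative.
import numpy as np

def divergence_weights(client_label_hists, temperature=1.0):
    """client_label_hists: per-client label histograms (counts).
    Returns aggregation weights that shrink with KL(client || global)."""
    hists = [np.asarray(h, dtype=float) + 1e-8 for h in client_label_hists]
    global_dist = sum(hists)
    global_dist = global_dist / global_dist.sum()
    weights = []
    for h in hists:
        p = h / h.sum()
        kl = float(np.sum(p * np.log(p / global_dist)))
        weights.append(np.exp(-kl / temperature))
    w = np.asarray(weights)
    return w / w.sum()

def weighted_average(updates, weights):
    """updates: list of 1-D parameter vectors; weights: matching 1-D array."""
    return sum(w * u for w, u in zip(weights, updates))

hists = [[50, 50], [95, 5], [48, 52]]               # the second client is highly skewed
updates = [np.ones(3), 3 * np.ones(3), np.ones(3)]
w = divergence_weights(hists)
print(w, weighted_average(updates, w))              # skewed client gets less say
```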

3. Security and reliability

  • Malicious node attacks: for example, a malicious node fabricates data (e.g., adversarial samples), making the model unreliable.
  • Privacy issues: during training, nodes and the server need to exchange gradients, but from this exchange the original data can potentially be reconstructed.

For the first problem, in conventional distributed ML the server can detect malicious gradients and simply kick the offending node out. In the edge-device setting, however, there is an extra complication: to reduce communication overhead, the model is sometimes quantized or similar tricks are used so that only the more important parameters are transmitted (reduced dimensionality). In that setting, detecting malicious behaviour becomes harder. One solution is to sparsify the gradient differences; a generic sparsification sketch is below.
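The exact scheme was not detailed in the talk; below is a generic top-k gradient sparsification sketch with an error-feedback residual, which is a common way such compression is done.

```python
# Generic top-k gradient sparsification with an error-feedback residual (not the
# exact scheme from the talk): only the k largest-magnitude coordinates are
# uploaded; the rest are remembered locally and folded into the next round.
import numpy as np

def sparsify_topk(grad, k, residual):
    """grad, residual: 1-D arrays. Returns (indices, values, new_residual)."""
    full = grad + residual                      # fold in what was skipped before
    idx = np.argsort(np.abs(full))[-k:]         # k largest-magnitude entries
    values = full[idx]
    new_residual = full.copy()
    new_residual[idx] = 0.0                     # keep the untransmitted remainder
    return idx, values, new_residual

grad = np.array([0.01, -0.8, 0.05, 1.2, -0.02])
residual = np.zeros_like(grad)
idx, vals, residual = sparsify_topk(grad, k=2, residual=residual)
print(idx, vals)                                # only 2 of 5 coordinates uploaded
```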

Finally, Prof. Guo looked ahead at a few future topics:

1. Model compression and placement: how to fit large models onto resource-constrained edge devices.

Usable methods here include: 1) common edge ML + architecture approaches, i.e., model quantization and pruning; 2) adaptive model splitting matched to the hardware (hardware architecture, degree of parallelism, energy consumption, differences in compute speed); 3) transfer learning and model reuse. Metrics to focus on include accuracy, speed, and energy consumption. A small numpy sketch of quantization and pruning follows.
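The sketch below shows the two most common compression tricks, magnitude pruning and uniform int8 quantization, in plain numpy; it is illustrative only, and real deployments would use framework-provided versions.

```python
# Minimal numpy sketch of magnitude pruning + uniform int8 quantization for a
# weight tensor; real frameworks (PyTorch, TFLite) ship tuned versions of both.
import numpy as np

def prune_by_magnitude(w, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

def quantize_int8(w):
    """Symmetric uniform quantization to int8; returns (q, scale)."""
    scale = np.max(np.abs(w)) / 127.0 + 1e-12
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(4, 4).astype(np.float32)
w_pruned = prune_by_magnitude(w, sparsity=0.5)
q, scale = quantize_int8(w_pruned)
w_restored = q.astype(np.float32) * scale       # what the edge device computes with
print(np.abs(w_restored - w_pruned).max())      # quantization error stays small
```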

2. Communication optimization. It is worth mentioning that the aggregation structure can be organized hierarchically into multiple tiers, with each tier synchronizing separately. This idea is a bit like the multi-tier designs often mentioned in storage systems; a toy two-tier sketch is below.
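Here is a toy two-tier aggregation schedule of my own (the tier structure and period are made up for illustration): devices average into their edge server every round, while edge servers average into the cloud only occasionally.

```python
# Toy two-tier aggregation schedule: devices average into their edge server
# every round, edge servers average into the cloud only every `cloud_period`
# rounds. Local training between rounds is omitted to keep the sketch short.
import numpy as np

def average(models):
    return sum(models) / len(models)

def hierarchical_rounds(device_models, cloud_period=5, rounds=10):
    """device_models: dict {edge_id: [device parameter vectors]}."""
    cloud_model = None
    for r in range(1, rounds + 1):
        # Tier 1: cheap, frequent device -> edge sync (LAN / same base station).
        edge_models = {e: average(ms) for e, ms in device_models.items()}
        # Tier 2: expensive, infrequent edge -> cloud sync (WAN).
        if r % cloud_period == 0 or cloud_model is None:
            cloud_model = average(list(edge_models.values()))
    return cloud_model

devices = {"edge-A": [np.ones(3), 2 * np.ones(3)], "edge-B": [4 * np.ones(3)]}
print(hierarchical_rounds(devices))   # [2.75 2.75 2.75]
```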

3. Secure computation. I don't know much about this, so I'll skip it...

4. Incentive mechanisms. The goal is to make sure that each user's contribution / data quality and the benefits that user receives are fair (i.e., to compute each user's contribution), a bit in the spirit of BitTorrent downloading. The Non-IID issue mentioned earlier also comes in here.

  

Part 2: Deployment scenarios for federated learning in industry [0:53:00]

This part consisted of three presentations from industry.

An industrial-grade FL system needs to focus on two topics: 1) Distributed ML, which is by now relatively well understood. 2) Security of parameter exchange. Model parameters must be exchanged through secure computation so that no parameter information leaks (industry generally achieves this with a dedicated port / VPN plus homomorphic encryption). Furthermore, it must not be possible to reverse-engineer the original data from the results; this generally involves a combination of cryptographic building blocks.
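As one self-contained illustration of the secure-exchange goal, here is a toy secure-aggregation scheme based on pairwise additive masks. This is a different technique from the VPN-plus-homomorphic-encryption setup described above; it is only meant to show how a server can learn the sum of the updates without seeing any individual one.

```python
# Toy secure aggregation with pairwise additive masks: every pair of clients
# shares a random mask that one adds and the other subtracts, so the server
# only ever sees masked updates, yet the masks cancel exactly in the sum.
import numpy as np

def pairwise_masks(client_ids, dim, seed=0):
    rng = np.random.default_rng(seed)
    masks = {cid: np.zeros(dim) for cid in client_ids}
    for a in range(len(client_ids)):
        for b in range(a + 1, len(client_ids)):
            shared = rng.normal(size=dim)   # in practice derived from a shared key
            masks[client_ids[a]] += shared
            masks[client_ids[b]] -= shared
    return masks

clients = ["u1", "u2", "u3"]
updates = {c: np.random.randn(4) for c in clients}
masks = pairwise_masks(clients, dim=4)

uploaded = {c: updates[c] + masks[c] for c in clients}   # what the server receives
server_sum = sum(uploaded.values())                      # masks cancel pairwise
true_sum = sum(updates.values())
print(np.allclose(server_sum, true_sum))                 # True
```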

   

Part 3: Privacy-preserving federated machine learning [2:09:00]

This part involves some cryptography and differential privacy content.

Although in FL the raw data never leaves the device, there are still some security risks:

  • user1's and user2's local models differ somewhat; from these differences some features of both parties' original data can be deduced.
  • Adversarial samples.

FL security requires a trade-off among three aspects: privacy / accuracy / efficiency. In industry, the balance is set according to the concrete scenario.

  

Part 4: Incentive mechanisms for federated learning [2:34:00]

This topic was mentioned earlier; Prof. Han from NTU has studied it in more depth.

1. Evaluating each node's contribution (influence evaluation)

The first scenario is horizontal federated learning, meaning that all participating nodes have the same background / feature space (for example, they are all hospitals); here the focus is on differences in data quality across nodes. One method: the server holds a relatively high-quality dataset as a benchmark. When the client nodes send in their local models for synchronization, the server first runs each local model on the benchmark to obtain B[i]. After aggregation, the global model is sent back to the clients, and each client runs the new model on its own local data to obtain L[i]. The cross entropies B[i] and L[i] can then be used to assess that client's data quality; a sketch of one reading of this procedure follows.
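The precise definitions of B[i] and L[i] were not spelled out, so in this sketch they are taken to be cross-entropy losses, and the model objects are hypothetical callables returning class probabilities.

```python
# Sketch of one reading of the benchmark-based check: B[i] is the cross-entropy
# loss of client i's local model on the server's benchmark set, L[i] is the loss
# of the aggregated global model on client i's local data; a large gap between
# the two flags a client whose data quality is likely poor.
import numpy as np

def ce_loss(probs, labels, eps=1e-12):
    """Mean cross entropy of predicted class probabilities vs integer labels."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))

def quality_scores(local_models, global_model, bench_x, bench_y, client_data):
    """client_data: {client_id: (x, y)}. Returns {client_id: (B, L)}."""
    scores = {}
    for cid, model in local_models.items():
        B = ce_loss(model(bench_x), bench_y)        # local model on benchmark
        x, y = client_data[cid]
        L = ce_loss(global_model(x), y)             # global model on local data
        scores[cid] = (B, L)
    return scores

# Tiny demo with a dummy 2-class model that always predicts 50/50.
uniform = lambda x: np.full((len(x), 2), 0.5)
bench_x, bench_y = np.zeros((4, 3)), np.array([0, 1, 0, 1])
data = {"u1": (np.zeros((3, 3)), np.array([1, 1, 0]))}
print(quality_scores({"u1": uniform}, uniform, bench_x, bench_y, data))
```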

The second scenario is vertical federated learning, where the participating nodes differ a lot (their features are not the same, and some may simply have no labels), but they still want to train a model together. For example, using data held by different parties to learn a user's credit score. One piece of work uses Sparse Group Lasso to jointly assess the importance of each participant and of each feature (i.e., which features are more useful); a simplified group-lasso sketch is below.
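For illustration only, here is a fit with the plain group-lasso penalty (the Sparse Group Lasso used in the work above adds an extra per-feature L1 term). Each group holds the features contributed by one party, and groups driven to zero are judged unimportant.

```python
# Illustrative group-lasso fit via proximal gradient descent: whole feature
# groups (one per party) are shrunk to zero when they are not useful.
import numpy as np

def group_lasso(X, y, groups, lam=0.1, lr=0.01, iters=2000):
    """groups: list of index arrays, one per party. Returns the coefficients."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n                 # gradient of 0.5/n*||y - Xw||^2
        w = w - lr * grad
        for g in groups:                             # block soft-thresholding prox
            norm = np.linalg.norm(w[g])
            shrink = max(0.0, 1.0 - lr * lam * np.sqrt(len(g)) / (norm + 1e-12))
            w[g] *= shrink
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 0.0, 0.0])   # party 2's features are useless
y = X @ true_w + 0.1 * rng.normal(size=200)
groups = [np.arange(0, 3), np.arange(3, 6)]          # one feature group per party
print(np.round(group_lasso(X, y, groups, lam=0.2), 2))  # second group driven to ~0
```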

Another problem is how to use incentive mechanisms to keep malicious nodes in check; you can refer to this survey.

Another problem is removing the influence of participation order on the contribution calculation. For example, device A's data quality is not great, but it joins early, so its improvement to the global model looks large (say 40% -> 80%); device B joins later with better data, but by then the global model has little room left to improve (say 80% -> 85%), so A's and B's contributions could be misjudged. The solution is Data Shapley, a method that shuffles the order in which the different nodes are added and recomputes the contributions under each ordering, but its complexity is too high. One refinement is to use a blockchain to assist the computation. A Monte Carlo sketch of the order-shuffling idea follows.
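In this sketch, each client's marginal contribution is averaged over many random join orders; the `utility` function is a made-up stand-in for the accuracy a model would reach when trained with a given subset of clients, reusing the A/B numbers from the story above.

```python
# Monte Carlo sketch of the Data-Shapley-style idea: average a client's marginal
# contribution over many random join orders, so joining early or late no longer
# distorts its credit. utility(subset) is a hypothetical stand-in for accuracy.
import random

def shapley_by_permutation(clients, utility, n_perms=200, seed=0):
    rng = random.Random(seed)
    contrib = {c: 0.0 for c in clients}
    for _ in range(n_perms):
        order = clients[:]
        rng.shuffle(order)
        chosen, prev = set(), utility(frozenset())
        for c in order:
            chosen.add(c)
            cur = utility(frozenset(chosen))
            contrib[c] += cur - prev          # marginal gain at this position
            prev = cur
    return {c: v / n_perms for c, v in contrib.items()}

# Toy utility following the A/B story above: A alone lifts accuracy to 0.80,
# B alone to 0.85, both together reach 0.88.
acc = {frozenset(): 0.40, frozenset({"A"}): 0.80,
       frozenset({"B"}): 0.85, frozenset({"A", "B"}): 0.88}
print(shapley_by_permutation(["A", "B"], lambda s: acc[s]))
# B gets more credit than A even though, in the story, it joined later.
```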

2. How to design fair and interpretable benefit-distribution schemes


Topic 1: Can existing privacy-enhancing technologies (homomorphic encryption, secure multi-party computation, differential privacy) directly address federated learning's new requirements for user data privacy? [3:00:00]

Topic 2: Can existing metrics (accuracy, compute, storage, communication) be effectively balanced against federated learning's new privacy requirements? [3:32:00]

 
