A Survey of Heterogeneous Federated Learning: Recent Advances and Research Challenges


Author: Ye Mang (Source: Zhihu, Authorized) | Editor: CVer Official Account

https://zhuanlan.zhihu.com/p/652910673


Paper title: Heterogeneous Federated Learning: State-of-the-art and Research Challenges

Authors: Ye Mang (Wuhan University), Fang Xiuwen (Wuhan University), Du Bo (Wuhan University), Ruan Bangzhi (Hong Kong Baptist University), Tao Dacheng (The University of Sydney)

Published in: ACM Computing Surveys

Paper address: arxiv.org/abs/2307.10616

Project address: https://github.com/marswhu/HFL_Survey

1. Abstract

Federated Learning (FL) has received increasing attention due to its potential in large-scale industrial applications. Existing federated learning work mainly focuses on the model-homogeneous setting. However, practical federated learning usually faces heterogeneity in data distributions, model architectures, network environments, and hardware devices across participating clients. Heterogeneous Federated Learning (HFL) is more challenging, and its solutions are diverse and complex, so a systematic review of its research challenges and recent progress is warranted. This paper first summarizes the research challenges of HFL from five aspects: statistical heterogeneity, model heterogeneity, communication heterogeneity, device heterogeneity, and additional challenges. It then reviews recent advances in HFL, organizes existing methods under a new taxonomy, and analyzes their strengths and weaknesses in depth. Finally, several important future research directions for HFL are discussed in order to promote the further development of this field.

2. Introduction

Federated learning is a machine learning paradigm in which models are trained collaboratively while keeping data decentralized and preserving privacy. Existing federated learning work mainly targets the model-homogeneous setting and has achieved great success, but it relies heavily on the assumption that all participants share the same network structure and have similar data distributions. In practical large-scale scenarios, however, data distributions, model structures, communication networks, and edge devices may differ substantially, which challenges federated collaboration. Federated learning under these conditions is called heterogeneous federated learning, and such heterogeneity can be divided into four categories according to the federated learning process: statistical heterogeneity, model heterogeneity, communication heterogeneity, and device heterogeneity.


Figure 1 Heterogeneous federated learning

This review delves into statistical heterogeneity, model heterogeneity, and heterogeneity in federated communication, highlighting the importance of privacy protection and of storage and computing capacity in heterogeneous federated learning. The article consists of three main parts:

* First, a systematic summary of the research challenges is given, covering five aspects: statistical heterogeneity, model heterogeneity, communication heterogeneity, device heterogeneity, and additional challenges.

* Second, the current state-of-the-art methods are reviewed under a new taxonomy that divides existing heterogeneous federated learning methods into three levels: data-level, model-level, and server-level; the advantages and limitations of these methods are discussed in depth.

* Finally, an in-depth outlook on open issues and future directions is provided.


Figure 2 Thesis structure

3. Discussion on Heterogeneous Federated Learning Research Challenges

3.1 Statistical Heterogeneity

Statistical heterogeneity refers to inconsistent data distributions across clients in federated learning: local data are not drawn from the same distribution, i.e., they are non-independent and identically distributed (Non-IID). To analyze statistical heterogeneity, we distinguish categories of Non-IID data through four skew patterns: label skew, feature skew, quality skew, and quantity skew:


Figure 3 Statistical heterogeneity

3.2 Model Heterogeneity

In practical applications, due to individual needs or commercial considerations, clients may design unique local model structures and may be unwilling to disclose their design details. Model-heterogeneous federated learning aims to acquire knowledge from other models without sharing data or local model structure information. There are two cases of model heterogeneity:

Partial heterogeneity: some clients use the same model structure while others use different ones. This can be handled by dividing clients into clusters with the same structure; models within a cluster can be aggregated directly, while communication between clusters requires special techniques such as knowledge distillation.

Complete heterogeneity: the network structure of each client is completely different. In this case, a unique model must be designed for each client, which may lead to high learning costs and low communication efficiency, since regular parameter aggregation or gradient operations cannot be performed.

3.3 Communication Heterogeneity

In real IoT applications, devices are usually deployed in different network environments with different network connections (3G, 4G, 5G, Wi-Fi), leading to inconsistent communication bandwidth, latency, and reliability, i.e., communication heterogeneity. During communication, clients may encounter varying degrees of noise, delay, or packet loss, which severely reduces communication efficiency. Communication heterogeneity, very common in complex IoT environments, can make communication costly and inefficient, thereby reducing the effectiveness of federated learning. How to adaptively adjust federated communication in heterogeneous network environments is therefore worth studying.

3.4 Device Heterogeneity

In practical applications, federated learning networks may involve a large number of IoT devices. Differences in device hardware (CPU, memory, battery life) result in different storage and computing capabilities, which inevitably creates device heterogeneity. This poses two main challenges to federated learning. First, clients with different computing speeds or resources can cause stragglers or system bottlenecks. Second, it introduces non-determinism and instability into the system, since clients may be in different device states. Large-scale federated learning therefore requires adaptively adjusting to feedback from different devices.

3.5 Additional Challenges

In addition to the aforementioned heterogeneity, this paper also discusses some additional challenges in heterogeneous federated learning, including knowledge transfer barriers and privacy leaks:

Knowledge transfer barrier: The goal of federated learning is to transfer knowledge among different clients to jointly learn models with superior performance. However, the aforementioned heterogeneous features lead to barriers to knowledge transfer to varying degrees. Therefore, how to achieve efficient knowledge transfer in heterogeneous scenarios is a problem that current research needs to focus on.

Privacy leakage: Protecting clients' local data from leakage is a basic principle of federated learning. However, federated learning alone still faces potential privacy risks and attacks on data privacy. Moreover, the four types of heterogeneity above inevitably exacerbate privacy leakage at different learning stages. For example, when clients participate in federated learning by sharing model gradient updates or logit outputs, attackers can infer clients' private information by injecting malicious data or models into the system, or by analyzing the shared gradients.

4. Overview of Heterogeneous Federated Learning Progress

This section reviews existing heterogeneous federated learning methods and divides them into three levels: data-level, model-level, and server-level. Data-level methods smooth the heterogeneity of local data or improve data privacy at the data level, e.g., through data augmentation and anonymization techniques. Model-level methods operate in model design, such as sharing parts of the local structure or modifying model optimization. Server-level methods require server participation, such as client selection or client clustering.


Figure 4 Classification of heterogeneous federated learning methods

4.1 Data Level

4.1.1 Private Data Processing


Figure 5 Private data processing methods

Data preparation: Private data preparation in federated learning includes data collection, filtering, cleaning, and augmentation. These operations ensure the data quality and security of each client, thereby improving the efficiency and effectiveness of federated learning. For example, the Safe method clusters local data, measures the distance between each sample and its cluster center, and then filters out samples far from the cluster center as toxic data. FAug is a GAN-based data augmentation scheme: clients upload a few data samples of target labels to the server, which oversamples them to train a conditional GAN; clients then use the received GAN to generate missing data samples, effectively improving the statistical homogeneity of local data.
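
To make the filtering idea concrete, here is a minimal sketch of a Safe-style cleaning step; the k-means clustering, fixed cluster count, and quantile-based distance threshold are all illustrative assumptions rather than the paper's exact settings:

```python
# Minimal sketch of a Safe-style cleaning step (assumptions: k-means with a
# fixed cluster count, quantile-based distance threshold).
import numpy as np
from sklearn.cluster import KMeans

def filter_suspect_samples(X, n_clusters=5, keep_quantile=0.9):
    """Keep only samples that lie close to their own cluster center."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    centers = km.cluster_centers_[km.labels_]      # each sample's own center
    dists = np.linalg.norm(X - centers, axis=1)    # distance to that center
    threshold = np.quantile(dists, keep_quantile)  # keep the closest fraction
    return X[dists <= threshold]

# Toy usage: 200 in-distribution points plus 10 far-away suspicious points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 8)),
               rng.normal(8.0, 1.0, size=(10, 8))])
print(filter_suspect_samples(X).shape)             # ~90% of samples retained
```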

Data privacy protection: To prevent the leakage of sensitive information such as commercial secrets and user privacy, data privacy protection methods at the local level have been studied. There are three main types: data encryption, perturbation, and anonymization. 1) Data encryption: Asad et al. applied homomorphic encryption to federated learning, enabling each client to encrypt its local model with a private key before sending it to the server; the server thus obtains only encrypted model parameters and cannot deduce private information. 2) Data perturbation: Differential privacy (DP) is a commonly used method that protects clients' private information by clipping local updates and adding noise. The PLDP-PFL method allows each client to choose an appropriate privacy budget according to the sensitivity of its private data, achieving personalized differential privacy. 3) Data anonymization: Identifiable sensitive information is deleted or replaced to anonymize the data subject. Choudhury et al. have clients convert features of local datasets into random numbers or symbols, thereby desensitizing the original data.
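
As a concrete illustration of the perturbation approach, the following minimal sketch clips a local update and adds Gaussian noise; the mapping from a privacy budget to the noise scale is deliberately omitted, and a real system would calibrate it with a DP accountant:

```python
# Minimal sketch of DP-style perturbation of a client update: L2-clip, then
# add Gaussian noise. Sigma calibration is simplified on purpose.
import numpy as np

def perturb_update(update, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    rng = rng or np.random.default_rng()
    scale = min(1.0, clip_norm / (np.linalg.norm(update) + 1e-12))
    clipped = update * scale                                  # L2 clipping
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

# A client with more sensitive data could simply use a larger
# noise_multiplier, in the spirit of personalized (PLDP-style) budgets.
local_update = np.random.default_rng(0).normal(size=100)
print(np.linalg.norm(perturb_update(local_update, noise_multiplier=2.0)))
```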

4.1.2 External Data Utilization


Figure 6 External data utilization method

Knowledge distillation: This approach leverages knowledge from external data sources to improve federated performance. Clients generate soft labels on shared data and then use these soft labels as additional supervision for local updates. This enables clients with heterogeneous models to share information in a model-agnostic manner, mitigating the impact of model heterogeneity. FAug and FedMD use federated distillation to learn knowledge from other clients: each client computes its local model outputs on a public dataset, and the average of all clients' outputs serves as the global output. FSMAFL adopts a federated communication strategy similar to FedMD and adds a latent embedding adaptation module, which alleviates the impact of the large domain gap between public and private datasets.
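
A minimal sketch of this federated distillation loop, assuming a shared public set and a temperature-softened KL distillation loss (common choices, though the exact losses vary across FedMD-style methods):

```python
# Sketch of FedMD/FAug-style federated distillation on a shared public set.
# Assumption: clients exchange logits of shape [n_public, n_classes].
import numpy as np

def server_consensus(client_logits):
    """Average the clients' logits on the public set into a global output."""
    return np.mean(np.stack(client_logits, axis=0), axis=0)

def softmax(z, t=1.0):
    e = np.exp(z / t - np.max(z / t, axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, consensus_logits, t=3.0):
    """KL(consensus || student) on temperature-softened distributions."""
    p = softmax(consensus_logits, t)
    q = softmax(student_logits, t)
    return float(np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)),
                                axis=1)))

rng = np.random.default_rng(0)
logits_a, logits_b = rng.normal(size=(32, 10)), rng.normal(size=(32, 10))
consensus = server_consensus([logits_a, logits_b])
print(distillation_loss(logits_a, consensus))   # extra supervision for client A
```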

Unsupervised representation learning: Since private data are usually difficult and costly to label, unsupervised representation learning is discussed for learning general representation models while keeping private data decentralized and unlabeled. This approach pursues consistency of data distributions and representations across clients using techniques such as contrastive losses. It allows clients to generate local dictionaries and integrate them into a global dictionary, thereby achieving model uniformity and mitigating the impact of statistical heterogeneity. For example, MOON and FedProc use contrastive learning to address statistical heterogeneity in federated learning. MOON corrects the local update direction by introducing a model-contrastive loss, while FedProc treats global prototypes as global knowledge and uses a prototype-contrastive loss to constrain the training of local models.
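
For concreteness, a small sketch of MOON's model-contrastive loss: the current local representation is pulled toward the global model's representation and pushed away from the previous local model's. The toy random features and dimensions are assumptions:

```python
# Sketch of MOON's model-contrastive loss. z: current local model features,
# z_glob: global model features, z_prev: previous local model features.
import numpy as np

def cosine(a, b):
    return np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))

def model_contrastive_loss(z, z_glob, z_prev, tau=0.5):
    pos = np.exp(cosine(z, z_glob) / tau)   # pull toward the global model
    neg = np.exp(cosine(z, z_prev) / tau)   # push away from the stale local model
    return float(np.mean(-np.log(pos / (pos + neg))))

rng = np.random.default_rng(0)
z, z_glob, z_prev = (rng.normal(size=(16, 64)) for _ in range(3))
print(model_contrastive_loss(z, z_glob, z_prev))
```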

4.2 Model Level

4.2.1 Federated Optimization


Figure 7 Federated optimization method

Regularization: Regularization prevents overfitting by adding a penalty term to the loss function, reducing model complexity and variance. Under statistical heterogeneity, many federated learning frameworks therefore use regularization to provide convergence guarantees during learning. For example, FedProx adds a proximal term to FedAvg that constrains the difference between the local model and the global model, effectively stabilizing model training and accelerating convergence. The FPL method pulls sample embeddings closer to cluster prototypes of the same domain and category, and introduces consistency regularization to align sample embeddings with unbiased prototypes that contain no domain information.
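
A minimal sketch of the FedProx idea on a toy least-squares task: the local gradient is the task gradient plus mu * (w - w_global), i.e., the gradient of the proximal term (mu/2) * ||w - w_global||^2. The toy data and step sizes are assumptions:

```python
# Sketch of the FedProx local objective on a toy least-squares task.
import numpy as np

def fedprox_grad(w, w_global, X, y, mu=0.1):
    task_grad = X.T @ (X @ w - y) / len(y)   # gradient of 0.5 * MSE
    return task_grad + mu * (w - w_global)   # proximal pull toward global model

rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 8)), rng.normal(size=64)
w, w_global = np.zeros(8), rng.normal(size=8)
for _ in range(100):                         # local steps on one client
    w -= 0.1 * fedprox_grad(w, w_global, X, y)
print(np.linalg.norm(w - w_global))          # local model stays near the global one
```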

Meta-learning: Meta-learning uses previous experience to guide the learning of new tasks, so that machines can autonomously learn models for different tasks. When training on a new task, merely fine-tuning the learned initialization with a small amount of data can achieve satisfactory performance. The personalization ability of meta-learning can address the statistical heterogeneity problem in federated learning. For example, FedMeta, a federated meta-learning framework, maintains the meta-algorithm on the server and distributes the shared initialization to clients for training; clients then upload the test results on their query sets to the server for the algorithm update.
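
The following sketch shows one federated meta-learning round in this spirit, using a Reptile-style meta-update for brevity (FedMeta itself maintains MAML/Meta-SGD-style updates on the server); the toy linear tasks, step sizes, and round counts are assumptions:

```python
# Sketch of a federated meta-learning round with a Reptile-style meta-update.
import numpy as np

def local_adapt(w0, X, y, lr=0.05, steps=10):
    """Client-side inner loop: adapt the shared initialization on support data."""
    w = w0.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(32, 4)), rng.normal(size=32)) for _ in range(5)]
w_init = np.zeros(4)
for _ in range(50):                          # communication rounds
    adapted = [local_adapt(w_init, X, y) for X, y in clients]
    w_init += 0.5 * (np.mean(adapted, axis=0) - w_init)   # server meta-update
print(w_init)                                # initialization that adapts quickly
```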

Multi-task learning: Shared representations or models allow a model trained on a single task to benefit from joint learning with other tasks. If each client's local model learning is regarded as a separate task, multi-task learning can be applied to federated learning: all participating clients collaboratively train their local models, which effectively mitigates statistical heterogeneity and produces high-performance personalized local models. MOCHA is a system-aware optimization framework for federated multi-task learning (FMTL) that tackles the high communication cost, dropped connections, and fault-tolerance problems of distributed multi-task learning. To address statistical heterogeneity and systems challenges, MOCHA adopts the distributed optimization method COCOA and trains a unique model for each client.

4.2.2 Cross-Model Knowledge Transfer


Figure 8 Cross-model knowledge transfer method

Cross-model knowledge distillation: The goal is to refine the knowledge learned on each client and transfer it in a model-agnostic manner, enabling cooperation and knowledge transfer between different models in federated learning. For example, to perform federated learning with heterogeneous clients without relying on a global consensus or a shared common model, RHFL learns the knowledge distribution of other clients by aligning model feedback on irrelevant public data.

Transfer learning: The goal is to apply knowledge learned in a source domain to a different but related target domain. In federated learning scenarios, clients usually belong to different but related domains and want to learn knowledge from other domains. Federated transfer learning therefore transfers knowledge learned on clients to the server for aggregation, or transfers the global consensus back to clients for personalization. KT-pFL linearly combines the soft predictions of all clients through a knowledge coefficient matrix that identifies clients' mutual contributions, thereby enhancing collaboration among clients with similar data distributions.
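
A small sketch of the knowledge-coefficient idea: each client's distillation target is a weighted combination of all clients' soft predictions on public data. The coefficient matrix here is fixed by hand for illustration, whereas KT-pFL learns it jointly during training:

```python
# Sketch of KT-pFL-style knowledge aggregation: client i's target is
# sum_j C[i, j] * soft_pred_j on public data.
import numpy as np

def personalized_targets(soft_preds, C):
    """soft_preds: [n_clients, n_public, n_classes]; C rows sum to 1."""
    return np.einsum('ij,jnc->inc', C, soft_preds)

rng = np.random.default_rng(0)
preds = rng.dirichlet(np.ones(10), size=(3, 32))   # 3 clients, 32 public samples
C = np.array([[0.6, 0.2, 0.2],
              [0.2, 0.6, 0.2],
              [0.2, 0.2, 0.6]])                    # assumed coefficients
print(personalized_targets(preds, C).shape)        # (3, 32, 10)
```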

4.2.3 Architecture Sharing


Figure 9 Architecture Sharing Method

Shared backbone network: In heterogeneous scenarios, clients' private datasets may be Non-IID. To mitigate the negative impact of statistical heterogeneity, clients can share a backbone and send only the backbone to the server, while designing independent personalization layers to meet individual needs. For example, FedPer combines base layers and personalization layers for federated training of deep feed-forward neural networks, effectively capturing the personalized aspects of clients.
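
A minimal sketch of the base/personalization split, with hypothetical parameter names ('conv1', 'conv2', 'head'): only base layers are averaged on the server, and personalization layers never leave the client:

```python
# Sketch of FedPer-style sharing: FedAvg restricted to assumed base layers.
import numpy as np

BASE_KEYS = ['conv1', 'conv2']     # shared backbone (assumed names)

def aggregate_base(client_params):
    """FedAvg on base layers only."""
    return {k: np.mean([p[k] for p in client_params], axis=0) for k in BASE_KEYS}

def load_base(params, global_base):
    params.update({k: v.copy() for k, v in global_base.items()})
    return params

rng = np.random.default_rng(0)
clients = [{'conv1': rng.normal(size=8), 'conv2': rng.normal(size=8),
            'head': rng.normal(size=4)} for _ in range(3)]
global_base = aggregate_base(clients)
clients = [load_base(p, global_base) for p in clients]
print(np.allclose(clients[0]['conv1'], clients[1]['conv1']),   # True: shared
      np.allclose(clients[0]['head'], clients[1]['head']))     # False: personal
```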

Classifier sharing: To handle heterogeneous data and tasks, some methods share classifiers instead of backbone networks: clients perform feature extraction through their own backbones and share a common classifier for classification. For example, FedPAC reduces inter-client feature variance by constraining each sample's feature vector to be close to the global feature centroid of its category, and the server then performs optimal weighted aggregation of the clients' personalized classification heads.

Other partial sharing: Some methods adaptively share a subset of local model parameters according to local conditions (e.g., data distribution, computing power), which enhances the applicability of federated learning under different resource constraints and can also avoid catastrophic forgetting. For example, HeteroFL allocates local models of different sizes according to each client's computing and communication capabilities; the local model parameters are a subset of the global model parameters, which effectively reduces local computation and alleviates the impact of communication heterogeneity and device heterogeneity.
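
As a sketch of the subset idea, the snippet below extracts a width-reduced submodel by slicing each global weight matrix; the layer layout is illustrative, and a real HeteroFL setup keeps the model's input and output dimensions fixed:

```python
# Sketch of HeteroFL-style submodel extraction: a weaker client receives the
# leading slice (width ratio r) of each global weight matrix.
import numpy as np

def extract_submodel(global_weights, r):
    sub = {}
    for name, W in global_weights.items():
        rows = max(1, int(W.shape[0] * r))
        cols = max(1, int(W.shape[1] * r))
        sub[name] = W[:rows, :cols].copy()     # leading channels only
    return sub

rng = np.random.default_rng(0)
global_weights = {'fc1': rng.normal(size=(64, 32)),
                  'fc2': rng.normal(size=(32, 16))}
small = extract_submodel(global_weights, r=0.5)
print({k: v.shape for k, v in small.items()})  # {'fc1': (32, 16), 'fc2': (16, 8)}
```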

4.3 Server Level


Figure 10 Server-level methods

4.3.1 Client Selection

Client selection is usually performed by the server, which samples clients so that the overall data distribution is balanced. Constraints such as network bandwidth, computing power, and local resources of different clients are also considered when formulating the selection strategy.

Some recent methods focus on the bias caused by Non-IID data, as well as the communication cost and efficiency problems caused by diverse hardware and network environments. These methods include deep reinforcement learning, class-imbalance estimation, correlation-based selection strategies, and selection under resource and network constraints, among others, and they can improve the performance of federated learning in heterogeneous data and environments. For example, Favor formulates device selection in federated learning as a deep reinforcement learning problem, training an agent to learn an appropriate selection strategy; in FedSAE, the server estimates the reliability of each device and adjusts each client's training load accordingly.
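
A toy sketch of resource-aware selection: clients are sampled with probability proportional to an assumed capacity score (bandwidth times compute); the scoring rule is purely illustrative and not any specific published strategy:

```python
# Toy sketch of resource-aware client selection per communication round.
import numpy as np

def select_clients(capacities, n_select, rng):
    probs = capacities / capacities.sum()
    return rng.choice(len(capacities), size=n_select, replace=False, p=probs)

rng = np.random.default_rng(0)
bandwidth = rng.uniform(1, 10, size=20)    # e.g. Mbit/s per client
compute = rng.uniform(1, 5, size=20)       # e.g. relative local FLOPS
chosen = select_clients(bandwidth * compute, n_select=5, rng=rng)
print(sorted(chosen.tolist()))             # indices of this round's participants
```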

4.3.2 Client Clustering

When the entire federated system shares a single global model, satisfactory performance cannot be guaranteed for all client models. Therefore, many methods cluster clients by the similarity of their data distributions, local models, or parameter updates. These strategies help clients benefit more from similar peers while reducing interference from dissimilar data. For example, IFCA alternately estimates each client's cluster identity and optimizes the cluster model parameters through gradient descent to minimize the loss function.
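
A compact sketch of an IFCA-style round, with least-squares models standing in for neural networks (model class, step sizes, and cluster count are assumptions): each client adopts the cluster model with the lowest loss on its own data, trains it locally, and the server averages the returned models per cluster:

```python
# Sketch of an IFCA-style alternating round: estimate cluster identity by
# lowest local loss, train locally, then average per cluster on the server.
import numpy as np

def local_loss(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

def ifca_round(cluster_ws, clients, lr=0.1, steps=20):
    assign, updated = [], [[] for _ in cluster_ws]
    for X, y in clients:
        k = int(np.argmin([local_loss(w, X, y) for w in cluster_ws]))
        w = cluster_ws[k].copy()
        for _ in range(steps):                    # local training
            w -= lr * X.T @ (X @ w - y) / len(y)
        assign.append(k)
        updated[k].append(w)
    return [np.mean(ws, axis=0) if ws else cluster_ws[k]
            for k, ws in enumerate(updated)], assign

rng = np.random.default_rng(0)
w_true = [rng.normal(size=4), rng.normal(size=4)]   # two latent client groups
clients = []
for i in range(6):
    X = rng.normal(size=(64, 4))
    clients.append((X, X @ w_true[i % 2] + 0.1 * rng.normal(size=64)))
ws = [rng.normal(size=4) for _ in range(2)]
for _ in range(10):
    ws, assign = ifca_round(ws, clients)
print(assign)          # estimated cluster identity of each client
```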

Several recent approaches aim to improve the attack robustness of federated learning systems through client clustering. These methods can reduce computational cost while enhancing the robustness and flexibility of the federated framework. However, some methods remain vulnerable to backdoor attacks (including data poisoning and model poisoning) in the federated setting. To address this, more sophisticated methods have been proposed to detect and limit adversarial model updates, such as FLAME, which combines a clustering strategy for detecting adversarial model updates with clipping and noise injection, bounding the amount of noise required to remove backdoors.

4.3.3 Decentralized Communication

General federated learning algorithms rely on a central server, which requires all clients to trust a central authority; failure of that authority would disrupt the entire federated learning process. Therefore, some algorithms adopt decentralized communication to realize peer-to-peer communication between devices without relying on a central server.

In the BrainTorrent method, a client is randomly selected as a temporary server in each round and then cooperates with other clients that have completed model updates. In this way, any client can dynamically initiate the update process at any time. GossipFL and Combo design distributed federated training algorithms based on the gossip protocol to make full use of the bandwidth between clients. These decentralized communication methods help reduce communication costs, increase system robustness, and provide greater flexibility. In addition, some methods take extra security measures, such as blockchain-based decentralized federated learning frameworks, to improve the security of decentralized federated learning.
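
A minimal sketch of gossip-style averaging under uniform random pairing (topology and scheduling are simplified relative to GossipFL/Combo): models drift toward a common consensus without any central server:

```python
# Sketch of gossip-style decentralized averaging: in each step, random
# disjoint pairs of clients average their models; no server is involved.
import numpy as np

def gossip_step(models, rng):
    order = rng.permutation(len(models))
    for i in range(0, len(order) - 1, 2):        # random disjoint pairs
        a, b = order[i], order[i + 1]
        avg = (models[a] + models[b]) / 2.0
        models[a], models[b] = avg.copy(), avg.copy()
    return models

rng = np.random.default_rng(0)
models = [rng.normal(size=8) for _ in range(8)]
for _ in range(30):
    models = gossip_step(models, rng)
print(np.std(np.stack(models), axis=0).max())    # near 0: consensus reached
```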

5. Future Directions of Heterogeneous Federated Learning

5.1 Improving Communication Efficiency

Heterogeneous federated learning faces several communication-efficiency challenges: a large number of edge nodes increases computing cost and the required computing power and storage; differences in network bandwidth may cause upload delays or even losses; and differences in the size of private datasets may delay model updates. In practical application scenarios it is therefore necessary to strike a good trade-off between communication efficiency and model accuracy. To improve communication efficiency and effectiveness, researchers have adopted various methods, such as reducing communication cost by constraining the consistency of local updates with global updates to avoid transmitting irrelevant updates to the server.

5.2 Federated Fairness

In the real world, heterogeneous federated learning faces security issues related to model fairness. Clients participating in collaborative learning may differ in their contributions, and this variance may be exacerbated by heterogeneity, yet most existing federated learning frameworks ignore contribution differences among participating clients. There may also be "free riders" who wish to benefit from federated communication without contributing useful information. In addition, the jointly trained global model may be biased toward clients with large amounts of data or frequent participation, and the overall loss function may implicitly favor or disadvantage some clients. Concerns about fairness will therefore continue to grow as federated learning is deployed across more users and enterprises.

5.3 Comprehensive Privacy Protection

Privacy protection plays a key role in federated learning. Although clients maintain privacy by not sharing local data, private information may still leak to the server through the memorization in models and gradient updates and through information feedback. In response, existing studies have proposed privacy protection technologies such as differential privacy. However, in real heterogeneous scenarios, different clients or data samples have different privacy requirements, so more rigorous yet flexible privacy constraint strategies need to be established. Another challenge is privacy when handling biometric data, such as raw data anonymization and feature template protection. Existing frameworks, although adopting a keep-data-local policy, still cannot completely prevent data access and analysis.

5.4 Attack Robustness

Federated learning faces two main types of attacks: poisoning attacks and inference attacks, which pose significant threats to the system.

Poisoning attack: aims to make the model's learning direction deviate from the original goal, and includes data poisoning and model poisoning. In data poisoning, attackers corrupt the integrity of training data through label flipping or backdoor insertion, degrading model performance. Model poisoning changes the learning direction of the model by corrupting client updates. In multi-layer distributed federated learning scenarios, the attack risk is more serious, because attackers can target more intermediate nodes, so stricter defense mechanisms are required.

Inference attack: aims to infer information about private user data, thereby compromising user privacy. For example, during parameter transfer, a malicious client can infer other clients' sensitive data from the differences in gradient parameters across rounds. Moreover, in multi-layer decentralized federated learning scenarios, intermediate nodes are exposed to more attacks and may face more serious malicious behavior, requiring stricter defense mechanisms.

5.5 Unified Benchmarks

Because federated learning is a relatively young field, widely recognized benchmark datasets and benchmarking frameworks for heterogeneous scenarios are still lacking. A heterogeneous benchmark framework should offer a variety of possible client data distributions and model structures; the statistical and model differences among clients can then verify, to some extent, the generalization ability of heterogeneous federated learning algorithms. It is also necessary to develop systematic evaluation metrics to fairly and comprehensively evaluate the security, convergence, accuracy, and generalization of different algorithms, and to establish more realistic datasets covering a wide range of machine learning tasks to facilitate the development of federated learning.

For more details, please read

Paper address: arxiv.org/abs/2307.10616

Project address: https://github.com/marswhu/HFL_Survey
