[Literature Reading] Federated Learning: Challenges, Methods, and Future Directions

        This time I read a survey of federated learning, Federated Learning: Challenges, Methods, and Future Directions, authored by Tian Li et al.


1 Introduction

        The article starts from the observation that the growing computing power of edge devices, coupled with concerns about transmitting private information, makes it increasingly attractive to store data locally and push network computation to the edge. The concept of federated learning is then introduced, in a way consistent with much of the literature.

        The article discusses these issues in the context of the following settings: smartphones, organizations (e.g., hospitals), and the Internet of Things.

1.1 Questions raised

1.2 Core Challenges

Challenge 1: High communication overhead

        Communication is a critical bottleneck: network communication can be many orders of magnitude slower than local computation. Two key aspects to consider are:

  • Reduce the total number of communication rounds;
  • Reduce the size of the messages sent each round.

Challenge 2: System Heterogeneity

This refers to the fact that the storage, computational, and communication capabilities of devices in a federated network may differ, due to variability in hardware (CPU, memory), network connectivity (3G, 4G, 5G, Wi-Fi), and power (battery level). Furthermore, the network size and system-level constraints on each device typically mean that only a small fraction of devices are active at any one time, e.g., hundreds of active devices in a million-device network. Each device may also be unreliable, and it is not uncommon for an active device to drop out at a given iteration due to connectivity or energy constraints. Federated learning methods that are developed and analyzed must therefore:

  • anticipate a low level of participation;
  • tolerate heterogeneous hardware;
  • be robust to devices dropping out of the network.

Challenge 3: Statistical heterogeneity (I have summarized this aspect in earlier notes)

Challenge 4: Privacy Concerns

        Passing model updates throughout the training process may leak sensitive information to third parties or to the central server. While recent approaches aim to enhance the privacy of federated learning with tools such as secure multi-party computation or differential privacy, these methods typically provide privacy at the cost of reduced model performance or system efficiency. Understanding and balancing these trade-offs, both theoretically and empirically, is a considerable challenge in realizing private federated learning systems.
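        To make the trade-off concrete, here is a minimal sketch of the kind of update perturbation used in differentially private federated learning: the client clips its model update to bound sensitivity and then adds Gaussian noise. The function name, clip norm, and noise multiplier are my own illustrative assumptions, not the concrete mechanism from the paper; a real system would calibrate the noise to a target (epsilon, delta) budget.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip a client's model update and add Gaussian noise (illustrative DP sketch).

    clip_norm and noise_multiplier are made-up values; a real deployment would
    calibrate them to a target (epsilon, delta) privacy budget.
    """
    rng = rng or np.random.default_rng(0)
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))  # bound sensitivity
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

# Example: perturb a toy 5-dimensional model update before sending it to the server.
raw_update = np.array([0.5, -1.2, 0.3, 2.0, -0.7])
print(privatize_update(raw_update))
```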


2. The paper's research content

        The authors argue that existing methods are often unable to fully handle the scale of federated networks, let alone the challenges of system and statistical heterogeneity. Although privacy is an important aspect of many machine learning applications, privacy-preserving approaches for federated learning can be difficult to analyze rigorously because of the statistical variability of the data, and may be even harder to implement because of system constraints on each device and across the potentially massive network.

2.1 Communication Efficiency

        Communication is a critical bottleneck in federated networks. The paper points out several general directions, grouped into (1) local-updating methods, (2) compression schemes, and (3) decentralized training.

2.1.1 Local update

        This part mainly points out the shortcomings of existing methods and the more effective alternatives, such as local-updating methods that let each device perform multiple local updates between communication rounds, which can greatly reduce communication overhead at the cost of some convergence speed.
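        To make the local-updating idea concrete, here is a minimal FedAvg-style sketch on a toy least-squares problem: each device runs several local gradient steps before communicating, and the server averages the returned models weighted by local data size. The data, learning rate, and number of local steps are made up for illustration and are not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0])

# Toy local datasets; sizes and feature scales differ across devices.
devices = []
for n, scale in [(50, 1.0), (20, 3.0), (80, 0.5)]:
    X = rng.normal(0, scale, size=(n, 2))
    y = X @ true_w + rng.normal(0, 0.1, size=n)
    devices.append((X, y))

def local_sgd(w, X, y, steps=20, lr=0.01):
    """Run several local gradient steps before communicating (the key saving)."""
    w = w.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

w_global = np.zeros(2)
for rnd in range(10):                      # 10 communication rounds in total
    local_models, weights = [], []
    for X, y in devices:                   # in practice, only a sampled subset
        local_models.append(local_sgd(w_global, X, y))
        weights.append(len(y))
    # Server step: weighted average of the locally updated models.
    w_global = np.average(local_models, axis=0, weights=weights)

print("estimated weights:", w_global)      # should approach true_w = [2, -3]
```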

2.1.2 Compression scheme

        While local update methods can reduce the total number of communication rounds, model compression schemes such as sparsification, subsampling, and quantization can significantly reduce the size of messages communicated per round. However, low device participation, non-uniformly distributed local data, and local update schemes pose new challenges to these model compression methods.

        Some common and general compression methods are listed, such as using lossy compression and dropout to reduce server-to-device communication, applying lossless Golomb encoding, forcing model updates to become sparse and low-rank, and performing quantization with structured random rotations.
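        As a rough illustration of two of these ideas, the sketch below applies top-k sparsification followed by 8-bit scalar quantization to a model update, and then reconstructs the approximation on the server side. The choice of k, the bit width, and the encoding are assumptions for the sketch; the Golomb coding and structured-random-rotation schemes cited in the survey are not reproduced here.

```python
import numpy as np

def top_k_sparsify(update, k):
    """Keep only the k largest-magnitude entries; transmit (indices, values)."""
    idx = np.argsort(np.abs(update))[-k:]
    return idx, update[idx]

def quantize_8bit(values):
    """Uniform 8-bit quantization of the kept values (the scale is sent along)."""
    scale = max(np.max(np.abs(values)) / 127.0, 1e-12)
    return np.round(values / scale).astype(np.int8), scale

def decode(idx, q, scale, dim):
    """Server-side reconstruction of the approximate sparse update."""
    out = np.zeros(dim)
    out[idx] = q.astype(np.float64) * scale
    return out

rng = np.random.default_rng(1)
update = rng.normal(size=1000)
idx, vals = top_k_sparsify(update, k=50)   # 95% of the entries are dropped
q, scale = quantize_8bit(vals)             # 8 bits per kept value
approx = decode(idx, q, scale, dim=update.size)
print("relative L2 error:", np.linalg.norm(update - approx) / np.linalg.norm(update))
```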

2.1.3 Decentralized Training

        Decentralized topologies (where devices communicate only with neighbors) are briefly discussed as a potential alternative. Decentralized training has been shown to be faster than centralized training when run on low-bandwidth or high-latency networks. There are also studies that propose a layered communication pattern that further offloads the central server by first leveraging edge servers to aggregate updates from edge devices, and then relying on cloud servers to aggregate updates from edge servers.
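        The following toy sketch shows decentralized (gossip-style) averaging on a ring, where each device repeatedly mixes its model with its two neighbours instead of talking to a central server. The ring topology, uniform mixing weights, and number of gossip steps are illustrative assumptions, not a scheme from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n_devices, dim = 8, 3
models = rng.normal(size=(n_devices, dim))   # row i: device i's local model

# Ring topology: device i mixes with neighbours i-1 and i+1 (uniform 1/3 weights).
W = np.zeros((n_devices, n_devices))
for i in range(n_devices):
    for j in (i - 1, i, i + 1):
        W[i, j % n_devices] = 1.0 / 3.0

for _ in range(30):        # gossip steps; no central server involved
    models = W @ models    # every device averages with its neighbours

print("device 0 model :", models[0])
print("network average:", models.mean(axis=0))   # all devices converge to this
```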

2.2 System heterogeneity

        The following discussion is based on a star topology.

2.2.1 Asynchronous communication

        Synchronous schemes are simple and guarantee a serially equivalent computational model, but they are more susceptible to stragglers when devices vary. Asynchronous schemes are an attractive way to mitigate stragglers in heterogeneous environments; they typically rely on a bounded-delay assumption to control the degree of staleness, which, for device k, depends on how many other devices have updated the model since device k last pulled it from the central server.

        While asynchronous parameter servers have been successful in distributed data centers, the classical bounded-delay assumption may be unrealistic in federated settings, where the delay may be on the order of hours to days, or completely unbounded.
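        One common way to put the bounded-delay idea into code is to down-weight stale updates on the server. The sketch below uses a polynomial staleness discount; the discount function, learning rate, and version bookkeeping are my own illustrative assumptions rather than a method from the survey.

```python
import numpy as np

def apply_async_update(w_global, client_update, client_version, server_version,
                       lr=1.0, decay=0.5):
    """Apply a possibly stale client update, down-weighted by its staleness.

    staleness = number of server updates since the client pulled the model;
    the polynomial discount below is an illustrative choice.
    """
    staleness = server_version - client_version
    weight = lr * (1.0 + staleness) ** (-decay)   # older updates count for less
    return w_global + weight * client_update

w = np.zeros(4)
# A client that pulled the model at version 7 reports back when the server is at 10.
w = apply_async_update(w, np.array([0.1, -0.2, 0.05, 0.0]),
                       client_version=7, server_version=10)
print(w)
```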

2.2.2 Active Sampling

        In a federated network, typically only a small subset of devices participates in each round of training. The vast majority of federated methods are passive in that they do not try to influence which devices participate. An alternative approach is to actively select the participating devices in each round, as sketched below.
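        Here is a toy sketch of what active device selection could look like: the server scores devices using metadata it might plausibly have (local data size and battery level) and samples proportionally to that score. The scoring rule and metadata fields are purely hypothetical and only meant to contrast with uniform, passive sampling.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical per-device metadata the server might see when scheduling a round.
devices = [
    {"id": 0, "n_samples": 500, "battery": 0.90},
    {"id": 1, "n_samples": 120, "battery": 0.20},
    {"id": 2, "n_samples": 800, "battery": 0.60},
    {"id": 3, "n_samples": 60,  "battery": 0.95},
]

def select_devices(devices, k, rng):
    """Sample k devices with probability proportional to a simple score."""
    scores = np.array([d["n_samples"] * d["battery"] for d in devices], dtype=float)
    probs = scores / scores.sum()
    chosen = rng.choice(len(devices), size=k, replace=False, p=probs)
    return [devices[i]["id"] for i in chosen]

print("devices selected this round:", select_devices(devices, k=2, rng=rng))
```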

2.2.3 Fault Tolerance

        Fault tolerance has been studied extensively in the systems community and is a fundamental consideration in classical distributed systems. It becomes even more critical when learning on remote devices, since it is common for some participating devices to drop out before a given training iteration completes. For example, devices in remote areas may be more prone to dropping out due to poor network connectivity, in which case the trained federated model will be biased towards devices with good network conditions.

        A practical strategy is to simply ignore such device failures, although this may introduce bias into the device sampling scheme if the failed devices have particular data characteristics. Coded computation is another option, tolerating device failures by introducing algorithmic redundancy; recent research explores the use of codes to speed up distributed machine learning training.
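        A minimal sketch of the "simply ignore failures" strategy is shown below: each round the server aggregates only the updates that actually arrive, which keeps training alive but, as noted above, can bias the model toward well-connected devices. The drop probability and the stand-in local updates are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n_devices, dim = 6, 3

def federated_round(w_global, drop_prob=0.3):
    """One round in which any device may drop out; the server ignores failures."""
    received, weights = [], []
    for k in range(n_devices):
        if rng.random() < drop_prob:
            continue                                  # device k dropped (network/battery)
        local_model = w_global + rng.normal(0, 0.01, size=dim)  # stand-in local update
        received.append(local_model)
        weights.append(1.0)
    if not received:                                  # every device failed this round
        return w_global
    # Aggregate only the survivors; simple, but may bias the model toward
    # devices with reliable connectivity.
    return np.average(received, axis=0, weights=weights)

w = np.zeros(dim)
for _ in range(5):
    w = federated_round(w)
print(w)
```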

2.3 Statistical heterogeneity (the same points as before)

2.4 Privacy issues

        Sharing other information, such as model updates, can also reveal sensitive user information. Since this is not the focus of my graduation project, I only skimmed this part.


3. Future directions

  • Extreme communication schemes
  • Communication reduction and the Pareto frontier
  • Novel models of asynchrony
  • Heterogeneity diagnostics
  • Granular privacy constraints
  • Beyond supervised learning
  • Productionizing federated learning
  • Benchmarks


Original post: blog.csdn.net/m0_51562349/article/details/128070539