[Literature Reading] Oort: Efficient Federated Learning via Guided Participant Selection

        Existing works select FL participants randomly, which leads to poor model quality and efficiency. This paper proposes Oort, which improves federated training and testing performance by guiding participant selection.


Summary

        To improve time-to-accuracy performance in model training, Oort prioritizes clients whose data contributes the most to improving model accuracy and who can run training quickly. To enable FL developers to interpret their results in model testing, Oort enforces requirements on the distribution of participants' data.


Introduction

        Not all clients are available to run FL training or testing at the same time, and they have heterogeneous data distributions and system capabilities, which can lead to wasted work and poor performance. Thus, a fundamental problem in practical FL is selecting a "good" subset of clients as participants, each of which processes its own data locally, with only the results collected and aggregated by a (logically) centralized coordinator.

        Although random participant selection is easy to deploy, it unfortunately leads to poor performance of federated training because of large heterogeneity in device speed and/or data characteristics. Worse, random selection of participants can bias the test set, undermining confidence in the results. As a result, developers often have to resort to many more participants.

        The authors have integrated Oort with PySyft and evaluated it with real workloads in various FL tasks.

        The main contributions of the article are as follows:

  1. The article highlights the tension between statistical efficiency and system efficiency when selecting FL participants, and proposes Oort to trade them off effectively.
  2. It proposes participant selection algorithms that improve the time-to-accuracy performance of training and scalably serve developers' testing criteria on FL data.
  3. It implements and evaluates these algorithms in Oort at scale, showing improvements in statistical and system performance over the state of the art.

        The middle of the paper is a background/related-work discussion that describes the current challenges and state of research in federated learning in detail, so I will skim ahead to the main content of the article.

        The image above shows:

  • Even with the same number of participants, random selection can lead to significant deviations of the data from the target distribution;
  • While this deviation decreases as the number of participants increases, it is important to quantify how it varies with the number of participants, even if we set aside the cost of expanding the participant set.

Oort at a Glance

        This section outlines how Oort fits into the FL life cycle to help readers understand the subsequent chapters.

         The diagram above shows how Oort interacts with developers:

  1. Job submission: The developer submits the FL job to the coordinator in the cloud and specifies the participant selection criteria.
  2. Participant selection: The coordinator queries clients for eligibility attributes (e.g., battery level) and forwards their characteristics (e.g., liveness) to Oort. Oort selects participants and notifies the coordinator of this participant selection (2b).
  3. Execution: The coordinator distributes the relevant profiles (e.g., the model) to these participants, and then each participant independently computes results (e.g., model weights in training) on its own data.
  4. Aggregation: When participants complete computations, the coordinator aggregates updates from participants.
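
        To make the four steps above concrete, here is a minimal sketch of one coordinator round. All names here (Client, RandomSelector, run_round) are my own illustration under assumed interfaces, not Oort's actual API; Oort's guided training selector would take the place of the baseline selector.

```python
import random

class Client:
    """Toy client holding only the attributes this sketch needs."""
    def __init__(self, cid, speed, battery):
        self.cid, self.speed, self.battery = cid, speed, battery

    def local_train(self, model):
        # Placeholder for local training: perturb the model and report feedback.
        update = {k: v + random.gauss(0, 0.01) for k, v in model.items()}
        feedback = {"loss": random.random(), "duration": 1.0 / self.speed}
        return update, feedback

class RandomSelector:
    """Baseline random selection; Oort's guided selector would replace this."""
    def select(self, eligible, k):
        return random.sample(eligible, min(k, len(eligible)))
    def update_feedback(self, feedbacks):
        pass  # a guided selector would update per-client utility here

def run_round(model, clients, selector, num_participants=10):
    # (2) Participant selection: filter by eligibility, then let the selector pick.
    eligible = [c for c in clients if c.battery > 0.2]
    participants = selector.select(eligible, num_participants)

    # (3) Execution: each participant computes an update on its own data.
    updates, feedbacks = [], {}
    for c in participants:
        update, fb = c.local_train(model)
        updates.append(update)
        feedbacks[c.cid] = fb

    # (4) Aggregation: FedAvg-style averaging of participant updates,
    # followed by feeding observed feedback back to the selector.
    new_model = {k: sum(u[k] for u in updates) / len(updates) for k in model}
    selector.update_feedback(feedbacks)
    return new_model
```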

Oort Interface

        Oort provides two different selectors that developers can access through a client library during FL training and testing: the training selector and the testing selector, respectively.

        The former aims to improve the time-to-accuracy performance of federated training. To this end, it captures the utility of clients in training and, at runtime, efficiently explores and selects high-utility clients.

        The latter supports two types of selection criteria. When individual clients' data characteristics (e.g., categorical distributions) are not provided, the testing selector determines the number of participants required to bound the deviation of the participants' data from the global distribution. Otherwise, it picks participants that satisfy the developer's exact data request while minimizing testing time.
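
        As a rough sketch of what a developer-facing interface for the two selectors might look like (the class and method names below are my own guesses for illustration, not Oort's published API):

```python
from typing import Dict, List

class TrainingSelector:
    """Sketch of a training-time selector: it tracks per-client utility and
    picks high-utility participants each round."""

    def register_client(self, client_id: str, duration: float) -> None:
        """Record a client's observed round duration."""
        ...

    def update_utility(self, client_id: str, stat_utility: float) -> None:
        """Record the statistical utility reported after a round."""
        ...

    def select_participants(self, k: int) -> List[str]:
        """Pick k clients, mixing exploitation and exploration."""
        ...

class TestingSelector:
    """Sketch of a testing-time selector supporting the two criteria above."""

    def num_participants_for_bound(self, tolerance: float, confidence: float) -> int:
        """Without per-client data characteristics: return how many participants
        are needed so the deviation from the global distribution stays within
        `tolerance` at the given confidence."""
        ...

    def select_by_category(self, request: Dict[str, int]) -> List[str]:
        """With per-client categorical distributions: pick participants that
        satisfy the developer's exact data request while minimizing test time."""
        ...
```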

Oort Challenges

        Correspondingly, Oort faces the following technical challenges:

  1. In each round, how to determine which clients' data would contribute to the statistical efficiency of training, while respecting client privacy?
  2. How to take clients' system performance into account to optimize overall system efficiency?
  3. How to account for the fact that we do not have up-to-date utility evaluations for all clients during training?

How to do Adaptive Participant Selection

        First, the author defines a formula to capture the "utility" of a client.
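
        As far as I recall from the paper, the formula (reconstructed here for reference) combines a statistical term over the training loss \mathrm{Loss}(k) of the locally stored samples k \in B_i with a system penalty:

$$\mathrm{Util}(i) \;=\; \underbrace{|B_i|\,\sqrt{\tfrac{1}{|B_i|}\sum_{k \in B_i}\mathrm{Loss}(k)^2}}_{\text{statistical utility}} \;\times\; \underbrace{\left(\frac{T}{t_i}\right)^{\mathbb{I}(T < t_i)\,\times\,\alpha}}_{\text{global system utility}}$$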

        This formulation assumes that all of a client's samples are processed in the round in which it participates. B_i is the set of training samples that client i has stored locally, T is the developer-preferred duration of each round, and t_i is the time client i needs for training, which the coordinator has collected from past rounds. \mathbb{I}(x) is an indicator function that takes the value 1 if x is true and 0 otherwise. Clients that would become a bottleneck for the speed required in the current round have their utility penalized by a factor \alpha specified by the developer.

        We need to solve some practical problems in order to select the participants with the highest utility in each round of training:

  1. The utility of a client can only be determined after it has participated in training; how do you choose from a large pool of clients without having to try them all?
  2. Since not every client participates in every round, how do you account for the change in a client's utility since its last participation?
  3. How can selection remain robust against outliers in the presence of corrupted clients?

        The article then presents the pseudocode of the participant selection procedure and discusses its "exploration-exploitation" design:

        Selecting participants from a pool of clients can be modeled as a multi-armed bandit problem, where each client is an "arm" of the bandit and the utility obtained is the "reward". Compared to more complex designs (e.g., reinforcement learning), the bandit model is scalable and flexible even when the solution space (e.g., the number of clients) changes dramatically over time.

        (I don't really understand it from here on; I suppose I need to read up on what the multi-armed bandit problem is.)

        Line 6: At the beginning of each selection round, Oort receives feedback from the previous training round and updates each client's statistical utility and system performance. For robustness, Oort removes a client from the candidate pool after it has been selected for a given number of rounds, which helps eliminate outliers in terms of participation.

        Lines 9-16: For clients that have already been explored, Oort computes their client utility and narrows down the selection by exploiting high-utility participants. At the same time, Oort samples a fraction \varepsilon (\in [0,1]) of the participants to explore potential participants that have not been selected before (line 16), and \varepsilon is decayed over the course of training.

        Note that, motivated by the confidence interval used to measure the uncertainty in a bandit's reward, the algorithm introduces an incentive term of the same shape as the confidence in bandit solutions to account for staleness (line 10): the utility of a client is gradually increased if it has been neglected for a long time, so clients that have accumulated high utility since their last trial can still be re-selected. By performing probabilistic participant selection within a pool of high-utility clients, the chance of selecting outliers is greatly reduced at FL's client scale.
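
        Putting the walkthrough above together, here is a minimal Python sketch of the exploitation-exploration selection. It is my own simplification: the constants, the exact form of the staleness bonus, and the cutoff policy are assumptions, not a faithful copy of Oort's pseudocode.

```python
import math
import random

def compute_utility(stat_utility, duration, pref_duration, alpha):
    # Client utility: statistical utility, penalized when the client is slower
    # than the developer-preferred round duration (see the formula above).
    penalty = (pref_duration / duration) ** alpha if duration > pref_duration else 1.0
    return stat_utility * penalty

def select_participants(clients, k, cur_round, epsilon=0.2, alpha=2.0, pref_duration=100.0):
    """clients: dict cid -> {"stat_utility", "duration", "last_round"};
    clients never tried before have stat_utility set to None."""
    explored = {c: m for c, m in clients.items() if m["stat_utility"] is not None}
    unexplored = [c for c, m in clients.items() if m["stat_utility"] is None]

    # Exploitation: score explored clients, adding a confidence-style bonus
    # that grows with staleness so long-neglected clients get reconsidered.
    scores = {}
    for c, m in explored.items():
        util = compute_utility(m["stat_utility"], m["duration"], pref_duration, alpha)
        staleness_bonus = math.sqrt(0.1 * math.log(cur_round) / max(m["last_round"], 1))
        scores[c] = util + staleness_bonus

    num_exploit = int((1 - epsilon) * k)
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Probabilistic selection within a pool of high-utility clients, which
    # reduces the chance of repeatedly picking outliers.
    pool = ranked[: max(2 * num_exploit, 1)]
    exploited = random.sample(pool, min(num_exploit, len(pool)))

    # Exploration: fill the remaining slots with clients never tried before.
    explored_new = random.sample(unexplored, min(k - len(exploited), len(unexplored)))
    return exploited + explored_new
```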


Model Evaluation

        The article evaluates the effectiveness of Oort with four different ML models across four CV and NLP datasets.

  • Oort outperforms existing random participant selection by 1.2×-14.1× in time-to-accuracy performance, while improving final model accuracy by 1.3%-9.8%.

  • Oort achieves near-optimal model efficiency by adaptively trading off statistical and system efficiency with its different components.

  • Oort outperforms its counterparts over a wide range of parameters and at different experimental scales, while being robust to outliers.


Origin blog.csdn.net/m0_51562349/article/details/128125510