FedAT: Federated Learning with a Tiered, Asynchronous Update Mechanism

Paper: FedAT: A Communication-Efficient Federated Learning Method with Asynchronous Tiers under Non-IID Data

Venue: SC'21 (International Conference for High Performance Computing, Networking, Storage, and Analysis), a top conference in high-performance computing and architecture (CCF-A)

1. Background introduction

Federated Learning (FL) trains models across large numbers of distributed devices while keeping each device's training data private. This form of collaborative learning introduces new trade-offs among model convergence speed, accuracy, fairness across clients, and communication cost.
New challenges include:

  1. The straggler problem, where clients lag behind due to heterogeneity in data or in (compute and network) resources;
  2. Communication bottlenecks, where a large number of clients transmitting their local updates to a central server saturate the server.

  • Most existing optimization methods focus on only one dimension of this trade-off space.
  • Existing solutions handle stragglers with asynchronous model updates or tier-based synchronization mechanisms. However, asynchronous methods easily create network communication bottlenecks, while tiering can introduce bias by favoring faster tiers with shorter response latencies.

The federated learning model

In traditional federated learning, a shared model is learned from a federation of distributed clients under the coordination of a centralized server. For security and privacy reasons, different clients in an FL deployment do not share data with each other. Each client trains a local model on its own (distributed) local data, while the centralized server aggregates the local models' gradients (updates) to train the global model.
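To make this loop concrete, below is a minimal sketch of one synchronous round in this setting, assuming model weights are NumPy arrays; `local_train` and the client dataset structure are illustrative placeholders, not the paper's code.

```python
import numpy as np

def synchronous_round(global_w, client_datasets, local_train):
    """One synchronous FL round (FedAvg-style): each client trains on its
    own private data, and only model weights travel to the server."""
    local_ws, sizes = [], []
    for data in client_datasets:          # raw data never leaves a client
        local_ws.append(local_train(global_w.copy(), data))
        sizes.append(len(data))
    # Server step: average local models, weighted by local dataset size.
    total = sum(sizes)
    new_w = np.zeros_like(global_w)
    for n, w in zip(sizes, local_ws):
        new_w += (n / total) * w
    return new_w
```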

FL typically involves a large number of clients with highly heterogeneous hardware resources (CPU, memory, and network) and non-IID data. This heterogeneity in both resources and data poses unique challenges for FL algorithms. In addition, as the number of clients grows, how clients communicate with the server becomes an important design choice.


Comparison of communication methods

The mainstream communication methods are synchronous communication (Federated Averaging, FedAvg) and asynchronous communication (FedAsync).

| Communication method | Advantages | Shortcomings |
| --- | --- | --- |
| Synchronous | High stability: all participants update the model in lockstep, so the model state stays consistent. Convergence guarantee: converges stably when participants' data distributions and loss functions are consistent. | High communication overhead: the server must wait for all participants to finish computing. Limited parallelism: every participant waits on feedback from all the others. |
| Asynchronous | Robustness: the system tolerates stragglers gracefully. Low communication overhead and high parallelism: each model can be updated independently, without waiting. | Communication bottleneck: the server must push model updates to many participants. Unstable convergence: the order of participant updates is nondeterministic. |
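As a concrete contrast to the synchronous round above, here is a minimal sketch of an asynchronous server update in the spirit of FedAsync; the polynomial staleness discount and the `alpha` value are illustrative choices, not FedAT's own rule.

```python
def async_update(global_w, client_w, client_round, server_round, alpha=0.6):
    """Merge one client update the moment it arrives (FedAsync-style).
    The staler the update (trained against an older global model), the
    smaller its mix-in weight, which damps out-of-order arrivals."""
    staleness = server_round - client_round
    a = alpha / (staleness + 1)           # illustrative staleness discount
    return (1 - a) * global_w + a * client_w
```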

2. Content summary

This paper proposes a new federated learning method based on asynchronous tiers: FedAT. FedAT synergistically combines synchronous intra-tier training with asynchronous cross-tier training. By bridging synchronous and asynchronous training through tiering, it minimizes the straggler effect while improving convergence speed and test accuracy. FedAT uses a straggler-aware, weighted aggregation heuristic to steer and balance the training, further improving accuracy, and compresses uplink and downlink communication to minimize communication cost. The results show that, compared with state-of-the-art FL algorithms, FedAT improves prediction performance by up to 21.09% and reduces communication cost by up to 8.5×.

FedAT

To overcome the shortcomings of these two communication methods, this paper designs FedAT, which uses a tiering mechanism to combine synchronous and asynchronous FL training.

In FedAT, clients are divided into logical tiers based on their response latency (the time a client takes to complete one round of training). All tiers participate in global training concurrently, each proceeding at its own pace. Clients within a single tier synchronously update the model associated with that tier, while each tier, acting as a logical, coarse-grained training entity, asynchronously updates the global model. In short: synchronous within tiers, asynchronous across tiers.

Faster tiers, with shorter per-round response latencies, drive the global model toward faster convergence; slower tiers still contribute to global training by sending their model updates to the server asynchronously, further improving the model's prediction performance. A minimal sketch of this two-level loop follows.
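The sketch below assumes NumPy weight arrays; tier assignment by profiled latency, `local_train`, and the round budget are simplified placeholders, and `cross_tier_weights` is the straggler-aware heuristic sketched in the next section. This is an illustration of the idea, not the paper's implementation.

```python
import threading
import numpy as np

class TieredServer:
    """Sketch of FedAT-style training: synchronous within a tier,
    asynchronous across tiers."""

    def __init__(self, init_w, num_tiers):
        self.tier_models = [init_w.copy() for _ in range(num_tiers)]
        self.update_counts = [0] * num_tiers   # updates pushed per tier
        self.global_w = init_w.copy()
        self.lock = threading.Lock()

    def run_tier(self, tier_id, tier_clients, local_train, rounds):
        for _ in range(rounds):
            # Intra-tier, synchronous: wait only for this tier's clients,
            # so fast tiers never block on slow ones.
            local_ws = [local_train(self.global_w.copy(), c)
                        for c in tier_clients]
            tier_w = sum(local_ws) / len(local_ws)
            # Cross-tier, asynchronous: fold the tier model into the
            # global model as soon as it is ready.
            with self.lock:
                self.tier_models[tier_id] = tier_w
                self.update_counts[tier_id] += 1
                ws = cross_tier_weights(self.update_counts)
                self.global_w = sum(w * m for w, m
                                    in zip(ws, self.tier_models))

# Each tier runs in its own thread and proceeds at its own pace, e.g.:
#   threading.Thread(target=server.run_tier,
#                    args=(i, tiers[i], local_train, 50)).start()
```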
If the asynchronously updated tier models were aggregated into the global model with equal weight, training could become biased toward the faster tiers, since faster tiers update the global model more frequently than slower ones. To solve this problem, this paper proposes a new weighted aggregation heuristic that assigns higher weights to slower tiers.
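One plausible reading of this heuristic is sketched below: give tier i the normalized update count of its "mirror" tier in the update-frequency ranking, so the most-frequently-updating tier receives the smallest share. The paper's exact weighting formula may differ in detail; this is an illustration of the compensating idea.

```python
def cross_tier_weights(update_counts):
    """Straggler-aware weights: tiers that have pushed fewer updates get
    larger aggregation weights, offsetting their lower update frequency."""
    n = len(update_counts)
    total = sum(update_counts) or 1       # guard against div-by-zero
    order = sorted(range(n), key=lambda i: update_counts[i])
    weights = [0.0] * n
    for rank, tier in enumerate(order):
        mirror = order[n - 1 - rank]      # most-updated <-> least-updated
        weights[tier] = update_counts[mirror] / total
    return weights

# Example: counts [5, 4, 1] -> weights [0.1, 0.4, 0.5]; the slowest tier
# (1 update) now carries the largest share of the aggregate.
```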

In addition, to minimize the communication cost introduced by asynchronous training, FedAT uses an encoded-polyline algorithm to compress the model data transferred between the clients and the server. In short, FedAT makes four parts work together: the tiering mechanism, asynchronous cross-tier model updates, the weighted aggregation method, and the encoded-polyline compression algorithm, so as to maximize convergence speed and prediction performance while minimizing communication cost.
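For intuition, here is a minimal sketch of polyline-style compression on a flat weight vector: quantize, delta-encode, then emit 5-bit variable-length ASCII codes. This follows Google's well-known polyline encoding; the `precision` value is an illustrative choice, and the paper's adaptation to model updates has more machinery.

```python
def polyline_encode(values, precision=100.0):
    """Compress a float sequence into ASCII: quantize each value, take the
    delta from its predecessor, and write a 5-bit variable-length code, so
    runs of nearby values become very short strings."""
    out, prev = [], 0
    for v in values:
        q = int(round(v * precision))
        delta, prev = q - prev, q
        # Zig-zag: shift left, invert negatives, so small |delta| is short.
        d = ~(delta << 1) if delta < 0 else delta << 1
        while d >= 0x20:                  # 5 payload bits per character
            out.append(chr((0x20 | (d & 0x1F)) + 63))
            d >>= 5
        out.append(chr(d + 63))
    return "".join(out)

def polyline_decode(encoded, precision=100.0):
    """Inverse of polyline_encode."""
    values, prev, i = [], 0, 0
    while i < len(encoded):
        d, shift = 0, 0
        while True:                       # reassemble 5-bit chunks
            b = ord(encoded[i]) - 63
            i += 1
            d |= (b & 0x1F) << shift
            shift += 5
            if b < 0x20:
                break
        prev += ~(d >> 1) if d & 1 else d >> 1
        values.append(prev / precision)
    return values

# Round-trip check on a few illustrative weight values:
assert polyline_decode(polyline_encode([0.12, 0.13, 0.11])) == [0.12, 0.13, 0.11]
```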


Experimental results

[Table: best test accuracy and variance for FedAT and the baseline FL methods on all datasets]

The table above reports the best test accuracy (and its variance) on all datasets, measured after each training process converges within the global iteration budget. On the 2-class CIFAR-10 dataset, FedAT's accuracy is 7.44% higher than the best baseline FL method (FedAvg) and 18.78% higher than the worst baseline (FedAsync).

Using the same tiering scheme as TiFL, FedAT achieves higher accuracy than TiFL in all experiments. This is because (1) the local constraint forces local models to stay close to the server model, and (2) the new weighted aggregation heuristic draws in updates from straggling clients in slower tiers more effectively, yielding better prediction performance. FedAvg comes closest to TiFL's prediction performance, because both follow the same synchronous update strategy. FedAsync, on the other hand, performs the worst, because it aggregates weights from only one client per round and has no effective way to deal with stragglers.

The performance difference can also be clearly seen in the convergence timeline plot shown in the figure: FedAT converges to the optimal solution faster than the other three methods.


3. Article summary

This paper proposes a new synchronous-asynchronous training model that maximizes prediction performance while minimizing communication cost. FedAT integrates the following modules: (1) a tiering strategy for dealing with stragglers; (2) an asynchronous scheme for updating the global model across tiers to improve prediction performance; (3) a new weighted aggregation heuristic that the FL server uses to balance model parameters from heterogeneous, distributed tiers; and (4) a polyline-encoding-based compression algorithm to minimize communication cost.

The paper shows that FedAT has provable convergence guarantees and verifies the theoretical analysis experimentally. Experiments show that, compared with state-of-the-art FL methods, FedAT achieves the highest prediction performance, the fastest convergence, and high communication efficiency.

Source: blog.csdn.net/cold_code486/article/details/134171392