Paper notes: Client-Edge-Cloud Hierarchical Federated Learning

This paper proposes a new client-edge-cloud structure and the HierFAVG algorithm to address the large communication overhead of the cloud-based FedAvg algorithm and the limited client coverage of edge-based training.

FedAvg is cloud-based federated learning: each client trains on its local data and uploads the trained model parameters to a cloud server for aggregation. The number of participating clients can reach the millions, but the overhead involved is also huge; communication with the cloud server is slow and unpredictable, which makes the training process inefficient.

In edge-based federated learning, the parameter server is placed at the nearest edge, so the computation latency is comparable to the communication latency with the edge parameter server. The communication overhead is small, but the number of clients each edge server can reach is limited, so some loss of training performance is inevitable.

Combining the above two structures, the paper proposes a client-edge-cloud hierarchical federated learning system.

HierFAVG algorithm

Each client trains locally; after every k1 rounds of local updates, the client models are aggregated at their edge server by averaging the parameters.

After every k2 edge aggregations, the edge models are uploaded to the cloud server and aggregated there (at that point each client has performed k1 * k2 rounds of local training).
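A minimal sketch of this two-level loop, assuming model parameters are flat NumPy vectors, equally weighted clients, and a hypothetical helper local_update(params, data) that runs the k1 local training rounds (these names are illustrative, not the paper's code):

```python
import numpy as np

def hierfavg(cloud_params, clients_per_edge, k2, num_cloud_rounds, local_update):
    """Sketch of HierFAVG: edge aggregation after every k1 local rounds
    (performed inside local_update), cloud aggregation after every k2
    edge aggregations. clients_per_edge is a list (one entry per edge
    server) of lists of client datasets."""
    for _ in range(num_cloud_rounds):
        edge_models = []
        for edge_clients in clients_per_edge:
            edge_params = cloud_params.copy()
            for _ in range(k2):  # k2 edge aggregations between cloud rounds
                client_models = [local_update(edge_params.copy(), data)
                                 for data in edge_clients]
                edge_params = np.mean(client_models, axis=0)  # edge-level averaging
            edge_models.append(edge_params)
        cloud_params = np.mean(edge_models, axis=0)  # cloud-level averaging
    return cloud_params
```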

The paper proves the convergence of the algorithm

Experiments

The experiments consider image classification on the MNIST and CIFAR-10 datasets.

MNIST uses a convolutional neural network (CNN)

Two non-IID cases:

1) Edge-IID: each client is assigned samples of a single class, and the 10 clients under each edge are assigned different classes, so the data distribution across edges is IID.

2) Edge-NIID: each client is assigned samples of a single class, but the 10 clients under each edge cover only 5 class labels in total, so the data distribution across edges is non-IID.
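A rough sketch of how class labels could be assigned under the two schemes, assuming MNIST's 10 classes, 10 clients per edge, and one class per client; the exact class-to-edge mapping is my own illustration, not the authors' released code:

```python
def assign_classes(num_edges, clients_per_edge=10, num_classes=10, edge_iid=True):
    """Return, for each edge server, the class label held by each of its clients."""
    assignment = []
    for e in range(num_edges):
        if edge_iid:
            # every edge covers all 10 classes -> IID across edges
            classes = list(range(num_classes))
        else:
            # every edge covers only 5 classes (each class held by 2 clients)
            start = (e * 5) % num_classes
            classes = [start + (i % 5) for i in range(clients_per_edge)]
        assignment.append(classes)
    return assignment

# Edge-IID: both edges see classes 0..9; Edge-NIID: edge 0 sees 0..4, edge 1 sees 5..9.
print(assign_classes(2, edge_iid=True))
print(assign_classes(2, edge_iid=False))
```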

CIFAR-10

Uses a CNN with 3 convolutional blocks
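A hedged PyTorch sketch of what a 3-convolutional-block CNN for 32x32 CIFAR-10 images might look like; the channel widths and layer details here are illustrative assumptions, not the paper's exact architecture:

```python
import torch.nn as nn

class ThreeBlockCNN(nn.Module):
    """Small CNN with 3 conv blocks for 32x32x3 CIFAR-10 images (illustrative widths)."""
    def __init__(self, num_classes=10):
        super().__init__()
        def block(in_ch, out_ch):
            # conv -> ReLU -> 2x2 max-pool halves the spatial size
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )
        self.features = nn.Sequential(block(3, 32), block(32, 64), block(64, 128))
        self.classifier = nn.Linear(128 * 4 * 4, num_classes)  # 32 -> 16 -> 8 -> 4

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))
```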

Conclusions:

1) For both non-IID data distributions, reducing k1 lowers the number of training iterations needed to reach the target accuracy, which means less local computation is required on the devices.

 

2) When the data across edges is IID and the communication frequency with the edge servers is fixed, reducing the communication frequency with the cloud server does not slow down the training process.

For k1 = 60 (Figure b), increasing k2 slows down the training process; in the edge-IID scenario, however, the costly communication with the cloud server can be further reduced with little performance loss.
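To make the communication trade-off concrete, a small back-of-the-envelope helper (my own illustration, not from the paper) counting edge and cloud communication rounds for a fixed amount of local computation:

```python
def communication_counts(total_local_iters, k1, k2):
    """Edge and cloud communication rounds implied by k1 (local iterations per
    edge aggregation) and k2 (edge aggregations per cloud aggregation)."""
    edge_rounds = total_local_iters // k1
    cloud_rounds = total_local_iters // (k1 * k2)
    return edge_rounds, cloud_rounds

# With k1 = 60: doubling k2 halves the number of expensive cloud rounds
# while keeping the edge communication frequency fixed.
print(communication_counts(6000, k1=60, k2=1))  # (100, 100)
print(communication_counts(6000, k1=60, k2=2))  # (100, 50)
```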

Origin blog.csdn.net/GJ_007/article/details/104943759