The idea behind decentralized federated learning

Decentralized federated learning is a distributed machine learning approach that protects user privacy. Compared with centralized federated learning, it places greater emphasis on protecting user data privacy and is also more scalable and robust.

In decentralized federated learning, each device trains a model on its local data and sends model updates to neighboring devices. Those neighbors receive the updates and continue training on their own local data. This process is repeated over many rounds until the models on all devices converge.
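As a rough illustration of this device-to-device exchange, the sketch below defines a hypothetical `Device` class in Python/NumPy. The class name, the squared-error loss, and the averaging rule in `merge()` are assumptions made for illustration, not part of any specific framework.

```python
import numpy as np

# Illustrative sketch only: a hypothetical device in a decentralized
# federated learning network.
class Device:
    def __init__(self, data, labels, dim, neighbors, lr=0.1):
        self.X, self.y = data, labels      # local data never leaves the device
        self.theta = np.zeros(dim)         # local model parameters
        self.neighbors = neighbors         # list of neighboring Device objects
        self.lr = lr
        self.inbox = []                    # updates received from neighbors

    def local_step(self):
        # One step of gradient descent on the local squared-error loss.
        grad = self.X.T @ (self.X @ self.theta - self.y) / len(self.y)
        self.theta -= self.lr * grad

    def gossip(self):
        # Send the current model to every neighbor.
        for nb in self.neighbors:
            nb.inbox.append(self.theta.copy())

    def merge(self):
        # Average the own model with whatever arrived from neighbors.
        if self.inbox:
            self.theta = np.mean([self.theta, *self.inbox], axis=0)
            self.inbox.clear()
```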

Compared with other federated learning methods, decentralized federated learning does not require a central server to coordinate communication between devices, which greatly reduces communication overhead and the risk of a single point of failure.

Algorithm formulation
Assume there are $n$ devices $D_1, D_2, \dots, D_n$ jointly participating in a federated learning task. The goal is to learn a global model $\theta$ such that each device $i$ can use its local data for model inference.

In decentralized federated learning, each device first initializes a local model $\theta_i$. Each iteration then consists of the following steps (a toy simulation of these steps is sketched after the list):

1. Choose a random subset $S_t \subseteq \{1, 2, \dots, n\}$ as the communication group for this round.
2. Each device $i \in S_t$ computes the gradient $\nabla_{\theta_i} J(\theta_i)$ on its local data, where $J(\theta_i)$ is the loss function of the local model $\theta_i$.
3. Each device $i \in S_t$ sends $\nabla_{\theta_i} J(\theta_i)$ to all of its neighboring devices $j \in N_i$, where $N_i$ denotes the set of neighbors of device $i$.
4. Each device $i$ updates its local model $\theta_i$ via $\theta_i^{t+1} = \theta_i^t - \eta \cdot \frac{1}{|S_t|}\sum_{j \in S_t}\nabla_{\theta_j}J(\theta_j)$, where $\eta$ is the learning rate.
5. Once the models of all devices $\theta_1^{t+1}, \theta_2^{t+1}, \dots, \theta_n^{t+1}$ have been updated, the next round of iteration begins.
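For concreteness, here is a toy simulation of the iteration above on a synthetic linear regression problem in Python/NumPy. The dataset, the hyperparameters ($n$, the subset size, $\eta$), and the squared-error loss are assumptions made for illustration. The update follows the formula literally, averaging gradients over the whole group $S_t$; in a real decentralized deployment each device would only see the gradients received from its neighbors $N_i$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy simulation of the iteration above on a synthetic linear regression task.
# Problem size, learning rate, and number of rounds are illustrative choices.
n, dim, rounds, eta = 8, 5, 200, 0.1
true_theta = rng.normal(size=dim)

# Each device holds a small private dataset drawn from the same linear model.
data = []
for _ in range(n):
    X = rng.normal(size=(20, dim))
    y = X @ true_theta + 0.01 * rng.normal(size=20)
    data.append((X, y))

theta = [np.zeros(dim) for _ in range(n)]   # local models theta_i

def grad(i, th):
    # Gradient of the local squared-error loss J(theta_i) on device i's data.
    X, y = data[i]
    return X.T @ (X @ th - y) / len(y)

for t in range(rounds):
    # Step 1: pick a random communication group S_t.
    S_t = rng.choice(n, size=n // 2, replace=False)
    # Step 2: each device in S_t computes its local gradient.
    grads = {int(j): grad(int(j), theta[int(j)]) for j in S_t}
    # Step 3 (exchanging gradients with neighbors) is implicit here: the
    # simulation reads the gradients from `grads` instead of passing messages.
    # Step 4: every device applies the gradient averaged over S_t.
    avg_grad = np.mean(list(grads.values()), axis=0)
    for i in range(n):
        theta[i] = theta[i] - eta * avg_grad

# After training, every local model should be close to the true parameters.
print("parameter error on device 0:", np.linalg.norm(theta[0] - true_theta))
```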
In decentralized federated learning, each device communicates only with its neighbors, so the communication overhead is small. Moreover, randomly selecting a device group for communication in each iteration adds randomness to the learning process and can improve its stability. However, because each device's model update is based only on information from its neighboring devices, the models may diverge or fail to converge.

The main advantages of decentralized federated learning are better protection of user privacy and, in general, greater scalability and robustness than centralized federated learning.

Here are some references about decentralized federated learning:

"Communication-Efficient Learning of Deep Networks from Decentralized Data"

"Federated Learning: Strategies for Improving Communication Efficiency"

"Towards Federated Learning at Scale: System Design"

"Decentralized Federated Learning: A Segmented Gossip Approach"

"A Comprehensive Survey on Federated Learning"

"Federated Learning with Non-IID Data"

The papers above study decentralized federated learning in greater detail and can serve as references for a deeper understanding of the field.

