Introduction to Federated Learning Algorithm - FedAvg Detailed Case - Python Code Acquisition

1. Framework of federated learning system

Centralized Federated Learning Framework

Figure 1 Centralized federated learning framework

  The gradient information of each client is collected by the server, and then distributed to each client after aggregation and calculation, so as to realize the joint training model of multiple clients, and "the original data does not go out of the island", thus protecting the privacy of client data.
Decentralized Federated Learning Framework

Figure 2 Decentralized federated learning framework

  Assuming that the central party is curious, then the client broadcasts gradient information to other clients through a certain rule, and the client that receives the gradient information aggregates parameters and trains, and broadcasts new gradient information.

2. Federal averaging algorithm (FedAvg)

insert image description here

Figure 3 Federal averaging algorithm framework

Input: Initial value of global model parameters ω 0 \omega^0oh0 , the number of participantsnnn , batch sample sizeBBB , number of training roundsEEE , the proportion of participantsCCC , local model learning rateη \etaη , the number of samples mk m_kof each participantmk

Output: Global model parameters ω t + 1 \omega_{t+1} of the last iterationoht+1

1. The central server initializes the global model parameters ω 0 \omega^0oh0 , and transmitted to all participants.

2.对 t = 0 , 1 , 2 , ⋯ t=0,1,2, \cdots t=0,1,2, , iterate the following steps until the global model parameterω t + 1 \omega_{t+1}oht+1convergence.

  (1) The central server according to the proportion of participants C ∈ ( 0 , 1 ] C \in(0,1]C(0,1 ] , calculate the participation in thettthThe number of participants in t iterations:

M ← max ⁡ ( C × n , 1 ) M \leftarrow \max (C \times n, 1) Mmax(C×n,1)

  (2) The central server randomly selects MMM participants constitute the participant setS t S_tSt

  (3) 对 ∀ k ∈ S t \forall k \in S_t kSt, update the local model parameters by the following steps:

     [1] Use the received model parameters ω t \omega_tohtPerform model initialization ω t + 1 k ⟵ ω t \omega_{t+1}^k \longleftarrow \omega_toht+1koht

     [2] The data index set P k P_kPkAccording to the batch sample size BBB is divided into several batches, and the set composed of these batches is recorded asB k B_kBk. For each training j = 1 , ⋯ , E j=1, \cdots, Ej=1,,E , use∀ b ∈ B k \forall b \in B_kbBk, to update the local model parameters:

ω t + 1 k ⟵ ω t + 1 k − η ∇ F k ( ω ; b ) \omega_{t+1}^k \longleftarrow \omega_{t+1}^k-\eta \nabla F_k(\omega ;b) ;oht+1koht+1kηFk( oh ?b)

The updated local model parameters ω l + 1 k \omega_{l+1}^kohl+1ktransmitted to the central server.

  (4) The central server aggregates all parameters and transmits back to all participants
ω t + 1 = ∑ k = 1 M mkm ω t + 1 k \omega_{t+1}=\sum_{k=1}^M \frac {m_k}{m} \omega_{t+1}^koht+1=k=1Mmmkoht+1k
     m k m \frac{m_k}{m} mmkIt is the ratio of client k to the total samples of M clients participating in the training in the t+1 round of aggregation.

3. Federated Gradient Descent Algorithm (FedSGD)

When the number of training rounds E = 1 E=1E=1 , and the batch sample sizeBBWhen B is the total number of samples of the corresponding participants, the FedAvg algorithm degenerates into the FedSGD algorithm.

Using all samples respectively, for parameter ω l + 1 k \omega_{l+1}^kohl+1kPerform a gradient descent and calculate the parameter ω t + 1 k \omega_{t+1}^koht+1kForm:
gk = ∇ F k ( ω t + 1 k ) g_k=\nabla F_k\left(\omega_{t+1}^k\right)gk=Fk( oht+1k)

Or calculate the parameter update value
ω t + 1 k ⟵ ω t + 1 k − η gk \omega_{t+1}^k \longleftarrow \omega_{t+1}^k-\eta g_koht+1koht+1kthe gk

Among them η \etaη is the learning rate.

4. Differential privacy with federated gradient descent algorithm (DP-FedSGD)

insert image description here

Figure 4 DP-FedSGD algorithm framework

1. The clipping method of the parameter update amount ClipFn
has two clipping methods, namely horizontal clipping and layered clipping.

(1) Horizontal cropping. Record the parameter update amount vector as Δ \DeltaΔ , set the upper bound of its 2-norm asS ∈ RS \in RSR , namely:

π ( Δ , S ) =  def  Δ ⋅ min ⁡ ( 1 , S ∥ Δ ∥ ) \pi(\boldsymbol{\Delta}, S) \stackrel{\text { def }}{=} \boldsymbol{\Delta} \cdot \min \left(1, \frac{S}{\|\Delta\|}\right) \quad π ( D ,S)= def Dmin(1,∥Δ∥S)

(2) Layered cropping. For the neural network model, it is assumed that the network has a total of ccFor layer c , the parameter update vectors of each layer areΔ ( 1 ) , ⋯ , Δ ( c ) \Delta(1), \cdots, \Delta(c)D ( 1 ) ,,Δ ( c ) , the corresponding upper bounds of the 2-norm areS 1 , ⋯ S_1, \cdotsS1,, S c S_c Sc, through the method of horizontal clipping, the vectors of each layer are clipped separately:

Δ ′ ( j ) = π ( Δ ( j ) , S j ) \boldsymbol{\Delta}^{\prime}(j)=\pi\left(\boldsymbol{\Delta}(j), S_j\right) D(j)=Pi( D ( j ) ,Sj)

The overall parameter update clipping upper bound is defined as
S = ∑ j = 1 c S j 2 S=\sqrt{\sum_{j=1}^c S_j^2 }S=j=1cSj2

2. The central server aggregates the update amount of the local model parameters through the following steps:

(1) Calculate the aggregation result

About Weighted Aggregation

f ( C ) = ∑ k ∈ C d k Δ k ∑ k ∈ C d k f(C)=\frac{\sum_{k \in C} d_k \Delta^k}{\sum_{k \in C} d_k} f(C)=kCdkkCdkDk

(where Δ k \Delta^kDk is the partykkThe parameter update amount of k ,CCC is the estimator of the bounded sensitivity (Bounded-sensitivity) of the set of participants participating in the iteration in one round, which is divided into the following two types:

    [1]
f ~ f ( C ) = ∑ k ∈ C dk Δ kq D \tilde{f}_{\mathrm{f}}(C)=\frac{\sum_{k \in C} d_k \Delta^ k}{q D}f~f(C)=qDkCdkDk

其中 d k = min ⁡ ( m k m ^ , 1 ) d_k=\min \left(\frac{m_k}{\hat{m}}, 1\right) dk=min(m^mk,1 ) , is the weight of each participant;D = ∑ k = 1 ndk D=\sum_{k=1}^n d_kD=k=1ndk, q q q is the selection probability of the participants in each round of communication.

    [2]
f ~ c ( C ) = ∑ k ∈ C d k Δ k max ⁡ ( q D min  , ∑ k ∈ C d k ) \tilde{f}_{\mathrm{c}}(C)=\frac{\sum_{k \in C} d_k \Delta^k}{\max \left(q D_{\text {min }}, \sum_{k \in C} d_k\right)} f~c(C)=max(qDmin ,kCdk)kCdkDk

D min D_{\text {min }}Dmin are pre-set hyperparameters about the weights and .

(2)令 S ← \leftarrow The upper bound of clipping in ClipFn, according to the selected bounded sensitivity estimator and noise scale z, set the variance of Gaussian noise:

σ ← { z S q D  for  f ~ f  or  2 z S q D min ⁡  for  f ~ c } \sigma \leftarrow\left\{\frac{z S}{q D} \text { for } \tilde{f}_{\mathrm{f}} \text { or } \frac{2 z S}{q D_{\min }} \text { for } \tilde{f}_{\mathrm{c}}\right\} p{ qDzS for f~f or qDmin2 z S for f~c}

(3) The parameters of the aggregated global model are
ω t + 1 ← ω t + Δ t + 1 + N ( 0 , I σ 2 ) \omega_{t+1} \leftarrow \omega_t+\Delta_{t+1}+N \left(0, I \sigma^2\right)oht+1oht+Dt+1+N(0,I p2)
其中, N ( 0 , I σ 2 ) N\left(0, I \sigma^2\right) N(0,I p2 )The mean is 0 and the variance isσ 2 \sigma^2pGaussian distribution of 2 ; III is the unit square matrix, and the number of rows and columns are the number of parameters.

(4) according to zzz andM \mathcal{M}M calculates the privacy loss value and outputs it.

5. Differential Privacy Federated Averaging Algorithm (DP-FedAVG)

  In DP-FedSGD, the selected participants use the global model parameters to initialize the local model, perform multiple rounds of gradient descent through the batch gradient descent method, and calculate the gradient update amount. In DP-FedAVG, a batch of data is used to perform a gradient descent to calculate the gradient update amount.

6. FedAVG case with code

1) Case background

  Collect daily and hourly electricity data in 10 cities in 2012. Use the power load value of the previous 24 moments and the 4 relevant meteorological data at that moment to predict the electric power load value at that moment.

  构造四层的深度网络:
z 1 = I w 1 , h 1 = σ ( z 1 ) z 2 = h 1 w 2 , h 2 = σ ( z 2 ) z 3 = h 2 w 3 , h 3 = σ ( z 3 ) z 4 = h 3 w 4 , O = σ ( z 4 )  loss  = 1 2 ( O − y ) 2 \begin{gathered} z_1=I w_1, h_1=\sigma\left(z_1\right) \\ z_2=h_1 w_2, h_2=\sigma\left(z_2\right) \\ z_3=h_2 w_3, h_3=\sigma\left(z_3\right) \\ z_4=h_3 w_4, O=\sigma\left(z_4\right) \\ \text { loss }=\frac{1}{2}(O-y)^2 \end{gathered} z1=Iw1,h1=p(z1)z2=h1w2,h2=p(z2)z3=h2w3,h3=p(z3)z4=h3w4,O=p(z4) loss =21(Oy)2

   σ \sigma σ is the sigmoid activation function.

2) Parameter setting

Table 1 FedAVG parameters
parameter value
Aggregation rounds 5
Local training times 20
total number of clients 10
learning rate 0.08
local batch sample size 50
optimizer adam

3) Result display

  Set the number of clients randomly selected to participate in training in each round as 2, 5, 8, and 10.
insert image description here

Figure 5 FedAVG mae

insert image description here

Figure 6 FedAVG rmse

Add random noise to the gradient

insert image description here

图7 FedAVG+noise mae

insert image description here

图8 FedAVG+noise rmse

4) Detailed code explanation

Data structure, in the local folder, there are 10 csv files, and each of these 10 files represents a client.
insert image description here
In each csv file, there are 7 indicators and 6577 samples, the first column of which indicates the server ID.
insert image description here
The first step is to load the data. First of all, it is necessary to divide the training set and test set of each client. In this paper, the data structure of each client is set to be consistent with the number of samples (it can also be inconsistent, and the sample alignment method can be used).

# -*- coding: utf-8 -*-
"""
@File :bp_nn.py
"""

import copy
import sys

import numpy as np
import pandas as pd
from torch import nn
from tqdm import tqdm

sys.path.append('../')
from sklearn.metrics import mean_absolute_error, mean_squared_error
from itertools import chain
from models import BP  ##自定义
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"   #避免jupyter崩溃

clients_wind = ['Task1_W_Zone' + str(i) for i in range(1, 11)]
from args import args_parser  ##自定义参数


def load_data(file_name): #读取某一个文件---横向联邦学习
    df = pd.read_csv(os.path.dirname(os.getcwd()) + '/data/Wind_new/Task 1/Task1_W_Zone1_10/' + file_name + '.csv', encoding='gbk')
    columns = df.columns
    df.fillna(df.mean(), inplace=True)
    for i in range(3, 7): # 3,4,5,6
        MAX = np.max(df[columns[i]])
        MIN = np.min(df[columns[i]])
        df[columns[i]] = (df[columns[i]] - MIN) / (MAX - MIN) #将3,4,5,6列的值,标准化

    return df #0-6列,后4列已经标准化


def nn_seq_wind(file_name, B): #B 实则为本地批量大小
    print('data processing...')
    dataset = load_data(file_name)
    # split
    train = dataset[:int(len(dataset) * 0.6)] #前60%为训练集
    val = dataset[int(len(dataset) * 0.6):int(len(dataset) * 0.8)] #中间20%为验证集
    test = dataset[int(len(dataset) * 0.8):len(dataset)] #最后20%为测试集

    def process(data): #将特征与标签分开
        columns = data.columns
        wind = data[columns[2]]
        wind = wind.tolist()  #转换成列表 https://vimsky.com/examples/usage/python-pandas-series-tolist.html
        data = data.values.tolist()
        X, Y = [], []
        for i in range(len(data) - 30):
            train_seq = []
            train_label = []
            for j in range(i, i + 24):  #24小时
                train_seq.append(wind[j])

            for c in range(3, 7):
                train_seq.append(data[i + 24][c])
            train_label.append(wind[i + 24])
            X.append(train_seq)
            Y.append(train_label)

        X, Y = np.array(X), np.array(Y)

        length = int(len(X) / B) * B
        X, Y = X[:length], Y[:length]

        return X, Y

    train_x, train_y = process(train)
    val_x, val_y = process(val)
    test_x, test_y = process(test)

    return [train_x, train_y], [val_x, val_y], [test_x, test_y]


def get_val_loss(args, model, val_x, val_y): #验证集,计算损失,model即为nn
    batch_size = args.B
    batch = int(len(val_x) / batch_size) # 计算循环次数
    val_loss = []
    for i in range(batch):
        start = i * batch_size
        end = start + batch_size
        model.forward_prop(val_x[start:end], val_y[start:end])
        model.backward_prop(val_y[start:end])
    val_loss.append(np.mean(model.loss))

    return np.mean(val_loss)


def train(args, nn):
    print('training...')
    tr, val, te = nn_seq_wind(nn.file_name, args.B)
    train_x, train_y = tr[0], tr[1]
    val_x, val_y = val[0], val[1]
    nn.len = len(train_x)  # nn.len 训练集的长度
    batch_size = args.B    # 每批次大小
    epochs = args.E      # 迭代次数
    batch = int(len(train_x) / batch_size) #每一迭代,需要训练多少次
    # training
    min_epochs = 10  
    best_model = None
    min_val_loss = 5
    for epoch in tqdm(range(epochs)):
        train_loss = []
        for i in range(batch):
            start = i * batch_size
            end = start + batch_size
            nn.forward_prop(train_x[start:end], train_y[start:end])
            nn.backward_prop(train_y[start:end])
        train_loss.append(np.mean(nn.loss))
        # validation
        val_loss = get_val_loss(args, nn, val_x, val_y)
        if epoch + 1 >= min_epochs and val_loss < min_val_loss:
            min_val_loss = val_loss
            best_model = copy.deepcopy(nn)

        print('epoch {:03d} train_loss {:.8f} val_loss {:.8f}'.format(epoch, np.mean(train_loss), val_loss))

    return best_model


def get_mape(x, y):
    """
    :param x: true value
    :param y: pred value
    :return: mape
    """
    return np.mean(np.abs((x - y) / x))


def test(args, nn):
    tr, val, te = nn_seq_wind(nn.file_name, args.B)
    test_x, test_y = te[0], te[1]
    pred = []
    batch = int(len(test_y) / args.B)
    for i in range(batch):
        start = i * args.B
        end = start + args.B
        res = nn.forward_prop(test_x[start:end], test_y[start:end])
        res = res.tolist()
        res = list(chain.from_iterable(res))  
        #chain.from_iterable()属于终止迭代器类别 https://blog.csdn.net/qq_42708830/article/details/106731144
        
        # print('res=', res)
        pred.extend(res)
    pred = np.array(pred)
    print('mae:', mean_absolute_error(test_y.flatten(), pred), 'rmse:',
          np.sqrt(mean_squared_error(test_y.flatten(), pred)))


def main():
    args = args_parser()
    for client in clients_wind:
        nn = BP(args, client)
        nn = train(args, nn)
        test(args, nn)


if __name__ == '__main__':
    main()

The second step is to build a model. Here, the calculation results are forward-propagated, and the gradients are updated by back-propagation.

# -*- coding:utf-8 -*-
"""
@File: models.py
"""

import numpy as np
from torch import nn


class BP:
    def __init__(self, args, file_name):
        self.file_name = file_name
        self.len = 0
        self.args = args
        self.input = np.zeros((args.B, args.input_dim))  # self.B samples per round  (本地批量大小=50,输入维度=28)
        
        self.w1 = 2 * np.random.random((args.input_dim, 20)) - 1  # limit to (-1, 1) (28,20)
        self.z1 = 2 * np.random.random((args.B, 20)) - 1  #np.random.random生成args.B=50行 20列的0-1浮点数;*2→(0-2),再-1,变成(-1,1)
        self.hidden_layer_1 = np.zeros((args.B, 20))     #(50,20)
        
        self.w2 = 2 * np.random.random((20, 20)) - 1     #(20,20)
        self.z2 = 2 * np.random.random((args.B, 20)) - 1  #(50,20)
        self.hidden_layer_2 = np.zeros((args.B, 20))     #(50,20)
        
        self.w3 = 2 * np.random.random((20, 20)) - 1     #(20,20)
        self.z3 = 2 * np.random.random((args.B, 20)) - 1  #(50,20)
        self.hidden_layer_3 = np.zeros((args.B, 20))     #(50,20)
        
        self.w4 = 2 * np.random.random((20, 1)) - 1     #(20,1)
        self.z4 = 2 * np.random.random((args.B, 1)) - 1  #(50,1)
        self.output_layer = np.zeros((args.B, 1))      #(50,1)
        
        self.loss = np.zeros((args.B, 1))           #(50,1)

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_deri(self, x):
        return x * (1 - x)

    def forward_prop(self, data, label):
        self.input = data
        # self.input(50,28)  self.w1(28, 20)  self.z1(50, 20)
        self.z1 = np.dot(self.input, self.w1) # np.dot 计算过程就是将向量中对应元素相乘,再相加所得。即普通的向量乘法运算。
        
        self.hidden_layer_1 = self.sigmoid(self.z1) # self.hidden_layer_1(50, 20)
        
        self.z2 = np.dot(self.hidden_layer_1, self.w2)  #self.w2(20,20) self.z2(50, 20)
        self.hidden_layer_2 = self.sigmoid(self.z2) # self.hidden_layer_2(50, 20)
        
        self.z3 = np.dot(self.hidden_layer_2, self.w3)  #self.w3(20,20) self.z3(50, 20)
        self.hidden_layer_3 = self.sigmoid(self.z3)    #(50,20)
        
        self.z4 = np.dot(self.hidden_layer_3, self.w4)  #self.w4 (20,1) self.z4(50,1)
        self.output_layer = self.sigmoid(self.z4)     #self.output_layer(50,1)
        # error

        self.loss = 1 / 2 * (label - self.output_layer) ** 2  ##(50,1)  why 1/2 ?

        return self.output_layer

    def backward_prop(self, label):
        # w4
        l_deri_out = self.output_layer - label
        l_deri_z4 = l_deri_out * self.sigmoid_deri(self.output_layer)
        l_deri_w4 = np.dot(self.hidden_layer_3.T, l_deri_z4)
        # w3
        l_deri_h3 = np.dot(l_deri_z4, self.w4.T)
        l_deri_z3 = l_deri_h3 * self.sigmoid_deri(self.hidden_layer_3)
        l_deri_w3 = np.dot(self.hidden_layer_2.T, l_deri_z3)
        # w2
        l_deri_h2 = np.dot(l_deri_z3, self.w3.T)
        l_deri_z2 = l_deri_h2 * self.sigmoid_deri(self.hidden_layer_2)
        l_deri_w2 = np.dot(self.hidden_layer_1.T, l_deri_z2)
        # w1
        l_deri_h1 = np.dot(l_deri_z2, self.w2.T)
        l_deri_z1 = l_deri_h1 * self.sigmoid_deri(self.hidden_layer_1)
        l_deri_w1 = np.dot(self.input.T, l_deri_z1)
        # update
        self.w4 -= self.args.lr * l_deri_w4  # self.args.lr 学习率=0.08  实则梯度下降
        self.w3 -= self.args.lr * l_deri_w3
        self.w2 -= self.args.lr * l_deri_w2
        self.w1 -= self.args.lr * l_deri_w1

The third step is to set parameters. Before instantiating training, all parameters can be placed in a separate file for easy parameter tuning.

# -*- coding:utf-8 -*-
"""
@File: args.py
"""

# argparse的用法见csdn的收藏夹,或者https://blog.csdn.net/qq_41762249/article/details/122244624
# --E 相当于关键词参数,如果没有--直接是E,就是位置参数
# type=int 传入参数的类型
# default=20 当没有参数传入时,默认值为20, help='***' 表示对该参数的解释为***

'''
number of rounds of training: 训练次数
number of communication rounds:通信回合数,即上传下载模型次数。
number of total clients:客户端总数
input dimension :输入维度
learning rate :学习率
sampling rate :采样率
local batch size : 本地批量大小
type of optimizer : 优化器类型
--device:有GPU就用,不然就用CPU

weight_decay :权值衰减
weight decay(权值衰减)的使用既不是为了提高你所说的收敛精确度也不是为了提高收敛速度,其最终目的是防止过拟合。在损失函数中,weight decay是放在正则项(regularization)前面的一个系数,正则项一般指示模型的复杂度,所以weight decay的作用是调节模型复杂度对损失函数的影响,若weight decay很大,则复杂的模型损失函数的值也就大。https://blog.csdn.net/xuxiatian/article/details/72771609

step size: 步长
gamma: 伽马参数
--clients: 10个客户端 Task1_W_Zone1、Task1_W_Zone2、Task1_W_Zone3...Task1_W_Zone10
'''

import argparse
import torch


def args_parser():
    parser = argparse.ArgumentParser() # 可选参数: description='描述程序内容' 通过命令行 python **.py--help 调用出

    parser.add_argument('--E', type=int, default=20, help='number of rounds of training')   
    parser.add_argument('--r', type=int, default=5, help='number of communication rounds')
    parser.add_argument('--K', type=int, default=10, help='number of total clients')
    parser.add_argument('--input_dim', type=int, default=28, help='input dimension')
    parser.add_argument('--lr', type=float, default=0.08, help='learning rate')
    parser.add_argument('--C', type=float, default=0.8, help='sampling rate')
    parser.add_argument('--B', type=int, default=50, help='local batch size')
    parser.add_argument('--optimizer', type=str, default='adam', help='type of optimizer')
    parser.add_argument('--device', default=torch.device("cuda" if torch.cuda.is_available() else "cpu"))
    parser.add_argument('--weight_decay', type=float, default=1e-4, help='weight_decay')
    parser.add_argument('--step_size', type=int, default=10, help='step size')
    parser.add_argument('--gamma', type=float, default=0.1, help='gamma')
    
    clients = ['Task1_W_Zone' + str(i) for i in range(1, 11)]
    parser.add_argument('--clients', default=clients)

    # args = parser.parse_args()
    # args,unknow = parser.parse_known_args()
    
    args = parser.parse_known_args()[0]
    return args

Step 4, model training.

import numpy as np
import random
import copy
import sys

sys.path.append('../')
from algorithms.bp_nn import train, test
from models import BP 
from args import args_parser  # 一些传入参数,见args.py

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' # 只看error


#-----------------tf用于设置随机数
import tensorflow as tf

clients_wind = ['Task1_W_Zone' + str(i) for i in range(1, 11)]

# Implementation for FedAvg by numpy. 通过numpy实现FedAvg。
class FedAvg:
    def __init__(self, args): #self 默认必须参数,有类中全局变量之效,args表示,调用FedAvg时,必须传入的参数
        self.args = args
        self.clients = args.clients
        self.nn = BP(args=args, file_name='server') # BP是models中的一个类,同样需要传入参数。file_name方便后面为每个客户端取名
        self.nns = []
        # distribution
        for i in range(self.args.K): #args.K,客户端总数; 子程序为每一个客户端构造了一个BP类
            #copy.deepcopy() 深复制的用法是将某个变量的值赋给另一个变量(此时两个变量地址不同),因为地址不同,所以变量间互不干扰
            s = copy.deepcopy(self.nn)  
            s.file_name = self.clients[i]
            self.nns.append(s)

    def server(self):
        for t in range(self.args.r): #通信回合数,即本地模型上传下载全局模型次数
            print('round', t + 1, ':') # 输出:round1、round2、round3、round4、round5
#             m = np.max([int(self.args.C * self.args.K), 1]) # 抽样率*客户端总数,即每一轮参与训练的客户端数量,至少有1个客户端参与
            m = 5
            print(m)
            # sampling
            index = random.sample(range(0, self.args.K), m) #在0-(k-1)之间共k个中抽取m个序号,注意是序号/索引
            print(len(index))
            # dispatch
            self.dispatch(index) # 下面定义了dispatch函数:抽中的m本地客户端从服务端下载4个参数
            # local updating
            self.client_update(index) # 下面定义了client_update函数:抽中的m个客户端进行本地训练
            # aggregation
            self.aggregation(index) # 下面定义了aggregation函数:抽中的m个客户端,上传本地训练结果参数

        # return global model
        return self.nn #返回最终聚合后的模型

    def aggregation(self, index):
        # update w
        s = 0 #用来计一轮抽中的m个本地客户端总的样本数
        for j in index:
            # normal
            s += self.nns[j].len
        w1 = np.zeros_like(self.nn.w1) #np.zeros_like:生成和self.nn.w1一样的零阵,下同
        w2 = np.zeros_like(self.nn.w2)
        w3 = np.zeros_like(self.nn.w3)
        w4 = np.zeros_like(self.nn.w4)
        
        #-----------------自增1018
        nois = 0.05
        for j in index: # 对上传的每一个本地模型进行权重的加权求和,权重为该客户端样本数/该轮中参与训练的总样本数
            # normal
            w1 += self.nns[j].w1 * (self.nns[j].len / s) + tf.random.normal([1],mean=0, stddev=nois).numpy()
            w2 += self.nns[j].w2 * (self.nns[j].len / s) + tf.random.normal([1],mean=0, stddev=nois).numpy()
            w3 += self.nns[j].w3 * (self.nns[j].len / s) + tf.random.normal([1],mean=0, stddev=nois).numpy()
            w4 += self.nns[j].w4 * (self.nns[j].len / s) + tf.random.normal([1],mean=0, stddev=nois).numpy()
        # update server 更新服务端参数
        self.nn.w1, self.nn.w2, self.nn.w3, self.nn.w4 = w1, w2, w3, w4

    def dispatch(self, index):
        # distribute
        for i in index:
            self.nns[i].w1, self.nns[i].w2, self.nns[i].w3, self.nns[i].w4 = self.nn.w1, self.nn.w2, self.nn.w3, self.nn.w4

    def client_update(self, index):  # update nn
        for k in index:
            self.nns[k] = train(self.args, self.nns[k])

    def global_test(self):
        model = self.nn #最终聚合后的模型
        c = clients_wind  # 10个客户端名称 Task1_W_Zone1、Task1_W_Zone2、Task1_W_Zone3...Task1_W_Zone10
        for client in c:
            print(client)
            model.file_name = client 
            test(self.args, model)

'''
L1损失函数: mae
均方根误差: rmse
https://blog.csdn.net/qq_45758854/article/details/125807544
'''

def main():
    args = args_parser()
    fed = FedAvg(args)
    fed.server()
    fed.global_test()

if __name__ == '__main__':
    main()

This is the end of the code, the following is the result of the operation.

Each calculation will output the result, such as the case of training data

training...
data processing...
  0%|          | 0/20 [00:00<?, ?it/s]epoch 000 train_loss 0.00486654 val_loss 0.00703024
 50%|█████     | 10/20 [00:00<00:00, 95.47it/s]epoch 001 train_loss 0.00476797 val_loss 0.00696587
epoch 002 train_loss 0.00470252 val_loss 0.00693571
epoch 003 train_loss 0.00466028 val_loss 0.00693080
epoch 004 train_loss 0.00463130 val_loss 0.00694067
epoch 005 train_loss 0.00460982 val_loss 0.00695734
epoch 006 train_loss 0.00459274 val_loss 0.00697600
epoch 007 train_loss 0.00457843 val_loss 0.00699407
epoch 008 train_loss 0.00456600 val_loss 0.00701034
epoch 009 train_loss 0.00455495 val_loss 0.00702437
epoch 010 train_loss 0.00454502 val_loss 0.00703613
epoch 011 train_loss 0.00453605 val_loss 0.00704578
epoch 012 train_loss 0.00452794 val_loss 0.00705359
epoch 013 train_loss 0.00452062 val_loss 0.00705982
epoch 014 train_loss 0.00451403 val_loss 0.00706475
epoch 015 train_loss 0.00450812 val_loss 0.00706860
epoch 016 train_loss 0.00450283 val_loss 0.00707159
epoch 017 train_loss 0.00449813 val_loss 0.00707389
epoch 018 train_loss 0.00449396 val_loss 0.00707564

...

epoch 019 train_loss 0.00449026 val_loss 0.00707696
Task1_W_Zone1
data processing...
mae: 0.20787641855647404 rmse: 0.2836560807251283
Task1_W_Zone2
data processing...
mae: 0.1631928286850088 rmse: 0.21530131283252676
Task1_W_Zone3
data processing...

When testing data, each client prediction result is output, for example

Task1_W_Zone10
data processing...
mae: 0.25128443075937873 rmse: 0.31826369651645525

Finally these results are collected and visualized.
Welcome to communicate in the comment area, and the blogger will reply regularly.

7. How to obtain the complete project code

Either of the following methods is acceptable:
(1) Click GitHub-numpy-FedAvg to download by yourself. (Visit: https://github.com/chenyiadam/FedAVG.git )
(2) Leave a private message and comment on your email, and I will reply regularly.

Guess you like

Origin blog.csdn.net/AdamCY888/article/details/129470576