Federated learning: principles, classification, and Python code implementation

1. Introduction

Federated learning is designed to let multiple participants or computing nodes carry out efficient machine learning together while keeping information secure during large-scale data exchange, protecting terminal data and personal privacy, and complying with legal requirements.

Advantages:
1. Data isolation: data is not leaked to the outside, which meets the needs of user privacy protection and data security;
2. Model quality is not degraded and no negative transfer occurs, so that the federated model performs better than independent models trained separately;
3. Participants have equal status, so fair cooperation can be achieved;
4. All parties can exchange information and model parameters in encrypted form while remaining independent, and all of them benefit.

A federated learning system mainly consists of two parts:
1. Encrypted sample alignment. Since the user groups of the data owners do not completely overlap, a sample alignment technique is needed to identify the common users (without exposing the users unique to each party) and without disclosing each party's data. The features of these common users can then be combined for modeling.
2. Encrypted model training. Once the common user group is identified, its data can be used to train machine learning models. To keep the data confidential during training, a third-party collaborator is used for encrypted training:
① The collaborator distributes a public key to each data owner, which is used to encrypt the data exchanged during training;
② The data owners exchange intermediate results in encrypted form in order to calculate the gradients;
③ Each data owner performs calculations based on the encrypted gradient values, the loss is calculated from the label data at the same time, and the results are sent to the third-party collaborator, which computes the total gradient from the aggregated results and decrypts it;
④ The collaborator sends the decrypted gradients back to the data owners, which update the parameters of their respective models accordingly.

Iterate the above steps until the loss function converges, thus completing the entire training process. In the process of sample alignment and model training, the data of each data owner is always kept locally, and the data interaction during training will not lead to data privacy leakage. Therefore, federated learning enables multi-party cooperative training of models.
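The sample alignment idea can be illustrated with a small sketch. It is only a toy: both parties hash their user IDs with a shared salt (the salt value and the user IDs below are made up for illustration) and compare digests instead of raw IDs. Salted hashing alone is not a secure alignment method when the ID space is small enough to enumerate; production systems rely on dedicated private set intersection protocols.

```python
import hashlib


def blind_ids(user_ids, shared_salt):
    """Map each salted SHA-256 digest back to the raw ID it came from."""
    return {
        hashlib.sha256((shared_salt + uid).encode("utf-8")).hexdigest(): uid
        for uid in user_ids
    }


def align_samples(digests_a, digests_b):
    """Return the digests that both parties hold, in a stable order."""
    return sorted(set(digests_a) & set(digests_b))


salt = "salt-agreed-out-of-band"  # hypothetical shared secret
party_a = blind_ids(["u1", "u2", "u3", "u4"], salt)
party_b = blind_ids(["u3", "u4", "u5"], salt)

# Only digests are compared; each party resolves them against its own table.
common = align_samples(party_a.keys(), party_b.keys())
common_ids_a = sorted(party_a[d] for d in common)
common_ids_b = sorted(party_b[d] for d in common)
print(common_ids_a)
```

Neither party learns the other's non-overlapping IDs directly from the digests, which mirrors the requirement that unique users are not exposed.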

2. Classification

2.1 Horizontal federated learning

Horizontal federated learning suits the case where the participants' data features overlap heavily but their sample IDs overlap little. It is also called feature-aligned federated learning, because the data features of the participants are aligned. Rows of samples that share the same features but belong to different participants are combined for federated learning. In other words, the training data is partitioned horizontally across participants, hence the name. Horizontal federation increases the total number of training samples.

Horizontal federated learning is organized around the feature dimension of the data: it selects the portion of each participant's data that has the same features but different users and trains on it jointly. The union of samples across participants enlarges the training sample space, which improves the accuracy and generalization ability of the model.
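As a rough illustration of a horizontal partition, the sketch below uses two hypothetical banks that record the same two features for different users. The party names, user IDs, and values are invented, and the union is computed in one place only to show the effect on sample count; in federated learning the rows never leave their owner.

```python
FEATURES = ("age", "income")  # hypothetical schema shared by all participants

# Each participant holds different users (rows) under the same schema.
bank_a = {"u1": (34, 5200.0), "u2": (51, 7300.0)}
bank_b = {"u7": (28, 4100.0), "u8": (45, 6900.0), "u9": (39, 5800.0)}


def sample_space(*parties):
    """Union of user IDs across parties (conceptual view only)."""
    ids = set()
    for party in parties:
        ids |= party.keys()
    return ids


all_ids = sample_space(bank_a, bank_b)
feature_dim = len(FEATURES)
print(len(all_ids), feature_dim)
```

The number of samples grows from 2 and 3 to 5, while the feature dimension stays at 2.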

Steps:
1. Each participant computes the training gradient locally, protects the gradient update using encryption, differential privacy, or secret sharing, and sends the protected result to the server;
2. The server aggregates the gradients of all users to update the model parameters, without learning any information about any individual participant;
3. The server sends the aggregated result back to each participant;
4. Each participant updates its own model using the decrypted gradient.
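One way the server can aggregate updates without seeing any individual one is pairwise masking, a secret-sharing style technique. The sketch below is a simplified illustration under several assumptions: gradients are already quantized to integers, the masks are generated in one place from a seed (real protocols derive them through key agreement between clients), and no client drops out. All function names are hypothetical.

```python
import random

MODULUS = 2 ** 32  # all arithmetic is done modulo this value


def pairwise_masks(num_clients, length, seed=0):
    """masks[(i, j)] is the random mask shared by clients i and j (i < j)."""
    rng = random.Random(seed)
    return {
        (i, j): [rng.randrange(MODULUS) for _ in range(length)]
        for i in range(num_clients)
        for j in range(i + 1, num_clients)
    }


def mask_update(client, update, masks, num_clients):
    """Add masks shared with higher-indexed peers, subtract those shared with lower ones."""
    masked = list(update)
    for other in range(num_clients):
        if other == client:
            continue
        pair = (min(client, other), max(client, other))
        sign = 1 if client < other else -1
        masked = [(m + sign * r) % MODULUS for m, r in zip(masked, masks[pair])]
    return masked


def aggregate(masked_updates):
    """Server-side sum; every mask appears once with + and once with -, so it cancels."""
    return [sum(col) % MODULUS for col in zip(*masked_updates)]


updates = [[3, 10, 7], [5, 1, 2], [4, 4, 9]]  # integer-quantized gradients
masks = pairwise_masks(len(updates), length=3)
masked = [mask_update(i, u, masks, len(updates)) for i, u in enumerate(updates)]
total = aggregate(masked)
print(total)
```

The server only ever sees the masked vectors, yet the sum it computes equals the sum of the true updates.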

2.2 Vertical federated learning

Vertical federated learning is also called sample-aligned federated learning, because the training samples of the participants are aligned. Different features of the samples that the participants have in common are combined for federated learning. In other words, the training data is partitioned vertically across participants, hence the name. Vertical federated learning must first perform sample alignment, that is, find the samples that all participants share; only the different features of these common samples are combined for training. Vertical federation increases the feature dimension of the training samples.

Vertical federated learning is organized around the common users: it selects the portion of each participant's data that covers the same users but different features and trains on it jointly. Joint training therefore starts by aligning the participants' samples to obtain the overlapping users, and then trains on the selected data.
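A vertical partition can be sketched in the same spirit. Here two hypothetical parties hold different feature columns for partly overlapping users, and only party B holds labels. The IDs and values are invented, and the feature slices are concatenated in one place purely to show the shape of the joint training matrix; in a real deployment each slice stays with its owner.

```python
# Party A holds two features per user; party B holds one feature and the label.
party_a = {"u1": [0.2, 1.5], "u2": [0.9, 0.3], "u3": [0.4, 0.8]}
party_b = {"u2": ([12.0], 1), "u3": ([7.5], 0), "u4": ([3.1], 1)}

# Step 1: sample alignment keeps only the users both parties know.
common_ids = sorted(party_a.keys() & party_b.keys())

# Step 2: each party contributes its own columns for the aligned users.
joint_rows = [party_a[uid] + party_b[uid][0] for uid in common_ids]
labels = [party_b[uid][1] for uid in common_ids]

print(common_ids, joint_rows, labels)
```

The number of usable samples shrinks to the overlap (2 users), while the feature dimension grows from 2 and 1 to 3.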

Steps:
Step 1: Encrypted sample alignment by third party C. This is done at the system level, so non-overlapping users are not exposed at the enterprise level.
Step 2: Encrypted model training on the aligned samples:
1. Collaborator C creates an encryption key pair and sends the public key to A and B;
2. A and B each compute the intermediate results for their own features and exchange them in encrypted form in order to obtain their respective gradients and losses;
3. A and B each compute their encrypted gradients, add a mask, and send them to C; B also computes the encrypted loss and sends it to C;
4. C decrypts the gradients and the loss and sends them back to A and B, which remove the masks and update their models.

2.3 Federated transfer learning

Federated transfer learning can be considered when the participants overlap little in both features and samples, for example cooperation between banks and supermarkets in different regions. It mainly applies to scenarios based on deep neural networks.

Transfer learning is a learning process that applies a model learned in a source domain to a target domain by exploiting the similarity between data, tasks, or models. It is mainly divided into three categories: instance-based transfer, feature-based transfer, and model-based transfer.

3. Code implementation (PyTorch)

This section mainly implements the FedAvg algorithm.
[Figure: FedAvg pseudocode]
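In FedAvg, each round the server sends the current global weights to the selected clients, every client trains locally on its own data, and the server replaces the global weights with the average of the client weights, weighted by each client's number of samples. The aggregation step can be sketched as follows. This is a minimal pure-Python illustration with made-up parameter names and values, not the script used in this post, which works on PyTorch state dicts and gives every client equal weight.

```python
def fed_avg(client_weights, client_sizes):
    """Average client parameter dicts, weighting client k by n_k / n."""
    total = sum(client_sizes)
    averaged = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical clients with 100 and 300 local samples.
clients = [
    {"fc.weight": [1.0, 2.0], "fc.bias": [0.0]},
    {"fc.weight": [3.0, 6.0], "fc.bias": [4.0]},
]
sizes = [100, 300]
global_weights = fed_avg(clients, sizes)
print(global_weights)
```

The second client holds three times as much data, so the averaged parameters sit three quarters of the way toward its values.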

Experimental results:
[Figures: experimental results]

Code files

1. First, data must be allocated to each client. In a real deployment each client holds its own data; to simulate that here, the dataset is divided among the clients manually. (Note: the paths in the scripts must be changed to match your environment.)
The dataset is MNIST, which can be downloaded via the Baidu Netdisk link.

# _*_ coding : utf-8 _*_
# @Time : 2022/11/16 10:36
# @Author : 小刘同学home
# @File : data handle
# @Project : 2022s

import numpy as np
import struct

from PIL import Image
import os

data_file = 'D:/postgraduate/MNIST_data/train-images.idx3-ubyte'  # change this path

# Total file size: 16-byte header + 60000 * 28 * 28 pixel bytes
data_file_size = 47040016
data_file_size = str(data_file_size - 16) + 'B'

with open(data_file, 'rb') as f:
    data_buf = f.read()

magic, numImages, numRows, numColumns = struct.unpack_from(
    '>IIII', data_buf, 0)
datas = struct.unpack_from(
    '>' + data_file_size, data_buf, struct.calcsize('>IIII'))
datas = np.array(datas).astype(np.uint8).reshape(
    numImages, 1, numRows, numColumns)

label_file = 'D:/postgraduate/MNIST_data/train-labels.idx1-ubyte'  # change this path

# The file is 60008 bytes: an 8-byte header followed by 60000 one-byte labels
label_file_size = 60008
label_file_size = str(label_file_size - 8) + 'B'

with open(label_file, 'rb') as f:
    label_buf = f.read()

magic, numLabels = struct.unpack_from('>II', label_buf, 0)
labels = struct.unpack_from(
    '>' + label_file_size, label_buf, struct.calcsize('>II'))
labels = np.array(labels).astype(np.int64)

datas_root = './data/image_turn/'  # change this path
if not os.path.exists(datas_root):
    os.makedirs(datas_root)  # also creates ./data if it does not exist yet

for i in range(10):
    file_name = datas_root + os.sep + str(i)
    if not os.path.exists(file_name):
        os.mkdir(file_name)

for ii in range(numLabels):
    img = Image.fromarray(datas[ii, 0, 0:28, 0:28])
    label = labels[ii]
    file_name = datas_root + os.sep + str(label) + os.sep + \
                'mnist_train_' + str(ii) + '.png'
    img.save(file_name)

# Second script: the same conversion for the MNIST test set
import numpy as np
import struct

from PIL import Image
import os

data_file = 'D:/postgraduate/MNIST_data/t10k-images.idx3-ubyte'  # change this path

# Total file size: 16-byte header + 10000 * 28 * 28 pixel bytes
data_file_size = 7840016
data_file_size = str(data_file_size - 16) + 'B'

with open(data_file, 'rb') as f:
    data_buf = f.read()

magic, numImages, numRows, numColumns = struct.unpack_from(
    '>IIII', data_buf, 0)
datas = struct.unpack_from(
    '>' + data_file_size, data_buf, struct.calcsize('>IIII'))
datas = np.array(datas).astype(np.uint8).reshape(
    numImages, 1, numRows, numColumns)

label_file = 'D:/postgraduate/MNIST_data/t10k-labels.idx1-ubyte'  # change this path

# The file is 10008 bytes: an 8-byte header followed by 10000 one-byte labels
label_file_size = 10008
label_file_size = str(label_file_size - 8) + 'B'

with open(label_file, 'rb') as f:
    label_buf = f.read()

magic, numLabels = struct.unpack_from('>II', label_buf, 0)
labels = struct.unpack_from(
    '>' + label_file_size, label_buf, struct.calcsize('>II'))
labels = np.array(labels).astype(np.int64)

datas_root = './data/image_test_turn/'  # change this path

if not os.path.exists(datas_root):
    os.makedirs(datas_root)  # also creates ./data if it does not exist yet

for i in range(10):
    file_name = datas_root + os.sep + str(i)
    if not os.path.exists(file_name):
        os.mkdir(file_name)

for ii in range(numLabels):
    img = Image.fromarray(datas[ii, 0, 0:28, 0:28])
    label = labels[ii]
    file_name = datas_root + os.sep + str(label) + os.sep + \
                'mnist_test_' + str(ii) + '.png'
    img.save(file_name)
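
The hard-coded sizes in the scripts above follow from the IDX file layout: an image file starts with four big-endian 32-bit integers (magic number, image count, rows, columns), so 16 + 60000 * 28 * 28 = 47040016 bytes for the training images. As a hedged alternative, the sizes can be read from the header instead of being typed in. The helper below is a hypothetical sketch, exercised on a tiny synthetic buffer rather than a real MNIST file.

```python
import struct


def parse_idx_images(buf):
    """Parse an IDX3 image buffer into (count, rows, cols, pixel_bytes)."""
    magic, count, rows, cols = struct.unpack_from(">IIII", buf, 0)
    if magic != 2051:  # 0x00000803: unsigned-byte data with three dimensions
        raise ValueError("not an IDX3 unsigned-byte image file")
    header_size = struct.calcsize(">IIII")  # 16 bytes
    pixels = buf[header_size:header_size + count * rows * cols]
    if len(pixels) != count * rows * cols:
        raise ValueError("file is truncated")
    return count, rows, cols, pixels


# Synthetic buffer with two 2x3 images, standing in for a real MNIST file.
fake = struct.pack(">IIII", 2051, 2, 2, 3) + bytes(range(12))
count, rows, cols, pixels = parse_idx_images(fake)
print(count, rows, cols, list(pixels[:6]))
```

With a real file, `pixels` could be passed to `np.frombuffer(pixels, dtype=np.uint8).reshape(count, 1, rows, cols)` to obtain the same array the scripts build.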

2. In FedAvg.py the number of global training rounds is set to 1000. If the code takes too long to run, set it lower.

# _*_ coding : utf-8 _*_
# @Time : 2022/11/16 10:26
# @Author : 小刘同学home
# @File : FedAvg
# @Project : 2022s


import argparse
import torch
import os
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torch.autograd import Variable
from PIL import Image
import copy
import pandas as pd
import random
import time
import sys
import re
import matplotlib.pyplot as plt


name = str(sys.argv[0])

home_path = "./"


class MyDataset(torch.utils.data.Dataset):  # custom dataset that inherits from torch.utils.data.Dataset
    def __init__(self, root, data, label, transform=None, target_transform=None):  # store the arguments we need
        super(MyDataset, self).__init__()
        imgs = []  # list of (file name, label) pairs
        self.img_route = root
        for i in range(len(data)):
            imgs.append((data[i], int(label[i])))
        self.imgs = imgs
        self.transform = transform
        self.target_transform = target_transform

    def __getitem__(self, index):  # required: returns the element at the given index
        fn, label = self.imgs[index]  # fn is the image file name, label is its class
        route = self.img_route + str(label) + "/" + fn
        img = Image.open(route)  # load the image from its path
        if self.transform is not None:
            img = self.transform(img)  # apply the transform if one was given
        return img, label  # whatever is returned here is what each batch contains during training

    def __len__(self):  # required: number of images in the dataset (not the number of batches in the loader)
        return len(self.imgs)


filePath = home_path + 'data/image_turn/'
train_data = []
train_label = []
for i in range(10):
    train_data.append(os.listdir(filePath + str(i)))
    train_label.append([i] * len(train_data[i]))
filePath = home_path + 'data/image_test_turn/'
test_data = []
test_label = []
for i in range(10):
    test_data.append(os.listdir(filePath + str(i)))
    test_label.append([i] * len(test_data[i]))
test_ori = []
test_label_ori = []
for x in range(10):
    test_ori += test_data[x]
    test_label_ori += test_label[x]
test_data = MyDataset(home_path + "data/image_test_turn/", test_ori, test_label_ori,
                      transform=transforms.ToTensor())
test_loader = DataLoader(dataset=test_data, batch_size=64)


# Build the convolutional neural network
class MyConvNet(nn.Module):
    def __init__(self):
        super(MyConvNet, self).__init__()

        # First convolutional block
        self.conv1 = nn.Sequential(
            nn.Conv2d(
                in_channels=1,   # number of input channels
                out_channels=16,  # number of output channels
                stride=1,  # convolution stride
                kernel_size=3,  # kernel size
                padding=1,  # zero padding
            ),
            nn.ReLU(),  # activation function
            nn.AvgPool2d(
                kernel_size=2, # 2x2 average pooling
                stride=2)
        )   # after pooling: (16*28*28) -> (16*14*14)

        # Second convolutional block: (16*14*14) -> (32*12*12) -> (32*6*6)
        self.conv2 = nn.Sequential(
            nn.Conv2d(16, 32, 3, 1, 0),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(32 * 6 * 6, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)
        output = self.classifier(x)
        return output


def train_model(model, traindataloader, criterion, optimizer, batch_max, num_epochs):
    train_loss_all = []
    train_acc_all = []
    for epoch in range(num_epochs):
        train_loss = 0.0
        train_corrects = 0
        train_num = 0
        temp = random.sample(traindataloader, batch_max)  # randomly pick batch_max mini-batches from this client's data
        for (b_x, b_y) in temp:
            model.train()  # switch to training mode
            if (torch.cuda.is_available()):
                b_x = b_x.cuda()
                b_y = b_y.cuda()
            output = model(b_x)
            pre_lab = torch.argmax(output, 1)
            loss = criterion(output, b_y)
            optimizer.zero_grad()
            loss.backward()  # backpropagate the loss
            optimizer.step()  # update the model parameters
            train_loss += loss.item() * b_x.size(0)  # loss.item() is the batch mean, so weight it by the batch size
            train_corrects += torch.sum(pre_lab == b_y.data)  # count the correct predictions
            train_num += b_x.size(0)  # b_x.size(0) is the number of samples in the batch
        train_loss_all.append(train_loss / train_num)
        train_acc_all.append(train_corrects.double().item() / train_num)
        print("Train Loss:{:.4f}  Train Acc: {:.4f}".format(train_loss_all[-1], train_acc_all[-1]))
    return model



def local_train(local_convnet_dict, traindataloader, epochs, batch_max):
    if (torch.cuda.is_available()):
        local_convnet = MyConvNet().cuda()
    else:
        local_convnet = MyConvNet()
    local_convnet.load_state_dict(local_convnet_dict)
    optimizer = optim.Adam(local_convnet.parameters(), lr=0.01, weight_decay=5e-4)
    criterion = nn.CrossEntropyLoss()  # cross-entropy loss
    local_convnet = train_model(local_convnet, traindataloader, criterion, optimizer, batch_max, epochs)
    minus_convnet_dict = MyConvNet().state_dict()
    for name in local_convnet.state_dict():
        minus_convnet_dict[name] = local_convnet_dict[name] - local_convnet.state_dict()[name]
    return minus_convnet_dict



def Central_model_update(Central_model, minus_convnet_client):
    weight = 1

    model_dict = Central_model.state_dict()
    for name in Central_model.state_dict():
        for local_dict in minus_convnet_client:
            model_dict[name] = model_dict[name] - weight * local_dict[name] / len(minus_convnet_client)
    Central_model.load_state_dict(model_dict)
    return Central_model



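# Builds one list of mini-batches per client. Note that every client is given the
# full training set (shuffled independently), and ClientSort1 is not used.
def train_data_loader(client_num, ClientSort1):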
    global train_data
    global train_label
    train_loaders = []
    for i in range(client_num):
        train_ori = []
        label_ori = []
        for j in range(10):
            train_ori += train_data[j]
            label_ori += train_label[j]
        train_datas = MyDataset(home_path + "data/image_turn/", train_ori, label_ori,
                                transform=transforms.ToTensor())
        train_loader = DataLoader(dataset=train_datas, batch_size=100, shuffle=True)
        train_list = []
        for step, (b_x, b_y) in enumerate(train_loader):
            train_list.append((b_x, b_y))
        train_loaders.append(train_list)
    return train_loaders


def test_accuracy(Central_model):
    global test_loader
    test_correct = 0
    Central_model.eval()  # switch to evaluation mode
    with torch.no_grad():  # gradients are not needed for evaluation
        for inputs, labels in test_loader:
            if (torch.cuda.is_available()):
                inputs = inputs.cuda()
            outputs = Central_model(inputs).cpu()
            # torch.max returns (values, indices); the indices are the predicted classes
            _, pred = torch.max(outputs, 1)
            test_correct += torch.sum(pred == labels).item()
    accuracy = 100 * test_correct / len(test_ori)
    print("correct:%.3f%%" % accuracy)
    return accuracy


############################
#      Central shared model
############################
if (torch.cuda.is_available()):
    Central_model = MyConvNet().cuda()
else:
    Central_model = MyConvNet()
local_client_num = 10  # number of local clients
ClientSort1 = 10
# Central_model.load_state_dict(torch.load('F:/params.pkl'))
global_epoch = 1000
# print(test_accuracy(Central_model))



train_loaders = train_data_loader(local_client_num, ClientSort1)
result = []
count = 0
for i in range(global_epoch):
    count += 1
    minus_model = []
    for j in range(local_client_num):
        minus_model.append(local_train(Central_model.state_dict(), train_loaders[j], 1, 1))
    Central_model = Central_model_update(Central_model, minus_model)
    print("epoch: ", count, "\naccuracy:")
    result.append(float(test_accuracy(Central_model)))



plt.xlabel('round')
plt.ylabel('accuracy')
plt.plot(range(0, len(result)), result, color='r', linewidth=1.0, label='Synchronous FedAvg')
plt.legend()
plt.savefig(home_path + name + ".jpg")
with open(home_path + name + ".txt", mode='w') as f:
    for namet in result:
        f.write(str(namet))
        f.write('\n')
# Save the final global model next to the plot and the accuracy log
torch.save(Central_model.state_dict(), home_path + name + '.pkl')

For a detailed explanation of the code above, see the post "PyTorch implements federated learning FedAvg (detailed explanation)".
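One detail of FedAvg.py worth spelling out: local_train returns the difference between the global weights and the locally trained weights, and Central_model_update subtracts the mean of those differences from the global weights. With equal client weights this is the same as taking the plain average of the local weights. The sketch below checks that on flat lists of made-up numbers; the helper names are hypothetical and the real script works on PyTorch state dicts.

```python
def apply_mean_delta(global_w, local_ws):
    """Subtract the mean of the (global - local) deltas from the global weights."""
    k = len(local_ws)
    updated = list(global_w)
    for local in local_ws:
        delta = [g - l for g, l in zip(global_w, local)]
        updated = [u - d / k for u, d in zip(updated, delta)]
    return updated


def plain_average(local_ws):
    """Element-wise mean of the local weight vectors."""
    k = len(local_ws)
    return [sum(col) / k for col in zip(*local_ws)]


global_w = [1.0, -2.0, 0.5]
local_ws = [[0.5, -1.0, 0.5], [1.5, -3.0, 1.0], [2.0, -2.0, 0.0], [1.0, 0.0, 2.5]]

via_delta = apply_mean_delta(global_w, local_ws)
via_mean = plain_average(local_ws)
print(via_delta, via_mean)
```

Both routes give the same vector, since g - (1/K) * sum(g - w_k) = (1/K) * sum(w_k).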


Origin blog.csdn.net/qq_43750528/article/details/128046207