MNIST handwritten digit recognition based on federated learning - PyTorch implementation

Reference: MNIST handwritten digit recognition based on federated learning - PyTorch implementation, from freesion.com (a software development blog aggregator)


1. MNIST dataset

Heavily referenced: the "MNIST dataset" post from the CSDN blog "keep sane 802"

1. Load data

train_set = torchvision.datasets.MNIST(
    root="./data",
    train=True,
    transform=transforms.ToTensor(),
    download=False)
  • transform=transforms.ToTensor() converts an np.ndarray or PIL image of shape (H, W, C) into a tensor of shape (C, H, W) and scales each pixel value to [0, 1]
train_data_loader = torch.utils.data.DataLoader(
    dataset=train_set,
    batch_size=64,
    shuffle=True,
    drop_last=True)
  • dataset: Specifies the MNIST dataset to be loaded.
  • batch_size: Set the number of data pictures loaded in each batch to 64.
  • shuffle: Set to True to randomly shuffle the data when loading, which is often used for multi-batch model training.
  • drop_last: when set to True, if the dataset size is not divisible by batch_size, the last incomplete batch is dropped; otherwise it is kept.
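As a quick sanity check (a minimal sketch, assuming the train_data_loader defined above), one batch should contain 64 single-channel 28×28 images whose pixel values are already scaled to [0, 1]:

# assumes train_set and train_data_loader from the code above
images, labels = next(iter(train_data_loader))
print(images.shape)                              # torch.Size([64, 1, 28, 28])
print(labels.shape)                              # torch.Size([64])
print(images.min().item(), images.max().item())  # both values lie within [0, 1]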

2. Preview data

import matplotlib.pyplot as plt

images, labels = next(iter(train_data_loader))
img = torchvision.utils.make_grid(images, padding=0)
img = img.numpy().transpose(1, 2, 0)  # (C, H, W) -> (H, W, C) for matplotlib
plt.imshow(img)
plt.show()
  • iter(): a DataLoader is an iterable object; iter(dataloader) returns an iterator, which can then be stepped through with next().
  • next(): returns the next item of the iterator.
  • make_grid(): arranges a batch of images into a grid, combining them into a single image.
  • img.numpy().transpose(1, 2, 0): PyTorch tensors are laid out as (channels, height, width), while an image array for numpy/matplotlib is (height, width, channels), so the axes are transposed before plotting (a quick shape check follows below).
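For reference, a rough shape check of the grid (a sketch assuming batch_size=64, padding=0 and make_grid's default nrow=8, i.e. an 8×8 grid of 28×28 digits):

# assumes images from the preview code above
grid = torchvision.utils.make_grid(images, padding=0)
print(grid.shape)                             # torch.Size([3, 224, 224]); make_grid repeats the single channel to 3
print(grid.numpy().transpose(1, 2, 0).shape)  # (224, 224, 3), the (H, W, C) layout matplotlib expects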

The result is a single image showing the grid of digit images from the batch.


2. Implementation process

1. Load data

import torch
import torchvision
import torchvision.transforms as transforms
import torch.utils.data.dataloader as dataloader
from torch.utils.data import Subset
import torch.nn as nn
import torch.optim as optim
from torch.nn.parameter import Parameter

        After importing the libraries, use Subset to split the training set. There are three institutions in total, A, B, and C, and each institution gets a training set of 1000 samples. Each subset is then wrapped in a DataLoader.

        Here each institution's entire training set is used as one batch (batch_size = 1000), so there is no need to shuffle.

# training sets
train_set = torchvision.datasets.MNIST(
    root="./data",
    train=True,
    transform=transforms.ToTensor(),
    download=False)
train_set_A = Subset(train_set, range(0, 1000))
train_set_B = Subset(train_set, range(1000, 2000))
train_set_C = Subset(train_set, range(2000, 3000))
train_loader_A = dataloader.DataLoader(dataset=train_set_A, batch_size=1000, shuffle=False)
train_loader_B = dataloader.DataLoader(dataset=train_set_B, batch_size=1000, shuffle=False)
train_loader_C = dataloader.DataLoader(dataset=train_set_C, batch_size=1000, shuffle=False)
# test set
test_set = torchvision.datasets.MNIST(
    root="./data",
    train=False,
    transform=transforms.ToTensor(),
    download=False)
test_set = Subset(test_set, range(0, 2000))
test_loader = dataloader.DataLoader(dataset=test_set, shuffle=True)

2. Ordinary (non-federated) training

        First define the neural network. The simplest three-layer network is used here (or two layers, if the input layer is not counted): the input layer has 28×28 = 784 units, the hidden layer has 12 neurons, and the output layer has 10 neurons.

class NeuralNet(nn.Module):
    def __init__(self, input_num, hidden_num, output_num):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_num, hidden_num)  # weights w drawn from a normal distribution
        self.fc2 = nn.Linear(hidden_num, output_num)
        nn.init.normal_(self.fc1.weight)
        nn.init.normal_(self.fc2.weight)
        nn.init.constant_(self.fc1.bias, val=0)  # initialize the biases to 0
        nn.init.constant_(self.fc2.bias, val=0)
        self.relu = nn.ReLU()  # ReLU activation function

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        y = self.fc2(x)
        return y
  • class NeuralNet(nn.Module): a custom model, implemented by inheriting from nn.Module; the layers are declared in the __init__ constructor, and the connections between layers are implemented in forward, which is the forward-propagation process
  • def __init__(self, input_num, hidden_num, output_num): the constructor; self must be the first parameter, and three further parameters are required here
  • super(NeuralNet, self).__init__(): the parent constructor must be called
  • nn.Linear(): a linear (fully connected) layer
  • nn.init.normal_(self.fc1.weight): fills the weights of the linear layer with normally distributed values, with mean 0 and standard deviation 1 by default
  • nn.init.constant_(self.fc1.bias, val=0): fills self.fc1.bias, i.e. the bias b of the linear layer, with 0
  • forward(self, x): the forward function, one complete pass through the network
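A quick forward-pass check with a random dummy batch (a minimal sketch, not part of the original post) shows the expected shapes:

net = NeuralNet(input_num=784, hidden_num=12, output_num=10)
dummy = torch.rand(64, 784)  # a fake batch of 64 flattened 28x28 images
logits = net(dummy)
print(logits.shape)          # torch.Size([64, 10]) -- one raw score (logit) per digit class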
def train_and_test_1(train_loader, test_loader):
    class NeuralNet(nn.Module): ...  # the class defined above

    epoches = 20  # train for 20 epochs
    lr = 0.01  # learning rate, i.e. the step size
    input_num = 784
    hidden_num = 12
    output_num = 10
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = NeuralNet(input_num, hidden_num, output_num)
    model.to(device)
    loss_func = nn.CrossEntropyLoss()  # loss function: cross-entropy
    optimizer = optim.Adam(model.parameters(), lr=lr)  # Adam optimizer; SGD could also be used
    # optimizer = optim.SGD(model.parameters(), lr=lr)
    for epoch in range(epoches):
        flag = 0
        for images, labels in train_loader:
            images = images.reshape(-1, 28 * 28).to(device)
            labels = labels.to(device)
            output = model(images)

            loss = loss_func(output, labels)
            optimizer.zero_grad()
            loss.backward()  # backpropagate the error and compute the parameter updates
            optimizer.step()  # apply the updates to the network's parameters

            # uncomment the following lines to watch how the loss changes each epoch
            # if (flag + 1) % 10 == 0:
            #     print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch + 1, epoches, loss.item()))
            flag += 1

    params = list(model.named_parameters())  # get the model parameters

    # test: evaluate the accuracy
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28 * 28).to(device)
        labels = labels.to(device)
        output = model(images)
        values, predicte = torch.max(output, 1)  # dim=1 takes the maximum of each row (each sample)
        total += labels.size(0)
        # predicte == labels returns a Boolean for each image
        correct += (predicte == labels).sum().item()
    print("The accuracy of total {} images: {}%".format(total, 100 * correct / total))
    return params
  • epoches: number of training epochs
  • lr: learning rate, i.e. the step size
  • input_num, hidden_num, output_num: the number of units in each layer of the neural network
  • torch.device(): places tensors on the specified computing device; here the GPU is used if available, otherwise the CPU
  • model.to(device): once a device is specified, the model must be moved to that device
  • loss_func: the cross-entropy loss function is used; to compute the loss, pass in output and labels, and to read its value call loss.item()
  • optimizer: the Adam optimizer is used
  • The for loop performs the iterative optimization; remember the four essential steps: compute the loss → zero the gradients → backpropagate → update the parameters

The above is the training loop for a simple neural network; testing follows:

  • params = list(model.named_parameters()): gets the model parameters, returned as a list of (name, parameter) tuples
  • The for loop computes the model's accuracy: for each batch, torch.max takes the index of the largest output value as the prediction predicte (an explicit softmax is unnecessary, since it does not change which class scores highest), the predictions are compared with the labels, and the number of correct ones is accumulated (see the small example below).
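To make the torch.max step concrete, a tiny example with made-up numbers: with dim=1, torch.max returns, for each row, the maximum value and its column index, and the index is used as the predicted class:

logits = torch.tensor([[0.1, 2.5, -1.0],
                       [3.0, 0.2,  0.7]])
values, predicte = torch.max(logits, 1)
print(values)    # tensor([2.5000, 3.0000])
print(predicte)  # tensor([1, 0])
print((predicte == torch.tensor([1, 2])).sum().item())  # 1 of the 2 predictions matches the (made-up) labels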

Finally, the function returns the model parameters params at this point, which are used for federated aggregation.

3. Post-Federation Training

        First, define the neural network:

class NeuralNet(nn.Module):
    def __init__(self, input_num, hidden_num, output_num, com_para_fc1, com_para_fc2):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_num, hidden_num)
        self.fc2 = nn.Linear(hidden_num, output_num)
        self.fc1.weight = Parameter(com_para_fc1)  # initialize the weights from the aggregated parameters
        self.fc2.weight = Parameter(com_para_fc2)
        nn.init.constant_(self.fc1.bias, val=0)
        nn.init.constant_(self.fc2.bias, val=0)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        y = self.fc2(x)
        return y

        Compared with the previous network, the difference is:

  • The weights of the two linear layers are initialized from the parameters passed in (the aggregated weights), rather than from a random normal distribution; a minimal illustration follows below.
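A minimal illustration (with hypothetical random tensors standing in for the aggregated weights, assuming the class above) of how the passed-in parameters replace the random initialization:

com_para_fc1 = torch.randn(12, 784)  # stand-in for the averaged fc1 weight (shape: out_features x in_features)
com_para_fc2 = torch.randn(10, 12)   # stand-in for the averaged fc2 weight
net = NeuralNet(784, 12, 10, com_para_fc1, com_para_fc2)
print(torch.equal(net.fc1.weight.data, com_para_fc1))  # True -- the layer now starts from the supplied weight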
def train_and_test_2(train_loader, test_loader, com_para_fc1, com_para_fc2):
    class NeuralNet(nn.Module):...

    epoches = 20
    lr = 0.01
    input_num = 784
    hidden_num = 12
    output_num = 10
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = NeuralNet(input_num, hidden_num, output_num, com_para_fc1, com_para_fc2)
    model.to(device)
    loss_func = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    # optimizer = optim.SGD(model.parameters(), lr=lr)

    for epoch in range(epoches):
        flag = 0
        for images, labels in train_loader:
            # (images, labels) = data
            images = images.reshape(-1, 28 * 28).to(device)
            labels = labels.to(device)
            output = model(images)

            loss = loss_func(output, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # if (flag + 1) % 10 == 0:
            # print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch + 1, epoches, loss.item()))
            flag += 1
    params = list(model.named_parameters())  # get the index by debugging

    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28 * 28).to(device)
        labels = labels.to(device)
        output = model(images)
        values, predicte = torch.max(output, 1)
        total += labels.size(0)
        correct += (predicte == labels).sum().item()
    print("The accuracy of total {} images: {}%".format(total, 100 * correct / total))
    return params

        Except for the network, the rest is the same as ordinary training.

4. Federated averaging

        Here only the weights w are averaged:

def combine_params(para_A, para_B, para_C):
    fc1_wA = para_A[0][1].data
    fc1_wB = para_B[0][1].data
    fc1_wC = para_C[0][1].data

    fc2_wA = para_A[2][1].data
    fc2_wB = para_B[2][1].data
    fc2_wC = para_C[2][1].data

    com_para_fc1 = (fc1_wA + fc1_wB + fc1_wC) / 3
    com_para_fc2 = (fc2_wA + fc2_wB + fc2_wC) / 3
    return com_para_fc1, com_para_fc2
  • para_A[0][1] and para_A[2][1]: as mentioned above, the extracted network parameters form a list of tuples; each tuple holds the parameter's name (a str) and its data (a Parameter), covering w and b of the first linear layer followed by w and b of the second linear layer (see the sketch below).
  • para_A[0][1].data: .data is accessed on the Parameter to get the underlying tensor, so the weights can be operated on directly.
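For reference, here is what the params list looks like for this model (a sketch; the names come from the attribute names in NeuralNet):

# params = list(model.named_parameters()) is a list of (name, Parameter) tuples, in registration order:
#   [('fc1.weight', shape (12, 784)),   # index 0  -> para[0][1]
#    ('fc1.bias',   shape (12,)),       # index 1
#    ('fc2.weight', shape (10, 12)),    # index 2  -> para[2][1]
#    ('fc2.bias',   shape (10,))]       # index 3
for name, p in model.named_parameters():
    print(name, tuple(p.shape))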

        So only the w parts of the two linear layers of the three models are averaged here. Finally, the averaged weights of the two linear layers are returned.

5. Main function

if __name__ == '__main__':
    print('\033[31m'+'Start training model ABC at 1st time...'+'\033[0m')
    para_A = train_and_test_1(train_loader_A, test_loader)
    para_B = train_and_test_1(train_loader_B, test_loader)
    para_C = train_and_test_1(train_loader_C, test_loader)
    for i in range(6):
        print('\033[31m'+'The {} round to be federated!!!'.format(i + 1)+'\033[0m')
        com_para_fc1, com_para_fc2 = combine_params(para_A, para_B, para_C)
        para_A = train_and_test_2(train_loader_A, test_loader, com_para_fc1, com_para_fc2)
        para_B = train_and_test_2(train_loader_B, test_loader, com_para_fc1, com_para_fc2)
        para_C = train_and_test_2(train_loader_C, test_loader, com_para_fc1, com_para_fc2)
  • In print, '\033[31m' is an ANSI escape code that turns the following characters red; '\033[0m' at the end resets the colour.
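For example (standard ANSI escape codes, not specific to this project):

print('\033[31m' + 'this text is printed in red' + '\033[0m')  # '\033[0m' resets the colour to the default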

        The initial round is simulated first: each institution trains its own model on its local data. Then, in each iteration of the loop, the parameters of all the models are averaged, each institution receives the aggregated parameters, and training continues on its local data.


3. Training results

(Figure: first round of training, without federated optimization)

(Figure: sixth round of training, with federated optimization)

Origin: blog.csdn.net/m0_51562349/article/details/127392119