NNDL Experiment Six, Convolutional Neural Networks (4): ResNet18 on MNIST

5.4 Handwritten digit recognition experiment based on residual network

A residual network (ResNet) adds direct (shortcut) edges around the nonlinear layers of a neural network to alleviate the vanishing gradient problem, which makes deep networks easier to train.

The basic building block of a residual network is the residual unit.

Let $f(\mathbf{x};\theta)$ denote one or more neural layers. A residual unit adds a direct edge between the input and the output of $f(\cdot)$.

Unlike a conventional network structure, where $f(\mathbf{x};\theta)$ is trained to approximate a target function $h(\mathbf{x})$ directly, a residual network splits the target function $h(\mathbf{x})$ into two parts: the identity function $\mathbf{x}$ and the residual function $h(\mathbf{x})-\mathbf{x}$:

$$\mathrm{ResBlock}_f(\mathbf{x}) = f(\mathbf{x};\theta) + \mathbf{x}, \qquad (5.22)$$

where $\theta$ is a learnable parameter.
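
As a minimal illustration of Equation (5.22) (a sketch added here, with an arbitrary linear layer standing in for $f$):

import torch
import torch.nn as nn

# A toy residual unit: f(x; theta) is a single linear layer here (an illustrative
# assumption); the unit outputs f(x) + x, so f only needs to fit the residual h(x) - x.
f = nn.Linear(8, 8)
x = torch.randn(4, 8)
y = f(x) + x          # ResBlock_f(x) = f(x; theta) + x
print(y.shape)        # torch.Size([4, 8])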

A typical residual unit is shown in Figure 5.14, consisting of multiple cascaded convolutional layers and a direct edge across the layers.


Figure 5.14: Residual unit structure

A residual network usually consists of many stacked residual units. Below we build ResNet18, a very common residual network in computer vision, and repeat the handwritten digit recognition task from the previous section.

5.4.1 Model construction

In this section, we first build the residual unit of ResNet18, and then build a complete network.

5.4.1.1 Residual unit

Here, we implement a ResBlock operator to construct the residual unit. It defines a use_residual parameter that controls whether the residual connection is used in the subsequent experiments.


The input and output of the nonlinear layers wrapped by a residual unit must have the same shape. If the number of channels of a convolutional layer's input feature map differs from that of its output feature map, the two cannot be added directly. To solve this problem, a $1 \times 1$ convolution can be used to map the number of channels of the input feature map to match the output of the wrapped convolutions.

$1 \times 1$ convolution: identical to a standard convolution except that the kernel size is $1 \times 1$, so it ignores the local spatial relationships in the input and focuses on the channels instead. A $1 \times 1$ convolution can serve the following purposes (a small sketch follows the list):

  • Cross-channel interaction and information integration. Since the input and output of a convolution have three dimensions (width, height, channels), a $1 \times 1$ convolution is effectively a linear combination over the channels at each pixel, which integrates the information of different channels;
  • Dimensionality reduction or expansion of the channel dimension to reduce the number of parameters. The output of a $1 \times 1$ convolution retains the spatial structure of the input, and by adjusting the number of kernels the channel dimension can be increased or decreased;
  • The nonlinear activation applied after a $1 \times 1$ convolution adds nonlinearity while keeping the feature map size unchanged.
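
A small sketch of the channel-matching role described above (the shapes below are illustrative assumptions):

import torch
import torch.nn as nn

# Map a 64-channel feature map to 128 channels with a 1x1 convolution.
# The spatial size is untouched; each output pixel is a linear combination
# of the 64 input channels at the same location.
x = torch.randn(1, 64, 16, 16)
conv1x1 = nn.Conv2d(64, 128, kernel_size=1, bias=False)
print(conv1x1(x).shape)   # torch.Size([1, 128, 16, 16])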

import torch
import torch.nn as nn
import torch.nn.functional as F


class ResBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, use_residual=True):
        super(ResBlock, self).__init__()
        self.stride = stride
        self.use_residual = use_residual
        # First convolutional layer: 3x3 kernel; the output channels and stride are configurable
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, padding=1, stride=self.stride, bias=False)
        # Second convolutional layer: 3x3 kernel, stride 1; keeps the feature map shape unchanged
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)

        # If the output of conv2 and the input of this residual block have different shapes, use_1x1conv = True
        # When use_1x1conv = True, a 1x1 convolution is applied to the input so that its shape matches conv2's output
        if in_channels != out_channels or stride != 1:
            self.use_1x1conv = True
        else:
            self.use_1x1conv = False
        # When the input and output channels of the wrapped nonlinear layers differ,
        # a 1x1 convolution adjusts the channel count before the addition
        if self.use_1x1conv:
            self.shortcut = nn.Conv2d(in_channels, out_channels, 1, stride=self.stride, bias=False)

        # Each convolutional layer is followed by a batch normalization layer
        # (batch normalization is covered in detail in Section 7.5.1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        if self.use_1x1conv:
            self.bn3 = nn.BatchNorm2d(out_channels)

    def forward(self, inputs):
        y = F.relu(self.bn1(self.conv1(inputs)))
        y = self.bn2(self.conv2(y))
        if self.use_residual:
            if self.use_1x1conv:  # if True, apply the 1x1 convolution to inputs so its shape matches conv2's output y
                shortcut = self.shortcut(inputs)
                shortcut = self.bn3(shortcut)
            else:  # otherwise add inputs and conv2's output y directly
                shortcut = inputs
            y = torch.add(shortcut, y)
        out = F.relu(y)
        return out
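
A quick shape check of the ResBlock operator above (the input size is an illustrative assumption):

# A block that halves the spatial size and doubles the channels; the 1x1 shortcut
# convolution is enabled automatically because the input and output shapes differ.
block = ResBlock(in_channels=64, out_channels=128, stride=2, use_residual=True)
x = torch.randn(1, 64, 16, 16)
print(block(x).shape)   # torch.Size([1, 128, 8, 8])
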
5.4.1.2 Overall structure of residual network

The residual network is a very deep network composed of many residual units connected in series. The network structure of ResNet18 is shown in Figure 5.16.


Figure 5.16: Residual network

For ease of understanding, the ResNet18 network can be divided into 6 modules:

  • The first module: a $7 \times 7$ convolutional layer with stride 2 and 64 output channels, followed by batch normalization and a ReLU activation, and then a $3 \times 3$ max pooling layer with stride 2;
  • The second module: two residual units; the output has 64 channels and the feature map size is unchanged;
  • The third module: two residual units; the output has 128 channels and the feature map size is halved;
  • The fourth module: two residual units; the output has 256 channels and the feature map size is halved;
  • The fifth module: two residual units; the output has 512 channels and the feature map size is halved;
  • The sixth module: a global average pooling layer that reduces the feature map to $1 \times 1$, followed by a fully connected layer that computes the final output. (A worked shape calculation follows this list.)
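
As a rough worked example, assuming a single-channel $32 \times 32$ input (the size used with thop later in this section): the $7 \times 7$ stride-2 convolution gives $16 \times 16$, and the stride-2 max pooling gives $8 \times 8$; module two keeps $8 \times 8$ with 64 channels; modules three to five halve the size at each step, giving $4 \times 4 \times 128$, $2 \times 2 \times 256$ and $1 \times 1 \times 512$; global average pooling then yields a 512-dimensional vector, which the fully connected layer maps to the 10 classes.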

The code implementation of the ResNet18 model is as follows:
Define module one.

def make_first_module(in_channels):
    # Module one: 7x7 convolution, batch normalization, pooling
    m1 = nn.Sequential(nn.Conv2d(in_channels, 64, 7, stride=2, padding=3),
                    nn.BatchNorm2d(64), nn.ReLU(),
                    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
    return m1
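
A quick sanity check of module one (the input size is an illustrative assumption):

m1 = make_first_module(in_channels=1)
x = torch.randn(1, 1, 32, 32)
# 32 -> 16 after the stride-2 convolution, then 16 -> 8 after the stride-2 max pooling
print(m1(x).shape)   # torch.Size([1, 64, 8, 8])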

Define modules two to five.

def resnet_module(input_channels, out_channels, num_res_blocks, stride=1, use_residual=True):
    blk = []
    # Generate num_res_blocks residual units in a loop
    for i in range(num_res_blocks):
        if i == 0: # create the first residual unit of the module
            blk.append(ResBlock(input_channels, out_channels,
                                stride=stride, use_residual=use_residual))
        else:      # create the remaining residual units of the module
            blk.append(ResBlock(out_channels, out_channels, use_residual=use_residual))
    return blk

Wrap modules two to five:

def make_modules(use_residual):
    # Module two: two residual units, 64 input channels, 64 output channels, stride 1; feature map size unchanged
    m2 = nn.Sequential(*resnet_module(64, 64, 2, stride=1, use_residual=use_residual))
    # Module three: two residual units, 64 input channels, 128 output channels, stride 2; feature map size halved
    m3 = nn.Sequential(*resnet_module(64, 128, 2, stride=2, use_residual=use_residual))
    # Module four: two residual units, 128 input channels, 256 output channels, stride 2; feature map size halved
    m4 = nn.Sequential(*resnet_module(128, 256, 2, stride=2, use_residual=use_residual))
    # Module five: two residual units, 256 input channels, 512 output channels, stride 2; feature map size halved
    m5 = nn.Sequential(*resnet_module(256, 512, 2, stride=2, use_residual=use_residual))
    return m2, m3, m4, m5

Define the complete network.

# Define the complete network
class Model_ResNet18(nn.Module):
    def __init__(self, in_channels=3, num_classes=10, use_residual=True):
        super(Model_ResNet18, self).__init__()
        m1 = make_first_module(in_channels)
        m2, m3, m4, m5 = make_modules(use_residual)
        # Wrap modules one to six
        self.net = nn.Sequential(m1, m2, m3, m4, m5,
                        # Module six: pooling layer, fully connected layer
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(512, num_classes))

    def forward(self, x):
        return self.net(x)
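
Before counting parameters, a quick forward-pass check that the network runs end to end (dummy input, shapes chosen for illustration):

model = Model_ResNet18(in_channels=1, num_classes=10, use_residual=True)
x = torch.randn(2, 1, 32, 32)   # a dummy batch of two grayscale 32x32 images
print(model(x).shape)           # torch.Size([2, 10])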

Here we can again use torchsummary to count the model parameters.

from torchsummary import summary
 
model = Model_ResNet18(in_channels=1, num_classes=10, use_residual=True)
params_info = summary(model,input_size=(1,64,32))
print(params_info)

Output:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 64, 32, 16]           3,200
       BatchNorm2d-2           [-1, 64, 32, 16]             128
              ReLU-3           [-1, 64, 32, 16]               0
         MaxPool2d-4            [-1, 64, 16, 8]               0
            Conv2d-5            [-1, 64, 16, 8]          36,864
       BatchNorm2d-6            [-1, 64, 16, 8]             128
            Conv2d-7            [-1, 64, 16, 8]          36,864
       BatchNorm2d-8            [-1, 64, 16, 8]             128
          ResBlock-9            [-1, 64, 16, 8]               0
           Conv2d-10            [-1, 64, 16, 8]          36,864
      BatchNorm2d-11            [-1, 64, 16, 8]             128
           Conv2d-12            [-1, 64, 16, 8]          36,864
      BatchNorm2d-13            [-1, 64, 16, 8]             128
         ResBlock-14            [-1, 64, 16, 8]               0
           Conv2d-15            [-1, 128, 8, 4]          73,728
      BatchNorm2d-16            [-1, 128, 8, 4]             256
           Conv2d-17            [-1, 128, 8, 4]         147,456
      BatchNorm2d-18            [-1, 128, 8, 4]             256
           Conv2d-19            [-1, 128, 8, 4]           8,192
      BatchNorm2d-20            [-1, 128, 8, 4]             256
         ResBlock-21            [-1, 128, 8, 4]               0
           Conv2d-22            [-1, 128, 8, 4]         147,456
      BatchNorm2d-23            [-1, 128, 8, 4]             256
           Conv2d-24            [-1, 128, 8, 4]         147,456
      BatchNorm2d-25            [-1, 128, 8, 4]             256
         ResBlock-26            [-1, 128, 8, 4]               0
           Conv2d-27            [-1, 256, 4, 2]         294,912
      BatchNorm2d-28            [-1, 256, 4, 2]             512
           Conv2d-29            [-1, 256, 4, 2]         589,824
      BatchNorm2d-30            [-1, 256, 4, 2]             512
           Conv2d-31            [-1, 256, 4, 2]          32,768
      BatchNorm2d-32            [-1, 256, 4, 2]             512
         ResBlock-33            [-1, 256, 4, 2]               0
           Conv2d-34            [-1, 256, 4, 2]         589,824
      BatchNorm2d-35            [-1, 256, 4, 2]             512
           Conv2d-36            [-1, 256, 4, 2]         589,824
      BatchNorm2d-37            [-1, 256, 4, 2]             512
         ResBlock-38            [-1, 256, 4, 2]               0
           Conv2d-39            [-1, 512, 2, 1]       1,179,648
      BatchNorm2d-40            [-1, 512, 2, 1]           1,024
           Conv2d-41            [-1, 512, 2, 1]       2,359,296
      BatchNorm2d-42            [-1, 512, 2, 1]           1,024
           Conv2d-43            [-1, 512, 2, 1]         131,072
      BatchNorm2d-44            [-1, 512, 2, 1]           1,024
         ResBlock-45            [-1, 512, 2, 1]               0
           Conv2d-46            [-1, 512, 2, 1]       2,359,296
      BatchNorm2d-47            [-1, 512, 2, 1]           1,024
           Conv2d-48            [-1, 512, 2, 1]       2,359,296
      BatchNorm2d-49            [-1, 512, 2, 1]           1,024
         ResBlock-50            [-1, 512, 2, 1]               0
AdaptiveAvgPool2d-51            [-1, 512, 1, 1]               0
          Flatten-52                  [-1, 512]               0
           Linear-53                   [-1, 10]           5,130
================================================================
Total params: 11,175,434
Trainable params: 11,175,434
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.01
Forward/backward pass size (MB): 2.10
Params size (MB): 42.63
Estimated Total Size (MB): 44.74
----------------------------------------------------------------
None

Use thop to count the model's computation (FLOPs).

from thop import profile

# thop expects a tuple of example input tensors rather than a shape
dummy_input = torch.randn(1, 1, 32, 32)
FLOPs, PARAMs = profile(model, inputs=(dummy_input,), report_missing=True)
print(FLOPs, PARAMs)

To verify that residual connections help train deep convolutional neural networks, we first run the handwritten digit recognition experiment with ResNet18 without residual connections (use_residual set to False), then add the residual connections (use_residual set to True) and compare the results.

5.4.2 ResNet18 without residual connection

To verify the effect of the residual connections, we first experiment with ResNet18 without them.

5.4.2.1 Model training

Use the training set and validation set for model training, and train for a total of 5 epochs. During the experiment, the model with the highest accuracy is saved as the best model. The code is implemented as follows:

from plot import plot
from metric import Accuracy
from RunnerV3 import RunnerV3

# Learning rate
lr = 0.1
# Batch size
batch_size = 64
# Load the data
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
dev_loader = torch.utils.data.DataLoader(dev_dataset, batch_size=batch_size)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size)
# Define the network: ResNet18 without residual connections
model = Model_ResNet18(in_channels=1, num_classes=10, use_residual=False)
# Define the optimizer
optimizer = torch.optim.SGD(lr=lr, params=model.parameters())
# Define the loss function
loss_fn = F.cross_entropy
# Define the evaluation metric
metric = Accuracy(is_logist=True)
# Instantiate RunnerV3
runner = RunnerV3(model, optimizer, loss_fn, metric)
# Start training
log_steps = 15
eval_steps = 15
runner.train(train_loader, dev_loader, num_epochs=5, log_steps=log_steps,
            eval_steps=eval_steps, save_path="best_model.pdparams")
# Visualize the training and validation loss curves
plot(runner, 'cnn-loss2.pdf')

Here, plot.py is:

import matplotlib.pyplot as plt

# Plotting helper
def plot(runner, fig_name, x):
    plt.figure(figsize=(10, 5))
    plt.subplot(1, 2, 1)
    epochs = [i for i in range(len(runner.train_scores))]
    # Plot the training loss curve
    plt.plot(epochs, runner.train_loss, color='#e4007f', label=f"Train loss (num_epochs={x:d})")
    # Plot the validation loss curve
    plt.plot(epochs, runner.dev_loss, color='#f19ec2', linestyle='--', label="Dev loss")
    # Axes and legend
    plt.ylabel("loss", fontsize='large')
    plt.xlabel("epoch", fontsize='large')
    plt.legend(loc='upper right', fontsize='x-large')
    plt.subplot(1, 2, 2)
    # Plot the training accuracy curve
    plt.plot(epochs, runner.train_scores, color='#e4007f', label="Train accuracy")
    # Plot the validation accuracy curve
    plt.plot(epochs, runner.dev_scores, color='#f19ec2', linestyle='--', label="Dev accuracy")
    # Axes and legend
    plt.ylabel("score", fontsize='large')
    plt.xlabel("epoch", fontsize='large')
    plt.legend(loc='lower right', fontsize='x-large')
    plt.tight_layout()
    plt.savefig(fig_name)
    plt.show()

Output:

[Train] epoch: 0/5, step: 0/80, loss: 2.38001
[Train] epoch: 0/5, step: 15/80, loss: 1.27386
[Evaluate]  dev score: 0.09000, dev loss: 2.29575
[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.09000
[Train] epoch: 1/5, step: 30/80, loss: 0.41643
[Evaluate]  dev score: 0.24500, dev loss: 2.27140
[Evaluate] best accuracy performence has been updated: 0.09000 --> 0.24500
[Train] epoch: 2/5, step: 45/80, loss: 0.32777
[Evaluate]  dev score: 0.81000, dev loss: 1.27506
[Evaluate] best accuracy performence has been updated: 0.24500 --> 0.81000
[Train] epoch: 3/5, step: 60/80, loss: 0.19541
[Evaluate]  dev score: 0.88500, dev loss: 0.48197
[Evaluate] best accuracy performence has been updated: 0.81000 --> 0.88500
[Train] epoch: 4/5, step: 75/80, loss: 0.04977
[Evaluate]  dev score: 0.93000, dev loss: 0.34029
[Evaluate] best accuracy performence has been updated: 0.88500 --> 0.93000
[Evaluate]  dev score: 0.92500, dev loss: 0.32881
[Train] Training done!

Figure: training and validation loss curves and validation accuracy

5.4.2.2 Model evaluation

Use the test data to evaluate the best model saved during training, and observe the model's accuracy and loss on the test set. The code is implemented as follows:

# Load the best model
runner.load_model('best_model.pdparams')
# Evaluate the model
score, loss = runner.evaluate(test_loader)
print("[Test] accuracy/loss: {:.4f}/{:.4f}".format(score, loss))

Output:

[Test] accuracy/loss: 0.8750/0.3666

Judging from the output, compared with the LeNet-5 evaluation results, deepening the network does not improve performance; it actually degrades it.

5.4.3 ResNet18 with residual connection

5.4.3.1 Model training

Repeat the above experiment using ResNet18 with residual connection. The code is implemented as follows:

# Learning rate
lr = 0.01
# Batch size
batch_size = 64
# Load the data
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
dev_loader = torch.utils.data.DataLoader(dev_dataset, batch_size=batch_size)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size)
# Define the network; setting use_residual to True uses the deep network with residual structure
model = Model_ResNet18(in_channels=1, num_classes=10, use_residual=True)
# Define the optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
# Instantiate RunnerV3
runner = RunnerV3(model, optimizer, loss_fn, metric)
# Start training
log_steps = 15
eval_steps = 15
runner.train(train_loader, dev_loader, num_epochs=5, log_steps=log_steps,
            eval_steps=eval_steps, save_path="best_model.pdparams")

# Visualize the training and validation loss curves
plot(runner, 'cnn-loss3.pdf')

Output:

[Train] epoch: 0/5, step: 0/80, loss: 2.38001
[Train] epoch: 0/5, step: 15/80, loss: 1.27386
[Evaluate]  dev score: 0.09000, dev loss: 2.29575
[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.09000
[Train] epoch: 1/5, step: 30/80, loss: 0.41643
[Evaluate]  dev score: 0.24500, dev loss: 2.27140
[Evaluate] best accuracy performence has been updated: 0.09000 --> 0.24500
[Train] epoch: 2/5, step: 45/80, loss: 0.32777
[Evaluate]  dev score: 0.81000, dev loss: 1.27506
[Evaluate] best accuracy performence has been updated: 0.24500 --> 0.81000
[Train] epoch: 3/5, step: 60/80, loss: 0.19541
[Evaluate]  dev score: 0.88500, dev loss: 0.48197
[Evaluate] best accuracy performence has been updated: 0.81000 --> 0.88500
[Train] epoch: 4/5, step: 75/80, loss: 0.04977
[Evaluate]  dev score: 0.93000, dev loss: 0.34029
[Evaluate] best accuracy performence has been updated: 0.88500 --> 0.93000
[Evaluate]  dev score: 0.92500, dev loss: 0.32881
[Train] Training done!
5.4.3.2 Model evaluation

Use the test data to evaluate the best model saved during the training process, and observe the accuracy and loss of the model on the test set.

# Load the best model
runner.load_model('best_model.pdparams')
# Evaluate the model
score, loss = runner.evaluate(test_loader)
print("[Test] accuracy/loss: {:.4f}/{:.4f}".format(score, loss))

Output:

[Test] accuracy/loss: 0.9600/0.1266

After adding the residual connections, the convergence curve is smoother. Judging from the output, compared with the ResNet18 without residual connections, adding them improves the model's performance to a certain extent.

5.4.4 Comparison with the high-level API implementation

For a classic image classification network such as ResNet18, the high-level API (torchvision) already provides a ready-made implementation, so there is no need to build it from scratch. Here, the high-level API version of resnet18 and the custom ResNet18 model are given the same weights, and the same input is fed to both to check whether their outputs agree.

import numpy as np
from torchvision.models import resnet18

# torchvision's high-level API version of resnet18, with pretrained weights
hapi_model = resnet18(pretrained=True)
# The custom resnet18 model
model = Model_ResNet18(in_channels=3, num_classes=1000, use_residual=True)

# Get the torchvision network's weights
params = hapi_model.state_dict()
# Used to store the weights after the parameter names have been remapped
new_params = {}
# Map the parameter names
for key in params:
    if 'layer' in key:
        if 'downsample.0' in key:
            # 1x1 shortcut convolution, e.g. layer2.0.downsample.0.weight -> net.2.0.shortcut.weight
            new_params['net.' + key[5:8] + '.shortcut' + key[-7:]] = params[key]
        elif 'downsample.1' in key:
            # batch norm of the shortcut branch, e.g. layer2.0.downsample.1.weight -> net.2.0.bn3.weight
            new_params['net.' + key[5:8] + '.bn3' + key[21:]] = params[key]
        else:
            new_params['net.' + key[5:]] = params[key]
    elif 'conv1.weight' == key:
        new_params['net.0.0.weight'] = params[key]
    elif 'bn1' in key:
        new_params['net.0.1' + key[3:]] = params[key]
    elif 'fc' in key:
        new_params['net.7' + key[2:]] = params[key]

# Load the remapped weights into the custom model. The custom model's first convolution
# has a bias term with no counterpart in the torchvision weights, so update the custom
# state dict and zero that bias so the two networks compute the same function.
model_params = model.state_dict()
model_params.update(new_params)
model_params['net.0.0.bias'] = torch.zeros_like(model_params['net.0.0.bias'])
model.load_state_dict(model_params)

# Switch both models to evaluation mode so batch normalization uses its running statistics
model.eval()
hapi_model.eval()

inputs = np.random.randn(*[3, 3, 32, 32])
inputs = inputs.astype('float32')
x = torch.tensor(inputs)
# Outputs of the custom model and of the torchvision model on the same input
output = model(x)
hapi_out = hapi_model(x)
# Compute the difference between the two models' outputs
diff = output - hapi_out
# Take the largest absolute difference
max_diff = torch.max(torch.abs(diff))
print(max_diff)

Output:

tensor(0., grad_fn=<MaxBackward1>) 

It can be seen that the outputs of the high-level API version of resnet18 and the custom ResNet18 agree, i.e. the two implementations are equivalent.
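
When remapping parameter names like this, it is easy to get a key wrong. A small diagnostic sketch (not part of the original experiment) is to compare the mapped key set against the custom model's own state dict:

# Keys produced by the mapping but unknown to the custom model, and custom keys that received no weight
custom_keys = set(model.state_dict().keys())
mapped_keys = set(new_params.keys())
print("unexpected:", sorted(mapped_keys - custom_keys))
print("missing:", sorted(custom_keys - mapped_keys))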

Required custom functions

plot.py

import matplotlib.pyplot as plt


def plot(runner, fig_name):
    plt.figure(figsize=(10, 5))

    plt.subplot(1, 2, 1)
    train_items = runner.train_step_losses[::30]
    train_steps = [x[0] for x in train_items]
    train_losses = [x[1] for x in train_items]

    plt.plot(train_steps, train_losses, color='#8E004D', label="Train loss")
    if runner.dev_losses[0][0] != -1:
        dev_steps = [x[0] for x in runner.dev_losses]
        dev_losses = [x[1] for x in runner.dev_losses]
        plt.plot(dev_steps, dev_losses, color='#E20079', linestyle='--', label="Dev loss")
    # Axes and legend
    plt.ylabel("loss", fontsize='x-large')
    plt.xlabel("step", fontsize='x-large')
    plt.legend(loc='upper right', fontsize='x-large')

    plt.subplot(1, 2, 2)
    # Plot the validation accuracy curve
    if runner.dev_losses[0][0] != -1:
        plt.plot(dev_steps, runner.dev_scores,
                 color='#E20079', linestyle="--", label="Dev accuracy")
    else:
        plt.plot(list(range(len(runner.dev_scores))), runner.dev_scores,
                 color='#E20079', linestyle="--", label="Dev accuracy")
    # Axes and legend
    plt.ylabel("score", fontsize='x-large')
    plt.xlabel("step", fontsize='x-large')
    plt.legend(loc='lower right', fontsize='x-large')

    plt.savefig(fig_name)
    plt.show()

metric.py

import torch


class Accuracy():
    def __init__(self, is_logist=True):
        """
        Arguments:
           - is_logist: whether outputs are logits or already-activated values
        """

        # Number of correctly predicted samples
        self.num_correct = 0
        # Total number of samples
        self.num_count = 0

        self.is_logist = is_logist

    def update(self, outputs, labels):
        """
        Arguments:
           - outputs: predictions, shape=[N, class_num]
           - labels: ground-truth labels, shape=[N, 1]
        """

        # Binary classification when shape[1] == 1, multi-class when shape[1] > 1
        if outputs.shape[1] == 1:  # binary classification
            outputs = torch.squeeze(outputs, dim=-1)
            if self.is_logist:
                # For logits, the predicted class is 1 when the value is >= 0
                preds = (outputs >= 0).float()
            else:
                # Otherwise treat the value as a probability and threshold at 0.5
                preds = (outputs >= 0.5).float()
        else:
            # For multi-class, take the index of the largest element as the predicted class
            preds = torch.argmax(outputs, dim=1).long()

        # Number of correct predictions in this batch
        labels = torch.squeeze(labels, dim=-1)
        batch_correct = (preds == labels).float().sum().item()
        batch_count = len(labels)

        # Update num_correct and num_count
        self.num_correct += batch_correct
        self.num_count += batch_count

    def accumulate(self):
        # Compute the overall metric from the accumulated counts
        if self.num_count == 0:
            return 0
        return self.num_correct / self.num_count

    def reset(self):
        # Reset the correct and total counts
        self.num_correct = 0
        self.num_count = 0

    def name(self):
        return "Accuracy"

RunnerV3.py

import torch


class RunnerV3(object):
    def __init__(self, model, optimizer, loss_fn, metric, **kwargs):
        self.model = model
        self.optimizer = optimizer
        self.loss_fn = loss_fn
        self.metric = metric  # only used to compute the evaluation metric

        # Record how the evaluation metric changes during training
        self.dev_scores = []

        # Record how the loss changes during training
        self.train_epoch_losses = []  # one loss value per epoch
        self.train_step_losses = []   # one loss value per step
        self.dev_losses = []

        # Record the best metric seen so far
        self.best_score = 0

    def train(self, train_loader, dev_loader=None, **kwargs):
        # Switch the model to training mode
        self.model.train()

        # Number of training epochs; defaults to 0 if not provided
        num_epochs = kwargs.get("num_epochs", 0)
        # Logging frequency; defaults to 100 if not provided
        log_steps = kwargs.get("log_steps", 100)
        # Evaluation frequency
        eval_steps = kwargs.get("eval_steps", 0)

        # Model save path; defaults to "best_model.pdparams" if not provided
        save_path = kwargs.get("save_path", "best_model.pdparams")

        custom_print_log = kwargs.get("custom_print_log", None)

        # Total number of training steps
        num_training_steps = num_epochs * len(train_loader)

        if eval_steps:
            if self.metric is None:
                raise RuntimeError('Error: Metric can not be None!')
            if dev_loader is None:
                raise RuntimeError('Error: dev_loader can not be None!')

        # Number of steps run so far
        global_step = 0
 
        # Train for num_epochs epochs
        for epoch in range(num_epochs):
            # Accumulate the training loss
            total_loss = 0
            for step, data in enumerate(train_loader):
                X, y = data
                # Get the model predictions
                logits = self.model(X)
                y = torch.tensor(y, dtype=torch.int64)
                loss = self.loss_fn(logits, y)  # mean reduction by default
                total_loss += loss

                # Save the loss of every step during training
                self.train_step_losses.append((global_step, loss.item()))

                if log_steps and global_step % log_steps == 0:
                    print(
                        f"[Train] epoch: {epoch}/{num_epochs}, step: {global_step}/{num_training_steps}, loss: {loss.item():.5f}")

                # Backpropagate to compute the gradient of every parameter
                loss.backward()

                if custom_print_log:
                    custom_print_log(self)

                # Update the parameters with mini-batch gradient descent
                self.optimizer.step()
                # Reset the gradients to zero
                self.optimizer.zero_grad()

                # Decide whether to evaluate
                if eval_steps > 0 and global_step > 0 and \
                        (global_step % eval_steps == 0 or global_step == (num_training_steps - 1)):

                    dev_score, dev_loss = self.evaluate(dev_loader, global_step=global_step)
                    print(f"[Evaluate]  dev score: {dev_score:.5f}, dev loss: {dev_loss:.5f}")

                    # Switch the model back to training mode
                    self.model.train()

                    # If the current metric is the best so far, save the model
                    if dev_score > self.best_score:
                        self.save_model(save_path)
                        print(
                            f"[Evaluate] best accuracy performence has been updated: {self.best_score:.5f} --> {dev_score:.5f}")
                        self.best_score = dev_score

                global_step += 1

            # Accumulated training loss of the current epoch
            trn_loss = (total_loss / len(train_loader)).item()
            # Save the epoch-level training loss
            self.train_epoch_losses.append(trn_loss)

        print("[Train] Training done!")
 
    # During evaluation, use torch.no_grad() so gradients are neither computed nor stored
    @torch.no_grad()
    def evaluate(self, dev_loader, **kwargs):
        assert self.metric is not None

        # Switch the model to evaluation mode
        self.model.eval()

        global_step = kwargs.get("global_step", -1)

        # Accumulate the validation loss
        total_loss = 0

        # Reset the metric
        self.metric.reset()

        # Iterate over every batch of the validation set
        for batch_id, data in enumerate(dev_loader):
            X, y = data

            # Compute the model output
            logits = self.model(X)
            y = torch.tensor(y, dtype=torch.int64)

            # Compute the loss
            loss = self.loss_fn(logits, y).item()
            # Accumulate the loss
            total_loss += loss

            # Accumulate the metric
            self.metric.update(logits, y)

        dev_loss = (total_loss / len(dev_loader))
        dev_score = self.metric.accumulate()

        # Record the validation loss
        if global_step != -1:
            self.dev_losses.append((global_step, dev_loss))
            self.dev_scores.append(dev_score)

        return dev_score, dev_loss

    # During prediction, use torch.no_grad() so gradients are neither computed nor stored
    @torch.no_grad()
    def predict(self, x, **kwargs):
        # Switch the model to evaluation mode
        self.model.eval()
        # Run the forward pass to get the predictions
        logits = self.model(x)
        return logits

    def save_model(self, save_path):
        torch.save(self.model.state_dict(), save_path)

    def load_model(self, model_path):
        model_state_dict = torch.load(model_path)
        self.model.load_state_dict(model_state_dict)


Origin blog.csdn.net/weixin_51395608/article/details/127727174