[Hands-on Deep Learning] Li Mu - Convolutional Neural Network

This article is my learning record for Teacher Li Mu's [Hands-On Deep Learning] course, covering the convolutional neural network chapter.

From fully connected layers to convolutions

Summary:

  • Translation invariance allows us to treat a local patch of an image the same way regardless of where it is located
  • Locality means that only a small neighborhood of pixels is needed to compute the corresponding hidden representation
  • In image processing, convolutional layers usually require far fewer parameters than fully connected layers while still achieving strong performance
  • A convolutional neural network (CNN) is a special type of neural network that can contain multiple convolutional layers
  • Multiple input and output channels allow the model to capture multi-faceted features of the image at each spatial location

Image convolution

import torch
from torch import nn
from d2l import torch as d2l


def corr2d(X, K):  # @save
    h, w = K.shape  # kernel height and width
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y


class Conv2D(nn.Module):
    def __init__(self, kernel_size):
        super().__init__()
        self.weight = nn.Parameter(torch.rand(kernel_size))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return corr2d(x, self.weight) + self.bias


X = torch.ones((6, 8))
X[:, 2:6] = 0  # a black band in a white image, giving two vertical edges
K = torch.tensor([[1.0, -1.0]])  # kernel that detects vertical edges
Y = corr2d(X, K)
print(Y)  # 1 marks a white-to-black transition, -1 a black-to-white one
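
The kernel can also be learned from data rather than hand-specified. The sketch below (following the book's approach; the learning rate and iteration count are my own choices) recovers the edge-detection kernel from the (X, Y) pair above using a built-in nn.Conv2d:

# Sketch: learn the kernel K from the (X, Y) pair with a built-in layer.
conv2d = nn.Conv2d(1, 1, kernel_size=(1, 2), bias=False)
X4 = X.reshape((1, 1, 6, 8))  # add batch and channel dimensions
Y4 = Y.reshape((1, 1, 6, 7))
lr = 3e-2  # learning rate (my own choice)
for i in range(10):
    Y_hat = conv2d(X4)
    l = (Y_hat - Y4) ** 2
    conv2d.zero_grad()
    l.sum().backward()
    conv2d.weight.data[:] -= lr * conv2d.weight.grad
print(conv2d.weight.data.reshape((1, 2)))  # approaches [1, -1]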

Padding and stride

The input image has shape $n_h \times n_w$ and the convolution kernel has shape $k_h \times k_w$; the output shape is then $(n_h - k_h + 1) \times (n_w - k_w + 1)$.

If we pad a total of $p_h$ rows and $p_w$ columns (divided evenly between top/bottom and left/right), the output shape becomes:
$(n_h - k_h + p_h + 1) \times (n_w - k_w + p_w + 1)$

If, in addition, the vertical stride is $s_h$ and the horizontal stride is $s_w$, the output shape is:
$\lfloor (n_h - k_h + p_h + s_h)/s_h \rfloor \times \lfloor (n_w - k_w + p_w + s_w)/s_w \rfloor$

import torch
from torch import nn

def comp_conv2d(conv2d, X):
    X = X.reshape((1, 1) + X.shape)  # add batch and channel dimensions, making the tensor 4-D
    Y = conv2d(X)
    return Y.reshape(Y.shape[2:])  # strip the batch and channel dimensions again

conv2d = nn.Conv2d(1, 1, kernel_size=(3, 5), padding=(0, 1), stride=(3, 4))
X = torch.rand(size=(8, 8))
print(comp_conv2d(conv2d, X).shape)  # torch.Size([2, 2])
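
As a quick check against the formula above: PyTorch's padding=(0, 1) pads each side, so the totals are $p_h = 0$ and $p_w = 2$, giving $\lfloor (8 - 3 + 0 + 3)/3 \rfloor \times \lfloor (8 - 5 + 2 + 4)/4 \rfloor = 2 \times 2$, matching the printed torch.Size([2, 2]).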

Summary

  • Padding can increase the height and width of the output; this is often used to make the output have the same height and width as the input
  • Stride can reduce the height and width of the output, for example shrinking them to only $\frac{1}{n}$ of the input's height and width
  • Padding and stride can be used to effectively adjust the dimensions of the data

Multiple input and output channels

For multiple input channels, the convolution kernel generally has the same number of channels as the input. The computation then proceeds channel by channel: the two-dimensional tensor of each input channel is cross-correlated with the two-dimensional kernel tensor of the corresponding channel, and the per-channel results are summed to give the value at that position of the single-channel output, as shown below:

(Figure: cross-correlation computation with two input channels)

For multiple output channels, each output channel can be regarded as a response to a different feature. Let $c_i$ and $c_o$ be the numbers of input and output channels, respectively. To produce multiple output channels, we create a kernel tensor of shape $c_i \times k_h \times k_w$ for each output channel, so the full kernel has shape $c_o \times c_i \times k_h \times k_w$.
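
A sketch of this computation, reusing the corr2d function defined earlier (this follows the book's approach):

import torch

def corr2d_multi_in(X, K):
    # sum the per-channel 2-D cross-correlations over all input channels
    return sum(corr2d(x, k) for x, k in zip(X, K))

def corr2d_multi_in_out(X, K):
    # compute one multi-input-channel result per output-channel kernel
    # and stack them along a new output-channel dimension
    return torch.stack([corr2d_multi_in(X, k) for k in K], 0)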

There is also a special convolutional layer: the $1 \times 1$ convolutional layer. Because its height and width are only 1, it cannot recognize interactions between adjacent elements in the height and width dimensions; its only computation occurs on the channel dimension. As shown below:

(Figure: the $1 \times 1$ convolutional layer; computation happens only across channels)

This convolutional layer keeps the height and width of the input unchanged while changing the number of channels, and each element of the output is a linear combination of the elements at the same position across the input channels. This means the $1 \times 1$ convolution can be viewed as a fully connected layer applied at every pixel: each input channel is an input node, and each channel of the kernel supplies the corresponding weight.

Therefore, $1 \times 1$ convolutional layers are usually used to adjust the number of channels between network layers and to control model complexity.
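
The fully connected view can be made literal: a $1 \times 1$ convolution is a matrix multiplication applied at every pixel. A sketch (following the book's approach):

import torch

def corr2d_multi_in_out_1x1(X, K):
    c_i, h, w = X.shape
    c_o = K.shape[0]
    X = X.reshape((c_i, h * w))  # every pixel becomes a column of channel values
    K = K.reshape((c_o, c_i))    # the "fully connected" weight matrix
    Y = torch.matmul(K, X)       # linear combination over channels at each pixel
    return Y.reshape((c_o, h, w))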

Pooling layer (aggregation layer)

The pooling layer can be used to mitigate the convolutional layer's excessive sensitivity to pixel position, as in the following example:

(Figure: a small shift of the input changes the convolution output)

Pooling comes in two forms: maximum pooling and average pooling.
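
For reference, a from-scratch version in the style of corr2d above (a sketch following the book's approach):

import torch

def pool2d(X, pool_size, mode='max'):
    p_h, p_w = pool_size
    Y = torch.zeros((X.shape[0] - p_h + 1, X.shape[1] - p_w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            if mode == 'max':
                Y[i, j] = X[i:i + p_h, j:j + p_w].max()
            elif mode == 'avg':
                Y[i, j] = X[i:i + p_h, j:j + p_w].mean()
    return Y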

The built-in implementation is:

pool2d = nn.MaxPool2d((2, 3), padding=(1, 1), stride=(2, 3))

When dealing with multiple channels, pooling is applied to each channel separately, so the number of output channels equals the number of input channels.
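
A small check of this behavior (example values are my own):

import torch
from torch import nn

X = torch.arange(16, dtype=torch.float32).reshape((1, 1, 4, 4))
X = torch.cat((X, X + 1), 1)  # build a two-channel input
pool2d = nn.MaxPool2d(3, padding=1, stride=2)
print(pool2d(X).shape)  # torch.Size([1, 2, 2, 2]): channels are preserved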

Summary

  • For a given input element, the max pooling layer outputs the maximum value within the window, and the average pooling layer outputs the average value within the window.
  • One of the main advantages of pooling layers is to alleviate the over-sensitivity of convolutional layers to position
  • The padding and stride of the pooling layer can be specified
  • Using a pooling layer with a stride greater than 1 reduces the spatial resolution of the output
  • The number of output channels of the pooling layer is the same as the number of input channels

Convolutional Neural Network (LeNet)

import torch
from matplotlib import pyplot as plt
from torch import nn
from d2l import torch as d2l


class Reshape(torch.nn.Module):
    def forward(self, x):
        return x.view(-1, 1, 28, 28)


net = nn.Sequential(
    Reshape(),
    nn.Conv2d(1, 6, kernel_size=5, padding=2),
    nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5),
    nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),
    nn.Sigmoid(),
    nn.Linear(120, 84),
    nn.Sigmoid(),
    nn.Linear(84, 10)
)
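
# Sketch (not in the original post): check each layer's output shape with a
# dummy input before training; the flatten size 16 * 5 * 5 can be verified this way.
X = torch.rand(size=(1, 1, 28, 28), dtype=torch.float32)
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape: \t', X.shape)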

# Load the dataset
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size=batch_size)


# Modified evaluation function that computes on the GPU
def evaluate_accuracy_gpu(net, data_iter, device=None):  # @save
    if isinstance(net, torch.nn.Module):
        net.eval()  # switch to evaluation mode
        if not device:  # if no device was given, use the device of net's parameters
            device = next(iter(net.parameters())).device

    metric = d2l.Accumulator(2)  # no. of correct predictions, no. of examples

    for X,y in data_iter:
        if isinstance(X, list):
            X = [x.to(device) for x in X]
        else:
            X = X.to(device)
        y = y.to(device)
        metric.add(d2l.accuracy(net(X),y), y.numel())
    return metric[0] / metric[1]


# Modified training function so that it can run on the GPU

def train_ch6(net, train_iter, test_iter, num_epochs, lr, device):  #@save
    def init_weights(m):
        if type(m) == nn.Linear or type(m) == nn.Conv2d:
            nn.init.xavier_uniform_(m.weight)
    net.apply(init_weights)
    print("training on:", device)
    net.to(device)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    loss = nn.CrossEntropyLoss()
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs],
                            legend=['train loss', 'train acc', 'test acc'])
    timer, num_batches = d2l.Timer(), len(train_iter)
    for epoch in range(num_epochs):
        metric = d2l.Accumulator(3)  # train loss, train acc, no. of examples
        net.train()  # switch to training mode
        for i, (X, y) in enumerate(train_iter):
            timer.start()  # start timing
            optimizer.zero_grad()  # clear the gradients
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            l.backward()
            optimizer.step()
            with torch.no_grad():
                metric.add(l * X.shape[0], d2l.accuracy(y_hat,y), X.shape[0])
            timer.stop()  # stop timing
            train_l = metric[0] / metric[2]
            train_acc = metric[1] / metric[2]
            if (i+1) % (num_batches // 5) == 0 or i==num_batches-1:
                animator.add(epoch + (i+ 1) / num_batches,
                             (train_l, train_acc ,None))
        test_acc = evaluate_accuracy_gpu(net, test_iter)
        animator.add(epoch+1, (None, None, test_acc))
    print(f'loss {train_l:.3f}, train acc {train_acc:.3f}, '
          f'test acc {test_acc:.3f}')
    print(f'{metric[2] * num_epochs / timer.sum():.1f} examples/sec '
          f'on {str(device)}')


lr, num_epochs = 0.5, 20
train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())
plt.show()


loss 0.417, train acc 0.847, test acc 0.836
36144.960085 examples/sec on cuda:0

Summary

  • A convolutional neural network is a type of network that uses convolutional layers
  • In a convolutional neural network, convolutional layers, nonlinear activation functions, and pooling layers are used in combination
  • To construct high-performance CNNs, we typically order the convolutional layers so that the spatial resolution of the representation gradually decreases while the number of channels increases
  • In traditional convolutional neural networks, the representation produced by the convolutional blocks is processed by one or more fully connected layers before output
  • LeNet was one of the first convolutional neural networks to be published

Deep convolutional neural network (AlexNet)

import torch
from matplotlib import pyplot as plt
from torch import nn
from d2l import torch as d2l

net = nn.Sequential(
    nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Linear(6400, 4096),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 10)
)
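
# Sketch (not in the original post): inspect per-layer output shapes with a
# dummy 224x224 input; this verifies the 6400-unit input of the first Linear layer.
X = torch.randn(1, 1, 224, 224)
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:\t', X.shape)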

batch_size = 128
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)
# load the data, resizing each image to 224x224

lr, num_epochs = 0.01, 10
d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())
plt.show()

After running for a long time:


loss 0.328, train acc 0.881, test acc 0.881
666.9 examples/sec on cuda:0

Network using blocks (VGG)

VGG follows the idea of AlexNet but combines multiple convolutional layers and a pooling layer into a block. You can then specify the number of convolutional layers in each block, as well as the number of blocks; image features are extracted by the stacked blocks and then passed through the fully connected layers.

The VGG block contains the following content:

  • Multiple convolutional layers with padding to preserve resolution
  • Each convolutional layer is followed by a nonlinear activation function
  • A final max-pooling layer

The specific code is as follows:

import torch
from matplotlib import pyplot as plt
from torch import nn
from d2l import torch as d2l


def vgg_block(num_convs, in_channels, out_channels):
    # create a single VGG block
    layers = []
    for _ in range(num_convs):
        layers.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        layers.append(nn.ReLU())
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)


def vgg(conv_arch):
    conv_blks = []
    in_channels = 1
    # build the convolutional blocks
    for (num_convs, out_channels) in conv_arch:
        conv_blks.append(vgg_block(num_convs, in_channels, out_channels))
        in_channels = out_channels

    return nn.Sequential(
        *conv_blks,
        nn.Flatten(),
        nn.Linear(out_channels * 7 * 7, 4096),
        nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(4096, 4096),
        nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(4096, 10)
    )


conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
# each pair: (number of conv layers in the block, number of output channels)

ratio = 4
small_conv_arch = [(pair[0], pair[1] // ratio) for pair in conv_arch]
# divide the channel counts by ratio to shrink the network
net = vgg(small_conv_arch)

lr, num_epochs, batch_size = 0.05, 10, 128
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)
d2l.train_ch6(net,train_iter, test_iter, num_epochs, lr, d2l.try_gpu())

plt.show()


loss 0.170, train acc 0.936, test acc 0.912
378.0 examples/sec on cuda:0

Summary

  • VGG-11 uses reusable convolutional blocks to construct the network. Different VGG models can be defined by the difference in the number of convolutional layers and the number of output channels in each block.
  • The use of blocks results in very concisely defined networks, and complex networks can be designed efficiently using blocks
  • The study found that deep and narrow convolutions (e.g., many $3 \times 3$ layers) are more effective than shallower and wider ones (e.g., fewer $5 \times 5$ layers)

Network in Network (NiN)

The networks above share a common feature: they all process the final feature representation through fully connected layers, which introduces a very large number of parameters. NiN aims to replace the fully connected layers with another module: the $1 \times 1$ convolutional layer. A NiN block therefore consists of one ordinary convolutional layer followed by two $1 \times 1$ convolutional layers. After several NiN blocks, the number of channels is expanded to the desired number of output classes, and a global average pooling layer then averages each channel down to a single scalar. This yields one value per output class, which can be fed directly into softmax.

import torch
from matplotlib import pyplot as plt
from torch import nn
from d2l import torch as d2l


def nin_block(in_channels, out_channels, kernel_size, strides, padding):
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size, strides, padding),
        # the first conv layer sets the target channel count and spatial size
        nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1),
        nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1),
        nn.ReLU()  # the two 1x1 conv layers change neither size nor channel count
    )


net = nn.Sequential(
    nin_block(1, 96, kernel_size=11, strides=4, padding=0),
    nn.MaxPool2d(3, stride=2),  # halves height and width
    nin_block(96, 256, kernel_size=5, strides=1, padding=2),
    nn.MaxPool2d(3,stride=2),
    nin_block(256, 384, kernel_size=3, strides=1, padding=1),
    nn.MaxPool2d(3,stride=2),
    nn.Dropout(p=0.5),
    # there are 10 label classes, so the last block outputs 10 channels
    nin_block(384, 10, kernel_size=3, strides=1, padding=1),
    nn.AdaptiveAvgPool2d((1,1)),
    nn.Flatten()  # flatten the 4-D tensor to 2-D: (batch size, number of output channels)
)

lr, num_epochs, batch_size = 0.1, 10, 128
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)
d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())
plt.show()


loss 0.383, train acc 0.857, test acc 0.847
513.3 examples/sec on cuda:0

Summary

  • NiN uses blocks composed of one ordinary convolutional layer and multiple $1 \times 1$ convolutional layers; such blocks can be used in convolutional neural networks to add more per-pixel nonlinearity
  • NiN removes the fully connected layers, which are prone to overfitting, and replaces them with a global average pooling layer whose channel count equals the desired number of outputs
  • Removing the fully connected layers reduces overfitting and significantly reduces the number of parameters

Network with parallel connections (GoogLeNet)

A problem with all of the networks above is that the hyperparameters of each convolutional layer can differ, and deep networks are so hard to interpret that it is difficult to say which convolutional-layer hyperparameters are best. GoogLeNet therefore introduces the Inception block, which brings in the idea of parallel computation: several common convolutional layers with different hyperparameters are placed side by side, in the hope that extracting features in multiple ways will give the most ideal feature-extraction result, as shown below:

(Figure: the Inception block extracts features along several parallel paths)

Its specific structure is:

(Figure: the full GoogLeNet architecture)

import torch
from matplotlib import pyplot as plt
from torch import nn
from torch.nn import functional as F
from d2l import torch as d2l


class Inception(nn.Module):
    def __init__(self, in_channels, c1, c2, c3, c4, **kwargs):
        super(Inception, self).__init__(**kwargs)
        # path 1: a single 1x1 conv layer
        self.p1_1 = nn.Conv2d(in_channels, c1, kernel_size=1)
        # path 2: 1x1 conv layer followed by a 3x3 conv layer
        self.p2_1 = nn.Conv2d(in_channels, c2[0], kernel_size=1)
        self.p2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1)
        # path 3: 1x1 conv layer followed by a 5x5 conv layer
        self.p3_1 = nn.Conv2d(in_channels, c3[0], kernel_size=1)
        self.p3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2)
        # path 4: 3x3 max-pooling followed by a 1x1 conv layer
        self.p4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        self.p4_2 = nn.Conv2d(in_channels, c4, kernel_size=1)

    def forward(self, x):
        p1 = F.relu(self.p1_1(x))
        p2 = F.relu(self.p2_2(F.relu(self.p2_1(x))))
        p3 = F.relu(self.p3_2(F.relu(self.p3_1(x))))
        p4 = F.relu(self.p4_2(self.p4_1(x)))
        # concatenate the four outputs along the channel dimension
        return torch.cat((p1, p2, p3, p4), dim=1)


b1 = nn.Sequential(
    nn.Conv2d(1,64, kernel_size=7, stride=2, padding=3),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
)

b2 = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=1),
    nn.ReLU(),
    nn.Conv2d(64, 192, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
)

b3 = nn.Sequential(
    Inception(192,64,(96,128),(16,32),32),
    Inception(256,128,(128,192),(32,96),64),
    nn.MaxPool2d(kernel_size=3,stride=2,padding=1)
)

b4 = nn.Sequential(
    Inception(480, 192, (96,208),(16,48), 64),
    Inception(512, 160, (112,224),(24,64), 64),
    Inception(512,128,(128,256),(24,64),64),
    Inception(512,112, (144,288),(32,64), 64),
    Inception(528, 256, (160,320),(32,128),128),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
)

b5 = nn.Sequential(
    Inception(832,256, (160,320),(32,128),128),
    Inception(832, 384, (192,384), (48,128),128),
    nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten()
)

net = nn.Sequential(
    b1,b2,b3,b4,b5,nn.Linear(1024,10)
)

lr, num_epochs, batch_size = 0.05, 10, 128
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=96)
d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())
plt.show()
"""
x = torch.rand(size=(1,1,96,96))
for layer in net:
    x = layer(x)
    print(layer.__class__.__name__, 'output shape \t', x.shape)
"""


loss 0.284, train acc 0.891, test acc 0.884
731.9 examples/sec on cuda:0

Summary

  • The Inception block is equivalent to a sub-network with four paths. It extracts information in parallel through convolutional layers of different window shapes and a max-pooling layer, and uses $1 \times 1$ convolutional layers to reduce the per-pixel channel dimensionality, thereby reducing model complexity
  • GoogLeNet connects multiple carefully designed Inception blocks with other layers (convolutional layers, fully connected layers); the channel allocation ratios of the Inception blocks were obtained through extensive experiments on the ImageNet dataset
  • GoogLeNet and its successors were once among the most efficient models on ImageNet: they provided similar test accuracy at lower computational complexity

Batch normalization

During training, the gradients of the later layers are usually relatively large, while the gradients of the earlier layers become small through the repeated multiplications of backpropagation. With a fixed learning rate, the earlier layers therefore update more slowly than the later ones; and once the later layers are nearly converged, any change in the earlier layers forces the later layers to be updated all over again.

The idea of batch normalization is to apply it after each convolutional or linear layer, normalizing the output toward a certain distribution (different layers learn different distributions, each learned separately). Constraining the outputs to a desired distribution makes convergence faster.

Assume the samples of the current mini-batch $B$ are $\pmb{x} = (x_1, x_2, ..., x_n)$. Then:
$$\hat{\mu}_B = \frac{1}{\vert B \vert}\sum_{i\in B} x_i$$
$$\hat{\sigma}^2_B = \frac{1}{\vert B \vert}\sum_{i\in B}(x_i - \hat{\mu}_B)^2 + \epsilon \quad (\epsilon \text{ prevents the variance from being } 0)$$
$$BN(x_i) = \gamma\,\frac{x_i - \hat{\mu}_B}{\hat{\sigma}_B} + \beta$$
Here $\gamma$ and $\beta$ can be regarded as the scale and mean of the target distribution; they are two parameters to be learned.

Research suggests that its effect may be to control model complexity by adding noise to each mini-batch: since mini-batches are drawn randomly, their means and variances differ, which is equivalent to adding a random offset $\hat{\mu}_B$ and a random scaling $\hat{\sigma}_B$ to the batch. For this reason there is generally no need to combine batch normalization with dropout.

It can be applied to the output of the fully connected layer and the convolutional layer, before the activation function, and can also be applied to the input of the fully connected layer and the convolutional layer:

  • For a fully connected layer, it acts on the feature dimension
  • For a convolutional layer, it acts on the channel dimension

Also, when using batch normalization during training, we need to record, at every place where it is applied, running estimates of the mean and variance over the whole dataset, so that samples can be normalized with these statistics at prediction time.

import torch
from matplotlib import pyplot as plt
from torch import nn
from d2l import torch as d2l


def batch_norm(X, gamma, beta, moving_mean, moving_var, eps, momentum):
    if not torch.is_grad_enabled():  # gradients disabled: we are in prediction mode
        X_hat = (X - moving_mean) / torch.sqrt(moving_var + eps)  # eps prevents division by zero
        # moving_mean / moving_var are the running estimates of the dataset statistics
    else:
        assert len(X.shape) in (2, 4)  # 2 dims: fully connected layer; 4 dims: convolutional layer
        if len(X.shape) == 2:
            mean = X.mean(dim=0)
            var = ((X - mean) ** 2).mean(dim=0)
        else:
            mean = X.mean(dim=(0, 2, 3), keepdim=True)
            # each channel is a separate feature extracted from the image,
            # so compute the mean and variance per channel
            var = ((X - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)
        # training mode: normalize with the batch statistics
        X_hat = (X - mean) / torch.sqrt(var + eps)
        moving_mean = momentum * moving_mean + (1.0 - momentum) * mean
        moving_var = momentum * moving_var + (1.0 - momentum) * var
    Y = gamma * X_hat + beta
    return Y, moving_mean.data, moving_var.data


class BatchNorm(nn.Module):
    def __init__(self,num_features, num_dims):
        super().__init__()
        if num_dims == 2:
            shape = (1, num_features)
        else:
            shape = (1, num_features, 1, 1)
        self.gamma = nn.Parameter(torch.ones(shape))
        self.beta = nn.Parameter(torch.zeros(shape))
        self.moving_mean = torch.zeros(shape)
        self.moving_var = torch.ones(shape)

    def forward(self, X):
        if self.moving_mean.device != X.device:
            self.moving_mean = self.moving_mean.to(X.device)
            self.moving_var = self.moving_var.to(X.device)
        Y,self.moving_mean, self.moving_var = batch_norm(X,self.gamma, self.beta, self.moving_mean,
                                                         self.moving_var, eps=1e-5, momentum=0.9)
        return Y


net = nn.Sequential(nn.Conv2d(1, 6, kernel_size=5),
                    BatchNorm(6, num_dims=4),
                    nn.Sigmoid(),
                    nn.MaxPool2d(kernel_size=2, stride=2),
                    nn.Conv2d(6, 16,kernel_size=5),
                    BatchNorm(16, num_dims=4),
                    nn.Sigmoid(),
                    nn.MaxPool2d(kernel_size=2, stride=2),
                    nn.Flatten(),
                    nn.Linear(16 * 4 * 4, 120),
                    BatchNorm(120, num_dims=2),
                    nn.Sigmoid(),
                    nn.Linear(120, 84),
                    BatchNorm(84, num_dims=2),
                    nn.Sigmoid(),
                    nn.Linear(84, 10))

lr, num_epochs, batch_size = 1.0, 10 ,256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
d2l.train_ch6(net,train_iter, test_iter, num_epochs, lr, d2l.try_gpu())
plt.show()

Insert image description here

loss 0.251, train acc 0.908, test acc 0.883
17375.8 examples/sec on cuda:0

PyTorch's nn module also provides a concise built-in implementation:

net = nn.Sequential(nn.Conv2d(1, 6, kernel_size=5),
                    nn.BatchNorm2d(6),
                    nn.Sigmoid(),
                    nn.MaxPool2d(kernel_size=2, stride=2),
                    nn.Conv2d(6, 16,kernel_size=5),
                    nn.BatchNorm2d(16),
                    nn.Sigmoid(),
                    nn.MaxPool2d(kernel_size=2, stride=2),
                    nn.Flatten(),
                    nn.Linear(16 * 4 * 4, 120),
                    nn.BatchNorm1d(120),  # BatchNorm1d here: the input after nn.Linear is 2-D
                    nn.Sigmoid(),
                    nn.Linear(120, 84),
                    nn.BatchNorm1d(84),
                    nn.Sigmoid(),
                    nn.Linear(84, 10))

Summary

  • During the model training process, batch normalization uses the mean and standard deviation of the small batch to continuously adjust the intermediate output of the neural network, making the intermediate output of each layer of the entire neural network more stable.
  • The use of batch normalization is slightly different in fully connected layers and convolutional layers. You need to pay attention to the dimensions of the effect.
  • Like dropout, batch normalization behaves differently in training mode and prediction mode
  • Batch normalization has many beneficial side effects, mainly regularization

Residual Network (ResNet)

A question we need to discuss is: can adding more layers keep improving accuracy?

(Figure: adding layers only guarantees improvement when the larger function class contains the smaller one)

ResNet builds on this idea: every added layer should be able to represent the identity function, so that a deeper model is at least as expressive as a shallower one. Its most concrete manifestation is the residual block:

(Figure: a regular block versus a residual block)

To add the block's input to its output, the two must have the same dimensions so they can be added directly. If the dimensions are changed inside the block, the input's dimensions must first be transformed (for example with a $1 \times 1$ convolution) before the addition:

(Figure: a residual block with a 1x1 convolution on the shortcut path)

Generally, the input is first processed by several ResNet blocks that halve the height and width, followed by blocks that keep the height and width unchanged; reducing the resolution early cuts the computation required by later feature extraction:

(Figure: ResNet blocks with and without downsampling)

Then the overall architecture is:

(Figure: the overall ResNet architecture)

So the code is:

import torch
from matplotlib import pyplot as plt
from torch import nn
from torch.nn import functional as F
from d2l import torch as d2l

class Residual(nn.Module):  #@save
    def __init__(self, input_channels, num_channels,
                 use_1x1conv=False, strides=1):
        super().__init__()
        self.conv1 = nn.Conv2d(input_channels, num_channels,
                               kernel_size=3, padding=1, stride=strides)
        self.conv2 = nn.Conv2d(num_channels, num_channels,
                               kernel_size=3, padding=1)
        if use_1x1conv:
            self.conv3 = nn.Conv2d(input_channels, num_channels,
                                   kernel_size=1, stride=strides)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(num_channels)
        self.bn2 = nn.BatchNorm2d(num_channels)

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3:
            X = self.conv3(X)
        Y += X
        return F.relu(Y)


# the first block is essentially the same in most convolutional neural networks
b1 = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
                   nn.BatchNorm2d(64), nn.ReLU(),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

def resnet_block(input_channels, num_channels, num_residuals,first_block=False):
    blk = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            blk.append(Residual(input_channels, num_channels,use_1x1conv=True, strides=2))
        else:
            blk.append(Residual(num_channels, num_channels))
    return blk

b2 = nn.Sequential(*resnet_block(64,64,2,first_block=True))
b3 = nn.Sequential(*resnet_block(64,128,2))
b4 = nn.Sequential(*resnet_block(128,256,2))
b5 = nn.Sequential(*resnet_block(256,512,2))
# the * unpacks the list returned by resnet_block into separate arguments rather than a single list
net = nn.Sequential(
    b1,b2,b3,b4,b5,
    nn.AdaptiveAvgPool2d((1,1)),
    nn.Flatten(),
    nn.Linear(512,10)
)

"""
X = torch.rand(size=(1, 1, 224, 224))
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__,'output shape:\t', X.shape)
"""
lr, num_epochs, batch_size = 0.05, 10, 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=96)
d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())
plt.show()


loss 0.014, train acc 0.996, test acc 0.914
883.9 examples/sec on cuda:0

Teacher Li Mu later added a section about the gradient calculation of ResNet, as follows:
Assume $y = f(x)$; the weight update is $w = w - \lambda \frac{\partial y}{\partial w}$.

Now suppose a module is stacked on top, $y' = g(y) = g(f(x))$. The derivative of the output with respect to the parameters is
$$\frac{\partial y'}{\partial w} = \frac{\partial g(y)}{\partial y}\,\frac{\partial y}{\partial w}$$
If $g$ is a layer with strong learning ability (such as a fully connected layer), it quickly gets close to the true output, and $\frac{\partial g(y)}{\partial y}$ becomes small. This makes $\frac{\partial y'}{\partial w}$ small, so the $f(x)$ layer updates very slowly. The core problem is the multiplication: if any factor in the chain is small, the gradient vanishes.

ResNet instead uses a residual connection, $y' = f(x) + g(f(x))$, so that
$$\frac{\partial y'}{\partial w} = \frac{\partial y}{\partial w} + \frac{\partial g(y)}{\partial y}\,\frac{\partial y}{\partial w}$$
Even if the second term is small, the first term still provides a large gradient. The vanishing-gradient problem is thus avoided, and the layers close to the data can still be updated.
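
The effect is easy to see numerically. In the toy sketch below (entirely my own illustration, not from the course), a later module g with a small local gradient almost kills the gradient of w in the plain composition, while the residual form keeps it large:

import torch

x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(0.5, requires_grad=True)
y = w * x                   # y = f(x)
g = 0.01 * y                # a later module whose local gradient is small (0.01)
plain = g                   # y' = g(f(x))
residual = y + g            # y' = f(x) + g(f(x))
grad_plain = torch.autograd.grad(plain, w, retain_graph=True)[0]
grad_residual = torch.autograd.grad(residual, w)[0]
print(grad_plain, grad_residual)  # tensor(0.0200) vs tensor(2.0200)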

Image Classification Competition

For this competition I first ran the ResNet11 that Teacher Li Mu taught in class, and the result reached a little above 0.8. The specific code is below:

# import the packages
import torch
import torch.nn as nn
import pandas as pd
import numpy as np
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image
import os
from d2l import torch as d2l
import matplotlib.pyplot as plt
from LeavesDataset import LeavesDataset  # the data loader, defined in a separate file (see below)

First, we need to process the labels, converting each label string to the corresponding category number and building a mapping between the two for later use:

label_dataorgin = pd.read_csv("dataset/classify-leaves/train.csv")  # read the csv file
leaves_labels = sorted(list(set(label_dataorgin['label'])))  # deduplicate the label column, then sort
num_class = len(leaves_labels)  # total number of classes
class_to_num = dict(zip(leaves_labels, range(num_class)))  # dict: class name -> number
num_to_class = {i: j for j, i in class_to_num.items()}  # number -> class name

The next step is to write our data loader. I found a problem: if the data loader and the main code are written in the same file, an error is raised later when the d2l training function is called, complaining that the definition of the data loader cannot be found. So the loader must be defined in a separate file and imported; I defined it in a separate file, LeavesDataset.py:

class LeavesDataset(Dataset):
    def __init__(self, csv_path, file_path, mode='train', valid_ratio=0.2,
                 resize_height=256, resize_width=256):
        self.resize_height = resize_height  # target height after resizing
        self.resize_width = resize_width  # target width after resizing

        self.file_path = file_path  # directory containing the images
        self.mode = mode  # 'train', 'valid' or 'test'

        self.data_csv = pd.read_csv(csv_path, header=None)  # read the csv, keeping the header row as data
        self.dataLength = len(self.data_csv.index) - 1  # number of data rows
        self.trainLength = int(self.dataLength * (1 - valid_ratio))  # length of the training split

        if mode == 'train':
            # training mode
            self.train_images = np.asarray(self.data_csv.iloc[1:self.trainLength, 0])  # column 0: image names
            self.train_labels = np.asarray(self.data_csv.iloc[1:self.trainLength, 1])  # column 1: image labels
            self.image_arr = self.train_images
            self.label_arr = self.train_labels
        elif mode == 'valid':
            self.valid_images = np.asarray(self.data_csv.iloc[self.trainLength:, 0])
            self.valid_labels = np.asarray(self.data_csv.iloc[self.trainLength:, 1])
            self.image_arr = self.valid_images
            self.label_arr = self.valid_labels
        elif mode == 'test':
            self.test_images = np.asarray(self.data_csv.iloc[1:, 0])  # the test set has no label column
            self.image_arr = self.test_images

        self.realLen_now = len(self.image_arr)

        print("{} mode: data loading complete, {} samples".format(mode, self.realLen_now))

    def __getitem__(self, index):
        image_name = self.image_arr[index]  # file name of this sample

        img = Image.open(os.path.join(self.file_path, image_name))  # join to get the full path of the image
        transform = transforms.Compose([
            transforms.Resize((224, 224)),  # resize to 224x224
            transforms.ToTensor()
        ])
        img = transform(img)

        if self.mode == 'test':
            return img
        else:
            label = self.label_arr[index]
            number_label = class_to_num[label]

            return img, number_label

    def __len__(self):
        return self.realLen_now

Then the next step is to load each data set:

train_path = "dataset/classify-leaves/train.csv"  # adjust to your own paths
test_path = "dataset/classify-leaves/test.csv"
img_path = "dataset/classify-leaves/"

train_dataset = LeavesDataset(train_path, img_path, mode = 'train')
valid_dataset = LeavesDataset(train_path, img_path, mode = 'valid')
test_dataset = LeavesDataset(test_path, img_path, mode = 'test')
batch_size = 64  # reduce this if GPU memory is insufficient

train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=False, num_workers=5)  # no shuffling; 5 worker processes
valid_loader = DataLoader(dataset=valid_dataset,batch_size=batch_size, shuffle=False,num_workers=5)
test_loader = DataLoader(dataset=test_dataset,batch_size=batch_size, shuffle=False,num_workers=5)
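
A quick sanity check that the loaders produce what we expect (a sketch; it assumes the dataset paths above exist on your machine):

imgs, labels = next(iter(train_loader))
print(imgs.shape)   # expected: torch.Size([64, 3, 224, 224])
print(labels[:5])   # the first few numeric labels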

After getting the data, the next step is to define the model. I first used ResNet11:

b1 = nn.Sequential(nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),  # 3 input channels: the leaf images are RGB
                   nn.BatchNorm2d(64), nn.ReLU(),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

def resnet_block(input_channels, num_channels, num_residuals,first_block=False):
    blk = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            blk.append(d2l.Residual(input_channels, num_channels,use_1x1conv=True, strides=2))
        else:
            blk.append(d2l.Residual(num_channels, num_channels))
    return blk

b2 = nn.Sequential(*resnet_block(64,64,2,first_block=True))
b3 = nn.Sequential(*resnet_block(64,128,2))
b4 = nn.Sequential(*resnet_block(128,256,2))
b5 = nn.Sequential(*resnet_block(256,512,2))

net = nn.Sequential(
    b1,b2,b3,b4,b5,
    nn.AdaptiveAvgPool2d((1,1)),
    nn.Flatten(),
    nn.Linear(512,176)
)

Since I want to save the model whenever it reaches the required accuracy, I modified the training function:

def train_ch6_save(net, train_iter, test_iter, num_epochs, lr, device, best_acc):  #@save
    """Train a model with a GPU (defined in Chapter 6).

    Defined in :numref:`sec_lenet`"""
    def init_weights(m):
        if type(m) == nn.Linear or type(m) == nn.Conv2d:
            nn.init.xavier_uniform_(m.weight)
    net.apply(init_weights)
    print('training on', device)
    net.to(device)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    loss = nn.CrossEntropyLoss()
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs],
                            legend=['train loss', 'train acc', 'test acc'])
    timer, num_batches = d2l.Timer(), len(train_iter)
    for epoch in range(num_epochs):
        # Sum of training loss, sum of training accuracy, no. of examples
        metric = d2l.Accumulator(3)
        net.train()
        for i, (X, y) in enumerate(train_iter):
            timer.start()
            optimizer.zero_grad()
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            l.backward()
            optimizer.step()
            with torch.no_grad():
                metric.add(l * X.shape[0], d2l.accuracy(y_hat, y), X.shape[0])
            timer.stop()
            train_l = metric[0] / metric[2]
            train_acc = metric[1] / metric[2]
            if (i + 1) % (num_batches // 5) == 0 or i == num_batches - 1:
                animator.add(epoch + (i + 1) / num_batches,
                             (train_l, train_acc, None))
        test_acc = d2l.evaluate_accuracy_gpu(net, test_iter)
        animator.add(epoch + 1, (None, None, test_acc))

    print(f'loss {train_l:.3f}, train acc {train_acc:.3f}, '
          f'test acc {test_acc:.3f}')
    print(f'{metric[2] * num_epochs / timer.sum():.1f} examples/sec '
          f'on {str(device)}')
    if test_acc > best_acc:
        print("Model accuracy is high enough; saving!")
        torch.save(net.state_dict(), "Now_Best_Module.pth")
    else:
        print("Model accuracy is not high enough; not saving")

lr, num_epochs, best_acc = 0.05, 25, 0.8  # too few epochs would leave training incomplete
train_ch6_save(net, train_loader, valid_loader, num_epochs, lr, device=d2l.try_gpu(), best_acc=best_acc)
plt.show()

The result is:

(Figure: training curves; the validation accuracy reached a little above 0.8)

Then, hoping to raise model capacity by increasing the depth of ResNet, I tried a ResNet50 model found online, but it was too big: after loading the model and then the data, even a small batch_size exhausted the GPU memory, so the model could only be scaled down:

b2 = nn.Sequential(*resnet_block(64,64,2,first_block=True))
b3 = nn.Sequential(*resnet_block(64,256,2))
b4 = nn.Sequential(*resnet_block(256,512,2))
b5 = nn.Sequential(*resnet_block(512,2048,3))

net = nn.Sequential(
    b1,b2,b3,b4,b5,
    nn.AdaptiveAvgPool2d((1,1)),
    nn.Flatten(),
    nn.Linear(2048,176)
)

After running for five hours, the result was overfitting...

loss 0.014, train acc 0.996, test acc 0.764
31.6 examples/sec on cuda:0

In the end it took a whole day of debugging several models, yet none was as effective as the original ResNet11, so I finally decided to stick with it.

So the complete code is:

# import the packages
import torch
import torch.nn as nn
import pandas as pd
import numpy as np
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image
import os
from d2l import torch as d2l
import matplotlib.pyplot as plt
from tqdm import tqdm

from LeavesDataset import LeavesDataset

def resnet_block(input_channels, num_channels, num_residuals, first_block=False):  # helper used in the ResNet definition
    blk = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            blk.append(d2l.Residual(input_channels, num_channels, use_1x1conv=True, strides=2))
        else:
            blk.append(d2l.Residual(num_channels, num_channels))
    return blk


def train_ch6_save(net, train_iter, test_iter, num_epochs, lr, device, best_acc):  # @save
    """Train a model with a GPU (defined in Chapter 6).
    Modified from the teacher's training function so that the model can be saved after training.
    Defined in :numref:`sec_lenet`"""

    def init_weights(m):
        if type(m) == nn.Linear or type(m) == nn.Conv2d:
            nn.init.xavier_uniform_(m.weight)

    net.apply(init_weights)
    print('training on', device)
    net.to(device)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    loss = nn.CrossEntropyLoss()
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs],
                            legend=['train loss', 'train acc', 'test acc'])
    timer, num_batches = d2l.Timer(), len(train_iter)
    for epoch in range(num_epochs):
        # Sum of training loss, sum of training accuracy, no. of examples
        metric = d2l.Accumulator(3)
        net.train()
        for i, (X, y) in enumerate(train_iter):
            timer.start()
            optimizer.zero_grad()
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            l.backward()
            optimizer.step()
            with torch.no_grad():
                metric.add(l * X.shape[0], d2l.accuracy(y_hat, y), X.shape[0])
            timer.stop()
            train_l = metric[0] / metric[2]
            train_acc = metric[1] / metric[2]
            if (i + 1) % (num_batches // 5) == 0 or i == num_batches - 1:
                animator.add(epoch + (i + 1) / num_batches,
                             (train_l, train_acc, None))
        test_acc = d2l.evaluate_accuracy_gpu(net, test_iter)
        animator.add(epoch + 1, (None, None, test_acc))

    print(f'loss {train_l:.3f}, train acc {train_acc:.3f}, '
          f'test acc {test_acc:.3f}')
    print(f'{metric[2] * num_epochs / timer.sum():.1f} examples/sec '
          f'on {str(device)}')
    if test_acc > best_acc:
        print("Model accuracy is high enough; saving!")
        torch.save(net.state_dict(), "Now_Best_Module.pth")  # save the model
    else:
        print("Model accuracy is not high enough; not saving")


if __name__ == "__main__":  # the executable code must go under this guard or an error is raised (likely because the DataLoader worker processes re-import this module)
    label_dataorgin = pd.read_csv("dataset/classify-leaves/train.csv")  # read the training csv
    leaves_labels = sorted(list(set(label_dataorgin['label'])))  # deduplicate and sort the label column
    num_class = len(leaves_labels)  # number of classes
    class_to_num = dict(zip(leaves_labels, range(num_class)))  # convert to a dict
    num_to_class = {i: j for j, i in class_to_num.items()}

    train_path = "dataset/classify-leaves/train.csv"
    test_path = "dataset/classify-leaves/test.csv"
    img_path = "dataset/classify-leaves/"
    submission_path = "dataset/classify-leaves/submission.csv"  # path of the final submission file
    train_dataset = LeavesDataset(train_path, img_path, mode='train')
    valid_dataset = LeavesDataset(train_path, img_path, mode='valid')
    test_dataset = LeavesDataset(test_path, img_path, mode='test')
    #print("数据载入完成")
    batch_size = 64
    train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=False, num_workers=5)
    valid_loader = DataLoader(dataset=valid_dataset, batch_size=batch_size, shuffle=False, num_workers=5)
    test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False, num_workers=5)
    #print("数据已变换为loader")

    # define the model
    # the first block is essentially the same in most convolutional neural networks
    b1 = nn.Sequential(nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
                       nn.BatchNorm2d(64), nn.ReLU(),
                       nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
    b2 = nn.Sequential(*resnet_block(64, 64, 2, first_block=True))
    b3 = nn.Sequential(*resnet_block(64, 128, 2))
    b4 = nn.Sequential(*resnet_block(128, 256, 2))
    b5 = nn.Sequential(*resnet_block(256, 512, 2))

    net = nn.Sequential(
        b1, b2, b3, b4, b5,
        nn.AdaptiveAvgPool2d((1, 1)),
        nn.Flatten(),
        nn.Linear(512, 176)
    )
    lr, num_epochs, best_acc = 0.02, 15, 0.85
    device = d2l.try_gpu()
    train_ch6_save(net, train_loader, valid_loader, num_epochs, lr, device=device, best_acc=best_acc)
    plt.show()

    # make predictions
    net.load_state_dict(torch.load("Now_Best_Module.pth"))  # load the saved model
    # print("model loaded")
    net.to(device)
    net.eval()  # switch to evaluation mode
    predictions = []  # holds the numeric class predictions
    for i, data in enumerate(test_loader):
        imgs = data.to(device)
        with torch.no_grad():
            logits = net(imgs)  # the output is a vector of length 176 per image
        predictions.extend(logits.argmax(dim=-1).cpu().numpy().tolist())
        # take the argmax as the prediction, move it back to the cpu,
        # and convert to a list so it can extend predictions
    preds = []
    for i in predictions:
        preds.append(num_to_class[i])  # convert back to label strings
    test_csv = pd.read_csv(test_path)
    test_csv['label'] = pd.Series(preds)  # add the results as a new column
    submission = pd.concat([test_csv['image'], test_csv['label']], axis=1)  # concatenate
    submission.to_csv(submission_path, index=False)  # write the submission file

The scores submitted were:

(Figure: the submission score)

Although the result is not great, I am still very happy! It was my first time completing a full project, and I really learned a lot. Only by starting from scratch can you truly see where you are lacking, and that is how you make progress!

Keep up the hard work!


Origin: blog.csdn.net/StarandTiAmo/article/details/127538821