Convolutional Neural Network (LeNet)

These are part of my study notes on "Dive into Deep Learning" (PyTorch version), intended only for my own review.


In the section on implementing a multilayer perceptron from scratch, we built a multilayer perceptron with a single hidden layer to classify the images in the Fashion-MNIST dataset. Each image is 28 pixels high and 28 pixels wide. We flattened the pixels of each image row by row into a vector of length 784 and fed it into a fully connected layer. However, this classification approach has certain limitations.
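A minimal sketch of that flatten-then-MLP pipeline, assuming random tensors in place of real Fashion-MNIST images. The 256 hidden units and ReLU activation are illustrative choices for this sketch, not something stated in these notes.

```python
import torch
from torch import nn

# Stand-in batch of 4 grayscale 28x28 images (random values, not real data)
X = torch.rand(4, 1, 28, 28)

# Flatten each image into a 784-dimensional vector, keeping the batch dimension
X_flat = X.view(X.shape[0], -1)
print(X_flat.shape)  # torch.Size([4, 784])

# A single-hidden-layer MLP; the hidden size 256 is an illustrative assumption
mlp = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
logits = mlp(X_flat)
print(logits.shape)  # torch.Size([4, 10])
```

Once flattened, the MLP only sees a length-784 vector; the 28×28 grid structure is no longer explicit in its input.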

1. Pixels that are adjacent within the same column of the image may end up far apart in the flattened vector. The patterns they form may be difficult for the model to recognize.
2. For large input images, fully connected layers can easily make the model too large. Suppose the input is a color photo (3 channels) that is 1000 pixels high and 1000 pixels wide. Even if the fully connected layer still has only 256 outputs, its weight parameter has shape 3,000,000×256, which occupies about 3 GB of memory or GPU memory. This leads to an overly complex model and excessive storage overhead.
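Both limitations can be checked with a little arithmetic. The sketch below assumes row-major flattening (as `view` does on a contiguous tensor) and 4-byte float32 weights; `flat_index` is a hypothetical helper written only for this illustration.

```python
# Limitation 1: vertically adjacent pixels are far apart after flattening.
height, width = 28, 28

def flat_index(row, col, width=width):
    """Hypothetical helper: row-major index of pixel (row, col) in the flat vector."""
    return row * width + col

# Same column, neighbouring rows -> indices differ by the full image width
gap = flat_index(11, 5) - flat_index(10, 5)
print(gap)  # 28

# Limitation 2: weight size of a fully connected layer on a large colour photo.
in_features = 3 * 1000 * 1000   # 3 channels, 1000 x 1000 pixels
out_features = 256
num_weights = in_features * out_features
bytes_fp32 = num_weights * 4    # assuming 4 bytes per float32 weight
print(num_weights)              # 768000000
print(bytes_fp32 / 1e9)         # 3.072 (GB)
```

Biases are ignored here; they add only 256 more parameters and do not change the roughly 3 GB estimate.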

The convolutional layer tries to solve these two problems.

  • On the one hand, the convolutional layer preserves the shape of the input, so correlations among image pixels in both the height and width directions can be recognized effectively.
  • On the other hand, the convolutional layer slides the same convolution kernel over the input and reuses it at every position, which keeps the number of parameters from becoming too large.
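A small sketch of both properties, assuming an illustrative `nn.Conv2d` with 3 input channels, 16 output channels, and a 5×5 kernel (sizes chosen only for this example). The output keeps the (batch, channel, height, width) layout, and the parameter count stays the same whatever the input resolution.

```python
import torch
from torch import nn

# Illustrative layer: 3 input channels, 16 output channels, 5x5 kernel
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5)

# Parameters: 16 kernels of shape 3x5x5 plus 16 biases
num_params = sum(p.numel() for p in conv.parameters())
print(num_params)  # 16*3*5*5 + 16 = 1216

# The same kernels slide over inputs of any spatial size,
# so the parameter count does not depend on image resolution.
small = conv(torch.rand(1, 3, 32, 32))
large = conv(torch.rand(1, 3, 128, 128))
print(small.shape)  # torch.Size([1, 16, 28, 28])
print(large.shape)  # torch.Size([1, 16, 124, 124])
```

Compare 1216 parameters with the hundreds of millions a fully connected layer would need on a large image.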

Convolutional neural networks are networks that contain convolutional layers. In this section, we introduce LeNet, an early convolutional neural network used to recognize images of handwritten digits. The name comes from Yann LeCun, the first author of the LeNet paper. LeNet showed that convolutional neural networks trained with gradient descent could achieve the state-of-the-art results in handwritten digit recognition at the time. This foundational work brought convolutional neural networks onto the stage for the first time and made them widely known. The network structure of LeNet is shown in the figure below.

(Figure: LeNet model)

LeNet is divided into two parts: a convolutional layer block and a fully connected layer block.

The basic unit of the convolutional layer block is a convolutional layer followed by a max pooling layer.

The convolutional layer recognizes spatial patterns in the image, such as lines and object parts, and the max pooling layer that follows reduces the convolutional layer's sensitivity to position. The convolutional layer block consists of two such basic units stacked one after the other. Each convolutional layer in the block uses a 5×5 window and applies a sigmoid activation function to its output. The first convolutional layer has 6 output channels, and the second increases this to 16. This is because the input to the second convolutional layer is smaller in height and width than the input to the first, so increasing the number of output channels keeps the parameter sizes of the two convolutional layers similar. Both max pooling layers in the block use a 2×2 window with a stride of 2. Because the pooling window has the same shape as the stride, the regions the window covers at successive positions on the input do not overlap.
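A quick check of the pooling behaviour and of one basic unit, using a hand-built 4×4 input and a random stand-in for a 28×28 image (both are assumptions for this sketch).

```python
import torch
from torch import nn

# 2x2 max pooling with stride 2: the windows tile the input without overlap
pool = nn.MaxPool2d(kernel_size=2, stride=2)
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)
# x[0, 0] is
# [[ 0,  1,  2,  3],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11],
#  [12, 13, 14, 15]]
y = pool(x)
print(y[0, 0].tolist())  # [[5.0, 7.0], [13.0, 15.0]]

# One basic unit of the convolutional block: conv (5x5) -> sigmoid -> max pool
unit = nn.Sequential(nn.Conv2d(1, 6, 5), nn.Sigmoid(), nn.MaxPool2d(2, 2))
out = unit(torch.rand(1, 1, 28, 28))
print(out.shape)  # torch.Size([1, 6, 12, 12]): 28 - 4 = 24, then 24 / 2 = 12
```

Each output of the pooling layer is the maximum of its own 2×2 block, and no input element contributes to two outputs.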

The output of the convolutional layer block has shape (batch size, channels, height, width). When this output is passed to the fully connected layer block, the fully connected block flattens each sample in the mini-batch. In other words, the input to the fully connected layers becomes two-dimensional: the first dimension indexes the samples in the mini-batch, and the second dimension is the flattened vector representation of each sample, whose length is the product of the number of channels, the height, and the width. The fully connected layer block contains 3 fully connected layers with 120, 84, and 10 outputs respectively, where 10 is the number of output classes.
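The flattening step and the 120 → 84 → 10 stack can be sketched on a stand-in feature map. The shape (8, 16, 4, 4) assumes a mini-batch of 8 and the 4×4 spatial size that LeNet's convolutional block produces on 28×28 inputs.

```python
import torch
from torch import nn

# Stand-in output of the convolutional block for a mini-batch of 8 samples:
# (batch size, channels, height, width) = (8, 16, 4, 4)
feature = torch.rand(8, 16, 4, 4)

# Keep the batch dimension; merge channels x height x width into one vector
flat = feature.view(feature.shape[0], -1)
print(flat.shape)  # torch.Size([8, 256]), since 16 * 4 * 4 = 256

# torch.flatten with start_dim=1 gives the same result
same = torch.flatten(feature, start_dim=1)

# Fully connected block with 120, 84 and 10 outputs
fc = nn.Sequential(
    nn.Linear(256, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10),
)
scores = fc(flat)
print(scores.shape)  # torch.Size([8, 10])
```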

Below, we implement the LeNet model using the Sequential class.

import time
import torch
from torch import nn, optim
import sys
sys.path.append("..")
import d2lzh_pytorch as d2l  # utility package accompanying the book
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class LeNet(nn.Module):
    def __init__(self):
        # Initialize the parent nn.Module
        super(LeNet, self).__init__()
        # Convolutional layer block; Conv2d arguments: in_channels, out_channels, kernel_size
        self.conv = nn.Sequential(
            nn.Conv2d(1, 6, 5),
            nn.Sigmoid(),
            nn.MaxPool2d(2, 2),  # kernel_size, stride
            nn.Conv2d(6, 16, 5),
            nn.Sigmoid(),
            nn.MaxPool2d(2, 2)
        )
        # Fully connected layer block
        self.fc = nn.Sequential(
            nn.Linear(16*4*4, 120),
            nn.Sigmoid(),
            nn.Linear(120, 84),
            nn.Sigmoid(),
            nn.Linear(84, 10)
        )

    # Forward propagation
    def forward(self, img):
        feature = self.conv(img)
        # Flatten each sample to a 16*4*4 vector before the fully connected block
        output = self.fc(feature.view(img.shape[0], -1))
        return output

Print the network to check each layer.

net = LeNet()
print(net)

Output:

LeNet(
  (conv): Sequential(
    (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
    (1): Sigmoid()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
    (4): Sigmoid()
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Sequential(
    (0): Linear(in_features=256, out_features=120, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=120, out_features=84, bias=True)
    (3): Sigmoid()
    (4): Linear(in_features=84, out_features=10, bias=True)
  )
)

As shown, the height and width of the input shrink layer by layer in the convolutional layer block. Because each convolutional layer uses a kernel with height and width 5, it reduces the height and width by 4; each pooling layer halves them; meanwhile, the number of channels grows from 1 to 16. The fully connected layers reduce the number of outputs layer by layer until it equals the number of image classes, 10.
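Printing the network lists the layers but not their output shapes. The sketch below redefines the same layers standalone (so it does not depend on `d2lzh_pytorch`) and passes one random 28×28 input through them to trace each shape.

```python
import torch
from torch import nn

# Same layers as LeNet.conv and LeNet.fc above, redefined so this runs standalone
conv = nn.Sequential(
    nn.Conv2d(1, 6, 5), nn.Sigmoid(), nn.MaxPool2d(2, 2),
    nn.Conv2d(6, 16, 5), nn.Sigmoid(), nn.MaxPool2d(2, 2),
)
fc = nn.Sequential(
    nn.Linear(16 * 4 * 4, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10),
)

X = torch.rand(1, 1, 28, 28)  # one random 28x28 grayscale "image"
shapes = []
for layer in conv:
    X = layer(X)
    shapes.append(tuple(X.shape))
    print(layer.__class__.__name__, 'output shape:', tuple(X.shape))

X = X.view(X.shape[0], -1)  # flatten before the fully connected block
for layer in fc:
    X = layer(X)
    shapes.append(tuple(X.shape))
    print(layer.__class__.__name__, 'output shape:', tuple(X.shape))
```

The trace should go (1, 6, 24, 24) → (1, 6, 12, 12) → (1, 16, 8, 8) → (1, 16, 4, 4) through the convolutional block, then (1, 120) → (1, 84) → (1, 10) through the fully connected block.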

Obtaining the data and training the model

Next, we experiment with the LeNet model. As before, we use Fashion-MNIST as the training dataset.

batch_size = 256
# Load the dataset
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size=batch_size)

Because convolutional neural networks are more computationally expensive than multilayer perceptrons, using a GPU to accelerate computation is recommended. We therefore slightly modify the evaluate_accuracy function described in the section on implementing softmax regression from scratch so that it supports GPU computation.

# This function is saved in the d2lzh_pytorch package for later use.
def evaluate_accuracy(data_iter, net, device=None):
    if device is None and isinstance(net, torch.nn.Module):
        # If no device is specified, use the device of net's parameters
        device = list(net.parameters())[0].device
    acc_sum, n = 0.0, 0
    with torch.no_grad():
        for X, y in data_iter:
            if isinstance(net, torch.nn.Module):
                net.eval()  # evaluation mode; this disables dropout
                acc_sum += (net(X.to(device)).argmax(dim=1) == y.to(device)).float().sum().cpu().item()
                net.train()  # switch back to training mode
            else:  # custom (non-Module) model; GPU is not considered
                if 'is_training' in net.__code__.co_varnames:  # the model takes an is_training argument
                    # Set is_training to False
                    acc_sum += (net(X, is_training=False).argmax(dim=1) == y).float().sum().item()
                else:
                    acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
            n += y.shape[0]
    return acc_sum / n
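`evaluate_accuracy` switches the network to evaluation mode before predicting. LeNet as defined here has no dropout, so the switch does not change its outputs, but the sketch below shows what `eval()` does for a standalone `nn.Dropout` layer (p=0.5 and the all-ones input are illustrative assumptions).

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

# Training mode: elements are randomly zeroed, survivors are scaled by 1 / (1 - p) = 2
drop.train()
print(drop(x))  # a mix of 0s and 2s (exact pattern depends on the random seed)

# Evaluation mode: dropout becomes the identity
drop.eval()
print(torch.equal(drop(x), x))  # True
```

This is why accuracy is measured in evaluation mode and the network is switched back with `net.train()` afterwards.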

We also slightly modify the train_ch3 function defined earlier so that the data and the model used in computation reside in the same memory (CPU memory or GPU memory). The modified version is named train_ch5.

# This function is saved in the d2lzh_pytorch package for later use.
def train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs):
    net = net.to(device)
    print("training on ", device)
    # Cross-entropy loss
    loss = torch.nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        # Reset all per-epoch counters, including batch_count, so the printed loss is a per-epoch average
        train_l_sum, train_acc_sum, n, batch_count, start = 0.0, 0.0, 0, 0, time.time()
        for X, y in train_iter:
            # Move the mini-batch to the same device as the model
            X = X.to(device)
            y = y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
            train_l_sum += l.cpu().item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()
            n += y.shape[0]
            batch_count += 1
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'
              % (epoch + 1, train_l_sum / batch_count, train_acc_sum / n, test_acc, time.time() - start))
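The core of each iteration in `train_ch5` is: move the batch to the model's device, compute the loss, then `zero_grad`, `backward`, `step`. The sketch below runs that loop for three steps on a random mini-batch with a tiny hypothetical model (`toy_net`), so it can be tried without the dataset or `d2lzh_pytorch`; it is not the LeNet training run itself.

```python
import torch
from torch import nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hypothetical tiny model and a random mini-batch (not Fashion-MNIST)
toy_net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
X = torch.rand(16, 1, 28, 28)
y = torch.randint(0, 10, (16,))

toy_loss = nn.CrossEntropyLoss()
toy_opt = torch.optim.Adam(toy_net.parameters(), lr=0.001)

# Put the data on the same device as the model, as train_ch5 does
X, y = X.to(device), y.to(device)
assert next(toy_net.parameters()).device == X.device

losses = []
for step in range(3):
    l = toy_loss(toy_net(X), y)
    toy_opt.zero_grad()  # clear gradients from the previous step
    l.backward()         # compute new gradients
    toy_opt.step()       # update the parameters
    losses.append(l.item())
    print(step, l.item())
```

If the model and the batch were on different devices, the forward pass would raise an error, which is what the `.to(device)` calls in `train_ch5` prevent.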

 

# Use a learning rate of 0.001
lr, num_epochs = 0.001, 5
# Train with the Adam optimizer
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)

Output:

training on cuda
epoch 1, loss 0.0072, train acc 0.322, test acc 0.584, time 3.7 sec
epoch 2, loss 0.0037, train acc 0.649, test acc 0.699, time 1.8 sec
epoch 3, loss 0.0030, train acc 0.718, test acc 0.724, time 1.7 sec
epoch 4, loss 0.0027, train acc 0.741, test acc 0.746, time 1.6 sec
epoch 5, loss 0.0024, train acc 0.759, test acc 0.759, time 1.7 sec

Summary

  • Convolutional neural networks are networks with convolutional layers.
  • LeNet alternates convolutional layers and max pooling layers, followed by fully connected layers, to classify images.


Origin blog.csdn.net/dujuancao11/article/details/108571642