Quick Start with PyTorch Deep Learning: A Brief Introduction to LeNet (with code)

Table of contents

1. Network model structure

2. Detailed explanation of parameters of each layer

2.1 INPUT layer - input layer

2.2 C1 layer-convolutional layer

2.3 S2 layer-pooling layer (downsampling layer)

2.4 C3 layer-convolutional layer

2.5 S4 layer-pooling layer (downsampling layer)

2.6 C5 layer-convolutional layer

2.7 F6 layer - fully connected layer

2.8 Output layer-fully connected layer

3. Code implementation (the activation function used is the ReLU function)

3.1 Build a network framework

3.2 Define the data set

3.3 Define loss function and optimizer

3.4 Training network

3.5 Test network

4. Summary


1. Network model structure

        LeNet is a representative CNN, proposed in 1998 for handwritten digit recognition, and it is the foundation of many later deep learning models. Its structure consists of successive convolution and pooling layers, with fully connected layers producing the final output.

2. Detailed explanation of parameters of each layer

2.1 INPUT layer - input layer

        Data input layer: the input is a single-channel grayscale image of size 32*32.

        Note: ① A grayscale image is a single-channel image in which each pixel carries only light-intensity information;

                   ② An RGB image is a color image with three channels;

                   ③ Traditionally the input layer is not counted as part of the network, so it is not included in LeNet's layer count.

2.2 C1 layer-convolutional layer

       Input data (input feature map): 32*32

       Convolution kernel size: 5*5

Calculation formula:

$height_{out}=\frac{height_{in}-height_{kernel}+2\times padding}{stride}+1$

$width_{out}=\frac{width_{in}-width_{kernel}+2\times padding}{stride}+1$

Here, $height_{in}$ and $width_{in}$ are the height and width of the input image; $height_{kernel}$ and $width_{kernel}$ are the height and width of the convolution kernel; $padding$ is the padding added around the image, 0 by default; $stride$ is the step size with which the kernel traverses the image, 1 by default. (A quick numeric check of this formula follows the C1 specification below.)

       Convolution kernel type (number of channels): 6

       Output data (output feature map): 28*28
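As a quick numeric check of the formula above, a small helper function (hypothetical, not part of the original post) reproduces C1's output size:

def conv_output_size(size_in, kernel, padding=0, stride=1):
    # (size_in - kernel + 2*padding) / stride + 1
    return (size_in - kernel + 2 * padding) // stride + 1

print(conv_output_size(32, 5))  # 28, matching C1's 28*28 output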

2.3 S2 layer-pooling layer (downsampling layer)

       Pooling is an operation that shrinks the feature map in the height and width directions.

       Input data (input feature map): 28*28

       Pooling window size: 2*2

       Sampling type (number of channels): 6

       Output data (output feature map): 14*14

Note: ① A pooling operation does not change the number of channels between input and output.

② Each feature map in S2 has 1/4 the area of the corresponding feature map in C1, since each side is halved.
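A quick shape check on a random tensor (a sketch, not part of the original code; max pooling is used here to match the code in section 3) confirms that a 2*2 pooling window halves the height and width while leaving the 6 channels unchanged:

import torch
import torch.nn.functional as F

x = torch.randn(1, 6, 28, 28)  # dummy batch shaped like C1's output
y = F.max_pool2d(x, 2)         # 2*2 window, stride 2
print(y.shape)                 # torch.Size([1, 6, 14, 14])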

2.4 C3 layer-convolutional layer

       Input data: combinations of several (or all 6) of the feature maps in S2

       Convolution kernel size: 5*5

       Convolution kernel type (number of channels): 16

       Output data (output feature map): 10*10

Note: Each feature map in C3 is connected to several (or all 6) of the feature maps in S2, meaning each feature map of this layer is a different combination of the feature maps extracted by the previous layer.

2.5 S4 layer-pooling layer (downsampling layer)

       Input data (input feature map): 10*10

       Pooling window size: 2*2

       Sampling type (number of channels): 16

       Output data (output feature map): 5*5

2.6 C5 layer-convolutional layer

       Input data: all 16 feature maps of the S4 layer (fully connected to S4)

       Convolution kernel size: 5*5

       Convolution kernel type (number of channels): 120

       Output data (output feature map): 1*1

2.7 F6 layer - fully connected layer

       Input data:120-dimensional vector

       Output data:84-dimensional vector

2.8 Output layer-fully connected layer

       Input data:84-dimensional vector

       Output data:10-dimensional vector
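Putting sections 2.1 through 2.8 together, a minimal sketch of the single-channel network described above might look as follows (an assumption on my part: ReLU and max pooling are used as in the code of section 3, whereas the original paper used sigmoid-like activations and subsampling):

import torch.nn as nn
import torch.nn.functional as F

class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.c1 = nn.Conv2d(1, 6, 5)     # C1: 1@32*32 -> 6@28*28
        self.c3 = nn.Conv2d(6, 16, 5)    # C3: 6@14*14 -> 16@10*10
        self.c5 = nn.Conv2d(16, 120, 5)  # C5: 16@5*5 -> 120@1*1
        self.f6 = nn.Linear(120, 84)     # F6: 120 -> 84
        self.out = nn.Linear(84, 10)     # Output layer: 84 -> 10

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.c1(x)), 2)  # S2: 28*28 -> 14*14
        x = F.max_pool2d(F.relu(self.c3(x)), 2)  # S4: 10*10 -> 5*5
        x = F.relu(self.c5(x))                   # C5 output is a 120@1*1 map
        x = x.view(-1, 120)                      # flatten to a 120-dimensional vector
        x = F.relu(self.f6(x))
        return self.out(x)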

3. Code implementation (the activation function used is the ReLU function)

3.1 Build a network framework

(1) Import the packages:

import torch
import torch.nn as nn
import torch.nn.functional as F

 (2) Define the convolutional neural network. Since the training data here consists of color images (three channels), the number of input channels differs from the single-channel network described above.

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3 input channels (RGB), 6 output channels, 5*5 kernel
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 16 feature maps of size 5*5, flattened
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # 10 output classes

    def forward(self, x):
        x = self.conv1(x)                      # C1: 32*32 -> 28*28
        x = F.relu(x)
        x = F.max_pool2d(x, (2, 2))            # S2: 28*28 -> 14*14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # C3 + S4: 14*14 -> 10*10 -> 5*5
        x = x.view(-1, x.size()[1:].numel())   # flatten all dimensions except the batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)                        # raw logits; CrossEntropyLoss applies softmax itself
        return x

(3) Check the network structure: printing the instance lists the layers defined in __init__, so you can verify the structure of the network.

net = Net()
print(net)
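As a further sanity check (an addition, not in the original post), a random input of the expected shape can be pushed through the untrained network:

x = torch.randn(1, 3, 32, 32)  # a batch of one 3-channel 32*32 image
print(net(x).shape)            # torch.Size([1, 10]): one score per class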

3.2 Define the data set

(1) Import the packages:

import torchvision
import torchvision.transforms as transforms

(2) Download the data set:

        If downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz is slow (a common problem in mainland China), the following workaround can be used:

① Download the file from https://pan.baidu.com/s/1Nh28RyfwPNNfe_sS8NBNUA 

        Extraction code: 1h4x

② Rename the downloaded file to cifar-10-batches-py.tar.gz

③ Place the file in the corresponding data directory so that torchvision uses it instead of downloading

transform = transforms.Compose([
    transforms.ToTensor(),                                   # PIL image -> tensor in [0, 1]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))   # shift each channel to [-1, 1]
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=0)
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=0)
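If everything is in place, a quick check (not in the original post) confirms CIFAR-10's standard split of 50,000 training images and 10,000 test images:

print(len(trainset), len(testset))  # 50000 10000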

(3) Define a tuple of class names: map the numeric labels to human-readable category names

classes = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

 (4) Run the data loader: use a plotting function to check that the data loads correctly

import matplotlib.pyplot as plt
import numpy as np

def imshow(img):
    img = img / 2 + 0.5                         # undo the normalization back to [0, 1]
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))  # CHW -> HWC for matplotlib
    plt.show()

dataiter = iter(trainloader)
images, labels = next(dataiter)                 # fetch one batch of 4 images

imshow(torchvision.utils.make_grid(images))

print(labels)
print(labels[0], classes[labels[0]])
print(' '.join(classes[labels[j]] for j in range(4)))

3.3 Define loss function and optimizer

(1) Define the loss function: the cross-entropy loss function

criterion = nn.CrossEntropyLoss()
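For intuition, the loss can be tried on a dummy batch (a sketch, not part of the original code): CrossEntropyLoss takes raw unnormalized scores (logits) and integer class labels, and applies softmax internally.

logits = torch.randn(4, 10)           # raw scores for a batch of 4 images
targets = torch.tensor([3, 0, 9, 1])  # ground-truth class indices
print(criterion(logits, targets))     # a scalar loss tensor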

(2) Define the optimizer: it repeatedly adjusts the network parameters so that the loss decreases and the results improve

import torch.optim as optim
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

3.4 Training network

for epoch in range(2):  # loop over the dataset twice

    running_loss = 0.0

    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        optimizer.zero_grad()             # clear gradients from the previous step

        outputs = net(inputs)             # forward pass
        loss = criterion(outputs, labels)
        loss.backward()                   # backward pass: compute gradients
        optimizer.step()                  # update the parameters

        running_loss += loss.item()

        if i % 2000 == 1999:              # print the average loss every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print("Finish")


3.5 Test network

(1) Save the learned network parameters: write the weights to a local file so they can be loaded directly later

PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)
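To reuse the saved weights later, a fresh network can be instantiated and the state dict loaded back in (a standard PyTorch pattern, shown here as a sketch):

net = Net()
net.load_state_dict(torch.load(PATH))
net.eval()  # switch to evaluation mode before testing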

(2) Test the training effect of a set of pictures

dataiter = iter(testloader)
images, labels = next(dataiter)
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
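To compare the ground truth with what the network actually predicts for these four images, the batch can be passed through the network (a sketch following the same pattern as the evaluation loop in (3) below):

outputs = net(images)
_, predicted = torch.max(outputs, 1)  # index of the highest score per image
print('Predicted:  ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))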

(3) Observe the accuracy on the entire test set

correct = 0
total = 0
with torch.no_grad():                         # no gradients needed for evaluation
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)  # class with the highest score

        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * (correct / total)
print('Accuracy on the 10000 test images: %.2f %%' % accuracy)

        Note: The accuracy can be improved by increasing the number of training epochs; for example, raising the epoch count from 2 to 10 improves the accuracy considerably.


4. Summary

        (1) Compared with modern CNNs, LeNet differs in the following ways:

        ① The activation functions differ: LeNet used the sigmoid function, while modern CNNs mainly use the ReLU function (the code above also uses ReLU);

        ② The original LeNet used subsampling to shrink the intermediate feature maps, while max pooling is the mainstream choice in modern CNNs.

       (2) Open questions:

       ① What is the relationship between the number of neurons in a convolutional layer and its number of output channels?

        The number of neurons in a convolutional layer equals the number of elements in its output feature map (the product of the output's width, height, and number of channels). Because all neurons on one output channel share the same parameters, the number of distinct parameter sets equals the number of output channels, that is, the number of convolution kernels.
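Weight sharing can be verified by counting parameters directly (a check, not in the original post): C1 has only 6 * (5*5*1 + 1) = 156 trainable parameters, even though its output contains 6*28*28 = 4,704 neurons.

import torch.nn as nn

conv = nn.Conv2d(1, 6, 5)  # C1 of the single-channel LeNet
print(sum(p.numel() for p in conv.parameters()))  # 156, independent of output size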

        ② The internal principles connecting the layers still require further study;

        ③ The concepts of forward propagation, back propagation, and gradient updates need to be understood more deeply.

