1. Network model structure
LeNet is a representative CNN proposed in 1998 for handwritten digit recognition, and it served as a foundation for many later deep learning models. Its structure is shown in the figure below: convolution layers and pooling layers alternate, and the result is finally output through fully connected layers.
2. Detailed explanation of parameters of each layer
2.1 INPUT layer - input layer
Data input layer. The input is a single-channel (grayscale) image of size 32*32.
Note: ① A grayscale image is a single-channel image in which each pixel carries only light-intensity information;
② An RGB image is a color image with three channels;
③ By convention the input layer is not counted as one of the network's layers, so it is not part of the LeNet layer count.
2.2 C1 layer-convolutional layer
Input data (input feature map): 32*32
Convolution kernel size: 5*5
Calculation formula:
$$OH = \frac{H + 2P - K}{S} + 1, \qquad OW = \frac{W + 2P - K}{S} + 1$$
Here $OH$ and $OW$ are the height and width of the output feature map; $H$ is the height of the input image; $W$ is the width of the input image; $K$ is the size of the convolution kernel; $P$ is the padding added around the image, 0 by default; and $S$ is the stride with which the convolution kernel traverses the image, 1 by default.
Number of convolution kernels (output channels): 6
Output data (output feature map): 28*28
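As a quick check of the output-size calculation described above, the following minimal sketch computes the C1 output size and its number of trainable parameters. The function name `conv_output_size` and its argument names are chosen here for illustration and are not part of the original article; the parameter count assumes one bias per kernel.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Output height (or width): (size + 2*padding - kernel) // stride + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# C1: 32x32 input, 5x5 kernel, padding 0, stride 1
c1_size = conv_output_size(32, 5)

# Trainable parameters of C1, assuming one bias per kernel:
# (kernel_h * kernel_w * input_channels + 1) * number_of_kernels
c1_params = (5 * 5 * 1 + 1) * 6

print(c1_size)    # 28
print(c1_params)  # 156
```

The same function gives the C3 output size when called as `conv_output_size(14, 5)`.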
2.3 S2 layer-pooling layer (downsampling layer)
Pooling is an operation that shrinks the feature map in the height and width directions.
Input data: 28*28
Sampling area: 2*2
Number of channels: 6
Output data: 14*14
Note: ① Pooling does not change the number of channels: the output has as many channels as the input.
② Each feature map in S2 has 1/4 the area of a feature map in C1, because both the height and the width are halved.
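To make the halving concrete, here is a small pure-Python sketch of 2*2 pooling with stride 2 on a single feature map. It uses max pooling, as the code in section 3 does; the helper name `max_pool_2x2` and the sample values are illustrative assumptions.

```python
def max_pool_2x2(feature_map):
    """Apply 2x2 max pooling with stride 2 to a 2-D list of numbers."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, width - 1, 2)
        ]
        for r in range(0, height - 1, 2)
    ]


fm = [
    [1, 2, 0, 1],
    [3, 4, 1, 0],
    [0, 1, 5, 6],
    [2, 0, 7, 8],
]
pooled = max_pool_2x2(fm)
print(pooled)  # [[4, 1], [2, 8]]
```

A 4*4 map becomes 2*2, and in the same way each 28*28 map of C1 becomes a 14*14 map in S2.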
2.4 C3 layer-convolutional layer
Input data: all 6 feature maps in S2, or a combination of several of them (each 14*14)
Convolution kernel size: 5*5
Number of convolution kernels (output channels): 16
Output data (output feature map): 10*10
Note: Each feature map in C3 is connected either to all 6 feature maps in S2 or to a subset of them, so each C3 feature map is built from a different combination of the features extracted by the previous layer.
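The "all 6 or several" wording can be made concrete with the connection scheme of the original LeNet-5 paper, which this article does not spell out: there, 6 of the C3 maps read 3 S2 maps each, 9 read 4 each, and 1 reads all 6. Assuming that scheme and one bias per C3 map, the sketch below compares the parameter count with the fully connected variant that `nn.Conv2d(6, 16, 5)` implements.

```python
KERNEL_WEIGHTS = 5 * 5  # weights in one 5x5 kernel slice

# Assumed grouping (from the original LeNet-5 paper, not from this article):
# number of S2 maps read -> how many C3 maps read that many inputs
partial_scheme = {3: 6, 4: 9, 6: 1}

partial_params = sum(
    n_maps * (n_inputs * KERNEL_WEIGHTS + 1)  # +1 for the bias of each C3 map
    for n_inputs, n_maps in partial_scheme.items()
)

# Full connection: every one of the 16 C3 maps reads all 6 S2 maps
full_params = 16 * (6 * KERNEL_WEIGHTS + 1)

print(partial_params)  # 1516
print(full_params)     # 2416
```

The partial scheme needs fewer parameters, while the full connection is simpler to express in a framework.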
2.5 S4 layer-pooling layer (downsampling layer)
Input data: 10*10
Sampling area: 2*2
Number of channels: 16
Output data: 5*5
2.6 C5 layer-convolutional layer
Input data: all 16 feature maps of the S4 layer (fully connected to S4)
Convolution kernel size: 5*5
Number of convolution kernels (output channels): 120
Output data (output feature map): 1*1
2.7 F6 layer - fully connected layer
Input data: 120-dimensional vector
Output data: 84-dimensional vector
2.8 Output layer-fully connected layer
Input data: 84-dimensional vector
Output data: 10-dimensional vector
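The sizes listed in sections 2.1 to 2.8 can be traced in a few lines. This is a sketch under the assumptions stated in the text (5*5 kernels, padding 0, stride 1, 2*2 pooling); the helper names `conv_out` and `pool_out` are illustrative, and the parameter counts treat C5, F6 and the output layer as plain fully connected layers with one bias per output unit.

```python
def conv_out(size, kernel=5):
    """Convolution with padding 0 and stride 1."""
    return size - kernel + 1


def pool_out(size, window=2):
    """Pooling over non-overlapping windows."""
    return size // window


size = 32
trace = [("INPUT", size)]
for name, step in [("C1", conv_out), ("S2", pool_out), ("C3", conv_out),
                   ("S4", pool_out), ("C5", conv_out)]:
    size = step(size)
    trace.append((name, size))

# Weights plus one bias per output unit
fc_params = {
    "C5": 120 * (16 * 5 * 5 + 1),
    "F6": 84 * (120 + 1),
    "OUTPUT": 10 * (84 + 1),
}

print(trace)
# [('INPUT', 32), ('C1', 28), ('S2', 14), ('C3', 10), ('S4', 5), ('C5', 1)]
print(fc_params)
# {'C5': 48120, 'F6': 10164, 'OUTPUT': 850}
```

Because the S4 output is 16 maps of 5*5, a 5*5 convolution over it sees 16*5*5 = 400 values, which is why C5 can equally be written as a fully connected layer with 400 inputs.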
3. Code implementation (the activation function used is the relu function)
3.1 Build a network framework
(1) Import packages:
import torch
import torch.nn as nn
import torch.nn.functional as F
(2) Define the convolutional neural network: because the training data consists of color images (three channels), the number of input channels differs from the single-channel network described above.
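class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 3 input channels (RGB), 6 output channels, 5x5 kernel
        self.conv1 = nn.Conv2d(3, 6, 5)
        # 6 input channels, 16 output channels, 5x5 kernel
        self.conv2 = nn.Conv2d(6, 16, 5)
        # C5 is written as a fully connected layer on the flattened 16*5*5 features
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = F.max_pool2d(x, (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # flatten everything except the batch dimension
        x = x.view(-1, x.size()[1:].numel())
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)  # raw class scores; no softmax here
        return x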
(3) Test the network: printing the network lists the layers defined in __init__, which lets you check its structure
net = Net()
print(net)
3.2 Define the data set
(1) Import packages:
import torchvision
import torchvision.transforms as transforms
(2) Download the data set:
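If downloading the data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz is slow (for example, from within mainland China), the following workaround can be used:
① Download the file from: https://pan.baidu.com/s/1Nh28RyfwPNNfe_sS8NBNUA
Extraction code: 1h4x
② Rename the downloaded file to cifar-10-batches-py.tar.gz (if torchvision still tries to download, note that it looks for an archive named cifar-10-python.tar.gz in the root directory)
③ Save the file to the data set root directory (./data in the code below)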
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=0)
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=0)
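The effect of the two transforms can be checked by hand. ToTensor scales pixel values to the range [0, 1], and Normalize with mean 0.5 and standard deviation 0.5 then computes (x - 0.5) / 0.5 per channel, which maps them to [-1, 1]. The pure-Python sketch below illustrates this arithmetic on a few sample values; the helper names are illustrative and no PyTorch call is involved.

```python
def normalize(x, mean=0.5, std=0.5):
    """What Normalize does to a single value."""
    return (x - mean) / std


def unnormalize(y, mean=0.5, std=0.5):
    """Inverse mapping; equals y / 2 + 0.5 for mean = std = 0.5."""
    return y * std + mean


pixels = [0.0, 0.25, 0.5, 1.0]  # values in [0, 1], as ToTensor produces
normalized = [normalize(p) for p in pixels]
restored = [unnormalize(v) for v in normalized]

print(normalized)  # [-1.0, -0.5, 0.0, 1.0]
print(restored)    # [0.0, 0.25, 0.5, 1.0]
```

This inverse is the reason the `imshow` helper further down computes `img / 2 + 0.5` before plotting.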
(3) Define a tuple that maps label indices to class names
classes = ('airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck')
(4) Run the data loader: use a plotting function to check that the data loads correctly
import matplotlib.pyplot as plt
import numpy as np

def imshow(img):
    img = img / 2 + 0.5  # undo the normalization
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))  # channels-first to channels-last
    plt.show()

dataiter = iter(trainloader)
images, labels = next(dataiter)  # dataiter.next() is not available in recent PyTorch versions
imshow(torchvision.utils.make_grid(images))
print(labels)
print(labels[0], classes[labels[0]])
print(' '.join(classes[labels[j]] for j in range(4)))
3.3 Define loss function and optimizer
(1) Define the loss function: cross-entropy loss
criterion = nn.CrossEntropyLoss()
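nn.CrossEntropyLoss takes the raw scores of the last layer and applies log-softmax internally, which is why the network's forward pass ends without a softmax. The pure-Python sketch below computes the same quantity for a single sample; the function name `cross_entropy` and the example scores are illustrative assumptions.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    shift = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[target]


# An untrained 10-class network that scores all classes equally
uniform_loss = cross_entropy([0.0] * 10, target=3)

# A network that is confident in the correct class
confident_loss = cross_entropy([0.0, 0.0, 0.0, 8.0] + [0.0] * 6, target=3)

print(round(uniform_loss, 4))  # 2.3026, i.e. ln(10)
print(confident_loss < uniform_loss)  # True
```

A loss near ln(10) at the start of training on 10 classes therefore means the network is still guessing uniformly.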
(2) Define the optimizer: it updates the network parameters step by step so that the loss decreases
import torch.optim as optim
optimizer = optim.SGD(net.parameters(),lr=0.001,momentum=0.9)
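For intuition, the update that SGD with momentum performs on one parameter can be written out by hand. The sketch below follows the formulation used by PyTorch's SGD with default dampening and no Nesterov momentum (velocity = momentum * velocity + grad, then param = param - lr * velocity); the function name and the toy objective f(p) = p**2 are illustrative assumptions.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update for a single scalar parameter."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity


param, velocity = 1.0, 0.0
history = [param]
for _ in range(3):
    grad = 2.0 * param  # gradient of the toy objective f(p) = p**2
    param, velocity = sgd_momentum_step(param, grad, velocity)
    history.append(param)

print(history)  # the parameter moves toward the minimum at 0
```

The velocity accumulates past gradients, so successive steps in the same direction grow larger than with plain SGD at the same learning rate.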
3.4 Training network
for epoch in range(2):  # loop over the training set twice
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        optimizer.zero_grad()  # clear the gradients of the previous step
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if i % 2000 == 1999:  # print the average loss every 2000 mini-batches
            print('[%d,%5d] loss:%.3f' % (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
print("Finish")
3.5 Test network
(1) Save the learned network parameters: save the weights to a local file so that they can be loaded directly later
PATH='./cifar_net.pth'
torch.save(net.state_dict(),PATH)
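A saved state_dict can later be restored with load_state_dict on a model of the same architecture. The sketch below shows the round trip; it uses a small nn.Linear as a hypothetical stand-in for Net and a temporary file instead of ./cifar_net.pth so that it runs on its own.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # hypothetical stand-in for the trained Net

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_weights.pth")  # illustrative file name
    torch.save(model.state_dict(), path)

    restored = nn.Linear(4, 2)  # must have the same architecture
    restored.load_state_dict(torch.load(path))

# Every restored tensor should match the saved one exactly
same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)  # True
```

For the network in this article, the analogous calls would be `net = Net()` followed by `net.load_state_dict(torch.load(PATH))`.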
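(2) Check one batch of test images (the code below displays the images and their ground-truth labels)
dataiter = iter(testloader)
images, labels = next(dataiter)  # dataiter.next() is not available in recent PyTorch versions
imshow(torchvision.utils.make_grid(images))
print('GroundTruth:', ' '.join('%5s' % classes[labels[j]] for j in range(4)))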
(3) Measure the accuracy on the entire test set
correct = 0
total = 0
with torch.no_grad():  # no gradients are needed for evaluation
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)  # index of the highest score per image
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
correctGailv = 100 * (correct / total)  # accuracy in percent
print(correctGailv)
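The accuracy computation boils down to taking the index of the highest score for each image and comparing it with the label. The pure-Python sketch below reproduces that logic on a tiny hand-made example; the function name `accuracy_percent` and the scores are illustrative and do not come from the trained network.

```python
def accuracy_percent(score_rows, labels):
    """Percentage of rows whose highest-scoring index equals the label."""
    correct = 0
    for scores, label in zip(score_rows, labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)  # argmax
        correct += int(predicted == label)
    return 100 * correct / len(labels)


scores = [
    [0.1, 2.0, -1.0],  # predicted class 1
    [1.5, 0.2, 0.3],   # predicted class 0
    [0.0, 0.1, 0.9],   # predicted class 2
    [0.4, 0.3, 0.2],   # predicted class 0
]
labels = [1, 0, 2, 1]

print(accuracy_percent(scores, labels))  # 75.0
```

With 10 balanced classes, a value around 10 would correspond to random guessing.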
Note: The accuracy can be improved by training for more epochs. Increasing the number of epochs from 2 to 10 greatly improves the accuracy, as shown in the figure below:
4. Summary
(1) Compared with current CNNs, LeNet differs in the following ways:
① The activation function: LeNet uses the sigmoid function, while current CNNs mainly use ReLU, which is also used in the code above;
② The original LeNet uses subsampling to reduce the size of intermediate data, while max pooling is the mainstream choice in current CNNs.
(2) Open questions:
① What is the relationship between the number of neurons in a convolutional layer and its number of output channels?
The number of neurons in a convolutional layer equals the number of elements in its output feature map (that is, the product of the width, height and number of channels of the output feature map). Because all neurons on the same output channel share one set of parameters, the layer is sometimes loosely described as having as many distinct neurons as output channels (that is, as convolution kernels); that description counts parameter sets, not neurons.
② The internal principles connecting the layers are still not fully clear;
③ The concepts of forward propagation, back propagation and gradient updates still need to be understood.
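For question ①, the distinction between neurons, parameters and connections can be worked out for the C1 layer from section 2.2. This is a sketch assuming one bias per kernel; the variable names are illustrative.

```python
out_h, out_w, out_channels = 28, 28, 6       # C1 output feature map
kernel_h, kernel_w, in_channels = 5, 5, 1    # C1 kernel and input channels

# One neuron per element of the output feature map
neurons = out_h * out_w * out_channels

# Each kernel has its weights plus one bias, shared by all neurons on its channel
params_per_kernel = kernel_h * kernel_w * in_channels + 1
parameters = params_per_kernel * out_channels

# Every neuron uses all parameters of its kernel
connections = neurons * params_per_kernel

print(neurons, parameters, connections)  # 4704 156 122304
```

Weight sharing is what keeps the number of parameters (156) far below the number of neurons (4704) and connections (122304).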