Classic CNN networks: a detailed explanation of LeNet-5

Introduction to LeNet-5

LeNet usually refers to LeNet-5, a network published by Yann LeCun in 1998. It was originally designed to recognize handwritten digits, and as one of the earliest CNNs it came to be regarded as a classic by later researchers. The paper can be downloaded at http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=054E02BBBFEFE6B1C98D119DC503F6A7?doi=10.1.1.42.7665&rep=rep1&type=pdf

Ryerson University in Canada hosts a 3D visualization of LeNet recognizing handwritten digits. Readers who want to get a feel for LeNet can try it first; it helps greatly in understanding the network: https://www.cs.ryerson.ca/~aharley/vis/conv/


Network structure

The network structure diagram in the paper is as follows:

[Figure: the LeNet-5 architecture diagram from the paper]

Counting the input and output layers, LeNet-5 has a total of 8 layers:

the Input layer, the C1 convolutional layer, the S2 downsampling (pooling) layer, the C3 convolutional layer, the S4 downsampling (pooling) layer, the C5 convolutional layer, the F6 fully connected layer, and the Output layer.

Input layer

The LeNet-5 input image is a single-channel 32x32 handwritten digit image.
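As a concrete illustration (a sketch of my own, not from the paper): MNIST digits are 28x28, so a common way to feed them to LeNet-5 is to zero-pad 2 pixels on every side to reach 32x32. The snippet below assumes PyTorch and torchvision as the framework.

    import torch
    from torchvision import transforms

    # LeNet-5 expects a single-channel 32x32 input; MNIST images are 28x28,
    # so zero-pad 2 pixels on each side: 2 + 28 + 2 = 32.
    to_lenet_input = transforms.Compose([
        transforms.ToTensor(),  # HxW uint8 image -> 1xHxW float tensor in [0, 1]
        transforms.Pad(2),      # 1x28x28 -> 1x32x32
    ])

    x = torch.randn(1, 1, 32, 32)  # (batch, channels, height, width)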


C1 convolutional layer

The convolutional layer performs feature extraction. C1 uses 6 convolution kernels of size 5x5 with a stride of 1. It receives the single-channel 32x32 image from the input layer and outputs 6 feature maps, each of size 28x28 (32 - 5 + 1 = 28).

The number of connections is (5x5+1)x28x28x6 = 122304.

The number of weights is the parameter count of the 6 convolution kernels: each kernel has 5x5 weights plus 1 bias, for a total of (5x5+1)x6 = 156.
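To make the arithmetic concrete, here is a minimal PyTorch sketch (my own illustration; the framework choice is an assumption) that builds C1 and checks the counts:

    import torch.nn as nn

    # C1: 6 kernels of size 5x5 over a single-channel input, stride 1.
    c1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=1)

    out_side = 32 - 5 + 1                            # 28
    n_weights = (5 * 5 + 1) * 6                      # 156 trainable parameters
    n_connections = (5 * 5 + 1) * out_side ** 2 * 6  # 122304 connections

    # The layer's actual parameter count matches the formula above.
    assert sum(p.numel() for p in c1.parameters()) == n_weights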


S2 pooling layer

The pooling layer is a downsampling layer used to reduce the dimensionality of the data. In a CNN (convolutional neural network), a group of convolution operations is generally followed by a pooling layer: after the convolutional layer has extracted features, pooling abstracts the data one step further and compresses its volume.

There are generally two kinds of pooling: max pooling and average pooling. LeNet-5 originally used average pooling; it was later found that max pooling often works better, so many modern uses of LeNet-5 substitute max pooling.

The pooling layer takes the six 28x28 feature maps as input and downsamples each one with a 2x2 window. After downsampling, each side of every feature map is halved, so the output is 6 feature maps of size 14x14.
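The halving is easy to verify in PyTorch (a sketch under the same framework assumption; note that the paper's S2 additionally learns a coefficient and bias per map, which plain average pooling omits):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 6, 28, 28)                # the six 28x28 maps from C1

    avg = nn.AvgPool2d(kernel_size=2, stride=2)  # LeNet-5's original choice
    mx = nn.MaxPool2d(kernel_size=2, stride=2)   # the common modern substitute

    print(avg(x).shape)  # torch.Size([1, 6, 14, 14])
    print(mx(x).shape)   # torch.Size([1, 6, 14, 14])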


C3 convolutional layer

This convolutional layer has 16 feature maps. It takes the 6 feature maps of size 14x14 as input and, again using 5x5 kernels, outputs 16 feature maps of size 10x10 (14 - 5 + 1 = 10).

Unlike C1, the feature maps of this layer are not each connected to all of the previous pooling layer's feature maps. The connection matrix is as follows:

(Table I of the paper, reconstructed here: rows are the 6 feature maps of S2, columns are the 16 feature maps of C3, and an X marks a connection.)

        0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    0   X  .  .  .  X  X  X  .  .  X  X  X  X  .  X  X
    1   X  X  .  .  .  X  X  X  .  .  X  X  X  X  .  X
    2   X  X  X  .  .  .  X  X  X  .  .  X  .  X  X  X
    3   .  X  X  X  .  .  X  X  X  X  .  .  X  .  X  X
    4   .  .  X  X  X  .  .  X  X  X  X  .  X  X  .  X
    5   .  .  .  X  X  X  .  .  X  X  X  X  .  X  X  X

Each of the first 6 feature maps connects to only 3 of the 6 maps in the previous layer, each of the 7th through 15th connects to 4, and only the last feature map connects to all 6 maps of the previous layer. The paper gives two reasons for this design: it keeps the number of connections within reasonable bounds, and, more importantly, it breaks the symmetry of the network, forcing different feature maps to extract different (and hopefully complementary) features.

The number of connections: the first 6 feature maps contribute (5x5x3+1)x10x10x6 = 45600, the 7th through 15th contribute (5x5x4+1)x10x10x9 = 90900, and the last one contributes (5x5x6+1)x10x10x1 = 15100, for a total of 151600.

The number of weights: (5x5x3+1)x6 + (5x5x4+1)x9 + (5x5x6+1)x1 = 1516.
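These totals can be checked with a few lines of Python (a sketch built directly from the connection pattern described above):

    # 6 maps see 3 input planes, 9 maps see 4, and 1 map sees all 6.
    fan_in = [3] * 6 + [4] * 9 + [6] * 1
    out_pixels = 10 * 10

    n_weights = sum(5 * 5 * k + 1 for k in fan_in)  # 1516
    n_connections = n_weights * out_pixels          # 151600
    print(n_weights, n_connections)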

S4 pooling layer

The input is 16 feature maps of size 10x10.

The output, after the same 2x2 downsampling as in S2, is 16 feature maps of size 5x5.

C5 convolutional layer

This is a special convolutional layer. What makes it special is that the kernel size is exactly equal to the size of the input feature map, both 5x5, so each kernel produces a 1x1 output. The layer uses 120 convolution kernels and therefore outputs 120 feature maps of size 1x1.


The number of connections in this layer is (5x5x16+1)x120 = 48120, and since the 1x1 output leaves no spatial positions over which weights could be shared, the number of weights is the same.
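In PyTorch terms (again my own sketch), C5 is a convolution whose kernel covers the entire input map, which makes it behave like a fully connected layer over the flattened 16x5x5 = 400 inputs:

    import torch
    import torch.nn as nn

    x = torch.randn(1, 16, 5, 5)            # the S4 output

    c5 = nn.Conv2d(16, 120, kernel_size=5)  # kernel size == feature-map size
    y = c5(x)                               # shape (1, 120, 1, 1)

    n_params = sum(p.numel() for p in c5.parameters())
    print(y.shape, n_params)                # (5*5*16 + 1) * 120 = 48120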

F6 fully connected layer

The F6 layer has 84 neurons, each fully connected to all 120 outputs of C5; it plays the same role as the hidden layer of a classic BP (multi-layer perceptron) network. The number of connections and weights is (120+1)x84 = 10164.

Output layer

The output layer has 10 units, one for each digit class 0-9. In the original paper these are Euclidean radial basis function (RBF) units matched against fixed digit templates; modern reimplementations usually replace them with an ordinary fully connected layer followed by softmax.
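Putting the layers together, here is a compact modern LeNet-5 sketch in PyTorch (an assumed framework). It takes several deliberate liberties worth naming: max pooling and ReLU instead of the paper's scaled average pooling and sigmoid activations, full C3 connectivity instead of the partial connection table, and a plain 10-way linear output instead of the RBF units.

    import torch
    import torch.nn as nn

    class LeNet5(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 6, 5),     # C1: 1x32x32 -> 6x28x28
                nn.ReLU(),
                nn.MaxPool2d(2),        # S2: -> 6x14x14
                nn.Conv2d(6, 16, 5),    # C3: -> 16x10x10 (fully connected here)
                nn.ReLU(),
                nn.MaxPool2d(2),        # S4: -> 16x5x5
                nn.Conv2d(16, 120, 5),  # C5: -> 120x1x1
                nn.ReLU(),
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(120, 84),          # F6
                nn.ReLU(),
                nn.Linear(84, num_classes),  # Output: one score per digit
            )

        def forward(self, x):
            return self.classifier(self.features(x))

    model = LeNet5()
    print(model(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])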

