Deep Learning Theory (14) -- AlexNet, the Next Level

Scientific knowledge

A loss function (also called a cost function) maps a random event, or the values of its associated random variables, to a non-negative real number representing the "risk" or "loss" of that event. In applications, the loss function usually serves as the learning criterion of an optimization problem: a model is solved for and evaluated by minimizing it.
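For instance (a minimal NumPy sketch, purely illustrative and not tied to the framework used later in this series), two common loss functions look like this:

```python
import numpy as np

def mse_loss(y_pred, y_true):
    # Mean squared error: average squared deviation, always >= 0.
    return np.mean((y_pred - y_true) ** 2)

def cross_entropy_loss(probs, label):
    # Negative log-likelihood of the true class; probs is a probability
    # distribution over classes, label is the index of the true class.
    return -np.log(probs[label])

print(mse_loss(np.array([0.9, 0.1]), np.array([1.0, 0.0])))  # 0.01
print(cross_entropy_loss(np.array([0.7, 0.2, 0.1]), 0))      # ~0.357
```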

Preface

In the previous article we studied the theory of LeNet-5, analyzed the structure and dimensions of each layer of the network in detail, and then put it into code in the hands-on article. Through that study and practice we found that LeNet-5's architecture is rather simple: first, it has few layers; second, each layer carries little channel information, in other words, the number of channels each layer outputs is small. Such a network may not be adequate for more complex vision tasks. We usually expect a deeper architecture to learn deeper information, and a larger channel count to give a richer feature representation. Today we study a network that is a step more complex than LeNet-5.

1. AlexNet

In this article we share the AlexNet network. The original paper is titled ImageNet Classification with Deep Convolutional Neural Networks. ImageNet is a large image dataset; the commonly used subset covers 1,000 classes of everyday objects, and the dataset's publishers also used to hold a large-scale competition every year. The competition seems not to have run in the past two years, but the latest papers are still pushing past previous image-classification accuracy and error-rate records.

Paper address:

https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf

Network structure diagram:

Look carefully at the network structure diagram in the original paper. At first glance it is confusing, and that is because the network is drawn as two parallel branches; in fact they can be merged into one serial structure. So why the parallel design? Note that the per-branch channel counts (48, 128, 192, 192, 128 for the convolutional layers and 2048 for the fully connected layers; the merged network doubles each of these) are large, and that many neurons consume a great deal of computing resources. The authors therefore split the model across two GPUs for training and merged the results as the output. Since a modern graphics card can already support such training on its own, in the later hands-on article we will merge the two branches and train them as a single network.

Network structure analysis:

1. Input layer: an image of shape 3*227*227, i.e. a color image (three channels) of size 227*227. The original image is not necessarily 227*227; it should be resized before being fed to the network. (The paper itself writes 224*224, but the layer arithmetic below only works out cleanly with 227*227, so the input is resized to 227*227.)
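As a quick sketch of that resize step (assuming PyTorch and torchvision here; `example.jpg` is a hypothetical file name), the preprocessing might look like:

```python
from PIL import Image
from torchvision import transforms

# Hypothetical input file; any RGB image works.
img = Image.open("example.jpg").convert("RGB")

# Resize to the 227*227 that the layer arithmetic below assumes,
# then convert to a (3, 227, 227) tensor.
preprocess = transforms.Compose([
    transforms.Resize((227, 227)),
    transforms.ToTensor(),
])
x = preprocess(img)
print(x.shape)  # torch.Size([3, 227, 227])
```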

2. Convolution layer 1:

Input: 3*227*227

Convolution kernel size: 11*11

Number of convolution kernels: 96 (sum of the two branches, 48 + 48)

Stride: 4

Output feature map size: (227 - 11)/4 + 1 = 55, i.e. 55*55

Output feature map shape: 96*55*55, i.e. 96 feature maps of size 55*55; the 96 can be read as one 55*55 feature map having 96 channels.
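The size above comes from the general formula (input - kernel + 2*padding)/stride + 1, which also covers the pooling layers below. A tiny helper (plain Python, for illustration) makes the arithmetic explicit:

```python
def out_size(in_size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (in_size - kernel + 2 * padding) // stride + 1

# Convolution layer 1: 227*227 input, 11*11 kernel, stride 4, no padding.
print(out_size(227, kernel=11, stride=4))  # 55
```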

3. Max pooling layer 1:

Input: 96*55*55

Pooling kernel size: 3*3

Channels: 96 (pooling is applied to each channel independently)

Stride: 2

Output feature map size: (55 - 3)/2 + 1 = 27, i.e. 27*27

Output feature map shape: 96*27*27

4. Convolution layer 2:

Input: 96*27*27

Convolution kernel size: 5*5

Number of convolution kernels: 256

Stride: 1 (default)

padding: SAME (2 on each side for a 5*5 kernel)

Output feature map size: (27 + 2*2 - 5)/1 + 1 = 27, i.e. 27*27

Output feature map shape: 256*27*27, i.e. 256 feature maps of size 27*27; the 256 can be read as one 27*27 feature map having 256 channels.
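In framework terms (a PyTorch sketch, as an assumption; TensorFlow's padding='SAME' behaves the same way), SAME padding for a 5*5 kernel at stride 1 means 2 pixels on each side, which preserves the 27*27 size:

```python
import torch
import torch.nn as nn

# 5*5 kernel, stride 1, padding 2 on each side: spatial size is preserved.
conv2 = nn.Conv2d(in_channels=96, out_channels=256, kernel_size=5, padding=2)
y = conv2(torch.randn(1, 96, 27, 27))
print(y.shape)  # torch.Size([1, 256, 27, 27])

# The formula agrees: out_size(27, kernel=5, stride=1, padding=2) == 27
```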

5. Max pooling layer 2:

Input: 256*27*27

Pooling kernel size: 3*3

Channels: 256 (pooling is applied to each channel independently)

Stride: 2

Output feature map size: (27 - 3)/2 + 1 = 13, i.e. 13*13

Output feature map shape: 256*13*13

6. Convolution layer 3:

Input: 256*13*13

Convolution kernel size: 3*3

Number of convolution kernels: 384

Stride: 1 (default)

padding: SAME (1 on each side for a 3*3 kernel)

Output feature map size: since padding is SAME, the spatial size stays at 13*13

Output feature map shape: 384*13*13, i.e. 384 feature maps of size 13*13.

7. Convolution layer 4:

Input: 384*13*13

Convolution kernel size: 3*3

Number of convolution kernels: 384

Stride: 1 (default)

padding: SAME (1 on each side for a 3*3 kernel)

Output feature map size: since padding is SAME, the spatial size stays at 13*13

Output feature map shape: 384*13*13, i.e. 384 feature maps of size 13*13.

8. Convolution layer 5:

Input: 384*13*13

Convolution kernel size: 3*3

Number of convolution kernels: 256

Stride: 1 (default)

padding: SAME (1 on each side for a 3*3 kernel)

Output feature map size: since padding is SAME, the spatial size stays at 13*13

Output feature map shape: 256*13*13

9. Max pooling layer 3:

Input: 256*13*13

Pooling kernel size: 3*3

Channels: 256 (pooling is applied to each channel independently)

Stride: 2

Output feature map size: (13 - 3)/2 + 1 = 6, i.e. 6*6

Output feature map shape: 256*6*6
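Reusing the `out_size` helper from earlier, the whole convolutional pipeline can be traced in a few lines, reproducing every spatial size in this walkthrough:

```python
size = 227
size = out_size(size, kernel=11, stride=4)   # conv1 -> 55
size = out_size(size, kernel=3, stride=2)    # pool1 -> 27
size = out_size(size, kernel=5, padding=2)   # conv2 -> 27
size = out_size(size, kernel=3, stride=2)    # pool2 -> 13
size = out_size(size, kernel=3, padding=1)   # conv3 -> 13
size = out_size(size, kernel=3, padding=1)   # conv4 -> 13
size = out_size(size, kernel=3, padding=1)   # conv5 -> 13
size = out_size(size, kernel=3, stride=2)    # pool3 -> 6
print(size)  # 6
```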

10. Fully connected layer 1:

Input: 256*6*6, flattened to 256*6*6 = 9216 values

Output neurons: 4096

Output shape: 4096

11. Fully connected layer 2:

Input: 4096

Output neurons: 4096

Output shape: 4096

12. Fully connected layer 3:

Input: 4096

Output neurons: 1000 (one per class)

Output shape: 1000

That completes the entire network: the final output is 1000 neurons, one for each of the 1000 categories.
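As a preview of the upcoming code practice, here is a minimal sketch of the merged single-branch version described above, written in PyTorch (an assumption on the framework; dropout and local response normalization from the paper are omitted for brevity):

```python
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    """Single-branch AlexNet: the two GPU branches merged, as discussed above."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),    # -> 96*55*55
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),         # -> 96*27*27
            nn.Conv2d(96, 256, kernel_size=5, padding=2),  # -> 256*27*27
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),         # -> 256*13*13
            nn.Conv2d(256, 384, kernel_size=3, padding=1), # -> 384*13*13
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), # -> 384*13*13
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), # -> 256*13*13
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),         # -> 256*6*6
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                   # 256*6*6 -> 9216
            nn.Linear(256 * 6 * 6, 4096),   # fully connected layer 1
            nn.ReLU(inplace=True),
            nn.Linear(4096, 4096),          # fully connected layer 2
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),   # fully connected layer 3
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Sanity check: a dummy 3*227*227 image yields 1000 class scores.
model = AlexNet()
out = model(torch.randn(1, 3, 227, 227))
print(out.shape)  # torch.Size([1, 1000])
```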

Epilogue

That is it for this week's sharing. The network is not very complicated. Note that the last three convolutional layers use the padding operation (padding: SAME), so their output feature maps keep the same spatial shape as their inputs; everything else is similar to the earlier LeNet-5 structure. Veterans, go and familiarize yourselves with how the image shape changes and with the order of the layers. Next time we will do the code practice, so please look forward to it. If you have any questions, remember to message us backstage at any time.

Have a good weekend!

Editor: Layman Yueyi | Review: Layman Xiaoquanquan

Past review

#Deep Learning Theory (13) -- LeNet-5 is surging

#Deep Learning Theory (12) -- Pooling of Dimensionality Reduction

#Deep Learning Theory (11) -- Convolutional Neural Network Flourishing Age (3)

What have we done in the past

# [Year-end summary] 2021, bid farewell to the old and welcome the new to start again

# [Year-end Summary] Saying goodbye to the old and welcoming the new, 2020, let's start again

Origin: blog.csdn.net/xyl666666/article/details/119860862