[CNN] What is a deep convolutional neural network (AlexNet)? How to implement AlexNet?

Table of Contents of Series Articles

Chapter 2 Deep Convolutional Neural Network (AlexNet) in Deep Learning CNN


Table of contents


Preface

1. What is deep convolutional neural network (AlexNet)?

2. AlexNet network structure

3. Implement the AlexNet model

Summary


Preface

This article introduces the deep convolutional neural network AlexNet: what it is, how its structure differs from LeNet, and how to implement it. After reading it, you should have a basic understanding of AlexNet.


1. What is deep convolutional neural network (AlexNet)?

The deep convolutional neural network AlexNet was proposed in 2012 and is named after the first author of the paper, Alex Krizhevsky. AlexNet is an 8-layer convolutional neural network, and it won the ImageNet 2012 image recognition challenge by a large margin.

AlexNet and LeNet follow very similar design concepts, but there are also significant differences, which are discussed below.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).

2. AlexNet network structure

Let’s take a look at the network structure of AlexNet, as shown in the figure below:

As the figure shows, the network structures of AlexNet and LeNet are very similar, so here we focus on the differences between the two to understand the structure of AlexNet. (For an introduction to LeNet, see: [CNN] What is a convolutional neural network (LeNet)? How to implement LeNet?, https://blog.csdn.net/m0_51816252/article/details/130657443)

First, compared with the relatively small LeNet, AlexNet contains 8 layers of transformations: 5 convolutional layers, 2 fully connected hidden layers, and 1 fully connected output layer.

The convolution window in the first layer of AlexNet is 11×11. Most ImageNet images are more than 10 times taller and wider than MNIST images, so objects in ImageNet images occupy more pixels, and a larger convolution window is needed to capture them.

The convolution window in the second layer shrinks to 5×5, and 3×3 windows are used from then on. In addition, the first, second, and fifth convolutional layers are each followed by a max-pooling layer with a 3×3 window and a stride of 2. AlexNet also uses tens of times more convolution channels than LeNet. The last convolutional layer is followed by two fully connected layers with 4096 outputs each.
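To see how these window sizes, strides, and paddings shrink the feature map, the sketch below applies the standard output-size formula floor((n + 2p - k) / s) + 1 layer by layer. It assumes a 224×224 input, which the article does not state; it is one input size for which the convolutional part of the implementation in section 3 ends at 5×5, matching its `256*5*5` fully connected input. The helper names `conv_out`, `LAYERS`, and `trace_shapes` are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output height/width of a conv or max-pooling layer (floor rounding, PyTorch's default)."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding) for each spatial layer of the convolutional part
LAYERS = [
    ("conv1 11x11/4", 11, 4, 0),
    ("pool1 3x3/2", 3, 2, 0),
    ("conv2 5x5, pad 2", 5, 1, 2),
    ("pool2 3x3/2", 3, 2, 0),
    ("conv3 3x3, pad 1", 3, 1, 1),
    ("conv4 3x3, pad 1", 3, 1, 1),
    ("conv5 3x3, pad 1", 3, 1, 1),
    ("pool3 3x3/2", 3, 2, 0),
]

def trace_shapes(size):
    """Return the feature-map side length after each layer for a size x size input."""
    sides = []
    for _, kernel, stride, padding in LAYERS:
        size = conv_out(size, kernel, stride, padding)
        sides.append(size)
    return sides

sides = trace_shapes(224)  # assumed input size
for (name, *_), side in zip(LAYERS, sides):
    print(f"{name:18s} -> {side}x{side}")
# 256 channels leave the last pooling layer
print("flattened features:", 256 * sides[-1] ** 2)
```

The 5×5 convolution with padding 2 and the 3×3 convolutions with padding 1 keep the height and width unchanged, so only the stride-4 first layer and the three pooling layers reduce the resolution.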

Second, AlexNet replaces the sigmoid activation function with the simpler ReLU activation function.

On the one hand, ReLU is cheaper to compute: unlike sigmoid, it involves no exponentiation. On the other hand, ReLU makes the model easier to train under different parameter initialization methods. When the output of sigmoid is very close to 0 or 1, its gradient in those regions is almost 0, so backpropagation can no longer effectively update some of the model parameters; the gradient of ReLU on the positive interval, by contrast, is always 1. Therefore, if the model parameters are not initialized properly, sigmoid may produce gradients of almost 0 on the positive interval, and the model cannot be trained effectively.
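The gradient argument can be checked numerically. The small sketch below (plain Python; the function names are illustrative, not from any library) compares the derivative of sigmoid, sigmoid(x)(1 - sigmoid(x)), with the derivative of ReLU at increasingly large positive inputs.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)); its maximum is 0.25 at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # ReLU(x) = max(0, x): slope 1 for x > 0, slope 0 for x < 0 (0 is used at x = 0 here)
    return 1.0 if x > 0 else 0.0

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x={x:5.1f}  sigmoid'={sigmoid_grad(x):.6f}  relu'={relu_grad(x):.1f}")
```

The sigmoid gradient never exceeds 0.25 and is already below 1e-4 at x = 10, while the ReLU gradient stays at 1 for every positive input.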

Third, AlexNet uses dropout to control the model complexity of its fully connected layers, whereas LeNet does not use dropout.
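As a sketch of what a dropout layer does, the plain-Python function below implements "inverted" dropout: during training each value is zeroed with probability p and the survivors are scaled by 1/(1 - p), so the expected value is unchanged and nothing needs rescaling at inference time. PyTorch's `nn.Dropout` documents this same training-time scaling; the function name and list-based interface here are illustrative assumptions.

```python
import random

def dropout(values, p, training=True, rng=random):
    """Inverted dropout on a list of floats.

    Training: zero each element with probability p, scale the rest by 1 / (1 - p).
    Inference (training=False): return the values unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training:
        return list(values)
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]

rng = random.Random(0)
print(dropout([1.0] * 8, p=0.5, rng=rng))         # each entry is either 0.0 or 2.0
print(dropout([1.0] * 8, p=0.5, training=False))  # unchanged at inference
```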

Fourth, AlexNet makes extensive use of image augmentation, such as flipping, cropping, and color changes, which effectively enlarges the dataset and helps alleviate overfitting.
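As a rough illustration, the NumPy sketch below applies a random crop, a random horizontal flip, and a random brightness change to an H×W×C image array with values in [0, 1]. The 256×256 image and 224×224 crop mirror the sizes described in the AlexNet paper; the function name `augment` and the 0.8 to 1.2 brightness range are illustrative assumptions, and the brightness scaling is a simple stand-in, not the paper's PCA-based color perturbation.

```python
import numpy as np

def augment(image, crop_size, rng):
    """Simplified AlexNet-style augmentation for an H x W x C float image in [0, 1]."""
    h, w, _ = image.shape
    # 1. random crop_size x crop_size crop (upper bound of integers() is exclusive)
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    out = image[top:top + crop_size, left:left + crop_size]
    # 2. horizontal flip with probability 0.5
    if rng.random() < 0.5:
        out = out[:, ::-1]
    # 3. random brightness scaling as a simple "color change", clipped back to [0, 1]
    factor = rng.uniform(0.8, 1.2)
    return np.clip(out * factor, 0.0, 1.0)

rng = np.random.default_rng(0)
img = rng.random((256, 256, 3))
patch = augment(img, 224, rng)
print(patch.shape)  # (224, 224, 3)
```

Because a new crop, flip, and color factor are drawn every time an image is loaded, the network rarely sees exactly the same input twice. In a PyTorch pipeline the same ideas are usually expressed with `torchvision.transforms` (for example `RandomResizedCrop`, `RandomHorizontalFlip`, and `ColorJitter`).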

3. Implement the AlexNet model

Here we build the model directly with nn.Sequential:

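```python
from torch import nn


class AlexNet(nn.Module):
    def __init__(self):
        super(AlexNet, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 96, 11, 4),  # in_channels, out_channels, kernel_size, stride (padding defaults to 0)
            nn.ReLU(),
            nn.MaxPool2d(3, 2),  # kernel_size, stride
            # Smaller convolution window; padding of 2 keeps the output height and width
            # equal to the input's, and the number of output channels is increased
            nn.Conv2d(96, 256, 5, 1, 2),
            nn.ReLU(),
            nn.MaxPool2d(3, 2),
            # Three consecutive convolutional layers with an even smaller window. Except for the
            # last one, they increase the number of output channels further.
            # No pooling after the first two of them, so height and width are not reduced there
            nn.Conv2d(256, 384, 3, 1, 1),
            nn.ReLU(),
            nn.Conv2d(384, 384, 3, 1, 1),
            nn.ReLU(),
            nn.Conv2d(384, 256, 3, 1, 1),
            nn.ReLU(),
            nn.MaxPool2d(3, 2)
        )
        # The fully connected layers have several times more outputs than those in LeNet.
        # Dropout layers are used to mitigate overfitting
        self.fc = nn.Sequential(
            # 256*5*5 assumes the conv part outputs 256 x 5 x 5 (e.g. for a 1 x 224 x 224 input)
            nn.Linear(256*5*5, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            # Output layer
            nn.Linear(4096, 1000),
        )

    def forward(self, img):
        feature = self.conv(img)
        # flatten (batch, 256, 5, 5) to (batch, 6400) before the fully connected layers
        output = self.fc(feature.view(img.shape[0], -1))
        return output
```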

After creating the model, we can run the following code to print the layers of the network:

```python
net = AlexNet()
print(net)
```

Here I give the output directly:

```text
AlexNet(
  (conv): Sequential(
    (0): Conv2d(1, 96, kernel_size=(11, 11), stride=(4, 4))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(96, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU()
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(256, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU()
    (8): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU()
    (10): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU()
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Sequential(
    (0): Linear(in_features=6400, out_features=4096, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.5)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.5)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
```

We can see that the model contains 8 layers of transformation, including 5 convolutional layers, 2 fully connected hidden layers, and 1 fully connected output layer.
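To see where the model's capacity sits, the sketch below counts the parameters of these 8 layers by hand: a convolution with c_in input channels, c_out output channels, and a k×k window has c_out · (c_in · k · k + 1) parameters (weights plus one bias per output channel), and a fully connected layer has n_out · (n_in + 1). The layer sizes are taken from the model above; the helper names are illustrative.

```python
def conv_params(c_in, c_out, k):
    # one k x k x c_in kernel plus one bias per output channel
    return c_out * (c_in * k * k + 1)

def linear_params(n_in, n_out):
    # weight matrix plus bias vector
    return n_out * (n_in + 1)

conv_layers = [
    conv_params(1, 96, 11),
    conv_params(96, 256, 5),
    conv_params(256, 384, 3),
    conv_params(384, 384, 3),
    conv_params(384, 256, 3),
]
fc_layers = [
    linear_params(256 * 5 * 5, 4096),
    linear_params(4096, 4096),
    linear_params(4096, 1000),
]

n_conv, n_fc = sum(conv_layers), sum(fc_layers)
print(f"conv: {n_conv:,}  fc: {n_fc:,}  total: {n_conv + n_fc:,}")
```

This gives roughly 3.7 million parameters in the convolutional layers and about 47.1 million in the fully connected layers, over 90% of the total, which is why dropout is applied to the fully connected layers. For the PyTorch model above, `sum(p.numel() for p in net.parameters())` should report the same total.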


Summary

After reading this article, you should have a basic understanding of the deep convolutional neural network AlexNet. AlexNet showed for the first time that learned features can surpass hand-designed features, which was a major breakthrough at the time. I hope you found this article useful.


Source: blog.csdn.net/m0_51816252/article/details/130657840