"Hands-on Deep Learning Pytorch Edition" 7.1 Deep Convolutional Neural Network (LeNet)

7.1.1 Learning Representations

The breakthrough of deep convolutional neural networks came in 2012, and it can be attributed to two key factors:

  • The missing ingredient: data
    The shortage of datasets eased with the big data wave that emerged around 2010. The ImageNet dataset behind the ImageNet Challenge was developed by researchers in Professor Fei-Fei Li's group at Stanford University. They used Google Image Search to pre-screen candidate images for each category and used Amazon Mechanical Turk crowdsourcing to annotate the category of each image. Data at this scale was unprecedented.
  • The missing ingredient: hardware
    In 2012, Alex Krizhevsky and Ilya Sutskever sparked the deep learning craze by implementing fast convolution operations on two NVIDIA GTX 580 GPUs with 3 GB of video memory each.

7.1.2 AlexNet

AlexNet, which appeared in 2012, proved for the first time that learned features can surpass manually designed features.

The architectures of AlexNet and LeNet are very similar (the book simplifies the model slightly, removing the design quirks that were only needed to run on two small GPUs simultaneously):

Fully connected layer (1000)
        ↑
Fully connected layer (4096)
        ↑
Fully connected layer (4096)
        ↑
3×3 max-pooling layer, stride 2
        ↑
3×3 convolutional layer (256), padding 1
        ↑
3×3 convolutional layer (384), padding 1
        ↑
3×3 convolutional layer (384), padding 1
        ↑
3×3 max-pooling layer, stride 2
        ↑
5×5 convolutional layer (256), padding 2
        ↑
3×3 max-pooling layer, stride 2
        ↑
11×11 convolutional layer (96), stride 4
        ↑
Input image (3×224×224)

The differences between AlexNet and LeNet:

- AlexNet is much deeper than LeNet
- AlexNet uses ReLU rather than sigmoid as the activation function

The following are the details of AlexNet.

  1. Model design

    Since most of the images in ImageNet are large, the first layer uses an extra-large 11×11 convolution kernel, and the window is then reduced step by step down to 3×3 in later layers. With stride 4 and padding 1, the first layer maps a 224×224 input to ⌊(224 + 2·1 − 11)/4⌋ + 1 = 54, matching the 54×54 output shown below. Moreover, AlexNet has roughly ten times as many convolution channels as LeNet.

    The last two huge fully connected layers each have 4096 outputs and together account for nearly 1 GB of model parameters. Due to the limited memory of early GPUs, the original AlexNet adopted a dual data-stream design, with each GPU storing and computing only half of the model.

  2. Activation function

    The ReLU activation function makes models easier to train: its gradient in the positive interval is always 1, whereas the sigmoid function's gradient can be almost 0 even in the positive interval (see the gradient check after this list).

  3. Capacity control and preprocessing

    AlexNet controls the complexity of the fully connected layers with dropout. In addition, to enlarge the dataset, AlexNet applies a large amount of image augmentation during training (such as flipping, cropping, and color changes), which makes the model more robust and reduces overfitting (a sketch of such a pipeline follows this list).
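
As a quick sanity check on the activation-function claim (a minimal sketch, not from the book), the snippet below compares the gradients of sigmoid and ReLU at a large positive input:

import torch

# sigmoid saturates: its gradient at x = 8 is s(x) * (1 - s(x))
x = torch.tensor([8.0], requires_grad=True)
torch.sigmoid(x).backward()
print(x.grad)  # ~0.0003: the gradient has nearly vanished

# ReLU does not saturate: its gradient anywhere in the positive interval is 1
y = torch.tensor([8.0], requires_grad=True)
torch.relu(y).backward()
print(y.grad)  # tensor([1.])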
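
For the augmentation mentioned in point 3, a hypothetical torchvision pipeline covering flipping, cropping, and color changes might look like the following (the parameter values are illustrative, not the original paper's settings):

from torchvision import transforms

# Illustrative AlexNet-style augmentation: flip, crop, and color jitter
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),  # random horizontal flipping
    transforms.RandomResizedCrop(224),  # random cropping, resized to 224x224
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.ToTensor(),
])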

import torch
from torch import nn
from d2l import torch as d2l
net = nn.Sequential(
    # Use a larger 11*11 window to capture objects.
    # Meanwhile, stride 4 reduces the output height and width.
    # Also, the number of output channels is far larger than in LeNet.
    nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    # Shrink the convolution window; padding of 2 keeps the input and output
    # height and width the same, and the number of output channels grows.
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    # Three consecutive convolutional layers with a smaller window.
    # Except for the last one, the number of output channels keeps increasing.
    # After the first two of these layers, no pooling is used to reduce the
    # input height and width.
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    # The fully connected layers here have several times as many outputs as
    # LeNet's. Dropout layers mitigate overfitting.
    nn.Linear(6400, 4096), nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Dropout(p=0.5),
    # Finally, the output layer. Since we use Fashion-MNIST here, the number
    # of classes is 10 rather than the paper's 1000.
    nn.Linear(4096, 10))
X = torch.randn(1, 1, 224, 224)
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:\t', X.shape)
Conv2d output shape:	 torch.Size([1, 96, 54, 54])
ReLU output shape:	 torch.Size([1, 96, 54, 54])
MaxPool2d output shape:	 torch.Size([1, 96, 26, 26])
Conv2d output shape:	 torch.Size([1, 256, 26, 26])
ReLU output shape:	 torch.Size([1, 256, 26, 26])
MaxPool2d output shape:	 torch.Size([1, 256, 12, 12])
Conv2d output shape:	 torch.Size([1, 384, 12, 12])
ReLU output shape:	 torch.Size([1, 384, 12, 12])
Conv2d output shape:	 torch.Size([1, 384, 12, 12])
ReLU output shape:	 torch.Size([1, 384, 12, 12])
Conv2d output shape:	 torch.Size([1, 256, 12, 12])
ReLU output shape:	 torch.Size([1, 256, 12, 12])
MaxPool2d output shape:	 torch.Size([1, 256, 5, 5])
Flatten output shape:	 torch.Size([1, 6400])
Linear output shape:	 torch.Size([1, 4096])
ReLU output shape:	 torch.Size([1, 4096])
Dropout output shape:	 torch.Size([1, 4096])
Linear output shape:	 torch.Size([1, 4096])
ReLU output shape:	 torch.Size([1, 4096])
Dropout output shape:	 torch.Size([1, 4096])
Linear output shape:	 torch.Size([1, 10])

7.1.3 Reading the Dataset

Training on the actual ImageNet dataset would take hours or days even on today's GPUs. Since this is only a demonstration, we stick with the Fashion-MNIST dataset, so the image resolution mismatch has to be handled here by resizing the 28×28 images up to 224×224.

batch_size = 128
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)

7.1.4 Training AlexNet

lr, num_epochs = 0.01, 10
d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())  # takes about 20 minutes; run with caution
loss 0.330, train acc 0.879, test acc 0.878
592.4 examples/sec on cuda:0


Exercises

(1) Try increasing the number of epochs. How do the results differ from LeNet's? Why?

lr, num_epochs = 0.01, 15
d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())  # takes about 30 minutes; run with caution
loss 0.284, train acc 0.896, test acc 0.887
589.3 examples/sec on cuda:0


Unlike LeNet, whose accuracy dropped as the number of epochs grew, AlexNet resists overfitting better: its accuracy keeps improving as the number of epochs increases.


(2) The AlexNet model may be too complex for Fashion-MNIST.

a. Try simplifying the model to speed up training while making sure accuracy does not drop significantly.

b. Design a better model that works directly on $28\times28$ images.
net_Better = nn.Sequential(
    # A smaller first layer: a 5*5 kernel with stride 2 suits 28*28 inputs
    nn.Conv2d(1, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=1),
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(128, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    # Much smaller fully connected layers than AlexNet's 4096-wide ones
    nn.Linear(64 * 5 * 5, 1024), nn.ReLU(),
    nn.Dropout(p=0.3),
    nn.Linear(1024, 512), nn.ReLU(),
    nn.Dropout(p=0.3),
    nn.Linear(512, 10)
)

X = torch.randn(1, 1, 28, 28)
for layer in net_Better:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:\t', X.shape)
Conv2d output shape:	 torch.Size([1, 64, 14, 14])
ReLU output shape:	 torch.Size([1, 64, 14, 14])
MaxPool2d output shape:	 torch.Size([1, 64, 12, 12])
Conv2d output shape:	 torch.Size([1, 128, 12, 12])
ReLU output shape:	 torch.Size([1, 128, 12, 12])
Conv2d output shape:	 torch.Size([1, 128, 12, 12])
ReLU output shape:	 torch.Size([1, 128, 12, 12])
Conv2d output shape:	 torch.Size([1, 64, 12, 12])
ReLU output shape:	 torch.Size([1, 64, 12, 12])
MaxPool2d output shape:	 torch.Size([1, 64, 5, 5])
Flatten output shape:	 torch.Size([1, 1600])
Linear output shape:	 torch.Size([1, 1024])
ReLU output shape:	 torch.Size([1, 1024])
Dropout output shape:	 torch.Size([1, 1024])
Linear output shape:	 torch.Size([1, 512])
ReLU output shape:	 torch.Size([1, 512])
Dropout output shape:	 torch.Size([1, 512])
Linear output shape:	 torch.Size([1, 10])
batch_size = 128
train_iter28, test_iter28 = d2l.load_data_fashion_mnist(batch_size=batch_size)
lr, num_epochs = 0.01, 10
d2l.train_ch6(net_Better, train_iter28, test_iter28, num_epochs, lr, d2l.try_gpu())  # much faster
loss 0.429, train acc 0.841, test acc 0.843
6650.9 examples/sec on cuda:0



(3) Modify the batch size and observe changes in model accuracy and GPU memory.

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)

lr, num_epochs = 0.01, 10
d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())  # takes about 20 minutes; run with caution
loss 0.407, train acc 0.850, test acc 0.855
587.8 examples/sec on cuda:0


With batch size 256, the 4 GB of video memory is essentially full; accuracy drops slightly, and overfitting appears to get worse.
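
To measure the memory usage rather than eyeball it, PyTorch's built-in counters can be queried after training (a small sketch; both are standard torch.cuda functions):

# Peak memory actually occupied by tensors during the run
print(f'{torch.cuda.max_memory_allocated() / 1024**3:.2f} GB allocated')
# Peak memory reserved by PyTorch's caching allocator (usually a bit larger)
print(f'{torch.cuda.max_memory_reserved() / 1024**3:.2f} GB reserved')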


(4) Analyze the computational performance of AlexNet.

a. Which part of AlexNet dominates the GPU memory usage?

b. Which part of AlexNet requires the most computation?

c. What about memory bandwidth when computing the results?

a. The first fully connected layer occupies the most memory.

b. The convolutional layers account for most of the computation; by a rough multiply-accumulate count, the second convolutional layer (5×5, 96→256 channels) is the single most expensive, at roughly 4×10⁸ MACs.
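
Answer (a) can be checked directly by counting parameters layer by layer in the net defined above; the first nn.Linear (6400 → 4096) dwarfs everything else (a small sketch):

# Per-layer parameter counts; the first fully connected layer dominates
for layer in net:
    n = sum(p.numel() for p in layer.parameters())
    if n > 0:
        print(f'{layer.__class__.__name__}: {n / 1e6:.2f}M parameters')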


(5) If dropout and ReLU are applied to LeNet-5, will the results improve? What happens if we also adjust the preprocessing?

# LeNet-5 with ReLU in place of sigmoid, plus dropout after the first two
# fully connected layers
net_try = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.ReLU(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(84, 10))

lr, num_epochs = 0.6, 10
d2l.train_ch6(net_try, train_iter28, test_iter28, num_epochs, lr, d2l.try_gpu())  # works quite well with light tuning
loss 0.306, train acc 0.887, test acc 0.883
26121.2 examples/sec on cuda:0


With just a light tune-up, the results are quite good and the accuracy improves.

Source: blog.csdn.net/qq_43941037/article/details/132962691