I have been learning deep learning for a year. I have watched many videos, read books and studied a lot of code, but I still feel that my understanding is not solid enough. Following Mr. Li Mu's video course, I decided to walk through the process of reproducing LeNet in this blog. The code was developed in PyCharm 2021 with Python 3.8, PyTorch 1.12.1 and CUDA 11.6 (cu116). I updated each package to its latest version at the time, to make it easier to upgrade the networks later and keep the code compatible.
1. Introduction to LeNet
The LeNet network [1] was proposed by Yann LeCun, then a researcher at AT&T Bell Labs, and can be regarded as the pioneering work on convolutional neural networks. I chose LeNet as the first network to reproduce because its structure is simple, clear and easy to understand. As one of the first convolutional neural networks to be successfully applied in practice, in banking and postal systems, LeNet has a classic structure, and many of its ideas are still in use today. This makes it a very suitable first case for reproducing a deep network in code.
We first review the basic structure of LeNet. The original network takes a 32*32 single-channel image as input, whereas MNIST (and Fashion-MNIST) images are 28*28, so the first convolution uses a padding of 2 to keep the C1 feature map at 28*28. This first convolutional layer (5*5 kernels) turns the input into a 6-channel 28*28 C1 feature map, and a 2*2 pooling step then compresses it to 14*14. A second 5*5 convolution (without padding) produces a 16-channel 10*10 feature map, which another pooling step reduces to 5*5. Finally, three fully connected layers with 120, 84 and 10 outputs produce a 10-element vector used to decide which category the input belongs to. The whole structure is clear and easy to follow.
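Every size change above follows the usual output-size rule for convolution and pooling windows: for input size n, kernel size k, padding p and stride s, the output size is floor((n + 2p - k) / s) + 1. The sketch below is a minimal pure-Python check of that arithmetic; the helper name `out_size` is my own and is not part of PyTorch or d2l.

```python
def out_size(n, kernel, padding=0, stride=1):
    """Output size along one spatial axis: floor((n + 2*padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


# (layer description, keyword arguments) for the four feature-extraction layers
layers = [
    ("C1: 5x5 conv, padding 2", {"kernel": 5, "padding": 2}),
    ("S2: 2x2 avg pool, stride 2", {"kernel": 2, "stride": 2}),
    ("C3: 5x5 conv, no padding", {"kernel": 5}),
    ("S4: 2x2 avg pool, stride 2", {"kernel": 2, "stride": 2}),
]

n = 28  # side length of an MNIST / Fashion-MNIST image
sizes = []
for name, kwargs in layers:
    n = out_size(n, **kwargs)
    sizes.append(n)
    print(f"{name} -> {n}x{n}")

# 16 channels of 5x5 maps are flattened before the first fully connected layer
flat_features = 16 * sizes[-1] * sizes[-1]
print("features after Flatten:", flat_features)  # 400

# The original LeNet used a 32x32 input with an unpadded 5x5 conv, giving the same 28x28 C1 map
print("32x32 input, C1 without padding ->", out_size(32, 5))  # 28
```

This is why padding 2 on a 28*28 input reproduces the 28*28 C1 map of the 32*32 original, and why the first linear layer expects 16*5*5 = 400 inputs.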
2. LeNet implementation
The LeNet network is built as follows:
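```python
import torch
from torch import nn
from d2l import torch as d2l


class Reshape(torch.nn.Module):
    """Reshape the input batch to (batch, channels, height, width) = (-1, 1, 28, 28)."""
    def forward(self, x):
        return x.view(-1, 1, 28, 28)


net = torch.nn.Sequential(
    Reshape(),
    nn.Conv2d(1, 6, kernel_size=5, padding=2),  # C1: 1 -> 6 channels, padding keeps 28x28
    nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),      # S2: 28x28 -> 14x14
    nn.Conv2d(6, 16, kernel_size=5),            # C3: 6 -> 16 channels, 14x14 -> 10x10
    nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),      # S4: 10x10 -> 5x5
    nn.Flatten(),                               # 16 * 5 * 5 = 400 features
    nn.Linear(16 * 5 * 5, 120),
    nn.Sigmoid(),
    nn.Linear(120, 84),
    nn.Sigmoid(),
    nn.Linear(84, 10))                          # one score per class

# Pass one random 28x28 single-channel image through the network layer by layer
X = torch.rand(size=(1, 1, 28, 28), dtype=torch.float32)
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:\t', X.shape)
```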
The whole network takes only a few lines to implement. Following Mr. Li Mu's video, we feed a random input through the network and print the output shape of each layer to see how each layer transforms the data. The result is as follows:
```
Reshape output shape: torch.Size([1, 1, 28, 28])
Conv2d output shape: torch.Size([1, 6, 28, 28])
Sigmoid output shape: torch.Size([1, 6, 28, 28])
AvgPool2d output shape: torch.Size([1, 6, 14, 14])
Conv2d output shape: torch.Size([1, 16, 10, 10])
Sigmoid output shape: torch.Size([1, 16, 10, 10])
AvgPool2d output shape: torch.Size([1, 16, 5, 5])
Flatten output shape: torch.Size([1, 400])
Linear output shape: torch.Size([1, 120])
Sigmoid output shape: torch.Size([1, 120])
Linear output shape: torch.Size([1, 84])
Sigmoid output shape: torch.Size([1, 84])
Linear output shape: torch.Size([1, 10])
```
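The shape printout also makes it easy to count the network's learnable parameters: only the two convolutions and the three linear layers have weights and biases, while Reshape, Sigmoid, AvgPool2d and Flatten have none. The following is a small pure-Python sketch of that count; the helper names `conv_params` and `linear_params` are hypothetical, chosen for illustration only.

```python
def conv_params(in_ch, out_ch, k):
    # one k*k kernel per (input channel, output channel) pair, plus one bias per output channel
    return out_ch * in_ch * k * k + out_ch


def linear_params(in_features, out_features):
    # full weight matrix plus one bias per output feature
    return in_features * out_features + out_features


counts = {
    "Conv2d(1, 6, 5)": conv_params(1, 6, 5),
    "Conv2d(6, 16, 5)": conv_params(6, 16, 5),
    "Linear(400, 120)": linear_params(16 * 5 * 5, 120),
    "Linear(120, 84)": linear_params(120, 84),
    "Linear(84, 10)": linear_params(84, 10),
}
for name, c in counts.items():
    print(f"{name:18s}{c:>7d}")

total = sum(counts.values())
print("total", total)  # 61706
```

With PyTorch, `sum(p.numel() for p in net.parameters())` on the network above should report the same total, since every Conv2d and Linear layer there keeps its default bias. Most of the parameters sit in the first fully connected layer.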
With the network structure in place, we load the data. We use the Fashion-MNIST dataset to train and test the network. The data-loading code is as follows:
```python
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size=batch_size)
```
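As a quick sanity check on what `batch_size = 256` means for Fashion-MNIST (60,000 training and 10,000 test images), the arithmetic below counts the mini-batches per epoch. It assumes, as I understand d2l's implementation, that `load_data_fashion_mnist` wraps the datasets in a PyTorch `DataLoader` with the default `drop_last=False`, so the final partial batch is kept.

```python
import math

batch_size = 256
num_train, num_test = 60_000, 10_000  # Fashion-MNIST training / test set sizes

# With drop_last=False (assumed), the last, smaller batch is still yielded
train_batches = math.ceil(num_train / batch_size)
test_batches = math.ceil(num_test / batch_size)
last_train_batch = num_train - (train_batches - 1) * batch_size

print(train_batches, last_train_batch, test_batches)  # 235 96 40

# The training function redraws its plot every num_batches // 5 batches
print(train_batches // 5)  # 47
```

Under this assumption, `len(train_iter)` is 235, and the last training batch holds only 96 images.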
The following function calculates the accuracy of the model on a dataset using the GPU:
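```python
def evaluate_accuracy_gpu(net, data_iter, device=None):
    """Compute the model's accuracy on a dataset using a GPU."""
    if isinstance(net, torch.nn.Module):
        net.eval()  # switch to evaluation mode
        if not device:
            # default to the device that holds the model's parameters
            device = next(iter(net.parameters())).device
    metric = d2l.Accumulator(2)  # number of correct predictions, number of predictions
    with torch.no_grad():  # no gradients are needed for evaluation
        for X, y in data_iter:
            if isinstance(X, list):
                X = [x.to(device) for x in X]
            else:
                X = X.to(device)
            y = y.to(device)
            metric.add(d2l.accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]
```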
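The function above relies on two d2l helpers: an accumulator of running sums and `d2l.accuracy`, which (as I understand d2l's code) returns the number of correct arg-max predictions in a batch rather than a ratio. The pure-Python stand-ins below are hypothetical re-implementations for illustration, run on made-up scores, to show why the metric sums correct predictions and example counts over all batches before dividing.

```python
class Accumulator:
    """Keeps running sums of n quantities (a simplified stand-in for d2l.Accumulator)."""

    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def __getitem__(self, idx):
        return self.data[idx]


def num_correct(scores, labels):
    """Count rows whose arg-max class equals the label (the role d2l.accuracy plays)."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in scores]
    return sum(p == y for p, y in zip(preds, labels))


# Two hypothetical mini-batches of 3-class scores with their true labels
batches = [
    ([[2.0, 0.1, 0.3], [0.2, 1.5, 0.1]], [0, 2]),
    ([[0.1, 0.2, 3.0], [1.0, 0.5, 0.2], [0.3, 2.2, 0.1]], [2, 0, 1]),
]

metric = Accumulator(2)  # [correct predictions, total examples]
for scores, labels in batches:
    metric.add(num_correct(scores, labels), len(labels))

acc = metric[0] / metric[1]
print(acc)  # 0.8
```

Averaging the two per-batch accuracies (0.5 and 1.0) would give 0.75 instead of the true 0.8; summing counts keeps the result exact even when the last batch is smaller than the others.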
The complete code, including the training function, is as follows:
```python
import torch
from torch import nn
from d2l import torch as d2l


class Reshape(torch.nn.Module):
    def forward(self, x):
        return x.view(-1, 1, 28, 28)


def evaluate_accuracy_gpu(net, data_iter, device=None):
    """Compute the model's accuracy on a dataset using a GPU."""
    if isinstance(net, torch.nn.Module):
        net.eval()
        if not device:
            device = next(iter(net.parameters())).device
    metric = d2l.Accumulator(2)  # correct predictions, total predictions
    with torch.no_grad():
        for X, y in data_iter:
            if isinstance(X, list):
                X = [x.to(device) for x in X]
            else:
                X = X.to(device)
            y = y.to(device)
            metric.add(d2l.accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]


def train_ch6(net, train_iter, test_iter, num_epochs, lr, device):  # lr: learning rate
    """Train a model with a GPU."""
    def init_weights(m):
        # Xavier-uniform initialization of the weights of every linear and conv layer
        if isinstance(m, (nn.Linear, nn.Conv2d)):
            nn.init.xavier_uniform_(m.weight)
    net.apply(init_weights)
    print('training on', device)
    net.to(device)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    loss = nn.CrossEntropyLoss()
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs],
                            legend=['train loss', 'train acc', 'test acc'])
    timer, num_batches = d2l.Timer(), len(train_iter)
    for epoch in range(num_epochs):
        # sum of training loss, number of correct predictions, number of examples
        metric = d2l.Accumulator(3)
        net.train()
        for i, (X, y) in enumerate(train_iter):
            timer.start()
            optimizer.zero_grad()
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            l.backward()
            optimizer.step()
            with torch.no_grad():
                metric.add(l * X.shape[0], d2l.accuracy(y_hat, y), X.shape[0])
            timer.stop()
            train_l = metric[0] / metric[2]
            train_acc = metric[1] / metric[2]
            # update the plot five times per epoch and at the last batch
            if (i + 1) % (num_batches // 5) == 0 or i == num_batches - 1:
                animator.add(epoch + (i + 1) / num_batches,
                             (train_l, train_acc, None))
        test_acc = evaluate_accuracy_gpu(net, test_iter)
        animator.add(epoch + 1, (None, None, test_acc))
        print('Epoch:', epoch)
        print(f'loss {train_l:.3f}, train acc {train_acc:.3f}, '
              f'test acc {test_acc:.3f}')
        # timer.sum() is cumulative, so divide the examples of all epochs so far by it
        print(f'{metric[2] * (epoch + 1) / timer.sum():.1f} examples/sec '
              f'on {str(device)}')
    # final summary after the last epoch
    print(f'loss {train_l:.3f}, train acc {train_acc:.3f}, '
          f'test acc {test_acc:.3f}')
    print(f'{metric[2] * num_epochs / timer.sum():.1f} examples/sec '
          f'on {str(device)}')
    print('finished')


def main():
    net = torch.nn.Sequential(Reshape(),
                              nn.Conv2d(1, 6, kernel_size=5, padding=2),
                              nn.Sigmoid(),
                              nn.AvgPool2d(kernel_size=2, stride=2),
                              nn.Conv2d(6, 16, kernel_size=5),
                              nn.Sigmoid(),
                              nn.AvgPool2d(kernel_size=2, stride=2),
                              nn.Flatten(),
                              nn.Linear(16 * 5 * 5, 120),
                              nn.Sigmoid(),
                              nn.Linear(120, 84),
                              nn.Sigmoid(),
                              nn.Linear(84, 10))
    batch_size = 256
    train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size=batch_size)
    lr, num_epochs = 0.9, 10
    train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())


if __name__ == '__main__':
    main()
```
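`init_weights` calls `nn.init.xavier_uniform_`, which (with PyTorch's default gain of 1) samples each weight from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)). For a Conv2d weight, fan_in is in_channels * kernel_h * kernel_w and fan_out is out_channels * kernel_h * kernel_w; for a Linear weight they are simply in_features and out_features. The sketch below only evaluates that formula for the five LeNet layers in plain Python; the helper name `xavier_uniform_bound` is hypothetical.

```python
import math


def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    # Glorot/Xavier uniform: weights ~ U(-a, a) with a = gain * sqrt(6 / (fan_in + fan_out))
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


# (layer, fan_in, fan_out); for Conv2d each fan is channels * kernel_h * kernel_w
layers = [
    ("Conv2d(1, 6, 5)", 1 * 5 * 5, 6 * 5 * 5),
    ("Conv2d(6, 16, 5)", 6 * 5 * 5, 16 * 5 * 5),
    ("Linear(400, 120)", 400, 120),
    ("Linear(120, 84)", 120, 84),
    ("Linear(84, 10)", 84, 10),
]

bounds = {name: xavier_uniform_bound(fi, fo) for name, fi, fo in layers}
for name, a in bounds.items():
    print(f"{name:18s} +/- {a:.4f}")
```

Layers with more connections get a narrower range, which keeps the variance of activations and gradients roughly balanced from layer to layer. The biases are left at PyTorch's default initialization, because `init_weights` only touches `m.weight`.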
3. Experimental results
Printed output:
```
training on cuda:0
Epoch: 0
loss 2.317, train acc 0.102,test acc 0.100
174566.9 examples/secon cuda:0
Epoch: 1
loss 1.383, train acc 0.459,test acc 0.580
139471.3 examples/secon cuda:0
Epoch: 2
loss 0.857, train acc 0.661,test acc 0.652
115809.0 examples/secon cuda:0
Epoch: 3
loss 0.718, train acc 0.716,test acc 0.701
99568.9 examples/secon cuda:0
Epoch: 4
loss 0.648, train acc 0.748,test acc 0.752
87336.1 examples/secon cuda:0
Epoch: 5
loss 0.590, train acc 0.770,test acc 0.776
77399.1 examples/secon cuda:0
Epoch: 6
loss 0.550, train acc 0.787,test acc 0.781
69605.1 examples/secon cuda:0
Epoch: 7
loss 0.515, train acc 0.800,test acc 0.793
63230.5 examples/secon cuda:0
Epoch: 8
loss 0.485, train acc 0.816,test acc 0.799
57836.1 examples/secon cuda:0
Epoch: 9
loss 0.459, train acc 0.829,test acc 0.761
53456.0 examples/secon cuda:0
loss 0.459, train acc 0.829,test acc 0.761
53456.0 examples/secon cuda:0
```
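Dynamic graph: [animated training curves (train loss, train acc and test acc per epoch) drawn by d2l.Animator]
Note: If the animation cannot be displayed, please refer to the blog post "What should I do if the animation cannot be displayed?".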
Reference
[1] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.