1. Introduction to the data set
SVHN stands for the Street View House Numbers data set. It is one of the many digit recognition data sets created in the early days of deep learning, and among them it stands out as being based on real photographs. In style it resembles the MNIST data set: each image in the recognition set is a cropped digit, and the task is a ten-class classification over the digits 0 to 9. The full data set, however, supports three tasks (recognition, detection, and unsupervised learning), so SVHN also has three different benchmarks. Because the original images all come from house numbers in Google Street View, their pixel information carries the complexity of natural scenes, which makes digit recognition harder and places higher demands on the recognition model. In academia, once 99% accuracy on the MNIST and Fashion-MNIST data sets has become routine, SVHN is often used to verify how well a network architecture performs on real photos. At the same time, although it consists of real photographs, the images in the SVHN recognition set are very small (32x32 pixels, 3 channels) and the sample size is only around 100,000. Iteration is feasible even on a CPU, which makes SVHN a very suitable data set for running through the complete workflow.
2. Early stopping
The optimization algorithm aims to find the global minimum of the loss function. Ideally, when the algorithm finds the global optimum, the neural network "converges" and iteration stops. Unfortunately, we do not know what the true global minimum is, so we cannot tell whether the algorithm has actually found it. A second common situation is that the minimum the algorithm can actually reach is, say, 0.5, and the optimizer narrows the loss to the range (0.49999, 0.500001) within a short time but, because of hyperparameter settings such as the learning rate, never lands exactly on 0.5. In both cases the optimization algorithm keeps iterating to no effect, so we need to stop the neural network manually. We stop iteration in only two situations:
1. The neural network has reached a good enough result (very close to convergence), and further iteration will not help: for example, the model would start to overfit, or it would stagnate.
2. Training has taken too long, even though we know the optimal result has not been found.
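The situation described above, where the optimizer hovers around a minimum it cannot reach, can be illustrated with a toy example. The sketch below is not from the original text: it runs plain gradient descent with a fixed learning rate on a made-up one-dimensional loss f(w) = |w| + 0.5, whose minimum value is 0.5. The function name `hover_demo` and all numbers are illustrative assumptions.

```python
def hover_demo(w=0.25, lr=0.1, steps=6):
    """Fixed-step gradient descent on f(w) = |w| + 0.5 (minimum value 0.5 at w = 0)."""
    losses = []
    for _ in range(steps):
        grad = 1.0 if w > 0 else -1.0   # derivative of |w| away from 0
        w = w - lr * grad               # constant step size: it cannot shrink near the minimum
        losses.append(abs(w) + 0.5)
    return losses

losses = hover_demo()
print([round(l, 4) for l in losses])
```

With these made-up numbers the loss drops to about 0.55 and then stays there, because every step jumps over the minimum by the same amount. More iterations do not help; only a smaller or decaying learning rate would.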
So how do we find the point at which the test loss no longer decreases and the accuracy no longer increases? We can specify a threshold: when the decrease in the loss function over several consecutive iterations stays below a threshold tol, or the improvement in the test score stays below such a threshold, we stop iterating. Even if the number of epochs we specified has not been used up, we can consider the network to be very close to "convergence" and stop it. This kind of stopping is called "early stopping" in machine learning. Learning rate decay is sometimes combined with early stopping: in some networks we stipulate that the learning rate is decayed when the decrease in the loss over consecutive iterations falls below tol. Of course, if the optimization algorithm already has a built-in learning rate decay mechanism, we do not need to consider this.
In the actual implementation, we set the number of consecutive iterations to 5 (this value can be treated as a hyperparameter if you like). The decrease in the loss is not measured between the current iteration and the previous one; instead, the loss of the current iteration is compared with the lowest loss over all previous iterations. Only if (historical lowest loss - current loss) > tol do we accept that the loss function has decreased. This setting is not very friendly to unstable architectures; if the model turns out to be unstable, we can set a smaller threshold. Based on this idea, let's look at the specific code:
class EarlyStopping():
    def __init__(self,patience=5,tol=0.0005):
        # Define all the attributes we need up front, as usual
        self.patience = patience # number of consecutive non-improving iterations tolerated
        self.tol = tol # minimum decrease that counts as an improvement
        self.counter = 0
        self.lowest_loss = None
        self.early_stop = False

    def __call__(self,val_loss):
        # Compare this iteration's loss with the lowest loss seen so far
        if self.lowest_loss is None:
            self.lowest_loss = val_loss
        elif self.lowest_loss - val_loss > self.tol:
            self.lowest_loss = val_loss
            self.counter = 0
        else:
            # No decrease larger than tol: count this iteration as non-improving
            self.counter += 1
            print("\t NOTICE: Early stopping counter {} of {}".format(self.counter,self.patience))
            if self.counter >= self.patience:
                print('\t NOTICE: Early stopping Activated')
                self.early_stop = True
        return self.early_stop
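To see how this rule behaves, it can be replayed on a made-up sequence of losses. The sketch below is a stateless restatement of the same rule, written only for illustration; the function name `stopping_epoch` and the loss values are assumptions, not part of the original code.

```python
def stopping_epoch(losses, patience=5, tol=0.0005):
    """Return the 1-based epoch at which early stopping would trigger, or None."""
    lowest = None
    counter = 0
    for epoch, loss in enumerate(losses, start=1):
        if lowest is None:
            lowest = loss                 # first epoch: just record the loss
        elif lowest - loss > tol:
            lowest = loss                 # real improvement over the historical minimum
            counter = 0
        else:
            counter += 1                  # no improvement larger than tol
            if counter >= patience:
                return epoch
    return None

# The loss improves for three epochs, then plateaus around 0.70
simulated = [1.00, 0.80, 0.70, 0.7004, 0.6998, 0.71, 0.6999, 0.70, 0.69]
print(stopping_epoch(simulated))
```

On this sequence the rule triggers at epoch 8: epochs 4 to 8 are five consecutive iterations without a decrease larger than tol relative to the lowest loss of 0.70, so the final value 0.69 is never reached.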
3. Training process
3.1. Preparation
Import the required packages and functions
import os
import torch
os.environ['KMP_DUPLICATE_LIB_OK']='True' # keeps the Jupyter environment from shutting down unexpectedly
torch.backends.cudnn.benchmark=True # speeds up GPU code
import torchvision
from torch import nn,optim
from torch.nn import functional as F
from torchvision import transforms as T
from torchvision import models as M
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
from time import time
import datetime
import random # to control randomness
import numpy as np
import pandas as pd
import gc # garbage collection
# Set the global random seeds
torch.manual_seed(1412)
random.seed(1412)
np.random.seed(1412)
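Why fix the seeds? A seeded generator replays exactly the same sequence of "random" numbers, which makes separate runs comparable. The minimal sketch below shows this with Python's standard `random` module only; `torch.manual_seed` and `np.random.seed` play the same role for their own generators. The helper name `draw` is an illustrative assumption.

```python
import random

def draw(seed, n=3):
    """Seed the generator, then draw n integers in [0, 9]."""
    random.seed(seed)
    return [random.randint(0, 9) for _ in range(n)]

first = draw(1412)
second = draw(1412)
print(first == second)   # same seed, same sequence
```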
3.1.1. Device setup
Configure device
torch.cuda.is_available()
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
3.1.2. Load data set
3.1.2.1. Load the data set and inspect its characteristics
train = torchvision.datasets.SVHN(root='SVHN',split='train',download=True)
test = torchvision.datasets.SVHN(root='SVHN',split='test',download=True)
View information about the dataset
3.1.2.2. Load the data set into tensor format
Check the size of the image and the number of channels in the data set
Write a function to visualize the images.
# Randomly display five images from a data set
import matplotlib.pyplot as plt
import numpy as np
import random
def plotsample(data): # only accepts data in tensor format
    fig,axs = plt.subplots(1,5,figsize=(10,10)) # create the subplots
    for i in range(5):
        num = random.randint(0,len(data)-1)
        npimg = torchvision.utils.make_grid(data[num][0]).numpy()
        nplabel = data[num][1] # extract the label
        axs[i].imshow(np.transpose(npimg,(1,2,0))) # (C,H,W) -> (H,W,C) for matplotlib
        axs[i].set_title(nplabel)
        axs[i].axis("off") # hide the axes of each subplot
Show results
3.1.2.3. Data augmentation
# The mean/std values are the commonly used ImageNet channel statistics
trainT = T.Compose([T.RandomCrop(28),T.RandomRotation(degrees=[-30,30]),T.ToTensor(),T.Normalize(mean=[0.485,0.456,0.406],std=[0.229,0.224,0.225])])
# Crop the test images deterministically so that evaluation does not vary between runs
testT = T.Compose([T.CenterCrop(28),T.ToTensor(),T.Normalize(mean=[0.485,0.456,0.406],std=[0.229,0.224,0.225])])
train = torchvision.datasets.SVHN(root='SVHN',split='train',download=True,transform=trainT)
test = torchvision.datasets.SVHN(root='SVHN',split='test',download=True,transform=testT)
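As a reminder of what `T.Normalize` does: after `T.ToTensor` scales pixels to [0, 1], each channel is shifted by its mean and divided by its standard deviation. The following sketch reproduces that arithmetic in plain Python for the red channel only; the function names `normalize` and `denormalize` and the sample pixel values are illustrative assumptions.

```python
def normalize(value, mean, std):
    """Per-channel operation applied by T.Normalize: (value - mean) / std."""
    return (value - mean) / std

def denormalize(value, mean, std):
    """Inverse operation, useful before displaying a normalized image."""
    return value * std + mean

# Red-channel statistics taken from the transform above
mean_r, std_r = 0.485, 0.229

for pixel in (0.0, 0.485, 1.0):          # black, mean-valued, and white pixels in [0, 1]
    z = normalize(pixel, mean_r, std_r)
    print(pixel, round(z, 3), round(denormalize(z, mean_r, std_r), 3))
```

Normalized values fall outside [0, 1] (roughly -2.1 to 2.2 for this channel), so a normalized image usually looks distorted when plotted directly; applying the inverse first restores the original colors.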
Visualize augmented data
3.2. Build a network
Load the classic networks
torch.manual_seed(1412)
resnet18_ = M.resnet18()
vgg16_ = M.vgg16()
Define the custom MyResNet network
class MyResNet(nn.Module):
    def __init__(self):
        super().__init__()
        # 3x3 stem convolution suited to small inputs; reuses the BatchNorm and ReLU layers of resnet18_
        self.block1 = nn.Sequential(nn.Conv2d(3,64,kernel_size=3,stride=1,padding=1,bias=False)
                                    ,resnet18_.bn1,resnet18_.relu)
        self.block2 = resnet18_.layer2 # 64 -> 128 channels, halves the feature map
        self.block3 = resnet18_.layer3 # 128 -> 256 channels, halves the feature map
        self.avgpool = resnet18_.avgpool
        self.fc = nn.Linear(in_features=256,out_features=10,bias=True)

    def forward(self,x):
        x = self.block1(x)
        x = self.block3(self.block2(x))
        x = self.avgpool(x)
        x = x.view(x.shape[0],256) # flatten to (batch, 256)
        x = self.fc(x)
        return x
Define the custom MyVgg network
class MyVgg(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(*vgg16_.features[0:9] # the asterisk unpacks the layers
                                      ,nn.Conv2d(128,128,kernel_size=3,stride=1,padding=1)
                                      ,nn.ReLU(inplace=True)
                                      ,nn.MaxPool2d(2,2,padding=0,dilation=1,ceil_mode=False))
        self.avgpool = vgg16_.avgpool
        self.fc = nn.Sequential(nn.Linear(7*7*128,out_features=4096,bias=True),
                                *vgg16_.classifier[1:6],nn.Linear(in_features=4096,out_features=10,bias=True))

    def forward(self,x):
        x = self.features(x)
        x = self.avgpool(x)
        x = x.view(x.shape[0],7*7*128) # flatten to (batch, 6272)
        x = self.fc(x)
        return x
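The hard-coded sizes in the two `forward` methods (`256` and `7*7*128`) follow from the standard output-size formula for convolution and pooling layers, floor((n + 2p - k) / s) + 1. The sketch below traces a 28x28 input through the size-changing layers of both networks; the helper `out_size` is an illustrative assumption, and the layer hyperparameters are those of torchvision's ResNet-18 and VGG-16 definitions.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

# MyResNet: 3x3 stem conv, then the stride-2 convs that open layer2 and layer3
n = out_size(28, kernel=3, stride=1, padding=1)      # block1
resnet_sizes = [n]
for _ in range(2):                                   # block2 (layer2), block3 (layer3)
    n = out_size(n, kernel=3, stride=2, padding=1)
    resnet_sizes.append(n)

# MyVgg: 3x3 convs with padding 1 keep the size; each 2x2 max-pool halves it
m = 28
vgg_sizes = []
for _ in range(2):
    m = out_size(m, kernel=2, stride=2)
    vgg_sizes.append(m)

print(resnet_sizes, vgg_sizes, 7 * 7 * 128)
```

MyResNet ends with a 256-channel 7x7 map that adaptive average pooling reduces to 1x1, hence 256 features. MyVgg ends with a 128-channel 7x7 map that is flattened directly, hence 7*7*128 = 6272 features.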
Network verification
from torchinfo import summary
summary(MyResNet(),(10,3,28,28),depth=3)
# Printed output:
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
MyResNet                                 [10, 10]                  --
├─Sequential: 1-1                        [10, 64, 28, 28]          --
│    └─Conv2d: 2-1                       [10, 64, 28, 28]          1,728
│    └─BatchNorm2d: 2-2                  [10, 64, 28, 28]          128
│    └─ReLU: 2-3                         [10, 64, 28, 28]          --
├─Sequential: 1-2                        [10, 128, 14, 14]         --
│    └─BasicBlock: 2-4                   [10, 128, 14, 14]         --
│    │    └─Conv2d: 3-1                  [10, 128, 14, 14]         73,728
│    │    └─BatchNorm2d: 3-2             [10, 128, 14, 14]         256
│    │    └─ReLU: 3-3                    [10, 128, 14, 14]         --
│    │    └─Conv2d: 3-4                  [10, 128, 14, 14]         147,456
│    │    └─BatchNorm2d: 3-5             [10, 128, 14, 14]         256
│    │    └─Sequential: 3-6              [10, 128, 14, 14]         8,448
│    │    └─ReLU: 3-7                    [10, 128, 14, 14]         --
│    └─BasicBlock: 2-5                   [10, 128, 14, 14]         --
│    │    └─Conv2d: 3-8                  [10, 128, 14, 14]         147,456
│    │    └─BatchNorm2d: 3-9             [10, 128, 14, 14]         256
│    │    └─ReLU: 3-10                   [10, 128, 14, 14]         --
│    │    └─Conv2d: 3-11                 [10, 128, 14, 14]         147,456
│    │    └─BatchNorm2d: 3-12            [10, 128, 14, 14]         256
│    │    └─ReLU: 3-13                   [10, 128, 14, 14]         --
├─Sequential: 1-3                        [10, 256, 7, 7]           --
│    └─BasicBlock: 2-6                   [10, 256, 7, 7]           --
│    │    └─Conv2d: 3-14                 [10, 256, 7, 7]           294,912
│    │    └─BatchNorm2d: 3-15            [10, 256, 7, 7]           512
│    │    └─ReLU: 3-16                   [10, 256, 7, 7]           --
│    │    └─Conv2d: 3-17                 [10, 256, 7, 7]           589,824
│    │    └─BatchNorm2d: 3-18            [10, 256, 7, 7]           512
│    │    └─Sequential: 3-19             [10, 256, 7, 7]           33,280
│    │    └─ReLU: 3-20                   [10, 256, 7, 7]           --
│    └─BasicBlock: 2-7                   [10, 256, 7, 7]           --
│    │    └─Conv2d: 3-21                 [10, 256, 7, 7]           589,824
│    │    └─BatchNorm2d: 3-22            [10, 256, 7, 7]           512
│    │    └─ReLU: 3-23                   [10, 256, 7, 7]           --
│    │    └─Conv2d: 3-24                 [10, 256, 7, 7]           589,824
│    │    └─BatchNorm2d: 3-25            [10, 256, 7, 7]           512
│    │    └─ReLU: 3-26                   [10, 256, 7, 7]           --
├─AdaptiveAvgPool2d: 1-4                 [10, 256, 1, 1]           --
├─Linear: 1-5                            [10, 10]                  2,570
==========================================================================================
Total params: 2,629,706
Trainable params: 2,629,706
Non-trainable params: 0
Total mult-adds (G): 2.07
==========================================================================================
Input size (MB): 0.09
Forward/backward pass size (MB): 38.13
Params size (MB): 10.52
Estimated Total Size (MB): 48.75
==========================================================================================
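The parameter counts in the MyResNet summary can be checked by hand. A convolution without bias has in_channels x out_channels x k x k weights, a BatchNorm2d layer has two parameters per channel, and a linear layer has in x out weights plus out biases. The sketch below recomputes a few rows and the total; the helper names are illustrative assumptions.

```python
def conv_params(c_in, c_out, k, bias=False):
    """Weights of a k x k convolution, plus one bias per output channel if enabled."""
    return c_in * c_out * k * k + (c_out if bias else 0)

def bn_params(channels):
    """One scale and one shift parameter per channel."""
    return 2 * channels

def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def resnet_stage(c_in, c_out):
    """Two BasicBlocks; the first also has a 1x1 conv + BN shortcut to change the channel count."""
    first = (conv_params(c_in, c_out, 3) + bn_params(c_out)
             + conv_params(c_out, c_out, 3) + bn_params(c_out)
             + conv_params(c_in, c_out, 1) + bn_params(c_out))
    second = 2 * (conv_params(c_out, c_out, 3) + bn_params(c_out))
    return first + second

stem = conv_params(3, 64, 3) + bn_params(64)
total = stem + resnet_stage(64, 128) + resnet_stage(128, 256) + linear_params(256, 10)
print(conv_params(3, 64, 3), linear_params(256, 10), total)
```

The three printed values match the 1,728, 2,570 and 2,629,706 reported by the summary. The two `Sequential` rows inside the first block of each stage (8,448 and 33,280) are the 1x1 shortcut convolutions together with their BatchNorm layers.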
summary(MyVgg(),(10,3,28,28),depth=4)
# Printed output:
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
MyVgg                                    [10, 10]                  --
├─Sequential: 1-1                        [10, 128, 7, 7]           --
│    └─Conv2d: 2-1                       [10, 64, 28, 28]          1,792
│    └─ReLU: 2-2                         [10, 64, 28, 28]          --
│    └─Conv2d: 2-3                       [10, 64, 28, 28]          36,928
│    └─ReLU: 2-4                         [10, 64, 28, 28]          --
│    └─MaxPool2d: 2-5                    [10, 64, 14, 14]          --
│    └─Conv2d: 2-6                       [10, 128, 14, 14]         73,856
│    └─ReLU: 2-7                         [10, 128, 14, 14]         --
│    └─Conv2d: 2-8                       [10, 128, 14, 14]         147,584
│    └─ReLU: 2-9                         [10, 128, 14, 14]         --
│    └─Conv2d: 2-10                      [10, 128, 14, 14]         147,584
│    └─ReLU: 2-11                        [10, 128, 14, 14]         --
│    └─MaxPool2d: 2-12                   [10, 128, 7, 7]           --
├─AdaptiveAvgPool2d: 1-2                 [10, 128, 7, 7]           --
├─Sequential: 1-3                        [10, 10]                  --
│    └─Linear: 2-13                      [10, 4096]                25,694,208
│    └─ReLU: 2-14                        [10, 4096]                --
│    └─Dropout: 2-15                     [10, 4096]                --
│    └─Linear: 2-16                      [10, 4096]                16,781,312
│    └─ReLU: 2-17                        [10, 4096]                --
│    └─Dropout: 2-18                     [10, 4096]                --
│    └─Linear: 2-19                      [10, 10]                  40,970
==========================================================================================
Total params: 42,924,234
Trainable params: 42,924,234
Non-trainable params: 0
Total mult-adds (G): 1.45
==========================================================================================
Input size (MB): 0.09
Forward/backward pass size (MB): 14.71
Params size (MB): 171.70
Estimated Total Size (MB): 186.50
==========================================================================================
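The MyVgg summary shows where its parameters sit: almost all of them are in the fully connected layers, which is why the model is so much larger than MyResNet even though it has fewer convolutions. A short arithmetic check follows, with illustrative helper names (VGG convolutions and linear layers both include a bias term):

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k x k convolution."""
    return c_in * c_out * k * k + c_out

def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

conv_total = (conv_params(3, 64) + conv_params(64, 64)
              + conv_params(64, 128) + 2 * conv_params(128, 128))
fc_total = (linear_params(7 * 7 * 128, 4096)
            + linear_params(4096, 4096)
            + linear_params(4096, 10))
total = conv_total + fc_total
print(conv_total, fc_total, total)
```

The total reproduces the 42,924,234 parameters in the summary, and the fully connected layers account for about 99% of them.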
3.3. Define training function
def fit_test(net,batchdata,testdata,criterion,opt,epochs,tol,modelname,PATH):
    """
    Train the model and, after each epoch, print the accuracy and loss
    on the training set and the test set.
    """
    SamplePerEpoch = len(batchdata.dataset)
    allsamples = SamplePerEpoch * epochs
    trainedsample = 0
    trainlosslist = []
    testlosslist = []
    early_stopping = EarlyStopping(tol=tol)
    highestacc = None

    for epoch in range(1,epochs+1):
        # Train for one epoch
        net.train()
        correct_train = 0
        loss_train = 0
        for batch_idx,(x,y) in enumerate(batchdata):
            x = x.to(device,non_blocking=True)
            y = y.to(device,non_blocking=True).view(x.shape[0])
            sigma = net(x)
            loss = criterion(sigma,y)
            loss.backward()
            opt.step()
            opt.zero_grad()
            yhat = torch.max(sigma,1)[1] # predicted labels
            correct = torch.sum(yhat==y) # number of correctly predicted samples
            trainedsample += x.shape[0]
            loss_train += loss.item() # accumulate a plain float so no graph is kept alive
            correct_train += correct.item()
            if (batch_idx+1) % 125 == 0:
                print("Epoch{}:[{}/{} ({:.0f}%)]".format(epoch,trainedsample,allsamples,100*trainedsample/allsamples))
        TrainAccThisEpoch = float(correct_train*100)/SamplePerEpoch
        TrainLossThisEpoch = float(loss_train*100)/SamplePerEpoch
        trainlosslist.append(TrainLossThisEpoch)
        # Free GPU memory by dropping intermediate variables that are no longer needed
        del x,y,correct
        gc.collect() # clear caches associated with the deleted data and variables
        torch.cuda.empty_cache()

        # Evaluate once on the test set
        net.eval()
        loss_test = 0
        correct_test = 0
        TestSample = len(testdata.dataset)
        for x,y in testdata:
            with torch.no_grad():
                x = x.to(device,non_blocking=True)
                y = y.to(device,non_blocking=True).view(x.shape[0])
                sigma = net(x)
                loss = criterion(sigma,y)
                yhat = torch.max(sigma,1)[1]
                correct = torch.sum(yhat==y)
                loss_test += loss.item()
                correct_test += correct.item()
        TestAccThisEpoch = float(correct_test*100)/TestSample
        TestLossThisEpoch = float(loss_test*100)/TestSample
        testlosslist.append(TestLossThisEpoch)
        print("\t Train loss:{:.6f},Test loss:{:.6f},Train acc:{:.3f}%,test acc:{:.3f}%".format(TrainLossThisEpoch
                                                                                              ,TestLossThisEpoch
                                                                                              ,TrainAccThisEpoch
                                                                                              ,TestAccThisEpoch))
        del x,y,correct
        gc.collect() # clear caches associated with the deleted data and variables
        torch.cuda.empty_cache()

        # Save the weights whenever the test accuracy reaches a new high
        if highestacc is None:
            highestacc = TestAccThisEpoch
        if highestacc < TestAccThisEpoch:
            highestacc = TestAccThisEpoch
            torch.save(net.state_dict(),os.path.join(PATH,modelname+".pt"))
            print("\t Weight Saved")

        # Early stopping: the call returns a bool, so test it directly
        early_stop = early_stopping(TestLossThisEpoch)
        if early_stop:
            break
    print("Done")
    return trainlosslist,testlosslist
Define the plotting function
def plotloss(trainloss, testloss):
    plt.figure(figsize=(10,7))
    plt.plot(trainloss,color="red",label="Trainloss")
    plt.plot(testloss,color="orange",label="Testloss")
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.show()
3.4. Define the overall process function
def full_procedure(net,epochs,bs,modelname,PATH,lr = 0.001,alpha = 0.99,gamma = 0,wd = 0,tol=10**(-5)):
    # Fix the seeds so that shuffling and initialization are reproducible
    torch.manual_seed(1412)
    torch.cuda.manual_seed(1412)
    torch.cuda.manual_seed_all(1412)
    # Split the data into batches
    batchdata = DataLoader(train,batch_size=bs,shuffle=True,drop_last=False,pin_memory=True)
    testdata = DataLoader(test,batch_size=bs,shuffle=False,drop_last=False,pin_memory=True)
    # Loss function and optimizer (gamma is passed to RMSprop as its momentum)
    criterion = nn.CrossEntropyLoss(reduction="sum")
    opt = optim.RMSprop(net.parameters(),lr=lr,alpha=alpha,momentum=gamma,weight_decay=wd)
    # Train and evaluate
    trainloss,testloss = fit_test(net,batchdata,testdata,criterion,opt,epochs,tol,modelname,PATH)
    return trainloss,testloss
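Note how `reduction="sum"` in the criterion interacts with the training function: the per-batch loss sums are accumulated and divided by the number of samples once per epoch, which gives the exact per-sample mean even when the last batch is smaller (`drop_last=False`). The sketch below contrasts this with averaging per-batch means, using made-up per-sample losses; all names and values are illustrative assumptions.

```python
def batches(values, batch_size):
    """Split a list into consecutive batches; the last one may be smaller."""
    return [values[i:i + batch_size] for i in range(0, len(values), batch_size)]

per_sample_loss = [0.2, 0.4, 0.6, 0.8, 3.0]     # 5 samples, batch size 2 -> batches of 2, 2, 1
parts = batches(per_sample_loss, 2)

# reduction="sum": add up the batch sums, then divide once by the sample count
mean_from_sums = sum(sum(b) for b in parts) / len(per_sample_loss)

# Averaging per-batch means instead: the single-sample batch is over-weighted
mean_of_means = sum(sum(b) / len(b) for b in parts) / len(parts)

print(round(mean_from_sums, 4), round(mean_of_means, 4))
```

Here the sum-based mean is 1.0 while the mean of batch means is about 1.33. Keep in mind that the training function also multiplies the averaged loss by 100, so `tol` is compared against that scaled value.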
3.5. Start iterative training
PATH = "/kaggle/working/SVHN"
os.makedirs(PATH,exist_ok=True) # make sure the folder for the saved weights exists
avgtime = []
for i in range(1):
    torch.manual_seed(1412)
    torch.cuda.manual_seed(1412)
    torch.cuda.manual_seed_all(1412)
    resnet18_ = M.resnet18() # re-create the backbone so MyResNet starts from fresh weights
    net = MyResNet().to(device,non_blocking=True)
    start = time()
    trainloss,testloss = full_procedure(net,epochs=20,bs=128,modelname="model_seletion_resnet",PATH=PATH)
    print(time()-start) # training time in seconds
    plotloss(trainloss,testloss)