SVHN: Street View house number recognition

1. Introduction to the dataset

SVHN stands for Street View House Numbers. It is one of the many digit-recognition datasets created in the early days of deep learning, and the only one built from real-world photographs. Its format is similar to MNIST: each image is a single cropped digit, and the task is a ten-class classification over the digits 0 to 9. The full dataset, however, supports three tasks (recognition, detection, and unsupervised learning), so SVHN has three different benchmarks. Because the original images come from house numbers captured in Google Street View, the pixel content has the complexity of natural scenes, digit recognition is harder, and the demands on the recognition model are noticeably higher. In academia, when researchers are tired of the 99% accuracy reachable on MNIST and Fashion-MNIST, they often use SVHN to verify how their architecture performs on real photographs. At the same time, although it is a real-photo dataset, the images in the SVHN recognition set are cropped very small (32x32, 3 channels) and the sample size is only around 100,000, so iteration can even be run on a CPU. This makes it a very suitable dataset for walking through the complete training workflow.

2. Early stopping

The optimization algorithm aims to find the global minimum of the loss function. Ideally, the neural network "converges" when the algorithm reaches the global optimum and iteration stops. Unfortunately, we do not know the true global minimum, so we cannot tell whether the algorithm has actually found it. Another common situation is that the best the algorithm can actually reach is a local minimum of, say, 0.5: the optimizer may quickly narrow the loss into the neighborhood of 0.5 (say, between 0.49999 and 0.500001), but because of the learning rate and other hyperparameter settings it never lands exactly on 0.5. In both cases the optimizer keeps iterating (to no effect), so we have to stop the network manually. We stop the iteration in only two situations:
1. The network has already achieved a good enough result (it is very close to convergence), and further iteration will not help; for example, it would start to overfit, or the model would stagnate.
2. The training time is too long, even though we know the network has not yet found the optimal result.

So how do we find the point at which the test-set loss stops decreasing and the accuracy stops increasing? We can specify a threshold: when the decrease in the loss function over consecutive iterations stays below the threshold tol, or the improvement in the test-set score stays below the threshold, we stop iterating. Even if the specified number of epochs has not been used up, we can consider the network close enough to "convergence" and halt it. This kind of stopping is called early stopping in machine learning. Sometimes learning-rate decay is combined with early stopping: in some networks we may stipulate that when the decrease in the loss over consecutive iterations falls below tol, the learning rate is decayed. Of course, if the optimizer we use already has a built-in learning-rate decay mechanism, we do not need to consider this.

When actually implementing early stopping, we require the condition to hold for 5 consecutive epochs (if you like, this value can be exposed as a hyperparameter). Also, the decrease in the loss is not measured between this iteration and the previous one; we compare this iteration's loss against the historical minimum loss. Only if (historical minimum loss - current loss) > tol do we accept that the loss has decreased. This setting is not very friendly to unstable architectures; if the model turns out to be unstable, we can use a smaller threshold. Based on this idea, let's look at the code:

class EarlyStopping():
    def __init__(self, patience=5, tol=0.0005):   # as usual, define every attribute we will need
        self.patience = patience
        self.tol = tol
        self.counter = 0
        self.lowest_loss = None
        self.early_stop = False

    def __call__(self, val_loss):
        # difference between this iteration's loss and the historical minimum loss
        if self.lowest_loss is None:
            self.lowest_loss = val_loss
        elif self.lowest_loss - val_loss > self.tol:
            self.lowest_loss = val_loss
            self.counter = 0
        elif self.lowest_loss - val_loss < self.tol:
            self.counter += 1
            print("\t NOTICE: Early stopping counter {} of {}".format(self.counter, self.patience))
            if self.counter >= self.patience:
                print('\t NOTICE: Early stopping Activated')
                self.early_stop = True
        return self.early_stop
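As a minimal sketch of how this class is wired into an epoch loop (the loss values below are placeholders; the real integration happens in the training function of section 3.3):

# Sketch: stop once the validation loss improves by less than tol for `patience` epochs in a row.
early_stopping = EarlyStopping(patience=5, tol=0.0005)

for epoch in range(50):
    # ... train for one epoch, then compute the test/validation loss ...
    val_loss = 1.0 / (epoch + 1)   # placeholder value, for illustration only
    if early_stopping(val_loss):   # returns True once the counter reaches patience
        print("Stopped at epoch", epoch)
        break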

3. Training process

3.1. Preparation

Import the required packages and functions

import os
import torch
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'  # avoids the Jupyter kernel shutting down unexpectedly on some setups
torch.backends.cudnn.benchmark = True  # lets cuDNN pick faster kernels, speeding up GPU code
import torchvision
from torch import nn, optim
from torch.nn import functional as F
from torchvision import transforms as T
from torchvision import models as M
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
from time import time
import datetime
import random  # to control randomness
import numpy as np
import pandas as pd
import gc  # garbage collection
# set the global random seeds
torch.manual_seed(1412)
random.seed(1412)
np.random.seed(1412)

3.1.1. Device preparation

Configure device

torch.cuda.is_available()
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

3.1.2. Load data set

3.1.2.1. Load the dataset and view its characteristics
train = torchvision.datasets.SVHN(root='SVHN',split='train',download=True)
test = torchvision.datasets.SVHN(root='SVHN',split='test',download=True)

View information about the dataset
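The screenshot of this step is not reproduced here; as a sketch, the basic characteristics can be inspected through the attributes that torchvision.datasets.SVHN exposes:

# Inspect the basic characteristics of the recognition split (sketch).
print(len(train), len(test))      # 73257 training samples, 26032 test samples
print(train.data.shape)           # (73257, 3, 32, 32): N x C x H x W as a numpy array
print(np.unique(train.labels))    # ten classes, digits 0-9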

3.1.2.2. Load the dataset in tensor format

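The code for this step was shown only as a screenshot; presumably it reloads both splits with T.ToTensor() so every sample is returned as a float tensor in [0, 1] (sketch):

# Reload both splits so each image comes back as a C x H x W float tensor.
train = torchvision.datasets.SVHN(root='SVHN', split='train', download=True, transform=T.ToTensor())
test = torchvision.datasets.SVHN(root='SVHN', split='test', download=True, transform=T.ToTensor())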
Check the size of the image and the number of channels in the data set
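Again as a sketch of what the omitted screenshot likely showed:

# Each sample is a (tensor, label) pair; the image tensor should be 3 x 32 x 32.
img, label = train[0]
print(img.shape)   # torch.Size([3, 32, 32]) -> 3 channels, 32 x 32 pixels
print(label)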
Write a program to visualize a few images.

# randomly display five images from a dataset
import matplotlib.pyplot as plt
import numpy as np
import random
def plotsample(data):   # expects samples in tensor format
    fig, axs = plt.subplots(1, 5, figsize=(10, 10))  # create the subplots
    for i in range(5):
        num = random.randint(0, len(data)-1)
        npimg = torchvision.utils.make_grid(data[num][0]).numpy()
        nplabel = data[num][1]  # extract the label
        axs[i].imshow(np.transpose(npimg, (1, 2, 0)))
        axs[i].set_title(nplabel)
        axs[i].axis("off")  # hide the axes of each subplot

Show results
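The call that produced the omitted screenshot is presumably just:

plotsample(train)   # five random 32x32 digits with their labels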

3.1.2.3. Data augmentation
trainT = T.Compose([T.RandomCrop(28),
                    T.RandomRotation(degrees=[-30,30]),
                    T.ToTensor(),
                    T.Normalize(mean=[0.485,0.456,0.406], std=[0.229,0.224,0.225])])  # standard ImageNet statistics
testT = T.Compose([T.RandomCrop(28),
                   T.ToTensor(),
                   T.Normalize(mean=[0.485,0.456,0.406], std=[0.229,0.224,0.225])])
train = torchvision.datasets.SVHN(root='SVHN', split='train', download=True, transform=trainT)
test = torchvision.datasets.SVHN(root='SVHN', split='test', download=True, transform=testT)

Visualize augmented data
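Presumably the same helper is reused on the augmented dataset:

plotsample(train)   # now shows randomly cropped, rotated and normalized 28x28 digits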

3.2. Build a network

Load the classic networks

torch.manual_seed(1412)
resnet18_ =M.resnet18()
vgg16_ =M.vgg16()
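These two calls build randomly initialized copies of the architectures; only some of their layers are reused below. On torchvision 0.13 and later, the same thing can be written with an explicit weights argument (a small sketch, not in the original code):

# Equivalent on newer torchvision: weights=None makes "no pretrained weights" explicit.
resnet18_ = M.resnet18(weights=None)
vgg16_ = M.vgg16(weights=None)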

Define the custom MyResNet network

class MyResNet(nn.Module):
    def __init__(self):
        super().__init__()
        # replace resnet18's 7x7/stride-2 stem with a 3x3/stride-1 conv so 28x28 inputs are not shrunk too early
        self.block1 = nn.Sequential(nn.Conv2d(3,64,kernel_size=3,stride=1,padding=1,bias=False)
                                    ,resnet18_.bn1,resnet18_.relu)
        # reuse layer2 and layer3 of the torchvision resnet18 instance
        self.block2 = resnet18_.layer2
        self.block3 = resnet18_.layer3
        self.avgpool = resnet18_.avgpool
        self.fc = nn.Linear(in_features=256,out_features=10,bias=True)  # layer3 outputs 256 channels

    def forward(self,x):
        x = self.block1(x)
        x = self.block3(self.block2(x))
        x = self.avgpool(x)
        x = x.view(x.shape[0],256)  # flatten the 256 x 1 x 1 feature map
        x = self.fc(x)
        return x

Define the custom MyVgg network

class MyVgg(nn.Module):
    def __init__(self):
        super().__init__()
        # keep the first 9 layers of vgg16.features (through the second 128-channel conv and its ReLU)
        self.features = nn.Sequential(*vgg16_.features[0:9]  # the asterisk unpacks the Sequential into individual layers
                                    ,nn.Conv2d(128,128,kernel_size=3,stride=1,padding=1)
                                    ,nn.ReLU(inplace=True)
                                    ,nn.MaxPool2d(2,2,padding=0,dilation=1,ceil_mode=False))
        self.avgpool = vgg16_.avgpool  # adaptive pooling to a 7x7 feature map
        self.fc = nn.Sequential(nn.Linear(7*7*128,out_features=4096,bias=True),
                                *vgg16_.classifier[1:6],nn.Linear(in_features=4096,out_features=10,bias=True))

    def forward(self,x):
        x = self.features(x)
        x = self.avgpool(x)
        x = x.view(x.shape[0],7*7*128)  # flatten the 128 x 7 x 7 feature map
        x = self.fc(x)
        return x

Network verification

from torchinfo import summary
summary(MyResNet(),(10,3,28,28),depth=3)
# printed output:
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
MyResNet                                 [10, 10]                  --
├─Sequential: 1-1                        [10, 64, 28, 28]          --
│    └─Conv2d: 2-1                       [10, 64, 28, 28]          1,728
│    └─BatchNorm2d: 2-2                  [10, 64, 28, 28]          128
│    └─ReLU: 2-3                         [10, 64, 28, 28]          --
├─Sequential: 1-2                        [10, 128, 14, 14]         --
│    └─BasicBlock: 2-4                   [10, 128, 14, 14]         --
│    │    └─Conv2d: 3-1                  [10, 128, 14, 14]         73,728
│    │    └─BatchNorm2d: 3-2             [10, 128, 14, 14]         256
│    │    └─ReLU: 3-3                    [10, 128, 14, 14]         --
│    │    └─Conv2d: 3-4                  [10, 128, 14, 14]         147,456
│    │    └─BatchNorm2d: 3-5             [10, 128, 14, 14]         256
│    │    └─Sequential: 3-6              [10, 128, 14, 14]         8,448
│    │    └─ReLU: 3-7                    [10, 128, 14, 14]         --
│    └─BasicBlock: 2-5                   [10, 128, 14, 14]         --
│    │    └─Conv2d: 3-8                  [10, 128, 14, 14]         147,456
│    │    └─BatchNorm2d: 3-9             [10, 128, 14, 14]         256
│    │    └─ReLU: 3-10                   [10, 128, 14, 14]         --
│    │    └─Conv2d: 3-11                 [10, 128, 14, 14]         147,456
│    │    └─BatchNorm2d: 3-12            [10, 128, 14, 14]         256
│    │    └─ReLU: 3-13                   [10, 128, 14, 14]         --
├─Sequential: 1-3                        [10, 256, 7, 7]           --
│    └─BasicBlock: 2-6                   [10, 256, 7, 7]           --
│    │    └─Conv2d: 3-14                 [10, 256, 7, 7]           294,912
│    │    └─BatchNorm2d: 3-15            [10, 256, 7, 7]           512
│    │    └─ReLU: 3-16                   [10, 256, 7, 7]           --
│    │    └─Conv2d: 3-17                 [10, 256, 7, 7]           589,824
│    │    └─BatchNorm2d: 3-18            [10, 256, 7, 7]           512
│    │    └─Sequential: 3-19             [10, 256, 7, 7]           33,280
│    │    └─ReLU: 3-20                   [10, 256, 7, 7]           --
│    └─BasicBlock: 2-7                   [10, 256, 7, 7]           --
│    │    └─Conv2d: 3-21                 [10, 256, 7, 7]           589,824
│    │    └─BatchNorm2d: 3-22            [10, 256, 7, 7]           512
│    │    └─ReLU: 3-23                   [10, 256, 7, 7]           --
│    │    └─Conv2d: 3-24                 [10, 256, 7, 7]           589,824
│    │    └─BatchNorm2d: 3-25            [10, 256, 7, 7]           512
│    │    └─ReLU: 3-26                   [10, 256, 7, 7]           --
├─AdaptiveAvgPool2d: 1-4                 [10, 256, 1, 1]           --
├─Linear: 1-5                            [10, 10]                  2,570
==========================================================================================
Total params: 2,629,706
Trainable params: 2,629,706
Non-trainable params: 0
Total mult-adds (G): 2.07
==========================================================================================
Input size (MB): 0.09
Forward/backward pass size (MB): 38.13
Params size (MB): 10.52
Estimated Total Size (MB): 48.75
==========================================================================================
summary(MyVgg(),(10,3,28,28),depth=4)
# printed output:
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
MyVgg                                    [10, 10]                  --
├─Sequential: 1-1                        [10, 128, 7, 7]           --
│    └─Conv2d: 2-1                       [10, 64, 28, 28]          1,792
│    └─ReLU: 2-2                         [10, 64, 28, 28]          --
│    └─Conv2d: 2-3                       [10, 64, 28, 28]          36,928
│    └─ReLU: 2-4                         [10, 64, 28, 28]          --
│    └─MaxPool2d: 2-5                    [10, 64, 14, 14]          --
│    └─Conv2d: 2-6                       [10, 128, 14, 14]         73,856
│    └─ReLU: 2-7                         [10, 128, 14, 14]         --
│    └─Conv2d: 2-8                       [10, 128, 14, 14]         147,584
│    └─ReLU: 2-9                         [10, 128, 14, 14]         --
│    └─Conv2d: 2-10                      [10, 128, 14, 14]         147,584
│    └─ReLU: 2-11                        [10, 128, 14, 14]         --
│    └─MaxPool2d: 2-12                   [10, 128, 7, 7]           --
├─AdaptiveAvgPool2d: 1-2                 [10, 128, 7, 7]           --
├─Sequential: 1-3                        [10, 10]                  --
│    └─Linear: 2-13                      [10, 4096]                25,694,208
│    └─ReLU: 2-14                        [10, 4096]                --
│    └─Dropout: 2-15                     [10, 4096]                --
│    └─Linear: 2-16                      [10, 4096]                16,781,312
│    └─ReLU: 2-17                        [10, 4096]                --
│    └─Dropout: 2-18                     [10, 4096]                --
│    └─Linear: 2-19                      [10, 10]                  40,970
==========================================================================================
Total params: 42,924,234
Trainable params: 42,924,234
Non-trainable params: 0
Total mult-adds (G): 1.45
==========================================================================================
Input size (MB): 0.09
Forward/backward pass size (MB): 14.71
Params size (MB): 171.70
Estimated Total Size (MB): 186.50
==========================================================================================

3.3. Define training function

def fit_test(net,batchdata,testdata,criterion,opt,epochs,tol,modelname,PATH):
    """
    Train the model and report the accuracy and loss on the training and test sets after each epoch.
    """
    SamplePerEpoch = batchdata.dataset.__len__()
    allsamples = SamplePerEpoch * epochs
    trainedsample = 0
    trainlosslist = []
    testlosslist = []
    early_stopping = EarlyStopping(tol=tol)
    highestacc = None
    for epoch in range(1,epochs+1):
        net.train()
        correct_train = 0
        loss_train = 0
        for batch_idx,(x,y) in enumerate(batchdata):
            x = x.to(device,non_blocking=True)
            y = y.to(device,non_blocking=True).view(x.shape[0])
            sigma = net.forward(x)
            loss = criterion(sigma,y)
            loss.backward()
            opt.step()
            opt.zero_grad()
            yhat = torch.max(sigma,1)[1]  # the actual predicted labels
            correct = torch.sum(yhat==y)  # number of correctly predicted samples
            trainedsample += x.shape[0]
            loss_train += loss.item()  # accumulate a Python float so each batch's autograd graph is not retained
            correct_train += correct
            if (batch_idx+1) % 125 == 0:
                print("Epoch{}:[{}/{}({:.0f})%)]".format(epoch,trainedsample,allsamples,100*trainedsample/allsamples))                             
        TrainAccThisEpoch = float(correct_train*100)/SamplePerEpoch
        TrainLossThisEpoch = float(loss_train*100)/SamplePerEpoch
        trainlosslist.append(TrainLossThisEpoch)
        # free GPU memory: delete intermediate variables that are no longer needed
        del x,y,correct
        gc.collect()  # clear caches tied to the deleted data and variables
        torch.cuda.empty_cache()

        # evaluate on the test set once per epoch
        net.eval() 
        loss_test = 0
        correct_test = 0
        TestSample = testdata.dataset.__len__()
        for x,y in testdata:
            with torch.no_grad():
                x = x.to(device,non_blocking=True)
                y = y.to(device,non_blocking=True).view(x.shape[0])
                sigma = net.forward(x)
                loss = criterion(sigma,y)
                yhat = torch.max(sigma,1)[1]
                correct = torch.sum(yhat==y)
                loss_test += loss
                correct_test += correct
        TestAccThisEpoch = float(correct_test*100)/TestSample
        TestLossThisEpoch = float(loss_test*100)/TestSample
        testlosslist.append(TestLossThisEpoch)
        print("\t Train loss:{:.6f},Test loss:{:.6f},Train acc:{:.3f}%,test acc:{:.3f}%".format(TrainLossThisEpoch
                                                                                               ,TestLossThisEpoch
                                                                                               ,TrainAccThisEpoch
                                                                                               ,TestAccThisEpoch))
        del x,y,correct  
        gc.collect()  # clear caches tied to the deleted data and variables
        torch.cuda.empty_cache()
        
        # save the weights whenever the test accuracy reaches a new high
        if highestacc is None or highestacc < TestAccThisEpoch:
            highestacc = TestAccThisEpoch
            torch.save(net.state_dict(),os.path.join(PATH,modelname+".pt"))
            print("\t Weight Saved")
        
        # early stopping
        early_stop = early_stopping(TestLossThisEpoch)
        if early_stop:  # early_stopping returns a boolean, not the string "True"
            break
    print("Done")
    return trainlosslist,testlosslist

Define the plotting function

def plotloss(trainloss, testloss):
    plt.figure(figsize=(10,7))
    plt.plot(trainloss,color="red",label="Trainloss")
    plt.plot(testloss,color="orange",label="Testloss")
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.show()

3.4. Define the overall process function

def full_procedure(net,epochs,bs,modelname,PATH,lr = 0.001,alpha = 0.99,gamma = 0,wd = 0,tol=10**(-5)):
    torch.manual_seed(1412)
    torch.cuda.manual_seed(1412)
    torch.cuda.manual_seed_all(1412)
    batchdata = DataLoader(train,batch_size=bs,shuffle=True,drop_last=False,pin_memory=True)
    testdata = DataLoader(test,batch_size=bs,shuffle=False,drop_last=False,pin_memory=True)
    criterion = nn.CrossEntropyLoss(reduction="sum")
    opt = optim.RMSprop(net.parameters(),lr=lr,alpha=alpha,momentum=gamma,weight_decay=wd)
    
    trainloss,testloss = fit_test(net,batchdata,testdata,criterion,opt,epochs,tol,modelname,PATH)
    return trainloss,testloss

3.5. Start iterative training

PATH = "/kaggle/working/SVHN"
avgtime = []
for i in range(1):
    torch.manual_seed(1412)
    torch.cuda.manual_seed(1412)
    torch.cuda.manual_seed_all(1412)
    resnet18_ = M.resnet18()
    net = MyResNet().to(device,non_blocking=True)
    start = time()
    trainloss,testloss = full_procedure(net,epochs=20,bs=128,modelname="model_seletion_resnet",PATH=PATH)
    print(time()-start)
    plotloss(trainloss,testloss)

3.6. Training results and loss curves

(Figure omitted: training and test loss curves drawn by plotloss.)
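To reuse the best checkpoint that fit_test saved during training, the state dict can be reloaded before further evaluation. A hedged sketch, assuming the modelname "model_seletion_resnet" used in section 3.5:

# Rebuild the architecture and load the best saved weights (sketch).
best_net = MyResNet().to(device)
best_net.load_state_dict(torch.load(os.path.join(PATH, "model_seletion_resnet.pt"), map_location=device))
best_net.eval()  # switch to evaluation mode before running inference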

Source: blog.csdn.net/qq_43607118/article/details/129801585