PyTorch official tutorial reading notes (1): image classifier

Official tutorial: Training a Classifier (https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html)

1. The CIFAR-10 dataset

1.1 Dataset description

The CIFAR-10 dataset contains 60,000 RGB images of size 32×32, divided into 10 classes (see the figure below) with 6,000 images per class. The 60,000 images are split into a training set of 50,000 images and a test set of 10,000 images.

[Figure: sample images from the 10 CIFAR-10 classes]
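For reference, the 10 class labels, as defined in the official tutorial, are:

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')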

1.2 Getting the dataset

In PyTorch, the torchvision package can be used to load and normalize the CIFAR-10 training and test sets. The official source code (wrapped into a function and lightly modified) is as follows:

import torch
import torchvision
import torchvision.transforms as transforms


def loadDataset(batch_size=4):
    """
    Purpose: download and return the training set (50,000 images) and test set (10,000 images)
    batch_size: mini-batch size; defaults to 4 as in the official tutorial
    trainloader, testloader: iterable DataLoader objects
    """
    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                            download=True, transform=transform)
    # DataLoader wraps the dataset and serves batch_size samples at a time as tensors
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                              shuffle=True, num_workers=2)

    testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                           download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                             shuffle=False, num_workers=2)
    """
    print(len(trainloader) * 4, len(testloader) * 4)
    # 50000 10000
    testd = iter(testloader)
    imgs, labs = next(testd)
    print(type(imgs), type(labs))
    # <class 'torch.Tensor'> <class 'torch.Tensor'>
    print(imgs.shape, labs.shape)
    # images are laid out as NCHW: batch, channel, height, width
    # torch.Size([4, 3, 32, 32]) torch.Size([4])
    """
    return trainloader, testloader
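Note that ToTensor scales pixel values to [0, 1], and Normalize with per-channel mean 0.5 and std 0.5 then maps them to [-1, 1] via (x - mean) / std. A quick sketch of the arithmetic:

import torch
x = torch.tensor([0.0, 0.5, 1.0])  # pixel values after ToTensor
print((x - 0.5) / 0.5)             # tensor([-1., 0., 1.])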

Note that downloading the dataset by running the code above directly can be very slow. The solution: see the tutorial "How to import a local dataset (taking CIFAR-10 as an example)"; the dataset can be downloaded directly through the link there, after which you follow the remaining steps in that tutorial. Once the code above has run successfully, you will see the cifar-10-batches-py folder locally.
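As a minimal sketch of the local-import idea (assuming the archive was downloaded manually and extracted so that ./data/cifar-10-batches-py already exists), the download step can then be skipped:

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=False, transform=transform)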

2. CNN models

2.1 LeNet-5 model

The LeNet-5 model is the model used for classification in the official PyTorch tutorial. The model diagram is as follows:

[Figure: LeNet-5 architecture]

The corresponding annotated source code is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        # input channels 3, output channels 6, 5x5 kernel
        self.conv1 = nn.Conv2d(3, 6, 5)
        # 2x2 pooling kernel, stride 2
        self.pool = nn.MaxPool2d(2, 2)
        # input channels 6, output channels 16, 5x5 kernel
        self.conv2 = nn.Conv2d(6, 16, 5)
        # fully connected layers
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # conv1: input 32*32*3, output 28*28*6
        # maxpool1: input 28*28*6, output 14*14*6
        x = self.pool(F.relu(self.conv1(x)))
        # conv2: input 14*14*6, output 10*10*16
        # maxpool2: input 10*10*16, output 5*5*16
        x = self.pool(F.relu(self.conv2(x)))
        # reshape to (batch, 16*5*5)
        x = x.view(-1, 16 * 5 * 5)
        # fully connected layer 1: 16*5*5 -> 120
        x = F.relu(self.fc1(x))
        # fully connected layer 2: 120 -> 84
        x = F.relu(self.fc2(x))
        # fully connected layer 3: 84 -> 10
        x = self.fc3(x)
        return x


def loadLeNet5(gpu=True):
    """
    Purpose: load the network model
    gpu: whether to enable GPU acceleration
    """
    net = LeNet5()  # use a separate name so the class is not shadowed
    if gpu:
        net.cuda(device)
    return net

if __name__ == "__main__":
    net = LeNet5()
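For a quick sanity check (a sketch), a dummy CIFAR-10-sized batch can be pushed through the network; the output should contain one logit per class:

x = torch.ones(1, 3, 32, 32, dtype=torch.float32)
net = LeNet5()
print(net(x).shape)  # torch.Size([1, 10])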

2.2 AlexNet model

I also implemented another classic CNN model, AlexNet (theoretical reference: AlexNet neural network structure), but removed the rather controversial LRN layer. In addition, because the images in the CIFAR-10 dataset are small (32×32), size errors occur during the forward computation, so when testing this model each image is resized to 96×96. The transform therefore needs to be modified as follows:

transform = transforms.Compose(
    [transforms.Resize(96),
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
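Note that Resize with a single integer rescales the shorter edge to that size, so the square 32×32 CIFAR-10 images become 96×96. A quick check (a sketch, assuming Pillow is installed):

from PIL import Image
img = Image.new('RGB', (32, 32))
print(transforms.Resize(96)(img).size)  # (96, 96)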

The AlexNet model implementation is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

class Alexnet(nn.Module):
    def __init__(self):
        super(Alexnet, self).__init__()
        # input channels 3, output channels 96, 11x11 kernel, stride 4
        self.conv1 = nn.Conv2d(3, 96, 11, stride=4)
        # 3x3 pooling kernel, stride 2
        self.pool1 = nn.MaxPool2d(3, 2)
        self.relu1 = nn.ReLU()

        # input channels 96, output channels 256, 5x5 kernel, stride 1, same padding
        self.conv2 = nn.Conv2d(96, 256, 5, stride=1, padding=2)
        self.relu2 = nn.ReLU()
        # 3x3 pooling kernel, stride 2
        self.pool2 = nn.MaxPool2d(3, 2)

        # input channels 256, output channels 384, 3x3 kernel, stride 1, same padding
        self.conv3 = nn.Conv2d(256, 384, 3, stride=1, padding=1)
        self.relu3 = nn.ReLU()

        # input channels 384, output channels 384, 3x3 kernel, stride 1, same padding
        self.conv4 = nn.Conv2d(384, 384, 3, stride=1, padding=1)
        self.relu4 = nn.ReLU()

        # input channels 384, output channels 256, 3x3 kernel, stride 1, same padding
        self.conv5 = nn.Conv2d(384, 256, 3, stride=1, padding=1)
        self.relu5 = nn.ReLU()
        # 3x3 pooling kernel, stride 2
        self.pool5 = nn.MaxPool2d(3, 2)

        self.fc6 = nn.Linear(256 * 1 * 1, 4096)
        self.relu6 = nn.ReLU()
        self.dropout6 = nn.Dropout(0.5)

        self.fc7 = nn.Linear(4096, 4096)
        self.relu7 = nn.ReLU()
        self.dropout7 = nn.Dropout(0.5)

        self.fc8 = nn.Linear(4096, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.pool1(x)

        x = self.conv2(x)
        x = self.relu2(x)
        x = self.pool2(x)

        x = self.conv3(x)
        x = self.relu3(x)

        x = self.conv4(x)
        x = self.relu4(x)

        x = self.conv5(x)
        x = self.relu5(x)
        x = self.pool5(x)
        # print(x.shape)

        x = x.view(-1, 256 * 1 * 1)
        x = self.fc6(x)
        x = self.relu6(x)
        x = self.dropout6(x)

        x = self.fc7(x)
        x = self.relu7(x)
        x = self.dropout7(x)

        x = self.fc8(x)

        return x

def loadAlexnet(gpu=True):
    """
    Purpose: load the network model
    gpu: whether to enable GPU acceleration
    """
    alexnet = Alexnet()
    if gpu:
        alexnet.cuda(device)
    return alexnet

if __name__ == "__main__":
    x = torch.ones(1, 3, 96, 96, dtype=torch.float32)
    net = Alexnet()
    net(x)
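As a sanity check on the fc6 input size of 256*1*1, here is the spatial-size trace for a 96×96 input, using the output-size formula floor((n + 2p - k) / s) + 1:

# conv1 (k=11, s=4):         (96 - 11) // 4 + 1 = 22
# pool1 (k=3,  s=2):         (22 - 3) // 2 + 1  = 10
# conv2 (k=5, s=1, p=2):     10 (same padding)
# pool2 (k=3,  s=2):         (10 - 3) // 2 + 1  = 4
# conv3/4/5 (k=3, s=1, p=1): 4 (same padding)
# pool5 (k=3,  s=2):         (4 - 3) // 2 + 1   = 1
# final feature map: 256 x 1 x 1, matching nn.Linear(256 * 1 * 1, 4096)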

3. Training and testing

In the training and testing of the models, I enabled GPU acceleration from the start. In addition, I did not follow the official tutorial's hyperparameter settings. The following table compares the important parameters of the official tutorial with the ones I actually used:

Parameter     | Official tutorial | Actually used
Loss function | Cross-Entropy     | Cross-Entropy
Optimizer     | SGD               | Adam
batch_size    | 4                 | 256
iterations    | 2                 | 50
learning rate | 0.001             | 0.001, 0.0001
The loss function and optimizer are defined as follows:

# define the loss function (cross-entropy)
criterion = nn.CrossEntropyLoss().cuda()

# define the optimizer
# optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
optimizer = optim.Adam(net.parameters(), lr=lr)

For training, the number of passes over the dataset is set to 50 and the batch size to 256, i.e. the batch_size parameter of the loadDataset function is set to 256. The corresponding source code (wrapped into a function) is as follows:

def train(trainloader, net, lr=0.001, iterations=50):
    """
    Purpose: train the model
    trainloader: training dataset (iterable)
    net: the model to be trained
    lr: learning rate
    iterations: number of passes over the dataset (epochs)
    """
    # define the loss function (cross-entropy)
    criterion = nn.CrossEntropyLoss().cuda()
    # define the optimizer
    # optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
    optimizer = optim.Adam(net.parameters(), lr=lr)
    # start training
    for epoch in range(iterations):  # iterate over the dataset `iterations` times
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # get the input data
            inputs, labels = data
            # move tensors to the GPU
            inputs, labels = inputs.cuda(device), labels.cuda(device)
            # zero the parameter gradients
            optimizer.zero_grad()
            # forward pass
            outputs = net(inputs)
            # compute the loss
            loss = criterion(outputs, labels)
            # backward pass
            loss.backward()
            # update the network parameters
            optimizer.step()
            # print the loss averaged over every 50 mini-batches
            running_loss += loss.item()
            if (i + 1) % 50 == 0:
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 50))
                running_loss = 0.0

    print('Finished Training')
    return net

The test code is as follows (encapsulated into a function):

def test(testloader, net):
    """
    Purpose: evaluate the model
    testloader: test dataset
    net: the trained model
    """
    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            # move tensors to the GPU
            images, labels = images.cuda(device), labels.cuda(device)
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))
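Putting the pieces together, a minimal driver might look like this (a sketch using the loadDataset, loadLeNet5/loadAlexnet, train, and test functions defined above):

if __name__ == "__main__":
    trainloader, testloader = loadDataset(batch_size=256)
    net = loadLeNet5(gpu=True)  # or loadAlexnet(gpu=True), with Resize(96) added to the transform
    net = train(trainloader, net, lr=0.001, iterations=50)
    test(testloader, net)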

After training and testing, the results of the two models on the test set compare as follows (due to time constraints, no hyperparameter tuning was performed):

Test set accuracy      | LeNet-5 model | AlexNet model
learning rate = 0.001  | 63.4%         | 68.01%
learning rate = 0.0001 | 56.09%        | 76.09%

Origin blog.csdn.net/qq_42103091/article/details/109238607