Train a classifier

You have now seen how to define a neural network, compute the loss, and update the weights of the network. Now you might be thinking:

What about data?

In general, when you are dealing with image, text, audio or video data, you can use standard Python packages to load the data into NumPy arrays. You can then convert such an array into a torch.*Tensor.

For images, packages such as Pillow and OpenCV are useful
For audio, packages such as scipy and librosa are useful
For text, either raw Python or Cython based loading works, as do NLTK and SpaCy
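
As a rough illustration of the conversion described above, the sketch below turns a NumPy array into a tensor. The random array is only a stand-in for an image loaded with Pillow or OpenCV, and the variable names are illustrative assumptions, not part of the tutorial.

```python
import numpy as np
import torch

# Stand-in for a decoded image: height x width x channels, 8-bit values.
image_hwc = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)

# torch.from_numpy wraps the array without copying; permute moves channels
# first (C, H, W), and the division rescales 0..255 to the range [0, 1].
image_chw = torch.from_numpy(image_hwc).permute(2, 0, 1).float() / 255.0

print(image_chw.shape, image_chw.dtype)
```

The channels-first layout matches the 3 x 32 x 32 shape that the rest of this tutorial works with.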

Specifically for vision, we created a package called torchvision, which has data loaders for common public datasets such as ImageNet, CIFAR10 and MNIST, and data transformers for images, namely torchvision.datasets and torch.utils.data.DataLoader.

This provides great convenience and avoids writing boilerplate code.
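
To give a feel for what a data loader does without downloading anything, here is a minimal sketch that batches a small synthetic dataset. The random tensors and the use of TensorDataset are assumptions made for illustration; the tutorial itself uses torchvision.datasets.CIFAR10.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten fake "images" of shape 3x32x32 with integer class labels in 0..9.
images = torch.randn(10, 3, 32, 32)
labels = torch.randint(0, 10, (10,))
dataset = TensorDataset(images, labels)

# The loader groups samples into mini-batches; the last batch may be smaller.
loader = DataLoader(dataset, batch_size=4, shuffle=False)

batch_sizes = [batch_images.shape[0] for batch_images, _ in loader]
print(batch_sizes)  # [4, 4, 2]
```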

For this tutorial, we use the CIFAR10 dataset. It has the classes: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'. The images in CIFAR10 are of size 3 x 32 x 32, i.e. 3-channel color images of 32 x 32 pixels.

Train an image classifier

We will do the following steps:

Load and normalize the CIFAR10 training and test datasets using torchvision
Define a Convolutional Neural Network
Define a loss function
Train the network on the training set
Test the network on the test set

Load and normalize CIFAR10

Using torchvision, loading CIFAR10 is very simple:

import torch
import torchvision
import torchvision.transforms as transforms

The torchvision datasets yield PILImage images with values in the range [0, 1]. We transform them to Tensors normalized to the range [-1, 1].
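
The mapping from [0, 1] to [-1, 1] follows from how Normalize works: for each channel it computes (value - mean) / std. A small worked sketch in plain Python, with the helper names chosen here for illustration:

```python
def normalize(value, mean=0.5, std=0.5):
    """Per-channel formula applied by Normalize: (value - mean) / std."""
    return (value - mean) / std


def unnormalize(value, mean=0.5, std=0.5):
    """Inverse mapping, useful before displaying an image."""
    return value * std + mean


# Prints -1.0, -0.5, 0.0 and 1.0 for the four inputs.
for pixel in (0.0, 0.25, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```

With a mean of 0.5 and a standard deviation of 0.5, the smallest input 0 maps to -1 and the largest input 1 maps to 1.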

**Note:** If you get a BrokenPipeError on Windows, set num_workers in torch.utils.data.DataLoader() to 0.

import matplotlib.pyplot as plt
import numpy as np

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
)

batch_size = 4
trainset = torchvision.datasets.CIFAR10(root='./cifar10',
                                        train=True,
                                        download=True,
                                        transform=transform)
trainloader = torch.utils.data.DataLoader(trainset,
                                          batch_size=batch_size,
                                          shuffle=True,
                                          num_workers=2)
testset = torchvision.datasets.CIFAR10(root='./cifar10', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

out:

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./cifar10\cifar-10-python.tar.gz
170500096it [02:27, 1156941.89it/s]

Let us show some of the training images
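
A minimal sketch of a display helper for this step, assuming the name imshow that the test section calls later. It undoes the normalization and reorders the axes for matplotlib; to_display_array is a helper name introduced here for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np


def to_display_array(img):
    """Map a normalized (C, H, W) image back to [0, 1] with channels last."""
    npimg = np.asarray(img)            # accepts a CPU tensor or an array
    npimg = npimg / 2 + 0.5            # undo Normalize: [-1, 1] -> [0, 1]
    return np.transpose(npimg, (1, 2, 0))  # (C, H, W) -> (H, W, C)


def imshow(img):
    """Show one image, or a grid of images, in a matplotlib window."""
    plt.imshow(to_display_array(img))
    plt.show()
```

With the loaders defined above, a batch could then be shown with images, labels = next(iter(trainloader)) followed by imshow(torchvision.utils.make_grid(images)).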

out:

Files already downloaded and verified
dog ship plane ship

Define a Convolutional Neural Network

Copy the neural network from the Neural Networks section and modify it to take 3-channel images instead of the 1-channel images it was originally defined for.

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # 3 input image channels, 6 output channels, 5x5 convolution kernel
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 comes from the feature-map size
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # max pooling over a (2, 2) window
        x = self.pool(F.relu(self.conv1(x)))
        # the same pooling layer is reused after the second convolution
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
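
The 16 * 5 * 5 input size of fc1 can be checked by hand. The sketch below follows a 32 x 32 image through the two convolution and pooling stages using the standard output-size formula for convolutions and pooling; the helper names are illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial output size of max pooling along one dimension."""
    return (size - kernel) // stride + 1


size = 32
size = conv_out(size, kernel=5)  # conv1: 32 -> 28
size = pool_out(size)            # pool:  28 -> 14
size = conv_out(size, kernel=5)  # conv2: 14 -> 10
size = pool_out(size)            # pool:  10 -> 5

flattened = 16 * size * size     # 16 channels come out of conv2
print(size, flattened)  # 5 400
```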

Define a loss function and optimizer

Let's use a classification cross-entropy loss and SGD with momentum.

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
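
To make the two ingredients concrete, here is a plain-Python sketch of what they compute for a single sample and a single scalar weight. It assumes the usual definitions: cross-entropy as the negative log of the softmax probability of the true class, and the momentum update with zero dampening. The function names are illustrative, and this is not how PyTorch implements them internally.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax of the target class, computed stably."""
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(v - max_logit) for v in logits))
    return log_sum_exp - logits[target]


def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One update: velocity accumulates gradients, then the weight moves against it."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity


# Ten equal scores give every class probability 1/10, so the loss is ln(10).
print(round(cross_entropy([0.0] * 10, target=3), 3))  # 2.303

param, velocity = sgd_momentum_step(1.0, grad=2.0, velocity=0.0)
param, velocity = sgd_momentum_step(param, grad=2.0, velocity=velocity)
print(round(param, 4), round(velocity, 1))  # 0.9942 3.8
```

The value ln(10), about 2.303, is the loss of a network that cannot yet tell the ten classes apart, which is a useful reference point when reading the running loss printed during training.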

Train the network

This is where things start to get interesting. We simply loop over our data iterator, feed the inputs to the network, and optimize.

for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0  # reset so each report averages only the last 2000 batches
print('Finished Training')

out:

[1, 2000] loss: 2.211
[1, 4000] loss: 1.825
[1, 6000] loss: 1.648
[1, 8000] loss: 1.562
[1, 10000] loss: 1.504
[1, 12000] loss: 1.448
[2, 2000] loss: 1.397
[2, 4000] loss: 1.353
[2, 6000] loss: 1.341
[2, 8000] loss: 1.313
[2, 10000] loss: 1.270
[2, 12000] loss: 1.280
Finished Training
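
The iteration counts in the log follow from the dataset and batch sizes. A quick check, assuming the 50,000 training images that CIFAR10 provides and the batch_size of 4 set earlier:

```python
train_images = 50_000  # size of the CIFAR10 training split
batch_size = 4
log_every = 2000       # the loop reports once every 2000 mini-batches

iterations_per_epoch = train_images // batch_size
logged_steps = list(range(log_every, iterations_per_epoch + 1, log_every))

print(iterations_per_epoch)  # 12500
print(logged_steps)          # [2000, 4000, 6000, 8000, 10000, 12000]
```

The last 500 iterations of each epoch therefore never appear in the log.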

Let's quickly save our trained model:

PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)

Test the network on the test set

We have trained the network for 2 passes over the training dataset. But we need to check whether the network has learned anything at all. We will check this by predicting the class label that the neural network outputs and comparing it against the ground truth. If the prediction is correct, we add the sample to the list of correct predictions.

As a first step, let us display some images from the test set:

dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

out:

GroundTruth: cat ship ship plane

Next, let's load back our saved model (note: saving and reloading the model is not necessary here; we do it only to illustrate how it is done):

net = Net()
net.load_state_dict(torch.load(PATH))
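
The save and load pattern can be tried without touching the disk. This sketch assumes a tiny nn.Linear layer as a stand-in for Net and an in-memory buffer in place of PATH; both are illustrative choices.

```python
import io

import torch
import torch.nn as nn

source = nn.Linear(4, 2)

# Serialize only the parameters (the state_dict), not the whole module object.
buffer = io.BytesIO()
torch.save(source.state_dict(), buffer)
buffer.seek(0)

# A freshly constructed module has its own random weights until they are loaded.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer))

print(torch.equal(source.weight, restored.weight))  # True
```

Because only the state_dict is stored, the class definition must be available again at load time, which is why a new Net() is constructed before load_state_dict is called.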

Now let's see what the neural network thinks the examples above are:

outputs = net(images)

The outputs are energies for the 10 classes. The higher the energy for a class, the more the network thinks that the image belongs to that class. So, let's get the index of the highest energy:

_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))

out:

Predicted:   frog  ship  ship  ship
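
For intuition, the energies can be turned into probabilities with a softmax, and the predicted class is simply the index of the largest energy. A plain-Python sketch with made-up energies for three hypothetical classes (the network above produces ten):

```python
import math


def softmax(energies):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(energies)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(e - shift) for e in energies]
    total = sum(exps)
    return [e / total for e in exps]


energies = [0.5, 2.0, -1.0]  # made-up values, one per class
probabilities = softmax(energies)

# Same role as torch.max(outputs, 1) for one sample: index of the largest energy.
predicted = max(range(len(energies)), key=lambda i: energies[i])
print(predicted)  # 1
```

Softmax preserves the ordering of the energies, so taking the maximum of the raw outputs gives the same prediction as taking the most probable class.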

Let us look at how the network performs on the whole test set:

correct = 0
total = 0
# since we are not training, we do not need to compute gradients for our outputs
with torch.no_grad():
    for data in testloader:
        images, labels = data
        # compute the outputs by running the images through the network
        outputs = net(images)
        # the class with the highest energy is what we choose as the prediction
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))

out:

Accuracy of the network on the 10000 test images: 54 %

That looks much better than chance, which is 10% accuracy (randomly picking one class out of 10). It seems the network has learned something.

Hmmm, which classes performed well, and which did not?

# prepare to count predictions for each class
correct_pred = {classname: 0 for classname in classes}
total_pred = {classname: 0 for classname in classes}

# again, no gradients needed
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predictions = torch.max(outputs, 1)
        # collect the correct predictions for each class
        for label, prediction in zip(labels, predictions):
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1

# print the accuracy for each class
for classname, correct_count in correct_pred.items():
    accuracy = 100 * float(correct_count) / total_pred[classname]
    print("Accuracy for class {:5s} is: {:.1f} %".format(classname, accuracy))

out:

Accuracy for class plane is: 60.1 %
Accuracy for class car is: 69.8 %
Accuracy for class bird is: 45.2 %
Accuracy for class cat is: 26.4 %
Accuracy for class deer is: 30.5 %
Accuracy for class dog is: 60.4 %
Accuracy for class frog is: 70.7 %
Accuracy for class horse is: 69.5 %
Accuracy for class ship is: 53.3 %
Accuracy for class truck is: 61.0 %

Train on GPU

Just as you transfer a Tensor onto the GPU, you transfer the neural network onto the GPU.

Let's first define our device as the first visible CUDA device, if CUDA is available:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Assuming we are on a CUDA machine, this should print a CUDA device:
print(device)

out:

cuda:0

The following call then recursively goes over all modules and converts their parameters and buffers to CUDA tensors:

net.to(device)

Remember that you also have to send the inputs and targets to the GPU at every step:

inputs, labels = data[0].to(device), data[1].to(device)
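
Putting these pieces together, one training step on the selected device might look like the sketch below. The small nn.Sequential model and the random batch are assumptions standing in for Net and a batch from trainloader; the code falls back to the CPU when CUDA is not available.

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in model: flattens a 3x32x32 image and maps it to 10 class scores.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Stand-in batch: in the tutorial this would be `data` from trainloader.
data = (torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,)))

# Model and batch must live on the same device before the forward pass.
inputs, labels = data[0].to(device), data[1].to(device)

optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()

print(outputs.shape, loss.item())
```

Note that .to(device) on a tensor returns a new tensor rather than moving the original in place, so the result has to be assigned as shown.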

Why don't we notice a MASSIVE speedup compared to the CPU? Because the network is really small.

