PyTorch Learning: Training an Image Classifier

Reference tutorial

http://pytorch123.com/SecondSection/neural_networks/

Training an image classifier

From the previous section we already know how to define a neural network, compute its loss function, and update the network's weights.
Now we will learn how to process the data.
Generally, when you work with image, text, audio, or video data, you can use standard Python packages to load the data
into a NumPy array, and then convert that array into a "torch.Tensor".

  • For images, packages such as Pillow and OpenCV are useful
  • For audio, you can use packages such as scipy and librosa
  • For text, you can load the data with plain Python or Cython, or use NLTK and SpaCy
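
As a minimal sketch of the array-to-tensor step described above, assuming NumPy and PyTorch are installed: the random array below is only a stand-in for a decoded image (it is not real data), and `torch.from_numpy` wraps it as a tensor.

```python
import numpy as np
import torch

# Stand-in for a decoded RGB image: height 32, width 32, 3 channels, values 0-255.
rng = np.random.default_rng(0)
image_array = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# from_numpy wraps the array as a tensor that shares its memory (no copy).
image_tensor = torch.from_numpy(image_array)

# Reorder to channels-first (C, H, W) and scale to floats in [0, 1].
image_tensor = image_tensor.permute(2, 0, 1).float() / 255.0

print(image_tensor.shape, image_tensor.dtype)
```

The channels-first layout and the [0, 1] range are chosen here because that is the form the rest of this article works with.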

For vision, PyTorch provides the "torchvision" package, which offers a number of common data sets such as
ImageNet, CIFAR10, and MNIST through torchvision.datasets, image transformations through torchvision.transforms, and works together with torch.utils.data.DataLoader for batching.
The example below uses the CIFAR10 data set for image classification:

CIFAR10 has ten classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’.
Image size: 3*32*32 (3 color channels, 32*32 pixels)

Image classification is generally divided into the following five steps:

  1. Load and normalize the CIFAR10 training and test data sets using torchvision
  2. Define a convolutional neural network
  3. Define a loss function
  4. Train the network on the training data
  5. Test the network on the test data

1. Load and normalize the CIFAR10 data set

import torch
import torchvision
import torchvision.transforms as transforms

########################################################################
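# torchvision data sets yield PILImage images with values in the range [0, 1].
# Here they are normalized to tensors in the range [-1, 1].
# Normalization: (X - mean) / std
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]) # mean, std
# Check whether the data set already exists under root; download it if not
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
# The data loader combines a data set with a sampler and can load data with multiple worker processes.
# During training it splits the training data into mini-batches and yields one batch per iteration
# until all the data has been consumed. In short, it prepares the data for training.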
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=0)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=0)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
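
To make the normalization arithmetic concrete, here is a small sketch in plain Python. The helper names `normalize` and `unnormalize` are hypothetical and only mirror the formula (X - mean) / std with mean = std = 0.5 for a single pixel value; they are not torchvision functions.

```python
def normalize(x, mean=0.5, std=0.5):
    """Apply (x - mean) / std to one pixel value."""
    return (x - mean) / std

def unnormalize(y, mean=0.5, std=0.5):
    """Invert the normalization: y * std + mean."""
    return y * std + mean

pixels = [0.0, 0.25, 0.5, 1.0]              # values after ToTensor(), in [0, 1]
normalized = [normalize(p) for p in pixels]
restored = [unnormalize(n) for n in normalized]

print(normalized)   # [-1.0, -0.5, 0.0, 1.0]
print(restored)     # [0.0, 0.25, 0.5, 1.0]
```

With mean = std = 0.5 the end points 0 and 1 map to -1 and 1, and the inverse, y * 0.5 + 0.5, recovers the original range when the images need to be displayed.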

Display some of the training images

########################################################################
# Display some images from the data set

import matplotlib.pyplot as plt
import numpy as np

def imshow(img):
    img = img / 2 + 0.5     # unnormalize: the images were normalized earlier with x = (X - 0.5) / 0.5
    npimg = img.numpy()
    image = np.transpose(npimg, (1, 2, 0))   # reorder axes from (C, H, W) to (H, W, C)
    plt.imshow(image)
    plt.show()

# get some random training images
dataiter = iter(trainloader)     # turn trainloader into an iterator
images, labels = next(dataiter)  # dataiter.next() no longer works in recent PyTorch versions

# show images
imshow(torchvision.utils.make_grid(images))  # tile several images into a single image
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
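
The `np.transpose(npimg, (1, 2, 0))` call in `imshow` is needed because PyTorch stores images channels-first while matplotlib expects channels-last. A minimal sketch of what the axis tuple does, using a hypothetical zero-filled array with deliberately different height and width so the axes are easy to tell apart:

```python
import numpy as np

# Hypothetical channels-first image: 3 channels, height 32, width 64.
chw = np.zeros((3, 32, 64), dtype=np.float32)

# The tuple lists the old axes in their new order:
# old axis 1 (H) comes first, old axis 2 (W) second, old axis 0 (C) last.
hwc = np.transpose(chw, (1, 2, 0))

print(chw.shape, "->", hwc.shape)   # (3, 32, 64) -> (32, 64, 3)
```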

2. Define a convolutional neural network

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)   # output: 6*28*28
        self.pool = nn.MaxPool2d(2, 2)    # 6*14*14
        self.conv2 = nn.Conv2d(6, 16, 5)  # 16*10*10
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # after pooling, conv2 yields 5*5 maps, so fc1 takes 16*5*5 inputs
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # conv -> ReLU -> pool
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)             # view flattens each sample into a vector of 16*5*5 features for the fully connected layers
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
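
The sizes in the comments above (28, 14, 10, 5) follow from the usual output-size formula for convolution and pooling layers without padding. The sketch below recomputes them in plain Python; `conv_out` is a hypothetical helper written for this illustration, not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                   # CIFAR10 images are 32 x 32
size = conv_out(size, kernel=5)             # conv1: 32 -> 28
after_conv1 = size
size = conv_out(size, kernel=2, stride=2)   # pool:  28 -> 14
size = conv_out(size, kernel=5)             # conv2: 14 -> 10
size = conv_out(size, kernel=2, stride=2)   # pool:  10 -> 5

flat_features = 16 * size * size            # 16 channels of 5 x 5
print(after_conv1, size, flat_features)     # 28 5 400
```

This is why the first fully connected layer is declared with 16 * 5 * 5 = 400 input features.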

3. Define the loss function and optimizer

import torch.optim as optim

criterion = nn.CrossEntropyLoss()  # loss function
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)  # optimizer: SGD with momentum
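
For intuition about the loss, `nn.CrossEntropyLoss` takes the raw class scores and an integer class index, and for one sample computes -log(softmax(scores)[label]). The plain-Python sketch below reproduces that per-sample formula; the function `cross_entropy` and the example scores are made up for this illustration.

```python
import math

def cross_entropy(logits, target):
    """-log(softmax(logits)[target]) for one sample, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# A network with no preference among 10 classes: loss equals ln(10).
uniform = [0.0] * 10
loss_uniform = cross_entropy(uniform, target=3)

# Scores that strongly favor class 3.
confident = [0.0] * 10
confident[3] = 10.0
loss_confident = cross_entropy(confident, target=3)   # label agrees: small loss
loss_wrong = cross_entropy(confident, target=0)       # label disagrees: large loss

print(loss_uniform, loss_confident, loss_wrong)
```

A loss near ln(10) ≈ 2.303 at the start of training therefore means the network is still guessing uniformly among the 10 classes.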

4. Train the network

for epoch in range(2):  # number of passes over the training set

    running_loss = 0.0
    # enumerate() pairs each item of an iterable with its index;
    # the second argument 0 makes the index start at 0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)    # labels are class indices (0-9); no one-hot encoding is needed
        loss.backward()    # backpropagate the gradients
        optimizer.step()   # update the parameters

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
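
What `optimizer.step()` does for SGD with momentum can be sketched on a single scalar parameter. The function below follows the update rule used by `torch.optim.SGD` with default settings (velocity = momentum * velocity + grad, then param = param - lr * velocity); the toy objective f(w) = w^2 and the larger learning rate of 0.1 are assumptions chosen only so that the movement is visible in a few steps.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update for a scalar parameter."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity

# Toy objective f(w) = w^2, whose gradient is 2 * w.
w, v = 1.0, 0.0
history = [w]
for _ in range(3):
    grad = 2 * w
    w, v = sgd_momentum_step(w, grad, v, lr=0.1)
    history.append(w)

print(history)   # w moves toward the minimum at 0
```

The velocity accumulates past gradients, so consecutive steps in the same direction get larger than plain SGD steps would be.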

5. Test the network on the test data

We have trained the network for two full passes over the training set, but we still need to check whether it has actually learned anything. We test this by comparing the class the network predicts against the ground truth.

  1. First, display some images from the test set
dataiter = iter(testloader)
images, labels = next(dataiter)
# display the images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
Output:
GroundTruth:    cat  ship  ship plane
  2. View the network output
outputs = net(images)
outputs
tensor([[-1.3145, -2.4341, -0.7362,  6.8300,  0.5993,  2.2841, -0.9894, -0.9424,
          1.3211, -3.0649],
        [ 4.2055,  8.5567, -2.8397, -2.3198, -3.1733, -4.6069, -8.4125, -2.9534,
         10.5395,  5.7375],
        [ 1.3612,  1.1350,  0.3872, -0.3729, -0.1908, -1.1665, -3.7862, -0.3712,
          3.3340, -0.1305],

outputs holds an energy (score) for each of the 10 classes. The higher the energy for a class, the more the network thinks the image belongs to that class. So we take the index of the highest energy as the predicted class.

_, predicted = torch.max(outputs, 1)   # index of the predicted class for each image
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))
Predicted:    cat  ship  ship  ship
tensor([3, 8, 8, 8])
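
As a small check of how `torch.max(outputs, 1)` picks the class, the sketch below feeds it the three rows of scores printed above (the remaining row is not shown in the listing, so it is left out here). It assumes PyTorch is installed; the `classes` tuple is the one defined earlier.

```python
import torch

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# One row per image, one column per class.
outputs = torch.tensor([
    [-1.3145, -2.4341, -0.7362, 6.8300, 0.5993, 2.2841, -0.9894, -0.9424, 1.3211, -3.0649],
    [4.2055, 8.5567, -2.8397, -2.3198, -3.1733, -4.6069, -8.4125, -2.9534, 10.5395, 5.7375],
    [1.3612, 1.1350, 0.3872, -0.3729, -0.1908, -1.1665, -3.7862, -0.3712, 3.3340, -0.1305],
])

# Along dimension 1 (the class axis), max returns (largest value, its index).
values, predicted = torch.max(outputs, 1)

# Softmax turns scores into probabilities without changing their order.
probs = torch.softmax(outputs, dim=1)

print(predicted.tolist())                         # [3, 8, 8]
print([classes[i] for i in predicted.tolist()])   # ['cat', 'ship', 'ship']
```

Because softmax preserves the ordering of the scores, taking the argmax of the raw energies gives the same class as taking the argmax of the probabilities.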

Three of the four predictions above match the ground truth, so the results look reasonable.
Next, let's look at how the network performs on the whole test set.

correct = 0   # number of correct predictions
total = 0     # total number of samples
with torch.no_grad():     # testing only, so no gradients are needed
    for data in testloader:
        images, labels = data
        outputs = net(images)   # network outputs
        _, predicted = torch.max(outputs.data, 1)  # take the class with the highest score as the prediction
        total += labels.size(0)
        correct += (predicted == labels).sum().item()   # a prediction is correct when it equals the label

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

Accuracy of the network on the 10000 test images: 55 %

Random guessing would be right 10% of the time (one class out of 10), so 55% is much better than chance; the network seems to have learned something.
Now let's take a closer look at which classes are predicted well and which are not.

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()   # drop dimensions of size 1
        for i in range(4):                    # 4 is the batch size
            label = labels[i]
            class_correct[label] += c[i].item()   # accumulate correct predictions
            class_total[label] += 1               # count the samples of each class

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))   # accuracy of each class
Accuracy of plane : 52 %
Accuracy of   car : 70 %
Accuracy of  bird : 44 %
Accuracy of   cat : 28 %
Accuracy of  deer : 54 %
Accuracy of   dog : 41 %
Accuracy of  frog : 66 %
Accuracy of horse : 60 %
Accuracy of  ship : 65 %
Accuracy of truck : 68 %
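
The loop above hard-codes the batch size of 4. As a sketch of the same bookkeeping that works for any batch size, here is a plain-Python version operating on lists of class indices; `per_class_accuracy` is a hypothetical helper, and the sample labels are made up (the first four mirror the ground truth and predictions shown earlier).

```python
def per_class_accuracy(predicted, labels, num_classes):
    """Return the accuracy of each class, or None for classes with no samples."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for p, t in zip(predicted, labels):
        total[t] += 1
        correct[t] += int(p == t)
    return [c / n if n else None for c, n in zip(correct, total)]

labels    = [3, 8, 8, 0, 6, 6, 1]   # true classes
predicted = [3, 8, 8, 8, 6, 4, 1]   # predicted classes

acc = per_class_accuracy(predicted, labels, num_classes=10)
print(acc[0], acc[3], acc[6], acc[8])   # 0.0 1.0 0.5 1.0
```

With tensors from the test loop, `predicted.tolist()` and `labels.tolist()` give lists in the form this helper expects.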

6. Training on the GPU

Just as you transfer a Tensor to the GPU, you transfer the neural network to the GPU.
First, let's define our device as the first visible CUDA device, if CUDA is available.

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# On a machine with CUDA, this prints a CUDA device
print(device)
cuda:0

Once the device is set, the following call recursively goes over all modules and converts their parameters and buffers to CUDA tensors.
This statement is essential:

net.to(device)

Remember that the inputs and targets must also be sent to the GPU at every step:

inputs, labels = inputs.to(device), labels.to(device)
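
To show where the two `.to(device)` calls sit relative to a training step, here is a minimal sketch. It uses a hypothetical one-layer model and random data in place of the `Net` class and CIFAR10 batches from this article, and it falls back to the CPU when CUDA is not available.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in model: moved to the device once, before training starts.
model = nn.Linear(8, 10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Stand-in batch: 4 samples with 8 features each, and 4 class indices.
inputs = torch.randn(4, 8)
labels = torch.tensor([3, 8, 8, 0])

# Every batch is moved to the same device as the model.
inputs, labels = inputs.to(device), labels.to(device)

optimizer.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()
optimizer.step()

print(loss.item(), next(model.parameters()).device)
```

The model is moved once, while the data is moved inside the loop, because each new batch from the loader starts out on the CPU.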

Because the network is very small, you will not notice a significant speedup. To see the effect, change the number of output channels of the first convolution to 128 and the number of input channels of the second convolution to 128. After changing to 128 and training for 2 epochs, the accuracy is:

Accuracy of the network on the 10000 test images: 60 %

With more feature maps the result looks better than with 6 feature maps. The per-class accuracy is as follows:

Accuracy of plane : 73 %
Accuracy of   car : 82 %
Accuracy of  bird : 27 %
Accuracy of   cat : 32 %
Accuracy of  deer : 53 %
Accuracy of   dog : 55 %
Accuracy of  frog : 75 %
Accuracy of horse : 76 %
Accuracy of  ship : 70 %
Accuracy of truck : 54 %

Note:
After switching to the GPU, running the code may raise the following error:

TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Solution:

Change npimg = img.numpy() to
npimg = img.cpu().numpy()
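
A short sketch of why this fix is safe on both kinds of machine, assuming PyTorch is installed: `.cpu()` copies a CUDA tensor to host memory and simply returns the tensor itself when it is already on the CPU. The random tensor below is a stand-in for an image batch.

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in image tensor created on whichever device is available.
img = torch.rand(3, 32, 32, device=device)

# Works for CPU and CUDA tensors alike.
npimg = img.cpu().numpy()

print(type(npimg).__name__, npimg.shape)
```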

Origin blog.csdn.net/ruotianxia/article/details/103547947