Getting started with PyTorch: 04. PyTorch Image Classifier

You have learned how to define a neural network, calculate the loss value and update the weights in the network.

Now you may be wondering: how do we handle data?

Generally speaking, when you work with image, text, audio, or video data, you can use standard Python packages to load the data into a NumPy array and then convert that array into a torch.*Tensor (a minimal example follows the list below).

  • For images, packages such as Pillow and OpenCV are useful
  • For audio, packages such as scipy and librosa are useful
  • For text, raw Python or Cython based loading is fine, as are NLTK and SpaCy
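
For instance, here is a minimal sketch of that pipeline using Pillow; the file name 'example.jpg' is just a placeholder:

from PIL import Image
import numpy as np
import torch

img = Image.open('example.jpg')   # a PIL image
arr = np.asarray(img)             # NumPy array of uint8 pixel values (H x W x C)
tensor = torch.from_numpy(arr)    # torch.Tensor sharing the array's memory
print(tensor.shape, tensor.dtype)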

Specifically for vision, we have created a package called torchvision, which provides torchvision.datasets (data loaders for public datasets such as ImageNet, CIFAR10, and MNIST) and torch.utils.data.DataLoader (an iterator over image data).

This provides great convenience and avoids writing "boilerplate code".

For this tutorial, we will use the CIFAR10 dataset, which contains ten classes: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'. The images in CIFAR-10 are of size 3x32x32, i.e. three RGB color channels of 32x32 pixels each.

Train an image classifier

We will do the following steps in order:

  1. Use torchvision to load and normalize the training and test datasets of CIFAR10
  2. Define a convolutional neural network
  3. Define a loss function
  4. Train the network on training sample data
  5. Test the network on test sample data

Load and normalize CIFAR10. Using torchvision, it is very simple to load the CIFAR10 data.

import torch
import torchvision
import torchvision.transforms as transforms

The output of the torchvision datasets are PILImage images in the range [0, 1]. We convert them to tensors normalized to the range [-1, 1]: Normalize with mean 0.5 and standard deviation 0.5 maps each value x to (x - 0.5) / 0.5, so 0 becomes -1 and 1 becomes 1.

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Output:

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
Files already downloaded and verified
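
As a quick sanity check, a small sketch can confirm the 3x32x32 shape and the [-1, 1] value range described above:

image, label = trainset[0]
print(image.shape)                             # torch.Size([3, 32, 32])
print(image.min().item(), image.max().item())  # values lie in [-1.0, 1.0]
print(classes[label])                          # class name of this sample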

Let's show some of the training pictures.

import matplotlib.pyplot as plt
import numpy as np

# function to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize from [-1, 1] back to [0, 1]
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

Output:

cat plane ship frog

Define a convolutional neural network. Copy the neural network from the earlier neural-network chapter and modify it to take 3-channel images (it was previously defined for 1-channel images).

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
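
A note on the 16 * 5 * 5 in fc1: each 5x5 convolution without padding removes 4 pixels from each spatial dimension, and each 2x2 pooling halves it. The following sketch traces a dummy batch through the layers to verify:

# A 3x32x32 input shrinks to 6x28x28 after conv1, to 6x14x14 after pooling,
# to 16x10x10 after conv2, and to 16x5x5 after the second pooling;
# hence fc1 expects 16 * 5 * 5 input features.
x = torch.randn(1, 3, 32, 32)        # a dummy one-image batch
x = net.pool(F.relu(net.conv1(x)))
print(x.shape)                       # torch.Size([1, 6, 14, 14])
x = net.pool(F.relu(net.conv2(x)))
print(x.shape)                       # torch.Size([1, 16, 5, 5])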

Define a loss function and optimizer. Let's use classification cross-entropy as the loss function and SGD with momentum as the optimizer.

import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

Train the network. This is where things start to get interesting: we simply loop over our data iterator, feed the inputs to the network, and optimize.

for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

Output:

[1, 2000] loss: 2.187
[1, 4000] loss: 1.852
[1, 6000] loss: 1.672
[1, 8000] loss: 1.566
[1, 10000] loss: 1.490
[1, 12000] loss: 1.461
[2, 2000] loss: 1.389
[2, 4000] loss: 1.364
[2, 6000] loss: 1.343
[2, 8000] loss: 1.318
[2, 10000] loss: 1.282
[2, 12000] loss: 1.286
Finished Training

Test the network on the test data. We have trained the network for two passes over the training dataset, but we need to check whether the network has actually learned anything.

We will check this by taking the class label that the neural network outputs as its prediction and comparing it against the ground-truth label of the sample. If the prediction is correct, we add the sample to the list of correct predictions.

Okay, first step: let us display an image from the test set to get familiar with it.
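
A minimal sketch for this step, reusing the testloader and the imshow helper defined above:

dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))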

 

Output:

GroundTruth: cat ship ship plane

Now let us see what the neural network predicts for these examples:

outputs = net(images)

The outputs are scores for the ten classes. The higher the score for a class, the more the network thinks the image belongs to that class. So let's take the class with the highest score as the prediction:

_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))
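
Here torch.max along dimension 1 returns both the highest score and its index for each image; we keep only the indices, which serve as the predicted class labels. A tiny illustration with made-up scores:

scores = torch.tensor([[0.1, 2.0, -0.5],
                       [1.5, 0.2, 0.3]])
values, indices = torch.max(scores, 1)
print(values)   # tensor([2.0000, 1.5000])
print(indices)  # tensor([1, 0]) -- the argmax of each row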

Output:

Predicted: cat ship car ship

The results look quite good. Let's see how the network performs on the whole test set.

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

Output:

Accuracy of the network on the 10000 test images: 54 %

This is better than chance: randomly picking one of the 10 classes would give about 10% accuracy. It seems the network has learned something. Next, let's see which classes it predicts well and which it does not:

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))

Output:

Accuracy of plane : 57 %
Accuracy of car : 73 %
Accuracy of bird : 49 %
Accuracy of cat : 54 %
Accuracy of deer : 18 %
Accuracy of dog : 20 %
Accuracy of frog : 58 %
Accuracy of horse : 74 %
Accuracy of ship : 70 %
Accuracy of truck : 66 %

So what's next?

How do we run these neural networks on the GPU?

Training on the GPU: just as you transfer a tensor onto the GPU, you transfer the neural network onto the GPU. If CUDA is available, let us first define our device as the first visible CUDA device.

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Assume that we are on a CUDA machine, then this should print a CUDA device:
print(device)

Output:

cuda:0

The rest of this section will assume that the device is a CUDA device.

This method recursively traverses all modules and converts their parameters and buffers to CUDA tensors:

net.to(device)

Remember that you must also send inputs and targets to the GPU at every step:

inputs, labels = inputs.to(device), labels.to(device)
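
Putting it together, a sketch of the earlier training loop with the device transfers added might look like this:

net.to(device)
for epoch in range(2):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # send the input batch and its labels to the GPU at every step
        inputs, labels = data[0].to(device), data[1].to(device)

        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()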

Why don't we notice a massive speedup compared to the CPU? Because the network is really small.

Exercise: Try increasing the width of your network (argument 2 of the first nn.Conv2d and argument 1 of the second nn.Conv2d; they need to be the same number) and see what kind of speedup you get.

Goals achieved:

  • Understand PyTorch's tensors and neural networks at a high level
  • Train a small neural network to classify images

Train on multiple GPUs

If you want to see even bigger speedups using all of your GPUs, please check out Data Parallelism (https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html).

