Train a classifier
You have now seen how to define a neural network, compute the loss, and update the weights of the network. Now you might be thinking:
What about data?
Generally, when you have to deal with image, text, audio or video data, you can use standard Python packages that load the data into a NumPy array. Then you can convert this array into a torch.*Tensor.
- For images, you can use Pillow or OpenCV
- For audio, you can use scipy or librosa
- For text, you can use raw Python or Cython based loading, or NLTK and SpaCy
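The array-to-tensor step can be sketched as follows. This is a minimal illustration, not part of the original tutorial: the random array is a stand-in for an image decoded by a package such as Pillow or OpenCV, and the variable names are hypothetical.

```python
import numpy as np
import torch

# Stand-in for a decoded image: height x width x channels, 8-bit values in [0, 255].
# (Assumption: a real loader would hand back an array laid out like this.)
image_hwc = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)

# torch.from_numpy shares memory with the array and keeps its dtype (uint8).
tensor_hwc = torch.from_numpy(image_hwc)

# Convolution layers expect channels first (C x H x W) and floating-point input,
# so move the channel axis to the front and rescale to [0, 1].
tensor_chw = tensor_hwc.permute(2, 0, 1).float() / 255.0

print(tensor_chw.shape, tensor_chw.dtype)
```

For the datasets covered by torchvision, this conversion is done for you by transforms.ToTensor().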
Specifically for vision, we have created a package called torchvision, which has data loaders for common datasets such as ImageNet, CIFAR10 and MNIST, and data transformers for images, namely torchvision.datasets and torch.utils.data.DataLoader.
This is a great convenience and avoids writing boilerplate code.
For this tutorial, we use the CIFAR10 dataset. It has the classes: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'. The images in CIFAR10 are of size 3x32x32, i.e. 3-channel color images of 32x32 pixels.
Train an image classifier
We will do the following steps:
- Load and normalize the CIFAR10 training and test datasets using torchvision
- Define a Convolutional Neural Network
- Define a loss function
- Train the network on the training set
- Test the network on the test set
Load and normalize CIFAR10
Using torchvision, loading CIFAR10 is very simple:
import torch
import torchvision
import torchvision.transforms as transforms
The torchvision datasets return PILImage images with values in the range [0, 1]. We transform them to Tensors with the normalized range [-1, 1].
**Note:** If you get a BrokenPipeError on Windows, set num_workers in torch.utils.data.DataLoader() to 0.
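To see why a mean of 0.5 and a standard deviation of 0.5 map [0, 1] onto [-1, 1], here is a small hand-computed sketch of the per-channel formula (x - mean) / std. The 2x2 "image" is made up for illustration.

```python
import torch

# Per-channel mean and std, matching the (0.5, 0.5, 0.5) values used in this tutorial.
mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

# A made-up 3x2x2 image with values in [0, 1], as ToTensor would produce.
img = torch.tensor([0.0, 0.25, 0.5, 1.0]).view(1, 2, 2).repeat(3, 1, 1)

normalized = (img - mean) / std      # the per-channel normalization formula
restored = normalized * std + mean   # the inverse, useful before displaying an image

print(normalized[0])
```

The value 0 becomes -1, 0.5 becomes 0, and 1 stays 1, so the whole range is stretched and centered around zero.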
import matplotlib.pyplot as plt
import numpy as np

# Convert images to tensors, then normalize each channel from [0, 1] to [-1, 1]
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
)
batch_size = 4
trainset = torchvision.datasets.CIFAR10(root='./cifar10',
                                        train=True,
                                        download=True,
                                        transform=transform)
trainloader = torch.utils.data.DataLoader(trainset,
                                          batch_size=batch_size,
                                          shuffle=True,
                                          num_workers=2)
testset = torchvision.datasets.CIFAR10(root='./cifar10', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
out:
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./cifar10\cifar-10-python.tar.gz
170500096it [02:27, 1156941.89it/s]
Let us show some of the training images:
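The listing for this step is not included here, although an imshow helper is called later in the test section. What follows is a minimal sketch of such a helper, assuming the images were normalized with mean 0.5 and std 0.5 as above; the name to_displayable is hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import torch


def to_displayable(img):
    """Convert a normalized C x H x W tensor to an H x W x C array in [0, 1]."""
    img = img / 2 + 0.5                     # undo Normalize with mean 0.5 and std 0.5
    npimg = img.numpy()
    return np.transpose(npimg, (1, 2, 0))   # matplotlib expects channels last


def imshow(img):
    """Display a normalized image tensor with matplotlib."""
    plt.imshow(to_displayable(img))
    plt.show()
```

With the loaders defined above, one way to use it is to fetch a batch with `images, labels = next(iter(trainloader))`, call `imshow(torchvision.utils.make_grid(images))`, and print `classes[labels[j]]` for each image in the batch.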
out:
Files already downloaded and verified
dog ship plane ship
Define a Convolutional Neural Network
Copy the neural network from the Neural Networks section and modify it to take 3-channel images (instead of the 1-channel images it was originally defined for).
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # 3 input image channels, 6 output channels, 5x5 convolution kernel
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # An affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 is the spatial size of the feature map
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        # Flatten all dimensions except the batch dimension,
        # so each feature map stack becomes one row vector
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
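The 16 * 5 * 5 input size of fc1 follows from the layer sizes. As a rough check, the arithmetic can be traced with the usual output-size formula for convolution and pooling without dilation; the helper name conv_out is hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                   # CIFAR10 images are 32x32
size = conv_out(size, kernel=5)             # conv1, 5x5 kernel   -> 28
size = conv_out(size, kernel=2, stride=2)   # 2x2 max pooling     -> 14
size = conv_out(size, kernel=5)             # conv2, 5x5 kernel   -> 10
size = conv_out(size, kernel=2, stride=2)   # 2x2 max pooling     -> 5

flat_features = 16 * size * size            # 16 channels come out of conv2
print(size, flat_features)                  # prints: 5 400
```

If you change the kernel sizes or the input resolution, the first argument of fc1 has to change accordingly.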
Define a loss function and optimizer
We use a classification cross-entropy loss and SGD with momentum.
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
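nn.CrossEntropyLoss takes the raw scores of the network and applies log-softmax internally, which is why the network above ends with a plain linear layer. The sketch below compares the loss with a hand computation; the scores and labels are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up scores for a batch of 2 samples and 3 classes (not taken from the tutorial).
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels)

# The same value by hand: log-softmax, pick the true class, negate, average over the batch.
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), labels].mean()

print(loss.item(), manual.item())
```

The two printed values agree up to floating-point error.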
Train the network
This is where things get interesting. We simply loop over the data iterator, feed the inputs to the network, and optimize.
for epoch in range(2):  # loop over the dataset twice
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # Get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        # Zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # Print statistics every 2000 mini-batches
        running_loss += loss.item()
        if i % 2000 == 1999:
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0  # reset so each report averages only the last 2000 batches
print('Finished Training')
out:
[1, 2000] loss: 2.211
[1, 4000] loss: 1.825
[1, 6000] loss: 1.648
[1, 8000] loss: 1.562
[1, 10000] loss: 1.504
[1, 12000] loss: 1.448
[2, 2000] loss: 1.397
[2, 4000] loss: 1.353
[2, 6000] loss: 1.341
[2, 8000] loss: 1.313
[2, 10000] loss: 1.270
[2, 12000] loss: 1.280
Finished Training
Let's quickly save our trained model:
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)
Test the network on the test set
We have trained the network for 2 passes over the training dataset. But we need to check whether the network has learned anything at all. We will check this by predicting the class label that the neural network outputs and comparing it against the ground truth. If the prediction is correct, we add the sample to the list of correct predictions.
As a first step, let us display a batch of images from the test set:
dataiter = iter(testloader)
images, labels = next(dataiter)  # use the built-in next(); iterators have no .next() method
# Print the images and their ground-truth labels
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
out:
GroundTruth: cat ship ship plane
Next, we load the previously saved model back in (note: saving and reloading the model is not necessary here; we only do it to illustrate how it is done):
net = Net()
net.load_state_dict(torch.load(PATH))
Now let's see what the neural network thinks the examples above are:
outputs = net(images)
The outputs are energies for the 10 classes. The higher the energy for a class, the more the network thinks that the image belongs to that class. So, let's get the index of the highest energy:
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))
out:
Predicted: frog ship ship ship
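To make the torch.max call concrete: with a dimension argument it returns both the maximum values and their indices, and the indices are the predicted classes. A small sketch with made-up energies:

```python
import torch

# Made-up energies for 2 images and 4 classes.
outputs = torch.tensor([[0.1, 2.3, -0.7, 0.4],
                        [1.5, 0.2, 0.9, 3.1]])

values, predicted = torch.max(outputs, 1)   # maximum over the class dimension
probs = torch.softmax(outputs, dim=1)       # optional: turn energies into probabilities

print(predicted.tolist())                   # [1, 3]
```

Softmax preserves the ordering of the energies, so taking the argmax of the raw outputs or of the probabilities gives the same prediction.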
Let us look at how the network performs on the whole test set:
correct = 0
total = 0
# Since we are not training, we do not need to compute gradients for the outputs
with torch.no_grad():
    for data in testloader:
        images, labels = data
        # Compute the outputs by running the images through the network
        outputs = net(images)
        # The class with the highest energy is chosen as the prediction
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))
out:
Accuracy of the network on the 10000 test images: 54 %
This looks much better than chance, which is 10% accuracy (randomly picking one class out of 10). It seems the network has learned something.
Hmmm, which classes performed well, and which did not?
# Prepare counters for each class
correct_pred = {classname: 0 for classname in classes}
total_pred = {classname: 0 for classname in classes}
# Again, no gradients are needed
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predictions = torch.max(outputs, 1)
        # Collect the correct predictions for each class
        for label, prediction in zip(labels, predictions):
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1
# Print the accuracy for each class
for classname, correct_count in correct_pred.items():
    accuracy = 100 * float(correct_count) / total_pred[classname]
    print("Accuracy for class {:5s} is: {:.1f} %".format(classname, accuracy))
out:
Accuracy for class plane is: 60.1 %
Accuracy for class car is: 69.8 %
Accuracy for class bird is: 45.2 %
Accuracy for class cat is: 26.4 %
Accuracy for class deer is: 30.5 %
Accuracy for class dog is: 60.4 %
Accuracy for class frog is: 70.7 %
Accuracy for class horse is: 69.5 %
Accuracy for class ship is: 53.3 %
Accuracy for class truck is: 61.0 %
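The per-sample Python loop above is easy to follow. As an alternative sketch (not the tutorial's method), the same counts can be obtained with torch.bincount once labels and predictions are available as tensors. The data below is made up and uses 3 classes for brevity.

```python
import torch

num_classes = 3
# Made-up labels and predictions standing in for the collected test-set results.
labels = torch.tensor([0, 0, 1, 1, 1, 2, 2, 2, 2])
predictions = torch.tensor([0, 1, 1, 1, 0, 2, 2, 0, 2])

# Number of samples per class, and number of correctly predicted samples per class.
total_per_class = torch.bincount(labels, minlength=num_classes)
correct_per_class = torch.bincount(labels[labels == predictions], minlength=num_classes)

# Assumes every class occurs at least once; otherwise this divides by zero.
accuracy = 100.0 * correct_per_class.float() / total_per_class.float()
print(accuracy.tolist())
```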
Train on GPU
Just like you transfer a Tensor onto the GPU, you transfer the neural network onto the GPU.
Let's first define our device as the first visible CUDA device, if CUDA is available:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Assuming we are on a CUDA machine, this should print a CUDA device
print(device)
out:
cuda:0
Assuming device is a CUDA device, the .to(device) call below recursively goes over all modules and converts their parameters and buffers to CUDA tensors:
net.to(device)
Remember that you also have to send the inputs and targets to the GPU at every step:
inputs, labels = data[0].to(device), data[1].to(device)
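Put together, one training step on the selected device could look like the sketch below. The tiny linear model and the random batch are hypothetical stand-ins for `net` and a batch from `trainloader`, so that the example runs on its own; it falls back to the CPU when CUDA is not available.

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in model and random batch shaped like a CIFAR10 batch of 4
# (hypothetical; in the tutorial these are `net` and a batch from `trainloader`).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
data = [torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))]

# Move the batch to the same device as the model before the forward pass.
inputs, labels = data[0].to(device), data[1].to(device)

optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()

print(outputs.device, loss.item())
```

The model and every batch must live on the same device; mixing CPU and CUDA tensors in one operation raises a runtime error.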
Why don't we notice a MASSIVE speedup compared to the CPU? Because the network is really small.