Getting Started with PyTorch

This article is a translation of the official tutorial "Deep Learning with PyTorch: A 60 Minute Blitz", a quick introduction to getting started with PyTorch.

1. What is PyTorch

PyTorch is a Python-based scientific computing library aimed at two groups of users:

Those who want a replacement for NumPy that can take advantage of the power of GPUs;
Those who want a deep learning research platform that offers more flexibility and speed.

1.1 Installation

http://t.csdn.cn/AjNuh

1.2 Tensors

A major use of PyTorch is as a replacement for NumPy, so we start with Tensors. Tensors are similar to NumPy's multidimensional arrays (ndarrays); the difference is that Tensors can also be placed on a GPU to speed up computation.

First import the necessary library, torch:

from __future__ import print_function
import torch

1.2.1 Declarations and definitions

The first topic is how to declare and define Tensors. The main methods are as follows:

torch.empty() : declares an uninitialized matrix. Its values are whatever happened to be in the allocated memory.

# Create a 5*3 matrix (uninitialized)
x = torch.empty(5, 3)
print(x)

The output is as follows:

tensor([[9.2737e-41, 8.9074e-01, 1.9286e-37],
        [1.7228e-34, 5.7064e+01, 9.2737e-41],
        [2.2803e+02, 1.9288e-37, 1.7228e-34],
        [1.4609e+04, 9.2737e-41, 5.8375e+04],
        [1.9290e-37, 1.7228e-34, 3.7402e+06]])

torch.rand() : creates a randomly initialized matrix, with values drawn uniformly from [0, 1).

# Create a randomly initialized 5*3 matrix
rand_x = torch.rand(5, 3)
print(rand_x)

Output result:

tensor([[0.4311, 0.2798, 0.8444],
        [0.0829, 0.9029, 0.8463],
        [0.7139, 0.4225, 0.5623],
        [0.7642, 0.0329, 0.8816],
        [1.0000, 0.9830, 0.9256]])

torch.zeros() : creates a matrix whose values are all 0.

# Create a matrix of zeros with dtype long
zero_x = torch.zeros(5, 3, dtype=torch.long)
print(zero_x)

The output is as follows:

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

Similarly, torch.ones() creates a matrix whose values are all 1.

torch.tensor() : creates a tensor directly from the given values.

# The tensor values are [5.5, 3]
tensor1 = torch.tensor([5.5, 3])
print(tensor1)

Output result:

tensor([5.5000, 3.0000])

In addition to the methods above, you can create new tensors based on existing ones. The new tensor reuses properties of the existing tensor, such as its dtype (and, for the *_like functions, its size), unless you override them. The corresponding methods are as follows:

tensor.new_ones() : the new_*() methods take the size of the new tensor as input.

# Explicitly set the new size to 5*3 and the dtype to torch.double
tensor2 = tensor1.new_ones(5, 3, dtype=torch.double)  # new_* methods require the tensor size
print(tensor2)

Output result:

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)

torch.randn_like(old_tensor) : keeps the same size as the input tensor.

# Override the dtype
tensor3 = torch.randn_like(tensor2, dtype=torch.float)
print('tensor3: ', tensor3)

The output is shown below. tensor3 is created from tensor2, which was declared in the previous step. The size is still 5*3, but the dtype has changed.

tensor3:  tensor([[-0.4491, -0.2634, -0.0040],
        [-0.1624,  0.4475, -0.8407],
        [-0.6539, -1.2772,  0.6060],
        [ 0.2304,  0.0879, -0.3876],
        [ 1.2900, -0.7475, -1.8212]])

Finally, tensor.size() returns the size of a tensor:

print(tensor3.size())
# Output: torch.Size([5, 3])

Note : torch.Size is actually a tuple, so it supports all tuple operations.
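As a small illustration of the note above, the following sketch treats the result of size() like any other tuple. The variable names are chosen for this example only.

```python
import torch

t = torch.zeros(5, 3)
size = t.size()

# torch.Size is a subclass of tuple, so unpacking, indexing and len() all work
rows, cols = size
print(rows, cols)               # 5 3
print(size[0], len(size))       # 5 2
print(isinstance(size, tuple))  # True
```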

1.2.2 Operations

There are many operations on tensors, often with several syntaxes each. This section gives only a quick introduction, using addition as the example. For other operations, including transposing, indexing, slicing, mathematical operations, linear algebra, and random numbers, see the official documentation:

https://pytorch.org/docs/stable/torch.html

Addition can be written in several ways:

+ operator
torch.add(tensor1, tensor2, out=tensor3) : the out argument is optional
tensor1.add_(tensor2) : modifies tensor1 in place

tensor4 = torch.rand(5, 3)
print('tensor3 + tensor4= ', tensor3 + tensor4)
print('tensor3 + tensor4= ', torch.add(tensor3, tensor4))
# Declare a new tensor to hold the result of the addition
result = torch.empty(5, 3)
torch.add(tensor3, tensor4, out=result)
print('add result= ', result)
# Modify the variable in place
tensor3.add_(tensor4)
print('tensor3= ', tensor3)

Output result:

tensor3 + tensor4=  tensor([[ 0.1000,  0.1325,  0.0461],
        [ 0.4731,  0.4523, -0.7517],
        [ 0.2995, -0.9576,  1.4906],
        [ 1.0461,  0.7557, -0.0187],
        [ 2.2446, -0.3473, -1.0873]])

tensor3 + tensor4=  tensor([[ 0.1000,  0.1325,  0.0461],
        [ 0.4731,  0.4523, -0.7517],
        [ 0.2995, -0.9576,  1.4906],
        [ 1.0461,  0.7557, -0.0187],
        [ 2.2446, -0.3473, -1.0873]])

add result=  tensor([[ 0.1000,  0.1325,  0.0461],
        [ 0.4731,  0.4523, -0.7517],
        [ 0.2995, -0.9576,  1.4906],
        [ 1.0461,  0.7557, -0.0187],
        [ 2.2446, -0.3473, -1.0873]])

tensor3=  tensor([[ 0.1000,  0.1325,  0.0461],
        [ 0.4731,  0.4523, -0.7517],
        [ 0.2995, -0.9576,  1.4906],
        [ 1.0461,  0.7557, -0.0187],
        [ 2.2446, -0.3473, -1.0873]])

Note : any operation that modifies a tensor in place has the suffix _ . For example, x.copy_(y) and x.t_() both change x.
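The difference between the in-place and the out-of-place form can be seen in a short sketch. The tensors here are made up for the illustration.

```python
import torch

x = torch.zeros(2, 3)
y = torch.ones(2, 3)

x.copy_(y)      # in place: x now holds the values of y
x.t_()          # in place: x is transposed, its shape becomes (3, 2)
x.add_(1)       # in place: every element of x is increased by 1

z = x.add(1)    # no underscore: returns a new tensor, x is unchanged
print(x.size()) # torch.Size([3, 2])
print(x)
print(z)
```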

Besides arithmetic, Tensors can be indexed much like NumPy arrays. For example, you can use an index to access the data along a given dimension:

# Access the first column of tensor3
print(tensor3[:, 0])

Output result:

tensor([0.1000, 0.4731, 0.2995, 1.0461, 2.2446])

To change the shape of a Tensor, use tensor.view() as follows:

x = torch.randn(4, 4)
y = x.view(16)
# -1 means this dimension is inferred from the other dimensions
z = x.view(-1, 8)
print(x.size(), y.size(), z.size())

Output result:

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
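One property worth knowing is that view() does not copy: the returned tensor shares its data with the original. The following is a minimal sketch of that behavior, with values chosen only for the illustration.

```python
import torch

x = torch.arange(16.0).view(4, 4)
y = x.view(16)

# y is a view of x, so writing through y is visible in x
y[0] = 100.0
print(x[0, 0])

# The total number of elements must match, otherwise view() raises an error
try:
    x.view(5, 5)
    view_failed = False
except RuntimeError as err:
    view_failed = True
    print('view failed:', err)
```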

If a tensor has only one element, .item() returns its value as a Python number:

x = torch.randn(1)
print(x)
print(x.item())

Output result:

tensor([0.4549])
0.4549027979373932

For more operations, see the official documentation:

https://pytorch.org/docs/stable/torch.html

1.3 Conversion with NumPy arrays

Tensors and NumPy arrays can be converted to each other. When the Tensor is on the CPU, the two share the same underlying memory after conversion, so changing one changes the other.

1.3.1 Convert Tensor to NumPy array

Calling tensor.numpy() converts a Tensor to a NumPy array, as shown below.

a = torch.ones(5)
print(a)
b = a.numpy()
print(b)

Output result:

tensor([1., 1., 1., 1., 1.])
[1. 1. 1. 1. 1.]

As mentioned above, the two share the same memory. In the example below, we modify the tensor a and check whether the converted NumPy array b changes as well.

a.add_(1)
print(a)
print(b)

The output is as follows. Clearly, b changes along with a.

tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]

1.3.2 Convert NumPy array to Tensor

The conversion is done by calling torch.from_numpy(numpy_array). An example is as follows:

import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

Output result:

[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
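When the shared memory is not wanted, one option is to copy the data instead. The sketch below contrasts torch.from_numpy(), which shares memory, with torch.tensor(), which copies its input. The variable names are illustrative.

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)   # shares memory with a
copied = torch.tensor(a)       # copies the data of a

a += 1                         # modify the NumPy array in place
print(shared)                  # follows the change
print(copied)                  # keeps the original values
```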

All Tensors on the CPU, except CharTensor, support conversion to NumPy arrays and back.

1.4 CUDA tensors

Tensors can be moved to a different device, CPU or GPU, with the .to() method. An example is as follows:

# Run this block only when CUDA is available. torch.device() objects are used to move tensors onto and off the GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # create a tensor directly on the GPU
    x = x.to(device)                       # or simply use .to("cuda")
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # .to() can also change the dtype

The output is shown below. The first result is on the GPU, so device='cuda:0' is included when the variable is printed; the second is the variable on the CPU.

tensor([1.4549], device='cuda:0')
tensor([1.4549], dtype=torch.float64)
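The example above only runs when CUDA is available. A common variation, sketched here under the assumption that falling back to the CPU is acceptable, selects the device once and uses it everywhere, so the same code runs on either kind of machine.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(3)
y = torch.ones_like(x, device=device)  # created directly on the chosen device
x = x.to(device)                       # moved to the chosen device
z = x + y

print(z.device)
z_cpu = z.to("cpu", torch.double)      # back on the CPU, as float64
print(z_cpu)
```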

Tutorial for this section:

https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html

Code for this section:

https://github.com/ccc013/DeepLearning_Notes/blob/master/Pytorch/practise/basic_practise.ipynb

2. autograd

The autograd library is central to neural networks in PyTorch. It provides automatic differentiation, that is, gradient computation, for all operations on Tensors. It is a define-by-run framework: backpropagation is defined by how the code runs, so every iteration can be different.

Next, some examples will briefly illustrate what this library does.

2.1 Tensors

torch.Tensor is the central class of PyTorch. When its attribute .requires_grad is set to True, it starts to track all operations on the tensor. After the computation is finished, you can call .backward() to have all the gradients computed automatically. The gradients are accumulated into the .grad attribute.

Calling .detach() detaches a tensor from its computation history, which stops the tensor from tracking that history and prevents future computations from being tracked.
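The effect of .detach() can be seen in a few lines. This is a minimal sketch with made-up tensors, not part of the original tutorial code.

```python
import torch

x = torch.ones(2, requires_grad=True)
y = x * 2            # tracked: y has a grad_fn
d = y.detach()       # same values, but cut off from the history

print(y.requires_grad, y.grad_fn is not None)  # True True
print(d.requires_grad, d.grad_fn)              # False None

w = d * 3            # computations on d are not tracked either
print(w.requires_grad)                         # False
```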

To prevent history tracking (and the memory it uses), you can also wrap a code block in with torch.no_grad(): . This is very useful when evaluating a model, because the model may contain trainable parameters with requires_grad=True whose gradients are not actually needed.

Another class is also very important to the implementation of autograd: Function .

Tensor and Function are interconnected and build an acyclic graph that encodes the complete history of the computation. Every tensor has a .grad_fn attribute that references the Function that created it (except for Tensors created by the user, whose grad_fn is None ).

To compute derivatives, call .backward() on a Tensor. If the Tensor is a scalar, that is, it holds only one element, no argument needs to be passed to .backward(). If it holds multiple elements, you must pass a gradient argument, which is a tensor of matching shape. This is covered in section 2.2 on gradients.

Next, we illustrate these ideas with code.

First import the necessary libraries:

import torch

Create a tensor and set requires_grad=True to track the computations on it:

x = torch.ones(2, 2, requires_grad=True)
print(x)

Output result:

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

Perform any operation on it; here, a simple addition:

y = x + 2
print(y)

Output result:

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward>)

y was created as the result of an operation, so it has a grad_fn attribute:

print(y.grad_fn)

Output result:

<AddBackward object at 0x00000216D25DCC88>

Perform more operations on y :

z = y * y * 3
out = z.mean()

print('z=', z)
print('out=', out)

Output result:

z= tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward>)

out= tensor(27., grad_fn=<MeanBackward1>)

The requires_grad flag of a Tensor defaults to False . You can set it to True when defining the tensor, as above, or define the tensor first and then call .requires_grad_(True) . The suffix _ means the method changes the tensor itself in place, as explained for add_() in the previous section. Here is a code example:

a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

The output is as follows. The first line shows that requires_grad is False by default. After the explicit call to .requires_grad_(True) , the output is True .

False

True

<SumBackward0 object at 0x00000216D25ED710>

2.2 Gradient

The next step is to compute gradients by backpropagation. out was defined in the previous section and is a scalar, so out.backward() is equivalent to out.backward(torch.tensor(1.)) . The code is as follows:

out.backward()
# Print the gradient d(out)/dx
print(x.grad)

Output result:

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])

The result is a matrix whose values are all 4.5. Writing o for the out variable, the definitions above give:

$$o = \frac{1}{4}\sum_i z_i, \qquad z_i = 3(x_i + 2)^2, \qquad z_i\big|_{x_i=1} = 27$$

In detail, x is initially a matrix of all 1s; the addition x+2 gives y ; then y*y*3 gives z . Since z is a 2*2 matrix, the mean out is the sum divided by 4, which yields the three formulas above.

So, computing the gradient:

$$\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i + 2), \qquad \frac{\partial o}{\partial x_i}\bigg|_{x_i=1} = \frac{9}{2} = 4.5$$
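The value 4.5 can also be checked numerically. Differentiating out = mean(3 * (x + 2)^2) over the four elements by hand gives d(out)/dx_i = 1.5 * (x_i + 2), which is 4.5 at x_i = 1. The sketch below compares this hand-derived expression with what autograd computes; the name analytic is introduced only for this example.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()
out.backward()

# Hand-derived gradient: d(out)/dx_i = (1/4) * 6 * (x_i + 2) = 1.5 * (x_i + 2)
analytic = 1.5 * (x.detach() + 2)

print(x.grad)
print(torch.allclose(x.grad, analytic))  # True
```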
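Mathematically, if you have a vector-valued function:

$$\vec{y} = f(\vec{x})$$

Then the gradient of $\vec{y}$ with respect to $\vec{x}$ is a Jacobian matrix:

$$J = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n} \end{pmatrix}$$

In general, torch.autograd is an engine for computing vector-Jacobian products. We skip the remaining mathematics here and go directly to a code example: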

x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)

Output result:

tensor([ 237.5009, 1774.2396,  274.0625], grad_fn=<MulBackward>)

Here y is no longer a scalar, so torch.autograd cannot compute the full Jacobian directly. If we just want the vector-Jacobian product, we can pass the vector to backward() as an argument, as shown below:

v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)

print(x.grad)

Output result:

tensor([ 102.4000, 1024.0000,    0.1024])
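To make the vector-Jacobian product concrete, the sketch below uses a simpler function, f(t) = 2t, whose Jacobian is 2 times the identity matrix. It builds the full Jacobian with torch.autograd.functional.jacobian (assumed to be available, as it is in recent PyTorch releases) and compares v multiplied by J with the result of backward(v). The function f and the vector v are chosen for illustration.

```python
import torch
from torch.autograd.functional import jacobian

def f(t):
    return t * 2

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001])

# backward(v) accumulates the vector-Jacobian product v^T J into x.grad
y = f(x)
y.backward(v)

# For comparison, build the full 3x3 Jacobian explicitly and multiply by v
J = jacobian(f, x.detach())
print(J)
print(torch.allclose(x.grad, v @ J))  # True
```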

Finally, wrapping code in with torch.no_grad(): stops autograd from tracking history on tensors:

print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

Output result:

True

True

False

More about autograd and Function :

https://pytorch.org/docs/autograd

Tutorial for this section:

https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html

Code for this section:

https://github.com/ccc013/DeepLearning_Notes/blob/master/Pytorch/practise/autograd.ipynb

3. Neural network

torch.nn is the PyTorch package dedicated to building neural networks. An nn.Module contains the layers of the network and a method forward(input) that returns the output of the network.

Below is a classic LeNet network for classifying characters.

For a neural network, a standard training process looks like this:

Define a multilayer neural network
Preprocess the dataset and prepare it as input to the network
Feed the input data to the network
Calculate the loss of the network
Backpropagate to compute the gradients
Update the weights of the network; a simple update rule is weight = weight - learning_rate * gradient
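The steps above can be sketched end to end on a toy problem. This is only an illustration under simple assumptions: the data (y = 3x + 1), the single nn.Linear layer, and the learning rate are all made up for the example and are not the network used later in this section.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data for this sketch only: learn y = 3x + 1
inputs = torch.linspace(-1, 1, 20).unsqueeze(1)
targets = 3 * inputs + 1

net = nn.Linear(1, 1)                  # define the network
criterion = nn.MSELoss()
learning_rate = 0.1

losses = []
for step in range(200):
    output = net(inputs)               # feed the input to the network
    loss = criterion(output, targets)  # compute the loss
    net.zero_grad()                    # clear old gradients
    loss.backward()                    # backpropagate
    with torch.no_grad():              # weight = weight - learning_rate * gradient
        for p in net.parameters():
            p -= learning_rate * p.grad
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The loss printed last should be far smaller than the first one, which is the only thing this sketch is meant to show.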

3.1 Define the network

First define a neural network. The following is a 5-layer convolutional neural network with two convolutional layers and three fully connected layers:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # The input image has 1 channel; conv1 has kernel size 5*5 and 6 output channels
        self.conv1 = nn.Conv2d(1, 6, 5)
        # conv2 has kernel size 5*5 and 16 output channels
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Fully connected layers
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, a single number is enough: 2 instead of (2, 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        # All dimensions except the batch dimension
        size = x.size()[1:]
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)

Print the network structure:

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
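The value in_features=400 of fc1 follows from the input size. Assuming a 32*32 input, no padding, stride 1 for the convolutions, and a pooling stride equal to the window size, the spatial size can be traced with a little arithmetic. The helper conv_output_size is hypothetical and written only for this sketch.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                          # input height and width
size = conv_output_size(size, kernel=5)            # conv1: 32 -> 28
size = conv_output_size(size, kernel=2, stride=2)  # max pool: 28 -> 14
size = conv_output_size(size, kernel=5)            # conv2: 14 -> 10
size = conv_output_size(size, kernel=2, stride=2)  # max pool: 10 -> 5

flat_features = 16 * size * size                   # 16 channels of 5*5
print(size, flat_features)  # 5 400
```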

Only the forward function has to be implemented here; the backward function is defined automatically by autograd . Any tensor operation can be used in the forward method.

net.parameters() returns the trainable parameters of the network. A usage example is as follows:

params = list(net.parameters())
print('Number of parameters: ', len(params))
# conv1.weight
print('Size of the first parameter: ', params[0].size())

Output:

Number of parameters:  10
Size of the first parameter:  torch.Size([6, 1, 5, 5])

Then run a simple test of the network with a randomly generated 32*32 input:

# Define a random input for the network
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

Output result:

tensor([[ 0.1005,  0.0263,  0.0013, -0.1157, -0.1197, -0.0141,  0.1425, -0.0521,
          0.0689,  0.0220]], grad_fn=<ThAddmmBackward>)

Next, clear the gradient buffers of all parameters and backpropagate with a random gradient:

# Zero the gradient buffers of all parameters, then backpropagate a random gradient
net.zero_grad()
out.backward(torch.randn(1, 10))

Note :

torch.nn only supports mini-batches, that is, the input cannot be a single sample. For example, nn.Conv2d takes a 4-dimensional tensor as input: nSamples * nChannels * Height * Width .
So, if you have a single sample, use input.unsqueeze(0) to add a fake batch dimension, changing it from 3 dimensions to 4.
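A short sketch of adding the batch dimension is shown here. The layer and the random sample are stand-ins for the illustration.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 5)

sample = torch.randn(1, 32, 32)   # a single image: nChannels * Height * Width
batch = sample.unsqueeze(0)       # add a batch dimension at position 0

print(sample.size())              # torch.Size([1, 32, 32])
print(batch.size())               # torch.Size([1, 1, 32, 32])

out = conv(batch)
print(out.size())                 # torch.Size([1, 6, 28, 28])
```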

3.2 Loss function

A loss function takes the pair (output, target) , that is, the network output and the true label, and returns a value that measures how far the output is from the target.

PyTorch already defines many loss functions. Here we use only a simple mean squared error, nn.MSELoss . An example is as follows:

output = net(input)
# Define a dummy target
target = torch.randn(10)
# Reshape it to the same size as output
target = target.view(1, -1)
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

The output is as follows:

tensor(0.6524, grad_fn=<MseLossBackward>)
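To see what nn.MSELoss computes, the sketch below reproduces it by hand on random stand-in tensors. With its default settings, the loss is the mean of the squared element-wise differences.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
output = torch.randn(1, 10)   # stands in for the network output
target = torch.randn(1, 10)   # dummy target of the same shape

criterion = nn.MSELoss()
loss = criterion(output, target)

# With the default reduction, MSELoss is the mean of the squared differences
manual = ((output - target) ** 2).mean()
print(loss, manual)
```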

The computation graph that the data passes through from input to loss, across the entire network, is shown below:

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss

When loss.backward() is called, the whole graph is differentiated with respect to the loss, and every tensor in the graph that has requires_grad=True will have its .grad tensor accumulated with the gradient.

To illustrate with code:

# MSELoss
print(loss.grad_fn)
# Linear layer
print(loss.grad_fn.next_functions[0][0])
# Relu
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])

Output:

<MseLossBackward object at 0x0000019C0C349908>

<ThAddmmBackward object at 0x0000019C0C365A58>

<ExpandBackward object at 0x0000019C0C3659E8>

3.3 Backpropagation

To backpropagate, you only need to call loss.backward() . The existing gradients must be cleared first by calling .zero_grad() ; otherwise the new gradients are added to the previous ones, which affects the update of the weights.

The following simple example shows the gradient of the bias of the conv1 layer before and after backpropagation:

# Zero the gradient buffers of all parameters
net.zero_grad()
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

Output result:

conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])

conv1.bias.grad after backward
tensor([ 0.0069,  0.0021,  0.0090, -0.0060, -0.0008, -0.0073])
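The accumulation behavior that makes zeroing necessary can be shown on a single scalar. In this sketch, w is a made-up parameter and the loss is simply 3 * w, so each backward pass contributes a gradient of 3.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()      # 3.0

(3 * w).backward()         # no reset in between: the new gradient is added
second = w.grad.item()     # 6.0

w.grad.zero_()             # reset the buffer, as zero_grad() does for a network
(3 * w).backward()
third = w.grad.item()      # 3.0

print(first, second, third)
```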

To learn more about the torch.nn library, see the official documentation:

https://pytorch.org/docs/stable/nn.html

3.4 Updating weights

The simplest weight update rule is the one used by Stochastic Gradient Descent (SGD):

weight = weight - learning_rate * gradient

Following this rule, the code is as follows:

# A simple implementation of the weight update
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)

But this is only the simplest rule. Deep learning uses many optimization algorithms besides SGD, such as Nesterov-SGD, Adam, and RMSProp. The torch.optim package implements these methods. A usage example is as follows:

import torch.optim as optim
# Create the optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# Do the following in the training loop
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
# Update the weights
optimizer.step()
optimizer.step()

Note that optimizer.zero_grad() must still be called to clear the gradient buffers, because gradients accumulate across calls to backward() .
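For plain SGD without momentum, optimizer.step() applies the same rule as the manual update, weight = weight - learning_rate * gradient. The sketch below checks this on a made-up two-element parameter w with the loss sum(w^2).

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

optimizer.zero_grad()
loss = (w ** 2).sum()
loss.backward()                        # w.grad = 2 * w = [2., -4.]

expected = w.detach() - 0.1 * w.grad   # weight - learning_rate * gradient
optimizer.step()

print(w)
print(torch.allclose(w.detach(), expected))  # True
```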

Tutorial for this section:

https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html

Code for this section:

https://github.com/ccc013/DeepLearning_Notes/blob/master/Pytorch/practise/neural_network.ipynb

4. Train the classifier

The previous section showed how to build a neural network, compute the loss, and update the network's weights. The next step is to implement an image classifier.

4.1 Training Data

Before training a classifier, we need to consider the data. Generally, when dealing with image, text, audio, or video data, you can use standard Python packages to load it into NumPy arrays and then convert the arrays into PyTorch tensors.

For images, the Pillow and OpenCV libraries can be used;
For audio, there are scipy and librosa;
For text, you can load the data with plain Python or Cython, or use NLTK and SpaCy .

For computer vision, PyTorch provides a dedicated library, torchvision , which contains data loaders for common datasets such as ImageNet, CIFAR10, and MNIST, as well as data transformers for images. The relevant modules are torchvision.datasets and torch.utils.data.DataLoader .

In this tutorial, we use the CIFAR10 dataset, which contains 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The images in the dataset are all of size 3x32x32. Some examples are as follows:

4.2 Training Image Classifier

The training process is as follows:

Load and normalize the CIFAR10 training and test sets using torchvision ;
Build a Convolutional Neural Network;
Define a loss function;
Train the network on the training set;
Test the network performance on the test set.

4.2.1 Loading and normalizing CIFAR10

First import the necessary packages:

import torch
import torchvision
import torchvision.transforms as transforms

The torchvision datasets output PILImage images with values in the range [0, 1] . We transform them into Tensors normalized to the range [-1, 1] . The code is as follows:

# Normalize the image data from [0, 1] to the range [-1, 1]
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
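Normalizing with mean 0.5 and standard deviation 0.5 computes (x - 0.5) / 0.5 per channel, which maps [0, 1] onto [-1, 1]. The sketch below reproduces that arithmetic with plain tensor operations on a made-up 3*4*4 image, so it runs without torchvision or the dataset, and also shows the inverse mapping x / 2 + 0.5.

```python
import torch

# Stand-in for a ToTensor() result: 3 channels with values in [0, 1]
img = torch.linspace(0, 1, 3 * 4 * 4).view(3, 4, 4)

mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

normalized = (img - mean) / std       # per channel: (x - 0.5) / 0.5
print(normalized.min(), normalized.max())

restored = normalized / 2 + 0.5       # inverse mapping back to [0, 1]
print(torch.allclose(restored, img))  # True
```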

With the data downloaded, we can visualize some training images. The code is as follows:

import matplotlib.pyplot as plt
import numpy as np

# Function to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# Get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# Show the images
imshow(torchvision.utils.make_grid(images))
# Print the class labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

The displayed images are as follows:

Their class labels are:

 frog plane   dog  ship

4.2.2 Building a Convolutional Neural Network

This part can reuse the network defined in the previous section directly. The only change is the number of input channels of conv1 , from 1 to 3, because this time the network receives 3-channel color images.

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

4.2.3 Define loss function and optimizer

Here we use the classification cross-entropy loss and SGD with momentum:

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
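nn.CrossEntropyLoss expects raw, unnormalized scores (logits) and integer class indices, which is why the network ends with a plain linear layer and no softmax. As a hedged sketch on random stand-in data, the loss can be reproduced with a log-softmax followed by the negative log-likelihood loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw scores for a batch of 4 images
labels = torch.tensor([3, 8, 8, 0])   # class indices, one per image

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels)

# Equivalent computation: log-softmax followed by negative log-likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(loss, manual)
```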

4.2.4 Training Network

The fourth step is to train the network. We specify the number of epochs, loop over the input data, and print information about the network every fixed number of iterations, such as the loss or an evaluation metric like accuracy.

import time
start = time.time()
for epoch in range(2):

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # Get the input data
        inputs, labels = data
        # Zero the gradient buffers
        optimizer.zero_grad()

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # Print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:
            # Print once every 2000 iterations
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i+1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training! Total cost time: ', time.time()-start)

Here we train for a total of 2 epochs. The training log is as follows; training took about 77 seconds.

[1,  2000] loss: 2.226
[1,  4000] loss: 1.897
[1,  6000] loss: 1.725
[1,  8000] loss: 1.617
[1, 10000] loss: 1.524
[1, 12000] loss: 1.489
[2,  2000] loss: 1.407
[2,  4000] loss: 1.376
[2,  6000] loss: 1.354
[2,  8000] loss: 1.347
[2, 10000] loss: 1.324
[2, 12000] loss: 1.311

Finished Training! Total cost time:  77.24696755409241

4.2.5 Test model performance

After training a network model, it needs to be tested with a test set to check the generalization ability of the network model. For image classification tasks, accuracy is generally used as the evaluation criterion.

First, let's run a small test on one batch of images. Here batch_size=4 , that is, 4 images. The code is as follows:

dataiter = iter(testloader)
images, labels = next(dataiter)

# Show the images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

The images and labels are as follows:

GroundTruth:    cat  ship  ship plane

Then feed these four images to the network and look at its predictions:

# Network outputs
outputs = net(images)

# Predicted classes
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))

The output is:

Predicted:    cat  ship  ship  ship

The first three images are predicted correctly; for the fourth, the network incorrectly predicts ship instead of plane.
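The call torch.max(outputs, 1) used above returns two tensors: the maximum value in each row and the index at which it occurs. The index is the predicted class. A minimal sketch with made-up scores for 2 images and 3 classes:

```python
import torch

outputs = torch.tensor([[0.1, 2.0, -1.0],
                        [3.0, 0.5, 0.2]])

# Reduce over dimension 1 (the classes): one result per row
values, predicted = torch.max(outputs, 1)
print(values)     # tensor([2., 3.])
print(predicted)  # tensor([1, 0])
```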

Next, let's see what accuracy the network achieves on the entire test set:

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))

The output is as follows

Accuracy of the network on the 10000 test images: 55 %

Your accuracy may differ. The official tutorial reports 51%; because the weights are initialized randomly, the results fluctuate somewhat. Compared with randomly guessing among 10 classes (that is, 10%), this result is good. In practical terms it is of course still poor, but we used only a 5-layer network, and it serves only as sample code for the tutorial.

We can go a step further and check the classification accuracy of each class. The difference from the code above lies in how correct predictions are counted: c = (predicted == labels).squeeze() yields 1 (True) or 0 (False) for each image, depending on whether the predicted and true labels are equal. To count the correct predictions for a class, we simply add these values: a correct prediction adds 1, and a wrong one adds 0, leaving the count unchanged.

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (classes[i], 100 * class_correct[i] / class_total[i]))

The output shows that cat, deer, and bird are the three classes with the lowest accuracy, while ship and truck are the most accurate.

Accuracy of plane : 58 %
Accuracy of   car : 59 %
Accuracy of  bird : 40 %
Accuracy of   cat : 33 %
Accuracy of  deer : 39 %
Accuracy of   dog : 60 %
Accuracy of  frog : 54 %
Accuracy of horse : 66 %
Accuracy of  ship : 70 %
Accuracy of truck : 72 %
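The per-class counting can also be written without the inner Python loop. The following sketch is an alternative formulation, not the tutorial's code: it uses torch.bincount on a tiny made-up set of 6 labels and 3 classes to count totals and correct predictions per class.

```python
import torch

num_classes = 3
labels = torch.tensor([0, 1, 2, 2, 2, 0])
predicted = torch.tensor([0, 1, 1, 2, 2, 2])

correct = predicted == labels
class_total = torch.bincount(labels, minlength=num_classes)
class_correct = torch.bincount(labels[correct], minlength=num_classes)

accuracy = class_correct.float() / class_total.float()
print(class_total)    # tensor([2, 1, 3])
print(class_correct)  # tensor([1, 1, 2])
print(accuracy)
```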

4.3 Training on GPU

Deep learning usually relies on GPUs to speed up training, so this section shows how to train on the GPU.

First, check whether a GPU is available for training. The code is as follows:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

The output is as follows. It indicates that your first (or only) GPU is available; otherwise cpu would be printed.

cuda:0

Now that a GPU is available, the next step is to train on it. The code that needs to change is shown below: both the network parameters and the data must be moved to the GPU:

net.to(device)
inputs, labels = inputs.to(device), labels.to(device)
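
A quick way to confirm that the network and the data ended up on the same device is to compare their device attributes. The sketch below is an illustration only: toy_net is a stand-in single-layer network, not the classifier from this tutorial, and the code falls back to the CPU when no GPU is present.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in network and input batch (assumed shapes, for illustration only).
toy_net = nn.Linear(4, 2).to(device)
inputs = torch.randn(3, 4).to(device)

# Parameters and inputs must live on the same device for the forward pass.
param_device = next(toy_net.parameters()).device
print("parameters on:", param_device, "| inputs on:", inputs.device)

outputs = toy_net(inputs)
print("output size:", outputs.size())
```

If the inputs were left on the CPU while the network is on the GPU, the forward pass would raise a device-mismatch error.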

The modified training code:

import time
# To train on the GPU, both the network and the data must be moved to the GPU
net.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

start = time.time()
for epoch in range(2):

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # Get the input data
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        # Clear the gradient buffers
        optimizer.zero_grad()

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # Print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:
            # Print once every 2000 iterations
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i+1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training! Total cost time: ', time.time() - start)

Note that the optimizer should be defined after the call to net.to(device), so that the parameters passed to it are the CUDA tensors. The training results are similar to the earlier ones. Because the network is very small, moving to the GPU brings little speedup; in my case training was actually slower, which may be due to my laptop's GPU.

If you need to speed things up further, consider using multiple GPUs, which is the topic of the next section.
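
When comparing CPU and GPU timings, keep in mind that CUDA operations run asynchronously, so a wall-clock measurement is only meaningful if the GPU is synchronized before the clock is read. The following is a minimal sketch under that assumption; the helper name timed and the matrix-multiplication workload are made up for illustration and are not part of the tutorial.

```python
import time

import torch


def timed(fn, device):
    """Run fn() and return (result, elapsed seconds). Hypothetical helper."""
    if device.type == "cuda":
        torch.cuda.synchronize(device)  # finish pending GPU work first
    start = time.perf_counter()
    result = fn()
    if device.type == "cuda":
        torch.cuda.synchronize(device)  # wait until fn's GPU work is done
    return result, time.perf_counter() - start


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
a = torch.randn(256, 256, device=device)

product, seconds = timed(lambda: a @ a, device)
print("matmul on %s took %.6f s" % (device, seconds))
```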

Tutorial for this section:

https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html

Code for this section:

https://github.com/ccc013/DeepLearning_Notes/blob/master/Pytorch/practise/train_classifier_example.ipynb

5. Data Parallelism

In this part of the tutorial you will learn how to use DataParallel to train a network on multiple GPUs.

First, putting a model on a GPU is very simple: as the following code shows, define a device object and then use the .to() method to move the model parameters to the specified GPU.

device = torch.device("cuda:0")
model.to(device)

The next step is to put all the tensor variables on the GPU:

mytensor = my_tensor.to(device)

Note that my_tensor.to(device) returns a new copy of my_tensor on the GPU rather than modifying my_tensor in place, so you need to assign the result to a new variable and use that tensor from then on.

By default Pytorch uses only one GPU, so DataParallel is needed to run on multiple GPUs. The code is as follows:
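
The difference between moving a tensor and moving a model can be checked directly. This is an illustrative sketch that also runs on a CPU-only machine; the variable names moved and same_model are arbitrary. One caveat: when a tensor is already on the target device with the right dtype, .to() returns the tensor itself instead of a copy.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Tensor.to() returns a tensor on the target device; the original stays put.
my_tensor = torch.zeros(2, 2)
moved = my_tensor.to(device)
print("original on:", my_tensor.device, "| result on:", moved.device)

# nn.Module.to() moves the parameters in place and returns the same module,
# which is why model.to(device) works without reassignment.
model = nn.Linear(2, 2)
same_model = model.to(device)
print("same module object:", same_model is model)
```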

model = nn.DataParallel(model)

This line is the key to this section and is explained in detail below.

5.1 Imports and parameters

First import the necessary libraries and define some parameters:

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# Parameters and DataLoaders
input_size = 5
output_size = 2

batch_size = 30
data_size = 100

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

This mainly defines the network's input and output sizes, the batch size, and the size of the dataset, along with a device object.

5.2 Building a fake dataset

The next step is to build a fake (random) dataset. The implementation code is as follows:

class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len

rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)
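
With 100 samples and a batch size of 30, the loader yields three full batches and a final batch of 10, which is why the last "Outside" line in the logs later in this section has size 10. The sketch below repeats the dataset definition so that it runs on its own; the variable names loader and batch_sizes are arbitrary.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class RandomDataset(Dataset):
    """Same random dataset as above, repeated to keep the example standalone."""

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


loader = DataLoader(dataset=RandomDataset(5, 100), batch_size=30, shuffle=True)

# Collect the first dimension of every batch the loader produces.
batch_sizes = [batch.size(0) for batch in loader]
print(batch_sizes)  # [30, 30, 30, 10]
```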

5.3 Simple Models

Next, build a simple network model that contains a single fully connected layer, and add a print() call to monitor the sizes of the network's input and output tensors:

class Model(nn.Module):
    # Our model

    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("\tIn Model: input size", input.size(),
              "output size", output.size())

        return output

5.4 Create model and data parallel

This is the core part of this section. First, create a model instance and check whether multiple GPUs are available. If so, wrap the model in nn.DataParallel, then call model.to(device). The code is as follows:

model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
  print("Let's use", torch.cuda.device_count(), "GPUs!")
  # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
  model = nn.DataParallel(model)

model.to(device)

5.5 Running the model

Then you can run the model and see the printed information:

for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("Outside: input size", input.size(),
          "output_size", output.size())

The output is as follows:

        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
        In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
        In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

5.6 Running Results

If there is no GPU or only one, then with a batch of 30 the model receives an input of size 30 and returns an output of size 30. With multiple GPUs, the results look like this:

2 GPUs

# on 2 GPUs
Let's use 2 GPUs!
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
    In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

3 GPUs

Let's use 3 GPUs!
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

8 GPUs

Let's use 8 GPUs!
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
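
The per-GPU sizes in these logs follow the rule of splitting a batch along dimension 0 into nearly equal pieces. The sketch below uses torch.chunk purely as an illustration of that rule (it is not a claim about how DataParallel is implemented internally); the helper name split_sizes is made up. It reproduces the sizes printed above without needing any GPU.

```python
import torch


def split_sizes(batch_size, num_gpus):
    """Sizes of the pieces when a batch is chunked along dim 0 (hypothetical helper)."""
    batch = torch.zeros(batch_size, 5)
    return [chunk.size(0) for chunk in torch.chunk(batch, num_gpus, dim=0)]


print(split_sizes(30, 2))  # [15, 15]
print(split_sizes(30, 3))  # [10, 10, 10]
print(split_sizes(10, 3))  # [4, 4, 2]
print(split_sizes(30, 8))  # [4, 4, 4, 4, 4, 4, 4, 2]
print(split_sizes(10, 8))  # [2, 2, 2, 2, 2]
```

Note that a batch of 10 on 8 GPUs produces only five pieces, so three of the GPUs receive no data for that batch.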

5.7 Summary

DataParallel automatically splits each batch of data and sends the pieces to replicas of the model on multiple GPUs. After every replica finishes its work, DataParallel collects and merges the results and returns them.
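
The split, compute, and merge steps can be imitated on a single device to see why the merged result matches an ordinary forward pass. This is only a CPU emulation written for illustration, assuming a single linear layer as the model; it does not use nn.DataParallel itself.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(5, 2)    # stand-in for the model replicated on each GPU
batch = torch.randn(30, 5)

with torch.no_grad():
    # 1. Split the batch along dim 0 (one piece per imaginary GPU).
    chunks = torch.chunk(batch, 3, dim=0)
    # 2. Run the model on every piece independently.
    partial_outputs = [model(chunk) for chunk in chunks]
    # 3. Merge the partial outputs back along dim 0.
    gathered = torch.cat(partial_outputs, dim=0)
    # Reference: a single forward pass over the whole batch.
    full = model(batch)

print("gathered size:", gathered.size())
print("matches full forward pass:", torch.allclose(gathered, full, atol=1e-5))
```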

A more detailed data parallel tutorial:

https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html

Tutorial for this section:

https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html

Summary

The tutorial starts with the most basic concept, the tensor, then introduces autograd, the very important automatic gradient computation, then shows how to build a neural network and how to train an image classifier, and finally gives a brief introduction to using multiple GPUs to speed up training.