PyTorch Deep Learning: Getting Started in 60 Minutes


Purpose of this tutorial:

  • A higher level understanding of PyTorch's Tensor library and neural networks.
  • Train a small neural network to classify images.

This tutorial assumes that you have a basic familiarity with NumPy.

Note: Make sure you have installed the torch and torchvision packages.

PyTorch is a Python-based scientific computing package that serves two purposes:

  • A replacement for NumPy that can use the power of GPUs
  • A deep learning research platform that provides flexibility and efficiency

Let's get started.

Getting started with PyTorch basics

(1) tensors

Tensors are a specialized data structure, very similar to arrays and matrices. In PyTorch, tensors are used to encode the inputs and outputs of a model.

import torch 
import numpy as np

1. Tensor initialization

# directly from data
data=[[1,2],[3,4]]
x_data=torch.tensor(data)
# from a NumPy array
np_array=np.array(data)
x_np=torch.from_numpy(np_array)
# from another tensor
x_ones=torch.ones_like(x_data)# retains the shape and datatype of x_data
print(f'ones tensor:\n{x_ones}\n')
x_rands=torch.rand_like(x_data,dtype=torch.float)# retains the shape, overrides the datatype
print(f'random tensor:\n{x_rands}\n')
ones tensor:
tensor([[1, 1],
        [1, 1]])

random tensor:
tensor([[0.3272, 0.3049],
        [0.3315, 0.8603]])

shape is a tuple describing the tensor's dimensions

shape=(2,3,)
rand_tensor=torch.rand(shape)
ones_tensor=torch.ones(shape)
zeros_tensor=torch.zeros(shape)
print(rand_tensor)
print(ones_tensor)
print(zeros_tensor)
tensor([[0.3955, 0.7930, 0.1733],
        [0.3849, 0.5444, 0.3754]])
tensor([[1., 1., 1.],
        [1., 1., 1.]])
tensor([[0., 0., 0.],
        [0., 0., 0.]])

2. Tensor properties

shape, datatype, and device (where the tensor is stored)

tensor=torch.rand(3,4)
print(tensor.shape,'\n',tensor.dtype,'\n',tensor.device)
torch.Size([3, 4]) 
 torch.float32 
 cpu

3. Tensor operation

Transpose , index, slice, math operations, linear algebra, random sampling

# indexing and slicing
tensor=torch.ones(4,4)
tensor[:,1]=0
print(tensor)
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])
# concatenation
t1=torch.cat([tensor,tensor,tensor],dim=1)
t1
tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])
# element-wise multiplication
tensor.mul(tensor)
tensor*tensor
# matrix multiplication
tensor.matmul(tensor.T)
[email protected]
tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.]])
# in-place operations (suffixed with _)
print(tensor)
tensor.add_(4)
print(tensor)
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])
tensor([[5., 4., 5., 5.],
        [5., 4., 5., 5.],
        [5., 4., 5., 5.],
        [5., 4., 5., 5.]])

4. Bridge with NumPy

# tensor-->numpy
t=torch.ones(5)
print(f't:{t}')
n=t.numpy()
print(f'n:{n}')
t:tensor([1., 1., 1., 1., 1.])
n:[1. 1. 1. 1. 1.]
# changes to the tensor are reflected in the numpy array
t.add_(1)
print(t)
print(n)
tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]
# numpy-->tensor
n=np.ones(5)
t=torch.from_numpy(n)
np.add(n,1,out=n)
print(t)
print(n)
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
[2. 2. 2. 2. 2.]

(2) torch.autograd

torch.autograd is PyTorch's automatic differentiation engine that powers neural network training.

1. Usage in PyTorch

import ssl
ssl._create_default_https_context = ssl._create_unverified_context
import torch,torchvision
model=torchvision.models.resnet18(pretrained=True)
data=torch.rand(1,3,64,64)
labels=torch.rand(1,1000)
prediction=model(data)#forward
loss=(prediction-labels).sum()#loss function
loss.backward()#backward
optim=torch.optim.SGD(model.parameters(),lr=1e-2,momentum=0.9)# lr is the learning rate
optim.step()# initiate gradient descent

2. Differentiation in autograd

import torch
#requires_grad=True:every operation on them should be tracked.
a=torch.tensor([2.,3.],requires_grad=True)
b=torch.tensor([6.,4.],requires_grad=True)
# a and b are parameters of the NN, Q is the error
Q=3*a**3-b**2
external_grad=torch.tensor([1,1])
# Q.backward() computes the gradients of Q w.r.t. a and b and stores them in the tensors' .grad attribute
Q.backward(gradient=external_grad)
print(a.grad)
print(b.grad)
tensor([36., 81.])
tensor([-12.,  -8.])

3. Computational graph

autograd retains all data (tensors) and operations in a DAG (directed acyclic graph, containing function objects)

1. Forward propagation: calculate the result tensor, record the gradient function (leaves–root)

2. Backpropagation: Calculate the gradient of each parameter and save it in tensor.grad, chain rule (root–leaves)

x=torch.rand(5,5)
y=torch.rand(5,5)
z=torch.rand((5,5),requires_grad=True)
a=x+y
print(a.requires_grad)
b=x+z
print(b.requires_grad)
False
True

Frozen parameters: parameters whose gradients are not computed, which reduces the amount of computation.

from torch import nn,optim
model=torchvision.models.resnet18(pretrained=True)
# freeze all existing parameters; only the new fc layer's weight and bias (added below) will be trained
for param in model.parameters():
    param.requires_grad=False
model.fc=nn.Linear(512,10)
optimizer=optim.SGD(model.parameters(),lr=1e-2, momentum=0.9)

(3) Neural network

Neural networks can be built with the torch.nn package.

Neural network training steps:

1. Define the neural network (including some parameters/weights that need to be learned)

2. Traverse the input dataset

3. Processing input through the network

4. Calculate the loss function

5. Network parameter gradient backpropagation

6. Usually use a simple update rule to update the weight of the network: weight = weight - learning_rate * gradient

1. Define the network

(1)Containers:

  • Module: The base class for all neural network models

(2)Convolution Layers:

  • nn.Conv2d:Applies a 2D convolution over an input signal composed of several input planes

(3)Linear Layers

  • nn.Linear:Applies a linear transformation to the incoming data(y=wx+b)
import torch
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square, you can specify with a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1) # flatten all dimensions except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = Net()
print(net)
Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

You only need to define the forward function; the backward function (where the gradients are computed) is automatically defined for you by autograd.

The learnable parameters of a model are returned by net.parameters().

params = list(net.parameters())
print(len(params))
print(params[0].size())# conv1's weights
#print(params)
10
torch.Size([6, 1, 5, 5])
input = torch.randn(1,1,32,32)
out = net(input)
print(out)
tensor([[ 0.0735, -0.0377,  0.1258, -0.0828, -0.0173, -0.0726, -0.0875, -0.0256,
         -0.0797,  0.0959]], grad_fn=<AddmmBackward0>)

Zero the gradient buffers of all parameters and backpropagate with random gradients:

net.zero_grad()
out.backward(torch.randn(1,10))

torch.nn only supports mini-batches. The entire torch.nn package only supports inputs that are a mini-batch of samples, not a single sample. For example, nn.Conv2d takes a 4D tensor of nSamples x nChannels x Height x Width.
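
If you only have a single sample, here is a minimal sketch (not part of the original tutorial) of adding the fake batch dimension with input.unsqueeze(0), using the net defined above:

import torch

single = torch.randn(1, 32, 32)   # one sample: nChannels x Height x Width
batched = single.unsqueeze(0)     # add a fake batch dimension -> shape (1, 1, 32, 32)
print(batched.shape)              # torch.Size([1, 1, 32, 32])
out = net(batched)                # now acceptable to the Conv2d layers of net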

Classes seen so far:

  • torch.Tensor: A multi-dimensional array that supports automatic differentiation of backward() and saves tensor gradients
  • nn.Module: neural network module, encapsulation parameters
  • nn.Parameter: A tensor that is automatically registered as a parameter when assigned as a property of a Module
  • autograd.Function: implements the forward and reverse definitions of automatic differentiation operations. Each Tensor operation creates at least one Function node, which is connected to the function that created the Tensor and encodes its history.

2. Loss function

The loss function takes (output, target) as input and computes a value that estimates how far the output is from the target. The nn package has several different loss functions; a simple one is nn.MSELoss, which computes the mean squared error between the output and the target.

output = net(input)
target = torch.randn(10)# a dummy target, for example
target = target.view(1,-1)# make it the same shape as the output
criterion = nn.MSELoss()
loss = criterion(output,target)
print(loss)
tensor(0.4356, grad_fn=<MseLossBackward0>)

Using the .grad_fn attribute to follow the loss backwards, a computational graph will be obtained. When loss.backward() is called, the entire graph is differentiated, and all tensors in the graph with requires_grad=True will accumulate their .grad tensors with the gradient

print(loss.grad_fn) # MSELoss
print(loss.grad_fn.next_functions[0][0]) # linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # relu
<MseLossBackward0 object at 0x7fef4965df10>
<AddmmBackward0 object at 0x7fef4965d3a0>
<AccumulateGrad object at 0x7fef4965df10>

3. Backprop

To backpropagate the error, all you need to do is call loss.backward(). You should clear the existing gradients first, otherwise the new gradients will be accumulated into the existing ones.

net.zero_grad() # clear the gradients

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([ 0.0124,  0.0051, -0.0029, -0.0088,  0.0048,  0.0012])

4. Update the weights

The simplest update rule is stochastic gradient descent (SGD)

  • weight = weight - learning_rate * gradient
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data*learning_rate)

But when using neural networks, you may want to use different update rules, such as SGD, Nesterov-SGD, Adam, RMSProp, etc. All of these methods are implemented in the torch.optim package.

import torch.optim as optim

# create the optimizer
optimizer = optim.SGD(net.parameters(),lr=0.01)

# in the training loop
optimizer.zero_grad() # manually zero the gradient buffers
output = net(input)
loss = criterion(output,target)
loss.backward()
optimizer.step()
print(net.conv1.bias.grad)
tensor([ 0.0119,  0.0050, -0.0034, -0.0109,  0.0049, -0.0009])

Getting started with Pytorch

Tensors

Tensors are similar to ndarrays in numpy, and Tensor can also use GPU to accelerate operations.

from __future__ import print_function
import torch
x = torch.Tensor(5, 3)  # construct an uninitialized 5x3 matrix
x = torch.rand(5, 3)  # construct a randomly initialized matrix
x # in a notebook this displays the contents of x
x.size()

# NOTE: torch.Size is in fact a tuple, so it supports the usual tuple operations
y = torch.rand(5, 3)

# there are two syntaxes for adding two matrices of the same shape
x + y # syntax 1
torch.add(x, y) # syntax 2

# there are also two ways to write the result into an output tensor
result = torch.Tensor(5, 3) # pre-allocate the output tensor
torch.add(x, y, out=result) # write the sum into result
y.add_(x) # add x to y in place

# note: any operation that mutates a tensor in place has a trailing underscore '_'
# e.g. x.copy_(y) and x.t_() both change the value of x

# numpy-style slicing is also supported
x[:,1] # this outputs all values in the second column of x

Reading material:

100+ Tensor operations, including transposition, indexing, slicing, mathematical operations, linear algebra, random numbers, etc.

See: torch - PyTorch 0.1.9 documentation

Numpy Bridge

Converting a Torch Tensor to a NumPy array and vice versa is a breeze. Note that the Torch Tensor and the NumPy array share their underlying storage, so modifying one will also modify the other.

# converting between the tensor and numpy data structures
a = torch.ones(5)
b = a.numpy()

# modifying the tensor in place also changes the associated numpy array
a.add_(1)
print(a)
print(b)

# converting a numpy array to a torch Tensor
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

# all tensors except CharTensor can be converted between CPU and GPU
# use the .cuda() method to move a Tensor onto the GPU
# the computation runs on the GPU when CUDA is available
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    x + y

Neural Networks in PyTorch

Next, we introduce the neural network part of PyTorch. Neural networks in PyTorch are built on top of the autograd package.

First we'll take a quick look, and then we'll train our first neural network.

Autograd: automatic differentiation

The autograd package provides automatic derivation methods for all Tensor operations.
This is a define-by-run framework, which means that your backpropagation is defined by how your code is run, so every single iteration can be different.

With these examples in mind, let's look at these features in simpler terms.

autograd.Variable This is the core class in this package. It wraps a Tensor and supports almost all operations defined on it. Once you're done with your calculations, you can call .backward() to automatically calculate all gradients.

You can access the original tensor through the attribute .data, and the gradient on this Variable is concentrated in the .grad attribute.

There is also a very important class Function in automatic derivation.

Both Variable and Function are related to each other and build an acyclic graph describing the entire operation process. Each Variable has a .creator property, which refers to a Function that creates the Variable. (except for user-created Variables whose creator part is None).

If you want to do a derivative calculation, you can call .backward() on the Variable. If the Variable is a scalar (eg it contains a single element of data), you don't need to specify any arguments to backward(), however if it has more elements, you need to specify a grad_output argument that matches the tensor's shape.

from torch.autograd import Variable
x = Variable(torch.ones(2, 2), requires_grad = True)
y = x + 2
y.creator

# y was created as the result of an operation, so it has a creator
z = y * y * 3
out = z.mean()

# now run backpropagation
out.backward()

# out.backward() is equivalent to out.backward(torch.Tensor([1.0]))
# print the gradients d(out)/dx
x.grad

The end result should be a matrix filled with 4.5. Call the output variable $o$. We compute it as:

$o = \frac{1}{4}\sum_i z_i$, where $z_i = 3(x_i+2)^2$ and $z_i\bigr\rvert_{x_i=1} = 27$. Therefore $\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)$, and finally $\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$.

You can do crazy things with autograd.

x = torch.randn(3)
x = Variable(x, requires_grad = True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients)
x.grad

Reading material:

You can read more about Variable and Function documentation here: pytorch.org/docs/autograd.html

Neural Networks

Neural networks can be built using the torch.nn package.

Now you have a preliminary understanding of autograd, and nn is built on the basis of autograd for model definition and differentiation.

An nn.Module contains the layers of the neural network, and its forward(input) method returns the output.

As an example, take a look at this neural network that classifies digit images.

This is a simple feed-forward network: it takes the input, feeds it through several layers one after the other, and finally outputs the result.

A typical neural network training process looks like this:

  • Define a neural network with learnable parameters (or weights)

  • Iterate over an input dataset:

  • Process the input with a neural network

  • Calculate the loss (how far the output is from being correct)

  • Propagate the gradients back into the parameters of the neural network

  • update the weights in the network

  • Usually a simple update rule is used: weight = weight - learning_rate * gradient

Let's define a neural network:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5) # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1   = nn.Linear(16*5*5, 120) # an affine operation: y = Wx + b
        self.fc2   = nn.Linear(120, 84)
        self.fc3   = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2)) # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv2(x)), 2) # If the size is a square you can only specify a single number
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    
    def num_flat_features(self, x):
        size = x.size()[1:] # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
net

'''The printed network looks like this:
Net (
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear (400 -> 120)
  (fc2): Linear (120 -> 84)
  (fc3): Linear (84 -> 10)
)
'''

You only need to define a forward function, and backward will be automatically generated.

You can use all Tensor operations in the forward function.

The learnable parameters in the model are returned by net.parameters().

params = list(net.parameters())
print(len(params))
print(params[0].size()) # conv1's .weight

input = Variable(torch.randn(1, 1, 32, 32))
out = net(input)
'''The output of out looks like this:
Variable containing:
-0.0158 -0.0682 -0.1239 -0.0136 -0.0645  0.0107 -0.0230 -0.0085  0.1172 -0.0393
[torch.FloatTensor of size 1x10]
'''

net.zero_grad() # zero the gradient buffers of all parameters
out.backward(torch.randn(1, 10)) # backpropagate with random gradients

Note: torch.nn only accepts mini-batches of data.
The entire torch.nn package only accepts inputs that are a mini-batch of samples, not individual samples. For example, nn.Conv2d expects a four-dimensional tensor of nSamples x nChannels x Height x Width.
If you are taking a single sample, use input.unsqueeze(0) to add a fake dimension.

Review what we learned earlier:

  • torch.Tensor - a multidimensional array
  • autograd.Variable - wraps a Tensor and records the history of operations applied to it. It has the same API as Tensor, plus backward(). It also holds the gradient w.r.t. the tensor.
  • nn.Module - the neural network module. Convenient data encapsulation, which can move calculations to the GPU, and also includes some input and output things.
  • nn.Parameter - a kind of Variable that is automatically registered as a parameter when assigned as an attribute of a Module.
  • autograd.Function - implements the forward and backward definitions of an autograd operation. Every Variable operation creates at least one Function node that connects to the functions that created the Variable and encodes its history.

Parts we have understood so far:

  • A neural network is defined.
  • Processed the input and called backward.

Still unfinished:

  • Calculate the cost.
  • Update the weights in the network.

A cost function takes (output, target) pairs of inputs and computes the estimated distance between the output and the target.

There are several different cost functions in the nn package.

A simple cost function: nn.MSELoss computes the mean squared error between the input and the target.

for example:

output = net(input)
target = Variable(torch.range(1, 10))  # a dummy target, for example
criterion = nn.MSELoss()
loss = criterion(output, target)
'''The value of loss looks like this:
Variable containing:
 38.5849
[torch.FloatTensor of size 1]
'''

Now, if you follow the loss from the back to the front, using the .creator property you can see a calculation flow diagram like this:

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d  
      -> view -> linear -> relu -> linear -> relu -> linear 
      -> MSELoss
      -> loss

So when we call loss.backward(), the whole graph is differentiated w.r.t. the cost, and all Variables in the graph will have their .grad accumulated with the gradient.

# For illustration, let us follow a few steps backward
print(loss.creator) # MSELoss
print(loss.creator.previous_functions[0][0]) # Linear
print(loss.creator.previous_functions[0][0].previous_functions[0][0]) # ReLU

'''
<torch.nn._functions.thnn.auto.MSELoss object at 0x7fe8102dd7c8>
<torch.nn._functions.linear.Linear object at 0x7fe8102dd708>
<torch.nn._functions.thnn.auto.Threshold object at 0x7fe8102dd648>
'''

# now call loss.backward() and look at conv1's bias gradients before and after the backward pass
net.zero_grad() # zero the gradient buffers
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

''' The output of these steps looks like this:
conv1.bias.grad before backward
Variable containing:
 0
 0
 0
 0
 0
 0
[torch.FloatTensor of size 6]

conv1.bias.grad after backward
Variable containing:
 0.0346
-0.0141
 0.0544
-0.1224
-0.1677
 0.0908
[torch.FloatTensor of size 6]
'''

Now we have seen how to use the cost function.

Reading material:

The nn package contains many modules and cost functions for neural networks; the full list with documentation is here: torch.nn - PyTorch 0.1.9 documentation

There is only one left to learn:

  • Update the weights of the network

The simplest update rule is stochastic gradient descent (SGD):

weight = weight - learning_rate * gradient

We can express this in simple python:

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)

However when you use neural networks you want to use different kinds of methods such as: SGD, Nesterov-SGD, Adam, RMSProp, etc.

We built a small package torch.optim to achieve this functionality, which contains all these methods. It is also very simple to use:

import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(), lr = 0.01)

# in your training loop:
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step() # Does the update

That's it.

But you might be thinking now.

So what about the data?

Generally speaking, when you process images, sounds, texts, and videos, you need to use other independent packages in python to convert them into arrays in numpy, and then convert them to torch.*Tensor.

  • For images, you can use Pillow, OpenCV.
  • Sound processing can be done with scipy and librosa.
  • Text processing can use native Python or Cython as well as NLTK and SpaCy.

Especially for images, we have the torchvision package available, which contains some ready-made datasets such as: Imagenet, CIFAR10, MNIST, etc. There are also tools for converting images. This is very convenient and avoids writing boilerplate code.

This tutorial uses the CIFAR10 dataset. The categories we want to classify are: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'. The images in this dataset are all 3-channel, 32x32 pixel images.

The following is a practical exercise for the use of the torch neural network.

train an image classifier

We have to do these steps in order:

  1. Use torchvision to read and preprocess the CIFAR10 dataset
  2. Define a Convolutional Neural Network
  3. define a cost function
  4. Train the training set data in the neural network
  5. Test the neural network using the test set data

1. Read and preprocess CIFAR10

It is quite convenient to use torchvision to read CIFAR10.

import torchvision
import torchvision.transforms as transforms


# The output of torchvision datasets are PILImage images in the range [0, 1].
# We convert them to Tensors normalized to the range [-1, 1].

transform=transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                             ])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, 
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4, 
                                          shuffle=False, num_workers=2)
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
'''Note: this part downloads the dataset, so it may be a bit slow. You will see output like this:

Downloading http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
Extracting tar file
Done!
Files already downloaded and verified
'''

Let's show some of the training images.

# functions to show an image
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
def imshow(img):
    img = img / 2 + 0.5 # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1,2,0)))

# show some random training images
dataiter = iter(trainloader)
images, labels = dataiter.next()

# print images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s'%classes[labels[j]] for j in range(4)))

The result is this:

2. Define a Convolutional Neural Network

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool  = nn.MaxPool2d(2,2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1   = nn.Linear(16*5*5, 120)
        self.fc2   = nn.Linear(120, 84)
        self.fc3   = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16*5*5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

3. Define the cost function and optimizer

criterion = nn.CrossEntropyLoss() # use a Classification Cross-Entropy loss
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

4. Train the network

Things get interesting. We only need to iterate round after round and continue to adjust parameters through input.

for epoch in range(2): # loop over the dataset multiple times
    
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        
        # wrap them in Variable
        inputs, labels = Variable(inputs), Variable(labels)
        
        # zero the parameter gradients
        optimizer.zero_grad()
        
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()        
        optimizer.step()
        
        # print statistics
        running_loss += loss.data[0]
        if i % 2000 == 1999: # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch+1, i+1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training')
'''The output of this part is:
[1,  2000] loss: 2.212
[1,  4000] loss: 1.892
[1,  6000] loss: 1.681
[1,  8000] loss: 1.590
[1, 10000] loss: 1.515
[1, 12000] loss: 1.475
[2,  2000] loss: 1.409
[2,  4000] loss: 1.394
[2,  6000] loss: 1.376
[2,  8000] loss: 1.334
[2, 10000] loss: 1.313
[2, 12000] loss: 1.264
Finished Training
'''

We have now trained for two epochs. At this point, we need to test what the network has learned.

By comparing the classification given by the neural network with the known category results, it can be concluded whether it is correct or not. If the prediction is correct, we can add the sample to the list of correctly predicted results.

As a first step, let's display some images from the test set to get familiar with them.

dataiter = iter(testloader)
images, labels = dataiter.next()

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s'%classes[labels[j]] for j in range(4)))

The result is this:

Alright, let's see what the neural network thinks about these few photos.

outputs = net(Variable(images))

# the outputs are energies for the 10 classes. 
# Higher the energy for a class, the more the network 
# thinks that the image is of the particular class

# So, let's get the index of the highest energy
_, predicted = torch.max(outputs.data, 1)

print('Predicted: ', ' '.join('%5s'% classes[predicted[j][0]] for j in range(4)))

'''The output is:
Predicted:    cat plane   car plane
'''

The results look pretty good.

See how the neural network performs on the entire dataset.

correct = 0
total = 0
for data in testloader:
    images, labels = data
    outputs = net(Variable(images))
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum()

print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))

'''The output is:
Accuracy of the network on the 10000 test images: 54 %
'''

That is better than chance: picking one class out of ten at random would give an accuracy of only about 10%.

It looks like the neural network has learned something.

So which classes performed well, and which did not?

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
for data in testloader:
    images, labels = data
    outputs = net(Variable(images))
    _, predicted = torch.max(outputs.data, 1)
    c = (predicted == labels).squeeze()
    for i in range(4):
        label = labels[i]
        class_correct[label] += c[i]
        class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (classes[i], 100 * class_correct[i] / class_total[i]))

'''The output is:
Accuracy of plane : 73 %
Accuracy of   car : 70 %
Accuracy of  bird : 52 %
Accuracy of   cat : 27 %
Accuracy of  deer : 34 %
Accuracy of   dog : 37 %
Accuracy of  frog : 62 %
Accuracy of horse : 72 %
Accuracy of  ship : 64 %
Accuracy of truck : 53 %
'''

Ok, so what's next?

How do we run neural networks on the GPU?

training on GPU

Just as you can move a Tensor onto the GPU for computation, you can also move the neural network onto the GPU.

This converts all of the network's parameters and buffers to CUDA tensors.

net.cuda()

'''The output is:
Net (
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear (400 -> 120)
  (fc2): Linear (120 -> 84)
  (fc3): Linear (84 -> 10)
)
'''

Remember, each step needs to pass the input and target to the GPU.

   inputs, labels = Variable(inputs.cuda()), Variable(labels.cuda())

Why don't we notice a speedup compared with the CPU? Because the network is so small, the difference is not significant.

Goals achieved:

  • Understand PyTorch's Tensor library and neural networks at a higher level.
  • Train a small neural network.

PyTorch official tutorial (detailed version)

(1) Datasets & DataLoaders

Code that handles data samples can become messy and difficult to maintain; ideally, we want the dataset code to be decoupled from the model training code for better readability and modularity. PyTorch provides two data primitives, torch.utils.data.DataLoader and torch.utils.data.Dataset, which allow you to use preloaded datasets as well as your own. Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset for easy access to the samples. The PyTorch domain libraries provide many preloaded datasets (such as FashionMNIST) that subclass torch.utils.data.Dataset and implement functions specific to the particular data. They can be used for prototyping and benchmarking your models.

load dataset

This is an example of loading the Fashion-MNIST dataset from TorchVision. Fashion MNIST is a dataset of Zalando article images, containing 60,000 training examples and 10,000 test examples. Each example consists of a 28×28 grayscale image and an associated label from one of 10 categories. Loading FashionMNIST requires the following parameters

  • root: training/testing data storage path
  • train: specifies the training or testing dataset
  • download=True: download data from internet if not available in "root directory"
  • transform and target_transform specify the feature and label transformations
import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt


train_data = datasets.FashionMNIST(root='data',train=True,download=True,transform=ToTensor())

test_data = datasets.FashionMNIST(root='data',train=False,download=True,transform=ToTensor())

Iterate and visualize datasets

We can index a Dataset manually like a list: train_data[index]. Here we use matplotlib to visualize some samples in the training data.

labels_map = {
    0: "T-Shirt",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle Boot",}
figure = plt.figure(figsize=(8,8))
cols,rows = 3,3
for i in range(1,cols * rows + 1):
    sample_index = torch.randint(len(train_data),size=(1,)).item() # pick a random index
    img,label = train_data[sample_index] # get the image and label at that index
    figure.add_subplot(rows,cols,i) # add a subplot (add_subplot is the object-oriented API, subplot the functional one)
    plt.title(labels_map[label])
    plt.axis("off") # hide the axes
    plt.imshow(img.squeeze(),cmap='gray') # display the image; cmap selects the color map
plt.show() # display the figure


Create a custom dataset file

A custom Dataset class must implement three functions: __init__, __len__, and __getitem__. In the following implementation, the images are stored in a directory img_dir, and the labels are stored separately in a CSV file annotations_file.

import os
import pandas as pd
from torchvision.io import read_image

class CustomImageDataset(Dataset):
    def __init__(self,annotations_file,img_dir,transform = None,target_transform = None):
        self.img_labels = pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        # iloc[idx,0] selects the element at row idx, column 0 (slices are left-closed, right-open)
        # os.path.join joins path components
        img_path = os.path.join(self.img_dir,self.img_labels.iloc[idx,0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx,1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image,label

init

The __init__ function runs once when the Dataset object is instantiated. We initialize the directory containing the images, the annotations file, and the two transforms. The labels.csv file looks like this:

tshirt1.jpg, 0
tshirt2.jpg, 0
......
ankleboot999.jpg, 9
def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
    self.img_labels = pd.read_csv(annotations_file)
    self.img_dir = img_dir
    self.transform = transform
    self.target_transform = target_transform

len

The __len__ function returns the number of samples in the dataset.

def __len__(self):
    return len(self.img_labels)

getitem

The __getitem__ function loads and returns a sample from the dataset at the given index idx. Based on the index, it identifies the image's location on disk, converts it to a tensor using read_image, retrieves the corresponding label from the csv data in self.img_labels, calls the transform functions on them (if applicable), and returns the tensor image and corresponding label as a tuple.

def __getitem__(self, idx):
    img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
    image = read_image(img_path)
    label = self.img_labels.iloc[idx, 1]
    if self.transform:
        image = self.transform(image)
    if self.target_transform:
        label = self.target_transform(label)
    return image, label

Prepare training data using DataLoaders

A Dataset retrieves the dataset's features and labels one sample at a time. While training a model, we typically want to pass samples in "minibatches", reshuffle the data at every epoch to reduce model overfitting, and use Python's multiprocessing to speed up data retrieval. DataLoader is an iterable that implements all of this for us.

from torch.utils.data import DataLoader
# shuffle=True reshuffles the data at every epoch
train_dataloader = DataLoader(train_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)

Iterate through DataLoader

Having loaded the dataset into the DataLoader, we can iterate through it as needed. Each iteration returns a batch of train_features and train_labels (containing batch_size=64 samples each). Because we set shuffle=True, the data is reshuffled after we iterate over all batches.

train_features,train_labels = next(iter(train_dataloader))
print(f'feature batch shape:{train_features.size()}')
print(f'label batch shape:{train_labels.size()}')
img = train_features[0].squeeze()
label = train_labels[0]
plt.imshow(img,cmap='gray')
plt.show()
print(f'label:{label}')
feature batch shape:torch.Size([64, 1, 28, 28])
label batch shape:torch.Size([64])


label:4

(2) Transforms

Data does not always come in the final processed form required for training machine learning algorithms. We use transforms to perform some manipulation of the data and make it suitable for training. All TorchVision datasets have two parameters, transform (to modify the features) and target_transform (to modify the labels), and the torchvision.transforms module offers several commonly used transforms out of the box.
The FashionMNIST features are in PIL Image format, and the labels are integers. For training, we need the features as normalized tensors and the labels as one-hot encoded tensors. To make these transformations, we use ToTensor and Lambda.

import torch
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

ds = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
    target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1))
)

ToTensor()

ToTensor converts a PIL image or NumPy ndarray into a FloatTensor and scales the image's pixel intensity values into the range [0, 1].
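
As a small illustrative sketch (the random array below is an assumption, not tutorial data), this is what ToTensor does to a NumPy image array:

import numpy as np
from torchvision.transforms import ToTensor

img = np.random.randint(0, 256, size=(28, 28, 1), dtype=np.uint8) # fake H x W x C image, values 0-255
t = ToTensor()(img)
print(t.dtype, t.shape)               # torch.float32 torch.Size([1, 28, 28])
print(t.min().item(), t.max().item()) # pixel values scaled into [0.0, 1.0]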

Lambda Transforms

Lambda transforms apply any user-defined lambda function. Here we define a function to turn the integer label into a one-hot encoded tensor: it first creates a zero tensor of size 10 (the number of labels) and then calls scatter_, which assigns value=1 at the index given by the label y.

target_transform = Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(dim=0, index=torch.tensor(y), value=1))

Tensor.scatter_(dim, index, src, reduce=None): along dimension dim, at the positions given by index, write the value src.

print(torch.zeros(10, dtype=torch.float).scatter_(dim=0, index=torch.tensor(3), value=1))
tensor([0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])

(3) Building a neural network

We use PyTorch to build a neural network that classifies images in the FashionMNIST dataset.

import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('using {} device'.format(device))
using cpu device

Define the neural network class

We build the neural network by subclassing nn.Module; it has two parts:

  • __init__: Define the network layer
  • forward: perform forward pass
class network(nn.Module):
    def __init__(self):
        super(network, self).__init__()
        self.flatten = nn.Flatten() # flattens a contiguous range of dims into a single dimension
        self.layers = nn.Sequential(
            nn.Linear(28*28,512),
            nn.ReLU(),
            nn.Linear(512,512),
            nn.ReLU(),
            nn.Linear(512,10))
    def forward(self,x):
        x = self.flatten(x) # the network expects input of shape (batch_size, features)
        values = self.layers(x)
        return values

torch.nn.Flatten(start_dim=1, end_dim=-1): by default only the first (batch) dimension is kept; the remaining dimensions are flattened

  • start_dim:first dim to flatten (default = 1).

  • end_dim:last dim to flatten (default = -1).

# torch.nn.Flatten example
input = torch.randn(32,1,5,5)
m = nn.Flatten()
output = m(input)
print(output.size())
m1 = nn.Flatten(0,2)
print(m1(input).size())
torch.Size([32, 25])
torch.Size([160, 5])

Create an instance of network, move it to device, and print its structure

model = network().to(device)
print(model)
network(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (layers): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)

Pass input data to the model to run forward propagation; do not call forward directly

x = torch.rand(2,28,28,device=device)
value = model(x)
print(value)
print(value.size())
pred_probab = nn.Softmax(dim=1)(value)
print(pred_probab)
y_pred = pred_probab.argmax(1)
print(f'predicted class:{y_pred}')
tensor([[-0.0355,  0.0948, -0.1048,  0.0802,  0.0177,  0.0038, -0.0281, -0.0767,
          0.0303, -0.1290],
        [-0.0238,  0.1298, -0.0700,  0.0861,  0.0168, -0.0418, -0.0421, -0.0772,
          0.0369, -0.1391]], grad_fn=<AddmmBackward0>)
torch.Size([2, 10])
tensor([[0.0977, 0.1113, 0.0912, 0.1097, 0.1030, 0.1016, 0.0984, 0.0938, 0.1043,
         0.0890],
        [0.0986, 0.1149, 0.0941, 0.1100, 0.1027, 0.0968, 0.0968, 0.0935, 0.1048,
         0.0878]], grad_fn=<SoftmaxBackward0>)
predicted class:tensor([1, 1])

torch.nn.Softmax(dim=None): softmax normalization

# torch.nn.Softmax example
m = nn.Softmax(dim=1)
input = torch.randn(2,3)
print(input)
output = m(input)
print(output)
tensor([[-0.5471,  1.3495,  1.5911],
        [-0.0185, -0.1420, -0.0556]])
tensor([[0.0619, 0.4126, 0.5254],
        [0.3512, 0.3104, 0.3384]])

Model layers

Let's break down the layers in the model and observe the inputs and outputs.

Raw input

input_image = torch.rand(3,28,28)
print(input_image.size())
torch.Size([3, 28, 28])

nn.Flatten

Converts each 2D 28x28 image into a contiguous array of 784 pixel values, keeping the minibatch dimension (dim=0).

flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())
torch.Size([3, 784])

nn.Linear

linear conversion

layer1 = nn.Linear(in_features=28*28,out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size( ))
torch.Size([3, 20])

nn.ReLU

Rectified linear unit (the activation function)

print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")
print(hidden1.size())
Before ReLU: tensor([[ 0.4574, -0.5313, -0.4628, -0.9403, -0.7630,  0.1807, -0.2847, -0.2741,
          0.0954,  0.2327,  0.4603,  0.0227, -0.1299, -0.2346, -0.1800,  0.9115,
         -0.0870, -0.0171, -0.0064,  0.0540],
        [ 0.0888, -0.6782, -0.2557, -0.6717, -0.4488,  0.1024, -0.3013, -0.3186,
         -0.1338,  0.3944,  0.0704,  0.1429,  0.0521, -0.3326, -0.3113,  0.6518,
         -0.0978, -0.0721, -0.3396,  0.4712],
        [ 0.1781,  0.0885, -0.4775, -0.5661, -0.0099,  0.2617, -0.2678, -0.1444,
          0.1345,  0.3259,  0.3984,  0.2392,  0.0529, -0.0349, -0.3266,  0.7488,
         -0.3498,  0.1157,  0.0126,  0.3502]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.4574, 0.0000, 0.0000, 0.0000, 0.0000, 0.1807, 0.0000, 0.0000, 0.0954,
         0.2327, 0.4603, 0.0227, 0.0000, 0.0000, 0.0000, 0.9115, 0.0000, 0.0000,
         0.0000, 0.0540],
        [0.0888, 0.0000, 0.0000, 0.0000, 0.0000, 0.1024, 0.0000, 0.0000, 0.0000,
         0.3944, 0.0704, 0.1429, 0.0521, 0.0000, 0.0000, 0.6518, 0.0000, 0.0000,
         0.0000, 0.4712],
        [0.1781, 0.0885, 0.0000, 0.0000, 0.0000, 0.2617, 0.0000, 0.0000, 0.1345,
         0.3259, 0.3984, 0.2392, 0.0529, 0.0000, 0.0000, 0.7488, 0.0000, 0.1157,
         0.0126, 0.3502]], grad_fn=<ReluBackward0>)
torch.Size([3, 20])

nn.Sequential

nn.Sequential is an ordered container of modules; data is passed through all the modules in the order in which they are defined.

seq_modules = nn.Sequential(flatten,layer1,nn.ReLU(),nn.Linear(20,10))
input_image = torch.randn(3,28,28)
values1 = seq_modules(input_image)
print(values1)
tensor([[ 0.2472,  0.2597, -0.0157,  0.3206, -0.0073,  0.1631,  0.2956,  0.0561,
          0.2993,  0.1807],
        [-0.0782,  0.1838, -0.0215,  0.2395, -0.0804, -0.0021,  0.0883, -0.0698,
          0.1463, -0.0151],
        [-0.1162,  0.0673, -0.2301,  0.1612, -0.1472, -0.0447,  0.0671, -0.2915,
          0.3176,  0.2391]], grad_fn=<AddmmBackward0>)

nn.Softmax

The last linear layer of the neural network returns raw values (logits) in $[-\infty, \infty]$. After passing through the nn.Softmax module, the values lie in [0, 1] and represent the predicted probability of each class; the dim parameter indicates the dimension along which the values must sum to 1.

softmax = nn.Softmax(dim=1)
pred_probab1 = softmax(values1)
print(pred_probab1)
tensor([[0.1062, 0.1075, 0.0816, 0.1143, 0.0823, 0.0976, 0.1115, 0.0877, 0.1119,
         0.0994],
        [0.0884, 0.1148, 0.0935, 0.1214, 0.0882, 0.0954, 0.1044, 0.0891, 0.1106,
         0.0941],
        [0.0872, 0.1048, 0.0778, 0.1151, 0.0845, 0.0937, 0.1048, 0.0732, 0.1346,
         0.1244]], grad_fn=<SoftmaxBackward0>)

Model parameters

Using parameters() or named_parameters(), we can access each layer's parameters, including its weights and biases.

print(f'model structure:{model}\n')

for name,param in model.named_parameters():
    print(f'layer:{name}|size"{param.size()}|param:{param[:2]}\n')

#print(model.parameters())
model structure:network(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (layers): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)

layer:layers.0.weight|size"torch.Size([512, 784])|param:tensor([[ 0.0122, -0.0204, -0.0185,  ..., -0.0196,  0.0257, -0.0084],
        [-0.0066, -0.0195, -0.0199,  ..., -0.0175, -0.0007,  0.0003]],
       grad_fn=<SliceBackward0>)

layer:layers.0.bias|size"torch.Size([512])|param:tensor([0.0086, 0.0104], grad_fn=<SliceBackward0>)

layer:layers.2.weight|size"torch.Size([512, 512])|param:tensor([[-0.0306, -0.0408,  0.0062,  ...,  0.0289, -0.0164,  0.0099],
        [ 0.0015,  0.0052,  0.0182,  ...,  0.0431, -0.0174,  0.0049]],
       grad_fn=<SliceBackward0>)

layer:layers.2.bias|size"torch.Size([512])|param:tensor([-0.0337,  0.0294], grad_fn=<SliceBackward0>)

layer:layers.4.weight|size"torch.Size([10, 512])|param:tensor([[ 0.0413,  0.0015,  0.0388,  ...,  0.0347,  0.0160,  0.0221],
        [-0.0010,  0.0031,  0.0421,  ..., -0.0226,  0.0340, -0.0220]],
       grad_fn=<SliceBackward0>)

layer:layers.4.bias|size"torch.Size([10])|param:tensor([0.0210, 0.0243], grad_fn=<SliceBackward0>)

(4) Automatic differentiation with torch.autograd

When training neural networks, the most frequently used algorithm is back propagation: the parameters (model weights) are adjusted according to the gradient of the loss function. To compute those gradients, PyTorch has a built-in differentiation engine, torch.autograd, which supports automatic computation of gradients for any computational graph. Consider the simplest one-layer neural network, with input x, parameters w and b, and some loss function:

import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w)+b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
print(loss)
tensor(2.2890, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)

Tensors, Functions, Computational Graphs

This code defines a computational graph in which x and w are multiplied, b is added to produce z, and z together with y is fed into the binary cross-entropy loss.

w and b are the parameters we need to optimize, so we need to be able to compute the gradients of the loss function with respect to these variables; to do that, we set the requires_grad property of those tensors.

requires_grad can be set when creating a tensor, or later with x.requires_grad_(True). The functions applied to tensors to build the computational graph are instances of the Function class, and a tensor's backward-propagation function is stored in its grad_fn property.
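
A minimal sketch (not from the original) of the second option, switching gradient tracking on for an existing tensor:

import torch

a = torch.randn(3)      # requires_grad defaults to False
print(a.requires_grad)  # False
a.requires_grad_(True)  # enable gradient tracking in place
print(a.requires_grad)  # True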

print(f'gradient function for z={z.grad_fn}\n')
print(f'gradient function for loss={loss.grad_fn}\n')
gradient function for z=<AddBackward0 object at 0x7fb47069aa30>

gradient function for loss=<BinaryCrossEntropyWithLogitsBackward0 object at 0x7fb47069a250>

Calculate the gradient

To optimize the weights of the network, we need to compute the derivatives of the loss function with respect to the parameters, namely $\frac{\partial loss}{\partial w}$ and $\frac{\partial loss}{\partial b}$, under fixed values of x and y. To compute these derivatives, we call loss.backward() and then read the gradient values from w.grad and b.grad.

loss.backward()
print(w.grad)
print(b.grad)
tensor([[0.3263, 0.0754, 0.3122],
        [0.3263, 0.0754, 0.3122],
        [0.3263, 0.0754, 0.3122],
        [0.3263, 0.0754, 0.3122],
        [0.3263, 0.0754, 0.3122]])
tensor([0.3263, 0.0754, 0.3122])

We can only obtain grad for the leaf nodes of the computational graph that have requires_grad set to True; for all other nodes, gradients are not available. Also, for performance reasons, backward can perform the gradient calculation only once on a given graph; if you need several backward calls on the same graph, pass retain_graph=True to the backward call.
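
A minimal sketch of the retain_graph behaviour (an assumed example, not part of the original text):

import torch

x = torch.ones(2, requires_grad=True)
y = (x * x).sum()

y.backward(retain_graph=True) # keep the graph so backward can be called again
y.backward()                  # the second call succeeds; gradients accumulate in x.grad
print(x.grad)                 # tensor([4., 4.]) = 2*x accumulated over two backward calls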

Disable gradient tracking

By default, tensors with requires_grad=True track their computational history and support gradient computation. In some cases that is not needed, for example when the model has already been trained and we just want to apply it to input data, i.e. only run forward computations. We can stop tracking by wrapping the code in a torch.no_grad() block:

z = torch.matmul(x,w) + b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x,w) + b
print(z.requires_grad)
True
False

Another way to achieve the same effect is to use detach() on the tensor:

z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)
False

There are reasons you might want to disable gradient tracking:

  • To mark some parameters in your neural network as frozen parameters, which is common when fine-tuning a pretrained network
  • To speed up computations when you are only doing the forward pass, because computations on tensors that do not track gradients are more efficient

Computational Graphs

Conceptually, autograd keeps a record of data (tensors) and all executed operations (together with the resulting new tensors) in a directed acyclic graph (DAG) consisting of Function objects. In this DAG, leaves are the input tensors and roots are the output tensors. By tracing this graph from roots to leaves, gradients can be automatically computed using the chain rule. During the forward pass, autograd simultaneously does two things:

  • Run the requested operation to compute the resulting tensor
  • Maintain the operation's gradient function in the DAG

The backward pass kicks off when .backward() is called on the DAG root. autograd then:

  • computes the gradients from each .grad_fn
  • accumulates them in the respective tensor's .grad attribute
  • propagates all the way to the leaf tensors, using the chain rule

DAGs are dynamic in PyTorch: the graph is recreated from scratch after every .backward() call, when autograd starts populating a new graph. This is exactly what allows you to use control flow statements in your model; you can change the shape, size, and operations at every iteration if needed.
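
As a hedged illustration (the DynamicNet module below is hypothetical, not from the tutorial), ordinary Python control flow inside forward is simply recorded anew on every call:

import torch
from torch import nn

class DynamicNet(nn.Module):
    # hypothetical module: how many times the layer is applied depends on the input,
    # so autograd traces a fresh graph on every forward pass
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        repeats = int(x.abs().sum().item()) % 3 + 1 # data-dependent control flow
        for _ in range(repeats):
            x = torch.relu(self.linear(x))
        return x.sum()

dynamic_net = DynamicNet()
out = dynamic_net(torch.randn(1, 4))
out.backward() # gradients follow exactly the operations that ran in this iteration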

(5) Optimizing model parameters

Now that we have a model and data, it is time to train, validate, and test the model by optimizing its parameters. Training a model is an iterative process: in each iteration (called an epoch) the model makes a prediction, calculates the prediction error (loss), collects the derivatives of the error with respect to each parameter, and optimizes these parameters using gradient descent.

We start from the data loading and neural network code of the previous sections:

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

train_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(train_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.layers = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        values = self.layers(x)
        return values

model = NeuralNetwork()

Hyperparameters

Hyperparameters are adjustable parameters used to control the model optimization process. Different hyperparameter values can affect model training and convergence speed.

Define the following hyperparameters for training:

  • Number of Epochs: the number of times to iterate over the dataset
  • Batch Size: the number of data samples propagated through the network before the parameters are updated
  • Learning Rate: how much to update the model parameters at each batch/epoch
learning_rate = 1e-3
batch_size = 64
epochs = 5

Optimization loop

Once the hyperparameters are set, our model can be trained and optimized through an optimization loop. Each iteration of the optimization loop is called an epoch . Each epoch consists of two main parts:

  • The Train Loop : Iterates over the training dataset and tries to converge to the best parameters.
  • The Validation/Test Loop : Iterate over the test dataset to check if the model performance is improving.

loss function

Our untrained network may not give the correct answer when faced with some training data. The loss function measures how different the obtained result is from the target value, and we want to minimize the loss function during training. To compute the loss, we use the input for a given data sample to make a prediction and compare it to the true data label value.

Common loss functions include nn.MSELoss (mean squared error) for regression tasks, nn.NLLLoss (negative log likelihood) for classification, and nn.CrossEntropyLoss, which combines nn.LogSoftmax and nn.NLLLoss. Here we use nn.CrossEntropyLoss; a small sketch of that equivalence follows the code below.

# initialize the loss function
loss_fn = nn.CrossEntropyLoss()
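
A small sketch (with assumed example data) verifying that nn.CrossEntropyLoss matches nn.LogSoftmax followed by nn.NLLLoss:

import torch
from torch import nn

logits = torch.randn(3, 10)       # raw scores for 3 samples and 10 classes
targets = torch.tensor([1, 0, 4]) # class indices

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
print(torch.allclose(ce, nll))    # True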

Optimizer

Optimization is the process of adjusting model parameters at each training step to reduce model error. An optimization algorithm defines how this process is performed (in this case using stochastic gradient descent). All optimization logic is encapsulated in optimizer objects. The SGD optimizer is used here; moreover, there are many different optimizers in pytorch, such as ADAM and RMSProp, which work better for different types of models and data.

# define the optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In the training loop, optimization has three main steps:

  • Call optimizer.zero_grad() to reset the gradients of the model parameters. Gradients accumulate by default, so to prevent double-counting we explicitly zero them at every iteration
  • Call loss.backward() to backpropagate the prediction loss
  • Once we have the gradients, call optimizer.step() to adjust each parameter value

Training loop and test loop

We define train_loop, which runs the training optimization, and test_loop, which evaluates the model's performance on the test set.

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset) # total number of training samples
    for number, (x, y) in enumerate(dataloader):
        # number is the batch index; each iteration receives a batch of 64 tensors of shape (64, 1, 28, 28)
        # compute the prediction and the loss
        pred = model(x)
        loss = loss_fn(pred, y)

        # backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if number % 100 == 0:
            # every 100 batches, print the current loss value and the progress through the dataset
            loss, current = loss.item(), number * len(x) # current = images processed so far; len(x) = batch_size
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset) # total number of test samples
    num_batches = len(dataloader) # number of batches
    test_loss, correct = 0, 0

    with torch.no_grad():
        for x, y in dataloader:
            pred = model(x)
            test_loss += loss_fn(pred, y).item()
            # loss_fn(pred, y).item() returns the batch loss as a Python float, e.g. 1.0873
            # pred.argmax(1) returns the index of the largest value; sum() counts the correct predictions in the batch
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches # sum of batch losses / number of batches = average loss
    correct /= size # total correct predictions / number of samples = accuracy
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
print(len(train_dataloader.dataset))
print(len(train_dataloader))
print(len(test_dataloader.dataset))
print(len(test_dataloader))
x,y = next(iter(train_dataloader))
print(len(x))
print(x.size())
print(y.size())
60000
938
10000
157
64
torch.Size([64, 1, 28, 28])
torch.Size([64])
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 2
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")
Epoch 1
-------------------------------
loss: 1.040251  [    0/60000]
loss: 1.070957  [ 6400/60000]
loss: 0.869483  [12800/60000]
loss: 1.033000  [19200/60000]
loss: 0.908716  [25600/60000]
loss: 0.930925  [32000/60000]
loss: 0.973219  [38400/60000]
loss: 0.913604  [44800/60000]
loss: 0.960071  [51200/60000]
loss: 0.904625  [57600/60000]
Test Error: 
 Accuracy: 67.1%, Avg loss: 0.911718 

Epoch 2
-------------------------------
loss: 0.952776  [    0/60000]
loss: 1.005409  [ 6400/60000]
loss: 0.788150  [12800/60000]
loss: 0.969153  [19200/60000]
loss: 0.852390  [25600/60000]
loss: 0.862806  [32000/60000]
loss: 0.920238  [38400/60000]
loss: 0.863878  [44800/60000]
loss: 0.903000  [51200/60000]
loss: 0.858517  [57600/60000]
Test Error: 
 Accuracy: 68.3%, Avg loss: 0.859433 

Done!

(6) Save and load the model

Finally, we learn how to persist model state by saving, loading, and running model predictions. The torchvision.models subpackage contains model definitions for different tasks, including image classification, pixel-wise semantic segmentation, object detection, instance segmentation, person keypoint detection, video classification, and optical flow.

import torch
import torchvision.models as models

Save and load model weights

PyTorch models store the learned parameters in an internal state dictionary called state_dict. These can be persisted via the torch.save method:

# vgg16 is an image classification model architecture
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

model = models.vgg16(pretrained=True) # take the vgg16 model as an example
torch.save(model.state_dict(), 'model_weights.pth')

To load model weights, you need to create an instance of the same model first, and then load the parameters using the load_state_dict() method:

model = models.vgg16() # we do not pass pretrained=True, i.e. we do not load the pretrained weights
model.load_state_dict(torch.load('model_weights.pth'))
model.eval()
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

Before running inference, be sure to call the model.eval() method to set the dropout and batch normalization layers to evaluation mode; failing to do so will lead to inconsistent inference results.
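
A minimal sketch of the difference, using a standalone Dropout layer (an illustrative example, not part of the saved VGG model):

import torch
from torch import nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train() # training mode: roughly half of the values are zeroed, the rest are rescaled
print(drop(x))
drop.eval()  # evaluation mode: dropout is disabled and the output equals the input
print(drop(x))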

Save and load models

When loading model weights, we needed to instantiate the model class first, since the class defines the structure of the network. To save the structure of the class together with the model, pass model (rather than model.state_dict()) to the save function:

torch.save(model, 'model.pth')

Load the model:

model = torch.load('model.pth')

This approach uses Python's pickle module when serializing the model, so it relies on the actual class definitions being available when the model is loaded.
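
For example, a separate loading script (a sketch, assuming 'model.pth' was saved as above) only works if the package or module defining the model's class can be imported:

import torch

# pickle looks up the model's class (torchvision's VGG here) by module and name,
# so torchvision must be installed in the loading environment; otherwise the load
# fails because the defining module cannot be found
model = torch.load('model.pth')
model.eval()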

Origin blog.csdn.net/feichangyanse/article/details/129371258