PyTorch Basics (2): Automatic Differentiation with Autograd

1. Introduction

As mentioned in the previous article, PyTorch provides two important high-level features:

  • Tensor computation with strong GPU acceleration (similar to NumPy)
  • Building deep neural networks on top of an automatic differentiation system

We will make heavy use of the first feature later on, so we won't dwell on it here. Let's first discuss what automatic differentiation (Autograd) is and how it works.

  • Of course, we must first make sure the torch package is imported correctly
import torch
# Print the PyTorch version
torch.__version__

PyTorch's Autograd module implements backpropagation to compute gradients in deep learning algorithms. Autograd provides automatic differentiation for all operations on tensors (the Tensor class), which removes the tedious process of computing derivatives by hand.

When a tensor is created with the requires_grad flag set to True, we tell PyTorch that this tensor needs automatic differentiation. PyTorch then records every operation performed on the tensor and uses that history to compute gradients automatically.

  • First, we create a new tensor x
# requires_grad=True tells PyTorch that this tensor needs automatic differentiation;
# PyTorch will record every operation on it and compute gradients automatically.
x = torch.rand(5, 5, requires_grad=True)
x


PyTorch automatically tracks and records all operations performed on the tensor. When the computation is finished, calling the .backward() method computes the gradients and stores them in the .grad attribute.

  • Then we create another tensor y
y = torch.rand(5, 5, requires_grad=True)
y


2. What is Autograd

  • 1. When it comes to large neural networks, none of us are good enough at calculus to compute such complex derivatives by hand whenever we need them. Explicitly solving the math to obtain the gradient of such a huge composite function is unrealistic, especially since these functions live in very high-dimensional spaces that are hard to even picture. This is where PyTorch's Autograd comes in: it abstracts away the complicated mathematics and "magically" computes the gradients of high-dimensional functions with just a few lines of code.
  • 2. Each tensor has a .grad_fn attribute. If the tensor was created manually by the user, its grad_fn is None. A quick check is shown in the sketch below.
  • 3. Once the tensor has been used in an operation, grad_fn is assigned a Function object, the one that created the resulting tensor. Tensors and Functions are linked together into an acyclic graph that records and encodes the complete computation history.
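
As a quick illustration of the last two points (a minimal sketch; the names u and v are new and only used here):

u = torch.rand(2, 2, requires_grad=True)  # created by hand, so it is a leaf
print(u.grad_fn)                          # None
v = u + 1                                 # produced by an operation
print(v.grad_fn)                          # an AddBackward0 object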

2.1 Simple automatic differentiation

  • Case 1
z = torch.sum(x+y)
z
  • Automatic differentiation
z.backward()
print(x.grad, y.grad)

Both x.grad and y.grad come out as 5×5 tensors of ones, since z = sum(x + y) and the partial derivative of z with respect to each element of x and y is 1.

If the Tensor is a scalar (i.e., it holds exactly one element), you do not need to pass any arguments to backward(); if it has more elements, you must pass a gradient argument whose shape matches the tensor. The z.backward() above is therefore shorthand for z.backward(torch.tensor(1.)). Scalar outputs of this kind appear frequently in single-label image classification, where the network produces a single scalar for the image's label.
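
A small sketch of that equivalence, repeating Case 1 with fresh tensors (so gradients don't accumulate on top of the previous run) and passing the gradient argument explicitly:

x = torch.rand(5, 5, requires_grad=True)
y = torch.rand(5, 5, requires_grad=True)
z = torch.sum(x + y)
z.backward(torch.tensor(1.))  # identical to z.backward() for a scalar z
print(x.grad)                 # again a 5x5 tensor of ones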

2.2 Complex automatic differentiation

  • Case 2
x = torch.rand(5, 5, requires_grad=True)
y = torch.rand(5, 5, requires_grad=True)
z = x**2 + y**3
z


  • Automatic differentiation
# Our result is not a scalar, so we must pass a tensor of the same shape as the gradient argument;
# here we use ones_like to create one with the same shape as x.
z.backward(torch.ones_like(x))
print(x.grad)
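
As a quick sanity check (a small sketch, not in the original post): for z = x**2 + y**3 with an all-ones upstream gradient, x.grad should equal 2*x and y.grad should equal 3*y**2.

print(torch.allclose(x.grad, 2 * x))       # True
print(torch.allclose(y.grad, 3 * y ** 2))  # True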

We can use the with torch.no_grad() context manager to temporarily disable gradient tracking for tensors that have requires_grad=True set. This is commonly used when computing accuracy on the test set.

with torch.no_grad():
    print((x + y**2).requires_grad)
    # Prints False: inside the context, gradient tracking is temporarily disabled
    # even for tensors derived from inputs with requires_grad=True.
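
For context, here is a minimal sketch of the test-set accuracy pattern mentioned above (model and test_loader are hypothetical placeholders, not defined in this article):

correct = 0
total = 0
with torch.no_grad():  # no computation graph is built, saving memory and time
    for images, labels in test_loader:   # test_loader: a hypothetical DataLoader
        outputs = model(images)          # model: a hypothetical trained network
        predicted = outputs.argmax(dim=1)
        correct += (predicted == labels).sum().item()
        total += labels.size(0)
print('accuracy:', correct / total)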

3. A simple analysis of the Autograd process

To illustrate how PyTorch's automatic differentiation works, let's dig into PyTorch's source a little. Although PyTorch's Tensor and TensorBase are implemented in C++, we can still inspect the attributes and state of these objects from Python. Python's dir() function returns a list of an object's attributes and methods. The z in Case 2 above is a Tensor, so let's see what members it has.

dir(z)

This prints a long list of names. Setting aside the special methods (those starting and ending with __) and the private methods (those starting with _), let's look at a few of the main attributes:

  • .is_leaf : records whether the tensor is a leaf node. This attribute tells you what kind of variable you are dealing with. The "graph leaves" or "leaf variables" mentioned in the official documentation are variables such as x and y that were created by hand rather than computed; we will call these created variables. A variable like z, obtained as the result of a computation, is called a result variable.
  • To determine whether a tensor is a created variable or a result variable, check its .is_leaf attribute:
print('x.is_leaf=' + str(x.is_leaf)) # prints True
print('z.is_leaf=' + str(z.is_leaf)) # prints False

x was created by hand and is not the result of a computation, so it is treated as a leaf node, i.e. a created variable. z was obtained through a series of computations on x and y, so it is not a leaf node; it is a result variable.

  • grad_fn : every tensor has a grad_fn attribute that records the Function which produced it. If the tensor was created by the user, grad_fn is None.
    Why are x.grad and y.grad updated when we call z.backward()? The .grad_fn attribute records exactly this part of the computation. Although .backward() is also implemented in C++, we can explore it a bit from Python.
z.grad_fn

grad_fn is an object of type AddBackward0. The AddBackward0 class is also written in C++, but from its name we can guess that it represents the backward pass of an addition (Add). Let's see what's inside it.

dir(z.grad_fn)
  • Output result
['__call__',
 '__class__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '_register_hook_dict',
 'metadata',
 'name',
 'next_functions',
 'register_hook',
 'requires_grad']

next_functions is the essential piece of grad_fn.

z.grad_fn.next_functions

next_functions is a tuple containing two inner tuples, each holding a PowBackward0 object and an int.
Why are there two tuples? Because our operation was z = x**2 + y**3: the AddBackward0 we just saw corresponds to the addition, and the operations before it are the power operations (PowBackward0). The first tuple is the operation record associated with x.
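
A quick way to see this (a small sketch using the z from Case 2):

print([type(fn).__name__ for fn, _ in z.grad_fn.next_functions])
# ['PowBackward0', 'PowBackward0']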

  • View the first element of the first tuple in z.grad_fn.next_functions
xg = z.grad_fn.next_functions[0][0]
dir(xg)
  • Output result
['__call__',
 '__class__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '_register_hook_dict',
 'metadata',
 'name',
 'next_functions',
 'register_hook',
 'requires_grad']
  • Continuing the analysis, let's check xg's next_functions
x_leaf=xg.next_functions[0][0]
type(x_leaf)
  • Output result
AccumulateGrad

In PyTorch's backward graph computation, the AccumulateGrad type marks a leaf node, i.e. an end node of the computation graph. The AccumulateGrad class has a .variable attribute that points to the leaf tensor.

x_leaf.variable

The .variable attribute is exactly the x variable we created.
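
A quick check (a sketch, assuming x_leaf and the x from Case 2 are still in scope):

print(x_leaf.variable is x)  # True: the leaf node points back to our tensor x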

So the whole procedure is very clear:

  • 1. When we call z.backward(), the call uses z's grad_fn attribute to start the gradient computation.

  • 2. The process traverses grad_fn's next_functions, takes out the Function objects inside, and performs their gradient computations. This is a recursive process that ends when a leaf-type node (AccumulateGrad) is reached; a sketch of this traversal appears after this list.

  • 3. Once a result is computed, it is saved into the grad attribute of the tensor (x or y) referenced by the corresponding variable attribute.

  • 4. The derivation is finished: the grad attributes of all leaf nodes have been updated. So after z.backward() returns, the grad values of x and y have been updated accordingly.
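
Here is a minimal sketch (not from the original post) of such a traversal: it walks next_functions recursively and prints each node type, using the z from Case 2.

def print_graph(fn, depth=0):
    # Recursively print the backward graph, mirroring the traversal described above.
    if fn is None:
        return
    print('  ' * depth + type(fn).__name__)
    if hasattr(fn, 'variable'):
        # AccumulateGrad leaves expose the tensor whose .grad they update.
        print('  ' * (depth + 1) + 'leaf tensor of shape ' + str(tuple(fn.variable.shape)))
    for next_fn, _ in getattr(fn, 'next_functions', ()):
        print_graph(next_fn, depth + 1)

print_graph(z.grad_fn)
# Prints AddBackward0, then the two PowBackward0 nodes, then the AccumulateGrad leaves for x and y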

  • Now, let's analyze the following case to understand Autograd's workflow a bit further.

# Import the required packages
import torch
import torch.autograd

# Create tensors x and y
x = torch.tensor([1.0], requires_grad=True)
y = torch.tensor([2.0], requires_grad=False)
z = x * y

# Compute the gradient of x
z.backward(torch.ones_like(x))
x.grad

  • Output result
tensor([2.])

Since z = x * y, the gradient of z with respect to x is y = 2, which matches the output.

(Figure: the computation graph for this example.)
Image source: https://zhuanlan.zhihu.com/p/148669484

From the computation graph we can see that, within a tensor:

  • the stored data lives in data
  • the gradient is stored in grad
  • requires_grad indicates whether to track the history of all operations on it

These attributes can be inspected directly, as in the sketch below.
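
A minimal sketch, using the x from the case above (created with requires_grad=True, gradient just computed):

print(x.data)           # the raw values, tensor([1.])
print(x.grad)           # tensor([2.]) after the backward call above
print(x.requires_grad)  # True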

Autograd's workflow in this specific case:
x and y are tensors we created ourselves; multiplying them produces the tensor z.

  • 1. When we call the .backward() method, the call uses tensor z's grad_fn attribute. As noted earlier, grad_fn records the operation that produced the tensor; if the tensor was created directly by the user, its grad_fn is None.
  • 2. The process traverses grad_fn's next_functions, takes out the Function objects inside (ending in AccumulateGrad), and performs their gradient computations. This is a recursive process that ends at the leaf nodes.
  • 3. Once a result is computed, it is saved into the grad attribute of the tensor (x) referenced by the corresponding variable attribute.
    The derivation is then finished, and the grad attributes of all leaf nodes have been updated.
  • 4. Finally, once z.backward() has returned, the grad value in x has been updated.

4. Extending Autograd

If you want to customize Autograd with a new operation, you need to extend the Function class, because Function is what Autograd uses to compute results and gradients and to encode the operation history. The most important methods of the Function class are forward() and backward(), which implement forward propagation and backpropagation, respectively.

A custom Function requires the following three methods:

  • __init__ (optional): if the operation needs extra parameters, define a constructor for the Function; otherwise it can be omitted.

  • forward(): the code that performs the forward computation.

  • backward(): the code that computes gradients during backpropagation. It takes as many arguments as forward() has return values, each representing the gradient flowing back into this operation, and it must return one gradient per forward() input.

# Import Function so we can extend it
from torch.autograd.function import Function
# Define an operation that multiplies a tensor by a constant (the inputs are a tensor and a constant)
# The methods must be static, hence the @staticmethod decorator
class MulConstant(Function):
    @staticmethod
    def forward(ctx, tensor, constant):
        # ctx is used to stash information (similar to self); attributes set on ctx
        # here can be read back in backward()
        ctx.constant = constant
        return tensor * constant
    @staticmethod
    def backward(ctx, grad_output):
        # backward must return one gradient per forward() input:
        # the first input was the 3x3 tensor, whose gradient is grad_output * constant;
        # the second was a plain constant, whose gradient must be None.
        return grad_output * ctx.constant, None

After defining our new operation, let’s test it

a = torch.rand(3, 3, requires_grad=True)
b = MulConstant.apply(a, 5)
print("a:" + str(a))
print("b:" + str(b)) # b is a with every element multiplied by 5

For backpropagation: since the return value b is not a scalar, the backward() method needs a gradient argument.

b.backward(torch.ones_like(a))
a.grad
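
To gain confidence in a custom backward(), we can use torch.autograd.gradcheck, which compares the analytical gradient against a numerical estimate. A minimal sketch (gradcheck expects double-precision inputs with requires_grad=True):

from torch.autograd import gradcheck

inp = torch.rand(3, 3, dtype=torch.double, requires_grad=True)
# Prints True if MulConstant's analytical gradient matches the numerical estimate
print(gradcheck(lambda t: MulConstant.apply(t, 5), (inp,), eps=1e-6, atol=1e-4))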



Origin blog.csdn.net/dongjinkun/article/details/113791715