Detailed explanation of the backtracking mechanism of the PyTorch dynamic graph


Hello everyone, I am Tai Ge. After a month of serialization, "Mastering PyTorch in 5 Minutes" has covered the routine operations and computing techniques of tensors. The following chapters move into the deep learning part, combining theory and code to help you understand the details.

To learn and understand the dynamic graph backtracking mechanism, we start with the differential calculation of tensors.

Notice

In this section, we will not distinguish between the differential, the derivative, and the gradient for the time being; the distinction will be made later when we explain gradient descent. For now, you can simply treat all of them as the derivative.

1 Variable and requires_grad

Some students will say that a Tensor needs to be converted into a Variable in advance before differential operations can be performed. In fact, since version 0.4 of PyTorch the concept of Variable has been gradually phased out: a Tensor is no longer a pure calculation carrier, and differentiability has become a basic attribute of Tensor itself. We only need to set the requires_grad attribute to True when creating a Tensor to make it differentiable.

import torch

x = torch.tensor(1., requires_grad=True)
x
# tensor(1., requires_grad=True)

At this point, the tensor x is a differentiable tensor; requires_grad is one of its attributes and can be viewed and modified.

# check differentiability
x.requires_grad
# True

# modify differentiability
x.requires_grad = False

# check differentiability again
x.requires_grad
# False
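As a side note, PyTorch also provides the in-place method requires_grad_() to toggle the same attribute; a minimal sketch:

# switch differentiability back on with the in-place method
x.requires_grad_(True)
x.requires_grad
# True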

2 Properties of differentiability

Differentiability is reflected in all operations involving differentiable tensors.

  • requires_grad attribute: differentiability
# build a differentiable tensor
x = torch.tensor(1., requires_grad=True)
x
# tensor(1., requires_grad=True)

# build a functional relationship
y = x ** 2
y
# tensor(1., grad_fn=<PowBackward0>)

We find that the tensor y now has a grad_fn attribute whose value is <PowBackward0>; we can view this attribute directly:

y.grad_fn
# <PowBackward0 at 0x200a2047208>

grad_fn stores the differential function of the Tensor, i.e., the functional relationship through which the differentiable tensor was computed. Here y is obtained from x by a power operation (the pow method), which corresponds to the <PowBackward0> attribute returned above.

# x, as the initial tensor, has no grad_fn attribute
x.grad_fn
# (returns None)

It is worth noting that y not only has the power relationship y = x^2 with x; more importantly, y is itself a tensor computed from the tensor x:

# 打印y
y
# tensor(1., grad_fn=<PowBackward0>)

A tensor (y) generated from a differentiable tensor (x) is itself differentiable:

y.requires_grad
# True

Compared with x, y not only holds a tensor value and is differentiable, but additionally stores the function calculation information from x to y.

Let's build another functional relationship on top of y: z = y + 1.

z = y + 1
z
# tensor(2., grad_fn=<AddBackward0>)

# z is also differentiable
z.requires_grad
# True

# z stores the function (add) from y to z
z.grad_fn
# <AddBackward0 at 0x200a2037648>

We can see that z also stores its value, is differentiable, and records its calculation relationship (add) with y.
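Because these functional relationships are recorded, PyTorch can later use them to compute derivatives automatically. As a minimal sketch, calling backward() on the chain we just built gives the derivative of z = x^2 + 1 at x = 1:

# compute the derivative of z with respect to the leaf tensor x
z.backward()
x.grad
# tensor(2.)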

3 Backtracking mechanism

In PyTorch's tensor calculation process, if we set the initial tensor to be differentiable, then every new tensor computed from it is also differentiable, and each one saves the functional relationship of the step that produced it. This is the so-called backtracking mechanism.

According to this backtracking mechanism, we can clearly trace the calculation process of each tensor step by step, and draw a tensor calculation graph accordingly.
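To make the backtracking concrete, here is a minimal sketch that walks backwards along grad_fn.next_functions, the attribute each backward function uses to link to the functions of its inputs:

import torch

# the same chain as before: x -> y = x ** 2 -> z = y + 1
x = torch.tensor(1., requires_grad=True)
y = x ** 2
z = y + 1

# walk backwards through the recorded functions, starting from z
def trace(fn, depth=0):
    if fn is None:                       # inputs that need no gradient appear as None
        return
    print('  ' * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        trace(next_fn, depth + 1)

trace(z.grad_fn)
# AddBackward0
#   PowBackward0
#     AccumulateGrad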

4 Tensor calculation graph

With the help of the backtracking mechanism, we can abstract the complex calculation process of tensors into a graph (Graph). For example, the three tensors x, y, and z defined earlier, together with the calculation relationships among them, can be represented by the following figure.

Calculation graph definition

The computational graph model is composed of nodes and edges. Nodes represent the objects being operated on, that is, tensors; edges between nodes represent the functional relationships between tensors, and their direction represents the actual direction of computation.

Node types

In a tensor computation graph, although every node represents a differentiable tensor, there are slight differences between nodes, as in the previous example:

  • y and z store the functional calculation relationship (grad_fn), but x does not
  • z is the end point of all the calculations

Therefore, nodes can be divided into three categories, namely:

  1. Leaf nodes: the differentiable tensors given as initial input; in the previous example, x
  2. Output node: the last computed tensor; in the previous example, z
  3. Intermediate nodes: all nodes in a calculation graph other than the leaf nodes and the output node; in the previous example, y

In a calculation graph, there can be multiple leaf nodes and intermediate nodes, but in most cases, there is only one output node. If there are multiple output results, we often save them in a vector.
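We can verify this classification with the is_leaf attribute of each tensor:

# only the initial input x is a leaf node
x.is_leaf, y.is_leaf, z.is_leaf
# (True, False, False)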

5 Dynamics of the computational graph

The computational graph in PyTorch is a dynamic computational graph: it is generated automatically according to the calculation process of the differentiable tensors, and it is continuously updated as new tensors or operations are added. This makes PyTorch's computational graph more flexible, efficient, and easy to build.
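A minimal sketch of this dynamism: because the graph is built as the code runs, ordinary Python control flow such as a loop simply appends new nodes on each iteration.

import torch

x = torch.tensor(2., requires_grad=True)
y = x
for _ in range(3):      # each iteration appends another multiplication node to the graph
    y = y * x
y.backward()            # y = x ** 4, so dy/dx = 4 * x ** 3 = 32 at x = 2
x.grad
# tensor(32.)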

A static graph (TensorFlow 1.x) requires building the calculation flow graph first, and then passing in the data and executing it in a session. For a specific code comparison and analysis, see "How to choose between TF and PyTorch".


Origin: blog.csdn.net/Antai_ZHU/article/details/121904172