Original link: Detailed explanation of the backtracking mechanism of the PyTorch dynamic graph
Hello everyone, I am Tai Ge. After a month of serialization, "Mastering PyTorch in 5 Minutes" has covered the routine operations and computational techniques of tensors. The following chapters move into the deep learning part, presented as a combination of theory and code to help you understand the details.
To learn and understand the dynamic graph backtracking mechanism, we start with the differential calculation of tensors.
Notice
In this section we will not yet distinguish between differentials, derivatives, and gradients; the distinction will be made later when explaining gradient descent. For now, you can simply think of all of them as derivatives.
1 Variable and requires_grad
Some students may say that a Tensor must first be converted into a Variable before differential operations can be performed. In fact, since version 0.4 of PyTorch, the concept of Variable has been gradually phased out: Tensor is no longer a pure computation carrier, and differentiability has become a basic attribute of Tensor. To make a tensor differentiable, we only need to set the requires_grad attribute to True when creating it.
import torch

x = torch.tensor(1., requires_grad=True)
x
# tensor(1., requires_grad=True)
At this time, the tensor x
is a differentiable tensor; requires_grad
is one of its attributes and can be viewed and modified.
# View differentiability
x.requires_grad
# True
# Modify differentiability
x.requires_grad = False
# View differentiability again
x.requires_grad
# False
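Besides assigning to the attribute directly, differentiability can also be toggled with the in-place method requires_grad_() (note the trailing underscore); a minimal sketch:

```python
import torch

# Create a non-differentiable tensor, then switch differentiability on in place
x = torch.tensor(1.)
x.requires_grad_(True)   # equivalent to x.requires_grad = True
print(x.requires_grad)   # True

# It can be switched off again the same way
x.requires_grad_(False)
print(x.requires_grad)   # False
```

Note that, like other PyTorch methods ending in an underscore, requires_grad_() modifies the tensor in place rather than returning a new one.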
2 Properties of differentiability
Differentiability is reflected in all operations involving differentiable tensors, and it propagates through the requires_grad attribute.
# Build a differentiable tensor
x = torch.tensor(1., requires_grad = True)
x
# tensor(1., requires_grad=True)
# Build a functional relation
y = x ** 2
y
# tensor(1., grad_fn=<PowBackward0>)
We find that the tensor y
now has a grad_fn
attribute whose value is <PowBackward0>
; we can view this attribute:
y.grad_fn
# <PowBackward0 at 0x200a2047208>
grad_fn stores the differential function of the Tensor; in other words, it records the functional relationship the differentiable tensor took part in during computation. Here x was raised to a power to obtain y, i.e. the pow method was applied, which corresponds to the <PowBackward0> attribute returned above.
# But x, as the initial tensor, has no grad_fn attribute
x.grad_fn
# None
It is worth noting that y not only has the power relation y = x² with x; more importantly, y
is itself a tensor computed from x:
# Print y
y
# tensor(1., grad_fn=<PowBackward0>)
A tensor ( y
) generated from a differentiable tensor ( x
) is itself differentiable:
y.requires_grad
# True
Compared with x, y not only holds a tensor value and is differentiable, but additionally stores the functional information of the computation from x to y.
Let's build a new functional relation around y: z = y + 1
:
z = y + 1
z
# tensor(2., grad_fn=<AddBackward0>)
# z is also differentiable
z.requires_grad
# True
# z stores the add function from y to z
z.grad_fn
# <AddBackward0 at 0x200a2037648>
We can see that z
also stores a value, is differentiable, and records its computational relationship ( add
) with y.
3 Backtracking mechanism
In PyTorch's tensor computation process, if we set the initial tensor to be differentiable, then every new tensor computed from it is also differentiable, and each such tensor saves the functional relationship of the previous step. This is the so-called backtracking mechanism.
According to this backtracking mechanism, we can clearly grasp each step of a tensor's computation and draw a tensor computation graph accordingly.
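A minimal sketch of this mechanism in action: we build the chain from x to z as before, then call backward() on the output. PyTorch traces the saved grad_fn chain back to the leaf tensor and accumulates the gradient there (here dz/dx = 2x = 2):

```python
import torch

x = torch.tensor(1., requires_grad=True)
y = x ** 2        # grad_fn=<PowBackward0>
z = y + 1         # grad_fn=<AddBackward0>

# backward() walks the saved graph from z back to the leaf x
z.backward()
print(x.grad)     # tensor(2.)  since dz/dx = 2x = 2
```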
4 Tensor computation graph
With the help of the backtracking mechanism, we can abstract the complex computation process of tensors into a graph ( Graph
). For example, the computational relationships among the three tensors x, y, and z defined earlier can be represented by the following figure.
Computation graph definition
A computational graph model consists of nodes ( nodes
) and edges ( edges
). Nodes represent the data, that is, the tensors; edges between nodes represent the functional relationships between tensors, and their directions indicate the actual direction of computation.
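To see the edges concretely, we can peek at grad_fn.next_functions, an internal attribute (version-dependent and not part of the documented user-facing API) that links each differential function to its predecessors, which is how the graph can be walked from the output back to the leaves:

```python
import torch

x = torch.tensor(1., requires_grad=True)
y = x ** 2
z = y + 1

# Each grad_fn records its predecessor functions in next_functions
print(type(z.grad_fn).__name__)        # AddBackward0
for fn, _ in z.grad_fn.next_functions:
    if fn is not None:                 # constant inputs appear as None
        print(type(fn).__name__)       # PowBackward0 (the edge back to y)
```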
Node types
In a tensor computation graph, although every node represents a differentiable tensor, there are slight differences between nodes. In the previous example:
- y and z store the functional relationship of their computation, while x does not
- z is the end point of all computations
Therefore, nodes can be divided into three categories:
- Leaf nodes : the differentiable tensors of the initial input; x in the previous example
- Output node : the last computed tensor; z in the previous example
- Intermediate nodes : all nodes in a computation graph other than leaf and output nodes; y in the previous example
A computation graph can contain multiple leaf nodes and intermediate nodes, but in most cases there is only one output node; if there are multiple output results, they are usually collected into a single vector.
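PyTorch exposes the leaf/non-leaf distinction directly through the is_leaf attribute of tensors; checking the three nodes from the example:

```python
import torch

x = torch.tensor(1., requires_grad=True)
y = x ** 2
z = y + 1

print(x.is_leaf)   # True  -- leaf node (initial input)
print(y.is_leaf)   # False -- intermediate node
print(z.is_leaf)   # False -- output node
```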
5 Dynamics of Computational Graphs
PyTorch's computation graph is a dynamic computation graph: it is generated automatically as differentiable tensors are computed, and it is continuously updated as new tensors or operations are added. This makes PyTorch's computation graphs flexible, efficient, and easy to build.
A static graph ( TF1
) requires building the computation flow graph first, and then executing a session with the data passed in. For a detailed code comparison and analysis, see "How to choose between TF and PyTorch".