PyTorch study notes 02: automatic differentiation, with problems encountered and their solutions


1. Related Articles

An article for a first look at automatic differentiation in PyTorch:
https://www.cnblogs.com/cocode/p/10746347.html

These two articles are more detailed:
https://www.cnblogs.com/charleechan/p/12255191.html
https://zhuanlan.zhihu.com/p/51385110

Example:
http://blog.sina.com.cn/s/blog_573ef4200102xcxr.html

2. The General Idea

(1) Main principles

PyTorch builds a dynamic computation graph (a DAG) as operations on tensors are executed, recording the whole computation. Automatic differentiation then amounts to differentiating the root node of this graph with respect to its leaf nodes.

(2) Key process

  1. To have autograd track all operations on a tensor, set its .requires_grad attribute to True.
  2. To compute gradients, call .backward() on the resulting tensor; the gradients are then accumulated into the .grad attribute of the leaf tensors.
  3. To stop a tensor from tracking history, call .detach(); this separates it from its computation history and prevents future computation from being recorded.

(3) Example: differentiating z with respect to x

  1. Set requires_grad=True on x when it is created
  2. Call z.backward()
  3. Read the result from x.grad

Note:
Make sure that x is a leaf node of the computation graph (you can check this with x.is_leaf) and that z is the root node.
If a tensor is not a leaf node, you can still keep its gradient by calling .retain_grad() on it. A minimal example of these steps follows.
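For example (the value x = 2 below is my own choice, just for illustration):

import torch

# Step 1: create x as a leaf tensor that tracks gradients
x = torch.tensor([2.0], requires_grad=True)
print(x.is_leaf)      # True: x was created directly, not produced by another operation

# z is the root of the computation graph built from x
z = 3 * x ** 2

# Step 2: backpropagate from the root
z.backward()

# Step 3: read the gradient off the leaf node
print(x.grad)         # dz/dx = 6x = 12 -> tensor([12.])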

(4) A little explanation of z.backward()

  1. If z is a scalar, the gradients argument defaults to None, so you can call z.backward() directly.
  2. If z is a (non-scalar) tensor, you must pass a gradient tensor of the same shape as z.

  In most cases we differentiate a scalar. For example, in a neural network the loss is a scalar, and we differentiate it with respect to the network parameters w simply by calling loss.backward().

  Sometimes, however, there are several output values, e.g. loss = [loss1, loss2, loss3, loss4]. In that case we can differentiate each component of loss with respect to x by passing one weight per component:
loss.backward(torch.tensor([[1.0,1.0,1.0,1.0]]))

  If you want different components to have different weights, then just give gradients different values, for example:
loss.backward(torch.tensor([[0.1,1.0,10.0,0.001]]))
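A small self-contained sketch of this behaviour (the shapes and values below are my own, chosen only for illustration):

import torch

x = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)
loss = x * x                       # a 4-component "loss", one entry per element of x

# equal weights: each component contributes d(loss_i)/d(x_i) = 2*x_i
loss.backward(torch.tensor([1.0, 1.0, 1.0, 1.0]))
print(x.grad)                      # tensor([2., 4., 6., 8.])

# different weights scale the corresponding gradients
x.grad.zero_()
loss = x * x                       # rebuild the graph before calling backward again
loss.backward(torch.tensor([0.1, 1.0, 10.0, 0.001]))
print(x.grad)                      # tensor([ 0.2000,  4.0000, 60.0000,  0.0080])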

(5) Moving some computations outside the computation graph

Sometimes, we want to move some computations outside of the recorded computation graph. For example, suppose y is computed as a function of x and z is computed as a function of y and x. Imagine we want to compute the gradient of z with respect to x, but for some reason we want to treat y as a constant and only take into account x's contribution after y has been computed.

Here, we can detach y to obtain a new variable u that has the same value as y but discards all information about how y was computed. In other words, the gradient does not flow backwards through u to x. The backward pass below therefore computes the partial derivative of z = u * x with respect to x, treating u as a constant, rather than the partial derivative of z = x * x * x with respect to x.

import torch

x = torch.arange(4.0, requires_grad=True)   # define x here so the snippet is self-contained
y = x * x
u = y.detach()        # same values as y, but cut off from the computation graph
z = u * x

z.sum().backward()
x.grad == u           # dz/dx equals u, since u is treated as a constant

---------------------
tensor([True, True, True, True])

3. Problems Encountered

Problem 1: tensors that are not leaf nodes of the computation graph


import torch

x = torch.ones(1, requires_grad=True)*3
y = torch.ones(1, requires_grad=True)*4

z = torch.pow(x, 2) + 3*torch.pow(y, 2)
z.backward()
print(x.grad)        # at x = 3, dz/dx = 2x = 2*3 = 6
print(y.grad)        # at y = 4, dz/dy = 6y = 6*4 = 24

Error:

UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won’t be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.

Reason: x and y each go through a multiplication immediately after creation, so they are no longer leaf nodes of the computation graph.

x = torch.ones(1, requires_grad=True)*3
y = torch.ones(1, requires_grad=True)*4
print(x.is_leaf)

------
False

Checking the is_leaf attribute of x confirms that it is False.

If the multiplication is removed from the initialization, is_leaf is True:

x = torch.ones(1, requires_grad=True)
y = torch.ones(1, requires_grad=True)
print(x.is_leaf)

------
True

Solutions (each case needs its own analysis; the following are just some feasible approaches):

Solution 1:
Call retain_grad() on the non-leaf tensors to retain their gradients

import torch
from torch.autograd import Variable

x = Variable(torch.ones(1)*3, requires_grad=True)   # multiplication happens before requires_grad is set, so x is a leaf
y = Variable(torch.ones(1)*4, requires_grad=True)
print(x.is_leaf)                         # True

x = torch.ones(1, requires_grad=True)*3  # the multiplication at initialization means x is no longer a leaf node
y = torch.ones(1, requires_grad=True)*4
print(x.is_leaf)                         # False
x.retain_grad()
y.retain_grad()
z = torch.pow(x, 2) + 3*torch.pow(y, 2)
z.backward()

print(x.grad)        # at x = 3, dz/dx = 2x = 2*3 = 6
print(y.grad)        # at y = 4, dz/dy = 6y = 6*4 = 24

Solution 2:
Do the multiplication before enabling gradient tracking, so x and y are created directly as leaf tensors (here via Variable; in current PyTorch, torch.tensor([3.0], requires_grad=True) achieves the same thing)

import torch
from torch.autograd import Variable

x = Variable(torch.ones(1)*3, requires_grad=True)
y = Variable(torch.ones(1)*4, requires_grad=True)
print(x.is_leaf)

z = torch.pow(x, 2) + 3*torch.pow(y, 2)
z.backward()
print(x.grad)        # at x = 3, dz/dx = 2x = 2*3 = 6
print(y.grad)        # at y = 4, dz/dy = 6y = 6*4 = 24

-----------
True
tensor([6.])
tensor([24.])

Solution 3:
Use with torch.no_grad(): together with in-place operations

import torch

x = torch.ones(1, requires_grad=True)
y = torch.ones(1, requires_grad=True)

with torch.no_grad():   # updates inside this block are not recorded in the graph
    x *= 3
    y *= 4

z = torch.pow(x, 2) + 3*torch.pow(y, 2)
z.backward()

print(x.is_leaf)     # True: the in-place update keeps x a leaf node
print(x)             # tensor([3.], requires_grad=True)
print(y)             # tensor([4.], requires_grad=True)
print(x.grad)        # dz/dx = 2x = 6 -> tensor([6.])

Similar question: https://stackoverflow.com/questions/65532022/lack-of-gradient-when-creating-tensor-from-numpy

Problem 2: gradients are not cleared automatically (use of .grad.zero_())

An illustration of how PyTorch accumulates gradients:

import torch

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

for i in range(3):
    a = torch.add(w, x)          # a = w + x = 3
    b = torch.add(w, 1)          # b = w + 1 = 2
    y = torch.mul(a, b)          # y = (w + x) * (w + 1)

    y.backward()
    print(w.grad)                # dy/dw = a + b = 5, accumulated across iterations

Result:
tensor([5.])
tensor([10.])
tensor([15.])

After adding w.grad.zero_():

import torch

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

for i in range(3):
    a = torch.add(w, x)
    b = torch.add(w, 1)
    y = torch.mul(a, b)

    y.backward()
    print(w.grad)

    w.grad.zero_()               # clear the accumulated gradient before the next iteration
Result:
tensor([5.])
tensor([5.])
tensor([5.])
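This is also why a typical training loop clears gradients on every iteration. A rough sketch of the usual pattern (the model, optimizer and data below are placeholders of my own, not from the original notes):

import torch

model = torch.nn.Linear(2, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(8, 2)
targets = torch.randn(8, 1)

for step in range(3):
    opt.zero_grad()                              # clear gradients left over from the previous step
    loss = ((model(inputs) - targets) ** 2).mean()
    loss.backward()                              # .grad now holds only this step's gradients
    opt.step()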

Problem 3: .data and .detach()

The difference between .data and .detach
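As a rough sketch of the usual distinction (my own example): an in-place change made through a tensor returned by .detach() is caught by autograd's version check, while the same change made through .data slips past it and can silently corrupt gradients.

import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = a.sigmoid()
c = out.detach()
c.zero_()                    # modifies out in place; autograd notices the change
# out.sum().backward()       # would raise: "... has been modified by an inplace operation"

b = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out2 = b.sigmoid()
out2.data.zero_()            # also modifies out2 in place, but bypasses the version check
out2.sum().backward()        # no error is raised...
print(b.grad)                # ...but the gradient is computed from the zeroed values (all zeros here)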

Problem 4: PyTorch in-place operations

Notes on torch.no_grad() and in-place operations
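A rough sketch (my own example) of the kind of error an in-place operation can trigger during backpropagation:

import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x.sigmoid()       # sigmoid saves its output for the backward pass
y.add_(1)             # in-place change to a tensor that backward still needs
# y.sum().backward()  # raises RuntimeError: one of the variables needed for gradient
#                     # computation has been modified by an inplace operation

In-place updates done inside with torch.no_grad(): (as in Solution 3 above) are not recorded, which is why parameter updates are normally performed there.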


Source: blog.csdn.net/zzhhjjjj/article/details/112912971