[PyTorch 3.2 autograd] autograd summary

Note link: Autograd

The following is the content of the note:
3.2 Autograd
torch.autograd is an automatic differentiation engine developed to spare users from computing gradients by hand. It automatically builds a computation graph from the inputs and the forward pass, and then performs backpropagation.
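As a quick orientation, here is a minimal sketch of the same idea using the current API, assuming PyTorch >= 0.4 where Variable has been merged into Tensor (so requires_grad is set on the tensor directly):

import torch as t

x = t.ones(2, 2, requires_grad=True)   # leaf tensor whose operations are tracked
y = (x ** 2).sum()                     # the forward pass builds the graph
y.backward()                           # backpropagation
print(x.grad)                          # d(sum(x^2))/dx = 2x, i.e. a 2x2 tensor of 2s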
3.2.1 Variable
from __future__ import print_function
import torch as t
from torch.autograd import Variable as V
"""--------------------"""

To create a Variable from a tensor, you need to specify whether it requires gradients (requires_grad)

a = V(t.ones(3, 4), requires_grad=True)
a
"""--------------------"""
b = V(t.zeros(3,4))
b
“”"--------------------"""

Variables support the same operations as Tensors

c = a + b

c = a.add(b)
c
“”"--------------------"""
d = c.sum()
d.backward() # backpropagation
"""--------------------"""

Pay attention to the difference between the two:

The former takes the underlying tensor via .data, so the sum is computed on a plain tensor and returns a float, outside the computation graph

The latter is still a Variable after the sum and remains part of the computation graph

c.data.sum(), c.sum()
“”"--------------------"""
a.grad
“”"--------------------"""

Although c is not explicitly marked as requiring gradients, c depends on a, and a requires gradients

Therefore, the requires_grad attribute of c is automatically set to True

a.requires_grad, b.requires_grad, c.requires_grad,d.requires_grad
“”"--------------------"""

Variables created by the user are leaf nodes, and their grad_fn is None

a.is_leaf,b.is_leaf,c.is_leaf,d.is_leaf
"""--------------------"""
# c.grad is None: c is not a leaf node, and its gradient is only used to compute the gradient of a
# Although c.requires_grad = True, the gradient is released as soon as it has been used
c.grad is None
The difference between manual differentiation and autograd
from __future__ import print_function
import torch as t
from torch.autograd import Variable as V

def f(x):
    """compute y"""
    y = x ** 2 * t.exp(x)
    return y

def gradf(x):
    """manually derived gradient"""
    dx = x * 2 * t.exp(x) + x**2 * t.exp(x)
    return dx
"""--------------------"""
x = V(t.randn(3,4),requires_grad = True)
y = f(x)
y
“”"--------------------"""
y.backward(t.ones(y.size())) # grad_variables has the same shape as y
x.grad
"""--------------------"""

The gradient computed by autograd matches the result of the manual derivation

gradf(x)
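The two can also be compared numerically; a small check (a sketch, assuming the x, y and gradf defined above and a PyTorch version where Variable and Tensor are merged):

print(t.allclose(x.grad, gradf(x).detach()))  # expected: True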
Details of autograd:
from __future__ import print_function
import torch as t
from torch.autograd import Variable as V

x = V(t.ones(1))
b = V(t.rand(1),requires_grad =True)
w = V(t.rand(1),requires_grad =True)
y = w * x # equivalent to y = w.mul(x)
z = y + b # equivalent to z = y.add(b)
"""--------------------"""
x.requires_grad, b.requires_grad,w.requires_grad
“”"--------------------"""

Although y.requires_grad is not explicitly set to True, y depends on w, which requires gradients

Therefore, y.requires_grad is True

y.requires_grad
“”"--------------------"""
x.is_leaf,w.is_leaf,b.is_leaf,y.is_leaf,z.is_leaf
“”"--------------------"""

next_functions stores the inputs of grad_fn; it is a tuple whose elements are also Functions

The first corresponds to y, the output of the multiplication (mul), so the corresponding backpropagation function y.grad_fn is MulBackward

The second corresponds to b, a leaf node created by the user, whose grad_fn is None, but which has an AccumulateGrad function to accumulate its gradient

z.grad_fn.next_functions
“”"--------------------"""

The grad_fn of a variable corresponds to a function (node) in the computation graph

z.grad_fn.next_functions[0][0] == y.grad_fn
“”"--------------------"""

The grad_fn of the leaf node is None

w.grad_fn, x.grad_fn

When computing the gradient of w, the value of x (${\partial y\over \partial w} = x $) is needed. These values are saved as buffers during the forward pass and are cleared automatically once the gradient has been computed. To be able to backpropagate more than once, you need to specify retain_graph to keep these buffers.
from __future__ import print_function
import torch as t
from torch.autograd import Variable as V

Use retain_graph to keep the buffers

z.backward(retain_graph=True)
w.grad
“”"--------------------"""

Backpropagating multiple times accumulates the gradients, which is the meaning of the AccumulateGrad node attached to w

z.backward()
w.grad
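Note that the second backward above was run without retain_graph=True, so the buffers of this graph have now been freed; a third backward on the same graph should fail. A hedged sketch of what to expect:

try:
    z.backward()
except RuntimeError as e:
    # typically something like "Trying to backward through the graph a second time ..."
    print('backward failed as expected:', e)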
PyTorch uses dynamic graphs: the computation graph is rebuilt from scratch on every forward pass, so Python control flow (for, if, etc.) can be used to build the graph on demand. This is very useful in natural language processing; it means you do not need to construct every possible graph path in advance, because the graph is constructed at runtime.
from __future__ import print_function
import torch as t
from torch.autograd import Variable as V

def abs(x):
    if x.data[0] > 0: return x
    else: return -x

x = t.ones(1, requires_grad =True)
y = abs(x)
y.backward()
x.grad
“”"--------------------"""
x = -1 * t.ones(1)
x = x.requires_grad_()
y = abs(x)
y.backward()
x.grad
“”"--------------------"""
def f(x):
    result = 1
    for ii in x:
        if ii.item() > 0: result = ii * result
    return result

x = t.arange(-2.0, 4.0, requires_grad=True)

y = f(x) # y = x[3]*x[4]*x[5]
y.backward()
x.grad
"""--------------------"""
Sometimes we may not want autograd to differentiate a tensor. Differentiation needs to cache many intermediate results, which adds extra memory/GPU-memory overhead, so we can turn automatic differentiation off. For scenarios that do not need backpropagation (such as inference, i.e. at test time), turning it off gives a certain speedup and saves roughly half the GPU memory, because no space needs to be allocated for gradients.
x = t.ones(1, requires_grad=True)
w = t.rand(1, requires_grad=True)
y = x * w

y depends on w, and w.requires_grad = True

x.requires_grad, w.requires_grad, y.requires_grad
“”"--------------------"""
with t.no_grad():
    x = t.ones(1)
    w = t.rand(1, requires_grad=True)
    y = x * w

y depends on w and x; although w.requires_grad = True, y.requires_grad is still False

x.requires_grad, w.requires_grad, y.requires_grad
"""--------------------"""

Equivalent to t.no_grad()

t.set_grad_enabled(False)
x = t.ones(1)
w = t.rand(1, requires_grad = True)
y = x * w

y depends on w and x; although w.requires_grad = True, y.requires_grad is still False

x.requires_grad, w.requires_grad, y.requires_grad
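Unlike the with t.no_grad() block, t.set_grad_enabled(False) stays in effect until it is switched back on. A hedged sketch (set_grad_enabled can also be used as a context manager that restores the previous state on exit):

t.set_grad_enabled(True)   # restore the default behaviour

with t.set_grad_enabled(False):
    y = x * w
y.requires_grad  # False, because y was created while grad was disabled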

During backpropagation, the gradients of non-leaf nodes are cleared as soon as the computation finishes. If you want to inspect the gradients of these variables, there are two methods:
use the autograd.grad function;
use a hook.
autograd.grad and hooks are both very powerful tools. For more detailed usage refer to the official API documentation; here is an example illustrating the basic usage. The hook method is recommended, but in actual use you should try to avoid modifying the grad values.
x = t.ones(3, requires_grad=True)
w = t.rand(3, requires_grad=True)
y = x * w

y depends on w, and w.requires_grad = True

z = y.sum()
x.requires_grad, w.requires_grad, y.requires_grad
“”"--------------------"""

The grad of a non-leaf node is cleared automatically after the computation, so y.grad is None

z.backward()
(x.grad, w.grad, y.grad)
“”"--------------------"""

The first method: use grad to get the gradient of the intermediate variable

x = t.ones(3, requires_grad=True)
w = t.rand(3, requires_grad=True)
y = x * w
z = y.sum()

The gradient of z with respect to y; this implicitly calls backward()

t.autograd.grad(z, y)
“”"--------------------"""

The second method: use hook

A hook is a function whose input is the gradient; it should have no return value

def variable_hook(grad):
    print('gradient of y:', grad)

x = t.ones(3, requires_grad=True)
w = t.rand(3, requires_grad=True)
y = x * w

Register hook

hook_handle = y.register_hook(variable_hook)
z = y.sum()
z.backward()

Unless you need the hook every time, remember to remove it after use

hook_handle.remove()
“”"--------------------"""

The meaning of a variable's grad attribute and the grad_variables parameter of the backward function
x = t.arange(0.0, 3.0, requires_grad=True)
y = x**2 + x*2
z = y.sum()
z.backward() # backpropagate from z
x.grad
"""--------------------"""
x = t.arange(0.0, 3.0, requires_grad=True)
y = x**2 + x*2
z = y.sum()
y_gradient = t.Tensor([1, 1, 1]) # dz/dy
y.backward(y_gradient) # backpropagate from y
x.grad
Use Variable to implement linear regression
import torch as t
%matplotlib inline
from matplotlib import pyplot as plt
from IPython import display
import numpy as np
“”"--------------------"""

Set the random number seed, so that the following output is consistent when running on different computers

t.manual_seed(1000)
def get_fake_data(batch_size=8):
    '''Generate random data: y = x*2 + 3, plus some noise'''
    x = t.rand(batch_size, 1) * 5
    y = x * 2 + 3 + t.randn(batch_size, 1)
    return x, y
"""--------------------"""

Let’s see what the xy distribution looks like

x, y = get_fake_data()
plt.scatter(x.squeeze().numpy(), y.squeeze().numpy())
“”"--------------------"""

Randomly initialize the parameters

w = t.rand(1,1, requires_grad=True)
b = t.zeros(1,1, requires_grad=True)
losses = np.zeros(500)

lr =0.005 # learning rate

for ii in range(500):
    x, y = get_fake_data(batch_size=32)

    # forward: compute the loss
    y_pred = x.mm(w) + b.expand_as(y)
    loss = 0.5 * (y_pred - y) ** 2
    loss = loss.sum()
    losses[ii] = loss.item()

    # backward: autograd computes the gradients
    loss.backward()

    # update the parameters
    w.data.sub_(lr * w.grad.data)
    b.data.sub_(lr * b.grad.data)

    # zero the gradients
    w.grad.data.zero_()
    b.grad.data.zero_()

    if ii % 50 == 0:
        # plot
        display.clear_output(wait=True)
        x = t.arange(0, 6).view(-1, 1).float()  # mind the dtype: mm needs a float tensor
        y = x.mm(w.data) + b.data.expand_as(x)
        plt.plot(x.numpy(), y.numpy())  # predicted

        x2, y2 = get_fake_data(batch_size=20)
        plt.scatter(x2.numpy(), y2.numpy())  # true data

        plt.xlim(0, 5)
        plt.ylim(0, 13)
        plt.show()
        plt.pause(0.5)

print(w.item(), b.item())  # should be close to the true parameters w = 2, b = 3
"""--------------------"""
plt.plot(losses)
plt.ylim(5,50)

Origin: blog.csdn.net/Leomn_J/article/details/113122576