PyTorch Tutorials 2 AUTOGRAD: AUTOMATIC DIFFERENTIATION


Autograd: Automatic Differentiation

Central to all neural networks in PyTorch is the autograd package.
Let’s first briefly visit this, and we will then go to training our
first neural network.

The autograd package provides automatic differentiation for all operations
on Tensors. It is a define-by-run framework, which means that your backprop is
defined by how your code is run, and that every single iteration can be
different.

Let us see this in simpler terms with some examples.


Tensor

torch.Tensor is the central class of the package. If you set its attribute
.requires_grad as True, it starts to track all operations on it. When
you finish your computation you can call .backward() and have all the
gradients computed automatically. The gradient for this tensor will be
accumulated into .grad attribute.

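For instance, running two backward passes for the same leaf tensor adds the new gradients to whatever is already stored in .grad (a small sketch; x.grad.zero_() is used here only to reset the accumulated value):

import torch

x = torch.ones(3, requires_grad=True)
(2 * x).sum().backward()
print(x.grad)          # tensor([2., 2., 2.])
(2 * x).sum().backward()
print(x.grad)          # tensor([4., 4., 4.]) -- the gradients were accumulated
x.grad.zero_()         # reset the accumulated gradient in-place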

To stop a tensor from tracking history, you can call .detach() to detach
it from the computation history, and to prevent future computation from being
tracked.

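For instance (a small sketch): the detached tensor shares the same values but no longer tracks history, while the original tensor is unchanged.

import torch

x = torch.ones(2, 2, requires_grad=True)
y = x * 2
z = y.detach()             # same data as y, but detached from the graph
print(y.requires_grad)     # True
print(z.requires_grad)     # False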

To prevent tracking history (and using memory), you can also wrap the code block
in with torch.no_grad():. This can be particularly helpful when evaluating a
model because the model may have trainable parameters with requires_grad=True,
but for which we don't need the gradients.

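For example, a rough sketch of this evaluation pattern, using a small Linear layer and random inputs as stand-ins for a real model and data:

import torch

model = torch.nn.Linear(4, 2)      # its weight and bias have requires_grad=True
data = torch.randn(8, 4)
with torch.no_grad():
    output = model(data)           # no computation graph is built for this forward pass
print(output.requires_grad)        # False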

There’s one more class which is very important for autograd
implementation - a Function.


Tensor and Function are interconnected and build up an acyclic
graph, that encodes a complete history of computation. Each tensor has
a .grad_fn attribute that references a Function that has created
the Tensor (except for Tensors created by the user - their
grad_fn is None).


If you want to compute the derivatives, you can call .backward() on
a Tensor. If Tensor is a scalar (i.e. it holds one element of
data), you don’t need to specify any arguments to backward(),
however if it has more elements, you need to specify a gradient
argument that is a tensor of matching shape.


import torch

Create a tensor and set requires_grad=True to track computation with it

x = torch.ones(2, 2, requires_grad=True)
print(x)
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

Do an operation on the tensor:

y = x + 2
print(y)
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)

y was created as a result of an operation, so it has a grad_fn.

print(y.grad_fn)
<AddBackward0 object at 0x7f34eebf07b8>

Do more operations on y

z = y * y * 3
out = z.mean()

print(z, out)
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)

.requires_grad_( ... ) changes an existing Tensor's requires_grad
flag in-place. The input flag defaults to False if not given.

a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)
False
True
<SumBackward0 object at 0x7f34933dacc0>

Gradients

Let's backprop now.
Because out contains a single scalar, out.backward() is
equivalent to out.backward(torch.tensor(1.)).

out.backward()

print gradients d(out)/dx

print(x.grad)
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])

You should have got a matrix of 4.5. Let’s call the out
Tensor “\(o\)”.
We have that \(o = \frac{1}{4}\sum_i z_i\),
\(z_i = 3(x_i+2)^2\) and \(z_i\bigr\rvert_{x_i=1} = 27\).
Therefore,
\(\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)\), hence
\(\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5\).
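As a quick sanity check, the analytic gradient \(\frac{3}{2}(x_i+2)\) can be compared against what autograd computed (a short sketch continuing the example above, using torch.allclose):

expected = 1.5 * (x.detach() + 2)         # analytic gradient 3/2 * (x + 2) = 4.5
print(torch.allclose(x.grad, expected))   # True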

You can do many crazy things with autograd!

x = torch.randn(3, requires_grad=True)
print(x)
y = x * 2
print(y)
# keep doubling y until its norm exceeds 1000
while y.data.norm() < 1000:
    y = y * 2

print(y)
tensor([ 0.0147, -1.4388,  1.3875], requires_grad=True)
tensor([ 0.0295, -2.8775,  2.7750], grad_fn=<MulBackward0>)
tensor([   7.5405, -736.6459,  710.3948], grad_fn=<MulBackward0>)
# y is not a scalar, so backward() needs a gradient argument of matching shape
# (in this particular run y = 512 * x, so x.grad below is 512 * gradients)
gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(gradients)

print(x.grad)
tensor([5.1200e+01, 5.1200e+02, 5.1200e-02])

You can also stop autograd from tracking history on Tensors
with .requires_grad=True by wrapping the code block in
with torch.no_grad():

print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)
True
True
False

Read Later:

Documentation of autograd and Function is at
http://pytorch.org/docs/autograd


Reprinted from www.cnblogs.com/chenxiangzhen/p/10958794.html