1. Autograd: automatic differentiation
The autograd package is at the core of all neural networks in PyTorch. We first briefly introduce this package, and then train our first neural network.
The autograd package provides automatic differentiation for all operations on tensors. It is a define-by-run framework, which means that backpropagation is defined by how your code runs, and each iteration can be different.
Let's look at this package through a few simple examples:
Tensor
torch.Tensor is the core class of the package. If you set its attribute .requires_grad to True, it starts tracking all operations on it. When you finish your computation, you can call .backward() to have all gradients computed automatically. The gradient for this tensor is accumulated into its .grad attribute.
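Because gradients are accumulated rather than overwritten, calling .backward() twice adds the second gradient to the first. The following is a minimal sketch of that behavior; the tensor name w and its values are illustrative assumptions, not part of the original example.

```python
import torch

# A user-created tensor whose operations are tracked (illustrative values).
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# First backward pass: d(sum(w * w))/dw = 2 * w
(w * w).sum().backward()
first = w.grad.clone()

# Second backward pass on a fresh graph: the new gradient is added to w.grad.
(w * w).sum().backward()
second = w.grad.clone()

# Reset the accumulated gradient in place before the next pass.
w.grad.zero_()

print(first)   # tensor([2., 4., 6.])
print(second)  # tensor([ 4.,  8., 12.])
print(w.grad)  # tensor([0., 0., 0.])
```

This is why training loops usually clear the gradients before each backward pass.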
To stop a tensor from tracking history, you can call .detach() to detach it from the computation history and prevent future computations from being tracked.
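As a small sketch of what .detach() does (the names a, b, c and d are illustrative assumptions), the detached tensor keeps the same values but has no grad_fn, and anything computed from it is not tracked:

```python
import torch

a = torch.ones(3, requires_grad=True)
b = a * 2        # tracked: b has a grad_fn
c = b.detach()   # same values as b, but cut off from the graph
d = c * 5        # operations on the detached tensor are not tracked

print(b.requires_grad, c.requires_grad, d.requires_grad)  # True False False
print(c.grad_fn)                                           # None
```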
To prevent tracking history (and using memory), you can also wrap the code block in with torch.no_grad():. This is particularly useful when evaluating a model, because the model may have trainable parameters with requires_grad=True whose gradients we don't need.
There is one more class that is very important for the autograd implementation: Function.
Tensor and Function are interconnected and build up an acyclic graph that encodes the complete history of computation. Each tensor has a .grad_fn attribute that references the Function that created the Tensor (except for Tensors created by the user, whose grad_fn is None).
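The graph described above can be inspected directly. The sketch below is illustrative only (the names leaf, result and parents are assumptions); it uses the is_leaf attribute and the next_functions attribute of a grad_fn node to show how a result links back to its input.

```python
import torch

leaf = torch.ones(2, 2, requires_grad=True)  # created by the user
result = leaf + 2                            # created by an operation

print(leaf.is_leaf, leaf.grad_fn)      # True None
print(type(result.grad_fn).__name__)   # AddBackward0

# Each Function node links to the nodes that produced its inputs.
# Entries for inputs that need no gradient (the constant 2) are None.
parents = [
    type(fn).__name__
    for fn, _ in result.grad_fn.next_functions
    if fn is not None
]
print(parents)  # ['AccumulateGrad']
```

The AccumulateGrad node is the one that writes the gradient into the .grad attribute of the user-created tensor.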
If you want to compute the derivatives, you can call .backward() on a Tensor. If the Tensor is a scalar (that is, it holds a single element), you don't need to pass any arguments to backward(). If it has more elements, you need to specify a gradient argument, which is a tensor of matching shape.
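To illustrate the rule for non-scalar tensors, here is a hedged sketch (the names t and sq and the input values are assumptions). Calling backward() without arguments on a three-element tensor raises a RuntimeError, while passing a gradient tensor of matching shape works:

```python
import torch

t = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
sq = t * t   # three elements, so not a scalar

failed = False
try:
    sq.backward()   # no gradient argument: not allowed for non-scalars
except RuntimeError as err:
    failed = True
    print("RuntimeError:", err)

# Passing a tensor of ones with the same shape weights every element by 1,
# which gives the same result as (t * t).sum().backward().
sq.backward(gradient=torch.ones_like(sq))
print(t.grad)   # tensor([2., 4., 6.])
```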
import torch
Create a tensor and set requires_grad=True to track computation with it:
x = torch.ones(2, 2, requires_grad=True)
print(x)
Output:
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
Perform operations on tensors:
y = x + 2
print(y)
Output:
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
Because y was created as the result of an operation, it has a grad_fn; x was created by the user, so its grad_fn is None.
print(y.grad_fn)
print(x.grad_fn)
Output:
<AddBackward0 object at 0x000001E020B794A8>
None
Perform more operations on y:
z = y * y * 3
out = z.mean()
print(z, out)
Output:
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward1>)
.requires_grad_(...) changes an existing Tensor's requires_grad flag in place. If requires_grad is not given when a tensor is created, the flag defaults to False.
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)
Output:
False
True
<SumBackward0 object at 0x000001E020B79FD0>
Gradients
Now let's backpropagate. Because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)):
out.backward()
Print the gradient d(out)/dx:
print(x.grad)
Output:
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
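The value 4.5 can be checked by hand. Since out is the mean of 3 * (x + 2)^2 over four elements, the derivative with respect to each element is (3/2) * (x + 2), which is 4.5 at x = 1. The sketch below rebuilds the same computation under new, illustrative names (x_check, out_check, manual) so that it does not disturb the tensors used above:

```python
import torch

x_check = torch.ones(2, 2, requires_grad=True)

# Same computation as above: y = x + 2, z = y * y * 3, out = z.mean()
out_check = ((x_check + 2) ** 2 * 3).mean()
out_check.backward()

# Hand-derived gradient: d(out)/dx_i = (3/2) * (x_i + 2)
manual = 1.5 * (x_check.detach() + 2)

print(torch.allclose(x_check.grad, manual))  # True
```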
Now let's look at an example of a vector-Jacobian product:
x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
print(y)
Output:
tensor([ 384.5854, -13.6405, -1049.2870], grad_fn=<MulBackward0>)
In this case, y is no longer a scalar. torch.autograd cannot compute the full Jacobian directly, but if we only want the vector-Jacobian product, we simply pass the vector to backward as an argument:
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)
print(x.grad)
Output:
tensor([5.1200e+01, 5.1200e+02, 5.1200e-02])
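To make the vector-Jacobian product concrete, the sketch below compares the result of backward(v) with an explicit product of the transposed Jacobian and v. The function f, the input values, and the names inp, vec and manual are illustrative assumptions; the full Jacobian is built with torch.autograd.functional.jacobian, which is practical only for small inputs like this one.

```python
import torch
from torch.autograd.functional import jacobian


def f(t):
    # Hypothetical function with a non-diagonal Jacobian.
    return torch.stack([t[0] * t[1], t[1] ** 2, t[0] + t[2]])


inp = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
vec = torch.tensor([0.1, 1.0, 0.5])

# backward(vec) computes the vector-Jacobian product and stores it in inp.grad.
f(inp).backward(vec)

# Explicit check: build the 3x3 Jacobian and multiply its transpose by vec.
J = jacobian(f, inp.detach())
manual = J.T @ vec

print(torch.allclose(inp.grad, manual))  # True
```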
You can also stop autograd from tracking history on tensors with .requires_grad=True by wrapping the code block in with torch.no_grad():
print(x.requires_grad)
print((x ** 2).requires_grad)
with torch.no_grad():
    print((x ** 2).requires_grad)
Output:
True
True
False
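The model evaluation case mentioned earlier might look like the following sketch. The nn.Linear layer and the random batch are hypothetical stand-ins for a real model and real data; the point is only that the parameters keep requires_grad=True while the output computed inside the block is not tracked.

```python
import torch
from torch import nn

model = nn.Linear(4, 2)      # hypothetical stand-in for a trained model
batch = torch.randn(8, 4)    # hypothetical input batch

with torch.no_grad():
    preds = model(batch)     # forward pass without building a graph

print(all(p.requires_grad for p in model.parameters()))  # True
print(preds.requires_grad)                                # False
print(preds.grad_fn)                                      # None
```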