Xiaobai learns Pytorch series -- torch.autograd API

torch.autograd provides classes and functions that implement automatic differentiation of arbitrary scalar-valued functions. It requires minimal changes to existing code: you only need to declare the tensors whose gradients should be computed with the requires_grad=True keyword. So far, autograd only supports floating-point tensor types (half, float, double and bfloat16) and complex tensor types (cfloat, cdouble).
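As a minimal sketch (the tensor names x and y are just for illustration), marking a tensor with requires_grad=True is all autograd needs to track it and fill in gradients on backward():

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # float tensor tracked by autograd
y = (x * x).sum()   # a scalar-valued function of x
y.backward()        # autograd computes dy/dx
print(x.grad)       # tensor([2., 4., 6.])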

Basic concepts

Variable, Parameter and torch.tensor()

torch.nn.Parameter (a subclass of Variable)
Any quantity that needs to be updated during the training of a network must be defined as a Parameter; the weights W and biases b of a layer, for example, are Parameters.

Variable does not require gradients by default; requires_grad=True has to be set manually. Because a Variable may be backpropagated through many times, manually indicating the parameters at every backward() call is very troublesome. PyTorch solves this problem mainly by introducing the nn.Parameter type and the optimizer mechanism. Parameter is a subclass of Variable and is essentially the same, except that it requires gradients by default; in addition, the parameter variables of a network can be conveniently accessed through net.parameters(). Simply define all the parameters in the network that need to be trained and updated as Parameter, and then let the optimizer complete the update of all parameters, for example: optimizer = torch.optim.SGD(net.parameters(), lr=1e-1)
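A small sketch of this pattern (the Net module and its sizes are made up for illustration): parameters registered on a module are collected by net.parameters() and updated by the optimizer.

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.Parameter requires gradients by default and is registered on the module
        self.w = nn.Parameter(torch.randn(3, 1))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return x @ self.w + self.b

net = Net()
optimizer = torch.optim.SGD(net.parameters(), lr=1e-1)

x = torch.randn(8, 3)
loss = net(x).pow(2).mean()   # a scalar loss
optimizer.zero_grad()
loss.backward()               # gradients are stored in each Parameter's .grad
optimizer.step()              # SGD updates w and b in place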

Similarities:
torch.tensor(), torch.autograd.Variable and torch.nn.Parameter are basically the same.
The first two let you set the requires_grad parameter explicitly, while the latter defaults to requires_grad=True.
All three have attributes such as .data, .grad and .grad_fn.
So, as long as requires_grad=True, the gradient can be computed via backward().

The difference:
torch.nn.Parameter defaults to requires_grad=True, which is more convenient when the number of parameters is large (see the sketch after this list).
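A short sketch of these defaults (note that in recent PyTorch versions Variable has been merged into Tensor, so the import below is kept only for comparison):

import torch
import torch.nn as nn
from torch.autograd import Variable  # merged into Tensor in recent PyTorch versions

t = torch.tensor([1.0, 2.0])                 # requires_grad defaults to False
v = Variable(torch.tensor([1.0, 2.0]))       # also defaults to requires_grad=False
p = nn.Parameter(torch.tensor([1.0, 2.0]))   # defaults to requires_grad=True

print(t.requires_grad, v.requires_grad, p.requires_grad)  # False False True

# All three expose .data, .grad and .grad_fn; any of them can call backward()
# once requires_grad=True.
loss = (p * p).sum()
loss.backward()
print(p.grad)   # tensor([2., 4.])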

Backpropagation

Reference: https://blog.csdn.net/lj2048/article/details/113527400
The implementation of autograd relies on two data types: Variable and Function. Variable is a wrapper around Tensor; Variable and Tensor are basically the same, except that Variable adds the following attributes.

A Variable's data attribute stores the Tensor data, its grad attribute stores the derivative of the variable, and its creator attribute records the Function that created it.
Variable and Function are inseparable from each other; as shown in the figure below, together they describe the forward pass of data and the backward pass that produces the derivatives.

As shown in the figure, suppose we have an input variable input (of type Variable). Since input is supplied by the user, its creator is null. Through the first data operation, operation1 (such as addition, subtraction, multiplication or division), input produces the variable output1 (still of type Variable); during this process a function1 object (an instance of Function) is generated automatically, and the creator of output1 is this function1. Subsequently, output1 produces output2 through another data operation, which in turn generates another instance function2, and the creator of output2 is function2.
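A small sketch of this graph construction (in current PyTorch the creator attribute is exposed as .grad_fn; the variable names follow the figure):

import torch

inp = torch.tensor([1.0, 2.0], requires_grad=True)  # created by the user: no creator
output1 = inp * 3        # operation1 generates function1 (a MulBackward0 node)
output2 = output1 + 1    # operation2 generates function2 (an AddBackward0 node)

print(inp.grad_fn)       # None, because the user created it
print(output1.grad_fn)   # <MulBackward0 object ...>
print(output2.grad_fn)   # <AddBackward0 object ...>
print(output2.grad_fn.next_functions)  # links back to function1, forming the graph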

For worked examples, please refer to the Autograd section of the official PyTorch Tutorial.

The target tensor is usually a scalar. For example, the loss value Loss that we commonly use is generally a scalar. But there are also non-scalar cases: the target value of Deep Dream, which will be introduced later, is a tensor with multiple elements. How do we backpropagate a non-scalar?

PyTorch has a simple rule: a tensor is not allowed to be differentiated with respect to a tensor; only a scalar may be differentiated with respect to a tensor. Therefore, if backward() is called on a non-scalar target tensor, a gradient argument must be passed in. This argument is also a tensor, and it must have the same shape as the tensor on which backward() is called.
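A minimal sketch of this rule (the values are chosen only for illustration): calling backward() on a non-scalar requires a gradient tensor of the same shape.

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2                      # y is non-scalar, so y.backward() alone would raise an error

grad = torch.ones_like(y)      # must have the same shape as y
y.backward(gradient=grad)      # equivalent to backpropagating (y * grad).sum()
print(x.grad)                  # tensor([2., 2., 2.])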

