Some understanding of detach(), clone(), and gradients in PyTorch

Sometimes it is necessary to copy a tensor in PyTorch. If this happens inside an iterative training process, the question of how gradients propagate comes into play.

There are two commonly used tensor-copying methods that I know of: .detach() and .clone().

1. The .detach() method. For example, given an existing tensor a, b = a.detach() essentially gives a an alias b: in memory the two are actually one and the same thing. When computing gradients, however, the gradient that arrives at b from later operations is not passed further forward to a; it stops there. This method can be used when only the tensor's value is needed and no gradient should flow back, or when you deliberately want to truncate the gradient at this point, as in the sketch below.
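
A minimal sketch of this behavior, with illustrative tensor names a and b:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
b = a.detach()                          # same storage as a, but cut out of the graph

print(b.data_ptr() == a.data_ptr())     # True: one and the same block of memory
print(b.requires_grad)                  # False

# gradient flows back through the a-branch but stops dead at the b-branch
loss = (a * 2).sum() + (b * 3).sum()
loss.backward()
print(a.grad)                           # tensor([2., 2., 2.]) -- only the a*2 path contributed
```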

2. The .clone() method is the opposite. If c = a.clone(), then c and a are two separate tensors in memory, and changing one of them will not affect the value of the other. However, c keeps the same gradient route back to a, so it retains the ability to pass gradients backward; it simply no longer shares storage, as the example below shows.
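
A small example of the difference, again with made-up tensor names:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
c = a.clone()                           # new storage, but the gradient path back to a is kept

print(c.data_ptr() == a.data_ptr())     # False: two separate blocks of memory

loss = (c * 2).sum()
loss.backward()
print(a.grad)                           # tensor([2., 2., 2.]): the gradient flowed back through the clone

with torch.no_grad():
    a[0] = 100.0                        # changing a afterwards...
print(c[0])                             # ...leaves the clone's values untouched (still 1.0)
```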

3. Everyone is familiar with the tensor data type from the start, but when reading blogs I sometimes see Variable, which can be intuitively understood as another data format of which the tensor is only one part. A tensor holds only values, while a Variable additionally carries the gradient value, whether the gradient needs to be computed, and other attributes, so it is the broader concept. Pulling the bare tensor out of a Variable in the middle of an iteration is irreversible, because all the information about the gradient is lost; see the sketch below.
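
A hedged sketch of that distinction in code. Since PyTorch 0.4, Variable has been merged into Tensor, so the same idea is expressed through requires_grad and .data:

```python
import torch

v = torch.tensor([1.0, 2.0], requires_grad=True)    # carries values plus gradient bookkeeping
raw = v.data                                         # just the values, nothing else

print(v.requires_grad)      # True
print(raw.requires_grad)    # False: the gradient attributes are gone
print(raw.grad_fn)          # None: no link back into the computation graph

# anything computed from `raw` can no longer route gradients back to v,
# which is why dropping down to the bare tensor mid-iteration is irreversible
```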

4. In addition, a simple example of the backward() function makes the backtracking of gradients easy to follow; one is given below.
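
Here is one such self-contained example, using an arbitrary function z = sum(x**2) so that dz/dx = 2x:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2
z = y.sum()

z.backward()                # walk the graph backwards from z to x
print(x.grad)               # tensor([2., 4., 6.]) == 2 * x
```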

Origin blog.csdn.net/qq_41872271/article/details/109060557