PyTorch tensor copying: clone() and detach()

Reposted from: https://blog.csdn.net/winycg/article/details/100813519

Tensor copying can be done with the clone() and detach() functions, depending on what is needed.

clone

The clone() function returns an identical tensor. The new tensor is allocated in new memory, but it still remains in the computation graph.

The clone operation supports gradient backpropagation but does not share data memory with the original tensor, so it is commonly used when a unit in a neural network needs to be reused.
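
A minimal sketch of these two properties (new memory, still in the graph); the variable names a and b are illustrative, not from the original post:

import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
b = a.clone()
print(b.data_ptr() == a.data_ptr())  # False: the clone owns separate memory
(b * 2).sum().backward()             # backprop through the clone...
print(a.grad)                        # ...reaches a: tensor([2., 2., 2.])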

detach

The detach() function returns an identical tensor that shares memory with the original tensor. The new tensor is detached from the computation graph and does not take part in gradient computation. In addition, performing certain in-place operations (such as resize_ / resize_as_ / set_ / transpose_) on either of the two tensors will raise an error.
The detach operation shares data memory but is separated from the computation graph, so it is commonly used when only the tensor's value is needed and gradients do not have to be tracked.
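
A minimal sketch of these two properties (shared memory, outside the graph); again the names a and b are illustrative:

import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
b = a.detach()
print(b.data_ptr() == a.data_ptr())  # True: same underlying storage
print(b.requires_grad, b.grad_fn)    # False None: detached from the graph
b[0] = 100.                          # an in-place change through the detached tensor...
print(a)                             # ...is visible in the original tensor as well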

Usage analysis

Operation                  New/shared memory    Still in computation graph
tensor.clone()             New                  Yes
tensor.detach()            Shared               No
tensor.clone().detach()    New                  No
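
The rows of this table can be checked quickly (the check below is my own addition, not from the original post): data_ptr() tells whether memory is shared, and requires_grad tells whether the result is still connected to the computation graph.

import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
for name, t in [("clone", x.clone()),
                ("detach", x.detach()),
                ("clone().detach()", x.clone().detach())]:
    print(name, t.data_ptr() == x.data_ptr(), t.requires_grad)
# clone False True
# detach True False
# clone().detach() False False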

Some examples are given below.
First, import the package and fix the random seed:

import torch
torch.manual_seed(0)

1. After clone() the tensor has requires_grad=True and after detach() it has requires_grad=False, but the gradient does not accumulate on the cloned tensor.

x = torch.tensor([1., 2., 3.], requires_grad=True)
clone_x = x.clone()
detach_x = x.detach()
clone_detach_x = x.clone().detach()

f = torch.nn.Linear(3, 1)
y = f(x)
y.backward()

print(x.grad)
print(clone_x.requires_grad)
print(clone_x.grad)
print(detach_x.requires_grad)
print(clone_detach_x.requires_grad)

Output:

tensor([-0.0043,  0.3097, -0.4752])
True
None
False
False
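
A short check of why clone_x.grad prints None even though clone_x.requires_grad is True: the clone is not a leaf tensor, and autograd only populates .grad on leaf tensors by default (recent PyTorch versions also emit a UserWarning when reading .grad of a non-leaf). This check is my own addition:

import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
clone_x = x.clone()
print(clone_x.is_leaf)   # False: clone_x was produced by an autograd operation
print(clone_x.grad_fn)   # a CloneBackward node linking it back to x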

2. Use the cloned tensor in the computation graph instead of the original tensor. The gradient still flows back only to the original tensor.

x = torch.tensor([1., 2., 3.], requires_grad=True)
clone_x = x.clone()
detach_x = x.detach()
clone_detach_x = x.detach().clone()

f = torch.nn.Linear(3, 1)
y = f(clone_x)
y.backward()

print(x.grad)
print(clone_x.grad)
print(detach_x.requires_grad)
print(clone_detach_x.requires_grad)

Output:

tensor([-0.0043,  0.3097, -0.4752])
None
False
False
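
If the gradient of the cloned tensor itself is needed, calling retain_grad() on it before the backward pass makes autograd keep it. This variation is my own addition, reusing the same setup as the example above:

import torch
torch.manual_seed(0)

x = torch.tensor([1., 2., 3.], requires_grad=True)
clone_x = x.clone()
clone_x.retain_grad()          # keep the gradient of this non-leaf tensor
f = torch.nn.Linear(3, 1)
y = f(clone_x)
y.backward()
print(x.grad)                  # gradient flows through the clone back to x
print(clone_x.grad)            # now populated as well, same value as x.grad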

3. Set the original tensor to requires_grad=False and call .requires_grad_() on the cloned tensor. When the cloned tensor participates in the computation graph, the gradient flows to the cloned tensor rather than back to the original tensor.

x = torch.tensor([1., 2., 3.], requires_grad=False)
clone_x = x.clone().requires_grad_()
detach_x = x.detach()
clone_detach_x = x.detach().clone()

f = torch.nn.Linear(3, 1)
y = f(clone_x)
y.backward()

print(x.grad)
print(clone_x.grad)
print(detach_x.requires_grad)
print(clone_detach_x.requires_grad)

Output:

None
tensor([-0.0043,  0.3097, -0.4752])
False
False
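
The reason the gradient lands on the cloned tensor here is that x does not require grad, so clone() records no backward edge and the clone (after requires_grad_()) is itself a leaf. A quick check, added by me:

import torch

x = torch.tensor([1., 2., 3.], requires_grad=False)
clone_x = x.clone().requires_grad_()
print(clone_x.is_leaf)   # True: nothing connects it back to x
print(clone_x.grad_fn)   # None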

4. Since the tensor returned by detach() shares memory with the original tensor, its value also changes once the original tensor is updated (here, by an optimizer step).

x = torch.tensor([1., 2., 3.], requires_grad=True)
f = torch.nn.Linear(3, 1)
w = f.weight.detach()
print(f.weight)
print(w)

y = f(x)
y.backward()

optimizer = torch.optim.SGD(f.parameters(), 0.1)
optimizer.step()

print(f.weight)
print(w)

Output:

Parameter containing:
tensor([[-0.0043,  0.3097, -0.4752]], requires_grad=True)
tensor([[-0.0043,  0.3097, -0.4752]])
Parameter containing:
tensor([[-0.1043,  0.1097, -0.7752]], requires_grad=True)
tensor([[-0.1043,  0.1097, -0.7752]])
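
If an independent snapshot of the weights is wanted instead, detach().clone() (or clone().detach()) copies the values into new memory, so later optimizer updates no longer affect it. A small sketch of that contrast, added by me and not part of the original post:

import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
f = torch.nn.Linear(3, 1)
w_snapshot = f.weight.detach().clone()             # independent copy in new memory
f(x).backward()
torch.optim.SGD(f.parameters(), lr=0.1).step()
print(torch.equal(w_snapshot, f.weight.detach()))  # False: the snapshot kept the old values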

 
