While running a small gradient experiment, I found that once the original variable is passed through F.softmax (with the result assigned back to the same name), its gradient can no longer be read from x.grad. An example is as follows:
import torch
import torch.nn.functional as F

x = torch.randn(1, 5, requires_grad=True)  # leaf tensor
print(x)
# x = F.softmax(x, dim=1)
# print(x)
l = 0
for i in range(5):
    l = l + x[0][i]  # l is the sum of the entries of x
print(l)
l.backward()
print(x.grad)
If x is not passed through F.softmax(), the gradient comes out as expected; since l is just the sum of the entries of x, the gradient is all ones:
tensor([[ 1.4093, -0.2620, 0.6668, -0.3897, 1.4681]], requires_grad=True)
tensor(2.8925, grad_fn=<AddBackward0>)
tensor([[1., 1., 1., 1., 1.]])
If x is passed through F.softmax() first, the gradient of x can no longer be read. An example is as follows:
import torch
import torch.nn.functional as F

x = torch.randn(1, 5, requires_grad=True)  # leaf tensor
print(x)
x = F.softmax(x, dim=1)  # the name x is rebound to the softmax output
print(x)
l = 0
for i in range(5):
    l = l + x[0][i]  # l is the sum of the softmax outputs
print(l)
l.backward()
print(x.grad)
This time the gradient of x is None:
tensor([[ 1.0408, 0.5212, 0.2902, -0.7637, -0.7276]], requires_grad=True)
tensor([[0.4163, 0.2476, 0.1965, 0.0685, 0.0710]], grad_fn=<SoftmaxBackward>)
tensor(1., grad_fn=<AddBackward0>)
None
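The gradient is not actually lost; the name x has simply been rebound. After x = F.softmax(x, dim=1), x refers to the softmax output, which is a non-leaf (intermediate) tensor, and backward() only populates .grad on leaf tensors. This also explains why l printed as exactly 1. in the second run: softmax outputs sum to 1 by construction. Below is a minimal sketch of a workaround, assuming we only want to inspect the gradients: keep the softmax output under a separate name so the original leaf stays reachable, and optionally call retain_grad() on the intermediate tensor if its gradient is wanted too.

import torch
import torch.nn.functional as F

x = torch.randn(1, 5, requires_grad=True)  # leaf tensor
y = F.softmax(x, dim=1)                    # non-leaf result, kept under a new name
y.retain_grad()                            # also keep the gradient of the non-leaf tensor
l = y.sum()                                # same quantity as the loop above
l.backward()
print(x.grad)  # populated; essentially zero here, since the softmax outputs always sum to 1
print(y.grad)  # all ones, available because of retain_grad()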