Can autograd in PyTorch handle repeated use of a layer within the same module?

ihdv :

Suppose I have a layer layer in a torch module and use it two or more times during a single forward pass, in such a way that the output of this layer is later fed back into the same layer as input. Can PyTorch's autograd compute the gradient of this layer's weights correctly?

Here is a minimal working example (MWE) of what I am talking about:

import torch
import torch.nn as nn

class net(nn.Module):
    def __init__(self,in_dim,out_dim):
        super(net,self).__init__()
        self.layer = nn.Linear(in_dim,out_dim,bias=False)

    def forward(self,x):
        # the same linear layer is applied twice in a row
        x = self.layer(x)
        x = self.layer(x)
        return x

input_x = torch.tensor([10.])
label = torch.tensor([5.])
n = net(1,1)
loss_fn = nn.MSELoss()

out = n(input_x)
loss = loss_fn(out,label)
n.zero_grad()
loss.backward()

for param in n.parameters():
    w = param.item()   # the single weight of the layer
    g = param.grad     # its gradient after backward()

print('Input = %.4f; label = %.4f'%(input_x,label))
print('Weight = %.4f; output = %.4f'%(w,out))
print('Gradient w.r.t. the weight is %.4f'%(g))
print('And it should be %.4f'%(4*(w**2*input_x-label)*w*input_x))

And the output is (it may differ on your machine if the randomly initialized weight is different):

Input = 10.0000; label = 5.0000
Weight = 0.9472; output = 8.9717
Gradient w.r.t. the weight is 150.4767
And it should be 150.4766

In this example, I have defined a module with only one linear layer (in_dim=out_dim=1 and no bias). w is the weight of this layer; input_x is the input value; label is the desired value. Since the loss is chosen as MSE, the formula for the loss is

loss = ((w^2)*input_x - label)^2

Computing by hand, we have

d(loss)/dw = 2*((w^2)*input_x - label)*(2*w*input_x) = 4*((w^2)*input_x - label)*w*input_x

The output of my example above shows that autograd gives the same result as the hand computation, which gives me reason to believe that it works in this case. But in a real application, the layer may have higher-dimensional inputs and outputs, a nonlinear activation function after it, and the network could have multiple layers.

What I want to ask is: can I trust autograd to handle such a situation, even one far more complicated than my example? And how does autograd work when a layer is called repeatedly like this?
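For instance, here is a rough sketch of how I imagine such a case could be checked (the dimensions, the tanh nonlinearity and the functional form are just placeholders I picked for illustration): torch.autograd.gradcheck compares autograd's analytic gradient of a twice-applied weight against a finite-difference estimate.

import torch
import torch.nn.functional as F

# Rough sketch: the same weight matrix is applied twice, with a tanh in
# between, on a higher-dimensional input; gradcheck compares autograd's
# analytic gradient with a finite-difference estimate (double precision).
torch.manual_seed(0)
x = torch.randn(3, 4, dtype=torch.double)
w = torch.randn(4, 4, dtype=torch.double, requires_grad=True)

def apply_twice(weight):
    # plays the role of calling self.layer twice inside forward()
    h = torch.tanh(F.linear(x, weight))
    return F.linear(h, weight)

print(torch.autograd.gradcheck(apply_twice, (w,)))  # prints True if they match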

a_guest :

This will work just fine. From the perspective of the autograd engine this isn't a cyclic application, since the resulting computation graph simply unrolls the repeated computation into a linear sequence. To illustrate this, for a single layer you might have:

x -----> layer --------+
           ^           |
           |  2 times  |
           +-----------+

From the autograd perspective this looks like:

x ---> layer ---> layer ---> layer

Here layer is the same layer, repeated 3 times in the graph. This means that when the gradient of the layer's weights is computed, the contributions from all three stages are accumulated. So when calling backward:

x ---> layer ---> layer ---> layer ---> loss_func
                                            |
       lback <--- lback <--- lback <--------+
         |          |          |
         |          v          |
         +------> weights <----+
                   _grad

Here lback represents the local derivative (backward pass) of the layer's forward transformation, which takes the upstream gradient as input. Each one adds its contribution to the layer's weights_grad.
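To make the accumulation concrete, here is a minimal sketch (not part of the original answer; the shapes are arbitrary) comparing a shared layer against an "unrolled" version built from two independent copies that hold the same weight. The shared layer's gradient equals the sum of the two copies' gradients:

import torch
import torch.nn as nn

x = torch.randn(3, 4)

# The same layer applied twice; its weight gradient accumulates both stages.
shared = nn.Linear(4, 4, bias=False)
shared(shared(x)).sum().backward()

# "Unrolled" version: two separate layers initialised with the same weight.
a = nn.Linear(4, 4, bias=False)
b = nn.Linear(4, 4, bias=False)
with torch.no_grad():
    a.weight.copy_(shared.weight)
    b.weight.copy_(shared.weight)
b(a(x)).sum().backward()

# The shared gradient is the sum of the per-stage gradients.
print(torch.allclose(shared.weight.grad, a.weight.grad + b.weight.grad))  # True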

Recurrent neural networks use this repeated application of a layer (cell) as their basis. See, for example, the PyTorch tutorial on Classifying Names with a Character-Level RNN.
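As a minimal sketch of that idea (an assumed toy setup, not code from the tutorial): a single nn.RNNCell is reused at every time step of a loop, and backward() accumulates its weight gradients across all of the unrolled steps.

import torch
import torch.nn as nn

cell = nn.RNNCell(input_size=8, hidden_size=16)
h = torch.zeros(1, 16)                # initial hidden state, batch of 1
seq = torch.randn(5, 1, 8)            # 5 time steps

for t in range(5):
    h = cell(seq[t], h)               # the same cell is applied at every step

h.sum().backward()
print(cell.weight_hh.grad.shape)      # gradient accumulated over all 5 steps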
