class torch.nn.RNN(*args, **kwargs)
input_size – The number of expected features in the input x
hidden_size – The number of features in the hidden state h
num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
nonlinearity – The non-linearity to use. Can be either ‘tanh’ or ‘relu’. Default: ‘tanh’
bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True
batch_first – If True, then the input and output tensors are provided as (batch, seq, feature)
dropout – If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0
bidirectional – If True, becomes a bidirectional RNN. Default: False
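As a rough illustration of the parameters listed above, the sketch below builds an RNN and inspects the weights it creates. The sizes (10, 20) are arbitrary values picked for the example, not anything prescribed by the documentation.

```python
import torch
from torch import nn

# Sizes below are arbitrary values chosen for illustration.
rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2,
             nonlinearity="tanh", bias=True, batch_first=False,
             dropout=0.0, bidirectional=False)

# Each layer owns an input-to-hidden weight, a hidden-to-hidden weight,
# and (when bias=True) the two bias vectors b_ih and b_hh.
param_shapes = {name: tuple(p.shape) for name, p in rnn.named_parameters()}
for name, shape in param_shapes.items():
    print(name, shape)

# With bias=False the b_ih and b_hh parameters are not created.
rnn_no_bias = nn.RNN(input_size=10, hidden_size=20, bias=False)
no_bias_names = {name for name, _ in rnn_no_bias.named_parameters()}
```

Note that the second layer's weight_ih_l1 has shape (hidden_size, hidden_size), because it consumes the first layer's output rather than the raw input.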
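First, the sequence length of an RNN is dynamic. It is not a constructor parameter; it is determined by the input tensor passed in at call time.
An RNN object takes two arguments: input, with shape (seq_len, batch_size, input_dim), and h0, with shape (num_layers * directions, batch_size, hidden_dim).
The seq_len of input determines the sequence length. h0 supplies the initial hidden state for each RNN layer, so the num_layers in its first dimension must match the num_layers of the RNN.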
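A minimal sketch of the points above, with made-up sizes: the same module accepts inputs of different seq_len, and an h0 whose first dimension does not match num_layers is rejected.

```python
import torch
from torch import nn

# Illustrative sizes only.
num_layers, hidden_dim, input_dim, batch_size = 2, 8, 5, 3
rnn = nn.RNN(input_size=input_dim, hidden_size=hidden_dim, num_layers=num_layers)

# h0: (num_layers * directions, batch_size, hidden_dim); directions is 1 here.
h0 = torch.zeros(num_layers * 1, batch_size, hidden_dim)

# The same module handles different sequence lengths.
out_lengths = []
for seq_len in (4, 11):
    x = torch.randn(seq_len, batch_size, input_dim)
    output, hn = rnn(x, h0)
    out_lengths.append(output.shape[0])

# An h0 built for a single layer does not fit a 2-layer RNN.
bad_h0 = torch.zeros(1, batch_size, hidden_dim)
try:
    rnn(torch.randn(4, batch_size, input_dim), bad_h0)
    mismatch_rejected = False
except RuntimeError:
    mismatch_rejected = True
```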
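It returns two values: output and hn.
hn has shape (num_layers * directions, batch_size, hidden_dim). It is the output on the right-hand side of the RNN, i.e. the final hidden state of each layer; a bidirectional RNN also has an output on the left-hand side. The shape of hn is not affected by batch_first.
output has shape (seq_len, batch_size, hidden_dim * directions). It is the output at the top of the RNN, i.e. the top layer's hidden state at every time step.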
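The shapes of the two return values can be checked with a short sketch like the one below. It uses a bidirectional RNN with arbitrary sizes, and assumes the layout in which hn can be viewed as (num_layers, directions, batch_size, hidden_dim), with forward before backward within each layer.

```python
import torch
from torch import nn

torch.manual_seed(0)
# Illustrative sizes only.
seq_len, batch_size, input_dim, hidden_dim, num_layers = 6, 3, 5, 8, 2
rnn = nn.RNN(input_dim, hidden_dim, num_layers=num_layers, bidirectional=True)
x = torch.randn(seq_len, batch_size, input_dim)

output, hn = rnn(x)  # h0 defaults to zeros when omitted

output_shape = tuple(output.shape)  # (seq_len, batch_size, hidden_dim * 2)
hn_shape = tuple(hn.shape)          # (num_layers * 2, batch_size, hidden_dim)

# Split hn into (num_layers, directions, batch_size, hidden_dim).
hn_split = hn.view(num_layers, 2, batch_size, hidden_dim)
top_forward, top_backward = hn_split[-1, 0], hn_split[-1, 1]

# The forward direction finishes at the last time step (right side),
# the backward direction finishes at the first time step (left side).
forward_matches = torch.allclose(output[-1, :, :hidden_dim], top_forward)
backward_matches = torch.allclose(output[0, :, hidden_dim:], top_backward)
```

The last two checks show where the "right-side" and "left-side" outputs sit inside output: the forward half of the last time step and the backward half of the first time step.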
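About batch_first:
Once batch_first is set, both the input and the output are batch-first. It is not only the input that must be batch-first; the output automatically becomes batch-first as well.
However, the shape of hn does not change. It is still (num_layers * num_directions, batch, hidden_size), with batch in the middle.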
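A small sketch of this behaviour, with arbitrary sizes: with batch_first=True the input and output put batch first, while hn keeps batch in the middle.

```python
import torch
from torch import nn

# Illustrative sizes only.
seq_len, batch_size, input_dim, hidden_dim, num_layers = 7, 4, 5, 8, 2
rnn = nn.RNN(input_dim, hidden_dim, num_layers=num_layers, batch_first=True)

# Input is now (batch, seq, feature).
x = torch.randn(batch_size, seq_len, input_dim)
# h0 still uses (num_layers * num_directions, batch, hidden_size).
h0 = torch.zeros(num_layers, batch_size, hidden_dim)

output, hn = rnn(x, h0)

output_shape = tuple(output.shape)  # batch first
hn_shape = tuple(hn.shape)          # batch still in the middle
```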
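About num_layers:
num_layers is not the sequence length of the RNN. It is the number of stacked layers: the output of the lower layer at each time step becomes the input of the next layer at the same time step.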
num_layers in RNN is just stacking RNNs on top of each other. So you get a hidden from each layer and an output only from the topmost layer.
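To make the stacking concrete, the sketch below compares a 2-layer RNN against two single-layer RNNs chained by hand, after copying the weights across. The names layer0 and layer1 are hypothetical helpers introduced for this example, and the sizes are arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)
# Illustrative sizes only.
seq_len, batch_size, input_dim, hidden_dim = 5, 2, 4, 6
stacked = nn.RNN(input_dim, hidden_dim, num_layers=2)

# Two single-layer RNNs that will mimic layer 0 and layer 1 of `stacked`.
layer0 = nn.RNN(input_dim, hidden_dim, num_layers=1)
layer1 = nn.RNN(hidden_dim, hidden_dim, num_layers=1)

with torch.no_grad():
    for kind in ("weight_ih", "weight_hh", "bias_ih", "bias_hh"):
        getattr(layer0, f"{kind}_l0").copy_(getattr(stacked, f"{kind}_l0"))
        getattr(layer1, f"{kind}_l0").copy_(getattr(stacked, f"{kind}_l1"))

x = torch.randn(seq_len, batch_size, input_dim)
with torch.no_grad():
    output, hn = stacked(x)
    out0, hn0 = layer0(x)      # per-step outputs of the first layer
    out1, hn1 = layer1(out0)   # fed as inputs to the second layer

# Only the topmost layer contributes to `output`...
same_output = torch.allclose(output, out1, atol=1e-6)
# ...while `hn` holds one final hidden state per layer.
same_hidden = torch.allclose(hn, torch.cat([hn0, hn1], dim=0), atol=1e-6)
# For a unidirectional RNN, the top layer's hn equals the last output step.
top_is_last_step = torch.allclose(hn[-1], output[-1])
```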
[Figure: RNN with num_layers=1]
[Figure: RNN with num_layers=2]
[Figure: the corresponding diagram for an LSTM]