PyTorch Notes: 07) LSTM


An introductory blog post on LSTMs: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Official API: https://pytorch.org/docs/stable/nn.html?#torch.nn.LSTM

An example: suppose the input consists of 3 sentences, each made up of 5 words, and each word is represented by a 10-dimensional word vector. Then seq_len=5, batch=3, input_size=10.
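A minimal sketch of those shapes (assuming torch is already imported):

import torch
x = torch.randn(5, 3, 10)  # (seq_len, batch, input_size): 5 words per sentence, 3 sentences, 10-dim word vectors
print(x.shape)             # torch.Size([5, 3, 10])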

Core parameters of the class constructor

input_size – The number of expected features in the input x
# Dimensionality of the word vectors, e.g., input_size=10
hidden_size – The number of features in the hidden state h
# Dimensionality of the hidden state
num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results.
# Number of stacked LSTM layers; e.g., num_layers=2 stacks two LSTMs, with the output of the first LSTM feeding the second

A point of confusion: with num_layers=2, two LSTMs are stacked and the output of the first is the input of the second. If the input dimension is 10 and the hidden dimension is 20, is the hidden dimension of the first LSTM 10 or 20?

# Snippet from the nn.LSTM constructor (for an LSTM, gate_size = 4 * hidden_size)
for layer in range(num_layers):
    for direction in range(num_directions):
        layer_input_size = input_size if layer == 0 else hidden_size * num_directions
        # num_directions=1 for a unidirectional LSTM
        w_ih = Parameter(torch.Tensor(gate_size, layer_input_size))
        w_hh = Parameter(torch.Tensor(gate_size, hidden_size))

From the code above, the hidden dimension is the same for every layer; only the input dimension of later layers changes, becoming the hidden dimension of the previous layer: LSTM1(10, 20) -> LSTM2(20, 20), i.e., (input_size, hidden_size).
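This can be double-checked against the registered weight parameters of an actual module; a short sketch (gate_size = 4 * 20 = 80 here):

import torch.nn as nn
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
print(lstm.weight_ih_l0.shape)  # torch.Size([80, 10]) -> layer 0 consumes the 10-dim input
print(lstm.weight_ih_l1.shape)  # torch.Size([80, 20]) -> layer 1 consumes the 20-dim hidden state of layer 0
print(lstm.weight_hh_l0.shape, lstm.weight_hh_l1.shape)  # both torch.Size([80, 20])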

Inputs: input, (h_0, c_0)

input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. 
# See the example above; note that the dimension order differs from Keras: batch is in the second position (see the sketch after this list)
h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch.
# Initial hidden state
c_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial cell state for each element in the batch.
# Initial cell state
If (h_0, c_0) is not provided, both h_0 and c_0 default to zero.
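Two of these points can be illustrated with a short sketch (shapes chosen to match the running example): batch_first=True moves batch to the first dimension, and omitting (h_0, c_0) falls back to zero states:

import torch
import torch.nn as nn
rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
x = torch.randn(3, 5, 10)   # with batch_first=True the input is (batch, seq_len, input_size)
output, (hn, cn) = rnn(x)   # (h_0, c_0) omitted -> both default to zeros
print(output.shape)         # torch.Size([3, 5, 20]) -- batch first here as well
print(hn.shape, cn.shape)   # torch.Size([2, 3, 20]) twice -- h_n/c_n are unaffected by batch_first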

Outputs: output, (h_n, c_n)

output of shape (seq_len, batch, hidden_size * num_directions): tensor containing the output features (h_t) from the last layer of the LSTM, for each t. 
# Analogous to input; see the example below for details
h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len
# Hidden-state output
c_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the cell state for t = seq_len
# Cell-state output

Official API example

# word-vector dim 10, hidden dim 20, 2 stacked LSTM layers
rnn = nn.LSTM(10, 20, 2)
# seq_len=5, batch_size=3, word-vector dim=10
input = torch.randn(5, 3, 10)
# initial hidden state and cell state; they usually have the same shape
# 2 LSTM layers, batch_size=3, hidden dim 20
h0 = torch.randn(2, 3, 20)
c0 = torch.randn(2, 3, 20)
# with 2 LSTM layers, output holds the last layer's hidden state at every time step;
# its length depends only on the sequence length, not on the number of layers
# hn, cn are the hidden and cell states of every layer at the final time step
output, (hn, cn) = rnn(input, (h0, c0))

print(output.size(),hn.size(),cn.size())
torch.Size([5, 3, 20]) torch.Size([2, 3, 20]) torch.Size([2, 3, 20])
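As a quick check of the comments above, reusing output and hn from this example, the last time step of output should equal the top layer's final hidden state:

print(torch.allclose(output[-1], hn[-1]))  # True: output[-1] is the top layer's hidden state at t = seq_len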

Official tutorials example
(comparing the results of the two ways of writing it; in the code the initialization of h0 and c0 is replaced with zeros)

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)
lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
inputs = [torch.zeros(1, 3) for _ in range(5)]  # sequence of length 5, batch=1, word-vector dim 3
hidden = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))  # 1 LSTM layer, batch=1, hidden dim 3
# feed each word vector through the LSTM one step at a time
for i in inputs:
    out, hidden = lstm(i.view(1, 1, -1), hidden)
    print(out)  # print out at each step

#tensor(1.00000e-02 *[[[-9.1601, -2.4799, -4.8088]]])
#tensor([[[-0.1536, -0.0358, -0.0787]]])
#tensor([[[-0.1923, -0.0398, -0.0976]]])

The other way of writing it, which treats the sequence as a whole and processes it in one call

# inputs shape: (5, 1, 3) -> seq_len=5, batch=1, feature=3
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
hidden = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3)) 
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)

tensor([[[-0.0916, -0.0248, -0.0481]],
        [[-0.1536, -0.0358, -0.0787]],
        [[-0.1923, -0.0398, -0.0976]],
        [[-0.2158, -0.0405, -0.1092]],
        [[-0.2301, -0.0398, -0.1162]]])
(tensor([[[-0.2301, -0.0398, -0.1162]]]), tensor([[[-0.6446, -0.0941, -0.2772]]]))

The output of the two approaches is identical, and hn is the hidden-state output for the last word vector (the final time step).
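A small sketch of that comparison, reusing lstm and the concatenated inputs from above and resetting the hidden state so both runs start from the same zeros:

# step-by-step pass
hidden = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))
step_outs = []
for t in inputs:  # inputs is the (5, 1, 3) tensor built above; each t has shape (1, 3)
    out, hidden = lstm(t.view(1, 1, -1), hidden)
    step_outs.append(out)

# whole-sequence pass
batched_out, (hn, cn) = lstm(inputs, (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3)))

print(torch.allclose(torch.cat(step_outs), batched_out))  # True: same outputs either way
print(torch.allclose(hn[-1], batched_out[-1]))            # True: hn is the hidden state at the last time step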
