1. Sequence Models and Long Short-Term Memory Networks

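So far we have seen various feed-forward networks, that is, networks that maintain no state at all. This may not be the behavior we want. Sequence models are central to NLP: they are models in which there is some dependence through time between the inputs. The classic example of a sequence model is the hidden Markov model for part-of-speech tagging. Another example is the conditional random field.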

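A recurrent neural network is a network that maintains some kind of state. For example, its output can be used as part of the next input, so that information propagates as the network passes over the sequence. In an LSTM, each element of the sequence has a corresponding hidden state h_t, which in principle can contain information from arbitrary earlier points in the sequence. We can use the hidden state to predict words in a language model, part-of-speech tags, slot labels, and so on.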
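The idea of carrying state through a sequence can be sketched in plain Python. This is a minimal scalar recurrence of the form h_t = tanh(w_x * x_t + w_h * h_{t-1} + b), not an LSTM; the function names and the weight values are assumptions chosen only for illustration.

```python
import math


def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.8, bias=0.0):
    """One step of a scalar recurrence: h_t = tanh(w_x * x_t + w_h * h_prev + bias)."""
    return math.tanh(w_x * x_t + w_h * h_prev + bias)


def run(sequence, h0=0.0):
    """Pass over the sequence, feeding each new state into the next step."""
    h = h0
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h)
        states.append(h)
    return states


# The two sequences differ only in their first element.
states_pos = run([1.0, 0.0, 0.0])
states_neg = run([-1.0, 0.0, 0.0])

# The final states still differ, because the state carried the first input forward.
print(states_pos[-1], states_neg[-1])
```

The last two inputs are identical in both runs, so any difference in the final state comes from information that was propagated through the hidden state.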

2. LSTM in PyTorch

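PyTorch's LSTM expects its input as a 3D tensor. The first dimension indexes the sequence (the words), the second indexes the instances in the mini-batch, and the third indexes the elements of the input (the word-vector dimension). For background on how an LSTM works, see the blog post 循环神经网络 (Recurrent Neural Networks). If we want to run the sequence model over the sentence "The cow jumped", the input stacks the three word vectors along the first dimension, giving a tensor of shape (3, 1, d), where d is the word-vector dimension and the second dimension has size 1 because the batch holds a single sentence.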
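As a rough illustration of that layout, the sketch below builds the input for "The cow jumped" and passes it through an LSTM. The three-word vocabulary `word_to_ix` and the sizes 5 and 4 are assumptions made up for this example; only the tensor shapes matter.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical three-word vocabulary and sizes, chosen only for illustration.
word_to_ix = {"The": 0, "cow": 1, "jumped": 2}
embedding_dim, hidden_dim = 5, 4

embedding = nn.Embedding(len(word_to_ix), embedding_dim)
lstm = nn.LSTM(embedding_dim, hidden_dim)

ids = torch.tensor([word_to_ix[w] for w in "The cow jumped".split()], dtype=torch.long)
vectors = embedding(ids)                # (3, embedding_dim): one row per word
inputs = vectors.view(len(ids), 1, -1)  # (seq_len=3, batch=1, embedding_dim)

output, (h_n, c_n) = lstm(inputs)       # initial states default to zeros
print(inputs.shape)   # torch.Size([3, 1, 5])
print(output.shape)   # torch.Size([3, 1, 4])
print(h_n.shape)      # torch.Size([1, 1, 4])
```

For a single-layer, unidirectional LSTM like this one, the last time step of `output` is the same tensor content as `h_n[-1]`.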

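# The math behind nn.LSTM and its input/output parameters in detail
import torch
import torch.nn as nn

lstm=nn.LSTM(3,4,2)  # input_size=3, hidden_size=4, num_layers=2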
'''
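nn.LSTM applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
For each element in the input sequence, each layer computes the following function: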
math::
        \begin{array}{ll} \\
            i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\
            f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\
            g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\
            o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\
            c_t = f_t \odot c_{t-1} + i_t \odot g_t \\
            h_t = o_t \odot \tanh(c_t) \\
        \end{array}
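    :math:`h_t` is the hidden state at time `t`, :math:`c_t` is the cell state at time `t`,
    :math:`x_t` is the input at time `t`,
    :math:`h_{t-1}` is the hidden state at time `t-1`, or the initial hidden state at time `0`,
    :math:`i_t`, :math:`f_t`, :math:`g_t`, :math:`o_t` are the input, forget, cell, and output gates, respectively.
    :math:`\sigma` is the sigmoid function, and :math:`\odot` is the Hadamard product.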
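Args:
    input_size: the number of features in the input `x`
    hidden_size: the number of features in the hidden state `h`
    num_layers: number of recurrent layers. For example, setting ``num_layers=2`` stacks two LSTMs to form a "stacked LSTM",
               with the second LSTM taking the outputs of the first LSTM and computing the final results. Default: 1
    bias: if ``False``, the layer does not use the bias weights `b_ih` and `b_hh`. Default: ``True``
    batch_first: if ``True``, the input and output tensors are provided as (batch, seq, feature). Default: ``False``
    dropout: if non-zero, introduces a `Dropout` layer on the outputs of each LSTM layer except the last layer,
             with dropout probability equal to :attr:`dropout`. Default: 0
    bidirectional: if ``True``, becomes a bidirectional LSTM. Default: ``False``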

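Inputs: input, (h_0, c_0)
        - input of shape `(seq_len, batch, input_size)`: tensor containing the features of the input sequence
        - h_0 of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the initial hidden state
          for each element in the batch. If the LSTM is bidirectional, num_directions is 2, otherwise 1.
        - c_0 of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the initial cell state
          for each element in the batch. If `(h_0, c_0)` is not provided, both h_0 and c_0 default to zero.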
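Outputs: output, (h_n, c_n)
        - output of shape `(seq_len, batch, num_directions * hidden_size)`: tensor containing the output features
          `h_t` from the last layer of the LSTM, for each `t` (note: `h_t`, not the output gate `o_t`).
          If a :class:`torch.nn.utils.rnn.PackedSequence` was given as the input, the output will also be a packed sequence.
          For the unpacked case, the directions can be separated using
          ``output.view(seq_len, batch, num_directions, hidden_size)``,
          with forward and backward being direction `0` and `1` respectively.
        - h_n of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the hidden state for `t = seq_len`, for every layer.
          The layers can be separated using ``h_n.view(num_layers, num_directions, batch, hidden_size)``.
        - c_n of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the cell state for `t = seq_len`, for every layer.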


'''  
inputs=[torch.randn(1,3) for _ in range(5)]  # one sentence of length 5 with 3-dimensional inputs; batch size 1

# Initialize the hidden and cell states; each has shape (num_layers * num_directions, batch, hidden_size)
hidden = (torch.randn(2, 1, 4),
          torch.randn(2, 1, 4))

'''
Sample output from one run (values depend on the random initialization):
before: 1 tensor([[[ 0.0645, -0.1050,  0.1427, -0.0289]]], grad_fn=<StackBackward>) (tensor([[[ 0.0465,  0.0638,  0.3326, -0.2273]],[[ 0.0645, -0.1050,  0.1427, -0.0289]]], grad_fn=<StackBackward>), tensor([[[ 0.0992,  0.2878,  0.9721, -0.4657]],[[ 0.1024, -0.3557,  0.6139, -0.0391]]], grad_fn=<StackBackward>))
after: 1 tensor([[[ 0.1458, -0.0954,  0.1286,  0.1728]]], grad_fn=<StackBackward>) (tensor([[[ 0.0157,  0.2103,  0.2820, -0.2021]],[[ 0.1458, -0.0954,  0.1286,  0.1728]]], grad_fn=<StackBackward>), tensor([[[ 0.0302,  0.6760,  0.6505, -0.4757]],[[ 0.2552, -0.2182,  0.3291,  0.3289]]], grad_fn=<StackBackward>))
'''

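# Feed the sequence one element at a time, carrying the state forward.
for i,input_x in enumerate(inputs):
    input_x=input_x.view(1,1,-1)  # (seq_len=1, batch=1, input_size=3)
    if i==0:
        input_all=input_x
    else:
        input_all=torch.cat((input_all,input_x),dim=0)
    out,hidden=lstm(input_x,hidden)
    print("after:",i,out,hidden)

# Alternatively, feed the whole sequence in a single call.
print(input_all.size())  # torch.Size([5, 1, 3])
# Note: hidden is the state left by the loop above; re-create the initial
# state first if the two runs are meant to be comparable.
out,hidden=lstm(input_all,hidden)
print('all seq',out,hidden)  # out: (seq_len, batch, hidden_size)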
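The gate equations in the docstring above can be checked against `nn.LSTM` directly. The sketch below assumes a single-layer, unidirectional LSTM and relies on the documented parameter names `weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0`, and `bias_hh_l0`, whose rows are stacked in the gate order i, f, g, o. The helper `manual_lstm_step` is a hypothetical name introduced here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=3, hidden_size=4, num_layers=1)

x = torch.randn(1, 1, 3)   # (seq_len=1, batch=1, input_size=3)
h0 = torch.randn(1, 1, 4)  # (num_layers * num_directions, batch, hidden_size)
c0 = torch.randn(1, 1, 4)


def manual_lstm_step(lstm, x_t, h_prev, c_prev):
    """One LSTM step for layer 0, written out from the gate equations."""
    # All four gates are computed at once; the rows are stacked as i, f, g, o.
    gates = (x_t @ lstm.weight_ih_l0.T + lstm.bias_ih_l0
             + h_prev @ lstm.weight_hh_l0.T + lstm.bias_hh_l0)
    i, f, g, o = gates.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)
    c_t = f * c_prev + i * g       # c_t = f_t * c_{t-1} + i_t * g_t
    h_t = o * torch.tanh(c_t)      # h_t = o_t * tanh(c_t)
    return h_t, c_t


with torch.no_grad():
    out, (h_n, c_n) = lstm(x, (h0, c0))
    h_manual, c_manual = manual_lstm_step(lstm, x[0], h0[0], c0[0])

# Both comparisons should hold up to floating-point tolerance.
print(torch.allclose(out[0], h_manual, atol=1e-5))
print(torch.allclose(c_n[0], c_manual, atol=1e-5))
```

The comparison also shows that `output` holds the hidden state `h_t`, not the output gate `o_t`.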

3. LSTM for Part-of-Speech Tagging

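This section uses an LSTM to implement part-of-speech tagging. It does not use Viterbi, Forward-Backward, or similar algorithms. Word embeddings are used here; for their implementation, see pytorch之词嵌(三).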

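The model is as follows. Let the input sentence be w_1, …, w_M, where w_i ∈ V. Let T be the tag set and y_i the tag of word w_i. Denote the prediction of the tag of word w_i by ŷ_i.

The predicted sequence is ŷ_1, …, ŷ_M, where ŷ_i ∈ T.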

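To make a prediction, pass an LSTM over the sentence and denote the hidden state at time step i by h_i. The prediction rule for ŷ_i is:

ŷ_i = argmax_j (log Softmax(A h_i + b))_j

That is, take the log softmax of an affine map of the hidden state (the linear layer `lstm2tag` in the code); the predicted tag is the one with the maximum value.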
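The prediction rule for each word amounts to a log softmax over the tag scores followed by an argmax. The plain-Python sketch below makes the arithmetic visible; the score values are made up for illustration and stand in for the affine map of the hidden states, and `predict_tags` is a hypothetical helper name.

```python
import math


def log_softmax(scores):
    """Numerically stable log softmax of a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def predict_tags(score_rows, tags):
    """For each word, pick the tag whose log-probability is largest."""
    predictions = []
    for row in score_rows:
        log_probs = log_softmax(row)
        best = max(range(len(log_probs)), key=log_probs.__getitem__)
        predictions.append(tags[best])
    return predictions


tags = ["DET", "NN", "V"]
# Made-up scores, one row per word and one column per tag.
scores = [[2.0, 0.5, -1.0], [0.1, 1.7, 0.3], [-0.4, 0.2, 1.9]]
print(predict_tags(scores, tags))  # ['DET', 'NN', 'V']
```

Because the log softmax is monotonic in the scores, the argmax is the same before and after normalization; the normalization matters for the training loss rather than for decoding.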

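# Example: part-of-speech tagging with an LSTM
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

def word2id(data,w2i):
    """Assign a unique integer id to every word in data, updating w2i in place."""
    for ws,tag in data:
        for w in ws:
            if w not in w2i:
                w2i[w]=len(w2i)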
    
        
# Part-of-speech tagging data
train = [
    ("The dog ate the apple".split(), ["DET", "NN", "V", "DET", "NN"]),
    ("Everybody read that book".split(), ["NN", "V", "DET", "NN"])
]

w2i={}
t2i={"DET": 0, "NN": 1, "V": 2}
word2id(train,w2i)

EMBEDDING_DIM = 6
HIDDEN_DIM = 6

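class LstmTag(nn.Module):
    def __init__(self,embedding_dim,hidden_dim,vocab_size,tag_size):
        super(LstmTag,self).__init__()
        self.embedding=nn.Embedding(vocab_size,embedding_dim)
        # The LSTM takes word embeddings as input and outputs hidden states of size hidden_dim.
        self.lstm=nn.LSTM(embedding_dim,hidden_dim)
        # Linear layer mapping each hidden state to tag scores.
        self.lstm2tag=nn.Linear(hidden_dim,tag_size)
    
    def forward(self,x):
        embeds=self.embedding(x)  # (seq_len, embedding_dim)
        o,h=self.lstm(embeds.view(len(x),1,-1))  # o: (seq_len, 1, hidden_dim)
        tags=self.lstm2tag(o.view(len(x),-1))  # (seq_len, tag_size)
        tags_prob=F.log_softmax(tags,dim=1)  # normalize over the tag dimension
        return tags_prob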

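model=LstmTag(EMBEDDING_DIM,HIDDEN_DIM,len(w2i),len(t2i))
loss_function=nn.NLLLoss()
opt=optim.SGD(model.parameters(),lr=0.1)

# A single pass over the training data; wrap this loop in an outer epoch loop to train longer.
for input_x,y in train:
    model.zero_grad()  # gradients accumulate in PyTorch, so clear them before each step
    ids=torch.tensor([w2i[w] for w in input_x],dtype=torch.long)
    targets=torch.tensor([t2i[w] for w in y],dtype=torch.long)
    probs=model(ids)  # log-probabilities, shape (seq_len, tag_size)
    print(probs)
    loss=loss_function(probs,targets)
    loss.backward()
    opt.step()
    print(loss)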
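The loop above makes only one pass over the two training sentences. As a self-contained sketch of a fuller run, the code below repeats the pass for several epochs and then decodes the predicted tags with an argmax. The class name `Tagger`, the helper `encode`, the epoch count of 300, and the fixed seed are assumptions made for this example; the layer layout mirrors `LstmTag`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

train = [
    ("The dog ate the apple".split(), ["DET", "NN", "V", "DET", "NN"]),
    ("Everybody read that book".split(), ["NN", "V", "DET", "NN"]),
]
w2i = {}
for words, _ in train:
    for w in words:
        w2i.setdefault(w, len(w2i))
t2i = {"DET": 0, "NN": 1, "V": 2}
i2t = {i: t for t, i in t2i.items()}


class Tagger(nn.Module):  # hypothetical name; same layers as LstmTag
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tag_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.lstm2tag = nn.Linear(hidden_dim, tag_size)

    def forward(self, x):
        embeds = self.embedding(x).view(len(x), 1, -1)
        out, _ = self.lstm(embeds)
        return F.log_softmax(self.lstm2tag(out.view(len(x), -1)), dim=1)


def encode(seq, table):
    """Map a list of symbols to a LongTensor of ids."""
    return torch.tensor([table[s] for s in seq], dtype=torch.long)


model = Tagger(6, 6, len(w2i), len(t2i))
loss_function = nn.NLLLoss()
opt = optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(300):  # the epoch count is an assumption
    total = 0.0
    for words, tags in train:
        model.zero_grad()
        loss = loss_function(model(encode(words, w2i)), encode(tags, t2i))
        loss.backward()
        opt.step()
        total += loss.item()
    losses.append(total)

# Decode: for each word, take the tag with the highest log-probability.
with torch.no_grad():
    log_probs = model(encode(train[0][0], w2i))
    predicted = [i2t[i] for i in log_probs.argmax(dim=1).tolist()]

print(losses[0], losses[-1])
print(predicted)
```

With a training set this small, the summed loss should fall substantially over the epochs. The predictions are evaluated on a training sentence, so they say nothing about generalization.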
