Deep Learning Course -- assign3 -- Understanding the LSTM Structure

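LSTM (Long Short-Term Memory networks)

A special kind of RNN.

A plain RNN absorbs most of the information from the immediately preceding time step, but makes much less use of information from time steps further back. This hurts prediction accuracy. Take text prediction as an example: given 'I live in China, I like to travel, and I like to speak ...', predicting the word after 'speak' depends heavily on the word 'China', but that word is far from the position being predicted, so its information is badly degraded by the time it arrives. This is called the long-term dependency problem. The main feature of the LSTM is that it can carry information from earlier time steps through to the current one, which handles this problem much better.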

This is the LSTM structure diagram, for comparison with the RNN:
[Figure: LSTM structure diagram]

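Step 1:

Here the previous hidden state $h_{t-1}$ and the input $x_t$ are concatenated, multiplied by the weights $W_f$, added to the bias $b_f$, and passed through the sigmoid function to give $f_t$ (the forget gate). The formula is
$f_t = \sigma(W_f ·[h_{t-1}, x_t] + b_f )$
All parameters of this layer, $W_f, b_f$, carry the subscript f so that they are not confused with those of the other layers.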
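
A minimal NumPy sketch of Step 1 may make the shapes concrete. The sizes (hidden size 3, input size 2) and the random weights are illustrative assumptions, not values from the course, and NumPy is used here instead of the TensorFlow ops that appear later in the post.

```python
import numpy as np

def sigmoid(x):
    # Logistic function, squashes every entry into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

hidden_size, input_size = 3, 2            # assumed sizes, for illustration only
rng = np.random.default_rng(0)

h_prev = rng.standard_normal(hidden_size)   # h_{t-1}
x_t = rng.standard_normal(input_size)       # x_t
W_f = rng.standard_normal((hidden_size, hidden_size + input_size))
b_f = np.zeros(hidden_size)

concat = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t], length 5
f_t = sigmoid(W_f @ concat + b_f)           # one value in (0, 1) per hidden unit

print(f_t.shape)
```

Each entry of `f_t` lies strictly between 0 and 1, so it can act as a per-unit fraction of the old cell state to keep.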

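Step 2:

This layer is similar to the first one and produces $i_t$ (the input gate) in the same way, together with a candidate cell state $\hat{C_t}$. The two formulas are
$i_t = \sigma(W_i ·[h_{t-1}, x_t] + b_i)$
The parameters here, $W_i, b_i$, carry the subscript i.
$\hat{C_t} = tanh(W_C·[h_{t-1}, x_t] + b_C)$
The parameters here, $W_C, b_C$, carry the subscript C.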
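
The two formulas of Step 2 differ only in their weights and activation, which the following NumPy sketch tries to show. The sizes and random weights are again assumptions made for illustration, and `concat` simply stands in for $[h_{t-1}, x_t]$.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

hidden_size, input_size = 3, 2            # assumed sizes, for illustration only
rng = np.random.default_rng(1)
concat = rng.standard_normal(hidden_size + input_size)  # stands in for [h_{t-1}, x_t]

W_i = rng.standard_normal((hidden_size, hidden_size + input_size))
W_C = rng.standard_normal((hidden_size, hidden_size + input_size))
b_i = np.zeros(hidden_size)
b_C = np.zeros(hidden_size)

i_t = sigmoid(W_i @ concat + b_i)       # input gate, entries in (0, 1)
c_hat_t = np.tanh(W_C @ concat + b_C)   # candidate cell state, entries in (-1, 1)

print(i_t, c_hat_t)
```

The sigmoid output works as a gate (how much to write), while the tanh output is the signed content that may be written.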

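Step 3:

This step combines the results of Step 1 and Step 2 with the cell state from the previous time step, using element-wise multiplication and addition.
$C_t = f_t * C_{t-1} + i_t * \hat{C_t}$
This updates $C_{t-1}$ to $C_t$, which is passed on to the next time step.
Step 4 will then update $h_{t-1}$ to $h_t$, which is also passed on to the next time step.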
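
A small worked example of the cell state update, with hand-picked gate values (chosen for illustration, not taken from the course) so that each of the three units shows a different behaviour:

```python
import numpy as np

c_prev = np.array([2.0, 2.0, 2.0])     # C_{t-1}
c_hat = np.array([-1.0, -1.0, -1.0])   # candidate \hat{C_t}
f_t = np.array([1.0, 0.0, 0.5])        # forget gate
i_t = np.array([0.0, 1.0, 0.5])        # input gate

# Element-wise products and sum, exactly as in the formula above
c_t = f_t * c_prev + i_t * c_hat
print(c_t)   # [ 2.  -1.   0.5]
```

Unit 0 keeps the old state untouched, unit 1 overwrites it with the candidate, and unit 2 blends the two.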

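Step 4:

This step computes
$o_t = \sigma(W_o·[h_{t-1}, x_t] + b_o)$
The parameters here, $W_o, b_o$, carry the subscript o.
Then, combined with $C_t$ from Step 3, the hidden state is computed as
$h_t = o_t * tanh(C_t)$
This is the last step: $h_t$ is now updated.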
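
To illustrate the last formula, here is a short NumPy example with hand-picked values for $C_t$ and $o_t$ (assumed for illustration only):

```python
import numpy as np

c_t = np.array([2.0, -1.0, 0.5])   # cell state from Step 3
o_t = np.array([1.0, 0.0, 0.5])    # output gate

# tanh squashes the cell state into (-1, 1); o_t then decides how much is exposed
h_t = o_t * np.tanh(c_t)
print(h_t)
```

Because both factors are bounded, every entry of `h_t` stays strictly inside (-1, 1) even if the cell state itself grows large.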

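Hands-on Python code

We will implement the LSTM structure by hand, including the feedforward and backward passes used to update the weights (the code shown below covers the forward step, written with TensorFlow ops and checked on numpy arrays).
First, we gather all of the formulas above in one place, which makes it easier to set up the corresponding parameters.
Keep in mind that the end goal is to update $C_t, h_t$; every other quantity is computed for that purpose.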

$f_t = \sigma(W_f ·[h_{t-1}, x_t] + b_f )$
$i_t = \sigma(W_i ·[h_{t-1}, x_t] + b_i)$
$\hat{C_t} = tanh(W_C·[h_{t-1}, x_t] + b_C)$
$C_t = f_t * C_{t-1} + i_t * \hat{C_t}$
$o_t = \sigma(W_o·[h_{t-1}, x_t] + b_o)$
$h_t = o_t * tanh(C_t)$
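
The six formulas can be chained into one step function and applied repeatedly along a sequence, with $h_t$ and $C_t$ carried from one time step to the next. The sketch below is one possible NumPy arrangement with separate weights per gate; the function name `lstm_step`, the `params` dictionary, the sizes, and the random data are all assumptions made for this illustration and are not the course's code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following the six formulas above (hypothetical helper)."""
    concat = np.concatenate([h_prev, x_t])                    # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ concat + params["b_f"])
    i_t = sigmoid(params["W_i"] @ concat + params["b_i"])
    c_hat = np.tanh(params["W_C"] @ concat + params["b_C"])
    c_t = f_t * c_prev + i_t * c_hat                          # element-wise
    o_t = sigmoid(params["W_o"] @ concat + params["b_o"])
    h_t = o_t * np.tanh(c_t)                                  # element-wise
    return h_t, c_t

hidden_size, input_size, seq_len = 4, 2, 5   # assumed sizes
rng = np.random.default_rng(0)
params = {}
for name in ("f", "i", "C", "o"):
    params[f"W_{name}"] = 0.1 * rng.standard_normal((hidden_size, hidden_size + input_size))
    params[f"b_{name}"] = np.zeros(hidden_size)

xs = rng.standard_normal((seq_len, input_size))          # a toy input sequence
h, c = np.zeros(hidden_size), np.zeros(hidden_size)      # initial states
for x_t in xs:
    h, c = lstm_step(x_t, h, c, params)                  # states flow to the next step

print(h.shape, c.shape)
```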

Course practice - Python - a simple hand-written LSTM structure

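First, define the two activation functions that appear in the LSTM structure: sigmoid and tanh.

import tensorflow as tf

def sigmoid(x):
    # Logistic function: maps any real input into (0, 1)
    out = 1/(1+tf.exp(-x))
    return out

def tanh(x):
    # Hyperbolic tangent: maps any real input into (-1, 1)
    out = (tf.exp(x)-tf.exp(-x))/(tf.exp(x)+tf.exp(-x))
    return out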
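
One caveat worth knowing about the hand-written tanh: for large inputs, `exp(x)` overflows and the ratio becomes `inf/inf`. The NumPy sketch below reproduces the same formula (the name `naive_tanh` is introduced here for illustration) and compares it with the built-in `np.tanh`; TensorFlow's own `tf.math.tanh` and `tf.math.sigmoid` are the usual choice when robustness matters.

```python
import numpy as np

def naive_tanh(x):
    # Same formula as the hand-written tanh above, in NumPy
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

# Moderate inputs: the naive formula agrees with the library version
moderate_match = np.isclose(naive_tanh(0.5), np.tanh(0.5))

# Large inputs: exp(1000) overflows to inf, and inf/inf gives nan
with np.errstate(over="ignore", invalid="ignore"):
    naive_large = naive_tanh(np.float64(1000.0))
stable_large = np.tanh(1000.0)

print(moderate_match, naive_large, stable_large)
```

For the small test values used in this assignment the hand-written versions are fine; the issue only shows up when pre-activations become very large.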

Then, define one LSTM step according to the LSTM structure.

def LSTM_step(cell_inputs, cell_states, kernel, recurrent_kernel, bias):
    """
    Run one time step of the cell. That is, given the current inputs (x_t) and the cell states (h_{t-1}, C_{t-1}) from the last time step,
    calculate the current hidden state (h_t) and cell state (C_t).
    
    Hint: In LSTM there exist both matrix multiplication and element-wise multiplication. Try not to mix them.
    - At first I mixed up matrix multiplication and element-wise multiplication and used matrix multiplication throughout, so the output C_t came out as a scalar when it should have had shape (1,16).
    
        
        
    :param cell_inputs: The input at the current time step. The last dimension of it should be 1.
    :param cell_states:  The state value of the cell from the last time step, containing previous hidden state h_{t-1} and cell state C_{t-1}.
    :param kernel: The kernel matrix for the multiplication with cell_inputs
    :param recurrent_kernel: The kernel matrix for the multiplication with hidden state h_tml
    :param bias: Common bias value
    
    
    :return: current hidden state, and a list of hidden state and cell state
    """
    h_tml = cell_states[0]  # previous hidden state h_{t-1}
    c_tml = cell_states[1]  # previous cell state C_{t-1}

    # The pre-activations of the four gate formulas,
    #   W_f·[h_{t-1}, x_t] + b_f   (for f_t)
    #   W_i·[h_{t-1}, x_t] + b_i   (for i_t)
    #   W_C·[h_{t-1}, x_t] + b_C   (for \hat{C_t})
    #   W_o·[h_{t-1}, x_t] + b_o   (for o_t)
    # are computed together in a single tensor called z.
    z = tf.matmul(cell_inputs, kernel)
    z += tf.matmul(h_tml,recurrent_kernel)
    z += bias
    # Split z into four parts and pass each through its activation to get ft, it, hat_ct, ot
    z0, z1, z2, z3 = tf.split(z,4,axis=1)
    
    ft = sigmoid(z0)   # with our data, ft has shape (1,64)
    it = sigmoid(z1)   # shape (1,64)
    hat_ct = tanh(z2)   # same shape
    ot = sigmoid(z3)    # same shape

    # update the cell state ct
    ct = ft * c_tml + it * hat_ct   # element-wise products; ct has shape (1,64)
    # update the hidden state ht
    ht = tanh(ct) * ot              # element-wise product; ht also has shape (1,64)
    
    return ht, [ht,ct]
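
The function above uses one `kernel` and one `recurrent_kernel` instead of four separate matrices $W_f, W_i, W_C, W_o$. The NumPy sketch below illustrates why the two formulations agree: each $W$ is split into the columns that multiply $h_{t-1}$ and those that multiply $x_t$, and the four gates are stacked side by side. The sizes, random weights, and the gate order f, i, C, o (matching the split order in `LSTM_step`) are assumptions of this illustration.

```python
import numpy as np

hidden, n_in = 3, 2                      # assumed sizes, for illustration only
rng = np.random.default_rng(0)
gates = "fiCo"                           # same order as z0, z1, z2, z3 above

# One separate weight matrix per gate, acting on [h_{t-1}, x_t]
W = {g: rng.standard_normal((hidden, hidden + n_in)) for g in gates}
h_prev = rng.standard_normal((1, hidden))
x_t = rng.standard_normal((1, n_in))

# Fused form: split each W into its h-part and x-part, transpose for
# row-vector inputs, and stack the four gates along the column axis
recurrent_kernel = np.concatenate([W[g][:, :hidden].T for g in gates], axis=1)  # (3, 12)
kernel = np.concatenate([W[g][:, hidden:].T for g in gates], axis=1)            # (2, 12)
z_fused = x_t @ kernel + h_prev @ recurrent_kernel                              # (1, 12)

# Separate form: W_g · [h_{t-1}, x_t] for each gate, then concatenated
concat = np.concatenate([h_prev, x_t], axis=1)
z_separate = np.concatenate([concat @ W[g].T for g in gates], axis=1)

print(np.allclose(z_fused, z_separate))
```

Fusing the gates means two matrix multiplications per step instead of four (or eight), after which a single split recovers the individual pre-activations.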

Finally, define some simple test data to check LSTM_step.

import numpy as np
cell_inputs = np.ones((1,1))
cell_states = [0.2*np.ones((1,64)), np.zeros((1,64))]
kernel = 0.1*np.ones((1,256))
recurrent_kernel = 0.1*np.ones((64,256))
bias = np.zeros(256)

h , [h,c] = LSTM_step(cell_inputs, cell_states, kernel, recurrent_kernel, bias)
print('Simple verification:')
print('Is h correct?', np.isclose(h.numpy()[0][0],0.48484358))
print('Is c correct?', np.isclose(c.numpy()[0][0],0.70387213))

Simple verification:
Is h correct? True
Is c correct? True
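
The two expected values can also be reproduced by hand, because every weight and input in the test is a constant. With x = 1, 64 hidden units at 0.2, all weights at 0.1, and zero bias, every pre-activation equals 1·0.1 + 64·0.2·0.1 = 1.38. A NumPy sketch of that arithmetic (independent of TensorFlow, written only as a cross-check):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Every column of z is identical: input term + recurrent term + bias
z = 1.0 * 0.1 + 64 * 0.2 * 0.1 + 0.0      # = 1.38

gate = sigmoid(z)        # f_t, i_t and o_t all share this value
c_hat = np.tanh(z)       # candidate cell state
c = gate * 0.0 + gate * c_hat             # previous cell state is zero
h = gate * np.tanh(c)

print(c, h)
```

The results agree with the reference values 0.70387213 and 0.48484358 used in the check above.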