Using RNN & LSTM in TensorFlow

I. RNN & LSTM base classes

1. RNN base class

class tf.contrib.rnn.BasicRNNCell(num_units, activation=None, reuse=None, name=None)
Arguments:

num_units: int, the number of units in the RNN cell.

activation: Nonlinearity to use. Default: tanh.

reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.

name: String, the name of the layer. Layers with the same name will share weights, but to avoid mistakes we require reuse=True in such cases.

Returns:

A basic RNN cell (an instantiated cell) with num_units hidden units.

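Common attributes:

state_size: size(s) of state(s) used by this cell; equal to the number of hidden units (num_units).

output_size: size of outputs produced by this cell.

Note: for this class, state_size always equals output_size.

Common methods:

call(inputs, state): returns (output, new_state); for BasicRNNCell both are the same new hidden state.

zero_state(batch_size, dtype): returns an all-zero tensor of shape [batch_size, state_size].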

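Code example

import tensorflow as tf  # requires TensorFlow 1.x (tf.contrib was removed in 2.x)

cell = tf.contrib.rnn.BasicRNNCell(num_units=128)
print(cell.state_size)  # 128

inputs = tf.placeholder(shape=[32, 100], dtype=tf.float32)  # 32 is the batch_size
# zero_state gives an all-zero initial state of shape (batch_size, state_size)
h0 = cell.zero_state(batch_size=32, dtype=tf.float32)

# Advance one step in time. Calling the cell object builds its variables and then
# runs call(); invoking cell.call directly may fail on TF 1.x versions where the
# cell has not been built yet.
output, h1 = cell(inputs, h0)
print(h1.shape)  # (32, 128)
# For BasicRNNCell, output and h1 are the same new hidden state.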
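
To make the single step concrete, here is a minimal NumPy sketch of what one step of a basic RNN cell computes, assuming the standard formulation h1 = tanh([x, h0] · W + b). The names rnn_step, W, and b are illustrative and not part of the TensorFlow API, and the weights are random rather than produced by TensorFlow's initializers.

```python
import numpy as np

def rnn_step(x, h_prev, W, b, activation=np.tanh):
    """One vanilla RNN step; returns (output, new_state), which are the same array."""
    # Concatenate input and previous state along the feature axis, then apply one affine map.
    h_new = activation(np.concatenate([x, h_prev], axis=1) @ W + b)
    return h_new, h_new

batch_size, input_depth, num_units = 32, 100, 128
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(input_depth + num_units, num_units))  # hypothetical weights
b = np.zeros(num_units)

x = rng.normal(size=(batch_size, input_depth))
h0 = np.zeros((batch_size, num_units))  # plays the role of zero_state
output, h1 = rnn_step(x, h0, W, b)
print(h1.shape)  # (32, 128)
```

The sketch returns the same array twice, which mirrors why state_size equals output_size for this kind of cell.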

2. LSTM base class

class tf.contrib.rnn.BasicLSTMCell(num_units, forget_bias=1.0, state_is_tuple=True, activation=None, reuse=None, name=None)
Arguments:

num_units: int, the number of units in the LSTM cell.

forget_bias: float, the bias added to forget gates. Must be set to 0.0 manually when restoring from CudnnLSTM-trained checkpoints.

state_is_tuple: If True, accepted and returned states are 2-tuples of the c_state and m_state.

activation: Nonlinearity to use. Default: tanh.

reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.

name: String, the name of the layer. Layers with the same name will share weights, but to avoid mistakes we require reuse=True in such cases.

Returns:

A basic LSTM cell (an instantiated lstm_cell) with num_units hidden units.

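Common attributes:

state_size: size(s) of state(s) used by this cell. With state_is_tuple=True this is LSTMStateTuple(c=num_units, h=num_units).

output_size: size of outputs produced by this cell; equal to num_units.

Note: unlike BasicRNNCell, state_size here is not equal to output_size, because the state holds both c and h.

Common methods:

call(inputs, state): returns (new_h, new_state), where new_state is an LSTMStateTuple containing c and h.

zero_state(batch_size, dtype): returns an LSTMStateTuple of two all-zero tensors, each of shape [batch_size, num_units], matching state_size = LSTMStateTuple(c=num_units, h=num_units).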

Code example

import tensorflow as tf  # requires TensorFlow 1.x (tf.contrib was removed in 2.x)

lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units=128)
print(lstm_cell.output_size)  # 128
print(lstm_cell.state_size)   # LSTMStateTuple(c=128, h=128)

inputs = tf.placeholder(shape=[32, 100], dtype=tf.float32)  # 32 is the batch_size
h0 = lstm_cell.zero_state(batch_size=32, dtype=tf.float32)
print(h0)
# LSTMStateTuple(c=<tf.Tensor 'BasicLSTMCellZeroState/zeros:0' shape=(32, 128) dtype=float32>, h=<tf.Tensor 'BasicLSTMCellZeroState/zeros_1:0' shape=(32, 128) dtype=float32>)

# Advance one step in time. Calling the cell object builds its variables and then
# runs call(); invoking lstm_cell.call directly may fail if the cell is not built.
new_h, new_state = lstm_cell(inputs, h0)
print(new_h.shape)        # (32, 128)
print(new_state.h.shape)  # (32, 128)
print(new_state.c.shape)  # (32, 128)
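
The relationship between new_h, c, and h may be easier to see in a small NumPy sketch of one LSTM step. It assumes the usual LSTM equations with a forget_bias added to the forget gate and a gate layout of (i, j, f, o); the function names and random weights are illustrative, not TensorFlow's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, state, W, b, forget_bias=1.0):
    """One LSTM step; state is a (c, h) pair, returns (new_h, (new_c, new_h))."""
    c_prev, h_prev = state
    gates = np.concatenate([x, h_prev], axis=1) @ W + b
    # Assumed gate layout: input gate i, candidate j, forget gate f, output gate o.
    i, j, f, o = np.split(gates, 4, axis=1)
    new_c = c_prev * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    new_h = np.tanh(new_c) * sigmoid(o)
    return new_h, (new_c, new_h)

batch_size, input_depth, num_units = 32, 100, 128
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(input_depth + num_units, 4 * num_units))  # hypothetical weights
b = np.zeros(4 * num_units)

x = rng.normal(size=(batch_size, input_depth))
zero_state = (np.zeros((batch_size, num_units)), np.zeros((batch_size, num_units)))  # (c, h)
new_h, new_state = lstm_step(x, zero_state, W, b)
print(new_h.shape, new_state[0].shape, new_state[1].shape)  # (32, 128) (32, 128) (32, 128)
```

The output new_h is the same array as the h component of the new state, while c is carried only inside the state; this is why the state is a pair but the output has size num_units.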

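II. Running multiple steps at once: tf.nn.dynamic_rnn

Purpose: a basic RNNCell advances only one time step per call; tf.nn.dynamic_rnn removes that limitation.
Function: TensorFlow provides tf.nn.dynamic_rnn; using it is equivalent to calling the cell's call function n times, so $(h_0, x_1, x_2, \ldots, x_n)$ directly yields $(h_1, h_2, \ldots, h_n)$.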

1. RNN

tf.nn.dynamic_rnn(cell, inputs, initial_state=None, sequence_length=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None)
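Arguments:

cell: an RNNCell instance.

inputs: the input sequence of the RNN.

initial_state: the initial state of the RNN. If cell.state_size is an integer, this must be a Tensor of appropriate type and shape [batch_size, cell.state_size]. If cell.state_size is a tuple, this should be a tuple of tensors having shapes [batch_size, s] for s in cell.state_size.

sequence_length: an int vector of shape [batch_size]; each value is the actual length (number of valid time steps) of the corresponding sequence. If every sequence has time_steps steps, use e.g. sequence_length=tf.fill([batch_size], time_steps).

time_major: defaults to False, in which case inputs and outputs have shape [batch_size, max_time, depth]; if True, they have shape [max_time, batch_size, depth].

scope: VariableScope for the created subgraph; defaults to "rnn".

Returns (outputs, state):

outputs: the outputs of all time steps, of shape [batch_size, max_time, cell.output_size].

state: the hidden state at the last step, of shape [batch_size, cell.state_size]; for an LSTM cell it is an LSTMStateTuple of (c, h).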
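
The unrolling that dynamic_rnn performs can be sketched in NumPy as a loop over the time axis. This is a simplified illustration, not TensorFlow's implementation: it uses a plain tanh cell with random weights, batch-major inputs, and hypothetical names (rnn_step, unroll). Past each sequence's length it emits zero outputs and carries the state through unchanged, which is the documented behavior of sequence_length.

```python
import numpy as np

def rnn_step(x, h_prev, W, b):
    h_new = np.tanh(np.concatenate([x, h_prev], axis=1) @ W + b)
    return h_new, h_new

def unroll(inputs, h0, W, b, sequence_length=None):
    """inputs: [batch_size, max_time, depth] -> (outputs [batch_size, max_time, units], final state)."""
    batch_size, max_time, _ = inputs.shape
    if sequence_length is None:
        sequence_length = np.full(batch_size, max_time)
    h = h0
    outputs = []
    for t in range(max_time):
        out, h_new = rnn_step(inputs[:, t, :], h, W, b)
        active = (t < sequence_length)[:, None]     # [batch_size, 1] mask of unfinished sequences
        outputs.append(np.where(active, out, 0.0))  # zero output past the sequence end
        h = np.where(active, h_new, h)              # keep the last valid state
    return np.stack(outputs, axis=1), h

batch_size, max_time, depth, num_units = 4, 10, 8, 16
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(depth + num_units, num_units))  # hypothetical weights
b = np.zeros(num_units)
inputs = rng.normal(size=(batch_size, max_time, depth))
h0 = np.zeros((batch_size, num_units))

outputs, state = unroll(inputs, h0, W, b, sequence_length=np.array([10, 7, 3, 1]))
print(outputs.shape, state.shape)  # (4, 10, 16) (4, 16)
```

For the third sequence (length 3), the returned state equals the output at time index 2 and all later outputs are zero.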

2. BLSTM

tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs, initial_state_fw=None, initial_state_bw=None, sequence_length=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None)
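Arguments:

Compared with tf.nn.dynamic_rnn in 1 above, this adds only a backward cell instance (cell_bw) and a backward initial state (initial_state_bw); the inputs are the same, but information flows in both directions.

Returns (outputs, output_states):

outputs:

The outputs of all time steps, as a tuple (output_fw, output_bw) holding the forward and backward results; output_fw has shape [batch_size, max_time, cell_fw.output_size] and output_bw has shape [batch_size, max_time, cell_bw.output_size].

It returns a tuple instead of a single concatenated Tensor. If the concatenated one is preferred, the forward and backward outputs can be concatenated as tf.concat(outputs, 2).

output_states: a tuple (output_state_fw, output_state_bw) containing the final states of the forward and backward passes.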
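
As a rough illustration of the two-directional pass, the NumPy sketch below runs one tanh RNN forward and a second one over the time-reversed input, then concatenates the two outputs along the last axis. It assumes all sequences have the full length (TensorFlow reverses each sequence according to sequence_length); the helper names and random weights are hypothetical.

```python
import numpy as np

def unroll(inputs, h0, W, b):
    """Run a tanh RNN over inputs [batch_size, max_time, depth]; return (outputs, final_state)."""
    h = h0
    outputs = []
    for t in range(inputs.shape[1]):
        h = np.tanh(np.concatenate([inputs[:, t, :], h], axis=1) @ W + b)
        outputs.append(h)
    return np.stack(outputs, axis=1), h

def bidirectional(inputs, fw_params, bw_params, num_units):
    h0 = np.zeros((inputs.shape[0], num_units))
    out_fw, state_fw = unroll(inputs, h0, *fw_params)
    # Backward pass: reverse the time axis, run the RNN, then reverse the outputs back.
    out_bw_rev, state_bw = unroll(inputs[:, ::-1, :], h0, *bw_params)
    out_bw = out_bw_rev[:, ::-1, :]
    return (out_fw, out_bw), (state_fw, state_bw)

batch_size, max_time, depth, num_units = 4, 10, 8, 16
rng = np.random.default_rng(0)

def make_params():
    # Hypothetical random weights; each direction gets its own set.
    return rng.normal(scale=0.1, size=(depth + num_units, num_units)), np.zeros(num_units)

inputs = rng.normal(size=(batch_size, max_time, depth))
outputs, output_states = bidirectional(inputs, make_params(), make_params(), num_units)
concatenated = np.concatenate(outputs, axis=2)  # analogous to tf.concat(outputs, 2)
print(outputs[0].shape, outputs[1].shape)  # (4, 10, 16) (4, 10, 16)
print(concatenated.shape)                  # (4, 10, 32)
```

Note that in this sketch the backward final state corresponds to the backward output at time index 0, since the backward pass finishes at the start of the sequence.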

III. Stacking multiple layers: MultiRNNCell

A single-layer RNN often has limited capacity, so multi-layer RNNs are needed. In TensorFlow, the tf.nn.rnn_cell.MultiRNNCell class stacks RNNCells.

import tensorflow as tf  # requires TensorFlow 1.x

# Example input of shape [batch_size, max_time, depth]; the sizes 32, 10, 100 are illustrative assumptions
data = tf.placeholder(shape=[32, 10, 100], dtype=tf.float32)

# Create 2 LSTMCells with 128 and 256 hidden units respectively
rnn_layers = [tf.nn.rnn_cell.LSTMCell(size) for size in [128, 256]]

# Create an RNN cell composed sequentially of a number of RNNCells
multi_rnn_cell = tf.nn.rnn_cell.MultiRNNCell(rnn_layers)

# 'outputs' is a tensor of shape [batch_size, max_time, 256]
# 'state' is an N-tuple where N is the number of LSTMCells, containing a
# tf.contrib.rnn.LSTMStateTuple for each cell
outputs, state = tf.nn.dynamic_rnn(cell=multi_rnn_cell,
                                   inputs=data,
                                   dtype=tf.float32)
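
What stacking means for a single time step can be sketched in NumPy: each layer's output becomes the next layer's input, and the combined state is a tuple with one entry per layer. For brevity this sketch uses plain tanh cells rather than LSTM cells, with hypothetical names and random weights; it is not how MultiRNNCell is implemented.

```python
import numpy as np

def rnn_step(x, h_prev, W, b):
    h_new = np.tanh(np.concatenate([x, h_prev], axis=1) @ W + b)
    return h_new, h_new

def stacked_step(x, states, params):
    """Run one time step through every layer; returns (top-layer output, tuple of new states)."""
    new_states = []
    layer_input = x
    for state, (W, b) in zip(states, params):
        layer_input, new_state = rnn_step(layer_input, state, W, b)  # output feeds the next layer
        new_states.append(new_state)
    return layer_input, tuple(new_states)

batch_size, input_depth, layer_sizes = 32, 100, [128, 256]
rng = np.random.default_rng(0)

params = []
in_size = input_depth
for size in layer_sizes:
    # Hypothetical random weights; each layer's input size is the previous layer's output size.
    params.append((rng.normal(scale=0.1, size=(in_size + size, size)), np.zeros(size)))
    in_size = size

x = rng.normal(size=(batch_size, input_depth))
zero_states = tuple(np.zeros((batch_size, size)) for size in layer_sizes)
output, new_states = stacked_step(x, zero_states, params)
print(output.shape)                   # (32, 256)
print([s.shape for s in new_states])  # [(32, 128), (32, 256)]
```

The output size is that of the last layer (256 here), matching the shape comment in the example above, while the state keeps one entry per layer.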

IV. References

1. "TensorFlow中RNN实现的正确打开方式" (The right way to implement RNNs in TensorFlow)
2. https://www.tensorflow.org/api_guides/python/contrib.rnn
3. https://www.tensorflow.org/api_guides/python/nn#Recurrent_Neural_Networks
