TFLearn: Time Distributed

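Introduction

I recently wanted to write a DRNN, i.e. a Deep Recurrent Neural Network. At first I did not really understand how to build one, so I did some reading, and the Time Distributed layer discussed here turned out to be one of the key ingredients.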

Time Distributed

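Let's use the material on the official TFLearn site to understand this interface.

Time Distributed applies a given function to every timestep of the input tensor. What does that mean?
Let's first look at the function signature: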

tflearn.layers.core.time_distributed (incoming, fn, args=None, scope=None)

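The input incoming has shape [samples, timesteps, input_dim]. fn is the function to apply at each timestep: its first argument is the timestep slice of incoming, and its remaining arguments are given in args. args is the list of arguments to pass to fn and must be of type list. scope assigns a scope to the Tensor of each timestep: the Tensor of the i-th timestep belongs to the scope "scope-i".
The output Tensor has shape [samples, timesteps, output_dim].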
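
To make the shape bookkeeping concrete, here is a minimal pure-Python sketch of the same idea using nested lists instead of tensors. It is not TFLearn code: time_distributed_sketch and sum_and_max are hypothetical names invented for this illustration, and the per-timestep function is a toy stand-in for a real layer.

```python
def time_distributed_sketch(batch, fn):
    """Apply fn to every timestep slice of a [samples][timesteps][input_dim] list."""
    samples = len(batch)
    timesteps = len(batch[0])
    # "Unstack" along axis 1: one [samples][input_dim] slice per timestep.
    slices = [[sample[t] for sample in batch] for t in range(timesteps)]
    # Apply fn to each timestep slice independently.
    outputs = [fn(s) for s in slices]
    # "Restack" along axis 1: back to [samples][timesteps][output_dim].
    return [[outputs[t][n] for t in range(timesteps)] for n in range(samples)]


def sum_and_max(rows):
    # Hypothetical per-timestep function: maps each input_dim vector to 2 values.
    return [[sum(r), max(r)] for r in rows]


batch = [[[1, 2, 3], [4, 5, 6]],
         [[7, 8, 9], [10, 11, 12]]]  # shape [2, 2, 3]
result = time_distributed_sketch(batch, sum_and_max)
print(result)  # [[[6, 3], [15, 6]], [[24, 9], [33, 12]]]
```

The input has shape [2, 2, 3] and the output has shape [2, 2, 2]: samples and timesteps are preserved, and only the last dimension changes according to the function applied.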

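Examples

Let's see how to use time_distributed() through some examples. Keep in mind that both its input and its output are tensors, so it is a Layer, which means it can be chained with other Layers.

Following the official site, here are two simple examples:

a) Applying a fully connected layer at every timestep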

x = time_distributed(input_tensor, fully_connected, [64])

b) Applying a convolutional layer at every timestep

x = time_distributed(input_tensor, conv_2d, [64, 3], scope='tconv')

Whether or not scope is passed makes no difference to parameter sharing, because either way a different layer is used at each timestep. For example:

x = time_distributed(input_tensor, fully_connected, [64])

If timesteps is 2, the resulting fully connected layers are named after the default "FullyConnected": they are "FullyConnected" and "FullyConnected_1".

We can verify this as follows:

import tflearn
import tensorflow as tf
from tflearn.layers.core import input_data, fully_connected, time_distributed

input_layer = input_data(shape=[2, 3], name="input")
print(input_layer)
net = time_distributed(input_layer, fully_connected, [8])

vs = tf.trainable_variables()
for v in vs:
    print(v)

The output is:

Tensor("input/X:0", shape=(?, 2, 3), dtype=float32)
<tf.Variable 'FullyConnected/W:0' shape=(3, 8) dtype=float32_ref>
<tf.Variable 'FullyConnected/b:0' shape=(8,) dtype=float32_ref>
<tf.Variable 'FullyConnected_1/W:0' shape=(3, 8) dtype=float32_ref>
<tf.Variable 'FullyConnected_1/b:0' shape=(8,) dtype=float32_ref>

To confirm more clearly that the weights in FullyConnected and FullyConnected_1 are different, let's print the weight values directly:

import tflearn
import tensorflow as tf
from tflearn.layers.core import input_data, fully_connected, time_distributed

input_layer = input_data(shape=[2, 3], name="input")
print(input_layer)
net = time_distributed(input_layer, fully_connected, [8])

sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())

vs = tf.trainable_variables()
with sess.as_default():
    for v in vs:
        print(tflearn.variables.get_value(v))

The output (omitted here) shows that the two layers hold different weight values.

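So we can say with confidence that in TFLearn, time_distributed() does not share parameters, whether or not a scope argument is passed. This differs from Keras, which shares parameters by default.

Note that, unlike plain TensorFlow, TFLearn's input_data does not require None as the first dimension to act as a placeholder for the number of samples, as the printed input shape above shows. Passing shape=[None, 2, 3] also works, and the input shape is still (?, 2, 3).

Understanding the Source Code

Let's go through the source code: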

def time_distributed(incoming, fn, args=None, scope=None):

    if not args: args = list()
    assert isinstance(args, list), "'args' must be a list."

    if not isinstance(incoming, tf.Tensor):
        incoming = tf.transpose(tf.stack(incoming), [1, 0, 2])

    input_shape = utils.get_incoming_shape(incoming)
    timestep = input_shape[1]
    x = tf.unstack(incoming, axis=1)
    if scope:
        x = [fn(x[i], scope=scope+'-'+str(i), *args)
             for i in range(timestep)]
    else:
        x = [fn(x[i], *args) for i in range(timestep)]

    x = list(map(lambda t: tf.reshape(t, [-1, 1]+utils.get_incoming_shape(t)[1:]), x))
    return tf.concat(x, 1)

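What interests me most is the scope parameter. From the source, if we pass scope="name_scope", time_distributed() assigns a different scope to each timestep. What effect does this have? As far as parameter sharing goes, none.

Suppose we do not pass scope to time_distributed(). fully_connected() then forces the creation of new variables because its reuse parameter defaults to False, so every call to fully_connected() produces a layer with its own parameters. (This assumes we do not pass scope to fully_connected() either; only then can fully_connected() generate a distinct name scope automatically, so that the newly created variables do not collide.) We already verified this experimentally above. The reason can be found in the comments of TFLearn's variable_scope.py: the get_variable() function defined by TFLearn has a reuse parameter whose default is None, so it is set according to the Layer's default, for instance False in fully_connected(). As a result, we can build a network with the following statements: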

input_layer = input_data(shape=[None, 4, 3])
net = fully_connected(input_layer, 8)
net = fully_connected(net, 8)
net = fully_connected(net, 8)

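We now have three fully connected layers that do not share parameters.

Note that the reuse parameter of TFLearn's lstm() also defaults to False, and the reuse parameter eventually passed to BasicLSTMCell() is still False. You may wonder: shouldn't an LSTM reuse the same network across timesteps? It should, and that reuse is implemented with scope.reuse_variables(); see TensorFlow's rnn.py for details.

By now the reuse=False case of TFLearn's get_variable() wrapper should be clear. Here are the three cases of the reuse parameter in get_variable():

reuse=False: only create new variables; if a variable with the same name already exists in the same name scope, a conflict occurs;
reuse=True: only use existing variables; if the variable does not exist in the name scope, an error is raised;
reuse=tf.AUTO_REUSE: if a variable with the same name exists in the name scope, return it directly; otherwise create a new variable.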
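
The three cases above can be mimicked with a small toy model. This is only a sketch of the semantics, not TensorFlow or TFLearn code: ToyVariableStore and the AUTO_REUSE constant below are hypothetical stand-ins invented for this illustration, and variables are represented by plain Python objects.

```python
AUTO_REUSE = "auto"  # stand-in for tf.AUTO_REUSE in this toy model


class ToyVariableStore(object):
    """Toy registry that mimics the three reuse modes described above."""

    def __init__(self):
        self.variables = {}

    def get_variable(self, scope, name, reuse=False):
        key = scope + "/" + name
        exists = key in self.variables
        if reuse is True:
            # reuse=True: only existing variables may be returned.
            if not exists:
                raise ValueError("variable %s does not exist" % key)
            return self.variables[key]
        if reuse == AUTO_REUSE:
            # AUTO_REUSE: return the variable if present, else create it.
            if not exists:
                self.variables[key] = object()
            return self.variables[key]
        # reuse=False: only creation is allowed, a duplicate name is a conflict.
        if exists:
            raise ValueError("variable %s already exists" % key)
        self.variables[key] = object()
        return self.variables[key]


store = ToyVariableStore()
w = store.get_variable("fc", "W", reuse=False)                # created
same = store.get_variable("fc", "W", reuse=True)              # reused
also_same = store.get_variable("fc", "W", reuse=AUTO_REUSE)   # reused
print(w is same and w is also_same)  # True
```

In this toy model, calling get_variable("fc", "W", reuse=False) a second time raises ValueError, and so does reuse=True on a name that was never created.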

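There are two ways to use time_distributed:

To share parameters: define a wrapper fn_udef() around fn() that passes the desired keyword arguments scope="name_scope" and reuse=tf.AUTO_REUSE (the interface of time_distributed only forwards positional arguments);
To keep parameters separate: use different name scopes, i.e. pass scope="name_scope" to time_distributed().

So, to share the parameters of fn across timesteps, the code is: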

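# fn stands for the layer being wrapped, e.g. fully_connected.
def fn_udef(incoming, n_units, scope=None):
    # Always use the fixed scope "name_scope" so that every timestep reuses the same variables.
    return fn(incoming, n_units, reuse=tf.AUTO_REUSE, scope="name_scope")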

x = time_distributed(input_tensor, fn_udef, [64])

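Now the parameters of fn under "name_scope" are reused, i.e. shared across timesteps. Note that tf.AUTO_REUSE is only available from TensorFlow 1.4.0 onward. If your version is older, there is no need to worry: we can rewrite time_distributed() ourselves: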

def time_distributed_udef(incoming, fn, args=None, scope=None):

    if not args: args = list()
    assert isinstance(args, list), "'args' must be a list."

    if not isinstance(incoming, tf.Tensor):
        incoming = tf.transpose(tf.stack(incoming), [1, 0, 2])

    input_shape = utils.get_incoming_shape(incoming)
    timestep = input_shape[1]
    x = tf.unstack(incoming, axis=1)
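    # ============== this part is rewritten ==============
    if scope:
        for i in range(timestep):
            # Create the variables at the first timestep, reuse them afterwards.
            reuse = i > 0
            x[i] = fn(x[i], reuse=reuse, scope=scope, *args)
    else:
        # Without a scope, fall back to the original behaviour.
        x = [fn(x[i], *args) for i in range(timestep)]
    # ====================================================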
    x = list(map(lambda t: tf.reshape(t, [-1, 1]+utils.get_incoming_shape(t)[1:]), x))
    return tf.concat(x, 1)

x = time_distributed_udef(input_tensor, fn, [64], scope="name_scope")

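Now the scope passed to time_distributed_udef() is forwarded to fn(), and the reuse argument of fn() is False at the first timestep, which creates new parameters, and True afterwards, which reuses the parameters already in that scope.

If instead we want a different fn, with its own parameters, at each timestep, the code is: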

x = time_distributed(input_tensor, fn, [64], scope="name_scope")

Here "name_scope" is passed to time_distributed() rather than to fn(), and time_distributed() generates one scope per timestep. With 4 timesteps, for example, the scopes are "name_scope-0", "name_scope-1", "name_scope-2" and "name_scope-3", and a separate set of fn parameters is created in each of them.
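
The difference between one shared scope and per-timestep scopes can be illustrated with a small pure-Python toy. Nothing here is TFLearn API: params, toy_layer and run are hypothetical names for this sketch, and each "layer" holds a single toy weight that is created once per scope name and reused afterwards.

```python
params = {}  # scope name -> toy parameter (a single weight)


def toy_layer(x, scope):
    # AUTO_REUSE-like behaviour: create the weight once per scope, then reuse it.
    weight = params.setdefault(scope, float(len(params) + 1))
    return [v * weight for v in x]


def run(timestep_inputs, scope, share):
    outputs = []
    for i, x in enumerate(timestep_inputs):
        # Shared: every timestep uses the same scope name.
        # Separate: each timestep gets "scope-i", as in time_distributed().
        name = scope if share else scope + "-" + str(i)
        outputs.append(toy_layer(x, name))
    return outputs


inputs = [[1.0], [1.0], [1.0], [1.0]]  # 4 timesteps
shared = run(inputs, "shared", share=True)
separate = run(inputs, "name_scope", share=False)
print(shared)          # [[1.0], [1.0], [1.0], [1.0]]
print(separate)        # [[2.0], [3.0], [4.0], [5.0]]
print(sorted(params))  # one "shared" entry plus name_scope-0 .. name_scope-3
```

With a shared scope all four timesteps go through the same weight, while the per-timestep scopes end up with four independent weights.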

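Summary

Depending on whether parameters should be shared across timesteps, there are two ways to use time_distributed:

To share parameters: define a wrapper fn_udef() around fn() that passes the keyword arguments scope="name_scope" and reuse=tf.AUTO_REUSE;
To keep parameters separate: use different name scopes, i.e. pass scope="name_scope" to time_distributed().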

That wraps up the discussion of time_distributed.