Source: huay's blog / Number of Parameters in a Model (Memo)
Notes on how to compute the number of parameters in a model.
When I first used TensorFlow I never paid much attention to this; after relying mostly on high-level APIs for a while, I had half forgotten how to compute a model's parameter count, so I am writing it down here as a memo.
Where the parameters come from
Number of parameters in a model = sum of the parameter counts of all layers
The parameter count of a layer is determined jointly by the layer's size (n_units) and the size of the previous layer's output (n_features).
Fully connected layer
input_shape: [batch_size, n_features]
output_shape: [batch_size, n_units]
Number of parameters = n_features * n_units
Number of parameters = n_features * n_units + n_units (with bias)
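As a quick sanity check, here is a minimal TF 1.x sketch (the values 32 and 64 are arbitrary examples) that builds a dense layer and sums the sizes of the trainable variables:

```python
import numpy as np
import tensorflow as tf

n_features, n_units = 32, 64

inputs = tf.placeholder(tf.float32, [None, n_features])
outputs = tf.layers.dense(inputs, n_units)  # use_bias=True by default

# total parameter count = kernel + bias
n_params = sum(int(np.prod(v.get_shape().as_list()))
               for v in tf.trainable_variables())
print(n_params)  # 32 * 64 + 64 = 2112
```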
Convolutional layers
Conv1D
input_shape: [batch_size, max_steps, n_features]
output_shape: [batch_size, new_steps, n_filters]
kernel_size = [kernel_w]
kernel_shape = [kernel_w, n_features, n_filters]
Number of parameters = kernel_w * n_features * n_filters
Number of parameters = kernel_w * n_features * n_filters + n_filters (with bias)
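The same check for Conv1D, again a minimal TF 1.x sketch (kernel_w = 3 is an arbitrary example):

```python
import numpy as np
import tensorflow as tf

batch_size, max_steps, n_features = 16, 5, 32
n_filters, kernel_w = 64, 3

inputs = tf.placeholder(tf.float32, [batch_size, max_steps, n_features])
outputs = tf.layers.conv1d(inputs, filters=n_filters, kernel_size=kernel_w)

n_params = sum(int(np.prod(v.get_shape().as_list()))
               for v in tf.trainable_variables())
print(n_params)  # 3 * 32 * 64 + 64 = 6208
```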
Conv2D
input_shape: [batch_size, in_height, in_width, in_channels]
output_shape: [batch_size, new_height, new_width, out_channels]
kernel_size = [kernel_h, kernel_w]
kernel_shape = [kernel_h, kernel_w, in_channels, out_channels]
Number of parameters = kernel_h * kernel_w * in_channels * out_channels
Number of parameters = kernel_h * kernel_w * in_channels * out_channels + out_channels (with bias)
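And the analogous check for Conv2D (the 28x28 input and in_channels = 3 are arbitrary examples):

```python
import numpy as np
import tensorflow as tf

in_channels, out_channels = 3, 64
kernel_h, kernel_w = 3, 3

inputs = tf.placeholder(tf.float32, [16, 28, 28, in_channels])
outputs = tf.layers.conv2d(inputs, filters=out_channels,
                           kernel_size=[kernel_h, kernel_w])

n_params = sum(int(np.prod(v.get_shape().as_list()))
               for v in tf.trainable_variables())
print(n_params)  # 3 * 3 * 3 * 64 + 64 = 1792
```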
Conv3D
Analogous to Conv1D/Conv2D: the kernel simply gains a depth dimension, giving kernel_d * kernel_h * kernel_w * in_channels * out_channels parameters (+ out_channels with bias).
RNN layers
At first I did not fully understand which pieces make up an RNN's parameters; working backwards from the parameter counts gives a rough picture.
Start with the update formula of a basic RNN:
h_t = tanh(U * x_t + W * h_{t-1} + b)
As you can see, there are 3 parameters: U, W, b.
The shape of each symbol in the formula:
U: `[n_units, n_features]`
x: `[n_features, 1]`
W: `[n_units, n_units]`
h: `[n_units, 1]`
b: `[n_units, 1]`
Note that h here does not account for batch_size; in practice its shape is [batch_size, n_units]. If you like writing the RNN by hand rather than calling dynamic_rnn directly, you have surely written `initial_state = cell.zero_state(batch_size, dtype=tf.float32)`; that is the initial h (h_0) in the formula. It does not count as a parameter, though: it belongs to the output (you could also view it as a parameter, just one that is not updated by backpropagation).
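For concreteness, here is a minimal sketch of a single hand-written RNN step. Batched code stores the weights transposed relative to the per-sample shapes listed above; the variable names U, W, b just mirror the formula:

```python
import tensorflow as tf

batch_size, n_features, n_units = 16, 32, 64

x_t = tf.placeholder(tf.float32, [batch_size, n_features])

# Transposed layout so the step reads x_t @ U + h_prev @ W + b.
U = tf.get_variable('U', [n_features, n_units])
W = tf.get_variable('W', [n_units, n_units])
b = tf.get_variable('b', [n_units])

h_0 = tf.zeros([batch_size, n_units])  # plays the role of cell.zero_state(...)
h_1 = tf.tanh(tf.matmul(x_t, U) + tf.matmul(h_0, W) + b)  # [batch_size, n_units]
```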
With the above in place, you can work out which parameters LSTM and GRU have from their model diagrams.
Below is test code using TF:
```python
import tensorflow as tf

# hyperparameters
batch_size = 16
max_steps = 5
n_features = 32
n_units = 64

# test (cell_fn is set per cell type below)
inputs = tf.placeholder(tf.float32, [batch_size, max_steps, n_features])
cell = cell_fn(n_units)
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
print(tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES))
```
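If you want a single total rather than the raw variable list, a small helper can sum the variable sizes (the name count_params is mine, not part of TF):

```python
import numpy as np
import tensorflow as tf

def count_params():
    """Sum the element counts of all trainable variables in the default graph."""
    return sum(int(np.prod(v.get_shape().as_list()))
               for v in tf.trainable_variables())
```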
Basic RNN
cell_fn = tf.nn.rnn_cell.BasicRNNCell
Test result:
[<tf.Variable 'rnn/basic_rnn_cell/kernel:0' shape=(96, 64) dtype=float32_ref>,
<tf.Variable 'rnn/basic_rnn_cell/bias:0' shape=(64,) dtype=float32_ref>]
Number of parameters = (n_features + n_units) * n_units + n_units
This matches the formula h_t = tanh(U * x_t + W * h_{t-1} + b) above: TensorFlow's BasicRNNCell concatenates x_t and h_{t-1} into a single [batch_size, n_features + n_units] tensor, so the (96, 64) kernel stacks U and W together, and the (64,) bias is b.
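Plugging in the test values: (32 + 64) * 64 + 64 = 6208, i.e. the (96, 64) kernel contributes 6144 and the (64,) bias contributes 64.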
LSTM
cell_fn = tf.nn.rnn_cell.BasicLSTMCell
cell_fn = tf.nn.rnn_cell.LSTMCell
Test result:
[<tf.Variable 'rnn/basic_lstm_cell/kernel:0' shape=(96, 256) dtype=float32_ref>,
<tf.Variable 'rnn/basic_lstm_cell/bias:0' shape=(256,) dtype=float32_ref>]
Number of parameters = (n_features + n_units) * (n_units * 4) + (n_units * 4)
(the factor of 4 covers the input, forget, and output gates plus the candidate cell state)
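Plugging in the test values: (32 + 64) * (64 * 4) + (64 * 4) = 96 * 256 + 256 = 24832, matching the (96, 256) kernel and (256,) bias.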
GRU
cell_fn = tf.nn.rnn_cell.GRUCell
Test result:
[<tf.Variable 'rnn/gru_cell/gates/kernel:0' shape=(96, 128) dtype=float32_ref>,
<tf.Variable 'rnn/gru_cell/gates/bias:0' shape=(128,) dtype=float32_ref>,
<tf.Variable 'rnn/gru_cell/candidate/kernel:0' shape=(96, 64) dtype=float32_ref>,
<tf.Variable 'rnn/gru_cell/candidate/bias:0' shape=(64,) dtype=float32_ref>]
Number of parameters = (n_features + n_units) * (n_units * 3) + (n_units * 3)
GRU has one gate fewer than LSTM (the forget and input gates are merged into a single update gate), which is consistent with the result.
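Plugging in the test values: the gates contribute (32 + 64) * 128 + 128 = 12416 and the candidate contributes (32 + 64) * 64 + 64 = 6208, for a total of 18624 = 96 * 192 + 192.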
Bidirectional RNN/LSTM/GRU
The parameter count doubles (when cell_fw and cell_bw are separate cells of the same type and size, as below).
Test code:
```python
import tensorflow as tf

# hyperparameters
batch_size = 16
max_steps = 5
n_features = 32
n_units = 64

# test (cell_fn as above)
inputs = tf.placeholder(tf.float32, [batch_size, max_steps, n_features])
cell_fw = cell_fn(n_units)
cell_bw = cell_fn(n_units)
outputs, state = tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs,
                                                 dtype=tf.float32)
print(tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES))
```
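For BasicRNNCell with the settings above, this should list one kernel/bias pair for the forward cell and another for the backward cell, giving 2 * ((32 + 64) * 64 + 64) = 12416 parameters in total.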