[RPAN] Code Reading

attention lstm layer

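The functions that do the actual computation are implemented in lstm_lib.py. The class is defined in layers.py and called from attention_lstm.py.
Input: conv_fea(), the convolutional feature map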

attention block

$\alpha = w\,\tanh(\mathrm{conv\_fea}\,U + h_{t-1}H + b)$
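The formula can be read shape by shape. Below is a minimal NumPy sketch for a single part, assuming conv_fea is laid out as (K locations, batch, conv_dim), as in the code comment further down (49 × n_sample × 512), and that w maps the 32-dimensional hidden layer to one score. The function name attention_scores, the hidden size dim_proj = 64 and the random weights are hypothetical, chosen only for illustration.

```python
import numpy as np

def attention_scores(conv_fea, h_prev, U, H, b, w):
    """alpha = w * tanh(conv_fea U + h_{t-1} H + b) for one part (hypothetical helper).

    conv_fea: (K, batch, conv_dim)  K = reg_x * reg_y feature-map locations
    h_prev:   (batch, dim_proj)     previous LSTM hidden state h_{t-1}
    U: (conv_dim, dim_part), H: (dim_proj, dim_part), b: (dim_part,), w: (dim_part,)
    returns:  (K, batch) unnormalized score for every location
    """
    # h_prev @ H is (batch, dim_part); [None] broadcasts it over the K locations
    hidden = np.tanh(conv_fea @ U + (h_prev @ H)[None, :, :] + b)
    # contract the last axis: dim_part -> one scalar per location
    return hidden @ w

rng = np.random.default_rng(0)
K, batch, conv_dim, dim_proj, dim_part = 49, 2, 512, 64, 32
scores = attention_scores(
    rng.standard_normal((K, batch, conv_dim)),
    rng.standard_normal((batch, dim_proj)),
    rng.standard_normal((conv_dim, dim_part)) * 0.01,
    rng.standard_normal((dim_proj, dim_part)) * 0.01,
    np.zeros(dim_part),
    rng.standard_normal(dim_part),
)
print(scores.shape)  # (49, 2)
```

The formula does not say how the scores are normalized; that step is not covered here.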
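The paper uses 5 different body parts, so there have to be five different U and H matrices. In the code each set is stacked side by side into a single matrix:
U: (conv_dim, dim_w), maps the input feature dimension to dim_w = 32*5
H: (dim_proj, dim_w), maps the hidden-state dimension to 32*5
So dim_w is 32*5. The result is then cut into five parts by _slice, each of dimension 32, and each part is multiplied by its own w.
In effect these are five perceptrons, one per part: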

        # conv_fea_att_step_tmp: 49*n_sample*512; [None, :, :] broadcasts the hidden-state term over the 49 locations
        lamda_modal = tensor.tanh(tensor.dot(conv_fea_att_step_tmp, LSTM_U_lamada_modal) +
                                  tensor.dot(h_modal_, LSTM_H_lamada_modal)[None, :, :] + LSTM_b_lamada_modal)

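The result is then multiplied by the parameter matrix W. In both matrix products only the last dimension changes; here it goes from 32 to 1 for each joint. For the up_elbow part below, W[:, :2] takes two columns, so the 32-dimensional slice becomes two scores, presumably one per joint in that part.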

    lamda_modal_up_elbow = tensor.dot(_slice(lamda_modal, 0, dim_part), LSTM_W_lamada_modal[:, :2])  # (49, n_sample, 32) -> (49, n_sample, 2)
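A NumPy sketch of this one line, to make the "only the last dimension changes" point concrete. It assumes that LSTM_W_lamada_modal has shape (32, num_joints) with num_joints = 13, and that columns 0:2 belong to part 0. Both are inferences from the slicing shown, not confirmed by the post. The input is a random stand-in for the tanh output.

```python
import numpy as np

rng = np.random.default_rng(2)
K, batch, dim_part, n_parts, num_joints = 49, 2, 32, 5, 13
# random stand-in for lamda_modal, the (K, batch, 32*5) tanh output
lamda_modal = np.tanh(rng.standard_normal((K, batch, dim_part * n_parts)))
# ASSUMED shape of LSTM_W_lamada_modal
W = rng.standard_normal((dim_part, num_joints))

part0 = lamda_modal[..., 0:dim_part]   # equivalent of _slice(lamda_modal, 0, dim_part)
up_elbow = part0 @ W[:, :2]            # (49, batch, 32) -> (49, batch, 2)
print(up_elbow.shape)  # (49, 2, 2)
```

Each output column is a separate 32-to-1 dot product, so the two joints of the part share the same 32-dimensional hidden layer but have their own scoring vector.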

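Finally, the per-part results are concatenated.
So the attention output has shape [timestep, reg_x*reg_y+1, num_seq, num_joints]. The attention assigns a weight to each of the K pixels of the feature map, but none along fea_dim. This differs from earlier attention mechanisms, which concentrated attention on the 1024 feature dimensions and collapsed K into one.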
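The concatenation can be sketched in NumPy as below, for a single time step. Only "part 0 uses columns 0:2" comes from the code; the split of the remaining joints (JOINTS_PER_PART) is HYPOTHETICAL, and so is the softmax over the K locations, since the normalization step is not shown in this post. The function name joint_attention is also made up for illustration.

```python
import numpy as np

# HYPOTHETICAL split of the 13 joints over the 5 parts
JOINTS_PER_PART = [2, 2, 3, 3, 3]

def joint_attention(lamda_modal, W, dim_part=32):
    """Score every location for every joint, part by part, then concatenate.

    lamda_modal: (K, batch, dim_part * n_parts) tanh output
    W:           (dim_part, num_joints), each part uses its own column range
    returns:     (K, batch, num_joints) weights, normalized over the K locations
    """
    scores, start = [], 0
    for p, n_j in enumerate(JOINTS_PER_PART):
        part = lamda_modal[..., p * dim_part:(p + 1) * dim_part]
        scores.append(part @ W[:, start:start + n_j])  # (K, batch, n_j)
        start += n_j
    s = np.concatenate(scores, axis=-1)                 # (K, batch, num_joints)
    # ASSUMPTION: softmax over the spatial axis K (axis 0)
    e = np.exp(s - s.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

rng = np.random.default_rng(3)
alpha = joint_attention(np.tanh(rng.standard_normal((49, 2, 160))), rng.standard_normal((32, 13)))
print(alpha.shape)  # (49, 2, 13)
```

alpha has no channel axis: each location gets one weight per joint, and that weight is shared by all channels of the feature at that location.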

lstm

multiplying conv_fea by the attention

    h_tmp_pool4 = conv_fea_att_step_tmp[:, :, None, :] * lamada_com[:, :, :, None]  # 49*batchsize*13*512
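Next, the joints belonging to each part are summed (that code is not reproduced here), and the result serves as the input to each LSTM cell.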
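The broadcast in the line above, followed by the per-part sum, can be sketched in NumPy. The joint-to-part split (JOINTS_PER_PART) is HYPOTHETICAL because the summing code is not shown, and the final sum over the K locations is an ASSUMPTION about how one input vector per part is obtained. The attention weights here are random stand-ins.

```python
import numpy as np

JOINTS_PER_PART = [2, 2, 3, 3, 3]  # HYPOTHETICAL joint-to-part split (13 joints)

rng = np.random.default_rng(4)
K, batch, conv_dim, num_joints = 49, 2, 512, 13
conv_fea = rng.standard_normal((K, batch, conv_dim))  # conv_fea_att_step_tmp
lamada_com = rng.random((K, batch, num_joints))        # stand-in attention weights

# same broadcast as the Theano line: (K, batch, 1, C) * (K, batch, J, 1) -> (K, batch, J, C)
h_tmp_pool4 = conv_fea[:, :, None, :] * lamada_com[:, :, :, None]

# add up the joints of each part -> (K, batch, n_parts, C)
bounds = np.cumsum([0] + JOINTS_PER_PART)
per_part = np.stack([h_tmp_pool4[:, :, s:e, :].sum(axis=2)
                     for s, e in zip(bounds[:-1], bounds[1:])], axis=2)

# ASSUMPTION: pool over the K locations to get one feature vector per part
part_input = per_part.sum(axis=0)  # (batch, n_parts, C)
print(h_tmp_pool4.shape, part_input.shape)  # (49, 2, 13, 512) (2, 5, 512)
```

Because the product is linear, summing the joints of a part is the same as weighting the feature map with the summed attention maps of those joints.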

gates

The gates i, c, f and o are all produced by first carrying out the basic operation below and then splitting the result:

    preact_modal = tensor.dot(h_modal_, LSTM_U_modal).astype(config.floatX)  # recurrent term from h_{t-1}
    preact_modal += tensor.dot(h_joint, LSTM_W_modal)  # input term
    preact_modal += LSTM_b_modal
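One LSTM step in NumPy shows what "compute once, then split" means. It assumes the layout of the classic Theano LSTM tutorial, where the preactivation holds the four gate blocks side by side in the order i, f, o, c; the actual order in RPAN's code is not shown in this post. lstm_step and its argument names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def _slice(x, n, dim):
    # n-th block of width dim on the last axis
    return x[..., n * dim:(n + 1) * dim]

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """x_t: (batch, n_in); h_prev, c_prev: (batch, dim); W: (n_in, 4*dim); U: (dim, 4*dim); b: (4*dim,)"""
    dim = h_prev.shape[-1]
    # one product for all four gates, like preact_modal
    preact = h_prev @ U + x_t @ W + b
    i = sigmoid(_slice(preact, 0, dim))        # input gate
    f = sigmoid(_slice(preact, 1, dim))        # forget gate
    o = sigmoid(_slice(preact, 2, dim))        # output gate
    c_tilde = np.tanh(_slice(preact, 3, dim))  # candidate cell state
    c = f * c_prev + i * c_tilde
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(5)
batch, n_in, dim = 2, 512, 64
h, c = lstm_step(rng.standard_normal((batch, n_in)), np.zeros((batch, dim)), np.zeros((batch, dim)),
                 rng.standard_normal((n_in, 4 * dim)) * 0.01, rng.standard_normal((dim, 4 * dim)) * 0.01,
                 np.zeros(4 * dim))
print(h.shape, c.shape)  # (2, 64) (2, 64)
```

A single (dim, 4*dim) product is cheaper than four (dim, dim) products, which is the usual reason for this layout.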

The loop over time steps is implemented with Theano's scan, theano.scan(_step,...), where _step carries out the computation of a single time step.
LSTM output:
    output = proj = proj.reshape([n_timesteps * num_seq, int(lstm_options['dim_proj'])])
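theano.scan applies the step function once per element of the leading (time) axis and stacks the per-step results. A plain Python loop reproduces that behaviour, and the reshape then flattens time and sequence into one axis, so row t*num_seq + s belongs to time step t of sequence s. scan_like and toy_step are HYPOTHETICAL stand-ins; toy_step is a simple tanh RNN update, not RPAN's _step.

```python
import numpy as np

def scan_like(step, xs, h0):
    """Loop over the leading (time) axis like theano.scan and stack the outputs."""
    h, outputs = h0, []
    for x_t in xs:
        h = step(x_t, h)
        outputs.append(h)
    return np.stack(outputs)  # (n_timesteps, num_seq, dim_proj)

rng = np.random.default_rng(6)
n_timesteps, num_seq, dim_in, dim_proj = 4, 3, 8, 5
Wx = rng.standard_normal((dim_in, dim_proj)) * 0.1
Wh = rng.standard_normal((dim_proj, dim_proj)) * 0.1

def toy_step(x_t, h_prev):
    # HYPOTHETICAL stand-in for _step
    return np.tanh(x_t @ Wx + h_prev @ Wh)

stacked = scan_like(toy_step, rng.standard_normal((n_timesteps, num_seq, dim_in)),
                    np.zeros((num_seq, dim_proj)))
output = proj = stacked.reshape([n_timesteps * num_seq, dim_proj])  # one row per (time step, sequence)
print(output.shape)  # (12, 5)
```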

softmax layer

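    # __init__ of the softmax layer class; T is theano.tensor, Weight is the project's parameter helper (not shown)
    def __init__(self, input, n_in, n_out):

        self.W = Weight((n_in, n_out))
        self.b = Weight((n_out,), std=0)

        # class probabilities, one row per input sample
        self.p_y_given_x = T.nnet.softmax(
            T.dot(input, self.W.val) + self.b.val)

        # predicted label = index of the largest probability in each row
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)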
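The same computation in NumPy, with small hand-picked numbers so the prediction can be checked by eye. softmax_layer is a hypothetical name; it mirrors p_y_given_x and y_pred but uses plain arrays instead of Weight objects.

```python
import numpy as np

def softmax_layer(x, W, b):
    """NumPy version of p_y_given_x / y_pred: x (n, n_in), W (n_in, n_out), b (n_out,)."""
    z = x @ W + b
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return p, p.argmax(axis=1)

x = np.array([[1.0, 0.0], [0.0, 1.0]])
W = np.array([[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]])
b = np.zeros(3)
p, y_pred = softmax_layer(x, W, b)
print(y_pred)  # [0 2]
```

Fed with the reshaped LSTM output, each of the n_timesteps*num_seq rows would presumably get its own class distribution.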
