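A short paper a day ---- arXiv:1803.10963v2
Attentive statistics pooling
This paper proposes attentive statistics pooling for deep speaker embedding in text-independent speaker verification. In conventional speaker embedding, frame-level features are averaged over all the frames of a single utterance to form an utterance-level feature. The proposed method uses an attention mechanism to give different weights to different frames, and it generates not only weighted means but also weighted standard deviations. In this way, it can capture long-term variations in speaker characteristics more effectively.
Core idea
Add an attention mechanism to statistics pooling.
Statistics pooling
The statistics pooling layer computes the mean vector μ and, as a second-order statistic, the standard deviation vector σ over the frame-level features h_t (t = 1, …, T). Their concatenation is the output of the pooling layer and is fed to the fully connected layers that follow.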
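Mean:
$$\mu = \frac{1}{T}\sum_{t=1}^{T} h_t$$
Standard deviation:
$$\sigma = \sqrt{\frac{1}{T}\sum_{t=1}^{T} h_t \odot h_t - \mu \odot \mu}$$
Here h_t is the frame-level feature at frame t, T is the number of frames, and ⊙ is the element-wise product.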
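import tensorflow as tf  # TensorFlow 1.x API


def statistic_pooling(inputs, scope=None):
    """
    Statistics pooling.
    reference: x-vector
    :param inputs: frame-level features, shape [batch_size, time_steps, features]
    :param scope: optional name scope
    :return: concatenated mean and standard deviation, shape [batch_size, 2 * features]
    """
    with tf.name_scope(scope, default_name='statistic_pooling'):
        # mean and population variance over the time axis
        mean, variance = tf.nn.moments(inputs, axes=[1])
        std = tf.sqrt(variance)
        concat = tf.concat([mean, std], axis=1)
    return concat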
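The pooling step can also be checked by hand without TensorFlow. The following is a minimal plain-Python sketch, not the author's implementation: the helper name `statistics_pooling` and the toy utterance (T = 3 frames, 2 features each) are made up for illustration. It divides by T, matching the population variance that `tf.nn.moments` returns.

```python
import math


def statistics_pooling(frames):
    """Pool T frame vectors of length D into one vector [mean, std] of length 2 * D."""
    num_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]
    std = []
    for d in range(dim):
        second_moment = sum(frame[d] ** 2 for frame in frames) / num_frames
        # E[h^2] - (E[h])^2, clamped at zero to absorb rounding error
        std.append(math.sqrt(max(second_moment - mean[d] ** 2, 0.0)))
    return mean + std


# Hypothetical utterance: 3 frames, 2 features per frame
frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
pooled = statistics_pooling(frames)
print(pooled)
```

The output has twice the feature dimension of a single frame, and a feature that is constant over time (the second one here) contributes a standard deviation of zero.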
Attention mechanism
A trainable attention mechanism whose weights are learned with the rest of the network; see arXiv:1409.0743
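Scalar score:
$$e_t = v^{\top} f(W h_t + b) + k$$
Normalized score:
$$\alpha_t = \frac{\exp(e_t)}{\sum_{\tau=1}^{T} \exp(e_\tau)}$$
Weighted mean vector:
$$\tilde{\mu} = \sum_{t=1}^{T} \alpha_t h_t$$
Here h_t is the frame-level feature at frame t and f is a non-linear activation such as tanh. In the code below, `w_2`, `b_2` and `u` play the roles of W, b and v. The constant k is omitted there because it cancels in the softmax.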
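def attention(inputs, attention_size, return_alphas=False):
    """
    reference to paper: "Hierarchical Attention Networks for Document Classification"
    :param inputs: tensor of shape [batch_size, time_steps, features],
                   or a (forward, backward) tuple of Bi-RNN outputs
    :param attention_size: size of the attention hidden layer
    :param return_alphas: if True, also return the attention weights
    :return: weighted mean [batch_size, features], optionally with alphas [batch_size, time_steps]
    """
    # if bi-rnn: concatenate forward and backward outputs along the feature axis
    if isinstance(inputs, tuple):
        inputs = tf.concat(inputs, 2)
    # inputs shape [batch_size, time_steps, features]
    hidden_size = inputs.shape[2].value
    # trainable parameters
    w_2 = tf.Variable(tf.random_normal([hidden_size, attention_size], stddev=0.1))
    b_2 = tf.Variable(tf.random_normal([attention_size], stddev=0.1))
    u = tf.Variable(tf.random_normal([attention_size], stddev=0.1))
    # hidden representation, shape [batch_size, time_steps, attention_size]
    with tf.name_scope('v'):
        v = tf.tanh(tf.tensordot(inputs, w_2, axes=1) + b_2)
    # scalar score per frame, shape [batch_size, time_steps]
    uv = tf.tensordot(v, u, axes=1, name='uv')
    # normalize the scores over the time axis
    alphas = tf.nn.softmax(uv, name='alphas')
    # weighted sum of the frames, shape [batch_size, features]
    output = tf.reduce_sum(inputs * tf.expand_dims(alphas, -1), 1)
    if not return_alphas:
        return output
    else:
        return output, alphas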
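To make the scoring and normalization steps concrete, here is a minimal plain-Python sketch for a single utterance. It is illustrative only: the function names, the toy frames and the parameter values `W`, `b`, `v` are all made up, and in a real model the parameters are learned. `W` is stored as [attention_size][features], which is the transpose of `w_2` in the TensorFlow code above.

```python
import math


def attention_weights(frames, W, b, v):
    """Return one softmax-normalized weight per frame."""
    scores = []
    for h in frames:
        # hidden = tanh(W h + b), one entry per attention unit
        hidden = [
            math.tanh(sum(w_ad * h_d for w_ad, h_d in zip(row, h)) + b_a)
            for row, b_a in zip(W, b)
        ]
        # scalar score e_t = v . hidden
        scores.append(sum(v_a * x for v_a, x in zip(v, hidden)))
    # softmax over frames; subtracting the max keeps exp() numerically stable
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_mean(frames, alphas):
    """Sum the frames weighted by alphas, giving a vector of length D."""
    dim = len(frames[0])
    return [sum(a * h[d] for a, h in zip(alphas, frames)) for d in range(dim)]


# Hypothetical inputs: 3 frames with 2 features, attention size 2
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[0.5, -0.5], [0.3, 0.8]]
b = [0.0, 0.1]
v = [1.0, -1.0]
alphas = attention_weights(frames, W, b, v)
pooled = weighted_mean(frames, alphas)
print(alphas, pooled)
```

The weights are positive and sum to one, so the weighted mean is a convex combination of the frames. If every frame is identical, every score is identical and the weights become uniform.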
Attentive statistics pooling
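The attention weights are used to recompute the mean and the standard deviation as weighted statistics. With normalized weights α_t and weighted mean μ̃ = Σ_t α_t h_t over the frame-level features h_t, the weighted standard deviation is:
$$\tilde{\sigma} = \sqrt{\sum_{t=1}^{T} \alpha_t\, h_t \odot h_t - \tilde{\mu} \odot \tilde{\mu}}$$
Architecture diagram: (figure not included)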
def attentive_statistic_pooling(inputs, attention_size, scope=None):
    """
    Statistics pooling with an attention mechanism.
    reference: arXiv:1803.10963v2
    :param inputs: frame-level features, shape [batch_size, time_steps, features]
    :param attention_size: size of the attention hidden layer
    :param scope: optional name scope
    :return: concatenated weighted mean and weighted std, shape [batch_size, 2 * features]
    """
    def get_attention_std(inputs, anchor, alphas):
        # weighted second moment sum_t alpha_t * h_t^2, shape [batch_size, features, 1]
        h_square_with_attention_ = tf.matmul(tf.transpose(tf.square(inputs), [0, 2, 1]),
                                             tf.expand_dims(alphas, -1))
        h_square_with_attention = tf.squeeze(h_square_with_attention_, axis=-1)
        # square of the weighted mean
        mean_square = tf.square(anchor)
        # clamp at zero so rounding error never yields a negative variance
        difference = tf.maximum(tf.subtract(h_square_with_attention, mean_square), 0.0)
        attention_std = tf.sqrt(difference)
        return attention_std

    with tf.name_scope(scope, default_name='attentive_statistic_pooling'):
        # attention layer output: weighted mean and attention weights
        attention_anchor, attention_alphas = attention(inputs,
                                                       attention_size,
                                                       return_alphas=True)
        attention_std = get_attention_std(inputs,
                                          attention_anchor,
                                          attention_alphas)
        concat = tf.concat([attention_anchor, attention_std], axis=1)
    return concat
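One way to sanity-check the weighted statistics is to confirm that uniform weights give the same result as plain statistics pooling. The plain-Python sketch below does this under stated assumptions: the weights are supplied by hand rather than produced by a learned attention layer, and the function name and toy data are invented for illustration.

```python
import math


def attentive_statistics_pooling(frames, alphas):
    """Return [weighted mean, weighted std] for one utterance, given per-frame weights."""
    dim = len(frames[0])
    w_mean = [sum(a * h[d] for a, h in zip(alphas, frames)) for d in range(dim)]
    w_std = []
    for d in range(dim):
        second_moment = sum(a * h[d] ** 2 for a, h in zip(alphas, frames))
        # weighted E[h^2] minus squared weighted mean, clamped at zero
        w_std.append(math.sqrt(max(second_moment - w_mean[d] ** 2, 0.0)))
    return w_mean + w_std


# Hypothetical utterance: 3 frames, 2 features per frame
frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]

# Uniform weights 1/T: equivalent to plain statistics pooling
uniform = attentive_statistics_pooling(frames, [1.0 / 3.0] * 3)

# Hand-picked weights that emphasize the first frame
peaked = attentive_statistics_pooling(frames, [0.8, 0.1, 0.1])
print(uniform, peaked)
```

With the peaked weights, the mean of the first feature moves toward the first frame's value and the weighted standard deviation shrinks, which shows how the attention weights change what the pooled vector emphasizes.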