Look at the SENet block code below to understand how it is implemented:
def Squeeze_excitation_layer(self, input_x, out_dim, ratio, layer_name):
    with tf.name_scope(layer_name):
        # Squeeze: global average pooling over each channel's spatial dimensions
        squeeze = Global_Average_Pooling(input_x)
        # Excitation: a bottleneck of two fully connected layers
        # (why two layers, and why different unit counts? see the discussion below)
        excitation = Fully_connected(squeeze, units=out_dim // ratio, layer_name=layer_name + '_fully_connected1')
        excitation = Relu(excitation)
        excitation = Fully_connected(excitation, units=out_dim, layer_name=layer_name + '_fully_connected2')
        excitation = Sigmoid(excitation)
        excitation = tf.reshape(excitation, [-1, 1, 1, out_dim])
        # Scale: reweight the input feature map channel by channel
        scale = input_x * excitation
        return scale
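The helper functions used above are not defined in this snippet. A minimal sketch of how they might look in TF 1.x style follows; these definitions are my assumption, not necessarily the original repo's code:

import tensorflow as tf

def Global_Average_Pooling(x):
    # Average over the spatial dimensions (H, W) of an NHWC tensor,
    # yielding one value per channel: shape [N, C].
    return tf.reduce_mean(x, axis=[1, 2])

def Fully_connected(x, units, layer_name='fully_connected'):
    with tf.name_scope(layer_name):
        return tf.layers.dense(inputs=x, units=units, use_bias=True)

def Relu(x):
    return tf.nn.relu(x)

def Sigmoid(x):
    return tf.nn.sigmoid(x)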
Why use two fully connected layers, and why do they have different numbers of units?
(1) With a single fully connected layer, the ReLU and Sigmoid nonlinearities could not both be applied, and both are indispensable; using two layers makes this possible.
(2) The reduction ratio r is introduced to shrink the number of parameters. The paper sets r = 16 as a compromise between accuracy and parameter count, as the sketch below illustrates.
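A quick parameter count makes the effect of r concrete. A minimal sketch, where the channel count C = 256 is just an illustrative choice:

# Weight count of the excitation bottleneck (ignoring biases).
C, r = 256, 16                             # illustrative channel count; r = 16 as in the paper

bottleneck = C * (C // r) + (C // r) * C   # FC1 (C -> C/r) plus FC2 (C/r -> C)
single_fc  = C * C                         # a hypothetical single C -> C layer

print(bottleneck)  # 8192
print(single_fc)   # 65536 -- the bottleneck uses r/2 = 8x fewer weights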
References
(1) [DL Architecture: ResNet Series] 007 SENet, https://zhuanlan.zhihu.com/p/29708106
(2) [Deep Learning: From Getting Started to Giving Up] Squeeze-and-Excitation Networks, https://zhuanlan.zhihu.com/p/29812913