Normalize Layer in Caffe

message NormalizeParameter {
  optional bool across_spatial = 1 [default = true];
  // Initial value of scale. Default is 1.0 for all
  optional FillerParameter scale_filler = 2;
  // Whether or not scale parameters are shared across channels.
  optional bool channel_shared = 3 [default = true];
  // Epsilon for not dividing by zero while normalizing variance
  optional float eps = 4 [default = 1e-10];
}

The experiments below pin down what the three parameters actually mean: across_spatial, scale_filler, and channel_shared.

  1. across_spatial: For each sample, the tensor reaching the norm layer has shape (1, c, h, w); across_spatial indicates whether normalization runs across spatial positions, i.e., over (h, w). With across_spatial=False, the c channel elements at each position in (h, w) (that position's feature vector) are normalized independently. With across_spatial=True, normalization is computed over all c*h*w elements at once (see the sketch after this list).
  2. scale_filler: Initializes the layer's learnable scale parameter (see the prototxt below). Exactly as with convolution parameters, the learning rate decides whether it is updated; setting the learning rate to 0 yields a constant scaling.
  3. channel_shared: Controls whether the scale parameter is shared across channels. With channel_shared=True the parameter has shape (1,), a single learnable scalar; with channel_shared=False it has shape (c,), one scale per channel.
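
To make these semantics concrete, here is a minimal NumPy sketch of the forward pass (the function name, eps placement, and broadcasting are my assumptions; test_normalize_layer.cpp in SSD [4] is the authoritative reference):

import numpy as np

def normalize_forward(x, scale, across_spatial=True, eps=1e-10):
    # x: blob of shape (n, c, h, w); scale: a scalar if channel_shared, else shape (c,)
    if across_spatial:
        # one L2 norm per sample, over all c*h*w elements
        norm = np.sqrt(np.sum(x**2, axis=(1, 2, 3), keepdims=True) + eps)
    else:
        # one L2 norm per spatial position, over its c channels
        norm = np.sqrt(np.sum(x**2, axis=1, keepdims=True) + eps)
    # reshape turns a scalar into (1,1,1,1) and a (c,) vector into (1,c,1,1)
    return (x / norm) * np.reshape(scale, (1, -1, 1, 1))

# example: a blob shaped like the deploy below, with a shared scale of 6
y = normalize_forward(np.random.rand(2, 3, 2, 3).astype(np.float32), 6.0)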

Check 1: the forward pass

deploy.prototxt:

name: "demo"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 2 dim: 3 dim: 2 dim: 3 } }
}
layer {
  name: "norm"
  type: "Normalize"
  bottom: "data"
  top: "norm"
  norm_param {
    across_spatial: true
    scale_filler {
      type: "constant"
      value: 6
    }
    channel_shared: true
  }
}

main.py

#coding=UTF-8
import caffe
import numpy as np
caffe.set_mode_cpu()

input_data = np.zeros(shape=(2,3,2,3),dtype=np.float32)

input_data[0,0,:,:] = np.array([[1,2,3],[4,5,6]])
input_data[0,1,:,:] = np.array([[1,1,2],[4,5,6]])
input_data[0,2,:,:] = np.array([[1,2,2],[4,5,6]])


input_data[1,0,:,:] = np.array([[1,2,3],[4,5,6]])
input_data[1,1,:,:] = np.array([[1,2,3],[4,4,6]])
input_data[1,2,:,:] = np.array([[1,2,3],[4,5,5]])


deploy_pro = 'deploy.prototxt'
# any pretrained caffemodel will do: none of its layer names match this
# net, so no weights are actually copied in
weight_file = '../pytorch-caffe-master/ZOO_AlexNet/bvlc_alexnet.caffemodel' # not used
net = caffe.Net(deploy_pro,weight_file,caffe.TEST)

shape = input_data.shape
net.blobs['data'].reshape(shape[0],shape[1],shape[2],shape[3])
net.blobs['data'].data[...] = input_data

net.forward()

result = net.blobs['norm'].data

print(result)

from caffe.proto import caffe_pb2
import google.protobuf.text_format
net = caffe_pb2.NetParameter()
f = open(deploy_pro, 'r')
net = google.protobuf.text_format.Merge(str(f.read()), net)
f.close()

across_spatial = True
channel_shared = True
scale_type     = ''
scale_value    = 0

for layer in net.layer:
    if layer.type == 'Normalize':
        across_spatial = layer.norm_param.across_spatial
        channel_shared = layer.norm_param.channel_shared
        scale_type     = layer.norm_param.scale_filler.type
        scale_value    = layer.norm_param.scale_filler.value
        break
print('The parameters in Normalize layer:')
print(across_spatial)
print(channel_shared)
print(scale_type)
print(scale_value)
if across_spatial == False and channel_shared == True and abs(scale_value-1.0) < 1e-10:
    print('when: across_spatial == False, channel_shared == True, scale_value = 1')
    # across_spatial = False: normalize each position (x, y) across its c channels
    # check the i-th sample
    i = 0 # i = 0 or i = 1 in our case
    position = [0,2]  # a position in the h*w plane of the n*c*h*w blob
    # the result computed by hand 
    temp_result = input_data[i,:,position[0],position[1]] 
    result_byhand = temp_result / np.sqrt(np.sum(temp_result**2))
    # the result computed by normalized layer
    result_bylayer = result[i,:,position[0],position[1]]

    print(result_byhand)
    print(result_bylayer)

if across_spatial == True and channel_shared == True and abs(scale_value-1.0)<1e-10:
    print('when: across_spatial == True, channel_shared == True, scale_value = 1, check for across_spatial')
    # across_spatial = True: normalize over all c*h*w elements of the sample
    # check the i-th sample
    i = 0 # i = 0 or i = 1 in our case
    position = [0,2]  # a position in the h*w plane of the n*c*h*w blob
    # the result computed by hand
    temp_result = input_data[i,:,position[0],position[1]]
    result_byhand = temp_result / np.sqrt(np.sum(input_data[i,:,:,:]**2))  # norm over the whole sample, i.e. across spatial positions
    # the result computed by normalized layer
    result_bylayer = result[i,:,position[0],position[1]]

    print(result_byhand)
    print(result_bylayer)

if across_spatial == True and channel_shared == True and abs(scale_value-1.0) >= 0.5: # any scale_value != 1, e.g. the 6 in our deploy
    print('when: across_spatial == True, channel_shared == True, scale_value != 1 , check for scale_value')
    # across_spatial = True: normalize over all c*h*w elements, then multiply by the shared scale
    # check the i-th sample
    i = 0 # i = 0 or i = 1 in our case
    position = [0,2]  # a position in the h*w plane of the n*c*h*w blob
    # the result computed by hand
    temp_result = input_data[i,:,position[0],position[1]]
    result_byhand = temp_result / np.sqrt(np.sum(input_data[i,:,:,:]**2))  # norm over the whole sample, i.e. across spatial positions
    result_byhand = result_byhand * scale_value
    # the result computed by normalized layer
    result_bylayer = result[i,:,position[0],position[1]]

    print(result_byhand)
    print(result_bylayer)

if across_spatial == True and channel_shared == False and abs(scale_value-1.0) >= 0.5: # any scale_value != 1, e.g. the 6 in our deploy
    print('when: across_spatial == True, channel_shared == False, scale_value != 1 , check for channel_shared')
    #####################################################################
    # with a constant filler and no training, every per-channel scale
    # equals the same constant, so this forward check matches the shared
    # case; per-channel differences only appear after back propagation
    #####################################################################
    # check the i-th sample
    i = 0 # i = 0 or i = 1 in our case
    position = [0,2]  # a position in the h*w plane of the n*c*h*w blob
    # the result computed by hand
    temp_result = input_data[i,:,position[0],position[1]]
    result_byhand = temp_result / np.sqrt(np.sum(input_data[i,:,:,:]**2))  # norm over the whole sample, i.e. across spatial positions
    result_byhand = result_byhand * scale_value
    # the result computed by normalized layer
    result_bylayer = result[i,:,position[0],position[1]]

    print(result_byhand)
    print(result_bylayer)
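
As a hand check of the deploy above (across_spatial true, shared scale 6), the third branch fires: for sample 0 the per-channel sums of squares are 91, 83 and 86, so the whole-sample norm is sqrt(260) ≈ 16.125; at position (0,2) the channel vector is [3, 2, 2], giving an expected output of 6 * [3, 2, 2] / 16.125 ≈ [1.116, 0.744, 0.744] (the eps term shifts this only negligibly).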

Check 2: updating the scale

Like an ordinary convolution-layer parameter, the value initialized by scale_filler is updated at every iteration. For example, append a norm layer like the one above after AlexNet's pool5 layer (whose output blob has shape (n, c, h, w) = (n, 256, 6, 6)) and print the scale after each iteration:

scale_value = mysolver.net.params['norm'][0].data
print(scale_value)

Result:

5.9999228
5.999773
5.999566
5.99917
5.9985614
5.997842
5.9970937
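
For context, the loop producing these numbers looks roughly like this (a sketch: the solver prototxt path and iteration count are assumptions):

import caffe
caffe.set_mode_cpu()
mysolver = caffe.SGDSolver('solver.prototxt')  # assumed solver wrapping the net above
for it in range(7):
    mysolver.step(1)  # one forward/backward/update pass
    print(mysolver.net.params['norm'][0].data)  # the scale after this update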

To pin scale_value to a fixed value, set its learning rate to 0, exactly as for a convolution parameter:

layer {
  name: "norm"
  type: "Normalize"
  bottom: "pool5"
  top: "norm"
  param {
    lr_mult: 0 # learning rate fixed at 0
    decay_mult: 0
  }
  norm_param {
    across_spatial: true
    scale_filler {
      type: "constant"
      value: 6
    }
    channel_shared: true
  }
}
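
With mysolver set up as above, one quick sanity check that the scale really stays put (a sketch, not from the original experiment):

import numpy as np
before = mysolver.net.params['norm'][0].data.copy()
mysolver.step(1)
after = mysolver.net.params['norm'][0].data
print(np.allclose(before, after))  # expect True when lr_mult is 0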

Check 3: channel_shared

channel_shared defaults to True, so we change it to False in the prototxt.

layer {
  name: "norm"
  type: "Normalize"
  bottom: "pool5"
  top: "norm"
  param {
    lr_mult: 0 # learning rate fixed at 0
    decay_mult: 0
  }
  norm_param {
    across_spatial: true
    scale_filler {
      type: "constant"
      value: 6
    }
    channel_shared: false
  }
}

Then print scale_value (or its shape/length) after each iteration. As described in references [1] and [2], its length should be 256, i.e., one scale value per channel.

 scale_value = mysolver.net.params['norm'][0].data
 print(scale_value.shape)  # result: (256,)
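
To make the broadcasting explicit, a small NumPy sketch of how a (256,)-shaped scale acts on a normalized pool5-shaped blob (shapes from the text; the data here is invented for illustration):

import numpy as np
normalized = np.random.rand(1, 256, 6, 6).astype(np.float32)  # stand-in for the L2-normalized blob
scale_value = np.full(256, 6.0, dtype=np.float32)             # channel_shared: false -> one scale per channel
out = normalized * scale_value[None, :, None, None]           # broadcast over (n, h, w)
print(out.shape)  # (1, 256, 6, 6)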

References:
1. https://blog.csdn.net/zqjackking/article/details/69938901 (the normalization layer in Caffe)
2. https://blog.csdn.net/weixin_35653315/article/details/72715367 (Normalization on conv4_3 in SSD)
3. Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers.
4. test_normalize_layer.cpp in SSD.

Reposted from https://blog.csdn.net/xuluhui123/article/details/80295222