BatchNorm in Caffe

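In Caffe, a BatchNorm (BN) layer is usually paired with a Scale layer. Caffe's BatchNorm layer only normalizes each channel, computing (x - mean) / sqrt(var + eps). The learnable scale and shift from the original BN formulation are provided by the Scale layer that follows it.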

layer {
  bottom: "conv1"
  top: "conv1"
  name: "bn_conv1"
  type: "BatchNorm"
  batch_norm_param {
    use_global_stats: true
  }
}

layer {
  bottom: "conv1"
  top: "conv1"
  name: "scale_conv1"
  type: "Scale"
  scale_param {
    bias_term: true
  }
}

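Here, setting use_global_stats to true in batch_norm_param means that during forward inference the layer normalizes with the mean and variance statistics already accumulated during training, and it no longer updates these two statistics.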
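As a minimal sketch of the difference, the pure-Python function below normalizes one channel either with the current batch's own statistics or with stored global ones. The name bn_normalize and its parameters are hypothetical, not part of Caffe; eps defaults to the 1e-5 from caffe.proto, and this sketch does not update the stored statistics.

```python
import math
from typing import List, Sequence


def bn_normalize(x: Sequence[float], use_global_stats: bool,
                 global_mean: float, global_var: float,
                 eps: float = 1e-5) -> List[float]:
    """Normalize the values of one channel (hypothetical helper, not Caffe API)."""
    if use_global_stats:
        # Inference: use the accumulated global statistics; nothing is updated.
        mean, var = global_mean, global_var
    else:
        # Training: use the current mini-batch's own mean and (biased) variance.
        mean = sum(x) / len(x)
        var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in x]


batch = [1.0, 2.0, 3.0, 4.0]
# Same input, different statistics depending on use_global_stats.
print(bn_normalize(batch, True, global_mean=2.0, global_var=1.0, eps=0.0))  # [-1.0, 0.0, 1.0, 2.0]
print(bn_normalize(batch, False, global_mean=2.0, global_var=1.0, eps=0.0))  # zero mean, unit variance
```

With use_global_stats true the output depends only on the stored statistics, so a single sample gives the same result regardless of which batch it arrives in.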

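bias_term: true gives the Scale layer a learnable bias in addition to its per-channel scale factor, so the layer computes a linear (affine) transform y = γ·x + β.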
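A sketch of what the Scale layer adds on top of the BatchNorm output, assuming one learnable gamma (and, with bias_term: true, one beta) per channel. scale_layer is a hypothetical helper, not Caffe code; x_hat[c] holds the already-normalized values of channel c.

```python
from typing import List, Optional, Sequence


def scale_layer(x_hat: Sequence[Sequence[float]], gamma: Sequence[float],
                beta: Optional[Sequence[float]] = None) -> List[List[float]]:
    """Per-channel affine step (hypothetical helper, not Caffe code).

    beta is None -> like bias_term: false, y = gamma[c] * x_hat
    beta given   -> like bias_term: true,  y = gamma[c] * x_hat + beta[c]
    """
    out = []
    for c, channel in enumerate(x_hat):
        shift = 0.0 if beta is None else beta[c]
        out.append([gamma[c] * v + shift for v in channel])
    return out


normalized = [[-1.0, 0.0, 1.0], [2.0]]  # two channels of BatchNorm output
print(scale_layer(normalized, gamma=[2.0, 3.0], beta=[0.5, -1.0]))  # [[-1.5, 0.5, 2.5], [5.0]]
print(scale_layer(normalized, gamma=[2.0, 3.0]))                    # [[-2.0, 0.0, 2.0], [6.0]]
```

Applied after the BatchNorm output, this restores the full BN transform y = gamma * (x - mean) / sqrt(var + eps) + beta, which is why the two layers are deployed together.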

Next, look at how caffe.proto describes the parameters of the BN layer.

message BatchNormParameter {
  // If false, normalization is performed over the current mini-batch
  // and global statistics are accumulated (but not yet used) by a moving average
  // If true, those accumulated mean and variance values are used for the normalization
  // By default, it is set to false when the network is in the training phase and true when the network is in the testing phase
  // Note: when false, the global statistics are updated, but the current mini-batch
  // is normalized with its own mean and variance, not with the global statistics.
  // When true, the global statistics are used for normalization.
  // In the BN implementation, if this field is not set, its value follows the
  // network phase: false in TRAIN, true in TEST.
  optional bool use_global_stats = 1;
  // what fraction of the moving average remains each iteration?
  // Smaller values make the moving average decay faster, giving more
  // weight to the recent values
  // Each iteration updates the moving average @f$S_{t-1}@f$ with the current mean
  // Note: BN estimates the global mean and variance with a moving average:
  //   S_t = (1 - beta) * Y_t + beta * S_{t-1}
  // where S_t is the current estimate of the global statistic (mean or variance),
  // Y_t is the mean or variance of the current batch, and beta is the
  // moving-average factor. This is a common averaging filter.
  optional float moving_average_fraction = 2 [default = .999];
  // Small value to add to the variance estimate so that we don't divide by zero
  optional float eps = 3 [default = 1e-5];
}
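The default rule for use_global_stats and the moving-average update described in these comments can be sketched in Python as follows. Both function names are hypothetical. The update follows the formula in the comment, with beta = moving_average_fraction; Caffe's actual BatchNormLayer differs in details (for example, it keeps an extra blob that accumulates the decay weights and divides by it at test time), so treat this as an illustration rather than a reimplementation.

```python
from typing import Optional


def resolve_use_global_stats(phase: str, configured: Optional[bool]) -> bool:
    """If use_global_stats is not set in the prototxt, follow the phase:
    False in TRAIN, True in TEST (hypothetical helper)."""
    if configured is not None:
        return configured
    return phase == "TEST"


def update_moving_average(s_prev: float, y_t: float,
                          moving_average_fraction: float = 0.999) -> float:
    """One step of S_t = (1 - beta) * Y_t + beta * S_{t-1}."""
    beta = moving_average_fraction
    return (1.0 - beta) * y_t + beta * s_prev


print(resolve_use_global_stats("TRAIN", None))  # False
print(resolve_use_global_stats("TEST", None))   # True
print(resolve_use_global_stats("TRAIN", True))  # True: an explicit setting wins

# Start from S_0 = 0 and feed a constant batch statistic Y_t = 1.0 for
# 100 iterations; a smaller fraction forgets the initial value faster.
for fraction in (0.999, 0.9):
    s = 0.0
    for _ in range(100):
        s = update_moving_average(s, 1.0, fraction)
    print(fraction, s)
```

With a constant input of 1.0 and S_0 = 0, the estimate after t steps is S_t = 1 - beta^t. After 100 iterations, 0.9 has essentially converged, while the default 0.999 has covered only about 10% of the gap. A large fraction therefore needs many training iterations before the global statistics are reliable.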