Caffe | A Detailed Look at the Layer Class, the Core Building Block

0.Introduction

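The Layer class is the basic unit for building networks in Caffe and the core component used during training, which is why we call it Caffe's core building block. A wide range of layer classes derive from the Layer base class, and each one gets its specific behavior by implementing two virtual functions, Forward() and Backward(). Forward performs the forward pass, computing the top blobs from the bottom blobs; Backward does the reverse, propagating gradients from the top blobs back to the bottom blobs.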

1.The Layer base class

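layer.hpp defines the base class of all network layers. It declares a set of common interfaces (operations that every derived class has), for example: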

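Forward (Forward_cpu or Forward_gpu): computes the values of the top blobs from the given bottom blobs. Note that some layers have no GPU version.
Backward (Backward_cpu or Backward_gpu): computes the gradient with respect to the bottom blobs from the given error gradient of the top blobs.
LayerSetUp: reads the layer param of the given layer and prepares for the subsequent Reshape.
Reshape: from the shapes of the bottom blobs fed into the layer and the layer's own computation logic, computes the shapes of the corresponding top blobs and allocates their memory in advance.
const LayerParameter& layer_param() const { return layer_param_; }
Returns the layer parameters read from the protobuf file.
vector<shared_ptr<Blob<Dtype> > > blobs_;
Stores the learnable parameters of the current layer.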
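
To make the interface above concrete, here is a minimal Python sketch of a toy layer with the same four steps (setup, reshape, forward, backward). It is not Caffe code: the class `ToyScaleLayer`, its method names and the `factor` setting are hypothetical and exist only for illustration.

```python
class ToyScaleLayer:
    """Toy element-wise layer; a hypothetical stand-in, not a Caffe class."""

    def layer_setup(self, layer_param):
        # Read layer-specific settings once, before any reshape.
        self.factor = layer_param.get("factor", 2.0)

    def reshape(self, bottom_shape):
        # Element-wise layer: the top blob has the same shape as the bottom blob.
        return list(bottom_shape)

    def forward(self, bottom):
        # top = factor * bottom
        return [self.factor * x for x in bottom]

    def backward(self, top_diff):
        # d(loss)/d(bottom) = factor * d(loss)/d(top)
        return [self.factor * g for g in top_diff]


layer = ToyScaleLayer()
layer.layer_setup({"factor": 3.0})
top = layer.forward([1.0, -2.0])
bottom_diff = layer.backward([0.5, 0.5])
```

The point of the sketch is the division of labor: forward maps bottom values to top values, while backward maps top gradients to bottom gradients using the same layer state.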

The LayerParameter message also defines some common fields, for example:

message LayerParameter {
  optional string name = 1; // layer name
  optional string type = 2; // layer type
  repeated string bottom = 3; // layer inputs
  repeated string top = 4; // layer outputs
  optional Phase phase = 10; // TRAIN or TEST
  repeated float loss_weight = 5; // weight of each top blob in the loss, usually 0 or 1
  repeated ParamSpec param = 6; // training parameters (the learning rate set in the solver,
                                // multiplied by this value, is the layer's effective learning rate)
  repeated BlobProto blobs = 7;
  repeated bool propagate_down = 11; // if set to false, backpropagation to that bottom is skipped

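There are also parameters that only specific derived layers use:

optional TransformationParameter transform_param = 100; // parameters specific to data preprocessing
optional LossParameter loss_param = 101; // parameters specific to loss layers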
......
optional ConvolutionParameter convolution_param = 106;
optional DataParameter data_param = 107;
......
optional PoolingParameter pooling_param = 121;
}

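layer.hpp is the abstract base class, and the other xxx_layers.hpp headers all inherit from it. Five groups of layers derive directly from it: data_layer, neuron_layer, loss_layer, common_layer and vision_layer. In broad terms: data layers handle input, vision layers handle convolution-related computation, neuron and common layers handle the intermediate computation, and loss layers come last and compute the error that is backpropagated.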

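Note: a quick reminder of how to read a Message.
A field in a Message takes one of three forms:
//1. required: a value must be provided,
//2. optional: the field may be omitted,
//3. repeated: the field holds a list of values of the same type.
Type identifiers in a Message:
//1. string/float: built-in scalar types that map to native C/C++ types
//2. TransformationParameter/LossParameter: types defined by Caffe, similar to structs (they contain fields of several types and are covered layer by layer below)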

2.data_layer (data_layer.hpp/cpp)

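The data input layers sit at the very bottom of the network. They can read data from LevelDB or LMDB databases, directly from memory, from HDF5 files, or even from raw image files. As the bottom layer of the network, their main job is to convert data formats. Because several input formats are supported, this class has second-level subclasses; the main ones are listed below (the first is described in detail, the others are analogous):
(1)Data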

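Layer type: Data // note: the type name can be found at the end of the corresponding .cpp file
Header file: ./include/caffe/layers/data_layer.hpp
CPU implementation: ./src/caffe/layers/data_layer.cpp
Function of the Data layer: reads LevelDB or LMDB databases and applies a series of preprocessing steps.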

optional DataParameter data_param = 107;
optional TransformationParameter transform_param = 100;



message DataParameter {
  enum DB {
    LEVELDB = 0;
    LMDB = 1;
  }
  optional string source = 1;
  optional uint32 batch_size = 4; // batch size
  optional uint32 rand_skip = 7 [default = 0]; // skip this many inputs at the start; useful for asynchronous SGD
  optional DB backend = 8 [default = LEVELDB]; // LMDB or LEVELDB
  optional float scale = 2 [default = 1]; // data scaling, applied after mean subtraction
  optional string mean_file = 3; // deprecated, moved to transform_param
  optional uint32 crop_size = 5 [default = 0]; // deprecated, moved to transform_param
  optional bool mirror = 6 [default = false]; // deprecated, moved to transform_param
  optional bool force_encoded_color = 9 [default = false]; // force encoded images to have 3 channels
  optional uint32 prefetch = 10 [default = 4];
}

// Image preprocessing parameters used by all data layers
message TransformationParameter {
  optional float scale = 1 [default = 1]; // scaling factor, applied after mean subtraction
  optional bool mirror = 2 [default = false]; // randomly mirror the input horizontally
  optional uint32 crop_size = 3 [default = 0];
  optional string mean_file = 4; // mean file (used differently from mean_value, same effect)
  repeated float mean_value = 5; // mean_file and mean_value cannot both be set
  optional bool force_color = 6 [default = false]; // force 3 channels
  optional bool force_gray = 7 [default = false]; // force grayscale
}

Usage example:

layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }

  transform_param {
    scale:0.0078125
    mirror:true
    crop_size:32
    mean_value: 128
    mean_value: 128
    mean_value: 128
  }
  data_param {
    source: "examples/cifar10/cifar10_train_lmdb"
    batch_size: 32
    backend: LMDB
  }
}
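
To see what the transform_param values in this example do to a pixel, here is a small Python sketch. It assumes the transformer computes (pixel - mean_value) * scale, which is what the comment on scale (applied after mean subtraction) suggests; the helper name `transform_pixel` is hypothetical and not part of Caffe.

```python
def transform_pixel(pixel, mean_value=128.0, scale=0.0078125):
    """Subtract the mean first, then scale (illustrative helper, not Caffe API)."""
    return (pixel - mean_value) * scale


lowest = transform_pixel(0)      # darkest 8-bit pixel
middle = transform_pixel(128)    # pixel equal to the mean
highest = transform_pixel(255)   # brightest 8-bit pixel
```

With mean_value 128 and scale 0.0078125 (that is, 1/128), 8-bit pixel values end up in roughly the range [-1, 1).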

Note: as the code below shows, an input image larger than crop_size is cropped. When phase is TRAIN the crop position is random; when phase is TEST only the central region of the image is cropped.


//We only do random crop when we do training.
    if (phase_ == TRAIN) {
      h_off = Rand(datum_height - crop_size + 1);
      w_off = Rand(datum_width - crop_size + 1);
    } else {
      h_off = (datum_height - crop_size) / 2;
      w_off = (datum_width - crop_size) / 2;
    }
  }
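
The same offset logic can be tried out in isolation with the Python sketch below. It mirrors the excerpt under the assumption that Rand(n) returns an integer in [0, n); the function name `crop_offsets` and the string phase values are hypothetical.

```python
import random


def crop_offsets(height, width, crop_size, phase, rng=random):
    """Return (h_off, w_off) of the crop window (illustrative helper)."""
    if phase == "TRAIN":
        # Random crop: any offset that keeps the window inside the image.
        h_off = rng.randrange(height - crop_size + 1)
        w_off = rng.randrange(width - crop_size + 1)
    else:
        # Center crop for testing.
        h_off = (height - crop_size) // 2
        w_off = (width - crop_size) // 2
    return h_off, w_off


center = crop_offsets(40, 36, 32, "TEST")
```

For a 40 x 36 image and crop_size 32, the TEST crop starts at row 4 and column 2, while a TRAIN crop can start at any row in 0..8 and any column in 0..4.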

(2)DUMMY_DATA dummy_data_layer.hpp

optional DummyDataParameter dummy_data_param = 109;

message DummyDataParameter {
  // This layer produces N >= 1 top blobs.  DummyDataParameter must specify 1 or N
  // shape fields, and 0, 1 or N data_fillers.
  //
  // If 0 data_fillers are specified, ConstantFiller with a value of 0 is used.
  // If 1 data_filler is specified, it is applied to all top blobs.  If N are
  // specified, the ith is applied to the ith top blob.
  repeated FillerParameter data_filler = 1;
  repeated BlobShape shape = 6;

  // 4D dimensions -- deprecated.  Use "shape" instead.
  repeated uint32 num = 2;
  repeated uint32 channels = 3;
  repeated uint32 height = 4;
  repeated uint32 width = 5;
}

(3)HDF5_DATA hdf5_data_layer.hpp

optional HDF5DataParameter hdf5_data_param = 112;

// Message that stores parameters used by HDF5DataLayer
message HDF5DataParameter {
  // Specify the data source.
  optional string source = 1;
  // Specify the batch size.
  optional uint32 batch_size = 2;

  // Specify whether to shuffle the data.
  // If shuffle == true, the ordering of the HDF5 files is shuffled,
  // and the ordering of data within any given HDF5 file is shuffled,
  // but data between different files are not interleaved; all of a file's
  // data are output (in a random order) before moving onto another file.
  optional bool shuffle = 3 [default = false];
}

(4)HDF5_OUTPUT hdf5_output_layer.hpp

optional HDF5OutputParameter hdf5_output_param = 113;

message HDF5OutputParameter {
  optional string file_name = 1;
}

(5)IMAGE_DATA image_data_layer.hpp

optional ImageDataParameter image_data_param = 115;

message ImageDataParameter {
  // Specify the data source.
  optional string source = 1;
  // Specify the batch size.
  optional uint32 batch_size = 4 [default = 1];
  // The rand_skip variable is for the data layer to skip a few data points
  // to avoid all asynchronous sgd clients to start at the same point. The skip
  // point would be set as rand_skip * rand(0,1). Note that rand_skip should not
  // be larger than the number of keys in the database.
  optional uint32 rand_skip = 7 [default = 0];
  // Whether or not ImageLayer should shuffle the list of files at every epoch.
  optional bool shuffle = 8 [default = false];
  // It will also resize images if new_height or new_width are not zero.
  optional uint32 new_height = 9 [default = 0];
  optional uint32 new_width = 10 [default = 0];
  // Specify if the images are color or gray
  optional bool is_color = 11 [default = true];
  // DEPRECATED. See TransformationParameter. For data pre-processing, we can do
  // simple scaling and subtracting the data mean, if provided. Note that the
  // mean subtraction is always carried out before scaling.
  optional float scale = 2 [default = 1];
  optional string mean_file = 3;
  // DEPRECATED. See TransformationParameter. Specify if we would like to randomly
  // crop an image.
  optional uint32 crop_size = 5 [default = 0];
  // DEPRECATED. See TransformationParameter. Specify if we want to randomly mirror
  // data.
  optional bool mirror = 6 [default = false];
  optional string root_folder = 12 [default = ""];
}

3.neuron_layer (neuron_layer.hpp/cpp)

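NeuronLayer is, as the name suggests, the neuron layer, and the various activation-function layers derive from it. These layers fix both ExactNumBottomBlobs() and ExactNumTopBlobs() to the constant 1, that is, one input blob and one output blob (bottom and top have the same size). The derived classes mainly perform element-wise operations (Dropout, and activation functions such as ReLU and Sigmoid), and they can compute in place (in-place computation: the result overwrites the input instead of occupying new memory). Caffe implements a large number of activation functions, many with both CPU and GPU versions, and NeuronLayer is the parent class of all of them.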

AbsValLayer，BNLLLayer，DropoutLayer，ExpLayer，
LogLayer，PowerLayer，ReLULayer，CuDNNReLULayer，
SigmoidLayer，CuDNNSigmoidLayer，TanHLayer，
CuDNNTanHLayer，ThresholdLayer，PReLULayer

A few examples:

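SigmoidLayer
The sigmoid function, also known as the logistic function, has a smooth S-shaped curve. It is used less often today; ReLU has largely replaced it.
ReLULayer
ReLU is now the most common choice of activation, even though textbooks and lecture notes still tend to present the sigmoid. ReLU converges faster than the sigmoid: because of the shape of its curve, the sigmoid saturates, so its gradient shrinks and convergence slows the closer training gets to the target. The gradient of ReLU is 1 for positive inputs, which to some extent reduces the vanishing gradients caused by very deep networks.
DropoutLayer
DropoutLayer is a very widely used layer. It is active only during training, is usually applied to the fully connected layers, and helps suppress overfitting. The idea is to set a random subset of the input values x to 0 during training.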
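
The dropout idea can be sketched in a few lines of Python. This is an illustration, not Caffe's implementation: it assumes the common "inverted dropout" convention in which the surviving values are scaled by 1 / (1 - ratio) during training so that nothing needs to change at test time. The function name `dropout_forward` and its parameters are hypothetical.

```python
import random


def dropout_forward(xs, ratio=0.5, train=True, rng=random):
    """Zero each input with probability `ratio` during training (sketch)."""
    if not train:
        # Test phase: the layer passes its input through unchanged.
        return list(xs)
    scale = 1.0 / (1.0 - ratio)  # assumed inverted-dropout scaling
    return [0.0 if rng.random() < ratio else x * scale for x in xs]


inputs = [1.0, 2.0, 3.0, 4.0]
train_out = dropout_forward(inputs, ratio=0.5, rng=random.Random(0))
test_out = dropout_forward(inputs, train=False)
```

Each training output is either 0 or the input multiplied by 2 (for ratio 0.5), so the expected value of every element matches its test-time value.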

4.loss_layer (loss_layer.hpp/cpp)

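LossLayer is, as the name suggests, the loss layer, and the various loss-function layers derive from it. Its header includes neuron_layers.hpp, and a loss layer is normally the last layer of the network. Caffe implements a large number of loss functions, all with LossLayer as their parent class. Some commonly used derived loss layers are listed below.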

message LossParameter {
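  optional int32 ignore_label = 1; // if set (for example to -1), outputs with that label do not contribute to the loss or its gradient
// Strategy used to normalize the loss (currently only supported by SoftmaxWithLoss and SigmoidCrossEntropyLoss)
  enum NormalizationMode {
    FULL = 0; // count ignored labels too: divide by (batch size * number of spatial positions)
    VALID = 1; // do not count outputs whose label is ignore_label
    BATCH_SIZE = 2; // divide by the batch size only
    NONE = 3; // do not normalize the loss
  }
// For historical reasons, the default normalization for SigmoidCrossEntropyLoss is BATCH_SIZE and *not* VALID.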
  optional NormalizationMode normalization = 3 [default = VALID];
}
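
The four normalization modes differ only in the number the summed loss is divided by. The Python sketch below spells that out under stated assumptions: `outer_num` stands for the batch size, `inner_num` for the number of spatial positions per sample, and `valid_count` for the number of outputs whose label is not ignore_label. These names, the function `loss_normalizer` and the clamp to at least 1 (to avoid dividing by zero) are illustrative choices, not quoted from Caffe.

```python
def loss_normalizer(mode, outer_num, inner_num, valid_count):
    """Return the divisor applied to the summed loss (illustrative sketch)."""
    if mode == "FULL":
        n = outer_num * inner_num   # every output, ignored or not
    elif mode == "VALID":
        n = valid_count             # only outputs with a non-ignored label
    elif mode == "BATCH_SIZE":
        n = outer_num               # batch size only
    elif mode == "NONE":
        n = 1                       # no normalization
    else:
        raise ValueError("unknown normalization mode: %r" % (mode,))
    # Assumed guard: never divide by less than 1.
    return max(1.0, float(n))


# Batch of 4 samples, 10 x 10 label map each, 30 labels ignored in total.
divisors = {m: loss_normalizer(m, 4, 100, 370)
            for m in ("FULL", "VALID", "BATCH_SIZE", "NONE")}
```

For this example the divisors are 400 (FULL), 370 (VALID), 4 (BATCH_SIZE) and 1 (NONE), which shows how strongly the choice of mode changes the scale of the reported loss.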

Further reading: adding this feature to layers that have no ignore_label parameter

（1）SoftmaxWithLoss

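Layer type: SoftmaxWithLoss // note: the type name can be found at the end of the corresponding .cpp file
Header file: ./include/caffe/layers/softmax_loss_layer.hpp
CPU implementation: ./src/caffe/layers/softmax_loss_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/softmax_loss_layer.cu
Function of the SoftmaxWithLoss layer: computes the multinomial logistic loss of the softmax of its inputs. Conceptually it is a SoftmaxLayer followed by a multinomial logistic loss, but it provides a numerically more stable gradient. At test time it can be replaced by a SoftmaxLayer.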

  optional LossParameter loss_param = 101;
  optional SoftmaxParameter softmax_param = 125;

// Parameters of SoftmaxLayer and SoftmaxWithLossLayer
message SoftmaxParameter {
  enum Engine {
    DEFAULT = 0;
    CAFFE = 1;
    CUDNN = 2;
  }
  optional Engine engine = 1 [default = DEFAULT];
  optional int32 axis = 2 [default = 1]; // axis along which softmax is computed; may be negative to index from the end
}
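
As an illustration of why combining softmax and the loss helps numerically, the Python sketch below computes the softmax by subtracting the maximum score first and computes the loss with the log-sum-exp form, so large scores do not overflow. This is a generic sketch of the technique, not Caffe's code; the function names are hypothetical and only a single sample is handled.

```python
import math


def softmax(scores):
    """Softmax over one list of class scores, shifted by the max for stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_with_loss(scores, label):
    """Multinomial logistic loss of softmax(scores) for the true class `label`."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[label]   # equals -log(softmax(scores)[label])


probs = softmax([1000.0, 1000.0])        # naive exp(1000) would overflow
loss = softmax_with_loss([0.0, 0.0], 0)  # two equally likely classes
```

At test time only `softmax` is needed, which matches the remark that the loss layer can be replaced by a SoftmaxLayer.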

（2）HingeLoss

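Layer type: HingeLoss // note: the type name can be found at the end of the corresponding .cpp file
Header file: ./include/caffe/layers/hinge_loss_layer.hpp
CPU implementation: ./src/caffe/layers/hinge_loss_layer.cpp
Function of the HingeLoss layer: provides both L1 and L2 hinge loss, aimed at "maximum-margin" classification, which makes it especially suitable for SVM-style classifiers (see also: an illustrated guide to using hinge loss in Caffe).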

  optional LossParameter loss_param = 101;
  optional HingeLossParameter hinge_loss_param = 114;

message HingeLossParameter {
  enum Norm {
    L1 = 1;
    L2 = 2;
  }
 
  optional Norm norm = 1 [default = L1]; // selects L1 or L2
}
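
The difference between the L1 and L2 variants can be shown with a short Python sketch. It assumes a one-vs-all formulation in which every class score contributes a margin term max(0, 1 - s) for the true class and max(0, 1 + s) for every other class, summed (L1) or summed after squaring (L2) and averaged over the batch. The function name `hinge_loss` is hypothetical and this is a sketch under those assumptions, not a copy of Caffe's implementation.

```python
def hinge_loss(scores, labels, norm="L1"):
    """One-vs-all hinge loss averaged over the batch (illustrative sketch)."""
    total = 0.0
    for row, label in zip(scores, labels):
        for k, s in enumerate(row):
            # The true class should score above +1, every other class below -1.
            sign = -1.0 if k == label else 1.0
            margin = max(0.0, 1.0 + sign * s)
            total += margin if norm == "L1" else margin * margin
    return total / len(scores)


confident = hinge_loss([[2.0, -2.0]], [0])             # all margins satisfied
l1 = hinge_loss([[0.5, 0.0]], [0], norm="L1")          # 0.5 + 1.0
l2 = hinge_loss([[0.5, 0.0]], [0], norm="L2")          # 0.25 + 1.0
```

The L2 variant penalizes large margin violations more heavily and small ones more lightly than the L1 variant.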

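Further reading: hinge loss explained

（3）EuclideanLoss
Computes the Euclidean (L2) loss for regression tasks; it can be used for least-squares regression.

（4）SigmoidCrossEntropyLoss
Computes the cross-entropy (logistic) loss, typically used when the targets are predicted as probabilities. The layer is equivalent to a SigmoidLayer followed by a cross-entropy loss, but its gradient computation is numerically more robust. At test time it can be replaced by a SigmoidLayer.

（5）MultinomialLogisticLossLayer
Computes the multinomial logistic loss for one-of-many classification tasks, taking a predicted probability distribution directly as input. When the predictions are not a probability distribution, use SoftmaxWithLossLayer instead, because it maps the predictions to a distribution with a SoftmaxLayer before computing the multinomial logistic loss.

（6）InfogainLoss
A generalization of MultinomialLogisticLossLayer that uses an "information gain" (infogain) matrix to specify the "value" of every pair of labels. When the infogain matrix is the identity matrix, it is equivalent to MultinomialLogisticLossLayer.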

optional LossParameter loss_param = 101;
optional InfogainLossParameter infogain_loss_param = 116;

message InfogainLossParameter {
  optional string source = 1; // file that holds the infogain matrix
  optional int32 axis = 2 [default = 1]; // axis of the probabilities
}

（7）ContrastiveLoss
A contrastive loss function that can be used to train Siamese networks.

Further reading: the loss layers explained in detail

5.common_layer (common_layer.hpp/cpp)

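The remaining, more complex computations all live in common_layers.hpp. It contains special-purpose layers that add to, remove from or rearrange blobs, such as ArgMaxLayer, ConcatLayer, FlattenLayer, SoftmaxLayer, SplitLayer and SliceLayer.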
（1）Flatten

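Layer type: Flatten
Header file: ./include/caffe/layers/flatten_layer.hpp
CPU implementation: ./src/caffe/layers/flatten_layer.cpp
Function of the Flatten layer: turns an input of size n * c * h * w into one simple vector per sample, giving an output of size n * (c*h*w). A Reshape layer can do the same job: keep the first dimension and let the rest be inferred.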

optional FlattenParameter flatten_param = 135;

message FlattenParameter {
  optional int32 axis = 1 [default = 1]; // first axis to flatten (all axes before it are kept as they are)
  optional int32 end_axis = 2 [default = -1]; // last axis to flatten (all axes after it are kept as they are)
}
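
The effect of axis and end_axis on the output shape can be checked with the Python sketch below. The function name `flatten_shape` is hypothetical; the sketch only computes shapes and assumes that negative values index from the end, as the default end_axis = -1 implies.

```python
from math import prod


def flatten_shape(shape, axis=1, end_axis=-1):
    """Shape after collapsing axes axis..end_axis (inclusive) into one axis."""
    n = len(shape)
    start = axis if axis >= 0 else axis + n
    end = end_axis if end_axis >= 0 else end_axis + n
    return list(shape[:start]) + [prod(shape[start:end + 1])] + list(shape[end + 1:])


default_shape = flatten_shape([2, 3, 4, 5])                     # n x (c*h*w)
partial_shape = flatten_shape([2, 3, 4, 5], axis=1, end_axis=2)  # merge c and h only
```

With the defaults, a 2 x 3 x 4 x 5 blob becomes 2 x 60; limiting end_axis to 2 gives 2 x 12 x 5.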

（2）Reshape

Layer type: Reshape
头文件位置：./include/caffe/layers/reshape_layer.hpp
CPU 执行源文件位置: ./src/caffe/layers/reshape_layer.cpp
Reshape层的功能：根据给定参数改变输入blob的维度，仅仅改变数据的维度，但内容不变。

optional ReshapeParameter reshape_param = 133;

message ReshapeParameter {
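// Specifies the output dimensions. If a dimension is set to 0, the corresponding dimension of the bottom blob is used (unchanged).
// At most one dimension may be set to -1, in which case its value is inferred from the count of the bottom blob and the remaining dimensions.

// For example, suppose we want to reshape a 2D blob "input" with shape 2 x 8.
// The following reshape_param specifications are then all equivalent, each producing a 3D blob "output" with shape 2 x 2 x 4: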

  //   reshape_param { shape { dim:  2  dim: 2  dim:  4 } }
  //   reshape_param { shape { dim:  0  dim: 2  dim:  4 } }
  //   reshape_param { shape { dim:  0  dim: 2  dim: -1 } }
  //   reshape_param { shape { dim:  0  dim:-1  dim:  4 } }
 
  optional BlobShape shape = 1;

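// axis and num_axes control which portion of the bottom blob's shape is replaced by (included in) the reshape.
// By default (axis == 0 and num_axes == -1) the entire bottom shape is included, so shape must specify the entire output shape.

// Using axis:
// If axis is non-negative, the axes before axis are kept unchanged and the axes from axis onward are reshaped according to dim.
// If axis is negative, it indexes from the end of the shape; the behavior is otherwise the same.
// So the sign of axis only selects the indexing convention.
// For example, suppose "input" is a 2D blob with shape 2 x 8.
// The following reshape_param specifications are then all equivalent, each producing a 3D blob "output" with shape 2 x 2 x 4: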
  
  //   reshape_param { shape { dim: 2  dim: 2  dim: 4 } }
  //   reshape_param { shape { dim: 2  dim: 4 } axis:  1 }
  //   reshape_param { shape { dim: 2  dim: 4 } axis: -3 }
  
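// Using num_axes:
// num_axes specifies the extent of the reshape.
// If num_axes >= 0 (and axis >= 0), only the num_axes input axes starting at axis are reshaped.
// With the default num_axes == -1, all axes from axis onward are included.
// For example, suppose "input" is a 2D blob with shape 2 x 8.
// The following reshape_param specifications are then all equivalent, each producing a 3D blob "output" with shape 1 x 2 x 8: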
  
  //   reshape_param { shape { dim:  1  dim: 2  dim:  8 } }
  //   reshape_param { shape { dim:  1  dim: 2  }  num_axes: 1 }
  //   reshape_param { shape { dim:  1  }  num_axes: 0 }
  
// By contrast, the following reshape_param specifications are equivalent to each other and produce a 3D blob "output" with shape 2 x 1 x 8:
  //   reshape_param { shape { dim: 2  dim: 1  dim: 8  }  }
  //   reshape_param { shape { dim: 1 }  axis: 1  num_axes: 0 }
  optional int32 axis = 2 [default = 0];
  optional int32 num_axes = 3 [default = -1];
}

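The Reshape layer only changes the dimensions of its input; the contents stay the same and no data is copied, much like the Flatten layer.
The output dimensions are given by reshape_param. A positive integer sets the size of a dimension directly, and two values have a special meaning:
0 => copy the respective dimension of the bottom layer, that is, reuse the size of the corresponding input dimension.
-1 => infer this from the other dimensions, that is, compute the size automatically from the remaining dimensions. At most one -1 may appear in reshape_param.
One more example: with reshape_param set to { shape { dim: 0 dim: -1 } }, the output is exactly the same as the output of the Flatten layer.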
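
The rules for 0, -1, axis and num_axes can be reproduced with a small shape calculator. The Python sketch below is an illustration only: the function name `reshape_shape` is hypothetical, and negative axis values are deliberately left out so that nothing is assumed about how they are resolved.

```python
from math import prod


def reshape_shape(bottom, dims, axis=0, num_axes=-1):
    """Output shape for a reshape of `bottom` (non-negative axis only)."""
    end = len(bottom) if num_axes == -1 else axis + num_axes
    head, tail = list(bottom[:axis]), list(bottom[end:])
    # dim 0 copies the bottom dimension at the same position, counted from axis.
    out = [bottom[axis + i] if d == 0 else d for i, d in enumerate(dims)]
    if out.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    if -1 in out:
        # Infer the missing dimension from the total element count.
        known = prod(head) * prod(tail) * prod(d for d in out if d != -1)
        out[out.index(-1)] = prod(bottom) // known
    result = head + out + tail
    if prod(result) != prod(bottom):
        raise ValueError("reshape must keep the element count")
    return result


same_as_flatten = reshape_shape([4, 3, 2, 2], [0, -1])
```

Running it on the 2 x 8 examples from the message comments gives 2 x 2 x 4, 1 x 2 x 8 and 2 x 1 x 8 for the respective groups, and { dim: 0 dim: -1 } on a 4 x 3 x 2 x 2 blob gives 4 x 12, the same shape Flatten would produce.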

（3）Split

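Layer type: Split
Header file: ./include/caffe/layers/split_layer.hpp
CPU implementation: ./src/caffe/layers/split_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/split_layer.cu
Function of the Split layer: makes several copies of a blob and hands them to different layers, so that those consuming layers share the blob.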

（4）Slice

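Layer type: Slice
Header file: ./include/caffe/layers/slice_layer.hpp
CPU implementation: ./src/caffe/layers/slice_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/slice_layer.cu
Function of the Slice layer: splits the input blob along the given axis at the given slice points (along the channel axis by default).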

optional SliceParameter slice_param = 126;

message SliceParameter {
  // the axis of the input blob along which to slice
  // by default, slicing is along the "channel" axis
  optional int32 axis = 3 [default = 1];
  // slice_point: number of slice points = number of tops - 1
  repeated uint32 slice_point = 2;

}

layer {
  name: "slicer_label"
  type: "Slice"
  bottom: "label"
  ## Example of label with a shape N x 3 x 1 x 1
  ## Example of label with a shape N x 5 x 1 x 1
  top: "label1"
  top: "label2"
  top: "label3"
  slice_param {
    axis: 1
    slice_point: 1
    slice_point: 2
  }
}
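The layer above slices along axis 1, the channel axis. The outputs for the two input shapes are:
For a label of shape N x 3 x 1 x 1, the label is split into three tops with 1, 1 and 1 channels.
For a label of shape N x 5 x 1 x 1, the label is split into three tops with 1, 1 and 3 channels.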
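
How slice_point determines the size of each top can be sketched in Python as follows. The helper name `slice_sizes` is hypothetical; it treats the slice points as cut positions along the sliced axis, which is consistent with the rule that there is one fewer slice point than there are tops.

```python
def slice_sizes(axis_len, slice_points):
    """Sizes of the tops along the sliced axis (illustrative helper)."""
    bounds = [0] + list(slice_points) + [axis_len]
    return [b - a for a, b in zip(bounds, bounds[1:])]


three_channels = slice_sizes(3, [1, 2])
five_channels = slice_sizes(5, [1, 2])
```

With slice points 1 and 2, a 3-channel label yields tops of 1, 1 and 1 channels, and a 5-channel label yields tops of 1, 1 and 3 channels.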

（5）Concat

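Layer type: Concat
Header file: ./include/caffe/layers/concat_layer.hpp
CPU implementation: ./src/caffe/layers/concat_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/concat_layer.cu
Function of the Concat layer: a utility layer that concatenates several input blobs into one output blob along the given axis. Note that apart from that axis, all other dimensions of the input blobs must have the same size.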

optional ConcatParameter concat_param = 104;

message ConcatParameter {
  // the axis of the input blobs along which to concatenate
  // by default, concatenation is along the "channel" axis
  optional int32 axis = 2 [default = 1];

  // DEPRECATED: alias for "axis" -- does not support negative indexing.
  optional uint32 concat_dim = 1 [default = 1];
}

（6）Eltwise

Layer type: Eltwise
Header file: ./include/caffe/layers/eltwise_layer.hpp
CPU implementation: ./src/caffe/layers/eltwise_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/eltwise_layer.cu
Function of the Eltwise layer: element-wise operations (for example the shortcut connection in ResNet).

optional EltwiseParameter eltwise_param = 110;

message EltwiseParameter {
  enum EltwiseOp {
    PROD = 0; // element-wise product
    SUM = 1; // element-wise sum
    MAX = 2; // element-wise maximum
  }
  optional EltwiseOp operation = 1 [default = SUM]; // element-wise operation
  repeated float coeff = 2; // coefficients for the SUM operation (see the example below)
  // Whether to use an asymptotically slower (for >2 inputs) but stabler method
  // of computing the gradient for the PROD operation. (No effect for SUM op.)
  optional bool stable_prod_grad = 3 [default = true];
}

layer {
        name: "eltwise"
        type: "Eltwise"
        bottom: "conv1"
        bottom: "conv2"
        bottom: "conv3"
        top: "eltwise"
        eltwise_param {
             operation: SUM  
        }
}

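This sums the feature maps of the three input convolution layers element by element and merges them into a single blob.
What if we want a difference instead? That is where the coeff parameter comes in: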

layer {
        name: "eltwise"
        type: "Eltwise"
        bottom: "data"
        bottom: "conv3"
        top: "eltwise"
        eltwise_param {
             operation: SUM
             coeff: 1
             coeff: -1
        }
}

This is equivalent to subtracting the conv3 blob from the data blob, element by element.
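
The role of coeff in the SUM operation can be sketched in Python as a weighted element-wise sum. The function name `eltwise_sum` is hypothetical and the blobs are represented as flat lists of equal length.

```python
def eltwise_sum(bottoms, coeffs=None):
    """Weighted element-wise sum of equally sized inputs (illustrative helper)."""
    if coeffs is None:
        coeffs = [1.0] * len(bottoms)   # plain sum when no coeff is given
    return [sum(c * v for c, v in zip(coeffs, values)) for values in zip(*bottoms)]


data = [5.0, 7.0]
conv3 = [2.0, 10.0]
difference = eltwise_sum([data, conv3], coeffs=[1.0, -1.0])
```

With coefficients 1 and -1 the result is data minus conv3 for every element; without coefficients the same function reduces to a plain sum.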

（7）Reduction

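Layer type: Reduction
Header file: ./include/caffe/layers/reduction_layer.hpp
CPU implementation: ./src/caffe/layers/reduction_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/reduction_layer.cu
Function of the Reduction layer: applies an operation such as sum or mean to the input blob over the dimensions selected by its parameters. Put simply, it sums or averages the input feature maps over the given dimensions.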

optional ReductionParameter reduction_param = 136;

// Message that stores parameters used by ReductionLayer
message ReductionParameter {
  enum ReductionOp {
    SUM = 1;
    ASUM = 2;
    SUMSQ = 3;
    MEAN = 4;
  }

  optional ReductionOp operation = 1 [default = SUM]; 
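// Currently only reduction from axis through the last axis is supported; reducing from axis up to some earlier axis n is not.
// Suppose the bottom blob has n axes:
//     (d0, d1, d2, ..., d(m-1), dm, d(m+1), ..., d(n-1)).
// If axis == m, the top blob has shape
//     (d0, d1, d2, ..., d(m-1)),
// If axis == 0 (the default), the output blob always has the empty shape (count 1),
// which is often useful for creating new loss functions.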
  optional int32 axis = 2 [default = 0];

  optional float coeff = 3 [default = 1.0]; // coefficient for output
}

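Q: I want to turn an n x c x h x w blob into n x 1 x h x w (using SUM). How should the parameters be set?
A: ReductionLayer cannot do this, because it only supports reducing from the axis you specify through the tail axis. It does not reduce over a single axis on its own; it always starts at the given axis and continues to the last one. For an n x c x h x w blob, axis = 1 yields shape n, and axis = 2 yields shape n x c.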
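
The "from axis to the tail" behavior can be demonstrated with a short Python sketch. The function names `reduction_shape` and `reduce_tail_sum` are hypothetical; the blob is stored as a flat, row-major list and only the SUM operation is shown.

```python
from math import prod


def reduction_shape(shape, axis=0):
    """Output shape: every axis from `axis` to the end is reduced away."""
    return list(shape[:axis])


def reduce_tail_sum(flat, shape, axis, coeff=1.0):
    """SUM over all axes from `axis` onward of a row-major flat blob."""
    dim = prod(shape[axis:])   # number of elements folded into each output
    return [coeff * sum(flat[i:i + dim]) for i in range(0, len(flat), dim)]


shape_axis1 = reduction_shape([2, 3, 4, 5], axis=1)   # n
shape_axis2 = reduction_shape([2, 3, 4, 5], axis=2)   # n x c
row_sums = reduce_tail_sum([1, 2, 3, 4, 5, 6], [2, 3], axis=1)
```

Because the output shape is always a prefix of the input shape, there is no axis value that produces n x 1 x h x w, which is exactly the limitation described in the answer.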

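（8）Silence

Layer type: Silence
Header file: ./include/caffe/layers/silence_layer.hpp
CPU implementation: ./src/caffe/layers/silence_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/silence_layer.cu
Function of the Silence layer: when a Slice layer splits the labels into several parts and one or more of them are never used, the Silence layer consumes the unused blobs so that they are not printed as outputs during training (otherwise the log becomes very cluttered).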

（9）InnerProduct

optional InnerProductParameter inner_product_param = 117;

message InnerProductParameter {
  optional uint32 num_output = 1; // The number of outputs for the layer
  optional bool bias_term = 2 [default = true]; // whether to have bias terms
  optional FillerParameter weight_filler = 3; // The filler for the weight
  optional FillerParameter bias_filler = 4; // The filler for the bias

  // The first axis to be lumped into a single inner product computation;
  // all preceding axes are retained in the output.
  // May be negative to index from the end (e.g., -1 for the last axis).
  optional int32 axis = 5 [default = 1];
  // Specify whether to transpose the weight matrix or not.
  // If transpose == true, any operations will be performed on the transpose
  // of the weight matrix. The weight matrix itself is not going to be transposed
  // but rather the transfer flag of operations will be toggled accordingly.
  optional bool transpose = 6 [default = false];
}

（10）ArgMaxLayer

optional ArgMaxParameter argmax_param = 103;

message ArgMaxParameter {
  // If true produce pairs (argmax, maxval)
  optional bool out_max_val = 1 [default = false];
  optional uint32 top_k = 2 [default = 1];
  // The axis along which to maximise -- may be negative to index from the
  // end (e.g., -1 for the last axis).
  // By default ArgMaxLayer maximizes over the flattened trailing dimensions
  // for each index of the first / num dimension.
  optional int32 axis = 3;
}

（11）MVNLayer

optional MVNParameter mvn_param = 120;

message MVNParameter {
  // This parameter can be set to false to normalize mean only
  optional bool normalize_variance = 1 [default = true];

  // This parameter can be set to true to perform DNN-like MVN
  optional bool across_channels = 2 [default = false];

  // Epsilon for not dividing by zero while normalizing variance
  optional float eps = 3 [default = 1e-9];
}

（12）SoftmaxLayer

optional LossParameter loss_param = 101;
  optional SoftmaxParameter softmax_param = 125;

// Parameters of SoftmaxLayer and SoftmaxWithLossLayer
message SoftmaxParameter {
  enum Engine {
    DEFAULT = 0;
    CAFFE = 1;
    CUDNN = 2;
  }
  optional Engine engine = 1 [default = DEFAULT];
  optional int32 axis = 2 [default = 1]; // axis along which softmax is computed; may be negative to index from the end
}

6.vision_layer

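Vision layers usually take images as input and produce other images as output, although they can also handle data of other types and sizes. A typical real-world "image" has one color channel (channel = 1), as in a grayscale image, or three color channels (channel = 3), as in an RGB (red, green, blue) image. Most vision layers work by applying a particular operation to some region of the input to produce the corresponding region of the output. This group mainly implements Convolution, Pooling, LRN and similar operations. (BatchNorm is also covered in this section.)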
（1）Convolution

Layer type: Convolution
Header file: ./include/caffe/layers/conv_layer.hpp
CPU implementation: ./src/caffe/layers/conv_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/conv_layer.cu
Function of the Convolution layer: convolves the input image with a set of learnable filters, each of which produces one feature map in the output.
Input
n * c_i * h_i * w_i
Output
n * c_o * h_o * w_o, where h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1 and w_o likewise.
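
The output-size formula is easy to check numerically. The Python sketch below implements it for one spatial dimension with integer (floor) division. The dilation term uses the usual effective kernel extent dilation * (kernel - 1) + 1, which is an assumption that goes beyond the formula quoted above; the function name `conv_output_size` is hypothetical.

```python
def conv_output_size(in_size, kernel, pad=0, stride=1, dilation=1):
    """Output size of a convolution along one spatial dimension (sketch)."""
    # Assumed effective kernel size once holes are inserted by dilation.
    kernel_extent = dilation * (kernel - 1) + 1
    return (in_size + 2 * pad - kernel_extent) // stride + 1


same_size = conv_output_size(32, kernel=5, pad=2)            # 5x5, pad 2, stride 1
strided = conv_output_size(227, kernel=11, stride=4)         # 11x11, stride 4
dilated = conv_output_size(32, kernel=3, pad=2, dilation=2)  # 3x3 with dilation 2
```

A 5 x 5 kernel with pad 2 and stride 1 keeps a 32-pixel side at 32, and an 11 x 11 kernel with stride 4 maps 227 to 55.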

Further reading: dilated convolution (dilation)

optional ConvolutionParameter convolution_param = 106;

message ConvolutionParameter {
  optional uint32 num_output = 1; // number of filters
  optional bool bias_term = 2 [default = true]; // whether to add a bias
  repeated uint32 pad = 3; // The padding size; defaults to 0
  repeated uint32 kernel_size = 4; // The kernel size
  repeated uint32 stride = 6; // The stride; defaults to 1
  // Factor used to dilate the kernel, (implicitly) zero-filling the resulting
  // holes. (Kernel dilation is sometimes referred to by its use in the
  // algorithme à trous from Holschneider et al. 1987.)
  repeated uint32 dilation = 18; // The dilation; defaults to 1

  // For 2D convolution only, the *_h and *_w versions may also be used to
  // specify both spatial dimensions.
  optional uint32 pad_h = 9 [default = 0]; // The padding height (2D only)
  optional uint32 pad_w = 10 [default = 0]; // The padding width (2D only)
  optional uint32 kernel_h = 11; // The kernel height (2D only)
  optional uint32 kernel_w = 12; // The kernel width (2D only)
  optional uint32 stride_h = 13; // The stride height (2D only)
  optional uint32 stride_w = 14; // The stride width (2D only)
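// If g > 1, the connectivity of each filter is restricted to a subset of the input.
// Specifically, the input and output channels are split into g groups, and the i-th group of output channels is connected only to the i-th group of input channels.
// See the ShuffleNet network for an example of this operation.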
  optional uint32 group = 5 [default = 1]; // The group size for group conv

  optional FillerParameter weight_filler = 7; // The filler for the weight
  optional FillerParameter bias_filler = 8; // The filler for the bias
  enum Engine {
    DEFAULT = 0;
    CAFFE = 1;
    CUDNN = 2;
  }
  optional Engine engine = 15 [default = DEFAULT];

  // The axis to interpret as "channels" when performing convolution.
  // Preceding dimensions are treated as independent inputs;
  // succeeding dimensions are treated as "spatial".
  // With (N, C, H, W) inputs, and axis == 1 (the default), we perform
  // N independent 2D convolutions, sliding C-channel (or (C/g)-channels, for
  // groups g>1) filters across the spatial axes (H, W) of the input.
  // With (N, C, D, H, W) inputs, and axis == 1, we perform
  // N independent 3D convolutions, sliding (C/g)-channels
  // filters across the spatial axes (D, H, W) of the input.
  optional int32 axis = 16 [default = 1];

  // Whether to force use of the general ND convolution, even if a specific
  // implementation for blobs of the appropriate number of spatial dimensions
  // is available. (Currently, there is only a 2D-specific convolution
  // implementation; for input blobs with num_axes != 2, this option is
  // ignored and the ND implementation will be used.)
  optional bool force_nd_im2col = 17 [default = false];
}

（2）Pooling

optional PoolingParameter pooling_param = 121;

message PoolingParameter {
  enum PoolMethod {
    MAX = 0;
    AVE = 1;
    STOCHASTIC = 2;
  }
  optional PoolMethod pool = 1 [default = MAX]; // The pooling method
  // Pad, kernel size, and stride are all given as a single value for equal
  // dimensions in height and width or as Y, X pairs.
  optional uint32 pad = 4 [default = 0]; // The padding size (equal in Y, X)
  optional uint32 pad_h = 9 [default = 0]; // The padding height
  optional uint32 pad_w = 10 [default = 0]; // The padding width
  optional uint32 kernel_size = 2; // The kernel size (square)
  optional uint32 kernel_h = 5; // The kernel height
  optional uint32 kernel_w = 6; // The kernel width
  optional uint32 stride = 3 [default = 1]; // The stride (equal in Y, X)
  optional uint32 stride_h = 7; // The stride height
  optional uint32 stride_w = 8; // The stride width
  enum Engine {
    DEFAULT = 0;
    CAFFE = 1;
    CUDNN = 2;
  }
  optional Engine engine = 11 [default = DEFAULT];
  // If global_pooling then it will pool over the size of the bottom by doing
  // kernel_h = bottom->height and kernel_w = bottom->width
  optional bool global_pooling = 12 [default = false];
  // How to calculate the output size - using ceil (default) or floor rounding.
  enum RoundMode {
    CEIL = 0;
    FLOOR = 1;
  }
  optional RoundMode round_mode = 13 [default = CEIL];
}
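
The round_mode field changes the pooled output size whenever the stride does not divide the input evenly. The Python sketch below shows the difference for one spatial dimension. The function name `pool_output_size` is hypothetical, and the final adjustment for padded inputs (making sure the last window starts inside the image or its padding) is an assumption about the implementation rather than something stated in the message above.

```python
import math


def pool_output_size(in_size, kernel, pad=0, stride=1, round_mode="CEIL"):
    """Pooled output size along one spatial dimension (illustrative sketch)."""
    rounding = math.ceil if round_mode == "CEIL" else math.floor
    out = int(rounding((in_size + 2 * pad - kernel) / stride)) + 1
    # Assumed guard: with padding, drop a window that would start past the input.
    if pad > 0 and (out - 1) * stride >= in_size + pad:
        out -= 1
    return out


ceil_size = pool_output_size(32, kernel=3, stride=2)                       # CEIL (default)
floor_size = pool_output_size(32, kernel=3, stride=2, round_mode="FLOOR")
```

For a 32-pixel side pooled with a 3 x 3 window and stride 2, CEIL gives 16 outputs and FLOOR gives 15, so the two modes can produce feature maps that differ by one pixel.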

Further reading: vision layers explained
（3）BatchNorm/Scale

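Layer type: BatchNorm
Header file: ./include/caffe/layers/batch_norm_layer.hpp
CPU implementation: ./src/caffe/layers/batch_norm_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/batch_norm_layer.cu
Function of the BatchNorm layer: normalizes the data of each mini-batch.
Layer type: Scale
Header file: ./include/caffe/layers/scale_layer.hpp
CPU implementation: ./src/caffe/layers/scale_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/scale_layer.cu
Function of the Scale layer: multiplies the input by a learned scale factor and, when bias_term is true, adds a learned bias (y = alpha * x + beta).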

optional BatchNormParameter batch_norm_param = 139;
optional ScaleParameter scale_param = 142;

message BatchNormParameter {
 
  optional bool use_global_stats = 1; // by default false during training and true during testing
  // What fraction of the moving average remains each iteration?
  // Smaller values make the moving average decay faster, giving more
  // weight to the recent values.
  // Each iteration updates the moving average @f$S_{t-1}@f$ with the
  // current mean @f$ Y_t @f$ by
  // @f$ S_t = (1-\beta)Y_t + \beta \cdot S_{t-1} @f$, where @f$ \beta @f$
  // is the moving_average_fraction parameter.
  optional float moving_average_fraction = 2 [default = .999];
  // Small value to add to the variance estimate so that we don't divide by
  // zero.
  optional float eps = 3 [default = 1e-5];
}


message ScaleParameter {
  // The first axis of bottom[0] (the first input Blob) along which to apply
  // bottom[1] (the second input Blob).  May be negative to index from the end
  // (e.g., -1 for the last axis).
  //
  // For example, if bottom[0] is 4D with shape 100x3x40x60, the output
  // top[0] will have the same shape, and bottom[1] may have any of the
  // following shapes (for the given value of axis):
  //    (axis == 0 == -4) 100; 100x3; 100x3x40; 100x3x40x60
  //    (axis == 1 == -3)          3;     3x40;     3x40x60
  //    (axis == 2 == -2)                   40;       40x60
  //    (axis == 3 == -1)                                60
  // Furthermore, bottom[1] may have the empty shape (regardless of the value of
  // "axis") -- a scalar multiplier.
  optional int32 axis = 1 [default = 1];

  // (num_axes is ignored unless just one bottom is given and the scale is
  // a learned parameter of the layer.  Otherwise, num_axes is determined by the
  // number of axes by the second bottom.)
  // The number of axes of the input (bottom[0]) covered by the scale
  // parameter, or -1 to cover all axes of bottom[0] starting from `axis`.
  // Set num_axes := 0, to multiply with a zero-axis Blob: a scalar.
  optional int32 num_axes = 2 [default = 1];

  // (filler is ignored unless just one bottom is given and the scale is
  // a learned parameter of the layer.)
  // The initialization for the learned scale parameter.
  // Default is the unit (1) initialization, resulting in the ScaleLayer
  // initially performing the identity operation.
  optional FillerParameter filler = 3;

  // Whether to also learn a bias (equivalent to a ScaleLayer+BiasLayer, but
  // may be more efficient).  Initialized with bias_filler (defaults to 0).
  optional bool bias_term = 4 [default = false];
  optional FillerParameter bias_filler = 5;
///////////////////////////////////////////
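// Simplified summary
optional int32 axis [default = 1]; // the axis processed by default
optional int32 num_axes [default = 1]; // can be ignored when used with BN; it mainly matters for a second bottom
optional FillerParameter filler; // how alpha is initialized
optional FillerParameter bias_filler; // how beta is initialized
optional bool bias_term = 4 [default = false]; // whether to learn a bias;
// if not, the layer reduces to y = alpha*x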
}

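Why BatchNorm is used together with Scale
In the batch normalization paper, the operation consists of two steps:
(1) Normalize the input: x_norm = (x - u) / std, where u and std are the accumulated mean and standard deviation (note that a moving-average coefficient is involved).
(2) y = alpha × x_norm + beta: scale and shift the normalized x, where alpha and beta are learned during training.
In Caffe, the BatchNorm layer implements step (1) and the Scale layer implements step (2).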
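
The two steps can be sketched in Python for a single channel. This is an illustration under simplifying assumptions: the statistics are computed from the current mini-batch only (the moving-average bookkeeping is left out), eps follows the default of 1e-5 from BatchNormParameter, and the function names `batch_norm` and `scale` are hypothetical.

```python
import math


def batch_norm(xs, eps=1e-5):
    """Step (1): normalize one channel with mini-batch mean and variance."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]


def scale(xs, alpha=1.0, beta=0.0):
    """Step (2): learned scale and shift, y = alpha * x_norm + beta."""
    return [alpha * x + beta for x in xs]


normed = batch_norm([1.0, 2.0, 3.0, 4.0])
output = scale(normed, alpha=2.0, beta=1.0)
```

After step (1) the values have mean 0 and variance close to 1; step (2) then moves them to mean beta and standard deviation close to alpha, which is why the two layers are normally stacked.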

Usage:
layer {
    bottom: "conv1"
    top: "conv1"
    name: "bn_conv1"
    type: "BatchNorm"
    batch_norm_param {
        use_global_stats: false
    }

}

layer {
    bottom: "conv1"
    top: "conv1"
    name: "scale_conv1"
    type: "Scale"
    param {
        name: "scale_conv1_0"
          lr_mult: 1
    }
    param {
        name: "scale_conv1_1"
          lr_mult: 1
    }
    scale_param{
        filler{
            value: 1
        }
        bias_term: true
        bias_filler{
            value: 0
        }
    }
}