How to use propagate_down, and how the various parameters that control backpropagation differ
Suppose we have four convolutional layers A->B->C->D.
propagate_down
- Suppose we want C's parameters to stay fixed, and the parameters of the earlier layers A and B to stay fixed as well. This means D's gradient must not be backpropagated to D's bottom blob (i.e. C's top blob receives no gradient). We can achieve this by setting `propagate_down` to false on layer D. The number of `propagate_down` entries must equal the number of bottom blobs; if a layer has two bottom blobs, its definition should contain two lines:

```protobuf
propagate_down: false  # the first bottom blob receives no backpropagated gradient
propagate_down: false  # the second bottom blob receives no backpropagated gradient
```

With this setting, no gradient flows backward out of the layer, so none of the preceding layers receive a gradient and none of their parameters are updated.
Usage example
```protobuf
layer {
  name: "conv2_3x3"
  type: "Convolution"   # the original text had "ConvolutionData", which is not a valid layer type
  bottom: "pool1_3x3"
  top: "conv2_3x3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  propagate_down: false  # no gradient is sent back to pool1_3x3
  convolution_param {
    num_output: 64
    bias_term: false
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
```
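Since the number of `propagate_down` entries must match the number of bottom blobs, a layer with two inputs needs two entries. A hypothetical sketch (layer and blob names are made up) using an Eltwise layer that merges two branches:

```protobuf
layer {
  name: "fuse"             # hypothetical layer merging two branches
  type: "Eltwise"
  bottom: "branch_a"
  bottom: "branch_b"
  top: "fused"
  propagate_down: false    # no gradient flows back into branch_a
  propagate_down: true     # branch_b still receives gradients
  eltwise_param { operation: SUM }
}
```

This freezes everything upstream of `branch_a` while the `branch_b` path keeps training.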
lr_mult
- Suppose we want C's parameters to stay fixed, but the parameters of the earlier layers A and B to keep updating. In this case only C's parameters are frozen; the gradient C receives is still backpropagated to B. Simply set the learning rate of the corresponding parameter blob to zero by adding `param { lr_mult: 0 }` to the layer.
- Setting `lr_mult: 0` only makes that layer's learning rate zero, so its own parameters are not updated; the gradient is still backpropagated through the layer.
- Usage example
```protobuf
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param {           # first parameter blob: the fully connected layer's weight matrix
    lr_mult: 0      # learning rate 0
    decay_mult: 0
  }
  param {           # second parameter blob: the fully connected layer's bias
    lr_mult: 0      # learning rate 0
    decay_mult: 0
  }
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
```
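To make the contrast concrete, here is a minimal pure-Python sketch (not Caffe code; a scalar two-layer chain with made-up names) of manual backprop. It shows that `lr_mult: 0` freezes a layer's own weight while still passing the gradient back, whereas `propagate_down: false` blocks the gradient from ever reaching the earlier layer:

```python
def train_step(wB, wC, x, t, lr, propagate_down=True, lr_mult_C=1.0):
    """One SGD step on the chain x -> B -> C -> 0.5*(y - t)^2."""
    # forward pass
    h = wB * x                     # layer B output
    y = wC * h                     # layer C output
    # backward pass
    dy = y - t                     # dL/dy
    grad_wC = dy * h               # gradient w.r.t. C's own weight
    grad_h = dy * wC if propagate_down else 0.0  # gradient sent back to B
    grad_wB = grad_h * x           # gradient w.r.t. B's weight
    # parameter updates; lr_mult_C scales C's effective learning rate
    wC -= lr * lr_mult_C * grad_wC
    wB -= lr * grad_wB
    return wB, wC

# lr_mult: 0 on C -> wC stays 1.0, but wB still updates to 0.6
wB, wC = train_step(1.0, 1.0, x=2.0, t=0.0, lr=0.1, lr_mult_C=0.0)

# propagate_down: false on C -> wB stays 1.0, but wC still updates to 0.6
wB2, wC2 = train_step(1.0, 1.0, x=2.0, t=0.0, lr=0.1, propagate_down=False)
```

The two mechanisms are complementary: one zeroes the update of a layer's own parameters, the other cuts the gradient path to everything before it.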
caffe convolution_param
* @param param provides ConvolutionParameter convolution_param,
* with ConvolutionLayer options:
* - num_output. The number of filters.
* - kernel_size / kernel_h / kernel_w. The filter dimensions, given by
* kernel_size for square filters or kernel_h and kernel_w for rectangular
* filters.
 * - stride / stride_h / stride_w (\b optional, default 1). The filter
 * stride, given by stride for equal dimensions or stride_h and stride_w
 * for different strides. By default the convolution is dense with stride 1.
* - pad / pad_h / pad_w (\b optional, default 0). The zero-padding for
* convolution, given by pad for equal dimensions or pad_h and pad_w for
* different padding. Input padding is computed implicitly instead of
* actually padding.
 * - dilation (\b optional, default 1). The filter dilation, given by
 * dilation for all spatial dimensions. By default the convolution has
 * dilation 1.
* - group (\b optional, default 1). The number of filter groups. Group
* convolution is a method for reducing parameterization by selectively
* connecting input and output channels. The input and output channel dimensions must be divisible
* by the number of groups. For group @f$ \geq 1 @f$, the
* convolutional filters' input and output channels are separated s.t. each
* group takes 1 / group of the input channels and makes 1 / group of the
* output channels. Concretely 4 input channels, 8 output channels, and
* 2 groups separate input channels 1-2 and output channels 1-4 into the
* first group and input channels 3-4 and output channels 5-8 into the second
* group.
* - bias_term (\b optional, default true). Whether to have a bias.
* - engine: convolution has CAFFE (matrix multiplication) and CUDNN (library
* kernels + stream parallelism) engines.
*/
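Putting several of these options together, a hypothetical dilated group convolution could be written as follows (layer and blob names are illustrative, not from the original):

```protobuf
layer {
  name: "conv_dilated"
  type: "Convolution"
  bottom: "data"
  top: "conv_dilated"
  convolution_param {
    num_output: 8      # 8 filters
    kernel_size: 3     # square 3x3 filters
    stride: 1
    dilation: 2        # effective kernel extent: 1 + (3-1)*2 = 5
    pad: 2             # pad = dilation*(kernel_size-1)/2 keeps the spatial size
    group: 2           # input and output channels split into 2 groups
    bias_term: true
    engine: CAFFE      # matrix-multiplication implementation
  }
}
```

Note that `num_output` (8) and the input channel count must both be divisible by `group` (2), as the comment above requires.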