Caffe finetune

How to use propagate_down, and how the parameters that control whether gradients are back-propagated differ

Suppose we have four convolutional layers A->B->C->D.

propagate_down

  • We want layer C's parameters to stay unchanged, and the parameters of the A and B layers before C to stay unchanged as well. In other words, the gradient at layer D should not be back-propagated to D's input blob (so C's output blob receives no gradient). This is done by setting propagate_down to false on layer D. The number of propagate_down entries must equal the number of input (bottom) blobs; if a layer has 2 input blobs, its layer definition should contain two lines (a two-bottom sketch follows the example below):
    propagate_down: false # the 1st input blob receives no back-propagated gradient
    propagate_down: false # the 2nd input blob receives no back-propagated gradient
    With this, the layer's gradient is not back-propagated, and the parameters of all preceding layers therefore stop changing.
  • The propagate_down setting blocks back-propagated gradients for all preceding layers, so none of their parameters are updated.

  • Example usage


layer {
    name: "conv2_3x3"
    type: "Convolution"
    bottom: "pool1_3x3"
    top: "conv2_3x3"
    param {
        lr_mult: 1
        decay_mult: 1
    }
    propagate_down: false   # do not back-propagate gradients to the bottom blob "pool1_3x3"
    convolution_param {
        num_output: 64
        bias_term: false
        pad: 1
        kernel_size: 3
        stride: 1
        weight_filler {
            type: "xavier"
        }
    }
}
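
For a layer with more than one bottom blob, one propagate_down entry is written per bottom, in order. Below is a minimal sketch, assuming an Eltwise fusion layer with made-up layer and blob names, where gradients are blocked for the first bottom but still flow to the second:

layer {
    name: "fuse"                # hypothetical layer fusing two branches
    type: "Eltwise"
    bottom: "branch_a"          # 1st bottom: will NOT receive gradients
    bottom: "branch_b"          # 2nd bottom: still receives gradients
    top: "fused"
    propagate_down: false       # applies to bottom[0] ("branch_a")
    propagate_down: true        # applies to bottom[1] ("branch_b")
    eltwise_param {
        operation: SUM
    }
}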

lr_mult

  • We want layer C's parameters to stay fixed, but the parameters of the A and B layers before C still need to be updated. In this case only C's parameters are frozen; the gradient C receives is still back-propagated to B. Simply set the learning rate of the corresponding parameter blobs to zero by adding "param { lr_mult: 0 }" inside the layer definition.
  • Setting lr_mult: 0 only makes that layer's learning rate zero, so its own parameters are not updated; gradients still propagate backwards through it (see also the sketch after the example below).
  • Example usage
layer {
    name: "ip1"
    type: "InnerProduct"
    bottom: "pool3"
    top: "ip1"
    param {            # config for the 1st parameter blob: the fully connected layer's weight matrix
        lr_mult: 0     # learning rate multiplier of 0, so the weights are frozen
        decay_mult: 0
    }
    param {            # config for the 2nd parameter blob: the fully connected layer's bias
        lr_mult: 0     # learning rate multiplier of 0, so the bias is frozen
        decay_mult: 0
    }
    inner_product_param {
        num_output: 2
        weight_filler {
            type: "gaussian"
            std: 0.01
        }
        bias_filler {
            type: "constant"
            value: 0
        }
    }
}
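
Returning to the A->B->C->D example: to freeze only C while still fine-tuning A and B, give C's parameter blobs lr_mult: 0 and leave the other layers as they are. A minimal sketch, assuming C is a convolution layer with the hypothetical name "conv_c" fed by a blob "conv_b":

# Layer C: its own parameters are frozen, but gradients still flow back to B
layer {
    name: "conv_c"
    type: "Convolution"
    bottom: "conv_b"                        # output blob of layer B
    top: "conv_c"
    param { lr_mult: 0  decay_mult: 0 }     # weights: never updated
    param { lr_mult: 0  decay_mult: 0 }     # bias: never updated
    convolution_param {
        num_output: 64
        kernel_size: 3
        pad: 1
    }
}

By contrast, putting propagate_down: false on D would also stop A and B from learning, because no gradient would reach them at all.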



Caffe convolution_param

 * @param param provides ConvolutionParameter convolution_param,
   *    with ConvolutionLayer options:
   *  - num_output. The number of filters.
   *  - kernel_size / kernel_h / kernel_w. The filter dimensions, given by
   *  kernel_size for square filters or kernel_h and kernel_w for rectangular
   *  filters.
   *  - stride / stride_h / stride_w (\b optional, default 1). The filter
   *  stride, given by stride for equal dimensions or stride_h and stride_w
   *  for different strides. By default the convolution is dense with stride 1.
   *  - pad / pad_h / pad_w (\b optional, default 0). The zero-padding for
   *  convolution, given by pad for equal dimensions or pad_h and pad_w for
   *  different padding. Input padding is computed implicitly instead of
   *  actually padding.
   *  - dilation (\b optional, default 1). The filter dilation, given by
   *  dilation for all spatial dimensions. By default the convolution has
   *  dilation 1.
   *  - group (\b optional, default 1). The number of filter groups. Group
   *  convolution is a method for reducing parameterization by selectively
   *  connecting input and output channels. The input and output channel dimensions must be divisible
   *  by the number of groups. For group @f$ \geq 1 @f$, the
   *  convolutional filters' input and output channels are separated s.t. each
   *  group takes 1 / group of the input channels and makes 1 / group of the
   *  output channels. Concretely 4 input channels, 8 output channels, and
   *  2 groups separate input channels 1-2 and output channels 1-4 into the
   *  first group and input channels 3-4 and output channels 5-8 into the second
   *  group.
   *  - bias_term (\b optional, default true). Whether to have a bias.
   *  - engine: convolution has CAFFE (matrix multiplication) and CUDNN (library
   *    kernels + stream parallelism) engines.

   */
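
To make these options concrete, here is a hedged prototxt sketch (layer and blob names are made up, and the bottom blob is assumed to have a channel count divisible by group, e.g. 4) combining rectangular kernels, per-dimension padding, stride, dilation, grouping, and an explicit engine:

layer {
    name: "conv_example"
    type: "Convolution"
    bottom: "conv1"          # assumed to have e.g. 4 channels (divisible by group)
    top: "conv_example"
    convolution_param {
        num_output: 8        # number of filters; must also be divisible by group
        kernel_h: 3          # rectangular 3x5 filters via kernel_h / kernel_w
        kernel_w: 5
        stride: 2            # same stride in both spatial dimensions
        pad_h: 1             # different zero-padding per dimension
        pad_w: 2
        dilation: 1          # default dilation (dense filter taps)
        group: 2             # split input/output channels into 2 groups
        bias_term: true      # learn an additive bias per filter
        engine: CAFFE        # matrix-multiplication implementation
        weight_filler {
            type: "xavier"
        }
    }
}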




Reposted from blog.csdn.net/u011808673/article/details/80523953