How to use propagate_down, and how the various parameters that control backpropagation differ
Suppose we have four convolutional layers A->B->C->D.
propagate_down
- Suppose we want C's parameters to stay fixed, and the parameters of the earlier layers A and B to stay fixed as well. This means D's gradient must not be backpropagated to D's bottom blob (i.e. C's top blob receives no gradient). We can achieve this by setting `propagate_down` to false on layer D. The number of `propagate_down` entries must equal the number of bottom blobs; if a layer has two bottom blobs, its definition should contain two lines:

```protobuf
propagate_down: false  # the first bottom blob receives no backpropagated gradient
propagate_down: false  # the second bottom blob receives no backpropagated gradient
```

With this setting, no gradient flows backward out of the layer, so none of the preceding layers receive a gradient and none of their parameters are updated.
Usage example
```protobuf
layer {
  name: "conv2_3x3"
  type: "Convolution"   # the original text had "ConvolutionData", which is not a valid layer type
  bottom: "pool1_3x3"
  top: "conv2_3x3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  propagate_down: false  # no gradient is sent back to pool1_3x3
  convolution_param {
    num_output: 64
    bias_term: false
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
```
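Since the number of `propagate_down` entries must match the number of bottom blobs, a layer with two inputs needs two entries. A hypothetical sketch (layer and blob names are made up) using an Eltwise layer that merges two branches:

```protobuf
layer {
  name: "fuse"             # hypothetical layer merging two branches
  type: "Eltwise"
  bottom: "branch_a"
  bottom: "branch_b"
  top: "fused"
  propagate_down: false    # no gradient flows back into branch_a
  propagate_down: true     # branch_b still receives gradients
  eltwise_param { operation: SUM }
}
```

This freezes everything upstream of `branch_a` while the `branch_b` path keeps training.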
lr_mult
- Suppose we want C's parameters to stay fixed, but the parameters of the earlier layers A and B to keep updating. In this case only C's parameters are frozen; the gradient C receives is still backpropagated to B. Simply set the learning rate of the corresponding parameter blob to zero by adding `param { lr_mult: 0 }` to the layer.
- Setting `lr_mult: 0` only makes that layer's learning rate zero, so its own parameters are not updated; the gradient is still backpropagated through the layer.
- Usage example
```protobuf
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param {           # first parameter blob: the fully connected layer's weight matrix
    lr_mult: 0      # learning rate 0
    decay_mult: 0
  }
  param {           # second parameter blob: the fully connected layer's bias
    lr_mult: 0      # learning rate 0
    decay_mult: 0
  }
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
```
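To make the contrast concrete, here is a minimal pure-Python sketch (not Caffe code; a scalar two-layer chain with made-up names) of manual backprop. It shows that `lr_mult: 0` freezes a layer's own weight while still passing the gradient back, whereas `propagate_down: false` blocks the gradient from ever reaching the earlier layer:

```python
def train_step(wB, wC, x, t, lr, propagate_down=True, lr_mult_C=1.0):
    """One SGD step on the chain x -> B -> C -> 0.5*(y - t)^2."""
    # forward pass
    h = wB * x                     # layer B output
    y = wC * h                     # layer C output
    # backward pass
    dy = y - t                     # dL/dy
    grad_wC = dy * h               # gradient w.r.t. C's own weight
    grad_h = dy * wC if propagate_down else 0.0  # gradient sent back to B
    grad_wB = grad_h * x           # gradient w.r.t. B's weight
    # parameter updates; lr_mult_C scales C's effective learning rate
    wC -= lr * lr_mult_C * grad_wC
    wB -= lr * grad_wB
    return wB, wC

# lr_mult: 0 on C -> wC stays 1.0, but wB still updates to 0.6
wB, wC = train_step(1.0, 1.0, x=2.0, t=0.0, lr=0.1, lr_mult_C=0.0)

# propagate_down: false on C -> wB stays 1.0, but wC still updates to 0.6
wB2, wC2 = train_step(1.0, 1.0, x=2.0, t=0.0, lr=0.1, propagate_down=False)
```

The two mechanisms are complementary: one zeroes the update of a layer's own parameters, the other cuts the gradient path to everything before it.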
caffe convolution_param
* @param param provides ConvolutionParameter convolution_param,
* with ConvolutionLayer options:
* - num_output. The number of filters.
* - kernel_size / kernel_h / kernel_w. The filter dimensions, given by
* kernel_size for square filters or kernel_h and kernel_w for rectangular
* filters.
 * - stride / stride_h / stride_w (\b optional, default 1). The filter
 * stride, given by stride for equal dimensions or stride_h and stride_w
 * for different strides. By default the convolution is dense with stride 1.
* - pad / pad_h / pad_w (\b optional, default 0). The zero-padding for
* convolution, given by pad for equal dimensions or pad_h and pad_w for
* different padding. Input padding is computed implicitly instead of
* actually padding.
 * - dilation (\b optional, default 1). The filter dilation, given by
 * dilation for all spatial dimensions. By default the convolution has
 * dilation 1.
* - group (\b optional, default 1). The number of filter groups. Group
* convolution is a method for reducing parameterization by selectively
* connecting input and output channels. The input and output channel dimensions must be divisible
* by the number of groups. For group @f$ \geq 1 @f$, the
* convolutional filters' input and output channels are separated s.t. each
* group takes 1 / group of the input channels and makes 1 / group of the
* output channels. Concretely 4 input channels, 8 output channels, and
* 2 groups separate input channels 1-2 and output channels 1-4 into the
* first group and input channels 3-4 and output channels 5-8 into the second
* group.
* - bias_term (\b optional, default true). Whether to have a bias.
* - engine: convolution has CAFFE (matrix multiplication) and CUDNN (library
* kernels + stream parallelism) engines.
*/
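Putting several of these options together, a hypothetical dilated group convolution could be written as follows (layer and blob names are illustrative, not from the original):

```protobuf
layer {
  name: "conv_dilated"
  type: "Convolution"
  bottom: "data"
  top: "conv_dilated"
  convolution_param {
    num_output: 8      # 8 filters
    kernel_size: 3     # square 3x3 filters
    stride: 1
    dilation: 2        # effective kernel extent: 1 + (3-1)*2 = 5
    pad: 2             # pad = dilation*(kernel_size-1)/2 keeps the spatial size
    group: 2           # input and output channels split into 2 groups
    bias_term: true
    engine: CAFFE      # matrix-multiplication implementation
  }
}
```

Note that `num_output` (8) and the input channel count must both be divisible by `group` (2), as the comment above requires.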