caffe | Using Pretrained Models

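First, note that a pretrained model's network structure usually differs from our own network. So how are the pretrained model's parameters matched to our network?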

See the official tutorial: http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html

–If we provide the weights argument to the caffe train command, the pretrained weights will be loaded into our model, matching layers by name.

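In other words, the pretrained parameters are matched to your current network by layer name. Suppose the first convolutional layer of the original pretrained network is named conv1, while the first convolutional layer of your own network is named Convolution1: that layer's parameters in the pretrained network will then not be matched or loaded, which defeats the whole purpose of fine-tuning!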

A layer that is not matched is handled like this: "Since there is no layer named that in the bvlc_reference_caffenet, that layer will begin training with random weights." That is, it is randomly initialized.
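To make the name-matching rule concrete, here is a minimal Python sketch that only simulates it. A "net" here is a hypothetical dict from layer name to a flat list of weights, and init_random, build_net and copy_by_name are illustrative helpers, not part of pycaffe. The sketch does not check weight shapes.

```python
import random

# Hypothetical, simplified stand-in for a Caffe net: {layer_name: flat list of weights}.
# It only mimics the "match layers by name" rule; it is not the pycaffe API.

def init_random(n, seed):
    """Stand-in for a layer's weight filler: n small Gaussian weights."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.01) for _ in range(n)]

def build_net(layer_sizes):
    """Create a freshly initialized net from {layer_name: number_of_weights}."""
    return {name: init_random(size, seed=i)
            for i, (name, size) in enumerate(layer_sizes.items())}

def copy_by_name(net, pretrained):
    """Overwrite the weights of every layer in `net` whose name also appears in `pretrained`.

    Layers without a same-named pretrained layer keep their random initialization;
    pretrained layers that do not exist in `net` are simply ignored.
    Returns the names of the copied layers, in network order.
    """
    copied = []
    for name in net:
        if name in pretrained:
            net[name] = list(pretrained[name])
            copied.append(name)
    return copied

# The pretrained net calls its first conv layer "conv1"; ours calls it "Convolution1".
pretrained = {"conv1": [1.0, 2.0, 3.0], "conv2": [4.0, 5.0]}
mine = build_net({"Convolution1": 3, "conv2": 2})
print(copy_by_name(mine, pretrained))  # ['conv2']: Convolution1 stays randomly initialized
```

Only conv2 picks up pretrained values; renaming Convolution1 to conv1 in our own definition would make it match as well.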

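The fully connected layer fc8 in the original network has to be renamed; I renamed mine to "re-fc8". Here is why. Fine-tuning means first training on another dataset and then using the trained weights to initialize the network for our current dataset, instead of initializing randomly. Our data differs from the data used for pretraining, so some layer settings differ as well. In this example, the model was trained on ImageNet, which has 1000 classes, while our dataset has only 5 classes, so fc8 needs a different num_output. The pretrained weights for this layer therefore cannot be reused, and the layer must be given a name different from the one in the original network (if the name were kept, Caffe would try to copy the 1000-class weights into a 5-output layer and abort with a shape-mismatch error).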
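The sketch below extends the same idea with a shape check, under the simplified assumption that a same-named layer whose weight dimensions differ cannot be copied at all. copy_with_shape_check is a hypothetical helper, not a Caffe function; the shapes are those of CaffeNet's fc7 and fc8 InnerProduct weights (num_output × input dimension), and "random init" stands for whatever weight_filler the new layer declares.

```python
# Hypothetical sketch: match layers by name, but refuse to copy when shapes differ.

def copy_with_shape_check(target_shapes, source_shapes):
    """Decide, per target layer, whether pretrained params can be copied.

    target_shapes / source_shapes: {layer_name: weight_shape_tuple}
    Returns {layer_name: "copied" | "random init"}.
    Raises ValueError when a same-named layer has a different weight shape.
    """
    plan = {}
    for name, shape in target_shapes.items():
        if name not in source_shapes:
            plan[name] = "random init"          # no same-named layer: fresh weights
        elif source_shapes[name] != shape:
            raise ValueError(f"shape mismatch for layer '{name}': "
                             f"{source_shapes[name]} vs {shape}; rename the layer")
        else:
            plan[name] = "copied"               # same name, same shape: reuse weights
    return plan

# Pretrained ImageNet (1000-class) shapes of the last two fully connected layers.
imagenet = {"fc7": (4096, 4096), "fc8": (1000, 4096)}

# Keeping the old name "fc8" with num_output 5 fails.
try:
    copy_with_shape_check({"fc7": (4096, 4096), "fc8": (5, 4096)}, imagenet)
except ValueError as err:
    print(err)

# Renaming it to "re-fc8" lets fc7 be copied while the new layer starts from scratch.
print(copy_with_shape_check({"fc7": (4096, 4096), "re-fc8": (5, 4096)}, imagenet))
# {'fc7': 'copied', 're-fc8': 'random init'}
```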

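For this reason, when fine-tuning we usually take the pretrained model together with the network definition it was trained with, changing only the layers we need to, so that all parameters are loaded and used correctly.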

A common basic approach to fine-tuning: We will also decrease the overall learning rate base_lr in the solver prototxt, but boost the lr_mult on the newly introduced layer. The idea is to have the rest of the model change very slowly with new data, but let the new layer learn fast. Additionally, we set stepsize in the solver to a lower value than if we were training from scratch, since we're virtually far along in training and therefore want the learning rate to go down faster. Note that we could also entirely prevent fine-tuning of all layers other than fc8_flickr by setting their lr_mult to 0.
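The solver advice above can be made concrete with a small calculation. This Python sketch assumes Caffe's "step" learning-rate policy (global rate = base_lr × gamma^floor(iter / stepsize)) and that each layer's effective rate is the global rate times its lr_mult. It simplifies to one lr_mult per layer, whereas Caffe sets one per parameter blob (weights and bias). The layer names other than fc8_flickr, the lr_mult values and the solver numbers are illustrative assumptions, not recommended settings.

```python
# Sketch of how base_lr, gamma, stepsize and lr_mult combine during fine-tuning.

def step_lr(base_lr, gamma, stepsize, iteration):
    """Global rate under the "step" policy: base_lr * gamma ** floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)

def layer_lrs(global_lr, lr_mults):
    """Effective rate of each layer: the global rate times that layer's lr_mult."""
    return {name: global_lr * mult for name, mult in lr_mults.items()}

# conv1 frozen (lr_mult 0), fc7 learns slowly (1), the new fc8_flickr layer learns fast (10).
lr_mults = {"conv1": 0, "fc7": 1, "fc8_flickr": 10}

for it in (0, 20000, 40000):
    lr = step_lr(base_lr=0.001, gamma=0.1, stepsize=20000, iteration=it)
    print(it, layer_lrs(lr, lr_mults))
```

With a smaller stepsize the first drop by gamma simply happens earlier, which is the "learning rate goes down faster" effect the quote describes; an lr_mult of 0 keeps a layer's effective rate at 0 throughout, i.e. the layer is frozen.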

Download page for commonly used pretrained models (Model Zoo): https://github.com/BVLC/caffe/wiki/Model-Zoo

Some practical fine-tuning tips: http://blog.csdn.net/nongfu_spring/article/details/51514040

