This article summarizes what I have learned about using Caffe for deep learning, with links to other people's study notes.
Introduction to caffe
Caffe was created by Yangqing Jia at UC Berkeley. It is a C++/CUDA framework that offers command-line, Python, and MATLAB interfaces and can run on either CPU or GPU.
File structure of a Caffe project
- caffe (the source tree to be compiled; it contains the core C++ code for data input, network layers, computation, etc.)
- data (the data to process or transform)
- models:
  - train.prototxt (the network model)
  - solver.prototxt (the training parameters)
  - xxx.caffemodel (initial weights used for fine-tuning; not needed when training a new model)
- scripts (code for training the network, which can be python files, shell files, etc.)
caffe installation reference: https://www.jianshu.com/p/9e0a18608527
caffe network structure
Caffe stores, exchanges, and processes network data in Blobs, which are N-dimensional arrays (much as NumPy stores data in an ndarray). It defines the structure of the neural network with Layers, which include data layers, vision layers, and other types.
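To make the Blob layout concrete, here is a minimal sketch in plain Python. It assumes the usual 4-D image blob of shape N x C x H x W stored in row-major order; the helper names `blob_offset` and `blob_count` are made up for illustration and are not part of Caffe's API.

```python
def blob_offset(shape, n, c, h, w):
    """Flat index of element (n, c, h, w) in a row-major N x C x H x W blob."""
    N, C, H, W = shape
    if not (0 <= n < N and 0 <= c < C and 0 <= h < H and 0 <= w < W):
        raise IndexError("index out of range for blob shape")
    return ((n * C + c) * H + h) * W + w


def blob_count(shape):
    """Total number of values stored in the blob."""
    total = 1
    for dim in shape:
        total *= dim
    return total


# A batch of 64 three-channel images of size 224 x 224.
shape = (64, 3, 224, 224)
print(blob_count(shape))               # number of values in the whole batch
print(blob_offset(shape, 1, 0, 0, 0))  # where the second image starts
```

The second image starts exactly C * H * W values after the first, which is why a whole batch can be handed around as one contiguous array.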
The common layers are explained below, using code as the example.
train.prototxt (excerpts from the VGG16 model)
- data layer

    name: "VGG16"
    layer {
      name: "data"
      type: "Data"  # Input data type
      top: "data"
      top: "label"
      include {
        phase: TRAIN
      }
      # Data preprocessing and augmentation
      transform_param {
        mirror: true
        crop_size: 224
        mean_value: 103.939
        mean_value: 116.779
        mean_value: 123.68
      }
      data_param {
        source: "data/ilsvrc12_shrt_256/ilsvrc12_train_leveldb"  # Database file path
        batch_size: 64  # Number of samples fed to the network at once
        backend: LEVELDB  # Select whether to use LevelDB or LMDB
      }
    }
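As a rough illustration of what the transform_param settings do to one image, the sketch below subtracts a per-channel mean and optionally mirrors the image. It is a simplification under stated assumptions: the function name `transform` is hypothetical, the image is a plain C x H x W nested list, mirroring is passed in explicitly (Caffe decides it randomly during training), and cropping is left out.

```python
def transform(image, mean_values, mirror=False):
    """Subtract a per-channel mean and optionally flip each row horizontally.

    image is a list of channels, each channel a list of rows (C x H x W).
    """
    out = []
    for channel, mean in zip(image, mean_values):
        rows = []
        for row in channel:
            shifted = [pixel - mean for pixel in row]
            rows.append(shifted[::-1] if mirror else shifted)
        out.append(rows)
    return out


# A tiny 3-channel image with one row of two pixels per channel.
tiny = [[[110.0, 120.0]], [[115.0, 125.0]], [[130.0, 140.0]]]
means = [103.939, 116.779, 123.68]  # the mean_value entries from the data layer
print(transform(tiny, means, mirror=True))
```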
Caffe supports the following input data types:
type | data source |
---|---|
Data | LMDB/LevelDB |
MemoryData | in-memory data |
HDF5Data | HDF5 data |
ImageData | image files |
WindowData | image windows |
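top names a layer's output and bottom names its input (the source of its data). A layer can have several tops and bottoms.
Note: a data layer has at least one top named data. If there is a second top, it is usually named label.
LMDB reference: https://zhuanlan.zhihu.com/p/23485774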
- convolution layer

    layer {
      bottom: "data"
      top: "conv1_1"
      name: "conv1_1"
      type: "Convolution"
      # Multipliers for the weights
      param {
        lr_mult: 1
        decay_mult: 1
      }
      # Multipliers for the biases
      param {
        lr_mult: 2
        decay_mult: 0
      }
      convolution_param {
        num_output: 64
        pad: 1
        kernel_size: 3
        weight_filler {
          type: "gaussian"
          std: 0.01
        }
        bias_filler {
          type: "constant"
          value: 0
        }
      }
    }
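parameter | description |
---|---|
num_output | number of convolution kernels |
kernel_size | kernel height/width (height and width can also be set separately) |
weight_filler | weight initialization scheme |
bias_term | whether to add a bias term to the convolution output |
pad | number of zero-padding pixels added around the image |
stride | sliding step size |
group | number of groups for grouped convolution |
lr_mult | learning rate multiplier (the effective learning rate is this value times base_lr in solver.prototxt) |
decay_mult | weight decay multiplier |
dropout_ratio | probability of dropping a value (a parameter of the Dropout layer) |

dropout_ratio and decay_mult are set to prevent overfitting.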
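The way lr_mult and decay_mult combine with the solver settings can be sketched as follows. The base_lr and weight_decay values are the ones from the solver file in this article; the function name `effective_rates` is only an illustrative helper, not something Caffe provides.

```python
def effective_rates(base_lr, weight_decay, lr_mult, decay_mult):
    """Return the (learning rate, weight decay) applied to one parameter blob."""
    return base_lr * lr_mult, weight_decay * decay_mult


base_lr = 1e-5       # from solver.prototxt
weight_decay = 0.02  # from solver.prototxt

# First param block of the convolution layer: the weights.
weights = effective_rates(base_lr, weight_decay, lr_mult=1, decay_mult=1)
# Second param block: the biases, learning twice as fast and without decay.
biases = effective_rates(base_lr, weight_decay, lr_mult=2, decay_mult=0)

print("weights:", weights)
print("biases:", biases)
```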
- pooling layer

    # conv2_1 takes the output of the pooling layer pool1 as its bottom
    layer {
      bottom: "pool1"
      top: "conv2_1"
      name: "conv2_1"
      type: "Convolution"
      param {
        lr_mult: 1
        decay_mult: 1
      }
      param {
        lr_mult: 2
        decay_mult: 0
      }
      convolution_param {
        num_output: 128
        pad: 1
        kernel_size: 3
        weight_filler {
          type: "gaussian"
          std: 0.01
        }
        bias_filler {
          type: "constant"
          value: 0
        }
      }
    }
parameter | description |
---|---|
pool | pooling method: MAX (max pooling), AVE (average pooling), STOCHASTIC (stochastic pooling) |
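To show the difference between the MAX and AVE methods, here is a small pure-Python sketch. It is not Caffe's implementation: the function `pool2d` is hypothetical, it handles a single 2-D channel without padding, it only uses windows that fit entirely inside the input, and STOCHASTIC pooling is omitted.

```python
def pool2d(image, kernel_size, stride, method="MAX"):
    """Pool a 2-D list of numbers with a square window (no padding)."""
    height, width = len(image), len(image[0])
    out = []
    for top in range(0, height - kernel_size + 1, stride):
        row = []
        for left in range(0, width - kernel_size + 1, stride):
            # Collect every value covered by the current window.
            window = [
                image[y][x]
                for y in range(top, top + kernel_size)
                for x in range(left, left + kernel_size)
            ]
            if method == "MAX":
                row.append(max(window))
            elif method == "AVE":
                row.append(sum(window) / len(window))
            else:
                raise ValueError("unsupported pooling method: " + method)
        out.append(row)
    return out


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(pool2d(image, kernel_size=2, stride=2, method="MAX"))  # [[6, 8], [14, 16]]
print(pool2d(image, kernel_size=2, stride=2, method="AVE"))  # [[3.5, 5.5], [11.5, 13.5]]
```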
- activation layer

    layer {
      bottom: "conv2_2"
      top: "conv2_2"
      name: "relu2_2"
      type: "ReLU"  # A common activation function; Sigmoid is another option
    }
- loss layer

    layer {
      bottom: "fc8"
      bottom: "label"
      top: "loss"
      name: "loss"
      type: "SoftmaxWithLoss"
    }
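type | description |
---|---|
SoftmaxWithLoss | softmax followed by the cross-entropy loss |
Softmax | softmax probabilities for multi-class classification (it computes no loss itself) |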
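The relationship between the two layer types can be sketched for a single sample as below. This is a simplified illustration with hypothetical function names: it covers one sample only, whereas Caffe's layer works on a whole batch.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_with_loss(scores, label):
    """Cross-entropy loss of the softmax probabilities for the true label."""
    return -math.log(softmax(scores)[label])


scores = [2.0, 1.0, 0.1]  # e.g. the fc8 output for one image with three classes
print(softmax(scores))
print(softmax_with_loss(scores, label=0))
```

The loss shrinks toward zero as the probability assigned to the correct label approaches 1.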
Detailed information on each layer of a Caffe network model: https://wenku.baidu.com/view/f77c73d02f60ddccdb38a025.html
Caffe model training
- network visualization
After writing your own prototxt file, you can check whether the network is built correctly with Netscope (an online Caffe net visualization tool): http://ethereon.github.io/netscope/#/editor
- training parameter settings
The training parameters of a Caffe model are kept in solver.prototxt. The solver is the core of Caffe: it alternates between the forward pass and backpropagation to update the parameters and minimize the loss.

    net: "train_val.prototxt"
    test_iter: 833
    # make test net, but don't invoke it from the solver itself
    test_interval: 1000
    display: 200
    average_loss: 100
    base_lr: 1e-5
    lr_policy: "step"
    gamma: 0.1
    stepsize: 5000
    # lr for unnormalized softmax -- see train_val definition
    # high momentum
    momentum: 0.9
    # no gradient accumulation
    clip_gradients: 10000
    iter_size: 1
    max_iter: 80000
    weight_decay: 0.02
    snapshot: 4000
    snapshot_prefix: "weight/VGG_item"
    test_initialization: false

parameter | description |
---|---|
train_net | network model used for training (an absolute path is best) |
test_net | network model used for testing |
test_iter | number of test iterations (test_iter * batch size = number of test samples) |
base_lr | base learning rate |
lr_policy | how the learning rate changes during training |
weight_decay | weight decay |
momentum | weight given to the previous update |
max_iter | maximum number of iterations |
snapshot | interval, in iterations, between saved models |
snapshot_prefix | path and filename prefix for saved models |
solver_mode | whether to run on CPU or GPU |
average_loss | number of forward passes over which the displayed loss is averaged |
type | optimization algorithm |
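With lr_policy set to "step", the learning rate is multiplied by gamma every stepsize iterations. A minimal sketch using the values from the solver file in this article (the function name `step_lr` is just for illustration):

```python
def step_lr(iteration, base_lr=1e-5, gamma=0.1, stepsize=5000):
    """Learning rate at a given iteration under the "step" policy."""
    return base_lr * gamma ** (iteration // stepsize)


# The rate drops by a factor of ten every 5000 iterations.
for iteration in (0, 4999, 5000, 10000, 79999):
    print(iteration, step_lr(iteration))
```

With max_iter at 80000 and stepsize at 5000, the rate is cut fifteen times before training ends, so the last iterations change the weights very little.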
- train the network
1. Command line
The network can be trained from the command line:

    ./build/tools/caffe train -solver solver.prototxt

(For details, see "caffe command line analysis": https://www.cnblogs.com/denny402/p/5076285.html )
2. Python
The network can also be trained from a program written with Caffe's Python interface.
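A minimal sketch of training through the Python interface is shown below. It assumes pycaffe is built and importable, that the solver uses the default SGD type, and that GPU 0 is the device to use; the wrapper function `train` and its arguments are my own naming, not part of Caffe.

```python
def train(solver_path="solver.prototxt", weights_path=None, use_gpu=True):
    """Train the network described by a solver file and return the solver."""
    # Imported inside the function so that this file can still be loaded
    # on a machine where pycaffe is not installed.
    import caffe

    if use_gpu:
        caffe.set_device(0)  # assumption: train on the first GPU
        caffe.set_mode_gpu()
    else:
        caffe.set_mode_cpu()

    solver = caffe.SGDSolver(solver_path)
    if weights_path is not None:
        # Fine-tuning: start from the weights in an existing .caffemodel file.
        solver.net.copy_from(weights_path)

    solver.solve()  # runs until max_iter from the solver file is reached
    return solver
```

Calling `train("solver.prototxt")` is roughly what the command-line tool does; passing a `.caffemodel` path as `weights_path` corresponds to fine-tuning from existing weights.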
other problems
- I/O size
When training or fine-tuning the network on your own data, the sizes of img and label_img may differ. In that case, carefully check the input and output sizes of each layer printed during training, and change the corresponding parameters so that the final output has the same size as label_img.
Formula for the output size of a convolution or pooling layer:
(W - F + 2P) / S + 1
parameter | description |
---|---|
W | input image size |
F | kernel size (kernel_size) |
S | stride (stride) |
P | padding (pad) |
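The formula can be checked with a few lines of Python. The sketch assumes Caffe's usual rounding when the division is not exact (down for convolution, up for pooling) and leaves out the edge-case adjustment Caffe makes for padded pooling; both function names are illustrative only.

```python
import math


def conv_output_size(w, f, p=0, s=1):
    """Convolution output size: floor((W - F + 2P) / S) + 1."""
    return (w - f + 2 * p) // s + 1


def pool_output_size(w, f, p=0, s=1):
    """Pooling output size with rounding up: ceil((W - F + 2P) / S) + 1."""
    return math.ceil((w - f + 2 * p) / s) + 1


# VGG16-style 3x3 convolution with pad 1 and stride 1 keeps the size.
print(conv_output_size(224, f=3, p=1, s=1))  # 224
# 2x2 pooling with stride 2 halves it.
print(pool_output_size(224, f=2, p=0, s=2))  # 112
```

Walking through the layers with these two functions is a quick way to find where the output size starts to diverge from the size of label_img.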