solver.prototxt parameter parsing

General structure of solver.prototxt

The solver is the core of Caffe: it coordinates the operation of the entire model. The solver configuration file is one of the required arguments for running the caffe program. The command is generally:

# caffe train --solver=*_solver.prototxt

In deep learning the loss function is often non-convex and has no analytical solution, so it must be solved with an optimization method. The main job of the solver is to alternately call the forward pass and the backward pass to update the parameters and thereby minimize the loss; it is, in effect, an iterative optimization algorithm.

The solver's process:

1. Design the optimization objective, together with the training network used for learning and the test network used for evaluation. (These are specified by referencing separate prototxt configuration files.)

2. Iteratively optimize and update the parameters by repeatedly running the forward and backward passes.

3. Periodically evaluate the test network. (You can set how many training iterations to run between tests.)

4. Display the model and solver status during optimization.

During each iteration, the solver performs the following steps:

1. Call the forward pass to compute the final output and the corresponding loss

2. Call the backward pass to compute the gradient of each layer

3. Update the parameters from the gradients, according to the selected solver method

4. Record and save the learning rate, snapshot, and solver state for the iteration.
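For intuition, here is a toy version of that loop in Python, using a one-dimensional quadratic loss as a stand-in for a real network. The names base_lr, max_iter, and display mirror solver.prototxt fields; nothing here is actual Caffe code:

base_lr, max_iter, display = 0.1, 100, 20
w = 5.0                                # a single "weight" to train

for it in range(max_iter):
    loss = 0.5 * w ** 2                # 1. forward: compute the loss
    grad = w                           # 2. backward: d(loss)/dw
    w -= base_lr * grad                # 3. update the weight (plain SGD)
    if it % display == 0:              # 4. record/display the state
        print(f"iter {it}: loss = {loss:.6f}")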




Let's look at an example first:

train_net: "models/VGGNet/VOC0712/SSD_300x300/train.prototxt"
test_net: "models/VGGNet/VOC0712/SSD_300x300/test.prototxt"
test_iter: 619
test_interval: 10000
base_lr: 0.001
display: 10
max_iter: 120000
lr_policy: "multistep"
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
snapshot: 80000
snapshot_prefix: "models/VGGNet/VOC0712/SSD_300x300/VGG_VOC0712_SSD_300x300"
solver_mode: GPU
device_id: 0
debug_info: false
snapshot_after_train: true
test_initialization: false
average_loss: 10
stepvalue: 80000
stepvalue: 100000
stepvalue: 120000
iter_size: 1
type: "SGD"
eval_type: "detection"
ap_version: "11point"


train_net 
Path to the training network configuration file.


test_net 
Path to the test network configuration file.


test_iter 
The number of test iterations. For example, if the batch_size of your test phase is 100 and you have 10000 test images, then you need 10000/100 = 100 test iterations, i.e. test_iter = 100. This parameter must be understood together with the batch_size of the test layer. The MNIST test set contains 10000 samples; executing all of them at once would be very inefficient, so the test data is split into batches, and the size of each batch is batch_size. If we set batch_size to 100, it takes 100 iterations to run through all 10000 samples, so test_iter is set to 100. One complete pass over the full dataset is called an epoch.
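The arithmetic is simple enough to sanity-check in a couple of lines; the numbers below are the MNIST figures from the paragraph above:

import math

num_test_samples = 10000   # total number of test images
test_batch_size = 100      # batch_size of the test layer

# Iterations needed for one full pass (one epoch) over the test set;
# ceiling division covers a batch size that does not divide evenly.
test_iter = math.ceil(num_test_samples / test_batch_size)
print(test_iter)           # -> 100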


test_interval 
How many training iterations the network runs between tests. You can, for example, set it so that a test is performed after each full epoch of training.


base_lr 
The base learning rate. During gradient-descent optimization the learning rate is adjusted as training progresses; the adjustment strategy is set with the lr_policy parameter.


lr_policy 
-> fixed: keep base_lr unchanged;

-> step: requires an additional stepsize parameter; returns base_lr * gamma ^ (floor(iter / stepsize)), where iter is the current iteration;

-> exp: returns base_lr * gamma ^ iter, where iter is the current iteration;

-> inv: requires an additional power parameter; returns base_lr * (1 + gamma * iter) ^ (-power);

-> multistep: requires one or more stepvalue parameters. Similar to step, but where step changes the rate at uniform intervals, multistep changes it at the iterations given by the stepvalue entries (see the sketch after this list);

-> poly: polynomial decay of the learning rate; returns base_lr * (1 - iter / max_iter) ^ power;

-> sigmoid: sigmoid decay of the learning rate; returns base_lr * (1 / (1 + exp(-gamma * (iter - stepsize)))).
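These formulas are easy to reproduce in a few lines of Python. The sketch below implements them exactly as written above; it is an illustration, not Caffe's own C++ implementation:

import math

def learning_rate(policy, it, base_lr, gamma=0.1, power=1.0,
                  stepsize=100000, stepvalues=(), max_iter=450000):
    """Sketch of the lr_policy formulas listed above (not Caffe's own code)."""
    if policy == 'fixed':
        return base_lr
    if policy == 'step':
        return base_lr * gamma ** (it // stepsize)
    if policy == 'exp':
        return base_lr * gamma ** it
    if policy == 'inv':
        return base_lr * (1 + gamma * it) ** (-power)
    if policy == 'multistep':
        passed = sum(1 for sv in stepvalues if it >= sv)  # stepvalues reached
        return base_lr * gamma ** passed
    if policy == 'poly':
        return base_lr * (1 - it / max_iter) ** power
    if policy == 'sigmoid':
        return base_lr / (1 + math.exp(-gamma * (it - stepsize)))
    raise ValueError('unknown lr_policy: ' + policy)

# Reproducing the multistep example that follows: with gamma = 0.9 the
# rate is multiplied by 0.9 each time a stepvalue is passed.
for it in (0, 5000, 7000, 9500):
    print(it, learning_rate('multistep', it, base_lr=0.01, gamma=0.9,
                            stepvalues=(5000, 7000, 8000, 9000, 9500)))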


Example multistep configuration:

base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
lr_policy: "multistep"
gamma: 0.9
stepvalue: 5000
stepvalue: 7000
stepvalue: 8000
stepvalue: 9000
stepvalue: 9500



weight_decay 
The weight-decay term, used to prevent overfitting.


momentum 
The weight given to the previous gradient update (see the sketch below).
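Momentum and weight decay both enter the parameter update. The sketch below shows the commonly cited form of Caffe's SGD update, V_{t+1} = momentum * V_t - lr * (grad + weight_decay * W) and W_{t+1} = W_t + V_{t+1}; treat it as an illustration rather than the exact C++ implementation:

import numpy as np

def sgd_update(W, grad, V, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One SGD step with momentum and L2 weight decay (illustrative)."""
    grad = grad + weight_decay * W   # add the L2 regularization term
    V[:] = momentum * V - lr * grad  # blend in the previous update (momentum)
    W[:] = W + V                     # apply the accumulated update
    return W, V

# Toy usage on a 3-element weight vector:
W, V = np.ones(3), np.zeros(3)
W, V = sgd_update(W, grad=np.array([0.5, -0.2, 0.1]), V=V)
print(W)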


max_iter 
The maximum number of training iterations.


snapshot 
The interval (in iterations) at which the model is saved.


snapshot_prefix 
The path prefix for saved models.

Snapshots save the trained model and the solver state. snapshot sets how many iterations to train before saving; the default is 0, meaning no snapshots are saved. snapshot_prefix sets the save path.

You can also set snapshot_diff, which controls whether gradient values are saved as well; the default is false (not saved).

You can also set snapshot_format, the format to save in. There are two choices: HDF5 and BINARYPROTO; the default is BINARYPROTO.



solver_mode 
Whether to run on the CPU or the GPU.


device_id 
Under the cmdcaffe command-line interface, GPU indices start from 0; if you have a single GPU, then device_id: 0.


average_loss 
Average the loss over this many forward passes when displaying output.


iter_size 
The ApplyUpdate function, which performs the gradient-descent step according to the learning rate and the chosen method (SGD, AdaDelta, etc.), is called only after batch_size * iter_size images have been processed.
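The effective batch size is therefore batch_size * iter_size. A toy sketch of the accumulation idea; the model and gradient here are placeholders, not pycaffe calls:

import numpy as np

iter_size, lr = 4, 0.001
W = np.zeros(2)

def forward_backward(W, batch):
    """Placeholder for one forward/backward pass; returns a gradient."""
    return batch.mean(axis=0) - W

accumulated = np.zeros_like(W)
for _ in range(iter_size):
    batch = np.random.randn(32, 2)       # one mini-batch of batch_size = 32
    accumulated += forward_backward(W, batch)

# The update (Caffe's ApplyUpdate) runs once per iter_size passes,
# on the gradient normalized by iter_size:
W -= lr * accumulated / iter_size
print(W)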


type 
The type of optimization algorithm. Caffe provides six optimization algorithms for solving for the optimal parameters: 
Stochastic Gradient Descent (type: "SGD");

AdaDelta (type: "AdaDelta");

Adaptive Gradient (type: "AdaGrad");

Adam (type: "Adam");

Nesterov's Accelerated Gradient (type: "Nesterov");

RMSprop (type: "RMSProp")
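Switching algorithms mostly means changing type, plus any solver-specific hyperparameters. A small sketch using the SolverParameter protobuf; it assumes a Caffe version recent enough that type is a string field and that the Adam-specific fields momentum2 and delta exist (both appear in mainline Caffe, but verify against your caffe.proto):

import caffe

s = caffe.proto.caffe_pb2.SolverParameter()
s.type = "Adam"          # instead of the default "SGD"
s.base_lr = 0.001
s.momentum = 0.9         # Adam's beta1
s.momentum2 = 0.999      # Adam's beta2
s.delta = 1e-8           # numerical-stability epsilon
print(str(s))            # serialized solver.prototxt text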

I am not yet very clear on the remaining parameters and am still studying them. 
Below is a Python script that generates a Caffe solver.prototxt:

# -*- coding: UTF-8 -*-
import os
import sys

os.environ['GLOG_minloglevel'] = '2'
CAFFE_ROOT = "/Users/Nero/code_space/deeplearning/caffe/"
sys.path.append(CAFFE_ROOT + "python")
import caffe                                                              # import the caffe package

def write_solver():
    my_project_root = CAFFE_ROOT + "caffetest/"
    solver_string = caffe.proto.caffe_pb2.SolverParameter()               # solver storage
    solver_file = my_project_root + 'solver.prototxt'                     # where to save the solver file
    solver_string.train_net = my_project_root + 'train.prototxt'          # location of train.prototxt
    solver_string.test_net.append(my_project_root + 'test.prototxt')      # location of test.prototxt
    solver_string.test_iter.append(100)                                   # number of test iterations
    solver_string.test_interval = 500                                     # test once every test_interval training iterations
    solver_string.base_lr = 0.001                                         # base learning rate
    solver_string.momentum = 0.9                                          # momentum
    solver_string.weight_decay = 0.004                                    # weight decay
    solver_string.lr_policy = 'fixed'                                     # learning-rate policy
    solver_string.display = 100                                           # display results every `display` iterations
    solver_string.max_iter = 4000                                         # maximum number of iterations
    solver_string.snapshot = 4000                                         # snapshot interval in iterations
    solver_string.snapshot_format = 0                                     # snapshot format: 0 = HDF5, 1 = BINARYPROTO
    solver_string.snapshot_prefix = 'examples/cifar10/cifar10_quick'      # model prefix
    solver_string.solver_mode = caffe.proto.caffe_pb2.SolverParameter.GPU # solver mode

    with open(solver_file, 'w') as f:
        f.write(str(solver_string))

if __name__ == '__main__':
    write_solver()
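Once the file has been written, it can be passed to the command-line tool (caffe train --solver=...) or loaded from pycaffe. A minimal sketch, assuming the train.prototxt/test.prototxt referenced above actually exist:

import caffe

caffe.set_mode_gpu()                           # match solver_mode: GPU
solver = caffe.get_solver('solver.prototxt')   # parse the generated file
solver.step(100)                               # run 100 training iterations
# solver.solve() would instead run all max_iter iterations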

An annotated example, with explanations:

net: "models/bvlc_reference_caffenet/train_val.prototxt"    # network model definition; the path starts from the Caffe root directory
test_iter: 1000       # understand together with the batch_size of the test layer: suppose there are 10000 test samples in total; running them all at once is inefficient, so the test data is split into batches of batch_size each. With batch_size = 100, it takes 100 iterations to run all 10000 samples, so test_iter would be set to 100.
test_interval: 1000   # test interval, i.e. test once every test_interval training iterations
base_lr: 0.01         # base learning rate; because the dataset is small, 0.01 decays too fast, so change it to 0.001
lr_policy: "step"     # learning-rate adjustment policy
gamma: 0.1            # the ratio by which the learning rate changes
# base_lr, lr_policy, gamma, and power should be understood together; they configure the learning rate. Any solver based on gradient descent has a learning rate, also called the step size. base_lr sets the base learning rate, which can be adjusted during the iterations; how it is adjusted, i.e. the adjustment strategy, is set by lr_policy.
stepsize: 100000      # decrease the learning rate every stepsize iterations
display: 20           # print to the screen once every display training iterations
max_iter: 450000      # maximum number of iterations; set too small, training will not converge and accuracy will be low; set too large, it causes oscillation and wastes time
momentum: 0.9         # the weight of the previous gradient update
weight_decay: 0.0005  # weight-decay term, a parameter that prevents overfitting
snapshot: 10000
snapshot_prefix: "models/bvlc_reference_caffenet/caffenet_train"   # snapshot / snapshot_prefix: save the trained model and solver state. snapshot sets how many iterations between saves (default 0, no saving); snapshot_prefix sets the save path. snapshot_format selects the save type: HDF5 or BINARYPROTO, the latter being the default.
solver_mode: CPU      # the run mode; the default is GPU
# GPU mode:
# solver_mode: GPU
# device_id: 0  # under the cmdcaffe interface, GPU indices start from 0; with one GPU, device_id: 0

