Caffe Learning Notes: the Solver

With the comments and deprecated fields stripped, the solver is defined in Caffe's protobuf as:

message SolverParameter {
  optional string net = 24;  // path to the net definition prototxt
  optional NetParameter net_param = 25;
  optional string train_net = 1; 
  repeated string test_net = 2;
  optional NetParameter train_net_param = 21; 
  repeated NetParameter test_net_param = 22; 
  optional NetState train_state = 26;
  repeated NetState test_state = 27;
  repeated int32 test_iter = 3;  // number of forward passes per test phase; each pass takes batch_size images, so num = test_iter * batch_size images are tested in total, which should normally cover the test set
  optional int32 test_interval = 4 [default = 0];  // run a test phase every test_interval training iterations
  optional bool test_compute_loss = 19 [default = false];
  optional bool test_initialization = 32 [default = true];  // run a test pass once before the first training iteration
  optional float base_lr = 5;  // initial (base) learning rate
  optional int32 display = 6;  // number of iterations between log outputs
  optional int32 average_loss = 33 [default = 1];
  optional int32 max_iter = 7;  // maximum number of training iterations
  optional int32 iter_size = 36 [default = 1];  // accumulate gradients over iter_size x batch_size instances per update
  optional string lr_policy = 8;  // learning rate decay policy
  optional float gamma = 9; 
  optional float power = 10; 
  optional float momentum = 11;  // momentum, typically 0.9
  optional float weight_decay = 12;  // weight decay, typically 0.0005
  optional string regularization_type = 29 [default = "L2"];  // regularization type, one of {"L1", "L2"}
  optional int32 stepsize = 13;  // parameter for the "step" lr policy
  repeated int32 stepvalue = 34;  // parameters for the "multistep" lr policy
  optional float clip_gradients = 35 [default = -1];

  optional int32 snapshot = 14 [default = 0];  // iterations between snapshots; 0 disables intermediate snapshots
  optional string snapshot_prefix = 15;  // filename prefix for snapshots
  optional bool snapshot_diff = 16 [default = false];  // whether to snapshot the diffs (gradients); helps debugging but enlarges snapshot files
  enum SnapshotFormat {
    HDF5 = 0;
    BINARYPROTO = 1;
  }
  optional SnapshotFormat snapshot_format = 37 [default = BINARYPROTO];  // snapshot file format
  enum SolverMode {
    CPU = 0;
    GPU = 1;
  }
  optional SolverMode solver_mode = 17 [default = GPU];
  optional int32 device_id = 18 [default = 0];
  optional int64 random_seed = 20 [default = -1];
  optional string type = 40 [default = "SGD"];  // solver type, one of {"SGD", "Nesterov", "AdaGrad", "RMSProp", "AdaDelta", "Adam"}
  optional float delta = 31 [default = 1e-8];
  optional float momentum2 = 39 [default = 0.999];
  optional float rms_decay = 38 [default = 0.99];
  optional bool debug_info = 23 [default = false];  // if true, print information about the state of the net that may help debugging
  optional bool snapshot_after_train = 28 [default = true];  // if false, do not snapshot after training finishes
  optional bool layer_wise_reduce = 41 [default = true];  // overlap compute and communication for data-parallel training
}
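
These fields are usually consumed by the caffe command-line tool, but a solver can also be driven from Python. A minimal sketch, assuming pycaffe is installed and that a file named solver.prototxt (a made-up path) exists:

import caffe

caffe.set_mode_gpu()                          # matches solver_mode: GPU
solver = caffe.get_solver("solver.prototxt")  # hypothetical path
solver.step(100)    # run 100 training iterations
# solver.solve()    # or train all the way to max_iter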

Other related messages include the following. NetState:

message NetState {
  optional Phase phase = 1 [default = TEST];//{"TRAIN","TEST"}
  optional int32 level = 2 [default = 0];
  repeated string stage = 3;
}
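
The solver's train_state/test_state activate NetStateRule-based include/exclude rules in the net: a layer whose include rule names a stage is only instantiated when that stage is active. A hedged sketch, where "augment" is a made-up stage name, added to the solver as:

train_state { phase: TRAIN stage: "augment" }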

NetParameter

message NetParameter {
  optional string name = 1;  // name of the net
  optional bool force_backward = 5 [default = false];  // whether a layer is backpropagated through is normally decided automatically from the net structure and learning configuration; if true, force backward computation for every layer
  optional NetState state = 6;
  optional bool debug_info = 7 [default = false];  // print debugging information as the net runs Forward, Backward, and Update
  // The layers that make up the net.  Each of their configurations, including connectivity and behavior, is specified as a LayerParameter.
  repeated LayerParameter layer = 100;  // ID 100 so layers are printed last.
}
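
A minimal sketch of a net prototxt exercising these fields (the net name, layer names, and shapes are made up):

name: "TinyNet"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape { dim: 1 dim: 3 dim: 32 dim: 32 } }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param { num_output: 8 kernel_size: 3 }
}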

Phase

enum Phase {
   TRAIN = 0;
   TEST = 1;
}
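
Phase appears most often in layer include/exclude rules, e.g. an Accuracy layer that only runs during testing (the blob names fc8 and label are made up):

layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }
}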

The learning rate decay policies, from the comment in caffe.proto, are:

  //The learning rate decay policy. The currently implemented learning rate policies are as follows:
  //    - fixed: always return base_lr.
  //    - step: return base_lr * gamma ^ (floor(iter / step))
  //    - exp: return base_lr * gamma ^ iter
  //    - inv: return base_lr * (1 + gamma * iter) ^ (- power)
  //    - multistep: similar to step but it allows non uniform steps defined by
  //      stepvalue
  //    - poly: the effective learning rate follows a polynomial decay, to be
  //      zero by the max_iter. return base_lr * (1 - iter/max_iter) ^ power
  //    - sigmoid: the effective learning rate follows a sigmoid decay
  //      return base_lr * (1 / (1 + exp(-gamma * (iter - stepsize))))
  //
  // where base_lr, max_iter, gamma, step, stepvalue and power are defined
  // in the solver parameter protocol buffer, and iter is the current iteration.
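
As a sketch, these policies can be reproduced in a few lines of Python (this mirrors the comment above rather than the Caffe source; the keyword defaults are placeholders):

import math

def effective_lr(policy, it, base_lr, gamma=0.1, power=1.0,
                 stepsize=1, stepvalues=(), max_iter=1):
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (it // stepsize)
    if policy == "exp":
        return base_lr * gamma ** it
    if policy == "inv":
        return base_lr * (1 + gamma * it) ** (-power)
    if policy == "multistep":
        # like "step", but decays at the explicit (ascending) stepvalue thresholds
        return base_lr * gamma ** sum(1 for v in stepvalues if it >= v)
    if policy == "poly":
        return base_lr * (1 - it / max_iter) ** power
    if policy == "sigmoid":
        return base_lr / (1 + math.exp(-gamma * (it - stepsize)))
    raise ValueError("unknown lr_policy: %s" % policy)

# With the AlexNet solver below (base_lr=0.01, gamma=0.1, stepsize=100000):
# effective_lr("step", 250000, 0.01, stepsize=100000) -> 0.0001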

For an introduction to Caffe's six optimizers, see the overview of optimization methods (优化方法概述).
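
Selecting an optimizer is just a matter of setting type and its hyper-parameters in the solver file; for example, switching to Adam (the values shown are the protobuf defaults above, not tuned settings):

type: "Adam"
momentum: 0.9      # beta1
momentum2: 0.999   # beta2
delta: 1e-8        # epsilon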

For an example solver file, see AlexNet in Caffe:

net: "models/bvlc_alexnet/train_val.prototxt"
test_iter: 1000
test_interval: 1000
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 450000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "models/bvlc_alexnet/caffe_alexnet_train"
solver_mode: GPU
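
With these settings, the "step" policy multiplies the learning rate by gamma = 0.1 every 100000 iterations, so it decays 0.01 → 0.001 → 0.0001 → 1e-05 → 1e-06 over the 450000 iterations. Assuming the file is saved as models/bvlc_alexnet/solver.prototxt, training is launched with the caffe binary: caffe train --solver=models/bvlc_alexnet/solver.prototxt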

Reposted from blog.csdn.net/Scythe666/article/details/80583501