Caffe on Win7 usage notes - solver parameters and loss functions

1. Caffe's solver is the component that optimizes the model: it iteratively updates the parameters to drive the loss function toward its minimum (ideally the global minimum), i.e., it carries out the search for the optimal solution.

2. The official Caffe website introduces six solver methods, and Denny's post on solver optimization methods has covered them. However, when the solver type was entered in solver.prototxt with a type field, Caffe reported the error: Message type "caffe.SolverParameter" has no field named "type".
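A minimal sketch of the line that triggers this error (the method name here is illustrative): newer Caffe releases select the solver with a string field named type, which this build does not know about.

  # In solver.prototxt - fails on this build:
  # type: "Nesterov"
  # -> Message type "caffe.SolverParameter" has no field named "type"

Looking at the proto definition below shows why: this build's SolverParameter still uses the older solver_type enum.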

3. Take a look at caffe/src/caffe/proto/caffe.proto under the root directory:

  // Solver type
  enum SolverType {
    SGD = 0;
    NESTEROV = 1;
    ADAGRAD = 2;
  }
  optional SolverType solver_type = 30 [default = SGD];
  // numerical stability for AdaGrad
  optional float delta = 31 [default = 1e-8];


Three solver types are enumerated: the default SGD, Nesterov's accelerated gradient (NESTEROV), and adaptive gradient (ADAGRAD); the corresponding parameter is solver_type.
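So on this build the solver is selected through the enum field instead. A minimal sketch (the values are illustrative; delta is the AdaGrad stability term declared in the proto above):

  # In solver.prototxt - accepted by this build:
  solver_type: ADAGRAD
  delta: 1e-8    # numerical stability for AdaGrad; 1e-8 is the proto default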

4. Other parameters of the solver

The main hyperparameters:
net: path to the network model definition
test_iter: number of test iterations; note the difference from batch_size: batch_size is the number of images fed through the network per iteration, and one epoch passes all training images through the network once
test_interval: test every this many training iterations
base_lr: base learning rate; the effective learning rate is lr_mult * base_lr, and each layer in the network definition has two lr_mult values: the first is the weight learning rate multiplier, the second is the bias learning rate multiplier
lr_policy (with gamma, power, stepsize): learning rate decay policy; it can be set to the following values, with the corresponding learning rate computed as shown (a worked example and a complete sample file follow this list):
- fixed: keep base_lr unchanged
- step: also requires a stepsize; returns base_lr * gamma ^ floor(iter / stepsize), where iter is the current iteration count
- exp: returns base_lr * gamma ^ iter, where iter is the current iteration count
- inv: also requires a power; returns base_lr * (1 + gamma * iter) ^ (-power)
- multistep: also requires stepvalue entries; very similar to step, except that step changes the rate at uniform intervals while multistep changes it at the specified stepvalue points
- poly: polynomial decay of the learning rate; returns base_lr * (1 - iter / max_iter) ^ power
- sigmoid: sigmoid decay of the learning rate; returns base_lr * (1 / (1 + exp(-gamma * (iter - stepsize))))
gamma: decay coefficient
momentum: momentum coefficient (the weight given to the previous update)
weight_decay: weight decay term, a regularization parameter that helps prevent overfitting
display: print training status to the screen every this many iterations
max_iter: maximum number of iterations; too small a value leads to no convergence and low accuracy, too large a value causes oscillation and wastes time
snapshot: snapshot interval (in iterations)
snapshot_prefix: path prefix for saved snapshot files
solver_mode: run mode, CPU or GPU
solver_type: solver type - SGD, NESTEROV, or ADAGRAD on this build
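To make the step policy concrete, a worked example (the numbers are illustrative): with base_lr = 0.01, gamma = 0.1, and stepsize = 10000, the learning rate at iteration 25000 is 0.01 * 0.1 ^ floor(25000 / 10000) = 0.01 * 0.01 = 0.0001.

Below is a minimal sketch of a complete solver.prototxt tying these parameters together; the net path and all numeric values are illustrative assumptions, not values from the original setup:

  net: "examples/mnist/lenet_train_test.prototxt"   # network model definition (hypothetical path)
  test_iter: 100            # with a test batch_size of 100, covers 10000 test images
  test_interval: 500        # run a test pass every 500 training iterations
  base_lr: 0.01             # base learning rate
  lr_policy: "step"         # decay: base_lr * gamma ^ floor(iter / stepsize)
  gamma: 0.1                # decay coefficient
  stepsize: 10000           # decay the rate every 10000 iterations
  momentum: 0.9             # momentum coefficient
  weight_decay: 0.0005      # regularization strength
  display: 100              # print training status every 100 iterations
  max_iter: 45000           # maximum number of training iterations
  snapshot: 5000            # save a snapshot every 5000 iterations
  snapshot_prefix: "examples/mnist/lenet"   # path prefix for saved snapshots (hypothetical)
  solver_mode: GPU          # CPU or GPU
  solver_type: SGD          # SGD, NESTEROV, or ADAGRAD on this build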
