YOLOv3 parameter tuning (cfg file annotations)

[net] ★ a line of the form [xxx] marks the start of a layer, and the parameters that follow it configure that layer; [net] is a special section that configures the whole network rather than a particular layer

# Testing ★ a # at the beginning of a line marks a comment; such lines are ignored when the cfg file is parsed

batch=1

subdivisions=1

# Training

batch = 64 ★ the batch here differs a little from the batch in general machine learning; it only says after how many accumulated samples the network performs one backward pass (one parameter update)
subdivisions = 16 ★ this parameter says that one batch of images is split into this many sub-batches, which are forward-propagated through the network one after another
★★ Key point: in Darknet, batch and subdivisions are used together. For example, with batch = 64 and subdivisions = 16, the training
process loads 64 images into memory at once and completes the forward propagation in 16 passes of 4 images each; the loss over the
64 images is accumulated and averaged, and only after all of them have been propagated is a single parameter update performed
★★★ Tuning experience: subdivisions is usually set to 16, neither too large nor too small, and preferably a multiple of 8, though nothing is mandatory;
batch can be adjusted dynamically based on GPU memory usage, increasing or decreasing it together with subdivisions, and a larger batch is generally better. Do
note that at test time both batch and subdivisions should be set to 1, to avoid mysterious errors!
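
As a small illustration of the relationship (my own Python sketch, not part of the cfg; the variable names are mine, not Darknet's):

    batch = 64          # images whose gradients are accumulated before one weight update
    subdivisions = 16   # how many chunks the batch is split into

    mini_batch = batch // subdivisions   # images pushed through the network per forward pass
    print(mini_batch)                    # 4: sixteen passes of 4 images = one update over 64 images

    # at test time:
    # batch = 1; subdivisions = 1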

width = 608 ★ width of the network input
height = 608 ★ height of the network input
channels = 3 ★ number of channels of the network input
★★★ width and height must be multiples of 32, otherwise the network cannot be loaded
★ Tip: width does not have to equal height; normally, the larger the width and height, the better the recognition of small objects,
but this is limited by GPU memory, so readers can try different combinations themselves
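
One way to see why the multiple-of-32 rule matters is to compute the grid sizes of YOLOv3's three detection scales, whose strides are 32, 16 and 8 (a small illustrative sketch of mine, not from the cfg):

    width, height = 608, 608
    assert width % 32 == 0 and height % 32 == 0, "width/height must be multiples of 32"

    # YOLOv3 predicts on three scales with strides 32, 16 and 8
    for stride in (32, 16, 8):
        print(width // stride, "x", height // stride)   # 19 x 19, 38 x 38, 76 x 76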

momentum = 0.9 ★ momentum parameter of the gradient-descent optimisation method used in deep learning; this value affects how fast the gradient descends towards the optimum
decay = 0.0005 ★ weight-decay regularisation term, used to prevent over-fitting

angle = 0 ★ data-augmentation parameter; generates more training samples by rotating images within this angle
saturation = 1.5 ★ data-augmentation parameter; generates more training samples by adjusting saturation
exposure = 1.5 ★ data-augmentation parameter; generates more training samples by adjusting exposure
hue = .1 ★ data-augmentation parameter; generates more training samples by adjusting hue
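
To my understanding, saturation and exposure act as scale ranges and hue as a shift range when each training image is jittered in HSV space. A rough sketch of how the random factors could be drawn (an assumption-laden illustration, not Darknet's actual code):

    import random

    saturation, exposure, hue = 1.5, 1.5, 0.1

    def rand_scale(s):
        # pick a factor in [1/s, s]; a symmetric scheme like this is assumed here
        scale = random.uniform(1.0, s)
        return scale if random.random() < 0.5 else 1.0 / scale

    sat_factor = rand_scale(saturation)        # multiply the S channel by this
    exp_factor = rand_scale(exposure)          # multiply the V channel by this
    hue_shift  = random.uniform(-hue, hue)     # shift the H channel by this fraction
    print(sat_factor, exp_factor, hue_shift)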

learning_rate = 0.001 ★ the learning rate determines how fast the weights are updated; set too large, it makes the result overshoot the optimum, set too small, it makes convergence too slow.
If you rely on manual intervention to tune parameters, you need to keep modifying the learning rate: it can be set a little higher at the start of training,
then reduced after a certain number of iterations; during training the learning rate is usually scheduled dynamically according to the iteration count.
At the start of training: a learning rate of 0.01 to 0.001 is appropriate. After a certain number of iterations: slow it down.
Near the end of training: the learning rate should have decayed by a factor of 100 or more.
For learning-rate adjustment, see https://blog.csdn.net/qq_33485434/article/details/80452941
★★★ Learning-rate adjustment must not follow rigid rules; during actual training, adjust it dynamically according to the loss and other indicators. You can press Ctrl+C to
stop training, modify the learning rate, then load the most recently saved model and continue training; this is how manual tuning is done. The adjustment is based on the training
log: if the loss fluctuates a lot, the learning rate is too large, so reduce it appropriately, to 1/5 or 1/10 of its value; if the loss stays almost constant,
the network may have converged or fallen into a local minimum, in which case increase the learning rate appropriately. Note that after every learning-rate adjustment you must train long
enough to observe the effect fully; tuning is slow, deliberate work
★★ Small note: the effective learning rate is related to the number of GPUs. For example, if you set the learning rate to 0.001 and you have 4 GPUs, the
effective learning rate is 0.001 / 4
burn_in = 1000 ★ while the iteration count is below burn_in, the learning rate is updated with a warm-up rule; once it exceeds burn_in, only the policy below is used
max_batches = 500200 ★ training stops after max_batches iterations; one iteration processes one batch of images

policy = steps ★ learning-rate adjustment policy: constant, steps, exp, poly, step, sig, random, etc.
Reference https://nanfei.ink/2018/01/23/YOLOv2%E8%B0%83%E5%8F%82%E6%80%BB%E7%BB%93/#more
steps = 400000,450000
scales = .1,.1 ★ steps and scales together set how the learning rate changes: for example, when the iteration count reaches 400,000 the learning rate decays by a factor of 10,
and at 450,000 iterations the learning rate decays by another factor of 10 on top of the previous value
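
Putting burn_in, policy = steps, steps and scales together, the schedule can be sketched roughly as follows (a simplified Python illustration of the rule described above; the warm-up is assumed to be a power-law ramp with exponent 4, and the multi-GPU division follows the note above):

    def current_lr(iteration, base_lr=0.001, burn_in=1000,
                   steps=(400000, 450000), scales=(0.1, 0.1), num_gpus=1):
        lr = base_lr / num_gpus                      # per the note above about multiple GPUs
        if iteration < burn_in:                      # warm-up phase
            return lr * (iteration / burn_in) ** 4   # assumed power-law ramp, exponent 4
        for step, scale in zip(steps, scales):       # policy = steps
            if iteration >= step:
                lr *= scale                          # decay by 10x at 400k, again at 450k
        return lr

    for it in (500, 1000, 100000, 400000, 450000):
        print(it, current_lr(it))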

[convolutional] ★ configuration of one convolutional layer
batch_normalize = 1 ★ whether to apply batch normalisation (BN itself is not explained here); 1 means yes, 0 means no
filters = 32 ★ number of convolution kernels, which is also the number of output channels
size = 3 ★ convolution kernel size
stride = 1 ★ convolution stride
pad = 1 ★ whether to zero-pad during convolution; the amount of padding is related to the kernel size, size / 2 rounded down, e.g. 3 / 2 = 1
activation = leaky ★ activation function of this layer
★★ with a 3*3 kernel, padding and a stride of 1, the size of the feature map does not change
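
This can be checked with the usual convolution output-size formula (my own illustrative sketch, where pad is the number of padded pixels, i.e. size / 2 rounded down as noted above):

    def conv_out(in_size, size=3, stride=1, pad=1):
        # standard convolution output size: floor((in + 2*pad - size) / stride) + 1
        return (in_size + 2 * pad - size) // stride + 1

    print(conv_out(608, size=3, stride=1, pad=1))   # 608: 3x3 kernel, stride 1 keeps the size
    print(conv_out(608, size=3, stride=2, pad=1))   # 304: 3x3 kernel, stride 2 halves the size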

# Downsample

[convolutional] ★ configuration of a downsampling layer
batch_normalize = 1
filters = 64
size = 3
stride = 2
pad = 1
activation = leaky ★★ with a 3*3 kernel, padding and a stride of 2, the feature map becomes half of its original size

[shortcut] ★ configuration of a shortcut layer
from = -3 ★ fuses with the output of an earlier layer; -3 means the layer three layers back
activation = linear ★ activation function of this layer
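
In plain terms, the shortcut layer adds the output of the layer referenced by from to the output of the previous layer, as in a residual connection; a tiny illustrative sketch (mine, with dummy arrays):

    import numpy as np

    # dummy per-layer outputs; shapes must match for the element-wise sum
    outputs = [np.zeros((64, 76, 76)) for _ in range(5)]

    # a [shortcut] layer with from = -3 adds the output of the layer three
    # positions back to the output of the immediately preceding layer
    fused = outputs[-1] + outputs[-3]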
...
...
[convolutional] ★ configuration of the convolutional layer immediately before a YOLO layer
size = 1
stride = 1
pad = 1
filters = 255 ★ filters = num (number of predicted boxes) * (classes + 5); the 5 means 4 coordinates plus one confidence score, i.e. tx, ty, tw, th and
c from the paper; classes is the number of categories, 80 for COCO; num is the number of boxes each cell of the YOLO layer predicts, 3 for YOLOv3
★★★ when training on your own data, this value must be changed to match it; e.g. if you detect 4 categories, then
filters = 3 * (4 + 5) = 27; there are three filters values to modify (one before each YOLO layer), remember them all (see the small calculation after this block)
activation = linear
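
A tiny check of the filters formula for a custom data set (illustrative Python, not part of the cfg):

    def yolo_filters(classes, num_per_scale=3):
        # filters of the conv layer right before each [yolo] layer:
        # boxes per cell * (4 box coordinates + 1 objectness score + class scores)
        return num_per_scale * (classes + 5)

    print(yolo_filters(80))   # 255, the COCO default
    print(yolo_filters(4))    # 27, for a 4-class custom data set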

[yolo] ★ configuration of a YOLO layer
mask = 0,1,2 ★ indices of the anchors used by this layer; 0,1,2 means the first three anchors defined below
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes = 80 ★ number of categories
num = 9 ★ total number of boxes predicted per grid cell, equal to the number of anchors; if you want to use more anchors, increase num
jitter = .3 ★ a data-augmentation method; here it is the range of random jitter applied to the aspect ratio. This parameter is not well documented; it is described in detail in my source-code comments
ignore_thresh = .7
truth_thresh = 1 ★ thresholds on the IOU involved in the loss calculation: when the IOU between a predicted box and the ground truth is greater than ignore_thresh, the box
takes part in the loss calculation; otherwise the box does not take part in the loss calculation.
★ Understanding: the aim is to control how many predicted boxes take part in the loss calculation. If ignore_thresh is too large, close to 1, the number of
boxes taking part in the box-regression loss becomes relatively small, which easily leads to over-fitting; if ignore_thresh is set too small, the
number taking part in the calculation becomes large, which easily causes under-fitting during box regression.
★ Parameter setting: a value between 0.5 and 0.7 is usually chosen; in previous calculations the small scale (13*13) used 0.7
and (26*26) used 0.5, and here 0.5 was first changed to 0.7. Reference: https://www.e-learn.cn/content/qita/804953
random = 1 ★ 1 turns on random multi-scale training, 0 turns it off
★★ Tip: when random multi-scale training is enabled, the input width and height set earlier actually stop mattering; the width
takes random values between 320 and 608, with width = height, changing randomly once every 10 iterations. It is generally recommended to enable it. You can modify the
range used for random-scale training as needed, which makes it possible to increase the batch; readers are encouraged to try! (A small sketch of the idea follows.)
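
As a rough illustration of that behaviour (a simplified sketch of mine; the bounds and the every-10-iterations cadence follow the description above):

    import random

    def pick_input_size(min_size=320, max_size=608):
        # pick a random square input size that is a multiple of 32
        choices = list(range(min_size, max_size + 32, 32))
        side = random.choice(choices)
        return side, side   # width = height

    # a new size is drawn roughly every 10 iterations during training
    for it in range(0, 50, 10):
        print(it, pick_input_size())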

Author: Pie Pie Zi Feng
Link: https://www.jianshu.com/p/3aa0830ff5f8
Source: Jianshu
Copyright belongs to the author. For reproduction in any form, please contact the author for authorization and indicate the source.
