Darknet - When should I stop training?

https://github.com/AlexeyAB/darknet

Usually 2000 iterations per class (object) are sufficient, but not fewer than 4000 iterations in total. For a more precise way to decide when to stop training, use the following guide:
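The rule of thumb above can be written as a one-line calculation. This is only a sketch of the arithmetic; the function name `suggested_iterations` is made up for illustration and is not part of darknet.

```python
def suggested_iterations(num_classes: int) -> int:
    """Rule of thumb from the text: 2000 iterations per class,
    but never fewer than 4000 iterations in total."""
    return max(2000 * num_classes, 4000)


if __name__ == "__main__":
    for classes in (1, 2, 3, 10):
        print(classes, "classes ->", suggested_iterations(classes), "iterations")
```

For 3 classes this gives 6000, which matches the max_batches=6000 example used later for 3 classes.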

1. Training

During training, you will see varying indicators of error, and you should stop when the 0.XXXXXXX avg value no longer decreases:

Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8
Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8

9002: 0.211667, 0.60730 avg, 0.001000 rate, 3.868000 seconds, 576128 images
Loaded: 0.000000 seconds

9002 - iteration number (number of batch)
0.60730 avg - average loss (error) - the lower, the better

When you see that the average loss 0.xxxxxx avg no longer decreases over many iterations, you should stop training. The final average loss can range from 0.05 (for a small model and an easy dataset) to 3.0 (for a big model and a difficult dataset).
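One way to make "no longer decreases over many iterations" concrete is to collect the avg value from each status line and compare the recent best against the earlier best. The sketch below assumes the status line format shown in the sample output above; the helper names, the window size, and the improvement threshold are illustrative choices, not darknet settings.

```python
import re

# Matches a status line such as:
# "9002: 0.211667, 0.60730 avg, 0.001000 rate, 3.868000 seconds, 576128 images"
STATUS_LINE = re.compile(r"^\s*(\d+):\s*([\d.]+),\s*([\d.]+)\s+avg")


def parse_avg_loss(line):
    """Return (iteration, average_loss) for a status line, or None for any other line."""
    match = STATUS_LINE.match(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(3))


def has_plateaued(avg_losses, window=1000, min_improvement=0.01):
    """True if the best loss in the last `window` entries improved on the
    best loss seen before them by less than `min_improvement`."""
    if len(avg_losses) <= window:
        return False  # not enough history to judge
    best_before = min(avg_losses[:-window])
    best_recent = min(avg_losses[-window:])
    return best_before - best_recent < min_improvement


if __name__ == "__main__":
    line = "9002: 0.211667, 0.60730 avg, 0.001000 rate, 3.868000 seconds, 576128 images"
    print(parse_avg_loss(line))
```

Here `avg_losses` is expected to hold one value per iteration, in order. A plateau is only a hint to stop; the checkpoints should still be compared by mAP as described in the next section.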

2. mAP (mean average precision)

Once training is stopped, you should take some of the last .weights files from darknet\build\darknet\x64\backup and choose the best of them:

For example, you stopped training after 9000 iterations, but the best result may come from one of the earlier weights (7000, 8000, 9000). This can happen due to overfitting. Overfitting is the case when you can detect objects in images from the training dataset but cannot detect objects in any other images. You should get weights from the Early Stopping Point:

To get weights from Early Stopping Point:

First, in your obj.data file you must specify the path to the validation dataset, valid = valid.txt (valid.txt has the same format as train.txt). If you don't have validation images, just copy data\train.txt to data\valid.txt.

If training is stopped after 9000 iterations, use these commands to validate some of the previous weights:
(If you use another GitHub repository, use darknet.exe detector recall... instead of darknet.exe detector map...)

darknet.exe detector map data/obj.data yolo-obj.cfg backup\yolo-obj_7000.weights
darknet.exe detector map data/obj.data yolo-obj.cfg backup\yolo-obj_8000.weights
darknet.exe detector map data/obj.data yolo-obj.cfg backup\yolo-obj_9000.weights

./darknet detector map ./cfg/yolov3-tiny.data ./cfg/yolov3-tiny.cfg /media/famu/DISK_DATA/yongqiang/yolov3-tiny_best.weights -i 1

Then compare the last output lines for each set of weights (7000, 8000, 9000):

Choose the weights file with the highest mAP (mean average precision) or IoU (intersection over union).

For example, if yolo-obj_8000.weights gives the highest mAP, use those weights for detection.
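Once the mAP of each checkpoint has been read off the output, picking the winner is a simple maximum. In the sketch below the mAP numbers are invented placeholders, not real results; replace them with the values printed for your own weights files.

```python
def best_weights(map_by_file):
    """Return the weights file name with the highest mAP."""
    return max(map_by_file, key=map_by_file.get)


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured results.
    results = {
        "yolo-obj_7000.weights": 0.71,
        "yolo-obj_8000.weights": 0.78,
        "yolo-obj_9000.weights": 0.75,
    }
    print("Use for detection:", best_weights(results))
```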

Or just train with -map flag:

darknet.exe detector train data/obj.data yolo-obj.cfg darknet53.conv.74 -map

./darknet detector train ./train_cfg/yolov3-tiny.data ./train_cfg/yolov3-tiny.cfg -gpus 0,1,2,3 -map

You will then see the mAP chart (red line) in the Loss-chart window. mAP is calculated every 4 epochs using the valid=valid.txt file specified in obj.data (1 epoch = images_in_train_txt / batch iterations).
(To change the maximum x-axis value, set the max_batches= parameter to 2000*classes, e.g. max_batches=6000 for 3 classes.)
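To get a feel for how often the mAP point is refreshed, the epoch formula above can be evaluated directly. This is a sketch of the arithmetic as stated in the text only; the dataset size and batch value in the example are hypothetical, and the function names are not darknet identifiers.

```python
def iterations_per_epoch(images_in_train_txt, batch):
    """1 epoch = images_in_train_txt / batch iterations (as stated in the text)."""
    return images_in_train_txt / batch


def map_interval_iterations(images_in_train_txt, batch, epochs=4):
    """Number of iterations between mAP calculations, assuming one every `epochs` epochs."""
    return epochs * iterations_per_epoch(images_in_train_txt, batch)


if __name__ == "__main__":
    # Hypothetical dataset: 6400 training images, batch=64.
    print(iterations_per_epoch(6400, 64))     # iterations in one epoch
    print(map_interval_iterations(6400, 64))  # iterations between mAP points
```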

Example of custom object detection: darknet.exe detector test data/obj.data yolo-obj.cfg yolo-obj_8000.weights

IoU (intersection over union) - the average intersection over union of objects and detections for a certain threshold = 0.24.
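For a single object and a single detection, intersection over union can be computed as in the sketch below. It assumes axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates, which is a convention chosen for this example rather than darknet's internal box format.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

The reported IoU figure is then the average of such values over the matched objects and detections.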
mAP (mean average precision) - the mean of the average precisions of all classes, where average precision is the average value of 11 points on the PR-curve for each possible threshold (each probability of detection) for the same class (Precision-Recall in terms of PascalVOC, where Precision=TP/(TP+FP) and Recall=TP/(TP+FN)).
The PASCAL Visual Object Classes (VOC) Challenge
http://homepages.inf.ed.ac.uk/ckiw/postscript/ijcv_voc09.pdf
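The definitions above can be sketched in a few lines. The example uses the 11-point interpolation commonly associated with PascalVOC (the best precision at any recall at or above each of the levels 0.0, 0.1, ..., 1.0, averaged over the 11 levels). It illustrates the idea and is not darknet's own code; all function names are chosen for this example.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision_11pt(recalls, precisions):
    """11-point interpolated AP for one class.

    `recalls` and `precisions` are parallel lists of points on the PR-curve,
    one point per detection threshold.
    """
    total = 0.0
    for i in range(11):
        level = i / 10  # recall levels 0.0, 0.1, ..., 1.0
        reachable = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(reachable) if reachable else 0.0
    return total / 11


def mean_average_precision(ap_per_class):
    """mAP = mean of the per-class average precisions."""
    return sum(ap_per_class.values()) / len(ap_per_class)


if __name__ == "__main__":
    print(precision_recall(tp=8, fp=2, fn=2))
    print(average_precision_11pt([0.5, 1.0], [1.0, 0.5]))
```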

mAP is the default precision metric in the PascalVOC competition; it is the same as the AP50 metric in the MS COCO competition. In Wiki terms, the Precision and Recall indicators have a slightly different meaning than in the PascalVOC competition, but IoU always has the same meaning.

ForeverStrong
