Bag of Freebies for Training Object Detection Neural Networks

Abstract: The accuracy of deep learning models is improved mainly in three ways: 1. better models (VGG, ResNet, DenseNet); 2. more data; 3. better tricks. This article looks at the third: training tricks that improve a model.

Some deep learning tricks only help one particular model and do not reproduce well elsewhere. What we want are tricks that can be copied to other models and that gain several points of accuracy without many parameters to tune. Those are the good tricks.

This article explores:

Mixup, and a new visually coherent image mixup method
Learning rate schedules
Label smoothing
Synchronized batch normalization

The paper runs the corresponding experiments on single-stage and multi-stage object detectors. With these strategies, accuracy improves by up to 5%.

3. Tricks

3.1 Visually Coherent Image Mixup

Mixup for image classification

The images are blended, and the labels are blended with the same ratio.

Mixup for object detection

In the original mixup, the blending ratio is drawn from a Beta(0.2, 0.2) distribution. The authors argue that this distribution is not ideal, and use an "elephant experiment" to illustrate: a cropped elephant is pasted into an arbitrary natural image, and the result is fed to an object detector. Existing detectors are not robust to this test and fail to detect the elephant reliably.
The authors found that increasing the parameters of the Beta distribution gives better mixup results.
Tests on YOLOv3 confirm that increasing the Beta parameters does improve the model's accuracy.
This mixup increases the overlap between objects. Because objects occluding each other is common in object detection, the network is encouraged to observe unusually crowded patches, whether they occur naturally or are created by adversarial techniques.

This helps the detector on overlapping objects (it is unclear whether this carries over to face detection, because the paper did not test it).

In the elephant experiment, a model trained without mixup has difficulty detecting the elephant, while a model trained with mixup detects it better. The mixup model has a lower average confidence, but this does not affect the detection results.
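
The detection mixup described above can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the function name `detection_mixup`, the box layout `[xmin, ymin, xmax, ymax, class_id]`, the choice to paste both images onto a shared canvas without resizing, and the default `alpha=1.5` are all assumptions made for the example.

```python
import numpy as np

def detection_mixup(img1, boxes1, img2, boxes2, alpha=1.5, rng=None):
    """Blend two images and keep the boxes of both (illustrative sketch).

    img1, img2: float arrays of shape (H, W, 3); the sizes may differ.
    boxes1, boxes2: arrays of shape (N, 5), assumed to be
        [xmin, ymin, xmax, ymax, class_id] in pixel coordinates.
    alpha: both Beta parameters; 1.5 is an assumed example value.
    """
    if rng is None:
        rng = np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))  # blending ratio in (0, 1)

    # Use a canvas large enough for both images. Nothing is resized,
    # so the original box coordinates remain valid.
    h = max(img1.shape[0], img2.shape[0])
    w = max(img1.shape[1], img2.shape[1])
    mixed = np.zeros((h, w, 3), dtype=np.float32)
    mixed[:img1.shape[0], :img1.shape[1]] += lam * img1
    mixed[:img2.shape[0], :img2.shape[1]] += (1.0 - lam) * img2

    # Unlike classification mixup, the labels are not averaged:
    # the objects of both images are all kept.
    boxes = np.vstack([boxes1, boxes2])
    return mixed, boxes, lam
```

Returning `lam` lets the caller decide whether to weight each object's loss by its image's blending ratio; whether to do so is left open here.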

3.2 Label Smoothing

A model trained with one-hot labels and softmax tends to overfit, so label smoothing is used.
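
As a small worked example, label smoothing replaces the one-hot target with a softened distribution. The sketch below assumes one common variant, in which the true class gets 1 - ε and the remaining ε is split evenly over the other K - 1 classes. The function name `smooth_labels` and the default `epsilon=0.1` are illustrative choices.

```python
def smooth_labels(class_id, num_classes, epsilon=0.1):
    """Return a smoothed target distribution as a list of floats.

    Assumed variant: the true class gets 1 - epsilon, and every other
    class gets epsilon / (num_classes - 1), so the values sum to 1.
    """
    off_value = epsilon / (num_classes - 1)
    return [1.0 - epsilon if k == class_id else off_value
            for k in range(num_classes)]

# Five classes, true class 2: the true class gets 0.9 and the others
# share the remaining 0.1 equally.
target = smooth_labels(2, 5, epsilon=0.1)
```

The model is then trained against `target` instead of the hard one-hot vector, which discourages it from becoming overconfident.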

3.3 Image Processing

In image classification, neural networks are usually very tolerant of geometric transformations of the image. To improve generalization and avoid overfitting, random spatial perturbations such as random flip, rotation, and crop are recommended. For object detection, however, image preprocessing needs extra care, because detection networks are more sensitive to such transformations.

The augmentations examined are:

Random geometric transformations, including random crop (with constraints), random expansion, random horizontal flip, and random resize (with random interpolation methods)
Random color jittering, including brightness, hue, saturation, and contrast

Random cropping is not used when training Faster R-CNN, because the ROIs it generates and samples already behave much like random crops.
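
The reason detection needs extra care is that every geometric transform of the image must also be applied to the boxes. A minimal sketch for the horizontal flip follows. The function name `random_horizontal_flip`, the box layout `[xmin, ymin, xmax, ymax]` in continuous pixel coordinates, and the default probability are assumptions made for this example.

```python
import numpy as np

def random_horizontal_flip(image, boxes, p=0.5, rng=None):
    """Flip the image left-right with probability p and update the boxes.

    image: array of shape (H, W, C).
    boxes: float array of shape (N, 4), assumed [xmin, ymin, xmax, ymax].
    """
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() >= p:
        return image, boxes  # leave the sample unchanged

    width = image.shape[1]
    flipped = image[:, ::-1].copy()

    # Mirroring swaps the roles of the left and right edges:
    # the new xmin comes from the old xmax, and vice versa.
    new_boxes = boxes.copy()
    new_boxes[:, 0] = width - boxes[:, 2]
    new_boxes[:, 2] = width - boxes[:, 0]
    return flipped, new_boxes
```

Crop, expansion, and resize follow the same pattern, with the additional step of discarding or clipping boxes that fall outside the new image.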

3.4 Learning Rate Schedule
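
One schedule commonly used in this setting is cosine decay with a linear warmup. The sketch below assumes that choice. The function name `learning_rate` and the values of `base_lr` and `warmup_steps` are hypothetical and only serve the illustration.

```python
import math

def learning_rate(step, total_steps, base_lr=0.001, warmup_steps=100):
    """Linear warmup followed by cosine decay (illustrative values)."""
    if step < warmup_steps:
        # Ramp up linearly so that the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup training that has been completed.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Cosine curve from base_lr (progress = 0) down to 0 (progress = 1).
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

Compared with a step schedule, the learning rate here changes smoothly instead of dropping abruptly at fixed epochs.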

3.5 Synchronized Batch Normalization

3.6 Random Training Shapes for Single-Stage Detectors

YOLOv3 is trained with a random input size H = W ∈ {320, 352, 384, 416, 448, 480, 512, 544, 576, 608}.
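
The candidate sizes are the multiples of 32 from 320 to 608, so picking a training shape can be sketched as below. The names `TRAIN_SIZES` and `pick_training_size` are illustrative, and how often a new size is drawn (per batch, or every few iterations) is left as an assumption for the caller to decide.

```python
import random

# H = W candidates: 320, 352, ..., 608 (multiples of 32).
TRAIN_SIZES = list(range(320, 609, 32))

def pick_training_size(rng=random):
    """Return one side length; the input is resized to (size, size)."""
    return rng.choice(TRAIN_SIZES)

# Example: draw a new square input size for the next batch.
size = pick_training_size()
input_shape = (size, size)
```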

4. Results

Please note that, to remove the side effects of test-time tricks, we always report single-scale, single-model results with a standard non-maximum suppression implementation. We do not use external training images or labels in the experiments.