Training Tricks for Object Detection and Image Classification: A Personal Record

Foreword

These are some personal notes on image algorithms. If any part is unclear, searching for the keywords will turn up detailed explanations online.
A few recent articles worth reading: tricks for object detection competitions, a summary of experience from object detection competitions, and ideas for improving small-object detection.

1. Model selection

1. Baseline selection

MMDetection, the YOLO series, anchor-free or anchor-based detectors, image classification frameworks, etc. Choose a framework that is convenient and easy to modify.

2. Backbone

Object detection or classification tasks: the backbone is usually initialized with open-source ImageNet pre-trained weights (or COCO pre-trained weights for detection), which is generally much better than random initialization. A more extreme option is to pre-train on the training set itself: crop out all the targets and first train a good classification model on them.
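For instance, a minimal sketch of initializing a backbone from open-source ImageNet weights with torchvision (the model choice is just an example):

```python
import torchvision

# ResNet-50 initialized with ImageNet pre-trained weights (example backbone choice);
# newer torchvision versions replace `pretrained=True` with `weights="IMAGENET1K_V1"`.
backbone = torchvision.models.resnet50(pretrained=True)
```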

BatchNorm: BN, IBN (works well for ReID), GN (a good choice when the batch size is very small), SyncBN. In general, SyncBN performs better because it synchronizes batch statistics across GPUs.
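If you use SyncBN for multi-GPU training, PyTorch can convert an existing model's BN layers in place; a minimal sketch, assuming `model` is already built and training runs under DistributedDataParallel:

```python
import torch

# Replace every BatchNorm layer with SyncBatchNorm so that batch statistics
# are synchronized across GPUs during distributed training.
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
# model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
```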

Adding an attention module to the backbone: personally I find the SE and CBAM modules work best here, but after adding them you must change how the pre-trained weights are loaded: load by matching key-value pairs, skip the newly added modules, and give those modules a suitable weight initialization (see the sketch after the next item).

Deformable convolution (DCN): usually added to the second half of the backbone. DCN is currently very unfriendly to deployment, so choose carefully if deployment is required, but adding DCN generally improves accuracy.
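A minimal sketch of the key-matching load mentioned above, assuming `model` is a backbone with newly added SE/CBAM blocks and the checkpoint path is hypothetical:

```python
import torch

checkpoint = torch.load("resnet50_imagenet.pth", map_location="cpu")  # hypothetical path
model_dict = model.state_dict()

# Keep only entries whose key and shape match the current model,
# so the newly added attention modules are skipped automatically.
matched = {k: v for k, v in checkpoint.items()
           if k in model_dict and v.shape == model_dict[k].shape}
model_dict.update(matched)
model.load_state_dict(model_dict)
# The skipped (new) modules keep whatever initialization you gave them, e.g. Kaiming init.
```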

3. Neck

SSH, FPN, BiFPN, PANet: the neck can be made deeper/wider or shallower/thinner according to your needs; a larger neck gives stronger performance at slower speed, while a smaller one is fast but has limited performance.
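As a concrete reference point, torchvision ships a plain FPN module; a minimal sketch with assumed channel counts and feature-map sizes:

```python
from collections import OrderedDict

import torch
from torchvision.ops import FeaturePyramidNetwork

# Assumed backbone output channels for three feature levels (C3, C4, C5).
fpn = FeaturePyramidNetwork(in_channels_list=[256, 512, 1024], out_channels=256)

feats = OrderedDict()
feats["c3"] = torch.randn(1, 256, 80, 80)
feats["c4"] = torch.randn(1, 512, 40, 40)
feats["c5"] = torch.randn(1, 1024, 20, 20)

outs = fpn(feats)  # every level is mapped to 256 channels with top-down fusion
print([v.shape for v in outs.values()])
```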

4. Head

For example, YOLOv4 outputs three heads (yolo layers), the tiny version has two, and TinaFace outputs six; also decide whether the cls and box-regression branches share weights, and whether features are merged with concat or add.
concat vs. add: concat (tensor concatenation) expands the channel dimension, e.g. concatenating 26×26×256 and 26×26×512 gives 26×26×768; concat plays the same role as route in the Darknet cfg file. add sums tensors element-wise without expanding the dimension, e.g. adding 104×104×128 and 104×104×128 still gives 104×104×128; add plays the same role as shortcut in the cfg file.
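A quick PyTorch illustration of the difference (shapes follow the examples above, in NCHW order):

```python
import torch

a = torch.randn(1, 256, 26, 26)
b = torch.randn(1, 512, 26, 26)
c = torch.cat([a, b], dim=1)   # concat expands channels -> (1, 768, 26, 26), like `route`
print(c.shape)

x = torch.randn(1, 128, 104, 104)
y = torch.randn(1, 128, 104, 104)
z = x + y                      # add keeps the shape -> (1, 128, 104, 104), like `shortcut`
print(z.shape)
```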

5. Loss

The cls loss: Cross-Entropy Loss, BCE Loss, Focal Loss, GFocal Loss.
Bounding box regression loss has developed in recent years roughly as: L1/L2 loss -> Smooth L1 Loss -> IoU Loss (2016) -> GIoU Loss (2019) -> DIoU Loss (2020) -> CIoU Loss (2020); a minimal GIoU sketch follows this list.
Landmark loss: landmark localization is also a regression problem; Wing Loss works especially well for it.
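A minimal GIoU loss sketch for reference; boxes are assumed to be in (x1, y1, x2, y2) format, and production code should also handle degenerate boxes:

```python
import torch

def giou_loss(pred, target, eps=1e-7):
    """pred, target: (N, 4) boxes in (x1, y1, x2, y2) format."""
    # intersection
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / (union + eps)

    # smallest enclosing box
    ex1 = torch.min(pred[:, 0], target[:, 0])
    ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2])
    ey2 = torch.max(pred[:, 3], target[:, 3])
    enclose = (ex2 - ex1) * (ey2 - ey1) + eps

    giou = iou - (enclose - union) / enclose
    return (1.0 - giou).mean()
```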

2. Data Augmentation

Flip and rotation: check whether the target is invariant to flips and rotations, and apply the corresponding transformation to the labels.
Resize: multi-scale; YOLOv5's multi-scale transformation works very well.
Distort: pixel-level transformations, mainly changes to the image's color, hue, and saturation.
Blur: blur augmentation (Gaussian, median, motion blur).
Expand: as in SSD, the image is shrunk and the rest of the canvas is zero-padded back to the original size (SSD's default input size is 300).
MixUp and "cramming" (pasting cropped targets into training images); a minimal MixUp sketch follows this list.
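A minimal classification-style MixUp sketch; the Beta parameter and the training-loop names are assumptions:

```python
import numpy as np
import torch

def mixup_batch(images, labels, alpha=0.2):
    """Blend a batch with a shuffled copy of itself; return both label sets."""
    lam = np.random.beta(alpha, alpha)
    index = torch.randperm(images.size(0))
    mixed = lam * images + (1.0 - lam) * images[index]
    return mixed, labels, labels[index], lam

# In the training loop (criterion assumed to be nn.CrossEntropyLoss()):
# mixed, y_a, y_b, lam = mixup_batch(images, labels)
# outputs = model(mixed)
# loss = lam * criterion(outputs, y_a) + (1 - lam) * criterion(outputs, y_b)
```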

3. Training and testing strategy

1. Model training

1. Apex mixed-precision training.
2. Increase the learning rate together with the batch size: a larger batch size makes each batch's gradient closer to the gradient over the whole data set, which in turn justifies a larger learning rate. If the batch size has to be small, accumulate gradients over several mini-batches and perform a single parameter update, which simulates a larger batch (see the sketch below).
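A minimal sketch combining mixed precision (here PyTorch's native torch.cuda.amp, the built-in counterpart to Apex AMP) with gradient accumulation; the model is assumed to return its loss directly, detection-style, and `optimizer`/`loader` come from the surrounding training script:

```python
import torch

scaler = torch.cuda.amp.GradScaler()
accum_steps = 4  # assumed: accumulate 4 mini-batches per optimizer step

optimizer.zero_grad()
for step, (images, targets) in enumerate(loader):
    with torch.cuda.amp.autocast():
        loss = model(images, targets) / accum_steps  # scale so accumulated grads average out
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)   # unscales gradients, then calls optimizer.step()
        scaler.update()
        optimizer.zero_grad()
```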

3. Warm-up: train for a few epochs with a small learning rate first, usually ramping up from 0 to the initial learning rate. Because the network parameters are randomly initialized, using a large learning rate right away easily makes training unstable.

4. Learning rate decay strategy:
Ordered adjustment: equal-interval decay (StepLR), decay at chosen milestones (MultiStepLR), exponential decay (ExponentialLR), and cosine annealing (CosineAnnealingLR).
Adaptive adjustment: ReduceLROnPlateau, which lowers the learning rate when a monitored metric stops improving.
Custom adjustment: LambdaLR with a user-defined schedule (see the sketch below).
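All of these schedulers are available in torch.optim.lr_scheduler; a minimal sketch, where the epoch counts and decay factors are just example values and `model`/`loader`/`train_one_epoch` are assumed to exist in the surrounding script:

```python
from torch.optim import SGD
from torch.optim.lr_scheduler import (StepLR, MultiStepLR, ExponentialLR,
                                      CosineAnnealingLR, ReduceLROnPlateau, LambdaLR)

optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)

scheduler = StepLR(optimizer, step_size=30, gamma=0.1)             # decay every 30 epochs
# scheduler = MultiStepLR(optimizer, milestones=[60, 80], gamma=0.1)
# scheduler = ExponentialLR(optimizer, gamma=0.95)
# scheduler = CosineAnnealingLR(optimizer, T_max=100)
# scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=5)
# scheduler = LambdaLR(optimizer, lr_lambda=lambda e: min(1.0, (e + 1) / 5))  # linear warm-up

for epoch in range(100):
    train_one_epoch(model, loader, optimizer)  # hypothetical training routine
    scheduler.step()  # ReduceLROnPlateau instead needs scheduler.step(val_metric)
```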

5. Optimizer selection: generally SGD or Adam.

6. Label smoothing:
For classification problems, especially multi-class classification, the class targets are usually one-hot vectors, which can make the model over-confident and prone to overfitting. Label smoothing reduces this over-confidence (see the sketch below).
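A minimal label-smoothing cross-entropy sketch; note that recent PyTorch versions also support nn.CrossEntropyLoss(label_smoothing=0.1) directly:

```python
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, target, eps=0.1):
    """logits: (N, C); target: (N,) integer class indices."""
    n_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    with torch.no_grad():
        # 1 - eps on the true class, eps / (C - 1) spread over the other classes
        true_dist = torch.full_like(log_probs, eps / (n_classes - 1))
        true_dist.scatter_(1, target.unsqueeze(1), 1.0 - eps)
    return torch.mean(torch.sum(-true_dist * log_probs, dim=-1))
```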

7. Multi-scale training (Multi-Scale)

8. Pseudo-labeling: at test time, first use the trained model to label the test set, then merge the pseudo-labeled data into the training set for fine-tuning, and finally predict on the test set again. This mainly applies to competitions where accuracy is pursued regardless of cost.

9. Anchor settings: each pixel of the feature map carries a set of anchors. Set them by combining the receptive field of that feature map on the original input image with the aspect ratios of the target bboxes, e.g. YOLO's k-means clustering for generating anchors.
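A minimal sketch of YOLO-style k-means anchor clustering on box widths and heights, using IoU as the similarity measure; the input is assumed to be an (N, 2) array of ground-truth box sizes, and k and the iteration budget are just example values:

```python
import numpy as np

def wh_iou(boxes, clusters):
    """IoU between (N, 2) box sizes and (k, 2) cluster sizes, anchored at the origin."""
    w = np.minimum(boxes[:, None, 0], clusters[None, :, 0])
    h = np.minimum(boxes[:, None, 1], clusters[None, :, 1])
    inter = w * h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + (clusters[:, 0] * clusters[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    clusters = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(wh_iou(boxes, clusters), axis=1)  # nearest cluster = highest IoU
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i) else clusters[i]
                        for i in range(k)])
        if np.allclose(new, clusters):
            break
        clusters = new
    return clusters[np.argsort(clusters.prod(axis=1))]  # anchors sorted by area
```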

2. Model testing

TTA (Test-Time Augmentation) applies several transformations to each test image (such as horizontal flipping or increasing the image resolution) to generate new images. The augmented images are fed to the trained model along with the originals, so each test image yields multiple predictions. The duplicate or overlapping predictions produced this way are filtered with non-maximum suppression (NMS) or fused with WBF (Weighted Boxes Fusion). The whole procedure is sometimes called Ensemble Prediction (EP).
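A minimal horizontal-flip TTA sketch merged with torchvision NMS; the detector interface (returning boxes and scores for one CHW tensor) and both thresholds are assumptions:

```python
import torch
from torchvision.ops import nms

def tta_hflip(model, image, iou_thr=0.5, score_thr=0.05):
    """image: (3, H, W) tensor; model(image) assumed to return (boxes [N, 4] xyxy, scores [N])."""
    _, _, w = image.shape
    boxes1, scores1 = model(image)
    boxes2, scores2 = model(torch.flip(image, dims=[2]))  # flip along the width axis
    boxes2 = boxes2.clone()
    boxes2[:, [0, 2]] = w - boxes2[:, [2, 0]]              # map flipped boxes back to original coords
    boxes = torch.cat([boxes1, boxes2])
    scores = torch.cat([scores1, scores2])
    keep = nms(boxes, scores, iou_thr)                     # drop duplicate/overlapping predictions
    keep = keep[scores[keep] > score_thr]
    return boxes[keep], scores[keep]
```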

Model ensembling combines different variants of the model. Since training a model involves tuning many hyperparameters, different combinations of those hyperparameters produce different trained models. A subset of these models is selected so that the overall accuracy is maximized; each image is run through all selected models, the predictions are averaged, and non-maximum suppression is applied at the end. This ensemble technique improves accuracy by reducing prediction variance.
Tip: lowering the score threshold can improve mAP, but it also produces more false detections.

Summary

Continuously updated...
