FastestDet: A Lightweight, Real-time, Anchor-free Object Detection Algorithm

Pre-trained model: 1.1 MB

The speed seems okay.

Code address: https://github.com/dog-qiuqiu/FastestDet

| Network | COCO mAP(0.5) | Resolution | Run Time (4×core) | Run Time (1×core) | FLOPs (G) | Params (M) |
|---|---|---|---|---|---|---|
| Yolo-FastestV1.1 | 24.40% | 320×320 | 26.60 ms | 75.74 ms | 0.252 | 0.35 |
| Yolo-FastestV2 | 24.10% | 352×352 | 23.8 ms | 68.9 ms | 0.212 | 0.25 |
| FastestDet | 27.8% | 512×512 | 21.51 ms | 34.62 ms | * | 0.25 |

Multi-platform benchmark

| Equipment | Computing backend | System | Framework | Run time (single core) | Run time (multi core) |
|---|---|---|---|---|---|
| Radxa Rock3A | RK3568 (ARM CPU) | Linux (aarch64) | ncnn | 34.62 ms | 21.51 ms |
| AMD R5-5600 | x86 CPU | Linux (amd64) | ncnn | 2.16 ms | 1.73 ms |
| Intel i7-8700 | x86 CPU | Linux (amd64) | ncnn | 5.21 ms | 4.73 ms |

01 Overview

FastestDet is designed to replace the Yolo-Fastest series of algorithms. It is not in the same weight class as existing lightweight detectors such as YOLOv5n, YOLOX-Nano, NanoDet, or PP-YOLO-Tiny: FastestDet is far smaller and faster in both inference speed and parameter count (and don't compare an int8 model against my fp32 model, that's not fair), though its accuracy naturally cannot match theirs. FastestDet is designed for ARM platforms with limited computing resources and emphasizes single-core performance, because in real business scenarios the inference framework is never given all of the CPU's resources. If you want to run real-time object detection on a chip like the RK3568, FastestDet is a good choice; likewise, if you don't want to occupy too much CPU on a mobile device, you can run inference on a single core with CPU power-saving enabled and keep the algorithm running under low-power conditions.

02 New Framework Algorithm

Let's talk about several important features of FastestDet: 

  • Single lightweight detection head

  • anchor-free

  • Multi-candidate targets across grids

  • Dynamic positive and negative sample allocation

  • Simple data augmentation

Let me go into detail one by one:

Single lightweight detection head

This optimizes the model at the network-structure level, mainly to improve inference speed and simplify the post-processing steps. Take a look at the network structure of this part first:

In fact, multiple detection heads are designed to handle objects at different scales: the high-resolution head is responsible for detecting small objects, and the low-resolution head for large objects, a divide-and-conquer idea.

I personally think the root cause is the receptive field. Objects of different scales require different receptive fields, and each layer of the model has a different receptive field; FPN, too, is essentially a summary and fusion of features with different receptive fields. For this single detection head I also borrowed the idea of YOLOF: the network uses an inception-like structure of parallel 5×5 grouped convolutions, in the hope of fusing features from different receptive fields so that a single detection head can still adapt to objects of different scales.
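As a rough sketch of what such an inception-style block of parallel 5×5 grouped convolutions could look like (the branch count, layer names, and exact topology here are my own illustration, not the repo's actual code):

```python
import torch
import torch.nn as nn

class MultiRFBlock(nn.Module):
    """Illustrative inception-style block: parallel branches with different
    receptive fields, built from cheap 5x5 depthwise (grouped) convolutions,
    fused back together by a 1x1 pointwise convolution."""
    def __init__(self, channels: int):
        super().__init__()
        # Three parallel branches: identity (small RF), one 5x5 (medium RF),
        # and two stacked 5x5s (large RF), each grouped per-channel.
        self.branch1 = nn.Identity()
        self.branch2 = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        self.branch3 = nn.Sequential(
            nn.Conv2d(channels, channels, 5, padding=2, groups=channels),
            nn.Conv2d(channels, channels, 5, padding=2, groups=channels),
        )
        # 1x1 conv mixes the concatenated branches back to `channels`.
        self.fuse = nn.Conv2d(3 * channels, channels, 1)

    def forward(self, x):
        y = torch.cat([self.branch1(x), self.branch2(x), self.branch3(x)], dim=1)
        return self.fuse(y)
```

The grouped (depthwise) convolutions keep the FLOPs of the extra branches low, which is why a block like this fits a model of this size.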

Anchor-Free

The original anchor-based algorithms need to perform an anchor-bias step on the dataset before training. Anchor-bias can be understood as clustering the widths and heights of the labeled objects in the dataset to obtain a set of prior widths and heights; the network then optimizes the predicted box sizes relative to these priors. FastestDet uses an anchor-free approach: the model directly regresses the gt's scale relative to the feature map, with no prior widths and heights. This simplifies model post-processing. Moreover, in anchor-based algorithms each feature-map point corresponds to N anchor candidate boxes, whereas in this anchor-free design each feature-map point corresponds to only one candidate box, which is also an advantage for inference speed.
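A minimal decode sketch of such an anchor-free output. I am assuming a common convention here (cell offsets for the box centre, width/height regressed as fractions of the input image); the post doesn't spell out FastestDet's exact encoding:

```python
import numpy as np

def decode_anchor_free(pred, stride, img_size):
    """Decode one anchor-free feature map of shape (H, W, 4) into boxes.
    pred[..., 0:2] are assumed to be offsets inside the grid cell, and
    pred[..., 2:4] width/height as fractions of the input image.
    Note there are no prior anchor widths/heights anywhere."""
    H, W, _ = pred.shape
    gy, gx = np.mgrid[0:H, 0:W]
    cx = (gx + pred[..., 0]) * stride      # box centre in input pixels
    cy = (gy + pred[..., 1]) * stride
    bw = pred[..., 2] * img_size           # size regressed directly,
    bh = pred[..., 3] * img_size           # no anchor priors involved
    return np.stack([cx - bw / 2, cy - bh / 2,
                     cx + bw / 2, cy + bh / 2], axis=-1)
```

Since each feature-map point yields exactly one box, decoding is a single pass with no per-anchor loop.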

Multi-candidate targets across grids

This part is borrowed from YOLOv5: not only the grid cell containing the gt center is treated as a candidate target, but three nearby cells are counted as well, increasing the number of positive candidate boxes, as shown in the figure below:
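A sketch of the cross-grid candidate rule. The exact neighbour selection is an assumption on my part; shown here is the YOLOv5-style variant that adds the horizontal and vertical neighbours on the side the gt centre leans toward:

```python
def candidate_cells(cx, cy, grid_w, grid_h):
    """Return grid cells treated as positive candidates for a gt whose
    centre is (cx, cy) in grid units: the host cell plus the nearest
    horizontal and vertical neighbours (YOLOv5-style, an assumption)."""
    gx, gy = int(cx), int(cy)
    cells = [(gx, gy)]
    # fractional position inside the host cell decides which side's
    # neighbours also become candidates
    fx, fy = cx - gx, cy - gy
    nx = gx - 1 if fx < 0.5 else gx + 1
    ny = gy - 1 if fy < 0.5 else gy + 1
    if 0 <= nx < grid_w:
        cells.append((nx, gy))
    if 0 <= ny < grid_h:
        cells.append((gx, ny))
    return cells
```

The effect is simply more positive samples per gt, which helps training converge for small models.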

Dynamic positive and negative sample allocation

So-called dynamic positive and negative sample allocation means assigning positive and negative samples dynamically during training, which differs from the earlier Yolo-Fastest. In the original Yolo-Fastest, once the anchor-bias is set, the width/height scale between each anchor-bias and gt is computed, and positive and negative samples are assigned by a fixed threshold on that scale (following YOLOv5's practice). Since the anchor-bias and gt do not change during training, the positive/negative assignment is also fixed throughout training.

FastestDet's positive/negative sample allocation borrows from ATSS: the mean SIoU between the predicted boxes and the gt is used as the threshold for assignment. If a predicted box's SIoU with the gt is greater than the mean, it is a positive sample; otherwise it is a negative sample. (Why not refer to SimOTA? Because building its cost matrix requires tuning the weights of the different losses as hyperparameters; I was lazy.)
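The dynamic-threshold idea can be sketched as follows. Plain IoU stands in for SIoU here to keep the example short; the allocation logic (mean value as threshold) is the same:

```python
import numpy as np

def iou(a, b):
    """Plain IoU between boxes [x1, y1, x2, y2]. The post uses SIoU,
    but the dynamic-threshold mechanism is identical either way."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def assign_positives(pred_boxes, gt_box):
    """ATSS-style dynamic assignment: candidates whose overlap with the
    gt exceeds the mean overlap of all candidates become positives."""
    ious = np.array([iou(p, gt_box) for p in pred_boxes])
    return ious > ious.mean()
```

Because the threshold is recomputed from the current predictions, the set of positives adapts as the model improves, unlike a fixed-threshold scheme.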

Simple data augmentation

Be cautious with data augmentation for lightweight models. Their learning ability is poor to begin with, so don't give them problems that are too hard. Therefore only simple augmentations such as random translation and random scaling are used, and Mosaic and MixUp are not.
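A minimal version of such an augmentation step, with only random translation and scaling (the parameter ranges and the nearest-neighbour resampling are illustrative choices of mine, not the repo's):

```python
import numpy as np

def simple_augment(img, boxes, rng, max_shift=0.1, scale_range=(0.8, 1.2)):
    """Random scale + translation only; no Mosaic, no MixUp.
    Boxes are pixel-space [x1, y1, x2, y2] rows."""
    h, w = img.shape[:2]
    s = rng.uniform(*scale_range)                 # random scale factor
    tx = rng.uniform(-max_shift, max_shift) * w   # random shift in pixels
    ty = rng.uniform(-max_shift, max_shift) * h
    # Inverse-map each output pixel back to the source image
    # (nearest-neighbour resampling, clamped at the borders).
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(((xs - tx) / s).astype(int), 0, w - 1)
    src_y = np.clip(((ys - ty) / s).astype(int), 0, h - 1)
    out = img[src_y, src_x]
    # Apply the same affine transform to the boxes, then clamp.
    new_boxes = boxes * s + np.array([tx, ty, tx, ty])
    return out, np.clip(new_boxes, 0, [w, h, w, h])
```

Keeping the geometry this simple means the label transform is a single affine update, with no box-merging logic like Mosaic requires.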

03 Experimental Results

Origin blog.csdn.net/jacke121/article/details/125611465