In the blink of an eye, more than two years have passed since DETR was proposed, and it is now 2023. This is arguably the third year in which the Transformer framework has made a strong push into the CV field. Doubts about Transformers keep shrinking, and their power is recognized more and more widely. In today's CV field, Transformers can be said to share the stage roughly equally with CNNs.
After the surge of DETR-based work, object detection has entered a relatively stable period of development. Major breakthroughs may not appear in the short term, and more effort now goes into exploring the performance and potential of existing work. The recently popular Segment Anything (SAM) is one example: its network is the common ViT structure combined with prompting, with no bells and whistles, yet the performance it shows is extremely good.
1. PaddleDetection
The output of RT-DETR run with PaddleDetection:
2. Ultralytics
Comparing the two outputs: for the truck in the second image, Ultralytics also outputs a car, that is, it detects one extra object, but the box sizes are basically the same.
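One way to make this kind of side-by-side comparison less dependent on eyeballing images is to match boxes from the two frameworks by class and IoU. The following is only a minimal sketch, not code from PaddleDetection or Ultralytics: the `Detection` tuple layout, the `compare_detections` helper, and the 0.5 IoU threshold are assumptions made for illustration, and the detections are assumed to have already been extracted into plain Python tuples.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Detection = Tuple[str, float, Box]       # (class label, confidence, box); hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def compare_detections(ref: List[Detection], other: List[Detection], iou_thr: float = 0.5):
    """Greedily pair each reference detection with its best same-class match.

    Returns (pairs, unmatched_ref, unmatched_other), where each pair is
    (ref_detection, other_detection, iou_value).
    """
    used = set()          # indices in `other` that are already paired
    pairs = []
    unmatched_ref = []
    for r in ref:
        best_j, best_iou = -1, iou_thr
        for j, o in enumerate(other):
            if j in used or o[0] != r[0]:
                continue  # skip boxes already paired or of a different class
            value = iou(r[2], o[2])
            if value >= best_iou:
                best_j, best_iou = j, value
        if best_j < 0:
            unmatched_ref.append(r)
        else:
            used.add(best_j)
            pairs.append((r, other[best_j], best_iou))
    unmatched_other = [o for j, o in enumerate(other) if j not in used]
    return pairs, unmatched_ref, unmatched_other
```

With this helper, an extra detection such as the additional car on the truck would show up in `unmatched_other`, while the truck itself would appear in `pairs` with a high IoU if the box sizes really are basically the same. Matching only within the same class is a design choice here; dropping that check would instead pair the car box with the truck box.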
3. TensorRT output comparison
ONNX exported by PaddleDetection:
ONNX exported by Ultralytics:
TensorRT output comparison:
PaddleDetection / Ultralytics
The problem shows up when rtdetr-x.onnx is used for inference: multiple boxes appear, and the confidence scores drop severely.
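To put a number on a confidence drop like this, one option is to summarize the score list produced by each engine and compare the summaries. This is a small sketch under stated assumptions: `summarize_scores` is a hypothetical helper, the 0.5 threshold is arbitrary, and the scores are assumed to have been pulled out of each engine's output into a plain list of floats.

```python
from statistics import mean
from typing import Dict, List


def summarize_scores(scores: List[float], conf_thr: float = 0.5) -> Dict[str, float]:
    """Summarize detection confidences from one inference run."""
    kept = [s for s in scores if s >= conf_thr]  # detections that survive the threshold
    return {
        "total": len(scores),                        # all candidate boxes
        "kept": len(kept),                           # boxes at or above conf_thr
        "max": max(scores) if scores else 0.0,       # highest confidence seen
        "mean_kept": mean(kept) if kept else 0.0,    # average confidence of kept boxes
    }
```

Running this once on the scores from each model and comparing `kept`, `max`, and `mean_kept` would indicate whether one export yields more boxes at lower confidence than the other.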