YOLO Object Detection Algorithm Series Study Notes, Chapter 1: Overview of Object Detection Algorithms

Tutorial link: Tang Yudi, YOLO object detection algorithm full series

Classic deep learning object detection methods

One-stage: the YOLO series
A single CNN extracts features and directly regresses the box, producing the four values (x1, y1) and (x2, y2) in one step (or equivalently x, y, w, h).
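The two box parameterizations mentioned above carry the same information and can be converted into each other. The sketch below assumes (x, y) is the box center, which is how YOLO parameterizes boxes; real YOLO heads additionally predict offsets relative to a grid cell and normalize by image size, which this simplified version ignores. The function names are illustrative only, not taken from any library.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def xyxy_to_xywh(box: Box) -> Box:
    """Convert corner format (x1, y1, x2, y2) to center format (x, y, w, h)."""
    x1, y1, x2, y2 = box
    w = x2 - x1
    h = y2 - y1
    # (x, y) is taken to be the box center (an assumption of this sketch)
    return (x1 + w / 2, y1 + h / 2, w, h)


def xywh_to_xyxy(box: Box) -> Box:
    """Convert center format (x, y, w, h) back to corner format (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


if __name__ == "__main__":
    corners = (10.0, 20.0, 50.0, 80.0)
    center = xyxy_to_xywh(corners)
    print(center)                # (30.0, 50.0, 40.0, 60.0)
    print(xywh_to_xyxy(center))  # (10.0, 20.0, 50.0, 80.0)
```

Both forms need exactly four numbers, so the choice is a matter of what the network finds easier to regress, not of how much information is predicted.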

Two-stage: Faster R-CNN (2015, a pioneering work in object detection) and the Mask R-CNN series (an upgraded version of the former).
These add a Region Proposal Network (RPN), i.e., an extra proposal (pre-selection) stage before the final detection.

Advantages and disadvantages of the different approaches

  • one-stage:
    Core advantage: very fast, suitable for real-time detection tasks!
    Disadvantage: the accuracy is usually not as good!

Within YOLO you can choose different backbone networks yourself. A simpler network runs faster, possibly reaching hundreds of FPS. At present no algorithm achieves both a very high mAP and a very high FPS at the same time.
Precision and recall are not used on their own here; the combined metric mAP is usually used instead. In many deep learning tasks precision and recall pull against each other: raising one tends to lower the other, so neither one alone can tell you which model is better.

  • two-stage:
    Characteristics: usually slower (around 5 FPS), but the accuracy is usually good! Mask R-CNN is a very practical general framework, and it is worth getting familiar with!
    The Mask R-CNN paper reports about 5 FPS (frames per second), which does not meet real-time requirements.

Metrics used to evaluate whether a model is good or bad

Intersection over Union (IoU) is the area of the intersection divided by the area of the union of the box predicted by the model and the ground-truth box. The higher the IoU, the better the predicted box matches the ground truth.
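A minimal IoU sketch, assuming boxes are given in corner format (x1, y1, x2, y2) with x1 < x2 and y1 < y2. The overlap width and height are clamped at zero so that boxes that do not touch score 0 rather than a negative value.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), x1 < x2 and y1 < y2


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes in corner format."""
    # Corners of the overlap rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # The overlap is counted in both areas, so subtract it once
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(iou(predicted, ground_truth))  # 1/7, about 0.1429
```

Here two 2x2 boxes overlap in a 1x1 square, so IoU = 1 / (4 + 4 - 1) = 1/7: even a visibly overlapping prediction can have a low IoU.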

Precision and recall:
Suppose there are two classes, cats and dogs, and the goal is to find cats.
  • TP (true positive): positive examples correctly labeled as positive. (Cats correctly found as cats.)
  • FP (false positive): negative examples wrongly labeled as positive. (Dogs wrongly taken as cats.)
  • FN (false negative): positive examples wrongly labeled as negative. (Cats wrongly taken as dogs.) In object detection this means the target was not detected, i.e., a missed detection.
  • TN (true negative): negative examples correctly labeled as negative. (Dogs marked as dogs.)
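The counts above give precision and recall through the standard formulas Precision = TP / (TP + FP) and Recall = TP / (TP + FN). TN appears in neither formula, which suits detection, where "correctly predicted background" is not really countable. Returning 0.0 when a denominator is zero is a convention chosen for this sketch, not a universal rule.

```python
def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP): the share of boxes predicted as cats that really are cats."""
    # With no predictions at all, return 0.0 (a convention chosen for this sketch)
    return tp / (tp + fp) if tp + fp > 0 else 0.0


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN): the share of real cats that were found."""
    return tp / (tp + fn) if tp + fn > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical counts: 8 cats found, 2 dogs mistaken for cats, 4 cats missed
    print(precision(tp=8, fp=2))  # 0.8
    print(recall(tp=8, fn=4))     # 0.666...
```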

Computing mAP

mAP is calculated by applying a confidence threshold. Confidence is the predicted likelihood that a box contains the target, e.g., how likely it is that the object in a box is a cat. Prediction produces many boxes, so a threshold is set: boxes with confidence above the threshold are kept and the rest are filtered out.
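Confidence filtering can be sketched as a one-line list comprehension. The `Detection` container is hypothetical, introduced only for illustration. The sketch keeps boxes whose confidence is at or above the threshold (`>=`); whether a box scored exactly at the threshold is kept is an assumption here.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """One predicted box (hypothetical container, for illustration only)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    confidence: float  # predicted likelihood that the box contains the target


def filter_by_confidence(detections: List[Detection], threshold: float) -> List[Detection]:
    """Keep boxes whose confidence is at or above the threshold; drop the rest."""
    return [d for d in detections if d.confidence >= threshold]


if __name__ == "__main__":
    preds = [
        Detection((10.0, 10.0, 50.0, 50.0), 0.9),
        Detection((12.0, 8.0, 48.0, 52.0), 0.8),
        Detection((60.0, 60.0, 90.0, 90.0), 0.3),
    ]
    kept = filter_by_confidence(preds, threshold=0.7)
    print([d.confidence for d in kept])  # [0.9, 0.8]
```

Raising the threshold keeps fewer boxes (usually higher precision, lower recall); lowering it keeps more, which is exactly the trade-off mAP is designed to summarize.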
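For example, evaluate the thresholds 0.9, 0.8, and 0.7 in turn. At threshold 0.9: TP+FP=1, TP=1, FN=2, so Precision = TP/(TP+FP) = 1/1 and Recall = TP/(TP+FN) = 1/3.
Repeating this at every threshold gives a set of precision/recall pairs, and the final value is a composite metric obtained by combining them.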
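The threshold sweep can be sketched as follows, assuming each detection has already been labeled TP or FP (for example by matching it to a ground-truth box with IoU >= 0.5, as PASCAL VOC does). Sorting detections by confidence and lowering the threshold one detection at a time yields one (recall, precision) pair per threshold. AP is then taken as the area under this precision-recall curve using all-point interpolation (the PASCAL VOC 2010+ convention), and mAP as the mean AP over classes. The scores and the three-cat ground truth are made up for illustration; the first point reproduces the threshold-0.9 numbers (Precision 1/1, Recall 1/3). Other benchmarks (e.g. COCO) average over several IoU thresholds and interpolate differently, so treat this as one common variant rather than the definitive formula.

```python
from typing import Dict, List, Tuple


def pr_points(scored: List[Tuple[float, bool]], num_ground_truth: int) -> List[Tuple[float, float]]:
    """Return one (recall, precision) pair per threshold, from highest confidence down.

    scored: (confidence, is_true_positive) for every detection of one class.
    num_ground_truth: number of real objects of that class, i.e. TP + FN (must be > 0).
    """
    points = []
    tp = fp = 0
    # Lowering the threshold past one detection at a time = walking the sorted list
    for _, is_tp in sorted(scored, key=lambda s: s[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_ground_truth, tp / (tp + fp)))
    return points


def average_precision(points: List[Tuple[float, float]]) -> float:
    """Area under the precision-recall curve with all-point interpolation."""
    recalls = [0.0] + [r for r, _ in points] + [1.0]
    precisions = [0.0] + [p for _, p in points] + [0.0]
    # Replace each precision with the best precision at any higher recall
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangles; steps where recall does not change have zero width
    return sum(
        (recalls[i] - recalls[i - 1]) * precisions[i] for i in range(1, len(recalls))
    )


def mean_average_precision(ap_per_class: Dict[str, float]) -> float:
    """mAP: the mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


if __name__ == "__main__":
    # Hypothetical: 3 real cats; detections as (confidence, is_true_positive)
    cat_detections = [(0.9, True), (0.8, False), (0.7, True)]
    points = pr_points(cat_detections, num_ground_truth=3)
    print(points[0])         # first point: recall 1/3, precision 1.0
    ap_cat = average_precision(points)
    print(round(ap_cat, 4))  # 0.5556 (= 5/9)
    # Hypothetical dog AP, only to show the averaging step
    print(mean_average_precision({"cat": ap_cat, "dog": 0.75}))
```

In this toy example the third cat is never detected, so the curve never reaches recall 1 with nonzero precision, and that missing recall is what pulls AP down to 5/9.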


Source: blog.csdn.net/ThreeS_tones/article/details/129769373