Why use the YOLO algorithm?

At present, object detection algorithms fall into two categories: one-stage and two-stage. Each category has advantages and disadvantages.

one-stage: The core advantage is speed, which makes it suitable for real-time detection tasks. The disadvantage is that the detection accuracy is usually not as good.

two-stage: The speed is usually slower (around 5 FPS), but the detection accuracy is usually good.

Currently, the representative two-stage algorithms are Faster R-CNN and Mask R-CNN, and the representative one-stage algorithms are the YOLO series.

Our goal is to detect vehicles on the road in order to adjust the duration of the traffic lights at the next intersection. This requires a fast enough detection speed, while a small amount of recognition error is acceptable. In this case, the one-stage YOLO algorithm is the better choice.

There are many versions in the YOLO series, so we choose the appropriate version based on the actual project. The first step is metric analysis. The mAP metric comprehensively measures the detection effect (we can't just look at precision or recall on their own).

PS: Precision measures how accurate the recognized objects are; recall refers to the ratio of the number of objects detected in one frame to the total number of objects actually present.

IOU (intersection over union) refers to the ratio of the area of intersection to the area of union of the ground-truth box and the predicted box.
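The IOU definition above can be sketched in a few lines. This is only a minimal illustration: the function name iou and the (x1, y1, x2, y2) corner format for boxes are assumptions made for the example, not something fixed by YOLO itself.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are assumed to be (x1, y1, x2, y2) tuples with x1 < x2 and y1 < y2.
    """
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0
```

For two 2x2 boxes that overlap in a 1x1 square, the intersection is 1 and the union is 7, so the IOU is 1/7. Identical boxes give 1.0 and disjoint boxes give 0.0.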

Here, TP is the abbreviation of true positives, which refers to targets we need that are correctly detected; FP is the abbreviation of false positives, which means that we mistakenly regard objects we don't need as what we need; FN is the abbreviation of false negatives, which refers to judging targets we need as unnecessary (missed detections); TN is the abbreviation of true negatives, which refers to correctly rejecting what we don't need.

Precision refers to the ratio of correctly recognized required objects to all recognized objects: TP / (TP + FP).

Recall refers to the ratio of correctly recognized required objects to all the required objects that should have been identified: TP / (TP + FN).
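As a small worked example of the two ratios, the sketch below computes precision and recall from TP, FP and FN counts. The function name precision_recall and the sample counts are made up for illustration.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from detection counts.

    tp: required objects that were correctly detected
    fp: detections that do not correspond to a required object
    fn: required objects that were missed
    """
    # Guard against division by zero when nothing was detected
    # or when there were no required objects at all.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical frame: 8 vehicles found correctly, 2 false alarms, 4 vehicles missed.
precision, recall = precision_recall(tp=8, fp=2, fn=4)
```

With these counts, precision is 8 / 10 = 0.8 and recall is 8 / 12, roughly 0.67. This shows why one number alone is not enough: most detections are right, but a third of the vehicles are still missed.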

Confidence refers to how likely it is that the predicted box actually contains an object.
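In practice, confidence is typically used as a cut-off: boxes below a chosen threshold are discarded. The sketch below assumes each detection is a plain dictionary with a "confidence" key; that layout, the function name filter_by_confidence and the 0.5 default are all hypothetical choices for the example.

```python
def filter_by_confidence(detections, threshold=0.5):
    """Keep only detections whose confidence is at least the threshold."""
    return [d for d in detections if d["confidence"] >= threshold]


# Hypothetical detections from one video frame.
frame_detections = [
    {"label": "car", "confidence": 0.92},
    {"label": "car", "confidence": 0.41},
    {"label": "bus", "confidence": 0.67},
]
kept = filter_by_confidence(frame_detections, threshold=0.5)
```

Raising the threshold tends to increase precision and lower recall, and lowering it does the opposite.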

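mAP (mean average precision) comprehensively measures the detection effect. It is not a plain average of the precision and recall values: for each class, precision is plotted against recall as the confidence threshold varies, and the area under that curve is the AP (average precision) of the class. mAP is the mean of the AP values over all classes.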
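The area calculation can be sketched as follows. This is one common way to do it (all-point interpolation over the precision-recall curve), not necessarily the exact variant used by a given YOLO release. It assumes each detection has already been marked as a true or false positive, for example by checking its IOU against a ground-truth box; the function names and the input format are hypothetical.

```python
def average_precision(detections, num_ground_truth):
    """Area under the precision-recall curve for one class.

    detections: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of real objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0

    # Walk through detections from most to least confident.
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ordered:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Interpolate: precision at each point becomes the best precision
    # reachable at that recall or any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the rectangles under the interpolated curve.
    ap = 0.0
    prev_recall = 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values, given as {class_name: ap}."""
    return sum(ap_per_class.values()) / len(ap_per_class)
```

For example, with two real cars and three detections scored (0.9, correct), (0.8, wrong), (0.7, correct), the curve passes through recall 0.5 at precision 1.0 and recall 1.0 at precision 2/3, so the AP is 0.5 * 1.0 + 0.5 * 2/3 = 5/6.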

PS: The YOLO algorithm is fast and suitable for video recognition. Its detection effect is also evaluated comprehensively with mAP.
