Wu Enda (Andrew Ng) [Deep Learning Engineer] 04. Convolutional Neural Networks, Week 3: Object Detection (2) YOLO Algorithm

This note covers the third week of the "Convolutional Neural Networks" course: object detection (2), the YOLO algorithm.

The main contents are:

1. YOLO algorithm idea

2. Intersection over Union (IOU)

3. Non-maximum suppression

4. Anchor Box

5. YOLO algorithm example

YOLO algorithm idea

The basic sliding-window object detection algorithm cannot output accurate bounding boxes, because none of its fixed windows may line up exactly with the object. The YOLO (You Only Look Once) algorithm is designed to output accurate bounding boxes.

Algorithm idea: divide the image into an n×n grid and apply the image classification and localization algorithm to each grid cell.

Differences from sliding window object detection algorithms:

　　　　a. Sliding windows are replaced by a fixed grid. YOLO runs classification and localization for all grid cells at once, in a single convolutional pass, which needs far less computation than sliding windows.

　　　　b. An object may overlap several grid cells. YOLO finds the object's midpoint and assigns the object only to the cell that contains it. (b_h and b_w can be larger than a cell. In practice a finer grid, such as 19x19, is used, so an object usually spans several cells.)

　　　　c. The bounding box is output explicitly, so the network can produce boxes of any aspect ratio, with more precise coordinates than fixed windows allow.

*Because each grid cell has only one set of outputs, a cell can detect at most one object. In practice the grid is very fine, so this is rarely a problem.

Output of YOLO Algorithm

Because YOLO produces a prediction for every grid cell, the output in the video's example has shape 3x3x8:

3x3 is the grid the image is divided into.

8 is the length of each cell's output vector. It has the same layout as the label y used for classification with localization: y = [P_c, b_x, b_y, b_h, b_w, c₁, c₂, c₃]^T. Here P_c indicates whether an object is present, b_x, b_y, b_h and b_w describe its bounding box, and c₁, c₂, c₃ give its class.
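A minimal sketch of how the 3x3x8 training label could be built. It assumes each object is given as a midpoint (x, y) and a size (w, h), both as fractions of the whole image, plus a 0-based class index. It follows the video's convention: b_x and b_y are measured relative to the grid cell (0 to 1), while b_h and b_w are in units of the cell size and may exceed 1. The function name `encode_label` and the input format are hypothetical.

```python
def encode_label(objects, grid=3, num_classes=3):
    """Build a YOLO-style target of shape grid x grid x (5 + num_classes).

    objects: list of (x, y, w, h, cls). The midpoint (x, y) and size (w, h)
    are fractions of the whole image; cls is a 0-based class index.
    """
    depth = 5 + num_classes
    y = [[[0.0] * depth for _ in range(grid)] for _ in range(grid)]
    for x, yc, w, h, cls in objects:
        # The object belongs only to the cell that contains its midpoint
        col = min(int(x * grid), grid - 1)
        row = min(int(yc * grid), grid - 1)
        cell = y[row][col]
        cell[0] = 1.0               # P_c: an object is present
        cell[1] = x * grid - col    # b_x relative to the cell, in [0, 1)
        cell[2] = yc * grid - row   # b_y relative to the cell, in [0, 1)
        cell[3] = h * grid          # b_h in units of cell height (may exceed 1)
        cell[4] = w * grid          # b_w in units of cell width (may exceed 1)
        cell[5 + cls] = 1.0         # one-hot class c_1..c_n
    return y


# A car (class index 1) centered near the bottom middle of the image
label = encode_label([(0.5, 0.8, 0.4, 0.2, 1)])
print(label[2][1])
```

Only the bottom-middle cell, label[2][1], gets P_c = 1. Every other cell keeps P_c = 0, and the rest of its values do not matter.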

Intersection over Union (IOU)

IOU is the area of the intersection of two bounding boxes divided by the area of their union. It measures how much the two boxes overlap.

For two boxes A and B: $$\mathrm{IOU} = \frac{\text{area}(A \cap B)}{\text{area}(A \cup B)}$$

A prediction is generally considered correct if IOU >= α. By convention the threshold α is 0.5, but you can choose a different value for your application, for example a higher one when you need a stricter criterion.
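The IOU formula can be sketched in Python as follows. The sketch assumes each box is given by its top-left and bottom-right corners (x1, y1, x2, y2). The corner format and the name `iou` are choices made for this illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2) corners."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp at zero so that disjoint boxes get an empty intersection
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # The overlap is subtracted once so it is not counted twice in the union
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / (4 + 4 - 1) = 1/7
```

In this example, two 2x2 squares offset by 1 in each direction overlap in a 1x1 square. Their IOU is therefore 1/7, well below the usual 0.5 threshold.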

Non-maximum suppression

Non-maximum suppression suppresses predictions that are not local maxima, so that each object is detected only once.

YOLO has a practical problem. In theory each object in the picture has only one midpoint, but in practice several neighboring grid cells may each conclude that the object's midpoint lies in their own cell.

As a result, the same object is detected with multiple bounding boxes.

Non-maximum suppression keeps, for each object, only the box with the highest predicted P_c and suppresses the other boxes that overlap it.

How to apply non-maximum suppression

　　　　a. Discard all boxes with a low P_c (the video discards boxes with P_c <= 0.6)

　　　　while (there are remaining boxes):

　　　　　　b. Pick the remaining box with the largest P_c and output it as a prediction

　　　　　　c. Discard every remaining box that has a large IOU with the box just output (IOU >= 0.5 in the video)

*If there are multiple object classes (c₁, c₂, c₃), run non-maximum suppression separately for each class. Otherwise a box of one class could suppress an overlapping box of a different class.
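The procedure above, including the per-class loop, can be sketched as follows. This is a minimal illustration, not the course's reference implementation. Boxes are assumed to be `(P_c, class, (x1, y1, x2, y2))` tuples, and the thresholds default to 0.6 and 0.5, the values used in the video. The function names are hypothetical.

```python
def iou(a, b):
    """IOU of two (x1, y1, x2, y2) boxes."""
    inter = (max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
             * max(0.0, min(a[3], b[3]) - max(a[1], b[1])))
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, p_threshold=0.6, iou_threshold=0.5):
    """boxes: list of (p_c, cls, (x1, y1, x2, y2)); returns the boxes kept."""
    kept = []
    for cls in sorted({b[1] for b in boxes}):  # run NMS separately per class
        # a. discard low-probability boxes, then sort by P_c (highest first)
        remaining = sorted((b for b in boxes if b[1] == cls and b[0] > p_threshold),
                           key=lambda b: b[0], reverse=True)
        while remaining:
            best = remaining.pop(0)  # b. output the box with the largest P_c
            kept.append(best)
            # c. suppress boxes that overlap the output box too much
            remaining = [b for b in remaining if iou(b[2], best[2]) < iou_threshold]
    return kept


boxes = [(0.9, "car", (0, 0, 2, 2)),
         (0.8, "car", (0.1, 0.1, 2.1, 2.1)),        # duplicate detection of the same car
         (0.7, "pedestrian", (0, 0, 2, 2)),         # different class, same location
         (0.5, "car", (5, 5, 6, 6))]                # dropped: P_c too low
print(non_max_suppression(boxes))
```

The second car box is suppressed because its IOU with the first car box is about 0.82. The pedestrian box is processed in its own class loop, so it survives even though it overlaps the first car box completely (IOU = 1).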

Anchor Box

Anchor Boxes let a single grid cell detect multiple objects (for example, objects of different categories and shapes).

Anchor Box idea:

　　　　a. Predefine several anchor boxes of different shapes. Each anchor box has its own slot in the output, so each prediction is tied to one anchor box shape.

　　　　b. Each object is still assigned to the grid cell that contains its midpoint, but it is now also assigned to one anchor box: the one whose shape has the highest IOU with the object's bounding box.
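The assignment rule in step b can be sketched by comparing shapes only. If the object's box and each anchor box are centered on the same point, their IOU depends only on their widths and heights. The anchor sizes below (a tall box for pedestrians, a wide box for cars) are made-up values for illustration, and the function names are hypothetical.

```python
def shape_iou(wh_a, wh_b):
    """IOU of two boxes with the same center, given only (width, height)."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union


def assign_anchor(obj_wh, anchors):
    """Index of the anchor box whose shape has the highest IOU with the object."""
    return max(range(len(anchors)), key=lambda k: shape_iou(obj_wh, anchors[k]))


# Hypothetical anchor shapes (width, height) in units of grid cells
ANCHORS = [(1.0, 3.0),   # anchor box 1: tall and narrow
           (3.0, 1.0)]   # anchor box 2: wide and short

print(assign_anchor((0.8, 2.5), ANCHORS))  # pedestrian-like box, prints 0
print(assign_anchor((2.8, 1.2), ANCHORS))  # car-like box, prints 1
```

Indices are 0-based here, so index 0 is Anchor Box 1 and index 1 is Anchor Box 2.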

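So the label y (for the example in the video) stacks one 8-value vector per anchor box: y = [P_c, b_x, b_y, b_h, b_w, c₁, c₂, c₃, P_c, b_x, b_y, b_h, b_w, c₁, c₂, c₃]^T. The first group belongs to Anchor Box 1 and the second to Anchor Box 2. Each P_c says whether an object assigned to that anchor box shape is present.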

Benefits of using Anchor Boxes:

　　　　a. They handle two objects of different shapes appearing in the same grid cell. In practice, if the grid is fine enough, two objects rarely share one cell.

　　　　b. They let the algorithm specialize during supervised learning. If the objects in your data have characteristic shapes, some outputs learn to detect tall, thin objects such as pedestrians, and others learn to detect wide objects such as cars.

How to choose Anchor Boxes:

　　Anchor box shapes are usually chosen by hand to match the objects being detected. Typically 5 to 10 shapes are chosen so that together they cover the various objects you want to detect.

YOLO algorithm example

The sections above cover everything needed to build the YOLO algorithm. The following example ties all of these points together.

The example in the video detects pedestrians, cars and motorcycles in an image.

The output data shape is 3x3x16:

　　　　a. 3x3 is the grid the image is divided into

　　　　b. 16 (= 2x8), where 2 is the number of anchor boxes used and 8 is the number of output values per anchor box (P_c, b_x, b_y, b_h, b_w, c₁, c₂, c₃)
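The sketch below writes one object into the slots of a chosen anchor box, which makes the 16-value layout concrete. It assumes several things:

- The two 8-value groups are stored back to back, with Anchor Box 1 first.
- Indices are 0-based, so Anchor Box 2 is index 1.
- The class order is c₁ = pedestrian, c₂ = car, c₃ = motorcycle.

The helper names are hypothetical.

```python
NUM_ANCHORS = 2
PER_ANCHOR = 8  # P_c, b_x, b_y, b_h, b_w, c1, c2, c3


def empty_label(grid=3):
    """A grid x grid x 16 label with every P_c set to 0 (no objects)."""
    return [[[0.0] * (NUM_ANCHORS * PER_ANCHOR) for _ in range(grid)]
            for _ in range(grid)]


def set_object(label, row, col, anchor, box, cls):
    """Fill the 8 values that belong to `anchor` in cell (row, col).

    box is (b_x, b_y, b_h, b_w), already expressed relative to the cell.
    """
    values = [1.0, *box, 0.0, 0.0, 0.0]
    values[5 + cls] = 1.0  # one-hot class
    start = anchor * PER_ANCHOR
    label[row][col][start:start + PER_ANCHOR] = values


# A car (class index 1) in the bottom-middle cell, matched to Anchor Box 2 (index 1)
y = empty_label()
set_object(y, row=2, col=1, anchor=1, box=(0.5, 0.4, 0.6, 1.2), cls=1)
print(y[2][1])
```

The Anchor Box 1 half of that cell stays all zeros, with P_c = 0. This tells the network that no object with the tall, narrow shape is centered in that cell.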

Output label y:

Anchor Box selection:

Anchor Box 1 has a tall, narrow shape that fits pedestrians, and Anchor Box 2 has a wide shape that fits cars. No anchor box is reserved for motorcycles. Anchors are assigned by shape (the highest IOU), not by class, so a motorcycle is assigned to whichever anchor box its bounding box matches best, probably Anchor Box 2.

Train a convolutional neural network that maps the input image to the 3x3x16 output volume:

Finally, apply non-maximum suppression:

Before non-maximum suppression, every grid cell outputs two predicted bounding boxes, one per anchor box, each with its own P_c, and most of these P_c values are low.

What you need to do is:

　　　　a. Discard the predicted bounding boxes with a low probability P_c

　　　　b. For each of the three classes (pedestrian, car and motorcycle), run non-maximum suppression separately to produce the final predictions. For example, suppose a pedestrian is predicted both by an Anchor Box 1 box with a high P_c and by an overlapping Anchor Box 2 box with a much lower P_c. The Anchor Box 2 box is then suppressed.
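Step a can be sketched as decoding the 3x3x16 output into candidate boxes and dropping the low-probability ones. Per-class non-maximum suppression then works on the surviving candidates. The sketch makes these illustrative choices:

- It uses the same 16-value layout as the label.
- It picks the class with the largest c_i.
- It discards boxes with P_c <= 0.6.
- It converts each box to image-relative corners (x1, y1, x2, y2).

`decode` is a hypothetical name.

```python
def decode(output, grid=3, num_anchors=2, num_classes=3, p_threshold=0.6):
    """Turn a grid x grid x (num_anchors * (5 + num_classes)) output
    into a list of (p_c, cls, (x1, y1, x2, y2)) candidates."""
    per_anchor = 5 + num_classes
    candidates = []
    for row in range(grid):
        for col in range(grid):
            for a in range(num_anchors):
                v = output[row][col][a * per_anchor:(a + 1) * per_anchor]
                p_c, b_x, b_y, b_h, b_w = v[:5]
                if p_c <= p_threshold:
                    continue  # step a: discard low-probability boxes
                class_scores = v[5:]
                cls = max(range(num_classes), key=lambda i: class_scores[i])
                # Convert cell-relative midpoint and size to image-relative corners
                x = (col + b_x) / grid
                y = (row + b_y) / grid
                w = b_w / grid
                h = b_h / grid
                candidates.append((p_c, cls, (x - w / 2, y - h / 2, x + w / 2, y + h / 2)))
    return candidates


out = [[[0.0] * 16 for _ in range(3)] for _ in range(3)]
out[2][1][8:] = [0.9, 0.5, 0.5, 0.6, 1.2, 0.0, 1.0, 0.0]  # confident car, anchor 2
out[0][0][:8] = [0.3, 0.5, 0.5, 1.0, 1.0, 1.0, 0.0, 0.0]  # weak pedestrian, dropped
print(decode(out))
```

Only the car prediction survives the P_c threshold. It becomes a single candidate for the car class, ready for per-class non-maximum suppression.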
