Fourth Course: Convolutional Neural Networks (Week 3) - Object Detection

1. Object Localization

Classification with localization:
The goal is not only to determine whether there is a car in the picture, but also to mark its location by drawing a bounding box around it.
A softmax layer handles the classification, deciding whether the picture contains a pedestrian, a car, a motorcycle, or only background. In addition, the neural network can output a few more units that represent a bounding box: b_x, b_y, b_h, b_w. These four numbers parameterize the bounding box of the detected object.
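As a rough illustration of the output described above, the sketch below builds a training label for one image. The layout [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3], the `p_c` "object present" flag and the function name `make_label` are assumptions made for this example; the notes themselves only say that the network outputs class scores plus four box numbers.

```python
# Hypothetical label layout (an assumption for illustration):
# [p_c, b_x, b_y, b_h, b_w, c_pedestrian, c_car, c_motorcycle]
CLASSES = ["pedestrian", "car", "motorcycle"]


def make_label(class_name=None, box=None):
    """Return an 8-element label; all zeros when only background is present."""
    if class_name is None:
        # With no object, the box and class entries carry no information;
        # zeros are used here purely as filler.
        return [0.0] * (5 + len(CLASSES))
    b_x, b_y, b_h, b_w = box
    one_hot = [1.0 if name == class_name else 0.0 for name in CLASSES]
    return [1.0, b_x, b_y, b_h, b_w] + one_hot


# A car centred at (0.5, 0.7) with height 0.3 and width 0.4 (made-up values).
car_label = make_label("car", (0.5, 0.7, 0.3, 0.4))
background_label = make_label()
```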

2. Feature point detection

Landmark detection


3. Object Detection

Object Detection Based on Sliding Window

  • Crop out closely framed examples of the target and train a convolutional network on them
  • Scan the picture with a window of a fixed size, and feed the image region inside each window into the convolutional network for prediction
  • Change the window size and repeat the above steps

Disadvantages of the sliding window object detection algorithm:

  • The computational cost is high: with fine granularity or a small stride, the number of windows becomes very large, and the convolutional network has to process them one by one
  • With a large stride there are fewer windows, but the coarser granularity may hurt detection performance
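The trade-off between stride and cost can be made concrete with a small sketch. The helper names `sliding_windows` and `count_windows`, and the 28x28 image with a 14x14 window, are assumptions chosen for illustration only.

```python
def sliding_windows(img_h, img_w, win, stride):
    """Yield (top, left, bottom, right) for every square window that fits."""
    for top in range(0, img_h - win + 1, stride):
        for left in range(0, img_w - win + 1, stride):
            yield top, left, top + win, left + win


def count_windows(img_h, img_w, win, stride):
    """Number of crops the classifier must process at one window size."""
    if win > img_h or win > img_w:
        return 0
    rows = (img_h - win) // stride + 1
    cols = (img_w - win) // stride + 1
    return rows * cols


# Same image and window size, two different strides.
coarse = count_windows(28, 28, 14, stride=8)
fine = count_windows(28, 28, 14, stride=1)
```

With these made-up sizes the small stride produces dozens of times more crops than the large one, and every crop is a separate forward pass when the windows are processed one by one.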

4. Convolutional Implementation of Sliding Windows

Convolve the entire image to get all the predicted values at once. If you are lucky, the neural network can identify the location of the target.
Implementing the sliding window algorithm convolutionally improves the efficiency of the whole algorithm, but it still has one disadvantage: the location of the bounding box may not be accurate enough.

5. Bounding Box Prediction (YOLO)

In the sliding window method, none of the discrete bounding boxes may perfectly match the car's position.

An algorithm that produces more accurate bounding boxes is YOLO, which stands for "You Only Look Once". It was proposed by Joseph Redmon, Santosh Divvala, Ross Girshick and Ali Farhadi.

One advantage of the YOLO algorithm is that it has a convolutional implementation, so it runs very fast and can achieve real-time recognition.
There are other, more elaborate ways to parameterize the bounding box, which may work better.
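One common way to express a box relative to a grid cell can be sketched as follows. This is an assumption for illustration, not necessarily the exact parameterization used in the lecture: the input box is given as a centre, width and height normalized to the whole image, the grid is 3x3, and `encode_box` is a hypothetical helper name.

```python
def encode_box(cx, cy, w, h, grid=3):
    """Map an image-normalized box to (row, col, (b_x, b_y, b_h, b_w)).

    b_x and b_y are the centre's offset inside its grid cell (0 to 1);
    b_h and b_w are measured in units of one cell, so they can exceed 1.
    """
    # The cell that contains the box centre is responsible for the object.
    col = min(int(cx * grid), grid - 1)
    row = min(int(cy * grid), grid - 1)
    b_x = cx * grid - col
    b_y = cy * grid - row
    b_w = w * grid
    b_h = h * grid
    return row, col, (b_x, b_y, b_h, b_w)


# A box centred in the middle of the image lands in the central cell.
row, col, box = encode_box(0.5, 0.5, w=0.3, h=0.2)
```

Because the centre offsets are continuous values rather than window positions, the predicted box is not restricted to the discrete locations a sliding window can take.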

6. Intersection over Union (IoU)

The Intersection over Union (IoU) function computes the ratio of the area of intersection of two bounding boxes to the area of their union.
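A minimal sketch of this computation, assuming boxes are given as corner coordinates (x1, y1, x2, y2) with x1 < x2 and y1 < y2; the corner format and the function name `iou` are choices made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes sharing a 1x1 corner: intersection 1, union 7.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

The result is 1 for identical boxes and 0 for boxes that do not overlap at all.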

7. Non-max suppression

The algorithm may make multiple detections of the same object. Non-maximum suppression ensures that each object is detected only once.

"Non-maximum" means that you output only the detection with the highest probability; "suppression" means that nearby predictions that overlap it but are not the maximum are discarded.
If you try to detect three object classes at the same time, such as pedestrians, cars and motorcycles, the output vector will have three additional components.
It turns out that the correct thing to do is to run non-maximum suppression three times independently, once for each output class.
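The procedure can be sketched as a greedy loop. Everything specific here is an assumption for illustration: detections are (score, box) pairs with boxes as (x1, y1, x2, y2), the overlap threshold is 0.5, and the names `non_max_suppression` and `nms_per_class` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes that overlap it, repeat."""
    remaining = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Suppress every remaining box that overlaps the chosen one too much.
        remaining = [d for d in remaining if iou(best[1], d[1]) < iou_threshold]
    return kept


def nms_per_class(detections_by_class, iou_threshold=0.5):
    """Run suppression independently for each class."""
    return {
        cls: non_max_suppression(dets, iou_threshold)
        for cls, dets in detections_by_class.items()
    }


detections = {
    "car": [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 11, 11)), (0.7, (50, 50, 60, 60))],
    "pedestrian": [(0.6, (0, 0, 10, 10))],
}
final = nms_per_class(detections)
```

In this made-up input the two overlapping car boxes collapse to the one with the higher score, while the pedestrian box at the same location survives because each class is processed separately.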

8. Anchor Boxes

There are two main reasons for introducing anchor boxes:

  • Without them, a grid cell can detect only one object

  • Without them, objects of different scales and shapes cannot be handled well

Situations that the algorithm does not handle well:

  • You have two anchor boxes but three objects in the same grid cell

  • Two objects are assigned to the same grid cell and their anchor box shapes are the same

These cases occur rarely, so the impact on performance should be small.

How to choose the anchor box?

  • Generally, the anchor box shapes are specified by hand; 5 to 10 shapes are chosen to cover a variety of object shapes.
  • The k-means algorithm can cluster the object shapes in the training data and select the most representative set of anchor boxes. This is a more advanced way to select anchor boxes automatically.
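Once a set of anchor shapes has been chosen, each object is typically matched to the anchor whose shape overlaps it most. The sketch below assumes shapes are (width, height) pairs compared as if they shared the same centre; the two anchor shapes and the names `shape_iou` and `best_anchor` are made up for illustration.

```python
def shape_iou(shape_a, shape_b):
    """IoU of two (width, height) shapes aligned at a common centre."""
    inter = min(shape_a[0], shape_b[0]) * min(shape_a[1], shape_b[1])
    union = shape_a[0] * shape_a[1] + shape_b[0] * shape_b[1] - inter
    return inter / union if union > 0 else 0.0


def best_anchor(object_shape, anchors):
    """Index of the anchor whose shape has the highest IoU with the object."""
    scores = [shape_iou(object_shape, anchor) for anchor in anchors]
    return scores.index(max(scores))


# Hypothetical anchors: one tall and narrow, one short and wide.
anchors = [(1.0, 2.0), (2.0, 1.0)]
pedestrian_like = best_anchor((0.9, 2.1), anchors)
car_like = best_anchor((2.2, 0.9), anchors)
```

A tall object and a wide object whose centres fall in the same grid cell are therefore assigned to different anchors, which is what lets one cell report both.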

9. YOLO algorithm

  • Training
  • Prediction
  • Non-maximum suppression:
    If two anchor boxes are used, each of the 9 grid cells produces two predicted bounding boxes, some of which have a low probability.
    Next, discard the predictions with very low probability.
    If you want to detect three classes (pedestrians, cars and motorcycles), run non-maximum suppression separately for each class, three times in total, to get the final predictions.
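The size of the prediction and the first filtering step can be sketched as follows. The per-anchor layout (one confidence value, four box numbers, then one score per class), the 0.6 cutoff, the dictionary format and both function names are assumptions made for this example.

```python
def yolo_output_shape(grid_size, num_anchors, num_classes):
    """Shape of the output volume for a square grid."""
    # Assumed per-anchor layout: p_c, b_x, b_y, b_h, b_w, then class scores.
    per_anchor = 5 + num_classes
    return (grid_size, grid_size, num_anchors * per_anchor)


def drop_low_probability(predictions, threshold=0.6):
    """Discard predicted boxes whose confidence is below the threshold."""
    return [p for p in predictions if p["p_c"] >= threshold]


# 3x3 grid (9 cells), two anchor boxes, three classes.
shape = yolo_output_shape(3, 2, 3)

# Two boxes predicted by one grid cell (made-up values).
cell_predictions = [
    {"p_c": 0.85, "box": (0.4, 0.5, 0.9, 1.6), "cls": "car"},
    {"p_c": 0.03, "box": (0.5, 0.5, 1.8, 0.7), "cls": "pedestrian"},
]
confident = drop_low_probability(cell_predictions)
```

Only the boxes that survive this filter need to be passed on to the per-class non-maximum suppression step.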

10. Candidate Regions

Candidate regions are an interesting idea, but this method requires two steps:

  • First, obtain the candidate regions
  • Then classify them

In contrast, an algorithm like YOLO (You Only Look Once) does everything in one step, and the instructor considers YOLO more promising in the long run. URL: https://pjreddie.com/darknet/yolo/

Origin blog.csdn.net/qq_42859149/article/details/119915564