AI Practical Training Camp: Object Detection and MMDetection

1. The Basic Paradigm of Object Detection

Object Detection vs. Image Classification

Image classification: there is usually a single object, typically located in the center of the image and occupying most of its area.
Object detection: the number, locations, and sizes of objects are not fixed.

Similarity: both require the algorithm to understand image content, with a neural network learning the distribution of the images.

Application areas of object detection

The development of object detection

Sliding window

Traditional object detection algorithms typically slide a fixed-size window across the image and classify each position in turn. This brute-force, exhaustive approach is very time-consuming and demands significant compute, and it becomes especially cumbersome when windows of multiple sizes or image scales must also be traversed. The main flow is: slide the window over the image, classify the contents of each window, and keep the windows judged to contain objects.
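The exhaustive enumeration described above can be sketched in a few lines. This is a minimal illustration, not a full detector: the window size, stride, and image size are arbitrary choices, and the per-window classifier is omitted.

```python
import numpy as np

def sliding_windows(image, window=(64, 64), stride=32):
    """Enumerate fixed-size windows over an image (brute-force search).

    In a real pipeline, each window crop would be fed to a classifier,
    which is exactly why this approach is exhaustive and slow.
    Returns boxes as (x, y, w, h) tuples.
    """
    h, w = image.shape[:2]
    boxes = []
    for y in range(0, h - window[0] + 1, stride):
        for x in range(0, w - window[1] + 1, stride):
            boxes.append((x, y, window[1], window[0]))
    return boxes

# A 256x256 image with 64x64 windows and stride 32 yields 7 * 7 = 49 windows.
boxes = sliding_windows(np.zeros((256, 256, 3)), window=(64, 64), stride=32)
```

Note how quickly the count grows: adding more scales and aspect ratios multiplies the number of windows, each of which must be classified separately.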

Using convolutions for dense prediction

The most representative family is the R-CNN series. It first enumerates candidate regions in the original image, resizes these regions to a uniform input size, and feeds them into the network for learning.
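The crop-and-resize step can be sketched as follows. This is an illustrative NumPy version using nearest-neighbour resizing (R-CNN itself warps crops with proper interpolation); the 224x224 output size is an assumption matching common CNN inputs.

```python
import numpy as np

def crop_and_resize(image, box, out_size=(224, 224)):
    """Crop one candidate region and rescale it to a fixed input size.

    box is (x, y, w, h) in pixels. Nearest-neighbour index mapping is
    used here for brevity; real pipelines use bilinear interpolation.
    """
    x, y, w, h = box
    crop = image[y:y + h, x:x + w]
    # Map each output row/column back to a source row/column.
    ys = np.arange(out_size[0]) * h // out_size[0]
    xs = np.arange(out_size[1]) * w // out_size[1]
    return crop[ys][:, xs]

# Every proposal, whatever its shape, becomes a uniform network input.
patch = crop_and_resize(np.zeros((480, 640, 3)), (10, 20, 100, 50))
```

Because every proposal is processed independently, the original R-CNN runs the CNN once per region, which is the main cost that later variants (Fast/Faster R-CNN) remove by sharing the feature map.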

Anchor boxes

Anchor boxes are used in the Faster R-CNN network, mainly in its RPN, which generates a large number of anchor boxes. The network then regresses and classifies these anchors to predict the final targets. This algorithm improved accuracy considerably over its predecessors.
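A minimal sketch of anchor generation, following the Faster R-CNN recipe of 3 scales x 3 aspect ratios = 9 anchors per feature-map location. The base size, scales, and ratios below are the commonly cited defaults, used here as illustrative assumptions.

```python
import numpy as np

def make_anchors(base=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Generate the 9 reference anchors for one feature-map location.

    Each anchor keeps the area (base * scale)^2 while varying the
    aspect ratio r = h / w. Returns (cx, cy, w, h) boxes centred
    at the origin; in practice they are shifted to every location.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            area = (base * s) ** 2
            w = np.sqrt(area / r)  # width shrinks as h/w grows
            h = w * r
            anchors.append((0.0, 0.0, w, h))
    return np.array(anchors)

anchors = make_anchors()  # shape (9, 4): one row per scale/ratio pair
```

The RPN never searches box shapes from scratch: it only learns small offsets from these fixed references, which is what makes the regression problem tractable.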

2. One-stage & Anchor-free Detectors

RPN

Faster R-CNN uses an RPN to propose RoIs (Regions of Interest), instead of the traditional region-proposal algorithm used by Fast R-CNN (which is extremely time-consuming). This is the essence of Faster R-CNN.

Use a 3x3 filter to convolve the feature map, making the extracted features more robust. Note: each pixel of the feature map extracted by the CNN here is 256-dimensional, because the last CNN layer uses 256 filters.
Then apply 1x1 filters, 18 in total, so the resulting map has 18 channels, which is then reshaped. Taking one image as an example, the output has shape (W, H, D=18). We reshape it for softmax: one dimension of the matrix must equal the number of classes, and here it is a binary classification (object vs. no object), so that dimension is 2. We therefore reshape (W, H, 18) into (2, 9 * W * H). This is important!
Then we apply softmax and obtain two scores for each of the 9 * W * H anchors: one for object and one for background.
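The reshape-then-softmax step above can be mocked in NumPy. A 1x1 convolution is just a per-pixel matrix multiply, so random stand-in weights suffice to demonstrate the shapes; this is not a trained model, and the small 4x6 feature map is an arbitrary example.

```python
import numpy as np

def rpn_objectness(feature_map, w_cls):
    """Mock of the RPN classification branch.

    feature_map: (C, H, W) with C = 256 as in the paper.
    w_cls: (18, C) weights of the 1x1 conv (18 = 2 classes * 9 anchors).
    The (18, H, W) logits are reshaped to (2, 9*H*W) so that softmax
    runs over the 2 object/background classes for every anchor.
    """
    c, h, w = feature_map.shape
    logits = np.einsum('oc,chw->ohw', w_cls, feature_map)  # 1x1 conv
    logits = logits.reshape(2, 9 * h * w)
    e = np.exp(logits - logits.max(axis=0))  # numerically stable softmax
    return e / e.sum(axis=0)                 # each column sums to 1

rng = np.random.default_rng(0)
scores = rpn_objectness(rng.normal(size=(256, 4, 6)),
                        rng.normal(size=(18, 256)))
```

After the softmax, each of the 9 * H * W anchors carries an object score and a background score summing to 1, which is exactly what the proposal-ranking step consumes.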

SSD

You can refer to my previous blog post: https://blog.csdn.net/shengweiit/article/details/130769672

Origin blog.csdn.net/shengweiit/article/details/131105815