Object detection and MMDetection
The basic paradigm of object detection
Object Detection VS Image Classification
Image classification: there is usually a single object, typically located in the center of the image and occupying most of it.
Object detection: the number, location, and size of objects are not fixed.
Similarity: both require the algorithm to understand image content, and both rely on neural networks to learn the distribution of images.
Application areas of object detection
Development path of object detection
Sliding window
Traditional object detection algorithms typically slide a fixed-size window over the image and process each position in turn. This is a brute-force, exhaustive search: it is very time-consuming and demands considerable computing power, and traversing every position again on rescaled copies of the image makes it even more cumbersome. The main flow of the algorithm is: slide the window over the image, classify the crop at each position, and keep the windows that are classified as containing an object.
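The sliding-window idea can be sketched in a few lines of NumPy. This is only a minimal illustration: the window size, the stride, the threshold, and `toy_classifier` (mean brightness standing in for a real classifier) are all assumptions made for this example and do not come from any particular detector.

```python
import numpy as np

def sliding_windows(image, window=64, stride=32):
    """Yield (x, y, crop) for every fixed-size window over the image."""
    height, width = image.shape[:2]
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            yield x, y, image[y:y + window, x:x + window]

def toy_classifier(crop):
    """Hypothetical stand-in for a real classifier: mean brightness as score."""
    return float(crop.mean())

def detect(image, window=64, stride=32, threshold=0.5):
    """Score every window and keep those whose score exceeds the threshold."""
    detections = []
    for x, y, crop in sliding_windows(image, window, stride):
        score = toy_classifier(crop)
        if score > threshold:
            detections.append((x, y, window, window, score))
    return detections

# Toy image: a bright square plays the role of the object.
image = np.zeros((128, 128), dtype=np.float32)
image[64:128, 64:128] = 1.0

num_windows = sum(1 for _ in sliding_windows(image))
detections = detect(image)
print(num_windows, detections)
```

Even on this tiny image, every window has to be classified separately, and the count grows quickly with smaller strides and additional scales. That is why the approach is so expensive.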
Using convolutions for dense prediction
The most representative networks are the R-CNN series. R-CNN first enumerates candidate object regions (region proposals) in the original image, resizes the cropped regions to a uniform size, and feeds them to the network for learning.
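The crop-and-resize step described above might look roughly like this. It is a sketch under assumptions: the proposal boxes are made up, the 224×224 output size is only illustrative, and nearest-neighbour resizing is used to keep the example dependency-free (a real pipeline would normally use an image library).

```python
import numpy as np

def resize_nearest(crop, out_h, out_w):
    """Resize an (H, W, C) array with nearest-neighbour sampling."""
    in_h, in_w = crop.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return crop[rows[:, None], cols[None, :]]

def warp_proposals(image, proposals, size=(224, 224)):
    """Crop each (x1, y1, x2, y2) proposal and resize it to a uniform size."""
    crops = [resize_nearest(image[y1:y2, x1:x2], *size)
             for x1, y1, x2, y2 in proposals]
    return np.stack(crops)

image = np.random.default_rng(0).random((300, 400, 3))
proposals = [(10, 20, 110, 220), (200, 50, 390, 120)]  # made-up boxes
batch = warp_proposals(image, proposals)
print(batch.shape)
```

Because every proposal ends up with the same shape, the crops can be stacked into one batch and passed through the network together.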
Anchor box
The anchor box technique is used in Faster R-CNN, mainly in its RPN (Region Proposal Network), which places a large number of anchor boxes on the image. The anchor boxes are regressed and classified to produce the final predictions, and this considerably improves accuracy over earlier algorithms.
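To make the anchor idea more concrete, here is a small NumPy sketch that generates 9 anchors per feature-map location and applies regression offsets to them. The stride of 16, the three scales, and the three aspect ratios are assumptions taken from commonly cited Faster R-CNN settings. The function names `make_anchors` and `decode` are hypothetical and are not an MMDetection API.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * 9, 4) anchors as (cx, cy, w, h)."""
    shapes = []
    for scale in scales:
        for ratio in ratios:              # ratio = height / width
            w = scale / np.sqrt(ratio)    # area stays close to scale ** 2
            h = scale * np.sqrt(ratio)
            shapes.append((w, h))
    shapes = np.array(shapes)                                  # (9, 2)
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)                               # (feat_h, feat_w) each
    centers = np.stack([cx, cy], axis=-1).reshape(-1, 1, 2)    # (H*W, 1, 2)
    centers = np.broadcast_to(centers, (centers.shape[0], len(shapes), 2))
    sizes = np.broadcast_to(shapes, centers.shape)
    return np.concatenate([centers, sizes], axis=-1).reshape(-1, 4)

def decode(anchors, deltas):
    """Apply (dx, dy, dw, dh) regression offsets to (cx, cy, w, h) anchors."""
    cx = anchors[:, 0] + deltas[:, 0] * anchors[:, 2]
    cy = anchors[:, 1] + deltas[:, 1] * anchors[:, 3]
    w = anchors[:, 2] * np.exp(deltas[:, 2])
    h = anchors[:, 3] * np.exp(deltas[:, 3])
    return np.stack([cx, cy, w, h], axis=1)

anchors = make_anchors(feat_h=2, feat_w=3)
boxes = decode(anchors, np.zeros_like(anchors))  # zero offsets leave anchors unchanged
print(anchors.shape)
```

The network only has to predict small offsets relative to each anchor rather than absolute box coordinates, which is the "anchor box regression" mentioned above.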
2. One-stage & Anchor-free Detector
RPN
Faster R-CNN uses an RPN to propose RoIs (Regions of Interest), instead of the traditional, extremely time-consuming region proposal algorithm used by Fast R-CNN. This is the essence of Faster R-CNN.
First, convolve the feature map with a 3×3 filter to make the extracted features more robust. Note: the feature map extracted by the CNN has a 256-dimensional feature vector at each location, because the last layer of the CNN uses 256 filters.
Next, convolve with 1×1 filters, 18 in total (9 anchors per location × 2 classes). The resulting feature map is 18-dimensional and is then reshaped. Taking one image as an example, the shape of the map is (W, H, D=18). We reshape it for softmax: one axis of the matrix fed to softmax must equal the number of classes, and here the classification is binary (whether or not the anchor contains an object), so that axis is 2. We therefore reshape (W, H, D) into (2, 9×W×H). This step is important!
Then we apply softmax and get two scores for each of the 9×W×H anchors: one for containing an object and one for not containing an object.
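The 1×1 convolution, reshape, and softmax steps can be checked with a small NumPy sketch. The weights here are random, the feature map is tiny, and the arrays use NumPy's (H, W, D) order instead of (W, H, D). The channel layout is also an assumption for illustration: the first 9 channels are taken as one class and the last 9 as the other. Real implementations may order the channels differently.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
H, W, C, A = 4, 5, 256, 9      # small feature map; A = anchors per location

features = rng.standard_normal((H, W, C))          # output of the 3x3 convolution
weights = rng.standard_normal((C, 2 * A)) * 0.01   # a 1x1 conv is a per-location matmul
bias = np.zeros(2 * A)

logits = features @ weights + bias                 # (H, W, 18)

# Split the 18 channels into (2 classes, 9 anchors), move the class axis
# to the front, and flatten everything else: (2, 9 * W * H).
scores = logits.reshape(H, W, 2, A).transpose(2, 0, 1, 3).reshape(2, -1)

probs = softmax(scores, axis=0)                    # two probabilities per anchor
print(logits.shape, scores.shape, probs.sum(axis=0)[:3])
```

Each column of `probs` belongs to one anchor and sums to 1, so one row can be read as the score for "object" and the other as the score for "no object".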
SSD
You can refer to my previous blog post: https://blog.csdn.net/shengweiit/article/details/130769672