In-Depth Series: History of Object Detection (3): SPP-Net in Detail

Back to contents: History of Object Detection

Previous: In-Depth Series: History of Object Detection (2): R-CNN in Detail

Next: In-Depth Series: History of Object Detection (4): From Fast R-CNN to Faster R-CNN in Detail

 

Paper: "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition"

 

This part covers SPP-Net object detection in detail; the next part covers Fast R-CNN through Faster R-CNN.

 

III. SPP-Net Object Detection (2014)

1. SPP-Net was proposed by Kaiming He et al. at MSRA. The main idea is to remove the crop/warp operations on the original image and replace them with a spatial pyramid pooling (SPP) layer applied to the convolutional features. The SPP layer is introduced mainly because the fully connected layers of a CNN require inputs of a fixed size, while real input images come in different sizes. If images are simply scaled to the same size, an object may fill the whole image in some cases and occupy only a corner in others. The traditional solution is to crop at different locations, but this causes problems of its own: a crop can cut off part of an object, and a warp can stretch an object until it is severely deformed. SPP is designed to solve this. It first extracts a fixed-dimension feature from the whole feature map, then divides the map into 4 parts and extracts a feature of the same dimension from each, then divides it into 16 parts, and so on. Whatever the image size, the extracted feature has the same dimension, so it can be fed to the fully connected layers uniformly.
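The pooling scheme described above (1 part, then 4 parts, then 16 parts) can be sketched in a few lines of NumPy. This is only a minimal illustration, not the authors' implementation: the function name `spatial_pyramid_pool`, the use of max pooling, and the way bin edges are rounded are assumptions made for the example.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into a fixed-length vector.

    For each pyramid level n the map is split into an n x n grid of bins
    and every bin is max-pooled per channel, so the output length is
    C * sum(n * n for n in levels), independent of H and W.
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Assumed rounding: floor for the start, ceil for the end,
            # which keeps every bin non-empty even when H or W < n.
            r0 = (i * height) // n
            r1 = ((i + 1) * height + n - 1) // n
            for j in range(n):
                c0 = (j * width) // n
                c1 = ((j + 1) * width + n - 1) // n
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)

# Two feature maps of different spatial size give vectors of equal length.
a = spatial_pyramid_pool(np.random.rand(8, 13, 13))
b = spatial_pyramid_pool(np.random.rand(8, 7, 20))
print(a.shape, b.shape)
```

With levels 1, 2 and 4 there are 1 + 4 + 16 = 21 bins, so a map with C channels always yields a vector of length 21 * C, which is what lets the fully connected layers accept it.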

 

2. Main steps of SPP-Net

   (1) Region proposal

         Generate about 2k candidate boxes from the original image with selective search.

   (2) Region size scaling

         SPP-Net no longer normalizes the region size. Instead the image is scaled so that min(w, h) = s, i.e. the shorter of its width and height is set to a common length s. The value s is selected from {480, 576, 688, 864, 1200}, and the selection criterion is that the scaled candidate box is as close as possible to 224 x 224 in size.

   (3) Feature extraction

         Features are extracted with the SPP-Net network structure, in which an SPP layer is inserted between the conv layers and the fully connected layers.

   (4) Classification and regression

         As in R-CNN, an SVM classifier is trained on the features above, and bounding-box regression is used to fine-tune the positions of the bounding boxes.

    SPP-Net solves the bias that crop/warp introduces on R-CNN's region proposals: with the SPP layer, the input candidate boxes can be of any size. Everything else, however, remains the same as in R-CNN, so many problems are still left.
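Step (2) above can be sketched as a small selection function. This is a hedged illustration rather than the authors' code: the function name `select_scale` and its parameters are hypothetical, and "closest to 224 x 224" is measured here by pixel count (box area), which is an assumption about how closeness is defined.

```python
SCALES = (480, 576, 688, 864, 1200)

def select_scale(image_w, image_h, box_w, box_h, scales=SCALES, target=224):
    """Pick s so that, once the image is resized to min(w, h) = s,
    the candidate box is closest in area to target x target."""
    short_side = min(image_w, image_h)

    def area_gap(s):
        factor = s / short_side  # the resize factor applies to the box too
        # Assumption: closeness is measured by area (number of pixels).
        return abs((box_w * factor) * (box_h * factor) - target * target)

    return min(scales, key=area_gap)

# A small box needs a large scale, a large box a small one.
print(select_scale(500, 375, 100, 100))
print(select_scale(500, 375, 300, 300))
```

Small candidate boxes are therefore taken from an enlarged copy of the image and large boxes from a smaller copy, so that every box covers roughly the same number of pixels when its features are pooled.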

 

3. SPP-Net flowchart

 

     

       To describe the SPP layer better, I use two images. SPP performs multi-resolution (pyramid-shaped) pooling on the feature maps of each ROI to obtain feature maps of a fixed size, which are then reshaped and fed into the fully connected layers.

        The figure from the paper is as follows:

 

4. In R-CNN, the bounding boxes come in different sizes, and after the convolutional layers the CNN still needs fully connected layers; inputs of different sizes clearly cannot work with fully connected layers. Therefore every bounding box must go through crop (clipping) or warp (stretching) to a fixed input size before entering the CNN. Although the R-CNN authors tried different variants of these operations to improve accuracy, there is no denying that the image is cut or distorted to some extent, which affects the final result. The idea of SPP-Net is that, whatever the size of the input image, pooling layers at different scales turn its features into a fixed size. SPP-Net feeds the entire image through the CNN to obtain the feature maps of the whole image, then maps the bounding boxes produced by selective search (these bounding boxes are in fact the ROIs) onto those feature maps, applies the SPP operation to each mapped region, passes the result through the FC layers, and finally performs classification and regression to get the result.
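The step of mapping a bounding box onto the feature maps can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `project_roi` is hypothetical, the total stride of 16 depends on the network and is assumed here, and the floor/ceil rounding is a simplification rather than the exact rule used in the paper.

```python
import math

def project_roi(box, stride=16):
    """Map an image-space box (x0, y0, x1, y1) to feature-map indices.

    Returns half-open bounds (fx0, fy0, fx1, fy1) so that the region is
    feature_map[:, fy0:fy1, fx0:fx1]. The stride value and the rounding
    are assumptions made for this sketch.
    """
    x0, y0, x1, y1 = box
    fx0 = math.floor(x0 / stride)
    fy0 = math.floor(y0 / stride)
    # Keep at least one feature cell so that pooling never sees an empty region.
    fx1 = max(fx0 + 1, math.ceil(x1 / stride))
    fy1 = max(fy0 + 1, math.ceil(y1 / stride))
    return fx0, fy0, fx1, fy1

print(project_roi((32, 48, 200, 130)))
print(project_roi((5, 5, 10, 10)))
```

Because the convolution is computed once for the whole image, each of the roughly 2k candidate boxes only costs one such index computation plus a pooling pass over its region, instead of a full forward pass through the conv layers.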

 

5. SPP-Net runs the conv operation on the input image only once, which greatly improves efficiency compared with R-CNN. The highlight of SPP-Net is the SPP layer: it accepts inputs of different sizes and still produces features of the same size, which makes multi-scale training and testing more convenient. SPP-Net removes much of the redundant computation in R-CNN. However, because it still follows the structure of R-CNN, it still uses an SVM classifier and a separate bbox regression.

 

6. Object detection results from the paper

 

 

                  

 


Origin blog.csdn.net/qq_38299170/article/details/104470644