In-depth series - History of object detection (2): R-CNN object detection explained


Back to the directory: object detection history

Previous: In-depth series - History of object detection (1): classical object detection

Next: In-depth series - History of object detection (3): SPP-Net object detection explained

 

Paper: "Rich feature hierarchies for accurate object detection and semantic segmentation"

 

This section explains R-CNN object detection in detail; the next section explains SPP-Net object detection.

 

II. R-CNN object detection (2013)

1. R-CNN (Region-based Convolutional Neural Networks) is an object detection method that combines region proposals with a convolutional neural network (CNN). It was released by Ross Girshick in 2013. Ross Girshick is a towering figure in this field: R-CNN, Fast R-CNN, Faster R-CNN, and YOLO all involve him. Before R-CNN, many researchers had already tried to apply deep learning to object detection, OverFeat among them, but R-CNN was the first solution that could truly be applied at an industrial level.

 

2. Deep-learning object detection methods fall roughly into two camps:

   (1) Region-proposal based

         For example R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN, R-FCN

   (2) End-to-end

         No region proposals are needed, for example SSD, YOLO, CornerNet.

 

3. R-CNN main steps

   (1) Region proposal

         About 2k candidate bounding boxes are extracted from the original image by selective search.

   (2) Region size normalization

         The image content of every candidate bounding box is scaled to a fixed size (the original paper uses 227 x 227).

   (3) Feature extraction

         A CNN extracts the features of each normalized region.

   (4) Classification and regression

         Two fully connected layers are added on top of the base layers. SVMs then classify the features, and linear regression fine-tunes the position and size of each box, with a separate bounding-box regressor trained for each class.
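The box fine-tuning in step (4) can be sketched as follows. This is a minimal illustration, assuming the center/size parameterization described in the R-CNN paper, where the regressor predicts four offsets for a proposal; the function name `apply_box_deltas` and the sample numbers are hypothetical.

```python
import math

def apply_box_deltas(proposal, deltas):
    """Refine a proposal box with predicted regression offsets.

    proposal: (cx, cy, w, h) center, width and height of the proposal.
    deltas:   (dx, dy, dw, dh) outputs of the class-specific regressor.
    """
    cx, cy, w, h = proposal
    dx, dy, dw, dh = deltas
    new_cx = w * dx + cx        # center shift is scaled by the box width
    new_cy = h * dy + cy        # center shift is scaled by the box height
    new_w = w * math.exp(dw)    # width is rescaled in log space
    new_h = h * math.exp(dh)    # height is rescaled in log space
    return (new_cx, new_cy, new_w, new_h)

# Hypothetical proposal and offsets, for illustration only
refined = apply_box_deltas((100.0, 80.0, 50.0, 40.0), (0.1, -0.05, 0.0, 0.0))
print(refined)
```

With zero size offsets the box keeps its width and height and only its center moves, which is why the offsets can stay small numbers regardless of how large the proposal is.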

 

4. R-CNN flow chart

      Viewed from another angle:

In the figure above, "warped image regions" refers to region size normalization: the content of each bounding box is cropped or warped so that every region ends up the same size. As a result, some regions are stretched, which distorts the image.
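The distortion can be made concrete with a small sketch. It assumes a plain nearest-neighbor resize on a toy image stored as nested lists; `warp_region` is a hypothetical helper written for this illustration, not the paper's implementation.

```python
def warp_region(image, box, size=227):
    """Crop `box` from `image` and stretch it to size x size (nearest neighbor).

    image: 2-D list of pixel values, indexed image[row][col].
    box:   (x0, y0, x1, y1) with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = box
    box_w, box_h = x1 - x0, y1 - y0
    warped = []
    for r in range(size):
        src_r = y0 + (r * box_h) // size   # source row for output row r
        row = [image[src_r][x0 + (c * box_w) // size] for c in range(size)]
        warped.append(row)
    return warped

# Toy image with 100 rows and 300 columns; each pixel stores its column index
image = [[col for col in range(300)] for _ in range(100)]
patch = warp_region(image, (0, 0, 300, 100))   # a wide 3:1 box

scale_x = 227 / 300   # horizontal scale factor
scale_y = 227 / 100   # vertical scale factor
print(len(patch), len(patch[0]), round(scale_y / scale_x, 2))
```

The wide box is scaled about three times more strongly in the vertical direction than in the horizontal one, so the object inside it no longer has its true proportions.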

  In addition, the convolution is run separately on every size-normalized region. Because there are about 2k bounding boxes and many of them are likely to overlap, feature extraction repeats a large amount of work, which costs a great deal of time and space.
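A rough back-of-the-envelope calculation shows the scale of this repetition. The 2k proposals and the 227 x 227 input come from the text; the 500 x 375 image size is an assumption chosen only for illustration.

```python
num_proposals = 2000          # "about 2k" proposals, from the text
warp_side = 227               # fixed CNN input size, from the text
image_h, image_w = 375, 500   # assumed image size, for illustration only

# Pixels fed to the CNN when every proposal is processed separately
pixels_through_cnn = num_proposals * warp_side * warp_side
# Pixels in the original image
image_pixels = image_h * image_w

ratio = pixels_through_cnn / image_pixels
print(pixels_through_cnn, image_pixels, round(ratio, 1))
```

Under these assumptions the CNN sees several hundred times more input pixels than the image contains, which gives a sense of how much of the per-region work covers the same image content.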

   Furthermore, an SVM is limited in the amount of data it can handle. With massive data it is easy to exceed what an SVM can cope with: the computation becomes too heavy, the model too large, and the efficiency too low.

 

5. R-CNN differs from classical object detection in that a CNN performs the feature extraction, replacing hand-crafted features such as HOG + bag-of-words. With this change, R-CNN reached 31.4% mAP on the ILSVRC2013 detection data, a huge improvement over the previous best result of 24.3% mAP. It also drew attention, for the first time, to how well CNNs extract image features. Since then, image processing has largely relied on CNNs for feature extraction.

    Although R-CNN falls short of the methods that came later, it was a good detector at the time and its results were solid.

 

6. R-CNN disadvantages

   (1) Repeated computation

         Although R-CNN is not an exhaustive search, there are still about 2k bounding boxes, and each of them must go through the CNN. The amount of computation is still large, and much of it is duplicated work.

   (2) SVM model

         The SVMs are linear models, which is clearly not the best choice when labeled data is not scarce.

   (3) Training and testing are split into multiple steps

         Region proposal, feature extraction, classification, and regression are trained as disconnected processes, and the intermediate data has to be saved separately. Training is divided into several stages, which is cumbersome: fine-tuning the network + training the SVMs + training the bounding-box regressor.

   (4) High cost of training in space and time

         The features produced by the convolutional network are first written to disk, and they need several hundred GB of storage. Saving and reading them consumes time and resources.

   (5) Slow

         The disadvantages above make R-CNN remarkably slow: processing one image takes 13s on a GPU and 53s on a CPU.

   (6) Distortion from region size normalization

         Region size normalization crops the target or warps it with non-uniform scaling, so the normalized target is deformed or missing part of its information and differs from the real target. This affects both training and testing later on and lowers accuracy.
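The storage figure in item (4) can be checked with a rough estimate. The 2k regions per image come from the text; the 4096-dimensional feature vector, the float32 storage, and the 5000-image training set are assumptions made for this illustration, not numbers stated in the article.

```python
num_proposals = 2000     # "about 2k" regions per image, from the text
feature_dim = 4096       # assumed: output size of one fully connected layer
bytes_per_value = 4      # assumed: features stored as float32
num_images = 5000        # hypothetical training-set size

# Cached feature size for a single image
bytes_per_image = num_proposals * feature_dim * bytes_per_value
# Cached feature size for the whole (hypothetical) training set
total_gb = bytes_per_image * num_images / 1e9

print(bytes_per_image / 1e6, "MB per image")
print(total_gb, "GB in total")
```

Even with these modest assumptions the cache is well over a hundred GB for one layer's features, which is consistent with the "several hundred GB" order of magnitude mentioned above.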

 

 

                  

 


Origin blog.csdn.net/qq_38299170/article/details/104470641