Back to the table of contents: Object Detection History
Previous: In-depth series - Object Detection History (1): classical object detection
Next: In-depth series - Object Detection History (3): SPP-Net object detection in detail
Paper: "Rich feature hierarchies for accurate object detection and semantic segmentation"
This section covers R-CNN object detection in detail; the next section covers SPP-Net.
Two. R-CNN object detection (2013)
1. R-CNN (Region-based Convolutional Neural Networks) is an object detection method that combines region proposals with a convolutional neural network (CNN). Ross Girshick published R-CNN in 2013. Girshick is a towering figure in this field: R-CNN, Fast R-CNN, Faster R-CNN, and YOLO all carry his name. Before R-CNN, many researchers had already tried applying deep learning to object detection, OverFeat among them, but R-CNN was the first solution practical enough for industrial use.
2. Deep learning object detection methods fall roughly into two camps:
(1) Region proposal based
Such as R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN, and R-FCN.
(2) End-to-end
No region proposals are needed, such as SSD, YOLO, and CornerNet.
3. R-CNN main steps
(1) Region proposal
Use selective search to extract about 2k candidate bounding boxes from the original image.
(2) Region size normalization
Scale the image region inside every candidate bounding box to a fixed size (the original paper uses 227 x 227).
(3) Feature extraction
Pass each normalized region through the CNN to extract its features.
(4) Classification and regression
Two fully connected layers are added on top of the base layers. SVMs then perform classification, and linear regression fine-tunes the position and size of each box, with a separate box regressor trained for each class.
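The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the proposals are passed in by hand instead of coming from selective search, `extract_features` is a hypothetical stand-in for the CNN, and the SVM and regressor weights are untrained placeholders. The box refinement uses the common center/size parameterization, which is an assumption here since the text above does not spell it out.

```python
import numpy as np

def warp_region(image, box, size=227):
    """Step 2: crop box (x1, y1, x2, y2) and stretch it to size x size
    with nearest-neighbour sampling, ignoring the aspect ratio."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    rows = np.arange(size) * crop.shape[0] // size
    cols = np.arange(size) * crop.shape[1] // size
    return crop[rows][:, cols]

def extract_features(patch, dim=8):
    """Step 3 (hypothetical stand-in for the CNN): average the patch
    over `dim` vertical stripes to get a short feature vector."""
    gray = patch.mean(axis=2)
    return np.array([s.mean() for s in np.array_split(gray, dim, axis=1)])

def refine_box(box, deltas):
    """Step 4b: shift the box centre and rescale its width and height."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = deltas
    cx, cy = cx + dx * w, cy + dy * h
    w, h = w * np.exp(dw), h * np.exp(dh)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

def detect(image, proposals, svm_w, svm_b, reg_w):
    """Run steps 2 to 4 for every proposal from step 1.
    svm_w: (classes, dim), svm_b: (classes,), reg_w: (classes, 4, dim)."""
    results = []
    for box in proposals:
        feat = extract_features(warp_region(image, box))
        scores = svm_w @ feat + svm_b      # one linear SVM per class
        cls = int(np.argmax(scores))
        deltas = reg_w[cls] @ feat         # one box regressor per class
        results.append((refine_box(box, deltas), cls, float(scores[cls])))
    return results

# Toy usage with random pixels and placeholder (untrained) weights.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(100, 150, 3)).astype(np.float64)
proposals = [(10, 20, 60, 90), (0, 0, 150, 100)]
detections = detect(image, proposals,
                    svm_w=rng.normal(size=(3, 8)),
                    svm_b=np.zeros(3),
                    reg_w=np.zeros((3, 4, 8)))
```

Note that the CNN stand-in is called once per proposal inside the loop, which is exactly the structure that makes the real method expensive.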
4. R-CNN flow chart
Viewed from another angle, the flow is as follows:
In the figure above, "warped image regions" refers to the size normalization of the region inside each bounding box: whether by warp or by crop, every region is brought to the same size. As a result, some regions are stretched and the image content is distorted.
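How strong the distortion is depends on the aspect ratio of the box. The small sketch below computes the horizontal and vertical scale factors when a box is warped to a square; the function name `warp_scales` and the example box sizes are hypothetical, chosen only for illustration.

```python
def warp_scales(box_w, box_h, size=227):
    """Return (horizontal scale, vertical scale, distortion ratio) when a
    box_w x box_h region is warped to size x size. A distortion ratio of
    1.0 means the shape is preserved."""
    sx = size / box_w
    sy = size / box_h
    return sx, sy, max(sx, sy) / min(sx, sy)

# A tall, narrow box (for example a standing person) is stretched sideways
# much more than it is scaled vertically.
sx, sy, ratio = warp_scales(100, 300)
```

For a 100 x 300 box the two scale factors differ by a factor of 3, so the object is three times wider relative to its height after warping; a square box gives a ratio of 1 and is not distorted.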
In addition, the convolution is run separately on every size-normalized region. Since there are about 2k bounding boxes and many of them overlap, feature extraction repeats a large amount of work, which costs a great deal of time and space.
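A rough way to see the repetition is to count how many proposals cover each pixel. The sketch below is a pixel-level proxy only, under the assumption that work on a region is proportional to its area; the function name `redundancy` and the random boxes are hypothetical and do not come from selective search.

```python
import numpy as np

def redundancy(image_h, image_w, boxes):
    """Average number of boxes covering each pixel that is covered at
    least once. boxes are (x1, y1, x2, y2) with exclusive x2, y2."""
    cover = np.zeros((image_h, image_w), dtype=np.int64)
    for x1, y1, x2, y2 in boxes:
        cover[y1:y2, x1:x2] += 1
    covered = np.count_nonzero(cover)
    return cover.sum() / covered if covered else 0.0

# Toy usage: 2000 random boxes on a 375 x 500 image.
rng = np.random.default_rng(0)
x1 = rng.integers(0, 400, size=2000)
y1 = rng.integers(0, 275, size=2000)
boxes = [(x, y, x + w, y + h)
         for x, y, w, h in zip(x1, y1,
                               rng.integers(20, 100, size=2000),
                               rng.integers(20, 100, size=2000))]
factor = redundancy(375, 500, boxes)
```

A factor well above 1 means the same pixels are being processed again and again, once for every proposal that contains them.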
Furthermore, an SVM can only handle a limited amount of data. With massive data it is easy to exceed what an SVM can cope with: the computation is too heavy, the model is too large, and the efficiency is too low.
5. Compared with classical object detection, the difference in R-CNN is that features are extracted by CNN convolution, replacing the earlier extraction of HOG features + bag-of-words and similar hand-crafted features. With this change, R-CNN reached 31.4% mAP on the ILSVRC2013 detection data, a huge improvement over the previous best result of 24.3% mAP. It also drew attention, for the first time, to how remarkably well a CNN extracts image features. Since then, image processing has basically relied on CNNs for feature extraction.
Later work exposed R-CNN's shortcomings, but at the time it was a good detector with solid results.
6. R-CNN disadvantages
(1) Repeated computation
Although R-CNN is not an exhaustive search, there are still about 2k bounding boxes, and each one needs its own CNN pass. The amount of computation is still large, and much of it is redundant.
(2) SVM model
The SVM here is a linear model; when labeled data is plentiful, it is clearly not the best choice.
(3) Training and testing are split into multiple steps
Region proposal, feature extraction, classification, and regression are trained as disconnected stages, and the intermediate data has to be saved separately. Splitting training into several stages is inevitably cumbersome: fine-tuning the network + training the SVMs + training the bounding-box regressor.
(4) High training cost in space and time
The features produced by the convolutional network must first be written to disk, which takes several hundred GB of storage. Saving and reading them also consumes time and resources.
(5) Slow
The drawbacks above make R-CNN remarkably slow: processing one image takes 13s on a GPU and 53s on a CPU.
(6) Region size normalization crops the object or warps it with non-uniform scaling, so the normalized object is deformed or missing part of its information and no longer matches the real object. This hurts accuracy in later training and testing.
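The storage figure in disadvantage (4) can be checked with back-of-the-envelope arithmetic. In the sketch below only the roughly 2k proposals per image comes from the text above; the 5,000 training images, the 4096-dimensional feature vector, and 4-byte floats are assumptions chosen for illustration.

```python
def feature_storage_gb(num_images, proposals_per_image, feature_dim,
                       bytes_per_value=4):
    """Disk space, in GB (10**9 bytes), needed to cache one feature
    vector per proposal for every training image."""
    total_bytes = (num_images * proposals_per_image
                   * feature_dim * bytes_per_value)
    return total_bytes / 1e9

# Assumed setup: 5,000 images, 2,000 proposals each, 4096-d float32 features.
size_gb = feature_storage_gb(5000, 2000, 4096)
```

Under these assumptions the cache comes to about 164 GB, the same order of magnitude as the "several hundred GB" quoted above; a larger feature layer or more images pushes it higher.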