Paper Reading 1: Rich feature hierarchies for accurate object detection and semantic segmentation

Background

  In the 2012 ImageNet LSVRC competition, AlexNet won easily with a top-5 error rate of 15.3% (the runner-up had a top-5 error rate of 26.2%). As a result, the potential of ConvNets became widely recognized. Since ConvNets achieve good results on image classification, can they also be applied to object detection? This paper is one of the first to explore using a ConvNet to solve the object detection task, reaching an mAP of 53.7% on PASCAL VOC 2010.

Method

  The model is divided into three modules.

  (1) Region proposals. A large number of candidate boxes are generated over the whole image, because object detection requires not only classifying each object but also localizing it with a bounding box. This is essentially an enumeration approach: first list all the positions where an object might be, then classify them one by one. The selective search algorithm is used for this step.

  (2) Feature extraction. Each sub-image produced in the previous step is resized to 224 * 224 * 3 and passed through the first 5 conv layers and the first two FC layers of AlexNet, which finally produces a 4096-d feature vector. (Note: the sub-images must be resized to a fixed size because the FC layers that follow accept only a fixed-length input. The paper itself uses 227 * 227; 224 * 224 is the size quoted in the AlexNet paper.)

  (3) SVM classification. Each 4096-d feature vector is classified with SVMs. Note that the original AlexNet classifier instead appends one more FC layer that maps the 4096-d vector to an N-d vector (N is the number of classes) and then applies softmax. The authors do not use that approach because SVMs performed better in their experiments.
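  To make the comparison in (3) concrete, here is a minimal sketch in Python with NumPy. Both heads are a linear map from the 4096-d feature to N class scores; what differs is the loss they are trained with (softmax cross-entropy versus a one-vs-rest hinge loss). The weights are random stand-ins, N = 20 is an assumption based on the 20 PASCAL VOC classes, and all function names are hypothetical rather than taken from the paper's code.

```python
import numpy as np

FEATURE_DIM = 4096   # length of the feature vector from the second FC layer
NUM_CLASSES = 20     # assumption: the 20 PASCAL VOC object classes

def linear_scores(features, weights, bias):
    """Map (n, 4096) features to (n, N) class scores with one linear layer."""
    return features @ weights.T + bias

def softmax_cross_entropy(scores, labels):
    """Loss of the FC + softmax head, averaged over samples."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def one_vs_rest_hinge(scores, labels):
    """Loss of N independent binary linear SVMs (regularization term omitted)."""
    targets = -np.ones_like(scores)
    targets[np.arange(len(labels)), labels] = 1.0  # +1 for the true class, -1 otherwise
    return np.maximum(0.0, 1.0 - targets * scores).sum(axis=1).mean()

rng = np.random.default_rng(0)
features = rng.standard_normal((5, FEATURE_DIM)).astype(np.float32)
weights = 0.01 * rng.standard_normal((NUM_CLASSES, FEATURE_DIM)).astype(np.float32)
bias = np.zeros(NUM_CLASSES, dtype=np.float32)
labels = np.array([0, 3, 3, 7, 19])

scores = linear_scores(features, weights, bias)
print(scores.shape)                          # (5, 20)
print(softmax_cross_entropy(scores, labels))
print(one_vs_rest_hinge(scores, labels))
```

  At prediction time both heads pick classes from the same kind of score matrix, so swapping one for the other changes only how the weights were learned, not the shape of the pipeline.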

Summary

  As can be seen, the principle of R-CNN is simple: run selective search over the whole image to generate n sub-images, resize each one to 224 * 224 * 3, and feed them into AlexNet (with the final FC-softmax layer removed) to produce n 4096-d vectors, which are then classified with SVMs.
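  The data flow summarized above can be sketched in Python with NumPy as follows. This is only a shape-level illustration under stated assumptions: the proposal boxes are made up instead of coming from selective search, `extract_features` is a hypothetical random projection standing in for AlexNet, and the SVM weights are random.

```python
import numpy as np

INPUT_SIZE = 224     # side length the post quotes for the network input
FEATURE_DIM = 4096
NUM_CLASSES = 20     # assumption: the 20 PASCAL VOC object classes

def crop_and_resize(image, box, size=INPUT_SIZE):
    """Crop box (x1, y1, x2, y2) and warp it to size x size (nearest neighbor)."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    rows = np.arange(size) * crop.shape[0] // size   # source row per output row
    cols = np.arange(size) * crop.shape[1] // size   # source column per output column
    return crop[rows][:, cols]

def extract_features(batch, projection):
    """Hypothetical stand-in for AlexNet without its last layer."""
    coarse = batch[:, ::28, ::28, :]                 # (n, 8, 8, 3), keeps the stub cheap
    flat = coarse.reshape(len(batch), -1).astype(np.float32) / 255.0
    return np.maximum(flat @ projection, 0.0)        # ReLU on a random projection

def detect(image, boxes, projection, svm_weights, svm_bias):
    """Warp every proposal, extract its feature vector, score it per class."""
    batch = np.stack([crop_and_resize(image, b) for b in boxes])
    features = extract_features(batch, projection)
    return features @ svm_weights.T + svm_bias

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
boxes = [(10, 20, 110, 220), (50, 50, 350, 250), (0, 0, 400, 300)]  # made-up proposals
projection = rng.standard_normal((8 * 8 * 3, FEATURE_DIM)).astype(np.float32)
svm_weights = rng.standard_normal((NUM_CLASSES, FEATURE_DIM)).astype(np.float32)
svm_bias = np.zeros(NUM_CLASSES, dtype=np.float32)

scores = detect(image, boxes, projection, svm_weights, svm_bias)
print(scores.shape)   # (3, 20): one row of class scores per proposal
```

  Note that every proposal goes through the feature extractor separately, even when the boxes overlap.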

  Its contribution is therefore that it pioneered the use of ConvNets for object detection: its mAP is well above that of traditional methods, which led more people to tackle object detection with ConvNets.

Shortcomings

  (1) The most obvious drawback is that the approach is brute force: a single image produces thousands of sub-images, which take up a lot of disk space;

  (2) It is also inefficient: the thousands of sub-images overlap heavily, yet each overlapping part is computed separately, resulting in a large amount of redundant computation.
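  Both shortcomings can be put into rough numbers with a small Python sketch. The figures are assumptions for illustration only: 2000 proposals per image, uint8 sub-images of 224 * 224 * 3, float32 features, and three made-up overlapping boxes. The helper names are hypothetical.

```python
import numpy as np

def storage_bytes(num_proposals, values_per_proposal, bytes_per_value):
    """Bytes needed to store one array of values per proposal."""
    return num_proposals * values_per_proposal * bytes_per_value

def redundancy_factor(image_shape, boxes):
    """Pixels processed across all proposals divided by distinct pixels covered."""
    height, width = image_shape
    coverage = np.zeros((height, width), dtype=np.int64)
    for x1, y1, x2, y2 in boxes:
        coverage[y1:y2, x1:x2] += 1      # how many proposals contain each pixel
    return coverage.sum() / np.count_nonzero(coverage)

NUM_PROPOSALS = 2000                     # assumption: proposals per image

crops = storage_bytes(NUM_PROPOSALS, 224 * 224 * 3, 1)   # uint8 sub-images
features = storage_bytes(NUM_PROPOSALS, 4096, 4)         # float32 feature vectors
print(crops / 2**20)                     # about 287 MiB per image
print(features / 2**20)                  # 31.25 MiB per image

boxes = [(0, 0, 100, 100), (50, 0, 150, 100), (0, 0, 150, 100)]
print(redundancy_factor((100, 150), boxes))   # about 2.33
```

  A redundancy factor of 2.33 for just three boxes means each covered pixel is processed more than twice on average; with thousands of proposals the factor would be far larger.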


Origin www.cnblogs.com/xin1998/p/11371615.html