Object detection paper notes 3 - Fast R-CNN

Background

  Deep ConvNets were on the rise and VGG16 performed well on image classification, so this paper applies VGG16 to the detection task. Earlier training methods had two drawbacks: SPP NET cannot fine-tune the convolutional layers, and training is split into several stages (feature extraction, SVM classification, bounding-box regression). Fast R-CNN resolves both problems.

Method

  The network uses the VGG16 architecture, with the following improvements over SPP NET.

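  RoI pooling

  The last max pooling layer is replaced with an RoI pooling layer. It can be seen as a special case of the SPP NET pyramid pooling with only one pyramid level: the feature-map region of each RoI is divided into an H * W grid of bins, and max pooling is applied within each bin, so every RoI yields a fixed-size output regardless of its original size.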
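  The binning step can be sketched in a few lines of plain Python. This is only an illustrative sketch for a single channel: the function name roi_max_pool, the inclusive (x1, y1, x2, y2) RoI format, and the floor/ceil rule for bin edges are assumptions made for this example, not the paper's exact implementation.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one RoI of a single-channel feature map into out_h x out_w bins.

    feature_map: list of rows; roi: (x1, y1, x2, y2), inclusive, already in
    feature-map coordinates (hypothetical format chosen for this sketch).
    """
    x1, y1, x2, y2 = roi
    roi_h = y2 - y1 + 1
    roi_w = x2 - x1 + 1
    pooled = []
    for i in range(out_h):
        # Bin i covers rows [r0, r1): floor for the start, ceil for the end,
        # so every bin contains at least one cell.
        r0 = y1 + (i * roi_h) // out_h
        r1 = y1 + -(-((i + 1) * roi_h) // out_h)
        row = []
        for j in range(out_w):
            c0 = x1 + (j * roi_w) // out_w
            c1 = x1 + -(-((j + 1) * roi_w) // out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# A 4 x 6 feature map whose cell values are 0..23, row by row.
fm = [[r * 6 + c for c in range(6)] for r in range(4)]
print(roi_max_pool(fm, (1, 0, 4, 3), 2, 2))  # 4 x 4 RoI -> [[8, 10], [20, 22]]
print(roi_max_pool(fm, (0, 0, 5, 2), 2, 2))  # 3 x 6 RoI -> [[8, 11], [14, 17]]
```

  Both RoIs have different sizes but produce the same 2 x 2 output, which is what lets the fully connected layers that follow accept any RoI.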

  bbox regressor

  The end of the network uses two sibling fully connected layers, which output the classification result and the box position respectively. Because both outputs are trained together, the training process becomes end-to-end.
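  To show how the two outputs can be trained together, here is a rough sketch of a joint loss for a single RoI. The form (log loss on the class plus a smooth L1 term on the box, skipped for background) follows the multi-task loss in the Fast R-CNN paper; the function names, the argument layout, and the choice of class 0 as background are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_l1(x):
    """Quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def multitask_loss(cls_scores, bbox_deltas, true_class, bbox_target, lam=1.0):
    """Joint loss for one RoI (illustrative, not a training implementation).

    cls_scores:  K + 1 raw scores, index 0 assumed to be background.
    bbox_deltas: K predicted 4-tuples, one per object class (no background).
    true_class:  ground-truth class index in 0..K.
    bbox_target: ground-truth 4-tuple of regression targets.
    """
    probs = softmax(cls_scores)
    loss_cls = -math.log(probs[true_class])
    loss_loc = 0.0
    if true_class >= 1:
        # Only the regressor of the true class contributes; background has none.
        pred = bbox_deltas[true_class - 1]
        loss_loc = sum(smooth_l1(p - t) for p, t in zip(pred, bbox_target))
    return loss_cls + lam * loss_loc
```

  Since the result is a single scalar, one backward pass updates both heads and the shared layers below them at the same time.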

  CNN parameter update

  Strictly speaking, SPP NET is not unable to update the CNN parameters; the cost is simply too high. SPP NET first extracts the RoIs of a whole set of images, shuffles them, and then samples a mini-batch at random, so the RoIs in one mini-batch may come from many different images. To backpropagate, the feature maps of every layer would have to be kept for all of those images, which is a huge overhead. Fast R-CNN samples hierarchically instead: it first picks N = 2 images and then takes the mini-batch RoIs only from those images (R/N from each; the paper uses R = 128). Computation and storage then involve just two images, so the cost drops sharply.
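  The hierarchical sampling idea can be sketched as follows. This is a simplified illustration: the function name, the dict-of-lists input format, and the rng parameter are assumptions made for this example, and the foreground/background ratio that the paper also enforces is left out. The default of 128 RoIs per mini-batch is the value used in the Fast R-CNN paper.

```python
import random


def sample_minibatch(rois_per_image, num_images=2, batch_rois=128, rng=random):
    """Pick a few images first, then draw all mini-batch RoIs from them.

    rois_per_image: dict mapping image id -> list of candidate RoIs
                    (hypothetical format for this sketch).
    Returns a list of (image_id, roi) pairs.
    """
    # Step 1: choose the images that the whole mini-batch will come from.
    image_ids = rng.sample(sorted(rois_per_image), num_images)
    per_image = batch_rois // num_images
    batch = []
    for image_id in image_ids:
        rois = rois_per_image[image_id]
        # Step 2: draw an equal share of RoIs from each chosen image.
        k = min(per_image, len(rois))
        batch.extend((image_id, roi) for roi in rng.sample(rois, k))
    return batch


# 10 images with 200 dummy proposals each.
proposals = {i: [(i, j) for j in range(200)] for i in range(10)}
batch = sample_minibatch(proposals, rng=random.Random(0))
print(len(batch), len({image_id for image_id, _ in batch}))  # 128 RoIs, 2 images
```

  Because every RoI in the batch belongs to one of two images, only two forward and backward passes through the convolutional layers are needed, and RoIs from the same image share that computation.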

Summary

  Fast R-CNN improves on SPP NET: training becomes end-to-end, and the parameters of the entire network can be updated.

Shortcoming

  RoI extraction still relies on selective search (SS).

 


Origin www.cnblogs.com/xin1998/p/11374221.html