Faster R-CNN paper notes

  Paper address: https://arxiv.org/abs/1506.01497

  After Fast R-CNN was proposed, the main bottleneck in reducing detection time became the computation of candidate regions (region proposals), and this is what led to Faster R-CNN. The authors propose a new network structure, the RPN (Region Proposal Network), which obtains candidate regions through a series of convolutions; Faster R-CNN can in fact be seen as the combination RPN + Fast R-CNN. In addition, the RPN shares the convolutional feature maps produced by the convolutional layers with the detection part. The network structure is as follows:

  

            Fig. 1 Faster R-CNN network structure

  This is the basic structure of Faster R-CNN. It is divided into four parts:

  1. Conv layers. Feature maps are obtained with VGG16 or ResNet.

  2. RPN. The feature maps from step 1 pass through a convolution layer and a fully connected layer (actually implemented as convolution layers with 1*1 kernels) to give 4k candidate-region values and 2k classification values. From these, a subset of the candidate regions (rois) is selected for the subsequent target detection.

  3. ROI Pooling. The inputs are the feature maps from step 1 and the rois from the RPN. ROI Pooling converts the region of the feature maps corresponding to each roi into a feature map of fixed size.

  4. The fixed-size feature maps from step 3 pass through fully connected layers to predict the classification result and the bounding box, and the loss is calculated.
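  To make step 3 concrete, here is a minimal sketch of ROI max pooling in plain Python. It is not the code from the paper or from any library: the function names `ceil_div` and `roi_max_pool` are made up for these notes, the feature map is a single channel stored as a nested list, and the roi is assumed to be given in feature-map coordinates with inclusive corners.

```python
def ceil_div(a, b):
    """Integer ceiling division (helper name is hypothetical)."""
    return -(-a // b)


def roi_max_pool(feature, roi, out_h=7, out_w=7):
    """Max-pool one roi of a single-channel feature map to out_h x out_w.

    feature: 2D list indexed as feature[y][x].
    roi: (x1, y1, x2, y2), inclusive, in feature-map coordinates (assumption).
    """
    x1, y1, x2, y2 = roi
    roi_h = y2 - y1 + 1
    roi_w = x2 - x1 + 1
    pooled = []
    for i in range(out_h):
        # Bin start is floored and bin end is ceiled, so every bin is
        # non-empty and together the bins cover the whole roi.
        ys = y1 + (i * roi_h) // out_h
        ye = y1 + ceil_div((i + 1) * roi_h, out_h)
        row = []
        for j in range(out_w):
            xs = x1 + (j * roi_w) // out_w
            xe = x1 + ceil_div((j + 1) * roi_w, out_w)
            row.append(max(feature[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# A toy 4 x 6 feature map whose value at (y, x) is y * 6 + x.
feature = [[y * 6 + x for x in range(6)] for y in range(4)]
square = roi_max_pool(feature, (0, 0, 3, 3), out_h=2, out_w=2)
wide = roi_max_pool(feature, (1, 0, 5, 2), out_h=2, out_w=2)
```

  The two rois have different shapes (4*4 and 5*3), but both come out as 2*2 grids, which is what lets the fully connected layers in step 4 take a fixed-size input.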


 

  The specific structure of the RPN is as follows:

  

            Fig. 2 RPN network structure

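  As Fig. 2 shows, the RPN first applies a convolution with a 3*3 kernel to the feature map, which gives a new 512-dimensional feature map (512-d when VGG is used, rather than the 256-d shown here). It then sets k candidate regions for every pixel of the new feature map (k is 9 in the paper, the product of 3 scales and 3 aspect ratios). The 2k cls values are therefore, for the region corresponding to each specific scale and aspect ratio, the probability that the region is a target and the probability that it is background (2*k). The 4k reg values are, for the region corresponding to each specific scale and aspect ratio, the center coordinates x, y and the height and width h, w (4*k); in the paper these are regressed as offsets relative to the anchor box.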
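  The counts k, 2k and 4k can be checked with a small sketch. The function names `base_anchors` and `anchors_at` are hypothetical, not from the paper's code. The box sizes 128, 256, 512 and the ratios 1:2, 1:1, 2:1 are the ones used in the paper, and the stride of 16 is the VGG16 feature stride. Placing each anchor at the center of its stride cell is a simplifying assumption.

```python
import math


def base_anchors(sizes=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return the k = len(sizes) * len(ratios) anchor shapes as (w, h).

    ratio is h / w; each shape keeps an area of size * size.
    """
    shapes = []
    for size in sizes:
        for ratio in ratios:
            w = size / math.sqrt(ratio)
            h = size * math.sqrt(ratio)
            shapes.append((w, h))
    return shapes


def anchors_at(col, row, stride=16, shapes=None):
    """Anchors (cx, cy, w, h) in image coordinates for one feature-map pixel.

    Centering on the middle of the stride cell is an assumption of this sketch.
    """
    if shapes is None:
        shapes = base_anchors()
    cx = col * stride + stride / 2.0
    cy = row * stride + stride / 2.0
    return [(cx, cy, w, h) for (w, h) in shapes]


k = len(base_anchors())   # anchors per feature-map pixel
cls_channels = 2 * k      # target / background score per anchor
reg_channels = 4 * k      # x, y, w, h values per anchor
```

  With the default settings k is 9, so the cls branch outputs 18 channels and the reg branch outputs 36 channels at every pixel of the feature map.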

 

Origin www.cnblogs.com/ylwn/p/10987479.html