Paper: https://arxiv.org/abs/1506.01497
After Fast R-CNN was proposed, the main bottleneck in object detection time became the computation of candidate regions (region proposals), and this is what led to Faster R-CNN. The authors propose a new network structure, the RPN (Region Proposal Network), which obtains candidate regions through a series of convolutions, so Faster R-CNN can be seen as a combination of RPN + Fast R-CNN. Furthermore, the RPN shares the convolutional feature maps generated by the convolutional layers with the detection network. The network structure is as follows:
Figure 1. Faster R-CNN network structure
The basic structure of Faster R-CNN can be divided into four parts:
1. Conv layers. A backbone such as VGG16 or ResNet produces the feature maps.
2. RPN. The feature maps from step 1 pass through a convolution layer and a fully connected layer (in practice a convolution layer with a 1*1 kernel), giving 4k region-regression values and 2k classification values. From these, a subset of candidate regions (ROIs) is selected for use in the subsequent detection step.
3. ROI Pooling. Takes the feature maps from step 1 and the ROIs from the RPN as input, and pools the region of the feature maps corresponding to each ROI into a fixed-size feature map.
4. Classification and bounding-box regression. The fixed-size feature maps from step 3 pass through fully connected layers to predict the class and the bounding box, and the loss is calculated.
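To make step 3 concrete, here is a minimal sketch of ROI max pooling on a single-channel feature map, in plain Python. The function name `roi_max_pool`, the ROI format `(x0, y0, x1, y1)` with exclusive upper bounds, the 2x2 output size, and the integer bin-splitting rule are all assumptions made for illustration; real implementations work on multi-channel tensors and may quantize the bins differently.

```python
def roi_max_pool(feature, roi, out_h=2, out_w=2):
    """Max-pool one ROI of a 2-D feature map into an out_h x out_w grid.

    feature: list of rows (single channel).
    roi: (x0, y0, x1, y1) in feature-map cells, x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Integer bin edges along y; every bin covers at least one cell.
        ys = y0 + (i * roi_h) // out_h
        ye = max(y0 + ((i + 1) * roi_h) // out_h, ys + 1)
        row = []
        for j in range(out_w):
            # Integer bin edges along x, same rule.
            xs = x0 + (j * roi_w) // out_w
            xe = max(x0 + ((j + 1) * roi_w) // out_w, xs + 1)
            row.append(max(feature[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# A 5x6 toy feature map whose value at (row r, column c) is 6*r + c.
feature = [[r * 6 + c for c in range(6)] for r in range(5)]

small = roi_max_pool(feature, (0, 0, 4, 4))   # 4x4 region
large = roi_max_pool(feature, (1, 0, 6, 5))   # 5x5 region
print(small)  # [[7, 9], [19, 21]]
print(large)  # [[8, 11], [26, 29]]
```

Both ROIs have different sizes, yet each yields a 2x2 output, which is what lets the fully connected layers in step 4 accept regions of arbitrary shape.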
The specific structure of the RPN is as follows:
Figure 2. RPN network structure
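As Figure 2 shows, the RPN first applies a convolution with a 3*3 kernel to the feature map, producing a new 512-dimensional feature map (with VGG it is 512-dimensional rather than the 256 shown here). It then assigns k possible regions to every pixel of the new feature map (k is 9 in the paper, the product of 3 scales and 3 aspect ratios). The 2k cls values are therefore, for the region corresponding to each specific scale and aspect ratio, the probability that the region is an object and the probability that it is background (2*k). The 4k reg values are, for the region corresponding to each specific scale and aspect ratio, the center coordinates x, y and the height and width h, w (4*k); in the paper these are regressed as offsets relative to the anchor.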
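The anchor layout described above can be sketched in a few lines of plain Python. This is an illustration only: the function name `make_anchors` is hypothetical, and the stride of 16, the scales (128, 256, 512) and the aspect ratios (1:2, 1:1, 2:1) are assumed here as the paper's default VGG settings rather than taken from this post.

```python
import math


def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (cx, cy, w, h), k of them per feature-map position."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this feature-map cell, in input-image pixels.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:  # ratio = h / w
                    # Keep the area at scale**2 while changing the shape.
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx, cy, w, h))
    return anchors


k = 3 * 3  # 3 scales x 3 aspect ratios
anchors = make_anchors(feat_h=2, feat_w=3)
print(len(anchors))   # 2 * 3 * k = 54
print(2 * k, 4 * k)   # cls and reg outputs per position: 18 36
```

For every feature-map position, the RPN head therefore emits 2k = 18 classification values and 4k = 36 regression values, one pair of scores and one box per anchor.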