Deep learning (object detection): a detailed explanation of the ROI pooling layer

Original link: https://blog.deepsense.ai/region-of-interest-pooling-explained/

A typical object detection architecture can usually be divided into two stages:
(1) Region proposal: Given an input image, find all locations where objects may exist. The output of this stage is a set of bounding boxes for the possible positions of objects. These are often called region proposals or regions of interest (ROI).
(2) Final classification: Determine whether each region proposal from the previous stage belongs to one of the target classes or to the background.
Some problems with this architecture are:
  • Generating a large number of region proposals can lead to performance problems, which makes real-time object detection difficult to achieve.
  • It is suboptimal in terms of processing speed.
  • End-to-end training cannot be done.
These problems are the fundamental reason ROI pooling was proposed.
The ROI pooling layer can achieve significant speedup in training and testing, and improve detection accuracy. This layer has two inputs:
  • A fixed-size feature map obtained from a deep convolutional network with several convolution and max pooling layers;
  • An N*5 matrix representing all ROIs, where N is the number of ROIs. The first column is the image index, and the remaining four columns are the coordinates of the upper-left and lower-right corners of the region.
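As a rough illustration of these two inputs, the sketch below builds them as NumPy arrays. The array shapes, the channel count, the coordinate values, and the helper name `rois_for_image` are all assumptions made for this example; they are not taken from any particular framework.

```python
import numpy as np

# Assumed layout: (images, channels, height, width). The sizes are made up.
feature_maps = np.zeros((2, 16, 8, 8), dtype=np.float32)

# N*5 ROI matrix, one row per ROI: [image index, x1, y1, x2, y2].
# The values are made up for illustration.
rois = np.array([
    [0, 0, 3, 7, 8],
    [0, 2, 1, 6, 5],
    [1, 1, 0, 5, 4],
])

def rois_for_image(rois, image_index):
    """Return the (x1, y1, x2, y2) boxes whose first column equals image_index."""
    mask = rois[:, 0] == image_index
    return rois[mask, 1:]

print(rois.shape)               # N rows, 5 columns
print(rois_for_image(rois, 0))  # the boxes that belong to image 0
```

The image index in the first column is what lets a single matrix carry the ROIs of a whole batch of images.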
The specific operation of ROI pooling is as follows:
(1) Map the ROI, which is given relative to the input image, to the corresponding position on the feature map;
(2) Divide the mapped area into sections of roughly the same size (the number of sections equals the dimension of the output);
(3) Perform a max pooling operation on each section.
In this way, boxes of different sizes all produce feature maps of the same fixed size. Note that the size of the output depends on neither the size of the ROI nor the size of the convolutional feature map; it is determined only by the number of sections. The biggest benefit of ROI pooling is that it greatly improves processing speed, since the convolutional feature map is computed once and then reused for every ROI.
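The three steps above can be sketched in NumPy as follows. This is a minimal single-channel sketch, not a reference implementation. It makes several assumptions: the names `map_roi_to_feature_map`, `roi_max_pool`, and `spatial_scale` are hypothetical; the lower-right corner is treated as exclusive; the box is rounded outward when mapped; and section boundaries come from integer floor division. Real frameworks may round and split differently.

```python
import math
import numpy as np

def map_roi_to_feature_map(roi, spatial_scale):
    """Step 1: scale an (x1, y1, x2, y2) box from image to feature-map coordinates.

    Rounding outward (floor for the upper-left corner, ceil for the
    lower-right corner) is an assumption of this sketch.
    """
    x1, y1, x2, y2 = roi
    return (math.floor(x1 * spatial_scale), math.floor(y1 * spatial_scale),
            math.ceil(x2 * spatial_scale), math.ceil(y2 * spatial_scale))

def roi_max_pool(feature_map, roi, output_size):
    """Steps 2 and 3: split the ROI into sections and max pool each one.

    feature_map is a 2-D array; roi is (x1, y1, x2, y2) in feature-map
    coordinates with x2 and y2 exclusive (assumed convention).
    """
    x1, y1, x2, y2 = roi
    out_h, out_w = output_size
    region = feature_map[y1:y2, x1:x2]
    h, w = region.shape
    if h < out_h or w < out_w:
        raise ValueError("ROI is smaller than the requested output size")

    # Section boundaries; sections may differ in size when the ROI
    # does not divide evenly.
    row_edges = [(i * h) // out_h for i in range(out_h + 1)]
    col_edges = [(j * w) // out_w for j in range(out_w + 1)]

    out = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            section = region[row_edges[i]:row_edges[i + 1],
                             col_edges[j]:col_edges[j + 1]]
            out[i, j] = section.max()
    return out

# Usage with made-up data: an 8*8 map and an assumed network stride of 16.
feature_map = np.arange(64, dtype=np.float32).reshape(8, 8)
roi = map_roi_to_feature_map((0, 48, 112, 128), spatial_scale=1 / 16)
pooled = roi_max_pool(feature_map, roi, (2, 2))
print(roi)           # (0, 3, 7, 8)
print(pooled.shape)  # (2, 2)
```

Whatever the size of the ROI, `pooled` always has the requested output shape, which is what allows the layers after ROI pooling to expect a fixed-size input.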
ROI pooling example
Consider an 8*8 feature map, a single ROI, and an output size of 2*2.
(1) The input is a fixed-size 8*8 feature map.

(2) The region proposal projected onto the feature map, given as the coordinates of its upper-left and lower-right corners: (0, 3), (7, 8).


(3) Divide it into 2*2 sections, because the output size is 2*2. The ROI is 7 wide and 5 tall, so it cannot be split evenly and the sections have different sizes.


(4) Apply max pooling to each section to obtain the 2*2 output.
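To make the section boundaries of this example concrete, here is a small sketch that computes them for the ROI (0, 3), (7, 8). It assumes that the lower-right corner is exclusive and that boundaries are found by integer floor division; the helper name `section_edges` is hypothetical, and other implementations may place the boundaries differently.

```python
def section_edges(length, n_sections):
    """Split `length` cells into n_sections contiguous parts (floor-based edges)."""
    return [(i * length) // n_sections for i in range(n_sections + 1)]

# ROI from the example: upper-left (0, 3), lower-right (7, 8).
x1, y1, x2, y2 = 0, 3, 7, 8
width, height = x2 - x1, y2 - y1

# Boundaries expressed in feature-map coordinates.
col_edges = [x1 + e for e in section_edges(width, 2)]
row_edges = [y1 + e for e in section_edges(height, 2)]

for i in range(2):
    for j in range(2):
        rows = (row_edges[i], row_edges[i + 1] - 1)
        cols = (col_edges[j], col_edges[j + 1] - 1)
        print(f"section ({i}, {j}): rows {rows[0]}..{rows[1]}, cols {cols[0]}..{cols[1]}")
```

Under these assumptions the 7*5 region is split into columns of width 3 and 4 and rows of height 2 and 3, and the maximum value of each of the four sections becomes one cell of the 2*2 output.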


ROI pooling summary:
(1) It is used for object detection tasks;
(2) It allows us to reuse the feature map computed by the CNN;
(3) It can significantly speed up both training and testing;
(4) It allows end-to-end training of object detection systems.
