Comparison of FAST_RCNN and MASK_RCNN

MASK_RCNN = FAST_RCNN + FCN

The MASK_RCNN algorithm combines FAST_RCNN with the semantic segmentation algorithm FCN. (FAST_RCNN here means the detector that includes an RPN, i.e. Faster R-CNN.) FAST_RCNN handles object detection, and FCN handles segmentation. The FCN module is added on top of the original FAST_RCNN as a parallel MASK branch, which is responsible for classification at the pixel level.

FAST_RCNN: Backbone+FPN (optional)+RPN+ROI Pooling + detection head

MASK_RCNN: Backbone+FPN (optional)+RPN+ROI Align + detection head + mask head

The two differ in how the proposals are processed after the RPN module.

In fact, RoIAlign is a slight modification of RoI pooling. Its purpose is to make classification at the pixel level more accurate.

[Figure: ROI Pooling on the left, ROI Align on the right]

ROI Pooling performs two rounding (quantization) operations, each applied along both the width and the height. A single rounding error of only 0.78 feature-map cells becomes 0.78*32=24.96, roughly 25 pixels, once it is mapped back to the original image at a stride of 32. The second rounding makes the error larger still, which is unacceptable for segmentation, because a misaligned MASK is visually obvious. RoIAlign was proposed to solve this misalignment problem.
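The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the box side of 665 pixels is an assumed value chosen because it yields the 0.78 fractional part, and the function name is hypothetical.

```python
import math

def quantization_error(box_size_px, stride):
    """Return (exact, rounded, error_in_cells, error_in_pixels) for one box side."""
    exact = box_size_px / stride      # size in feature-map cells, may be fractional
    rounded = math.floor(exact)       # ROI Pooling drops the fractional part
    error_cells = exact - rounded     # what the first rounding throws away
    error_px = error_cells * stride   # the same error measured in input pixels
    return exact, rounded, error_cells, error_px

# Assumed example: a 665-pixel box side on a stride-32 feature map.
exact, rounded, err_cells, err_px = quantization_error(665, 32)
print(exact, rounded, round(err_cells, 2), round(err_px, 2))
# 20.78125 20 0.78 25.0
```

The unrounded error is 0.78125 cells, which is why the script prints 25.0 rather than the 24.96 obtained from the truncated 0.78.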

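Three ROI feature extraction approaches are compared here: ROI Pooling, ROI Align, and PrROI Pooling. (The original Mask R-CNN paper itself compares RoIPool, RoIWarp, and RoIAlign; PrROI Pooling was proposed in later work.)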

ROI Pooling

1. Image coordinates to feature map coordinates: when the result has a fractional part, it is rounded to an integer (first quantization)
2. Feature map coordinates to ROI feature coordinates: when the result has a fractional part, it is rounded again (second quantization)
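The two quantization steps can be sketched along a single axis as follows. This is a simplified illustration, not a real library's implementation: the function names, the 7-bin output, and the use of floor for every rounding are assumptions. The second function shows the same bin edges with the fractional values kept, as ROI Align would keep them.

```python
import math

def roi_pool_bin_edges(start_px, end_px, stride, num_bins):
    """Integer bin edges along one axis, with both quantizations applied."""
    # First quantization: image coordinates -> integer feature-map coordinates.
    start = math.floor(start_px / stride)
    end = math.floor(end_px / stride)
    # Second quantization: every bin edge is snapped to an integer cell index.
    return [math.floor(start + (end - start) * i / num_bins)
            for i in range(num_bins + 1)]

def roi_align_bin_edges(start_px, end_px, stride, num_bins):
    """Fractional bin edges along one axis, with no rounding at all."""
    start = start_px / stride
    end = end_px / stride
    return [start + (end - start) * i / num_bins for i in range(num_bins + 1)]

pooled = roi_pool_bin_edges(0, 665, 32, 7)
aligned = roi_align_bin_edges(0, 665, 32, 7)
print(pooled)       # [0, 2, 5, 8, 11, 14, 17, 20]
print(aligned[-1])  # 20.78125
```

With rounding, the bins have uneven widths (2 or 3 cells) and the region stops at cell 20, while the unrounded edges cover the full 20.78125 cells.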

ROI Align

ROI Align builds on ROI Pooling: when a coordinate has a fractional part, bilinear interpolation is used instead of rounding, which solves the floating-point coordinate problem. That is, the four real values surrounding a virtual point on the feature map jointly determine the value at that point. For example, the value at a virtual position with a floating-point coordinate such as 20.56 is estimated from its four integer-coordinate neighbors.
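A minimal sketch of bilinear interpolation at a fractional position is given below. It assumes the feature map is a plain 2D list and that the query point lies inside it; the function name and the 2x2 grid are illustrative only.

```python
import math

def bilinear_sample(feature, y, x):
    """Interpolate a 2D list `feature` at fractional (y, x), 0 <= y <= H-1, 0 <= x <= W-1."""
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)            # top-left real neighbor
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)  # bottom-right neighbor, clamped
    dy, dx = y - y0, x - x0                          # fractional offsets in [0, 1)
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

grid = [[0.0, 10.0],
        [20.0, 30.0]]
center = bilinear_sample(grid, 0.5, 0.5)
print(center)  # 15.0, the average of the four surrounding values
```

At integer coordinates the function returns the stored value unchanged, so no information is lost where no interpolation is needed.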

PrROI Pooling

The main idea of PrPool is to integrate (sum) the interpolated feature values over the ROI area and then divide by the area of the ROI.
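The "sum over the area, then divide by the area" idea can be approximated numerically as sketched below. This is an assumption-laden illustration: it uses dense midpoint sampling of a bilinearly interpolated feature map instead of an exact integral, and all names are hypothetical.

```python
import math

def bilinear(feature, y, x):
    """Bilinear interpolation on a 2D list at an in-bounds fractional point."""
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def prroi_bin_approx(feature, y_start, x_start, y_end, x_end, samples=50):
    """Approximate (integral of the interpolated feature over the bin) / (bin area)."""
    total = 0.0
    for i in range(samples):
        for j in range(samples):
            # Midpoint rule: sample at the center of each sub-cell of the bin.
            y = y_start + (i + 0.5) * (y_end - y_start) / samples
            x = x_start + (j + 0.5) * (x_end - x_start) / samples
            total += bilinear(feature, y, x)
    # Each sample represents area / samples**2, so integral / area is the mean.
    return total / (samples * samples)

grid = [[0.0, 10.0],
        [20.0, 30.0]]
value = prroi_bin_approx(grid, 0.0, 0.0, 1.0, 1.0)
```

For this 2x2 grid the result is close to 15, the mean of the interpolated surface over the unit square. The bin boundaries stay fractional throughout, so no quantization is involved.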

According to the comparison reported in the paper, ROI Align gives the best results.

In traditional image processing, bilinear interpolation is also used to preserve edges and image accuracy when zooming in or out. Other common methods include nearest-neighbor interpolation and higher-order interpolation. When choosing an interpolation method, consider not only the display requirements of the image and the amount of computation, but also how the interpolated values affect any analysis based on them.

MASK prediction

In the head, MASK_RCNN has one more output than FAST_RCNN: the mask prediction.

From a macro perspective, mask prediction segments the objects in the scene. At the pixel level, however, segmentation is a binary classification problem: each pixel is either background or target. The loss for the mask branch is therefore a per-pixel binary cross-entropy loss.
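A per-pixel binary cross-entropy loss over one mask can be sketched as follows. This is a minimal illustration, assuming the mask branch outputs raw scores (logits) that are passed through a sigmoid; the function names and the 2x2 example are hypothetical.

```python
import math

def sigmoid(z):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def mask_bce_loss(logits, target):
    """Mean binary cross-entropy over all pixels of one mask (2D lists)."""
    eps = 1e-12  # guards against log(0)
    total, count = 0.0, 0
    for logit_row, target_row in zip(logits, target):
        for z, t in zip(logit_row, target_row):
            p = sigmoid(z)  # predicted probability that the pixel is target
            total += -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
            count += 1
    return total / count

target = [[1, 0],
          [0, 1]]
uncertain = mask_bce_loss([[0.0, 0.0], [0.0, 0.0]], target)    # every pixel at p = 0.5
confident = mask_bce_loss([[8.0, -8.0], [-8.0, 8.0]], target)  # matches the target
```

When every pixel is predicted at probability 0.5 the loss is about ln 2, and it drops toward zero as the predictions agree with the target mask.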

The mask is also predicted by an FCN applied to each ROI. Note that, strictly speaking, this step performs semantic segmentation rather than instance segmentation. However, because each ROI corresponds to only one object, semantic segmentation inside the ROI is equivalent to instance segmentation. This is also what distinguishes Mask-RCNN from other segmentation frameworks: it classifies first and then segments.

For the COCO data set used in the original paper, 80 masks are predicted for each ROI, one per category, because COCO has 80 categories. Predicting a separate mask for each category weakens the competition between categories and gives better results.
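The per-category mask idea can be sketched as below: each ROI carries one mask channel per category, and only the channel of the chosen category is used, so the channels never compete with each other. This is a toy illustration with two categories and 1x2 masks instead of 80 categories; the function name and the 0.5 threshold are assumptions.

```python
import math

def predict_instance_mask(per_class_logits, class_id, threshold=0.5):
    """Pick the mask channel of the given class and binarize it with a sigmoid."""
    logits = per_class_logits[class_id]  # all other class channels are ignored
    return [[1 if 1.0 / (1.0 + math.exp(-z)) >= threshold else 0 for z in row]
            for row in logits]

# Toy ROI output: 2 categories, each with a 1x2 mask of raw scores.
roi_logits = [
    [[5.0, -5.0]],   # channel for category 0
    [[-5.0, 5.0]],   # channel for category 1
]
mask_for_class_1 = predict_instance_mask(roi_logits, 1)
print(mask_for_class_1)  # [[0, 1]]
```

The category itself would come from the classification branch, which matches the classify-first, segment-second order described above.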

Origin blog.csdn.net/qq_35326529/article/details/127988170