https://blog.csdn.net/WZZ18191171661/article/details/79453780
Schematic structure of the Mask-RCNN network:
In the figure, the black parts are the original Faster-RCNN and the red parts are the modifications made to the Faster-RCNN network:
1) the ROI Pooling layer is replaced with ROIAlign;
2) a parallel FCN layer (the mask layer) is added.
To recap some features of Mask-RCNN (from the abstract of the paper):
1) it adds a branch for predicting an object mask in parallel with the existing bounding-box recognition branch;
2) it is simple to train and adds only a small overhead to Faster-RCNN, running at 5 FPS;
3) it is easy to extend to other tasks, such as human pose estimation;
4) without tricks, it outperforms all existing single-model entries on every task.
In the figure, the gray part is the original RCNN combined with a ResNet or FPN network, and the black part below it is the newly added parallel mask layer. This figure is essentially no different from the one above; it is meant to illustrate the generalization ability of the Mask-RCNN method the authors propose: it combines well with a variety of RCNN frameworks.
Mask-RCNN Techniques
- Technique 1: enhanced base network
Using ResNeXt-101 + FPN as the feature extraction network achieves state-of-the-art results.
- Technique 2: ROIAlign
ROIAlign is used in place of ROI Pooling (an improved pooling operation). It introduces an interpolation step: the ROI is first resampled to 14*14 by bilinear interpolation and then pooled to 7*7. This largely solves the misalignment caused by the direct, quantized sampling of plain pooling.
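The two steps above (bilinear interpolation to 14*14, then pooling to 7*7) can be sketched in plain Python for a single-channel feature map. This is a minimal illustration under stated assumptions, not the paper's or any library's implementation: the function names `bilinear` and `roi_align`, the use of average pooling, sampling at cell centres, and the convention that `feature[y][x]` sits at integer coordinate (y, x) are all choices made for this example.

```python
def bilinear(feature, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at real-valued (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp so the sample point stays inside the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, out_size=7, samples=2):
    """Pool an ROI to out_size x out_size without rounding its coordinates.

    roi is (y1, x1, y2, x2) in feature-map coordinates (floats are kept as-is).
    """
    y1, x1, y2, x2 = roi
    grid = out_size * samples  # 14 with the default arguments
    step_y = (y2 - y1) / grid
    step_x = (x2 - x1) / grid
    # Step 1: bilinear interpolation onto a grid x grid map (cell centres).
    fine = [[bilinear(feature, y1 + (i + 0.5) * step_y, x1 + (j + 0.5) * step_x)
             for j in range(grid)] for i in range(grid)]
    # Step 2: average-pool each samples x samples block down to one output cell.
    pooled = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            block = [fine[i * samples + a][j * samples + b]
                     for a in range(samples) for b in range(samples)]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


# Toy feature map whose value equals its x coordinate, so results are easy to check.
feature = [[float(x) for x in range(12)] for _ in range(12)]
pooled = roi_align(feature, roi=(1.3, 2.6, 8.3, 9.6))
print(len(pooled), len(pooled[0]))   # 7 7
print(round(pooled[0][0], 6))        # 3.1, the x centre of the first bin
```

Because the ROI corners (1.3, 2.6, ...) are never rounded to integers, the pooled values track the true bin centres; rounding them first, as plain ROI Pooling does, would shift every bin.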
PS: Although misalignment has little effect on classification, it causes large errors in pixel-level masks. ROIAlign brings a major improvement, and the larger the stride, the more obvious the improvement.
- Technique 3: loss function
Each ROI produces a K * m^2-dimensional mask output. K is the number of classes, i.e. K masks are output, and m is the resolution of each mask (7 * 7 here, the pooled size; the mask heads in the paper output 14 * 14 or 28 * 28 masks). Loss function definition:
a sigmoid is applied pixel by pixel to the mask of the ground-truth class k, and Lmask is the average binary cross-entropy loss over the m * m pixels of that mask.
Why K masks? Predicting a separate mask for each class effectively avoids inter-class competition (the other classes do not contribute to the loss).
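The loss described above can be sketched for a single ROI in plain Python. This is an illustrative sketch only: the names `mask_loss`, `logits`, `gt_class`, and `gt_mask` are hypothetical, the nested-list layout (K masks of m rows each) is an assumption, and the binary cross-entropy is written in its numerically stable logit form rather than by calling the sigmoid explicitly.

```python
import math


def mask_loss(logits, gt_class, gt_mask):
    """Average binary cross-entropy on the mask of the ground-truth class only.

    logits:   K masks, each an m x m nested list of raw scores (pre-sigmoid).
    gt_class: index k of the ground-truth class for this ROI.
    gt_mask:  m x m nested list of 0/1 target values.
    """
    selected = logits[gt_class]  # the other K-1 masks are ignored entirely
    total, count = 0.0, 0
    for logit_row, target_row in zip(selected, gt_mask):
        for z, y in zip(logit_row, target_row):
            # Equals -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))],
            # rearranged so that large |z| cannot overflow or hit log(0).
            total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count


# K = 2 classes, m = 2. Class 1 holds arbitrary scores that must not matter.
logits = [
    [[0.0, 0.0], [0.0, 0.0]],      # mask for class 0
    [[9.0, -9.0], [50.0, -50.0]],  # mask for class 1
]
gt_mask = [[1, 0], [0, 1]]
loss = mask_loss(logits, gt_class=0, gt_mask=gt_mask)
print(abs(loss - math.log(2)) < 1e-12)  # True: sigmoid(0) = 0.5 at every pixel
```

Changing the scores of class 1 leaves the loss untouched, which is the "no inter-class competition" property: a softmax across the K masks would instead couple every class's scores at each pixel.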