Understanding Neural Networks (XII): Mask R-CNN

https://blog.csdn.net/WZZ18191171661/article/details/79453780


Schematic of the Mask R-CNN network structure:
[Figure: Mask R-CNN network structure]
In the figure, the black part is the original Faster R-CNN, and the red part marks the modifications made to the Faster R-CNN network:
1) the RoI Pooling layer is replaced with RoIAlign;
2) an FCN layer is added in parallel (the mask layer).
To recap some features of Mask R-CNN (from the abstract of the paper):
1) a branch for predicting the object mask is added alongside the existing bounding-box recognition branch;
2) training is simple, and it adds only a small overhead to Faster R-CNN, running at 5 fps;
3) it can easily be extended to other tasks, such as human pose estimation;
4) without any tricks, it outperforms all existing single-model entries on every task.
[Figure: the R-CNN head combined with a ResNet or FPN backbone, with the added mask layer]
In the figure, the gray part is the original R-CNN combined with a ResNet or FPN network, and the black part below it is the newly added parallel mask layer. This figure is essentially no different from the one above. Both are meant to illustrate the generalization ability of the Mask R-CNN method proposed by the authors: it works well when combined with a variety of R-CNN frameworks.
Mask R-CNN Techniques

  • Technique 1 - Enhanced backbone network
    ResNeXt-101 + FPN is used as the feature-extraction network to achieve state-of-the-art results.
  • Technique 2 - RoIAlign
    RoIAlign is used in place of RoI Pooling (an improved pooling operation). It introduces an interpolation step:
    the features are first resampled to 14*14 by bilinear interpolation and then pooled to 7*7, which largely solves the misalignment caused by sampling directly with pooling alone.
    PS: Although misalignment has little effect on classification, it causes large errors in the pixel-level mask. RoIAlign brings a major improvement, and the larger the stride, the more obvious the improvement.
  • Technique 3 - Loss function
    For each RoI, the mask branch output has K * m^2 dimensions. K is the number of classes, i.e., K masks are
    output, and m is the pooling resolution (7 * 7). Loss function definition:
    Lmask is the average binary cross-entropy loss, computed pixel by pixel after a sigmoid, over the mask of the ground-truth class k (written in shorthand as Lmask(Cls_k) = Sigmoid(Cls_k)).
    Why K masks? Having one mask per class effectively avoids inter-class competition (the other classes do not contribute to the loss).
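The interpolate-then-pool idea from Technique 2 can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names `bilinear` and `roi_align`, the `samples` parameter, the single-channel list-of-lists feature map, and the choice of average pooling are all assumptions made for the example.

```python
import math

def bilinear(feature, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at a real-valued point."""
    h, w = len(feature), len(feature[0])
    # Clamp the sample point so the four neighbours stay inside the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feature, roi, out_size=7, samples=2):
    """roi = (y1, x1, y2, x2) in feature-map coordinates, kept as floats (no rounding)."""
    y1, x1, y2, x2 = roi
    grid = out_size * samples              # 14 sample points per side by default
    step_y = (y2 - y1) / grid
    step_x = (x2 - x1) / grid
    # Step 1: bilinear sampling on a regular grid x grid lattice inside the RoI.
    sampled = [[bilinear(feature,
                         y1 + (i + 0.5) * step_y,
                         x1 + (j + 0.5) * step_x)
                for j in range(grid)]
               for i in range(grid)]
    # Step 2: pool each samples x samples block down to one output cell.
    # Average pooling is an assumption here; max pooling would also fit the description.
    pooled = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            block = [sampled[i * samples + a][j * samples + b]
                     for a in range(samples) for b in range(samples)]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled

# Usage: a 10x10 ramp feature map and an RoI with fractional coordinates.
feature = [[10 * y + x for x in range(10)] for y in range(10)]
pooled = roi_align(feature, roi=(0.5, 0.5, 7.5, 7.5))
print(len(pooled), len(pooled[0]))  # 7 7
```

The key point is that the RoI coordinates are never rounded to integer cells; every output value comes from interpolated samples, which is what removes the misalignment.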
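The mask loss from Technique 3 can be sketched as follows, assuming the branch output is given as K masks of m x m raw scores (logits) and the ground truth as one m x m binary mask. The function name `mask_loss` and the nested-list layout are hypothetical choices for illustration. The per-pixel term is the usual numerically stable form of binary cross-entropy on logits, which equals -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].

```python
import math

def mask_loss(logits, target, k):
    """Average binary cross-entropy over the mask of the ground-truth class k.

    logits: K masks, each an m x m list of raw scores (before the sigmoid)
    target: m x m binary ground-truth mask (values 0 or 1)
    k:      index of the ground-truth class
    """
    mask_k = logits[k]                     # only the k-th mask contributes to the loss
    m = len(mask_k)
    total = 0.0
    for i in range(m):
        for j in range(m):
            z = mask_k[i][j]
            y = target[i][j]
            # Stable per-pixel sigmoid cross-entropy: max(z, 0) - z*y + log(1 + exp(-|z|))
            total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / (m * m)

# Usage: K = 3 classes, m = 2; the ground-truth class is 1.
logits = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)]
target = [[1, 0], [0, 1]]
loss_before = mask_loss(logits, target, k=1)

# Changing the masks of the other classes leaves the loss untouched.
logits[0] = [[50.0, -50.0], [50.0, -50.0]]
logits[2] = [[-3.0, 3.0], [3.0, -3.0]]
loss_after = mask_loss(logits, target, k=1)
print(loss_before == loss_after)  # True
```

Because each class has its own sigmoid mask instead of sharing a softmax across classes, the scores of the other K-1 masks never enter the loss, which is the absence of inter-class competition described above.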
Origin blog.csdn.net/u010095372/article/details/91347888