Paper reading notes: Segmentation Is All You Need

Disclaimer: This is the blogger's original article. If you repost it, please include a link to the original post: https://blog.csdn.net/m0_37263345/article/details/90701524

I. Paper

Segmentation Is All You Need

https://arxiv.org/abs/1904.13300

II. Paper notes

1. RPN-based methods detect objects very poorly (low recall) in some special cases, such as the following three:

 

 

The poor performance has two main causes:

(A) RPN depends heavily on bounding boxes, but in extreme cases the manually annotated ground-truth boxes contain a lot of noise.

(B) It is hard to find a single NMS threshold that suits all the different cases.
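To make reason (B) concrete, here is a minimal sketch of greedy NMS on a toy scene. The boxes, scores, and thresholds are made up for illustration and do not come from the paper. Boxes 0 and 1 stand for two different, heavily overlapping objects, and box 2 is a near-duplicate of box 0.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, threshold):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    its IoU with every already kept box is at most `threshold`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections (not from the paper).
boxes = [(0, 0, 100, 100), (30, 0, 130, 100), (2, 2, 102, 102)]
scores = [0.9, 0.8, 0.7]

strict = nms(boxes, scores, threshold=0.5)   # also drops the real object, box 1
loose = nms(boxes, scores, threshold=0.95)   # keeps the duplicate, box 2
print(strict, loose)
```

In this toy scene a threshold of about 0.6 happens to give the right answer, but a scene with a different amount of overlap would need a different value, which is the difficulty the notes point at.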

 

2. Contributions

(1) Proposes Weakly Supervised Multimodal Annotation Segmentation (WSMA-Seg), an anchor-free and NMS-free detection method based on semantic segmentation. Advantages: it avoids choosing some hyperparameters, eases complex occlusion, and pixel-level semantic annotation is more accurate than bounding-box annotation.

 

(2) Uses multimodal labels (three modalities) instead of bounding-box labels as the supervision for training the semantic segmentation model, and designs a boundary tracing algorithm.

The three modalities are the interior of the object, the boundary of the object, and the boundary of the region where two objects overlap.

How the three label modalities are generated:

Given an image with bounding box annotations, we first obtain an inscribed ellipse for each bounding box, then the interior mask (channel 0) is obtained by setting the values of pixels on the edge of or inside the ellipses to 1, and setting the values of other pixels to 0. Then, the boundary mask (channel 1) is obtained by setting the values of pixels on the edge of or within the inner width w of the ellipses to 1, and setting the rest to 0. Similarly, the boundary on the interior mask (channel 2) is generated by setting the values of pixels on the edge of or within the inner width w of the area of the elliptical overlap to 1.
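The label generation described above can be sketched roughly as follows. This is only an illustration under my own assumptions, not the authors' code: the function names are hypothetical, "within the inner width w" is interpreted as "inside the region and within w pixels (square neighbourhood) of a pixel outside it", and the overlap area is taken to be the pixels covered by at least two ellipses.

```python
def ellipse_region(height, width, box):
    """Binary map of the ellipse inscribed in box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    a, b = (x2 - x1) / 2.0, (y2 - y1) / 2.0
    return [[1 if ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 <= 1.0 else 0
             for x in range(width)]
            for y in range(height)]


def inner_band(region, band_width):
    """Pixels of `region` lying within band_width pixels of a pixel outside it
    (assumed reading of the 'inner width w'; off-image counts as outside)."""
    height, width = len(region), len(region[0])

    def outside(y, x):
        return not (0 <= y < height and 0 <= x < width) or not region[y][x]

    reach = range(-band_width, band_width + 1)
    return [[1 if region[y][x] and any(outside(y + dy, x + dx)
                                       for dy in reach for dx in reach) else 0
             for x in range(width)]
            for y in range(height)]


def multimodal_masks(height, width, boxes, band_width=1):
    """Return (interior, boundary, overlap_boundary), i.e. channels 0, 1, 2."""
    regions = [ellipse_region(height, width, box) for box in boxes]
    cover = [[sum(r[y][x] for r in regions) for x in range(width)]
             for y in range(height)]
    interior = [[1 if c >= 1 else 0 for c in row] for row in cover]
    overlap = [[1 if c >= 2 else 0 for c in row] for row in cover]
    bands = [inner_band(r, band_width) for r in regions]
    boundary = [[1 if any(b[y][x] for b in bands) else 0 for x in range(width)]
                for y in range(height)]
    overlap_boundary = inner_band(overlap, band_width)
    return interior, boundary, overlap_boundary


# Two hypothetical, overlapping bounding boxes on a 16 x 22 image.
boxes = [(2, 2, 12, 12), (8, 2, 18, 12)]
interior, boundary, overlap_boundary = multimodal_masks(16, 22, boxes)
```

Masks are indexed as mask[y][x]. The boundary channel is computed per ellipse before taking the union, so the part of each object's outline that falls inside its neighbour is still kept.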

 

Boundary tracing algorithm:

 

 

(3) Because the method is built on semantic segmentation, its performance depends heavily on the segmentation model, so the paper proposes the multi-scale pooling segmentation (MSP-Seg) model:

multi-scale pooling utilizes four pooling kernels with sizes 1 × 1, 3 × 3, 5 × 5, and 7 × 7 to simultaneously conduct average pooling operations on the previous feature maps generated by residual blocks on skip connections
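As a rough illustration of the multi-scale pooling idea, the sketch below runs average pooling with the four kernel sizes on a single-channel feature map. The stride of 1, the clipping of windows at the image border, and returning the four results as a list are my assumptions. The notes do not say how MSP-Seg pads or merges the pooled maps, and the function names are hypothetical.

```python
def average_pool(feature, kernel):
    """Stride-1 average pooling with a kernel x kernel window.
    Windows are clipped at the borders (an assumption, not from the paper)."""
    height, width = len(feature), len(feature[0])
    r = kernel // 2
    pooled = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [feature[ny][nx]
                      for ny in range(max(0, y - r), min(height, y + r + 1))
                      for nx in range(max(0, x - r), min(width, x + r + 1))]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


def multi_scale_pool(feature, kernels=(1, 3, 5, 7)):
    """Apply average pooling at every kernel size to the same feature map."""
    return [average_pool(feature, k) for k in kernels]


# Toy 7 x 7 feature map with a single activation in the centre.
feature = [[0.0] * 7 for _ in range(7)]
feature[3][3] = 49.0
pooled = multi_scale_pool(feature)
```

The larger the kernel, the more the single activation is spread out and flattened, so each pooled map summarises the same features at a different scale.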

 

 

 

3. Process

Training Process

1. Convert the bounding box annotations into multimodal annotations and use them to train the segmentation model.

Testing Process

1. The three heatmaps output by the segmentation model are combined by a pixel-level logistic regression into an instance-aware segmentation map.

2. Contour tracing is run on the segmentation map to produce the contour of each object, and each object's bounding box is created as the circumscribed quadrilateral of its contour.
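The two test-time steps can be sketched like this. Two substitutions are made and should be read as assumptions: the logistic regression uses hand-picked weights instead of learned ones, and the paper's contour tracing is replaced by a plain flood-fill connected-component search, which also yields one axis-aligned box per object. All names and numbers are hypothetical.

```python
import math


def combine(interior, boundary, overlap_boundary,
            weights=(6.0, -1.0, -8.0), bias=-3.0):
    """Pixel-wise logistic regression over the three heatmaps.
    The weights and bias are hand-picked for this toy example, not learned."""
    height, width = len(interior), len(interior[0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            z = (weights[0] * interior[y][x] + weights[1] * boundary[y][x]
                 + weights[2] * overlap_boundary[y][x] + bias)
            row.append(1 if 1.0 / (1.0 + math.exp(-z)) > 0.5 else 0)
        mask.append(row)
    return mask


def boxes_from_mask(mask):
    """One (x_min, y_min, x_max, y_max) box per 4-connected component.
    Flood fill is a stand-in for the paper's contour tracing step."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            stack = [(y0, x0)]
            seen[y0][x0] = True
            x_min = x_max = x0
            y_min = y_max = y0
            while stack:
                y, x = stack.pop()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes


# Toy 5 x 9 heatmaps: one interior blob, split by an overlap boundary at x = 4.
interior = [[1.0 if 1 <= y <= 3 and 1 <= x <= 7 else 0.0 for x in range(9)]
            for y in range(5)]
boundary = [[0.0] * 9 for _ in range(5)]
overlap_boundary = [[1.0 if 1 <= y <= 3 and x == 4 else 0.0 for x in range(9)]
                    for y in range(5)]

mask = combine(interior, boundary, overlap_boundary)
boxes = boxes_from_mask(mask)
print(boxes)
```

The overlap-boundary channel is what lets two touching objects come out as separate components here, with no NMS step involved.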

 

4. Thoughts

The idea is very good, but the amount of work done is limited and the results are not very convincing (the baselines used for comparison were chosen selectively).

1. It is unclear how the boundary mask output by the segmentation model differs from the final boundary found by the boundary tracing algorithm.

2. The explanation of what the three annotation modalities mean is not very clear.

 

5. Summary

A one-stage, anchor-free, NMS-free detector built on semantic segmentation.

GitHub paper list: https://github.com/zhiAung/Paper

 
