Paper Reading: InstanceFCN: Instance-sensitive Fully Convolutional Networks

1. Paper Overview


After the advent of deep learning, instance segmentation was first tackled with two-stage pipelines similar to R-CNN: extract segmentation proposals, then classify them, as in SDS, MNC, DeepMask and InstanceFCN. Reading the paper makes it clear that InstanceFCN (2016) only generates segmentation proposals (a binary, class-agnostic task); it is the step before full instance segmentation rather than instance segmentation itself. The FCN covered in the previous post does only semantic segmentation: it assigns classes but cannot separate individual objects of the same class, whereas this paper cannot assign classes but can separate the individual objects.

A 2015 paper, Instance-aware Semantic Segmentation via Multi-task Network Cascades (MNC, to be covered later), does perform instance segmentation and predates InstanceFCN, though it still had many problems. The stronger instance segmentation methods came in 2017 with FCIS (Fully Convolutional Instance-aware Semantic Segmentation) and Mask R-CNN. FCIS borrows the instance-sensitive score maps introduced by InstanceFCN, which builds on FCN and relies on these score maps to tell different object instances apart. The instance-sensitive score maps appear to be the paper's main contribution; the idea was also borrowed by the object detection network R-FCN.

2. How the instance-sensitive score maps work

In conventional FCN-style segmentation, each pixel is trained with a per-pixel cross-entropy loss, so each pixel can carry only one semantic prediction. For a region where two objects overlap, this makes instance segmentation very difficult. This paper instead predicts k × k instance-sensitive score maps (similar to feature maps), so every pixel produces k × k scores, and each score map is responsible for one relative position within an ROI: the first score map scores the top-left part of an ROI, the second the top-middle part, and so on. Since every score map only scores one relative position of an instance, the maps become instance-sensitive. An example follows:

As illustrated in the paper's figure, although the red point is the same pixel in the image, its relative position differs across ROIs: in the first ROI it sits at the middle-right, while in the second ROI it sits at the top-left. So when the two (overlapping) ROIs both extract this pixel, they read from different score maps: the first ROI reads the 6th score map, while the second reads the 1st. This is also why the responses shown in the two maps differ: the two maps are responsible for different relative positions, so the first map responds strongly at the middle-right of each instance, while the second responds strongly at the top-left.

One-sentence summary: even though different instances may share a common region, that shared region occupies a different relative position within each instance.

[Note]: this paper generates ROIs with a sliding window, whereas FCIS uses an RPN.
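To make the relative-position lookup concrete, here is a minimal sketch (my own illustration, not code from the paper) assuming k = 3 and row-major ordering of the relative positions; the pixel and ROI coordinates are made up to reproduce the red-point example above.

```python
# Minimal sketch of the relative-position lookup, assuming k = 3 and
# row-major ordering of the k*k cells (0 = top-left, ..., 8 = bottom-right).

def relative_position_index(pixel_xy, roi_xyxy, k=3):
    """Return the (0-based) index of the score map that scores this pixel
    for the given ROI."""
    x, y = pixel_xy
    x0, y0, x1, y1 = roi_xyxy
    col = min(int((x - x0) / (x1 - x0) * k), k - 1)  # which of the k columns
    row = min(int((y - y0) / (y1 - y0) * k), k - 1)  # which of the k rows
    return row * k + col

# The red-point example: the same pixel lies middle-right in one ROI but
# top-left in an overlapping ROI, so it is read from different score maps.
pixel = (60, 50)
roi_a = (0, 0, 70, 90)       # hypothetical ROI; the pixel falls at its middle-right
roi_b = (55, 45, 150, 140)   # hypothetical ROI; the pixel falls at its top-left
print(relative_position_index(pixel, roi_a))  # 5 -> the 6th score map
print(relative_position_index(pixel, roi_b))  # 0 -> the 1st score map
```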

3. Comparison with DeepMask

Most related to our method, DeepMask [8] is an instance segment proposal method driven by convolutional networks. DeepMask learns a function that maps an image sliding window to an m²-d vector representing an m×m-resolution mask (e.g., m = 56). This is computed by an m²-d fully-connected (fc) layer. See Fig. 2. Even though DeepMask can be implemented in a fully convolutional way (as at inference time in [8]) by recasting this fc layer into a convolutional layer with m²-d outputs, it fundamentally differs from the FCNs in [1] where each output pixel is a low-dimensional classifier. Unlike DeepMask, our method has no layer whose size is related to the mask size m², and each pixel in our method is a low-dimensional classifier. This is made possible by exploiting local coherence [9] of natural images for generating per-window pixel-wise predictions. We will discuss and compare with DeepMask in depth.
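A rough back-of-the-envelope comparison of the two output layers. The 512-d per-window feature and the 1×1 convolution are assumptions made here for illustration, not numbers taken from either paper; the point is only that DeepMask's output layer grows with m² while InstanceFCN's stays at k².

```python
# Rough size comparison (illustrative assumptions: 512-d per-window feature,
# a plain fc layer for DeepMask, a 1x1 conv producing k^2 maps for InstanceFCN).

m = 56          # DeepMask's mask resolution
k = 3           # InstanceFCN's grid of relative positions
feat_dim = 512  # assumed feature dimensionality per window / per pixel

deepmask_outputs = m * m                    # 3136 values per window
deepmask_fc_params = feat_dim * m * m       # fc weights scale with m^2 -> 1,605,632

instancefcn_outputs = k * k                 # 9 score maps, independent of m
instancefcn_params = feat_dim * k * k       # 1x1-conv weights -> 4,608

print(deepmask_outputs, deepmask_fc_params)
print(instancefcn_outputs, instancefcn_params)
```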

4、Instance assembling module

The instance-sensitive score maps have not yet produced object instances. But we can simply assemble instances from these maps. We slide a window of resolution m×m on the set of instance-sensitive score maps (Fig. 1 (bottom)). In this sliding window, each m/k × m/k sub-window directly copies values from the same sub-window in the corresponding score map. The k² sub-windows are then put together (according to their relative positions) to assemble a new window of resolution m×m. This is the instance assembled from this sliding window.
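A minimal NumPy sketch of this assembling step, under assumptions of my own (k = 3, row-major ordering of the relative positions, m divisible by k, score maps given as a (k², H, W) array); it illustrates the copy-and-tile idea, not the authors' implementation.

```python
# Assemble an m x m instance from k*k instance-sensitive score maps:
# each m/k x m/k sub-window copies values from the same sub-window of the
# score map in charge of that relative position.

import numpy as np

def assemble_instance(score_maps, x0, y0, m, k=3):
    """score_maps: (k*k, H, W) array. Returns the m x m instance assembled
    for the sliding window whose top-left corner is (x0, y0)."""
    s = m // k                                   # side of each sub-window
    instance = np.empty((m, m), dtype=score_maps.dtype)
    for row in range(k):
        for col in range(k):
            idx = row * k + col                  # which score map to read
            ys, xs = y0 + row * s, x0 + col * s  # sub-window location in the image
            instance[row * s:(row + 1) * s,
                     col * s:(col + 1) * s] = score_maps[idx, ys:ys + s, xs:xs + s]
    return instance

# Usage: 9 score maps over a 120x160 image, one 21x21 window at (40, 30).
score_maps = np.random.rand(9, 120, 160).astype(np.float32)
mask_logits = assemble_instance(score_maps, x0=40, y0=30, m=21)
print(mask_logits.shape)  # (21, 21)
```

Note that assembling is pure copying; no weights are involved in this step.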

5. Local Coherence

By local coherence we mean that for a pixel in a natural image, its prediction is most likely the same when evaluated in two neighboring windows. One does not need to completely re-compute the predictions when a window is shifted by a small step.

(Fig. 3 of the paper omitted.)

The local coherence property has been exploited by our method. For a window that slides by one stride (Fig. 3 (bottom)), the same pixel in the image coordinate system will have the same prediction because it is copied from the same score map (except for a few pixels near the partitioning of relative positions). This allows us to conserve a large number of parameters when the mask resolution m² is high. (In other words, shifting the sliding window by a small step does not change the prediction for a pixel the two windows share, thanks to the local coherence of natural images; this is what lets each output pixel remain a low-dimensional classifier and saves a large number of parameters compared with DeepMask.)
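A small self-contained check of this property, under the same illustrative assumptions as the sketch in section 4 (k = 3, m = 21, row-major cells): when the window shifts by one position, a pixel shared by both windows almost always stays in the same relative-position cell, and therefore copies its value from the same score map.

```python
# Check how often a shared pixel keeps the same relative-position cell
# (and hence the same score map) when the window shifts by one column.

import numpy as np

k, m = 3, 21
s = m // k
x0_a, x0_b = 40, 41              # two horizontally neighboring windows

cols = np.arange(x0_b, x0_a + m)  # image columns covered by both windows
cell_in_a = (cols - x0_a) // s    # column of the k*k grid within window A
cell_in_b = (cols - x0_b) // s    # ... and within window B

same = cell_in_a == cell_in_b
print(f"{same.mean():.0%} of shared columns read from the same score map")
# -> 90%: only the columns sitting right on a cell boundary change maps,
#    matching "except for a few pixels near the partitioning".
```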

6. Performance comparison

(Performance comparison tables from the paper omitted.)

