Paper notes: ParseNet: Looking Wider to See Better

1 Summary

Small receptive field: the actual receptive field of FCN is much smaller than the theoretical one. For VGG + FCN, the theoretical receptive field of fc7 should be 404 × 404, yet in practice fc7 "sees" a far smaller region. The authors propose ParseNet, which fuses global information into the network to compensate for the insufficient actual receptive field.
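The 404 × 404 figure can be reproduced with the standard receptive-field recurrence r ← r + (k − 1)·j, j ← j·s, where k is the kernel size, s the stride and j the distance in input pixels between adjacent units of the current layer. The sketch assumes the usual VGG16 layout, with fc6 implemented as a 7 × 7 convolution and fc7 as a 1 × 1 convolution as in FCN; the helper names are illustrative only.

```python
# Theoretical receptive field of FCN-VGG16 units, layer by layer.
# Assumption: standard VGG16 layout; fc6 is a 7x7 conv and fc7 a 1x1 conv (FCN style).


def receptive_field(layers):
    """Return (name, rf) pairs for layers given as (name, kernel, stride)."""
    rf, jump = 1, 1  # rf: field size in input pixels; jump: input step between adjacent units
    out = []
    for name, k, s in layers:
        rf += (k - 1) * jump
        jump *= s
        out.append((name, rf))
    return out


def vgg16_fcn_layers():
    """Hypothetical helper listing VGG16 conv/pool layers plus FCN's fc6/fc7."""
    convs_per_block = [2, 2, 3, 3, 3]
    layers = []
    for b, n in enumerate(convs_per_block, start=1):
        layers += [(f"conv{b}_{i}", 3, 1) for i in range(1, n + 1)]
        layers.append((f"pool{b}", 2, 2))
    layers += [("fc6", 7, 1), ("fc7", 1, 1)]
    return layers


rf = dict(receptive_field(vgg16_fcn_layers()))
print(rf["pool5"], rf["fc7"])  # 212 404
```

Most of the growth comes from fc6: its 7 × 7 kernel acts on units spaced 32 input pixels apart, adding 6 × 32 = 192 pixels on top of pool5's 212.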

2 Highlights

2.1 Global pooling layer

The authors observe that, in FCN built on VGG, the theoretical receptive field of fc7 should be 404 × 404. To measure the actual receptive field of fc7, they slide a window of noise over the input image and overlay the resulting change in the fc7 response on the image, which reveals the region that actually influences fc7, as shown below:
[Figure: theoretical vs. actual receptive field of fc7, panels (a) to (d)]
(a) Original image; (b) heat map of the corresponding feature; (c) theoretical receptive field of fc7 observed with the sliding window; (d) actual receptive field of fc7.
The actual receptive field covers less than 1/4 of the original image, far smaller than the theoretical one, so global semantic context may be lost during feature extraction. Some works recover it by post-processing the results with a CRF; this gives good results, but CRFs and similar post-processing are complex and consume a lot of computation. To address the problem, the authors introduce a global pooling layer: global average pooling is applied at fc7 (global pooling of shallower feature maps yields relatively little global information), and the resulting global feature is fused with the feature map produced directly by the CNN (here, by channel-wise concatenation). The structure is shown below:
[Figure: ParseNet global pooling branch fused with the CNN feature map]
The same feature map is split into two branches: one produces global context features, while the other is the feature map produced by the CNN trunk; the two branches are then combined. This approach is simple, and its accuracy is comparable to that obtained with CRF post-processing. (One might expect combining ParseNet with a CRF to work even better, but the paper "Are spatial and global constraints really necessary for segmentation?" points out that concatenating features in this way already acts as a smoothing step, so a smoothing CRF is no longer needed.)
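The sliding-window measurement described at the start of this section can be mimicked on a toy model. This is a minimal sketch under loud assumptions, not the paper's experiment: the "network" is a stack of `depth` 3 × 3 mean filters rather than VGG, and instead of random noise a constant bump `eps` is added to each window so the result is deterministic (`box3`, `toy_response` and `measure_rf` are hypothetical helpers). A window counts as inside the measured receptive field when it moves the center unit's response by more than `thresh` times the largest observed change.

```python
import numpy as np


def box3(x: np.ndarray) -> np.ndarray:
    """3x3 mean filter with zero padding; output has the same shape as x."""
    h, w = x.shape
    p = np.pad(x, 1)
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def toy_response(img: np.ndarray, depth: int) -> float:
    """Activation of the center unit after `depth` stacked 3x3 mean filters (toy, not VGG)."""
    out = img
    for _ in range(depth):
        out = box3(out)
    c = img.shape[0] // 2
    return float(out[c, c])


def measure_rf(size=33, depth=8, win=1, eps=1.0, thresh=0.1):
    """Return (theoretical, measured) receptive-field height of the center unit."""
    base = np.random.default_rng(0).random((size, size))  # arbitrary input "image"
    ref = toy_response(base, depth)
    n = size - win + 1
    change = np.zeros((n, n))
    for r in range(n):
        for c in range(n):
            perturbed = base.copy()
            perturbed[r:r + win, c:c + win] += eps  # perturb one window (constant bump, not noise)
            change[r, c] = abs(toy_response(perturbed, depth) - ref)
    mask = change > thresh * change.max()  # windows that noticeably move the unit
    rows = np.flatnonzero(mask.any(axis=1))
    measured = int(rows[-1] - rows[0] + 1)
    theoretical = 2 * depth + 1  # each 3x3 layer grows the field by 2 pixels
    return theoretical, measured


print(measure_rf())  # (17, 11)
```

With the defaults the theoretical field is 17 pixels high, but only 11 rows pass the threshold, because stacked filters concentrate their weight near the center. This is only an analogy for the gap ParseNet observes at fc7; the causes in a trained network may differ.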

2.2 L2 normalization layer

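On the question of how to fuse, the authors compare "early fusion", in which global information is fused with the local features before classification, and "late fusion", in which the main branch and the global branch are classified separately and the two resulting scores are combined by a weighted sum. In the authors' experiments, once an L2 normalization layer is used, early and late fusion perform about the same. Normalization matters because features from different layers have different scales, and a large scale gap between the fused features can degrade the results, as shown below: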
[Figure: scales of features from different layers]
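In the figure above, each color denotes features from a different layer; the horizontal axis indicates scale and the vertical axis the feature values (weights). The blue and cyan features have compatible scales, but the red and green features are about two orders of magnitude larger, and concatenating them directly lowers accuracy because of this mismatch. Therefore, before fusion, L2 normalization is applied to each pixel of the feature map (across its channels) rather than to the whole map.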
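The sketch below illustrates two points with made-up numbers (all names and sizes are hypothetical). First, if the per-pixel classifier is linear (an assumption of this sketch), early fusion is algebraically the same as late fusion with summed scores, since W[a; g] = W_a a + W_g g, and a weighted sum of scores can be folded into W_a and W_g. This is one way to read why the two variants end up similar; it is not a derivation taken from the paper, and it says nothing about training behavior. Second, what the algebra does not fix is scale: a global feature about 100× larger than the local one dominates the concatenation unless each part is L2-normalized first.

```python
import numpy as np

# Hypothetical sizes: one pixel with a d1-dim local feature and a d2-dim global
# feature, classified into k classes by a linear classifier (assumption).
rng = np.random.default_rng(0)
d1, d2, k = 6, 4, 3
local = rng.normal(size=d1)
glob = 100.0 * rng.normal(size=d2)  # global feature on a ~100x larger scale

W = rng.normal(size=(k, d1 + d2))  # weights acting on the concatenated feature
b = rng.normal(size=k)


def early_fusion(a, g):
    """Concatenate first, then classify once."""
    return W @ np.concatenate([a, g]) + b


def late_fusion(a, g):
    """Classify each branch separately, then add the two score vectors."""
    W_a, W_g = W[:, :d1], W[:, d1:]
    return (W_a @ a + b) + (W_g @ g)


def l2n(v):
    """L2-normalize a single feature vector."""
    return v / np.linalg.norm(v)


print(np.allclose(early_fusion(local, glob), late_fusion(local, glob)))  # True
print(np.linalg.norm(local), np.linalg.norm(glob))            # very different magnitudes
print(np.linalg.norm(l2n(local)), np.linalg.norm(l2n(glob)))  # both approximately 1.0
```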
[Figures: L2 normalization formulas]
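For reference, per-pixel L2 normalization of a d-channel feature vector x = (x_1, …, x_d) at one spatial position is the standard definition below (the notation is introduced here). The norm is computed over the channels at each position, not over the whole map:

```latex
\hat{x} = \frac{x}{\lVert x \rVert_2},
\qquad
\lVert x \rVert_2 = \Big( \sum_{i=1}^{d} x_i^2 \Big)^{1/2}
```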
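In addition, a scale parameter γ is introduced after normalization and learned by backpropagation, so that the network can learn on its own how much to rescale the normalized features: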
[Figure: learnable scaling with γ]
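A minimal NumPy sketch of such a layer, assuming a (C, H, W) feature map, one learnable scale γ_c per channel (y_c = γ_c · x̂_c), and a small `eps` for numerical safety. The class name `L2NormScale` and the initial value of γ are hypothetical choices, not taken from the paper. The backward pass follows from the chain rule: γ's gradient sums ∂L/∂y · x̂ over all positions, and the input gradient projects out the component along x̂ before dividing by the norm.

```python
import numpy as np


class L2NormScale:
    """Per-pixel L2 normalization over channels with a learnable per-channel scale gamma.

    Inputs and outputs are (C, H, W) arrays (hypothetical layer, sketch only).
    """

    def __init__(self, channels: int, init_scale: float = 1.0, eps: float = 1e-10):
        self.gamma = np.full(channels, init_scale)  # initial value is an arbitrary choice here
        self.eps = eps

    def forward(self, x: np.ndarray) -> np.ndarray:
        self.norm = np.sqrt((x ** 2).sum(axis=0, keepdims=True)) + self.eps  # (1, H, W)
        self.xhat = x / self.norm                       # unit length at every pixel
        return self.gamma[:, None, None] * self.xhat    # y_c = gamma_c * xhat_c

    def backward(self, dy: np.ndarray):
        """Given dL/dy, return (dL/dx, dL/dgamma)."""
        dgamma = (dy * self.xhat).sum(axis=(1, 2))      # sum over all spatial positions
        dxhat = dy * self.gamma[:, None, None]
        # Per pixel: d xhat / d x = (I - xhat xhat^T) / ||x||
        proj = (dxhat * self.xhat).sum(axis=0, keepdims=True)
        dx = (dxhat - self.xhat * proj) / self.norm
        return dx, dgamma


layer = L2NormScale(channels=4)
y = layer.forward(np.random.default_rng(0).normal(size=(4, 3, 3)))
print(y.shape)  # (4, 3, 3)
```

Because every pixel is rescaled to unit length before γ is applied, features from layers with very different magnitudes enter the concatenation on a comparable footing, and γ lets training restore whatever relative scale works best.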

2.3 ParseNet overall architecture

ParseNet fuses features by introducing a global pooling layer; its overall structure is shown below:
[Figure: overall ParseNet architecture]
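On one branch (the main branch), the feature map is L2-normalized directly; on the other, it passes through global average pooling to produce a global feature, which is L2-normalized and then unpooled back to the spatial size of the feature map before being concatenated with the main-branch output. With L2 normalization and the fused global semantic information, the originally small actual receptive field gains access to much more global context, which improves the results.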
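A minimal forward-pass sketch of the pipeline just described, in NumPy on a single (C, H, W) feature map. Assumptions: separate per-channel scales `gamma_local` and `gamma_global` (hypothetical names), unpooling implemented as copying the pooled vector to every spatial position, and the concatenated map being handed to a classifier that is not shown.

```python
import numpy as np


def l2_normalize(x: np.ndarray, gamma: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Per-pixel L2 normalization over channels of a (C, H, W) map, then per-channel scaling."""
    norm = np.sqrt((x ** 2).sum(axis=0, keepdims=True)) + eps
    return gamma[:, None, None] * (x / norm)


def parsenet_context(feat: np.ndarray, gamma_local: np.ndarray,
                     gamma_global: np.ndarray) -> np.ndarray:
    """Fuse a (C, H, W) feature map with its global context (sketch of the described steps)."""
    c, h, w = feat.shape
    local = l2_normalize(feat, gamma_local)               # main branch
    pooled = feat.mean(axis=(1, 2), keepdims=True)       # global average pooling -> (C, 1, 1)
    pooled = l2_normalize(pooled, gamma_global)          # normalize the global vector
    unpooled = np.broadcast_to(pooled, (c, h, w))        # "unpool": same vector at every pixel
    return np.concatenate([local, unpooled], axis=0)     # (2C, H, W), input to the classifier


feat = np.random.default_rng(0).random((8, 5, 7))  # C=8 channels on a 5x7 grid
fused = parsenet_context(feat, np.ones(8), np.ones(8))
print(fused.shape)  # (16, 5, 7)
```

Every pixel of the fused map now carries the same image-level summary alongside its own local feature, which is how the global branch extends the effective context without enlarging any kernel.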

3 Selected results

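[Figure: FCN vs. ParseNet segmentation comparison]
The figure above compares FCN with ParseNet. Probably because of its limited receptive field, FCN sees only the upper part of the cat and misses a small lower part, so the lower part is misclassified as another class. ParseNet handles this image much better.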
[Figure: ParseNet Baseline vs. ParseNet results]
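In the figure above, "ParseNet Baseline" is the model without global context and "ParseNet" is the model with global context added. With global context, the results resemble those obtained with post-processing such as CRF or RNN-based refinement.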
[Figure: failure cases after adding global context]
The figure above shows that fusing global information can sometimes make the results worse.

4 Conclusion

This paper studies the receptive-field and multi-scale problems through global feature maps; supplementing the insufficient actual receptive field with global information is an idea that many later papers also adopt. The paper also reports that this approach performs about as well as CRF and similar post-processing, which are complex and computationally expensive, whereas ParseNet achieves good results with only a simple modification to the network.

5 References

(1) ParseNet: Looking Wider to See Better
(2) [Reading notes] ParseNet: Looking Wider to See Better

Origin blog.csdn.net/gyyu32g/article/details/104387176