Object Segmentation (8): Large Kernel Matters --- GCN Explained

Paper: Large Kernel Matters (GCN)
Published in: CVPR 2017 (IEEE Conference on Computer Vision and Pattern Recognition)
Code: GCN-GitHub


 

ABSTRACT

 

  • In modern network architecture design, a stack of small convolution kernels (e.g., 1x1, 3x3) is usually preferred over a single large kernel at the same computational complexity, because the stack is more efficient;
  • In semantic segmentation, however, dense per-pixel prediction is required, and there a large convolution kernel (and a large effective receptive field) turns out to play an important role. To address the tension between classification and localization, the paper proposes a Global Convolutional Network (GCN);
  • A residual-based boundary refinement module (BR) is used to sharpen object boundaries.

 

1. INTRODUCTION

 
       Image segmentation consists of two tasks, namely localization and classification. Since the two tasks impose contradictory requirements, a model that handles the relationship between them well is a good model.

       Q1: Where does the contradiction between classification and localization show up, and how is it solved?

 
       ① The contradiction:
              Classification: a classification model should be insensitive to transformations of the input (e.g., translation, rotation, scaling), i.e., the model should be transformation-invariant;
              Localization: a localization model must be sensitive to transformations. A model with strong translation invariance has difficulty determining the exact spatial position of an object, so localization accuracy will be low.
                                                               (Localization requires determining, for every pixel, which object it belongs to semantically, and then finding the position from that semantic information.)

 
       ② The solution:
              Classification: make the convolution kernel as large as possible, so as to establish dense connections between the feature maps and the per-pixel classifier; if the kernel is as large as the feature map (global convolution), global information can be exploited;
              Localization: keep the network fully convolutional, with no FC layers or global pooling layers, since global pooling discards positional information.

       (Figure 2) Analysis of the networks A, B, and C shown above:

  • In the classification network A, all features are fed into one classifier, which determines the object class;
  • In the conventional segmentation network B, the per-pixel classification result is determined by the feature at the corresponding position of the feature map, so the connection between features and classification results is sparse;
  • The network C proposed in the paper achieves dense connections between every feature map and the classification result, so that the classification of each pixel can exploit global information.

 

2. Related Work

 
       First, a brief review of earlier work on these problems:

       Embedding context: Dilated-Net uses dilated (atrous) convolution to aggregate multi-scale context information; DeeplabV2 uses atrous spatial pyramid pooling (a combination of dilated convolutions) to embed context directly from the feature maps.
       Increasing resolution: FCN first proposed deconvolution to improve the resolution of small score maps. DeconvNet and SegNet later introduced unpooling (the inverse of pooling) and an hourglass-like network to learn the upsampling process. More recently, LRR [12] argued that upsampling the feature maps works better than upsampling the score maps. Deeplab and Dilated-Net do not learn an upsampling process; instead they use dilated convolution to enlarge the feature maps directly, producing larger score maps.
       Boundary alignment: Conditional Random Fields (CRFs) are frequently used here thanks to their convenient mathematical form. Deeplab uses denseCRF, a CRF variant built on a fully connected graph, directly as a post-processing step after the CNN.
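As a minimal illustration of the dilated convolution mentioned above (a sketch in PyTorch; the channel and image sizes are arbitrary, not taken from any of the cited papers): a 3x3 kernel with dilation d spans an effective extent of k + (k-1)(d-1), enlarging the receptive field without adding parameters.

```python
import torch
import torch.nn as nn

# A 3x3 kernel with dilation=2 spans a 5x5 window (extent k + (k-1)(d-1))
# while still using only 9 weights per input/output channel pair.
dense = nn.Conv2d(8, 8, kernel_size=3, padding=1)                # extent 3
dilated = nn.Conv2d(8, 8, kernel_size=3, dilation=2, padding=2)  # extent 5

x = torch.randn(1, 8, 32, 32)
# With padding matched to the dilation, both keep the 32x32 spatial size,
# so the feature map is enlarged in receptive field, not shrunk in resolution.
print(dense(x).shape, dilated(x).shape)
```

Both layers have exactly the same number of weights; only the sampling grid of the dilated one is spread out.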


 

3. Global Convolutional Network

(Figure 3)

Q1: Why does GCN need a very large convolution kernel?

 
       Since the classifier is connected to local patches of the feature map rather than to the whole map, it is hard for the classifier to handle variations of the input. As shown above, the input object is center-aligned and we want to assign it a semantic label. In A, the VRF (valid receptive field) can completely contain the target; but after the image is enlarged, as in B, the VRF covers only a small part of the object, which makes classification even harder. Using GCN in B, however, expands the receptive field as shown in C, covering almost the whole image.

       Thus the GCN convolution kernel must be very large, ideally large enough to cover the entire feature map.
 
Q2: A large GCN convolution kernel drastically increases the number of parameters. How is this handled?

 
       As we know, replacing a large convolution kernel with a stack of small ones is done precisely to reduce computation: a stack of small kernels keeps the receptive field while cutting parameters and adding nonlinearity. Conversely, switching from 3x3 back to 5x5 or even 7x7 makes the computation grow quadratically with kernel size, and too many parameters also make the network hard to converge.
       The paper therefore decomposes a large k × k convolution into a 1 × k followed by a k × 1 (and, symmetrically, a k × 1 followed by a 1 × k), with no activation function such as ReLU in between. This reduces the number of parameters and the computational complexity to O(2/k) of the dense k × k kernel.
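The factorization above can be sketched as a PyTorch module (a minimal sketch: the two symmetric separable branches are summed with no nonlinearity in between, as described in the text; the channel sizes and k = 7 below are illustrative, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class GCNModule(nn.Module):
    """Approximates a large k x k convolution with two separable branches
    (k x 1 then 1 x k, and 1 x k then k x 1), summed, no ReLU in between."""
    def __init__(self, in_ch, out_ch, k=7):
        super().__init__()
        pad = k // 2  # symmetric padding keeps the spatial size (odd k)
        self.branch_a = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (k, 1), padding=(pad, 0)),
            nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, pad)),
        )
        self.branch_b = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (1, k), padding=(0, pad)),
            nn.Conv2d(out_ch, out_ch, (k, 1), padding=(pad, 0)),
        )

    def forward(self, x):
        return self.branch_a(x) + self.branch_b(x)

x = torch.randn(1, 256, 32, 32)
gcn = GCNModule(256, 21, k=7)           # 21 output classes, hypothetical
dense = nn.Conv2d(256, 21, 7, padding=3)  # the dense k x k it replaces
y = gcn(x)
# The separable form needs O(2/k) of the dense kernel's weights:
# roughly 2*k*(in*out) vs. k*k*(in*out).
print(y.shape)
```

Note that the dense 7 × 7 layer above holds 256·21·49 weights, while the two separable branches together hold only about 2·7·(256·21 + 21·21), so the saving grows with k.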

(Figure 4)

Boundary Refinement (BR), a residual-based boundary refinement module: its role is to improve the quality of object boundaries. The structure of the module is very simple; it is essentially a basic residual block.
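Such a residual block can be sketched as follows (a sketch, assuming a 3x3 conv / ReLU / 3x3 conv residual branch as in a basic residual unit; the channel count of 21 is an illustrative number of classes, not taken from the paper's configuration):

```python
import torch
import torch.nn as nn

class BoundaryRefinement(nn.Module):
    """Residual boundary refinement: the refined score map is the input
    plus a small learned residual correction near object boundaries."""
    def __init__(self, ch):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        # Identity shortcut: if the residual branch outputs zero,
        # the score map passes through unchanged.
        return x + self.residual(x)

score = torch.randn(1, 21, 64, 64)  # hypothetical coarse score map
refined = BoundaryRefinement(21)(score)
print(refined.shape)  # same shape as the input score map
```

The identity shortcut means the module only needs to learn a small correction, which is why it can be applied cheaply after each score map in the pipeline.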
(Figure 5)

(Figure 6)


