DenseASPP Paper Summary

Paper: http://openaccess.thecvf.com/content_cvpr_2018/papers/Yang_DenseASPP_for_Semantic_CVPR_2018_paper.pdf

I. Related Work

1. FCN

FCN pioneered the use of fully convolutional networks for semantic segmentation, where high-level semantic information plays a vital role. To extract such high-level features, FCN uses multiple pooling layers to enlarge the receptive field of the output neurons. However, increasing the number of pooling layers shrinks the feature map, and upsampling back to full resolution then poses a serious challenge for the segmentation output. To resolve this contradiction between receptive-field size and feature-map resolution, atrous convolution was proposed.

2. Atrous Convolution (Dilated Convolution)

Compared with an ordinary convolution operator, atrous convolution obtains a larger receptive field without increasing the number of kernel parameters. The feature map produced by an atrous convolution can keep the same spatial size as its input, yet each output neuron has a larger receptive field and can therefore encode higher-level semantics. Although atrous convolution resolves the conflict between feature-map resolution and receptive-field size, all neurons in its output feature map share the same receptive-field size, which means the semantic masks are generated from features at a single scale only. Multi-scale information, however, helps resolve ambiguous cases and produces more robust classification results.
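As a quick illustration (a minimal PyTorch sketch, not code from the paper), a dilated 3 × 3 convolution keeps the parameter count and output size of an ordinary 3 × 3 convolution while enlarging each output neuron's receptive field:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)

# Ordinary 3x3 convolution: receptive field 3, padding 1 preserves size.
conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)

# Atrous 3x3 convolution with dilation 3: still 3x3 = 9 weights per channel,
# but each output neuron now sees a 7x7 window (padding = dilation keeps size).
atrous = nn.Conv2d(64, 64, kernel_size=3, padding=3, dilation=3)

print(conv(x).shape, atrous(x).shape)  # both torch.Size([1, 64, 32, 32])
print(sum(p.numel() for p in conv.parameters()) ==
      sum(p.numel() for p in atrous.parameters()))  # True: identical parameters
```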

3. ASPP

To this end, ASPP [2,3] proposed concatenating the feature maps generated by atrous convolutions with different dilation rates, so that neurons in the output feature map encode multi-scale information from several receptive-field sizes, which ultimately improves performance.
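The branch layout can be sketched as follows (a simplified PyTorch version; the 1 × 1 convolutions and pooling branch used in ASPP [2,3] are omitted here):

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Parallel atrous branches with different dilation rates, concatenated."""
    def __init__(self, in_ch, branch_ch, rates=(6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, 3, padding=r, dilation=r) for r in rates
        )

    def forward(self, x):
        # Every branch sees the same input; outputs are concatenated channel-wise,
        # so each output position mixes several receptive-field sizes.
        return torch.cat([b(x) for b in self.branches], dim=1)

y = ASPP(512, 128)(torch.randn(1, 512, 64, 64))
print(y.shape)  # torch.Size([1, 512, 64, 64])
```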

However, as the dilation rate grows (e.g., d > 24), atrous convolution becomes increasingly ineffective and gradually loses its modeling ability. It is therefore important to design a network structure that encodes multi-scale information while still obtaining a sufficiently large receptive field.

II. DenseASPP

DenseASPP consists of a base network followed by a cascade of atrous convolution layers. It combines the advantages of parallel and cascaded atrous convolutions through dense connections, producing features at more scales over a wider range. Thanks to the dense connections, each intermediate feature map encodes semantic information from multiple scales, and different intermediate feature maps encode multi-scale information from different scale ranges. Through the cascade of atrous convolutions, neurons in later layers obtain increasingly large receptive fields, without the kernel degeneration that ASPP suffers at large dilation rates. The final output of DenseASPP therefore covers not only a wide range of scales, but covers that range very densely.

1. Contributions

1) DenseASPP generates features that cover a very large scale range (in terms of receptive-field size).
2) The generated features cover that scale range very densely.

2. How It Works

The atrous convolution layers are organized as a cascade in which the dilation rate increases layer by layer: layers with small dilation rates sit at the bottom and layers with large dilation rates at the top. The output of each layer is concatenated with the input feature map and the outputs of all lower layers, and this concatenated feature map is fed into the next layer. The final output of DenseASPP is the multi-scale feature map generated by the whole series of atrous convolutions. This structure forms a denser feature pyramid with more scales using only a few atrous convolution layers. Compared with the original ASPP [3], DenseASPP stacks all the atrous convolution layers and links them with dense connections. This change brings two advantages: a denser feature pyramid and a larger receptive field.
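This wiring can be sketched in PyTorch as follows (channel sizes are illustrative assumptions, and the 1 × 1 bottleneck convolutions used in the paper are omitted):

```python
import torch
import torch.nn as nn

class DenseASPP(nn.Module):
    """Cascade of atrous layers; each layer consumes the concatenation of
    the base features and the outputs of all previous layers."""
    def __init__(self, in_ch, growth=64, rates=(3, 6, 12, 18, 24)):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for r in rates:  # dilation rate grows layer by layer
            self.layers.append(nn.Conv2d(ch, growth, 3, padding=r, dilation=r))
            ch += growth  # dense connections grow the input channel count

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            # Concatenate everything produced so far, then apply the next
            # atrous convolution with a larger dilation rate.
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

out = DenseASPP(512)(torch.randn(1, 512, 64, 64))
print(out.shape)  # torch.Size([1, 832, 64, 64]) = 512 + 5 * 64 channels
```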

3. Advantages

3.1 Denser Feature Pyramid

Here "denser" refers not only to better scale diversity in the feature pyramid, but also to the fact that the convolutions involve more pixels than in ASPP.

Denser scale sampling: DenseASPP is an efficient architecture that samples the input at many different scales. A key design of DenseASPP is the use of dense connections to integrate layers with different dilation rates.

For an atrous convolution layer with dilation rate d and kernel size K, the equivalent receptive-field size is:

R = (d - 1) × (K - 1) + K

For example, a 3 × 3 convolution layer with dilation rate d = 3 has a receptive field of size 7.

Stacking two convolution layers yields an even larger receptive field. Suppose two convolution layers have receptive-field sizes K1 and K2; the combined receptive field is:

K = K1 + K2 - 1

For example, a convolution layer with kernel size 7 stacked on one with kernel size 13 gives a receptive field of size 19.
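Both rules are easy to verify numerically; a small Python helper (my own sketch, not from the paper):

```python
def receptive_field(K, d):
    """Equivalent receptive field of a K x K convolution with dilation rate d."""
    return (d - 1) * (K - 1) + K

def stack(*fields):
    """Receptive field obtained by stacking layers with the given fields."""
    r = fields[0]
    for f in fields[1:]:
        r += f - 1  # K = K1 + K2 - 1, applied pairwise
    return r

print(receptive_field(3, 3))  # 7
print(stack(7, 13))           # 19
```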

DenseASPP composed of atrous convolutions with dilation rates 3, 6, 12 and 18 produces feature scales from every combination of those rates. Each combination acts as one equivalent convolution kernel, and K denotes its actual receptive field, as illustrated below:

[Figure: dilation-rate combinations of DenseASPP(3, 6, 12, 18) and the receptive field K of each combination]

Denser pixel sampling: Compared with ASPP, DenseASPP involves more pixels in the computation of the feature pyramid. ASPP builds its feature pyramid with four atrous convolution layers of dilation rates 6, 12, 18 and 24. Compared with an ordinary convolution layer of the same receptive field, a convolution layer with a large dilation rate samples pixels very sparsely. In DenseASPP, the dilation rate increases layer by layer, so convolutions in the upper layers can reuse the features of the lower layers and sample the pixels more densely.
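A rough 1-D illustration of this (an informal sketch, not from the paper): count how many input positions actually contribute to one output neuron, for a single large-dilation kernel versus a DenseASPP-style cascade.

```python
import numpy as np

def taps(K, d):
    """1-D indicator of the input offsets touched by a dilated kernel."""
    t = np.zeros((K - 1) * d + 1)
    t[::d] = 1.0
    return t

# A single ASPP-style branch with d = 24 touches only 3 of 49 positions.
single = taps(3, 24)

# A DenseASPP-style cascade 6 -> 12 -> 18 -> 24: compose the tap patterns.
stacked = taps(3, 6)
for d in (12, 18, 24):
    stacked = np.convolve(stacked, taps(3, d))

print(int((single > 0).sum()), single.size)    # 3 taps over a field of 49
print(int((stacked > 0).sum()), stacked.size)  # 21 taps over a field of 121
```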

3.2 Larger Receptive Field

Another benefit of DenseASPP is a larger receptive field. In the conventional ASPP, the atrous convolution layers work in parallel, and the four branches share no information during the feed-forward pass. In contrast, the atrous layers in DenseASPP share information through skip connections. Layers with small dilation rates and layers with large dilation rates become interdependent, and the feed-forward pass not only builds a denser feature pyramid but also produces larger effective filters that perceive a larger context.


Let Rmax denote the maximum receptive field of the feature pyramid, and let RK,d denote the receptive field of an atrous convolution layer with kernel size K and dilation rate d. The maximum receptive field of ASPP(6, 12, 18, 24) is then:

Rmax = max[R3,6, R3,12, R3,18, R3,24]
     = R3,24
     = 51


In contrast, the maximum receptive field of DenseASPP(6, 12, 18, 24) is:

Rmax = R3,6 + R3,12 + R3,18 + R3,24 - 3
     = 122
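These two quantities follow directly from the formulas above; a quick numeric check (note that R = (d - 1) × (K - 1) + K yields 49 and 121, slightly below the 51 and 122 quoted in the paper):

```python
def receptive_field(K, d):
    return (d - 1) * (K - 1) + K  # formula from Section 3.1

fields = [receptive_field(3, d) for d in (6, 12, 18, 24)]  # [13, 25, 37, 49]

aspp_max = max(fields)                       # parallel branches: take the max
dense_max = sum(fields) - (len(fields) - 1)  # cascade: sum minus (n - 1)

print(aspp_max, dense_max)  # 49 121
```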

A receptive field this large can provide global information even for very large objects in high-resolution images. For example, Cityscapes [4] images have a resolution of 2048 × 1024, and the network's final feature map is 256 × 128: the features of DenseASPP(6, 12, 18, 24) cover a scale range of 122, while DenseASPP(3, 6, 12, 18, 24) covers a range of 128.
