Atrous convolution study notes

The idea for this post comes from the article How to understand Dilated Convolutions, but that article was written a long time ago and its layout is quite confusing, so I am writing a new version myself.

1. Proposal of dilated convolution

  1. Multi-Scale Context Aggregation by Dilated Convolutions
  2. Dilated Residual Networks

The original author also recommends a set of paper notes; if you are interested, take a look at Paper Note - CVPR 2017 Dilated Residual Networks.

[Figure: dilated convolution illustration from Multi-Scale Context Aggregation by Dilated Convolutions]

2. Difficulties in understanding

The figure above is taken directly from the paper Multi-Scale Context Aggregation by Dilated Convolutions. Looking at it, we can ask a few questions:

  • What does the red dot mean?
  • Why does the image size not change with dilated convolution?
  • What does the outermost image in the picture represent?

[Figure: receptive field illustration from the blog A guide to receptive field arithmetic]

The figure above gives a more intuitive picture of the receptive field. It comes from the blog post A guide to receptive field arithmetic for Convolutional Neural Networks; if the original is hard to follow, the Chinese translation is also well worth reading.

According to the receptive-field calculation formula, we have

$$l_{k} = l_{k-1} + \left( (f_{k} - 1) \cdot \prod_{i=1}^{k-1} s_{i} \right)$$

Here, $l_{k}$ is the receptive field size of layer $k$ (and $l_{k-1}$ that of layer $k-1$), $f_k$ is the convolution kernel size of the current layer, and $s_i$ is the stride of layer $i$. From this, the formula for atrous convolution can be derived: atrous convolution essentially inserts zeros between the elements of the convolution kernel, which enlarges the effective kernel size.
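To make the recursion concrete, here is a minimal Python sketch (my own, not from the original post) that applies the formula layer by layer; the function name and the example layer list are illustrative assumptions.

```python
# Receptive-field recursion: l_k = l_{k-1} + (f_k - 1) * prod(s_1 .. s_{k-1})
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, ordered from input to output."""
    l = 1      # receptive field of a single input pixel
    jump = 1   # running product of the strides of all previous layers
    for f, s in layers:
        l += (f - 1) * jump
        jump *= s
    return l

# Three 3x3 convolutions with stride 1 stack to a 7x7 receptive field.
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7
```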

Let the size of the ordinary convolution kernel be $f_k$; then the size of the equivalent atrous convolution kernel, $d_k$, is given by

$$d_k = (f_k - 1) \times (\mathrm{rate} - 1) + f_k$$
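For example, a $3 \times 3$ kernel with rate 2 has an equivalent size of $(3-1)\times(2-1)+3=5$, i.e. it covers a $5 \times 5$ window. A one-line sketch of the formula (the function name is my own):

```python
def effective_kernel(f, rate):
    # d_k = (f_k - 1) * (rate - 1) + f_k
    return (f - 1) * (rate - 1) + f

print(effective_kernel(3, 2))  # 5: a 3x3 kernel at rate 2 covers a 5x5 window
print(effective_kernel(3, 4))  # 9
```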

Back to the original questions:

  • What does the red dot mean? It marks the center of the receptive field.
  • Why does the image size not change? With suitable padding, the feature map produced by a dilated convolution keeps the same size as its input; the sketch after the size formula below verifies this.

Regarding the calculation of feature map size, we have the following formula

$$n_{\text{out}} = \left\lfloor \frac{n_{\text{in}} + 2p - k}{s} \right\rfloor + 1$$

Here, $n_{\text{out}}$ and $n_{\text{in}}$ denote the feature map sizes of the output and input respectively, $k$ is the convolution kernel size, $p$ is the padding, and $s$ is the stride of the convolution.
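Combining the two formulas: for a dilated convolution, substitute the equivalent kernel size $d_k$ for $k$ in the size formula. With kernel 3 and stride 1, choosing the padding equal to the rate makes $n_{\text{out}} = n_{\text{in}}$, which is why the feature map size stays unchanged. A quick PyTorch check (a sketch of my own, not from the original post):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)  # batch, channels, height, width
for rate in (1, 2, 4):
    # Equivalent kernel: d_k = (3 - 1) * (rate - 1) + 3 = 2 * rate + 1, so
    # n_out = floor((32 + 2 * rate - (2 * rate + 1)) / 1) + 1 = 32 for every rate.
    conv = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=rate, dilation=rate)
    print(rate, conv(x).shape)  # torch.Size([1, 1, 32, 32]) each time
```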

