The topic of this post comes from the article How to understand Dilated Convolutions (dilated convolutions), but its author wrote it quite a while ago and the layout is rather confusing, so I am writing a new version myself.
1. How dilated convolution was proposed
The original author also recommends a paper note; if you are interested, take a look at it - CVPR 2017 Dilated Residual Networks.
2. Difficulties in understanding
The figure above is the original figure from the paper Multi-Scale Context Aggregation by Dilated Convolutions. We can ask a few questions about it:
- What does the red dot mean?
- Why does the feature map size not change under dilated convolution?
- What does the outermost image in the figure represent?
The figure above can help you build an intuitive understanding of the receptive field. It comes from the blog post A guide to receptive field arithmetic for Convolutional Neural Networks; if it is hard to follow, the Chinese translation is also well worth reading.
The receptive field of layer $k$ can be computed recursively:
$$l_{k} = l_{k-1} + \left(f_{k} - 1\right) \prod_{i=1}^{k-1} s_{i}$$
where $l_k$ is the receptive field size of layer $k$ (and $l_{k-1}$ that of layer $k-1$), $f_k$ is the convolution kernel size of the current layer, and $s_i$ is the stride of layer $i$. From this we can derive the formula for dilated (atrous) convolution: it is essentially an ordinary convolution with zeros inserted between the kernel elements, which enlarges the effective kernel size.
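As a quick sanity check, here is a minimal Python sketch of this recursion. The function name and the example layer stack are illustrative assumptions of mine, not something from the original post.

```python
# Receptive-field recursion: l_k = l_{k-1} + (f_k - 1) * prod(s_1 .. s_{k-1})
def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, from first layer to last."""
    l = 1       # l_0: a single input pixel
    jump = 1    # running product of the strides of all previous layers
    for f, s in layers:
        l += (f - 1) * jump
        jump *= s
    return l

# Hypothetical stack: three 3x3 convolutions, the middle one with stride 2
print(receptive_field([(3, 1), (3, 2), (3, 1)]))  # -> 9
```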
Let the size of the ordinary convolution kernel be $f_k$ and the size of the equivalent dilated convolution kernel be $d_k$; then
$$d_k = \left(f_k - 1\right) \times \left(\mathrm{rate} - 1\right) + f_k$$
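Plugging a few rates into this formula (a small sketch; the helper name is mine) shows how quickly the equivalent kernel grows:

```python
# Equivalent kernel size of a dilated convolution: d_k = (f_k - 1) * (rate - 1) + f_k
def equivalent_kernel(f, rate):
    return (f - 1) * (rate - 1) + f

for rate in (1, 2, 4):
    print(rate, equivalent_kernel(3, rate))  # 3x3 kernel -> 3, 5, 9
```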
Back to the original questions:
- What does the red dot mean? It marks the center of the receptive field.
- Why does the feature map size not change? With suitable padding, the feature map produced by a dilated convolution keeps the same size, as the formula below shows.
For the output feature map size, we have the following formula:
$$n_{\text{out}} = \left\lfloor \frac{n_{\text{in}} + 2p - k}{s} \right\rfloor + 1$$
where $n_{\text{out}}$ and $n_{\text{in}}$ are the output and input feature map sizes respectively, $k$ is the convolution kernel size, $p$ is the padding, and $s$ is the stride of the convolution.
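To make the "size does not change" point concrete, here is a small check using PyTorch (my own addition; the post itself does not rely on any framework): for a 3x3 kernel, the equivalent kernel size is $2\,\mathrm{rate}+1$, so setting the padding equal to the dilation rate keeps $n_{\text{out}} = n_{\text{in}}$.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)
for rate in (1, 2, 4):
    # Effective kernel d = 3 + (3 - 1) * (rate - 1) = 2 * rate + 1,
    # so padding = rate gives n_out = floor((n_in + 2*rate - d) / 1) + 1 = n_in
    conv = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=rate, dilation=rate)
    print(rate, conv(x).shape)  # every rate prints torch.Size([1, 1, 32, 32])
```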