How to calculate Receptive Field

Receptive Field in Deep Neural Networks

This article is reprinted from Zhihu: Receptive Field in Deep Neural Networks - Lan Muda's article - Zhihu

In the field of deep neural networks in the field of machine vision, there is a concept called receptive field, which is used to represent the size of the receptive range of neurons at different locations within the network to the original image. The reason why neurons cannot perceive all the information of the original image is because convolutional layers and pooling layers are commonly used in these network structures, and the layers are locally connected (through sliding filters). The larger the value of the neuron's receptive field, the larger the original image range it can access, which also means that it may contain more global and higher-semantic features; while the smaller the value, the smaller the number of features it contains. Tend to localities and details. Therefore, the value of the receptive field can be roughly used to determine the abstraction level of each layer.

So how to calculate this receptive field? Let's look at the following example first.

Insert image description here

It can be seen that the original image range that each unit in Conv1 can see is 3*3, and since each unit of Conv2 is composed of Conv1 in the 2×2 range, it is actually possible to trace back to the original image. See the 5×5 original image extent. So we say that the receptive field of Conv1 is 3 and the receptive field of Conv2 is 5. The receptive field of each unit of the input image is defined as 1, which should be easy to understand because each pixel can only see itself.

Through the diagram above, we can "visually check" how big the receptive field of each layer is, but for network structures with too many layers and too complex, this method may not be smart enough. Therefore, we hope to summarize the rules and describe them with formulas, so that we can calculate the receptive field of each layer of any complex network structure. So let’s take a look at the rules below.

Since the image is two-dimensional and contains spatial information, the essence of the receptive field is actually a two-dimensional area. However, the industry usually defines the receptive field as a square area, so the side length is used to describe its size. In the following discussion, this article only considers the width direction .

Next, we use an uncommon way to show the relationship between the layers of CNN:

Insert image description here

Insert image description here

Insert image description here

Guess you like

Origin blog.csdn.net/IYXUAN/article/details/127589560