[Deep Learning] 6-1 Convolutional Neural Network - Convolutional Layer

Convolutional Neural Networks (CNNs) are used in many areas, such as image recognition and speech recognition. In image recognition competitions, almost all deep-learning-based methods are built on CNNs.

First, let's look at the network structure of a CNN to understand its general framework. Like the neural networks introduced earlier, a CNN can be built by assembling layers like Lego bricks. However, CNNs introduce new convolutional layers (Convolution layers) and pooling layers (Pooling layers).

In the neural networks introduced earlier, all neurons in adjacent layers are connected; this is called fully connected.
The fully connected layer is implemented as an Affine layer. Using Affine layers, a 5-layer fully connected neural network can be built with the structure shown in the figure below.
[Figure: a 5-layer fully connected neural network built from Affine layers]
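As a rough sketch of what such an Affine-based network computes (the layer widths, random weights, and input size here are illustrative assumptions, not from the article):

```python
import numpy as np

# A 5-layer fully connected forward pass: Affine (x @ W + b) followed by
# ReLU, with no activation after the last Affine (Softmax would follow).
def affine(x, W, b):
    return x @ W + b

def relu(x):
    return np.maximum(0, x)

x = np.random.randn(1, 784)             # e.g. a flattened 28x28 image
sizes = [784, 100, 100, 100, 100, 10]   # assumed widths of the 5 Affine layers
for i in range(len(sizes) - 1):
    W = np.random.randn(sizes[i], sizes[i + 1]) * 0.01
    b = np.zeros(sizes[i + 1])
    x = affine(x, W, b)
    if i < len(sizes) - 2:              # ReLU between Affine layers
        x = relu(x)
print(x.shape)                          # (1, 10)
```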

In a CNN, Convolution layers and Pooling layers are added to this structure.
[Figure: network structure of a CNN with Convolution and Pooling layers]
The layers in a CNN are connected in the order "Convolution - ReLU - (Pooling)" (the Pooling layer is sometimes omitted). This can be understood as the previous "Affine - ReLU" connection being replaced by a "Convolution - ReLU - (Pooling)" connection.

Also note that in the CNN above, the previous "Affine - ReLU" combination is used in the layers close to the output, and the previous "Affine - Softmax" combination is used in the final output layer. These are common structures in general CNNs.
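As a sketch of this ordering (layer names only; these are illustrative stand-ins, not a particular library's classes):

```python
# Hypothetical layer ordering of the CNN described above.
cnn_layers = [
    "Conv", "ReLU", "Pooling",
    "Conv", "ReLU", "Pooling",
    "Conv", "ReLU",
    "Affine", "ReLU",        # Affine-ReLU combination near the output
    "Affine", "Softmax",     # Affine-Softmax in the final output layer
]
```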

Problems in the fully connected layer
The fully connected layer ignores the shape of the data: all input data is treated as equivalent neurons (neurons of the same dimension), so shape-related information cannot be used.
Convolutional layers, on the other hand, preserve the shape. When the input data is an image, the convolutional layer receives the input as 3-dimensional data and also outputs 3-dimensional data to the next layer. Therefore, a CNN can (potentially) correctly understand data that has a shape, such as images.
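To illustrate the difference in shape handling (a tiny sketch with assumed sizes):

```python
import numpy as np

# A fully connected layer needs a flat vector, discarding spatial structure,
# while a convolutional layer keeps the (channel, height, width) shape.
img = np.random.randn(3, 28, 28)   # a 3-channel 28x28 image
flat = img.reshape(-1)             # fully connected input: shape info lost
print(img.shape, flat.shape)       # (3, 28, 28) (2352,)
```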

In addition, in a CNN the input and output data of a convolutional layer are sometimes called feature maps: the input data of the convolutional layer is called the input feature map, and the output data is called the output feature map.

Convolution operation
The processing performed by the convolution layer is the convolution operation. The convolution operation is equivalent to the "filter operation" in image processing.
[Figure: example of a convolution operation]
[Figure: symbolic representation of a convolution operation]

Now let's explain what calculation is performed in the convolution example in the figure.
The convolution operation slides the filter window over the input data at a fixed interval. At each position, the elements of the filter are multiplied by the corresponding elements of the input and then summed (this is sometimes called a multiply-accumulate operation), and the result is stored at the corresponding location of the output. Performing this process at all positions yields the output of the convolution operation.
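The following is a minimal NumPy sketch of this multiply-accumulate process, assuming stride 1 and no padding; the input and filter values are illustrative:

```python
import numpy as np

# Slide the filter window over the input; at each position, multiply
# element-wise and sum (multiply-accumulate), storing the scalar result.
def conv2d(x, w):
    H, W = x.shape
    FH, FW = w.shape
    OH, OW = H - FH + 1, W - FW + 1
    out = np.zeros((OH, OW))
    for i in range(OH):
        for j in range(OW):
            out[i, j] = np.sum(x[i:i+FH, j:j+FW] * w)
    return out

x = np.array([[1, 2, 3, 0],
              [0, 1, 2, 3],
              [3, 0, 1, 2],
              [2, 3, 0, 1]], dtype=float)   # (4, 4) input
w = np.array([[2, 0, 1],
              [0, 1, 2],
              [1, 0, 2]], dtype=float)      # (3, 3) filter
print(conv2d(x, w))                         # (2, 2) output: [[15. 16.] [6. 15.]]
```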


In a fully connected neural network, there are bias parameters in addition to weight parameters. In a CNN, the parameters of the filters correspond to the previous weights, and CNNs also have biases. The convolution operation including the bias is processed as follows:
[Figure: processing of the convolution operation including a bias]
The bias is added to every element of the output.
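A tiny sketch of the bias step, reusing an illustrative (2, 2) convolution output:

```python
import numpy as np

# A single bias value per filter is broadcast to every output element.
out = np.array([[15., 16.],
                [6., 15.]])   # illustrative convolution output
b = 3.0
print(out + b)                # [[18. 19.]
                              #  [ 9. 18.]]
```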

Padding
Before the convolutional layer processes the input, it is sometimes necessary to fill fixed values (such as 0) around the input data. This is called padding, and it is frequently used in convolution operations.

[Figure: padding applied in a convolution operation]
In the figure above, padding of width 1 (filled with 0s) is applied to input data of size (4, 4).

Padding is used mainly to adjust the size of the output.
If every convolution operation shrank the spatial size, the output would at some point become 1, and no further convolutions could be applied. Padding is used to avoid this. In the example above, with a padding width of 1, the output size stays at (4, 4) for an input of size (4, 4), so the convolution can pass data to the next layer while keeping the spatial size unchanged.
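For example, NumPy's np.pad can apply this kind of zero padding (a sketch with illustrative values):

```python
import numpy as np

# Zero padding of width 1 around a (4, 4) input gives a (6, 6) array;
# a (3, 3) filter at stride 1 then yields a (4, 4) output again.
x = np.arange(16, dtype=float).reshape(4, 4)
x_pad = np.pad(x, pad_width=1, mode='constant', constant_values=0)
print(x_pad.shape)   # (6, 6)
```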

Stride
The interval between positions where the filter is applied is called the stride. In the previous examples the stride was always 1; if the stride is set to 2, the filter window moves 2 elements at a time.
[Figure: example of a convolution operation with a stride of 2]

In summary, increasing the stride makes the output smaller, while increasing the padding makes the output larger.
Now suppose the input size is (H, W), the filter size is (FH, FW), the output size is (OH, OW), the padding is P, and the stride is S. The output size can then be computed with the following formulas:
$$OH = \frac{H + 2P - FH}{S} + 1$$
$$OW = \frac{W + 2P - FW}{S} + 1$$
Note that although the output size can be computed simply by substituting values into these formulas, the chosen values must make both expressions divide evenly. When the output size is not divisible (that is, the result is fractional), a countermeasure such as raising an error is needed. Some frameworks, however, round to the nearest integer and continue without raising an error when the value does not divide evenly.
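A small sketch of this formula, raising an error when the sizes do not divide evenly (rounding, as some frameworks do, would be the alternative):

```python
def conv_output_size(H, W, FH, FW, P=0, S=1):
    # Raise an error when the formula does not divide evenly; some
    # frameworks would instead round to the nearest integer.
    if (H + 2 * P - FH) % S != 0 or (W + 2 * P - FW) % S != 0:
        raise ValueError("output size is not an integer")
    OH = (H + 2 * P - FH) // S + 1
    OW = (W + 2 * P - FW) // S + 1
    return OH, OW

print(conv_output_size(4, 4, 3, 3, P=1, S=1))   # (4, 4)
print(conv_output_size(7, 7, 3, 3, P=0, S=2))   # (3, 3)
```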

Convolution operation on 3-dimensional data
When there are multiple feature maps in the channel direction, the convolution between the input data and the filter is performed channel by channel, and the results are summed to produce the output.
[Figure: convolution operation on 3-dimensional data]

In the convolution operation on 3-dimensional data, the number of channels of the input data and of the filter must be set to the same value. The filter's height and width, however, can be set to any value.

The calculation process is as follows:
[Figure: calculation process of the convolution operation on 3-dimensional data]
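A minimal sketch of this channel-wise computation (shapes and values are illustrative assumptions):

```python
import numpy as np

# Channel-wise multiply-accumulate: each channel of the input is convolved
# with the matching channel of the filter and the results are summed,
# producing a single 2D output feature map.
def conv3d(x, w):
    C, H, W = x.shape        # input:  (channel, height, width)
    C2, FH, FW = w.shape     # filter: channel count must match the input
    assert C == C2
    OH, OW = H - FH + 1, W - FW + 1
    out = np.zeros((OH, OW))
    for i in range(OH):
        for j in range(OW):
            out[i, j] = np.sum(x[:, i:i+FH, j:j+FW] * w)
    return out

x = np.random.randn(3, 4, 4)   # 3-channel (4, 4) input
w = np.random.randn(3, 3, 3)   # one (3, 3) filter slice per channel
print(conv3d(x, w).shape)      # (2, 2)
```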

Batch processing
In neural network processing, input data is packed together and processed as a batch. The implementation of the fully connected neural network introduced earlier also supports batch processing. Batch processing makes the computation efficient and corresponds to the mini-batches used during learning.

To make the convolution operation support batch processing as well, the data passed between layers is stored as 4-dimensional data. Specifically, the data is stored in the order (batch_num, channel, height, width).

As shown in the figure below:
[Figure: data flow of the batch-processed convolution operation]
In the data flow of the batched version shown in the figure above, a batch dimension is added at the beginning of each piece of data, so the data passed between layers has a 4-dimensional shape. Note that as this 4-dimensional data flows through the network, the convolution operation is performed on all N pieces of data; in other words, batch processing aggregates N computations into a single pass.
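A minimal sketch of the batched shapes (all sizes here are illustrative assumptions):

```python
import numpy as np

# Batched convolution shapes: N inputs of (C, H, W) and FN filters of
# (C, FH, FW) produce an output of (N, FN, OH, OW), all in one pass.
N, C, H, W = 2, 3, 8, 8
FN, FH, FW = 4, 3, 3
x = np.random.randn(N, C, H, W)      # (batch_num, channel, height, width)
w = np.random.randn(FN, C, FH, FW)   # (num_filters, channel, height, width)
b = np.zeros(FN)                     # one bias per filter

OH, OW = H - FH + 1, W - FW + 1      # stride 1, no padding
out = np.zeros((N, FN, OH, OW))
for n in range(N):
    for f in range(FN):
        for i in range(OH):
            for j in range(OW):
                out[n, f, i, j] = np.sum(x[n, :, i:i+FH, j:j+FW] * w[f]) + b[f]
print(out.shape)                     # (2, 4, 6, 6) -> 4D data to the next layer
```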


Reprinted from: blog.csdn.net/loyd3/article/details/131161629