[Deep Learning] 6-1 Convolutional Neural Network - Convolutional Layer

Convolutional Neural Networks (CNNs) are used in many settings, such as image recognition and speech recognition. In image recognition competitions, almost all deep-learning-based methods are built on CNNs.

First, let's look at the network structure of a CNN to understand its general framework. Like the neural networks introduced earlier, a CNN can be built by assembling layers like Lego bricks. What is new are the convolutional layer (Convolution layer) and the pooling layer (Pooling layer).

In the neural networks introduced earlier, all neurons in adjacent layers are connected; this is called fully connected.
The fully connected layer is implemented with the Affine layer. Using this Affine layer, a 5-layer fully connected neural network can be realized with the network structure shown in the figure below.
[Figure: a fully connected network built by stacking Affine and ReLU layers]

In a CNN, Convolution layers and Pooling layers are added.
[Figure: a CNN assembled from Convolution, ReLU, and Pooling layers]
The layers in a CNN are connected in the order "Convolution - ReLU - (Pooling)" (the Pooling layer is sometimes omitted). This can be understood as the earlier "Affine - ReLU" connection being replaced by a "Convolution - ReLU - (Pooling)" connection.

Also note that in the CNN above, the earlier "Affine - ReLU" combination is still used in the layers close to the output, and the earlier "Affine - Softmax" combination is used in the final output layer. These are common structures in general CNNs.
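As a rough sketch in Python, the ordering might look like the list below (the names are just illustrative strings, not actual layer classes from the text):

```python
# Illustrative layer ordering for the CNN described above.
cnn_layers = [
    "Conv", "ReLU", "Pool",    # the new "Convolution - ReLU - (Pooling)" block
    "Conv", "ReLU",            # Pooling is sometimes omitted
    "Affine", "ReLU",          # the familiar combination near the output
    "Affine", "Softmax",       # the final output layer
]
```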

Problems with the fully connected layer
The fully connected layer ignores the shape of the data and treats all inputs as equivalent neurons (neurons of the same dimensionality), so it cannot make use of shape-related information.
Convolutional layers, on the other hand, preserve the shape of the data. When the input is an image, the convolutional layer receives the input as 3D data and likewise outputs 3D data to the next layer. Therefore, a CNN can (potentially) correctly understand data that has a shape, such as images.

In addition, in a CNN the input and output data of a convolutional layer are sometimes called feature maps: the input data of the convolutional layer is called the input feature map, and the output data is called the output feature map.

Convolution operation
The processing performed by the convolution layer is the convolution operation. The convolution operation is equivalent to the "filter operation" in image processing.
[Figure: example of a convolution operation]
[Figure: symbolic representation of a convolution operation]

Now let's explain the calculation performed in the convolution example in the figure.
For the input data, the convolution operation slides the filter window over the input at a fixed interval. At each position, the elements of the filter are multiplied by the corresponding elements of the input and then summed (this is sometimes called a multiply-accumulate operation). The result is stored at the corresponding location of the output. Performing this process at all positions yields the output of the convolution operation.
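A minimal NumPy sketch of this multiply-accumulate process (stride 1, no padding; conv2d_single is a name made up here for illustration, and the input/filter values are just sample numbers):

```python
import numpy as np

def conv2d_single(x, w):
    """Slide filter w over input x with stride 1, no padding."""
    H, W = x.shape
    FH, FW = w.shape
    OH, OW = H - FH + 1, W - FW + 1
    out = np.zeros((OH, OW))
    for i in range(OH):
        for j in range(OW):
            # Multiply-accumulate: element-wise product of the current
            # window and the filter, then sum.
            out[i, j] = np.sum(x[i:i+FH, j:j+FW] * w)
    return out

x = np.array([[1, 2, 3, 0],
              [0, 1, 2, 3],
              [3, 0, 1, 2],
              [2, 3, 0, 1]], dtype=float)
w = np.array([[2, 0, 1],
              [0, 1, 2],
              [1, 0, 2]], dtype=float)
print(conv2d_single(x, w))   # (4,4) input, (3,3) filter -> (2,2) output:
                             # [[15. 16.]
                             #  [ 6. 15.]]
```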

In a fully connected neural network there are biases in addition to the weight parameters. In a CNN, the parameters of the filters correspond to the earlier weights, and CNNs also have biases. The convolution operation including the bias proceeds as follows:
[Figure: the convolution operation with a bias term]
The bias is added to all elements of the output.
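Continuing the sketch above, the bias is a single scalar per filter and is broadcast to every element of the output (the value 3 is arbitrary):

```python
b = 3.0                          # one bias value for the filter (arbitrary)
out = conv2d_single(x, w) + b    # broadcasting adds b to every output element
print(out)                       # [[18. 19.]
                                 #  [ 9. 18.]]
```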

Padding
Before the processing of the convolutional layer, fixed values (such as 0) are sometimes filled in around the input data. This is called padding, and it is often used in convolution operations.

[Figure: padding applied around the input data]
In the figure above, padding of width 1 filled with the value 0 is applied to input data of size (4, 4).

Padding is used primarily to adjust the size of the output.
If the output shrinks every time a convolution is performed, at some point the output size may become 1, making it impossible to apply further convolutions. Padding is used to avoid this situation. In the example above, setting the padding to 1 keeps the output size at (4, 4) for an input of size (4, 4). The convolution operation can therefore pass data to the next layer while keeping the spatial size constant.
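A small illustration with NumPy's np.pad, continuing the same 4x4 input and 3x3 filter as above: with a padding of 1, the output size stays (4, 4).

```python
# Pad the 4x4 input with one ring of zeros on every side.
x_padded = np.pad(x, pad_width=1, mode='constant', constant_values=0)
print(x_padded.shape)                    # (6, 6)
print(conv2d_single(x_padded, w).shape)  # (4, 4): same spatial size as the input
```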

Stride
The interval at which the filter is applied is called the stride. In the previous example the stride was 1; if the stride is set to 2, the filter window moves 2 elements at a time.
[Figure: example of a convolution operation with a stride of 2]

In summary, increasing the stride makes the output smaller, while increasing the padding makes the output larger.
Assume the input size is (H, W), the filter size is (FH, FW), the output size is (OH, OW), the padding is P, and the stride is S. The output size can then be calculated by the following formulas:

OH = (H + 2P - FH) / S + 1
OW = (W + 2P - FW) / S + 1

Note that although the output size can be calculated simply by substituting values, the chosen values must make both formulas divide evenly. When the output size is not an integer (the result is a fraction), a countermeasure such as reporting an error is needed. That said, in some frameworks, when the value does not divide evenly, the result is rounded and processing continues without an error.
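The formulas translate directly into code. Here is a small helper (the name is made up here) that reports an error when the division is not exact, one of the countermeasures mentioned above:

```python
def conv_output_size(H, W, FH, FW, P=0, S=1):
    """Output size of a convolution; raises if the result is not an integer."""
    if (H + 2 * P - FH) % S != 0 or (W + 2 * P - FW) % S != 0:
        raise ValueError("output size is not an integer; adjust padding or stride")
    OH = (H + 2 * P - FH) // S + 1
    OW = (W + 2 * P - FW) // S + 1
    return OH, OW

print(conv_output_size(4, 4, 3, 3, P=1, S=1))  # (4, 4): padding keeps the size
print(conv_output_size(7, 7, 3, 3, P=0, S=2))  # (3, 3): a larger stride shrinks it
```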

Convolution operation on 3D data
When there are multiple feature maps along the channel direction, the convolution of the input data and the filter is performed channel by channel, and the per-channel results are added to obtain the output.
[Figure: convolution operation on 3D data, summing over channels]

In the convolution operation on 3D data, the number of channels of the input data and of the filter must be set to the same value. The spatial size of the filter, however, can be set to any value.

The calculation process is as follows:
[Figure: the calculation process of the 3D convolution operation]
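Extending the conv2d_single sketch above to multiple channels: each channel of the input is convolved with the matching channel of the filter, and the C per-channel results are summed into a single output map (again, the function name is illustrative):

```python
def conv3d_channels(x, w):
    """x: (C, H, W) input, w: (C, FH, FW) filter; channel counts must match."""
    assert x.shape[0] == w.shape[0], "input and filter need the same channel count"
    # Per-channel 2D convolution, then element-wise sum across channels.
    return sum(conv2d_single(x[c], w[c]) for c in range(x.shape[0]))

x3 = np.random.randn(3, 4, 4)         # 3-channel 4x4 input
w3 = np.random.randn(3, 3, 3)         # 3-channel 3x3 filter
print(conv3d_channels(x3, w3).shape)  # (2, 2): a single output feature map
```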

Batch processing
In neural network processing, input data are packed together for batch processing. The earlier implementation of the fully connected neural network also supported batch processing. Batch processing makes the computation efficient and supports mini-batches during learning.

For the convolution operation to support batch processing as well, the data passed between layers is stored as 4-dimensional data. Specifically, the data is stored in the order (batch_num, channel, height, width).

As shown in the figure below:
[Figure: data flow of the batch-based convolution operation]
In the batch version of the data flow, a batch dimension is added at the front of each piece of data, so data is passed between layers as a 4-dimensional shape. Note that as this 4-dimensional data flows through the network, the convolution operation is performed on all N pieces of data. In other words, batch processing collapses N separate operations into one.
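A sketch of the batched data flow, looping conv3d_channels over the batch axis. To keep the output 4-dimensional as described, a stack of FN filters is assumed here (the number of filters is not discussed in this section; it is an assumption for illustration):

```python
def conv_batch(x, w):
    """x: (N, C, H, W) batch; w: (FN, C, FH, FW) stack of filters (assumed)."""
    return np.stack([
        np.stack([conv3d_channels(xn, wf) for wf in w])  # one map per filter
        for xn in x                                      # one pass per batch item
    ])

x4 = np.random.randn(10, 3, 7, 7)  # batch of 10 three-channel 7x7 inputs
w4 = np.random.randn(5, 3, 3, 3)   # 5 filters, matching channel count
print(conv_batch(x4, w4).shape)    # (10, 5, 5, 5): (batch_num, channel, H, W)
```

The Python loops are only for readability; real implementations avoid them for speed, but this form mirrors the N-at-a-time processing described above.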
