Implementation of the convolution operation

This article draws on Why GEMM is at the heart of deep learning.

The full name of BLAS is Basic Linear Algebra Subprograms. It provides low-level, general-purpose linear algebra operations such as vector addition, scalar multiplication, dot products, and matrix multiplication. Implementations of BLAS differ across hardware platforms and typically exploit processor-specific features (such as vector registers and SIMD instruction sets) to accelerate computation, with C and Fortran language bindings.
Under the unified BLAS interface, vendors have developed acceleration libraries tailored to their own hardware, such as Intel's MKL, ATLAS, and OpenBLAS. All three can be configured and used in Caffe.
BLAS specifies the function gemm (GEMM: General Matrix-Matrix Multiplication) for matrix-matrix products and the function gemv for matrix-vector products. How efficiently these two operations are implemented largely determines the speed of an entire deep learning framework.
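
To make the semantics concrete, here is a minimal Python sketch of what gemm computes, namely C <- alpha*A*B + beta*C. This is only a reference loop nest to show the definition; real BLAS libraries obtain their speed from cache blocking, SIMD, and multithreading.

```python
import numpy as np

def gemm(alpha, A, B, beta, C):
    """Reference semantics of BLAS gemm: C <- alpha * A @ B + beta * C.

    A is (m, k), B is (k, n), C is (m, n). This triple loop only shows
    what is being computed, not how a tuned BLAS computes it.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2 and C.shape == (m, n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += A[i, p] * B[p, j]
            C[i, j] = alpha * acc + beta * C[i, j]
    return C

# With alpha=1 and beta=0 this is a plain matrix product.
m, k, n = 4, 5, 3
A, B = np.random.randn(m, k), np.random.randn(k, n)
C = gemm(1.0, A, B, 0.0, np.zeros((m, n)))
assert np.allclose(C, A @ B)
```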

As the profiling results show, during the forward pass most of the time is spent in the convolutional and fully connected layers, on both CPU and GPU.

fully connected layer

In a fully connected layer, each output neuron computes a weighted sum over all of its inputs. The process is shown in the following figure:

Here k is the dimension of the input and n is the number of neurons in the fully connected layer. Each neuron multiplies the input by its weights and sums the products to produce one output, so n neurons produce n outputs. The whole layer is therefore a matrix-vector product (gemv) for a single input, and becomes a single GEMM when a batch of inputs is processed together.
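
The following NumPy sketch (with illustrative shapes and names) shows this view: one input gives a matrix-vector product, and a batch of inputs turns the layer into one GEMM.

```python
import numpy as np

k, n = 256, 128             # input dimension, number of neurons
W = np.random.randn(n, k)   # one row of weights per neuron
b = np.random.randn(n)      # one bias per neuron

x = np.random.randn(k)      # single input: a gemv
y = W @ x + b               # shape (n,), one output per neuron

X = np.random.randn(32, k)  # batch of 32 inputs: a single GEMM
Y = X @ W.T + b             # shape (32, n)
```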

convolutional layer

When a convolution kernel is applied to a multi-channel feature map or input image, the effect is shown in the figure:
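Before moving to the GEMM formulation, a naive direct implementation may help fix the indexing. This is only a sketch, assuming stride 1 and no padding; the shapes and names are illustrative.

```python
import numpy as np

def conv2d_direct(x, w):
    """Naive direct convolution, stride 1, no padding.

    x: input feature map, shape (C_in, H, W)
    w: kernels, shape (C_out, C_in, KH, KW)
    returns: output feature maps, shape (C_out, H-KH+1, W-KW+1)

    Each output value is the weighted sum over a small 3D cube of the
    input: all C_in channels of one KH x KW window.
    """
    C_in, H, W = x.shape
    C_out, _, KH, KW = w.shape
    out = np.zeros((C_out, H - KH + 1, W - KW + 1))
    for o in range(C_out):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[o, i, j] = np.sum(x[:, i:i+KH, j:j+KW] * w[o])
    return out
```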

Implementation process

The first step is to convert the input image or feature map from a 3D array into a 2D matrix. Every location where the convolution is applied covers a small 3D cube of the input, so the values in each cube are straightened out into one row of the matrix. This transformation is called im2col, as shown here:
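
Here is a minimal im2col sketch under the same assumptions as before (stride 1, no padding); each output location contributes one row of length C_in x KH x KW.

```python
import numpy as np

def im2col(x, KH, KW):
    """Flatten each KH x KW window (across all channels) into one row.

    x: input feature map, shape (C_in, H, W)
    returns: matrix of shape (OH*OW, C_in*KH*KW), where
             OH = H-KH+1 and OW = W-KW+1 (stride 1, no padding).
    """
    C_in, H, W = x.shape
    OH, OW = H - KH + 1, W - KW + 1
    cols = np.empty((OH * OW, C_in * KH * KW))
    for i in range(OH):
        for j in range(OW):
            # the small 3D cube at this location, straightened into a row
            cols[i * OW + j] = x[:, i:i+KH, j:j+KW].ravel()
    return cols
```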

To realize the convolution as a matrix product, the convolution kernels must also be reshaped: each kernel is stretched into one column of the kernel matrix. The GEMM then proceeds as follows:

For example, if the convolution kernel size is 3x3 and the input feature map has 16 channels, then k = 3x3x16 = 144. The number of feature maps produced by the convolution equals the number of convolution kernels, which is the number of columns of the kernel matrix.
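
Putting the pieces together, the sketch below expresses the convolution as a single GEMM, reusing the im2col and conv2d_direct functions from the sketches above. For the 3x3 kernel with 16 input channels, each row of the data matrix and each column of the kernel matrix has k = 3x3x16 = 144 entries.

```python
import numpy as np

def conv2d_gemm(x, w):
    """Convolution expressed as one matrix product (stride 1, no padding).

    x: (C_in, H, W), w: (C_out, C_in, KH, KW)
    With a 3x3 kernel and 16 input channels, k = 3*3*16 = 144.
    """
    C_out, C_in, KH, KW = w.shape
    OH, OW = x.shape[1] - KH + 1, x.shape[2] - KW + 1
    cols = im2col(x, KH, KW)             # (OH*OW, k) data matrix
    kernels = w.reshape(C_out, -1).T     # (k, C_out): one kernel per column
    out = cols @ kernels                 # the GEMM: (OH*OW, C_out)
    return out.T.reshape(C_out, OH, OW)  # back to feature-map layout

# Sanity check against the naive direct convolution (illustrative shapes)
x = np.random.randn(16, 8, 8)
w = np.random.randn(4, 16, 3, 3)
assert np.allclose(conv2d_gemm(x, w), conv2d_direct(x, w))
```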
