Convolutional Neural Networks (1): Architecture

Convolutional Neural Networks (CNN) also use a multi-layer structure, but the structure is very different from that of the Feedforward Neural Network. It is organized as follows:

Input layer: for a single image, the input data is 3D (Width*Length*Depth); see the stereogram below. If we train the network with mini-batches, the input becomes 4D data (Width*Length*Depth*Batch_size).
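As a concrete sketch of these shapes, using NumPy arrays (the batch size of 16 here is chosen arbitrarily for illustration):

```python
import numpy as np

# A single 32x32 RGB image: 3D data (Width * Length * Depth)
image = np.zeros((32, 32, 3))
print(image.shape)  # (32, 32, 3)

# A mini-batch of 16 such images: 4D data (Width * Length * Depth * Batch_size)
batch = np.zeros((32, 32, 3, 16))
print(batch.shape)  # (32, 32, 3, 16)
```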

 

Feature-extraction layers: Convolution layers (+ReLU) and Pooling layers appear in pairs for higher-order mapping and feature extraction. As shown in the figure below, for a single image (3D data), a Filter (also called a Kernel, which can be regarded as a sliding window, generally smaller than the data) scans the image and performs convolution. Normally, going from the input volume to the Convolutional Layer, the Width and Length of the data shrink slightly, while the third dimension, Depth, expands because there are multiple Kernels. In the intermediate stage shown, the depth becomes 6 after the Filters, which indicates there are 6 Filters, extending the original Depth=3 to 6. The pooling layer then compresses Width and Length further; the Average or Maximum of each window is generally used for this downsampling.
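One convolution + ReLU + pooling stage can be sketched in plain NumPy. This is a minimal single-channel illustration, assuming a 5*5 kernel, stride 1, no padding, and 2*2 max pooling; a real layer would apply many kernels and sum over input channels:

```python
import numpy as np

def conv2d_single(x, kernel):
    """Valid 2D convolution of one channel plane with one kernel (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Slide the window over the image and take the elementwise product-sum
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling over size x size windows."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

x = np.random.rand(32, 32)                 # one 32x32 input plane
k = np.random.rand(5, 5)                   # one 5x5 kernel
fmap = np.maximum(conv2d_single(x, k), 0)  # convolution followed by ReLU
pooled = max_pool(fmap)                    # 2x2 max pooling halves Width and Length
print(fmap.shape, pooled.shape)  # (28, 28) (14, 14)
```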

 

 

Classification layer: before the final output layer there are one or more Fully-connected layers, as needed. Their form is the same as the Hidden layers of a Feedforward Neural Network: each neuron is fully connected to the neurons of the previous Pooling layer, and fully connected to the Output nodes of the output layer.
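A sketch of one fully-connected layer on top of flattened pooling output. The random weights here stand in for learned parameters; the 16*5*5 input volume and 120 neurons match the worked example further below:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pooled feature volume: 16 maps of 5x5
pooled = rng.random((16, 5, 5))

# Flatten the volume into a vector, then apply one fully-connected layer
x = pooled.reshape(-1)             # 16*5*5 = 400 inputs
W = rng.random((120, x.size))      # every neuron connects to every input
b = rng.random(120)
h = np.maximum(W @ x + b, 0)       # ReLU activation
print(h.shape)  # (120,)
```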

Output layer: use Softmax to output probabilities, or other forms as required. If mini-batches are used, the output is 2D (Probabilities*Batch_size).
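Softmax turns the raw scores of the last layer into probabilities that sum to 1 per image. A minimal sketch, assuming 10 classes and a mini-batch of 4:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax along the class axis (axis 0)."""
    z = logits - logits.max(axis=0, keepdims=True)  # shift for stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

# Hypothetical scores for 10 classes over a mini-batch of 4 images
logits = np.random.randn(10, 4)
probs = softmax(logits)
print(probs.shape)        # (10, 4): Probabilities * Batch_size
print(probs.sum(axis=0))  # each column sums to 1
```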

 

The actual workflow can be illustrated with the following example:

a. The input is 32*32 (Depth=1).

b. Layer 1 has 6 kernels, giving 6 Activation Maps of 28*28 after convolution. 2*2 Pooling (Subsampling), which keeps one value out of every four, reduces this to 6*14*14.

c. Layer 2 has 16 kernels, giving 16 Activation Maps of 10*10 after convolution. 2*2 Pooling (Subsampling) again keeps one value out of every four, reducing this to 16*5*5.

d. C5 is a fully connected layer with 120 neurons, and F6 is the output layer.
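The shape bookkeeping in steps a–d can be checked with simple arithmetic. The 5*5 kernel size, stride 1, no padding, and 2*2 subsampling are assumptions, but they are consistent with the 32 → 28 → 14 → 10 → 5 progression above:

```python
def conv_out(size, kernel=5, stride=1, pad=0):
    """Spatial size after a valid convolution."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, window=2):
    """Spatial size after non-overlapping pooling."""
    return size // window

s = conv_out(32)  # Layer 1 convolution: 32 -> 28
s = pool_out(s)   # Subsample: 28 -> 14
s = conv_out(s)   # Layer 2 convolution: 14 -> 10
s = pool_out(s)   # Subsample: 10 -> 5
print(s)  # 5
```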
