nndl-book-Notes-Basic Model Chapter 5-Convolutional Neural Network

Chapter 5 Convolutional Neural Networks

A Convolutional Neural Network (CNN or ConvNet) is a deep feed-forward neural network characterized by local connections, weight sharing, and related properties.

    1) CNNs were originally used mainly to process image information. Using full connection to process images has two drawbacks: far too many parameters, and an inability to exploit local invariance in images (which is what convolution operations address).

    2) A convolutional neural network is generally a feed-forward network formed by stacking convolutional layers, pooling layers, and fully connected layers, trained with the backpropagation algorithm (to update the parameters).

    3) CNN characteristics: local connections; weight sharing (a group of connections shares the same parameter); pooling.

    4) CNN applications: various image and video analysis tasks such as image classification, face recognition, object recognition, and image segmentation; in recent years CNNs have also been applied to natural language processing, recommendation systems, and other fields.

  5.1 Convolution

    1) One-dimensional convolution: one-dimensional convolution is often used in signal processing to compute the delayed accumulation of a signal, y_t = Σ_{k=1..m} w_k · x_{t−k+1}, where w is the filter (convolution kernel).
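         A minimal sketch of this delayed accumulation, using NumPy and made-up signal and filter values (not taken from the book):

            import numpy as np

            x = np.array([1, 2, 3, 4, 5], dtype=float)   # input signal x_1..x_5
            w = np.array([0.5, 0.3, 0.2])                # filter w_1..w_3 (decay weights)

            # Direct implementation of y_t = sum_k w_k * x_{t-k+1} at the "valid" positions.
            def conv1d(x, w):
                m = len(w)
                return np.array([np.sum(w * x[t - m + 1:t + 1][::-1])
                                 for t in range(m - 1, len(x))])

            print(conv1d(x, w))                      # [2.3 3.3 4.3]
            print(np.convolve(x, w, mode="valid"))   # NumPy gives the same result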

                 

    2) Two-dimensional convolution: convolution is similar to the cross-correlation operation, except that convolution first flips the convolution kernel by 180° and then computes the sliding product-and-sum. Different choices of convolution kernel lead to different results and different effects. The result obtained by convolving an image is called a feature map.
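         A minimal sketch of two-dimensional convolution in NumPy (the toy image and kernel values are made up, not from the book): the kernel is flipped 180° and then a sliding product-sum produces the feature map.

            import numpy as np

            def conv2d(X, W):
                """Plain 2-D convolution: flip W by 180 degrees, then slide and sum products."""
                Wf = np.flip(W)                      # 180-degree flip (both axes)
                M, N = X.shape
                m, n = Wf.shape
                Y = np.zeros((M - m + 1, N - n + 1))
                for i in range(Y.shape[0]):
                    for j in range(Y.shape[1]):
                        Y[i, j] = np.sum(X[i:i + m, j:j + n] * Wf)
                return Y

            X = np.arange(25, dtype=float).reshape(5, 5)   # a toy 5x5 "image"
            W = np.array([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]])                   # a toy 3x3 kernel
            print(conv2d(X, W))                            # the resulting 3x3 feature map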

                  

    5.1.1 Cross-correlation

    1) In practical implementations, convolution is generally replaced by the cross-correlation operation, which avoids an unnecessary step and its overhead (flipping the convolution kernel).

    2) The "convolution" operation in many deep learning tools is actually cross-correlation. It can be written as: Y = W ⊗ X
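         A small numerical check of this equivalence, assuming SciPy is available and using random toy arrays: cross-correlating with W gives the same result as truly convolving with the 180°-flipped kernel.

            import numpy as np
            from scipy.signal import convolve2d, correlate2d

            X = np.random.rand(5, 5)
            W = np.random.rand(3, 3)

            corr = correlate2d(X, W, mode="valid")           # what most frameworks call "convolution"
            conv = convolve2d(X, np.flip(W), mode="valid")   # true convolution with the flipped kernel
            print(np.allclose(corr, conv))                   # True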

    5.1.2 Variations of convolution

    1) On top of plain convolution, a sliding stride for the filter and zero padding are introduced, which increases the variety of convolutions and makes feature extraction more flexible.

    2) Assume the convolutional layer has n input neurons, the convolution kernel has size m, the stride is s, and p zeros are padded at each end of the input. Then the number of neurons in the convolutional layer is (n − m + 2p)/s + 1.

         Here n + 2p is the total length after padding; subtracting m gives the total distance the kernel can slide; dividing by s gives the number of slides; and +1 counts the starting position.

         Therefore p is usually chosen so that (n + 2p − m)/s + 1 is an integer.
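         A small helper for this formula (a hypothetical function, not from the book), with a few example sizes:

            def conv_output_size(n, m, s=1, p=0):
                length = n + 2 * p - m            # total distance the kernel can slide
                assert length % s == 0, "choose p so that (n + 2p - m) is divisible by s"
                return length // s + 1            # +1 counts the starting position

            print(conv_output_size(n=32, m=5, s=1, p=0))   # 28
            print(conv_output_size(n=32, m=3, s=1, p=1))   # 32  (equal-width convolution)
            print(conv_output_size(n=32, m=4, s=2, p=1))   # 16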

    3) There are three commonly used types of convolution: narrow convolution (p = 0), wide convolution (p = m − 1), and equal-width convolution (p = (m − 1)/2, so the output has the same length as the input); equal-width convolution is generally the default.

                 

    5.1.3 Mathematical properties of convolution

    1) In the case of wide convolution, that is, when p=m-1, it is commutative, x⊗y = y⊗x.
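         A quick numerical check of this commutativity, shown in one dimension for simplicity (NumPy, toy vectors; "full" mode is the wide convolution):

            import numpy as np

            x = np.array([1., 2., 3., 4.])
            y = np.array([0.5, -1., 2.])
            print(np.allclose(np.convolve(x, y, mode="full"),
                              np.convolve(y, x, mode="full")))   # True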

    2) Conclusions about derivatives of the convolution (the derivation takes some care): suppose Y = W ⊗ X, where X ∈ R^(M×N), W ∈ R^(m×n), and Y ∈ R^((M−m+1)×(N−n+1)). Then:

                   1. The partial derivative of f(Y ) with respect to W is the convolution of X and ∂f(Y )/∂Y,

                   2. The partial derivative of f(Y ) with respect to X is the wide convolution of W and ∂f(Y )/∂Y.

  5.2 Convolutional Neural Networks

                  Convolutional neural networks generally consist of convolutional layers, pooling layers, and fully connected layers.

    5.2.1 Replacing full connection with convolution

    1) Because fully connected layers introduce too many parameters, convolutional layers are used in place of fully connected layers.

         When convolution replaces full connection, the net input z(l) of layer l is the convolution of the activation a(l−1) of layer l − 1 with a filter w(l) ∈ R^m, i.e. z(l) = w(l) ⊗ a(l−1) + b(l),

              where the filter w(l) is a learnable weight vector and b(l) is a learnable bias. (The purpose is to reduce the number of parameters.)
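         A rough comparison of parameter counts with hypothetical layer sizes (the numbers are illustrative only): a fully connected layer needs one weight per connection, while a shared 1-D filter of size m needs only m weights plus a bias.

            n_prev, n_curr, m = 10_000, 9_998, 3

            fc_params   = n_prev * n_curr + n_curr   # one weight per connection + biases
            conv_params = m + 1                      # one shared filter w in R^m + one bias b

            print(fc_params)     # 99,989,998
            print(conv_params)   # 4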

                                      

    2) Characteristics of the convolution operation: local connection (each neuron in layer l is connected only to neurons within a local window of layer l − 1, not to all of them);

                                      weight sharing (the filter w(l), as a parameter, is the same for all neurons of layer l; that is, different neurons use the same weights).

    5.2.2 Convolution layer

    1) The function of the convolution layer is to extract the features of a local area, and different convolution kernels are equivalent to different feature extractors.

    2) Feature Map: a feature map is the feature extracted by convolving an image (or another feature map); each feature map can be regarded as one class of extracted image features. (In other words, can we say the feature map is simply the matrix produced by the convolution operation?) To improve the representational power of a convolutional network, different convolution kernels can be used in each layer to obtain different feature maps.

    3) The figure below shows the three-dimensional structure of a convolutional layer: the input feature maps form the input volume, and convolution (in place of full connection) is used to compute the output feature maps; a minimal sketch follows.
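         A minimal sketch of a convolutional layer with D input feature maps and P output feature maps, Y^p = f( Σ_d W^(p,d) ⊗ X^d + b^p ), written in NumPy with cross-correlation instead of flipped convolution (as most frameworks do) and toy random values:

            import numpy as np

            def conv_layer(X, W, b, f=lambda z: np.maximum(z, 0)):   # ReLU as the nonlinearity
                D, M, N = X.shape                 # D input feature maps of size M x N
                P, _, m, n = W.shape              # P filters, each of size D x m x n
                Y = np.zeros((P, M - m + 1, N - n + 1))
                for p in range(P):
                    for i in range(Y.shape[1]):
                        for j in range(Y.shape[2]):
                            Y[p, i, j] = np.sum(W[p] * X[:, i:i + m, j:j + n]) + b[p]
                return f(Y)

            X = np.random.rand(3, 8, 8)           # e.g. an 8x8 input with D = 3 channels
            W = np.random.randn(4, 3, 3, 3)       # P = 4 filters of size 3x3x3
            b = np.zeros(4)
            print(conv_layer(X, W, b).shape)      # (4, 6, 6): four 6x6 output feature maps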

        

                

    5.2.3 Pooling layer (convergence layer)

    1) A convolutional layer reduces the number of connections in the network, but the number of neurons in the feature map groups does not decrease much. A classifier cannot be attached directly after a convolutional layer because the dimensionality is still too high and overfitting is likely. Therefore a pooling layer (convergence layer) is used to reduce the feature dimensionality, which avoids overfitting and also reduces the number of parameters.

    2) Pooling operations: max pooling (take the largest value within a region); average pooling (represent a region by its average value).

    3) A typical pooling layer divides each feature map into non-overlapping 2×2 regions and downsamples with max pooling. A pooling layer can also be viewed as a special convolutional layer with kernel size m×m, stride s×s, and a max or mean function as the kernel. An overly large pooling region drastically reduces the number of neurons and causes excessive information loss.
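         A minimal sketch of 2×2 max pooling over non-overlapping regions (NumPy, assuming the feature map's height and width are divisible by 2):

            import numpy as np

            def max_pool_2x2(X):
                M, N = X.shape
                return X.reshape(M // 2, 2, N // 2, 2).max(axis=(1, 3))

            X = np.arange(16, dtype=float).reshape(4, 4)
            print(max_pool_2x2(X))
            # [[ 5.  7.]
            #  [13. 15.]]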

  5.2.4 Typical convolutional network structure

    1) Convolutional layer --> activation layer (the activation layer adds a nonlinear function after the convolution; otherwise stacking linear functions is still linear and cannot learn anything richer)

        (Convolutional layer --> activation layer) --> pooling layer (to shrink the feature maps produced by the convolution, commonly max pooling over 2×2 regions)

          The blocks above -----> fully connected layer (to complete the required task, e.g. producing the final class scores in a classification task)

          The final softmax is a normalization operation that maps these scores to a probability distribution; a sketch of the whole stack follows.
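         A sketch of this typical "conv -> ReLU -> pool, repeated, then fully connected -> softmax" structure, written with PyTorch (an assumption; the book does not prescribe a framework, and the channel sizes here are arbitrary):

            import torch
            import torch.nn as nn

            model = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),   # conv -> activation
                nn.MaxPool2d(2),                                         # 2x2 max pooling
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Flatten(),
                nn.Linear(32 * 7 * 7, 10),                               # fully connected layer
                nn.Softmax(dim=1),                                       # normalize to class probabilities
            )

            x = torch.randn(1, 1, 28, 28)       # e.g. one 28x28 grayscale image
            print(model(x).shape)               # torch.Size([1, 10])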

              

    2) Network designs tend toward smaller convolution kernels (such as 1×1 and 3×3) and deeper structures (e.g. more than 50 layers). As convolution has become more flexible (e.g. different strides), the role of the pooling layer has shrunk, so in currently popular convolutional networks the proportion of pooling layers is gradually decreasing, tending toward fully convolutional networks. (Kernel sizes are odd so that equal-width convolution is possible; greater depth allows higher-level features to be learned.)

  5.3 Parameter learning (for the derivation, read the book)

  5.4 Several Typical Convolutional Neural Networks

    5.4.1 LeNet-5: Handwritten Digit Recognition

                     https://blog.csdn.net/saw009/article/details/80590245 explains this very clearly.

         

When calculating the parameters, note where the ×6, ×6, ×3, ×1 factors come from; a small sketch follows.
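         A small hypothetical helper (not from the book) for counting the parameters of a convolutional layer: params = (kernel_height × kernel_width × input_maps + 1 bias) × output_maps. For LeNet-5's first layer C1 (5×5 kernels, 1 input map, 6 output maps) this gives 156; later layers such as C3 use a partial connection table, so their counts are smaller than this formula alone would suggest.

            def conv_params(k, in_maps, out_maps):
                return (k * k * in_maps + 1) * out_maps

            print(conv_params(5, 1, 6))    # 156 parameters in C1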

    5.4.2 AlexNet

    1) AlexNet was the first modern deep convolutional network model: it used GPUs for parallel training, ReLU as the nonlinear activation function, Dropout to prevent overfitting, and data augmentation to improve model accuracy.

    2) AlexNet structure: 5 convolutional layers, 3 pooling layers, and 3 fully connected layers (the last of which uses softmax for the output).

                      For an explanation of the structure, see: https://blog.csdn.net/forsch/article/details/84893277
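         To inspect an AlexNet-style structure concretely, one option (assuming torchvision is installed) is to print its reference implementation; note that torchvision's version is a close variant of the original, with slightly different channel sizes.

            from torchvision.models import alexnet
            print(alexnet())   # lists the convolutional, pooling, and fully connected layers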

           

    5.4.3 Inception network (as in the paper, not repeated)

    5.4.4 ResNet network (same as above)

  5.5 Other convolution methods

          Different convolution operations can be designed by varying the stride and zero padding.

    5.5.1 Transposed Convolution

    1) A convolution that maps from a low-dimensional space to a high-dimensional space is called a transposed convolution.
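         A minimal sketch of this low-to-high-dimensional mapping, assuming PyTorch: a transposed convolution with a 3×3 kernel, stride 1, and no padding grows each spatial side by kernel_size − 1, turning a 2×2 input into a 4×4 output.

            import torch
            import torch.nn as nn

            up = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=3)
            x = torch.randn(1, 1, 2, 2)
            print(up(x).shape)    # torch.Size([1, 1, 4, 4])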

               

    5.5.2 Dilated convolution (atrous convolution; used to enlarge the receptive field of the output)

           For a convolutional layer, the receptive field of an output unit can generally be enlarged in three ways: (1) increase the kernel size; (2) increase the number of layers; (3) apply a pooling operation before the convolution. The first two increase the number of parameters, while the third loses some information.

    1) Dilated convolution (atrous convolution) enlarges the receptive field of the output units without increasing the number of parameters.

    2) Dilated convolution effectively enlarges the kernel by inserting "holes" into it. If d − 1 holes are inserted between every two elements of the convolution kernel, the effective kernel size becomes
                         m′ = m + (m − 1) × (d − 1)
           where d is called the dilation rate; when d = 1 the kernel is an ordinary convolution kernel. See the dilated-convolution animation at https://nndl.github.io/v/cnn-conv-more
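         A minimal sketch of dilation on a toy 1-D kernel (NumPy): inserting d − 1 zeros between kernel elements yields the effective size m′ = m + (m − 1) × (d − 1) without adding any parameters.

            import numpy as np

            def dilate(w, d):
                m = len(w)
                out = np.zeros(m + (m - 1) * (d - 1))
                out[::d] = w                        # original weights, spaced d apart
                return out

            w = np.array([1., 2., 3.])              # m = 3
            print(dilate(w, d=2))                   # [1. 0. 2. 0. 3.]  -> effective size 5
            print(len(dilate(w, d=3)))              # 7 = 3 + (3 - 1) * (3 - 1)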
