[Andrew Ng CNN] Week 2: Deep Convolutional Networks

Deep convolutional networks and case studies

Classic networks

LeNet-5

1. Network structure
[Figure: LeNet-5 network structure]
2. Network pattern

From this network we can see the general characteristics of convolutional networks, as well as the construction pattern that is still commonly used today.

  • As the network gets deeper, the spatial size of the data (height × width) gradually shrinks, while the number of channels gradually increases.
  • Deep convolutional neural networks are often built in the pattern [ conv-...-conv-pool-conv-...-conv-pool-FC-FC ]: several convolutional layers followed by a pooling layer, repeated, and finally a few fully connected layers that produce the model's output (see the sketch after this list).
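
As a rough illustration of this pattern, here is a minimal LeNet-5-style sketch in Keras; the layer sizes and activations are assumptions chosen for illustration, not the paper's exact configuration.

```python
# A minimal LeNet-5-style network: conv-pool-conv-pool-FC-FC-output.
# Layer sizes and activations are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 1)),                      # 32x32 grayscale image
    layers.Conv2D(6, kernel_size=5, activation="relu"),   # 32x32x1 -> 28x28x6
    layers.AveragePooling2D(pool_size=2),                 # 28x28x6 -> 14x14x6
    layers.Conv2D(16, kernel_size=5, activation="relu"),  # 14x14x6 -> 10x10x16
    layers.AveragePooling2D(pool_size=2),                 # 10x10x16 -> 5x5x16
    layers.Flatten(),
    layers.Dense(120, activation="relu"),                 # fully connected
    layers.Dense(84, activation="relu"),                  # fully connected
    layers.Dense(10, activation="softmax"),               # output
])
model.summary()
```

Note how the spatial size shrinks (32 → 28 → 14 → 10 → 5) while the channel count grows (1 → 6 → 16), exactly the characteristic described above.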

AlexNet

1. Network structure
[Figure: AlexNet network structure]
2. Network pattern

AlexNet's construction philosophy is similar to LeNet's, but AlexNet is clearly more complex.

  • Compared with LeNet, AlexNet has far more parameters.
  • The original paper also adds LRN (Local Response Normalization) layers, but this technique is now considered to have little effect (a brief sketch follows this list).
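
For reference, here is a minimal sketch of applying LRN with TensorFlow's tf.nn.local_response_normalization; the tensor shape and hyperparameters are illustrative assumptions.

```python
# Local Response Normalization: each activation is rescaled using the
# activations of nearby channels at the same position (rarely used today).
import tensorflow as tf

x = tf.random.normal([1, 27, 27, 96])   # a batch of feature maps (NHWC)
y = tf.nn.local_response_normalization(
    x, depth_radius=5, bias=1.0, alpha=1.0, beta=0.5)
print(y.shape)   # (1, 27, 27, 96): same shape, normalized values
```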

VGG-16

1. Network structure

Compared with the network structures discussed earlier, this is a network where you only need to focus on building the convolutional layers.

[Figure: VGG-16 network structure]
2. Network pattern

  • The network is indeed very large, with about 138 million parameters in total.
  • However, the structure of VGG-16 is not complicated; it is very regular: several convolutional layers are followed by a pooling layer that compresses the image size.
  • The number of filters in the convolutional layers also follows a rule: from 64 to 128, then 256 and 512; the filter count doubles with each group of convolutional layers.
  • Because the network is organized as [several convolutions - pooling], each group of convolutions doubles the number of filters and each pooling halves the spatial size, so the rate at which the image shrinks and the rate at which the channels grow are both regular (see the sketch after this list).
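
To make this regularity concrete, here is a minimal truncated VGG-style sketch in Keras; only the first three blocks are shown, and the input size and block layout are illustrative assumptions rather than the full 16-layer configuration.

```python
# Truncated VGG-16-style stack: each block doubles the number of filters,
# each pooling layer halves the spatial size (illustrative, not the full net).
import tensorflow as tf
from tensorflow.keras import layers, models

def vgg_block(model, filters, convs):
    for _ in range(convs):
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
    model.add(layers.MaxPooling2D(2))   # halve height and width

model = models.Sequential([layers.Input(shape=(224, 224, 3))])
vgg_block(model, 64, 2)    # 224x224x3  -> 112x112x64
vgg_block(model, 128, 2)   # 112x112x64 -> 56x56x128
vgg_block(model, 256, 3)   # 56x56x128  -> 28x28x256
model.summary()
```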

Residual network

In practice, building a very deep neural network is not easy because of vanishing and exploding gradients. To address this, we introduce the [ skip connection ].

Skip connection: the activation value from one layer can be taken and fed directly into a later layer, even one much deeper in the network;
using skip connections we can build ResNets, which make very deep networks trainable.

Definition and structure of residual network

"Using the residual block can build a deep neural network, so ResNet is to quickly accumulate several residuals together to form a deep neural network"

1. Residual block
[Figure: residual block]
2. Residual network
[Figure: residual network built by stacking residual blocks]
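
As a concrete sketch of a residual block and a small stack of them in Keras: the filter counts and input shape are illustrative assumptions, and the input and output shapes are assumed to match, so no 1x1 projection on the shortcut is needed.

```python
# Residual block: the input a[l] skips ahead and is added to z[l+2]
# before the final ReLU, so a[l+2] = g(z[l+2] + a[l]).
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x                                        # a[l], carried by the skip connection
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.Activation("relu")(x)                    # a[l+1]
    x = layers.Conv2D(filters, 3, padding="same")(x)    # z[l+2]
    x = layers.Add()([x, shortcut])                     # z[l+2] + a[l]
    return layers.Activation("relu")(x)                 # a[l+2]

inputs = tf.keras.Input(shape=(56, 56, 64))
x = residual_block(inputs, 64)
x = residual_block(x, 64)        # stacking residual blocks forms a ResNet
model = tf.keras.Model(inputs, x)
model.summary()
```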
3. Advantages of the residual network

In the past, as plain deep neural networks got deeper, the large number of parameters made them bloated and hard to train with optimization algorithms.

The residual network addresses this problem: it lets the input activation X reach deeper layers directly, which mitigates the vanishing- and exploding-gradient problems of deep networks.


Why does the residual network have good performance?

1. The residual network can learn the identity function efficiently (see the derivation below).
[Figure: a residual block learning the identity function]
2. Beyond efficiency, the hidden units inside a residual block can also learn additional useful information, so its effect may be better than the plain [identity function].
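
A short sketch of the argument in the course's notation: if weight decay drives the extra layer's parameters toward zero, the residual block collapses to the identity.

```latex
a^{[l+2]} = g\left(z^{[l+2]} + a^{[l]}\right)
          = g\left(W^{[l+2]} a^{[l+1]} + b^{[l+2]} + a^{[l]}\right)

% if W^{[l+2]} \to 0 and b^{[l+2]} \to 0, then
a^{[l+2]} = g\left(a^{[l]}\right) = a^{[l]}
\quad\text{(since $g$ is ReLU and $a^{[l]} \ge 0$)}
```

So adding a residual block never makes it hard for the network to at least reproduce its input.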


1x1 convolution

1. How the computation works

  • For single-channel data, a 1x1 convolution simply multiplies each of the width × height data units by a single number, one by one.
    [Figure: 1x1 convolution on single-channel data]
  • For multi-channel data (width × height × channels), a 1x1 convolution uses a kernel of size 1 × 1 × channels. The convolution traverses the width × height positions one by one; at each position, the vector formed by the values across all channels is dotted with the kernel's weight vector.
    [Figure: 1x1 convolution on multi-channel data]

A 1x1 convolution is often called network in network, because each convolution kernel can be regarded as a neuron in a fully connected layer: at every position, the channel values of the input act as the neuron's inputs, the kernel's channel weights act as its weights, and the dot product produces one output. Changing the number of kernels is therefore equivalent to changing the number of neurons in this layer, i.e. the number of outputs.
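
A minimal numpy sketch of this per-position dot-product view (the array sizes are illustrative assumptions):

```python
# 1x1 convolution as a dot product over channels at every spatial position.
import numpy as np

h, w, c_in, c_out = 4, 4, 32, 8
x = np.random.randn(h, w, c_in)           # input feature map
kernels = np.random.randn(c_in, c_out)    # c_out kernels, each of size 1x1xc_in

# For every position (i, j), dot the channel vector x[i, j, :] with each kernel.
out = np.einsum("hwc,ck->hwk", x, kernels)
print(out.shape)   # (4, 4, 8): spatial size unchanged, channel count changed
```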

2. What the computation means
In short, by introducing 1x1 convolutions into a neural network, the depth (number of channels) of the data can easily be changed.

From the earlier discussion of convolution, we already know that the width and height of the input data can easily be adjusted by convolution and pooling; now we also know that a 1x1 convolution can conveniently compress or expand the depth of the input data.

[Figure: using 1x1 convolution to change the number of channels]
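
A minimal Keras sketch of using a 1x1 convolution to compress the channel dimension (the shapes are illustrative assumptions):

```python
# Compress 192 channels down to 32 with a 1x1 convolution; height and width
# are left unchanged.
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal([1, 28, 28, 192])        # 28x28 feature map with 192 channels
compress = layers.Conv2D(32, kernel_size=1)   # 32 kernels of size 1x1x192
y = compress(x)
print(y.shape)   # (1, 28, 28, 32): same spatial size, fewer channels
```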


Transfer learning

With transfer learning, you can download open-source weights that someone else has already trained, use them to initialize your own neural network, and thereby transfer the knowledge learned on a public data set to your own problem.

Now suppose we need to solve a three-class classification problem on cat images:

  • When the data set we have is small:
    [Figure: with a small data set, freeze the pre-trained layers and train only the new output layer]

There is also a trick available at this step: because the network is fixed (frozen) from the input up to the final softmax layer, we can run every raw example through the frozen network in advance, take the activations just before the softmax layer as that example's feature vector, and store these vectors on disk;
during training, the feature vectors can then be read directly from disk and used as the input for learning the new softmax layer (see the sketch after this list).

  • When the data set you have is larger:

"A basic rule is-the more data you have, the fewer network layers you need to freeze, and the more network layers and parameters you can train."

[Figure: with a larger data set, freeze fewer layers and train more of the network]

  • When the data set is very large, we can use someone else's pre-trained network structure and parameters only as an initialization, and then retrain the entire network.
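
A minimal sketch of this idea in Keras, assuming a VGG16 base pre-trained on ImageNet and a new three-class softmax head; the input size, pooling choice, and which layers to unfreeze are illustrative assumptions.

```python
# Transfer learning sketch: start from pre-trained VGG16 weights, freeze the
# base, and train only a new 3-class softmax head (data loading omitted).
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                        # small data set: freeze every pre-trained layer

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),          # feature maps -> feature vector
    layers.Dense(3, activation="softmax"),    # new output layer for the 3 cat classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

# With more data, set base.trainable = True (or unfreeze only the later layers)
# before compiling, so more of the network is retrained.
```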

Origin: blog.csdn.net/kodoshinichi/article/details/110347068