Deep convolutional network and its case analysis
Classic network
LeNet-5
1. Network structure
2. Network pattern
From this network we can see the general characteristics of convolutional networks, as well as a construction pattern that is still widely used today.
- As the network gets deeper, the spatial size of the data (width x height) gradually shrinks, while the number of channels gradually increases.
- Deep convolutional neural networks are often built in the pattern [ conv-...-conv-pool-conv-...-conv-pool-FC-FC ]: several convolutional layers are followed by a pooling layer, and the final layers are a number of fully connected layers that produce the model's output.
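The shrinking spatial size and growing channel count can be traced with the standard output-size formulas. A minimal sketch in plain Python, using LeNet-5-like layer sizes for illustration:

```python
# Sketch: how the conv-...-pool pattern shrinks width/height while channels grow.
# Layer sizes below follow LeNet-5; the formulas are the standard ones.

def conv_out(size, f, stride=1, pad=0):
    """Spatial output size of a convolution: floor((n + 2p - f) / s) + 1."""
    return (size + 2 * pad - f) // stride + 1

def pool_out(size, f=2, stride=2):
    """Spatial output size of a pooling layer (same formula, no padding)."""
    return (size - f) // stride + 1

# (filter size, number of filters) per conv layer, each followed by a 2x2 pool.
shape = (32, 32, 1)            # width, height, channels of the input
for f, n_filters in [(5, 6), (5, 16)]:
    w = conv_out(shape[0], f)  # valid convolution, stride 1
    w = pool_out(w)            # 2x2 / stride-2 pooling halves the size
    shape = (w, w, n_filters)
    print(shape)
# Spatial size shrinks (32 -> 14 -> 5) while channels grow (1 -> 6 -> 16).
```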
AlexNet
1. Network structure
2. Network pattern
AlexNet's design philosophy is similar to LeNet's, but AlexNet is clearly more complex.
- Compared with LeNet, AlexNet has far more network parameters.
- The original paper also added LRN (Local Response Normalization) layers, but this technique is now considered to have little effect.
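Where the larger parameter count comes from can be seen with the standard counting formulas. A small sketch (the FC layer size below is AlexNet's well-known 9216 → 4096 layer; the conv layer is a small LeNet-scale example for contrast):

```python
# Sketch: parameter-count formulas explain why AlexNet is so much larger.

def conv_params(f, c_in, n_filters):
    """Each filter has f*f*c_in weights plus one bias."""
    return (f * f * c_in + 1) * n_filters

def fc_params(n_in, n_out):
    """A fully connected layer has n_in*n_out weights plus n_out biases."""
    return (n_in + 1) * n_out

print(conv_params(5, 3, 6))     # a small LeNet-scale conv layer: 456 parameters
print(fc_params(9216, 4096))    # AlexNet's first FC layer alone: ~37.8 million
```

Fully connected layers dominate the count, which is one reason later architectures keep them small or drop them.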
VGG-16
1. Network structure
Compared with the structures discussed above, VGG-16 is a network in which we only need to focus on building the convolutional layers.
2. Network pattern
- The network is indeed very large, with about 138 million parameters in total.
- However, the structure of VGG-16 is not complicated; it is very regular: several convolutional layers are followed by a pooling layer that compresses the spatial size.
- The number of filters in the convolutional layers also follows a rule: from 64 to 128, then 256 and 512; each group of convolutional layers doubles the number of filters.
- Because the network is organized as [several convolutions - pooling], each conv group doubles the number of filters and each pooling halves the spatial size, so the rate at which the image shrinks and the rate at which the channels grow are both regular.
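This regularity can be reproduced in a few lines. The filter counts below follow the VGG-16 paper; each conv group is followed by a 2x2 pooling that halves the spatial size:

```python
# Sketch of VGG-16's regular pattern: each conv group doubles the filters,
# each trailing 2x2 pool halves the spatial size.
size, channels = 224, 3                      # input: 224 x 224 x 3
for n_filters in [64, 128, 256, 512, 512]:   # filters per conv group
    channels = n_filters                     # the conv group sets the channel count
    size //= 2                               # the pool halves width/height
    print(size, channels)
# 112x112x64 -> 56x56x128 -> 28x28x256 -> 14x14x512 -> 7x7x512
```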
Residual network
In fact, building a very deep neural network is not easy, because of vanishing and exploding gradients. To address this, the [ skip connection ] was proposed.
Skip connection: the activation value from one layer is fed directly to another layer deeper in the network, even much deeper;
using skip connections we can build ResNets, networks that can be trained to great depth.
Definition and structure of residual network
"Residual blocks can be used to build deep neural networks: a ResNet is simply many residual blocks stacked together to form a deep network."
1. Residual block
2. Residual network
3. Advantages of residual network
In the past, as a deep neural network grew deeper, its large number of parameters made it bloated and hard to train with optimization algorithms.
Residual networks solve this problem: they allow the input activation x to reach much deeper layers, which mitigates the vanishing- and exploding-gradient problems of deep networks.
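The forward pass of a residual block can be sketched in a few lines of numpy. Weight names (W1, b1, ...) are illustrative, not from any specific library; the key step is adding x back in before the final activation, a = g(z + x):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def residual_block(x, W1, b1, W2, b2):
    """Two linear+ReLU layers with a skip connection that adds x back
    before the final activation: a = g(z2 + x)."""
    a1 = relu(W1 @ x + b1)
    z2 = W2 @ a1 + b2
    return relu(z2 + x)      # the skip connection feeds x forward

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W1, W2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
b1, b2 = np.zeros(4), np.zeros(4)
print(residual_block(x, W1, b1, W2, b2).shape)   # same shape as x: (4,)
```

Note that the skip connection requires z2 and x to have the same dimensions (or x to be projected to match), which is why ResNet favors same-size convolutions.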
Why does the residual network have good performance?
1. A residual network can efficiently learn the identity function.
2. Beyond that efficiency, the hidden units inside a residual block can also learn additional information, so the block's effect may be even better than the [identity function].
1x1 convolution
1. How the computation works
- For single-channel data, a 1x1 convolution simply multiplies each of the width x height data units by a single number.
- For multi-channel data (width x height x channels), a 1x1 convolution uses a kernel of size 1x1xchannels. The kernel traverses the width x height positions one by one; at each position, the vector of that position's values across all channels is dotted with the kernel's weight vector.
A 1x1 convolution is often called a network in network, because each kernel can be viewed as a neuron in a fully connected layer: the channel values at a position are the neuron's inputs, the kernel's channel weights are its weights, and the output is produced by the dot product. Changing the number of kernels is then equivalent to changing the number of neurons in the layer, and hence the number of outputs.
2. Why the computation matters
In short, introducing 1x1 convolutions into a neural network makes it easy to change the depth (number of channels) of the data.
When discussing convolution earlier, we saw that the width and height of the input can easily be adjusted through convolution and pooling operations; now we know that a 1x1 convolution makes it just as easy to compress or stretch the depth of the input.
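Because a 1x1 convolution is just a per-position dot product over channels, it can be written as a single matrix product in numpy. A minimal sketch, compressing 192 channels down to 32 (the layer sizes are illustrative):

```python
import numpy as np

def conv_1x1(x, kernels):
    """1x1 convolution: at every (h, w) position, dot the channel vector
    with each kernel. x: (height, width, c_in); kernels: (c_in, n_filters)."""
    return x @ kernels        # matmul over the last axis does every position at once

rng = np.random.default_rng(0)
x = rng.standard_normal((28, 28, 192))     # 28x28 feature map with 192 channels
kernels = rng.standard_normal((192, 32))   # 32 filters, each of size 1x1x192
y = conv_1x1(x, kernels)
print(y.shape)   # (28, 28, 32): depth compressed, spatial size unchanged
```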
Transfer learning
Transfer learning means downloading open-source weight parameters that someone else has already trained, using them as the initialization for your own neural network, and thereby transferring knowledge learned on a public dataset to your own problem.
Now suppose we need to build a three-class classifier for cat images:
- When the dataset we have is small: freeze all the pre-trained layers and train only the new softmax layer.
There is also a trick available here: because the frozen layers from the input up to the final softmax form a fixed mapping, we can run each training example through the network once in advance, take its activation just before the softmax layer as a feature vector, and store these vectors on disk;
during training, the feature vectors can then be read directly from disk as the input for learning.
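The caching trick above can be sketched in numpy. Here `frozen_extractor` is a stand-in for the downloaded, frozen pre-trained network (hypothetical, for illustration); the point is that its output never changes, so it is computed once and saved:

```python
import numpy as np

rng = np.random.default_rng(0)
W_frozen = rng.standard_normal((512, 100))   # stand-in for pre-trained weights

def frozen_extractor(images):
    """Fixed mapping from raw inputs to pre-softmax feature vectors.
    Never trained, so its outputs never change."""
    return np.maximum(0, images @ W_frozen)

images = rng.standard_normal((20, 512))      # the small dataset, 20 examples
features = frozen_extractor(images)          # run through the frozen layers ONCE...
np.save("cached_features.npy", features)     # ...and cache to disk

# Training the softmax layer then reads the cache instead of
# re-running the frozen layers on every epoch.
cached = np.load("cached_features.npy")
print(np.array_equal(cached, features))      # True
```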
- When the dataset you have is larger:
"A basic rule: the more data you have, the fewer network layers you need to freeze, and the more layers and parameters you can train yourself."
- When the dataset is very large, we can use the structure and parameters pre-trained by others only as an initialization, and then retrain the entire network.