Deep Convolutional Networks: A Case Study
- 1. Why do case studies
- 2. Classic Network
- 3. Residual network ResNets
- 4. Why Residual Networks are useful
- 5. Networks within networks and 1×1 convolutions
- 6. Introduction to Google Inception Network
- 7. Inception Network
- 8. Use open source implementations
- 9. Transfer Learning
- 10. Data augmentation
- 11. State of Computer Vision
1. Why do case studies
Learn the methods of the predecessors to build a network, and learn from it to solve your own problems.
Classic network model:
- LeNet-5
- AlexNet
- VGG
- ResNet
- Inception
2. Classic Network
3. Residual network ResNets
Very, very deep neural networks are hard to train because of vanishing and exploding gradients.
- Skip connections let activations from earlier layers feed directly into deeper layers of the network
- This helps mitigate the vanishing- and exploding-gradient problems, allowing us to train much deeper networks while maintaining good performance
- ResNets are very effective for training deep networks
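As a sketch of the idea, here is a minimal fully connected residual block in NumPy. The shapes and the all-zero weights are illustrative only, not from the lectures; the point is that with zero weights the block collapses to the identity.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(a, W1, b1, W2, b2):
    """Two fully connected layers plus a skip connection:
    out = ReLU(W2 @ ReLU(W1 @ a + b1) + b2 + a)."""
    a1 = relu(W1 @ a + b1)
    z2 = W2 @ a1 + b2
    return relu(z2 + a)  # the shortcut adds the input back before the ReLU

# With all-zero weights the block reduces to the identity for a >= 0:
a = np.array([1.0, 2.0, 3.0])
W = np.zeros((3, 3))
b = np.zeros(3)
out = residual_block(a, W, b, W, b)
# out equals a
```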
4. Why Residual Networks are useful
The main reasons residual networks work:
- A residual block can easily learn the identity function, so adding one will not hurt the network's performance; in many cases it even helps, and at worst it leaves performance unchanged
My own understanding: a plain network passes information one way, a bit like a linked list, where each layer depends only on the previous layer and affects only the next. The residual structure breaks this pattern: the current layer affects not just the next layer but several layers downstream, more like a graph, and in practice the results are good. We know that deep learning builds increasingly abstract, comprehensive features as the hidden layers deepen.
In addition, I think the most criticized aspect of deep learning today is its black-box inexplicability. We should therefore not limit ourselves to existing architecture diagrams, but open up our thinking and design more novel network structures that approach the performance of the human brain.
5. Networks within networks and 1×1 convolutions
We know that a pooling layer compresses the height and width of the input, but it does not change the number of channels.
A 1×1 convolutional layer adds a nonlinearity to the network, and can reduce, keep, or increase the number of channels.
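A quick NumPy sketch of what a 1×1 convolution does (the 192→32 channel counts are illustrative): each output pixel is a linear combination of that pixel's input channels, followed by a nonlinearity.

```python
import numpy as np

# A 1x1 convolution on an H x W x C_in volume is a per-pixel linear map
# across channels: y[h, w, k] = ReLU(sum_c x[h, w, c] * w[c, k]).
H, W, C_in, C_out = 28, 28, 192, 32
x = np.random.rand(H, W, C_in)
w = np.random.rand(C_in, C_out)  # one 1x1 filter per output channel
y = np.maximum(0.0, np.einsum('hwc,ck->hwk', x, w))  # ReLU adds the nonlinearity
# height and width are unchanged; channels shrink from 192 to 32
```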
6. Introduction to Google Inception Network
When building a convolutional layer, you have to decide whether the filter size is 1×1, 3×3, or 5×5, and whether to add a pooling layer.
The Inception network makes these decisions for you. Although the network architecture becomes more complex, its performance is very good.
The basic idea is:
The Inception network does not require you to manually decide which filter to use or whether to pool. Instead, you add all of the candidate operations to the network, concatenate their outputs, and let the network learn which parameters and filter combinations it needs.
Computational cost can be greatly reduced by using 1×1 convolutions to build bottleneck layers.
It turns out that as long as the bottleneck layer is constructed sensibly, the size of the representation can be significantly reduced without degrading network performance, saving computation.
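The saving can be checked with simple arithmetic. Using illustrative numbers in the spirit of the lectures these notes summarize (a 28×28×192 input, 32 output channels of 5×5 filters, and a 16-channel bottleneck), counting multiplications:

```python
# Direct 5x5 convolution: 28x28x192 -> 28x28x32
direct = 28 * 28 * 32 * 5 * 5 * 192          # ~120M multiplications

# With a bottleneck: 1x1 conv down to 16 channels, then the 5x5 conv
bottleneck = (28 * 28 * 16 * 1 * 1 * 192     # 1x1 conv: ~2.4M
              + 28 * 28 * 32 * 5 * 5 * 16)   # 5x5 conv: ~10M

print(direct, bottleneck)  # 120422400 12443648, roughly a 10x reduction
```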
7. Inception Network
Inception Module:
Inception Network : Stack of Inception Modules
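An Inception module can be pictured as several parallel branches whose outputs, all with the same height and width, are concatenated along the channel axis. A NumPy sketch with made-up branch channel counts:

```python
import numpy as np

H, W = 28, 28
# Stand-ins for the outputs of the 1x1, 3x3, 5x5 and pooling branches;
# the channel counts (64, 128, 32, 32) are illustrative only.
branches = [np.random.rand(H, W, c) for c in (64, 128, 32, 32)]
module_out = np.concatenate(branches, axis=-1)  # stack along channels
# module_out has shape (28, 28, 256)
```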
8. Use open source implementations
It turns out that many neural networks are complex and full of details that make them difficult to replicate, because tuning details such as learning-rate decay affect performance.
- Choose a neural network framework you like
- Then look for an open-source implementation, download it from GitHub, and build from there
The advantage of doing this is that these networks usually take a long time to train; someone may already have trained them on a huge dataset using multiple GPUs, and you can use those pre-trained networks for transfer learning.
9. Transfer Learning
10. Data augmentation
Data augmentation is a technique often used to improve the performance of computer vision systems.
- Mirroring (flipping about the vertical axis) (commonly used)
- Random cropping (commonly used)
- Rotation, shearing (local warping) (not commonly used)
- Color shifting (adding distortions drawn from some distribution to the RGB channels), which makes the algorithm more robust to color changes in photos
A common way to implement data augmentation:
- Use one or more CPU threads to load the data and apply the distortions
- Then pass the results to other threads or processes for training, so that augmentation and training run in parallel
Data augmentation also has hyperparameters, such as the amount of color shifting and the random-cropping parameters. A good starting point is someone else's open-source implementation, to see how they implement augmentation; you can then adjust these parameters yourself.
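The two commonly used transformations above, mirroring and random cropping, can be sketched in NumPy (the crop size and flip probability are illustrative; real pipelines run this in the loader threads):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, crop_size):
    """Mirror about the vertical axis with probability 0.5, then take a
    random crop. A minimal sketch of the loader-thread distortions."""
    if rng.random() < 0.5:
        img = img[:, ::-1, :]  # horizontal mirror
    h, w, _ = img.shape
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    return img[top:top + crop_size, left:left + crop_size, :]

img = rng.random((256, 256, 3))
patch = augment(img, 224)
# patch has shape (224, 224, 3)
```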
11. State of Computer Vision
To improve benchmark performance:
- Ensembling: train several neural networks independently and average their outputs; downsides: more training and inference time, larger memory usage
- Multi-crop at test time: run the classifier on multiple crops of the image and average the results; this also increases running time, but only one network's weights occupy memory
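Both tricks boil down to averaging predicted class probabilities, over models for ensembling or over crops for multi-crop; a sketch with made-up softmax outputs:

```python
import numpy as np

# Rows are softmax outputs from different models (ensembling) or from
# different crops of one image (multi-crop); the values are made up.
preds = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.8, 0.1, 0.1]])
avg = preds.mean(axis=0)   # average the probability vectors
label = int(avg.argmax())  # then pick the most likely class
# avg is about [0.7, 0.2, 0.1], so the predicted class is 0
```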