Course 4: Convolutional Neural Networks (Week 2) - Deep Convolutional Networks: Case Studies

1. Why look at case studies

Study how others have built effective networks, and apply what you learn to your own problems.

Classic network architectures:

  • LeNet-5
  • AlexNet
  • VGG
  • ResNet
  • Inception

2. Classic Network

[Figures: LeNet-5, AlexNet, and VGG architectures]

3. Residual Networks (ResNets)

Very, very deep neural networks are hard to train because of vanishing and exploding gradients.
[Figures: a residual block with a skip connection, and a ResNet architecture]

  • The skip connection lets an activation from one layer feed directly into a deeper layer of the network
  • This does help with the vanishing and exploding gradient problems, allowing us to train much deeper networks while maintaining good performance (a minimal sketch of a residual block appears after this list)
  • ResNets are very effective for training deep networks
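
As a minimal sketch, assuming PyTorch (any framework would do), a basic residual block passes its input through two convolutions and adds the original input back before the final ReLU:

```python
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Basic residual block: two 3x3 convolutions plus a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection: add the input before the final ReLU
```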

4. Why Residual Networks are useful

The main reasons why residual networks work:

  • A residual block can very easily learn the identity function, so adding it is guaranteed not to hurt the network's performance; often it even improves performance, and at worst it leaves it unchanged (see the short derivation below)
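
A short way to see why, using the notation from the course (with ReLU activation $g$): the residual block computes

$$a^{[l+2]} = g\left(z^{[l+2]} + a^{[l]}\right) = g\left(W^{[l+2]}a^{[l+1]} + b^{[l+2]} + a^{[l]}\right),$$

so if regularization drives $W^{[l+2]}$ and $b^{[l+2]}$ toward zero, the output is simply $g(a^{[l]}) = a^{[l]}$ (ReLU leaves the non-negative activations unchanged), i.e., the block has learned the identity function.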

My personal understanding: earlier networks are a one-way chain, a bit like a linked list, where each layer depends only on the previous layer and affects only the next one. The residual structure breaks out of this pattern: the current layer affects not only the next layer but also layers several steps further on, a bit like a graph structure, and in practice the results are good. We know that deep learning builds increasingly abstract, comprehensive features by stacking more hidden layers.
In addition, I think the most criticized aspect of deep learning today is its black-box lack of interpretability. We should therefore not restrict ourselves to the existing architecture diagrams, but open up our thinking and design more novel network structures that approach the performance of the human brain.

5. Networks within networks and 1×1 convolutions

We know that a pooling layer can compress the height and width of the input, but it does not change the number of channels.

A 1×1 convolutional layer adds a nonlinearity to the network and can reduce the number of channels, keep it the same, or increase it (see the sketch below).
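
A minimal sketch, assuming PyTorch; the channel counts (192 in, 32 out) are just example values. The 1×1 convolution leaves the height and width unchanged while changing the number of channels:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 192, 28, 28)              # batch, channels, height, width
conv1x1 = nn.Conv2d(192, 32, kernel_size=1)  # 1x1 convolution: 192 -> 32 channels
y = torch.relu(conv1x1(x))                   # the nonlinearity mentioned above
print(y.shape)                               # torch.Size([1, 32, 28, 28])
```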

6. Introduction to Google Inception Network

When building a convolutional layer, you normally have to decide whether the filter size should be 1×1, 3×3, or 5×5, and whether to add a pooling layer.

The Inception network makes these decisions for you. Although the architecture becomes more complex, the network performs very well.

The basic idea is:

With an Inception module you do not have to decide manually which filter size to use or whether to pool. Instead, you apply all of the options, concatenate their outputs, and let the network learn for itself which parameters and which filter combinations it needs.

Computational cost can be greatly reduced by using 1×1 convolutions to build bottleneck layers (a worked example follows below).

It turns out that as long as the bottleneck layer is constructed sensibly, the size of the representation can be reduced significantly without hurting the network's performance, which saves a great deal of computation.
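
A rough worked example (the figures match the example used in the lecture: a 28×28×192 input volume, 32 output filters of size 5×5, and a 16-channel bottleneck), counting multiplications:

```python
# Direct 5x5 convolution on a 28x28x192 volume producing 28x28x32
direct = 28 * 28 * 32 * 5 * 5 * 192   # ~120 million multiplications

# Same output size via a 1x1 bottleneck down to 16 channels, then the 5x5 conv
step1 = 28 * 28 * 16 * 1 * 1 * 192    # ~2.4 million
step2 = 28 * 28 * 32 * 5 * 5 * 16     # ~10.0 million
bottleneck = step1 + step2            # ~12.4 million, roughly a 10x saving

print(direct, bottleneck)             # 120422400 12443648
```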

7. Inception Network

Inception module: several parallel branches (1×1, 3×3, and 5×5 convolutions, plus pooling) whose outputs are concatenated along the channel dimension
Inception network: a stack of Inception modules (a minimal sketch of one module follows)
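
A minimal sketch of one module, assuming PyTorch; the per-branch channel counts are left as constructor arguments:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InceptionModule(nn.Module):
    """Parallel 1x1, 3x3, 5x5 and pooling branches, concatenated along the channel axis."""
    def __init__(self, in_ch, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, c1, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c3_reduce, kernel_size=1),        # 1x1 bottleneck
            nn.ReLU(inplace=True),
            nn.Conv2d(c3_reduce, c3, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, c5_reduce, kernel_size=1),        # 1x1 bottleneck
            nn.ReLU(inplace=True),
            nn.Conv2d(c5_reduce, c5, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),  # same-size pooling
            nn.Conv2d(in_ch, pool_proj, kernel_size=1),
        )

    def forward(self, x):
        branches = [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)]
        return torch.cat([F.relu(b) for b in branches], dim=1)  # concatenate channels
```

For example, `InceptionModule(192, 64, 96, 128, 16, 32, 32)` reproduces the channel split of the first module in GoogLeNet.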

8. Use open source implementations

It turns out that many neural networks are complex and finely tuned, which makes them hard to replicate from scratch, because details of hyperparameter tuning, such as learning rate decay, affect performance

  • Choose a neural network framework you like
  • Then look for an open source implementation, download it from GitHub, and build on top of it. The advantage of doing this is that these networks usually take a long time to train; someone may already have pre-trained them on a huge dataset using multiple GPUs, and you can then use these pre-trained networks for transfer learning

9. Transfer Learning

[Figure: transfer learning with a pre-trained network]
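
A minimal sketch of the idea, assuming PyTorch and torchvision (the course itself is framework-agnostic): download a network pre-trained on a large dataset, freeze its weights, and replace only the final classification layer for your own task.

```python
import torch.nn as nn
from torchvision import models

# Load a network pre-trained on ImageNet and freeze all of its parameters
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a new head for our own classes
num_classes = 10  # hypothetical number of target classes
model.fc = nn.Linear(model.fc.in_features, num_classes)  # only this layer gets trained
```

With more training data of your own, you can unfreeze and fine-tune more of the earlier layers instead of training only the new head.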

10. Data augmentation

Data augmentation is a technique that is often used to improve the performance of computer vision systems.

  • Mirroring about the vertical axis, i.e. flipping left-right (commonly used)
  • Random cropping (commonly used)
  • Rotation, shearing, local warping (not commonly used)
  • Color shifting (adding distortion values drawn from some distribution to the R, G, and B channels), which makes the algorithm more robust to color changes in photos (see the sketch after this list)
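
A minimal sketch of such a pipeline, assuming torchvision (any augmentation library works the same way):

```python
from torchvision import transforms

# Random crop, mirroring, and color shifting applied on the fly to each training image
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),                                     # random cropping
    transforms.RandomHorizontalFlip(),                                     # mirroring
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # color shifting
    transforms.ToTensor(),
])
```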

Commonly used methods for implementing data augmentation:

  1. Use one or more CPU threads to load the data and apply the distortions
  2. Pass the distorted mini-batches to another thread or process that runs training, so that loading/augmentation and training happen in parallel (a sketch follows this list)
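
A minimal sketch, assuming PyTorch's DataLoader and the `train_transform` defined above; the `data/train` path is hypothetical. The worker processes load and distort images while the main process trains:

```python
from torch.utils.data import DataLoader
from torchvision import datasets

# Each worker process reads images from disk and applies the distortions in parallel
train_set = datasets.ImageFolder("data/train", transform=train_transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)
```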

The data augmentation process also has hyperparameters, such as how much color shifting to apply and the random-cropping parameters.
A good starting point is someone else's open source implementation, to see how they implement data augmentation; you can then adjust these parameters yourself.

11. State of Computer Vision

To improve performance:

  • Ensembling: train several neural networks independently and average their outputs; the downsides are longer running time and higher memory usage (a sketch of output averaging follows this list)
  • Multi-crop at test time: run the classifier on multiple crops of each test image and average the results; this also increases running time, but only one neural network has to be kept in memory
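
A minimal sketch of output averaging, assuming PyTorch; `models` here would be a list of independently trained networks:

```python
import torch

def ensemble_predict(models, x):
    """Average the softmax outputs of several independently trained networks."""
    with torch.no_grad():
        probs = [torch.softmax(m(x), dim=1) for m in models]
    return torch.stack(probs).mean(dim=0)  # averaged class probabilities
```

Multi-crop works the same way, except that a single network's predictions are averaged over several crops of the same image.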


Origin blog.csdn.net/qq_42859149/article/details/119809826