Model Design in Deep Learning

Note: these notes record some of the material from the video lessons.

1. How to learn deep learning model design and optimization

1.1 Model design for deep learning

1.1.1 Model depth - the key to performance improvement


1.1.2 Channel dimension transformation - the basic design unit


1.1.3 Model width - the key to model expressiveness and compression

Ways to increase the width of a model (a sketch follows this list):
(1) Increase the number of channels.
(2) Increase the number of branches, e.g. go from a single path to multiple parallel paths.
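As a minimal sketch (the layer names and channel counts here are illustrative, not from the course), the two ways of widening look like this in PyTorch:

```python
import torch
import torch.nn as nn

# (1) Widen by channel count: the same layer, more output channels.
narrow = nn.Conv2d(64, 128, kernel_size=3, padding=1)
wide = nn.Conv2d(64, 256, kernel_size=3, padding=1)

# (2) Widen by branching: parallel paths whose outputs are
#     concatenated along the channel dimension.
class TwoBranch(nn.Module):
    def __init__(self, in_ch, branch_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.b2 = nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x)], dim=1)

x = torch.randn(1, 64, 32, 32)
print(TwoBranch(64, 64)(x).shape)  # torch.Size([1, 128, 32, 32])
```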

1.1.4 Residual networks - the key technique for training deep models


1.1.5 Grouped networks - a core technique for model compression


1.1.6 Multi-scale and irregular convolutions - core techniques for improving performance on complex tasks


1.1.7 Dynamic inference and attention-mechanism networks - core techniques for model optimization


1.1.8 Generative adversarial networks - a new generation of fundamental deep learning technology


2. Model optimization for deep learning

2.1 Requirements for industrial-grade networks


2.2 Technical points of model simplification and optimization


2.2.1 Use of convolution kernels

(1) Replace fully connected layers with global pooling.
(2) Replace large convolutions with stacked small ones.
(3) Transform the channel dimension (e.g. with 1x1 convolutions).
All three are sketched below.
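A minimal sketch of the three ideas in PyTorch (all layer sizes are illustrative):

```python
import torch.nn as nn

# (1) Global average pooling replaces a large fully connected layer:
#     each channel is pooled to a single value before a small classifier.
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),  # (N, C, H, W) -> (N, C, 1, 1)
    nn.Flatten(),
    nn.Linear(512, 10),
)

# (2) Two stacked 3x3 convolutions cover the same 5x5 receptive field
#     with fewer parameters (2*9*C^2 vs 25*C^2) plus an extra nonlinearity.
small_convs = nn.Sequential(
    nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
)

# (3) A 1x1 convolution transforms the channel dimension cheaply,
#     e.g. reducing 256 channels to 64 before an expensive convolution.
reduce = nn.Conv2d(256, 64, kernel_size=1)
```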

2.2.2 Convolution topology design


2.2.3 Transfer Learning

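A minimal fine-tuning sketch (assuming a recent torchvision; the 10-class head is an illustrative choice):

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone, freeze its weights, and
# train only a new task-specific classification head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)  # new head, trainable by default
```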

2.2.4 Automated Search


3. How to design a CNN structure with stronger performance

3.1 Design of Network Depth

3.1.1 Why deepening can improve performance


3.1.2 What depth to start with

(1) Networks with more than 30 layers are rare in practice.
(2) A starting depth of 5 to 8 layers is usually a good choice (an illustrative baseline follows).
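As an illustrative (not prescriptive) example of such a starting point, a plain CNN with five convolutional layers and one classifier layer:

```python
import torch.nn as nn

# A plain ~6-layer baseline to iterate from; deepen only if the task demands it.
baseline = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
    nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, 10),
)
```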

3.1.3 Problems that tend to arise when deepening the network


3.2 Design of Network Width

3.2.1 Why enough width is needed - a wider network learns richer features

[Figure: visualization of the 96 first-layer convolution filters of AlexNet]
The left part of the figure shows a visualization of the 96 filters in the first convolutional layer of AlexNet: some filters extract shape features, while others capture color information.

If there are too few channels, the network cannot extract that variety of features.
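A sketch of how to reproduce such a visualization (assuming a recent torchvision; note its AlexNet variant has 64 first-layer filters rather than the original 96, but the mix of shape and color filters is the same):

```python
import matplotlib.pyplot as plt
from torchvision import models

model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
w = model.features[0].weight.detach()      # (64, 3, 11, 11)
w = (w - w.min()) / (w.max() - w.min())    # normalize to [0, 1] for display
fig, axes = plt.subplots(8, 8, figsize=(8, 8))
for ax, f in zip(axes.flat, w):
    ax.imshow(f.permute(1, 2, 0))          # CHW -> HWC
    ax.axis("off")
plt.show()
```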

3.2.2 Characteristics of network width

(1) As the width of the model increases, the computational cost grows rapidly (roughly quadratically for a convolutional layer, since both its input and output channel counts scale; see the worked example below).
(2) Increasing the network width improves model performance, but the gains saturate beyond a certain point.
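A worked example of the quadratic growth, counting the multiply-accumulate operations of a single convolutional layer:

```python
def conv_macs(h, w, k, c_in, c_out):
    # Multiply-accumulates of a k x k convolution on an h x w output map.
    return h * w * k * k * c_in * c_out

# Doubling the width (both input and output channels) multiplies cost by 4.
print(conv_macs(56, 56, 3, 64, 64))    # 115,605,504
print(conv_macs(56, 56, 3, 128, 128))  # 462,422,016 (4x)
```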

3.2.3 Width Design Principles


3.3 Dropout and BN layer design

3.3.1 Benefits of Dropout

(1) Improves generalization.
(2) Alleviates the dying-ReLU problem.
(3) Reduces complex co-adaptation between neurons.
[Figure: first-layer features learned (a) with dropout and (b) without dropout]
Panel (a), trained with dropout, shows neurons with much richer features; panel (b), trained without dropout, shows neurons that represent very little.
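A typical placement, as in classic CNN heads (sizes illustrative); note that dropout is active only in model.train() mode and automatically disabled by model.eval():

```python
import torch.nn as nn

classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Dropout(p=0.5),  # the classic 0.5 drop probability
    nn.Linear(4096, 1000),
)
```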

3.3.2 Problems when using Dropout

Dropout is equivalent to injecting noise into training and attenuates the gradients, so a larger learning rate and momentum term are needed, and training takes longer to converge.

3.3.3 Why BN is good

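For reference, a hand-rolled sketch of the BN transform itself (training-mode batch statistics; equivalent to nn.BatchNorm2d up to the running statistics it also tracks):

```python
import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each channel over the (N, H, W) dimensions, then
    # rescale and shift with the learnable gamma and beta.
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
    return gamma * (x - mean) / torch.sqrt(var + eps) + beta

x = torch.randn(8, 16, 32, 32)
y = batch_norm(x, torch.ones(1, 16, 1, 1), torch.zeros(1, 16, 1, 1))
```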

3.3.4 Precautions for using BN


3.4 Design of convolution kernel size, stride, and pooling

3.4.1 How stride and pooling affect performance

(1) Generally, the stride of the first convolutional layer should not exceed 2.

3.4.2 How to choose a pooling scheme


3.4.3 How to design the stride


3.4.4 How the size of the convolution kernel affects model performance


3.4.5 How to design the convolution kernel size


3.5 Understanding and Design of Residual Structure

3.5.1 Why the residual structure is effective


3.5.2 How to design a residual network


3.6 Multi-scale and information fusion

3.6.1 How to design a multi-scale network

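One common pattern is an Inception-style block: parallel paths see the input at different scales and their outputs are concatenated. A minimal sketch (the equal channel split is illustrative):

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        c = out_ch // 4
        self.p1 = nn.Conv2d(in_ch, c, 1)                        # 1x1 path
        self.p2 = nn.Sequential(nn.Conv2d(in_ch, c, 1),
                                nn.Conv2d(c, c, 3, padding=1))  # 3x3 path
        self.p3 = nn.Sequential(nn.Conv2d(in_ch, c, 1),
                                nn.Conv2d(c, c, 5, padding=2))  # 5x5 path
        self.p4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, c, 1))         # pooled path

    def forward(self, x):
        return torch.cat([self.p1(x), self.p2(x), self.p3(x), self.p4(x)], 1)
```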

3.7 Others

3.7.1 Attention

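A representative example is the squeeze-and-excitation (SE) channel-attention block, sketched minimally below: globally pool each channel, pass through a small bottleneck, and rescale the channels with the resulting weights.

```python
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),         # squeeze: (N, C)
            nn.Linear(ch, ch // reduction), nn.ReLU(),
            nn.Linear(ch // reduction, ch), nn.Sigmoid(),  # excite: weights in 0..1
        )

    def forward(self, x):
        w = self.gate(x).view(x.size(0), -1, 1, 1)
        return x * w  # reweight channels
```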

3.7.2 Receptive field control

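The receptive field can be tracked layer by layer; a small helper (the example stack is illustrative):

```python
def receptive_field(layers):
    # layers: list of (kernel_size, stride) from input to output.
    r, j = 1, 1  # receptive field and cumulative stride ("jump")
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

# A 7x7/stride-2 conv followed by three 3x3/stride-1 convs:
print(receptive_field([(7, 2), (3, 1), (3, 1), (3, 1)]))  # 19
```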

3.7.3 AutoML


3.8 GAN network

4. How to obtain a more efficient CNN model structure

4.1 What is a more efficient CNN network


4.2 Channel Compensation and Channel Reuse


4.3 Group convolution

Grouped convolution can greatly reduce computation and parameter count, as the comparison below shows.
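A quick comparison in PyTorch (layer sizes illustrative): with g groups, each filter connects to only C_in/g input channels, so parameters and multiply-adds shrink by a factor of g.

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

dense = nn.Conv2d(128, 128, 3, padding=1, bias=False)
grouped = nn.Conv2d(128, 128, 3, padding=1, bias=False, groups=4)
print(n_params(dense))    # 147456
print(n_params(grouped))  # 36864, i.e. 1/4 of the dense layer
```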

4.4 Pruning

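A minimal sketch with PyTorch's built-in pruning utilities (the layer and the 30% amount are illustrative):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Unstructured magnitude pruning: zero the 30% of weights with the
# smallest L1 magnitude, then make the mask permanent.
conv = nn.Conv2d(64, 128, 3, padding=1)
prune.l1_unstructured(conv, name="weight", amount=0.3)
prune.remove(conv, "weight")  # fold the mask into the weight tensor
print((conv.weight == 0).float().mean())  # ~0.30 sparsity
```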

4.5 Quantization

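A minimal post-training dynamic-quantization sketch (the toy model is illustrative; depending on the PyTorch version the same function also lives under torch.ao.quantization):

```python
import torch
import torch.nn as nn

# Dynamic quantization: weights of the listed layer types are stored
# in int8 and dequantized on the fly at inference time.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
```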

4.6 Knowledge Distillation

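A sketch of the classic soft-target distillation loss (Hinton et al.); the temperature T and blend weight alpha are illustrative hyperparameters:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # KL divergence between temperature-softened teacher and student
    # distributions, blended with the ordinary cross-entropy loss.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```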

4.7 Dynamic Inference
