Backbone, head, neck and other concepts in deep learning

1.backbone

Backbone literally means the backbone network; as the name suggests, it is one part of the whole network. Most of the time the backbone refers to the feature-extraction network: its job is to extract the information in the image for the networks that follow. ResNet, VGG and similar networks are usually used here rather than a network we design ourselves, because they have already proven to have very strong feature-extraction ability on classification and related problems. When one of these networks is used as the backbone, we directly load the officially trained model parameters, attach our own network behind it, and train the two parts together. Since the loaded backbone already knows how to extract features, it is only fine-tuned during our training so that it better suits our own task.

In neural networks, especially in the CV field, feature extraction is generally performed on the image first (VGG and ResNet are common choices). This part is the foundation of the whole CV task, because the downstream tasks that follow (classification, generation, and so on) are all built on the extracted image features.

The backbone network is the part of the network used for feature extraction. It generally sits at the front end, extracting image information and producing a feature map for the subsequent networks to use. VGG or ResNet is usually chosen, because these backbones have strong feature-extraction ability and their official model parameters trained on large datasets (Pascal VOC, ImageNet) can be loaded before connecting our own network for fine-tuning.

The backbone is generally not a network we design ourselves, because existing networks have already proven their strong feature-extraction ability on classification problems. When used as the backbone, they are loaded directly with officially trained model parameters and followed by our own network, and the two parts are trained together: the loaded backbone already has feature-extraction ability, so during our training it is only fine-tuned to make it more suitable for our own task.
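
A rough PyTorch sketch of this pattern; the choice of ResNet-50, the 10-class head and the learning rates below are only placeholders:

import torch
import torch.nn as nn
import torchvision

# Load a ResNet-50 backbone with ImageNet-pretrained weights and drop its original classifier.
backbone = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()            # keep only the feature extractor (outputs 2048-d vectors)

# Our own task-specific head on top of the backbone features (10 classes is arbitrary).
head = nn.Linear(2048, 10)

# Both parts are trained together; the pretrained backbone is merely fine-tuned,
# often with a smaller learning rate than the freshly initialized head.
optimizer = torch.optim.SGD([
    {"params": backbone.parameters(), "lr": 1e-4},
    {"params": head.parameters(), "lr": 1e-2},
], momentum=0.9)

x = torch.randn(2, 3, 224, 224)        # a fake batch of images
logits = head(backbone(x))             # shape: (2, 10)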

2.head

The head is the part of the network that produces the final output: it takes the features extracted earlier and uses them to make predictions.

3.neck

The neck is placed between the backbone and the head in order to make better use of the features extracted by the backbone.
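
A toy sketch of how backbone, neck and head fit together; every layer size here is made up for illustration:

import torch
import torch.nn as nn

backbone = nn.Sequential(                     # extracts a feature map from the image
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 256, 3, stride=2, padding=1), nn.ReLU(),
)
neck = nn.Sequential(                         # refines/compresses backbone features for the head
    nn.Conv2d(256, 128, 1), nn.ReLU(),
)
head = nn.Sequential(                         # turns features into the final prediction
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(128, 5),                        # 5 output classes, chosen arbitrarily
)

x = torch.randn(1, 3, 224, 224)
out = head(neck(backbone(x)))                 # shape: (1, 5)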

4.bottleneck

Bottleneck literally means a bottleneck. It usually refers to a place where the output dimension of a module is much smaller than its input dimension, so the data becomes "thinner", like the neck of a bottle. A parameter such as bottle_num=256, which is often seen, means the module outputs 256-dimensional data even though the input may be 1024-dimensional.
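
A minimal sketch of such a bottleneck, squeezing a 1024-dimensional input down to 256 dimensions (as in the bottle_num=256 example):

import torch
import torch.nn as nn

bottleneck = nn.Sequential(
    nn.Linear(1024, 256),      # the dimension drops sharply here, like the neck of a bottle
    nn.ReLU(),
)

x = torch.randn(8, 1024)       # a batch of 8 samples with 1024-d features
z = bottleneck(x)
print(z.shape)                 # torch.Size([8, 256])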

5.GAP

A GAP layer often appears in network designs. GAP stands for Global Average Pooling, which averages all the values of each channel into a single number. In PyTorch it is usually implemented with AdaptiveAvgPool2d (adaptive average pooling) with an output size of 1x1; in plain terms, it takes the mean of each channel's feature map.
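
A quick sketch of what GAP does to a feature map (the shapes are arbitrary):

import torch
import torch.nn as nn

x = torch.randn(1, 512, 7, 7)          # (batch, channels, H, W) feature map

gap = nn.AdaptiveAvgPool2d(1)          # output spatial size 1x1, i.e. global average pooling
y = torch.flatten(gap(x), 1)           # (1, 512): one averaged value per channel

y2 = x.mean(dim=(2, 3))                # the same thing, written out by hand
print(torch.allclose(y, y2))           # True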

6.Embedding

Deep learning methods use linear and nonlinear transformations to automatically extract features from complex data and represent them as a vector; this process is generally called "embedding".
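
A minimal sketch with nn.Embedding; the vocabulary size and embedding dimension are arbitrary:

import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=10000, embedding_dim=128)   # 10000 token ids -> 128-d vectors

token_ids = torch.tensor([[3, 41, 907]])   # a fake sentence of three token ids
vectors = embedding(token_ids)             # shape: (1, 3, 128), one vector per token
print(vectors.shape)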

7.downstream task

The tasks used for pre-training are called pretext tasks, and the tasks used for fine-tuning are called downstream tasks.

8.temperature parameters

This temperature parameter can often be seen in papers. It can smooth the output of softmax. Examples are as follows:

import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.softmax(x, 0)        # plain softmax, temperature beta = 1
print(y)

x1 = x / 2                     # beta = 2: scaled-down logits give a smoother distribution
y = torch.softmax(x1, 0)
print(y)

x2 = x / 0.5                   # beta = 0.5: scaled-up logits give a sharper distribution
y = torch.softmax(x2, 0)
print(y)

# The output is as follows:

tensor([0.0900, 0.2447, 0.6652])
tensor([0.1863, 0.3072, 0.5065])
tensor([0.0159, 0.1173, 0.8668])

When beta > 1 the output is smoothed, and when beta < 1 the differences in the output are amplified and it becomes sharper. When beta is relatively large, the cross-entropy loss of the classification becomes very large, and different beta values can be used in different iterations, somewhat similar to the effect of adjusting the learning rate.

9.Warm up

Warm up refers to training the first few epochs with a small learning rate. Because the network's parameters are randomly initialized, using a large learning rate right at the beginning can easily cause numerical instability.
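
A minimal warm-up sketch using LambdaLR; the base learning rate and number of warm-up steps are arbitrary choices:

import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

warmup_steps = 5                                    # warm up over the first 5 epochs

# Linearly ramp the learning rate from a small value up to the base lr, then keep it constant.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda epoch: min(1.0, (epoch + 1) / warmup_steps),
)

for epoch in range(8):
    # ... one epoch of training would go here ...
    optimizer.step()
    scheduler.step()
    print(epoch, optimizer.param_groups[0]["lr"])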

Origin blog.csdn.net/weixin_45277161/article/details/129224252