[Transfer Learning] Pre-training and fine-tuning

I. Overview

        Supervised transfer learning is generally divided into three types:

                ① Use the trained model as a feature extraction module (for example, use a ResNet to extract features; see the sketch after this list)

                ② Use the model directly after training it on a related task (as with GPT)

                ③ Fine-tune on the basis of the trained model
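
A minimal sketch of type ①, assuming torchvision's ResNet-50 as the backbone (the dummy batch below is only for illustration):

import torch
from torch import nn
from torchvision import models

# Type ①: use a pre-trained ResNet-50 as a frozen feature extractor
backbone = models.resnet50(pretrained=True)
backbone.fc = nn.Identity()   # drop the classification head, keep the 2048-d features
backbone.eval()               # inference mode; the model only extracts features

with torch.no_grad():
    images = torch.randn(4, 3, 224, 224)  # dummy batch of 4 RGB images
    features = backbone(images)           # shape: (4, 2048)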

        In addition, there are settings with little or no label supervision:

                zero-shot: no label information is available

                few-shot: only a small amount of labeled data is available

II. Fine-tuning

        Generally speaking, a neural network can be divided into two parts: an encoder and a decoder. The encoder converts the raw pixels into linearly separable semantic features (a feature embedding); the decoder maps those semantic features to labels (typically a linear classifier).
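
As a toy illustration of this split (the class name and layer sizes below are purely illustrative):

from torch import nn

class Classifier(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        # encoder: raw pixels -> semantic features (feature embedding)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # decoder: semantic features -> labels (a linear classifier)
        self.decoder = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.decoder(self.encoder(x))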

        Pre-trained model: a model trained on a larger data set (such as ImageNet), which generally has better generalization ability. In contrast, a network trained from scratch starts from randomly initialized parameters, which makes it harder to train and tune.

        The specific method is as follows (a code sketch follows the list):

                ① Build a new model whose architecture is consistent with the pre-trained model

                ② When the new model is initialized, the encoder directly loads the weights of the pre-trained model, while the decoder is randomly initialized

                ③-1 Limit the search space: restrict the number of training epochs and the learning rate (the pre-trained model is already near a good solution, so large updates away from it should be avoided)

                ③-2 Freeze the bottom layers: lower layers generally learn local, low-level features, while upper layers learn more task-specific ones. Concretely, freeze the lower layers (set their learning rate to 0, i.e. disable their gradients)
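
A minimal PyTorch sketch of steps ① through ③, assuming a ResNet-50 backbone; n_classes and the choice of which layers to freeze are illustrative assumptions:

import torch
from torch import nn
from torchvision import models

n_classes = 10  # hypothetical number of classes in the new task

# ①/②: same architecture as the pre-trained model; the encoder loads the
# pre-trained weights, and the decoder (the fc head) is re-initialized randomly
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, n_classes)

# ③-2: freeze the lower layers (here, everything except layer4 and fc)
for name, param in model.named_parameters():
    if not (name.startswith('layer4') or name.startswith('fc')):
        param.requires_grad = False

# ③-1: limit the search space with a small learning rate (and few epochs)
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-3, momentum=0.9,
)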

III. Acquisition of pre-trained models

        Taking PyTorch as an example, pre-trained models can be downloaded from the official PyTorch model page; for weight conversion, refer to the earlier article:

        [PyTorch] Pre-training weight conversion: https://blog.csdn.net/weixin_37878740/article/details/130259766

        Alternatively, call the timm package directly:

import timm
from torch import nn

n_classes = 10  # number of classes in the new task

# load a ResNet-50 with pre-trained weights via timm
model = timm.create_model('resnet50', pretrained=True)
# replace the classification head for the new task
model.fc = nn.Linear(model.fc.in_features, n_classes)
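
As a shorthand, timm can also rebuild the classification head itself: passing num_classes=n_classes to timm.create_model replaces the head in one call.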
