How to fine-tune a pre-trained model (freezing certain layers, setting different learning rates for different layers, etc.)

Because the weights of a pre-trained model come from a data set that differs to some degree from the one we want to train on, and because target data sets vary in size and similarity, it is important to fine-tune the model appropriately and to set suitable learning rates. The four common scenarios are discussed below; please point out any mistakes or shortcomings.
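Before the scenarios, here is one way to set different learning rates for different layers. This is a minimal sketch assuming PyTorch and torchvision's ResNet-18; the layer split, class count, and learning-rate values are illustrative choices, not values from the original post.

```python
# A minimal sketch of per-layer learning rates, assuming PyTorch and a
# torchvision ResNet-18; the LR values and class count are illustrative.
import torch
import torchvision

model = torchvision.models.resnet18(weights="IMAGENET1K_V1")

# Replace the classifier head for the target task (e.g. 10 classes).
model.fc = torch.nn.Linear(model.fc.in_features, 10)

# Split parameters: the new head vs. the pre-trained backbone.
head_params = list(model.fc.parameters())
head_param_ids = {id(p) for p in head_params}
backbone_params = [p for p in model.parameters() if id(p) not in head_param_ids]

# Pre-trained layers get gentle updates; the fresh head learns faster.
optimizer = torch.optim.SGD(
    [
        {"params": backbone_params, "lr": 1e-4},
        {"params": head_params, "lr": 1e-2},
    ],
    momentum=0.9,
)
```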
(1) The target data set is small and highly similar to the pre-trained model's data set. For example, when the target classes already appear in the pre-trained model's data set, there is no need to retrain the whole model; it is enough to replace the final output layer (see the sketch after this list).
(2) The target data set is small and has low similarity to the pre-trained model's data set. You can freeze the first k layers of the model and retrain the remaining n−k layers. Keeping the first k layers fixed preserves their generic pre-trained features and compensates for the small data set (see the sketch after this list).
(3) The target data set is large and highly similar to the pre-trained model's data set. Using the pre-trained model is very effective here: keep the model structure unchanged, initialize with the pre-trained weights, and retrain the whole model.
(4) The target data set is large and has low similarity to the pre-trained model's data set. The pre-trained model will not help much; you can start from the pre-trained weights or from random initialization, and retrain the whole model.
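The sketch below illustrates scenarios (1) and (2): freezing the first k blocks of a pre-trained network and replacing the output layer. It again assumes PyTorch and torchvision's ResNet-18; the value of k and the class count are illustrative choices.

```python
# A minimal sketch of scenario (2): freeze the first k blocks of a
# pre-trained ResNet-18 and retrain the remaining ones plus a new head.
import torch
import torchvision

model = torchvision.models.resnet18(weights="IMAGENET1K_V1")

# ResNet-18's main blocks in forward order; freeze the first k of them.
blocks = [model.conv1, model.bn1, model.layer1, model.layer2,
          model.layer3, model.layer4]
k = 4
for block in blocks[:k]:
    for param in block.parameters():
        param.requires_grad = False  # frozen: no gradients, no updates

# Scenario (1) is the special case k = len(blocks): only the output
# layer is replaced and trained.
model.fc = torch.nn.Linear(model.fc.in_features, 10)

# Pass only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)
```

For scenarios (3) and (4) no layers are frozen: in (3) the pre-trained weights serve as the initialization for retraining the whole model, while in (4) passing weights=None (random initialization) is the train-from-scratch alternative.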

Origin: blog.csdn.net/qq_34291583/article/details/105328711