Looking through blogs and forums, freezing parameters generally involves two steps:
- Set the parameter's `requires_grad` attribute to `False`.
- When defining the optimizer, filter out the parameters whose gradients should not be updated, typically like this:

```python
optimizer = torch.optim.SGD(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-3)
```

I won't go into more detail; most results on Baidu say the same thing.
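As a minimal sketch of that standard recipe (the `ToyModel` name and layer sizes here are my own illustration, not from any particular post):

```python
import torch
from torch import nn

class ToyModel(nn.Module):
    """Hypothetical toy stand-in for an encoder-decoder model."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4)
        self.decoder = nn.Linear(4, 4)

model = ToyModel()

# Step 1: freeze the decoder
for p in model.decoder.parameters():
    p.requires_grad = False

# Step 2: hand the optimizer only the still-trainable parameters
optimizer = torch.optim.SGD(
    filter(lambda p: p.requires_grad, model.parameters()), lr=1e-3)

# Only the encoder's weight and bias made it into the optimizer:
# 2 of the model's 4 parameter tensors
n_trainable = len(optimizer.param_groups[0]["params"])
```
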
First, my task: the model consists of an encoder and a decoder. During pre-training, the decoder's parameters are fixed and only the encoder is trained; during fine-tuning, all parameters are trained.
Problem:
Following the method above, reloading the optimizer state raises a length-mismatch error:

```
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group
```

After debugging for a long time, I found that the optimizer state I had saved contained only the encoder's parameters, while the new optimizer was built over both the encoder and the decoder. Because the group sizes differ, the pre-trained state cannot be loaded into the new optimizer.
Solution:
Only toggle the `requires_grad` attribute to `True`/`False`, and do not filter the parameters when building the optimizer; the parameter-group lengths then stay consistent. Moreover, the frozen parameters really are not updated during pre-training (the optimizer skips parameters that receive no gradient), and all parameters are updated during fine-tuning, which is exactly what we need.
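To see both the failure and the fix in one place, here is a sketch with a made-up toy model (the `ToyModel` name and sizes are my own, not the author's actual code):

```python
import torch
from torch import nn

class ToyModel(nn.Module):
    """Hypothetical toy encoder-decoder model."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4)
        self.decoder = nn.Linear(4, 4)

model = ToyModel()
for p in model.decoder.parameters():
    p.requires_grad = False

# Filtered optimizer (pre-training): its single group holds 2 parameter tensors
pre_opt = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3)
saved_state = pre_opt.state_dict()

# Fine-tuning optimizer over all 4 tensors: the group sizes no longer match
for p in model.parameters():
    p.requires_grad = True
ft_opt = torch.optim.SGD(model.parameters(), lr=1e-3)

mismatch = False
try:
    ft_opt.load_state_dict(saved_state)
except ValueError:
    mismatch = True  # the "doesn't match the size of optimizer's group" error

# The fix: always build the optimizer over *all* parameters, so the saved
# and reloaded parameter groups have the same length
all_opt = torch.optim.SGD(model.parameters(), lr=1e-3)
ft_opt.load_state_dict(all_opt.state_dict())  # loads without error
```
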
My adjustment process:
- Pre-training: only modify the attributes, do not filter the parameters:

```python
# Freeze everything, then unfreeze only the encoder
for param in model.parameters():
    param.requires_grad = False
for param in model.encoder.parameters():
    param.requires_grad = True
```

Printing the parameters before and after an update step shows that only the encoder is updated; the decoder stays fixed.
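That check can be reproduced with a small sketch (the toy model and the single training step are my own illustration, not the original code):

```python
import torch
from torch import nn

torch.manual_seed(0)

class ToyModel(nn.Module):
    """Hypothetical toy encoder-decoder model."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4)
        self.decoder = nn.Linear(4, 4)

model = ToyModel()
# Pre-training setup: freeze everything, then unfreeze the encoder
for param in model.parameters():
    param.requires_grad = False
for param in model.encoder.parameters():
    param.requires_grad = True

# No filtering: the optimizer sees all parameters
opt = torch.optim.SGD(model.parameters(), lr=0.1)

before = {n: p.detach().clone() for n, p in model.named_parameters()}
loss = model.decoder(model.encoder(torch.randn(2, 4))).sum()
loss.backward()
opt.step()  # frozen parameters received no gradient, so SGD skips them

# The encoder's weights moved; the decoder's did not
enc_changed = not torch.equal(before["encoder.weight"], model.encoder.weight)
dec_changed = not torch.equal(before["decoder.weight"], model.decoder.weight)
```
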
- Fine-tuning:

```python
# Unfreeze everything
for param in model.parameters():
    param.requires_grad = True
```

Printing the parameters before and after an update again shows that the decoder's parameters are now updated as well. Done!
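The fine-tuning check looks the same (again a sketch with a made-up toy model, not the author's code):

```python
import torch
from torch import nn

torch.manual_seed(0)

class ToyModel(nn.Module):
    """Hypothetical toy encoder-decoder model."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4)
        self.decoder = nn.Linear(4, 4)

model = ToyModel()
# Fine-tuning: every parameter is trainable again
for param in model.parameters():
    param.requires_grad = True

opt = torch.optim.SGD(model.parameters(), lr=0.1)
before = {n: p.detach().clone() for n, p in model.named_parameters()}
model.decoder(model.encoder(torch.randn(2, 4))).sum().backward()
opt.step()

# Now the decoder's parameters move as well
dec_changed = not torch.equal(before["decoder.weight"], model.decoder.weight)
```
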