PyTorch | Model Loading / Parameter Initialization / Finetuning

First, model saving and loading

1.1 Saving and loading all model parameters

Training sometimes stops for various reasons, so during training we save the model every epoch (usually both the best model so far and the model from the current epoch). The method recommended by PyTorch is shown below; it saves only the model's parameters (its state_dict).

# Save the model's parameters to checkpoint.pth.tar
# (model.module is used when the model is wrapped in nn.DataParallel; otherwise use model.state_dict())
torch.save(model.module.state_dict(), 'checkpoint.pth.tar')

The corresponding way to load it (load_state_dict deserializes the saved parameter dictionary, so you must first build the model and then call load_state_dict on it):

mymodel.load_state_dict(torch.load('checkpoint.pth.tar'))
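For completeness, a minimal end-to-end loading sketch (MyModel is a hypothetical class standing in for whatever architecture produced the checkpoint; map_location is optional but handy when a GPU-trained checkpoint is loaded on a CPU-only machine):

import torch

mymodel = MyModel()   # hypothetical model class; must match the saved architecture
state_dict = torch.load('checkpoint.pth.tar', map_location='cpu')
mymodel.load_state_dict(state_dict)
mymodel.eval()        # switch to eval mode for inference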

Building on the save format above, here is an example of saving everything needed for inference and/or resuming training.

# Save the model's state plus anything else we may need later
state = {'epoch': epoch + 1,                   # current epoch number
         'state_dict': mymodel.state_dict(),   # trained parameters
         'optimizer': optimizer.state_dict(),  # optimizer state, needed to resume
         'best_pred': best_pred}               # best accuracy so far

# save the checkpoint to checkpoint.pth.tar
torch.save(state, 'checkpoint.pth.tar')
# if this is the best model so far, copy it over
if is_best:
    shutil.copyfile(filename, directory + 'model_best.pth.tar')

# to resume, load the checkpoint and restore each piece
checkpoint = torch.load('model_best.pth.tar')
model.load_state_dict(checkpoint['state_dict'])     # model parameters
optimizer.load_state_dict(checkpoint['optimizer'])  # optimizer state
epoch = checkpoint['epoch']                         # epoch, can be used to update the learning rate, etc.

# With these pieces you can continue training where you left off,
# without worrying about the run being interrupted.
# train / eval
# ....
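As a rough sketch of how the restored epoch feeds back into training (train_one_epoch, validate and total_epochs are placeholders for your own code, not part of the original post):

start_epoch = checkpoint['epoch']
for epoch in range(start_epoch, total_epochs):
    train_one_epoch(model, optimizer, epoch)   # placeholder training step
    validate(model)                            # placeholder evaluation step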

1.2 Loading only part of the model parameters

In many cases we want to load an already trained model, but that model may not be exactly the same as the model we define; we only want to reuse the parameters of the layers that are the same.

There are several solutions:

(1) Build your own model directly on top of the trained model: load the trained model first, then define your own model using its layers (a fuller sketch follows the snippet below);

model_ft = models.resnet18(pretrained=use_pretrained)
# inside your own module, reuse the pretrained layers directly
self.conv1 = model_ft.conv1
self.bn1 = model_ft.bn1
...   # and so on for the other layers you want to reuse
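A more complete sketch of this approach (the class name MyNet and the particular layers reused are just for illustration, not from the original post):

import torch.nn as nn
import torchvision.models as models

class MyNet(nn.Module):
    def __init__(self, num_classes=10):
        super(MyNet, self).__init__()
        model_ft = models.resnet18(pretrained=True)
        # reuse the pretrained stem and first stage
        self.conv1 = model_ft.conv1
        self.bn1 = model_ft.bn1
        self.relu = model_ft.relu
        self.maxpool = model_ft.maxpool
        self.layer1 = model_ft.layer1
        # new, randomly initialized head
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(64, num_classes)   # layer1 of resnet18 outputs 64 channels

    def forward(self, x):
        x = self.maxpool(self.relu(self.bn1(self.conv1(x))))
        x = self.layer1(x)
        x = self.pool(x).flatten(1)
        return self.fc(x)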


(2) Define your own model first, then load the pretrained parameters into it directly:

# The first method: strict=False ignores keys that do not match,
# so only the parameters whose keys are the same get loaded.
# (model_zoo is torch.utils.model_zoo; model_urls comes from torchvision.models.resnet)
mymodelB = TheModelBClass(*args, **kwargs)
mymodelB.load_state_dict(model_zoo.load_url(model_urls['resnet18']), strict=False)

# The second method:
# load the pretrained model
model_pretrained = models.resnet18(pretrained=use_pretrained)

# mymodelB's state_dict has keys such as:
#     conv1.weight
#     conv1.bias
mymodelB_dict = mymodelB.state_dict()

# compare with our own model and keep only the pretrained entries whose keys we also have
pretrained_dict = {k: v for k, v in model_pretrained.state_dict().items() if k in mymodelB_dict}
# update the existing state_dict with those entries
mymodelB_dict.update(pretrained_dict)

# load the state_dict we actually need
mymodelB.load_state_dict(mymodelB_dict)

# The second method may be more intuitive
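In practice it can be safer to also require matching shapes before updating; a small refinement of the second method (a sketch, not from the original post):

pretrained_dict = {k: v for k, v in model_pretrained.state_dict().items()
                   if k in mymodelB_dict and v.shape == mymodelB_dict[k].shape}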

Second, parameter initialization

The second problem is parameter initialization, which comes up in a lot of code: after all, not every parameter comes from pretraining, and the parameters that are not pretrained need to be initialized. In PyTorch every parameter tensor exposes .data and .grad, and these can be assigned to directly. This is also how parameters trained in other frameworks (Caffe / TensorFlow / MXNet / GluonCV, etc.) can be transferred into a PyTorch model: simply assign the values to .data.

An example of an initialization function written in PyTorch:

import math
import torch.nn as nn

def weight_init(m):
    if isinstance(m, nn.Conv2d):
        # He/Kaiming-style normal initialization for conv layers
        n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
        m.weight.data.normal_(0, math.sqrt(2. / n))
    elif isinstance(m, nn.BatchNorm2d):
        m.weight.data.fill_(1)
        m.bias.data.zero_()
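A usage sketch (hedged): model stands for any nn.Module you have built; model.apply walks every sub-module and calls weight_init on it. Direct assignment to .data is also how weights converted from another framework would be copied in (np_weights and the .npy file are hypothetical and must match the layer's shape):

import numpy as np
import torch

model.apply(weight_init)   # recursively initialize every sub-module

# copy weights converted from another framework into a specific layer
np_weights = np.load('conv1_weight.npy')   # hypothetical file holding a NumPy array
model.conv1.weight.data.copy_(torch.from_numpy(np_weights))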

In general, though, if you have no special requirements for initialization (and are unsure whether a custom scheme would affect performance), the defaults are fine: PyTorch initializes the parameters of its built-in layers by default.

Third, model fine-tuning / Finetune

Finally, fine-tuning. In most experiments we at least start from a pretrained backbone, either using it as a fixed feature extractor or fine-tuning on top of it.

When the backbone is used purely for feature extraction, its parameters do not need to be learned. PyTorch provides the requires_grad attribute on parameters to control whether gradients are computed, i.e. whether the parameters get updated. In the following MNIST example, resnet18 is used as the feature extractor:

# load the pretrained model
model = torchvision.models.resnet18(pretrained=True)

# walk over every parameter and freeze it: no gradient, no update
for param in model.parameters():
    param.requires_grad = False

# replace the fully connected layer with a 10-class head for MNIST
# note: parameters of a newly constructed layer have requires_grad=True by default
model.fc = nn.Linear(512, 10)

# optimize only the new head
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)
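If more than one layer stays trainable, a common alternative (a sketch, not from the original post) is to hand the optimizer only the parameters that still require gradients:

optimizer = optim.SGD(filter(lambda p: p.requires_grad, model.parameters()),
                      lr=1e-2, momentum=0.9)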

When fine-tuning the whole network, we usually want different learning rates for different layers: a smaller learning rate for the pretrained layers and a larger one for the new layers. How is this done?

# load the pretrained model and replace the head
model = torchvision.models.resnet18(pretrained=True)
model.fc = nn.Linear(512, 10)

# reference: https://blog.csdn.net/u012759136/article/details/65634477
ignored_params = list(map(id, model.fc.parameters()))
base_params = filter(lambda p: id(p) not in ignored_params, model.parameters())

# different learning rates for different parameter groups
params_list = [{'params': base_params, 'lr': 0.001}]
params_list.append({'params': model.fc.parameters(), 'lr': 0.01})

optimizer = torch.optim.SGD(params_list,
                            lr=0.001,
                            momentum=args.momentum,
                            weight_decay=args.weight_decay)
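A quick way to check that the groups came out as intended (a small hedged sketch; the lr passed to SGD only acts as a default for groups that do not set their own):

for group in optimizer.param_groups:
    print(group['lr'], sum(p.numel() for p in group['params']))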

 

