PyTorch finetuning: freezing layers + setting learning-rate hyperparameters


Original article: https://blog.csdn.net/jdzwanghao/article/details/83239111

1. Freezing layers so they do not participate in training:

######### Model definition #########
import torch.nn as nn
import torch.optim as optim
from torchvision import models

class MyModel(nn.Module):
    def __init__(self, feat_dim):   # feat_dim: dim of the backbone's output feature map
        super(MyModel, self).__init__()
        BackBone = models.resnet50(pretrained=True)
        BackBone.fc = nn.Identity()   # replace the 1000-way fc so the backbone outputs the 2048-d feature add_block expects

        add_block = []
        add_block += [nn.Linear(2048, 512)]
        add_block += [nn.LeakyReLU(inplace=True)]
        add_block = nn.Sequential(*add_block)
        add_block.apply(weights_init_xavier)   # user-defined Xavier init helper (see the sketch below)

        self.BackBone = BackBone
        self.add_block = add_block

    def forward(self, input):   # the backbone feature fed into add_block is 2048-d
        x = self.BackBone(input)
        x = self.add_block(x)
        return x
##############################

# Build the model
model = MyModel(feat_dim=2048)   # note: feat_dim is not actually used in this snippet

# Optimizer, weight decay, and layer freezing
for param in model.parameters():
    param.requires_grad = False            # freeze everything first
for param in model.add_block.parameters():
    param.requires_grad = True             # then unfreeze only the newly added layers

optimizer = optim.SGD(
    filter(lambda p: p.requires_grad, model.parameters()),   # remember the filter(), otherwise it raises an error
    lr=0.01,
    weight_decay=1e-5, momentum=0.9, nesterov=True)

# Alternatively, split the parameters into two groups
# (this is what the per-layer learning rates in section 2 build on):
ignored_params = list(map(id, model.add_block.parameters()))
base_params = filter(lambda p: id(p) not in ignored_params, model.parameters())
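The post never shows weights_init_xavier. A minimal sketch of such an init helper, plus a quick check that only add_block stays trainable, might look like the following; the function name matches the call above, but the exact initialization used by the original author is an assumption:

import torch.nn as nn

def weights_init_xavier(m):
    # Assumption: apply Xavier init to Linear layers, zero the biases.
    if isinstance(m, nn.Linear):
        nn.init.xavier_normal_(m.weight)
        if m.bias is not None:
            nn.init.constant_(m.bias, 0.0)

# Sanity check: only add_block parameters should require gradients.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)   # expected: only names starting with 'add_block.'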

2. Using a different learning rate for each layer

 
######### Model definition #########
# (same MyModel as in section 1)
##############################

# Build the model
model = MyModel(feat_dim=2048)

# Set a different learning rate for each layer group
ignored_params = list(map(id, model.add_block.parameters()))
base_params = filter(lambda p: id(p) not in ignored_params, model.parameters())

optimizer = optim.SGD(
    [
        {'params': base_params, 'lr': 0.01},                   # pretrained backbone: small lr
        {'params': model.add_block.parameters(), 'lr': 0.1},   # newly added layers: larger lr
    ],
    weight_decay=1e-5, momentum=0.9, nesterov=True)
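As a quick check, you can iterate over optimizer.param_groups and print each group's settings; with the configuration above the two groups should report learning rates of 0.01 and 0.1 respectively:

for i, g in enumerate(optimizer.param_groups):
    # each group carries its own 'lr' plus the shared SGD options
    print(i, g['lr'], g['weight_decay'], g['momentum'])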

3. Adjusting learning-rate decay

Method 1: use the torch.optim.lr_scheduler module:

 
####################
# model structure
#-------------------
model = Mymodel()
if use_gpu:
    model = model.cuda()

####################
# loss
#-------------------
criterion = nn.CrossEntropyLoss()

####################
# optimizer
#-------------------
ignored_params = list(map(id, model.ViewModel.viewclassifier.parameters())) + list(map(id, model.Block.parameters()))
base_params = filter(lambda p: id(p) not in ignored_params, model.parameters())
optimizer_ft = optim.SGD([
    {'params': base_params, 'lr': 0.01},
    {'params': model.ViewModel.viewclassifier.parameters(), 'lr': 0.001},
    {'params': model.Block.parameters(), 'lr': 0.01}
], weight_decay=1e-3, momentum=0.9, nesterov=True)

####################
# set lr decay
#-------------------
exp_lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer_ft, step_size=60, gamma=0.1)

exp_lr_scheduler.step()    # the original post calls this before model.train(True);
                           # on PyTorch >= 1.1 call it after optimizer.step(), once per epoch
model.train(True)          # set model to training mode

....
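A minimal sketch of how this scheduler fits into the training loop; num_epochs and train_loader are placeholder names not defined in the post, and the scheduler is stepped once per epoch after the optimizer updates, as recommended for PyTorch >= 1.1:

for epoch in range(num_epochs):                 # num_epochs: placeholder
    model.train(True)
    for inputs, labels in train_loader:         # train_loader: placeholder DataLoader
        optimizer_ft.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer_ft.step()
    exp_lr_scheduler.step()                     # multiply every group's lr by gamma=0.1 each 60 epochs
    print(epoch, [g['lr'] for g in optimizer_ft.param_groups])   # inspect the current rates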

Method 2: modify optimizer.param_groups directly. (Advantage: the decay can be set separately for different layers!)

 
####################
# model structure
#-------------------
model = Mymodel()
if use_gpu:
    model = model.cuda()

####################
# loss
#-------------------
criterion = nn.CrossEntropyLoss()

####################
# optimizer
#-------------------
ignored_params = list(map(id, model.ViewModel.viewclassifier.parameters())) + list(map(id, model.Block.parameters()))
base_params = filter(lambda p: id(p) not in ignored_params, model.parameters())
optimizer_ft = optim.SGD([
    {'params': base_params, 'lr': 0.01},
    {'params': model.ViewModel.viewclassifier.parameters(), 'lr': 0.001},
    {'params': model.Block.parameters(), 'lr': 0.03}],
    weight_decay=1e-3, momentum=0.9, nesterov=True)

####################
# set lr decay
#-------------------
# Record each group's starting lr once, so the decay is always computed from the
# initial value instead of compounding on the already-decayed current value.
for g in optimizer_ft.param_groups:
    g.setdefault('initial_lr', g['lr'])

def adjust_lr(epoch):
    step_size = 60
    decay = 0.1 ** (epoch // step_size)
    for g in optimizer_ft.param_groups:
        g['lr'] = g['initial_lr'] * decay

######################################
### optimizer_ft.param_groups: type and contents
# [
#   {'params': base_params, 'lr': 0.01, 'momentum': 0.9, 'dampening': 0,
#    'weight_decay': 0.001, 'nesterov': True, 'initial_lr': 0.01},
#   {'params': model.ViewModel.viewclassifier.parameters(), 'lr': 0.001,
#    'momentum': 0.9, 'dampening': 0, 'weight_decay': 0.001, 'nesterov': True,
#    'initial_lr': 0.001},
#   {'params': model.Block.parameters(), 'lr': 0.03, 'momentum': 0.9,
#    'dampening': 0, 'weight_decay': 0.001, 'nesterov': True, 'initial_lr': 0.03}
# ]
######################################

for epoch in range(start_epoch, args.epochs):
    adjust_lr(epoch)        # update the learning rates once per epoch
    model.train(True)       # set model to training mode
    ....
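A quick check of the per-group behavior (the epoch values below are arbitrary examples): every group keeps its own base rate, and all of them decay by 10x every 60 epochs.

for epoch in (0, 60, 120):
    adjust_lr(epoch)
    print(epoch, [round(g['lr'], 6) for g in optimizer_ft.param_groups])
# epoch 0   -> [0.01, 0.001, 0.03]
# epoch 60  -> [0.001, 0.0001, 0.003]
# epoch 120 -> [0.0001, 1e-05, 0.0003]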

Background note: Python's dict method .get()
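For reference, dict.get(key, default) returns the value stored under key if it exists, and otherwise returns default (or None when no default is given), which makes it convenient when a param group may or may not contain a particular field:

group = {'lr': 0.01, 'momentum': 0.9}
print(group.get('lr'))               # 0.01  -> key exists, its value is returned
print(group.get('nesterov'))         # None  -> key missing, no default supplied
print(group.get('nesterov', False))  # False -> key missing, default returned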
