Code:
optimizer = torch.optim.SGD([
    {'params': paras_wo_bn + [kernel], 'weight_decay': 5e-4},
    {'params': paras_only_bn}
], lr=config['lr'], momentum=config['momentum'], weight_decay=config['weight_decay'])
Error:
Traceback (most recent call last):
File "/home/use1/test_0708.py", line 364, in <module>
main()
File "/home/use1/test_0708.py", line 168, in main
], lr=config['lr'], momentum=config['momentum'], weight_decay=config['weight_decay'])
File "/home/user1/miniconda3/lib/python3.7/site-packages/torch/optim/sgd.py", line 64, in __init__
super(SGD, self).__init__(params, defaults)
File "/home/user1/miniconda3/lib/python3.7/site-packages/torch/optim/optimizer.py", line 50, in __init__
self.add_param_group(param_group)
File "/home/user1/miniconda3/lib/python3.7/site-packages/torch/optim/optimizer.py", line 215, in add_param_group
raise ValueError("some parameters appear in more than one parameter group")
ValueError: some parameters appear in more than one parameter group
This error had me stuck for a long time: I didn't dare move the model onto multiple GPUs and kept training on a single card because of it.
Solution: I reproduced and fixed this problem while debugging another training run.
Move the multi-GPU line model = torch.nn.DataParallel(model).cuda()
so that it runs after the parameters have been collected. (Collecting the parameter groups from the already-wrapped model evidently let the same parameters land in both groups, which is exactly what add_param_group rejects.)
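Before rebuilding the optimizer, it can help to locate the offending parameters yourself. A minimal sketch (the helper find_shared_params and the tiny demo model are my own, not from the original code) compares groups by object identity, which is the same criterion the optimizer uses:

```python
import torch
import torch.nn as nn

def find_shared_params(*groups):
    """Return parameters that appear (by identity) in more than one group."""
    seen = {}
    shared = []
    for gi, group in enumerate(groups):
        for p in group:
            key = id(p)
            if key in seen and seen[key] != gi:
                shared.append(p)
            else:
                seen[key] = gi
    return shared

# Tiny demo: a BatchNorm weight accidentally placed in both groups.
bn = nn.BatchNorm1d(4)
lin = nn.Linear(4, 2)
group_a = [lin.weight, lin.bias, bn.weight]  # bn.weight leaked into the wrong group
group_b = [bn.weight, bn.bias]
dups = find_shared_params(group_a, group_b)
print(len(dups))  # 1 -> torch.optim.SGD would raise the ValueError above
```

If the list comes back non-empty, those are the tensors making SGD raise "some parameters appear in more than one parameter group".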
The correct order:
import torch
import torch.nn as nn

# For ArcFace: collect the parameters first
paras_only_bn, paras_wo_bn = arcface.separate_bn_paras(model)
embedding_size, classnum = 512, 51332
kernel = nn.Parameter(torch.Tensor(embedding_size, classnum))
kernel.data.uniform_(-1, 1).renorm_(2, 1, 1e-5).mul_(1e5)
optimizer = torch.optim.SGD([
    {'params': paras_wo_bn + [kernel], 'weight_decay': 5e-4},
    {'params': paras_only_bn}
], lr=lr, momentum=0.9, weight_decay=1.0e-4)
# Only afterwards wrap for multi-GPU
model = torch.nn.DataParallel(model).cuda()
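With that order, a disjointness assertion before constructing SGD catches the bug early. This is a self-contained sketch: since separate_bn_paras is project-specific, I substitute a type-based split over a stand-in model, and the sizes (8, 4) are arbitrary:

```python
import torch
import torch.nn as nn

# Stand-in model; split BatchNorm parameters from the rest by module type.
model = nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8), nn.Linear(8, 2))

paras_only_bn, paras_wo_bn = [], []
for m in model.modules():
    if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
        paras_only_bn.extend(m.parameters(recurse=False))
    else:
        paras_wo_bn.extend(m.parameters(recurse=False))

# Extra ArcFace-style classification kernel, as in the post.
kernel = nn.Parameter(torch.Tensor(8, 4))
kernel.data.uniform_(-1, 1).renorm_(2, 1, 1e-5).mul_(1e5)

# The groups must be disjoint, or SGD raises the ValueError above.
assert set(map(id, paras_wo_bn + [kernel])).isdisjoint(map(id, paras_only_bn))

optimizer = torch.optim.SGD([
    {'params': paras_wo_bn + [kernel], 'weight_decay': 5e-4},
    {'params': paras_only_bn}
], lr=0.1, momentum=0.9, weight_decay=1e-4)
print(len(optimizer.param_groups))  # 2
```

Only after the optimizer exists would you wrap with torch.nn.DataParallel(model).cuda(); the optimizer keeps valid references to the underlying parameters either way.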