Optimizer

Example analysis of `optimizer.param_groups` usage

Date: July 25, 2022

PyTorch version: 1.11.0

Exploring `param_groups`

`optimizer.param_groups` is a list whose elements are dictionaries.

`optimizer.param_groups[0]` is a dictionary with 7 keys: ['params', 'lr', 'betas', 'eps', 'weight_decay', 'amsgrad', 'maximize'].

The examples below use an Adam optimizer stored in the variable `optimizer`:

>>> optimizer.param_groups[0].keys()
dict_keys(['params', 'lr', 'betas', 'eps', 'weight_decay', 'amsgrad', 'maximize'])

You can assign different learning rates to different sets of trainable parameters. In that case the list contains more than one element, that is, multiple dictionaries, one per parameter group.
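As a rough illustration of the point above, the sketch below builds an optimizer with two parameter groups and checks that `param_groups` then holds two dictionaries. The tensors `w_a` and `w_b` and the variable `opt` are hypothetical names introduced only for this example.

```python
import torch
import torch.optim as optim

# Two hypothetical weight tensors standing in for two parts of a model.
w_a = torch.randn(3, 3, requires_grad=True)
w_b = torch.randn(3, 3, requires_grad=True)

# Each dict passed to the constructor becomes one entry in param_groups.
# The second group overrides lr; the first falls back to the default lr=1e-3.
opt = optim.Adam(
    [{'params': [w_a]},
     {'params': [w_b], 'lr': 1e-2}],
    lr=1e-3,
)

print(len(opt.param_groups))                # 2
print([g['lr'] for g in opt.param_groups])  # [0.001, 0.01]
```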

`params` is a list […] that stores the parameters:

>>> len(optimizer.param_groups[0]['params'])
48
>>> optimizer.param_groups[0]['params'][0]
Parameter containing:
tensor([[ 0.0212, -0.1151,  0.0499,  ..., -0.0807, -0.0572,  0.1166],
        [-0.0356, -0.0397, -0.0980,  ...,  0.0690, -0.1066, -0.0583],
        [ 0.0238,  0.0316, -0.0636,  ...,  0.0754, -0.0891,  0.0258],
        ...,
        [ 0.0603, -0.0173,  0.0627,  ...,  0.0152, -0.0215, -0.0730],
        [-0.1183, -0.0636,  0.0381,  ...,  0.0745, -0.0427, -0.0713],

`lr` is the learning rate:

>>> optimizer.param_groups[0]['lr']
0.0005

`betas` is a tuple (...) holding the coefficients of the running averages of the gradient and of its square, so it is related to momentum:

>>> optimizer.param_groups[0]['betas']
(0.9, 0.999)

`eps` is a small constant added to the denominator for numerical stability:

>>> optimizer.param_groups[0]['eps']
1e-08

`weight_decay` is a number (the default is the int 0):

>>> optimizer.param_groups[0]['weight_decay']
0

`amsgrad` is a bool:

>>> optimizer.param_groups[0]['amsgrad']
False

`maximize` is a bool:

>>> optimizer.param_groups[0]['maximize']
False
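The per-key lookups above can also be collected in a single loop. This is a minimal sketch, assuming one hypothetical tensor `w` in place of a real model; the exact set of keys depends on the PyTorch version, so no key list is shown as expected output.

```python
import torch
import torch.optim as optim

w = torch.randn(2, 2, requires_grad=True)  # hypothetical stand-in parameter
optimizer = optim.Adam([w], lr=5e-4)

for i, group in enumerate(optimizer.param_groups):
    # Everything except 'params' is a hyperparameter of this group.
    hyper = {k: v for k, v in group.items() if k != 'params'}
    print(f"group {i}: {len(group['params'])} parameter tensor(s)")
    for name, value in hyper.items():
        print(f"  {name} = {value!r}")
```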

Continue experimenting with examples from the Internet:

import torch
import torch.optim as optim


w1 = torch.randn(3, 3)
w1.requires_grad = True
w2 = torch.randn(3, 3)
w2.requires_grad = True
o = optim.Adam([w1])
print(o.param_groups)

# Output
[{'params': [tensor([[-0.1002,  0.3526, -1.2212],
                     [-0.4659,  0.0498, -0.2905],
                     [ 1.1862, -0.6085,  0.4965]], requires_grad=True)],
  'lr': 0.001,
  'betas': (0.9, 0.999),
  'eps': 1e-08,
  'weight_decay': 0,
  'amsgrad': False,
  'maximize': False}]

The following covers one of the main methods of the `Optimizer` class, `add_param_group`.

According to the docs, `add_param_group` accepts a `param_group` argument that is a dict. Example of use:

import torch
import torch.optim as optim


w1 = torch.randn(3, 3)
w1.requires_grad = True
w2 = torch.randn(3, 3)
w2.requires_grad = True
o = optim.Adam([w1])
print(o.param_groups)

# Output
[{'params': [tensor([[-1.5916, -1.6110, -0.5739],
        [ 0.0589, -0.5848, -0.9199],
        [-0.4206, -2.3198, -0.2062]], requires_grad=True)], 'lr': 0.001, 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False, 'maximize': False}]


o.add_param_group({'params': w2})
print(o.param_groups)

# Output
[{'params': [tensor([[-1.5916, -1.6110, -0.5739],
        [ 0.0589, -0.5848, -0.9199],
        [-0.4206, -2.3198, -0.2062]], requires_grad=True)], 'lr': 0.001, 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False, 'maximize': False},
 {'params': [tensor([[-0.5546, -1.2646,  1.6420],
        [ 0.0730, -0.0460, -0.0865],
        [ 0.3043,  0.4203, -0.3607]], requires_grad=True)], 'lr': 0.001, 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False, 'maximize': False}]
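In the output above, the new group inherited every option from the optimizer defaults. The sketch below, which reuses the same `w1`, `w2`, and `o` names, suggests how a group can override one option (here `lr`) while the rest are still filled in from the defaults. It also shows that a parameter already held by one group is rejected with a `ValueError` if added again.

```python
import torch
import torch.optim as optim

w1 = torch.randn(3, 3, requires_grad=True)
w2 = torch.randn(3, 3, requires_grad=True)
o = optim.Adam([w1])  # default lr=0.001

# Override lr for the new group; missing options are copied from o.defaults.
o.add_param_group({'params': w2, 'lr': 0.01})
print([g['lr'] for g in o.param_groups])  # [0.001, 0.01]
print(o.param_groups[1]['betas'])         # (0.9, 0.999)

# A parameter may belong to only one group, so adding w1 again fails.
try:
    o.add_param_group({'params': w1})
    rejected = False
except ValueError as err:
    rejected = True
    print('rejected:', err)
```

This pattern is often used to start optimizing previously frozen layers partway through training, although that use case is an assumption about how a reader might apply it.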

How to dynamically modify the learning rate when writing code (routine operation)

for param_group in optimizer.param_groups:
    param_group["lr"] = lr
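The loop above can be wrapped in a small helper and driven by a schedule. The following is a minimal sketch: `set_lr`, the toy loss, and the "halve every 2 epochs" schedule are all hypothetical choices made for illustration, not part of the original post.

```python
import torch
import torch.optim as optim


def set_lr(optimizer, lr):
    """Write the same learning rate into every parameter group."""
    for param_group in optimizer.param_groups:
        param_group["lr"] = lr


w = torch.randn(3, 3, requires_grad=True)  # hypothetical stand-in parameter
base_lr = 0.1
optimizer = optim.SGD([w], lr=base_lr)

history = []
for epoch in range(6):
    # Hypothetical schedule: halve the learning rate every 2 epochs.
    set_lr(optimizer, base_lr * (0.5 ** (epoch // 2)))
    history.append(optimizer.param_groups[0]["lr"])

    loss = (w ** 2).sum()  # toy loss, only to make step() meaningful
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(history)  # [0.1, 0.1, 0.05, 0.05, 0.025, 0.025]
```

PyTorch also ships schedulers in `torch.optim.lr_scheduler` (for example `StepLR`) that manage the learning rate in `param_groups` for you; the manual version is mainly useful when the schedule is ad hoc.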

Supplement: Summary of optimizers in PyTorch

Take the SGD optimizer as an example:

from torch import nn as nn
import torch as t
from torch.autograd import Variable as V  # deprecated since PyTorch 0.4; a plain tensor behaves the same
from torch import optim  # optimizers

# Define a LeNet network
class LeNet(t.nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.features = t.nn.Sequential(
            t.nn.Conv2d(3, 6, 5),
            t.nn.ReLU(),
            t.nn.MaxPool2d(2, 2),
            t.nn.Conv2d(6, 16, 5),
            t.nn.ReLU(),
            t.nn.MaxPool2d(2, 2)
        )
        # Reshaping is not an nn.Module layer,
        # so when such a (non-nn.Module) operation is needed the model is split into several parts
        self.classifiter = t.nn.Sequential(
            t.nn.Linear(16*5*5, 120),
            t.nn.ReLU(),
            t.nn.Linear(120, 84),
            t.nn.ReLU(),
            t.nn.Linear(84, 10)
        )
        
    def forward(self, x):
        x = self.features(x)
        x = x.view(-1, 16*5*5)
        x = self.classifiter(x)
        return x

net = LeNet()

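# A typical optimization step
optimizer = optim.SGD(params=net.parameters(), lr=1)
optimizer.zero_grad()  # clear the gradients; equivalent to net.zero_grad() here

input = V(t.randn(1, 3, 32, 32))
output = net(input)
output.backward(output)  # dummy gradient (the output itself), used only to populate .grad
optimizer.step()  # apply the update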

Setting different learning rates for different sub-networks is common in fine-tuning: giving the classifier a higher learning rate lets it learn faster (in theory).

1. Set the learning rate per sub-module, using the modules defined when building the network:

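# Set different learning rates for different sub-networks; common in fine-tuning.
# If a group does not specify a learning rate, the default learning rate is used.
optimizer = optim.SGD(
    [{'params': net.features.parameters()},  # learning rate 1e-5 (the default)
     {'params': net.classifiter.parameters(), 'lr': 1e-2}],
    lr=1e-5
)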
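To check that each sub-network got the intended learning rate, the groups can be inspected after construction. The sketch below uses a hypothetical `TinyNet` (much smaller than the LeNet above, with a correctly spelled `classifier` attribute) so that it runs on its own.

```python
import torch.nn as nn
import torch.optim as optim


class TinyNet(nn.Module):  # hypothetical stand-in for the LeNet above
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 6, 5), nn.ReLU())
        self.classifier = nn.Sequential(nn.Linear(6, 10))


net = TinyNet()
optimizer = optim.SGD(
    [{'params': net.features.parameters()},                # uses the default lr
     {'params': net.classifier.parameters(), 'lr': 1e-2}],
    lr=1e-5,
)

# (learning rate, number of parameter tensors) for each group;
# each group here holds one weight and one bias.
summary = [(g['lr'], len(g['params'])) for g in optimizer.param_groups]
print(summary)  # [(1e-05, 2), (0.01, 2)]
```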

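2. Group parameters by layer object and set the learning rate per group:

# Use a larger learning rate only for two fully connected layers; the other layers use a smaller one.
# Learning rates are assigned layer by layer.

# Collect the chosen layer objects.
# classifiter[0] and classifiter[2] are Linear layers (classifiter[3] is a ReLU and has no parameters).
special_layers = nn.ModuleList([net.classifiter[0], net.classifiter[2]])
# ids of the parameters in the chosen layers
special_layers_params = list(map(id, special_layers.parameters()))
# parameters that do not belong to the chosen layers
base_params = filter(lambda p: id(p) not in special_layers_params, net.parameters())

optimizer = t.optim.SGD([
    {'params': base_params},
    {'params': special_layers.parameters(), 'lr': 0.01}], lr=0.001)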
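When splitting parameters by `id`, it is easy to drop or duplicate a parameter, so a quick count is a useful sanity check. The sketch below applies the same id-based split to a hypothetical three-layer `nn.Sequential` and builds the groups as lists rather than generators, since a `filter` object can only be consumed once.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical small network used only for this check.
net = nn.Sequential(
    nn.Linear(8, 16),  # index 0
    nn.ReLU(),         # index 1 (no parameters)
    nn.Linear(16, 4),  # index 2
)

# Parameters of the layer that should get the larger learning rate.
special = list(net[2].parameters())
special_ids = {id(p) for p in special}
# Every other parameter goes into the base group.
base = [p for p in net.parameters() if id(p) not in special_ids]

optimizer = optim.SGD(
    [{'params': base},
     {'params': special, 'lr': 0.01}],
    lr=0.001,
)

# Every parameter tensor should land in exactly one group.
n_total = len(list(net.parameters()))
n_grouped = sum(len(g['params']) for g in optimizer.param_groups)
print(n_total, n_grouped)  # 4 4
```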

Reference:
https://blog.csdn.net/weixin_43593330/article/details/108490956
https://www.cnblogs.com/hellcat/p/8496727.html
https://www.yisu.com/zixun/456082.html
