PyTorch parameter management

overview

Our goal is to find the model parameter values that minimize the loss function. After training, we will need to use these parameters to make future predictions. Also, sometimes we wish to extract parameters in order to reuse them in other environments, save the model so it can be executed in other software, or examine it for scientific understanding.

# Create an MLP with a single hidden layer
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
net(X)

parameter access

# Parameter access: a fully connected layer has two parameters, its weight and
# its bias, both stored as single-precision floats
print(net[2].state_dict())


print(type(net[2].bias))
print(net[2].bias)
print(net[2].bias.data)
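As a small follow-up sketch (not part of the original listing), the snippet below checks that the bias is an nn.Parameter and that its gradient stays None until a backward pass has run. The network is rebuilt so the block runs on its own; the names is_parameter, grad_before and grad_after are illustrative only.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))

bias = net[2].bias
is_parameter = isinstance(bias, nn.Parameter)  # Parameter is a Tensor subclass
grad_before = bias.grad                        # no backward pass yet

# Sum the two outputs and backpropagate; each sample contributes 1 to the
# gradient of the output-layer bias.
net(X).sum().backward()
grad_after = bias.grad

print(is_parameter, grad_before, grad_after)
```

The .data attribute returns the underlying values as a plain tensor, while .grad holds the gradient accumulated by the most recent backward passes.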


# Access all parameters at once
print(*[(name, param.shape) for name, param in net[0].named_parameters()])
print(*[(name, param.shape) for name, param in net.named_parameters()])
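The names returned by named_parameters() are also the keys of the network's state_dict(), which gives another way to reach a parameter. The following is a minimal sketch under that assumption; keys and same are hypothetical variable names introduced for illustration.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Keys are "<index in Sequential>.<parameter name>"; ReLU has no parameters,
# so index 1 does not appear.
keys = list(net.state_dict().keys())
print(keys)

# Look up the output-layer bias by key and compare it with attribute access.
same = torch.equal(net.state_dict()['2.bias'], net[2].bias.data)
print(same)
```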


Collecting parameters from nested blocks


def block1():
    return nn.Sequential(nn.Linear(4,8),nn.ReLU(),
                         nn.Linear(8,4),nn.ReLU())

def block2():
    net = nn.Sequential()
    for i in range(4):
        net.add_module(f'block{i}', block1())

    return net

# Combine blocks and layers
rgnet = nn.Sequential(block2(),nn.Linear(4,1))
rgnet(X)


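Because the layers are nested hierarchically, we can index into them like nested lists. For example, access the bias of the first layer of the second sub-block in the first main block.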
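The original code for this step was an image that did not survive, so the following is a reconstruction sketch rather than the author's exact listing. It rebuilds rgnet from the definitions above and indexes first into the main block, then the second sub-block, then its first layer.

```python
import torch
from torch import nn

def block1():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                         nn.Linear(8, 4), nn.ReLU())

def block2():
    net = nn.Sequential()
    for i in range(4):
        net.add_module(f'block{i}', block1())
    return net

rgnet = nn.Sequential(block2(), nn.Linear(4, 1))

# rgnet[0] -> main block, [1] -> second sub-block, [0] -> its first Linear
bias = rgnet[0][1][0].bias.data
print(bias.shape)

# The same tensor is reachable through the dotted state_dict key.
same = torch.equal(bias, rgnet.state_dict()['0.block1.0.bias'])
print(same)
```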

parameter initialization

By default, PyTorch initializes weight and bias matrices by sampling uniformly from a range that is computed from the layer's input and output dimensions. The torch.nn.init module provides a variety of preset initialization methods.
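To make the default range concrete, here is a hedged check for nn.Linear. Its documentation describes both weight and bias as drawn from U(-sqrt(k), sqrt(k)) with k = 1/in_features, so for this layer type the bound depends on the input dimension; other layer types may differ. The small tolerance and the variable names are assumptions made for this sketch.

```python
import math
import torch
from torch import nn

layer = nn.Linear(4, 8)

# Documented bound for nn.Linear: 1 / sqrt(in_features) = 0.5 here
bound = 1 / math.sqrt(layer.in_features)
tol = 1e-6  # allow for float32 rounding

w_max = layer.weight.data.abs().max().item()
b_max = layer.bias.data.abs().max().item()
print(bound, w_max, b_max)
```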

built-in initialization

The following code initializes all weight parameters as Gaussian random variables with mean 0 and standard deviation 0.01, and sets the bias parameters to 0.

def init_normal(m):
    # Only touch fully connected layers
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, mean=0, std=0.01)
        nn.init.zeros_(m.bias)

# apply() calls init_normal on every submodule of net
net.apply(init_normal)
net[0].weight.data[0], net[0].bias.data[0]

We can also initialize all weights to a given constant, for example 1, while setting the biases to 0.


def init_constant(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 1)
        nn.init.zeros_(m.bias)

net.apply(init_constant)
net[0].weight.data[0], net[0].bias.data[0]

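We can also apply different initializers to different blocks. Here net[0] uses Xavier uniform initialization, while net[2] is set to the constant 42.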

def init_xavier(m):
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)

def init_42(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 42)

net[0].apply(init_xavier)
net[2].apply(init_42)
print(net[0].weight.data[0])
print(net[2].weight.data)
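One way to sanity-check the result is sketched below. With the default gain of 1, xavier_uniform_ samples from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)), which is about 0.707 for a 4-to-8 layer. The network is rebuilt so the block is self-contained, and the tolerance is an assumption for float32 rounding.

```python
import math
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

nn.init.xavier_uniform_(net[0].weight)   # first layer: Xavier uniform
nn.init.constant_(net[2].weight, 42)     # output layer: constant 42

fan_in, fan_out = net[0].in_features, net[0].out_features
bound = math.sqrt(6 / (fan_in + fan_out))
w_max = net[0].weight.data.abs().max().item()
all_42 = bool((net[2].weight.data == 42).all())
print(bound, w_max, all_42)
```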

custom initialization

def my_init(m):
    if type(m) == nn.Linear:
        print("Init", *[(name, param.shape)
                        for name, param in m.named_parameters()][0])
        nn.init.uniform_(m.weight, -10, 10)
        m.weight.data *= m.weight.data.abs() >= 5

net.apply(my_init)
net[0].weight[:2]
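The custom initializer above draws each weight from U(-10, 10) and then zeroes every entry whose absolute value is below 5, so about half of the weights end up exactly 0 and the rest lie in [5, 10) or (-10, -5]. The sketch below checks this empirically on a larger layer; the 100-by-100 size, the seed, and the variable names are choices made for this illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(100, 100)  # large enough for a stable estimate

with torch.no_grad():
    nn.init.uniform_(layer.weight, -10, 10)
    # Keep a weight only if its magnitude is at least 5; the boolean mask
    # acts as 0/1 in the multiplication.
    layer.weight *= layer.weight.abs() >= 5

w = layer.weight.data
zero_fraction = (w == 0).float().mean().item()
min_nonzero = w[w != 0].abs().min().item()
print(zero_fraction, min_nonzero)
```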

parameter sharing

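Sometimes we want several layers to share parameters. Here the second and third fully connected layers (net[2] and net[4]) are the same module, so they share one weight and one bias.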

shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.Linear(8, 1))

net(X)

print(net[2].weight.data[0] == net[4].weight.data[0])
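Equal values alone do not prove sharing, since two separate layers could simply hold the same numbers. The sketch below checks object identity, then changes one entry through net[2] and confirms that net[4] sees the change. The variable names are illustrative.

```python
import torch
from torch import nn

shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.Linear(8, 1))

# Both positions hold the very same Parameter object.
same_object = net[2].weight is net[4].weight

# Modify the weight through net[2]; net[4] reflects it immediately.
net[2].weight.data[0, 0] = 100
still_equal = bool((net[2].weight.data[0] == net[4].weight.data[0]).all())

# parameters() yields each shared tensor once: three distinct Linear layers,
# each with a weight and a bias.
n_params = len(list(net.parameters()))
print(same_object, still_equal, n_params)
```

Because the shared layer is used twice in the forward pass, its gradient during backpropagation is the sum of the contributions from both positions.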
