PyTorch parameter management
overview
Our goal is to find the model parameter values that minimize the loss function. After training, we will need to use these parameters to make future predictions. Also, sometimes we wish to extract parameters in order to reuse them in other environments, save the model so it can be executed in other software, or examine it for scientific understanding.
# Create an MLP with a single hidden layer
import torch
from torch import nn
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
net(X)
parameter access
# Parameter access: a fully connected layer has two parameters, its weight and its bias; both are stored as single-precision floats (float32)
print(net[2].state_dict())
print(type(net[2].bias))
print(net[2].bias)
print(net[2].bias.data)
# Access all parameters at once
print(*[(name,param.shape) for name,param in net[0].named_parameters()])
print(*[(name,param.shape) for name,param in net.named_parameters()])
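As a small sketch of the access patterns above (the variable names bias_dtype and bias_by_name are illustrative, not from the original notes), a parameter can also be looked up by its name in the network's state_dict, and its dtype and gradient can be inspected. This assumes the default dtype has not been changed; since no backward pass has run, the gradient is expected to be None.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Parameters are created as float32 unless the default dtype was changed.
bias_dtype = net[2].bias.dtype
print(bias_dtype)

# Look up the same parameter by its name in the state dict.
bias_by_name = net.state_dict()['2.bias']
print(torch.equal(bias_by_name, net[2].bias.data))

# No backward pass has run yet, so the gradient is still None.
print(net[2].weight.grad is None)
```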
collecting parameters from nested blocks
def block1():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                         nn.Linear(8, 4), nn.ReLU())

def block2():
    net = nn.Sequential()
    for i in range(4):
        # Nest four copies of block1 under the names block0 ... block3
        net.add_module(f'block{i}', block1())
    return net

# Combine blocks and layers
rgnet = nn.Sequential(block2(), nn.Linear(4, 1))
rgnet(X)
Access the bias of the first layer in the second sub-block of the first main block
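Because the layers are nested hierarchically, they can be indexed like nested lists. A minimal sketch, rebuilding rgnet so that it runs on its own (the name first_bias is only for illustration):

```python
import torch
from torch import nn

def block1():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                         nn.Linear(8, 4), nn.ReLU())

def block2():
    net = nn.Sequential()
    for i in range(4):
        net.add_module(f'block{i}', block1())
    return net

rgnet = nn.Sequential(block2(), nn.Linear(4, 1))

# rgnet[0]       -> the first main block (the output of block2)
# rgnet[0][1]    -> its second sub-block (registered as 'block1')
# rgnet[0][1][0] -> the first layer of that sub-block, nn.Linear(4, 8)
first_bias = rgnet[0][1][0].bias.data
print(first_bias.shape)
```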
parameter initialization
By default, PyTorch initializes weight and bias matrices uniformly, drawing from a range computed from the layer's input and output dimensions. The torch.nn.init module provides a variety of preset initialization methods.
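For nn.Linear specifically, the documented default draws both weight and bias from U(-1/sqrt(in_features), 1/sqrt(in_features)), so the bound depends on the input dimension. The sketch below checks this empirically on one layer; the layer size and variable names are assumptions for illustration.

```python
import math
import torch
from torch import nn

layer = nn.Linear(4, 8)

# Documented bound for the default nn.Linear initialization.
bound = 1 / math.sqrt(layer.in_features)

max_weight = layer.weight.data.abs().max().item()
max_bias = layer.bias.data.abs().max().item()
print(f'bound={bound:.3f}, max |weight|={max_weight:.3f}, max |bias|={max_bias:.3f}')
```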
built-in initialization
The following code initializes all weight parameters as Gaussian random variables with a standard deviation of 0.01 and sets the bias parameters to 0:
def init_normal(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, mean=0, std=0.01)
        nn.init.zeros_(m.bias)

net.apply(init_normal)
net[0].weight.data[0], net[0].bias.data[0]
We can also initialize all weights to a given constant, for example 1, again setting the biases to 0:
def init_constant(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 1)
        nn.init.zeros_(m.bias)

net.apply(init_constant)
net[0].weight.data[0], net[0].bias.data[0]
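Different initializers can be applied to different blocks. Here the first layer uses Xavier initialization and the third layer is set to the constant 42: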
def init_xavier(m):
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)

def init_42(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 42)

net[0].apply(init_xavier)
net[2].apply(init_42)
print(net[0].weight.data[0])
print(net[2].weight.data)
custom initialization
def my_init(m):
    if type(m) == nn.Linear:
        print("Init", *[(name, param.shape)
                        for name, param in m.named_parameters()][0])
        nn.init.uniform_(m.weight, -10, 10)
        # Zero out every weight whose absolute value is below 5
        m.weight.data *= m.weight.data.abs() >= 5

net.apply(my_init)
net[0].weight[:2]
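Reading my_init above: each weight is drawn from U(-10, 10) and then set to 0 when its absolute value is below 5, so roughly half of the weights should end up as 0 and the rest should lie in [-10, -5] or [5, 10]. A rough empirical check on a larger layer follows; the layer size, the seed, and the helper name sparse_uniform_init are assumptions for illustration.

```python
import torch
from torch import nn

def sparse_uniform_init(m):
    # Same rule as my_init, without the print statement.
    if type(m) == nn.Linear:
        nn.init.uniform_(m.weight, -10, 10)
        m.weight.data *= m.weight.data.abs() >= 5

torch.manual_seed(0)
layer = nn.Linear(200, 200)
layer.apply(sparse_uniform_init)

w = layer.weight.data
zero_fraction = (w == 0).float().mean().item()
nonzero = w[w != 0]
print(f'fraction of zeros: {zero_fraction:.3f}')
print(f'smallest nonzero magnitude: {nonzero.abs().min().item():.3f}')
```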
parameter sharing
The second and third hidden layers (net[2] and net[4]) are the same nn.Linear object, so they share their weight and bias
shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.Linear(8, 1))
net(X)
print(net[2].weight.data[0] == net[4].weight.data[0])
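The element-wise comparison above only shows that the values are equal. A minimal sketch to confirm that the two positions hold the very same parameter object, not just equal copies (the names same_value, same_object, and names are illustrative):

```python
import torch
from torch import nn

shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.Linear(8, 1))

# Changing the weight through one position ...
net[2].weight.data[0, 0] = 100
# ... is visible through the other, because both refer to one tensor.
same_value = (net[2].weight.data[0, 0] == net[4].weight.data[0, 0]).item()
same_object = net[2].weight is net[4].weight
print(same_value, same_object)

# named_parameters() lists the shared weight and bias only once.
names = [name for name, _ in net.named_parameters()]
print(names)
```

Because the parameter is shared, gradients from both uses of the layer accumulate into the same .grad tensor during backpropagation.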