PyTorch from scratch (X): model parameter access / initialization / sharing

This post covers accessing, initializing, and sharing model parameters.

Parameter access

There are two ways to access parameters, listed below. Both are implemented in the nn.Module class, so every subclass of nn.Module inherits them.

  • .parameters()
  • .named_parameters()
import torch
from torch import nn
from torch.nn import init

net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))  # PyTorch applies default initialization

print(type(net.named_parameters()))
for name, param in net.named_parameters():
    print(name, param.size())

Output:

<class 'generator'>
0.weight torch.Size([3, 4])
0.bias torch.Size([3])
2.weight torch.Size([1, 3])
2.bias torch.Size([1])

As you can see, the returned names automatically use the layer index as a prefix.
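
For completeness, here is a minimal sketch of the other access method listed above, .parameters(); it yields the same parameter tensors, just without names:

for param in net.parameters():
    print(type(param), param.size())  # four parameters: two weights and two biases
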
Now let's access the parameters of a single layer of net. For networks built with Sequential, we can use square brackets [] to index any layer. Here index 0 refers to the hidden layer, the first layer added to the Sequential.

for name, param in net[0].named_parameters():
    print(name, param.size(), type(param))

Output:

weight torch.Size([3, 4]) <class 'torch.nn.parameter.Parameter'>
bias torch.Size([3]) <class 'torch.nn.parameter.Parameter'>

Because this is a single layer, the names no longer carry the index prefix. Also note that the returned param is of type torch.nn.parameter.Parameter, which is in fact a subclass of Tensor; the difference from a plain Tensor is that a Parameter is automatically added to the model's parameter list. See the following example.

class MyModel(nn.Module):
    def __init__(self, **kwargs):
        super(MyModel, self).__init__(**kwargs)
        self.weight1 = nn.Parameter(torch.rand(20, 20))  # nn.Parameter: automatically registered as a model parameter
        self.weight2 = torch.rand(20, 20)                # plain Tensor: not registered as a parameter
    def forward(self, x):
        pass
    
n = MyModel()
for name, param in n.named_parameters():
    print(name)

Output:

weight1

In the code above, weight1 appears in the parameter list, but weight2 does not.
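
A practical consequence (not shown in the original post) is that anything built from n.parameters(), such as an optimizer, only ever sees weight1; weight2 is invisible to it. A minimal sketch:

optimizer = torch.optim.SGD(n.parameters(), lr=0.1)  # will only ever update weight1
print(sum(p.numel() for p in n.parameters()))        # 400, i.e. just the 20x20 weight1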

Since a Parameter is a Tensor, it has all of a Tensor's attributes: for example, you can access the parameter values through .data and the gradient through .grad.

X = torch.rand(2, 4)   # example input (the original snippet assumes X and Y from an earlier section; defined here so it runs standalone)
Y = net(X).sum()       # forward pass whose result we can backpropagate through
weight_0 = list(net[0].parameters())[0]
print(weight_0.data)
print(weight_0.grad)   # the gradient is None before backpropagation
Y.backward()
print(weight_0.grad)

Output:

tensor([[ 0.2719, -0.0898, -0.2462,  0.0655],
        [-0.4669, -0.2703,  0.3230,  0.2067],
        [-0.2708,  0.1171, -0.0995,  0.3913]])
None
tensor([[-0.2281, -0.0653, -0.1646, -0.2569],
        [-0.1916, -0.0549, -0.1382, -0.2158],
        [ 0.0000,  0.0000,  0.0000,  0.0000]])

Parameter initialization

PyTorch provides sensible default initialization for its built-in layers, so in most cases we don't need to worry about it. (For the specific initialization used by each layer type, refer to the source code.)
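
As a quick illustration (not from the original post), you can inspect a layer's default-initialized values directly, and a layer's reset_parameters() method re-applies its own default scheme:

print(net[0].weight.data)    # values produced by the default initialization
net[0].reset_parameters()    # re-applies nn.Linear's default initialization
print(net[0].weight.data)    # new random values drawn from the same default scheme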

If we want to initialize the weights ourselves, we can iterate over all of the net's parameters and apply our own initialization strategy. In the example below, we initialize each weight parameter with normally distributed random numbers of mean 0 and standard deviation 0.01, and reset each bias parameter to zero.

for name, param in net.named_parameters():
    if 'weight' in name:
        init.normal_(param, mean=0, std=0.01)
        print(name, param.data)
    elif 'bias' in name:
        init.constant_(param, val=0)
        print(name, param.data)
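
As an aside (not from the original post), when different layer types need different treatment, a common alternative to matching on parameter names is net.apply() with an isinstance check. A minimal sketch:

def init_by_type(m):
    # apply an initialization that depends on the submodule's type
    if isinstance(m, nn.Linear):
        init.normal_(m.weight, mean=0, std=0.01)
        init.constant_(m.bias, val=0)

net.apply(init_by_type)  # .apply() calls init_by_type on every submodule recursively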

The examples above use the initialization methods that come with torch.nn.init; you can also implement your own initialization method to meet your needs.
First, let's look at how PyTorch implements these initialization methods, taking torch.nn.init.normal_ as an example:

def normal_(tensor, mean=0, std=1):
    with torch.no_grad():
        return tensor.normal_(mean, std)

As you can see, this is an in-place function that changes the Tensor's values, and the operation is not recorded for gradient computation.
We can implement a custom initialization method in the same way. In the example below, each weight has probability one half of being initialized to 0, and otherwise is drawn uniformly at random from one of the two intervals \([-10, -5]\) and \([5, 10]\).

def init_weight_(tensor):
    with torch.no_grad():
        tensor.uniform_(-10, 10)
        tensor *= (tensor.abs() >= 5).float()  # zero out values in (-5, 5), keeping only the two outer intervals

for name, param in net.named_parameters():
    if 'weight' in name:
        init_weight_(param)
        print(name, param.data)

Just like init_weight_() above, which changes the parameters inside a torch.no_grad() block, we can also modify a parameter's .data directly to rewrite the model parameter values without affecting the gradient:

for name, param in net.named_parameters():
    if 'bias' in name:
        param.data += 1
        print(name, param.data)

Output:

0.bias tensor([1., 1., 1.])
2.bias tensor([1.])

Parameter sharing

In some cases we want to share model parameters between multiple layers. An earlier post in this series mentioned one way to do this: call the same layer multiple times in a Module's forward function (a minimal sketch of that approach is given at the end of this section). In addition, if the modules passed to Sequential are the same Module instance, their parameters are shared as well. Let's look at an example:

linear = nn.Linear(1, 1, bias=False)
net = nn.Sequential(linear, linear) 
print(net)
for name, param in net.named_parameters():
    init.constant_(param, val=3)
    print(name, param.data)

Output:

Sequential(
  (0): Linear(in_features=1, out_features=1, bias=False)
  (1): Linear(in_features=1, out_features=1, bias=False)
)
0.weight tensor([[3.]])

This is because, in memory, the two linear layers are in fact the same object:

print(id(net[0]) == id(net[1]))
print(id(net[0].weight) == id(net[1].weight))

Output:

True
True

Because a model parameter stores its gradient, the gradients of a shared parameter are accumulated during backpropagation:

x = torch.ones(1, 1)
y = net(x).sum()
print(y)
y.backward()
print(net[0].weight.grad) # each use contributes a gradient of 3; the layer is used twice, so the total is 6

Output:

tensor(9., grad_fn=<SumBackward0>)
tensor([[6.]])

For comparison:

linear1 = nn.Linear(1, 1, bias=False)
linear2 = nn.Linear(1, 1, bias=False)
net = nn.Sequential(linear1, linear2) 
print(net)
for name, param in net.named_parameters():
    init.constant_(param, val=3)
    print(name, param.data)

x = torch.ones(1, 1)
y = net(x).sum()
print(y)
y.backward()
print(net[0].weight.grad) 

Output:

Sequential(
  (0): Linear(in_features=1, out_features=1, bias=False)
  (1): Linear(in_features=1, out_features=1, bias=False)
)
0.weight tensor([[3.]])
1.weight tensor([[3.]])

tensor(9., grad_fn=<SumBackward0>)
tensor([[3.]])

Here linear1 and linear2 are not the same object, so net has two separate parameters, and after backpropagation net[0].weight.grad is tensor([[3.]]).
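
For reference, here is a minimal sketch (not from the original post; the class name SharedLinear is made up for illustration) of the other sharing approach mentioned at the start of this section, namely calling the same layer more than once inside a Module's forward function:

class SharedLinear(nn.Module):
    def __init__(self):
        super(SharedLinear, self).__init__()
        self.linear = nn.Linear(1, 1, bias=False)

    def forward(self, x):
        # the same nn.Linear instance, and hence the same weight, is applied twice
        return self.linear(self.linear(x))

shared_net = SharedLinear()
print(len(list(shared_net.parameters())))  # 1: only a single shared weight is registered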
