Computation in PyTorch
Notes on some ways to use Tensor and Parameter in PyTorch.
Parameter
Parameter Access
The parameters of each layer are stored as `Parameter` attributes of the corresponding class, e.g. `weight` and `bias`.
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
# Inspect all parameters and their attribute names
print(net.state_dict())
net[2].bias                        # a Parameter instance
net[2].bias.data = torch.randn(1)  # overwrite the numerical value
Note that if you want to modify a parameter's value directly, you should assign through `.data`. (Similarly, if a tensor is tracked by autograd and you need to avoid an in-place-operation error, assigning via `.data` works.)
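A minimal sketch of why `.data` sidesteps autograd's in-place check (the tensor `w` here is made up for illustration):

import torch

w = torch.ones(3, requires_grad=True)
# w += 1  # RuntimeError: a leaf Variable that requires grad is used in an in-place operation
w.data += 1  # fine: the values change, but autograd does not track this update
print(w)     # tensor([2., 2., 2.], requires_grad=True)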
All Parameters at once
`nn.Module` provides the `.named_parameters()` method, which returns an iterator over all parameters; it is handy for bulk operations such as initializing parameters or toggling whether a parameter requires a gradient.
for name, param in net.named_parameters():
    # name is the name of the corresponding attribute
    # operate on the parameter here
    param.data = ...
    # you can also use name to access the parameter directly
    net.state_dict()[name].data  # e.g. name = '2.bias'
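For instance, a hypothetical pass that freezes every bias in `net` (the attribute names of Linear biases end in 'bias'):

for name, param in net.named_parameters():
    if name.endswith('bias'):
        param.requires_grad = False  # exclude biases from gradient updates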
Parameter Initialization
def init_normal(m):
    """
    Check the type of the module m passed in, then initialize it accordingly.
    """
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

def my_init(m):
    if isinstance(m, nn.Linear):
        nn.init.uniform_(m.weight, -10, 10)
        m.weight.data *= m.weight.data.abs() >= 5  # a `.data` operation: zero out weights with |w| < 5

net.apply(init_normal)  # applies init_normal recursively to every submodule
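`net.apply` walks the module tree recursively, so the initializer also reaches layers nested inside sub-containers; a quick sketch (the nested `outer` network here is made up for illustration):

inner = nn.Sequential(nn.Linear(8, 8), nn.ReLU())
outer = nn.Sequential(nn.Linear(4, 8), inner, nn.Linear(8, 1))
outer.apply(init_normal)           # called on every submodule, however deep
print(outer[1][0].weight.std())    # roughly 0.01 after initialization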
Parameter Sharing
Some modules in a neural network can share the same parameters. Since PyTorch builds a computational graph, sharing simply means the same module object is used at multiple points in the graph.
shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU())  # the same object appears twice
When gradients are computed for the second and third (shared) layers, only one copy of the parameter exists in memory, and its gradient is the sum of the contributions from both occurrences, because PyTorch accumulates gradients by addition.
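A quick check of both claims, using the `net` just defined (the input shape is an assumption):

print(net[2].weight is net[4].weight)  # True: one object, one copy in memory
x = torch.randn(2, 4)
net(x).sum().backward()
# net[2].weight.grad now holds the summed gradient from both occurrences
print(net[2].weight.grad is net[4].weight.grad)  # True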
Layer
A custom layer can support parameter access, initialization, saving, loading, and sharing just like the built-in ones. How each module creates its parameters can be seen in its initialization function (`__init__`).
class MyLinear(nn.Module):
    def __init__(self, in_units, out_units):
        super().__init__()
        # wrapping tensors in nn.Parameter registers them with the module
        self.weight = nn.Parameter(torch.randn(in_units, out_units))
        self.bias = nn.Parameter(torch.randn(out_units,))

    def forward(self, x):
        # use the Parameters directly (not `.data`) so autograd can reach them
        linear = torch.matmul(x, self.weight) + self.bias
        return linear
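A quick use of the layer (the shapes are chosen arbitrarily for illustration):

layer = MyLinear(5, 3)
print(layer.weight.shape)              # torch.Size([5, 3]); registered automatically
print(layer(torch.randn(2, 5)).shape)  # torch.Size([2, 3])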
Reference:
- d2l (Dive into Deep Learning), https://d2l.ai