Computation in PyTorch

Notes on some common ways of working with Tensor and Parameter in PyTorch.

Parameter

Parameter Access

The parameters of each layer are stored as Parameter attributes of the corresponding module class, e.g. weight and bias.
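
The snippets below assume a small network along these lines (a minimal sketch; the layer sizes are arbitrary):

import torch
from torch import nn

# a small MLP used in the examples below
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
X = torch.rand(2, 4)
net(X)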

# view all parameters and their corresponding attribute names
print(net.state_dict())

net[2].bias # the Parameter object
net[2].bias.data = torch.randn(3) # overwrite the underlying value (shape must match the bias)

Note that if you want to manipulate a parameter's value directly, the assignment should go through .data. (Likewise, if a tensor is tracked by autograd and you need to avoid an in-place error when assigning to it, use .data.)
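
For example (a minimal sketch; torch.no_grad() is shown only as the equivalent standard idiom):

net[2].bias.data += 1      # change the value via .data; autograd does not record this op
with torch.no_grad():      # equivalent standard idiom for in-place updates
	net[2].bias += 1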

All Parameters at once

nn.Module provides the .named_parameters() method, which returns an iterator over all parameters; this is useful, for example, for initializing parameters or toggling whether they require gradients.

for name, param in net.named_parameters():
	# name is the name of the corresponding attribute
	# parameter operation
	param.data = ...

# you can also use name to directly access parameter
net.state_dict()[name].data # eg. name = '2.bias'
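
As a concrete sketch, for the network defined above the iterator yields names and parameters like this:

for name, param in net.named_parameters():
	print(name, param.shape)
# 0.weight torch.Size([8, 4])
# 0.bias torch.Size([8])
# 2.weight torch.Size([3, 8])
# 2.bias torch.Size([3])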

Parameter Initialization

def init_normal(m):
	"""
	Check the type of the module m passed in, then initialize it accordingly.
	"""
	if type(m) == nn.Linear:
		nn.init.normal_(m.weight, std=0.01)

def my_init(m):
	if isinstance(m, nn.Linear):
		nn.init.uniform_(m.weight, -10, 10)
		m.weight.data *= m.weight.data.abs() >= 5 # zero out weights with absolute value below 5, via `.data`

net.apply(init_normal)
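
apply() walks the module tree recursively; it can also be called on a single submodule if only one layer needs a particular rule (a small sketch):

net.apply(my_init)           # uniform init, keeping only weights with absolute value >= 5
net[0].apply(init_normal)    # re-initialize only the first layer
print(net[0].weight.data[0])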

Parameter sharing

Some modules in a neural network share the same parameters. Since PyTorch builds a computational graph, sharing is achieved simply by reusing the same module object in several places.

shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
					shared, nn.ReLU(),
					shared, nn.ReLU())

When gradients are computed for the two layers that use shared, the single gradient tensor in memory holds the sum of both contributions, because PyTorch accumulates gradients by addition.
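
A quick check (sketch): because shared sits at positions 2 and 4 of the Sequential, changing one changes the other; they are literally the same tensor.

print(net[2].weight.data[0] == net[4].weight.data[0])  # all True
net[2].weight.data[0, 0] = 100.0
print(net[2].weight.data[0] == net[4].weight.data[0])  # still all True: one tensor in memory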

Layer

Writing a custom layer shows how acquiring, initializing, saving, loading, and sharing parameters works; how each built-in module creates its own parameters can be seen in its __init__ function.

class MyLinear(nn.Module):
	def __init__(self, in_units, out_units):
		super().__init__()
		# wrapping tensors in nn.Parameter registers them as module parameters
		self.weight = nn.Parameter(torch.randn(in_units, out_units))
		self.bias = nn.Parameter(torch.randn(out_units,))
		
	def forward(self, x):
		linear = torch.matmul(x, self.weight.data) + self.bias.data
		return linear
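
A short usage sketch of the custom layer, including saving and reloading its parameters through state_dict (the file name is arbitrary):

layer = MyLinear(5, 3)
print(layer.weight.shape)            # torch.Size([5, 3])
layer(torch.rand(2, 5))              # forward pass

# custom layers compose with built-in ones
net = nn.Sequential(MyLinear(64, 8), MyLinear(8, 1))
net(torch.rand(2, 64))

# parameters created with nn.Parameter appear in state_dict, so saving/loading works as usual
torch.save(net.state_dict(), 'mylinear.params')
net.load_state_dict(torch.load('mylinear.params'))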

Reference:

  1. Dive into Deep Learning (d2l)


Origin blog.csdn.net/lib0000/article/details/113916142