Foreword
When building a deep learning model, we often encounter three containers: nn.Sequential, nn.ModuleList, and nn.ModuleDict, especially during transfer learning. What are they, how are they used, and what should we watch out for when using them? This post takes a look.
1. nn.Module
Before introducing these three containers, we need to know what nn.Module is. When we create models, almost all of them inherit from this class. It is the base class of all networks and is used to manage a network's properties. Two modules are closely associated with it: nn.Parameter and nn.functional. All of these come from torch.nn. Below we briefly introduce each of them.
1.1. nn.Parameter
First, nn.Parameter. In PyTorch, nn.Parameter is a special class for creating model parameters. A model often has many parameters, and managing them by hand is not an easy task. PyTorch generally uses nn.Parameter to represent parameters and nn.Module to manage all the parameters under its structure.
import torch
import torch.nn as nn

## nn.Parameter has the attribute requires_grad = True
w = nn.Parameter(torch.randn(2, 2))
print(w)                    # tensor([[ 0.3544, -1.1643],[ 1.2302, 1.3952]], requires_grad=True)
print(w.requires_grad)      # True

## nn.ParameterList groups several nn.Parameter objects into a list
params_list = nn.ParameterList([nn.Parameter(torch.rand(8, i)) for i in range(1, 3)])
print(params_list)
print(params_list[0].requires_grad)

## nn.ParameterDict groups several nn.Parameter objects into a dict
params_dict = nn.ParameterDict({
    "a": nn.Parameter(torch.rand(2, 2)),
    "b": nn.Parameter(torch.zeros(2))})
print(params_dict)
print(params_dict["a"].requires_grad)
The parameters defined above can be managed through Module:
# module.parameters() returns a generator over all parameters under the module's structure
module = nn.Module()
module.w = w
module.params_list = params_list
module.params_dict = params_dict

num_param = 0
for param in module.parameters():
    print(param, "\n")
    num_param = num_param + 1
print("number of Parameters =", num_param)
In practice, we generally build modules by inheriting from nn.Module, placing every part that contains learnable parameters in the constructor.
# The following example is a simplified version of the source code of nn.Linear in PyTorch.
# Note that it puts the learnable parameters in the __init__ constructor
# and calls F.linear in forward to implement the computation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Linear(nn.Module):
    __constants__ = ['in_features', 'out_features']

    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = nn.Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)

    def forward(self, input):
        return F.linear(input, self.weight, self.bias)
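As a quick check, the built-in nn.Linear follows exactly this pattern: the weight and bias created in its constructor are registered as parameters automatically. A minimal sketch:

import torch.nn as nn

layer = nn.Linear(4, 2)
for name, p in layer.named_parameters():
    print(name, tuple(p.shape))   # weight (2, 4), then bias (2,)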
1.2. nn.functional
nn.functional (generally imported and renamed as F) contains function implementations of the various functional components, for example:
- Activation functions (F.relu, F.sigmoid, F.tanh, F.softmax)
- Model layers (F.linear, F.conv2d, F.max_pool2d, F.dropout2d, F.embedding)
- Loss functions (F.binary_cross_entropy, F.mse_loss, F.cross_entropy)
To make parameter management easier, these are generally converted into class implementations by inheriting from nn.Module and are encapsulated directly under the nn module:
- Activation functions become (nn.ReLU, nn.Sigmoid, nn.Tanh, nn.Softmax)
- Model layers become (nn.Linear, nn.Conv2d, nn.MaxPool2d, nn.Embedding)
- Loss functions become (nn.BCELoss, nn.MSELoss, nn.CrossEntropyLoss)
So the activation functions, layers, and loss functions we build through nn are classes on the surface, but behind the scenes they are all implemented with nn.functional. If you keep reading, you will see that nn.Module is indeed very powerful: besides managing the various parameters it references, it can also manage the submodules it references.
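To see that the class form and the function form really compute the same thing, here is a minimal sketch comparing nn.ReLU with F.relu:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2, 3)
# nn.ReLU is a stateless wrapper whose forward calls F.relu internally
print(torch.equal(nn.ReLU()(x), F.relu(x)))   # True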
1.3. nn.Module
Our focus is the nn.Module module itself. nn.Module holds many important dictionary attributes:
self.training = True
self._parameters: Dict[str, Optional[Parameter]] = OrderedDict()
self._buffers: Dict[str, Optional[Tensor]] = OrderedDict()
self._non_persistent_buffers_set: Set[str] = set()
self._backward_hooks: Dict[int, Callable] = OrderedDict()
self._is_full_backward_hook = None
self._forward_hooks: Dict[int, Callable] = OrderedDict()
self._forward_pre_hooks: Dict[int, Callable] = OrderedDict()
self._state_dict_hooks: Dict[int, Callable] = OrderedDict()
self._load_state_dict_pre_hooks: Dict[int, Callable] = OrderedDict()
self._modules: Dict[str, Optional['Module']] = OrderedDict()
We only need to focus on two of them: _parameters and _modules.
- _parameters: stores and manages attributes of the nn.Parameter class, i.e. parameters such as weights and biases
- _modules: stores and manages attributes of the nn.Module class; for example, the classic LeNet network builds submodules such as convolutional layers and pooling layers, and these are stored in _modules
Here is a question: what is the difference between nn.Parameter and the _parameters attribute of nn.Module?
- nn.Parameter: a subclass of torch.Tensor used to mark tensors as learnable parameters of the model. When defining a model, we usually use nn.Parameter to create learnable parameters as attributes of the model. The advantage is that an nn.Parameter object is automatically registered as a parameter of the model and participates in gradient computation and parameter updates.
- _parameters: an attribute of the nn.Module class, a dictionary used to store the model's learnable parameters. The keys of the dictionary are the parameter names and the values are the corresponding parameter tensors (of type nn.Parameter). The learnable parameters among the model's attributes are extracted automatically and added to this dictionary.
_parameters can be thought of as a container that stores a model's learnable parameters, while nn.Parameter is the special class for creating and marking those parameters. Parameters created with nn.Parameter and assigned as model attributes are added to the _parameters dictionary automatically, which makes it convenient to manage and access them in a unified way. In short: nn.Parameter is a special class for creating model parameters, _parameters is the dictionary attribute that stores them, and parameters created with nn.Parameter are added to _parameters automatically.
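A minimal sketch of this registration mechanism (the attribute names w and t are just for illustration):

import torch
import torch.nn as nn

m = nn.Module()
m.w = nn.Parameter(torch.randn(3, 3))   # routed into _parameters by __setattr__
m.t = torch.randn(3, 3)                 # a plain tensor is NOT registered

print(list(m._parameters.keys()))            # ['w']
print([n for n, _ in m.named_parameters()])  # ['w'] -- only the nn.Parameter shows up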
So what does the process of building a network with nn.Module look like? Take the following network as an example:
import torch
from torch import nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 1, 1, 1)
        self.bn = nn.BatchNorm2d(1)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(1, 1, 1, 1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn(x)
        x = self.relu(x)
        x = self.conv2(x)
        return x

if __name__ == "__main__":
    dat = torch.rand(1, 1, 10, 10)
    net = Net().cuda()
The build process is as follows: we first have a top-level Module (the Net created above) that inherits from the nn.Module base class. Net can contain many submodules, and these submodules also inherit from nn.Module. In their __init__ methods, the parent class's initialization method is called first to initialize the parent's attributes. Then, building each submodule actually takes two steps: the first step is initialization; after that, __setattr__ inspects the type of the assigned value, saves it in the corresponding attribute dictionary, and assigns it to the corresponding member. The submodules are constructed one by one in this way until the whole Net is complete. You can step through the process in a debugger to see the details.
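To make this concrete, here is a small sketch (reusing the Net defined above) that peeks at the dictionaries __setattr__ filled in during construction:

net = Net()
# Each nn.Module assigned in __init__ was routed into _modules by __setattr__
print(list(net._modules.keys()))            # ['conv1', 'bn', 'relu', 'conv2']
# conv1's weight and bias live in that submodule's own _parameters dict
print(list(net.conv1._parameters.keys()))   # ['weight', 'bias']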
To summarize:
- One module can contain multiple sub-modules (Net contains convolutional layers, BN layers, and activation functions)
- One module is equivalent to one operation and must implement the forward() function (the forward of some modules needs to be written by yourself, as you will see below)
- Each module has many dictionaries to manage its attributes (the most commonly used are _parameters and _modules)
Knowing how a network is constructed, we can analyze a model created by someone else and extract parts of it. For an introduction to that topic, see the blog post: PyTorch: extracting a network's layer structure and layer parameters, and custom initialization.
2. nn.Sequential
Having introduced nn.Module above, let's move on to the containers. First up is nn.Sequential, a module container in PyTorch used to combine multiple modules in sequence: it connects a series of modules in order to form a serial model structure. Let's look at how it is implemented in PyTorch; here we only show the constructor and the forward-propagation part, with the rest of the code omitted:
class Sequential(Module):
    ...
    def __init__(self, *args):
        super(Sequential, self).__init__()
        if len(args) == 1 and isinstance(args[0], OrderedDict):
            for key, module in args[0].items():
                self.add_module(key, module)
        else:
            for idx, module in enumerate(args):
                self.add_module(str(idx), module)
    ...
    def forward(self, input):
        for module in self:
            input = module(input)
        return input
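As the constructor shows, nn.Sequential accepts either a plain sequence of modules (which are then named '0', '1', ...) or an OrderedDict that gives each submodule a name. A quick sketch of both forms:

import torch.nn as nn
from collections import OrderedDict

# Positional form: submodules are registered under the names '0', '1', ...
seq1 = nn.Sequential(nn.Linear(4, 8), nn.ReLU())

# OrderedDict form: submodules are registered under the given names
seq2 = nn.Sequential(OrderedDict([
    ('fc', nn.Linear(4, 8)),
    ('act', nn.ReLU()),
]))

print(seq1[0])   # access by index
print(seq2.fc)   # access by name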
As you can see from the code above, nn.Sequential inherits from Module, which means Sequential is itself a Module and therefore also has those dictionary attributes. You can also see that nn.Sequential implements the forward method. The most commonly used methods of nn.Sequential are as follows:
- forward(input): defines the model's forward-propagation process. In nn.Sequential, this method calls each module's forward in order, passing the output of the previous module as the input of the next to compute the final output.
- add_module(name, module): adds a submodule to the nn.Sequential. name is the name of the submodule and module is the submodule object to add. Modules are forward-propagated in the order in which they were added.
- parameters(): returns an iterator over all learnable parameters in the nn.Sequential. The model's learnable parameters can be accessed and manipulated through this iterator.
- zero_grad(): sets the parameter gradients of all modules in the nn.Sequential to zero. This method is usually called before each gradient update.
Only the most common methods are listed above; there are actually quite a few more. Of these, add_module(name, module) is the one used most often to add modules. So how is nn.Sequential used?
class Net(nn.Module):
    def __init__(self, classes):
        super(Net, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, 5),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(6, 16, 5),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),)
        self.classifier = nn.Sequential(
            nn.Linear(16*5*5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, classes),)

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size()[0], -1)
        x = self.classifier(x)
        return x
It can also be created using the add_module method:
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, classes):
        super(Net, self).__init__()
        self.features = nn.Sequential()
        self.features.add_module('conv1', nn.Conv2d(3, 6, 5))
        self.features.add_module('relu1', nn.ReLU())
        self.features.add_module('pool1', nn.MaxPool2d(kernel_size=2, stride=2))
        self.features.add_module('conv2', nn.Conv2d(6, 16, 5))
        self.features.add_module('relu2', nn.ReLU())
        self.features.add_module('pool2', nn.MaxPool2d(kernel_size=2, stride=2))
        self.classifier = nn.Sequential()
        self.classifier.add_module('fc1', nn.Linear(16*5*5, 120))
        self.classifier.add_module('relu3', nn.ReLU())
        self.classifier.add_module('fc2', nn.Linear(120, 84))
        self.classifier.add_module('relu4', nn.ReLU())
        self.classifier.add_module('fc3', nn.Linear(84, classes))

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size()[0], -1)
        x = self.classifier(x)
        return x
From the construction above, we can see that a single statement in the forward function, self.features(x), completes what would otherwise take six statements. It can do this thanks to nn.Sequential's own forward: at run time, the input is handed to nn.Sequential, which passes it through each module in turn. You can step through the code in a debugger to observe the details.
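As a quick usage check, a sketch like the following (reusing the Net above) runs the forward pass end to end; a 3-channel 32x32 input matches the 16*5*5 flatten size:

net = Net(classes=10)
x = torch.randn(1, 3, 32, 32)   # one 3-channel 32x32 image
out = net(x)
print(out.shape)                # torch.Size([1, 10])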
Summary: nn.Sequential is a container for nn.Module that wraps a set of network layers in order, and it has two characteristics:
- Sequential: the layers are assembled strictly in order, so we must pay attention to whether the output of each layer matches the input of the next
- Built-in forward(): its forward runs the forward-propagation operations sequentially through a for loop
3. nn.ModuleList
nn.ModuleList is also a container for nn.Module; it wraps a group of network layers so that they can be called iteratively. Its commonly used methods, listed below, are very similar to those of a Python list:
- append(): appends a network layer to the end of the ModuleList
- extend(): concatenates two ModuleLists
- insert(): inserts a network layer at a specified position in the ModuleList
Let's look at an example of how to use nn.ModuleList to build a network:
class ModuleListNet(nn.Module):
    def __init__(self):
        super(ModuleListNet, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])

    def forward(self, x):
        for i, linear in enumerate(self.linears):
            x = linear(x)
        return x
The example above creates ten nn.Linear modules with a list comprehension. Overall it is very simple to use; if you are interested in the implementation details, you can step through the code in a debugger.
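One point worth stressing: a plain Python list of modules is not registered with the parent module, so its parameters are invisible to parameters() and hence to the optimizer. A minimal sketch of the difference (the class names here are just for illustration):

import torch
import torch.nn as nn

class PlainList(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain list: the Linear layers are NOT registered as submodules
        self.layers = [nn.Linear(4, 4) for _ in range(3)]

class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList: each Linear is registered in _modules
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])

print(len(list(PlainList().parameters())))       # 0 -- an optimizer would see nothing
print(len(list(WithModuleList().parameters())))  # 6 -- 3 weights + 3 biases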
4. nn.ModuleDict
Finally, let's look at nn.ModuleDict. It is also a container for nn.Module; it wraps a set of network layers so that they can be called by key. Its commonly used methods, listed below, are similar to those of a dictionary:
- clear(): empties the ModuleDict
- items(): returns an iterable of key-value pairs
- keys(): returns the dictionary's keys
- values(): returns the dictionary's values
- pop(): removes a key from the dictionary and returns its module
Let's look at an example:
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.module_dict = nn.ModuleDict({
            'conv1': nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),
            'relu1': nn.ReLU(),
            'conv2': nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            'relu2': nn.ReLU(),
            'flatten': nn.Flatten(),
            'linear': nn.Linear(128 * 32 * 32, 10)
        })

    def forward(self, x):
        for module in self.module_dict.values():
            x = module(x)
        return x

# Create a model instance
model = MyModel()
# Generate a random input
x = torch.randn(1, 3, 32, 32)
# Run the forward pass
output = model(x)
print(output.shape)
Building a network with nn.ModuleDict as above is still very simple overall, much like ordinary dictionary operations.
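Beyond iterating over all values, nn.ModuleDict is handy for selecting a layer by key at run time. A minimal sketch (the class name and the act argument are hypothetical, just for illustration):

import torch
import torch.nn as nn

class PickActivation(nn.Module):
    def __init__(self):
        super().__init__()
        self.acts = nn.ModuleDict({
            'relu': nn.ReLU(),
            'tanh': nn.Tanh(),
        })

    def forward(self, x, act='relu'):
        # Index the ModuleDict by key to choose the activation
        return self.acts[act](x)

m = PickActivation()
print(m(torch.randn(2, 2), act='tanh'))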
That covers the basic use of nn.Sequential, nn.ModuleList, and nn.ModuleDict. If there is any mistake, please point it out!