Ways to print a model's structure in PyTorch: nn.Module.modules(), nn.Module.children(), nn.Module.parameters(), and a look at how duplicate (shared) layers are handled

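Preface

Sometimes we need to print a model's structure. Three methods are available: nn.Module.modules(), nn.Module.children(), and nn.Module.parameters(), along with their counterparts nn.Module.named_modules(), nn.Module.named_children(), and nn.Module.named_parameters(). The named_ variants also return the name of each layer or parameter, which is more convenient, so all experiments below use them.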

References

PyTorch documentation: Module

Experimental model

Define an experimental model as follows:

import torch

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fatherlayer1 = torch.nn.Sequential(
            torch.nn.Linear(5, 5),
            torch.nn.Linear(5, 10),
            torch.nn.Linear(10, 3),
        )
        self.layer2 = torch.nn.Linear(3, 6)
        self.layer3 = torch.nn.Linear(6, 5)
        self.fatherlayer4 = torch.nn.Sequential(
            torch.nn.Linear(5, 7),
            torch.nn.Linear(7, 5),
        )
    def forward(self, x):
        x = self.fatherlayer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.fatherlayer4(x)
        return x
        
        
model = MyModel()  # instantiate the model

The model has four direct children, but some of them have children of their own. For example, self.fatherlayer1 and self.fatherlayer4 are both of type torch.nn.Sequential and contain sublayers.

torch.nn.Module.children

This method returns only the model's direct children. Run the following code:

for name, layer in model.named_children():
    print(name, type(layer), sep=" ")

The output is:

fatherlayer1 <class 'torch.nn.modules.container.Sequential'>
layer2 <class 'torch.nn.modules.linear.Linear'>
layer3 <class 'torch.nn.modules.linear.Linear'>
fatherlayer4 <class 'torch.nn.modules.container.Sequential'>

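As we can see, only the direct children are listed; the sublayers inside the Sequential layers are not expanded. To list them all with children(), we have to recurse. A recursive function looks like this: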

def flatten(module):
    for name, child in module.named_children():
        if isinstance(child, torch.nn.Sequential):
            for sub_name, sub_child in flatten(child):
                yield (f'{name}_{sub_name}', sub_child)
        else:
            yield (name, child)

Apply this function to the model:

for (name, layer) in flatten(model):
    print(name, layer, sep=" ")

The result is:

fatherlayer1_0 Linear(in_features=5, out_features=5, bias=True)
fatherlayer1_1 Linear(in_features=5, out_features=10, bias=True)
fatherlayer1_2 Linear(in_features=10, out_features=3, bias=True)
layer2 Linear(in_features=3, out_features=6, bias=True)
layer3 Linear(in_features=6, out_features=5, bias=True)
fatherlayer4_0 Linear(in_features=5, out_features=7, bias=True)
fatherlayer4_1 Linear(in_features=7, out_features=5, bias=True)

As we can see, all 7 Linear layers are printed.
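The flatten function above descends only into torch.nn.Sequential, so a child that is a ModuleList or a custom module with its own sublayers would be returned unexpanded. Below is a minimal sketch of a more general variant, assuming that "has at least one child" is an acceptable test for a container. The names flatten_leaves, Block, and Net are hypothetical and exist only for this illustration.

```python
import torch


def flatten_leaves(module, prefix=""):
    """Yield (name, layer) for every layer that has no children of its own."""
    for name, child in module.named_children():
        full_name = f"{prefix}_{name}" if prefix else name
        # Assumption: a module with at least one child is a container.
        if any(True for _ in child.children()):
            yield from flatten_leaves(child, full_name)
        else:
            yield full_name, child


class Block(torch.nn.Module):  # hypothetical custom container
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 4)
        self.act = torch.nn.ReLU()


class Net(torch.nn.Module):  # hypothetical model, structure only
    def __init__(self):
        super().__init__()
        self.block = Block()
        self.heads = torch.nn.ModuleList(
            [torch.nn.Linear(4, 2), torch.nn.Linear(4, 3)]
        )


leaf_names = [name for name, _ in flatten_leaves(Net())]
print(leaf_names)
```

Under this test an empty container would be reported as a leaf, which may or may not be what is wanted.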

torch.nn.Module.parameters

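This method has a parameter recurse=True. If it is True, the method recursively yields the parameters of the model and of all its submodules. Otherwise it yields only the parameters that are direct members of the model itself.

Run the following code: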

for name, param in model.named_parameters(recurse=True):
    print(name, param.shape, sep=" ")

The result is:

fatherlayer1.0.weight torch.Size([5, 5])
fatherlayer1.0.bias torch.Size([5])
fatherlayer1.1.weight torch.Size([10, 5])
fatherlayer1.1.bias torch.Size([10])
fatherlayer1.2.weight torch.Size([3, 10])
fatherlayer1.2.bias torch.Size([3])
layer2.weight torch.Size([6, 3])
layer2.bias torch.Size([6])
layer3.weight torch.Size([5, 6])
layer3.bias torch.Size([5])
fatherlayer4.0.weight torch.Size([7, 5])
fatherlayer4.0.bias torch.Size([7])
fatherlayer4.1.weight torch.Size([5, 7])
fatherlayer4.1.bias torch.Size([5])

As we can see, all parameters of the seven Linear layers are printed.

Now set recurse to False:

for name, param in model.named_parameters(recurse=False):
    print(name, param.shape, sep=" ")

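Nothing is printed.

This surprised me a little. I thought it might be because the first layer of the model is a Sequential, so I moved layer2 and layer3 to the front of the model and ran it again. Still nothing was printed.

We can run the following code: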

print(type(model.layer2))
print(type(model.layer2.weight))

The result is:

<class 'torch.nn.modules.linear.Linear'>
<class 'torch.nn.parameter.Parameter'>

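So printing the parameters of a Linear layer also requires recursion: weight and bias are Parameter objects registered on the Linear module, which makes them members of model.layer2 rather than direct members of model.
In other words, with recurse=False only the torch.nn.parameter.Parameter objects registered directly on the module itself are returned, and MyModel has none.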
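For recurse=False to return anything, the module needs a Parameter registered directly on it. Below is a minimal sketch, assuming a hypothetical ScaledLinear module (not part of the experiments above) that owns a scale parameter next to a Linear sublayer.

```python
import torch


class ScaledLinear(torch.nn.Module):  # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        # Registered directly on this module, so recurse=False can see it.
        self.scale = torch.nn.Parameter(torch.ones(1))
        # The parameters of this sublayer belong to the sublayer.
        self.linear = torch.nn.Linear(3, 2)

    def forward(self, x):
        return self.scale * self.linear(x)


m = ScaledLinear()
direct = [name for name, _ in m.named_parameters(recurse=False)]
all_names = [name for name, _ in m.named_parameters(recurse=True)]
linear_direct = [name for name, _ in m.linear.named_parameters(recurse=False)]

print(direct)
print(all_names)
print(linear_direct)
```

Calling named_parameters(recurse=False) on the Linear sublayer itself returns weight and bias, which is consistent with the explanation above.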

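torch.nn.Module.modules

modules() itself takes no arguments, but named_modules() has an important parameter, remove_duplicate=True, which controls whether duplicated module instances are removed from the result.

Run the following code: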

index = 0
for name, layer in model.named_modules(remove_duplicate=True):
    print(index, name, type(layer), sep=" ")
    index += 1

The result is:

0  <class '__main__.MyModel'>
1 fatherlayer1 <class 'torch.nn.modules.container.Sequential'>
2 fatherlayer1.0 <class 'torch.nn.modules.linear.Linear'>
3 fatherlayer1.1 <class 'torch.nn.modules.linear.Linear'>
4 fatherlayer1.2 <class 'torch.nn.modules.linear.Linear'>
5 layer2 <class 'torch.nn.modules.linear.Linear'>
6 layer3 <class 'torch.nn.modules.linear.Linear'>
7 fatherlayer4 <class 'torch.nn.modules.container.Sequential'>
8 fatherlayer4.0 <class 'torch.nn.modules.linear.Linear'>
9 fatherlayer4.1 <class 'torch.nn.modules.linear.Linear'>

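As we can see, it also lists all layers recursively, but it yields both parent layers and their sublayers. It starts with the whole model (whose name is the empty string), then fatherlayer1, then the sublayers of fatherlayer1, so both fatherlayer1 and its sublayers appear.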
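If only the leaf layers are wanted, the containers can be filtered out of the named_modules() output instead of writing a recursive function. Below is a minimal sketch, assuming that a module without children counts as a leaf. TinyNet is a hypothetical stand-in, not the model used in the experiments.

```python
import torch


class TinyNet(torch.nn.Module):  # hypothetical stand-in model
    def __init__(self):
        super().__init__()
        self.stem = torch.nn.Sequential(
            torch.nn.Linear(5, 5),
            torch.nn.Linear(5, 3),
        )
        self.head = torch.nn.Linear(3, 2)


net = TinyNet()
leaves = [
    name
    for name, module in net.named_modules()
    # children() is an iterator; an empty one means this is a leaf layer.
    if next(module.children(), None) is None
]
print(leaves)
```

The names keep the dotted form used by named_modules(), such as stem.0, rather than the underscore form produced by the flatten function.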

Run the following code:

index = 0
for name, layer in model.named_modules(remove_duplicate=False):
    print(index, name, type(layer), sep=" ")
    index += 1

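The result is the same as above, because no module instance is registered more than once in this model.

Redefining the model

Suppose we define the model as follows: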

import torch

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
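        self.commonlayer = torch.nn.Linear(5, 5)  # one instance, reused below
        self.fatherlayer1 = torch.nn.Sequential(
            torch.nn.Linear(5, 5),
            self.commonlayer,  # same object as self.commonlayer
            # Expects 10 input features but receives 5, so forward() would fail;
            # the experiments below only inspect the structure.
            torch.nn.Linear(10, 5),
        )
        self.layer2 = self.commonlayer  # another name for the same object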
        self.layer3 = torch.nn.Linear(5, 7)
        self.fatherlayer4 = torch.nn.Sequential(
            torch.nn.Linear(7, 5),
            self.commonlayer,
        )
    def forward(self, x):
        x = self.fatherlayer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.fatherlayer4(x)
        return x


model = MyModel()  # re-create the model from the new definition

Run the following code:

index = 0
for name, layer in model.named_modules(remove_duplicate=False):
    print(index, name, type(layer), sep=" ")
    index += 1

The result is:

0  <class '__main__.MyModel'>
1 commonlayer <class 'torch.nn.modules.linear.Linear'>
2 fatherlayer1 <class 'torch.nn.modules.container.Sequential'>
3 fatherlayer1.0 <class 'torch.nn.modules.linear.Linear'>
4 fatherlayer1.1 <class 'torch.nn.modules.linear.Linear'>
5 fatherlayer1.2 <class 'torch.nn.modules.linear.Linear'>
6 layer2 <class 'torch.nn.modules.linear.Linear'>
7 layer3 <class 'torch.nn.modules.linear.Linear'>
8 fatherlayer4 <class 'torch.nn.modules.container.Sequential'>
9 fatherlayer4.0 <class 'torch.nn.modules.linear.Linear'>
10 fatherlayer4.1 <class 'torch.nn.modules.linear.Linear'>

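As we can see, compared with the previous model, the only additional entry is commonlayer. Every registered name is listed, including the ones that point at the shared layer.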

Run the following code:

index = 0
for name, layer in model.named_modules(remove_duplicate=True):
    print(index, name, type(layer), sep=" ")
    index += 1

The result is:

0  <class '__main__.MyModel'>
1 commonlayer <class 'torch.nn.modules.linear.Linear'>
2 fatherlayer1 <class 'torch.nn.modules.container.Sequential'>
3 fatherlayer1.0 <class 'torch.nn.modules.linear.Linear'>
4 fatherlayer1.2 <class 'torch.nn.modules.linear.Linear'>
5 layer3 <class 'torch.nn.modules.linear.Linear'>
6 fatherlayer4 <class 'torch.nn.modules.container.Sequential'>
7 fatherlayer4.0 <class 'torch.nn.modules.linear.Linear'>

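As we can see, fatherlayer1.1, layer2, and fatherlayer4.1, which all use commonlayer, are not printed, because they are duplicates: each refers to the same module instance as commonlayer, which was already yielded under its first name.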
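The skipped layers are not copies in the sense of copy.deepcopy: every name refers to one and the same module object, so the weights are shared. Below is a minimal sketch that checks this, assuming a hypothetical SharedNet that registers one Linear under three names.

```python
import torch


class SharedNet(torch.nn.Module):  # hypothetical model for illustration
    def __init__(self):
        super().__init__()
        self.common = torch.nn.Linear(4, 4)
        self.block = torch.nn.Sequential(torch.nn.Linear(4, 4), self.common)
        self.alias = self.common


net = SharedNet()

# All three names resolve to the same Python object.
same_object = net.alias is net.common and net.block[1] is net.common

# Changing the weights through one name is visible through the others.
with torch.no_grad():
    net.common.weight.zero_()
alias_is_zero = bool((net.alias.weight == 0).all())

print(same_object, alias_is_zero)
```

If independent layers with separate weights are intended, each position needs its own torch.nn.Linear instance.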

Run the following code:

for name, param in model.named_parameters(recurse=True):
    print(name, param.shape, sep=" ")

The result is:

commonlayer.weight torch.Size([5, 5])
commonlayer.bias torch.Size([5])
fatherlayer1.0.weight torch.Size([5, 5])
fatherlayer1.0.bias torch.Size([5])
fatherlayer1.2.weight torch.Size([5, 10])
fatherlayer1.2.bias torch.Size([5])
layer3.weight torch.Size([7, 5])
layer3.bias torch.Size([7])
fatherlayer4.0.weight torch.Size([5, 7])
fatherlayer4.0.bias torch.Size([5])

As we can see, the parameters of the duplicate layers are likewise not printed.
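This de-duplication matters when counting parameters: iterating parameters() counts a shared layer once, whereas summing over every registered module name counts it once per name. Below is a minimal sketch, assuming a hypothetical SharedNet with one Linear registered under three names.

```python
import torch


class SharedNet(torch.nn.Module):  # hypothetical model for illustration
    def __init__(self):
        super().__init__()
        self.common = torch.nn.Linear(4, 4)  # 4*4 weights + 4 biases = 20
        self.block = torch.nn.Sequential(torch.nn.Linear(4, 4), self.common)
        self.alias = self.common


net = SharedNet()

# parameters() yields each Parameter object only once.
unique_count = sum(p.numel() for p in net.parameters())

# Walking every registered name counts the shared layer once per name.
per_name_count = sum(
    p.numel()
    for _, module in net.named_modules(remove_duplicate=False)
    for p in module.parameters(recurse=False)
)

print(unique_count, per_name_count)
```

The first number is the one that reflects the actual number of trainable values; the second is inflated by the aliases.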

Run the following code:

for name, layer in model.named_children():
    print(name, type(layer), sep=" ")

The result is:

commonlayer <class 'torch.nn.modules.linear.Linear'>
fatherlayer1 <class 'torch.nn.modules.container.Sequential'>
layer3 <class 'torch.nn.modules.linear.Linear'>
fatherlayer4 <class 'torch.nn.modules.container.Sequential'>

The duplicate layer layer2 is not printed here either.
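To find out which names point at the same layer, the entries of named_modules(remove_duplicate=False) can be grouped by object identity. Below is a minimal sketch; find_aliases and SharedNet are hypothetical names introduced only for this illustration.

```python
from collections import defaultdict

import torch


def find_aliases(model):
    """Return lists of names that refer to the same module instance."""
    names_by_id = defaultdict(list)
    for name, module in model.named_modules(remove_duplicate=False):
        names_by_id[id(module)].append(name)
    # Keep only the instances that are registered under several names.
    return [names for names in names_by_id.values() if len(names) > 1]


class SharedNet(torch.nn.Module):  # hypothetical model for illustration
    def __init__(self):
        super().__init__()
        self.common = torch.nn.Linear(4, 4)
        self.block = torch.nn.Sequential(torch.nn.Linear(4, 4), self.common)
        self.alias = self.common


aliases = find_aliases(SharedNet())
print(aliases)
```

The first name in each group is the one that the de-duplicated listings keep, since it is the first one reached during traversal.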