PyTorch nn.Module source code walkthrough (1)

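Today, while writing a classification network, I needed to use a single module inside an nn.Sequential. Because the modules in an nn.Sequential are not given descriptive names by default, I had no idea where to start. So I decided to write this post to sort out PyTorch's nn.Module class. After reading it, you will most likely be able to:

Extract any module from an nn.Sequential
Initialize all the weights of a network, whether randomly or from a weights file
Get an overall picture of the nn.Module class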

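1 The __init__ method

Before looking at the code, let's think for a moment about what this class ought to contain. At the very least it must store the operations themselves, such as a convolution or a ReLU, and the weights that go with those operations. Sure enough, the class has all of that: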

# Excerpt from the nn.Module class
from collections import OrderedDict
from ..backends.thnn import backend as thnn_backend

def __init__(self):
    self._backend = thnn_backend
    self._parameters = OrderedDict()         # learnable weights
    self._buffers = OrderedDict()            # stored but not learned state
    self._backward_hooks = OrderedDict()
    self._forward_hooks = OrderedDict()
    self._forward_pre_hooks = OrderedDict()
    self._modules = OrderedDict()            # direct sub-modules
    self.training = True

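Here _modules stores the sub-modules (the operations) and _parameters stores the weights. So what are the remaining odd-looking attributes? The first is _backend, whose value is thnn_backend. It looks tied to torch's low-level code, which sounds like trouble, but don't worry: a rough idea of what it means is enough, and it has no effect on how we use the module later. Say we have a max-pooling layer. Under the hood PyTorch may have both a cudnn implementation and a cunn implementation, and it automatically picks the fastest one during the forward pass. Here thnn_backend specifies that the forward pass uses the thnn implementation. For more intuition about backends, see this thread.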
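To make the roles of _modules and _parameters concrete, here is a minimal sketch, assuming a recent PyTorch install. The class name Tiny and its attributes fc and scale are made up for illustration. Assigning a sub-module or an nn.Parameter to an attribute is what registers it in the corresponding dictionary:

```python
import torch
import torch.nn as nn


class Tiny(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3, 1)                 # registered in _modules
        self.scale = nn.Parameter(torch.ones(1))  # registered in _parameters


net = Tiny()
module_names = list(net._modules.keys())
direct_param_names = list(net._parameters.keys())
all_param_names = [name for name, _ in net.named_parameters()]

print(module_names)        # ['fc']
print(direct_param_names)  # ['scale']
print(all_param_names)     # ['scale', 'fc.weight', 'fc.bias']
```

Note that _parameters only holds the parameters owned directly by this module; the weights of fc live in fc._parameters and are reached by recursing through _modules.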

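The second odd thing is the _buffers attribute. Would it confuse you even more if I said that buffers also store network parameters? A hint: the parameters of batch norm. (Take a minute to think about it.) Besides the learnable parameters $\gamma$ and $\beta$, batch norm also keeps running_mean and running_var. Values like these have to be stored throughout training but are not learned, so we put them in buffers.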
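The split between parameters and buffers can be checked directly on a batch norm layer. This is a small sketch assuming a recent PyTorch version (named_buffers is not available in very old releases, and the exact list of buffers may vary between versions):

```python
import torch.nn as nn

bn = nn.BatchNorm2d(1)

param_names = [name for name, _ in bn.named_parameters()]
buffer_names = [name for name, _ in bn.named_buffers()]

print(param_names)   # ['weight', 'bias'], i.e. gamma and beta
print(buffer_names)  # includes 'running_mean' and 'running_var'

# Buffers are saved together with the model, but they are not trained.
running_mean_requires_grad = bn.running_mean.requires_grad
running_mean_is_saved = "running_mean" in bn.state_dict()
print(running_mean_requires_grad)  # False
print(running_mean_is_saved)       # True
```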

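The third odd thing is the three hooks. I won't go through how they are implemented (too much typing), but for the curious: if a forward hook lets you do something once the module has finished its forward pass, can you infer what the other two hooks do? (Answer)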
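For readers who want to check their guess, here is a small sketch that records the order in which two of the hooks fire. It assumes a recent PyTorch version; the function names pre_hook and fwd_hook and the list calls are made up for illustration. The backward hook is left out of the code; it is invoked during the backward pass, when gradients for the module are computed.

```python
import torch
import torch.nn as nn

calls = []  # records which hook fired, in order


def pre_hook(module, inputs):
    # Runs just before module.forward is called.
    calls.append("forward_pre")


def fwd_hook(module, inputs, output):
    # Runs right after module.forward has produced its output.
    calls.append("forward")


layer = nn.Linear(3, 1)
pre_handle = layer.register_forward_pre_hook(pre_hook)
fwd_handle = layer.register_forward_hook(fwd_hook)

layer(torch.randn(2, 3))
print(calls)  # ['forward_pre', 'forward']

# Each register_* call returns a handle that removes the hook again.
pre_handle.remove()
fwd_handle.remove()
hooks_left = len(layer._forward_hooks) + len(layer._forward_pre_hooks)
print(hooks_left)  # 0
```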

2 The children and modules methods

The children and modules methods do very similar things. Let's look at the code for children first.

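def named_children(self):
    # Yield (name, module) for each direct child, skipping None and duplicates.
    memo = set()
    for name, module in self._modules.items():
        if module is not None and module not in memo:
            memo.add(module)
            yield name, module

def children(self):
    # Same as named_children, but without the names.
    for name, module in self.named_children():
        yield module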

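The code shows that children simply walks through _modules once, so it only yields the direct sub-modules. Let's look at a concrete example (you can type these commands into a REPL yourself; try not to copy and paste):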

>>> import torch.nn as nn
>>> model = nn.Sequential(nn.Linear(3, 1),
...                       nn.Sequential(nn.BatchNorm2d(1),
...                                     nn.Linear(1, 3)))
>>> for m in model.children():
...     print(m)
Linear(in_features=3, out_features=1, bias=True)
Sequential(
  (0): BatchNorm2d(1, eps=1e-05, momentum=0.1, affine=True)
  (1): Linear(in_features=1, out_features=3, bias=True)
)
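Since children yields the direct sub-modules in order, one way to pick out a specific module is to index into that sequence. The sketch below, assuming a recent PyTorch version, rebuilds the same model and also shows that nn.Sequential supports integer indexing directly, which gives the same result:

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(3, 1),
                      nn.Sequential(nn.BatchNorm2d(1),
                                    nn.Linear(1, 3)))

# Sub-modules of an nn.Sequential are named by their position.
child_names = [name for name, _ in model.named_children()]
print(child_names)  # ['0', '1']

# Option 1: materialize children() and index the list.
inner = list(model.children())[1]

# Option 2: index the nn.Sequential itself.
same_inner = model[1]
bn = model[1][0]

print(inner is same_inner)    # True
print(type(bn).__name__)      # BatchNorm2d
```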

If we also want to pull out the modules nested deeper inside, we need the modules method:

def named_modules(self, memo=None, prefix=''):
    if memo is None:
        memo = set()
    if self not in memo:
        memo.add(self)
        yield prefix, self
        for name, module in self._modules.items():
            if module is None:
                continue
            submodule_prefix = prefix + ('.' if prefix else '') + name
            for m in module.named_modules(memo, submodule_prefix):
                yield m

def modules(self):
    for name, module in self.named_modules():
        yield module

>>> for m in model.modules():
...     print(m)
Sequential(
  (0): Linear(in_features=3, out_features=1, bias=True)
  (1): Sequential(
    (0): BatchNorm2d(1, eps=1e-05, momentum=0.1, affine=True)
    (1): Linear(in_features=1, out_features=3, bias=True)
  )
)
Linear(in_features=3, out_features=1, bias=True)
Sequential(
  (0): BatchNorm2d(1, eps=1e-05, momentum=0.1, affine=True)
  (1): Linear(in_features=1, out_features=3, bias=True)
)
BatchNorm2d(1, eps=1e-05, momentum=0.1, affine=True)
Linear(in_features=1, out_features=3, bias=True)
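Two details of named_modules are easy to miss in the printed output: the dotted prefix built up during recursion, and the memo set that prevents a module from being yielded twice. A short sketch, assuming a recent PyTorch version (the variable names shared and twice are made up for illustration):

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(3, 1),
                      nn.Sequential(nn.BatchNorm2d(1),
                                    nn.Linear(1, 3)))

# The root module has the empty prefix; nested names are joined with '.'.
names = [name for name, _ in model.named_modules()]
print(names)  # ['', '0', '1', '1.0', '1.1']

# The same module object registered twice is only yielded once,
# because named_modules tracks visited modules in its memo set.
shared = nn.Linear(2, 2)
twice = nn.Sequential(shared, shared)
unique_count = len(list(twice.modules()))
print(unique_count)  # 2: the container plus one Linear
```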

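Having understood these two methods, you should be able to accomplish our first goal: extracting any module from an nn.Sequential. In the next part we will tackle the second goal, initializing weights. Before we finish, a small question for you: sometimes you only need the lowest-level modules and not the container modules that wrap them, such as nn.Sequential. How would you filter those out?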
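One possible answer to the question above, offered as a sketch and not the only approach: treat a module as lowest-level when it has no children of its own. The name leaves is made up for illustration, and a recent PyTorch version is assumed.

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(3, 1),
                      nn.Sequential(nn.BatchNorm2d(1),
                                    nn.Linear(1, 3)))

# Keep only modules that have no sub-modules; this drops both
# nn.Sequential containers and leaves the actual layers.
leaves = [m for m in model.modules() if len(list(m.children())) == 0]
leaf_types = [type(m).__name__ for m in leaves]
print(leaf_types)  # ['Linear', 'BatchNorm2d', 'Linear']
```

An alternative is to filter with isinstance against nn.Sequential, but that would miss other kinds of containers, which is why the sketch checks for children instead.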

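Machine learning tip: the name "support vector machine" reflects the fact that the model we finally choose depends only on the support vectors.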

Last edited 2018-10-10 19:39:53. If you spot any mistakes, corrections are welcome.