Reading Notes on "FractalNet: Ultra-Deep Neural Networks without Residuals"

1. Paper

"FractalNet: Ultra-Deep Neural Networks without Residuals"

The paper introduces a design strategy for neural network macro-architecture based on self-similarity. Repeatedly applying a simple expansion rule generates deep networks whose structural layouts are precisely truncated fractals. These networks contain interacting subpaths of different lengths, but no pass-through or residual connections; every internal signal is transformed by a filter and a nonlinearity before being seen by subsequent layers. In experiments, fractal networks match the excellent performance of standard residual networks on both CIFAR and ImageNet classification, demonstrating that residual representations are not fundamental to the success of extremely deep convolutional networks. Rather, the key may be the ability to transition, during training, from effectively shallow to deep. The authors note a resemblance to student-teacher behavior and develop drop-path, a natural extension of dropout, to regularize the co-adaptation of subpaths in fractal architectures. This regularization allows high-performance fixed-depth subnetworks to be extracted. In addition, fractal networks have an anytime property: shallow subnetworks provide fast answers, while deeper subnetworks, at higher latency, provide more accurate ones.
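
To make the drop-path idea concrete, here is a minimal sketch of local sampling at a single join (a hypothetical helper, not the repository's code; the per-batch implementation actually used appears in the code in section 4). Each input path is dropped independently with a fixed probability, at least one path always survives, and the surviving paths are averaged.

# Hypothetical sketch of local drop-path at a single join layer.
import numpy as np

def join_with_local_drop_path(paths, p_drop=0.15, training=True):
    """paths: list of arrays/tensors of identical shape.
    Returns the element-wise mean over the paths that survive drop-path."""
    if training:
        keep = np.random.rand(len(paths)) >= p_drop   # drop each path independently
        if not keep.any():                            # ensure at least one path survives
            keep[np.random.randint(len(paths))] = True
        paths = [p for p, k in zip(paths, keep) if k]
    return sum(paths) / len(paths)

Global sampling instead selects a single column for the entire network, which is what makes it possible to extract a fixed-depth subnetwork at test time.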

2. Learning Resources

Fractal network structure: notes on "FractalNet: Ultra-Deep Neural Networks without Residuals"

Paper notes: fractal networks (FractalNet: Ultra-Deep Neural Networks without Residuals)

Ultra-deep neural networks based on fractal structures, surpassing ImageNet 2015 winner ResNet (with paper download)

3. Network Model

Figure 1: Fractal architecture. Left: a simple expansion rule generates a fractal architecture with C intertwined columns. The base case, f_1, has a single layer of the chosen type (e.g. convolution) between input and output. Join layers compute the element-wise mean. Right: deep convolutional networks periodically reduce spatial resolution via pooling. A fractal version uses f_C as the building block between pooling layers. Stacking B such blocks yields a network whose total depth, measured in convolutional layers, is B * 2^(C-1). This example has depth 40 (B = 5, C = 4). (The real emphasis of this network, however, is drop-path regularization.)
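
Written out, the expansion rule from the paper is (with $\oplus$ denoting the join operation, here an element-wise mean):

$$f_1(z) = \mathrm{conv}(z), \qquad f_{C+1}(z) = \big[(f_C \circ f_C)(z)\big] \oplus \big[\mathrm{conv}(z)\big]$$

The deepest path through $f_C$ contains $2^{C-1}$ convolutional layers, so stacking $B$ blocks gives a total depth of $B \cdot 2^{C-1}$; for the example above, $5 \cdot 2^{3} = 40$.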

4. Code

GitHub: https://github.com/khanrc/pt.fractalnet

""" Fractal Model - per batch drop path """
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np


class Flatten(nn.Module):
    def forward(self, x):
        return x.view(x.size(0), -1)


class ConvBlock(nn.Module):
    """ Conv - Dropout - BN - ReLU """
    def __init__(self, C_in, C_out, kernel_size=3, stride=1, padding=1, dropout=None,
                 pad_type='zero'):
        super().__init__()

        if pad_type == 'zero':
            self.pad = nn.ZeroPad2d(padding)
        elif pad_type == 'reflect':
            # [!] the paper used reflect padding - just for data augmentation?
            self.pad = nn.ReflectionPad2d(padding)
        else:
            raise ValueError(pad_type)

        self.conv = nn.Conv2d(C_in, C_out, kernel_size, stride, padding=0, bias=False)
        if dropout is not None and dropout > 0.:
            self.dropout = nn.Dropout2d(p=dropout, inplace=True)
        else:
            self.dropout = None
        self.bn = nn.BatchNorm2d(C_out)

    def forward(self, x):
        out = self.pad(x)
        out = self.conv(out)
        if self.dropout:
            out = self.dropout(out)
        out = self.bn(out)
        out = F.relu_(out)

        return out


class FractalBlock(nn.Module):
    def __init__(self, n_columns, C_in, C_out, p_local_drop, p_dropout, global_drop_ratio,
                 pad_type='zero', doubling=False):
        """ Fractal block
        Args:
            - n_columns: # of columns
            - C_in: channel_in
            - C_out: channel_out
            - p_local_drop: local droppath prob
            - p_dropout: dropout prob
            - global_drop_ratio: global droppath ratio
            - pad_type: padding type of conv
            - doubling: if True, doubling by 1x1 conv in front of the block.
        """
        super().__init__()

        self.n_columns = n_columns
        self.columns = nn.ModuleList([nn.ModuleList() for _ in range(n_columns)])
        self.max_depth = 2 ** (n_columns-1)
        self.p_local_drop = p_local_drop
        self.global_drop_ratio = global_drop_ratio

        if doubling:
            #self.doubler = nn.Conv2d(C_in, C_out, 1, padding=0)
            self.doubler = ConvBlock(C_in, C_out, 1, padding=0)
        else:
            self.doubler = None

        dist = self.max_depth
        self.count = np.zeros([self.max_depth], dtype=int)  # np.int is removed in recent NumPy
        for col in self.columns:
            for i in range(self.max_depth):
                if (i+1) % dist == 0:
                    first_block = (i+1 == dist) # first block in this column
                    if first_block and not doubling:
                        # if doubling, the input channel size is always C_out.
                        cur_C_in = C_in
                    else:
                        cur_C_in = C_out
                    module = ConvBlock(cur_C_in, C_out, dropout=p_dropout, pad_type=pad_type)
                    self.count[i] += 1
                else:
                    module = None

                col.append(module)

            dist //= 2

    def local_drop_sampler(self, N):
        """ drop path probs sampler """
        drops = np.random.binomial(1, self.p_local_drop, size=[N]).astype(bool)
        if drops.all(): # all paths dropped: keep at least one
            i = np.random.randint(0, N)
            drops[i] = False

        return drops

    def join(self, outs):
        """ join with local drop path
        outs: [cols, N, C, H, W] (list)
        [!] make it better:
            - per-sample (not per-batch) local drop path
            - fully numpy or torch based (no for loop)
        """
        if len(outs) == 1:
            return outs[0]

        # apply local drop path only in training
        if self.training:
            # local drop path
            drops = self.local_drop_sampler(len(outs))
            outs = [o for drop, o in zip(drops, outs) if not drop]

        out = torch.stack(outs)
        return out.mean(dim=0)

    def forward_global(self, x, global_col):
        """ Global drop path """
        out = self.doubler(x) if self.doubler else x
        dist = 2 ** (self.n_columns-1 - global_col) # distance between module
        for i in range(dist-1, self.max_depth, dist):
            out = self.columns[global_col][i](out)

        return out

    def forward_local(self, x):
        """ Local drop path """
        out = self.doubler(x) if self.doubler else x
        outs = [out] * self.n_columns
        for i in range(self.max_depth):
            st = self.n_columns - self.count[i]
            cur_outs = [] # outs of current depth

            for c in range(st, self.n_columns):
                cur_in = outs[c] # current input
                cur_module = self.columns[c][i] # current module
                cur_outs.append(cur_module(cur_in))

            # join
            #print("join in depth = {}, # of in_join = {}".format(i, len(cur_out)))
            joined = self.join(cur_outs)

            for c in range(st, self.n_columns):
                outs[c] = joined

        return outs[0]

    def forward(self, x, deepest=False):
        if not self.training:
            # eval
            if deepest:
                deepest_col = self.n_columns-1
                return self.forward_global(x, deepest_col)
            else:
                return self.forward_local(x)
        else:
            # training
            if np.random.rand() < self.global_drop_ratio:
                global_col = np.random.randint(0, self.n_columns)
                return self.forward_global(x, global_col)
            else:
                return self.forward_local(x)


class FractalNet(nn.Module):
    def __init__(self, data_shape, n_columns, channels, p_local_drop, dropout_probs,
                 global_drop_ratio, gap=0, init='xavier', pad_type='zero', doubling=False):
        """
        Args:
            - data_shape: (C, H, W, n_classes). e.g. (3, 32, 32, 10) - CIFAR 10.
            - n_columns: the number of columns
            - channels: channel outs (list)
            - p_local_drop: local drop prob
            - dropout_probs: dropout probs (list)
            - global_drop_ratio: global droppath ratio
        """
        super().__init__()
        self.B = len(channels) # the number of blocks
        C_in, H, W, n_classes = data_shape
        assert len(channels) == len(dropout_probs)
        assert H == W
        size = H

        layers = []
        C_out = C_in # the data channels act as the C_out of a virtual block 0.
        total_layers = 0
        for b, (C, p_dropout) in enumerate(zip(channels, dropout_probs)):
            C_in, C_out = C_out, C
            #print("Channel in = {}, Channel out = {}".format(C_in, C_out))
            fb = FractalBlock(n_columns, C_in, C_out, p_local_drop, p_dropout, global_drop_ratio,
                              pad_type=pad_type, doubling=doubling)
            layers.append(fb)
            if gap == 0 or b < self.B-1:
                # Originally, every pool is max-pool in the paper (No GAP).
                layers.append(nn.MaxPool2d(2))
            elif gap == 1:
                # last layer and gap == 1
                layers.append(nn.AdaptiveAvgPool2d(1)) # average pooling

            size //= 2
            total_layers += fb.max_depth

        print("Last featuremap size = {}".format(size))
        print("Total layers = {}".format(total_layers))

        if gap == 2:
            layers.append(nn.Conv2d(channels[-1], n_classes, 1, padding=0)) # 1x1 conv to n_classes channels
            layers.append(nn.AdaptiveAvgPool2d(1)) # gap
            layers.append(Flatten())
        else:
            layers.append(Flatten())
            layers.append(nn.Linear(channels[-1] * size * size, n_classes)) # fc layer

        self.net = nn.Sequential(*layers)

        if init == 'xavier':
            # xavier init as in the paper
            for n, p in self.named_parameters():
                if p.dim() > 1: # weights only
                    nn.init.xavier_uniform_(p)
                else: # bn w/b or bias
                    if 'bn.weight' in n:
                        nn.init.ones_(p)
                    else:
                        nn.init.zeros_(p)

    def forward(self, x):
        out = self.net(x)
        return out
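
A minimal usage sketch follows. The hyperparameters are illustrative assumptions that loosely follow the paper's CIFAR-10 configuration (C = 3 columns, B = 5 blocks); they are not necessarily the repository's defaults.

# Hypothetical instantiation of the FractalNet class defined above.
import torch

model = FractalNet(
    data_shape=(3, 32, 32, 10),               # CIFAR-10: 3x32x32 images, 10 classes
    n_columns=3,                               # block depth = 2^(C-1) = 4 conv layers
    channels=[64, 128, 256, 512, 512],         # B = 5 blocks, max-pool after each one
    p_local_drop=0.15,                         # local drop-path probability
    dropout_probs=[0.0, 0.1, 0.2, 0.3, 0.4],   # per-block dropout rates
    global_drop_ratio=0.5,                     # fraction of batches using global drop-path
)

model.train()
x = torch.randn(8, 3, 32, 32)                  # dummy batch of 8 images
logits = model(x)
print(logits.shape)                            # torch.Size([8, 10])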


Reposted from: blog.csdn.net/LiuJiuXiaoShiTou/article/details/106068968