Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
Inception-v4、Inception-ResNet和残差连接对学习的影响

本文提出Inception-v4、Inception-ResNet-V1、Inception-ResNet-V2架构；

google团队；

发表时间：[Submitted on 23 Feb 2016 (v1), last revised 23 Aug 2016 (this version, v2)]；

发表会议/期刊：Computer Vision and Pattern Recognition；

论文地址：https://arxiv.org/abs/1602.07261；

Inception发展演变：

GoogLeNet/Inception V1：2014年9月《Going deeper with convolutions》；
BN-Inception 2015年2月《Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift》；
Inception V2/V3 2015年12月《Rethinking the Inception Architecture for Computer Vision》；
Inception V4、Inception-ResNet 2016年2月《Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning》；
Xception 2016年10月《Xception: Deep Learning with Depthwise Separable Convolutions》；

0 摘要

深层卷积网络近年来图像识别性能最大进步的核心；Inception结构也被证明是一个计算成本低、性能好的网络架构；

最何恺明团队提出残差架构，在2015ILSVRC挑战中，取得最好的成绩；ResNet性能与最新一代Inception-v3网络相似；

这就提出一个问题：将Inception架构与残差连接架构综合起来，是否可以提升性能？

本文用实验说明，残差结构可以显著的加快网络的训练，并且有残差模块的网络比没有残差模块的网络性能稍高；

本文提出Inception-ResNet-V1、Inception-ResNet-V2和Inception-V4；

在2012年ILSVRC分类任务上评估性能；

本文进一步证明了适当的缩放如何稳定非常广泛的残余初始网络的训练；

1 简介

自2012年(AlexNet)以来，深度学习在计算机视觉各种任务中表现突出；

本文结合两个最新想法：残差连接和Inception模块；残差连接对于训练非常深的网络架构非常重要，由于Inception网络通常很深，自然想到用残差连接来替换Inception架构的filter concatenation操作；Inception在保持其计算效率的同时还可以获得残差连接的好处；

除了研究两者的集成，本文还研究了Inception模块本身是否可以通过增加深度和宽度，让模型变得更有效，为此，设计出了Inception-V4模型，它比Inception-v3具有更统一的简化架构和更多的初始模块。

本文，比较了Inception-v3和Inception-v4；
比较了Inception-Resnet-v1，Inception-Resnet-v2；

模型设计基本原则：与非残差模型参数/计算复杂度相比，新模型的参数量和计算复杂度不能增加；

2 相关工作

AlexNet；

VGG；

GoogLeNet；

残差连接(Residual connection)，如下图所示：

左图：将卷积处理后的结果与原来的结构相加，进行激活传给下一层；

右图：先用1×1卷积进行降维，减少计算量，再进行卷积，和原来的结构相加，进行激活传给下一层；

图：左，原生残差连接；右，优化后的残差连接；

3 架构选择

3.1 纯Inception block——Inception-V4

早期Inception模块必须进行分区训练，才能把整个网络放入内存中(内存算力不够的情况)；

然而Inception架构是高度可变的：各个层，滤波器数量都可以调整——这些变化不会影响完全训练的网络质量；

之前为了优化训练速度，需要人工来平衡各个子网络之间的大小，但是——

随着TensorFlow的引入，模型可以在不分区的情况下进行训练(这是因为TensorFlow反向传播使用的内存进行了优化，减少张量数量等实现)；

之前都是单独的调整一个模块，导致网络看起来非常复杂，现在的Inception-V4，采用了一个统一的策略来调整；

Inception-V4架构概述见图9，其中各个模块详情见图3~图8；

3.1.1 pytorch inception-v4架构实现

图9：Inception-V4架构

class Inception_v4(nn.Module):
    def __init__(self):
        super(Inception_v4, self).__init__()
        self.stem = Stem(3, 384)
        self.icpA = InceptionA(384, 384)
        self.redA = ReductionA(384, 1024, 192, 224, 256, 384)
        self.icpB = InceptionB(1024, 1024)
        self.redB = ReductionB(1024, 1536)
        self.icpC = InceptionC(1536, 1536)
        self.avgpool = nn.AvgPool2d(kernel_size=8)
        self.dropout = nn.Dropout(p=0.8)
        self.linear = nn.Linear(1536, 1000)

    def forward(self, x):
        # Stem Module
        out = self.stem(x)
        # InceptionA Module * 4
        out = self.icpA(self.icpA(self.icpA(self.icpA(out))))
        # ReductionA Module
        out = self.redA(out)
        # InceptionB Module * 7
        out = self.icpB(self.icpB(self.icpB(self.icpB(self.icpB(self.icpB(self.icpB(out)))))))
        # ReductionB Module
        out = self.redB(out)
        # InceptionC Module * 3
        out = self.icpC(self.icpC(self.icpC(out)))
        # Average Pooling
        out = self.avgpool(out)
        out = out.view(out.size(0), -1)
        # Dropout
        out = self.dropout(out)
        # Linear(Softmax)
        out = self.linear(out)

        return out

3.1.2 pytorch Stem模块实现

图3：Stem模块

class Stem(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Stem, self).__init__()
        # conv3*3(32 stride2 valid)
        self.conv1 = BasicConv2d(in_channels, 32, kernel_size=3, stride=2)
        # conv3*3(32 valid)
        self.conv2 = BasicConv2d(32, 32, kernel_size=3)
        # conv3*3(64)
        self.conv3 = BasicConv2d(32, 64, kernel_size=3, padding=1)
        # maxpool3*3(stride2 valid) & conv3*3(96 stride2 valid)
        self.maxpool4 = nn.MaxPool2d(kernel_size=3, stride=2)
        self.conv4 = BasicConv2d(64, 96, kernel_size=3, stride=2)

        # conv1*1(64) --> conv3*3(96 valid)
        self.conv5_1_1 = BasicConv2d(160, 64, kernel_size=1)
        self.conv5_1_2 = BasicConv2d(64, 96, kernel_size=3)
        # conv1*1(64) --> conv7*1(64) --> conv1*7(64) --> conv3*3(96 valid)
        self.conv5_2_1 = BasicConv2d(160, 64, kernel_size=1)
        self.conv5_2_2 = BasicConv2d(64, 64, kernel_size=(7, 1), padding=(3, 0))
        self.conv5_2_3 = BasicConv2d(64, 64, kernel_size=(1, 7), padding=(0, 3))
        self.conv5_2_4 = BasicConv2d(64, 96, kernel_size=3)

        # conv3*3(192 valid)
        self.conv6 = BasicConv2d(192, 192, kernel_size=3, stride=2)
        # maxpool3*3(stride2 valid)
        self.maxpool6 = nn.MaxPool2d(kernel_size=3, stride=2)

    def forward(self, x):
        # 两个分支 concat
        y1_1 = self.maxpool4(self.conv3(self.conv2(self.conv1(x))))
        y1_2 = self.conv4(self.conv3(self.conv2(self.conv1(x))))
        y1 = torch.cat([y1_1, y1_2], 1)

        # 分支
        y2_1 = self.conv5_1_2(self.conv5_1_1(y1))
        y2_2 = self.conv5_2_4(self.conv5_2_3(self.conv5_2_2(self.conv5_2_1(y1))))
        y2 = torch.cat([y2_1, y2_2], 1)

        y3_1 = self.conv6(y2)
        y3_2 = self.maxpool6(y2)
        y3 = torch.cat([y3_1, y3_2], 1)

        return y3

3.1.3 pytorch Inception-A

图4：Inception-A模块，把5*5卷积换成3*3卷积

class InceptionA(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(InceptionA, self).__init__()
        # branch1: avgpool --> conv1*1(96)
        self.b1_1 = nn.AvgPool2d(kernel_size=3, padding=1, stride=1)
        self.b1_2 = BasicConv2d(in_channels, 96, kernel_size=1)

        # branch2: conv1*1(96)
        self.b2 = BasicConv2d(in_channels, 96, kernel_size=1)

        # branch3: conv1*1(64) --> conv3*3(96)
        self.b3_1 = BasicConv2d(in_channels, 64, kernel_size=1)
        self.b3_2 = BasicConv2d(64, 96, kernel_size=3, padding=1)

        # branch4: conv1*1(64) --> conv3*3(96) --> conv3*3(96)
        self.b4_1 = BasicConv2d(in_channels, 64, kernel_size=1)
        self.b4_2 = BasicConv2d(64, 96, kernel_size=3, padding=1)
        self.b4_3 = BasicConv2d(96, 96, kernel_size=3, padding=1)

    def forward(self, x):
        y1 = self.b1_2(self.b1_1(x))
        y2 = self.b2(x)
        y3 = self.b3_2(self.b3_1(x))
        y4 = self.b4_3(self.b4_2(self.b4_1(x)))

        outputsA = [y1, y2, y3, y4]
        return torch.cat(outputsA, 1)

3.1.4 pytorch Inception-B

图5：Inception-B模块，把7*7卷积换成7*1卷积和1*7卷积

class InceptionB(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(InceptionB, self).__init__()
        # branch1: avgpool --> conv1*1(128)
        self.b1_1 = nn.AvgPool2d(kernel_size=3, padding=1, stride=1)
        self.b1_2 = BasicConv2d(in_channels, 128, kernel_size=1)

        # branch2: conv1*1(384)
        self.b2 = BasicConv2d(in_channels, 384, kernel_size=1)

        # branch3: conv1*1(192) --> conv1*7(224) --> conv1*7(256)
        self.b3_1 = BasicConv2d(in_channels, 192, kernel_size=1)
        self.b3_2 = BasicConv2d(192, 224, kernel_size=(1, 7), padding=(0, 3))
        self.b3_3 = BasicConv2d(224, 256, kernel_size=(1, 7), padding=(0, 3))

        # branch4: conv1*1(192) --> conv1*7(192) --> conv7*1(224) --> conv1*7(224) --> conv7*1(256)
        self.b4_1 = BasicConv2d(in_channels, 192, kernel_size=1, stride=1)
        self.b4_2 = BasicConv2d(192, 192, kernel_size=(1, 7), padding=(0, 3))
        self.b4_3 = BasicConv2d(192, 224, kernel_size=(7, 1), padding=(3, 0))
        self.b4_4 = BasicConv2d(224, 224, kernel_size=(1, 7), padding=(0, 3))
        self.b4_5 = BasicConv2d(224, 256, kernel_size=(7, 1), padding=(3, 0))

    def forward(self, x):
        y1 = self.b1_2(self.b1_1(x))
        y2 = self.b2(x)
        y3 = self.b3_3(self.b3_2(self.b3_1(x)))
        y4 = self.b4_5(self.b4_4(self.b4_3(self.b4_2(self.b4_1(x)))))

        outputsB = [y1, y2, y3, y4]
        return torch.cat(outputsB, 1)

3.1.5 pytorch Inception-C

图6：Inception-C模块，加宽模型

class InceptionC(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(InceptionC, self).__init__()
        # branch1: avgpool --> conv1*1(256)
        self.b1_1 = nn.AvgPool2d(kernel_size=3, padding=1, stride=1)
        self.b1_2 = BasicConv2d(in_channels, 256, kernel_size=1)

        # branch2: conv1*1(256)
        self.b2 = BasicConv2d(in_channels, 256, kernel_size=1)

        # branch3: conv1*1(384) --> conv1*3(256) & conv3*1(256)
        self.b3_1 = BasicConv2d(in_channels, 384, kernel_size=1)
        self.b3_2_1 = BasicConv2d(384, 256, kernel_size=(1, 3), padding=(0, 1))
        self.b3_2_2 = BasicConv2d(384, 256, kernel_size=(3, 1), padding=(1, 0))

        # branch4: conv1*1(384) --> conv1*3(448) --> conv3*1(512) --> conv3*1(256) & conv7*1(256)
        self.b4_1 = BasicConv2d(in_channels, 384, kernel_size=1, stride=1)
        self.b4_2 = BasicConv2d(384, 448, kernel_size=(1, 3), padding=(0, 1))
        self.b4_3 = BasicConv2d(448, 512, kernel_size=(3, 1), padding=(1, 0))
        self.b4_4_1 = BasicConv2d(512, 256, kernel_size=(3, 1), padding=(1, 0))
        self.b4_4_2 = BasicConv2d(512, 256, kernel_size=(1, 3), padding=(0, 1))

    def forward(self, x):
        y1 = self.b1_2(self.b1_1(x))
        y2 = self.b2(x)
        y3_1 = self.b3_2_1(self.b3_1(x))
        y3_2 = self.b3_2_2(self.b3_1(x))
        y4_1 = self.b4_4_1(self.b4_3(self.b4_2(self.b4_1(x))))
        y4_2 = self.b4_4_2(self.b4_3(self.b4_2(self.b4_1(x))))

        outputsC = [y1, y2, y3_1, y3_2, y4_1, y4_2]
        return torch.cat(outputsC, 1)

3.1.6 pytorch Reduction-A

图7：Recution-A模块

class ReductionA(nn.Module):
    def __init__(self, in_channels, out_channels, k, l, m, n):
        super(ReductionA, self).__init__()
        # branch1: maxpool3*3(stride2 valid)
        self.b1 = nn.MaxPool2d(kernel_size=3, stride=2)

        # branch2: conv3*3(n stride2 valid)
        self.b2 = BasicConv2d(in_channels, n, kernel_size=3, stride=2)

        # branch3: conv1*1(k) --> conv3*3(l) --> conv3*3(m stride2 valid)
        self.b3_1 = BasicConv2d(in_channels, k, kernel_size=1)
        self.b3_2 = BasicConv2d(k, l, kernel_size=3, padding=1)
        self.b3_3 = BasicConv2d(l, m, kernel_size=3, stride=2)

    def forward(self, x):
        y1 = self.b1(x)
        y2 = self.b2(x)
        y3 = self.b3_3(self.b3_2(self.b3_1(x)))

        outputsRedA = [y1, y2, y3]
        return torch.cat(outputsRedA, 1)

3.1.7 pytorch Reduction-B

图8：Recution-B模块

class ReductionB(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(ReductionB, self).__init__()
        # branch1: maxpool3*3(stride2 valid)
        self.b1 = nn.MaxPool2d(kernel_size=3, stride=2)

        # branch2: conv1*1(192) --> conv3*3(192 stride2 valid)
        self.b2_1 = BasicConv2d(in_channels, 192, kernel_size=1)
        self.b2_2 = BasicConv2d(192, 192, kernel_size=3, stride=2)

        # branch3: conv1*1(256) --> conv1*7(256) --> conv7*1(320) --> conv3*3(320 stride2 valid)
        self.b3_1 = BasicConv2d(in_channels, 256, kernel_size=1)
        self.b3_2 = BasicConv2d(256, 256, kernel_size=(1, 7), padding=(0, 3))
        self.b3_3 = BasicConv2d(256, 320, kernel_size=(7, 1), padding=(3, 0))
        self.b3_4 = BasicConv2d(320, 320, kernel_size=3, stride=2)

    def forward(self, x):
        y1 = self.b1(x)
        y2 = self.b2_2(self.b2_1((x)))
        y3 = self.b3_4(self.b3_3(self.b3_2(self.b3_1(x))))

        outputsRedB = [y1, y2, y3]
        return torch.cat(outputsRedB, 1)

3.1.8 pytorch 基本卷积

import torch.nn as nn
import torch
import torch.nn.functional as F

class BasicConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return F.relu(x)

3.2 Residual Inception Blocks

对于Inception的res版本，本文使用比原Inception更轻的Inception模块；每个Inception模块之后是不带激活函数的1×1卷积模块，用来升维(为了匹配输入维度，逐元素的相加)；

本文尝试了几个不同的残差模块与Inception模块相结合，这里重点介绍两个版本，一个是Inception-ResNet-v1，性能大致相当于Inception-v3，一个是Inception-ResNet-v2，性能相当于新引入的Inception-v4原始版本；

3.2.1 Inception-ResNet-V1 架构实现

架构见图15；

图15：Inception-ResNet-v1/v2架构，两种架构不同之处在使用的内部模块不同，组合顺序是一致的，如上图所示；

class Inception_ResNet(nn.Module):
    def __init__(self,classes=2):
        super(Inception_ResNet, self).__init__()
        blocks = []
        blocks.append(StemV1(in_planes=3))
        for i in range(5):
            blocks.append(Inception_ResNet_A(input=256))
        blocks.append(Reduction_A(input=256))
        for i in range(10):
            blocks.append(Inception_ResNet_B(input=896))
        blocks.append(Reduction_B(input=896))
        for i in range(10):
            blocks.append(Inception_ResNet_C(input=1792))

        self.features = nn.Sequential(*blocks)

        self.avepool = nn.AvgPool2d(kernel_size=3)

        self.dropout = nn.Dropout(p=0.2)
        self.linear = nn.Linear(1792, classes)

    def forward(self,x):
        x = self.features(x)
        # print("x",x.shape)
        x = self.avepool(x)
        # print("avepool", x.shape)
        x = self.dropout(x)
        # print("dropout", x.shape)
        x = x.view(x.size(0), -1)
        x = self.linear(x)

        return  x

3.2.2 Inception-ResNet-V1 Stem模块实现

图14：Inception-ResNet-V1 Stem

class StemV1(nn.Module):
    def __init__(self, in_planes):
        super(StemV1, self).__init__()

        self.conv1 = conv3x3(in_planes =in_planes,out_channels=32,stride=2, padding=0)
        self.conv2 = conv3x3(in_planes=32, out_channels=32,  stride=1, padding=0)
        self.conv3 = conv3x3(in_planes=32, out_channels=64, stride=1, padding=1)
        self.maxpool = nn.MaxPool2d(kernel_size=3,  stride=2, padding=0)
        self.conv4 = conv3x3(in_planes=64, out_channels=64,  stride=1, padding=1)
        self.conv5 = conv1x1(in_planes =64,out_channels=80,  stride=1, padding=0)
        self.conv6 = conv3x3(in_planes=80, out_channels=192, stride=1, padding=0)
        self.conv7 = conv3x3(in_planes=192, out_channels=256,  stride=2, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.maxpool(x)
        x = self.conv4(x)
        x = self.conv5(x)
        x = self.conv6(x)
        x = self.conv7(x)

        return x

3.2.3 Inception-ResNet-A

图10：Inception-ResNet-A

class Inception_ResNet_A(nn.Module):
    def __init__(self, input ):
        super(Inception_ResNet_A, self).__init__()
        self.conv1 = conv1x1(in_planes =input,out_channels=32,stride=1, padding=0)
        self.conv2 = conv3x3(in_planes=32, out_channels=32, stride=1, padding=1)
        self.line =  nn.Conv2d(96, 256, 1, stride=1, padding=0, bias=True)

        self.relu = nn.ReLU()

    def forward(self, x):

        c1 = self.conv1(x)
        # print("c1",c1.shape)
        c2 = self.conv1(x)
        # print("c2", c2.shape)
        c3 = self.conv1(x)
        # print("c3", c3.shape)
        c2_1 = self.conv2(c2)
        # print("c2_1", c2_1.shape)
        c3_1 = self.conv2(c3)
        # print("c3_1", c3_1.shape)
        c3_2 = self.conv2(c3_1)
        # print("c3_2", c3_2.shape)
        cat = torch.cat([c1, c2_1, c3_2],dim=1)#torch.Size([4, 96, 15, 15])
        # print("x",x.shape)

        line = self.line(cat)
        # print("line",line.shape)
        out =x+line
        out = self.relu(out)

        return out

3.2.4 Reduction-A

图7：Reduction-A

class Reduction_A(nn.Module):
    def __init__(self, input,n=384,k=192,l=224,m=256):
        super(Reduction_A, self).__init__()
        self.maxpool = nn.MaxPool2d(kernel_size=3,  stride=2, padding=0)
        self.conv1 = conv3x3(in_planes=input, out_channels=n,stride=2,padding=0)
        self.conv2 = conv1x1(in_planes=input, out_channels=k,padding=1)
        self.conv3 = conv3x3(in_planes=k,  out_channels=l,padding=0)
        self.conv4 = conv3x3(in_planes=l,  out_channels=m,stride=2,padding=0)

    def forward(self, x):

        c1 = self.maxpool(x)
        # print("c1",c1.shape)
        c2 = self.conv1(x)
        # print("c2", c2.shape)
        c3 = self.conv2(x)
        # print("c3", c3.shape)
        c3_1 = self.conv3(c3)
        # print("c3_1", c3_1.shape)
        c3_2 = self.conv4(c3_1)
        # print("c3_2", c3_2.shape)
        cat = torch.cat([c1, c2,c3_2], dim=1)

        return cat

3.2.5 Inception-ResNet-B

图11：Inception-ResNet-B

class Inception_ResNet_B(nn.Module):
    def __init__(self, input):
        super(Inception_ResNet_B, self).__init__()
        self.conv1 = conv1x1(in_planes =input,out_channels=128,stride=1, padding=0)
        self.conv1x7 = nn.Conv2d(in_channels=128,out_channels=128,kernel_size=(1,7), padding=(0,3))
        self.conv7x1 = nn.Conv2d(in_channels=128, out_channels=128,kernel_size=(7,1), padding=(3,0))
        self.line = nn.Conv2d(256, 896, 1, stride=1, padding=0, bias=True)

        self.relu = nn.ReLU()

    def forward(self, x):

        c1 = self.conv1(x)
        # print("c1",c1.shape)
        c2 = self.conv1(x)
        # print("c2", c2.shape)
        c2_1 = self.conv1x7(c2)
        # print("c2_1", c2_1.shape)

        c2_1 = self.relu(c2_1)

        c2_2 = self.conv7x1(c2_1)
        # print("c2_2", c2_2.shape)

        c2_2 = self.relu(c2_2)

        cat = torch.cat([c1, c2_2], dim=1)
        line = self.line(cat)
        out =x+line

        out = self.relu(out)
        # print("out", out.shape)
        return out

3.2.6 Reduction-B

图12：Reduction-B

class Reduction_B(nn.Module):
    def __init__(self, input):
        super(Reduction_B, self).__init__()
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.conv1 = conv1x1(in_planes=input,  out_channels=256, padding=1)
        self.conv2 = conv3x3(in_planes=256,  out_channels=384, stride=2, padding=0)
        self.conv3 = conv3x3(in_planes=256,  out_channels=256,stride=2, padding=0)
        self.conv4 = conv3x3(in_planes=256,  out_channels=256,  padding=1)
        self.conv5 = conv3x3(in_planes=256,  out_channels=256, stride=2, padding=0)
    def forward(self, x):
        c1 = self.maxpool(x)
        # print("c1", c1.shape)
        c2 = self.conv1(x)
        # print("c2", c2.shape)
        c3 = self.conv1(x)
        # print("c3", c3.shape)
        c4 = self.conv1(x)
        # print("c4", c4.shape)
        c2_1 = self.conv2(c2)
        # print("cc2_1", c2_1.shape)
        c3_1 = self.conv3(c3)
        # print("c3_1", c3_1.shape)
        c4_1 = self.conv4(c4)
        # print("c4_1", c4_1.shape)
        c4_2 = self.conv5(c4_1)
        # print("c4_2", c4_2.shape)
        cat = torch.cat([c1, c2_1, c3_1,c4_2], dim=1)
        # print("cat", cat.shape)
        return cat

3.2.7 Inception-ResNet-C

图13：Inception-ResNet-C

class Inception_ResNet_C(nn.Module):
    def __init__(self, input):
        super(Inception_ResNet_C, self).__init__()
        self.conv1 = conv1x1(in_planes=input, out_channels=192, stride=1, padding=0)
        self.conv1x3 = nn.Conv2d(in_channels=192, out_channels=192, kernel_size=(1, 3), padding=(0,1))
        self.conv3x1 = nn.Conv2d(in_channels=192, out_channels=192, kernel_size=(3, 1), padding=(1,0))
        self.line = nn.Conv2d(384, 1792, 1, stride=1, padding=0, bias=True)

        self.relu = nn.ReLU()

    def forward(self, x):
        c1 = self.conv1(x)
        # print("x", x.shape)
        # print("c1",c1.shape)
        c2 = self.conv1(x)
        # print("c2", c2.shape)
        c2_1 = self.conv1x3(c2)
        # print("c2_1", c2_1.shape)

        c2_1 = self.relu(c2_1)
        c2_2 = self.conv3x1(c2_1)
        # print("c2_2", c2_2.shape)

        c2_2 = self.relu(c2_2)
        cat = torch.cat([c1, c2_2], dim=1)
        # print("cat", cat.shape)
        line = self.line(cat)
        out = x+ line
        # print("out", out.shape)
        out = self.relu(out)

        return out

3.2.8 基本卷积模块

import torch.nn as nn



class conv3x3(nn.Module):
    def __init__(self, in_planes, out_channels, stride=1, padding=0):
        super(conv3x3, self).__init__()
        self.conv3x3 = nn.Sequential(
             nn.Conv2d(in_planes, out_channels, kernel_size=3, stride=stride, padding=padding),#卷积核为3x3
             nn.BatchNorm2d(out_channels),#BN层，防止过拟合以及梯度爆炸
             nn.ReLU()#激活函数
        )
        
    def forward(self, input):
        return self.conv3x3(input)
        
class conv1x1(nn.Module):
    def __init__(self, in_planes, out_channels, stride=1, padding=0):
        super(conv1x1, self).__init__()
        self.conv1x1 = nn.Sequential(
             nn.Conv2d(in_planes, out_channels, kernel_size=1, stride=stride, padding=padding),#卷积核为1x1
             nn.BatchNorm2d(out_channels),
             nn.ReLU()
        )

    def forward(self, input):
        return self.conv1x1(input)

一个技巧：在带残差连接的Inception模块里，batch-normalization只在传统的层之后进行，不在残差块求和之后进行；

因为虽然在每层之后都使用batch-normalization是有利的，但是在GPU中会导致内存爆炸，消耗大量内存；

3.3 Scaling of the Residuals 残差的缩放

对残差块输出进行缩放；

本文发现，如果滤波器的数量超过1000，则残差变量会变得不稳定，网络在训练早期，一些神经元会输出0，说明神经元“死亡”，这意味着平均池化之前的最后一层在数万次迭代后开始只产生零。

降低学习率和加入额外的BN都不能解决这个问题；

我们发现在加法融合之前，对残差分支的结果乘以一个幅度缩小的系数可以解决这个问题，系数一般在0.1~0.3之间；网络越深，系数越小；

如图20所示：

图20：残差缩放

在He et al. 2015 中也发现了这个问题，他们使用两阶段的方法：

阶段一：预热 warmup，以非常低的学习率完成；
阶段二：大学习率来学习；

本文发现，如果滤波器的数量非常高，那么即使是非常低（0.00001）的学习率也不足以应对不稳定性，并且具有高学习率的训练有机会破坏其效果；

本文发现仅缩放残差更可靠；

4 训练方法

TensorFlow分布式机器学习系统；

20个NVidia Kepler GPU；

剩下参数同Inception-V3；

5 实验结果

图21显示，Inception-ResNet-V1性能与Incetion-V3差不多，单收敛更快；

Inception-V4, Inception-ResNet and the Impact of Residual Connections on Learning