[Interview] GoogLeNet, ResNet, ShuffleNet, MobileNet text version


  • Compared with AlexNet and VGG, GoogLeNet (Inception-v1) uses multi-branch Inception modules and introduces 1×1 convolutions to reduce the amount of computation
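
    A minimal PyTorch sketch of the 1×1 bottleneck idea (channel counts are illustrative assumptions, not GoogLeNet's actual configuration):

      import torch
      import torch.nn as nn

      # Direct 3x3 conv on 256 channels: 256*256*3*3 multiplies per output position.
      direct = nn.Conv2d(256, 256, kernel_size=3, padding=1)

      # A 1x1 conv first reduces channels to 64, so the 3x3 conv works on fewer channels:
      # 256*64*1*1 + 64*256*3*3, roughly 3-4x fewer multiply-adds.
      bottleneck = nn.Sequential(
          nn.Conv2d(256, 64, kernel_size=1),             # channel reduction
          nn.Conv2d(64, 256, kernel_size=3, padding=1),
      )

      x = torch.randn(1, 256, 28, 28)
      assert direct(x).shape == bottleneck(x).shape      # same output shape, fewer FLOPs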

  • Inception-v2
    introduces Batch Normalization (BN); each 5×5 convolution is replaced by two stacked 3×3 convolutions
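
    A rough PyTorch sketch of the 5×5 → two-3×3 factorization (assumed channel count C = 64):

      import torch
      import torch.nn as nn

      C = 64
      # One 5x5 conv: 25 weights per input/output channel pair.
      five_by_five = nn.Conv2d(C, C, kernel_size=5, padding=2)
      # Two stacked 3x3 convs: same 5x5 receptive field, only 2*9 = 18 weights per pair,
      # with BN inserted as in Inception-v2.
      two_three_by_three = nn.Sequential(
          nn.Conv2d(C, C, kernel_size=3, padding=1),
          nn.BatchNorm2d(C),
          nn.ReLU(inplace=True),
          nn.Conv2d(C, C, kernel_size=3, padding=1),
      )

      x = torch.randn(1, C, 32, 32)
      assert five_by_five(x).shape == two_three_by_three(x).shape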

  • Inception-v3
    asymmetric convolution (an n×n convolution is factorized into a 1×n convolution followed by an n×1 convolution);
    new downsampling (to avoid losing information without increasing computation, the serial pool-then-conv is replaced by parallel conv and pool branches whose outputs are concatenated);
    label smoothing
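
    A sketch of the asymmetric factorization (n = 7 and 64 channels are assumed purely for illustration):

      import torch
      import torch.nn as nn

      C, n = 64, 7
      square = nn.Conv2d(C, C, kernel_size=n, padding=n // 2)
      # 1xn followed by nx1: same nxn receptive field, 2*n weights per channel pair instead of n*n.
      asymmetric = nn.Sequential(
          nn.Conv2d(C, C, kernel_size=(1, n), padding=(0, n // 2)),
          nn.Conv2d(C, C, kernel_size=(n, 1), padding=(n // 2, 0)),
      )

      x = torch.randn(1, C, 32, 32)
      assert square(x).shape == asymmetric(x).shape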

  • Inception-v4
    introduces ResNet's shortcut idea

  • Xception
    Separable Convolution
    normal conv (3×3, 256) is replaced by:
    pointwise conv (1×1, 256)
    depthwise conv (3×3, one filter per channel)
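
    A sketch of this separable convolution in PyTorch (128 → 256 channels assumed for illustration; note Xception applies the pointwise conv before the depthwise conv):

      import torch
      import torch.nn as nn

      xception_sep = nn.Sequential(
          nn.Conv2d(128, 256, kernel_size=1),                         # pointwise (1x1, 256)
          nn.Conv2d(256, 256, kernel_size=3, padding=1, groups=256),  # depthwise (3x3, one filter per channel)
      )

      x = torch.randn(1, 128, 28, 28)
      print(xception_sep(x).shape)  # torch.Size([1, 256, 28, 28])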

  • ResNeXt
    introduces a new dimension: cardinality (the number of parallel paths)

    256-d in → (256, 1×1, 64) → (64, 3×3, 64) → (64, 1×1, 256) → sum with x → 256-d out
    → changed to →
    256-d in → [(256, 1×1, 4) → (4, 3×3, 4) → (4, 1×1, 256)] × 32 paths → aggregate (sum) the 32 paths → sum with x → 256-d out (the 32 paths share the same topology, equivalent to a grouped convolution with 32 groups)
    this equivalent form is nontrivial only when the block depth is ≥ 3
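
    A sketch of the ResNeXt bottleneck with the 32 paths implemented as one grouped convolution (BN/ReLU placement simplified):

      import torch
      import torch.nn as nn

      class ResNeXtBlock(nn.Module):
          def __init__(self, channels=256, cardinality=32, width=4):
              super().__init__()
              mid = cardinality * width  # 32 paths x 4 channels each = 128
              self.body = nn.Sequential(
                  nn.Conv2d(channels, mid, kernel_size=1),
                  nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
                  nn.Conv2d(mid, mid, kernel_size=3, padding=1, groups=cardinality),
                  nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
                  nn.Conv2d(mid, channels, kernel_size=1),
                  nn.BatchNorm2d(channels),
              )
              self.relu = nn.ReLU(inplace=True)

          def forward(self, x):
              return self.relu(x + self.body(x))  # sum with the shortcut x

      x = torch.randn(1, 256, 14, 14)
      print(ResNeXtBlock()(x).shape)  # torch.Size([1, 256, 14, 14])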

  • PreAct ResNet
    conv-bn-relu-conv-bn-sum x-relu → changed to → bn-relu-conv-bn-relu-conv-sum x
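
    A sketch of the pre-activation block (assumed 64 channels); note there is no ReLU after the sum, so the identity path stays untouched:

      import torch
      import torch.nn as nn

      class PreActBlock(nn.Module):
          def __init__(self, channels=64):
              super().__init__()
              self.body = nn.Sequential(
                  nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
                  nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                  nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
                  nn.Conv2d(channels, channels, kernel_size=3, padding=1),
              )

          def forward(self, x):
              return x + self.body(x)

      x = torch.randn(1, 64, 32, 32)
      print(PreActBlock()(x).shape)  # torch.Size([1, 64, 32, 32])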

  • SENet
    x (c×h×w) → global pooling (c×1×1) → fc (c/16×1×1) → ReLU → fc (c×1×1) → sigmoid (c×1×1) → scale: multiply with x (c×h×w)
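
    A sketch of the SE block with reduction ratio 16:

      import torch
      import torch.nn as nn

      class SEBlock(nn.Module):
          def __init__(self, channels, reduction=16):
              super().__init__()
              self.pool = nn.AdaptiveAvgPool2d(1)            # global pooling: (c, h, w) -> (c, 1, 1)
              self.fc = nn.Sequential(
                  nn.Linear(channels, channels // reduction),
                  nn.ReLU(inplace=True),
                  nn.Linear(channels // reduction, channels),
                  nn.Sigmoid(),
              )

          def forward(self, x):
              b, c, _, _ = x.shape
              w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
              return x * w                                   # scale: reweight each channel of x

      x = torch.randn(2, 256, 14, 14)
      print(SEBlock(256)(x).shape)  # torch.Size([2, 256, 14, 14])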

  • MobileNet V1
    Depthwise Separable Convolution
    normal conv (e.g. 3×3, 256) is replaced by:
    depthwise convolution (3×3, one filter per input channel)
    pointwise convolution (1×1, 256)
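
    A sketch of the V1 depthwise separable block (128 → 256 channels assumed; BN and ReLU follow both convolutions):

      import torch
      import torch.nn as nn

      def dw_separable(cin, cout, stride=1):
          return nn.Sequential(
              nn.Conv2d(cin, cin, 3, stride=stride, padding=1, groups=cin),  # depthwise 3x3
              nn.BatchNorm2d(cin), nn.ReLU(inplace=True),
              nn.Conv2d(cin, cout, 1),                                       # pointwise 1x1
              nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
          )

      x = torch.randn(1, 128, 28, 28)
      print(dw_separable(128, 256)(x).shape)  # torch.Size([1, 256, 28, 28])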

  • MobileNet V2
    Depthwise separable convolution, same as V1
    improvements: the inverted residual with linear bottleneck;
    Residual block (1×1 → 3×3 → 1×1; channels are first compressed, then expanded)
    Inverted residual (1×1 → 3×3 → 1×1; channels are first expanded, then compressed, because depthwise convolution cannot change the number of channels and feature extraction works poorly in low-dimensional space)
    Linear bottleneck (the ReLU after the last PW conv is removed: activation functions add useful nonlinearity in high-dimensional space but destroy features in low-dimensional space, and the second PW conv mainly reduces dimensionality)

    V1: in → DW 3×3 → ReLU → PW 1×1 → ReLU → out
    V2: in → PW 1×1 → ReLU6 → DW 3×3 → ReLU6 → PW 1×1 (linear, no ReLU) → out
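
    A sketch of the V2 inverted residual with linear bottleneck (64 channels and expansion factor 6 are illustrative assumptions):

      import torch
      import torch.nn as nn

      class InvertedResidual(nn.Module):
          def __init__(self, channels=64, expansion=6):
              super().__init__()
              mid = channels * expansion
              self.body = nn.Sequential(
                  # expand: PW 1x1 raises the dimensionality before the depthwise conv
                  nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
                  # depthwise 3x3 in the expanded (high-dimensional) space
                  nn.Conv2d(mid, mid, 3, padding=1, groups=mid), nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
                  # project: linear bottleneck, no ReLU after the last PW
                  nn.Conv2d(mid, channels, 1), nn.BatchNorm2d(channels),
              )

          def forward(self, x):
              return x + self.body(x)  # shortcut applies when stride = 1 and channels match

      x = torch.randn(1, 64, 28, 28)
      print(InvertedResidual()(x).shape)  # torch.Size([1, 64, 28, 28])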


  • MobileNet V3
    introduces a lightweight attention module based on the squeeze-and-excitation (SE) structure;
    optimizes the activation function (h-swish);
    uses NAS to search the architecture
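
    A sketch of the h-swish activation MobileNet V3 uses, x * ReLU6(x + 3) / 6, a cheap piecewise-linear approximation of swish:

      import torch
      import torch.nn.functional as F

      def h_swish(x):
          return x * F.relu6(x + 3.0) / 6.0

      print(h_swish(torch.tensor([-4.0, 0.0, 4.0])))  # tensor([-0., 0., 4.])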

  • DPN
    combines ResNeXt (residual path) and DenseNet (dense path), interpreted through a Higher Order RNN (HORNN) structure

  • ShuffleNet V1
    Channel Shuffle for Group Convolutions
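
    A sketch of the channel shuffle operation: reshape the channels into (groups, c/groups), transpose, and flatten, so the next group convolution sees channels from every group:

      import torch

      def channel_shuffle(x, groups):
          b, c, h, w = x.shape
          x = x.view(b, groups, c // groups, h, w)
          x = x.transpose(1, 2).contiguous()
          return x.view(b, c, h, w)

      x = torch.randn(1, 8, 4, 4)
      print(channel_shuffle(x, groups=2).shape)  # torch.Size([1, 8, 4, 4])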

  • ShuffleNet V2
    using only FLOPs as the efficiency metric is not comprehensive; one overlooked factor is MAC (memory access cost). Design guidelines:
    use "balanced" convolutional layers (equal numbers of input and output channels);
    use group convolution with care (too many groups increase MAC);
    reduce network fragmentation (too many small branches);
    reduce element-wise operations;

    The 1×1 group convolution is abandoned; Channel Split is introduced: the feature map is split along channels into two groups A and B.
    Group A serves as the shortcut; group B passes through a bottleneck whose input and output channels are equal;
    finally A and B are concatenated and a Channel Shuffle is performed
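
    A sketch of a stride-1 ShuffleNet V2 unit following the description above (128 channels assumed; the shuffle is written inline so the snippet is self-contained):

      import torch
      import torch.nn as nn

      class ShuffleV2Unit(nn.Module):
          def __init__(self, channels=128):
              super().__init__()
              half = channels // 2
              self.branch = nn.Sequential(  # equal input and output channels ("balanced")
                  nn.Conv2d(half, half, 1), nn.BatchNorm2d(half), nn.ReLU(inplace=True),
                  nn.Conv2d(half, half, 3, padding=1, groups=half), nn.BatchNorm2d(half),  # depthwise
                  nn.Conv2d(half, half, 1), nn.BatchNorm2d(half), nn.ReLU(inplace=True),
              )

          def forward(self, x):
              a, b = x.chunk(2, dim=1)                     # channel split into groups A and B
              out = torch.cat([a, self.branch(b)], dim=1)  # A is the shortcut; concat with B
              n, c, h, w = out.shape                       # channel shuffle on the concat result
              return out.view(n, 2, c // 2, h, w).transpose(1, 2).reshape(n, c, h, w)

      x = torch.randn(1, 128, 28, 28)
      print(ShuffleV2Unit()(x).shape)  # torch.Size([1, 128, 28, 28])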

Origin blog.csdn.net/qq_31622015/article/details/102786825