Lightweight Networks: Group Convolution & Depthwise Separable Convolution

This article covers two classic methods for compressing (lightweighting) networks: group convolution (Group Convolution) and depthwise separable convolution (Depthwise Separable Convolution).


0. Ordinary Convolution

For an ordinary convolution layer, the filters determine the parameter count:

  • The kernel's spatial size (h1, w1) is chosen independently of the input/output feature-map size (H, W)
  • The filter channel count C1 equals the number of input channels
  • The number of filters C2 equals the number of output channels

The parameter count of an ordinary convolution layer is therefore (writing h1 = w1 = k): P1 = h1 × w1 × C1 × C2 = k^2 · C1 · C2
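As a quick sanity check, the formula above can be computed with a small Python helper (the layer sizes here are purely illustrative):

```python
def conv_params(k, c_in, c_out):
    """Parameter count of an ordinary k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

# Example: a 3x3 convolution mapping 64 -> 128 channels
print(conv_params(3, 64, 128))  # 73728
```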

1. Group Convolution (Group Conv)

The concept of group convolution first appeared in AlexNet; the basic principle is shown in the figure below.

At the time, the GTX 580 had only 3 GB of memory, so AlexNet was split across two GPUs to accommodate its large parameter count.

Here Insert Picture Description

  • The input/output channels are divided into g groups, of dimensions (H1, W1, C1/g) and (H2, W2, C2/g) respectively
  • The filters are likewise divided into g groups, each containing C2/g filters of dimensions (h1, w1, C1/g)

In effect, the original convolution layer is split into g parallel convolution layers. The total number of filters is unchanged, but each filter is now only responsible for C1/g input channels, so the parameter count becomes: P2 = P1 / g
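The g-fold reduction can be verified numerically with a minimal sketch (layer sizes are illustrative):

```python
def group_conv_params(k, c_in, c_out, g):
    """Each of the g groups has c_out/g filters of size k x k x (c_in/g)."""
    assert c_in % g == 0 and c_out % g == 0
    return g * (c_out // g) * k * k * (c_in // g)

p1 = group_conv_params(3, 64, 128, g=1)  # ordinary convolution: 73728
p2 = group_conv_params(3, 64, 128, g=4)  # grouped: 73728 / 4 = 18432
print(p1, p2)
```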

2. Depthwise Separable Convolution (Depthwise Separable Conv)

In MobileNet, depthwise separable convolution is described as two steps: depthwise convolution (Depthwise Conv) and pointwise convolution (Pointwise Conv).

  • Depthwise convolution is in fact a special case of the group convolution above: setting C2 = C1 and g = C1 turns the group convolution into a depthwise convolution. Put plainly, each channel of the input is convolved independently with its own kernel: (H1, W1, C1) → (H2, W2, C1)
  • Pointwise convolution is in fact a special case of ordinary convolution, with a kernel size of just 1 × 1; it uses 1 × 1 filters to adjust the number of channels: (H2, W2, C1) → (H2, W2, C2)
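The two steps can be sketched in PyTorch, where `groups=C1` implements the depthwise step (a hypothetical block with illustrative sizes, assuming PyTorch is available):

```python
import torch
import torch.nn as nn

c1, c2, k = 64, 128, 3
block = nn.Sequential(
    # depthwise: groups=c1 gives each input channel its own k x k kernel
    nn.Conv2d(c1, c1, kernel_size=k, padding=k // 2, groups=c1, bias=False),
    # pointwise: a 1x1 convolution adjusts the channel count c1 -> c2
    nn.Conv2d(c1, c2, kernel_size=1, bias=False),
)

x = torch.randn(1, c1, 32, 32)
print(block(x).shape)  # torch.Size([1, 128, 32, 32])
```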

Parameter calculation:

  • Depthwise convolution parameter count: h1 × w1 × 1 × C1
  • Pointwise convolution parameter count: 1 × 1 × C1 × C2

Therefore, P3 = k^2·C1 + C1·C2 = (1/C2 + 1/k^2)·P1
(since C2 is usually large relative to k^2, P3 ≈ P1/k^2)
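Plugging in illustrative numbers confirms both the formula and the approximation:

```python
def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * 1 * c_in       # one k x k kernel per input channel
    pointwise = 1 * 1 * c_in * c_out   # 1x1 conv adjusting the channels
    return depthwise + pointwise

p1 = 3 * 3 * 64 * 128                        # ordinary conv: 73728
p3 = depthwise_separable_params(3, 64, 128)  # 9*64 + 64*128 = 8768
print(p3 / p1)  # 0.11892... = 1/128 + 1/9, close to 1/k^2 = 1/9
```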
(Figure from MobileNet-v1.)

3. Summary

  • Group convolution: compression ratio ∝ g
  • Depthwise separable convolution: compression ratio ∝ k^2
  • A depthwise separable convolution is equivalent to a group convolution followed by an ordinary 1 × 1 convolution: the former handles the spatial (2D) convolution, the latter adjusts the channels
  • The use of 1 × 1 filters can be traced back to NIN (Network in Network, 2013); many later networks (including GoogLeNet and ResNet) adopted this design, e.g. to compress the parameter count
  • From MobileNet V1 to V3, the structure was adjusted in many details; see the relevant papers


Origin blog.csdn.net/qq_42191914/article/details/103491048