Two classic lightweight network compression methods: group convolution (Group Convolution) and depthwise separable convolution (Depthwise Separable Convolution)
references:
- Short: ZJC Brother's small essay
- Details: R.JD analysis of MobileNet
0. Ordinary convolution
For an ordinary convolution, the filters of the convolution layer determine the parameter count, where
- The kernel size (h1, w1) relates the input spatial size to the output spatial size (H, W)
- The number of channels c1 of each filter equals the number of input channels C1
- The number of filters c2 equals the number of output channels C2
Thus, the parameter count of an ordinary convolution layer (ignoring bias) is: h1 × w1 × C1 × C2
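As a quick check, the count h1 × w1 × C1 × C2 can be computed directly. This is a minimal sketch: the function name `conv_params` and the example layer sizes are illustrative assumptions, not taken from the referenced articles, and bias terms are ignored.

```python
def conv_params(h1, w1, c1, c2):
    """Weight count of an ordinary convolution layer (bias ignored).

    h1, w1: kernel height and width
    c1: number of input channels (channels per filter)
    c2: number of output channels (number of filters)
    """
    return h1 * w1 * c1 * c2


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
print(conv_params(3, 3, 64, 128))  # 73728
```

Note that the input and output spatial sizes do not appear in the count; only the kernel size and the channel numbers do.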
1. Group convolution (Group Conv)
The concept of group convolution first appeared in AlexNet; the basic principle is shown in the figure.
At the time, the GTX 580 had only 3 GB of memory, so AlexNet was run on two GPUs to accommodate its large number of parameters.
- The input and output channels are each divided into g groups, with dimensions (H1, W1, C1 / g) and (H2, W2, C2 / g) per group
- The filters are likewise divided into g groups, each containing C2 / g filters of dimensions (h1, w1, C1 / g)
This is equivalent to splitting the original convolution layer into g parallel convolution layers. The total number of filters is unchanged, but each filter is only responsible for C1 / g input channels, so the parameter count becomes: h1 × w1 × (C1 / g) × C2, that is, 1 / g of the ordinary convolution.
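The count h1 × w1 × (C1 / g) × C2 and the way channels are assigned to groups can be sketched as follows. The function names and the example sizes are hypothetical, bias is ignored, and the sketch assumes that both C1 and C2 are divisible by g.

```python
def _check_divisible(c1, c2, g):
    if c1 % g or c2 % g:
        raise ValueError("C1 and C2 must both be divisible by g")


def group_conv_params(h1, w1, c1, c2, g):
    """Weight count of a group convolution with g groups (bias ignored)."""
    _check_divisible(c1, c2, g)
    # g groups, each holding C2/g filters of shape (h1, w1, C1/g).
    return g * (c2 // g) * h1 * w1 * (c1 // g)


def group_channel_ranges(c1, c2, g):
    """Half-open (start, stop) input and output channel ranges per group."""
    _check_divisible(c1, c2, g)
    cin, cout = c1 // g, c2 // g
    return [((k * cin, (k + 1) * cin), (k * cout, (k + 1) * cout))
            for k in range(g)]


# Hypothetical layer: 3 x 3 kernel, 64 -> 128 channels.
print(group_conv_params(3, 3, 64, 128, 1))  # 73728 (ordinary convolution)
print(group_conv_params(3, 3, 64, 128, 2))  # 36864 (half as many weights)

# With C1 = 4, C2 = 6, g = 2: group 0 maps inputs 0..1 to outputs 0..2,
# group 1 maps inputs 2..3 to outputs 3..5.
print(group_channel_ranges(4, 6, 2))
```

With g = 1 the count reduces to the ordinary convolution, which is a convenient sanity check.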
2. Depthwise separable convolution (Depthwise Separable Conv)
In MobileNet, depthwise separable convolution is described as two steps: depthwise convolution (Depthwise Conv) and pointwise convolution (Pointwise Conv)
- Depthwise convolution is a special case of the group convolution above: setting C2 = C1 and g = C1 turns group convolution into depthwise convolution. Put plainly, depthwise convolution convolves each input channel separately: (H1, W1, C1) → (H2, W2, C1)
- Pointwise convolution is a special case of ordinary convolution in which the kernel size is 1 × 1; it uses 1 × 1 filters to adjust the number of channels: (H2, W2, C1) → (H2, W2, C2)
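The two steps can be illustrated with a small pure-Python sketch. It is not the MobileNet implementation: it assumes stride 1, no padding, no bias, and a channels-first nested-list layout (the text above writes shapes as (H, W, C)); the function names are hypothetical.

```python
def depthwise_conv(x, kernels):
    """Convolve each input channel with its own kernel.

    x: list of C1 channels, each an H1 x W1 list of lists.
    kernels: list of C1 kernels, each an h1 x w1 list of lists.
    Returns C1 channels of size (H1 - h1 + 1) x (W1 - w1 + 1).
    """
    out = []
    for channel, kernel in zip(x, kernels):
        h1, w1 = len(kernel), len(kernel[0])
        H2 = len(channel) - h1 + 1
        W2 = len(channel[0]) - w1 + 1
        out.append([
            [
                sum(channel[i + di][j + dj] * kernel[di][dj]
                    for di in range(h1) for dj in range(w1))
                for j in range(W2)
            ]
            for i in range(H2)
        ])
    return out


def pointwise_conv(x, weights):
    """Mix channels with 1 x 1 filters.

    x: list of C1 channels, each H2 x W2.
    weights: C2 rows, each holding C1 coefficients (one 1 x 1 filter per row).
    Returns C2 channels of size H2 x W2.
    """
    H2, W2 = len(x[0]), len(x[0][0])
    return [
        [
            [sum(w * x[c][i][j] for c, w in enumerate(row)) for j in range(W2)]
            for i in range(H2)
        ]
        for row in weights
    ]


# Two 3 x 3 input channels (all ones, all twos), 2 x 2 kernels of ones.
x = [[[1] * 3 for _ in range(3)], [[2] * 3 for _ in range(3)]]
kernels = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
dw = depthwise_conv(x, kernels)                      # 2 channels of 2 x 2
pw = pointwise_conv(dw, [[1, 1], [1, 0], [0, 1]])    # 3 channels of 2 x 2
print(len(dw), len(pw))  # 2 3
```

The depthwise step changes only the spatial size and keeps the channel count, while the pointwise step changes only the channel count and keeps the spatial size.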
Parameter calculation:
- Depthwise convolution parameter count: h1 × w1 × 1 × C1
- Pointwise convolution parameter count: 1 × 1 × C1 × C2
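Therefore, relative to an ordinary convolution with h1 × w1 × C1 × C2 parameters, the parameter count shrinks to the fraction
(h1 × w1 × C1 + C1 × C2) / (h1 × w1 × C1 × C2) = 1 / C2 + 1 / (h1 × w1)
(for a square kernel with k = h1 = w1, if C2 is large relative to k × k, the fraction is approximately 1 / (k × k))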
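The ratio between the two parameter counts, 1 / C2 + 1 / (h1 × w1), can be checked numerically. This is a minimal sketch with hypothetical function names and an arbitrary example layer; bias terms are ignored.

```python
def conv_params(h1, w1, c1, c2):
    """Weight count of an ordinary convolution."""
    return h1 * w1 * c1 * c2


def separable_params(h1, w1, c1, c2):
    """Weight count of a depthwise separable convolution."""
    depthwise = h1 * w1 * 1 * c1   # one h1 x w1 kernel per input channel
    pointwise = 1 * 1 * c1 * c2    # C2 filters of shape 1 x 1 x C1
    return depthwise + pointwise


# Hypothetical layer: k x k kernel with k = 3, 64 -> 128 channels.
k, c1, c2 = 3, 64, 128
ratio = separable_params(k, k, c1, c2) / conv_params(k, k, c1, c2)

print(separable_params(k, k, c1, c2))   # 8768
print(round(ratio, 4))                  # 0.1189
print(round(1 / c2 + 1 / k**2, 4))      # 0.1189
```

For this example the 1 / (k × k) term (about 0.111) dominates the 1 / C2 term (about 0.008), which is why the saving for a 3 × 3 kernel is close to a factor of 9.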
The following figure is from MobileNet-v1.
3. Summary
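- Group convolution: reduces the parameter count to 1 / g of the ordinary convolution
- Depthwise separable convolution: reduces the parameter count to 1 / C2 + 1 / (h1 × w1) of the ordinary convolution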
- Depthwise separable convolution is equivalent to a group convolution followed by an ordinary convolution: the former is responsible for the two-dimensional spatial convolution, the latter for adjusting the number of channels
- The use of 1 × 1 filters can be traced back to NIN (2013); many later networks (including GoogLeNet and ResNet) adopted this design, for example to reduce the parameter count
- From V1 to V3, MobileNet made a number of structural adjustments; see the relevant papers for details