Versatile convolution kernels for model compression and acceleration

This post introduces a convolution-kernel compression method I read recently. The original author presents it in a blog post at https://zhuanlan.zhihu.com/p/82710870; the paper is at https://papers.nips.cc/paper/7433-learning-versatile-filters-for-efficient-convolutional-neural-networks.pdf, and the open-source code is at https://github.com/huawei-noah/Versatile-Filters.

Here's my understanding.

This compression method has two parts, one spatial-wise and one channel-wise; the underlying idea is the same, and the two correspond to the two ways we usually describe a filter bank. Suppose the bank is 7x7x24x100: 7x7 is the spatial extent (spatial-wise), 24 is the channel depth (channel-wise), fixed by the depth of the input feature map, so it cannot change, and 100 is the number of filters. From the spatial point of view, there are 100 filters, each of size 7x7x24, but in fact we do not need to store 100 of them; 25 are enough. How so? Each 7x7 kernel can be split into four nested sub-kernels of sizes 7x7, 5x5, 3x3 and 1x1, so 25 stored 7x7x24 filters can be transformed into 100 effective filters.

How is a kernel split? Take the 5x5 case in panel (b) of the figure below as an example. The 5x5 kernel itself is the largest sub-kernel. The 3x3 sub-kernel is the green cells plus the middle portion; it reuses part of the 5x5 kernel's parameters, which is equivalent to filling the outer ring of blue circles with zeros. The 1x1 sub-kernel is the red dot in the middle. All three sub-kernels are transformed from the same 5x5 kernel and share its parameters; in fact the computation can be shared as well. Compared with three independent 5x5 kernels, this saves computation, uses fewer parameters, and reduces the memory footprint.

[Figure omitted: panel (b) of the paper's figure, showing a 5x5 kernel nesting a 3x3 sub-kernel (green) and a 1x1 sub-kernel (red dot), with the dropped outer ring marked by blue circles.]
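To make the spatial-wise splitting concrete, here is a minimal PyTorch sketch of my own (not the authors' code): one bank of 25 primary 7x7 filters is reused at four nested spatial extents by taking central crops, which is equivalent to the zero-filled outer rings described above.

```python
import torch
import torch.nn.functional as F

# Sketch of spatial-wise versatile filters (illustrative, not the
# official implementation). 25 stored 7x7x24 filters act as 100
# effective filters: each primary filter is reused at spatial
# extents 7, 5, 3 and 1 via its central crop (shared parameters).
primary = torch.randn(25, 24, 7, 7)   # stored filter bank
x = torch.randn(1, 24, 32, 32)        # dummy input feature map

outputs = []
for k in (7, 5, 3, 1):                # the four nested sub-kernel sizes
    m = (7 - k) // 2                  # width of the outer rings to drop
    sub = primary[:, :, m:7 - m, m:7 - m]             # central k x k crop
    outputs.append(F.conv2d(x, sub, padding=k // 2))  # same output size

y = torch.cat(outputs, dim=1)
print(y.shape)                        # torch.Size([1, 100, 32, 32])
```

How the per-extent outputs are combined here (plain concatenation) is my simplification; the point of the sketch is only that all four sub-kernels read from the same stored weights.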

The other part is channel-wise compression, and the basic idea is identical to the above, except that this time the split is along the depth direction of 24 channels. For example, channels 1-22 form one sub-kernel and channels 3-24 form another, so each stored kernel yields two convolution kernels. Combining this with the spatial-wise compression above, 13 stored 7x7x24 filters are enough (13 x 2 x 4 = 104 >= 100). The difference from the spatial-wise case is that there the original 7x7 kernel was kept intact and every sub-kernel still saw all channels, whereas here a sub-kernel no longer covers the whole 24 channels. The computation is reduced accordingly, to 22/24 of the original per sub-kernel, and because some intermediate results over the overlapping channels can be reused, the actual saving is a bit larger.
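A matching sketch for the channel-wise half, under the same caveat (my illustration; the slice offsets follow the 1-22 / 3-24 example above):

```python
import torch
import torch.nn.functional as F

# Sketch of channel-wise versatile filters (illustrative). Each stored
# 7x7x24 filter yields two sub-filters covering 22 of the 24 input
# channels: channels 1-22 and channels 3-24 in 1-based terms.
primary = torch.randn(13, 24, 7, 7)    # 13 stored filters suffice overall
x = torch.randn(1, 24, 32, 32)

outputs = []
for start in (0, 2):                   # two overlapping channel windows
    sub = primary[:, start:start + 22] # 22-channel slice, shared params
    outputs.append(F.conv2d(x[:, start:start + 22], sub, padding=3))

y = torch.cat(outputs, dim=1)          # 2 sub-filters x 13 = 26 channels
print(y.shape)                         # torch.Size([1, 26, 32, 32])
```

Applying the four spatial crops from the previous sketch to each of these channel slices would give 13 x 2 x 4 = 104 effective filters, of which 100 are used.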


The above is roughly the essence of this approach. He tested the idea on image super-resolution and image classification with several classic networks, demonstrating the method's effectiveness.

=========================================

About this method, I have some questions. This approach reduces the degrees of freedom of the parameters, so why can it achieve the same accuracy as the original network? Either the network's parameters really were highly redundant before, or this particular way of tying the parameters is itself meaningful, i.e. it amounts to building in a prior.
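For a concrete sense of how much the degrees of freedom shrink, here is the raw count of stored weights in the running 7x7x24x100 example, using the 25- and 13-filter figures from above:

```python
# Free-parameter counts for the 7x7x24x100 running example.
baseline     = 100 * 24 * 7 * 7   # 117,600 weights, unconstrained
spatial_only = 25 * 24 * 7 * 7    #  29,400 weights, 4x fewer
combined     = 13 * 24 * 7 * 7    #  15,288 weights, ~7.7x fewer
print(baseline, spatial_only, combined)
```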


Origin: www.cnblogs.com/sunny-li/p/11565136.html