Model Pruning
-
Related paper: Learning Efficient Convolutional Networks through Network Slimming (ICCV 2017)
-
Consider a question: the convolutional layers of a deep learning model produce a great many features. Might some of those features, and their related connections, be worthless? And how do we judge whether a feature and its connections are valuable?
The answer first: impose L1 regularization on the scaling factors of the Batch Normalization layers.
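A minimal NumPy sketch of this idea, assuming plain SGD (the function name, learning rate `lr`, and penalty weight `lam` are illustrative choices, not values from the paper): the L1 term `lam * |gamma|` contributes a subgradient `lam * sign(gamma)`, so scaling factors that receive little task-loss gradient are steadily driven toward zero.

```python
import numpy as np

def l1_sgd_step(gamma, grad_loss, lr=0.1, lam=0.01):
    """One SGD step on BN scaling factors gamma with an added L1 penalty.

    The penalty lam * |gamma| adds the subgradient lam * sign(gamma)
    to the task-loss gradient, pushing factors toward zero.
    """
    return gamma - lr * (grad_loss + lam * np.sign(gamma))

# Toy run: channel 0 is "useful" (the task loss keeps pulling its scale
# up), channel 1 is "useless" (no task gradient), so only channel 1 is
# shrunk to ~0 by the L1 term alone.
gamma = np.array([0.5, 0.5])
for _ in range(500):
    grad_loss = np.array([-0.05, 0.0])  # pretend per-channel task gradients
    gamma = l1_sgd_step(gamma, grad_loss)
print(gamma)  # channel 1 ends up near zero, channel 0 stays large
```

The key point the sketch illustrates: sparsity emerges only where the data does not resist it, which is exactly the signal used to decide which channels are unimportant.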
Advantages
- Does not require any changes to existing CNN architectures
- Use L1 regularization to push the value of the BN scaling factor towards zero
- Enables us to identify unimportant channels (or neurons), since each scaling factor corresponds to a specific convolutional channel (or neuron of a fully connected layer)
- This facilitates channel-level pruning in the next step
- The additional regularization term rarely hurts performance; in some cases it even leads to higher generalization accuracy
- Pruning unimportant channels may temporarily degrade performance, but this can be compensated for by subsequently fine-tuning the pruned network
- After pruning, the resulting narrower network is more compact than the original wide network in terms of model size, runtime memory, and computational operations. The above process can be repeated several times
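The pruning step itself can be sketched as thresholding the learned BN scaling factors across the whole network (the function name, the layer names, and the 50% prune ratio below are my own illustrative assumptions; the paper likewise uses a global percentile threshold over all scaling factors):

```python
import numpy as np

def channel_masks(gammas, prune_ratio=0.5):
    """Select channels to keep by thresholding BN scaling factors.

    `gammas` maps layer name -> array of per-channel scaling factors.
    A single global threshold at the given percentile of all |gamma|
    values removes the smallest-scale channels network-wide.
    """
    all_vals = np.concatenate([np.abs(g) for g in gammas.values()])
    threshold = np.percentile(all_vals, prune_ratio * 100)
    return {name: np.abs(g) > threshold for name, g in gammas.items()}

# Toy example: two hypothetical layers with four channels each; the
# near-zero factors (pushed down by the L1 penalty) are pruned away.
gammas = {
    "conv1": np.array([0.9, 0.01, 0.7, 0.02]),
    "conv2": np.array([0.03, 0.8, 0.6, 0.05]),
}
masks = channel_masks(gammas, prune_ratio=0.5)
```

Each boolean mask then determines which channels (and their incoming and outgoing weights) are copied into the narrower network before fine-tuning; repeating train-with-L1, prune, and fine-tune gives the multi-pass scheme described above.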