Pruning Basics and Practice (3): Model Pruning and the Sparse Training Process

Model Pruning


To give the answer first: apply L1 regularization to the scaling factors of the Batch Normalization (BN) layers.
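As a minimal sketch of this sparse-training step (assuming an ordinary PyTorch model whose convolutions are followed by nn.BatchNorm2d), the snippet below adds the L1 subgradient penalty * sign(gamma) to the gradient of every BN scaling factor after the regular backward pass. The helper name add_bn_l1_grad and the penalty value 1e-4 are illustrative choices, not taken from the original post.

```python
import torch
import torch.nn as nn

def add_bn_l1_grad(model: nn.Module, penalty: float = 1e-4) -> None:
    """Add the subgradient of an L1 penalty on every BN scaling factor (gamma).

    Called after loss.backward() and before optimizer.step(); the extra term
    penalty * sign(gamma) pushes unimportant scaling factors toward zero.
    """
    for module in model.modules():
        if isinstance(module, nn.BatchNorm2d) and module.weight.grad is not None:
            module.weight.grad.add_(penalty * torch.sign(module.weight.data))

# One training step with the sparsity term (model, criterion, optimizer,
# images and targets are assumed to come from a standard PyTorch setup):
#   loss = criterion(model(images), targets)
#   optimizer.zero_grad()
#   loss.backward()
#   add_bn_l1_grad(model, penalty=1e-4)   # sparsity-inducing L1 term on gamma
#   optimizer.step()
```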

Advantages

  • Does not require any changes to existing CNN architectures
  • Uses L1 regularization to push the values of the BN scaling factors towards zero
    • This lets us identify unimportant channels (or neurons), since each scaling factor corresponds to a specific convolutional channel (or a neuron of a fully connected layer)
    • This facilitates channel-level pruning in the next step (see the sketch after this list)
  • Additional regularization terms rarely hurt performance; in some cases they even lead to higher generalization accuracy
  • Pruning unimportant channels may temporarily degrade performance, but this can be compensated for by subsequently fine-tuning the pruned network
  • After pruning, the resulting narrower network is more compact than the original wide network in terms of model size, runtime memory, and compute operations. The above process can be repeated several times
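A sketch of the channel-selection step described above, under the same assumptions as before: all BN scaling factors are ranked globally and the smallest fraction is marked for removal. The function name bn_channel_masks and the default prune_ratio are hypothetical; the actual surgery (building the narrower network and copying the surviving weights) and the subsequent fine-tuning stage would follow.

```python
import torch
import torch.nn as nn

def bn_channel_masks(model: nn.Module, prune_ratio: float = 0.5):
    """Rank all BN scaling factors globally and mark the smallest ones for pruning.

    Returns a dict mapping each BatchNorm2d module to a boolean mask of the
    channels to keep (True = keep).
    """
    # Collect the absolute values of every BN scaling factor in the network
    gammas = torch.cat([m.weight.data.abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])

    # Global threshold: the prune_ratio quantile of all scaling factors
    threshold = torch.sort(gammas)[0][int(len(gammas) * prune_ratio)]

    masks = {}
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            masks[m] = m.weight.data.abs() > threshold
    return masks
```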


Original article: blog.csdn.net/weixin_38346042/article/details/132395397