Model Pruning Study Notes 1: Network Slimming (Learning Efficient Convolutional Networks through Network Slimming)

An ICCV 2017 paper; it belongs to the channel-pruning family of methods.
Innovations:

  1. Use the scaling factor γ of the batch normalization (BN) layer as a measure of channel importance: the smaller γ is, the less important the corresponding channel, and the more safely it can be cut (pruned).
  2. Constrain the size of γ by adding a regularization term on γ to the objective function, so that pruning happens automatically during training, something earlier model-compression methods could not do.

Network slimming uses the scaling factor γ of the BN layer to measure channel importance during training; the unimportant channels are then removed, which compresses the model size and improves inference speed.
In the figure below, the left side shows the model during training; the middle column lists the scaling factors, i.e., the BN-layer γ values. When γ is small (e.g., 0.001 or 0.003), the corresponding channel is cut, giving the pruned model shown on the right.
[Figure: channels with small BN scaling factors are pruned from the trained network]
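For reference, this is the standard BN transform (reproduced here; it is not spelled out in the original post): each normalized channel activation is scaled by γ and shifted by β, so a near-zero γ effectively silences that channel:

$$\hat{z} = \frac{z_{\text{in}} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon}}, \qquad z_{\text{out}} = \gamma \hat{z} + \beta$$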

The objective function is as follows:

$$L = \sum_{(x,\, y)} l\big(f(x, W),\, y\big) + \lambda \sum_{\gamma \in \Gamma} g(\gamma)$$

The first term is the model's prediction loss; the second term constrains γ (Γ denotes the set of all BN scaling factors in the network). λ is a hyperparameter balancing the two terms; its values are given in the experiments later, generally 1e-4 or 1e-5. For g(·) the paper uses g(s) = |s|, the L1 norm, which achieves the sparsifying effect.
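As a concrete illustration, here is a minimal PyTorch sketch (my own, not from the post; the names `add_l1_subgradient_on_gamma` and `sparsity_lambda` are placeholders) of one common way to apply the L1 penalty on γ: since the subgradient of λ|γ| is λ·sign(γ), it can simply be added to the gradient of every BN scaling factor after the backward pass.

```python
import torch
import torch.nn as nn

def add_l1_subgradient_on_gamma(model: nn.Module, sparsity_lambda: float = 1e-4):
    """Add the L1 term's subgradient, lambda * sign(gamma), to the gradient
    of every BN scaling factor (PyTorch stores gamma in bn.weight)."""
    for module in model.modules():
        if isinstance(module, nn.BatchNorm2d) and module.weight.grad is not None:
            module.weight.grad.data.add_(
                sparsity_lambda * torch.sign(module.weight.data))

# Intended use inside a training loop (sketch):
#   loss = criterion(model(inputs), targets)
#   loss.backward()
#   add_l1_subgradient_on_gamma(model, sparsity_lambda=1e-4)
#   optimizer.step()
```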

The overall flow is shown below:
[Figure: flow diagram of the training / pruning / fine-tuning pipeline]
It is divided into three parts: the first step, training with the sparsity penalty; the second step, pruning; the third step, fine-tuning the pruned model. The three steps can be repeated in a loop.

Specific implementation details:
λ is usually set to 1e-4 or 1e-5, depending on the task.
Once the γ values are obtained, how should we cut, i.e., how small counts as "small"? Similar in spirit to keeping a fixed proportion of energy in PCA, all the γ values are gathered together and sorted in descending order, and the larger portion is kept, typically around 70% (depending on the task).
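A minimal PyTorch sketch of this selection step (the 70% keep ratio and the helper name `compute_prune_masks` are illustrative assumptions, not the paper's API): gather all BN scaling factors, take the value at the keep boundary as a global threshold, and mark the channels whose |γ| falls below it.

```python
import torch
import torch.nn as nn

def compute_prune_masks(model: nn.Module, percent_kept: float = 0.7):
    """Return a {bn_module: bool_mask} dict; mask is True for channels to keep."""
    # Collect |gamma| from every BN layer into one vector.
    all_gammas = torch.cat([
        m.weight.data.abs().flatten()
        for m in model.modules() if isinstance(m, nn.BatchNorm2d)
    ])
    # Sort descending and use the value at the keep boundary as a global threshold.
    sorted_gammas, _ = torch.sort(all_gammas, descending=True)
    threshold_index = int(percent_kept * len(sorted_gammas)) - 1
    threshold = sorted_gammas[max(threshold_index, 0)]
    # Keep channels whose scaling factor is at or above the threshold.
    masks = {}
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            masks[m] = m.weight.data.abs() >= threshold
    return masks
```

These masks only indicate which channels survive; actually removing them means rebuilding the corresponding convolution and BN layers with fewer channels, which depends on the specific architecture.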

The effect of the choice of λ on the γ distribution is shown below:
[Figure: distribution of γ values under different λ settings]


Origin: blog.csdn.net/c2250645962/article/details/103753205