Network Slimming (Learning Efficient Convolutional Networks through Network Slimming)
An ICCV 2017 paper; it belongs to the channel-pruning family of compression methods.
Innovation:
- Uses the scaling factor γ of the batch-normalization layer as a measure of channel importance: the smaller γ is, the less important the corresponding channel, and the more safely it can be cut (pruned).
- Constrains the magnitude of γ by adding a regularization term on γ to the objective function, so pruning is induced automatically during training, which previous model-compression methods did not offer.
Network slimming uses the scaling factor γ of the BN layers to measure the importance of each channel during training; unimportant channels are removed, shrinking the model and improving inference speed.
In the figure, the left side is the model during training and the middle column shows the scaling factors, i.e. the γ values of the BN layers. When γ is small (e.g. 0.001 or 0.003), the corresponding channel is cut, giving the pruned model shown on the right.
The objective function is:

L = Σ_{(x,y)} l(f(x, W), y) + λ Σ_{γ∈Γ} g(γ)

The first term is the ordinary prediction loss of the model; the second term constrains γ. λ is a hyperparameter balancing the two terms (the values used are given later in the experiments; typically 1e-4 or 1e-5). g(·) is chosen as g(s) = |s|, the L1 norm, which induces sparsity.
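The effect of the L1 term can be sketched numerically: a plain subgradient step on λ·|γ| pulls every γ toward zero. A minimal NumPy sketch (the values and step sizes are illustrative; in real training this term is added to the task-loss gradient):

```python
import numpy as np

# BN scaling factors of one layer (illustrative values)
gamma = np.array([0.5, -0.3, 0.01])
lam, lr = 1.0, 0.01  # regularization strength and learning rate

g0 = np.abs(gamma).copy()
for _ in range(10):
    # subgradient of lam * |gamma| is lam * sign(gamma)
    gamma = gamma - lr * lam * np.sign(gamma)
```

After these steps every |γ| has shrunk (or reached zero), never grown, which is exactly the sparsifying pull that lets unimportant channels be identified.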
Overall flow diagram is shown below:
It is divided into three steps: first, training with sparsity regularization; second, pruning; third, fine-tuning the pruned model. These three steps can be repeated in a loop.
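The pruning step itself is just slicing: removing an output channel of one conv layer also removes the corresponding input channel of the next layer. A minimal NumPy sketch (shapes, γ values, and the 0.01 cutoff are illustrative, not from the paper):

```python
import numpy as np

# conv1 weights: (out_channels, in_channels, kH, kW), plus its BN gammas
W1 = np.random.randn(8, 3, 3, 3)
gamma1 = np.array([0.9, 0.001, 0.5, 0.002, 0.7, 0.003, 0.6, 0.004])
# conv2 consumes conv1's 8 output channels
W2 = np.random.randn(16, 8, 3, 3)

keep = np.abs(gamma1) > 0.01          # channels whose gamma is large enough
W1_pruned = W1[keep]                  # drop output channels of conv1
gamma1_pruned = gamma1[keep]
W2_pruned = W2[:, keep]               # drop matching input channels of conv2
```

Note that both the current layer's output side and the next layer's input side must be sliced with the same mask, otherwise the shapes no longer line up.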
Implementation details:
λ is usually set to 1e-4 or 1e-5, depending on the task.
After the γ values are obtained, how should we cut, i.e. how small does γ have to be to count as small? Similar to keeping a fixed proportion of the "energy" in PCA, the γ values of all layers are pooled together and sorted in descending order, and the channels with the larger values are kept, typically around 70% (task-dependent); the rest are pruned.
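The global threshold can be computed by pooling the γ of all layers and reading off a percentile. A sketch assuming we keep 70% of channels (layer names and γ values are made up):

```python
import numpy as np

# BN gammas per layer after sparsity training (illustrative values)
gammas = {
    "layer1": np.array([0.9, 0.5, 0.003]),
    "layer2": np.array([0.7, 0.2, 0.001, 0.4]),
}

# pool all gammas and sort in descending order of magnitude
all_g = np.sort(np.abs(np.concatenate(list(gammas.values()))))[::-1]
k = int(len(all_g) * 0.7)             # number of channels to keep (here 4 of 7)
thresh = all_g[k - 1]                 # smallest gamma that survives
masks = {name: np.abs(g) >= thresh for name, g in gammas.items()}
```

Because the threshold is global, layers whose channels are mostly unimportant get pruned more aggressively than layers full of large γ values.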
The effect of the choice of λ on the distribution of γ is shown below: