A detailed record of model pruning and compression with NNI

Using Microsoft's NNI tool, this is a summary of model pruning and compression with PyTorch. The experiments were run quite a while ago, so some details may contain errors, or the NNI tool may have been updated since. Still, the overall process of pruning and compressing a model with NNI is as described below; here is a brief record.

1. Model pruning and training

NNI can automatically prune and finetune models with a variety of pruning algorithms, for example SlimPruner, L1FilterPruner, L2FilterPruner, FPGMPruner, LevelPruner, and AGP_Pruner.

1. Train the network normally on your own dataset.

2. After training, call NNI's model pruning API: choose the pruning algorithm to apply and set up a config_list with the appropriate sparsity, op_types, and op_names parameters (a code sketch follows this list).

 

3. Based on the previously trained model and the configured parameters, finetune the pruned model.

4. When finetuning of the pruned model is complete, two files are obtained: the pruned-weights file and the mask file. Both are still the same size as the original model.
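A minimal sketch of steps 2 through 4, assuming the NNI 1.x-era import path (newer NNI releases moved the pruners to other modules, so adjust the import to the installed version); Net and the file names are placeholders:

```python
import torch
from nni.compression.torch import L1FilterPruner  # import path differs in newer NNI versions

model = Net()                                      # placeholder: your network class
model.load_state_dict(torch.load('trained_model.pth'))  # normally trained weights

# Prune 50% of the filters of every Conv2d layer; op_names could
# alternatively restrict pruning to specific named layers.
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d'],
}]

pruner = L1FilterPruner(model, config_list)
model = pruner.compress()                          # wraps the layers with masks

# ... finetune `model` here with the usual training loop ...

# Export the two files mentioned above: the masked weights and the mask.
# Both are still the same size as the original model at this point.
pruner.export_model(model_path='pruned_model.pth', mask_path='mask.pth')
```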

Experiments and conclusions

The VGG19 and VGG16 examples officially provided by NNI were trained on the CIFAR-10 dataset and pruned with SlimPruner and L1FilterPruner. With only 10 epochs each for training and for pruning finetune, the pruned models match the original effect, reaching about 80% accuracy on the validation set.

A ResNet trained on CIFAR-10 likewise recovers its pre-pruning effect after applying the pruning procedure above.

With a CRNN network and the L1FilterPruner algorithm on a synthetic digits dataset, training accuracy reaches 98% both before and after pruning.

2. Model compression and forward acceleration

To truly shrink the model file and accelerate forward inference, NNI's model speedup API must be called. At the time of this experiment that part of NNI was still in beta: the number of supported models, ops, and pruning algorithms was limited, and only coarse-grained pruning was supported. PyTorch >= 1.3.1 is required to obtain the small model and the forward acceleration. The speedup does two things:

1. The model weights are multiplied by the mask.

2. NNI's API replaces the pruned modules, yielding a smaller model and faster inference speed.

For PyTorch, only module replacement is currently provided; an operation implemented as a plain function inside forward() is not supported. One workaround is to turn the function into a PyTorch module, as the sketch below illustrates.
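As a hypothetical illustration of that workaround in plain PyTorch: a reshape performed as an inline function call in forward() can be moved into its own nn.Module, so it appears as a replaceable module rather than a bare function:

```python
import torch.nn as nn

# Before: forward() called x.view(x.size(0), -1) directly, which the
# speedup tool cannot replace. Wrapping it as a module avoids that.
class Flatten(nn.Module):
    def forward(self, x):
        return x.view(x.size(0), -1)

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.flatten = Flatten()            # module instead of an inline function
        self.fc = nn.Linear(16, 10)

    def forward(self, x):
        x = self.pool(self.conv(x))
        x = self.flatten(x)                 # now visible as a module
        return self.fc(x)
```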

When performing model compression and acceleration, NNI itself resolves some mask conflicts, but other conflicts cannot be resolved automatically, such as certain conflicts involving BN layers.
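Putting the two steps together, a minimal sketch of the speedup call itself, assuming the NNI v2.x import path (older releases exposed ModelSpeedup under a different module) and reusing the files exported by the pruner sketch above:

```python
import torch
from nni.compression.pytorch import ModelSpeedup  # path differs in older NNI versions

model = Net()                                      # same architecture that was pruned
model.load_state_dict(torch.load('pruned_model.pth'))

dummy_input = torch.randn(1, 3, 32, 32)            # shape must match the real input
ModelSpeedup(model, dummy_input, 'mask.pth').speedup_model()

# The pruned channels are now physically removed: the model object is
# genuinely smaller, and saving it yields a smaller file.
torch.save(model.state_dict(), 'compact_model.pth')
```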

 

3. Save the small model with the network structure

Using the resnet18 network on the CIFAR-10 dataset, pruned with L1FilterPruner: the model size went from 42.7 MB to 7.68 MB; with an input of (1, 3, 128, 128), FLOPs went from 596.151M to 129.464M and params from 11.182M to 2.002M.
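For reference, the size and parameter numbers above can be reproduced with plain PyTorch; a small sketch, where original_model and compact_model are hypothetical handles to the models before and after speedup (FLOPs require a separate counter, such as the one shipped with NNI):

```python
import os
import torch

def report(model, path):
    params_m = sum(p.numel() for p in model.parameters()) / 1e6  # params in millions
    torch.save(model.state_dict(), path)
    size_mb = os.path.getsize(path) / (1024 ** 2)                # file size on disk
    print(f'{path}: {params_m:.3f}M params, {size_mb:.2f} MB')

report(original_model, 'original.pth')   # expected ~11.182M params, ~42.7 MB
report(compact_model, 'compact.pth')     # expected ~2.002M params, ~7.68 MB
```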

In terms of inference speed with a 128*128 input on resnet18, the speedup over 32 iterations is not obvious; over a small number of iterations there is an effect, e.g. after 2 iterations the measured time is 0.009 before speedup versus 0.004 after (tested on the algorithm training platform, where timings fluctuate). Measured with the official (64, 3, 32, 32) input, the speedup over 32 iterations is likewise not obvious.
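A minimal sketch of this kind of timing measurement (benchmark is a hypothetical helper; on GPU, torch.cuda.synchronize() is needed so that asynchronous kernels are included in the timing):

```python
import time
import torch

@torch.no_grad()
def benchmark(model, x, iters=32, warmup=5):
    model.eval()
    for _ in range(warmup):                  # warm-up runs, excluded from timing
        model(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.time() - start) / iters     # average seconds per forward pass

x = torch.randn(1, 3, 128, 128)
print('before speedup:', benchmark(original_model, x))
print('after  speedup:', benchmark(compact_model, x))
```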

Using the CRNN network structure, the pruned model can be trained, but during model compression some ops, such as dimension transformations, are not yet supported by the speedup tool.
