[Model Compression] Deep Compression, a classic paper combining multiple compression techniques

Paper: Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Paper link: https://arxiv.org/abs/1510.00149

Best Paper at ICLR 2016; it compresses models with a three-stage hybrid pipeline of pruning, quantization, and Huffman coding.


Introduction

image

As the figure shows, the overall algorithm consists of three stages:

1. Pruning: set the weights with small magnitude to 0, turning the weight matrix into a sparse matrix.

2. Quantization: quantize the weights that survive pruning so that they share a small set of values, which reduces the space needed to store each weight and further compresses the required storage.

3. Huffman coding: apply Huffman coding, a variable-length entropy code, to further reduce the storage space required.

 

Pruning

image

image

CSR splits the original matrix into three arrays: AA (the nonzero values), JA (the column index of each nonzero), and IA (the row pointers).
A sparse n × n matrix with a nonzero entries is then represented by 2a + n + 1 numbers.

On top of CSR and CSC, the paper replaces absolute indices with relative offsets from the previous nonzero, further reducing the storage needed for the indices.

image

Pruning procedure:

1. Set a threshold; weights whose absolute value exceeds the threshold are kept, and all other weights are set to zero.
2. Store the pruned weight matrix in memory using a compressed sparse format (e.g. CSR or CSC).
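
The two steps above can be sketched as follows. This is a minimal illustration (the threshold and weight values are made up, not from the paper), using SciPy's CSR format to show the three arrays AA/JA/IA:

```python
# Magnitude-based pruning followed by CSR storage (illustrative sketch).
import numpy as np
from scipy.sparse import csr_matrix

def prune(weights, threshold):
    """Zero out weights whose absolute value is below the threshold."""
    pruned = weights.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned

weights = np.array([[0.9, -0.02, 0.0, 0.4],
                    [0.01, 0.7, -0.03, 0.0],
                    [0.0, -0.5, 0.02, 0.8],
                    [0.03, 0.0, 0.6, -0.01]])

sparse = csr_matrix(prune(weights, threshold=0.1))
# CSR stores three arrays:
print(sparse.data)     # AA: the nonzero values
print(sparse.indices)  # JA: column index of each nonzero
print(sparse.indptr)   # IA: row pointers, length n + 1
```

With a = 6 nonzeros and n = 4 rows, the matrix costs 2a + n + 1 = 17 stored numbers, matching the formula above.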

 

Quantization


image

Quantization procedure:

1. Initialize the k-means centroids: the initial centroids have a great influence on the result. There are three methods: uniform (linear) initialization, density-based initialization, and random initialization; the paper finds uniform initialization works best.

2. Determine the quantized value for each weight: each weight is assigned to its nearest centroid, and that shared centroid value is used in place of the original weight.

3. Fine-tune: further fine-tune the k-means centroids during training.
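
A minimal sketch of steps 1 and 2, using the uniform (linear) initialization the paper reports works best; the data and function names here are illustrative, not the paper's code:

```python
# K-means weight sharing with uniform centroid initialization (sketch).
import numpy as np

def quantize(weights, k, iters=20):
    w = weights[weights != 0]                     # only weights that survived pruning
    centroids = np.linspace(w.min(), w.max(), k)  # uniform (linear) initialization
    idx = np.zeros(len(w), dtype=int)
    for _ in range(iters):
        # assign each weight to its nearest centroid
        idx = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        # move each centroid to the mean of its assigned weights
        for j in range(k):
            if np.any(idx == j):
                centroids[j] = w[idx == j].mean()
    return centroids, idx

rng = np.random.default_rng(0)
w = rng.normal(size=100)
codebook, assignments = quantize(w, k=4)
```

Each weight is now represented by a small cluster index into the shared codebook instead of a full-precision value.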

image

Fine-tuning the k-means centroids:

Because of pruning, the weight matrix is sparse; a 0 in the matrix means the connection has been removed, so the gradients at those positions are discarded. The gradients of all weights assigned to the same centroid are summed and used to update that shared centroid.
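
This update rule can be sketched as a scatter-add: per-weight gradients are grouped by cluster index and summed into one gradient per centroid. The arrays here are small illustrative examples, not real training data:

```python
# Grouping per-weight gradients into per-centroid gradients (sketch).
import numpy as np

def centroid_gradients(weight_grads, idx, k):
    """Sum the gradients of all weights that share centroid j into grads[j]."""
    grads = np.zeros(k)
    np.add.at(grads, idx, weight_grads)  # scatter-add by cluster index
    return grads

grads = centroid_gradients(np.array([0.1, -0.2, 0.3, 0.4]),
                           idx=np.array([0, 1, 0, 1]), k=2)
# centroid 0 accumulates 0.1 + 0.3, centroid 1 accumulates -0.2 + 0.4
```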

image

image

Here n is the number of weights, b is the number of bits used for each original weight, and k is the number of quantization clusters; the compression rate is then r = nb / (n·log2(k) + k·b), since each weight now stores only a log2(k)-bit index plus a shared codebook of k values of b bits each.

After quantization, the original sparse matrix becomes a sparse matrix of cluster indices plus a lookup table (the codebook), achieving further compression.
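
A quick arithmetic check of the compression rate r = nb / (n·log2(k) + k·b). The specific numbers below (16 weights, 32-bit values, 4 clusters) are a small worked example, chosen for round arithmetic:

```python
# Weight-sharing compression rate: r = n*b / (n*log2(k) + k*b).
import math

def compression_ratio(n, b, k):
    """n weights of b bits each, quantized into k shared clusters."""
    index_bits = n * math.log2(k)  # each weight keeps a log2(k)-bit index
    codebook_bits = k * b          # plus one shared b-bit value per cluster
    return (n * b) / (index_bits + codebook_bits)

r = compression_ratio(n=16, b=32, k=4)
# 16*32 / (16*2 + 4*32) = 512 / 160 = 3.2
```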

 

Huffman Coding

image

The figure shows the distribution of codeword lengths before and after Huffman compression.
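
The idea behind this stage can be sketched with a small Huffman coder over quantized cluster indices: frequent symbols receive shorter codewords. The skewed symbol distribution below is made up for illustration:

```python
# Minimal Huffman code construction over a stream of symbols (sketch).
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code; more frequent symbols get shorter codewords."""
    freq = Counter(symbols)
    # heap entries: (frequency, tie-breaker, {symbol: partial codeword})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, i, merged))
        i += 1
    return heap[0][2]

# a skewed distribution of quantized indices, as pruning+quantization produce
code = huffman_code([0] * 50 + [1] * 25 + [2] * 15 + [3] * 10)
```

With these frequencies the most common symbol gets a 1-bit codeword and the rarest symbols get 3-bit codewords, which is exactly the source of the extra savings.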

 

Experiment

Deep Compression can compress the parameters by 35 to 49 times without loss of accuracy.

 

Origin blog.csdn.net/DL_wly/article/details/99058255