Binary Neural Network Model Quantization Survey (2020)

BNN survey 2020

Binary neural networks save computation and storage, which makes them attractive for deployment on edge computing devices. However, binarization causes serious information loss, and its discontinuity makes the network difficult to optimize.
Existing approaches either binarize the original full-precision solution directly, or use optimization-based techniques such as minimizing the quantization error, improving the network loss function, and reducing the gradient error.
The survey also explores other practical aspects of binary neural networks, such as hardware-friendly design and training tricks.
Different tasks, including image classification, object detection and semantic segmentation, are discussed and evaluated, and likely future research challenges are outlined.

Background

Methods for compressing deep networks can be divided into five categories: parameter pruning, parameter quantization, low-rank decomposition, transferred/compact convolutional filters, and knowledge distillation.

Binarization is 1-bit quantization: the data has only two possible values, i.e. -1 (or 0) and +1. After binary compression, each network weight and activation can be represented by a single bit, without taking up much memory.
Furthermore, with binarized weights and activations, the network can replace the expensive floating-point multiply-accumulate operations with lightweight XNOR and Bitcount bit-wise operations.
The binary convolution in XNOR-Net achieves up to 58x speedup on CPU and up to 32x memory compression.
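As a rough illustration of why this is fast (not code from the survey), the sketch below computes a dot product between two {-1, +1} vectors using only XNOR and popcount on packed bits; the packing scheme and helper names are my own:

```python
# Minimal sketch (not from the survey): a {-1,+1} dot product via XNOR + popcount.
# Encoding assumption: bit 1 represents +1 and bit 0 represents -1.

def pack_bits(values):
    """Pack a list of {-1, +1} values into a single Python integer."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word

def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1,+1} vectors of length n.
    XNOR counts matching bits; each match contributes +1, each mismatch -1,
    so dot = matches - mismatches = 2 * matches - n."""
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask
    matches = bin(xnor).count("1")
    return 2 * matches - n

a = [1, -1, -1, 1]
b = [1, 1, -1, -1]
assert binary_dot(pack_bits(a), pack_bits(b), len(a)) == sum(x * y for x, y in zip(a, b))
```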

Classification of binarization methods


3.1 Naive binary neural networks

  • BinaryConnect
  • Binarized Neural Networks
  • Bitwise Neural Networks

3.2 Optimization-based binary neural networks

The usual practice is to reduce the quantization error of the weights and activations. This is similar to standard quantization schemes: the quantized parameters should stay as close as possible to the full-precision parameters, so that the performance of the binary model approaches that of the full-precision model.

3.2.1 Minimizing the quantization error

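The representative approach here is XNOR-Net-style scaling: minimize the quantization error ||W - aB||^2 over a and B in {-1, +1}, which has the closed-form solution B = sign(W) and a = mean(|W|). A minimal PyTorch sketch (the function name is mine; XNOR-Net computes the scaling factor per output filter, while a single scalar is used here for brevity):

```python
import torch

def binarize_weights(w: torch.Tensor):
    """Minimize ||w - alpha * b||^2 with b in {-1, +1}:
    the closed-form solution is b = sign(w), alpha = mean(|w|)."""
    alpha = w.abs().mean()          # optimal scaling factor
    b = torch.sign(w)
    b[b == 0] = 1                   # treat zeros as +1 to stay binary
    return alpha, b

w = torch.randn(64, 3, 3, 3)        # e.g. a conv weight tensor
alpha, b = binarize_weights(w)
w_hat = alpha * b                   # binary approximation of w
print(float(((w - w_hat) ** 2).mean()))
```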

3.2.2 Improving the network loss function

General binarization schemes focus only on accurately approximating the floating-point values locally, while ignoring the effect of the binarized parameters on the global loss. Considering only one layer at a time makes it hard to guarantee the accuracy of the final output after a series of layers. Therefore, the network should be trained globally, taking both binarization and the specific task objective into account.

Adding loss-aware terms

LAB, INQ

Knowledge Distillation

DQ, DBNN, CI-BCNN
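As an illustration of coupling the task objective with a full-precision teacher, here is a generic distillation-style loss for the binary student network (a standard formulation, not the specific loss of DQ, DBNN, or CI-BCNN):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Illustrative combined objective: task loss on the binary (student) network
    plus a soft-target term that pulls it toward the full-precision teacher."""
    task = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * task + (1 - alpha) * soft
```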

3.2.3 Reducing the gradient error

There is an obvious mismatch between the gradient of the sign function and the gradient produced by the straight-through estimator (STE). In addition, parameters outside the range [-1, +1] receive no updates.
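For reference, a minimal PyTorch sketch of the standard sign + STE scheme described above (not tied to any particular paper's code):

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Forward: sign(x). Backward: straight-through estimator, passing the
    gradient unchanged but zeroing it outside [-1, +1], so parameters
    outside that range stop receiving updates."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * (x.abs() <= 1).float()

x = torch.randn(5, requires_grad=True)
BinarizeSTE.apply(x).sum().backward()
print(x.grad)   # zero wherever |x| > 1
```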

Improving the backward gradient of the quantization function

Bi-Real Net provides a custom ApproxSign function to replace the sign function when computing gradients during backpropagation.
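A sketch of that idea, implementing the published ApproxSign derivative (2 + 2x on [-1, 0), 2 - 2x on [0, 1), zero elsewhere) in PyTorch; the class name is mine:

```python
import torch

class ApproxSign(torch.autograd.Function):
    """Forward: sign(x). Backward: gradient of the piecewise-polynomial
    ApproxSign used in Bi-Real Net, a closer approximation to the sign
    function's behavior than the plain STE."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        grad = torch.zeros_like(x)
        grad = torch.where((x >= -1) & (x < 0), 2 + 2 * x, grad)
        grad = torch.where((x >= 0) & (x < 1), 2 - 2 * x, grad)
        return grad_output * grad
```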

Improving the forward quantization function

Gong et al. propose differentiable soft quantization (DSQ), a differentiable method that replaces the traditional quantization function with a soft quantization function.
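DSQ's exact function is interval-based with a learnable sharpness parameter; as a simplified illustration of the idea only, here is a tanh-based soft sign that approaches the hard sign as the temperature k grows:

```python
import math
import torch

def soft_sign(x: torch.Tensor, k: float = 5.0):
    """Simplified soft binarization in the spirit of DSQ: a scaled tanh that
    is differentiable everywhere and approaches sign(x) as k -> infinity.
    (The actual DSQ function is interval-based with a learnable temperature.)"""
    return torch.tanh(k * x) / math.tanh(k)

x = torch.linspace(-1, 1, 5, requires_grad=True)
y = soft_sign(x, k=10.0)
y.sum().backward()      # gradients are smooth, unlike the hard sign
```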

3.3 Efficient computing architectures for binary neural networks


3.5 Training tricks for binary neural networks

Common and efficient BNN training methods widely used in the literature are summarized here, covering network structure transformation, optimizer and hyper-parameter selection, asymptotic quantization, and gradient approximation.

3.5.1 Network restructuring

Binarization maps the weights and activations to {-1, +1}. This acts like a form of data regularization that changes the distribution of the binarized data, so adjusting the network structure to accommodate this change in distribution is an effective strategy.

Reordering the layers can improve the performance of a binary neural network:

  • Placing the pooling layer immediately after the convolution layer avoids the information loss caused by max-pooling after binarization. Experiments show that this reordering greatly improves accuracy.

  • TSQ and HWGQ insert a batch normalization layer before every quantization operation to correct the data. After this transformation, the quantized input follows a stable distribution (sometimes close to Gaussian), so its mean and variance stay within reasonable limits and training becomes smoother.

  • Bi-Real Net connects the input feature map of each convolution to the output of the subsequent block via a shortcut (a minimal sketch of such a block follows this list).

  • WRPN (wide reduced-precision networks) widens the low-precision network by increasing the number of filters in each layer, thereby changing the data distribution.
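A minimal sketch of the kind of reordered block these tricks describe, loosely following Bi-Real Net (layer choices and names are illustrative; the convolution weights are left full-precision here for brevity):

```python
import torch
import torch.nn as nn

class BinaryActivation(nn.Module):
    """Hard sign with a straight-through backward (see the STE sketch in 3.2.3)."""
    def forward(self, x):
        clipped = x.clamp(-1, 1)
        # forward value is sign(x); gradient flows through the clipped identity
        return (torch.sign(x) - clipped).detach() + clipped

class BiRealStyleBlock(nn.Module):
    """Illustrative block: binarize the activation, convolve, normalize, and
    add a full-precision shortcut so real-valued information can flow
    through the entire network."""
    def __init__(self, channels):
        super().__init__()
        self.binarize = BinaryActivation()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        return self.bn(self.conv(self.binarize(x))) + x   # shortcut per input feature map

x = torch.randn(1, 64, 32, 32)
print(BiRealStyleBlock(64)(x).shape)
```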

3.5.2 Optimizer and hyper-parameter selection

Adam makes the training process smoother and faster, and the smoothing coefficient of the second moment is particularly critical.

For optimizers with a fixed learning rate that do not exploit historical gradient information, such as stochastic gradient descent (SGD), a larger batch size is needed to achieve good performance.
At the same time, the momentum factor of batch normalization is also critical. Comparing accuracy under different momentum coefficients shows that the batch normalization parameters must be set appropriately to adapt to the jitter caused by the binarization operation.
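For concreteness, a typical setup along these lines; the specific values are illustrative assumptions, not prescriptions from the survey:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1, bias=False),
    # A smaller BN momentum averages running statistics over more batches,
    # damping the jitter introduced by binarization (value is illustrative).
    nn.BatchNorm2d(64, momentum=0.01),
    nn.ReLU(),
)

# Adam's second-moment smoothing coefficient (beta2) is the sensitive one here.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
```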

3.5.3 Asymptotic quantization

Since the quantization operation has a negative impact on training, many methods adopt an asymptotic (progressive) quantization strategy, gradually increasing the degree of quantization so as to minimize the performance loss caused by binarizing the parameters.
For example, INQ groups the parameters and gradually increases the number of groups that participate in quantization, achieving group-based progressive quantization.
"Towards effective low-bitwidth convolutional neural networks" proposes to lower the quantization precision step by step, so that training can compensate for the quantization error of the parameters through the gradients.
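A rough sketch of the group-wise progressive idea; the schedule and grouping criterion are illustrative, and unlike INQ (which quantizes to powers of two) the frozen weights here are simply binarized:

```python
import torch

def progressive_quantize(weights: torch.Tensor, fraction: float):
    """Binarize only the `fraction` of weights with the largest magnitude,
    leaving the rest full precision so they can compensate during retraining."""
    flat = weights.abs().flatten()
    k = max(1, int(fraction * flat.numel()))
    threshold = flat.topk(k).values.min()
    mask = weights.abs() >= threshold                 # weights frozen as binary
    quantized = torch.where(mask, torch.sign(weights), weights)
    return quantized, mask

w = torch.randn(256)
for fraction in (0.5, 0.75, 0.875, 1.0):              # illustrative INQ-style schedule
    w, frozen = progressive_quantize(w, fraction)
    # ... retrain the remaining full-precision weights here ...
```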

3.5.4 Gradient approximation

Because a straight-through estimator is used, there is usually error in the gradient during backpropagation. Finding an approximation function that is close to the binarization function is a simple and practical solution (see the ApproxSign sketch in Section 3.2.3).

Result analysis

Image classification


CI-BCNN, CBCN, BCGD

Analysis

  • Binarizing the activations has a large impact.
    This is the main motivation behind studies such as PACT and RAD. After adding a reasonable regularization on the activation distribution, the harmful effect of binarizing the activations is reduced, and accuracy naturally increases.
  • The robustness of a binary neural network is highly related to its structure.
    The shortcut connections proposed in Bi-Real Net and the widened blocks in WRPN essentially allow more information to pass through the entire network. Although such structural modifications may increase the amount of computation, they can still be significantly accelerated thanks to XNOR-Bitcount operations.
  • More methods are designed specifically for the special characteristics of BNNs,
    such as XNOR-Net++, CBCN, Self-Binarizing Networks, BENN, and so on.
  • General techniques,
    for example scale factors, smooth approximations in the backward pass, and additional structural connections, are simple and loosely coupled, so they are easy to combine and reuse.
    By contrast, methods that rely on finely designed or learned quantizers, complex computations, or even multi-stage training pipelines are sometimes hardware-unfriendly and difficult to reproduce.

Object detection


Analysis

In classification tasks the network cares more about global features, so it can tolerate the loss of local features caused by binarization. In other tasks, however, local features are more important. Therefore, when designing binary neural networks for other tasks, more attention needs to be paid to the local features of the feature maps.

https://mp.weixin.qq.com/s/QGva6fow9tad_daZ_G2p0Q
Quantization survey 2018:
https://www.jiqizhixin.com/articles/2018-06-01-11
https://chenrudan.github.io/blog/2018/10/02/networkquantization.html
