Mixed-Precision Training of Deep Neural Networks

By: zengzeyu 2019.01.16

1. Introduction

FP16 has limited precision, so if network parameters are computed and updated directly in FP16 during neural network training, the resulting numerical instability can hurt model performance. To address this, Baidu Research and NVIDIA proposed mixed-precision training in the paper Mixed Precision Training, which exploits FP16's speed advantage while preserving model accuracy. The main points of the paper are summarized below.

1.1 FP32 master copy of weights

The FP32 master copy is a full-precision (FP32) copy of the FP16 network parameters that is maintained throughout training. As shown in the figure below, the forward pass uses FP16 parameters obtained by casting the master copy down to FP16; after back-propagation, the resulting gradients are applied to the FP32 master copy to perform the parameter update.

[Figure: FP32 master copy]
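The flow in the figure can be sketched in PyTorch roughly as follows. This is an illustrative sketch, not the paper's reference implementation: the plain SGD update, the learning rate, and the helper names are placeholder choices, and the model and inputs are assumed to already be converted to FP16 with .half().

    import torch

    def make_master_params(model):
        # FP32 master copy: detached full-precision clones of the FP16 weights
        return [p.detach().clone().float() for p in model.parameters()]

    def train_step(model, master_params, loss_fn, x, y, lr=1e-3):
        loss = loss_fn(model(x), y)        # forward pass runs in FP16
        model.zero_grad()
        loss.backward()                    # backward pass produces FP16 gradients
        with torch.no_grad():
            for p, master in zip(model.parameters(), master_params):
                master -= lr * p.grad.float()   # update is applied in FP32 on the master copy
                p.copy_(master.half())          # cast back to FP16 for the next iteration
        return loss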

There are two main reasons to do so.

The first reason is that if the parameters are updated directly in FP16, gradients that are too small produce updates that become zero. The figure below is a histogram of the parameter gradient magnitudes collected over the course of training a network; about 5% of the gradient values fall below 2^-24. If the optimizer multiplies these gradients by the learning rate and applies the result directly to FP16 parameters, the update underflows to zero, which degrades model accuracy. If the same update is applied to the FP32 master copy instead, it does not underflow to zero.
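This underflow is easy to check numerically. The snippet below is a small illustration with a made-up value, not data from the paper: a value below 2^-24 survives in FP32 but becomes zero when stored in FP16.

    import torch

    g = 2.0 ** -26   # e.g. learning_rate * gradient, smaller than 2^-24
    print(torch.tensor(g, dtype=torch.float32))   # tensor(1.4901e-08)            -- still nonzero
    print(torch.tensor(g, dtype=torch.float16))   # tensor(0., dtype=torch.float16) -- underflows to zero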

The second reason is that when a parameter is much larger than its update, the mechanics of floating-point addition can also drive the effective update to zero. In floating-point addition, the exponents of the two operands must first be aligned. If the parameter is 2048 times (2^11) or more larger than its update, the update's mantissa has to be shifted right by at least 11 bits to align with the parameter, which exceeds the precision of FP16's 10-bit mantissa, so the update is rounded away. This is generally not a problem in FP32.
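This effect can also be checked directly. Again an illustrative snippet with made-up numbers: adding an update that is 4096 times smaller than the parameter changes nothing in FP16, while FP32 keeps the contribution.

    import torch

    param  = torch.tensor(1.0)
    update = torch.tensor(1.0 / 4096)     # parameter is 4096x (> 2048x) larger than its update

    print(param.half() + update.half())   # tensor(1., dtype=torch.float16) -- update is rounded away
    print(param + update)                 # tensor(1.0002)                  -- FP32 keeps it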

[Figure: gradient histogram]

1.2 Loss scaling

Loss scaling multiplies the loss by a scale factor before back-propagation, so that the gradients computed during the backward pass fall within the range representable in FP16.

The figure below shows the distribution of activation gradients of the Multibox SSD network during training. About 67% of the gradient values fall below 2^-24 and cannot be represented in FP16; without amplification, training this network in FP16 diverges. Amplifying the activation gradients and then scaling the resulting weight gradients back down by the same factor solves the problem.

[Figure: activation gradients]

Because gradients are computed via the chain rule, the simplest way to amplify every gradient is to amplify the loss itself. There is no fixed rule for choosing the scale factor; for the Multibox SSD network above, the authors trained successfully with scale factors from 8 up to 32K. As long as the amplified gradients do not exceed the largest value FP16 can represent (65504), choosing a larger scale factor has no side effects. [1]
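Building on the train_step sketch above, static loss scaling only adds the scale/unscale steps. This is again a hypothetical sketch; the scale factor 1024 is an arbitrary choice within the range the authors report.

    import torch

    def scaled_train_step(model, master_params, loss_fn, x, y, lr=1e-3, loss_scale=1024.0):
        loss = loss_fn(model(x), y)
        model.zero_grad()
        (loss * loss_scale).backward()     # amplify the loss so small gradients stay representable in FP16
        with torch.no_grad():
            for p, master in zip(model.parameters(), master_params):
                # divide the gradient back down in FP32 so the actual update is unchanged
                master -= lr * (p.grad.float() / loss_scale)
                p.copy_(master.half())
        return loss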

2. Resources

  1. NVIDIA resources: https://github.com/NvidiaResources/nvidia_mixed_precision_training
  2. NVIDIA, Mixed-Precision Training of Deep Neural Networks: https://devblogs.nvidia.com/mixed-precision-training-deep-neural-networks/

3. Method

3.1 Mixed-precision training can be implemented by directly modifying the training code in PyTorch (Python 2); see:

https://github.com/suvojit-0x55aa/mixed-precision-pytorch
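For orientation only, here is what such a hand-rolled loop can look like end to end. This is my own condensed illustration combining the two ideas above (a toy model, random data, and a CUDA GPU are assumptions), not code taken from the repository linked here.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10)).cuda().half()
    master = [p.detach().clone().float() for p in model.parameters()]   # FP32 master copy
    loss_fn = nn.CrossEntropyLoss()
    lr, loss_scale = 1e-3, 128.0

    for step in range(100):
        x = torch.randn(64, 16, device="cuda").half()
        y = torch.randint(0, 10, (64,), device="cuda")
        loss = loss_fn(model(x), y)             # FP16 forward pass
        model.zero_grad()
        (loss * loss_scale).backward()          # scaled FP16 backward pass
        with torch.no_grad():
            for p, m in zip(model.parameters(), master):
                m -= lr * (p.grad.float() / loss_scale)   # unscale and update the FP32 master copy
                p.copy_(m.half())                         # refresh the FP16 weights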

3.2 Use the NVIDIA Apex library

Library Address: https://github.com/NVIDIA/apex

Official NVIDIA Apex tutorial: http://on-demand.gputechconf.com/gtc-cn/2018/pdf/CH8302.pdf
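The usual Apex pattern is to wrap the model and optimizer with amp.initialize and wrap backward() with amp.scale_loss. The sketch below only shows that shape; the toy model, data, and chosen opt_level are my own assumptions, so check the links above for authoritative usage.

    import torch
    import torch.nn as nn
    from apex import amp   # requires NVIDIA Apex to be installed

    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10)).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # "O1" inserts per-op FP16/FP32 casts; "O2" casts the model to FP16 and keeps FP32 master weights
    model, optimizer = amp.initialize(model, optimizer, opt_level="O2")

    for step in range(100):
        x = torch.randn(64, 16, device="cuda")
        y = torch.randint(0, 10, (64,), device="cuda")
        loss = loss_fn(model(x), y)
        optimizer.zero_grad()
        with amp.scale_loss(loss, optimizer) as scaled_loss:
            scaled_loss.backward()          # Apex applies loss scaling around backward()
        optimizer.step()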

References

  1. Mixed Precision Training
