【论文阅读】Gradient Centralization: A New Optimization Technique for Deep Neural Networks

New work from The Hong Kong Polytechnic University and Alibaba DAMO Academy: an elegant operation that improves performance with a single line of code embedded in the optimizer!

arXiv link: https://arxiv.org/abs/2004.01461

GitHub link: https://github.com/Yonghongwei/Gradient-Centralization

 


1. Summary

Optimization techniques are of great importance for the effective training of deep neural networks (DNNs). Prior work has shown that using first- and second-order statistics (such as the mean and variance) to perform Z-score standardization on network activations or weight vectors, as in Batch Normalization (BN) and Weight Standardization (WS), can improve training performance. Unlike these existing methods, which operate on activations or weights, this paper proposes a new optimization technique, Gradient Centralization (GC), which operates directly on the gradients by centralizing the gradient vectors to zero mean. GC can be viewed as a projected gradient descent method with a constrained loss function. The authors show that GC regularizes both the weight space and the output feature space, improving the generalization performance of DNNs. In addition, GC improves the Lipschitzness of the loss function and its gradient, making the training process more efficient and stable. GC is very simple to implement and can be embedded into existing gradient-based DNN optimizers (such as SGD-GC, Adam-GC, etc.) with only one line of code. It can also be used directly to fine-tune pre-trained DNNs. Experiments on general image classification, fine-grained image classification, object detection, and segmentation show that GC consistently improves the learning performance of DNNs.
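Concretely, for a weight vector w_i with gradient ∇_{w_i}L, GC computes Φ_GC(∇_{w_i}L) = ∇_{w_i}L − μ, where μ is the mean of the components of ∇_{w_i}L. Below is a minimal PyTorch-style sketch of this step; the helper name `centralize_gradient` is my own for illustration and is not taken from the official repo:

```python
import torch

def centralize_gradient(grad: torch.Tensor) -> torch.Tensor:
    """Gradient Centralization (GC): remove the mean from each gradient slice.

    For a weight of shape (out_units, ...), the mean is computed over every
    dimension except the first, so the gradient of each output unit's weight
    vector is shifted to zero mean. 1-D parameters (e.g. biases) are skipped.
    """
    if grad.dim() > 1:
        grad = grad - grad.mean(dim=tuple(range(1, grad.dim())), keepdim=True)
    return grad
```

For a fully-connected layer this centralizes each column-wise weight gradient; for a convolution layer it centralizes the gradient of each output channel's kernel.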


2. Implementation

Structure: GC operates on the gradient of each weight vector (a column of a fully-connected weight matrix, or one output channel's flattened convolution kernel), subtracting its mean so that the centralized gradient has zero mean.

Algorithm: GC is inserted into the optimizer immediately after the gradient is computed and before the momentum/update step, as sketched below.
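As an illustration of where the change goes, here is a hedged sketch of an SGD-with-momentum step with the GC line inserted. The official repo provides drop-in SGD/Adam optimizer variants with GC; this standalone loop is only meant to show the placement of the one-line modification, and the function and argument names are my own:

```python
import torch

@torch.no_grad()
def sgd_gc_step(params, buffers, lr=0.1, momentum=0.9):
    """One SGD-with-momentum update with Gradient Centralization applied.

    `params`: iterable of tensors whose .grad has been populated by backward().
    `buffers`: dict mapping each parameter to its momentum (velocity) tensor.
    Illustrative sketch only, not the paper's official optimizer code.
    """
    for p in params:
        if p.grad is None:
            continue
        g = p.grad
        # The "one line" of GC: centralize gradients of multi-dimensional weights.
        if g.dim() > 1:
            g = g - g.mean(dim=tuple(range(1, g.dim())), keepdim=True)
        buf = buffers.setdefault(p, torch.zeros_like(p))
        buf.mul_(momentum).add_(g)   # v <- momentum * v + centralized gradient
        p.add_(buf, alpha=-lr)       # w <- w - lr * v
```

In practice one would subclass an existing optimizer (e.g. `torch.optim.SGD`) and add the centralization line just before the update, which matches the spirit of the one-line modification described in the summary above.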

 


Source: blog.csdn.net/qq_39478403/article/details/105474593