Exploring and Solving the Vanishing Gradient Problem

Author: Zen and the Art of Computer Programming

1. Introduction

Deep learning (Deep Learning) is currently one of the hottest directions in machine learning. It has not only achieved great success with images and text, but has also shown strong performance in fields such as autonomous driving, sign language recognition, virtual reality, and medical diagnosis. However, another important characteristic of deep learning is that its parameter scale and training time are very large. The vanishing gradient problem arises when a network has too many parameters or too many layers: some layers settle into a uniform, trivial solution and struggle to learn more complex parameter configurations, which degrades the performance of the whole model. Solving the vanishing gradient problem is therefore key to improving the effectiveness of deep learning models. This article systematically explains the vanishing gradient problem through its definition, cause analysis, typical cases, and solutions, and verifies them by implementing gradient-repair techniques in TensorFlow.

2. Definition of the Vanishing Gradient Problem

In deep learning, increasing the number of weight parameters in each layer of a neural network, or the total number of model parameters, brings two problems: 1. model training slows down; 2. model accuracy decreases.

The second problem, the vanishing gradient problem (Gradient Vanishing Problem), refers to the following phenomenon: during the training of a deep neural network, as the network gets deeper, the activation values of the neurons get closer and closer to 0 and the changes in their output values become extremely small. In plain terms, once the number of parameters grows, neuron activations saturate, the gradient values tend toward 0, and optimization becomes very difficult.
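To make this concrete, the short TensorFlow sketch below builds a deep fully connected network with saturating sigmoid activations and prints the gradient norm of each layer's weights; the norms typically shrink by orders of magnitude toward the layers closest to the input, which is the vanishing gradient in action. (The depth, layer sizes, and random data here are illustrative assumptions, not values from this article.)

```python
import tensorflow as tf

tf.random.set_seed(0)

# A 10-layer MLP with saturating sigmoid activations plus a linear output layer.
model = tf.keras.Sequential(
    [tf.keras.layers.Dense(64, activation="sigmoid") for _ in range(10)]
    + [tf.keras.layers.Dense(1)]
)

x = tf.random.normal([32, 64])   # dummy input batch (batch size 32, 64 features)
y = tf.random.normal([32, 1])    # dummy regression targets

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x) - y))

grads = tape.gradient(loss, model.trainable_variables)

# Print the gradient norm of each layer's kernel; the norms usually decay
# by orders of magnitude as we move toward the earlier layers.
for var, grad in zip(model.trainable_variables, grads):
    if "kernel" in var.name:
        print(var.name, float(tf.norm(grad)))
```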

One key to mitigating the vanishing gradient problem is to reduce the number of model parameters. In practice, however, it is more effective to control the range of activation values in each layer so that the neurons never fully saturate, which further alleviates vanishing gradients. In general, this can be done in the following ways:

1. Use the ReLU activation function, which outputs 0 for inputs in the negative interval while keeping a constant, non-saturating gradient for positive inputs (see the sketch below).
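As a quick illustration of this point, here is a small sketch (the input values are chosen only for illustration) comparing the local gradients of sigmoid and ReLU in TensorFlow: sigmoid's derivative collapses toward 0 for inputs of large magnitude, while ReLU outputs 0 on the negative side and keeps a gradient of 1 for positive inputs, so it does not saturate there.

```python
import tensorflow as tf

x = tf.constant([-10.0, -1.0, 0.5, 1.0, 10.0])

with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)             # x is a constant, so we watch it explicitly
    y_sigmoid = tf.sigmoid(x)
    y_relu = tf.nn.relu(x)

# sigmoid'(x) is nearly 0 at the extremes; ReLU's gradient is either 0 or 1.
print("sigmoid grads:", tape.gradient(y_sigmoid, x).numpy())
print("relu grads:   ", tape.gradient(y_relu, x).numpy())
del tape
```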
