Deep learning: the activation function


The activation function was a key node in promoting the development of deep learning.

Deep learning frameworks, especially those based on artificial neural networks, can be traced back to the neocognitron proposed by Kunihiko Fukushima in 1980 [11], while the history of artificial neural networks goes back even further. In 1989, Yann LeCun and colleagues applied the standard backpropagation algorithm, which had been proposed in 1974 [12], to deep neural networks, using the network to recognize handwritten zip codes. Although the algorithm ran successfully, the computational cost was enormous: training the network took three days, so it could not be put into practical use [13]. Many factors contributed to this slow training, one of which was the vanishing gradient problem identified in 1991 by Sepp Hochreiter, a student of Jürgen Schmidhuber [14][15].

Quoted from Wikipedia: Deep learning

Early networks were "straight": each layer's computation was just a weighted sum of its inputs.


This kind of network, also known as an MLP (multilayer perceptron), was later shown to be unable to solve nonlinear problems, which became one of the factors restricting the development of deep learning at that time.

The appearance of the activation function changed this situation: its role is to bend the straight (linear) into the curved (non-linear).

The activation function is a nonlinear function, but it is applied not to the original input X directly, but to the result of the weighted calculation (the linear part). The process can be represented by the following formula:

y = f(W·x + b)

where f() is a nonlinear function, W is the weight matrix, x is the input, and b is the bias.
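As a minimal sketch of this formula (using NumPy, with ReLU chosen arbitrarily as the example nonlinearity; the shapes and values are purely illustrative and not from the original):

```python
import numpy as np

# One layer: a weighted (linear) calculation followed by a nonlinearity f.
rng = np.random.default_rng(0)
x = rng.normal(size=3)          # input vector x
W = rng.normal(size=(4, 3))     # weight matrix
b = np.zeros(4)                 # bias

def f(z):
    # ReLU as an example of a nonlinear f
    return np.maximum(z, 0.0)

z = W @ x + b                   # the weighted (linear) part
y = f(z)                        # f is applied to the weighted result, not to x itself
print(y)
```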

  • Common activation functions

[Figure: common activation functions]
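The original figure is not recoverable here; as an assumption of what "common" typically covers, a few widely used activation functions can be written in NumPy as follows (sigmoid, tanh, ReLU, and leaky ReLU are my choices for illustration):

```python
import numpy as np

# Minimal definitions of a few commonly used activation functions.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(z, 0.0)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

z = np.linspace(-3, 3, 7)
for name, fn in [("sigmoid", sigmoid), ("tanh", tanh),
                 ("relu", relu), ("leaky_relu", leaky_relu)]:
    print(name, np.round(fn(z), 3))
```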

  • Selecting an activation function

You can even create your own activation function to handle your own problem, but you must make sure it is differentiable, because when backpropagation passes the error backward, only a differentiable activation function can propagate the error back through it.
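A minimal sketch of this idea: the custom activation below, f(z) = z·sigmoid(z), is a hypothetical example of my own choosing (not from the original); the point is that a home-made activation must come with its derivative so the error can be multiplied through it during backpropagation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def my_activation(z):
    # hypothetical custom activation: z * sigmoid(z)
    return z * sigmoid(z)

def my_activation_grad(z):
    # hand-written derivative of z * sigmoid(z), needed for backpropagation
    s = sigmoid(z)
    return s + z * s * (1.0 - s)

# One backprop step through the activation:
# downstream gradient = upstream gradient * local derivative
z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
upstream = np.ones_like(z)            # pretend gradient arriving from the next layer
downstream = upstream * my_activation_grad(z)
print(downstream)
```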

Using these activation functions well still takes some care. For example, when your neural network has only two or three layers and not many neurons, you can use almost any activation function for the hidden layers; just bending the line casually will do, and nothing will be particularly affected. However, when you use an especially deep multi-layer network, you cannot pick the "weapon" freely, because a poor choice can lead to exploding or vanishing gradients.
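A toy illustration of why depth makes the choice matter (a sketch under simplified assumptions, with a single fixed pre-activation value reused at every layer): in backpropagation the local derivatives of the activation are multiplied layer after layer, and the sigmoid derivative is at most 0.25, so the product collapses quickly, while ReLU's derivative is 1 on its active side.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

z = 1.0  # a fixed pre-activation value, reused at every layer for simplicity
for depth in (3, 10, 30):
    sig = sigmoid_grad(z) ** depth   # sigmoid derivative <= 0.25, so the chain shrinks fast
    rel = 1.0 ** depth               # ReLU derivative is 1 where z > 0, so the chain stays 1
    print(f"{depth:>2} layers  sigmoid chain: {sig:.2e}   relu chain: {rel:.0f}")
```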

In the convolutional layers of convolutional neural networks (CNNs), the recommended activation function is ReLU. In recurrent neural networks (RNNs), ReLU or tanh is recommended.

Quoted from: What is an activation function (Activation Function)
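A minimal sketch of the recommendation quoted above, assuming PyTorch as the framework (the framework, layer sizes, and input shapes are my assumptions, not from the original): ReLU after a convolutional layer, and tanh (or ReLU) inside a recurrent layer.

```python
import torch
import torch.nn as nn

# Convolutional layer followed by ReLU
conv_block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),
)

# Recurrent layer with tanh (nonlinearity='relu' is the other option)
rnn = nn.RNN(input_size=8, hidden_size=32, nonlinearity='tanh', batch_first=True)

images = torch.randn(4, 3, 28, 28)   # (batch, channels, height, width)
sequences = torch.randn(4, 10, 8)    # (batch, time steps, features)
print(conv_block(images).shape)      # torch.Size([4, 16, 28, 28])
print(rnn(sequences)[0].shape)       # torch.Size([4, 10, 32])
```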

References

What is an activation function (Activation Function)

