The activation function is a key component that promoted the development of deep learning.
Deep learning architectures, particularly those based on artificial neural networks, can be traced back to the neocognitron proposed by Kunihiko Fukushima in 1980 [11], and the history of artificial neural networks goes back even further. In 1989, Yann LeCun applied the standard backpropagation algorithm (first proposed in 1974) to a deep neural network used to recognize handwritten zip codes [12]. Although the algorithm ran successfully, the computational cost was enormous: training the network took three days, so it could not be put into practical use [13]. Many factors contributed to this slow training, one of which is the vanishing gradient problem, identified in 1991 by Sepp Hochreiter, a student of Jürgen Schmidhuber [14][15].
Quoted from the Wikipedia article "Deep learning"
Early networks were "straight": each layer simply computed a weighted sum of its inputs.
This kind of network is also known as an MLP (multilayer perceptron). It was later proved unable to solve nonlinear problems, which became one of the reasons restricting the development of deep learning at the time.
The activation function changed this situation: its role is to bend the straight (linear) into the curved (nonlinear).
The activation function is a nonlinear function, but it is applied not to the original input x directly; it is applied to the result of the weighted (linear) computation. The process can be represented by the following formula:

y = f(Wx + b)

where f(·) is a nonlinear function.
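As a concrete sketch, the linear step followed by a nonlinearity can be written in a few lines of NumPy. The weights, bias, and the choice of ReLU here are illustrative, not taken from the original article:

```python
import numpy as np

def relu(z):
    """ReLU nonlinearity: f(z) = max(0, z)."""
    return np.maximum(0, z)

# Illustrative weights and bias for the linear step Wx + b.
W = np.array([[1.0, -2.0], [0.5, 1.5]])
b = np.array([0.1, -0.1])
x = np.array([2.0, 1.0])

# y = f(Wx + b): a weighted sum, then the nonlinear "bend".
y = relu(W @ x + b)
print(y)  # → [0.1 2.4]
```

Without the `relu` call the layer would stay purely linear, no matter how many such layers are stacked.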
- Common activation functions
- Selecting an activation function
You can even create your own activation function to handle your own problem, but you must make sure it is differentiable, because during backpropagation the error is passed backward, and only a differentiable activation function can propagate the error back.
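For example, a custom activation comes with its derivative so backpropagation can use it. The "leaky" variant below is a hypothetical illustration (the name, slope `alpha`, and values are not from the original text):

```python
import numpy as np

def leaky_relu(z, alpha=0.1):
    """A custom activation, differentiable everywhere except z = 0."""
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.1):
    """Its derivative, which backpropagation uses to pass the error back."""
    return np.where(z > 0, 1.0, alpha)

z = np.array([-2.0, 0.5, 3.0])
print(leaky_relu(z))       # forward pass → [-0.2  0.5  3. ]
print(leaky_relu_grad(z))  # backward pass → [0.1 1.  1. ]
```

A function without a usable derivative (e.g. a hard step function, whose derivative is zero almost everywhere) would block the error signal entirely.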
There are still some tricks to using these activation functions appropriately. For example, when your neural network has only two or three layers and not many parameters, you can use almost any activation function for the hidden layers; casually "bending the line" is fine and has no particular impact. However, when you use a deep multi-layer network, you cannot choose freely, because an unsuitable activation function can lead to exploding or vanishing gradients.
In convolutional neural networks (CNNs), ReLU is recommended for the convolutional layers. In recurrent neural networks (RNNs), ReLU or tanh is recommended.
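As a minimal sketch of the recurrent case, a single tanh RNN step keeps the hidden state bounded in [-1, 1], which helps stabilize the recurrence. The shapes, random weights, and scaling here are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
W_xh = rng.standard_normal((4, 3)) * 0.1  # input-to-hidden weights (illustrative)
W_hh = rng.standard_normal((4, 4)) * 0.1  # hidden-to-hidden weights (illustrative)
h = np.zeros(4)                           # initial hidden state

# Unroll five time steps: h_t = tanh(W_xh x_t + W_hh h_{t-1}).
for t in range(5):
    x_t = rng.standard_normal(3)
    h = np.tanh(W_xh @ x_t + W_hh @ h)  # tanh keeps h in [-1, 1]

print(h.shape)  # → (4,)
```

Swapping `np.tanh` for a ReLU gives the other recommended choice, at the cost of an unbounded hidden state.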
Quoted from "What is an activation function (Activation Function)"
References