Common activation functions in deep learning

  The previous article described the role of activation functions; here I want to sort out the activation functions commonly used in deep learning. There are three common ones: the sigmoid function, the tanh function, and the ReLU function. Each is described below.

  1. Sigmoid function (S-shaped growth function)

  The sigmoid function maps values from $(-\infty, +\infty)$ to $(0, 1)$. Its formula is: $$S(x) = \frac{1}{1 + e^{-x}}$$
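
  As a quick illustration, here is a minimal NumPy sketch of the sigmoid function defined above (the helper name `sigmoid` is just for illustration, not from the original article):

```python
import numpy as np

def sigmoid(x):
    """Map any real input to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Large negative inputs approach 0, large positive inputs approach 1.
print(sigmoid(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))
# approximately [4.5e-05, 0.269, 0.5, 0.731, 1.0]
```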

  The sigmoid function is now rarely used as a nonlinear activation function, because it has the following disadvantages:

  1. When x is very large or very small, the derivative of the sigmoid function is close to 0, which makes the gradients of the weights close to zero, so gradient updates become very slow (see the sketch after this list).

  2. The output of the function is not zero-centered (its mean is not 0), which is inconvenient for the computation in the next layer.
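
  The following sketch (reusing the `sigmoid` helper from the snippet above) illustrates both drawbacks numerically: the derivative saturates towards zero for large |x|, and the outputs are never centered around zero:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x)); its maximum is 0.25 at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

# 1. Saturation: for large |x| the gradient is essentially zero,
#    so weight updates driven by it become very slow.
print(sigmoid_grad(np.array([0.0, 5.0, 10.0])))   # ~[0.25, 0.0066, 4.5e-05]

# 2. Not zero-centered: outputs always lie in (0, 1), so their mean
#    is positive (about 0.5 on a symmetric grid), never 0.
x = np.linspace(-5.0, 5.0, 101)
print(sigmoid(x).mean())
```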

  In short: the sigmoid function can be used in the final layer of the network as the output layer for binary classification, but try to avoid it in hidden layers.

  

  2. Tanh function (hyperbolic tangent function)

  This function is more commonly used than the sigmoid; it maps values from $(-\infty, +\infty)$ to $(-1, 1)$. Its formula is: $$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
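
  For comparison, here is a minimal NumPy sketch of tanh (equivalent to the built-in np.tanh), showing that its output is zero-centered, unlike the sigmoid:

```python
import numpy as np

def tanh(x):
    """Map any real input to (-1, 1); same result as np.tanh(x).

    Naive form for illustration; np.tanh is numerically safer for large |x|.
    """
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

# Zero-centered output: tanh(0) = 0 and tanh(-x) = -tanh(x).
print(tanh(np.array([-2.0, 0.0, 2.0])))   # approximately [-0.964, 0.0, 0.964]
```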

