Convolutional neural network activation functions

        The foundation of today's neural networks and support vector machines is the perceptron, proposed in 1957. The perceptron is a linear model for binary classification: its input is the feature vector of an instance and its output is the instance's class, taking one of the two values +1 and -1. The perceptron corresponds to a hyperplane that separates the input (feature) space into positive and negative halves, so it is a discriminative model. However, the perceptron cannot separate XOR, a simple Boolean operation. It was the later addition of activation functions that rescued the perceptron and made today's variety of neural networks possible.

        An activation function determines which neurons in a layer of the neural network are activated and passes the activation information on to the next layer. The reason a neural network can solve nonlinear problems is, in essence, that the activation function introduces a nonlinear factor that makes up for the limited expressive power of a linear model; the "activated" features of the neurons are kept and mapped to the next layer through this function.

        Because the mathematics behind neural networks assumes differentiability everywhere, the chosen activation function must guarantee that the mapping from input to output is differentiable. So how are activation functions expressed in TensorFlow? An activation function does not change the dimensions of its input; that is, input and output have the same shape. TensorFlow includes smooth nonlinear activation functions such as sigmoid, tanh, elu, softplus and softsign; continuous but not everywhere differentiable functions such as relu, relu6, crelu and relu_x; and the random regularization function dropout.
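        For example, the following minimal sketch (in the same TensorFlow 1.x style as the examples below) shows that applying an activation function leaves the shape of the tensor unchanged:

import tensorflow as tf

# Activation functions are applied element-wise, so the output tensor
# has exactly the same shape as the input tensor.
x = tf.constant([[1.0, -2.0, 3.0], [-4.0, 5.0, -6.0]])  # shape (2, 3)
y = tf.nn.relu(x)

with tf.Session() as sess:
    print(x.get_shape())   # (2, 3)
    print(y.get_shape())   # (2, 3), unchanged
    print(sess.run(y))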

        Each activation function takes the tensor x to be computed as input and outputs a tensor of the same data type as x. The four most common activation functions are sigmoid, tanh, relu and softplus. We explain them one by one below.

(1) sigmoid function

        This is one of the two traditional activation functions most commonly used in neural networks (the other is tanh). It is defined as f(x) = 1 / (1 + exp(-x)), and its output lies in (0, 1).


        Usage is as follows:

import tensorflow as tf

# Apply sigmoid element-wise to a 3x2 constant tensor.
a = tf.constant([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
sess = tf.Session()
print(sess.run(tf.sigmoid(a)))
sess.close()

        The results are as follows:

[[0.7310586 0.880797 ]
 [0.7310586 0.880797 ]
 [0.7310586 0.880797 ]]

        The advantage of the sigmoid function is that its output is mapped into (0, 1), it is monotone and continuous, it is well suited for use in an output layer, and its derivative is relatively easy to compute. Its disadvantage is soft saturation: once the input falls into the saturation region, f'(x) becomes close to 0, and vanishing gradients easily occur.

    Soft saturation means that the first derivative of the activation function h(x) tends to 0 as the input tends to infinity. Hard saturation means that f'(x) = 0 when |x| > c, where c is a constant. relu is an activation function with hard saturation on the left side.

    Vanishing gradients refers to what happens when model parameters are updated by back-propagating derivatives with the chain rule: the further back you go, the smaller the gradients become. The end result is that beyond a certain depth the gradients no longer contribute anything to updating the model.
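        To make the saturation effect concrete, the following small sketch (not from the book) evaluates the sigmoid derivative at increasingly large inputs with tf.gradients; the printed values are approximate:

import tensorflow as tf

# The sigmoid derivative f'(x) shrinks toward 0 as |x| grows, which is
# exactly the soft saturation that causes gradients to vanish in deep stacks.
x = tf.constant([0.0, 2.0, 5.0, 10.0])
y = tf.sigmoid(x)
grad = tf.gradients(y, x)[0]   # element-wise derivative, since each y_i depends only on x_i

with tf.Session() as sess:
    print(sess.run(grad))      # roughly [0.25, 0.105, 0.0066, 0.000045]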

(2) tanh function

        It is defined as f(x) = tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)), with output in (-1, 1).

        The tanh function also suffers from soft saturation. Because its output is centered at 0, it converges faster than sigmoid, but it still does not solve the vanishing gradient problem.
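        tanh can be called in the same way as sigmoid; a minimal usage sketch, analogous to the sigmoid example above:

import tensorflow as tf

# Apply tanh element-wise; outputs lie in (-1, 1) and are centered at 0.
a = tf.constant([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
with tf.Session() as sess:
    print(sess.run(tf.tanh(a)))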

(3) relu function (currently the most popular activation function)

        Although relu was not proposed for the first time then, it became known to the public in 2012 when AlexNet won that year's image recognition contest. softplus can be seen as a smoothed version of relu. relu is defined as f(x) = max(x, 0), and softplus is defined as f(x) = log(1 + exp(x)).

        relu is hard-saturating for x < 0. Since its derivative is 1 for x > 0, relu keeps the gradient from decaying when x > 0, which alleviates the vanishing gradient problem; it also converges faster and gives the neural network the ability to produce sparse representations. However, as training progresses, some inputs fall into the hard saturation region, so the corresponding weights can no longer be updated; this is known as "neuron death."

a = tf.constant([-1.0, 2.0])
with tf.Session() as sess:
    # relu clips negative values to 0 and leaves positive values unchanged.
    b = tf.nn.relu(a)
    print(sess.run(b))

        result:

[0. 2.]
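        For comparison, softplus (the smoothed version of relu mentioned above) can be applied to the same input; a minimal sketch, with approximate outputs noted in the comment:

import tensorflow as tf

# softplus(x) = log(1 + exp(x)): negative inputs are squashed toward 0
# smoothly rather than clipped to exactly 0 as relu does.
a = tf.constant([-1.0, 2.0])
with tf.Session() as sess:
    print(sess.run(tf.nn.softplus(a)))   # approximately [0.3133, 2.1269]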

(4) dropout function.

        Each neuron is kept or suppressed according to the probability keep_prob. If it is suppressed, the neuron's output is 0; if it is not suppressed, its output is scaled up to 1/keep_prob times its original value.

        By default, whether each neuron is suppressed is independent of the others, but this can also be controlled through noise_shape. When noise_shape[i] == shape(x)[i], the elements of x along that dimension are independent of each other. If shape(x) = [k, l, m, n], with the dimensions of x ordered as batch, row, column and channel, and noise_shape = [k, 1, 1, n], then each batch and each channel are independent of one another, but the data within each row and column are tied together: they are either all set to 0 or all kept at their original (scaled) values. A usage example follows:
 

a = tf.constant([[-1.0, 2.0, 3.0, 4.0]])
print(a)
with tf.Session() as sess:
    # noise_shape = [1, 4]: each of the four elements is dropped independently.
    b1 = tf.nn.dropout(a, 0.5, noise_shape=[1, 4])
    print(sess.run(b1))
    b1 = tf.nn.dropout(a, 0.5, noise_shape=[1, 4])
    print(sess.run(b1))

    # noise_shape = [1, 1]: the whole row is either zeroed or kept (and scaled) together.
    b2 = tf.nn.dropout(a, 0.5, noise_shape=[1, 1])
    print(sess.run(b2))
    b2 = tf.nn.dropout(a, 0.5, noise_shape=[1, 1])
    print(sess.run(b2))

        Result:

Tensor("Const_26:0", shape=(1, 4), dtype=float32)
[[-0.  0.  0.  0.]]
[[-2.  0.  0.  0.]]
[[-2.  4.  6.  8.]]
[[-0.  0.  0.  0.]]

Choosing an activation function:

        When the features of the input data differ markedly, tanh works well, and it keeps amplifying and exposing those feature differences as training iterates. When the feature differences are not obvious, sigmoid works better. At the same time, when sigmoid or tanh is used as the activation function, the input needs to be normalized; otherwise the activated values all fall into the flat (saturated) region, the outputs of the hidden layer all converge to the same values, and the original feature representation is lost. relu behaves much better in this respect, and sometimes it can avoid this situation without input normalization. For this reason, most convolutional neural networks today use relu as the activation function.
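        To illustrate that last point, here is a minimal sketch (not from the book; the input and filter shapes are arbitrary) of the typical conv + relu pattern in TensorFlow 1.x:

import tensorflow as tf

# A random image batch and a random 3x3 convolution kernel, purely for illustration.
x = tf.random_normal([1, 28, 28, 3])             # batch, height, width, channels
w = tf.Variable(tf.random_normal([3, 3, 3, 8]))  # 3x3 kernels, 3 -> 8 channels
conv = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
h = tf.nn.relu(conv)                             # relu as the activation

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(h).shape)                     # (1, 28, 28, 8)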

PS: All of the above is excerpted from Section 4.7.1, Chapter 4 of Li Jiaxuan's 《TensorFlow 技术解析与实战》.

Origin blog.csdn.net/a857553315/article/details/93378086