Andrew Ng Deep Learning Course Notes - 2

Week 3: Shallow Neural Networks

3.1 Overview of Neural Networks

Cascading two logistic regression classifiers yields a very simple neural network.

New notation is introduced to distinguish the intermediate variables of different layers, \( z^{[l]}, W^{[l]}, b^{[l]} \), from the per-sample notation \( x^{(i)} \).

Summary: each sample in the training set is passed through the network, the final output and its loss are computed, the losses of all samples are accumulated, then the gradients are back-propagated and the parameters are updated. These steps are repeated until the parameters are (approximately) optimal.
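
As a minimal illustration of this overview (my own sketch, not code from the course), the snippet below cascades two logistic units in NumPy: it runs the forward pass over all samples, accumulates the logistic loss, and marks where back-propagation and the parameter update would go. All layer sizes and array names are made up for the example.

```python
import numpy as np

np.random.seed(0)
m = 5                                   # number of training samples (toy value)
X = np.random.randn(2, m)               # 2 input features per sample, samples as columns
Y = np.random.randint(0, 2, (1, m))     # binary labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two cascaded logistic units: layer 1 (hidden) feeds layer 2 (output).
W1, b1 = np.random.randn(3, 2) * 0.01, np.zeros((3, 1))   # 3 hidden units
W2, b2 = np.random.randn(1, 3) * 0.01, np.zeros((1, 1))   # 1 output unit

A1 = sigmoid(W1 @ X + b1)               # forward pass through the first logistic layer
A2 = sigmoid(W2 @ A1 + b2)              # ... and through the second

# Accumulate the logistic (cross-entropy) loss over all m samples.
cost = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
print(cost)

# In training, this forward pass and cost would be followed by gradient
# back-propagation and a parameter update, repeated until convergence
# (see sections 3.9 & 3.10 below).
```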

3.2 & 3.3 & 3.4 & 3.5 Neural Network Representation and Computation

As shown, a simple neural network consists of an input layer, a hidden layer, and an output layer. Since only the hidden layer and the output layer contain parameters, it is called a two-layer neural network.

In Andrew Ng's machine learning course, each layer adds an extra constant neuron, and its connection weights to the neurons of the next layer play the role of b; in this course, b is treated as a separate parameter of each neuron. The two views are essentially the same, just different representations.

The outputs of the neural network can be computed with matrix operations, mainly through vectorization.

The feature vectors of multiple samples are stacked into a matrix, where the horizontal direction indexes different samples and the vertical direction indexes different neurons, for example:

\( Z^{[l]} = [ z^{[l](1)}, z^{[l](2)}, \dots, z^{[l](m)} ] \)

In Andrew Ng's words, a single matrix multiplication sweeps vertically through all the neurons and horizontally across all the samples.
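
A small sketch of this vectorization (with made-up layer sizes): the m sample feature vectors are stacked as columns of X, and one matrix product computes \( Z^{[1]} \) for every neuron and every sample at once.

```python
import numpy as np

n_x, n_h, m = 4, 3, 6                   # input features, hidden units, samples (toy sizes)
X = np.random.randn(n_x, m)             # columns = samples, rows = features

W1 = np.random.randn(n_h, n_x)          # one row of weights per hidden neuron
b1 = np.random.randn(n_h, 1)            # one bias per hidden neuron

# One matrix product handles every neuron (rows) and every sample (columns);
# b1 is broadcast across the m columns.
Z1 = W1 @ X + b1
print(Z1.shape)                         # (n_h, m): vertical = neurons, horizontal = samples
```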

3.6 & 3.7 & 3.8 Activation Functions

The available activation functions include:

  • sigmoid: \( a = g(z) = \frac{1}{1 + e^{-z}} \)
  • tanh: \( a = g(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} \)
  • ReLU: \( a = g(z) = \max(0, z) \)
  • Leaky ReLU: \( a = g(z) = \max(0.01z, z) \)

The sigmoid function is generally no longer used in hidden layers, where tanh works better, but in binary classification sigmoid can still be used in the output layer. ReLU is now the most commonly used activation function for hidden layers; if you don't know which activation to choose, pick ReLU with your eyes closed. It also has variants such as Leaky ReLU, which introduce extra parameters.
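
The four activation functions listed above, written out in NumPy (a straightforward sketch, not code from the course):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # equivalent to np.tanh(z)
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def leaky_relu(z):
    return np.maximum(0.01 * z, z)
```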

Why use activation functions at all? Without them, a neural network's output is just a linear weighting of its inputs, and a hidden layer without an activation function might as well be removed entirely. A neural network without activation functions has no soul!

Try deriving the derivatives of the activation functions above yourself, or... just copy the answers:

  • Derivative of sigmoid: \( g'(z) = g(z)(1 - g(z)) \)
  • Derivative of tanh: \( g'(z) = 1 - g^2(z) \)
  • Derivative of ReLU: \( g'(z) = \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z \ge 0 \end{cases} \)
  • Derivative of Leaky ReLU: \( g'(z) = \begin{cases} 0.01 & \text{if } z < 0 \\ 1 & \text{if } z \ge 0 \end{cases} \)

In fact, the last two functions are not differentiable at \(z=0\), but a value can simply be assigned there by hand; this does not affect the final result (from probability theory, a single point has probability 0).
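
The same derivatives in NumPy (again just a sketch), with the value at \(z = 0\) assigned to the \(z \ge 0\) branch as described above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1 - s)

def d_tanh(z):
    return 1 - np.tanh(z) ** 2

def d_relu(z):
    # not differentiable at z = 0; we simply assign the value 1 there
    return np.where(z < 0, 0.0, 1.0)

def d_leaky_relu(z):
    return np.where(z < 0, 0.01, 1.0)
```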

3.9 & 3.10 Gradient Descent for Neural Networks

The gradient back-propagation formulas for a neural network with a single hidden layer:

\( \begin{aligned} dz^{[2]} &= a^{[2]} - y \\ dW^{[2]} &= dz^{[2]}a^{[1]^T} \\ db^{[2]} &= dz^{[2]} \\ dz^{[1]} &= W^{[2]^T}dz^{[2]} * g^{[1]'}(z^{[1]}) \\ dW^{[1]} &= dz^{[1]}x^T \\ db^{[1]} &= dz^{[1]} \end{aligned} \)

Note that the meaning of \(dz\) and similar notation in these formulas was explained in the previous post: it is the derivative of the cost function with respect to each intermediate variable. The most important part of back-propagation is computing and propagating \(dz\), which was written as \(\delta\) in Andrew Ng's machine learning course; it is simply the derivative of the cost function with respect to each layer's \(z\), the variable of each layer's neurons before it passes through the activation function.

When trying to understand the vectorized formulas, keeping track of how the matrix shapes change is very helpful.

Going one step further, gradient back-propagation over multiple samples:

 \( \begin{aligned} dZ^{[2]} &= A^{[2]} - Y \\ dW^{[2]} &= \frac{1}{m}dZ^{[2]}A^{[1]^T} \\ db^{[2]} &= \frac{1}{m}np.sum(dZ^{[2]},axis=1,keepdims=True) \\ dZ^{[1]} &= W^{[2]^T}dZ^{[2]} * g^{[1]'}(Z^{[1]}) \\ dW^{[1]} &= \frac{1}{m}dZ^{[1]}X^T \\ db^{[1]} &= \frac{1}{m}np.sum(dZ^{[1]},axis=1,keepdims=True)  \end{aligned} \)
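
A sketch of these multi-sample formulas in NumPy, for made-up layer sizes and assuming a tanh hidden layer with a sigmoid output. Following the shape advice above, each gradient's shape is checked against its parameter's shape; the forward-pass values are generated on the spot just so the snippet runs.

```python
import numpy as np

n_x, n_h, m = 4, 3, 8                           # made-up layer sizes and sample count
X = np.random.randn(n_x, m)
Y = np.random.randint(0, 2, (1, m))

W1, b1 = np.random.randn(n_h, n_x) * 0.01, np.zeros((n_h, 1))
W2, b2 = np.random.randn(1, n_h) * 0.01, np.zeros((1, 1))

# Forward pass (tanh hidden layer, sigmoid output) to get the cached values.
Z1 = W1 @ X + b1
A1 = np.tanh(Z1)
Z2 = W2 @ A1 + b2
A2 = 1.0 / (1.0 + np.exp(-Z2))

# The vectorized back-propagation formulas above, line for line.
dZ2 = A2 - Y
dW2 = (1 / m) * dZ2 @ A1.T
db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
dZ1 = W2.T @ dZ2 * (1 - A1 ** 2)                # (1 - A1**2) is g'(Z1) for tanh
dW1 = (1 / m) * dZ1 @ X.T
db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)

# Each gradient has the same shape as the parameter it updates.
assert dW2.shape == W2.shape and db2.shape == b2.shape
assert dW1.shape == W1.shape and db1.shape == b1.shape
```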

3.11 Random Initialization

If all the weights of a neural network are initialized to 0, then no matter how long the network is trained, the weights of every neuron undergo exactly the same updates, i.e., they remain "symmetric". So there are two basic principles for initializing the weights:

  • Initialize randomly.
  • Keep the initial weights as small as possible, to prevent the gradients from saturating and training from slowing down (see the sketch after this list).
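
A minimal initialization sketch following these two principles (the layer sizes are arbitrary); the 0.01 factor keeps the initial weights small so tanh/sigmoid units start away from their saturated region, and the biases can safely start at zero:

```python
import numpy as np

n_x, n_h, n_y = 4, 3, 1                 # arbitrary layer sizes for illustration

# Random, small weights break the symmetry between neurons in the same layer;
# biases do not cause symmetry, so zeros are fine for them.
W1 = np.random.randn(n_h, n_x) * 0.01
b1 = np.zeros((n_h, 1))
W2 = np.random.randn(n_y, n_h) * 0.01
b2 = np.zeros((n_y, 1))
```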


Origin www.cnblogs.com/tofengz/p/12234137.html