Depth Series - Neural Networks (3): Network Topology and Training the Neural Network


Back to the neural network table of contents

Previous: Depth Series - Neural Networks (2): ANN and DNN and Common Activation Functions

Next: Depth Series - Neural Networks (4): Tuning the Neural Network in Detail

 

In this section, we elaborate on network topology and how to train a neural network; in the next section, we elaborate on tuning the neural network.

 

II. ANN and DNN

3. Network Topology

(1) Single-layer network

     [Figure: single-layer network diagram]

 

(2) Multi-layer network

     [Figure: multi-layer network diagram]

     A multi-layer network has at least one hidden layer, and may have several. A bias term is included so that the layer's function does not have to pass through the origin.

 

(3) Whether ANN or DNN, the layers are fully connected.

      The activation function of the hidden layers and that of the output layer may be the same or different, depending on the task. The output layer may produce a single value or multiple values.
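
To make the fully connected, multi-layer structure concrete, here is a minimal NumPy sketch of a forward pass through one hidden layer. The layer sizes, the random weights, the sigmoid hidden activation, and the identity output activation are illustrative assumptions, not something specified in this article.

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation, used here for the hidden layer
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes (assumed): 3 inputs, 4 hidden units, 2 outputs
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))   # fully connected: every input feeds every hidden unit
b1 = np.zeros(4)               # bias, so the layer need not pass through the origin
W2 = rng.normal(size=(4, 2))
b2 = np.zeros(2)

def forward(x):
    # One hidden layer with a sigmoid activation, then a linear output layer
    h = sigmoid(x @ W1 + b1)
    return h @ W2 + b2         # multi-value output (2 values here)

print(forward(np.array([1.0, -1.0, 2.0])))
```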

 

4. Training the Neural Network

(1) The two stages of training a neural network

  ①. Forward phase

        That is, the forward-propagation stage.

        In forward propagation, the data set, the activation function, and the current value of  \large w  are substituted into the loss function to compute the loss.

  ②. Backward phase

        That is, the back-propagation stage.

        In back-propagation, the chain rule is used to differentiate with respect to  \large w ; substituting the data set then yields the gradient  \large g .

  ③. Chain rule

        If the functions  \large u = \varphi (t), \; v = \psi (t)  are differentiable at the point  \large t , and  \large z = f(u, v)  has continuous partial derivatives, then

            \large \frac{\partial z}{\partial t} = \frac{\partial z}{\partial u} \times \frac{\partial u}{\partial t} + \frac{\partial z}{\partial v} \times \frac{\partial v}{\partial t}
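
        As a quick sanity check of this identity, the short SymPy snippet below (my addition, with arbitrarily chosen \large \varphi, \psi, and f) compares the chain-rule expression with direct differentiation.

```python
import sympy as sp

t, u, v = sp.symbols('t u v')

# Arbitrary illustrative choices: u = phi(t) = t**2, v = psi(t) = sin(t), z = f(u, v) = u*v
phi, psi = t**2, sp.sin(t)
f = u * v

# Chain rule: dz/dt = (dz/du)*(du/dt) + (dz/dv)*(dv/dt)
chain = (sp.diff(f, u) * sp.diff(phi, t) + sp.diff(f, v) * sp.diff(psi, t)).subs({u: phi, v: psi})

# Direct differentiation of z = f(phi(t), psi(t))
direct = sp.diff(f.subs({u: phi, v: psi}), t)

print(sp.simplify(chain - direct))   # prints 0: both routes agree for this example
```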

  ④. Examples

     a. Example 1

         Given  \large f(x, y, z) = (x + y) \cdot z, with  \large x = -2, \; y = 5, \; z = -4.

          Solution: Let  \large q = x + y, then \large f = q \cdot z

                  Forward propagation:

                      \large \because x = -2, y = 5, z = -4

                     \large \therefore q = x + y = 3, \;\;\; f = q \cdot z = -12

                  Back-propagation:

                      \large \frac{\partial f}{\partial q} = z = -4

                      \large \frac{\partial f}{\partial z} = q = 3

                      \large \frac{\partial q}{\partial x} = 1

                     \large \frac{\partial q}{\partial y} = 1

                     \large \frac{\partial f}{\partial x} = \frac{\partial f}{\partial q} \cdot \frac{\partial q}{\partial x} = z \cdot 1 = -4

                     \large \frac{\partial f}{\partial y} = \frac{\partial f}{\partial q} \cdot \frac{\partial q}{\partial y} = z \cdot 1 = -4

                 Diagram:

                  [Figure: computation graph for Example 1, showing the forward values and backward gradients; a code sketch of this example follows below]
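
The hand calculation above can be reproduced in a few lines of Python; this sketch is my addition and simply mirrors the forward and backward steps of Example 1.

```python
# Example 1: f(x, y, z) = (x + y) * z at x = -2, y = 5, z = -4
x, y, z = -2.0, 5.0, -4.0

# Forward propagation
q = x + y            # q = 3
f = q * z            # f = -12

# Back-propagation (chain rule), matching the partial derivatives above
df_dq = z            # -4
df_dz = q            #  3
df_dx = df_dq * 1.0  # dq/dx = 1, so df/dx = -4
df_dy = df_dq * 1.0  # dq/dy = 1, so df/dy = -4

print(f, df_dx, df_dy, df_dz)   # -12.0 -4.0 -4.0 3.0
```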

 

     b. Example 2: the sigmoid function

           Given \large w = (w_{0}, w_{1}, w_{2}), \;\;\;\; x = (x_{0}, x_{1}, x_{2})

           with \large w_{0} = -3, \; w_{1} = 2, \; w_{2} = -3

                \large x_{0} = 1,\; x_{1} = -1, \; x_{2} = -2

                \large z = wx

                \large g(z) = \frac{1}{1 + e^{-z}}

                \large f(w, x) = \frac{1}{1 + e ^{-(w_{1}x_{1} + w_{2}x_{2} + w_{0}x_{0})}}

           Solution: let \large u = w_{1}x_{1}, \;\; v = w_{2}x_{2}

                    Forward propagation:

                      \large f(w, x) = \frac{1}{1 + e^{-(w_{1}x_{1} + w_{2}x_{2} + w_{0}x_{0})}}

                                         \large = \frac{1}{1 + e^{-[2 \times (-1) + (-3) \times (-2) + (-3) \times 1]}}

                                         \large = \frac{1}{1 + e^{-1}}

                                         \large = 0.73

                     Back-propagation:

                        \large \frac{\partial f}{\partial z} = g(z)(1 - g(z)) = 0.20

                       \large \frac{\partial f}{\partial w_{1}} = \frac{\partial f}{\partial z} \cdot \frac{\partial z}{\partial w_{1}} = 0.20 \cdot x_{1} = -0.2

                      \large \frac{\partial f}{\partial w_{2}} = \frac{\partial f}{\partial z} \cdot \frac{\partial z}{\partial w_{2}} = 0.20 \cdot x_{2} = -0.4

                      \large \frac{\partial f}{\partial w_{0}} = \frac{\partial f}{\partial z} \cdot \frac{\partial z}{\partial w_{0}} = 0.20 \cdot x_{0} = 0.2

                    Diagram:

                       [Figure: computation graph for Example 2; a code sketch of this example follows below]

                        In this diagram, \large x_{0} does not appear because \large x_{0} = 1: it changes neither the product with \large w_{0} nor the derivative, so it can be left out of the drawing. It is, however, still present.
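
Here is the same computation for Example 2 in code; this is my own sketch, and the rounding in the comments only matches the two-decimal hand calculation above.

```python
import numpy as np

# Example 2: f(w, x) = sigmoid(w0*x0 + w1*x1 + w2*x2)
w = np.array([-3.0, 2.0, -3.0])   # w0, w1, w2
x = np.array([1.0, -1.0, -2.0])   # x0, x1, x2

# Forward propagation
z = w @ x                          # z = 1
f = 1.0 / (1.0 + np.exp(-z))       # f = sigmoid(1) ≈ 0.73

# Back-propagation: dsigmoid/dz = sigmoid(z) * (1 - sigmoid(z)) ≈ 0.20
df_dz = f * (1.0 - f)
df_dw = df_dz * x                  # gradients w.r.t. (w0, w1, w2); the text rounds df_dz
                                   # to 0.20 first, giving (0.2, -0.2, -0.4)

print(round(float(f), 2), np.round(df_dw, 2))
```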

 

(2). Gradient descent

    Using the gradient \large g obtained from back-propagation, update \large w as follows:

           \large w_{t + 1} = w_{t} - \alpha g_{t}

            \large w_{t+1} : the weight \large w at time \large t+1

            \large w_t : the weight \large w at time \large t

            \large \alpha : the learning rate

            \large g_{t} : the gradient \large g at time \large t
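
The update rule is a single vector operation; the sketch below is my addition, with a made-up weight vector, gradient, and learning rate purely for illustration.

```python
import numpy as np

def gradient_descent_step(w_t, g_t, alpha=0.1):
    # One update: w_{t+1} = w_t - alpha * g_t
    return w_t - alpha * g_t

# Illustrative (assumed) values: current weights and a gradient from back-propagation
w_t = np.array([0.5, -1.2, 3.0])
g_t = np.array([0.1, -0.4, 0.7])

print(gradient_descent_step(w_t, g_t))   # [ 0.49 -1.16  2.93]
```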

 

(3). The concrete training procedure of a neural network

    Input: training-set data

    Output: an optimal set of weights \large w^{*}

  ①. Randomly initialize a set of weights \large w_{0} (here \large w_{0} means the value of \large w at time 0, which is different from the \large w_{0} of Example 2 above)

  ②. Forward-propagate with the current \large w_{t} to obtain the loss \large l(w_{t})

         \large if \; |l(w_{t}) - l(w_{t + 1})| < \varepsilon :

                  output (save) \large w_{t}

         \large else:

                  back-propagate to obtain the gradient \large g_{t} and update \large w_{t}:

                 \large w_{t + 1} = w_{t} - \alpha g_{t}

  ③. Repeat ② until the optimal solution \large w^{*} is output (a code sketch of this loop follows below)
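
Putting ①-③ together, the sketch below trains a tiny one-hidden-layer network on a toy regression problem with NumPy. The data, layer sizes, squared-error loss, learning rate \large \alpha, and tolerance \large \varepsilon are all illustrative assumptions, not values given in this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed): 1-D regression of y = sin(x)
X = rng.uniform(-3, 3, size=(64, 1))
Y = np.sin(X)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# ① Randomly initialize a set of weights w_0
W1, b1 = rng.normal(scale=0.5, size=(1, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)

alpha, eps = 0.1, 1e-8          # learning rate and stopping tolerance (assumed)
prev_loss = np.inf

for step in range(100000):
    # ② Forward propagation and the loss l(w_t) (mean squared error here)
    H = sigmoid(X @ W1 + b1)
    pred = H @ W2 + b2
    loss = np.mean((pred - Y) ** 2)

    # Stop when |l(w_t) - l(w_{t+1})| < epsilon
    if abs(prev_loss - loss) < eps:
        break
    prev_loss = loss

    # Back-propagation via the chain rule to get the gradient g_t
    d_pred = 2.0 * (pred - Y) / len(X)
    dW2, db2 = H.T @ d_pred, d_pred.sum(axis=0)
    dH = d_pred @ W2.T
    dZ1 = dH * H * (1.0 - H)    # derivative of the sigmoid hidden layer
    dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)

    # Gradient-descent update: w_{t+1} = w_t - alpha * g_t
    W1 -= alpha * dW1; b1 -= alpha * db1
    W2 -= alpha * dW2; b2 -= alpha * db2

# ③ The weights at exit are the (approximately) optimal w* we keep
print(step, float(loss))
```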

 

(4). Summary

      Training a neural network follows the same idea as single-layer gradient descent:

          First initialize a set of weights \large w_{0}

          Forward-propagate with the current \large w_{t} to obtain \large l(w_{t})

          Obtain the gradient \large g as gradient descent requires

          Update \large w via \large w = w - \alpha g

          Iterate repeatedly until \large |l(w_{t}) - l(w_{t+1})| < \varepsilon

          Output the optimal set of weights \large w^{*}

      The only difference is that the derivative relationships in the single-layer case are relatively simple, so the gradient \large g is easy to obtain, whereas the derivative relationships in a neural network are more complex and depend more heavily on the chain rule. Training a neural network in one sentence: forward pass for the loss, backward pass for the gradient, iterate back and forth, and obtain the optimal solution \large w^{*}.

 

At this point you have grasped the overall framework of ANNs and DNNs and can start training. However, although training will run, it may still be hard to obtain a good \large w^{*}. To obtain a better \large w^{*}, we still need to tune the neural network. That is what I will analyze and explain next; if you need it or are interested, please continue to the next section.

 

 

 


 

 

Back to the main table of contents

Back to the neural network table of contents

Previous: Depth Series - Neural Networks (2): ANN and DNN and Common Activation Functions

Next: Depth Series - Neural Networks (4): Tuning the Neural Network in Detail



Origin blog.csdn.net/qq_38299170/article/details/104118921