Yongxing TensorFlow Notes 8: Backpropagation (BP) and Optimization Functions Explained


I. What is backpropagation?

Backpropagation (abbreviated BP) is short for "error backpropagation". Used in combination with an optimization method (such as gradient descent), it is a common method for training artificial neural networks. The method computes the gradient of the loss function with respect to all the weights in the network, and this gradient is fed back to the optimization method, which uses it to update the weights so as to minimize the loss function. Gradient descent is usually the main algorithm executed on the neural network. The algorithm first computes (and caches) the output of each node in a forward pass, and then traverses the graph backward to compute the partial derivative of the loss function with respect to each parameter.

II. What does backpropagation do?

This method computes the gradient of the loss function with respect to all the weights in the network, and feeds that gradient back to the optimization method, which uses it to update the weights so as to minimize the loss function.
Backpropagation requires a known, desired output for each input in order to compute the gradient of the loss function, so it is usually considered a supervised learning method, although it is also used in some unsupervised networks (such as autoencoders). It is a generalization of the delta rule to multilayer feedforward networks, and it uses the chain rule to compute the gradient in each iteration. Backpropagation requires the activation function of the artificial neurons (or "nodes") to be differentiable.

III. How does backpropagation work?

The BP algorithm is built on gradient descent. The input-output relationship of a BP network is essentially a mapping: a BP neural network with n inputs and m outputs implements a continuous mapping from an n-dimensional Euclidean space to a finite domain in an m-dimensional Euclidean space, and this mapping is highly nonlinear. Its information-processing ability comes from repeatedly composing simple nonlinear functions, which gives it a strong ability to reproduce functions. This is the basis on which the BP algorithm can be applied.
The backpropagation algorithm repeatedly iterates over two phases (activation propagation and weight update) until the network's response to the input reaches the predetermined target range.
The learning process of the BP algorithm consists of a forward-propagation pass and a back-propagation pass. During forward propagation, the input is processed layer by layer, from the input layer through the hidden layers to the output layer. If the output layer does not produce the expected output, the sum of squared differences between the actual and expected outputs is taken as the objective function, and the algorithm switches to back propagation, computing layer by layer the partial derivatives of the objective function with respect to each neuron's weights. These partial derivatives form the gradient of the objective function with respect to the weight vector, which serves as the basis for modifying the weights; the network's learning is carried out in this weight-modification process. When the error reaches the desired value, learning stops.
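As a concrete restatement (the notation here is assumed for illustration, not taken from the original figures): with expected outputs t_k and actual outputs y_k, the objective function described above is the sum of squared errors

E = 1/2 * Σ_k (y_k - t_k)^2

and the back-propagation pass computes ∂E/∂w for every weight w via the chain rule.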
The propagation phase in each iteration consists of two steps:

  • (Forward-propagation phase) Feed the training input into the network to obtain the activation response;
  • (Back-propagation phase) Take the difference between the activation response and the target output corresponding to the training input, obtaining the response errors of the hidden and output layers.
Weight update:
For each synaptic weight, the update proceeds in the following steps (a minimal code sketch follows this list):

  • Multiply the input activation by the response error to obtain the gradient of the weight;
  • Multiply this gradient by a ratio, negate it, and add it to the weight.
  • This ratio affects both the speed and the quality of training and is therefore called the "learning rate" or "step size". The direction of the gradient indicates the direction in which the error grows, so the gradient must be negated when updating the weight, in order to reduce the error introduced by that weight.
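A minimal NumPy sketch of these steps for a single fully connected layer with a sigmoid activation (all names, shapes and values are assumptions for illustration, not taken from the original figures):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One gradient-descent update for a single layer y = sigmoid(x @ W) under a squared-error loss.
def backprop_step(W, x, target, learning_rate=0.5):
    y = sigmoid(x @ W)                 # forward propagation: activation response
    error = y - target                 # response error at the output
    delta = error * y * (1.0 - y)      # error scaled by the sigmoid derivative
    grad = np.outer(x, delta)          # input activation times response error -> weight gradient
    return W - learning_rate * grad    # negate the scaled gradient and add it to the weight

W = np.random.randn(3, 2) * 0.1        # 3 inputs, 2 outputs (illustrative sizes)
x = np.array([0.5, -1.0, 2.0])
target = np.array([0.0, 1.0])
for _ in range(100):
    W = backprop_step(W, x, target)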

IV. How to choose a good optimization function for BP?

Backpropagation training methods take reducing the loss value as the optimization objective; available optimization methods include gradient descent, the momentum optimizer, and the Adam optimizer.
These three optimization methods can be expressed with TensorFlow functions as:

  • train_step=tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
  • train_step=tf.train.MomentumOptimizer(learning_rate, momentum).minimize(loss)
  • train_step=tf.train.AdamOptimizer(learning_rate).minimize(loss)
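Whichever of the three is chosen, the resulting train_step op is then run repeatedly inside a session. A minimal sketch of such a training loop (the toy data, network and loss below are assumptions for illustration, not the article's example):

import numpy as np
import tensorflow as tf   # TensorFlow 1.x API, as used throughout this article

# Toy data: learn y = x1 + x2 (purely illustrative).
X_data = np.random.rand(32, 2).astype(np.float32)
Y_data = X_data.sum(axis=1, keepdims=True)

x = tf.placeholder(tf.float32, shape=[None, 2])
y_ = tf.placeholder(tf.float32, shape=[None, 1])
w = tf.Variable(tf.random_normal([2, 1]))
y = tf.matmul(x, w)
loss = tf.reduce_mean(tf.square(y - y_))   # mean squared error as the loss

# Any of the three optimizers listed above can be substituted here.
train_step = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1000):
        sess.run(train_step, feed_dict={x: X_data, y_: Y_data})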
1. Gradient descent (generally the most commonly used)

tf.train.GradientDescentOptimizer() uses the stochastic gradient descent algorithm: it moves the parameters in the direction opposite to the gradient, i.e. the direction in which the total loss decreases, and thereby updates the parameters.
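In place of the original figure, the standard update rule (notation assumed here) can be written as:

w := w - learning_rate * ∂loss/∂w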

2. tf.train.MomentumOptimizer() makes use of an extra hyperparameter, the momentum, when updating the parameters.

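In place of the original figure, the standard momentum update (with v the accumulated velocity) can be written as:

v := momentum * v + ∂loss/∂w
w := w - learning_rate * v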

3. tf.train.AdamOptimizer()

Not all of the parameters have to be filled in when using it; the defaults are:

tf.train.AdamOptimizer.__init__(
	learning_rate=0.001, 
	beta1=0.9, 
	beta2=0.999, 
	epsilon=1e-08, 
	use_locking=False, 
	name='Adam'
)

Adam is an optimization algorithm that searches for the global optimum and introduces a second-order gradient correction.
It is an optimization algorithm with an adaptive learning rate and differs from stochastic gradient descent. Stochastic gradient descent keeps a single learning rate for updating all parameters, and that learning rate does not change during training; Adam instead designs an independent adaptive learning rate for each parameter by computing first-order and second-order moment estimates of the gradients.
Compared with basic SGD: 1. it is less likely to get stuck in a local optimum; 2. it is faster.
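For reference, the per-parameter moment estimates Adam maintains (standard formulation with gradient g at step t; the symbols are assumed here, not from the original figure) are:

m := beta1 * m + (1 - beta1) * g          (first-moment estimate)
v := beta2 * v + (1 - beta2) * g^2        (second-moment estimate)
m_hat := m / (1 - beta1^t),   v_hat := v / (1 - beta2^t)    (bias correction)
w := w - learning_rate * m_hat / (sqrt(v_hat) + epsilon)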

4. Exponential learning-rate decay:

If the learning rate is too small, the improvement obtained from each training step is tiny, which needlessly increases the time cost. If the learning rate is too large, we may jump straight over the optimal solution and end up training indefinitely. The solution is to let the learning rate change as training progresses.
TensorFlow provides a flexible way to set the learning rate: exponential decay. Start with a relatively large learning rate to quickly reach a reasonably good solution, and then gradually reduce the learning rate as the iterations continue:

decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
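For example (numbers chosen purely for illustration): with learning_rate = 0.1, decay_rate = 0.96 and decay_steps = 100, the learning rate after 100 steps is 0.1 * 0.96 = 0.096, and after 200 steps it is 0.1 * 0.96^2 ≈ 0.092.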

The corresponding TensorFlow function:

current_learning_rate = tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=True/False)
  • current_learning_rate: the learning rate currently in use
  • learning_rate: the initial learning rate
  • decay_rate: the decay coefficient
  • decay_steps: the decay period, i.e. how many steps pass between decays
  • global_step: the number of training steps run so far
  • staircase: if set to True, global_step / decay_steps is rounded down to an integer, so the learning rate decreases in a stepwise (staircase) fashion
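A minimal sketch of wiring the decayed learning rate into an optimizer (the initial values are assumptions for illustration, and loss stands for the loss tensor already defined in the graph; passing global_step to minimize() lets TensorFlow increment it automatically):

global_step = tf.Variable(0, trainable=False)   # step counter, updated by minimize()
current_learning_rate = tf.train.exponential_decay(
    learning_rate=0.1,
    global_step=global_step,
    decay_steps=100,
    decay_rate=0.96,
    staircase=True)
train_step = tf.train.GradientDescentOptimizer(current_learning_rate).minimize(
    loss, global_step=global_step)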

[Source referenced: Principles of training multi-layer neural network using backpropagation, URL: http://www.cs.unibo.it/~roffilli/pub/Backpropagation.pdf]

