Deep learning cornerstone: understanding backpropagation in one article

https://blog.csdn.net/goldfish288/article/details/79835550

Original address: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/

 

A Step by Step Backpropagation Example

Background

Backpropagation is a common method for training neural networks. Before this, my understanding of it was not thorough enough; this article finally let me fully grasp the details of backpropagation.

Overview

For this tutorial, we will use a neural network with two inputs, two hidden neurons, and two output neurons. In addition, the hidden and output neurons each include a bias.

The basic structure is as follows:

[Figure: network structure — two inputs i_1, i_2; two hidden neurons h_1, h_2; two output neurons o_1, o_2; biases b_1, b_2]

To have some concrete numbers to work with, here are the initial weights, the biases, and the training inputs/outputs:

[Figure: initial values — w_1 = 0.15, w_2 = 0.20, w_3 = 0.25, w_4 = 0.30, w_5 = 0.40, w_6 = 0.45, w_7 = 0.50, w_8 = 0.55, b_1 = 0.35, b_2 = 0.60; inputs i_1 = 0.05, i_2 = 0.10; target outputs 0.01 and 0.99]

The goal of backpropagation is to optimize the weights so that the neural network can learn how to correctly map arbitrary inputs to outputs.

For the remainder of this tutorial, we will work with a single training sample: given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99.

Forward Pass

First, let's see what the neural network currently predicts for the inputs 0.05 and 0.10, given the weights and biases above. To do this, we feed these inputs forward through the network.

We calculate the total net input to each hidden-layer neuron, squash each total net input using an activation function (here we use the logistic function), and then repeat the process for the output-layer neurons.

Some sources refer to the total net input simply as the net input.

Here's how we calculate the total net input for h_1:

net_{h1} = w_1 * i_1 + w_2 * i_2 + b_1 * 1

net_{h1} = 0.15 * 0.05 + 0.2 * 0.1 + 0.35 * 1 = 0.3775

We then squash it using the logistic function to get the output of h_1:

out_{h1} = \frac{1}{1 + e^{-net_{h1}}} = \frac{1}{1 + e^{-0.3775}} = 0.593269992

Carrying out the same process for h_2, we get:

out_{h2} = 0.596884378

We repeat this process for the output layer neurons, using the output of the hidden layer neurons as input.

Here's the output for o_1:

net_{o1} = w_5 * out_{h1} + w_6 * out_{h2} + b_2 * 1

net_{o1} = 0.4 * 0.593269992 + 0.45 * 0.596884378 + 0.6 * 1 = 1.105905967

out_{o1} = \frac{1}{1 + e^{-net_{o1}}} = \frac{1}{1 + e^{-1.105905967}} = 0.75136507

And carrying out the same process for o_2, we get:

out_{o2} = 0.772928465
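To check these numbers, here is a minimal Python sketch of the forward pass. The variable names are mine, and the values of w_3, w_4, w_7, and w_8 are taken from the initial-values figure above:

```python
import math

def sigmoid(x):
    # Logistic function used as the squashing/activation function in this example
    return 1.0 / (1.0 + math.exp(-x))

# Inputs, weights, and biases from the worked example above
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60

# Hidden layer
net_h1 = w1 * i1 + w2 * i2 + b1 * 1           # 0.3775
out_h1 = sigmoid(net_h1)                      # 0.593269992
net_h2 = w3 * i1 + w4 * i2 + b1 * 1
out_h2 = sigmoid(net_h2)                      # 0.596884378

# Output layer, fed by the hidden-layer outputs
net_o1 = w5 * out_h1 + w6 * out_h2 + b2 * 1   # 1.105905967
out_o1 = sigmoid(net_o1)                      # 0.75136507
net_o2 = w7 * out_h1 + w8 * out_h2 + b2 * 1
out_o2 = sigmoid(net_o2)                      # 0.772928465

print(out_h1, out_h2, out_o1, out_o2)
```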

Calculating the Total Error

We can now calculate the error for each output neuron using the squared error function and sum them to get the total error:

E_{total} = \sum \frac{1}{2}(target - output)^{2}

Some sources refer to the target as the ideal and to the output as the actual.
The \frac{1}{2} is included so that the exponent is cancelled when we differentiate later on. The result is eventually multiplied by a learning rate anyway, so it doesn't matter that we introduce a constant here [1].

For example, the target output for o_1 is 0.01 but the neural network outputs 0.75136507, so its error is:

E_{o1} = \frac{1}{2}(target_{o1} - out_{o1})^{2} = \frac{1}{2}(0.01 - 0.75136507)^{2} = 0.274811083

Repeating this process for o_2 (remembering that the target is 0.99), we get:

E_{o2} = 0.023560026

The total error for the neural network is the sum of these errors:

E_{total} = E_{o1} + E_{o2} = 0.274811083 + 0.023560026 = 0.298371109
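The error calculation is equally direct. A short sketch, plugging in the forward-pass outputs computed above:

```python
# Outputs from the forward pass above and the target values
out_o1, out_o2 = 0.75136507, 0.772928465
target_o1, target_o2 = 0.01, 0.99

E_o1 = 0.5 * (target_o1 - out_o1) ** 2   # 0.274811083
E_o2 = 0.5 * (target_o2 - out_o2) ** 2   # 0.023560026
E_total = E_o1 + E_o2                    # 0.298371109
print(E_o1, E_o2, E_total)
```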

The Backwards Pass

Our goal with backpropagation is to update each of the weights in the network so that they cause the actual output to be closer to the target output, thereby minimizing the error for each output neuron and for the network as a whole.

Output Layer

Consider w_5. We want to know how much a change in w_5 affects the total error, that is, \frac{\partial E_{total}}{\partial w_{5}}.

\frac{\partial E_{total}}{\partial w_{5}} is read as "the partial derivative of E_{total} with respect to w_{5}". You can also say "the gradient with respect to w_{5}".

By applying the chain rule, we know that:

\frac{\partial E_{total}}{\partial w_{5}} = \frac{\partial E_{total}}{\partial out_{o1}} * \frac{\partial out_{o1}}{\partial net_{o1}} * \frac{\partial net_{o1}}{\partial w_{5}}

Visually, here's what we're doing:

[Figure: the chain-rule path from E_{total} backward through out_{o1} and net_{o1} to w_5]

We need to figure out each piece in this equation.

First, how much does the total error change with respect to the output?

E_{total} = \frac{1}{2}(target_{o1} - out_{o1})^{2} + \frac{1}{2}(target_{o2} - out_{o2})^{2}

\frac{\partial E_{total}}{\partial out_{o1}} = 2 * \frac{1}{2}(target_{o1} - out_{o1})^{2 - 1} * -1 + 0

\frac{\partial E_{total}}{\partial out_{o1}} = -(target_{o1} - out_{o1}) = -(0.01 - 0.75136507) = 0.74136507

-(target - out) is sometimes expressed as (out - target).
When we take the partial derivative of the total error with respect to out_{o1}, the quantity \frac{1}{2}(target_{o2} - out_{o2})^{2} becomes zero because out_{o1} does not affect it, which means we are taking the derivative of a constant, which is zero.

Next, how much does the output of o_1 change with respect to its total net input?

The partial derivative of the logistic function is the output multiplied by 1 minus the output:

out_{o1} = \frac{1}{1 + e^{-net_{o1}}}

\frac{\partial out_{o1}}{\partial net_{o1}} = out_{o1}(1 - out_{o1}) = 0.75136507(1 - 0.75136507) = 0.186815602

Finally, how much does the total net input of o_1 change with respect to w_5?

net_{o1} = w_5 * out_{h1} + w_6 * out_{h2} + b_2 * 1

\frac{\partial net_{o1}}{\partial w_{5}} = 1 * out_{h1} * w_5^{(1 - 1)} + 0 + 0 = out_{h1} = 0.593269992

Putting it all together:

\frac{\partial E_{total}}{\partial w_{5}} = \frac{\partial E_{total}}{\partial out_{o1}} * \frac{\partial out_{o1}}{\partial net_{o1}} * \frac{\partial net_{o1}}{\partial w_{5}}

\frac{\partial E_{total}}{\partial w_{5}} = 0.74136507 * 0.186815602 * 0.593269992 = 0.082167041
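Here is the same chain-rule product as a small Python check; nothing here goes beyond the three partial derivatives just derived, and the variable names are mine:

```python
# Values carried over from the forward pass, plus the target for o1
target_o1 = 0.01
out_o1 = 0.75136507
out_h1 = 0.593269992

dE_dout_o1 = -(target_o1 - out_o1)        # 0.74136507
dout_o1_dnet_o1 = out_o1 * (1 - out_o1)   # 0.186815602
dnet_o1_dw5 = out_h1                      # 0.593269992

dE_dw5 = dE_dout_o1 * dout_o1_dnet_o1 * dnet_o1_dw5
print(dE_dw5)                             # ~0.082167041
```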

You'll often see this calculation combined in the form of the delta rule:

\frac{\partial E_{total}}{\partial w_{5}} = -(target_{o1} - out_{o1}) * out_{o1}(1 - out_{o1}) * out_{h1}

Alternatively, we have \frac{\partial E_{total}}{\partial out_{o1}} and \frac{\partial out_{o1}}{\partial net_{o1}}, which can be written as \frac{\partial E_{total}}{\partial net_{o1}}, aka \delta_{o1} (the Greek letter delta), aka the node delta. We can use this to rewrite the calculation above:

\delta_{o1} = \frac{\partial E_{total}}{\partial out_{o1}} * \frac{\partial out_{o1}}{\partial net_{o1}} = \frac{\partial E_{total}}{\partial net_{o1}}

\delta_{o1} = -(target_{o1} - out_{o1}) * out_{o1}(1 - out_{o1})

Therefore:

\frac{\partial E_{total}}{\partial w_{5}} = \delta_{o1} out_{h1}

Some sources extract the negative sign from \delta, in which case it would be written as:

\frac{\partial E_{total}}{\partial w_{5}} = -\delta_{o1} out_{h1}

To decrease the error, we subtract this value from the current weight (optionally multiplied by some learning rate, eta, which we'll set to 0.5):

w_5^{+} = w_5 - \eta * \frac{\partial E_{total}}{\partial w_{5}} = 0.4 - 0.5 * 0.082167041 = 0.35891648

Some sources use \alpha (alpha) to represent the learning rate, others use \eta (eta), and others use \epsilon (epsilon).

We can repeat this process to get the new weights w_6, w_7, and w_8:

w_6^{+} = 0.408666186

w_7^{+} = 0.511301270

w_8^{+} = 0.561370121
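As a sketch, all four output-layer updates follow from the delta-rule form above; the initial values of w_7 and w_8 (0.50 and 0.55) are taken from the initial-values figure:

```python
eta = 0.5  # learning rate

# Forward-pass values, targets, and the current output-layer weights
out_h1, out_h2 = 0.593269992, 0.596884378
out_o1, out_o2 = 0.75136507, 0.772928465
target_o1, target_o2 = 0.01, 0.99
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55

# Node deltas for the two output neurons
delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)

# Each gradient is the node delta times the activation feeding that weight
w5_new = w5 - eta * delta_o1 * out_h1   # 0.35891648
w6_new = w6 - eta * delta_o1 * out_h2   # 0.408666186
w7_new = w7 - eta * delta_o2 * out_h1   # 0.511301270
w8_new = w8 - eta * delta_o2 * out_h2   # 0.561370121
print(w5_new, w6_new, w7_new, w8_new)
```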

We perform the actual updates in the neural network after we have the new weights leading into the hidden-layer neurons (that is, we use the original weights, not the updated weights, when we continue the backpropagation algorithm below).

Hidden Layer

Next, we'll continue the backwards pass by calculating new values for w_1, w_2, w_3, and w_4.

Big picture, here's what we need to figure out:

\frac{\partial E_{total}}{\partial w_{1}} = \frac{\partial E_{total}}{\partial out_{h1}} * \frac{\partial out_{h1}}{\partial net_{h1}} * \frac{\partial net_{h1}}{\partial w_{1}}

Visually:

[Figure: backpropagation through the hidden layer — the errors from both output neurons flow back through out_{h1} to w_1]

We're going to use a similar process as we did for the output layer, but slightly different to account for the fact that the output of each hidden-layer neuron contributes to the output (and therefore the error) of multiple output neurons. We know that out_{h1} affects both out_{o1} and out_{o2}, so \frac{\partial E_{total}}{\partial out_{h1}} needs to take into consideration its effect on both output neurons:

\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}}

Starting with \frac{\partial E_{o1}}{\partial out_{h1}}:

\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial net_{o1}} * \frac{\partial net_{o1}}{\partial out_{h1}}

We can calculate \frac{\partial E_{o1}}{\partial net_{o1}} using values we calculated earlier:

\frac{\partial E_{o1}}{\partial net_{o1}} = \frac{\partial E_{o1}}{\partial out_{o1}} * \frac{\partial out_{o1}}{\partial net_{o1}} = 0.74136507 * 0.186815602 = 0.138498562

And \frac{\partial net_{o1}}{\partial out_{h1}} is equal to w_5:

net_{o1} = w_5 * out_{h1} + w_6 * out_{h2} + b_2 * 1

\frac{\partial net_{o1}}{\partial out_{h1}} = w_5 = 0.40

Plugging them in:

\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial net_{o1}} * \frac{\partial net_{o1}}{\partial out_{h1}} = 0.138498562 * 0.40 = 0.055399425

Following the same process for \frac{\partial E_{o2}}{\partial out_{h1}}, we get:

\frac{\partial E_{o2}}{\partial out_{h1}} = -0.019049119

Therefore:

\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}} = 0.055399425 + -0.019049119 = 0.036350306
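In code, the sum over both output neurons looks like this minimal sketch; delta_o2 is computed the same way as delta_o1, and w_7 = 0.50 is taken from the initial-values figure:

```python
# Node deltas for both output neurons, then the sum weighted by w5 and w7
out_o1, out_o2 = 0.75136507, 0.772928465
target_o1, target_o2 = 0.01, 0.99
w5, w7 = 0.40, 0.50

delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)   # dE_o1/dnet_o1
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)   # dE_o2/dnet_o2

dEtotal_douth1 = delta_o1 * w5 + delta_o2 * w7
print(dEtotal_douth1)                                      # ~0.036350306
```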

Now that we have \frac{\partial E_{total}}{\partial out_{h1}}, we need to figure out \frac{\partial out_{h1}}{\partial net_{h1}} and then \frac{\partial net_{h1}}{\partial w} for each weight:

out_{h1} = \frac{1}{1 + e^{-net_{h1}}}

\frac{\partial out_{h1}}{\partial net_{h1}} = out_{h1}(1 - out_{h1}) = 0.59326999(1 - 0.59326999) = 0.241300709

We calculate the partial derivative of the total net input to h_1 with respect to w_1 the same way as we did for the output neurons:

net_{h1} = w_1 * i_1 + w_2 * i_2 + b_1 * 1

\frac{\partial net_{h1}}{\partial w_1} = i_1 = 0.05

Putting it all together:

\frac{\partial E_{total}}{\partial w_{1}} = \frac{\partial E_{total}}{\partial out_{h1}} * \frac{\partial out_{h1}}{\partial net_{h1}} * \frac{\partial net_{h1}}{\partial w_{1}}

\frac{\partial E_{total}}{\partial w_{1}} = 0.036350306 * 0.241300709 * 0.05 = 0.000438568

You might also see this written as:

\frac{\partial E_{total}}{\partial w_{1}} = (\sum\limits_{o}{\frac{\partial E_{total}}{\partial out_{o}} * \frac{\partial out_{o}}{\partial net_{o}} * \frac{\partial net_{o}}{\partial out_{h1}}}) * \frac{\partial out_{h1}}{\partial net_{h1}} * \frac{\partial net_{h1}}{\partial w_{1}}

\frac{\partial E_{total}}{\partial w_{1}} = (\sum\limits_{o}{\delta_{o} * w_{ho}}) * out_{h1}(1 - out_{h1}) * i_{1}

\frac{\partial E_{total}}{\partial w_{1}} = \delta_{h1} i_{1}

We can now update w_1:

w_1^{+} = w_1 - \eta * \frac{\partial E_{total}}{\partial w_{1}} = 0.15 - 0.5 * 0.000438568 = 0.149780716

Repeating this for w_2, w_3, and w_4:

w_2^{+} = 0.19956143

w_3^{+} = 0.24975114

w_4^{+} = 0.29950229
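The hidden-layer updates can be sketched the same way. The article does not spell out the gradients for w_3 and w_4, so this sketch derives them from the second hidden neuron's delta under the same rule; the initial values w_3 = 0.25, w_4 = 0.30, w_7 = 0.50, and w_8 = 0.55 come from the initial-values figure:

```python
eta = 0.5
i1, i2 = 0.05, 0.10
out_h1, out_h2 = 0.593269992, 0.596884378
out_o1, out_o2 = 0.75136507, 0.772928465
target_o1, target_o2 = 0.01, 0.99
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55

# Output-layer node deltas (same as in the output-layer section)
delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)

# Hidden-layer node deltas: downstream deltas weighted by the connecting
# weights, times the local sigmoid derivative
delta_h1 = (delta_o1 * w5 + delta_o2 * w7) * out_h1 * (1 - out_h1)
delta_h2 = (delta_o1 * w6 + delta_o2 * w8) * out_h2 * (1 - out_h2)

w1_new = w1 - eta * delta_h1 * i1   # 0.149780716
w2_new = w2 - eta * delta_h1 * i2   # 0.19956143
w3_new = w3 - eta * delta_h2 * i1   # 0.24975114
w4_new = w4 - eta * delta_h2 * i2   # 0.29950229
print(w1_new, w2_new, w3_new, w4_new)
```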

Finally, we've updated all the weights! When we originally fed forward the inputs 0.05 and 0.1, the error of the network was 0.298371109. After this first round of backpropagation, the total error is now down to 0.291027924. It might not seem like much, but after repeating this process 10,000 times the error plummets to 0.0000351085. At that point, when we feed forward 0.05 and 0.1, the two output neurons produce 0.015912196 (vs. the target 0.01) and 0.984065734 (vs. the target 0.99).
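Putting the pieces together, here is a compact sketch of the whole procedure repeated many times; it is my own consolidation rather than code from the original post. Following the article, only the eight weights are updated (the biases stay fixed), and the error printed after 10,000 iterations should land near the figure quoted above:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Single training sample and initial parameters from the example
i1, i2 = 0.05, 0.10
t1, t2 = 0.01, 0.99
w = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]  # w1..w8
b1, b2 = 0.35, 0.60
eta = 0.5

for _ in range(10000):
    # Forward pass
    out_h1 = sigmoid(w[0] * i1 + w[1] * i2 + b1)
    out_h2 = sigmoid(w[2] * i1 + w[3] * i2 + b1)
    out_o1 = sigmoid(w[4] * out_h1 + w[5] * out_h2 + b2)
    out_o2 = sigmoid(w[6] * out_h1 + w[7] * out_h2 + b2)

    # Backward pass: node deltas
    d_o1 = -(t1 - out_o1) * out_o1 * (1 - out_o1)
    d_o2 = -(t2 - out_o2) * out_o2 * (1 - out_o2)
    d_h1 = (d_o1 * w[4] + d_o2 * w[6]) * out_h1 * (1 - out_h1)
    d_h2 = (d_o1 * w[5] + d_o2 * w[7]) * out_h2 * (1 - out_h2)

    # Gradient of each weight is its node delta times the incoming activation;
    # all weights are updated simultaneously from the original values
    grads = [d_h1 * i1, d_h1 * i2, d_h2 * i1, d_h2 * i2,
             d_o1 * out_h1, d_o1 * out_h2, d_o2 * out_h1, d_o2 * out_h2]
    w = [wi - eta * g for wi, g in zip(w, grads)]

# Final forward pass and total error with the trained weights
out_h1 = sigmoid(w[0] * i1 + w[1] * i2 + b1)
out_h2 = sigmoid(w[2] * i1 + w[3] * i2 + b1)
out_o1 = sigmoid(w[4] * out_h1 + w[5] * out_h2 + b2)
out_o2 = sigmoid(w[6] * out_h1 + w[7] * out_h2 + b2)
E_total = 0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2
print(E_total, out_o1, out_o2)
```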

If you've made it this far and found any errors in the above, or can think of any ways to make it clearer for future readers, don't hesitate to drop me a note. Thanks!


Origin www.cnblogs.com/jukan/p/10975263.html