[Deep Learning] Backpropagation Algorithm

Chain Rule


Case 1

Suppose we have:

$$y = g(x) \qquad z = h(y)$$

Then the chain of variable dependencies is:

$$\Delta x \rightarrow \Delta y \rightarrow \Delta z$$

Therefore:

$$\frac{dz}{dx} = \frac{dz}{dy}\frac{dy}{dx}$$
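To make this concrete, here is a minimal sketch in Python (the functions g and h below are my own arbitrary choices, not from the original example) that compares the chain-rule product against a finite-difference estimate:

```python
import math

# Arbitrary example functions (my own choice)
g = lambda x: x ** 2          # y = g(x)
h = lambda y: math.sin(y)     # z = h(y)

x = 1.3
y = g(x)

# Chain rule: dz/dx = (dz/dy)(dy/dx) = cos(y) * 2x
analytic = math.cos(y) * 2 * x

# Central finite-difference estimate of dz/dx
eps = 1e-6
numeric = (h(g(x + eps)) - h(g(x - eps))) / (2 * eps)

print(analytic, numeric)  # the two values should agree closely
```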

Case 2

Suppose we have:

$$x = g(s) \qquad y = h(s) \qquad z = k(x, y)$$

Then the chain of variable dependencies has two paths, $\Delta s \rightarrow \Delta x \rightarrow \Delta z$ and $\Delta s \rightarrow \Delta y \rightarrow \Delta z$.

Therefore:

$$\frac{dz}{ds} = \frac{\partial z}{\partial x}\frac{dx}{ds} + \frac{\partial z}{\partial y}\frac{dy}{ds}$$
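The same kind of numeric check works for Case 2 (again a sketch, with made-up g, h and k):

```python
import math

# Arbitrary example functions (my own choice): x = g(s), y = h(s), z = k(x, y)
g = lambda s: s ** 2          # x = s^2    -> dx/ds = 2s
h = lambda s: math.sin(s)     # y = sin(s) -> dy/ds = cos(s)
k = lambda x, y: x * y        # z = x*y    -> dz/dx = y, dz/dy = x

s = 0.7
x, y = g(s), h(s)

# Total derivative: dz/ds = (dz/dx)(dx/ds) + (dz/dy)(dy/ds)
analytic = y * (2 * s) + x * math.cos(s)

# Central finite-difference estimate of dz/ds
eps = 1e-6
numeric = (k(g(s + eps), h(s + eps)) - k(g(s - eps), h(s - eps))) / (2 * eps)

print(analytic, numeric)  # should agree closely
```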

Backpropagation: A Worked Example


Definition

Backpropagation (abbreviated BP), short for "backward propagation of errors", is a common method for training artificial neural networks, used together with an optimization method such as gradient descent. The method computes the gradient of the loss function with respect to all the weights in the network. This gradient is then fed to the optimization method, which uses it to update the weights so as to minimize the loss function. — Wikipedia

Explanation

Suppose we have $N$ training samples. The total loss function can then be written as:

$$L(\theta) = \sum_{n=1}^N l^n(\theta)$$

where $\theta$ denotes the parameters to be learned.

Taking the partial derivative of $L$ with respect to a weight $\omega$ then amounts to taking the partial derivative of each per-sample loss $l^n(\theta)$ and summing the results:

$$\frac{\partial L(\theta)}{\partial \omega} = \sum_{n=1}^N\frac{\partial l^n(\theta)}{\partial \omega}$$
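As a quick illustration (a toy one-weight model with a squared-error loss that I made up, not the network discussed below), the gradient of the total loss really is just the sum of the per-sample gradients:

```python
import numpy as np

# Toy data: N samples, prediction w*x_n, per-sample loss l^n = (w*x_n - t_n)^2
x = np.array([0.5, 1.0, 2.0])
t = np.array([1.0, 0.5, 3.0])
w = 0.8

# Per-sample gradients: dl^n/dw = 2 * (w*x_n - t_n) * x_n
per_sample_grads = 2 * (w * x - t) * x

# The gradient of the total loss L = sum_n l^n is the sum of the per-sample gradients
grad_L = per_sample_grads.sum()

# Finite-difference check against L(w) directly
L = lambda w: ((w * x - t) ** 2).sum()
eps = 1e-6
print(grad_L, (L(w + eps) - L(w - eps)) / (2 * eps))  # should agree closely
```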

Expressed algebraically, the example network (two inputs, two hidden layers of two neurons each, and two outputs) is:

$$z_1 = \omega_{11}x_1 + \omega_{12}x_2 + b_1 \qquad a_1 = \sigma(z_1)$$

$$z_2 = \omega_{21}x_1 + \omega_{22}x_2 + b_2 \qquad a_2 = \sigma(z_2)$$

$$z_3 = \omega_{31}a_1 + \omega_{32}a_2 + b_3 \qquad a_3 = \sigma(z_3)$$

$$z_4 = \omega_{41}a_1 + \omega_{42}a_2 + b_4 \qquad a_4 = \sigma(z_4)$$

$$z_5 = \omega_{51}a_3 + \omega_{52}a_4 + b_5 \qquad y_1 = \sigma(z_5)$$

$$z_6 = \omega_{61}a_3 + \omega_{62}a_4 + b_6 \qquad y_2 = \sigma(z_6)$$
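Written as code, the forward computation of this small network looks like the sketch below (the inputs, weights and biases are made-up numbers, since the concrete values from the post's figure are not reproduced here):

```python
import math

def sigma(z):
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid activation

# Made-up inputs, weights and biases for illustration
x1, x2 = 1.0, -1.0
w11, w12, w21, w22 = 0.5, -0.3, 0.8, 0.1     # first hidden layer
w31, w32, w41, w42 = -0.6, 0.4, 0.2, 0.7     # second hidden layer
w51, w52, w61, w62 = 0.9, -0.5, 0.3, 0.6     # output layer
b1, b2, b3, b4, b5, b6 = 0.1, -0.2, 0.05, 0.0, 0.2, -0.1

# Forward computation, mirroring the equations above line by line
z1 = w11 * x1 + w12 * x2 + b1;  a1 = sigma(z1)
z2 = w21 * x1 + w22 * x2 + b2;  a2 = sigma(z2)
z3 = w31 * a1 + w32 * a2 + b3;  a3 = sigma(z3)
z4 = w41 * a1 + w42 * a2 + b4;  a4 = sigma(z4)
z5 = w51 * a3 + w52 * a4 + b5;  y1 = sigma(z5)
z6 = w61 * a3 + w62 * a4 + b6;  y2 = sigma(z6)

print(y1, y2)
```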

What we actually need to compute, for each weight, is:

$$\frac{\partial l}{\partial \omega} = \frac{\partial z}{\partial \omega}\frac{\partial l}{\partial z}$$

That is, we compute the two factors $\frac{\partial z}{\partial \omega}$ and $\frac{\partial l}{\partial z}$ separately:

Step 1: Forward Pass

This step computes every $\frac{\partial z_i}{\partial \omega_{i1}}$ and $\frac{\partial z_i}{\partial \omega_{i2}}$ in the neural network, namely:

$$\frac{\partial z_1}{\partial \omega_{11}} = x_1 \qquad \frac{\partial z_1}{\partial \omega_{12}} = x_2$$

$$\frac{\partial z_2}{\partial \omega_{21}} = x_1 \qquad \frac{\partial z_2}{\partial \omega_{22}} = x_2$$

$$\frac{\partial z_3}{\partial \omega_{31}} = a_1 \qquad \frac{\partial z_3}{\partial \omega_{32}} = a_2$$

$$\frac{\partial z_4}{\partial \omega_{41}} = a_1 \qquad \frac{\partial z_4}{\partial \omega_{42}} = a_2$$

$$\frac{\partial z_5}{\partial \omega_{51}} = a_3 \qquad \frac{\partial z_5}{\partial \omega_{52}} = a_4$$

$$\frac{\partial z_6}{\partial \omega_{61}} = a_3 \qquad \frac{\partial z_6}{\partial \omega_{62}} = a_4$$

Plugging in concrete numbers, this is what the figure below shows:

Because this computation must start from the inputs $x_1$ and $x_2$ and move toward the outputs (otherwise the later activations $a_1$, $a_2$, $a_3$ and $a_4$ cannot be obtained), this step is called the Forward Pass.
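As a tiny sanity check of one of these terms, here is a finite-difference estimate of $\frac{\partial z_1}{\partial \omega_{11}}$ with made-up numbers; it comes out equal to $x_1$, exactly the value the Forward Pass already has on hand:

```python
# Sketch only: check that dz1/dw11 equals x1 (made-up numbers)
x1, x2 = 1.0, -1.0
w11, w12, b1 = 0.5, -0.3, 0.1

z1 = lambda w11: w11 * x1 + w12 * x2 + b1

eps = 1e-6
numeric = (z1(w11 + eps) - z1(w11 - eps)) / (2 * eps)
print(numeric, x1)   # should agree up to floating-point error
```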

Step 2: Backward Pass

This step computes the terms $\frac{\partial l}{\partial z}$. If we tried to compute them in the same input-to-output order as Step 1, we would proceed like this:

$$\frac{\partial l}{\partial z_1} = \frac{\partial a_1}{\partial z_1}\frac{\partial l}{\partial a_1}$$

where

$$\frac{\partial a_1}{\partial z_1} = \sigma'(z_1)$$

$$\frac{\partial l}{\partial a_1} = \frac{\partial z_3}{\partial a_1}\frac{\partial l}{\partial z_3}+\frac{\partial z_4}{\partial a_1}\frac{\partial l}{\partial z_4} = \omega_{31}\frac{\partial l}{\partial z_3} + \omega_{41}\frac{\partial l}{\partial z_4}$$

That is:

$$\frac{\partial l}{\partial z_1} = \sigma'(z_1)\left(\omega_{31}\frac{\partial l}{\partial z_3} + \omega_{41}\frac{\partial l}{\partial z_4}\right)$$

Similarly:

$$\frac{\partial l}{\partial z_2} = \sigma'(z_2)\left(\omega_{32}\frac{\partial l}{\partial z_3} + \omega_{42}\frac{\partial l}{\partial z_4}\right)$$

So to compute $\frac{\partial l}{\partial z_1}$ and $\frac{\partial l}{\partial z_2}$ we first need $\frac{\partial l}{\partial z_3}$ and $\frac{\partial l}{\partial z_4}$, and, as you can imagine, computing $\frac{\partial l}{\partial z_3}$ and $\frac{\partial l}{\partial z_4}$ in turn requires $\frac{\partial l}{\partial z_5}$ and $\frac{\partial l}{\partial z_6}$.

That's right: this is a recursive process! And this is only a small example; in a deeper network, computing every $\frac{\partial l}{\partial z}$ from the input side like this would redo the same downstream work over and over, which quickly gets expensive. So we should not compute $\frac{\partial l}{\partial z}$ with a forward-style pass.

(Here comes the key point!)

Now, if you look closely at the expression $\frac{\partial l}{\partial z_1} = \sigma'(z_1)\left(\omega_{31}\frac{\partial l}{\partial z_3} + \omega_{41}\frac{\partial l}{\partial z_4}\right)$, you will notice that it has exactly the shape of a neuron: $\frac{\partial l}{\partial z_3}$ and $\frac{\partial l}{\partial z_4}$ act as the inputs, $\omega_{31}$ and $\omega_{41}$ act as the weights, and $\sigma'(z_1)$ acts as an amplifier that scales the sum $\omega_{31}\frac{\partial l}{\partial z_3} + \omega_{41}\frac{\partial l}{\partial z_4}$, as shown in the figure below:

The whole computation of the $\frac{\partial l}{\partial z}$ terms can therefore be drawn as the reversed network below:

This is the Backward Pass: by starting from the output layer and working backwards, the recursion described above never appears!
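Here is a minimal sketch of the Backward Pass in code. The inputs, weights and biases are the same made-up numbers as before, and the squared-error loss $l = \frac{1}{2}\left((y_1 - t_1)^2 + (y_2 - t_2)^2\right)$ with targets $t_1, t_2$ is my own assumption, since the post does not fix a particular loss:

```python
import math

def sigma(z):
    return 1.0 / (1.0 + math.exp(-z))

def dsigma(z):
    s = sigma(z)
    return s * (1.0 - s)    # derivative of the sigmoid

# Made-up inputs/weights/biases; targets t1, t2 and the squared-error loss are assumptions
x1, x2, t1, t2 = 1.0, -1.0, 1.0, 0.0
w11, w12, w21, w22 = 0.5, -0.3, 0.8, 0.1
w31, w32, w41, w42 = -0.6, 0.4, 0.2, 0.7
w51, w52, w61, w62 = 0.9, -0.5, 0.3, 0.6
b1, b2, b3, b4, b5, b6 = 0.1, -0.2, 0.05, 0.0, 0.2, -0.1

# Forward Pass: compute and keep every z and a
z1 = w11 * x1 + w12 * x2 + b1;  a1 = sigma(z1)
z2 = w21 * x1 + w22 * x2 + b2;  a2 = sigma(z2)
z3 = w31 * a1 + w32 * a2 + b3;  a3 = sigma(z3)
z4 = w41 * a1 + w42 * a2 + b4;  a4 = sigma(z4)
z5 = w51 * a3 + w52 * a4 + b5;  y1 = sigma(z5)
z6 = w61 * a3 + w62 * a4 + b6;  y2 = sigma(z6)

# Backward Pass: at the output layer dl/dz is known directly
# (for the assumed loss, dl/dy1 = y1 - t1 and dl/dy2 = y2 - t2)
dl_dz5 = dsigma(z5) * (y1 - t1)
dl_dz6 = dsigma(z6) * (y2 - t2)

# ...then the same "neuron-like" formula is applied layer by layer, backwards
dl_dz3 = dsigma(z3) * (w51 * dl_dz5 + w61 * dl_dz6)
dl_dz4 = dsigma(z4) * (w52 * dl_dz5 + w62 * dl_dz6)
dl_dz1 = dsigma(z1) * (w31 * dl_dz3 + w41 * dl_dz4)
dl_dz2 = dsigma(z2) * (w32 * dl_dz3 + w42 * dl_dz4)

print(dl_dz1, dl_dz2, dl_dz3, dl_dz4, dl_dz5, dl_dz6)
```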

Summary

With $\frac{\partial z}{\partial \omega}$ obtained from the Forward Pass and $\frac{\partial l}{\partial z}$ obtained from the Backward Pass, we simply multiply the two to get $\frac{\partial l}{\partial \omega}$.
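Putting everything together for one weight, $\frac{\partial l}{\partial \omega_{11}} = \frac{\partial z_1}{\partial \omega_{11}}\frac{\partial l}{\partial z_1} = x_1\frac{\partial l}{\partial z_1}$. The sketch below (same made-up numbers and assumed squared-error loss as above) computes this product and checks it against a finite-difference estimate:

```python
import math

def sigma(z):
    return 1.0 / (1.0 + math.exp(-z))

def dsigma(z):
    s = sigma(z)
    return s * (1.0 - s)

# Same made-up numbers and assumed squared-error loss as in the earlier sketches
x1, x2, t1, t2 = 1.0, -1.0, 1.0, 0.0
w11, w12, w21, w22 = 0.5, -0.3, 0.8, 0.1
w31, w32, w41, w42 = -0.6, 0.4, 0.2, 0.7
w51, w52, w61, w62 = 0.9, -0.5, 0.3, 0.6
b1, b2, b3, b4, b5, b6 = 0.1, -0.2, 0.05, 0.0, 0.2, -0.1

def forward(w11):
    """Forward Pass as a function of the one weight we differentiate."""
    z1 = w11 * x1 + w12 * x2 + b1;  a1 = sigma(z1)
    z2 = w21 * x1 + w22 * x2 + b2;  a2 = sigma(z2)
    z3 = w31 * a1 + w32 * a2 + b3;  a3 = sigma(z3)
    z4 = w41 * a1 + w42 * a2 + b4;  a4 = sigma(z4)
    z5 = w51 * a3 + w52 * a4 + b5;  y1 = sigma(z5)
    z6 = w61 * a3 + w62 * a4 + b6;  y2 = sigma(z6)
    loss = 0.5 * ((y1 - t1) ** 2 + (y2 - t2) ** 2)
    return z1, z3, z4, z5, z6, y1, y2, loss

z1, z3, z4, z5, z6, y1, y2, loss = forward(w11)

# Backward Pass for dl/dz1, then combine with the Forward Pass term dz1/dw11 = x1
dl_dz5 = dsigma(z5) * (y1 - t1)
dl_dz6 = dsigma(z6) * (y2 - t2)
dl_dz3 = dsigma(z3) * (w51 * dl_dz5 + w61 * dl_dz6)
dl_dz4 = dsigma(z4) * (w52 * dl_dz5 + w62 * dl_dz6)
dl_dz1 = dsigma(z1) * (w31 * dl_dz3 + w41 * dl_dz4)
dl_dw11 = x1 * dl_dz1                      # dl/dw11 = (dz1/dw11)(dl/dz1)

# Finite-difference check on the same weight
eps = 1e-6
numeric = (forward(w11 + eps)[-1] - forward(w11 - eps)[-1]) / (2 * eps)
print(dl_dw11, numeric)   # should agree closely
```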

And with that, the backpropagation algorithm and the derivation of its formulas are finally done! I find this way of thinking fairly easy to accept; after all, I was schooled by "大木博士" (Professor Oak), haha.



Reposted from blog.csdn.net/Oh_MyBug/article/details/104377696