Machine Learning | Andrew Ng Formula Summary (2) [Beginner's Edition] (unfinished draft)

Neural Networks Learning

Neural network model

Forward propagation:
[Figure: forward propagation]
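The figure showed the forward-propagation computation; for reference, the standard vectorized form for a 3-layer network (as used throughout the course) is:

$$a^{(1)} = x,\qquad z^{(2)} = \Theta^{(1)}a^{(1)},\qquad a^{(2)} = g(z^{(2)}),\qquad z^{(3)} = \Theta^{(2)}a^{(2)},\qquad h_\Theta(x) = a^{(3)} = g(z^{(3)})$$

where $g$ is the sigmoid function and a bias unit $a_0^{(l)} = 1$ is prepended to $a^{(1)}$ and $a^{(2)}$ before multiplying by the corresponding $\Theta$.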

cost function

notation:
$a_i^{(j)}$ = "activation" of unit $i$ in layer $j$
$\Theta^{(j)}$ = matrix of weights controlling the function mapping from layer $j$ to layer $j+1$
$h_\Theta(x^{(i)})_k = a_k^{(3)}$ is the activation (output value) of the $k$-th output unit
$y$ is one-hot encoded, e.g. $[1,0,0,\dots,0]^T$ (class 1: $y_{k=1}=1$, $y_{k=2}=\dots=y_{k=10}=0$), $[0,1,0,\dots,0]^T$ (class 2), or $[0,0,\dots,0,1]^T$ (class 10); see the short sketch after this list.
[Note: an advantage of one-hot encoding is that the Euclidean distance between any two different digits (0, 1, ..., 9) is the same.]
$y_k^{(i)}$ = the $k$-th output unit of the $i$-th training example
$L$ = the total number of layers in the network, including the input and output layers
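As a quick illustration of the one-hot encoding above, here is a minimal NumPy sketch (the helper name `to_one_hot` and the 10-class setting are my own choices for illustration):

```python
import numpy as np

def to_one_hot(labels, num_classes=10):
    """Convert integer labels (0..num_classes-1) to one-hot row vectors."""
    labels = np.asarray(labels)
    one_hot = np.zeros((labels.size, num_classes))
    one_hot[np.arange(labels.size), labels] = 1.0
    return one_hot

# e.g. label 2 becomes [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(to_one_hot([2, 0, 9]))
```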

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[y_k^{(i)}\log\left((h_\Theta(x^{(i)}))_k\right) + (1-y_k^{(i)})\log\left(1-(h_\Theta(x^{(i)}))_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{j,i}^{(l)}\right)^2$$
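A minimal NumPy sketch of this cost function, assuming the forward-propagated predictions are already available; the names `H`, `Y`, `Thetas`, and `lam` are my own, not from the course's starter code:

```python
import numpy as np

def nn_cost(H, Y, Thetas, lam):
    """Regularized cross-entropy cost J(Theta).

    H      : (m, K) predicted outputs h_Theta(x^(i))_k
    Y      : (m, K) one-hot labels y_k^(i)
    Thetas : list of weight matrices Theta^(l); column 0 holds the bias weights
    lam    : regularization strength lambda
    """
    m = Y.shape[0]
    cost = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Regularize every weight except the bias column (j = 0).
    reg = sum(np.sum(T[:, 1:] ** 2) for T in Thetas) * lam / (2 * m)
    return cost + reg
```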

backpropagation algorithm

Backpropagation:
[Figure: backpropagation]
Training set $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$, $m$ examples.
Set $\Delta_{i,j}^{(l)} = 0$ (for all $l, i, j$); these accumulators are used to compute $\frac{\partial}{\partial\Theta_{i,j}^{(l)}}J(\Theta)$ and are summed over the loop.
For training example $t = 1$ to $m$:
1. Set $a^{(1)} := x^{(t)}$
2. Perform forward propagation to compute $a^{(l)}$ for $l = 2, 3, \ldots, L$
3. Using $y^{(t)}$, compute the output-layer error $\delta^{(L)} = a^{(L)} - y^{(t)}$
4. Compute $\delta^{(L-1)}, \delta^{(L-2)}, \ldots, \delta^{(2)}$ via $\delta^{(l)} = ((\Theta^{(l)})^T\delta^{(l+1)}) \,.\!*\, a^{(l)} \,.\!*\, (1-a^{(l)})$ // since $g'(z^{(l)}) = a^{(l)} .* (1-a^{(l)})$
5. $\Delta_{i,j}^{(l)} := \Delta_{i,j}^{(l)} + a_j^{(l)}\delta_i^{(l+1)}$, or with vectorization, $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)}(a^{(l)})^T$
ENDFOR

Hence, once the loop is finished, we compute the gradient matrices $D$ from the accumulated $\Delta$:

  • $D_{i,j}^{(l)} := \frac{1}{m}\left(\Delta_{i,j}^{(l)} + \lambda\Theta_{i,j}^{(l)}\right)$, if $j \neq 0$
  • $D_{i,j}^{(l)} := \frac{1}{m}\Delta_{i,j}^{(l)}$, if $j = 0$
    Thus we get $\frac{\partial}{\partial\Theta_{i,j}^{(l)}}J(\Theta) = D_{i,j}^{(l)}$
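Putting steps 1–5 and the $D$ matrices together, here is a minimal NumPy sketch of the algorithm for a network with one hidden layer (sigmoid activations, bias weights in column 0 of each $\Theta$; the variable names `Theta1`, `Theta2`, etc. are my own, not the course's):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(Theta1, Theta2, X, Y, lam):
    """Return (D1, D2), the gradients of J(Theta) for a 3-layer network.

    X : (m, n) inputs, Y : (m, K) one-hot labels.
    """
    m = X.shape[0]
    Delta1 = np.zeros_like(Theta1)
    Delta2 = np.zeros_like(Theta2)
    for t in range(m):
        # Steps 1-2: forward propagation
        a1 = np.concatenate(([1.0], X[t]))          # add bias a_0^(1)
        z2 = Theta1 @ a1
        a2 = np.concatenate(([1.0], sigmoid(z2)))   # add bias a_0^(2)
        z3 = Theta2 @ a2
        a3 = sigmoid(z3)
        # Step 3: output-layer error
        d3 = a3 - Y[t]
        # Step 4: hidden-layer error (drop the bias component)
        d2 = (Theta2.T @ d3)[1:] * sigmoid(z2) * (1 - sigmoid(z2))
        # Step 5: accumulate
        Delta2 += np.outer(d3, a2)
        Delta1 += np.outer(d2, a1)
    # D matrices: regularize all weights except the bias column j = 0
    D1 = Delta1 / m
    D2 = Delta2 / m
    D1[:, 1:] += (lam / m) * Theta1[:, 1:]
    D2[:, 1:] += (lam / m) * Theta2[:, 1:]
    return D1, D2
```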

partial derivative derivations

Derivation of $g'(z^{(l)}) = a^{(l)} .* (1-a^{(l)})$:
Given: $g(z^{(l)}) = \frac{1}{1+e^{-z^{(l)}}} = a^{(l)}$
$g'(z^{(l)}) = (\mathrm{sigmoid}(z^{(l)}))'$
$= \left(\frac{1}{1+e^{-z^{(l)}}}\right)'$
$= -\frac{1}{(1+e^{-z^{(l)}})^2}\cdot(1+e^{-z^{(l)}})'$
$= -\frac{1}{(1+e^{-z^{(l)}})^2}\cdot(e^{-z^{(l)}})'$
$= -\frac{1}{(1+e^{-z^{(l)}})^2}\cdot e^{-z^{(l)}}\cdot(-z^{(l)})'$
$= -\frac{1}{(1+e^{-z^{(l)}})^2}\cdot e^{-z^{(l)}}\cdot(-1)$
$= \frac{(e^{-z^{(l)}}+1)-1}{(1+e^{-z^{(l)}})^2}$
$= \frac{1}{1+e^{-z^{(l)}}} - \frac{1}{(1+e^{-z^{(l)}})^2}$
$= g(z^{(l)})\left(1-g(z^{(l)})\right)$
$= a^{(l)}\left(1-a^{(l)}\right)$
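The result is easy to sanity-check numerically; the finite-difference comparison below is my own addition, not part of the course notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_gradient(z):
    a = sigmoid(z)
    return a * (1 - a)          # g'(z) = a .* (1 - a)

z = 0.7
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(sigmoid_gradient(z), numeric)   # the two values should agree closely
```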

The calculations in the backpropagation part are fairly tedious, so I drew a simplified network diagram specifically for this; its behaviour is consistent with the example in the course.
[Figure: simplified 3-layer network, 2 inputs – 3 hidden units – 2 outputs]
$a^{(1)} = x^{(i)}$ is the input layer of the example; in the figure, $x$ has two features, plus an added bias unit $a_0^{(1)}$.
$a^{(2)}$ is the hidden layer with three units: $a^{(1)}$ is multiplied by $\theta^{(1)}$ to give $z^{(2)}$, and $z^{(2)}$ is passed through the sigmoid activation to give $a^{(2)}$. The hidden layer also adds a bias unit $a_0^{(2)}$.
$a^{(3)}$ is the output layer, which also goes through an activation step. Its outputs $a_1^{(3)} = y_1$ and $a_2^{(3)} = y_2$ correspond to $(h_\theta(x^{(i)}))_{k=1}$ and $(h_\theta(x^{(i)}))_{k=2}$ in the cost formula, so $h_\theta(x^{(i)}) = [y_1^{(i)}, y_2^{(i)}]$ (here $y_1, y_2$ denote the predicted outputs).
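A minimal numeric sketch of forward propagation through the 2–3–2 network described above (the random weights are made up purely for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([0.5, -1.2])              # two input features
Theta1 = rng.normal(scale=0.1, size=(3, 3))   # layer 1 (2 + bias) -> layer 2 (3 units)
Theta2 = rng.normal(scale=0.1, size=(2, 4))   # layer 2 (3 + bias) -> layer 3 (2 units)

a1 = np.concatenate(([1.0], x))              # add bias a_0^(1)
z2 = Theta1 @ a1
a2 = np.concatenate(([1.0], sigmoid(z2)))    # add bias a_0^(2)
z3 = Theta2 @ a2
a3 = sigmoid(z3)                             # h_theta(x) = [y_1, y_2]
print(a3)
```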
Recall the neural network cost function:

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[y_k^{(i)}\log\left((h_\Theta(x^{(i)}))_k\right) + (1-y_k^{(i)})\log\left(1-(h_\Theta(x^{(i)}))_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{j,i}^{(l)}\right)^2$$

Computing the loss:
Known premises:
1) $h_\theta(x^{(i)})$ is simply the output of the output layer, so $h_\theta(x^{(i)}) = a^{(L)} = \mathrm{sigmoid}(z^{(L)})$
2) $z^{(L)} = \theta^{(L-1)}a^{(L-1)}$
3) Considering a single training example, $J(\Theta)$ drops the sum over $m$; treating the output vector as a whole (writing $y_k$ simply as $y$) and omitting the regularization term, it can be written as: $J(\theta) = -\left[y\log(a^{(L)}) + (1-y)\log(1-a^{(L)})\right]$
4) The output-layer error is defined as $\delta^{(L)} = a^{(L)} - y$
The gradient with respect to $\theta^{(L-1)}$: $\frac{\partial J(\Theta)}{\partial\theta^{(L-1)}}$; in the 3-layer network in the figure above, $\theta^{(L-1)} = \theta^{(2)}$.
The purpose of computing this gradient is the update $\theta^{(L-1)} := \theta^{(L-1)} - \alpha\,\frac{\partial J(\Theta)}{\partial\theta^{(L-1)}}$ (where $\alpha$ is the learning rate).
$\frac{\partial J(\Theta)}{\partial\theta^{(L-1)}} = \frac{\partial J(\Theta)}{\partial a^{(L)}}\cdot\frac{\partial a^{(L)}}{\partial z^{(L)}}\cdot\frac{\partial z^{(L)}}{\partial\theta^{(L-1)}}$
$\frac{\partial J(\Theta)}{\partial a^{(L)}} = \frac{a^{(L)}-y}{(1-a^{(L)})\,a^{(L)}}$
$\frac{\partial a^{(L)}}{\partial z^{(L)}} = a^{(L)}(1-a^{(L)})$
$\frac{\partial z^{(L)}}{\partial\theta^{(L-1)}} = a^{(L-1)}$

Combining these: $\frac{\partial J(\Theta)}{\partial\theta^{(L-1)}} = a^{(L-1)}(a^{(L)}-y) = a^{(L-1)}\delta^{(L)}$
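To double-check the combined result $\frac{\partial J(\Theta)}{\partial\theta^{(L-1)}} = a^{(L-1)}(a^{(L)}-y)$, a finite-difference comparison on a single weight is a quick sanity test; this sketch and its numbers are my own, not from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, a_prev, y):
    """Single-example cross-entropy J for the last layer only."""
    a_out = sigmoid(theta @ a_prev)
    return -np.sum(y * np.log(a_out) + (1 - y) * np.log(1 - a_out))

a_prev = np.array([1.0, 0.3, 0.8, 0.5])      # a^(L-1), including the bias unit
y = np.array([1.0, 0.0])                     # one-hot target
theta = np.array([[0.10, -0.20, 0.05, 0.30],
                  [0.02,  0.10, -0.10, 0.20]])   # theta^(L-1)

# analytic gradient: outer product of delta^(L) and a^(L-1)
delta = sigmoid(theta @ a_prev) - y
analytic = np.outer(delta, a_prev)

# numeric gradient for one entry, e.g. theta[0, 1]
eps = 1e-6
tp, tm = theta.copy(), theta.copy()
tp[0, 1] += eps
tm[0, 1] -= eps
numeric = (cost(tp, a_prev, y) - cost(tm, a_prev, y)) / (2 * eps)
print(analytic[0, 1], numeric)               # should agree closely
```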
(For non-output layers) the error is propagated backwards as $\delta^{(l)} = ((\Theta^{(l)})^T\delta^{(l+1)}) .* a^{(l)} .* (1-a^{(l)})$, and the corresponding gradient contribution is $\frac{\partial J(\Theta)}{\partial\theta^{(l)}} = \delta^{(l+1)}(a^{(l)})^T$, which is exactly what step 5 of the algorithm above accumulates.


Reposted from blog.csdn.net/weixin_40920228/article/details/80645763