Deep Learning: Neural Networks from 1 to N

Notation

  • Superscript $[l]$ denotes a quantity associated with the $l^{th}$ layer.
    • Example: $a^{[L]}$ is the $L^{th}$ layer activation. $W^{[L]}$ and $b^{[L]}$ are the $L^{th}$ layer parameters.
  • Superscript $(i)$ denotes a quantity associated with the $i^{th}$ example.
    • Example: $x^{(i)}$ is the $i^{th}$ training example.
  • Subscript $i$ denotes the $i^{th}$ entry of a vector.
    • Example: $a^{[l]}_i$ denotes the $i^{th}$ entry of the $l^{th}$ layer's activations.
  • $\frac{\partial J}{\partial a} = da$ for any variable $a$.

1. One-Layer Neural Network

Logistic regression is essentially a one-layer neural network.

Let $x=(x_1, x_2)^T$ and $w=(w_1, w_2)^T$. Then

$z = w^T x + b = w_1x_1+w_2x_2+b \;\;\;\; \frac{\partial z}{\partial w} = x = (x_1, x_2)^T$

$\frac{\partial z}{\partial b} = 1$

$a = \sigma(z) = \frac{1}{1+e^{-z}}$, so $\frac{\partial a}{\partial z} = a(1-a)$

$L(a,y)=-[y\log a+(1-y)\log(1-a)]$, so $\frac{\partial L}{\partial a} = -\frac{y}{a}+\frac{1-y}{1-a} = \frac{a-y}{a(1-a)}$

By the chain rule:

$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial a} \frac{\partial a}{\partial z} \frac{\partial z}{\partial w} = (a-y)x$

$\frac{\partial L}{\partial b} = \frac{\partial L}{\partial a} \frac{\partial a}{\partial z} \frac{\partial z}{\partial b} = a-y$
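As a quick sanity check, the analytic gradients above can be compared against finite differences. This is a minimal NumPy sketch; the toy values and variable names are my own, not from the original post:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    # L(a, y) = -[y log a + (1 - y) log(1 - a)] with a = sigma(w^T x + b)
    a = sigmoid(np.dot(w, x) + b)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Toy single example: x = (x1, x2)^T (values chosen arbitrarily)
x = np.array([0.5, -1.2])
w = np.array([0.3, 0.8])
b, y = 0.1, 1.0

a = sigmoid(np.dot(w, x) + b)
dw_analytic = (a - y) * x   # dL/dw = (a - y) x
db_analytic = a - y         # dL/db = a - y

# Finite-difference approximation of dL/dw1
eps = 1e-6
dw1_numeric = (loss(w + np.array([eps, 0.0]), b, x, y) -
               loss(w - np.array([eps, 0.0]), b, x, y)) / (2 * eps)
print(dw_analytic[0], dw1_numeric)  # the two values should agree closely
```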

For the $i^{th}$ training example $x^{(i)}$ $(i=1,...,m)$:

$z^{(i)} = w^T x^{(i)} + b$

$\hat{y}^{(i)} = a^{(i)} = \sigma(z^{(i)})$

$\mathcal{L}(a^{(i)}, y^{(i)}) = -[y^{(i)} \log a^{(i)} + (1-y^{(i)}) \log(1-a^{(i)})]$

For $X = (x^{(1)}, x^{(2)}, ..., x^{(m)})$:

$Z = w^T X + b, \;\;\; Z = (z^{(1)}, z^{(2)}, ..., z^{(m)})$

$\hat{Y} = A = \sigma(Z), \;\;\; A = (a^{(1)}, a^{(2)}, ..., a^{(m)})$

$J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)}) = -\frac{1}{m}\sum_{i=1}^{m}[y^{(i)}\log a^{(i)}+(1-y^{(i)})\log(1-a^{(i)})]$

$\frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T \;\;\;\; \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})$
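The vectorized cost and gradients translate almost line for line into NumPy. A minimal sketch (the function and variable names are mine; shapes follow the notation above, with $X$ of shape $(n_x, m)$ and $Y$ of shape $(1, m)$):

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def propagate(w, b, X, Y):
    """Vectorized cost and gradients for logistic regression.
    w: (n_x, 1), b: scalar, X: (n_x, m), Y: (1, m)."""
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)                        # Z = w^T X + b, A = sigma(Z)
    J = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dw = np.dot(X, (A - Y).T) / m                          # dJ/dw = (1/m) X (A - Y)^T
    db = np.sum(A - Y) / m                                 # dJ/db = (1/m) sum(a_i - y_i)
    return J, dw, db
```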

2. Two-Layer Neural Network

(1) Forward Propagation

Input Layer

$x=(x_1,x_2)^T$

Hidden Layer

$z_1^{[1]} = {w_1^{[1]}}^T x + b_1^{[1]} \;\;\;\; a_1^{[1]} = g(z_1^{[1]})$

$z_2^{[1]} = {w_2^{[1]}}^T x + b_2^{[1]} \;\;\;\; a_2^{[1]} = g(z_2^{[1]})$

$z_3^{[1]} = {w_3^{[1]}}^T x + b_3^{[1]} \;\;\;\; a_3^{[1]} = g(z_3^{[1]})$

$z_4^{[1]} = {w_4^{[1]}}^T x + b_4^{[1]} \;\;\;\; a_4^{[1]} = g(z_4^{[1]})$

$z^{[1]}=(z_1^{[1]}, z_2^{[1]}, z_3^{[1]}, z_4^{[1]})^T, \;\; W^{[1]}=({w_1^{[1]}}^T, {w_2^{[1]}}^T, {w_3^{[1]}}^T, {w_4^{[1]}}^T)^T, \;\; b^{[1]}=(b_1^{[1]}, b_2^{[1]}, b_3^{[1]}, b_4^{[1]})^T$

$z^{[1]} = W^{[1]} x + b^{[1]} \;\;\;\; a^{[1]} = g(z^{[1]})$

Output Layer

$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]} \;\;\;\; a^{[2]} = \sigma(z^{[2]})$

For $x^{(i)}$ $(i=1,...,m)$:

$z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]} \;\;\;\; a^{[1](i)} = g(z^{[1](i)})$

$z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]} \;\;\;\;\; \hat{y}^{(i)} = a^{[2](i)} = \sigma(z^{[2](i)})$

$y^{(i)}_{prediction} = \begin{cases} 1 & \text{if } \hat{y}^{(i)} > 0.5 \\ 0 & \text{otherwise} \end{cases}$

For $X = (x^{(1)}, x^{(2)}, ..., x^{(m)})$:

$Z^{[1]} = W^{[1]} X + b^{[1]} \;\;\;\; A^{[1]} = g(Z^{[1]})$

$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]} \;\;\;\;\; \hat{Y} = A^{[2]} = \sigma(Z^{[2]})$

where $Z^{[l]}=(z^{[l](1)}, z^{[l](2)}, ..., z^{[l](m)}) \;\;\; A^{[l]}=(a^{[l](1)}, a^{[l](2)}, ..., a^{[l](m)})$

$J = -\frac{1}{m} \sum\limits_{i=1}^{m} \left( y^{(i)}\log\left(a^{[2](i)}\right) + (1-y^{(i)})\log\left(1-a^{[2](i)}\right) \right)$
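A NumPy sketch of the two-layer forward pass, cost, and 0.5-threshold prediction above. The function names are mine, and the choice of tanh for the hidden activation $g$ is an assumption (the post leaves $g$ unspecified):

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def forward_two_layer(X, W1, b1, W2, b2):
    """X: (n_x, m); W1: (n_h, n_x); b1: (n_h, 1); W2: (1, n_h); b2: (1, 1)."""
    Z1 = np.dot(W1, X) + b1          # Z^[1] = W^[1] X + b^[1]
    A1 = np.tanh(Z1)                 # A^[1] = g(Z^[1]), here g = tanh (assumption)
    Z2 = np.dot(W2, A1) + b2         # Z^[2] = W^[2] A^[1] + b^[2]
    A2 = sigmoid(Z2)                 # Y_hat = A^[2] = sigma(Z^[2])
    return A1, A2

def cost(A2, Y):
    m = Y.shape[1]
    return -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m

def predict(A2):
    return (A2 > 0.5).astype(int)    # y_prediction = 1 if y_hat > 0.5 else 0
```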

(2) Backward Propagation
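The original leaves this subsection empty. Following the same layer-by-layer pattern written out for the three-layer case in Section 3 below, a minimal NumPy sketch of the two-layer backward pass (again assuming $g = \tanh$, so $g'(z) = 1 - a^2$) might look like this:

```python
import numpy as np

def backward_two_layer(X, Y, A1, A2, W2):
    """Gradients for the two-layer network; assumes hidden activation g = tanh."""
    m = X.shape[1]
    dZ2 = A2 - Y                                 # dZ^[2] = A^[2] - Y
    dW2 = np.dot(dZ2, A1.T) / m                  # dW^[2] = (1/m) dZ^[2] A^[1]T
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = np.dot(W2.T, dZ2) * (1 - A1 ** 2)      # dA^[1] * g'(Z^[1]) for tanh
    dW1 = np.dot(dZ1, X.T) / m                   # dW^[1] = (1/m) dZ^[1] X^T
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2
```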

3. Three-Layer Neural Network


3_Layers_NN: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID

$m=5, \; L=3, \; n^{[0]}=n_x=3, \; n^{[1]}=2, \; n^{[2]}=3, \; n^{[3]}=1$

(1) Forward Propagation

$Z^{[1]} = W^{[1]} X + b^{[1]} \;\;\;\; A^{[1]} = ReLU(Z^{[1]})$

$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]} \;\;\;\; A^{[2]} = ReLU(Z^{[2]})$

$Z^{[3]} = W^{[3]} A^{[2]} + b^{[3]} \;\;\;\; \hat{Y} = A^{[3]} = \sigma(Z^{[3]})$

$J = -\frac{1}{m} \sum\limits_{i=1}^{m} \left( y^{(i)}\log\left(a^{[3](i)}\right) + (1-y^{(i)})\log\left(1-a^{[3](i)}\right) \right)$
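The forward-pass figure from the original did not survive extraction, so here is a NumPy sketch of LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID with the layer sizes listed above ($n^{[0]}=3, n^{[1]}=2, n^{[2]}=3, n^{[3]}=1$); the parameter-dictionary layout is an assumption:

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def forward_three_layer(X, params):
    """LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID.
    params holds W1 (2,3), b1 (2,1), W2 (3,2), b2 (3,1), W3 (1,3), b3 (1,1)."""
    W1, b1, W2, b2, W3, b3 = (params[k] for k in ("W1", "b1", "W2", "b2", "W3", "b3"))
    Z1 = np.dot(W1, X) + b1; A1 = relu(Z1)
    Z2 = np.dot(W2, A1) + b2; A2 = relu(Z2)
    Z3 = np.dot(W3, A2) + b3; A3 = sigmoid(Z3)   # A^[3] = Y_hat
    cache = (Z1, A1, Z2, A2, Z3, A3)
    return A3, cache
```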

(2) Backward Propagation

Layer 3:

$dZ^{[3]} = A^{[3]} - Y$

$dW^{[3]} = \frac{1}{m} dZ^{[3]} A^{[2]T}$

$db^{[3]} = \frac{1}{m} \sum_{i=1}^{m} dz^{[3](i)} = \frac{1}{m}\, np.sum(dZ^{[3]}, axis=1, keepdims=True)$

Layer 2:

$dA^{[2]} = W^{[3]T} dZ^{[3]}$

$dZ^{[2]} = dA^{[2]} * g'(Z^{[2]})$

$dW^{[2]} = \frac{1}{m} dZ^{[2]} A^{[1]T}$

$db^{[2]} = \frac{1}{m} \sum_{i=1}^{m} dz^{[2](i)} = \frac{1}{m}\, np.sum(dZ^{[2]}, axis=1, keepdims=True)$

Layer 1:

$dA^{[1]} = W^{[2]T} dZ^{[2]}$

$dZ^{[1]} = dA^{[1]} * g'(Z^{[1]})$

$dW^{[1]} = \frac{1}{m} dZ^{[1]} A^{[0]T} = \frac{1}{m} dZ^{[1]} X^T$

$db^{[1]} = \frac{1}{m} \sum_{i=1}^{m} dz^{[1](i)} = \frac{1}{m}\, np.sum(dZ^{[1]}, axis=1, keepdims=True)$
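Translating the Layer 3 / Layer 2 / Layer 1 formulas above directly into NumPy gives the following sketch; the parameter and cache names match the forward-pass sketch earlier and are my own, and $g'$ for ReLU is 1 where $Z > 0$ and 0 elsewhere:

```python
import numpy as np

def backward_three_layer(X, Y, params, cache):
    """Backward pass matching the formulas above; hidden activation g = ReLU."""
    W2, W3 = params["W2"], params["W3"]
    Z1, A1, Z2, A2, Z3, A3 = cache
    m = X.shape[1]

    dZ3 = A3 - Y                                  # dZ^[3] = A^[3] - Y
    dW3 = np.dot(dZ3, A2.T) / m
    db3 = np.sum(dZ3, axis=1, keepdims=True) / m

    dA2 = np.dot(W3.T, dZ3)
    dZ2 = dA2 * (Z2 > 0)                          # dA^[2] * g'(Z^[2]) for ReLU
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m

    dA1 = np.dot(W2.T, dZ2)
    dZ1 = dA1 * (Z1 > 0)
    dW1 = np.dot(dZ1, X.T) / m                    # A^[0] = X
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2, "dW3": dW3, "db3": db3}
```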

4. L_Layers_NN


L_Layers_NN: [LINEAR -> RELU] $\times$ (L-1) -> LINEAR -> SIGMOID

For layer $l$ $(l=1,2,...,L)$:

Forward Propagation and Backward Propagation

Forward: $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}, \;\;\; A^{[l]} = g^{[l]}(Z^{[l]}), \;\;\; A^{[0]} = X$

Backward: $dZ^{[l]} = dA^{[l]} * g^{[l]\prime}(Z^{[l]}), \;\;\; dW^{[l]} = \frac{1}{m} dZ^{[l]} A^{[l-1]T}, \;\;\; db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dz^{[l](i)}, \;\;\; dA^{[l-1]} = W^{[l]T} dZ^{[l]}$, with $dZ^{[L]} = A^{[L]} - Y$ at the output layer.

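A sketch of the generic per-layer forward loop implied by [LINEAR -> RELU] $\times$ (L-1) -> LINEAR -> SIGMOID; the parameter-dictionary layout ("W" + str(l), "b" + str(l)) is an assumption:

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def L_model_forward(X, params, L):
    """[LINEAR -> RELU] * (L-1) -> LINEAR -> SIGMOID.
    params["W" + str(l)] has shape (n[l], n[l-1]); params["b" + str(l)] is (n[l], 1)."""
    A = X
    caches = []
    for l in range(1, L):                         # hidden layers l = 1, ..., L-1 use ReLU
        Z = np.dot(params["W" + str(l)], A) + params["b" + str(l)]
        caches.append((A, Z))
        A = relu(Z)
    ZL = np.dot(params["W" + str(L)], A) + params["b" + str(L)]
    caches.append((A, ZL))
    AL = sigmoid(ZL)                              # output layer L uses sigmoid
    return AL, caches
```

The backward pass reuses the per-layer formulas above in reverse order over the same caches.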


Reposted from blog.csdn.net/apr15/article/details/106300720