[Andrew Ng Deep Learning Personal Study Notes] 2. Neural Network Basics (2)

Computation Graph

Example:
   $J(a,b,c)=3(a+bc)\implies\begin{cases} u=bc \\ v=a+u \\ J=3v \end{cases}$
The computation graph of this function is:
[Figure: computation graph of J(a, b, c)]
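As a quick check of the graph, here is a minimal Python sketch (with made-up inputs a=5, b=3, c=2) that runs the forward pass node by node and then applies the chain rule backwards through the same graph:

```python
# Forward pass through the computation graph of J(a, b, c) = 3(a + b*c)
a, b, c = 5.0, 3.0, 2.0
u = b * c          # u = bc
v = a + u          # v = a + u
J = 3 * v          # J = 3v

# Backward pass: apply the chain rule from J back to the inputs
dJ_dv = 3.0                # dJ/dv
dJ_du = dJ_dv * 1.0        # dv/du = 1
dJ_da = dJ_dv * 1.0        # dv/da = 1
dJ_db = dJ_du * c          # du/db = c
dJ_dc = dJ_du * b          # du/dc = b

print(J, dJ_da, dJ_db, dJ_dc)   # 33.0 3.0 6.0 9.0
```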
  
  

Gradient Descent for Logistic Regression

One training sample:
    $z=w^Tx+b$
    $\hat{y}=a=\sigma(z)$
    $L(a,y)=-\big(y\log(a)+(1-y)\log(1-a)\big)$
  Computation graph:
[Figure: logistic regression computation graph]
  Computing the derivatives:
    $\frac{dL(a,y)}{da}=-\frac{y}{a}+\frac{1-y}{1-a}$
    $\frac{dL(a,y)}{dz}=\frac{dL}{da}\cdot\frac{da}{dz}=\left(-\frac{y}{a}+\frac{1-y}{1-a}\right)a(1-a)=a-y$
    $\frac{dL(a,y)}{dw_1}=x_1(a-y)$
    $\frac{dL(a,y)}{dw_2}=x_2(a-y)$
    $\frac{dL(a,y)}{db}=a-y$
  
  This effectively treats logistic regression as a single-layer neural network: the back-propagation algorithm computes the derivative of the loss with respect to each parameter, so that gradient descent can then update the parameters toward the minimum cost.
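Below is a minimal numpy sketch of this single-sample forward/backward computation; the sample x, label y, and initial w, b are made-up values for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up single training sample with two features
x = np.array([1.0, 2.0])    # x1, x2
y = 1.0
w = np.array([0.1, -0.2])
b = 0.0

# Forward pass
z = np.dot(w, x) + b
a = sigmoid(z)

# Backward pass, using the derivatives derived above
dz = a - y      # dL/dz = a - y
dw = x * dz     # dL/dw1 = x1*dz, dL/dw2 = x2*dz
db = dz         # dL/db = dz

print(dz, dw, db)
```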
  
m training samples:
   $J(w,b)=\frac{1}{m}\sum_{i=1}^{m}L(a^{(i)},y^{(i)})$
   $a^{(i)}=\hat{y}^{(i)}=\sigma(z^{(i)})=\sigma(w^Tx^{(i)}+b)$
   $\frac{\partial J(w,b)}{\partial w_1}=\frac{1}{m}\sum_{i=1}^{m}\frac{\partial L(a^{(i)},y^{(i)})}{\partial w_1}$
   $\frac{\partial J(w,b)}{\partial b}=\frac{1}{m}\sum_{i=1}^{m}\frac{\partial L(a^{(i)},y^{(i)})}{\partial b}$
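Plugging the single-sample derivatives from above into these averages gives the explicit per-parameter gradients that the algorithm below accumulates:
   $\frac{\partial J(w,b)}{\partial w_1}=\frac{1}{m}\sum_{i=1}^{m}x_1^{(i)}\left(a^{(i)}-y^{(i)}\right),\qquad \frac{\partial J(w,b)}{\partial b}=\frac{1}{m}\sum_{i=1}^{m}\left(a^{(i)}-y^{(i)}\right)$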
  
Logistic Regression Algorithm
  Repeat {
      $J=0;\ dw_1=0;\ dw_2=0;\ db=0$
     For i in range(m):
        $z^{(i)}=w^Tx^{(i)}+b$
        $a^{(i)}=\sigma(z^{(i)})$
        $J+=-\big(y^{(i)}\log a^{(i)}+(1-y^{(i)})\log(1-a^{(i)})\big)$
        $dz^{(i)}=a^{(i)}-y^{(i)}$
        $dw_1+=x_1^{(i)}dz^{(i)}$
        $dw_2+=x_2^{(i)}dz^{(i)}$
        $db+=dz^{(i)}$
      $J/=m$
      $dw_1/=m$
      $dw_2/=m$
      $db/=m$
     
      $w_1=w_1-\alpha\,dw_1$
      $w_2=w_2-\alpha\,dw_2$
      $b=b-\alpha\,db$
  }
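As a concrete illustration, here is a minimal Python/numpy sketch of the loop above for the two-feature case; the training data X, y and the learning rate alpha are made-up values for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up data: m samples, 2 features each
X = np.array([[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]])   # shape (m, 2)
y = np.array([1.0, 0.0, 1.0])
m = X.shape[0]

w = np.zeros(2)
b = 0.0
alpha = 0.1

for _ in range(1000):              # the outer "Repeat" loop
    J = 0.0
    dw = np.zeros(2)               # accumulates dw1, dw2
    db = 0.0
    for i in range(m):             # loop over the m training samples
        z_i = np.dot(w, X[i]) + b
        a_i = sigmoid(z_i)
        J += -(y[i] * np.log(a_i) + (1 - y[i]) * np.log(1 - a_i))
        dz_i = a_i - y[i]
        dw += X[i] * dz_i
        db += dz_i
    J /= m
    dw /= m
    db /= m
    # Gradient descent update
    w -= alpha * dw
    b -= alpha * db
```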
  
  To be continued…


Reposted from blog.csdn.net/Valeni/article/details/82830580