【Andrew Ng Deep Learning Personal Study Notes】2. Neural Network Basics (1)

When building the matrices for the training set, use the following form:

$$X = \begin{pmatrix} \vdots & \vdots & & \vdots \\ x^{(1)} & x^{(2)} & \cdots & x^{(m)} \\ \vdots & \vdots & & \vdots \end{pmatrix}, \quad X \in \mathbb{R}^{n_x \times m}$$
$$Y = \begin{pmatrix} y^{(1)} & y^{(2)} & \cdots & y^{(m)} \end{pmatrix}, \quad Y \in \mathbb{R}^{1 \times m}$$
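A minimal NumPy sketch of this column-wise stacking (the variable names and toy sizes here are my own, not from the course):

```python
import numpy as np

# Hypothetical toy sizes: n_x features, m training examples.
n_x, m = 4, 3

# Each example x^(i) is a column vector of shape (n_x, 1).
examples = [np.random.randn(n_x, 1) for _ in range(m)]
labels = [1, 0, 1]

# Stack the examples column-wise: X has shape (n_x, m).
X = np.hstack(examples)
# The labels form a row vector: Y has shape (1, m).
Y = np.array(labels).reshape(1, m)

print(X.shape)  # (4, 3)
print(Y.shape)  # (1, 3)
```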
  
  

Logistic Regression

  Given $x$: $\hat{y} = P(y = 1 \mid x)$, where $0 \leq \hat{y} \leq 1$.
  That is, the prediction $\hat{y}$ is the probability that $y = 1$ given the input $x$.
  
  

Parameter Specification

  Input feature vector $x$: $x \in \mathbb{R}^{n_x}$, where $n_x$ is the number of features;
  Training label $y$: $y \in \{0, 1\}$;
  Weights $w$: $w \in \mathbb{R}^{n_x}$;
  Bias (threshold) $b$: $b \in \mathbb{R}$;
  Output $\hat{y}$: $\hat{y} = \sigma(w^T x + b)$;
  Sigmoid function: $\sigma(z) = \frac{1}{1 + e^{-z}}$, with $z = w^T x + b$ (see the sketch below);
  Parameter vector: $\Theta = \begin{pmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_{n_x} \end{pmatrix}$
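The prediction defined above can be sketched in a few lines of NumPy (the function names and toy numbers are assumptions of mine, not from the notes):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid function: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    """Compute y_hat = sigma(w^T x + b) for one example.

    w: weights of shape (n_x, 1); b: scalar bias; x: features of shape (n_x, 1).
    """
    z = np.dot(w.T, x) + b  # shape (1, 1)
    return sigmoid(z)       # estimated probability that y == 1

# Toy usage with made-up values:
w = np.zeros((3, 1))
b = 0.0
x = np.array([[0.5], [-1.2], [2.0]])
print(predict(w, b, x))  # [[0.5]] -- with w = 0 and b = 0, z = 0 and sigma(0) = 0.5
```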
  
  

Loss/Error Function

$$l(\hat{y}^{(i)}, y^{(i)}) = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2$$
  In general one would use the squared error above to measure loss, but here it is a non-convex function: running gradient descent on it would very likely return a local optimum, while what we want is the global optimum. So this loss function is normally not used for logistic regression.
  
Instead, a loss function of the following form is used:
$$l(\hat{y}^{(i)}, y^{(i)}) = -\left[y^{(i)}\log(\hat{y}^{(i)}) + (1 - y^{(i)})\log(1 - \hat{y}^{(i)})\right]$$
    if $y^{(i)} = 1$: $l(\hat{y}^{(i)}, y^{(i)}) = -\log(\hat{y}^{(i)})$
    if $y^{(i)} = 0$: $l(\hat{y}^{(i)}, y^{(i)}) = -\log(1 - \hat{y}^{(i)})$
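A small sketch of this loss for a single example (the `eps` clipping is my own safeguard against `log(0)`, not part of the formula above):

```python
import numpy as np

def loss(y_hat, y, eps=1e-12):
    """Cross-entropy loss for one training example."""
    y_hat = np.clip(y_hat, eps, 1 - eps)  # avoid log(0)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# The two cases from above, with a made-up prediction y_hat = 0.9:
print(loss(0.9, 1))  # y == 1: -log(0.9) ~= 0.105, small since y_hat is near 1
print(loss(0.9, 0))  # y == 0: -log(0.1) ~= 2.303, large since y_hat is far from 0
```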
  
  

Cost Function

$$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} l(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(\hat{y}^{(i)}) + (1 - y^{(i)})\log(1 - \hat{y}^{(i)})\right]$$
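A vectorized sketch of $J(w, b)$ over the whole training set (again, the names and the `eps` clipping are assumptions of mine):

```python
import numpy as np

def cost(Y_hat, Y, eps=1e-12):
    """Cost J(w, b): the mean cross-entropy loss over all m examples.

    Y_hat, Y: row vectors of shape (1, m).
    """
    m = Y.shape[1]
    Y_hat = np.clip(Y_hat, eps, 1 - eps)  # avoid log(0)
    return -np.sum(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat)) / m

# Toy usage with made-up predictions and labels:
Y_hat = np.array([[0.9, 0.2, 0.7]])
Y = np.array([[1, 0, 1]])
print(cost(Y_hat, Y))  # ~0.228
```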
  
  

Cost Function vs. Loss/Error Function

  The loss/error function measures how the model performs on a single training example; the cost function is the average of the loss function over the entire training set.
  
  


Reposted from blog.csdn.net/Valeni/article/details/82825517