Deeplearning.ai-Course-2-Shallow Neural Network (Programming Assignment)

Note: this post draws on https://blog.csdn.net/u013733326/article/details/79702148 and records what I learned while working through it.

Python version: 3.6.x

Goal: build a shallow neural network with a single hidden layer that classifies planar data.


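This post covers the following:

  • Building a binary-classification neural network with a single hidden layer
  • Using a non-linear activation function such as tanh
  • Computing the loss function
  • Implementing forward and backward propagation in code

Steps:

1. Loading and preparing the data

Libraries to import before starting: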

import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
import sklearn.linear_model
from testCases import *
from planar_utils import plot_decision_boundary, sigmoid, load_planar_dataset, load_extra_datasets
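About the imported libraries:
  • sklearn: a framework for data mining and data analysis
  • testCases: provides test cases for checking that the functions are correct
  • planar_utils: provides the helper functions used in this exercise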

Load and inspect the dataset:

X, Y = load_planar_dataset()
plt.scatter(X[0,:], X[1,:], c = np.squeeze(Y), s = 40, cmap = plt.cm.Spectral) # s is the marker size, c the color sequence, cmap the Colormap
plt.show()

shape_X = X.shape  # (2, 400)
shape_Y = Y.shape  # (1, 400)
m = Y.shape[1]     # number of training examples
print("Shape of X: " + str(shape_X))
print("Shape of Y: " + str(shape_Y))
print("Number of examples in the dataset: " + str(m))

Shape of X: (2, 400)
Shape of Y: (1, 400)
Number of examples in the dataset: 400

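Notation:

  • X: a numpy matrix of shape (2, 400) holding the coordinates of the data points, one example per column
  • Y: a numpy vector of shape (1, 400) holding the label of each column of X (0 = red, 1 = blue)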
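To make the column-per-example layout concrete, here is a minimal sketch that builds a toy dataset with the same shapes as X and Y. It is not the data returned by load_planar_dataset; the names X_toy and Y_toy and the two Gaussian point clouds are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical stand-in for load_planar_dataset(): two noisy point clouds.
rng = np.random.default_rng(0)
m = 400                      # number of examples, as in the post
half = m // 2

# Each column of X_toy is one example; the two rows are its two coordinates.
class0 = rng.normal(loc=-1.0, scale=0.5, size=(2, half))
class1 = rng.normal(loc=1.0, scale=0.5, size=(2, m - half))
X_toy = np.concatenate([class0, class1], axis=1)

# Y_toy is a row vector holding one label per column of X_toy.
Y_toy = np.concatenate([np.zeros((1, half)), np.ones((1, m - half))], axis=1)

print(X_toy.shape, Y_toy.shape)   # (2, 400) (1, 400)

# np.squeeze drops the leading axis, giving the 1-D array that
# plt.scatter expects for its color argument c.
print(np.squeeze(Y_toy).shape)    # (400,)
```

With this layout, X_toy[:, i] is the i-th example and Y_toy[0, i] is its label.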

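2. Building the shallow neural network

The shallow network takes a two-dimensional input, has one hidden layer with four units, and ends in a single sigmoid output unit (model figure omitted):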


Forward propagation:

For a single example $\left \{ x^{\left ( i \right )},y^{\left ( i \right )}\right \}$:

Each neuron in the hidden layer computes the following:

$$\left\{\begin{matrix}
z_{1}^{\left [ 1 \right ]}=w_{1}^{\left [ 1 \right ]T}x + b_{1}^{\left [ 1 \right ]} &a_{1}^{\left [ 1 \right ]}=\sigma \left ( z_{1}^{\left [ 1 \right ]} \right ) \\
z_{2}^{\left [ 1 \right ]}=w_{2}^{\left [ 1 \right ]T}x + b_{2}^{\left [ 1 \right ]} &a_{2}^{\left [ 1 \right ]}=\sigma \left ( z_{2}^{\left [ 1 \right ]} \right )\\
z_{3}^{\left [ 1 \right ]}=w_{3}^{\left [ 1 \right ]T}x + b_{3}^{\left [ 1 \right ]} &a_{3}^{\left [ 1 \right ]}=\sigma \left ( z_{3}^{\left [ 1 \right ]} \right ) \\
z_{4}^{\left [ 1 \right ]}=w_{4}^{\left [ 1 \right ]T}x + b_{4}^{\left [ 1 \right ]} &a_{4}^{\left [ 1 \right ]}=\sigma \left ( z_{4}^{\left [ 1 \right ]} \right )
\end{matrix}\right.$$

Stacking the four weight vectors as the rows of $W^{\left [ 1 \right ]}$ gives the matrix form:

$$\begin{align*}
z^{\left [ 1 \right ]}&=\begin{bmatrix}
z_{1}^{\left [ 1 \right ]}\\
z_{2}^{\left [ 1 \right ]}\\
z_{3}^{\left [ 1 \right ]}\\
z_{4}^{\left [ 1 \right ]}
\end{bmatrix}=\begin{bmatrix}
\cdots &w_{1}^{\left [ 1 \right ]T} & \cdots \\
\cdots &w_{2}^{\left [ 1 \right ]T} & \cdots \\
\cdots &w_{3}^{\left [ 1 \right ]T} &\cdots \\
\cdots &w_{4}^{\left [ 1 \right ]T} & \cdots
\end{bmatrix}\begin{bmatrix}
x_{1}\\x_{2}
\end{bmatrix}+\begin{bmatrix}
b_{1}^{\left [ 1 \right ]}\\
b_{2}^{\left [ 1 \right ]}\\
b_{3}^{\left [ 1 \right ]}\\
b_{4}^{\left [ 1 \right ]}
\end{bmatrix}\\\\
z^{\left [ 1 \right ]}&=W^{\left [ 1 \right ]}x+b^{\left [ 1 \right ]}
\end{align*}$$

$$a^{\left [ 1 \right ]}=\begin{bmatrix}
a_{1}^{\left [ 1 \right ]}\\
a_{2}^{\left [ 1 \right ]}\\
a_{3}^{\left [ 1 \right ]}\\
a_{4}^{\left [ 1 \right ]}
\end{bmatrix}=\sigma \left ( \begin{bmatrix}
z_{1}^{\left [ 1 \right ]}\\
z_{2}^{\left [ 1 \right ]}\\
z_{3}^{\left [ 1 \right ]}\\
z_{4}^{\left [ 1 \right ]}
\end{bmatrix} \right )=\sigma \left ( z^{\left [ 1 \right ]} \right )$$
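The following sketch checks that computing the four hidden units one at a time gives the same result as the matrix form for a single example. The random weights and the local sigmoid helper are assumptions for illustration (the post imports sigmoid from planar_utils instead).

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_x, n_h = 2, 4                     # 2 input features, 4 hidden units
W1 = rng.normal(size=(n_h, n_x))    # row j holds the weights of hidden unit j
b1 = rng.normal(size=(n_h, 1))
x = rng.normal(size=(n_x, 1))       # one example as a column vector

# Neuron by neuron: z_j = w_j^T x + b_j
z1_loop = np.zeros((n_h, 1))
for j in range(n_h):
    z1_loop[j, 0] = W1[j, :] @ x[:, 0] + b1[j, 0]

# All four neurons at once: z = W x + b, then a = sigma(z)
z1 = W1 @ x + b1
a1 = sigmoid(z1)

print(np.allclose(z1, z1_loop))     # True
print(a1.shape)                     # (4, 1)
```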

For multiple examples, stack the examples as the columns of a matrix:

$$X = \begin{bmatrix}
\vdots & \vdots & \vdots & \vdots \\
x^{\left ( 1 \right )} & x^{\left ( 2 \right )} & \cdots &x^{\left ( m \right )} \\
\vdots & \vdots & \vdots & \vdots
\end{bmatrix}$$

To cover all training examples, the following has to be evaluated for every i from 1 to m:

$$z^{\left [ 1 \right ]\left ( i \right )}=W^{\left [ 1 \right ]}x^{\left ( i \right )}+b^{\left [ 1 \right ]}\\$$
$$a^{\left [ 1 \right ]\left ( i \right )}=\sigma \left ( z^{\left [ 1 \right ]\left ( i \right )} \right )$$

Stacking the results column by column therefore gives:

$$\begin{align*}
Z^{\left [ 1 \right ]} &= \begin{bmatrix}
\vdots & \vdots & \vdots & \vdots \\
z^{\left [ 1 \right ]\left ( 1 \right )} & z^{\left [ 1 \right ]\left ( 2 \right )} & \cdots &z^{\left [ 1 \right ]\left ( m \right )} \\
\vdots & \vdots & \vdots & \vdots
\end{bmatrix}\\&=\begin{bmatrix}
W^{\left [ 1 \right ]}x^{\left ( 1 \right )}+b^{\left [ 1 \right ]} &W^{\left [ 1 \right ]}x^{\left ( 2\right )}+b^{\left [ 1 \right ]} &\cdots & W^{\left [ 1 \right ]}x^{\left ( m \right )}+b^{\left [ 1 \right ]}
\end{bmatrix}\\&=W^{\left [ 1 \right ]}\begin{bmatrix}
x^{\left ( 1 \right )}&x^{\left ( 2\right )} &\cdots & x^{\left ( m \right )}
\end{bmatrix} +\begin{bmatrix}
b^{\left [ 1 \right ]} & b^{\left [ 1 \right ]} & \cdots & b^{\left [ 1 \right ]}
\end{bmatrix}\\&=W^{\left [ 1 \right ]}X+b^{\left [ 1 \right ]}\quad \text{(using Python's broadcasting)}
\end{align*}$$

$$\begin{align*}
A^{\left [ 1 \right ]}&=\begin{bmatrix}
\vdots &\vdots & \vdots &\vdots \\
a^{\left [ 1 \right ]\left ( 1 \right )}& a^{\left [ 1 \right ]\left ( 2 \right )} & \cdots & a^{\left [ 1 \right ]\left ( m \right )} \\
\vdots & \vdots &\vdots & \vdots
\end{bmatrix}=\begin{bmatrix}
\sigma \left ( z^{\left [ 1 \right ]\left ( 1 \right )} \right )& \sigma \left ( z^{\left [ 1 \right ]\left ( 2 \right )} \right ) &\cdots & \sigma \left ( z^{\left [ 1 \right ]\left ( m \right )} \right )
\end{bmatrix}\\&=\sigma \begin{bmatrix}
z ^{\left [ 1 \right ]\left ( 1 \right )}&z ^{\left [ 1 \right ]\left ( 2 \right )} &\cdots & z ^{\left [ 1 \right ]\left ( m \right )}
\end{bmatrix}=\sigma\left ( Z^{\left [ 1 \right ]} \right )
\end{align*}$$
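As a small check of the vectorized form, the sketch below compares an explicit loop over the examples with the single expression W1 @ X + b1, where NumPy broadcasts b1 across the m columns. The sizes, the random values, and the local sigmoid helper are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
n_x, n_h, m = 2, 4, 5
W1 = rng.normal(size=(n_h, n_x))
b1 = rng.normal(size=(n_h, 1))
X = rng.normal(size=(n_x, m))       # columns are x^(1), ..., x^(m)

# Explicit loop: z^[1](i) = W1 x^(i) + b1 for each example i
Z1_loop = np.zeros((n_h, m))
for i in range(m):
    x_i = X[:, i:i + 1]             # slicing keeps the column shape (n_x, 1)
    Z1_loop[:, i:i + 1] = W1 @ x_i + b1

# Vectorized: b1 with shape (n_h, 1) is broadcast to every column
Z1 = W1 @ X + b1
A1 = sigmoid(Z1)

print(np.allclose(Z1, Z1_loop))     # True
print(A1.shape)                     # (4, 5)
```

Broadcasting works here because b1 has shape (n_h, 1); a flat array of shape (n_h,) would not line up with the (n_h, m) result in the same way.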


Backward propagation:

For a single example $\left \{ x,y \right \}$ (superscripts omitted):

$$\because z^{\left [ 2 \right ]}=W^{\left [ 2 \right ]}a^{\left [ 1 \right ]}+b^{\left [ 2 \right ]}\\$$
$$a^{\left [ 2 \right ]}=\sigma \left ( z^{\left [ 2 \right ]} \right )\\$$
$$\therefore dz^{\left [ 2 \right ]}=a^{\left [ 2 \right ]}-y\\$$
$$db^{\left [ 2 \right ]}=dz^{\left [ 2 \right ]}\\$$
$$dW^{\left [ 2 \right ]}=dz^{\left [ 2 \right ]}a^{\left [ 1 \right ]T}\Leftrightarrow \left ( n^{\left [ 2 \right ]},n^{\left [ 1 \right ]}\right )= \left ( n^{\left [ 2 \right ]} ,1 \right )*\left (1,n^{\left [ 1 \right ]}\right)\\$$
$$da^{\left [ 1 \right ]}=W^{\left [ 2 \right ]T}dz^{\left [ 2 \right ]}\Leftrightarrow \left ( n^{\left [ 1 \right ]},1 \right )=\left ( n^{\left [ 1 \right ]},n^{\left [ 2 \right ]} \right )*\left ( n^{\left [ 2 \right ]} ,1\right )\\$$

$$\because z^{\left [ 1 \right ]}=W^{\left [ 1 \right ]}a^{\left [ 0 \right ]}+b^{\left [ 1 \right ]}\left(a^{\left [ 0 \right ]}=x \right )\\$$
$$a^{\left [ 1 \right ]}=g^{\left [ 1 \right ]}\left ( z^{\left [ 1 \right ]} \right )\\$$
$$dz^{\left [ 1 \right ]}=da^{\left [ 1 \right ]}\ast g^{\left [ 1 \right ]}{}'\left ( z^{\left [ 1 \right ]} \right )=W^{\left [ 2 \right ]T}dz^{\left [ 2 \right ]}\ast g^{\left [ 1 \right ]}{}'\left ( z^{\left [ 1 \right ]} \right )\\$$
$$db^{\left [ 1 \right ]}=dz^{\left [ 1 \right ]}\\$$
$$dW^{\left [ 1 \right ]}=dz^{\left [ 1 \right ]}a^{\left [ 0 \right ]T}=dz^{\left [ 1 \right ]}x^{T}$$
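One way to sanity-check the single-example formulas is to compare them with finite-difference gradients. The sketch below does that under two assumptions that are not spelled out in this section: the hidden activation $g^{[1]}$ is tanh, and the loss is the binary cross-entropy $-(y\log a^{[2]}+(1-y)\log(1-a^{[2]}))$, which is the loss that produces $dz^{[2]}=a^{[2]}-y$. The helper names forward, loss, backward, and numeric_grad are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    W1, b1, W2, b2 = params
    z1 = W1 @ x + b1
    a1 = np.tanh(z1)                  # assumed hidden activation g^[1]
    z2 = W2 @ a1 + b2
    a2 = sigmoid(z2)
    return z1, a1, z2, a2

def loss(params, x, y):
    a2 = forward(params, x)[3]
    # assumed loss: binary cross-entropy for one example
    return (-(y * np.log(a2) + (1 - y) * np.log(1 - a2))).item()

def backward(params, x, y):
    W1, b1, W2, b2 = params
    z1, a1, z2, a2 = forward(params, x)
    dz2 = a2 - y                      # shape (n2, 1)
    dW2 = dz2 @ a1.T                  # shape (n2, n1)
    db2 = dz2
    da1 = W2.T @ dz2                  # shape (n1, 1)
    dz1 = da1 * (1 - a1 ** 2)         # element-wise; tanh'(z) = 1 - tanh(z)^2
    dW1 = dz1 @ x.T                   # shape (n1, n0)
    db1 = dz1
    return dW1, db1, dW2, db2

rng = np.random.default_rng(3)
params = [rng.normal(size=(4, 2)), rng.normal(size=(4, 1)),
          rng.normal(size=(1, 4)), rng.normal(size=(1, 1))]
x = rng.normal(size=(2, 1))
y = 1.0

dW1, db1, dW2, db2 = backward(params, x, y)

def numeric_grad(idx, pos, eps=1e-6):
    """Central difference of the loss with respect to params[idx][pos]."""
    plus = [p.copy() for p in params]
    minus = [p.copy() for p in params]
    plus[idx][pos] += eps
    minus[idx][pos] -= eps
    return (loss(plus, x, y) - loss(minus, x, y)) / (2 * eps)

# Compare one entry of each weight gradient with its numerical estimate
print(dW1[2, 1], numeric_grad(0, (2, 1)))
print(dW2[0, 3], numeric_grad(2, (0, 3)))
```

Each printed pair should agree up to finite-difference error; a mismatch usually points to a missing transpose or a wrong activation derivative.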

For all examples $\left \{ X,Y \right \}$:

$$A^{\left [ 2 \right ]}=\begin{bmatrix}
\vdots & \vdots &\vdots &\vdots \\
a^{\left [ 2 \right ]\left ( 1 \right )}& a^{\left [ 2 \right ]\left ( 2 \right )} & \cdots &a^{\left [ 2 \right ]\left ( m \right )} \\
\vdots&\vdots & \vdots &\vdots
\end{bmatrix}\\$$
$$Y=\begin{bmatrix}
y^{\left ( 1 \right )} & y^{\left ( 2 \right )} & \cdots & y^{\left ( m \right )}
\end{bmatrix}\\$$

$$\begin{align*}
dZ^{\left [ 2 \right ]}&=\begin{bmatrix}
dz^{\left [ 2 \right ]\left ( 1 \right )}& dz^{\left [ 2 \right ]\left ( 2 \right )} & \cdots & dz^{\left [ 2 \right ]\left ( m \right )}
\end{bmatrix}\\&=\begin{bmatrix}
a^{\left [ 2 \right ]\left ( 1 \right )}-y^{\left ( 1 \right )} & a^{\left [ 2 \right ]\left ( 2 \right )}-y^{\left ( 2 \right )} &\cdots & a^{\left [ 2 \right ]\left ( m \right )}-y^{\left ( m\right )}
\end{bmatrix}\\ &=A^{\left [ 2 \right ]}-Y
\end{align*}$$

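The gradient with respect to $W^{\left [ 2 \right ]}$ over the whole training set is the average, for i from 1 to m, of the single-example gradients with respect to $W^{\left [ 2 \right ]}$: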

$$\begin{align*}
dW^{\left [ 2 \right ]}&=\frac{1}{m}\sum_{i=1}^{m}dz^{\left [ 2 \right ]\left ( i \right )}a^{\left [ 1 \right ]\left ( i \right )T}=\frac{1}{m}\begin{bmatrix}
dz^{\left [ 2 \right ]\left ( 1 \right )}&\cdots & dz^{\left [ 2 \right ]\left ( m \right )}
\end{bmatrix}\begin{bmatrix}
a^{\left [ 1 \right ]\left ( 1 \right )T}\\
\vdots \\ a^{\left [ 1 \right ]\left ( m \right )T}
\end{bmatrix}\\
&=\frac{1}{m}\begin{bmatrix}
dz^{\left [ 2 \right ]\left ( 1 \right )} &\cdots & dz^{\left [ 2 \right ]\left ( m \right )}
\end{bmatrix}\begin{bmatrix}
a^{\left [ 1 \right ]\left ( 1 \right )} & \cdots &a^{\left [ 1 \right ]\left ( m \right )}
\end{bmatrix}^{T}=\frac{1}{m}\,\text{np.dot}\left ( dZ^{\left [ 2 \right ]},A^{\left [ 1 \right ]T} \right )
\end{align*}$$

$$db^{\left [ 2 \right ]}=\frac{1}{m}\sum_{i=1}^{m}dz^{\left [ 2 \right ]\left ( i \right )}=\frac{1}{m}\,\text{np.sum}\left ( dZ^{\left [ 2 \right ]},\ \text{axis}=1,\ \text{keepdims}=\text{True} \right )$$

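Note: with axis=1, np.sum adds up the entries of each row, that is, it sums over the m examples; the factor 1/m then turns that row sum into an average. keepdims=True keeps the result as a column of shape $\left ( n^{\left [ 2 \right ]},1 \right )$ instead of a flat array.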
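The sketch below checks the vectorized output-layer gradients against an explicit average over the examples. A1, A2, and Y are filled with random stand-in values rather than the results of a real forward pass; that, and the chosen sizes, are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2, m = 4, 1, 6
A1 = rng.normal(size=(n1, m))                        # stand-in hidden activations
A2 = rng.uniform(0.05, 0.95, size=(n2, m))           # stand-in output activations
Y = rng.integers(0, 2, size=(n2, m)).astype(float)   # stand-in 0/1 labels

# Vectorized formulas
dZ2 = A2 - Y
dW2 = (1 / m) * np.dot(dZ2, A1.T)
db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)

# Explicit average of the single-example gradients
dW2_loop = np.zeros((n2, n1))
db2_loop = np.zeros((n2, 1))
for i in range(m):
    dz2_i = dZ2[:, i:i + 1]          # shape (n2, 1)
    a1_i = A1[:, i:i + 1]            # shape (n1, 1)
    dW2_loop += dz2_i @ a1_i.T       # single-example dW2
    db2_loop += dz2_i                # single-example db2
dW2_loop /= m
db2_loop /= m

print(np.allclose(dW2, dW2_loop), np.allclose(db2, db2_loop))   # True True
print(dW2.shape, db2.shape)                                     # (1, 4) (1, 1)
```

Without keepdims=True, db2 would have shape (1,) here, which can silently broadcast in unintended ways when it is later subtracted from a (1, 1) bias.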

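The formulas for a single example and for all examples are summarized in the table below:

This post is unfinished; to be continued...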


Reposted from www.cnblogs.com/xiazhenbin/p/12233895.html