DeepLearning.ai code notes 1: Neural Networks and Deep Learning

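A note up front: this series collects the excerpts, translations, and explanations from the programming assignments that I consider most important. The goal is to record the main idea or workflow of each model, along with common coding mistakes, so the notes can be used later to find and fill gaps.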

Assignment link: https://github.com/Wasim37/deeplearning-assignment. Thanks to everyone who shared their work on GitHub.

1. Random number generation

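The difference between np.random.randn() and np.random.rand(): the n in the former stands for "normal", so it samples from the standard normal distribution (mean 0, variance 1), while the latter samples uniformly from [0, 1). When I started coding I kept dropping the n and then found that my random numbers did not match the assignment's.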
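A minimal sketch of the difference, assuming an arbitrary seed and sample size chosen only to make the comparison repeatable:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, used only so the sketch is repeatable

uniform = np.random.rand(100_000)   # uniform samples on [0, 1)
normal = np.random.randn(100_000)   # standard normal samples

# rand never leaves [0, 1); its mean is close to 0.5
print(uniform.min() >= 0.0, uniform.max() < 1.0, round(uniform.mean(), 2))

# randn has mean close to 0, standard deviation close to 1,
# and produces negative values, which rand never does
print(round(normal.mean(), 2), round(normal.std(), 2), (normal < 0).any())
```

If an assignment expects negative initial values and yours are all between 0 and 1, a missing n is a likely cause.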

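np.random.seed(): setting a seed puts the pseudo-random generator into a fixed starting state, so every run with the same seed produces the same sequence of numbers. You can picture it as a fixed list of numbers that is read back in order.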

import numpy as np
# np.random.seed(1)     # uncomment this line and rerun to see what the seed does
print(np.random.random())
for i in range(5):
    print(np.random.random())

Seed line commented out	Seed line uncommented
0.22199317108973948	0.22199317108973948
0.8707323061773764	0.8707323061773764
0.20671915533942642	0.20671915533942642
0.9186109079379216	0.9186109079379216
0.48841118879482914	0.48841118879482914
0.6117438629026457
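The effect of the seed can also be checked inside a single run, without editing a comment. The helper name draw and the seeds 1 and 2 below are illustrative assumptions:

```python
import numpy as np

def draw(seed, n=3):
    """Seed the global generator, then draw n uniform samples."""
    np.random.seed(seed)
    return [np.random.random() for _ in range(n)]

first = draw(1)
second = draw(1)   # same seed, so the same sequence comes back
other = draw(2)    # a different seed gives a different sequence

print(first == second)
print(first == other)
```

Reseeding restarts the sequence from the beginning, which is why the assignments call np.random.seed() right before generating test data.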

2. Basic steps for building a neural network

1. Define the model structure (such as number of input features)
2. Initialize the model's parameters
3. Loop:
     Calculate current loss (forward propagation)
     Calculate current gradient (backward propagation)
     Update parameters (gradient descent)

You often build 1-3 separately and integrate them into one function we call model().
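As a rough illustration of how the three steps fit into one model() function, here is a sketch for plain logistic regression. The toy data, the hyperparameter defaults, and the zero initialization are assumptions made for this example, not the assignment's code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(X, Y, num_iterations=1000, learning_rate=0.1):
    """X has shape (n_features, m), Y has shape (1, m)."""
    n, m = X.shape                       # step 1: model structure
    w, b = np.zeros((n, 1)), 0.0         # step 2: initialize parameters
    costs = []
    for _ in range(num_iterations):      # step 3: loop
        A = sigmoid(w.T @ X + b)         # forward propagation
        cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
        dZ = A - Y                       # backward propagation
        dw = X @ dZ.T / m
        db = np.sum(dZ) / m
        w -= learning_rate * dw          # gradient descent update
        b -= learning_rate * db
        costs.append(cost)
    return w, b, costs

# Toy, linearly separable data (hypothetical): label is 1 when x0 + x1 > 0
rng = np.random.RandomState(0)
X = rng.randn(2, 200)
Y = (X[0:1] + X[1:2] > 0).astype(float)

w, b, costs = model(X, Y)
accuracy = np.mean((sigmoid(w.T @ X + b) > 0.5) == Y)
print(costs[0], costs[-1], accuracy)
```

With zero initial parameters every activation starts at 0.5, so the first recorded cost is log 2; the cost should then fall as the loop runs.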

In the assignment, these steps are split into helper functions along the following lines:

def initialize_parameters_deep(layer_dims):
    ...
    return parameters


def L_model_forward(X, parameters):
    ...
    return AL, caches  # AL: activation of the last layer; caches: values saved per layer for backprop


def compute_cost(AL, Y):
    ...
    return cost


def L_model_backward(AL, Y, caches):
    ...
    return grads


def update_parameters(parameters, grads, learning_rate):
    ...
    return parameters
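To make the skeleton a little more concrete, here is one plausible body for initialize_parameters_deep. The 0.01 scaling, the zero biases, the "W1"/"b1" key names, and the extra seed argument are all assumptions for illustration; the notes above elide the real body:

```python
import numpy as np

def initialize_parameters_deep(layer_dims, seed=None):
    """
    layer_dims: list of layer sizes, e.g. [5, 4, 3] for 5 inputs,
    a hidden layer of 4 units and an output layer of 3 units.
    seed: hypothetical extra argument, only here to make the sketch repeatable.
    """
    rng = np.random.RandomState(seed)
    parameters = {}
    for l in range(1, len(layer_dims)):
        # W of layer l maps layer l-1 activations to layer l, hence this shape
        parameters["W" + str(l)] = rng.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters

params = initialize_parameters_deep([5, 4, 3], seed=1)
for name, value in params.items():
    print(name, value.shape)
```

Small random weights break the symmetry between units in a layer, while the biases can safely start at zero.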

Main formulas for forward propagation:

$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$

$$\hat{y}^{(i)} = a^{(i)} = \mathrm{sigmoid}(z^{(i)}) \tag{2}$$

$$\mathcal{L}(a^{(i)}, y^{(i)}) = - y^{(i)} \log(a^{(i)}) - (1 - y^{(i)}) \log(1 - a^{(i)}) \tag{3}$$

The cost is then computed by summing over all training examples:

$$J = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)}) \tag{4}$$
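Formulas (1) to (4) can be sketched in vectorized NumPy as follows. The function name forward_and_cost and the small input arrays are assumptions made for this example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_and_cost(w, b, X, Y):
    """w: (n, 1), b: scalar, X: (n, m) with one example per column, Y: (1, m)."""
    Z = w.T @ X + b                                      # formula (1), all examples at once
    A = sigmoid(Z)                                       # formula (2)
    losses = -Y * np.log(A) - (1 - Y) * np.log(1 - A)    # formula (3), per example
    J = np.sum(losses) / X.shape[1]                      # formula (4), average over m
    return A, float(J)

w = np.zeros((2, 1))
b = 0.0
X = np.array([[1.0, 2.0, -1.0],
              [3.0, 0.5, -2.0]])
Y = np.array([[1.0, 0.0, 1.0]])

A, J = forward_and_cost(w, b, X, Y)
print(A, J)
```

With zero weights and bias every activation is 0.5, so the cost equals log 2 (about 0.693) regardless of the labels, which makes a handy sanity check.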

Main formulas for backward propagation:

For layer $l$, the linear part is: $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$ (followed by an activation).
Suppose you have already calculated the derivative $dZ^{[l]} = \frac{\partial \mathcal{L} }{\partial Z^{[l]}}$. You want to get $(dW^{[l]}, db^{[l]}, dA^{[l-1]})$.
The three outputs $(dW^{[l]}, db^{[l]}, dA^{[l-1]})$ are computed using the input $dZ^{[l]}$. Here are the formulas you need:

$$dW^{[l]} = \frac{\partial \mathcal{L} }{\partial W^{[l]}} = \frac{1}{m} dZ^{[l]} A^{[l-1] T} \tag{1}$$

$$db^{[l]} = \frac{\partial \mathcal{L} }{\partial b^{[l]}} = \frac{1}{m} \sum_{i = 1}^{m} dZ^{[l](i)} \tag{2}$$

$$dA^{[l-1]} = \frac{\partial \mathcal{L} }{\partial A^{[l-1]}} = W^{[l] T} dZ^{[l]} \tag{3}$$
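A sketch of where the formulas for $dW^{[l]}$ and $dA^{[l-1]}$ come from, written element by element. The indices $j$, $i$ and $k$ are introduced here for illustration: $j$ runs over the units of layer $l$, $i$ over the units of layer $l-1$, and $k$ over the $m$ examples. The $\frac{1}{m}$ factor is assumed to come from the averaging in the cost.

$$Z^{[l]}_{jk} = \sum_{i} W^{[l]}_{ji} A^{[l-1]}_{ik} + b^{[l]}_{j}$$

$$dA^{[l-1]}_{ik} = \sum_{j} W^{[l]}_{ji} \, dZ^{[l]}_{jk} = \left( W^{[l] T} dZ^{[l]} \right)_{ik}$$

$$dW^{[l]}_{ji} = \frac{1}{m} \sum_{k=1}^{m} dZ^{[l]}_{jk} A^{[l-1]}_{ik} = \frac{1}{m} \left( dZ^{[l]} A^{[l-1] T} \right)_{ji}$$

Each activation $A^{[l-1]}_{ik}$ belongs to a single example $k$, so its gradient involves no sum over examples and no $\frac{1}{m}$. Each weight is shared by all examples, so its gradient is accumulated over $k$ and then averaged.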

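import numpy as np


def linear_backward(dZ, cache):
    """
    Backward step for the linear part of one layer.
    :param dZ: gradient of the loss with respect to this layer's Z; for a sigmoid output layer L with cross-entropy loss it is usually A - Y
    :param cache: tuple (A_pre, W, b) saved during forward propagation
    :return: dA_pre, dW, db
    """
    A_pre, W, b = cache
    m = A_pre.shape[1]  # number of examples

    dW = np.dot(dZ, A_pre.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    # Chain rule through Z = W A_pre + b: dL/dA_pre = W^T dL/dZ
    # (the "d" prefix is shorthand for "dL/d").
    dA_pre = np.dot(W.T, dZ)  # note: dA and dZ are per-example gradients, so no division by m
    return dA_pre, dW, db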
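One way to gain confidence in linear_backward is a numerical gradient check. The sketch below repeats the function so it runs on its own, and uses a hypothetical quadratic cost, chosen only because its dZ is simply Z - Y; the array sizes and the checked entry are arbitrary:

```python
import numpy as np

def linear_backward(dZ, cache):
    A_pre, W, b = cache
    m = A_pre.shape[1]
    dW = np.dot(dZ, A_pre.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_pre = np.dot(W.T, dZ)
    return dA_pre, dW, db

def cost(W, b, A_pre, Y):
    """Hypothetical test cost J = sum((Z - Y)**2) / (2m), for which dZ = Z - Y."""
    Z = W @ A_pre + b
    return np.sum((Z - Y) ** 2) / (2 * A_pre.shape[1])

rng = np.random.RandomState(0)
A_pre, W, b = rng.randn(3, 5), rng.randn(2, 3), rng.randn(2, 1)
Y = rng.randn(2, 5)

dZ = (W @ A_pre + b) - Y
dA_pre, dW, db = linear_backward(dZ, (A_pre, W, b))

# Central-difference estimate of dJ/dW for one entry of W
eps = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[0, 1] += eps
W_minus[0, 1] -= eps
numeric = (cost(W_plus, b, A_pre, Y) - cost(W_minus, b, A_pre, Y)) / (2 * eps)

print(dW[0, 1], numeric)              # the two values should agree closely
print(dA_pre.shape, dW.shape, db.shape)
```

Each gradient has the same shape as the quantity it belongs to: dA_pre matches A_pre, dW matches W, and db matches b. A shape mismatch is usually the first sign of a missing transpose or a missing keepdims=True.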