Andrew Ng deeplearning.ai Specialization, Course 1, Week 2: Graded Programming Assignment
Neural Networks and Deep Learning Notes

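In keeping with the Coursera honor code, the assignment code will not be posted directly.
What follows is a partial translation of the assignment plus my own summary, with some approaches and notes.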

First assignment (graded): Logistic Regression with a Neural Network mindset

In this section you will learn to:

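Build the general architecture of a learning algorithm, including:
- initializing parameters
- computing the cost function and its gradient
- using an optimization algorithm (gradient descent)

Gather the three functions above into a main model function, in the right order.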

1 - Packages

This section needs the following libraries:

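- numpy: the fundamental package for scientific computing.
- h5py: used to interact with datasets stored in H5 files.
- matplotlib: a common plotting library.
- PIL and scipy: used to test the model on your own pictures.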

    #python
    import numpy as np
    import matplotlib.pyplot as plt
    import h5py
    import scipy
    from PIL import Image
    from scipy import ndimage
    from lr_utils import load_dataset

2 - Overview of the Problem set

Dataset: data.h5, which contains:
- a training set of m_train images
- a test set of m_test images
- images of shape (num_px, num_px, 3), where
  - height = num_px
  - width = num_px
  - channels (RGB) = 3

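Load the dataset with load_dataset(), which returns:

train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes

The "_orig" suffix marks the raw data. We will preprocess these arrays later, and the suffix keeps the original versions distinguishable from the processed ones.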

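Each entry along the first axis of train_set_x_orig and test_set_x_orig is one image.

The dimensions of train_set_x_orig are:
(number of images, image height, image width, 3 (the number of RGB channels))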

For instance, you can access m_train by writing train_set_x_orig.shape[0].
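Example expressions: ~~these are practically the solution code already~~

    train_set_x_orig.shape[0]  # number of training images, i.e. m_train
    train_set_x_orig.shape[1]  # height (= width) of each image in pixels, i.e. num_px
    test_set_x_orig.shape[0]   # number of test images, i.e. m_test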

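For convenience, reshape each image of shape (64, 64, 3) into a column vector of shape (64 × 64 × 3, 1) = (12288, 1). All of an image's pixels then sit in a single column, and each column of the resulting matrix is one image.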

Exercise: reshape the training and test data so that each image is stored as one column.

Tip: to flatten a 4-D array X of shape `(a, b, c, d)` into a 2-D array of shape `(b*c*d, a)`, use:

    X_flatten = X.reshape(X.shape[0], -1).T

Tip: train_set_x_orig.shape[0] is the number of images in the training set.
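
A small sketch to check that the flattening tip puts exactly one image in each column. It assumes a made-up array of 5 "images" of shape (4, 4, 3) instead of the real 64 × 64 × 3 data, filled with np.arange so the values are predictable. It also shows why reshaping straight to `(b*c*d, a)` is not a valid shortcut.

```python
import numpy as np

# Hypothetical stand-in for train_set_x_orig: 5 "images" of shape (4, 4, 3).
# np.arange makes every pixel value unique and predictable.
X = np.arange(5 * 4 * 4 * 3).reshape(5, 4, 4, 3)

# Flatten (a, b, c, d) -> (b*c*d, a): one row per image, then transpose
X_flatten = X.reshape(X.shape[0], -1).T
print(X_flatten.shape)  # (48, 5)

# Column 2 holds exactly the pixels of image 2
print(np.array_equal(X_flatten[:, 2], X[2].ravel()))  # True

# Tempting shortcut: reshape directly to (b*c*d, a). NumPy fills it in
# row-major order, so each column mixes pixels from different images.
X_wrong = X.reshape(-1, X.shape[0])
print(np.array_equal(X_wrong[:, 2], X[2].ravel()))  # False
```

The `reshape(X.shape[0], -1)` step keeps each image's pixels contiguous in one row, so the transpose turns them into a clean column.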

3 - General Architecture of the learning algorithm

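Key steps:

- Initialize the model parameters
- Learn the parameters by minimizing the cost
- Use the learned parameters to make predictions
- Analyse the results and draw conclusions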

4 - Building the parts of our algorithm

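The main steps for building a neural network:

- Define the model structure (for example, the number of input features)
- Initialize the model's parameters
- Loop:
  - compute the current loss (forward propagation)
  - compute the current gradient (backward propagation)
  - update the parameters (gradient descent)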

Exercise 1: compute $\mathrm{sigmoid}(w^T x + b) = \frac{1}{1 + e^{-(w^T x + b)}}$

    s = 1 / (1 + np.exp(-z))  # z = w^T x + b; works element-wise on NumPy arrays
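
A minimal sketch wrapping that line in a function. It shows that the same expression works for a scalar and, element by element, for a NumPy array. The name `sigmoid` is used for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function; z may be a scalar or a NumPy array."""
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))                     # 0.5
print(sigmoid(np.array([0.0, 2.0])))  # approx. [0.5, 0.8808]
```

Because `np.exp` is vectorized, there is no need for a Python loop over the examples.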

Exercise 2: initialize the parameters

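The np.zeros() function

- Usage: `np.zeros(shape, dtype=float, order='C')`
- Returns: a new array of the given shape and type, filled with zeros.
- Parameters:
  - shape: the shape of the array, an int or a tuple of ints such as (dim, 1)
  - dtype: the data type, optional; defaults to numpy.float64
  - order: 'C' (row-major) or 'F' (column-major) memory layout, optional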
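
A possible sketch of zero initialization. It assumes w is a column vector of shape (dim, 1) and b is a scalar. The name initialize_with_zeros follows the function mentioned later in these notes, but the body is my own illustration and not the graded notebook's code.

```python
import numpy as np

def initialize_with_zeros(dim):
    """Sketch: w as a (dim, 1) column of zeros, b as the scalar 0.0."""
    w = np.zeros((dim, 1))  # pass a tuple: shape (dim, 1), not (dim,)
    b = 0.0
    return w, b

w, b = initialize_with_zeros(3)
print(w.shape, w.dtype, b)  # (3, 1) float64 0.0

# Passing an int gives a rank-1 array of shape (dim,). Its .T is still
# shape (dim,), so np.dot(w.T, X) would return shape (m,) instead of (1, m).
print(np.zeros(3).T.shape)  # (3,)
```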

Exercise 3: forward and backward propagation

Hints:

Forward Propagation:

You get X
You compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, ..., a^{(m-1)}, a^{(m)})$
You calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)})\right]$
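
The hints skip one step, so here is a worked version using standard calculus (my addition, not part of the assignment text). For a single example, the derivative of the loss with respect to $z = w^T x + b$ reduces to $a - y$:

$$
\begin{aligned}
\mathcal{L}(a, y) &= -\left[y\log a + (1-y)\log(1-a)\right], \qquad a = \sigma(z),\quad z = w^T x + b \\
\frac{\partial \mathcal{L}}{\partial a} &= -\frac{y}{a} + \frac{1-y}{1-a}, \qquad \frac{da}{dz} = \sigma(z)\left(1-\sigma(z)\right) = a(1-a) \\
\frac{\partial \mathcal{L}}{\partial z} &= \left(-\frac{y}{a} + \frac{1-y}{1-a}\right)a(1-a) = -y(1-a) + (1-y)a = a - y \\
\frac{\partial \mathcal{L}}{\partial w} &= x\,(a - y), \qquad \frac{\partial \mathcal{L}}{\partial b} = a - y
\end{aligned}
$$

Averaging over the $m$ examples, with $x^{(i)}$ as the columns of $X$, gives $\frac{1}{m}\sum_{i=1}^{m} x^{(i)}(a^{(i)}-y^{(i)}) = \frac{1}{m}X(A-Y)^T$ for $w$ and $\frac{1}{m}\sum_{i=1}^{m}(a^{(i)}-y^{(i)})$ for $b$. These are the gradient formulas this exercise uses.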

Here are the two formulas you will be using:

$$\frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T\tag{7}$$

$$\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})\tag{8}$$
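
Below is a minimal vectorized sketch of one forward and backward pass. It assumes w has shape (n_x, 1), b is a scalar, X has shape (n_x, m) with one example per column, and Y has shape (1, m). The name propagate() follows these notes, but the body is my illustration rather than the graded notebook's code. The small w, b, X, Y values are made up for the demo.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def propagate(w, b, X, Y):
    """Sketch of one forward + backward pass for logistic regression.

    w: (n_x, 1) weights, b: scalar bias,
    X: (n_x, m) data, one example per column, Y: (1, m) labels in {0, 1}.
    """
    m = X.shape[1]

    # Forward propagation: activations A and cost J
    A = sigmoid(np.dot(w.T, X) + b)                              # shape (1, m)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

    # Backward propagation: formulas (7) and (8)
    dw = np.dot(X, (A - Y).T) / m                                # shape (n_x, 1)
    db = np.sum(A - Y) / m                                       # scalar

    return {"dw": dw, "db": db}, float(cost)

# Hand-made example values (illustrative only)
w = np.array([[1.0], [2.0]])
b = 2.0
X = np.array([[1.0, 2.0, -1.0],
              [3.0, 4.0, -3.2]])
Y = np.array([[1, 0, 1]])
grads, cost = propagate(w, b, X, Y)
print(grads["dw"].ravel(), grads["db"], cost)
```

With these inputs the sketch gives dw ≈ [0.9985, 2.3951], db ≈ 0.00146 and cost ≈ 5.8015. Checking a small hand-made case like this is a quick way to catch shape or sign mistakes.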

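`np.sum(a, axis=None, dtype=None, out=None, keepdims=np._NoValue)`

- a: the array to sum
- axis: None, an int, or a tuple of ints
  - by default (axis=None), all elements are added together into a single number
  - axis=0 sums down each column, giving one value per column (a row)
  - axis=1 sums across each row, giving one value per row (a column)
- dtype: dtype, optional; the type used for the accumulator and the returned array
- keepdims: if True, the summed axis is kept with length 1, so the result stays 2-D

Note: in NumPy, `*` between two arrays multiplies element by element, while `.dot()` (np.dot) performs matrix multiplication.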
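
A short check of the `axis` behaviour of np.sum and of `*` versus `.dot()`, using small made-up matrices. The `.tolist()` calls only make the printed output easy to compare.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])

print(np.sum(M))                               # 21: all elements
print(np.sum(M, axis=0).tolist())              # [5, 7, 9]: one value per column
print(np.sum(M, axis=1).tolist())              # [6, 15]: one value per row
print(np.sum(M, axis=1, keepdims=True).shape)  # (2, 1): kept as a column

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])
print((A * B).tolist())   # [[5, 12], [21, 32]]: element-wise
print(A.dot(B).tolist())  # [[19, 22], [43, 50]]: matrix product
```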

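Exercise 4: optimization

- Initialize the parameters
- Compute the cost function and its gradient
- Update the parameters with gradient descent

At each iteration, compute the cost and gradients for the current parameters with the propagate() function written earlier. Then update w and b by gradient descent: $w := w - \alpha \frac{\partial J}{\partial w}$ and $b := b - \alpha \frac{\partial J}{\partial b}$, where $\alpha$ is the learning rate.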
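
Below is a minimal sketch of batch gradient descent built on a propagate-style helper, repeated here so the block runs on its own. The name optimize() follows these notes, but the body is illustrative. The `record_every` parameter is a hypothetical addition of mine that controls how often the cost is stored. The data are the same made-up values as before.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def propagate(w, b, X, Y):
    # Forward pass (cost) and backward pass (gradients), vectorized over m examples
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    return {"dw": dw, "db": db}, float(cost)

def optimize(w, b, X, Y, num_iterations, learning_rate, record_every=10):
    """Sketch of batch gradient descent. Returns the learned parameters,
    the last gradients, and the cost recorded every `record_every` steps."""
    costs = []
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        # Gradient descent update: theta := theta - alpha * d(theta)
        w = w - learning_rate * grads["dw"]
        b = b - learning_rate * grads["db"]
        if i % record_every == 0:
            costs.append(cost)
    return {"w": w, "b": b}, grads, costs

w = np.array([[1.0], [2.0]])
b = 2.0
X = np.array([[1.0, 2.0, -1.0],
              [3.0, 4.0, -3.2]])
Y = np.array([[1, 0, 1]])
params, grads, costs = optimize(w, b, X, Y, num_iterations=100, learning_rate=0.009)
print(params["w"].ravel(), params["b"])
print(costs[0], costs[-1])  # the recorded cost should go down
```

Returning the parameters in a dictionary matches how the model section below reads them back with `parameters["w"]` and `parameters["b"]`.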

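Exercise 5: prediction

In Exercise 4 we learned w and b. Now we can use them to predict the labels of a dataset X, such as the test set. First compute $\hat{Y} = A = \sigma(w^T X + b)$, then convert each activation to a label: 1 if it is greater than 0.5, otherwise 0.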
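
A minimal sketch of the thresholding step, assuming the same shapes as above (w: (n_x, 1), X: (n_x, m)). The name predict() follows these notes. The body and the example values are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(w, b, X):
    """Sketch: label an example 1 if its predicted probability exceeds 0.5, else 0."""
    A = sigmoid(np.dot(w.T, X) + b)  # probabilities, shape (1, m)
    return (A > 0.5).astype(float)   # 0/1 labels, shape (1, m)

w = np.array([[1.0], [2.0]])
b = 2.0
X = np.array([[1.0, 2.0, -1.0],
              [3.0, 4.0, -3.2]])
print(predict(w, b, X))  # [[1. 1. 0.]]
```

Comparing the whole array against 0.5 at once avoids a Python loop over the m examples.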

What to remember:
You've implemented several functions that:

- initialize (w, b)
- optimize the loss iteratively to learn (w, b):
  - computing the cost and its gradient
  - updating the parameters using gradient descent
- use the learned (w, b) to predict the labels for a given set of examples

5 - Merge all functions into a model

Combine all the functions into one model:

    # initialize parameters with zeros (≈ 1 line of code)
    # use initialize_with_zeros()

    # Gradient descent (≈ 1 line of code)
    # use optimize()

    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]

    # Predict test/train set examples (≈ 2 lines of code)
    # use predict()
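
Below is one way the pieces could fit together, as a self-contained sketch. It repeats compact versions of the helpers sketched above so that it runs on its own, and it is not the graded notebook's code. Three things are my own assumptions:

- the `accuracy` helper, which computes 100 minus the mean absolute error between 0/1 predictions and labels, times 100
- the default hyperparameters
- the toy one-feature dataset, which stands in for the cat images

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def initialize_with_zeros(dim):
    return np.zeros((dim, 1)), 0.0

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    return {"dw": np.dot(X, (A - Y).T) / m, "db": np.sum(A - Y) / m}, float(cost)

def optimize(w, b, X, Y, num_iterations, learning_rate):
    costs = []
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        w = w - learning_rate * grads["dw"]
        b = b - learning_rate * grads["db"]
        if i % 100 == 0:
            costs.append(cost)
    return {"w": w, "b": b}, grads, costs

def predict(w, b, X):
    return (sigmoid(np.dot(w.T, X) + b) > 0.5).astype(float)

def accuracy(Y_pred, Y):
    # Percentage of correctly labelled examples (hypothetical helper)
    return 100 - np.mean(np.abs(Y_pred - Y)) * 100

def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5):
    # initialize parameters with zeros
    w, b = initialize_with_zeros(X_train.shape[0])
    # gradient descent
    parameters, grads, costs = optimize(w, b, X_train, Y_train,
                                        num_iterations, learning_rate)
    # retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]
    # predict train/test set examples
    Y_pred_train = predict(w, b, X_train)
    Y_pred_test = predict(w, b, X_test)
    return {
        "costs": costs, "w": w, "b": b,
        "train_accuracy": accuracy(Y_pred_train, Y_train),
        "test_accuracy": accuracy(Y_pred_test, Y_test),
    }

# Toy 1-feature data (illustrative only, not the cat dataset)
X_train = np.array([[-2.0, -1.0, 1.0, 2.0]])
Y_train = np.array([[0, 0, 1, 1]])
X_test = np.array([[-3.0, 3.0]])
Y_test = np.array([[0, 1]])
d = model(X_train, Y_train, X_test, Y_test, num_iterations=1000, learning_rate=0.1)
print("train accuracy: {} %".format(d["train_accuracy"]))
print("test accuracy: {} %".format(d["test_accuracy"]))
```

Returning a dictionary keeps the cost history next to the learned parameters. That makes it easy to rerun model() with different learning rates and compare the results.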

6 - Further analysis (optional/ungraded exercise)

Results with different learning rates $\alpha$:

learning rate is: 0.01
train accuracy: 99.52153110047847 %
test accuracy: 68.0 %

-------------------------------------------------------

learning rate is: 0.001
train accuracy: 88.99521531100478 %
test accuracy: 64.0 %

-------------------------------------------------------

learning rate is: 0.0001
train accuracy: 68.42105263157895 %
test accuracy: 36.0 %

-------------------------------------------------------
