01. Neural Networks and Deep Learning, W3: Shallow Neural Networks (Assignment: a Neural Network with One Hidden Layer)


Build your first neural network: one with a single hidden layer.

1. Import Packages

# Package imports
import numpy as np
import matplotlib.pyplot as plt
from testCases import *
import sklearn
import sklearn.datasets
import sklearn.linear_model
from planar_utils import plot_decision_boundary, sigmoid, load_planar_dataset, load_extra_datasets

%matplotlib inline

np.random.seed(1) # set a seed so that the results are consistent

2. Preview the Data

  • Visualize the data
X, Y = load_planar_dataset()
# Visualize the data:
plt.scatter(X[0, :], X[1, :], c=Y, s=40, cmap=plt.cm.Spectral);

The data looks like a flower.
Red points are labeled 0 and blue points are labeled 1; our goal is to build a model that separates them.

  • Data dimensions
### START CODE HERE ### (≈ 3 lines of code)
shape_X = X.shape
shape_Y = Y.shape
m = X.shape[1]  # training set size
### END CODE HERE ###

print ('The shape of X is: ' + str(shape_X))
print ('The shape of Y is: ' + str(shape_Y))
print ('I have m = %d training examples!' % (m))
The shape of X is: (2, 400)
The shape of Y is: (1, 400)
I have m = 400 training examples!

3. Logistic Regression

# Train the logistic regression classifier
clf = sklearn.linear_model.LogisticRegressionCV();
clf.fit(X.T, Y.T);
# Plot the decision boundary for logistic regression
plot_decision_boundary(lambda x: clf.predict(x), X, Y)
plt.title("Logistic Regression")

# Print accuracy
LR_predictions = clf.predict(X.T)
print ('Accuracy of logistic regression: %d ' % float((np.dot(Y,LR_predictions) + np.dot(1-Y,1-LR_predictions))/float(Y.size)*100) +
       '% ' + "(percentage of correctly labelled datapoints)")
Accuracy of logistic regression: 47 % (percentage of correctly labelled datapoints)

[Figure: logistic regression decision boundary on the flower dataset]
The dataset is not linearly separable, so logistic regression performs poorly. Let's see how a neural network does.

4. Neural Network

The model is as follows:

[Figure: neural network with one hidden layer]
For a single example $x^{(i)}$:

$$z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}$$
$$a^{[1](i)} = \tanh(z^{[1](i)})$$
$$z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}$$
$$\hat{y}^{(i)} = a^{[2](i)} = \sigma(z^{[2](i)})$$

$$y^{(i)}_{\text{prediction}} = \begin{cases} 1 & \text{if } a^{[2](i)} > 0.5 \\ 0 & \text{otherwise} \end{cases}$$

After obtaining predictions for all examples, compute the cost:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log\left(a^{[2](i)}\right) + \left(1 - y^{(i)}\right) \log\left(1 - a^{[2](i)}\right) \right)$$
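
The implementation below vectorizes these equations over all $m$ examples by stacking the $x^{(i)}$ as columns of the matrix $X$:

$$Z^{[1]} = W^{[1]} X + b^{[1]}, \quad A^{[1]} = \tanh(Z^{[1]}), \quad Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}, \quad A^{[2]} = \sigma(Z^{[2]})$$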

The general methodology for building a neural network:

  • 1. Define the network structure (number of input units, hidden units, etc.)
  • 2. Initialize the model's parameters
  • 3. Loop:
    —— a. Implement forward propagation
    —— b. Compute the loss
    —— c. Implement backward propagation to get the gradients
    —— d. Update the parameters (gradient descent)

Write helper functions for steps 1–3,
merge them into a single function nn_model(),
learn the parameters, and make predictions on new data.

4.1 Define the Neural Network Structure

  • Define the number of units in each layer
# GRADED FUNCTION: layer_sizes

def layer_sizes(X, Y):
    """
    Arguments:
    X -- input dataset of shape (input size, number of examples)
    Y -- labels of shape (output size, number of examples)
    
    Returns:
    n_x -- the size of the input layer
    n_h -- the size of the hidden layer
    n_y -- the size of the output layer
    """
    ### START CODE HERE ### (≈ 3 lines of code)
    n_x = X.shape[0] # size of input layer
    n_h = 4
    n_y = Y.shape[0] # size of output layer
    ### END CODE HERE ###
    return (n_x, n_h, n_y)
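
A quick sanity check on the planar dataset loaded above (the expected output follows from the shapes printed earlier):

print(layer_sizes(X, Y))   # with X of shape (2, 400) and Y of shape (1, 400), this prints (2, 4, 1)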

4.2 Initialize the Model's Parameters

  • Randomly initialize the weights W (to break symmetry: if all weights started at zero, every hidden unit would compute the same function); initialize the biases b to 0
# GRADED FUNCTION: initialize_parameters

def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer
    
    Returns:
    params -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """
    
    np.random.seed(2) # we set up a seed so that your output matches ours although the initialization is random.
    
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = np.random.randn(n_h, n_x)*0.01 # randn samples from the standard normal distribution
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h)*0.01
    b2 = np.zeros((n_y, 1))
    ### END CODE HERE ###
    
    assert (W1.shape == (n_h, n_x))
    assert (b1.shape == (n_h, 1))
    assert (W2.shape == (n_y, n_h))
    assert (b2.shape == (n_y, 1))
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

4.3 The Loop

4.3.1 Forward Propagation

  • Implement the formulas above in code
# GRADED FUNCTION: forward_propagation

def forward_propagation(X, parameters):
    """
    Argument:
    X -- input data of size (n_x, m)
    parameters -- python dictionary containing your parameters 
    (output of initialization function)
    
    Returns:
    A2 -- The sigmoid output of the second activation
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    """
    # Retrieve each parameter from the dictionary "parameters"
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    ### END CODE HERE ###
    
    # Implement Forward Propagation to calculate A2 (probabilities)
    ### START CODE HERE ### (≈ 4 lines of code)
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    ### END CODE HERE ###
    
    assert(A2.shape == (1, X.shape[1]))
    
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    
    return A2, cache

4.3.2 Compute the Cost

  • Having computed A2 (the prediction for each example), compute the cost:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log\left(a^{[2](i)}\right) + \left(1 - y^{(i)}\right) \log\left(1 - a^{[2](i)}\right) \right)$$
# GRADED FUNCTION: compute_cost

def compute_cost(A2, Y, parameters):
    """
    Computes the cross-entropy cost given in equation (13)
    
    Arguments:
    A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    parameters -- python dictionary containing your parameters W1, b1, W2 and b2
    
    Returns:
    cost -- cross-entropy cost given equation (13)
    """
    
    m = Y.shape[1] # number of examples

    # Compute the cross-entropy cost
    ### START CODE HERE ### (≈ 2 lines of code)
    logprobs = Y*np.log(A2)+(1-Y)*np.log(1-A2)
    cost = -np.sum(logprobs)/m
    ### END CODE HERE ###
    
    cost = np.squeeze(cost)     # makes sure cost is the dimension we expect. 
                                # E.g., turns [[17]] into 17 
    assert(isinstance(cost, float))
    
    return cost
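
If A2 ever reaches exactly 0 or 1 (this happens later in the ReLU experiment), np.log emits divide-by-zero warnings. A hedged variant of the two graded lines above guards against this by clipping; the eps value is an illustrative choice, not part of the original assignment:

eps = 1e-12                                    # illustrative small constant
A2_safe = np.clip(A2, eps, 1 - eps)            # keep probabilities away from exactly 0 and 1
logprobs = Y * np.log(A2_safe) + (1 - Y) * np.log(1 - A2_safe)
cost = -np.sum(logprobs) / m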

4.3.3 Backward Propagation

The backpropagation formulas, matching the six lines of code below ($*$ denotes element-wise multiplication):

$$dZ^{[2]} = A^{[2]} - Y$$
$$dW^{[2]} = \frac{1}{m} dZ^{[2]} A^{[1]T}$$
$$db^{[2]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[2](i)}$$
$$dZ^{[1]} = W^{[2]T} dZ^{[2]} * \left(1 - \left(A^{[1]}\right)^{2}\right)$$
$$dW^{[1]} = \frac{1}{m} dZ^{[1]} X^{T}$$
$$db^{[1]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[1](i)}$$
For the derivatives of the activation functions:

  • sigmoid
    [Figure: sigmoid curve]
    $$a = g(z); \quad g'(z) = \frac{d}{dz} g(z) = a(1-a)$$
  • tanh
    [Figure: tanh curve]
    $$a = g(z); \quad g'(z) = \frac{d}{dz} g(z) = 1 - a^{2}$$

Deriving the loss gradient under the sigmoid output:
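
Combining the cross-entropy loss with the sigmoid derivative $a(1-a)$ gives the simple first formula above:

$$\frac{\partial J^{(i)}}{\partial a^{[2](i)}} = -\frac{y^{(i)}}{a^{[2](i)}} + \frac{1-y^{(i)}}{1-a^{[2](i)}}, \qquad dz^{[2](i)} = \frac{\partial J^{(i)}}{\partial a^{[2](i)}} \cdot a^{[2](i)} \left(1-a^{[2](i)}\right) = a^{[2](i)} - y^{(i)}$$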

# GRADED FUNCTION: backward_propagation

def backward_propagation(parameters, cache, X, Y):
    """
    Implement the backward propagation using the instructions above.
    
    Arguments:
    parameters -- python dictionary containing our parameters 
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2".
    X -- input data of shape (2, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    
    Returns:
    grads -- python dictionary containing your gradients with respect to different parameters
    """
    m = X.shape[1]
    
    # First, retrieve W1 and W2 from the dictionary "parameters".
    ### START CODE HERE ### (≈ 2 lines of code)
    W1 = parameters['W1']
    W2 = parameters['W2']
    ### END CODE HERE ###
        
    # Retrieve also A1 and A2 from dictionary "cache".
    ### START CODE HERE ### (≈ 2 lines of code)
    A1 = cache['A1']
    A2 = cache['A2']
    ### END CODE HERE ###
    
    # Backward propagation: calculate dW1, db1, dW2, db2. 
    ### START CODE HERE ### (≈ 6 lines of code, corresponding to 6 equations on slide above)
    dZ2 = A2-Y
    dW2 = np.dot(dZ2, A1.T)/m
    db2 = np.sum(dZ2, axis=1, keepdims=True)/m
    dZ1 = np.dot(W2.T, dZ2)*(1-np.power(A1, 2))
    dW1 = np.dot(dZ1, X.T)/m
    db1 = np.sum(dZ1, axis=1, keepdims=True)/m
    ### END CODE HERE ###
    
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    
    return grads
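
A small sanity check (not part of the graded function): every gradient should have the same shape as the parameter it updates. This assumes parameters and cache come from the initialization and forward pass above.

grads = backward_propagation(parameters, cache, X, Y)
for name, grad in grads.items():
    assert grad.shape == parameters[name[1:]].shape   # "dW1" -> "W1", etc.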

4.3.4 Gradient Descent

  • Choose an appropriate learning rate: if it is too large, the loss oscillates, converges slowly, or even diverges. The update rule is given below.
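
$$\theta := \theta - \alpha \frac{\partial J}{\partial \theta}$$

where $\alpha$ is the learning rate and $\theta$ stands for any of $W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}$.
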
# GRADED FUNCTION: update_parameters

def update_parameters(parameters, grads, learning_rate = 1.2):
    """
    Updates parameters using the gradient descent update rule given above
    
    Arguments:
    parameters -- python dictionary containing your parameters 
    grads -- python dictionary containing your gradients 
    
    Returns:
    parameters -- python dictionary containing your updated parameters 
    """
    # Retrieve each parameter from the dictionary "parameters"
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    ### END CODE HERE ###
    
    # Retrieve each gradient from the dictionary "grads"
    ### START CODE HERE ### (≈ 4 lines of code)
    dW1 = grads['dW1']
    db1 = grads['db1']
    dW2 = grads['dW2']
    db2 = grads['db2']
    ### END CODE HERE ###
    
    # Update rule for each parameter
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2
    ### END CODE HERE ###
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

4.4 Assemble the Model

  • Put the functions above into the model in the correct order
# GRADED FUNCTION: nn_model

def nn_model(X, Y, n_h, num_iterations = 10000, print_cost=False):
    """
    Arguments:
    X -- dataset of shape (2, number of examples)
    Y -- labels of shape (1, number of examples)
    n_h -- size of the hidden layer
    num_iterations -- Number of iterations in gradient descent loop
    print_cost -- if True, print the cost every 1000 iterations
    
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    
    np.random.seed(3)
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]
    
    # Initialize parameters, then retrieve W1, b1, W2, b2. Inputs: "n_x, n_h, n_y". 
    # Outputs = "W1, b1, W2, b2, parameters".
    ### START CODE HERE ### (≈ 5 lines of code)
    parameters = initialize_parameters(n_x, n_h, n_y)
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    ### END CODE HERE ###
    
    # Loop (gradient descent)

    for i in range(0, num_iterations):
         
        ### START CODE HERE ### (≈ 4 lines of code)
        # Forward propagation. Inputs: "X, parameters". Outputs: "A2, cache".
        A2, cache = forward_propagation(X, parameters)
        
        # Cost function. Inputs: "A2, Y, parameters". Outputs: "cost".
        cost = compute_cost(A2, Y, parameters)
 
        # Backpropagation. Inputs: "parameters, cache, X, Y". Outputs: "grads".
        grads = backward_propagation(parameters, cache, X, Y)
 
        # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters".
        parameters = update_parameters(parameters, grads, learning_rate=1.2)
        
        ### END CODE HERE ###
        
        # Print the cost every 1000 iterations
        if print_cost and i % 1000 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))

    return parameters

4.5 Predict

$$predictions = \begin{cases} 1 & \text{if } activation > 0.5 \\ 0 & \text{otherwise} \end{cases}$$

# GRADED FUNCTION: predict

def predict(parameters, X):
    """
    Using the learned parameters, predicts a class for each example in X
    
    Arguments:
    parameters -- python dictionary containing your parameters 
    X -- input data of size (n_x, m)
    
    Returns
    predictions -- vector of predictions of our model (red: 0 / blue: 1)
    """
    
    # Computes probabilities using forward propagation, and classifies to 0/1 using 0.5 as the threshold.
    ### START CODE HERE ### (≈ 2 lines of code)
    A2, cache = forward_propagation(X, parameters)
    predictions = (A2 > 0.5)
    ### END CODE HERE ###
    
    return predictions
  • Build a neural network model with one hidden layer of 4 units
# Build a model with a n_h-dimensional hidden layer
parameters = nn_model(X, Y, n_h = 4, num_iterations = 10000, print_cost=True)

# Plot the decision boundary
plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
plt.title("Decision Boundary for hidden layer size " + str(4))
Cost after iteration 0: 0.693048
Cost after iteration 1000: 0.288083
Cost after iteration 2000: 0.254385
Cost after iteration 3000: 0.233864
Cost after iteration 4000: 0.226792
Cost after iteration 5000: 0.222644
Cost after iteration 6000: 0.219731
Cost after iteration 7000: 0.217504
Cost after iteration 8000: 0.219550
Cost after iteration 9000: 0.218633

[Figure: decision boundary for hidden layer size 4]

# Print accuracy
predictions = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y,predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100) + '%')
Accuracy: 90%

The model separates the two classes quite well: 90% accuracy, far better than logistic regression's 47%.

4.6 Tuning the Number of Hidden Units

plt.figure(figsize=(16, 32))
hidden_layer_sizes = [1, 2, 3, 4, 5, 20, 50]
for i, n_h in enumerate(hidden_layer_sizes):
    plt.subplot(5, 2, i+1)
    plt.title('Hidden Layer of size %d' % n_h)
    parameters = nn_model(X, Y, n_h, num_iterations = 5000)
    plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
    predictions = predict(parameters, X)
    accuracy = float((np.dot(Y,predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100)
    print ("Accuracy for {} hidden units: {} %".format(n_h, accuracy))
Accuracy for 1 hidden units: 67.5 %
Accuracy for 2 hidden units: 67.25 %
Accuracy for 3 hidden units: 90.75 %
Accuracy for 4 hidden units: 90.5 %
Accuracy for 5 hidden units: 91.25 %
Accuracy for 20 hidden units: 90.5 %
Accuracy for 50 hidden units: 90.75 %

Comparing classification performance across hidden-layer sizes, we can see:

  • Larger models (with more hidden units) fit the training set better, until the largest models eventually overfit the data.
  • The best hidden-layer size appears to be around n_h = 5: it fits the data well without noticeably overfitting.
  • Later you will learn about regularization, which lets you use very large models (such as n_h = 50) without much overfitting.

4.7 Changing the Activation Function

  • Changing the hidden layer's activation to sigmoid gives lower accuracy than tanh; for hidden units, tanh almost always works better than sigmoid, since its output is zero-centered. (The required changes are sketched below.)
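
A minimal sketch of the change, assuming forward_propagation and backward_propagation are edited in place (the tanh derivative $1-a^2$ is replaced by the sigmoid derivative $a(1-a)$):

# in forward_propagation:
A1 = sigmoid(Z1)                              # was: A1 = np.tanh(Z1)

# in backward_propagation:
dZ1 = np.dot(W2.T, dZ2) * (A1 * (1 - A1))     # was: * (1 - np.power(A1, 2))

The reported accuracies with sigmoid: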
Accuracy for 1 hidden units: 50.5 %
Accuracy for 2 hidden units: 59.0 %
Accuracy for 3 hidden units: 56.75 %
Accuracy for 4 hidden units: 50.0 %
Accuracy for 5 hidden units: 62.25000000000001 %
Accuracy for 20 hidden units: 85.5 %
Accuracy for 50 hidden units: 87.0 %

[Figure: decision boundaries with sigmoid hidden activation]

  • Changing the hidden layer's activation to ReLU seems not to help at all here; it feels like more hidden layers would be needed to get good results. (A matching backward pass is sketched after the helper below.)
def relu(X):
    return np.maximum(0, X)
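
With ReLU, the backward pass must also change: the derivative is 1 where Z1 > 0 and 0 elsewhere. A hedged sketch follows; note that if the tanh-derivative line (1 - np.power(A1, 2)) is kept while A1 is an unbounded ReLU output, gradients can overflow, which would be consistent with the warnings below.

# in forward_propagation:
A1 = relu(Z1)                        # was: A1 = np.tanh(Z1)

# in backward_propagation (Z1 retrieved from the cache):
Z1 = cache['Z1']
dZ1 = np.dot(W2.T, dZ2) * (Z1 > 0)   # ReLU derivative: 1 if Z1 > 0 else 0

The rerun (apparently without such changes, judging by the warnings) gives: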
Accuracy for 1 hidden units: 50.0 %
Accuracy for 2 hidden units: 50.0 %
Accuracy for 3 hidden units: 50.0 %
Accuracy for 4 hidden units: 50.0 %
Accuracy for 5 hidden units: 50.0 %
Accuracy for 20 hidden units: 50.0 %
Accuracy for 50 hidden units: 50.0 %

Some runtime warnings were raised:

C:\Users\mingm\AppData\Roaming\Python\Python37\site-packages\ipykernel_launcher.py:20: RuntimeWarning: divide by zero encountered in log
C:\Users\mingm\AppData\Roaming\Python\Python37\site-packages\ipykernel_launcher.py:20: RuntimeWarning: invalid value encountered in multiply
C:\Users\mingm\AppData\Roaming\Python\Python37\site-packages\ipykernel_launcher.py:35: RuntimeWarning: overflow encountered in power
C:\Users\mingm\AppData\Roaming\Python\Python37\site-packages\ipykernel_launcher.py:35: RuntimeWarning: invalid value encountered in multiply
C:\Users\mingm\AppData\Roaming\Python\Python37\site-packages\ipykernel_launcher.py:35: RuntimeWarning: overflow encountered in multiply

[Figure: decision boundaries with ReLU hidden activation]

4.8 Changing the Learning Rate

  • Using the tanh activation, vary the learning rate and check the effect (the sweep is sketched below)
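
Since nn_model above hardcodes learning_rate=1.2, a minimal sketch of the sweep assumed to produce the tables below is to thread a learning_rate argument through nn_model to update_parameters:

# hypothetical sweep; assumes nn_model is modified to accept learning_rate
# and to call update_parameters(parameters, grads, learning_rate=learning_rate)
for lr in [2.0, 1.5, 1.2, 1.0, 0.5, 0.1]:
    print("Learning rate: " + str(lr))
    for n_h in [1, 2, 3, 4, 5, 20, 50]:
        parameters = nn_model(X, Y, n_h, num_iterations=5000, learning_rate=lr)
        predictions = predict(parameters, X)
        accuracy = float((np.dot(Y, predictions.T) + np.dot(1-Y, 1-predictions.T)) / float(Y.size) * 100)
        print("Accuracy for {} hidden units: {} %".format(n_h, accuracy))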

Learning rate 2.0

Accuracy for 1 hidden units: 67.5 %
Accuracy for 2 hidden units: 67.25 %
Accuracy for 3 hidden units: 90.75 %
Accuracy for 4 hidden units: 90.75 %
Accuracy for 5 hidden units: 90.25 %
Accuracy for 20 hidden units: 91.0 %
Accuracy for 50 hidden units: 91.25 %  best

Learning rate 1.5

Accuracy for 1 hidden units: 67.5 %
Accuracy for 2 hidden units: 67.25 %
Accuracy for 3 hidden units: 90.75 %
Accuracy for 4 hidden units: 89.75 %
Accuracy for 5 hidden units: 90.5 %
Accuracy for 20 hidden units: 91.0 %  best
Accuracy for 50 hidden units: 90.75 %

Learning rate 1.2

Accuracy for 1 hidden units: 67.5 %
Accuracy for 2 hidden units: 67.25 %
Accuracy for 3 hidden units: 90.75 %
Accuracy for 4 hidden units: 90.5 %
Accuracy for 5 hidden units: 91.25 % best
Accuracy for 20 hidden units: 90.5 %
Accuracy for 50 hidden units: 90.75 %

Learning rate 1.0

Accuracy for 1 hidden units: 67.25 %
Accuracy for 2 hidden units: 67.0 %
Accuracy for 3 hidden units: 90.75 %
Accuracy for 4 hidden units: 90.5 %
Accuracy for 5 hidden units: 91.0 %  best
Accuracy for 20 hidden units: 91.0 %  best
Accuracy for 50 hidden units: 90.75 %

Learning rate 0.5

Accuracy for 1 hidden units: 67.25 %
Accuracy for 2 hidden units: 66.5 %
Accuracy for 3 hidden units: 89.25 %
Accuracy for 4 hidden units: 90.0 %
Accuracy for 5 hidden units: 89.75 %
Accuracy for 20 hidden units: 90.0 % best
Accuracy for 50 hidden units: 89.75 %

Learning rate 0.1

Accuracy for 1 hidden units: 67.0 %
Accuracy for 2 hidden units: 64.75 %
Accuracy for 3 hidden units: 88.25 %
Accuracy for 4 hidden units: 88.0 %
Accuracy for 5 hidden units: 88.5 %
Accuracy for 20 hidden units: 88.75 %  best
Accuracy for 50 hidden units: 88.75 %  best

Rough patterns:

  • If the learning rate is too small, training is insufficient within the iteration budget, so accuracy stays lower.
  • The larger the learning rate, the more hidden units seem to be needed to reach the best accuracy? (corrections welcome)

4.9 Performance on Other Datasets

All runs use the tanh activation and a learning rate of 1.2.
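
A sketch of the loading pattern, assuming load_extra_datasets (imported from planar_utils above) returns the extra datasets in this order:

# assumed loading pattern for the extra datasets
noisy_circles, noisy_moons, blobs, gaussian_quantiles, no_structure = load_extra_datasets()
datasets = {"noisy_circles": noisy_circles,
            "noisy_moons": noisy_moons,
            "blobs": blobs,
            "gaussian_quantiles": gaussian_quantiles}
dataset = "noisy_circles"
X, Y = datasets[dataset]
X, Y = X.T, Y.reshape(1, Y.shape[0])   # reshape to (2, m) and (1, m)
if dataset == "blobs":
    Y = Y % 2                          # make the blobs labels binary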

  • dataset = "noisy_circles"
    在这里插入图片描述
Accuracy for 1 hidden units: 62.5 %
Accuracy for 2 hidden units: 72.5 %
Accuracy for 3 hidden units: 84.0 % best
Accuracy for 4 hidden units: 83.0 %
Accuracy for 5 hidden units: 83.5 %
Accuracy for 20 hidden units: 79.5 %
Accuracy for 50 hidden units: 83.5 %

[Figure: decision boundaries on noisy_circles]

  • dataset = "noisy_moons"
    在这里插入图片描述
Accuracy for 1 hidden units: 86.0 %
Accuracy for 2 hidden units: 88.0 %
Accuracy for 3 hidden units: 97.0 % best
Accuracy for 4 hidden units: 96.5 %
Accuracy for 5 hidden units: 96.0 %
Accuracy for 20 hidden units: 86.0 %
Accuracy for 50 hidden units: 86.0 %

[Figure: decision boundaries on noisy_moons]

  • dataset = "blobs"
    在这里插入图片描述
Accuracy for 1 hidden units: 67.0 %
Accuracy for 2 hidden units: 67.0 %
Accuracy for 3 hidden units: 83.0 %
Accuracy for 4 hidden units: 83.0 %
Accuracy for 5 hidden units: 83.0 %
Accuracy for 20 hidden units: 86.0 % best
Accuracy for 50 hidden units: 83.5 %

[Figure: decision boundaries on blobs]

  • dataset = "gaussian_quantiles"
    在这里插入图片描述
Accuracy for 1 hidden units: 65.0 %
Accuracy for 2 hidden units: 79.5 %
Accuracy for 3 hidden units: 97.0 %
Accuracy for 4 hidden units: 97.0 %
Accuracy for 5 hidden units: 100.0 % best
Accuracy for 20 hidden units: 97.5 %
Accuracy for 50 hidden units: 96.0 %

[Figure: decision boundaries on gaussian_quantiles]
Performance varies from dataset to dataset.


My CSDN blog: https://michael.blog.csdn.net/


Source: blog.csdn.net/qq_21201267/article/details/108276357