Deep Learning | Andrew Ng's Deep Learning Course 1, Week 3

1 Quiz

1.1 Question 1

Which of the following are true? (Check all that apply.) Note: only the correct options are listed here.

X is a matrix in which each column is one training example.

a^[2]_4 is the activation output by the 4th neuron of the 2nd layer.

a^[2](12) denotes the activation vector of the 2nd layer for the 12th training example.

a^[2] denotes the activation vector of the 2nd layer.

All of them are true!

1.2 Question 2

The tanh activation usually works better than sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data better for the next layer.

True/False?

True!
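As a quick numerical check (a minimal sketch, assuming standard-normal pre-activations z rather than real network data):

import numpy as np
np.random.seed(0)
z = np.random.randn(10000)
print(np.tanh(z).mean())              # roughly 0
print((1 / (1 + np.exp(-z))).mean())  # roughly 0.5

The tanh outputs average out near zero, while the sigmoid outputs cluster around 0.5, which is exactly the "centering" advantage the question refers to.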

1.3 Question 3

[question image not reproduced]

1.4 Question 4

You are building a binary classifier for recognizing cucumbers (y=1) vs. watermelons (y=0). Which one of these activation functions would you recommend using for the output layer?

[ ] ReLU

[ ] Leaky ReLU

[ ] sigmoid

[ ] tanh

Answer: choose sigmoid. This is a binary classification problem, and sigmoid outputs values in (0, 1), so 0.5 can conveniently be used as the decision threshold.
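A tiny sketch of why this is convenient (the output-layer scores z below are made up):

import numpy as np
z = np.array([-2.0, -0.1, 0.3, 4.0])   # hypothetical output-layer scores
probs = 1 / (1 + np.exp(-z))           # all values fall in (0, 1)
labels = (probs > 0.5).astype(int)     # threshold at 0.5: watermelon (0) vs. cucumber (1)
print(probs, labels)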

1.5 Question 5

Consider the following code:
A = np.random.randn(4,3)
B = np.sum(A, axis = 1, keepdims = True)

What will be B.shape?

A = np.random.randn(4,3)
B = np.sum(A, axis = 1, keepdims = True)
print(B.shape)
B
(4, 1)

array([[ 1.48233428],
       [ 0.68569827],
       [ 0.25963837],
       [-0.08522427]])
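For comparison, a small sketch of what happens without keepdims: the summed axis is dropped entirely.

import numpy as np
A = np.random.randn(4, 3)
print(np.sum(A, axis=1).shape)                 # (4,)  -- 1-D, the axis is removed
print(np.sum(A, axis=1, keepdims=True).shape)  # (4, 1) -- the axis is kept with length 1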

1.6 Question 6

Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements are True? (Check all that apply)

  1. Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent each neuron in the layer will be computing the same thing as other neurons.

  2. Each neuron in the first hidden layer will perform the same computation in the first iteration. But after one iteration of gradient descent they will learn to compute different things because we have “broken symmetry”.

  3. Each neuron in the first hidden layer will compute the same thing, but neurons in different layers will compute different things, thus we have accomplished “symmetry breaking” as described in lecture.

  4. The first hidden layer’s neurons will perform different computations from each other even in the first iteration; their parameters will thus keep evolving in their own way.

The answer is statement 1.

1.7 Question 7

Logistic regression’s weights w should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to “break symmetry”, True/False?

False! Logistic regression has no hidden layer, so there is no symmetry to break: even with zero initialization, the gradient of each weight depends on its own input feature, so the weights immediately become different. Initializing to zero is therefore fine.
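A minimal sketch (with made-up data) of why zero initialization is harmless here: with w = 0 every prediction is 0.5, but the gradient dw = X(a - y)ᵀ/m still depends on the inputs.

import numpy as np
np.random.seed(0)
X = np.random.randn(2, 5)                      # 2 features, 5 examples (hypothetical)
y = np.array([[0, 1, 0, 1, 1]])
w, b = np.zeros((2, 1)), 0.0
a = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))    # every prediction equals 0.5
dw = np.dot(X, (a - y).T) / X.shape[1]
print(dw)                                      # generally nonzero and different for each weight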

1.8 Question 8

You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(…,…)*1000. What will happen?

[ ] It doesn’t matter. So long as you initialize the weights randomly gradient descent is not affected by whether the weights are large or small.

[ ] This will cause the inputs of the tanh to also be very large, thus causing gradients to also become large. You therefore have to set α to be very small to prevent divergence; this will slow down learning.

[ ] This will cause the inputs of the tanh to also be very large, causing the units to be “highly activated” and thus speed up learning compared to if the weights had to start from small values.

[✅] This will cause the inputs of the tanh to also be very large, thus causing gradients to be close to zero. The optimization algorithm will thus become slow.
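A quick numeric illustration (the z values are chosen arbitrarily): the local gradient of tanh is 1 - tanh(z)², which collapses toward zero as |z| grows, so huge initial weights stall gradient descent.

import numpy as np
for z in [0.5, 5.0, 50.0]:
    print(z, 1 - np.tanh(z) ** 2)   # ~0.79, ~1.8e-04, ~0.0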

1.9 Question 9

[question image not reproduced]

1.10 Question 10

[question image not reproduced]

2 Programming Exercise

2.1 Task

Build a neural network with a single hidden layer.

import numpy as np
import matplotlib.pyplot as plt
from testCases import *
import sklearn
import sklearn.datasets
import sklearn.linear_model
from planar_utils import plot_decision_boundary, sigmoid, load_planar_dataset, load_extra_datasets

#%matplotlib inline # uncomment this line if you are using a Jupyter Notebook

np.random.seed(1) # fix a random seed so that the results in the following steps are reproducible

2.2 Load and Inspect the Dataset

X, Y = load_planar_dataset()
shape_X = X.shape
shape_Y = Y.shape
m = Y.shape[1]  # number of examples in the training set

print ("The shape of X is: " + str(shape_X))
print ("The shape of Y is: " + str(shape_Y))
print ("The dataset contains " + str(m) + " examples")
The shape of X is: (2, 400)
The shape of Y is: (1, 400)
The dataset contains 400 examples
plt.scatter(X[0, :], X[1, :], c=np.squeeze(Y), s=40, cmap=plt.cm.Spectral) # draw a scatter plot of the data
<matplotlib.collections.PathCollection at 0x111cb7668>

[scatter plot of the dataset; the two classes are clearly not linearly separable]

2.3 Classification with Logistic Regression

print(X.T.shape)
print(Y.T.shape)
(400, 2)
(400, 1)
import warnings
warnings.filterwarnings('ignore')
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
lr = LogisticRegression()
# skip a separate train/test split and use 10-fold cross-validation directly
score_all = cross_val_score(lr, X.T, Y.T, cv=10, scoring='roc_auc')
print(score_all)
print(score_all.mean())
[0.6525 1.     0.5825 0.0125 0.0675 0.145  0.     0.5    0.9575 0.7425]
0.466

Conclusions:

  • The average AUC from 10-fold cross-validation is only 0.466, so logistic regression does not perform well here.
  • Why not? The raw data plotted above show that this is not a linearly separable problem, so a linear model such as logistic regression struggles.
  • Let's build a neural network instead!

2.4 Building the Neural Network

General steps:

  1. Define the network structure: how many hidden layers, how many units per hidden layer, which activation functions to use for the hidden and output layers, and so on.
  2. Initialize the model parameters.
  3. Loop:
    • Forward propagation
    • Compute the cost
    • Backward propagation
    • Update the parameters

2.4.1 Define the Network Structure

def layer_sizes(X , Y):
    """
    Arguments:
     X - input dataset of shape (number of input features, number of training/test examples)
     Y - labels of shape (number of outputs, number of training/test examples)

    Returns:
     n_x - size of the input layer
     n_h - size of the hidden layer
     n_y - size of the output layer
    """
    n_x = X.shape[0] # input layer: 2 features (x1, x2)
    n_h = 4 # hidden layer, hard-coded to 4 units
    n_y = Y.shape[0] # output layer: 1 unit

    return (n_x,n_h,n_y)
layer_sizes_test_case() # helper from testCases; returns standard-normal sample data
(array([[ 1.62434536, -0.61175641, -0.52817175],
        [-1.07296862,  0.86540763, -2.3015387 ],
        [ 1.74481176, -0.7612069 ,  0.3190391 ],
        [-0.24937038,  1.46210794, -2.06014071],
        [-0.3224172 , -0.38405435,  1.13376944]]),
 array([[-1.09989127, -0.17242821, -0.87785842],
        [ 0.04221375,  0.58281521, -1.10061918]]))
#test layer_sizes
print("=========================testing layer_sizes=========================")
X_asses , Y_asses = layer_sizes_test_case()
(n_x,n_h,n_y) =  layer_sizes(X_asses,Y_asses)
print("输入层的节点数量为: n_x = " + str(n_x))
print("隐藏层的节点数量为: n_h = " + str(n_h))
print("输出层的节点数量为: n_y = " + str(n_y))
=========================testing layer_sizes=========================
Number of input-layer nodes: n_x = 5
Number of hidden-layer nodes: n_h = 4
Number of output-layer nodes: n_y = 2

2.4.2 Initialize the Parameters

  • Two sets of parameters have to be initialized: the weights W and the biases b.
  • What shapes do the W's have? W1: 4×2, W2: 1×4. They must not be initialized to zero.
  • What shapes do the b's have? b1: 4×1, b2: 1×1. They may be initialized to zero.
np.random.randn(4, 2) * 0.01
# np.zeros((4,1))
array([[-0.00636996,  0.00190915],
       [ 0.02100255,  0.00120159],
       [ 0.00617203,  0.0030017 ],
       [-0.0035225 , -0.01142518]])
def initialize_parameters(n_x,n_h,n_y):
    '''
    Arguments:
        n_x - number of input-layer nodes
        n_h - number of hidden-layer nodes
        n_y - number of output-layer nodes

    Returns:
        parameters - dictionary containing:
            W1 - weight matrix of shape (n_h, n_x)
            b1 - bias vector of shape (n_h, 1)
            W2 - weight matrix of shape (n_y, n_h)
            b2 - bias vector of shape (n_y, 1)

    '''
    np.random.seed(2) # fix a random seed so that your output matches ours
    W1 = np.random.randn(n_h, n_x) * 0.01
    W2 = np.random.randn(n_y, n_h) * 0.01
    b1 = np.zeros((n_h,1))
    b2 = np.zeros((n_y,1))
    
    # use assertions to make sure the shapes are correct
    assert(W1.shape == ( n_h , n_x ))
    assert(b1.shape == ( n_h , 1 ))
    assert(W2.shape == ( n_y , n_h ))
    assert(b2.shape == ( n_y , 1 ))

    parameters = {'W1': W1,
                 'b1': b1,
                 'W2': W2,
                 'b2': b2}
    return parameters
#test initialize_parameters
print("=========================testing initialize_parameters=========================")
X_asses , Y_asses = layer_sizes_test_case()
(n_x,n_h,n_y) =  layer_sizes(X_asses,Y_asses)
print("输入层的节点数量为: n_x = " + str(n_x))
print("隐藏层的节点数量为: n_h = " + str(n_h))
print("输出层的节点数量为: n_y = " + str(n_y))
print('---' * 20)
parameters = initialize_parameters(n_x,n_h,n_y)
print("W1权重为: ", parameters['W1'])
print("b1权重为: ", parameters['b1'])
print("W2权重为: ", parameters['W2'])
print("b2权重为: ", parameters['b2'])
=========================testing initialize_parameters=========================
Number of input-layer nodes: n_x = 5
Number of hidden-layer nodes: n_h = 4
Number of output-layer nodes: n_y = 2
------------------------------------------------------------
W1:  [[-4.16757847e-03 -5.62668272e-04 -2.13619610e-02  1.64027081e-02
  -1.79343559e-02]
 [-8.41747366e-03  5.02881417e-03 -1.24528809e-02 -1.05795222e-02
  -9.09007615e-03]
 [ 5.51454045e-03  2.29220801e-02  4.15393930e-04 -1.11792545e-02
   5.39058321e-03]
 [-5.96159700e-03 -1.91304965e-04  1.17500122e-02 -7.47870949e-03
   9.02525097e-05]]
b1:  [[0.]
 [0.]
 [0.]
 [0.]]
W2:  [[-0.00878108 -0.00156434  0.0025657  -0.00988779]
 [-0.00338822 -0.00236184 -0.00637655 -0.01187612]]
b2:  [[0.]
 [0.]]

2.4.3 The Training Loop

2.4.3.1 Define the Activation Functions for the Hidden and Output Layers

def ReLU(z):
    return max(0, z)  # note: this scalar version works element by element, not on whole arrays

def sigmoid(z):
    return 1/(1+np.exp(-z))
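An optional aside (a sketch, not required for this exercise): for very negative z, np.exp(-z) can overflow and emit a RuntimeWarning. A numerically safer variant keeps every exponent non-positive:

def stable_sigmoid(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))                       # never overflows
    return np.where(z >= 0, 1 / (1 + e), e / (1 + e))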

2.4.3.2 Forward Propagation

  • Goal: compute the model's predictions after each parameter update.
  • The values needed for backpropagation are stored in "cache", which is passed as input to the backward-propagation function. The equations of the forward pass are listed below.
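For reference, the forward pass implemented below computes, for all m examples at once:

Z1 = np.dot(W1, X) + b1
A1 = tanh(Z1)
Z2 = np.dot(W2, A1) + b2
A2 = sigmoid(Z2)   # the predicted probability for each example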
def forward_propagation(X, parameters):
    '''
    Arguments:
         X - input data of shape (n_x, m).
         parameters - the output of the initialization function (initialize_parameters)

    Returns:
         A2 - the sigmoid activation of the output layer
         cache - a dictionary containing "Z1", "A1", "Z2" and "A2"
    '''
    
    
#     (n_x,n_h,n_y) = layer_sizes(X, Y)
#     parameters = initialize_parameters(n_x,n_h,n_y)
    # forward propagation: compute A2
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
#     a1 = ReLU(z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    # use an assertion to check that the shape of A2 is correct
    assert(A2.shape == (1,X.shape[1]))
    cache = {'Z1': Z1,
            'A1': A1,
            'Z2': Z2,
            'A2': A2}
    
    return (A2, cache)
print("=========================测试forward_propagation=========================") 
X_assess, parameters = forward_propagation_test_case()
A2, cache = forward_propagation(X_assess, parameters)
print(np.mean(cache["Z1"]), np.mean(cache["A1"]), np.mean(cache["Z2"]), np.mean(cache["A2"]))
=========================testing forward_propagation=========================
-0.0004997557777419913 -0.0004969633532317802 0.0004381874509591466 0.500109546852431

2.4.3.3 Compute the Cost
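The cross-entropy cost implemented (vectorized) below is

J = -(1/m) * sum_{i=1..m} [ y(i) * log(a2(i)) + (1 - y(i)) * log(1 - a2(i)) ]

where a2(i) is the predicted probability for the i-th example and y(i) is its label.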

def compute_cost(A2,Y,parameters):
    # the element-by-element loop below is clumsy; the vectorized version underneath is used instead
    '''
    logloss = 0
    m = Y.shape[1]
    for i in range(m):
        logloss += (-Y[i]*np.log(A2[i])-(1-Y[i])*log(1-A2[i]))
    return (1/m) * logloss
    '''
    m = Y.shape[1]
    logprobs = np.multiply(np.log(A2), Y) + np.multiply(np.log(1-A2), (1-Y))
    cost = - np.sum(logprobs) / m
    cost = float(np.squeeze(cost))
    assert(isinstance(cost,float))

    return cost
#test compute_cost
print("=========================testing compute_cost=========================") 
A2 , Y_assess , parameters = compute_cost_test_case()
print("cost = " + str(compute_cost(A2,Y_assess,parameters)))
=========================testing compute_cost=========================
cost = 0.6929198937761266

2.4.3.4 Backward Propagation

  • Using the cache stored during forward propagation, we can now implement backpropagation.
  • Let's implement the function backward_propagation; the gradient formulas it computes are listed below.
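For reference, the gradients that backward_propagation computes (with * meaning elementwise multiplication) are:

dZ2 = A2 - Y
dW2 = (1/m) * np.dot(dZ2, A1.T)
db2 = (1/m) * sum over the columns of dZ2
dZ1 = np.dot(W2.T, dZ2) * (1 - A1**2)    # (1 - A1**2) is the derivative of tanh
dW1 = (1/m) * np.dot(dZ1, X.T)
db1 = (1/m) * sum over the columns of dZ1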
# define the derivative of the activation function (not a great approach; the code below uses 1 - A1**2 directly)
def d_tanh(z):
    a = np.tanh(z)
    d = 1 - a*a
    return d
def backward_propagation(parameters,cache,X,Y):
    
#     alpha = alpha
    # step 1: gradients of the output layer
#     Z1 = cache['Z1'] # not needed: the tanh derivative is computed from A1
    A1 = cache['A1']
#     Z2 = cache['Z2'] # not needed
    A2 = cache['A2']
    
    m = X.shape[1]
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    
    # step 2: gradients of the hidden layer
    W1 = parameters['W1']
#     b1 = parameters['b1']
    W2 = parameters['W2']
#     b2 = parameters['b2']
    
    dZ1_tmp = np.dot(W2.T, dZ2)
#     dZ1 = np.dot(dZ1_tmp, d_tanh(Z1))
    dZ1 = np.multiply(dZ1_tmp, 1-np.power(A1,2))
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    
    # (parameter update moved to update_parameters)
#     W1 = W1 - alpha * dW1
#     b1 = b1 - alpha * db1
#     W2 = W2 - alpha * dW2
#     b2 = b2 - alpha * db2
    
    grads = {'dW1': dW1,
                 'db1': db1,
                 'dW2': dW2,
                 'db2': db2}
    return grads
#test backward_propagation
print("=========================testing backward_propagation=========================")
parameters, cache, X_assess, Y_assess = backward_propagation_test_case()

grads = backward_propagation(parameters, cache, X_assess, Y_assess)
print ("dW1 = "+ str(grads["dW1"]))
print ("db1 = "+ str(grads["db1"]))
print ("dW2 = "+ str(grads["dW2"]))
print ("db2 = "+ str(grads["db2"]))
=========================testing backward_propagation=========================
dW1 = [[ 0.01018708 -0.00708701]
 [ 0.00873447 -0.0060768 ]
 [-0.00530847  0.00369379]
 [-0.02206365  0.01535126]]
db1 = [[-0.00069728]
 [-0.00060606]
 [ 0.000364  ]
 [ 0.00151207]]
dW2 = [[ 0.00363613  0.03153604  0.01162914 -0.01318316]]
db2 = [[0.06589489]]

2.4.3.5 Update the Parameters

def update_parameters(parameters, grads, learning_rate=1.2):
    
    '''
    Update the parameters using the gradient-descent update rule given above

    Arguments:
     parameters - dictionary containing the parameters.
     grads - dictionary containing the gradients.
     learning_rate - the learning rate

    Returns:
     parameters - dictionary containing the updated parameters.
    '''
    
    # current parameters
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    
    # gradients
    dW1 = grads['dW1']
    db1 = grads['db1']
    dW2 = grads['dW2']
    db2 = grads['db2']
    
    # gradient-descent update
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2

    parameters = {'W1': W1,
                 'b1': b1,
                 'W2': W2,
                 'b2': b2}
    return parameters
#test update_parameters
print("=========================testing update_parameters=========================")
parameters, grads = update_parameters_test_case()
parameters = update_parameters(parameters, grads)

print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
=========================testing update_parameters=========================
W1 = [[-0.00643025  0.01936718]
 [-0.02410458  0.03978052]
 [-0.01653973 -0.02096177]
 [ 0.01046864 -0.05990141]]
b1 = [[-1.02420756e-06]
 [ 1.27373948e-05]
 [ 8.32996807e-07]
 [-3.20136836e-06]]
W2 = [[-0.01041081 -0.04463285  0.01758031  0.04747113]]
b2 = [[0.00010457]]

2.5 Putting It All Together

def layer_sizes(X , Y):
    """
    Arguments:
     X - input dataset of shape (number of input features, number of training/test examples)
     Y - labels of shape (number of outputs, number of training/test examples)

    Returns:
     n_x - size of the input layer
     n_h - size of the hidden layer
     n_y - size of the output layer
    """
    n_x = X.shape[0] # input layer: 2 features (x1, x2)
    n_h = 4 # hidden layer, hard-coded to 4 units
    n_y = Y.shape[0] # output layer: 1 unit

    return (n_x,n_h,n_y)

def initialize_parameters(n_x,n_h,n_y):
    '''
    Arguments:
        n_x - number of input-layer nodes
        n_h - number of hidden-layer nodes
        n_y - number of output-layer nodes

    Returns:
        parameters - dictionary containing:
            W1 - weight matrix of shape (n_h, n_x)
            b1 - bias vector of shape (n_h, 1)
            W2 - weight matrix of shape (n_y, n_h)
            b2 - bias vector of shape (n_y, 1)

    '''
    np.random.seed(2) # fix a random seed so that your output matches ours
    W1 = np.random.randn(n_h, n_x) * 0.01
    W2 = np.random.randn(n_y, n_h) * 0.01
    b1 = np.zeros((n_h,1))
    b2 = np.zeros((n_y,1))
    
    # use assertions to make sure the shapes are correct
    assert(W1.shape == ( n_h , n_x ))
    assert(b1.shape == ( n_h , 1 ))
    assert(W2.shape == ( n_y , n_h ))
    assert(b2.shape == ( n_y , 1 ))

    parameters = {'W1': W1,
                 'b1': b1,
                 'W2': W2,
                 'b2': b2}
    return parameters

def forward_propagation(X, parameters):
    '''
    Arguments:
         X - input data of shape (n_x, m).
         parameters - the output of the initialization function (initialize_parameters)

    Returns:
         A2 - the sigmoid activation of the output layer
         cache - a dictionary containing "Z1", "A1", "Z2" and "A2"
    '''
    
    
#     (n_x,n_h,n_y) = layer_sizes(X, Y)
#     parameters = initialize_parameters(n_x,n_h,n_y)
    # forward propagation: compute A2
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
#     a1 = ReLU(z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    # use an assertion to check that the shape of A2 is correct
    assert(A2.shape == (1,X.shape[1]))
    cache = {'Z1': Z1,
            'A1': A1,
            'Z2': Z2,
            'A2': A2}
    
    return (A2, cache)

def compute_cost(A2,Y,parameters):
    # the element-by-element loop below is clumsy; the vectorized version underneath is used instead
    '''
    logloss = 0
    m = Y.shape[1]
    for i in range(m):
        logloss += (-Y[i]*np.log(A2[i])-(1-Y[i])*log(1-A2[i]))
    return (1/m) * logloss
    '''
    m = Y.shape[1]
    logprobs = np.multiply(np.log(A2), Y) + np.multiply(np.log(1-A2), (1-Y))
    cost = - np.sum(logprobs) / m
    cost = float(np.squeeze(cost))
    assert(isinstance(cost,float))

    return cost

def backward_propagation(parameters,cache,X,Y):
    
#     alpha = alpha
    # step 1: gradients of the output layer
#     Z1 = cache['Z1'] # not needed: the tanh derivative is computed from A1
    A1 = cache['A1']
#     Z2 = cache['Z2'] # not needed
    A2 = cache['A2']
    
    m = X.shape[1]
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    
    # step 2: gradients of the hidden layer
    W1 = parameters['W1']
#     b1 = parameters['b1']
    W2 = parameters['W2']
#     b2 = parameters['b2']
    
    dZ1_tmp = np.dot(W2.T, dZ2)
#     dZ1 = np.dot(dZ1_tmp, d_tanh(Z1))
    dZ1 = np.multiply(dZ1_tmp, 1-np.power(A1,2))
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    
    # (parameter update moved to update_parameters)
#     W1 = W1 - alpha * dW1
#     b1 = b1 - alpha * db1
#     W2 = W2 - alpha * dW2
#     b2 = b2 - alpha * db2
    
    grads = {'dW1': dW1,
                 'db1': db1,
                 'dW2': dW2,
                 'db2': db2}
    return grads

def update_parameters(parameters, grads, learning_rate=1.2):
    
    '''
    Update the parameters using the gradient-descent update rule given above

    Arguments:
     parameters - dictionary containing the parameters.
     grads - dictionary containing the gradients.
     learning_rate - the learning rate

    Returns:
     parameters - dictionary containing the updated parameters.
    '''
    
    # current parameters
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    
    # gradients
    dW1 = grads['dW1']
    db1 = grads['db1']
    dW2 = grads['dW2']
    db2 = grads['db2']
    
    # gradient-descent update
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2

    parameters = {'W1': W1,
                 'b1': b1,
                 'W2': W2,
                 'b2': b2}
    return parameters

def nn_model(X,Y,n_h,num_iterations,print_cost=False):
    # determine the layer sizes
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]
    # initialize the parameters
    parameters = initialize_parameters(n_x,n_h,n_y)
    
    cost = []
    for i in range(num_iterations):
        # forward propagation
        A2, cache = forward_propagation(X, parameters)
        # compute the cost
        cost.append(compute_cost(A2, Y, parameters))
        # backward propagation
        grads = backward_propagation(parameters, cache, X, Y)
        # update the parameters
        parameters = update_parameters(parameters, grads, learning_rate=0.5)
        
        # print the cost now and then
        if print_cost:
            if i%1000 == 0:
                print("第 ",i," 次循环,成本为:"+str(cost[i]))
        
    return parameters, cost
#test nn_model
print("=========================testing nn_model=========================")
X_assess, Y_assess = nn_model_test_case()

parameters, cost = nn_model(X_assess, Y_assess, 4, num_iterations=10000, print_cost=False)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
=========================testing nn_model=========================
W1 = [[-3.89167767  4.77541602]
 [-6.77960338  1.20272585]
 [-3.88338966  4.78028666]
 [ 6.77958203 -1.20272574]]
b1 = [[ 2.11530892]
 [ 3.41221357]
 [ 2.11585732]
 [-3.41221322]]
W2 = [[-2512.9093032  -2502.70799785 -2512.01655969  2502.65264416]]
b2 = [[-22.29071761]]
len(cost)
10000
plt.plot(cost)
[<matplotlib.lines.Line2D at 0x121071b70>]

[plot of the training cost over the 10000 iterations]

2.6 Prediction

  • Use the newest learned parameters to make predictions.
  • Also define a function that turns probabilities into class labels by choosing a probability threshold (a vectorized sketch follows below).
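A fully vectorized alternative to the scalar helper below (a small sketch):

import numpy as np
def predict_labels(A2, threshold=0.5):
    # A2: (1, m) array of predicted probabilities
    return (A2 > threshold).astype(int)

For probabilities in [0, 1] this behaves like np.round, but it makes the 0.5 threshold explicit and easy to change.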
def predict_prob(z):
    if z > 0.5:
        return 1
    else:
        return 0
# the function above is equivalent to np.round
# a neat trick!
def predict(parameters,X):
    
    '''
    Using the learned parameters, predict a class for every example in X

    Arguments:
        parameters - dictionary containing the parameters.
        X - input data of shape (n_x, m)

    Returns
        predictions - the vector of predictions of our model (red: 0 / blue: 1)
        A2 - the predicted probabilities
    '''
    
    A2, cache = forward_propagation(X, parameters)
#     result = predict_prob(A2)
    predictions = np.round(A2)

    return predictions,A2
#test predict
print("=========================testing predict=========================")

parameters, X_assess = predict_test_case()

predictions, A2 = predict(parameters, X_assess)
print("Mean of predictions = " + str(np.mean(predictions)))
=========================testing predict=========================
Mean of predictions = 0.6666666666666666

2.7 Running on the Full Dataset

A2.shape
(1, 400)
Y.shape
(1, 400)
parameters, cost = nn_model(X,Y,4,num_iterations=10000,print_cost=True)

predictions,A2 = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100) + '%')
# compute the AUC
from sklearn import metrics
print('AUC:', metrics.roc_auc_score(Y[0], A2[0]))
第  0  次循环,成本为:0.6930480201239823
第  1000  次循环,成本为:0.3098018601352803
第  2000  次循环,成本为:0.2924326333792646
第  3000  次循环,成本为:0.2833492852647412
第  4000  次循环,成本为:0.27678077562979253
第  5000  次循环,成本为:0.2634715508859315
第  6000  次循环,成本为:0.2420441312994076
第  7000  次循环,成本为:0.23552486626608765
第  8000  次循环,成本为:0.23140964509854278
第  9000  次循环,成本为:0.22846408048352365
Accuracy: 90%
AUC: 0.9554875
plt.plot(cost)
plt.xlabel('rounds')
plt.ylabel('costs')
plt.title('costs_curve')
Text(0.5, 1.0, 'costs_curve')

[cost curve: costs vs. rounds]

2.8 Further Exploration: Changing the Number of Hidden Units

2.8.1 3 Hidden Units

parameters, cost = nn_model(X,Y,3,num_iterations=10000,print_cost=True)

predictions,A2 = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100) + '%')
# compute the AUC
from sklearn import metrics
print('AUC:', metrics.roc_auc_score(Y[0], A2[0]))
第  0  次循环,成本为:0.6931142222248914
第  1000  次循环,成本为:0.30458021375364525
第  2000  次循环,成本为:0.2889543896241841
第  3000  次循环,成本为:0.2813362390693701
第  4000  次循环,成本为:0.2762056290917325
第  5000  次循环,成本为:0.27234679294052705
第  6000  次循环,成本为:0.2692913448193197
第  7000  次循环,成本为:0.26680027251827704
第  8000  次循环,成本为:0.26472927260218204
第  9000  次循环,成本为:0.26298202702898793
Accuracy: 90%
AUC: 0.9316625000000001

2.8.2 5 Hidden Units

parameters, cost = nn_model(X,Y,5,num_iterations=10000,print_cost=True)

predictions,A2 = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100) + '%')
# compute the AUC
from sklearn import metrics
print('AUC:', metrics.roc_auc_score(Y[0], A2[0]))
第  0  次循环,成本为:0.6932523197920643
第  1000  次循环,成本为:0.30190419419788633
第  2000  次循环,成本为:0.28710926743182574
第  3000  次循环,成本为:0.279364307312653
第  4000  次循环,成本为:0.2739440923229426
第  5000  次循环,成本为:0.2698337077666274
第  6000  次循环,成本为:0.26658174487625713
第  7000  次循环,成本为:0.26391056140530544
第  8000  次循环,成本为:0.2616187349209759
第  9000  次循环,成本为:0.2595348220369931
Accuracy: 91%
AUC: 0.9404250000000001

Conclusion:

  • Based on these runs, 4 hidden units looks like the best choice so far (it gives the highest AUC); a sketch of a broader sweep over hidden-layer sizes follows below.
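A sketch of that broader sweep (the sizes are chosen arbitrarily; it simply reuses nn_model and predict as defined above):

for n_h in [1, 2, 3, 4, 5, 10, 20]:
    parameters, cost = nn_model(X, Y, n_h, num_iterations=10000, print_cost=False)
    predictions, A2 = predict(parameters, X)
    acc = float((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100)
    print('n_h = %d, accuracy = %.1f%%' % (n_h, acc))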

2.9 Further Exploration: Changing the Learning Rate

2.9.1 Learning Rate 0.1

def nn_model(X,Y,n_h,num_iterations,print_cost=False):
    # determine the layer sizes
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]
    # initialize the parameters
    parameters = initialize_parameters(n_x,n_h,n_y)
    
    cost = []
    for i in range(num_iterations):
        # forward propagation
        A2, cache = forward_propagation(X, parameters)
        # compute the cost
        cost.append(compute_cost(A2, Y, parameters))
        # backward propagation
        grads = backward_propagation(parameters, cache, X, Y)
        # update the parameters
        parameters = update_parameters(parameters, grads, learning_rate=0.1)
        
        # print the cost now and then
        if print_cost:
            if i%1000 == 0:
                print("第 ",i," 次循环,成本为:"+str(cost[i]))
        
    return parameters, cost
parameters, cost = nn_model(X,Y,4,num_iterations=10000,print_cost=True)

predictions,A2 = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100) + '%')
# compute the AUC
from sklearn import metrics
print('AUC:', metrics.roc_auc_score(Y[0], A2[0]))
第  0  次循环,成本为:0.6930480201239823
第  1000  次循环,成本为:0.6082072899772629
第  2000  次循环,成本为:0.36073810801711903
第  3000  次循环,成本为:0.3290211712344614
第  4000  次循环,成本为:0.3169254361704178
第  5000  次循环,成本为:0.3097334282605931
第  6000  次循环,成本为:0.3046867806714765
第  7000  次循环,成本为:0.30080373411176864
第  8000  次循环,成本为:0.2976318125152893
第  9000  次循环,成本为:0.2949307617619345
Accuracy: 89%
AUC: 0.9420000000000001

2.9.2 Learning Rate 0.01

def nn_model(X,Y,n_h,num_iterations,print_cost=False):
    # determine the layer sizes
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]
    # initialize the parameters
    parameters = initialize_parameters(n_x,n_h,n_y)
    
    cost = []
    for i in range(num_iterations):
        # forward propagation
        A2, cache = forward_propagation(X, parameters)
        # compute the cost
        cost.append(compute_cost(A2, Y, parameters))
        # backward propagation
        grads = backward_propagation(parameters, cache, X, Y)
        # update the parameters
        parameters = update_parameters(parameters, grads, learning_rate=0.01)
        
        # print the cost now and then
        if print_cost:
            if i%1000 == 0:
                print("第 ",i," 次循环,成本为:"+str(cost[i]))
        
    return parameters, cost
parameters, cost = nn_model(X,Y,4,num_iterations=10000,print_cost=True)

predictions,A2 = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100) + '%')
# compute the AUC
from sklearn import metrics
print('AUC:', metrics.roc_auc_score(Y[0], A2[0]))
第  0  次循环,成本为:0.6930480201239823
第  1000  次循环,成本为:0.688932572309688
第  2000  次循环,成本为:0.6775378463175366
第  3000  次循环,成本为:0.6703348893575363
第  4000  次循环,成本为:0.6496376254088237
第  5000  次循环,成本为:0.6306902415507508
第  6000  次循环,成本为:0.6218616797674815
第  7000  次循环,成本为:0.6171159617033553
第  8000  次循环,成本为:0.6136090975678596
第  9000  次循环,成本为:0.6106707129998842
Accuracy: 55%
AUC: 0.6728500000000001
  • With 10,000 iterations the cost clearly has not converged yet, so let's keep iterating!
parameters, cost = nn_model(X,Y,4,num_iterations=50000,print_cost=True)

predictions,A2 = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100) + '%')
# compute the AUC
from sklearn import metrics
print('AUC:', metrics.roc_auc_score(Y[0], A2[0]))
第  0  次循环,成本为:0.6930480201239823
第  1000  次循环,成本为:0.688932572309688
第  2000  次循环,成本为:0.6775378463175366
第  3000  次循环,成本为:0.6703348893575363
第  4000  次循环,成本为:0.6496376254088237
第  5000  次循环,成本为:0.6306902415507508
第  6000  次循环,成本为:0.6218616797674815
第  7000  次循环,成本为:0.6171159617033553
第  8000  次循环,成本为:0.6136090975678596
第  9000  次循环,成本为:0.6106707129998842
第  10000  次循环,成本为:0.608163583891612
第  11000  次循环,成本为:0.6058905647297186
第  12000  次循环,成本为:0.6027834825648756
第  13000  次循环,成本为:0.5523938714990329
第  14000  次循环,成本为:0.47967203953302684
第  15000  次循环,成本为:0.4348750458083448
第  16000  次循环,成本为:0.40728314388684744
第  17000  次循环,成本为:0.38924018867788823
第  18000  次循环,成本为:0.3766692500106537
第  19000  次循环,成本为:0.367413794512512
第  20000  次循环,成本为:0.3602834919443704
第  21000  次循环,成本为:0.3545865692893643
第  22000  次循环,成本为:0.3499002292965136
第  23000  次循环,成本为:0.3459537689884274
第  24000  次循环,成本为:0.3425665004077578
第  25000  次循环,成本为:0.3396133493582579
第  26000  次循环,成本为:0.3370049768527177
第  27000  次循环,成本为:0.3346758189101813
第  28000  次循环,成本为:0.332576616082092
第  29000  次循环,成本为:0.3306695888427735
第  30000  次循环,成本为:0.3289252294595474
第  31000  次循环,成本为:0.32732011375421904
第  32000  次循环,成本为:0.3258353747011518
第  33000  次循环,成本为:0.3244556161816236
第  34000  次循环,成本为:0.32316812587736293
第  35000  次循环,成本为:0.3219622954509282
第  36000  次循环,成本为:0.32082918691855317
第  37000  次循环,成本为:0.31976120380702594
第  38000  次循环,成本为:0.3187518385431214
第  39000  次循环,成本为:0.3177954760757447
第  40000  次循环,成本为:0.3168872395147314
第  41000  次循环,成本为:0.31602286754285075
第  42000  次循环,成本为:0.3151986161255414
第  43000  次循环,成本为:0.3144111789975419
第  44000  次循环,成本为:0.3136576228032735
第  45000  次循环,成本为:0.31293533377912264
第  46000  次循环,成本为:0.31224197360564665
第  47000  次循环,成本为:0.31157544260471165
第  48000  次循环,成本为:0.31093384886498865
第  49000  次循环,成本为:0.31031548218703897
Accuracy: 88%
AUC: 0.93635
  • Taken together, a learning rate of 0.5 is a good choice.

2.10 Further Exploration: Changing the Activation Function

2.10.1 Switching to ReLU

  • We need to define the ReLU activation and its corresponding derivative.
  • The for-loops used below could be improved, since they are a bit slow; a vectorized alternative is sketched right after the scalar definitions.
z_tmp = np.random.randn(4,400)
print(z_tmp)
print(z_tmp.shape)
[[ 0.58472759 -0.47711751  0.68453845 ... -1.24153568  0.25193707
  -0.53319152]
 [ 0.26508834  0.48854075 -1.35832514 ...  0.58014136  1.18702522
  -1.22336582]
 [-0.29353028  0.38510241 -0.73588592 ...  1.04939218 -0.50527221
   0.35924516]
 [ 0.68859855 -1.25955219  1.1387209  ...  0.12512238 -0.14781898
  -0.17004473]]
(4, 400)

Now we need to apply ReLU to every element of the array above!

A = np.zeros((z_tmp.shape[0], z_tmp.shape[1]))
for i in range(z_tmp.shape[0]):
    for j in range(z_tmp.shape[1]):
        A[i][j] = ReLU(z_tmp[i][j])
A
array([[0.58472759, 0.        , 0.68453845, ..., 0.        , 0.25193707,
        0.        ],
       [0.26508834, 0.48854075, 0.        , ..., 0.58014136, 1.18702522,
        0.        ],
       [0.        , 0.38510241, 0.        , ..., 1.04939218, 0.        ,
        0.35924516],
       [0.68859855, 0.        , 1.1387209 , ..., 0.12512238, 0.        ,
        0.        ]])
def ReLU(z):

    return max(0,z)
def d_ReLU(z):
    if z > 0:
        return 1
    else:
        return 0
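A vectorized alternative (a sketch) that avoids the double for-loops used below: np.maximum and the > comparison already operate elementwise on whole arrays.

def ReLU_vec(Z):
    return np.maximum(0, Z)          # elementwise max with 0

def d_ReLU_vec(Z):
    return (Z > 0).astype(float)     # 1 where Z > 0, else 0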

Only the functions that need to change are listed below.

X.shape
(2, 400)
def forward_propagation(X, parameters):
    '''
    Arguments:
         X - input data of shape (n_x, m).
         parameters - the output of the initialization function (initialize_parameters)

    Returns:
         A2 - the sigmoid activation of the output layer
         cache - a dictionary containing "Z1", "A1", "Z2" and "A2"
    '''
    
    
#     (n_x,n_h,n_y) = layer_sizes(X, Y)
#     parameters = initialize_parameters(n_x,n_h,n_y)
    # forward propagation: compute A2
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    
    Z1 = np.dot(W1, X) + b1
#     A1 = np.tanh(Z1)
    '''
    Change 1: apply ReLU elementwise (scalar ReLU, applied via explicit loops)
    '''
    A1 = np.zeros((Z1.shape[0], Z1.shape[1]))
    for i in range(Z1.shape[0]):
        for j in range(Z1.shape[1]):
            A1[i][j] = ReLU(Z1[i][j])
#     A1 = ReLU(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    # use an assertion to check that the shape of A2 is correct
    assert(A2.shape == (1,X.shape[1]))
    cache = {'Z1': Z1,
            'A1': A1,
            'Z2': Z2,
            'A2': A2}
    
    return (A2, cache)

def backward_propagation(parameters,cache,X,Y):
    
#     alpha = alpha
    # step 1: gradients of the output layer
    Z1 = cache['Z1'] # needed below for the ReLU derivative
    A1 = cache['A1']
#     Z2 = cache['Z2'] # not needed
    A2 = cache['A2']
    
    m = X.shape[1]
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    
    # step 2: gradients of the hidden layer
    W1 = parameters['W1']
#     b1 = parameters['b1']
    W2 = parameters['W2']
#     b2 = parameters['b2']
    
    dZ1_tmp = np.dot(W2.T, dZ2)
#     dZ1 = np.dot(dZ1_tmp, d_tanh(Z1))
    '''
    Change 2: use the ReLU derivative instead of the tanh derivative
    '''
    ddZ1 = np.zeros((Z1.shape[0], Z1.shape[1]))
    for i in range(Z1.shape[0]):
        for j in range(Z1.shape[1]):
            ddZ1[i][j] = d_ReLU(Z1[i][j])
    
    dZ1 = np.multiply(dZ1_tmp, ddZ1)
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    
    # (parameter update moved to update_parameters)
#     W1 = W1 - alpha * dW1
#     b1 = b1 - alpha * db1
#     W2 = W2 - alpha * dW2
#     b2 = b2 - alpha * db2
    
    grads = {'dW1': dW1,
                 'db1': db1,
                 'dW2': dW2,
                 'db2': db2}
    return grads

def nn_model(X,Y,n_h,num_iterations,print_cost=False):
    # determine the layer sizes
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]
    # initialize the parameters
    parameters = initialize_parameters(n_x,n_h,n_y)
    
    cost = []
    for i in range(num_iterations):
        # forward propagation
        A2, cache = forward_propagation(X, parameters)
        # compute the cost
        cost.append(compute_cost(A2, Y, parameters))
        # backward propagation
        grads = backward_propagation(parameters, cache, X, Y)
        # update the parameters
        parameters = update_parameters(parameters, grads, learning_rate=0.5)
        
        # print the cost now and then
        if print_cost:
            if i%1000 == 0:
                print("第 ",i," 次循环,成本为:"+str(cost[i]))
        
    return parameters, cost
parameters, cost = nn_model(X,Y,4,num_iterations=10000,print_cost=True)

predictions,A2 = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100) + '%')
# compute the AUC
from sklearn import metrics
print('AUC:', metrics.roc_auc_score(Y[0], A2[0]))
第  0  次循环,成本为:0.6930967698565371
第  1000  次循环,成本为:0.5837333193671671
第  2000  次循环,成本为:0.582480193736169
第  3000  次循环,成本为:0.5820746670783192
第  4000  次循环,成本为:0.5819189632821291
第  5000  次循环,成本为:0.5818578214239326
第  6000  次循环,成本为:0.581872455649722
第  7000  次循环,成本为:0.5818531341179692
第  8000  次循环,成本为:0.5818445071739313
第  9000  次循环,成本为:0.581860102502225
Accuracy: 69%
AUC: 0.7574375
  • As we can see, ReLU as the hidden-layer activation does not work well here.

2.10.2 Switching to Sigmoid

def forward_propagation(X, parameters):
    '''
    Arguments:
         X - input data of shape (n_x, m).
         parameters - the output of the initialization function (initialize_parameters)

    Returns:
         A2 - the sigmoid activation of the output layer
         cache - a dictionary containing "Z1", "A1", "Z2" and "A2"
    '''
    
    
#     (n_x,n_h,n_y) = layer_sizes(X, Y)
#     parameters = initialize_parameters(n_x,n_h,n_y)
    # forward propagation: compute A2
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    
    Z1 = np.dot(W1, X) + b1
#     A1 = np.tanh(Z1)
    '''
    Change 1: use sigmoid instead of tanh for the hidden layer
    '''
    A1 = sigmoid(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    # use an assertion to check that the shape of A2 is correct
    assert(A2.shape == (1,X.shape[1]))
    cache = {'Z1': Z1,
            'A1': A1,
            'Z2': Z2,
            'A2': A2}
    
    return (A2, cache)

def backward_propagation(parameters,cache,X,Y):
    
#     alpha = alpha
    # step 1: gradients of the output layer
    Z1 = cache['Z1'] # retrieved but not needed: the sigmoid derivative uses A1
    A1 = cache['A1']
#     Z2 = cache['Z2'] # not needed
    A2 = cache['A2']
    
    m = X.shape[1]
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    
    # step 2: gradients of the hidden layer
    W1 = parameters['W1']
#     b1 = parameters['b1']
    W2 = parameters['W2']
#     b2 = parameters['b2']
    
    dZ1_tmp = np.dot(W2.T, dZ2)
#     dZ1 = np.dot(dZ1_tmp, d_tanh(Z1))
    '''
    Change 2: the derivative of sigmoid is A1 * (1 - A1)
    '''
    ddZ1 = A1 * (1-A1)
    
    dZ1 = np.multiply(dZ1_tmp, ddZ1)
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    
    # (parameter update moved to update_parameters)
#     W1 = W1 - alpha * dW1
#     b1 = b1 - alpha * db1
#     W2 = W2 - alpha * dW2
#     b2 = b2 - alpha * db2
    
    grads = {'dW1': dW1,
                 'db1': db1,
                 'dW2': dW2,
                 'db2': db2}
    return grads

def nn_model(X,Y,n_h,num_iterations,print_cost=False):
    # determine the layer sizes
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]
    # initialize the parameters
    parameters = initialize_parameters(n_x,n_h,n_y)
    
    cost = []
    for i in range(num_iterations):
        # forward propagation
        A2, cache = forward_propagation(X, parameters)
        # compute the cost
        cost.append(compute_cost(A2, Y, parameters))
        # backward propagation
        grads = backward_propagation(parameters, cache, X, Y)
        # update the parameters
        parameters = update_parameters(parameters, grads, learning_rate=0.5)
        
        # print the cost now and then
        if print_cost:
            if i%1000 == 0:
                print("第 ",i," 次循环,成本为:"+str(cost[i]))
        
    return parameters, cost
parameters, cost = nn_model(X,Y,4,num_iterations=10000,print_cost=True)

predictions,A2 = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100) + '%')
# compute the AUC
from sklearn import metrics
print('AUC:', metrics.roc_auc_score(Y[0], A2[0]))
第  0  次循环,成本为:0.6931247519503794
第  1000  次循环,成本为:0.6107222228158191
第  2000  次循环,成本为:0.5995084389140075
第  3000  次循环,成本为:0.593107158384943
第  4000  次循环,成本为:0.5868709574764809
第  5000  次循环,成本为:0.3488657656261512
第  6000  次循环,成本为:0.32144647880586474
第  7000  次循环,成本为:0.31129073544086866
第  8000  次循环,成本为:0.3051780807204319
第  9000  次循环,成本为:0.3008250363286913
Accuracy: 88%
AUC: 0.940525
  • At first glance, sigmoid as the hidden activation works reasonably well, quite close to tanh.

Let's try increasing the number of iterations to see whether the results improve.

parameters, cost = nn_model(X,Y,4,num_iterations=30000,print_cost=True)

predictions,A2 = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100) + '%')
# compute the AUC
from sklearn import metrics
print('AUC:', metrics.roc_auc_score(Y[0], A2[0]))
第  0  次循环,成本为:0.6931247519503794
第  1000  次循环,成本为:0.6107222228158191
第  2000  次循环,成本为:0.5995084389140075
第  3000  次循环,成本为:0.593107158384943
第  4000  次循环,成本为:0.5868709574764809
第  5000  次循环,成本为:0.3488657656261512
第  6000  次循环,成本为:0.32144647880586474
第  7000  次循环,成本为:0.31129073544086866
第  8000  次循环,成本为:0.3051780807204319
第  9000  次循环,成本为:0.3008250363286913
第  10000  次循环,成本为:0.2974458944935382
第  11000  次循环,成本为:0.2946780623796379
第  12000  次循环,成本为:0.2923255453297519
第  13000  次循环,成本为:0.29027124456539866
第  14000  次循环,成本为:0.28844000975040357
第  15000  次循环,成本为:0.28678077115120976
第  16000  次循环,成本为:0.285256929060494
第  17000  次循环,成本为:0.2838406200306754
第  18000  次循环,成本为:0.28250884492773976
第  19000  次循环,成本为:0.2812406140964282
第  20000  次循环,成本为:0.28001568482824973
第  21000  次循环,成本为:0.27881786253062496
第  22000  次循环,成本为:0.27764446577654806
第  23000  次循环,成本为:0.2765094103606371
第  24000  次循环,成本为:0.2754290107057539
第  25000  次循环,成本为:0.27441038489412295
第  26000  次循环,成本为:0.2734523400876381
第  27000  次循环,成本为:0.27255078918374975
第  28000  次循环,成本为:0.2717015333076758
第  29000  次循环,成本为:0.270900699909345
Accuracy: 90%
AUC: 0.9369500000000001
  • The improvement is small, and the AUC actually dropped a little.

3 Building the Same Network with Keras

3.1 No Hidden Layer

def Convert1(x):
    s = (x - min(x)) / (max(x) - min(x))
    return s

def Convert2(df):
    # min-max normalize each column of df to [0, 1]
    m = df.shape[1]
    for i in range(m):
        df[:,i] = Convert1(df[:,i])
    return df
from __future__ import print_function
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD
from keras.utils import np_utils
from sklearn.model_selection import train_test_split


np.random.seed(1671)  # for reproducibility

# network and training
NB_EPOCH = 100
BATCH_SIZE = 128
VERBOSE = 1
NB_CLASSES = 2   # number of outputs = number of classes
OPTIMIZER = SGD() # SGD optimizer, explained later in this chapter
N_HIDDEN = 128
VALIDATION_SPLIT=0.2 # how much TRAIN is reserved for VALIDATION

# data: shuffled and split between train and test sets
#
X_train, X_test, y_train, y_test = train_test_split(X.T, Y.T, test_size = 0.3,
                                                    random_state = 23)

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# normalize 
X_train = Convert2(X_train)
X_test = Convert2(X_test)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, NB_CLASSES)
Y_test = np_utils.to_categorical(y_test, NB_CLASSES)

# 2 outputs
# final stage is softmax

model = Sequential()
model.add(Dense(NB_CLASSES, input_shape=(2,))) # NB_CLASSES output units; input_shape = the number of input features x
model.add(Activation('softmax'))

model.summary()

model.compile(loss='categorical_crossentropy',
              optimizer=OPTIMIZER,
              metrics=['accuracy'])

history = model.fit(X_train, Y_train,
                    batch_size=BATCH_SIZE, epochs=NB_EPOCH,
                    verbose=VERBOSE, validation_split=VALIDATION_SPLIT)
score = model.evaluate(X_test, Y_test, verbose=VERBOSE)
print("\nTest score:", score[0])
print('Test accuracy:', score[1])
280 train samples
120 test samples
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_3 (Dense)              (None, 2)                 6         
_________________________________________________________________
activation_3 (Activation)    (None, 2)                 0         
=================================================================
Total params: 6
Trainable params: 6
Non-trainable params: 0
_________________________________________________________________
Train on 224 samples, validate on 56 samples
Epoch 1/100
224/224 [==============================] - 0s 878us/step - loss: 0.7439 - acc: 0.4554 - val_loss: 0.6457 - val_acc: 0.6250
Epoch 2/100
224/224 [==============================] - 0s 46us/step - loss: 0.7422 - acc: 0.4554 - val_loss: 0.6457 - val_acc: 0.6250
......
Epoch 97/100
224/224 [==============================] - 0s 24us/step - loss: 0.6864 - acc: 0.6429 - val_loss: 0.6756 - val_acc: 0.6786
Epoch 98/100
224/224 [==============================] - 0s 32us/step - loss: 0.6863 - acc: 0.6518 - val_loss: 0.6759 - val_acc: 0.6786
Epoch 99/100
224/224 [==============================] - 0s 32us/step - loss: 0.6862 - acc: 0.6562 - val_loss: 0.6762 - val_acc: 0.6786
Epoch 100/100
224/224 [==============================] - 0s 17us/step - loss: 0.6861 - acc: 0.6562 - val_loss: 0.6765 - val_acc: 0.6964
120/120 [==============================] - 0s 31us/step

Test score: 0.6931502143541972
Test accuracy: 0.6166666666666667
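Since metrics=['accuracy'] only reports accuracy (score[1] above), an actual AUC can be computed from the predicted class-1 probabilities. A hedged sketch reusing the variables defined above:

from sklearn import metrics
probs = model.predict(X_test)              # (n_samples, 2) softmax outputs
print('Test AUC:', metrics.roc_auc_score(y_test.ravel(), probs[:, 1]))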
import matplotlib.pyplot as plt
def Keras_Plot(history):    
#     fig,axes=plt.subplots(1,2,figsize=(10,8))
#     plt.figure(10,8)
    # summarize history for accuracy
    plt.plot(history.history['acc']) # training-set accuracy
    plt.plot(history.history['val_acc']) # validation-set accuracy
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='lower right')
    plt.show()
    # summarize history for loss
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper right')
    plt.show()
Keras_Plot(history)

[model accuracy: training vs. validation accuracy per epoch]
[model loss: training vs. validation loss per epoch]

3.2 Adding Hidden Layers

from __future__ import print_function
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.optimizers import SGD
from keras.utils import np_utils
from sklearn.model_selection import train_test_split


np.random.seed(1671)  # for reproducibility

# network and training
NB_EPOCH = 100
BATCH_SIZE = 128
VERBOSE = 1
NB_CLASSES = 2   # number of outputs = number of classes
OPTIMIZER = SGD() # SGD optimizer, explained later in this chapter
N_HIDDEN = 128
VALIDATION_SPLIT=0.2 # how much TRAIN is reserved for VALIDATION
DROPOUT = 0.3

# data: shuffled and split between train and test sets
#
X_train, X_test, y_train, y_test = train_test_split(X.T, Y.T, test_size = 0.3,
                                                    random_state = 23)

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# normalize 
X_train = Convert2(X_train)
X_test = Convert2(X_test)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, NB_CLASSES)
Y_test = np_utils.to_categorical(y_test, NB_CLASSES)

# 2 outputs
# final stage is softmax

model = Sequential()
model.add(Dense(N_HIDDEN, input_shape = (2,))) 
'''
After the input layer comes the first Dense layer, with N_HIDDEN neurons and relu as its activation
'''
model.add(Activation('relu'))
'''
After the first hidden layer comes a second one, also with N_HIDDEN neurons
'''

'''
Change 2: add dropout after the first hidden layer
'''
model.add(Dropout(DROPOUT))

model.add(Dense(N_HIDDEN))
model.add(Activation('relu'))
'''
Then comes an output layer with NB_CLASSES (= 2) neurons
'''

'''
Change 3: add dropout after the second hidden layer
'''
model.add(Dropout(DROPOUT))

model.add(Dense(NB_CLASSES)) 
model.add(Activation('softmax'))

model.summary()

model.compile(loss='categorical_crossentropy',
              optimizer=OPTIMIZER,
              metrics=['accuracy'])

history = model.fit(X_train, Y_train,
                    batch_size=BATCH_SIZE, epochs=NB_EPOCH,
                    verbose=VERBOSE, validation_split=VALIDATION_SPLIT)
score = model.evaluate(X_test, Y_test, verbose=VERBOSE)
print("\nTest score:", score[0])
print('Test accuracy:', score[1])
280 train samples
120 test samples
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_4 (Dense)              (None, 128)               384       
_________________________________________________________________
activation_4 (Activation)    (None, 128)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 128)               16512     
_________________________________________________________________
activation_5 (Activation)    (None, 128)               0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_6 (Dense)              (None, 2)                 258       
_________________________________________________________________
activation_6 (Activation)    (None, 2)                 0         
=================================================================
Total params: 17,154
Trainable params: 17,154
Non-trainable params: 0
_________________________________________________________________
Train on 224 samples, validate on 56 samples
Epoch 1/100
224/224 [==============================] - 0s 1ms/step - loss: 0.6987 - acc: 0.4821 - val_loss: 0.6846 - val_acc: 0.6250
Epoch 2/100
224/224 [==============================] - 0s 26us/step - loss: 0.7001 - acc: 0.5000 - val_loss: 0.6853 - val_acc: 0.6250
Epoch 3/100
224/224 [==============================] - 0s 40us/step - loss: 0.6961 - acc: 0.5000 - val_loss: 0.6858 - val_acc: 0.6250
......
Epoch 98/100
224/224 [==============================] - 0s 32us/step - loss: 0.6894 - acc: 0.5357 - val_loss: 0.7058 - val_acc: 0.3750
Epoch 99/100
224/224 [==============================] - 0s 27us/step - loss: 0.6850 - acc: 0.5536 - val_loss: 0.7059 - val_acc: 0.3750
Epoch 100/100
224/224 [==============================] - 0s 25us/step - loss: 0.6880 - acc: 0.5491 - val_loss: 0.7057 - val_acc: 0.3750
120/120 [==============================] - 0s 42us/step

Test score: 0.6933380603790283
Test accuracy: 0.4750000019868215
import matplotlib.pyplot as plt
def Keras_Plot(history):    
#     fig,axes=plt.subplots(1,2,figsize=(10,8))
#     plt.figure(10,8)
    # summarize history for accuracy
    plt.plot(history.history['acc']) # training-set accuracy
    plt.plot(history.history['val_acc']) # validation-set accuracy
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='lower right')
    plt.show()
    # summarize history for loss
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper right')
    plt.show()
Keras_Plot(history)

[model accuracy: training vs. validation accuracy per epoch]
[model loss: training vs. validation loss per epoch]
Why are the results so poor? Did I set the Keras model up incorrectly? To be revisited and verified later.


Reposted from blog.csdn.net/qq_27782503/article/details/91346607