DL-C1-week3-1 (build a neural network with one hidden layer): a simple multilayer perceptron implementation

Build a neural network with one hidden layer


The previous post covered how to implement a logistic regression classifier. A neural network is actually very similar to LR: you can think of a neural network as several LR units stacked on top of each other. Once you understand logistic regression, neural networks are not hard to understand.

What this post covers

  • Implement a binary-classification neural network with a single hidden layer
  • Use the tanh function as the nonlinear activation of the hidden units
  • Compute the cross-entropy loss
  • Implement forward and backward propagation
  • Update the parameters
  • Choose the hyperparameters

1 - Packages

  • numpy
  • sklearn:scikit-learn
  • matplotlib
     
# Package imports
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
import sklearn.linear_model

%matplotlib inline

np.random.seed(1) # set a seed so that the results are consistent

2. Helper functions

# Generate the training data
def create_dataset(m = 400, D = 2):
    """
    m : number of examples
    D : number of features
    N : number of points per class
    X : data matrix, each row is a single example
    Y : label vector
    """
    np.random.seed(1)
    N = int(m/2)  
    X = np.zeros((m,D))
    Y = np.zeros((m,1), dtype='uint8') # (0 for red, 1 for blue)
    a = 4 # maximum radius of the flower
    for j in range(2):
        ix = range(N*j,N*(j+1))
        t = np.linspace(j*3.12,(j+1)*3.12,N) + np.random.randn(N)*0.2 # theta
        r = a*np.sin(4*t) + np.random.randn(N)*0.2 # radius
        X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
        Y[ix] = j

    # X:shape(m, D)
    # Y:shape(m, 1)
    return X, Y 
# Plot the model's decision boundary
def plot_decision_boundary(model, X, y):
    # Set min and max values and give it some padding
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid
    Z = model(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.ylabel('x2')
    plt.xlabel('x1')
    plt.scatter(X[:,0], X[:,1], c=y[:,0], cmap=plt.cm.Spectral)

3. Create and inspect the dataset

  • data-generation function: create_dataset()
  • randomly generates training data belonging to two classes

3.1 generate data

X, Y = create_dataset(400, 2)

print ('The shape of X is: ' + str(X.shape))
print ('The shape of Y is: ' + str(Y.shape))
print ('We have m = %d training examples!' % (X.shape[0]))
The shape of X is: (400, 2)
The shape of Y is: (400, 1)
We have m = 400 training examples!

3.2 visualize dataset

  • Goal: build a model that fits this data
# Visualize the data:
plt.scatter(X[:, 0], X[:, 1], c=Y[:,0], s=30, cmap=plt.cm.Spectral);

(Figure: scatter plot of the flower-shaped dataset, red and blue points)

The training dataset:

  • a numpy array (matrix) X containing the features (x1, x2)
  • a numpy array (vector) Y containing the labels (red: 0, blue: 1).

4. Simple Logistic Regression

  Before implementing the fully connected network, let's first fit the data with a logistic regression classifier to see how LR performs on this problem. With sklearn, training a logistic regression model takes just two lines of code.

4.1 train logistic regression classifier

# Train the logistic regression classifier
clf = sklearn.linear_model.LogisticRegressionCV()
clf.fit(X, Y.reshape(X.shape[0],))
LogisticRegressionCV(Cs=10, class_weight=None, cv=None, dual=False,
           fit_intercept=True, intercept_scaling=1.0, max_iter=100,
           multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
           refit=True, scoring=None, solver='lbfgs', tol=0.0001, verbose=0)

4.2 plot decision boundary

# Plot the decision boundary for logistic regression
plot_decision_boundary(lambda x: clf.predict(x), X, Y)
plt.title("Logistic Regression")

# Print accuracy
LR_predictions = clf.predict(X)
print ('Accuracy of logistic regression: %d ' % float((np.dot(Y[:,0],LR_predictions) + np.dot(1-Y[:,0],1-LR_predictions))/float(Y.size)*100) +
       '% ' + "(percentage of correctly labelled datapoints)")
Accuracy of logistic regression: 47 % (percentage of correctly labelled datapoints)

(Figure: decision boundary learned by logistic regression)

Output:

Accuracy 47%

The classification accuracy is only 47%: logistic regression does not fit this data well, because the dataset is not linearly separable. Next we classify the data with a neural network. Let's try this now!

5 - Neural Network model

We will build a neural network with a single hidden layer.

Here is our model:
(Figure: the model, a 2-unit input layer, one hidden layer of tanh units, and a sigmoid output unit)

Mathematically:

For one example $x^{(i)}$:

$$z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]} \tag{1}$$

$$a^{[1](i)} = \tanh(z^{[1](i)}) \tag{2}$$

$$z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]} \tag{3}$$

$$\hat{y}^{(i)} = a^{[2](i)} = \sigma(z^{[2](i)}) \tag{4}$$

$$y^{(i)}_{prediction} = \begin{cases} 1 & \text{if } a^{[2](i)} > 0.5 \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

Over all $m$ examples, the cost $J$ is:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log\left(a^{[2](i)}\right) + (1 - y^{(i)}) \log\left(1 - a^{[2](i)}\right) \right) \tag{6}$$
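To make the notation concrete, here is a minimal numpy sketch of equations (1)-(6) for a single example. It is illustration only; the *_demo names are made up for this sketch and are not part of the assignment code that follows.

# Illustration only: equations (1)-(6) for a single example x of shape (n_x, 1).
# The *_demo names are hypothetical; the real implementation follows below.
import numpy as np

n_x_demo, n_h_demo, n_y_demo = 2, 4, 1
x_demo = np.random.randn(n_x_demo, 1)              # one example as a column vector
y_demo = 1                                         # its true label

W1_demo = np.random.randn(n_h_demo, n_x_demo) * 0.01
b1_demo = np.zeros((n_h_demo, 1))
W2_demo = np.random.randn(n_y_demo, n_h_demo) * 0.01
b2_demo = np.zeros((n_y_demo, 1))

z1_demo = np.dot(W1_demo, x_demo) + b1_demo        # (1)
a1_demo = np.tanh(z1_demo)                         # (2)
z2_demo = np.dot(W2_demo, a1_demo) + b2_demo       # (3)
a2_demo = 1. / (1. + np.exp(-z2_demo))             # (4) sigmoid
a2_value = a2_demo.item()                          # the single predicted probability
y_hat_demo = 1 if a2_value > 0.5 else 0            # (5)
loss_demo = -(y_demo * np.log(a2_value)
              + (1 - y_demo) * np.log(1 - a2_value))  # one term of the sum in (6)
print(a2_value, y_hat_demo, loss_demo)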

Reminder: the steps to build a neural network:

  1. Define the network structure (# of input units, # of hidden units, etc.).
  2. Initialize the model's parameters.
  3. Loop:

    • forward propagation
    • compute the loss
    • backward propagation to obtain the gradients
    • update the parameters (gradient descent)

We implement helper functions for steps 1-3, then combine them in nn_model(), and finally train the model, learn the parameters, and make predictions on new data.

5.1 - Defining the neural network structure

Exercise: Define three variables:

  • n_x: the size of the input layer
  • n_h: the size of the hidden layer (set this to 4)
  • n_y: the size of the output layer
# GRADED FUNCTION: layer_sizes

def layer_sizes(X, Y):
    """
    Arguments:
    X -- input dataset of shape (number of examples, number of features)
    Y -- labels of shape (number of examples, 1)
    Returns:
    n_x -- the size of the input layer
    n_h -- the size of the hidden layer
    n_y -- the size of the output layer
    """
    n_x = X.shape[1]
    n_h = 4 # hard code
    n_y = Y.shape[1]
     
    return (n_x, n_h, n_y)

Test the function:

(n_x, n_h, n_y) = layer_sizes(X, Y)
print("The size of the input layer is: n_x = " + str(n_x))
print("The size of the hidden layer is: n_h = " + str(n_h))
print("The size of the output layer is: n_y = " + str(n_y))
The size of the input layer is: n_x = 2
The size of the hidden layer is: n_h = 4
The size of the output layer is: n_y = 1

5.2 - Initialize the model’s parameters

Initialize the parameters with the function initialize_parameters().

Initialization methods:

  • Random initialization
    • Use np.random.randn(a,b) * 0.01 to randomly initialize a matrix of shape (a,b).
  • All-zero initialization
    • Use np.zeros((a,b)) to initialize a matrix of shape (a,b) with zeros.
  • Try both initialization methods and observe their effect on the model.
# GRADED FUNCTION: initialize_parameters
# Two initialization schemes are supported via the flag argument
def initialize_parameters(n_x, n_h, n_y, flag=0):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer
    flag -- 0: random initialization, 1: all-zero initialization
    Returns:
    params -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """

    np.random.seed(2) # set a seed so the random initialization is reproducible
    if flag:
        W1 = np.zeros((n_h, n_x))
        b1 = np.zeros((n_h, 1))
        W2 = np.zeros((n_y, n_h))
        b2 = np.zeros((n_y, 1))
    else : 
        W1 = np.random.randn(n_h, n_x)*0.01
        b1 = np.zeros((n_h, 1))
        W2 = np.random.randn(n_y, n_h)*0.01
        b2 = np.zeros((n_y, 1))

    assert (W1.shape == (n_h, n_x))
    assert (b1.shape == (n_h, 1))
    assert (W2.shape == (n_y, n_h))
    assert (b2.shape == (n_y, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    return parameters

Test initialize_parameters():

  • random initialization
  • all-zero initialization
parameters = initialize_parameters(n_x, n_h, n_y)
print("W1 : shape " + str(parameters["W1"].shape))
print("b1 : shape " + str(parameters["b1"].shape))
print("W2 : shape " + str(parameters["W2"].shape))
print("b2 : shape " + str(parameters["b2"].shape))
print('------------')
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
W1 : shape (4, 2)
b1 : shape (4, 1)
W2 : shape (1, 4)
b2 : shape (1, 1)
------------
W1 = [[-0.00416758 -0.00056267]
 [-0.02136196  0.01640271]
 [-0.01793436 -0.00841747]
 [ 0.00502881 -0.01245288]]
b1 = [[ 0.]
 [ 0.]
 [ 0.]
 [ 0.]]
W2 = [[-0.01057952 -0.00909008  0.00551454  0.02292208]]
b2 = [[ 0.]]
parameters_0 = initialize_parameters(n_x, n_h, n_y, 1)
print("W1 = " + str(parameters_0["W1"]))
print("b1 = " + str(parameters_0["b1"]))
print("W2 = " + str(parameters_0["W2"]))
print("b2 = " + str(parameters_0["b2"]))
W1 = [[ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]]
b1 = [[ 0.]
 [ 0.]
 [ 0.]
 [ 0.]]
W2 = [[ 0.  0.  0.  0.]]
b2 = [[ 0.]]

5.3 - The Loop

Forward propagation: forward_propagation()

Activation functions used and values to compute:

  • sigmoid(), which we implement ourselves
  • np.tanh(), provided by numpy
  • $Z^{[1]}$, $A^{[1]}$, $Z^{[2]}$ and $A^{[2]}$ ($A^{[2]}$ contains the predictions for all examples)
  • these intermediate results are cached because backpropagation needs them

5.3.1 forward propagation

# Function: sigmoid()

def sigmoid(z):
    return 1./(1 + np.exp(-z))
# GRADED FUNCTION: forward_propagation

def forward_propagation(X, parameters):
    """
    Argument:
    X -- input data of size (m, n_x)
    parameters -- python dictionary containing parameters  
    Returns:
    A2 -- The sigmoid output of the second activation
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    """
    W1 = parameters['W1'] # (4,2)
    b1 = parameters['b1'] # (4,1)
    W2 = parameters['W2'] # (1,4)
    b2 = parameters['b2'] # (1,1)

    Z1 = np.dot(W1, X.T) + b1  #(n_h, X.shape[0])
    A1 = np.tanh(Z1)           
    Z2 = np.dot(W2, A1) + b2   #(n_y, X.shape[0])
    A2 = sigmoid(Z2)

    assert(A2.shape == (1, X.shape[0]))

    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}

    return A2, cache

Test forward_propagation() with some test data:

# Test data
X_assess = np.random.randn(3, 2)

parameters = {'W1': np.array([[-0.00416758, -0.00056267],
                              [-0.02136196,  0.01640271],
                              [-0.01793436, -0.00841747],
                              [ 0.00502881, -0.01245288]]),
              'W2': np.array([[-0.01057952, -0.00909008,  0.00551454,  0.02292208]]),
              'b1': np.array([[ 0.],
                              [ 0.],
                              [ 0.],
                              [ 0.]]),
              'b2': np.array([[ 0.]])}
A2, cache = forward_propagation(X_assess, parameters)
print('Z1 ,shape = '+str(cache['Z1'].shape))
print('A1 ,shape = '+str(cache['A1'].shape))
print('Z2 ,shape = '+str(cache['Z2'].shape))
print('A2 ,shape = '+str(cache['A2'].shape))
Z1 ,shape = (4, 3)
A1 ,shape = (4, 3)
Z2 ,shape = (1, 3)
A2 ,shape = (1, 3)

5.3.2 cost function

$A^{[2]}$ (the Python variable "A2") holds one value $a^{[2](i)}$ per example: the model's predicted output for example $i$.

  • The cost function is:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log\left(a^{[2](i)}\right) + (1 - y^{(i)}) \log\left(1 - a^{[2](i)}\right) \right)$$

  • compute_cost() computes the cost $J$.

  • One way to compute $\sum_{i=1}^{m} y^{(i)} \log(a^{[2](i)})$ with numpy:

logprobs = np.multiply(np.log(A2),Y)
cost = - np.sum(logprobs)                # no need to use a for loop!
  • Equivalently, np.dot(np.log(A2), Y) computes the same sum as a matrix product.
# GRADED FUNCTION: compute_cost

def compute_cost(A2, Y, parameters):
    """
    Computes the cross-entropy cost given in equation (6)

    Arguments:
    A2 -- The sigmoid output of shape (1, number of examples)
    Y -- "true" labels vector of shape (number of examples,1)
    parameters -- python dictionary containing your parameters W1, b1, W2 and b2

    Returns:
    cost -- cross-entropy cost given by equation (6)
    """

    m = Y.shape[0] # number of examples

    # Compute the cross-entropy cost
    # logprobs = np.multiply(np.log(A2), Y.T) + np.multiply((1-Y.T),np.log(1-A2))
    # cost = -1*np.sum(logprobs)/m
    cost = -1*(np.dot(np.log(A2), Y) + np.dot(np.log(1-A2), (1-Y)))/m
    cost = np.squeeze(cost)     
    # makes sure cost is the dimension we expect. 
    # E.g., turns [[17]] into 17 
    #assert(isinstance(cost, float))

    return cost
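One caveat about compute_cost(): if A2 ever reaches exactly 0 or 1, np.log returns -inf. A clipped variant keeps the cost finite; this is a sketch, not part of the original assignment, and it drops the unused parameters argument.

# Sketch of a numerically safer cost: clip A2 away from 0 and 1 before the log.
def compute_cost_safe(A2, Y, eps=1e-12):
    m = Y.shape[0]
    A2 = np.clip(A2, eps, 1 - eps)
    cost = -(np.dot(np.log(A2), Y) + np.dot(np.log(1 - A2), 1 - Y)) / m
    return float(np.squeeze(cost))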

Test the cost function

  • test data: A2, Y_assess, parameters
Y_assess = np.random.randn(3, 1)
parameters = {'W1': np.array([[-0.00416758, -0.00056267],
                              [-0.02136196,  0.01640271],
                              [-0.01793436, -0.00841747],
                              [ 0.00502881, -0.01245288]]),
              'W2': np.array([[-0.01057952, -0.00909008,  0.00551454,  0.02292208]]),
              'b1': np.array([[ 0.],[ 0.],[ 0.],[ 0.]]),
              'b2': np.array([[ 0.]])}
A2 = (np.array([[ 0.5002307 ,  0.49985831,  0.50023963]]))
print("cost = " + str(compute_cost(A2, Y_assess, parameters)))
cost = 0.6934522895013014

5.3.3 backward propagation.

Backward propagation: backward_propagation()

Implement the six equations of backpropagation.

  • The superscript $(i)$ denotes the $i$-th example.

$$\frac{\partial J}{\partial z_2^{(i)}} = a^{[2](i)} - y^{(i)} \tag{1}$$

$$\frac{\partial J}{\partial W_2} = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial J}{\partial z_2^{(i)}} \, a^{[1](i)T} \tag{2}$$

$$\frac{\partial J}{\partial b_2} = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial J}{\partial z_2^{(i)}} \tag{3}$$

  • $*$ denotes the element-wise product of two vectors, which returns a vector of the same size.
  • $\tanh$: if $a = \tanh(z)$, then $\tanh'(z) = 1 - a^2$.

$$\frac{\partial J}{\partial z_1^{(i)}} = W_2^T \frac{\partial J}{\partial z_2^{(i)}} * (1 - a^{[1](i)2}) \tag{4}$$

$$\frac{\partial J}{\partial W_1} = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial J}{\partial z_1^{(i)}} \, x^{(i)T} \tag{5}$$

$$\frac{\partial J}{\partial b_1} = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial J}{\partial z_1^{(i)}} \tag{6}$$

  • The same six equations in vectorized (matrix) form (here $X$ has shape $(m, n_x)$, so equation (5) needs no transpose):

$$dZ^{[2]} = A^{[2]} - Y \tag{1}$$

$$dW^{[2]} = \frac{1}{m} dZ^{[2]} A^{[1]T} \tag{2}$$

$$db^{[2]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[2](i)} \tag{3}$$

$$dZ^{[1]} = W^{[2]T} dZ^{[2]} * (1 - A^{[1]2}) \tag{4}$$

$$dW^{[1]} = \frac{1}{m} dZ^{[1]} X \tag{5}$$

$$db^{[1]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[1](i)} \tag{6}$$

  • The notation used in the code is common in deep learning:
    • dW1 = $\frac{\partial J}{\partial W_1}$
    • db1 = $\frac{\partial J}{\partial b_1}$
    • dW2 = $\frac{\partial J}{\partial W_2}$
    • db2 = $\frac{\partial J}{\partial b_2}$
# GRADED FUNCTION: backward_propagation

def backward_propagation(parameters, cache, X, Y):
    """
    Arguments:
    parameters -- python dictionary containing our parameters 
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2".
    X -- input data of shape (number of examples, 2)
    Y -- "true" labels vector of shape (number of examples, 1)
    Returns:
    grads -- python dictionary containing your gradients with respect to different parameters
    """
    # X.shape = (400, 2)
    # Y.shape = (400, 1)
    m = X.shape[0]
    W1 = parameters['W1'] # W1.shape = (4, n_x)
    W2 = parameters['W2'] # W2.shape = (n_y, 4)

    A1 = cache['A1']      # A1.shape = (4, m)
    A2 = cache['A2']      # A2.shape = (n_y, m)
    # Backward propagation: calculate dW1, db1, dW2, db2. 
    dZ2 = A2 - Y.T                                              # dZ2.shape = (n_y, m)
    dW2 = np.dot(dZ2, A1.T)/m                                   # dW2.shape = (n_y, n_h)
    db2 = np.sum(dZ2, axis=1, keepdims=True)/m                  # db2.shape = (n_y, 1)
    dZ1 = np.multiply(np.dot(W2.T, dZ2), (1 - np.power(A1, 2))) # dZ1.shape = (4, m)
    dW1 = np.dot(dZ1, X)/m                                      # dW1.shape = (n_h, n_x)
    db1 = np.sum(dZ1, axis=1, keepdims=True)/m                  # db1.shape = (n_h, 1)

    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}

    return grads

Test the backward_propagation function.
Test data:

  • parameters (same as above)
  • cache
  • X_assess
  • Y_assess
X_assess = np.random.randn(3, 2)
Y_assess = np.random.randn(3, 1)
cache = {'A1': np.array([[-0.00616578,  0.0020626 ,  0.00349619],
                         [-0.05225116,  0.02725659, -0.02646251],
                         [-0.02009721,  0.0036869 ,  0.02883756],
                         [ 0.02152675, -0.01385234,  0.02599885]]),
         'A2': np.array([[ 0.5002307 ,  0.49985831,  0.50023963]]),
         'Z1': np.array([[-0.00616586,  0.0020626 ,  0.0034962 ],
                         [-0.05229879,  0.02726335, -0.02646869],
                         [-0.02009991,  0.00368692,  0.02884556],
                         [ 0.02153007, -0.01385322,  0.02600471]]),
         'Z2': np.array([[ 0.00092281, -0.00056678,  0.00095853]])}
grads = backward_propagation(parameters, cache, X_assess, Y_assess)
print ("dW1.shape = "+ str(grads["dW1"].shape))
print ("db1.shape = "+ str(grads["db1"].shape))
print ("dW2.shape = "+ str(grads["dW2"].shape))
print ("db2.shape = "+ str(grads["db2"].shape))
dW1.shape = (4, 2)
db1.shape = (4, 1)
dW2.shape = (1, 4)
db2.shape = (1, 1)
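As an extra sanity check (not part of the original assignment), the analytic gradients can be compared against centered finite differences of the cost, computed with the forward_propagation() and compute_cost() functions defined above; the two should agree closely for every parameter. A minimal sketch:

# Gradient check (sketch): compare analytic gradients with finite differences.
def numerical_gradient(name, parameters, X, Y, eps=1e-6):
    """Centered finite-difference estimate of dJ/d(parameters[name])."""
    grad = np.zeros_like(parameters[name])
    it = np.nditer(parameters[name], flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        original = parameters[name][idx]
        parameters[name][idx] = original + eps
        cost_plus = compute_cost(forward_propagation(X, parameters)[0], Y, parameters)
        parameters[name][idx] = original - eps
        cost_minus = compute_cost(forward_propagation(X, parameters)[0], Y, parameters)
        parameters[name][idx] = original   # restore the original value
        grad[idx] = (cost_plus - cost_minus) / (2 * eps)
        it.iternext()
    return grad

A2_check, cache_check = forward_propagation(X_assess, parameters)
grads_check = backward_propagation(parameters, cache_check, X_assess, Y_assess)
for name in ['W1', 'b1', 'W2', 'b2']:
    num = numerical_gradient(name, parameters, X_assess, Y_assess)
    print(name, 'max |analytic - numerical| =', np.max(np.abs(grads_check['d' + name] - num)))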

5.3.4 update parameters

Question: use the gradients (dW1, db1, dW2, db2) to update the parameters (W1, b1, W2, b2).

Gradient descent update rule:

  • $\theta = \theta - \alpha \frac{\partial J}{\partial \theta}$
  • $\alpha$ is the learning rate (a hyperparameter) and $\theta$ stands for any of the parameters.
    Choosing $\alpha$ well matters: a good learning rate lets the model converge to good weights much faster.
    (Figures: gradient descent with a well-chosen learning rate vs. a learning rate that is too large)
# GRADED FUNCTION: update_parameters

def update_parameters(parameters, grads, lr = 1.2):
    # lr : learning_rate
    """
    Arguments:
    parameters -- python dictionary containing your parameters 
    grads -- python dictionary containing your gradients 
    Returns:
    parameters -- python dictionary containing your updated parameters 
    """
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    dW1 = grads['dW1']
    db1 = grads['db1']
    dW2 = grads['dW2']
    db2 = grads['db2']

    W1 -= lr*dW1
    b1 -= lr*db1
    W2 -= lr*dW2
    b2 -= lr*db2

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters

5.4 - Integrate parts 5.1, 5.2 and 5.3 in nn_model()

# GRADED FUNCTION: nn_model

def nn_model(X, Y, n_h, num_iterations = 10000, print_cost=500, flag=0, lr=1.2):
    """
    Arguments:
    X -- dataset of shape (number of examples, 2)
    Y -- labels of shape (number of examples, 1)
    n_h -- size of the hidden layer
    num_iterations -- Number of iterations in gradient descent loop
    print_cost -- print the cost every print_cost iterations (0 disables printing)
    flag -- parameter initialization: 0 for random, 1 for all zeros
    lr -- learning rate
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    np.random.seed(3)
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]

    parameters = initialize_parameters(n_x, n_h, n_y, flag)
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    costs = []
    # Loop (gradient descent)
    for i in range(0, num_iterations):
        # Forward propagation. Inputs: "X, parameters". Outputs: "A2, cache".
        A2, cache = forward_propagation(X, parameters)
        # Cost function. Inputs: "A2, Y, parameters". Outputs: "cost".
        cost = compute_cost(A2, Y, parameters)
        # Backpropagation. Inputs: "parameters, cache, X, Y". Outputs: "grads".
        grads = backward_propagation(parameters, cache, X, Y)
        # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters".
        parameters = update_parameters(parameters, grads, lr)
        costs.append(cost)
        # Print the cost every print_cost iterations
        if print_cost and i % print_cost == 0:
            print ("Cost after iteration %i: %f" %(i, cost))

    return parameters,costs

5.5 Predictions

Predictions:

  • Use forward propagation to compute the predictions.

  • predictions = $y_{prediction} = \mathbb{1}\{activation > 0.5\} = \begin{cases} 1 & \text{if } activation > 0.5 \\ 0 & \text{otherwise} \end{cases}$

# GRADED FUNCTION: predict

def predict(parameters, X):
    """
    parameters -- python dictionary containing your parameters 
    X -- input data of size (m, n_x)

    Returns
    predictions -- vector of predictions of our model (red: 0 / blue: 1)
    """
    A2, cache = forward_propagation(X, parameters)
    predictions = (A2 > 0.5)*1.

    return predictions

6. Training the model (all-zero initialization)

# Build a model with a n_h-dimensional hidden layer
parameters,cost = nn_model(X, Y, n_h = 4, num_iterations=20000, print_cost=1000,lr=1.0,flag=1)
# Plot the decision boundary
plot_decision_boundary(lambda x: predict(parameters, x), X, Y)
plt.title("Decision Boundary for hidden layer size " + str(4))
Cost after iteration 0: 0.693147
Cost after iteration 1000: 0.693147
Cost after iteration 2000: 0.693147
Cost after iteration 3000: 0.693147
Cost after iteration 4000: 0.693147
Cost after iteration 5000: 0.693147
Cost after iteration 6000: 0.693147
Cost after iteration 7000: 0.693147
Cost after iteration 8000: 0.693147
Cost after iteration 9000: 0.693147
Cost after iteration 10000: 0.693147
Cost after iteration 11000: 0.693147
Cost after iteration 12000: 0.693147
Cost after iteration 13000: 0.693147
Cost after iteration 14000: 0.693147
Cost after iteration 15000: 0.693147
Cost after iteration 16000: 0.693147
Cost after iteration 17000: 0.693147
Cost after iteration 18000: 0.693147
Cost after iteration 19000: 0.693147
Text(0.5,1,'Decision Boundary for hidden layer size 4')

(Figure: decision boundary with all-zero initialization)

plt.figure(figsize=(14,6))
plt.title('Cost curve')
plt.grid()
plt.plot(cost)

(Figure: cost curve with all-zero initialization; the curve stays flat at about 0.693)

With all-zero initialization the cost never decreases: all hidden units start out identical and, as the quick check below shows, the gradients for W1, b1 and W2 are exactly zero, so the symmetry is never broken and the model learns nothing.
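A quick check (a sketch, not in the original post) makes this concrete: with all-zero parameters, the gradients for W1, b1 and W2 come out exactly zero, so gradient descent can never break the symmetry.

# With all-zero parameters: A1 = tanh(0) = 0, so dW2 = dZ2·A1.T/m = 0,
# and W2 = 0, so dZ1 = W2.T·dZ2 * (1 - A1**2) = 0, hence dW1 = db1 = 0.
zero_params = initialize_parameters(n_x, n_h, n_y, flag=1)
A2_zero, cache_zero = forward_propagation(X, zero_params)
grads_zero = backward_propagation(zero_params, cache_zero, X, Y)
for name in ['dW1', 'db1', 'dW2', 'db2']:
    print(name, 'max |value| =', np.abs(grads_zero[name]).max())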

6.1 Training the model (random initialization)

parameters,cost = nn_model(X, Y, n_h = 4, num_iterations=20000, print_cost=1000,lr=1.0)
# Plot the decision boundary
plot_decision_boundary(lambda x: predict(parameters, x), X, Y)
plt.title("Decision Boundary for hidden layer size " + str(4))
Cost after iteration 0: 0.693048
Cost after iteration 1000: 0.513391
Cost after iteration 2000: 0.517645
Cost after iteration 3000: 0.516130
Cost after iteration 4000: 0.515024
Cost after iteration 5000: 0.514154
Cost after iteration 6000: 0.513449
Cost after iteration 7000: 0.512871
Cost after iteration 8000: 0.512389
Cost after iteration 9000: 0.511984
Cost after iteration 10000: 0.511640
Cost after iteration 11000: 0.511343
Cost after iteration 12000: 0.511086
Cost after iteration 13000: 0.510861
Cost after iteration 14000: 0.510661
Cost after iteration 15000: 0.510484
Cost after iteration 16000: 0.510325
Cost after iteration 17000: 0.510182
Cost after iteration 18000: 0.510052
Cost after iteration 19000: 0.509933

Text(0.5,1,'Decision Boundary for hidden layer size 4')

plt.figure(figsize=(14,6))
plt.title('Cost curve')
plt.grid()
plt.plot(cost)

(Figure: cost curve with random initialization)

# Print accuracy
predictions = predict(parameters, X)
accuracy = float((np.dot(predictions,Y) + np.dot(1-predictions,1-Y))/float(Y.shape[0])*100)
print ('Accuracy: %d '%accuracy+'%')
Accuracy: 68 %

The model's accuracy is not very high. Three things can be tuned (a grid search over them follows below):

  • increase the number of training iterations
  • increase the number of hidden units
  • find a suitable learning rate
# 49 combinations of (n_h, lr)
n_hs = [3, 5, 7, 9, 10, 20 , 30]
lrs  = [1.5, 2.0, 2.5, 3.0, 3.3, 3.5, 4.0]
params = []
for n_h in n_hs:
    for lr in lrs:
        params.append((n_h, lr))
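The nested loops above can also be written with itertools.product, which yields the same (n_h, lr) pairs in the same order:

# Equivalent construction of the 49 (n_h, lr) combinations with itertools.
import itertools
params = list(itertools.product(n_hs, lrs))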

Train a model for each combination and plot its decision boundary:

# This may take about 2 minutes to run
plt.figure(figsize=(70,70))
Costs = []
for i, (n_h,lr) in enumerate(params):
    plt.subplot(7, 7, i+1)
    plt.title('n_h : %d, lr : %f' % (n_h,lr))
    parameters,costs = nn_model(X, Y, n_h, num_iterations = 10000, lr=lr,print_cost=0)
    Costs.append(costs)
    plot_decision_boundary(lambda x: predict(parameters, x), X, Y)
    predictions = predict(parameters, X)
    accuracy = float((np.dot(predictions,Y) + np.dot(1-predictions,1-Y))/float(Y.shape[0])*100)
    print ("Accuracy for {} hidden units, learning rate: {},{}%".format(n_h,lr, accuracy))
Accuracy for 3 hidden units, learning rate: 1.5,68.5%
Accuracy for 3 hidden units, learning rate: 2.0,67.0%
Accuracy for 3 hidden units, learning rate: 2.5,67.0%
Accuracy for 3 hidden units, learning rate: 3.0,67.0%
Accuracy for 3 hidden units, learning rate: 3.3,68.25%
Accuracy for 3 hidden units, learning rate: 3.5,67.25%
Accuracy for 3 hidden units, learning rate: 4.0,78.0%
Accuracy for 5 hidden units, learning rate: 1.5,67.25%
Accuracy for 5 hidden units, learning rate: 2.0,74.0%
Accuracy for 5 hidden units, learning rate: 2.5,68.75%
Accuracy for 5 hidden units, learning rate: 3.0,92.25%
Accuracy for 5 hidden units, learning rate: 3.3,91.75%
Accuracy for 5 hidden units, learning rate: 3.5,92.5%
Accuracy for 5 hidden units, learning rate: 4.0,92.5%
Accuracy for 7 hidden units, learning rate: 1.5,92.0%
Accuracy for 7 hidden units, learning rate: 2.0,92.75%
Accuracy for 7 hidden units, learning rate: 2.5,92.75%
Accuracy for 7 hidden units, learning rate: 3.0,92.75%
Accuracy for 7 hidden units, learning rate: 3.3,92.5%
Accuracy for 7 hidden units, learning rate: 3.5,92.25%
Accuracy for 7 hidden units, learning rate: 4.0,92.0%
Accuracy for 9 hidden units, learning rate: 1.5,92.0%
Accuracy for 9 hidden units, learning rate: 2.0,92.75%
Accuracy for 9 hidden units, learning rate: 2.5,92.75%
Accuracy for 9 hidden units, learning rate: 3.0,92.75%
Accuracy for 9 hidden units, learning rate: 3.3,92.5%
Accuracy for 9 hidden units, learning rate: 3.5,92.5%
Accuracy for 9 hidden units, learning rate: 4.0,92.75%
Accuracy for 10 hidden units, learning rate: 1.5,92.75%
Accuracy for 10 hidden units, learning rate: 2.0,92.75%
Accuracy for 10 hidden units, learning rate: 2.5,92.75%
Accuracy for 10 hidden units, learning rate: 3.0,92.75%
Accuracy for 10 hidden units, learning rate: 3.3,92.75%
Accuracy for 10 hidden units, learning rate: 3.5,92.75%
Accuracy for 10 hidden units, learning rate: 4.0,93.0%
Accuracy for 20 hidden units, learning rate: 1.5,93.75%
Accuracy for 20 hidden units, learning rate: 2.0,92.75%
Accuracy for 20 hidden units, learning rate: 2.5,93.0%
Accuracy for 20 hidden units, learning rate: 3.0,92.75%
Accuracy for 20 hidden units, learning rate: 3.3,91.0%
Accuracy for 20 hidden units, learning rate: 3.5,91.25%
Accuracy for 20 hidden units, learning rate: 4.0,89.5%
Accuracy for 30 hidden units, learning rate: 1.5,92.75%
Accuracy for 30 hidden units, learning rate: 2.0,93.75%
Accuracy for 30 hidden units, learning rate: 2.5,91.75%
Accuracy for 30 hidden units, learning rate: 3.0,91.5%
Accuracy for 30 hidden units, learning rate: 3.3,92.75%
Accuracy for 30 hidden units, learning rate: 3.5,92.0%
Accuracy for 30 hidden units, learning rate: 4.0,92.5%

In the grid of decision boundaries below, rows correspond to n_hs and columns to lrs:

  • n_hs = [3, 5, 7, 9, 10, 20, 30]
  • lrs = [1.5, 2.0, 2.5, 3.0, 3.3, 3.5, 4.0]
    (Figure: 7×7 grid of decision boundaries, one per (n_h, lr) combination)

Cost curves

plt.figure(figsize=(140,140))
for i, (n_h,lr) in enumerate(params):
    plt.subplot(7, 7, i+1)
    plt.grid()
    plt.ylim(0,1)
    plt.title('n_h : %d, lr : %f' % (n_h,lr))
    plt.plot(Costs[i],c='green')

(Figure: 7×7 grid of cost curves, one per (n_h, lr) combination)

The figure above shows the cost curves of all 49 combinations; rows correspond to n_hs and columns to lrs:

  • n_hs = [3, 5, 7, 9, 10, 20, 30]
  • lrs = [1.5, 2.0, 2.5, 3.0, 3.3, 3.5, 4.0]
plt.figure(figsize=(14,56))
for i,n_h,n in zip(np.arange(0,49,7), n_hs, range(7)):
    plt.subplot(7,1, n+1)
    plt.ylim(0,0.7)
    plt.grid()
    plt.title('Hidden Layer of size %d' % n_h)
    for lr,cost in zip(lrs,Costs[i: i+7]):
        plt.plot(cost, label='lr=%.2f'%lr)
    plt.legend()

(Figure: cost curves grouped by hidden-layer size, one subplot per n_h with one curve per learning rate)

From the plots above, a few promising (n_h, lr) combinations stand out:

  • (7, 2.5)
  • (9, 2.5)
  • (10, 3.5)
  • (20, 2.0)

7. Find the best combination

Visualize the cost curves of the four selected parameter combinations.

plt.figure(figsize=(14,8))
plt.grid()
plt.ylim(0.1,0.25)
plt.title('Find best combination')
for n_h, lr in [(7,2.5),(9,2.5),(10,3.5),(20,2.0)]:
    i = n_hs.index(n_h)
    j = lrs.index(lr)
    plt.plot(Costs[i*7:(i+1)*7][j], label='n_h:%d,lr=%.2f'%(n_h,lr))
plt.legend()

(Figure: cost curves of the four selected (n_h, lr) combinations)

The yellow curve looks good: after about 5,500 iterations it is smooth and it also decreases quickly. Next, let's increase the number of iterations and see whether the yellow curve starts to oscillate again.

plt.figure(figsize=(14,8))
plt.grid()
plt.ylim(0.1,0.25)
for i, (n_h,lr) in enumerate([(7,2.0),(10,4.0),(20,1.5),(30,2.0)]):
    parameters,costs = nn_model(X, Y, n_h, num_iterations = 30000, lr=lr,print_cost=0)
    plt.plot(costs, label='n_h=%d,lr=%.2f'%(n_h,lr))
plt.legend()

(Figure: cost curves over 30,000 iterations for the re-trained combinations)
  

After increasing the number of iterations, only the red curve still oscillates, and its period keeps growing; the other curves are essentially stable after about 15,000 iterations. So (9, 2.5) may be the best combination: it fits the data well, although it may start to overfit. Overfitting is beyond the scope of this post. The chosen settings are listed below, followed by a short training sketch.

  • n_h = 9
  • lr = 2.5
  • num_iterations: 15000
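Finally, here is a sketch of training one more model with the hyperparameters chosen above; the exact accuracy it reaches is not quoted here, since it depends on the run.

# Train a final model with the chosen hyper-parameters (sketch only).
parameters_best, costs_best = nn_model(X, Y, n_h=9, num_iterations=15000,
                                       print_cost=5000, lr=2.5)
predictions_best = predict(parameters_best, X)
accuracy_best = float((np.dot(predictions_best, Y) +
                       np.dot(1 - predictions_best, 1 - Y)) / float(Y.shape[0]) * 100)
print('Training accuracy: %.2f %%' % accuracy_best)

plot_decision_boundary(lambda x: predict(parameters_best, x), X, Y)
plt.title('Decision boundary: n_h = 9, lr = 2.5')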
