Build a neural network with one hidden layer
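Copyright notice: this is an original article by the author; do not reproduce it without the author's permission.
The previous post covered how to implement a Logistic Regression classifier. A neural network is closely related to LR: it can be viewed as several LR units stacked together, so once you understand Logistic Regression, a neural network is not hard to follow.
What this post covers
- Implement a binary classification neural network with a single hidden layer
- Use the tanh function as the nonlinear activation of the hidden units
- Compute the cross-entropy loss
- Implement forward and backward propagation
- Update the parameters
- Choose the hyperparameters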
1 - Packages
- numpy
- sklearn:scikit-learn
- matplotlib
# Package imports
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
import sklearn.linear_model
%matplotlib inline
np.random.seed(1) # set a seed so that the results are consistent
2. Helper functions
# Generate the training data
def create_dataset(m = 400, D = 2):
    """
    m : number of examples
    D : number of features
    N : number of points per class
    X : data matrix, each row is a single example
    Y : label vector
    """
    np.random.seed(1)
    N = int(m/2)
    X = np.zeros((m,D))
    Y = np.zeros((m,1), dtype='uint8') # (0 for red, 1 for blue)
    a = 4 # maximum ray of the flower
    for j in range(2):
        ix = range(N*j,N*(j+1))
        t = np.linspace(j*3.12,(j+1)*3.12,N) + np.random.randn(N)*0.2 # theta
        r = a*np.sin(4*t) + np.random.randn(N)*0.2 # radius
        X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
        Y[ix] = j
    # X:shape(m, D)
    # Y:shape(m, 1)
    return X, Y
# Plot the model's decision boundary
def plot_decision_boundary(model, X, y):
    # Set min and max values and give it some padding
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid
    Z = model(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.ylabel('x2')
    plt.xlabel('x1')
    plt.scatter(X[:,0], X[:,1], c=y[:,0], cmap=plt.cm.Spectral)
3. Create and overview dataset
- Data creation function: create_dataset()
- Randomly generates training data for two classes
3.1 generate data
X, Y = create_dataset(400, 2)
print ('The shape of X is: ' + str(X.shape))
print ('The shape of Y is: ' + str(Y.shape))
print ('We have m = %d training examples!' % (X.shape[0]))
The shape of X is: (400, 2)
The shape of Y is: (400, 1)
We have m = 400 training examples!
3.2 visualize dataset
- Goal: build a model that fits this data
# Visualize the data:
plt.scatter(X[:, 0], X[:, 1], c=Y[:,0], s=30, cmap=plt.cm.Spectral);
training dataset:
- a numpy-array (matrix) X: features (x1, x2)
- a numpy-array (vector) Y: labels (red:0, blue:1).
4. Simple Logistic Regression
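Before implementing the fully connected network, we first fit the data with a Logistic Regression classifier to see how LR performs on this problem. With sklearn, Logistic Regression takes just two lines of code.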
4.1 train logistic regression classifier
# Train the logistic regression classifier
clf = sklearn.linear_model.LogisticRegressionCV()
clf.fit(X, Y.reshape(X.shape[0],))
LogisticRegressionCV(Cs=10, class_weight=None, cv=None, dual=False,
fit_intercept=True, intercept_scaling=1.0, max_iter=100,
multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
refit=True, scoring=None, solver='lbfgs', tol=0.0001, verbose=0)
4.2 plot decision boundary
# Plot the decision boundary for logistic regression
plot_decision_boundary(lambda x: clf.predict(x), X, Y)
plt.title("Logistic Regression")
# Print accuracy
LR_predictions = clf.predict(X)
print ('Accuracy of logistic regression: %d ' % float((np.dot(Y[:,0],LR_predictions) + np.dot(1-Y[:,0],1-LR_predictions))/float(Y.size)*100) +
'% ' + "(percentage of correctly labelled datapoints)")
Accuracy of logistic regression: 47 % (percentage of correctly labelled datapoints)
Output:
Accuracy | 47% |
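The classification accuracy is only 47%: logistic regression does not fit this data well, because the two classes are not linearly separable. Next we use a neural network to classify the data. Let's try this now!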
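The accuracy expression used above counts the positions where label and prediction are both 1 (the first dot product) plus the positions where both are 0 (the second dot product). A minimal sketch with made-up arrays (`y_true` and `y_pred` are hypothetical, not the post's data) suggests it agrees with the more direct `np.mean` form:

```python
import numpy as np

# Hypothetical labels and predictions, for illustration only
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

# Dot-product form used in this post: matching 1s plus matching 0s
matches = np.dot(y_true, y_pred) + np.dot(1 - y_true, 1 - y_pred)
acc_dot = matches / y_true.size * 100

# Equivalent, more direct form: fraction of positions that agree
acc_mean = np.mean(y_true == y_pred) * 100

print(acc_dot, acc_mean)
```

Four of the five made-up predictions agree with the labels, so both forms should report the same percentage.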
5 - Neural Network model
a Neural Network with a single hidden layer.
Here is our model:
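Mathematically:
For one example $x^{(i)}$:
$$z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}$$
$$a^{[1](i)} = \tanh(z^{[1](i)})$$
$$z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}$$
$$\hat{y}^{(i)} = a^{[2](i)} = \sigma(z^{[2](i)})$$
The prediction is 1 if $a^{[2](i)} > 0.5$ and 0 otherwise.
The cost $J$ over all $m$ examples is computed as follows:
$$J = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log a^{[2](i)} + (1-y^{(i)})\log\left(1-a^{[2](i)}\right)\right)$$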
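Reminder: the steps to build a neural network:
- Define the network structure (# of input units, # of hidden units, etc).
- Initialize the model's parameters
Loop:
- Forward propagation
- Compute the loss
- Backward propagation to compute the gradients
- Update the parameters (gradient descent)
We implement helper functions for steps 1-3 (structure, initialization, and the loop), then combine them in nn_model(). Finally we train the model to learn the parameters and make predictions on new data.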
5.1 - Defining the neural network structure
Exercise: Define three variables:
- n_x: the size of the input layer
- n_h: the size of the hidden layer (set this to 4)
- n_y: the size of the output layer
# GRADED FUNCTION: layer_sizes
def layer_sizes(X, Y):
    """
    Arguments:
    X -- input dataset of shape (number of examples, number of features)
    Y -- labels of shape (number of examples, output)
    Returns:
    n_x -- the size of the input layer
    n_h -- the size of the hidden layer
    n_y -- the size of the output layer
    """
    n_x = X.shape[1]
    n_h = 4 # hard code
    n_y = Y.shape[1]
    return (n_x, n_h, n_y)
test function
(n_x, n_h, n_y) = layer_sizes(X, Y)
print("The size of the input layer is: n_x = " + str(n_x))
print("The size of the hidden layer is: n_h = " + str(n_h))
print("The size of the output layer is: n_y = " + str(n_y))
The size of the input layer is: n_x = 2
The size of the hidden layer is: n_h = 4
The size of the output layer is: n_y = 1
5.2 - Initialize the model’s parameters
Parameter initialization is done by the function initialize_parameters().
Initialization methods:
- Random initialization
  - Use: np.random.randn(a,b) * 0.01 to randomly initialize a matrix of shape (a,b).
- All-zero initialization
  - Use: np.zeros((a,b)) to initialize a matrix of shape (a,b) with 0.
- Try both initialization methods and observe their effect on the model
# GRADED FUNCTION: initialize_parameters
# Two initialization schemes are provided, selected by the flag argument
def initialize_parameters(n_x, n_h, n_y, flag=0):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer
    flag = 0, random initialization
    flag = 1, all-zero initialization
    Returns:
    params -- python dictionary containing your parameters:
    W1 -- weight matrix of shape (n_h, n_x)
    b1 -- bias vector of shape (n_h, 1)
    W2 -- weight matrix of shape (n_y, n_h)
    b2 -- bias vector of shape (n_y, 1)
    """
    np.random.seed(2) # fix the seed so the random initialization is reproducible
    if flag:
        W1 = np.zeros((n_h, n_x))
        b1 = np.zeros((n_h, 1))
        W2 = np.zeros((n_y, n_h))
        b2 = np.zeros((n_y, 1))
    else:
        W1 = np.random.randn(n_h, n_x)*0.01
        b1 = np.zeros((n_h, 1))
        W2 = np.random.randn(n_y, n_h)*0.01
        b2 = np.zeros((n_y, 1))
    assert (W1.shape == (n_h, n_x))
    assert (b1.shape == (n_h, 1))
    assert (W2.shape == (n_y, n_h))
    assert (b2.shape == (n_y, 1))
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    return parameters
test function: initialize_parameters()
- random initialization
- all-zero initialization
parameters = initialize_parameters(n_x, n_h, n_y)
print("W1 : shape " + str(parameters["W1"].shape))
print("b1 : shape " + str(parameters["b1"].shape))
print("W2 : shape " + str(parameters["W2"].shape))
print("b2 : shape " + str(parameters["b2"].shape))
print('------------')
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
W1 : shape (4, 2)
b1 : shape (4, 1)
W2 : shape (1, 4)
b2 : shape (1, 1)
------------
W1 = [[-0.00416758 -0.00056267]
[-0.02136196 0.01640271]
[-0.01793436 -0.00841747]
[ 0.00502881 -0.01245288]]
b1 = [[ 0.]
[ 0.]
[ 0.]
[ 0.]]
W2 = [[-0.01057952 -0.00909008 0.00551454 0.02292208]]
b2 = [[ 0.]]
parameters_0 = initialize_parameters(n_x, n_h, n_y, 1)
print("W1 = " + str(parameters_0["W1"]))
print("b1 = " + str(parameters_0["b1"]))
print("W2 = " + str(parameters_0["W2"]))
print("b2 = " + str(parameters_0["b2"]))
W1 = [[ 0. 0.]
[ 0. 0.]
[ 0. 0.]
[ 0. 0.]]
b1 = [[ 0.]
[ 0.]
[ 0.]
[ 0.]]
W2 = [[ 0. 0. 0. 0.]]
b2 = [[ 0.]]
5.3 - The Loop
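Forward propagation: forward_propagation()
Activation functions used and values to compute:
- sigmoid(), which we need to implement
- np.tanh(), provided by numpy
- Z1, A1, Z2 and A2 (A2 contains the predicted outputs for all examples).
- These results are cached because backward propagation needs them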
5.3.1 forward propagation
# Function: sigmoid()
def sigmoid(z):
    return 1./(1 + np.exp(-z))
# GRADED FUNCTION: forward_propagation
def forward_propagation(X, parameters):
    """
    Argument:
    X -- input data of size (m, n_x)
    parameters -- python dictionary containing parameters
    Returns:
    A2 -- The sigmoid output of the second activation
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    """
    W1 = parameters['W1'] # (4,2)
    b1 = parameters['b1'] # (4,1)
    W2 = parameters['W2'] # (1,4)
    b2 = parameters['b2'] # (1,1)
    Z1 = np.dot(W1, X.T) + b1 #(n_h, X.shape[0])
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2 #(n_y, X.shape[0])
    A2 = sigmoid(Z2)
    assert(A2.shape == (1, X.shape[0]))
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    return A2, cache
Test forward_propagation() on some test data
# Test data
X_assess = np.random.randn(3, 2)
parameters = {'W1': np.array([[-0.00416758, -0.00056267],
[-0.02136196, 0.01640271],
[-0.01793436, -0.00841747],
[ 0.00502881, -0.01245288]]),
'W2': np.array([[-0.01057952, -0.00909008, 0.00551454, 0.02292208]]),
'b1': np.array([[ 0.],
[ 0.],
[ 0.],
[ 0.]]),
'b2': np.array([[ 0.]])}
A2, cache = forward_propagation(X_assess, parameters)
print('Z1 ,shape = '+str(cache['Z1'].shape))
print('A1 ,shape = '+str(cache['A1'].shape))
print('Z2 ,shape = '+str(cache['Z2'].shape))
print('A2 ,shape = '+str(cache['A2'].shape))
Z1 ,shape = (4, 3)
A1 ,shape = (4, 3)
Z2 ,shape = (1, 3)
A2 ,shape = (1, 3)
5.3.2 cost function
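Forward propagation has produced $A^{[2]}$ (the Python variable "A2"); each element $a^{[2](i)}$ of this matrix is the model's predicted output for example $i$.
- The cost function is as follows:
$$J = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log a^{[2](i)} + (1-y^{(i)})\log\left(1-a^{[2](i)}\right)\right)$$
compute_cost(): computes the cost.
Computing the $y\log a$ term of the cross-entropy with numpy (Y has shape (m, 1) here, so it is transposed to match A2):
logprobs = np.multiply(np.log(A2),Y.T)
cost = - np.sum(logprobs) # no need to use a for loop!
- np.dot(np.log(A2), Y) gives the same sum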
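One practical concern with the cross-entropy formula is that `np.log` returns `-inf` when a prediction is exactly 0 or 1. The following is a minimal sketch, not the post's implementation: the helper name `cross_entropy` and the clipping margin `eps` are assumptions introduced here for illustration, and the shapes follow this post's convention (A2 is (1, m), Y is (m, 1)).

```python
import numpy as np

def cross_entropy(A2, Y, eps=1e-12):
    """Binary cross-entropy; A2 has shape (1, m), Y has shape (m, 1)."""
    m = Y.shape[0]
    # eps is an assumed safety margin so that log(0) never occurs
    A2 = np.clip(A2, eps, 1 - eps)
    cost = -(np.dot(np.log(A2), Y) + np.dot(np.log(1 - A2), 1 - Y)) / m
    return float(np.squeeze(cost))

# Hypothetical labels, for illustration only
Y_demo = np.array([[0], [1], [1], [0]])

# A model that outputs 0.5 for every example
cost_half = cross_entropy(np.full((1, 4), 0.5), Y_demo)

# A model that is exactly right: stays finite thanks to the clipping
cost_exact = cross_entropy(np.array([[0.0, 1.0, 1.0, 0.0]]), Y_demo)

print(cost_half, cost_exact)
```

A model that predicts 0.5 everywhere has cost ln 2 ≈ 0.693147 regardless of the labels, which is a handy sanity value to keep in mind when reading training logs.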
# GRADED FUNCTION: compute_cost
def compute_cost(A2, Y, parameters):
    """
    Computes the cross-entropy cost given in equation (13)
    Arguments:
    A2 -- The sigmoid output of shape (1, number of examples)
    Y -- "true" labels vector of shape (number of examples,1)
    parameters -- python dictionary containing your parameters W1, b1, W2 and b2
    Returns:
    cost -- cross-entropy cost given equation (13)
    """
    m = Y.shape[0] # number of examples
    # Compute the cross-entropy cost
    # logprobs = np.multiply(np.log(A2), Y.T) + np.multiply((1-Y.T),np.log(1-A2))
    # cost = -1*np.sum(logprobs)/m
    cost = -1*(np.dot(np.log(A2), Y) + np.dot(np.log(1-A2), (1-Y)))/m
    cost = np.squeeze(cost)
    # makes sure cost is the dimension we expect.
    # E.g., turns [[17]] into 17
    #assert(isinstance(cost, float))
    return cost
test cost function
- Test data:
- A2, Y_assess, parameters
Y_assess = np.random.randn(3, 1)
parameters = {'W1': np.array([[-0.00416758, -0.00056267],
[-0.02136196, 0.01640271],
[-0.01793436, -0.00841747],
[ 0.00502881, -0.01245288]]),
'W2': np.array([[-0.01057952, -0.00909008, 0.00551454, 0.02292208]]),
'b1': np.array([[ 0.],[ 0.],[ 0.],[ 0.]]),
'b2': np.array([[ 0.]])}
A2 = (np.array([[ 0.5002307 , 0.49985831, 0.50023963]]))
print("cost = " + str(compute_cost(A2, Y_assess, parameters)))
cost = 0.6934522895013014
5.3.3 backward propagation.
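Backward propagation: backward_propagation()
Implement the six backpropagation equations. With X of shape (m, n_x) and Y of shape (m, 1), as used in this post, the vectorized form is:
$$dZ^{[2]} = A^{[2]} - Y^{T}$$
$$dW^{[2]} = \frac{1}{m}\, dZ^{[2]} A^{[1]T}$$
$$db^{[2]} = \frac{1}{m}\sum_{i=1}^{m} dZ^{[2](i)}$$
$$dZ^{[1]} = W^{[2]T} dZ^{[2]} * \left(1 - (A^{[1]})^{2}\right)$$
$$dW^{[1]} = \frac{1}{m}\, dZ^{[1]} X$$
$$db^{[1]} = \frac{1}{m}\sum_{i=1}^{m} dZ^{[1](i)}$$
- The superscript $(i)$ denotes the $i$-th example
- $*$: element-wise product of two arrays of the same shape; the result has that shape too
- $1 - (A^{[1]})^{2}$: the derivative of the tanh activation, since $\tanh'(z) = 1 - \tanh^{2}(z)$
- The notation you will use is common in deep learning coding:
- dW1 = $\frac{\partial J}{\partial W^{[1]}}$
- db1 = $\frac{\partial J}{\partial b^{[1]}}$
- dW2 = $\frac{\partial J}{\partial W^{[2]}}$
- db2 = $\frac{\partial J}{\partial b^{[2]}}$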
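A common way to validate backpropagation equations like these is a numerical gradient check: perturb each parameter by a small `h`, measure the change in cost with a central difference, and compare against the analytic gradient. The sketch below is self-contained and only an illustration; the helper names (`forward`, `cost_fn`, `backward`, `max_grad_error`), the step `h`, and the random test data are assumptions introduced here, using the same shapes as this post (X is (m, n_x), Y is (m, 1)).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, p):
    # tanh hidden layer followed by a sigmoid output unit
    A1 = np.tanh(np.dot(p["W1"], X.T) + p["b1"])
    A2 = sigmoid(np.dot(p["W2"], A1) + p["b2"])
    return A1, A2

def cost_fn(X, Y, p):
    _, A2 = forward(X, p)
    m = X.shape[0]
    cost = -(np.dot(np.log(A2), Y) + np.dot(np.log(1 - A2), 1 - Y)) / m
    return float(np.squeeze(cost))

def backward(X, Y, p):
    # Analytic gradients, every one averaged over the m examples
    m = X.shape[0]
    A1, A2 = forward(X, p)
    dZ2 = A2 - Y.T
    dZ1 = np.dot(p["W2"].T, dZ2) * (1 - A1 ** 2)
    return {"W1": np.dot(dZ1, X) / m,
            "b1": np.sum(dZ1, axis=1, keepdims=True) / m,
            "W2": np.dot(dZ2, A1.T) / m,
            "b2": np.sum(dZ2, axis=1, keepdims=True) / m}

def max_grad_error(X, Y, p, h=1e-5):
    # Largest absolute gap between analytic and central-difference gradients
    grads = backward(X, Y, p)
    worst = 0.0
    for name in p:
        for idx in np.ndindex(p[name].shape):
            old = p[name][idx]
            p[name][idx] = old + h
            c_plus = cost_fn(X, Y, p)
            p[name][idx] = old - h
            c_minus = cost_fn(X, Y, p)
            p[name][idx] = old  # restore the original value
            numeric = (c_plus - c_minus) / (2 * h)
            worst = max(worst, abs(numeric - grads[name][idx]))
    return worst

# Small random problem, for illustration only
rng = np.random.RandomState(0)
X_demo = rng.randn(5, 2)
Y_demo = (rng.rand(5, 1) > 0.5).astype(float)
p_demo = {"W1": rng.randn(4, 2) * 0.5, "b1": np.zeros((4, 1)),
          "W2": rng.randn(1, 4) * 0.5, "b2": np.zeros((1, 1))}
err = max_grad_error(X_demo, Y_demo, p_demo)
print("max gradient error:", err)
```

If one of the gradients were missing its 1/m factor, the reported error would be of the same order as the gradient itself rather than close to zero, so this kind of check is a cheap way to catch scaling mistakes.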
# GRADED FUNCTION: backward_propagation
def backward_propagation(parameters, cache, X, Y):
    """
    Arguments:
    parameters -- python dictionary containing our parameters
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2".
    X -- input data of shape (number of examples, n_x)
    Y -- "true" labels vector of shape (number of examples, 1)
    Returns:
    grads -- python dictionary containing your gradients with respect to different parameters
    """
    # X.shape = (400, 2)
    # Y.shape = (400, 1)
    m = X.shape[0]
    W1 = parameters['W1'] # W1.shape = (4, n_x)
    W2 = parameters['W2'] # W2.shape = (n_y, 4)
    A1 = cache['A1'] # A1.shape = (4, m)
    A2 = cache['A2'] # A2.shape = (n_y, m)
    # Backward propagation: calculate dW1, db1, dW2, db2.
    dZ2 = A2 - Y.T # dZ2.shape = (n_y, m)
    dW2 = np.dot(dZ2, A1.T)/m # dW2.shape = (n_y, n_h)
    db2 = np.sum(dZ2, axis=1, keepdims=True)/m # db2.shape = (n_y, 1)
    dZ1 = np.multiply(np.dot(W2.T, dZ2), (1 - np.power(A1, 2))) # dZ1.shape = (4, m)
    dW1 = np.dot(dZ1, X)/m # dW1.shape = (n_h, n_x); averaged over m like the other gradients
    db1 = np.sum(dZ1, axis=1, keepdims=True)/m # db1.shape = (n_h, 1)
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    return grads
test backward_propagation function
Test data:
- parameters (same as above)
- cache
- X_assess
- Y_assess
X_assess = np.random.randn(3, 2)
Y_assess = np.random.randn(3, 1)
cache = {'A1': np.array([[-0.00616578, 0.0020626 , 0.00349619],
[-0.05225116, 0.02725659, -0.02646251],
[-0.02009721, 0.0036869 , 0.02883756],
[ 0.02152675, -0.01385234, 0.02599885]]),
'A2': np.array([[ 0.5002307 , 0.49985831, 0.50023963]]),
'Z1': np.array([[-0.00616586, 0.0020626 , 0.0034962 ],
[-0.05229879, 0.02726335, -0.02646869],
[-0.02009991, 0.00368692, 0.02884556],
[ 0.02153007, -0.01385322, 0.02600471]]),
'Z2': np.array([[ 0.00092281, -0.00056678, 0.00095853]])}
grads = backward_propagation(parameters, cache, X_assess, Y_assess)
print ("dW1.shape = "+ str(grads["dW1"].shape))
print ("db1.shape = "+ str(grads["db1"].shape))
print ("dW2.shape = "+ str(grads["dW2"].shape))
print ("db2.shape = "+ str(grads["db2"].shape))
dW1.shape = (4, 2)
db1.shape = (4, 1)
dW2.shape = (1, 4)
db2.shape = (1, 1)
5.3.4 update parameters
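Question: use the gradients (dW1, db1, dW2, db2) to update the parameters (W1, b1, W2, b2).
Gradient descent:
- $\theta = \theta - \alpha \frac{\partial J}{\partial \theta}$, where $\alpha$ is the learning rate and $\theta$ represents a parameter.
The choice of the hyperparameter $\alpha$ matters: a good value lets the model reach good weights faster.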
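To see why the learning rate matters, here is a toy sketch of the same update rule on a one-dimensional cost J(θ) = θ², whose gradient is 2θ. The function `gradient_descent` and the two learning rates are made up for illustration and are unrelated to the network's actual cost surface.

```python
def gradient_descent(theta0, lr, steps):
    """Minimize the toy cost J(theta) = theta**2, whose gradient is 2*theta."""
    theta = theta0
    history = [theta]
    for _ in range(steps):
        grad = 2 * theta
        theta = theta - lr * grad  # theta := theta - alpha * dJ/dtheta
        history.append(theta)
    return history

# lr = 0.1: each step multiplies theta by (1 - 0.2) = 0.8, so it shrinks toward 0
small = gradient_descent(1.0, lr=0.1, steps=20)
# lr = 1.1: each step multiplies theta by (1 - 2.2) = -1.2, so it oscillates and grows
large = gradient_descent(1.0, lr=1.1, steps=20)

print(abs(small[-1]), abs(large[-1]))
```

With the small rate the iterate converges steadily; with the large rate it flips sign on every step and grows in magnitude, which is the one-dimensional analogue of an oscillating cost curve.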
# GRADED FUNCTION: update_parameters
def update_parameters(parameters, grads, lr = 1.2):
    # lr : learning_rate
    """
    Arguments:
    parameters -- python dictionary containing your parameters
    grads -- python dictionary containing your gradients
    Returns:
    parameters -- python dictionary containing your updated parameters
    """
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    dW1 = grads['dW1']
    db1 = grads['db1']
    dW2 = grads['dW2']
    db2 = grads['db2']
    W1 -= lr*dW1
    b1 -= lr*db1
    W2 -= lr*dW2
    b2 -= lr*db2
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    return parameters
5.4 - Integrate parts 5.1, 5.2 and 5.3 in nn_model()
# GRADED FUNCTION: nn_model
def nn_model(X, Y, n_h, num_iterations = 10000, print_cost=500, flag=0, lr=1.2):
    """
    Arguments:
    X -- dataset of shape (number of examples, n_x)
    Y -- labels of shape (number of examples, 1)
    n_h -- size of the hidden layer
    num_iterations -- Number of iterations in gradient descent loop
    print_cost -- print the cost every print_cost iterations; 0 disables printing
    flag -- parameter initialization scheme, 0: random, 1: all zeros
    lr -- learning rate
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    costs -- list with the cost at every iteration
    """
    np.random.seed(3)
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]
    parameters = initialize_parameters(n_x, n_h, n_y, flag)
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    costs = []
    # Loop (gradient descent)
    for i in range(0, num_iterations):
        # Forward propagation. Inputs: "X, parameters". Outputs: "A2, cache".
        A2, cache = forward_propagation(X, parameters)
        # Cost function. Inputs: "A2, Y, parameters". Outputs: "cost".
        cost = compute_cost(A2, Y, parameters)
        # Backpropagation. Inputs: "parameters, cache, X, Y". Outputs: "grads".
        grads = backward_propagation(parameters, cache, X, Y)
        # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters".
        parameters = update_parameters(parameters, grads, lr)
        costs.append(cost)
        # Print the cost every print_cost iterations
        if print_cost and i % print_cost == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
    return parameters,costs
5.5 - Predictions
Predictions:
Use forward propagation to predict the result:
predictions = 1 if activation > 0.5, otherwise 0
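The thresholding rule can be sketched on a few made-up activations (`A2_demo` is hypothetical, not output of the trained model). Note that with a strict `>` comparison, an activation of exactly 0.5 is assigned to class 0:

```python
import numpy as np

# Hypothetical sigmoid outputs for four examples, shape (1, m)
A2_demo = np.array([[0.2, 0.5, 0.51, 0.9]])

# Strict comparison: only activations above 0.5 become class 1
pred = (A2_demo > 0.5).astype(int)
print(pred)
```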
# GRADED FUNCTION: predict
def predict(parameters, X):
    """
    parameters -- python dictionary containing your parameters
    X -- input data of size (m, n_x)
    Returns
    predictions -- vector of predictions of our model (red: 0 / blue: 1)
    """
    A2, cache = forward_propagation(X, parameters)
    predictions = (A2 > 0.5)*1.
    return predictions
6. Training model (all-zero initialization)
# Build a model with a n_h-dimensional hidden layer
parameters,cost = nn_model(X, Y, n_h = 4, num_iterations=20000, print_cost=1000,lr=1.0,flag=1)
# Plot the decision boundary
plot_decision_boundary(lambda x: predict(parameters, x), X, Y)
plt.title("Decision Boundary for hidden layer size " + str(4))
Cost after iteration 0: 0.693147
Cost after iteration 1000: 0.693147
Cost after iteration 2000: 0.693147
Cost after iteration 3000: 0.693147
Cost after iteration 4000: 0.693147
Cost after iteration 5000: 0.693147
Cost after iteration 6000: 0.693147
Cost after iteration 7000: 0.693147
Cost after iteration 8000: 0.693147
Cost after iteration 9000: 0.693147
Cost after iteration 10000: 0.693147
Cost after iteration 11000: 0.693147
Cost after iteration 12000: 0.693147
Cost after iteration 13000: 0.693147
Cost after iteration 14000: 0.693147
Cost after iteration 15000: 0.693147
Cost after iteration 16000: 0.693147
Cost after iteration 17000: 0.693147
Cost after iteration 18000: 0.693147
Cost after iteration 19000: 0.693147
Text(0.5,1,'Decision Boundary for hidden layer size 4')
plt.figure(figsize=(14,6))
plt.title('Loss curve')
plt.grid()
plt.plot(cost)
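With all-zero initialization the cost does not decrease at all: it stays at 0.693147 (ln 2) for every iteration. All hidden activations are tanh(0) = 0 and W2 is zero, so every weight gradient is exactly zero and the parameters never move.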
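The flat cost can be reproduced on a tiny example. The sketch below assumes a made-up, class-balanced dataset (`X_demo`, `Y_demo`) and recomputes one backward pass by hand with all parameters set to zero; every gradient comes out as exactly zero, so gradient descent has nothing to update.

```python
import numpy as np

rng = np.random.RandomState(0)
m, n_x, n_h = 6, 2, 4
X_demo = rng.randn(m, n_x)                                       # hypothetical inputs
Y_demo = np.array([[0], [1], [0], [1], [0], [1]], dtype=float)   # balanced labels

# All-zero initialization
W1 = np.zeros((n_h, n_x)); b1 = np.zeros((n_h, 1))
W2 = np.zeros((1, n_h));   b2 = np.zeros((1, 1))

# Forward pass
A1 = np.tanh(np.dot(W1, X_demo.T) + b1)             # all zeros
A2 = 1.0 / (1.0 + np.exp(-(np.dot(W2, A1) + b2)))   # all 0.5

# Backward pass
dZ2 = A2 - Y_demo.T
dW2 = np.dot(dZ2, A1.T) / m                         # zero because A1 == 0
db2 = np.sum(dZ2, axis=1, keepdims=True) / m        # zero because the classes are balanced
dZ1 = np.dot(W2.T, dZ2) * (1 - A1 ** 2)             # zero because W2 == 0
dW1 = np.dot(dZ1, X_demo) / m
db1 = np.sum(dZ1, axis=1, keepdims=True) / m

print(dW1, db1, dW2, db2)
```

The dataset in this post is balanced as well (200 examples per class), which is why even b2 receives no update and the cost stays at ln 2.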
6.1 Training model (random initialization)
parameters,cost = nn_model(X, Y, n_h = 4, num_iterations=20000, print_cost=1000,lr=1.0)
# Plot the decision boundary
plot_decision_boundary(lambda x: predict(parameters, x), X, Y)
plt.title("Decision Boundary for hidden layer size " + str(4))
Cost after iteration 0: 0.693048
Cost after iteration 1000: 0.513391
Cost after iteration 2000: 0.517645
Cost after iteration 3000: 0.516130
Cost after iteration 4000: 0.515024
Cost after iteration 5000: 0.514154
Cost after iteration 6000: 0.513449
Cost after iteration 7000: 0.512871
Cost after iteration 8000: 0.512389
Cost after iteration 9000: 0.511984
Cost after iteration 10000: 0.511640
Cost after iteration 11000: 0.511343
Cost after iteration 12000: 0.511086
Cost after iteration 13000: 0.510861
Cost after iteration 14000: 0.510661
Cost after iteration 15000: 0.510484
Cost after iteration 16000: 0.510325
Cost after iteration 17000: 0.510182
Cost after iteration 18000: 0.510052
Cost after iteration 19000: 0.509933
Text(0.5,1,'Decision Boundary for hidden layer size 4')
plt.figure(figsize=(14,6))
plt.title('Loss curve')
plt.grid()
plt.plot(cost)
# Print accuracy
predictions = predict(parameters, X)
accuracy = float((np.dot(predictions,Y) + np.dot(1-predictions,1-Y))/float(Y.shape[0])*100)
print ('Accuracy: %d '%accuracy+'%')
Accuracy: 68 %
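The model's accuracy is not very high. Three things can be adjusted:
- Increase the number of training iterations
- Increase the number of hidden units
- Find a suitable learning rate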
# 49 combinations
n_hs = [3, 5, 7, 9, 10, 20 , 30]
lrs = [1.5, 2.0, 2.5, 3.0, 3.3, 3.5, 4.0]
params = []
for n_h in n_hs:
    for lr in lrs:
        params.append((n_h, lr))
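The nested loop above can equivalently be written with `itertools.product`, which yields the pairs in the same order (hidden size varies slowest, learning rate fastest). The helper `grid_index` below is a hypothetical addition that maps a (n_h, lr) pair back to its position in the flat list, which is handy when looking up the matching entry of a per-combination results list.

```python
from itertools import product

n_hs = [3, 5, 7, 9, 10, 20, 30]
lrs = [1.5, 2.0, 2.5, 3.0, 3.3, 3.5, 4.0]

# Same 49 (n_h, lr) pairs, in the same order as the nested for loops
params = list(product(n_hs, lrs))

def grid_index(n_h, lr):
    """Position of (n_h, lr) in params: row of n_h times row length, plus column of lr."""
    return n_hs.index(n_h) * len(lrs) + lrs.index(lr)

print(len(params), params[grid_index(9, 2.5)])
```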
Train the models and plot their decision boundaries
# This may take about 2 minutes to run
plt.figure(figsize=(70,70))
Costs = []
for i, (n_h,lr) in enumerate(params):
    plt.subplot(7, 7, i+1)
    plt.title('n_h : %d,lr : %f' % (n_h,lr))
    parameters,costs = nn_model(X, Y, n_h, num_iterations = 10000, lr=lr,print_cost=0)
    Costs.append(costs)
    plot_decision_boundary(lambda x: predict(parameters, x), X, Y)
    predictions = predict(parameters, X)
    accuracy = float((np.dot(predictions,Y) + np.dot(1-predictions,1-Y))/float(Y.shape[0])*100)
    print ("Accuracy for {} hidden units, learning rate: {},{}%".format(n_h,lr, accuracy))
Accuracy for 3 hidden units, learning rate: 1.5,68.5%
Accuracy for 3 hidden units, learning rate: 2.0,67.0%
Accuracy for 3 hidden units, learning rate: 2.5,67.0%
Accuracy for 3 hidden units, learning rate: 3.0,67.0%
Accuracy for 3 hidden units, learning rate: 3.3,68.25%
Accuracy for 3 hidden units, learning rate: 3.5,67.25%
Accuracy for 3 hidden units, learning rate: 4.0,78.0%
Accuracy for 5 hidden units, learning rate: 1.5,67.25%
Accuracy for 5 hidden units, learning rate: 2.0,74.0%
Accuracy for 5 hidden units, learning rate: 2.5,68.75%
Accuracy for 5 hidden units, learning rate: 3.0,92.25%
Accuracy for 5 hidden units, learning rate: 3.3,91.75%
Accuracy for 5 hidden units, learning rate: 3.5,92.5%
Accuracy for 5 hidden units, learning rate: 4.0,92.5%
Accuracy for 7 hidden units, learning rate: 1.5,92.0%
Accuracy for 7 hidden units, learning rate: 2.0,92.75%
Accuracy for 7 hidden units, learning rate: 2.5,92.75%
Accuracy for 7 hidden units, learning rate: 3.0,92.75%
Accuracy for 7 hidden units, learning rate: 3.3,92.5%
Accuracy for 7 hidden units, learning rate: 3.5,92.25%
Accuracy for 7 hidden units, learning rate: 4.0,92.0%
Accuracy for 9 hidden units, learning rate: 1.5,92.0%
Accuracy for 9 hidden units, learning rate: 2.0,92.75%
Accuracy for 9 hidden units, learning rate: 2.5,92.75%
Accuracy for 9 hidden units, learning rate: 3.0,92.75%
Accuracy for 9 hidden units, learning rate: 3.3,92.5%
Accuracy for 9 hidden units, learning rate: 3.5,92.5%
Accuracy for 9 hidden units, learning rate: 4.0,92.75%
Accuracy for 10 hidden units, learning rate: 1.5,92.75%
Accuracy for 10 hidden units, learning rate: 2.0,92.75%
Accuracy for 10 hidden units, learning rate: 2.5,92.75%
Accuracy for 10 hidden units, learning rate: 3.0,92.75%
Accuracy for 10 hidden units, learning rate: 3.3,92.75%
Accuracy for 10 hidden units, learning rate: 3.5,92.75%
Accuracy for 10 hidden units, learning rate: 4.0,93.0%
Accuracy for 20 hidden units, learning rate: 1.5,93.75%
Accuracy for 20 hidden units, learning rate: 2.0,92.75%
Accuracy for 20 hidden units, learning rate: 2.5,93.0%
Accuracy for 20 hidden units, learning rate: 3.0,92.75%
Accuracy for 20 hidden units, learning rate: 3.3,91.0%
Accuracy for 20 hidden units, learning rate: 3.5,91.25%
Accuracy for 20 hidden units, learning rate: 4.0,89.5%
Accuracy for 30 hidden units, learning rate: 1.5,92.75%
Accuracy for 30 hidden units, learning rate: 2.0,93.75%
Accuracy for 30 hidden units, learning rate: 2.5,91.75%
Accuracy for 30 hidden units, learning rate: 3.0,91.5%
Accuracy for 30 hidden units, learning rate: 3.3,92.75%
Accuracy for 30 hidden units, learning rate: 3.5,92.0%
Accuracy for 30 hidden units, learning rate: 4.0,92.5%
In the figure below, the rows of the grid correspond to n_hs and the columns to lrs:
- n_hs = [3, 5, 7, 9, 10, 20, 30]
- lrs = [1.5, 2.0, 2.5, 3.0, 3.3, 3.5, 4.0]
cost curve
plt.figure(figsize=(140,140))
for i, (n_h,lr) in enumerate(params):
    plt.subplot(7, 7, i+1)
    plt.grid()
    plt.ylim(0,1)
    plt.title('n_h : %d,lr : %f' % (n_h,lr))
    plt.plot(Costs[i],c='green')
The figure above shows the cost curves of the 49 combinations; the rows of the grid correspond to n_hs and the columns to lrs:
- n_hs = [3, 5, 7, 9, 10, 20, 30]
- lrs = [1.5, 2.0, 2.5, 3.0, 3.3, 3.5, 4.0]
plt.figure(figsize=(14,56))
for i,n_h,n in zip(np.arange(0,49,7), n_hs, range(7)):
    plt.subplot(7,1, n+1)
    plt.ylim(0,0.7)
    plt.grid()
    plt.title('Hidden Layer of size %d' % n_h)
    for lr,cost in zip(lrs,Costs[i: i+7]):
        plt.plot(cost, label='lr=%.2f'%lr)
    plt.legend()
From the results above, a few promising (n_h, lr) combinations stand out:
- (7, 2.5)
- (9, 2.5)
- (10, 3.5)
- (20, 2.0)
7.Find best combination
Visualize the cost curves of the four selected combinations
plt.figure(figsize=(14,8))
plt.grid()
plt.ylim(0.1,0.25)
plt.title('Find best combination')
for n_h, lr in [(7,2.5),(9,2.5),(10,3.5),(20,2.0)]:
    i = n_hs.index(n_h)
    j = lrs.index(lr)
    plt.plot(Costs[i*7:(i+1)*7][j], label='n_h:%d,lr=%.2f'%(n_h,lr))
plt.legend()
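The yellow curve performs well: after about 5500 iterations it is smooth, and it also decreases fairly quickly. Next we increase the number of iterations to see whether the yellow curve starts to oscillate again.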
plt.figure(figsize=(14,8))
plt.grid()
plt.ylim(0.1,0.25)
for i, (n_h,lr) in enumerate([(7,2.0),(10,4.0),(20,1.5),(30,2.0)]):
    parameters,costs = nn_model(X, Y, n_h, num_iterations = 30000, lr=lr,print_cost=0)
    plt.plot(costs, label='n_h=%d,lr=%.2f'%(n_h,lr))
plt.legend()
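With more training iterations, the red curve still oscillates and its period keeps growing, while the other curves are essentially stable after 15000 iterations. So (9, 2.5) is probably the best combination: it fits the data well, although it may overfit. Overfitting is outside the scope of this post.
- n_h = 9
- lr = 2.5
- num_iterations: 15000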