This post is based on the programming exercise from Week 2, Lesson 1 of Andrew Ng's Deep Learning course; its goal is to show how gradient checking is applied in deep neural networks.
I. The Purpose of Gradient Checking
When building a neural network, it is usually easy to verify that forward propagation is implemented correctly, but backward propagation involves derivatives and partial derivatives, and mistakes there are hard to catch by intuition alone. Gradient checking is therefore an important tool for confirming that backpropagation runs correctly.
II. One-Dimensional Gradient Checking
1. Consider the one-dimensional function $J(\theta) = \theta x$: forward propagation takes $x$ and $\theta$ and outputs the cost $J$, while backward propagation returns the derivative $\frac{dJ}{d\theta} = x$.
2. Forward propagation for the one-dimensional function is very simple; it is computed as
def forward_propagation(x, theta):
    J = theta * x
    return J
3. Backward propagation is
def backward_propagation(x, theta):
    dtheta = x
    return dtheta
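A quick sanity run of both functions, using the values that appear later in this post (x = 2, theta = 4):

print(forward_propagation(2, 4))   # J = theta * x = 8
print(backward_propagation(2, 4))  # dJ/dtheta = x = 2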
4. Steps of one-dimensional gradient checking
(1) Using the two-sided definition of the derivative, $\text{gradapprox} = \frac{J(\theta + \varepsilon) - J(\theta - \varepsilon)}{2\varepsilon}$, compute a numerical approximation of the gradient.
(2) Compute the gradient with backward propagation and store the result in the variable "grad".
(3) Compute the relative difference between "gradapprox" and "grad", $\text{difference} = \frac{\lVert \text{grad} - \text{gradapprox} \rVert_2}{\lVert \text{grad} \rVert_2 + \lVert \text{gradapprox} \rVert_2}$; if it is smaller than a tiny threshold such as $10^{-7}$, the gradient check passes.
Implementing the above:
import numpy as np

def gradient_check(x, theta, epsilon=1e-7):
    # Numerical gradient from the centered (two-sided) difference formula
    thetaplus = theta + epsilon
    thetaminus = theta - epsilon
    J_plus = forward_propagation(x, thetaplus)
    J_minus = forward_propagation(x, thetaminus)
    gradapprox = (J_plus - J_minus) / (2 * epsilon)

    # Analytical gradient from backpropagation
    grad = backward_propagation(x, theta)

    # Relative difference between the two gradients
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    difference = numerator / denominator

    if difference < epsilon:
        print("The gradient is correct!")
    else:
        print("The gradient is wrong!")
    return difference
x, theta = 2, 4
difference = gradient_check(x, theta)
print("difference = " + str(difference))
The gradient is correct!
difference = 2.919335883291695e-10
The difference is below 1e-7, so the gradient check passes.
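To see the check actually catch an error, you can feed it a deliberately wrong gradient. A minimal sketch (backward_propagation_buggy is hypothetical, not part of the assignment):

def backward_propagation_buggy(x, theta):
    # Wrong on purpose: the true derivative of J = theta * x is x, not 2x
    return 2 * x

grad = backward_propagation_buggy(2, 4)         # 4 instead of 2
J_plus = forward_propagation(2, 4 + 1e-7)
J_minus = forward_propagation(2, 4 - 1e-7)
gradapprox = (J_plus - J_minus) / (2 * 1e-7)    # approximately 2
difference = abs(grad - gradapprox) / (abs(grad) + abs(gradapprox))
print(difference)                               # roughly 0.33, far above 1e-7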
III. N-Dimensional Gradient Checking
1. For a neural network, the cost function $J$ depends on all of the parameters, so here $\theta$ stands for the concatenation of every $W^{[l]}$ and $b^{[l]}$: forward propagation computes the cost $J$ from $X$, $Y$ and the parameters, and backward propagation produces the gradients with respect to each $W^{[l]}$ and $b^{[l]}$.
2. Forward propagation
def forward_propagation_n(X, Y, parameters):
    m = X.shape[1]
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    W3 = parameters["W3"]
    b3 = parameters["b3"]

    # LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID
    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = relu(Z2)
    Z3 = np.dot(W3, A2) + b3
    A3 = sigmoid(Z3)

    # Cross-entropy cost averaged over the m examples
    logprobs = np.multiply(-np.log(A3), Y) + np.multiply(-np.log(1 - A3), 1 - Y)
    cost = 1. / m * np.sum(logprobs)

    cache = (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3)
    return cost, cache
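forward_propagation_n relies on relu and sigmoid helpers that the assignment imports from its utilities file; if you are reproducing this outside the course environment, the standard definitions would be:

import numpy as np

def sigmoid(x):
    # Logistic sigmoid, applied elementwise
    return 1 / (1 + np.exp(-x))

def relu(x):
    # Rectified linear unit, applied elementwise
    return np.maximum(0, x)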
3. Backward propagation
def backward_propagation_n(X, Y, cache):
    m = X.shape[1]
    (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) = cache

    # Output layer (sigmoid + cross-entropy)
    dZ3 = A3 - Y
    dW3 = 1. / m * np.dot(dZ3, A2.T)
    db3 = 1. / m * np.sum(dZ3, axis=1, keepdims=True)

    # Second hidden layer (ReLU)
    dA2 = np.dot(W3.T, dZ3)
    dZ2 = np.multiply(dA2, np.int64(A2 > 0))
    dW2 = 1. / m * np.dot(dZ2, A1.T)
    db2 = 1. / m * np.sum(dZ2, axis=1, keepdims=True)

    # First hidden layer (ReLU)
    dA1 = np.dot(W2.T, dZ2)
    dZ1 = np.multiply(dA1, np.int64(A1 > 0))
    dW1 = 1. / m * np.dot(dZ1, X.T)
    db1 = 1. / m * np.sum(dZ1, axis=1, keepdims=True)

    gradients = {"dZ3": dZ3, "dW3": dW3, "db3": db3,
                 "dA2": dA2, "dZ2": dZ2, "dW2": dW2, "db2": db2,
                 "dA1": dA1, "dZ1": dZ1, "dW1": dW1, "db1": db1}
    return gradients
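The np.int64(A > 0) factors implement the ReLU derivative as a 0/1 mask: the upstream gradient passes through only where the activation was positive. A quick illustration:

import numpy as np

A = np.array([[-1.5, 0.0, 2.3]])
dA = np.array([[0.7, 0.7, 0.7]])
mask = np.int64(A > 0)        # [[0 0 1]]
dZ = np.multiply(dA, mask)    # [[0.  0.  0.7]]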
4. Steps of N-dimensional gradient checking
The principle of gradient checking in the N-dimensional model is the same as in the one-dimensional case; the difference is that θ is now a vector rather than a scalar, so some conversions are needed: the dictionary variable parameters used to build the network must be flattened into a single vector (parameters_values in the code below), that vector must be converted back into a parameters dictionary before each forward pass, and the gradients dictionary must likewise be flattened into a vector.
(1) A function that converts the dictionary into a vector
def dictionary_to_vector(parameters):
    """
    Roll all our parameters dictionary into a single vector satisfying our specific required shape.
    """
    keys = []
    count = 0
    for key in ["W1", "b1", "W2", "b2", "W3", "b3"]:
        # flatten parameter
        new_vector = np.reshape(parameters[key], (-1, 1))
        keys = keys + [key] * new_vector.shape[0]
        if count == 0:
            theta = new_vector
        else:
            theta = np.concatenate((theta, new_vector), axis=0)
        count = count + 1
    return theta, keys
(2) A function that converts the vector back into a dictionary
def vector_to_dictionary(theta):
    """
    Unroll all our parameters dictionary from a single vector satisfying our specific required shape.
    """
    parameters = {}
    parameters["W1"] = theta[:20].reshape((5, 4))
    parameters["b1"] = theta[20:25].reshape((5, 1))
    parameters["W2"] = theta[25:40].reshape((3, 5))
    parameters["b2"] = theta[40:43].reshape((3, 1))
    parameters["W3"] = theta[43:46].reshape((1, 3))
    parameters["b3"] = theta[46:47].reshape((1, 1))
    return parameters
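The hard-coded slice boundaries match the test network's layer sizes (4 inputs, hidden layers of 5 and 3 units, 1 output), for 47 parameters in total. A quick round-trip sanity check, assuming parameters with those shapes:

import numpy as np

np.random.seed(1)
parameters = {"W1": np.random.randn(5, 4), "b1": np.zeros((5, 1)),
              "W2": np.random.randn(3, 5), "b2": np.zeros((3, 1)),
              "W3": np.random.randn(1, 3), "b3": np.zeros((1, 1))}
theta, _ = dictionary_to_vector(parameters)
print(theta.shape)                                     # (47, 1)
restored = vector_to_dictionary(theta)
print(np.allclose(parameters["W2"], restored["W2"]))   # True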
(3) A function that converts the gradients dictionary into a vector
def gradients_to_vector(gradients):
    """
    Roll all our gradients dictionary into a single vector satisfying our specific required shape.
    """
    count = 0
    for key in ["dW1", "db1", "dW2", "db2", "dW3", "db3"]:
        # flatten parameter
        new_vector = np.reshape(gradients[key], (-1, 1))
        if count == 0:
            theta = new_vector
        else:
            theta = np.concatenate((theta, new_vector), axis=0)
        count = count + 1
    return theta
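Note that gradients_to_vector pulls out only the dW and db entries, in the same W1, b1, ..., b3 order used by dictionary_to_vector, so the resulting vector lines up index-for-index with parameters_values; the dA and dZ entries in the gradients dictionary are ignored. A quick shape check, assuming the gradients dictionary computed in section 5 below:

grad = gradients_to_vector(gradients)
print(grad.shape)   # (47, 1), same as dictionary_to_vector(parameters)[0]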
(4) The gradient checking function
def gradient_check_n(parameters, gradients, X, Y, epsilon=1e-7):
    parameters_values, _ = dictionary_to_vector(parameters)
    grad = gradients_to_vector(gradients)
    num_parameters = parameters_values.shape[0]
    J_plus = np.zeros((num_parameters, 1))
    J_minus = np.zeros((num_parameters, 1))
    gradapprox = np.zeros((num_parameters, 1))

    # Perturb one parameter at a time and re-run the forward pass
    for i in range(num_parameters):
        thetaplus = np.copy(parameters_values)
        thetaplus[i][0] = thetaplus[i][0] + epsilon
        J_plus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaplus))

        thetaminus = np.copy(parameters_values)
        thetaminus[i][0] = thetaminus[i][0] - epsilon
        J_minus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaminus))

        gradapprox[i] = (J_plus[i] - J_minus[i]) / (2. * epsilon)

    # Relative difference between analytical and numerical gradients
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    difference = numerator / denominator

    if difference > 1e-7:
        print("There is a mistake in the backward propagation! difference = " + str(difference))
    else:
        print("Your backward propagation works perfectly fine! difference = " + str(difference))
    return difference
5. Verification results
X, Y, parameters = gradient_check_n_test_case()
cost, cache = forward_propagation_n(X, Y, parameters)
gradients = backward_propagation_n(X, Y, cache)
difference = gradient_check_n(parameters, gradients, X, Y)
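gradient_check_n_test_case comes from the course's testCases file; for readers running this outside the course environment, a hypothetical stand-in with matching shapes could look like this:

import numpy as np

def gradient_check_n_test_case():
    # Hypothetical stand-in: random data shaped for the 4-5-3-1 network above
    np.random.seed(1)
    X = np.random.randn(4, 3)    # 3 training examples with 4 features each
    Y = np.array([[1, 1, 0]])
    parameters = {"W1": np.random.randn(5, 4), "b1": np.random.randn(5, 1),
                  "W2": np.random.randn(3, 5), "b2": np.random.randn(3, 1),
                  "W3": np.random.randn(1, 3), "b3": np.random.randn(1, 1)}
    return X, Y, parameters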
Unfortunately, after searching gradient_check_n and backward_propagation_n for a bug for quite a while, I still could not get the expected result; the run always ends with
"There is a mistake in the backward propagation!".
I will update this post if I track down the cause, and if any reader spots the problem, please do share. (One possible culprit worth checking: on this test case the finite-difference estimate is known to land around 1.19e-7 even when backward_propagation_n is correct, which fails the strict 1e-7 comparison used above; if I recall the course's reference version of this check correctly, it compares against 2e-7 instead, so printing difference and loosening the threshold may be all that is needed.)