Building your Deep Neural Network: Step by Step
Welcome to your week 4 assignment (part 1 of 2)! You have previously trained a 2-layer Neural Network (with a single hidden layer). This week, you will build a deep neural network, with as many layers as you want!
- In this notebook, you will implement all the functions required to build a deep neural network.(在这个笔记本中,你将实现构建深度神经网络所需的所有功能。)
- In the next assignment, you will use these functions to build a deep neural network for image classification.(在下一个任务中,您将使用这些函数为图像分类构建深度神经网络。)
After this assignment you will be able to:
- Use non-linear units like ReLU to improve your model(使用像ReLU这样的非线性单位来改进你的模型)
- Build a deeper neural network (with more than 1 hidden layer)(建立一个更深层的神经网络(具有多于一个的隐藏层))
- Implement an easy-to-use neural network class(实现一个易于使用的神经网络类)
Notation:
- Superscript [l] denotes a quantity associated with the lth layer.
- Example: a[L] is the Lth layer activation. W[L] and b[L] are the Lth layer parameters.
- Superscript (i) denotes a quantity associated with the ith example.
- Example: x(i) is the ith training example.
- Lowerscript i denotes the ith entry of a vector.
- Example: a[l]i denotes the ith entry of the lth layer’s activations).
Let’s get started!
1 - Packages
Let’s first import all the packages that you will need during this assignment.
- numpy is the main package for scientific computing with Python.
- matplotlib is a library to plot graphs in Python.
- dnn_utils provides some necessary functions for this notebook.
- testCases provides some test cases to assess the correctness of your functions
- np.random.seed(1) is used to keep all the random function calls consistent. It will help us grade your work. Please don’t change the seed. (np.random.seed(1)用于保持所有的随机函数调用一致。这将帮助我们评分你的工作。请不要改变seed。)
3- 概要
- 双隐藏层和L层神经网络模型的参数初始化
- 做前向传播操作
- 计算正向传播的LINEAR 部分,因为每个神经元节点都是由两部分组成,这在逻辑回归里面有阐述。线性部分即Z=WX+b这部分,输出部分就是A,就是将线性部分的结果输入到激活函数所产生的结果。
- 采用RELU或者sigmoid激活函数计算结果值
- 联合上述两个步骤,进行前向传播操作[LINEAR->ACTIVATION]
- 对输出层之前的L-1层,做L-1次的前向传播 [LINEAR->RELU] ,并将结果输出到第L层[LINEAR->SIGMOID]。所以在前面L-1层我们的激活函数是RELU,在输出层我们的激活函数是sigmoid。
- 计算损失函数
- 做后向传播操作(下图红色区域部分)
- 计算神经网络反向传播的LINEAR部分
- 计算激活函数(RELU或者sigmoid)的梯度
- 结合前面两个步骤,产生一个新的后向函数[LINEAR->ACTIVATION]
- 更新参数
流程图:
注意:每个正向函数都是和反向函数相关联的。所以,正向传播模块的每一个步骤都会将反向传播需要用到的值存储在cache中。在反向传播模块,我们需要用到cache中的值计算梯度。
4- 初始化
在此,设计了两个初始化函数,分别用以双层模型和泛化的L层模型。
4-1 双层神经网络
该模型的结构是:LINEAR -> RELU -> LINEAR -> SIGMOID
对于权重矩阵采用随机化方式进行初始化(np.random.randn(shape)*0.01
),对于偏移值b矩阵则采用0矩阵即可(np.zeros(shape)
)。
初始化代码如下:
# GRADED FUNCTION: initialize_parameters def initialize_parameters(n_x, n_h, n_y): """ Argument: n_x -- size of the input layer n_h -- size of the hidden layer n_y -- size of the output layer Returns: parameters -- python dictionary containing your parameters: W1 -- weight matrix of shape (n_h, n_x) b1 -- bias vector of shape (n_h, 1) W2 -- weight matrix of shape (n_y, n_h) b2 -- bias vector of shape (n_y, 1) """ np.random.seed(1) ### START CODE HERE ### (≈ 4 lines of code) W1 = np.random.randn(n_h, n_x)*0.01 b1 = np.zeros((n_h, 1)) W2 = np.random.randn(n_y, n_h)*0.01 b2 = np.zeros((n_y, 1)) ### END CODE HERE ### assert(W1.shape == (n_h, n_x)) assert(b1.shape == (n_h, 1)) assert(W2.shape == (n_y, n_h)) assert(b2.shape == (n_y, 1)) parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2} return parameters测试代码如下:
parameters = initialize_parameters(2,2,1) print("W1 = " + str(parameters["W1"])) print("b1 = " + str(parameters["b1"])) print("W2 = " + str(parameters["W2"])) print("b2 = " + str(parameters["b2"]))
测试代码运行结果如下:
W1 = [[ 0.01624345 -0.00611756] [-0.00528172 -0.01072969]] b1 = [[ 0.] [ 0.]] W2 = [[ 0.00865408 -0.02301539]] b2 = [[ 0.]]
4-2 L层神经网络
对于L层的神经网络由于涉及到很多的权重矩阵和偏移矩阵显得更加复杂。要特别注意的是矩阵之间的尺寸匹配。n[l]表示 l层的神经元数量。例如输入X 的尺寸是 (12288,209) (m=209表示样本数) :
Shape of W | Shape of b | Activation | Shape of Activation | |
Layer 1 | (n[1],12288) | (n[1],1) | Z[1]=W[1]X+b[1] | (n[1],209) |
Layer 2 | (n[2],n[1]) | (n[2],1) | Z[2]=W[2]A[1]+b[2] | (n[2],209) |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
Layer L-1 | (n[L−1],n[L−2]) | (n[L−1],1) | Z[L−1]=W[L−1]A[L−2]+b[L−1] | (n[L−1],209) |
Layer L | (n[L],n[L−1]) | (n[L],1) | Z[L]=W[L]A[L−1]+b[L] | (n[L],209) |
对于L层模型:
- 模型结构: [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID。所以 L−1 层是需要用到 ReLU激活函数的。输出层用的是sigmoid函数。
- 权重矩阵采用仍旧是随机化初始化的方式:
np.random.rand(shape) * 0.01
- 偏移矩阵仍旧是0矩阵进行处初始化:
np.zeros(shape)
. - 我们将每层的神经元数量n[l]信息进行存储,
layer_dims
。例如在平面数据分类模型中layer_dims
的值是[2,4,1],其中输入层的神经元个数是2,隐藏层的神经元个数是4,输出层的神经元个数是1。对应的W1
尺寸= (4,2),b1
尺寸= (4,1),W2
尺寸= (1,4) ,b2
尺寸= (1,1)。
代码如下:
# GRADED FUNCTION: initialize_parameters_deep def initialize_parameters_deep(layer_dims): """ Arguments: layer_dims -- python array (list) containing the dimensions of each layer in our network Returns: parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL": Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1]) bl -- bias vector of shape (layer_dims[l], 1) """ np.random.seed(3) parameters = {} L = len(layer_dims) # number of layers in the network for l in range(1, L): ### START CODE HERE ### (≈ 2 lines of code) parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01 parameters['b' + str(l)] = np.zeros((layer_dims[l], 1)) ### END CODE HERE ### assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1])) assert(parameters['b' + str(l)].shape == (layer_dims[l], 1)) return parameters
测试代码:
parameters = initialize_parameters_deep([5,4,3]) print("W1 = " + str(parameters["W1"])) print("b1 = " + str(parameters["b1"])) print("W2 = " + str(parameters["W2"])) print("b2 = " + str(parameters["b2"]))
测试结果如下:
W1 = [[ 0.01788628 0.0043651 0.00096497 -0.01863493 -0.00277388] [-0.00354759 -0.00082741 -0.00627001 -0.00043818 -0.00477218] [-0.01313865 0.00884622 0.00881318 0.01709573 0.00050034] [-0.00404677 -0.0054536 -0.01546477 0.00982367 -0.01101068]] b1 = [[ 0.] [ 0.] [ 0.] [ 0.]] W2 = [[-0.01185047 -0.0020565 0.01486148 0.00236716] [-0.01023785 -0.00712993 0.00625245 -0.00160513] [-0.00768836 -0.00230031 0.00745056 0.01976111]] b2 = [[ 0.] [ 0.] [ 0.]]
5- 前向传播模型
5-1 线性传播部分
代码如下:
# GRADED FUNCTION: linear_forward def linear_forward(A, W, b): """ Implement the linear part of a layer's forward propagation. Arguments: A -- activations from previous layer (or input data): (size of previous layer, number of examples) W -- weights matrix: numpy array of shape (size of current layer, size of previous layer) b -- bias vector, numpy array of shape (size of the current layer, 1) Returns: Z -- the input of the activation function, also called pre-activation parameter cache -- a python dictionary containing "A", "W" and "b" ; stored for computing the backward pass efficiently """ ### START CODE HERE ### (≈ 1 line of code) Z = np.dot(W, A) + b ### END CODE HERE ### assert(Z.shape == (W.shape[0], A.shape[1])) cache = (A, W, b) return Z, cache
测试代码:
def linear_forward_test_case(): np.random.seed(1) """ X = np.array([[-1.02387576, 1.12397796], [-1.62328545, 0.64667545], [-1.74314104, -0.59664964]]) W = np.array([[ 0.74505627, 1.97611078, -1.24412333]]) b = np.array([[1]]) """ A = np.random.randn(3,2) W = np.random.randn(1,3) b = np.random.randn(1,1) return A, W, b A, W, b = linear_forward_test_case() Z, linear_cache = linear_forward(A, W, b) print("Z = " + str(Z))
测试代码运行结果:
Z = [[ 3.26295337 -1.23429987]]
5-2 激活部分
5-2-1 相邻两层的激活实现
代码实现:
# GRADED FUNCTION: linear_activation_forward def linear_activation_forward(A_prev, W, b, activation): """ Implement the forward propagation for the LINEAR->ACTIVATION layer Arguments: A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples) W -- weights matrix: numpy array of shape (size of current layer, size of previous layer) b -- bias vector, numpy array of shape (size of the current layer, 1) activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu" Returns: A -- the output of the activation function, also called the post-activation value cache -- a python dictionary containing "linear_cache" and "activation_cache"; stored for computing the backward pass efficiently """ if activation == "sigmoid": # Inputs: "A_prev, W, b". Outputs: "A, activation_cache". ### START CODE HERE ### (≈ 2 lines of code) Z, linear_cache = linear_forward(A_prev, W, b) A, activation_cache = sigmoid(Z) ### END CODE HERE ### elif activation == "relu": # Inputs: "A_prev, W, b". Outputs: "A, activation_cache". ### START CODE HERE ### (≈ 2 lines of code) Z, linear_cache = linear_forward(A_prev, W, b) A, activation_cache = relu(Z) ### END CODE HERE ### assert (A.shape == (W.shape[0], A_prev.shape[1])) cache = (linear_cache, activation_cache) return A, cache #其中sigmoid和relu定义如下: def sigmoid(Z): """ Implements the sigmoid activation in numpy Arguments: Z -- numpy array of any shape Returns: A -- output of sigmoid(z), same shape as Z cache -- returns Z as well, useful during backpropagation """ A = 1/(1+np.exp(-Z)) cache = Z return A, cache def relu(Z): """ Implement the RELU function. Arguments: Z -- Output of the linear layer, of any shape Returns: A -- Post-activation parameter, of the same shape as Z cache -- a python dictionary containing "A" ; stored for computing the backward pass efficiently """ A = np.maximum(0,Z) assert(A.shape == Z.shape) cache = Z return A, cache
测试代码:
def linear_activation_forward_test_case(): """ X = np.array([[-1.02387576, 1.12397796], [-1.62328545, 0.64667545], [-1.74314104, -0.59664964]]) W = np.array([[ 0.74505627, 1.97611078, -1.24412333]]) b = 5 """ np.random.seed(2) A_prev = np.random.randn(3,2) W = np.random.randn(1,3) b = np.random.randn(1,1) return A_prev, W, b A_prev, W, b = linear_activation_forward_test_case() A, linear_activation_cache = linear_activation_forward(A_prev, W, b, activation = "sigmoid") print("With sigmoid: A = " + str(A)) A, linear_activation_cache = linear_activation_forward(A_prev, W, b, activation = "relu") print("With ReLU: A = " + str(A))
测试代码运行结果:
With sigmoid: A = [[ 0.96890023 0.11013289]] With ReLU: A = [[ 3.43896131 0. ]]
5-2-2 L层模型:
上面已经阐述了相邻两层之间的激活模型,那么对于L层的神经网络,激活函数为RELU的linear_activation_forward 需要重复L-1次,而最后的输出层采用的参数为SIGMOID的linear_activation_forward 。
L层的前向传播如下:
代码中我们用 AL
表示A[L]=σ(Z[L])=σ(W[L]A[L−1]+b[L]),即 ^ Y
Tips:
- 复用此前的代码
- 循环 [LINEAR->RELU] (L-1) 次
- 注意保持 “caches” 中的数据。
代码:
# GRADED FUNCTION: L_model_forward def L_model_forward(X, parameters): """ Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation Arguments: X -- data, numpy array of shape (input size, number of examples) parameters -- output of initialize_parameters_deep() Returns: AL -- last post-activation value caches -- list of caches containing: every cache of linear_relu_forward() (there are L-1 of them, indexed from 0 to L-2) the cache of linear_sigmoid_forward() (there is one, indexed L-1) """ caches = [] A = X L = len(parameters) /2 # number of layers in the neural network # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list. for l in range(1, L): A_prev = A ### START CODE HERE ### (≈ 2 lines of code) A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], activation = "relu") caches.append(cache) ### END CODE HERE ### # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list. ### START CODE HERE ### (≈ 2 lines of code) AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], activation = "sigmoid") caches.append(cache) ### END CODE HERE ### assert(AL.shape == (1,X.shape[1])) return AL, caches
测试代码:
def L_model_forward_test_case(): """ X = np.array([[-1.02387576, 1.12397796], [-1.62328545, 0.64667545], [-1.74314104, -0.59664964]]) parameters = {'W1': np.array([[ 1.62434536, -0.61175641, -0.52817175], [-1.07296862, 0.86540763, -2.3015387 ]]), 'W2': np.array([[ 1.74481176, -0.7612069 ]]), 'b1': np.array([[ 0.], [ 0.]]), 'b2': np.array([[ 0.]])} """ np.random.seed(1) X = np.random.randn(4,2) W1 = np.random.randn(3,4) b1 = np.random.randn(3,1) W2 = np.random.randn(1,3) b2 = np.random.randn(1,1) parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2} return X, parameters X, parameters = L_model_forward_test_case() AL, caches = L_model_forward(X, parameters) print("AL = " + str(AL)) print("Length of caches list = " + str(len(caches)))
测试代码运行结果:
AL = [[ 0.17007265 0.2524272 ]] Length of caches list = 2
至此,我们可以计算得到AL的值,该值包含了所有的预测结果。在caches中也记录了中间值。为此,我们可以用AL值来计算代价。
6- 代价函数
代码:
# GRADED FUNCTION: compute_cost def compute_cost(AL, Y): """ Implement the cost function defined by equation (7). Arguments: AL -- probability vector corresponding to your label predictions, shape (1, number of examples) Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples) Returns: cost -- cross-entropy cost """ m = Y.shape[1] # Compute loss from aL and y. ### START CODE HERE ### (≈ 1 lines of code) cost = -np.sum(np.multiply(Y, np.log(AL)) + np.multiply(1-Y, np.log(1-AL)), axis=1 ,keepdims=True)/m ### END CODE HERE ### cost = np.squeeze(cost) # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17). assert(cost.shape == ()) return cost
测试代码:
def compute_cost_test_case(): Y = np.asarray([[1, 1, 1]]) aL = np.array([[.8,.9,0.4]]) return Y, aL Y, AL = compute_cost_test_case() print("cost = " + str(compute_cost(AL, Y)))
测试代码运行结果:
cost = 0.414931599615397
7-后向传播模型
后向传播是为了计算各个参数梯度,其模型如下:
Figure 3 : 前向传播和后向传播:LINEAR->RELU->LINEAR->SIGMOID
紫色模块表示前向传播, 红色模块表示反向传播
和之前的前向传播类似,后向传播模块的建立分以下三个步骤:
- 后向LINEAR(Linear backward)
- ReLU 或者 sigmoid 激活函数的后向LINEAR -> ACTIVATION
- [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID backward (whole model)
代码:
# GRADED FUNCTION: linear_backward def linear_backward(dZ, cache): """ Implement the linear portion of backward propagation for a single layer (layer l) Arguments: dZ -- Gradient of the cost with respect to the linear output (of current layer l) cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer Returns: dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev dW -- Gradient of the cost with respect to W (current layer l), same shape as W db -- Gradient of the cost with respect to b (current layer l), same shape as b """ A_prev, W, b = cache m = A_prev.shape[1] ### START CODE HERE ### (≈ 3 lines of code) dW = np.dot(dZ,A_prev.T)/m db = np.sum(dZ,axis = 1, keepdims=True)/m dA_prev = np.dot(W.T, dZ) ### END CODE HERE ### assert (dA_prev.shape == A_prev.shape) assert (dW.shape == W.shape) assert (db.shape == b.shape) return dA_prev, dW, db
测试代码:
def linear_backward_test_case(): """ z, linear_cache = (np.array([[-0.8019545 , 3.85763489]]), (np.array([[-1.02387576, 1.12397796], [-1.62328545, 0.64667545], [-1.74314104, -0.59664964]]), np.array([[ 0.74505627, 1.97611078, -1.24412333]]), np.array([[1]])) """ np.random.seed(1) dZ = np.random.randn(1,2) A = np.random.randn(3,2) W = np.random.randn(1,3) b = np.random.randn(1,1) linear_cache = (A, W, b) return dZ, linear_cache # Set up some test inputs dZ, linear_cache = linear_backward_test_case() dA_prev, dW, db = linear_backward(dZ, linear_cache) print ("dA_prev = "+ str(dA_prev)) print ("dW = " + str(dW)) print ("db = " + str(db))
测试代码运行结果:
dA_prev = [[ 0.51822968 -0.19517421] [-0.40506361 0.15255393] [ 2.37496825 -0.89445391]] dW = [[-0.10076895 1.40685096 1.64992505]] db = [[ 0.50629448]]
7-2 Linear-Activation backward
对于sigmoid函数,可以定义两个函数:
sigmoid_backward
:用以计算 SIGMOID单元:
dZ = sigmoid_backward(dA, activation_cache)#其用到的cache值是Z值
relu_backward
: 用以计算RELU的 backward propagation:
dZ = relu_backward(dA, activation_cache)
对于 g(.)的激活函数:
代码:
# GRADED FUNCTION: linear_activation_backward def linear_activation_backward(dA, cache, activation): """ Implement the backward propagation for the LINEAR->ACTIVATION layer. Arguments: dA -- post-activation gradient for current layer l cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu" Returns: dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev dW -- Gradient of the cost with respect to W (current layer l), same shape as W db -- Gradient of the cost with respect to b (current layer l), same shape as b """ linear_cache, activation_cache = cache if activation == "relu": ### START CODE HERE ### (≈ 2 lines of code) dZ = relu_backward(dA, activation_cache) dA_prev, dW, db = linear_backward(dZ, linear_cache) ### END CODE HERE ### elif activation == "sigmoid": ### START CODE HERE ### (≈ 2 lines of code) dZ = sigmoid_backward(dA, activation_cache) dA_prev, dW, db = linear_backward(dZ, linear_cache) ### END CODE HERE ### return dA_prev, dW, db #relu_backward定义如下: def relu_backward(dA, cache): """ Implement the backward propagation for a single RELU unit. Arguments: dA -- post-activation gradient, of any shape cache -- 'Z' where we store for computing backward propagation efficiently Returns: dZ -- Gradient of the cost with respect to Z """ Z = cache dZ = np.array(dA, copy=True) # just converting dz to a correct object. # When z <= 0, you should set dz to 0 as well. dZ[Z <= 0] = 0 assert (dZ.shape == Z.shape) return dZ #sigmoid_backward定义如下: def sigmoid_backward(dA, cache): """ Implement the backward propagation for a single SIGMOID unit. Arguments: dA -- post-activation gradient, of any shape cache -- 'Z' where we store for computing backward propagation efficiently Returns: dZ -- Gradient of the cost with respect to Z """ Z = cache s = 1/(1+np.exp(-Z)) dZ = dA * s * (1-s) assert (dZ.shape == Z.shape) return dZ
注意上述两个激活函数dZ的求法。
测试代码:
def linear_activation_backward_test_case(): """ aL, linear_activation_cache = (np.array([[ 3.1980455 , 7.85763489]]), ((np.array([[-1.02387576, 1.12397796], [-1.62328545, 0.64667545], [-1.74314104, -0.59664964]]), np.array([[ 0.74505627, 1.97611078, -1.24412333]]), 5), np.array([[ 3.1980455 , 7.85763489]]))) """ np.random.seed(2) dA = np.random.randn(1,2) A = np.random.randn(3,2) W = np.random.randn(1,3) b = np.random.randn(1,1) Z = np.random.randn(1,2) linear_cache = (A, W, b) activation_cache = Z linear_activation_cache = (linear_cache, activation_cache) return dA, linear_activation_cache AL, linear_activation_cache = linear_activation_backward_test_case() dA_prev, dW, db = linear_activation_backward(AL, linear_activation_cache, activation = "sigmoid") print ("sigmoid:") print ("dA_prev = "+ str(dA_prev)) print ("dW = " + str(dW)) print ("db = " + str(db) + "\n") dA_prev, dW, db = linear_activation_backward(AL, linear_activation_cache, activation = "relu") print ("relu:") print ("dA_prev = "+ str(dA_prev)) print ("dW = " + str(dW)) print ("db = " + str(db))
sigmoid: dA_prev = [[ 0.11017994 0.01105339] [ 0.09466817 0.00949723] [-0.05743092 -0.00576154]] dW = [[ 0.10266786 0.09778551 -0.01968084]] db = [[-0.05729622]] relu: dA_prev = [[ 0.44090989 -0. ] [ 0.37883606 -0. ] [-0.2298228 0. ]] dW = [[ 0.44513824 0.37371418 -0.10478989]] db = [[-0.20837892]]
7-3 L层模型的后向传播
现在我们开始对整个神经网络做后向传播,定义函数为L_model_forward
。在每次的迭代过程中,我们都将 cache值=(X,W,b, z)保留,用以后向模块中梯度的计算。在L_model_forward
中,我们是重复了L次上述的步骤。
后向传播初始化:
对于后向传播,我们知道前向传播的输出是A[L]=σ(Z[L]),我们需要计算 dAL
=∂L∂A[L],我们用以下的公式表示:
dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)) # derivative of cost with respect to AL
之后,我们可以用这个后向激活的梯度 dAL
进行向后传播。再从 dAL
计算 LINEAR->SIGMOID 的后向传播结果。对于 LINEAR->RELU backward 函数,我们可以采用for
循环来处理这L-1次操作。在此期间,我们要存储 dA, dW, db,本文用grads字典来存储:
模型如下:
[LINEAR->RELU] × (L-1) -> LINEAR -> SIGMOID
代码实现:
# GRADED FUNCTION: L_model_backward def L_model_backward(AL, Y, caches): """ Implement the backward propagation for the [LINEAR->RELU] * (L-1) -> LINEAR -> SIGMOID group Arguments: AL -- probability vector, output of the forward propagation (L_model_forward()) Y -- true "label" vector (containing 0 if non-cat, 1 if cat) caches -- list of caches containing: every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1) i.e l = 0...L-2) the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1]) Returns: grads -- A dictionary with the gradients grads["dA" + str(l)] = ... grads["dW" + str(l)] = ... grads["db" + str(l)] = ... """ grads = {} L = len(caches) # the number of layers m = AL.shape[1] Y = Y.reshape(AL.shape) # after this line, Y is the same shape as AL # Initializing the backpropagation ### START CODE HERE ### (1 line of code) dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)) ### END CODE HERE ### # Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "AL, Y, caches". Outputs: "grads["dAL"], grads["dWL"], grads["dbL"] ### START CODE HERE ### (approx. 2 lines) current_cache = caches[L-1] grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, activation = "sigmoid") ### END CODE HERE ### print (L) for l in reversed(range(L - 1)): print (l) # lth layer: (RELU -> LINEAR) gradients. # Inputs: "grads["dA" + str(l + 2)], caches". Outputs: "grads["dA" + str(l + 1)] , grads["dW" + str(l + 1)] , grads["db" + str(l + 1)] ### START CODE HERE ### (approx. 5 lines) current_cache = caches[l] dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l + 2)], current_cache, activation = "relu") grads["dA" + str(l + 1)] = dA_prev_temp grads["dW" + str(l + 1)] = dW_temp grads["db" + str(l + 1)] = db_temp ### END CODE HERE ### return grads测试代码运行结果:
AL, Y_assess, caches = L_model_backward_test_case() grads = L_model_backward(AL, Y_assess, caches) print ("dW1 = "+ str(grads["dW1"])) print ("db1 = "+ str(grads["db1"])) print ("dA1 = "+ str(grads["dA1"])) def L_model_backward_test_case(): """ X = np.random.rand(3,2) Y = np.array([[1, 1]]) parameters = {'W1': np.array([[ 1.78862847, 0.43650985, 0.09649747]]), 'b1': np.array([[ 0.]])} aL, caches = (np.array([[ 0.60298372, 0.87182628]]), [((np.array([[ 0.20445225, 0.87811744], [ 0.02738759, 0.67046751], [ 0.4173048 , 0.55868983]]), np.array([[ 1.78862847, 0.43650985, 0.09649747]]), np.array([[ 0.]])), np.array([[ 0.41791293, 1.91720367]]))]) """ np.random.seed(3) AL = np.random.randn(1, 2) Y = np.array([[1, 0]]) A1 = np.random.randn(4,2) W1 = np.random.randn(3,4) b1 = np.random.randn(3,1) Z1 = np.random.randn(3,2) linear_cache_activation_1 = ((A1, W1, b1), Z1) A2 = np.random.randn(3,2) W2 = np.random.randn(1,3) b2 = np.random.randn(1,1) Z2 = np.random.randn(1,2) linear_cache_activation_2 = ( (A2, W2, b2), Z2) caches = (linear_cache_activation_1, linear_cache_activation_2) return AL, Y, caches
测试代码运行结果如下:
dW1 = [[ 0.41010002 0.07807203 0.13798444 0.10502167] [ 0. 0. 0. 0. ] [ 0.05283652 0.01005865 0.01777766 0.0135308 ]] db1 = [[-0.22007063] [ 0. ] [-0.02835349]] dA1 = [[ 0. 0.52257901] [ 0. -0.3269206 ] [ 0. -0.32070404] [ 0. -0.74079187]]
7-4 参数更新
代码实现:
# GRADED FUNCTION: update_parameters def update_parameters(parameters, grads, learning_rate): """ Update parameters using gradient descent Arguments: parameters -- python dictionary containing your parameters grads -- python dictionary containing your gradients, output of L_model_backward Returns: parameters -- python dictionary containing your updated parameters parameters["W" + str(l)] = ... parameters["b" + str(l)] = ... """ L = len(parameters) // 2 # number of layers in the neural network # Update rule for each parameter. Use a for loop. ### START CODE HERE ### (≈ 3 lines of code) for l in range(L): parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate*grads["dW"+str(l+1)] parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate*grads["db"+str(l+1)] ### END CODE HERE ### return parameters
测试代码运行:
parameters, grads = update_parameters_test_case() parameters = update_parameters(parameters, grads, 0.1) print ("W1 = "+ str(parameters["W1"])) print ("b1 = "+ str(parameters["b1"])) print ("W2 = "+ str(parameters["W2"])) print ("b2 = "+ str(parameters["b2"])) def update_parameters_test_case(): """ parameters = {'W1': np.array([[ 1.78862847, 0.43650985, 0.09649747], [-1.8634927 , -0.2773882 , -0.35475898], [-0.08274148, -0.62700068, -0.04381817], [-0.47721803, -1.31386475, 0.88462238]]), 'W2': np.array([[ 0.88131804, 1.70957306, 0.05003364, -0.40467741], [-0.54535995, -1.54647732, 0.98236743, -1.10106763], [-1.18504653, -0.2056499 , 1.48614836, 0.23671627]]), 'W3': np.array([[-1.02378514, -0.7129932 , 0.62524497], [-0.16051336, -0.76883635, -0.23003072]]), 'b1': np.array([[ 0.], [ 0.], [ 0.], [ 0.]]), 'b2': np.array([[ 0.], [ 0.], [ 0.]]), 'b3': np.array([[ 0.], [ 0.]])} grads = {'dW1': np.array([[ 0.63070583, 0.66482653, 0.18308507], [ 0. , 0. , 0. ], [ 0. , 0. , 0. ], [ 0. , 0. , 0. ]]), 'dW2': np.array([[ 1.62934255, 0. , 0. , 0. ], [ 0. , 0. , 0. , 0. ], [ 0. , 0. , 0. , 0. ]]), 'dW3': np.array([[-1.40260776, 0. , 0. ]]), 'da1': np.array([[ 0.70760786, 0.65063504], [ 0.17268975, 0.15878569], [ 0.03817582, 0.03510211]]), 'da2': np.array([[ 0.39561478, 0.36376198], [ 0.7674101 , 0.70562233], [ 0.0224596 , 0.02065127], [-0.18165561, -0.16702967]]), 'da3': np.array([[ 0.44888991, 0.41274769], [ 0.31261975, 0.28744927], [-0.27414557, -0.25207283]]), 'db1': 0.75937676204411464, 'db2': 0.86163759922811056, 'db3': -0.84161956022334572} """ np.random.seed(2) W1 = np.random.randn(3,4) b1 = np.random.randn(3,1) W2 = np.random.randn(1,3) b2 = np.random.randn(1,1) parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2} np.random.seed(3) dW1 = np.random.randn(3,4) db1 = np.random.randn(3,1) dW2 = np.random.randn(1,3) db2 = np.random.randn(1,1) grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2} return parameters, grads
运行结果如下:
W1 = [[-0.59562069 -0.09991781 -2.14584584 1.82662008] [-1.76569676 -0.80627147 0.51115557 -1.18258802] [-1.0535704 -0.86128581 0.68284052 2.20374577]] b1 = [[-0.04659241] [-1.28888275] [ 0.53405496]] W2 = [[-0.55569196 0.0354055 1.32964895]] b2 = [[-0.84610769]]