Large bias (underfitting): accuracy is low on both the training and test sets; this can usually be improved by increasing the number of neurons per layer or by adding more layers.
Large variance (overfitting): accuracy on the training set is high, while accuracy on the test (or dev) set is noticeably lower; this can usually be improved by increasing the regularization parameter lambda or by adding more training data (a small diagnostic sketch follows below).
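As a rough illustration only, here is a minimal sketch that turns this rule of thumb into code; the function name, accuracy values, and thresholds are made up for the example and are not from the course:

def diagnose(train_acc, dev_acc, target_acc=0.95, gap_threshold=0.05):
    # train_acc / dev_acc are accuracies measured elsewhere; thresholds are illustrative
    if train_acc < target_acc:
        return "high bias: try more neurons per layer, more layers, or longer training"
    if train_acc - dev_acc > gap_threshold:
        return "high variance: try a larger lambda, dropout, or more training data"
    return "bias and variance both look acceptable"

print(diagnose(train_acc=0.82, dev_acc=0.80))  # -> high bias
print(diagnose(train_acc=0.99, dev_acc=0.90))  # -> high variance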
Overfitting can be addressed by regularizing during model training
L2 regularization: add an L2 penalty term to the original cost function, so that cost = cross-entropy cost + lambd/(2*m) * (sum of squared entries of W1, W2, and W3).
For backpropagation, each dW needs the extra term lambd/m * W added to it:
Code:
# Cost: add the L2 regularization term to the cross-entropy cost (W1..W3, lambd, m are defined earlier in the model)
L2_regularization_cost = lambd/(2*m)*(np.sum(np.square(W1)) + np.sum(np.square(W2)) + np.sum(np.square(W3)))
cost = cross_entropy_cost + L2_regularization_cost
# Backpropagation: each dW gains the extra term lambd/m * W
dW3 = 1./m * np.dot(dZ3, A2.T) + lambd/m*W3
dW2 = 1./m * np.dot(dZ2, A1.T) + lambd/m*W2
dW1 = 1./m * np.dot(dZ1, X.T) + lambd/m*W1
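The snippet above assumes W1..W3, dZ1..dZ3, A1, A2, X, cross_entropy_cost, lambd, and m already exist inside the model. A self-contained sketch of just the two regularization pieces, with made-up shapes and values, so it runs on its own:

import numpy as np

np.random.seed(0)
m = 5                              # number of examples (made up)
lambd = 0.7                        # regularization strength (made up)
W = np.random.randn(4, 3)          # one weight matrix (made-up shape)
dZ = np.random.randn(4, m)         # gradient flowing back from this layer
A_prev = np.random.randn(3, m)     # activations of the previous layer

# L2 cost term contributed by this single weight matrix: lambd/(2m) * sum(W^2)
L2_term = lambd / (2 * m) * np.sum(np.square(W))

# Gradient of the weights with the extra regularization term lambd/m * W
dW = 1. / m * np.dot(dZ, A_prev.T) + lambd / m * W

print(L2_term, dW.shape)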
Dropout
During each training iteration, neurons are randomly turned off. A neuron that is turned off contributes nothing to either forward or backward propagation. Because dropout turns off a different random set of neurons each time, every iteration effectively trains a slightly different model, which reduces the network's dependence on any single neuron.
Forward pass code implementation
D1 = np.random.rand(A1.shape[0],A1.shape[1]) # Step 1: initialize matrix D1 with the same dimensions as A1
D1 = np.int64(D1 < keep_prob) # Step 2: convert entries of D1 to 0 or 1 (using keep_prob as the threshold)
A1 = np.multiply(A1,D1) # Step 3: shut down some neurons of A1 by applying the mask
A1 = np.divide(A1,keep_prob) # Step 4: divide by keep_prob so the expected value of A1 (and hence the cost) stays roughly the same (inverted dropout)
D2 = np.random.rand(A2.shape[0],A2.shape[1]) # Step 1: initialize matrix D2 = np.random.rand(..., ...)
D2 = np.int64(D2 < keep_prob) # Step 2: convert entries of D2 to 0 or 1 (using keep_prob as the threshold)
A2 = np.multiply(A2,D2) # Step 3: shut down some neurons of A2
A2 = np.divide(A2,keep_prob) # Step 4: divide by keep_prob so the expected value of A2 (and hence the cost) stays roughly the same
Backpropagation code implementation
dA2 = np.multiply(dA2,D2) # Step 1: Apply mask D2 to shut down the same neurons as during the forward propagation
dA2 = np.divide(dA2,keep_prob) # Step 2: Scale the value of neurons that haven't been shut down
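Putting the forward and backward masks together, here is a minimal single-layer sketch with toy data (shapes, values, and keep_prob are made up), showing that the same mask D must be reused and the same rescaling applied in both directions:

import numpy as np

np.random.seed(1)
keep_prob = 0.8
A1 = np.random.randn(4, 5)    # toy activations for one hidden layer
dA1 = np.random.randn(4, 5)   # toy gradient arriving at that layer

# Forward: build the mask once, apply it, and rescale (inverted dropout)
D1 = (np.random.rand(*A1.shape) < keep_prob).astype(np.int64)
A1_dropped = A1 * D1 / keep_prob

# Backward: reuse the SAME mask D1 and the same rescaling on the gradient
dA1_dropped = dA1 * D1 / keep_prob

print(A1_dropped.shape, dA1_dropped.shape)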
Refer to Andrew Ng's deep learning course.