[Deep Learning_2.1.2] Neural Network Regularization

Large bias (underfitting): accuracy is low on both the training and test sets; it can usually be improved by increasing the number of neurons per layer or increasing the number of layers.

Large variance (overfitting): training-set accuracy is high, while test-set (or dev-set) accuracy is noticeably lower; it can usually be reduced by increasing the regularization parameter lambda or by adding more data to the training set. A rough way to tell the two cases apart is sketched below.
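
A minimal sketch of this diagnosis, assuming the training and dev accuracies have already been measured; the threshold values here are illustrative, not from the course:

    # Hypothetical accuracy values; in practice these come from evaluating the model
    train_acc, dev_acc = 0.88, 0.86

    if train_acc < 0.95:               # training accuracy itself is low -> high bias
        print("High bias: try more neurons per layer or more layers")
    elif train_acc - dev_acc > 0.05:   # large gap between train and dev -> high variance
        print("High variance: increase lambda or add training data")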

Overfitting can be addressed by applying regularization during model training.


L2 regularization: an L2 regularization cost is added to the original cost function.
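
Concretely, for the three-layer network used in the code below, the regularized cost adds the squared Frobenius norms of all the weight matrices, scaled by λ/(2m), to the cross-entropy cost:

$$J_{\text{regularized}} = J_{\text{cross-entropy}} + \frac{\lambda}{2m}\left(\lVert W^{[1]}\rVert_F^2 + \lVert W^{[2]}\rVert_F^2 + \lVert W^{[3]}\rVert_F^2\right)$$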


For backpropagation, each dW picks up an extra λ/m · W term from the derivative of the regularization cost:
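
That is, differentiating the L2 term above gives, for each layer l:

$$dW^{[l]} = \left(\text{gradient of the unregularized cost}\right) + \frac{\lambda}{m} W^{[l]}$$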



Code:

    # Compute the cost with L2 regularization
    L2_regularization_cost = lambd/(2*m) * (np.sum(np.square(W1)) + np.sum(np.square(W2)) + np.sum(np.square(W3)))
    cost = cross_entropy_cost + L2_regularization_cost

    # Backpropagation: each dW gets the extra lambd/m * W term
    dW3 = 1./m * np.dot(dZ3, A2.T) + lambd/m * W3
    dW2 = 1./m * np.dot(dZ2, A1.T) + lambd/m * W2
    dW1 = 1./m * np.dot(dZ1, X.T) + lambd/m * W1
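
For completeness, a minimal sketch of the subsequent parameter update, assuming plain gradient descent and a learning_rate variable (not shown in the snippet above); because of the extra λ/m · W term, each update also shrinks the weights slightly (weight decay):

    # Gradient descent update using the L2-regularized gradients from above
    # (learning_rate is assumed to be defined elsewhere, e.g. learning_rate = 0.01)
    W3 = W3 - learning_rate * dW3
    W2 = W2 - learning_rate * dW2
    W1 = W1 - learning_rate * dW1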

Dropout

During each training iteration, neurons are randomly switched off. A switched-off neuron contributes nothing to either forward or backward propagation. Because dropout turns off a different random subset of neurons each time, every pass effectively trains a different, thinned network, which keeps the model from relying too heavily on any single neuron.

Forward pass code implementation

    D1 = np.random.rand(A1.shape[0], A1.shape[1])   # Step 1: initialize matrix D1 with the same shape as A1
    D1 = np.int64(D1 < keep_prob)                   # Step 2: convert entries of D1 to 0 or 1 (using keep_prob as the threshold)
    A1 = np.multiply(A1, D1)                        # Step 3: shut down some neurons of A1
    A1 = np.divide(A1, keep_prob)                   # Step 4: divide by keep_prob so the expected value of the cost stays roughly the same

    D2 = np.random.rand(A2.shape[0], A2.shape[1])   # Step 1: initialize matrix D2 with the same shape as A2
    D2 = np.int64(D2 < keep_prob)                   # Step 2: convert entries of D2 to 0 or 1 (using keep_prob as the threshold)
    A2 = np.multiply(A2, D2)                        # Step 3: shut down some neurons of A2
    A2 = np.divide(A2, keep_prob)                   # Step 4: divide by keep_prob so the expected value of the cost stays roughly the same

Backpropagation code implementation

    dA2 = np.multiply(dA2, D2)        # Step 1: apply mask D2 to shut down the same neurons as during forward propagation
    dA2 = np.divide(dA2, keep_prob)   # Step 2: scale the values of the neurons that haven't been shut down
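
The same two steps are applied to dA1 with the mask D1. Note that dropout is only used while training: at test/prediction time no neurons are shut down and no division by keep_prob is needed, since the scaling during training already keeps the expected activations unchanged. A minimal sketch of a test-time forward pass, assuming the same parameters as above and a ReLU activation (the relu helper here is illustrative):

    import numpy as np

    def relu(Z):
        return np.maximum(0, Z)

    # Test-time forward pass: no dropout masks and no division by keep_prob
    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = relu(Z2)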


Refer to Andrew Ng's deep learning course.
