Summary of the first course of Wu Enda's deep learning

 

The core of the first course of Mr. Wu Enda's deep learning series is to understand three steps: forward propagation, cost computation, and back propagation (in fact, as long as you calm down and work through the formulas from the tutorial on scratch paper, none of it is difficult). Teacher Wu Enda explains these concepts mainly through logistic regression.

The first question is how to process the input samples. A color picture has three channels, R, G and B, and on the computer the pixel values of the three channels are stored in three matrices. For example, a 64\times 64 color picture contains 64\times 64\times 3=12288 values in total. We need a feature vector x to represent the picture; its dimension n_{x} is this total number of pixel values, and one such vector is one sample. For m samples we can stack the vectors into a two-dimensional matrix whose rows are the feature values and whose columns are the samples, i.e. a matrix of shape ( n_{x}, m). Expressed in code it looks like this:

import numpy as np
import imageio
from PIL import Image

image = np.array(imageio.imread(r"cat.jpg"))  # read the picture
num_px = 64    # target picture size
my_image = np.array(Image.fromarray(image).resize((num_px, num_px), Image.ANTIALIAS))  # resize the picture to num_px*num_px; Image.ANTIALIAS means high quality
my_image = my_image.reshape((1, num_px * num_px * 3)).T   # flatten to shape (1, num_px * num_px * 3), then transpose into a column vector
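To build the input matrix for m samples, each flattened picture becomes one column. A minimal sketch, assuming a hypothetical list of already-loaded pictures called image_list (not in the original post):

columns = []
for img in image_list:                                       # image_list: hypothetical list of (num_px, num_px, 3) arrays
    columns.append(img.reshape((1, num_px * num_px * 3)).T)  # each picture becomes a column of shape (n_x, 1)
X = np.hstack(columns)                                       # final shape (n_x, m)
X = X / 255.                                                 # optionally scale pixel values to [0, 1]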

 With the input, we can now build our neural network. Take a two-layer neural network as an example, that is, there is only one hidden layer.


Suppose there are m samples in the input layer and a single flattened sample has size num_px*num_px*3; then the input layer has shape ( n^{[0]}, m) = (num_px*num_px*3, m), the hidden layer has n^{[1]} neurons, and the output layer has n^{[2]} neurons. First you need to initialize the weights W and biases b:

def param_init(layer_num, layer_dims):
    # layer_dims[i] is the number of units in layer i (layer 0 is the input layer)
    W = []
    b = []
    for i in range(layer_num):
        sW = np.random.randn(layer_dims[i+1], layer_dims[i])   # shape (n[i+1], n[i])
        sb = np.zeros((layer_dims[i+1], 1))                    # shape (n[i+1], 1)
        W.append(sW)
        b.append(sb)
    return W, b

Or you can put the initialized values in a dictionary:

def init_parameters(layer_dims):
    parameters = {}
    for i in range(1, len(layer_dims)):
        parameters['W' + str(i)] = np.random.randn(layer_dims[i], layer_dims[i-1])
        parameters['b' + str(i)] = np.zeros((layer_dims[i], 1))

        # check the shapes layer by layer
        assert(parameters['W' + str(i)].shape == (layer_dims[i], layer_dims[i-1]))
        assert(parameters['b' + str(i)].shape == (layer_dims[i], 1))

    return parameters
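As a quick check (toy sizes of my own choosing, not from the post), both initializers produce parameters of the same shapes for layer sizes 12288, 7 and 1:

layer_dims = [12288, 7, 1]                    # example: input, one hidden layer, output
W, b = param_init(2, layer_dims)
parameters = init_parameters(layer_dims)
print(W[0].shape, parameters['W1'].shape)     # both (7, 12288)
print(W[1].shape, parameters['W2'].shape)     # both (1, 7)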

The dimension of W^{[1]} is ( n^{[1]}, n^{[0]}), that of b^{[1]} is ( n^{[1]}, 1), that of W^{[2]} is ( n^{[2]}, n^{[1]}), and that of b^{[2]} is ( n^{[2]}, 1). The three processes (forward propagation, cost computation, back propagation) are derived and implemented below:
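Since the derivation figure from the original post is not reproduced here, the standard formulas from the course are summarized for layer l, with activation g^{[l]} and A^{[0]} = X:

Forward propagation: Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}, \quad A^{[l]} = g^{[l]}(Z^{[l]})

Cost (binary cross-entropy): J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log a^{[L](i)} + (1-y^{(i)})\log\left(1-a^{[L](i)}\right)\right]

Back propagation (sigmoid output layer): dZ^{[L]} = A^{[L]} - Y, \quad dW^{[l]} = \frac{1}{m}dZ^{[l]}A^{[l-1]T}, \quad db^{[l]} = \frac{1}{m}\sum_{i=1}^{m}dZ^{[l](i)}, \quad dA^{[l-1]} = W^{[l]T}dZ^{[l]}, \quad dZ^{[l-1]} = dA^{[l-1]} \ast g^{[l-1]\prime}(Z^{[l-1]})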

  • Forward propagation code:

def forward_propagation(X,W,b,layer_num,layer_size,activate_fun):
    Z = []
    A = []
    for i in range(layer_num):
        if i==0:
            sZ = np.dot(W[i],X)+b[i]         # the first layer takes the input X
        else:
            sZ = np.dot(W[i],A[i-1])+b[i]    # later layers take the previous layer's activation
        sA = activate_fun[i](sZ)             # apply this layer's activation function
        Z.append(sZ)
        A.append(sA)
    return Z,A
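The forward pass above assumes that activate_fun is a list of activation functions, and the back propagation below uses a matching list of derivatives (derivate_function). These helpers are not shown in the original post; a minimal sketch consistent with the names used later (relu, sigmoid, derivate_relu, derivate_sigm) could be:

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1. / (1. + np.exp(-Z))

def derivate_relu(Z):
    return np.int64(Z > 0)      # 1 where Z > 0, else 0

def derivate_sigm(Z):
    s = sigmoid(Z)
    return s * (1 - s)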
  • Calculate the cost:

def compute_cost(prediction,Y):
    m = Y.shape[1]
    # binary cross-entropy; the small 1e-5 avoids taking log(0)
    logprobs = np.multiply(np.log(prediction+1e-5), Y) + np.multiply((1 - Y), np.log(1 - prediction+1e-5))
    cost = (-1./m)*np.nansum(logprobs)
    cost = np.squeeze(cost)    # make sure the cost is a scalar
    return cost
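A quick sanity check with toy values of my own (not from the post):

prediction = np.array([[0.8, 0.1, 0.6]])
Y_toy = np.array([[1, 0, 1]])
print(compute_cost(prediction, Y_toy))   # roughly 0.28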
  • Back propagation (because the output layer uses a sigmoid activation together with the cross-entropy cost, the dZ of the last layer can be computed directly, i.e. dZ[l-1] = A[l-1] - Y):

def backward_propagation(l,X,Y,W,Z,A,derivate_function):
    # pre-allocate lists of length l for the gradients
    dZ = list(range(l))
    dA = list(range(l))
    dW = list(range(l))
    db = list(range(l))
    m = Y.shape[1]

    # sigmoid output + cross-entropy cost: dZ of the last layer is computed directly
    dZ[l-1] = A[l-1] - Y
    for i in range(l-1,-1,-1):
        if i>0:
            dW[i] = (1/m)*np.dot(dZ[i],A[i-1].T)
        else:
            dW[i] = (1/m)*np.dot(dZ[i],X.T)   # the first layer's input is X
        db[i] = (1/m)*np.sum(dZ[i],axis=1,keepdims=True)
        if i>0:
            # propagate the error to the previous layer through its activation derivative
            dA[i-1] = np.dot(W[i].T,dZ[i])
            dZ[i-1] = np.multiply(dA[i-1],derivate_function[i-1](Z[i-1]))

    return dW,db
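One cheap sanity check (my own addition, not part of the post) is that every gradient must have the same shape as the parameter it updates:

# after one forward and one backward pass
for i in range(layer_num):
    assert dW[i].shape == W[i].shape
    assert db[i].shape == b[i].shape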
  • Update weights and biases:

def update_param(W,b,dW,db,learning_rate=0.5):
    for i in range(len(W)):
        W[i] = W[i] - learning_rate*dW[i]
        b[i] = b[i] - learning_rate*db[i]
    return W,b
  • Predict the results:

def predict(X,W,b,layer_num,layer_size,activate_fun):
    
    # Computes probabilities using forward propagation, and classifies to 0/1 using 0.5 as the threshold.
    ### START CODE HERE ### (≈ 2 lines of code)
    Z,A = forward_propagation(X,W,b,layer_num,layer_size,activate_fun)
    predictions = np.round(A[layer_num-1])
    
    ### END CODE HERE ###
    
    return predictions

The idea of the above code is to build a program that supports any number of layers and any number of neurons per layer, and then to verify it with the Week 3 assignment of the first course of Mr. Wu Enda's deep learning series.

import matplotlib.pyplot as plt
import sklearn.linear_model
# load_planar_dataset and plot_decision_boundary are helper functions from the Week 3 assignment (planar_utils.py)

X,Y = load_planar_dataset()

# classification result using logistic regression
clf = sklearn.linear_model.LogisticRegressionCV()
clf.fit(X.T,Y.T.ravel())

plot_decision_boundary(lambda x:clf.predict(x),X,Y.reshape(X[0,:].shape))
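plot_decision_boundary normally comes from the course's helper file; if that file is not at hand, a minimal sketch of my own (assuming X has shape (2, m), the label vector has shape (m,), and the numpy/matplotlib imports above) could look like this:

def plot_decision_boundary(model, X, y):
    # build a grid that covers the data range
    x_min, x_max = X[0, :].min() - 1, X[0, :].max() + 1
    y_min, y_max = X[1, :].min() - 1, X[1, :].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01), np.arange(y_min, y_max, 0.01))
    # predict the class of every grid point; model takes rows of (x1, x2) pairs
    Z = model(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.scatter(X[0, :], X[1, :], c=y, cmap=plt.cm.Spectral)
    plt.show()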

Continuing, a four-layer (layer_num = 4) neural network is constructed here; from the first hidden layer to the output layer, the numbers of neurons are 6, 4, 4 and 1 respectively:

layer_num = 4
layer_size = [X.shape[0],6,4,4,1]
activate_fun = [relu,relu,relu,sigmoid]
derivate_function = [derivate_relu,derivate_relu,derivate_relu,derivate_sigm]
W,b = param_init(layer_num,layer_size)
for i in range(20000):
    Z,A = forward_propagation(X,W,b,layer_num,layer_size,activate_fun)
    cost = compute_cost(A[layer_num-1],Y)
    dW,db = backward_propagation(layer_num,X,Y,W,Z,A,derivate_function)
    W,b = update_param(W,b,dW,db,learning_rate=0.07)
    if i%100==0:
        print("Cost after iteration %i: %f" % (i, cost))
# Print accuracy
predictions = predict(X,W,b,layer_num,layer_size,activate_fun)
print ('Accuracy: %d' % float((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100) + '%')
plot_decision_boundary(lambda x: predict(x.T,W,b,layer_num,layer_size,activate_fun), X, Y.reshape(X[0,:].shape))
plt.title("Decision Boundary for a " + str(layer_num) + "-layer network")
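The accuracy expression above counts true positives plus true negatives; an equivalent and perhaps clearer form (my own rewrite, not from the post) is:

accuracy = float(np.mean(predictions == Y)) * 100
print('Accuracy: %.1f%%' % accuracy)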

Result: the accuracy is 90%. That is not particularly high, but the model is usable, so another binary-classification dataset is used for further verification. The results are as follows:

# Datasets
noisy_circles, noisy_moons, blobs, gaussian_quantiles, no_structure = load_extra_datasets()

datasets = {"noisy_circles": noisy_circles,
            "noisy_moons": noisy_moons,
            "blobs": blobs,
            "gaussian_quantiles": gaussian_quantiles}

### START CODE HERE ### (choose your dataset)
dataset = "noisy_moons"
### END CODE HERE ###

X, Y = datasets[dataset]
X, Y = X.T, Y.reshape(1, Y.shape[0])
  • Result using logistic regression:

  • Result using the neural network above:



Because I'm lazy, I didn't optimize the code any further, but as an exercise for the first course of deep learning it serves its purpose. In a word: just calm down and push through the formulas slowly, and the results will always come out.

Attached code link: https://download.csdn.net/download/weixin_42149550/11634095


Origin blog.csdn.net/weixin_42149550/article/details/100113196