Master Deep Learning in One Article (6): Thoroughly Understand Forward and Backward Propagation

In the previous chapter we learned about derivatives and the chain rule, which I hope everyone has mastered by now. In this chapter we begin learning about forward propagation and backward propagation in deep learning.

I explained logistic regression in an earlier article, and this chapter uses logistic regression as the example for explaining forward propagation and backward propagation.

Let us recall the logistic regression formulas we learned earlier. For a sample with two features x1 and x2, the model computes

z=w1x1+w2x2+b,\qquad a=\sigma \left ( z \right )=\frac{1}{1+e^{-z}}

and the cross-entropy loss

L\left ( a,y \right )=-\left [ y\ln a+\left ( 1-y \right )\ln \left ( 1-a \right ) \right ]

Forward Propagation

Forward propagation is relatively easy to understand: the process of computing the output from the input through the hidden-layer calculations is the forward process, also known as forward propagation.

Forward propagation then simply evaluates the formulas above in order: from the inputs x1 and x2 we compute z, then a, and finally the loss L\left ( a,y \right ).

In other words, forward propagation starts from the input and computes step by step in the forward direction until the loss value is obtained; that is one forward pass.
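To make this concrete, here is a minimal NumPy sketch of one forward pass (the function names and the small eps guard inside the log are my own additions, not part of the original article):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x1, x2, w1, w2, b, y):
    """One forward pass: input -> z -> a -> loss."""
    z = w1 * x1 + w2 * x2 + b      # linear combination
    a = sigmoid(z)                 # predicted probability
    eps = 1e-12                    # assumption: guard against log(0)
    loss = -(y * np.log(a + eps) + (1 - y) * np.log(1 - a + eps))
    return z, a, loss

# example: one sample with features x1=0.5, x2=1.5 and label y=1
z, a, loss = forward(0.5, 1.5, w1=0.1, w2=-0.2, b=0.0, y=1)
print(a, loss)
```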

Backward Propagation

Backward propagation runs in the opposite direction of forward propagation. Starting from the loss function, it flows backward through the network and computes the gradient (partial derivative) of the loss with respect to each layer's parameters, which are then used to update those parameters.

The ultimate goal of backpropagation is to minimize the value of the loss function by updating the parameters. Concretely, we compute the gradient of each parameter step by step, working backward through the computation, and then use those gradients to update the parameters.

Since backpropagation is so important, I will derive it step by step below. I hope everyone gains something from reading this article.

1. Start from the loss function L\left ( a,y \right ) and compute the gradient with respect to a:

\frac{\partial L}{\partial a}=-\frac{y}{a}+\frac{1-y}{1-a}
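This follows directly from differentiating the cross-entropy loss recalled at the start of the article:

\frac{\partial L}{\partial a}=-\frac{\partial }{\partial a}\left [ y\ln a+\left ( 1-y \right )\ln \left ( 1-a \right ) \right ]=-\frac{y}{a}+\frac{1-y}{1-a}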

2. Next, compute the gradient with respect to z:

\frac{\partial L}{\partial z}=\frac{\partial L}{\partial a}\times \frac{\partial a}{\partial z}

Here \frac{\partial L}{\partial a} was computed in the previous step, so we only need to compute \frac{\partial a}{\partial z}.

Since a=\sigma \left ( z \right )= \frac{1}{1+e^{-z}}, we have \frac{\partial a}{\partial z}=a\left ( 1-a \right )
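If you want to verify the sigmoid derivative yourself, one line of algebra does it:

\frac{\partial a}{\partial z}=\frac{e^{-z}}{\left ( 1+e^{-z} \right )^{2}}=\frac{1}{1+e^{-z}}\times \frac{e^{-z}}{1+e^{-z}}=a\left ( 1-a \right )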

Then \frac{\partial L}{\partial z}=\frac{\partial L}{\partial a}\times \frac{\partial a}{\partial z} = \left ( -\frac{y}{a}+\frac{1-y}{1-a} \right )\times a\left ( 1-a \right )=a-y 
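As a sanity check, you can compare the analytic result \frac{\partial L}{\partial z}=a-y against a finite-difference approximation (a quick sketch; the sample values z = 0.3, y = 1 and the step size h are arbitrary choices of mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_from_z(z, y):
    a = sigmoid(z)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

z, y, h = 0.3, 1.0, 1e-6
numeric = (loss_from_z(z + h, y) - loss_from_z(z - h, y)) / (2 * h)
analytic = sigmoid(z) - y      # the a - y formula derived above
print(numeric, analytic)       # the two values should agree closely
```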

3. Finally, compute the gradients with respect to w1, w2, and b:

\frac{\partial L}{\partial w1}=\frac{\partial L}{\partial a}\times \frac{\partial a}{\partial z}\times \frac{\partial z}{\partial w1}

Here \frac{\partial L}{\partial a} and \frac{\partial a}{\partial z} have already been computed, so we only need \frac{\partial z}{\partial w1}. Since z=w1x1+w2x2+b,

\frac{\partial z}{\partial w1}=x1

Then \frac{\partial L}{\partial w1}=\frac{\partial L}{\partial a}\times \frac{\partial a}{\partial z}\times \frac{\partial z}{\partial w1}=\left ( -\frac{y}{a}+\frac{1-y}{1-a} \right )\times a\left ( 1-a \right )\times x1=\left ( a-y \right )x1

Similarly, \frac{\partial L}{\partial w2}=\frac{\partial L}{\partial a}\times \frac{\partial a}{\partial z}\times \frac{\partial z}{\partial w2}=\left ( a-y \right )x2

and \frac{\partial L}{\partial b}=\frac{\partial L}{\partial a}\times \frac{\partial a}{\partial z}\times \frac{\partial z}{\partial b}= a-y
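Putting the three results together, the entire backward pass is only a few lines (a sketch; the backward function and the dz, dw1, dw2, db names are my own, not from the original article):

```python
def backward(x1, x2, a, y):
    """Gradients of the loss w.r.t. w1, w2 and b, using dL/dz = a - y."""
    dz = a - y           # dL/dz, derived in step 2
    dw1 = dz * x1        # dL/dw1 = (a - y) * x1
    dw2 = dz * x2        # dL/dw2 = (a - y) * x2
    db = dz              # dL/db  = a - y
    return dw1, dw2, db

# example: prediction a = 0.7 for a sample (x1=0.5, x2=1.5) with label y = 1
print(backward(0.5, 1.5, a=0.7, y=1))   # roughly (-0.15, -0.45, -0.3)
```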

With the gradients of all the parameters in hand, we only need to apply the corresponding updates:

w1=w1-\alpha \times\frac{\partial L}{\partial w1}

w2=w2-\alpha \times\frac{\partial L}{\partial w2}

b=b-\alpha \times\frac{\partial L}{\partial b}

Here \alpha is the learning rate, a hyperparameter, which means it has to be tuned by hand. It was discussed in a previous article, so I won't repeat it here.
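To tie everything together, here is a minimal end-to-end sketch that trains this logistic regression by alternating forward passes, backward passes, and parameter updates (the toy data, alpha = 0.1, and the iteration count are made-up illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# made-up toy data: four samples with two features each, and binary labels
X = np.array([[0.5, 1.5], [1.0, -0.5], [-1.2, 0.3], [2.0, 1.0]])
Y = np.array([1, 0, 0, 1])

w1, w2, b = 0.0, 0.0, 0.0
alpha = 0.1                      # learning rate (hyperparameter)

for step in range(1000):
    for (x1, x2), y in zip(X, Y):
        # forward pass
        z = w1 * x1 + w2 * x2 + b
        a = sigmoid(z)
        # backward pass, using the gradients derived above
        dz = a - y
        dw1, dw2, db = dz * x1, dz * x2, dz
        # gradient-descent update
        w1 -= alpha * dw1
        w2 -= alpha * dw2
        b -= alpha * db

print(w1, w2, b)                 # learned parameters
```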

I think I've covered this in enough detail. If anything is still unclear, leave your questions in the comments and I will answer them one by one.

If you learned something from reading this article, please give me a follow.

That's all for this article. To get the deep learning materials and courses, follow the official account and reply with the word "data". I wish you happy learning!
