[Deep Learning] 4-1 Error Backpropagation Method - Computational Graph & Chain Rule & Backpropagation

In the previous chapter, the neural network learned by computing the gradients of its weight parameters with numerical differentiation. Numerical differentiation is simple and easy to implement, but its disadvantage is that it is slow. In this chapter, we will learn a method that computes the gradients of the weight parameters efficiently: the error backpropagation method.
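To see why this matters, here is a minimal sketch of the central-difference numerical gradient in the style of the previous chapter (the function and variable names are illustrative, not this article's own code). It must evaluate the loss function twice per parameter, so the cost grows prohibitively for networks with many weights:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-4):
    """Central-difference gradient of f at x (x must be a float array)."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        orig = x.flat[i]
        x.flat[i] = orig + h
        fxh1 = f(x)          # f(x + h)
        x.flat[i] = orig - h
        fxh2 = f(x)          # f(x - h)
        grad.flat[i] = (fxh1 - fxh2) / (2 * h)
        x.flat[i] = orig     # restore the original value
    return grad

# e.g. numerical_gradient(lambda w: np.sum(w**2), np.array([1.0, 2.0]))
# -> array([2., 4.]), but at the price of 2 function evaluations per weight
```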

There are generally two ways to understand the error backpropagation method: one based on mathematical formulas, and one based on computational graphs.

Computational Graphs

A computational graph represents a computational process graphically. The graph here is the graph data structure: a set of nodes connected by edges (the lines connecting nodes are called "edges").

Let's look at a simple problem solved with a computational graph.

Question 1: You buy 2 apples at 100 yuan each in the supermarket, and the consumption tax is 10%. Calculate the amount you pay.
[Figure: computational graph for Question 1 (100 × 2 × 1.1 = 220 yuan)]

Question 2: You buy 2 apples and 3 oranges in the supermarket. Apples are 100 yuan each and oranges are 150 yuan each, and the consumption tax is 10%. Calculate the amount you pay.
The computational graph is as follows:
[Figure: computational graph for Question 2 ((100 × 2 + 150 × 3) × 1.1 = 715 yuan)]
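The forward passes for both questions can also be written out directly as arithmetic (a throwaway sketch; the variable names are mine, not the article's):

```python
apple_price, apple_num = 100, 2
orange_price, orange_num = 150, 3
tax = 1.1  # 10% consumption tax

# Question 1: 2 apples, then tax
payment1 = apple_price * apple_num * tax
print(payment1)  # ≈ 220 yuan

# Question 2: apples plus oranges, then tax
payment2 = (apple_price * apple_num + orange_price * orange_num) * tax
print(payment2)  # ≈ 715 yuan
```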

To sum up, solving a problem with a computational graph involves the following steps:
1. Construct the computational graph.
2. On the computational graph, calculate from left to right.

"Computing from left to right" is a kind of propagation in the forward direction, referred to as forward propagation (forward propagation), if it is propagated from right to left on the calculation graph, it is called backward propagation (backward propagation). Backpropagation will play an important role in the next derivative calculation.

Local Computation
A key feature of the computational graph is that the final result is obtained by propagating "local computations".
Local computation means that each node can produce its output from the information directly connected to it alone, no matter what happens elsewhere in the graph.

Consider the following example: suppose you buy 2 apples and many other things in the supermarket.
[Figure: computational graph in which the apple purchase is one local computation among many others (4000 + 200 = 4200)]

The important point here is that the calculation at each node is local. The addition node that sums the apples and everything else (4000 + 200 = 4200) does not care how the number 4000 was produced; it simply adds its two inputs.
In other words, each node performs only the calculation that concerns itself (in this example, adding two input numbers), without considering the graph as a whole.

Why use computational graphs?
What are the advantages of computational graphs? One advantage lies in local computation: no matter how complex the overall calculation is, each node only has to perform a simple local calculation, which simplifies the problem.
Another advantage is that the computational graph can retain all intermediate results.

In fact, the biggest reason for using computational graphs is that derivatives can be computed efficiently through backpropagation.
[Figure: backpropagation on the apple computational graph, passing the derivative from right to left (1 → 1.1 → 2.2)]
As shown in the figure above, backpropagation is drawn with arrows (thick lines) pointing opposite to the forward direction. Backpropagation passes "local derivatives", and the value of each derivative is written below its arrow. In this example, backpropagation passes derivative values from right to left (1 → 1.1 → 2.2). From this result, the "derivative of the payment amount with respect to the apple price" is 2.2. This means that if the price of apples rises by 1 yuan, the final payment amount rises by 2.2 yuan (strictly speaking, if the apple price increases by some tiny amount, the final payment increases by 2.2 times that amount).
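This claim is easy to verify numerically (a quick sketch under the same setup of 2 apples and a 10% tax; the function name is mine):

```python
def payment(apple_price):
    # forward pass of the graph above: price -> ×2 -> ×1.1
    return apple_price * 2 * 1.1

h = 1e-4
derivative = (payment(100 + h) - payment(100 - h)) / (2 * h)
print(derivative)  # ≈ 2.2
```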

Chain Rule

The principle behind passing local derivatives along the graph is the chain rule. This section introduces the chain rule and shows how it corresponds to backpropagation on computational graphs.

Backpropagation of Computational Graphs

As mentioned earlier, the value of "the derivative of the payment amount with respect to the apple price" can be obtained through backpropagation on the computational graph. To see how, let's start with a simple example: suppose there is a calculation y = f(x). Its backpropagation is shown in the figure below.
[Figure: backpropagation of y = f(x), where the upstream signal E is multiplied by the local derivative ∂y/∂x]

As the figure shows, backpropagation multiplies the signal E by the node's local derivative and passes the result to the next node. The local derivative here is the derivative of the forward-propagation calculation y = f(x), that is, ∂y/∂x. This local derivative is multiplied by the value passed from upstream (E in this case) and handed on to the node before it.
This is the calculation order of backpropagation. Why does it work? The answer lies in the chain rule, introduced next.

What is the chain rule?
To introduce the chain rule, we must first talk about composite functions. A composite function is a function composed of multiple functions. For example, z = (x + y)² is composed of the following two expressions:
z = t²
t = x + y

The chain rule is a property of the derivative of a composite function, defined as follows:
If a function is represented as a composite function, then the derivative of the composite function can be expressed as the product of the derivatives of its constituent functions.

Taking the function above as an example, the derivative of z with respect to x can be expressed as the product of the derivative of z with respect to t and the derivative of t with respect to x, as follows:
∂z/∂x = ∂z/∂t · ∂t/∂x

Here the ∂t terms can be thought of as "canceling each other out", giving:

∂z/∂x = ∂z/∂t · ∂t/∂x = 2t · 1 = 2(x + y)
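The same chain can be checked with a few lines of code (a minimal sketch with arbitrary values for x and y):

```python
x, y = 3.0, 4.0
t = x + y             # t = x + y

dz_dt = 2 * t         # from z = t**2
dt_dx = 1.0           # from t = x + y
print(dz_dt * dt_dx)  # analytic chain rule: 2(x + y) = 14.0

# numerical check with a central difference
h = 1e-4
print(((x + h + y) ** 2 - ((x - h) + y) ** 2) / (2 * h))  # ≈ 14.0
```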

The Chain Rule and Computational Graphs
Now let's represent the chain-rule calculation above as a computational graph.
[Figure: the chain rule ∂z/∂x = ∂z/∂t · ∂t/∂x represented as backpropagation on a computational graph]

The order of calculation in backpropagation is: multiply the signal arriving at a node by the node's local derivative (partial derivative), then pass the product on to the next node.

Backpropagation

Backpropagation at an Addition Node
Let's first consider backpropagation at an addition node. Take z = x + y as the object and observe its backpropagation. The derivatives of z = x + y can be calculated analytically:
∂z/∂x = 1
∂z/∂y = 1

Expressed in a computational graph, as shown in the figure:
[Figure: backpropagation at an addition node]

In other words, the backpropagation of an addition node passes the upstream value on to the downstream nodes unchanged.
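As a layer with forward and backward methods, an addition node might look like this (a minimal sketch; the class name and the forward/backward interface follow a common layer convention and are assumptions, not code from this article):

```python
class AddLayer:
    """Addition node: z = x + y."""
    def forward(self, x, y):
        return x + y

    def backward(self, dout):
        # dz/dx = dz/dy = 1, so the upstream derivative passes through unchanged
        dx = dout * 1
        dy = dout * 1
        return dx, dy
```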

Backpropagation at a Multiplication Node
Next, let's look at backpropagation at a multiplication node. Consider z = xy. The derivatives of this expression are:
∂z/∂x = y
∂z/∂y = x

The backpropagation of multiplication multiplies the upstream value by the "flipped value" of the forward-propagation input signals and passes it downstream. "Flipped" means the inputs are swapped, as shown in the figure: if the signal during forward propagation is x, backpropagation multiplies by y; if the signal during forward propagation is y, backpropagation multiplies by x.
[Figure: backpropagation at a multiplication node]

The backpropagation of addition simply passes the upstream value downstream, so it does not need the input signals of forward propagation. The backpropagation of multiplication, however, does need the input values from forward propagation. Therefore, when implementing the backpropagation of a multiplication node, the forward-propagation input signals must be saved.
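A multiplication node can be sketched the same way; note how forward stores its inputs precisely so that backward can "flip" them (again an assumed interface, matching the AddLayer sketch above):

```python
class MulLayer:
    """Multiplication node: z = x * y."""
    def __init__(self):
        self.x = None
        self.y = None

    def forward(self, x, y):
        self.x = x  # save the forward inputs; backward needs them
        self.y = y
        return x * y

    def backward(self, dout):
        dx = dout * self.y  # flipped: dz/dx = y
        dy = dout * self.x  # flipped: dz/dy = x
        return dx, dy
```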

The Apple Example
Consider again the example from the beginning of this chapter of buying apples (2 apples plus consumption tax). The problem here is how each of the three variables, the apple price, the number of apples, and the consumption tax, affects the final payment amount. This is equivalent to finding "the derivative of the payment amount with respect to the apple price", "the derivative of the payment amount with respect to the number of apples", and "the derivative of the payment amount with respect to the consumption tax". Solving it with backpropagation on the computational graph gives the result shown in the figure below.
[Figure: backpropagation through the apple computational graph]
As mentioned earlier, the backpropagation of a multiplication node flips the input signals and passes them downstream. From the results in the figure, the derivative with respect to the apple price is 2.2, the derivative with respect to the number of apples is 110, and the derivative with respect to the consumption tax is 200. This can be read as: if the consumption tax and the apple price each increase by the same amount, the consumption tax affects the final payment by 200 times that amount, while the apple price affects it by 2.2 times. Keep in mind, though, that the consumption tax and the apple price have different units in this example, which is why the numbers differ so much (a unit of 1 for the consumption tax means 100%, while a unit of 1 for the apple price means 1 yuan).
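Chaining two of the MulLayer sketches above reproduces every number in the figure (still a sketch, not this article's code):

```python
mul_apple_layer = MulLayer()
mul_tax_layer = MulLayer()

apple_price, apple_num, tax = 100, 2, 1.1

# forward: price × count, then × tax
apple_total = mul_apple_layer.forward(apple_price, apple_num)  # 200
payment = mul_tax_layer.forward(apple_total, tax)              # ≈ 220

# backward: start from d(payment)/d(payment) = 1 and go right to left
dpayment = 1
dapple_total, dtax = mul_tax_layer.backward(dpayment)              # 1.1, 200
dapple_price, dapple_num = mul_apple_layer.backward(dapple_total)  # 2.2, 110

print(dapple_price, dapple_num, dtax)  # ≈ 2.2 110 200
```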


Reprinted from: blog.csdn.net/loyd3/article/details/130859627