[ZJU-Machine Learning] Backpropagation algorithm

Building a simple model

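A minimal sketch of the kind of "simple model" these notes start from, assuming a single neuron that takes a weighted sum of its inputs and passes it through an activation function φ (the exact model in the original figures may differ):

```latex
y = \varphi\!\left(\sum_{i=1}^{n} w_i x_i + b\right) = \varphi\bigl(\mathbf{w}^{\top}\mathbf{x} + b\bigr)
```

Here x is the input vector, w the weights, b the bias, and y the neuron's output.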

gradient descent method


Expanding f in a first-order Taylor series around w_k shows that, for a small enough learning rate, the gradient-descent update gives f(w_{k+1}) < f(w_k), i.e. each step decreases the objective.
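Written out, with learning rate η > 0 and update w_{k+1} = w_k − η∇f(w_k), the first-order expansion is:

```latex
f(w_{k+1}) \approx f(w_k) + \nabla f(w_k)^{\top}\,(w_{k+1} - w_k)
           = f(w_k) - \eta\,\|\nabla f(w_k)\|^{2} \;\le\; f(w_k)
```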

BP algorithm based on simple model

Define the objective function:
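Assuming the usual squared-error objective over N training samples, where y_p is the network output and d_p the desired output for sample p, the objective is of the form:

```latex
E(\mathbf{w}) = \frac{1}{2} \sum_{p=1}^{N} \bigl\| \mathbf{y}_p - \mathbf{d}_p \bigr\|^{2}
```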
algorithm flow:
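A minimal sketch of the flow for the simple model, assuming one sigmoid neuron, the squared-error objective above, and plain gradient descent; the function and variable names here are illustrative, not from the original notes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_single_neuron(X, d, lr=0.1, epochs=1000):
    """Gradient-descent training of one sigmoid neuron on squared error.

    X: (N, n) input matrix, d: (N,) desired outputs.
    """
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        z = X @ w + b                   # weighted sum for every sample
        y = sigmoid(z)                  # neuron outputs
        delta = (y - d) * y * (1 - y)   # dE/dz, using sigmoid'(z) = y(1 - y)
        w -= lr * (X.T @ delta)         # dE/dw_i = sum_p x_{p,i} * delta_p
        b -= lr * delta.sum()           # dE/db  = sum_p delta_p
    return w, b
```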

activation function

step function

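The step (threshold) function is usually defined as:

```latex
u(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}
```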

sigmoid

A smooth approximation of the step function.
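The sigmoid function and its derivative, which is what makes the BP updates convenient to compute:

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)
```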

tanh

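Assuming the function intended here is tanh, its definition and derivative are:

```latex
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = 2\sigma(2x) - 1, \qquad \tanh'(x) = 1 - \tanh^{2}(x)
```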

For both of the functions above, when |x| is large the output hardly changes: the function saturates and has an upper bound. So when a unit's weighted input x is large, the information passed backward through its derivative is squashed toward zero, which is the so-called vanishing gradient problem. In addition, gradient explosion can also occur.
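One way to see the vanishing effect: the gradient reaching an early layer contains a product of per-layer derivative factors, and the sigmoid derivative is at most 1/4, so over L saturating layers the product shrinks exponentially; conversely, if the weight factors in that product are large, it can blow up (gradient explosion).

```latex
0 < \sigma'(x) \le \tfrac{1}{4}
\quad\Longrightarrow\quad
\prod_{l=1}^{L} \sigma'\!\bigl(z^{(l)}\bigr) \;\le\; \left(\tfrac{1}{4}\right)^{L} \longrightarrow 0 \quad (L \to \infty)
```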

ReLU


For this function, when x < 0 the output is set to 0; that is, a neuron whose weighted input is negative is effectively deactivated, which reduces the number of active units during training. Only neurons with x > 0 are kept, and there is no upper bound, which helps avoid vanishing and exploding gradients and helps prevent overfitting.
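The ReLU function and its derivative (the value at x = 0 is a matter of convention):

```latex
\mathrm{ReLU}(x) = \max(0, x), \qquad
\mathrm{ReLU}'(x) = \begin{cases} 1, & x > 0 \\ 0, & x < 0 \end{cases}
```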

Leaky ReLU

Unlike ReLU, it does not completely deactivate neurons with x < 0; it only reduces their activation.
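A common form, with a small fixed slope α (e.g. α = 0.01) on the negative side:

```latex
\mathrm{LeakyReLU}(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases}
```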

General BP algorithm

(When computing the partial derivatives, work from the back of the network toward the front.)

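A minimal sketch of the general algorithm, assuming a fully connected network with sigmoid activations at every layer and the squared-error loss; the layer layout, names, and update scheme here are illustrative, not from the original notes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(weights, biases, x, d, lr=0.1):
    """One forward/backward pass and update on a single sample (x, d).

    weights[l] has shape (n_{l+1}, n_l); biases[l] has shape (n_{l+1},).
    Loss is E = 1/2 * ||y - d||^2; deltas are computed from back to front.
    """
    # Forward pass: cache every layer's activation.
    activations = [x]
    for W, b in zip(weights, biases):
        activations.append(sigmoid(W @ activations[-1] + b))

    # Backward pass: start at the output layer, move toward the input.
    y = activations[-1]
    delta = (y - d) * y * (1 - y)                  # dE/dz at the output layer
    for l in reversed(range(len(weights))):
        grad_W = np.outer(delta, activations[l])   # dE/dW[l]
        grad_b = delta                             # dE/db[l]
        if l > 0:
            a = activations[l]
            # Chain rule: push delta through W[l] and layer l's sigmoid.
            delta = (weights[l].T @ delta) * a * (1 - a)
        weights[l] -= lr * grad_W
        biases[l] -= lr * grad_b
    return weights, biases
```

The key point matches the parenthetical above: the deltas are computed at the output layer first and then propagated layer by layer toward the input, reusing each layer's cached activation.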


Source: https://blog.csdn.net/qq_45654306/article/details/113158597