Deep learning paper notes

The most famous thing I have read recently is the paper titled "Deep Learning", published in Nature for the 60th anniversary of artificial intelligence. It is the kind of paper whose meaning only reveals itself after reading it many times, so I would like to take some notes on it.

[Figure 1 of the paper, panels a-d]

This figure shows the most important part of the paper.

(1) Panel a shows what a multilayer neural network does to the input space: it distorts the space so that the data samples of the two classes (drawn in red and blue) become linearly separable. The upper part is a schematic of a multilayer network made up of two input nodes, one hidden layer containing two hidden nodes, and one final output node. In fact, every node other than the input and output nodes is a hidden node; a typical multilayer network contains dozens to hundreds of such nodes. (A tiny concrete sketch of this idea follows.)
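
To make panel a's point concrete, here is a tiny sketch of my own (not taken from the figure): the four XOR points cannot be separated by a line in the input space, but after a hand-picked two-unit ReLU hidden layer they can. The specific weights, the bias terms, and the 0.5 threshold are my own choices for illustration.

```python
import numpy as np

# Tiny illustration of panel a's idea: the XOR points are not linearly
# separable in the input space, but a hand-picked 2-unit ReLU hidden layer
# maps them into a space where a single linear output node can separate them.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
labels = np.array([0, 1, 1, 0])                 # XOR labels

W1 = np.array([[1.0, 1.0],                      # hidden weights (chosen by hand)
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])                      # bias terms (my addition)

H = np.maximum(0.0, X @ W1 + b1)                # hidden representation of each point
scores = H @ np.array([1.0, -2.0])              # a single linear output node
print(H)                                        # transformed points
print((scores > 0.5).astype(int), labels)       # predictions now match XOR
```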

(2) Panel b illustrates the basic chain rule for derivatives, which is the principle behind the neural-network learning procedure described below. (A quick numerical check of the chain rule follows.)
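
As a quick sanity check of the chain rule (my own sketch, not part of the paper), the snippet below compares a direct numerical derivative of a composed function with the product of the two inner derivatives; the functions f and g are arbitrary examples.

```python
# Chain rule check: if y = g(x) and z = f(y), then dz/dx = (dz/dy) * (dy/dx).
def g(x):          # inner function y = g(x)
    return 3 * x + 1

def f(y):          # outer function z = f(y)
    return y ** 2

x = 2.0
eps = 1e-6

# direct numerical derivative dz/dx of the composition f(g(x))
dz_dx_direct = (f(g(x + eps)) - f(g(x - eps))) / (2 * eps)

# chain rule: dz/dy evaluated at y = g(x), times dy/dx
y = g(x)
dz_dy = (f(y + eps) - f(y - eps)) / (2 * eps)
dy_dx = (g(x + eps) - g(x - eps)) / (2 * eps)
dz_dx_chain = dz_dy * dy_dx

print(dz_dx_direct, dz_dx_chain)   # both approximately 42.0
```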

(3) Panel c shows the forward-propagation process (from input to output) that the network performs both during learning and at test time. Between every pair of connected nodes there is a path carrying a weight. For each node of each layer, we first compute the weighted sum of the outputs of the previous layer (each node of the previous layer contributes its output multiplied by the weight of the path pointing to the current node): z_j = Σ_i (w_ij * x_i), where i ranges over the nodes of the previous layer and j is the current node. But this is not the end: z_j is only a weighted sum of the inputs, and it must still pass through a non-linear function to produce the output y_j of the current node. In recent years the commonly used non-linearity is the rectified linear unit ReLU, f(z_j) = max(0, z_j). (A small forward-pass sketch follows.)
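
Here is a minimal forward-pass sketch of my own for the small 2-2-1 network of panel a, using exactly the formulas above (weighted sum, then ReLU); the inputs and weights are made-up numbers, purely for illustration.

```python
import numpy as np

# Forward pass for a small 2-2-1 network, following panel c:
# z_j = sum_i(w_ij * x_i), then y_j = ReLU(z_j) = max(0, z_j).
def relu(z):
    return np.maximum(0.0, z)

x = np.array([0.5, 1.2])           # outputs of the two input nodes

W1 = np.array([[0.1, -0.4],        # weights from input layer to hidden layer
               [0.8,  0.3]])       # W1[i, j] = w_ij (input node i -> hidden node j)
W2 = np.array([[0.6],              # weights from hidden layer to the output node
               [-0.9]])

z1 = x @ W1                        # weighted sums z_j of the hidden nodes
y1 = relu(z1)                      # hidden outputs y_j after the non-linearity
z2 = y1 @ W2                       # weighted sum of the output node
y2 = relu(z2)                      # output of the network

print("hidden outputs:", y1, "network output:", y2)
```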

(4) Panel d shows the learning (feedback) process of the network: the output is compared with the correct answer, and the error is fed back along the paths to adjust each weight. First, the network output is compared with the correct result; the error at each node of the output layer can be expressed through the derivative of a cost function with respect to that node's output. For example, if the cost function for output node l is 0.5 * (y_l - t_l)^2, then its error derivative is y_l - t_l, where y_l is the output of node l and t_l is the target value. This error at the output layer is then used as an "input" that is propagated backwards: multiplying it by the derivative of the node's non-linearity gives the derivative with respect to the node's weighted input (for node l this is ∂E/∂z_l; compare the corresponding nodes in panels c and d). In the hidden layers the process mirrors the forward pass: the error of a node is the sum, over all nodes of the layer above, of their errors multiplied by the weights of the connecting paths, and multiplying again by the derivative of the non-linearity gives the derivative with respect to that node's weighted input (for node k this is ∂E/∂z_k). Meanwhile, from panel c we can see that for nodes j and k in adjacent layers, z_k contains the term w_jk * y_j; therefore, once the feedback has produced ∂E/∂z_k, the error gradient for the single weight w_jk is y_j * ∂E/∂z_k. With these gradients the weights can be adjusted, which is the "learning". (A small backward-pass sketch follows.)
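
And here is a matching backward-pass sketch of my own. For simplicity I treat the output node as linear (no non-linearity), so that ∂E/∂z at the output is just y - t for the squared-error cost above; the weights, input, target, and learning rate are made-up numbers.

```python
import numpy as np

# Backward pass for one training example, following panel d:
# the output error is fed back to obtain dE/dz for each node, and the
# gradient for a single weight is dE/dw_jk = y_j * dE/dz_k.
# 2-2-1 network, ReLU hidden units, linear output, cost E = 0.5*(y_out - t)^2.
relu = lambda z: np.maximum(0.0, z)
relu_grad = lambda z: (z > 0).astype(float)    # derivative of ReLU

x = np.array([0.5, 1.2])                       # input
t = 1.0                                        # target value
W1 = np.array([[0.1, -0.4], [0.8, 0.3]])       # input -> hidden weights
W2 = np.array([[0.6], [-0.9]])                 # hidden -> output weights

# forward pass (as in panel c)
z1 = x @ W1
y1 = relu(z1)
z2 = y1 @ W2
y_out = z2[0]                                  # linear output node

# backward pass (panel d)
dE_dy_out = y_out - t                          # derivative of 0.5*(y - t)^2
dE_dz2 = np.array([dE_dy_out])                 # linear output: dE/dz = dE/dy
dE_dW2 = np.outer(y1, dE_dz2)                  # dE/dw_jk = y_j * dE/dz_k
dE_dy1 = W2 @ dE_dz2                           # error propagated to hidden outputs
dE_dz1 = dE_dy1 * relu_grad(z1)                # through the ReLU derivative
dE_dW1 = np.outer(x, dE_dz1)                   # gradients for input -> hidden weights

# one gradient-descent step ("adjusting the weights to learn")
lr = 0.1
W1 -= lr * dE_dW1
W2 -= lr * dE_dW2
print("output:", y_out, "updated W2:\n", W2)
```

Note that dE_dW2 above is exactly y_j * ∂E/∂z_k for each weight, matching the single-weight formula from panel d.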

(For more on cost functions, see https://blog.csdn.net/weixin_42338058/article/details/83989571 )

(To be continued as I keep studying over the next couple of days.)
