Neural network weights: how exactly are they adjusted?

Excerpted from Zhang Yuhong's "The Beauty of Deep Learning". The book is really good!

We all know that the essence of neural network learning is to use a loss function to adjust the network's weights. Interestingly, "slimming" in English is literally "weight loss". So you see, it is quite a coincidence that I can explain the loss function with my own weight-loss story, isn't it?

Perhaps you will say: coincidence or not, how exactly are the weights of a neural network adjusted?

Broadly speaking, there are two categories of methods that are relatively easy to use.

The first category of methods adjusts the network parameters from back to front; the second does the opposite, adjusting them from front to back.

The typical representative of the first category is "error back propagation"; the representative of the second is the currently popular "deep learning".

Simply put, the first category works like this: first set the initial parameter values randomly, then compute the current output of the network, take the difference between the network's output and the expected output, and, using an iterative algorithm, adjust the parameters of each layer from back to front in the direction that reduces this difference, until the network converges and stabilizes.
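To make this loop concrete, here is a minimal sketch in NumPy of the "random init, forward pass, compare, adjust" cycle on a toy one-layer network. All the data, variable names, and the learning rate are my own illustrative assumptions, not code from the book:

```python
import numpy as np

# Toy data: predict y from two inputs (purely illustrative)
X = np.array([[0.5, 1.0], [1.0, 0.2], [0.1, 0.9]])
y = np.array([1.0, 0.4, 0.8])

rng = np.random.default_rng(0)
w = rng.normal(size=2)   # step 1: random initial weights
b = 0.0
lr = 0.1                 # learning rate (an assumed hyperparameter)

for epoch in range(1000):
    y_hat = X @ w + b            # step 2: current network output
    error = y_hat - y            # step 3: difference from expected output
    loss = np.mean(error ** 2)   # mean squared error loss
    # step 4: adjust parameters opposite to the gradient direction
    grad_w = 2 * X.T @ error / len(y)
    grad_b = 2 * error.mean()
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b, loss)
```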

Stated this way it is very abstract, so let's use the perceptron weight-loss example. Suppose the two major factors affecting weight loss are "exercise" and "diet", but I have no clear idea of their weights in my weight-loss journey. If my target weight is 150 pounds and the scale actually reads 180 pounds, then based on this 30-pound gap I go back and adjust the weights of "exercise" and "diet" in my weight-loss plan (should I exercise more, or eat lower-calorie food?).
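Here is a hedged sketch of that story in code, using the classic delta rule as a stand-in for the perceptron update. Every number (this week's efforts, the initial weights, the learning rate) is made up for illustration:

```python
# Hypothetical perceptron for the weight-loss story: two input factors,
# "exercise" and "diet", whose true importance I do not know in advance.
exercise, diet = 5.0, 3.0        # this week's effort on each factor (made-up units)
w = [0.5, 0.5]                   # initial guesses for their importance
target_loss = 30.0               # pounds I hoped to lose
predicted_loss = w[0] * exercise + w[1] * diet

# Delta rule: nudge each weight in proportion to its input and the error
error = target_loss - predicted_loss
lr = 0.01
w[0] += lr * error * exercise
w[1] += lr * error * diet
print(w)  # revised beliefs about how much exercise vs. diet matter
```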

Speaking of back-propagation, nothing is more famous than the BP algorithm. It was proposed in 1986 by Geoffrey Hinton, David Rumelhart, and others in the paper "Learning Representations by Back-propagating Errors", published in the leading journal Nature. The paper was the first to systematically and concisely expound the application of the back-propagation algorithm to neural network models.

The BP algorithm is a true classic and has had classic applications in many fields. In its heyday it was no less popular than deep learning is today. But later, people found that in practical applications the BP algorithm still has problems. For example, in a network with many layers, by the time the residual (error) has been back-propagated to the front-most layers (i.e., those near the input layer), its influence has become very small; the gradients may even vanish ("gradient diffusion"), so the parameter adjustment loses its sense of direction. As a result, BP neural networks end up with a very limited number of layers, usually no more than 7.
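A quick way to see the vanishing-gradient effect numerically: the sigmoid's derivative never exceeds 0.25, and the chain rule multiplies roughly one such factor per layer. The sketch below is my own illustration (weights assumed to be about 1 for simplicity) of how fast the back-propagated factor decays:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The sigmoid's derivative peaks at 0.25, so each extra layer multiplies
# the back-propagated error signal by another small factor.
z = 0.5  # an arbitrary pre-activation value
local_grad = sigmoid(z) * (1 - sigmoid(z))   # about 0.235

for n_layers in [3, 7, 20]:
    print(n_layers, "layers -> gradient factor ~=", local_grad ** n_layers)
```

Running this shows the factor dropping from about 0.01 at 3 layers to about 4e-5 at 7 layers, which is one intuition behind the practical 7-layer ceiling mentioned above.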

In fact, this is easy to understand. Information theory has a statement about layer-by-layer information loss: as information is processed layer after layer, the amount of information keeps shrinking. For example, if B is obtained by processing A, then the amount of information carried by B can be no more than that of A. Exploring this statement at a deeper level leads to the concept of entropy. A book on this topic that affected my view of the world is "Entropy: A New World View".
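This claim has a precise form in information theory, which I am stating here for reference (it is standard theory, not from the book). For a deterministic processing step $B = f(A)$, entropy can only drop:

$$H(B) = H(f(A)) \le H(A)$$

and more generally, for any processing chain $A \to B \to C$ in which each stage is computed only from the previous one, the data processing inequality holds:

$$I(A; C) \le I(A; B)$$

In words: downstream stages can never carry more information about the source than upstream ones, exactly the "information only shrinks" statement above.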

From the second law of thermodynamics we know that although energy can be converted from one form to another, it can never be utilized 100%: in the conversion process, some of the energy is inevitably wasted. That wasted, unusable part of the energy is "entropy". Migrating the concept of "entropy" to information theory, it describes the "degree of disorder" of information.
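For reference, the quantity that makes this "degree of disorder" precise is Shannon's entropy, defined for a discrete random variable $X$ with distribution $p$ as:

$$H(X) = -\sum_{x} p(x) \log_2 p(x)$$

It is largest when $p$ is uniform, i.e., when the outcome is most disordered, and zero when the outcome is certain.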

When "order (i.e., information)" in one form is converted into "order" in another form, it is inevitably accompanied by some degree of "disorder (i.e., entropy)". By this theory, when a neural network has a large number of layers (say, more than 7), the "error information" that the algorithm back-propagates is slowly "drained away", gradually turning entirely into disordered "entropy", and naturally it can no longer guide the adjustment of the neural network's parameters.

Still later, the second category of parameter-adjustment methods arose, and it is now the mainstream approach: the "layer-by-layer initialization" training mechanism commonly used in deep learning. Unlike the back-propagation algorithm's "back-to-front" parameter training, deep learning takes a "front-to-back", layer-by-layer training approach (later chapters will explain this in detail, so I will not expand on it here).
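As a rough taste of the "front-to-back" idea before those chapters, here is a minimal greedy layer-wise pretraining sketch built from tied-weight autoencoders. This is only my own illustration under simplifying assumptions; Hinton's 2006 pretraining used restricted Boltzmann machines, and the book's own treatment may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, hidden, lr=0.1, epochs=200):
    """Train one layer to reconstruct its own input (a minimal tied-weight
    autoencoder); returns the encoder weights for that layer."""
    n = X.shape[1]
    W = rng.normal(scale=0.1, size=(n, hidden))
    for _ in range(epochs):
        H = sigmoid(X @ W)           # encode
        X_hat = H @ W.T              # decode with tied weights
        err = X_hat - X              # reconstruction error
        dH = err @ W                 # back through the decoder
        dZ = dH * H * (1 - H)        # back through the sigmoid
        grad = X.T @ dZ + err.T @ H  # encoder part + decoder part
        W -= lr * grad / len(X)
    return W

# Greedy layer-wise ("front to back") pretraining: train layer 1 on the raw
# input, then feed its codes forward and train layer 2 on them, and so on.
X = rng.random((100, 8))
W1 = train_autoencoder(X, hidden=4)
H1 = sigmoid(X @ W1)
W2 = train_autoencoder(H1, hidden=2)
# W1 and W2 would then initialize a deep network before global fine-tuning.
```

The key design point is that each layer gets a local, well-posed objective (reconstruct its input), so no error signal has to survive a long back-propagation chain before the final fine-tuning stage.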

Source: blog.csdn.net/YPP0229/article/details/94546802