Neural network algorithms and error backpropagation

1 Artificial Neural Networks

1.1 Neurons

    A neural network consists of a large number of neurons connected to one another. Each neuron first forms a linear combination of its inputs, i.e. a simple weighted sum; a nonlinear activation function is then applied to that sum, so the neuron produces a nonlinearly transformed output. The value attached to the connection between any two neurons is called a weight. Different weights and activation functions lead to different network outputs.

    Take handwritten digit recognition as an example: given an unknown digit, the network must identify which digit it is. The input neurons of the network are activated by the pixels of the input image. After being transformed by a nonlinear activation function, these activations are passed on to other neurons. The process repeats until the output neurons are activated, and their values determine which digit the image represents.

    Each neuron in the network has the structure shown below.

    Its basic form is a = g(z) = g(w·x + b), where:

  • x represents the input vector;

  • w represents the weights; there are as many weights as inputs, i.e. each input is assigned its own weight;

  • b is the bias term;

  • g(z) is the activation function;

  • a is the output.

    If this is the first time you have seen this model, you may well be confused by the paragraph above. In fact, this simple model can be traced back to the perceptron of the 1950s/60s: the perceptron can be understood as a model that makes a decision based on several factors, each weighted by how important it is.

    For example, suppose the Strawberry Music Festival is held in Beijing this weekend: should you go? Two factors influence your decision, and they correspond to the two inputs, denoted x1 and x2. These factors do not influence the decision equally; their degrees of influence are represented by the weights w1 and w2. In general, which guest performers are on the bill matters far more than whether someone goes with you: if the singing is good you can put up with going alone, but if the singing is poor you might as well get on stage and sing yourself. So we can express the factors as follows:

  • x1: whether a singer you like is performing. x1 = 1 if you like the guest performers, x1 = 0 if you do not. The corresponding weight is w1 = 7.

  • x2: whether someone will go with you. x2 = 1 if someone accompanies you, x2 = 0 if no one does. The corresponding weight is w2 = 3.

    In this way, our decision model is built up: g(z) = g(w1·x1 + w2·x2 + b), where g is the activation function and b is a bias term, which can be understood as an adjustment that makes the target easier or harder to reach.
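    To make this concrete, here is a minimal sketch of that decision model in Python. The weights 7 and 3 come from the example above; the bias value -5 and the zero threshold are illustrative assumptions, not values from the original post.

```python
# Perceptron-style decision: go to the festival (1) or stay home (0)?
def decide(x1, x2, w1=7.0, w2=3.0, b=-5.0):
    """x1: a singer you like is performing (0/1); x2: someone goes with you (0/1).
    b = -5 is an assumed bias; the 'activation' here is a simple step function."""
    z = w1 * x1 + w2 * x2 + b
    return 1 if z > 0 else 0

print(decide(1, 0))  # favorite singer, nobody to go with -> 1 (7 - 5 > 0: still go)
print(decide(0, 1))  # company only                       -> 0 (3 - 5 < 0: stay home)
```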

    To keep things simple at first, the activation function was defined as a linear function, i.e. the result is only linearly transformed; the simplest linear activation is g(z) = z, whose output equals its input. In practical applications, however, linear activation functions turned out to be too limited, which is why nonlinear activation functions were introduced.

1.2 Activation Functions

    Commonly used nonlinear activation functions include sigmoid, tanh, ReLU and the like. The first two, sigmoid and tanh, are more common in fully connected layers, while ReLU is common in convolutional layers. This section introduces the most basic one, the sigmoid function (btw, sigmoid was mentioned at the beginning of the SVM article on this blog).

    The sigmoid function has the following expression:

    g(z) = 1 / (1 + e^(-z))

    where z is a linear combination, for example z = b + w1·x1 + w2·x2. Substituting a very large positive or very large negative value of z into g(z) shows that the result tends toward 1 or 0.

    The sigmoid function g(z) can therefore be plotted as follows (the horizontal axis is the domain z, the vertical axis the range g(z)):

    In other words, the sigmoid function compresses a real number into the interval between 0 and 1. When z is a very large positive number, g(z) approaches 1; when z is a very large negative number, g(z) approaches 0.

    What is the use of compressing values into the range 0 to 1? It is useful because the activation can then be viewed as a "class probability": for example, if the activation output is 0.9, it can be interpreted as a 90% probability that the sample is positive.
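    A quick numerical check of this squashing behaviour (a small Python sketch; the sample z values are arbitrary):

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)): squashes any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-30, -2, 0, 2, 30):
    print(z, round(sigmoid(z), 6))
# z = -30 gives ~0.0, z = 0 gives 0.5, z = 30 gives ~1.0:
# large negative z -> output near 0, large positive z -> output near 1.
```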

    For example, consider the figure below (quoted from Stanford's public machine learning course):

Logical AND

 

 

    z = b + w1·x1 + w2·x2, where the bias term b is assumed to be -30, and w1 and w2 are both taken to be 20.

  • If x1 = 0 and x2 = 0, then z = -30 and g(z) = 1/(1 + e^(-z)) approaches 0. This can also be seen from the sigmoid plot above: when z = -30, the value of g(z) is close to 0.

  • If x1 = 0, x2 = 1 or x1 = 1, x2 = 0, then z = b + w1·x1 + w2·x2 = -30 + 20 = -10, and likewise the value of g(z) is close to 0.

  • If x1 = 1 and x2 = 1, then z = b + w1·x1 + w2·x2 = -30 + 20·1 + 20·1 = 10, and this time g(z) approaches 1.

    In other words, only when x1 and x2 are both 1 does g(z) → 1 and the sample is judged positive; whenever either is 0, g(z) → 0 and the sample is judged negative. The neuron therefore implements the logical AND and achieves the goal of classification.
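    A direct check of this AND behaviour, using the values from the example (b = -30, w1 = w2 = 20); a small sketch, not code from the original post:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

b, w1, w2 = -30.0, 20.0, 20.0
for x1 in (0, 1):
    for x2 in (0, 1):
        g = sigmoid(b + w1 * x1 + w2 * x2)
        print(x1, x2, round(g, 4))
# Only (1, 1) produces a value near 1; the neuron behaves as a logical AND.
```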

1.3 Neural Networks

    A single neuron is shown in the figure below.

    Grouping many such neurons together forms a neural network. The figure below shows a three-layer neural network structure:

    In the figure above, the leftmost layer, which receives the original input information, is called the input layer; the rightmost layer is called the output layer (the figure above has only one output-layer neuron); and the layers in between are called hidden layers.

    So what exactly are the input layer, the output layer, and the hidden layer?

  • Input layer: the many neurons that accept a large amount of nonlinear input information. The input information is called the input vector.

  • Output layer: information is transmitted, analyzed and weighed along the neuron connections to form the output result. The output information is called the output vector.

  • Hidden layer: the layers made up of the many neurons and connections between the input layer and the output layer. If there are multiple hidden layers, that implies multiple activation functions.

    At the same time, each layer may consist of one or more neurons, and the output of each layer serves as the input to the next layer. Take the middle hidden layer in the figure below: its three neurons a1, a2, a3 each receive the inputs under their own separate weights (since there are three inputs x1, x2, x3, each of a1, a2, a3 receives x1, x2, x3 with its own weight for each; as before, as many inputs as there are, that many weights there are). Then a1, a2, a3, again each under their own different weights, become the input to the output layer, which finally produces the final result.

    In the figure above (quoted from Stanford's public machine learning course):

  • a_i^(j) denotes the activation function/neuron of unit i in layer j;
  • Θ^(j) denotes the weight matrix controlling the function mapping from layer j to layer j+1.

    In addition, both the input layer and the hidden layer contain a bias unit, so the bias terms x0 and a0 have also been added to the figure above. For that figure, in the notation above, the forward equations are:

    a1^(2) = g(Θ10^(1)·x0 + Θ11^(1)·x1 + Θ12^(1)·x2 + Θ13^(1)·x3)
    a2^(2) = g(Θ20^(1)·x0 + Θ21^(1)·x1 + Θ22^(1)·x2 + Θ23^(1)·x3)
    a3^(2) = g(Θ30^(1)·x0 + Θ31^(1)·x1 + Θ32^(1)·x2 + Θ33^(1)·x3)
    hΘ(x) = a1^(3) = g(Θ10^(2)·a0^(2) + Θ11^(2)·a1^(2) + Θ12^(2)·a2^(2) + Θ13^(2)·a3^(2))

    Moreover, the discussion so far has used a single hidden layer, but in practice there can also be multiple hidden layers, i.e. several hidden layers sandwiched between the input layer and the output layer. Adjacent layers are fully connected, and there are no connections between neurons within the same layer.
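    As an illustration of one forward pass through such a 3-3-1 network in numpy (a sketch only: the weight matrices Theta1 and Theta2 are random here, and the bias units x0 = a0 = 1 are prepended explicitly):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(3, 4))  # layer 1 -> layer 2: 3 hidden units, 3 inputs + bias x0
Theta2 = rng.normal(size=(1, 4))  # layer 2 -> layer 3: 1 output, 3 hidden units + bias a0

x = np.array([0.5, -1.2, 3.0])                 # input vector (x1, x2, x3)
a_in = np.concatenate(([1.0], x))              # prepend bias unit x0 = 1
a_hidden = sigmoid(Theta1 @ a_in)              # hidden activations a1, a2, a3
a_hidden = np.concatenate(([1.0], a_hidden))   # prepend bias unit a0 = 1
h = sigmoid(Theta2 @ a_hidden)                 # network output h_Theta(x)
print(h)
```

    Each layer's output (plus its bias unit) becomes the next layer's input, exactly as described above; stacking more hidden layers simply repeats the same multiply-then-activate step.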


2 Error Backpropagation Algorithm (BP)

From the above we can see that what a neural network learns is contained mainly in its weights and thresholds. For multi-layer networks, the simple perceptron weight-adjustment rule above is clearly no longer sufficient. The BP neural network algorithm, i.e. error backpropagation, was designed precisely for training multi-layer feedforward networks, and it is by far the most successful neural network learning algorithm.

Generally speaking, a single hidden layer containing enough neurons can already approximate a continuous function of arbitrary complexity to arbitrary precision [Hornik et al., 1989], so the following takes training a single-hidden-layer feedforward network as the example to introduce the ideas behind the BP algorithm.

(Figure: topology of a single-hidden-layer feedforward neural network)

For the single-hidden-layer feedforward network topology shown above, the BP algorithm uses gradient descent, adjusting the weights in the direction of the negative gradient of the mean squared error of a single sample. It can be seen that BP first propagates the error back toward the hidden-layer neurons, adjusting the connection weights from the hidden layer to the output layer and the thresholds of the output-layer neurons; it then uses the mean squared error attributed to the hidden-layer neurons to adjust the connection weights from the input layer to the hidden layer and the thresholds of the hidden-layer neurons. The basic derivation of BP follows the same principle as the perceptron learning rule; the derivation of the adjustment rule for the hidden-to-output-layer weights is as follows:
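As a sketch under the usual textbook assumptions (sigmoid output units and the squared error E = (1/2)·Σj (ŷj − yj)² on a single sample), applying the chain rule to the weight w_hj connecting hidden neuron h (with output b_h) to output neuron j gives the update

    Δw_hj = −η·∂E/∂w_hj = η·g_j·b_h,  with  g_j = ŷ_j·(1 − ŷ_j)·(y_j − ŷ_j),

where η is the learning rate. The output thresholds are updated by Δθ_j = −η·g_j, and the input-to-hidden weights and hidden thresholds follow analogously by propagating the g_j terms back through the hidden layer. Below is a compact numpy sketch of this update rule, training a 2-2-1 network on the logical AND data from Section 1.2; all hyperparameters are illustrative assumptions, not values from the original post.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logical AND training data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [0], [0], [1]], dtype=float)

rng = np.random.default_rng(0)
V = rng.normal(scale=0.5, size=(2, 2))   # input -> hidden connection weights
gamma = np.zeros(2)                      # hidden-layer thresholds
W = rng.normal(scale=0.5, size=(2, 1))   # hidden -> output connection weights
theta = np.zeros(1)                      # output-layer thresholds
eta = 0.5                                # learning rate

for epoch in range(5000):
    for x, y in zip(X, Y):
        b = sigmoid(x @ V - gamma)               # hidden outputs b_h
        y_hat = sigmoid(b @ W - theta)           # network output
        g = y_hat * (1 - y_hat) * (y - y_hat)    # output-layer gradient term g_j
        e = b * (1 - b) * (W @ g)                # hidden-layer gradient term e_h
        W += eta * np.outer(b, g)                # delta w_hj = eta * g_j * b_h
        theta -= eta * g                         # delta theta_j = -eta * g_j
        V += eta * np.outer(x, e)                # delta v_ih = eta * e_h * x_i
        gamma -= eta * e                         # delta gamma_h = -eta * e_h

for x in X:
    print(x, sigmoid(sigmoid(x @ V - gamma) @ W - theta))
```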

 

References: http://blog.csdn.net/zouxy09/article/details/8781543

http://m.blog.csdn.net/v_JULY_v/article/details/51812459
