The Principles of Artificial Neural Networks

Brain Neurons

The human brain contains on the order of 100 billion neurons, and each neuron can connect to many others in multiple directions. This enormous number of neurons and connections forms a gigantic network, and our thought and consciousness in all their variety arise from these networks.

(Figure: neurons in the human brain)

A brain neuron is a nerve cell; it consists of a cell body, dendrites, an axon, synapses, and so on.

  • Cell body: made up of the nucleus, cytoplasm, and cell membrane. It is the metabolic center of the neuron and the unit that receives and processes information.
  • Dendrites: fibrous extensions that branch outward from the cell body like a tree. They are the neuron's input channels, receiving information from other neurons.
  • Axon: the longest and thickest fibrous extension of the cell body, i.e. the nerve fiber; it is the neuron's output channel. Axons come in two structural forms, myelinated and unmyelinated fibers, which transmit information at different speeds. The many fine branches extending from the end of the axon are called nerve endings, and they are where the neuron's information is output.
  • Synapse: the point of contact between one neuron's nerve endings and another neuron's dendrites or cell body. Through synapses, neurons establish connections with each other and thereby transmit information; each neuron has roughly 10^3 to 10^4 synapses.

(Figure: structure of a biological neuron)

Simulation of the Brain

Neural networks are the computing field's attempt to simulate the inner workings of the brain. The idea goes back quite far, to the mid-1940s, when the computer itself had only just appeared.

In 1943, McCulloch and Pitts published the paper "A Logical Calculus of the Ideas Immanent in Nervous Activity," which was the first to propose a mathematical way of representing the learning function of the human brain.

(Figure: the McCulloch-Pitts neuron model)

In the figure above, the inputs x play the role of other neurons' outputs: they arrive along axons, pass through synaptic connections into the dendrites, and are fed into the cell body. The cell body performs some operation, applies an activation function, and finally sends the result out along the axon.
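As a toy illustration, such a neuron can be sketched in a few lines of Python (the weights and threshold below are my own illustrative choices, not from the original): inputs are weighted, summed in the "cell body", and compared against a threshold.

```python
import numpy as np

# A toy neuron in the spirit of the McCulloch-Pitts model: weighted inputs
# are summed, and the neuron fires (outputs 1) only when the sum reaches a
# threshold. Weights and threshold here are illustrative choices.

def mp_neuron(x, w, threshold):
    return 1 if np.dot(x, w) >= threshold else 0

# With equal weights and threshold 2, this neuron computes logical AND.
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", mp_neuron(x, [1, 1], threshold=2))  # fires only for [1, 1]
```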

This simple model is the basic model behind neural networks in machine learning. But don't exclaim "so our brain is that simple!", because that is not the case: our understanding of the human brain is still tiny, and it is fair to say there has been almost no substantial progress. Using this model to simulate the brain actually changes far too much about how the brain really works. If humans were created by God, God certainly wouldn't let you work out how he made you.

Although brain neurons had now been modeled mathematically (and we did not even know whether the model was correct), there was still no clear method for adjusting the weight parameters.

Perceptron Model

In the 1950s, one of the simplest artificial neuron models was proposed: the perceptron. "Perceptron"? It sounds like a real, physical thing, like a computer, something visible and tangible! Indeed, the first hardware implementation appeared in the 1960s, and at first the whole machine was called a perceptron; later the name came to refer to the perceptron algorithm, so it is in fact an algorithm.

(Figure: the perceptron model)

Taking its predecessors' ideas as a foundation, the perceptron adds a feedback-loop learning mechanism: the weights are readjusted by computing the error between the output for a sample and the correct result.

The process is roughly as follows:

  • Initialize the weight parameters with random numbers.
  • Feed an input vector into the network.
  • Compute the network's output y' from the input vector and the current weights; the perceptron function is as follows:

$$y' = \operatorname{sign}\Big(\sum_i w_i x_i\Big) = \begin{cases} +1, & \sum_i w_i x_i > 0 \\ -1, & \text{otherwise} \end{cases}$$

  • If y' ≠ y, adjust every connection weight $w_i$ by the increment $\Delta w_i = y\,x_i$.
  • Return to step 2.
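A minimal Python sketch of these steps follows (labels are assumed to be in {-1, +1}; the learning rate eta and the AND-function example are my illustrative choices, with eta = 1 matching the $\Delta w_i = y\,x_i$ rule above):

```python
import numpy as np

def train_perceptron(X, y, epochs=10, eta=1.0):
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])              # step 1: random weights
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):                 # step 2: feed one input vector
            y_pred = 1 if xi @ w + b > 0 else -1 # step 3: perceptron function
            if y_pred != yi:                     # step 4: update on a mistake
                w += eta * yi * xi
                b += eta * yi
    return w, b

# Usage: learn the (linearly separable) AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print([(1 if xi @ w + b > 0 else -1) for xi in X])  # expected: [-1, -1, -1, 1]
```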

Introducing Gradient Descent

ADALINE is another algorithm for training a neuron model, with a learning mechanism unlike the perceptron's: because it introduces gradient descent, it can be considered more advanced than the perceptron.

(Figure: the ADALINE model)

The process is roughly as follows:

  • Initialize the weights with random numbers.
  • Feed an input vector into the network.
  • Compute the neural network's output y' from the input vector and the current weights.
  • The output value is the weighted sum, given by the formula

$$y' = \sum_i w_i x_i$$

  • Compute the error by comparing the model's output value with the correct label o:

$$E = \frac{1}{2}\,(o - y')^2$$

  • Adjust the weights iteratively using the following gradient-descent rule, where $\eta$ is the learning rate:

$$w_i \leftarrow w_i + \eta\,(o - y')\,x_i$$

  • Return to step 2.
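A minimal Python sketch of these steps (the learning rate, initialization scale, and usage example are illustrative assumptions):

```python
import numpy as np

# ADALINE: linear output y' = sum_i w_i x_i, squared error E = 1/2 (o - y')^2,
# and the gradient-descent update w_i <- w_i + eta * (o - y') * x_i.

def train_adaline(X, o, epochs=50, eta=0.05):
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])  # random weight initialization
    b = 0.0
    for _ in range(epochs):
        for xi, oi in zip(X, o):
            y_pred = xi @ w + b      # linear summation (no step function)
            err = oi - y_pred        # error against the correct label o
            w += eta * err * xi      # gradient-descent weight update
            b += eta * err
    return w, b

# Usage: fit the noiseless linear target o = 2*x1 - x2.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
o = np.array([0., -1., 2., 1.])
w, b = train_adaline(X, o)
print(np.round(w, 2), round(b, 2))  # should approach [2, -1] and 0
```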

The Limitations of the Early Models

As we can see, the perceptron and ADALINE already have the basic elements of a neural network model. Both are single-layer networks, used mainly for binary classification: they learn to realize a binary function.

These early neuron models actually had a very big limitation; in a sense, they were almost useless. In 1969 Minsky and Papert published the book "Perceptrons," which showed that the perceptron can only handle linearly separable problems and is completely powerless against other, more complex ones.

(Figure: the XOR problem is not linearly separable)

Take the XOR function, for example: no straight line divides its two classes correctly. This is the perceptron's awkward situation: when the data is not linearly separable, the perceptron cannot separate the categories properly. At this point, neural networks entered their winter.
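To see this concretely, here is a small check in Python (the grid of candidate lines is an illustrative choice; since XOR is not linearly separable, no choice of line can work at all):

```python
import itertools
import numpy as np

# XOR truth table: no single line w1*x1 + w2*x2 + b = 0 separates the classes.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Brute-force a coarse grid of weights and biases: if XOR were linearly
# separable, some (w1, w2, b) here would classify all four points correctly.
found = False
for w1, w2, b in itertools.product(np.linspace(-2, 2, 41), repeat=3):
    pred = (X @ np.array([w1, w2]) + b > 0).astype(int)
    if (pred == y).all():
        found = True
        break
print("separating line found:", found)  # prints False
```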

Multilayer Perceptron

Since a single perceptron neuron cannot solve nonlinear problems, can we not generalize to a network with more neurons and more layers? So groups of neurons are connected together, and the output of one neuron can become the input of other neurons.

In a multilayer network, data enters at the first layer, and each neuron's output flows into the corresponding neurons of the next layer, where it is summed, transformed, and passed on through the hidden layers until the output layer produces the final result. Learning in a multilayer network requires the support of the backpropagation algorithm. The multiple layers increase the complexity of learning: from input to output, the network forms a very long nested function, which makes learning harder. Fortunately, the chain rule makes things much simpler when computing derivatives.
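As a one-line illustration of why the chain rule helps (the notation is mine, not from the original): a three-layer network is a nested function, and its derivative factors into per-layer derivatives that can be computed layer by layer:

$$y = f_3\big(f_2(f_1(x))\big) \qquad \Rightarrow \qquad \frac{dy}{dx} = f_3'\big(f_2(f_1(x))\big)\cdot f_2'\big(f_1(x)\big)\cdot f_1'(x)$$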

(Figure: a multilayer perceptron)

The process is roughly as follows:

  • Feed the input signal forward through the network to compute the output.
  • Compute the output error E between the predicted value and the target value.
  • Back-propagate the error signal, weighting it by the next layer's weights and the gradients of the associated activation functions.
  • Compute the parameter gradients from the back-propagated error signal $\delta$ and the feedforward input signal $x$:

$$\frac{\partial E}{\partial w_{ij}} = \delta_j \, x_i$$

  • Update the parameters with the computed gradients; the formula is

$$w_{ij} \leftarrow w_{ij} - \eta\,\frac{\partial E}{\partial w_{ij}}$$

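Putting the steps together, here is a compact Python sketch (one sigmoid hidden layer, squared error, and all sizes and rates are illustrative assumptions), trained on XOR, the very problem a single-layer perceptron cannot solve:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)          # XOR targets

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)           # input -> hidden
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)           # hidden -> output
eta = 0.5

for _ in range(5000):
    # 1) feedforward: compute the output from the input signal
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # 2) output error between prediction and target (squared error;
    #    its gradient with respect to y is (y - t))
    delta2 = (y - t) * y * (1 - y)           # error signal at the output layer
    # 3) back-propagate the error through the next layer's weights
    delta1 = (delta2 @ W2.T) * h * (1 - h)   # error signal at the hidden layer
    # 4) parameter gradients = error signal x feedforward input, then update
    W2 -= eta * h.T @ delta2;  b2 -= eta * delta2.sum(axis=0)
    W1 -= eta * X.T @ delta1;  b1 -= eta * delta1.sum(axis=0)

print(np.round(y.ravel(), 2))  # should approach [0, 1, 1, 0]
```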

Problem Types

Neural networks can be used for both regression and classification problems. The usual structural difference lies in the output layer: if we want a real-valued result, we should not use a squashing function such as the sigmoid there, because a squashing function limits the output to a certain range, whereas sometimes what we really want is a continuous numerical result. The common pairings are listed below, with a small sketch afterwards.

  • Regression / function approximation: these can use a least-squares error function, a linear activation at the output layer, and S-shaped (sigmoid) activations in the hidden layers.
  • Binary classification: the cross-entropy cost function is generally used, with S-shaped activations in both the hidden and output layers.
  • Multi-class classification: the cross-entropy cost function is generally used, with the softmax function at the output layer and sigmoid activations in the hidden layers.
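A small Python sketch of these pairings, using the standard definitions (the function names are mine):

```python
import numpy as np

def linear(z):                 # regression: identity output
    return z

def sigmoid(z):                # binary classification: S-shaped output
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):                # multi-class: probabilities over classes
    e = np.exp(z - z.max())    # shift for numerical stability
    return e / e.sum()

def squared_error(y, t):       # least-squares cost for regression
    return 0.5 * np.sum((y - t) ** 2)

def cross_entropy(p, t):       # cross-entropy cost for classification
    return -np.sum(t * np.log(p + 1e-12))
```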

Deep Neural Networks

In the second decade of the 21st century, deep learning became the most dazzling area of artificial-intelligence research. In 2011, the Google X lab extracted 10 million images from YouTube and fed them to the "Google Brain," which uses deep learning; after three days, and without human help, the brain had discovered the concept of a cat by itself. In 2012, Microsoft used deep learning to demonstrate real-time speech recognition and translation of a speaker, i.e. simultaneous interpretation.

Although deep learning had already appeared in the 1980s, it was limited by hardware capability and the lack of data resources, and its effectiveness did not show. Only Hinton and his students kept working in this unpopular field, and in 2009 they achieved an unexpected success: applying deep learning to speech recognition, they broke the world record, cutting the error rate by 25% relative to the previous best. Deep learning began to take off.

The reason deep learning performs so well is that its deep neural networks resemble the human brain, and so simulate the brain's workings better.

Convolutional Neural Networks

Convolutional neural networks were developed mainly to solve problems of machine vision, although nowadays they are used in other directions as well. The main line of development runs LeNet-5 -> AlexNet -> VGG -> GoogLeNet -> ResNet, and so on.

The convolutional layer was invented back in the 1980s, but hardware limitations made it impossible to build complex networks; only later, in the 1990s, did it begin to see practical use.

In 1998, LeCun proposed combining convolutional layers, pooling layers, and fully connected layers to solve the problem of handwritten-digit recognition. The results were already very good, comparable to other classic machine-learning models. The architecture is as follows: a 32 x 32 input, features extracted by convolution, then downsampling, then convolution and downsampling again, followed by full connections and a Gaussian connections layer. This is LeNet-5.
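As a rough sketch, the same layout can be written in a few lines of modern Keras (this is my approximation, assuming TensorFlow is installed; the original Gaussian-connections output layer is replaced here with a standard softmax layer):

```python
import tensorflow as tf
from tensorflow.keras import layers

# A rough modern sketch of the LeNet-5 layout: 32x32 input, two rounds of
# convolution + downsampling, then fully connected layers over 10 digit classes.
lenet5 = tf.keras.Sequential([
    layers.Conv2D(6, kernel_size=5, activation='tanh', input_shape=(32, 32, 1)),
    layers.AveragePooling2D(pool_size=2),      # downsampling
    layers.Conv2D(16, kernel_size=5, activation='tanh'),
    layers.AveragePooling2D(pool_size=2),      # downsampling again
    layers.Flatten(),
    layers.Dense(120, activation='tanh'),      # fully connected
    layers.Dense(84, activation='tanh'),
    layers.Dense(10, activation='softmax'),    # in place of the Gaussian layer
])
lenet5.summary()
```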

Later, the available structured data and processing power grew exponentially, allowing models to be improved further. A particular milestone was the appearance of the open ImageNet dataset, with millions of labeled, classified images.

At the 2012 LSVRC challenge, Hinton and his student Alex Krizhevsky presented AlexNet, a deep convolutional network structurally similar to LeNet-5 but with more convolutional layers, tens of millions of parameters in total, and up to several hundred feature maps per convolutional layer. This is AlexNet.

A formidable competitor at the 2014 LSVRC challenge was the model proposed by VGG, the Visual Geometry Group at Oxford University. Compared with AlexNet, its main change was to shrink the convolution kernels, using 3x3 everywhere. The overall structure stays the same, though the convolution configurations may differ. It uses ReLU as the activation function, max pooling for pooling, and a final softmax output of probabilities.

In 2014, the GoogLeNet model won the LSVRC challenge. This was the debut, and the success, of large companies in the series; from then on the competition was won by large companies with huge budgets. GoogLeNet is composed mainly of nine Inception modules in combination. Its parameter count dropped by more than a factor of ten compared with AlexNet, while its accuracy was higher: the error rate fell from 16.4% to 6.7%.

In 2015, with the publication of the article "Rethinking the Inception Architecture for Computer Vision," Google researchers released a new version of the Inception architecture. It mainly addresses the problem of covariate shift by applying normalization both to the original inputs and to the outputs of each layer. In addition, the sizes of the convolution kernels vary, convolutions are factorized, and the overall depth of the network is increased.
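The normalization in question is batch normalization. In its standard form (the symbols below are the conventional ones, not from the original article), each activation $x$ is standardized with the mini-batch mean and variance, then rescaled by learned parameters:

$$\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y = \gamma\,\hat{x} + \beta$$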

In 2015, ResNet was proposed by Dr. Kaiming He of Microsoft Research, now a research scientist at Facebook AI. ResNet set a brilliant record, taking first place in five tracks at the time.

Recurrent Neural Networks

A recurrent neural network is, as the name says, a network with recurrence; it was proposed mainly to handle sequence data. What is sequence data? Data in which later inputs are related to earlier inputs, like the words of a sentence: given "I'm hungry, and next I'm going to ___", we can judge from the earlier input that "___" is very likely "eat". That is sequence data.

Recurrent neural networks have many variants, such as LSTM, GRU, and so on.

A conventional neural network runs from the input layer through the hidden layers to the output layer. The layers are fully connected to each other, while the nodes within a layer are not connected. This kind of network model is basically powerless for predicting sequence data.

Recurrent neural networks are good at processing sequence data: they remember earlier information and let it participate in computing the current output. In theory, a recurrent neural network can process sequences of arbitrary length.
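This "memory" can be written down concisely. In the standard vanilla-RNN formulation (the weight names are the usual conventions, not from the original), the hidden state $h_t$ carries information from all earlier inputs into the current output:

$$h_t = \tanh(W_{xh}\,x_t + W_{hh}\,h_{t-1} + b_h), \qquad y_t = W_{hy}\,h_t + b_y$$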

(Figure: a recurrent neural network and its unrolling through time)

For example, an RNN can do character-level prediction. As shown below, suppose there are only four characters and the training sample is the word "hello": input h, and the predicted next character should be e; input e, output l; input l, output l; and for the last input l, output o.

(Figure: character-level prediction on "hello")
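A minimal Python sketch of this data flow (the hidden size, the random untrained weights, and all names are illustrative assumptions; an untrained network's predictions are meaningless, the point is only the mechanics):

```python
import numpy as np

# A vanilla RNN cell stepping through "hell"; at each step it would predict
# the next character. Weights are random (untrained) for illustration.
chars = ['h', 'e', 'l', 'o']                    # the four characters
idx = {c: i for i, c in enumerate(chars)}

rng = np.random.default_rng(0)
H = 8                                           # hidden size (assumption)
Wxh = rng.normal(scale=0.1, size=(4, H))        # input -> hidden
Whh = rng.normal(scale=0.1, size=(H, H))        # hidden -> hidden (memory)
Why = rng.normal(scale=0.1, size=(H, 4))        # hidden -> output scores

h = np.zeros(H)
for c in "hell":                                # inputs; targets are "ello"
    x = np.eye(4)[idx[c]]                       # one-hot encode the character
    h = np.tanh(x @ Wxh + h @ Whh)              # recurrent state update
    scores = h @ Why                            # scores over the 4 characters
    print(c, "->", chars[int(np.argmax(scores))])
```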
