Andrew Ng Machine Learning (8): The Neural Network Model

I. Non-linear Hypotheses (Neural Networks)

The following examples illustrate that the problems neural networks are meant to solve typically depend on learning complex non-linear classifiers.

Consider a supervised classification problem for which we have a training set. If we use the logistic regression algorithm to solve it, we first need to construct a hypothesis containing many non-linear feature terms.

In fact, when the polynomial contains enough terms, you may well find a decision boundary that separates the positive and negative samples. With only two features such as x1 and x2, this method can really work, because you can include every combination of x1 and x2 in the polynomial. But many serious machine learning problems involve far more than two features.

Housing example: suppose we want to predict the probability that a house will be sold within the next six months, which is a classification problem. A house can have a hundred or more possible features. For such a problem, if we include all the quadratic terms, then with n = 100 we end up with about 5,000 of them, and as the number of features n grows, the number of quadratic terms grows on the order of n^2. Including every quadratic term is therefore very expensive, so this is probably not a good idea.

Moreover, with so many terms the final model is likely to overfit, and there is also the problem of excessive computation. You could instead include only a subset of the quadratic terms, but because that ignores many relevant terms, it cannot produce a good fit on data like the set shown in the upper-left corner. And while 5,000 quadratic terms already seems like a lot, if we also included the cubic (third-order) terms there would be about 170,000 of them, so this is not a good approach.
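As a sanity check on those counts: the number of distinct quadratic terms x_i·x_j (with i ≤ j, squares included) is n(n+1)/2, and the number of distinct cubic terms is C(n+2, 3). A quick sketch in Python (the helper names are my own, not from the lecture):

```python
from math import comb

def num_quadratic_terms(n):
    # distinct products x_i * x_j with i <= j (squares included)
    return n * (n + 1) // 2

def num_cubic_terms(n):
    # distinct products x_i * x_j * x_k with i <= j <= k
    return comb(n + 2, 3)

n = 100
print(num_quadratic_terms(n))  # 5050, i.e. roughly 5,000
print(num_cubic_terms(n))      # 171700, i.e. roughly 170,000
```

This confirms the rough n^2/2 growth of the quadratic terms and the much faster growth of the cubic ones.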

Another example comes from computer vision. Suppose you want to use a machine learning algorithm to train a classifier that examines an image and decides whether it is a picture of a car. Take a small patch of the picture, for example the part in the red box, and magnify it. Where the human eye sees part of a car, what the computer actually sees is a matrix of numbers representing pixel intensity values, which tell us the brightness of each pixel in the image. The computer vision problem is therefore: given this matrix of pixel brightness values, tell us that these numbers represent a car door handle.

Specifically, to build a car detector with a machine learning algorithm, we need a labeled training set. Some of the samples are pictures of all kinds of cars, and the rest are pictures of anything else. We feed this training set to the learning algorithm to train a classifier. Once training is complete, we give it a new picture and let the classifier decide "what is this thing?" Ideally, the classifier recognizes that it is a car:

To understand why a non-linear classifier is needed, pick out some car pictures and some non-car pictures from the training set, and from each picture read off the intensities at two chosen pixel locations, pixel 1 and pixel 2.

Plotting these samples in a coordinate system, using "+" for car images and "-" for non-car images, we find that we need a non-linear classifier to try to separate the two classes of samples.

How large is the feature space for this classification problem? Suppose the images are 50 by 50 pixels, i.e. 2,500 pixels in total. Then the feature vector x has n = 2,500 elements, containing the brightness values of all the pixels. If we use RGB color images, where each pixel has red, green, and blue components, the feature vector instead has n = 7,500 elements. And if we tried to solve the problem with a non-linear hypothesis including all the quadratic terms x_i·x_j over the 2,500 pixels, we would end up with about 3 million features. That computational cost is far too high, so this is not a good way to learn complex non-linear hypotheses.
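The 3 million figure follows from the same n(n+1)/2 count applied to n = 2,500 (the helper name here is mine, for illustration):

```python
def quadratic_terms(n):
    # distinct products x_i * x_j with i <= j
    return n * (n + 1) // 2

n = 50 * 50                  # 50x50 grayscale image -> 2500 pixel features
print(quadratic_terms(n))    # 3126250, i.e. about 3 million
```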

II. Neurons and the Brain

Neural networks originated from attempts to design algorithms that mimic the brain. In a sense, if we want to build a learning system, why not imitate the most amazing learning machine we know: the human brain?

Neural networks rose gradually during the 1980s and 1990s and were used extensively. For various reasons their use declined in the late 1990s, but recently they have made a comeback. One reason for the decline is that neural networks are computationally rather expensive; only in recent years, probably thanks to much faster computers, has it become practical to actually run large-scale neural networks. For this reason, and other technical factors we will discuss later, today's neural networks are the state of the art for many applications.

When we talk about simulating the brain, we mean wanting to build a machine that achieves the same effects as the human brain, right?

The brain can learn to see and interpret images, and learn to process our sense of touch.

We can learn mathematics, and learn to do calculus.

The brain can handle all sorts of amazing things.

It might seem that to imitate the brain you would have to write many different programs to simulate all the wonderful things it does, needing thousands of different programs. But suppose instead that the brain does not use all these different methods for different tasks; suppose it handles everything with just a single learning algorithm. This is only a hypothesis.

But let me share some of the evidence with you. The small red area of the brain shown here is your auditory cortex. The way you understand my words right now is that your ears receive the sound signal and pass it to your auditory cortex, and that is why you can understand what I am saying.

Neuroscientists have done the following interesting experiments:

① Cut the nerve from the ear to the auditory cortex, and rewire an animal's brain so that the signal from the eyes, via the optic nerve, eventually reaches the auditory cortex. If you do this, it turns out that the auditory cortex learns to "see", where "see" means everything we usually mean by the word. The animal can then perform visual discrimination tasks: it can look at images and make appropriate decisions based on them, all using that rewired piece of brain tissue.

② Another example: the red piece of brain tissue on the right is your somatosensory cortex, which processes your sense of touch. If you do a similar rewiring experiment, the somatosensory cortex also learns to "see". This experiment, and others like it, are called neural rewiring experiments.

In this sense, if the same piece of brain tissue can process sight, sound, or touch, then perhaps there is a single learning algorithm that can handle vision, hearing, and touch, rather than thousands of different programs or thousands of different algorithms for the thousands of beautiful things the brain does. Maybe all we need to do is find an approximation of the brain's actual learning algorithm and implement it.

The brain teaches itself how to handle these different types of data. To a large extent, it seems that if you could plug almost any sensor into almost any part of the brain, the brain would learn to deal with it.

First example: the picture in the upper-left corner shows learning to "see" with your tongue. It works like this: the system, called BrainPort, now in FDA (US Food and Drug Administration) clinical trials, can help blind people see. You mount a grayscale camera on your forehead, facing forward, which captures a low-resolution grayscale image of whatever is in front of you. A wire runs to an array of electrodes mounted on your tongue, and each pixel is mapped to a location on the tongue, perhaps with a high voltage for a dark pixel and a low voltage for a bright pixel. Even with today's technology, this system lets you learn to "see" with your tongue within tens of minutes.

Second example: human echolocation, or human sonar. There are two ways to do it: snapping your fingers or clicking your tongue. Some blind people are trained in schools to interpret the pattern of sound waves bouncing back from the environment, and that is sonar. If you search on YouTube, you will find amazing videos of a child whose eyes were removed because of cancer; despite losing his eyes, by snapping his fingers he can walk around without hitting anything, he can skateboard, and he can put a basketball through a hoop. Remember, this is a child with no eyes.

Third example: the haptic belt. If you wear it around your waist, a buzzer sounds whenever you are facing north. It gives people a sense of direction, similar to the way birds perceive direction.

There are stranger examples still: if you implant a third eye in a frog, the frog can learn to use that eye.

These examples are very surprising: if you can connect almost any sensor to the brain, the brain's learning algorithm will find a way to learn from the data and process it. In a sense, if we can figure out what the brain's learning algorithm is and then run it, or something close to it, on a computer, that may be our best shot at making progress toward artificial intelligence, whose dream is to one day build truly intelligent machines.

III. The Neural Network Model

When we use a neural network, how do we represent our hypothesis or model?

Neural networks were invented by imitating the neurons, or networks of neurons, in the brain. So to explain how the model is represented, let us first look at what a single neuron in the brain is.

Our brain is full of neurons, which are cells in the brain. Two parts of a neuron are worth noting: first, the neuron has a cell body; and second, it has a number of input wires.

These input wires are called dendrites; we can think of them as input terminals that receive information from other neurons. The output wire is called the axon; it is used to send signals, that is, transmit information, to other neurons.

Briefly, a neuron is a computational unit. It receives a certain amount of information through its input wires, does some computation, and then transmits the result through its axon to other nodes, or other neurons, in the brain.

This is the model of all human thought: our neurons compute on the messages they receive and pass messages on to other neurons.

Given inputs x1, x2, x3, the output is hθ(x) = 1 / (1 + e^(−θᵀx)). This is a very simple model that simulates the work of a neuron. We model the neuron as a logistic unit: the yellow circle plays the role of the neuron's cell body, and a sigmoid (logistic) function serves as the artificial neuron's activation function. In neural network terminology, "activation function" is just another name for the non-linear function g(z). θ contains the parameters of the model, sometimes called "weights". x0 is the bias unit; because x0 always equals 1, it is sometimes drawn and sometimes omitted, depending on which is more convenient for the example. x1, x2, x3 are like the neuron's inputs, and h(x) is like the neuron's output.
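This single logistic unit can be sketched in a few lines of Python. Only the form h(x) = g(θᵀx) comes from the lecture; the weight and input values below are made up for illustration:

```python
import math

def sigmoid(z):
    # the logistic (sigmoid) activation function g(z)
    return 1.0 / (1.0 + math.exp(-z))

def neuron(theta, x):
    # one artificial neuron: h(x) = g(theta^T x); x[0] is the bias unit, always 1
    z = sum(t_i * x_i for t_i, x_i in zip(theta, x))
    return sigmoid(z)

theta = [-1.0, 2.0, 0.5, 0.0]   # hypothetical weights theta_0..theta_3
x = [1.0, 0.5, 1.0, 2.0]        # x0 = 1, then x1, x2, x3
print(neuron(theta, x))         # a value strictly between 0 and 1
```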

A neural network is really just a collection of these neurons wired together. Specifically, here the inputs are x0, x1, x2, and x3; the neurons a(2)1, a(2)2, and a(2)3 form the next layer (a(2)0 is an extra bias unit whose value is always 1); and the last layer computes the result of the hypothesis, h(x).

In this example we have an input layer (layer 1), a hidden layer (layer 2), and an output layer (layer 3). In general, any layer that is neither the input layer nor the output layer is called a hidden layer.

Notation: a superscript (j) with subscript i, as in a(j)i, denotes the activation of the i-th neuron in the j-th layer, where "activation" means the value that a particular neuron reads in, computes, and outputs.

Here we have three input units and three hidden units, so the matrix of parameters Θ(1), which controls the mapping from the three input units to the three hidden units, is a 3 × 4 matrix.

More generally, if a network has s_j units in layer j and s_(j+1) units in layer j+1, then the matrix Θ(j), which controls the mapping from layer j to layer j+1, has dimension s_(j+1) × (s_j + 1).
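That dimension rule is easy to encode directly (the helper name is mine, not from the lecture):

```python
def theta_dim(s_j, s_j_plus_1):
    # Theta(j) maps layer j (s_j units plus one bias unit) to layer j+1
    return (s_j_plus_1, s_j + 1)

# the example network: 3 input units -> 3 hidden units -> 1 output unit
print(theta_dim(3, 3))  # (3, 4), the shape of Theta(1)
print(theta_dim(3, 1))  # (1, 4), the shape of Theta(2)
```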

Finally, the output layer has a single unit that computes h(x), which can also be written a(3)1 (the first unit of the third layer).

IV. Model Representation (continued)

A vectorized implementation of forward propagation:

Each value z is a linear combination: the inputs x0, x1, x2, x3 weighted by θ0, θ1, θ2, θ3. We can define a(1) to be the vector x. This process of computing h(x) is called forward propagation (forward propagation), so named because we start from the activations of the input layer, propagate forward to the hidden layer and compute its activations, then propagate forward again and compute the activations of the output layer. This process of computing the activations in order, from the input layer to the hidden layer to the output layer, is what forward propagation means.
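Forward propagation, i.e. computing a = g(z) layer by layer with a bias unit prepended at each layer, can be sketched in plain Python. The weight values below are arbitrary placeholders, not values from the lecture:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(Theta, a_prev):
    # prepend the bias unit a0 = 1, then compute a = g(Theta * a_prev)
    a = [1.0] + list(a_prev)
    return [sigmoid(sum(w * v for w, v in zip(row, a))) for row in Theta]

def forward_propagate(Thetas, x):
    a = list(x)                # a(1) = x
    for Theta in Thetas:       # propagate forward one layer at a time
        a = layer_forward(Theta, a)
    return a                   # activations of the output layer

# a 3-input, 3-hidden-unit, 1-output network with placeholder weights
Theta1 = [[0.1, 0.2, -0.1, 0.0],
          [0.0, -0.3, 0.2, 0.1],
          [0.2, 0.1, 0.1, -0.2]]
Theta2 = [[-0.3, 0.5, 0.5, 0.5]]
print(forward_propagate([Theta1, Theta2], [1.0, 2.0, 3.0]))
```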

If we cover up the input layer, it looks as if the features a1, a2, a3 are themselves the inputs being learned from. Specifically, they are produced by the mapping from layer 1 to layer 2, a function determined by another set of parameters, Θ(1).

So a neural network does not train logistic regression on the raw input features x1, x2, x3; rather, it trains logistic regression with a1, a2, a3 as its inputs. Because it can choose the parameters Θ(1), it can sometimes learn interesting and complex features, yielding a better hypothesis than you could get from the raw inputs x1, x2, x3. The next section discusses why.

You can also draw neural networks with other kinds of diagrams. The way the neurons of a network are connected is called the network's architecture, so "architecture" refers to how the different neurons connect to one another.

V. Applications

The following examples show how a neural network computes a complex non-linear function of its input.

Let's look at two simple problems:

First the AND operation, then the OR operation.

PS: the numbers on the edges, such as −30 or +20, are the weights θ. The function g(z) is the S-shaped (sigmoid) function shown in the figure.


Next, we implement the NOT operation, and then (NOT x1) AND (NOT x2).

Combining the operations above, we can implement the XNOR operation.

The idea: XNOR (x1 XNOR x2 is 1 exactly when the two inputs are equal) is built in stages. From x1 and x2 (the first layer), one neuron of the second layer computes x1 AND x2, a second neuron of the second layer computes (NOT x1) AND (NOT x2), and then a neuron in the third layer takes the OR of these two, producing the XNOR output.
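These hand-picked weights can be checked directly. The weight values below (−30, +20, etc.) follow the lecture's AND/OR examples; rounding the sigmoid output to 0 or 1 recovers the XNOR truth table:

```python
import math

def g(z):
    # sigmoid activation
    return 1.0 / (1.0 + math.exp(-z))

def unit(theta, inputs):
    # theta[0] is the bias weight; inputs are given without the bias unit
    return g(theta[0] + sum(t * x for t, x in zip(theta[1:], inputs)))

AND = [-30.0,  20.0,  20.0]   # fires only when x1 = x2 = 1
NOR = [ 10.0, -20.0, -20.0]   # (NOT x1) AND (NOT x2)
OR  = [-10.0,  20.0,  20.0]

def xnor(x1, x2):
    a1 = unit(AND, [x1, x2])   # hidden unit 1: x1 AND x2
    a2 = unit(NOR, [x1, x2])   # hidden unit 2: (NOT x1) AND (NOT x2)
    return unit(OR, [a1, a2])  # output layer: a1 OR a2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xnor(x1, x2)))
```

Running this prints 1 for the equal-input cases (0,0) and (1,1), and 0 for (0,1) and (1,0).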

In the input layer we only have the raw input values. We then build a hidden layer that computes slightly more complex functions of the inputs, and by adding another hidden layer we get still more complex functions. This is an intuitive explanation of why a neural network can compute quite complicated functions.

VI. Multi-class Classification

How does a neural network do multi-class classification? For the binary classification discussed earlier, we could represent the answer with a single output, 0 or 1.

For a multi-class problem, we do not use a single output taking values 1, 2, ..., 10; instead we represent the result with a vector, as follows:

In the final example, the first unit of the output layer outputs 1 if the image is a person and 0 if it is not; the second output unit outputs 1 if it is a car and 0 if it is not; and so on. The final output vector therefore tells us what the input is, and that is multi-class classification.
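Reading the predicted class off the output vector just means picking the unit with the largest activation. The label names and activation values below are made up for illustration:

```python
def predict_class(h, labels):
    # h: output-layer activations, one per class; pick the largest
    i = max(range(len(h)), key=lambda k: h[k])
    return labels[i]

labels = ["person", "car", "motorcycle", "truck"]   # hypothetical classes
h = [0.1, 0.9, 0.2, 0.05]                           # hypothetical network output
print(predict_class(h, labels))                     # car
```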

References:
Machine Learning (Andrew Ng), Lecture 07: Non-linear Hypotheses (Neural Network Model)
Andrew Ng Machine Learning: Neural Network Study Notes



Origin blog.csdn.net/linjpg/article/details/104108582