Study Notes on Andrew Ng's Deep Learning Videos on Bilibili (2) --- What Is a Neural Network?

Foreword

This lesson uses vivid, concrete examples to describe what a neural network is. What follows is essentially a translation of the original video content; it is vivid, detailed, and easy to understand.

What Is a Neural Network?

We often use the term deep learning to refer to the process of training neural networks. Sometimes it refers to training a particularly large-scale neural network. So what exactly is a neural network? In this video I will explain some of the intuitive basics.

First, let us start the discussion with a house price prediction example.

Suppose you have a dataset containing six houses. For each house you know the floor area, in square feet or square meters, and you know its price. You now want to fit a function that predicts the price from the floor area.

If you are familiar with linear regression, you might say: "Well, let's fit a straight line to these data." So you might end up with a straight line. This is basic linear regression; make a note of the term!
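To make this concrete, here is a minimal sketch of such a straight-line fit. The six houses below are made-up numbers for illustration; the video does not give the actual data.

```python
import numpy as np

# Six hypothetical houses: floor area (square feet) and price (thousands).
# These numbers are invented for illustration only.
area  = np.array([500, 1000, 1500, 2000, 2500, 3000], dtype=float)
price = np.array([60, 140, 225, 305, 390, 470], dtype=float)

# Fit price = w * area + b by least squares (a degree-1 polynomial).
w, b = np.polyfit(area, price, deg=1)

print(f"price ~= {w:.3f} * area + {b:.1f}")   # roughly 0.165 * area - 23
print("predicted price for 1200 sq ft:", w * 1200 + b)
```

Notice that with numbers like these the fitted line dips below zero for very small areas, which is exactly the problem Ng fixes below by bending the line at zero.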

Question 1: What is regression?

Many readers do not know what regression is, so before going further I will explain what regression means, to make the rest easier to follow.

Think of linear regression as a black box; in a programmer's terms, it is a black-box function: as long as we pass some parameters to the function as input, we get a result as output. So what does "regression" mean? Plainly put, it means the output of this black box is a continuous value. If the output is not continuous but takes discrete values, the task is instead called classification.

What is a continuous value? Very simple; here is an example. Suppose I tell you I have a house of 40 square meters, right by a subway station, and ask you to guess its total value. That value is continuous: the house might be worth 800,000, or 802,000, or perhaps 801,110. Now suppose instead I tell you I have a house of 120 square meters, near the subway, worth 1.8 million in total, and ask you to guess how many bedrooms it has. That is a discrete value, because the number of bedrooms can only be 1, 2, 3, 4, or at most 5; there is no such thing as 1.1 or 2.9 bedrooms. So just remember: if the task is to predict a continuous value, it is regression; if it is to predict a discrete value, it is classification.
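You can see the same distinction directly in code. This is a minimal sketch with toy, made-up numbers (scikit-learn assumed available): the same features drive a regressor when the target is continuous and a classifier when the target is discrete.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy features: [area in square meters, near subway (0/1)]. Made-up numbers.
X = np.array([[40, 1], [80, 1], [120, 1], [60, 0], [100, 0], [140, 0]])

price    = np.array([0.80, 1.30, 1.80, 0.75, 1.20, 1.65])  # millions: continuous
bedrooms = np.array([1, 2, 3, 2, 3, 4])                    # counts: discrete

regressor  = LinearRegression().fit(X, price)                  # regression
classifier = LogisticRegression(max_iter=1000).fit(X, bedrooms)  # classification

print(regressor.predict([[90, 1]]))   # any real number is a possible output
print(classifier.predict([[90, 1]]))  # one of the discrete classes {1, 2, 3, 4}
```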

Now, back to what Andrew Ng says here.

[Figure: the fitted line for house price prediction]

But strangely, you may also have noticed that we know a price can never be negative. So, to replace the straight line, which may produce negative prices, we bend the line a little so that it ends up at zero. The final thick blue line is your function for predicting price from floor area. Part of it is zero, and part is a straight line that fits the data well. You might think of this function as just a fit to housing prices.

But it can also be seen as a neural network, and it is just about the simplest possible one. We take the floor area as the input to the network (call it x); it passes through a node (a small circle), and the network outputs the price (call it y). That small circle is in fact a single neuron, and the network implements the function shown on the left.

In the neural network literature you will often see this function: it starts at zero, stays at zero for a while, and then turns into a straight line. It is called the ReLU activation function, which stands for Rectified Linear Unit. "Rectified" can be understood as taking the maximum of zero and the input, which is why the function has this shape.

[Figure: the ReLU function]

You do not need to worry if you do not fully understand ReLU yet; you will see it again later in this course.
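As a sketch of what that single neuron computes, here is price = relu(w * area + b). The weight and bias are hypothetical values like those from the straight-line fit above, not anything given in the video.

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: max(0, z)."""
    return np.maximum(0.0, z)

# A single "neuron" for the house price function.
w, b = 0.164, -23.0   # hypothetical slope and intercept

def predict_price(area):
    return relu(w * area + b)

print(predict_price(50))    # 0.0   -> the flat part: price never goes negative
print(predict_price(2000))  # 305.0 -> the straight-line part
```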

Question 2 (a key question): What is an activation function, and what is it for?

Here Andrew Ng introduces a new concept: the activation function.

Speaking very figuratively: the activation functions of a neural network are a troupe of space magicians, twisting and folding the feature space until a linear boundary can be found in it.

If there were no activation functions, the weights and biases of the neural network would only ever perform linear affine transformations, and stacking layers would not help: W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), which is still a single affine map. Such a network cannot solve even the following simple classification problem:

[Figure: a two-dimensional feature space in which the blue curve marks the negative examples (y = 0) and the green curve the positive examples (y = 1)]

Without the blessing of an activation function, this is the best the network can do:

[Figure: the best available linear decision boundary]

A linear boundary. It does not look very good, does it?
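Before the magicians arrive, we can verify the claim above numerically: stacking two purely linear layers is still a single affine transformation. A sketch with random placeholder weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linear (affine) layers with no activation in between.
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)

x = rng.normal(size=2)

two_layers = W2 @ (W1 @ x + b1) + b2

# The same map collapsed into a single affine layer.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True: the extra depth bought nothing
```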

At this point the activation functions step in and twist and fold the space:

[Figure: the feature space after the nonlinear warp]

A linear boundary appears! And if we map it back again, don't we obtain the boundary we wanted in the original feature space?

Of course, different activation functions belong to different schools of magic, so the spells they cast differ as well.

[Figure: the three space magicians: sigmoid, tanh, and ReLU]

sigmoid

Sigmoid is the grandmother of the group, the most senior of the activation functions.

[Figure: the sigmoid function]

Although it is rather old-fashioned and no longer as popular as it once was, for the output layer of classification tasks people still trust the experienced sigmoid.

[Figure: sigmoid squashing its input into (0, 1)]

As you can see, sigmoid squeezes its input into the interval (0, 1), the same range as a probability, which is why sigmoid remains so popular for classification tasks.
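In formulas, sigmoid is σ(z) = 1 / (1 + e^(-z)). A quick sketch:

```python
import numpy as np

def sigmoid(z):
    """Squash any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(sigmoid(z))  # [0.0067 0.2689 0.5    0.7311 0.9933] -- all in (0, 1)

# Its gradient sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) peaks at a mere
# 0.25 (at z = 0), one source of vanishing gradients in deep stacks.
print(sigmoid(0) * (1 - sigmoid(0)))  # 0.25
```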

tanh

tanh is also a veteran space magician:

[Figure: the tanh function]

Wait a moment, isn't this just sigmoid? Did it think we would not recognize it flipped over?

Indeed, tanh is sigmoid in disguise:

[Figure: tanh compared with sigmoid]
As shown above, tanh is similar in shape to sigmoid, but it "squeezes" its input into the interval (-1, 1). Its outputs are therefore zero-centered, which (to some extent) gives the next layer normalized activation values as input.

As for the gradient, it has a much larger peak of 1.0 (again at z = 0), but it also falls off faster: by the time |z| reaches about 3 it is already close to zero. This is the reason behind the so-called vanishing gradient problem, which causes training progress to slow down.
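A sketch that makes both the "disguised sigmoid" claim and the gradient behavior concrete:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 11)

# tanh really is sigmoid in disguise: tanh(z) = 2 * sigmoid(2z) - 1.
print(np.allclose(np.tanh(z), 2 * sigmoid(2 * z) - 1))  # True

# Gradient: tanh'(z) = 1 - tanh(z)^2. Peak of 1.0 at z = 0 ...
print(1 - np.tanh(0.0) ** 2)  # 1.0
# ... but already close to zero by |z| ~ 3 (vanishing gradient).
print(1 - np.tanh(3.0) ** 2)  # ~0.0099
```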

ReLU

ReLU is a gatekeeper: it shuts all the Muggles (inputs below 0) out of the door (those neurons are closed). It is also the activation function Ng mentioned above.
[Figure: the ReLU function]
It is the most widely used activation function today. ReLU deals with the vanishing gradient problem that sigmoid and tanh share, and it is also the activation function whose gradient is fastest to compute.
[Figure: the ReLU function and its gradient]

As shown above, ReLU is a completely different beast: it does not "squeeze" values into a range; it simply keeps everything positive and converts all negatives to zero.

The upside of ReLU is that its gradient is either 1 (for positive inputs) or 0 (for negative inputs), so gradients never vanish! This lets the network converge faster. The downside is the so-called "dying neuron" problem: a neuron that keeps receiving negative inputs has an activation value that is permanently zero.
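The same kind of sketch for ReLU and its gradient, dead-neuron case included:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    """1 for positive inputs, 0 for negative ones (undefined at exactly 0)."""
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(z))       # [0.  0.  0.5 2. ] -- negatives shut out by the gatekeeper
print(relu_grad(z))  # [0. 0. 1. 1.]     -- no vanishing gradient on the positive side

# "Dying ReLU": a neuron whose input stays negative always outputs 0,
# gets zero gradient, and therefore never recovers during training.
always_negative = np.array([-3.0, -1.2, -0.7])
print(relu(always_negative), relu_grad(always_negative))  # all zeros
```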

Now that we understand activation functions, let's get back to what Andrew Ng has to say.

[Figure: stacking single neurons into a larger network]
If that was a single-neuron network, then neural networks of any size are built by stacking many of these single neurons together. If you imagine each neuron as a single Lego brick, then building a larger neural network is like stacking up Lego bricks.

Let's look at an example:

We will no longer predict the price just from the floor area; suppose you now have some other house-related features, such as the number of bedrooms. Perhaps there is another important factor as well: family size also affects the price. Can the house accommodate a family of three, four, or five people? Whether it can really is determined by the floor area together with the number of bedrooms.

Changing the angle: you probably know that the zip code, as a feature, can tell you about walkability. For example, is this a very walkable neighborhood? Can you walk to the grocery store or to school, or do you need to drive? Some people prefer to live in mainly walkable areas. Also, the zip code correlates with affluence (in the United States, at least). In other countries it might instead reflect how good the nearby schools are.
[Figure: the house price prediction network built from these features]
In the figure, each small circle can be a ReLU (rectified linear unit) or some other, slightly nonlinear function. From the floor area and the number of bedrooms you can estimate the family size; from the zip code you can estimate walkability; from the zip code and affluence you can estimate school quality. Finally, you might reason that these are exactly the factors people weigh when deciding how much money they are willing to spend on a house.

For a house, these things are all closely related to its price. In this scenario, family size, walkability, and school quality can all help you predict the price. In this example, x is all four of the inputs and y is the price you are trying to predict. Stacking these single neurons together gives us a slightly larger neural network. This hints at the magic of neural networks, even though I have just described a network that seems to require you to work out family size, walkability, and school quality yourself, or whatever else influences the price.

[Figure: the densely connected house price network, from inputs x to output y]
Part of the magic of a neural network is that, when you implement it, all you need to do is feed in x and you get the output y; it works out all the intermediate steps by itself from however many training samples you give it. So what you actually do is this: build a neural network with four inputs, where the input features might be the size of the house, the number of bedrooms, the zip code, and the affluence of the neighborhood. Given these input features, the job of the neural network is to predict the corresponding price. Note also that these circles in the middle are called hidden units, and each of them takes its input from all four input features. For example, instead of saying that the first node represents family size and that family size depends only on the features x1 and x2, we say: neural network, you decide whatever you want this node to be, and we will give you all four inputs to compute it from. That is why we say the input layer and the middle layer are densely connected.

It is worth noting that, given enough data about x and y, that is, enough training samples of x with the corresponding y, neural networks are remarkably good at computing functions that accurately map from x to y.
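As a final sketch, here is the forward pass of the little four-input network Ng describes: four features, a densely connected hidden layer of ReLU units, one output. All weights are random placeholders; learning them from (x, y) pairs is exactly what the network would figure out on its own.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    return np.maximum(0.0, z)

# x: [size, #bedrooms, zip-code score, neighborhood affluence] -- made-up scales.
x = np.array([120.0, 3.0, 0.7, 0.5])

# Hidden layer: 3 units, each connected to ALL four inputs (densely connected).
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=3)
# Output layer: 1 unit producing the price estimate y.
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)

hidden = relu(W1 @ x + b1)   # the network decides what these hidden units mean
y = W2 @ hidden + b2         # predicted price (untrained, so meaningless for now)
print(y)
```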

These are the basics of a neural network. You may have noticed that neural networks are most effective and powerful in supervised learning settings, that is, whenever you try to map an input x to an output y, just as we saw in the house price prediction example.

In the next lesson we will look at more supervised learning examples; some of them will convince you that neural networks are extremely useful, and you will get to practice them as well.

Class dismissed!
