Master deep learning in one article (9): thoroughly understand neural networks (must read)

This is a very important chapter, because deep learning was inspired by the neural networks of animals. There are many videos and articles introducing neural networks on the Internet, but their quality is uneven. This article strives to use the plainest possible language to help everyone thoroughly master the core of deep learning: the neural network.

The neural network explained in this chapter is a shallow neural network. Deep neural networks will be discussed later, but there is nothing mysterious about them: a deep neural network simply has a few more layers than a shallow one, and the principles are the same.

Recall the Logistic regression we learned before. It is the simplest neural network, as shown in the figure:

Here x1, x2, x3, ..., xn are the feature values of a sample, and w1, w2, w3, ..., wn are the weights corresponding to each feature. \Sigma represents z = w1*x1 + w2*x2 + ... + wn*xn + b, and \sigma represents a = \sigma(z). The final output is a, i.e. the predicted value \hat{y}=a.

From this simplest of neural network diagrams, you can see that the forward propagation of the logistic regression we learned earlier proceeds in the direction of the arrows, and backward propagation proceeds in the opposite direction. It is very clear and intuitive, and doubles as a review of the previous content.
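To make this concrete, here is a minimal NumPy sketch of that forward pass. The feature values, weights, and bias below are made-up numbers purely for illustration:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real z into (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Made-up example: one sample with n = 3 features.
x = np.array([0.5, -1.2, 3.0])   # features x1, x2, x3
w = np.array([0.1, 0.4, -0.2])   # weights w1, w2, w3
b = 0.3                          # bias

z = np.dot(w, x) + b             # z = w1*x1 + w2*x2 + w3*x3 + b
a = sigmoid(z)                   # a = sigma(z), the predicted value y_hat
print(a)                         # a number in (0, 1)
```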

Now let us formally explain the shallow neural network. We take a two-layer neural network as an example for a detailed analysis:

The picture above shows a two-layer fully connected neural network. Some students may ask: "This picture clearly has three layers. Why do you say it has two?" Because in deep learning, when counting the layers of a neural network, the input layer is usually not counted, so the network in the figure above has two layers. So what is a fully connected neural network? As the name implies, fully connected means that each neuron is connected to every neuron in the previous layer, and a network built this way is a fully connected neural network.

Let's explain each layer:

1. Input layer

For the binary classification problem, the input is the picture we want to recognize. For example, if we want to judge whether there is a cat in a picture, then our input is that picture. We need to turn the picture into pixel features, because what the computer actually recognizes are pixels. Assuming a picture is 28*28 in size, it becomes a pixel matrix that is also 28*28. Because in deep learning the input layer is a column vector, the pixel matrix must be flattened into a column vector of dimension 784*1, the same dimension as the input layer in the figure above.
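As a small illustration (using a random array as a stand-in for a real photo), the flattening step looks like this in NumPy:

```python
import numpy as np

# Stand-in for a 28*28 grayscale picture (random values, not a real photo).
image = np.random.rand(28, 28)

# Flatten the 28*28 pixel matrix into a 784*1 column vector.
x = image.reshape(784, 1)
print(x.shape)  # (784, 1)
```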

Each input in the input layer represents a feature of the picture. For example, the first input x1 may represent the texture of the cat's fur, the second input x2 the cat's pupils, and so on.

2. Hidden layer

The hidden layer combines the features from the input layer into more abstract features. For example, the first neuron in the hidden layer might represent the cat's stripes, the second the cat's eyes, and so on.

The calculation formula is:

Z^{[1]}=W^{[1]}\times X+b^{[1]}

A^{[1]}=\sigma \left ( Z^{[1]} \right )

Here the superscript "[1]" means the first layer.

The formulas above are vectorized. Assuming there are m samples, let's check whether the dimensions are correct:

For Z^{[1]}=W^{[1]}\times X+b^{[1]}: from the figure above, the dimension of Z^{[1]} is (10, m), the dimension of W^{[1]} is (10, 784), the dimension of X is (784, m), and the dimension of b^{[1]} is (10, m). Multiplying (10, 784) by (784, m) gives (10, m), and adding b^{[1]} leaves it (10, m), so the dimensions are correct.

For A^{[1]}=\sigma \left ( Z^{[1]} \right ), since \sigma is applied element-wise, the dimension of A^{[1]} is the same as that of Z^{[1]}, namely (10, m).

The hidden layer computes both Z and A; everyone should remember this.
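Here is a minimal sketch of this vectorized hidden-layer computation in NumPy, with assertions checking the dimensions derived above. The 10 hidden neurons and 784 inputs mirror the figure; m = 5 samples and the random values are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

m = 5                                  # assumed number of samples
X = np.random.rand(784, m)             # one 784*1 column per sample
W1 = np.random.randn(10, 784) * 0.01   # weights of 10 hidden neurons
b1 = np.zeros((10, 1))                 # bias; broadcast across the m columns

Z1 = np.dot(W1, X) + b1                # Z[1] = W[1] X + b[1]
A1 = sigmoid(Z1)                       # A[1] = sigma(Z[1])

assert Z1.shape == (10, m)
assert A1.shape == (10, m)
```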

Note: in A=\sigma \left ( Z \right ), \sigma is the activation function. Why use an activation function? Because without one, our model is always linear, and a linear model is too simple to handle complex problems. The activation function makes the model non-linear, so it can handle more complex classification problems.
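To see this concretely, here is a small demonstration (with random stand-in matrices) that two stacked layers with no activation in between collapse into a single linear layer:

```python
import numpy as np

X = np.random.rand(784, 5)
W1, b1 = np.random.randn(10, 784), np.random.randn(10, 1)
W2, b2 = np.random.randn(1, 10), np.random.randn(1, 1)

# Two "layers" with no activation in between...
out = np.dot(W2, np.dot(W1, X) + b1) + b2

# ...equal one linear layer with W = W2 W1 and b = W2 b1 + b2.
W, b = np.dot(W2, W1), np.dot(W2, b1) + b2
print(np.allclose(out, np.dot(W, X) + b))  # True
```

No matter how many such layers we stack, the result is still a single linear function of X.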

Summary: the hidden layer extracts the features of the input layer and combines them into more abstract features.

3. Output layer

The output layer combines the features from the hidden layer to compute the predicted value. For binary classification, if the predicted value is >= 0.5 the sample is classified as the positive class (e.g. "there is a cat"), otherwise as the negative class.

The formula is as follows:

Z^{[2]}=W^{[2]}\times A^{[1]}+b^{[2]}

A^{[2]}=\sigma \left ( Z^{[2]} \right )

As you can see, the output layer's formulas differ from the hidden layer's only in the layer superscripts on Z, W, b, and A, and in that the input has changed from X to A^{[1]}.

This is easy to see from the neural network diagram: the output of the previous layer is the input of the next layer.
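Putting the two layers together, here is an end-to-end sketch of the forward pass, including the >= 0.5 threshold from above. The sizes and random values are, again, assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Forward pass of the two-layer network described above."""
    Z1 = np.dot(W1, X) + b1     # hidden layer
    A1 = sigmoid(Z1)
    Z2 = np.dot(W2, A1) + b2    # output layer: its input is A1, not X
    A2 = sigmoid(Z2)
    return A2

m = 5
X = np.random.rand(784, m)
W1, b1 = np.random.randn(10, 784) * 0.01, np.zeros((10, 1))
W2, b2 = np.random.randn(1, 10) * 0.01, np.zeros((1, 1))

y_hat = forward(X, W1, b1, W2, b2)        # shape (1, m)
prediction = (y_hat >= 0.5).astype(int)   # 1 = positive class, 0 = negative
print(prediction)
```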

Checking the dimensions is left to you as an exercise; the method is the same as for the hidden layer.

That covers the input layer, hidden layer, and output layer of a shallow neural network. I believe you now have a clearer understanding of neural networks. See you next time.

If you found this article helpful, please follow me so you don't get lost~

That is all for this article. To get the deep learning materials and teacher Wu Enda (Andrew Ng)'s course with Chinese subtitles, scan the official account below and reply with the word "data". I wish you happy learning.

Origin: blog.csdn.net/qq_38230338/article/details/107682470