Understanding and Application of Neural Networks

Recently I have been studying Andrew Ng's deep learning course. This article records the relevant knowledge points about neural networks~

It covers an understanding of neural networks and how they work.

Understanding Neural Networks

The following figure helps in understanding a neural network:

The first layer in the figure is the input layer, where x1, x2, and x3 are the features of an input sample, such as the song title and singer name of a music data sample.

The second layer is the hidden layer, the key part of a neural network. There are usually many hidden layers, which is what makes a network a deep neural network; only one hidden layer is drawn in the figure.

Deep neural networks are used because neural networks usually process complex data such as images, audio, or video, and learning from such data requires many layers. Generally, the first few layers learn local features, and the later layers integrate that local information to capture more complex structure. This is one of the reasons neural networks are popular.

Take the hidden layer in the figure as an example: nodes 5, 6, 7, and 8 are individual neurons of the same layer. A neuron can be understood simply as a function: given an input, it computes the corresponding output. A typical neuron consists of two parts, a linear function and an activation function, as shown in the following figure:

Taking a single neuron as an example, w and b are the parameters of the neuron, that is, the values that are learned and continuously adjusted during training.

The left side of the neuron performs a linear operation on the input X=[x1,x2,x3] to obtain z1: the vector w weights the different features of the sample, and adding the bias b completes the linear operation, z1 = w \cdot X + b.

The right side of the neuron is the activation function. The figure shows \delta(z), the sigmoid activation function \delta \left ( z \right )= \frac{1}{1+e^{-z}}, which confines the output to the range (0,1) and therefore suits binary classification problems. Other commonly used activation functions include tanh, ReLU, and leaky ReLU; I will explain activation functions in detail in another article. What we need to know here is that the activation function is nonlinear: it is what gives the hidden layers their value and prevents the output of the neural network from being a mere linear combination of the inputs.
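As a minimal NumPy sketch of a single sigmoid neuron (the values of x, w, and b below are made up purely for illustration, not taken from the course):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real z into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical input with three features, matching x1, x2, x3 above.
x = np.array([0.5, -1.2, 3.0])

# Hypothetical neuron parameters; in practice these are learned.
w = np.array([0.1, 0.4, -0.2])
b = 0.3

z = np.dot(w, x) + b   # linear part: weighted sum of features plus bias
a = sigmoid(z)         # activation part: nonlinear squashing
print(a)               # the neuron's output, a value in (0, 1)
```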

It is worth noting that each neuron has its own parameters, and for the neural network to function, the neuron parameters must be initialized randomly. The activation functions of different layers can also differ.
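A minimal sketch of such random initialization, assuming a hidden layer of 4 neurons fed by 3 inputs (sizes chosen only for illustration):

```python
import numpy as np

# Small random weights break the symmetry between neurons: if every
# weight started at zero, all neurons in a layer would compute the same
# output and receive the same update, so they could never learn
# different features.
W = np.random.randn(4, 3) * 0.01   # one row of weights per neuron
b = np.zeros((4, 1))               # biases can safely start at zero
```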

The third layer is the output layer, which here has only one neuron; its computation is the same as that of the single neuron described above.

How Neural Networks Work

Take logistic regression (used to solve binary classification problems) as an example:

1. Given an input sample x, initialize the parameters w and b. Compute z = w \cdot x + b and \hat{y}=\delta \left ( z \right )=\frac{1}{1+e^{-z}} to obtain the prediction \hat{y} for sample x (see the NumPy sketch below).
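A minimal sketch of this forward step, vectorized over m samples; the function name `predict` and the shape conventions are my own choices, not the course's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, X):
    """Forward step of logistic regression.

    w: weights, shape (n_features, 1)
    b: bias, a scalar
    X: inputs, shape (n_features, m); column i is sample x^(i)
    """
    z = np.dot(w.T, X) + b   # z = w^T x + b for every sample at once
    return sigmoid(z)        # y_hat: predicted probability that y = 1
```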

2. Compute the error between the sample's predicted result and its true result using the cost function.

Error for a single sample: L(\hat{y},y)=-(y\log(\hat{y})+(1-y)\log(1-\hat{y}))

Here y represents the true value of the sample and \hat{y} represents the predicted value. In a binary classification problem the true label y takes only the values 0 and 1, while the prediction \hat{y} is a probability in (0,1).

Cost function: J(w,b)=-\frac{1}{m}\sum_{i=1}^{m}[y^{(i)}\log(\hat{y}^{(i)})+(1-y^{(i)})\log(1-\hat{y}^{(i)})]

That is, sum the losses of all samples and divide by the number of samples m. A sketch of this computation follows.
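A minimal sketch of the cost computation (the epsilon guard is my own addition for numerical safety, not part of the formula):

```python
import numpy as np

def compute_cost(y_hat, y):
    """Cross-entropy cost J(w, b) averaged over m samples.

    y_hat: predictions in (0, 1), shape (1, m)
    y:     true labels in {0, 1}, shape (1, m)
    """
    m = y.shape[1]
    eps = 1e-12  # guards against log(0) when a prediction saturates
    losses = y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps)
    return -np.sum(losses) / m
```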

3. Adjust the parameters w and b according to the error using gradient descent, searching for the w and b that make the error over the samples as small as possible (a sketch of one update step follows).
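A minimal sketch of one gradient-descent update, using the standard closed-form gradients for this cost, \frac{\partial J}{\partial w}=\frac{1}{m}X(\hat{y}-y)^{T} and \frac{\partial J}{\partial b}=\frac{1}{m}\sum(\hat{y}-y); the learning rate 0.01 is an arbitrary choice for illustration:

```python
import numpy as np

def gradient_step(w, b, X, y, y_hat, learning_rate=0.01):
    """One parameter update for logistic regression.

    X: inputs, shape (n_features, m); y, y_hat: shape (1, m).
    """
    m = X.shape[1]
    dz = y_hat - y                  # derivative of the loss w.r.t. z
    dw = np.dot(X, dz.T) / m        # dJ/dw, shape (n_features, 1)
    db = np.sum(dz) / m             # dJ/db, a scalar
    w = w - learning_rate * dw      # step against the gradient
    b = b - learning_rate * db
    return w, b
```

Repeating steps 1 to 3 many times drives the cost down and yields the trained parameters w and b.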
