Quick learning in one article - neural networks demystified, learn the basics of neural networks in one day - forward propagation (3)


Foreword

I have been debating for a long time whether to publish deep learning content, since more than half of the machine learning material in the mathematical modeling column has not been updated yet. After much consideration, I decided to write a series of articles on neural networks. If neural networks, or models built on them (such as LSTM for time-series prediction), are to be used in future mathematical modeling competitions or in other, more refined models, it will be much easier to explain both the usage and the underlying principles with this series in place. Deep learning, however, is not easy to master: it involves a lot of mathematical theory and many formulas whose derivations require careful reasoning, and without hands-on practice it is hard to see what the code we write actually represents inside a neural network framework. I will do my best to simplify the material and relate it to things we are already familiar with, so that everyone can understand the neural network framework and follow the derivations smoothly, using as few mathematical formulas and as little specialized theory as possible. The goal is to understand and implement the algorithm in one article and master the material in the most efficient way.

Although many competitions do not restrict which algorithms or frameworks may be used, more and more award-winning teams rely on deep learning, and traditional machine learning methods are gradually losing ground. For example, in Problem C of the 2022 Mathematical Contest in Modeling for college students, participating teams that used deep learning networks won awards at a very high rate. With artificial intelligence and data mining competitions appearing one after another, the demand for neural network knowledge keeps growing, so it is well worth mastering the various neural network algorithms.

The blogger has focused on modeling for four years, has participated in dozens of mathematical modeling competitions large and small, and understands the principles of each model, its modeling process, and various approaches to problem analysis. The purpose of this column is to let readers with zero background quickly learn to use various mathematical models, machine learning, deep learning, and the corresponding code. Each article contains a practical project and runnable code. The blogger follows all kinds of modeling competitions closely; for each one, the latest ideas and code, along with detailed reasoning and complete implementations, will be written into this column. I hope readers who need this will not miss this carefully prepared column.
 


Forward propagation

The previous two articles described the basic architecture of a neural network and the commonly used activation functions. We know that every neuron in a layer is connected to the neurons in the layers before and after it, so how does the neural network take the input data and compute its way to the output layer? Let's see how this works.

The forward pass of a neural network can be described in a few key steps:

From the input layer to the hidden layer

The computation from the input layer to the hidden layer is a linear combination using weights and biases, followed by passing the result through an activation function.

  1. Input signal: The input layer receives external input data, which can be images, text, numbers, etc. Each input corresponds to an input neuron. Suppose the input layer has n^{(0)} neurons, denoted a_1^{(0)}, a_2^{(0)}, ..., a_{n^{(0)}}^{(0)}; they represent the outputs of the first through the n^{(0)}-th input neurons.
  2. Weights and biases: The hidden layer contains multiple neurons. Each hidden neuron is connected to every neuron in the input layer, and each connection carries a weight w_{ji}^{(1)}, where j is the index of the hidden-layer neuron and i is the index of the input-layer neuron. Each hidden neuron also has a bias b_j^{(1)}.
  3. Linear combination: For the j-th neuron in the hidden layer, the input signals are multiplied by the weights and the bias is added, giving the linear combination z_j^{(1)} = \sum_{i=1}^{n^{(0)}} w_{ji}^{(1)} a_i^{(0)} + b_j^{(1)}.
  4. Activation function: The linear combination z_j^{(1)} is fed into an activation function f to obtain the hidden neuron's output a_j^{(1)} = f(z_j^{(1)}). Common activation functions include sigmoid, ReLU, and tanh; they introduce the nonlinearity that allows the neural network to learn more complex functions.
  5. Layer-by-layer pass: The steps above are repeated for every neuron in every hidden layer; the outputs of one layer become the inputs of the next.

 This computation is repeated for each neuron in each layer until the output of the hidden layer is obtained. The hidden layer's outputs then become the inputs of the next layer, and so on, until the output layer is reached. Through this layer-by-layer computation the neural network can extract and represent higher-level features from the input data; the vectorized form is sketched below.
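In vector form (using the same symbols as in the list above; collecting the weights into a matrix W^{(l)} and the biases into a vector b^{(l)} is just shorthand on my part, not notation from the original), the computation of an entire layer l can be summarized as:

z^{(l)}=W^{(l)}a^{(l-1)}+b^{(l)},\qquad a^{(l)}=f\left(z^{(l)}\right)

where f is applied element-wise. This is exactly the pattern that the NumPy code later in this article implements.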

 If the text description above feels abstract, we can use a small example to walk through the forward pass of a neural network concretely:

As shown in the figure above, which gives the basic structure of a small neural network, we set the two input nodes to X_{1}=0.4, X_{2}=-0.6, take the actual true value to be Y=0.1, and set the weights to W_{1}=0.3, W_{2}=-0.6, W_{3}=0.9, W_{4}=-0.4, W_{5}=0.4, W_{6}=0.7.

The weighted sums from the input layer to the hidden-layer nodes are computed, with the following results:

The value of node 1 is: X_{1}*W_{1}+X_{2}*W_{3}=0.4*0.3+(-0.6)*0.9=-0.42

The value of node 2 is: X_{1}*W_{2}+X_{2}*W_{4}=0.4*(-0.6)+(-0.6)*(-0.4)=0

 Then apply sigmoid activation to the values of the hidden-layer nodes. The sigmoid function was described in detail in my previous article, so we can compute directly:

\frac{1}{1+e^{0.42}}\approx 0.4,\frac{1}{1+e^{0}}=0.5

The hidden-layer outputs are then weighted and summed into the output node:

0.4*0.4+0.5*0.7=0.51
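As a quick sanity check, here is a minimal NumPy sketch of the same hand calculation (the names sigmoid, w_hidden, w_output and y_hat are my own, chosen for illustration; they do not come from the figure):

import numpy as np

def sigmoid(z):
    return 1/(1+np.exp(-z))

# inputs and weights from the worked example above
x = np.array([0.4, -0.6])              # X1, X2
w_hidden = np.array([[0.3, 0.9],       # node 1 uses W1, W3
                     [-0.6, -0.4]])    # node 2 uses W2, W4
w_output = np.array([0.4, 0.7])        # W5, W6

z_hidden = w_hidden.dot(x)             # [-0.42, 0.0]
a_hidden = sigmoid(z_hidden)           # approximately [0.40, 0.50]
y_hat = w_output.dot(a_hidden)         # approximately 0.51
print(z_hidden, a_hidden, y_hat)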

Finally, we find that there is still a gap between 0.51 and the true value of 0.1. If the weights are not set appropriately, the result will be even worse; this is where back propagation is needed to bring the predicted value closer to the real value. Of course, when there are many input nodes and the hidden layers are more complicated, we generally use matrices for the computation. For example:

We can use matrix operations to express:

\begin{pmatrix} w_{11}&w_{21} \\ w_{12}&w_{22} \end{pmatrix}\begin{pmatrix} x_{1}\\x_{2} \end{pmatrix}=\begin{pmatrix} w_{11}x_{1}+w_{21}x_{2}\\ w_{12}x_{1}+w_{22}x_{2} \end{pmatrix}

Now, assuming the input data is [0.9,0.1,0.8], let's do the calculation again:

import numpy as np
def _sigmoid(in_data):
    return 1/(1+np.exp(-in_data))
# input layer
x = np.array([0.9,0.1,0.8])
# hidden layer: we need to compute, for each hidden node, the combination of the inputs.
# Every hidden node is connected to every input node, so w1 is a 3x3 matrix,
# and each hidden node therefore receives part of the input signal.
# For example, the weight between the first input node and the first hidden node is w11=0.9,
# and the weight between the second input node and the second hidden node is w22=0.8.
w1 = np.array([[0.9,0.3,0.4],
              [0.2,0.8,0.2],
              [0.1,0.5,0.6]]
             )
# the output layer also contains 3 nodes, so w2 is a 3x3 matrix as well
w2 = np.array([
    [0.3,0.7,0.5],
    [0.6,0.5,0.2],
    [0.8,0.1,0.9]
])

Xhidden = _sigmoid(w1.dot(x))
print(Xhidden)
Xoutput = w2.dot(Xhidden)
print(Xoutput)  # final output
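If you run this sketch, Xhidden comes out to roughly [0.761, 0.603, 0.650] and Xoutput to roughly [0.976, 0.889, 1.255] (values computed by hand from the weights above and rounded to three decimals).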

 

 Let's look at a more complex example:

 This time, let's add one more hidden layer (and bias terms) and see how it works:

import numpy as np

def _sigmoid(in_data):
    return 1/(1+np.exp(-in_data))

def init_network():
    network={}
    network['w1']=np.array([[0.1,0.3,0.5],[0.2,0.4,0.6]])
    network['b1']=np.array([0.1,0.2,0.3])
    network['w2']=np.array([[0.1,0.4],[0.2,0.5],[0.3,0.6]])
    network['b2']=np.array([0.1,0.2])
    network['w3']=np.array([[0.1,0.3],[0.2,0.4]])
    network['b3']=np.array([0.1,0.2])
    
    return network
    
def forward(network,x):
    w1,w2,w3 = network['w1'],network['w2'],network['w3']
    b1,b2,b3 = network['b1'],network['b2'],network['b3']
    a1 = x.dot(w1) + b1    # input layer -> first hidden layer (linear combination)
    z1 = _sigmoid(a1)      # activation of the first hidden layer
    a2 = z1.dot(w2) + b2   # first hidden layer -> second hidden layer
    z2 = _sigmoid(a2)      # activation of the second hidden layer
    a3 = z2.dot(w3) + b3   # second hidden layer -> output layer
    y = a3                 # identity activation at the output layer
    return y

network = init_network()
x = np.array([1.0,0.5])
y = forward(network,x)
print(y)
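Running this forward pass prints approximately [0.3168 0.6963] (values computed by hand from the weights above and rounded to four decimals). Note that the weights in init_network are simply fixed by hand to illustrate the mechanics; in a real network they would be learned through training.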

 

That wraps up forward propagation. There is nothing complicated here: it is just weighted sums plus activation functions applied layer by layer. In the next article we will focus on the computation and role of the output layer.


 
