Contents
- problem introduction
- Examples of Neural Networks
- Signal Transmission Mechanisms in Perceptrons
- Signal Transmission Mechanisms in Neural Networks
- Signal passing between layers in a neural network
Problem introduction
In the previous section, we introduced the perceptron, and we saw that many complex problems can be solved by stacking perceptron layers.
However, the simple logic circuits implemented with perceptrons in the previous section still have an obvious shortcoming: the input weights all have to be set by hand. The neural network solves this problem elegantly: it can automatically learn the appropriate weight parameters from the data.
Examples of Neural Networks
Let's take the three-layer network of neurons below as an example.
Signal Transmission Mechanisms in Perceptrons
The functional relationship between the output layer and the input layer in a single-layer perceptron
In the previous section, the perceptron obtained its quasi-output by weighting and summing the input signals; when the quasi-output exceeded the threshold, the output was 1, otherwise 0.
Expressed as a mathematical expression, there is not a single functional relationship between input and output, but two, namely:
- the quasi-output y is one function of the inputs x,
- the final output (call it final) is another function of the quasi-output y, as written out below.
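Written out for the two-input perceptron of the previous section (a reconstruction; w1 and w2 are the weights, and the threshold has been folded into the bias b):

$$
\begin{aligned}
y &= w_1 x_1 + w_2 x_2 + b \\
\text{final} &= h(y) = \begin{cases} 1 & (y > 0) \\ 0 & (y \le 0) \end{cases}
\end{aligned}
$$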
Diagram of signal transmission process
step function
In the perceptron algorithm, the unknown function above is defined as a step function. Its job is to convert the weighted sum of the input signals (i.e., the quasi-output y) into the final output value (i.e., final). Its graph is shown below.
Step function implementation
import numpy as np

def step_function(x):
    '''x: a scalar'''
    return 1 if x > 0 else 0

def step_fun(x):
    '''x: ndarray'''
    y = x > 0                  # boolean array
    return y.astype(np.int64)  # np.int is deprecated in modern numpy; use np.int64

def step_fun2(x):
    '''x: ndarray; the same conversion written in one line'''
    return np.array(x > 0, dtype=np.int64)
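A quick check of the ndarray version (a minimal usage sketch; the input values are arbitrary):

x = np.array([-1.0, 0.0, 2.0])
print(step_fun(x))
# [0 0 1]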
Signal Transmission Mechanisms in Neural Networks
Symbols in Neural Networks
Now let's take as an example a 4-layer neural network with 2 hidden layers.
Optimizing mathematical expressions using linear algebra
Let's look again at the connections between the input layer and the hidden layer.
About the bias b: you may notice that this figure differs from the one above — the input layer seems to have one extra neuron. In fact, the author simply omitted the bias neuron in the earlier figure; this figure, in which every layer that sends signals forward (the input and hidden layers) carries a bias neuron, is the correct one.
From the perceptron in the previous section, we know where the bias b comes from: it represents the threshold of the neuron and reflects how easily the neuron is activated, so every neuron adds a bias when it takes in its input values.
With this, the value of the first neuron a1 in hidden layer 1 of the network above is computed as follows.
If the input layer has many neurons, this formula becomes very verbose, so we represent both the input values x and the weights w as matrices.
If all the neurons in hidden layer 1 are expressed in a single algebraic formula, we get the matrix form shown below.
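A reconstruction of the two formulas, assuming the 2-neuron input layer and 3-neuron hidden layer shown in the figures (the superscript (1) marks hidden layer 1):

$$
a_1^{(1)} = w_{11}^{(1)} x_1 + w_{12}^{(1)} x_2 + b_1^{(1)}
$$

and, collecting all three hidden neurons at once,

$$
A^{(1)} = X W^{(1)} + B^{(1)}, \quad
X = \begin{pmatrix} x_1 & x_2 \end{pmatrix}, \quad
A^{(1)} = \begin{pmatrix} a_1^{(1)} & a_2^{(1)} & a_3^{(1)} \end{pmatrix}, \quad
B^{(1)} = \begin{pmatrix} b_1^{(1)} & b_2^{(1)} & b_3^{(1)} \end{pmatrix}
$$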
Knowing the mathematical principle, how do we implement this matrix multiplication in code?
numpy's broadcasting mechanism
Broadcasting is the term for how numpy handles arrays of different shapes during arithmetic operations; it is what lets us add the bias vector to the product in one expression. For the product of the input values and the weight parameters themselves, we use the np.dot() function, which computes the matrix product.
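Before the example, here is a minimal sketch of broadcasting by itself (the array values are arbitrary): a 1-D array is stretched to match a 2-D array and added to each of its rows.

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # shape (2, 3)
b = np.array([10.0, 20.0, 30.0])  # shape (3,)
print(A + b)  # b is broadcast across both rows of A
# [[11. 22. 33.]
#  [14. 25. 36.]]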
Let's take an example, assuming that the input values and weight parameters are as follows
Code
import numpy as np

X = np.array([5.0, 4.0])                             # input values, shape (2,)
W1 = np.array([[0.1, 0.3, 0.6], [0.25, 0.25, 0.5]])  # weights, shape (2, 3)
B1 = np.array([0.1, 0.2, 0.3])                       # biases, shape (3,)
A1 = np.dot(X, W1) + B1  # matrix product, then the bias is broadcast on top
print(A1)
# [1.6 2.7 5.3]
Activation function debuts
In the single-layer perceptron there is a layer of functional relationship between the quasi-output and the final output, the step function: a neuron is activated and produces an output only when the input it receives reaches a certain threshold. The neural network is similar. The matrix A1 obtained above is actually the quasi-output matrix of hidden layer 1; to get the final output of hidden layer 1, we still need to set up a functional relationship between the two.
However, the graph of the step function is too abrupt: the function value jumps suddenly as soon as the input exceeds 0, which looks crude and unrealistic. After further research, scholars replaced it in neural networks with other functions, the most famous of which is the sigmoid function.
Can the activation function be a linear function?
No. Using a linear function makes deepening the network meaningless. A linear function is one whose output is a constant multiple of the input (plus a constant), such as h(x) = cx + b (c and b are constants). If such a function is used as the activation function, a three-layer network computes y(x) = h(h(h(x))) = c·c·c·x + …, which is equivalent to a single-layer network y = pow(c, 3)·x + …: composing affine functions only ever yields another affine function.
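A quick numerical sketch of this collapse (c, b and x are arbitrary values chosen for illustration):

c, b = 2.0, 0.5
h = lambda x: c * x + b  # a linear (affine) "activation"

x = 3.0
deep = h(h(h(x)))                     # "three-layer" network
flat = c**3 * x + (c**2 + c + 1) * b  # one equivalent affine layer
print(deep, flat)
# 27.5 27.5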
sigmoid function
Mathematical expression of the functional relation: h(x) = 1 / (1 + exp(-x))
Function image: from the graph, the sigmoid function is relatively smooth, which means the output value changes continuously with the input value. This is very important for the learning process (weight updates) of the neural network.
Code
def sigmoid(x):
    '''x: scalar or ndarray; maps the input smoothly into (0, 1)'''
    return 1 / (1 + np.exp(-x))
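A quick check on an ndarray (arbitrary inputs), showing the smooth values between 0 and 1:

print(sigmoid(np.array([-1.0, 0.0, 2.0])))
# [0.26894142 0.5        0.88079708]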
Signal passing between layers in a neural network
We still use the 4-layer neural network with 2 hidden layers as an example
Input layer to hidden layer 1
Hidden layer 1 to hidden layer 2
Hidden layer 2 to output layer
Code
The signal transmission of the first three layers is very similar, so one piece of code handles them all here.
def init_network():
    network = {}
    # The parameter values below are set arbitrarily; note the dimensions of each array
    network['W1'] = np.array([[0.2, 0.4, 0.4], [0.1, 0.3, 0.6]])    # (2, 3)
    network['B1'] = np.array([0.1, 0.3, 0.2])
    network['W2'] = np.array([[0.1, 0.1], [0.3, 0.4], [0.3, 0.6]])  # (3, 2)
    network['B2'] = np.array([0.4, 0.3])
    network['W3'] = np.array([[0.1, 0.3], [0.2, 0.4]])              # (2, 2)
    network['B3'] = np.array([0.1, 0.2])
    return network

def forward(network, x):
    # Fetch the weight parameters and biases
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    B1, B2, B3 = network['B1'], network['B2'], network['B3']
    # Compute each layer's quasi-output and final output
    A1 = np.dot(x, W1) + B1
    z1 = sigmoid(A1)
    A2 = np.dot(z1, W2) + B2
    z2 = sigmoid(A2)
    A3 = np.dot(z2, W3) + B3
    y = sigmoid(A3)  # the proper output-layer activation is discussed below
    return y

my_network = init_network()
x = np.array([5.0, 4.0])
y = forward(my_network, x)
print(y)
# [0.58266985 0.67742112]
With the code above, we can now compute the output layer's quasi-output matrix from the input value matrix! Next, let's see how the final output value of the output layer is calculated.
At this point we have essentially carried out the forward propagation of signals through the neural network; the seemingly mysterious neural network is already half realized!
The output of the output layer
Machine learning problems can be roughly divided into regression problems and classification problems. The former predicts a continuous value from an input value; the latter assigns data to the correct category.
For regression problems, the output layer of a neural network generally uses the identity function; for classification problems, the softmax function is generally used.
Setting the number of neurons in the output layer
This needs to be set according to the specific problem. For example, to classify a 28x28 image from the MNIST handwritten-digit dataset, the output layer needs 10 neurons, one for each digit 0-9.
identity function
Receives the input x and outputs it unchanged.
def identical(x):
return x
softmax function
Mathematical expression: yk = exp(ak) / (exp(a1) + exp(a2) + … + exp(an)), where a1, …, an are the quasi-outputs of the output layer and yk is the k-th final output.
Mathematical principles of using the softmax function for classification problems
In the picture above, an image is to be classified into ten categories. After the input value passes through the hidden layers, we obtain the quasi-output matrix A = [a1, a2, …, a10] of the output layer. This matrix can already express the final output of the whole network (usually the label corresponding to the largest element of the matrix is taken as the final predicted label).
However, the elements of this matrix are not necessarily all positive. To make the relative values easy to compare, we apply a monotonically increasing function (here an exponential function with base e), and at the same time express the final relative values as proportions.
This is how the form of the softmax function above arises.
In this way, we obtain the final output matrix Z = [z1, z2, …] of the output layer, in which all elements sum to 1 and each element can be read as the probability of belonging to the corresponding label.
Softmax function code implementation
def softmax(x):
    '''x: ndarray of quasi-outputs; returns values in (0, 1) that sum to 1'''
    return np.exp(x) / np.sum(np.exp(x))
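A quick check that the outputs behave like probabilities (the quasi-outputs are arbitrary):

a = np.array([0.3, 2.9, 4.0])
y = softmax(a)
print(y)          # [0.01821127 0.24519181 0.73659691]
print(np.sum(y))  # 1.0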
Defects when the softmax function is applied to a computer
The exponential function "explodes": its value grows extremely fast as the independent variable x increases (np.exp(1000), for example, already overflows to inf in float64). This causes overflow problems on the computer.
To solve this problem, we can start from a property of the softmax function: adding or subtracting the same constant from every element of the input does not change the softmax values, because exp(ak + C) / Σ exp(ai + C) = exp(ak) / Σ exp(ai).
This property shows that shifting by a constant leaves the softmax values unchanged. To prevent overflow, we usually subtract the maximum value of the array x. The improved implementation of the softmax function follows.
def softmax(x):
    '''x: ndarray; numerically stable version'''
    c = np.max(x)  # shift by the maximum to prevent overflow
    return np.exp(x - c) / np.sum(np.exp(x - c))
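A quick sketch of the difference (the large quasi-outputs are arbitrary): on these inputs the naive version computes inf / inf and returns nan, while the improved version works.

a = np.array([1010.0, 1000.0, 990.0])
print(softmax(a))  # the improved version
# [9.99954600e-01 4.53978686e-05 2.06106005e-09]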
Other notes on the softmax function
Because the exponential function increases monotonically, the position of the neuron with the largest output value does not change after softmax; softmax only converts the values into proportions. (The final output information of the neural network is already stored in the quasi-output matrix A, but humans can hardly read the final answer off that matrix intuitively, so from this point of view softmax still plays a useful role.)
Another role of the softmax function is in propagating the error during the weight-update process of the neural network (see the next section).