Implementing Neural Networks in Python from Zero to One: Part Two

Problem introduction

In the previous section we introduced the perceptron. In principle, many complex problems can be solved by stacking layers of perceptrons.
However, the simple logic-circuit example implemented with perceptrons in the previous section still has an obvious shortcoming: the weights of the inputs all have to be set by hand. Neural networks solve this problem perfectly: they can learn the appropriate weight parameters automatically from the data.

Examples of Neural Networks

Let's take the network with three layers of neurons shown below as an example.
(figure: a neural network with three layers of neurons)

Signal Transmission Mechanisms in Perceptrons

The functional relationship between the output layer and the input layer in a single-layer perceptron

In the previous section, the perceptron obtained a quasi-output by taking a weighted sum of its input signals; when the quasi-output exceeded the threshold, the output was 1, and otherwise it was 0.
Expressed as formulas (for two inputs x1, x2 with weights w1, w2 and bias b):

    y = w1*x1 + w2*x2 + b
    final = 1 if y > 0, otherwise 0

So there is not a single functional relationship between the inputs and the final output, but two:

  1. The quasi-output y is one function of the inputs x,
  2. The final output final is another function of the quasi-output y

Diagram of the signal transmission process:
(figure: the input signals are weighted and summed to give the quasi-output y, which then passes through a function to give the final output)

step function

In the perceptron, we define this second function as a step function: its job is to convert the weighted sum of the input signals (i.e. the quasi-output y) into the final output value (i.e. final). Its graph looks like this:
(figure: graph of the step function, which jumps from 0 to 1 at x = 0)
Step function implementation

import numpy as np

# Scalar version
def step_function(x):
	return 1 if x > 0 else 0

# ndarray version: convert the boolean array to integers
def step_function_v2(x):
	'''x: ndarray'''
	y = x > 0
	return y.astype(int)  # np.int is deprecated; use the built-in int

# ndarray version: build the integer array directly
def step_function_v3(x):
	'''x: ndarray'''
	return np.array(x > 0, dtype=int)
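
As a quick check, here is a minimal sketch of the two-stage computation described above, using arbitrary example weights: first the weighted sum (the quasi-output), then the step function that produces the final output.

# Arbitrary example inputs and weights, just to illustrate the two stages
x1, x2 = 1.0, 0.5
w1, w2, b = 0.5, 0.5, -0.25
y = w1 * x1 + w2 * x2 + b      # quasi-output (weighted sum)
final = step_function(y)       # final output via the step function
print(y, final)                # 0.5 1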

Signal Transmission Mechanisms in Neural Networks

Symbols in Neural Networks

Now let's take a 4-layer neural network with 2 hidden layers as an example.
(figure: a 4-layer neural network with 2 hidden layers, annotated with the symbols used for weights and biases)

Simplifying the mathematical expressions with linear algebra

Let's look again at how the connections between the input layer and hidden layer 1 are represented.
(figure: the input layer and hidden layer 1, including the bias neuron)

About the bias b: you may notice that this figure differs from the one above, in that the input layer seems to have one more neuron. In fact, the author simply omitted it in the earlier figure; the version in which each layer (input and hidden layers) has a bias neuron is the correct one.
From the perceptron in the previous section we know where the bias b comes from: it represents the threshold of the neuron, reflecting how easily the neuron is activated, so every neuron should add its bias when it receives its input values.

We can now write down the value of the first neuron a1 in hidden layer 1 of the network above:

    a1 = w11*x1 + w12*x2 + b1
If the input layer has many neurons, this formula becomes very verbose, so we represent both the input values x and the weights w as matrices.

Writing all the neurons in hidden layer 1 with a single algebraic formula:

    A1 = X · W1 + B1

where X is the row vector of input values, W1 is the weight matrix, and B1 is the bias vector.
Knowing the mathematical principle, how do we implement this matrix multiplication in code?

numpy's broadcasting mechanism

Broadcasting is the term for how NumPy handles arrays of different shapes during arithmetic operations. It is a general concept; here we mainly use the np.dot() function to compute the product of the input values and the weight parameters, and the bias vector is then added element-wise.

Let's take an example. Suppose the input values and weight parameters are as given in the code below.

Code

import numpy as np
X = np.array([5.0, 4.0])                             # input vector, shape (2,)
W1 = np.array([[0.1, 0.3, 0.6], [0.25, 0.25, 0.5]])  # weight matrix, shape (2, 3)
B1 = np.array([0.1, 0.2, 0.3])                       # bias vector, shape (3,)
A1 = np.dot(X, W1) + B1                              # quasi-output of hidden layer 1
print(A1)
# [1.6 2.7 5.3]
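
As a quick sanity check on the dimensions (using the arrays defined just above): np.dot of a length-2 vector with a 2x3 matrix yields one value per hidden neuron.

print(X.shape, W1.shape, B1.shape)   # (2,) (2, 3) (3,)
print(np.dot(X, W1).shape)           # (3,) -- one value per neuron in hidden layer 1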

Activation function debuts

In the single-layer perceptron there is one more functional relationship between the quasi-output and the final output, the step function: the neuron is only activated and produces an output when the input it receives reaches a certain threshold. Neural networks are similar. The matrix A1 we obtained above is really the quasi-output matrix of hidden layer 1; to get the final output of hidden layer 1 we again need to set up a function between the two.
However, the graph of the step function is too abrupt: the value jumps suddenly as soon as the input exceeds 0, which makes it look cold and unrealistic. For this reason, researchers use other functions in neural networks instead, the most famous of which is the sigmoid function.

Can the activation function be a linear function?
No. Using a linear function would make it meaningless to add more layers to the network. A linear function here means a function whose output is a constant multiple of the input plus a constant, such as h(x) = c*x + b (c and b constants). If a linear function is used as the activation function, then a three-layer network y(x) = h(h(h(x))) expands to y = c*(c*(c*x + b) + b) + b, which is again just a linear function of x, i.e. it is equivalent to a single-layer network of the form y = pow(c, 3)*x + …
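
A quick numerical sketch of this argument, with arbitrary constants c and b: applying the linear "activation" three times is indistinguishable from applying a single linear function once.

# Composing a linear "activation" h(x) = c*x + b three times
# collapses into one linear function of x.
c, b = 2.0, 1.0

def h(x):
    return c * x + b

x = 5.0
deep = h(h(h(x)))                              # a "three-layer" network
single = (c ** 3) * x + (c ** 2 + c + 1) * b   # the equivalent single linear map
print(deep, single)                            # 47.0 47.0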

sigmoid function

Mathematical expression of the functional relation:

    h(x) = 1 / (1 + exp(-x))

Function graph: as the graph shows, the sigmoid function is quite smooth, meaning the output value changes continuously with the input value. This is very important for the learning process (the weight updates) of the neural network.
(figure: graph of the sigmoid function)

Code

def sigmoid(x):
	return 1/(1+np.exp(-x))
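
A quick usage check on a few values (assuming numpy is imported as np, as above); unlike the step function, nearby inputs give nearby outputs:

x = np.array([-1.0, 0.0, 1.0, 2.0])
print(sigmoid(x))
# [0.26894142 0.5        0.73105858 0.88079708]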

Signal passing between layers in a neural network

We still use the 4-layer neural network with 2 hidden layers as an example

Input layer to hidden layer 1

    A1 = X · W1 + B1
    Z1 = sigmoid(A1)

Hidden layer 1 to hidden layer 2

    A2 = Z1 · W2 + B2
    Z2 = sigmoid(A2)

Hidden layer 2 to output layer

    A3 = Z2 · W3 + B3
    Y = sigmoid(A3)

Code

The signal transmission between these layers is very similar, so the code for all of them is given together here.

def init_network():
    network = {}
    # The parameters below are arbitrary example values; note the data dimensions
    network['W1'] = np.array([[0.2, 0.4, 0.4], [0.1, 0.3, 0.6]])
    network['B1'] = np.array([0.1, 0.3, 0.2])
    network['W2'] = np.array([[0.1, 0.1], [0.3, 0.4], [0.3, 0.6]])
    network['B2'] = np.array([0.4, 0.3])
    network['W3'] = np.array([[0.1, 0.3], [0.2, 0.4]])
    network['B3'] = np.array([0.1, 0.2])

    return network

def forward(network, x):
    # Retrieve the weight parameters and the biases
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    B1, B2, B3 = network['B1'], network['B2'], network['B3']
    # Compute each layer's quasi-output and its final output
    A1 = np.dot(x, W1) + B1
    z1 = sigmoid(A1)
    A2 = np.dot(z1, W2) + B2
    z2 = sigmoid(A2)
    A3 = np.dot(z2, W3) + B3
    y = sigmoid(A3)

    return y


my_network = init_network()
x = np.array([5.0, 4.0])
y = forward(my_network, x)
print(y)
# [0.58266985 0.67742112]

With the code above, we can now compute the quasi-output matrix of the output layer from the input values! Next let's see how the final output values of the output layer are calculated.

With this, we have essentially implemented the forward signal propagation of the neural network; the seemingly mysterious neural network is already half realized!

The output of the output layer

Machine learning problems can be roughly divided into regression problems and classification problems: the former predict a continuous value from an input value, while the latter assign the data to the correct category.
For regression problems, the output layer of a neural network generally uses the identity function; for classification problems, it generally uses the softmax function.

Neuron settings for the output layer

The number of neurons in the output layer has to be set according to the specific problem. For example, to classify a 28x28 image from the MNIST handwritten digit dataset, the output layer needs 10 neurons, one per digit.
(figure: an output layer with 10 neurons, one for each digit class)
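
As a rough sketch of what this means for the weight shapes (the hidden-layer size of 50 below is an arbitrary choice for illustration): the flattened 28x28 image gives 784 input values, and the last weight matrix must have 10 columns, one per output neuron.

# Hypothetical shapes for an MNIST classifier: 784 inputs -> 50 hidden -> 10 outputs
x  = np.random.rand(784)      # a flattened 28x28 image
W1 = np.random.rand(784, 50)
b1 = np.random.rand(50)
W2 = np.random.rand(50, 10)   # 10 columns, i.e. 10 output neurons
b2 = np.random.rand(10)
out = np.dot(sigmoid(np.dot(x, W1) + b1), W2) + b2
print(out.shape)              # (10,)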

identity function

It receives an input x and outputs x unchanged.

def identical(x):
	return x

softmax function

mathematical expression

    y_k = exp(a_k) / (exp(a_1) + exp(a_2) + … + exp(a_n))

Mathematical principles of using the softmax function for classification problems

In the figure above, an image needs to be classified into one of ten categories. After the input values have passed through the hidden layers, we obtain the quasi-output matrix A = [a1, a2, …, a10] of the output layer. This matrix already expresses the final output of the whole network (usually the label corresponding to the largest element is taken as the final predicted label).

However, the elements of this matrix are not necessarily all positive, which makes comparing their relative sizes inconvenient. To make the comparison easier, we apply a monotonically increasing function (here we choose the exponential function with base e), and then express each value as a proportion of the total.

This is exactly the form the softmax function takes.

In this way we obtain the final output matrix Z = [z1, z2, …] of the output layer, in which all elements sum to 1 and each element represents the probability of the corresponding label.

Softmax function code implementation

def softmax(x):
	return np.exp(x) / np.sum(np.exp(x))
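
A quick check with some arbitrary scores: the outputs sum to 1 and keep the ordering of the inputs.

a = np.array([0.3, 2.9, 4.0])
y = softmax(a)
print(y)            # [0.01821127 0.24519181 0.73659691]
print(np.sum(y))    # 1.0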

A defect of the softmax function when implemented on a computer

The exponential function "explodes": its value grows extremely fast as the argument x increases, so for large inputs exp(x) quickly exceeds what the computer can represent, causing an overflow problem.
To solve this, we can start from a property of the softmax function:

    exp(a_k) / Σ_i exp(a_i) = exp(a_k + C) / Σ_i exp(a_i + C)   for any constant C

This property shows that adding or subtracting the same constant from every element does not change the value of the softmax function. To prevent overflow, we usually subtract the maximum value of the array x. The improved implementation of the softmax function is below.

def softmax(x):
	c=np.max(x)
	return np.exp(x-c)/np.sum(np.exp(x-c))
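
To see the problem and the fix side by side (a minimal sketch; the large scores are arbitrary, and NumPy will emit an overflow warning for the naive version):

a = np.array([1010, 1000, 990])
# Naive softmax: np.exp(1010) overflows to inf, and inf/inf gives nan
print(np.exp(a) / np.sum(np.exp(a)))   # [nan nan nan]
# Improved softmax: subtracting the maximum keeps the exponents small
print(softmax(a))                      # [9.99954600e-01 4.53978686e-05 2.06106005e-09]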

Other notes on the softmax function

Because the exponential function is monotonically increasing, the position of the neuron with the largest output value does not change when softmax is applied; softmax only converts the scores into proportions. (The final output information of the neural network is already contained in the quasi-output matrix A, but humans generally cannot read the final answer off it very intuitively, so from this point of view softmax still plays a useful role.)
Another role of the softmax function is in the error propagation of the weight-update process of the neural network (see the next section).
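
A small sketch of the first point: applying softmax does not change which neuron has the largest value, so when the network is only used for prediction the softmax step could even be skipped.

a = np.array([0.3, 2.9, 4.0])
print(np.argmax(a))             # 2
print(np.argmax(softmax(a)))    # 2, the same index: softmax preserves the ordering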


Origin: blog.csdn.net/m0_54510474/article/details/124037721