[Study Notes] Introduction to Deep Learning: Theory and Implementation Based on Python - Neural Network

3. Neural Network

3.1 From Perceptron to Neural Network

If a neural network is represented as a graph, as shown in the figure below, we call the leftmost column the input layer, the rightmost column the output layer, and the middle column the middle layer (hidden layer).

[Figure: a neural network drawn as a graph, with an input layer on the left, a hidden layer in the middle, and an output layer on the right]

In the network shown above, the bias $b$ is not drawn. If we want to show $b$ explicitly, we can draw the network as in the figure below: $b$ is added as the weight on an input signal that is always $1$. This perceptron takes the three signals $x_1, x_2, 1$ as inputs to the neuron, multiplies each by its weight, and sends them to the next neuron. The next neuron computes the sum of these weighted signals; if the sum exceeds $0$ it outputs $1$, otherwise it outputs $0$. Also, since the bias input signal is always $1$, that neuron is colored gray in the figure to distinguish it from the other neurons.

[Figure: the perceptron with the bias drawn explicitly, as a gray neuron whose input signal is always 1 with weight $b$]

We can express this behavior (output $1$ if the sum exceeds $0$, otherwise output $0$) with a function: $y = h(b + w_1 x_1 + w_2 x_2)$, where the function $h(x)$ is given by the following formula:

$$h(x) = \begin{cases} 0 & (x \le 0) \\ 1 & (x > 0) \end{cases}$$

The function $h(x)$ converts the sum of the input signals into an output signal; such a function is generally called an activation function. The role of the activation function is to decide how to activate the sum of the input signals.

The above formula can be refined into the following two formulas:

$$a = b + w_1 x_1 + w_2 x_2$$

$$y = h(a)$$

The calculation process of the activation function is shown in the figure below:

[Figure: the two-step computation inside a neuron: the weighted sum $a = b + w_1 x_1 + w_2 x_2$ is computed first, then the activation function converts it into the output $y = h(a)$]
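This two-step computation can be traced directly in NumPy. Below is a minimal sketch, using the step function as $h$ and arbitrary illustrative values for the weights and bias (the values are not taken from the text):

import numpy as np

def h(x):
	# step function: 1 if x > 0, otherwise 0
	return int(x > 0)

x = np.array([1.0, 1.0])   # input signals x1, x2
w = np.array([0.5, 0.5])   # weights w1, w2 (illustrative values)
b = -0.7                   # bias (illustrative value)

a = b + np.sum(w * x)      # step 1: a = b + w1*x1 + w2*x2
y = h(a)                   # step 2: y = h(a)
print(a, y)                # roughly 0.3 1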

3.2 Activation function

An activation function often used in neural networks is the sigmoid function, expressed by the following formula:

$$h(x) = \frac{1}{1 + \exp(-x)}$$

Before implementing it, let's first draw the graph of the step function, which outputs $1$ when the input exceeds $0$ and outputs $0$ otherwise. The step function can be implemented simply as follows:

import numpy as np

# The argument must be a scalar (a single real number)
def step_function(x):
	if x > 0:
		return 1
	else:
		return 0

# The argument can also be a NumPy array
def step_function(x):
	y = x > 0
	return y.astype(int)  # np.int is deprecated in recent NumPy; use the built-in int
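A quick check of the NumPy version: the comparison x > 0 produces a boolean array, which astype(int) converts into 0/1 integers:

x = np.array([-1.0, 1.0, 2.0])
y = x > 0
print(y)              # [False  True  True]
print(y.astype(int))  # [0 1 1]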

Then draw the function image:

import numpy as np
import matplotlib.pylab as plt

def step_function(x):
	return np.array(x > 0, dtype=int)  # dtype=np.int is deprecated; use int
x = np.arange(-5.0, 5.0, 0.1)
y = step_function(x)
plt.plot(x, y)
plt.ylim(-0.1, 1.1)  # specify the range of the y-axis
plt.show()

The result is shown in the figure below:

[Figure: graph of the step function; the output jumps from 0 to 1 at x = 0]

Next, we implement the sigmoid function:

def sigmoid(x):
	return 1 / (1 + np.exp(-x))

and graph the function:

x = np.arange(-5.0, 5.0, 0.1)
y = sigmoid(x)
plt.plot(x, y)
plt.ylim(-0.1, 1.1)  # specify the range of the y-axis
plt.show()

The result is shown in the figure below:

[Figure: graph of the sigmoid function, a smooth S-shaped curve rising from 0 to 1]

The sigmoid function is a smooth curve whose output changes continuously with the input, whereas the step function changes abruptly at the boundary $x = 0$. The smoothness of the sigmoid function is of great significance for the learning of neural networks.

Whereas the step function can only return $0$ or $1$, the sigmoid function can return real numbers such as $0.731\dots$ or $0.880\dots$ (this is related to the smoothness just mentioned). In other words, only a binary signal of $0$ or $1$ flows between the neurons of a perceptron, while a continuous real-valued signal flows in a neural network.

Although the step function and the sigmoid function differ in smoothness, they have similar shapes: when the input is small, the output is close to $0$ (or equals $0$); as the input grows, the output approaches $1$ (or becomes $1$). That is, when the input signal carries important information, both the step function and the sigmoid function output a large value; when the input signal is unimportant, both output a small value.
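To see this similarity in shape directly, the two functions can be drawn on the same axes. A small sketch reusing the step_function and sigmoid defined above:

x = np.arange(-5.0, 5.0, 0.1)
plt.plot(x, step_function(x), linestyle='--', label='step')  # dashed line for the step function
plt.plot(x, sigmoid(x), label='sigmoid')
plt.ylim(-0.1, 1.1)
plt.legend()
plt.show()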

The step function and the sigmoid function have another thing in common: both are nonlinear functions. The activation function of a neural network must be a nonlinear function; in other words, it cannot be a linear function. Why not? Because with a linear activation function, deepening the network is meaningless: if $h(x) = cx$, a three-layer network computes $y = h(h(h(x))) = c^3 x$, which a single layer with weight $c^3$ can already represent.

Next, we introduce another very important activation function: the ReLU (Rectified Linear Unit) function. The ReLU function outputs its input directly when the input is greater than $0$, and outputs $0$ when the input is less than or equal to $0$. It can be expressed by the following formula:

$$h(x) = \begin{cases} x & (x > 0) \\ 0 & (x \le 0) \end{cases}$$

Its code implementation and function image are as follows:

def relu(x):
	return np.maximum(0, x)

[Figure: graph of the ReLU function; it is 0 for x ≤ 0 and equal to x for x > 0]

3.3 Operations on multidimensional arrays

A multidimensional array is simply a "collection of numbers": numbers arranged in a line, in a rectangle, in three dimensions, or (more generally) in $N$ dimensions are all called multidimensional arrays.

A = np.array([1, 2, 3, 4])
np.ndim(A)  # 1, get the number of dimensions of the array
A.shape  # (4,)
A.shape[0]  # 4

B = np.array([[1, 2], [3, 4], [5, 6]])
np.ndim(B)  # 2
B.shape  # (3, 2)

Next, let's introduce the product of matrices (two-dimensional arrays). For example, the product of two $2 \times 2$ matrices can be calculated as shown below:

$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} = \begin{pmatrix} 1 \cdot 5 + 2 \cdot 7 & 1 \cdot 6 + 2 \cdot 8 \\ 3 \cdot 5 + 4 \cdot 7 & 3 \cdot 6 + 4 \cdot 8 \end{pmatrix} = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}$$

The product of two matrices is obtained by multiplying the rows of the left matrix (horizontal) with the columns of the right matrix (vertical) element by element and summing. This operation can be implemented in Python with the following code:

A = np.array([[1, 2], [3, 4]])
A.shape  # (2, 2)
B = np.array([[5, 6], [7, 8]])
B.shape  # (2, 2)
np.dot(A, B)  # array([[19, 22], [43, 50]]); dot() computes the matrix (dot) product

It should be noted that in the product of multidimensional arrays, the number of elements in the corresponding dimensions of the two matrices must match: the number of columns of the first matrix (dimension 1) must equal the number of rows of the second matrix (dimension 0), as shown in the following figure:

[Figure: for the product of a (3, 2) matrix and a (2, 4) matrix, the inner dimensions (2 and 2) must match, and the result has shape (3, 4)]
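If the dimensions do not match, NumPy raises an error. For example, multiplying a matrix of shape (2, 3) by one of shape (2, 2) fails because 3 ≠ 2:

A = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
C = np.array([[1, 2], [3, 4]])        # shape (2, 2)
np.dot(A, C)  # raises ValueError: dimension 1 of A (3) does not match dimension 0 of C (2)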

Below we use NumPy matrices to implement a neural network. Here we take the simple network in the figure below as the example; it omits biases and activation functions and has only weights. A minimal implementation is sketched after the figure.

[Figure: a simple neural network with 2 input neurons and 3 output neurons, weights only]
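A minimal sketch of this weight-only network, assuming 2 inputs, 3 outputs, and arbitrary example weights (the values are illustrative, not taken from the figure):

X = np.array([1.0, 0.5])                          # input signals x1, x2
W = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])  # weights, shape (2, 3)
Y = np.dot(X, W)                                  # output signals y1, y2, y3
print(Y)  # approximately [0.2 0.5 0.8]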

3.4 Realization of three-layer neural network

Before introducing the processing in the neural network, let's first introduce symbols such as $w_{12}^{(1)}$ and $a_1^{(1)}$. Look at the figure below, which highlights only the weight from the input-layer neuron $x_2$ to the neuron $a_1^{(1)}$ of the next layer. The superscript $(1)$ on a weight or a hidden-layer neuron indicates its layer number (i.e., a weight of layer 1, a neuron of layer 1). The two numbers in the subscript of a weight are the index of the neuron in the next layer followed by the index of the neuron in the previous layer. For example, $w_{12}^{(1)}$ denotes the weight from the 2nd neuron $x_2$ of the previous layer to the 1st neuron $a_1^{(1)}$ of the next layer.

[Figure: the weight $w_{12}^{(1)}$ from the input neuron $x_2$ to the first neuron $a_1^{(1)}$ of layer 1]

Now look at the signal transmission from the input layer to the 1st neuron of layer 1, shown in the figure below:

[Figure: signal transmission from the input layer (including the bias) to the first neuron of layer 1]

Expressed as a mathematical formula, $a_1^{(1)}$ is calculated as the sum of the weighted signals and the bias: $a_1^{(1)} = w_{11}^{(1)} x_1 + w_{12}^{(1)} x_2 + b_1^{(1)}$

Using matrix multiplication, the weighted sums of layer 1 can be written compactly as follows:

$$A^{(1)} = X W^{(1)} + B^{(1)}$$

where $A^{(1)} = (a_1^{(1)}\ a_2^{(1)}\ a_3^{(1)})$, $X = (x_1\ x_2)$, $B^{(1)} = (b_1^{(1)}\ b_2^{(1)}\ b_3^{(1)})$, and

$$W^{(1)} = \begin{pmatrix} w_{11}^{(1)} & w_{21}^{(1)} & w_{31}^{(1)} \\ w_{12}^{(1)} & w_{22}^{(1)} & w_{32}^{(1)} \end{pmatrix}$$

Next, we use NumPy multidimensional arrays to implement the above formula, where the input signal, weight, and bias are set to arbitrary values:

X = np.array([1.0, 0.5])
W1 = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
B1 = np.array([0.1, 0.2, 0.3])
print(W1.shape)  # (2, 3)
print(X.shape)  # (2,)
print(B1.shape) # (3,)
A1 = np.dot(X, W1) + B1

Next, we look at the computation of the activation function in layer 1. Represented as a graph, the process looks like the following figure:

[Figure: within layer 1, the weighted sums $a$ are converted by the activation function $h()$ into the signals $z$]

The weighted sum of the hidden layer (the sum of the weighted signals and the bias) is denoted by $a$, and the signal after conversion by the activation function is denoted by $z$. In the figure, $h()$ represents the activation function; here we use the sigmoid function. Implemented in Python, the code is as follows:

Z1 = sigmoid(A1)
print(A1)  # [0.3, 0.7, 1.1]
print(Z1)  # [0.57444252, 0.66818777, 0.75026011]

Next, let's implement the signal transmission from layer 1 to layer 2:

W2 = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
B2 = np.array([0.1, 0.2])
print(Z1.shape) # (3,)
print(W2.shape) # (3, 2)
print(B2.shape) # (2,)
A2 = np.dot(Z1, W2) + B2
Z2 = sigmoid(A2)

[Figure: signal transmission from layer 1 to layer 2]

Last is the signal transmission from layer 2 to the output layer. The implementation of the output layer is basically the same as the previous ones; the only difference is that its activation function differs from that of the hidden layers. Here we define identity_function() (the "identity function") and use it as the activation function of the output layer. The identity function outputs its input as-is, so strictly speaking there is no need to define identity_function(); we do so only to keep the implementation consistent with the previous steps.

def identity_function(x):
	return x
W3 = np.array([[0.1, 0.3], [0.2, 0.4]])
B3 = np.array([0.1, 0.2])
A3 = np.dot(Z2, W3) + B3
Y = identity_function(A3)  # or simply Y = A3

[Figure: signal transmission from layer 2 to the output layer]

So far, we have walked through the implementation of a 3-layer neural network. Now let's collect all the previous code. Following the usual convention of neural network implementations, only the weights are written with capital letters such as W1; everything else (biases, intermediate results, etc.) is written in lowercase.

def init_network():
	network = {}
	network['W1'] = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
	network['b1'] = np.array([0.1, 0.2, 0.3])
	network['W2'] = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
	network['b2'] = np.array([0.1, 0.2])
	network['W3'] = np.array([[0.1, 0.3], [0.2, 0.4]])
	network['b3'] = np.array([0.1, 0.2])
	return network

def forward(network, x):
	W1, W2, W3 = network['W1'], network['W2'], network['W3']
	b1, b2, b3 = network['b1'], network['b2'], network['b3']
	a1 = np.dot(x, W1) + b1
	z1 = sigmoid(a1)
	a2 = np.dot(z1, W2) + b2
	z2 = sigmoid(a2)
	a3 = np.dot(z2, W3) + b3
	y = identity_function(a3)
	return y

network = init_network()
x = np.array([1.0, 0.5])
y = forward(network, x)
print(y)  # [ 0.31682708 0.69627909]

3.5 Design of output layer

Neural networks can be used for both classification and regression problems, but the activation function of the output layer has to be chosen accordingly. In general, the identity function is used for regression problems and the softmax function is used for classification problems.

The identity function outputs its input as-is, passing the input information along without any modification. The softmax function used in classification problems can be expressed by the following formula:

$$y_k = \frac{\exp(a_k)}{\sum_{i=1}^{n} \exp(a_i)}$$

Implemented in Python:

def softmax(a):
	exp_a = np.exp(a)
	sum_exp_a = np.sum(exp_a)
	y = exp_a / sum_exp_a
	return y

Looking at this code, there is an overflow problem: the exponential function can easily produce extremely large values, and dividing such huge values by each other yields "indeterminate" results (nan). So we improve the formula as follows:

$$y_k = \frac{\exp(a_k)}{\sum_{i=1}^{n} \exp(a_i)} = \frac{C \exp(a_k)}{C \sum_{i=1}^{n} \exp(a_i)} = \frac{\exp(a_k + \log C)}{\sum_{i=1}^{n} \exp(a_i + \log C)} = \frac{\exp(a_k + C')}{\sum_{i=1}^{n} \exp(a_i + C')}$$

This shows that adding (or subtracting) some constant to all inputs of the exponential function in softmax does not change the result. $C'$ can be any value, but to prevent overflow, the maximum value of the input signal is generally subtracted. For example:

a = np.array([1010, 1000, 990])
np.exp(a) / np.sum(np.exp(a))  # the softmax computation
# returns array([nan, nan, nan]); not computed correctly

c = np.max(a)  # 1010
a - c  # array([0, -10, -20])
np.exp(a - c) / np.sum(np.exp(a - c))
# returns array([9.99954600e-01, 4.53978686e-05, 2.06106005e-09])

To sum up, we can implement the softmax function as follows:

def softmax(a):
	c = np.max(a)
	exp_a = np.exp(a - c)  # countermeasure against overflow
	sum_exp_a = np.sum(exp_a)
	y = exp_a / sum_exp_a
	return y

The output of the softmax function is a real number between $0.0$ and $1.0$, and the outputs sum to $1$. This sum-to-$1$ property is an important property of softmax: because of it, the output of the softmax function can be interpreted as a "probability".
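A quick check of this property using the softmax defined above:

a = np.array([0.3, 2.9, 4.0])
y = softmax(a)
print(y)          # [0.01821127 0.24519181 0.73659691]
print(np.sum(y))  # 1.0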

The number of output-layer neurons needs to be chosen according to the problem to be solved. For classification problems, it is generally set to the number of classes. For example, for the problem of predicting which of the digits $0$ to $9$ an input image shows (a 10-class classification problem), the output layer is given $10$ neurons (in the figure below, $y_2$ has the largest output value).

[Figure: an output layer with 10 neurons corresponding to the digits 0–9; the neuron $y_2$ has the largest output value]

3.6 Handwritten digit recognition

Assuming that the learning (training) of the neural network has already been completed, we now use the learned parameters to implement the neural network's inference processing. This inference process is also called the forward propagation of the neural network.

The dataset used here is the MNIST dataset of images of handwritten digits. MNIST is one of the most famous datasets in machine learning, used in everything from simple experiments to published research papers.

The image data of MNIST are $28 \times 28$ pixel grayscale images ($1$ channel), with each pixel taking a value between $0$ and $255$. Each image is labeled with the corresponding digit, such as "7", "2", or "1".

Assume a convenient Python script mnist.py has been provided that handles everything from downloading the MNIST dataset to converting the data into NumPy arrays. Using the load_mnist() function from mnist.py, the MNIST data can be read in easily as follows.

import sys, os
sys.path.append(r'D:\VS Code Project\Deep Learning')  # setting for importing files from the parent directory (raw string keeps the backslashes literal)
from dataset.mnist import load_mnist

# the first call takes several minutes ...
(x_train, t_train), (x_test, t_test) = load_mnist(flatten=True, normalize=False)

# print the shape of each piece of data
print(x_train.shape)  # (60000, 784)
print(t_train.shape)  # (60000,)
print(x_test.shape)  # (10000, 784)
print(t_test.shape)  # (10000,)

The load_mnist function returns the MNIST data in the form (training images, training labels), (test images, test labels). It also takes three arguments, as in load_mnist(normalize=True, flatten=True, one_hot_label=False). The first argument, normalize, sets whether to normalize the input images to values in $0.0 \sim 1.0$; if it is set to False, the pixels keep their original values in $0 \sim 255$. The second argument, flatten, sets whether to flatten each input image into a one-dimensional array; if it is set to False, each image is a $1 \times 28 \times 28$ three-dimensional array, and if it is set to True, each image is stored as a one-dimensional array of $784$ elements. The third argument, one_hot_label, sets whether to store the labels in one-hot representation, i.e., an array in which only the position of the correct label is $1$ and all other positions are $0$, such as [0,0,1,0,0,0,0,0,0,0]. When one_hot_label is False, the labels are simply stored as the digits themselves, such as 7 or 2; when it is True, the labels are stored as one-hot arrays.
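For example, assuming the same dataset.mnist script, switching on one_hot_label should turn each label into a length-10 one-hot row (a sketch; the shapes and values shown are what one would expect, not verified output):

(x_train, t_train), _ = load_mnist(flatten=True, normalize=False, one_hot_label=True)
print(t_train.shape)  # expected: (60000, 10), one one-hot row per training image
print(t_train[0])     # e.g. [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.] since the first label is 5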

Next use the PIL module to display the first image of the training images:

import sys, os
sys.path.append(r'D:\VS Code Project\Deep Learning')  # raw string keeps the backslashes literal
import numpy as np
from dataset.mnist import load_mnist
from PIL import Image

def img_show(img):
	pil_img = Image.fromarray(np.uint8(img))
	pil_img.show()

(x_train, t_train), (x_test, t_test) = load_mnist(flatten=True, normalize=False)
img = x_train[0]
label = t_train[0]
print(label)  # 5
print(img.shape)  # (784,)
img = img.reshape(28, 28)  # restore the image to its original shape
print(img.shape)  # (28, 28)
img_show(img)

The displayed results are shown in the figure below:

[Figure: the first training image, the handwritten digit 5]

The input layer of this neural network has $784$ neurons and the output layer has $10$ neurons. The number $784$ comes from the image size $28 \times 28 = 784$, and $10$ comes from the 10-class classification task (the digits $0 \sim 9$). Furthermore, this network has two hidden layers: the first hidden layer has $50$ neurons and the second has $100$. The values $50$ and $100$ can be set to any number.

Let us first define three functions:

import pickle

def get_data():
	(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, flatten=True, one_hot_label=False)
	return x_test, t_test

def init_network():
	with open("sample_weight.pkl", 'rb') as f:
		network = pickle.load(f)
	return network

def predict(network, x):
	W1, W2, W3 = network['W1'], network['W2'], network['W3']
	b1, b2, b3 = network['b1'], network['b2'], network['b3']
	a1 = np.dot(x, W1) + b1
	z1 = sigmoid(a1)
	a2 = np.dot(z1, W2) + b2
	z2 = sigmoid(a2)
	a3 = np.dot(z2, W3) + b3
	y = softmax(a3)
	return y

init_network() reads in the learned weight parameters saved in the pickle file sample_weight.pkl. This file stores the weight and bias parameters as a dictionary. The other two functions are basically the same as the code introduced earlier, so they need no further explanation. Now we use these three functions to implement the inference processing of the neural network and evaluate its recognition accuracy:

x, t = get_data()
network = init_network()
accuracy_cnt = 0
for i in range(len(x)):  # take out the images stored in x one by one
	y = predict(network, x[i])
	p = np.argmax(y)  # get the index of the element with the highest probability
	if p == t[i]:
		accuracy_cnt += 1
print("Accuracy:" + str(float(accuracy_cnt) / len(x)))
# Accuracy:0.9352

Next, we use the Python interpreter to check the shapes of the weights in each layer of the neural network above:

x, _ = get_data()
network = init_network()
W1, W2, W3 = network['W1'], network['W2'], network['W3']
x.shape  # (10000, 784)
x[0].shape  # (784,)
W1.shape  # (784, 50)
W2.shape  # (50, 100)
W3.shape  # (100, 10)

Confirm that the matrix shapes line up:

[Figure: the shapes of the arrays in the forward pass: X (784,), W1 (784, 50), W2 (50, 100), W3 (100, 10), Y (10,)]

Now let's consider packing multiple input images together. For example, suppose we want the predict() function to process $100$ images at once. To do this, we can change the shape of $x$ to $100 \times 784$, packing the $100$ images together as a single input. Such bundled input data is called a batch. "Batch" means "bundle": the images are bundled together like a stack of banknotes, as shown in the following figure:

[Figure: the array shapes in batch processing: X (100, 784), W1 (784, 50), W2 (50, 100), W3 (100, 10), Y (100, 10)]

Let's implement the batch-based code below:

x, t = get_data()
network = init_network()
batch_size = 100  # batch size
accuracy_cnt = 0
for i in range(0, len(x), batch_size):
	x_batch = x[i:i + batch_size]
	y_batch = predict(network, x_batch)
	p = np.argmax(y_batch, axis=1)
	accuracy_cnt += np.sum(p == t[i:i + batch_size])
print("Accuracy:" + str(float(accuracy_cnt) / len(x)))

Next section: [Study Notes] Introduction to Deep Learning: Theory and Implementation Based on Python - Learning of Neural Networks .

Origin blog.csdn.net/m0_51755720/article/details/128129048