Deep Learning|BP Neural Network

1. Artificial Neural Network

1.1 Artificial Neural Network Theory

An Artificial Neural Network (ANN) is a complex network system in which a large number of interconnected neurons simulate the way the human brain processes information, performing parallel information processing and nonlinear transformation. Biologically, a nerve cell in the human brain consists mainly of dendrites, a cell body, and an axon, which receive, process, and transmit information respectively, as shown in the following figure:

A biological neuron can be viewed as a node in a directed graph, with its inputs and outputs defined on either side. A neuron has information inputs, an output channel, a current state, and a threshold. In the figure, $x_i$ denotes the input signals, $w_i$ the input weights, $\theta$ the threshold, and $y$ the output signal. The internal state $u$ of the neuron can then be expressed as:

$$u = \sum_i w_i x_i - \theta$$

In general, the output function of a neuron is denoted by $f$, so that $y = f(u)$. The nonlinear characteristics of the network are usually provided by piecewise linear functions, step functions, or sigmoid functions, as shown in the figure:
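As an illustrative sketch (not part of the original), the following Python snippet implements this neuron model with a step activation; the weights and threshold are assumed values chosen so that the neuron computes logical AND:

import numpy as np

def step(u):
    # Hard-threshold (step) activation: output 1 when u >= 0, else 0
    return np.where(u >= 0, 1, 0)

def neuron(x, w, theta):
    # Internal state u = sum_i w_i * x_i - theta, output y = f(u)
    u = np.dot(w, x) - theta
    return step(u)

w, theta = np.array([0.5, 0.5]), 0.7  # assumed weights and threshold
print(neuron(np.array([1, 1]), w, theta))  # 1: both inputs active
print(neuron(np.array([1, 0]), w, theta))  # 0: only one input active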

A neural network is generally regarded as a highly complex nonlinear dynamic system. It has wide applications in fields such as pattern recognition, function approximation, and loan risk assessment, and can solve problems that are difficult for traditional methods. According to how the neurons are connected, neural networks can be classified as follows:

(1) Feedforward network

In a feedforward network the neurons are arranged in layers: an input layer, hidden layers, and an output layer. The neurons in each layer receive input only from the neurons of the previous layer, and later layers feed no signal back to earlier ones. An input pattern propagates through the layers in sequence and finally produces an output at the output layer. Both the perceptron network and the BP neural network are feedforward networks; the structure of this type of network is as follows:

(2) Feedforward network with feedback

Here the output layer feeds information back to the input layer. Such a network can be used to store pattern sequences; the neocognitron and the recurrent BP network, for example, belong to this type. The structure of this type of network is as follows:

(3) Feedforward network with intra-layer connections

By interconnecting neurons within the same layer, lateral inhibition or excitation between neurons in that layer can be realized. This limits how many neurons in each layer can be active at the same time, or divides the neurons of each layer into several groups that each operate as a whole. The structure of this type of network is as follows:

(4) Interconnected network

This network is also known as a fully interconnected network: any two neurons in it may be connected. Hopfield networks and Boltzmann machines are both of this type. In a feedforward network without feedback, once a signal has passed through a neuron, that neuron's processing is finished. In an interconnected network, however, signals are passed repeatedly between neurons, and the network is in a continuously changing dynamic state. A signal starts from some initial state and, after several transitions, reaches an equilibrium state; depending on the structure of the network and the characteristics of its neurons, the network may also enter periodic oscillation, chaos, or other states.

1.2 Learning method of artificial neural network

The learning of a neural network, also called training, is the process of adjusting the network's parameters (weights and thresholds) in response to stimulation from its environment, so that the network responds to that environment in a new way. By how the learning process is organized, learning methods are divided into supervised and unsupervised learning; the BP algorithm is the most typical supervised learning algorithm.

Different neural network structures and models use different learning rules during training; these rules adjust the connection weights between neurons and thereby realize the learning of the network. The common learning rules are as follows:

1.2.1 Hebb rule

Based on the conditioned-reflex mechanism in physiology, Donald Hebb proposed rules for the change of connection strength between neurons in 1949. The main idea of the Hebb rule is that if two neurons are excited at the same time, i.e., activated simultaneously, the synaptic connection between them is strengthened; otherwise it is weakened. It is commonly used in auto-associative networks such as the Hopfield network.
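A minimal sketch of the Hebb update in Python (the learning rate and activity values are assumed, for illustration only):

import numpy as np

eta = 0.1                      # learning rate (assumed)
x = np.array([1.0, 0.0, 1.0])  # pre-synaptic activities
y = 1.0                        # post-synaptic activity
w = np.zeros(3)

# Hebb rule: a weight grows when its pre- and post-synaptic neurons fire together
w += eta * y * x
print(w)  # [0.1 0.  0.1] -- only the co-active connections are strengthened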

1.2.2 Delta rule

The Delta rule changes the weight coefficients according to external feedback at the output nodes. It is equivalent in method to gradient descent, optimizing step by step in the direction of greatest local improvement in order to approach an optimal value. Perceptron learning uses this error-correction mechanism, as does the BP algorithm. The simulated annealing algorithm used in statistical methods also belongs to this class of learning rule.
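A minimal sketch of the Delta (error-correction) update for a single linear unit, with assumed numbers:

import numpy as np

eta = 0.5                  # learning rate (assumed)
x = np.array([1.0, 0.0])   # input
w = np.array([0.2, -0.1])  # current weights
t = 1.0                    # target output

y = np.dot(w, x)           # actual output
# Delta rule: adjust the weights in proportion to the output error, as in gradient descent
w += eta * (t - y) * x
print(w)  # [0.6 -0.1] -- the weight on the active input moves toward the target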

1.2.3 Similarity learning rule

The similarity learning rule adjusts the weights according to the outputs of the neurons: if the outputs of two neurons are similar, the weight connecting them is increased; otherwise it is decreased. This rule is mostly used in competitive neural networks; self-organizing competitive networks such as ART and SOFM adopt it.
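A minimal winner-take-all sketch in the spirit of competitive networks such as SOFM (all numbers assumed, for illustration only):

import numpy as np

eta = 0.2
x = np.array([0.9, 0.1])                 # input pattern
W = np.array([[0.2, 0.8],                # one weight vector per competing neuron
              [0.7, 0.3]])

# The neuron whose weight vector is closest to the input wins the competition
winner = np.argmin(np.linalg.norm(W - x, axis=1))
# Move the winner's weights toward the input, making similar units more similar
W[winner] += eta * (x - W[winner])
print(winner, W)  # winner is neuron 1; its weights move to [0.74, 0.26]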

2. BP Neural Network

2.1 Principles of the BP neural network algorithm

BP stands for Back-Propagation Network. It is a feedforward multilayer network trained with the error back-propagation algorithm. Its structure consists of an input layer, hidden layers, and an output layer; the structure is simple and highly adaptable. The nodes of the input layer act only as buffers, passing the network's input data to the first hidden layer, and therefore have no transfer function. Structurally, the BP neural network is a feedforward network, as follows:

Adjacent layers of a BP network are fully connected, while neurons within the same layer are not connected to each other. When a training sample pair is presented to the network, the neuron activations propagate from the input layer through the hidden layers to the output layer. The connection weights are then corrected step by step from the output layer back through the intermediate layers to the input layer, in the direction that reduces the error between the target output and the actual output. Unlike the perceptron, a BP network's transfer function must be differentiable, because error back-propagation requires computing its derivative; the hard-threshold transfer function of the perceptron network therefore cannot be used, and in practice the differentiable sigmoid function described above is the usual choice.

The BP algorithm uses a generalized δ rule and can learn effectively in multilayer networks; the key question is how the error δ of a hidden-layer node is defined and computed. Consider an arbitrary network with n nodes in which the output of node i is $O_i$ and the transfer function is $f$, and suppose there are N training samples. For sample k, let the desired output be $y_k$, the actual network output be $\hat{y}_k$, and the output of node i be $O_{ik}$; the input of node j is then:

$$\mathrm{net}_{jk} = \sum_i W_{ij} O_{ik}, \qquad O_{jk} = f(\mathrm{net}_{jk})$$

Using the squared error function:

$$E = \frac{1}{2}\sum_{k=1}^{N}\left(y_k - \hat{y}_k\right)^2$$

where $\hat{y}_k$ is the actual output of the network, define:

$$E_k = \frac{1}{2}\left(y_k - \hat{y}_k\right)^2, \qquad \delta_{jk} = \frac{\partial E_k}{\partial\,\mathrm{net}_{jk}}$$

Since $\mathrm{net}_{jk} = \sum_i W_{ij} O_{ik}$, it follows that:

$$\frac{\partial E_k}{\partial W_{ij}} = \frac{\partial E_k}{\partial\,\mathrm{net}_{jk}}\cdot\frac{\partial\,\mathrm{net}_{jk}}{\partial W_{ij}} = \delta_{jk}\,O_{ik}$$

When j is an output node, $O_{jk} = \hat{y}_k$, so:

$$\delta_{jk} = \frac{\partial E_k}{\partial \hat{y}_k}\cdot\frac{\partial \hat{y}_k}{\partial\,\mathrm{net}_{jk}} = -\left(y_k - \hat{y}_k\right) f'(\mathrm{net}_{jk})$$

When j is not an output node, its error is accumulated from the nodes m of the next layer that it feeds into:

$$\delta_{jk} = f'(\mathrm{net}_{jk}) \sum_m \delta_{mk} W_{jm}$$

If the network has M layers in total, where the M-th layer contains only output nodes and the first layer contains only input nodes, the BP algorithm can be described as follows:

  1. Choose initial weights W.

  2. Repeat the following process until convergence:

(1) For k = 1 to N, compute $O_{ik}$, $\mathrm{net}_{jk}$, and $\hat{y}_k$ (the forward pass), then compute $\delta_{jk}$ for each layer from M down to 2 (the backward pass).

(2) Correct the weights:

$$W_{ij} \leftarrow W_{ij} - \eta \sum_{k=1}^{N} \delta_{jk}\,O_{ik}$$

Here η is called the learning step size (learning rate); it scales the gradient-descent step and determines both the stability of learning and the speed of training. The algorithm as written runs in batch mode, adjusting the network weights with the global error over all samples; Section 3 implements this procedure in Python.

2.2 Advantages and disadvantages of the BP neural network

2.2.1 Advantages of the BP neural network

  1. The network is composed of many small processing units; each unit's function is simple, but the integrated whole is powerful and processes information quickly;

  2. It has strong fault tolerance: damage to a few local neurons does not greatly affect the network as a whole;

  3. The information the network memorizes is stored in the connection weights between neurons, a distributed form of storage;

  4. Its learning capability is strong: both the connection weights and the connection structure can be obtained through learning.

2.2.2 Disadvantages of the BP neural network

  1. Learning and memory are unstable: whenever new samples are added, the network must be retrained;

  2. The number of hidden layers and the number of neurons in each layer must be determined from experience or by repeated trial;

  3. Convergence is slow, and training takes a long time on complex problems; this can be improved with a variable or adaptive learning rate;

  4. The weights can converge to certain values, but these are not guaranteed to be the global minimum of the error surface, because gradient descent may settle into a local minimum; the momentum method can be used to alleviate this, as sketched below.
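As a brief sketch of the momentum idea mentioned in item 4 (the error function and coefficients here are hypothetical), the update blends the current gradient step with the previous weight change, which helps the iteration roll through shallow local minima:

import numpy as np

eta, alpha = 0.1, 0.9          # learning rate and momentum coefficient (assumed)
w = np.array([0.5, -0.3])
velocity = np.zeros_like(w)

def grad(w):
    # Hypothetical error gradient, using E(w) = ||w||^2 / 2 for illustration
    return w

for _ in range(5):
    # Momentum update: the previous step is carried forward, scaled by alpha
    velocity = alpha * velocity - eta * grad(w)
    w = w + velocity
print(w)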

3. Implementing a BP Neural Network in Python

The two case studies below show how to implement a BP neural network in Python.

3.1 The XOR operation

Among the logical operations, besides AND and OR there is XOR, the "exclusive or" operation. It is defined to return false when the two values are the same and true otherwise; in other words, XOR can be used to test whether two values differ. For example, given the list a, we want to apply XOR to the two elements of each pair:

a = [[0, 0], [0, 1], [1, 1], [1, 0]]

Python provides the XOR operator "^", which gives the result directly:

b = []
for item in a:
    b.append(item[0]^item[1])
print(b)

## Output:
[0, 1, 0, 1]

Without resorting to the "^" operator, we would like to build an artificial neural network with two hidden layers that learns this operation and thereby performs XOR. Following the logic of constructing a BP network, the main steps are:

(1) Import packages:

import numpy as np

To illustrate the principles of construction, only the numpy package is used here.

(2) Initialize the network weights:

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        self.weights1 = np.random.randn(input_size, hidden_size)   ## weights between the input layer and hidden layer 1
        self.weights2 = np.random.randn(hidden_size, hidden_size)  ## weights between hidden layer 1 and hidden layer 2
        self.weights3 = np.random.randn(hidden_size, output_size)  ## weights between hidden layer 2 and the output layer

(3) Define the activation function and its derivative

def sigmoid(self, x):
    # Sigmoid activation function
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(self, x):
    # Derivative of the sigmoid, expressed in terms of the sigmoid output x = f(u)
    return x * (1 - x)

The sigmoid function is normally used; a linear function can also be chosen as needed (a step function, however, is not differentiable and cannot be trained by back-propagation).
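Note that sigmoid_derivative above takes the stored sigmoid output rather than the raw input, exploiting the identity f'(u) = f(u)(1 − f(u)). A quick numerical check (illustrative):

import numpy as np

def sigmoid(u):
    return 1 / (1 + np.exp(-u))

u = 0.5
a = sigmoid(u)                    # activation saved during the forward pass
analytic = a * (1 - a)            # derivative expressed via the activation
numeric = (sigmoid(u + 1e-6) - sigmoid(u - 1e-6)) / 2e-6  # finite-difference check
print(analytic, numeric)          # both approximately 0.235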

(4) Define forward propagation and back-propagation

def feedforward(self, X):
    # Forward pass: compute the network output layer by layer
    self.layer1 = self.sigmoid(np.dot(X, self.weights1))
    self.layer2 = self.sigmoid(np.dot(self.layer1, self.weights2))
    self.output = self.sigmoid(np.dot(self.layer2, self.weights3))
    return self.output

def backpropagation(self, X, y, learning_rate):
    # Output-layer error and delta
    output_error = y - self.output
    output_delta = output_error * self.sigmoid_derivative(self.output)

    # Hidden-layer-2 error and delta
    layer2_error = output_delta.dot(self.weights3.T)
    layer2_delta = layer2_error * self.sigmoid_derivative(self.layer2)

    # Hidden-layer-1 error and delta
    layer1_error = layer2_delta.dot(self.weights2.T)
    layer1_delta = layer1_error * self.sigmoid_derivative(self.layer1)

    # Update the weights
    self.weights1 += X.T.dot(layer1_delta) * learning_rate
    self.weights2 += self.layer1.T.dot(layer2_delta) * learning_rate
    self.weights3 += self.layer2.T.dot(output_delta) * learning_rate

(5) Training and prediction

def train(self, X, y, epochs, learning_rate):
    for i in range(epochs):
        self.feedforward(X)
        self.backpropagation(X, y, learning_rate)

def predict(self, X):
    return self.feedforward(X)

(6) Complete code

import numpy as np

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize the network weights
        self.weights1 = np.random.randn(input_size, hidden_size)
        self.weights2 = np.random.randn(hidden_size, hidden_size)
        self.weights3 = np.random.randn(hidden_size, output_size)

    def sigmoid(self, x):
        # Sigmoid activation function
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        # Derivative of the sigmoid, expressed in terms of the sigmoid output
        return x * (1 - x)

    def feedforward(self, X):
        # Forward pass: compute the network output layer by layer
        self.layer1 = self.sigmoid(np.dot(X, self.weights1))
        self.layer2 = self.sigmoid(np.dot(self.layer1, self.weights2))
        self.output = self.sigmoid(np.dot(self.layer2, self.weights3))
        return self.output

    def backpropagation(self, X, y, learning_rate):
        # Output-layer error and delta
        output_error = y - self.output
        output_delta = output_error * self.sigmoid_derivative(self.output)

        # Hidden-layer-2 error and delta
        layer2_error = output_delta.dot(self.weights3.T)
        layer2_delta = layer2_error * self.sigmoid_derivative(self.layer2)

        # Hidden-layer-1 error and delta
        layer1_error = layer2_delta.dot(self.weights2.T)
        layer1_delta = layer1_error * self.sigmoid_derivative(self.layer1)

        # Update the weights
        self.weights1 += X.T.dot(layer1_delta) * learning_rate
        self.weights2 += self.layer1.T.dot(layer2_delta) * learning_rate
        self.weights3 += self.layer2.T.dot(output_delta) * learning_rate

    def train(self, X, y, epochs, learning_rate):
        for i in range(epochs):
            self.feedforward(X)
            self.backpropagation(X, y, learning_rate)

    def predict(self, X):
        return self.feedforward(X)

# Create the neural network object
nn = NeuralNetwork(2, 5, 1)

# Train the neural network
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])
nn.train(X, y, 10000, 0.1)

# Use the network for prediction
print(nn.predict(np.array([[0, 0], [1, 1], [1, 1], [0, 1]])))

The output is:

[[0.02377718]
 [0.03460649]
 [0.03460649]
 [0.96909858]]

The predictions are close to [0, 0, 0, 1], which matches the true XOR values, showing that the model trained effectively and predicts successfully.
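To turn these activations into hard binary outputs, one can simply threshold at 0.5 (a small illustrative addition):

# Threshold the network activations at 0.5 to obtain binary XOR values
pred = nn.predict(np.array([[0, 0], [1, 1], [1, 1], [0, 1]]))
print((pred > 0.5).astype(int))  # [[0] [0] [0] [1]]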

3.2 An oil-price forecasting example

Suppose the market price of oil (yuan/liter) in some region over 20 consecutive periods is: 7.82, 7.89, 7.46, 8.23, 7.99, 7.97, 7.78, 7.66, 7.97, 8.01, 8.10, 8.15, 8.16, 8.02, 7.87, 7.89, 8.03, 8.04, 8.28, 8.34. We want to build a workable oil-price forecasting model. With 20 data points in total, we can take the first 10 as the input data and the last 10 as the output data to train a neural network. Assuming a single hidden layer, the program can be constructed as follows:

3.2.1 Using the numpy library

import numpy as np

# Define the sigmoid function and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Derivative expressed in terms of the sigmoid output
    return x * (1 - x)

# Define the neural network class
class NeuralNetwork:
    
    def __init__(self, input_size, hidden_size, output_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        
        # Initialize the weight matrices
        self.weights1 = np.random.randn(self.input_size, self.hidden_size)
        self.weights2 = np.random.randn(self.hidden_size, self.output_size)
        
    def forward(self, X):
        # Forward propagation
        self.hidden_layer = sigmoid(np.dot(X, self.weights1))
        self.output_layer = sigmoid(np.dot(self.hidden_layer, self.weights2))
        return self.output_layer
    
    def backward(self, X, y, output):
        # Back-propagation
        output_error = y - output
        output_delta = output_error * sigmoid_derivative(output)
        
        hidden_error = output_delta.dot(self.weights2.T)
        hidden_delta = hidden_error * sigmoid_derivative(self.hidden_layer)
        
        # Update the weight matrices (implicit learning rate of 1)
        self.weights2 += self.hidden_layer.T.dot(output_delta)
        self.weights1 += X.T.dot(hidden_delta)
        
    def train(self, X, y, epochs):
        for i in range(epochs):
            output = self.forward(X)
            self.backward(X, y, output)

# Define training and prediction helper functions
def train(X, y, input_size, hidden_size, output_size, epochs):
    nn = NeuralNetwork(input_size, hidden_size, output_size)
    nn.train(X, y, epochs)
    return nn

def predict(X, model):
    return model.forward(X)

# Input and output data
X = np.array([[7.82], [7.89], [7.46], [8.23], [7.99], [7.97], [7.78], [7.66], [7.97], [8.01]])
y = np.array([[8.10], [8.15], [8.16], [8.02], [7.87], [7.89], [8.03], [8.04], [8.28], [8.34]])

# Standardize the inputs and min-max normalize the outputs
X_norm = (X - X.mean()) / X.std()
y_norm = (y - y.min()) / (y.max() - y.min())

# Train the neural network model
model = train(X_norm, y_norm, input_size=1, hidden_size=100, output_size=1, epochs=10000)

Once the model is trained, current oil prices can be fed in to predict the price in future periods, as follows:

# Predict and de-normalize the results
X_new = np.array([[7.75], [7.33], [7.65], [8.34], [8.23]])
y_new_norm = predict((X_new - X.mean()) / X.std(), model)
y_new = y_new_norm * (y.max() - y.min()) + y.min()
print(y_new)

# Output:
[[8.25072217]
 [8.33999685]
 [8.31532709]
 [7.87717202]
 [8.00565704]]

The final results are the predicted oil prices for the next five periods: 8.25, 8.34, 8.32, 7.88, and 8.01.

3.2.2 Using the sklearn library

sklearn, in full scikit-learn, is Python's machine learning library. It is built on top of data-science packages such as numpy, scipy, and matplotlib, and covers nearly every stage of machine learning: sample datasets, data preprocessing, model validation, feature selection, classification, regression, clustering, dimensionality reduction, and more. For deep learning there are many choices, such as PyTorch and TensorFlow, but for traditional machine learning in Python sklearn is the library of first choice, with no real competitor.

import numpy as np
from sklearn.neural_network import MLPRegressor

# Input and output data
X = np.array([[7.82], [7.89], [7.46], [8.23], [7.99], [7.97], [7.78], [7.66], [7.97], [8.01]])
y = np.array([[8.10], [8.15], [8.16], [8.02], [7.87], [7.89], [8.03], [8.04], [8.28], [8.34]])

# Standardize the inputs and min-max normalize the outputs
X_norm = (X - X.mean()) / X.std()
y_norm = (y - y.min()) / (y.max() - y.min())

# Build the BP (multilayer perceptron) network model
model = MLPRegressor(hidden_layer_sizes=(100,), activation='relu', solver='adam', max_iter=1000, random_state=1)

# Train the model (ravel() flattens y to the 1-D shape sklearn expects)
model.fit(X_norm, y_norm.ravel())

Again, once the model is trained, feeding in the specified values gives the predictions:

# Predict the results
X_new = np.array([[7.75], [7.33], [7.65], [8.34], [8.23]])
y_new_norm = model.predict((X_new - X.mean()) / X.std())
y_new = y_new_norm * (y.max() - y.min()) + y.min()
print(y_new)

# Output:
[[8.25072217]
 [8.33999685]
 [8.31532709]
 [7.87717202]
 [8.00565704]]

The results obtained with the sklearn model here match those of the hand-written numpy version; sklearn is itself built on top of numpy, and for convenience its ready-made estimators can be used to simplify the code considerably.
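To compare fits quantitatively, one could compute the in-sample training error of the sklearn model; a small sketch (mean_squared_error comes from sklearn.metrics, and the de-normalization mirrors the code above):

from sklearn.metrics import mean_squared_error

# In-sample predictions, mapped back to the original price scale
y_fit = model.predict(X_norm) * (y.max() - y.min()) + y.min()
print(mean_squared_error(y.ravel(), y_fit))  # training MSE in (yuan/liter)^2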

3.2.3 Using the PyTorch library

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

# Build the neural network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(1, 10)
        self.fc2 = nn.Linear(10, 1)

    def forward(self, x):
        x = torch.sigmoid(self.fc1(x))
        x = self.fc2(x)
        return x

# Prepare the data
x_train = np.array([[7.82], [7.89], [7.46], [8.23], [7.99], [7.97], [7.78], [7.66], [7.97], [8.01]])
y_train = np.array([[8.10], [8.15], [8.16], [8.02], [7.87], [7.89], [8.03], [8.04], [8.28], [8.34]])

# Convert to tensors
x_train = torch.from_numpy(x_train).float()
y_train = torch.from_numpy(y_train).float()

# Initialize the network
net = Net()

# Define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.1)

# Train the network
for epoch in range(1000):
    optimizer.zero_grad()
    outputs = net(x_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()
    if (epoch+1) % 100 == 0:
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, 1000, loss.item()))

This network likewise uses one input node and one output node, with a hidden layer of 10 nodes in between. Training uses the mean squared error (MSE) as the loss function and stochastic gradient descent (SGD) as the optimizer. After 1000 iterations, the trained model can be used to predict new data, as follows:

# Predict and print the results
x_test = np.array([[7.75], [7.33], [7.65], [8.34], [8.23]])
x_test = torch.from_numpy(x_test).float()
predicted = net(x_test).detach().numpy()
print('Predicted values:', predicted)

# Output:
Epoch [100/1000], Loss: 0.0207
Epoch [200/1000], Loss: 0.0207
Epoch [300/1000], Loss: 0.0207
Epoch [400/1000], Loss: 0.0207
Epoch [500/1000], Loss: 0.0207
Epoch [600/1000], Loss: 0.0207
Epoch [700/1000], Loss: 0.0207
Epoch [800/1000], Loss: 0.0207
Epoch [900/1000], Loss: 0.0207
Epoch [1000/1000], Loss: 0.0207
Predicted values: 
[[8.086877]
 [8.081727]
 [8.085794]
 [8.091827]
 [8.091065]]

Because its underlying mechanics differ, the deep learning library PyTorch produces predictions that differ from those of the traditional machine learning approaches above. Note, too, that the loss in this run stops improving almost immediately and the predictions are nearly constant: the raw prices are fed in without the normalization used in the earlier examples, so the network underfits. In practice, which library to use can be decided according to the specific requirements of the task.
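A plausible remedy (a sketch, not part of the original) is to standardize the data before training, mirroring the numpy and sklearn examples; this continues from the script above, with Net, x_train, y_train, and x_test already defined:

# Standardize inputs and min-max normalize targets, as in the earlier examples
x_mean, x_std = x_train.mean(), x_train.std()
y_min, y_max = y_train.min(), y_train.max()

net2 = Net()
optimizer = optim.SGD(net2.parameters(), lr=0.1)
criterion = nn.MSELoss()
for epoch in range(1000):
    optimizer.zero_grad()
    loss = criterion(net2((x_train - x_mean) / x_std),
                     (y_train - y_min) / (y_max - y_min))
    loss.backward()
    optimizer.step()

# Predict on the standardized test inputs and map back to yuan/liter
pred = net2((x_test - x_mean) / x_std).detach() * (y_max - y_min) + y_min
print(pred.numpy())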
