Implementation of a Radial Basis Function Neural Network (RBFNN) (Python, with source code and dataset)

1. Theoretical basis

The Radial Basis Function Neural Network (RBFNN) is a three-layer feedforward network with a strong nonlinear mapping capability. Its principle is similar to that of the backpropagation neural network (BPNN), but it uses radial basis functions as the activation functions of the hidden layer. Data passed from the input layer to the hidden layer are mapped nonlinearly by the radial basis functions and then passed to the output layer, which combines them linearly to produce the output.
Backpropagation Neural Network (BPNN) Principle Reference:
Implementation of Backpropagation Neural Network (BPNN) (Python, with source code and dataset)

1. Radial basis neural network structure

The structural diagram of the radial basis neural network is shown in the figure below:
[Figure: structure of the radial basis neural network]

2. Forward propagation process

The forward propagation stage of the RBFNN resembles unsupervised learning. First, a clustering algorithm such as K-means groups the data, and the resulting cluster centers are used as the centers of the hidden-layer radial basis functions, which are usually Gaussian. The centers are then used to compute the width vector of the radial basis functions, as follows:
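$$\sigma = \frac{c_{\max}}{\sqrt{2h}}$$

where c_max is the maximum distance between the center points and h is the number of hidden-layer nodes. In the code below, c_max is computed separately for each node as the largest distance from its center to any other center, so each hidden node j gets its own width σ_j.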
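The clustering and width steps can be sketched in NumPy as follows. This is a minimal sketch, not the author's code: to keep it dependency-free, a short hand-written Lloyd's K-means (the hypothetical helper `kmeans`) stands in for `sklearn.cluster.KMeans`, and the data are synthetic random values rather than the soil dataset. The hypothetical `rbf_widths` follows the per-center variant of the width formula used in RBFNN.py.

```python
import numpy as np

def kmeans(x, h, n_iter=100, seed=0):
    """Minimal Lloyd's K-means, a stand-in for sklearn.cluster.KMeans.
    x: (n, d) samples -> (h, d) cluster centers."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=h, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each sample to its nearest center (squared Euclidean distance)
        labels = np.argmin(((x[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(h):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    return centers

def rbf_widths(centers):
    """sigma_j = c_max_j / sqrt(2h), where c_max_j is the largest distance
    from center j to any other center (as computed in RBFNN.py)."""
    h = len(centers)
    dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    return dists.max(axis=1) / np.sqrt(2 * h)

rng = np.random.default_rng(0)
x = rng.random((72, 4))  # synthetic stand-in for 72 normalized samples, 4 features
centers = kmeans(x, h=8)
widths = rbf_widths(centers)
print(centers.shape, widths.shape)  # (8, 4) (8,)
```

Using the per-center maximum gives each hidden node its own width; replacing `dists.max(axis=1)` with `dists.max()` would give the single global c_max from the formula.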
The input data then pass through the hidden layer and the output layer. The output of input sample x_i at the jth hidden node is computed as:

$$R_j(x_i) = \exp\left(-\frac{\lVert x_i - c_j \rVert^2}{2\sigma_j^2}\right)$$

where c_j and σ_j are the center and width of the jth hidden node, respectively. As an alternative to clustering, the centers and widths of the hidden layer can also simply be generated at random.
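A vectorized version of the Gaussian hidden-layer formula, assuming samples are stored row-wise in an (n, d) array, might look like this. The function name `hidden_layer` is illustrative; it replaces the double loop over samples and nodes used in RBFNN.py with NumPy broadcasting.

```python
import numpy as np

def hidden_layer(x, centers, widths):
    """Gaussian RBF layer: out[i, j] = exp(-||x_i - c_j||^2 / (2 * sigma_j^2)).
    x: (n, d), centers: (h, d), widths: (h,) -> (n, h)."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (n, h)
    return np.exp(-sq_dist / (2.0 * widths[None, :] ** 2))

x = np.array([[0.0, 0.0], [1.0, 0.0]])
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
widths = np.array([1.0, 0.5])
phi = hidden_layer(x, centers, widths)
print(phi)  # [[1, e^-4], [e^-0.5, e^-2]]
```

A sample sitting exactly on a center produces 1, and the response decays with distance at a rate set by that node's width.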
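The output of input sample x_i at the mth node of the output layer is computed as:

$$y_m(x_i) = \phi\left(\sum_{j=1}^{h} \omega_{jm} R_j(x_i) + b_m\right)$$

where R_j(x_i) is the output of the jth hidden node, ω_{jm} is the weight connecting hidden node j to output node m, b_m is the threshold (bias) of output node m, h is the number of hidden nodes, and φ is the activation function (tanh in the code below).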
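The output-layer formula is a single matrix product followed by the activation. The sketch below is illustrative (the function `output_layer` is hypothetical) and uses tanh by default, as RBFNN.py does.

```python
import numpy as np

def output_layer(phi, w, b, activation=np.tanh):
    """y[i, m] = activation(sum_j phi[i, j] * w[j, m] + b[m]).
    phi: (n, h) hidden outputs, w: (h, M) weights, b: (M,) thresholds -> (n, M)."""
    return activation(phi @ w + b)

phi = np.array([[1.0, 0.0], [0.5, 0.5]])  # example hidden-layer outputs
w = np.array([[0.2], [0.4]])
b = np.array([0.1])
y = output_layer(phi, w, b)
print(y)  # tanh([[0.3], [0.4]])
```

Passing `activation=lambda z: z` gives the purely linear output layer described in the theory section.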
Activation function principle reference:
activation function of neural network basics

3. Backpropagation process

The backpropagation stage of the RBFNN resembles supervised learning. It repeatedly corrects the centers and widths of the hidden layer and the weights and thresholds of the output layer. The gradient of the loss function with respect to each parameter is computed, and an optimization algorithm such as stochastic gradient descent (SGD) uses these gradients to update the parameters. For the output-layer weights, for example, the update rule is:

$$\omega \leftarrow \omega - \mu \frac{\partial E}{\partial \omega}$$

where E is the loss function and μ is the learning rate.
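For a tanh output layer with a sum-of-squared-errors loss, the chain rule gives the gradient for an output weight as -2 times the sum over samples of (target - output) × tanh'(z) × hidden output. The sketch below applies one such gradient-descent step to synthetic data; the functions `loss` and `output_weight_step` are hypothetical names for illustration.

```python
import numpy as np

def loss(phi, w, b, t):
    """E = sum of squared errors, the loss used in RBFNN.py."""
    return np.sum((t - np.tanh(phi @ w + b)) ** 2)

def output_weight_step(phi, w, b, t, mu):
    """One gradient-descent step w <- w - mu * dE/dw for a tanh output layer."""
    y = np.tanh(phi @ w + b)
    delta = (t - y) * (1 - y ** 2)   # (n, M): error times tanh'(z)
    grad_w = -2.0 * phi.T @ delta     # dE/dw, shape (h, M)
    return w - mu * grad_w

rng = np.random.default_rng(1)
phi = rng.random((10, 3))            # synthetic hidden-layer outputs
t = rng.random((10, 1))              # synthetic targets
w = rng.uniform(-0.5, 0.5, size=(3, 1))
b = np.zeros(1)
w_new = output_weight_step(phi, w, b, t, mu=0.01)
print(loss(phi, w, b, t), "->", loss(phi, w_new, b, t))
```

RBFNN.py drops the constant factor 2 and adds `learnrate * sum3` to the weights, which is the same step with μ = learnrate / 2.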
Loss function principle reference:
loss function of basic knowledge of machine learning
Backpropagation principle reference:
neural network backpropagation algorithm (gradient, error backpropagation algorithm BP)

4. Modeling steps

Taking forecasting using radial basis neural network as an example, the modeling steps of radial basis neural network forecasting model can be summarized as follows:

  1. Determine the number of nodes in the input, hidden, and output layers from the characteristics of the input data;
  2. Cluster the model's input data with the K-means algorithm, use the cluster centers as the centers of the hidden-layer radial basis functions, and compute the widths of the radial basis functions from these centers;
  3. Choose a parameter initialization method and randomly initialize the connection weights and thresholds of the output layer;
  4. Feed the data into the network through the input layer; in the hidden layer, the radial basis functions transform the data nonlinearly;
  5. Pass the hidden-layer output to the output layer, combine it linearly with the output-layer connection weights, and apply the activation function to obtain the forward-propagation output of the network;
  6. Choose a loss function and compute the loss between the network output and the target values;
  7. Compute the gradients of the loss with respect to the output-layer weights and thresholds, and update them with the chosen optimization algorithm (for example SGD);
  8. Propagate the error back to the hidden layer and update the centers and widths with the same algorithm;
  9. Obtain the network with updated parameters;
  10. Repeat steps 4 to 9 until the maximum number of iterations is reached or the stopping condition is met, then output the network defined by the final hidden-layer and output-layer parameters.
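Steps 1 to 10 can be condensed into one vectorized training function. This is a sketch under stated assumptions, not the author's implementation: the name `train_rbfnn` is hypothetical, the centers are initialized from randomly chosen training samples instead of K-means (a simplification of step 2, to keep the example dependency-free), and the data are synthetic. The gradient expressions match the ones RBFNN.py uses, with the factor 2 folded into the learning rate.

```python
import numpy as np

def train_rbfnn(x, t, h=8, mu=0.001, max_epochs=500, tol=0.65e-3, seed=0):
    """x: (n, d) normalized inputs, t: (n, 1) targets.
    Returns ((centers, widths, w, b), loss_history)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Step 2 (simplified): centers from random samples, widths c_max_j / sqrt(2h)
    c = x[rng.choice(n, size=h, replace=False)].copy()
    dist = np.linalg.norm(c[:, None] - c[None], axis=-1)
    s = dist.max(axis=1) / np.sqrt(2 * h)
    # Step 3: Xavier-uniform output weights w and threshold b
    scale = np.sqrt(6.0 / (h + 1))
    w = rng.uniform(-scale, scale, size=(h, 1))
    b = rng.uniform(-scale, scale, size=1)
    history = []
    for _ in range(max_epochs):
        # Steps 4-5: forward pass
        diff = x[:, None, :] - c[None, :, :]   # (n, h, d)
        sq = (diff ** 2).sum(-1)               # (n, h) squared distances
        phi = np.exp(-sq / (2 * s ** 2))       # (n, h) hidden outputs
        y = np.tanh(phi @ w + b)               # (n, 1) network output
        # Step 6: sum-of-squared-errors loss
        err = t - y
        loss = float(np.sum(err ** 2))
        history.append(loss)
        if loss < tol:                         # step 10: stopping condition
            break
        # Steps 7-8: error terms (each is -1/2 of the exact gradient)
        delta = err * (1 - y ** 2)             # (n, 1)
        g = delta * phi * w.T                  # (n, h): delta_i * phi_ij * w_j
        dc = (g[:, :, None] * diff).sum(0) / s[:, None] ** 2  # (h, d)
        ds = (g * sq).sum(0) / s ** 3                         # (h,)
        # Step 9: update all parameters (dc, ds used the pre-update w)
        w += mu * phi.T @ delta
        b += mu * delta.sum(0)
        c += mu * dc
        s += mu * ds
    return (c, s, w, b), history

rng = np.random.default_rng(42)
x = rng.random((72, 4))                               # synthetic normalized inputs
t = 0.5 + 0.3 * np.sin(x.sum(axis=1, keepdims=True))  # synthetic smooth target
params, history = train_rbfnn(x, t, mu=0.001, max_epochs=300)
print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

Vectorizing over samples and nodes removes the nested Python loops, which dominate the running time of the loop-based version.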

Parameter initialization method reference:
Parameter initialization of basic knowledge of neural network

2. Implementation of the Radial Basis Function Neural Network

Using data prediction as an example, this section shows how to implement a radial basis neural network in Python.
The experimental data are measurements of heavy-metal element contents in the surface soil of a city. The dataset contains 96 samples: 24 are randomly selected as the test set and the remaining 72 form the training set. The Ti content is the output feature to be predicted, and the contents of Co, Cr, Mg, and Pb are the input features of the model.
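A random 72/24 split like this can be reproduced with a permutation of the sample indices. The sketch below uses a synthetic (96, 5) array as a hypothetical stand-in for the soil dataset; how the author actually produced train.csv and test.csv is not stated.

```python
import numpy as np

rng = np.random.default_rng(2023)
data = rng.random((96, 5))        # hypothetical stand-in: columns Co, Cr, Mg, Pb, Ti
idx = rng.permutation(len(data))  # shuffled sample indices
test, train = data[idx[:24]], data[idx[24:]]
x_train, t_train = train[:, :4], train[:, 4:]  # inputs Co, Cr, Mg, Pb; target Ti
x_test, t_test = test[:, :4], test[:, 4:]
print(train.shape, test.shape)  # (72, 5) (24, 5)
```

Splitting by a single permutation guarantees that no sample appears in both sets.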

1. Training process (RBFNN.py)

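# Import libraries
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Activation function (tanh); np.tanh avoids the overflow of the explicit exp formula for large |x|
def tanh(x):
    return np.tanh(x)
# Derivative of tanh written in terms of its output y = tanh(x): dy/dx = 1 - y**2
def de_tanh(x):
    return (1-x**2)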


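# Parameter settings
samnum = 72  # number of training samples
hiddenunitnum = 8  # number of hidden-layer nodes
indim = 4  # number of input-layer nodes
outdim = 1  # number of output-layer nodes
maxepochs = 500  # maximum number of training epochs
errorfinal = 0.65*10**(-3)  # stopping criterion: stop once the loss falls below this value
learnrate = 0.001  # learning rate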

# Load the training data
df = pd.read_csv("train.csv")
df.columns = ["Co", "Cr", "Mg", "Pb", "Ti"]
Co = df["Co"]
Co = np.array(Co)
Cr = df["Cr"]
Cr = np.array(Cr)
Mg=df["Mg"]
Mg=np.array(Mg)
Pb = df["Pb"]
Pb =np.array(Pb)
Ti = df["Ti"]
Ti = np.array(Ti)
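samplein = np.asmatrix([Co,Cr,Mg,Pb])  # np.mat was removed in NumPy 2.0; np.asmatrix is equivalent
# Min-max normalization: scale the data to [0, 1] for easier computation; predictions are denormalized back to the original scale later
sampleinminmax = np.array([samplein.min(axis=1).T.tolist()[0],samplein.max(axis=1).T.tolist()[0]]).transpose()  # per-feature [min, max]
sampleout = np.asmatrix([Ti])
sampleoutminmax = np.array([sampleout.min(axis=1).T.tolist()[0],sampleout.max(axis=1).T.tolist()[0]]).transpose()  # [min, max] of the target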
sampleinnorm = ((np.array(samplein.T)-sampleinminmax.transpose()[0])/(sampleinminmax.transpose()[1]-sampleinminmax.transpose()[0])).transpose()
sampleoutnorm = ((np.array(sampleout.T)-sampleoutminmax.transpose()[0])/(sampleoutminmax.transpose()[1]-sampleoutminmax.transpose()[0])).transpose()

# Add small uniform noise (at most 0.03) to the normalized target data
noise = 0.03*np.random.rand(sampleoutnorm.shape[0],sampleoutnorm.shape[1])
sampleoutnorm += noise

# Cluster the inputs with K-means; the cluster centers become the hidden-layer RBF centers w1
x = sampleinnorm.transpose()
estimator=KMeans(n_clusters=hiddenunitnum,n_init=10,max_iter=10000)
estimator.fit(x)
w1 = estimator.cluster_centers_


# Compute the hidden-layer width vector b1: sigma_i = c_max_i / sqrt(2h)
b1 = np.asmatrix(np.zeros((hiddenunitnum,1)))
for i in range(hiddenunitnum):
    cmax = 0
    for j in range(hiddenunitnum):
        temp_dist=np.sqrt(np.sum(np.square(w1[i,:]-w1[j,:])))
        if cmax<temp_dist:
            cmax=temp_dist
    b1[i] = cmax/np.sqrt(2*hiddenunitnum)


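# Randomly initialize the output-layer weights w2 and threshold b2 (Xavier/Glorot uniform)
# w2 connects the hidden layer to the output layer, so the fan-in is hiddenunitnum, not indim
scale = np.sqrt(3/((hiddenunitnum+outdim)*0.5))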
w2 = np.random.uniform(low=-scale,high=scale,size=[hiddenunitnum,outdim])
b2 = np.random.uniform(low=-scale, high=scale, size=[outdim,1])

# Convert the inputs and parameters to matrices for convenient computation
inputin=np.asmatrix(sampleinnorm.T)
w1=np.asmatrix(w1)
b1=np.asmatrix(b1)
w2=np.asmatrix(w2)
b2=np.asmatrix(b2)

# errhistory stores the loss of each epoch
errhistory = np.asmatrix(np.zeros((1,maxepochs)))
# Start training
for i in range(maxepochs):
    # Forward propagation
    # hidden_out holds the hidden-layer outputs
    hidden_out = np.asmatrix(np.zeros((samnum, hiddenunitnum)))
    for a in range(samnum):
        for j in range(hiddenunitnum):
            d=(inputin[a, :] - w1[j, :]) * (inputin[a, :] - w1[j, :]).T  # squared distance to center j
            c=2 * b1[j, :] * b1[j, :]  # 2 * sigma_j^2
            hidden_out[a, j] = np.exp((-1.0 )* (d/c))
    # output holds the output-layer outputs
    output = tanh(hidden_out * w2 + b2)
    # Compute the error
    out_real = np.asmatrix(sampleoutnorm.transpose())
    err = out_real - output
    loss = np.sum(np.square(err))
    # Stop training once the loss falls below the threshold
    if loss < errorfinal:
        break
    errhistory[:,i] = loss
    # Backpropagation
    output=np.array(output.T)
    belta=de_tanh(output).transpose()  # tanh derivative for each sample, shape (samnum, 1)
    # Error terms for each parameter; each equals -1/2 times the gradient of the loss,
    # so adding learnrate * term performs gradient descent
    dw1now = np.zeros((hiddenunitnum,indim))
    db1now = np.zeros((hiddenunitnum,1))
    dw2now = np.zeros((hiddenunitnum,outdim))
    db2now = np.zeros((outdim,1))
    for j in range(hiddenunitnum):
        sum1 = 0.0
        sum2 = 0.0
        sum3 = 0.0
        for a in range(samnum):
            sum1 +=err[a,:] * belta[a,:] * hidden_out[a,j] * (inputin[a,:]-w1[j,:])
            sum2 +=err[a,:] * belta[a,:] * hidden_out[a,j] * (inputin[a,:]-w1[j,:])*(inputin[a,:]-w1[j,:]).T
            sum3 +=err[a,:] * belta[a,:] * hidden_out[a,j]
        dw1now[j,:]=(w2[j,:]/(b1[j,:]*b1[j,:])) * sum1
        db1now[j,:] =(w2[j,:]/(b1[j,:]*b1[j,:]*b1[j,:])) * sum2
        dw2now[j,:] =sum3
    # The threshold term does not depend on j, so compute it once: sum over samples of err * belta
    db2now = err.T * belta
    # Update the four parameters with their error terms
    w1 += learnrate * dw1now
    b1 += learnrate * db1now
    w2 += learnrate * dw2now
    b2 += learnrate * db2now
    print("the iteration is:",i+1,",the loss is:",loss)

print('Updated centers w1:',w1)
print('Updated widths b1:',b1)
print('Updated weights w2:',w2)
print('Updated threshold b2:',b2)
print("The loss after iteration is :",loss)

# Save the trained parameters for use in the test script
np.save("w1.npy",w1)
np.save("b1.npy",b1)
np.save("w2.npy",w2)
np.save("b2.npy",b2)
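Because RBFNN.py derives the gradients for w1, b1, w2, and b2 by hand, a finite-difference gradient check is a cheap way to confirm them. The sketch below (hypothetical helpers `forward`, `analytic_grads`, and `numeric_grad`, on a small random problem) compares analytic gradients of the same loss with central differences. The script's error terms `dw1now`, `db1now`, `dw2now`, and `db2now` correspond to -1/2 times these gradients, which is why it adds them to the parameters.

```python
import numpy as np

def forward(x, t, c, s, w, b):
    """E = sum (t - tanh(phi @ w + b))^2 with a Gaussian hidden layer, as in RBFNN.py."""
    sq = ((x[:, None, :] - c[None]) ** 2).sum(-1)
    phi = np.exp(-sq / (2 * s ** 2))
    y = np.tanh(phi @ w + b)
    return np.sum((t - y) ** 2), phi, sq, y

def analytic_grads(x, t, c, s, w, b):
    """Exact gradients of E with respect to centers c, widths s, weights w, threshold b."""
    _, phi, sq, y = forward(x, t, c, s, w, b)
    delta = (t - y) * (1 - y ** 2)          # error times tanh'
    g = delta * phi * w.T                   # (n, h)
    diff = x[:, None, :] - c[None]
    gc = -2 * (g[:, :, None] * diff).sum(0) / s[:, None] ** 2
    gs = -2 * (g * sq).sum(0) / s ** 3
    gw = -2 * phi.T @ delta
    gb = -2 * delta.sum(0)
    return gc, gs, gw, gb

def numeric_grad(f, p, eps=1e-6):
    """Central differences of scalar f() w.r.t. array p (perturbed in place, then restored)."""
    g = np.zeros_like(p)
    for i in np.ndindex(p.shape):
        old = p[i]
        p[i] = old + eps
        fp = f()
        p[i] = old - eps
        fm = f()
        p[i] = old
        g[i] = (fp - fm) / (2 * eps)
    return g

rng = np.random.default_rng(0)
x, t = rng.random((6, 4)), rng.random((6, 1))
c, s = rng.random((3, 4)), rng.uniform(0.5, 1.0, 3)
w, b = rng.uniform(-1, 1, (3, 1)), rng.uniform(-1, 1, 1)

def loss():
    return forward(x, t, c, s, w, b)[0]

gc, gs, gw, gb = analytic_grads(x, t, c, s, w, b)
for name, ga, p in [("c", gc, c), ("s", gs, s), ("w", gw, w), ("b", gb, b)]:
    print(name, np.max(np.abs(ga - numeric_grad(loss, p))))
```

Differences on the order of 1e-9 or smaller indicate that the hand-derived gradients are correct.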

2. Test process (test.py)

# Import libraries
import numpy as np
import pandas as pd

# Activation function tanh (np.tanh avoids overflow for large |x|)
def tanh(x):
    return np.tanh(x)

# Load the training data; its min/max values are used to normalize the test inputs and denormalize the predictions
df = pd.read_csv("train.csv")
df.columns = ["Co", "Cr", "Mg", "Pb", "Ti"]
Co = df["Co"]
Co = np.array(Co)
Cr = df["Cr"]
Cr = np.array(Cr)
Mg=df["Mg"]
Mg=np.array(Mg)
Pb = df["Pb"]
Pb =np.array(Pb)
Ti = df["Ti"]
Ti = np.array(Ti)
samplein = np.asmatrix([Co,Cr,Mg,Pb])  # np.mat was removed in NumPy 2.0; np.asmatrix is equivalent
sampleinminmax = np.array([samplein.min(axis=1).T.tolist()[0],samplein.max(axis=1).T.tolist()[0]]).transpose()  # per-feature [min, max]
sampleout = np.asmatrix([Ti])
sampleoutminmax = np.array([sampleout.min(axis=1).T.tolist()[0],sampleout.max(axis=1).T.tolist()[0]]).transpose()  # [min, max] of the target

# Load the trained parameters
w1=np.load('w1.npy')
w2=np.load('w2.npy')
b1=np.load('b1.npy')
b2=np.load('b2.npy')
w1 = np.asmatrix(w1)
w2 = np.asmatrix(w2)
b1 = np.asmatrix(b1)
b2 = np.asmatrix(b2)

# Load the test data
df = pd.read_csv("test.csv")
df.columns = ["Co", "Cr", "Mg", "Pb", "Ti"]
Co = df["Co"]
Co = np.array(Co)
Cr = df["Cr"]
Cr = np.array(Cr)
Mg=df["Mg"]
Mg=np.array(Mg)
Pb = df["Pb"]
Pb =np.array(Pb)
Ti = df["Ti"]
Ti = np.array(Ti)
input=np.asmatrix([Co,Cr,Mg,Pb])

# Number of test samples
testnum = len(df)
# Number of hidden-layer nodes (taken from the trained centers)
hiddenunitnum = w1.shape[0]

# Normalize the test inputs with the training min/max
inputnorm=(np.array(input.T)-sampleinminmax.transpose()[0])/(sampleinminmax.transpose()[1]-sampleinminmax.transpose()[0])
# hidden_out2 stores the hidden-layer outputs
hidden_out2 = np.asmatrix(np.zeros((testnum,hiddenunitnum)))
# Compute the hidden-layer outputs
for a in range(testnum):
    for j in range(hiddenunitnum):
        d = (inputnorm[a, :] - w1[j, :]) * (inputnorm[a, :] - w1[j, :]).T  # squared distance to center j
        c = 2 * b1[j, :] * b1[j, :].T  # 2 * sigma_j^2
        hidden_out2[a, j] = np.exp((-1.0) * (d / c))
# Compute the output-layer outputs
output = tanh(hidden_out2 * w2 + b2)
# Denormalize the predictions back to the original scale of Ti
diff = sampleoutminmax[0,1]-sampleoutminmax[0,0]
networkout2 = output*diff+sampleoutminmax[0,0]
networkout2 = np.array(networkout2).transpose()
output1=networkout2.flatten()  # flatten to a 1-D array
output1=output1.tolist()
for i in range(testnum):
    output1[i] = float('%.2f'%output1[i])
print("the prediction is:",output1)

# Compare the predictions with the true values and compute the error metrics
output=Ti
err = output1 - output
rmse = (np.sum(np.square(output-output1))/len(output)) ** 0.5
mae = np.sum(np.abs(output-output1))/len(output)
average_loss1=np.sum(np.abs((output-output1)/output))/len(output)
mape="%.2f%%"%(average_loss1*100)
f1 = 0
for m in range(testnum):
    f1 = f1 + np.abs(output[m]-output1[m])/((np.abs(output[m])+np.abs(output1[m]))/2)
f2 = f1 / testnum
smape="%.2f%%"%(f2*100)
print("the MAE is :",mae)
print("the RMSE is :",rmse)
print("the MAPE is :",mape)
print("the SMAPE is :",smape)

# Distribution of the relative error |true - predicted| / |true|
A=0
B=0
C=0
D=0
E=0
for m in range(testnum):
    y1 = np.abs(output[m]-output1[m])/np.abs(output[m])
    if y1 <= 0.1:
        A = A + 1
    elif y1 > 0.1 and y1 <= 0.2:
        B = B + 1
    elif y1 > 0.2 and y1 <= 0.3:
        C = C + 1
    elif y1 > 0.3 and y1 <= 0.4:
        D = D + 1
    else:
        E = E + 1
print("Ratio <= 0.1 :",A)
print("0.1< Ratio <= 0.2 :",B)
print("0.2< Ratio <= 0.3 :",C)
print("0.3< Ratio <= 0.4 :",D)
print("Ratio > 0.4 :",E)
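The error metrics and the relative-error histogram above can also be computed without explicit loops. This sketch (hypothetical function names `regression_metrics` and `ratio_histogram`) implements the same MAE, RMSE, MAPE, and SMAPE formulas and the same five bins; unlike test.py, it does not round the predictions to two decimals first, and it returns MAPE and SMAPE as fractions rather than formatted percentages.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE and SMAPE (the last two as fractions)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = np.mean(np.abs(err / y_true))
    smape = np.mean(np.abs(err) / ((np.abs(y_true) + np.abs(y_pred)) / 2))
    return mae, rmse, mape, smape

def ratio_histogram(y_true, y_pred, edges=(0.1, 0.2, 0.3, 0.4)):
    """Count samples whose relative error falls in
    [0, 0.1], (0.1, 0.2], (0.2, 0.3], (0.3, 0.4], (0.4, inf)."""
    y_true = np.asarray(y_true, dtype=float)
    ratio = np.abs(y_true - np.asarray(y_pred, dtype=float)) / np.abs(y_true)
    bins = np.searchsorted(edges, ratio, side="left")  # index of the bin for each ratio
    return np.bincount(bins, minlength=len(edges) + 1)

y_true = [100, 200, 50, 80]
y_pred = [105, 150, 50, 120]
mae, rmse, mape, smape = regression_metrics(y_true, y_pred)
print(f"MAE={mae:.2f} RMSE={rmse:.2f} MAPE={mape:.2%} SMAPE={smape:.2%}")
print(ratio_histogram(y_true, y_pred))  # [2 0 1 0 1]
```

With `side="left"`, a ratio exactly equal to an edge (for example 0.1) falls into the lower bin, matching the `<=` comparisons in test.py.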

3. Test results

[Figure: test results]
Note: Because the parameters are initialized randomly (and the K-means clustering and the added noise are also random), training and testing a network with the same parameter settings several times will not give exactly the same results. The results also depend on the number of hidden-layer nodes, the learning rate, the number of training epochs, and other parameters.
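One way to make repeated runs comparable is to fix the random seeds before training. RBFNN.py draws its noise and output-layer initialization from NumPy's global generator (`np.random.rand`, `np.random.uniform`), so calling `np.random.seed` at the top of the script fixes those draws; `sklearn.cluster.KMeans` accepts a `random_state` argument for the clustering. The sketch below demonstrates only the NumPy part, with a hypothetical `init_output_layer` helper.

```python
import numpy as np

def init_output_layer(h=8, m=1):
    """Xavier-uniform initialization drawn from NumPy's global RNG, like RBFNN.py."""
    scale = np.sqrt(6.0 / (h + m))
    w2 = np.random.uniform(low=-scale, high=scale, size=[h, m])
    b2 = np.random.uniform(low=-scale, high=scale, size=[m, 1])
    return w2, b2

np.random.seed(42)  # fix the global RNG before any random draw
w_a, b_a = init_output_layer()
np.random.seed(42)  # same seed again
w_b, b_b = init_output_layer()
print(np.array_equal(w_a, w_b) and np.array_equal(b_a, b_b))  # True
```

Fixing the seed only makes runs repeatable; results obtained with different seeds will still differ, so averaging over several seeds gives a fairer picture of the model's accuracy.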

4. Reference source code and experimental data set

Reference source code and experimental data set


Origin blog.csdn.net/weixin_42051846/article/details/128765163