Neural Network Backpropagation Algorithm (Adam Variants: Adamax and Nadam)

Two Variants of the Adaptive Moment Estimation Algorithm (Adam)

Building on the adaptive moment estimation algorithm (Adam), two further variants have been proposed: Adamax and Nadam.
For the principle of the adaptive moment estimation algorithm, see:
Neural Network Backpropagation Algorithm (Adaptive Moment Estimation Algorithm Adam)

1. Principle of Adamax algorithm

Adamax mainly modifies the formula used to compute the parameter update amount. In the original Adam update, the denominator is the square root of the (bias-corrected) second-order moment estimate; Adamax replaces that term with

r_t = max(ρ₂ · r_(t-1), |g_t|)

where ρ₂ is the decay rate and g_t is the gradient value at the current t-th iteration.
For the calculation principle of the gradient g, see:
Neural Network Backpropagation Algorithm (Gradient, Error Backpropagation Algorithm BP)

The parameter adjustment amount is then calculated as

Δw = −ε · ŝ_t / (r_t + δ), with ŝ_t = s_t / (1 − ρ₁^t),

where ŝ_t is the bias-corrected first-order moment estimate, ε is the learning rate, and δ is a small constant that prevents division by zero (ε = 0.001 and δ = 10⁻⁶ in the code below).

The main feature of this algorithm is that it provides a simpler upper bound on the magnitude of each parameter update (and thus on the effective learning rate) than Adam does.
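
To make the formulas above concrete, here is a minimal sketch of a single Adamax update step for one parameter (the names s, r, w and the values 0.9, 0.999, 0.001, and 10⁻⁶ mirror the training code in section 3.1; this is an illustration, not the reference implementation):

import numpy as np

def adamax_step(w, s, r, grad, t, lr=0.001, rho1=0.9, rho2=0.999, eps=1e-6):
    # First-order moment estimate: exponentially decayed average of the gradient
    s = rho1 * s + (1 - rho1) * grad
    # Adamax second-order term: maximum of the decayed r and the gradient magnitude
    r = np.maximum(rho2 * r, np.abs(grad))
    # Bias-correct only the first-order moment estimate
    s_hat = s / (1 - rho1 ** t)
    # Parameter update: note there is no square root in the denominator
    w = w - lr * s_hat / (r + eps)
    return w, s, r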

2. Nadam algorithm principle

Nadam is simpler than Adamax: it is equivalent to introducing the look-ahead ("temporary") gradient idea of the Nesterov momentum method into the adaptive moment estimation algorithm. Before the gradient is computed, the parameters are first temporarily updated with the previous update amount; the gradient is then evaluated at these temporarily updated parameters, the first-order and second-order moment estimates are computed from this look-ahead gradient, and the bias-corrected moments are used to calculate the actual parameter update. A minimal sketch is given after the reference below.
For the principle of the Nesterov momentum method, see:
Neural Network Backpropagation Algorithm (Error Backpropagation Algorithm with Nesterov Momentum)
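
As a compact illustration of this look-ahead scheme, the loop below updates a single parameter with a generic gradient function grad_fn (an assumed placeholder); it follows the same ordering as the full training code in section 3.3, not necessarily the textbook Nadam formulation:

import numpy as np

def nadam_train(w, grad_fn, steps=1000, lr=0.001, rho1=0.9, rho2=0.999, eps=1e-6):
    s, r, delta = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        # Temporarily apply the previous update amount before evaluating the gradient
        w = w + delta
        g = grad_fn(w)
        # Moment estimates computed from the look-ahead gradient
        s = rho1 * s + (1 - rho1) * g
        r = rho2 * r + (1 - rho2) * g ** 2
        # Bias-correct both moment estimates
        s_hat = s / (1 - rho1 ** t)
        r_hat = r / (1 - rho2 ** t)
        # New update amount, applied now and reused as the temporary update next iteration
        delta = -lr * s_hat / (r_hat ** 0.5 + eps)
        w = w + delta
    return w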

3. Algorithm implementation

Taking data prediction as an example, the following describes the implementation of Adamax and Nadam, applying each algorithm to the backpropagation training of a common three-layer neural network (input layer, hidden layer, and output layer).
A data set of heavy metal element contents in the surface soil of a province is used as the experimental data. The data set contains 96 samples in total; 24 are randomly selected as the test set, and the remaining 72 form the training set. The content of the heavy metal Ti is the output feature to be predicted, and the contents of the heavy metals Co, Cr, Mg, and Pb are the input features of the model.
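
One possible way to produce such a 72/24 split (assuming the full data set is stored in a file named all.csv with the five columns listed below; the file name and random seed are assumptions for illustration) is:

import pandas as pd

# Read the full data set of 96 samples and split it into 72 training and 24 test samples
df = pd.read_csv("all.csv")
df.columns = ["Co", "Cr", "Mg", "Pb", "Ti"]
test = df.sample(n=24, random_state=0)   # 24 randomly chosen test samples
train = df.drop(test.index)              # the remaining 72 samples for training
train.to_csv("train.csv", index=False)
test.to_csv("test.csv", index=False)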

3.1 Adamax training process

# Import the required libraries
import numpy as np
import pandas as pd

# Activation function: tanh
def tanh(x):
    return (np.exp(x)-np.exp(-x))/(np.exp(x)+np.exp(-x))
# Derivative of the activation function (expressed in terms of the tanh output)
def de_tanh(x):
    return (1-x**2)

# Computes the first-order moment estimate s and the second-order moment estimate r from the gradient delta;
# 0.9 and 0.999 are the two decay rates, and 0.1 is 1 - 0.9 (for Adamax, r keeps the max of the decayed r and |delta|)
def accumulation(s,r,delta):
    s = 0.9 * s + 0.1 * delta
    r = max((0.999*r),abs(delta))
    return  s,r
# Parameter update function: w is the parameter to update, s and r are the corrected
# first-order and second-order moment estimates
def adjust(s,r,w):
    change1 = r + 0.000001      # denominator: second-order term plus a small constant to avoid division by zero
    change2 = s/change1
    change = (-0.001)*change2   # 0.001 is the learning rate
    w = w + change
    return w

maxepochs = 1000  # maximum number of training iterations
errorfinal = 0.65*10**(-3)  # error threshold for stopping training
samnum = 72  # number of training samples
indim = 4  # number of input-layer nodes
outdim = 1  # number of output-layer nodes
hiddenunitnum = 8  # number of hidden-layer nodes

# Load the input data
df = pd.read_csv("train.csv")
df.columns = ["Co", "Cr", "Mg", "Pb", "Ti"]
Co = df["Co"]
Co = np.array(Co)
Cr = df["Cr"]
Cr = np.array(Cr)
Mg=df["Mg"]
Mg=np.array(Mg)
Pb = df["Pb"]
Pb =np.array(Pb)
Ti = df["Ti"]
Ti = np.array(Ti)
samplein = np.mat([Co,Cr,Mg,Pb])
sampleout = np.mat([Ti])
# Data normalization: scale the data to the range [-1, 1] for easier computation; original values are recovered later by de-normalization
sampleinminmax = np.array([samplein.min(axis=1).T.tolist()[0],samplein.max(axis=1).T.tolist()[0]]).transpose()
sampleoutminmax = np.array([sampleout.min(axis=1).T.tolist()[0],sampleout.max(axis=1).T.tolist()[0]]).transpose()
sampleinnorm = (2*(np.array(samplein.T)-sampleinminmax.transpose()[0])/(sampleinminmax.transpose()[1]-sampleinminmax.transpose()[0])-1).transpose()
sampleoutnorm = (2*(np.array(sampleout.T)-sampleoutminmax.transpose()[0])/(sampleoutminmax.transpose()[1]-sampleoutminmax.transpose()[0])-1).transpose()
sampleinmax = np.array([sampleinnorm.max(axis=1).T.tolist()]).transpose()
sampleinmin = np.array([sampleinnorm.min(axis=1).T.tolist()]).transpose()
# Add noise to the normalized output data
noise = 0.03*np.random.rand(sampleoutnorm.shape[0],sampleoutnorm.shape[1])
sampleoutnorm += noise
sampleinnorm = np.mat(sampleinnorm)

# Initialize the parameters w1, b1, w2, b2 using the normalized input data
dvalue = sampleinmax-sampleinmin
valuemid=(sampleinmin+sampleinmax)/2
wmag=0.7*(hiddenunitnum**(1/indim))
rand1=np.random.rand(hiddenunitnum,outdim)
rand2=np.random.randn(hiddenunitnum,indim)
rand1=rand1*wmag
rand2=rand2*wmag
b1=rand1-np.dot(rand2,valuemid)
for i in range(hiddenunitnum):
    for j in range(indim):
        rand2[i][j]=(2*rand2[i][j])/dvalue[j]
w1=rand2
w2 = np.random.uniform(low=-1, high=1, size=[outdim,hiddenunitnum])
b2 = np.random.uniform(low=-1, high=1, size=[outdim,1])

# The parameters w1, b1, w2, b2 take part in the computation as matrices, with shapes 8*4, 8*1, 1*8, 1*1 respectively
w1 = np.mat(w1)
b1 = np.mat(b1)
w2 = np.mat(w2)
b2 = np.mat(b2)
# errhistory stores the error between the predicted and true values after each training iteration
errhistory = []

# sw1, sb1, sw2, sb2 store the first-order moment estimates of w1, b1, w2, b2; their shapes match the corresponding parameters
sw2 = np.zeros((1,8))
sb2 = np.zeros((1,1))
sw1 = np.zeros((8,4))
sb1 = np.zeros((8,1))

# rw1, rb1, rw2, rb2 store the second-order moment estimates of w1, b1, w2, b2; their shapes match the corresponding parameters
rw2 = np.zeros((1,8))
rb2 = np.zeros((1,1))
rw1 = np.zeros((8,4))
rb1 = np.zeros((8,1))

# t is used to bias-correct the first-order moment estimate; it is incremented at each training iteration
t = 0
for i in range(maxepochs):
    t = t + 1
    #Forward propagation
    #Compute the hidden-layer output hiddenout and the output-layer output networkout
    hiddenout = tanh((np.dot(w1,sampleinnorm).transpose()+b1.transpose())).transpose()
    networkout = np.dot(w2,hiddenout).transpose()+b2.transpose()
    for j in range(samnum):
        networkout[j,:] = tanh(networkout[j,:])
    networkout = networkout.transpose()
    #Compute the loss
    err = sampleoutnorm - networkout
    loss = np.sum(np.abs(err))/samnum
    sse = np.sum(np.square(err))
    #Check whether the stopping condition is met
    errhistory.append(sse)
    if sse < errorfinal:
        break
    #Backpropagation
    #Use the loss results and the activation-function derivative to compute the gradients of w1, b1, w2, b2
    delta2 = np.zeros((outdim,samnum))
    for n in range(samnum):
        delta2[:,n] = (-1) * err[:,n] * de_tanh(networkout[:,n])
    delta1 = np.zeros((hiddenunitnum,samnum))
    for e in range(samnum):
        for f in range(hiddenunitnum):
            delta1[f,e] = w2[:,f] * delta2[:,e] * de_tanh(hiddenout[f,e])
    dw2now = np.dot(delta2,hiddenout.transpose()) #1*8
    db2now = np.dot(delta2,np.ones((samnum,1))) #1*1
    dw1now = np.dot(delta1,sampleinnorm.transpose()) #8*4
    db1now = np.dot(delta1,np.ones((samnum,1))) #8*1

    #Update the output-layer parameters first
    #Update w2
    for m in range(hiddenunitnum):
        #Compute the first-order moment estimate sw2 and the second-order moment estimate rw2
        sw2[:, m],rw2[:,m] = accumulation(sw2[:,m],rw2[:,m],dw2now[:,m])
        #Bias-correct the first-order moment estimate using t
        saw2 = sw2[:,m] / (1 - (0.9**t))
        #Update w2 with the corrected first-order moment estimate and the second-order moment estimate
        w2[:,m] = adjust(saw2,rw2[:,m],w2[:,m])

    #Update b2
    #Compute the first-order moment estimate sb2 and the second-order moment estimate rb2
    sb2,rb2 = accumulation(sb2,rb2,db2now)
    #Bias-correct the first-order moment estimate using t
    sab2 = sb2/(1 - (0.9**t))
    #Update b2 with the corrected first-order moment estimate and the second-order moment estimate
    b2 = adjust(sab2, rb2, b2)

    #Update the hidden-layer parameters
    #Update w1
    #Compute the first-order moment estimate sw1 and the second-order moment estimate rw1
    for a in range(hiddenunitnum):
        for b in range(indim):
            sw1[a,b],rw1[a,b] = accumulation(sw1[a,b],rw1[a,b],dw1now[a,b])
            #Bias-correct the first-order moment estimate using t
            saw1 = sw1[a,b]/(1 - (0.9**t))
            #Update w1 with the corrected first-order moment estimate and the second-order moment estimate
            w1[a,b] = adjust(saw1,rw1[a,b],w1[a,b])

    #Update b1
    #Compute the first-order moment estimate sb1 and the second-order moment estimate rb1
    for n in range(hiddenunitnum):
        sb1[n,:],rb1[n,:] = accumulation(sb1[n,:],rb1[n,:],db1now[n,:])
        #Bias-correct the first-order moment estimate using t
        sab1 = sb1[n,:]/(1 - (0.9**t))
        #Update b1 with the corrected first-order moment estimate and the second-order moment estimate
        b1[n,:] = adjust(sab1,rb1[n,:],b1[n,:])

    print("the generation is:",i,",the loss is:",loss)

# After the maximum number of training iterations (or an early stop), save the parameters w1, b1, w2, b2
np.save("w1.npy",w1)
np.save("b1.npy",b1)
np.save("w2.npy",w2)
np.save("b2.npy",b2)

3.2 Adamax test process and results

The test process only needs one forward propagation pass over the test data, using the parameters produced by training, to obtain the predicted values; these are then evaluated with the relevant error indicators. For the full test source code, see the reference source code and data set in section 4. A minimal sketch of the forward pass follows.
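
A minimal sketch of this forward pass (assuming a test.csv file with the same column layout as the training file, and reusing the tanh activation and the [-1, 1] normalization convention from the training script; for exactness the normalization should use the training-set minima and maxima, which are simplified here) could look like:

import numpy as np
import pandas as pd

def tanh(x):
    return (np.exp(x)-np.exp(-x))/(np.exp(x)+np.exp(-x))

# Load the parameters saved by the training script
w1 = np.load("w1.npy"); b1 = np.load("b1.npy")
w2 = np.load("w2.npy"); b2 = np.load("b2.npy")

# Load the test data (file name and layout assumed to match train.csv)
df = pd.read_csv("test.csv")
df.columns = ["Co", "Cr", "Mg", "Pb", "Ti"]
x = df[["Co", "Cr", "Mg", "Pb"]].values.T      # 4 x 24 input matrix
y_true = df["Ti"].values

# Normalize the inputs to [-1, 1] (for simplicity the test min/max are used here)
xmin = x.min(axis=1, keepdims=True); xmax = x.max(axis=1, keepdims=True)
xnorm = 2*(x - xmin)/(xmax - xmin) - 1

# One forward propagation pass through the trained network
hiddenout = tanh(np.dot(w1, xnorm) + b1)
networkout = tanh(np.dot(w2, hiddenout) + b2)

# De-normalize the [-1, 1] output back to the Ti value range
ymin, ymax = y_true.min(), y_true.max()
y_pred = (np.asarray(networkout).flatten() + 1)/2*(ymax - ymin) + ymin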
(Figure: Adamax prediction results on the test data set)

3.3 Nadam training process

# Import the required libraries
import numpy as np
import pandas as pd

# Activation function: tanh
def tanh(x):
    return (np.exp(x)-np.exp(-x))/(np.exp(x)+np.exp(-x))
# Derivative of the activation function (expressed in terms of the tanh output)
def de_tanh(x):
    return (1-x**2)


# Computes the first-order moment estimate s and the second-order moment estimate r from the gradient delta;
# 0.9 and 0.999 are the two decay rates, and 0.1 and 0.001 are the results of 1 minus each decay rate
def accumulation(s,r,delta):
    s = 0.9 * s + 0.1 * delta
    r = 0.999 * r + 0.001 * (delta**2)
    return  s,r
# Function for computing the parameter update amount: s and r are the corrected
# first-order and second-order moment estimates
def adjust(s,r):
    change1 = r**0.5 + 0.000001   # square root of the second-order term plus a small constant to avoid division by zero
    change2 = s/change1
    # 0.001 is the learning rate
    change = (-0.001)*change2
    return change



maxepochs = 1000  # maximum number of training iterations
errorfinal = 0.65*10**(-3)  # error threshold for stopping training
samnum = 72  # number of training samples
indim = 4  # number of input-layer nodes
outdim = 1  # number of output-layer nodes
hiddenunitnum = 8  # number of hidden-layer nodes

# Load the input data
df = pd.read_csv("train.csv")
df.columns = ["Co", "Cr", "Mg", "Pb", "Ti"]
Co = df["Co"]
Co = np.array(Co)
Cr = df["Cr"]
Cr = np.array(Cr)
Mg=df["Mg"]
Mg=np.array(Mg)
Pb = df["Pb"]
Pb =np.array(Pb)
Ti = df["Ti"]
Ti = np.array(Ti)
samplein = np.mat([Co,Cr,Mg,Pb])
sampleout = np.mat([Ti])
# Data normalization: scale the data to the range [-1, 1] for easier computation; original values are recovered later by de-normalization
sampleinminmax = np.array([samplein.min(axis=1).T.tolist()[0],samplein.max(axis=1).T.tolist()[0]]).transpose()
sampleoutminmax = np.array([sampleout.min(axis=1).T.tolist()[0],sampleout.max(axis=1).T.tolist()[0]]).transpose()
sampleinnorm = (2*(np.array(samplein.T)-sampleinminmax.transpose()[0])/(sampleinminmax.transpose()[1]-sampleinminmax.transpose()[0])-1).transpose()
sampleoutnorm = (2*(np.array(sampleout.T)-sampleoutminmax.transpose()[0])/(sampleoutminmax.transpose()[1]-sampleoutminmax.transpose()[0])-1).transpose()
sampleinmax = np.array([sampleinnorm.max(axis=1).T.tolist()]).transpose()
sampleinmin = np.array([sampleinnorm.min(axis=1).T.tolist()]).transpose()
# Add noise to the normalized output data
noise = 0.03*np.random.rand(sampleoutnorm.shape[0],sampleoutnorm.shape[1])
sampleoutnorm += noise
sampleinnorm = np.mat(sampleinnorm)

# Initialize the parameters w1, b1, w2, b2 using the normalized input data
dvalue = sampleinmax-sampleinmin
valuemid=(sampleinmin+sampleinmax)/2
wmag=0.7*(hiddenunitnum**(1/indim))
rand1=np.random.rand(hiddenunitnum,outdim)
rand2=np.random.randn(hiddenunitnum,indim)
rand1=rand1*wmag
rand2=rand2*wmag
b1=rand1-np.dot(rand2,valuemid)
for i in range(hiddenunitnum):
    for j in range(indim):
        rand2[i][j]=(2*rand2[i][j])/dvalue[j]
w1=rand2
w2 = np.random.uniform(low=-1, high=1, size=[outdim,hiddenunitnum])
b2 = np.random.uniform(low=-1, high=1, size=[outdim,1])

# The parameters w1, b1, w2, b2 take part in the computation as matrices, with shapes 8*4, 8*1, 1*8, 1*1 respectively
w1 = np.mat(w1)
b1 = np.mat(b1)
w2 = np.mat(w2)
b2 = np.mat(b2)
# errhistory stores the error between the predicted and true values after each training iteration
errhistory = []

# sw1, sb1, sw2, sb2 store the first-order moment estimates of w1, b1, w2, b2; their shapes match the corresponding parameters
sw2 = np.zeros((1,8))
sb2 = np.zeros((1,1))
sw1 = np.zeros((8,4))
sb1 = np.zeros((8,1))

# rw1, rb1, rw2, rb2 store the second-order moment estimates of w1, b1, w2, b2; their shapes match the corresponding parameters
rw2 = np.zeros((1,8))
rb2 = np.zeros((1,1))
rw1 = np.zeros((8,4))
rb1 = np.zeros((8,1))

# deltaw1, deltab1, deltaw2, deltab2 store the temporary (look-ahead) update amounts of w1, b1, w2, b2
deltaw2 = np.zeros((1,8))
deltab2 = np.zeros((1,1))
deltaw1 = np.zeros((8,4))
deltab1 = np.zeros((8,1))

# t is used to bias-correct the first-order and second-order moment estimates; it is incremented at each training iteration
t = 0
for i in range(maxepochs):
    t = t + 1
    #Temporarily update w1, b1, w2, b2 with the previous update amounts (Nesterov look-ahead)
    w1 += deltaw1
    b1 += deltab1
    w2 += deltaw2
    b2 += deltab2
    # Forward propagation
    # Compute the hidden-layer output hiddenout and the output-layer output networkout
    hiddenout = tanh((np.dot(w1, sampleinnorm).transpose() + b1.transpose())).transpose()
    networkout = np.dot(w2, hiddenout).transpose() + b2.transpose()
    for j in range(samnum):
        networkout[j, :] = tanh(networkout[j, :])
    networkout = networkout.transpose()
    # Compute the loss
    err = sampleoutnorm - networkout
    loss = np.sum(np.abs(err)) / samnum
    sse = np.sum(np.square(err))
    # Check whether the stopping condition is met
    errhistory.append(sse)
    if sse < errorfinal:
        break
    #Backpropagation
    #Use the loss results and the activation-function derivative to compute the gradients of w1, b1, w2, b2
    delta2 = np.zeros((outdim,samnum))
    for n in range(samnum):
        delta2[:,n] = (-1) * err[:,n] * de_tanh(networkout[:,n])
    delta1 = np.zeros((hiddenunitnum,samnum))
    for e in range(samnum):
        for f in range(hiddenunitnum):
            delta1[f,e] = w2[:,f] * delta2[:,e] * de_tanh(hiddenout[f,e])
    dw2now = np.dot(delta2,hiddenout.transpose()) #1*8
    db2now = np.dot(delta2,np.ones((samnum,1))) #1*1
    dw1now = np.dot(delta1,sampleinnorm.transpose()) #8*4
    db1now = np.dot(delta1,np.ones((samnum,1))) #8*1

    #Update the output-layer parameters first
    #Update w2
    for m in range(hiddenunitnum):
        #Compute the first-order moment estimate sw2 and the second-order moment estimate rw2
        sw2[:, m], rw2[:, m] = accumulation(sw2[:, m], rw2[:, m], dw2now[:, m])
        #Bias-correct the first-order and second-order moment estimates using t
        saw2 = sw2[:, m] / (1 - (0.9 ** t))
        raw2 = rw2[:, m] / (1 - (0.999 ** t))
        #Obtain the update amount; it also serves as the temporary update in the next iteration
        deltaw2[:,m] = adjust(saw2,raw2)
        #Update the parameter w2
        w2[:,m] += deltaw2[:,m]

    #Update b2
    #Compute the first-order moment estimate sb2 and the second-order moment estimate rb2
    sb2,rb2 = accumulation(sb2,rb2,db2now)
    #Bias-correct the first-order and second-order moment estimates using t
    sab2 = sb2/(1 - (0.9**t))
    rab2 = rb2/(1-(0.999**t))
    #Obtain the update amount; it also serves as the temporary update in the next iteration
    deltab2 = adjust(sab2,rab2)
    #Update the parameter b2
    b2 += deltab2

    #Update the hidden-layer parameters
    #Update w1
    for a in range(hiddenunitnum):
        for b in range(indim):
            #Compute the first-order moment estimate sw1 and the second-order moment estimate rw1
            sw1[a,b],rw1[a,b] = accumulation(sw1[a,b],rw1[a,b],dw1now[a,b])
            #Bias-correct the first-order and second-order moment estimates using t
            saw1 = sw1[a,b]/(1 - (0.9**t))
            raw1 = rw1[a,b]/(1 - (0.999**t))
            #Obtain the update amount; it also serves as the temporary update in the next iteration
            deltaw1[a,b] = adjust(saw1,raw1)
            #Update the parameter w1
            w1[a, b] += deltaw1[a,b]

    #Update b1
    for n in range(hiddenunitnum):
        #Compute the first-order moment estimate sb1 and the second-order moment estimate rb1
        sb1[n,:],rb1[n,:] = accumulation(sb1[n,:],rb1[n,:],db1now[n,:])
        #Bias-correct the first-order and second-order moment estimates using t
        sab1 = sb1[n,:]/(1 - (0.9**t))
        rab1 = rb1[n,:]/(1 - (0.999**t))
        #Obtain the update amount; it also serves as the temporary update in the next iteration
        deltab1[n,:] = adjust(sab1,rab1)
        #Update the parameter b1
        b1[n,:] += deltab1[n,:]

    print("the generation is:",i,",the loss is:",loss)

# After the maximum number of training iterations (or an early stop), save the parameters w1, b1, w2, b2
np.save("w1.npy",w1)
np.save("b1.npy",b1)
np.save("w2.npy",w2)
np.save("b2.npy",b2)

3.4 Nadam test process and results

As with Adamax, the test process performs one forward propagation pass over the test data with the trained parameters to obtain the predicted values, which are then evaluated with the relevant error indicators. For the full test source code, see the reference source code and data set in section 4.
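
The error indicators mentioned here typically include the mean absolute error, root mean square error, and mean absolute percentage error; a small helper along the lines below (an illustrative sketch, not the reference code) can compute them from the predicted and actual Ti values:

import numpy as np

def error_indicators(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))               # mean absolute error
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))      # root mean square error
    mape = np.mean(np.abs((y_true - y_pred) / y_true))   # mean absolute percentage error
    return mae, rmse, mape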
(Figure: Nadam prediction results on the test data set)

4. Reference source code and data set

Adamax reference source code and dataset
Nadam reference source code and dataset
