Python Implementation of the BP Neural Network Algorithm

  • The error backpropagation (error BackPropagation) algorithm, BP algorithm for short

Example: solving the XOR problem with a BP neural network
The result of an XOR operation is either 0 or 1, so this is a classification problem.
The input data are $(0,0), (0,1), (1,0), (1,1)$, with corresponding outputs $0, 1, 1, 0$.

The bias input is fixed at $x_0 = 1$, so the input neurons are $x_0, x_1, x_2$. The hidden layer has 10 neurons. The weights from the input layer to the hidden layer are $V$, and the weights from the hidden layer to the output layer are $W$; training the BP network yields the values of $V$ and $W$.

Training goal: minimize the cost function.

Cost function / objective function / loss function: $E = \dfrac{1}{2}(\hat y - y)^2$

Input: $X = (x_0, x_1, x_2)$
Output/label: $Y = [0, 1, 1, 0]^T$

X = np.array([[1,0,0],
              [1,0,1],
              [1,1,0],
              [1,1,1]])
Y = np.array([[0],
              [1],
              [1],
              [0]])

Both $W$ and $V$ are initialized with random numbers, and a learning rate is set:

# Generate random numbers in the range -1 to 1
V = np.random.random([3,10]) * 2 - 1
W = np.random.random([10,1]) * 2 - 1
# Learning rate
lr = 0.21

The $\delta$ learning rule:
Activation function: the sigmoid function $f(x) = \dfrac{1}{1+e^{-x}}$; differentiating gives $f'(x) = f(1-f)$.
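
In code, $f$ and $f'$ are the sigmoid and dsigmoid helpers used throughout the walkthrough (they also appear in the complete listing at the end):

# Sigmoid activation function
def sigmoid(x):
    return 1/(1+np.exp(-x))
# Derivative of the sigmoid, written via the sigmoid itself: f'(x) = f(x)*(1-f(x))
def dsigmoid(x):
    s = 1/(1+np.exp(-x))
    return s*(1-s)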

Learning signal of the last layer: $\delta = (y - \hat y)f'(L_1W)$, which corresponds to `(Y - L2)*dsigmoid(np.dot(L1,W))` in the code.

Learning signal of the preceding layer: $\delta^l = (\delta^{l+1}W^T)\odot f'(XV)$, where $\odot$ is the elementwise product and $XV$ is the weighted input of layer $l$.

$\Delta W^l = -\eta \dfrac{\partial E}{\partial W^l} = \eta X^T\delta^l$ (gradient descent; $\Delta W^l$ is the weight change of layer $l$, $\eta$ is the learning rate, $\delta^l$ is the learning signal of layer $l$, and $X$ stands for the input to layer $l$)
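
As a check for the output layer, by the chain rule: $\dfrac{\partial E}{\partial W} = L_1^T\big[(\hat y - y)f'(L_1W)\big] = -L_1^T\delta$, so $\Delta W = -\eta\dfrac{\partial E}{\partial W} = \eta L_1^T\delta$, which is exactly `delta_W = lr*np.dot(L1.T,L2_delta)` in the update function below.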

The learning signal of the final layer depends on that layer's input signal and input weights, while the learning signal of any earlier layer depends on the learning signal of the layer after it (one layer to the right) and that later layer's input weights. The signals must therefore be computed starting from the last layer and working backwards (error backpropagation).
The change in a layer's input weights depends on the layer's input signal and its own learning signal: $\Delta V$, the weight change of the first layer, depends on the input $X$ and the first-layer learning signal $\delta^1$, while $\Delta W$, the weight change of the second layer, depends on its input $L_1$ and the second-layer learning signal $\delta^2$.

# Weight update function
def update():
    global V,W
    
    # Output of each layer
    L1 = sigmoid(np.dot(X,V))
    L2 = sigmoid(np.dot(L1,W))
    # Learning signal of each layer, computed from back to front
    L2_delta = (Y - L2)*dsigmoid(np.dot(L1,W))
    L1_delta = np.dot(L2_delta,W.T)*dsigmoid(np.dot(X,V))
    
    # Weight change for each layer
    delta_W = lr*np.dot(L1.T,L2_delta)
    delta_V = lr*np.dot(X.T,L1_delta)
    
    W = W + delta_W
    V = V + delta_V

Both the inputs and outputs are matrices. To make the code easier to follow, some of the matrix relations are written out here.
First-layer input: $X = (x_0, x_1, x_2)$
First-layer output / second-layer input: $L_1 = (l_1, l_2, ..., l_{10})$
Second-layer output: $L_2 = (\hat y_1, \hat y_2, \hat y_3, \hat y_4)$ (one prediction per training sample)
Weights $V = (v_0, v_1, v_2)^T$
Weights $W = (w_1, w_2, ..., w_{10})^T$

$L_1 = (l_1, l_2, ..., l_{10}) = f(x_0v_0 + x_1v_1 + x_2v_2) = f(XV)$
$L_2 = (\hat y_1, \hat y_2, \hat y_3, \hat y_4) = f(l_1w_1 + l_2w_2 + ... + l_{10}w_{10}) = f(L_1W)$
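
As a quick sanity check on the shapes (a small illustrative snippet, assuming the X, V, W defined above):

# X is 4x3 (four samples, bias column included), V is 3x10, W is 10x1
L1 = sigmoid(np.dot(X,V))
L2 = sigmoid(np.dot(L1,W))
print(X.shape, V.shape, L1.shape)   # (4, 3) (3, 10) (4, 10)
print(L1.shape, W.shape, L2.shape)  # (4, 10) (10, 1) (4, 1)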

Training and evaluation
`loss` is computed from the vector of per-sample loss values; here we take its mean.

for i in range(10001):
    update()
    if i%500 == 0:
        L1 = sigmoid(np.dot(X,V))
        L2 = sigmoid(np.dot(L1,W))   
        loss = np.mean(np.square(Y-L2)/2)
        print("loss:",loss)

print(L2)
# Threshold each output at 0.5 to obtain the predicted class
def judge(x):
    if x >= 0.5:
        return 1
    else:
        return 0
for i in map(judge,L2):
    print(i)

The training output:

loss: 0.1503049849879402
loss: 0.11292215088746196
loss: 0.055260106483890375
loss: 0.012689599356839564
loss: 0.005019800627645192
loss: 0.002838688933267325
loss: 0.0019049200410914762
loss: 0.0014063388085483982
loss: 0.0011023445653717398
loss: 0.000900027309043023
loss: 0.0007567885904853457
loss: 0.000650615357323275
loss: 0.0005690855354596988
loss: 0.0005047001580278678
loss: 0.0004526842864463839
loss: 0.00040986323784031537
loss: 0.0003740495228501279
loss: 0.0003436899378899049
loss: 0.0003176530747398405
loss: 0.00029509639853180273
loss: 0.0002753803240158825
[[0.01758859]
 [0.97396267]
 [0.97818043]
 [0.02719647]]
0
1
1
0

The classification is correct.
The complete code is as follows:

import numpy as np

X = np.array([[1,0,0],
              [1,0,1],
              [1,1,0],
              [1,1,1]])
Y = np.array([[0],
              [1],
              [1],
              [0]])
# Network structure: 3-10-1
# Generate random numbers in the range -1 to 1
V = np.random.random([3,10]) * 2 - 1
W = np.random.random([10,1]) * 2 - 1
# Learning rate
lr = 0.21

# Sigmoid activation function
def sigmoid(x):
    return 1/(1+np.exp(-x))
# Derivative of the sigmoid: f'(x) = f(x)*(1-f(x))
def dsigmoid(x):
    s = 1/(1+np.exp(-x))
    return s*(1-s)
# Weight update function
def update():
    global V,W
    
    # Output of each layer
    L1 = sigmoid(np.dot(X,V))
    L2 = sigmoid(np.dot(L1,W))
    # Learning signal of each layer, computed from back to front
    L2_delta = (Y - L2)*dsigmoid(np.dot(L1,W))
    L1_delta = np.dot(L2_delta,W.T)*dsigmoid(np.dot(X,V))
    
    # Weight change for each layer
    delta_W = lr*np.dot(L1.T,L2_delta)
    delta_V = lr*np.dot(X.T,L1_delta)
    
    W = W + delta_W
    V = V + delta_V
    
for i in range(10001):
    update()
    if i%500 == 0:
        # Every 500 iterations, recompute the outputs and report the mean loss
        L1 = sigmoid(np.dot(X,V))
        L2 = sigmoid(np.dot(L1,W))
        loss = np.mean(np.square(Y-L2)/2)
        print("loss:",loss)

print(L2)

# Threshold each output at 0.5 to obtain the predicted class
def judge(x):
    if x >= 0.5:
        return 1
    else:
        return 0
# map applies judge to each row of L2, printing the predicted labels
for i in map(judge,L2):
    print(i)
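
As an aside (not part of the original code), the same thresholding can be written as a single vectorized step:

# Vectorized alternative to judge/map: boolean mask cast to int
predictions = (L2 >= 0.5).astype(int)
print(predictions.ravel())  # expected: [0 1 1 0]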

Reprinted from blog.csdn.net/weixin_44823313/article/details/112396838