A Different Path: Recognizing MNIST Handwritten Digits with Error Backpropagation

The MNIST dataset (Mixed National Institute of Standards and Technology database) is a large database of handwritten digits collected and curated by the US National Institute of Standards and Technology. It contains a training set of 60,000 examples and a test set of 10,000 examples.

Each digit is a 28×28 matrix, 784 values in total. We "flatten" these 784 values into a one-dimensional vector and use it as the input x of the multilayer perceptron.
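For instance, if img is a 28×28 tensor holding one digit (img is a hypothetical name here), flattening takes a single reshape in PyTorch:

import torch

img = torch.rand(28, 28)   # stand-in for one 28x28 MNIST digit
x = img.reshape(-1)        # flatten to a 784-dimensional vector
print(x.shape)             # torch.Size([784])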

Viewed as a whole, a network trained with error backpropagation is simply a collection of affine functions y = kx + b composed with sigmoid functions. In the algorithm, the thresholds and connection weights are given random initial values; in PyTorch, for example, this can be done as follows:

import torch
from torch import nn

num_inputs = 784
num_outputs = 10
num_hiddens = 256

# nn.Parameter marks a tensor as trainable (requires_grad=True by default).
w1 = nn.Parameter(torch.randn(num_inputs, num_hiddens) * 0.1)
b1 = nn.Parameter(torch.zeros(num_hiddens))
w2 = nn.Parameter(torch.randn(num_hiddens, num_outputs) * 0.1)
b2 = nn.Parameter(torch.zeros(num_outputs))
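As a quick sanity check (a small sketch with a dummy input, not from the original), one random vector can be pushed through these parameters to confirm the shapes line up:

x = torch.rand(1, num_inputs)     # a dummy flattened image
h = torch.sigmoid(x @ w1 + b1)    # hidden activations, shape (1, 256)
o = h @ w2 + b2                   # output scores, shape (1, 10)
print(h.shape, o.shape)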

Suppose the training set is \left \{ \left ( \boldsymbol{x}_{k},\boldsymbol{y}_{k} \right ) \right \}_{k=1}^{m} with \boldsymbol{x}_{k}\in \mathbb{R}^{784}. We use one hidden layer and one output layer: the hidden layer has 256 neurons and the output layer has 10, one per digit 0 through 9. The input to hidden neuron h is then

\alpha_{h} = \sum_{i=1}^{784} v_{ih} x_{i} + b_{1h}

The input to output-layer neuron j is

\beta_{j} = \sum_{h=1}^{256} w_{hj} b_{h}

where b_{h} = \mathrm{sigmoid}\left ( \alpha_{h} \right ) denotes the output of hidden neuron h.

Here the sigmoid function is again needed to "squash" each output y:

\hat{y}_{j}^{k} = \mathrm{sigmoid}\left ( \beta_{j}-\theta_{j} \right )

where \theta_{j} is the threshold of output neuron j.
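Putting the three formulas together, one forward pass can be written directly in PyTorch. This is a sketch: x is a flattened (784,) input, w1, b1, w2, b2 are the parameters initialized above, and the threshold \theta_{j} is realized as the additive bias b2 = -\theta:

alpha = x @ w1 + b1                 # hidden inputs alpha_h, shape (256,)
b_h = torch.sigmoid(alpha)          # hidden outputs b_h
beta = b_h @ w2                     # output inputs beta_j, shape (10,)
y_hat = torch.sigmoid(beta + b2)    # y_hat_j = sigmoid(beta_j - theta_j), with b2 = -theta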

The mean squared error of the network on example \left ( \boldsymbol{x}_{k},\boldsymbol{y}_{k} \right ) is then

E_{k} = \frac{1}{2}\sum_{j=1}^{10}\left ( \hat{y}_{j}^{k}-y_{j}^{k} \right )^{2}
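In code, with y_hat from the forward pass above and a one-hot label vector y_true (a hypothetical name), E_k is a one-liner:

E_k = 0.5 * ((y_hat - y_true) ** 2).sum()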

The error backpropagation algorithm is based on gradient descent: parameters are adjusted in the direction of the negative gradient of the objective. For any parameter v, the update rule is

v\leftarrow v+\Delta v

For the mean squared error above, given a learning rate \eta, we have

\Delta w_{hj} = -\eta \frac{\partial E_{k}}{\partial w_{hj}}
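
Expanding this derivative with the chain rule (a step the original leaves implicit) and using the identity \mathrm{sigmoid}'(z)=\mathrm{sigmoid}(z)\left ( 1-\mathrm{sigmoid}(z) \right ) gives

\frac{\partial E_{k}}{\partial w_{hj}} = \frac{\partial E_{k}}{\partial \hat{y}_{j}^{k}}\cdot \frac{\partial \hat{y}_{j}^{k}}{\partial \beta_{j}}\cdot \frac{\partial \beta_{j}}{\partial w_{hj}} = -\left ( y_{j}^{k}-\hat{y}_{j}^{k} \right )\hat{y}_{j}^{k}\left ( 1-\hat{y}_{j}^{k} \right )b_{h}

Defining g_{j} = \hat{y}_{j}^{k}\left ( 1-\hat{y}_{j}^{k} \right )\left ( y_{j}^{k}-\hat{y}_{j}^{k} \right ), the update simplifies to \Delta w_{hj} = \eta g_{j}b_{h}; this g_{j} is exactly the variable g computed in the training loop below.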

Error backpropagation is an iterative learning algorithm. In the same way as the formula above, the \Delta for each of the remaining parameters is derived and plugged into the update rule v \leftarrow v + \Delta v. Note that while the code below updates the parameters once per example (so-called standard BP), the ultimate goal is to minimize the cumulative error over the training set D

E=\frac{1}{m}\sum_{k=1}^{m}E_{k}

import image_train                 # the author's loader for the MNIST arrays
import torch

# image_train.images: 60,000 flattened 28x28 images; image_train.labels: one-hot labels
train_x = torch.tensor(image_train.images).float()
train_y = torch.tensor(image_train.labels).float()

num_inputs = 784
num_outputs = 10
num_hiddens = 256
lr = 0.1                           # learning rate eta

# Gradients are derived by hand below, so plain tensors suffice (autograd is unused).
w1 = torch.randn(num_inputs, num_hiddens) * 0.1
b1 = torch.zeros(num_hiddens)
w2 = torch.randn(num_hiddens, num_outputs) * 0.1
b2 = torch.zeros(num_outputs)

def sigmoid(x):
    return torch.sigmoid(x)

# Standard (per-example) BP over the first 20,001 training samples.
for i in range(20001):
    x = train_x[i]                          # flattened image, shape (784,)
    t = train_y[i]                          # one-hot target, shape (10,)
    # Forward pass.
    h = sigmoid(x @ w1 + b1)                # hidden outputs b_h, shape (256,)
    y = sigmoid(h @ w2 + b2)                # predictions y_hat_j, shape (10,)
    # Error terms, computed before any parameter is touched.
    g = y * (1 - y) * (t - y)               # output-layer g_j, shape (10,)
    e = h * (1 - h) * (w2 @ g)              # hidden-layer e_h, shape (256,)
    # Gradient-descent updates. The biases are additive (b2 plays the role of
    # -theta), so their updates take a + sign, unlike threshold updates.
    w2 = w2 + lr * torch.outer(h, g)        # delta_w_hj = eta * g_j * b_h
    b2 = b2 + lr * g
    w1 = w1 + lr * torch.outer(x, e)        # delta_v_ih = eta * e_h * x_i
    b1 = b1 + lr * e

# Predict the last sample the loop saw and compare with its label.
h = sigmoid(train_x[20000] @ w1 + b1)
y_hat = sigmoid(h @ w2 + b2)
print('prediction:', torch.argmax(y_hat).item())
print('actual:', torch.argmax(train_y[20000]).item())
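A single example says little about quality; a more telling check (a sketch under the same assumptions about image_train's layout) is the accuracy on samples the loop never visited:

correct = 0
n_eval = 1000                                   # hypothetical held-out slice
for i in range(20001, 20001 + n_eval):
    h = sigmoid(train_x[i] @ w1 + b1)
    y_hat = sigmoid(h @ w2 + b2)
    if torch.argmax(y_hat) == torch.argmax(train_y[i]):
        correct += 1
print('accuracy:', correct / n_eval)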


Reposted from blog.csdn.net/m0_61789994/article/details/128588793