Another approach: using the error backpropagation algorithm to recognize MNIST handwritten digits

The MNIST dataset (Mixed National Institute of Standards and Technology database) is a large database of handwritten digits collected by the National Institute of Standards and Technology. It contains a training set of 60,000 examples and a test set of 10,000 examples.

Each digit is a 28*28 matrix, i.e. 784 pixel values in total. We "flatten" these 784 values into a one-dimensional vector and use it as the input x of the multilayer perceptron.
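For example, if the images are loaded as a tensor of shape (N, 28, 28), a single reshape produces the flattened inputs; the tensor below is only an illustrative stand-in for the real data:

import torch

# illustrative batch of N = 64 grayscale images, shape (64, 28, 28)
images = torch.rand(64, 28, 28)

# flatten each 28*28 image into a 784-dimensional vector -> shape (64, 784)
x = images.reshape(-1, 28 * 28)
print(x.shape)   # torch.Size([64, 784])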

A network trained with the error backpropagation algorithm is essentially a composition of many linear functions y = kx + b and sigmoid functions. In the algorithm, the thresholds and connection weights start from random initial values. In PyTorch, for example, they can be initialized as follows:

num_inputs = 784      # 28*28 pixels per flattened image
num_outputs = 10      # one output neuron per digit class
num_hiddens = 256     # hidden-layer width

# random initial weights (scaled by 0.1) and zero biases
w1 = nn.Parameter(torch.randn(num_inputs, num_hiddens, requires_grad=True) * 0.1)
b1 = nn.Parameter(torch.zeros(num_hiddens, requires_grad=True))
w2 = nn.Parameter(torch.randn(num_hiddens, num_outputs, requires_grad=True) * 0.1)
b2 = nn.Parameter(torch.zeros(num_outputs, requires_grad=True))

Assume the training set is \left \{ \left ( x_{k},y_{k} \right ) \right \}_{k=1}^{m}, where each x_{k} has 784 components x_{1},\dots ,x_{784} and y_{k} is the corresponding label vector. We set up one hidden layer and one output layer: 256 neurons in the hidden layer and 10 neurons in the output layer, one for each of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. The input to the h-th hidden neuron is then

\alpha_{h} = \sum_{i=1}^{784} v_{ih} x_{i} + b_{1h}
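In code this is a single matrix product plus the bias; a minimal sketch for one flattened image x, using illustrative tensors of the right shapes:

import torch

x = torch.rand(784)                    # one flattened image
v = torch.randn(784, 256) * 0.1        # v_{ih}, the input-to-hidden weights (w1 above)
b1 = torch.zeros(256)                  # b_{1h}

alpha = x @ v + b1                     # alpha_h for each of the 256 hidden neurons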

Writing b_{h} = \mathrm{sigmoid}\left ( \alpha_{h} \right ) for the output of the h-th hidden neuron, the input to the j-th output neuron is

\beta_{j} = \sum_{h=1}^{256}\omega_{hj} b_{h}

Here the sigmoid function is used again to "compress" each output, with \theta_{j} denoting the threshold of the j-th output neuron:

\widehat{y}_{j}^{k} = \mathrm{sigmoid}\left ( \beta_{j}-\theta_{j} \right )
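In code, this is one more matrix product followed by the sigmoid. The sketch below uses an illustrative hidden-layer output b_hidden; note that the full program later adds a bias b2 instead of subtracting a threshold, which is the same thing with b2 = -theta:

import torch

b_hidden = torch.rand(256)             # sigmoid(alpha), outputs of the hidden neurons
w2 = torch.randn(256, 10) * 0.1        # omega_{hj}
theta = torch.zeros(10)                # output thresholds theta_j

beta = b_hidden @ w2                   # beta_j = sum_h omega_{hj} b_h
y_hat = torch.sigmoid(beta - theta)    # predicted score for each of the ten digits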

Then the mean squared error of the network on example \left ( x_{k},y_{k} \right ), over the l = 10 outputs, is

E_{k} = \frac{1}{2}\sum_{j=1}^{l}\left ( \widehat{y}_{j}^{k}-y_{j}^{k} \right )^{2}
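For a single example with a one-hot label this is a one-liner; y_hat and y_true below are illustrative stand-ins:

import torch

y_hat = torch.rand(10)                          # network outputs for one example
y_true = torch.zeros(10); y_true[3] = 1.0       # one-hot label, say the true digit is 3

E_k = 0.5 * torch.sum((y_hat - y_true) ** 2)    # mean squared error of this example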

The error backpropagation algorithm is based on the gradient descent strategy: parameters are adjusted in the direction of the negative gradient of the objective. The update rule for any parameter v is

v\leftarrow v+\Delta v

For the mean squared error above, given a learning rate \eta, we have

\Delta \omega_{hj} = -\eta \frac{\partial E_{k}}{\partial \omega_{hj}}
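With nn.Parameter tensors as above, this partial derivative can also be obtained from PyTorch's autograd instead of being derived by hand (the full program later computes it manually). A minimal sketch, with an assumed learning rate eta = 0.1 and an illustrative input and label:

import torch
from torch import nn

eta = 0.1                                        # assumed learning rate
x = torch.rand(784)                              # illustrative flattened image
y_true = torch.zeros(10); y_true[3] = 1.0        # illustrative one-hot label

w1 = nn.Parameter(torch.randn(784, 256) * 0.1)
b1 = nn.Parameter(torch.zeros(256))
w2 = nn.Parameter(torch.randn(256, 10) * 0.1)
b2 = nn.Parameter(torch.zeros(10))

y_hat = torch.sigmoid(torch.sigmoid(x @ w1 + b1) @ w2 + b2)
E_k = 0.5 * torch.sum((y_hat - y_true) ** 2)
E_k.backward()                                   # fills w2.grad with dE_k/d(omega_hj)

with torch.no_grad():
    w2 -= eta * w2.grad                          # Delta omega_hj = -eta * dE_k/d(omega_hj)
    w2.grad.zero_()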

The error backpropagation algorithm is an iterative learning algorithm: in the same way as the formula above, it computes the increments for the remaining parameters (\Delta v_{ih}, \Delta b_{1h}, \Delta \theta_{j}) and applies the update rule to adjust them. Note that the goal of the error backpropagation algorithm is to minimize the cumulative error on the training set D:

E=\frac{1}{m}\sum_{k=1}^{m}E_{k}
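Over m examples this is just the mean of the per-example errors; a short illustrative sketch:

import torch

m = 100
y_hat = torch.rand(m, 10)                              # illustrative predictions
y_true = torch.eye(10)[torch.randint(0, 10, (m,))]     # illustrative one-hot labels

E = (0.5 * ((y_hat - y_true) ** 2).sum(dim=1)).mean()  # E = (1/m) * sum_k E_k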

Putting all of this together, the complete training program is as follows (image_train is a helper module that provides the flattened images and their one-hot labels):

import image_train
import torch

train_x = torch.tensor(image_train.images).float()   # flattened images, shape (N, 784)
train_y = torch.tensor(image_train.labels).float()   # one-hot labels, shape (N, 10)

num_inputs = 784
num_outputs = 10
num_hiddens = 256

# Random initial weights and zero biases. The gradients are derived by hand below,
# so plain tensors are sufficient and autograd is not needed.
w1 = torch.randn(num_inputs, num_hiddens) * 0.1
b1 = torch.zeros(num_hiddens)
w2 = torch.randn(num_hiddens, num_outputs) * 0.1
b2 = torch.zeros(num_outputs)

lr = 0.1     # learning rate eta
y = None     # network output for the most recent sample

# one pass over the first 20001 samples, updating the parameters after every sample
for i in range(20001):
    # forward pass
    h = torch.sigmoid(train_x[i] @ w1 + b1)      # hidden-layer outputs, shape (256,)
    y = torch.sigmoid(h @ w2 + b2)               # network outputs, shape (10,)

    # output-layer term g_j = y_j (1 - y_j) (t_j - y_j)
    g = y * (1 - y) * (train_y[i] - y)
    # hidden-layer term e_h = h_h (1 - h_h) * sum_j omega_hj g_j (uses w2 before it is updated)
    e = h * (1 - h) * (w2 @ g)

    # parameter updates; the biases are added in the forward pass (b = -theta),
    # so their increments carry the same sign as the weight increments
    w2 = w2 + lr * h.reshape(-1, 1) @ g.reshape(1, -1)
    b2 = b2 + lr * g
    w1 = w1 + lr * train_x[i].reshape(-1, 1) @ e.reshape(1, -1)
    b1 = b1 + lr * e

# compare the prediction for the last sample seen (index 20000) with its label
a = y.tolist()
b = train_y[20000].tolist()
print('predicted', a.index(max(a)))
print('actual', b.index(max(b)))
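A single sample says little about generalization. One way to check the trained network, assuming image_train only exposes the 60,000 training examples, is to measure accuracy on the samples the loop above never used (indices 20001 onward); with a separate test set, the same loop applies to those tensors instead:

# accuracy on samples that were not used for the updates above
correct = 0
total = 0
for i in range(20001, train_x.shape[0]):
    h = torch.sigmoid(train_x[i] @ w1 + b1)
    y_pred = torch.sigmoid(h @ w2 + b2)
    if torch.argmax(y_pred) == torch.argmax(train_y[i]):
        correct += 1
    total += 1
print('accuracy', correct / total)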

 


Source: https://blog.csdn.net/m0_61789994/article/details/128588793