Hands-On Deep Learning 10: Implementing a Multilayer Perceptron from Scratch in PyTorch

Multilayer Perceptron

import torch
import numpy as np
import sys
sys.path.append('..')
import d2lzh_pytorch as d2l

We continue to use the Fashion-MNIST dataset and train a multilayer perceptron (MLP) to classify its images.

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
Defining the Model Parameters

Each image in Fashion-MNIST has shape 28x28, and there are 10 categories. As before, we represent each image as a vector of length 28x28 = 784, so the number of inputs is 784 and the number of outputs is 10. We set the number of hidden units, a hyperparameter, to 256.

num_inputs, num_outputs, num_hiddens = 784, 10, 256

# Weights are initialized with small Gaussian noise, biases with zeros
W1 = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_hiddens)), dtype=torch.float32)
b1 = torch.zeros(num_hiddens, dtype=torch.float32)
W2 = torch.tensor(np.random.normal(0, 0.01, (num_hiddens, num_outputs)), dtype=torch.float32)
b2 = torch.zeros(num_outputs, dtype=torch.float32)

params = [W1, b1, W2, b2]
for param in params:
    param.requires_grad_(requires_grad=True)  # track gradients for all parameters
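As an optional sanity check (not in the original code), we can print each parameter's shape to confirm the layer sizes:

for name, p in zip(['W1', 'b1', 'W2', 'b2'], params):
    print(name, tuple(p.shape))
# Expected: W1 (784, 256), b1 (256,), W2 (256, 10), b2 (10,)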
    
Defining the Activation Function

Here we implement ReLU using the max operation rather than calling a ready-made relu function directly.

def relu(X):
    # Elementwise max(X, 0), relying on broadcasting against a scalar tensor
    return torch.max(input=X, other=torch.tensor(0.0))
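As a quick illustration (values made up for the example), relu clamps negative entries to zero elementwise:

print(relu(torch.tensor([[-1.0, 2.0], [0.5, -3.0]])))
# tensor([[0.0000, 2.0000],
#         [0.5000, 0.0000]])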
Defining the Model

As in softmax regression, we use the view function to reshape each original image into a vector of length num_inputs. We then implement the computation of the multilayer perceptron.
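Written out, the model with one hidden layer computes

H = \mathrm{ReLU}(X W_1 + b_1), \qquad O = H W_2 + b_2

where X is a minibatch of flattened images, H is the hidden representation, and O holds the output scores (logits) for the 10 classes.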

def net(X):
    X = X.view((-1, num_inputs))        # flatten each image to a length-784 vector
    H = relu(torch.matmul(X, W1) + b1)  # hidden layer
    return torch.matmul(H, W2) + b2     # output layer (logits)
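To sanity-check the forward pass (an illustrative snippet, not part of the original), a random batch of two 28x28 images should yield two rows of 10 class scores:

X = torch.rand(2, 1, 28, 28)   # dummy batch: 2 images, 1 channel, 28x28 pixels
print(net(X).shape)            # expected: torch.Size([2, 10])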
Defining the Optimizer and the Loss Function
def sgd(params, lr, batch_size):
    for param in params:
#         param.data -= lr * param.grad / batch_size
        param.data -= lr * param.grad  # the loss below is PyTorch's cross entropy
# There is no need to divide the gradient by batch_size here: PyTorch already divides
# by it when computing the loss.
'''
mxnet's SoftmaxCrossEntropyLoss sums over the batch dimension during backpropagation,
whereas PyTorch's default is to average over the batch, so the loss computed with
PyTorch is much smaller than mxnet's, roughly 1/batch_size of the mxnet value, and the
backpropagated gradients are correspondingly smaller.
To get results comparable to the original book, the learning rate would have to be
scaled up by a factor of batch_size; the book's learning rate is 0.5, which would be
set to roughly 100 here.
Since PyTorch already divides by batch_size when computing the loss, this sgd does not
divide again.
'''
loss = torch.nn.CrossEntropyLoss()
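The note above relies on nn.CrossEntropyLoss averaging over the batch by default (reduction='mean'). A small sketch with made-up logits and labels (illustrative values only) shows how the mean and sum reductions relate:

logits = torch.randn(4, 10)           # dummy scores for 4 samples and 10 classes
labels = torch.randint(0, 10, (4,))   # dummy class labels
mean_loss = torch.nn.CrossEntropyLoss()(logits, labels)                # default: reduction='mean'
sum_loss = torch.nn.CrossEntropyLoss(reduction='sum')(logits, labels)
print(torch.allclose(mean_loss * 4, sum_loss))  # True: mean = sum / batch_size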
Training the Model
def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
        n += y.shape[0]
    return acc_sum / n
def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()

            # zero the gradients before backpropagation
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()

            l.backward()
            if optimizer is None:
                sgd(params, lr, batch_size)
            else:
                optimizer.step()  # used in the "Concise Implementation of Softmax Regression" section


            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))

num_epochs, lr = 5, 0.5
train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr)
epoch 1, loss 0.0031, train acc 0.702, test acc 0.775
epoch 2, loss 0.0019, train acc 0.821, test acc 0.807
epoch 3, loss 0.0016, train acc 0.843, test acc 0.831
epoch 4, loss 0.0015, train acc 0.855, test acc 0.818
epoch 5, loss 0.0014, train acc 0.863, test acc 0.816
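Since train_ch3 also accepts a built-in optimizer, an alternative call (a sketch using torch.optim.SGD with the same hyperparameters, not run in the original) would be:

optimizer = torch.optim.SGD(params, lr=0.5)
train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, optimizer=optimizer)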

Summary

  • A multilayer perceptron, including its model and parameters, can be implemented by hand fairly easily.
  • When the multilayer perceptron has more layers, this manual implementation becomes tedious, especially when defining the model parameters.
