"Hands-on deep learning + PyTorch" 3.9 multi-layer perceptron (MLP) realizes study notes from scratch


Foreword

We have already studied single-layer neural networks, including linear regression and softmax regression. Deep learning, however, mainly focuses on multi-layer models. This time we will implement a simple multilayer perceptron (MLP) from scratch.


One. Import libraries

import torch
import numpy as np
import sys
sys.path.append("..")
import d2lzh_pytorch as d2l

Two. Steps

1. Read data

batch_size = 256
train_iter , test_iter = d2l.load_data_fashion_mnist(batch_size)

This reads the Fashion-MNIST training and test sets.
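As a quick sanity check (my own snippet, not from the book), we can peek at one batch to confirm its shape:

for X, y in train_iter:
    print(X.shape, y.shape)  # expect torch.Size([256, 1, 28, 28]) torch.Size([256])
    break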

2. Parameter setting

num_inputs, num_outputs, num_hiddens = 784, 10, 256

W1 = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_hiddens)), dtype=torch.float)
b1 = torch.zeros(num_hiddens, dtype=torch.float)
W2 = torch.tensor(np.random.normal(0, 0.01, (num_hiddens, num_outputs)), dtype=torch.float)
b2 = torch.zeros(num_outputs, dtype=torch.float)

params =[W1, b1, W2, b2]
for param in params:
    param.requires_grad_(requires_grad=True)
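As a side note, the same initialization can be written directly in PyTorch without going through NumPy (my own variant, not the book's code):

# Multiplying standard normal samples by 0.01 gives the same N(0, 0.01) initialization
# as np.random.normal(0, 0.01, ...), and torch.randn already returns float32 tensors.
W1 = torch.randn(num_inputs, num_hiddens) * 0.01
b1 = torch.zeros(num_hiddens)
W2 = torch.randn(num_hiddens, num_outputs) * 0.01
b2 = torch.zeros(num_outputs)

params = [W1, b1, W2, b2]
for param in params:
    param.requires_grad_(requires_grad=True)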

3. Activation function and network

def relu(X):
    return torch.max(input=X, other=torch.tensor(0.0))

def net(X):
    X =X.view((-1, num_inputs))
    H = relu(torch.matmul(X, W1) + b1)
    return torch.matmul(H, W2) + b2
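A quick forward-pass check (my addition, not in the book): a dummy batch pushed through net should give one score per class for each sample:

X_dummy = torch.randn(2, 1, 28, 28)  # two fake 28x28 "images"
print(net(X_dummy).shape)            # expect torch.Size([2, 10])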

torch.max(input=X, other=torch.tensor(0.0))

The meaning of the other parameter in this call can be understood from a short piece of test code:

import torch
import numpy as np
import sys
sys.path.append("..")
import d2lzh_pytorch as d2l

def relu(X):
    return torch.max(input=X, other=torch.tensor(0.0))

X = torch.tensor(np.random.normal(0, 0.01, (2, 2)), dtype=torch.float)
Y = relu(X)

print(X, '\n', Y)

The output is as follows:

tensor([[-0.0035,  0.0065],
        [-0.0013,  0.0013]])
tensor([[0.0000, 0.0065],
        [0.0000, 0.0013]])

It can be seen that the relu function we defined does what an activation function should. The other parameter supplies a second tensor: torch.max compares the input X with it element-wise (the scalar 0.0 is broadcast), so wherever X is smaller than the corresponding value of other, the result takes the value from other; otherwise the value of X is kept unchanged.

torch.matmul(X, W1)

torch.matmul() is tensor multiplication, and its inputs can be high-dimensional.
When both inputs are two-dimensional it is ordinary matrix multiplication, the same as torch.mm().
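A small check (my own example) that the two functions agree on two-dimensional inputs, while torch.matmul() also handles batched inputs:

A = torch.randn(2, 3)
B = torch.randn(3, 4)
# For 2-D tensors, matmul is ordinary matrix multiplication, identical to mm
print(torch.allclose(torch.matmul(A, B), torch.mm(A, B)))  # True

# matmul also broadcasts over leading (batch) dimensions, which mm does not support
A_batch = torch.randn(5, 2, 3)
B_batch = torch.randn(5, 3, 4)
print(torch.matmul(A_batch, B_batch).shape)                # torch.Size([5, 2, 4])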

4. Loss function

loss = torch.nn.CrossEntropyLoss()

For a detailed understanding of this function, see: Detailed understanding of the cross-entropy loss function
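The point that matters later is that torch.nn.CrossEntropyLoss takes raw logits and, with its default reduction='mean', already averages over the batch. A minimal check (my own snippet):

logits = torch.randn(4, 10)               # 4 samples, 10 classes, unnormalized scores
labels = torch.tensor([1, 0, 3, 9])

per_sample = torch.nn.CrossEntropyLoss(reduction='none')(logits, labels)
mean_loss = torch.nn.CrossEntropyLoss()(logits, labels)    # default reduction='mean'
print(torch.isclose(mean_loss, per_sample.mean()))         # True: already averaged over the batch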

5. Train the model

Code from the book:

num_epochs, lr = 5, 100.0
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr)
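For context, the inner loop of d2l.train_ch3 looks roughly like the sketch below (paraphrased from the book's d2lzh_pytorch package, not the exact source); the point is that when no optimizer is passed in, it calls the package's own sgd(params, lr, batch_size) after backpropagation:

# Simplified sketch of one epoch inside d2l.train_ch3 (paraphrased)
for X, y in train_iter:
    y_hat = net(X)
    l = loss(y_hat, y)               # already the batch mean with CrossEntropyLoss

    for param in params:             # clear old gradients on the hand-made parameters
        if param.grad is not None:
            param.grad.data.zero_()

    l.backward()
    d2l.sgd(params, lr, batch_size)  # the sgd() discussed below, which divides by batch_size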

I did not understand lr = 100.0 at first: when learning linear regression we were told that the learning rate should not be too large, so why is lr so large here?
If we simply force the learning rate down to lr = 0.1:

epoch 1, loss 0.0090, train acc 0.139, test acc 0.163
epoch 2, loss 0.0090, train acc 0.190, test acc 0.210
epoch 3, loss 0.0090, train acc 0.237, test acc 0.251
epoch 4, loss 0.0089, train acc 0.284, test acc 0.293
epoch 5, loss 0.0089, train acc 0.320, test acc 0.324

The accuracy of the trained model is very low, so simply forcing the learning rate down is not the answer.

Looking up the definition of sgd():

def sgd(params, lr, batch_size):
    # To stay consistent with the original book, we divide by batch_size here, but the
    # division should not be needed: PyTorch loss functions usually average over the
    # batch dimension by default.
    for param in params:
        param.data -= lr * param.grad / batch_size  # note that param.data is used when updating param

We know that this sgd() lives in the d2lzh_pytorch package that accompanies the book; it is not provided by PyTorch itself. The comment in the definition points out that PyTorch's official loss functions already average over the batch dimension, so the division by batch_size is kept only for consistency with the original book and is not actually needed.
Because our model uses the official torch.nn.CrossEntropyLoss(), which averages over the batch dimension, there is no need to divide by batch_size again inside sgd().

The sgd function was originally defined in the study notes for "Hands-on Deep Learning + PyTorch" 3.2, Linear Regression Implementation from Scratch. In that post the loss function and sgd() came as a pair, either both from the d2lzh_pytorch package or both official PyTorch, so the division by batch_size happened exactly once and there was no problem. In this article, however, the loss function is the official PyTorch one (which already averages over the batch), while the training function d2l.train_ch3() calls the sgd() from d2lzh_pytorch, which divides by batch_size again. That also explains the book's lr = 100.0: with batch_size = 256, the effective learning rate is 100 / 256, roughly 0.39.
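For clarity, after commenting out the division, the sgd() used below looks like this (my edit to the d2lzh_pytorch function, not the original code):

def sgd(params, lr, batch_size):
    # The loss from torch.nn.CrossEntropyLoss is already averaged over the batch,
    # so the gradient is no longer divided by batch_size here.
    for param in params:
        param.data -= lr * param.grad  # / batch_size  <- division commented out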

So we comment out the division by batch_size in sgd() (as sketched above), and the code for training the model becomes:

num_epochs, lr = 5, 0.2
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr)

Output:

epoch 1, loss 0.0034, train acc 0.690, test acc 0.786
epoch 2, loss 0.0020, train acc 0.814, test acc 0.783
epoch 3, loss 0.0018, train acc 0.837, test acc 0.805
epoch 4, loss 0.0017, train acc 0.849, test acc 0.824
epoch 5, loss 0.0016, train acc 0.858, test acc 0.847

It can be seen that the accuracy is back to a reasonable level.


Summary

These are my study notes for "Hands-on Deep Learning + PyTorch" 3.9: implementing a multilayer perceptron (MLP) from scratch.
