Foreword
We have previously studied single-layer neural networks, including linear regression and softmax regression. However, deep learning mainly focuses on multi-layer models. This time, we implement a simple multilayer perceptron (MLP) from scratch.
1. Import libraries
import torch
import numpy as np
import sys
sys.path.append("..")
import d2lzh_pytorch as d2l
2. Steps
1. Read data
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
This loads the Fashion-MNIST training and test sets as iterators over mini-batches of batch_size samples.
2. Parameter setting
num_inputs, num_outputs, num_hiddens = 784, 10, 256
W1 = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_hiddens)), dtype=torch.float)
b1 = torch.zeros(num_hiddens, dtype=torch.float)
W2 = torch.tensor(np.random.normal(0, 0.01, (num_hiddens, num_outputs)), dtype=torch.float)
b2 = torch.zeros(num_outputs, dtype=torch.float)
params = [W1, b1, W2, b2]
for param in params:
    param.requires_grad_(requires_grad=True)
3. Activation function and model definition
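def relu(X):
    # Element-wise maximum of X and 0
    return torch.max(input=X, other=torch.tensor(0.0))

def net(X):
    # Flatten each image into a vector of length num_inputs
    X = X.view((-1, num_inputs))
    # Hidden layer followed by the ReLU activation
    H = relu(torch.matmul(X, W1) + b1)
    # Output layer (raw scores, no softmax)
    return torch.matmul(H, W2) + b2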
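As a quick sanity check of the network defined above, the following minimal sketch traces the tensor shapes through one forward pass. It is only an illustration: the parameters are created with torch.randn instead of np.random.normal for brevity, and fake_batch is a hypothetical stand-in for a mini-batch of four 28x28 images, not real Fashion-MNIST data.

```python
import torch

num_inputs, num_outputs, num_hiddens = 784, 10, 256

# Same shapes and initialization scale as in the text; torch.randn is used here
# instead of np.random.normal purely for brevity (an assumption of this sketch).
W1 = (torch.randn(num_inputs, num_hiddens) * 0.01).requires_grad_(True)
b1 = torch.zeros(num_hiddens, requires_grad=True)
W2 = (torch.randn(num_hiddens, num_outputs) * 0.01).requires_grad_(True)
b2 = torch.zeros(num_outputs, requires_grad=True)

def relu(X):
    return torch.max(input=X, other=torch.tensor(0.0))

def net(X):
    X = X.view((-1, num_inputs))        # (batch, 1, 28, 28) -> (batch, 784)
    H = relu(torch.matmul(X, W1) + b1)  # (batch, 256)
    return torch.matmul(H, W2) + b2     # (batch, 10)

fake_batch = torch.rand(4, 1, 28, 28)   # hypothetical stand-in for 4 images
logits = net(fake_batch)
print(logits.shape)
```

Each row of the result holds ten raw class scores for one image; the softmax is applied later, inside the loss function.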
The meaning of the other parameter in torch.max(input=X, other=torch.tensor(0.0)) can be understood through a short piece of test code:
import torch
import numpy as np
import sys
sys.path.append("..")
import d2lzh_pytorch as d2l

def relu(X):
    return torch.max(input=X, other=torch.tensor(0.0))

X = torch.tensor(np.random.normal(0, 0.01, (2, 2)), dtype=torch.float)
Y = relu(X)
print(X, '\n', Y)
The output is as follows:
tensor([[-0.0035, 0.0065],
        [-0.0013, 0.0013]])
tensor([[0.0000, 0.0065],
        [0.0000, 0.0013]])
It can be seen that the relu function defined above behaves as this activation function should. The other parameter supplies a tensor to compare against: wherever an element of the input tensor X is smaller than the corresponding element of other, it is replaced by the value from other; otherwise it is left unchanged. In other words, this form of torch.max computes an element-wise maximum.
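To make the element-wise behavior concrete, here is a small sketch on a fixed tensor. It compares the torch.max form used in this article with two other ways PyTorch offers to get the same result, torch.clamp and torch.relu. The input values are made up for illustration.

```python
import torch

X = torch.tensor([[-1.5, 0.0], [2.0, -0.25]])

a = torch.max(input=X, other=torch.tensor(0.0))  # form used in the text
b = torch.clamp(X, min=0.0)                      # raise everything below 0 up to 0
c = torch.relu(X)                                # built-in ReLU

print(a)
print(torch.equal(a, b), torch.equal(a, c))
```

All three give the same tensor: negative entries become 0 and non-negative entries are kept.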
torch.matmul(X, W1)
torch.matmul() performs tensor multiplication, and its inputs can be high-dimensional.
When both inputs are two-dimensional, it is ordinary matrix multiplication, and it is used in the same way as tensor.mm().
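The difference between the two-dimensional and the high-dimensional case can be checked with a short sketch. The shapes below are arbitrary examples, not taken from the model.

```python
import torch

A = torch.rand(3, 4)
B = torch.rand(4, 5)

# For 2-D inputs, torch.matmul and torch.mm give the same result.
same_2d = torch.allclose(torch.matmul(A, B), torch.mm(A, B))

# With a leading batch dimension, matmul multiplies each matrix in the batch;
# torch.mm only accepts 2-D tensors and would raise an error here.
batch_A = torch.rand(2, 3, 4)
batched = torch.matmul(batch_A, B)  # B is broadcast across the batch

print(same_2d, batched.shape)
```

In net(), X has already been reshaped to two dimensions, so torch.matmul(X, W1) is a plain matrix product.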
4. Loss function
loss = torch.nn.CrossEntropyLoss()
For a detailed explanation of this function, see the article "Detailed understanding of the cross-entropy loss function".
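One property of torch.nn.CrossEntropyLoss() matters for the training step later on: with its default setting (reduction='mean') it averages the loss over the batch. The sketch below checks this against a manual log-softmax computation. The logits and labels are random, hypothetical values.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 10)          # hypothetical scores: 8 samples, 10 classes
labels = torch.randint(0, 10, (8,))  # hypothetical class indices

mean_loss = torch.nn.CrossEntropyLoss()(logits, labels)                # default: mean over batch
sum_loss = torch.nn.CrossEntropyLoss(reduction='sum')(logits, labels)  # summed over batch

# Manual version: log-softmax, then pick the log-probability of the true class.
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(8), labels].mean()

print(torch.allclose(mean_loss, manual))
print(torch.allclose(mean_loss, sum_loss / 8))
```

Because the default loss is already a mean, its gradients are already scaled by 1 / batch_size.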
5. Training the model
Code from the book:
num_epochs, lr = 5, 100.0
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr)
You may have noticed that lr=100.0 here. When we studied linear regression, we said that the learning rate should not be too large, so why is lr so large here?
If we forcibly change the learning rate to lr=0.1, the output is:
epoch 1, loss 0.0090, train acc 0.139, test acc 0.163
epoch 2, loss 0.0090, train acc 0.190, test acc 0.210
epoch 3, loss 0.0090, train acc 0.237, test acc 0.251
epoch 4, loss 0.0089, train acc 0.284, test acc 0.293
epoch 5, loss 0.0089, train acc 0.320, test acc 0.324
The accuracy after five epochs is very low (about 32% on the test set), so simply lowering the learning rate is not advisable.
Looking up the definition of sgd(), we find:
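def sgd(params, lr, batch_size):
    # To stay consistent with the original book, we divide by batch_size here, but this should not
    # be necessary, because a loss computed with PyTorch is generally already averaged over the batch dimension by default.
    for param in params:
        param.data -= lr * param.grad / batch_size  # note that param.data is used when updating param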
The sgd() function is defined in the d2lzh_pytorch package provided with the book; it is not an official PyTorch function. The comment in its definition states that the loss functions officially provided by PyTorch already average along the batch dimension, and that the division by batch_size is kept only for consistency with the original book, although it should not be needed.
Because our model uses the official PyTorch torch.nn.CrossEntropyLoss(), which averages along the batch dimension, there is no need to divide by batch_size in sgd(). Dividing again shrinks every update by a further factor of batch_size, which is why the book needs such a large lr.
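The effect of the extra division can be put into numbers. With a mean-reduced loss, the update in sgd() is param -= (lr / batch_size) * grad, so the step size actually applied is lr / batch_size. The helper name effective_lr below is hypothetical and used only for this illustration.

```python
def effective_lr(lr, batch_size):
    # Step size actually applied by sgd() when the loss is already a batch mean.
    return lr / batch_size

batch_size = 256
for lr in (100.0, 0.1):
    print(lr, effective_lr(lr, batch_size))
```

With batch_size = 256, lr=100.0 corresponds to an effective step of roughly 0.39, while lr=0.1 corresponds to roughly 0.0004, which is consistent with the very slow progress seen in the lr=0.1 run.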
The sgd function was originally defined in the study notes for "Hands-on Deep Learning + PyTorch" 3.2, Linear Regression Implementation from Scratch. In that post, the loss function and the parameter-update function sgd() either both came from the d2lzh_pytorch package or were both official PyTorch, so batch_size was divided out only once and there was no problem. In this article, however, the loss function is the official one, while d2l.train_ch3(), which is used to train the model, relies on the sgd() from the d2lzh_pytorch package, so this problem arises.
So we comment out the division by batch_size in sgd(), and the code for training the model becomes:
num_epochs, lr = 5, 0.2
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr)
Output:
epoch 1, loss 0.0034, train acc 0.690, test acc 0.786
epoch 2, loss 0.0020, train acc 0.814, test acc 0.783
epoch 3, loss 0.0018, train acc 0.837, test acc 0.805
epoch 4, loss 0.0017, train acc 0.849, test acc 0.824
epoch 5, loss 0.0016, train acc 0.858, test acc 0.847
The accuracy is now reasonable: after five epochs it reaches about 86% on the training set and about 85% on the test set.
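The modification described above, removing the division by batch_size from sgd(), might look like the following minimal sketch. The name sgd_no_div is hypothetical, and the update is written with torch.no_grad() rather than param.data, which is a stylistic substitution and not the code from the d2lzh_pytorch package. It is checked on a single toy parameter instead of the full model.

```python
import torch

def sgd_no_div(params, lr):
    # Hypothetical variant of sgd(): same update, without dividing by batch_size.
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad

# Toy check on one parameter with a mean-reduced loss.
w = torch.tensor([1.0, -2.0], requires_grad=True)
loss = (w ** 2).mean()  # gradient is 2 * w / 2 = w
loss.backward()
sgd_no_div([w], lr=0.2)
print(w)
```

Here each entry of w moves by 0.2 times its gradient, so lr is the step size that is really applied, which is why an ordinary value such as 0.2 works once the division is gone.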
Summary
These are study notes for "Hands-on Deep Learning + PyTorch" 3.9: implementing a multilayer perceptron (MLP) from scratch.