[PyTorch] Learning record (seven): the MNIST multi-classification problem

When dealing with multi-classification problems, a classifier called softmax is used to map the outputs into probabilities in (0, 1). This lecture starts from the softmax classifier to implement multi-classification. In the previous chapter we performed binary classification on the diabetes model; in a binary classification problem only one probability needs to be output, and the other is obtained by subtracting it from 1. Multi-classification, however, needs to output multiple probabilities.

This time we use the MNIST handwritten-digit dataset. First, let's look at what the output should be like with ten categories: the ten probabilities should all be greater than 0 and sum to 1. If each output were treated independently, we could get contradictory results such as P(y=1)=0.8 and P(y=2)=0.9 at the same time; so once we find a large P(y=1), the probabilities of the remaining classes need to be suppressed accordingly, because the classes are mutually exclusive.

The raw values computed by the neural network may be negative, and their sum is generally not 1, so we bring in softmax. The softmax formula is as follows:

P(y=i)=\frac{e^{z_i}}{\sum_{j=0}^{K-1}e^{z_j}},\quad i\in\{0,\dots,K-1\}

Here z_i is the output of the last linear layer: e^{z_i} forces every term to be greater than 0, and the denominator normalizes the terms so that the probabilities sum to 1 after the Σ, which realizes both requirements. A schematic diagram of the role of softmax is shown in Figure 1.

Figure 1 Softmax processing principle
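
As a quick worked example (using the same logits as the NumPy code below): the exponentials of z = (0.2, 0.1, -0.1) sum to about 3.231, so

\text{softmax}(0.2,\,0.1,\,-0.1)\approx(0.378,\,0.342,\,0.280)

which is indeed all positive and sums to 1.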

Next we address the loss function for the multi-classification problem. It is very simple: take the predicted probability of the class whose label is 1, take its logarithm, and add a negative sign.

Figure 2 loss function
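
Written out with a one-hot label vector y and predicted distribution \hat{y}, this is the cross-entropy loss:

\text{Loss}(\hat{y},y)=-\sum_{i=0}^{K-1}y_i\log\hat{y}_i

Since only the true class has y_i = 1, the sum reduces to -log of the probability predicted for the true class.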

The following is the NumPy code for computing the loss value:

import numpy as np

y = np.array([1, 0, 0])                # one-hot label: the true class is class 0
z = np.array([0.2, 0.1, -0.1])         # raw outputs (logits) of the last linear layer
y_pred = np.exp(z) / np.exp(z).sum()   # softmax
loss = (-y * np.log(y_pred)).sum()     # cross-entropy: -log of the true class's probability

print(loss)  # ≈ 0.9729

PyTorch provides a ready-made cross-entropy loss, torch.nn.CrossEntropyLoss, which includes the whole process from softmax to the final loss. So you only need to feed the framework the raw results (logits) computed by the neural network; no activation is required on the last layer. The following is the complete code for computing the loss value:

import torch

criterion = torch.nn.CrossEntropyLoss()
Y = torch.LongTensor([2, 0, 1])          # labels as class indices, one per sample
Y_pred1 = torch.Tensor([[0.1, 0.2, 0.9],
                        [1.1, 0.1, 0.2],
                        [0.2, 2.1, 0.1]])
Y_pred2 = torch.Tensor([[0.8, 0.2, 0.3],
                        [0.2, 0.3, 0.5],
                        [0.2, 0.2, 0.5]])

l1 = criterion(Y_pred1, Y)
l2 = criterion(Y_pred2, Y)
print("Batch Loss1 =", l1.data, "\nBatch Loss2 =", l2.data)

In this example, [2, 0, 1] holds the three labels as class indices: 2 means the third class of the first sample has the largest probability, and 0 means the first class of the second sample does. In Y_pred1, the first row [0.1, 0.2, 0.9] clearly assigns the highest score to the third class, matching the label 2, and the other rows match their labels as well. Y_pred2 looks like random guessing and matches the labels poorly, so its loss value is bound to be much larger.

The running results for the two loss values are as follows:

Batch Loss1 = tensor(0.4966)
Batch Loss2 = tensor(1.2389)
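
As a sanity check (this snippet is not from the original post), CrossEntropyLoss is equivalent to log-softmax followed by the negative log-likelihood of the target classes; a minimal sketch:

import torch
import torch.nn.functional as F

Y = torch.LongTensor([2, 0, 1])
Y_pred1 = torch.Tensor([[0.1, 0.2, 0.9],
                        [1.1, 0.1, 0.2],
                        [0.2, 2.1, 0.1]])

log_p = F.log_softmax(Y_pred1, dim=1)    # log of the softmax probabilities, shape (3, 3)
nll = -log_p[torch.arange(3), Y].mean()  # pick each sample's true-class entry, negate, average
print(nll)                               # tensor(0.4966), same as criterion(Y_pred1, Y)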

Let's come back to the MNIST handwritten-digit multi-classification problem. In the examples we discussed before, the input was a vector, while here the inputs are pictures. This is not difficult: you just need to map each picture to an image tensor, as shown in Figure 3.

Figure 3 Image to image tensor

The following is the preparation of the dataset. ToTensor converts the PIL image (28×28 pixels with values from 0 to 255) into a tensor of shape (1, 28, 28) with values in [0, 1]; Normalize then standardizes the data with the given mean and standard deviation, which were computed in advance over the entire MNIST dataset. A small sketch of the two transforms follows Figure 4.

Figure 4 Dataset preparation
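
A minimal sketch of the transform pipeline (same statistics as in the full code below; torchvision is assumed to be installed):

from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),                       # PIL image (28x28, 0-255) -> float tensor (1, 28, 28) in [0, 1]
    transforms.Normalize((0.1307,), (0.3081,))   # x = (x - mean) / std, with MNIST's precomputed statistics
])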

Next, let's look at the construction of the model. Since each input image has 28×28 = 784 pixels, we need to flatten the matrix row by row into a single vector of 784 values. A one-dimensional vector does not meet the input requirements, so we use the view() method to reshape the tensor into a second-order tensor (a matrix), setting the first argument to -1 so that the batch dimension is computed automatically. We then reduce 784 step by step through 512, 256, 128, and 64 down to 10, because there are 10 possible outputs. The reason we can't drop to 10 in one go is that too much information would be lost at once for the network to train effectively. ReLU activations are interspersed between the layers to complete the model. The process is shown in Figure 5.

Figure 5 Process of model building

The following is the code implementation of the model construction:

import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = torch.nn.Linear(784, 512)
        self.l2 = torch.nn.Linear(512, 256)
        self.l3 = torch.nn.Linear(256, 128)
        self.l4 = torch.nn.Linear(128, 64)
        self.l5 = torch.nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 784)  # flatten to (batch_size, 784); -1 infers the batch size
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        x = F.relu(self.l4(x))
        return self.l5(x)    # no activation on the last layer; CrossEntropyLoss handles softmax

model = Net()
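
A quick shape check (illustrative; the random batch below just stands in for real MNIST data):

x = torch.randn(64, 1, 28, 28)  # a fake batch of 64 single-channel 28×28 images
print(model(x).shape)           # torch.Size([64, 10]), one score per class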

The loss function and optimizer chosen here need some changes. Cross-entropy is used to compute the loss. For gradient descent we use momentum, which is introduced to speed up convergence and to help break out of local minima.
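
In update-rule form (roughly as documented for torch.optim.SGD, with dampening and Nesterov disabled), momentum keeps a running velocity of the gradients:

v_t=\mu v_{t-1}+g_t,\qquad w_t=w_{t-1}-\gamma v_t

where μ is the momentum coefficient (0.5 below), γ is the learning rate, and g_t is the current gradient.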

criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

The training process needs little extra explanation; the entire code is as follows:

import torch
from torchvision import transforms
from torchvision import datasets
from torch.utils.data import DataLoader
import torch.nn.functional as F
import torch.optim as optim
 
# prepare dataset
 
batch_size = 64
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])  # normalize with MNIST's precomputed mean and standard deviation
 
train_dataset = datasets.MNIST(root='../dataset/mnist/', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
test_dataset = datasets.MNIST(root='../dataset/mnist/', train=False, download=True, transform=transform)
test_loader = DataLoader(test_dataset, shuffle=False, batch_size=batch_size)
 
# design model using class
 
 
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = torch.nn.Linear(784, 512)
        self.l2 = torch.nn.Linear(512, 256)
        self.l3 = torch.nn.Linear(256, 128)
        self.l4 = torch.nn.Linear(128, 64)
        self.l5 = torch.nn.Linear(64, 10)
 
    def forward(self, x):
        x = x.view(-1, 784)  # -1 automatically infers the mini-batch size
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        x = F.relu(self.l4(x))
        return self.l5(x)  # no activation on the last layer, no nonlinear transform
 
 
model = Net()
 
# construct loss and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
 
# training cycle forward, backward, update
 
 
def train(epoch):
    running_loss = 0.0
    for batch_idx, data in enumerate(train_loader, 0):
        # get one batch of data and labels
        inputs, target = data
        optimizer.zero_grad()
        # get the model's predictions, shape (64, 10)
        outputs = model(inputs)
        # cross-entropy loss: outputs is (64, 10), target is (64,)
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step()
 
        running_loss += loss.item()
        if batch_idx % 300 == 299:
            print('[%d, %5d] loss: %.3f' % (epoch+1, batch_idx+1, running_loss/300))
            running_loss = 0.0
 
 
def test():
    correct = 0
    total = 0
    with torch.no_grad():
        for data in test_loader:
            images, labels = data
            outputs = model(images)
            _, predicted = torch.max(outputs.data, dim=1)  # max along dim=1 (the class dimension) gives each sample's predicted class
            total += labels.size(0)
            correct += (predicted == labels).sum().item()  # element-wise comparison between tensors
    print('accuracy on test set: %d %% ' % (100*correct/total))
 
 
if __name__ == '__main__':
    for epoch in range(10):
        train(epoch)
        test()
