Handwritten digit recognition (MNIST) with a convolutional neural network in PyTorch

This article uses PyTorch to complete a very classic task: handwritten digit recognition. The data set is MNIST, the handwritten digit set constructed by LeCun et al. in the 1990s. The focus of this article is on data processing. Although the torchvision library can load the training data directly, doing so hides how the data is handled, so this article reads the raw files itself and wraps them with the Dataset and DataLoader classes, which lets you follow the whole data processing pipeline.

        First, import all the libraries needed for this project and define its hyperparameters.

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from torch.optim.lr_scheduler import ExponentialLR
import torch.nn as nn


epochs = 10
batch_size = 128
learning_rate = 0.01

1 Data processing

1.1 Dataset structure

        The MNIST data set can be downloaded from the official page. It consists of four files: training images, training labels, test images and test labels, all stored in binary form. In an image file, the first 16 bytes are a header describing the image set, and the pixel values of the images follow in order, one byte per pixel. In a label file, the first 8 bytes are a header describing the label set, and the label values follow, one byte per label.
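        To make the header layout concrete, here is a minimal sketch that decodes the header of an IDX file using only the standard library. It assumes the usual IDX convention: a big-endian 32-bit magic number (2051 for image files, 2049 for label files) whose last byte is the number of dimensions, followed by one big-endian 32-bit size per dimension. The function name parse_idx_header and the synthetic byte strings are illustrative and not part of the original code.

```python
import struct


def parse_idx_header(raw: bytes):
    """Return (magic, dims, offset of the first data byte) for raw IDX bytes."""
    # The magic number is a big-endian unsigned 32-bit integer.
    magic, = struct.unpack(">I", raw[:4])
    # Its lowest byte holds the number of dimensions (3 for images, 1 for labels).
    ndim = magic & 0xFF
    # Each dimension size is another big-endian unsigned 32-bit integer.
    dims = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    return magic, dims, 4 + 4 * ndim


# Synthetic stand-ins for real files: 2 blank 28x28 images and 2 labels.
fake_images = struct.pack(">IIII", 2051, 2, 28, 28) + bytes(2 * 28 * 28)
fake_labels = struct.pack(">II", 2049, 2) + bytes([7, 3])

print(parse_idx_header(fake_images))  # (2051, (2, 28, 28), 16)
print(parse_idx_header(fake_labels))  # (2049, (2,), 8)
```

        The returned offsets, 16 and 8, are exactly the numbers of header bytes that a reader has to skip before the pixel or label data starts.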

1.2 Load training data

        Download and unzip the four files, and we can start reading the data. Following the layout described in 1.1, we skip each header and read the remaining bytes into numpy arrays.

# Paths to the data set files
images_train = 'D:\\datasets\\MNIST\\train-images.idx3-ubyte'
images_test = 'D:\\datasets\\MNIST\\t10k-images.idx3-ubyte'
labels_train = 'D:\\datasets\\MNIST\\train-labels.idx1-ubyte'
labels_test = 'D:\\datasets\\MNIST\\t10k-labels.idx1-ubyte'


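# Read the data into numpy arrays
def read_images(path):
    with open(path, 'rb') as f:
        f.read(16)  # skip the 16-byte header
        tmp = f.read()
        # one byte per pixel; -1 lets numpy infer the number of images
        images = np.frombuffer(tmp, dtype=np.uint8).astype("float32").reshape(-1, 1, 28, 28)
    return images


def read_labels(path):
    with open(path, 'rb') as f:
        f.read(8)  # skip the 8-byte header
        tmp = f.read()
        # one byte per label; CrossEntropyLoss expects int64 class indices
        labels = np.frombuffer(tmp, dtype=np.uint8).astype("int64")
    return labels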

        After processing by the function read_images, the image set becomes a numpy array of shape (N, 1, H, W). N is the number of pictures; each picture has shape (1, H, W), where H and W are its height and width. The picture set is stored as (N, 1, H, W) instead of (N, H, W) because the convolution layers expect a channel dimension: the grayscale input has a single channel, and the features extracted later by the network have many more (32, 64 and 128 in the network below), so that dimension has to be reserved from the start.

        After processing by the function read_labels, the label set becomes a one-dimensional array of int64 class indices.
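        The resulting shapes and dtypes can be checked without the real files. The sketch below repeats the same frombuffer, astype and reshape steps on synthetic bytes; the three blank images and the three label values are made up for illustration.

```python
import numpy as np

# Synthetic stand-ins for the bytes that follow the headers:
# 3 blank 28x28 images and 3 labels (assumed values, not real MNIST data).
raw_pixels = bytes(3 * 28 * 28)
raw_labels = bytes([5, 0, 4])

# Same conversion as read_images: one byte per pixel, then add the channel dimension.
images = np.frombuffer(raw_pixels, dtype=np.uint8).astype("float32").reshape(-1, 1, 28, 28)
# Same conversion as read_labels: one byte per label, stored as int64.
labels = np.frombuffer(raw_labels, dtype=np.uint8).astype("int64")

print(images.shape, images.dtype)  # (3, 1, 28, 28) float32
print(labels.shape, labels.dtype)  # (3,) int64
```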

        At this point, we have read the images and the labels separately. Next, we need to pair each picture with its label, which can be done with the Dataset class (torch.utils.data.Dataset) in PyTorch. Note that a custom data set must inherit from Dataset and, according to the official documentation, override __getitem__() and __len__(). As the code below shows, the correspondence between images and labels is established in __getitem__().

        After establishing this correspondence, we also need to build an iterator with DataLoader (torch.utils.data.DataLoader), which splits the entire data set into batches of batch_size samples and feeds them to the network one batch at a time during training.

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("Running on", device, '\n')


class dataset(Dataset):
    def __init__(self, images, labels):
        self.images = torch.tensor(images).to(device)
        self.labels = torch.tensor(labels).to(device)

    def __getitem__(self, item):
        return self.images[item], self.labels[item]

    def __len__(self):
        return self.images.shape[0]


train_data = dataset(read_images(images_train), read_labels(labels_train))
train_dataloader = DataLoader(train_data, batch_size=batch_size, shuffle=True, drop_last=False)
test_data = dataset(read_images(images_test), read_labels(labels_test))
test_dataloader = DataLoader(test_data, batch_size=batch_size, shuffle=False, drop_last=False)

        The first two lines of this code select the device on which the code will run. Note that to use the GPU, both the data and the network weights must be stored on the GPU; here the data set moves the complete image and label tensors to the device in __init__().
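        The batching behaviour of DataLoader can be seen on a small example. This is a sketch on hypothetical toy data (300 blank images wrapped in a TensorDataset rather than the dataset class above), using the same batch_size and drop_last settings as the article.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 300 blank single-channel images with labels 0..9.
images = torch.zeros(300, 1, 28, 28)
labels = torch.arange(300) % 10

loader = DataLoader(TensorDataset(images, labels), batch_size=128,
                    shuffle=False, drop_last=False)

# Collect the number of samples in each batch.
sizes = [batch_images.shape[0] for batch_images, _ in loader]
print(len(loader), sizes)  # 3 [128, 128, 44]
```

        With drop_last=False the final, smaller batch is kept, so every sample is visited once per epoch.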

2 Network Construction

        The network is the core of this task, and you can design it freely. As an example, here is MyNet, a network I designed myself:

class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        #(1, 28, 28) -> (32, 12, 12)
        self.conv1 = nn.Sequential(nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5, stride=1),
                                   nn.BatchNorm2d(32),
                                   nn.ReLU(),
                                   nn.MaxPool2d(kernel_size=2))

        #(32, 12, 12) -> (64, 23, 23)
        self.convtrans = nn.Sequential(nn.ConvTranspose2d(in_channels=32, out_channels=64, kernel_size=2, stride=4),
                                       nn.BatchNorm2d(64),
                                       nn.ReLU(),
                                       nn.MaxPool2d(kernel_size=2))

        #(64, 23, 23) -> (128, 5, 5)
        self.conv2 = nn.Sequential(nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=2),
                                   nn.BatchNorm2d(128),
                                   nn.ReLU(),
                                   nn.MaxPool2d(kernel_size=2))

        #(128, 5, 5) -> (3200,)
        self.flatten = nn.Flatten(start_dim=1, end_dim=-1)

        self.linear = nn.Linear(3200, 10)

    def forward(self, input):
        input = self.conv1(input)
        input = self.convtrans(input)
        input = self.conv2(input)
        input = self.flatten(input)
        output = self.linear(input)
        return output
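        The shape comments in MyNet can be checked by hand. The sketch below applies the standard output-size formulas for convolution, transposed convolution and max pooling, assuming no padding, dilation 1 and a pooling stride equal to the kernel size (the defaults used above). The helper names are illustrative only.

```python
def conv_out(n, kernel, stride=1, padding=0):
    # Output size of a convolution along one spatial dimension.
    return (n + 2 * padding - kernel) // stride + 1


def conv_transpose_out(n, kernel, stride=1, padding=0):
    # Output size of a transposed convolution (dilation 1, no output padding).
    return (n - 1) * stride - 2 * padding + kernel


def pool_out(n, kernel):
    # Max pooling whose stride equals its kernel size.
    return (n - kernel) // kernel + 1


after_conv1 = pool_out(conv_out(28, kernel=5, stride=1), 2)                       # 24 -> 12
after_convtrans = pool_out(conv_transpose_out(after_conv1, kernel=2, stride=4), 2)  # 46 -> 23
after_conv2 = pool_out(conv_out(after_convtrans, kernel=3, stride=2), 2)          # 11 -> 5
flattened = 128 * after_conv2 * after_conv2

print(after_conv1, after_convtrans, after_conv2, flattened)  # 12 23 5 3200
```

        The last value matches the 3200 input features of the final linear layer.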

3 Model loading

        This part instantiates the network defined above and moves it to the computing device (CPU or GPU). It also defines the optimizer, the learning rate decay strategy and the loss function.

model = MyNet()
model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)
scheduler = ExponentialLR(optimizer, gamma=0.7)
loss_calculator = nn.CrossEntropyLoss().to(device)
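        To see what the exponential decay does to the learning rate, the values can be computed by hand. ExponentialLR multiplies the learning rate by gamma at every scheduler step; the sketch below assumes one step per epoch and reuses the article's hyperparameters in plain Python.

```python
learning_rate = 0.01
gamma = 0.7
epochs = 10

# Learning rate in effect during each epoch, assuming one scheduler step per epoch.
schedule = [learning_rate * gamma ** epoch for epoch in range(epochs)]

for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr:.6f}")
```

        With these settings the learning rate falls from 0.01 in the first epoch to roughly 0.0004 in the tenth.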

4 Training and Testing

        In this article, the model is evaluated on the test set every 100 training batches to monitor how training is going. The training and testing code is as follows:

def train(train_dataloader, test_dataloader, optimizer, loss_calculator):
    for epoch in range(epochs):
        for batch, (images, labels) in enumerate(train_dataloader):
            model.train()  # test() switches to eval mode, so switch back before each step
            preds = model(images)
            loss = loss_calculator(preds, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if batch % 100 == 0:
                print("Epoch: {: <5}Batch: {: <8}Train_loss: {: <15.6f}".format(epoch, batch, loss.item()), end='')
                test(test_dataloader)
        scheduler.step()  # decay the learning rate once per epoch


def test(test_dataloader):
    match = 0
    total = 0
    model.eval()  # make BatchNorm use its running statistics instead of batch statistics
    with torch.no_grad():
        for imgs, labels in test_dataloader:
            preds = model(imgs)
            preds = torch.max(preds, 1)[1]  # index of the largest score = predicted class
            match += int((preds == labels).sum())
            total += int(labels.size()[0])
    accuracy = match/total
    print('Test accuracy: {:.2f}%'.format(accuracy*100))
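        The prediction step in test() relies on torch.max(preds, 1)[1], which returns the index of the largest score in each row, i.e. the predicted class. The sketch below illustrates this on made-up scores for three samples and three classes; the numbers are hypothetical.

```python
import torch

# Hypothetical network outputs for 3 samples and 3 classes.
scores = torch.tensor([[0.1, 2.0, -1.0],
                       [1.5, 0.2, 0.3],
                       [0.0, 0.1, 0.9]])
labels = torch.tensor([1, 2, 2])

# torch.max along dim 1 returns (values, indices); the indices are the predicted classes.
preds = torch.max(scores, 1)[1]
same_as_argmax = torch.equal(preds, scores.argmax(dim=1))

match = int((preds == labels).sum())
accuracy = match / labels.size(0)

print(preds.tolist(), match, same_as_argmax)  # [1, 0, 2] 2 True
```

        Two of the three predictions agree with the labels, so the accuracy here is 2/3.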

5 Run the code and save the model

train(train_dataloader, test_dataloader, optimizer, loss_calculator)
torch.save(model, "MyNet_BS{}_LR{}_ExpLR{}.pth".format(batch_size, learning_rate, 0.7))

There are many ways to save and load a model; they are not covered in this article.

The complete code for this article is available here: https://download.csdn.net/download/diqiudq/85593886 . If you do not have enough points to download it, you can copy all of the code segments above into a single file and run it.


Origin blog.csdn.net/diqiudq/article/details/124419577