PyTorch deep learning practice (6) - neural network performance optimization technology

0. Preface

We have learned the basic concepts of neural networks and seen how to build practical neural network models with the PyTorch library. We also mentioned that various hyperparameters can affect the accuracy of a neural network. In this section, we use the Fashion MNIST dataset to build a neural network model for image classification and compare the performance of models trained with different hyperparameters.

1. Data preparation

1.1 Dataset analysis

Fashion MNIST is a classic dataset for image classification that contains images of fashion items in 10 categories. Each sample is a 28 x 28 pixel grayscale image, and there are 60,000 training samples and 10,000 test samples in total. Because it is simple and easy to use, Fashion MNIST has become one of the benchmark datasets commonly used in academia and research to verify the performance of image classification algorithms.

1.2 Dataset loading

(1) First import the relevant libraries and download the dataset. torchvision contains multiple machine learning datasets, including Fashion MNIST:

from torchvision import datasets
import torch
data_folder = './data/FMNIST' # This can be any directory you want to download FMNIST to
fmnist = datasets.FashionMNIST(data_folder, download=True, train=True)

In the above code, data_folder specifies the folder in which the downloaded dataset is stored. Next, datasets.FashionMNIST fetches the fmnist data and stores it in data_folder. The parameter train=True specifies that the training images are loaded.

(2) Next, store the images available in fmnist.data as tr_images and the corresponding image labels (fmnist.targets) as tr_targets:

tr_images = fmnist.data
tr_targets = fmnist.targets

(3) Check the loaded tensor data:

unique_values = tr_targets.unique()
print(f'tr_images & tr_targets:\n\tX - {tr_images.shape}\n\tY - {tr_targets.shape}\n\tY - Unique Values : {unique_values}')
print(f'TASK:\n\t{len(unique_values)} class Classification')
print(f'UNIQUE CLASSES:\n\t{fmnist.classes}')

The code output is as follows:

tr_images & tr_targets:
        X - torch.Size([60000, 28, 28])
        Y - torch.Size([60000])
        Y - Unique Values : tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
TASK:
        10 class Classification
UNIQUE CLASSES:
        ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

The above results show that the training dataset has 60,000 images, each of size 28 x 28, in 10 possible categories. tr_targets contains the category label (as a numerical value) of each image, and fmnist.classes gives the category name corresponding to each numerical value in tr_targets.

(4) Draw random image samples.

Import related libraries for drawing images and processing image arrays:

import matplotlib.pyplot as plt
import numpy as np

To create a 10 x 10 image grid in which each row corresponds to a category, iterate over all categories (label_class) and get the row indices corresponding to the given category (label_x_rows):

R, C = len(tr_targets.unique()), 10
fig, ax = plt.subplots(R, C, figsize=(10,10))
for label_class, plot_row in enumerate(ax):
    label_x_rows = np.where(tr_targets == label_class)[0]

In the code above, we take index 0 of the output of np.where (a tuple of length 1), which contains all indices where the target value (tr_targets) equals label_class.

Loop 10 times to fill every cell of the row: pick a random index (ix) from the previously obtained indices (label_x_rows) of the given class and draw the corresponding image:

    for plot_cell in plot_row:
        plot_cell.grid(False); plot_cell.axis('off')
        ix = np.random.choice(label_x_rows)
        x, y = tr_images[ix], tr_targets[ix]
        plot_cell.imshow(x, cmap='gray')

sample image

In the figure above, each of the 10 rows shows different sample images belonging to the same class.
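The np.where indexing used to build the grid can be checked on a tiny label array. The sketch below is only an illustration: toy_targets is a hypothetical NumPy array standing in for tr_targets (which is a torch tensor in the article).

```python
import numpy as np

# Hypothetical toy labels standing in for tr_targets
toy_targets = np.array([0, 2, 1, 2, 0, 2])

# With a single condition, np.where returns a tuple holding one index array
# per dimension; for a 1-D input that tuple has length 1
result = np.where(toy_targets == 2)
rows_for_class_2 = result[0]

print(type(result).__name__, len(result))
print(rows_for_class_2)
```

Taking element 0 of the tuple therefore yields the array of positions whose label equals the requested class.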

2. Using PyTorch to train neural networks

Next, we will learn how to use PyTorch to train a neural network that predicts image categories from input images. We will also examine the impact of various hyperparameters on the prediction accuracy of the model.

2.1 Neural Network Training Process

Training a neural network with PyTorch usually requires the following steps:

Import related libraries
Build the dataset, taking one data point at a time
Wrap the dataset with a DataLoader
Build the model, and define the loss function and optimizer
Define two functions for training and validation on a batch of data respectively
Define a function for the accuracy of model predictions
Update the model weights on each batch of data, and train the model over multiple epochs

2.2 PyTorch neural network training

(1) Import the relevant libraries and the Fashion MNIST dataset:

from torch.utils.data import Dataset, DataLoader
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

device = "cuda" if torch.cuda.is_available() else "cpu"
from torchvision import datasets
data_folder = './data/FMNIST' # This can be any directory you want to download FMNIST to
fmnist = datasets.FashionMNIST(data_folder, download=True, train=True)
tr_images = fmnist.data
tr_targets = fmnist.targets

(2) Build a class for fetching the dataset. It inherits from the Dataset class and needs to define three methods: __init__, __getitem__ and __len__:

class FMNISTDataset(Dataset):
    def __init__(self, x, y):
        x = x.float()
        x = x.view(-1, 28*28)
        self.x, self.y = x, y
    def __getitem__(self, ix):
        x, y = self.x[ix], self.y[ix]
        return x.to(device), y.to(device)
    def __len__(self):
        return len(self.x)

In the __init__ method, the input is converted to floating point and each image is flattened into 28*28 = 784 values (each corresponding to one pixel); the __len__ method returns the number of data points; the __getitem__ method returns the data at index ix (an integer between 0 and __len__ - 1).

(3) Create a function that generates a training DataLoader (trn_dl) from the dataset (FMNISTDataset), where each batch contains 32 randomly sampled data points:

def get_data():
    train = FMNISTDataset(tr_images, tr_targets)
    trn_dl = DataLoader(train, batch_size=32, shuffle=True)
    return trn_dl

In the above code, an object of the FMNISTDataset class named train is created and passed to DataLoader, which randomly samples 32 data points per batch; the function returns the training DataLoader, trn_dl.
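To see what the dataset and DataLoader return without downloading anything, the following minimal sketch uses random tensors in place of Fashion MNIST. ToyImageDataset, fake_images and fake_labels are hypothetical names introduced only for this illustration, and the transfer to device is left out to keep the example self-contained.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyImageDataset(Dataset):
    """Same structure as FMNISTDataset, without the device transfer."""
    def __init__(self, x, y):
        self.x = x.float().view(-1, 28 * 28)  # flatten each image to 784 values
        self.y = y
    def __getitem__(self, ix):
        return self.x[ix], self.y[ix]
    def __len__(self):
        return len(self.x)

# Random stand-in data: 100 fake 28x28 images with labels in 0..9
fake_images = torch.randint(0, 256, (100, 28, 28), dtype=torch.uint8)
fake_labels = torch.randint(0, 10, (100,))

toy_ds = ToyImageDataset(fake_images, fake_labels)
toy_dl = DataLoader(toy_ds, batch_size=32, shuffle=True)

x0, y0 = toy_ds[0]                                     # calls __getitem__(0)
batch_shapes = [tuple(xb.shape) for xb, yb in toy_dl]  # one entry per batch
print(len(toy_ds), tuple(x0.shape), batch_shapes)
```

With 100 samples and a batch size of 32, the last batch holds only the 4 remaining samples, because DataLoader keeps incomplete batches unless drop_last=True is passed.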

(4) Define the model, as well as the loss function and optimizer:

from torch.optim import SGD
def get_model():
    model = nn.Sequential(
        nn.Linear(28*28, 1000),
        nn.ReLU(),
        nn.Linear(1000, 10)
    ).to(device)
    loss_fn = nn.CrossEntropyLoss()
    optimizer = SGD(model.parameters(), lr=1e-2)
    return model, loss_fn, optimizer

The model has one hidden layer with 1,000 neurons, and the output layer contains 10 neurons, corresponding to the 10 possible classes. Since the output represents how likely the input image is to belong to each of the 10 classes, CrossEntropyLoss is used as the loss function. Finally, the learning rate lr is set to 0.01 instead of the default value of 0.001.
The softmax function is not used in the network, so the range of the model output is unconstrained, whereas cross-entropy loss usually expects probabilities (the predictions for each image sum to 1). This works because nn.CrossEntropyLoss accepts raw logits (i.e. unconstrained values) and applies softmax internally.
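The claim that nn.CrossEntropyLoss handles softmax internally can be checked numerically. The sketch below compares it against an explicit log-softmax followed by the negative log-likelihood loss; the logits and targets are made-up values used only for this check.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw, unconstrained model outputs
targets = torch.tensor([3, 0, 9, 1])   # hypothetical class labels

# Loss computed directly from the logits
loss_from_logits = nn.CrossEntropyLoss()(logits, targets)

# Manual equivalent: log-softmax followed by negative log-likelihood
loss_manual = F.nll_loss(F.log_softmax(logits, dim=-1), targets)

# Explicit softmax turns each row of logits into probabilities that sum to 1
probs = F.softmax(logits, dim=-1)
print(loss_from_logits.item(), loss_manual.item(), probs.sum(dim=-1))
```

Both loss values agree, which is why adding a softmax layer to the model before this loss would apply the normalization twice.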

(5) Define the function that will train the model on a batch of images:

def train_batch(x, y, model, optimizer, loss_fn):
    model.train()
    # call your model like any python function on your batch of inputs
    prediction = model(x)
    # compute loss
    batch_loss = loss_fn(prediction, y)
    # based on the forward pass in `model(x)` compute all the gradients of 'model.parameters()'
    batch_loss.backward()
    # apply new-weights = f(old-weights, old-weight-gradients) where "f" is the optimizer
    optimizer.step()
    # Flush gradients memory for next batch of calculations
    optimizer.zero_grad()
    return batch_loss.item()

In the forward pass, the model processes the input images and the loss on the input batch is computed; backpropagation then computes the gradients and the optimizer updates the weights; finally, the gradients are reset so that they do not affect the gradient computation in the next pass. The scalar loss value is extracted from batch_loss by calling batch_loss.item().
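Why the gradients have to be reset after each batch can be seen on a single parameter. This is a minimal sketch with a hypothetical one-element tensor w, not part of the training code: calling backward() twice without clearing adds the new gradient to the old one.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

# First backward pass: d(3*w)/dw = 3
(3 * w).sum().backward()
grad_first = w.grad.clone()

# Second backward pass without clearing: gradients are summed, 3 + 3 = 6
(3 * w).sum().backward()
grad_accumulated = w.grad.clone()

# Clear the gradient by hand; optimizer.zero_grad() serves this purpose
# for all parameters registered with the optimizer
w.grad.zero_()
(3 * w).sum().backward()
grad_after_reset = w.grad.clone()

print(grad_first, grad_accumulated, grad_after_reset)
```

Without the reset, each optimizer step would use the sum of the gradients of all previous batches rather than the gradient of the current batch.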

(6) Write a function to calculate the accuracy of the model on a given data set:

@torch.no_grad()
def accuracy(x, y, model):
    model.eval()
    # get the prediction matrix for a tensor of `x` images
    prediction = model(x)
    # compute if the location of maximum in each row coincides with ground truth
    max_values, argmaxes = prediction.max(-1)
    is_correct = argmaxes == y
    return is_correct.cpu().numpy().tolist()

In the above code, the @torch.no_grad() decorator explicitly states that no gradient computation is required. prediction.max(-1) returns the maximum value and its index (argmax) for each row; argmaxes == y then compares the predictions argmaxes with the ground truth to check whether each prediction is correct. Finally, is_correct is moved to the CPU, converted to a NumPy array and then to a list, and returned.
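The behavior of prediction.max(-1) is easiest to follow on a small hand-written tensor. The scores and labels below are hypothetical values for 3 images and 4 classes, chosen only to illustrate the comparison with the ground truth.

```python
import torch

# Hypothetical scores for 3 images over 4 classes
prediction = torch.tensor([[0.1, 2.0, 0.3, 0.0],
                           [1.5, 0.2, 0.1, 0.9],
                           [0.0, 0.1, 0.2, 3.0]])
y = torch.tensor([1, 2, 3])  # hypothetical ground-truth labels

# max over the last dimension returns (values, indices) for each row
max_values, argmaxes = prediction.max(-1)
is_correct = (argmaxes == y).tolist()

# Averaging the booleans gives the fraction of correct predictions
accuracy_value = sum(is_correct) / len(is_correct)
print(argmaxes.tolist(), is_correct, accuracy_value)
```

The second image is predicted as class 0 while its label is 2, so two of the three predictions are correct.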

(7) Train the neural network.

First initialize the model, loss, optimizer and data loader:

trn_dl = get_data()
model, loss_fn, optimizer = get_model()

Create lists to record the loss and accuracy values at the end of each epoch:

losses, accuracies = [], []

Define the number of training epochs:

for epoch in range(10):
    print(epoch)

Initialize lists to record the loss and accuracy values for each batch of data within an epoch:

    epoch_losses, epoch_accuracies = [], []

Create batches of training data by iterating over the DataLoader:

    for ix, batch in enumerate(iter(trn_dl)):
        x, y = batch

Train the model on the batch with the train_batch function, and store the loss value at the end of the batch (batch_loss) in the epoch_losses list:

        batch_loss = train_batch(x, y, model, optimizer, loss_fn)
        epoch_losses.append(batch_loss)

Store the average loss over all batches in the epoch:

    epoch_loss = np.array(epoch_losses).mean()

Compute the prediction accuracy after training on all batches:

    for ix, batch in enumerate(iter(trn_dl)):
        x, y = batch
        is_correct = accuracy(x, y, model)
        epoch_accuracies.extend(is_correct)
    epoch_accuracy = np.mean(epoch_accuracies)

Store the loss and accuracy values at the end of each epoch in the lists:

    losses.append(epoch_loss)
    accuracies.append(epoch_accuracy)

(8) Plot the training loss and accuracy over the epochs:

epochs = np.arange(10)+1
plt.figure(figsize=(20,5))
plt.subplot(121)
plt.title('Loss value over increasing epochs')
plt.plot(epochs, losses, label='Training Loss')
plt.legend()
plt.subplot(122)
plt.title('Accuracy value over increasing epochs')
plt.plot(epochs, accuracies, label='Training Accuracy')
plt.gca().set_yticklabels(['{:.0f}%'.format(x*100) for x in plt.gca().get_yticks()]) 
plt.legend()
plt.show()

Loss and accuracy over time

After 5 epochs, the training accuracy of the model is about 15%, and the loss does not decrease significantly as the number of epochs increases. In other words, no matter how long the model is trained, its accuracy is unlikely to increase significantly.
Now that we understand the complete process of training a neural network, let's obtain better model performance by tuning the hyperparameters.

3. Scale the dataset

Scaling a dataset is the process of constraining variables to a given range so that the data are not spread over large intervals. In this section, we constrain the values of the independent variables to between 0 and 1 by dividing each input value by the largest possible value in the dataset. In general, scaling the input dataset improves the performance of a neural network.

(1) Obtain a dataset, including training images and their labels:

from torch.utils.data import Dataset, DataLoader
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

device = "cuda" if torch.cuda.is_available() else "cpu"
from torchvision import datasets
data_folder = './data/FMNIST' 
fmnist = datasets.FashionMNIST(data_folder, download=True, train=True)
tr_images = fmnist.data
tr_targets = fmnist.targets

(2) Modify the FMNISTDataset class that fetches the data so that the input images are divided by 255 (the maximum pixel intensity value):

class FMNISTDataset(Dataset):
    def __init__(self, x, y):
        x = x.float() / 255.
        x = x.view(-1, 28*28)
        self.x, self.y = x, y
    def __getitem__(self, ix):
        x, y = self.x[ix], self.y[ix]
        return x.to(device), y.to(device)
    def __len__(self):
        return len(self.x)

Compared with the previous section, the only modification is to divide the input data by the largest possible pixel value (255), which yields values between 0 and 1.

(3) Train the model: first obtain the data and define the model along with the functions used to train and evaluate it, then train the model, and finally plot the changes in loss and accuracy during training:


def get_data():
    train = FMNISTDataset(tr_images, tr_targets)
    trn_dl = DataLoader(train, batch_size=32, shuffle=True)
    return trn_dl

trn_dl = get_data()
model, loss_fn, optimizer = get_model()

losses, accuracies = [], []
for epoch in range(10):
    print(epoch)
    epoch_losses, epoch_accuracies = [], []
    for ix, batch in enumerate(iter(trn_dl)):
        x, y = batch
        batch_loss = train_batch(x, y, model, optimizer, loss_fn)
        epoch_losses.append(batch_loss)
    epoch_loss = np.array(epoch_losses).mean()
    for ix, batch in enumerate(iter(trn_dl)):
        x, y = batch
        is_correct = accuracy(x, y, model)
        epoch_accuracies.extend(is_correct)
    epoch_accuracy = np.mean(epoch_accuracies)
    losses.append(epoch_loss)
    accuracies.append(epoch_accuracy)

epochs = np.arange(10)+1
import matplotlib.pyplot as plt
plt.figure(figsize=(20,5))
plt.subplot(121)
plt.title('Loss value over increasing epochs')
plt.plot(epochs, losses, label='Training Loss')
plt.legend()
plt.subplot(122)
plt.title('Accuracy value over increasing epochs')
plt.plot(epochs, accuracies, label='Training Accuracy')
plt.gca().set_yticklabels(['{:.0f}%'.format(x*100) for x in plt.gca().get_yticks()]) 
plt.legend()
plt.show()

Loss and accuracy over time

As the figure above shows, the training loss decreases steadily and the training accuracy improves steadily, reaching approximately 85%.

Next, we look at why scaling the dataset leads to better neural network performance. Assuming the input data is unscaled, consider the computed Sigmoid values as an example:

Input	Weight	Sigmoid value
255	0.01	0.93
255	0.1	1.00
255	0.2	1.00
255	0.4	1.00
255	0.8	1.00
255	1.6	1.00
255	3.2	1.00
255	6.4	1.00

In the above table, even though the weight varies from 0.01 to 6.4, the output barely changes after passing through the Sigmoid function. The Sigmoid function is computed as follows:
$output = \frac{1}{1 + e^{-(w * x + b)}}$
where $w$ is the weight, $x$ is the input, and $b$ is the bias. The Sigmoid output barely changes because the product $w * x$ is large (since $x$ is large), so the Sigmoid value always falls in the saturated part of the curve (the regions in the upper right and lower left of the Sigmoid curve are called the saturated part).
If we multiply different weight values by a smaller input number like this:

Input	Weight	Sigmoid value
1	0.01	0.50
1	0.1	0.52
1	0.2	0.55
1	0.4	0.60
1	0.8	0.69
1	1.6	0.83
1	3.2	0.96
1	6.4	1.00

The Sigmoid outputs in the above table vary considerably because the input values are small. This example shows the effect of scaling the inputs of a dataset: when the weights (assuming they do not have a large range) are multiplied by smaller input values, the products stay in a moderate range, so the input data can have a significant enough impact on the output.
When the weight values are also large, the influence of the input value on the output likewise becomes less important. Therefore, we generally initialize the weights to small values close to zero. At the same time, to make it easier to reach the optimal weights, the initial weights are usually drawn from a narrow range, for example random values between -1 and +1.
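The two tables above can be reproduced in a few lines. This is a minimal sketch assuming a bias of b = 0, which matches the tabulated values; the sigmoid helper and the table dictionary are names introduced only for this illustration.

```python
import math

def sigmoid(z):
    """Sigmoid of a scalar: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

weights = [0.01, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4]

# Bias b is assumed to be 0; x = 255 is an unscaled pixel, x = 1 a scaled one
table = {x: [round(sigmoid(w * x), 2) for w in weights] for x in (255, 1)}

for x, values in table.items():
    print(x, values)
```

For the input 255 every weight except the smallest one already saturates the output at 1.00, while for the input 1 the output moves through most of the range between 0.50 and 1.00.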

4. Modify the optimizer

Different optimizers can also affect how quickly the model learns to fit the inputs to the outputs. In this section, we examine the impact of changing the optimizer on model accuracy. To make it easier to compare stochastic gradient descent (Stochastic Gradient Descent, SGD) and Adam over more epochs, the number of epochs is changed to 20.

(1) Modify the optimizer: first use the SGD optimizer in the get_model() function, keeping all other settings unchanged:

from torch.optim import SGD, Adam
def get_model():
    model = nn.Sequential(
        nn.Linear(28 * 28, 1000),
        nn.ReLU(),
        nn.Linear(1000, 10)
    ).to(device)

    loss_fn = nn.CrossEntropyLoss()
    optimizer = SGD(model.parameters(), lr=1e-2)
    return model, loss_fn, optimizer
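The training loop in this section unpacks two loaders from get_data() and calls a val_loss() function, neither of which is defined in the text. The following is one possible sketch of those helpers, not the original implementation: make_loaders and its arguments are hypothetical names, random tensors stand in for Fashion MNIST so that the block runs without a download, and the transfer to device is omitted. With the real data, the validation images could be loaded with datasets.FashionMNIST(data_folder, download=True, train=False).

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

def make_loaders(tr_x, tr_y, val_x, val_y, batch_size=32):
    """Return a shuffled training loader and a single-batch validation loader."""
    def prep(x):
        # Flatten each image and scale pixel values to [0, 1]
        return x.float().view(-1, 28 * 28) / 255.
    train = TensorDataset(prep(tr_x), tr_y)
    val = TensorDataset(prep(val_x), val_y)
    trn_dl = DataLoader(train, batch_size=batch_size, shuffle=True)
    # One batch holding the whole validation set, so a single call scores all of it
    val_dl = DataLoader(val, batch_size=len(val), shuffle=False)
    return trn_dl, val_dl

@torch.no_grad()
def val_loss(x, y, model, loss_fn):
    """Loss on a validation batch, computed without tracking gradients."""
    model.eval()
    return loss_fn(model(x), y).item()

# Random stand-in tensors instead of Fashion MNIST (assumption for this sketch)
tr_x = torch.randint(0, 256, (64, 28, 28), dtype=torch.uint8)
tr_y = torch.randint(0, 10, (64,))
val_x = torch.randint(0, 256, (16, 28, 28), dtype=torch.uint8)
val_y = torch.randint(0, 10, (16,))

trn_dl, val_dl = make_loaders(tr_x, tr_y, val_x, val_y)
model = nn.Linear(28 * 28, 10)
x, y = next(iter(val_dl))
validation_loss = val_loss(x, y, model, nn.CrossEntropyLoss())
print(len(trn_dl), len(val_dl), validation_loss)
```

Putting the whole validation set into one batch matters for the loop in this section, because it keeps only the values from the last validation batch it sees.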

(2) Increase the number of epochs used to train the model:

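# get_data() is assumed here to return a training DataLoader and a validation
# DataLoader that yields the whole validation set as a single batch;
# val_loss(), used below, is assumed to return the loss on a validation batch
trn_dl, val_dl = get_data()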
model, loss_fn, optimizer = get_model()

train_losses, train_accuracies = [], []
val_losses, val_accuracies = [], []
for epoch in range(20):
    print(epoch)
    train_epoch_losses, train_epoch_accuracies = [], []
    for ix, batch in enumerate(iter(trn_dl)):
        x, y = batch
        batch_loss = train_batch(x, y, model, optimizer, loss_fn)
        train_epoch_losses.append(batch_loss) 
    train_epoch_loss = np.array(train_epoch_losses).mean()

    for ix, batch in enumerate(iter(trn_dl)):
        x, y = batch
        is_correct = accuracy(x, y, model)
        train_epoch_accuracies.extend(is_correct)
    train_epoch_accuracy = np.mean(train_epoch_accuracies)
    for ix, batch in enumerate(iter(val_dl)):
        x, y = batch
        val_is_correct = accuracy(x, y, model)
        validation_loss = val_loss(x, y, model, loss_fn)
    val_epoch_accuracy = np.mean(val_is_correct)
    train_losses.append(train_epoch_loss)
    train_accuracies.append(train_epoch_accuracy)
    val_losses.append(validation_loss)
    val_accuracies.append(val_epoch_accuracy)

epochs = np.arange(20)+1
import matplotlib.ticker as mticker
plt.subplot(121)
plt.plot(epochs, train_losses, 'bo', label='Training loss')
plt.plot(epochs, val_losses, 'r', label='Validation loss')
plt.gca().xaxis.set_major_locator(mticker.MultipleLocator(1))
plt.title('Training and validation loss with SGD optimizer')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid(False)
plt.subplot(122)
plt.plot(epochs, train_accuracies, 'bo', label='Training accuracy')
plt.plot(epochs, val_accuracies, 'r', label='Validation accuracy')
plt.gca().xaxis.set_major_locator(mticker.MultipleLocator(1))
plt.title('Training and validation accuracy with SGD optimizer')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.gca().set_yticklabels(['{:.0f}%'.format(x*100) for x in plt.gca().get_yticks()]) 
plt.legend()
plt.grid(False)
plt.show()

After making these changes, the accuracy and loss on the training and validation datasets with the SGD optimizer change as follows:

Changes in Accuracy and Loss

When the optimizer is Adam, the accuracy and loss values on the training and validation datasets change as follows:

Changes in Accuracy and Loss

Typically, the Adam optimizer reaches its best accuracy faster than other optimizers. Other available optimizers include Adagrad, Adadelta, AdamW, LBFGS and RMSprop.
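The text does not show the code for the Adam run. One way to switch optimizers while keeping everything else fixed might look like the sketch below; the optimizer_name and lr parameters are hypothetical additions, the learning rate of 1e-2 is an assumption carried over from the SGD setup, and the model is left on the CPU to keep the example self-contained.

```python
import torch.nn as nn
from torch.optim import SGD, Adam

def get_model(optimizer_name="adam", lr=1e-2):
    """Same architecture as before; only the optimizer is selectable."""
    model = nn.Sequential(
        nn.Linear(28 * 28, 1000),
        nn.ReLU(),
        nn.Linear(1000, 10)
    )
    loss_fn = nn.CrossEntropyLoss()
    if optimizer_name == "adam":
        optimizer = Adam(model.parameters(), lr=lr)
    elif optimizer_name == "sgd":
        optimizer = SGD(model.parameters(), lr=lr)
    else:
        raise ValueError(f"unknown optimizer: {optimizer_name}")
    return model, loss_fn, optimizer

adam_model, adam_loss_fn, adam_optimizer = get_model("adam")
sgd_model, sgd_loss_fn, sgd_optimizer = get_model("sgd")
print(type(adam_optimizer).__name__, type(sgd_optimizer).__name__)
```

Because the training loop only calls optimizer.step() and optimizer.zero_grad(), it can stay unchanged when the optimizer is swapped.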

5. Building a Deep Neural Network

So far, the neural network architectures we have built have only one hidden layer. In this section, we compare the performance of neural network models with two hidden layers and no hidden layers.

(1) Construct a neural network model containing two hidden layers:

def get_model():
    model = nn.Sequential(
        nn.Linear(28 * 28, 1000),
        nn.ReLU(),
        nn.Linear(1000, 512),
        nn.ReLU(),
        nn.Linear(512, 10)
    ).to(device)

    loss_fn = nn.CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=1e-3)
    return model, loss_fn, optimizer

(2) Similarly, modify the get_model() function to build a neural network without a hidden layer, connecting the input directly to the output layer:

from torch.optim import SGD, Adam
def get_model():
    model = nn.Sequential(
        nn.Linear(28 * 28, 10)
    ).to(device)

    loss_fn = nn.CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=1e-3)
    return model, loss_fn, optimizer

The accuracy and loss changes for the training and validation datasets are as follows:

Variation in accuracy and loss for training and validation datasets

From the above results it can be seen that:

When there are no hidden layers, the model does not learn as well
The model overfits more severely when there are two hidden layers than when there is one hidden layer

A deep neural network means that there are multiple hidden layers between the input layer and the output layer. Multiple hidden layers ensure that the neural network can learn complex non-linear relationships between inputs and outputs, which cannot be done with simple neural networks (due to the limited number of hidden layers).
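To get a feel for how much capacity each variant has, the number of trainable parameters of the three architectures compared in this section can be counted. The helpers build_mlp and count_parameters are hypothetical names introduced for this sketch; the layer sizes are the ones used in the get_model() variants above.

```python
import torch.nn as nn

def build_mlp(hidden_sizes, n_in=28 * 28, n_out=10):
    """Fully connected network with a ReLU after each hidden layer."""
    layers, prev = [], n_in
    for h in hidden_sizes:
        layers += [nn.Linear(prev, h), nn.ReLU()]
        prev = h
    layers.append(nn.Linear(prev, n_out))
    return nn.Sequential(*layers)

def count_parameters(model):
    """Total number of weights and biases in the model."""
    return sum(p.numel() for p in model.parameters())

variants = {
    "no hidden layer": [],
    "one hidden layer": [1000],
    "two hidden layers": [1000, 512],
}
param_counts = {name: count_parameters(build_mlp(sizes))
                for name, sizes in variants.items()}

for name, n in param_counts.items():
    print(f"{name}: {n:,} parameters")
```

The network without a hidden layer has 7,850 parameters, against 795,010 with one hidden layer and 1,302,642 with two, which is consistent with the deeper model having more room to overfit.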

Summary

Neural network performance optimization technology refers to the method of improving the performance and generalization ability of the neural network by improving the structure, parameter initialization, regularization and training process of the neural network. This section first trains a simple fully connected network, and then introduces simple and effective neural network performance improvement techniques on this basis. In subsequent studies, common techniques including batch normalization and dynamic learning rate will be further introduced.

Series links

PyTorch Deep Learning Combat (1) - Neural Network and Model Training Process Detailed
PyTorch Deep Learning Combat (2) - PyTorch Basics
PyTorch Deep Learning Combat (3) - Using PyTorch to Build a Neural Network
PyTorch Deep Learning Combat (4) - Commonly used activation functions and loss functions in detail