PyTorch deep learning practice (6) - neural network performance optimization technology
0. Preface
We have covered the basic concepts of neural networks and seen how to build practical neural network models with the PyTorch library. We also mentioned that various hyperparameters affect the accuracy of a neural network. In this section, we use the Fashion MNIST dataset to build a neural network model for image classification and compare the performance of models trained with different settings.
1. Data preparation
1.1 Dataset analysis
The Fashion MNIST dataset is a classic dataset for image classification. It contains images of fashion items in 10 categories. Each sample is a 28x28-pixel grayscale image, and there are 60,000 training samples and 10,000 test samples in total. Because it is simple and easy to use, Fashion MNIST has become a commonly used benchmark dataset in academia and research for verifying the performance of image classification algorithms.
1.2 Dataset loading
(1) First import the relevant libraries and download the dataset. torchvision provides multiple machine learning datasets, including Fashion MNIST:
from torchvision import datasets
import torch
data_folder = './data/FMNIST' # This can be any directory you want to download FMNIST to
fmnist = datasets.FashionMNIST(data_folder, download=True, train=True)
The code above specifies the folder (data_folder) in which the downloaded dataset is stored. datasets.FashionMNIST then fetches the data, stores it in data_folder, and returns it as fmnist. The parameter train=True specifies that only the training images are loaded.
(2) Next, store the images available in fmnist.data as tr_images and the corresponding image labels (fmnist.targets) as tr_targets:
tr_images = fmnist.data
tr_targets = fmnist.targets
(3) Check the loaded tensor data:
unique_values = tr_targets.unique()
print(f'tr_images & tr_targets:\n\tX - {tr_images.shape}\n\tY - {tr_targets.shape}\n\tY - Unique Values : {unique_values}')
print(f'TASK:\n\t{len(unique_values)} class Classification')
print(f'UNIQUE CLASSES:\n\t{fmnist.classes}')
The code output is as follows:
tr_images & tr_targets:
X - torch.Size([60000, 28, 28])
Y - torch.Size([60000])
Y - Unique Values : tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
TASK:
10 class Classification
UNIQUE CLASSES:
['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
The results show that the training set has 60,000 images, each of size 28 x 28, belonging to 10 possible categories. tr_targets contains the numeric category label of each image, and fmnist.classes gives the category name corresponding to each numeric value in tr_targets.
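The relationship between a numeric label and its category name can be sketched in a few lines. The class list is copied from the output above; the helper label_to_name and the three sample labels are hypothetical and only serve as an illustration.

```python
# Class names exactly as printed by fmnist.classes in the output above.
classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

def label_to_name(label):
    """Map a numeric label (0-9) to its category name (hypothetical helper)."""
    return classes[int(label)]

# Hypothetical labels, standing in for a few entries of tr_targets.
sample_labels = [9, 0, 3]
sample_names = [label_to_name(label) for label in sample_labels]
print(sample_names)
```

Because int() is applied to the label, the same helper should also accept a single element of tr_targets.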
(4) Draw random image samples.
Import related libraries for drawing images and processing image arrays:
import matplotlib.pyplot as plt
import numpy as np
To create a 10 x 10 image grid in which each row corresponds to one category, iterate over all categories (label_class) and get the indices of the rows that belong to the given category (label_x_rows):
R, C = len(tr_targets.unique()), 10
fig, ax = plt.subplots(R, C, figsize=(10,10))
for label_class, plot_row in enumerate(ax):
    label_x_rows = np.where(tr_targets == label_class)[0]
In the code above, the element at index 0 of the np.where output is taken (the output is a tuple of length 1); it contains all indices at which the target value (tr_targets) is equal to label_class.
To fill the grid, loop over the 10 cells of each row; for each cell, pick a random index (ix) from the previously obtained indices (label_x_rows) of the given class and draw the corresponding image:
    for plot_cell in plot_row:
        plot_cell.grid(False); plot_cell.axis('off')
        ix = np.random.choice(label_x_rows)
        x, y = tr_images[ix], tr_targets[ix]
        plot_cell.imshow(x, cmap='gray')
# Display the finished grid once all cells are filled
plt.tight_layout()
plt.show()
In the resulting figure, each row shows different sample images belonging to the same class.
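The np.where call used when building the grid can be tried in isolation. The following is a small sketch, assuming a hypothetical six-element label vector in place of tr_targets; it shows that the result is a tuple of length 1 and that a random index drawn from it always belongs to the requested class.

```python
import numpy as np

# Hypothetical label vector standing in for tr_targets.
targets = np.array([2, 0, 1, 2, 2, 0])

result = np.where(targets == 2)   # a tuple with one index array per dimension
indices = result[0]               # indices at which the label equals 2

rng = np.random.default_rng(0)    # seeded only to make the sketch repeatable
ix = rng.choice(indices)          # one random sample index for class 2

print(type(result).__name__, len(result), indices.tolist())
```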
2. Using PyTorch to train neural networks
Next, we learn how to use PyTorch to train a neural network that predicts the category of an input image. We also examine the impact of various hyperparameters on the prediction accuracy of the model.
2.1 Neural Network Training Process
Training a neural network with PyTorch usually involves the following steps:
- Import the relevant libraries
- Build the dataset, which returns one data point at a time
- Wrap the dataset in a DataLoader
- Build the model, and define the loss function and optimizer
- Define two functions, for training and for validation on a batch of data
- Define a function that computes the accuracy of the model's predictions
- Update the model weights on each batch of data, and train the model over multiple epochs
2.2 PyTorch neural network training
(1) Import the relevant libraries and load the Fashion MNIST dataset:
from torch.utils.data import Dataset, DataLoader
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
device = "cuda" if torch.cuda.is_available() else "cpu"
from torchvision import datasets
data_folder = './data/FMNIST' # This can be any directory you want to download FMNIST to
fmnist = datasets.FashionMNIST(data_folder, download=True, train=True)
tr_images = fmnist.data
tr_targets = fmnist.targets
(2) Build a class for fetching the data. It inherits from the Dataset class and needs to define three methods: __init__, __getitem__ and __len__:
class FMNISTDataset(Dataset):
    def __init__(self, x, y):
        x = x.float()
        x = x.view(-1, 28*28)
        self.x, self.y = x, y
    def __getitem__(self, ix):
        x, y = self.x[ix], self.y[ix]
        return x.to(device), y.to(device)
    def __len__(self):
        return len(self.x)
The __init__ method converts the input to floating point and flattens each image into 28*28 = 784 values (one per pixel); the __len__ method returns the number of data points; the __getitem__ method returns the data point at index ix (an integer from 0 to __len__ - 1).
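To see what the three methods do without downloading anything, the class can be exercised on random stand-in data. This is only a sketch: fake_images and fake_targets are assumptions that mimic the shape and dtype of fmnist.data and fmnist.targets.

```python
import torch
from torch.utils.data import Dataset

device = "cuda" if torch.cuda.is_available() else "cpu"

class FMNISTDataset(Dataset):
    # Same logic as the class defined above.
    def __init__(self, x, y):
        x = x.float()
        x = x.view(-1, 28*28)
        self.x, self.y = x, y
    def __getitem__(self, ix):
        x, y = self.x[ix], self.y[ix]
        return x.to(device), y.to(device)
    def __len__(self):
        return len(self.x)

# Random stand-in data (an assumption): 5 fake 28x28 "images" and 5 fake labels.
fake_images = torch.randint(0, 256, (5, 28, 28), dtype=torch.uint8)
fake_targets = torch.randint(0, 10, (5,))

ds = FMNISTDataset(fake_images, fake_targets)
x0, y0 = ds[0]                      # calls __getitem__(0)
print(len(ds), x0.shape, x0.dtype)  # calls __len__()
```

Each item comes back as a flat vector of 784 floating-point values together with its label.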
(3) Create a function that generates the training DataLoader (trn_dl) from the dataset (FMNISTDataset); each batch contains 32 randomly sampled data points:
def get_data():
    train = FMNISTDataset(tr_images, tr_targets)
    trn_dl = DataLoader(train, batch_size=32, shuffle=True)
    return trn_dl
The code above creates an object of the FMNISTDataset class named train and passes it to DataLoader, which draws 32 data points at random for each batch; the resulting training DataLoader is returned as trn_dl.
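How a DataLoader splits a dataset into batches can be checked on a small example. The sketch below assumes 100 random stand-in samples wrapped in a TensorDataset (not the FMNISTDataset class from the article) and uses the same batch_size=32 and shuffle=True settings.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in data (an assumption): 100 flattened fake images and labels.
x = torch.rand(100, 28*28)
y = torch.randint(0, 10, (100,))

dl = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

# Iterating the loader yields (inputs, labels) pairs, one pair per batch.
batch_sizes = [len(xb) for xb, yb in dl]
print(len(dl), batch_sizes)
```

With 100 samples the loader yields four batches, and the last one holds only the 4 remaining samples, because incomplete batches are kept by default.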
(4) Define the model, as well as the loss function and optimizer:
from torch.optim import SGD
def get_model():
    model = nn.Sequential(
        nn.Linear(28*28, 1000),
        nn.ReLU(),
        nn.Linear(1000, 10)
    ).to(device)
    loss_fn = nn.CrossEntropyLoss()
    optimizer = SGD(model.parameters(), lr=1e-2)
    return model, loss_fn, optimizer
The model has one hidden layer with 1,000 neurons, and the output layer has 10 neurons, one for each of the 10 possible classes. Since the output represents the probability that the input image belongs to each of the 10 classes, the CrossEntropyLoss loss function is used. Finally, the learning rate lr is set to 0.01 instead of the default value of 0.001.
The softmax function is not applied inside the network, so the range of the model output is unconstrained, even though cross-entropy is normally defined on probabilities (the predictions for each image sum to 1). This is because nn.CrossEntropyLoss accepts raw logits (i.e. unconstrained values) and applies softmax internally.
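That nn.CrossEntropyLoss works on raw logits can be checked by computing the same loss by hand. The following sketch uses hypothetical logits for two images and three classes; the manual version applies softmax explicitly and then averages the negative log-probability of the true class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical logits for 2 images and 3 classes, with their true labels.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])

# Loss computed directly from the raw logits.
loss_builtin = nn.CrossEntropyLoss()(logits, labels)

# Same quantity by hand: softmax -> log -> pick the true class -> mean of negatives.
probs = F.softmax(logits, dim=-1)
loss_manual = -torch.log(probs[torch.arange(len(labels)), labels]).mean()

print(probs.sum(dim=-1))                        # each row of probabilities sums to 1
print(loss_builtin.item(), loss_manual.item())  # the two losses agree
```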
(5) Define the function that will train the model on a batch of images:
def train_batch(x, y, model, optimizer, loss_fn):
    model.train()
    # call your model like any python function on your batch of inputs
    prediction = model(x)
    # compute loss
    batch_loss = loss_fn(prediction, y)
    # based on the forward pass in `model(x)` compute all the gradients of 'model.parameters()'
    batch_loss.backward()
    # apply new-weights = f(old-weights, old-weight-gradients) where "f" is the optimizer
    optimizer.step()
    # Flush gradients memory for next batch of calculations
    optimizer.zero_grad()
    return batch_loss.item()
In the forward pass, the model processes the input images and the loss on the input batch is computed; backpropagation then computes the gradients, the optimizer updates the weights, and finally the stored gradients are reset so that they do not affect the gradient computation in the next pass. The scalar loss value is extracted from batch_loss by calling batch_loss.item().
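Why the gradients have to be reset after every batch can be seen on a single weight. This is a toy sketch, not part of the training code: the loss 3 * w is hypothetical and chosen so that the gradient is always 3, and the manual w.grad.zero_() call stands in for the clearing that optimizer.zero_grad() performs on the model parameters.

```python
import torch

# A single hypothetical weight; the loss is simply 3 * w, so d(loss)/dw = 3.
w = torch.tensor([1.0], requires_grad=True)

(3 * w).sum().backward()
first = w.grad.clone()            # gradient after one backward pass

(3 * w).sum().backward()          # no reset in between
accumulated = w.grad.clone()      # the new gradient was added to the old one

w.grad.zero_()                    # clear the stale gradient
(3 * w).sum().backward()
after_reset = w.grad.clone()

print(first.item(), accumulated.item(), after_reset.item())
```

Without the reset the second backward pass reports 6 instead of 3, which is the kind of contamination between batches that the reset in train_batch prevents.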
(6) Write a function to calculate the accuracy of the model on a given data set:
@torch.no_grad()
def accuracy(x, y, model):
    model.eval()
    # get the prediction matrix for a tensor of `x` images
    prediction = model(x)
    # compute if the location of maximum in each row coincides with ground truth
    max_values, argmaxes = prediction.max(-1)
    is_correct = argmaxes == y
    return is_correct.cpu().numpy().tolist()
In the code above, the @torch.no_grad() decorator states explicitly that no gradients need to be computed. prediction.max(-1) returns the maximum value and the corresponding argmax index for each row; comparing argmaxes == y checks, for each image, whether the predicted class (argmaxes) matches the ground truth. Finally, is_correct is moved to the CPU, converted to a numpy array and returned as a list.
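The core of the accuracy function can be traced on a tiny example. The sketch below assumes hypothetical model outputs for three images and four classes; it applies the same max(-1) and comparison steps and then averages the result.

```python
import torch

# Hypothetical model outputs for 3 images and 4 classes, with true labels.
prediction = torch.tensor([[0.1, 2.0, 0.3, 0.0],
                           [1.5, 0.2, 0.1, 0.4],
                           [0.0, 0.1, 0.2, 0.9]])
y = torch.tensor([1, 2, 3])

max_values, argmaxes = prediction.max(-1)   # per-row maximum and its index
is_correct = (argmaxes == y).cpu().numpy().tolist()
acc = sum(is_correct) / len(is_correct)     # fraction of correct predictions

print(argmaxes.tolist(), is_correct, acc)
```

Here the second image is predicted as class 0 while its label is 2, so two of the three predictions are correct.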
(7) Training neural network.
First initialize the model, loss, optimizer and data loader:
trn_dl = get_data()
model, loss_fn, optimizer = get_model()
Record the accuracy and loss values at the end of each epoch:
losses, accuracies = [], []
Define the number of training epochs:
for epoch in range(10):
    print(epoch)
Initialize the lists that record the loss and accuracy values for each batch within an epoch:
    epoch_losses, epoch_accuracies = [], []
Create batches of training data by iterating over the DataLoader:
    for ix, batch in enumerate(iter(trn_dl)):
        x, y = batch
Train the model on each batch with the train_batch function, and store the loss value at the end of the batch (batch_loss) in the epoch_losses list:
        batch_loss = train_batch(x, y, model, optimizer, loss_fn)
        epoch_losses.append(batch_loss)
Store the average loss over all batches in the epoch:
    epoch_loss = np.array(epoch_losses).mean()
Compute the prediction accuracy over all batches once the training for the epoch is finished:
    for ix, batch in enumerate(iter(trn_dl)):
        x, y = batch
        is_correct = accuracy(x, y, model)
        epoch_accuracies.extend(is_correct)
    epoch_accuracy = np.mean(epoch_accuracies)
Store the loss and accuracy values at the end of each epoch in the lists:
    losses.append(epoch_loss)
    accuracies.append(epoch_accuracy)
(8) Draw training loss and accuracy over time:
epochs = np.arange(10)+1
plt.figure(figsize=(20,5))
plt.subplot(121)
plt.title('Loss value over increasing epochs')
plt.plot(epochs, losses, label='Training Loss')
plt.legend()
plt.subplot(122)
plt.title('Accuracy value over increasing epochs')
plt.plot(epochs, accuracies, label='Training Accuracy')
plt.gca().set_yticklabels(['{:.0f}%'.format(x*100) for x in plt.gca().get_yticks()])
plt.legend()
plt.show()
After 5 epochs, the training accuracy of the model is about 15%, and the loss does not decrease noticeably as the number of epochs increases. In other words, no matter how long training continues, the accuracy of the model is unlikely to improve significantly.
Now that we have walked through the complete process of training a neural network, let's improve the model's performance by tuning the hyperparameters.
3. Scale the dataset
Scaling a dataset is the process of constraining its variables to a given range so that the data are not spread over a large interval. In this section, we constrain the values of the independent variables to between 0 and 1 by dividing each input value by the largest possible value in the dataset. In general, scaling the input dataset improves the performance of a neural network.
(1) Obtain a dataset, including training images and their labels:
from torch.utils.data import Dataset, DataLoader
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
device = "cuda" if torch.cuda.is_available() else "cpu"
from torchvision import datasets
data_folder = './data/FMNIST'
fmnist = datasets.FashionMNIST(data_folder, download=True, train=True)
tr_images = fmnist.data
tr_targets = fmnist.targets
(2) Modify FMNISTDataset, the class that fetches the data, so that the input images are divided by 255 (the maximum pixel intensity):
class FMNISTDataset(Dataset):
    def __init__(self, x, y):
        x = x.float() / 255.
        x = x.view(-1, 28*28)
        self.x, self.y = x, y
    def __getitem__(self, ix):
        x, y = self.x[ix], self.y[ix]
        return x.to(device), y.to(device)
    def __len__(self):
        return len(self.x)
Compared with the previous section, the only change is that the input data is divided by the largest possible pixel value (255), which yields values between 0 and 1.
(3) Train the model: first obtain the data and define the model together with the functions used to train and validate it, then train the model, and finally plot the loss and accuracy over the course of training:
def get_data():
    train = FMNISTDataset(tr_images, tr_targets)
    trn_dl = DataLoader(train, batch_size=32, shuffle=True)
    return trn_dl
trn_dl = get_data()
model, loss_fn, optimizer = get_model()
losses, accuracies = [], []
for epoch in range(10):
    print(epoch)
    epoch_losses, epoch_accuracies = [], []
    for ix, batch in enumerate(iter(trn_dl)):
        x, y = batch
        batch_loss = train_batch(x, y, model, optimizer, loss_fn)
        epoch_losses.append(batch_loss)
    epoch_loss = np.array(epoch_losses).mean()
    for ix, batch in enumerate(iter(trn_dl)):
        x, y = batch
        is_correct = accuracy(x, y, model)
        epoch_accuracies.extend(is_correct)
    epoch_accuracy = np.mean(epoch_accuracies)
    losses.append(epoch_loss)
    accuracies.append(epoch_accuracy)
epochs = np.arange(10)+1
import matplotlib.pyplot as plt
plt.figure(figsize=(20,5))
plt.subplot(121)
plt.title('Loss value over increasing epochs')
plt.plot(epochs, losses, label='Training Loss')
plt.legend()
plt.subplot(122)
plt.title('Accuracy value over increasing epochs')
plt.plot(epochs, accuracies, label='Training Accuracy')
plt.gca().set_yticklabels(['{:.0f}%'.format(x*100) for x in plt.gca().get_yticks()])
plt.legend()
plt.show()
As the resulting plot shows, the training loss decreases steadily and the training accuracy improves steadily, reaching approximately 85%.
Next, we look at why scaling the dataset leads to better neural network performance. Assume the input data is unscaled and take the computed sigmoid value as an example:
Input | Weight | Bias | Sigmoid value |
---|---|---|---|
255 | 0.01 | 0 | 0.93 |
255 | 0.1 | 0 | 1.00 |
255 | 0.2 | 0 | 1.00 |
255 | 0.4 | 0 | 1.00 |
255 | 0.8 | 0 | 1.00 |
255 | 1.6 | 0 | 1.00 |
255 | 3.2 | 0 | 1.00 |
255 | 6.4 | 0 | 1.00 |
In the table above, even though the weight varies between 0.01 and 6.4, the output barely changes after passing through the sigmoid function. The sigmoid function is computed as follows:
$$output = \frac{1}{1+e^{-(w*x + b)}}$$
where $w$ is the weight, $x$ is the input, and $b$ is the bias. The sigmoid output does not change because the product $w*x$ is large (since $x$ is large), so the sigmoid value always falls in the saturated part of the sigmoid curve (the values in the upper right or lower left corner of the curve are called the saturated part).
If we instead multiply the same weight values by a smaller input value, we obtain:
Input | Weight | Bias | Sigmoid value |
---|---|---|---|
1 | 0.01 | 0 | 0.50 |
1 | 0.1 | 0 | 0.52 |
1 | 0.2 | 0 | 0.55 |
1 | 0.4 | 0 | 0.60 |
1 | 0.8 | 0 | 0.69 |
1 | 1.6 | 0 | 0.83 |
1 | 3.2 | 0 | 0.96 |
1 | 6.4 | 0 | 1.00 |
The sigmoid outputs in the table above vary widely because the input values are small. This example shows the effect of scaling the inputs: when the weights (assuming they do not span a large range) are multiplied by small input values, the resulting products stay out of the saturated region and do not change abruptly, so the input data can have a significant enough impact on the output.
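Both tables can be reproduced with a few lines of plain Python. The sketch below implements the sigmoid formula directly with the standard math module and evaluates it for the same weights, once with the unscaled input 255 and once with the scaled input 1; the variable names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

weights = [0.01, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4]
bias = 0

# Sigmoid output for every weight, for the unscaled input 255 and the scaled input 1.
unscaled = [sigmoid(w * 255 + bias) for w in weights]
scaled = [sigmoid(w * 1 + bias) for w in weights]

for w, u, s in zip(weights, unscaled, scaled):
    print(f"w={w:<5} input=255 -> {u:.2f}   input=1 -> {s:.2f}")

# Spread of the outputs across all weights: small for 255, large for 1.
print(max(unscaled) - min(unscaled), max(scaled) - min(scaled))
```

For the input 255 the outputs cover a range of less than 0.1, while for the input 1 they cover roughly half of the sigmoid's output range.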
When the weight values are large as well, the input values again have little influence on the output. Therefore, weights are generally initialized to small values close to zero. In addition, to make it easier to reach good weight values, the initial weights are usually drawn from a narrow range, for example random values between -1 and +1.
4. Modify the optimizer
Different optimizers may also affect how quickly the model learns to fit the inputs to the outputs. In this section, we examine the impact of changing the optimizer on the accuracy of the model. To make it easier to compare the performance of stochastic gradient descent (Stochastic Gradient Descent, SGD) and Adam over more epochs, the number of epochs is changed to 20.
(1) Modify the optimizer: use the SGD optimizer in the get_model() function and keep all other settings unchanged:
from torch.optim import SGD, Adam
def get_model():
    model = nn.Sequential(
        nn.Linear(28 * 28, 1000),
        nn.ReLU(),
        nn.Linear(1000, 10)
    ).to(device)
    loss_fn = nn.CrossEntropyLoss()
    optimizer = SGD(model.parameters(), lr=1e-2)
    return model, loss_fn, optimizer
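The training loop in step (2) unpacks two loaders from get_data() and calls a val_loss function, neither of which is defined in this article. One possible way to provide them is sketched here. It is an assumption rather than the original code: the random stand-in tensors only keep the snippet runnable offline, and with the real data val_images and val_targets would come from datasets.FashionMNIST(data_folder, download=True, train=False).

```python
import torch
from torch.utils.data import Dataset, DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"

class FMNISTDataset(Dataset):
    # Scaled version of the dataset class from section 3.
    def __init__(self, x, y):
        x = x.float() / 255.
        x = x.view(-1, 28*28)
        self.x, self.y = x, y
    def __getitem__(self, ix):
        return self.x[ix].to(device), self.y[ix].to(device)
    def __len__(self):
        return len(self.x)

# Random stand-ins (an assumption) for the real training and validation tensors.
tr_images = torch.randint(0, 256, (64, 28, 28), dtype=torch.uint8)
tr_targets = torch.randint(0, 10, (64,))
val_images = torch.randint(0, 256, (16, 28, 28), dtype=torch.uint8)
val_targets = torch.randint(0, 10, (16,))

def get_data():
    train = FMNISTDataset(tr_images, tr_targets)
    trn_dl = DataLoader(train, batch_size=32, shuffle=True)
    val = FMNISTDataset(val_images, val_targets)
    # One batch holding the whole validation set, so a single pass evaluates all of it.
    val_dl = DataLoader(val, batch_size=len(val_images), shuffle=False)
    return trn_dl, val_dl

@torch.no_grad()
def val_loss(x, y, model, loss_fn):
    # Loss on a validation batch, without computing gradients or updating weights.
    model.eval()
    prediction = model(x)
    return loss_fn(prediction, y).item()

trn_dl, val_dl = get_data()
x, y = next(iter(val_dl))
demo_model = torch.nn.Linear(28*28, 10).to(device)   # throwaway model for the demo
demo_loss = val_loss(x, y, demo_model, torch.nn.CrossEntropyLoss())
print(len(val_dl), x.shape, demo_loss)
```

Putting the whole validation set into a single batch is a design choice that fits the loop, which keeps only the values from the last validation batch it sees.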
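(2) Increase the number of training epochs to 20 and also evaluate the model on validation data after each epoch (get_data() is expected to return a validation DataLoader as well, and val_loss to compute the loss on a validation batch):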
trn_dl, val_dl = get_data()
model, loss_fn, optimizer = get_model()
train_losses, train_accuracies = [], []
val_losses, val_accuracies = [], []
for epoch in range(20):
    print(epoch)
    train_epoch_losses, train_epoch_accuracies = [], []
    for ix, batch in enumerate(iter(trn_dl)):
        x, y = batch
        batch_loss = train_batch(x, y, model, optimizer, loss_fn)
        train_epoch_losses.append(batch_loss)
    train_epoch_loss = np.array(train_epoch_losses).mean()
    for ix, batch in enumerate(iter(trn_dl)):
        x, y = batch
        is_correct = accuracy(x, y, model)
        train_epoch_accuracies.extend(is_correct)
    train_epoch_accuracy = np.mean(train_epoch_accuracies)
    for ix, batch in enumerate(iter(val_dl)):
        x, y = batch
        val_is_correct = accuracy(x, y, model)
        validation_loss = val_loss(x, y, model, loss_fn)
    val_epoch_accuracy = np.mean(val_is_correct)
    train_losses.append(train_epoch_loss)
    train_accuracies.append(train_epoch_accuracy)
    val_losses.append(validation_loss)
    val_accuracies.append(val_epoch_accuracy)
epochs = np.arange(20)+1
import matplotlib.ticker as mticker
plt.subplot(121)
plt.plot(epochs, train_losses, 'bo', label='Training loss')
plt.plot(epochs, val_losses, 'r', label='Validation loss')
plt.gca().xaxis.set_major_locator(mticker.MultipleLocator(1))
plt.title('Training and validation loss with SGD optimizer')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid('off')
plt.subplot(122)
plt.plot(epochs, train_accuracies, 'bo', label='Training accuracy')
plt.plot(epochs, val_accuracies, 'r', label='Validation accuracy')
plt.gca().xaxis.set_major_locator(mticker.MultipleLocator(1))
plt.title('Training and validation accuracy with SGD optimizer')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.gca().set_yticklabels(['{:.0f}%'.format(x*100) for x in plt.gca().get_yticks()])
plt.legend()
plt.grid('off')
plt.show()
With these changes, the accuracy and loss on the training and validation datasets for the SGD optimizer change as follows:
When the optimizer is Adam, the accuracy and loss values on the training and validation datasets change as follows:
Typically, the Adam optimizer reaches its best accuracy faster than other optimizers. Other available optimizers include Adagrad, Adadelta, AdamW, LBFGS and RMSprop.
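Switching from SGD to Adam only requires changing the optimizer line in get_model(). The sketch below shows one way to do it; the learning rate of 1e-3 is an assumption, chosen to match the value used with Adam in the next section.

```python
import torch
import torch.nn as nn
from torch.optim import Adam

device = "cuda" if torch.cuda.is_available() else "cpu"

def get_model():
    model = nn.Sequential(
        nn.Linear(28 * 28, 1000),
        nn.ReLU(),
        nn.Linear(1000, 10)
    ).to(device)
    loss_fn = nn.CrossEntropyLoss()
    # Only this line differs from the SGD version; lr=1e-3 is an assumed value.
    optimizer = Adam(model.parameters(), lr=1e-3)
    return model, loss_fn, optimizer

model, loss_fn, optimizer = get_model()
print(type(optimizer).__name__, optimizer.defaults['lr'])
```

The training loop itself stays the same, since it only calls optimizer.step() and optimizer.zero_grad().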
5. Building a Deep Neural Network
So far, the neural network architectures we have built have only one hidden layer. In this section, we compare the performance of neural network models with two hidden layers and no hidden layers.
(1) Construct a neural network model containing two hidden layers:
def get_model():
    model = nn.Sequential(
        nn.Linear(28 * 28, 1000),
        nn.ReLU(),
        nn.Linear(1000, 512),
        nn.ReLU(),
        nn.Linear(512, 10)
    ).to(device)
    loss_fn = nn.CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=1e-3)
    return model, loss_fn, optimizer
(2) Similarly, modify the get_model() function to build a neural network without a hidden layer, connecting the input directly to the output layer:
from torch.optim import SGD, Adam
def get_model():
    model = nn.Sequential(
        nn.Linear(28 * 28, 10)
    ).to(device)
    loss_fn = nn.CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=1e-3)
    return model, loss_fn, optimizer
The accuracy and loss changes for the training and validation datasets are as follows:
From the above results it can be seen that:
- When there are no hidden layers, the model does not learn as well
- The model overfits more severely when there are two hidden layers than when there is one hidden layer
A deep neural network means that there are multiple hidden layers between the input layer and the output layer. Multiple hidden layers ensure that the neural network can learn complex non-linear relationships between inputs and outputs, which cannot be done with simple neural networks (due to the limited number of hidden layers).
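One way to make the difference between the three architectures concrete is to count their trainable parameters. The following sketch rebuilds the three models from this article on the CPU; count_parameters is a hypothetical helper introduced only for this illustration.

```python
import torch.nn as nn

def count_parameters(model):
    """Total number of trainable weights and biases (hypothetical helper)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

no_hidden = nn.Sequential(nn.Linear(28 * 28, 10))
one_hidden = nn.Sequential(nn.Linear(28 * 28, 1000), nn.ReLU(), nn.Linear(1000, 10))
two_hidden = nn.Sequential(nn.Linear(28 * 28, 1000), nn.ReLU(),
                           nn.Linear(1000, 512), nn.ReLU(), nn.Linear(512, 10))

counts = {name: count_parameters(m) for name, m in
          [("no hidden", no_hidden), ("one hidden", one_hidden), ("two hidden", two_hidden)]}
print(counts)
```

The model without a hidden layer has 7,850 parameters, the model with one hidden layer 795,010, and the model with two hidden layers 1,302,642, so each added layer noticeably increases the capacity available to fit the training data.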
Summary
Neural network performance optimization technology refers to the method of improving the performance and generalization ability of the neural network by improving the structure, parameter initialization, regularization and training process of the neural network. This section first trains a simple fully connected network, and then introduces simple and effective neural network performance improvement techniques on this basis. In subsequent studies, common techniques including batch normalization and dynamic learning rate will be further introduced.
Series links
PyTorch Deep Learning Combat (1) - Neural Network and Model Training Process Detailed
PyTorch Deep Learning Combat (2) - PyTorch Basics
PyTorch Deep Learning Combat (3) - Using PyTorch to Build a Neural Network
PyTorch Deep Learning Combat (4) - Commonly used activation functions and loss functions in detail