PyTorch Deep Learning Practice (3) - Building Neural Networks with PyTorch

0. Preface

We have already learned how to build a neural network from scratch. A neural network usually includes basic components such as an input layer, hidden layers, an output layer, an activation function, a loss function, and a learning rate. In this section, we will learn how to use PyTorch tensor operations and gradient calculations to build a neural network and update its weights.

1. PyTorch first experience in building neural networks

1.1 Using PyTorch to build a neural network

To introduce how to build a neural network with PyTorch, we'll try to solve the problem of adding two numbers.

(1) Initialize the dataset, defining the input (x) and output (y) values:

import torch
x = [[1,2],[3,4],[5,6],[7,8]]
y = [[3],[7],[11],[15]]

In the initialized input and output variables, the sum of the values in each input list is the corresponding value in the output list.

(2) Convert the input list to a tensor object:

X = torch.tensor(x).float()
Y = torch.tensor(y).float()

In the above code, the tensor objects are converted to floating-point tensors. Additionally, register the input (X) and output (Y) data points on the device:

device = 'cuda' if torch.cuda.is_available() else 'cpu'
X = X.to(device)
Y = Y.to(device)

(3) Define the neural network architecture.

Import the torch.nn module, which is used to build the neural network model:

from torch import nn

Create a neural network architecture class MyNeuralNet that inherits from nn.Module, the base class for all neural network modules:

class MyNeuralNet(nn.Module):

In the class, use the __init__ method to initialize all components of the neural network, calling super().__init__() to properly inherit from nn.Module:

    def __init__(self):
        super().__init__()

With the above code, by specifying super().__init__(), MyNeuralNet can take advantage of all the pre-built functionality written for nn.Module, and the components initialized here can be used by the other methods in the class.

Define layers in a neural network:

        self.input_to_hidden_layer = nn.Linear(2,8)
        self.hidden_layer_activation = nn.ReLU()
        self.hidden_to_output_layer = nn.Linear(8,1)

In the above code, all the layers of the neural network are specified: a fully connected layer (self.input_to_hidden_layer), a ReLU activation function (self.hidden_layer_activation), and finally another fully connected layer (self.hidden_to_output_layer).

Connect the initialized neural network components together and define the forward propagation method of the network, forward:

    def forward(self, x):
        x = self.input_to_hidden_layer(x)
        x = self.hidden_layer_activation(x)
        x = self.hidden_to_output_layer(x)
        return x

forward must be used as the name of the forward-pass method, because PyTorch reserves this function name as the method that performs the forward pass; using any other name will raise an error.
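As a quick illustration (using the mynet instance created in step (4) below), calling the model object directly dispatches to forward through nn.Module.__call__, so the two calls below are equivalent:

# calling the instance and calling forward explicitly produce the same result;
# the first form is preferred because __call__ also runs any registered hooks
out1 = mynet(X)
out2 = mynet.forward(X)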

See what nn.Linear does by printing the output of the method:

print(nn.Linear(2, 7))
# Linear(in_features=2, out_features=7, bias=True)

In the code above, the fully connected layer takes 2 values as input and outputs 7 values, with bias parameters associated with it.

(4) Access the initial weights of each neural network component by executing the following code.

Create an instance of the MyNeuralNet class and register it on the device:

mynet = MyNeuralNet().to(device)

The weights and biases of each layer can be accessed with code like:

print(mynet.input_to_hidden_layer.weight)

The code output is as follows:

Parameter containing:
tensor([[ 0.0984,  0.3058],
        [ 0.2913, -0.3629],
        [ 0.0630,  0.6347],
        [-0.5134, -0.2525],
        [ 0.2315,  0.3591],
        [ 0.1506,  0.1106],
        [ 0.2941, -0.0094],
        [-0.0770, -0.4165]], device='cuda:0', requires_grad=True)

The output values are not the same on every execution because the neural network is initialized with random values each time. If you want to get the same output every time you run the same code, specify a random seed with torch.manual_seed before creating the model, e.g. torch.manual_seed(0).
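A minimal sketch of this (the exact weight values will still depend on your PyTorch version):

torch.manual_seed(0)                        # fix the random seed before creating the model
mynet = MyNeuralNet().to(device)            # re-created here so initialization is reproducible
print(mynet.input_to_hidden_layer.weight)   # identical values on every run of the script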

(5) All parameters of the neural network can be obtained through the following code:

mynet.parameters()

The above code returns a generator object; the parameters are obtained by looping over the generator:

for param in mynet.parameters():
    print(param)

The code output is as follows:

Parameter containing:
tensor([[ 0.2955,  0.3231],
        [ 0.5153,  0.1734],
        [-0.6359, -0.1406],
        [ 0.3820, -0.1085],
        [ 0.2816, -0.2283],
        [ 0.4633,  0.6564],
        [-0.1605, -0.4450],
        [ 0.0788, -0.0147]], device='cuda:0', requires_grad=True)
Parameter containing:
tensor([-0.4761,  0.6370,  0.6744, -0.4103, -0.3714,  0.1491, -0.2304,  0.5571],
       device='cuda:0', requires_grad=True)
Parameter containing:
tensor([[-0.0440,  0.0028,  0.3024,  0.1915,  0.1426, -0.2859, -0.2398, -0.2134]],
       device='cuda:0', requires_grad=True)
Parameter containing:
tensor([-0.3238], device='cuda:0', requires_grad=True)

The model has registered these tensors as special objects that are needed to track forward and backward propagation. When a neural network layer is defined with nn in the __init__ method, the corresponding tensors are created and registered automatically. You can also register parameters manually with nn.Parameter(<tensor>). Therefore, the neural network class defined in this section, MyNeuralNet, is (apart from the bias terms added by nn.Linear) equivalent to the following code:

class MyNeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.input_to_hidden_layer = nn.Parameter(torch.rand(2,8))
        self.hidden_layer_activation = nn.ReLU()
        self.hidden_to_output_layer = nn.Parameter(torch.rand(8,1))
    def forward(self, x):
        x = x @ self.input_to_hidden_layer
        x = self.hidden_layer_activation(x)
        x = x @ self.hidden_to_output_layer
        return x

(6) Define the loss function. Since a continuous output needs to be predicted, the mean squared error is used as the loss function:

loss_func = nn.MSELoss()

By passing the input values to the neural network object, the value of the loss function can be computed for the given input:

_Y = mynet(X)
loss_value = loss_func(_Y,Y)
print(loss_value)
# tensor(127.4498, device='cuda:0', grad_fn=<MseLossBackward>)

In the above code, mynet(X) computes the output of the neural network for the given input, and loss_func computes the MSELoss value between the neural network prediction (_Y) and the actual values (Y). Note that, by PyTorch convention, when calculating the loss we always pass the prediction first and then the actual (target) values.

(7) Define an optimizer for reducing the loss value. The inputs of the optimizer are the parameters (weights and biases) of the neural network and the learning rate used when updating the weights. In this section, we use stochastic gradient descent (SGD). Import SGD from the torch.optim module, then pass the neural network parameters (mynet.parameters()) and the learning rate (lr) as arguments to SGD:

from torch.optim import SGD
opt = SGD(mynet.parameters(), lr = 0.001)

(8) One epoch of training includes the following steps:

  • Calculate the loss value corresponding to the given input and output
  • Calculate the gradient corresponding to the parameter
  • Update the weights according to the learning rate and gradient of the parameters
  • After updating the weights, flush the gradients calculated in this epoch before the gradients are calculated in the next epoch

These steps correspond to the following code:
    opt.zero_grad()
    loss_value = loss_func(mynet(X),Y)
    loss_value.backward()
    opt.step()

Repeat the above steps using a for loop. The following code runs for 50 epochs, and loss_history stores the loss value of each epoch in a list:

loss_history = []
for _ in range(50):
    opt.zero_grad()
    loss_value = loss_func(mynet(X),Y)
    loss_value.backward()
    opt.step()
    loss_history.append(loss_value.item())  # store a Python float so the list can be plotted directly

Plot the loss value as a function of the epoch:

import matplotlib.pyplot as plt
plt.plot(loss_history)
plt.title('Loss variation over increasing epochs')
plt.xlabel('epochs')
plt.ylabel('loss value')
plt.show()

Changes in loss value

1.2 Neural Network Data Loading

Batch size is an important hyperparameter in neural networks: it refers to the number of data samples used when calculating the loss value and updating the weights. Suppose there are millions of data samples in the dataset; using all of the data points for a single weight update is not optimal, because memory may not be able to hold that much data at once. Since a sample can adequately represent the data, a batch size can be chosen so that each update sees a number of data samples that is sufficiently representative (the small example below illustrates the effect on the number of updates). In this section, we specify the batch size used when computing the gradients of the weights and updating them, and then compute the updated loss value.
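As a rough illustration (the numbers here are hypothetical, not from this post), the batch size determines how many weight updates take place per epoch:

import math

n_samples = 1_000_000        # hypothetical dataset size
batch_size = 32              # hypothetical batch size
updates_per_epoch = math.ceil(n_samples / batch_size)
print(updates_per_epoch)     # 31250 weight updates per epoch instead of a single full-batch update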

(1) Import methods for loading data and processing datasets:

from torch.utils.data import Dataset, DataLoader
import torch
import torch.nn as nn

(2) Import the data, convert the data to floating point numbers, and register them in the corresponding device:

x = [[1,2],[3,4],[5,6],[7,8]]
y = [[3],[7],[11],[15]]

X = torch.tensor(x).float()
Y = torch.tensor(y).float()

device = 'cuda' if torch.cuda.is_available() else 'cpu'
X = X.to(device)
Y = Y.to(device)

(3) Create a dataset class MyDataset:

class MyDataset(Dataset):

In the MyDataset class, the data is stored so that a batch of data points can be bundled together (consumed by DataLoader) and the weights can be updated through a forward and backward pass.

Define the __init__ method, which takes the input and output pairs and stores them as float tensor objects (here X and Y are already float tensors, so they are simply cloned and detached):

    def __init__(self,x,y):
        self.x = x.clone().detach() # torch.tensor(x).float()
        self.y = y.clone().detach() # torch.tensor(y).float()

Specify the length of the input dataset ( __len__):

    def __len__(self):
        return len(self.x)

The __getitem__ method is used to obtain a specified data sample:

    def __getitem__(self, ix):
        return self.x[ix], self.y[ix]

In the above code, ix represents the index of the data sample to be obtained from the dataset.

(4) Create an instance of the custom class:

ds = MyDataset(X, Y)
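As a quick sanity check (not part of the original code), the dataset length and a single sample can be inspected directly, which exercises __len__ and __getitem__:

print(len(ds))    # 4, returned by __len__
print(ds[0])      # the first input-output pair, (tensor([1., 2.]), tensor([3.])), returned by __getitem__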

(5) Obtain batches of data points from the original input and output tensor objects by passing the dataset instance and a batch_size to DataLoader:

dl = DataLoader(ds, batch_size=2, shuffle=True)

In the above code, we specify that two (batch_size=2) randomly chosen (shuffle=True) data points are drawn at a time from the dataset (ds).

Loop through dl to get the batches of data:

for x, y in dl:
    print(x, y)

The output is as follows:

tensor([[3., 4.],
        [5., 6.]], device='cuda:0') tensor([[ 7.],
        [11.]], device='cuda:0')
tensor([[1., 2.],
        [7., 8.]], device='cuda:0') tensor([[ 3.],
        [15.]], device='cuda:0')

You can see that the above code produces two batches of input-output pairs, because there are 4 data points in total and the specified batch size is 2.

(6) Define the neural network class:

class MyNeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.input_to_hidden_layer = nn.Linear(2,8)
        self.hidden_layer_activation = nn.ReLU()
        self.hidden_to_output_layer = nn.Linear(8,1)
    def forward(self, x):
        x = self.input_to_hidden_layer(x)
        x = self.hidden_layer_activation(x)
        x = self.hidden_to_output_layer(x)
        return x

(7) Define the model object ( mynet), loss function ( loss_func) and optimizer ( opt):

mynet = MyNeuralNet().to(device)
loss_func = nn.MSELoss()
from torch.optim import SGD
opt = SGD(mynet.parameters(), lr = 0.001)

(8) Finally, loop through the batches of data points to minimize the loss value:

import time
loss_history = []
start = time.time()
for _ in range(50):
    for data in dl:
        x, y = data
        opt.zero_grad()
        loss_value = loss_func(mynet(x),y)
        loss_value.backward()
        opt.step()
        loss_history.append(loss_value.item())  # store a float rather than a CUDA tensor
end = time.time()
print(end - start)
# 0.08548569679260254

Although the above code is very similar to the code used in the previous subsection, the number of weight updates per epoch is twice as large, because the batch size used in this section is 2, while the previous subsection used a batch size of 4 (i.e. all data points at once).
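A quick way to verify this (a small check, not in the original post) is to count the number of batches the data loader produces per epoch:

print(len(dl))    # 2 batches per epoch with 4 samples and batch_size=2
# so 50 epochs perform 100 weight updates, versus 50 when the whole dataset is a single batch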

1.3 Model testing

In the previous sections, we learned how to fit a model on known data points. In this section, we will learn how to use the forward method defined in the mynet model to predict data points that the model has not seen (test data).

(1) Create data points for testing the model:

val_x = [[10,11]]

The new dataset (val_x) has the same format as the training input: a list of lists.

(2) Convert the new data points to a float tensor object and register it on the device:

val_x = torch.tensor(val_x).float().to(device)

(3) Pass the tensor object through the trained neural network (mynet), exactly as when performing forward propagation through the model:

print(mynet(val_x))
# tensor([[20.0105]], device='cuda:0', grad_fn=<AddmmBackward>)

The above code returns the predicted output value of the model for the input data points.
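As a side note (standard practice rather than something shown in the original post), inference is usually wrapped in torch.no_grad() so that PyTorch does not track gradients for the forward pass:

with torch.no_grad():             # disable gradient tracking during inference
    pred = mynet(val_x)
print(pred)                       # same prediction, without a grad_fn attached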

1.4 Get the value of the middle layer

In practical applications, such as style transfer and transfer learning, we may need to obtain the intermediate values of a neural network. PyTorch provides two ways to do this.

One way is to call the neural network layers directly, using them as functions:

print(mynet.hidden_layer_activation(mynet.input_to_hidden_layer(X)))

It should be noted that the layers must be chained in the same order as in the model's forward pass. For example, in the above code, the output of input_to_hidden_layer is the input to the hidden_layer_activation layer.

Another way is to expose the layer you want to look at in the forward method. Although the following MyNeuralNet class is basically the same as the class defined earlier, its forward method returns not only the output but also the activated hidden layer values (hidden2):

class MyNeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.input_to_hidden_layer = nn.Linear(2,8)
        self.hidden_layer_activation = nn.ReLU()
        self.hidden_to_output_layer = nn.Linear(8,1)
    def forward(self, x):
        hidden1 = self.input_to_hidden_layer(x)
        hidden2 = self.hidden_layer_activation(hidden1)
        x = self.hidden_to_output_layer(hidden2)
        return x, hidden2

The hidden layer values are accessed using the following code, where the output of mynet at index 0 is the final output of the network's forward pass, and the output at index 1 is the value after the hidden layer activation:

print(mynet(X)[1])

2. Use the Sequential class to build a neural network

We have learned how to build a neural network by defining a class in which we specify the layers and how they are connected. However, unless you need to build a complex network, you can simply use the Sequential class and specify the layers and the order in which they are stacked. This section continues to use the simple dataset to train a neural network.

(1) Import the relevant library and define the device used:

import torch
import torch.nn as nn
import numpy as np
from torch.utils.data import Dataset, DataLoader
device = 'cuda' if torch.cuda.is_available() else 'cpu'

(2) Define dataset and dataset class ( MyDataset):

x = [[1,2],[3,4],[5,6],[7,8]]
y = [[3],[7],[11],[15]]

class MyDataset(Dataset):
    def __init__(self, x, y):
        self.x = torch.tensor(x).float().to(device)
        self.y = torch.tensor(y).float().to(device)
    def __getitem__(self, ix):
        return self.x[ix], self.y[ix]
    def __len__(self): 
        return len(self.x)

(3) Define the dataset ( ds) and data loader ( dl) objects:

ds = MyDataset(x, y)
dl = DataLoader(ds, batch_size=2, shuffle=True)

(4) Use the Sequential class in the nn module to define the model architecture:

model = nn.Sequential(
    nn.Linear(2, 8),
    nn.ReLU(),
    nn.Linear(8, 1)
).to(device)

In the above code, we define the same network architecture as in the previous section: nn.Linear accepts a two-dimensional input and produces an eight-dimensional output for each data point, nn.ReLU performs the ReLU activation, and finally another nn.Linear accepts the eight-dimensional input and produces a one-dimensional output.

(5) Print the model summary (summary) to view the model architecture information.

In order to view model summaries, the torchsummary library needs to be installed with pip:

pip install torchsummary

Once installed, import the summary function from torchsummary:

from torchsummary import summary

To print a model summary, the summary function accepts the model and the model input size (a tuple of integers) as parameters:

print(summary(model, (2,)))

The output is as follows:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Linear-1                    [-1, 8]              24
              ReLU-2                    [-1, 8]               0
            Linear-3                    [-1, 1]               9
================================================================
Total params: 33
Trainable params: 33
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
----------------------------------------------------------------

Taking the output of the first layer as an example, its shape is (-1, 8), where -1 denotes the batch size and 8 is the output dimension for each data point, so an output of shape batch size x 8 is obtained.
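The parameter counts in the table can also be verified by hand (a quick check, not part of the original post):

# Linear(2, 8): 2*8 weights + 8 biases = 24 parameters
# Linear(8, 1): 8*1 weights + 1 bias   =  9 parameters
# ReLU has no parameters, so the total is 33
print(sum(p.numel() for p in model.parameters()))   # 33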

(6) Next, define the loss function ( loss_func) and optimizer ( opt) and train the model:

loss_func = nn.MSELoss()
from torch.optim import SGD
opt = SGD(model.parameters(), lr = 0.001)
import time
loss_history = []
start = time.time()
for _ in range(50):
    for ix, iy in dl:
        opt.zero_grad()
        loss_value = loss_func(model(ix),iy)
        loss_value.backward()
        opt.step()
        loss_history.append(loss_value.item())  # store a float rather than a CUDA tensor
end = time.time()
print(end - start)

(7) After training the model, predict values on the validation dataset.

Define the validation dataset:

val = [[8,9],[10,11],[1.5,2.5]]

Convert the validation data to a float tensor object, register it on the device, and pass the validation data through the model to predict the output:

val = torch.tensor(val).float()
print(model(val.to(device)))
"""
tensor([[16.7774],
        [20.6186],
        [ 4.2415]], device='cuda:0', grad_fn=<AddmmBackward>)
"""

3. Saving and loading of PyTorch models

An important aspect of working with neural network models is saving and loading the model after training. Once the model has been saved, we can use the trained model for inference simply by loading it, without retraining.

3.1 Components required for model saving

First, let's look at the components required to save a neural network model:

  • A unique name (key) for each tensor (parameter)
  • The way tensors are connected in the network
  • The value of each tensor (weight/bias value)

The first component is handled when the network is defined in __init__, while the second is handled in the forward method. By default, the tensor values are initialized randomly in __init__, but when loading a pretrained model we need to load the fixed set of weight values learned during training and associate each value with a specific name.

3.2 Model state

model.state_dict() can be used to understand how saving and loading PyTorch models works. The dictionary it returns (an OrderedDict) maps the model's parameter names (keys) to their values (weights and biases); state refers to the current snapshot of the model. In the returned output, the keys are the names of the layers of the model and the values are the corresponding weights and biases:

print(model.state_dict())
"""
OrderedDict([('0.weight', tensor([[-0.4732,  0.1934],
        [ 0.1475, -0.2335],
        [-0.2586,  0.0823],
        [-0.2979, -0.5979],
        [ 0.2605,  0.2293],
        [ 0.0566,  0.6848],
        [-0.1116, -0.3301],
        [ 0.0324,  0.2609]], device='cuda:0')), ('0.bias', tensor([ 0.6835,  0.2860,  0.1953, -0.2162,  0.5106,  0.3625,  0.1360,  0.2495],
       device='cuda:0')), ('2.weight', tensor([[ 0.0475,  0.0664, -0.0167, -0.1608, -0.2412, -0.3332, -0.1607, -0.1857]],
       device='cuda:0')), ('2.bias', tensor([0.2595], device='cuda:0'))])
"""

3.3 Model saving

A model can be saved to disk in Python's serialized format using torch.save(model.state_dict(), 'mymodel.pth'), where mymodel.pth is the filename. It is a good idea to move the model to the CPU before calling torch.save, because saving the tensors as CPU tensors makes it easier to load the model on any machine:

save_path = 'mymodel.pth'
torch.save(model.state_dict(), save_path)
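Following the recommendation above, a variant (a sketch; it assumes it is acceptable to move the model off the GPU temporarily) that stores CPU tensors could look like:

torch.save(model.to('cpu').state_dict(), save_path)   # save CPU tensors for portability
model.to(device)                                       # move the model back afterwards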

3.4 Model loading

Loading a model requires first initializing the model and then loading the weights from the state_dict:

(1) Create an empty model using the same code as for training:

model = nn.Sequential(
    nn.Linear(2, 8),
    nn.ReLU(),
    nn.Linear(8, 1)
).to(device)

(2) Load the saved state from disk and deserialize it to create an OrderedDict:

state_dict = torch.load('mymodel.pth')

(3) Load the state_dict into the model, register the model on the device, and perform the prediction task:

model.load_state_dict(state_dict)
model.to(device)

val = [[8,9],[10,11],[1.5,2.5]]
val = torch.tensor(val).float()
model(val.to(device))

Summary

In this section, we used PyTorch to build a neural network on a simple dataset, trained it to map the inputs to the outputs by performing backpropagation to update the weight values and minimize the loss, and simplified the network construction process with the Sequential class. We also introduced common methods for obtaining the intermediate values of the network, and showed how to save and load a model with torch.save and torch.load to avoid retraining it.

Series links

PyTorch Deep Learning Practice (1) - Neural Networks and the Model Training Process in Detail
PyTorch Deep Learning Practice (2) - PyTorch Basics

Origin blog.csdn.net/LOVEmy134611/article/details/130875404