PyTorch Deep Learning Practice (3) - Building Neural Networks with PyTorch
0. Preface
We have learned how to build a neural network from scratch. A neural network usually includes basic components such as an input layer, hidden layers, an output layer, activation functions, a loss function, and a learning rate. In this section, we will learn how to use PyTorch tensor operations and gradient computation to build a neural network and update its weights.
1. A first experience building neural networks with PyTorch
1.1 Using PyTorch to build a neural network
To introduce how to build a neural network with PyTorch, we'll try to solve the problem of adding two numbers.
(1) Initialize the dataset, defining the input (x) and output (y) values:
import torch
x = [[1,2],[3,4],[5,6],[7,8]]
y = [[3],[7],[11],[15]]
In the initialized input and output variables, the sum of the values in each input list equals the corresponding value in the output list.
(2) Convert the input list to a tensor object:
X = torch.tensor(x).float()
Y = torch.tensor(y).float()
In the above code, the tensor objects are converted to float tensors. Additionally, register the input (X) and output (Y) data points to the device:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
X = X.to(device)
Y = Y.to(device)
(3) Define the neural network architecture. Import the torch.nn module, which is used to build neural network models:
from torch import nn
Create a neural network architecture class, MyNeuralNet, inheriting from nn.Module, which is the base class for all neural network modules:
class MyNeuralNet(nn.Module):
In the class, use the __init__ method to initialize all components of the neural network, calling super().__init__() to ensure the class inherits from nn.Module:
    def __init__(self):
        super().__init__()
By calling super().__init__(), MyNeuralNet can take advantage of all the pre-built functionality written for nn.Module, and the components initialized here can be used by the different methods of the class.
Define the layers of the neural network:
        self.input_to_hidden_layer = nn.Linear(2,8)
        self.hidden_layer_activation = nn.ReLU()
        self.hidden_to_output_layer = nn.Linear(8,1)
In the above code, all the layers of the neural network are specified: a fully connected layer (self.input_to_hidden_layer), a ReLU activation function (self.hidden_layer_activation), and finally another fully connected layer (self.hidden_to_output_layer).
Connect the initialized neural network components together in the forward method, which defines the forward propagation of the network:
    def forward(self, x):
        x = self.input_to_hidden_layer(x)
        x = self.hidden_layer_activation(x)
        x = self.hidden_to_output_layer(x)
        return x
The function must be named forward, because PyTorch reserves this name as the method that performs the forward pass; any other name will raise an error.
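As a brief illustration of why the name matters: calling the module instance goes through nn.Module.__call__, which dispatches to forward (and also runs any registered hooks). A minimal sketch, assuming the MyNeuralNet class above and the X tensor defined earlier:
# Minimal sketch: calling the instance dispatches to forward()
net = MyNeuralNet().to(device)
out_call = net(X)             # preferred: goes through nn.Module.__call__
out_forward = net.forward(X)  # same computation, but bypasses hooks
assert torch.allclose(out_call, out_forward)
In practice, always call the instance (net(X)) rather than net.forward(X), so hooks and other nn.Module machinery are not skipped.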
See what nn.Linear does by printing a layer instance:
print(nn.Linear(2, 7))
# Linear(in_features=2, out_features=7, bias=True)
In the code above, the fully connected layer takes 2 values as input and outputs 7 values, with a bias parameter associated with them.
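As a quick check (an illustrative sketch, not part of the original walkthrough), the layer's weight matrix has shape (out_features, in_features) and its bias has shape (out_features,):
# Minimal check of the parameter shapes held by a Linear(2, 7) layer
layer = nn.Linear(2, 7)
print(layer.weight.shape)  # torch.Size([7, 2])
print(layer.bias.shape)    # torch.Size([7])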
(4) Access the initial weights of each neural network component by executing the following code.
Create an instance of the MyNeuralNet class and register it to the device:
mynet = MyNeuralNet().to(device)
The weights and biases of each layer can be accessed with code like:
print(mynet.input_to_hidden_layer.weight)
The code output is as follows:
Parameter containing:
tensor([[ 0.0984, 0.3058],
[ 0.2913, -0.3629],
[ 0.0630, 0.6347],
[-0.5134, -0.2525],
[ 0.2315, 0.3591],
[ 0.1506, 0.1106],
[ 0.2941, -0.0094],
[-0.0770, -0.4165]], device='cuda:0', requires_grad=True)
The output value is not the same on every execution because the neural network is initialized with random values each time. To keep the output the same across executions of the same code, specify a random seed using torch.manual_seed:
torch.manual_seed(0)
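For example, seeding immediately before each instantiation makes the random initialization repeatable (a minimal sketch using the class defined above):
# Minimal sketch: the same seed yields the same initial weights
torch.manual_seed(0)
net_a = MyNeuralNet()
torch.manual_seed(0)
net_b = MyNeuralNet()
assert torch.allclose(net_a.input_to_hidden_layer.weight,
                      net_b.input_to_hidden_layer.weight)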
(5) All parameters of the neural network can be obtained through the following code:
mynet.parameters()
The above code returns a generator object; loop over the generator to obtain the parameters:
for param in mynet.parameters():
    print(param)
The code output is as follows:
Parameter containing:
tensor([[ 0.2955, 0.3231],
[ 0.5153, 0.1734],
[-0.6359, -0.1406],
[ 0.3820, -0.1085],
[ 0.2816, -0.2283],
[ 0.4633, 0.6564],
[-0.1605, -0.4450],
[ 0.0788, -0.0147]], device='cuda:0', requires_grad=True)
Parameter containing:
tensor([-0.4761, 0.6370, 0.6744, -0.4103, -0.3714, 0.1491, -0.2304, 0.5571],
device='cuda:0', requires_grad=True)
Parameter containing:
tensor([[-0.0440, 0.0028, 0.3024, 0.1915, 0.1426, -0.2859, -0.2398, -0.2134]],
device='cuda:0', requires_grad=True)
Parameter containing:
tensor([-0.3238], device='cuda:0', requires_grad=True)
The model registers these tensors as special objects that are needed to track forward and backward propagation. When an nn layer is defined in the __init__ method, the corresponding tensors are created and registered automatically. You can also register parameters manually using nn.Parameter(<tensor>). Therefore, the MyNeuralNet class defined in this section is equivalent to the following code:
class MyNeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.Parameter (capital P) registers a tensor as a trainable parameter
        self.input_to_hidden_layer = nn.Parameter(torch.rand(2,8))
        self.hidden_layer_activation = nn.ReLU()
        self.hidden_to_output_layer = nn.Parameter(torch.rand(8,1))
    def forward(self, x):
        x = x @ self.input_to_hidden_layer
        x = self.hidden_layer_activation(x)
        x = x @ self.hidden_to_output_layer
        return x
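Note that this manual version has no bias terms, so it matches the nn.Linear version only approximately. As a quick, hedged check that nn.Parameter registers tensors just as nn.Linear does, the wrapped tensors appear in parameters():
# Minimal check: tensors wrapped in nn.Parameter show up in .parameters(),
# so the optimizer can update them (two tensors here, since there are no biases)
manual_net = MyNeuralNet()
for param in manual_net.parameters():
    print(param.shape)  # torch.Size([2, 8]) then torch.Size([8, 1])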
(6) Define the loss function. Since we need to predict a continuous output, mean squared error is used as the loss function:
loss_func = nn.MSELoss()
By passing the input values to a neural network object, the value of the loss function is computed for the given input:
_Y = mynet(X)
loss_value = loss_func(_Y,Y)
print(loss_value)
# tensor(127.4498, device='cuda:0', grad_fn=<MseLossBackward>)
In the above code, mynet(X) computes the output of the neural network for the given input, and loss_func computes the MSELoss value between the neural network prediction (_Y) and the actual values (Y). Note that, by PyTorch convention, when calculating the loss we always pass the prediction first and the actual labeled values second.
(7) Define an optimizer to reduce the loss value. The optimizer takes as input the parameters (weights and biases) of the neural network and the learning rate used when updating the weights. In this section, we use stochastic gradient descent (SGD). Import SGD from the torch.optim module, then pass the network's parameters (mynet.parameters()) and the learning rate (lr) as arguments:
from torch.optim import SGD
opt = SGD(mynet.parameters(), lr = 0.001)
(8) One epoch of training includes the following steps:
- Calculate the loss value for the given input and output
- Calculate the gradient of each parameter
- Update the weights according to the learning rate and the parameter gradients
- Flush the gradients computed in this step, so they do not accumulate into the next epoch's gradient calculation
opt.zero_grad()
loss_value = loss_func(mynet(X),Y)
loss_value.backward()
opt.step()
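Conceptually, opt.step() applies the plain SGD update w ← w − lr·grad to every registered parameter. A minimal sketch of the equivalent manual update (for illustration only; it assumes plain SGD with no momentum and the same lr of 0.001):
# Minimal sketch of what opt.step() does for plain SGD (no momentum)
lr = 0.001
with torch.no_grad():  # weight updates must not be tracked by autograd
    for param in mynet.parameters():
        if param.grad is not None:
            param -= lr * param.grad  # w <- w - lr * dL/dw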
Use a for loop to repeat the above steps. The following code runs 50 epochs and stores each epoch's loss value in the list loss_history:
loss_history = []
for _ in range(50):
    opt.zero_grad()
    loss_value = loss_func(mynet(X), Y)
    loss_value.backward()
    opt.step()
    # store a plain Python float so the list can be plotted directly
    loss_history.append(loss_value.item())
Plot the loss as a function of epoch:
import matplotlib.pyplot as plt
plt.plot(loss_history)
plt.title('Loss variation over increasing epochs')
plt.xlabel('epochs')
plt.ylabel('loss value')
plt.show()
1.2 Neural Network Data Loading
Batch size is an important hyperparameter in neural networks: it is the number of data samples used when computing the loss and updating the weights. Suppose there are millions of samples in a dataset; using all of them for a single weight update is not optimal, because memory may not be able to hold that much data, and a sample of the data can represent it adequately. The batch size lets us draw subsets of data samples that are sufficiently representative. In this section, we specify the batch size to use when computing the weight gradients and updating the weights, and then compute the updated loss value.
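To make the idea concrete before introducing DataLoader, here is a minimal sketch of mini-batching done by hand, assuming the X and Y tensors from the previous subsection: shuffle the sample indices, then walk through them in chunks of batch_size:
# Minimal sketch of manual mini-batching; DataLoader automates this below
batch_size = 2
indices = torch.randperm(len(X), device=X.device)  # shuffled sample indices
for start in range(0, len(X), batch_size):
    batch_idx = indices[start:start + batch_size]
    x_batch, y_batch = X[batch_idx], Y[batch_idx]
    # ...compute the loss on (x_batch, y_batch) and update the weights...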
(1) Import methods for loading data and processing datasets:
from torch.utils.data import Dataset, DataLoader
import torch
import torch.nn as nn
(2) Import the data, convert the data to floating point numbers, and register them in the corresponding device:
x = [[1,2],[3,4],[5,6],[7,8]]
y = [[3],[7],[11],[15]]
X = torch.tensor(x).float()
Y = torch.tensor(y).float()
device = 'cuda' if torch.cuda.is_available() else 'cpu'
X = X.to(device)
Y = Y.to(device)
(3) Create a dataset class, MyDataset:
class MyDataset(Dataset):
In the MyDataset class, the data is stored so that a batch of data points can be bundled together (using DataLoader) and the weights can be updated through one forward and one backward pass.
Define the __init__ method, which takes the input and output pairs and stores them as Torch float tensor objects:
    def __init__(self, x, y):
        self.x = x.clone().detach()  # torch.tensor(x).float()
        self.y = y.clone().detach()  # torch.tensor(y).float()
Specify the length of the input dataset ( __len__
):
def __len__(self):
return len(self.x)
The __getitem__ method is used to obtain a specified data sample:
    def __getitem__(self, ix):
        return self.x[ix], self.y[ix]
In the above code, ix is the index of the data sample to fetch from the dataset.
(4) Create an instance of the custom class:
ds = MyDataset(X, Y)
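As a quick usage check (not part of the original walkthrough), the instance now supports len() and integer indexing through the methods just defined:
# Minimal check: __len__ and __getitem__ in action
print(len(ds))  # 4
print(ds[0])    # first (x, y) pair: (tensor([1., 2.], ...), tensor([3.], ...))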
(5) Obtain batch_size data points from the original input and output tensors by passing the dataset instance to DataLoader:
dl = DataLoader(ds, batch_size=2, shuffle=True)
In the above code, we specify fetching two (batch_size=2) randomly sampled (shuffle=True) data points from the original dataset (ds).
Loop through dl
to get batch data information:
for x, y in dl:
    print(x, y)
The output is as follows:
tensor([[3., 4.],
[5., 6.]], device='cuda:0') tensor([[ 7.],
[11.]], device='cuda:0')
tensor([[1., 2.],
[7., 8.]], device='cuda:0') tensor([[ 3.],
[15.]], device='cuda:0')
You can see that the above code produces two sets of input-output pairs, because there are 4 data points in total and the specified batch size is 2.
(6) Define the neural network class:
class MyNeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.input_to_hidden_layer = nn.Linear(2,8)
        self.hidden_layer_activation = nn.ReLU()
        self.hidden_to_output_layer = nn.Linear(8,1)
    def forward(self, x):
        x = self.input_to_hidden_layer(x)
        x = self.hidden_layer_activation(x)
        x = self.hidden_to_output_layer(x)
        return x
(7) Define the model object ( mynet
), loss function ( loss_func
) and optimizer ( opt
):
mynet = MyNeuralNet().to(device)
loss_func = nn.MSELoss()
from torch.optim import SGD
opt = SGD(mynet.parameters(), lr = 0.001)
(8) Finally, loop through the batch of data points to minimize the loss value:
import time
loss_history = []
start = time.time()
for _ in range(50):
    for data in dl:
        x, y = data
        opt.zero_grad()
        loss_value = loss_func(mynet(x), y)
        loss_value.backward()
        opt.step()
    # store one plain float per epoch for plotting
    loss_history.append(loss_value.item())
end = time.time()
print(end - start)
# 0.08548569679260254
Although the above code is very similar to the code in the previous subsection, the number of weight updates per epoch here is twice what it was before, because the batch size in this section is 2, while the previous subsection used a batch size of 4 (i.e., all data points at once).
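You can confirm this directly: len(dl) gives the number of batches per epoch, so each epoch now performs two weight updates instead of one:
# Minimal check: 4 samples with batch_size=2 gives 2 batches per epoch
print(len(dl))  # 2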
1.3 Model testing
In the previous sections, we learned how to fit a model on known data points. In this section, we will learn how to use the forward propagation method (forward) defined in the mynet model to predict data points the model has not seen (test data).
(1) Create data points for testing the model:
val_x = [[10,11]]
The new data (val_x) has the same format as the input dataset: a list of lists.
(2) Convert the new data point to a float tensor object and register it to the device:
val_x = torch.tensor(val_x).float().to(device)
(3) Pass the tensor object through the trained neural network (mynet); the usage is the same as performing forward propagation through the model:
print(mynet(val_x))
# tensor([[20.0105]], device='cuda:0', grad_fn=<AddmmBackward>)
The above code returns the predicted output value of the model for the input data points.
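Since no weights are updated at test time, it is also common practice (not shown in the original snippet) to disable gradient tracking during inference; a minimal sketch:
# Minimal sketch: skip autograd bookkeeping during inference
with torch.no_grad():
    pred = mynet(val_x)
print(pred)  # same values as above, but without grad_fn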
1.4 Getting intermediate layer values
In practical applications, we may need to obtain the intermediate values of a neural network, for example in style transfer or transfer learning. PyTorch provides two ways to obtain them.
One way is to call the neural network layers directly, using them as functions:
print(mynet.hidden_layer_activation(mynet.input_to_hidden_layer(X)))
Note that the layers must be chained in the order defined by the model: in the above code, the output of input_to_hidden_layer is the input of the hidden_layer_activation layer.
Another way is to expose the desired layer in the forward method. Although the following code is basically the same as the MyNeuralNet class in the previous section, the forward method returns not only the output but also the activated hidden layer values (hidden2):
class MyNeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.input_to_hidden_layer = nn.Linear(2,8)
        self.hidden_layer_activation = nn.ReLU()
        self.hidden_to_output_layer = nn.Linear(8,1)
    def forward(self, x):
        hidden1 = self.input_to_hidden_layer(x)
        hidden2 = self.hidden_layer_activation(hidden1)
        x = self.hidden_to_output_layer(hidden2)
        return x, hidden2
The hidden layer values can then be accessed with the following code (assuming mynet has been re-created from this modified class), where the output at index 0 is the final output of the network's forward pass and the output at index 1 is the value after the hidden layer activation:
print(mynet(X)[1])
2. Use the Sequential class to build a neural network
We have learned how to build a neural network by defining a class in which we declare the layers and how they are connected. However, unless you need to build a complex network, it is enough to use the Sequential class and simply specify the order in which the layers are stacked. This section trains a neural network on the same simple dataset.
(1) Import the relevant library and define the device used:
import torch
import torch.nn as nn
import numpy as np
from torch.utils.data import Dataset, DataLoader
device = 'cuda' if torch.cuda.is_available() else 'cpu'
(2) Define dataset and dataset class ( MyDataset
):
x = [[1,2],[3,4],[5,6],[7,8]]
y = [[3],[7],[11],[15]]
class MyDataset(Dataset):
    def __init__(self, x, y):
        self.x = torch.tensor(x).float().to(device)
        self.y = torch.tensor(y).float().to(device)
    def __getitem__(self, ix):
        return self.x[ix], self.y[ix]
    def __len__(self):
        return len(self.x)
(3) Define the dataset ( ds
) and data loader ( dl
) objects:
ds = MyDataset(x, y)
dl = DataLoader(ds, batch_size=2, shuffle=True)
(4) Use the Sequential class in the nn module to define the model architecture:
model = nn.Sequential(
    nn.Linear(2, 8),
    nn.ReLU(),
    nn.Linear(8, 1)
).to(device)
In the above code, we define the same network architecture as in the previous section: nn.Linear takes the two-dimensional input and produces an eight-dimensional output for each data point, nn.ReLU applies the ReLU activation, and the final nn.Linear takes the eight-dimensional input and produces a one-dimensional output.
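As an aside (a small illustrative check), Sequential layers are addressable by position, which also explains the key names ('0.weight', '2.weight', and so on) that appear in the state_dict later in this article:
# Minimal check: Sequential modules are indexable by position
print(model[0])  # Linear(in_features=2, out_features=8, bias=True)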
(5) Print the model summary (summary) to view the model architecture information. To view model summaries, the torchsummary library needs to be installed with pip:
pip install torchsummary
Once installed, import summary from the torchsummary library:
from torchsummary import summary
To print a model summary, the summary function takes the model and the model's input size (a tuple of integers) as parameters:
print(summary(model, (2,)))
The output is as follows:
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Linear-1 [-1, 8] 24
ReLU-2 [-1, 8] 0
Linear-3 [-1, 1] 9
================================================================
Total params: 33
Trainable params: 33
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
----------------------------------------------------------------
Taking the output of the first layer as an example, its shape is (-1, 8), where -1 stands for the batch size and 8 indicates that an 8-dimensional output is obtained for each data point, i.e. an output of shape batch size x 8.
(6) Next, define the loss function ( loss_func
) and optimizer ( opt
) and train the model:
loss_func = nn.MSELoss()
from torch.optim import SGD
opt = SGD(model.parameters(), lr = 0.001)
import time
loss_history = []
start = time.time()
for _ in range(50):
    for ix, iy in dl:
        opt.zero_grad()
        loss_value = loss_func(model(ix), iy)
        loss_value.backward()
        opt.step()
    # store one plain float per epoch for plotting
    loss_history.append(loss_value.item())
end = time.time()
print(end - start)
(7) After training the model, predict values on the validation dataset. Define the validation dataset:
val = [[8,9],[10,11],[1.5,2.5]]
Convert the validation data to a float tensor object, register it to the device, and pass it through the model to predict the output:
val = torch.tensor(val).float()
print(model(val.to(device)))
"""
tensor([[16.7774],
[20.6186],
[ 4.2415]], device='cuda:0', grad_fn=<AddmmBackward>)
"""
3. Saving and loading of PyTorch models
An important aspect of working with neural network models is saving and loading a model after training. Once a model is saved, we can use the already-trained model for inference simply by loading it, without retraining.
3.1 Components required for model saving
First understand the complete components required to save a neural network model:
- A unique name (key) for each tensor (parameter)
- The way tensors are connected in the network
- The value of each tensor (weight/bias value)
The first component is handled when the model is defined in the __init__ phase, while the second is handled when the forward method is defined. By default, the tensor values are initialized randomly during __init__; when loading a pretrained model, however, we need to load the fixed set of weight values learned while training the model and associate each value with a specific name.
3.2 Model state
model.state_dict() can be used to understand how saving and loading PyTorch models works. The dictionary it returns (an OrderedDict) maps the model's parameter names (keys) to their values (weights and biases); state refers to the current snapshot of the model. In the returned output, the keys are the names of the model's layers and the values are the corresponding weights:
print(model.state_dict())
"""
OrderedDict([('0.weight', tensor([[-0.4732, 0.1934],
[ 0.1475, -0.2335],
[-0.2586, 0.0823],
[-0.2979, -0.5979],
[ 0.2605, 0.2293],
[ 0.0566, 0.6848],
[-0.1116, -0.3301],
[ 0.0324, 0.2609]], device='cuda:0')), ('0.bias', tensor([ 0.6835, 0.2860, 0.1953, -0.2162, 0.5106, 0.3625, 0.1360, 0.2495],
device='cuda:0')), ('2.weight', tensor([[ 0.0475, 0.0664, -0.0167, -0.1608, -0.2412, -0.3332, -0.1607, -0.1857]],
device='cuda:0')), ('2.bias', tensor([0.2595], device='cuda:0'))])
"""
3.3 Model saving
A model can be saved to disk in Python's serialized format using torch.save(model.state_dict(), 'mymodel.pth'), where mymodel.pth denotes the filename. It is best to move the model to the CPU before calling torch.save; saving the tensors as CPU tensors makes it easier to load the model on any machine:
save_path = 'mymodel.pth'
torch.save(model.state_dict(), save_path)
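Following the advice above about saving CPU tensors, one possible approach (a minimal sketch, not the only way) is to copy each tensor in the state dict to the CPU before saving:
# Minimal sketch: save a CPU copy of the weights so the file loads anywhere
cpu_state_dict = {k: v.cpu() for k, v in model.state_dict().items()}
torch.save(cpu_state_dict, save_path)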
3.4 Model loading
Loading a model first requires initializing the model and then loading the weights from the state_dict:
(1) Create an empty model using the same code as for training:
model = nn.Sequential(
    nn.Linear(2, 8),
    nn.ReLU(),
    nn.Linear(8, 1)
).to(device)
(2) Load the saved state from disk, deserializing it into an OrderedDict:
state_dict = torch.load('mymodel.pth')
(3) Load the state_dict into the model, register it to the device, and perform the prediction task:
model.load_state_dict(state_dict)
model.to(device)
val = [[8,9],[10,11],[1.5,2.5]]
val = torch.tensor(val).float()
model(val.to(device))
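If the checkpoint was saved from a GPU but is loaded on a CPU-only machine, torch.load accepts a map_location argument; it is also good practice to switch the model to evaluation mode before inference. A minimal sketch:
# Minimal sketch: load weights onto whatever device is available
state_dict = torch.load('mymodel.pth', map_location=device)
model.load_state_dict(state_dict)
model.eval()  # matters for layers like Dropout/BatchNorm; harmless here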
Summary
In this section, we used PyTorch to build a neural network on a simple dataset, trained it to map the inputs to the outputs by performing backpropagation to update the weight values and minimize the loss, and simplified network construction using the Sequential class. We also introduced common ways to obtain the intermediate values of a network, and how to use torch.save and torch.load to save and load a model so that it does not have to be retrained.
Series links
PyTorch Deep Learning Practice (1) - Neural Networks and the Model Training Process in Detail
PyTorch Deep Learning Practice (2) - PyTorch Basics