[Deep Learning] pytorch—Quick Start

These are study notes I compiled myself. If there are any mistakes, please point them out~

Introduction

PyTorch is an open source machine learning framework that provides a rich set of tools and libraries for building and training deep learning models. Here is some basic information about PyTorch:

Tensor operations: The core object in PyTorch is the tensor, which is a multidimensional array. PyTorch provides a wide range of tensor functions that can perform a variety of mathematical and linear algebra operations.

Automatic differentiation: PyTorch uses dynamic computational graphs, allowing users to easily define and compute gradients. By setting requires_grad=True on a tensor, operations on it are tracked and gradients are calculated automatically. This makes it simple to implement the backpropagation algorithm, which is important for building and training deep learning models.

Model building: PyTorch provides a flexible way to build models. You can define your own model by inheriting from the torch.nn.Module class and specifying the forward pass in the forward() method. Such a model can be composed of multiple layers, and each layer can be a fully connected layer, a convolutional layer, a recurrent layer, etc.

Loss functions: PyTorch provides a variety of common loss functions for evaluating the difference between the model output and the target. For example, the mean squared error loss is used for regression problems and the cross-entropy loss is used for classification problems.

Optimizers: PyTorch implements a variety of optimization algorithms, such as stochastic gradient descent (SGD), Adam, RMSprop, etc. These optimizers update model parameters to minimize a defined loss function.

Data loading and preprocessing: PyTorch provides tools for loading and processing data. You can use the torchvision library to load common image datasets and apply various data augmentation techniques to the training dataset.

Distributed training: PyTorch supports distributed training and can run in parallel on multiple GPUs or multiple machines. This helps speed up training and handle larger datasets and models.

Overall, PyTorch provides a flexible and easy-to-use deep learning framework that makes building, training, and deploying deep learning models simple and efficient. It is widely used in academia and industry, and is loved and supported by a large number of developers.

import torch

print("pytorch版本:", torch.__version__)

The output is:

PyTorch version: 2.1.0+cpu

Tensor operations

Data type: In PyTorch, Tensor can be of various data types, such as float, double, int, long, etc. These types correspond to different precisions and ranges.
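
For illustration, the dtype can be specified when a tensor is created and converted afterwards (a small sketch):

import torch

# Specify the data type at creation time
float_tensor = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float32)
long_tensor = torch.tensor([1, 2, 3], dtype=torch.int64)
print(float_tensor.dtype)  # torch.float32
print(long_tensor.dtype)   # torch.int64

# Convert between data types
converted = long_tensor.float()              # int64 -> float32
converted2 = float_tensor.to(torch.float64)  # float32 -> float64
print(converted.dtype, converted2.dtype)     # torch.float32 torch.float64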

Tensor operations: PyTorch provides a large number of tensor operation functions that can perform various mathematical operations and linear algebra operations. These functions include addition, subtraction, multiplication, division, matrix multiplication, vector dot product, transpose, inverse matrix, and more.

Automatic differentiation: A tensor can enable automatic differentiation by setting requires_grad=True. This allows the user to easily define and compute gradients for use in the backpropagation algorithm.

Tensor shape: The shape of a Tensor refers to the size of its various dimensions. The shape of a Tensor can be obtained through the shape attribute. Shape information is very important for building and training deep learning models.

Broadcast mechanism: When performing operations on two tensors whose shapes are not exactly the same, PyTorch automatically applies broadcasting, provided the shapes are compatible. This means the smaller tensor is automatically expanded to match the shape of the larger tensor.
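
For example (a small sketch), adding a 1-D tensor to a 2-D tensor broadcasts the smaller tensor across the rows:

import torch

matrix = torch.tensor([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = torch.tensor([10, 20, 30])               # shape (3,)

# The 1-D tensor is broadcast to shape (2, 3) before the addition
result = matrix + row
print(result)
# tensor([[11, 22, 33],
#         [14, 25, 36]])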

Indexing and slicing: Elements in a Tensor can be accessed using integer indexing and slicing operations. In addition, you can use features such as Boolean indexing and advanced indexing to perform more complex Tensor operations.

GPU acceleration: On GPUs that support CUDA, PyTorch can use the GPU to accelerate Tensor operations to achieve faster computing speeds.
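
A minimal sketch of moving tensors to the GPU when one is available:

import torch

# Choose the GPU if CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(5, 3).to(device)  # move the tensor to the chosen device
y = torch.randn(5, 3).to(device)

z = x + y        # the operation runs on the chosen device
print(z.device)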

Create tensor

Use the torch.zeros() or torch.ones() function to create an all-zeros or all-ones tensor:

zeros_tensor = torch.zeros(5, 3)
ones_tensor = torch.ones(5, 3)

Use the torch.randn() function to create a random number tensor conforming to the standard normal distribution:

random_tensor = torch.randn(5, 3)

Create a tensor using torch.tensor():

import torch
import numpy as np

# Create a tensor from a Python scalar
scalar_tensor = torch.tensor(3.14)
print(scalar_tensor)   # Output: tensor(3.1400)

# Create a tensor from a Python list
list_tensor = torch.tensor([1, 2, 3])
print(list_tensor)   # Output: tensor([1, 2, 3])

# Create a tensor from a NumPy array
numpy_array = np.array([4, 5, 6])
numpy_tensor = torch.tensor(numpy_array)
print(numpy_tensor)   # Output: tensor([4, 5, 6], dtype=torch.int32)

# Create an empty tensor
empty_tensor = torch.tensor([])
print(empty_tensor)   # Output: tensor([])

# Create a tensor from another tensor
x = torch.tensor([1, 2, 3])
y = torch.tensor(x)
print(y)   # Output: tensor([1, 2, 3])
# x and y are completely independent tensor objects; they occupy different locations in memory.
x[0] = 100
print(x)   # tensor([100,   2,   3])
print(y)   # tensor([1, 2, 3])

Note that tensors created with any of the above methods can be moved to the GPU by calling .to(device) (if your system supports GPU acceleration).

Tensor copy

torch.tensor() and tensor.clone() copy the data, so the new tensor no longer shares memory with the original. tensor.detach(), by contrast, returns a new tensor that still shares memory with the original data.

.clone() and .detach() are both used to create new tensors in PyTorch, but they have some differences in functionality and usage:

1. .clone() method:

Function: The .clone() method creates a new tensor with the same values, shape, and gradient information as the original tensor.

Usage: Usually used when gradient information needs to be preserved in the calculation graph, such as backpropagation and optimization processes.

Sample code:

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x.clone()

print(x)  # Output: tensor([1., 2., 3.], requires_grad=True)
print(y)  # Output: tensor([1., 2., 3.], grad_fn=<CloneBackward0>)

In this example, y is created from the original tensor x via the .clone() method, so it retains gradient information (note that requires_grad can only be set on floating-point tensors, which is why the example uses floats).

  2. .detach() method:

Function: The .detach() method creates a new tensor with the same values ​​and shape as the original tensor, but does not retain any gradient information.

For tensors of type integer, the requires_grad parameter cannot be used. If you need automatic differentiation on a tensor of type integer, convert it to a tensor of type floating point first.

Usage: Typically used when the gradient information of a tensor needs to be separated from the computational graph so that calculations can be performed without the need for gradients.

Sample code:

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x.detach()

print(x)  # Output: tensor([1., 2., 3.], requires_grad=True)
print(y)  # Output: tensor([1., 2., 3.])

In this example, y is created from the original tensor x via the .detach() method, so it does not retain gradient information.

To summarize the differences:

  • .clone() creates a new tensor and preserves the gradient information of the original tensor.
  • .detach() creates a new tensor without retaining the gradient information of the original tensor.
    So if you want the new tensor to share memory with the original data, it is recommended to use torch.from_numpy() or tensor.detach(); both share memory with the source.

tensor dimensions

In PyTorch, the size() method returns a torch.Size object describing the size of each dimension of the tensor. torch.Size is a subclass of tuple, so it supports all tuple operations.
Both x.size()[0] and x.size(0) return the size of the first dimension of the tensor, i.e. the number of rows.
Both x.size()[1] and x.size(1) return the size of the second dimension of the tensor, i.e. the number of columns.

x = torch.rand(5, 3)
print(x.size())    # view the shape of x
print(x.size()[0]) # number of rows
print(x.size(0))   # number of rows
print(x.size()[1]) # number of columns
print(x.size(1))   # number of columns

The output is:

torch.Size([5, 3])
5
5
3
3

tensor addition

In PyTorch, you can use the torch.add() function or the tensor's add() method to perform tensor addition. The functions of the two are similar, but there are differences in how they are used.

  1. Using the torch.add() function

The syntax of the torch.add() function is as follows:

torch.add(input, other, alpha=1, out=None)

Here input is the first input tensor, other is the tensor (or number) to be added to input, alpha is a scaling factor applied to other, and out is the output tensor (None by default).

The sample code is as follows:

import torch

# Create two tensors
tensor1 = torch.tensor([[1, 2], [3, 4]])
tensor2 = torch.tensor([[5, 6], [7, 8]])

# Tensor addition with torch.add()
result = torch.add(tensor1, tensor2)
print(result)

The output is:

tensor([[ 6,  8],
        [10, 12]])

In this example, two 2x2 tensors tensor1 and tensor2 are created, and then the torch.add() function is used to add them; the result is stored in the result tensor.

  2. Using the tensor's add() method

The syntax of the tensor's add() method is as follows:

add(other, *, alpha=1) -> Tensor

Here other is the tensor (or number) to be added to the current tensor, and alpha is a scaling factor applied to other.

The sample code is as follows:

import torch

# Create two tensors
tensor1 = torch.tensor([[1, 2], [3, 4]])
tensor2 = torch.tensor([[5, 6], [7, 8]])

# Tensor addition with the tensor's add() method
result = tensor1.add(tensor2)
print(result)

The result of running is the same as using the torch.add() function:

tensor([[ 6,  8],
        [10, 12]])

In this example, two 2x2 tensors tensor1 and tensor2 are created first, and then tensor1.add(tensor2) performs the addition; the result is stored in the result tensor.

It should be noted that the add() method does not modify the current tensor, but returns a new tensor. If you need to modify the current tensor in place, you can use the add_() method. For example:

import torch

# Create a tensor
tensor = torch.tensor([[1, 2], [3, 4]])

# In-place modification
tensor.add_(2)

print(tensor)

The output is:

tensor([[3, 4],
        [5, 6]])

Functions with an underscore (_) after the function name

In PyTorch, functions whose names end with an underscore (_) usually denote in-place operations. These functions modify the tensor itself directly rather than creating a new tensor.

In-place operations can save memory space, which is especially useful when processing large-scale data. However, it is important to note that these in-place operations will change the original tensor, so care needs to be taken when using them to ensure that the original data is not accidentally modified.

Here are some examples of common in-place operations:

import torch

# Create a tensor
tensor = torch.tensor([1, 2, 3])

# In-place addition
tensor.add_(1)
print(tensor)  # Output: tensor([2, 3, 4])

# In-place multiplication
tensor.mul_(2)
print(tensor)  # Output: tensor([4, 6, 8])

# In-place negation
tensor.neg_()
print(tensor)  # Output: tensor([-4, -6, -8])

It should be noted that the naming convention for in-place operations is a function name ending with an underscore, such as add_(), mul_() and neg_(). After performing the operation, these functions modify the original tensor directly rather than returning a new tensor.

When using in-place operations, please note the following:

  • The value of the original tensor will be modified, which may have an impact on subsequent calculations.
  • An in-place operation does not create a new tensor; it returns the same tensor object, so an assignment such as tensor = tensor.add_(1) does not create a copy, it simply rebinds the same object.
  • In-place operations cannot be applied to every tensor. In particular, calling an in-place operation on a leaf tensor that requires gradients (created with requires_grad=True) raises an error, because autograd needs the original values for gradient computation (see the sketch below).
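
A small sketch illustrating this restriction:

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

try:
    x.add_(1)  # in-place operation on a leaf tensor that requires grad
except RuntimeError as e:
    print(e)   # "a leaf Variable that requires grad is being used in an in-place operation"

# Wrapping the operation in torch.no_grad() avoids the error,
# at the cost of the operation not being tracked by autograd
with torch.no_grad():
    x.add_(1)
print(x)  # tensor([2., 3., 4.], requires_grad=True)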

Indexing and slicing

In PyTorch, you can use indexing and slicing operations to select specific elements, subsets, or slices of a dimension in a tensor. These operations allow you to obtain part of the tensor data for further processing or analysis as needed.

Here are some examples of common tensor selection operations:

  1. Select a single element using an index
import torch

# Create a tensor
tensor = torch.tensor([[1, 2, 3], [4, 5, 6]])

# Select a single element using an index
element = tensor[0, 1]
print(element)  # Output: tensor(2)

In this example, we create a 2x3 tensor tensor, and then use the index [0, 1] to select the element in the first row and second column, which is 2.

  2. Select a subset using slicing
import torch

# Create a tensor
tensor = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Select a subset using slicing
# Select rows with index 1 to 2 (the 2nd and 3rd rows) and columns with index 0 to 1 (the 1st and 2nd columns) of tensor.
subset = tensor[1:3, 0:2]
print(subset)

The output is:

tensor([[4, 5],
        [7, 8]])

In this example, we create a 3x3 tensor tensor, and then use a slicing operation to select the first two columns of the second and third rows, i.e. the subset [[4, 5], [7, 8]].

  3. Use a boolean index to select elements that satisfy a condition
import torch

# Create a tensor
tensor = torch.tensor([1, 2, 3, 4, 5, 6])

# Use a boolean index to select elements that satisfy a condition
selected_elements = tensor[tensor > 3]
print(selected_elements)

The output is:

tensor([4, 5, 6])

In this example, we create a tensor of integers tensor and then use the boolean index tensor > 3 to select the elements greater than 3, i.e. the elements [4, 5, 6] that satisfy the condition.

In addition to the above examples, you can also use other advanced selection operations, such as using the torch.where() function, using the torch.masked_select() function, etc. These operations enable more complex data selection and filtering based on specific needs.
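
For instance, a small sketch of torch.where() and torch.masked_select() on the same tensor as above:

import torch

tensor = torch.tensor([1, 2, 3, 4, 5, 6])

# torch.where: choose element-wise between two sources based on a condition
result = torch.where(tensor > 3, tensor, torch.zeros_like(tensor))
print(result)  # tensor([0, 0, 0, 4, 5, 6])

# torch.masked_select: gather the elements where the mask is True into a 1-D tensor
mask = tensor > 3
selected = torch.masked_select(tensor, mask)
print(selected)  # tensor([4, 5, 6])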

Conversion between Tensor and Numpy arrays

The .numpy() method converts a tensor into a NumPy array so that operations supported by NumPy can be performed on it.
The torch.from_numpy() function converts a NumPy array back into a PyTorch tensor.

Here is an example:

import torch
import numpy as np

# Create a PyTorch tensor
tensor = torch.tensor([1, 2, 3, 4, 5])

# Convert the PyTorch tensor to a NumPy array
array = tensor.numpy()
print(array)  # Output: [1 2 3 4 5]

# Operate on the NumPy array
array = np.square(array)
print(array)  # Output: [ 1  4  9 16 25]

# Convert the NumPy array back to a PyTorch tensor
tensor = torch.from_numpy(array)
print(tensor)  # Output: tensor([ 1,  4,  9, 16, 25])
  • Created a PyTorch tensor tensor.
  • Converted it to a NumPy array array using the .numpy() method.
  • Performed an operation on the NumPy array (squared each element) and stored the result back in array.
  • Converted the NumPy array array back to a PyTorch tensor tensor using torch.from_numpy().

It is worth noting that converting tensors to NumPy arrays for processing means giving up PyTorch features such as GPU acceleration and autograd, and converting back and forth can add overhead. Therefore, weigh performance and functionality needs before and after conversion.

Tensor and numpy objects share memory

# Change a value in the NumPy array
array[0] = 100
print(array)  # Output: [100   4   9  16  25]

# Check the value of the PyTorch tensor
print(tensor)  # Output: tensor([100,   4,   9,  16,  25])

If you need to avoid problems caused by shared memory, you can use the .clone() method to create a copy to ensure that there is no shared memory between the two objects.
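
A small sketch of the difference (assuming a NumPy array as the source):

import torch
import numpy as np

array = np.array([1, 2, 3])

shared_tensor = torch.from_numpy(array)               # shares memory with array
independent_tensor = torch.from_numpy(array).clone()  # independent copy

array[0] = 100
print(shared_tensor)       # the first element becomes 100 along with the array
print(independent_tensor)  # still tensor([1, 2, 3]), unaffected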

Tensor and scalar

In PyTorch, a scalar is a zero-dimensional tensor, that is, a tensor without any axes. You can also use Python's scalar data types directly, such as integers or floating-point numbers.
Here are several examples of creating scalars:

import torch

# Create a zero-dimensional tensor
scalar_tensor = torch.tensor(5)
print(scalar_tensor)  # Output: tensor(5)
print(scalar_tensor.shape)  # Output: torch.Size([])

# Create a one-dimensional tensor
vector_tensor = torch.tensor([5])
print(vector_tensor)  # Output: tensor([5])
print(vector_tensor.shape)  # Output: torch.Size([1])

# Use Python's scalar data type directly
scalar_value = 5
print(scalar_value)  # Output: 5
print(type(scalar_value))  # Output: <class 'int'>

tensor.item() and tensor[idx]

tensor.item() returns the value of a single-element tensor as a Python number.
tensor[idx] still returns a tensor: a 0-dim tensor, generally called a scalar.

import torch

scalar_tensor = torch.tensor([5])
print(scalar_tensor)  # Output: tensor([5])
print(scalar_tensor.shape)  # Output: torch.Size([1])

tensor_value = scalar_tensor[0]
print(tensor_value)  # Output: tensor(5)

scalar_value = scalar_tensor.item()
print(scalar_value)  # Output: 5

Automatic differentiation (Autograd)

Autograd (Automatic differentiation) is an automatic differentiation engine in PyTorch, used to calculate and track derivatives on tensors. Autograd makes it relatively simple to perform backpropagation and calculate gradients in neural network training.

In PyTorch, each tensor has a requires_grad attribute that specifies whether gradient calculations are to be performed on that tensor. When requires_grad=True is set, PyTorch tracks all operations on the tensor and builds a computational graph. This computational graph can be used to automatically calculate backpropagation and calculate gradients.

When the forward pass is complete, gradients can be computed automatically by calling the backward() method. Leaf tensors with requires_grad=True will then have a .grad attribute containing the gradient of the scalar loss with respect to them. These gradients can be used to update the model parameters and optimize the model during training.

A simple example of automatic differentiation using PyTorch:

import torch

# Create tensors and set requires_grad=True
x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(4.0, requires_grad=True)

# Perform some operations
z = x ** 2 + y ** 3

# Compute the gradients of z with respect to x and y
z.backward()

# Print the gradient values
print(x.grad)  # Output: tensor(6.)
print(y.grad)  # Output: tensor(48.)

This creates two tensors x and y with the requires_grad attribute set to True to enable automatic differentiation.

It then defines the operation z = x ** 2 + y ** 3 and calls the backward() method to compute the gradients.

Finally, the gradients of z with respect to x and y are read through the grad attribute.

It should be noted that calling backward() without arguments only works for scalar tensors. To differentiate a vector- or matrix-valued output with respect to a tensor, a gradient argument must be passed to backward() (a vector-Jacobian product). Additionally, to avoid the memory cost of building the computational graph when gradients are not needed, gradient tracking can be disabled using the torch.no_grad() context manager, as shown below.
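
A small sketch of both points: disabling gradient tracking with torch.no_grad(), and calling backward() on a non-scalar output by supplying a gradient argument:

import torch

x = torch.tensor(3.0, requires_grad=True)

# Inside torch.no_grad(), no computational graph is built
with torch.no_grad():
    y = x * 2
print(y.requires_grad)  # False

# backward() without arguments only works on scalar outputs;
# for a non-scalar output, pass a gradient tensor of the same shape
v = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = v * 2
out.backward(torch.ones_like(out))
print(v.grad)  # tensor([2., 2., 2.])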

.grad attribute

The .grad attribute is used in PyTorch to access the gradient stored on a tensor after backward() has been called.

import torch

# Create tensors and set requires_grad=True
a = torch.tensor(5.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = torch.tensor(2.0, requires_grad=True)
u = b*c
v = a+u
j = 3*v

# Run backpropagation to compute the gradients
j.backward()

# Print the gradient values
print(a.grad)  # Output: tensor(3.)
print(b.grad)  # Output: tensor(6.)
print(c.grad)  # Output: tensor(9.)
  • Creates the tensors a, b, and c and sets the requires_grad attribute to True for automatic differentiation.
  • Defines some computations and performs backpropagation to compute the gradients by calling the .backward() method on j.
  • Reads the gradients of j with respect to a, b, and c through the .grad attribute.

.grad_fn attribute

In PyTorch, each tensor has a grad_fn attribute that records the operation (or function) that created the tensor, which is the function that will be called during backpropagation. This attribute is used to construct the computational graph and perform automatic differentiation.

Here is a simple example showing how to use the grad_fn attribute:

import torch

# Create a tensor and perform some operations
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3
z = y ** 2

# Print the grad_fn attribute of each tensor
print(x.grad_fn)  # Output: None
print(y.grad_fn)  # Output: <AddBackward0 object at 0x7f3e8bd82b10>
print(z.grad_fn)  # Output: <PowBackward0 object at 0x7f3e8bd82b10>

In this example, we create a tensor x and set the requires_grad attribute to True to enable automatic differentiation. We then define some operations and print the grad_fn attribute of each tensor.

Note that a tensor only has a grad_fn if it was created by an operation on tensors that require gradients. For manually created tensors, such as those created directly via torch.tensor(), the grad_fn attribute is None.

Model building

PyTorch provides a flexible way to build models. You can define your own model by inheriting the torch.nn.Module class and specify the forward propagation operation in the forward() method. Such a model can be composed of multiple layers, and each layer can be a fully connected layer, a convolutional layer, a recurrent neural network, etc.

1. Import the PyTorch library and related packages

import torch
import torch.nn as nn
import torch.optim as optim

2. Define the model class (inheriting from nn.Module)

class MyModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(MyModel, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)
        
    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

In this example, we define a class named MyModel that inherits from nn.Module. The model has two fully connected layers and a ReLU activation function. The __init__ method defines the model's layers (their weights and biases are created by the nn.Linear modules), while the forward method defines the model's forward pass.

This code defines a simple neural network model MyModel, which has three layers: an input layer, a hidden layer and an output layer.

In the __init__ method, the structure of the model is defined as follows:

  • self.fc1 = nn.Linear(input_size, hidden_size): This line creates a linear layer (fully connected layer), converting the input feature dimension input_size into the hidden layer feature dimension hidden_size.
  • self.relu = nn.ReLU(): This line creates a ReLU activation function object that is used to introduce nonlinear transformations behind the hidden layer.
  • self.fc2 = nn.Linear(hidden_size, output_size): This line creates a second linear layer that converts the hidden layer features of dimension hidden_size into output features of dimension output_size.

In the forward method, the forward propagation process of the model is defined as follows:

  • out = self.fc1(x): First, the input x is linearly transformed through the first linear layer self.fc1.
  • out = self.relu(out): Then, nonlinear transformation is performed through the ReLU activation function self.relu.
  • out = self.fc2(out): Finally, the transformed features are linearly transformed again through the second linear layer self.fc2 to obtain the final output.

The structure of the entire model can be summarized as:

Input layer (input_size) -> Hidden layer (hidden_size) -> ReLU activation function -> Output layer (output_size)

This model can be used for various tasks such as classification and regression. During training, we define a loss function and an optimization algorithm to minimize the gap between the model's predictions and the true values, which is how the model is trained.
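
As an illustration, the model defined above could be instantiated and given a batch of inputs like this (the sizes here are hypothetical):

import torch

model = MyModel(input_size=10, hidden_size=32, output_size=2)  # hypothetical sizes

batch = torch.randn(4, 10)   # a batch of 4 samples with 10 features each
output = model(batch)
print(output.shape)          # torch.Size([4, 2])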

loss function

The torch.nn module in PyTorch contains many commonly used loss functions, such as mean squared error loss (MSELoss) and cross-entropy loss (CrossEntropyLoss). These loss functions measure the gap between the model's predictions and the true values and are an important component of deep learning.

Here are some common loss functions and examples of their usage:

  • Mean square error loss (MSELoss):

    criterion = nn.MSELoss()
    loss = criterion(output, target)
    

    In this example, output is the output of the model, target is the true value, criterion is an MSELoss loss function object, and loss is the mean squared error between the model's predictions and the true values.

  • Cross-entropy loss (CrossEntropyLoss):

    criterion = nn.CrossEntropyLoss()
    loss = criterion(output, target)
    

    In this example, output is the output of the model, target is the ground-truth label, criterion is a CrossEntropyLoss loss function object, and loss is the cross-entropy loss between the model's predictions and the true labels.

  • KL divergence loss (KLDivLoss):

    criterion = nn.KLDivLoss()
    loss = criterion(output.log(), target)
    

    In this example, output is the output of the model, target is the target distribution, criterion is a KLDivLoss loss function object, and loss is the KL divergence loss. Note that KLDivLoss expects its input in log space, which is why output.log() is passed.

The use of loss functions usually includes the following steps:

  1. Define the loss function: Choose an appropriate loss function and instantiate it.
  2. In each training iteration, the difference between the model output value and the true value is calculated by calling the loss function, and the corresponding loss value is obtained.
  3. Call the backpropagation algorithm backward() to calculate the gradient of the loss with respect to the model parameters.

Below is a complete sample code that demonstrates how to use MSELoss for model training:

import torch
import torch.nn as nn

# Define the model
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.fc = nn.Linear(10, 1)
        
    def forward(self, x):
        out = self.fc(x)
        return out

# Instantiate the model and the loss function
model = MyModel()
criterion = nn.MSELoss()

# Simulated data
inputs = torch.randn(100, 10)
labels = torch.randn(100, 1)

# Train the model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
for epoch in range(10):
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    print('Epoch:', epoch+1, 'Loss:', loss.item())

The output is:

Epoch: 1 Loss: 1.7458306550979614
Epoch: 2 Loss: 1.722163438796997
Epoch: 3 Loss: 1.6993693113327026
Epoch: 4 Loss: 1.6774134635925293
Epoch: 5 Loss: 1.6562625169754028
Epoch: 6 Loss: 1.6358848810195923
Epoch: 7 Loss: 1.6162495613098145
Epoch: 8 Loss: 1.5973275899887085
Epoch: 9 Loss: 1.579091191291809
Epoch: 10 Loss: 1.561513066291809

This code is an example of model training using PyTorch.

  • Defines a simple linear model MyModel, which contains a single linear layer self.fc = nn.Linear(10, 1). This model maps an input tensor of size 10 to an output tensor of size 1.
  • Instantiates this model and a mean squared error loss function, criterion = nn.MSELoss().
  • Generates an input tensor inputs of size (100, 10) and a label tensor labels of size (100, 1) to simulate training data.
  • Uses the stochastic gradient descent optimizer SGD, i.e. optimizer = torch.optim.SGD(model.parameters(), lr=0.01), with the learning rate set to 0.01.
  • Runs 10 training iterations. In each iteration:
    • Calls optimizer.zero_grad() to clear the previous gradients.
    • Passes the input tensor inputs through the model model to obtain the output tensor outputs.
    • Uses the loss function criterion to compute the mean squared error between the predictions outputs and the true values labels, and saves it in the variable loss.
    • Calls loss.backward() to run backpropagation and compute the gradients.
    • Calls optimizer.step() to update the model parameters, adjusting them in a better direction.
  • In each training iteration, prints the current epoch number and the loss value loss.item().

optimizer

PyTorch provides a variety of optimizers for updating the parameters of a deep learning model when training it. Here are a few common PyTorch optimizers:

1. SGD (Stochastic Gradient Descent)

optimizer = optim.SGD(model.parameters(), lr=learning_rate)

This is one of the most basic optimizers, using stochastic gradient descent to update the parameters of the model. model.parameters() returns the learnable parameters in the model, lr is the learning rate.

2. Adam (adaptive moment estimation)

optimizer = optim.Adam(model.parameters(), lr=learning_rate)

Adam is an adaptive learning rate optimization algorithm that combines momentum methods and RMSprop. It can adaptively adjust the learning rate based on the gradient of each parameter.

3. Adagrad

optimizer = optim.Adagrad(model.parameters(), lr=learning_rate)

Adagrad is an adaptive learning rate optimization algorithm that maintains a learning rate for each parameter and reduces the learning rate for each parameter as training proceeds.

4. RMSprop (root mean square propagation)

optimizer = optim.RMSprop(model.parameters(), lr=learning_rate)

RMSprop is an adaptive learning rate optimization algorithm that uses the root mean square of the gradient to adjust the learning rate.

5. Adadelta

optimizer = optim.Adadelta(model.parameters(), lr=learning_rate)

Adadelta is an adaptive learning rate optimization algorithm that adjusts the learning rate based on past gradients and the magnitude of parameter updates.

These optimizers update the model's parameters when optimizer.step() is called. In each training iteration, optimizer.zero_grad() is usually called first to clear the previous gradients, followed by the forward pass, the loss computation, backpropagation to compute the gradients, and finally the parameter update.

Choosing the appropriate optimizer depends on the specific task and data set. Generally speaking, the Adam optimizer performs well in many cases, but on some problems you may want to try a different optimizer to get better performance. In addition, hyperparameters such as learning rate and momentum can be adjusted to further improve the performance of the optimizer.
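
For example (illustrative hyperparameter values, assuming a model as defined earlier), momentum can be passed to SGD, and the learning rate or betas can be changed for Adam:

import torch.optim as optim

# SGD with momentum
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Adam with a smaller learning rate and explicit betas
optimizer = optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))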

Data loading and preprocessing

In PyTorch, data loading and preprocessing are usually done through custom dataset classes and data transformation functions. The following are general data loading and preprocessing steps:

  1. Create a custom dataset class: First, you need to create a custom dataset class, inherited from torch.utils.data.Dataset. In this class, you need to implement the __len__ method that returns the size of the dataset, and the __getitem__ method that returns a single sample based on a given index.
  2. Data conversion: Data conversion is the process of preprocessing original data. You can use the conversion functions provided by the torchvision.transforms module, such as Compose, ToTensor, Normalize, etc. These functions can convert raw data into tensors and perform normalization, cropping, scaling, etc.
  3. Create a dataset instance: Create a dataset instance using the above custom dataset class and conversion function. Input the path, label and other information of the data set, and apply the corresponding data transformation as needed.
  4. Create a data loader: The data loader is responsible for dividing the data set into small batches for training. You can use the torch.utils.data.DataLoader class to create a data loader, passing in the dataset instance and some parameters, such as batch size, whether to randomly shuffle the data, etc.

Here is a sample code that demonstrates how to load and preprocess data:

import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

# Custom dataset class
class MyDataset(Dataset):
    def __init__(self, data, labels, transform=None):
        self.data = data
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        sample = self.data[index]
        label = self.labels[index]

        if self.transform:
            sample = self.transform(sample)

        return sample, label

# Data transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5])
])

# Create a dataset instance
data = [...]  # raw data
labels = [...]  # labels
dataset = MyDataset(data, labels, transform=transform)

# Create the data loader
batch_size = 64
shuffle = True
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)

In the above example, we create a custom dataset class MyDataset and pass in the raw data, labels, and the data transformation transform. We then create a data loader dataloader using DataLoader, setting the batch size to 64 and specifying whether to shuffle the data randomly.
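
Once the data loader exists, it can be iterated over batch by batch in a training loop, for example (a sketch that assumes a model, criterion, and optimizer like those defined earlier):

num_epochs = 10  # hypothetical number of epochs
for epoch in range(num_epochs):
    for batch_samples, batch_labels in dataloader:
        optimizer.zero_grad()                    # clear previous gradients
        outputs = model(batch_samples)           # forward pass on the mini-batch
        loss = criterion(outputs, batch_labels)  # compute the loss
        loss.backward()                          # backpropagation
        optimizer.step()                         # update the parameters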
