Detailed explanation of the PyTorch framework


Introduction

  • What is PyTorch?
    PyTorch is an open source machine learning library for a variety of computationally intensive tasks, from basic linear algebra and optimization problems to complex machine learning (deep learning) applications. Originally developed by Facebook's AI Research Lab (FAIR), it has become a widely used library with a large community and ecosystem.

Main features:

  1. Tensor computing capabilities: PyTorch provides a multi-dimensional array data structure (the tensor) together with a rich library of operations for tensor computation.
  2. Automatic differentiation: Through its Autograd module, PyTorch computes gradients automatically, which is essential for gradient descent and other optimization methods.
  3. Dynamic computation graph: Unlike frameworks that use static computation graphs (such as early versions of TensorFlow), PyTorch constructs the graph at runtime, which enables more flexible model building.
  4. Simple API: PyTorch's API is designed to be intuitive and easy to use, which makes developing and debugging models easier.
  5. Python integration: Because PyTorch is tightly integrated with Python, it works easily with the Python ecosystem, including NumPy, SciPy, and Matplotlib.
  6. Community and ecosystem: Thanks to its flexibility and ease of use, PyTorch is popular with developers and researchers, which has produced an active community and a large number of third-party libraries and tools.
  7. Multi-platform and multi-backend support: PyTorch supports not only CPUs but also GPUs from NVIDIA and AMD, and it offers a production-ready deployment solution, TorchServe.
  8. Rich pre-trained models and toolboxes: Through libraries such as torchvision, torchaudio, and torchtext, PyTorch provides a rich set of pre-trained models and data loading tools.

1. Installation and configuration

1.1 How to install PyTorch

Install using pip on Windows, macOS and Linux

  1. CPU version only:

pip install torch torchvision

  2. Versions with CUDA support:

First, make sure you have a PyTorch-compatible version of CUDA installed. Then, run the following command:

pip install torch torchvision torchaudio -f https://download.pytorch.org/whl/cu102/torch_stable.html

Here cu102 corresponds to CUDA 10.2; modify it to match your installed CUDA version.

Install using Conda on Linux and Windows

  1. CPU version only:

conda install pytorch torchvision cpuonly -c pytorch

  2. Versions with CUDA support:

conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch

Here cudatoolkit=10.2 corresponds to CUDA 10.2; modify it to match your installed CUDA version.

1.2 Verify installation

Once the installation is complete, you can verify that the installation was successful by running the following Python code:

import torch
print(torch.__version__)

If this code runs successfully and prints out the PyTorch version number, then you have successfully installed PyTorch.
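If you installed a CUDA-enabled build, you can additionally check whether PyTorch can see a GPU:

import torch
print(torch.cuda.is_available())  # True if a CUDA-capable GPU is visible to PyTorch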

Since PyTorch is updated frequently, it is best to check its official installation guide for the latest installation commands and compatibility information.

2. Basic concepts

2.1 Tensors

Tensors are the basic unit used to store and manipulate data in PyTorch. Mathematically, a tensor is a multidimensional data container, similar to a generalization of vectors and matrices. In PyTorch, tensors are used to encode input and output data, model parameters, and intermediate data during model training.

2.1.1 Basic characteristics of tensors

  1. Data type (dtype): Tensors can contain various data types, such as integers (int), floating point numbers (float), etc.
  2. Shape: The shape of a tensor consists of the size of each of its axes. For example, a 3x4 matrix has shape (3, 4).
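A minimal example inspecting these two attributes:

import torch

t = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
print(t.dtype)   # torch.float32
print(t.shape)   # torch.Size([2, 3])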

2.1.2 Create tensor

In PyTorch, there are multiple ways to create tensors:

  • Create from Python list
import torch
tensor_from_list = torch.tensor([1, 2, 3])
  • Create using built-in functions
zeros_tensor = torch.zeros(2, 3)
ones_tensor = torch.ones(2, 3)
rand_tensor = torch.rand(2, 3)
  • Create from existing tensor
new_tensor = torch.ones_like(rand_tensor)
  • Create from NumPy array
import numpy as np
numpy_array = np.array([1, 2, 3])
tensor_from_numpy = torch.from_numpy(numpy_array)

2.1.3 Tensor operations

PyTorch provides a large number of tensor operations, including but not limited to:

  • Arithmetic operations: addition, subtraction, multiplication, division, etc.
sum_tensor = torch.add(tensor_from_list, tensor_from_list)
  • Shape operations: change shape, transpose, etc.
reshaped_tensor = tensor_from_list.view(3, 1)
  • Indexing, slicing, and joining: selecting and combining tensors
sliced_tensor = tensor_from_list[:2]
  • Mathematical functions: logarithms, exponentials, square roots, etc.
sqrt_tensor = torch.sqrt(tensor_from_list)
  • Statistical functions: sum, average, maximum, minimum, etc.
max_value = torch.max(tensor_from_list)

2.2 Automatic differentiation (Autograd)

When training a model, you usually need to calculate the derivative or gradient of a function (usually a loss function) with respect to the model parameters. The autograd package in PyTorch provides an automatic differentiation function, which can automatically track and differentiate all operations on tensors.

2.2.1 Basic usage

Create a tensor and set requires_grad=True to track its calculation history:

import torch

# Create a tensor and set requires_grad=True to track its computation history
x = torch.ones(2, 2, requires_grad=True)
print(x)

Perform tensor operations:

y = x + 2
print(y)

Since y was created via an operation, it will have a grad_fn attribute:

print(y.grad_fn)

Perform a few more operations:

z = y * y * 3
out = z.mean()

print(z, out)

2.2.2 Calculating the gradient

Calling the .backward() method backpropagates through the recorded graph and computes the gradients of all tensors with requires_grad=True.

For example:

# For a single scalar output, backward() needs no arguments
out.backward()

# Print d(out)/dx; each entry is 4.5, since out = mean(3 * (x + 2)^2) and d(out)/dx = 1.5 * (x + 2) = 4.5 at x = 1
print(x.grad)

2.2.3 Stop tracking history

In some cases, it may not be desirable to track the computation history of a tensor. This can be done with the .detach() method, or by wrapping code that does not need to be tracked in a with torch.no_grad(): block.

print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

2.2.4 Custom gradient function

You can define your own automatic differentiation operation by subclassing torch.autograd.Function and implementing its forward and backward methods.

2.2.5 Other precautions

  • If requires_grad=True but you don't want to calculate the gradient for certain steps, you can use .detach() or with torch.no_grad(): to stop automatic differentiation.
  • Control-flow operations (such as Python conditional statements) are not themselves differentiated; only the tensor operations that actually execute are recorded in the graph, so such statements do not break backpropagation.

Automatic differentiation greatly simplifies the gradient calculation process, which is especially useful when optimizing complex models. This is one of the most powerful features of PyTorch as a deep learning framework.

3. Computational graphs and automatic differentiation

3.1 Concept of computational graph

A computational graph is a directed graph that describes how outputs are computed from input data and parameters. Such graphs provide an efficient way to organize numerical computation, especially gradient calculation and backpropagation.

3.1.1 Nodes and edges

  • Nodes: Nodes in a computational graph usually represent data (usually tensors) or operations (such as addition, multiplication, etc.).
  • Edges: Edges represent dependencies between data. An edge from node A to node B means that the calculation of B depends on the value of A.

3.1.2 Forward propagation and back propagation

  1. Forward Propagation: In the forward propagation stage, the calculation starts from the input node, passes through the intermediate nodes along the edge, and finally reaches the output node. This process actually performs various mathematical operations defined in the diagram.
  2. Backpropagation: In the backpropagation stage, the gradient (or derivative) is calculated starting from the output node and then propagated back to the input node along the opposite path of forward propagation. This process is used to update model parameters to optimize a certain objective function.

3.1.3 Dynamic and static calculation graphs

  1. Static Computational Graphs: In this mode, the computational graph is defined before running and remains unchanged throughout the training process. Early versions of TensorFlow primarily used static computation graphs.
  2. Dynamic Computational Graphs: In dynamic mode, the computational graph is constructed at runtime, which means each forward pass creates a new graph. PyTorch uses dynamic computation graphs, which makes it more flexible and easier to debug, though sometimes at a cost in performance.

3.1.4 Computational graphs and automatic differentiation

In PyTorch, when you perform an operation, PyTorch dynamically builds a computational graph. Every tensor is a node in the graph, and every operation (function) is an edge. When the .backward() method is called, PyTorch will backpropagate along this graph starting from the current tensor (usually the loss) to calculate the gradient.

3.1.5 Advantages and limitations

  • Advantages: Computational graphs provide an efficient way to organize and perform calculations, especially when it comes to gradient calculations and optimization.
  • Limitations: Building and storing computational graphs takes up additional memory and computing resources, especially for very complex models and large-scale data.

3.2 How to use Autograd

Autograd is an automatic differentiation library provided by PyTorch, which is used to automatically calculate the gradient of neural networks (or other differentiable functions). It is the core component used for backpropagation when training neural networks. Here are some basic examples and explanations of how to use Autograd.

3.2.1 Tensor and Gradient Tracking

First, create a tensor and set requires_grad=True to track its computation history.

import torch

# Create a tensor and track its computation history
x = torch.ones(2, 2, requires_grad=True)
print(x)

3.2.2 Perform tensor operations

Perform some tensor operations. These operations will be dynamically recorded by Autograd and used to build the calculation graph.

y = x + 2
z = y * y * 3
out = z.mean()
print(y)
print(z)
print(out)

3.2.3 Backpropagation

Call .backward() to perform backpropagation. This computes the gradient of out (our output tensor) with respect to all tensors that have requires_grad=True (only x in this example).

out.backward()

Look at the gradient d(out)/dx.

print(x.grad)

3.2.4 Stop gradient tracking

There are several ways to stop gradient tracking:

  • Using .detach(): creates a new tensor that shares the same data but does not require gradients.
x_detached = x.detach()
  • Use torch.no_grad(): All operations performed in this context manager will not track gradients.
with torch.no_grad():
    y = x + 2

3.2.5 Calculating more complex gradients

By default, gradients are accumulated into the .grad attribute. This is very useful for recurrent or recursive networks. If you don't want to accumulate the gradient, you can manually reset it to zero before each iteration.

x.grad.zero_()
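A small self-contained sketch of this behavior: calling backward() a second time without zeroing adds the new gradient onto the old one.

import torch

w = torch.ones(3, requires_grad=True)

loss = (w * 2).sum()
loss.backward()
print(w.grad)    # tensor([2., 2., 2.])

loss = (w * 2).sum()
loss.backward()
print(w.grad)    # tensor([4., 4., 4.]) -- the gradients accumulated

w.grad.zero_()   # reset before the next iteration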

3.3 Custom gradient

In PyTorch, if you want to implement a gradient calculation process that is not built into the library, or if you want to modify the gradient calculation of an existing operation, you can do so by defining your own Autograd function. This usually involves inheriting the torch.autograd.Function class and implementing the forward and backward methods.

3.3.1 Example: Custom ReLU function

The following is an example of a custom ReLU (Rectified Linear Unit) activation function.

import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input_tensor):
        # Save the input for use in the backward pass
        ctx.save_for_backward(input_tensor)
        
        # Implement the ReLU operation
        output_tensor = input_tensor.clamp(min=0)
        return output_tensor

    @staticmethod
    def backward(ctx, grad_output):
        # Retrieve the tensors saved during the forward pass
        input_tensor, = ctx.saved_tensors

        # Compute the gradient of ReLU
        grad_input = grad_output.clone()
        grad_input[input_tensor < 0] = 0

        return grad_input

In this example, the forward method implements the forward propagation of ReLU and saves the input tensor through the ctx.save_for_backward method for use in backward propagation.

The backward method is responsible for calculating the gradient. It accepts a parameter grad_output , which is the gradient of the loss function with respect to the forward propagation output, and then returns the gradient of the loss function with respect to the forward propagation input.

3.3.2 Using custom Autograd functions

Once you define your Autograd function, you can use it just like the built-in PyTorch operations.

# Use the custom ReLU
relu = MyReLU.apply

# Create a tensor and apply the custom ReLU
x = torch.randn(5, 5, requires_grad=True)
y = relu(x)

# Compute gradients
z = y.sum()
z.backward()

# Inspect the gradient
print(x.grad)

This approach can be used for more complex operations and gradient calculations, and can even be used to implement your own optimization algorithms.

Note that the use of custom Autograd functions is generally recommended only for advanced users, as it requires a deep understanding of backpropagation and gradient calculations. For most application scenarios, PyTorch's built-in operations and automatic differentiation capabilities are sufficient.

4. Neural Network Basics

4.1 Module (nn.Module)

In PyTorch, nn.Module is the base class of the neural network module, which is a very important concept for building and managing neural networks.

4.1.1 Basic structure

A custom network layer or an entire neural network should inherit from the nn.Module class and implement its methods. The most commonly implemented methods are __init__ (the constructor) and forward.

  • __init__ method: initializes the network layers and parameters.
  • forward method: defines the forward-propagation computation.

4.1.2 Simple example

Below is a simple linear regression model that contains only one linear layer.

import torch
import torch.nn as nn

class LinearRegressionModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearRegressionModel, self).__init__()
        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        out = self.linear(x)
        return out

4.1.3 Parameter management

nn.Module automatically tracks the trainable parameters you define. For example, the weights and biases of the nn.Linear layer are trainable parameters. You can use the parameters() or named_parameters() method to get all parameters in the model.

model = LinearRegressionModel(1, 1)

# Print the model's parameters
for name, param in model.named_parameters():
    print(name, param.data)

4.1.4 Nested modules

nn.Modules can contain other nn.Modules , allowing you to build nested structures. This is the key to implementing more complex network architectures (such as ResNet, BERT, etc.).

class TwoLayerNN(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(TwoLayerNN, self).__init__()
        self.layer1 = nn.Linear(input_dim, hidden_dim)
        self.layer2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = torch.relu(self.layer1(x))
        x = self.layer2(x)
        return x

4.1.5 Loss function and optimizer

Although nn.Module is mainly used to define the network structure and forward propagation, PyTorch also provides matching loss functions (such as nn.CrossEntropyLoss, nn.MSELoss) and optimizers (such as torch.optim.SGD, torch.optim.Adam).
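A typical pairing looks like this (a minimal sketch, assuming model is an nn.Module instance such as the LinearRegressionModel above):

import torch.nn as nn
import torch.optim as optim

criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)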

4.1.6 GPU support

You can easily move all model parameters and buffers to the GPU by calling the .to(device) method.

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)

4.2 Activation function

Activation functions play a very important role in neural networks. They introduce nonlinear characteristics, allowing neural networks to approximate more complex functions. Without activation functions, even multi-layer neural networks can only represent linear mappings. Here are some commonly used activation functions and how they are used in PyTorch:

4.2.1 ReLU(Rectified Linear Unit)

ReLU is one of the most commonly used activation functions. It is linear in the positive range and outputs zero in the negative range.

ReLU(x) = max(0, x)

Use in PyTorch:

import torch.nn as nn

relu = nn.ReLU()

Or in the model definition:

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(x)
        return x

4.2.2 Sigmoid

The sigmoid function squeezes any input between 0 and 1.
Sigmoid(x) = 1 / (1 + exp(-x))

Use in PyTorch:

sigmoid = nn.Sigmoid()

4.2.3 Tanh(Hyperbolic Tangent)

The Tanh function squeezes any input between -1 and 1.
Tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

Use in PyTorch:

tanh = nn.Tanh()

4.2.4 Leaky ReLU

Leaky ReLU is a variant of ReLU that allows negative numbers to have a small positive gradient.
LeakyReLU(x) = x if x >= 0, otherwise negative_slope * x

Use in PyTorch:

leaky_relu = nn.LeakyReLU(0.01)

4.2.5 Softmax

Softmax is often used in the output layer of classification problems. It converts the input into a probability distribution.
Softmax(x_i) = exp(x_i) / Σ_j exp(x_j)

Use in PyTorch:

softmax = nn.Softmax(dim=1)  # dim specifies which dimension is normalized

4.2.6 Other activation functions

There are many other activation functions, such as Swish, Mish, SELU, etc. Some are already included in PyTorch's torch.nn library, and some require custom implementation.
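For example, Swish can be written in a few lines from existing operations (recent PyTorch versions also ship it as nn.SiLU); this is only an illustrative sketch:

import torch
import torch.nn as nn

class Swish(nn.Module):
    # Swish(x) = x * sigmoid(x)
    def forward(self, x):
        return x * torch.sigmoid(x)

swish = Swish()
y = swish(torch.randn(4))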

4.2.7 Using activation functions

Usually when defining the network structure, the activation function is added as a layer to the nn.Sequential container, or called in the forward method.

import torch.nn as nn

# Using nn.Sequential
model = nn.Sequential(
    nn.Linear(10, 20),
    nn.ReLU(),
    nn.Linear(20, 1)
)

# In the forward method
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.layer1 = nn.Linear(10, 20)
        self.layer2 = nn.Linear(20, 1)

    def forward(self, x):
        x = nn.functional.relu(self.layer1(x))
        x = self.layer2(x)
        return x

4.3 nn.Sequential

nn.Sequential is a container module in PyTorch for simply stacking different layers to create a larger model. Using nn.Sequential , you can quickly define and deploy models without defining a complete class. It is particularly suitable for relatively simple forward propagation models.

4.3.1 Basic usage

Here is an example of a simple multilayer perceptron (MLP) model built using nn.Sequential :

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 128),  # linear transform from the input layer to the hidden layer: 64 -> 128
    nn.ReLU(),           # activation function used in the hidden layer
    nn.Linear(128, 10)   # linear transform from the hidden layer to the output layer: 128 -> 10
)

# Randomly generate an input tensor of shape [batch_size, input_features]
input_tensor = torch.randn(16, 64)

# Run a forward pass through the model
output_tensor = model(input_tensor)

In this example, the model has two linear layers and a ReLU activation layer. Data will be propagated in the order defined by these layers in nn.Sequential .

4.3.2 Nested use

You can also nest other nn.Sequential modules or custom modules in nn.Sequential to make the structure clearer:

block1 = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3),
    nn.ReLU(),
    nn.MaxPool2d(2)
)

block2 = nn.Sequential(
    nn.Conv2d(16, 32, kernel_size=3),
    nn.ReLU(),
    nn.MaxPool2d(2)
)

model = nn.Sequential(
    block1,
    block2,
    nn.Flatten(),
    nn.Linear(32 * 6 * 6, 10)
)

4.3.3 Using OrderedDict

You can also use OrderedDict to name each layer:

from collections import OrderedDict

model = nn.Sequential(OrderedDict([
    ('first_linear', nn.Linear(64, 128)),
    ('first_activation', nn.ReLU()),
    ('second_linear', nn.Linear(128, 10))
]))

This way you can access specific layers by name:

print(model.first_linear)

4.3.4 Precautions

Although nn.Sequential is very convenient, it has limitations. For example, it cannot handle layers with multiple inputs or outputs, or complex connections between layers (such as residual connections). For these more complex cases, you usually need to define a custom model class that inherits from nn.Module .
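As an illustration, here is a minimal sketch of a residual block: the skip connection in forward cannot be expressed by simply stacking layers in nn.Sequential, so a custom nn.Module is needed.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super(ResidualBlock, self).__init__()
        self.body = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim)
        )

    def forward(self, x):
        return x + self.body(x)  # residual (skip) connection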

4.4 Loss function

The loss function (also called the objective or cost function) measures the difference between the model's predictions and the real data. Optimizing the model means minimizing this loss function by adjusting the model parameters.

PyTorch provides a variety of built-in loss functions, which are generally under the torch.nn module. The following are some commonly used loss functions:

4.4.1 Mean Squared Error Loss (MSE)

Used for regression problems.

import torch
import torch.nn as nn

criterion = nn.MSELoss()
prediction = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
loss = criterion(prediction, target)

4.4.2 Cross Entropy Loss

Used for multi-class classification problems.

criterion = nn.CrossEntropyLoss()
prediction = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
loss = criterion(prediction, target)

4.4.3 Binary Cross Entropy Loss (BCE)

Used for binary classification problems.

criterion = nn.BCELoss()
prediction = torch.sigmoid(torch.randn(3, requires_grad=True))  # BCELoss expects probabilities in [0, 1]
target = torch.empty(3).random_(2)
loss = criterion(prediction, target)

4.4.4 Hinge Loss

Used for support vector machine (SVM) related problems.

criterion = nn.HingeEmbeddingLoss()
prediction = torch.randn(3, 5, requires_grad=True)
target = torch.randint(0, 2, (3, 5)).float() * 2 - 1  # targets for HingeEmbeddingLoss should be 1 or -1
loss = criterion(prediction, target)

4.4.5 Custom loss function

It's also easy to define your own loss function. The custom loss function needs to inherit the nn.Module class and implement the forward method.

class MyCustomLoss(nn.Module):
    def __init__(self):
        super(MyCustomLoss, self).__init__()

    def forward(self, prediction, target):
        loss = (prediction - target).abs().mean()
        return loss

criterion = MyCustomLoss()
prediction = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
loss = criterion(prediction, target)

4.4.6 Using loss functions

Once a suitable loss function is selected, it can be used in conjunction with an optimizer (such as SGD, Adam, etc.) to update the model parameters through backpropagation.

import torch.optim as optim

# Create a model and some data
model = nn.Linear(5, 1)
prediction = model(torch.randn(3, 5))
target = torch.randn(3, 1)

# Define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Compute the loss
loss = criterion(prediction, target)

# Backpropagation and parameter update
optimizer.zero_grad()  # clear previous gradients
loss.backward()  # backpropagate
optimizer.step()  # update the parameters

5. Optimization algorithms

5.1 Gradient Descent

Gradient Descent is an optimization algorithm used to minimize a function (usually a loss function). In machine learning and deep learning, gradient descent is the primary method used to update model parameters to reduce the value of the loss function.

5.1.1 Basic principles

Gradient descent iteratively updates the parameters in the direction of the negative gradient of the loss: θ ← θ − η · ∇θ L(θ), where η is the learning rate and L is the loss function.

5.1.2 Variants

  • Batch Gradient Descent: Use all data in each update. This approach works well for small data sets, but not for large data sets.
  • Stochastic Gradient Descent (SGD): Randomly select a sample for gradient calculation and update in each update. This approach works well for large data sets.
  • Mini-batch Gradient Descent: Between the above two, a mini-batch of samples is used for each update.

5.1.3 Using gradient descent in PyTorch

In PyTorch, using gradient descent is generally very simple. PyTorch's torch.optim package provides a variety of optimization algorithms, including various gradient descent variants.

Here is a simple example demonstrating how to use the SGD optimizer:

import torch
import torch.nn as nn
import torch.optim as optim

# Create a simple linear model
model = nn.Linear(2, 1)

# Define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Randomly generate some input data and target data
input_data = torch.randn(10, 2)
target_data = (3 * input_data[:, 0] + 2 * input_data[:, 1]).unsqueeze(1)  # shape (10, 1) to match the model output

# Forward pass
output = model(input_data)

# Compute the loss
loss = criterion(output, target_data)

# Backward pass
optimizer.zero_grad()
loss.backward()

# Update the parameters
optimizer.step()

5.1.4 Precautions

  • Learning Rate: Choosing an appropriate learning rate is very important. A learning rate that is too small will cause the convergence rate to be very slow, while a learning rate that is too large may cause the model to oscillate or diverge near the optimal solution.
  • Local Minima: For non-convex functions, gradient descent may get stuck in a local minimum. In practice, this is usually not a big problem for high-dimensional data.
  • Momentum: To speed up training and avoid getting stuck in local minima, momentum can be used. PyTorch's torch.optim.SGD provides a momentum parameter; a one-line example follows this list.
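For example (a one-line sketch, assuming model is defined as above):

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)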

5.2 Other optimization algorithms (such as Adam, RMSProp, etc.)

In addition to traditional gradient descent (Gradient Descent) and its variants (such as stochastic gradient descent, Stochastic Gradient Descent or SGD), there are a variety of other optimization algorithms that are widely used in the fields of machine learning and deep learning. These algorithms are usually designed to solve certain limitations of gradient descent, such as choosing an appropriate learning rate, getting stuck in local optimal solutions, etc.

5.2.1 Adam(Adaptive Moment Estimation)

Adam is an adaptive learning-rate optimization algorithm that combines ideas from AdaGrad and RMSProp. It maintains estimates of both the first moment (the mean) and the second moment (the uncentered variance) of the gradients, which lets it perform well under a wide range of conditions.

import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=0.001)

5.2.2 RMSProp(Root Mean Square Propagation)

RMSProp is an adaptive learning rate optimization algorithm that adjusts the learning rate of each parameter and is suitable for non-stationary objective functions.

optimizer = optim.RMSprop(model.parameters(), lr=0.01)

5.2.3 Adagrad

Adagrad is an adaptive learning rate optimization algorithm that assigns an independent learning rate to each parameter. While it performs well when dealing with sparse data, it may not be suitable for training deep learning models because the learning rate may become very small prematurely.

optimizer = optim.Adagrad(model.parameters(), lr=0.01)

5.2.4 Adadelta

Adadelta is an extension to Adagrad designed to slow down the learning rate decline of Adagrad.

optimizer = optim.Adadelta(model.parameters(), lr=1.0)

5.2.5 FTRL(Follow The Regularized Leader)

FTRL is an optimization algorithm designed for large-scale online learning scenarios and is often used for large-scale sparse-data problems such as advertising click-through-rate prediction. Note that torch.optim does not include an FTRL optimizer, so using FTRL in PyTorch requires a custom or third-party implementation.

5.2.6 Optimizer selection considerations

  • Learning rate scheduling: Different optimization algorithms may require different learning rates or learning-rate scheduling strategies. PyTorch provides torch.optim.lr_scheduler to adjust the learning rate easily; see the sketch after this list.
  • Hyperparameter tuning: In addition to learning rate, many optimization algorithms have other hyperparameters (such as Adam's beta1, beta2, RMSProp's alpha, etc.).
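A small sketch of learning-rate scheduling with StepLR (assuming model is already defined; the training step itself is elided):

import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)  # multiply the learning rate by 0.1 every 10 epochs

for epoch in range(30):
    # ... one epoch of training ...
    scheduler.step()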

6. Data loading and preprocessing

In PyTorch, the Dataset and DataLoader classes provide an efficient and flexible way to load and preprocess data. They make it easy to batch, shuffle, and load data in parallel.

6.1 Using DataSet

To create a custom Dataset, you need to inherit the torch.utils.data.Dataset class and implement the __len__ and __getitem__ methods.

from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index], self.labels[index]

This way, you can initialize the Dataset with your own data.

6.2 Using DataLoader

The DataLoader class creates a data loader that can wrap any dataset implementing the __len__ and __getitem__ methods. DataLoader provides a variety of useful features such as automatic batching, shuffling, and parallel data loading.

from torch.utils.data import DataLoader

# Create the custom dataset (data and labels are assumed to be defined)
my_dataset = MyDataset(data, labels)

# Create a DataLoader instance
data_loader = DataLoader(my_dataset, batch_size=4, shuffle=True)

# Iterate over batches with the DataLoader
for batch_data, batch_labels in data_loader:
    # train or evaluate the model here
    pass

6.3 Custom data sets

Creating a custom dataset is relatively simple in PyTorch. You only need to inherit the torch.utils.data.Dataset class and implement the __len__ and __getitem__ methods. Here is a basic example of creating a custom dataset:

6.3.1 Create a custom Dataset

from torch.utils.data import Dataset
import torch

class CustomDataset(Dataset):
    def __init__(self, data_array, labels_array):
        self.data = data_array
        self.labels = labels_array
    
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, index):
        sample_data = self.data[index]
        sample_label = self.labels[index]
        return sample_data, sample_label

In this example, the __init__ method initializes the dataset, taking the data and labels as input parameters. The __len__ method returns the number of samples in the dataset, while the __getitem__ method returns the data and label at a given index.

6.3.2 Using custom Dataset

Once you've created your custom Dataset, you can easily use it with PyTorch's DataLoader.

from torch.utils.data import DataLoader

# Initialize the custom dataset
data_array = torch.randn(100, 3, 32, 32)  # 100 random tensors of shape 3x32x32
labels_array = torch.randint(0, 2, (100,))  # 100 random labels (0 or 1)

custom_dataset = CustomDataset(data_array, labels_array)

# Use a DataLoader
data_loader = DataLoader(custom_dataset, batch_size=10, shuffle=True)

for data, label in data_loader:
    # your training/validation code goes here
    pass

6.3.3 Data preprocessing

If you need to perform some form of preprocessing or data augmentation on your dataset, you can implement it inside the __getitem__ method, or use torchvision.transforms.

from torchvision import transforms

class CustomDatasetWithTransforms(Dataset):
    def __init__(self, data_array, labels_array):
        self.data = data_array
        self.labels = labels_array
        self.transform = transforms.Compose([
            transforms.ToPILImage(),
            transforms.Resize((128, 128)),
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        ])
    
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, index):
        sample_data = self.data[index]
        sample_label = self.labels[index]
        
        if self.transform:
            sample_data = self.transform(sample_data)
        
        return sample_data, sample_label

In this way, a custom dataset with data preprocessing capabilities is created. Using custom datasets and data loaders (DataLoaders) is the recommended approach to data processing in PyTorch, as they provide a flexible and easy-to-use data processing framework.

7. Practical combat: building a simple neural network

7.1 Data preparation

First, we will use a built-in dataset from torchvision, MNIST, as our data source.

import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([transforms.ToTensor()])

train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

test_dataset = datasets.MNIST('./data', train=False, download=True, transform=transform)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

7.2 Building the model

Next, we build a simple fully connected network (also called a multilayer perceptron, MLP).

import torch.nn as nn
import torch.nn.functional as F

class SimpleMLP(nn.Module):
    def __init__(self):
        super(SimpleMLP, self).__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return F.log_softmax(x, dim=1)

7.3 Training model

model = SimpleMLP()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.NLLLoss()  # the model already applies log_softmax, so use the negative log-likelihood loss

for epoch in range(5):
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        
        if batch_idx % 100 == 0:
            print('Epoch: {} [{}/{}]\tLoss: {:.6f}'.format(epoch, batch_idx * len(data), len(train_loader.dataset), loss.item()))

7.4 Evaluation model

Use the test set to evaluate the model.

model.eval()
test_loss = 0
correct = 0

with torch.no_grad():
    for data, target in test_loader:
        output = model(data)
        test_loss += criterion(output, target).item()
        pred = output.argmax(dim=1, keepdim=True)
        correct += pred.eq(target.view_as(pred)).sum().item()

test_loss /= len(test_loader.dataset)

print('Test set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)'.format(
    test_loss, correct, len(test_loader.dataset),
    100. * correct / len(test_loader.dataset)))

8. Debugging and Visualization

8.1 Using TensorBoard

TensorBoard is a visualization tool that can be used to display various parameters and indicators during the training process, making model development more intuitive. PyTorch provides native integration with TensorBoard.

8.1.1 Install TensorBoard

First, you need to install TensorBoard. If you haven't installed it yet, you can use the following command to install it:

pip install tensorboard

8.1.2 Import TensorBoard

Import PyTorch's SummaryWriter class, which is the main interface for interacting with TensorBoard.

from torch.utils.tensorboard import SummaryWriter

8.1.3 Initialize SummaryWriter

Create a SummaryWriter object.

writer = SummaryWriter('runs/experiment_1')

This command will create a directory named runs/experiment_1, which will contain all log files.

8.1.4 Recording scalars, images, histograms, etc.

You can use various methods of SummaryWriter to record the information that interests you.

  • Scalar (for example, loss or accuracy):
for epoch in range(num_epochs):
    # ... training step ...
    writer.add_scalar('Training Loss', loss, epoch)

  • Images:
# Add a batch of images to TensorBoard
images, labels = next(iter(train_loader))
writer.add_images('Training Images', images)

  • Histogram (e.g., model weight distribution):
for name, param in model.named_parameters():
    writer.add_histogram(name, param, epoch)

  • You can also add model diagrams:
writer.add_graph(model, images)

8.1.5 Starting TensorBoard

From the command line, start TensorBoard using the following command:

tensorboard --logdir=runs

Then, open a browser and navigate to http://localhost:6006/.

8.1.6 Close SummaryWriter

Once everything is done, close the SummaryWriter .

writer.close()

In this way, you can see various visualization results in TensorBoard, which helps in debugging and optimizing the model. TensorBoard also has many other advanced features, such as embeddings visualization, PR curves, etc.

8.2 Debugging skills

8.2.1 Print shapes and types

When you encounter problems related to tensor operations, first check the shape and data type of the tensor.

print("Tensor shape:", tensor.size())
print("Tensor type:", tensor.dtype)

8.2.2 Using assertions

Use assertions to ensure that the logic of your code behaves as you expect.

assert tensor.size(0) == 64, "Expected batch size of 64"

8.2.3 Print gradient information

During backpropagation, looking at the value and shape of the gradients of each layer can provide a lot of useful information.

def print_grad(grad):
    print(grad)

tensor.register_hook(print_grad)

8.2.4 Using .item() and .detach()

During debugging, you may need to convert a single value from a PyTorch tensor to a Python number. This can be achieved using .item() . Note that this only works for tensors containing only a single element.

single_value = tensor.item()

If you want to detach a multi-element tensor from the computational graph and convert it into a NumPy array, you can use .detach() .

array = tensor.detach().cpu().numpy()

8.2.5 Using PyTorch’s torch.set_printoptions

This function gives you more flexible control over how tensors are printed.

torch.set_printoptions(precision=2, sci_mode=False)

8.2.6 PyTorch's built-in debugging tools

  • torch.autograd.set_detect_anomaly(True): This helps you find where NaN or infinite values are generated during the backward pass.

8.2.7 Using standard Python debugging tools

You can also use Python's standard library pdb or more advanced tools, such as ipdb , to perform line-by-line code tracing.

Add this before the line of code you want to pause:

import pdb; pdb.set_trace()

Then, you can perform some checks in the terminal.

8.2.8 Logging

In addition to inserting print statements or breakpoints directly into the code, you can also use Python's logging library to record important runtime information.

import logging

logging.basicConfig(level=logging.INFO)
logging.info("Loss: %s, Accuracy: %s", loss, accuracy)

9. Model saving and loading

9.1 Save model parameters

There are many ways to save and load models in PyTorch, but they mainly fall into two categories:

  1. Save the entire model (including structure and parameters)
  2. Only save model parameters

9.1.1 Save the entire model

This approach saves both the structure and the parameters of the model. Note that the saved file is relatively large, and the model class definition must still be available when the model is loaded.

Save:

torch.save(model, 'model.pth')

Load:

model = torch.load('model.pth')

9.1.2 Save only model parameters

This approach saves only the parameters of the model, not its structure. The resulting file is much smaller, but when loading you must first instantiate a model whose structure matches the saved parameters.

Save:

torch.save(model.state_dict(), 'params.pth')

Load:

model = TheModelClass(*args, **kwargs)  # instantiate the model class first; *args, **kwargs are the arguments its constructor needs
model.load_state_dict(torch.load('params.pth'))

9.1.3 Precautions

  1. Device compatibility: When loading a model, the device it was saved from (CPU or GPU) and the device it is loaded onto need to match, or you need to specify the correct device explicitly.
# If the model was saved on GPU and you load it on CPU
model.load_state_dict(torch.load('params.pth', map_location=torch.device('cpu')))

# If the model was saved on CPU and you load it on GPU
model.load_state_dict(torch.load('params.pth'))
model.to('cuda')

  2. Version compatibility: There may be compatibility issues when saving and loading models across different versions of PyTorch.
  3. Saving a model wrapped in nn.DataParallel: If your model is trained on multiple GPUs, save the state_dict carefully.
# Recommended approach
torch.save(model.module.state_dict(), 'params.pth')

9.2 Load model parameters

How you load a PyTorch model's parameters usually depends on how the model was saved. There are two main approaches:

  1. Loading a complete model: Use this when the entire model, including structure and parameters, was saved. In this case you can load the model directly with torch.load. The model class definition must still be available when the model is loaded.
model = torch.load('model.pth')
  2. Loading from saved parameters: This is the more common (and recommended) method. First instantiate a model with the same structure as the saved one, then load the parameters with load_state_dict.
model = TheModelClass(*args, **kwargs)  # initialize a model instance
model.load_state_dict(torch.load('params.pth'))

9.2.1 Points to note

  1. Device: If the model was trained and saved on GPU but you want to load it on CPU, or vice versa, specify the map_location parameter of torch.load.
  • Saved on GPU, loaded on CPU:
model.load_state_dict(torch.load('params.pth', map_location=torch.device('cpu')))
  • Saved on CPU, loaded on GPU:
model.load_state_dict(torch.load('params.pth'))
model.to('cuda')
  2. Model state: Make sure the model is put into the appropriate mode (training or evaluation) after loading.
model.eval()  # or model.train()
  3. Multi-GPU training: If the model parameters were saved from a model wrapped in nn.DataParallel, special handling is required when loading them on a single GPU or CPU.
# After the model has been defined
model = nn.DataParallel(model)  # wrap it in DataParallel
model.load_state_dict(torch.load('params.pth'))

Or, if you trained and saved the model on multiple GPUs but want to load it on a single device:

# saved with model.module.state_dict()
model.load_state_dict(torch.load('params.pth'))

10. Advanced topics

Using GPUs for model training and inference in PyTorch can often significantly speed up computations. Here are some basic steps on how to use a GPU:

10.1 Using GPU acceleration

10.1.1 Check if GPU is available

First, you need to check if there is a CUDA supported GPU available.

print(torch.cuda.is_available())

10.1.2 Specifying the device

Before you start, you need to specify the device you want to use.

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Here, "cuda:0" means using the first GPU. If there are multiple GPUs, you can select other GPUs by changing the index (e.g. "cuda:1" , "cuda:2", etc.).

10.1.3 Moving the model to the GPU

Next, you can use the .to() method to move all parameters and buffers of the model to the GPU.

model.to(device)

10.1.4 Moving data to GPU

Before doing forward and backward passes, you need to ensure that the input data and labels are also on the GPU.

inputs, labels = inputs.to(device), labels.to(device)
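Putting these pieces together, a single training step on the GPU might look like this (a sketch assuming model, criterion, optimizer, and train_loader already exist):

for inputs, labels in train_loader:
    inputs, labels = inputs.to(device), labels.to(device)

    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()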

10.1.5 Multi-GPU usage (optional)

If you have multiple GPUs, you can use nn.DataParallel for data parallel processing.

model = nn.DataParallel(model)

This will allow the model to be trained on all available GPUs, automatically distributing data and collecting results.

10.2 Distributed training

Distributed training is a method that uses multiple computing nodes to speed up deep learning model training and scale up the model. In PyTorch, distributed training is usually implemented through the torch.distributed API.

10.2.1 Distributed training method

  1. Data Parallelism: This is the simplest form of distributed training and is typically used with multiple GPUs on a single machine. Each GPU holds a complete copy of the model, and each mini-batch of data is split across the GPUs.
  2. Model parallelism: This is typically used for large models that cannot fit on a single GPU. In this case, different parts of the model are assigned to different GPUs.
  3. Data parallelism + model parallelism: This is a combination of the above two approaches.

10.2.2 Initializing the distributed environment

Before starting distributed training, the distributed environment must be initialized.

import torch.distributed as dist

dist.init_process_group(backend='nccl', init_method='tcp://localhost:23456', rank=0, world_size=1)

10.2.3 Distributed data parallelism

PyTorch provides torch.nn.DataParallel and torch.nn.parallel.DistributedDataParallel classes to simplify the implementation of data parallelism.

Use DistributedDataParallel

from torch.nn.parallel import DistributedDataParallel

model = model.to(device)  # move the model to the device first
model = DistributedDataParallel(model)

10.2.4 Losses and Optimizers

The loss function and optimizer are set up essentially as in non-distributed training, but note that DistributedDataParallel synchronizes (averages) gradients across processes during the backward pass.

10.2.5 Loading and saving models

In a distributed environment, you typically only need to save or load a model from one process.
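For example, saving only from rank 0 (a sketch assuming the model is wrapped in DistributedDataParallel, so its parameters live under model.module):

import torch
import torch.distributed as dist

if dist.get_rank() == 0:
    torch.save(model.module.state_dict(), 'checkpoint.pth')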

10.2.6 Sample code

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

# Initialize the distributed environment
dist.init_process_group(backend='nccl')

# Device setup
device = torch.device("cuda")

# Create the model and data (YourModel and YourDataloader are placeholders)
model = YourModel()
model = model.to(device)
model = DistributedDataParallel(model)

data_loader = YourDataloader()

# Loss and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

# Training loop
for epoch in range(epochs):
    for batch in data_loader:
        inputs, targets = batch
        inputs = inputs.to(device)
        targets = targets.to(device)

        # Forward pass
        outputs = model(inputs)

        # Compute the loss
        loss = criterion(outputs, targets)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # ... optional evaluation, model saving, etc.

10.3 Quantization

Model quantization is a technique for reducing model size and increasing inference speed, usually at the expense of a certain amount of model accuracy. PyTorch provides a complete set of quantization tools and supports multiple quantization methods.

10.3.1 Preliminary steps

Make sure your PyTorch installation includes the dependencies required for quantization. The quantization function is available in PyTorch 1.3.0 and above.

10.3.2 Preparing the model

Typically, quantization is performed on an already trained model. You can also include quantization in the training phase, which is called quantization-aware training.

10.3.3 Select quantization configuration

  1. Static (post-training) quantization: Both weights and activations are quantized; representative calibration data is needed to determine activation ranges.
  2. Dynamic quantization: Weights are quantized ahead of time, while activations are quantized on the fly at inference time; no calibration data is required.
  3. Quantization-aware training (QAT): Quantization is simulated during the training phase so the model learns to compensate for quantization error.

10.3.4 Dynamic quantization example

import torch
import torch.quantization

# Prepare the model
model = torch.load('my_model.pth')
model.eval()

# Apply dynamic quantization to the Linear layers
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Save the quantized model
torch.save(quantized_model.state_dict(), 'my_quantized_model.pth')

10.3.5 Dynamic quantization with default settings

import torch
import torch.quantization

# Prepare the model
model = torch.load('my_model.pth')
model.eval()

# Quantize the model; with no layer types specified, quantize_dynamic uses its default set of supported layers
quantized_model = torch.quantization.quantize_dynamic(model)

# Save the quantized model
torch.save(quantized_model.state_dict(), 'my_quantized_model.pth')

10.3.6 Quantization-aware training example

Quantization-aware training inserts fake-quantization for weights and activations during training, which allows the model to adapt to the errors introduced by quantization.

import torch
import torch.quantization

# Prepare the model (YourModel is a placeholder)
model = YourModel()
model.train()
model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
torch.quantization.prepare_qat(model, inplace=True)

# Train the model
# ...

# Convert to a quantized model
model.eval()
torch.quantization.convert(model, inplace=True)

10.3.7 Advantages and Disadvantages

  1. Advantages: reduced model size and faster inference.
  2. Disadvantages: Some model accuracy may be lost.

11. PyTorch ecosystem

11.1 torchvision

torchvision is a library used with PyTorch to handle computer vision tasks. It provides the following main functions:

11.1.1 Pre-trained model

torchvision contains many pre-trained models, such as VGG, ResNet, MobileNet, etc. These models can be used for transfer learning or directly for inference.

import torchvision.models as models

resnet18 = models.resnet18(pretrained=True)

11.1.2 Dataset

torchvision includes commonly used computer vision data sets, such as CIFAR-10 , MNIST , ImageNet , etc.

from torchvision.datasets import MNIST

train_dataset = MNIST(root='./data', train=True, transform=None, download=True)

11.1.3 Data conversion

A variety of transformation methods are provided for image processing and enhancement.

from torchvision import transforms

transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

11.1.4 Utility functions

torchvision also includes some utility functions for computer vision, such as non-maximum suppression (NMS).

11.1.5 Example

Here is a simple example of using torchvision to load a pre-trained model and perform image classification:

import torch
import torchvision.models as models
from torchvision import transforms
from PIL import Image

# Load the pretrained model
model = models.resnet18(pretrained=True)
model.eval()

# Image preprocessing
input_image = Image.open("your_image.jpg")
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0)

# Inference
with torch.no_grad():
    output = model(input_batch)

# Get the predicted class index
_, predicted_idx = torch.max(output, 1)
print(predicted_idx)

11.2 torchaudio

torchaudio is a library that works with PyTorch and is specifically designed for processing audio data and audio signal processing tasks. torchaudio provides a range of tools, including methods for audio data loading, transformation and feature extraction, as well as a number of pre-trained audio processing models.

11.2.1 Main functions

Data loading and saving
torchaudio supports reading and writing of multiple audio formats, including but not limited to WAV, MP3, FLAC, etc.

import torchaudio

# Read an audio file
waveform, sample_rate = torchaudio.load('audio_file.wav')

# Save an audio file
torchaudio.save('output_audio_file.wav', waveform, sample_rate)

Audio transforms
torchaudio provides many transformation methods for audio processing, such as Mel-frequency cepstral coefficients (MFCC), spectrogram creation, resampling, etc.

import torchaudio.transforms as T

# Create a spectrogram
spectrogram = T.Spectrogram()(waveform)

# Compute MFCCs
mfcc = T.MFCC()(waveform)

Pre-trained models
Although torchaudio does not have as many pre-trained models as torchvision , it does include some task-specific pre-trained models, such as waveform-to-waveform generation models.

Dataset
torchaudio also includes some standard datasets for training and testing audio models.

from torchaudio.datasets import YESNO

yesno_data = YESNO('./', download=True)

11.2.2 Example

Here is a simple example using torchaudio to load an audio file and calculate its MFCC:

import torchaudio
import torchaudio.transforms as T

# Load an audio file
waveform, sample_rate = torchaudio.load('audio_file.wav')

# Compute MFCCs
mfcc_transform = T.MFCC(
    sample_rate=sample_rate,
    n_mfcc=12,
    melkwargs={
        'n_fft': 400,
        'hop_length': 160,
        'center': False,
        'pad_mode': 'reflect',
        'power': 2.0,
        'norm': 'slaney',
        'onesided': True,
    }
)
mfcc = mfcc_transform(waveform)

print("MFCC Shape:", mfcc.shape)

11.2.3 Usage scenarios

torchaudio is mainly used in the following aspects:

  • Speech Recognition
  • Music generation and classification
  • Audio event detection
  • Audio signal enhancement

11.3 torchtext

torchtext is a text processing library used with PyTorch. It provides a set of tools for natural language processing (NLP) tasks to facilitate data loading, text preprocessing, vocabulary creation, etc. torchtext is designed to easily handle various text datasets and integrates seamlessly with other modules in PyTorch (such as nn.Module ).

11.3.1 Main functions

Data loading
torchtext supports loading data from various sources (such as CSV files, JSON files, text files, etc.).

from torchtext.data import TabularDataset

train_data, test_data = TabularDataset.splits(
    path='./data', train='train.csv', test='test.csv', format='csv',
    fields=[('text', TEXT), ('label', LABEL)]
)

Text preprocessing
It also includes a series of text preprocessing tools, such as tokenization, stemming, and numericalization.

from torchtext.data import Field

TEXT = Field(tokenize=custom_tokenize, lower=True)

Vocabulary management
torchtext allows you to easily create and manage vocabularies and integrates seamlessly with PyTorch tensors.

TEXT.build_vocab(train_data, vectors="glove.6B.100d")

Batch processing and data iteration
torchtext provides flexible batch processing and data iterator options.

from torchtext.data import Iterator, BucketIterator

train_iter, test_iter = BucketIterator.splits(
    (train_data, test_data), batch_size=32, sort_key=lambda x: len(x.text)
)

Pre-trained word vector
torchtext also supports a variety of pre-trained word vectors, such as GloVe and FastText.

11.3.2 Example

Here is a simple example of using torchtext for data preprocessing:

from torchtext.data import Field, TabularDataset, BucketIterator

# Define the fields
TEXT = Field(sequential=True, tokenize='spacy', lower=True)
LABEL = Field(sequential=False, use_vocab=False)

# Create the datasets
fields = {'text': ('text', TEXT), 'label': ('label', LABEL)}
train_data, test_data = TabularDataset.splits(
    path='./data',
    train='train.json',
    test='test.json',
    format='json',
    fields=fields
)

# Build the vocabulary
TEXT.build_vocab(train_data, max_size=10000, vectors='glove.6B.100d')

# Create the data iterators
train_iterator, test_iterator = BucketIterator.splits(
    (train_data, test_data),
    batch_size=32,
    device='cuda'
)

11.3.3 Usage scenarios

torchtext is mainly used in the following aspects:

  • Text classification
  • Sequence labeling
  • Language model training
  • Machine translation


Origin blog.csdn.net/m0_63260018/article/details/132797853