PyTorch uses NVLink for model training

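PyTorch can use NVLink, NVIDIA's high-bandwidth GPU-to-GPU interconnect, to speed up the transfers between cards that multi-GPU training requires. NVLink does not accelerate the computation itself. The following is sample code for dual-card training on two cards connected by NVLink: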

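import torch

devices = [0, 1]  # indices of the graphics cards to use

# The primary device is the first card in the list
device = torch.device(f"cuda:{devices[0]}" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    torch.cuda.set_device(device)  # set_device takes a single device, not a list

    # set_enabled_lms is provided only by IBM's Large Model Support (LMS) builds
    # of PyTorch, not by upstream PyTorch, so call it only when it is present.
    # LMS swaps tensors to host memory; it is not an NVLink setting.
    if hasattr(torch.cuda, "set_enabled_lms"):
        torch.cuda.set_enabled_lms(True)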

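# Do any other setup (model initialization, data preprocessing, etc.) before
# handling the model and data

model = ...

# Move the model to the primary device
model = model.to(device)

# With both cards visible, replicate the model across them. DataParallel splits
# each input batch across the cards and combines the gradients on the first one
if device.type == "cuda" and torch.cuda.device_count() >= len(devices):
    model = torch.nn.DataParallel(model, device_ids=devices)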

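# Before loading, split the dataset into sub-datasets, one per graphics card
dataset_per_gpu = ...

# Create one data loader per sub-dataset. A DataLoader yields CPU tensors;
# batches are moved to a card inside the training loop
data_loader_per_gpu = [
    torch.utils.data.DataLoader(dataset, batch_size=..., shuffle=...)
    for dataset in dataset_per_gpu
]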

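# Training loop: criterion, optimizer, and num_epochs are assumed to be defined
# during the setup above
for epoch in range(num_epochs):
    for data_loader in data_loader_per_gpu:
        for batch in data_loader:
            # Move the batch to the primary device (preprocess here if needed)
            inputs = batch['inputs'].to(device)
            labels = batch['labels'].to(device)

            # Forward and backward pass
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()

            # Optional: wait for queued GPU work to finish. This does not
            # synchronize gradients; a multi-GPU wrapper such as
            # torch.nn.DataParallel or DistributedDataParallel combines them
            # during backward(), and those transfers can travel over NVLink
            if device.type == "cuda":
                torch.cuda.synchronize()

            # Parameter update
            optimizer.step()
            optimizer.zero_grad()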

# After training, save or validate the model as needed

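In the example code, the devices list first specifies which graphics cards to use. The setup step then calls torch.cuda.set_enabled_lms(True), which enables Large Model Support (LMS) rather than a "Low Memory Usage" mode. LMS is available only in IBM's LMS builds of PyTorch, not in upstream PyTorch. It swaps tensors to host memory so that larger models fit, and it does not tune NVLink.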

Next, the model is moved to the designated device, and the dataset is divided into one sub-dataset per graphics card, each with its own data loader. Note that a data loader is not tied to a card: it yields CPU tensors, and each batch is moved to a card with .to(device) inside the training loop.
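The split into sub-datasets described above can be sketched as follows. This is a minimal illustration in plain Python, not the original author's method; shard_indices is a hypothetical helper name, and the interleaved assignment is just one reasonable choice.

```python
def shard_indices(num_samples, num_shards):
    """Split range(num_samples) into num_shards interleaved index lists."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # Shard k takes samples k, k + num_shards, k + 2 * num_shards, ...
    return [list(range(k, num_samples, num_shards)) for k in range(num_shards)]


shards = shard_indices(num_samples=10, num_shards=2)
print(shards)  # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
```

Each index list could then be passed to torch.utils.data.Subset(dataset, indices) to build the entries of dataset_per_gpu. Interleaving keeps the shard sizes within one sample of each other even when the dataset size is not divisible by the number of cards.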

In the training loop, each batch is moved to the device, followed by forward propagation and back propagation. Gradient synchronization across cards is not a standalone torch.cuda call; it happens during back propagation, and only when the model is wrapped in a multi-GPU wrapper such as torch.nn.DataParallel or torch.nn.parallel.DistributedDataParallel. Finally, the parameters are updated, and the model is saved and validated as required.
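To make the gradient synchronization step concrete, the sketch below works through the arithmetic in plain Python with no PyTorch dependency. It assumes a one-parameter linear model with a mean squared error loss and equal-sized halves of the batch; mse_grad and the sample values are hypothetical. Averaging the per-card gradients, as DistributedDataParallel does, gives the same gradient as processing the whole batch on one card.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Each "card" computes the gradient on its own half of the batch.
grad_card0 = mse_grad(w, xs[:2], ys[:2])
grad_card1 = mse_grad(w, xs[2:], ys[2:])

# Synchronization averages the per-card gradients (an all-reduce with mean).
grad_synced = (grad_card0 + grad_card1) / 2

# The averaged gradient matches the gradient of the whole batch on one card.
grad_full = mse_grad(w, xs, ys)
print(grad_synced, grad_full)  # -22.5 -22.5
```

Because every card ends up with the same averaged gradient, the optimizer step leaves the model parameters identical on both cards. The equality holds exactly only when the shards are the same size.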

In this way, two graphics cards connected by NVLink can train together, with NVLink speeding up the transfers between them. Make sure each card has enough video memory to hold a copy of the model plus its share of the data, and adjust the batch size and other settings to your situation.
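Before relying on NVLink, it can help to check whether the two cards can reach each other's memory directly. The sketch below is one way to do that, assuming a PyTorch build that provides torch.cuda.can_device_access_peer; peer_access_matrix is a hypothetical helper name. It returns an empty result on a machine without CUDA.

```python
import torch


def peer_access_matrix():
    """Map each ordered GPU pair (i, j) to whether i can access j's memory directly."""
    if not torch.cuda.is_available():
        return {}
    count = torch.cuda.device_count()
    return {
        (i, j): torch.cuda.can_device_access_peer(i, j)
        for i in range(count)
        for j in range(count)
        if i != j
    }


matrix = peer_access_matrix()
if not matrix:
    print("Fewer than two CUDA devices visible; nothing to check.")
for (i, j), ok in matrix.items():
    print(f"GPU {i} -> GPU {j}: peer access {'available' if ok else 'unavailable'}")
```

Peer access alone does not prove that the link is NVLink, since it can also work over PCIe. Running nvidia-smi topo -m shows the link type between each pair of cards, with NVLink connections listed as NV followed by a number.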

Origin blog.csdn.net/qq_36541069/article/details/132017280