PyTorch Single-Machine Multi-GPU Training

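In PyTorch, single-machine multi-GPU training can be done with torch.nn.DataParallel or torch.nn.parallel.DistributedDataParallel. Both implement data parallelism: the model is replicated on every GPU and each replica processes a different slice of the data. The two approaches are described below: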

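torch.nn.DataParallel: torch.nn.DataParallel is a simple data-parallelism wrapper provided by PyTorch. It runs in a single process, replicates the model onto each GPU, splits every input batch along the batch dimension, and gathers the outputs on the first device, so the forward and backward passes are spread across the GPUs automatically. A usage example: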

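import torch
import torch.nn as nn

# Define the model (YourModel is a placeholder for your own nn.Module)
model = YourModel()

# Wrap the model in DataParallel; by default it uses all visible GPUs
model = nn.DataParallel(model)

# Move the model to the GPU (falls back to the CPU if no GPU is available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Define the loss function and the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# In each training iteration, move the batch to the GPU
# (inputs and labels come from your own DataLoader)
inputs, labels = inputs.to(device), labels.to(device)

# Clear old gradients, then forward pass, loss, backward pass, and update
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()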
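
To make the batch splitting described above concrete, here is a minimal pure-Python sketch of dividing one batch into contiguous per-GPU chunks. It is an illustration only, not PyTorch's implementation: split_batch is a hypothetical helper, and the ceiling-division chunk size is an assumption modeled on how chunking along the batch dimension typically behaves.

```python
import math


def split_batch(batch, num_devices):
    """Split a batch into contiguous chunks, one per device (hypothetical helper)."""
    if not batch:
        return []
    # Assumption: each chunk holds ceil(len(batch) / num_devices) samples,
    # so the last chunk may be smaller than the others.
    chunk_size = math.ceil(len(batch) / num_devices)
    return [batch[i:i + chunk_size] for i in range(0, len(batch), chunk_size)]


batch = list(range(10))          # stand-in for a batch of 10 samples
chunks = split_batch(batch, 4)   # pretend 4 GPUs are visible
for device_id, chunk in enumerate(chunks):
    print(f"cuda:{device_id} receives {len(chunk)} samples: {chunk}")
```

When the batch size is not divisible by the number of GPUs, the chunks are uneven, which is one reason to pick a batch size that is a multiple of the GPU count.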

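torch.nn.parallel.DistributedDataParallel: DistributedDataParallel (DDP) is the data-parallelism tool for distributed training. It runs one process per GPU and synchronizes gradients across processes during the backward pass. It works on a single machine with several GPUs as well as across several machines. A single-machine usage example: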

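import os

import torch
import torch.nn as nn
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

# Initialize the distributed environment
def init_process(rank, size, fn, backend='nccl'):
    """Set up the process group for this rank, run fn, then clean up."""
    # Rendezvous address for a single machine; the port is an arbitrary free port
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend, rank=rank, world_size=size)
    fn(rank, size)
    dist.destroy_process_group()

# Training function, executed once in every process
def train(rank, size):
    # One process per GPU: bind this process to GPU number `rank`
    torch.cuda.set_device(rank)
    device = torch.device("cuda", rank)

    # Build the model on this GPU, then wrap it in DistributedDataParallel
    # (YourModel is a placeholder for your own nn.Module)
    model = YourModel().to(device)
    model = DDP(model, device_ids=[rank])

    # Define the loss function and the optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

    # In each training iteration, move the batch to this process's GPU
    # (inputs and labels come from your own DataLoader)
    inputs, labels = inputs.to(device), labels.to(device)

    # Clear old gradients, then forward pass, loss, backward pass, and update
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

# Launch distributed training
if __name__ == '__main__':
    # Spawn one training process per available GPU
    size = torch.cuda.device_count()
    mp.spawn(init_process, args=(size, train), nprocs=size, join=True)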
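
The reason every process can call optimizer.step() independently and still keep identical weights is that the gradients are averaged across processes during the backward pass. The sketch below illustrates that idea in plain Python with a one-parameter model y = w * x and a mean-squared-error loss. It is a toy under stated assumptions, not DDP's implementation: grad_mse, the data, and the two-way split are all hypothetical, and the shards are assumed to be equally sized.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

world_size = 2                    # pretend there are two GPU processes
per_rank = len(xs) // world_size  # equal-sized shard for every rank

# Each rank computes a gradient on its own shard of the batch only.
local_grads = [
    grad_mse(w, xs[r * per_rank:(r + 1) * per_rank], ys[r * per_rank:(r + 1) * per_rank])
    for r in range(world_size)
]

# Averaging the per-rank gradients stands in for the all-reduce step.
averaged_grad = sum(local_grads) / world_size
full_batch_grad = grad_mse(w, xs, ys)

print("per-rank gradients:", local_grads)
print("averaged:", averaged_grad, "full batch:", full_batch_grad)
```

With equal-sized shards the averaged gradient matches the gradient of the whole batch, so the per-GPU batch size multiplied by the number of processes acts as the effective batch size.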

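These are the two ways to do single-machine multi-GPU training in PyTorch. DataParallel needs the fewest code changes, while the PyTorch documentation recommends DistributedDataParallel even on a single machine because it is usually faster. Choose whichever fits your situation.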
