PyTorch single-machine multi-GPU training

For single-machine multi-GPU training in PyTorch, you can parallelize a model with torch.nn.DataParallel or torch.nn.DistributedDataParallel. Both methods are introduced below:

  1. torch.nn.DataParallel is a simple and effective data-parallel tool provided by PyTorch. It replicates the model on every visible GPU and automatically splits each input batch across them for the forward and backward passes. A code example using torch.nn.DataParallel follows; a fuller self-contained sketch is given after this list.

     import torch
     import torch.nn as nn

     # Define the model
     model = YourModel()

     # Wrap the model in DataParallel
     model = nn.DataParallel(model)

     # Move the model to the GPU
     device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
     model.to(device)

     # Define the loss function and optimizer
     criterion = nn.CrossEntropyLoss()
     optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

     # In each training iteration, move the batch (from your DataLoader) to the GPU
     inputs, labels = inputs.to(device), labels.to(device)

     # Forward pass, loss computation, backward pass, and optimizer step
     optimizer.zero_grad()
     outputs = model(inputs)
     loss = criterion(outputs, labels)
     loss.backward()
     optimizer.step()

  2. torch.nn.parallel.DistributedDataParallel is PyTorch's tool for distributed data-parallel training. It runs one process per GPU, works both on a single machine and across multiple machines, and is generally faster than DataParallel. A code example using DistributedDataParallel follows; see the data-loading sketch after this list for how each process gets its own shard of the data.

     import os
     import torch
     import torch.nn as nn
     import torch.distributed as dist
     import torch.multiprocessing as mp

     # Initialize the distributed environment for one process
     def init_process(rank, world_size, fn, backend='nccl'):
         """Join the process group, run the training function, then clean up."""
         dist.init_process_group(backend, rank=rank, world_size=world_size)
         fn(rank, world_size)
         dist.destroy_process_group()

     # Define the per-process training function
     def train(rank, world_size):
         # Bind this process to its own GPU
         torch.cuda.set_device(rank)
         device = torch.device(f"cuda:{rank}")

         # Build the model on this GPU, then wrap it in DistributedDataParallel
         model = YourModel().to(device)
         model = nn.parallel.DistributedDataParallel(model, device_ids=[rank])

         # Define the loss function and optimizer
         criterion = nn.CrossEntropyLoss()
         optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

         # In each training iteration, move the batch (from your DataLoader) to this GPU
         inputs, labels = inputs.to(device), labels.to(device)

         # Forward pass, loss computation, backward pass, and optimizer step
         optimizer.zero_grad()
         outputs = model(inputs)
         loss = criterion(outputs, labels)
         loss.backward()
         optimizer.step()

     # Launch distributed training: one process per visible GPU
     if __name__ == '__main__':
         world_size = torch.cuda.device_count()
         # Rendezvous address and port for the default "env://" init method
         os.environ.setdefault('MASTER_ADDR', 'localhost')
         os.environ.setdefault('MASTER_PORT', '29500')
         mp.spawn(init_process, args=(world_size, train), nprocs=world_size, join=True)

The above are two methods for single-machine multi-GPU training in PyTorch. DataParallel is the simplest to set up, while DistributedDataParallel is faster and is what PyTorch recommends even on a single machine; choose the one that fits your situation.
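
For reference, here is a fuller, self-contained sketch of the DataParallel pattern from item 1. It is only a minimal illustration: ToyModel, the random tensors, and the checkpoint path are placeholders standing in for your real model and data.

    import torch
    import torch.nn as nn

    # A toy stand-in for the real model
    class ToyModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

        def forward(self, x):
            return self.net(x)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = ToyModel()

    # Only wrap when more than one GPU is visible; DataParallel splits each
    # input batch along dim 0 and scatters the chunks across the GPUs.
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    model.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

    # One training step on random data (a stand-in for a real DataLoader batch)
    inputs = torch.randn(64, 32, device=device)
    labels = torch.randint(0, 10, (64,), device=device)

    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

    # When saving, unwrap the DataParallel container so the checkpoint keys
    # do not carry the "module." prefix.
    to_save = model.module if isinstance(model, nn.DataParallel) else model
    torch.save(to_save.state_dict(), "checkpoint.pt")

Saving through model.module keeps the checkpoint loadable later by a plain, unwrapped model.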

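Both snippets above leave batch loading as a placeholder. For DistributedDataParallel in particular, each process must read its own shard of the dataset, which is usually done with torch.utils.data.distributed.DistributedSampler. The sketch below assumes a hypothetical YourDataset and arbitrary loader parameters.

    from torch.utils.data import DataLoader
    from torch.utils.data.distributed import DistributedSampler

    def build_loader(dataset, rank, world_size, batch_size=32):
        # Each rank sees a disjoint 1/world_size slice of the dataset
        sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank, shuffle=True)
        return DataLoader(dataset, batch_size=batch_size, sampler=sampler,
                          num_workers=2, pin_memory=True)

    # Inside the per-process train(rank, world_size) function above:
    #   loader = build_loader(YourDataset(), rank, world_size)
    #   for epoch in range(num_epochs):
    #       loader.sampler.set_epoch(epoch)   # reshuffle differently each epoch
    #       for inputs, labels in loader:
    #           inputs, labels = inputs.to(rank), labels.to(rank)
    #           ...

Calling set_epoch before each epoch makes the shuffling differ across epochs while staying consistent across ranks.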
Origin blog.csdn.net/qq_36541069/article/details/132059211