For single-machine multi-GPU training in PyTorch, you can use torch.nn.DataParallel or torch.nn.DistributedDataParallel to parallelize training. Both methods are introduced below:
torch.nn.DataParallel

torch.nn.DataParallel is a simple and effective data-parallel wrapper provided by PyTorch. It runs in a single process, automatically splits each input batch across multiple GPUs for the forward pass, and gathers gradients during the backward pass. A code example using torch.nn.DataParallel is as follows:

```python
import torch
import torch.nn as nn

# Define the model
model = YourModel()

# Wrap the model with DataParallel
model = nn.DataParallel(model)

# Move the model to the GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# In each training iteration, move the data to the GPU
inputs, labels = inputs.to(device), labels.to(device)

# Forward pass, compute the loss, backward pass, and optimize
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
```
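To make the snippet above concrete, here is a self-contained sketch of one training step. TinyNet, the tensor shapes, and the random data are illustrative assumptions standing in for YourModel and your dataset; the DataParallel wrap is guarded so the sketch also runs on a single GPU or on CPU:

```python
import torch
import torch.nn as nn

# A toy classifier standing in for YourModel (illustrative assumption)
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 4)  # 16 features in, 4 classes out

    def forward(self, x):
        return self.fc(x)

model = TinyNet()
# DataParallel splits each batch across all visible GPUs;
# only wrap when there is more than one GPU to split across
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# One training step on random data (batch of 8 samples)
inputs = torch.randn(8, 16).to(device)
labels = torch.randint(0, 4, (8,)).to(device)

optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
print(outputs.shape)  # torch.Size([8, 4])
```

Note that DataParallel requires no changes to the training loop itself: the wrapped module is called exactly like the original one, and the per-GPU outputs are concatenated back into a single batch.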
torch.nn.DistributedDataParallel

torch.nn.DistributedDataParallel is the tool for data parallelism in a distributed-training environment: it runs one process per GPU and can also scale across multiple machines. A code example using torch.nn.DistributedDataParallel is as follows:

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
import torch.multiprocessing as mp

# Initialize the distributed training environment
def init_process(rank, size, fn, backend='nccl'):
    """Initialize the process group, then run the training function."""
    os.environ['MASTER_ADDR'] = '127.0.0.1'
    os.environ['MASTER_PORT'] = '29500'
    dist.init_process_group(backend, rank=rank, world_size=size)
    fn(rank, size)

# Define the training function
def train(rank, size):
    # Each process drives exactly one GPU
    device = torch.device(f"cuda:{rank}")

    # Define the model and move it to this process's GPU
    model = YourModel().to(device)

    # Wrap the model with DistributedDataParallel
    model = nn.parallel.DistributedDataParallel(model, device_ids=[rank])

    # Define the loss function and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

    # In each training iteration, move the data to this process's GPU
    inputs, labels = inputs.to(device), labels.to(device)

    # Forward pass, compute the loss, backward pass, and optimize
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()

# Launch distributed training, one process per GPU
if __name__ == '__main__':
    size = torch.cuda.device_count()
    mp.spawn(init_process, args=(size, train), nprocs=size, join=True)
```
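With DistributedDataParallel, each process should also see a disjoint shard of the dataset, which is what torch.utils.data.distributed.DistributedSampler provides. The sketch below shows how rank 0 of a 2-process job would shard a toy dataset; normally num_replicas and rank are read from the initialized process group, but here they are passed explicitly so the example runs standalone (the dataset contents are illustrative assumptions):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader
from torch.utils.data.distributed import DistributedSampler

# A toy dataset of 12 numbered samples (illustrative)
data = torch.arange(12, dtype=torch.float32).unsqueeze(1)
targets = torch.zeros(12, dtype=torch.long)
dataset = TensorDataset(data, targets)

# Explicit num_replicas/rank let the sketch run without dist.init_process_group;
# inside a real DDP job you would just write DistributedSampler(dataset)
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=False)
loader = DataLoader(dataset, batch_size=3, sampler=sampler)

# Rank 0 of 2 sees every other sample: indices 0, 2, 4, 6, 8, 10
seen = [int(x) for batch, _ in loader for x in batch.squeeze(1)]
print(seen)  # [0, 2, 4, 6, 8, 10]
```

In a real training loop you would also call sampler.set_epoch(epoch) at the start of each epoch so that shuffling produces a different ordering per epoch while keeping the shards disjoint across ranks.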
The above are two methods for single-machine multi-GPU training in PyTorch. DataParallel is easier to set up, while DistributedDataParallel generally scales better; choose the one that suits your actual situation.