A brief summary of PyTorch distributed training

This is a fairly brief write-up; for a more detailed version, see this other article: https://blog.csdn.net/qq_36276587/article/details/123913384

A short summary of single-machine, multi-GPU distributed training with PyTorch: the key APIs involved and the overall training workflow. Verified with PyTorch 1.2.0.

Initialize the GPU communication backend (NCCL)

import torch
import torch.distributed as dist

torch.cuda.set_device(FLAGS.local_rank)          # pin this process to its own GPU
dist.init_process_group(backend='nccl')          # NCCL backend for GPU collectives
device = torch.device("cuda", FLAGS.local_rank)  # set this yourself; used to place the model and data
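FLAGS.local_rank above comes from the command line: torch.distributed.launch spawns one process per GPU and appends --local_rank=<n> to each invocation. A minimal sketch of how it might be parsed (the FLAGS name mirrors the snippet above; the --keep_ratio flag is an assumed extra needed by the data-loading snippet below):

import argparse

# torch.distributed.launch runs the script once per GPU and passes --local_rank,
# so the script must accept that argument.
parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0)
parser.add_argument('--keep_ratio', action='store_true')  # assumed; consumed by alignCollate below
FLAGS = parser.parse_args()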

Distributed data loading

# DistributedSampler splits the dataset across processes so each GPU sees a
# different shard; do not also pass shuffle=True to the DataLoader.
train_sampler = torch.utils.data.distributed.DistributedSampler(traindataset)
train_loader = torch.utils.data.DataLoader(
        traindataset, batch_size=batchSize,
        sampler=train_sampler,
        num_workers=4, pin_memory=True,  # drop_last=False,
        collate_fn=alignCollate(imgH=imgH, imgW=imgW, keep_ratio=FLAGS.keep_ratio))
# alignCollate is a project-specific collate_fn that formats the training labels for the DataLoader
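One detail worth adding in the training loop: DistributedSampler derives its shuffle from the epoch number, so call set_epoch once per epoch or every epoch reuses the same ordering. A minimal sketch (nEpochs and the loop body are placeholders):

for epoch in range(nEpochs):
    # Re-seed the sampler so each epoch gets a fresh shuffle that is still
    # consistent across all processes.
    train_sampler.set_epoch(epoch)
    for batch in train_loader:
        ...  # forward / backward as usual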

Wrap the model for distributed training

# Wrap the initialized model for distributed training
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)  # synchronize BatchNorm statistics across GPUs
model = model.to(device)                                       # move the model to this process's GPU before wrapping
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[FLAGS.local_rank],
                                                  output_device=FLAGS.local_rank)
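When saving checkpoints, a common pattern is to write from a single process and to unwrap the DDP container first. A minimal sketch (the file name is a placeholder):

# Only rank 0 writes the checkpoint, so the processes do not race on the same file.
if dist.get_rank() == 0:
    # model.module is the original model inside the DistributedDataParallel wrapper.
    torch.save(model.module.state_dict(), 'checkpoint.pth')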

Launch training

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 train_distributed.py
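Each of the 4 launched processes runs the same train_distributed.py. A quick sanity check (a sketch; place it anywhere after init_process_group) is to print the rank and world size from every process:

# With the command above this should print ranks 0-3, each reporting world size 4.
print("rank {} / world size {}".format(dist.get_rank(), dist.get_world_size()))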

Reposted from blog.csdn.net/qq_36276587/article/details/113124122