A brief summary of PyTorch distributed training

This is only a quick overview; for more detail, see this article: https://blog.csdn.net/qq_36276587/article/details/123913384

A brief summary of single-machine, multi-GPU distributed training with PyTorch: the key APIs and the overall training workflow. Tested with PyTorch 1.2.0.

Initialize the GPU communication backend (NCCL)

import torch
import torch.distributed as dist

# FLAGS.local_rank is the GPU index assigned to this process by the launcher
torch.cuda.set_device(FLAGS.local_rank)
dist.init_process_group(backend='nccl')
device = torch.device("cuda", FLAGS.local_rank)  # bind this process to its own GPU
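
The FLAGS object above is not something PyTorch provides; a common pattern is to fill it with argparse, since torch.distributed.launch passes a --local_rank argument to every process it spawns. A minimal sketch (the name FLAGS just follows this article's convention):

import argparse

parser = argparse.ArgumentParser()
# torch.distributed.launch supplies --local_rank=<gpu index> to each spawned process
parser.add_argument('--local_rank', type=int, default=0)
FLAGS = parser.parse_args()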

Distributed data loading

# DistributedSampler splits the dataset across processes so each GPU sees a different shard
train_sampler = torch.utils.data.distributed.DistributedSampler(traindataset)
train_loader = torch.utils.data.DataLoader(
    traindataset, batch_size=batchSize,
    sampler=train_sampler,  # shuffling is handled by the sampler, so do not also pass shuffle=True
    num_workers=4, pin_memory=True,
    # alignCollate is this project's custom collate_fn for batching text-line images and labels;
    # replace it with your own collate_fn (or omit it) for other tasks
    collate_fn=alignCollate(imgH=imgH, imgW=imgW, keep_ratio=FLAGS.keep_ratio))
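
One detail that is easy to miss: for the sampler to shuffle differently on every epoch, its epoch counter must be set at the start of each epoch. A minimal sketch of the epoch loop, assuming a num_epochs variable defined elsewhere:

for epoch in range(num_epochs):
    # re-seed the sampler so each epoch shuffles the data differently across all processes
    train_sampler.set_epoch(epoch)
    for batch in train_loader:
        # ... move the batch to `device`, then run forward / backward / optimizer step ...
        pass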

Wrap the model for distributed training

# move the initialized model to this process's GPU, then wrap it for distributed training
model = model.to(device)
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)  # convert BatchNorm layers to synchronized BatchNorm
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[FLAGS.local_rank],
                                                  output_device=FLAGS.local_rank)
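
Each GPU runs its own copy of the script, so anything that should happen only once (logging, saving checkpoints) is usually guarded by the process rank. A sketch, with 'checkpoint.pth' as a placeholder path:

if dist.get_rank() == 0:
    # DDP keeps the original model under .module; saving model.module.state_dict()
    # makes the checkpoint loadable later without the DDP wrapper
    torch.save(model.module.state_dict(), 'checkpoint.pth')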

Start training

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 train_distributed.py
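
--nproc_per_node should match the number of GPUs made visible by CUDA_VISIBLE_DEVICES (4 in this example), since the launcher starts one training process per GPU. As a quick sanity check after init_process_group, each process can print its identity (a small illustrative snippet, not part of the original script):

print('rank %d / world size %d on GPU %d' % (dist.get_rank(), dist.get_world_size(), FLAGS.local_rank))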
