PyTorch - switching between multi-card GPU training and single-card GPU training

Some deep learning networks are set up to train on multiple GPUs in parallel by default, but for various reasons it is sometimes necessary to restrict training to a single card. I recently ran into this situation, and the steps are summarized below.

1. Multi-card training

1.1 Modify configuration file

(Screenshot of the configuration file)
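The screenshot is not reproduced here, but the key change is exposing the number of GPUs in the configuration. A minimal sketch of such a setting (the field name ngpu comes from the training code below; the exact file layout is an assumption, not the author's actual file):

ngpu = 2          # number of GPUs to train on; any value > 1 triggers the DataParallel branch below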

1.2 Modify the main training file

(Screenshot of the main training file)

Analysis of the code highlighted in that screenshot:

if torch.cuda.is_available() and ngpu > 1:         # when torch.cuda.is_available() is True and ngpu > 1
	model = nn.DataParallel(model, device_ids=list(range(ngpu)))

model = nn.DataParallel(model, device_ids=list(range(ngpu))):

This line wraps the model in a DataParallel wrapper so that it runs in parallel on multiple GPUs. DataParallel is a PyTorch module that splits each input batch across the specified GPUs, runs a replica of the model on each of them, and then gathers the outputs back on the primary device.

model: the neural network model to be parallelized.

device_ids=list(range(ngpu)): specifies which GPUs to use. list(range(ngpu)) selects the first ngpu devices, i.e. GPU IDs 0 through ngpu-1.
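For context, here is a minimal, self-contained sketch of how this branch typically fits into a training script. It is an illustration under assumptions (a toy nn.Linear model and random inputs), not the author's original code:

import torch
import torch.nn as nn

ngpu = 2                                         # in practice read from the configuration file
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2).to(device)              # stand-in for the real network, placed on the primary GPU

if torch.cuda.is_available() and ngpu > 1:
    # replicate the model onto GPUs 0 .. ngpu-1; each forward pass splits the batch across them
    model = nn.DataParallel(model, device_ids=list(range(ngpu)))

batch = torch.randn(8, 10).to(device)            # inputs go to the primary device
output = model(batch)                            # DataParallel scatters the batch and gathers the outputs on cuda:0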

1.3 Graphics card usage

(Screenshot of GPU usage during multi-card training)
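Besides looking at nvidia-smi, the per-card memory use can also be printed from inside the training process. This is a small illustrative helper, not part of the original post:

import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        # memory_allocated reports tensors currently held by this process on GPU i
        mem_mb = torch.cuda.memory_allocated(i) / 1024 ** 2
        print(f"cuda:{i} ({torch.cuda.get_device_name(i)}): {mem_mb:.1f} MiB allocated")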

2. Single card training

2.1 Modify configuration file

(Screenshot of the configuration file for single-card training)
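The screenshot is omitted here; the usual edit is simply to set the GPU count to one, and optionally to pin the process to a specific card. A hedged sketch (ngpu as above; the CUDA_VISIBLE_DEVICES line is an optional extra, not necessarily in the author's config):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # optional: expose only physical GPU 0; must run before CUDA is initialized

ngpu = 1                                   # with ngpu = 1 the `ngpu > 1` check fails, so nn.DataParallel is never applied

With this in place, the model stays a plain module on cuda:0 and only one card should show activity.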

2.2 Graphics card usage

After modification, start training and check the graphics card usage:

(Screenshot of GPU usage during single-card training)

3. Summary

The above covers the steps for switching between multi-card GPU training and single-card GPU training. I hope it helps, thank you!

Origin blog.csdn.net/qq_40280673/article/details/134730561