Some deep learning networks are trained on multiple GPUs in parallel by default. Sometimes, however, it is necessary to restrict training to a single GPU. I recently ran into this situation, and the steps are summarized as follows.
Table of contents
1. Multi-card training
1.1 Modify configuration file
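The original post shows the configuration file in a screenshot that is not reproduced here. A minimal sketch of what such a fragment might look like, assuming a hypothetical `ngpu` setting (the variable name matches the training code quoted below, but the rest of the config is illustrative):

```python
# Hypothetical config fragment: the key point for multi-GPU training
# is an ngpu value greater than 1.
ngpu = 2          # number of GPUs to train on
batch_size = 64   # global batch size; DataParallel splits it across GPUs
```

With `ngpu > 1`, the training script's DataParallel branch is taken.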
1.2 Modify the main training file
Analysis of the following code:
if torch.cuda.is_available() and ngpu > 1:  # when CUDA is available and ngpu > 1
model = nn.DataParallel(model, device_ids=list(range(ngpu)))
model = nn.DataParallel(model, device_ids=list(range(ngpu))):
This line of code creates a DataParallel wrapper for parallel processing of neural network models on multiple GPUs. DataParallel is a module in PyTorch that can split and send input data to different GPUs for processing, and then aggregate the results.
model: the neural network model to be parallelized.
device_ids=list(range(ngpu)): specifies which GPUs to use. Here it uses the first ngpu GPUs, i.e. device indices 0 through ngpu - 1.
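Putting the pieces together, a self-contained sketch of the conditional wrapping (the `nn.Linear` model is a toy stand-in for the post's actual network, which comes from its own training script):

```python
import torch
import torch.nn as nn

# Toy model standing in for the real network.
model = nn.Linear(10, 2)

ngpu = torch.cuda.device_count()  # GPUs visible to this process
if torch.cuda.is_available() and ngpu > 1:
    # Each forward pass splits the input batch across GPUs 0..ngpu-1
    # and gathers the outputs back on the first device.
    model = nn.DataParallel(model, device_ids=list(range(ngpu)))

out = model(torch.randn(4, 10))  # runs unwrapped on CPU or a single GPU
```

Because the wrap is conditional, the same script runs unchanged on a CPU-only or single-GPU machine.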
1.3 Graphics card usage
2. Single card training
2.1 Modify configuration file
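Again, the original post shows the modified configuration only as a screenshot. A common way to force single-GPU training, consistent with the code above, is to set `ngpu` to 1; restricting visible devices via the `CUDA_VISIBLE_DEVICES` environment variable is a complementary technique (an assumption here, not shown in the original):

```python
import os

# Make only physical GPU 0 visible to this process; must be set before
# the first CUDA call (ideally before importing torch).
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

ngpu = 1  # with ngpu == 1, the DataParallel branch above is skipped
```

Either measure alone is usually sufficient; together they guarantee the model is neither wrapped nor able to see other GPUs.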
2.2 Graphics card usage
After modification, start training and check the graphics card usage:
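The original checks usage in a screenshot (presumably of nvidia-smi output, which is not reproduced). A hedged alternative is to inspect GPU visibility from inside Python; on a correctly configured single-GPU run this should report exactly one device:

```python
import torch

# Programmatic check of GPU visibility and memory use.
visible = torch.cuda.device_count()
print(f"visible GPUs: {visible}")
for i in range(visible):
    print(torch.cuda.get_device_name(i),
          torch.cuda.memory_allocated(i) // 1024**2, "MiB allocated")
```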
3. Summary
The above is the full procedure for switching between multi-GPU and single-GPU training. I hope it helps you, thank you!