Background:
Recently I was fine-tuning a pre-trained large model with Hugging Face's transformers API on a machine with 8 GPUs. When I called trainer.train(), all 8 GPUs were used. Other people's models were already running on cards 4 to 7, and my training pushed GPU utilization close to 100%, which made their models respond very slowly. So I needed to avoid those four cards. How do I make my model train only on the specified cards?
Machine environment: NVIDIA A100-SXM
transformers version: 4.32.1
torch version: 2.0.1
Method 1【Failure】
Set GPU visibility from inside the script via the os module. The code is as follows:
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"
This sets the environment variable CUDA_VISIBLE_DEVICES to restrict which GPUs torch can see. However, after adding these lines, running the program throws the following error:
Traceback (most recent call last):
File "/data2/.env/lib/python3.9/site-packages/torch/cuda/__init__.py", line 260, in _lazy_init
queued_call()
File "/data2/.env/lib/python3.9/site-packages/torch/cuda/__init__.py", line 145, in _check_capability
capability = get_device_capability(d)
File "/data2/.env/lib/python3.9/site-packages/torch/cuda/__init__.py", line 381, in get_device_capability
prop = get_device_properties(device)
File "/data2/.env/lib/python3.9/site-packages/torch/cuda/__init__.py", line 399, in get_device_properties
return _get_device_properties(device) # type: ignore[name-defined]
RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/data2/.env/lib/python3.9/site-packages/torch/cuda/__init__.py", line 395, in get_device_properties
_lazy_init() # will define _get_device_properties
File "/data2/.env/lib/python3.9/site-packages/torch/cuda/__init__.py", line 264, in _lazy_init
raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch.
Method 2【Success】
Set GPU visibility through export in the shell before launching the script. The commands are as follows:
export TRANSFORMERS_OFFLINE=1
export HF_DATASETS_OFFLINE=1
export CUDA_VISIBLE_DEVICES=1,2
export CUDA_DEVICE_ORDER=PCI_BUS_ID
nohup python data_train.py > log/log.txt 2>&1 &
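The export approach can also be driven from Python by passing an explicit environment to a subprocess, which is handy when a launcher script starts the training job. A minimal sketch (the inline python -c snippet here just stands in for data_train.py, to show that the child process inherits the variables):

```python
import os
import subprocess
import sys

# Build the child's environment: copy the current one and restrict CUDA to
# cards 1 and 2, matching the export commands above.
env = dict(
    os.environ,
    CUDA_DEVICE_ORDER="PCI_BUS_ID",
    CUDA_VISIBLE_DEVICES="1,2",
)

# Launch a child process with that environment. In a real launcher this would
# be [sys.executable, "data_train.py"]; the -c snippet just echoes what the
# child sees.
proc = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
    env=env,
    capture_output=True,
    text=True,
)
print(proc.stdout.strip())  # → 1,2
```

Because the variables are in place before the child's Python interpreter even starts, torch inside the child can only ever see cards 1 and 2.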
Setting CUDA_VISIBLE_DEVICES=1,2 means only GPUs No. 1 and No. 2 are used. Note that inside the process the visible cards are renumbered from zero, so these two cards appear to torch as cuda:0 and cuda:1.
Use ps -ef | grep python to find the process ID of my program. The process ID is 27378.
Then use nvidia-smi to check GPU usage:
The nvidia-smi output clearly shows that process 27378 is using cards No. 1 and No. 2.
Puzzled:
1. Why does setting the environment variables via os.environ in Python fail to achieve this, while exporting them from the Linux shell works?
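A likely explanation: PyTorch initializes CUDA lazily and reads CUDA_VISIBLE_DEVICES only once, at the first CUDA call, then caches the result. If torch has already been imported and CUDA has already been touched (the `<stdin>` frame in the traceback suggests an interactive session), assigning to os.environ afterwards comes too late, and queued lazy-init calls may reference device indices that no longer exist, hence the INTERNAL ASSERT. With export, the variables exist before the interpreter starts, so the first CUDA call already sees them; os.environ can work too, but only if the assignments are placed at the very top of data_train.py, before import torch or import transformers. The sketch below is a toy model of this one-time-snapshot behavior, not real torch code:

```python
import os

# Start from a clean slate for the demo.
os.environ.pop("CUDA_VISIBLE_DEVICES", None)

class LazyCuda:
    """Toy model (hypothetical, not real torch) of PyTorch's lazy CUDA
    initialization: CUDA_VISIBLE_DEVICES is read once, on first use, and
    the snapshot is cached for the life of the process."""

    def __init__(self):
        self._visible = None

    def device_count(self):
        if self._visible is None:  # one-time snapshot, like torch's _lazy_init()
            self._visible = os.environ.get(
                "CUDA_VISIBLE_DEVICES", "0,1,2,3,4,5,6,7"
            )
        return len(self._visible.split(","))

cuda = LazyCuda()
n_before = cuda.device_count()                   # first call: snapshot of all 8 cards
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"       # set AFTER initialization
n_after = cuda.device_count()                    # still 8: the change is ignored
print(n_before, n_after)  # → 8 8
```

The same toy model shows the fix: set the variable before the first call and device_count() reports 2. In the real script, that means the two os.environ lines from Method 1 must run before any import that initializes CUDA.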