Answer to the question: it is best to use a complete, contiguous set of GPUs (for example 0, 1, 2, 3). Otherwise a process's rank may not match the physical GPU id it was loaded on.
Problem solved: set os.environ["CUDA_VISIBLE_DEVICES"] = "0, 3" to select the GPUs, then use all of the visible GPUs when there is more than one.
import os
# Set this before CUDA is initialized; the visible GPUs are then renumbered starting from 0.
os.environ["CUDA_VISIBLE_DEVICES"] = "0, 3"
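To illustrate why the rank and the physical GPU id can differ, here is a minimal sketch of the renumbering. The helpers visible_gpus and physical_gpu_for_rank are hypothetical names made up for this example and are not part of CUDA or any framework. The sketch assumes a plain comma-separated list of integer ids and ignores the other forms the variable can take.

```python
import os


def visible_gpus(env_value=None):
    """Return the physical GPU ids listed in CUDA_VISIBLE_DEVICES, in order."""
    if env_value is None:
        env_value = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    # Split on commas and drop surrounding whitespace and empty entries.
    return [part.strip() for part in env_value.split(",") if part.strip()]


def physical_gpu_for_rank(local_rank, env_value=None):
    """Map a local rank (the index the process sees) to a physical GPU id."""
    gpus = visible_gpus(env_value)
    if not 0 <= local_rank < len(gpus):
        raise ValueError(
            f"rank {local_rank} has no visible GPU (visible: {gpus})"
        )
    return gpus[local_rank]


# With "0, 3", the process sees two devices numbered 0 and 1.
mapping = {rank: physical_gpu_for_rank(rank, "0, 3") for rank in range(2)}
print(mapping)  # {0: '0', 1: '3'}
```

So with GPUs 0 and 3 selected, rank 1 runs on physical GPU 3, not on GPU 1. With a contiguous set starting at 0, the two numbers coincide.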