Running a model on the GPU with PyTorch: method, precautions, and solutions to common problems

1. When training and predicting with a deep learning model in PyTorch, the data set is often large and the model complex; if training runs directly on the CPU, computation is very slow, so using the GPU for training and prediction is necessary and greatly improves experimental efficiency. If you have not yet configured the operating environment, refer to the blogger's articles below.

1. Click to open the article "Building Anaconda+Cudnn+Cuda+Pytorch+Pycharm Tools and Configuration Environment Complete and Simple Version Based on Windows for Deep Learning"
2. Click to open the article "View local or remote server GPU and usage methods based on Pytorch"

2. The specific method has two parts: moving the model and moving the data set.

  • First move the model to the CUDA device, that is, the GPU. Note: a large model can contain multiple sub-models; the sub-models do not need to be moved to the GPU separately.
model = Net()  # example model
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)  # move the example model to the CUDA device

or

model = Net()  # example model
# torch.cuda.current_device() returns an integer device index, which .to() also accepts;
# args.cuda is assumed to be a command-line flag (see the sketch below)
device = torch.cuda.current_device() if args.cuda else torch.device('cpu')
model.to(device)  # move the example model to the CUDA device
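The args.cuda flag in the second variant is assumed to come from a command-line parser; here is a minimal sketch of one common way to define it (the --no-cuda flag name is an assumption for illustration, not part of the original code):

import argparse
import torch

parser = argparse.ArgumentParser()
parser.add_argument("--no-cuda", action="store_true", help="disable the GPU even if one is available")
args = parser.parse_args()
# args.cuda is True only when the user allows CUDA and a GPU is actually present
args.cuda = not args.no_cuda and torch.cuda.is_available()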
  • Then move the data set (the training set, the test set, and their label tensors) to the CUDA device, that is, the GPU; this is done with the tensor.cuda() form (or tensor.to(device)).
drug_embeddings = drug_embeddings.cuda()
protein_embeddings = protein_embeddings.cuda()
effectives = effectives.cuda()

or

drug_embeddings = drug_embeddings.to(device)
protein_embeddings = protein_embeddings.to(device)
effectives = effectives.to(device)
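Putting the two parts together, here is a minimal end-to-end sketch; Net, the tensor names, and the shapes are placeholders for illustration, not the original blog's model:

import torch
import torch.nn as nn

class Net(nn.Module):  # placeholder model
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 2)
    def forward(self, x):
        return self.fc(x)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = Net().to(device)                 # 1) move the model
inputs = torch.randn(8, 16).to(device)   # 2) move the data
labels = torch.randint(0, 2, (8,)).to(device)
outputs = model(inputs)                  # model and data now live on the same device
loss = nn.functional.cross_entropy(outputs, labels)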

3. Problems and solutions

  • Problem 1: the input type (torch.FloatTensor) and the weight type (torch.cuda.FloatTensor) do not match
Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

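A minimal way to reproduce this error for illustration (the layer and shapes are assumptions): a convolution whose weights live on the GPU is fed a tensor that was left on the CPU.

import torch
import torch.nn as nn

if torch.cuda.is_available():
    conv = nn.Conv2d(3, 8, kernel_size=3).cuda()  # weights become torch.cuda.FloatTensor
    x = torch.randn(1, 3, 32, 32)                 # input is still torch.FloatTensor (CPU)
    conv(x)  # raises the RuntimeError quoted above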
  • Problem 2: tensors that must interact during a computation are not on the same device
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

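This error is just as easy to reproduce; a minimal sketch (the tensor names are illustrative):

import torch

if torch.cuda.is_available():
    a = torch.randn(3).cuda()  # lives on cuda:0
    b = torch.randn(3)         # still on the CPU
    a + b  # raises the RuntimeError quoted above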
  • Solution to Problems 1 and 2 (the same for both): first use the error traceback to find which line of code the offending data appears in, then fix it there. Taking a tensor XR as an example, there are two cases. In the first case, XR is constructed with torch.FloatTensor(XR); change this to torch.cuda.FloatTensor(XR). In the second case, XR is not being constructed but is being passed to another sub-model; then simply append .cuda() to it, i.e. XR.cuda(), as shown in the sketch below.
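A minimal sketch of the two cases, using a placeholder tensor XR with illustrative values; .to(device) is shown as the portable equivalent that also works on CPU-only machines:

import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Case 1: the tensor is built with torch.FloatTensor(...) on the CPU
XR = torch.FloatTensor([[0.1, 0.2], [0.3, 0.4]])
XR = XR.to(device)  # same effect as torch.cuda.FloatTensor(...) when CUDA is available

# Case 2: an existing tensor is about to be handed to a sub-model on the GPU
XR = XR.cuda() if torch.cuda.is_available() else XR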

  • Problem 3: CUDA runs out of memory while the code is running

CUDA out of memory. Tried to allocate 490.00 MiB (GPU 0; 2.00 GiB total capacity; 954.66 MiB already allocated; 62.10 MiB free; 978.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

  • Solution to Problem 3: there are two remedies for insufficient GPU memory. The first is generally to reduce the batch size of the data set, i.e. lower batch_size; for example, the original batch_size=256 can be reduced to batch_size=16. The second is to run the code on a server, that is, on a machine with a larger GPU. If the same problem still occurs, it is best to combine the two methods; see the sketch below.
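A minimal sketch of the first remedy, reducing batch_size in a DataLoader (the data set here is a random placeholder):

import torch
from torch.utils.data import DataLoader, TensorDataset

features = torch.randn(1024, 128)        # placeholder data
labels = torch.randint(0, 2, (1024,))
dataset = TensorDataset(features, labels)

# batch_size reduced from 256 to 16 to lower peak GPU memory per step
loader = DataLoader(dataset, batch_size=16, shuffle=True)

In addition, torch.cuda.empty_cache() can release cached but unused memory blocks between runs, although it does not free tensors that are still referenced.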
  • New problems will continue to be added here in the future, so stay tuned!


Origin: blog.csdn.net/rothschild666/article/details/127446694