How to Deal with PyTorch GPU Out-of-Memory (OOM) Errors

Without modifying the network structure, you can try the following:

1. Agree with @Jiaming: use in-place operations as much as possible; for example, ReLU can be used with inplace=True. A simple way to apply this to a whole model:

 
def inplace_relu(m):
    classname = m.__class__.__name__
    if classname.find('ReLU') != -1:
        m.inplace = True

# Apply to every ReLU module in the model:
# model.apply(inplace_relu)

2. Going further, in architectures such as ResNet and DenseNet, the batchnorm + ReLU blocks can be made in-place and then recomputed during the backward pass, using PyTorch's new checkpoint feature (torch.utils.checkpoint); a sketch follows below. Training becomes slower because the results after BN need to be recomputed.
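A minimal sketch of torch.utils.checkpoint (the Net module here is a placeholder, not the exact ResNet/DenseNet code): activations inside the checkpointed block are freed after the forward pass and recomputed during backward, trading compute for memory.

import torch
from torch.utils.checkpoint import checkpoint

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.block = torch.nn.Sequential(
            torch.nn.Conv2d(3, 64, 3, padding=1),
            torch.nn.BatchNorm2d(64),
            torch.nn.ReLU(inplace=True),
        )
        self.head = torch.nn.Conv2d(64, 10, 1)

    def forward(self, x):
        # Intermediate activations of self.block are not stored;
        # the block is re-run during backward to regenerate them.
        # Caveat: BN running stats are updated on both the original
        # and the recomputed forward pass.
        x = checkpoint(self.block, x)
        return self.head(x)

net = Net().cuda()
x = torch.randn(8, 3, 32, 32, device='cuda', requires_grad=True)
net(x).sum().backward()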

 

3. Delete the loss at the end of each iteration (del loss). This saves only a little GPU memory, but it is better than nothing. See the following issue:

Tensor to Variable and memory freeing best practices
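A minimal sketch of this in a training loop (model, criterion, optimizer, and loader are placeholder names):

running_loss = 0.0
for images, labels in loader:
    optimizer.zero_grad()
    outputs = model(images.cuda())
    loss = criterion(outputs, labels.cuda())
    loss.backward()
    optimizer.step()
    # Log a Python float, not the tensor, so the graph is not kept alive.
    running_loss += loss.item()
    # Drop graph-holding references before the next iteration.
    del loss, outputs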

 

4. Use float16 mixed-precision computation. I have used @NVIDIA's apex; it works well and can save nearly 50% of GPU memory. But be careful with numerically unsafe operations such as mean and sum, which can overflow in fp16.

NVIDIA/apex
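A minimal sketch of the apex amp API (assuming apex is installed; model, optimizer, criterion, and loader are placeholders). Newer PyTorch versions offer the same idea natively via torch.cuda.amp.

from apex import amp

# O1 runs most ops in fp16 while keeping numerically sensitive ones in fp32.
model, optimizer = amp.initialize(model.cuda(), optimizer, opt_level='O1')

for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images.cuda()), labels.cuda())
    # Scale the loss so small fp16 gradients do not underflow.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()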

 

Supplement: Recently I also added fp16 training to my CVPR'19 GAN model, which reduced the GPU memory requirement from 15 GB to about 10 GB, so it can be trained on most common graphics cards such as the 1080Ti. Stars are welcome: https://github.com/NVlabs/DG-Net

5. For forward passes that do not require backpropagation, such as validation, use torch.no_grad. Note that model.eval() is not equivalent to torch.no_grad(); please see the following discussion:

'model.eval()' vs 'with torch.no_grad()'
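A minimal validation sketch combining the two (placeholder names): model.eval() switches batchnorm/dropout to inference behavior, while torch.no_grad() stops autograd from building the graph, so activations are freed immediately.

model.eval()                      # inference behavior for BN/dropout
correct = 0
with torch.no_grad():             # no graph, so activations are not kept
    for images, labels in val_loader:
        outputs = model(images.cuda())
        correct += (outputs.argmax(dim=1) == labels.cuda()).sum().item()
accuracy = correct / len(val_loader.dataset)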

 

6. torch.cuda.empty_cache(): this is an advanced version of del. After calling it, nvidia-smi shows an obvious drop in GPU memory, because PyTorch returns its cached, unused blocks to the driver. However, the peak memory usage during training seems to stay the same. You can try it:

How can we release GPU memory cache?
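A sketch of the pattern (hypothetical tensor names): del removes the Python references, and empty_cache() hands the now-unused cached blocks back to the driver, which is what makes nvidia-smi change.

import torch

x = torch.randn(4096, 4096, device='cuda')
y = x @ x
print(torch.cuda.memory_allocated())  # bytes held by live tensors

del x, y                  # tensors freed, but kept in PyTorch's cache
torch.cuda.empty_cache()  # return cached blocks to the GPU driver
print(torch.cuda.memory_allocated())  # now (near) zero; visible in nvidia-smi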

 

In addition, there are tricks that save memory but will affect accuracy, for example:

Split a batchsize of 64 into two batches of 32; after two forward passes, do a single optimizer update (see the sketch below). But this affects batchnorm and other layers whose statistics depend on the batch size.
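A minimal gradient-accumulation sketch (placeholder names). Note that the memory-saving variant calls backward after each micro-batch, which frees that micro-batch's activations before the next forward; summing both losses and calling backward only once would keep both graphs alive.

accum_steps = 2          # two micro-batches of 32 ~ one batch of 64
optimizer.zero_grad()
for step, (images, labels) in enumerate(loader):
    loss = criterion(model(images.cuda()), labels.cuda())
    # Scale so the accumulated gradient matches a single batch of 64.
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
# Caveat: batchnorm still normalizes over 32 samples, not 64.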

 

Related links: a write-up (in English) on methods to improve PyTorch efficiency, including data prefetching, etc.

Optimizing PyTorch training code

 

Finally, thanks for reading~ Feel free to follow, share, and like~ You can also check out some of my other articles:

Zheng Zhedong: [New UAV Dataset] From pedestrian re-identification to UAV target positioning

Zheng Zhedong: Use Uncertainty to correct pseudo-labels in Domain Adaptation

Zheng Zhedong: Using CNN to classify 100,000 categories of images

Zheng Zhedong: Interpretation of NVIDIA/University of Technology Sydney/Australian National University's new work: Using GAN to generate high-quality pedestrian images to assist pedestrian re-identification

Source: blog.csdn.net/Layumi1993/article/details/106218563