"Deep Learning in Practice" - Bug: how to fix the "ran out of memory" error


Problem description

TensorFlow reports the following error: Allocator (GPU_0_bfc) ran out of memory

Detailed error output:

Allocator (GPU_0_bfc) ran out of memory trying to allocate 200.00MiB (rounded to 209715200).  Current allocation summary follows.

Resource exhausted: OOM when allocating tensor with shape[51200,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc.

tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[51200,1024] and type


Cause analysis:

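Check the GPU usage at the moment the error occurs, as shown in the figure below:
[Figure: GPU usage at the time of the error]
As the figure shows, the main cause is insufficient GPU memory.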
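
As a quick sanity check, the 200 MiB request in the log is consistent with the tensor shape it reports. The sketch below assumes that "type float" in the log means 4-byte float32; the helper name `tensor_bytes` is made up for illustration.

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor; 4 bytes per element assumes float32."""
    return prod(shape) * bytes_per_element


size = tensor_bytes((51200, 1024))
print(size)          # 209715200, the "rounded to" value in the log
print(size / 2**20)  # 200.0 (MiB)
```

A single 200 MiB request is modest on its own, which suggests that most of the GPU memory was already in use when this allocation was attempted.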


Solutions:

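  1. Limit the maximum amount of GPU memory TensorFlow may use
import tensorflow as tf

# Cap TensorFlow at a fixed amount of GPU memory. The program cannot go past
# this limit; an allocation that would exceed it fails with an error.
# Run this before the GPU is initialized, i.e. before any op uses it.
physical_gpus = tf.config.list_physical_devices("GPU")
if physical_gpus:  # skip on machines without a visible GPU
    tf.config.experimental.set_virtual_device_configuration(
        physical_gpus[0],  # restrict GPU 0
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5000)],  # limit in MB
    )
logical_gpus = tf.config.list_logical_devices("GPU")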
  2. Use a smaller batch size

  3. Buy a GPU with more memory

  4. Change the code so that the data-handling part runs on the CPU
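
The smaller batch size and the CPU-side data handling from the list above can be combined in one input pipeline. What follows is only a minimal sketch, not the original author's code: the random arrays, the `BATCH_SIZE` value, and the use of `tf.data` are all assumptions made for illustration. Building the dataset inside a `tf.device("/CPU:0")` scope is meant to keep the full arrays in host memory, so that only one batch at a time needs to reach the GPU.

```python
import numpy as np
import tensorflow as tf

BATCH_SIZE = 32  # assumed value; lower it until the OOM error goes away

# Hypothetical toy data standing in for the real training set.
features = np.random.rand(1000, 1024).astype("float32")
labels = np.random.randint(0, 10, size=(1000,)).astype("int64")

# Create the input pipeline under a CPU device scope so the whole dataset
# is held in host memory instead of being copied to the GPU up front.
with tf.device("/CPU:0"):
    dataset = (
        tf.data.Dataset.from_tensor_slices((features, labels))
        .shuffle(1000)
        .batch(BATCH_SIZE)
        .prefetch(tf.data.AUTOTUNE)
    )

# Each iteration yields a single batch of BATCH_SIZE examples.
for batch_features, batch_labels in dataset.take(1):
    print(batch_features.shape)  # (32, 1024)
```

The resulting `dataset` can be passed to `model.fit(dataset)` as usual; memory per step then scales with `BATCH_SIZE` rather than with the size of the whole dataset.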


Reposted from blog.csdn.net/womengdoushizhongguo/article/details/128276564