Problem description
TensorFlow reports the following error: Allocator (GPU_0_bfc) ran out of memory
Full error message:
Allocator (GPU_0_bfc) ran out of memory trying to allocate 200.00MiB (rounded to 209715200). Current allocation summary follows.
Resource exhausted: OOM when allocating tensor with shape[51200,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc.
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[51200,1024] and type
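Cause analysis:
Checking the GPU usage at this point gives the following (screenshot):
As the figure shows, the main cause is that the GPU does not have enough free memory.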
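As a sanity check, the 200 MiB request in the log follows directly from the tensor shape. This is a minimal sketch, assuming "type float" in the message means 32-bit float (4 bytes per element); the helper `tensor_nbytes` is hypothetical, written only for this illustration.

```python
# Memory needed by the tensor named in the OOM message: shape [51200, 1024], float32.
def tensor_nbytes(shape, bytes_per_element=4):
    """Return the size of a dense tensor in bytes (hypothetical helper)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element

nbytes = tensor_nbytes([51200, 1024])  # assumes float32, i.e. 4 bytes per element
mib = nbytes / (1024 ** 2)
print(nbytes, mib)  # 209715200 200.0
```

The result matches "200.00MiB (rounded to 209715200)" in the log: a single allocation of that size could not be served from the memory still free on GPU 0.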
Solutions:
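- Limit the maximum amount of GPU memory TensorFlow may use
import tensorflow as tf
# Cap GPU 0 at a fixed amount of memory: TensorFlow will not allocate beyond
# this limit, and raises an OOM error instead if the model needs more.
physical_gpus = tf.config.list_physical_devices("GPU")
if physical_gpus:
    try:
        tf.config.experimental.set_virtual_device_configuration(
            physical_gpus[0],  # apply the limit to GPU 0
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5000)]  # limit in MB
        )
        logical_gpus = tf.config.list_logical_devices("GPU")
    except RuntimeError as e:
        # Virtual devices must be configured before the GPUs are initialized
        print(e)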
- Use a smaller batch size.
- Buy a GPU with more memory.
- Change the code so that the data-processing part runs on the CPU.
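The smaller-batch-size option can be automated by halving the batch size whenever an out-of-memory error occurs. The following is only a sketch under stated assumptions: `fit_with_batch_backoff` and `train_fn` are hypothetical names, and the OOM exception type is a parameter (with TensorFlow you would pass `oom_errors=(tf.errors.ResourceExhaustedError,)`; the default `MemoryError` is just a stand-in so the sketch runs without a GPU).

```python
def fit_with_batch_backoff(train_fn, batch_size, min_batch_size=1, oom_errors=(MemoryError,)):
    """Call train_fn(batch_size); on an OOM-type error, halve the batch size and retry.

    train_fn is a hypothetical callable that builds/trains the model with the given
    batch size. For TensorFlow, pass oom_errors=(tf.errors.ResourceExhaustedError,).
    """
    while batch_size >= min_batch_size:
        try:
            return batch_size, train_fn(batch_size)
        except oom_errors:
            batch_size //= 2  # smaller batch -> smaller activation tensors
    raise RuntimeError("out of memory even at the minimum batch size")


# Usage with a fake training function that "runs out of memory" above batch size 16.
def fake_train(bs):
    if bs > 16:
        raise MemoryError("simulated OOM")
    return f"trained with batch size {bs}"

print(fit_with_batch_backoff(fake_train, 128))  # (16, 'trained with batch size 16')
```

Retrying inside the same process is convenient, but GPU memory may not be fully released after an OOM, so restarting the script with the smaller value found this way can be the more reliable choice.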