[TensorFlow Notes] Using GPUs

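By default, TensorFlow claims nearly all of the memory on every GPU visible to the process, so even a program with a small memory footprint ties up all of the server's GPU resources. Here are a few tips for running TensorFlow on GPUs: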

1. Before running anything, check how the GPUs are currently being used:

$ nvidia-smi # show current GPU usage

or

$ nvidia-smi -l # keep refreshing the GPU usage report
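To pick an idle GPU from a script rather than by eye, nvidia-smi also has a query mode that prints plain CSV. The sketch below is only an illustration under a few assumptions: the helper names (parse_gpu_memory, pick_idle_gpu, query_idle_gpu) and the sample numbers are made up for this example, and "idle" is taken to mean "least memory currently in use".

```python
import subprocess

# nvidia-smi query mode: one CSV line per GPU, values in MiB, no header or units.
QUERY = ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_memory(csv_text):
    """Turn 'index, used, total' CSV lines into a list of (index, used, total) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        gpus.append((index, used, total))
    return gpus


def pick_idle_gpu(gpus):
    """Return the index of the GPU with the least memory in use."""
    return min(gpus, key=lambda gpu: gpu[1])[0]


def query_idle_gpu():
    """Run nvidia-smi and return the least-used GPU index (needs an NVIDIA driver)."""
    output = subprocess.check_output(QUERY, universal_newlines=True)
    return pick_idle_gpu(parse_gpu_memory(output))


# Hypothetical sample output for a four-GPU server (not real measurements).
SAMPLE = """0, 11200, 11441
1, 23, 11441
2, 8045, 11441
3, 512, 11441
"""

print(pick_idle_gpu(parse_gpu_memory(SAMPLE)))
```

On the server itself, query_idle_gpu() would replace the sample text. Low memory use does not guarantee that nobody is about to start a job, so it is still worth checking with your labmates.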

2. The lab server currently has four GPUs, numbered 0, 1, 2 and 3. Once you have found an idle one, restrict TensorFlow to it with the CUDA_VISIBLE_DEVICES environment variable:

Format of the environment variable:
CUDA_VISIBLE_DEVICES=1 Only device 1 will be seen
CUDA_VISIBLE_DEVICES=0,1 Devices 0 and 1 will be visible
CUDA_VISIBLE_DEVICES="0,1" Same as above, quotation marks are optional
CUDA_VISIBLE_DEVICES=0,2,3 Devices 0, 2, 3 will be visible; device 1 is masked
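One detail that is easy to trip over: inside the process, the visible devices are renumbered from 0 in the order they are listed. With CUDA_VISIBLE_DEVICES=0,2,3, what TensorFlow calls GPU 1 is physical GPU 2. The snippet below only mimics that renumbering in plain Python for integer IDs. The helper name logical_to_physical is hypothetical, and it assumes CUDA's device numbering matches the IDs shown by nvidia-smi (setting CUDA_DEVICE_ORDER=PCI_BUS_ID makes the two agree).

```python
def logical_to_physical(mask):
    """Map in-process device numbers to the IDs listed in a CUDA_VISIBLE_DEVICES value."""
    physical_ids = [int(part) for part in mask.split(",") if part.strip()]
    return dict(enumerate(physical_ids))


print(logical_to_physical("0,2,3"))  # in-process GPU 1 is physical GPU 2
print(logical_to_physical("1"))      # the only visible GPU becomes GPU 0
```

This is why code that runs under CUDA_VISIBLE_DEVICES=1 should still refer to '/gpu:0'.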

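Set the variable in your shell before running the program:

$ export CUDA_VISIBLE_DEVICES=0 # assuming GPU 0 is idle

Since it is easy to forget the export after opening a new terminal, the safer habit is to specify the GPU every time you launch TensorFlow:

$ CUDA_VISIBLE_DEVICES=0 python mnist.py # assuming GPU 0 is idle; mnist.py is the program you want to run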
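The same restriction can also be applied from inside the script, provided the variable is set before TensorFlow initializes its GPUs. Setting it before the import is the safe choice. A minimal sketch, where pin_gpu is a hypothetical helper name:

```python
import os


def pin_gpu(gpu_ids):
    """Expose only the given GPU IDs to this process; call before importing tensorflow."""
    mask = ",".join(str(gpu_id) for gpu_id in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = mask
    return mask


pin_gpu([0])  # assuming GPU 0 is idle
# import tensorflow as tf  # import TensorFlow only after the variable is set
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

The command-line form remains the more visible option on a shared server, because the GPU choice does not end up hard-coded in the script.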

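3. TensorFlow will now run only on the specified GPU, but it still claims all of that GPU's memory. This does no harm if you are not sharing the GPU with anyone else. Otherwise, here are two ways to limit how much GPU memory is used:

(1) Configure the session as follows. TensorFlow then starts with a small allocation and grows it only as the program needs more memory (memory that has been allocated is not released).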

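import tensorflow as tf
# TensorFlow 1.x API; in 2.x the same classes are available as tf.compat.v1.ConfigProto and tf.compat.v1.Session.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
session = tf.Session(config=config)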

(2) Configure the session as follows. This caps the process at the given fraction of each visible GPU's memory.

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # use at most 40% of the GPU memory
session = tf.Session(config=config)
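Because the limit is a fraction rather than an absolute size, a fixed budget in MiB corresponds to a different value on different cards. The arithmetic is sketched below. fraction_for_budget is a hypothetical helper, and the 11441 MiB total is an assumed figure for illustration only (read the real total from nvidia-smi).

```python
def fraction_for_budget(budget_mib, total_mib):
    """Return the memory fraction that corresponds to budget_mib on a GPU with total_mib."""
    if not 0 < budget_mib <= total_mib:
        raise ValueError("budget must be positive and no larger than the total memory")
    return budget_mib / total_mib


TOTAL_MIB = 11441  # assumed total memory of one GPU, for illustration only
fraction = fraction_for_budget(4096, TOTAL_MIB)
print(round(fraction, 3))
# The result would then be assigned to
# config.gpu_options.per_process_gpu_memory_fraction
```

A fixed fraction makes the memory footprint predictable for other users, whereas allow_growth suits programs whose memory needs are not known in advance.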

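Note:
- Writing with tf.device('/gpu:0'): in TensorFlow code only places the computation on GPU 0. By default the process still claims memory on all visible GPUs.

4. Checking which devices are used
Pass tf.ConfigProto(log_device_placement=True) when creating the session: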

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

Running it produces output like the following:

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/device:GPU:0
a: /job:localhost/replica:0/task:0/device:GPU:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22.  28.]
 [ 49.  64.]]
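For larger graphs the placement log gets long, and it can help to scan it for ops that did not land on a GPU. The sketch below parses lines in the format shown above. parse_placements is a hypothetical helper, and it assumes every placement line looks like "name: /job:.../device:TYPE:N". Other TensorFlow versions may print a slightly different format.

```python
import re

# Matches lines such as "MatMul: /job:localhost/replica:0/task:0/device:GPU:0".
PLACEMENT = re.compile(
    r"^(?P<op>[^\s:]+): /job:\S*/device:(?P<kind>\w+):(?P<index>\d+)$")


def parse_placements(log_text):
    """Return {op_name: (device_type, device_index)} for each placement line."""
    placements = {}
    for line in log_text.splitlines():
        match = PLACEMENT.match(line.strip())
        if match:
            placements[match.group("op")] = (match.group("kind"),
                                             int(match.group("index")))
    return placements


# The log from the example above, pasted in as a string.
LOG = """Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/device:GPU:0
a: /job:localhost/replica:0/task:0/device:GPU:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
"""

placements = parse_placements(LOG)
# Ops that were placed on anything other than a GPU.
not_on_gpu = [op for op, (kind, _) in placements.items() if kind != "GPU"]
print(placements)
print(not_on_gpu)
```

An empty second list means every op in the log ran on a GPU.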
