Building TensorFlow in a GPU environment and dividing a single GPU into logical partitions

1. Build the environment

1.1. View the GPU version

Before installing the GPU version of TensorFlow, first check your graphics card.
Command: nvidia-smi
(The output shows the driver version and the highest CUDA version the driver supports.)

Alternatively, right-click the NVIDIA icon in the system tray (lower right corner of the desktop) and open the NVIDIA Control Panel:

Click System Information in the lower left corner and check the version of NVCUDA64.dll under Components.

1.2. Create a new virtual environment

Create a virtual environment named myTFGPU with Python 3.7. Adjust both to your own setup; you can also consult the TensorFlow GPU version tables to pick a matching version.
conda create -n myTFGPU python=3.7
Then enter the virtual environment: activate myTFGPU
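To confirm the environment is active, an optional quick check is to list your conda environments (the active one is marked with *) and print the Python version:

conda env list
python --version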

1.3. Install tensorflow with GPU version

Installing with the Douban mirror specified is much faster:

pip install tensorflow-gpu==2.9.0 -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com 
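Once the install finishes, an optional sanity check is to print the installed version from the command line:

python -c "import tensorflow as tf; print(tf.__version__)"
# 2.9.0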

After installation, let's check which devices TensorFlow can see:

import tensorflow as tf
tf.test.gpu_device_name()
#Created device /device:GPU:0 with 1318 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1
'/device:GPU:0'

You can see the GPU model, memory size, and compute capability. If these display correctly, the installation succeeded. Now let's list the physical GPUs and CPUs on this machine:

# List the physical GPUs and CPUs
gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
cpus = tf.config.experimental.list_physical_devices(device_type='CPU')
print(gpus, cpus)
#[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')] [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
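A related setting worth knowing at this point (optional; a minimal sketch): memory growth, which stops TensorFlow from reserving all GPU memory up front. It must be set before the GPU is first used:

# Allocate GPU memory on demand instead of reserving it all at once
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)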

1.4. CUDA and cuDNN version correspondence

cuDNN and CUDA were already installed on this machine, so that step is omitted here. The installation command:

conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0

To avoid version mismatches between the two, you can list the available cuDNN builds and the CUDA version each one matches with the command: conda search -f cudnn

(tensorflowGPU) C:\Users\Tony>conda search -f cudnn
Fetching package metadata .........
cudnn                        7.1.4                 cuda9.0_0  defaults
                             7.1.4                 cuda8.0_0  defaults
                             7.3.1                 cuda9.0_0  defaults
                             7.3.1                cuda10.0_0  defaults
                             7.6.0                cuda10.1_0  defaults
                             7.6.0                cuda10.0_0  defaults
                             7.6.0                 cuda9.0_0  defaults
                             7.6.4                cuda10.1_0  defaults
                             7.6.4                 cuda9.0_0  defaults
                             7.6.4                cuda10.0_0  defaults
                             7.6.5                 cuda9.0_0  defaults
                             7.6.5                cuda10.1_0  defaults
                             7.6.5                cuda10.0_0  defaults
                             7.6.5                cuda10.2_0  defaults
                             7.6.5                 cuda9.2_0  defaults
                             8.2.1                cuda11.3_0  defaults
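The same check works for the toolkit itself, so you can pick a matching cudatoolkit/cudnn pair:

conda search -f cudatoolkit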

1.5. CUDA and cuDNN download

CUDA and cuDNN are fairly large, so downloading the installers directly is recommended:
CUDA Toolkit: https://developer.nvidia.com/cuda-toolkit-archive
cuDNN: https://developer.nvidia.com/rdp/cudnn-archive

After installation, copy the following files to the corresponding locations:

Copy <installpath>\cuda\bin\cudnn*.dll to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin.
Copy <installpath>\cuda\include\cudnn*.h to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\include.
Copy <installpath>\cuda\lib\x64\cudnn*.lib to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\lib\x64.
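Once the files are in place, a quick optional check verifies that TensorFlow was built against CUDA and can actually see the GPU:

import tensorflow as tf
print(tf.test.is_built_with_cuda())            # True for a CUDA-enabled build
print(tf.config.list_physical_devices('GPU'))  # non-empty if CUDA/cuDNN load correctly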

If you encounter problems during installation, please refer to: https://blog.csdn.net/weixin_41896770/article/details/127444808

2. GPU logical partition

Just as a hard disk in Windows can be divided into multiple logical drives (C, D, E, F, and so on), a GPU can be configured in the same way.

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
#[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

For example, my computer has only one GPU, and sometimes I want to run multi-GPU tests without buying extra hardware. Fortunately, frameworks such as TensorFlow support distributed execution and can also divide a single GPU into several logical devices.

# Logical partition
tf.config.experimental.set_virtual_device_configuration(
     gpus[0],
     [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=512),
      tf.config.experimental.VirtualDeviceConfiguration(memory_limit=512),
      tf.config.experimental.VirtualDeviceConfiguration(memory_limit=512)])
logical_gpus = tf.config.experimental.list_logical_devices('GPU')
# Check whether the GPU was successfully split into logical partitions
print(logical_gpus)
#[LogicalDevice(name='/device:GPU:0', device_type='GPU'), LogicalDevice(name='/device:GPU:1', device_type='GPU'), LogicalDevice(name='/device:GPU:2', device_type='GPU')]

From the printout we can see that this machine's single physical GPU has been divided into three logical GPUs, named /device:GPU:0, /device:GPU:1, and /device:GPU:2.
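Newer TF 2.x releases expose the same functionality under non-experimental names (both appear in the tf.config listing below). A minimal sketch of the stable spelling; note that the virtual-device configuration can only be set once per process:

tf.config.set_logical_device_configuration(
    gpus[0],
    [tf.config.LogicalDeviceConfiguration(memory_limit=512),
     tf.config.LogicalDeviceConfiguration(memory_limit=512),
     tf.config.LogicalDeviceConfiguration(memory_limit=512)])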

For some other practical methods, see print(dir(tf.config)):

tf.debugging.set_log_device_placement(True)  # log which device each operation runs on
gpus = tf.config.experimental.list_physical_devices('GPU')  # list the physical GPUs
tf.config.experimental.set_visible_devices(gpus[0], 'GPU')  # make only this GPU visible
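For example, with placement logging switched on, every op prints the device it executes on (a tiny sketch):

tf.debugging.set_log_device_placement(True)
a = tf.constant([1.0, 2.0, 3.0])
b = a * 2  # the console logs which device runs this multiply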

The full list is shown below; those who are interested can explore what each one does:

['LogicalDevice', 'LogicalDeviceConfiguration', 'PhysicalDevice', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '_sys', 'experimental', 'experimental_connect_to_cluster', 'experimental_connect_to_host', 'experimental_functions_run_eagerly', 'experimental_run_functions_eagerly', 'functions_run_eagerly', 'get_logical_device_configuration', 'get_soft_device_placement', 'get_visible_devices', 'list_logical_devices', 'list_physical_devices', 'optimizer', 'run_functions_eagerly', 'set_logical_device_configuration', 'set_soft_device_placement', 'set_visible_devices', 'threading']

3. JupyterLab test

Since this is a freshly created environment, activate it and first install JupyterLab:
conda install -c conda-forge jupyterlab
Then launch it with the command: jupyter lab
Now you can write and test code in JupyterLab.
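If you prefer to run JupyterLab from another environment, you can instead register this one as a kernel (optional; uses ipykernel):

conda install ipykernel
python -m ipykernel install --user --name myTFGPU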

Now let's actually exercise the logically partitioned GPU:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(
     gpus[0],
     [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=512),
      tf.config.experimental.VirtualDeviceConfiguration(memory_limit=512),
      tf.config.experimental.VirtualDeviceConfiguration(memory_limit=512)])
logical_gpus = tf.config.experimental.list_logical_devices('GPU')

c = []
for gpu in logical_gpus:
    print(gpu.name)
    with tf.device(gpu.name):
        a = tf.constant([[1,2,3],[4,5,6]])
        b = tf.constant([[7,8],[9,10],[11,12]])
        c.append(tf.matmul(a, b))
print(c)
# Add up the results computed on the three GPUs, using the CPU
with tf.device('/cpu:0'):
    matmul_sum = tf.add_n(c)

print(matmul_sum)

'''
/device:GPU:0
/device:GPU:1
/device:GPU:2
[<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[ 58,  64],
       [139, 154]])>, <tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[ 58,  64],
       [139, 154]])>, <tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[ 58,  64],
       [139, 154]])>]
tf.Tensor(
[[174 192]
 [417 462]], shape=(2, 2), dtype=int32)
'''

We can see that the matrix multiplications run on the three logical GPUs, and the sum can then be assigned to the CPU. Whenever we need a particular GPU or CPU, we specify it with with tf.device('device name'). This is very practical for large models: a single GPU's memory is limited, while a large model has a huge number of parameters, so different layers can be placed on different GPUs and computed separately.
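As a hedged illustration of that layer-per-device idea (hypothetical layer sizes, reusing the logical GPUs created above):

# Place each layer on a different logical GPU, mimicking layer-wise model parallelism
with tf.device('/device:GPU:0'):
    layer1 = tf.keras.layers.Dense(64, activation='relu')
with tf.device('/device:GPU:1'):
    layer2 = tf.keras.layers.Dense(10)

x = tf.random.normal([8, 32])
with tf.device('/device:GPU:0'):
    h = layer1(x)   # first layer computes on logical GPU 0
with tf.device('/device:GPU:1'):
    y = layer2(h)   # second layer computes on logical GPU 1
print(y.shape)      # (8, 10)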
