Multi-GPU training with TensorFlow 2.0

Environment

TensorFlow 2.0
Python 3.6

Code location

https://github.com/lilihongjava/leeblog_python/tree/master/TensorFlow_GPU

Model code walkthrough

The simplest possible example, linear regression, is used to demonstrate multi-GPU training in TensorFlow.

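import tensorflow as tf


def model_train(x_data, y_data):
    # One Dense unit: a linear model over x_data.shape[1] input features
    layer0 = tf.keras.layers.Dense(1, input_shape=(x_data.shape[1],))
    model = tf.keras.Sequential([layer0])
    # Mean squared error loss, Adam optimizer
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(x_data, y_data, epochs=100, verbose=0)
    return model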

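tf.keras.layers.Dense(1, input_shape=(x_data.shape[1],))
The model has a single layer, so this layer is also the output layer. The first argument, 1, is the number of output neurons; for a classification problem it would equal the number of classes. The input_shape argument gives the shape of one input sample, which here is the number of feature columns.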
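To make the description of the layer concrete, here is a minimal sketch in plain Python of what a single-unit dense layer with no activation computes: a weighted sum of the inputs plus a bias. The helper names and the example weights are hypothetical and chosen only for illustration; this mirrors the arithmetic, not the Keras implementation.

```python
# Hypothetical helpers that mirror the arithmetic of Dense(1); not Keras code.
def dense_one_unit(x_row, weights, bias):
    """Return the single output of a 1-unit dense layer with no activation."""
    return sum(x * w for x, w in zip(x_row, weights)) + bias


def dense_param_count(input_dim, units=1):
    """One weight per (input, unit) pair plus one bias per unit."""
    return input_dim * units + units


features = [1.0, 2.0, 3.0]   # one sample with 3 input features
weights = [0.5, -1.0, 2.0]   # assumed example values, not trained weights
bias = 0.25

print(dense_one_unit(features, weights, bias))  # 0.5 - 2.0 + 6.0 + 0.25 = 4.75
print(dense_param_count(len(features)))         # 3 weights + 1 bias = 4
```

With 3 input features the layer has 4 trainable parameters, which is what fitting a linear regression amounts to.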

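Compiling the model:
optimizer='adam': the optimizer. Adam is a gradient-descent variant with adaptive learning rates.
loss='mean_squared_error' (Keras also accepts the alias 'mse'): the loss function. The mean squared error measures how far the predictions are from the targets.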
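As an illustration of the loss, the following plain-Python sketch computes the mean squared error by hand. The function name and the sample values are assumptions made for this example; in the model above Keras computes the loss internally.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 3.0]   # assumed example targets
y_pred = [1.5, 2.0, 2.0]   # assumed example predictions

# Squared errors are 0.25, 0.0 and 1.0, so the result is 1.25 / 3.
print(mean_squared_error(y_true, y_pred))
```

Training drives this value down by adjusting the layer's weights and bias.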

Multi-GPU code walkthrough

Setting gpu to True enables multi-GPU support. Official guide: https://www.tensorflow.org/guide/gpu

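if gpu:
    # Log which device each operation is placed on
    tf.debugging.set_log_device_placement(True)
    # Multi-GPU support: count the GPUs visible to TensorFlow
    gpu_len = len(tf.config.experimental.list_physical_devices('GPU'))
    print("gpu_len:" + str(gpu_len))
    dataset = tf.data.Dataset.from_tensor_slices((x_data.values, y_data.values))
    strategy = tf.distribute.MirroredStrategy()
    BATCH_SIZE_PER_REPLICA = 64
    # Global batch size: each replica (GPU) receives BATCH_SIZE_PER_REPLICA samples
    BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync
    print("x_data shape:" + str(x_data.shape))
    # In TF 1.14.0 the data dimensions had to be a multiple of the GPU count: if x_data.shape[1] % gpu_len == 0 and x_data.shape[0] % gpu_len == 0:
    print("Running on multiple GPUs")
    # Build and compile the model inside the scope so its variables are mirrored on every GPU
    with strategy.scope():
        layer0 = tf.keras.layers.Dense(1, input_shape=(x_data.shape[1],))
        model = tf.keras.Sequential([layer0])
        model.compile(loss='mean_squared_error', optimizer='adam')
    # No epochs argument is passed, so this trains for one epoch (the Keras default)
    model.fit(dataset.batch(BATCH_SIZE), verbose=0)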
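The batch-size arithmetic in the code above can be sketched without TensorFlow. The example below assumes a hypothetical dataset of 1000 samples and shows how the global batch size and the number of batches per epoch change with the number of replicas; the helper names are made up for this illustration.

```python
import math


def global_batch_size(per_replica, num_replicas):
    """Batch size passed to dataset.batch(): one per-replica batch for each GPU."""
    return per_replica * num_replicas


def batches_per_epoch(num_samples, batch_size):
    """Number of batches when the last, smaller batch is kept."""
    return math.ceil(num_samples / batch_size)


NUM_SAMPLES = 1000  # assumed dataset size, not from the original post

for replicas in (1, 2, 4):
    batch = global_batch_size(64, replicas)
    print(replicas, batch, batches_per_epoch(NUM_SAMPLES, batch))
```

Keeping the per-replica batch size fixed means that adding GPUs enlarges the global batch and reduces the number of steps per epoch, which is why the code multiplies by strategy.num_replicas_in_sync.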

GPU

watch -n 0.1 -d nvidia-smi refreshes the nvidia-smi output every 0.1 seconds and highlights what changed between refreshes.
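If the utilization figures are needed inside a script rather than on screen, one option is to call nvidia-smi in its CSV query mode and parse the result. The sketch below is an assumption about how that could look: the function names and the sample text are hypothetical, and the sample is not a real measurement.

```python
import shutil
import subprocess

# nvidia-smi query mode: one CSV row per GPU, no header, no units
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(text):
    """Turn 'index, utilization, memory' CSV rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        index, util, mem = (field.strip() for field in line.split(","))
        rows.append({
            "index": int(index),
            "util_percent": int(util),
            "memory_used_mib": int(mem),
        })
    return rows


def query_gpus():
    """Return parsed rows, or None when nvidia-smi is not on the PATH."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(QUERY, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return parse_gpu_rows(result.stdout)


sample = "0, 35, 1024\n1, 40, 1100\n"  # assumed sample text for illustration
print(parse_gpu_rows(sample))
```

During multi-GPU training, every GPU listed by query_gpus() should show non-zero utilization; a GPU that stays idle suggests the strategy is not using it.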

Dockerfile

FROM tensorflow/tensorflow:2.0.0-gpu-py3

WORKDIR /app

RUN pip install --upgrade setuptools \
numpy \
matplotlib \
xgboost \
pandas \
scikit-learn \
wheel \
flask -i https://pypi.douban.com/simple

docker run

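Check your Docker version with docker -v. For versions earlier than 19.03, you need nvidia-docker2 and the --runtime=nvidia flag; for 19.03 and later, you need the nvidia-container-toolkit package and the --gpus all flag. Both options are documented on the web page linked above.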
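The version rule above can be expressed as a small helper. This is a sketch under the assumption that docker -v prints a line containing the version as major.minor; the function name and the sample version strings are hypothetical.

```python
import re


def gpu_flag_for(version_output):
    """Pick the GPU flag for docker run from the text printed by `docker -v`."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no version number found in: " + repr(version_output))
    major, minor = int(match.group(1)), int(match.group(2))
    # 19.03 and later: nvidia-container-toolkit with --gpus all
    # earlier versions: nvidia-docker2 with --runtime=nvidia
    return "--gpus all" if (major, minor) >= (19, 3) else "--runtime=nvidia"


# Assumed sample strings in the usual `docker -v` format
print(gpu_flag_for("Docker version 18.09.7, build 2d0083d"))
print(gpu_flag_for("Docker version 19.03.5, build 633a0ea838"))
```

The first call selects --runtime=nvidia and the second selects --gpus all, matching the rule stated above.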

nvidia-docker run -it -d --runtime=nvidia -v /root/lee/TF/:/app 10.1.8.XX:80/xxx/python_slim:1.4 /bin/bash