Foreword
Using Docker, train a custom model (a Keras subclassed model) on two GPUs.
Mirrored distributed strategy: MirroredStrategy
TensorFlow offers many distributed strategies. Only one is introduced here, because it is convenient for getting started quickly. In practice, training on one machine with multiple GPUs proved feasible. For other distributed strategies, see: https://blog.csdn.net/u010099177/article/details/106074932
tf.distribute.MirroredStrategy supports synchronous distributed training on a single machine with multiple GPUs. It creates one replica per GPU device. Every variable in the model is mirrored across all replicas. Together, these copies form a single conceptual variable called a MirroredVariable, and they are kept in sync by applying the same updates to each of them.
Efficient all-reduce algorithms are used to communicate variable updates between devices. All-reduce aggregates tensors across all devices by adding them up and makes the result available on every device. It is a fused algorithm that is very efficient and can greatly reduce synchronization overhead. Many all-reduce algorithms and implementations are available, depending on the type of communication available between devices. NVIDIA NCCL is used by default. You can choose one of the other options provided or write your own.
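To make the synchronization step concrete, here is a minimal sketch in plain Python of what all-reduce does conceptually. It is only an illustration, not how TensorFlow or NCCL implement it: the function names `all_reduce_sum` and `apply_update`, the two-replica setup, and the numbers are all assumptions chosen for the example.

```python
from typing import List


def all_reduce_sum(per_replica: List[List[float]]) -> List[List[float]]:
    """Sum values elementwise across replicas and hand every replica the same total."""
    total = [sum(values) for values in zip(*per_replica)]
    return [list(total) for _ in per_replica]


def apply_update(variables: List[List[float]], grads: List[List[float]], lr: float) -> List[List[float]]:
    """Apply one plain SGD step to each replica's copy of the variable."""
    return [[v - lr * g for v, g in zip(var, grad)] for var, grad in zip(variables, grads)]


# Two replicas start with identical copies of the same variable.
mirrored = [[1.0, 2.0], [1.0, 2.0]]

# Each replica computes different gradients on its own slice of the batch.
local_grads = [[0.5, 1.0], [1.5, 3.0]]

# After all-reduce, every replica holds the same aggregated gradient...
synced = all_reduce_sum(local_grads)

# ...so applying the same update keeps the copies identical.
mirrored = apply_update(mirrored, synced, lr=0.5)
print(synced)
print(mirrored)
```

Because every replica applies the same aggregated gradient, the copies never drift apart, which is the property that lets them behave as one conceptual variable.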
The simplest way to create a MirroredStrategy is:
strategy = tf.distribute.MirroredStrategy()
This creates a MirroredStrategy instance that uses all GPUs visible to TensorFlow, with NCCL for cross-device communication.
If you only want to use some of the GPUs on your computer, you can do this:
strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])
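Before pinning specific devices, it can help to check how many GPUs TensorFlow actually sees. The following is a small sketch under the assumption that you want GPUs 0 and 1 when they exist and the default behavior otherwise. The `wanted` list and the fallback logic are illustrative choices, not part of the original setup.

```python
import tensorflow as tf

# GPUs that TensorFlow can see on this machine (or inside this container).
gpus = tf.config.list_physical_devices("GPU")
print("visible GPUs:", gpus)

# Hypothetical choice: use GPUs 0 and 1 if both exist, otherwise let
# MirroredStrategy pick whatever is available (a single CPU replica if no GPU).
wanted = ["/gpu:0", "/gpu:1"]
if len(gpus) >= len(wanted):
    strategy = tf.distribute.MirroredStrategy(devices=wanted)
else:
    strategy = tf.distribute.MirroredStrategy()

# Number of replicas that will take part in each synchronous training step.
print("replicas in sync:", strategy.num_replicas_in_sync)
```

With two GPUs selected, `strategy.num_replicas_in_sync` should report 2.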
tf.distribute.Strategy is integrated into tf.keras, the high-level API for building and training models. Because of this integration into the tf.keras backend, programs written with the Keras training framework can run distributed training seamlessly.
You need to make the following changes in your code:
- Create a tf.distribute.Strategy instance.
- Move the Keras model creation and compilation into strategy.scope.
All types of Keras models are supported: sequential, functional, and subclassed.
Here is a very simple Keras model example:
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
    model.compile(loss='mse', optimizer='sgd')
Just put the model creation and compilation inside strategy.scope().
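The same pattern applies to subclassed models. Below is a minimal sketch: `TinyRegressor` is a hypothetical stand-in for a real custom model, and the data is synthetic. Without GPUs, MirroredStrategy falls back to a single replica, so the sketch also runs on a CPU-only machine.

```python
import numpy as np
import tensorflow as tf


class TinyRegressor(tf.keras.Model):
    """Hypothetical subclassed model used only for illustration."""

    def __init__(self, hidden_units=8):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(hidden_units, activation="relu")
        self.out = tf.keras.layers.Dense(1)

    def call(self, inputs, training=False):
        return self.out(self.hidden(inputs))


strategy = tf.distribute.MirroredStrategy()

# Model creation and compilation both happen inside the scope,
# so the model's variables are created as mirrored variables.
with strategy.scope():
    model = TinyRegressor()
    model.compile(loss="mse", optimizer="sgd")

# Synthetic regression data: y = 3x + 1.
x = np.random.rand(64, 1).astype("float32")
y = 3.0 * x + 1.0
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(16)

# fit() is called as usual and can stay outside the scope.
history = model.fit(dataset, epochs=2, verbose=0)
print(history.history["loss"])
```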
Implementation code
The key code is as follows, using GPU 0 and GPU 1:
import tensorflow as tf
from tensorflow.keras import optimizers

# Stop training as soon as val_loss fails to improve (patience=0).
callbacks = [tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=1e-8, patience=0, verbose=2)]
strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])
with strategy.scope():
    # Create the optimizer inside the scope too, so any variables it owns are mirrored.
    opt = optimizers.SGD(learning_rate=0.001)
    # Create the custom model (FCINN is the Keras subclassed model, defined elsewhere).
    model = FCINN(dense_feature_columns, sparse_feature_columns, len(dense_features), hidden_units=(512, 256, 128), activation='relu', dropout=(0.3, 0.2, 0.2),
                  k_vector=8, w_reg=0.01, v_reg=0.01, mode='inner',
                  filters=[16, 18, 22, 24], kernel_with=[7, 7, 7, 7], dnn_maps=[3, 3, 3, 3], pooling_width=[2, 2, 2, 2]
                  )
    model.compile(
        optimizer=opt,
        loss='binary_crossentropy',
        metrics=['AUC', 'Precision', 'Recall', 'accuracy']
    )
# train_dataset and val_dataset are tf.data datasets built elsewhere.
model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=2,
    verbose=2,
    callbacks=callbacks,
)
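The key code does not show how train_dataset is built. One common approach, sketched here as an assumption rather than the article's actual pipeline, is to batch the tf.data dataset with a global batch size scaled by the number of replicas, since Keras splits each batch across the replicas during fit. The per-replica batch size, feature shape, and synthetic data below are all hypothetical.

```python
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# Hypothetical per-GPU batch size; the global batch is divided among the replicas.
PER_REPLICA_BATCH = 256
global_batch = PER_REPLICA_BATCH * strategy.num_replicas_in_sync

# Synthetic stand-in for real features and binary click labels.
features = np.random.rand(1024, 4).astype("float32")
labels = np.random.randint(0, 2, size=(1024, 1)).astype("float32")

train_dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(1024)
    .batch(global_batch)          # batch with the global size, not the per-replica size
    .prefetch(tf.data.AUTOTUNE)   # overlap input preparation with training
)

first_x, first_y = next(iter(train_dataset))
print("global batch size:", global_batch, "first batch shape:", first_x.shape)
```

Scaling the batch size this way keeps the amount of data each GPU processes per step constant when GPUs are added.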
Run the training script in a Docker container with GPUs 0 and 1 exposed:
docker run -d --gpus '"device=0,1"' \
    --rm -it --name ctr_tf_tmp \
    -v /data/wangguisen/ctr_note/new_thought:/ad_ctr/new_thought \
    -v /data/wangguisen/ctr_note/data:/ad_ctr/data \
    ad_ctr:3.0 \
    sh -c 'python3 -u /ad_ctr/new_thought/moreGPU.py 1>>/ad_ctr/new_thought/log/moreGPU.log 2>>/ad_ctr/new_thought/log/moreGPU.err'
Successful run
GPU utilization and memory usage show that training on one machine with two GPUs is running successfully.
One machine, one GPU:
One machine, multiple GPUs:
References:
Specifying which GPUs Docker uses:
https://blog.csdn.net/qq_21768483/article/details/115204043
Using tf.data Dataset in TF2:
https://blog.csdn.net/u012513618/article/details/109671774
Distributed training with TensorFlow 2.0:
https://blog.csdn.net/u010099177/article/details/106074932