Keras: using multiple GPUs

I recently participated in an image competition. Since it was my first time running deep learning on a GPU and using the Keras framework with TensorFlow as the backend, I ran into a serious problem that doubled my pre-training time; it's a pity, I very nearly failed to make it into the second round.

At the beginning, my teacher told me that whenever I was using the machine it was all CPU noise and the GPUs were not being used effectively. I didn't pay much attention, because when I ran the program it was obvious that I was using the machine; it only has two GPUs, after all.
Then running nvidia-smi showed that the memory of both GPUs was full. Wait, why was only one GPU computing while the other sat at 0% utilization?
It turned out that I had been using only half of my computing resources for the entire competition. Oh my god, I only then realized that I had been dragging my teammates down!

Also, by default Keras grabs all of the GPU memory, and then other processes cannot use the GPU even though Keras cannot actually use it all. In that case you need to limit how much GPU memory Keras is allowed to take.

Setting Keras GPU memory usage (fixed fraction / on demand)

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.5  # use a fixed fraction of GPU memory
# or
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
set_session(tf.Session(config=config))

Specifying which GPUs to use

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # expose only GPUs 0 and 1 to this process

Or specify it on the command line at run time:

CUDA_VISIBLE_DEVICES=0,1 python keras_demo.py

Think Keras will use multiple GPUs just because you made several visible? Too naive.
What follows is the solution for using multiple GPUs with Keras. The original article is very well written; I am merely the porter. Original link:
https://www.jianshu.com/p/db0ba022936f

References:
Official documentation: multi_gpu_model
and Google

  1. Misunderstanding
    Keras currently supports training a network on multiple GPUs at the same time, and it is very easy, but the following line of code alone is not enough:
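
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"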

When you monitor GPU usage (nvidia-smi -l 1), you will find that although no GPU is idle, only one GPU is actually computing; the others are occupied but doing nothing. In other words, if your machine has multiple graphics cards, Keras will occupy all the GPUs it detects by default, with or without the code above. That line of code is the right tool when you only want one GPU, i.e. it makes Keras unable to detect the other GPUs in the machine. Suppose you have three graphics cards in total, each with its own index (0, 1, 2), and in order not to affect other users you use only one of them, say the one with index 1. Then:

os.environ["CUDA_VISIBLE_DEVICES"] = "1"

Monitor GPU usage again (nvidia-smi -l 1): indeed only one card is occupied and the others are idle. So this is the misunderstanding about Keras and multiple graphics cards: making several GPUs visible does not, by itself, make Keras utilize them at the same time.

  2. Purpose
    Why train on multiple GPUs at the same time?

The memory of a single graphics card is too small: the batch size cannot be made very large, and sometimes even batch_size=1 runs out of memory (OUT OF MEMORY).

From my experience running deep networks, it is better to set batch_size larger: each backpropagation weight update then sees more samples, so a single iteration does not overfit to one corner of the data (see the paper "Don't Decay the Learning Rate, Increase the Batch Size"). Of course, I have also seen papers saying it should not be set too large, for reasons unknown to me; in any case I never had the chance to try that. A batch_size somewhere in the range 64~256 is usually not a problem.

However, as networks get deeper and deeper, their GPU memory requirements keep growing. The biggest problem for many newcomers is often not the code: they copy code from GitHub, but their GPU is too weak to run it, so they can only reduce the batch_size, and in the end the training never reaches the reported results.

There are two solutions:
one is to buy a super-powerful GPU with huge memory;
the other is to buy multiple ordinary GPUs and use them together.

The first solution is not feasible: even the best NVIDIA graphics card currently has only a dozen or so gigabytes of memory, which still runs out once the network is deep enough, and the price/performance ratio of a top-end card is poor. Therefore, learning to use multiple GPUs under Keras is the more reliable choice.

  3. Implementation
    It is very simple:

import keras
import tensorflow as tf
from keras.optimizers import Adam
from model import unet  # your own model definition

G = 3  # use 3 GPUs at the same time
with tf.device("/gpu:0"):
    M = unet(input_rows, input_cols, 1)  # build the template model on a single device
model = keras.utils.training_utils.multi_gpu_model(M, gpus=G)  # replicate it across G GPUs
model.compile(optimizer=Adam(lr=1e-5), loss='binary_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train,
          batch_size=batch_size*G,  # the batch is split evenly across the G GPUs
          epochs=nb_epoch, verbose=0, shuffle=True,
          validation_data=(X_valid, y_valid))

model.save_weights('/path/to/save/model.h5')

  Problems
    3.1 Compiling the model
    For an ordinary network structure there is no problem; compile it just like the code above (model.compile(optimizer=Adam(lr=1e-5), loss='binary_crossentropy', metrics=['accuracy'])). However, a multi-task network such as Faster-RCNN has multiple output branches, i.e. multiple losses. The branches are usually named when the network is defined, and those layer names are then referenced at compile time, like this:
# jaccard_distance_loss is a custom loss function defined elsewhere
model.compile(optimizer=optimizer,
              loss={'main_output': jaccard_distance_loss, 'aux_output': 'binary_crossentropy'},
              metrics={'main_output': jaccard_distance_loss, 'aux_output': 'acc'},
              loss_weights={'main_output': 1., 'aux_output': 0.5})

Here main_output and aux_output are the layer names you defined. But once keras.utils.training_utils.multi_gpu_model() is applied, those names are automatically replaced by defaults such as concatenate_1, concatenate_2, and so on. You therefore need to call model.summary() first, print the network structure, and figure out which output corresponds to which branch.
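A quick way to check the renamed outputs (a minimal sketch; model.summary() and the output_names attribute are standard members of a Keras functional model):

model = keras.utils.training_utils.multi_gpu_model(M, gpus=G)
model.summary()              # print the full network structure
print(model.output_names)    # e.g. ['concatenate_1', 'concatenate_2']

Then recompile the network with the new names, as follows: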

from keras.optimizers import Adam, RMSprop, SGD
model.compile(optimizer=RMSprop(lr=0.045, rho=0.9, epsilon=1.0), 
              loss={'concatenate_1': jaccard_distance_loss, 'concatenate_2': 'binary_crossentropy'},
              metrics={'concatenate_1': jaccard_distance_loss, 'concatenate_2': 'acc'},
              loss_weights={'concatenate_1': 1., 'concatenate_2': 0.5})

3.2 Saving the model
Models trained with multiple GPUs have a problem that Keras has not solved: calling model.save() raises an error:

TypeError: can’t pickle module objects

or

RuntimeError: Unable to create attribute (object header message is too large)

The reason is:
In https://keras.io/utils/#multi_gpu_model it is clearly stated that the model can be used like a normal model, but it cannot be saved. Very funny. I cannot even resume training, simply because I cannot save the model previously trained with multiple GPUs; if I train with a single GPU instead, the rest of the GPUs I invested in become useless. Please urge the developers to look into this bug ASAP.
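
For what it's worth, the Keras documentation for multi_gpu_model suggests saving through the template model (the argument you passed to multi_gpu_model) rather than through the returned wrapper, since the two share weights. A minimal sketch, reusing M from the implementation above:

# M is the template model built before multi_gpu_model();
# it shares weights with the wrapper, so saving it sidesteps the error.
M.save_weights('/path/to/save/model.h5')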

Normally, Keras provides a callback that automatically saves the best network (keras.callbacks.ModelCheckpoint()), but internally it saves with model.save(), so it cannot be used here. You need to design your own CustomModelCheckpoint() to save the best model:


import keras
import numpy as np

class CustomModelCheckpoint(keras.callbacks.Callback):

    def __init__(self, model, path):
        self.model = model          # the model whose weights will be saved
        self.path = path            # where to write the weight file
        self.best_loss = np.inf

    def on_epoch_end(self, epoch, logs=None):
        val_loss = logs['val_loss']
        if val_loss < self.best_loss:
            print("\nValidation loss decreased from {} to {}, saving model".format(self.best_loss, val_loss))
            self.model.save_weights(self.path, overwrite=True)  # save_weights avoids model.save()
            self.best_loss = val_loss

model.fit(X_train, y_train,
          batch_size=batch_size*G, epochs=nb_epoch, verbose=0, shuffle=True,
          validation_data=(X_valid, y_valid),
          callbacks=[CustomModelCheckpoint(model, '/path/to/save/model.h5')])

Even so, if the model is still too large, the error below may appear again; the fix is to save the weights in npy format instead of hdf5.

RuntimeError: Unable to create attribute (Object header message is too large)

model.get_weights(): returns a list of all weight tensors in the model, as Numpy arrays.
model.set_weights(weights): sets the values of the weights of the model, from a list of Numpy arrays. The arrays in the list should have the same shape as those returned by get_weights().

Save the model:

weight = model.get_weights()   # a list of Numpy arrays
np.save(path + '.npy', weight)

Load the model:

weight = np.load(load_path)
model.set_weights(weight)
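
One porting note: with newer NumPy versions, np.save() stores the Python list of arrays as an object array, so loading it back requires allow_pickle:

weight = np.load(load_path, allow_pickle=True)
model.set_weights(weight)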

3.3 Loading the model
The same issue applies when loading. Reading an .h5 network file trained with multiple graphics cards also raises an error:

ValueError: You are trying to load a weight file containing 3 layers into a model with 1 layers.

The reason is that the internal layout of the .h5 file differs from that of a single-GPU model, so you also need to wrap the model with keras.utils.training_utils.multi_gpu_model() when loading:

import keras
import tensorflow as tf
from model import unet

with tf.device("/cpu:0"):
    M = unet(input_rows, input_cols, 1)
model = keras.utils.training_utils.multi_gpu_model(M, gpus=G)
model.load_weights(load_path)

Then it's fine.

Author: MrGiovanni
Link: https://www.jianshu.com/p/db0ba022936f
Source: Jianshu
The copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.

Of course, there is one more problem: when you use the multi-GPU model for prediction, you cannot comment out

with tf.device("/cpu:0"):
    M = unet(input_rows, input_cols, 1)

In addition, you will find that prediction with the multi-GPU model is much slower than normal. Wondering why?

Actually, I want to know too! If anyone knows, please tell me.
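
One guess (my own assumption, not from the original article): since the wrapper shares weights with the template model M, you could try running prediction on M alone, so inference does not pay the cost of splitting each batch across GPUs and concatenating the results:

# Hypothetical: X_test stands in for your own test data.
# M shares weights with the multi-GPU wrapper after training.
preds = M.predict(X_test, batch_size=batch_size)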
