Training deep learning models is often very time-consuming. Runs of several hours are commonplace, runs of a few days are not unusual, and some models take tens of days to train.
Training time comes mainly from two sources: data preparation and parameter iteration.
When data preparation is the main bottleneck of training time, we can use more processes to prepare the data.
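As a minimal sketch of this idea, tf.data can run a preprocessing function on several elements at once and overlap data preparation with training. Two assumptions to note: slow_preprocess is a hypothetical stand-in for real preprocessing work, and tf.data parallelizes with threads inside the TensorFlow runtime rather than with separate OS processes.

```python
import tensorflow as tf

def slow_preprocess(x):
    # Hypothetical stand-in for an expensive preprocessing step.
    return tf.cast(x, tf.float32) * 2.0

ds = (
    tf.data.Dataset.range(8)
    # Run the map function on several elements concurrently (order is preserved).
    .map(slow_preprocess, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    .batch(4)
    # Prepare upcoming batches while the current one is being consumed.
    .prefetch(tf.data.experimental.AUTOTUNE)
)

batches = [batch.numpy().tolist() for batch in ds]
print(batches)
```

AUTOTUNE lets TensorFlow pick the degree of parallelism at runtime, which is usually a reasonable starting point before tuning by hand.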
When parameter iteration becomes the main bottleneck of training time, the usual approach is to accelerate it with a GPU or Google's TPU.
For details, see "Accelerating Keras Model with GPU-Colab Free GPU Usage Guide":
https://zhuanlan.zhihu.com/p/68509398
Whether you use the built-in fit method or a custom training loop, switching from CPU training to single-GPU training is very convenient and requires no code changes. When a GPU is available and no device is specified, TensorFlow automatically prefers the GPU for creating tensors and performing tensor computations.
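A small sketch of how to check this placement yourself is shown below. It uses no tf.device scope, so TensorFlow chooses the device on its own; the tensor shapes are arbitrary illustrative values.

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print("GPUs found:", gpus)

# No tf.device scope here: TensorFlow chooses the device on its own.
a = tf.random.uniform((4, 3))
b = tf.random.uniform((3, 2))
c = a @ b

# Each eager tensor records the device that holds it.
print(c.device)
```

On a machine with a usable GPU the printed device string should name a GPU; otherwise it should name the CPU.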
However, in a shared server environment such as a company or school laboratory, there are usually multiple GPUs and multiple users. By default, TensorFlow claims all the memory of all GPUs, even though a task typically uses only part of the resources of one GPU. To keep one user's task from occupying all GPU resources and blocking everyone else, we usually add a few lines of code at the beginning of the program (see the GPU settings section below) to control which GPU and how much memory each task uses, so that other users can train their models at the same time.
In a Colab notebook: Edit -> Notebook settings -> select GPU under Hardware accelerator.
Note: The following code can only be executed correctly on Colab.
You can try out the "tf_single GPU" notebook through the following Colab link:
https://colab.research.google.com/drive/1r5dLoeJq5z01sU72BX2M5UiNSkuxsEFe
%tensorflow_version 2.x
import tensorflow as tf
print(tf.__version__)

from tensorflow.keras import *

# Print a separator line followed by the current time
@tf.function
def printbar():
    ts = tf.timestamp()
    today_ts = ts % (24 * 60 * 60)

    # +8 converts UTC to Beijing time (UTC+8)
    hour = tf.cast(today_ts // 3600 + 8, tf.int32) % tf.constant(24)
    minute = tf.cast((today_ts % 3600) // 60, tf.int32)
    second = tf.cast(tf.floor(today_ts % 60), tf.int32)

    def timeformat(m):
        # Left-pad single-digit values with a zero
        if tf.strings.length(tf.strings.format("{}", m)) == 1:
            return tf.strings.format("0{}", m)
        else:
            return tf.strings.format("{}", m)

    timestring = tf.strings.join(
        [timeformat(hour), timeformat(minute), timeformat(second)],
        separator=":")
    tf.print("==========" * 8, end="")
    tf.print(timestring)
2.2.0-rc2
One, GPU settings
gpus = tf.config.list_physical_devices("GPU")

if gpus:
    gpu0 = gpus[0]  # If there are multiple GPUs, use only GPU 0
    tf.config.experimental.set_memory_growth(gpu0, True)  # Allocate GPU memory on demand
    # Alternatively, cap the GPU memory at a fixed amount (for example, 4 GB)
    # tf.config.experimental.set_virtual_device_configuration(gpu0,
    #     [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])
    tf.config.set_visible_devices([gpu0], "GPU")

# Compare the computation speed of the GPU and the CPU
printbar()
with tf.device("/gpu:0"):
    tf.random.set_seed(0)
    a = tf.random.uniform((10000, 100), minval=0, maxval=3.0)
    b = tf.random.uniform((100, 100000), minval=0, maxval=3.0)
    c = a @ b
    tf.print(tf.reduce_sum(tf.reduce_sum(c, axis=0), axis=0))
printbar()

printbar()
with tf.device("/cpu:0"):
    tf.random.set_seed(0)
    a = tf.random.uniform((10000, 100), minval=0, maxval=3.0)
    b = tf.random.uniform((100, 100000), minval=0, maxval=3.0)
    c = a @ b
    tf.print(tf.reduce_sum(tf.reduce_sum(c, axis=0), axis=0))
printbar()
================================================================================11:59:21
2.24953778e+11
================================================================================11:59:23
================================================================================11:59:23
2.24953795e+11
================================================================================11:59:29
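The separator bars only have one-second resolution, so the comparison above is coarse. A finer-grained measurement could look like the following sketch. The helper name time_matmul, the smaller matrix sizes, and the repeat count are assumptions chosen for illustration, not part of the original notebook.

```python
import time
import tensorflow as tf

def time_matmul(device_name, repeats=3):
    """Return the average wall-clock seconds of one matmul on the given device."""
    with tf.device(device_name):
        a = tf.random.uniform((1000, 100))
        b = tf.random.uniform((100, 1000))
        # Warm-up run; .numpy() blocks until the result is actually computed.
        _ = tf.reduce_sum(a @ b).numpy()
        start = time.perf_counter()
        for _ in range(repeats):
            _ = tf.reduce_sum(a @ b).numpy()
        return (time.perf_counter() - start) / repeats

# Only time the GPU when one is actually present.
devices = ["/CPU:0"]
if tf.config.list_physical_devices("GPU"):
    devices.append("/GPU:0")

timings = {name: time_matmul(name) for name in devices}
for name, seconds in timings.items():
    print(name, f"{seconds:.6f} s")
```

Calling .numpy() matters here because GPU operations may be dispatched asynchronously; without it the timer could stop before the computation finishes.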
Two, prepare the data
MAX_LEN = 300
BATCH_SIZE = 32

(x_train, y_train), (x_test, y_test) = datasets.reuters.load_data()
x_train = preprocessing.sequence.pad_sequences(x_train, maxlen=MAX_LEN)
x_test = preprocessing.sequence.pad_sequences(x_test, maxlen=MAX_LEN)

MAX_WORDS = x_train.max() + 1
CAT_NUM = y_train.max() + 1

ds_train = tf.data.Dataset.from_tensor_slices((x_train, y_train)) \
    .shuffle(buffer_size=1000).batch(BATCH_SIZE) \
    .prefetch(tf.data.experimental.AUTOTUNE).cache()

ds_test = tf.data.Dataset.from_tensor_slices((x_test, y_test)) \
    .shuffle(buffer_size=1000).batch(BATCH_SIZE) \
    .prefetch(tf.data.experimental.AUTOTUNE).cache()
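One detail of the pipeline above is worth a note: cache() is applied last, after shuffle and batch. As far as we understand tf.data, the cache then stores the batches produced during the first pass, so later epochs replay the same order. A commonly recommended alternative, sketched here on hypothetical toy data rather than the Reuters set, is to cache first and shuffle afterwards so that every epoch is reshuffled.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data standing in for (x_train, y_train).
x = np.arange(10, dtype=np.int32).reshape(10, 1)
y = np.arange(10, dtype=np.int32)

ds = (
    tf.data.Dataset.from_tensor_slices((x, y))
    .cache()                   # cache the raw examples once
    .shuffle(buffer_size=10)   # reshuffled on every pass by default
    .batch(4)
    .prefetch(tf.data.experimental.AUTOTUNE)
)

for epoch in range(2):
    # Collect the labels in the order this epoch delivers them.
    labels = [int(v) for _, yb in ds for v in yb.numpy()]
    print(epoch, labels)
```

Every epoch still sees each example exactly once; only the order changes between passes.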
Three, define the model
tf.keras.backend.clear_session()

def create_model():
    model = models.Sequential()
    model.add(layers.Embedding(MAX_WORDS, 7, input_length=MAX_LEN))
    model.add(layers.Conv1D(filters=64, kernel_size=5, activation="relu"))
    model.add(layers.MaxPool1D(2))
    model.add(layers.Conv1D(filters=32, kernel_size=3, activation="relu"))
    model.add(layers.MaxPool1D(2))
    model.add(layers.Flatten())
    model.add(layers.Dense(CAT_NUM, activation="softmax"))
    return model

model = create_model()
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding (Embedding)        (None, 300, 7)            216874
_________________________________________________________________
conv1d (Conv1D)              (None, 296, 64)           2304
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 148, 64)           0
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 146, 32)           6176
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 73, 32)            0
_________________________________________________________________
flatten (Flatten)            (None, 2336)              0
_________________________________________________________________
dense (Dense)                (None, 46)                107502
=================================================================
Total params: 332,856
Trainable params: 332,856
Non-trainable params: 0
_________________________________________________________________
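The shapes and parameter counts in the summary can be checked by hand. The sketch below redoes the arithmetic in plain Python. The helper names are hypothetical, and MAX_WORDS is inferred from the embedding row of the summary rather than read from the dataset.

```python
# Values read from the model definition and the summary above.
MAX_LEN = 300
EMBED_DIM = 7
CAT_NUM = 46
MAX_WORDS = 216874 // EMBED_DIM  # vocabulary size implied by the embedding row

def conv1d_params(kernel_size, in_channels, filters):
    # One kernel of shape (kernel_size, in_channels) per filter, plus one bias each.
    return kernel_size * in_channels * filters + filters

def conv1d_len(length, kernel_size):
    # "valid" padding with stride 1, which are the Conv1D defaults.
    return length - kernel_size + 1

embedding = MAX_WORDS * EMBED_DIM
conv1 = conv1d_params(5, EMBED_DIM, 64)
len1 = conv1d_len(MAX_LEN, 5) // 2   # length after the first MaxPool1D(2)
conv2 = conv1d_params(3, 64, 32)
len2 = conv1d_len(len1, 3) // 2      # length after the second MaxPool1D(2)
flat = len2 * 32                     # Flatten output size
dense = flat * CAT_NUM + CAT_NUM     # weights plus biases

total = embedding + conv1 + conv2 + dense
print(embedding, conv1, conv2, flat, dense, total)
```

The embedding layer accounts for roughly two thirds of the parameters, which is typical for small text models with a large vocabulary.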
Four, train the model
optimizer = optimizers.Nadam()
loss_func = losses.SparseCategoricalCrossentropy()

train_loss = metrics.Mean(name='train_loss')
train_metric = metrics.SparseCategoricalAccuracy(name='train_accuracy')

valid_loss = metrics.Mean(name='valid_loss')
valid_metric = metrics.SparseCategoricalAccuracy(name='valid_accuracy')

@tf.function
def train_step(model, features, labels):
    # Forward pass, loss, and one optimizer update on a single batch
    with tf.GradientTape() as tape:
        predictions = model(features, training=True)
        loss = loss_func(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    train_loss.update_state(loss)
    train_metric.update_state(labels, predictions)

@tf.function
def valid_step(model, features, labels):
    # Forward pass only; no gradient update
    predictions = model(features)
    batch_loss = loss_func(labels, predictions)
    valid_loss.update_state(batch_loss)
    valid_metric.update_state(labels, predictions)

def train_model(model, ds_train, ds_valid, epochs):
    for epoch in tf.range(1, epochs + 1):

        for features, labels in ds_train:
            train_step(model, features, labels)

        for features, labels in ds_valid:
            valid_step(model, features, labels)

        logs = 'Epoch={},Loss:{},Accuracy:{},Valid Loss:{},Valid Accuracy:{}'

        if epoch % 1 == 0:
            printbar()
            tf.print(tf.strings.format(logs,
                (epoch, train_loss.result(), train_metric.result(),
                 valid_loss.result(), valid_metric.result())))
            tf.print("")

        # Reset the metrics so each epoch is reported separately
        train_loss.reset_states()
        valid_loss.reset_states()
        train_metric.reset_states()
        valid_metric.reset_states()

train_model(model, ds_train, ds_test, 10)
================================================================================12:01:11
Epoch=1,Loss:2.00887108,Accuracy:0.470273882,Valid Loss:1.6704694,Valid Accuracy:0.566340148

================================================================================12:01:13
Epoch=2,Loss:1.47044504,Accuracy:0.618681788,Valid Loss:1.51738906,Valid Accuracy:0.630454123

================================================================================12:01:14
Epoch=3,Loss:1.1620506,Accuracy:0.700289488,Valid Loss:1.52190566,Valid Accuracy:0.641139805

================================================================================12:01:16
Epoch=4,Loss:0.878907442,Accuracy:0.771654427,Valid Loss:1.67911685,Valid Accuracy:0.644256473

================================================================================12:01:17
Epoch=5,Loss:0.647668123,Accuracy:0.836450696,Valid Loss:1.93839979,Valid Accuracy:0.642475486

================================================================================12:01:19
Epoch=6,Loss:0.487838209,Accuracy:0.880538881,Valid Loss:2.20062685,Valid Accuracy:0.642030299

================================================================================12:01:21
Epoch=7,Loss:0.390418053,Accuracy:0.90670228,Valid Loss:2.32795334,Valid Accuracy:0.646482646

================================================================================12:01:22
Epoch=8,Loss:0.328294098,Accuracy:0.92351371,Valid Loss:2.44113493,Valid Accuracy:0.644701719

================================================================================12:01:24
Epoch=9,Loss:0.286735713,Accuracy:0.931195736,Valid Loss:2.5071857,Valid Accuracy:0.642920732

================================================================================12:01:25
Epoch=10,Loss:0.256434649,Accuracy:0.936428428,Valid Loss:2.60177088,Valid Accuracy:0.640249312
References:
Open source e-book address: https://lyhue1991.github.io/eat_tensorflow2_in_30_days/
GitHub project address: https://github.com/lyhue1991/eat_tensorflow2_in_30_days