Solving memory overflow with large data volumes in TensorFlow and Keras

Reprinted: https://blog.csdn.net/leadai/article/details/79999785

Memory overflow is the first stumbling block when you enter Kaggle competitions or run experiments on large datasets.


The small practice projects that beginners start with build a habit of thinking: when reading the training set, load every image into memory first, and then train in batches.


In practice this is a problem and easily leads to an out-of-memory (OOM) error. A typical machine today has 16 GB of RAM, training sets usually contain tens of thousands of images, and RGB images are large. VGG16, for example, takes 224x224x3 inputs, and tens of thousands of such images will not fit in 16 GB. The next thought is to set a batch size. But the batch parameter operates on images that have already been loaded: it only controls how those images are sent to the graphics card in batches. The OOM happens earlier, when the images are loaded into memory in the first place. So what can be done?
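To make the estimate concrete, here is a rough back-of-the-envelope calculation. The count of 30,000 images and the float32 storage are assumptions for illustration; the text above only says "tens of thousands" of 224x224x3 images.

```python
# Rough memory estimate for holding a whole image set in RAM.
# Assumptions (not from the original post): 30,000 images, stored as float32.

def dataset_bytes(num_images, height, width, channels, bytes_per_value):
    """Return the bytes needed to hold every decoded image at once."""
    return num_images * height * width * channels * bytes_per_value

GIB = 1024 ** 3

as_float32 = dataset_bytes(30_000, 224, 224, 3, 4)  # 4 bytes per float32 value
as_uint8 = dataset_bytes(30_000, 224, 224, 3, 1)    # 1 byte per uint8 value

print(f"float32: {as_float32 / GIB:.1f} GiB")
print(f"uint8:   {as_uint8 / GIB:.1f} GiB")
```

Under these assumptions the float32 copy alone comes to roughly 16.8 GiB, which already exceeds 16 GB before counting the model, the operating system, or any temporary copies.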


The solution is actually simple once you break that habit: instead of reading all the images into memory, read only the paths of all the images into memory up front.


The general solution is:


Read the paths of the tens of thousands of images into memory in one go, then implement your own batch-reading function. In that function, choose how many images to read at a time according to your available memory, load only that batch of images into memory, and hand it to the model, which then trains on it in smaller batches. Because main memory is generally at least as large as video memory, the batch size used for main memory and the batch size used for video memory are usually different.
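The two batch sizes can be sketched in plain Python. This is only an illustration of the idea, not the implementation used later in this post: `load_images` is a hypothetical stand-in for real image decoding, and the file names and sizes are made up.

```python
# Two-level batching sketch: only file paths stay in memory for the whole run;
# pixel data is loaded one memory-sized chunk at a time.

def load_images(paths):
    """Hypothetical loader: returns one fake 'image' per path."""
    return [f"pixels-of-{p}" for p in paths]

def iter_chunks(paths, labels, chunk_size):
    """Yield (images, labels) for one memory-sized chunk at a time."""
    for start in range(0, len(paths), chunk_size):
        chunk_paths = paths[start:start + chunk_size]
        yield load_images(chunk_paths), labels[start:start + chunk_size]

def iter_minibatches(images, labels, batch_size):
    """Split an in-memory chunk into the smaller batches sent to the GPU."""
    for start in range(0, len(images), batch_size):
        yield images[start:start + batch_size], labels[start:start + batch_size]

paths = [f"img_{i}.jpg" for i in range(10)]  # hypothetical file names
labels = [i % 2 for i in range(10)]

seen = []
for images, chunk_labels in iter_chunks(paths, labels, chunk_size=4):
    for x, y in iter_minibatches(images, chunk_labels, batch_size=2):
        seen.append((len(x), len(y)))  # a real loop would train on (x, y) here
```

Here `chunk_size` plays the role of the main-memory batch and `batch_size` the role of the video-memory batch.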


The code below shows the key functions for reading data into memory in batches with TensorFlow and with Keras. TensorFlow is not very beginner-friendly, so at this stage I prefer its high-level API, Keras, for this kind of project. The TF implementation below is based on material I consulted back when I did not yet know how to read in batches with Keras. Model training still uses Keras; only the batched reading uses TF's API.


TensorFlow


Write the get_batch function in input.py.


import tensorflow as tf  # TensorFlow 1.x: uses the queue-based input pipeline

def get_batch(X_train, y_train, img_w, img_h, color_type, batch_size, capacity):
    '''
    Args:
        X_train: train img path list
        y_train: train labels list
        img_w: image width
        img_h: image height
        color_type: number of color channels to decode (1 or 3)
        batch_size: batch size
        capacity: the maximum elements in queue
    Returns:
        X_train_batch: 4D tensor [batch_size, height, width, channel],
                       dtype=tf.float32
        y_train_batch: 2D one-hot tensor [batch_size, 10]
    '''
    X_train = tf.cast(X_train, tf.string)
    y_train = tf.cast(y_train, tf.int32)

    # make an input queue that yields one (path, label) pair at a time
    input_queue = tf.train.slice_input_producer([X_train, y_train])

    y_train = input_queue[1]
    X_train_contents = tf.read_file(input_queue[0])
    X_train = tf.image.decode_jpeg(X_train_contents, channels=color_type)

    # resize_images takes the target size as [height, width]
    X_train = tf.image.resize_images(X_train, [img_h, img_w],
                                     tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    # cast so the batch matches the float32 dtype documented above
    X_train = tf.cast(X_train, tf.float32)

    X_train_batch, y_train_batch = tf.train.batch([X_train, y_train],
                                                  batch_size=batch_size,
                                                  num_threads=64,
                                                  capacity=capacity)

    # 10 is the number of classes in this particular dataset
    y_train_batch = tf.one_hot(y_train_batch, 10)
    return X_train_batch, y_train_batch


Train in train.py. (The code below is not pure TF: model.fit is Keras's fit method, and you can replace it with a pure-TF training step.)


import numpy as np
import tensorflow as tf
import input as inp  # the input.py module that defines get_batch

# model, max_step, the path/label lists and the size settings are defined elsewhere.
X_train_batch, y_train_batch = inp.get_batch(X_train, y_train,
                                             img_w, img_h, color_type,
                                             train_batch_size, capacity)
X_valid_batch, y_valid_batch = inp.get_batch(X_valid, y_valid,
                                             img_w, img_h, color_type,
                                             valid_batch_size, capacity)

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    try:
        for step in np.arange(max_step):
            if coord.should_stop():
                break
            # Pull one memory-sized batch of decoded images out of the queue
            X_train, y_train = sess.run([X_train_batch, y_train_batch])
            X_valid, y_valid = sess.run([X_valid_batch, y_valid_batch])

            ckpt_path = 'log/weights-{val_loss:.4f}.hdf5'
            ckpt = tf.keras.callbacks.ModelCheckpoint(ckpt_path,
                                                      monitor='val_loss',
                                                      verbose=1,
                                                      save_best_only=True,
                                                      mode='min')
            # Keras splits this in-memory batch into GPU batches of 64
            model.fit(X_train, y_train, batch_size=64,
                      epochs=50, verbose=1,
                      validation_data=(X_valid, y_valid),
                      callbacks=[ckpt])
            # Release this batch before loading the next one
            del X_train, y_train, X_valid, y_valid
    except tf.errors.OutOfRangeError:
        print('done!')
    finally:
        coord.request_stop()
    coord.join(threads)
    # The with block closes the session on exit, so no explicit sess.close() is needed.


Keras


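In the Keras documentation, fit, predict, and evaluate each have a generator counterpart (fit_generator, predict_generator, evaluate_generator). These generator versions exist precisely to solve the batching problem.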


Key function: fit_generator


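# Image-reading function
import cv2
import numpy as np

def get_im_cv2(paths, img_rows, img_cols, color_type=1, normalize=True):
    '''
    Args:
        paths: list of image paths to read
        img_rows: image height in rows
        img_cols: image width in columns
        color_type: number of color channels (1 = grayscale, 3 = color)
        normalize: if True, scale pixel values from [0, 255] to [-1, 1]
    Returns:
        imgs: array of images, shape (len(paths), img_rows, img_cols, color_type)
    '''
    imgs = []
    for path in paths:
        if color_type == 1:
            # Load as grayscale
            img = cv2.imread(path, 0)
        elif color_type == 3:
            # Load as 3-channel color (OpenCV uses BGR order)
            img = cv2.imread(path)
        else:
            raise ValueError('color_type must be 1 or 3')

        # Reduce size; cv2.resize takes the target size as (width, height)
        resized = cv2.resize(img, (img_cols, img_rows))

        if normalize:
            resized = resized.astype('float32')
            resized /= 127.5
            resized -= 1.

        imgs.append(resized)

    return np.array(imgs).reshape(len(paths), img_rows, img_cols, color_type)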


The batch function, which is really just a generator


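def get_train_batch(X_train, y_train, batch_size, img_w, img_h, color_type, is_argumentation):
    '''
    Args:
        X_train: list of all image paths
        y_train: list of the labels for those images
        batch_size: batch size
        img_w: image width
        img_h: image height
        color_type: number of color channels
        is_argumentation: whether to apply data augmentation
    Returns:
        a generator that yields
        x: the batch of images
        y: the labels for that batch
    '''
    while 1:
        for i in range(0, len(X_train), batch_size):
            # get_im_cv2 expects (rows, cols), i.e. (height, width)
            x = get_im_cv2(X_train[i:i+batch_size], img_h, img_w, color_type)
            y = y_train[i:i+batch_size]

            if is_argumentation:
                # Data augmentation (img_augmentation is a user-defined helper not shown here)
                x, y = img_augmentation(x, y)

            # The key piece is this yield: it returns a value, but the loop resumes
            # afterwards and yields again. It is like a machine that keeps a running
            # sum and reports each intermediate total until every number has been added.
            # The dict keys must match the names of the model's input and output layers.
            yield({'input': x}, {'output': y})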
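The running-sum analogy in the comment above can be tried directly. This small sketch is separate from the training code; `running_sum` and `cycle_batches` are illustrative names and not part of Keras.

```python
def running_sum(numbers):
    """Yield the intermediate total after each addition."""
    total = 0
    for n in numbers:
        total += n
        yield total  # hand back the partial result, then resume from here

def cycle_batches(items, batch_size):
    """Yield fixed-size slices forever, wrapping around like the `while 1` loop."""
    while True:
        for i in range(0, len(items), batch_size):
            yield items[i:i + batch_size]

totals = list(running_sum([1, 2, 3, 4]))

gen = cycle_batches(["a", "b", "c"], batch_size=2)
first_four = [next(gen) for _ in range(4)]  # never call list(gen): it is endless
```

Because a generator like `cycle_batches` never finishes by itself, the caller has to say how many batches make up one pass over the data.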


The training call


# ckpt and early_stop are Keras callbacks defined elsewhere.
result = model.fit_generator(
    generator=get_train_batch(X_train, y_train, train_batch_size,
                              img_w, img_h, color_type, True),
    steps_per_epoch=1351,   # training batches per epoch for this dataset
    epochs=50, verbose=1,
    validation_data=get_train_batch(X_valid, y_valid, valid_batch_size,
                                    img_w, img_h, color_type, False),
    validation_steps=52,    # validation batches per epoch for this dataset
    callbacks=[ckpt, early_stop],
    max_queue_size=capacity,
    workers=1)
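The values 1351 and 52 are specific to the author's dataset. Since the generator loops forever, Keras relies on steps_per_epoch and validation_steps to know where an epoch ends, and a common choice is the number of batches needed to cover the data once. A minimal sketch, with hypothetical sample counts and batch size that are not taken from this post:

```python
import math

def steps_for(num_samples, batch_size):
    """Batches needed to see every sample once (the last batch may be short)."""
    return math.ceil(num_samples / batch_size)

# Hypothetical numbers for illustration only.
train_steps = steps_for(num_samples=20_000, batch_size=64)
valid_steps = steps_for(num_samples=2_500, batch_size=64)

print(train_steps, valid_steps)
```

These values would then be passed as steps_per_epoch and validation_steps in place of the hard-coded numbers.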


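It is that simple. But the journey from 0 to 1 was hard to get through: no progress day after day, no leads, and impatience taking over most of my thinking. Once you make it through that stage, everything goes smoothly. That is not luck; it is the burst of insight built up from every footprint left on the road from 0 to 1. The more of those footprints you leave, the smoother the road ahead.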
