Comparison of Python image reading and writing methods

  When training vision-related neural network models, images constantly need to be read and written. There are many ways to do this, such as matplotlib, cv2, and PIL. The following compares several reading and writing methods in order to pick the fastest one and speed up training.

Experimental criteria

  Because the training framework is Pytorch, the criteria for the reading experiments are as follows:

  1. Read 5 images (one in png format and four in jpg format) with a resolution of 1920x1080 and store them in an array.

  2. Convert the resulting array into a Pytorch tensor with dimension order CxHxW and move it to GPU memory (I train on a GPU), with the three channels in RGB order.

  3. Record the time each method takes for the above operations. Since a png file is almost 10 times the size of the equivalent jpg with only a slight difference in quality, data sets are usually not stored as png, so the difference in reading time between the two formats is not compared.

  The criteria for the writing experiments are as follows:

  1. Convert the Pytorch tensor corresponding to the 5 1920x1080 images into an array whose data type the corresponding method can use.

  2. Save the five images in jpg format.

  3. Record the time each method takes to save the images.

Experiments

cv2

  Because a GPU is used, cv2 offers two ways to read the images:

  1. First read all the images into a numpy array, then convert it into a pytorch tensor stored on the GPU.

  2. Initialize a pytorch tensor on the GPU, then copy each image into this tensor directly.

  The code for the first approach is as follows:

import os, torch
import cv2 as cv 
import numpy as np 
from time import time 
 
read_path = 'D:test'
write_path = 'D:test\\write\\'
 
# cv2 read, method 1: accumulate in a numpy array, then move to the GPU
start_t = time()
imgs = np.zeros([5, 1080, 1920, 3])
for img, i in zip(os.listdir(read_path), range(5)): 
  img = cv.imread(filename=os.path.join(read_path, img))
  imgs[i] = img   
imgs = torch.tensor(imgs).to('cuda')[...,[2,1,0]].permute([0,3,1,2])/255  # BGR -> RGB, NHWC -> NCHW, scale to [0, 1]
print('cv2 read time 1:', time() - start_t) 
# cv2 save
start_t = time()
imgs = (imgs.permute([0,2,3,1])[...,[2,1,0]]*255).cpu().numpy()  # NCHW -> NHWC, RGB -> BGR, scale back to [0, 255]
for i in range(imgs.shape[0]): 
  cv.imwrite(write_path + str(i) + '.jpg', imgs[i])
print('cv2 save time:', time() - start_t) 

  Experimental results:

cv2 read time 1: 0.39693760871887207
cv2 save time: 0.3560612201690674

  The code for the second approach is as follows:

import os, torch
import cv2 as cv 
import numpy as np 
from time import time 
 
read_path = 'D:test'
write_path = 'D:test\\write\\'
 
 
# cv2 read, method 2: copy each image directly into a preallocated GPU tensor
start_t = time()
imgs = torch.zeros([5, 1080, 1920, 3], device='cuda')
for img, i in zip(os.listdir(read_path), range(5)): 
  img = torch.tensor(cv.imread(filename=os.path.join(read_path, img)), device='cuda')
  imgs[i] = img   
imgs = imgs[...,[2,1,0]].permute([0,3,1,2])/255  # BGR -> RGB, NHWC -> NCHW, scale to [0, 1]
print('cv2 read time 2:', time() - start_t) 
# cv2 save
start_t = time()
imgs = (imgs.permute([0,2,3,1])[...,[2,1,0]]*255).cpu().numpy()  # NCHW -> NHWC, RGB -> BGR, scale back to [0, 255]
for i in range(imgs.shape[0]): 
  cv.imwrite(write_path + str(i) + '.jpg', imgs[i])
print('cv2 save time:', time() - start_t) 

  Experimental results:

cv2 read time 2: 0.23636841773986816
cv2 save time: 0.3066873550415039

matplotlib

  matplotlib offers the same two reading approaches. The code for the first one is as follows:

import os, torch 
import numpy as np
import matplotlib.pyplot as plt 
from time import time 
 
read_path = 'D:test'
write_path = 'D:test\\write\\'
 
# matplotlib read, method 1: accumulate in a numpy array, then move to the GPU
start_t = time()
imgs = np.zeros([5, 1080, 1920, 3])
for img, i in zip(os.listdir(read_path), range(5)): 
  img = plt.imread(os.path.join(read_path, img)) 
  imgs[i] = img    
imgs = torch.tensor(imgs).to('cuda').permute([0,3,1,2])/255  # NHWC -> NCHW, scale to [0, 1] (imread already returns RGB)
print('matplotlib read time 1:', time() - start_t) 
# matplotlib save
start_t = time()
imgs = (imgs.permute([0,2,3,1])).cpu().numpy()  # NCHW -> NHWC; imsave expects floats in [0, 1]
for i in range(imgs.shape[0]):  
  plt.imsave(write_path + str(i) + '.jpg', imgs[i])
print('matplotlib save time:', time() - start_t) 

  Experimental results:

matplotlib read time 1: 0.45380306243896484
matplotlib save time: 0.768944263458252

  The code for the second approach is as follows:

import os, torch 
import numpy as np
import matplotlib.pyplot as plt 
from time import time 
 
read_path = 'D:test'
write_path = 'D:test\\write\\'
 
# matplotlib read, method 2: copy each image directly into a preallocated GPU tensor
start_t = time()
imgs = torch.zeros([5, 1080, 1920, 3], device='cuda')
for img, i in zip(os.listdir(read_path), range(5)): 
  img = torch.tensor(plt.imread(os.path.join(read_path, img)), device='cuda')
  imgs[i] = img    
imgs = imgs.permute([0,3,1,2])/255  # NHWC -> NCHW, scale to [0, 1] (imread already returns RGB)
print('matplotlib read time 2:', time() - start_t) 
# matplotlib save
start_t = time()
imgs = (imgs.permute([0,2,3,1])).cpu().numpy()  # NCHW -> NHWC; imsave expects floats in [0, 1]
for i in range(imgs.shape[0]):  
  plt.imsave(write_path + str(i) + '.jpg', imgs[i])
print('matplotlib save time:', time() - start_t) 

  Experimental results:

matplotlib read time 2: 0.2044532299041748
matplotlib save time: 0.4737534523010254

  Note that when matplotlib reads a png image, the resulting array contains floating point values in $[0, 1]$, whereas for a jpg image it contains integers in $[0, 255]$. Therefore, if a data set mixes image formats, be careful to convert them to the same representation when reading, otherwise preprocessing the data set becomes troublesome.
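
  For example, here is a minimal sketch of a helper that always returns uint8 values in $[0, 255]$ regardless of format; the function name read_as_uint8 is my own and is not part of the benchmark code above:

import numpy as np
import matplotlib.pyplot as plt

def read_as_uint8(path):
  # plt.imread returns floats in [0, 1] for png and uint8 in [0, 255] for jpg
  img = plt.imread(path)
  if img.dtype != np.uint8:
    img = (img * 255).astype(np.uint8)
  return img[..., :3]  # drop the alpha channel that png may carry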

PIL

  PIL cannot read or write directly from a pytorch tensor or numpy array; the data must first be converted to its Image type, which is quite cumbersome. Its overhead puts it at a clear disadvantage, so I did not benchmark it.
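
  For reference only, a rough sketch of what the PIL round trip would look like (the file names here are placeholders); the conversions through Image.open / np.asarray and Image.fromarray are the extra steps the other methods avoid:

import numpy as np
from PIL import Image

# read: PIL returns an Image object, which must be converted to a numpy array
img = np.asarray(Image.open('test.jpg'))  # uint8, HxWx3, RGB order

# write: the array must first be converted back into an Image object
Image.fromarray(img.astype(np.uint8)).save('out.jpg')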

torchvision

  torchvision can save images directly from a pytorch tensor. Combined with the fastest reading method above (matplotlib, method 2), the code is as follows:

import os, torch  
import matplotlib.pyplot as plt 
from time import time 
from torchvision import utils 

read_path = 'D:test'
write_path = 'D:test\\write\\'
 
# matplotlib read, method 2
start_t = time()
imgs = torch.zeros([5, 1080, 1920, 3], device='cuda')
for img, i in zip(os.listdir(read_path), range(5)): 
  img = torch.tensor(plt.imread(os.path.join(read_path, img)), device='cuda')
  imgs[i] = img    
imgs = imgs.permute([0,3,1,2])/255  # NHWC -> NCHW, scale to [0, 1]
print('matplotlib read time 2:', time() - start_t) 
# torchvision save
start_t = time() 
for i in range(imgs.shape[0]):   
  utils.save_image(imgs[i], write_path + str(i) + '.jpg')
print('torchvision save time:', time() - start_t) 

  Experimental results:

matplotlib read time 2: 0.15358829498291016
torchvision save time: 0.14760661125183105

  As can be seen, these are the fastest reading and writing methods. In addition, to keep image reading and writing from interfering with training as much as possible, we can also run these two processes in parallel with the training loop (a minimal sketch of background saving follows the usage example below). Also, utils.save_image can stitch multiple images together and save them as a single image; it is used as follows:

utils.save_image(tensor = imgs,     # tensor of images to save, shape = [n, C, H, W]
                 fp = 'test.jpg',   # save path
                 nrow = 5,          # number of images per row in the stitched grid
                 padding = 1,       # spacing between images in the stitched grid
                 normalize = True,  # whether to normalize; outputs often come from tanh, so normalization is needed
                 range = (-1,1))    # value range used for normalization
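
  As for running image writing in parallel with training, below is a minimal sketch of one possible approach using a background thread and a queue; this is my own illustration and not part of the timed experiments above:

import threading, queue
from torchvision import utils

save_queue = queue.Queue()

def _saver():
  # consume (tensor, path) pairs and write them without blocking the training loop
  while True:
    img, path = save_queue.get()
    if img is None:  # push (None, None) onto the queue to stop the worker
      break
    utils.save_image(img, path)
    save_queue.task_done()

worker = threading.Thread(target=_saver, daemon=True)
worker.start()

# inside the training loop, hand the tensor off and keep training:
# save_queue.put((imgs[0].detach().cpu(), write_path + 'sample.jpg'))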

Origin: blog.csdn.net/qq_37189298/article/details/109699749