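TensorFlow Image Data Preprocessing

Preface

The examples in earlier posts all fed the raw pixel matrix of each image directly into the model. Preprocessing the images before they are fed in helps keep irrelevant factors from influencing the model. In most image recognition problems, image preprocessing can improve the model's accuracy.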

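1. Image encoding and decoding

An RGB image can be viewed as a three-dimensional matrix in which each element is the brightness of one color at one position in the image. When an image is stored, however, the file does not record these numbers directly; it records the result of compressing and encoding them. The image therefore has to be decoded before use. TensorFlow provides functions for encoding and decoding JPEG and PNG images; the decoders are tf.image.decode_jpeg() and tf.image.decode_png(). The code is as follows: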

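import matplotlib.pyplot as plt
import tensorflow as tf

# Read the encoded bytes. "path/to/picture.jpg" is a placeholder for your own JPEG file.
image_raw_data = tf.gfile.GFile("path/to/picture.jpg", "rb").read()

with tf.Session() as sess:
    # Decode the JPEG bytes into a 3-D tensor laid out as [height, width, channels].
    img_data = tf.image.decode_jpeg(image_raw_data)

    # Print the decoded three-dimensional matrix.
    print(img_data.eval())
    img_data.set_shape([1797, 2673, 3])  # size of the example image used in this post
    print(img_data.get_shape())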

Visualizing the image:

# Display the decoded image with matplotlib
with tf.Session() as sess:
    plt.imshow(img_data.eval())
    plt.show()
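
To make the "three-dimensional matrix" more concrete, here is a small sketch that uses a NumPy array as a stand-in for the decoded tensor. The 2x3 image and its pixel values are made up for illustration; only the [height, width, channels] layout is meant to mirror what the decoder returns for an RGB image.

```python
import numpy as np

# A made-up 2x3 RGB image laid out as [height, width, channels], like the
# decoded tensor above. Values are uint8 brightness levels in 0..255.
image = np.zeros((2, 3, 3), dtype=np.uint8)
image[0, 0] = [255, 0, 0]    # top-left pixel: pure red
image[1, 2] = [0, 0, 255]    # bottom-right pixel: pure blue

height, width, channels = image.shape
red_channel = image[:, :, 0]  # brightness of red at every position

print(height, width, channels)
print(red_channel)
```

Indexing with image[row, column] gives the three color values of one pixel, and image[:, :, c] gives one color channel for the whole picture.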

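2. Resizing images

In general, the images we collect come in different sizes, but the number of input nodes of a neural network is fixed. So before the pixels of an image are fed to the network, the images must first be brought to a uniform size.

(1) Resize with an algorithm that tries to preserve as much of the original image's information as possible. TensorFlow provides the tf.image.resize_images() function for this.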

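# Resize the image
with tf.Session() as sess:
    # If integer data in the 0-255 range is passed to resize_images directly, the output
    # is real-valued data in the 0-255 range, which is awkward for later processing.
    # It is better to convert the image to real values in the 0-1 range before resizing.
    image_float = tf.image.convert_image_dtype(img_data, tf.float32)

    # method selects the resizing algorithm:
    # 0 = bilinear interpolation, 1 = nearest neighbor, 2 = bicubic interpolation, 3 = area interpolation
    resized = tf.image.resize_images(image_float, [300, 300], method=0)

    plt.imshow(resized.eval())
    plt.show()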
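
The two steps above can be sketched in plain NumPy to show what is going on numerically. This is a simplified illustration, not TensorFlow's implementation: to_float01 and resize_nearest are hypothetical helper names, and only the nearest-neighbor method is sketched because it is the easiest one to follow.

```python
import numpy as np

def to_float01(image_uint8):
    # For uint8 input, converting to float divides by the largest uint8
    # value so that the result lies in [0, 1].
    return image_uint8.astype(np.float32) / 255.0

def resize_nearest(image, new_height, new_width):
    # Simplified nearest-neighbor resize: each output pixel copies the
    # source pixel at floor(output_index * old_size / new_size).
    height, width = image.shape[:2]
    rows = np.arange(new_height) * height // new_height
    cols = np.arange(new_width) * width // new_width
    return image[rows][:, cols]

# A made-up 4x6 RGB image with values 0..71.
image = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)
resized = resize_nearest(to_float01(image), 2, 3)
print(resized.shape, resized.dtype, resized.min(), resized.max())
```

Because the conversion happens first, the resized values stay in the 0-1 range, which is the point of the comment in the TensorFlow code above.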

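(2) Crop or pad the image

# Crop and pad the image
with tf.Session() as sess:
    # The target is smaller than the 300x300 input, so the center is cropped out.
    cropped = tf.image.resize_image_with_crop_or_pad(resized, 100, 100)
    # The target is larger than the input, so the image is padded with zeros around it.
    padded = tf.image.resize_image_with_crop_or_pad(resized, 1000, 1000)
    plt.imshow(cropped.eval())
    plt.show()
    plt.imshow(padded.eval())
    plt.show()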
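
As a rough picture of what crop-or-pad does, the following NumPy sketch crops around the center when the image is too large and pads with zeros when it is too small. The helper names are hypothetical, and the exact offsets used for odd size differences are an assumption rather than something taken from the TensorFlow source.

```python
import numpy as np

def crop_or_pad_axis(image, target, axis):
    # Center-crop when the image is larger than target along `axis`,
    # otherwise zero-pad as evenly as possible on both sides.
    size = image.shape[axis]
    if size > target:
        start = (size - target) // 2
        index = [slice(None)] * image.ndim
        index[axis] = slice(start, start + target)
        return image[tuple(index)]
    before = (target - size) // 2
    pad = [(0, 0)] * image.ndim
    pad[axis] = (before, target - size - before)
    return np.pad(image, pad, mode="constant")

def crop_or_pad(image, target_height, target_width):
    image = crop_or_pad_axis(image, target_height, axis=0)
    return crop_or_pad_axis(image, target_width, axis=1)

# An all-ones 300x300 image makes the zero padding easy to spot.
image = np.ones((300, 300, 3), dtype=np.float32)
cropped = crop_or_pad(image, 100, 100)
padded = crop_or_pad(image, 1000, 1000)
print(cropped.shape, padded.shape)
```

In the padded result the original pixels sit in the middle and everything around them is zero, which matplotlib shows as a black border.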

(3) Adjust the image size by ratio

# Crop the central region, keeping 50% of the height and 50% of the width
with tf.Session() as sess:
    central_cropped = tf.image.central_crop(resized, 0.5)
    plt.imshow(central_cropped.eval())
    plt.show()
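
The ratio applies to each side of the image, not to its area. A minimal NumPy sketch of the arithmetic (central_crop here is a local helper written for illustration, not the TensorFlow function):

```python
import numpy as np

def central_crop(image, fraction):
    # Keep the central `fraction` of the height and of the width.
    height, width = image.shape[:2]
    top = int((height - height * fraction) / 2)
    left = int((width - width * fraction) / 2)
    return image[top:height - top, left:width - left]

image = np.zeros((300, 300, 3), dtype=np.float32)
central = central_crop(image, 0.5)
print(central.shape)
```

For the 300x300 image used above, a ratio of 0.5 gives a 150x150 result, which is a quarter of the original area.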

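3. Flipping images

In many image recognition problems, flipping an image does not affect the recognition result, so flipped copies can be generated during preprocessing to enlarge the training set.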

# Flip the image
with tf.Session() as sess:
    # Flip vertically (upside down)
    flipped1 = tf.image.flip_up_down(img_data)
    # Flip horizontally (left to right)
    flipped2 = tf.image.flip_left_right(img_data)

    # Flip along the diagonal (transpose)
    transposed = tf.image.transpose_image(resized)
    #plt.imshow(transposed.eval())
    #plt.show()

    # Flip vertically with 50% probability.
    flipped = tf.image.random_flip_up_down(img_data)
    # Flip horizontally with 50% probability.
    flipped = tf.image.random_flip_left_right(img_data)
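
On a tiny array the flips are easy to check by eye. The sketch below reproduces them with NumPy slicing on a made-up 2x3 single-channel image; random_flip_left_right is a local helper written for this example, and the seed is arbitrary.

```python
import numpy as np

# A made-up 2x3 single-channel image so the effect is easy to read.
image = np.array([[1, 2, 3],
                  [4, 5, 6]]).reshape(2, 3, 1)

flipped_up_down = image[::-1, :, :]          # reverse the row order
flipped_left_right = image[:, ::-1, :]       # reverse the column order
transposed = np.transpose(image, (1, 0, 2))  # swap height and width

def random_flip_left_right(image, rng):
    # Flip with probability 0.5, otherwise return the image unchanged.
    return image[:, ::-1, :] if rng.random() < 0.5 else image

rng = np.random.default_rng(0)
augmented = [random_flip_left_right(image, rng) for _ in range(4)]

print(flipped_up_down[:, :, 0])
print(flipped_left_right[:, :, 0])
print(transposed[:, :, 0])
```

During training, the random variants are the useful ones: each pass over the data sees a slightly different version of the same picture.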

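4. Adjusting image color

Color preprocessing serves a purpose similar to flipping: changes in brightness, contrast, hue, and saturation usually do not change what an image shows, so adjusting them lets the training data cover more conditions.

(1) Adjust brightness and contrast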

with tf.Session() as sess:
    # Convert the image to real values before a series of adjustments to preserve precision.
    image_float = tf.image.convert_image_dtype(img_data, tf.float32)

    # Decrease the brightness by 0.5.
    #adjusted = tf.image.adjust_brightness(image_float, -0.5)

    # Increase the brightness by 0.5.
    #adjusted = tf.image.adjust_brightness(image_float, 0.5)

    # Randomly adjust the brightness by a value in [-max_delta, max_delta).
    adjusted = tf.image.random_brightness(image_float, max_delta=0.5)

    # Scale the contrast by a factor of -5 (the argument is a multiplier, not an offset).
    #adjusted = tf.image.adjust_contrast(image_float, -5)

    # Scale the contrast by a factor of 5.
    #adjusted = tf.image.adjust_contrast(image_float, 5)

    # Randomly scale the contrast by a factor in [lower, upper].
    #adjusted = tf.image.random_contrast(image_float, lower, upper)

    # Before the final output, clip the real values to the 0-1 range.
    adjusted = tf.clip_by_value(adjusted, 0.0, 1.0)
    plt.imshow(adjusted.eval())
    plt.show()
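
The difference between the two adjustments is easier to see in numbers: brightness adds a constant to every pixel, while contrast scales each pixel's distance from the channel mean. The NumPy sketch below uses a made-up 2x2 single-channel image, and the two functions are local helpers written for illustration.

```python
import numpy as np

def adjust_brightness(image, delta):
    # Brightness is an additive shift applied to every pixel.
    return image + delta

def adjust_contrast(image, factor):
    # Contrast scales each pixel's distance from its channel mean.
    mean = image.mean(axis=(0, 1), keepdims=True)
    return (image - mean) * factor + mean

image = np.array([[[0.2], [0.4]],
                  [[0.6], [0.8]]], dtype=np.float32)

# Clip afterwards, as in the TensorFlow code above, to stay within 0-1.
brighter = np.clip(adjust_brightness(image, 0.5), 0.0, 1.0)
stronger = np.clip(adjust_contrast(image, 5.0), 0.0, 1.0)
inverted = adjust_contrast(image, -1.0)

print(brighter[:, :, 0])
print(stronger[:, :, 0])
print(inverted[:, :, 0])
```

A negative contrast factor mirrors the pixel values around the mean, which is why the image looks inverted. It also shows why the clipping step matters: without it, many values would fall outside the displayable range.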

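(2) Adjust hue and saturation

with tf.Session() as sess:
    # Convert the image to real values before a series of adjustments to preserve precision.
    image_float = tf.image.convert_image_dtype(img_data, tf.float32)

    # Shift the hue by 0.1; the commented lines try other shifts.
    adjusted = tf.image.adjust_hue(image_float, 0.1)
    #adjusted = tf.image.adjust_hue(image_float, 0.3)
    #adjusted = tf.image.adjust_hue(image_float, 0.6)
    #adjusted = tf.image.adjust_hue(image_float, 0.9)

    # Randomly shift the hue by a value in [-max_delta, max_delta]. max_delta must be in [0, 0.5].
    #adjusted = tf.image.random_hue(image_float, max_delta)

    # Scale the saturation by a factor of -5 (the argument is a multiplier, not an offset).
    #adjusted = tf.image.adjust_saturation(image_float, -5)
    # Scale the saturation by a factor of 5.
    #adjusted = tf.image.adjust_saturation(image_float, 5)
    # Randomly scale the saturation by a factor in [lower, upper].
    #adjusted = tf.image.random_saturation(image_float, lower, upper)

    # Standardize the three-dimensional matrix of one image to zero mean and unit variance.
    # (Older TensorFlow releases called this function per_image_whitening.)
    #adjusted = tf.image.per_image_standardization(image_float)

    # Before the final output, clip the real values to the 0-1 range.
    adjusted = tf.clip_by_value(adjusted, 0.0, 1.0)
    plt.imshow(adjusted.eval())
    plt.show()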
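
Hue and saturation are defined in HSV color space, so both adjustments amount to converting a pixel to HSV, changing one component, and converting back. The sketch below does this for a single pixel with the standard-library colorsys module and adds a NumPy version of per-image standardization. All three functions are local helpers written for illustration, and the random test image is made up.

```python
import colorsys
import numpy as np

def shift_hue(rgb, delta):
    # Rotate the hue by `delta`; hue wraps around at 1.0.
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return colorsys.hsv_to_rgb((h + delta) % 1.0, s, v)

def scale_saturation(rgb, factor):
    # Multiply the saturation by `factor`, clipped to [0, 1].
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return colorsys.hsv_to_rgb(h, min(max(s * factor, 0.0), 1.0), v)

def standardize(image):
    # Zero mean and unit variance over all values of one image. The floor on
    # the standard deviation avoids dividing by zero for uniform images.
    stddev = max(image.std(), 1.0 / np.sqrt(image.size))
    return (image - image.mean()) / stddev

red = (1.0, 0.0, 0.0)
print(shift_hue(red, 1.0 / 3.0))   # red rotated a third of the way around the color wheel
print(scale_saturation(red, 0.0))  # all saturation removed

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))    # made-up image with values in [0, 1)
standardized = standardize(image)
print(standardized.mean(), standardized.std())
```

Note that standardized values are no longer confined to the 0-1 range, so clipping them for display throws information away; standardization is meant for the data fed to the model rather than for viewing.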

5. Handling bounding boxes

In many image recognition problems, the objects of interest in an image are marked with bounding boxes.

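with tf.Session() as sess:
    # Two annotated boxes in relative [y_min, x_min, y_max, x_max] coordinates.
    boxes = tf.constant([[[0.05, 0.05, 0.9, 0.7], [0.35, 0.47, 0.5, 0.56]]])

    # Convert to float; draw_bounding_boxes below requires a float image.
    image_float = tf.image.convert_image_dtype(img_data, tf.float32)

    # Randomly pick a crop window that covers at least 40% of some annotated box.
    begin, size, bbox_for_draw = tf.image.sample_distorted_bounding_box(
        tf.shape(image_float), bounding_boxes=boxes, min_object_covered=0.4)

    # The cropped image.
    distorted_image = tf.slice(image_float, begin, size)

    # Draw the crop window on the original image. The original resolution is large
    # (2673x1797), so the box outline is usually too thin to see in Jupyter Notebook;
    # the image is shrunk first to make the demonstration easier to read.
    image_small = tf.image.resize_images(image_float, [180, 267], method=0)
    batched_img = tf.expand_dims(image_small, 0)
    image_with_box = tf.image.draw_bounding_boxes(batched_img, bbox_for_draw)

    # Evaluate everything in one run: each separate eval() would draw a new random
    # sample, so the crop and the drawn box would no longer match.
    cropped, box, boxed = sess.run([distorted_image, bbox_for_draw, image_with_box])
    print(box)
    plt.imshow(cropped)
    plt.show()
    plt.imshow(boxed[0])
    plt.show()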
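
The box coordinates above are fractions of the image size, in the order [y_min, x_min, y_max, x_max]. The plain-Python sketch below converts one of the boxes to pixel coordinates for the 1797x2673 example image and shows how a coverage fraction, like the one min_object_covered constrains, can be computed. Both functions and the crop window are hypothetical, written only to illustrate the idea.

```python
def box_to_pixels(box, image_height, image_width):
    # box is [y_min, x_min, y_max, x_max], relative to the image size.
    y_min, x_min, y_max, x_max = box
    return (int(y_min * image_height), int(x_min * image_width),
            int(y_max * image_height), int(x_max * image_width))

def covered_fraction(box, crop):
    # Fraction of the area of `box` that lies inside `crop` (same format).
    height = max(0.0, min(box[2], crop[2]) - max(box[0], crop[0]))
    width = max(0.0, min(box[3], crop[3]) - max(box[1], crop[1]))
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    return height * width / box_area

small_box = [0.35, 0.47, 0.5, 0.56]
print(box_to_pixels(small_box, 1797, 2673))

crop = [0.3, 0.4, 0.6, 0.5]  # hypothetical crop window
print(covered_fraction(small_box, crop))
```

This crop contains only about a third of the small box, so with min_object_covered=0.4 it would have to cover enough of the other box instead to count as a valid sample.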

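Those are the basic operations for preprocessing image data with TensorFlow. A complete code sample will be uploaded to GitHub later; it covers the whole process, from cropping an image region, to resizing, to flipping and color adjustment.

That's all~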

2018.06.10