Primary interpretation of the DIP module in the IA-YOLO project

The IA-YOLO project comes from the paper Image-Adaptive YOLO for Object Detection in Adverse Weather Conditions, which proposes end-to-end joint learning of CNN-PP and YOLOv3 so that CNN-PP learns, in a weakly supervised manner, the DIP settings that best enhance images for detection. IA-YOLO can adaptively process images under both normal and adverse weather conditions. After reading the paper, I found that it mainly presents the overall effect of IA-YOLO and says little about the DIP and CNN-PP modules themselves, so I went through the source code to analyze their implementation.

The analysis shows that IA-YOLO optimizes the output of the CNN-PP module with an MSE loss between filtered_image_batch and input_data_clean. Therefore, CNN-PP + DIP can in fact be separated from IA-YOLO and used on their own as a data-enhancement module. If you plan to adopt the IA-YOLO project, you should also compare it against other image-enhancement modules.

The goal of this interpretation is to use the DIP module on its own and add it to one's own object detection model. Since I work in PyTorch, I did not run the code in the IA-YOLO project; instead, the core parts are extracted here according to its execution logic.

1. Using the IA-YOLO project

1.1 Installation commands

$ git clone https://github.com/wenyyu/Image-Adaptive-YOLO.git  
$ cd Image-Adaptive-YOLO  
# Require python3 and tensorflow
$ pip install -r ./docs/requirements.txt

1.2 Related datasets

The following two datasets are the external datasets used by IA-YOLO (in addition to the VOC dataset and the foggy VOC dataset generated from it).
ExDark: https://github.com/cs-chan/Exclusively-Dark-Image-Dataset/tree/master/Dataset

RTTS: https://sites.google.com/view/reside-dehaze-datasets/reside-%CE%B2

1.3 Basic use

Train and Evaluate on the datasets

  1. Download VOC PASCAL trainval and test data
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar

Extract all of these tars into one directory and rename them so that it has the following basic structure.


VOC           # path:  /home/lwy/work/code/tensorflow-yolov3/data/VOC
├── test
|    └──VOCdevkit
|        └──VOC2007 (from VOCtest_06-Nov-2007.tar)
└── train
     └──VOCdevkit
         └──VOC2007 (from VOCtrainval_06-Nov-2007.tar)
         └──VOC2012 (from VOCtrainval_11-May-2012.tar)
                     
$ python scripts/voc_annotation.py
  2. Generate the Voc_foggy_train and Voc_foggy_val datasets offline
# generate ten levels' foggy training images and val images, respectively
$ python ./core/data_make.py 
  3. Edit core/config.py to configure the paths
--vocfog_traindata_dir  = '/data/vdd/liuwenyu/data_vocfog/train/JPEGImages/'
--vocfog_valdata_dir    = '/data/vdd/liuwenyu/data_vocfog/val/JPEGImages/'
--train_path            = './data/dataset_fog/voc_norm_train.txt'
--test_path             = './data/dataset_fog/voc_norm_test.txt'
--class_name            = './data/classes/vocfog.names'
  4. Train and evaluate
$ python train.py # we trained our model from scratch.  
$ python evaluate.py   
$ cd ./experiments/.../mAP & python main.py 
  5. For more details on preparing the dataset or training with your own dataset, refer to the tensorflow-yolov3 implementation.

2. Implementation of CNN-PP and DIP modules

The IA-YOLO paper covers a lot of data-processing material, such as generating foggy images, the differentiable DIP module, and the CNN-PP module. Here we mainly introduce the implementation and training of the differentiable DIP module and the CNN-PP module.

2.1 Main process

As mentioned in the paper, the filtering parameters estimated from a downsampled image also apply to the original image. To save computation, CNN-PP therefore takes a low-resolution copy of the input image and outputs the parameters used by the DIP module to restore the image; the DIP module then applies those parameters to the full-resolution image, and the enhanced result is finally handed to the YOLOv3 model for prediction.
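To make this flow concrete, here is a minimal sketch of the forward pass. It is illustrative only: cnn_pp, dip_filters and yolov3 are placeholder callables, not the project's actual function names.

```python
import cv2
import numpy as np

def ia_yolo_forward(image, cnn_pp, dip_filters, yolov3):
    """Hypothetical sketch of the IA-YOLO forward pass; all callables are placeholders."""
    # 1. CNN-PP only sees a low-resolution copy of the input to keep its cost small.
    low_res = cv2.resize(image, (256, 256), interpolation=cv2.INTER_LINEAR)
    params = cnn_pp(low_res)              # small parameter vector (14 or 15 values)

    # 2. The DIP module applies its differentiable filters in sequence,
    #    each filter reading its own slice of the parameter vector.
    enhanced = image
    for f in dip_filters:                 # e.g. Defog, WB, Gamma, Tone, Contrast, USM
        enhanced = f(enhanced, params)

    # 3. The enhanced full-resolution image is handed to the detector.
    return yolov3(enhanced)
```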

2.2 Main code

To make comparative experiments easier, IA-YOLO uses isp_flag as a configuration item to decide whether the CNN-PP and DIP modules are used, both in yolov3.py (which includes the defog filter) and in yolov3_lowlight.py (which does not include the defog filter and instead applies random brightness adjustments to the images). The code is as follows:

# Code source: https://github.com/wenyyu/Image-Adaptive-YOLO/blob/main/core/yolov3_lowlight.py
 def __build_nework(self, input_data, isp_flag, input_data_clean):

        filtered_image_batch = input_data
        self.filter_params = input_data
        filter_imgs_series = []

        if isp_flag:
            with tf.variable_scope('extract_parameters_2'):
                input_data = tf.image.resize_images(input_data, [256, 256], method=tf.image.ResizeMethod.BILINEAR)  # downsample the original image
                filter_features = common.extract_parameters_2(input_data, cfg, self.trainable)  # CNN-PP predicts the DIP module's parameters

            # filter_features = tf.random_normal([1, 10], 0.5, 0.1)

            filters = cfg.filters
            filters = [x(input_data, cfg) for x in filters]  # instantiate the DIP filters
            filter_parameters = []
            for j, filter in enumerate(filters):
                with tf.variable_scope('filter_%d' % j):
                    print('    creating filter:', j, 'name:', str(filter.__class__), 'abbr.',
                          filter.get_short_name())
                    print('      filter_features:', filter_features.shape)

                    filtered_image_batch, filter_parameter = filter.apply(
                        filtered_image_batch, filter_features)  # each DIP filter enhances the image with its CNN-PP parameters
                    filter_parameters.append(filter_parameter)
                    filter_imgs_series.append(filtered_image_batch)


                    print('      output:', filtered_image_batch.shape)
            self.filter_params = filter_parameters
        self.image_isped = filtered_image_batch
        self.filter_imgs_series = filter_imgs_series

        recovery_loss = tf.reduce_sum(tf.pow(filtered_image_batch - input_data_clean, 2.0))#/(2.0 * batch_size)
        # from here on, the normal yolov3 forward/training flow continues
        input_data = filtered_image_batch

The code inside the if isp_flag: branch performs the CNN-PP parameter prediction and the DIP filtering; the code outside it is the ordinary YOLOv3 pipeline, which simply consumes filtered_image_batch as its input.

2.3 Implementation of the CNN-PP module

Complete code: https://github.com/wenyyu/Image-Adaptive-YOLO/blob/main/core/common.py

def extract_parameters_2(net, cfg, trainable):
    output_dim = cfg.num_filter_parameters
    # net = net - 0.5
    min_feature_map_size = 4
    print('extract_parameters_2 CNN:')
    channels = 16
    print('    ', str(net.get_shape()))
    net = convolutional(net, filters_shape=(3, 3, 3, channels), trainable=trainable, name='ex_conv0',
                        downsample=True, activate=True, bn=False)
    net = convolutional(net, filters_shape=(3, 3, channels, 2*channels), trainable=trainable, name='ex_conv1',
                        downsample=True, activate=True, bn=False)
    net = convolutional(net, filters_shape=(3, 3, 2*channels, 2*channels), trainable=trainable, name='ex_conv2',
                        downsample=True, activate=True, bn=False)
    net = convolutional(net, filters_shape=(3, 3, 2*channels, 2*channels), trainable=trainable, name='ex_conv3',
                        downsample=True, activate=True, bn=False)
    net = convolutional(net, filters_shape=(3, 3, 2*channels, 2*channels), trainable=trainable, name='ex_conv4',
                        downsample=True, activate=True, bn=False)
    net = tf.reshape(net, [-1, 2048])
    features = ly.fully_connected(
        net,
        64,
        scope='fc1',
        activation_fn=lrelu,
        weights_initializer=tf.contrib.layers.xavier_initializer())
    filter_features = ly.fully_connected(
        features,
        output_dim,
        scope='fc2',
        activation_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer())
    return filter_features

Looking at the code, CNN-PP is an ordinary small CNN, and its output dimensionality is determined by cfg.num_filter_parameters, which is 14 (without the defog filter) or 15 (with the defog filter); a shape trace is given below.
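As a sanity check of the tf.reshape(net, [-1, 2048]) line, here is my assumed shape trace for the 256×256 resized input used in __build_nework (each ex_conv layer downsamples by 2):

```python
# Assumed shape trace for extract_parameters_2 with a (N, 256, 256, 3) input:
#   ex_conv0 (stride 2): (N, 128, 128, 16)
#   ex_conv1 (stride 2): (N,  64,  64, 32)
#   ex_conv2 (stride 2): (N,  32,  32, 32)
#   ex_conv3 (stride 2): (N,  16,  16, 32)
#   ex_conv4 (stride 2): (N,   8,   8, 32)   # 8 * 8 * 32 = 2048 -> tf.reshape(net, [-1, 2048])
#   fc1:                 (N, 64)
#   fc2:                 (N, cfg.num_filter_parameters)  # 14 or 15
```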
In addition, config.py defines a number of other DIP and CNN-PP parameter items:

cfg.filters = [
    DefogFilter, ImprovedWhiteBalanceFilter,  GammaFilter,
    ToneFilter, ContrastFilter, UsmFilter
]
cfg.num_filter_parameters = 15

cfg.defog_begin_param = 0

cfg.wb_begin_param = 1
cfg.gamma_begin_param = 4
cfg.tone_begin_param = 5
cfg.contrast_begin_param = 13
cfg.usm_begin_param = 14


cfg.curve_steps = 8
cfg.gamma_range = 3
cfg.exposure_range = 3.5
cfg.wb_range = 1.1
cfg.color_curve_range = (0.90, 1.10)
cfg.lab_curve_range = (0.90, 1.10)
cfg.tone_curve_range = (0.5, 2)
cfg.defog_range = (0.1, 1.0)
cfg.usm_range = (0.0, 5)



# Masking is DISABLED
cfg.masking = False
cfg.minimum_strength = 0.3
cfg.maximum_sharpness = 1
cfg.clamp = False

###########################################################################
# CNN Parameters
###########################################################################
cfg.source_img_size = 64
cfg.base_channels = 32
cfg.dropout_keep_prob = 0.5
# G and C use the same feed dict?
cfg.share_feed_dict = True
cfg.shared_feature_extractor = True
cfg.fc1_size = 128
cfg.bnw = False
# number of filters for the first convolutional layers for all networks
#                      (stochastic/deterministic policy, critic, value)
cfg.feature_extractor_dims = 4096
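Putting the begin_param offsets above together with each filter's parameter count, the 15 CNN-PP outputs (with the defog filter enabled) can be read as the layout below. The split_filter_parameters helper is my own illustration, not code from the repository:

```python
import numpy as np

# Hypothetical helper that slices the 15-dim CNN-PP output according to the
# begin_param offsets in config.py (defog filter enabled).
def split_filter_parameters(p):
    assert p.shape[-1] == 15
    return {
        'defog':    p[..., 0:1],    # cfg.defog_begin_param = 0
        'wb':       p[..., 1:4],    # cfg.wb_begin_param = 1, 3 channels
        'gamma':    p[..., 4:5],    # cfg.gamma_begin_param = 4
        'tone':     p[..., 5:13],   # cfg.tone_begin_param = 5, cfg.curve_steps = 8
        'contrast': p[..., 13:14],  # cfg.contrast_begin_param = 13
        'usm':      p[..., 14:15],  # cfg.usm_begin_param = 14
    }

print(split_filter_parameters(np.zeros((1, 15))).keys())
```

Without the defog filter the vector shrinks to 14 values and the offsets shift accordingly.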

2.4 Implementation of the DIP module

The DIP module is essentially a stack of differentiable filters implementing Defog, White Balance (WB), Gamma, Contrast, Tone and Sharpen (USM). The implementation lives mainly in filters.py, with some helper code in util_filters.py. The author's repository contains both filters.py and filters_lowlight.py; there is no essential difference between them: filters.py is the version with the defog filter (and keeps some commented-out code), while filters_lowlight.py is a later update without the defog filter (with the old comments removed). A minimal illustration of what "differentiable" means here is given right below.
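Because every filter is built from TensorFlow ops applied to the image, gradients of any downstream loss flow back through the filters into CNN-PP. A minimal TF1-style sketch of that idea (not code from the repository; the gamma mapping only mimics GammaFilter and tanh_range):

```python
import numpy as np
import tensorflow as tf

# A gamma-style filter built from tf ops, driven by a tiny stand-in for CNN-PP.
img = tf.placeholder(tf.float32, [None, 64, 64, 3])
raw_param = tf.layers.dense(tf.reshape(img, [-1, 64 * 64 * 3]), 1)   # stand-in for CNN-PP
gamma = tf.exp(tf.tanh(raw_param) * np.log(3.0))                     # tanh_range-like mapping into [1/3, 3]
out = tf.pow(tf.maximum(img, 1e-4), gamma[:, None, None, :])         # GammaFilter.process
loss = tf.reduce_sum(tf.square(out - img))                           # recovery-style loss
grads = tf.gradients(loss, tf.trainable_variables())                 # gradients reach the parameter predictor
```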

filters.py implementation

Part of the code below uses the TensorFlow API to process images, but the Filter subclasses are not tf.layers objects. The TensorFlow ops are used only so that the image processing stays differentiable with respect to the CNN-PP outputs.

import tensorflow as tf
import numpy as np
import tensorflow.contrib.layers as ly
from util_filters import lrelu, rgb2lum, tanh_range, lerp
import cv2
import math
class Filter:

  def __init__(self, net, cfg):
    self.cfg = cfg
    # self.height, self.width, self.channels = list(map(int, net.get_shape()[1:]))

    # Specified in child classes
    self.num_filter_parameters = None
    self.short_name = None
    self.filter_parameters = None

  def get_short_name(self):
    assert self.short_name
    return self.short_name

  def get_num_filter_parameters(self):
    assert self.num_filter_parameters
    return self.num_filter_parameters

  def get_begin_filter_parameter(self):
    return self.begin_filter_parameter

  def extract_parameters(self, features):
    # output_dim = self.get_num_filter_parameters(
    # ) + self.get_num_mask_parameters()
    # features = ly.fully_connected(
    #     features,
    #     self.cfg.fc1_size,
    #     scope='fc1',
    #     activation_fn=lrelu,
    #     weights_initializer=tf.contrib.layers.xavier_initializer())
    # features = ly.fully_connected(
    #     features,
    #     output_dim,
    #     scope='fc2',
    #     activation_fn=None,
    #     weights_initializer=tf.contrib.layers.xavier_initializer())
    return features[:, self.get_begin_filter_parameter():(self.get_begin_filter_parameter() + self.get_num_filter_parameters())], \
           features[:, self.get_begin_filter_parameter():(self.get_begin_filter_parameter() + self.get_num_filter_parameters())]

  # Should be implemented in child classes
  def filter_param_regressor(self, features):
    assert False

  # Process the whole image, without masking
  # Should be implemented in child classes
  def process(self, img, param, defog, IcA):
    assert False

  def debug_info_batched(self):
    return False

  def no_high_res(self):
    return False

  # Apply the whole filter with masking
  def apply(self,
            img,
            img_features=None,
            defog_A=None,
            IcA=None,
            specified_parameter=None,
            high_res=None):
    assert (img_features is None) ^ (specified_parameter is None)
    if img_features is not None:
      filter_features, mask_parameters = self.extract_parameters(img_features)
      filter_parameters = self.filter_param_regressor(filter_features)
    else:
      assert not self.use_masking()
      filter_parameters = specified_parameter
      mask_parameters = tf.zeros(
          shape=(1, self.get_num_mask_parameters()), dtype=np.float32)
    if high_res is not None:
      # working on high res...
      pass
    debug_info = {
    
    }
    # We only debug the first image of this batch
    if self.debug_info_batched():
      debug_info['filter_parameters'] = filter_parameters
    else:
      debug_info['filter_parameters'] = filter_parameters[0]
    # self.mask_parameters = mask_parameters
    # self.mask = self.get_mask(img, mask_parameters)
    # debug_info['mask'] = self.mask[0]
    #low_res_output = lerp(img, self.process(img, filter_parameters), self.mask)
    low_res_output = self.process(img, filter_parameters, defog_A, IcA)

    if high_res is not None:
      if self.no_high_res():
        high_res_output = high_res
      else:
        self.high_res_mask = self.get_mask(high_res, mask_parameters)
        # high_res_output = lerp(high_res,
        #                        self.process(high_res, filter_parameters, defog, IcA),
        #                        self.high_res_mask)
    else:
      high_res_output = None
    #return low_res_output, high_res_output, debug_info
    return low_res_output, filter_parameters

  def use_masking(self):
    return self.cfg.masking

  def get_num_mask_parameters(self):
    return 6

  # Input: no need for tanh or sigmoid
  # Closer to 1 values are applied by filter more strongly
  # no additional TF variables inside
  def get_mask(self, img, mask_parameters):
    if not self.use_masking():
      print('* Masking Disabled')
      return tf.ones(shape=(1, 1, 1, 1), dtype=tf.float32)
    else:
      print('* Masking Enabled')
    with tf.name_scope(name='mask'):
      # Six parameters for one filter
      filter_input_range = 5
      assert mask_parameters.shape[1] == self.get_num_mask_parameters()
      mask_parameters = tanh_range(
          l=-filter_input_range, r=filter_input_range,
          initial=0)(mask_parameters)
      size = list(map(int, img.shape[1:3]))
      grid = np.zeros(shape=[1] + size + [2], dtype=np.float32)

      shorter_edge = min(size[0], size[1])
      for i in range(size[0]):
        for j in range(size[1]):
          grid[0, i, j,
               0] = (i + (shorter_edge - size[0]) / 2.0) / shorter_edge - 0.5
          grid[0, i, j,
               1] = (j + (shorter_edge - size[1]) / 2.0) / shorter_edge - 0.5
      grid = tf.constant(grid)
      # Ax + By + C * L + D
      inp = grid[:, :, :, 0, None] * mask_parameters[:, None, None, 0, None] + \
            grid[:, :, :, 1, None] * mask_parameters[:, None, None, 1, None] + \
            mask_parameters[:, None, None, 2, None] * (rgb2lum(img) - 0.5) + \
            mask_parameters[:, None, None, 3, None] * 2
      # Sharpness and inversion
      inp *= self.cfg.maximum_sharpness * mask_parameters[:, None, None, 4,
                                                          None] / filter_input_range
      mask = tf.sigmoid(inp)
      # Strength
      mask = mask * (
          mask_parameters[:, None, None, 5, None] / filter_input_range * 0.5 +
          0.5) * (1 - self.cfg.minimum_strength) + self.cfg.minimum_strength
      print('mask', mask.shape)
    return mask

  # def visualize_filter(self, debug_info, canvas):
  #   # Visualize only the filter information
  #   assert False

  def visualize_mask(self, debug_info, res):
    return cv2.resize(
        debug_info['mask'] * np.ones((1, 1, 3), dtype=np.float32),
        dsize=res,
        interpolation=cv2.cv2.INTER_NEAREST)

  def draw_high_res_text(self, text, canvas):
    cv2.putText(
        canvas,
        text, (30, 128),
        cv2.FONT_HERSHEY_SIMPLEX,
        0.8, (0, 0, 0),
        thickness=5)
    return canvas


class ExposureFilter(Filter):#gamma_param is 2*exposure_range + exposure_range

  def __init__(self, net, cfg):
    Filter.__init__(self, net, cfg)
    self.short_name = 'E'
    self.begin_filter_parameter = cfg.exposure_begin_param
    self.num_filter_parameters = 1

  def filter_param_regressor(self, features):
    return tanh_range(
        -self.cfg.exposure_range, self.cfg.exposure_range, initial=0)(features)

  def process(self, img, param, defog, IcA):
    return img * tf.exp(param[:, None, None, :] * np.log(2))

  # def visualize_filter(self, debug_info, canvas):
  #   exposure = debug_info['filter_parameters'][0]
  #   if canvas.shape[0] == 64:
  #     cv2.rectangle(canvas, (8, 40), (56, 52), (1, 1, 1), cv2.FILLED)
  #     cv2.putText(canvas, 'EV %+.2f' % exposure, (8, 48),
  #                 cv2.FONT_HERSHEY_SIMPLEX, 0.3, (0, 0, 0))
  #   else:
  #     self.draw_high_res_text('Exposure %+.2f' % exposure, canvas)

class UsmFilter(Filter):#Usm_param is in [Defog_range]

  def __init__(self, net, cfg):
    Filter.__init__(self, net, cfg)
    self.short_name = 'UF'
    self.begin_filter_parameter = cfg.usm_begin_param
    self.num_filter_parameters = 1

  def filter_param_regressor(self, features):
    return tanh_range(*self.cfg.usm_range)(features)

  def process(self, img, param, defog_A, IcA):
    def make_gaussian_2d_kernel(sigma, dtype=tf.float32):
      radius = 12
      x = tf.cast(tf.range(-radius, radius + 1), dtype=dtype)
      k = tf.exp(-0.5 * tf.square(x / sigma))
      k = k / tf.reduce_sum(k)
      return tf.expand_dims(k, 1) * k

    kernel_i = make_gaussian_2d_kernel(5)
    print('kernel_i.shape', kernel_i.shape)
    kernel_i = tf.tile(kernel_i[:, :, tf.newaxis, tf.newaxis], [1, 1, 1, 1])

    # outputs = []
    # for channel_idx in range(3):
    #     data_c = img[:, :, :, channel_idx:(channel_idx + 1)]
    #     data_c = tf.nn.conv2d(data_c, kernel_i, [1, 1, 1, 1], 'SAME')
    #     outputs.append(data_c)

    pad_w = (25 - 1) // 2
    padded = tf.pad(img, [[0, 0], [pad_w, pad_w], [pad_w, pad_w], [0, 0]], mode='REFLECT')
    outputs = []
    for channel_idx in range(3):
        data_c = padded[:, :, :, channel_idx:(channel_idx + 1)]
        data_c = tf.nn.conv2d(data_c, kernel_i, [1, 1, 1, 1], 'VALID')
        outputs.append(data_c)

    output = tf.concat(outputs, axis=3)
    img_out = (img - output) * param[:, None, None, :] + img
    # img_out = (img - output) * 2.5 + img

    return img_out

class UsmFilter_sigma(Filter):#Usm_param is in [Defog_range]

  def __init__(self, net, cfg):
    Filter.__init__(self, net, cfg)
    self.short_name = 'UF'
    self.begin_filter_parameter = cfg.usm_begin_param
    self.num_filter_parameters = 1

  def filter_param_regressor(self, features):
    return tanh_range(*self.cfg.usm_range)(features)

  def process(self, img, param, defog_A, IcA):
    def make_gaussian_2d_kernel(sigma, dtype=tf.float32):
      radius = 12
      x = tf.cast(tf.range(-radius, radius + 1), dtype=dtype)
      k = tf.exp(-0.5 * tf.square(x / sigma))
      k = k / tf.reduce_sum(k)
      return tf.expand_dims(k, 1) * k

    kernel_i = make_gaussian_2d_kernel(param[:, None, None, :])
    print('kernel_i.shape', kernel_i.shape)
    kernel_i = tf.tile(kernel_i[:, :, tf.newaxis, tf.newaxis], [1, 1, 1, 1])

    # outputs = []
    # for channel_idx in range(3):
    #     data_c = img[:, :, :, channel_idx:(channel_idx + 1)]
    #     data_c = tf.nn.conv2d(data_c, kernel_i, [1, 1, 1, 1], 'SAME')
    #     outputs.append(data_c)

    pad_w = (25 - 1) // 2
    padded = tf.pad(img, [[0, 0], [pad_w, pad_w], [pad_w, pad_w], [0, 0]], mode='REFLECT')
    outputs = []
    for channel_idx in range(3):
        data_c = padded[:, :, :, channel_idx:(channel_idx + 1)]
        data_c = tf.nn.conv2d(data_c, kernel_i, [1, 1, 1, 1], 'VALID')
        outputs.append(data_c)

    output = tf.concat(outputs, axis=3)
    img_out = (img - output) * param[:, None, None, :] + img

    return img_out

class DefogFilter(Filter):#Defog_param is in [Defog_range]

  def __init__(self, net, cfg):
    Filter.__init__(self, net, cfg)
    self.short_name = 'DF'
    self.begin_filter_parameter = cfg.defog_begin_param
    self.num_filter_parameters = 1

  def filter_param_regressor(self, features):
    return tanh_range(*self.cfg.defog_range)(features)

  def process(self, img, param, defog_A, IcA):
    print('      defog_A:', img.shape)
    print('      defog_A:', IcA.shape)
    print('      defog_A:', defog_A.shape)

    tx = 1 - param[:, None, None, :]*IcA
    # tx = 1 - 0.5*IcA

    tx_1 = tf.tile(tx, [1, 1, 1, 3])
    return (img - defog_A[:, None, None, :])/tf.maximum(tx_1, 0.01) + defog_A[:, None, None, :]

class GammaFilter(Filter):  #gamma_param is in [-gamma_range, gamma_range]

  def __init__(self, net, cfg):
    Filter.__init__(self, net, cfg)
    self.short_name = 'G'
    self.begin_filter_parameter = cfg.gamma_begin_param
    self.num_filter_parameters = 1

  def filter_param_regressor(self, features):
    log_gamma_range = np.log(self.cfg.gamma_range)
    return tf.exp(tanh_range(-log_gamma_range, log_gamma_range)(features))

  def process(self, img, param, defog_A, IcA):
    param_1 = tf.tile(param, [1, 3])
    return tf.pow(tf.maximum(img, 0.0001), param_1[:, None, None, :])
    # return img

  # def visualize_filter(self, debug_info, canvas):
  #   gamma = debug_info['filter_parameters']
  #   cv2.rectangle(canvas, (8, 40), (56, 52), (1, 1, 1), cv2.FILLED)
  #   cv2.putText(canvas, 'G 1/%.2f' % (1.0 / gamma), (8, 48),
  #               cv2.FONT_HERSHEY_SIMPLEX, 0.3, (0, 0, 0))


class ImprovedWhiteBalanceFilter(Filter):

  def __init__(self, net, cfg):
    Filter.__init__(self, net, cfg)
    self.short_name = 'W'
    self.channels = 3
    self.begin_filter_parameter = cfg.wb_begin_param
    self.num_filter_parameters = self.channels

  def filter_param_regressor(self, features):
    log_wb_range = 0.5
    mask = np.array(((0, 1, 1)), dtype=np.float32).reshape(1, 3)
    # mask = np.array(((1, 0, 1)), dtype=np.float32).reshape(1, 3)
    print(mask.shape)
    assert mask.shape == (1, 3)
    features = features * mask
    color_scaling = tf.exp(tanh_range(-log_wb_range, log_wb_range)(features))
    # There will be no division by zero here unless the WB range lower bound is 0
    # normalize by luminance
    color_scaling *= 1.0 / (
        1e-5 + 0.27 * color_scaling[:, 0] + 0.67 * color_scaling[:, 1] +
        0.06 * color_scaling[:, 2])[:, None]
    return color_scaling

  def process(self, img, param, defog, IcA):
    return img * param[:, None, None, :]
    # return img

  # def visualize_filter(self, debug_info, canvas):
  #   scaling = debug_info['filter_parameters']
  #   s = canvas.shape[0]
  #   cv2.rectangle(canvas, (int(s * 0.2), int(s * 0.4)), (int(s * 0.8), int(
  #       s * 0.6)), list(map(float, scaling)), cv2.FILLED)


class ColorFilter(Filter):

  def __init__(self, net, cfg):
    Filter.__init__(self, net, cfg)
    self.curve_steps = cfg.curve_steps
    self.channels = int(net.shape[3])
    self.short_name = 'C'
    self.begin_filter_parameter = cfg.color_begin_param

    self.num_filter_parameters = self.channels * cfg.curve_steps

  def filter_param_regressor(self, features):
    color_curve = tf.reshape(
        features, shape=(-1, self.channels,
                         self.cfg.curve_steps))[:, None, None, :]
    color_curve = tanh_range(
        *self.cfg.color_curve_range, initial=1)(color_curve)
    return color_curve

  def process(self, img, param, defog, IcA):
    color_curve = param
    # There will be no division by zero here unless the color filter range lower bound is 0
    color_curve_sum = tf.reduce_sum(param, axis=4) + 1e-30
    total_image = img * 0
    for i in range(self.cfg.curve_steps):
      total_image += tf.clip_by_value(img - 1.0 * i / self.cfg.curve_steps, 0, 1.0 / self.cfg.curve_steps) * \
                     color_curve[:, :, :, :, i]
    total_image *= self.cfg.curve_steps / color_curve_sum
    return total_image

  # def visualize_filter(self, debug_info, canvas):
  #   curve = debug_info['filter_parameters']
  #   height, width = canvas.shape[:2]
  #   for i in range(self.channels):
  #     values = np.array([0] + list(curve[0][0][i]))
  #     values /= sum(values) + 1e-30
  #     scale = 1
  #     values *= scale
  #     for j in range(0, self.cfg.curve_steps):
  #       values[j + 1] += values[j]
  #     for j in range(self.cfg.curve_steps):
  #       p1 = tuple(
  #           map(int, (width / self.cfg.curve_steps * j, height - 1 -
  #                     values[j] * height)))
  #       p2 = tuple(
  #           map(int, (width / self.cfg.curve_steps * (j + 1), height - 1 -
  #                     values[j + 1] * height)))
  #       color = []
  #       for t in range(self.channels):
  #         color.append(1 if t == i else 0)
  #       cv2.line(canvas, p1, p2, tuple(color), thickness=1)


class ToneFilter(Filter):

  def __init__(self, net, cfg):
    Filter.__init__(self, net, cfg)
    self.curve_steps = cfg.curve_steps
    self.short_name = 'T'
    self.begin_filter_parameter = cfg.tone_begin_param

    self.num_filter_parameters = cfg.curve_steps

  def filter_param_regressor(self, features):
    tone_curve = tf.reshape(
        features, shape=(-1, 1, self.cfg.curve_steps))[:, None, None, :]
    tone_curve = tanh_range(*self.cfg.tone_curve_range)(tone_curve)
    return tone_curve

  def process(self, img, param, defog, IcA):
    # img = tf.minimum(img, 1.0)
    # param = tf.constant([[0.52, 0.53, 0.55, 1.9, 1.8, 1.7, 0.7, 0.6], [0.52, 0.53, 0.55, 1.9, 1.8, 1.7, 0.7, 0.6],
    #                       [0.52, 0.53, 0.55, 1.9, 1.8, 1.7, 0.7, 0.6], [0.52, 0.53, 0.55, 1.9, 1.8, 1.7, 0.7, 0.6],
    #                       [0.52, 0.53, 0.55, 1.9, 1.8, 1.7, 0.7, 0.6], [0.52, 0.53, 0.55, 1.9, 1.8, 1.7, 0.7, 0.6]])
    # param = tf.constant([[0.52, 0.53, 0.55, 1.9, 1.8, 1.7, 0.7, 0.6]])
    # param = tf.reshape(
    #     param, shape=(-1, 1, self.cfg.curve_steps))[:, None, None, :]

    tone_curve = param
    tone_curve_sum = tf.reduce_sum(tone_curve, axis=4) + 1e-30
    total_image = img * 0
    for i in range(self.cfg.curve_steps):
      total_image += tf.clip_by_value(img - 1.0 * i / self.cfg.curve_steps, 0, 1.0 / self.cfg.curve_steps) \
                     * param[:, :, :, :, i]
    # p_cons = [0.52, 0.53, 0.55, 1.9, 1.8, 1.7, 0.7, 0.6]
    # for i in range(self.cfg.curve_steps):
    #   total_image += tf.clip_by_value(img - 1.0 * i / self.cfg.curve_steps, 0, 1.0 / self.cfg.curve_steps) \
    #                  * p_cons[i]
    total_image *= self.cfg.curve_steps / tone_curve_sum
    img = total_image
    return img


  # def visualize_filter(self, debug_info, canvas):
  #   curve = debug_info['filter_parameters']
  #   height, width = canvas.shape[:2]
  #   values = np.array([0] + list(curve[0][0][0]))
  #   values /= sum(values) + 1e-30
  #   for j in range(0, self.curve_steps):
  #     values[j + 1] += values[j]
  #   for j in range(self.curve_steps):
  #     p1 = tuple(
  #         map(int, (width / self.curve_steps * j, height - 1 -
  #                   values[j] * height)))
  #     p2 = tuple(
  #         map(int, (width / self.curve_steps * (j + 1), height - 1 -
  #                   values[j + 1] * height)))
  #     cv2.line(canvas, p1, p2, (0, 0, 0), thickness=1)


class VignetFilter(Filter):

  def __init__(self, net, cfg):
    Filter.__init__(self, net, cfg)
    self.short_name = 'V'
    self.begin_filter_parameter = cfg.vignet_begin_param

    self.num_filter_parameters = 1

  def filter_param_regressor(self, features):
    return tf.sigmoid(features)

  def process(self, img, param):
    return img * 0  # + param[:, None, None, :]

  def get_num_mask_parameters(self):
    return 5

  # Input: no need for tanh or sigmoid
  # Closer to 1 values are applied by filter more strongly
  # no additional TF variables inside
  def get_mask(self, img, mask_parameters):
    with tf.name_scope(name='mask'):
      # Five parameters for one filter
      filter_input_range = 5
      assert mask_parameters.shape[1] == self.get_num_mask_parameters()
      mask_parameters = tanh_range(
          l=-filter_input_range, r=filter_input_range,
          initial=0)(mask_parameters)
      size = list(map(int, img.shape[1:3]))
      grid = np.zeros(shape=[1] + size + [2], dtype=np.float32)

      shorter_edge = min(size[0], size[1])
      for i in range(size[0]):
        for j in range(size[1]):
          grid[0, i, j,
               0] = (i + (shorter_edge - size[0]) / 2.0) / shorter_edge - 0.5
          grid[0, i, j,
               1] = (j + (shorter_edge - size[1]) / 2.0) / shorter_edge - 0.5
      grid = tf.constant(grid)
      # (Ax)^2 + (By)^2 + C
      inp = (grid[:, :, :, 0, None] * mask_parameters[:, None, None, 0, None]) ** 2 + \
            (grid[:, :, :, 1, None] * mask_parameters[:, None, None, 1, None]) ** 2 + \
            mask_parameters[:, None, None, 2, None] - filter_input_range
      # Sharpness and inversion
      inp *= self.cfg.maximum_sharpness * mask_parameters[:, None, None, 3,
                                                          None] / filter_input_range
      mask = tf.sigmoid(inp)
      # Strength
      mask *= mask_parameters[:, None, None, 4,
                              None] / filter_input_range * 0.5 + 0.5
      if not self.use_masking():
        print('* Masking Disabled')
        mask = mask * 0 + 1
      else:
        print('* Masking Enabled')
      print('mask', mask.shape)
    return mask

  # def visualize_filter(self, debug_info, canvas):
  #   brightness = float(debug_info['filter_parameters'][0])
  #   cv2.rectangle(canvas, (8, 40), (56, 52), (brightness, brightness,
  #                                             brightness), cv2.FILLED)
  #

class ContrastFilter(Filter):

  def __init__(self, net, cfg):
    Filter.__init__(self, net, cfg)
    self.short_name = 'Ct'
    self.begin_filter_parameter = cfg.contrast_begin_param

    self.num_filter_parameters = 1

  def filter_param_regressor(self, features):
    # return tf.sigmoid(features)
    return tf.tanh(features)

  def process(self, img, param, defog, IcA):
    luminance = tf.minimum(tf.maximum(rgb2lum(img), 0.0), 1.0)
    contrast_lum = -tf.cos(math.pi * luminance) * 0.5 + 0.5
    contrast_image = img / (luminance + 1e-6) * contrast_lum
    return lerp(img, contrast_image, param[:, :, None, None])
    # return lerp(img, contrast_image, 0.5)

  # def visualize_filter(self, debug_info, canvas):
  #   exposure = debug_info['filter_parameters'][0]
  #   cv2.rectangle(canvas, (8, 40), (56, 52), (1, 1, 1), cv2.FILLED)
  #   cv2.putText(canvas, 'Ct %+.2f' % exposure, (8, 48),
  #               cv2.FONT_HERSHEY_SIMPLEX, 0.3, (0, 0, 0))


class WNBFilter(Filter):

  def __init__(self, net, cfg):
    Filter.__init__(self, net, cfg)
    self.short_name = 'BW'
    self.begin_filter_parameter = cfg.wnb_begin_param

    self.num_filter_parameters = 1

  def filter_param_regressor(self, features):
    return tf.sigmoid(features)

  def process(self, img, param, defog, IcA):
    luminance = rgb2lum(img)
    return lerp(img, luminance, param[:, :, None, None])

  # def visualize_filter(self, debug_info, canvas):
  #   exposure = debug_info['filter_parameters'][0]
  #   cv2.rectangle(canvas, (8, 40), (56, 52), (1, 1, 1), cv2.FILLED)
  #   cv2.putText(canvas, 'B&W%+.2f' % exposure, (8, 48),
  #               cv2.FONT_HERSHEY_SIMPLEX, 0.3, (0, 0, 0))


class LevelFilter(Filter):

  def __init__(self, net, cfg):
    Filter.__init__(self, net, cfg)
    self.short_name = 'Le'
    self.begin_filter_parameter = cfg.level_begin_param

    self.num_filter_parameters = 2

  def filter_param_regressor(self, features):
    return tf.sigmoid(features)

  def process(self, img, param):
    lower = param[:, 0]
    upper = param[:, 1] + 1
    lower = lower[:, None, None, None]
    upper = upper[:, None, None, None]
    return tf.clip_by_value((img - lower) / (upper - lower + 1e-6), 0.0, 1.0)

  # def visualize_filter(self, debug_info, canvas):
  #   level = list(map(float, debug_info['filter_parameters']))
  #   level[1] += 1
  #   cv2.rectangle(canvas, (8, 40), (56, 52), (1, 1, 1), cv2.FILLED)
  #   cv2.putText(canvas, '%.2f %.2f' % tuple(level), (8, 48),
  #               cv2.FONT_HERSHEY_SIMPLEX, 0.25, (0, 0, 0))


class SaturationPlusFilter(Filter):

  def __init__(self, net, cfg):
    Filter.__init__(self, net, cfg)
    self.short_name = 'S+'
    self.begin_filter_parameter = cfg.saturation_begin_param

    self.num_filter_parameters = 1

  def filter_param_regressor(self, features):
    return tf.sigmoid(features)

  def process(self, img, param, defog, IcA):
    img = tf.minimum(img, 1.0)
    hsv = tf.image.rgb_to_hsv(img)
    s = hsv[:, :, :, 1:2]
    v = hsv[:, :, :, 2:3]
    # enhanced_s = s + (1 - s) * 0.7 * (0.5 - tf.abs(0.5 - v)) ** 2
    enhanced_s = s + (1 - s) * (0.5 - tf.abs(0.5 - v)) * 0.8
    hsv1 = tf.concat([hsv[:, :, :, 0:1], enhanced_s, hsv[:, :, :, 2:]], axis=3)
    full_color = tf.image.hsv_to_rgb(hsv1)

    param = param[:, :, None, None]
    color_param = param
    img_param = 1.0 - param

    return img * img_param + full_color * color_param

  # def visualize_filter(self, debug_info, canvas):
  #   exposure = debug_info['filter_parameters'][0]
  #   if canvas.shape[0] == 64:
  #     cv2.rectangle(canvas, (8, 40), (56, 52), (1, 1, 1), cv2.FILLED)
  #     cv2.putText(canvas, 'S %+.2f' % exposure, (8, 48),
  #                 cv2.FONT_HERSHEY_SIMPLEX, 0.3, (0, 0, 0))
  #   else:
  #     self.draw_high_res_text('Saturation %+.2f' % exposure, canvas)
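One detail worth calling out in the listing above: DefogFilter.process is the inversion of the atmospheric scattering model. In my notation (matching the code, not taken verbatim from the paper):

$$ t(x) = 1 - \omega\,\mathrm{IcA}(x), \qquad J(x) = \frac{I(x) - A}{\max\big(t(x),\ 0.01\big)} + A $$

where I is the hazy input image, ω is the learned defog parameter (param), A is the atmospheric light (defog_A), and IcA is the dark channel of the input normalized by A. Note that A and IcA are not learned inside the graph: they are computed per image with classical dark-channel code in train.py (see section 3.2).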

util_filters.py implementation

Most of the code unrelated to filters.py has been removed here; only the four functions lrelu, rgb2lum, tanh_range and lerp are kept.

import math

import tensorflow as tf


def lrelu(x, leak=0.2, name="lrelu"):
  with tf.variable_scope(name):
    f1 = 0.5 * (1 + leak)
    f2 = 0.5 * (1 - leak)
    return f1 * x + f2 * abs(x)

def rgb2lum(image):
  image = 0.27 * image[:, :, :, 0] + 0.67 * image[:, :, :,
                                                  1] + 0.06 * image[:, :, :, 2]
  return image[:, :, :, None]


def tanh01(x):
  return tf.tanh(x) * 0.5 + 0.5

def tanh_range(l, r, initial=None):
  def get_activation(left, right, initial):
    def activation(x):
      if initial is not None:
        bias = math.atanh(2 * (initial - left) / (right - left) - 1)
      else:
        bias = 0
      return tanh01(x + bias) * (right - left) + left

    return activation

  return get_activation(l, r, initial)
  
def lerp(a, b, l):
  return (1 - l) * a + l * b
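tanh_range is the piece that keeps every filter parameter inside the range configured in config.py: it squashes an unconstrained CNN-PP output into (l, r). A quick NumPy stand-in for illustration (the optional initial bias is omitted):

```python
import numpy as np

def tanh01_np(x):
    return np.tanh(x) * 0.5 + 0.5

def tanh_range_np(l, r):
    # NumPy version of tanh_range without the optional initial bias.
    return lambda x: tanh01_np(x) * (r - l) + l

to_log_gamma = tanh_range_np(-np.log(3), np.log(3))   # cfg.gamma_range = 3
raw = np.array([-10.0, 0.0, 10.0])                    # unconstrained CNN-PP outputs
print(np.exp(to_log_gamma(raw)))                      # gamma stays within [1/3, 3]
```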

3. CNN-PP and DIP module training and optimization

3.1 Loss design

Only the loss related to CNN-PP optimization is discussed here.

By looking at the __build_nework function in yolov3.py, you can see that recovery_loss is implemented as: recovery_loss = tf.reduce_sum(tf.pow(filtered_image_batch - input_data_clean, 2.0)) #/(2.0 * batch_size)
Here filtered_image_batch is the tensor that continues through YOLOv3's forward pass, and the MSE loss between filtered_image_batch and input_data_clean is what optimizes the output of the CNN-PP module. This is why CNN-PP + DIP can be split off from IA-YOLO and used on their own as a data-enhancement module.

def __build_nework(self, input_data, isp_flag, input_data_clean, defog_A, IcA):

        filtered_image_batch = input_data
        self.filter_params = input_data
        filter_imgs_series = []
        if isp_flag:
            # start_time = time.time()

            with tf.variable_scope('extract_parameters_2'):
                input_data = tf.image.resize_images(input_data, [256, 256], method=tf.image.ResizeMethod.BILINEAR)
                filter_features = common.extract_parameters_2(input_data, cfg, self.trainable)

            # filter_features = tf.random_normal([1, 15], 0.5, 0.1)
            filters = cfg.filters
            filters = [x(filtered_image_batch, cfg) for x in filters]
            filter_parameters = []
            for j, filter in enumerate(filters):
                with tf.variable_scope('filter_%d' % j):
                    print('    creating filter:', j, 'name:', str(filter.__class__), 'abbr.',
                          filter.get_short_name())
                    print('      filter_features:', filter_features.shape)

                    filtered_image_batch, filter_parameter = filter.apply(
                        filtered_image_batch, filter_features, defog_A, IcA)
                    filter_parameters.append(filter_parameter)
                    filter_imgs_series.append(filtered_image_batch)
                    print('      output:', filtered_image_batch.shape)

            self.filter_params = filter_parameters
            # end_time = time.time()
            # print('time spent in filters:', end_time - start_time)
        # input_data_shape = tf.shape(input_data)
        # batch_size = input_data_shape[0]
        recovery_loss = tf.reduce_sum(tf.pow(filtered_image_batch - input_data_clean, 2.0))#/(2.0 * batch_size)

The losses returned by the model are shown below; the recovery loss is indeed the MSE loss computed between filtered_image_batch and input_data_clean. Here filtered_image_batch is produced by the DIP module from input_data, while at this point it is not yet clear where input_data_clean comes from.

    def compute_loss(self, label_sbbox, label_mbbox, label_lbbox, true_sbbox, true_mbbox, true_lbbox):

        with tf.name_scope('smaller_box_loss'):
            loss_sbbox = self.loss_layer(self.conv_sbbox, self.pred_sbbox, label_sbbox, true_sbbox,
                                         anchors = self.anchors[0], stride = self.strides[0])

        with tf.name_scope('medium_box_loss'):
            loss_mbbox = self.loss_layer(self.conv_mbbox, self.pred_mbbox, label_mbbox, true_mbbox,
                                         anchors = self.anchors[1], stride = self.strides[1])

        with tf.name_scope('bigger_box_loss'):
            loss_lbbox = self.loss_layer(self.conv_lbbox, self.pred_lbbox, label_lbbox, true_lbbox,
                                         anchors = self.anchors[2], stride = self.strides[2])

        with tf.name_scope('giou_loss'):
            giou_loss = loss_sbbox[0] + loss_mbbox[0] + loss_lbbox[0]

        with tf.name_scope('conf_loss'):
            conf_loss = loss_sbbox[1] + loss_mbbox[1] + loss_lbbox[1]

        with tf.name_scope('prob_loss'):
            prob_loss = loss_sbbox[2] + loss_mbbox[2] + loss_lbbox[2]

        with tf.name_scope('recovery_loss'):
            recovery_loss = self.recovery_loss

        return giou_loss, conf_loss, prob_loss, recovery_loss

3.2 Tracing input_data_clean

train related code

By tracing train.py, you can find that input_data_clean is one of the images returned by trainset (the dataloader).

In addition, the following code also shows the data feeding paths with and without fog; as expected, the foggy path is more complex. It contains several dark-channel computations (DarkChannel, AtmLight, DarkIcA) whose results are fed to DefogFilter. In my understanding this defogging preparation is quite time-consuming, yet the paper does not explain the difference between train and train_lowlight in detail and only states that CNN-PP and DIP add about 13 ms of processing time.

In train_lowlight.py, the data feeding process is simpler: low-brightness training data are simulated simply with np.power(train_data[0], lowlight_param), as sketched below.
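Before returning to the train() code of train.py, here is a minimal sketch of that low-brightness simulation; the lowlight_param value is purely illustrative and the exact sampling range used in train_lowlight.py is not reproduced here:

```python
import numpy as np

# Simulate a low-brightness image by raising normalized pixel values to a power > 1.
clean = np.random.rand(416, 416, 3).astype(np.float32)  # stand-in for train_data[0]
lowlight_param = 3.0                                     # illustrative value only
dark = np.power(clean, lowlight_param)                   # pixels in [0, 1] become darker
print(clean.mean(), dark.mean())
```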

    def train(self):
        self.sess.run(tf.global_variables_initializer())
        try:
            print('=> Restoring weights from: %s ... ' % self.initial_weight)
            self.loader.restore(self.sess, self.initial_weight)
        except:
            print('=> %s does not exist !!!' % self.initial_weight)
            print('=> Now it starts to train YOLOV3 from scratch ...')
            self.first_stage_epochs = 0

        def DarkChannel(im):
            b, g, r = cv2.split(im)
            dc = cv2.min(cv2.min(r, g), b);
            return dc

        def AtmLight(im, dark):
            [h, w] = im.shape[:2]
            imsz = h * w
            numpx = int(max(math.floor(imsz / 1000), 1))
            darkvec = dark.reshape(imsz, 1)
            imvec = im.reshape(imsz, 3)

            indices = darkvec.argsort(0)
            indices = indices[(imsz - numpx):imsz]

            atmsum = np.zeros([1, 3])
            for ind in range(1, numpx):
                atmsum = atmsum + imvec[indices[ind]]

            A = atmsum / numpx
            return A

        def DarkIcA(im, A):
            im3 = np.empty(im.shape, im.dtype)
            for ind in range(0, 3):
                im3[:, :, ind] = im[:, :, ind] / A[0, ind]
            return DarkChannel(im3)

        for epoch in range(1, 1+self.first_stage_epochs+self.second_stage_epochs):
            if epoch <= self.first_stage_epochs:
                train_op = self.train_op_with_frozen_variables
            else:
                train_op = self.train_op_with_all_variables

            pbar = tqdm(self.trainset)
            train_epoch_loss, test_epoch_loss = [], []


            for train_data in pbar:
                if args.fog_FLAG:
                    # start_time = time.time()
                    dark = np.zeros((train_data[0].shape[0], train_data[0].shape[1], train_data[0].shape[2]))
                    defog_A = np.zeros((train_data[0].shape[0], train_data[0].shape[3]))
                    IcA = np.zeros((train_data[0].shape[0], train_data[0].shape[1], train_data[0].shape[2]))
                    if DefogFilter in cfg.filters:
                        # print("**************************")
                        for i in range(train_data[0].shape[0]):
                            dark_i = DarkChannel(train_data[0][i])
                            defog_A_i = AtmLight(train_data[0][i], dark_i)
                            IcA_i = DarkIcA(train_data[0][i], defog_A_i)
                            dark[i, ...] = dark_i
                            defog_A[i, ...] = defog_A_i
                            IcA[i, ...] = IcA_i

                    IcA = np.expand_dims(IcA, axis=-1)


                    _, summary, train_step_loss, train_step_loss_recovery, global_step_val = self.sess.run(
                        [train_op, self.write_op, self.loss, self.recovery_loss, self.global_step], feed_dict={
    
    
                            self.input_data: train_data[0],
                            self.defog_A: defog_A,
                            self.IcA: IcA,
                            self.label_sbbox: train_data[1],
                            self.label_mbbox: train_data[2],
                            self.label_lbbox: train_data[3],
                            self.true_sbboxes: train_data[4],
                            self.true_mbboxes: train_data[5],
                            self.true_lbboxes: train_data[6],
                            self.input_data_clean: train_data[7],
                            self.trainable: True,
                        })


                else:
                    _, summary, train_step_loss, global_step_val = self.sess.run(
                        [train_op, self.write_op, self.loss, self.global_step], feed_dict={
    
    
                            self.input_data: train_data[7],
                            self.label_sbbox: train_data[1],
                            self.label_mbbox: train_data[2],
                            self.label_lbbox: train_data[3],
                            self.true_sbboxes: train_data[4],
                            self.true_mbboxes: train_data[5],
                            self.true_lbboxes: train_data[6],
                            self.input_data_clean: train_data[7],
                            self.trainable: True,
                        })
                train_epoch_loss.append(train_step_loss)
                self.summary_writer.add_summary(summary, global_step_val)

                pbar.set_description("train loss: %.2f" % train_step_loss)

Through this analysis of the training code, we can see that input_data_clean comes directly from the dataloader.

dataset related code

In the __next__ function of dataset.py we find: image, bboxes, clean_image = self.parse_annotation(annotation)
Tracing the parse_annotation function further:

  • 1. The key line is clean_image, bboxes = utils.image_preporcess(np.copy(image), [self.train_input_size, self.train_input_size], np.copy(bboxes)); analysis shows that image_preporcess is only an image resize function.
  • 2. Tracing the image-reading code in the if random.randint(0, 2) > 0: branch of parse_annotation gives the snippet below; you can see that foggy_image and clean_image are read from different image paths.
       image = cv2.imread(image_path)
       img_name = image_path.split('/')[-1]
       image_name = img_name.split('.')[0]
       image_name_index = img_name.split('.')[1]
       bboxes = np.array([list(map(lambda x: int(float(x)), box.split(','))) for box in line[1:]])
       if random.randint(0, 2) > 0:
           beta = random.randint(0, 9)
           beta = 0.01 * beta + 0.05
           if self.data_train_flag:
               img_name = args.vocfog_traindata_dir + image_name \
                          + '_' + ("%.2f" % beta) + '.' + image_name_index
           else:
               img_name = args.vocfog_valdata_dir + image_name \
                          + '_' + ("%.2f" % beta) + '.' + image_name_index

           foggy_image = cv2.imread(img_name)
           clean_image = image 
        '''augmentation code'''
       return foggy_image, bboxes, clean_image
  • 3. From the above analysis, the image used during training is actually a foggy image, generated offline from the original image by data_make.py, while clean_image is the original image.

4. Generating foggy images

The IA-YOLO paper mentions that foggy images of different levels are generated in code by applying the atmospheric scattering model (the reverse of the defogging operation). The code is in data_make.py; extracting the core function yields the snippet below.
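For reference, the atmospheric scattering model behind this synthesis is (in my notation):

$$ I(x) = J(x)\,t(x) + A\,\big(1 - t(x)\big), \qquad t(x) = e^{-\beta d(x)} $$

where J is the clean image, A is the atmospheric light (fixed to 0.5 in the code), β controls the fog density, and d(x) is a pseudo depth that, as I read the code, is largest at the image center and shrinks toward the edges, so the synthetic fog is densest in the middle of the frame. The loop below applies exactly this forward model pixel by pixel.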

import numpy as np
import os
import cv2
import math
from numba import jit
import random
from PIL import Image
from ImgUilt import *

# Generate a foggy image; i is the fog intensity level
@jit()
def AddHaz_loop(img_f, i):
    (row, col, chs) = img_f.shape
    A = 0.5  
    # beta = 0.08  
    beta = 0.01 * i + 0.05# 0.03
    size = math.sqrt(max(row, col)) 
    center = (row // 2, col // 2)  
    for j in range(row):
        for l in range(col):
            d = -0.04 * math.sqrt((j - center[0]) ** 2 + (l - center[1]) ** 2) + size
            td = math.exp(-beta * d)
            img_f[j][l][:] = img_f[j][l][:] * td + A * (1 - td)
    img_f = np.clip(img_f*255, 0, 255).astype(np.uint8)
    return img_f

path = r"D:\实战项目\datasets\coco128\images\train2017\000000000071.jpg"
image = np.array(Image.open(path))
all_list=[image]
for i in range(11):
    img_f = image/255
    (row, col, chs) = image.shape
    foggy_image = AddHaz_loop(img_f, i)
    all_list.append(foggy_image)
    #img=Image.fromarray(img_f)
myimshowsCL(all_list,rows=4,cols=3)

The generated foggy images are shown below; the myimshowsCL function comes from section 2.3 of my earlier post "python tool methods 28" (single-image, multi-image and grid display).

data_make.py

The following code reads the VOC data and generates the foggy images offline.

import numpy as np
import os
import cv2
import math
from numba import jit
import random

# only use the image including the labeled instance objects for training
def load_annotations(annot_path):
    print(annot_path)
    with open(annot_path, 'r') as f:
        txt = f.readlines()
        annotations = [line.strip() for line in txt if len(line.strip().split()[1:]) != 0]
    return annotations


# print('*****************Add haze offline***************************')
def parse_annotation(annotation):

    line = annotation.split()
    image_path = line[0]
    # print(image_path)
    img_name = image_path.split('/')[-1]
    # print(img_name)
    image_name = img_name.split('.')[0]
    # print(image_name)
    image_name_index = img_name.split('.')[1]
    # print(image_name_index)

#'/data/vdd/liuwenyu/data_vocfog/train/JPEGImages/'
    if not os.path.exists(image_path):
        raise KeyError("%s does not exist ... " %image_path)
    image = cv2.imread(image_path)
    for i in range(10):
        @jit()
        def AddHaz_loop(img_f, center, size, beta, A):
            (row, col, chs) = img_f.shape

            for j in range(row):
                for l in range(col):
                    d = -0.04 * math.sqrt((j - center[0]) ** 2 + (l - center[1]) ** 2) + size
                    td = math.exp(-beta * d)
                    img_f[j][l][:] = img_f[j][l][:] * td + A * (1 - td)
            return img_f

        img_f = image/255
        (row, col, chs) = image.shape
        A = 0.5  
        # beta = 0.08  
        beta = 0.01 * i + 0.05
        size = math.sqrt(max(row, col)) 
        center = (row // 2, col // 2)  
        foggy_image = AddHaz_loop(img_f, center, size, beta, A)
        img_f = np.clip(foggy_image*255, 0, 255)
        img_f = img_f.astype(np.uint8)
        img_name = '/data/vdd/liuwenyu/data_vocfog/train/JPEGImages/' + image_name \
                   + '_' + ("%.2f"%beta) + '.' + image_name_index
        #img_name = '/data/vdd/liuwenyu/data_vocfog/val/JPEGImages/' + image_name \
        #   + '_' + ("%.2f"%beta) + '.' + image_name_index
        cv2.imwrite(img_name, img_f)


if __name__ == '__main__':
    an = load_annotations('/home/liuwenyu.lwy/code/defog_yolov3/data/dataset/voc_norm_train.txt')
    #an = load_annotations('/home/liuwenyu.lwy/code/defog_yolov3/data/dataset/voc_norm_test.txt')
    ll = len(an)
    print(ll)
    for j in range(ll):
        parse_annotation(an[j])
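For each annotated image the script writes ten foggy variants, with beta running from 0.05 to 0.14 in steps of 0.01, into the vocfog JPEGImages directory. For a hypothetical VOC image named 000005.jpg the output would look like:

/data/vdd/liuwenyu/data_vocfog/train/JPEGImages/000005_0.05.jpg
/data/vdd/liuwenyu/data_vocfog/train/JPEGImages/000005_0.06.jpg
...
/data/vdd/liuwenyu/data_vocfog/train/JPEGImages/000005_0.14.jpg

These are exactly the file names that parse_annotation in dataset.py reconstructs at training time (see section 3.2), which is why the foggy data can be generated fully offline.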

Source: blog.csdn.net/a486259/article/details/132520781