A Comprehensive Analysis of the PaddleSeg Image Segmentation Suite (Part 4): Data Preprocessing

This part covers data augmentation. In the PaddleSeg suite, augmentation is defined in the transforms module, much as in PyTorch. This puts basic image processing (scaling, normalization, and so on) and data augmentation (random cropping, flipping, color jitter) behind one interface, and you can add your own augmentation methods here as well.

As with the Dataset, the entry point for data augmentation is the Config class. When the transforms member of the config object is accessed, the corresponding objects are created from the yaml file.

For example, the yaml file configuration is as follows:

transforms:
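    #From the walkthrough of the Config class we already know that the value of type is the name of a concrete class,
    #so this transforms section creates the ResizeStepScaling, RandomPaddingCrop,
    #RandomHorizontalFlip, RandomDistort and Normalize classes.
    #The key-value pairs after type are the arguments passed when constructing these classes.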
    - type: ResizeStepScaling
      min_scale_factor: 0.5
      max_scale_factor: 2.0
      scale_step_size: 0.25
    - type: RandomPaddingCrop
      crop_size: [1024, 512]
    - type: RandomHorizontalFlip
    - type: RandomDistort
      brightness_range: 0.4
      contrast_range: 0.4
      saturation_range: 0.4
    - type: Normalize
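
To make the type-to-class mapping concrete, here is a minimal sketch of how a list of such entries could be turned into objects. It is not PaddleSeg's Config code: the REGISTRY dictionary, the build_transforms helper and the two stand-in classes are hypothetical names used only for illustration.

```python
# Hypothetical stand-ins for transform classes; they only record their arguments.
class ResizeStepScaling:
    def __init__(self, min_scale_factor=0.75, max_scale_factor=1.25,
                 scale_step_size=0.25):
        self.min_scale_factor = min_scale_factor
        self.max_scale_factor = max_scale_factor
        self.scale_step_size = scale_step_size


class RandomHorizontalFlip:
    def __init__(self, prob=0.5):
        self.prob = prob


# Hypothetical registry mapping a class name to the class itself.
REGISTRY = {cls.__name__: cls for cls in (ResizeStepScaling, RandomHorizontalFlip)}


def build_transforms(entries):
    """Create one object per entry; `type` picks the class, the rest are kwargs."""
    objects = []
    for entry in entries:
        params = dict(entry)              # copy so the caller's config is untouched
        cls = REGISTRY[params.pop("type")]
        objects.append(cls(**params))
    return objects


# This list mirrors what a YAML loader would produce for part of the config above.
config = [
    {"type": "ResizeStepScaling", "min_scale_factor": 0.5,
     "max_scale_factor": 2.0, "scale_step_size": 0.25},
    {"type": "RandomHorizontalFlip"},
]
transforms = build_transforms(config)
```

Entries that give only a type, such as RandomHorizontalFlip here, fall back to the constructor defaults.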

Below we walk through the implementation of several representative preprocessing and data augmentation classes in transforms; code walkthroughs of the other classes may be added later.
The related classes are defined in the paddleseg/transforms/transforms.py file.

We start with the Compose class, which is essentially a container for the other classes. It stores a list of image processing and augmentation operations and applies them in order in its __call__ method. The implementation is as follows.
First, Compose's constructor:

    def __init__(self, transforms, to_rgb=True):
        #The transforms argument must be a list containing one or more image processing or augmentation operations.
        if not isinstance(transforms, list):
            raise TypeError('The transforms must be a list!')
        if len(transforms) < 1:
            raise ValueError('The length of transforms ' + \
                             'must be equal or larger than 1!')
        self.transforms = transforms
        #Record whether the image should be converted to RGB.
        self.to_rgb = to_rgb

When a class defines a __call__ method, its instances can be called like functions. For example, given an object p of class P, executing p(param) invokes P's __call__ method.
The transforms module relies on this. Every preprocessing and augmentation class implements __call__, which can be regarded as an implicit protocol.
The __call__ method of the Compose class is as follows:

    def __call__(self, im, label=None):
        #First read the sample image with OpenCV and store it in im as float32; im is an ndarray.
        if isinstance(im, str):
            im = cv2.imread(im).astype('float32')
        if isinstance(label, str):
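            #Open the label file with Pillow. Pillow is used because the annotation may be pseudo-color, stored in palette mode;
            #opened this way, each pixel value is its index in the palette, which can be used directly as the class ID. This supports both grayscale and pseudo-color annotations.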
            label = np.asarray(Image.open(label))
        if im is None:
            raise ValueError('Can\'t read The image file {}!'.format(im))
        #OpenCV loads pixels in BGR order by default; convert to RGB here if requested.
        if self.to_rgb:
            im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
        #Iterate over the transforms list and apply each preprocessing or augmentation operation.
        for op in self.transforms:
            outputs = op(im, label)
            im = outputs[0]
            if len(outputs) == 2:
                label = outputs[1]
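        #Transpose the image array so that the channel axis comes before height and width.
        #For example, an RGB image with height 480, width 640 and 3 channels has shape [480, 640, 3];
        #after the transpose below it becomes [3, 480, 640].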
        im = np.transpose(im, (2, 0, 1))
        return (im, label)
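
The chaining logic can be tried out without images at all. The sketch below reproduces only the loop from __call__ above, using plain lists instead of ndarrays; MiniCompose, AddOne and Reverse are hypothetical names, and file reading, color conversion and the final transpose are deliberately left out.

```python
class AddOne:
    """Hypothetical transform: adds 1 to every image value, leaves the label alone."""
    def __call__(self, im, label=None):
        im = [v + 1 for v in im]
        return (im,) if label is None else (im, label)


class Reverse:
    """Hypothetical transform: reverses image and label together."""
    def __call__(self, im, label=None):
        im = im[::-1]
        if label is None:
            return (im,)
        return (im, label[::-1])


class MiniCompose:
    def __init__(self, transforms):
        if not isinstance(transforms, list) or len(transforms) < 1:
            raise ValueError("transforms must be a non-empty list")
        self.transforms = transforms

    def __call__(self, im, label=None):
        # Each op returns a 1-tuple (image only) or a 2-tuple (image, label).
        for op in self.transforms:
            outputs = op(im, label)
            im = outputs[0]
            if len(outputs) == 2:
                label = outputs[1]
        return (im, label)


pipeline = MiniCompose([AddOne(), Reverse()])
im, label = pipeline([1, 2, 3], label=[0, 0, 7])
im_only, no_label = pipeline([1, 2, 3])
```

Because every transform follows the same return convention, the pipeline works unchanged whether or not a label is supplied.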

Next come the image preprocessing and augmentation classes themselves, starting with RandomHorizontalFlip:

#As the class name suggests, this class randomly flips the image horizontally.
class RandomHorizontalFlip:
    """
    Flip an image horizontally with a certain probability.

    Args:
        prob (float, optional): A probability of horizontally flipping. Default: 0.5.
    """
    #prob is the flip probability: the image is flipped when a random number falls below it,
    #i.e. the image is flipped with probability prob.
    def __init__(self, prob=0.5):
        self.prob = prob

    def __call__(self, im, label=None):
        if random.random() < self.prob:
            #Flip the image.
            im = functional.horizontal_flip(im)
            #If a label image was passed in as well, flip it too. Labels are usually passed in during training.
            if label is not None:
                label = functional.horizontal_flip(label)
        if label is None:
            return (im,)
        else:
            return (im, label)
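
The key point is that one random draw controls both the image and the label, so the two stay aligned. Here is a minimal sketch with nested lists standing in for arrays; the horizontal_flip helper is an assumption about what functional.horizontal_flip does (reverse the column order), since that function is not shown in this article.

```python
import random


def horizontal_flip(rows):
    """Reverse every row, i.e. mirror the grid left to right (assumed behavior)."""
    return [row[::-1] for row in rows]


def random_horizontal_flip(im, label, prob=0.5, rng=random):
    # One random draw decides for both arrays, so they can never get out of sync.
    if rng.random() < prob:
        im = horizontal_flip(im)
        label = horizontal_flip(label)
    return im, label


image = [[10, 20, 30],
         [40, 50, 60]]
label = [[0, 0, 1],
         [0, 1, 1]]

flipped_im, flipped_label = random_horizontal_flip(image, label, prob=1.0)
same_im, same_label = random_horizontal_flip(image, label, prob=0.0)
```

With prob=1.0 the flip always happens and with prob=0.0 it never does, which makes the two extremes easy to check.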

The RandomVerticalFlip class is similar, except that it randomly flips the image vertically. It takes the same prob parameter as RandomHorizontalFlip, although its default in the code below is 0.1 rather than 0.5.

class RandomVerticalFlip:

    def __init__(self, prob=0.1):
        self.prob = prob

    def __call__(self, im, label=None):
        if random.random() < self.prob:
            im = functional.vertical_flip(im)
            if label is not None:
                label = functional.vertical_flip(label)
        if label is None:
            return (im,)
        else:
            return (im, label)

Resize is one of the most commonly used operations in image processing. It scales the sample image and the label image to a target size. The code is explained below.

class Resize:
    # The interpolation mode
    #Each entry names a different interpolation algorithm.
    interp_dict = {
        'NEAREST': cv2.INTER_NEAREST,
        'LINEAR': cv2.INTER_LINEAR,
        'CUBIC': cv2.INTER_CUBIC,
        'AREA': cv2.INTER_AREA,
        'LANCZOS4': cv2.INTER_LANCZOS4
    }

    def __init__(self, target_size=(512, 512), interp='LINEAR'):
        #Validate interp: raise an exception if it is neither a key of interp_dict nor "RANDOM".
        self.interp = interp
        if not (interp == "RANDOM" or interp in self.interp_dict):
            raise ValueError("`interp` should be one of {}".format(
                self.interp_dict.keys()))
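        #Validate target_size: it must contain exactly two elements giving the target size of the image
        #(width first, then height, matching the (w, h) order ResizeStepScaling passes to functional.resize); otherwise raise an exception.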
        if isinstance(target_size, list) or isinstance(target_size, tuple):
            if len(target_size) != 2:
                raise ValueError(
                    '`target_size` should include 2 elements, but it is {}'.
                    format(target_size))
        else:
            raise TypeError(
                "Type of `target_size` is invalid. It should be list or tuple, but it is {}"
                .format(type(target_size)))
        #Store target_size as a member variable.
        self.target_size = target_size

    def __call__(self, im, label=None):
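        #The image must be an ndarray. Images read with OpenCV already are; a label image read with PIL
        #has to be converted first, for example with np.asarray.
        if not isinstance(im, np.ndarray):
            raise TypeError("Resize: image type is not numpy.")
        #The image must be a 3-D array; a label image would first need an extra dimension added.
        if len(im.shape) != 3:
            raise ValueError('Resize: image is not 3-dimensional.')
        #If interp is "RANDOM", pick an interpolation algorithm at random; otherwise use the one specified.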
        if self.interp == "RANDOM":
            interp = random.choice(list(self.interp_dict.keys()))
        else:
            interp = self.interp
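        #Resize the image with the chosen interpolation.
        im = functional.resize(im, self.target_size, self.interp_dict[interp])
        #If label data was passed in, it must be resized too. Note that labels may only use INTER_NEAREST;
        #any other interpolation would blend class IDs and corrupt the labels.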
        if label is not None:
            label = functional.resize(label, self.target_size,
                                      cv2.INTER_NEAREST)
        #Return the data.
        if label is None:
            return (im,)
        else:
            return (im, label)
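
Why labels must use nearest-neighbor interpolation is easiest to see on a single row of class IDs. The sketch below uses two simplified 1-D resamplers written for this illustration; they are not OpenCV's implementations, and resize_nearest and resize_linear are hypothetical names.

```python
def resize_nearest(row, new_len):
    """Nearest-neighbor resize of a 1-D row: copy the closest source value."""
    scale = len(row) / new_len
    return [row[min(int((i + 0.5) * scale), len(row) - 1)] for i in range(new_len)]


def resize_linear(row, new_len):
    """Linear resize of a 1-D row: blend the two neighboring source values."""
    scale = len(row) / new_len
    out = []
    for i in range(new_len):
        # Map the output pixel center back into source coordinates, clamped to the row.
        x = min(max((i + 0.5) * scale - 0.5, 0.0), len(row) - 1.0)
        left = int(x)
        right = min(left + 1, len(row) - 1)
        frac = x - left
        out.append(row[left] * (1 - frac) + row[right] * frac)
    return out


label_row = [0, 0, 5, 5]          # two classes: 0 and 5
nearest = resize_nearest(label_row, 8)
linear = resize_linear(label_row, 8)
```

The nearest-neighbor result contains only the original class IDs 0 and 5, while the linear result contains in-between values at the class boundary that correspond to no real class.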

Next is the ResizeByLong class. It is similar to Resize, except that you only specify the length of the long side, and the image is then scaled proportionally.

class ResizeByLong:

    def __init__(self, long_size):
        #Store the long-side length as a member variable.
        self.long_size = long_size

    def __call__(self, im, label=None):

        #Resize the image.
        im = functional.resize_long(im, self.long_size)
        #As before, the label image must be resized with INTER_NEAREST so the class IDs stay exact.
        if label is not None:
            label = functional.resize_long(label, self.long_size,
                                           cv2.INTER_NEAREST)

        if label is None:
            return (im,)
        else:
            return (im, label)
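
The proportional scaling comes down to a small calculation. The sketch below shows one way to derive the target size from the long side; it is an assumption about what functional.resize_long computes, since that function is not shown in this article, and long_side_target is a hypothetical name.

```python
def long_side_target(height, width, long_size):
    """Return (new_height, new_width) with the longer side scaled to long_size."""
    scale = float(long_size) / max(height, width)
    return int(round(height * scale)), int(round(width * scale))


landscape = long_side_target(480, 640, 320)    # width is the long side
portrait = long_side_target(1000, 500, 256)    # height is the long side
```

In both cases the longer side ends up equal to long_size and the aspect ratio is preserved up to rounding.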

ResizeStepScaling is another commonly used scaling method. It appeared in the YAML configuration above, where it augments the training samples.
It has three parameters, min_scale_factor, max_scale_factor and scale_step_size, which are the same keys set in the YAML configuration file.
The code of this class is explained below.

class ResizeStepScaling:
    def __init__(self,
                 min_scale_factor=0.75,
                 max_scale_factor=1.25,
                 scale_step_size=0.25):
        #The constructor mainly validates the arguments and then stores them as member variables.
        if min_scale_factor > max_scale_factor:
            raise ValueError(
                'min_scale_factor must be less than max_scale_factor, '
                'but they are {} and {}.'.format(min_scale_factor,
                                                 max_scale_factor))
        self.min_scale_factor = min_scale_factor
        self.max_scale_factor = max_scale_factor
        self.scale_step_size = scale_step_size

    def __call__(self, im, label=None):
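        #If the minimum and maximum scale factors are equal, that value is used as the scale factor.
        if self.min_scale_factor == self.max_scale_factor:
            scale_factor = self.min_scale_factor
        #If the step size is 0, draw a scale factor uniformly at random between the minimum and the maximum.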
        elif self.scale_step_size == 0:
            scale_factor = np.random.uniform(self.min_scale_factor,
                                             self.max_scale_factor)
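        #If the step size is nonzero, count how many steps fit between the minimum and the maximum,
        #generate that many evenly spaced values from the minimum to the maximum,
        #shuffle them, and take the first element as the scale factor for this call.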
        else:
            num_steps = int((self.max_scale_factor - self.min_scale_factor) /
                            self.scale_step_size + 1)
            scale_factors = np.linspace(self.min_scale_factor,
                                        self.max_scale_factor,
                                        num_steps).tolist()
            np.random.shuffle(scale_factors)
            scale_factor = scale_factors[0]
        #Multiply the width and height by the scale factor to get the new size.
        w = int(round(scale_factor * im.shape[1]))
        h = int(round(scale_factor * im.shape[0]))
        #Resize the image to the new size.
        im = functional.resize(im, (w, h), cv2.INTER_LINEAR)
        #As before, if label data was passed in, resize it with INTER_NEAREST to keep the class IDs exact.
        if label is not None:
            label = functional.resize(label, (w, h), cv2.INTER_NEAREST)

        if label is None:
            return (im,)
        else:
            return (im, label)
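
To see which scale factors the stepped branch can actually produce, the candidate list can be computed by hand. This sketch mirrors the num_steps and np.linspace logic above in plain Python; scale_candidates is a hypothetical helper, and random.choice stands in for the shuffle-and-take-first step, which selects one candidate uniformly in the same way.

```python
import random


def scale_candidates(min_scale, max_scale, step):
    """Evenly spaced scale factors from min_scale to max_scale, inclusive."""
    num_steps = int((max_scale - min_scale) / step + 1)
    if num_steps == 1:
        return [min_scale]
    gap = (max_scale - min_scale) / (num_steps - 1)
    return [min_scale + i * gap for i in range(num_steps)]


candidates = scale_candidates(0.5, 2.0, 0.25)   # values from the YAML example
defaults = scale_candidates(0.75, 1.25, 0.25)   # the constructor defaults
chosen = random.choice(candidates)              # one factor per call
```

Note that the spacing equals scale_step_size only when the range is a whole multiple of the step. With a range of 0.5 to 2.0 and a step of 0.4, for instance, num_steps is 4 and the candidates are spaced 0.5 apart.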

RandomPaddingCrop is another augmentation used in the YAML configuration above. Its code is explained below.

class RandomPaddingCrop:
    def __init__(self,
                 crop_size=(512, 512),
                 im_padding_value=(127.5, 127.5, 127.5),
                 label_padding_value=255):
        #Validate the constructor arguments and store them as member variables.
        if isinstance(crop_size, list) or isinstance(crop_size, tuple):
            if len(crop_size) != 2:
                raise ValueError(
                    'Type of `crop_size` is list or tuple. It should include 2 elements, but it is {}'
                    .format(crop_size))
        else:
            raise TypeError(
                "The type of `crop_size` is invalid. It should be list or tuple, but it is {}"
                .format(type(crop_size)))
        self.crop_size = crop_size
        self.im_padding_value = im_padding_value
        self.label_padding_value = label_padding_value

    def __call__(self, im, label=None):
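        #If crop_size is an int, crop to a square of that size. (The constructor above rejects an int, so this branch is never taken in practice.)
        if isinstance(self.crop_size, int):
            crop_width = self.crop_size
            crop_height = self.crop_size
        #If it is a list or tuple, take the width and height from it, in that order.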
        else:
            crop_width = self.crop_size[0]
            crop_height = self.crop_size[1]
        
        img_height = im.shape[0]
        img_width = im.shape[1]
        #If the image already has the crop size, return it unchanged.
        if img_height == crop_height and img_width == crop_width:
            if label is None:
                return (im,)
            else:
                return (im, label)
        else:
            #Compute how much padding is needed in height and width.
            pad_height = max(crop_height - img_height, 0)
            pad_width = max(crop_width - img_width, 0)
            #If the crop size exceeds the image size, pad the image on the bottom and right.
            if (pad_height > 0 or pad_width > 0):
                im = cv2.copyMakeBorder(
                    im,
                    0,
                    pad_height,
                    0,
                    pad_width,
                    cv2.BORDER_CONSTANT,
                    value=self.im_padding_value)
                #The corresponding label image must be padded as well.
                if label is not None:
                    label = cv2.copyMakeBorder(
                        label,
                        0,
                        pad_height,
                        0,
                        pad_width,
                        cv2.BORDER_CONSTANT,
                        value=self.label_padding_value)
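                #Get the image size after padding.
                img_height = im.shape[0]
                img_width = im.shape[1]
            #If the crop size is positive, draw a random integer in [0, img_height - crop_height]
            #as the vertical start of the crop; the horizontal start is chosen the same way.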
            if crop_height > 0 and crop_width > 0:
                h_off = np.random.randint(img_height - crop_height + 1)
                w_off = np.random.randint(img_width - crop_width + 1)
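                #Crop a window of height crop_height and width crop_width starting at (h_off, w_off).
                im = im[h_off:(crop_height + h_off), w_off:(
                    w_off + crop_width), :]
                #Likewise, if a label image was passed in, crop it with the same window so it stays aligned with the image.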
                if label is not None:
                    label = label[h_off:(crop_height + h_off), w_off:(
                        w_off + crop_width)]
        if label is None:
            return (im,)
        else:
            return (im, label)
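
The pad-then-crop arithmetic can be checked on a tiny grid. The sketch below applies the same two steps to a nested list standing in for a single-channel label; pad_and_crop is a hypothetical helper, and it handles one grid only, whereas the class above reuses the same offsets for image and label.

```python
import random


def pad_and_crop(rows, crop_w, crop_h, pad_value, rng):
    """Pad a 2-D grid on the bottom/right up to the crop size, then crop a random window."""
    h, w = len(rows), len(rows[0])
    pad_h, pad_w = max(crop_h - h, 0), max(crop_w - w, 0)
    # Build new rows so the caller's grid is not modified.
    rows = [row + [pad_value] * pad_w for row in rows]
    rows += [[pad_value] * (w + pad_w) for _ in range(pad_h)]
    h, w = h + pad_h, w + pad_w
    # randint is inclusive on both ends, so offsets cover [0, size - crop].
    h_off = rng.randint(0, h - crop_h)
    w_off = rng.randint(0, w - crop_w)
    return [row[w_off:w_off + crop_w] for row in rows[h_off:h_off + crop_h]]


rng = random.Random(0)
label = [[1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1]]          # 2 rows x 6 columns
cropped = pad_and_crop(label, crop_w=4, crop_h=3, pad_value=255, rng=rng)
```

The grid is too short for the crop, so one row of 255 is appended before cropping; it is wider than the crop, so a random 4-column window is cut out. The result always has exactly the crop size.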

Image normalization is a common preprocessing step, used in the training, validation and testing phases alike.
It typically speeds up convergence of the model and often improves its accuracy.

The code is explained below.

class Normalize:
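    #The constructor takes the mean and standard deviation of the three RGB channels, usually obtained from statistics over the dataset's pixels.
    def __init__(self, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
        #Store the mean and standard deviation, and validate the arguments.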
        self.mean = mean
        self.std = std
        if not (isinstance(self.mean, (list, tuple))
                and isinstance(self.std, (list, tuple))):
            raise ValueError(
                "{}: input type is invalid. It should be list or tuple".format(
                    self))
        from functools import reduce
        if reduce(lambda x, y: x * y, self.std) == 0:
            raise ValueError('{}: std is invalid!'.format(self))

    def __call__(self, im, label=None):
        #Reshape the mean and standard deviation to [1, 1, 3] so they broadcast against the image.
        mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
        std = np.array(self.std)[np.newaxis, np.newaxis, :]
        #Normalize the image.
        im = functional.normalize(im, mean, std)

        if label is None:
            return (im,)
        else:
            return (im, label)
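
For a single pixel the normalization is a one-line formula. The sketch below assumes that functional.normalize first scales pixel values from [0, 255] to [0, 1] and then subtracts the mean and divides by the standard deviation per channel; that function is not shown in this article, so treat the division by 255 as an assumption. normalize_pixel is a hypothetical helper.

```python
def normalize_pixel(pixel, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
    """Scale an RGB pixel from [0, 255] to [0, 1], then standardize each channel."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(pixel, mean, std))


white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
```

Under that assumption, the default mean and standard deviation of 0.5 map the pixel range [0, 255] onto [-1, 1].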

That concludes the walkthrough of the commonly used image preprocessing and augmentation code.

As the PaddleSeg suite evolves, more processing and augmentation methods will likely be added; walkthroughs of new methods will be added to this chapter later.

PaddleSeg repository: https://github.com/PaddlePaddle/PaddleSeg

Origin blog.csdn.net/txyugood/article/details/111033713