About preprocessing transforms for image segmentation

Table of contents

1. Introduction

2. The resize problem in segmentation

3. Segmentation transforms

3.1 Random scaling: RandomResize

3.2 Random horizontal flip: RandomHorizontalFlip

3.3 Random vertical flip: RandomVerticalFlip

3.4 Random cropping: RandomCrop

3.5 ToTensor

3.6 Normalization: Normalize

3.7 Compose

4. Visualization of preprocessing results


1. Introduction

Preprocessing for image segmentation is not as straightforward as for classification. A classification label is just a category, so augmentation operates on the input image alone.

In segmentation, the label and the img correspond strictly pixel for pixel: both must keep the same spatial resolution (h*w), and the pixel correspondence between them must not be broken, otherwise the supervision is meaningless. At the same time, data augmentation is indispensable in deep learning, especially for medical images, where data is scarce.

This chapter therefore focuses on data augmentation for image segmentation.

A preprocessed visualization of the DRIVE dataset is shown in section 4.

2. The resize problem in segmentation

To be honest, I have not fully figured out the resize problem in segmentation....

The ultimate purpose of segmentation is to extract the foreground from the image, so the prediction and the original image must end up the same size. However, networks such as UNet often take a fixed input size, e.g. 480*480, so the predicted mask comes out at 480*480, which obviously differs from the original resolution.

Nowadays the input size, whether for classification or segmentation, no longer has to match the original paper; the networks have been adapted (pooling layers and the like), so other resolutions work....

Still, no matter how the network is adapted, most preprocessing pipelines include a resize. That guarantees the input and output of the network have the same resolution, but it can still disagree with the original image. For example, a 512*512 original resized to 480*480 goes into the network and produces a 480*480 mask, and 480 is not 512.

Of course, the final mask can be resized back to the original size, but then interpolation becomes the problem: bilinear interpolation, although usually the better choice, changes the gray values of the mask. Say the mask is binary with background 0 and foreground 255; interpolation produces arbitrary values in 0-255 and turns the mask into a grayscale image. Nearest-neighbor interpolation avoids this, but nearest neighbor is generally not a good choice in image processing.

I considered resizing the mask with bilinear interpolation and then thresholding it back to a binary image, but that is not only clumsy, it brings its own problems and violates the end-to-end idea.
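To make this concrete, here is a minimal sketch of the two restore options, assuming torchvision; the fake pred and the 512/480 sizes are just this section's example, not code from the post:

import numpy as np
from PIL import Image
from torchvision.transforms import InterpolationMode, functional as F

# pred: the network's 480*480 output mask, here faked as a binary PIL image
pred = Image.fromarray(np.random.choice([0, 255], (480, 480)).astype(np.uint8))

# option 1: nearest neighbor keeps the mask binary, but interpolates crudely
pred_nn = F.resize(pred, [512, 512], interpolation=InterpolationMode.NEAREST)

# option 2: bilinear, then threshold back to a binary mask
pred_bl = F.resize(pred, [512, 512], interpolation=InterpolationMode.BILINEAR)
pred_bin = (np.array(pred_bl) > 127).astype(np.uint8) * 255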

What follows is purely my own speculation... for reference only...

The workaround, then, is to randomly resize the training images. For example, if the network input is 480*480, randomly scale each training image to some size between 300 and 500, then crop it to 480*480 before feeding it to the segmentation network.

The benefit is that the network stops being sensitive to simple rescaling of the image.

Then, at inference time, no resize is needed: the original image can be fed in directly.

3. Segmentation transforms

Below is the test code for image preprocessing in a segmentation task.

The key point throughout is that img and label are transformed at the same time, with the same parameters.
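The snippets below are fragments of one file; a shared header along these lines is assumed (matplotlib is only needed for the visualization in section 4):

import random

import numpy as np
import torch
import matplotlib.pyplot as plt
from torchvision import transforms as T
from torchvision.transforms import functional as F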

3.1 Random scaling: RandomResize

As follows, an integer is drawn at random between the given min and max, and the image is resized to it.

The label must be resized with the nearest-neighbor algorithm, otherwise the resized label is no longer a binary image.

class RandomResize(object):
    def __init__(self, min_size, max_size=None):
        self.min_size = min_size
        if max_size is None:
            max_size = min_size
        self.max_size = max_size

    def __call__(self, image, target):
        size = random.randint(self.min_size, self.max_size)
        # size is passed as an int here, so the image's shorter side is scaled to size
        image = F.resize(image, size)
        target = F.resize(target, size, interpolation=T.InterpolationMode.NEAREST)
        return image, target
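A quick usage sketch, with hypothetical file names (since size is an int, torchvision scales the shorter side and keeps the aspect ratio):

from PIL import Image

img = Image.open("img.png")       # hypothetical paths
lbl = Image.open("label.png")
img, lbl = RandomResize(300, 500)(img, lbl)
print(img.size, lbl.size)         # identical sizes, shorter side somewhere in [300, 500]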

3.2 Random horizontal flip: RandomHorizontalFlip

flip_prob is the probability of flipping

class RandomHorizontalFlip(object):
    def __init__(self, flip_prob):
        self.flip_prob = flip_prob

    def __call__(self, image, target):
        if random.random() < self.flip_prob:
            image = F.hflip(image)
            target = F.hflip(target)
        return image, target

3.3 Random vertical flip: RandomVerticalFlip

Same as the horizontal flip.

class RandomVerticalFlip(object):
    def __init__(self, flip_prob):
        self.flip_prob = flip_prob

    def __call__(self, image, target):
        if random.random() < self.flip_prob:
            image = F.vflip(image)
            target = F.vflip(target)
        return image, target

3.4 Random cropping: RandomCrop

The code for random cropping is as follows. Note that the image may well be smaller than the crop size, in which case it has to be padded first.

class RandomCrop(object):
    def __init__(self, size):
        self.size = size

    def __call__(self, image, target):
        image = pad_if_smaller(image, self.size)
        target = pad_if_smaller(target, self.size, fill=255)
        crop_params = T.RandomCrop.get_params(image, (self.size, self.size))
        image = F.crop(image, *crop_params)
        target = F.crop(target, *crop_params)
        return image, target

The padding helper is below; filling the label with 255 marks the padded area as not of interest.

def pad_if_smaller(img, size, fill=0):
    # if the image's shorter side is smaller than the given size, pad it with the value fill
    min_size = min(img.size)
    if min_size < size:
        ow, oh = img.size
        padh = size - oh if oh < size else 0
        padw = size - ow if ow < size else 0
        img = F.pad(img, (0, 0, padw, padh), fill=fill)
    return img
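The 255 fill only pays off at the loss: assuming the standard PyTorch cross-entropy (my addition, not code from the post), those pixels can be skipped via ignore_index:

loss_fn = torch.nn.CrossEntropyLoss(ignore_index=255)

logits = torch.randn(2, 2, 480, 480)          # (N, num_classes, H, W)
target = torch.randint(0, 2, (2, 480, 480))   # (N, H, W), values 0 or 1
target[:, :10, :] = 255                       # pretend a padded strip
loss = loss_fn(logits, target)                # 255 pixels contribute nothing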

3.5 ToTensor

The label cannot go through the official ToTensor here: that op also rescales values to [0, 1], which would change the gray values of the foreground pixels.

The dtype is int64 because the cross-entropy loss needs integer class indices, and the label must not have a channel dimension.

class ToTensor(object):
    def __call__(self, image, target):
        image = F.to_tensor(image)
        target = torch.as_tensor(np.array(target), dtype=torch.int64)
        return image, target
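Continuing the sketch from 3.1, a quick check of what ToTensor produces:

img_t, lbl_t = ToTensor()(img, lbl)
print(img_t.dtype, img_t.shape)   # torch.float32, (3, H, W), values scaled to [0, 1]
print(lbl_t.dtype, lbl_t.shape)   # torch.int64, (H, W), no channel dimension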

3.6 Normalization: Normalize

The implementation of normalization is also very simple; note that only the image is normalized, the label passes through untouched.

class Normalize(object):
    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def __call__(self, image, target):
        image = F.normalize(image, mean=self.mean, std=self.std)
        return image, target

3.7 Compose

It simply applies the transforms one by one, passing both image and target through each.

class Compose(object):
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target
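Putting it together, a training pipeline along the lines of section 2 might look like this; the 300-500 range and the 480 crop come from the text, and the mean/std are the values undone in the visualization code in section 4 (treat them as illustrative):

train_transform = Compose([
    RandomResize(300, 500),
    RandomHorizontalFlip(0.5),
    RandomVerticalFlip(0.5),
    RandomCrop(480),
    ToTensor(),
    Normalize(mean=(0.709, 0.381, 0.224), std=(0.127, 0.079, 0.043)),
])

img_t, lbl_t = train_transform(img, lbl)   # img, lbl as PIL images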

4. Visualization of preprocessing results

In the dataset, the change is simply to apply the joint transform to each img and label pair; after building the data loader, one batch can then be visualized.
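The original post shows the modified dataset as a screenshot; a minimal sketch of the idea, with hypothetical names, might be:

class DriveDataset(torch.utils.data.Dataset):      # hypothetical name
    def __init__(self, images, labels, transforms=None):
        self.images = images                       # e.g. lists of PIL images
        self.labels = labels
        self.transforms = transforms

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img, lbl = self.images[idx], self.labels[idx]
        if self.transforms is not None:
            img, lbl = self.transforms(img, lbl)   # joint transform keeps them aligned
        return img, lbl

# then, after building a DataLoader:
# plot(next(iter(data_loader)))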

 

Test code:

The gray values in the label are only 0, 1 and 255.

The label has no channel dimension.

# visualize one batch of data
def plot(batch):
    plt.figure(figsize=(12,8))
    imgs,labels = batch     # batch = (imgs, labels), e.g. next(iter(data_loader))
    for i,(x,y) in enumerate(zip(imgs,labels)):
        x = np.transpose(x.numpy(),(1,2,0))
        x[:,:,0] = x[:,:,0]*0.127 + 0.709       # undo normalization
        x[:,:,1] = x[:,:,1]*0.079 + 0.381
        x[:,:,2] = x[:,:,2]*0.043 + 0.224
        y = y.numpy()

        # print(np.unique(y))   # 0 1 255
        # print(x.shape)      # 480*480*3
        # print(y.shape)      # 480*480

        plt.subplot(2,4,i+1)
        plt.imshow(x)

        plt.subplot(2,4,i+5)
        plt.imshow(y)
    plt.show()

The results:

If you change the foreground pixels to 120 in the dataset, the details of the label become visible (a value of 1 is barely distinguishable next to 0 and 255).
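For example, as a quick hack inside plot, right before imshow (equivalent to changing it in the dataset):

y[y == 1] = 120   # lift the foreground from 1 so it stands out between background 0 and ignore 255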

 


Origin: blog.csdn.net/qq_44886601/article/details/130112899