Table of contents
1. Introduction
2. About the resize problem in segmentation
3. Segmentation transforms
3.1 Random scaling RandomResize
3.2 Random Horizontal Flip RandomHorizontalFlip
3.3 Random Vertical Flip RandomVerticalFlip
3.4 Random cropping RandomCrop
3.5 ToTensor
3.6 Normalization
3.7 Compose
4. Visualization of preprocessing results
1. Introduction
Preprocessing for image segmentation is harder than for classification. In classification the label is just a category, so every augmentation operates on the original image only.
In segmentation the label and the img correspond strictly: the two have the same spatial resolution (h*w), and the positions of corresponding pixels must not drift apart. Otherwise, supervised learning is useless. In deep learning, data augmentation is indispensable, especially for medical images, where data is scarce.
Therefore, this chapter mainly covers data augmentation for image segmentation.
Here is a preprocessed visualization of the DRIVE dataset:
2. About the resize problem in segmentation
I have not fully figured out the resize problem in image segmentation....
The final purpose of segmentation is to extract the foreground from the image, so the segmentation result and the original image must have the same size. However, a network such as unet may take a fixed input size, for example 480*480, so the final segmentation resolution is also 480*480, which obviously differs from the original image.
Nowadays neither classification nor segmentation networks need an input size that matches the original paper, because the networks have been optimized, for example with max pooling layers, etc....
No matter how the network is optimized, most preprocessing pipelines still include a resize operation. This guarantees that the input and the output of the network have the same resolution, but the result is still inconsistent with the original image. For example, a 512*512 original is resized to 480*480 and fed to the network, which produces a 480*480 segmentation; 480 and 512 are not the same.
The final segmentation can be restored to the original size with another resize, but then interpolation becomes a problem. Bilinear interpolation looks better, but it changes the gray values of the segmentation. For example, if the segmentation is a binary image with background 0 and foreground 255, interpolation can produce any value in 0-255 and turn it into a grayscale image. Nearest neighbor interpolation avoids this problem, but it is usually not a good choice in image processing.
I previously considered resizing the segmentation with bilinear interpolation and then thresholding it to get a binary image again. However, this is not only troublesome but also has many problems, and it violates the idea of end to end.
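The interpolation issue can be reproduced on a single row of pixels. The following is a minimal NumPy sketch, not the resize used by any particular library: the 4-pixel row, the upscaling to 7 pixels and the 127.5 threshold are assumptions chosen for illustration.

```python
import numpy as np

# One row of a hypothetical binary mask: background 0, foreground 255.
row = np.array([0, 0, 255, 255], dtype=np.float64)

# Sample positions for upscaling the 4-pixel row to 7 pixels.
src_x = np.arange(row.size)
dst_x = np.linspace(0, row.size - 1, 7)

# Linear interpolation blends neighbours, so new gray values can appear.
linear = np.interp(dst_x, src_x, row)

# Nearest neighbour only copies existing pixels, so the mask stays binary.
nearest = row[np.rint(dst_x).astype(int)]

# Thresholding the interpolated row makes it binary again (extra step).
thresholded = np.where(linear >= 127.5, 255, 0)

print(np.unique(linear))       # contains a value strictly between 0 and 255
print(np.unique(nearest))      # only 0 and 255
print(np.unique(thresholded))  # only 0 and 255
```

The thresholded row is binary again, but only because of an additional hand-tuned step after the network, which is the part that breaks the end to end idea.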
The following is purely personal imagination...for reference only...
Therefore, my solution is to randomly resize the training images. For example, if the network input is 480*480, randomly scale each training image to a size between 300 and 500, and then crop it to 480*480 before feeding it to the segmentation network.
The advantage is that the network becomes less sensitive to simple image scaling.
Then, when segmenting an arbitrary image afterwards, there is no need to resize; just input the original image directly.
3. Segmentation transforms
The following is the test code for image preprocessing in the segmentation task.
The key point is to make sure that img and label are always transformed together.
3.1 Random scaling RandomResize
As follows, an integer is randomly generated between the given min and max, and the image is resized to it.
The segmentation label must use nearest neighbor interpolation; otherwise the label after resize is no longer a binary image.
# Imports shared by all transforms in this section
import random
import numpy as np
import torch
from torchvision import transforms as T
from torchvision.transforms import functional as F

class RandomResize(object):
    def __init__(self, min_size, max_size=None):
        self.min_size = min_size
        if max_size is None:
            max_size = min_size
        self.max_size = max_size
    def __call__(self, image, target):
        size = random.randint(self.min_size, self.max_size)
        # size is passed as an int here, so the shorter side of the image is scaled to size
        image = F.resize(image, size)
        # nearest neighbor keeps the label values unchanged
        target = F.resize(target, size, interpolation=T.InterpolationMode.NEAREST)
        return image, target
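To make the "shorter side is scaled to size" rule concrete, here is a small sketch of the resulting output size. The helper `resized_hw` is hypothetical and not part of torchvision, and truncating the longer side with `int()` is an assumption about the rounding.

```python
import random

def resized_hw(height, width, size):
    """Output (height, width) when the shorter side is scaled to `size`."""
    short, long = min(height, width), max(height, width)
    # Keep the aspect ratio; truncation with int() is an assumption.
    new_long = int(size * long / short)
    return (size, new_long) if height <= width else (new_long, size)

# random.randint is inclusive on both ends, as in RandomResize.
size = random.randint(300, 500)
print(size, resized_hw(512, 768, size))

# Portrait image: the width is the shorter side, so it becomes 480.
print(resized_hw(600, 400, 480))  # (720, 480)
```

Because only the shorter side is fixed, the result is generally not square, which is why a crop step is still needed before the network.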
3.2 Random Horizontal Flip RandomHorizontalFlip
flip_prob is the probability of flipping
class RandomHorizontalFlip(object):
    def __init__(self, flip_prob):
        self.flip_prob = flip_prob
    def __call__(self, image, target):
        if random.random() < self.flip_prob:
            image = F.hflip(image)
            target = F.hflip(target)
        return image, target
3.3 Random Vertical Flip RandomVerticalFlip
Same as the horizontal flip, but using vflip.
class RandomVerticalFlip(object):
    def __init__(self, flip_prob):
        self.flip_prob = flip_prob
    def __call__(self, image, target):
        if random.random() < self.flip_prob:
            image = F.vflip(image)
            target = F.vflip(target)
        return image, target
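Both flip classes draw one random number and apply the same decision to image and label. The following NumPy sketch illustrates why that keeps the pair aligned; `paired_hflip` and the toy arrays are hypothetical and stand in for the PIL images used above.

```python
import random
import numpy as np

def paired_hflip(image, label, flip_prob):
    """Flip image and label together with probability flip_prob."""
    if random.random() < flip_prob:  # one draw shared by both arrays
        image = image[:, ::-1]
        label = label[:, ::-1]
    return image, label

image = np.arange(12).reshape(3, 4)         # toy 3x4 "image"
label = (image % 4 == 0).astype(np.uint8)   # foreground = first column

flipped_image, flipped_label = paired_hflip(image, label, flip_prob=1.0)

# The foreground mask still selects the same pixel values after the flip.
print(image[label == 1])                  # [0 4 8]
print(flipped_image[flipped_label == 1])  # [0 4 8]
```

If the image and the label each drew their own random number, the mask would point at the wrong pixels in roughly half of the flipped samples.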
3.4 Random cropping RandomCrop
The code for random cropping is as follows. Note that the image may be smaller than the crop size, so it has to be padded first.
class RandomCrop(object):
    def __init__(self, size):
        self.size = size
    def __call__(self, image, target):
        image = pad_if_smaller(image, self.size)
        target = pad_if_smaller(target, self.size, fill=255)
        # draw the crop window once and use it for both image and label
        crop_params = T.RandomCrop.get_params(image, (self.size, self.size))
        image = F.crop(image, *crop_params)
        target = F.crop(target, *crop_params)
        return image, target
Padding code. In the label, the fill value 255 marks the region of no interest.
def pad_if_smaller(img, size, fill=0):
    # If the shorter side of the image is smaller than size, pad with the value fill
    min_size = min(img.size)
    if min_size < size:
        ow, oh = img.size
        padh = size - oh if oh < size else 0
        padw = size - ow if ow < size else 0
        # padding order is (left, top, right, bottom): pad right and bottom only
        img = F.pad(img, (0, 0, padw, padh), fill=fill)
    return img
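The padding arithmetic can be checked on a small array. This is a NumPy sketch of the same right and bottom padding, not the torchvision call itself; `pad_right_bottom` and the toy label are hypothetical.

```python
import numpy as np

def pad_right_bottom(array, size, fill):
    """Pad a (H, W) array on the right and bottom up to at least size x size."""
    h, w = array.shape
    pad_h = max(size - h, 0)
    pad_w = max(size - w, 0)
    return np.pad(array, ((0, pad_h), (0, pad_w)), constant_values=fill)

label = np.ones((3, 5), dtype=np.uint8)   # toy label, all foreground
padded = pad_right_bottom(label, size=4, fill=255)

# Pixels equal to 255 are padding and should not count as a class.
valid = padded != 255

print(padded.shape)  # (4, 5)
print(valid.sum())   # 15
```

For the 255 region to really be ignored, the loss has to skip it, for example by passing ignore_index=255 to PyTorch's cross-entropy loss; whether the training code here does so is an assumption.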
3.5 ToTensor
The label cannot use the official ToTensor here, because ToTensor rescales pixel values to [0, 1], which would change the gray values of the foreground pixels.
The dtype is int64 because the cross-entropy loss requires integer class indices, and the label must not have a channel dimension.
class ToTensor(object):
    def __call__(self, image, target):
        # image: scaled to [0, 1] with shape (C, H, W)
        image = F.to_tensor(image)
        # label: raw integer values with shape (H, W)
        target = torch.as_tensor(np.array(target), dtype=torch.int64)
        return image, target
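The difference between the two conversions is easy to see on a tiny label. This NumPy sketch only imitates the division by 255 that the official conversion applies to 8-bit images; the 2x2 label is a made-up example.

```python
import numpy as np

# Toy label with the three values used in this chapter: 0, 1 and 255.
label = np.array([[0, 1], [255, 1]], dtype=np.uint8)

# Rescaling to [0, 1] destroys the class ids: 1 becomes about 0.0039.
scaled = label.astype(np.float32) / 255.0

# Keeping the raw integers preserves the class ids and the (H, W) shape.
as_int = label.astype(np.int64)

print(np.unique(scaled))
print(np.unique(as_int))  # [  0   1 255]
print(as_int.shape)       # (2, 2)
```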
3.6 Normalization
The implementation of normalization is also very simple. Only the image is normalized; the label is returned unchanged.
class Normalize(object):
    def __init__(self, mean, std):
        self.mean = mean
        self.std = std
    def __call__(self, image, target):
        image = F.normalize(image, mean=self.mean, std=self.std)
        return image, target
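Normalization subtracts the per-channel mean and divides by the per-channel std, and it can be undone with the inverse formula. A minimal NumPy sketch, using the mean and std values that appear in the visualization code of section 4 and a random toy image as an assumption:

```python
import numpy as np

# Per-channel statistics, reshaped to broadcast over a (C, H, W) image.
mean = np.array([0.709, 0.381, 0.224]).reshape(3, 1, 1)
std = np.array([0.127, 0.079, 0.043]).reshape(3, 1, 1)

# Toy image with values in [0, 1), standing in for the ToTensor output.
image = np.random.default_rng(0).random((3, 4, 4))

normalized = (image - mean) / std   # what Normalize computes per channel
restored = normalized * std + mean  # inverse, used before plotting

print(np.allclose(restored, image))  # True
```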
3.7 Compose
Just apply the transforms one by one, passing both image and target through each of them.
class Compose(object):
    def __init__(self, transforms):
        self.transforms = transforms
    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target
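Since the transforms run strictly in list order, the order matters. The sketch below repeats the Compose logic so it runs on its own; `Scale` and `Shift` are hypothetical toy transforms acting on plain numbers rather than images.

```python
class Compose:
    """Same logic as the Compose class above: apply each transform in order."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target


class Scale:  # hypothetical toy transform
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, image, target):
        return image * self.factor, target  # target passes through untouched


class Shift:  # hypothetical toy transform
    def __init__(self, offset):
        self.offset = offset

    def __call__(self, image, target):
        return image + self.offset, target


scale_then_shift = Compose([Scale(2), Shift(1)])
shift_then_scale = Compose([Shift(1), Scale(2)])

print(scale_then_shift(3, "label"))  # (7, 'label')
print(shift_then_scale(3, "label"))  # (8, 'label')
```

For the real pipeline this means, for example, that Normalize has to come after ToTensor, because it operates on the tensor that ToTensor produces.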
4. Visualization of preprocessing results
Just change it to this in the dataset
After loading the data, you can call it like this
Test code:
The gray values in the label are only 0, 1 and 255.
The label has no channel dimension.
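import matplotlib.pyplot as plt
import numpy as np

# Visualize the data
def plot(data_loader):
    plt.figure(figsize=(12,8))
    # data_loader is unpacked as one batch: (images, labels)
    imgs,labels = data_loader
    # the 2x4 subplot grid assumes a batch of 4 samples
    for i,(x,y) in enumerate(zip(imgs,labels)):
        x = np.transpose(x.numpy(),(1,2,0))  # (C, H, W) -> (H, W, C)
        x[:,:,0] = x[:,:,0]*0.127 + 0.709  # undo normalization: x*std + mean
        x[:,:,1] = x[:,:,1]*0.079 + 0.381
        x[:,:,2] = x[:,:,2]*0.043 + 0.224
        y = y.numpy()
        # print(np.unique(y))  # 0 1 255
        # print(x.shape)       # 480*480*3
        # print(y.shape)       # 480*480
        plt.subplot(2,4,i+1)   # top row: images
        plt.imshow(x)
        plt.subplot(2,4,i+5)   # bottom row: labels
        plt.imshow(y)
    plt.show()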
Show results:
If you change the foreground pixel value to 120 in the dataset, the details of the label become visible.