Abstract: This article walks you through CutMix and Mixup with hands-on examples.
This article is shared from the Huawei Cloud Community post "CutMix&Mixup Detailed Explanation and Code Combat", author: Li Changan.
Introduction
While reviewing material I had studied before, I reached the data augmentation part and realized that I did not understand CutMix and Mixup as well as I thought, so I wrote this project to look at both methods properly. When I first met them in object detection, I only thought about what happens to the image, which is intuitive, and overlooked what happens to the label. Reading the related papers and other material turned up some new and interesting details. Below I explain both augmentation methods. The picture below shows, from left to right, the original picture, mixup, cutout, and cutmix.
Mixup offline implementation
Most readers are probably already familiar with Mixup, and there are many good explanations online, so I will not go through it in detail here.
- The core idea of Mixup: two images are blended in a given ratio, and their labels are blended in the same ratio.
- Key points of the paper:
  - Mixing three or more samples was considered, but the effect is about the same as mixing two, and it makes the mixup step slower.
  - The current mixup implementation uses a single data loader to obtain a minibatch, shuffles it randomly, and mixes samples within that minibatch. This works as well as drawing partners from the whole shuffled dataset, and it reduces I/O overhead.
  - Applying mixup only between samples with the same label does not bring a significant improvement.
The following cell only visualizes the effect of Mixup on images. For a full implementation, see the online implementation below.
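The points above can be sketched in NumPy as follows. This is a minimal illustration, not the paper's reference code: the function name `mixup_batch`, the `alpha` parameter of the Beta distribution, and the toy array shapes are assumptions made for this example. Each sample is mixed with a partner from the same shuffled minibatch, and the one-hot labels are mixed with the same ratio.

```python
import numpy as np

def mixup_batch(images, labels, alpha=1.0, rng=None):
    """Mix each sample with a randomly chosen partner from the same minibatch.

    images: float array of shape (N, H, W, C)
    labels: one-hot float array of shape (N, num_classes)
    alpha:  Beta(alpha, alpha) parameter (assumed default for this sketch)
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # mixing ratio in [0, 1]
    perm = rng.permutation(len(images))   # shuffle inside the minibatch
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_images, mixed_labels, lam

# Toy batch: 4 random "images" and 4 one-hot labels
rng = np.random.default_rng(0)
images = rng.random((4, 8, 8, 3))
labels = np.eye(4)
mixed_images, mixed_labels, lam = mixup_batch(images, labels, alpha=1.0, rng=rng)
print(lam, mixed_labels.sum(axis=1))  # each mixed label still sums to 1
```

Because the same `lam` weights both the pixels and the labels, every mixed label remains a valid probability vector.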
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as Image
import numpy as np
# Load two images and scale pixel values to [0, 1]; both must have the same shape
im1 = Image.imread("work/data/10img11.jpg")
im1 = im1/255.
im2 = Image.imread("work/data/14img01.jpg")
im2 = im2/255.
# Show the blend for lam = 0.1, 0.2, ..., 0.9
for i in range(1,10):
    lam = i*0.1
    im_mixup = (im1*lam+im2*(1-lam))
    plt.subplot(3,3,i)
    plt.imshow(im_mixup)
plt.show()
CutMix offline implementation
Simply put, CutMix combines Cutout and Mixup, and it can be applied to a variety of tasks.
Mixup blends two whole images. Cutout only alters the image and leaves the label unchanged. CutMix borrows the region-level idea from Cutout and the mixed-label strategy from Mixup, which makes more sense.
- The difference between CutMix and Mixup is that the images are combined with a hard 0-1 mask rather than a soft blend: each pixel of the new image comes entirely from one of the two source images instead of being a linear combination of both, as in Mixup. The label, however, is still a linear combination, as in Mixup.
To remove randomness, the following code fixes the position of the cut, purely to make the effect easy to see. The changed lines are shown below; the commented-out part is the commonly used implementation.
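To make the hard-mask versus soft-blend distinction concrete, here is a small NumPy sketch. The box coordinates, array sizes, and names (`mask`, `cutmix_image`, `mixup_image`) are illustrative assumptions, not values from the paper.

```python
import numpy as np

H, W = 8, 8
image_a = np.zeros((H, W, 3))  # stand-in for the first image
image_b = np.ones((H, W, 3))   # stand-in for the second image

# Hard 0-1 mask: 1 keeps image_a, 0 takes the pixel from image_b
mask = np.ones((H, W, 1))
mask[2:6, 2:6, :] = 0          # hypothetical 4x4 box

cutmix_image = mask * image_a + (1 - mask) * image_b

# The label weight is the fraction of pixels that still come from image_a
lam = mask.mean()
label_a = np.array([1.0, 0.0])
label_b = np.array([0.0, 1.0])
cutmix_label = lam * label_a + (1 - lam) * label_b

# Mixup with the same lam blends every pixel instead of swapping a region
mixup_image = lam * image_a + (1 - lam) * image_b

print(np.unique(cutmix_image))  # only the two original pixel values appear
print(np.unique(mixup_image))   # a single blended value everywhere
print(cutmix_label)
```

The CutMix image contains only original pixel values, while the Mixup image contains only blended ones. The label is mixed the same way in both cases.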
# bbx1 = np.clip(cx - cut_w // 2, 0, W)
# bby1 = np.clip(cy - cut_h // 2, 0, H)
# bbx2 = np.clip(cx + cut_w // 2, 0, W)
# bby2 = np.clip(cy + cut_h // 2, 0, H)
bbx1 = 10
bby1 = 600
bbx2 = 10
bby2 = 600
%matplotlib inline
import glob
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [10,10]
import cv2
# Path to data
data_folder = "/home/aistudio/work/data/"
# Read filenames in the data folder
filenames = glob.glob(f"{data_folder}*.jpg")
# Use the first 4 images
image_paths = filenames[:4]
image_batch = []
image_batch_labels = []
n_images = 4
print(image_paths)
# Load the images as RGB; they must all have the same shape to be stacked later
for i in range(n_images):
    image = cv2.cvtColor(cv2.imread(image_paths[i]), cv2.COLOR_BGR2RGB)
    image_batch.append(image)
# One-hot labels, one class per image
image_batch_labels=np.array([[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]])
def rand_bbox(size, lamb):
    # size is image.shape (H, W, C), so "W" and "H" here refer to axes 0 and 1
    W = size[0]
    H = size[1]
    cut_rat = np.sqrt(1. - lamb)
    cut_w = int(W * cut_rat)  # np.int was removed in NumPy 1.24; use the built-in int
    cut_h = int(H * cut_rat)
    # uniform
    cx = np.random.randint(W)
    cy = np.random.randint(H)
    # bbx1 = np.clip(cx - cut_w // 2, 0, W)
    # bby1 = np.clip(cy - cut_h // 2, 0, H)
    # bbx2 = np.clip(cx + cut_w // 2, 0, W)
    # bby2 = np.clip(cy + cut_h // 2, 0, H)
    # Fixed values for the demo instead of the random box above
    bbx1 = 10
    bby1 = 600
    bbx2 = 10
    bby2 = 600
    return bbx1, bby1, bbx2, bby2
image = cv2.cvtColor(cv2.imread(image_paths[0]), cv2.COLOR_BGR2RGB)
# Crop a random bounding box
lamb = 0.3
size = image.shape
print('size',size)
def generate_cutmix_image(image_batch, image_batch_labels, beta):
    # Fixed pairing for the demo: image i is mixed with image c[i]
    c=[1,0,3,2]
    # generate mixed sample
    lam = np.random.beta(beta, beta)
    # The usual random pairing; it is only printed here because c is fixed
    rand_index = np.random.permutation(len(image_batch))
    print(f'iamhere{rand_index}')
    target_a = image_batch_labels
    target_b = np.array(image_batch_labels)[c]
    print('img.shape',image_batch[0].shape)
    bbx1, bby1, bbx2, bby2 = rand_bbox(image_batch[0].shape, lam)
    print('bbx1',bbx1)
    print('bby1',bby1)
    print('bbx2',bbx2)
    print('bby2',bby2)
    image_batch_updated = image_batch.copy()
    image_batch_updated=np.array(image_batch_updated)
    image_batch=np.array(image_batch)
    # With the fixed values this pastes rows 10:600 and columns 10:600 from the paired image
    image_batch_updated[:, bbx1:bby1, bbx2:bby2, :] = image_batch[c, bbx1:bby1, bbx2:bby2, :]
    # adjust lambda to exactly match pixel ratio
    # NOTE: with the fixed values, bbx2 - bbx1 and bby2 - bby1 are both 0, so lam is 1.0
    # and the labels stay unmixed; with the random box from rand_bbox and the slice
    # [bbx1:bbx2, bby1:bby2], this formula gives the true area ratio.
    lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (image_batch.shape[1] * image_batch.shape[2]))
    print(f'lam is {lam}')
    label = target_a * lam + target_b * (1. - lam)
    return image_batch_updated, label
# Generate CutMix image
input_image = image_batch[0]
image_batch_updated, image_batch_labels_updated = generate_cutmix_image(image_batch, image_batch_labels, 1.0)
# Show original images
print("Original Images")
for i in range(2):
    for j in range(2):
        plt.subplot(2,2,2*i+j+1)
        plt.imshow(image_batch[2*i+j])
plt.show()
# Show CutMix images
print("CutMix Images")
for i in range(2):
    for j in range(2):
        plt.subplot(2,2,2*i+j+1)
        plt.imshow(image_batch_updated[2*i+j])
plt.show()
# Print labels
print('Original labels:')
print(image_batch_labels)
print('Updated labels')
print(image_batch_labels_updated)
['/home/aistudio/work/data/11img01.jpg', '/home/aistudio/work/data/10img11.jpg', '/home/aistudio/work/data/14img01.jpg', '/home/aistudio/work/data/12img11.jpg']
size (2016, 1512, 3)
iamhere[2 1 0 3]
img.shape (2016, 1512, 3)
bbx1 10
bby1 600
bbx2 10
bby2 600
lam is 1.0
Original Images
CutMix Images
Original labels:
[[1 0 0 0]
[0 1 0 0]
[0 0 1 0]
[0 0 0 1]]
Updated labels
[[1. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 0. 1.]]
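The printed `lam is 1.0` follows from how the fixed values are named. The slice uses `bbx1:bby1` and `bbx2:bby2` (rows 10 to 600, columns 10 to 600), while the formula uses `bbx2 - bbx1` and `bby2 - bby1`, which are both 0. As a hedged illustration, the snippet below computes the label weight that the pasted 590 x 590 region would imply for a 2016 x 1512 image if the coordinates were paired as (start, end) per axis. The variable names are chosen for this example only.

```python
# Image size taken from the printed shape (2016, 1512, 3)
height, width = 2016, 1512

# Region actually overwritten by the slice [10:600, 10:600]
row_start, row_end = 10, 600
col_start, col_end = 10, 600

box_area = (row_end - row_start) * (col_end - col_start)
lam = 1 - box_area / (height * width)

print(box_area)        # 348100
print(round(lam, 4))   # 0.8858
```

With the pairing `c = [1, 0, 3, 2]`, that weight would give the first image a label of roughly `[0.886, 0.114, 0, 0]` instead of the unchanged one-hot vector.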
Mixup&CutMix online implementation
Note that in practice data augmentation is usually applied online, that is, on the fly while batches are loaded. That is the approach described in this section, so the following code can be used directly in real projects. Mixup is implemented along the same lines as CutMix, so you can adapt the code below for it.
!cd 'data/data97595' && unzip -q nongzuowu.zip
from paddle.io import Dataset
import cv2
import paddle
import random
# Import the required libraries
from sklearn.utils import shuffle
import os
import pandas as pd
import numpy as np
from PIL import Image
import paddle.nn as nn
import paddle.vision.transforms as T
import paddle.nn.functional as F
from paddle.metric import Accuracy
import warnings
warnings.filterwarnings("ignore")
# Read the data
train_images = pd.read_csv('data/data97595/nongzuowu/train.csv')
# Split into training and validation sets
all_size = len(train_images)
# print(all_size)
train_size = int(all_size * 0.8)
train_df = train_images[:train_size]
val_df = train_images[train_size:]
# Box-cutting helper for CutMix
def rand_bbox(size, lam):
    if len(size) == 4:
        W = size[2]
        H = size[3]
    elif len(size) == 3:
        W = size[0]
        H = size[1]
    else:
        raise Exception
    # The box covers a fraction (1 - lam) of the image area before clipping
    cut_rat = np.sqrt(1. - lam)
    cut_w = int(W * cut_rat)  # np.int was removed in NumPy 1.24; use the built-in int
    cut_h = int(H * cut_rat)
    # uniform
    cx = np.random.randint(W)
    cy = np.random.randint(H)
    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)
    return bbx1, bby1, bbx2, bby2
# Define the data preprocessing
data_transforms = T.Compose([
    T.Resize(size=(256, 256)),
    T.Transpose(), # HWC -> CHW
    T.Normalize(
        mean=[0, 0, 0], # normalization: scale pixel values to [0, 1]
        std=[255, 255, 255],
        to_rgb=True)
])
class JSHDataset(Dataset):
    def __init__(self, df, transforms, train=False):
        self.df = df
        self.transforms = transforms
        self.train = train
    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        fn = row.image
        # Read the image data
        image = cv2.imread(os.path.join('data/data97595/nongzuowu/train', fn))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        image = cv2.resize(image, (256, 256), interpolation=cv2.INTER_LINEAR)
        # Read the mask data
        # masks = cv2.imread(os.path.join(row['mask_path'], fn), cv2.IMREAD_GRAYSCALE)/255
        # masks = cv2.resize(masks, (1024, 1024), interpolation=cv2.INTER_LINEAR)
        # Read the label as a one-hot vector
        label = paddle.zeros([4])
        label[row.label] = 1
        # ------------------------------ CutMix ------------------------------------------
        prob = 20 # percent chance of applying CutMix; set prob to 0 to turn it off
        if random.randint(0, 99) < prob and self.train:
            rand_index = random.randint(0, len(self.df) - 1)
            rand_row = self.df.iloc[rand_index]
            rand_fn = rand_row.image
            rand_image = cv2.imread(os.path.join('data/data97595/nongzuowu/train', rand_fn))
            rand_image = cv2.cvtColor(rand_image, cv2.COLOR_BGR2RGB)
            rand_image = cv2.resize(rand_image, (256, 256), interpolation=cv2.INTER_LINEAR)
            # rand_masks = cv2.imread(os.path.join(rand_row['mask_path'], rand_fn), cv2.IMREAD_GRAYSCALE)/255
            # rand_masks = cv2.resize(rand_masks, (1024, 1024), interpolation=cv2.INTER_LINEAR)
            lam = np.random.beta(1,1)
            bbx1, bby1, bbx2, bby2 = rand_bbox(image.shape, lam)
            # Paste the box from the random image into the current image
            image[bbx1:bbx2, bby1:bby2, :] = rand_image[bbx1:bbx2, bby1:bby2, :]
            # masks[bbx1:bbx2, bby1:bby2] = rand_masks[bbx1:bbx2, bby1:bby2]
            # Recompute lam from the clipped box so it matches the actual pixel ratio
            lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (image.shape[1] * image.shape[0]))
            rand_label = paddle.zeros([4])
            rand_label[rand_row.label] = 1
            label = label * lam + rand_label * (1. - lam)
        # --------------------------------- CutMix ---------------------------------------
        # Apply the data augmentations defined earlier
        # augmented = self.transforms(image=image, mask=masks)
        # img, mask = augmented['image'], augmented['mask']
        img = image
        return self.transforms(img), label
    def __len__(self):
        return len(self.df)
train_dataset = JSHDataset(train_df, data_transforms, train=True)
val_dataset = JSHDataset(val_df, data_transforms)
#train_loader
train_loader = paddle.io.DataLoader(train_dataset, places=paddle.CPUPlace(), batch_size=8, shuffle=True, num_workers=0)
#val_loader
val_loader = paddle.io.DataLoader(val_dataset, places=paddle.CPUPlace(), batch_size=8, shuffle=True, num_workers=0)
# Fetch one batch to inspect the image dtype and the (possibly mixed) labels
for batch_id, data in enumerate(train_loader()):
    x_data = data[0]
    y_data = data[1]
    print(x_data.dtype)
    print(y_data)
    break
paddle.float32
Tensor(shape=[8, 4], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
[[0. , 0. , 1. , 0. ],
[0.54284668, 0.45715332, 0. , 0. ],
[0. , 1. , 0. , 0. ],
[0. , 0. , 1. , 0. ],
[0.32958984, 0. , 0.67041016, 0. ],
[0. , 0. , 0. , 1. ],
[0. , 0. , 0. , 1. ],
[0. , 0. , 0. , 1. ]])
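As a rough sketch of the mixup variant mentioned at the start of this section, the helper below shows what the counterpart of the CutMix step in `__getitem__` could look like for a single pair of samples. `mixup_pair` and its `alpha` argument are hypothetical names introduced here. The function works on NumPy arrays and assumes both images have already been resized to the same shape.

```python
import numpy as np

def mixup_pair(image, label, rand_image, rand_label, alpha=1.0):
    """Blend two same-sized uint8 images and their one-hot labels.

    alpha is the Beta(alpha, alpha) parameter; 1.0 mirrors the
    np.random.beta(1, 1) call used for CutMix in this article.
    """
    lam = np.random.beta(alpha, alpha)
    # Blend in float to avoid uint8 overflow, then convert back
    mixed = lam * image.astype(np.float32) + (1.0 - lam) * rand_image.astype(np.float32)
    mixed = np.clip(mixed, 0, 255).astype(np.uint8)
    mixed_label = lam * label + (1.0 - lam) * rand_label
    return mixed, mixed_label, lam

# Toy usage with two synthetic 256x256 RGB images and 4 classes
image = np.full((256, 256, 3), 200, dtype=np.uint8)
rand_image = np.full((256, 256, 3), 100, dtype=np.uint8)
label = np.array([1.0, 0.0, 0.0, 0.0], dtype=np.float32)
rand_label = np.array([0.0, 0.0, 1.0, 0.0], dtype=np.float32)

mixed, mixed_label, lam = mixup_pair(image, label, rand_image, rand_label)
print(mixed.shape, mixed.dtype, mixed_label)
```

Unlike CutMix, no box is needed, and `lam` does not have to be recomputed afterwards because the blend ratio is exact for every pixel.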
from paddle.vision.models import resnet18
model = resnet18(num_classes=4)
# Wrap the model with the high-level API
model = paddle.Model(model)
# Define the optimizer
optim = paddle.optimizer.Adam(learning_rate=3e-4, parameters=model.parameters())
# Configure the model; soft_label=True is needed because CutMix labels are not one-hot
model.prepare(
    optim,
    paddle.nn.CrossEntropyLoss(soft_label=True),
    Accuracy()
    )
# Train and evaluate the model
model.fit(train_loader,
          val_loader,
          log_freq=1,
          epochs=2,
          verbose=1,
          )
The loss value printed in the log is the current step, and the metric is the average value of previous steps.
Epoch 1/2
step 56/56 [==============================] - loss: 1.2033 - acc: 0.5843 - 96ms/step
Eval begin...
step 14/14 [==============================] - loss: 1.6905 - acc: 0.5625 - 73ms/step
Eval samples: 112
Epoch 2/2
step 56/56 [==============================] - loss: 0.5297 - acc: 0.7708 - 82ms/step
Eval begin...
step 14/14 [==============================] - loss: 0.5764 - acc: 0.7857 - 67ms/step
Eval samples: 112
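The training cell above passes `soft_label=True` to the loss because CutMix labels are no longer one-hot. As a framework-independent sketch (plain NumPy, with made-up logits), soft-label cross entropy is the label-weighted negative log-softmax. For a mix of two classes it equals the `lam`-weighted sum of the two ordinary cross-entropy terms. The function name `soft_cross_entropy` is an assumption for this example and is not Paddle's API.

```python
import numpy as np

def soft_cross_entropy(logits, soft_labels):
    """Mean over the batch of -sum_k soft_labels[k] * log_softmax(logits)[k]."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(soft_labels * log_probs).sum(axis=1).mean())

logits = np.array([[2.0, 0.5, -1.0, 0.0]])   # made-up model output, 4 classes
lam = 0.7
label_a = np.array([[1.0, 0.0, 0.0, 0.0]])
label_b = np.array([[0.0, 1.0, 0.0, 0.0]])
mixed_label = lam * label_a + (1 - lam) * label_b

loss_mixed = soft_cross_entropy(logits, mixed_label)
loss_split = (lam * soft_cross_entropy(logits, label_a)
              + (1 - lam) * soft_cross_entropy(logits, label_b))
print(loss_mixed, loss_split)  # the two values agree up to floating-point error
```

This is why mixing the labels and using a soft-label loss is equivalent to computing the loss against each original label and weighting the two results by `lam` and `1 - lam`.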
Summary
In CutMix, the cut-out region is replaced with a patch from another image, and the label is mixed with the ground-truth label of that second image. The proportion contributed by each image (e.g. 0.4/0.6) is set when the image is generated. In the image below, you can see how the authors of CutMix demonstrate that this technique works better than plain Mixup and Cutout.
P.S. For generating neural network heat maps, please refer to another project of mine.
These two data augmentation methods are representative of several current techniques, such as cutout and mosaic. Once you have mastered them, the other cutout-style and mosaic-style augmentation methods are easy to understand.