Mosaic data augmentation

The Mosaic data augmentation method was proposed in the YOLOv4 paper. The main idea is to randomly crop four images and stitch them into a single image that is used as training data. This enriches the backgrounds the detector sees, and splicing four images together effectively increases the batch size: batch normalization computes its statistics over all four images at once, so training depends less on a large batch size and YOLOv4 can be trained on a single GPU.
What follows is my walkthrough of Mosaic data augmentation based on the code of pytorch-YOLOv4.
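
Before walking through that implementation, here is a minimal, self-contained sketch of the core idea (my own simplification for illustration, not the pytorch-YOLOv4 code): pick a random cut point on a w x h canvas and fill each of the four quadrants with one image. The real pipeline crops a jittered region from each image first; this sketch simply resizes whole images to fill their quadrants, and the name simple_mosaic is hypothetical.

import random
import numpy as np
import cv2

def simple_mosaic(images, w=608, h=608, min_offset=0.2):
    # pick a random cut point away from the borders
    cut_x = random.randint(int(w * min_offset), int(w * (1 - min_offset)))
    cut_y = random.randint(int(h * min_offset), int(h * (1 - min_offset)))
    out = np.zeros((h, w, 3), dtype=np.uint8)
    # quadrant sizes and top-left offsets: TL, TR, BL, BR
    sizes = [(cut_x, cut_y), (w - cut_x, cut_y), (cut_x, h - cut_y), (w - cut_x, h - cut_y)]
    offsets = [(0, 0), (cut_x, 0), (0, cut_y), (cut_x, cut_y)]
    for img, (qw, qh), (ox, oy) in zip(images, sizes, offsets):
        # cv2.resize takes the target size as (width, height)
        out[oy:oy + qh, ox:ox + qw] = cv2.resize(img, (qw, qh))
    return out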

[Figure 1: the entire Mosaic process]

A snippet of the relevant code:

oh, ow, oc = img.shape    # img is the image read from disk
# self.cfg.jitter is a parameter from the cfg file; its default is 0.2
dh, dw, dc = np.array(np.array([oh, ow, oc]) * self.cfg.jitter, dtype=int)
# first generate some random offsets for the left, right, top, and bottom edges
pleft = random.randint(-dw, dw)
pright = random.randint(-dw, dw)
ptop = random.randint(-dh, dh)
pbot = random.randint(-dh, dh)
# width and height of the cropped region
swidth = ow - pleft - pright
sheight = oh - ptop - pbot
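
To make these quantities concrete, here is the same computation with hypothetical values (a 480x640 image and the default jitter of 0.2) and without the self.cfg wrapper; the offsets in the comments are one possible random draw:

import random
import numpy as np

oh, ow, oc = 480, 640, 3   # example image shape
jitter = 0.2               # default value of self.cfg.jitter

dh, dw, dc = (np.array([oh, ow, oc]) * jitter).astype(int)  # dh=96, dw=128

pleft = random.randint(-dw, dw)    # e.g.  40
pright = random.randint(-dw, dw)   # e.g. -25
ptop = random.randint(-dh, dh)     # e.g.  10
pbot = random.randint(-dh, dh)     # e.g.  60

swidth = ow - pleft - pright       # 640 - 40 + 25 = 625
sheight = oh - ptop - pbot         # 480 - 10 - 60 = 410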

The entire Mosaic process is shown in Figure 1, which depicts the case where pleft, pright, ptop, and pbot are all greater than 0. First, a rectangle is defined on the original image with (pleft, ptop) as its upper-left corner, swidth as its width, and sheight as its height; the intersection of this rectangle with the original image is then taken (the dark green part).

Note: step (2) in Figure 1 is not a direct intersection. Instead, a rectangle of width swidth and height sheight is created and filled with the per-channel mean of the original image's three RGB channels, and the intersection computed above is then placed onto this rectangle at the calculated coordinates. Because Figure 1 assumes pleft, pright, ptop, and pbot are all greater than 0, the intersection lands at coordinate (0, 0). See the following code for details.

# new_src_rect is the intersection mentioned above, as (x1, y1, x2, y2)
new_src_rect = rect_intersection(src_rect, img_rect)
dst_rect = [max(0, -pleft), max(0, -ptop), max(0, -pleft) + new_src_rect[2] - new_src_rect[0],
            max(0, -ptop) + new_src_rect[3] - new_src_rect[1]]

cropped = np.zeros([sheight, swidth, 3])
cropped[:, :, ] = np.mean(img, axis=(0, 1))
# place the intersection onto the mean-filled rectangle
cropped[dst_rect[1]:dst_rect[3], dst_rect[0]:dst_rect[2]] = \
    img[new_src_rect[1]:new_src_rect[3], new_src_rect[0]:new_src_rect[2]]
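
The rect_intersection helper used above is not shown in the snippet. A minimal sketch of what it computes, assuming rectangles in [x1, y1, x2, y2] form with src_rect = [pleft, ptop, pleft + swidth, ptop + sheight] and img_rect = [0, 0, ow, oh] (my reading of the surrounding code, not a verbatim copy):

def rect_intersection(a, b):
    # intersection of two rectangles given as [x1, y1, x2, y2]
    return [max(a[0], b[0]), max(a[1], b[1]),
            min(a[2], b[2]), min(a[3], b[3])]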

The picture is then resized to the resolution required by the network input, 608x608 by default. Next, based on the calculated upper-left coordinates and the randomly chosen cut point (cut_x, cut_y), part of the image is cropped out to serve as, for example, the upper-left quadrant of a new image; the red box in (4) of Figure 1 marks the cropped area. Note: the crop in (4) of Figure 1 starts at coordinate (0, 0) because pleft and ptop are greater than 0, which makes the computed shifts zero. The shift computation is shown in the following code.

# cut_x and cut_y are computed randomly from the network input size;
# min_offset is a preset parameter, 0.2 by default
# cfg.w and cfg.h are the network input size, 608 by default
cut_x = random.randint(int(self.cfg.w * min_offset), int(self.cfg.w * (1 - min_offset)))
cut_y = random.randint(int(self.cfg.h * min_offset), int(self.cfg.h * (1 - min_offset)))
# computing the crop (shift) coordinates
left_shift = int(min(cut_x, max(0, (-int(pleft) * self.cfg.w / swidth))))
top_shift = int(min(cut_y, max(0, (-int(ptop) * self.cfg.h / sheight))))

right_shift = int(min((self.cfg.w - cut_x), max(0, (-int(pright) * self.cfg.w / swidth))))
bot_shift = int(min(self.cfg.h - cut_y, max(0, (-int(pbot) * self.cfg.h / sheight))))
# the ai argument here is image (3) in Figure 1; out_img is the newly initialized output image
# this function performs the process from (3) to (5) in Figure 1, pasting the cropped images
# into the top-left, top-right, bottom-left, and bottom-right of the new image
# it is called in a loop of 4 iterations, one paste per iteration; the argument i selects the quadrant
out_img, out_bbox = blend_truth_mosaic(out_img, ai, truth.copy(), self.cfg.w, self.cfg.h, cut_x,
                                       cut_y, i, left_shift, right_shift, top_shift, bot_shift)
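
The resize mentioned earlier is a single call. Continuing from the cropped array computed above, here is a sketch of bringing it to the network resolution, producing the ai image that is passed to blend_truth_mosaic (the variable names net_w and net_h are mine; 608 is the stated default):

import cv2
import numpy as np

net_w, net_h = 608, 608  # cfg.w and cfg.h defaults

# resize the mean-filled crop to the network input size;
# cv2.resize takes the target size as (width, height)
ai = cv2.resize(cropped.astype(np.uint8), (net_w, net_h))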

The details of the blend_truth_mosaic function are as follows:

def blend_truth_mosaic(out_img, img, bboxes, w, h, cut_x, cut_y, i_mixup,
                       left_shift, right_shift, top_shift, bot_shift):
    # clamp the shifts so every source slice fits inside the w x h image
    left_shift = min(left_shift, w - cut_x)
    top_shift = min(top_shift, h - cut_y)
    right_shift = min(right_shift, cut_x)
    bot_shift = min(bot_shift, cut_y)

    if i_mixup == 0:  # paste into the top-left quadrant
        bboxes = filter_truth(bboxes, left_shift, top_shift, cut_x, cut_y, 0, 0)
        out_img[:cut_y, :cut_x] = img[top_shift:top_shift + cut_y, left_shift:left_shift + cut_x]
    if i_mixup == 1:  # paste into the top-right quadrant
        bboxes = filter_truth(bboxes, cut_x - right_shift, top_shift, w - cut_x, cut_y, cut_x, 0)
        out_img[:cut_y, cut_x:] = img[top_shift:top_shift + cut_y, cut_x - right_shift:w - right_shift]
    if i_mixup == 2:  # paste into the bottom-left quadrant
        bboxes = filter_truth(bboxes, left_shift, cut_y - bot_shift, cut_x, h - cut_y, 0, cut_y)
        out_img[cut_y:, :cut_x] = img[cut_y - bot_shift:h - bot_shift, left_shift:left_shift + cut_x]
    if i_mixup == 3:  # paste into the bottom-right quadrant
        bboxes = filter_truth(bboxes, cut_x - right_shift, cut_y - bot_shift, w - cut_x, h - cut_y, cut_x, cut_y)
        out_img[cut_y:, cut_x:] = img[cut_y - bot_shift:h - bot_shift, cut_x - right_shift:w - right_shift]

    return out_img, bboxes

Finally, a word on how the label boxes are handled. As can be seen in Figure 1, if a label box is partially cut off during cropping, it is discarded; only the boxes that remain intact after cropping are kept.
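
This behavior lives in filter_truth, which blend_truth_mosaic calls but which is not shown above. Below is a minimal sketch of one way to implement the behavior just described (shift boxes into the crop window, discard any box cut by the window border, then shift the survivors into the destination quadrant); the (N, 5) box layout is an assumption, and the original function differs in its details:

import numpy as np

def filter_truth(bboxes, dx, dy, sx, sy, xd, yd):
    # bboxes: (N, 5) array of [x1, y1, x2, y2, class] in source-image coordinates
    b = bboxes.copy().astype(float)
    # move boxes into the coordinates of the sx-by-sy crop window at (dx, dy)
    b[:, [0, 2]] -= dx
    b[:, [1, 3]] -= dy
    # keep only the boxes that lie fully inside the window
    keep = (b[:, 0] >= 0) & (b[:, 1] >= 0) & (b[:, 2] <= sx) & (b[:, 3] <= sy)
    b = b[keep]
    # shift the survivors into the destination quadrant of the mosaic
    b[:, [0, 2]] += xd
    b[:, [1, 3]] += yd
    return b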

The following figure shows the cropping situation where pleft, pright, ptop, and pbot are all less than 0:
[Figure 2: the crop when pleft, pright, ptop, and pbot are all less than 0]

Due to time constraints, I have only drawn the two diagrams that are easiest to draw, and I only illustrated pasting into the upper-left quadrant, but the other cases work the same way.
This article only explains the cropping part of Mosaic in detail, which I consider the most critical part. There are also some other augmentation operations, such as random flipping, blurring, and HSV augmentation, which I have not covered yet and will add in a later update.

Once the cropping is done, the four images are pasted in order: the first to the upper left, the second to the upper right, the third to the lower left, and the fourth to the lower right (matching i_mixup = 0, 1, 2, 3 in the code above).

Finally, the above is my own understanding based on the pytorch-YOLOv4 code. If there are any mistakes, please correct me; thank you.


Origin: blog.csdn.net/ahelloyou/article/details/111462862