YOLOv5: training on rectangular images

20230329 update

In the official source code, enabling rect (the --rect flag) during training turns on rectangular training, which further reduces GPU memory use.

imgsz only needs to be set to the longest side of the image. The dataloader handles the rest when it reads each image.

In load_image, the image is scaled proportionally so that its longest side matches the requested size.

For example, take an original image of 1280*640.

If the input image size is 1280, the loaded image is 1280*640.

If the input image size is 640, the loaded image is 640*320.
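The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the proportional scaling, not the code from load_image itself: the function name scaled_shape is made up for illustration, and the rounding (math.ceil here) is an assumption that may differ between YOLOv5 versions.

```python
import math


def scaled_shape(h0, w0, img_size):
    """Return (h, w) after scaling so that the longest side equals img_size.

    Hypothetical helper for illustration; only the shape is computed,
    no image is resized.
    """
    r = img_size / max(h0, w0)  # scale ratio relative to the longest side
    if r == 1:
        return h0, w0
    return math.ceil(h0 * r), math.ceil(w0 * r)


# Original image: 1280 wide, 640 high
print(scaled_shape(640, 1280, 1280))  # (640, 1280)
print(scaled_shape(640, 1280, 640))   # (320, 640)
```

Note that the function returns (height, width), while the text above writes sizes as width*height.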

 

Note, however, that once rect is enabled, mosaic augmentation is no longer applied to the images. If you really need mosaic with rectangular input, refer to the original manual approach to rectangular training described in the rest of this article.

Rectangular image training:

Step 1: Change the training image size. By default the size argument holds one value for the training set and one for the validation set, so for a rectangle the values have to be assigned separately as an array.

Step 2: Modify the image size check function in train.py.

Step 3: Modify the model attribute setup in train.py. The size used to be an int (640), but it is now an array ([640,320]), so the code has to be changed to handle it.

Step 4: Modify the code in LoadImagesAndLabels.
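Steps 2 and 3 both come down to accepting either an int or a two-element array as the image size. A minimal sketch of that idea follows. The helper names to_hw and round_up_to_stride are hypothetical and are not functions from the YOLOv5 repository. The (h, w) order follows the comment in the load_mosaic code in this article, and the stride of 32 is an assumption.

```python
import math


def to_hw(img_size):
    """Normalize an int or a 2-element sequence to an (h, w) tuple.

    Hypothetical helper: an int means a square image.
    """
    if isinstance(img_size, int):
        return img_size, img_size
    h, w = img_size
    return int(h), int(w)


def round_up_to_stride(img_size, stride=32):
    """Round each side up to the nearest multiple of the model stride."""
    return tuple(math.ceil(x / stride) * stride for x in to_hw(img_size))


print(round_up_to_stride(640))         # square input given as an int
print(round_up_to_stride([640, 300]))  # rectangular input given as an array
```

Normalizing the size once, close to where it is parsed, keeps the isinstance checks from spreading through the rest of the code.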

 

Modify the mosaic functions:

Modify the load_image function.

Modify the load_mosaic function.

Modify the mosaic stitching and label stitching code.

Pay attention to where the isinstance checks are placed: those are the parts that were modified.

    def load_mosaic(self, index):
        # YOLOv5 4-mosaic loader. Loads 1 image + 3 random images into a 4-image mosaic
        labels4, segments4 = [], []
        if isinstance(self.img_size, int):
            s = self.img_size
            yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border)  # mosaic center x, y
        else:
            s_h, s_w = self.img_size  # (h, w)
            # per-axis center; assumes self.mosaic_border was also changed to one border per axis (h, w)
            yc, xc = [int(random.uniform(-x, 2 * s + x)) for x, s in zip(self.mosaic_border, self.img_size)]
        indices = [index] + random.choices(self.indices, k=3)  # 3 additional image indices
        random.shuffle(indices)
        for i, index in enumerate(indices):
            # Load image
            img, _, (h, w) = self.load_image(index)
            if isinstance(self.img_size, int):
                # place img in img4
                if i == 0:  # top left
                    img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles
                    x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)
                    x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)
                elif i == 1:  # top right
                    x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
                    x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
                elif i == 2:  # bottom left
                    x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
                    x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
                elif i == 3:  # bottom right
                    x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
                    x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
            else:
                if i == 0:  # top left
                    img4 = np.full((s_h * 2, s_w * 2, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles
                    x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)
                    x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)
                elif i == 1:  # top right
                    x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s_w * 2), yc
                    x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
                elif i == 2:  # bottom left
                    x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s_h * 2, yc + h)
                    x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
                elif i == 3:  # bottom right
                    x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s_w * 2), min(s_h * 2, yc + h)
                    x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)

            img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax]
            padw = x1a - x1b
            padh = y1a - y1b

            # Labels
            labels, segments = self.labels[index].copy(), self.segments[index].copy()
            if labels.size:
                labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padw, padh)  # normalized xywh to pixel xyxy format
                segments = [xyn2xy(x, w, h, padw, padh) for x in segments]
            labels4.append(labels)
            segments4.extend(segments)

This completes the code changes.

Appendix: mosaic augmentation

Mosaic stitching does not mean splitting the canvas into 4 equal parts.

1: A center point is chosen at random. It divides the canvas into 4 regions, and one image is placed in each region.

The center coordinate is drawn at random from the range -x to 2*s+x. With s = 640 the value of x is normally -320, so the center is drawn from the range 320 to 960.
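The sampling range can be checked with a short sketch. It assumes that the border for each axis is minus half the image size on that axis, which matches the -320 quoted above for s = 640. The function name mosaic_center and the rng parameter are made up for illustration.

```python
import random


def mosaic_center(img_size, rng=random):
    """Sample a mosaic center (yc, xc) for an int or an (h, w) image size.

    Sketch only: assumes border = -size // 2 on each axis.
    """
    if isinstance(img_size, int):
        sizes = (img_size, img_size)
    else:
        sizes = tuple(img_size)  # (h, w)
    borders = [-s // 2 for s in sizes]  # e.g. -320 for s = 640
    yc, xc = (int(rng.uniform(-b, 2 * s + b)) for b, s in zip(borders, sizes))
    return yc, xc


rng = random.Random(0)
print(mosaic_center(640, rng))         # both coordinates fall in 320..960
print(mosaic_center((640, 320), rng))  # yc in 320..960, xc in 160..480
```

For a rectangular size the two coordinates are drawn from different ranges, which is why the modified load_mosaic zips the borders with the sizes.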

Step 0: Initialize the large canvas and fill every pixel with 114.

Step 1: Read the image to be placed and get its height and width.

Step 2: Compute where the image goes on the canvas. If the image extends past the canvas, the out-of-bounds part is cropped. If the image is too small, it will not fill the region the canvas gives it.

Whether the image is large or small, its position on the canvas has to be computed, along with the offset between canvas coordinates and image coordinates, because the labels are updated next.

(Labels are relative to the image itself. If the image is placed on the canvas and does not fully cover its region, for example the top-left one, some empty area is left over. Using the labels directly would leave them offset, so the empty area has to be taken into account when the labels of the original image are loaded and updated.)

If a box lies across the canvas border, it has to be corrected or clipped.

Step 3: Place the images one after another.

Step 4: Apply random rotation, flip, translation and scaling to the stitched result.
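Step 2 and the label update can be traced on one tile with plain Python. This sketch reuses the top-left formulas from load_mosaic above. The label conversion follows the same arithmetic as xywhn2xyxy but is rewritten here for a single box, and clip_box is a hypothetical helper for the border correction. The image size and center values are made-up example numbers.

```python
def top_left_tile(xc, yc, w, h):
    """Canvas and source rectangles for the top-left tile."""
    x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # region on the canvas
    x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # region cut from the image
    return (x1a, y1a, x2a, y2a), (x1b, y1b, x2b, y2b)


def shift_label(xywhn, w, h, padw, padh):
    """Normalized (x, y, bw, bh) on the image -> pixel (x1, y1, x2, y2) on the canvas."""
    x, y, bw, bh = xywhn
    return (w * (x - bw / 2) + padw, h * (y - bh / 2) + padh,
            w * (x + bw / 2) + padw, h * (y + bh / 2) + padh)


def clip_box(box, canvas_w, canvas_h):
    """Clip a pixel xyxy box to the canvas (hypothetical helper)."""
    x1, y1, x2, y2 = box
    return (min(max(x1, 0), canvas_w), min(max(y1, 0), canvas_h),
            min(max(x2, 0), canvas_w), min(max(y2, 0), canvas_h))


# Example: an image 640 wide and 320 high, mosaic center at xc = 500, yc = 400
w, h = 640, 320
canvas, source = top_left_tile(500, 400, w, h)
padw, padh = canvas[0] - source[0], canvas[1] - source[1]  # offset applied to the labels
box = shift_label((0.5, 0.5, 0.25, 0.5), w, h, padw, padh)
print(canvas, source, (padw, padh))
print(clip_box(box, 2 * 640, 2 * 320))
```

In this example the image is wider than the space left of the center, so its left 140 pixels are cut off and padw is negative. It is also shorter than the space above the center, so the tile starts 80 pixels down and padh is positive.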

Origin blog.csdn.net/weixin_43852823/article/details/127735416