YOLO format data set production

Table of contents

1. Introduction to YOLO

2. Segmentation dataset preparation

3. Code display



1. Introduction to YOLO

YOLO (You Only Look Once) is a popular object detection and image segmentation model developed by Joseph Redmon and Ali Farhadi at the University of Washington. The first version of YOLO was released in 2015 and quickly became popular due to its high speed and accuracy.

Release time of different versions of YOLO

Version time
YOLOv1 2015
YOLOv2 2016
YOLOv3 2018
YOLOv4 2020
YOLOv5 2020
YOLOv8 2023

Taking YOLOv5 as an example, this article walks through data preparation for training a multi-task network that supports image classification, object detection, and image segmentation at the same time. After several days of searching online and experimenting while preparing my own dataset, I finally got it working, so I have written up this step-by-step dataset preparation tutorial.


2. Segmentation dataset preparation

        In a typical segmentation task, the dataset pairs each original image with a mask image of the same size. YOLO was originally built for object detection, so its common data organization pairs each original image with a JSON or TXT label file instead; this follows from the various detection data formats (COCO/VOC/...). Today I will use one image paired with one txt label file as an example to build my own dataset.

(Figure: an original PNG mask converted into a YOLO txt label file)

 The key to converting the original PNG mask into the txt label file YOLO requires is understanding the content and organization of that txt file:

        As the example txt file shows, the first value on the first line is "45", meaning the category is 45. The decimal values that follow (0.78...) are the x, y coordinates of the normalized polygon. Normalization is relative to the original image size: for example, if a pixel coordinate is (10, 20) and the original image is 100*100, the normalized coordinate is (0.1, 0.2).

        Each subsequent line describes the category and position of the next object, and so on.
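The normalization arithmetic above can be sketched in a few lines; `normalize_point` is a hypothetical helper name, not part of YOLO itself:

```python
# A minimal sketch of the normalization described above: pixel
# coordinates are divided by the image width and height.
def normalize_point(x, y, img_w, img_h):
    """Map pixel coordinates to the [0, 1] range used by YOLO labels."""
    return x / img_w, y / img_h

print(normalize_point(10, 20, 100, 100))  # (0.1, 0.2)
```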


3. Code display

from typing import List

from skimage import io
import cv2
import numpy as np

def mask_to_polygon(mask: np.ndarray, report: bool = False) -> List[int]:
    # findContours expects a single-channel 8-bit image, so binarize first.
    mask = (mask > 0).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    polygons = []
    for contour in contours:
        coords = []
        for point in contour:
            coords.append(int(point[0][0]))
            coords.append(int(point[0][1]))
        polygons.append(coords)
    if report:
        print(f"Number of points = {len(polygons[0])}")
    # Flatten all contours into one list of x, y pixel coordinates.
    # (np.array(...).ravel() fails on ragged per-contour lists.)
    return [coord for polygon in polygons for coord in polygon]

mask = io.imread('/labels/xxx.png')
polygons = mask_to_polygon(mask, report=True)

        The polygons returned by the function hold the polygon positions of all the objects in the image. To produce the final txt file, don't forget to prepend the object's category to each line.
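Putting the two steps above together, one txt label line is the class id followed by the normalized polygon coordinates. A minimal sketch (the helper name `polygon_to_yolo_line` is my own, not from the original code):

```python
def polygon_to_yolo_line(class_id, polygon, img_w, img_h):
    """Format one object as a YOLO segmentation label line:
    '<class> x1 y1 x2 y2 ...' with coordinates normalized to [0, 1]."""
    coords = []
    for i in range(0, len(polygon), 2):
        coords.append(polygon[i] / img_w)      # x
        coords.append(polygon[i + 1] / img_h)  # y
    return " ".join([str(class_id)] + [f"{c:.6f}" for c in coords])

# One line per object; here class 45 with a small triangle on a 100*100 image.
line = polygon_to_yolo_line(45, [10, 20, 50, 20, 30, 60], 100, 100)
print(line)  # 45 0.100000 0.200000 0.500000 0.200000 0.300000 0.600000
```

Write one such line per object into a `.txt` file that shares its base name with the image.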

        If you want to simplify the mask polygon, that is, drop points that lie close together so the label file is smaller, you can refer to the following article.

Reference: Binary mask to txt

The official COCO128-seg data set: download link



Origin blog.csdn.net/qq_38308388/article/details/129060710