Converting semantic segmentation Ground Truth (GT) labels to yolov5 object detection labels (road surface water detection example)


Overview

Most object detection datasets are built by labeling images directly or downloading from open platforms. But if you already have Ground Truth label files for semantic segmentation, how do you convert them into yolov5 object detection format? I couldn't find a good solution anywhere online, so I wrote one myself with OpenCV, and the results are decent.

The example here uses the road surface water detection dataset from the Jishi platform. Since the platform only provides sample data for segmentation, you need to convert the labels yourself if you want to train YOLO for object detection. The existing dataset contains original images and labels, where the labels are PNG images, as shown below:

[Figure: an original image and its corresponding PNG segmentation label]

The dataset contains the original images and the corresponding segmentation masks (annotation files). The annotation files are single-channel grayscale PNG images. In this task, to make the output PNGs easy to inspect visually, the following gray values are used:

  1. Background: 0
  2. Standing water: 1

This dataset can be used directly to train and validate segmentation models such as UNet and DeepLab.
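As a quick sanity check before converting (a minimal sketch; the sample filename is hypothetical), you can confirm that a mask really contains only these two gray values:

import cv2
import numpy as np

# Hypothetical sample path; adjust to your dataset layout
mask = cv2.imread("./mask/train/ponding_sample_001.png", 0)  # flag 0 = grayscale
print(np.unique(mask))  # expected output: [0 1]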

Process

Since segmentation masks carry finer annotations than object detection boxes, the whole conversion takes only three steps:

1. Find the segmentation regions (targets) in the image via OpenCV contour detection.

2. Compute the minimum bounding rectangle of each region to get its coordinates.

3. Normalize the coordinates following YOLO's convention and save them to *.txt files.

Note: the dataset is fairly large, so the conversion is done in batch. And because the minimum rectangles are derived from contour detection, the resulting boxes carry small errors, but these have little impact on training.
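Put together, the batch pipeline is just a loop over the mask folder. Here is a skeleton (directory names follow the snippets below; the per-file steps are filled in over the next sections):

import os
import cv2

mask_dir = "./mask/train/"
label_dir = "./labels/train2017/"
os.makedirs(label_dir, exist_ok=True)  # make sure the output folder exists

for file in os.listdir(mask_dir):
    mask = cv2.imread(mask_dir + file, 0)  # read the label PNG as grayscale
    # ... find contours, take minimum rectangles, normalize, write txt (detailed below)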

Read the images

Use os to list the label images in the folder, and OpenCV to read each one:

import os
import cv2

path = "./mask/train/"
files = os.listdir(path)
for file in files:
    img = cv2.imread(path + file)

To make the labels easier to inspect, they can be visualized first. The original segmentation labels are small integers 0, 1, ..., n, which look almost uniformly black on screen, so each category is mapped to a distinct, visible gray value (here, 1 becomes 255).

# 1 -> 255: make the foreground class visible
print(img.shape)
for i in range(len(img)):
    for j in range(len(img[0])):
        if (img[i][j] == 1).all():  # all three BGR channels equal 1
            img[i][j] = 255
cv2.imwrite("./mask/{}".format(file), img)
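The per-pixel loop is slow on large masks; an equivalent vectorized version (same result, just using NumPy boolean indexing) is:

# Vectorized equivalent of the loop above: map every 1 to 255 in one step
vis = img.copy()
vis[vis == 1] = 255
cv2.imwrite("./mask/{}".format(file), vis)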

# get_bbox: read one mask and record its size
# (shown for a single sample here; in the batch script this runs inside the file loop)
path = "./mask/ponding_sample_103.png"
img = cv2.imread(path)
img_w = img.shape[1]  # image width
img_h = img.shape[0]  # image height

Convert and get the minimum bounding rectangle

OpenCV reads images in BGR format by default, while the labels are single-channel; for the later detection step the image must be a single-channel grayscale image. Convert it with cv2.cvtColor, or read it as grayscale directly with cv2.imread(path, 0) (flag 0 means grayscale mode). OpenCV provides a very convenient contour detection function: contours, hierarchy = cv2.findContours(image, mode, method[, contours[, hierarchy[, offset]]]).

image: the input, a binary image where black is the background and white is the target.

It must be a single-channel image matrix. A grayscale image works, but a binary image is more common, typically one produced by thresholding or by edge operators such as Canny or Laplacian. Note that this function modifies the input image (in older OpenCV versions), so pass a copy if you want to keep the original.
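For a {0, 1} mask like ours, a plain threshold is enough to produce such a binary image (a minimal sketch; gray stands for the grayscale mask read above):

# Any nonzero label pixel becomes white (255); the background stays black
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY)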

mode: the contour retrieval mode. Common values are cv2.RETR_EXTERNAL (retrieve only the outermost contours, used below), cv2.RETR_LIST (all contours, no hierarchy), cv2.RETR_CCOMP (two-level hierarchy), and cv2.RETR_TREE (full nested hierarchy).

method: the contour approximation method. cv2.CHAIN_APPROX_NONE stores every contour point; cv2.CHAIN_APPROX_SIMPLE compresses horizontal, vertical, and diagonal segments and keeps only their end points (used below).

offset: an optional offset applied to every contour point, given as a tuple; for example, (-10, 10) shifts the contour 10 pixels in the negative X direction and 10 pixels in the positive Y direction.

Return values

contours: the detected contours, as a list; each element is an array of shape (n, 1, 2), where n is the number of points on that contour and 2 holds the pixel coordinates, together representing one contour.

hierarchy: the hierarchical relationship between the contours, a three-dimensional array of shape (1, n, 4), where n is the total number of contours and each contour is described by 4 numbers: the index of the next contour at the same level, the index of the previous contour at the same level, the index of its first child contour, and the index of its parent contour.
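A toy example makes those return shapes concrete (a self-contained sketch on a synthetic image):

import cv2
import numpy as np

# One white square on a black background
canvas = np.zeros((100, 100), dtype=np.uint8)
canvas[20:60, 30:80] = 255
contours, hierarchy = cv2.findContours(canvas, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(len(contours))      # 1 contour found
print(contours[0].shape)  # (4, 1, 2): CHAIN_APPROX_SIMPLE keeps only the 4 corners
print(hierarchy.shape)    # (1, 1, 4): next, previous, first child, parent indices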

Get the minimum bounding rectangle of each target with cv2.minAreaRect and cv2.contourArea:

img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # OpenCV loads BGR, so convert BGR -> gray
cnts, _ = cv2.findContours(img.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for i, contour in enumerate(cnts):
    area = cv2.contourArea(contour)  # area of the enclosed shape
    rect = cv2.minAreaRect(contour)  # minimum bounding rectangle: (center (x, y), (w, h), rotation angle)
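To eyeball the result (a hypothetical debugging aid, not part of the conversion itself), the rectangles can be drawn back onto the mask:

import numpy as np

# Draw the minimum rectangles on a visible copy of the mask
debug = cv2.cvtColor(img * 255, cv2.COLOR_GRAY2BGR)  # {0,1} gray mask -> visible BGR
for contour in cnts:
    rect = cv2.minAreaRect(contour)
    box = cv2.boxPoints(rect).astype(np.int32)  # 4 corners of the rotated rectangle
    cv2.drawContours(debug, [box], 0, (0, 0, 255), 2)  # red outline, 2 px thick
cv2.imwrite("./debug_boxes.png", debug)  # hypothetical output path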

Convert to YOLO coordinates

A YOLO label holds four numbers per box: the center coordinates (x, y) and the width and height of the bbox, all normalized by the original image size, i.e. x / image width, y / image height, w / image width, h / image height. Just compute and record them:

boxs = []
for i, contour in enumerate(cnts):
    area = cv2.contourArea(contour)  # area of the enclosed shape
    rect = cv2.minAreaRect(contour)  # minimum bounding rectangle: (center (x, y), (w, h), rotation angle)
    temp = [0]  # class index; only one class ("standing water") here
    temp.append(rect[0][0])
    temp.append(rect[0][1])
    temp.append(rect[1][0])
    temp.append(rect[1][1])
    temp[1] /= img_w  # normalize center x by image width
    temp[2] /= img_h  # normalize center y by image height
    temp[3] /= img_w  # normalize box width
    temp[4] /= img_h  # normalize box height
    # box = np.int0(cv2.boxPoints(rect))
    boxs.append(temp)  # the finished, useful boxes

Note: since there is only one category here, I initialize temp = [0] directly when writing; with multiple kinds of targets the class index would need to be set accordingly.
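For reference, a multi-class variant could isolate each gray value before finding contours (a sketch under the assumption that class k is stored as gray value k; num_classes is hypothetical):

import numpy as np

num_classes = 3  # assumed number of foreground classes
boxs = []
for k in range(1, num_classes + 1):
    binary = np.uint8(img == k) * 255  # keep only pixels of class k
    cnts, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for contour in cnts:
        rect = cv2.minAreaRect(contour)
        boxs.append([k - 1,  # YOLO class index
                     rect[0][0] / img_w, rect[0][1] / img_h,
                     rect[1][0] / img_w, rect[1][1] / img_h])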

Dump the txt file

Each entry in the boxs array holds the class to save followed by its four normalized coordinates:

f = open("./labels/train2017/{}.txt".format(file.split(".")[0]), "w+")
for line in boxs:
    line = str(line)[1:-1].replace(",", "")  # "[0, 0.5, ...]" -> "0 0.5 ..."
    print(line)
    f.write(line + "\n")
f.close()
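If you prefer fixed precision over Python's default list-to-string formatting, an equivalent writer (same content, explicit formatting) looks like this:

with open("./labels/train2017/{}.txt".format(file.split(".")[0]), "w") as f:
    for cls, x, y, w, h in boxs:
        f.write("{} {:.6f} {:.6f} {:.6f} {:.6f}\n".format(int(cls), x, y, w, h))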

The converted txt labels look like the figure below and can be fed directly to yolov5 for training.

[Figure: example contents of a converted YOLO txt label file]
Complete code address: https://github.com/magau123/CSDN/blob/master/GT2yolo.py


Origin: blog.csdn.net/charles_zhang_/article/details/128780070