[Object Detection] Cropping large images together with their labels and converting the labels to txt format

Preface

Remote sensing images are usually large and need to be split into small blocks for training. I previously wrote an article on cropping and splicing large images, "[Object Detection] Image cropping / label visualization / image splicing scripts", but the workflow there was to cut the large image into small images first and then annotate the small images, so label conversion was not an issue.

In a recent project, a batch of large images had already been annotated and then needed to be cropped, which means the labels have to be cropped along with them. This article explains how to do that and, at the same time, how to convert the xml labels output by labelimg into the txt labels required by models such as yolov5.

Image cropping

Image cropping follows the naming rule from the previous blog post: the image is cut into 1280x1280 blocks, and the position of each block in the original image is recorded in its file name.

import os
import shutil
from pathlib import Path

from PIL import Image
from tqdm import tqdm

rootdir = r"E:\Dataset\数据集\可见光数据\原始未裁剪\img"
savedir = r'E:\Dataset\数据集\可见光数据\裁剪后数据\img'  # folder for the cropped images

dis = 1280   # side length of each image block
leap = 1280  # stride between neighbouring blocks


def main():
    # Recreate the output folder
    if Path(savedir).exists():
        shutil.rmtree(savedir)
    os.makedirs(savedir)

    for parent, dirnames, filenames in os.walk(rootdir):  # walk through every image
        filenames.sort()
        for filename in tqdm(filenames):
            currentPath = os.path.join(parent, filename)
            stem, suffix = os.path.splitext(filename)
            if suffix.lower() in ('.jpg', '.png'):
                img = Image.open(currentPath)
                if img.mode not in ('RGB', 'L'):
                    img = img.convert('RGB')  # JPEG cannot store alpha or palette images
                width, height = img.size
                for i in range(0, width, leap):
                    for j in range(0, height, leap):
                        box = (i, j, i + dis, j + dis)
                        # Crop one block; areas beyond the image edge are filled with black
                        image = img.crop(box)
                        image.save(os.path.join(savedir, stem + "__" + str(i) + "__" + str(j) + ".jpg"))


if __name__ == '__main__':
    main()
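
Because the offsets are written into the file name, the position of a block can be recovered later, for example when matching labels to blocks or splicing blocks back together. Below is a minimal sketch, assuming names follow the `name__x__y.jpg` pattern produced above. `parse_tile_name` is a hypothetical helper and is not part of the original script.

```python
import os


def parse_tile_name(file_name):
    """Split 'name__x__y.ext' into (name, x, y).

    Hypothetical helper: assumes the '__' naming rule used by the cropping script.
    """
    stem = os.path.splitext(os.path.basename(file_name))[0]
    # Split from the right so an original name that itself contains '__' still works
    name, x, y = stem.rsplit("__", 2)
    return name, int(x), int(y)


print(parse_tile_name("scene01__1280__2560.jpg"))
```

Here `x` and `y` are the pixel offsets of the block's upper-left corner in the original image.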

Label cropping

Label reading

First, the xml data is parsed with the lxml library to extract two pieces of information: the target category and the target bbox coordinates.

The xml is converted recursively into a dictionary, from which the required information can be read.

from lxml import etree


def parse_xml_to_dict(xml):
    """
    Recursively parse an xml element into a dictionary.
    """
    if len(xml) == 0:  # leaf node: return the text stored under this tag
        return {xml.tag: xml.text}
    result = {}
    for child in xml:
        child_result = parse_xml_to_dict(child)  # recurse into the child element
        if child.tag != 'object':
            result[child.tag] = child_result[child.tag]
        else:
            # an image may contain several objects, so collect them in a list
            if child.tag not in result:
                result[child.tag] = []
            result[child.tag].append(child_result[child.tag])
    return {xml.tag: result}


def main():
    xml_path = r"label.xml"
    with open(xml_path, encoding="utf-8") as fid:
        xml_str = fid.read()
    xml = etree.fromstring(xml_str)
    data = parse_xml_to_dict(xml)["annotation"]
    for obj in data["object"]:
        # read the box and the class name of each object
        xmin = float(obj["bndbox"]["xmin"])
        xmax = float(obj["bndbox"]["xmax"])
        ymin = float(obj["bndbox"]["ymin"])
        ymax = float(obj["bndbox"]["ymax"])
        class_name = obj["name"]
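
To see what the resulting dictionary looks like, the recursion can be tried on a small in-memory annotation. This is only a sketch: the sample xml is made up, and the standard-library `xml.etree.ElementTree` is used in place of lxml so that the snippet runs without extra dependencies.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree in this sketch


def parse_xml_to_dict(xml):
    """Same recursion as above, written compactly."""
    if len(xml) == 0:
        return {xml.tag: xml.text}
    result = {}
    for child in xml:
        child_result = parse_xml_to_dict(child)
        if child.tag != "object":
            result[child.tag] = child_result[child.tag]
        else:
            result.setdefault(child.tag, []).append(child_result[child.tag])
    return {xml.tag: result}


# Made-up annotation in the Pascal VOC layout that labelimg writes
SAMPLE = """
<annotation>
  <filename>demo.jpg</filename>
  <size><width>2560</width><height>1280</height><depth>3</depth></size>
  <object>
    <name>class1</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>300</xmax><ymax>400</ymax></bndbox>
  </object>
  <object>
    <name>class2</name>
    <bndbox><xmin>1200</xmin><ymin>50</ymin><xmax>1400</xmax><ymax>150</ymax></bndbox>
  </object>
</annotation>
"""

data = parse_xml_to_dict(ET.fromstring(SAMPLE.strip()))["annotation"]
for obj in data["object"]:
    print(obj["name"], obj["bndbox"])
```

Note that every value is still a string at this point, which is why the coordinates are passed through `float()` before use.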

Label position reset

Since the image is cropped into small blocks, the labels must also be converted into bboxes relative to those blocks. One problem is what to do when a label is cut by a block boundary.

My approach is as follows. Based on the indices of the image blocks that contain the bbox corners, there are four cases.

In the first case, all four corners of the label lie in the same image block, so the label only needs to be shifted into that block's coordinates.
(The figures below are only illustrative and not to scale; black is the bbox, red is the cutting line.)

[Figure: case 1, the bbox lies entirely inside one image block]

In the second case, the label is cut into left and right parts. Each part is treated as a separate label and assigned to its own image block.

[Figure: case 2, a vertical cutting line splits the bbox into left and right parts]

In the third case, the label is cut into upper and lower parts. Each part is treated as a separate label and assigned to its own image block.

[Figure: case 3, a horizontal cutting line splits the bbox into upper and lower parts]

In the fourth case, the label is cut into four pieces. Each piece is then too small to be useful, and for small targets this case is rare, so the label is discarded.

[Figure: case 4, two cutting lines split the bbox into four pieces]

Corresponding code:

xmin_index = int(xmin / leap)
xmax_index = int(xmax / leap)
ymin_index = int(ymin / leap)
ymax_index = int(ymax / leap)

xmin = xmin % leap
xmax = xmax % leap
ymin = ymin % leap
ymax = ymax % leap

# Case 1: both corner points are in the same image block
if xmin_index == xmax_index and ymin_index == ymax_index:
    info = xml2txt(xmin, xmax, ymin, ymax, class_name, img_width, img_height)
    file_name = img_name + "__" + str(xmin_index * leap) + "__" + str(ymin_index * leap) + ".txt"
    write_txt(info, file_name)
# Case 2: the target spans two horizontally adjacent blocks
elif xmin_index + 1 == xmax_index and ymin_index == ymax_index:
    # save the left part
    info = xml2txt(xmin, leap, ymin, ymax, class_name, img_width, img_height)
    file_name = img_name + "__" + str(xmin_index * leap) + "__" + str(ymax_index * leap) + ".txt"
    write_txt(info, file_name)
    # save the right part
    info = xml2txt(0, xmax, ymin, ymax, class_name, img_width, img_height)
    file_name = img_name + "__" + str(xmax_index * leap) + "__" + str(ymax_index * leap) + ".txt"
    write_txt(info, file_name)
# Case 3: the target spans two vertically adjacent blocks
elif xmin_index == xmax_index and ymin_index + 1 == ymax_index:
    # save the upper part
    info = xml2txt(xmin, xmax, ymin, leap, class_name, img_width, img_height)
    file_name = img_name + "__" + str(xmin_index * leap) + "__" + str(ymin_index * leap) + ".txt"
    write_txt(info, file_name)
    # save the lower part
    info = xml2txt(xmin, xmax, 0, ymax, class_name, img_width, img_height)
    file_name = img_name + "__" + str(xmin_index * leap) + "__" + str(ymax_index * leap) + ".txt"
    write_txt(info, file_name)
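
The same case analysis can be written as a small pure function, which makes it easy to check on concrete numbers. This is a sketch under the assumptions of this article (square blocks of side `leap`, no overlap between blocks). `split_bbox` is a hypothetical name, and it returns the pieces instead of writing txt files.

```python
def split_bbox(xmin, ymin, xmax, ymax, leap=1280):
    """Return a list of (tile_x, tile_y, (xmin, ymin, xmax, ymax)) in block coordinates."""
    xi0, xi1 = int(xmin // leap), int(xmax // leap)
    yi0, yi1 = int(ymin // leap), int(ymax // leap)
    lx0, lx1 = xmin % leap, xmax % leap
    ly0, ly1 = ymin % leap, ymax % leap

    if xi0 == xi1 and yi0 == yi1:  # case 1: fully inside one block
        return [(xi0 * leap, yi0 * leap, (lx0, ly0, lx1, ly1))]
    if xi0 + 1 == xi1 and yi0 == yi1:  # case 2: left and right parts
        return [
            (xi0 * leap, yi0 * leap, (lx0, ly0, leap, ly1)),
            (xi1 * leap, yi0 * leap, (0, ly0, lx1, ly1)),
        ]
    if xi0 == xi1 and yi0 + 1 == yi1:  # case 3: upper and lower parts
        return [
            (xi0 * leap, yi0 * leap, (lx0, ly0, lx1, leap)),
            (xi0 * leap, yi1 * leap, (lx0, 0, lx1, ly1)),
        ]
    return []  # case 4, or a box spanning more than two blocks: discarded


print(split_bbox(1200, 50, 1400, 150))
```

For the box above, which crosses the vertical cutting line at x = 1280, the function returns two pieces: `(1200, 50, 1280, 150)` in the block at offset `(0, 0)` and `(0, 50, 120, 150)` in the block at offset `(1280, 0)`.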

Convert labels to txt format

The xml format stores xmin, ymin, xmax, ymax, which are the global pixel coordinates of the upper-left and lower-right corners of the rectangle.
The txt format stores class, xcenter, ycenter, w, h, which are the class index, the center point, and the width and height of the bbox. These are relative coordinates, so the conversion divides by the width and height of the small image block.

Related code:

def xml2txt(xmin, xmax, ymin, ymax, class_name, img_width, img_height):
    # class index
    class_index = class_dict.index(class_name)

    # convert the box to yolo format
    xcenter = xmin + (xmax - xmin) / 2
    ycenter = ymin + (ymax - ymin) / 2
    w = xmax - xmin
    h = ymax - ymin

    # absolute to relative coordinates, rounded to 6 decimal places
    xcenter = round(xcenter / img_width, 6)
    ycenter = round(ycenter / img_height, 6)
    w = round(w / img_width, 6)
    h = round(h / img_height, 6)

    info = [str(i) for i in [class_index, xcenter, ycenter, w, h]]
    return info
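
A quick way to sanity-check a converted label is to map it back to pixel coordinates and compare with the original box. The following is a sketch of that inverse conversion, assuming 1280x1280 blocks. `yolo_to_pixels` is a hypothetical helper and is not part of the original script.

```python
def yolo_to_pixels(xcenter, ycenter, w, h, img_width=1280, img_height=1280):
    """Convert relative (xcenter, ycenter, w, h) back to pixel (xmin, ymin, xmax, ymax)."""
    xmin = (xcenter - w / 2) * img_width
    xmax = (xcenter + w / 2) * img_width
    ymin = (ycenter - h / 2) * img_height
    ymax = (ycenter + h / 2) * img_height
    return xmin, ymin, xmax, ymax


# A box from (320, 640) to (960, 1280) in a 1280x1280 block has
# center (640, 960) and size (640, 640), i.e. relative values 0.5 0.75 0.5 0.5
print(yolo_to_pixels(0.5, 0.75, 0.5, 0.5))
```

The printed tuple should match the box the label was created from, up to the 6-decimal rounding applied in `xml2txt`.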

Complete code

Finally, the complete code for batch processing is attached:

import os
from tqdm import tqdm
from lxml import etree

xml_file_path = "E:/Dataset/数据集/可见光数据/原始未裁剪/labels"
output_txt_path = "E:/Dataset/数据集/可见光数据/裁剪后数据/labels"

class_dict = ['class1', 'class2']
leap = 1280


def parse_xml_to_dict(xml):
    """
    Recursively parse an xml element into a dictionary.
    """
    if len(xml) == 0:  # leaf node: return the text stored under this tag
        return {xml.tag: xml.text}

    result = {}
    for child in xml:
        child_result = parse_xml_to_dict(child)  # recurse into the child element
        if child.tag != 'object':
            result[child.tag] = child_result[child.tag]
        else:
            if child.tag not in result:
                result[child.tag] = []
            result[child.tag].append(child_result[child.tag])
    return {xml.tag: result}


def xml2txt(xmin, xmax, ymin, ymax, class_name, img_width, img_height):
    # class index
    class_index = class_dict.index(class_name)

    # convert the box to yolo format
    xcenter = xmin + (xmax - xmin) / 2
    ycenter = ymin + (ymax - ymin) / 2
    w = xmax - xmin
    h = ymax - ymin

    # absolute to relative coordinates, rounded to 6 decimal places
    xcenter = round(xcenter / img_width, 6)
    ycenter = round(ycenter / img_height, 6)
    w = round(w / img_width, 6)
    h = round(h / img_height, 6)

    info = [str(i) for i in [class_index, xcenter, ycenter, w, h]]
    return info


def write_txt(info, file_name):
    with open(file_name, encoding="utf-8", mode="a") as f:
        # if the file is not empty, start a new line first
        if os.path.getsize(file_name):
            f.write("\n" + " ".join(info))
        else:
            f.write(" ".join(info))


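def main():
    # txt files are opened in append mode, so start from an empty output folder when re-running
    os.makedirs(output_txt_path, exist_ok=True)
    for xml_file in os.listdir(xml_file_path):
        if not xml_file.endswith(".xml"):
            continue  # skip anything that is not an annotation file
        with open(os.path.join(xml_file_path, xml_file), encoding="utf-8") as fid:
            xml_str = fid.read()
        xml = etree.fromstring(xml_str)
        data = parse_xml_to_dict(xml)["annotation"]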

        # img_height = int(data["size"]["height"])
        # img_width = int(data["size"]["width"])
        img_height = leap
        img_width = leap

        img_name = xml_file[:-4]

        for obj in data.get("object", []):  # an image without targets has no "object" entry
            # read the box and the class name of each object
            xmin = float(obj["bndbox"]["xmin"])
            xmax = float(obj["bndbox"]["xmax"])
            ymin = float(obj["bndbox"]["ymin"])
            ymax = float(obj["bndbox"]["ymax"])
            class_name = obj["name"]

            xmin_index = int(xmin / leap)
            xmax_index = int(xmax / leap)
            ymin_index = int(ymin / leap)
            ymax_index = int(ymax / leap)

            xmin = xmin % leap
            xmax = xmax % leap
            ymin = ymin % leap
            ymax = ymax % leap

            # Case 1: both corner points are in the same image block
            if xmin_index == xmax_index and ymin_index == ymax_index:
                info = xml2txt(xmin, xmax, ymin, ymax, class_name, img_width, img_height)
                file_name = output_txt_path + "/" + img_name + "__" + str(xmin_index * leap) + "__" + str(
                    ymin_index * leap) + ".txt"
                write_txt(info, file_name)
            # Case 2: the target spans two horizontally adjacent blocks
            elif xmin_index + 1 == xmax_index and ymin_index == ymax_index:
                # save the left part
                info = xml2txt(xmin, leap, ymin, ymax, class_name, img_width, img_height)
                file_name = output_txt_path + "/" + img_name + "__" + str(xmin_index * leap) + "__" + str(
                    ymax_index * leap) + ".txt"
                write_txt(info, file_name)
                # save the right part
                info = xml2txt(0, xmax, ymin, ymax, class_name, img_width, img_height)
                file_name = output_txt_path + "/" + img_name + "__" + str(xmax_index * leap) + "__" + str(
                    ymax_index * leap) + ".txt"
                write_txt(info, file_name)
            # Case 3: the target spans two vertically adjacent blocks
            elif xmin_index == xmax_index and ymin_index + 1 == ymax_index:
                # save the upper part
                info = xml2txt(xmin, xmax, ymin, leap, class_name, img_width, img_height)
                file_name = output_txt_path + "/" + img_name + "__" + str(xmin_index * leap) + "__" + str(
                    ymin_index * leap) + ".txt"
                write_txt(info, file_name)
                # save the lower part
                info = xml2txt(xmin, xmax, 0, ymax, class_name, img_width, img_height)
                file_name = output_txt_path + "/" + img_name + "__" + str(xmin_index * leap) + "__" + str(
                    ymax_index * leap) + ".txt"
                write_txt(info, file_name)


if __name__ == "__main__":
    main()

Origin blog.csdn.net/qq1198768105/article/details/133692067