[Halcon deep learning] Converting image segmentation dataset formats

Preface

The dataset formats for image segmentation that I have encountered most often are:
1 PASCAL VOC
2 COCO
3 YOLO
4 Halcon's own format (which is really just the Halcon dictionary type)

PASCAL VOC, COCO and YOLO are the formats most often used for object detection and image segmentation tasks in computer vision. Here is a brief introduction to these three dataset formats:

1. PASCAL VOC format:

PASCAL VOC (Visual Object Classes) is a widely used object detection and image segmentation dataset whose annotations are provided as XML files. Below is an example of the PASCAL VOC format (for a single object):

<annotation>
	<folder>images</folder>
	<filename>example.jpg</filename>
	<source>
		<database>PASCAL VOC</database>
	</source>
	<size>
		<width>800</width>
		<height>600</height>
		<depth>3</depth>
	</size>
	<object>
		<name>cat</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>200</xmin>
			<ymin>150</ymin>
			<xmax>400</xmax>
			<ymax>450</ymax>
		</bndbox>
	</object>
</annotation>
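Parsing this XML is straightforward with Python's standard-library ElementTree; a minimal sketch, using the example annotation above as an inline string:

```python
import xml.etree.ElementTree as ET

voc_xml = """<annotation>
  <filename>example.jpg</filename>
  <size><width>800</width><height>600</height><depth>3</depth></size>
  <object>
    <name>cat</name>
    <bndbox><xmin>200</xmin><ymin>150</ymin><xmax>400</xmax><ymax>450</ymax></bndbox>
  </object>
</annotation>"""

root = ET.fromstring(voc_xml)
size = root.find('size')
width = int(size.find('width').text)
height = int(size.find('height').text)

# collect (name, xmin, ymin, xmax, ymax) for every object in the file
boxes = []
for obj in root.iter('object'):
    bb = obj.find('bndbox')
    boxes.append((obj.find('name').text,
                  int(bb.find('xmin').text), int(bb.find('ymin').text),
                  int(bb.find('xmax').text), int(bb.find('ymax').text)))

print(boxes)  # [('cat', 200, 150, 400, 450)]
```

For a real dataset you would call `ET.parse(path)` on each annotation file instead of `ET.fromstring`.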

2. COCO format:

COCO (Common Objects in Context) is a large-scale dataset for object detection, segmentation and keypoint estimation. Its annotations are provided as JSON files. Below is an example of the COCO format (for a single object):

{
	"info": {},
	"images": [
		{
			"id": 1,
			"file_name": "example.jpg",
			"width": 800,
			"height": 600
		}
	],
	"annotations": [
		{
			"id": 1,
			"image_id": 1,
			"category_id": 1,
			"bbox": [200, 150, 200, 300],
			"area": 60000,
			"iscrowd": 0
		}
	],
	"categories": [
		{
			"id": 1,
			"name": "cat"
		}
	]
}
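Note that unlike PASCAL VOC, the COCO bbox is [x_min, y_min, width, height] rather than two corner points. A quick sketch converting the VOC cat box (200, 150, 400, 450) reproduces the bbox and area in the JSON above:

```python
def voc_to_coco_bbox(xmin, ymin, xmax, ymax):
    # COCO stores [x_min, y_min, width, height]; area is width * height
    w = xmax - xmin
    h = ymax - ymin
    return [xmin, ymin, w, h], w * h

bbox, area = voc_to_coco_bbox(200, 150, 400, 450)
print(bbox, area)  # [200, 150, 200, 300] 60000
```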

3. YOLO format:

YOLO (You Only Look Once) is an object detection algorithm that also has its own dataset format. YOLO annotations are plain text files, one per image, where each line describes one object. Here is an example in YOLO format (each line represents a single object):

0 0.45 0.35 0.2 0.5

In this example, each line contains the category index followed by the normalized coordinates of the object: center x, center y, width and height, all relative to the image size.
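As a concrete example, converting the cat box from the PASCAL VOC section (xmin=200, ymin=150, xmax=400, ymax=450 in an 800x600 image) into a YOLO line; a minimal sketch:

```python
def voc_to_yolo(cls_id, xmin, ymin, xmax, ymax, img_w, img_h):
    # YOLO line: "class x_center y_center width height", all normalized to [0, 1]
    xc = (xmin + xmax) / 2 / img_w
    yc = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{cls_id} {xc} {yc} {w} {h}"

print(voc_to_yolo(0, 200, 150, 400, 450, 800, 600))  # 0 0.375 0.5 0.25 0.5
```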

Please note that these examples are for demonstration only; real dataset files contain annotations for many more images and objects. Different dataset formats suit different tasks and algorithms, so you need to understand the corresponding annotation format when using a specific dataset.

All of these formats describe the position of a bounding box in the image and the category that box belongs to.

Background

I now have a dataset in PASCAL VOC format, where every image has a corresponding annotation file, and I want to read the whole dataset in Halcon. However, Halcon comes with its own annotation tool, the MVTec Deep Learning Tool, and images annotated with that tool are exported in the .hdict format.
Is there a way to convert PASCAL VOC directly to .hdict?

PASCAL VOC to .hdict

We have already seen the PASCAL VOC format: it is XML, which is easy to parse, but .hdict is a binary file whose content cannot be inspected directly.
So I searched the whole internet, found a Halcon script for converting PASCAL VOC to .hdict, and spent quite a bit of money on it. After downloading it, it turned out to be mostly fine and worked after a few small modifications:


create_dict (NEWDictHandle1)
class_ids:=[0,1,2,3,4,5]
class_names:=['crazing', 'inclusion', 'patches', 'pitted_surface', 'rolled-in_scale', 'scratches']
image_dir:='images/'

set_dict_tuple (NEWDictHandle1, 'class_ids', class_ids)
set_dict_tuple (NEWDictHandle1, 'class_names', class_names)
set_dict_tuple (NEWDictHandle1, 'image_dir', image_dir)

list_files ('images/', ['files','follow_links','recursive'], ImageFiles)
tuple_regexp_select (ImageFiles, ['\\.(tif|tiff|gif|bmp|jpg|jpeg|jp2|png|pcx|pgm|ppm|pbm|xwd|ima|hobj)$','ignore_case'], ImageFiles)


list_files ('labels/', ['files','follow_links','recursive'], xmladdress)
samples:=[]
for Index := 0 to |ImageFiles| - 1 by 1
    read_image (Image, ImageFiles[Index])
    * one dictionary per image
    create_dict (image)

    open_file (xmladdress[Index], 'input', FileHandle)
    IsEof := false
    bbox_row1 := []
    bbox_col1 := []
    bbox_row2 := []
    bbox_col2 := []
    bbox_label_id := []
    while (not(IsEof))
        fread_line (FileHandle, XmlElement, IsEof)
        if (IsEof)
            break
        endif
        * split '<tag>value</tag>' into ['', 'tag', 'value', '/tag']
        tuple_split (XmlElement, '<''>', Substrings)
        if (|Substrings| < 3)
            continue
        endif
        if (Substrings[1] == 'folder')
            folder := Substrings[2]
        endif
        if (Substrings[1] == 'filename')
            filename := Substrings[2]
        endif
        * map the class name to its index in class_names
        if (Substrings[1] == 'name')
            tuple_find (class_names, Substrings[2], FoundIndex)
            if (FoundIndex != -1)
                bbox_label_id := [bbox_label_id, FoundIndex]
            endif
        endif
        if (Substrings[1] == 'xmin')
            bbox_col1 := [bbox_col1, Substrings[2]]
            tuple_number (bbox_col1, bbox_col1)
        endif
        if (Substrings[1] == 'ymin')
            bbox_row1 := [bbox_row1, Substrings[2]]
            tuple_number (bbox_row1, bbox_row1)
        endif
        if (Substrings[1] == 'xmax')
            bbox_col2 := [bbox_col2, Substrings[2]]
            tuple_number (bbox_col2, bbox_col2)
        endif
        if (Substrings[1] == 'ymax')
            bbox_row2 := [bbox_row2, Substrings[2]]
            tuple_number (bbox_row2, bbox_row2)
        endif
    endwhile
    close_file (FileHandle)

    set_dict_tuple (image, 'image_id', Index + 1)
    set_dict_tuple (image, 'image_file_name', folder + '/' + filename)
    set_dict_tuple (image, 'bbox_label_id', bbox_label_id)
    set_dict_tuple (image, 'bbox_row1', bbox_row1)
    set_dict_tuple (image, 'bbox_col1', bbox_col1)
    set_dict_tuple (image, 'bbox_row2', bbox_row2)
    set_dict_tuple (image, 'bbox_col2', bbox_col2)
    samples := [samples, image]
endfor

set_dict_tuple (NEWDictHandle1, 'samples', samples)
 
write_dict (NEWDictHandle1, '数据test.hdict', [], [])

After seeing the last line, write_dict, I realized that the so-called .hdict file is simply Halcon's dictionary format!
Although this script works, it took nearly half an hour to convert 1800 samples. Who can tolerate that?
There is another problem: in some PASCAL VOC annotation files the image name has no extension, so after importing, the image cannot be displayed in the Deep Learning Tool! Once I understood the principle, it was easier to write a tool myself:

HTuple NEWDict;
HOperatorSet.CreateDict(out NEWDict);

 List<int> class_ids = new List<int> { 0, 1, 2, 3, 4, 5 };
 List<string> class_names = new List<string> {
     "crazing", "inclusion", "patches", "pitted_surface", "rolled-in_scale", "scratches" };
 string image_dir = "F:\\temp\\数据集格式转换测试\\images";

 //per-image dictionary and the samples tuple
 HTuple hv_image = new HTuple();
 HTuple hv_samples = new HTuple();

 HTuple hv_class_ids = new HTuple(class_ids.ToArray());
 HTuple hv_class_names = new HTuple(class_names.ToArray());
 HTuple hv_image_dir = new HTuple(image_dir);

 HOperatorSet.SetDictTuple(NEWDict, "class_ids", hv_class_ids);
 HOperatorSet.SetDictTuple(NEWDict, "class_names", hv_class_names);
 HOperatorSet.SetDictTuple(NEWDict, "image_dir", hv_image_dir);


 string[] imageFiles = Directory.GetFiles(image_dir, "*.*", SearchOption.AllDirectories);

 List<Dictionary<string, object>> samples = new List<Dictionary<string, object>>();

 int index = 0;
 string extension = "";
 foreach (string imagePath in imageFiles)
 {
     HOperatorSet.CreateDict(out hv_image);

     string xmlPath = "D:/DATASET/yolo/NEU-DET/ANNOTATIONS/" + Path.GetFileNameWithoutExtension(imagePath) + ".xml";

     XDocument xdoc;
     using (StreamReader reader = new StreamReader(xmlPath))
     {
         string xmlContent = reader.ReadToEnd();
         xdoc = XDocument.Parse(xmlContent);
     }

     XElement xroot = xdoc.Root; //root node
     List<int> bbox_label_ids = new List<int>();
     List<int> bbox_col1 = new List<int>();
     List<int> bbox_row1 = new List<int>();
     List<int> bbox_col2 = new List<int>();
     List<int> bbox_row2 = new List<int>();

     //----folder
     var folder = xroot.Element("folder").Value;

     //----filename
     var filename = xroot.Element("filename").Value;
     if (Path.GetExtension(filename) != "")
     {
         extension = Path.GetExtension(filename);
     }
     else
     {
         //some annotation files omit the extension; reuse the last one seen
         if (extension != "")
         {
             filename += extension;
         }
     }


     //----object nodes (one XML file may contain several)
     var objectNodes = xroot.Descendants("object");

     foreach (var objectNode in objectNodes)
     {
         //bndbox node with xmin, ymin, xmax, ymax
         XElement bndboxNode = objectNode.Element("bndbox");
         XElement xminNode = bndboxNode.Element("xmin");
         XElement yminNode = bndboxNode.Element("ymin");
         XElement xmaxNode = bndboxNode.Element("xmax");
         XElement ymaxNode = bndboxNode.Element("ymax");

         // parse the coordinate values and append them to the lists
         bbox_col1.Add(int.Parse(xminNode.Value));
         bbox_row1.Add(int.Parse(yminNode.Value));
         bbox_col2.Add(int.Parse(xmaxNode.Value));
         bbox_row2.Add(int.Parse(ymaxNode.Value));

         // look up the numeric id for the class name
         string className = objectNode.Element("name").Value;
         int id = class_names.IndexOf(className);
         bbox_label_ids.Add(id);                                
     }


     HOperatorSet.SetDictTuple(hv_image, "image_id", index + 1);
     HOperatorSet.SetDictTuple(hv_image, "image_file_name", (folder + "/") + filename);
     HOperatorSet.SetDictTuple(hv_image, "bbox_label_id", bbox_label_ids.ToArray());
     HOperatorSet.SetDictTuple(hv_image, "bbox_row1", bbox_row1.ToArray());
     HOperatorSet.SetDictTuple(hv_image, "bbox_col1", bbox_col1.ToArray());
     HOperatorSet.SetDictTuple(hv_image, "bbox_row2", bbox_row2.ToArray());
     HOperatorSet.SetDictTuple(hv_image, "bbox_col2", bbox_col2.ToArray());


     // append hv_image to samples
     using (HDevDisposeHelper dh = new HDevDisposeHelper())
     {
         HTuple ExpTmpLocalVar_samples = hv_samples.TupleConcat(hv_image);
         hv_samples.Dispose();
         hv_samples = ExpTmpLocalVar_samples;
     }

     index++;
 }

 HOperatorSet.SetDictTuple(NEWDict, "samples", hv_samples);
 HOperatorSet.WriteDict(NEWDict, "数据Csharp.hdict", new HTuple(), new HTuple());
 MessageBox.Show("Conversion finished");

This time I used XDocument to do the parsing, and the conversion finished in moments! Opening the converted 数据Csharp.hdict in the Deep Learning Tool again, this time it worked.
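For reference, here is the overall dictionary layout that both converters build, sketched as a plain Python dictionary. The key names follow the DLDataset convention used in the scripts above; the sample values are illustrative:

```python
# Sketch of the .hdict / DLDataset layout produced by the converters above.
# In Halcon image coordinates, row corresponds to y and col to x, so
# (row1, col1) is the top-left corner and (row2, col2) the bottom-right.
dl_dataset = {
    "class_ids": [0, 1, 2, 3, 4, 5],
    "class_names": ["crazing", "inclusion", "patches",
                    "pitted_surface", "rolled-in_scale", "scratches"],
    "image_dir": "images/",
    "samples": [
        {
            "image_id": 1,
            "image_file_name": "images/example.jpg",
            "bbox_label_id": [0],
            "bbox_row1": [150],
            "bbox_col1": [200],
            "bbox_row2": [450],
            "bbox_col2": [400],
        },
    ],
}
```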

Reading a dataset in .hdict format

Now that we have the dataset in .hdict format, how do we use it?

Since the .hdict file exported by the Deep Learning Tool is simply a Halcon dictionary, the whole dataset can be read back with a single call:
read_dict ('xxxxx.hdict', [], [], DLDataset)

In addition to its own format, Halcon can also read COCO datasets directly:
read_dl_dataset_from_coco (FileExists, [], [], DLDataset1)
Isn't that convenient!

More details on how to train with this data will follow in future posts. See you in the next article!

Appendix

As a bonus, here is a Python script for converting PASCAL VOC to YOLO:

import xml.etree.ElementTree as ET
import os
import glob


classes = ["crazing", "inclusion", "patches", "pitted_surface", "rolled-in_scale", "scratches"]

def convert(size, box):
    # box is (xmin, xmax, ymin, ymax); returns normalized (x_center, y_center, w, h)
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)

def convert_annotation(image_name):
    tree = ET.parse('./ANNOTATIONS/' + image_name[:-3] + 'xml')
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    with open('./LABELS/' + image_name[:-3] + 'txt', 'w') as out_file:
        for obj in root.iter('object'):
            cls = obj.find('name').text
            if cls not in classes:
                print(cls)
                continue
            cls_id = classes.index(cls)
            xmlbox = obj.find('bndbox')
            b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
                 float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
            bb = convert((w, h), b)
            out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

if __name__ == '__main__':
    for image_path in glob.glob("./IMAGES/*.jpg"):
        image_name = os.path.basename(image_path)
        convert_annotation(image_name)

Origin blog.csdn.net/songhuangong123/article/details/132541216