A complete workflow for creating your own dataset

Table of contents

1. Prepare your own dataset

2. Label data----json format

3. Convert the labeled data into a segmented image----voc format

4. Augment the dataset

5. Split into training and validation sets


1. Prepare your own dataset

Note: All images in the dataset must share the same file suffix, either jpg or png.
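As a quick sanity check before labeling, something like the following sketch can confirm that a folder uses only one suffix. The function names are made up for this example, and treating `.JPG` and `.jpg` as the same suffix is an assumption you may want to change.

```python
from collections import Counter
from pathlib import Path


def count_extensions(image_dir):
    """Count the files in image_dir by lowercase suffix, e.g. {'.jpg': 120}."""
    return Counter(
        p.suffix.lower() for p in Path(image_dir).iterdir() if p.is_file()
    )


def has_single_extension(image_dir, allowed=(".jpg", ".png")):
    """Return True if every file shares one suffix and that suffix is allowed."""
    counts = count_extensions(image_dir)
    return len(counts) == 1 and next(iter(counts)) in allowed
```

Calling `count_extensions` on your image folder shows at a glance which suffixes are present and how many files use each one.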

2. Label data----json format

Annotate the images with labelme; the annotations are saved automatically in json format. You can refer to this article for details.

Note: This step is carried out in the Anaconda Prompt. Activate the environment you created, then type labelme to launch the labelme tool and start labeling.
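After a labeling session it is easy to miss a few images. The sketch below lists images that have no annotation yet, assuming labelme saved each json file next to its image under the same base name (its usual behavior when you accept the suggested file name). The function name is hypothetical.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".png"}


def images_missing_annotation(folder):
    """Return sorted names of images that have no same-named .json file."""
    folder = Path(folder)
    annotated = {p.stem for p in folder.glob("*.json")}
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES and p.stem not in annotated
    )
```

An empty list means every image in the folder has a matching json file.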

3. Convert the labeled data into a segmented image----voc format

The json-to-voc script converts the json annotations into png segmentation maps.

# Run the command
# data_annotated is the name of the folder of annotated images; data_dataset_voc is the name of the newly created VOC-format folder.
python labelme2voc.py data_annotated data_dataset_voc --labels labels.txt

Note: This command is also run in the Anaconda Prompt, but first change to the directory that contains the json-to-voc script (labelme2voc.py). Replace data_annotated with the name of the folder holding the images you annotated; data_dataset_voc is the name of the newly created VOC-format folder, and you can choose a different name yourself.
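The command also needs a labels.txt file. As a rough sketch, it can be generated from the labels actually used in your json files. This assumes the standard labelme json layout (a "shapes" list whose entries carry a "label") and the labels.txt convention from labelme's semantic segmentation example, where the first two lines are `__ignore__` and `_background_`; check the copy of labelme2voc.py you are using. The function names are made up for this example.

```python
import json
from pathlib import Path


def collect_labels(annotation_dir):
    """Return the sorted label names used in the labelme json files."""
    labels = set()
    for json_path in Path(annotation_dir).glob("*.json"):
        with open(json_path, encoding="utf-8") as f:
            data = json.load(f)
        for shape in data.get("shapes", []):
            labels.add(shape["label"])
    return sorted(labels)


def write_labels_file(annotation_dir, labels_path):
    """Write labels.txt: two reserved entries, then one class per line."""
    # Assumption: the script expects these two reserved names first.
    reserved = ["__ignore__", "_background_"]
    classes = [c for c in collect_labels(annotation_dir) if c not in reserved]
    lines = reserved + classes
    Path(labels_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

For example, `write_labels_file("data_annotated", "labels.txt")` would produce the file passed to `--labels` in the command above.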

4. Augment the dataset

This step augments the original images and the label images together. You can read this article to learn how to augment both at the same time.

Note: This code can only run on the CPU, not the GPU.

Step 1: In voc format the original images are jpg and the segmentation maps are png, so the suffixes of the original images and the label images must be made the same before augmentation. For how to change suffixes in batches, see this article.

Step 2: The original images and the segmentation maps correspond one to one, and their file names should match as well. After augmentation the names differ, so delete the common extra part of the file names in batches, so that each augmented original image and its segmentation map share the same name again.

Step 3: Restore the processed augmented images to voc format, that is, convert the original images back to jpg and the segmentation maps back to png.
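The batch renaming in Step 2 could be sketched roughly as follows. What the augmentation tool adds to each file name depends on the tool, so the prefix is passed in as a parameter; the prefixes in the usage note and both function names are purely hypothetical.

```python
from pathlib import Path


def strip_prefix(directory, prefix):
    """Rename every file whose name starts with prefix, dropping the prefix."""
    renamed = []
    # sorted() builds the full list first, so renaming cannot disturb the loop.
    for path in sorted(Path(directory).iterdir()):
        name = path.name
        if path.is_file() and name.startswith(prefix) and len(name) > len(prefix):
            target = path.with_name(name[len(prefix):])
            if target.exists():
                raise FileExistsError(f"{target} already exists")
            path.rename(target)
            renamed.append(target.name)
    return renamed


def unmatched_stems(image_dir, mask_dir):
    """Return base names that appear in only one of the two folders."""
    images = {p.stem for p in Path(image_dir).iterdir() if p.is_file()}
    masks = {p.stem for p in Path(mask_dir).iterdir() if p.is_file()}
    return sorted(images ^ masks)
```

For instance, if the augmented originals all started with `orig_` and the augmented segmentation maps with `gt_`, you would call `strip_prefix` once per folder and then confirm that `unmatched_stems` returns an empty list. Note that this only renames files; the actual jpg/png conversion in Steps 1 and 3 needs an image library.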

5. Split into training and validation sets

You can refer to this article to carry out the final split.

Note: The line of code below gives the path of the segmentation maps, which are png images.

After this rather cumbersome process, data preparation is complete.
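A minimal sketch of such a split (not the referenced article's code) might look like this. It assumes the segmentation maps are png files in one folder and writes the base names to train.txt and val.txt, as VOC-style datasets commonly do. The function name, the 10% validation ratio, and the fixed seed are all assumptions chosen for illustration.

```python
import random
from pathlib import Path


def split_dataset(mask_dir, output_dir, val_ratio=0.1, seed=0):
    """Write train.txt and val.txt listing mask base names, one per line."""
    # Path of the segmentation maps; only files with the png suffix are used.
    stems = sorted(p.stem for p in Path(mask_dir).glob("*.png"))
    random.Random(seed).shuffle(stems)  # seeded, so the split is repeatable
    n_val = int(len(stems) * val_ratio)
    val, train = sorted(stems[:n_val]), sorted(stems[n_val:])
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    (output_dir / "train.txt").write_text("".join(s + "\n" for s in train))
    (output_dir / "val.txt").write_text("".join(s + "\n" for s in val))
    return train, val
```

Point `mask_dir` at the folder of png segmentation maps inside your VOC-format output folder; the exact subfolder name depends on the version of the conversion script.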

Awesome!


Origin blog.csdn.net/weixin_45912366/article/details/127936807