COCO dataset production



First, determine the data format that the COCO detection and segmentation models expect as input.

A dataset annotated with Labelme can be converted to the COCO data format.

A dataset annotated with LabelImg can be converted to the COCO data format.

COCO datasets can be used for object detection and semantic segmentation.

The annotation information of a COCO dataset is stored in .json format.
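A minimal sketch of that JSON structure helps before reading the conversion script below; all field values here are invented for illustration:

```python
import json

# Skeleton of a COCO-style "instances" annotation file. Three top-level
# keys tie everything together: annotations reference images and
# categories by id.
coco = {
    "images": [
        {"id": 1, "file_name": "1.jpg", "width": 640, "height": 480}
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 6,
         "bbox": [10, 20, 100, 50],   # [x, y, width, height]
         "area": 100 * 50, "iscrowd": 0, "segmentation": []}
    ],
    "categories": [
        {"id": 6, "name": "car", "supercategory": "vehicle"}
    ]
}

print(sorted(json.loads(json.dumps(coco)).keys()))
# -> ['annotations', 'categories', 'images']
```

Note that `bbox` is stored as `[x, y, width, height]`, not as corner coordinates; the script below converts from Pascal VOC's `xmin/ymin/xmax/ymax` accordingly.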

Introduction

The full name of MS COCO is Microsoft Common Objects in Context; it comes from the Microsoft COCO dataset that Microsoft funded and annotated in 2014. Its standing is comparable to that of ImageNet, and it is one of the best datasets for measuring the performance of general-purpose models.

The COCO dataset is a large, rich dataset for object detection, segmentation, and captioning.

With scene understanding as its goal, the images are drawn mainly from complex everyday scenes, and the targets in the images are annotated with precise segmentation.

The images cover 91 object categories, with 328,000 images and 2,500,000 labels.

It is the largest semantic segmentation dataset to date: there are 80 categories and more than 330,000 images, of which 200,000 are labeled, and the number of individual object instances in the whole dataset exceeds 1.5 million.

Download

1. Download the 2014 dataset
http://msvocds.blob.core.windows.net/coco2014/train2014.zip

2. Download the 2017 dataset
http://images.cocodataset.org/zips/train2017.zip
http://images.cocodataset.org/annotations/annotations_trainval2017.zip

http://images.cocodataset.org/zips/val2017.zip
http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip

http://images.cocodataset.org/zips/test2017.zip
http://images.cocodataset.org/annotations/image_info_test2017.zip

Implementation example

import json
import os
import shutil
import xml.etree.ElementTree as ET

dataset = {}
ann_id = 0  # running counter so every annotation gets a unique "id"


def readxml(dataset, xml, count):
    """Parse one Pascal VOC XML file and append its image entry and
    bounding-box annotations to the COCO-style dict."""
    global ann_id
    tree = ET.parse(xml)
    root = tree.getroot()
    w = h = 0
    size = root.find("size")
    if size is not None:
        w = int(size.find("width").text)
        h = int(size.find("height").text)

    dataset.setdefault("images", []).append({
        'file_name': str(count) + '.jpg',
        'id': int(count),
        'width': w,
        'height': h
    })

    # One annotation per <object>, so images containing several
    # objects are handled correctly.
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        xmin = int(bndbox.find("xmin").text)
        ymin = int(bndbox.find("ymin").text)
        xmax = int(bndbox.find("xmax").text)
        ymax = int(bndbox.find("ymax").text)
        ann_id += 1
        dataset.setdefault("annotations", []).append({
            'image_id': int(count),
            'bbox': [xmin, ymin, xmax - xmin, ymax - ymin],  # [x, y, w, h]
            'category_id': 6,  # every object is mapped to category 6 here
            'area': (xmax - xmin) * (ymax - ymin),  # bbox area, not image area
            'iscrowd': 0,
            'id': ann_id,
            'segmentation': []
        })


im_path = "/home/qusongyun/images/"
trainimg = "/home/qusongyun/simpledet/data/coco/images/val2014/"

# Continue numbering from the largest numeric filename already present
# in the target image folder.
cmax = 0
for file in os.listdir(trainimg):
    cmax = max(cmax, int(file.split(".")[0]))

count = 1
for imgdir in os.listdir(im_path):
    for file in os.listdir(os.path.join(im_path, imgdir)):
        if file.endswith(".jpg"):
            oldname = os.path.join(im_path, imgdir, file)
            jpgname = os.path.join(trainimg, str(count + cmax) + ".jpg")
            shutil.copyfile(oldname, jpgname)
            xml_name = file.rsplit(".", 1)[0] + ".xml"
            readxml(dataset, os.path.join(im_path, imgdir, xml_name), count + cmax)
            count += 1

for i in range(1, 81):
    dataset.setdefault("categories", []).append({
        'id': i,
        'name': str(i),  # placeholder names; substitute your real class names
        'supercategory': 'No'
    })

folder = '/home/qusongyun/simpledet/data/coco/annotations/'
if not os.path.exists(folder):
    os.makedirs(folder)
json_name = os.path.join(folder, 'instances_minival2014.json')
with open(json_name, 'w') as f:
    json.dump(dataset, f)
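Once the script has run, the generated file can be sanity-checked before it is fed to a trainer. A quick check using only the standard library (`check_coco` is a hypothetical helper written for this post, not part of any COCO tooling):

```python
import json


def check_coco(path):
    """Basic consistency checks on a COCO-style annotation file."""
    with open(path) as f:
        data = json.load(f)
    # All three top-level keys must be present.
    for key in ("images", "annotations", "categories"):
        assert key in data, "missing top-level key: " + key
    image_ids = {img["id"] for img in data["images"]}
    cat_ids = {cat["id"] for cat in data["categories"]}
    # Every annotation must reference a known image and category,
    # and its bbox must have positive width and height.
    for ann in data["annotations"]:
        assert ann["image_id"] in image_ids, "annotation points to unknown image"
        assert ann["category_id"] in cat_ids, "annotation points to unknown category"
        x, y, w, h = ann["bbox"]
        assert w > 0 and h > 0, "degenerate bounding box"
    return len(data["images"]), len(data["annotations"])
```

For example, `check_coco('/home/qusongyun/simpledet/data/coco/annotations/instances_minival2014.json')` returns the image and annotation counts if everything is consistent.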

References

https://patrickwasp.com/create-your-own-coco-style-dataset/

Make your own COCO dataset and train Mask R-CNN

https://cocodataset.org


Origin blog.csdn.net/qq_41375318/article/details/112861278