[DETR] Training your own dataset - practice notes

Training DETR (DEtection TRansformer) on your own dataset - practice notes & problem summary

DETR (DEtection TRansformer) is an end-to-end, transformer-based object detector that needs neither NMS post-processing nor anchors.
This post walks through training DETR on the NWPU VHR-10 dataset.
The NWPU dataset contains ten object categories, with 650 positive samples and 150 negative samples (the negatives are not used here).

NWPU_CATEGORIES=['airplane','ship','storage tank','baseball diamond','tennis court',\
					'basketball court','ground track field','harbor','bridge','vehicle']

Code: https://github.com/facebookresearch/detr

1. Training

1. Dataset preparation

DETR expects the dataset in COCO format. The images and label files are organized into four folders (training set, test set, validation set, and annotations), with the JSON label files stored in the annotations folder.
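
For reference, a typical layout looks like this (a sketch only; folder names follow the COCO convention, and the test set is optional):

NWPUVHR-10/
├── train2017/          # training images
├── val2017/            # validation images
├── test2017/           # test images (optional)
└── annotations/
    ├── instances_train2017.json
    └── instances_val2017.json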

The code below converts the label files of several datasets (RSOD, NWPU, DIOR, and YOLO-format) to COCO JSON. Create a new Python file, tojson.py, and use the following code to generate the required JSON files.

To generate instances_train2017.json:
(a) change the default image_path to the path of train2017;
(b) change the default annotation_path to the path of the label folder (the labels for both train and val are placed in this folder, so when generating instances_val2017.json only the image and save paths need to change);
(c) change the default dataset to your own dataset name, NWPU;
(d) change the default save to the output path of the JSON file, e.g. .../NWPUVHR-10/annotations/instances_train2017.json.

import os
import cv2
import json
import argparse
from tqdm import tqdm
import xml.etree.ElementTree as ET

COCO_DICT=['images','annotations','categories']
IMAGES_DICT=['file_name','height','width','id']

ANNOTATIONS_DICT=['image_id','iscrowd','area','bbox','category_id','id']

CATEGORIES_DICT=['id','name']
## {'supercategory': 'person', 'id': 1, 'name': 'person'}
## {'supercategory': 'vehicle', 'id': 2, 'name': 'bicycle'}
YOLO_CATEGORIES=['person']
RSOD_CATEGORIES=['aircraft','playground','overpass','oiltank']
NWPU_CATEGORIES=['airplane','ship','storage tank','baseball diamond','tennis court',\
					'basketball court','ground track field','harbor','bridge','vehicle']

VOC_CATEGORIES=['aeroplane','bicycle','bird','boat','bottle','bus','car','cat','chair','cow',\
					'diningtable','dog','horse','motorbike','person','pottedplant','sheep','sofa','train','tvmonitor']

DIOR_CATEGORIES=['golffield','Expressway-toll-station','vehicle','trainstation','chimney','storagetank',\
					'ship','harbor','airplane','groundtrackfield','tenniscourt','dam','basketballcourt',\
					'Expressway-Service-area','stadium','airport','baseballfield','bridge','windmill','overpass']

parser=argparse.ArgumentParser(description='2COCO')
#parser.add_argument('--image_path',type=str,default=r'T:/shujuji/DIOR/JPEGImages-trainval/',help='config file')
parser.add_argument('--image_path',type=str,default=r'G:/NWPU VHR-10 dataset/positive image set/',help='config file')
#parser.add_argument('--annotation_path',type=str,default=r'T:/shujuji/DIOR/Annotations/',help='config file')
parser.add_argument('--annotation_path',type=str,default=r'G:/NWPU VHR-10 dataset/ground truth/',help='config file')
parser.add_argument('--dataset',type=str,default='NWPU',help='config file')
parser.add_argument('--save',type=str,default='G:/NWPU VHR-10 dataset/instances_train2017.json',help='config file')
args=parser.parse_args()
def load_json(path):
	with open(path,'r') as f:
		json_dict=json.load(f)
		for i in json_dict:
			print(i)
		print(json_dict['annotations'])
def save_json(json_dict,path):
	print('SAVE_JSON...')
	with open(path,'w') as f:
		json.dump(json_dict,f)
	print('SUCCESSFUL_SAVE_JSON:',path)
def load_image(path):
	# return (height, width) of an image read with OpenCV
	img=cv2.imread(path)
	return img.shape[0],img.shape[1]
def generate_categories_dict(category):  #CATEGORIES_DICT=['id','name']
	print('GENERATE_CATEGORIES_DICT...')
	# category ids are 1-based
	return [{CATEGORIES_DICT[0]:category.index(x)+1,CATEGORIES_DICT[1]:x} for x in category]
def generate_images_dict(imagelist,image_path,start_image_id=11725):  #IMAGES_DICT=['file_name','height','width','id']
	print('GENERATE_IMAGES_DICT...')
	images_dict=[]
	with tqdm(total=len(imagelist)) as load_bar:
		for x in imagelist:  # x is the image file name
			height,width=load_image(image_path+x)  # read each image only once
			image_dict={IMAGES_DICT[0]:x,IMAGES_DICT[1]:height,\
					IMAGES_DICT[2]:width,IMAGES_DICT[3]:imagelist.index(x)+start_image_id}
			load_bar.update(1)
			images_dict.append(image_dict)
	return images_dict

def DIOR_Dataset(image_path,annotation_path,start_image_id=11725,start_id=0):
	categories_dict=generate_categories_dict(DIOR_CATEGORIES)    # CATEGORIES_DICT=['id','name'], ids start from 1
	imgname=os.listdir(image_path)
	images_dict=generate_images_dict(imgname,image_path,start_image_id)  # IMAGES_DICT=['file_name','height','width','id']
	print('GENERATE_ANNOTATIONS_DICT...')  # generate the COCO annotations: ANNOTATIONS_DICT=['image_id','iscrowd','area','bbox','category_id','id']
	annotations_dict=[]
	id=start_id
	for i in images_dict:
		image_id=i['id']
		image_name=i['file_name']
		annotation_xml=annotation_path+image_name.split('.')[0]+'.xml'
		tree=ET.parse(annotation_xml)
		root=tree.getroot()
		for j in root.findall('object'):
			category=j.find('name').text
			category_id=DIOR_CATEGORIES.index(category)+1  # +1 so ids match the 1-based categories dict
			x_min=float(j.find('bndbox').find('xmin').text)
			y_min=float(j.find('bndbox').find('ymin').text)
			w=float(j.find('bndbox').find('xmax').text)-x_min
			h=float(j.find('bndbox').find('ymax').text)-y_min
			area=w*h
			bbox=[x_min,y_min,w,h]  # COCO bbox format: [x_min,y_min,width,height]
			annotation={'image_id':image_id,'iscrowd':0,'area':area,'bbox':bbox,'category_id':category_id,'id':id}
			annotations_dict.append(annotation)
			id=id+1
	print('SUCCESSFUL_GENERATE_DIOR_JSON')
	return {COCO_DICT[0]:images_dict,COCO_DICT[1]:annotations_dict,COCO_DICT[2]:categories_dict}
def NWPU_Dataset(image_path,annotation_path,start_image_id=0,start_id=0):
	categories_dict=generate_categories_dict(NWPU_CATEGORIES)
	imgname=os.listdir(image_path)
	images_dict=generate_images_dict(imgname,image_path,start_image_id)
	print('GENERATE_ANNOTATIONS_DICT...')
	annotations_dict=[]
	id=start_id
	for i in images_dict:
		image_id=i['id']
		image_name=i['file_name']
		annotation_txt=annotation_path+image_name.split('.')[0]+'.txt'
		with open(annotation_txt,'r') as txt:
			lines=txt.readlines()
		for j in lines:
			if j=='\n':  # skip empty lines
				continue
			# NWPU VHR-10 label lines look like: (x1,y1),(x2,y2),class_id
			category_id=int(j.split(',')[4])
			category=NWPU_CATEGORIES[category_id-1]
			print(category_id,'        ',category)
			x_min=float(j.split(',')[0].split('(')[1])
			y_min=float(j.split(',')[1].split(')')[0])
			w=float(j.split(',')[2].split('(')[1])-x_min
			h=float(j.split(',')[3].split(')')[0])-y_min
			area=w*h
			bbox=[x_min,y_min,w,h]
			annotation={'image_id':image_id,'iscrowd':0,'area':area,'bbox':bbox,'category_id':category_id,'id':id}
			id=id+1
			annotations_dict.append(annotation)
	print('SUCCESSFUL_GENERATE_NWPU_JSON')
	return {COCO_DICT[0]:images_dict,COCO_DICT[1]:annotations_dict,COCO_DICT[2]:categories_dict}

def YOLO_Dataset(image_path,annotation_path,start_image_id=0,start_id=0):
	categories_dict=generate_categories_dict(YOLO_CATEGORIES)
	imgname=os.listdir(image_path)
	images_dict=generate_images_dict(imgname,image_path,start_image_id)
	print('GENERATE_ANNOTATIONS_DICT...')
	annotations_dict=[]
	id=start_id
	for i in images_dict:
		image_id=i['id']
		image_name=i['file_name']
		W,H=i['width'],i['height']
		annotation_txt=annotation_path+image_name.split('.')[0]+'.txt'
		with open(annotation_txt,'r') as txt:
			lines=txt.readlines()
		for j in lines:
			# YOLO label lines look like: class x_center y_center w h (all normalized to [0,1])
			category_id=int(j.split(' ')[0])+1
			x=float(j.split(' ')[1])
			y=float(j.split(' ')[2])
			w=float(j.split(' ')[3])
			h=float(j.split(' ')[4])
			# convert normalized center coordinates to absolute [x_min,y_min,width,height]
			x_min=(x-w/2)*W
			y_min=(y-h/2)*H
			w=w*W
			h=h*H
			area=w*h
			bbox=[x_min,y_min,w,h]
			annotation={'image_id':image_id,'iscrowd':0,'area':area,'bbox':bbox,'category_id':category_id,'id':id}
			annotations_dict.append(annotation)
			id=id+1
	print('SUCCESSFUL_GENERATE_YOLO_JSON')
	return {COCO_DICT[0]:images_dict,COCO_DICT[1]:annotations_dict,COCO_DICT[2]:categories_dict}
def RSOD_Dataset(image_path,annotation_path,start_image_id=0,start_id=0):
	categories_dict=generate_categories_dict(RSOD_CATEGORIES)
	imgname=os.listdir(image_path)
	images_dict=generate_images_dict(imgname,image_path,start_image_id)
	print('GENERATE_ANNOTATIONS_DICT...')
	annotations_dict=[]
	id=start_id
	for i in images_dict:
		image_id=i['id']
		image_name=i['file_name']
		annotation_txt=annotation_path+image_name.split('.')[0]+'.txt'
		with open(annotation_txt,'r') as txt:
			lines=txt.readlines()
		for j in lines:
			# RSOD label lines are tab-separated: file class x_min y_min x_max y_max
			category=j.split('\t')[1]
			category_id=RSOD_CATEGORIES.index(category)+1
			x_min=float(j.split('\t')[2])
			y_min=float(j.split('\t')[3])
			w=float(j.split('\t')[4])-x_min
			h=float(j.split('\t')[5])-y_min
			area=w*h
			bbox=[x_min,y_min,w,h]
			annotation={'image_id':image_id,'iscrowd':0,'area':area,'bbox':bbox,'category_id':category_id,'id':id}
			annotations_dict.append(annotation)
			id=id+1
	print('SUCCESSFUL_GENERATE_RSOD_JSON')
	return {COCO_DICT[0]:images_dict,COCO_DICT[1]:annotations_dict,COCO_DICT[2]:categories_dict}
if __name__=='__main__':
	dataset=args.dataset                  # dataset name
	save=args.save                        # output path of the COCO json
	image_path=args.image_path            # image folder
	annotation_path=args.annotation_path  # label folder
	if dataset=='RSOD':
		json_dict=RSOD_Dataset(image_path,annotation_path,0)
	if dataset=='NWPU':
		json_dict=NWPU_Dataset(image_path,annotation_path,0)
	if dataset=='DIOR':
		json_dict=DIOR_Dataset(image_path,annotation_path,11725)
	if dataset=='YOLO':
		json_dict=YOLO_Dataset(image_path,annotation_path,0)
	save_json(json_dict,save)

Run it to generate instances_train2017.json, then modify the paths and run again to generate instances_val2017.json.
(Thanks to readers in the comment section for pointing out a problem; the code above has been corrected.)
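
Instead of editing the defaults, the same settings can be passed on the command line (the paths below are placeholders; substitute your own):

python tojson.py --dataset NWPU --image_path "G:/NWPU VHR-10 dataset/positive image set/" --annotation_path "G:/NWPU VHR-10 dataset/ground truth/" --save "G:/NWPU VHR-10 dataset/annotations/instances_train2017.json"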

If your dataset is in VOC format, you can use the code given in reference link 1.

2. Environment configuration

Activate the environment used by the project and install the dependencies with:

pip install -r requirements.txt

3. pth file generation

First download the pretrained weights. Two models are provided officially, DETR and DETR-DC5; the latter uses dilated convolution in the fifth stage of the backbone. Download either one.
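
For example, the base DETR-R50 checkpoint can be fetched directly (this URL matches the checkpoint filename used below; verify it against the model zoo table in the repo's README):

wget https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth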

Create a new Python file, mydataset.py, and use the following code, setting num_class to your number of categories + 1:

import torch

# load the official pretrained weights downloaded above
pretrained_weights = torch.load('detr-r50-e632da11.pth')

# NWPU dataset, 10 classes
num_class = 11    # number of categories + 1 (the extra 1 is background)
pretrained_weights["model"]["class_embed.weight"].resize_(num_class+1, 256)
pretrained_weights["model"]["class_embed.bias"].resize_(num_class+1)
torch.save(pretrained_weights, "detr-r50_%d.pth"%num_class)

Run it to generate detr-r50_11.pth.
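
As an optional sanity check, the resized head can be inspected (a minimal sketch; the expected first dimension is num_class+1, i.e. one row per category plus the no-object slot):

import torch

ckpt = torch.load('detr-r50_11.pth')
print(ckpt['model']['class_embed.weight'].shape)  # expected: torch.Size([12, 256])
print(ckpt['model']['class_embed.bias'].shape)    # expected: torch.Size([12])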

4. Parameter modification

In models/detr.py, inside the build() function, comment out the dataset-dependent num_classes logic and set num_classes directly to your number of categories + 1.
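
For reference, the change amounts to replacing the dataset-dependent assignment with a constant (a sketch of build() in models/detr.py; the exact surrounding lines may differ by commit):

# models/detr.py, inside build():
# num_classes = 20 if args.dataset_file != 'coco' else 91    # comment out
# if args.dataset_file == "coco_panoptic":
#     num_classes = 250                                      # comment out
num_classes = 11  # NWPU: 10 categories + 1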

5. Training

Several parameters need to be set for training; they can be modified either directly in main.py or on the command line.

[a] Modify main.py directly
Modify training parameters such as epochs, lr, and batch_size in main.py.
Change coco_path to your own dataset path.
Set output_dir to the desired output path.
Set resume to the path of your pretrained weight file (the detr-r50_11.pth generated above).

[b] Use the command line

python main.py --dataset_file "coco" --coco_path "/home/NWPUVHR-10" --epochs 300 --lr=1e-4 --batch_size=8 --num_workers=4 --output_dir="outputs" --resume="detr-r50_11.pth"

Run the main.py file

2. Bugs encountered

1. KeyError: 'area'

area = torch.tensor([obj["area"] for obj in anno])
KeyError: 'area'
As the traceback shows, the dictionary obj has no key named area. I checked my label file and the JSON generation was indeed faulty: the annotations contained no area field. After regenerating the JSON, training ran successfully.
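
For comparison, every entry in the annotations list should carry all six keys, including area (the values below are made-up examples):

{'image_id': 0, 'iscrowd': 0, 'area': 6365.0, 'bbox': [563.0, 478.0, 67.0, 95.0], 'category_id': 1, 'id': 0}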

3. Evaluation and Prediction

1. Evaluation

In fact, DETR automatically evaluates accuracy at every epoch during training, and the results can be read directly from the console output or the generated intermediate files. You can also evaluate with main.py: change --resume in the command below to the path of the model saved at the last epoch, and change --coco_path to your own dataset path.

python main.py --batch_size 6 --no_aux_loss --eval --resume /home/detr-main/outputs/checkpoint0299.pth --coco_path /home/NWPUVHR-10
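
If --output_dir was set during training, each epoch's metrics are also appended as one JSON line to log.txt in that folder. Below is a minimal sketch for extracting AP at IoU=0.5 per epoch (this assumes the standard DETR log format, where test_coco_eval_bbox holds the twelve COCO metrics and index 1 is AP50):

import json

with open('outputs/log.txt') as f:
    for line in f:
        stats = json.loads(line)
        # test_coco_eval_bbox: [AP, AP50, AP75, APs, APm, APl, AR@1, ...]
        print(stats['epoch'], stats['test_coco_eval_bbox'][1])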

Accuracy evaluation results: after training for 300 epochs, the AP at IoU=0.5 is about 88%.

2. Prediction

Using the code in reference link 3, put the images to be predicted in one folder; the prediction results for all images are then output in a single pass.

The parameters that need to be modified are:

backbone: the backbone I used was ResNet-50 (i.e. the DETR weight file with the ResNet-50 backbone that was already downloaded for training, placed in the main folder)
--coco_path: change to your own dataset path
--output_dir: change to the folder created for saving the prediction images
--resume: change to the path of your trained model file
image_file_path and image_path: change to the folder of the images to be predicted

PS: since I was running on a server that cannot display images, cv2.imshow raised an error, so I commented out these two lines:

#cv2.imshow("images",image)
#cv2.waitKey(1)
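
Alternatively, the visualization can be written to disk instead of displayed (a sketch to slot into the inference script from reference link 3; image is the drawn frame there, and save_path is a hypothetical output file path):

# on a headless server, save the result image instead of showing it
cv2.imwrite(save_path, image)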

Another error may be reported when no object is detected in one of the images being predicted.
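
A simple guard avoids this: skip an image when no prediction clears the confidence threshold (a sketch using the variable names of the official DETR demo notebook, where probas holds the softmaxed class scores without the no-object column):

keep = probas.max(-1).values > 0.7  # confidence threshold
if keep.sum() == 0:
    print('no objects detected, skipping this image')
    # continue to the next image here
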
Prediction result:

References:
1. Windows 10 reproduces DEtection TRansformers (DETR)
2. How to use DETR (Detection Transformer) to train your own dataset
3. A PyTorch inference program for DETR

Original post: blog.csdn.net/gsgs1234/article/details/125654021