DETR (DEtection TRansformer): training on your own dataset - practice notes & troubleshooting
DETR (DEtection TRansformer) is an end-to-end transformer-based object detector; it needs no NMS post-processing step and no anchors.
This post walks through training DETR on the NWPU VHR-10 dataset.
The NWPU dataset contains ten object categories, with 650 positive samples and 150 negative samples (the negatives are not used).
NWPU_CATEGORIES=['airplane','ship','storage tank','baseball diamond','tennis court','basketball court','ground track field','harbor','bridge','vehicle']
Code: https://github.com/facebookresearch/detr
1. Training
1. Dataset preparation
DETR expects the dataset in COCO format: the images and label files are organized into four folders, one each for the training set, test set, and validation set, plus an annotations folder that holds the JSON label files.
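For reference, the expected layout can be created like this (a minimal sketch; the root folder name NWPUVHR-10 is just an example, not required):

```python
import os

# Sketch of the COCO-style layout DETR expects; the root name
# "NWPUVHR-10" is an example, not required.
root = "NWPUVHR-10"
for sub in ("train2017", "val2017", "test2017", "annotations"):
    os.makedirs(os.path.join(root, sub), exist_ok=True)
# annotations/ will hold instances_train2017.json and instances_val2017.json
```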
The code below can convert label files from several datasets (RSOD, NWPU, DIOR, and YOLO format) into COCO JSON. Create a new file, tojson.py, with the following code and use it to generate the required JSON files.
To generate instances_train2017.json:
(a) change the default image_path (line 29 of the script) to the path of train2017;
(b) change the default annotation_path (line 31) to the path of the label folder (the train and val labels are both in this folder; the same folder is used when generating instances_val2017.json);
(c) change the dataset default (line 33) to your own dataset name, NWPU;
(d) change the save default (line 34) to the output path of the JSON file, .../NWPUVHR-10/annotations/instances_train2017.json.
import os
import cv2
import json
import argparse
from tqdm import tqdm
import xml.etree.ElementTree as ET

COCO_DICT=['images','annotations','categories']
IMAGES_DICT=['file_name','height','width','id']
ANNOTATIONS_DICT=['image_id','iscrowd','area','bbox','category_id','id']
CATEGORIES_DICT=['id','name']
## e.g. {'supercategory': 'person', 'id': 1, 'name': 'person'}
## e.g. {'supercategory': 'vehicle', 'id': 2, 'name': 'bicycle'}
YOLO_CATEGORIES=['person']
RSOD_CATEGORIES=['aircraft','playground','overpass','oiltank']
NWPU_CATEGORIES=['airplane','ship','storage tank','baseball diamond','tennis court',
                 'basketball court','ground track field','harbor','bridge','vehicle']
VOC_CATEGORIES=['aeroplane','bicycle','bird','boat','bottle','bus','car','cat','chair','cow',
                'diningtable','dog','horse','motorbike','person','pottedplant','sheep','sofa',
                'train','tvmonitor']
DIOR_CATEGORIES=['golffield','Expressway-toll-station','vehicle','trainstation','chimney','storagetank',
                 'ship','harbor','airplane','groundtrackfield','tenniscourt','dam','basketballcourt',
                 'Expressway-Service-area','stadium','airport','baseballfield','bridge','windmill','overpass']

parser=argparse.ArgumentParser(description='2COCO')
#parser.add_argument('--image_path',type=str,default=r'T:/shujuji/DIOR/JPEGImages-trainval/',help='config file')
parser.add_argument('--image_path',type=str,default=r'G:/NWPU VHR-10 dataset/positive image set/',help='config file')
#parser.add_argument('--annotation_path',type=str,default=r'T:/shujuji/DIOR/Annotations/',help='config file')
parser.add_argument('--annotation_path',type=str,default=r'G:/NWPU VHR-10 dataset/ground truth/',help='config file')
parser.add_argument('--dataset',type=str,default='NWPU',help='config file')
parser.add_argument('--save',type=str,default='G:/NWPU VHR-10 dataset/instances_train2017.json',help='config file')
args=parser.parse_args()
def load_json(path):
    with open(path,'r') as f:
        json_dict=json.load(f)
    for i in json_dict:
        print(i)
    print(json_dict['annotations'])

def save_json(json_dict,path):
    print('SAVE_JSON...')
    with open(path,'w') as f:
        json.dump(json_dict,f)
    print('SUCCESSFUL_SAVE_JSON:',path)

def load_image(path):
    img=cv2.imread(path)
    return img.shape[0],img.shape[1]  # height, width

def generate_categories_dict(category):
    # CATEGORIES_DICT=['id','name']; ids are 1-based
    print('GENERATE_CATEGORIES_DICT...')
    return [{CATEGORIES_DICT[0]:category.index(x)+1,CATEGORIES_DICT[1]:x} for x in category]

def generate_images_dict(imagelist,image_path,start_image_id=11725):
    # IMAGES_DICT=['file_name','height','width','id']
    print('GENERATE_IMAGES_DICT...')
    images_dict=[]
    with tqdm(total=len(imagelist)) as load_bar:
        for x in imagelist:  # x is the image file name
            height,width=load_image(image_path+x)
            image_dict={
                IMAGES_DICT[0]:x,IMAGES_DICT[1]:height,
                IMAGES_DICT[2]:width,IMAGES_DICT[3]:imagelist.index(x)+start_image_id}
            load_bar.update(1)
            images_dict.append(image_dict)
    return images_dict
def DIOR_Dataset(image_path,annotation_path,start_image_id=11725,start_id=0):
    categories_dict=generate_categories_dict(DIOR_CATEGORIES)  # category ids start from 1
    imgname=os.listdir(image_path)
    images_dict=generate_images_dict(imgname,image_path,start_image_id)
    print('GENERATE_ANNOTATIONS_DICT...')  # ANNOTATIONS_DICT=['image_id','iscrowd','area','bbox','category_id','id']
    annotations_dict=[]
    id=start_id
    for i in images_dict:
        image_id=i['id']
        image_name=i['file_name']
        annotation_xml=annotation_path+image_name.split('.')[0]+'.xml'
        tree=ET.parse(annotation_xml)
        root=tree.getroot()
        for j in root.findall('object'):
            category=j.find('name').text
            category_id=DIOR_CATEGORIES.index(category)+1  # +1 so ids match the 1-based categories dict
            x_min=float(j.find('bndbox').find('xmin').text)
            y_min=float(j.find('bndbox').find('ymin').text)
            w=float(j.find('bndbox').find('xmax').text)-x_min
            h=float(j.find('bndbox').find('ymax').text)-y_min
            area=w*h
            bbox=[x_min,y_min,w,h]
            annotation_dict={
                'image_id':image_id,'iscrowd':0,'area':area,'bbox':bbox,
                'category_id':category_id,'id':id}
            annotations_dict.append(annotation_dict)
            id=id+1
    print('SUCCESSFUL_GENERATE_DIOR_JSON')
    return {COCO_DICT[0]:images_dict,COCO_DICT[1]:annotations_dict,COCO_DICT[2]:categories_dict}
def NWPU_Dataset(image_path,annotation_path,start_image_id=0,start_id=0):
    categories_dict=generate_categories_dict(NWPU_CATEGORIES)
    imgname=os.listdir(image_path)
    images_dict=generate_images_dict(imgname,image_path,start_image_id)
    print('GENERATE_ANNOTATIONS_DICT...')
    annotations_dict=[]
    id=start_id
    for i in images_dict:
        image_id=i['id']
        image_name=i['file_name']
        annotation_txt=annotation_path+image_name.split('.')[0]+'.txt'
        with open(annotation_txt,'r') as txt:
            lines=txt.readlines()
        for j in lines:
            if j=='\n':
                continue
            # NWPU label line format: (x1,y1),(x2,y2),class_id
            category_id=int(j.split(',')[4])
            x_min=float(j.split(',')[0].split('(')[1])
            y_min=float(j.split(',')[1].split(')')[0])
            w=float(j.split(',')[2].split('(')[1])-x_min
            h=float(j.split(',')[3].split(')')[0])-y_min
            area=w*h
            bbox=[x_min,y_min,w,h]
            annotation_dict={
                'image_id':image_id,'iscrowd':0,'area':area,'bbox':bbox,
                'category_id':category_id,'id':id}
            id=id+1
            annotations_dict.append(annotation_dict)
    print('SUCCESSFUL_GENERATE_NWPU_JSON')
    return {COCO_DICT[0]:images_dict,COCO_DICT[1]:annotations_dict,COCO_DICT[2]:categories_dict}
def YOLO_Dataset(image_path,annotation_path,start_image_id=0,start_id=0):
    categories_dict=generate_categories_dict(YOLO_CATEGORIES)
    imgname=os.listdir(image_path)
    images_dict=generate_images_dict(imgname,image_path,start_image_id)
    print('GENERATE_ANNOTATIONS_DICT...')
    annotations_dict=[]
    id=start_id
    for i in images_dict:
        image_id=i['id']
        image_name=i['file_name']
        W,H=i['width'],i['height']
        annotation_txt=annotation_path+image_name.split('.')[0]+'.txt'
        with open(annotation_txt,'r') as txt:
            lines=txt.readlines()
        for j in lines:
            # YOLO label line format: class_id x_center y_center w h (normalized)
            category_id=int(j.split(' ')[0])+1
            x=float(j.split(' ')[1])
            y=float(j.split(' ')[2])
            w=float(j.split(' ')[3])
            h=float(j.split(' ')[4])
            x_min=(x-w/2)*W
            y_min=(y-h/2)*H
            w=w*W
            h=h*H
            area=w*h
            bbox=[x_min,y_min,w,h]
            annotation_dict={
                'image_id':image_id,'iscrowd':0,'area':area,'bbox':bbox,
                'category_id':category_id,'id':id}
            annotations_dict.append(annotation_dict)
            id=id+1
    print('SUCCESSFUL_GENERATE_YOLO_JSON')
    return {COCO_DICT[0]:images_dict,COCO_DICT[1]:annotations_dict,COCO_DICT[2]:categories_dict}
def RSOD_Dataset(image_path,annotation_path,start_image_id=0,start_id=0):
    categories_dict=generate_categories_dict(RSOD_CATEGORIES)
    imgname=os.listdir(image_path)
    images_dict=generate_images_dict(imgname,image_path,start_image_id)
    print('GENERATE_ANNOTATIONS_DICT...')
    annotations_dict=[]
    id=start_id
    for i in images_dict:
        image_id=i['id']
        image_name=i['file_name']
        annotation_txt=annotation_path+image_name.split('.')[0]+'.txt'
        with open(annotation_txt,'r') as txt:
            lines=txt.readlines()
        for j in lines:
            # RSOD label line format: image \t class \t x_min \t y_min \t x_max \t y_max
            category=j.split('\t')[1]
            category_id=RSOD_CATEGORIES.index(category)+1
            x_min=float(j.split('\t')[2])
            y_min=float(j.split('\t')[3])
            w=float(j.split('\t')[4])-x_min
            h=float(j.split('\t')[5])-y_min
            area=w*h
            bbox=[x_min,y_min,w,h]
            annotation_dict={
                'image_id':image_id,'iscrowd':0,'area':area,'bbox':bbox,
                'category_id':category_id,'id':id}
            annotations_dict.append(annotation_dict)
            id=id+1
    print('SUCCESSFUL_GENERATE_RSOD_JSON')
    return {COCO_DICT[0]:images_dict,COCO_DICT[1]:annotations_dict,COCO_DICT[2]:categories_dict}
if __name__=='__main__':
    dataset=args.dataset            # dataset name
    save=args.save                  # output path of the json file
    image_path=args.image_path      # image folder
    annotation_path=args.annotation_path  # annotation folder
    if dataset=='RSOD':
        json_dict=RSOD_Dataset(image_path,annotation_path,0)
    if dataset=='NWPU':
        json_dict=NWPU_Dataset(image_path,annotation_path,0)
    if dataset=='DIOR':
        json_dict=DIOR_Dataset(image_path,annotation_path,11725)
    if dataset=='YOLO':
        json_dict=YOLO_Dataset(image_path,annotation_path,0)
    save_json(json_dict,save)
Run it to generate instances_train2017.json, then change the paths and run it again to generate instances_val2017.json.
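A quick way to sanity-check a generated file is to count its entries. This small helper is my own addition, not part of the script above:

```python
import json

def summarize_coco(path):
    """Return (num_images, num_annotations, num_categories) of a COCO-style json."""
    with open(path, 'r') as f:
        d = json.load(f)
    return len(d['images']), len(d['annotations']), len(d['categories'])
```

For NWPU, summarize_coco('instances_train2017.json') should report 10 categories.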
(A problem pointed out by readers in the comments has since been fixed.)
If your own dataset is in VOC format, you can use the code given in reference link 1.
2. Environment configuration
Activate the environment for the current project and install the dependencies:
pip install -r requirements.txt
3. pth file generation
First download a pretrained checkpoint. The official repo provides two models, DETR and DETR-DC5; the latter uses dilated convolution in the fifth stage of the backbone. Download either one.
Create a new file, mydataset.py, with the following code, and set num_class to your own number of categories + 1:
import torch

pretrained_weights = torch.load('detr-r50-e632da11.pth')
# NWPU dataset, 10 classes
num_class = 11  # number of categories + 1 (background)
pretrained_weights["model"]["class_embed.weight"].resize_(num_class + 1, 256)
pretrained_weights["model"]["class_embed.bias"].resize_(num_class + 1)
torch.save(pretrained_weights, "detr-r50_%d.pth" % num_class)
Run it to generate detr-r50_11.pth.
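To see what the resize does, here is a sketch using a stand-in state dict (the real detr-r50-e632da11.pth may not be on disk; the COCO-pretrained head has 92 rows, 91 classes plus the no-object class):

```python
import torch

# Stand-in for the pretrained checkpoint's classification head:
# 92 rows (91 COCO classes + no-object). resize_ reshapes it in place.
ckpt = {"model": {"class_embed.weight": torch.zeros(92, 256),
                  "class_embed.bias": torch.zeros(92)}}
num_class = 11
ckpt["model"]["class_embed.weight"].resize_(num_class + 1, 256)
ckpt["model"]["class_embed.bias"].resize_(num_class + 1)
print(tuple(ckpt["model"]["class_embed.weight"].shape))  # (12, 256)
```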
4. Parameter modification
Modify the models/detr.py file: in the build() function, comment out the code that derives num_classes from the dataset name, and set num_classes directly to your own number of categories + 1.
5. Training
Some parameters need to be set before training; you can either edit main.py directly or pass them on the command line.
[a] Edit main.py directly:
modify training parameters such as epochs, lr, and batch_size;
set your own dataset path;
set the output path;
set resume to the path of your pretrained weight file.
[b] command line
python main.py --dataset_file "coco" --coco_path "/home/NWPUVHR-10" --epochs 300 --lr=1e-4 --batch_size=8 --num_workers=4 --output_dir="outputs" --resume="detr-r50_11.pth"
Then run main.py.
2. Bugs encountered
1. KeyError: ‘area’
area = torch.tensor([obj["area"] for obj in anno])
KeyError: 'area'
As the traceback shows, the dict obj has no key named 'area'. Checking my label file confirmed it had been generated incorrectly and contained no area information; after regenerating it, training ran successfully.
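A small check like the following (a helper of my own, not part of DETR) can catch this before training starts:

```python
import json

def annotations_missing_area(path):
    """Return the ids of annotations in a COCO json that lack the 'area' key."""
    with open(path, 'r') as f:
        anns = json.load(f)['annotations']
    return [a.get('id') for a in anns if 'area' not in a]
```

An empty list means every annotation carries an area and the loader should not raise this KeyError.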
3. Evaluation and Prediction
1. Evaluation
During training, DETR automatically evaluates accuracy after each epoch; the results can be viewed in the console output or in the generated intermediate files. You can also evaluate with main.py: in the command below, change --resume to the model checkpoint from the last epoch and --coco_path to your own dataset path.
python main.py --batch_size 6 --no_aux_loss --eval --resume /home/detr-main/outputs/checkpoint0299.pth --coco_path /home/NWPUVHR-10
Accuracy evaluation results:
After training for 300 epochs, AP at IoU=0.5 is about 88%.
2. Prediction
Use the code from reference link 3: put the images to be predicted in one folder, and the script outputs predictions for all of them in one pass.
The parameters that need to be modified are:
backbone: I used resnet50 at the time (the ResNet-50 DETR weight file downloaded during training, placed in the main folder);
--coco_path: change to your own dataset path;
--output_dir: change to the folder created for saving the prediction images;
--resume: change to the path of your trained model file;
image_file_path and image_path: change to the folder of images to be predicted.
PS: since I ran on a server and could not display images, these two lines caused an error, so I commented them out:
#cv2.imshow("images",image)
#cv2.waitKey(1)
You may also get an error caused by no object being detected in one of the images being predicted:
Prediction result:
References:
1. Windows 10 reproduces DEtection TRansformers (DETR)
2. How to use DETR (detection transformer) to train your own data set
3. Pytorch realizes the reasoning program of DETR