Multi-class arbitrary-oriented object detection in remote sensing images with the DOTA dataset

I am currently trying to use the DOTA dataset for multi-class, arbitrary-oriented object detection in remote sensing images.

I. The DOTA dataset

Paper, GitHub: OBB Faster R-CNN, Deformable Model

1. The annotations have been converted to XML according to the formula.

2. The images are too large to feed into the network directly, so they need to be cropped.

II. Training

Use the DOTA_devkit to split the data into patches, merge the results, visualize the data, etc.

Some complete objects get cut into two parts during cropping.
The original images are cropped into a series of 1024×1024 patches with a stride of 512.
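To make the cropping concrete, here is a minimal sketch of how the patch windows could be laid out. The function names are hypothetical, and the border handling (adding one last window flush with the image edge) is my own assumption; DOTA_devkit may treat the border differently.

```python
def patch_origins(length, patch=1024, stride=512):
    """Return the start offsets of sliding windows along one image axis."""
    if length <= patch:
        return [0]
    origins = list(range(0, length - patch + 1, stride))
    # Assumption: add a final window flush with the border so the edge is covered.
    if origins[-1] + patch < length:
        origins.append(length - patch)
    return origins


def patch_windows(width, height, patch=1024, stride=512):
    """Return (left, top, right, bottom) for every patch of a width x height image."""
    return [(x, y, min(x + patch, width), min(y + patch, height))
            for y in patch_origins(height, patch, stride)
            for x in patch_origins(width, patch, stride)]
```

With a stride of half the patch size, neighbouring windows overlap by 512 pixels, so an object cut by one window border usually appears whole in the next window.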

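If $U_{i}$ > 0.7, where $U_{i}$ is the area of part $i$ divided by the area of the original object, keep the original annotation.

Otherwise, label the part as difficult.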
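The rule above can be sketched in plain Python. I take $U_{i}$ to be the area of the part inside a patch divided by the area of the original object. The helper names are hypothetical, the object is assumed to be a convex quadrilateral, and the clipping is a generic Sutherland-Hodgman routine, not the DOTA_devkit implementation.

```python
def polygon_area(points):
    """Shoelace formula; points is a list of (x, y) vertices in order."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def clip_to_window(points, left, top, right, bottom):
    """Clip a convex polygon against an axis-aligned window (Sutherland-Hodgman)."""
    def clip(pts, inside, intersect):
        out = []
        for i in range(len(pts)):
            cur, prev = pts[i], pts[i - 1]
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))
        return out

    def cut_x(x):
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def cut_y(y):
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    pts = list(points)
    for inside, intersect in (
        (lambda p: p[0] >= left, cut_x(left)),
        (lambda p: p[0] <= right, cut_x(right)),
        (lambda p: p[1] >= top, cut_y(top)),
        (lambda p: p[1] <= bottom, cut_y(bottom)),
    ):
        if not pts:
            break
        pts = clip(pts, inside, intersect)
    return pts


def part_ratio(points, window):
    """U_i: area of the part inside the window over the original object area."""
    original = polygon_area(points)
    if original == 0:
        return 0.0
    return polygon_area(clip_to_window(points, *window)) / original


def is_difficult(points, window, threshold=0.7):
    """Keep the annotation only if U_i > threshold; otherwise mark it difficult."""
    return part_ratio(points, window) <= threshold
```

For example, a 200×100 box spanning x = 900 to 1100 keeps 62% of its area inside the window (0, 0, 1024, 1024), so it would be marked difficult there, while it lies entirely inside the neighbouring window (512, 0, 1536, 1024) and keeps its original label.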

III. Code changes

Modify the LMDB generation step to match the categories of your own dataset.

Modify the prototxt files

# Modify the job name if you want.
job_name = "SSD_{}".format(resize)
# The name of the model. Modify it if you want.
model_name = "VGG_VOC0712_{}".format(job_name)

# Directory which stores the model .prototxt file.
save_dir = "models/VGGNet/VOC0712/{}".format(job_name)
# Directory which stores the snapshot of models.
snapshot_dir = "models/VGGNet/VOC0712/{}".format(job_name)
# Directory which stores the job script and log file.
job_dir = "jobs/VGGNet/VOC0712/{}".format(job_name)
# Directory which stores the detection results.
output_result_dir = "{}/data/VOCdevkit/results/VOC2007/{}/Main".format(os.environ['HOME'], job_name)

# model definition files.
train_net_file = "{}/train.prototxt".format(save_dir)
test_net_file = "{}/test.prototxt".format(save_dir)
deploy_net_file = "{}/deploy.prototxt".format(save_dir)
solver_file = "{}/solver.prototxt".format(save_dir)
# snapshot prefix.
snapshot_prefix = "{}/{}".format(snapshot_dir, model_name)
# job script path.
job_file = "{}/{}.sh".format(job_dir, model_name)

# Stores the test image names and sizes. Created by data/VOC0712/create_list.sh
name_size_file = "data/VOC0712/test_name_size.txt"
# The pretrained model. We use the Fully convolutional reduced (atrous) VGGNet.
pretrain_model = "models/VGGNet/VGG_ILSVRC_16_layers_fc_reduced.caffemodel"
# Stores LabelMapItem.
label_map_file = "data/VOC0712/labelmap_voc.prototxt"

Testing

Run detection on the cropped patches to get temporary results, then merge those results to get the final detections.

In the testing phase, we first send the cropped image patches through the network to obtain temporary results, and then
combine those results to restore the detection results on the original image.
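The merging step can be pictured as shifting every patch-level polygon by the top-left corner of its patch. This is only a sketch: the function names and the (label, score, polygon) tuple layout are my own assumptions, not the DOTA_devkit API.

```python
def to_image_coords(detections, left, top):
    """Shift patch-level polygons by the patch's top-left corner (left, top)."""
    shifted = []
    for label, score, poly in detections:
        shifted.append((label, score, [(x + left, y + top) for x, y in poly]))
    return shifted


def merge_patches(patch_results):
    """patch_results: list of ((left, top), detections) pairs for one original image."""
    merged = []
    for (left, top), detections in patch_results:
        merged.extend(to_image_coords(detections, left, top))
    return merged
```

Because neighbouring patches overlap, the same object can be detected more than once, so a duplicate-suppression step such as NMS would normally follow the merge. It is not shown here.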

Related problems

Error caused by changing the number of classes

 net.cpp:774] Cannot copy param 0 weights from layer 'conv4_3_norm_mbox_conf'; shape mismatch.  Source param shape is 40 512 3 5 (307200); target param shape is 320 512 3 5 (2457600). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.

There were originally two classes; now there are sixteen.

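This happens because the parameters of the pretrained network do not match the current model architecture. Renaming the layers that raise the error fixes it! The pretrained model recognized two classes (background and text), and the source code uses 20 prior boxes per location, so num_output was 40 (20 × 2). I now want to recognize 16 classes, so the output should be 320 (20 × 16).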
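As a quick sanity check, the numbers in the error message can be reproduced with a few lines of Python. The helper names are made up for illustration; the shapes are copied from the log above.

```python
def conf_num_output(num_priors, num_classes):
    """Confidence-layer channels: one score per prior box per class."""
    return num_priors * num_classes


def param_count(shape):
    """Number of weights in a blob with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


old_outputs = conf_num_output(20, 2)    # pretrained model: background + text
new_outputs = conf_num_output(20, 16)   # current model: 16 classes

# Shapes as printed in the error message: num_output, input channels, kernel h, kernel w.
old_weights = param_count((old_outputs, 512, 3, 5))
new_weights = param_count((new_outputs, 512, 3, 5))
```

The two weight counts come out to 307200 and 2457600, the same values Caffe prints, which confirms that only the class count differs between the saved layer and the new one.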

I solved this problem by deleting files like "VGG_text_text_polygon_precise_fix_order_384x384_iter_120000.solverstate".

After deleting all the solverstate files, the problem went away.

Later, I realized that the real reason was that I had switched to a different caffemodel to train on my data, which is why the problem went away!

 

The latest solution is to rename the layers you added, or to change the layer names in "model_libs.py". It works!

But the train val value is very low, so I suspect this is a bad solution? Oh my god, I don't know what to do!
-----------------

For TextBoxes++:
In the CreateMultiBoxHead_multitask function in model_libs.py, change

name = "{}_mbox_conf{}".format(from_layer, conf_postfix)

to

name = "{}_mbox_conf1{}".format(from_layer, conf_postfix)

Problem 2

Train net output #0: mbox_loss = 0 (* 1 = 0 loss)

Cause: all coordinates in the label files need to be integers.
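One way to enforce this is to round the coordinates before generating the LMDB. The sketch below assumes a text label format where each line starts with eight polygon coordinates followed by other fields such as the category. That layout and the function name are my assumptions, so adapt them to your own label files.

```python
def round_label_line(line, num_coords=8):
    """Round the leading coordinate fields of one label line to integers."""
    fields = line.split()
    if len(fields) < num_coords:
        return line  # header or malformed line: leave it untouched
    try:
        coords = [str(int(round(float(value)))) for value in fields[:num_coords]]
    except ValueError:
        return line  # non-numeric fields where coordinates were expected
    return " ".join(coords + fields[num_coords:])
```

Lines with fewer than eight fields (for example header lines) are returned unchanged, so the function can be mapped over a whole file.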

Other