My AI Journey (34): Configuring and Training py-faster-rcnn

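Some people consider Faster-RCNN outdated, but it is a classic and some projects still use it. For server deployments, the backbone is usually just swapped from the old ZF or VGG to something like ResNet-101. For embedded deployments that do edge computing on the device, small networks such as ZF remain a good choice: their detection quality is acceptable and they use few resources, so they are cost-effective. Understanding Faster-RCNN thoroughly also helps a great deal with the other networks in the RCNN family.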

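Most articles only cover the configuration changes needed to train a Faster-RCNN model on a VOC 2007 format dataset and say nothing about COCO format datasets. Based on my own experience, this article explains the configuration changes for training py-faster-rcnn with each of the two formats, both as a note to myself and to share.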

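For installing and deploying py-faster-rcnn, see my article My AI Journey (31): Trying to Install and Deploy py-faster-rcnn on a Jetson Nano. Although that deployment targets the Jetson Nano embedded board, deploying on a PC or server is much the same. I have installed it on my own PC and on my company's AI server, and both went far more smoothly than on the Jetson Nano: the Jetson Nano is an ARM64 platform, while PCs and servers are mostly x86-64, which has a much richer set of software package sources and is therefore easier to deploy on than ARM.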

The first step is to download the pretrained models and prepare the dataset:

py-faster-rcnn/data/scripts/fetch_faster_rcnn_models.sh downloads the models already trained on the VOC2007 dataset.

py-faster-rcnn/data/scripts/fetch_imagenet_models.sh downloads the models already trained on the ImageNet dataset.

However, thanks to the Great Firewall, neither script works here, because access to Dropbox is blocked:

https://dl.dropboxusercontent.com/s/o6ii098bu51d139/faster_rcnn_models.tgz?dl=0

https://dl.dropbox.com/s/gstw7122padlf0l/imagenet_models.tgz?dl=0

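If you cannot get around the firewall, you can search online for copies that other people have uploaded. The ImageNet-trained model files are easy to find, but the VOC2007-trained ones are not. I eventually found a BitTorrent link and downloaded them with Xunlei :), and I have uploaded them here (imagenet_models: part1, part2, part3, part4, part5; faster_rcnn_models: part1, part2, part3, part4) for convenient download.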

Extract the model files above into py-faster-rcnn/data/faster_rcnn_models/:

-rw-rw-r-- 1 xx xx 548317115 Sep 29 16:59 VGG16_faster_rcnn_final.caffemodel
-rw-rw-r-- 1 xx xx 237181903 Sep 29 16:59 ZF_faster_rcnn_final.caffemodel

and into py-faster-rcnn/data/imagenet_models/:

-rw-rw-r-- 1 xx xx 553432430 Sep 29 16:59 VGG16.v2.caffemodel
-rw-rw-r-- 1 xx xx 349003349 Sep 29 16:59 VGG_CNN_M_1024.v2.caffemodel
-rw-rw-r-- 1 xx xx 249432131 Sep 29 16:59 ZF.v2.caffemodel

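Because the official ImageNet dataset contains far more images than VOC2007, the models under py-faster-rcnn/data/imagenet_models/ are normally used as the starting point, that is, the pretrained model, for training your own model. For example, if your backbone is ZF, use ZF.v2.caffemodel as the pretrained model; if your backbone is VGG16, use VGG16.v2.caffemodel. The py-faster-rcnn training scripts take the network named in the command-line arguments you start training with, ZF or VGG16, and automatically load the corresponding model file by its path as the pretrained model. (I will not cover VGG_CNN_M here; its size lies between ZF and VGG16. Using ResNet or similar networks as the backbone is also left for a later article on the mmdetection framework.) For example, see py-faster-rcnn/experiments/scripts/faster_rcnn_end2end.sh: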

time ./tools/train_net.py --gpu ${GPU_ID} \
--solver models/${PT_DIR}/${NET}/faster_rcnn_end2end/solver.prototxt \
--weights data/imagenet_models/${NET}.v2.caffemodel \
--imdb ${TRAIN_IMDB} \
--iters ${ITERS} \
--cfg experiments/cfgs/faster_rcnn_end2end.yml \
${EXTRA_ARGS}

${NET} is the value of the NET argument passed on the command line, for example ZF, VGG16, or VGG_CNN_M_1024.
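As a rough illustration of how the weights file follows from the NET argument, the path can be rebuilt as below. The helper names pretrained_weights and weights_available and the root argument are my own, not part of py-faster-rcnn; the sketch only mirrors the --weights line above and lets you check that the file is present before starting a long training run.

```python
import os

# The three NET values mentioned in this article.
KNOWN_NETS = ("ZF", "VGG_CNN_M_1024", "VGG16")

def pretrained_weights(net, root="."):
    """Return the ImageNet weights path that the --weights line would use for `net`."""
    if net not in KNOWN_NETS:
        raise ValueError("unexpected NET value: %r" % (net,))
    return os.path.join(root, "data", "imagenet_models", net + ".v2.caffemodel")

def weights_available(net, root="."):
    """True if the pretrained weights file for `net` exists under `root`."""
    return os.path.isfile(pretrained_weights(net, root))
```

Calling weights_available("ZF", "py-faster-rcnn") before launching the shell script is one way to catch a missing or misnamed download early.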

With the pretrained models sorted out, the next step is to prepare the dataset:

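There are many free, open-source annotation tools. We usually use LabelImg or Labelme: the former saves annotations as XML files and the latter as JSON files. Whichever you use, you have to write your own scripts to process or convert the files into a PASCAL VOC format or COCO format dataset. Preprocessing such as cropping, scaling, or other task-specific steps generally needs your own scripts as well. The processed annotation files and images are then stored under the expected paths and split by some ratio into training, test, and validation sets.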
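The split step mentioned above might look like the following minimal sketch. It assumes image IDs are file names without the extension; the 80/10/10 ratio, the fixed seed, and the function names split_ids and write_lists are illustrative choices of mine, not something py-faster-rcnn prescribes. The keys match the list names used under ImageSets/Main in VOC2007 (train, val, trainval, test).

```python
import os
import random

def split_ids(image_ids, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle image IDs reproducibly and split them into VOC-style lists."""
    ids = sorted(image_ids)               # sort first so the seed alone decides the order
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_frac)
    n_val = int(len(ids) * val_frac)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]          # everything that is left
    return {"train": train, "val": val, "trainval": train + val, "test": test}

def write_lists(splits, main_dir):
    """Write one image ID per line to <main_dir>/<name>.txt."""
    for name, ids in splits.items():
        with open(os.path.join(main_dir, name + ".txt"), "w") as f:
            f.write("\n".join(ids) + "\n")
```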

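py-faster-rcnn supports the VOC 2007 and COCO 2014 dataset formats by default, so your scripts need to output one of those. VOC2007 contains three directories (paths), Annotations, JPEGImages, and ImageSets/Main, which hold the annotation files, the image files, and the file lists for the training, test, and validation sets respectively. COCO2014 contains three directories (paths), annotations, images/train2014, and images/val2014. annotations holds the training annotation file instances_train2014.json and the validation/test annotation file instances_minival2014.json; images/train2014 holds the training images COCO_train2014_[0-9]{12}.jpg; and images/val2014 holds the validation/test images COCO_val2014_[0-9]{12}.jpg. Here [0-9]{12} means 12 digits: the file's sequence number, left-padded with zeros.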
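The layout and naming rules above can be written down as a small sketch. The names VOC_DIRS, COCO_DIRS, coco_image_name, and missing_dirs are hypothetical helpers of mine for checking a dataset folder; they are not part of py-faster-rcnn.

```python
import os
import re

# Directory layout described above, relative to the dataset root.
VOC_DIRS = ("Annotations", "JPEGImages", os.path.join("ImageSets", "Main"))
COCO_DIRS = ("annotations",
             os.path.join("images", "train2014"),
             os.path.join("images", "val2014"))

# File name pattern for COCO 2014 images: 12 zero-padded digits.
COCO_NAME = re.compile(r"^COCO_(train|val)2014_[0-9]{12}\.jpg$")

def coco_image_name(split, image_id):
    """Build a file name such as COCO_train2014_000000000009.jpg."""
    return "COCO_%s_%012d.jpg" % (split, image_id)

def missing_dirs(root, expected):
    """Return the expected sub-directories that do not exist under `root`."""
    return [d for d in expected if not os.path.isdir(os.path.join(root, d))]
```

For example, missing_dirs("py-faster-rcnn/data/coco", COCO_DIRS) returning an empty list means the three COCO directories are in place.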

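How to process your own data into a VOC2007 or COCO2014 format dataset is a large topic that I will cover in a separate article when I have time. If you only want to try py-faster-rcnn out rather than train a model for a specific purpose, you can simply download the corresponding dataset from its official site. Those sites are mostly hosted abroad and download very slowly, so it is better to find a mirror inside China; for example, the VOC2007 dataset can be downloaded from https://pan.baidu.com/s/1mhMKKw4, which saves a lot of time.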

Once your dataset is ready, put it in the corresponding location under py-faster-rcnn. For a VOC2007 format dataset the location should be

py-faster-rcnn/data/VOCdevkit2007/VOC2007

For COCO format data the location is:

py-faster-rcnn/data/coco

Of course, you can also keep the dataset somewhere else and create a symbolic link from these default paths to it.
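Creating such a link could be scripted as in the sketch below. The function name link_dataset is my own, and the sketch assumes a POSIX system where os.symlink is permitted; it refuses to overwrite anything that already exists at the link path.

```python
import os

def link_dataset(real_dir, link_path):
    """Create link_path as a symbolic link pointing at real_dir."""
    real_dir = os.path.abspath(real_dir)
    if os.path.islink(link_path) or os.path.exists(link_path):
        raise OSError("refusing to overwrite existing path: %s" % link_path)
    parent = os.path.dirname(link_path)
    if parent and not os.path.isdir(parent):
        os.makedirs(parent)               # e.g. create py-faster-rcnn/data if needed
    os.symlink(real_dir, link_path)
    return link_path
```

A call such as link_dataset("/path/to/your/VOCdevkit", "py-faster-rcnn/data/VOCdevkit2007") would then make the default path resolve to your own copy; the first argument here is only a placeholder.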

With the dataset ready, the py-faster-rcnn scripts need to be modified. Why? For three reasons:

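First, py-faster-rcnn always treats pixel coordinates as having their origin at (1,1) rather than the usual (0,0): the VOC loader subtracts 1 from every coordinate and stores the result in a uint16 array, which cannot represent a value below 0. If a labeled corner has x<1 or y<1 (x=0 or y=0, or even a negative value), training stops with an assertion error while loading the data. Labelme has exactly this bug: if you accidentally drag a whole label box past the image edge, it raises no error and saves the invalid negative coordinates!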

Second, a numpy version difference causes the error TypeError: 'numpy.float64' object cannot be interpreted as an index.

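Third, the parameters in the py-faster-rcnn scripts are configured for PASCAL VOC's default 20 classes or COCO's default 80 classes, whereas your own training data usually has far fewer, typically one or a few. The related network parameters therefore have to be adjusted to your class count.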
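The second problem can be reproduced outside py-faster-rcnn. The sketch below is my own illustration, not code from the repository: recent numpy releases reject a numpy.float64 used as a slice bound, while wrapping the value in int() works on every version. The exact error text differs between numpy versions, and bbox_slot is a hypothetical helper name.

```python
import numpy as np

def bbox_slot(cls):
    """Columns [start, end) that hold the 4 box targets of class `cls`."""
    start = int(4 * cls)                   # int() keeps the value usable as an index
    return start, start + 4

cls = np.float64(1.0)                      # class labels are read from a float array
targets = np.zeros(8, dtype=np.float32)    # 2 classes x 4 coordinates

try:
    targets[4 * cls:4 * cls + 4]           # float slice bounds
    float_bounds_accepted = True           # very old numpy: only a deprecation warning
except (TypeError, IndexError):
    float_bounds_accepted = False          # newer numpy rejects float bounds

start, end = bbox_slot(cls)
targets[start:end] = 1.0                   # fills the second class's four columns
```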

The following code changes address these problems:

(1) py-faster-rcnn/lib/datasets/imdb.py:

def append_flipped_images(self):
    num_images = self.num_images
    widths = self._get_widths()
    for i in xrange(num_images):
        boxes = self.roidb[i]['boxes'].copy()
        oldx1 = boxes[:, 0].copy()
        oldx2 = boxes[:, 2].copy()
        boxes[:, 0] = widths[i] - oldx2 # - 1
        boxes[:, 2] = widths[i] - oldx1 # - 1

(2) py-faster-rcnn/lib/datasets/pascal_voc.py:

class pascal_voc(imdb):
    def __init__(self, image_set, year, devkit_path=None):
        imdb.__init__(self, 'voc_' + year + '_' + image_set)
        self._year = year
        self._image_set = image_set
        self._devkit_path = self._get_default_path() if devkit_path is None \
                            else devkit_path
        self._data_path = os.path.join(self._devkit_path, 'VOC' + self._year)

        # Comment out the 20 default classes and add your own
        # (the loader looks names up after .lower().strip(), so keep them lowercase):
        self._classes = ('__background__', # always index 0
                         #'aeroplane', 'bicycle', 'bird', 'boat',
                         #'bottle', 'bus', 'car', 'cat', 'chair',
                         #'cow', 'diningtable', 'dog', 'horse',
                         #'motorbike', 'person', 'pottedplant',
                         #'sheep', 'sofa', 'train', 'tvmonitor'
                         'your_class')

        ...

    def _load_pascal_annotation(self, index):

        ...

(3) py-faster-rcnn/lib/roi_data_layer/minibatch.py:

def get_minibatch(roidb, num_classes):
    """Given a roidb, construct a minibatch sampled from it."""
    num_images = len(roidb)
    # Sample random scales to use for each image in this batch
    random_scale_inds = npr.randint(0, high=len(cfg.TRAIN.SCALES),
                                    size=num_images)
    assert(cfg.TRAIN.BATCH_SIZE % num_images == 0), \
        'num_images ({}) must divide BATCH_SIZE ({})'. \
        format(num_images, cfg.TRAIN.BATCH_SIZE)
    rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images
    fg_rois_per_image = int(np.round(cfg.TRAIN.FG_FRACTION * rois_per_image))

def _get_bbox_regression_labels(bbox_target_data, num_classes):
    ...

    for ind in inds:
        cls = clss[ind]
        start = int(4 * cls)
        end = start + 4
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
        bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
    return bbox_targets, bbox_inside_weights

(4) py-faster-rcnn/lib/rpn/proposal_target_layer.py:

def forward(self, bottom, top):
    ...

    num_images = 1
    rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images
    fg_rois_per_image = int(np.round(cfg.TRAIN.FG_FRACTION * rois_per_image))

def _get_bbox_regression_labels(bbox_target_data, num_classes):
    ...
    for ind in inds:
        cls = clss[ind]
        start = int(4 * cls)
        end = start + 4
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
        bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
    return bbox_targets, bbox_inside_weights

-------------------------------------------------------------------------------------------------------------------------------------------------------------------

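Note that if, after making the changes to imdb.py and pascal_voc.py above, training still fails at assert (boxes[:, 2] >= boxes[:, 0]).all() in append_flipped_images() in imdb.py while loading the data, then your annotations definitely contain points that are out of range (an x or y value that is negative or larger than the image's width or height). This happens because the annotation tool has a bug that lets such invalid data be saved. You could change the code to skip these boxes or, as some articles suggest, force the out-of-range x values to 0, but I prefer to track them down one by one and correct them by hand in the annotation tool, which solves the problem for good. To find them, add a print statement just before assert (boxes[:, 2] >= boxes[:, 0]).all() in append_flipped_images() in imdb.py that prints the invalid xmin and xmax together with the original pre-flip xmin and xmax: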

def append_flipped_images(self):

    ...

    for i in xrange(num_images):
        boxes = self.roidb[i]['boxes'].copy()
        oldx1 = boxes[:, 0].copy()
        oldx2 = boxes[:, 2].copy()
        boxes[:, 0] = widths[i] - oldx2 # - 1
        boxes[:, 2] = widths[i] - oldx1 # - 1

        for b in range(len(boxes)):
            if boxes[b][2] < boxes[b][0]:
                print("==invalid data== i=",str(i),",widths[i]=",widths[i],"b=",str(b),"oldx1[b]=",str(oldx1[b]),"oldx2[b]=",str(oldx2[b]), "xmin=",str(boxes[b][0]),",xmax=",str(boxes[b][2]),",ymin=",str(boxes[b][1]),",ymax=",str(boxes[b][3]))
        assert (boxes[:, 2] >= boxes[:, 0]).all()

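Once you know the original invalid xmin and xmax, add a print for the matching data in _load_pascal_annotation() in pascal_voc.py to find the annotation file they come from and the original xmin, xmax, and other coordinates: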

def _load_pascal_annotation(self, index):

    ...

    boxes = np.zeros((num_objs, 4), dtype=np.uint16)
    gt_classes = np.zeros((num_objs), dtype=np.int32)
    overlaps = np.zeros((num_objs, self.num_classes), dtype=np.float32)
    # "Seg" area for pascal is just the box area
    seg_areas = np.zeros((num_objs), dtype=np.float32)

    # Load object bounding boxes into a data frame.
    for ix, obj in enumerate(objs):
        bbox = obj.find('bndbox')
        # Make pixel indexes 0-based
        x1 = float(bbox.find('xmin').text) #- 1
        y1 = float(bbox.find('ymin').text) #- 1
        x2 = float(bbox.find('xmax').text) #- 1
        y2 = float(bbox.find('ymax').text) #- 1
        cls = self._class_to_ind[obj.find('name').text.lower().strip()]
        boxes[ix, :] = [x1, y1, x2, y2] # x1, y1, x2, y2 are implicitly converted from float to uint16 here
        # Suppose the invalid oldx1[b] and oldx2[b] printed by append_flipped_images() in imdb.py were 339 and 462

        if boxes[ix,2]==462 and boxes[ix,0]==339 or boxes[ix,2] < boxes[ix,0] or boxes[ix,2]-1 < boxes[ix,0] -1 :
            print("==invalid data== boxes[ix,2]=",str(boxes[ix,2]),",boxes[ix,0]=",str(boxes[ix,0]))
            print("xmin=",bbox.find('xmin').text,",xmax=",bbox.find('xmax').text)
            print("index=",index,",x1=",str(x1),",x2=",str(x2),"x1-1=",str(x1-1),"x2-1=",str(x2-1))
            assert(False)

    ...

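index is the name of the annotation file containing the invalid data. When loading reaches that file, the assertion fires; open the file, find the bad x1 and x2 values, and fix them by hand. Better still, open the annotation in the labeling tool, drag the label box to correct it, and save.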
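As an alternative to adding prints inside the training code, the annotation files can be scanned up front. The following is a minimal sketch of my own, not part of py-faster-rcnn; it assumes VOC-style XML files that contain a size element with width and height (as LabelImg writes them), and the function name find_bad_boxes and the min_coord parameter are hypothetical. Use min_coord=1 for the unmodified loader and min_coord=0 if the "- 1" has been commented out as above.

```python
import glob
import os
import xml.etree.ElementTree as ET

def find_bad_boxes(annotation_dir, min_coord=1):
    """Yield (file, class name, (xmin, ymin, xmax, ymax), reason) for suspicious boxes."""
    for path in sorted(glob.glob(os.path.join(annotation_dir, "*.xml"))):
        root = ET.parse(path).getroot()
        size = root.find("size")
        width = float(size.find("width").text)
        height = float(size.find("height").text)
        for obj in root.findall("object"):
            bb = obj.find("bndbox")
            x1, y1, x2, y2 = [float(bb.find(tag).text)
                              for tag in ("xmin", "ymin", "xmax", "ymax")]
            reasons = []
            if x1 < min_coord or y1 < min_coord:
                reasons.append("coordinate below %d" % min_coord)
            if x2 > width or y2 > height:
                reasons.append("outside image")
            if x2 <= x1 or y2 <= y1:
                reasons.append("empty or inverted box")
            if reasons:
                yield (os.path.basename(path), obj.find("name").text,
                       (x1, y1, x2, y2), "; ".join(reasons))
```

Looping over find_bad_boxes("py-faster-rcnn/data/VOCdevkit2007/VOC2007/Annotations") and printing each tuple gives a list of files to reopen in the labeling tool.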

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

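To fix the class count, whether you train on a COCO or a PASCAL VOC dataset, and whether you use the two-stage alternating (alt_opt) training or end2end training, the network parameters to change are num_classes in the input-data layer and num_output in the cls_score and bbox_pred layers. The prototxt files used for training and testing need changes like the following. The examples assume a single class and cover two combinations: a PASCAL VOC2007 format dataset with a ZF backbone and alternating training, and a COCO format dataset with a VGG16 backbone and end-to-end training.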

1. PASCAL VOC2007 format dataset, ZF backbone, alternating training

(1) py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/stage1_rpn_train.pt

name: "ZF"
layer {
name: 'input-data'
type: 'Python'
top: 'data'
top: 'im_info'
top: 'gt_boxes'
python_param {
module: 'roi_data_layer.layer'
layer: 'RoIDataLayer'
param_str: "'num_classes': 2" # num_classes is the number of classes plus 1 (background)
}
}

(2) py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/stage1_fast_rcnn_train.pt

name: "ZF"
layer {
name: 'data'
type: 'Python'
top: 'data'
top: 'rois'
top: 'labels'
top: 'bbox_targets'
top: 'bbox_inside_weights'
top: 'bbox_outside_weights'
python_param {
module: 'roi_data_layer.layer'
layer: 'RoIDataLayer'
param_str: "'num_classes': 2" # num_classes is the number of classes plus 1 (background)
}
}

layer {
name: "cls_score"
type: "InnerProduct"
bottom: "fc7"
top: "cls_score"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
inner_product_param {
num_output: 2 # cls_score's num_output equals the input-data layer's num_classes
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "bbox_pred"
type: "InnerProduct"
bottom: "fc7"
top: "bbox_pred"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
inner_product_param {
num_output: 8 # bbox_pred's num_output is 4 times the input-data layer's num_classes (each box has four values)
weight_filler {
type: "gaussian"
std: 0.001
}
bias_filler {
type: "constant"
value: 0
}
}
}

(3) py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/stage2_rpn_train.pt

name: "ZF"
layer {
name: 'input-data'
type: 'Python'
top: 'data'
top: 'im_info'
top: 'gt_boxes'
python_param {
module: 'roi_data_layer.layer'
layer: 'RoIDataLayer'
param_str: "'num_classes': 2"
}
}

(4) py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/stage2_fast_rcnn_train.pt

layer {
name: "cls_score"
type: "InnerProduct"
bottom: "fc7"
top: "cls_score"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
inner_product_param {
num_output: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "bbox_pred"
type: "InnerProduct"
bottom: "fc7"
top: "bbox_pred"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
inner_product_param {
num_output: 8
weight_filler {
type: "gaussian"
std: 0.001
}
bias_filler {
type: "constant"
value: 0
}
}
}

(5) py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/faster_rcnn_test.pt

layer {
name: "cls_score"
type: "InnerProduct"
bottom: "fc7"
top: "cls_score"
inner_product_param {
num_output: 2
}
}
layer {
name: "bbox_pred"
type: "InnerProduct"
bottom: "fc7"
top: "bbox_pred"
inner_product_param {
num_output: 8
}
}

2. COCO format dataset, VGG16 backbone, end-to-end training:

(1) py-faster-rcnn/models/coco/VGG16/faster_rcnn_end2end/train.prototxt

layer {
name: 'input-data'
type: 'Python'
top: 'data'
top: 'im_info'
top: 'gt_boxes'
python_param {
module: 'roi_data_layer.layer'
layer: 'RoIDataLayer'
param_str: "'num_classes': 2" # num_classes is the number of classes plus 1 (background)
}
}

layer {
name: "cls_score"
type: "InnerProduct"
bottom: "fc7"
top: "cls_score"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 2 # cls_score's num_output equals the input-data layer's num_classes
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}

layer {
name: "bbox_pred"
type: "InnerProduct"
bottom: "fc7"
top: "bbox_pred"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 8 # bbox_pred's num_output is 4 times the input-data layer's num_classes (each box has four values)
weight_filler {
type: "gaussian"
std: 0.001
}
bias_filler {
type: "constant"
value: 0
}
}
}

(2) py-faster-rcnn/models/coco/VGG16/faster_rcnn_end2end/test.prototxt

layer {
name: "cls_score"
type: "InnerProduct"
bottom: "fc7"
top: "cls_score"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "bbox_pred"
type: "InnerProduct"
bottom: "fc7"
top: "bbox_pred"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 8
weight_filler {
type: "gaussian"
std: 0.001
}
bias_filler {
type: "constant"
value: 0
}
}
}

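Those are the network definition files that must be modified. Hyperparameters such as lr, gamma, and momentum, and parameters such as stepsize, live in the .prototxt (or .pt) files under the same paths whose names contain solver; adjust them as needed.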
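The class-count arithmetic used in the prototxt edits above can be sketched in a few lines, together with a regex-based patch for a prototxt's text. This is an illustrative helper of my own (layer_sizes and patch_prototxt are hypothetical names), not a tool shipped with py-faster-rcnn. It assumes the layers are named cls_score and bbox_pred as in the listings above and that num_output is the first such field after each layer's name; always diff the result against the original file before training.

```python
import re

def layer_sizes(num_foreground_classes):
    """Values to write into the prototxt for a given number of object classes."""
    num_classes = num_foreground_classes + 1      # +1 for __background__
    return {"num_classes": num_classes,
            "cls_score": num_classes,             # one score per class
            "bbox_pred": 4 * num_classes}         # four box values per class

def patch_prototxt(text, num_foreground_classes):
    """Return `text` with num_classes and the two num_output fields rewritten."""
    sizes = layer_sizes(num_foreground_classes)
    # Every "'num_classes': N" inside a param_str.
    text = re.sub(r"('num_classes':\s*)\d+",
                  lambda m: m.group(1) + str(sizes["num_classes"]), text)
    for layer in ("cls_score", "bbox_pred"):
        # Match from the layer's own name (not rpn_cls_score / rpn_bbox_pred)
        # up to the first num_output that follows it.
        pattern = r"(name:\s*[\"']%s[\"'].*?num_output:\s*)\d+" % layer
        text = re.sub(pattern,
                      lambda m, n=sizes[layer]: m.group(1) + str(n),
                      text, count=1, flags=re.S)
    return text
```

With one object class this yields num_classes 2, cls_score 2, and bbox_pred 8, matching the values shown in the listings.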

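OK, everything is ready, so start training. For the first combination, run the following command from the py-faster-rcnn directory (GPU 0 is specified here; you can specify another GPU):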

./experiments/scripts/faster_rcnn_alt_opt.sh 0 ZF pascal_voc

For the second combination, run the following command from the py-faster-rcnn directory (GPU 0 is specified here; you can specify another GPU):

./experiments/scripts/faster_rcnn_end2end.sh 0 VGG16 coco

For other combinations of dataset format, backbone, and alt_opt/end2end, modify the configuration in the same way and run the corresponding command:

./experiments/scripts/<faster_rcnn_alt_opt.sh | faster_rcnn_end2end.sh> <GPU> <NET> <pascal_voc | coco>
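If you launch training from a Python script, the command template above can be assembled as in this small sketch. The function name training_command and the short mode keys are my own; only the two script paths and the argument order come from the commands shown above.

```python
# Script paths as used in the commands above.
SCRIPTS = {
    "alt_opt": "./experiments/scripts/faster_rcnn_alt_opt.sh",
    "end2end": "./experiments/scripts/faster_rcnn_end2end.sh",
}
DATASETS = ("pascal_voc", "coco")

def training_command(mode, gpu_id, net, dataset):
    """Return the argument list: <script> <GPU> <NET> <dataset>."""
    if mode not in SCRIPTS:
        raise ValueError("mode must be one of %s" % sorted(SCRIPTS))
    if dataset not in DATASETS:
        raise ValueError("dataset must be one of %s" % (DATASETS,))
    return [SCRIPTS[mode], str(gpu_id), net, dataset]
```

The returned list can be passed to subprocess.check_call with cwd set to the py-faster-rcnn directory.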

Arnold-FY-Chen
