Article link: https://blog.csdn.net/weixin_43383164/article/details/102454389

Contents

1. Configuration

config.py

2. Layer configuration: layer_utils

2.1 anchor_target_layer.py
2.2 generate_anchors.py
2.3

2. Datasets

2.1 Dataset configuration

2.1.1 lib/datasets/imdb.py
2.1.2 lib/datasets/voc_eval.py
2.1.3 lib/datasets/ds_utils.py
2.1.4 lib/datasets/factory.py

2.2 Datasets

2.2.1 Preprocessing the original datasets
2.2.2 Generating the manipulated dataset

3. Network

3.1 Network configuration

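1. Configuration

config.py

FLAGS: a class; it holds the parameters defined through tf.app.flags.
FLAGS2: a dictionary that stores basic information.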

FLAGS2={'pixel_means': array([[[102.9801, 115.9465, 122.7717]]]),
 'scales': (600,), 'test_scales': (600,),
 'bbox_inside_weights': (1.0, 1.0, 1.0, 1.0), 'bbox_normalize_means': (0.0, 0.0, 0.0, 0.0), 'bbox_normalize_stds': (0.1, 0.1, 0.1, 0.1),
 'root_dir': '/content/drive/My Drive/Image_manipulation_detection-master', 'data_dir': '/content/drive/My Drive/Image_manipulation_detection-master/data'}

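tf.app.flags.DEFINE_XXX: defines the network, training, testing, RPN, proposal, bounding-box, RoI and dataset parameters, among others.
config.py also reads the root directory, the data directory, the weights directory, and so on.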

2. Layer configuration: layer_utils

2.1 anchor_target_layer.py

Contains three functions:

def anchor_target_layer(rpn_cls_score,   # RPN classification scores
                        gt_boxes,        # ground-truth boxes
                        im_info,         # image information
                        _feat_stride,
                        all_anchors,     # all anchors
                        num_anchors):    # number of anchors
    return (rpn_labels,                  # RPN labels
            rpn_bbox_targets,            # bbox regression targets
            rpn_bbox_inside_weights,     # bbox inside weights
            rpn_bbox_outside_weights)    # bbox outside weights

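def _unmap(data, count, inds, fill=0): un-maps a subset back to the original set. Each item of the subset is written to the position of its original item, and the remaining positions are filled with fill.
def _compute_targets(ex_rois, gt_rois): returns the offsets (target_dx, target_dy, target_dw, target_dh).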
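
The two helpers can be sketched in plain Python as follows. This is a minimal illustration, not the repository's code: it assumes the usual Faster R-CNN box parameterization with (x1, y1, x2, y2) boxes and pixel-inclusive widths, it handles one box at a time instead of NumPy arrays, and the names compute_targets and unmap are stand-ins chosen here.

```python
import math


def compute_targets(ex_roi, gt_roi):
    """Offsets that map one anchor (ex_roi) onto its ground-truth box (gt_roi).

    Both boxes are (x1, y1, x2, y2). Assumption: widths and heights are
    pixel-inclusive (the "+ 1"), as in the common Faster R-CNN code.
    """
    ex_w = ex_roi[2] - ex_roi[0] + 1.0
    ex_h = ex_roi[3] - ex_roi[1] + 1.0
    ex_cx = ex_roi[0] + 0.5 * ex_w
    ex_cy = ex_roi[1] + 0.5 * ex_h

    gt_w = gt_roi[2] - gt_roi[0] + 1.0
    gt_h = gt_roi[3] - gt_roi[1] + 1.0
    gt_cx = gt_roi[0] + 0.5 * gt_w
    gt_cy = gt_roi[1] + 0.5 * gt_h

    # Center shift is normalized by the anchor size; scale change is a log ratio.
    target_dx = (gt_cx - ex_cx) / ex_w
    target_dy = (gt_cy - ex_cy) / ex_h
    target_dw = math.log(gt_w / ex_w)
    target_dh = math.log(gt_h / ex_h)
    return target_dx, target_dy, target_dw, target_dh


def unmap(data, count, inds, fill=0):
    """Scatter values computed for a subset of anchors back to all `count` anchors."""
    full = [fill] * count
    for value, index in zip(data, inds):
        full[index] = value
    return full


# Labels exist only for the anchors at indices 1 and 3; the rest get the fill value.
labels = unmap([1, 0], count=5, inds=[1, 3], fill=-1)
# An anchor shifted one anchor-width to the right of its ground-truth box.
shifted = compute_targets((10, 0, 19, 9), (0, 0, 9, 9))
identical = compute_targets((0, 0, 9, 9), (0, 0, 9, 9))
```

An anchor that already coincides with its ground-truth box gets all-zero offsets, which is why the regression targets stay small for well-placed anchors.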

2.2 generate_anchors.py

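Contains five functions; the other four are helpers for the first:

def generate_anchors(base_size=16,                  # initial receptive field of 16*16
                     ratios=[0.5, 1, 2],            # anchor aspect ratios 1:2, 1:1, 2:1
                     scales=2 ** np.arange(3, 6)):  # three sizes: (16*8) x (16*8),
                                                    # (16*16) x (16*16), (16*32) x (16*32)
    return anchors  # one anchor per (ratio, scale) pair, stored as (x1, y1, x2, y2);
                    # the helpers work with (w, h, x_center, y_center) internally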
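
The enumeration over ratios and scales can be sketched without NumPy. This is a simplified re-implementation under the assumption that the file follows the widely used py-faster-rcnn generate_anchors.py; the helper names _whctrs and _mkanchor are modelled on that code and may differ from the repository.

```python
import math


def _whctrs(anchor):
    """Return (width, height, x_center, y_center) of an (x1, y1, x2, y2) anchor."""
    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    return w, h, anchor[0] + 0.5 * (w - 1), anchor[1] + 0.5 * (h - 1)


def _mkanchor(w, h, x_ctr, y_ctr):
    """Build an (x1, y1, x2, y2) anchor of size w x h around a center point."""
    return [x_ctr - 0.5 * (w - 1), y_ctr - 0.5 * (h - 1),
            x_ctr + 0.5 * (w - 1), y_ctr + 0.5 * (h - 1)]


def generate_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """Enumerate one anchor per (ratio, scale) pair around a base_size square."""
    base = [0, 0, base_size - 1, base_size - 1]
    w, h, x_ctr, y_ctr = _whctrs(base)
    anchors = []
    for ratio in ratios:
        # Keep the area roughly constant while changing height / width.
        ws = round(math.sqrt(w * h / ratio))
        hs = round(ws * ratio)
        rw, rh, rx, ry = _whctrs(_mkanchor(ws, hs, x_ctr, y_ctr))
        for scale in scales:
            # Scaling multiplies width and height; the center stays fixed.
            anchors.append(_mkanchor(rw * scale, rh * scale, rx, ry))
    return anchors


anchors = generate_anchors()
for a in anchors:
    print(a)
```

With the defaults this gives 3 ratios x 3 scales = 9 anchors, all centered on (7.5, 7.5), the center of the 16*16 base window.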

2.3

2. Datasets

2.1 Dataset configuration

2.1.1 lib/datasets/imdb.py

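Contains one class, class imdb(object):

The constructor is def __init__(self, name, classes=None):
It defines the following as attributes: name, the number of classes, the classes, the image index, roidb_handler, roidb (one dictionary per image, {'boxes': [...], 'gt_overlaps': [...], 'gt_classes': [...], 'flipped': [...]}), cache_path (the data path stored in FLAGS2), the number of images, and so on.
self.roidb records the RoI information; each entry is a dictionary with the keys 'boxes', 'gt_classes', 'gt_overlaps', 'flipped' and 'seg_areas'.
merge_roidbs: merges the two input lists entry by entry.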

    @staticmethod
    def merge_roidbs(a, b):
        assert len(a) == len(b)
        for i in range(len(a)):
            a[i]['boxes'] = np.vstack((a[i]['boxes'], b[i]['boxes']))
            a[i]['gt_classes'] = np.hstack((a[i]['gt_classes'],
                                            b[i]['gt_classes']))
            a[i]['gt_overlaps'] = scipy.sparse.vstack([a[i]['gt_overlaps'],
                                                       b[i]['gt_overlaps']])
            a[i]['seg_areas'] = np.hstack((a[i]['seg_areas'],
                                           b[i]['seg_areas']))
        return a

2.1.2 lib/datasets/voc_eval.py

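Contains three functions:

def parse_rec(filename):
Input: filename; output: objects = [{obj_1}, {obj_2}, ..., {obj_n}].
Parses the given annotation file and returns a list with one dictionary per object. Dictionary keys: name, pose, truncated, difficult, bbox ([xmin, ymin, xmax, ymax]).
def voc_ap(rec, prec, use_07_metric=False):
Input: rec, prec; output: the Average Precision on the VOC dataset.
Computes the VOC Average Precision from recall and precision and returns it.
def voc_eval(detpath, annopath, imagesetfile, classname, cachedir, ovthresh=0.5, use_07_metric=False): evaluates the detections of a single class

detpath:             path to the detection results, one file per class;
annopath:            path to the annotations, one file per image;
imagesetfile:        text file in which each line describes one image;
classname:           name of the class;
cachedir:            directory used to cache the annotations;
ovthresh=0.5:        overlap threshold;
use_07_metric=False: whether to use the VOC07 11-point AP.

Output: rec: recall; prec: precision; ap: Average Precision (computed by the second function, voc_ap).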
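
The difference between the two AP variants selected by use_07_metric can be sketched in plain Python. This is a minimal version that assumes rec and prec are lists ordered by descending detection score; the original works on NumPy arrays, so treat it as an illustration of the metric rather than the file's code.

```python
def voc_ap(rec, prec, use_07_metric=False):
    """Average precision from recall/precision lists ordered by descending score."""
    if use_07_metric:
        # VOC07: average the best precision reachable at recall >= 0.0, 0.1, ..., 1.0.
        ap = 0.0
        for i in range(11):
            t = i / 10.0
            candidates = [p for r, p in zip(rec, prec) if r >= t]
            ap += (max(candidates) if candidates else 0.0) / 11.0
        return ap

    # Otherwise: area under the precision envelope.
    mrec = [0.0] + list(rec) + [1.0]
    mpre = [0.0] + list(prec) + [0.0]
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(mpre) - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    # Sum rectangle areas wherever recall changes.
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]
               for i in range(len(mrec) - 1)
               if mrec[i + 1] != mrec[i])


# Two detections: the first is correct, the second is a false positive
# followed by the remaining recall being reached at precision 0.5.
rec, prec = [0.5, 1.0], [1.0, 0.5]
ap_area = voc_ap(rec, prec)
ap_07 = voc_ap(rec, prec, use_07_metric=True)
print(ap_area, ap_07)
```

The 11-point variant samples the curve coarsely, so the two numbers generally differ slightly for the same detections.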

2.1.3 lib/datasets/ds_utils.py

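Two kinds of functions:
(1) def unique_boxes(boxes, scale=1.0): returns an array with the indices of the boxes that remain after de-duplication
(2) Changing what the box dimensions mean

def xywh_to_xyxy(boxes):    # converts the input boxes from [x1, y1, w, h] to [x1, y1, x2, y2]
def xyxy_to_xywh(boxes):    # converts the input boxes from [x1, y1, x2, y2] to [x1, y1, w, h]
def validate_boxes(boxes, width=0, height=0):    # sanity-checks boxes, e.g. the output of the two functions above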
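
A per-box sketch of these utilities in plain Python is given below. It assumes the pixel-inclusive convention (w = x2 - x1 + 1) used by the common Faster R-CNN ds_utils.py, and it simplifies unique_boxes by ignoring the scale argument; the original operates on whole NumPy arrays.

```python
def xywh_to_xyxy(box):
    """[x1, y1, w, h] -> [x1, y1, x2, y2], assuming pixel-inclusive sizes."""
    x1, y1, w, h = box
    return (x1, y1, x1 + w - 1, y1 + h - 1)


def xyxy_to_xywh(box):
    """[x1, y1, x2, y2] -> [x1, y1, w, h], the inverse of xywh_to_xyxy."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1 + 1, y2 - y1 + 1)


def unique_boxes(boxes):
    """Indices of the first occurrence of each distinct box (simplified: no scale)."""
    seen = set()
    keep = []
    for i, box in enumerate(boxes):
        key = tuple(box)
        if key not in seen:
            seen.add(key)
            keep.append(i)
    return keep


def validate_boxes(boxes, width, height):
    """Check that every (x1, y1, x2, y2) box is ordered and inside the image."""
    for x1, y1, x2, y2 in boxes:
        assert 0 <= x1 <= x2 < width
        assert 0 <= y1 <= y2 < height


xyxy = xywh_to_xyxy((10, 20, 5, 5))
roundtrip = xyxy_to_xywh(xyxy)
kept = unique_boxes([(0, 0, 1, 1), (2, 2, 3, 3), (0, 0, 1, 1)])
validate_boxes([xyxy], width=100, height=100)
```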

2.1.4 lib/datasets/factory.py

__sets[name] = (lambda split=split, year=year: DIY_pascal_voc(split, year))
name: 
voc_2007/2012_train/val/trainval/test, 
coco_2014_train/val/minival/valminusminival/trainval, 
coco_2015_test/test-dev, 
DIY_dataset

Two functions:

def get_imdb(name): returns the imdb created by calling __sets[name]()
def list_imdbs(): returns __sets.keys() as a list
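
The registry pattern behind __sets is worth a short sketch, in particular the split=split, year=year default arguments on the lambda. FakeImdb is a hypothetical stand-in for a dataset class such as DIY_pascal_voc, and the registry is named _sets here; only the pattern is taken from the file.

```python
class FakeImdb:
    """Hypothetical stand-in for a dataset class such as DIY_pascal_voc."""

    def __init__(self, split, year):
        self.name = "voc_{}_{}".format(year, split)


_sets = {}
for year in ["2007", "2012"]:
    for split in ["train", "val", "trainval", "test"]:
        name = "voc_{}_{}".format(year, split)
        # Default arguments freeze the current split/year for this entry.
        # A plain `lambda: FakeImdb(split, year)` would look the variables up
        # when called and so always see the last loop values.
        _sets[name] = (lambda split=split, year=year: FakeImdb(split, year))


def get_imdb(name):
    """Build the dataset registered under `name`."""
    if name not in _sets:
        raise KeyError("Unknown dataset: {}".format(name))
    return _sets[name]()


def list_imdbs():
    """List the names of all registered datasets."""
    return list(_sets.keys())


imdb = get_imdb("voc_2007_trainval")
print(imdb.name)
```

Storing factories instead of instances means a dataset object is only constructed when get_imdb is actually called for it.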

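2.2 Datasets

2.2.1 Preprocessing the original datasets

lib/datasets/pascal_voc.py, coco.py, DIY_pascal_voc.py
Contains one class: class pascal_voc(imdb)

The constructor is def __init__(self, image_set, year, devkit_path=None):
Methods:

def image_path_at(self, i):              # returns the path of the i-th image
def image_path_from_index(self, index):  # returns the path of the image for index
def seg_path_at(self, i):                # returns the path of the segmentation image of the i-th image
def seg_path_from_index(self, index):    # returns the path of the segmentation image for index
def _load_image_set_index(self):         # returns a list with one entry per image of the dataset this class belongs to (each entry is the image index / image file name)
def _load_seg_set_index(self):           # returns a list with one entry per segmentation image of the dataset this class belongs to (each entry is the image index / image file name)
def _get_default_path(self):             # returns the dataset path, Image_manipulation_detection-master/VOCdevkitxxxx, where xxxx is the year
def gt_roidb(self):                      # returns a list with the ground-truth annotation for each index;
                                         # each annotation is a dictionary built in self._load_pascal_annotation
def _load_pascal_annotation(self, index):  # returns a dictionary with 'boxes', 'gt_classes', 'gt_overlaps', 'flipped', 'seg_areas'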
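
How a VOC-style XML annotation turns into such a dictionary can be sketched with the standard library. Everything specific here is an assumption for illustration: the class list, the sample XML, the function name load_annotation, and the shift from 1-based to 0-based coordinates (which the common pascal_voc.py performs). The sparse 'gt_overlaps' matrix is left out to keep the sketch dependency-free.

```python
import xml.etree.ElementTree as ET

# Hypothetical class list and annotation, for illustration only.
CLASSES = ("__background__", "tampered")
SAMPLE_XML = """<annotation>
  <object>
    <name>tampered</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""


def load_annotation(xml_text, classes=CLASSES):
    """Parse VOC-style XML into a roidb-like dictionary (without 'gt_overlaps')."""
    class_to_ind = {name: i for i, name in enumerate(classes)}
    boxes, gt_classes, seg_areas = [], [], []
    for obj in ET.fromstring(xml_text).findall("object"):
        bbox = obj.find("bndbox")
        # Assumption: VOC coordinates are 1-based, so shift to 0-based pixels.
        x1, y1, x2, y2 = (float(bbox.find(tag).text) - 1
                          for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append([x1, y1, x2, y2])
        gt_classes.append(class_to_ind[obj.find("name").text.strip()])
        seg_areas.append((x2 - x1 + 1) * (y2 - y1 + 1))
    return {"boxes": boxes, "gt_classes": gt_classes,
            "flipped": False, "seg_areas": seg_areas}


entry = load_annotation(SAMPLE_XML)
print(entry)
```

gt_roidb would then amount to calling such a loader once per image index and collecting the dictionaries in a list.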

2.2.2 Generating the manipulated dataset

lib/datasets/main_create_training_set.py
Generates DIY_datasets_VOC2007 from VOCdevkit2007; the image files are saved in JPEGImages and the annotation files in Annotations.

Reproduction of the paper Learning Rich Features for Image Manipulation Detection