TF OD SSD API Architecture and Source Code Walkthrough


Model definition code

SSD meta-architecture: research/object_detection/meta_architectures/ssd_meta_arch.py
This file defines the SSD framework; the rest of this article walks through the concrete implementation of each module starting from this code.

SSD config file: research/object_detection/samples/configs/ssd_mobilenet_v1_coco.config
This is the model configuration, which selects mobilenet_v1 as the feature extraction network; the discussion below is based on this config.

I. Predict Implementation

Sections 1-4 below explain how predict is implemented: sections 1-3 cover the supporting modules, and section 4 covers the flow inside the meta architecture.

1. FeatureExtractor

research/object_detection/models/ssd_mobilenet_v1_feature_extractor.py

1.1 The structure is defined as follows:

feature_map_layout = {
    'from_layer': ['Conv2d_11_pointwise', 'Conv2d_13_pointwise', '', '',
                   '', ''],
    'layer_depth': [-1, -1, 512, 256, 256, 128],
    'use_explicit_padding': self._use_explicit_padding,
    'use_depthwise': self._use_depthwise,
}

1.2 Parameters

'from_layer': the name scopes of the layers taken from mobilenet; an empty string means a newly generated layer. The value is a list of length 6, i.e. six feature maps at different scales.
'layer_depth': the channel depth of each map; -1 means the depth is inherited from the corresponding layer of the base net.
'use_explicit_padding': if enabled, use VALID padding, preceded by a fixed padding step, so that the post-convolution size matches what SAME padding would produce (see the sketch after the diagrams below).

For the difference between SAME padding and fixed padding, see the related Stack Overflow discussion.
With SAME padding:

Case 1:

                pad|              |pad
    inputs:      0 |1  2  3  4  5 |0 
                |_______|
                        |_______|
                            |_______|
Case 2:

                                    |pad
    inputs:      1  2  3  4  5  6 |0 
                |_______|
                        |_______|
                            |_______|


With fixed padding:

Case 1:

                pad|              |pad
    inputs:      0 |1  2  3  4  5 |0 
                |_______|
                        |_______|
                            |_______|
Case 2:

                pad|                 |pad
    inputs:      0 |1  2  3  4  5  6 |0 
                |_______|
                        |_______|
                            |_______|
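The fixed-padding step can be sketched in a few lines. This is a minimal sketch modeled on the fixed_padding helper in research/object_detection/utils/ops.py (treat the exact signature as an assumption):

import tensorflow as tf

def fixed_padding(inputs, kernel_size, rate=1):
  # Pad by an amount that depends only on the (dilated) kernel size, never
  # on the input size, so that a following convolution with VALID padding
  # produces the same output size as SAME padding would.
  kernel_size_effective = kernel_size + (kernel_size - 1) * (rate - 1)
  pad_total = kernel_size_effective - 1
  pad_beg = pad_total // 2
  pad_end = pad_total - pad_beg
  # Pad only the spatial dimensions of the NHWC tensor.
  return tf.pad(inputs, [[0, 0], [pad_beg, pad_end],
                         [pad_beg, pad_end], [0, 0]])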

1.3 Feature map construction

research/object_detection/models/feature_map_generators.py, multi_resolution_feature_maps()

1.4 Output feature map structure

| name scope | channel depth | feature map size |
| --- | --- | --- |
| Conv2d_11_pointwise | 512 | 19x19 |
| Conv2d_13_pointwise | 1024 | 10x10 |
| Conv2d_13_pointwise_2_Conv2d_2_3x3_s2_512 | 512 | 5x5 |
| Conv2d_13_pointwise_2_Conv2d_3_3x3_s2_256 | 256 | 3x3 |
| Conv2d_13_pointwise_2_Conv2d_4_3x3_s2_256 | 256 | 2x2 |
| Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128 | 128 | 1x1 |
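Each '' entry in from_layer becomes a new map built from the previous one with a stride-2 convolution, which is why the spatial size halves down the table while the depth follows layer_depth. A minimal sketch of that pattern under SAME padding (scope names here are illustrative; the real multi_resolution_feature_maps also handles the depthwise and explicit-padding variants):

import tensorflow.contrib.slim as slim

def extra_feature_maps(net, depths=(512, 256, 256, 128)):
  # net: the last extractor map, e.g. Conv2d_13_pointwise with shape
  # [N, 10, 10, 1024]. Each round: a 1x1 conv to depth/2, then a 3x3
  # stride-2 conv to depth, halving the spatial size (10->5->3->2->1).
  maps = []
  for i, depth in enumerate(depths, start=2):
    net = slim.conv2d(net, depth // 2, [1, 1],
                      scope='Conv2d_%d_1x1_%d' % (i, depth // 2))
    net = slim.conv2d(net, depth, [3, 3], stride=2,
                      scope='Conv2d_%d_3x3_s2_%d' % (i, depth))
    maps.append(net)
  return maps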

2. Anchor generation

models/research/object_detection/anchor_generators/multiple_grid_anchor_generator.py

2.1 Anchor configuration in the config file:

anchor_generator {
  ssd_anchor_generator {
    num_layers: 6
    min_scale: 0.2
    max_scale: 0.95
    aspect_ratios: 1.0
    aspect_ratios: 2.0
    aspect_ratios: 0.5
    aspect_ratios: 3.0
    aspect_ratios: 0.3333
  }
}

One check in anchor generation is worth spelling out: why must the number of feature maps equal the length of num_anchors_per_location()? The reason is that num_anchors_per_location() returns one entry per layer (the anchor count at each grid position of that layer, derived from box_specs_list), so the check simply verifies that the generator was configured for exactly as many layers as there are feature maps.

if self.check_num_anchors and (
    len(feature_map_shape_list) != len(self.num_anchors_per_location())):
    raise ValueError('Number of feature maps is expected to equal the length '
                    'of `num_anchors_per_location`.')

2.2 Anchor implementation

2.2.1 Anchor builder

research/object_detection/anchor_generators/multiple_grid_anchor_generator.py
create_ssd_anchors() returns a MultipleGridAnchorGenerator object; its main job is building box_specs_list:

for layer, scale, scale_next in zip(range(num_layers), scales[:-1], scales[1:]):
    layer_box_specs = []
    if layer == 0 and reduce_boxes_in_lowest_layer:
        layer_box_specs = [(0.1, 1.0), (scale, 2.0), (scale, 0.5)]
    else:
        for aspect_ratio in aspect_ratios:
            layer_box_specs.append((scale, aspect_ratio))
        # Add one more anchor, with a scale between the current scale, and the
        # scale for the next layer, with a specified aspect ratio (1.0 by
        # default).
        if interpolated_scale_aspect_ratio > 0.0:
            layer_box_specs.append((np.sqrt(scale*scale_next), interpolated_scale_aspect_ratio))
    box_specs_list.append(layer_box_specs)

If 'reduce_boxes_in_lowest_layer' is 'True', the anchors of feature map 0 (size 19x19) use the hard-coded layer_box_specs above; otherwise they use the default configuration. As the code shows, enabling this parameter reduces the number of anchors on feature map 0. The likely reasons are, first, that feature map 0 is already large, so its localization on the original image is reasonably precise and extra anchors bring little accuracy gain; and second, that it reduces computation.

With this configuration, the final box_specs_list is:

box_specs_list = [[(0.1,1),(0.2,2),(0.2,0.5)],
  [(0.35,1), (0.35,2), (0.35,0.5), (0.35,3), (0.35,0.333), (sqrt(0.35*0.50),1)],
  [(0.50,1), (0.50,2), (0.50,0.5), (0.50,3), (0.50,0.333), (sqrt(0.50*0.65),1)],
  [(0.65,1), (0.65,2), (0.65,0.5), (0.65,3), (0.65,0.333), (sqrt(0.65*0.80),1)],
  [(0.80,1), (0.80,2), (0.80,0.5), (0.80,3), (0.80,0.333), (sqrt(0.80*0.95),1)],
  [(0.95,1), (0.95,2), (0.95,0.5), (0.95,3), (0.95,0.333), (sqrt(0.95*1.00),1)]]
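The scale values above are not configured one by one; create_ssd_anchors interpolates them linearly between min_scale and max_scale, as this quick check shows:

min_scale, max_scale, num_layers = 0.2, 0.95, 6
# Linearly spaced scales; 1.0 is appended so the last layer also has a
# "next scale" for its interpolated sqrt(scale * scale_next) anchor.
scales = [min_scale + (max_scale - min_scale) * i / (num_layers - 1)
          for i in range(num_layers)] + [1.0]
print(scales)  # [0.2, 0.35, 0.5, 0.65, 0.8, 0.95, 1.0] (up to float rounding)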

2.2.2 Anchor generator

research/object_detection/anchor_generators/grid_anchor_generator.py, the tile_anchors() function.
Its inputs are each layer's feature map size, scales, aspect_ratios, and related parameters; it returns a BoxList object.

  # For feature map 0 the incoming arguments are:
  # grid_height = grid_width = 19
  # scales = [0.1, 0.2, 0.2]
  # aspect_ratios = [1.0, 2.0, 0.5]
  # anchor_stride = (1/19, 1/19)
  # anchor_offset = (1/38, 1/38)
  ratio_sqrts = tf.sqrt(aspect_ratios)
  # Compute heights and widths
  heights = scales / ratio_sqrts * base_anchor_size[0]
  widths = scales * ratio_sqrts * base_anchor_size[1]

  # Get a grid of box centers
  y_centers = tf.to_float(tf.range(grid_height))
  y_centers = y_centers * anchor_stride[0] + anchor_offset[0]
  x_centers = tf.to_float(tf.range(grid_width))
  x_centers = x_centers * anchor_stride[1] + anchor_offset[1]
  # x_centers = y_centers = [1/38, 3/38, 5/38, ..., 37/38]
  x_centers, y_centers = ops.meshgrid(x_centers, y_centers)
  # Mesh the sizes against the x/y center grids
  widths_grid, x_centers_grid = ops.meshgrid(widths, x_centers)
  heights_grid, y_centers_grid = ops.meshgrid(heights, y_centers)

  bbox_centers = tf.stack([y_centers_grid, x_centers_grid], axis=3)
  bbox_sizes = tf.stack([heights_grid, widths_grid], axis=3)
  bbox_centers = tf.reshape(bbox_centers, [-1, 2])
  bbox_sizes = tf.reshape(bbox_sizes, [-1, 2])
  bbox_corners = _center_size_bbox_to_corners_bbox(bbox_centers, bbox_sizes)
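A quick NumPy check of the height/width formulas for feature map 0, using the scales and aspect ratios from box_specs_list (with base_anchor_size = 1):

import numpy as np

scales = np.array([0.1, 0.2, 0.2])
aspect_ratios = np.array([1.0, 2.0, 0.5])
ratio_sqrts = np.sqrt(aspect_ratios)
heights = scales / ratio_sqrts
widths = scales * ratio_sqrts
print(heights)  # approx [0.1, 0.141, 0.283]
print(widths)   # approx [0.1, 0.283, 0.141]
# Each anchor keeps area scale**2: ratio 2.0 is wide and short,
# ratio 0.5 is tall and narrow.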

bbox_corners has size [grid_height * grid_width * num_anchors_per_location, 4].
For example, feature map 0 yields bbox_corners of size [19*19*3, 4] = [1083, 4].
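Repeating this arithmetic for all six feature maps gives the per-layer and total anchor counts that reappear in the predict flow below:

feature_map_sizes = [19, 10, 5, 3, 2, 1]
anchors_per_location = [3, 6, 6, 6, 6, 6]
counts = [s * s * n for s, n in zip(feature_map_sizes, anchors_per_location)]
print(counts)       # [1083, 600, 150, 54, 24, 6]
print(sum(counts))  # 1917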

3. Predictor

3.1 Main predictor configuration:

box_predictor {
  convolutional_box_predictor {
    min_depth: 0
    max_depth: 0
    num_layers_before_predictor: 0
    use_dropout: false
    dropout_keep_probability: 0.8
    kernel_size: 1
    box_code_size: 4
    apply_sigmoid_to_scores: false
  }
}


3.2 Implementation

research/object_detection/core/box_predictor.py
The configured type is convolutional_box_predictor, which corresponds to the ConvolutionalBoxPredictor class; the core logic lives in its _predict method.
A 1x1 convolution maps each feature map to the required output depth.

# box_encodings has size (batch_size, featuremap_size, featuremap_size, 4 * num_boxes)
box_encodings = slim.conv2d(
                  net, num_predictions_per_location * self._box_code_size,
                  [self._kernel_size, self._kernel_size],
                  scope='BoxEncodingPredictor')
# class_predictions_with_background has size
# (batch_size, featuremap_size, featuremap_size, num_class_slots * num_boxes)
# num_class_slots = number of real classes + 1, adding a background slot
class_predictions_with_background = slim.conv2d(
                  net, num_predictions_per_location * num_class_slots,
                  [self._kernel_size, self._kernel_size],
                  scope='ClassPredictor',
                  biases_initializer=tf.constant_initializer(
                      self._class_prediction_bias_init))


# box_encodings reshape: [N, 19, 19, 12] --> [N, 1083, 1, 4] (feature map 0)
# class_predictions_with_background reshape:
# [N, 19, 19, 3 * num_class_slots] --> [N, 1083, num_class_slots] (feature map 0)

4. The predict flow in ssd_meta_arch

Sections 1-3 above covered the individual building blocks; ssd_meta_arch wires them together.
Main code: the predict() function in research/object_detection/meta_architectures/ssd_meta_arch.py

4.1 Extract feature maps

with tf.variable_scope(None, self._extract_features_scope,
                       [preprocessed_inputs]):
  # Get the feature maps
  feature_maps = self._feature_extractor.extract_features(preprocessed_inputs)

# Get each feature map's spatial size
feature_map_spatial_dims = self._get_feature_map_spatial_dims(feature_maps)
# Get the input image's shape
image_shape = shape_utils.combined_static_and_dynamic_shape(preprocessed_inputs)

4.2 Generate anchors

# 1. Generate anchors: ssd_anchor_generator --> multiple_grid_anchor_generator --> grid_anchor_generator
# 2. Concatenate the box_lists into a single BoxList: instead of one boxlist per feature map there is now just one.
# The feature map index and each feature map's box count are recorded in 'feature_map_index'; the boxes themselves in 'boxes'.
self._anchors = box_list_ops.concatenate(
    self._anchor_generator.generate(
        feature_map_spatial_dims,
        im_height=image_shape[1],
        im_width=image_shape[2]))

4.3 Use the feature maps and anchors to obtain box_encodings and class_predictions_with_background

Assume batch size = 12.

# multiple_grid_anchor_generator --> num_anchors_per_location returns the per-position box count of each feature map, i.e. [3, 6, 6, 6, 6, 6]
# box_predictor --> ConvolutionalBoxPredictor
prediction_dict = self._box_predictor.predict(
    feature_maps, self._anchor_generator.num_anchors_per_location())
# The new box_encodings has size [12, 1083+600+150+54+24+6, 4] --> [12, 1917, 4]
box_encodings = tf.squeeze(
    tf.concat(prediction_dict['box_encodings'], axis=1), axis=2)
# The new class_predictions_with_background has size [12, 1083+600+150+54+24+6, 6] --> [12, 1917, 6]
class_predictions_with_background = tf.concat(
    prediction_dict['class_predictions_with_background'], axis=1)

The final box_encodings has size [12, 1917, 4], and class_predictions_with_background has size [12, 1917, 6].

II. Loss Implementation

1. target_assigner

Before describing the target assigner itself, two of its ingredients need introducing: the IoU computation and the matcher. The matcher's job is to find, for each anchor, the index of the matching ground-truth box.

1.1 _similarity_calc

research/object_detection/core/box_list_ops.py, the iou() function

1.1.1 iou() implementation

def iou(boxlist1, boxlist2, scope=None):
  """Computes pairwise intersection-over-union between box collections.

  Args:
    boxlist1: BoxList holding N boxes
    boxlist2: BoxList holding M boxes
    scope: name scope.

  Returns:
    a tensor with shape [N, M] representing pairwise iou scores.
  """
  with tf.name_scope(scope, 'IOU'):
    intersections = intersection(boxlist1, boxlist2)
    areas1 = area(boxlist1)
    areas2 = area(boxlist2)
    unions = (
        tf.expand_dims(areas1, 1) + tf.expand_dims(areas2, 0) - intersections)
    return tf.where(
        tf.equal(intersections, 0.0),
        tf.zeros_like(intersections), tf.truediv(intersections, unions))
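As a cross-check, the same pairwise IoU computation can be sketched in NumPy (illustrative only, not the library code):

import numpy as np

def pairwise_iou(boxes1, boxes2):
  # Boxes are [ymin, xmin, ymax, xmax]; broadcasting yields an [N, M] matrix.
  ymin = np.maximum(boxes1[:, None, 0], boxes2[None, :, 0])
  xmin = np.maximum(boxes1[:, None, 1], boxes2[None, :, 1])
  ymax = np.minimum(boxes1[:, None, 2], boxes2[None, :, 2])
  xmax = np.minimum(boxes1[:, None, 3], boxes2[None, :, 3])
  inter = np.clip(ymax - ymin, 0, None) * np.clip(xmax - xmin, 0, None)
  area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
  area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
  union = area1[:, None] + area2[None, :] - inter
  return np.where(inter == 0, 0.0, inter / np.maximum(union, 1e-12))

gt = np.array([[0.0, 0.0, 0.5, 0.5]])
anchors = np.array([[0.0, 0.0, 0.5, 0.5], [0.25, 0.25, 0.75, 0.75]])
print(pairwise_iou(gt, anchors))  # [[1.0, 0.142857...]]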

1.1.2 Building the similarity_matrix

Implemented in the assign() function of research/object_detection/core/target_assigner.py:
_similarity_calc.compare() calls iou() to build the IoU matrix.

# Returns an IoU matrix of size [num_groundtruth_boxes, num_anchors]
match_quality_matrix = self._similarity_calc.compare(groundtruth_boxes, anchors)

1.2 matcher

1.2.1 Matcher configuration

matcher {
  argmax_matcher {
    matched_threshold: 0.5
    unmatched_threshold: 0.5
    ignore_thresholds: false
    negatives_lower_than_unmatched: true
    force_match_for_each_row: true
  }
}

1.2.2 Matcher implementation

research/object_detection/matchers/argmax_matcher.py

The main logic is in _match_when_rows_are_non_empty():
1. Take the column-wise maximum IoU, i.e. for each anchor find the ground truth with the largest IoU:

matches = tf.argmax(similarity_matrix, 0, output_type=tf.int32)

matched_vals = tf.reduce_max(similarity_matrix, 0)
below_unmatched_threshold = tf.greater(self._unmatched_threshold, matched_vals)
between_thresholds = tf.logical_and(
    tf.greater_equal(matched_vals, self._unmatched_threshold),
    tf.greater(self._matched_threshold, matched_vals))
# In the config, _negatives_lower_than_unmatched = True
if self._negatives_lower_than_unmatched:
    # Set indices whose IoU is below _unmatched_threshold to -1
    matches = self._set_values_using_indicator(matches, below_unmatched_threshold, -1)
    # Set indices between _unmatched_threshold and _matched_threshold to -2
    matches = self._set_values_using_indicator(matches, between_thresholds, -2)
else:
    matches = self._set_values_using_indicator(matches, below_unmatched_threshold, -2)
    matches = self._set_values_using_indicator(matches, between_thresholds, -1)

2. Take the row-wise maximum IoU, i.e. for each ground truth find the anchor with the largest IoU:

similarity_matrix_shape = shape_utils.combined_static_and_dynamic_shape(similarity_matrix)
force_match_column_ids = tf.argmax(similarity_matrix, 1, output_type=tf.int32)
# One-hot expand the row-wise argmax indices to depth = number of anchors
force_match_column_indicators = tf.one_hot(force_match_column_ids, depth=similarity_matrix_shape[1])
# force_match_row_ids: indices where the ground truth and the anchor are each other's argmax
force_match_row_ids = tf.argmax(force_match_column_indicators, 0, output_type=tf.int32)

3. The final matches returned:

# Cast to a boolean mask
force_match_column_mask = tf.cast(tf.reduce_max(force_match_column_indicators, 0), tf.bool)

# Where force_match_column_mask is True, final_matches takes force_match_row_ids; elsewhere it keeps matches.
# The mutual-argmax (forced) index takes priority.
final_matches = tf.where(force_match_column_mask, force_match_row_ids, matches)

In short, matches holds, per anchor, the index of the ground truth with the largest column-wise IoU in similarity_matrix, kept only when it exceeds the threshold, with each ground truth's forced (mutual-argmax) match taking priority.
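A toy NumPy example of the column-wise matching with the 0.5/0.5 thresholds from the config (force-matching omitted for brevity):

import numpy as np

# similarity_matrix: [num_groundtruth=2, num_anchors=3]
sim = np.array([[0.7, 0.3, 0.55],
                [0.2, 0.4, 0.60]])
matches = sim.argmax(axis=0)        # best gt per anchor: [0, 1, 1]
matched_vals = sim.max(axis=0)      # [0.7, 0.4, 0.6]
# matched_threshold = unmatched_threshold = 0.5, so there is no "ignore"
# band; anything below 0.5 simply becomes a negative (-1).
matches[matched_vals < 0.5] = -1
print(matches)                      # [ 0 -1  1]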

1.3 target_assigner

1.3.1 target_assigner.assign

research/object_detection/core/target_assigner.py

# Returns an IoU matrix of size [num_groundtruth_boxes, num_anchors]
match_quality_matrix = self._similarity_calc.compare(groundtruth_boxes,
                                                     anchors)
# Returns, per anchor, the index of the ground truth with the largest IoU
match = self._matcher.match(match_quality_matrix, **params)

# Returns the matched anchors' regression targets, size [num_anchors, 4]; unmatched and ignored entries are [0, 0, 0, 0]
reg_targets = self._create_regression_targets(anchors, groundtruth_boxes, match)

# Returns the matched classes; unmatched and ignored entries are 0
cls_targets = self._create_classification_targets(groundtruth_labels, match)

reg_weights = self._create_regression_weights(match, groundtruth_weights)
cls_weights = self._create_classification_weights(match, groundtruth_weights)

The core job of _create_regression_targets is to extract the matched boxes for the subsequent loss computation; _create_classification_targets does the same for the class labels.

1.3.2 batch_assign_targets

batch_assign_targets() in research/object_detection/core/target_assigner.py calls target_assigner.assign on every image in a batch:

# Collect the box, class, and match information of every image.
# match_list: a list of matcher.Match recording the matched ground-truth index per anchor; unmatched entries are -1, ignored entries are -2
# batch_reg_targets: the box-regression targets; unmatched and ignored entries are [0, 0, 0, 0]
# batch_cls_targets: the classification targets; unmatched and ignored entries are 0
for anchors, gt_boxes, gt_class_targets, gt_weights in zip(
    anchors_batch, gt_box_batch, gt_class_targets_batch, gt_weights_batch):
    (cls_targets, cls_weights, reg_targets, reg_weights, match) = target_assigner.assign(
        anchors, gt_boxes, gt_class_targets, gt_weights)
    cls_targets_list.append(cls_targets)
    cls_weights_list.append(cls_weights)
    reg_targets_list.append(reg_targets)
    reg_weights_list.append(reg_weights)
    match_list.append(match)
batch_cls_targets = tf.stack(cls_targets_list)
batch_cls_weights = tf.stack(cls_weights_list)
batch_reg_targets = tf.stack(reg_targets_list)
batch_reg_weights = tf.stack(reg_weights_list)

2. hard_example_miner

When computing the loss, positive and negative samples are heavily imbalanced, so only a subset of the negatives is used. This is exactly what hard_example_miner implements.
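The subsampling idea can be sketched with toy numbers (hypothetical losses; the library version additionally runs NMS first and works through a Match object):

import numpy as np

cls_losses = np.array([0.9, 0.1, 0.7, 0.3, 0.8, 0.2])   # per-anchor class loss
is_positive = np.array([True, False, False, False, False, False])

num_pos = int(is_positive.sum())
num_neg = 3 * num_pos                 # max_negatives_per_positive: 3
neg_indices = np.where(~is_positive)[0]
# Keep only the negatives with the highest classification loss ("hard" ones).
hardest = neg_indices[np.argsort(-cls_losses[neg_indices])][:num_neg]
selected = np.concatenate([np.where(is_positive)[0], hardest])
print(sorted(selected.tolist()))      # [0, 2, 3, 4]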

2.1 hard_example_miner configuration

hard_example_miner {
  # Maximum number of output samples
  num_hard_examples: 3000
  # NMS threshold
  iou_threshold: 0.99
  # Metric used to rank examples
  loss_type: CLASSIFICATION
  # Negative:positive ratio; at most three negatives per positive
  max_negatives_per_positive: 3
  # Minimum number of negatives per image
  min_negatives_per_image: 0
}

2.2 hard_example_miner implementation

research/object_detection/core/losses.py

The code below handles a single image, inside HardExampleMiner.__call__():

# NMS: rank boxes by classification loss (image_losses) and, among boxes
# whose IoU exceeds the threshold, suppress the one with the smaller loss.
# At most num_hard_examples boxes are kept.
selected_indices = tf.image.non_max_suppression(
    box_locations, image_losses, num_hard_examples, self._iou_threshold)

# Enforce the negative:positive ratio via _max_negatives_per_positive
if self._max_negatives_per_positive is not None and match:
  (selected_indices, num_positives,
   num_negatives) = self._subsample_selection_to_desired_neg_pos_ratio(
       selected_indices, match, self._max_negatives_per_positive,
       self._min_negatives_per_image)
  # Record the number of positives and negatives in each image
  num_positives_list.append(num_positives)
  num_negatives_list.append(num_negatives)
# Accumulate the losses of the selected samples
mined_location_losses.append(
    tf.reduce_sum(tf.gather(location_losses[ind], selected_indices)))
mined_cls_losses.append(
    tf.reduce_sum(tf.gather(cls_losses[ind], selected_indices)))

After every image has been processed, the mined losses are summed:

location_loss = tf.reduce_sum(tf.stack(mined_location_losses))
cls_loss = tf.reduce_sum(tf.stack(mined_cls_losses))

3. The loss flow in ssd_meta_arch

Implemented in the loss() function of ssd_meta_arch.py.
Loss configuration:

loss {
  classification_loss {
    weighted_sigmoid {
    }
  }
  localization_loss {
    weighted_smooth_l1 {
    }
  }
  hard_example_miner {
    num_hard_examples: 3000
    iou_threshold: 0.99
    loss_type: CLASSIFICATION
    max_negatives_per_positive: 3
    min_negatives_per_image: 0
  }
  classification_weight: 1.0
  localization_weight: 1.0
}

3.1 Call the target assigner to obtain the match information for a batch.

match_list: a list of matcher.Match recording the matched ground-truth index per anchor; unmatched entries are -1, ignored entries are -2
batch_reg_targets: the box-regression targets; unmatched and ignored entries are [0, 0, 0, 0]
batch_cls_targets: the classification targets; unmatched and ignored entries are 0


(batch_cls_targets, batch_cls_weights, batch_reg_targets,
 batch_reg_weights, match_list) = self._assign_targets(
     self.groundtruth_lists(fields.BoxListFields.boxes),
     self.groundtruth_lists(fields.BoxListFields.classes),
     keypoints, weights)

3.2 Compute localization_loss and classification_loss

# losses.WeightedSmoothL1LocalizationLoss, i.e. tf.losses.huber_loss
# __init__(), default: delta = 1
# __call__() --> _compute_loss()
# prediction_dict['box_encodings'] has size [12, 1917, 4]
# location_losses has size [12, 1917]
location_losses = self._localization_loss(
    prediction_dict['box_encodings'],
    batch_reg_targets,
    ignore_nan_targets=True,
    weights=batch_reg_weights)

# losses.WeightedSigmoidClassificationLoss, i.e. sigmoid_cross_entropy_with_logits
# Sigmoid is used rather than softmax; with sigmoid, logits of 4 and 100 yield almost the same class probability
# __init__()
# __call__() --> _compute_loss()
# prediction_dict['class_predictions_with_background'] has size [12, 1917, 6]
# cls_losses has size [12, 1917]
cls_losses = ops.reduce_sum_trailing_dimensions(
    self._classification_loss(
        prediction_dict['class_predictions_with_background'],
        batch_cls_targets,
        weights=batch_cls_weights),
    ndims=2)
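A quick numeric check of the sigmoid saturation mentioned in the comments above: once a logit is a few units above zero the probability is effectively 1, so logits of 4 and 100 score almost identically.

import math

sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
print(sigmoid(4.0))    # 0.9820...
print(sigmoid(100.0))  # 1.0 (saturated)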

3.3 Apply hard_example_miner and compute the final loss

(localization_loss, classification_loss) = self._apply_hard_mining(
    location_losses, cls_losses, prediction_dict, match_list)

# Normalize the losses
# _normalize_loss_by_num_matches = True (in config file)
# _normalize_loc_loss_by_codesize = False (default in ssd.proto)
normalizer = tf.constant(1.0, dtype=tf.float32)
if self._normalize_loss_by_num_matches:
  normalizer = tf.maximum(tf.to_float(tf.reduce_sum(batch_reg_weights)),
                          1.0)
localization_loss_normalizer = normalizer
if self._normalize_loc_loss_by_codesize:
  localization_loss_normalizer *= self._box_coder.code_size
localization_loss = tf.multiply((self._localization_loss_weight /
                                 localization_loss_normalizer),
                                localization_loss,
                                name='localization_loss')
classification_loss = tf.multiply((self._classification_loss_weight /
                                   normalizer), classification_loss,
                                  name='classification_loss')


III. Postprocess

After predict, the raw boxes still go through postprocessing to produce the final detections.

Postprocess configuration:

post_processing {
  batch_non_max_suppression {
    score_threshold: 1e-8
    iou_threshold: 0.6
    max_detections_per_class: 100
    max_total_detections: 100
  }
  score_converter: SIGMOID
}

Implementation:

# Extract the predict results
preprocessed_images = prediction_dict['preprocessed_inputs']
box_encodings = prediction_dict['box_encodings']
class_predictions = prediction_dict['class_predictions_with_background']
detection_boxes, detection_keypoints = self._batch_decode(box_encodings)
detection_boxes = tf.expand_dims(detection_boxes, axis=2)

# Score conversion: tf.sigmoid(class_predictions / logit_scale)
# logit_scale = 1 (default value in post_processing.proto)
detection_scores_with_background = self._score_conversion_fn(
    class_predictions)
detection_scores = tf.slice(detection_scores_with_background, [0, 0, 1],
                            [-1, -1, -1])

# Apply the NMS configured under post_processing to detection_scores and detection_boxes to obtain the final boxes and scores
# _non_max_suppression_fn = post_processing.batch_multiclass_non_max_suppression
(nmsed_boxes, nmsed_scores, nmsed_classes, _, nmsed_additional_fields,
 num_detections) = self._non_max_suppression_fn(
     detection_boxes,
     detection_scores,
     clip_window=self._compute_clip_window(
         preprocessed_images, true_image_shapes),
     additional_fields=additional_fields)
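A small numeric illustration of the score conversion and background slicing above, with shapes reduced for readability (toy logits; slot 0 is the background class):

import numpy as np

logits = np.array([[[2.0, -1.0, 0.5]]])   # [batch=1, anchors=1, class_slots=3]
scores = 1.0 / (1.0 + np.exp(-logits))    # SIGMOID score_converter, logit_scale = 1
detection_scores = scores[:, :, 1:]       # drop the background slot, as tf.slice does
print(detection_scores)                   # [[[0.2689 0.6225]]]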