[Deep Learning] Using the TensorFlow Object Detection API

About the TensorFlow Object Detection API

With only simple configuration, the TensorFlow Object Detection API can build common object detection networks, including SSD, Faster R-CNN, and Mask R-CNN with different backbones. You can either download weight files pre-trained on common public datasets to test how well the networks work, or fine-tune from those weights on your own dataset. It is easy to use.

Related links

Installation on Ubuntu 16.04

  1. Clone the repository locally
git clone https://github.com/tensorflow/models.git
  2. Install the dependencies
sudo apt-get install python-pil python-lxml python-tk
pip install --user Cython
pip install --user contextlib2
pip install --user matplotlib

The --user flag installs the packages for the current user only; after installation they are available only to that user.
In addition, protobuf also needs to be installed; the detailed steps are described in my post "[Environment Setup] Building and installing the caffe framework on Linux + Makefile.config explained".

  3. In the repository's research folder, run
protoc object_detection/protos/*.proto --python_out=.
  4. Add the environment variables
export PYTHONPATH="/path/to/tensorflow/models/research":$PYTHONPATH
export PYTHONPATH="/path/to/tensorflow/models/research/slim":$PYTHONPATH

/path/to/tensorflow/models/ is the path of the cloned repository.
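
These exports only apply to the current shell session. One common option (an assumption on my part, not from the original post) is to append them to ~/.bashrc so they persist:

echo 'export PYTHONPATH="/path/to/tensorflow/models/research:/path/to/tensorflow/models/research/slim:$PYTHONPATH"' >> ~/.bashrc
source ~/.bashrc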

  5. Install cocoapi
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
make
cp -r pycocotools <path_to_tensorflow>/models/research/
  6. Test whether the installation succeeded
    In the research folder of the repository, run
python object_detection/builders/model_builder_test.py


  7. Important note
    The TensorFlow version officially required is

Tensorflow (>=1.12.0)

In my tests, both 1.7.0 and 1.11.0 raise errors.
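
To confirm which TensorFlow version is active before running the test above (a trivial check, not from the original post):

python -c "import tensorflow as tf; print(tf.__version__)"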

Fine-tuning on your own dataset from pre-trained weights

  1. Download a pre-trained model from the Tensorflow detection model zoo.
  2. Sample configuration files for the different networks are provided in research/object_detection/samples/configs under the repository. The downloaded pre-trained model archive also contains a pipeline.config; fine-tuning is done by modifying that file.
  3. The pipeline.config from ssd_mobilenet_v2_coco is shown below, with explanatory # comments added:
model {
  ssd {
    num_classes: 1
    # number of classes; change this to the number of classes in your own dataset (unlike caffe, you do not add 1 here, i.e. background is not counted)
    image_resizer {
      fixed_shape_resizer {
        height: 300
        # height that training images are resized to
        width: 300
        # width that training images are resized to
      }
    }
    feature_extractor {
      type: "ssd_mobilenet_v2"
      # the feature extractor (backbone)
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 3.99999989895e-05
          }
        }
        initializer {
          truncated_normal_initializer {
            mean: 0.0
            stddev: 0.0299999993294
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.999700009823
          center: true
          scale: true
          epsilon: 0.0010000000475
          train: true
        }
      }
      # batch_norm_trainable: true
      use_depthwise: true
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 3.99999989895e-05
            }
          }
          initializer {
            truncated_normal_initializer {
              mean: 0.0
              stddev: 0.0299999993294
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.999700009823
            center: true
            scale: true
            epsilon: 0.0010000000475
            train: true
          }
        }
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.800000011921
        kernel_size: 3
        box_code_size: 4
        apply_sigmoid_to_scores: false
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        # number of feature maps on which anchors are generated
        min_scale: 0.20000000298
        max_scale: 0.949999988079
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        # aspect_ratios: 3.0
        # aspect_ratios: 0.333299994469
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 0.300000011921
        # iou_threshold: 0.600000023842
        iou_threshold: 0.5
        max_detections_per_class: 10000
        max_total_detections: 10000
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        # maximum number of hard examples; if set to 0, all examples remaining after NMS-threshold filtering are used for training. Default is 64
        iou_threshold: 0.990000009537
        # above this value an example is a positive, below it a negative; default is 0.7
        loss_type: CLASSIFICATION
        # which loss hard example mining is applied to; default is BOTH
        max_negatives_per_positive: 3
        # maximum ratio of negatives to positives
        min_negatives_per_image: 3
        # minimum number of negatives per image
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
  }
}
train_config {
  batch_size: 8
  # batch size
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    # min_object_covered = 1 [default=1.0]: by default the random crop must contain at least 1.0 of the area of one ground-truth box
    # min_aspect_ratio = 2 [default=0.75]: the crop's aspect ratio lies in [0.75, 1.33] by default
    # max_aspect_ratio = 3 [default=1.33]
    # min_area = 4 [default=0.1]: the ratio of the crop area to the original image area lies in [0.1, 1] by default
    # max_area = 5 [default=1.0]
    # overlap_thresh = 6 [default=0.3]: after cropping, a ground-truth box is dropped if its overlap with the original box is below 0.3
    # clip_boxes = 8 [default=true]: update the ground-truth box coordinates to the crop
    # random_coef = 7 [default=0.0]: the probability of keeping the original image, 0.0 by default
    }
  }
  optimizer {
    rms_prop_optimizer {
      learning_rate {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.001
          # exponential-decay learning rate; this is the initial learning rate
          decay_steps: 15031
          decay_factor: 0.1
          # staircase = 4 [default=true]: stepwise (staircase) decay is used by default
        }
      }
      momentum_optimizer_value: 0.899999976158
      decay: 0.899999976158
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "path to the pre-trained checkpoint"
  # i.e. <extracted folder of the downloaded weights>/model.ckpt
  num_steps: 60125
  # maximum number of training steps
  fine_tune_checkpoint_type: "detection"
}
train_input_reader {
  label_map_path: "path to the labelmap file"
  tf_record_input_reader {
    input_path: "训练集的.record文件"
  }
}
eval_config {
  num_examples: 2000
  max_evals: 10
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "path to the labelmap file"
  shuffle: false
  num_readers: 1
  tf_record_input_reader {
    input_path: "验证集的.record文件"
  }
}

The configuration file is very similar to a caffe prototxt file; the meaning of each keyword can be looked up in the .proto files under research/object_detection/protos/.
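
For example, to find where an option is documented (a simple lookup, not from the original post), you can grep the protos directory for the field name used in the config:

grep -rn "ssd_random_crop" object_detection/protos/*.proto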

  4. The source code for SSD anchor generation is in multiple_grid_anchor_generator.py (a small numerical check follows the code):
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Generates grid anchors on the fly corresponding to multiple CNN layers.

Generates grid anchors on the fly corresponding to multiple CNN layers as
described in:
"SSD: Single Shot MultiBox Detector"
Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed,
Cheng-Yang Fu, Alexander C. Berg
(see Section 2.2: Choosing scales and aspect ratios for default boxes)
"""

import numpy as np

import tensorflow as tf

from object_detection.anchor_generators import grid_anchor_generator
from object_detection.core import anchor_generator
from object_detection.core import box_list_ops


class MultipleGridAnchorGenerator(anchor_generator.AnchorGenerator):
  """Generate a grid of anchors for multiple CNN layers."""

  def __init__(self,
               box_specs_list,
               base_anchor_size=None,
               anchor_strides=None,
               anchor_offsets=None,
               clip_window=None):
    """Constructs a MultipleGridAnchorGenerator.

    To construct anchors, at multiple grid resolutions, one must provide a
    list of feature_map_shape_list (e.g., [(8, 8), (4, 4)]), and for each grid
    size, a corresponding list of (scale, aspect ratio) box specifications.

    For example:
    box_specs_list = [[(.1, 1.0), (.1, 2.0)],  # for 8x8 grid
                      [(.2, 1.0), (.3, 1.0), (.2, 2.0)]]  # for 4x4 grid

    To support the fully convolutional setting, we pass grid sizes in at
    generation time, while scale and aspect ratios are fixed at construction
    time.

    Args:
      box_specs_list: list of list of (scale, aspect ratio) pairs with the
        outside list having the same number of entries as feature_map_shape_list
        (which is passed in at generation time).
      base_anchor_size: base anchor size as [height, width]
                        (length-2 float numpy or Tensor, default=[1.0, 1.0]).
                        The height and width values are normalized to the
                        minimum dimension of the input height and width, so that
                        when the base anchor height equals the base anchor
                        width, the resulting anchor is square even if the input
                        image is not square.
      anchor_strides: list of pairs of strides in pixels (in y and x directions
        respectively). For example, setting anchor_strides=[(25, 25), (50, 50)]
        means that we want the anchors corresponding to the first layer to be
        strided by 25 pixels and those in the second layer to be strided by 50
        pixels in both y and x directions. If anchor_strides=None, they are set
        to be the reciprocal of the corresponding feature map shapes.
      anchor_offsets: list of pairs of offsets in pixels (in y and x directions
        respectively). The offset specifies where we want the center of the
        (0, 0)-th anchor to lie for each layer. For example, setting
        anchor_offsets=[(10, 10), (20, 20)]) means that we want the
        (0, 0)-th anchor of the first layer to lie at (10, 10) in pixel space
        and likewise that we want the (0, 0)-th anchor of the second layer to
        lie at (25, 25) in pixel space. If anchor_offsets=None, then they are
        set to be half of the corresponding anchor stride.
      clip_window: a tensor of shape [4] specifying a window to which all
        anchors should be clipped. If clip_window is None, then no clipping
        is performed.

    Raises:
      ValueError: if box_specs_list is not a list of list of pairs
      ValueError: if clip_window is not either None or a tensor of shape [4]
    """
    if isinstance(box_specs_list, list) and all(
        [isinstance(list_item, list) for list_item in box_specs_list]):
      self._box_specs = box_specs_list
    else:
      raise ValueError('box_specs_list is expected to be a '
                       'list of lists of pairs')
    if base_anchor_size is None:
      base_anchor_size = [256, 256]
    self._base_anchor_size = base_anchor_size
    self._anchor_strides = anchor_strides
    self._anchor_offsets = anchor_offsets
    if clip_window is not None and clip_window.get_shape().as_list() != [4]:
      raise ValueError('clip_window must either be None or a shape [4] tensor')
    self._clip_window = clip_window
    self._scales = []
    self._aspect_ratios = []
    for box_spec in self._box_specs:
      if not all([isinstance(entry, tuple) and len(entry) == 2
                  for entry in box_spec]):
        raise ValueError('box_specs_list is expected to be a '
                         'list of lists of pairs')
      scales, aspect_ratios = zip(*box_spec)
      self._scales.append(scales)
      self._aspect_ratios.append(aspect_ratios)

    for arg, arg_name in zip([self._anchor_strides, self._anchor_offsets],
                             ['anchor_strides', 'anchor_offsets']):
      if arg and not (isinstance(arg, list) and
                      len(arg) == len(self._box_specs)):
        raise ValueError('%s must be a list with the same length '
                         'as self._box_specs' % arg_name)
      if arg and not all([
          isinstance(list_item, tuple) and len(list_item) == 2
          for list_item in arg
      ]):
        raise ValueError('%s must be a list of pairs.' % arg_name)

  def name_scope(self):
    return 'MultipleGridAnchorGenerator'

  def num_anchors_per_location(self):
    """Returns the number of anchors per spatial location.

    Returns:
      a list of integers, one for each expected feature map to be passed to
      the Generate function.
    """
    return [len(box_specs) for box_specs in self._box_specs]

  def _generate(self, feature_map_shape_list, im_height=1, im_width=1):
    """Generates a collection of bounding boxes to be used as anchors.

    The number of anchors generated for a single grid with shape MxM where we
    place k boxes over each grid center is k*M^2 and thus the total number of
    anchors is the sum over all grids. In our box_specs_list example
    (see the constructor docstring), we would place two boxes over each grid
    point on an 8x8 grid and three boxes over each grid point on a 4x4 grid and
    thus end up with 2*8^2 + 3*4^2 = 176 anchors in total. The layout of the
    output anchors follows the order of how the grid sizes and box_specs are
    specified (with box_spec index varying the fastest, followed by width
    index, then height index, then grid index).

    Args:
      feature_map_shape_list: list of pairs of convnet layer resolutions in the
        format [(height_0, width_0), (height_1, width_1), ...]. For example,
        setting feature_map_shape_list=[(8, 8), (7, 7)] asks for anchors that
        correspond to an 8x8 layer followed by a 7x7 layer.
      im_height: the height of the image to generate the grid for. If both
        im_height and im_width are 1, the generated anchors default to
        absolute coordinates, otherwise normalized coordinates are produced.
      im_width: the width of the image to generate the grid for. If both
        im_height and im_width are 1, the generated anchors default to
        absolute coordinates, otherwise normalized coordinates are produced.

    Returns:
      boxes_list: a list of BoxLists each holding anchor boxes corresponding to
        the input feature map shapes.

    Raises:
      ValueError: if feature_map_shape_list, box_specs_list do not have the same
        length.
      ValueError: if feature_map_shape_list does not consist of pairs of
        integers
    """
    if not (isinstance(feature_map_shape_list, list)
            and len(feature_map_shape_list) == len(self._box_specs)):
      raise ValueError('feature_map_shape_list must be a list with the same '
                       'length as self._box_specs')
    if not all([isinstance(list_item, tuple) and len(list_item) == 2
                for list_item in feature_map_shape_list]):
      raise ValueError('feature_map_shape_list must be a list of pairs.')

    im_height = tf.cast(im_height, dtype=tf.float32)
    im_width = tf.cast(im_width, dtype=tf.float32)

    if not self._anchor_strides:
      anchor_strides = [(1.0 / tf.cast(pair[0], dtype=tf.float32),
                         1.0 / tf.cast(pair[1], dtype=tf.float32))
                        for pair in feature_map_shape_list]
    else:
      anchor_strides = [(tf.cast(stride[0], dtype=tf.float32) / im_height,
                         tf.cast(stride[1], dtype=tf.float32) / im_width)
                        for stride in self._anchor_strides]
    if not self._anchor_offsets:
      anchor_offsets = [(0.5 * stride[0], 0.5 * stride[1])
                        for stride in anchor_strides]
    else:
      anchor_offsets = [(tf.cast(offset[0], dtype=tf.float32) / im_height,
                         tf.cast(offset[1], dtype=tf.float32) / im_width)
                        for offset in self._anchor_offsets]

    for arg, arg_name in zip([anchor_strides, anchor_offsets],
                             ['anchor_strides', 'anchor_offsets']):
      if not (isinstance(arg, list) and len(arg) == len(self._box_specs)):
        raise ValueError('%s must be a list with the same length '
                         'as self._box_specs' % arg_name)
      if not all([isinstance(list_item, tuple) and len(list_item) == 2
                  for list_item in arg]):
        raise ValueError('%s must be a list of pairs.' % arg_name)

    anchor_grid_list = []
    min_im_shape = tf.minimum(im_height, im_width)
    scale_height = min_im_shape / im_height
    scale_width = min_im_shape / im_width
    if not tf.contrib.framework.is_tensor(self._base_anchor_size):
      base_anchor_size = [
          scale_height * tf.constant(self._base_anchor_size[0],
                                     dtype=tf.float32),
          scale_width * tf.constant(self._base_anchor_size[1],
                                    dtype=tf.float32)
      ]
    else:
      base_anchor_size = [
          scale_height * self._base_anchor_size[0],
          scale_width * self._base_anchor_size[1]
      ]
    for feature_map_index, (grid_size, scales, aspect_ratios, stride,
                            offset) in enumerate(
                                zip(feature_map_shape_list, self._scales,
                                    self._aspect_ratios, anchor_strides,
                                    anchor_offsets)):
      tiled_anchors = grid_anchor_generator.tile_anchors(
          grid_height=grid_size[0],
          grid_width=grid_size[1],
          scales=scales,
          aspect_ratios=aspect_ratios,
          base_anchor_size=base_anchor_size,
          anchor_stride=stride,
          anchor_offset=offset)
      if self._clip_window is not None:
        tiled_anchors = box_list_ops.clip_to_window(
            tiled_anchors, self._clip_window, filter_nonoverlapping=False)
      num_anchors_in_layer = tiled_anchors.num_boxes_static()
      if num_anchors_in_layer is None:
        num_anchors_in_layer = tiled_anchors.num_boxes()
      anchor_indices = feature_map_index * tf.ones([num_anchors_in_layer])
      tiled_anchors.add_field('feature_map_index', anchor_indices)
      anchor_grid_list.append(tiled_anchors)

    return anchor_grid_list


def create_ssd_anchors(num_layers=6,
                       min_scale=0.2,
                       max_scale=0.95,
                       scales=None,
                       aspect_ratios=(1.0, 2.0, 3.0, 1.0 / 2, 1.0 / 3),
                       interpolated_scale_aspect_ratio=1.0,
                       base_anchor_size=None,
                       anchor_strides=None,
                       anchor_offsets=None,
                       reduce_boxes_in_lowest_layer=True):
  """Creates MultipleGridAnchorGenerator for SSD anchors.

  This function instantiates a MultipleGridAnchorGenerator that reproduces
  ``default box`` construction proposed by Liu et al in the SSD paper.
  See Section 2.2 for details. Grid sizes are assumed to be passed in
  at generation time from finest resolution to coarsest resolution --- this is
  used to (linearly) interpolate scales of anchor boxes corresponding to the
  intermediate grid sizes.

  Anchors that are returned by calling the `generate` method on the returned
  MultipleGridAnchorGenerator object are always in normalized coordinates
  and clipped to the unit square: (i.e. all coordinates lie in [0, 1]x[0, 1]).

  Args:
    num_layers: integer number of grid layers to create anchors for (actual
      grid sizes passed in at generation time)
    min_scale: scale of anchors corresponding to finest resolution (float)
    max_scale: scale of anchors corresponding to coarsest resolution (float)
    scales: As list of anchor scales to use. When not None and not empty,
      min_scale and max_scale are not used.
    aspect_ratios: list or tuple of (float) aspect ratios to place on each
      grid point.
    interpolated_scale_aspect_ratio: An additional anchor is added with this
      aspect ratio and a scale interpolated between the scale for a layer
      and the scale for the next layer (1.0 for the last layer).
      This anchor is not included if this value is 0.
    base_anchor_size: base anchor size as [height, width].
      The height and width values are normalized to the minimum dimension of the
      input height and width, so that when the base anchor height equals the
      base anchor width, the resulting anchor is square even if the input image
      is not square.
    anchor_strides: list of pairs of strides in pixels (in y and x directions
      respectively). For example, setting anchor_strides=[(25, 25), (50, 50)]
      means that we want the anchors corresponding to the first layer to be
      strided by 25 pixels and those in the second layer to be strided by 50
      pixels in both y and x directions. If anchor_strides=None, they are set to
      be the reciprocal of the corresponding feature map shapes.
    anchor_offsets: list of pairs of offsets in pixels (in y and x directions
      respectively). The offset specifies where we want the center of the
      (0, 0)-th anchor to lie for each layer. For example, setting
      anchor_offsets=[(10, 10), (20, 20)]) means that we want the
      (0, 0)-th anchor of the first layer to lie at (10, 10) in pixel space
      and likewise that we want the (0, 0)-th anchor of the second layer to lie
      at (25, 25) in pixel space. If anchor_offsets=None, then they are set to
      be half of the corresponding anchor stride.
    reduce_boxes_in_lowest_layer: a boolean to indicate whether the fixed 3
      boxes per location is used in the lowest layer.

  Returns:
    a MultipleGridAnchorGenerator
  """
  if base_anchor_size is None:
    base_anchor_size = [1.0, 1.0]
    # as the proto file specifies, base_anchor_height and base_anchor_width both default to 1.0
  box_specs_list = []
  if scales is None or not scales:
  # our config file does not set scales either, so this branch is taken
    scales = [min_scale + (max_scale - min_scale) * i / (num_layers - 1)
              for i in range(num_layers)] + [1.0]
    # scales = [0.20000000298, 0.3499999999998, 0.49999999701959996, 0.6499999940394, 0.7999999910591999, 0.949999988079, 1.0]
    # i.e. the first value is min_scale + 0, the second is min_scale + (max_scale - min_scale) * 1 / (num_layers - 1),
    # the third is min_scale + (max_scale - min_scale) * 2 / (num_layers - 1),
    # ...
    # the second-to-last is min_scale + (max_scale - min_scale) * (num_layers - 1) / (num_layers - 1) = max_scale,
    # and the last value is 1.0
  else:
    # Add 1.0 to the end, which will only be used in scale_next below and used
    # for computing an interpolated scale for the largest scale in the list.
    scales += [1.0]

  for layer, scale, scale_next in zip(
      range(num_layers), scales[:-1], scales[1:]):
      # layer iterates over [0, 1, 2, ..., num_layers - 1]
      # scale iterates over [0.20000000298, 0.3499999999998, 0.49999999701959996, 0.6499999940394, 0.7999999910591999, 0.949999988079]
      # scale_next iterates over [0.3499999999998, 0.49999999701959996, 0.6499999940394, 0.7999999910591999, 0.949999988079, 1.0]
      # each contains num_layers values
    layer_box_specs = []
    if layer == 0 and reduce_boxes_in_lowest_layer:
    # the proto specifies that reduce_boxes_in_lowest_layer defaults to True
      layer_box_specs = [(0.1, 1.0), (scale, 2.0), (scale, 0.5)]
      # layer_box_specs = [(0.1, 1.0), (0.20000000298, 2.0), (0.20000000298, 0.5)]
    else:
      for aspect_ratio in aspect_ratios:
        layer_box_specs.append((scale, aspect_ratio))
      # Add one more anchor, with a scale between the current scale, and the
      # scale for the next layer, with a specified aspect ratio (1.0 by
      # default).
      if interpolated_scale_aspect_ratio > 0.0:
      # the proto specifies that interpolated_scale_aspect_ratio defaults to 1.0
        layer_box_specs.append((np.sqrt(scale*scale_next),
                                interpolated_scale_aspect_ratio))
    box_specs_list.append(layer_box_specs)
    # box_specs_list ends up with num_layers elements, one per feature map layer, describing that layer's anchors:
    # 1. [(0.1, 1.0), (0.20000000298, 2.0), (0.20000000298, 0.5)]
    # 2. [(0.3499999999998, ratio1), (0.3499999999998, ratio2), ..., (sqrt(0.3499999999998 * 0.49999999701959996), 1.0)]
    # 3. [(0.49999999701959996, ratio1), ...]
    # 4. [...]
    # 5. [...]
    # 6. [(0.949999988079, ratio1), ..., (sqrt(0.949999988079 * 1.0), 1.0)]

    # in each tuple, the first value is the anchor scale and the second is the anchor aspect ratio;
    # the number of tuples in a layer's list is the number of anchors per location on that feature map

  return MultipleGridAnchorGenerator(box_specs_list, base_anchor_size,
                                     anchor_strides, anchor_offsets)
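
As a quick sanity check (a small sketch of my own, not part of the API source), the scale list and per-location anchor counts described in the comments above can be reproduced for the values used in this config (min_scale=0.2, max_scale=0.95, num_layers=6, aspect ratios 1.0/2.0/0.5):

min_scale, max_scale, num_layers = 0.2, 0.95, 6
scales = [min_scale + (max_scale - min_scale) * i / (num_layers - 1)
          for i in range(num_layers)] + [1.0]
print(scales)  # [0.2, 0.35, 0.5, 0.65, 0.8, 0.95, 1.0] up to float rounding

aspect_ratios = (1.0, 2.0, 0.5)  # from this pipeline.config
anchors_per_location = [3] + [len(aspect_ratios) + 1] * (num_layers - 1)
print(anchors_per_location)  # [3, 4, 4, 4, 4, 4]
# layer 0 is reduced to 3 boxes (reduce_boxes_in_lowest_layer=True); every other
# layer gets one anchor per aspect ratio plus the interpolated-scale anchor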
  5. Copy any .pbtxt file from the research/object_detection/data/ folder in the repository and modify it for your own dataset (a multi-class sketch follows the example):
item {
  id: 1
  name: 'person'
}
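
For multiple classes, add one item block per class with consecutive ids starting from 1 (a sketch of my own; 'car' is just a hypothetical second class):

item {
  id: 1
  name: 'person'
}
item {
  id: 2
  name: 'car'
}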
  6. Dataset preparation:
    Use two separate folders, one for the images and one for the xml annotation files.
    First convert the VOC-format xml files to a csv file.
    The script below takes two arguments: the first is the directory containing the xml files; the second is the output csv file, with a .csv suffix. A usage example follows the script.
# -*- coding: utf-8 -*-
 
import os, sys
import glob
import pandas as pd
import xml.etree.ElementTree as ET
 
def xml_to_csv(_path, _out_file):
    xml_list = []
    for each in os.listdir(_path):
        xml_file = os.path.join(_path, each)
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (xml_file.split(".")[0].split("/")[-1].strip() + ".jpg",
                     int(root.find('size').find("width").text),
                     int(root.find('size').find("height").text),
                     member.find("name").text,
                     int(member.find("bndbox").find("xmin").text),
                     int(member.find("bndbox").find("ymin").text),
                     int(member.find("bndbox").find("xmax").text),
                     int(member.find("bndbox").find("ymax").text))
            xml_list.append(value)
    
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    xml_df.to_csv(_out_file, index=None)
    print('Successfully converted xml to csv.')
 
if __name__ == '__main__':
    xml_to_csv(sys.argv[1], sys.argv[2])
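
Assuming the script above is saved as xml_to_csv.py (the file name and paths here are placeholders of mine), it can be run as:

python3 xml_to_csv.py /path/to/annotations train_labels.csv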

Then generate the tf-record data.
The script below takes three arguments: --csv_input, the csv file; --output_path, the output tf-record file, with a .record suffix; and --img_path, the directory containing the images. A usage example follows the script.

# -*- coding: utf-8 -*-
 
import os
import io
import pandas as pd
import tensorflow as tf
 
from PIL import Image
from object_detection.utils import dataset_util
 
flags = tf.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
flags.DEFINE_string('img_path', '', 'Path to image')
FLAGS = flags.FLAGS
 

def class_text_to_int(row_label):
    # Map the class name in the csv to its label id (must match the labelmap .pbtxt).
    if row_label == 'person':
        return 1
    else:
        return None
 
 
def create_tf_example(row):
    full_path = os.path.join(FLAGS.img_path, '{}'.format(str(row['filename'])))
    with tf.gfile.GFile(full_path, 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size
 
    filename = row['filename'].encode('utf8')
    image_format = b'jpg'
    xmins = [row['xmin'] / width]
    xmaxs = [row['xmax'] / width]
    ymins = [row['ymin'] / height]
    ymaxs = [row['ymax'] / height]
    classes_text = [row['class'].encode('utf8')]
    classes = [class_text_to_int(row['class'])]
 
    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))

    return tf_example
 

def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    examples = pd.read_csv(FLAGS.csv_input)

    for index, row in examples.iterrows():
        tf_example = create_tf_example(row)
        writer.write(tf_example.SerializeToString())

    writer.close()
 

if __name__ == '__main__':
    tf.app.run()
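
Assuming the script above is saved as generate_tfrecord.py (the file name and paths here are placeholders of mine), it can be run as:

python3 generate_tfrecord.py --csv_input=train_labels.csv --output_path=train.record --img_path=/path/to/images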
  7. Start fine-tuning:
    Under the research/object_detection/legacy/ path, run the following; the quoted strings are the flags' help texts, so replace them with your own values (see the sketch after the command):
python3 train.py --train_dir='Directory to save the checkpoints and training summaries.' --pipeline_config_path='Path to a pipeline_pb2.TrainEvalPipelineConfig config file. If provided, other configs are ignored' --logtostderr=True
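
A concrete invocation might look like this (the training/ directory and config path are placeholders of mine, not from the original post):

python3 train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/pipeline.config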
  8. You may find that the loss log is printed twice. The fix is as follows:
    In research/object_detection/utils/variables_helper.py, comment out the following lines:
else:
    logging.warning('Variable [%s] not available in checkpoint',
                    variable_name)
  9. Use tensorboard to visualize the learning rate, loss, and other information:
tensorboard --logdir=<directory containing the events.out.tfevents files>

Then, in a local browser, open:

  • <server IP address>:<the port reserved for tensorboard when the container was created> (inside docker)
  • <server IP address>:6006 (outside docker)

to reach the tensorboard visualization page.

  10. tensorboard sometimes fails with the error:

locale.Error: unsupported locale setting

This error sometimes also appears when using pip; it is a locale configuration problem and can be fixed by running the following command in the terminal:

export LC_ALL=C
  11. Compute accuracy (mAP) on the validation set
    Under the research/object_detection/legacy/ path, run (a concrete example follows the command):
python3 eval.py --logtostderr=True --checkpoint_dir="Directory containing checkpoints to evaluate, typically set to `train_dir` used in the training job." --eval_dir='Directory to write eval summaries to.' --pipeline_config_path='Path to a pipeline_pb2.TrainEvalPipelineConfig config file. If provided, other configs are ignored'
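
A concrete invocation might look like this (the training/ and eval/ directories are placeholders of mine):

python3 eval.py --logtostderr --checkpoint_dir=training/ --eval_dir=eval/ --pipeline_config_path=training/pipeline.config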
  12. Convert ckpt files to a pb file
    Under the research/object_detection/ path, run (a loading sketch follows the command):
python3 export_inference_graph.py --input_type=image_tensor --pipeline_config_path='Path to a pipeline_pb2.TrainEvalPipelineConfig config file.' --trained_checkpoint_prefix="Path to trained checkpoint, typically of the form 'path/to/model.ckpt-250000'" --output_directory='Path to write outputs.'
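
After the export, output_directory contains a frozen_inference_graph.pb that can be loaded for inference roughly as follows (a sketch assuming TF 1.x; the graph and image paths are placeholders of mine):

import numpy as np
import tensorflow as tf
from PIL import Image

# Load the exported frozen graph.
detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('exported/frozen_inference_graph.pb', 'rb') as fid:
        graph_def.ParseFromString(fid.read())
    tf.import_graph_def(graph_def, name='')

with detection_graph.as_default(), tf.Session() as sess:
    image = np.array(Image.open('test.jpg'))        # HxWx3, uint8
    image_expanded = np.expand_dims(image, axis=0)  # add a batch dimension

    # Standard tensor names produced by export_inference_graph.py with input_type=image_tensor.
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    scores = detection_graph.get_tensor_by_name('detection_scores:0')
    classes = detection_graph.get_tensor_by_name('detection_classes:0')

    boxes, scores, classes = sess.run(
        [boxes, scores, classes], feed_dict={image_tensor: image_expanded})
    # boxes are normalized [ymin, xmin, ymax, xmax]; scores are sorted in descending order.
    print(boxes[0][:5], scores[0][:5], classes[0][:5])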
  13. Modifying the feature maps of the feature extractor's head branches
    Taking the ssd_mobilenet_v3 feature extractor as an example, the changes are made in models/research/object_detection/models/ssd_mobilenet_v3_feature_extractor.py.

Around line 127, feature_map_layout sets:

  • the number of head branches: the number of entries in the 'from_layer' list;
  • whether a feature map comes from the backbone or is an extra added layer: self._from_layer[0] and self._from_layer[1] are taken from the backbone, while '' is a placeholder for an extra layer to be added;
  • the channel depth of each feature map: 'layer_depth'.

127     feature_map_layout = {
128         'from_layer': [
129             self._from_layer[0], self._from_layer[1], '', '', '', ''
130         ],
131         'layer_depth': [-1, -1, 512, 256, 256, 128],
132         'use_depthwise': self._use_depthwise,
133         'use_explicit_padding': self._use_explicit_padding,
134     }
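
For example (a sketch of my own, not from the original post), to use five head branches instead of six, both lists could be shortened together, keeping 'from_layer' and 'layer_depth' the same length; num_layers in the anchor_generator section of pipeline.config would then also have to match:

    feature_map_layout = {
        'from_layer': [
            self._from_layer[0], self._from_layer[1], '', '', ''
        ],
        'layer_depth': [-1, -1, 512, 256, 256],
        'use_depthwise': self._use_depthwise,
        'use_explicit_padding': self._use_explicit_padding,
    }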

Closing remarks

If you have suggestions or questions, feel free to leave a comment or contact me by email.
Typing all of this by hand takes real effort; if this article helped you, please cite the source when reposting.



Reposted from blog.csdn.net/Zhang_Chen_/article/details/102933673