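[Deep Learning] Using the TensorFlow Object Detection API
About the TensorFlow Object Detection API
The TensorFlow Object Detection API lets you build common object detection networks through simple configuration, including SSD, Faster R-CNN, and Mask R-CNN with various backbones. You can either download weights pretrained on common public datasets to test how well a network performs, or fine-tune on your own dataset starting from those weights. It is simple to use.
Related links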
- TensorFlow Object Detection API home page: https://github.com/tensorflow/models/tree/master/research/object_detection
- Supported networks and downloadable weights: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
- Repository to clone and download page: https://github.com/tensorflow/models
- Official installation guide: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md
Installation on Ubuntu 16.04
- Clone the project locally
git clone https://github.com/tensorflow/models.git
- Install some dependencies
sudo apt-get install python-pil python-lxml python-tk
pip install --user Cython
pip install --user contextlib2
pip install --user matplotlib
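The --user flag installs the package for the current user only, so only that user can use it afterwards.
You also need to install protobuf; the detailed installation steps are described in "[Environment Setup] Compiling and Installing the Caffe Framework on Linux + Makefile.config Explained".
- Go to the project's research folder and run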
protoc object_detection/protos/*.proto --python_out=.
- Add environment variables
export PYTHONPATH="/path/to/tensorflow/models/research":$PYTHONPATH
export PYTHONPATH="/path/to/tensorflow/models/research/slim":$PYTHONPATH
/path/to/tensorflow/models/ is the path of the cloned project
- Install cocoapi
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
make
cp -r pycocotools <path_to_tensorflow>/models/research/
- Test whether the installation succeeded
Run the following in the research folder of the project path
python object_detection/builders/model_builder_test.py
- Important note
The TensorFlow version officially required is
Tensorflow (>=1.12.0)
In my own tests, both 1.7.0 and 1.11.0 raised errors.
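Because older versions failed for me, it can be worth checking the installed version before running the test script. The snippet below is only a minimal sketch: the helper name `meets_minimum` is made up for this article, and it compares just the numeric major.minor.patch part of a version string.

```python
import re


def meets_minimum(version, minimum=(1, 12, 0)):
    """Return True if a version string such as '1.12.0' is >= minimum."""
    # Keep the first three numeric components; suffixes like 'rc1' are ignored.
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts) >= tuple(minimum)


for candidate in ("1.7.0", "1.11.0", "1.12.0"):
    print(candidate, meets_minimum(candidate))
```

In practice you would pass `tf.__version__` instead of the hard-coded strings.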
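Fine-tuning on your own dataset from pretrained weights
- Download a pretrained model from the TensorFlow detection model zoo.
- The research/object_detection/samples/configs folder under the project path contains sample config files for different networks.
The downloaded pretrained model archive contains a pipeline.config; fine-tuning is done by modifying this file.
- The pipeline.config in ssd_mobilenet_v2_coco looks like this: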
model {
ssd {
num_classes: 1
# Number of classes; change it to the number of classes in your own dataset (unlike Caffe, do not add 1 here, i.e. background is not counted)
image_resizer {
fixed_shape_resizer {
height: 300
# height that training images are resized to
width: 300
# width that training images are resized to
}
}
feature_extractor {
type: "ssd_mobilenet_v2"
# feature extractor (network model) type
depth_multiplier: 1.0
min_depth: 16
conv_hyperparams {
regularizer {
l2_regularizer {
weight: 3.99999989895e-05
}
}
initializer {
truncated_normal_initializer {
mean: 0.0
stddev: 0.0299999993294
}
}
activation: RELU_6
batch_norm {
decay: 0.999700009823
center: true
scale: true
epsilon: 0.0010000000475
train: true
}
}
# batch_norm_trainable: true
use_depthwise: true
}
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
}
}
similarity_calculator {
iou_similarity {
}
}
box_predictor {
convolutional_box_predictor {
conv_hyperparams {
regularizer {
l2_regularizer {
weight: 3.99999989895e-05
}
}
initializer {
truncated_normal_initializer {
mean: 0.0
stddev: 0.0299999993294
}
}
activation: RELU_6
batch_norm {
decay: 0.999700009823
center: true
scale: true
epsilon: 0.0010000000475
train: true
}
}
min_depth: 0
max_depth: 0
num_layers_before_predictor: 0
use_dropout: false
dropout_keep_probability: 0.800000011921
kernel_size: 3
box_code_size: 4
apply_sigmoid_to_scores: false
}
}
anchor_generator {
ssd_anchor_generator {
num_layers: 6
# number of feature maps on which anchors are created
min_scale: 0.20000000298
max_scale: 0.949999988079
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
# aspect_ratios: 3.0
# aspect_ratios: 0.333299994469
}
}
post_processing {
batch_non_max_suppression {
score_threshold: 0.300000011921
# iou_threshold: 0.600000023842
iou_threshold: 0.5
max_detections_per_class: 10000
max_total_detections: 10000
}
score_converter: SIGMOID
}
normalize_loss_by_num_matches: true
loss {
localization_loss {
weighted_smooth_l1 {
}
}
classification_loss {
weighted_sigmoid {
}
}
hard_example_miner {
num_hard_examples: 3000
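# Maximum number of hard examples; if set to 0, all examples left after NMS filtering are used for training. Default is 64
iou_threshold: 0.990000009537
# IoU threshold of the NMS applied during hard example mining: an example overlapping an already selected one by at least this much is discarded. Default is 0.7
loss_type: CLASSIFICATION
# Which loss the hard example mining is based on; default is BOTH
max_negatives_per_positive: 3
# Maximum ratio of negatives to positives
min_negatives_per_image: 3
# Minimum number of negatives per image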
}
classification_weight: 1.0
localization_weight: 1.0
}
}
}
train_config {
batch_size: 8
# batch size
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
ssd_random_crop {
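# min_object_covered = 1 [default=1.0]: by default the random crop must cover at least 1.0 of the area of one gt box
# min_aspect_ratio = 2 [default=0.75]: by default the aspect ratio of the random crop lies in [0.75, 1.33]
# max_aspect_ratio = 3 [default=1.33]
# min_area = 4 [default=0.1]: by default the ratio of the crop area to the original image area lies in [0.1, 1]
# max_area = 5 [default=1.0]
# overlap_thresh = 6 [default=0.3]: by default a gt box is dropped if less than 0.3 of the original gt box remains inside the crop
# clip_boxes = 8 [default=true]: update the gt box coordinates to fit the crop
# random_coef = 7 [default=0.0]: probability of keeping the original image, 0.0 by default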
}
}
optimizer {
rms_prop_optimizer {
learning_rate {
exponential_decay_learning_rate {
initial_learning_rate: 0.001
# exponentially decaying learning rate; this is the initial learning rate
decay_steps: 15031
decay_factor: 0.1
# staircase = 4 [default = true]: staircase decay is used by default
}
}
momentum_optimizer_value: 0.899999976158
decay: 0.899999976158
epsilon: 1.0
}
}
fine_tune_checkpoint: "path to the pretrained weights"
# i.e. <extracted download folder>/model.ckpt
num_steps: 60125
# maximum number of training iterations
fine_tune_checkpoint_type: "detection"
}
train_input_reader {
label_map_path: "path to the labelmap file"
tf_record_input_reader {
input_path: "path to the training set .record file"
}
}
eval_config {
num_examples: 2000
max_evals: 10
use_moving_averages: false
}
eval_input_reader {
label_map_path: "path to the labelmap file"
shuffle: false
num_readers: 1
tf_record_input_reader {
input_path: "path to the validation set .record file"
}
}
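The config file closely resembles a Caffe prototxt file; the meaning of each keyword can be looked up in the proto files under research/object_detection/protos/.
- The source code that generates the SSD anchors is in multiple_grid_anchor_generator.py: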
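To get a feel for the learning rate settings in the config above, the schedule can be reproduced in a few lines. This is a rough sketch rather than the API's own code: `staircase_lr` is a hypothetical helper, and it assumes staircase decay (the default noted in the config comment), where the rate is multiplied by decay_factor once every decay_steps steps.

```python
def staircase_lr(step, initial_lr=0.001, decay_steps=15031, decay_factor=0.1):
    """Staircase exponential decay: initial_lr * decay_factor ** (step // decay_steps)."""
    return initial_lr * decay_factor ** (step // decay_steps)


# Values taken from the pipeline.config shown above (num_steps: 60125).
for step in (0, 15030, 15031, 30062, 45093, 60124):
    print(step, staircase_lr(step))
```

With these values the rate drops by a factor of 10 every 15031 steps, so if you change num_steps for your own dataset you will probably want to scale decay_steps along with it.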
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Generates grid anchors on the fly corresponding to multiple CNN layers.
Generates grid anchors on the fly corresponding to multiple CNN layers as
described in:
"SSD: Single Shot MultiBox Detector"
Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed,
Cheng-Yang Fu, Alexander C. Berg
(see Section 2.2: Choosing scales and aspect ratios for default boxes)
"""
import numpy as np
import tensorflow as tf
from object_detection.anchor_generators import grid_anchor_generator
from object_detection.core import anchor_generator
from object_detection.core import box_list_ops
class MultipleGridAnchorGenerator(anchor_generator.AnchorGenerator):
"""Generate a grid of anchors for multiple CNN layers."""
def __init__(self,
box_specs_list,
base_anchor_size=None,
anchor_strides=None,
anchor_offsets=None,
clip_window=None):
"""Constructs a MultipleGridAnchorGenerator.
To construct anchors, at multiple grid resolutions, one must provide a
list of feature_map_shape_list (e.g., [(8, 8), (4, 4)]), and for each grid
size, a corresponding list of (scale, aspect ratio) box specifications.
For example:
box_specs_list = [[(.1, 1.0), (.1, 2.0)], # for 8x8 grid
[(.2, 1.0), (.3, 1.0), (.2, 2.0)]] # for 4x4 grid
To support the fully convolutional setting, we pass grid sizes in at
generation time, while scale and aspect ratios are fixed at construction
time.
Args:
box_specs_list: list of list of (scale, aspect ratio) pairs with the
outside list having the same number of entries as feature_map_shape_list
(which is passed in at generation time).
base_anchor_size: base anchor size as [height, width]
(length-2 float numpy or Tensor, default=[1.0, 1.0]).
The height and width values are normalized to the
minimum dimension of the input height and width, so that
when the base anchor height equals the base anchor
width, the resulting anchor is square even if the input
image is not square.
anchor_strides: list of pairs of strides in pixels (in y and x directions
respectively). For example, setting anchor_strides=[(25, 25), (50, 50)]
means that we want the anchors corresponding to the first layer to be
strided by 25 pixels and those in the second layer to be strided by 50
pixels in both y and x directions. If anchor_strides=None, they are set
to be the reciprocal of the corresponding feature map shapes.
anchor_offsets: list of pairs of offsets in pixels (in y and x directions
respectively). The offset specifies where we want the center of the
(0, 0)-th anchor to lie for each layer. For example, setting
anchor_offsets=[(10, 10), (20, 20)]) means that we want the
(0, 0)-th anchor of the first layer to lie at (10, 10) in pixel space
and likewise that we want the (0, 0)-th anchor of the second layer to
lie at (25, 25) in pixel space. If anchor_offsets=None, then they are
set to be half of the corresponding anchor stride.
clip_window: a tensor of shape [4] specifying a window to which all
anchors should be clipped. If clip_window is None, then no clipping
is performed.
Raises:
ValueError: if box_specs_list is not a list of list of pairs
ValueError: if clip_window is not either None or a tensor of shape [4]
"""
if isinstance(box_specs_list, list) and all(
[isinstance(list_item, list) for list_item in box_specs_list]):
self._box_specs = box_specs_list
else:
raise ValueError('box_specs_list is expected to be a '
'list of lists of pairs')
if base_anchor_size is None:
base_anchor_size = [256, 256]
self._base_anchor_size = base_anchor_size
self._anchor_strides = anchor_strides
self._anchor_offsets = anchor_offsets
if clip_window is not None and clip_window.get_shape().as_list() != [4]:
raise ValueError('clip_window must either be None or a shape [4] tensor')
self._clip_window = clip_window
self._scales = []
self._aspect_ratios = []
for box_spec in self._box_specs:
if not all([isinstance(entry, tuple) and len(entry) == 2
for entry in box_spec]):
raise ValueError('box_specs_list is expected to be a '
'list of lists of pairs')
scales, aspect_ratios = zip(*box_spec)
self._scales.append(scales)
self._aspect_ratios.append(aspect_ratios)
for arg, arg_name in zip([self._anchor_strides, self._anchor_offsets],
['anchor_strides', 'anchor_offsets']):
if arg and not (isinstance(arg, list) and
len(arg) == len(self._box_specs)):
raise ValueError('%s must be a list with the same length '
'as self._box_specs' % arg_name)
if arg and not all([
isinstance(list_item, tuple) and len(list_item) == 2
for list_item in arg
]):
raise ValueError('%s must be a list of pairs.' % arg_name)
def name_scope(self):
return 'MultipleGridAnchorGenerator'
def num_anchors_per_location(self):
"""Returns the number of anchors per spatial location.
Returns:
a list of integers, one for each expected feature map to be passed to
the Generate function.
"""
return [len(box_specs) for box_specs in self._box_specs]
def _generate(self, feature_map_shape_list, im_height=1, im_width=1):
"""Generates a collection of bounding boxes to be used as anchors.
The number of anchors generated for a single grid with shape MxM where we
place k boxes over each grid center is k*M^2 and thus the total number of
anchors is the sum over all grids. In our box_specs_list example
(see the constructor docstring), we would place two boxes over each grid
point on an 8x8 grid and three boxes over each grid point on a 4x4 grid and
thus end up with 2*8^2 + 3*4^2 = 176 anchors in total. The layout of the
output anchors follows the order of how the grid sizes and box_specs are
specified (with box_spec index varying the fastest, followed by width
index, then height index, then grid index).
Args:
feature_map_shape_list: list of pairs of convnet layer resolutions in the
format [(height_0, width_0), (height_1, width_1), ...]. For example,
setting feature_map_shape_list=[(8, 8), (7, 7)] asks for anchors that
correspond to an 8x8 layer followed by a 7x7 layer.
im_height: the height of the image to generate the grid for. If both
im_height and im_width are 1, the generated anchors default to
absolute coordinates, otherwise normalized coordinates are produced.
im_width: the width of the image to generate the grid for. If both
im_height and im_width are 1, the generated anchors default to
absolute coordinates, otherwise normalized coordinates are produced.
Returns:
boxes_list: a list of BoxLists each holding anchor boxes corresponding to
the input feature map shapes.
Raises:
ValueError: if feature_map_shape_list, box_specs_list do not have the same
length.
ValueError: if feature_map_shape_list does not consist of pairs of
integers
"""
if not (isinstance(feature_map_shape_list, list)
and len(feature_map_shape_list) == len(self._box_specs)):
raise ValueError('feature_map_shape_list must be a list with the same '
'length as self._box_specs')
if not all([isinstance(list_item, tuple) and len(list_item) == 2
for list_item in feature_map_shape_list]):
raise ValueError('feature_map_shape_list must be a list of pairs.')
im_height = tf.cast(im_height, dtype=tf.float32)
im_width = tf.cast(im_width, dtype=tf.float32)
if not self._anchor_strides:
anchor_strides = [(1.0 / tf.cast(pair[0], dtype=tf.float32),
1.0 / tf.cast(pair[1], dtype=tf.float32))
for pair in feature_map_shape_list]
else:
anchor_strides = [(tf.cast(stride[0], dtype=tf.float32) / im_height,
tf.cast(stride[1], dtype=tf.float32) / im_width)
for stride in self._anchor_strides]
if not self._anchor_offsets:
anchor_offsets = [(0.5 * stride[0], 0.5 * stride[1])
for stride in anchor_strides]
else:
anchor_offsets = [(tf.cast(offset[0], dtype=tf.float32) / im_height,
tf.cast(offset[1], dtype=tf.float32) / im_width)
for offset in self._anchor_offsets]
for arg, arg_name in zip([anchor_strides, anchor_offsets],
['anchor_strides', 'anchor_offsets']):
if not (isinstance(arg, list) and len(arg) == len(self._box_specs)):
raise ValueError('%s must be a list with the same length '
'as self._box_specs' % arg_name)
if not all([isinstance(list_item, tuple) and len(list_item) == 2
for list_item in arg]):
raise ValueError('%s must be a list of pairs.' % arg_name)
anchor_grid_list = []
min_im_shape = tf.minimum(im_height, im_width)
scale_height = min_im_shape / im_height
scale_width = min_im_shape / im_width
if not tf.contrib.framework.is_tensor(self._base_anchor_size):
base_anchor_size = [
scale_height * tf.constant(self._base_anchor_size[0],
dtype=tf.float32),
scale_width * tf.constant(self._base_anchor_size[1],
dtype=tf.float32)
]
else:
base_anchor_size = [
scale_height * self._base_anchor_size[0],
scale_width * self._base_anchor_size[1]
]
for feature_map_index, (grid_size, scales, aspect_ratios, stride,
offset) in enumerate(
zip(feature_map_shape_list, self._scales,
self._aspect_ratios, anchor_strides,
anchor_offsets)):
tiled_anchors = grid_anchor_generator.tile_anchors(
grid_height=grid_size[0],
grid_width=grid_size[1],
scales=scales,
aspect_ratios=aspect_ratios,
base_anchor_size=base_anchor_size,
anchor_stride=stride,
anchor_offset=offset)
if self._clip_window is not None:
tiled_anchors = box_list_ops.clip_to_window(
tiled_anchors, self._clip_window, filter_nonoverlapping=False)
num_anchors_in_layer = tiled_anchors.num_boxes_static()
if num_anchors_in_layer is None:
num_anchors_in_layer = tiled_anchors.num_boxes()
anchor_indices = feature_map_index * tf.ones([num_anchors_in_layer])
tiled_anchors.add_field('feature_map_index', anchor_indices)
anchor_grid_list.append(tiled_anchors)
return anchor_grid_list
def create_ssd_anchors(num_layers=6,
                       min_scale=0.2,
                       max_scale=0.95,
                       scales=None,
                       aspect_ratios=(1.0, 2.0, 3.0, 1.0 / 2, 1.0 / 3),
                       interpolated_scale_aspect_ratio=1.0,
                       base_anchor_size=None,
                       anchor_strides=None,
                       anchor_offsets=None,
                       reduce_boxes_in_lowest_layer=True):
  """Creates MultipleGridAnchorGenerator for SSD anchors.

  This function instantiates a MultipleGridAnchorGenerator that reproduces
  ``default box`` construction proposed by Liu et al in the SSD paper.
  See Section 2.2 for details. Grid sizes are assumed to be passed in
  at generation time from finest resolution to coarsest resolution --- this is
  used to (linearly) interpolate scales of anchor boxes corresponding to the
  intermediate grid sizes.

  Anchors that are returned by calling the `generate` method on the returned
  MultipleGridAnchorGenerator object are always in normalized coordinates
  and clipped to the unit square: (i.e. all coordinates lie in [0, 1]x[0, 1]).

  Args:
    num_layers: integer number of grid layers to create anchors for (actual
      grid sizes passed in at generation time)
    min_scale: scale of anchors corresponding to finest resolution (float)
    max_scale: scale of anchors corresponding to coarsest resolution (float)
    scales: As list of anchor scales to use. When not None and not empty,
      min_scale and max_scale are not used.
    aspect_ratios: list or tuple of (float) aspect ratios to place on each
      grid point.
    interpolated_scale_aspect_ratio: An additional anchor is added with this
      aspect ratio and a scale interpolated between the scale for a layer
      and the scale for the next layer (1.0 for the last layer).
      This anchor is not included if this value is 0.
    base_anchor_size: base anchor size as [height, width].
      The height and width values are normalized to the minimum dimension of the
      input height and width, so that when the base anchor height equals the
      base anchor width, the resulting anchor is square even if the input image
      is not square.
    anchor_strides: list of pairs of strides in pixels (in y and x directions
      respectively). For example, setting anchor_strides=[(25, 25), (50, 50)]
      means that we want the anchors corresponding to the first layer to be
      strided by 25 pixels and those in the second layer to be strided by 50
      pixels in both y and x directions. If anchor_strides=None, they are set to
      be the reciprocal of the corresponding feature map shapes.
    anchor_offsets: list of pairs of offsets in pixels (in y and x directions
      respectively). The offset specifies where we want the center of the
      (0, 0)-th anchor to lie for each layer. For example, setting
      anchor_offsets=[(10, 10), (20, 20)]) means that we want the
      (0, 0)-th anchor of the first layer to lie at (10, 10) in pixel space
      and likewise that we want the (0, 0)-th anchor of the second layer to lie
      at (25, 25) in pixel space. If anchor_offsets=None, then they are set to
      be half of the corresponding anchor stride.
    reduce_boxes_in_lowest_layer: a boolean to indicate whether the fixed 3
      boxes per location is used in the lowest layer.

  Returns:
    a MultipleGridAnchorGenerator
  """
  if base_anchor_size is None:
    base_anchor_size = [1.0, 1.0]
    # As the proto file states, base_anchor_height and base_anchor_width both default to 1.0
  box_specs_list = []
  if scales is None or not scales:
    # Our config file does not set scales either
    scales = [min_scale + (max_scale - min_scale) * i / (num_layers - 1)
              for i in range(num_layers)] + [1.0]
    # scales = [0.20000000298, 0.3499999999998, 0.49999999701959996, 0.6499999940394, 0.7999999910591999, 0.949999988079, 1.0]
    # In other words, the first value is min_scale + 0, the second is min_scale + (max_scale - min_scale) * 1 / (num_layers - 1),
    # the third is min_scale + (max_scale - min_scale) * 2 / (num_layers - 1),
    # ...
    # the second-to-last is min_scale + (max_scale - min_scale) * (num_layers - 1) / (num_layers - 1) = max_scale,
    # and the last value is 1.0
  else:
    # Add 1.0 to the end, which will only be used in scale_next below and used
    # for computing an interpolated scale for the largest scale in the list.
    scales += [1.0]

  for layer, scale, scale_next in zip(
      range(num_layers), scales[:-1], scales[1:]):
    # layer takes the values [0, 1, 2 ... num_layers - 1]
    # scale takes the values [0.20000000298, 0.3499999999998, 0.49999999701959996, 0.6499999940394, 0.7999999910591999, 0.949999988079]
    # scale_next takes the values [0.3499999999998, 0.49999999701959996, 0.6499999940394, 0.7999999910591999, 0.949999988079, 1.0]
    # each sequence has num_layers entries
    layer_box_specs = []
    if layer == 0 and reduce_boxes_in_lowest_layer:
      # The proto states that reduce_boxes_in_lowest_layer defaults to True
      layer_box_specs = [(0.1, 1.0), (scale, 2.0), (scale, 0.5)]
      # layer_box_specs = [(0.1, 1.0), (0.20000000298, 2.0), (0.20000000298, 0.5)]
    else:
      for aspect_ratio in aspect_ratios:
        layer_box_specs.append((scale, aspect_ratio))
      # Add one more anchor, with a scale between the current scale, and the
      # scale for the next layer, with a specified aspect ratio (1.0 by
      # default).
      if interpolated_scale_aspect_ratio > 0.0:
        # The proto states that interpolated_scale_aspect_ratio defaults to 1.0
        layer_box_specs.append((np.sqrt(scale*scale_next),
                                interpolated_scale_aspect_ratio))
    box_specs_list.append(layer_box_specs)
  # box_specs_list holds num_layers elements, one per layer, describing that layer's anchors:
  # 1. [(0.1, 1.0), (0.20000000298, 2.0), (0.20000000298, 0.5)]
  # 2. [(0.3499999999998, ratio1), (0.3499999999998, ratio2), ... (sqrt(0.3499999999998 * 0.49999999701959996), 1.0)]
  # 3. [(0.49999999701959996, ratio1), ...]
  # 4. [...]
  # 5. [...]
  # 6. [(0.949999988079, ratio1), ... (sqrt(0.949999988079 * 1.0), 1.0)]
  # In each tuple the first value is the anchor scale and the second is the anchor's width-to-height ratio;
  # the number of tuples in a list is the number of anchors at each location of that feature map
  return MultipleGridAnchorGenerator(box_specs_list, base_anchor_size,
                                     anchor_strides, anchor_offsets)
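The annotated logic of create_ssd_anchors can be replayed without TensorFlow to see how many anchors a given config produces. The following is a hedged sketch, not the library's code: `ssd_box_specs` and `anchor_size` are names invented here, the aspect ratios are the three left enabled in the config above, and the feature map sizes 19, 10, 5, 3, 2, 1 are my assumption for a 300x300 input. The height/width formula assumes the default base_anchor_size of [1.0, 1.0] and reflects my reading of how the grid anchor generator turns a (scale, aspect ratio) pair into a box.

```python
import math


def ssd_box_specs(num_layers=6, min_scale=0.2, max_scale=0.95,
                  aspect_ratios=(1.0, 2.0, 0.5),
                  interpolated_scale_aspect_ratio=1.0,
                  reduce_boxes_in_lowest_layer=True):
    """Replay the (scale, aspect ratio) construction of create_ssd_anchors."""
    scales = [min_scale + (max_scale - min_scale) * i / (num_layers - 1)
              for i in range(num_layers)] + [1.0]
    box_specs_list = []
    for layer, (scale, scale_next) in enumerate(zip(scales[:-1], scales[1:])):
        if layer == 0 and reduce_boxes_in_lowest_layer:
            specs = [(0.1, 1.0), (scale, 2.0), (scale, 0.5)]
        else:
            specs = [(scale, ratio) for ratio in aspect_ratios]
            if interpolated_scale_aspect_ratio > 0.0:
                # Extra anchor at the geometric mean of this and the next scale.
                specs.append((math.sqrt(scale * scale_next),
                              interpolated_scale_aspect_ratio))
        box_specs_list.append(specs)
    return box_specs_list


def anchor_size(scale, aspect_ratio):
    """Normalized (height, width) of an anchor, assuming base size [1.0, 1.0]."""
    root = math.sqrt(aspect_ratio)
    return scale / root, scale * root


feature_maps = [19, 10, 5, 3, 2, 1]  # assumed grid sizes, finest to coarsest
specs = ssd_box_specs()
per_location = [len(layer_specs) for layer_specs in specs]
total = sum(k * m * m for k, m in zip(per_location, feature_maps))
print(per_location, total)
print(anchor_size(0.2, 2.0))
```

Under these assumptions the first layer places 3 anchors per location and every other layer 4, for 1639 anchors in total; enabling the two commented-out aspect ratios would raise that to 6 per location on layers 2 to 6.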
- Copy any .pbtxt file from the research/object_detection/data/ folder under the project path and adapt it to your own dataset:
item {
id: 1
name: 'person'
}
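For datasets with more than one class, writing the label map by hand gets tedious. The snippet below is a small sketch that generates the same item layout from a list of class names; `build_label_map` is a hypothetical helper, the class list is only an example, and ids start at 1 because id 0 is reserved for the background.

```python
def build_label_map(class_names):
    """Return label map text with one item per class, ids starting at 1."""
    items = []
    for idx, name in enumerate(class_names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (idx, name))
    return "\n".join(items)


label_map_text = build_label_map(["person", "car"])
print(label_map_text)
```

The returned text can be written to a .pbtxt file and referenced from label_map_path in the config.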
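- Dataset processing:
Two folders are used here, storing the images and the xml files separately.
First convert the VOC-format xml files into a csv file.
The script takes two arguments: the first is the folder containing the xml files; the second is the output csv file, with a .csv extension.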
# -*- coding: utf-8 -*-
import os
import sys
import xml.etree.ElementTree as ET

import pandas as pd


def xml_to_csv(_path, _out_file):
    """Collect every <object> of the VOC xml files under _path into one csv."""
    xml_list = []
    for each in sorted(os.listdir(_path)):
        if not each.endswith('.xml'):
            continue  # skip anything that is not an annotation file
        xml_file = os.path.join(_path, each)
        root = ET.parse(xml_file).getroot()
        # Image name = xml base name + ".jpg"; splitext also works when the
        # folder path itself contains dots.
        image_name = os.path.splitext(each)[0].strip() + '.jpg'
        size = root.find('size')
        for member in root.findall('object'):
            bndbox = member.find('bndbox')
            value = (image_name,
                     int(size.find('width').text),
                     int(size.find('height').text),
                     member.find('name').text,
                     int(bndbox.find('xmin').text),
                     int(bndbox.find('ymin').text),
                     int(bndbox.find('xmax').text),
                     int(bndbox.find('ymax').text))
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    xml_df.to_csv(_out_file, index=None)
    print('Successfully converted xml to csv.')


if __name__ == '__main__':
    xml_to_csv(sys.argv[1], sys.argv[2])
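For reference, this is roughly the xml structure the conversion script expects. The annotation below is a made-up sample, not taken from a real dataset, and `parse_objects` is a hypothetical helper that extracts the same fields as the script, one tuple per object.

```python
import xml.etree.ElementTree as ET

# Invented VOC-style annotation used only for illustration.
SAMPLE_XML = """<annotation>
  <filename>000001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""


def parse_objects(xml_text):
    """Return (width, height, class, xmin, ymin, xmax, ymax) for each object."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append((width, height, obj.find("name").text,
                     int(box.find("xmin").text), int(box.find("ymin").text),
                     int(box.find("xmax").text), int(box.find("ymax").text)))
    return rows


print(parse_objects(SAMPLE_XML))
```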
Then generate the TFRecord data.
The script takes three arguments: --csv_input, the csv file; --output_path, the TFRecord file, with a .record extension; --img_path, the folder containing the images.
# -*- coding: utf-8 -*-
from __future__ import division  # true division for box normalization on Python 2 too

import io
import os

import pandas as pd
import tensorflow as tf
from PIL import Image

from object_detection.utils import dataset_util

flags = tf.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
flags.DEFINE_string('img_path', '', 'Path to image')
FLAGS = flags.FLAGS


def class_text_to_int(row_label):
    """Map a class name to its id in the label map."""
    if row_label == 'person':
        return 1
    # Fail loudly instead of silently writing a None label into the record.
    raise ValueError('Unknown class label: {}'.format(row_label))


def create_tf_example(row):
    """Build one tf.train.Example from one csv row, i.e. one bounding box."""
    full_path = os.path.join(FLAGS.img_path, str(row['filename']))
    with tf.gfile.GFile(full_path, 'rb') as fid:
        encoded_jpg = fid.read()
    # Read the real image size from the file rather than from the csv.
    image = Image.open(io.BytesIO(encoded_jpg))
    width, height = image.size
    filename = row['filename'].encode('utf8')
    image_format = b'jpg'
    # Box coordinates are stored normalized by the image size.
    xmins = [row['xmin'] / width]
    xmaxs = [row['xmax'] / width]
    ymins = [row['ymin'] / height]
    ymaxs = [row['ymax'] / height]
    classes_text = [row['class'].encode('utf8')]
    classes = [class_text_to_int(row['class'])]
    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    examples = pd.read_csv(FLAGS.csv_input)
    for index, row in examples.iterrows():
        tf_example = create_tf_example(row)
        writer.write(tf_example.SerializeToString())
    writer.close()


if __name__ == '__main__':
    tf.app.run()
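Note that the script above writes one tf.train.Example per csv row, so an image containing several objects is stored several times with a single box each. If you would rather write one example per image that carries all of its boxes, the rows can be grouped by filename first. The following is only a sketch of that grouping step using plain dicts instead of pandas rows; `group_boxes_by_image` and the sample rows are invented for illustration, and the resulting lists would still have to be passed to the dataset_util feature helpers.

```python
from collections import OrderedDict


def group_boxes_by_image(rows):
    """Group per-object rows by filename and normalize the box coordinates."""
    grouped = OrderedDict()
    for row in rows:
        width, height = float(row["width"]), float(row["height"])
        entry = grouped.setdefault(row["filename"], {
            "xmins": [], "xmaxs": [], "ymins": [], "ymaxs": [],
            "classes_text": []})
        entry["xmins"].append(row["xmin"] / width)
        entry["xmaxs"].append(row["xmax"] / width)
        entry["ymins"].append(row["ymin"] / height)
        entry["ymaxs"].append(row["ymax"] / height)
        entry["classes_text"].append(row["class"])
    return grouped


# Made-up rows in the same column layout as the csv file.
rows = [
    {"filename": "a.jpg", "width": 400, "height": 200, "class": "person",
     "xmin": 100, "ymin": 50, "xmax": 200, "ymax": 100},
    {"filename": "a.jpg", "width": 400, "height": 200, "class": "person",
     "xmin": 300, "ymin": 100, "xmax": 400, "ymax": 200},
    {"filename": "b.jpg", "width": 100, "height": 100, "class": "person",
     "xmin": 10, "ymin": 20, "xmax": 50, "ymax": 60},
]
grouped = group_boxes_by_image(rows)
for name, boxes in grouped.items():
    print(name, len(boxes["xmins"]), boxes["xmins"])
```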
- Start fine-tuning:
Under research/object_detection/legacy/, run
python3 train.py --train_dir='Directory to save the checkpoints and training summaries.' --pipeline_config_path='Path to a pipeline_pb2.TrainEvalPipelineConfig config file. If provided, other configs are ignored' --logtostderr=True
- The loss log may be printed twice; the fix is as follows:
In research/object_detection/utils/variables_helper.py, comment out the following lines:
else:
logging.warning('Variable [%s] not available in checkpoint',
variable_name)
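- Visualize the learning rate, loss, and other information with tensorboard:
tensorboard --logdir=<directory containing the events.out.tfevents files>
Then enter one of the following in your local browser:
- server IP address:the port reserved for tensorboard when the container was created (inside docker)
- server IP address:6006 (outside docker)
This opens the tensorboard visualization page.
- tensorboard sometimes fails with the error: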
locale.Error: unsupported locale setting
The same error sometimes appears when using pip. It is caused by the locale setting; run the following command in the terminal to fix it:
export LC_ALL=C
- Compute the accuracy (mAP) on the validation set
Under research/object_detection/legacy/, run
python3 eval.py --logtostderr=True --checkpoint_dir="Directory containing checkpoints to evaluate, typically set to `train_dir` used in the training job." --eval_dir='Directory to write eval summaries to.' --pipeline_config_path='Path to a pipeline_pb2.TrainEvalPipelineConfig config file. If provided, other configs are ignored'
- Convert the ckpt file to a pb file
Under research/object_detection/, run
python3 export_inference_graph.py --input_type=image_tensor --pipeline_config_path='Path to a pipeline_pb2.TrainEvalPipelineConfig config file.' --trained_checkpoint_prefix="Path to trained checkpoint, typically of the form 'path/to/model.ckpt-250000'" --output_directory='Path to write outputs.'
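- Modifying the feature maps of the feature extractor's head branches
Taking the ssd_mobilenet_v3 feature extractor as an example, make the changes in models/research/object_detection/models/ssd_mobilenet_v3_feature_extractor.py:
Around line 127, the following are set:
the number of head branches: the number of entries in the 'from_layer' list;
whether each feature map comes from the backbone or is added separately: self._from_layer[0] and self._from_layer[1] come from the backbone, while '' is a placeholder for an additionally created layer;
the channel depth of each feature map: 'layer_depth'.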
127 feature_map_layout = {
128 'from_layer': [
129 self._from_layer[0], self._from_layer[1], '', '', '', ''
130 ],
131 'layer_depth': [-1, -1, 512, 256, 256, 128],
132 'use_depthwise': self._use_depthwise,
133 'use_explicit_padding': self._use_explicit_padding,
134 }
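When editing this layout it is easy to leave the two lists with different lengths. The sketch below summarizes a layout and checks that. `describe_layout` is a hypothetical helper, and the backbone layer names are placeholders standing in for self._from_layer[0] and self._from_layer[1]. As far as I understand, a depth of -1 means the existing depth of the backbone layer is used unchanged.

```python
def describe_layout(layout):
    """List each head branch as (source, depth) and validate list lengths."""
    from_layer = layout["from_layer"]
    depths = layout["layer_depth"]
    if len(from_layer) != len(depths):
        raise ValueError("from_layer and layer_depth must have the same length")
    heads = []
    for name, depth in zip(from_layer, depths):
        # A non-empty name reuses a backbone layer; '' creates an extra layer.
        source = "backbone:" + name if name else "extra"
        heads.append((source, depth))
    return heads


layout = {
    # Placeholder names, not the real backbone endpoint names.
    "from_layer": ["backbone_layer_a", "backbone_layer_b", "", "", "", ""],
    "layer_depth": [-1, -1, 512, 256, 256, 128],
}
heads = describe_layout(layout)
for index, head in enumerate(heads):
    print(index, head)
```

If you change the number of head branches, num_layers in the ssd_anchor_generator section of the config has to match, because the anchor generator shown earlier raises a ValueError when the number of feature maps differs from the number of box spec lists.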
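Closing remarks
If you have suggestions or questions, feel free to leave a comment or contact me by email.
Typing all of this out by hand was hard work; if this article helped you, please credit the source when reposting.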