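Table of Contents

TF object_detection API

General workflow for training on a dataset with the API

1. Create the TFRecord files
2. Train
3. Export the trained checkpoint to a frozen *.pb file
4. Evaluate

create_pascal_tf_record.py
train.py
trainer.py
pipeline config
example-pets-eval
evaluator.py
Error log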

TF object_detection API

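This API is an official TensorFlow project template. I had tried it before without getting it to run; this time I dug deeper and became familiar with the whole workflow for training, testing and evaluation. I experimented with training on VOC2007 and on the Oxford-IIIT Pet dataset, among others. Below are some notes from that work.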

General workflow for training on a dataset with the API

Adjust the paths and config files below to match your environment.

1. Create the TFRecord files

python dataset_tools/create_pet_tf_record.py \
    --data_dir=/media/han/E/mWork/datasets/Oxford-IIIT_Pet_Dataset \
    --output_dir=trainLogs_pets/tfrecord

2. Train

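If train.py cannot be run (typically because the object_detection or slim modules cannot be imported), first execute: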

# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

python train.py \
        --logtostderr \
        --train_dir=trainLogs_pets/output \
        --pipeline_config_path=trainLogs_pets/ssd_mobilenet_v1_pets.config

3. Export the trained checkpoint to a frozen *.pb file

python export_inference_graph.py --input_type image_tensor \
    --pipeline_config_path trainLogs_pets/ssd_mobilenet_v1_pets.config \
    --trained_checkpoint_prefix trainLogs_pets/output/model.ckpt-100000 \
    --output_directory trainLogs_pets/output
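Note that --trained_checkpoint_prefix takes a prefix, not a single file: a TF1 Saver writes several files per checkpoint (for example model.ckpt-100000.index, model.ckpt-100000.meta and model.ckpt-100000.data-00000-of-00001). The sketch below picks the newest prefix from a directory listing, assuming that naming scheme; latest_checkpoint_prefix is a hypothetical helper, not part of the API. The step number is compared as an integer, because a plain string comparison would rank model.ckpt-95000 above model.ckpt-100000.

```python
import os
import re
from typing import Iterable, Optional

# Matches TF1-style checkpoint index files such as "model.ckpt-100000.index".
_CKPT_INDEX = re.compile(r"^(model\.ckpt-(\d+))\.index$")


def latest_checkpoint_prefix(filenames: Iterable[str], train_dir: str) -> Optional[str]:
    """Return the prefix of the checkpoint with the highest step, or None."""
    best_step, best_prefix = -1, None
    for name in filenames:
        match = _CKPT_INDEX.match(name)
        if match and int(match.group(2)) > best_step:
            best_step = int(match.group(2))
            best_prefix = os.path.join(train_dir, match.group(1))
    return best_prefix


if __name__ == "__main__":
    files = [
        "checkpoint",
        "model.ckpt-95000.index",
        "model.ckpt-95000.meta",
        "model.ckpt-100000.index",
        "model.ckpt-100000.meta",
        "model.ckpt-100000.data-00000-of-00001",
    ]
    # On a real machine, pass os.listdir("trainLogs_pets/output") instead.
    print(latest_checkpoint_prefix(files, "trainLogs_pets/output"))
```

In TF1 itself, tf.train.latest_checkpoint(train_dir) returns the same kind of prefix by reading the `checkpoint` bookkeeping file.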

4. Evaluate

python eval.py \
        --logtostderr \
        --checkpoint_dir=trainLogs_pets/output \
        --eval_dir=trainLogs_pets/eval \
        --pipeline_config_path=trainLogs_pets/ssd_mobilenet_v1_pets.config

create_pascal_tf_record.py

ignore_difficult_instances  # when set, objects marked "difficult" are skipped and not used for training

This script also encodes the image files themselves into the .record file, so the generated files are fairly large.

  with tf.gfile.GFile(full_path, 'rb') as fid:
    encoded_jpg = fid.read()
  example = tf.train.Example(features=tf.train.Features(feature={
      'image/height': dataset_util.int64_feature(height),
      'image/width': dataset_util.int64_feature(width),
      'image/filename': dataset_util.bytes_feature(
          data['filename'].encode('utf8')),
      'image/source_id': dataset_util.bytes_feature(
          data['filename'].encode('utf8')),
      'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
      'image/encoded': dataset_util.bytes_feature(encoded_jpg),  # raw bytes of the image file
      'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),  # JPEG, i.e. the image is stored at its compressed size
      'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
      'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
      'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
      'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
      'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
      'image/object/class/label': dataset_util.int64_list_feature(classes),
      'image/object/difficult': dataset_util.int64_list_feature(difficult_obj),
      'image/object/truncated': dataset_util.int64_list_feature(truncated),
      'image/object/view': dataset_util.bytes_list_feature(poses),
  }))
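The excerpt passes xmin/xmax/ymin/ymax lists without showing how they are built. In my reading of create_pascal_tf_record.py, the pixel coordinates from the VOC XML are divided by the image width and height (so they end up in [0, 1]), and difficult objects are skipped entirely when ignore_difficult_instances is set. The sketch below reproduces that logic with the standard library only; parse_voc_objects and the inline XML are illustrative assumptions, not the script's own code.

```python
import xml.etree.ElementTree as ET

# A made-up VOC annotation with one normal and one "difficult" object.
VOC_XML = """<annotation>
  <filename>000001.jpg</filename>
  <size><width>500</width><height>400</height></size>
  <object>
    <name>dog</name><difficult>0</difficult>
    <bndbox><xmin>50</xmin><ymin>100</ymin><xmax>250</xmax><ymax>300</ymax></bndbox>
  </object>
  <object>
    <name>person</name><difficult>1</difficult>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>60</xmax><ymax>80</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_objects(xml_text, ignore_difficult_instances=False):
    """Return normalized box coordinates and class names from one VOC annotation."""
    root = ET.fromstring(xml_text)
    width = float(root.findtext("size/width"))
    height = float(root.findtext("size/height"))
    out = {"xmin": [], "ymin": [], "xmax": [], "ymax": [], "classes_text": [], "difficult": []}
    for obj in root.findall("object"):
        difficult = bool(int(obj.findtext("difficult", default="0")))
        if ignore_difficult_instances and difficult:
            continue  # dropped from the record, not merely flagged
        bbox = obj.find("bndbox")
        # Divide pixel coordinates by the image size so they lie in [0, 1].
        out["xmin"].append(float(bbox.findtext("xmin")) / width)
        out["ymin"].append(float(bbox.findtext("ymin")) / height)
        out["xmax"].append(float(bbox.findtext("xmax")) / width)
        out["ymax"].append(float(bbox.findtext("ymax")) / height)
        out["classes_text"].append(obj.findtext("name").encode("utf8"))
        out["difficult"].append(int(difficult))
    return out


print(parse_voc_objects(VOC_XML)["classes_text"])
print(parse_voc_objects(VOC_XML, ignore_difficult_instances=True)["classes_text"])
```

Storing normalized coordinates keeps the boxes valid after the image is resized to 300x300 by the image_resizer in the pipeline config.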

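Because every per-class list file (e.g. aeroplane_train.txt) contains the file names of all training images, the script reads the aeroplane list here but still converts every training image into the .record file.
VOC2007 train contains only 2501 images, so you can also switch FLAGS.set to trainval for training.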

    examples_path = os.path.join(data_dir, year, 'ImageSets', 'Main',
                                 'aeroplane_' + FLAGS.set + '.txt')
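Each line of a VOC ImageSets/Main/<class>_<set>.txt file holds an image id followed by a label (1 = contains the class, -1 = does not, 0 = contains only difficult instances of it). Taking the first token of every line therefore yields every image in the split, whatever the class. A stdlib sketch of that reading step; read_image_ids is a hypothetical stand-in for the list-reading helper used by the script, and the sample lines are made up in the VOC format.

```python
def read_image_ids(lines):
    """Return the image id (first token) of every non-empty line."""
    return [line.strip().split()[0] for line in lines if line.strip()]


# Lines in the aeroplane_train.txt format: image id, then 1 / -1 / 0.
sample = [
    "000012 -1\n",
    "000017 -1\n",
    "000023  1\n",
    "000026  0\n",
]
ids = read_image_ids(sample)
print(ids)  # every image is listed, not only the ones containing aeroplanes
```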

train.py

functools: Python's functools module is designed mainly for functional programming and provides tools that extend what functions can do.
Reference: http://kuanghy.github.io/2016/10/26/python-functools

from functools import partial  # partial() creates a "partial function": a callable wrapped with some arguments pre-filled

def add(x, y):
    return x + y

add_y = partial(add, 3)  # add_y is a new function that takes only y; x is fixed to 3
ret = add_y(4)  # ret = 7

Example from train.py:

# Use functools.partial to fix the arguments of model_builder.build() and produce a new function, model_fn
  model_fn = functools.partial(
      model_builder.build,
      model_config=model_config,
      is_training=True)
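The same pattern, sketched with a hypothetical build function standing in for model_builder.build (the real function's signature may differ): keyword arguments bound by functools.partial are stored on the partial object, so trainer.py can later call the result with no arguments, while a caller can still override a bound keyword.

```python
import functools


def build(model_config, is_training, add_summaries=True):
    """Hypothetical stand-in for model_builder.build; returns a dict instead of a model."""
    return {"config": model_config, "is_training": is_training, "summaries": add_summaries}


model_fn = functools.partial(build, model_config={"ssd": {"num_classes": 37}}, is_training=True)

# The wrapped function and the bound keywords can be inspected on the partial object.
print(model_fn.func.__name__, sorted(model_fn.keywords))

# Calling with no arguments uses the bound ones, as trainer.py does with create_model_fn().
detection_model = model_fn()

# Bound keywords can still be overridden at call time.
eval_model = model_fn(is_training=False)
```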

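tf.logging.set_verbosity(tf.logging.INFO) sets the logging verbosity.
Without this line, INFO-level progress messages are not shown on the console.
TensorFlow uses five logging levels. In increasing order of severity they are DEBUG, INFO, WARN, ERROR and FATAL. When you configure logging at one of these levels, TensorFlow outputs the messages at that level and at every more severe level. For example, with the level set to ERROR you receive ERROR and FATAL messages, and with DEBUG you receive messages from all five levels. By default TensorFlow logs at WARN, but when tracking model training you should lower the level to INFO, which reports the progress of ongoing operations. Reference: https://blog.csdn.net/caokaifa/article/details/80385501?utm_source=copy
To write a message, call tf.logging.info('Image size: %dx%d' % (width, height)).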
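The threshold rule can be demonstrated with Python's standard logging module, which uses the same idea of ordered levels (DEBUG < INFO < WARNING < ERROR < CRITICAL). This is an analogy rather than tf.logging itself; the in-memory ListHandler exists only to record which messages get through.

```python
import logging


class ListHandler(logging.Handler):
    """Collects the level name of every record that reaches it."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def emit(self, record):
        self.levels.append(record.levelname)


logger = logging.getLogger("verbosity_demo")
handler = ListHandler()
logger.addHandler(handler)
logger.propagate = False  # keep the demo away from the root logger


def log_all():
    """Emit one message at each of the five levels."""
    logger.debug("debug")
    logger.info("info")
    logger.warning("warn")
    logger.error("error")
    logger.critical("fatal")


logger.setLevel(logging.ERROR)
log_all()
print(handler.levels)  # only ERROR and CRITICAL pass the threshold

handler.levels.clear()
logger.setLevel(logging.INFO)
log_all()
print(handler.levels)  # everything except DEBUG
```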
Distributed settings

  # The following 5 lines are only used for distributed training; on a single PC they have no effect
  env = json.loads(os.environ.get('TF_CONFIG', '{}'))
  cluster_data = env.get('cluster', None)  # look for a cluster spec in the TF_CONFIG environment variable
  cluster = tf.train.ClusterSpec(cluster_data) if cluster_data else None
  task_data = env.get('task', None) or {'type': 'master', 'index': 0}  # without a task entry, this process is the master (task 0)
  task_info = type('TaskSpec', (object,), task_data)
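To see what these lines produce, here is the same parsing logic run against hand-written TF_CONFIG values. The tf.train.ClusterSpec call is left out so the sketch needs only the standard library; parse_tf_config and the host names are made up for illustration.

```python
import json


def parse_tf_config(raw):
    """Mirror of the TF_CONFIG parsing above, without tf.train.ClusterSpec."""
    env = json.loads(raw or '{}')
    cluster_data = env.get('cluster', None)
    task_data = env.get('task', None) or {'type': 'master', 'index': 0}
    # type() builds a throwaway class whose attributes are the dict keys,
    # so the task can be read as task_info.type and task_info.index.
    task_info = type('TaskSpec', (object,), task_data)
    return cluster_data, task_info


# Single machine: TF_CONFIG is unset, so this process acts as the master.
cluster, task = parse_tf_config(None)
print(cluster, task.type, task.index)

# Hypothetical cluster: one master, two workers, one parameter server.
raw = json.dumps({
    'cluster': {'master': ['host0:2222'], 'worker': ['host1:2222', 'host2:2222'],
                'ps': ['host3:2222']},
    'task': {'type': 'worker', 'index': 1},
})
cluster, task = parse_tf_config(raw)
print(sorted(cluster), task.type, task.index)
```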

trainer.train()

  trainer.train(
      create_input_dict_fn,  # function that creates the input tensor dictionary; obtained by wrapping get_next() with functools.partial
      model_fn,  # function that builds the detection model and computes the losses; obtained by wrapping model_builder.build() with functools.partial
      train_config,  # training configuration, a train_pb2.TrainConfig protobuf message
      master,
      task,
      FLAGS.num_clones,
      worker_replicas,
      FLAGS.clone_on_cpu,
      ps_tasks,
      worker_job_name,
      is_chief,
      FLAGS.train_dir,
      graph_hook_fn=graph_rewriter_fn)
   
The arguments, as documented in the trainer.train() docstring:

  Args:
    create_tensor_dict_fn: a function to create a tensor input dictionary.
    create_model_fn: a function that creates a DetectionModel and generates losses.
    train_config: a train_pb2.TrainConfig protobuf.
    master: BNS name of the TensorFlow master to use.
    task: The task id of this training instance.
    num_clones: The number of clones to run per machine.
    worker_replicas: The number of work replicas to train with.
    clone_on_cpu: True if clones should be forced to run on CPU.
    ps_tasks: Number of parameter server tasks.
    worker_job_name: Name of the worker job.
    is_chief: Whether this replica is the chief replica.
    train_dir: Directory to write checkpoints and training summaries to.
    graph_hook_fn: Optional function that is called after the inference graph is
      built (before optimization). This is helpful to perform additional changes
      to the training graph such as adding FakeQuant ops. The function should
      modify the default graph.
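Several of these arguments (worker_replicas, ps_tasks, worker_job_name, task, is_chief) describe the cluster. The sketch below shows one way they can be derived from a parsed TF_CONFIG cluster dict and task; the rules (the master counted as an extra worker replica, at least one ps task required for multi-replica training, the master acting as chief) are my reading of train.py, so check them against your copy. replica_settings and ReplicaSettings are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicaSettings:
    # Single-machine defaults.
    worker_replicas: int = 1
    ps_tasks: int = 0
    worker_job_name: str = 'lonely_worker'
    task: int = 0
    is_chief: bool = True


def replica_settings(cluster_data: Optional[dict], task_type: str, task_index: int) -> ReplicaSettings:
    """Derive the cluster-related trainer.train() arguments (assumed rules)."""
    s = ReplicaSettings()
    if cluster_data and 'worker' in cluster_data:
        # Assumption: the master also trains, so it counts as one more replica.
        s.worker_replicas = len(cluster_data['worker']) + 1
    if cluster_data and 'ps' in cluster_data:
        s.ps_tasks = len(cluster_data['ps'])
    if s.worker_replicas > 1 and s.ps_tasks < 1:
        raise ValueError('At least 1 ps task is needed for distributed training.')
    if s.ps_tasks > 0:
        s.worker_job_name = '%s/task:%d' % (task_type, task_index)
        s.task = task_index
        # The chief replica handles initialization and checkpointing.
        s.is_chief = task_type == 'master'
    return s


print(replica_settings(None, 'master', 0))
cluster = {'master': ['h0:2222'], 'worker': ['h1:2222', 'h2:2222'], 'ps': ['h3:2222']}
s = replica_settings(cluster, 'worker', 1)
print(s.worker_replicas, s.ps_tasks, s.worker_job_name, s.is_chief)
```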

trainer.py

create_model_fn

# create_model_fn is a function wrapped with functools.partial; appending '()' calls it. The parentheses are empty because all arguments were bound in advance.
detection_model = create_model_fn()

pipeline config

# SSD with Mobilenet v1, configured for Oxford-IIIT Pets Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model { # model configuration
  ssd { #ssd-begin
    num_classes: 37 # number of classes; be sure to change this for your own dataset
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher { # matching parameters: when an anchor counts as matched to a ground-truth box; the default threshold is 0.5
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true # anchors below unmatched_threshold are negatives
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity { # measure similarity with IoU
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6 # number of feature maps (prediction layers) that anchors are generated on
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer { # weight initialization
            truncated_normal_initializer { # truncated normal initialization
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    } # end of box_predictor
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner { # hard example mining
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}
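The matcher comments above can be made concrete with a simplified argmax matcher over IoU. This sketch assumes the thresholds from this config (matched_threshold = unmatched_threshold = 0.5, so the "ignored" band between them is empty) and boxes given as (ymin, xmin, ymax, xmax); it mimics force_match_for_each_row by letting every ground-truth box claim its best anchor. It is an illustration, not the API's tensor implementation.

```python
def iou(a, b):
    """IoU of two boxes given as (ymin, xmin, ymax, xmax)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def argmax_match(anchors, gt_boxes, matched_threshold=0.5, unmatched_threshold=0.5,
                 force_match_for_each_row=True):
    """Per anchor: ground-truth index (positive), -1 (negative) or -2 (ignored)."""
    result = []
    for anchor in anchors:
        scores = [iou(gt, anchor) for gt in gt_boxes]
        best = max(range(len(gt_boxes)), key=scores.__getitem__)
        if scores[best] >= matched_threshold:
            result.append(best)
        elif scores[best] < unmatched_threshold:
            result.append(-1)  # negative (negatives_lower_than_unmatched: true)
        else:
            result.append(-2)  # between the two thresholds: ignored
    if force_match_for_each_row:
        # Each ground-truth box (a row of the IoU matrix) claims its best anchor,
        # even when that IoU is below matched_threshold.
        for g, gt in enumerate(gt_boxes):
            best_anchor = max(range(len(anchors)), key=lambda i: iou(gt, anchors[i]))
            result[best_anchor] = g
    return result


gt = [(0.0, 0.0, 0.2, 0.2)]                           # a small object
anchors = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0)]
print(argmax_match(anchors, gt))                                  # forced match on anchor 0
print(argmax_match(anchors, gt, force_match_for_each_row=False))  # no anchor reaches 0.5
```

Without the forced match, a small object whose best IoU stays below 0.5 would receive no positive anchor at all, which is why force_match_for_each_row is enabled here.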

train_config: {
  batch_size: 24
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate { # exponentially decaying learning rate
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  # Checkpoint to fine-tune from; if empty, training starts from scratch. Example: /media/han/E/mWork/mCode/models/research/object_detection/ssd_mobilenet_v1_coco_2017_11_17/model.ckpt
  fine_tune_checkpoint: ""
  from_detection_checkpoint: true # the checkpoint above comes from a full detection model, not only a classification backbone
  load_all_detection_checkpoint_vars: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 100000  # maximum number of training steps (set to 100K here, not the 200K mentioned in the note above)
  data_augmentation_options { # data augmentation
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}
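exponential_decay_learning_rate follows the usual exponential-decay formula lr = initial_learning_rate * decay_factor ** (step / decay_steps). Assuming staircase mode (integer division in the exponent), which I believe is the proto's default, the values in this config explain the upstream note that the learning rate never decays: num_steps (100000) is far below decay_steps (800720). A stdlib sketch with the values from train_config:

```python
def exponential_decay(step, initial_learning_rate=0.004, decay_steps=800720,
                      decay_factor=0.95, staircase=True):
    """Learning rate after `step` steps; defaults taken from train_config above."""
    # Staircase mode only decays at whole multiples of decay_steps.
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_learning_rate * decay_factor ** exponent


for step in (0, 100000, 800720, 1601440):
    print(step, exponential_decay(step), round(exponential_decay(step, staircase=False), 6))
```

With staircase enabled the rate stays at 0.004 for the whole 100K-step run; even the smooth variant only drops to about 0.00397 by step 100000.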

train_input_reader: {
  tf_record_input_reader {
    # If the record file is sharded, match the shards with ? wildcards
    input_path: "trainLogs_pets/tfrecord/pet_faces_train.record-?????-of-?????"
  }
  label_map_path: "/media/han/E/mWork/mCode/models/research/object_detection/data/pet_label_map.pbtxt"
}
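Each ? wildcard matches exactly one character, so record-?????-of-????? matches five-digit shard numbers such as pet_faces_train.record-00003-of-00010. The sketch checks this with fnmatch, which uses the same ? and * glob syntax; TensorFlow expands the pattern with its own file-system glob, so fnmatch is used here only to illustrate the syntax. The shard file names are made up.

```python
import fnmatch

pattern = "pet_faces_train.record-?????-of-?????"
candidates = [
    "pet_faces_train.record-00000-of-00010",
    "pet_faces_train.record-00009-of-00010",
    "pet_faces_train.record",               # unsharded file: no match
    "pet_faces_train.record-0001-of-0010",  # four-digit shard numbers: no match
    "pet_faces_val.record-00000-of-00010",  # wrong split: no match
]
matches = fnmatch.filter(candidates, pattern)
print(matches)
```

If the records were written as a single unsharded file, input_path should name that file directly instead of using the pattern.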

eval_config: {
  metrics_set: "coco_detection_metrics"
  num_examples: 1101
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "trainLogs_pets/tfrecord/pet_faces_val.record-?????-of-?????"
  }
  label_map_path: "/media/han/E/mWork/mCode/models/research/object_detection/data/pet_label_map.pbtxt"
  shuffle: false
  num_readers: 1
}

example-pets-eval

After training for 10000 steps:

(base) han@MS:/media/han/E/mWork/mCode/models/research/object_detection$ python eval.py --logtostderr --checkpoint_dir=trainLogs_pets/output         --eval_dir=trainLogs_pets/eval     --pipeline_config_path=trainLogs_pets/ssd_mobilenet_v1_pets.config

Eval results:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.499
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.729
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.590
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.270
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.541
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.645
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.669
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.669
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.478
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.707
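Two COCO conventions help when reading this output. AP@[IoU=0.50:0.95] is the mean of the AP values at ten IoU thresholds (0.50, 0.55, ..., 0.95), and the small/medium/large rows split objects by pixel area: small below 32², medium from 32² to 96², large above 96². A value of -1.000 means there were no ground-truth objects in that size range in the evaluation set, so the metric is undefined rather than zero. A small sketch of both rules (the exact handling of the boundary values is simplified here):

```python
def coco_iou_thresholds():
    """The ten IoU thresholds averaged by AP@[IoU=0.50:0.95]."""
    return [round(0.50 + 0.05 * i, 2) for i in range(10)]


def coco_area_bucket(area_px):
    """COCO size category for an object with the given area in pixels (simplified boundaries)."""
    if area_px < 32 ** 2:
        return "small"
    if area_px < 96 ** 2:
        return "medium"
    return "large"


print(coco_iou_thresholds())
print([coco_area_bucket(side * side) for side in (20, 50, 200)])
```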

evaluator.py

There are several evaluation protocols, such as VOC, COCO and OID_challenge. Select one with metrics_set in the eval_config section of the pipeline config; the default is pascal_voc_detection_metrics.

EVAL_METRICS_CLASS_DICT = {
    'pascal_voc_detection_metrics':
        object_detection_evaluation.PascalDetectionEvaluator,
    'weighted_pascal_voc_detection_metrics':
        object_detection_evaluation.WeightedPascalDetectionEvaluator,
    'pascal_voc_instance_segmentation_metrics':
        object_detection_evaluation.PascalInstanceSegmentationEvaluator,
    'weighted_pascal_voc_instance_segmentation_metrics':
        object_detection_evaluation.WeightedPascalInstanceSegmentationEvaluator,
    'open_images_V2_detection_metrics':
        object_detection_evaluation.OpenImagesDetectionEvaluator,
    'coco_detection_metrics':
        coco_evaluation.CocoDetectionEvaluator,
    'coco_mask_metrics':
        coco_evaluation.CocoMaskEvaluator,
    'oid_challenge_object_detection_metrics':
        object_detection_evaluation.OpenImagesDetectionChallengeEvaluator,
}

EVAL_DEFAULT_METRIC = 'pascal_voc_detection_metrics'
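A sketch of how a metrics_set list can be resolved against a table like EVAL_METRICS_CLASS_DICT: fall back to EVAL_DEFAULT_METRIC when the list is empty and reject unknown names. Class objects are replaced by strings so it runs without TensorFlow; get_evaluator_names is a hypothetical name, and evaluator.py's own lookup may differ in detail.

```python
# Trimmed copy of the table above; strings stand in for the evaluator classes.
EVAL_METRICS_CLASS_DICT = {
    'pascal_voc_detection_metrics': 'PascalDetectionEvaluator',
    'coco_detection_metrics': 'CocoDetectionEvaluator',
    'oid_challenge_object_detection_metrics': 'OpenImagesDetectionChallengeEvaluator',
}
EVAL_DEFAULT_METRIC = 'pascal_voc_detection_metrics'


def get_evaluator_names(metrics_set):
    """Map metrics_set entries from eval_config to evaluator names."""
    keys = list(metrics_set) or [EVAL_DEFAULT_METRIC]
    unknown = [k for k in keys if k not in EVAL_METRICS_CLASS_DICT]
    if unknown:
        raise ValueError('Metric not found: {}'.format(unknown))
    return [EVAL_METRICS_CLASS_DICT[k] for k in keys]


print(get_evaluator_names([]))                          # falls back to PASCAL VOC
print(get_evaluator_names(['coco_detection_metrics']))  # as in the pets config
```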

Error log

ValueError: Tried to convert 't' to a tensor and failed. Error: Argument must be a dense tensor: range(0, 3) - got shape [3], but wanted []
Fix:
https://github.com/tensorflow/models/issues/3705#issuecomment-375563179
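The range(0, 3) in the message points to a Python 3 incompatibility. In Python 3, range() returns a lazy range object rather than a list, and TensorFlow cannot convert that object where it expects a list-like value. The commonly reported fix for this issue is to wrap the offending range(...) call in list(...), reportedly in object_detection/utils/learning_schedules.py; check the linked comment for the exact line in your version. The difference in plain Python:

```python
r = range(3)
print(type(r).__name__, isinstance(r, list))  # a lazy range object, not a list

as_list = list(range(3))
print(as_list, isinstance(as_list, list))     # a real list, which converts cleanly
```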

Tensorflow object_detection API notes
