TensorFlow Object Detection Notes


Introduction

TensorFlow gained many new features after the 1.0 release, among them a collection of deep network architectures implemented in the TF framework ( https://github.com/tensorflow/models ), which greatly reduces development effort. With these off-the-shelf network structures, both fine-tuning and retraining become much easier. The author recently got the ssd_mobilenet_v1 model of the TensorFlow Object Detection API running end to end. This post records the whole process, from data preparation to using the trained model, in the hope that it will be useful to the author and other readers.

The Object Detection API provides pre-trained weights for five network structures, all trained on the COCO dataset: SSD+mobilenet, SSD+inception_v2, R-FCN+resnet101, faster RCNN+resnet101, and faster RCNN+inception+resnet101. They trade accuracy against speed: SSD+mobilenet is the fastest but least accurate, while the faster RCNN variants are more accurate but slower (the project's model zoo lists the exact numbers). The following is an introduction to how to use the Object Detection API to train your own model.

The installation of TensorFlow itself is not covered here; there are plenty of online tutorials with very detailed instructions.

Preparation before training:

The API uses protobuf to configure models and training parameters, so the protobuf library must be compiled before the API can be used. You can download a pre-compiled protoc release ( https://github.com/google/protobuf/releases ); after unpacking the archive, add protoc to the environment variable, then compile the proto files:

$ cd tensorflow/models

$ protoc object_detection/protos/*.proto --python_out=.

 

(I added protoc to the environment variable but hit an error that the *.proto files could not be found; putting protoc.exe directly in the models/object_detection directory and re-running fixed it.)

Then add models and slim (TF's high-level framework) to the PYTHONPATH environment variable:

PYTHONPATH=$PYTHONPATH:/your/path/to/tensorflow/models:/your/path/to/tensorflow/models/slim
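(A note for Windows, which the D:\ paths later in this article suggest: the equivalent command in cmd is

set PYTHONPATH=%PYTHONPATH%;D:\your\path\to\tensorflow\models;D:\your\path\to\tensorflow\models\slim

or set it permanently through the system environment-variable dialog.)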

 

Data preparation:

The dataset needs to follow the PASCAL VOC structure. The API provides create_pascal_tf_record.py to convert a VOC-structured dataset into the .record format; however, Datitran's scripts provide an easier way to produce .record files.

First, you need to label the objects in each image; the labelImg tool can be used for this, and each labeled sample produces an XML annotation file. These XML files are then placed in two directories, one for the training set and one for the validation set. Datitran provides an xml_to_csv.py script for the next step; you only need to point it at the annotation directory. Its main() function looks like this:

def main():
    # xml_to_csv() is defined earlier in Datitran's xml_to_csv.py; it walks the
    # annotation directory and collects one row per bounding box into a pandas DataFrame.
    # image_path = os.path.join(os.getcwd(), 'annotations')
    image_path = r'D:\training-sets\object-detection\sunglasses\label\test'
    xml_df = xml_to_csv(image_path)
    xml_df.to_csv('sunglasses_test_labels.csv', index=None)
    print('Successfully converted xml to csv.')
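Run this script once for the training annotation directory and once for the test directory, adjusting image_path and the output CSV name each time, so that each split gets its own CSV file.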

 

Next, the CSV files need to be converted to the .record format. Call generate_tfrecord.py, making sure to specify the --csv_input and --output_path parameters. Execute the following command:

python generate_tfrecord.py --csv_input=sunglasses_test_labels.csv --output_path=sunglass_test.record
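For orientation, here is a minimal sketch of what such a conversion does (this is not Datitran's actual generate_tfrecord.py); it assumes the CSV columns produced by xml_to_csv.py (filename, width, height, class, xmin, ymin, xmax, ymax), images stored in an images/ directory, and a single 'sunglasses' class mapped to id 1:

import os
import pandas as pd
import tensorflow as tf

def create_tf_example(row, image_dir):
    # Read the raw JPEG bytes; decoding happens inside the training input pipeline.
    with tf.gfile.GFile(os.path.join(image_dir, row['filename']), 'rb') as f:
        encoded_jpg = f.read()
    width, height = int(row['width']), int(row['height'])
    feature = {
        'image/encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded_jpg])),
        'image/format': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'jpg'])),
        'image/width': tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        'image/height': tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
        # Box coordinates are stored normalized to [0, 1].
        'image/object/bbox/xmin': tf.train.Feature(float_list=tf.train.FloatList(value=[float(row['xmin']) / width])),
        'image/object/bbox/xmax': tf.train.Feature(float_list=tf.train.FloatList(value=[float(row['xmax']) / width])),
        'image/object/bbox/ymin': tf.train.Feature(float_list=tf.train.FloatList(value=[float(row['ymin']) / height])),
        'image/object/bbox/ymax': tf.train.Feature(float_list=tf.train.FloatList(value=[float(row['ymax']) / height])),
        'image/object/class/text': tf.train.Feature(bytes_list=tf.train.BytesList(value=[row['class'].encode('utf8')])),
        # The numeric id must match the label map (1 = 'sunglasses' here).
        'image/object/class/label': tf.train.Feature(int64_list=tf.train.Int64List(value=[1])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

writer = tf.python_io.TFRecordWriter('sunglass_test.record')
for _, row in pd.read_csv('sunglasses_test_labels.csv').iterrows():
    writer.write(create_tf_example(row, 'images').SerializeToString())
writer.close()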

 

This produces the train.record and test.record files used for training and validation. Next, specify the label names: following the example of models/object_detection/data/pet_label_map.pbtxt, create a new file that maps ids to label names (id 0 is reserved for the background class, so ids start at 1):

item {
  id: 1
  name: 'sunglasses'
}

 

Training:

According to your needs, choose a model pre-trained on the COCO dataset and place the files with the model.ckpt prefix (model.ckpt.data-00000-of-00001, model.ckpt.index, model.ckpt.meta) in the training directory. The .meta file stores the graph and metadata, and the ckpt data files store the network weights; together they represent the initial state of the pre-trained model.

Open the ssd_mobilenet_v1_pets.config file and make the following changes:

  1. num_classes: change it to your own number of classes;

  2. Replace every PATH_TO_BE_CONFIGURED placeholder with the paths you set up earlier (5 places in total; see the fragment sketched after this list).

Keep the default values for all other parameters.
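For orientation, the relevant fragments of the config look roughly like this; the paths are examples for this article's setup, not the literal file contents:

model {
  ssd {
    num_classes: 1
    ...
  }
}
train_config: {
  fine_tune_checkpoint: "D:/training-sets/data-translate/training/model.ckpt"
  ...
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "D:/training-sets/data-translate/data/train.record"
  }
  label_map_path: "D:/training-sets/data-translate/data/sunglasses_label_map.pbtxt"
}
eval_input_reader: {
  tf_record_input_reader {
    input_path: "D:/training-sets/data-translate/data/test.record"
  }
  label_map_path: "D:/training-sets/data-translate/data/sunglasses_label_map.pbtxt"
}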

Once these files are ready, you can call the train script directly to start training:

python object_detection/train.py \
--logtostderr \
--pipeline_config_path=D:/training-sets/data-translate/training/ssd_mobilenet_v1_pets.config \
--train_dir=D:/training-sets/data-translate/training

 

TensorBoard monitoring:

The training process can be monitored with the TensorBoard tool. After running the command below, open localhost:6006 (the default) in a browser.

tensorboard --logdir=D:/training-sets/data-translate/training

 

TensorBoard shows many metric curves and even the model's network architecture. The author has not yet worked out the meaning of many of these metrics, but TensorBoard is clearly an extremely powerful tool. The Total_Loss curve gives a view of the overall training progress.

Overall, the loss curve does converge, and the training results are satisfactory. TensorFlow also provides the ability to check accuracy on the validation set during training, but the author ran into some problems calling it, so it is not described in detail here.
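For reference, a sketch of how the evaluation script is typically invoked (the flags mirror train.py; untested here, per the note above):

python object_detection/eval.py \
--logtostderr \
--pipeline_config_path=D:/training-sets/data-translate/training/ssd_mobilenet_v1_pets.config \
--checkpoint_dir=D:/training-sets/data-translate/training \
--eval_dir=D:/training-sets/data-translate/eval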

Exporting a frozen model:

Before checking the model's actual performance, the training checkpoints need to be exported into a .pb model file. tensorflow/python/tools/freeze_graph.py provides an API for freezing a model, but it requires the final output node names (usually the activation of the last layer, such as softmax). For the pre-trained networks provided by the Object Detection API, the final node names are not easy to find, so the object_detection directory also provides export_inference_graph.py:

python export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path D:/training-sets/data-translate/training/ssd_mobilenet_v1_pets.config \
--trained_checkpoint_prefix D:/training-sets/data-translate/training/model.ckpt-* \
--output_directory D:/training-sets/data-translate/training/result

 

After the export completes, the output_directory will contain the frozen_inference_graph.pb file along with model.ckpt.data-00000-of-00001, model.ckpt.meta, and model.ckpt.index files.

Calling the generated model:

The directory already contains an example of calling the model; slightly adapted, it looks like this:

import cv2
import numpy as np
import tensorflow as tf
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util


class TOD(object):
    def __init__(self):
        self.PATH_TO_CKPT = r'D:\lib\tf-model\models-master\object_detection\training\frozen_inference_graph.pb'
        self.PATH_TO_LABELS = r'D:\lib\tf-model\models-master\object_detection\training\sunglasses_label_map.pbtxt'
        self.NUM_CLASSES = 1
        self.detection_graph = self._load_model()
        self.category_index = self._load_label_map()

    def _load_model(self):
        detection_graph = tf.Graph()
        with detection_graph.as_default():
            od_graph_def = tf.GraphDef()
            with tf.gfile.GFile(self.PATH_TO_CKPT, 'rb') as fid:
                serialized_graph = fid.read()
                od_graph_def.ParseFromString(serialized_graph)
                tf.import_graph_def(od_graph_def, name='')
        return detection_graph

    def _load_label_map(self):
        label_map = label_map_util.load_labelmap(self.PATH_TO_LABELS)
        categories = label_map_util.convert_label_map_to_categories(label_map,
                                                                    max_num_classes=self.NUM_CLASSES,
                                                                    use_display_name=True)
        category_index = label_map_util.create_category_index(categories)
        return category_index

    def detect(self, image):
        with self.detection_graph.as_default():
            with tf.Session(graph=self.detection_graph) as sess:
                # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
                image_np_expanded = np.expand_dims(image, axis=0)
                image_tensor = self.detection_graph.get_tensor_by_name('image_tensor:0')
                boxes = self.detection_graph.get_tensor_by_name('detection_boxes:0')
                scores = self.detection_graph.get_tensor_by_name('detection_scores:0')
                classes = self.detection_graph.get_tensor_by_name('detection_classes:0')
                num_detections = self.detection_graph.get_tensor_by_name('num_detections:0')
                # Actual detection.
                (boxes, scores, classes, num_detections) = sess.run(
                    [boxes, scores, classes, num_detections],
                    feed_dict={image_tensor: image_np_expanded})
                # Visualization of the results of a detection.
                vis_util.visualize_boxes_and_labels_on_image_array(
                    image,
                    np.squeeze(boxes),
                    np.squeeze(classes).astype(np.int32),
                    np.squeeze(scores),
                    self.category_index,
                    use_normalized_coordinates=True,
                    line_thickness=8)

        cv2.namedWindow("detection", cv2.WINDOW_NORMAL)
        cv2.imshow("detection", image)
        cv2.waitKey(0)

if __name__ == '__main__':
    image = cv2.imread('image.jpg')
    detector = TOD()
    detector.detect(image)
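One design note on the example above: detect() builds a new tf.Session for every image, which is slow; for repeated detection you would create the session once (for example in __init__) and reuse it across calls.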

 

Below are the detection results on some example images:


This article has been published by the Tencent Cloud technical community with the author's permission; please credit the source when reprinting.
Original link: https://cloud.tencent.com/community/article/351424


