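Simple real-time detection of moving objects with the Google object_detection API and OpenCV

This is my first blog post on CSDN. If you find mistakes or anything you are unhappy with, please point them out or contact me. I hope this article helps some people.

This is the second revised version.

AI is very popular right now, and I have been trying to build a few small things myself. I recently trained a model that detects black-framed glasses and then got it running in real time. This article is a record of that work, and I hope it helps others too.

The figure below shows the simplest, most basic real-time detection I put together; the algorithm is probably not especially good.

The code is on my GitHub at https://github.com/CExplorer/real_time-detection. It mainly consists of modifications to the files in the object_detection folder. If that means nothing to you yet, don't worry; it will make sense by the end of this article.

(Note: the person in the video is not me; I was holding the laptop at the time.)

I am not very experienced myself and have taken plenty of detours. I often searched CSDN blogs for ways to fix bugs, but following a post step by step did not always work. Sometimes the author's environment was already configured and mine was not; sometimes the author considered a step too simple to mention, while for a beginner like me it was still hard. So in this first post I want to write down the whole process of building this detector in as much detail as I can. If you run into problems, feel free to ask me. My skills are not great, but I hope to help more people while improving myself. This post still has some gaps, so if you have questions, leave a comment below or add me on QQ (2389388826) and I will do my best to help.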

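For object detection you first need to set up the environment. I wrote everything in Python 3 with the TensorFlow library, on Windows. I strongly recommend installing Anaconda, which bundles many libraries and is a very convenient way to manage packages. You can download it from the official site: https://www.anaconda.com/download/#windows

I use Python 3. It has been officially announced that Python 2 will no longer be maintained after 2020, so I recommend downloading Python 3.

If you have never installed Python, you don't need to install it separately; Anaconda installs it for you. If you already have it, that's fine too; just install Anaconda directly. Apart from choosing the install path, you hardly need to change anything in the installer, with one exception, shown in the figure below.

The top checkbox is unchecked by default. I recommend checking it; otherwise you have to set up the environment path yourself, which is a real nuisance for beginners.

I found a blog post about installing Anaconda that looks quite detailed. Its author used Anaconda3-4.2.0-Windows-x86_64.exe; you can download the latest version instead, since the installation is much the same. To save disk space, I suggest choosing your own install path rather than the default. The post is at https://blog.csdn.net/u012318074/article/details/77075209

Once Anaconda is installed, you can install TensorFlow. TensorFlow comes in a CPU version and a GPU version. My computer is limited, so I use the CPU version. The GPU version also requires CUDA; you can search for how to install it. I have not installed it myself, so I won't cover it here and will only briefly describe installing the CPU version. If you want to know the difference between the two, you can look it up online.

Press Win+R to open the Run dialog, type cmd to open a command prompt, and enter: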

pip install tensorflow

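Wait for the installation to finish. To verify it, open a cmd prompt, type python to start the Python interpreter, and enter the following lines one by one. Note that these lines use the TensorFlow 1.x API; in TensorFlow 2.x there is no top-level tf.Session.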

import tensorflow as tf
t = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(t))

In practice, if the first line doesn't raise an error, the rest almost never does. The output is

b'Hello, TensorFlow!'

Next, install OpenCV for Python. Still in the command prompt, enter the following commands one after the other:

pip install --upgrade setuptools
pip install opencv-python

After installation, open a Python IDE (editor) and verify with the following code (your computer needs a camera; mine is a laptop):

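import cv2

cap = cv2.VideoCapture(0)  # 0 selects the first camera
while True:
    ret, frame = cap.read()
    if not ret:  # stop if no frame could be read
        break
    cv2.imshow("capture", frame)
    cv2.imwrite("fangjian2.jpg", frame)  # overwrite the file with the latest frame
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()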

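Run it. If you get an error about a missing package, just run pip install (name of the missing package) in the cmd prompt; do the same for similar errors later on.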
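If you would rather find out up front which packages are missing, a small sketch like the one below may help. It uses only the standard library. The list of module names is my own assumption about what this tutorial needs, so adjust it to your setup. Keep in mind that the import name is not always the pip name: cv2 is installed with pip install opencv-python, and PIL with pip install Pillow.

```python
import importlib.util

# Import names this tutorial relies on (an assumed list; adjust as needed).
# The import name can differ from the pip name: cv2 -> opencv-python,
# PIL -> Pillow.
REQUIRED_MODULES = ["tensorflow", "cv2", "numpy", "pandas", "PIL", "matplotlib"]


def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

Anything it reports as missing can then be installed with pip as described above.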

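If everything is configured, a window appears showing what your camera sees. Press q to end the program.

If you have trouble installing the packages above, look for solutions online; I will also help where I can.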

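Next, download the TensorFlow API that Google has written: https://github.com/tensorflow/models

You can download the zip archive, or run git clone (the URL you want to download) in cmd. Be sure to change to the target folder before running the command; otherwise the repository is cloned somewhere else, possibly not where you want it. One advantage over downloading the archive is that there is nothing to unzip.

When the download finishes, put the (unzipped) folder somewhere convenient. I put it directly in the root of drive D.

Another important step is configuring the environment variables. Open the environment variables dialog and start editing.

Create a new system variable named PYTHONPATH whose value contains the paths of several folders from the tensorflow-models directory you downloaded.

Adjust these to your own install location. The important ones are tensorflow-models itself and the research and slim folders inside it. If this is not set up correctly, training may later fail with an error such as No module named 'object_detection'.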
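To make the value of PYTHONPATH concrete, here is a minimal sketch that builds it from the folder where you put tensorflow-models. The function name build_pythonpath and the D:\tensorflow-models location are only illustrative; I am assuming the usual layout in which slim sits inside research.

```python
import os


def build_pythonpath(models_root):
    """Return a PYTHONPATH value for a tensorflow-models checkout."""
    research = os.path.join(models_root, "research")
    slim = os.path.join(research, "slim")  # assumed layout: research/slim
    # os.pathsep is ';' on Windows and ':' on Linux/macOS.
    return os.pathsep.join([models_root, research, slim])


# Hypothetical location; replace it with wherever you put the folder.
print(build_pythonpath(r"D:\tensorflow-models"))
```

On Windows the printed string can be pasted as the value of the PYTHONPATH system variable.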

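Our goal is real-time object detection. Google's tensorflow_models contains many models, but we only need object detection, which lives in tensorflow_models/research/object_detection. That folder is the one we will mainly use; if other models interest you, try training them yourself.

Next, download protobuf, which is used to convert the .proto files in tensorflow_models/research/object_detection/protos/ into .py files. Go to https://github.com/google/protobuf/releases and download protoc-3.4.0-win32.zip. I tried newer versions, but at least on my machine the compilation had problems and the resulting files were unusable. The version itself is not a big issue.

Unzip the archive. Inside is a bin folder containing protoc.exe; copy that file into C:\Windows\System32. Then, in cmd, change to the tensorflow_models/research folder and run the following command to compile the files: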

protoc object_detection/protos/*.proto --python_out=.

As the figure below shows, the target folder now contains many compiled .py files.
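If the *.proto wildcard does not work in your shell, one possible workaround is to call protoc once per file. The sketch below builds those commands in Python; protoc_commands and compile_protos are names I made up for illustration, and the sketch assumes protoc is on your PATH and that you pass the path of the research folder.

```python
import glob
import os
import subprocess


def protoc_commands(research_dir):
    """Build one protoc command (as an argument list) per .proto file."""
    pattern = os.path.join(research_dir, "object_detection", "protos", "*.proto")
    commands = []
    for proto_path in sorted(glob.glob(pattern)):
        # Paths are relative to research_dir and use forward slashes, so each
        # command looks like the one typed by hand above.
        rel = os.path.relpath(proto_path, research_dir).replace(os.sep, "/")
        commands.append(["protoc", rel, "--python_out=."])
    return commands


def compile_protos(research_dir):
    """Run every command from inside research_dir; stop on the first error."""
    for cmd in protoc_commands(research_dir):
        subprocess.run(cmd, cwd=research_dir, check=True)
```

Calling compile_protos with the path of your research folder should leave the same generated .py files in object_detection/protos as the single wildcard command.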

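That completes the preparation. Now let's get hands-on.

First, test with the images Google provides. Installing Anaconda also installed Jupyter Notebook, a very convenient environment for writing Python. It opens in a web browser but does not need an internet connection. In cmd, change to the object_detection folder (a subfolder of tensorflow_models; the relative path is ./research/object_detection) and run: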

jupyter notebook

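After a moment the page opens. Open object_detection_tutorial.ipynb and run the cells in order. As I recall, one cell in the middle may fail with a connection error. That is not a problem with the code or your environment; it is just like a website failing to load, so try a few more times. If it still fails, don't worry: that cell only downloads Google's pretrained model, and later we will train our own model and detect with that instead.

We can see that it has detected the dogs.

Now let's try training our own model.

My first project was training a model to detect black-framed glasses. A senior student of mine had done it, so I set out to work it out on my own. I learned that Google's API is ready-made code, so my plan was to make some initial changes, train my own model, and get real-time detection working.

First, I searched Baidu for 100 pictures of people wearing black-framed glasses:

Put these images in tensorflow-models\research\object_detection\images\train (create the folders if they don't exist). If we trained on the raw images, though, we would be detecting people; the images have to be annotated to tell the computer where the glasses are.

I recommend labelImg, a small annotation tool. Download it from https://github.com/tzutalin/labelImg and run its labelImg.py file to open the annotation window. Alternatively, in cmd run: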

pip install labelimg

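This installs it automatically, and you can then launch the tool just by typing labelimg in cmd.

Click Open Dir and select tensorflow-models\research\object_detection\images\train to load all the images at once for annotation. A and D switch to the previous and next image, and W starts drawing a box. All boxes are rectangles. Draw a box around the glasses and give it a label; I used glasses.

Note: labelImg needs quite a few packages before it will run. Whatever package it says is missing, just pip install it. One of them may be PyQt4: the code falls back to PyQt4 when PyQt5 is not found, so the error only mentions PyQt4, which may no longer be downloadable. Simply run pip install PyQt5. After annotating each image, press Ctrl+S; the default save location, next to the image, is fine.

The annotations are saved as XML, but TensorFlow needs its own TFRecord format as input.

Here I borrow programs someone else has written. His GitHub repository is https://github.com/XiangGuo1992/Screen-Vehicle-Detection-using-Tensorflow-API; you can download his xml_to_csv.py and generate_tfrecord.py. I will paste the code here directly. Note that it cannot be used as-is; you must adapt it to your needs!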

import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET
 
os.chdir('D:\\test\\test_images\\frame2')
path = 'D:\\test\\test_images\\frame2'
 
def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),
                     int(root.find('size')[1].text),
                     member[0].text,
                     int(member[4][0].text),
                     int(member[4][1].text),
                     int(member[4][2].text),
                     int(member[4][3].text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df
 
 
def main():
    image_path = path
    xml_df = xml_to_csv(image_path)
    xml_df.to_csv('tv_vehicle_labels.csv', index=None)
    print('Successfully converted xml to csv.')

main()

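This script converts all the XML files into a single CSV file. As its author requires, change the paths in the os.chdir(...) and path = ... lines (lines 6 and 7) to the folder that holds your XML files. Then, in the to_csv call inside main(), you can replace tv_vehicle_labels.csv with whatever output name you like. I changed it to glasses.csv; be sure to keep the .csv extension.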
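The script above reads the XML by position (member[0], member[4][0] and so on), which makes it hard to see what is being read. As a sketch of the same extraction using tag names, the example below parses a made-up annotation in the Pascal VOC layout that labelImg writes. The sample values and the function name parse_annotation are mine, not from the original script.

```python
import xml.etree.ElementTree as ET

# A made-up annotation in the layout that labelImg saves.
SAMPLE_XML = """<annotation>
  <filename>image1.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>glasses</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>210</xmin><ymin>150</ymin><xmax>430</xmax><ymax>230</ymax>
    </bndbox>
  </object>
</annotation>"""


def parse_annotation(xml_text):
    """Return one (filename, width, height, class, xmin, ymin, xmax, ymax)
    tuple per annotated box, looking fields up by tag name."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append((
            root.find("filename").text,
            int(size.find("width").text),
            int(size.find("height").text),
            obj.find("name").text,
            int(box.find("xmin").text),
            int(box.find("ymin").text),
            int(box.find("xmax").text),
            int(box.find("ymax").text),
        ))
    return rows


for row in parse_annotation(SAMPLE_XML):
    print(row)
```

Each tuple has the same column order as the CSV written by xml_to_csv.py.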

Next, convert this CSV file to the TFRecord format.

"""
Usage:
  # From tensorflow/models/
  # Create train data:
  python generate_tfrecord.py --csv_input=data/tv_vehicle_labels.csv  --output_path=train.record
  # Create test data:
  python generate_tfrecord.py --csv_input=data/test_labels.csv  --output_path=test.record
"""

import os
import io
import pandas as pd
import tensorflow as tf
 
from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict
 
os.chdir('D:\\tensorflow-model\\research\\object_detection\\')
# Change this to the path of your own object_detection folder
 
flags = tf.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
FLAGS = flags.FLAGS
 
 
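# TO-DO replace this with label map
# Be sure to change these labels to your own classes!
def class_text_to_int(row_label):
    if row_label == 'tv':
        return 1
    elif row_label == 'vehicle':
        return 2
    else:
        # Fail early with a clear message instead of passing None along.
        raise ValueError('Unknown label: {}'.format(row_label))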
 
 
def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]
 
 
def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size
 
    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []
 
    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))
 
    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example
 
 
def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(os.getcwd(), 'images')
    examples = pd.read_csv(FLAGS.csv_input)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())
 
    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))
 
 
if __name__ == '__main__':
    tf.app.run()

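Besides the output path, we also need to change his class_text_to_int(row_label) function. I annotated only one class, named glasses, so a single branch is enough. If you annotated more labels, add more branches. After my change the function looks like this:

def class_text_to_int(row_label):
    if row_label == 'glasses':
        return 1
    else:
        # Fail early with a clear message instead of passing None along.
        raise ValueError('Unknown label: {}'.format(row_label))

Put this file in the object_detection folder, open cmd, change to that folder too, and run: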

python generate_tfrecord.py --csv_input=data/train/glasses.csv  --output_path=data/train.record

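Here --csv_input is the relative location of the CSV file generated earlier, and --output_path is the relative location of the output. Let's just put it in the data folder.

Now we have the file in record format, which is what training requires. There is one quirk here: during conversion the original images must be in the images folder. This comes from the conversion code, in the second line of main (the path = ... assignment); you can change the folder name there, otherwise the images have to be in images. Every file listed in the CSV must also have its image there. Do it this way for now; I will improve the code and the tutorial later.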
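Since every file named in the CSV has to exist in the images folder, it can save time to check this before running the conversion. The following is a minimal sketch using only the standard library. The function name missing_images is hypothetical, and it assumes the CSV has the filename column produced by xml_to_csv.py.

```python
import csv
import os


def missing_images(csv_path, image_dir):
    """Return file names listed in the CSV that are not present in image_dir."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        names = {row["filename"] for row in csv.DictReader(f)}
    return sorted(name for name in names
                  if not os.path.isfile(os.path.join(image_dir, name)))
```

For example, run from the object_detection folder, missing_images("data/train/glasses.csv", "images") should return an empty list when everything is in place.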

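Training needs a model architecture to start from, such as Faster R-CNN or SSD. We simply take one of these; different models suit different purposes. Let's use SSD. You can download the config files here: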

https://github.com/tensorflow/models/tree/master/research/object_detection/samples/configs

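I chose ssd_mobilenet_v1_coco.config; feel free to try other models. I will use this one as the example:

After downloading it, put it in the training folder and edit it with a text editor. I recommend Notepad++, which you can find and download online; the built-in Notepad also works for editing.

Make the following changes:

Set the number after num_classes according to your own labels. I have only one, glasses, so I changed it to 1.

batch_size is the number of images fed to the network at a time. If your computer is not very powerful, lower it as far as you need; each step runs faster with a smaller value, though training may be less accurate than with 24. If your hardware is good, leave it at the default of 24.

The upper one here is the relative path of the training TFRecord. Because we run the training command from the object_detection folder, the paths can simply be given relative to that folder.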
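The edits described above can also be scripted. The sketch below applies them to an abbreviated stand-in for the config; the real file has many more fields. The field names input_path and label_map_path are how I remember the sample config, so check them against your copy. patch_config is a name I made up, and the sketch assumes the training reader appears before the evaluation reader in the file.

```python
import re

# Abbreviated stand-in for the relevant parts of the sample config.
SAMPLE_CONFIG = '''model {
  ssd {
    num_classes: 90
  }
}
train_config: {
  batch_size: 24
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}
'''


def patch_config(text, num_classes, batch_size, record_path, label_map_path):
    """Return the config text with the fields for our own data filled in."""
    # Function replacements avoid backslash surprises in Windows-style paths.
    text = re.sub(r"num_classes:\s*\d+",
                  lambda m: "num_classes: {}".format(num_classes), text, count=1)
    text = re.sub(r"batch_size:\s*\d+",
                  lambda m: "batch_size: {}".format(batch_size), text, count=1)
    # count=1 touches only the first occurrence, assumed to be the train reader.
    text = re.sub(r'input_path:\s*"[^"]*"',
                  lambda m: 'input_path: "{}"'.format(record_path), text, count=1)
    text = re.sub(r'label_map_path:\s*"[^"]*"',
                  lambda m: 'label_map_path: "{}"'.format(label_map_path), text, count=1)
    return text


print(patch_config(SAMPLE_CONFIG, 1, 8, "data/train.record", "data/glasses.pbtxt"))
```

Editing the file by hand in Notepad++ gives the same result; the script is only convenient if you switch datasets often.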

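Now go to the data folder, where you will see many .pbtxt files. We write one of our own. I named mine glasses.pbtxt, and since I have only the one label glasses, it contains a single entry: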

item {
  id: 1
  name: 'glasses'
}

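If you annotated more labels, add more entries like this one, changing id to 2 and so on with the matching name. Remember that the names here must be exactly the same as the ones you used when annotating!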
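Because the label names must match in three places (the annotations, class_text_to_int and the .pbtxt file), one option is to generate the last two from a single list. This is only a sketch; make_label_map and make_class_lookup are hypothetical helpers, not part of the API.

```python
def make_label_map(labels):
    """Build label-map text with ids starting at 1, in the order given."""
    items = []
    for idx, name in enumerate(labels, start=1):
        items.append("item {{\n  id: {}\n  name: '{}'\n}}\n".format(idx, name))
    return "\n".join(items)


def make_class_lookup(labels):
    """Build the name -> id mapping needed when writing the TFRecord."""
    return {name: idx for idx, name in enumerate(labels, start=1)}


print(make_label_map(["glasses"]))
```

With a single label the printed text is the same entry as in glasses.pbtxt above, and make_class_lookup(["glasses"])["glasses"] gives the id 1 that class_text_to_int returns.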

Now let's start training. In cmd, change to the object_detection folder and run:

python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_coco.config

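This saves the training output in the training folder (a relative path) and runs the SSD model. You can look into the other models and swap in a different one.

If training stops partway, just run the same command again and it continues from where it left off.

We can open the visualization tool TensorBoard to watch the loss: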

tensorboard --logdir=training

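We then need to convert the data saved during training into .pb format. The command, entered on a single line in cmd, is:

python export_inference_graph.py --input_type image_tensor --pipeline_config_path training/ssd_mobilenet_v1_coco.config --trained_checkpoint_prefix training/model.ckpt-****** --output_directory glasses1

The ****** after model.ckpt- is the largest step number saved during your training, which you can find by opening the training folder. output_directory is the name of the folder where the .pb file is saved.

These are the models I saved at increasing step counts.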
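Rather than reading the largest step number off the file list by eye, you could let a few lines of Python find it. The sketch assumes the checkpoints in the training folder include files named model.ckpt-<step>.index, which is what I see with TensorFlow 1.x; latest_checkpoint_step is a made-up name.

```python
import re


def latest_checkpoint_step(filenames):
    """Return the largest step among model.ckpt-<step>.index names, or None."""
    steps = []
    for name in filenames:
        match = re.fullmatch(r"model\.ckpt-(\d+)\.index", name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None


# Example with made-up file names:
print(latest_checkpoint_step([
    "checkpoint",
    "model.ckpt-20000.index",
    "model.ckpt-30000.index",
    "model.ckpt-30000.meta",
]))
```

In practice you would pass os.listdir("training") and put the returned number in place of the asterisks in the export command.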

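Now we can test. In cmd, change to the object_detection folder and run jupyter notebook.

There is an existing object_detection_tutorial.ipynb file. Keep the original, make a copy, and give the copy a new name.

Then make a few simple changes:

Change the first highlighted part to the name of your exported model folder (mine is glasses), the second to the .pbtxt file we wrote earlier, and the third to the number of labels; I have only one, so I wrote 1. Adjust these to your own case. Delete the parts crossed out in black.

The download section can simply be deleted.

Pay attention here: the notebook tests images whose names start with image followed by a number taken from the range. As it stands, it goes through image1 and image2 in the test_images folder.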
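To see how the numbers in range turn into file names, here is a small standalone sketch of the same list comprehension. The wrapper function image_paths is mine; the notebook builds the list inline.

```python
import os

PATH_TO_TEST_IMAGES_DIR = 'test_images'


def image_paths(first, last):
    """Paths for image<first>.jpg through image<last>.jpg, both ends included."""
    # range() excludes its upper bound, hence last + 1.
    return [os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i))
            for i in range(first, last + 1)]


print(image_paths(1, 2))
```

So to test ten images named image1.jpg to image10.jpg, the range in the notebook would be range(1, 11).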

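That's all the changes. Put an image in the folder (I simply renamed mine image1) and run the test. Note that if no boxes appear at first, it is only because the model has not trained for enough steps; with at least 30,000 steps or so, boxes should start to appear.

At this point we can train our own model for image detection. So how do we make it real-time? This is where OpenCV comes in: we detect one frame at a time, output each frame as soon as it is processed, and show the frames in sequence, which gives a video stream, that is, real-time detection. I have two versions of the code, offered for reference only. Both are plain .py files that you can run directly. The first is rather laggy; the second is much smoother. Later I will upload my code to my GitHub for reference.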

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import cv2
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

from utils import label_map_util

from utils import visualization_utils as vis_util
# This is needed since the script is stored in the object_detection folder.
# A raw string keeps \t and \r in the Windows path from being read as escapes.
sys.path.append(r"D:\tensorflow-models\research\object_detection")
from object_detection.utils import ops as utils_ops

# Compare version numbers numerically; a plain string comparison would
# wrongly rank '1.10.0' below '1.4.0'.
if tuple(int(p) for p in tf.__version__.split('.')[:2]) < (1, 4):
    raise ImportError('Please upgrade your tensorflow installation to v1.4.* or later!')

# What model to download.
MODEL_NAME = 'Glasses13'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'glasses.pbtxt')

NUM_CLASSES = 1

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)

# For the sake of simplicity we will use only 2 images:
# image1.jpg
# image2.jpg
# If you want to test the code with your images, just add path to the images to the TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = 'test_images'
TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 51) ]

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)


def run_inference_for_single_image(image, graph):
    with graph.as_default():
        with tf.Session() as sess:
        # Get handles to input and output tensors
            ops = tf.get_default_graph().get_operations()
            all_tensor_names = {output.name for op in ops for output in op.outputs}
            tensor_dict = {}
            for key in [
            'num_detections', 'detection_boxes', 'detection_scores',
            'detection_classes', 'detection_masks'
            ]:
                tensor_name = key + ':0'
                if tensor_name in all_tensor_names:
                    tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
                    tensor_name)
            if 'detection_masks' in tensor_dict:
                # The following processing is only for single image
                detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
                detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
                # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
                real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
                detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
                detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
                detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
                    detection_masks, detection_boxes, image.shape[0], image.shape[1])
                detection_masks_reframed = tf.cast(
                    tf.greater(detection_masks_reframed, 0.5), tf.uint8)
                # Follow the convention by adding back the batch dimension
                tensor_dict['detection_masks'] = tf.expand_dims(
                    detection_masks_reframed, 0)
            image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

        # Run inference
            output_dict = sess.run(tensor_dict,
                                feed_dict={image_tensor: np.expand_dims(image, 0)})

        # all outputs are float32 numpy arrays, so convert types as appropriate
            output_dict['num_detections'] = int(output_dict['num_detections'][0])
            output_dict['detection_classes'] = output_dict['detection_classes'][0].astype(np.uint8)
            output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
            output_dict['detection_scores'] = output_dict['detection_scores'][0]
            if 'detection_masks' in output_dict:
                output_dict['detection_masks'] = output_dict['detection_masks'][0]
    return output_dict

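cap = cv2.VideoCapture(0)
j=1
for image_path in TEST_IMAGE_PATHS:
  ret, frame = cap.read()
  if not ret:  # stop if the camera returned no frame
    break
  # Save the frame under the name that image_path refers to (image<j>.jpg).
  cv2.imwrite("./test_images/image{}.jpg".format(j), frame)
  image = Image.open(image_path)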
  # the array based representation of the image will be used later in order to prepare the
  # result image with boxes and labels on it.
  image_np = load_image_into_numpy_array(image)
  # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
  image_np_expanded = np.expand_dims(image_np, axis=0)
  # Actual detection.
  output_dict = run_inference_for_single_image(image_np, detection_graph)
  # Visualization of the results of a detection.
  vis_util.visualize_boxes_and_labels_on_image_array(
          image_np,
          output_dict['detection_boxes'],
          output_dict['detection_classes'],
          output_dict['detection_scores'],
          category_index,
          instance_masks=output_dict.get('detection_masks'),
          use_normalized_coordinates=True,
          line_thickness=8)
  # Save the annotated frame so OpenCV can read it back for display.
  img = Image.fromarray(image_np.astype('uint8')).convert('RGB')
  img.save("./test_images/image_processed{}.jpg".format(j))
  cv2.imshow("before", frame)
  src=cv2.imread("./test_images/image_processed{}.jpg".format(j))       
  cv2.namedWindow('after', cv2.WINDOW_AUTOSIZE)
  cv2.imshow('after', src)
  j=j+1
  if cv2.waitKey(1) & 0xFF == ord('q'):
    break
cap.release()
cv2.destroyAllWindows()

The second script:

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import cv2
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

from utils import label_map_util

from utils import visualization_utils as vis_util
# This is needed since the script is stored in the object_detection folder.
# A raw string keeps \t and \r in the Windows path from being read as escapes.
sys.path.append(r"D:\tensorflow-models\research\object_detection")
from object_detection.utils import ops as utils_ops

# Compare version numbers numerically; a plain string comparison would
# wrongly rank '1.10.0' below '1.4.0'.
if tuple(int(p) for p in tf.__version__.split('.')[:2]) < (1, 4):
    raise ImportError('Please upgrade your tensorflow installation to v1.4.* or later!')

# What model to download.
MODEL_NAME = 'Glasses13'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'glasses.pbtxt')

NUM_CLASSES = 1

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

cap=cv2.VideoCapture(0) # 0 stands for very first webcam attach
filename="D:\\tensorflow-models\\research\\object_detection\\a.avi"#[place were i stored my output file]
codec=cv2.VideoWriter_fourcc('m','p','4','v')#fourcc stands for four character code
framerate=30
resolution=(640,480)
    
VideoFileOutput=cv2.VideoWriter(filename,codec,framerate, resolution)
    
with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        # Look up the input and output tensors once, before the loop.
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        # Each box represents a part of the image where a particular object was detected.
        detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        # Each score is the level of confidence for the corresponding object.
        # Score is shown on the result image, together with the class label.
        detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
        detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
        num_detections = detection_graph.get_tensor_by_name('num_detections:0')

        while True:
            ret, image_np = cap.read()
            if not ret:  # stop when no frame could be read
                break
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            # Actual detection.
            (boxes, scores, classes, num) = sess.run(
                [detection_boxes, detection_scores, detection_classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            # Visualization of the results of a detection.
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8)

            VideoFileOutput.write(image_np)
            cv2.imshow('real-time detection', image_np)
            if cv2.waitKey(25) & 0xFF == ord('q'):
                break

# Release the camera, finish the video file and close the window.
cap.release()
VideoFileOutput.release()
cv2.destroyAllWindows()

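This second setup also turns the detection directly into a video. You can also use the model Google has already trained, and then it will detect all sorts of everyday objects straight away.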
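Both scripts leave the drawing to vis_util. If you want the pixel coordinates yourself, for example to crop the detected glasses, the boxes first have to be scaled, because the model returns them normalized to the 0..1 range in the order [ymin, xmin, ymax, xmax]. The sketch below shows that conversion in plain Python; boxes_to_pixels is a hypothetical helper, and the 0.5 score threshold is an arbitrary choice for this example.

```python
def boxes_to_pixels(boxes, scores, width, height, min_score=0.5):
    """Convert normalized [ymin, xmin, ymax, xmax] boxes to pixel corners.

    Returns (left, top, right, bottom, score) tuples for every detection
    whose score is at least min_score.
    """
    result = []
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score < min_score:
            continue  # skip low-confidence detections
        left, top = int(xmin * width), int(ymin * height)
        right, bottom = int(xmax * width), int(ymax * height)
        result.append((left, top, right, bottom, score))
    return result


# Made-up detections on a 640x480 frame:
print(boxes_to_pixels([[0.25, 0.5, 0.75, 1.0], [0.0, 0.0, 1.0, 1.0]],
                      [0.9, 0.2], width=640, height=480))
```

In the second script you would pass np.squeeze(boxes) and np.squeeze(scores) together with the frame width and height.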

The code is on my GitHub: https://github.com/CExplorer/real_time-detection

If anything is unclear, contact me and I will do my best to answer.

CExploer
