TensorFlow unmanned vehicle: using SSD (Single Shot MultiBox Detector) on the embedded side to recognize objects, plus an introduction to tf.Graph

The environment is a Raspberry Pi 3B+. Of course, TensorFlow does not have to be installed on a Raspberry Pi specifically; any ARM-architecture device will do, which covers most embedded systems currently on the market, since they use this reduced instruction set.
For detection on the desktop side, interested readers can refer to the SSD (Single Shot MultiBox Detector) series: the improved SSD for computer vision (smooth L1 loss and focal loss).

The operating system is Linux, so let's first get familiar with our hardware environment, mainly by checking which chip architecture the machine uses.

1. Checking the chip architecture on Linux

As mentioned earlier, TensorFlow needs to be installed on an ARM architecture, so how do we check the current chip architecture from the command line?
Since these are all Linux systems, the commands are the same everywhere; any one of the following three will show the architecture:

arch
uname -a
file /bin/bash
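
You can also query the architecture from within Python using the standard-library platform module; this snippet is my own convenience addition, not part of the original post:

import platform

print(platform.machine())   # 'x86_64' on a typical PC, 'aarch64' or 'armv7l' on ARM boards
print(platform.system(), platform.release())   # OS name and kernel release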

x86 output (I ran this under WSL):

Here the result is x86_64, the 64-bit extension of x86. It was originally designed by AMD, so it is also called "AMD64"; Intel adopted it later and calls it "Intel64".
ARM output:

The aarch64 here is the 64-bit instruction set introduced with the ARMv8-A architecture, which belongs to the ARM family.
There are also Linux-based cash registers (point-of-sale machines); because they consume few resources, a low-end chip is enough, typically a Pentium-class one, and there the command returns i686.

2. Installation environment

Install tensorflow, choosing the version that suits your situation; the latest build in the repository below is tensorflow-2.4.0-cp35-none-linux_armv6l.whl
# https://github.com/lhelontra/tensorflow-on-arm/releases
pip install tensorflow-1.8.0-cp27-none-linux_armv7l.whl
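
After the wheel installs, a quick sanity check (my own addition) confirms the import works and shows the version:

import tensorflow as tf
print(tf.__version__)   # e.g. 1.8.0, depending on the wheel you installed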

2.1. The SSD mobile model

# http://download.tensorflow.org/models/object_detection/ssdlite_mobilenet_v2_coco_2018_05_09.tar.gz
tar -xzvf ssdlite_mobilenet_v2_coco_2018_05_09.tar.gz
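
A quick check (my addition) that the frozen graph landed where the code in section 3.5 expects it:

import os
print(os.path.isfile('ssdlite_mobilenet_v2_coco_2018_05_09/frozen_inference_graph.pb'))   # True if extraction worked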

2.2. Install the OpenCV vision library

pip install opencv-python
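
As with TensorFlow, a one-line import check (my addition) verifies the install:

import cv2
print(cv2.__version__)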

2.3. Run the demo from the command line

python opencv_camera.py
# Or, in Jupyter, %load copies the file's content into the cell; run the cell afterwards
%load opencv_camera.py

3. Object recognition

Next comes the actual operation: we let the camera detect objects and mark each object's category. A CSI camera is used here; if yours has a USB interface, the code contains commented-out lines for that too.

3.1. Import related libraries

Import the related libraries; the address of the source code for object_detection and the rest is given at the end. If you are interested in testing, clone it and play around.

import numpy as np
import cv2
import os,time
import tensorflow as tf
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_utils
import ipywidgets.widgets as widgets
from image_fun import bgr8_to_jpeg

3.2. Camera initialization

Initialize the camera. Code for a USB camera is also given here; I borrow the camera on the unmanned vehicle, which uses the CSI interface. If your camera is USB, you can find its device index with the ls /dev/video* command and then use camera = USBCamera(capture_device=1).

#from jetcam.usb_camera import USBCamera
from jetcam.csi_camera import CSICamera
from jetcam.utils import bgr8_to_jpeg

#camera = USBCamera(width=320, height=240, capture_fps=30)
camera = CSICamera(width=320, height=240, capture_fps=30)
# Set the camera to running = True so callbacks are invoked for new frames
camera.running = True

3.3. Install JetCam

Of course, if JetCam is not installed yet, install it first. JetCam is an easy-to-use Python camera interface for NVIDIA Jetson. To install jetcam:

git clone https://github.com/NVIDIA-AI-IOT/jetcam
cd jetcam

sudo python3 setup.py install

3.4. Image display component

With the camera initialized, we create an image widget in Jupyter to display the data captured by the camera:

image_widget = widgets.Image(format='jpg', width=320, height=240)
display(image_widget)
image_widget.value = bgr8_to_jpeg(camera.value)

After running this, a picture appears, but it is only the first still frame. To update it continuously, use one of the two update methods described below.
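
By the way, bgr8_to_jpeg is just a thin wrapper around OpenCV's JPEG encoder; if you are not using jetcam, a minimal equivalent could look like this (my sketch):

import cv2

def bgr8_to_jpeg(value):
    # Encode a BGR8 numpy array as JPEG bytes for the ipywidgets Image widget
    return bytes(cv2.imencode('.jpg', value)[1])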

3.5. Initialize the model

Load the lightweight SSD model and its labels:

MODEL_NAME = 'ssdlite_mobilenet_v2_coco_2018_05_09'
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')

NUM_CLASSES = 90
IMAGE_SIZE = (12, 8)
fileAlreadyExists = os.path.isfile(PATH_TO_CKPT)

if not fileAlreadyExists:
    print('Model does not exist!')
    exit()  # note: a bare "exit" does nothing; it has to be called

3.6. Load Graph

Here we load the computation graph from above. It is a frozen, serialized graph, meaning it can no longer be trained: it is a pre-trained model used purely for prediction.
The label map is also loaded, and its classes are turned into a dictionary mapping id to name. The usage of Graph is introduced in detail later!

print('Loading...')
detection_graph = tf.Graph()
with detection_graph.as_default(): # tensors and operations defined in this block belong to detection_graph
    od_graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(PATH_TO_CKPT, 'rb') as fid: 
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True) 
category_index = label_map_util.create_category_index(categories)
print('Finish Load Graph..')

print(len(category_index),category_index)
print(category_index[1]['name'])#person

The category_index here is a dictionary of 80 categories, of the form {1: {'id': 1, 'name': 'person'}, 2: {'id': 2, 'name': 'bicycle'}, ...}.
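
Because of this structure, a reverse name-to-id lookup is a one-liner; this helper is my own addition, not part of the original code:

name_to_id = {v['name']: v['id'] for v in category_index.values()}
print(name_to_id['person'])   # 1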

3.7. Camera detection

There are two ways to run real-time detection with the camera. One is to read the camera's value cyclically, i.e., process frames in an infinite loop; the other is a callback function that pushes the value to the image widget automatically whenever the camera's value changes.

3.7.1. Infinite loop

# Main
t_start = time.time()
fps = 0

with detection_graph.as_default():
    with tf.compat.v1.Session(graph=detection_graph) as sess:
        while True:
            frame = camera.value
            # ret, frame = cap.read()
            # frame = cv2.flip(frame, -1) # Flip camera vertically
            # frame = cv2.resize(frame, (320, 240))

            # Grab the input and output tensors of the frozen graph
            image_np_expanded = np.expand_dims(frame, axis=0)
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
            detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')

            # print('Running detection..')
            (boxes, scores, classes, num) = sess.run(
                [detection_boxes, detection_scores, detection_classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            # print('Done. Visualizing..')
            vis_utils.visualize_boxes_and_labels_on_image_array(
                frame,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8)

            # Print the names of detections with confidence >= 0.5
            for i in range(0, 10):
                if scores[0][i] >= 0.5:
                    print(category_index[int(classes[0][i])]['name'])

            # Overlay the running FPS and push the frame to the widget
            fps = fps + 1
            mfps = fps / (time.time() - t_start)
            cv2.putText(frame, "FPS " + str(int(mfps)), (10, 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)
            image_widget.value = bgr8_to_jpeg(frame)

3.7.2. Callback function

Create an update function and register it with the camera, so that whenever the camera's value changes the update function handles the new frame. This method is generally recommended!

detection_graph.as_default()
sess = tf.compat.v1.Session(graph=detection_graph)
t_start = time.time()
fps = 0

def update_image(change):
    global fps
    global sess
    frame = change['new']
    image_np_expanded = np.expand_dims(frame, axis=0)
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
    detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')

    (boxes, scores, classes, num) = sess.run(
        [detection_boxes, detection_scores, detection_classes, num_detections],
        feed_dict={image_tensor: image_np_expanded})

    # Draw anchor boxes and labels
    vis_utils.visualize_boxes_and_labels_on_image_array(
        frame,
        np.squeeze(boxes),
        np.squeeze(classes).astype(np.int32),
        np.squeeze(scores),
        category_index,
        use_normalized_coordinates=True,
        line_thickness=8)
    '''
    for i in range(0, 10):
        if scores[0][i] >= 0.5:
            print(category_index[int(classes[0][i])]['name'])
    '''

    fps = fps + 1
    mfps = fps / (time.time() - t_start)
    cv2.putText(frame, "FPS " + str(int(mfps)), (10,10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0,0,255), 2)
    image_widget.value = bgr8_to_jpeg(frame)

# Camera update callback: fires whenever camera.value changes
camera.observe(update_image, names='value')

# To stop updating: unobserve alone does not stop the camera itself, so I also call stop()
#camera.unobserve(update_image, names='value')
#camera.stop()

The resulting frames look like this:

Of course, there are also misrecognitions. In the picture below, for example, the cigarette pack in the background is recognized as a book. This is normal and comes down to the dataset; after all, its shape does look like a book.

4. tf.Graph()

Here is an introduction to the tf.Graph class that appears in the code above. It is TensorFlow's computation graph, which contains dedicated modules for computing on tensors, and you can create several computation graphs. A graph is usually written G(V, E), where G is the graph, V is the set of vertices in G, and E is the set of edges in G. Building the computation this way allows it to be processed in parallel, which helps performance a great deal.
A Graph is essentially a data structure of Tensors and Operations. It can run in non-Python environments, which only need to interpret this data structure. By building a Graph we can conveniently represent and optimize the computation, and thus train and run inference on a deep learning model.
Once the Graph is built, we can execute it with TensorFlow's Session object. During execution, the Session passes data from node to node according to the nodes and edges defined in the Graph, performing each node's operation until we obtain the final result.
Let's look at an example:

g1 = tf.Graph()
g2 = tf.Graph()

with g1.as_default():
    a = tf.constant([1,2])
    b = tf.constant([3,4])
    c = tf.constant([5,6])
    result1 = a + b + c

with g2.as_default():
    d = tf.constant([11,22,33])
    e = tf.constant([33,44,55])
    result2 = d * e

After defining the two computation graphs, we can compute under each of them separately and then execute the results through the Session's run:

with tf.compat.v1.Session(graph=g1) as sess:
    print(a)#Tensor("Const_12:0", shape=(2,), dtype=int32)
    print(result1)#Tensor("add_8:0", shape=(2,), dtype=int32)
    print(sess.run(result1))#[ 9 12]

As you can see, under graph g1 we can inspect both the input tensors and the result node; here it is the addition operation. Now look at the multiplication in g2:

with tf.compat.v1.Session(graph=g2) as sess:
    print(d)#Tensor("Const_12:0", shape=(3,), dtype=int32)
    print(result2)#Tensor("mul:0", shape=(3,), dtype=int32)
    print(sess.run(result2))#[ 363  968 1815]

We can also verify which graph each of these tensors belongs to:

print(a.graph is g1)#True
print(a.graph is g2)#False
print(d.graph is g1)#False
print(d.graph is g2)#True
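
For context (my addition): tensors created outside any as_default() block land in the process-wide default graph, which tf.compat.v1.get_default_graph() returns. This assumes graph mode (TF 1.x, or TF 2.x after tf.compat.v1.disable_eager_execution()):

f = tf.constant([1])  # created outside g1 and g2
print(f.graph is tf.compat.v1.get_default_graph())#True
print(f.graph is g1, f.graph is g2)#False False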

4.1. The fetches parameter of run()

Besides the sum and product operations above, the fetches value passed to run can be a single element, a list, or a tuple:

import tensorflow as tf

sess = tf.compat.v1.Session()
a = tf.constant([10, 20])
b = tf.constant([1.0, 2.0])

v = sess.run(a)
print(type(v),v)#<class 'numpy.ndarray'> [10 20]

v = sess.run([a, b])
print(v)#[array([10, 20], dtype=int32), array([1., 2.], dtype=float32)]

It can even be a dictionary (here with a namedtuple inside); the tensors a and b in it are converted into NumPy arrays:

import collections
MyData = collections.namedtuple('MyData', ['a', 'b'])
v = sess.run({'k1': MyData(a, b), 'k2': [b, a]})
print(v)
#{'k1': MyData(a=array([10, 20], dtype=int32), b=array([1., 2.], dtype=float32)), 'k2': [array([1., 2.], dtype=float32), array([10, 20], dtype=int32)]}

4.2. The feed_dict parameter of run()

The feed_dict parameter must be a dictionary that maps tensors to the values that should replace them:

v = sess.run([a, b],feed_dict={b:[33,44]})
print(v)
#[array([10, 20], dtype=int32), array([33., 44.], dtype=float32)]
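
feed_dict is most often paired with placeholders; here is a minimal sketch of that pattern (my addition, assuming TF 2.x with the v1 compatibility API):

import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # placeholders require graph mode

x = tf.compat.v1.placeholder(tf.float32, shape=(2,))
y = x * 2.0

with tf.compat.v1.Session() as sess:
    print(sess.run(y, feed_dict={x: [3.0, 4.0]}))#[6. 8.]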

More detailed source code: https://github.com/yihangzhao/SSDMobile
