Learning OAK step by step, part 1: Hello World (using the DepthAI Python API to display the color video stream)

Following the programming world's traditional opening ritual, we named our first project Hello World, even though it has nothing to do with printing "Hello World".
In this program, we use the DepthAI Python API, step by step, to display the OAK camera's color video stream and detect objects in it.

Environmental requirements:

  • Python >=3.6
  • DepthAI Python API
  • The cv2, blobconverter, and numpy Python modules

Create the program

Step 1: Create the file

  • Create a new folder named 1-hello-world
  • Open the folder with VS Code
  • Create a new hello_world.py file

Step 2: Install dependencies

Before installing the dependencies, you need to create and activate a virtual environment. I have already created a virtual environment named OAKenv here; in the terminal, cd to the root directory of OAKenv and run OAKenv\Scripts\activate to activate the virtual environment.

Install pip dependencies:

pip install numpy opencv-python depthai blobconverter --user

This pip command installs four Python packages: numpy, opencv-python, depthai, and blobconverter. The --user flag installs them into the current user's local environment rather than the system-wide environment. A quick way to verify the installation follows the list below.

The functions of these four packages are as follows:

  1. numpy : a Python library for scientific computing and numerical operations. It provides high-performance multidimensional array objects and functions for manipulating them.
  2. opencv-python : the Python interface to OpenCV, an open-source computer vision library. It provides a wealth of image processing and computer vision functions, such as image reading, processing, analysis, and feature extraction.
  3. depthai : a library for deep learning and computer vision, in particular for running inference on DepthAI hardware accelerators. DepthAI combines an AI camera with an embedded neural network accelerator to provide real-time depth perception and intelligent analysis.
  4. blobconverter : a tool for converting deep learning models into a binary format that can run on the target device. It can convert models trained in different frameworks (such as TensorFlow and PyTorch) into a format suitable for inference on a specific device.
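
A quick way to confirm that the installation succeeded is to import the four packages and print a few version strings (a minimal check run from the activated virtual environment; blobconverter is imported only to confirm it resolves):

import numpy
import cv2
import depthai
import blobconverter  # imported only to confirm the package is installed

print("numpy:", numpy.__version__)
print("opencv-python:", cv2.__version__)
print("depthai:", depthai.__version__)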

Step 3: Import the required packages

Import the packages required by the project

import numpy as np    # numpy - process the packet data returned by depthai
import cv2            # opencv - display the video stream
import depthai        # depthai - access the camera and its data packets for image acquisition
import blobconverter  # blobconverter - compile and download the MyriadX neural network blob

Step 4: Define the pipeline

Any action of DepthAI, whether neural inference or color camera output, requires defining a pipeline, including nodes and connections corresponding to our needs.

Here we want to see frames from the color camera and run a simple neural network on them.

Create an empty pipeline object

# Create an empty pipeline object
pipeline = depthai.Pipeline()

Step 5: Add a ColorCamera node

Now, the first node we will add is the ColorCamera.

cam_rgb = pipeline.create(depthai.node.ColorCamera)
cam_rgb.setPreviewSize(300,300)
cam_rgb.setInterleaved(False)

The code above creates a ColorCamera node named cam_rgb, sets the preview size to 300x300 pixels, and sets the image storage format to non-interleaved.

  1. pipeline.create(depthai.node.ColorCamera): Uses depthai.node.ColorCamera to create a ColorCamera node as the source of image acquisition; pipeline is the DepthAI pipeline object.

  2. cam_rgb.setPreviewSize(300, 300): Sets the preview size of the cam_rgb node to 300x300 pixels. This means the images captured from the camera are resized to 300x300 for display or further processing.

  3. cam_rgb.setInterleaved(False): Sets the image storage format to non-interleaved. In the interleaved format, pixels of different color channels are interleaved when the image data is stored, while the non-interleaved (planar) format stores the pixels of each color channel sequentially.

The purpose of this code is to configure the cam_rgb node to capture RGB images from the camera and use them in subsequent processing.
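
The ColorCamera node has more setters than the two used here. As a small optional sketch (assuming the DepthAI 2.x API; the values are examples, not requirements), you could also pin the frame rate and the color order expected by OpenCV:

cam_rgb.setFps(30)                                                   # example value: cap the preview at ~30 FPS
cam_rgb.setColorOrder(depthai.ColorCameraProperties.ColorOrder.BGR)  # BGR is what OpenCV expects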

Step 6: Define the MobileNetDetectionNetwork node

Next, define a MobileNetDetectionNetwork node with the mobilenet-ssd network. The blob file for this example will be compiled and downloaded automatically by the blobconverter tool. The blobconverter.from_zoo() function returns the path of the model, so we can pass it directly to detection_nn.setBlobPath(). With this node, the output of the neural network is parsed on the device side, and we receive ready-made detection objects. For this to work, we also need to set a confidence threshold to filter out incorrect results.

detection_nn = pipeline.create(depthai.node.MobileNetDetectionNetwork)
# Set path of the blob (NN model). We will use blobconverter to convert&download the model
# detection_nn.setBlobPath("/path/to/model.blob")
detection_nn.setBlobPath(blobconverter.from_zoo(name='mobilenet-ssd', shaves=6))
detection_nn.setConfidenceThreshold(0.5)

The code above creates a MobileNetDetectionNetwork node named detection_nn and configures it for object detection.

  1. pipeline.create(depthai.node.MobileNetDetectionNetwork): Uses depthai.node.MobileNetDetectionNetwork to create a MobileNetDetectionNetwork node for object detection.
  2. detection_nn.setBlobPath(blobconverter.from_zoo(name='mobilenet-ssd', shaves=6)): Sets the model file path. The blobconverter.from_zoo() function downloads and converts a model from the pre-trained model zoo; name='mobilenet-ssd' selects the MobileNet-SSD model, and shaves=6 assigns six SHAVE cores for model inference.
  3. detection_nn.setConfidenceThreshold(0.5): Sets the confidence threshold for object detection to 0.5, which means only detections with confidence greater than 0.5 are considered valid.

The purpose of this code is to configure the detection_nn node to use the MobileNet-SSD object detection model with a confidence threshold of 0.5. When the pipeline runs, this node will perform object detection on the images acquired from the camera.
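
Because blobconverter.from_zoo() simply returns the path of the compiled .blob file, you can also keep that path in a variable first, for example to print where the model was stored; a small variation of the step above:

blob_path = blobconverter.from_zoo(name='mobilenet-ssd', shaves=6)
print("Using blob:", blob_path)  # inspect where the compiled model ended up
detection_nn.setBlobPath(blob_path)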

Step 7: Connect the color camera's preview output to the neural network's input

cam_rgb.preview.link(detection_nn.input)

This line of code links the cam_rgb node's preview output to the detection_nn node's input.

cam_rgb.preview is the preview output port of the cam_rgb node, from which the preview image data captured by the camera can be obtained. .link(detection_nn.input) links cam_rgb.preview to the input port of the detection_nn node, so the image data is passed to the detection_nn node for object detection.

The purpose of this line is to establish a data flow path that passes the preview images captured by the camera to the detection_nn node for object detection when the pipeline runs.

Step 8: Create the XLinkOut nodes

Now, we need to receive the color camera frames and the neural network inference results. Since these results are generated on the device, they have to be transferred to our machine (the host; here, my computer). Communication between the device and the host is handled by XLink, and since we want to receive data from the device on the host, we use XLinkOut nodes.

xout_rgb = pipeline.create(depthai.node.XLinkOut)
xout_rgb.setStreamName("rgb")
cam_rgb.preview.link(xout_rgb.input)

xout_nn = pipeline.create(depthai.node.XLinkOut)
xout_nn.setStreamName("nn")
detection_nn.out.link(xout_nn.input)

This code creates two XLinkOut nodes and connects them to the corresponding outputs.

  1. xout_rgb = pipeline.create(depthai.node.XLinkOut): Creates an XLinkOut node named xout_rgb, which is used to output the image data captured by the camera.

  2. xout_rgb.setStreamName("rgb"): Sets the output stream name of the xout_rgb node to "rgb".

  3. cam_rgb.preview.link(xout_rgb.input): Links the cam_rgb node's preview output to the xout_rgb node's input, so the image data is passed to the xout_rgb node for output.

  4. xout_nn = pipeline.create(depthai.node.XLinkOut): Creates another XLinkOut node, named xout_nn, which is used to output the object detection results.

  5. xout_nn.setStreamName("nn"): Sets the output stream name of the xout_nn node to "nn".

  6. detection_nn.out.link(xout_nn.input): Links the detection_nn node's output to the xout_nn node's input, so the object detection results are passed to the xout_nn node for output.

The purpose of this code is to create two XLinkOut nodes: one outputs the image data captured by the camera, and the other outputs the object detection results. By linking the nodes' inputs and outputs, the data flows to the corresponding nodes and the results can be read out in real time while the pipeline is running.

Step 9: Initialize the DepthAI device

With the pipeline defined, we can now initialize the device with the pipeline and start it.

with depthai.Device(pipeline) as device:

Note: by default, DepthAI is accessed as a USB3 device. If you want to communicate over USB2, initialize DepthAI with the following code instead:

device = depthai.Device(pipeline, usb2Mode=True)

From here on, the pipeline runs on the device, producing the results we need and allowing us to capture them on the host.
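
Incidentally, before opening the device you can check which OAK devices the host sees; a small sketch, assuming the DepthAI 2.x API (depthai.Device.getAllAvailableDevices()):

# Optional: list the OAK devices visible to the host before opening one
for device_info in depthai.Device.getAllAvailableDevices():
    print("Found device:", device_info.getMxId())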

Step 10: Add helper objects

Since the XLinkOut nodes are already defined in the pipeline, we now define host-side output queues to access the generated results.

    q_rgb = device.getOutputQueue("rgb")
    q_nn = device.getOutputQueue("nn")

This code obtains two output queues from the device, corresponding to image data and object detection result data respectively.

  1. q_rgb = device.getOutputQueue("rgb"): Obtains the output queue named "rgb" from the device, which receives the image data captured by the camera. Through this queue, the actual image data can be read from the device.

  2. q_nn = device.getOutputQueue("nn"): Obtains the output queue named "nn" from the device, which receives the object detection results. Through this queue, the actual detection data can be read from the device.

By obtaining these output queues, the image data and the object detection results can be retrieved in real time while the pipeline is running, for further processing and use.
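
getOutputQueue() also accepts queue options. A common variation (assuming the DepthAI 2.x signature) is a small, non-blocking queue that keeps only the newest packets, so the display never lags behind the camera:

    q_rgb = device.getOutputQueue("rgb", maxSize=4, blocking=False)
    q_nn = device.getOutputQueue("nn", maxSize=4, blocking=False)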

Step 11: Define two variables for storing data

    frame = None
    detections = []

Two variables are initialized:

  1. frame = None: frame is a variable used to store image frame data; its initial value is None. It will later be assigned the actual image data for processing or display.
  2. detections = []: detections is an empty list used to store the object detection results. The output of an object detection algorithm is usually a set of bounding box coordinates, category labels, and confidences for the detected objects; this list will hold those results for later use.

Step 12: Define the helper function frame_norm

Convert bounding box coordinates from normalized ranges to actual pixel locations

Due to an implementation detail of the neural network, the bounding box coordinates in the inference results are floating point numbers in the range 0 to 1, relative to the width/height of the frame (e.g. if the image is 200 pixels wide and the x_min coordinate returned by the nn is 0.2, the actual (de-normalized) x_min coordinate is 40 pixels).

So we define a helper function, frame_norm, which converts these values in the range <0…1> to actual pixel positions.

    def frame_norm(frame,bbox):
        normVals = np.full(len(bbox),frame.shape[0])
        normVals[::2] = frame.shape[1]
        return (np.clip(np.array(bbox),0,1)*normVals).astype(int)

This code defines a function called frame_norm that takes two parameters: frame and bbox. Its job is to convert bounding box coordinates from the normalized range to actual pixel locations.

First, the function creates an array normVals with the same length as bbox, filled with the height of the frame (frame.shape[0]). The even index positions of this array will scale the x coordinates of the bounding box, and the odd index positions will scale the y coordinates.

Next, normVals[::2] = frame.shape[1] sets the even-indexed entries to the width of the frame.

Then, the bbox array is clipped to the range [0, 1] and multiplied by the normVals array to scale the normalized coordinate values to actual pixel locations, and astype converts the result to integers.

Finally, the function returns the transformed bbox as actual pixel coordinates.
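
Here is a quick standalone check of that conversion, using a dummy 200x100 frame (width 200, height 100) and the example from above where x_min = 0.2 maps to 40 pixels:

import numpy as np

def frame_norm(frame, bbox):
    normVals = np.full(len(bbox), frame.shape[0])
    normVals[::2] = frame.shape[1]
    return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)

dummy_frame = np.zeros((100, 200, 3), dtype=np.uint8)  # height 100, width 200
print(frame_norm(dummy_frame, (0.2, 0.1, 0.5, 0.6)))
# -> [ 40  10 100  60]: x values scaled by the width (200), y values by the height (100)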

Step 13: Start the main program loop

After preparing everything above, we can start the main program loop

    while True:

Define variables to get data from the queue

In this loop, the first thing to do is get the latest results from the nn node and the color camera

        in_rgb = q_rgb.tryGet()
        in_nn = q_nn.tryGet()

In this code, in_rgb = q_rgb.tryGet() and in_nn = q_nn.tryGet() try to fetch data from the q_rgb and q_nn queues.

q_rgb.tryGet() tries to get data from the q_rgb queue and assigns the result to the in_rgb variable. It returns None if no data is available in the queue.

Similarly, q_nn.tryGet() tries to get data from the q_nn queue and assigns the result to the in_nn variable. It also returns None if no data is available.
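
If you prefer to wait for every packet instead of polling, the queues also have a blocking get() method; a sketch of that alternative (note that blocking on one queue can stall the loop if the other stream is slower):

        in_rgb = q_rgb.get()  # blocks until a camera packet arrives
        in_nn = q_nn.get()    # blocks until a neural network packet arrives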

Both the rgb camera frames and the neural network results need transformations before they are usable for display (we have already defined one of the transformations we need: the frame_norm function).

Receive frames from the rgb camera

First, to receive frames from the rgb camera, we use the getCvFrame() method.

        if in_rgb is not None:
            frame = in_rgb.getCvFrame()

In the code above, we first check whether in_rgb is None. If it is not, we use the getCvFrame() method to obtain a frame transmitted from the RGB camera and assign it to the variable frame. We can then use the frame variable for subsequent image processing or display.

Receive the results of the neural network

Second, we receive the results of the neural network. The default MobileNet-SSD result has 7 fields: image_id, label, confidence, x_min, y_min, x_max, y_max. By accessing the detections array, we receive detection objects that allow us to access these fields.

        if in_nn is not None:
            detections = in_nn.detections

In the code above, we first check whether in_nn is None. If it is not, we take the detection results from in_nn and assign them to the variable detections. These detection results include information such as the object category, location, and confidence, which can be used in subsequent applications or displays.

in_nn.detections represents the detection results obtained from the neural network model. The result may be a list, array, or other data structure containing information about the objects detected in the input image.

The specific structure and content depend on the neural network model used and the application scenario. Usually, each detection result contains information such as the object category, the bounding box location, and the confidence, which can be parsed and processed as needed.
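
As a sketch of how the label field can be interpreted: it is an index into the model's class list, which for mobilenet-ssd is commonly the 20 PASCAL VOC classes plus "background" (an assumption worth verifying against the model you downloaded):

label_map = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle",
             "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse",
             "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

# e.g. inside the main loop:
#     for detection in detections:
#         print(label_map[detection.label], round(detection.confidence, 2))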

Show results

So far we have fetched all the results from the DepthAI device; the only thing left is to actually display them.

        if frame is not None:
            for detection in detections:
                bbox = frame_norm(frame,(detection.xmin,detection.ymin,detection.xmax,detection.ymax))
                cv2.rectangle(frame,(bbox[0],bbox[1]),(bbox[2],bbox[3]),(255,0,0),2)
            cv2.imshow("preview",frame)

The code above checks whether a frame exists; if it does (frame is not None), the detection results are drawn on the image using OpenCV. For each detection result, the bounding box coordinates (detection.xmin, detection.ymin, detection.xmax, detection.ymax) are used to draw a rectangle on the frame. cv2.rectangle draws the rectangle with color (255, 0, 0) and line width 2. Finally, cv2.imshow displays the frame in a window named "preview".

Here you can see the frame_norm function we defined earlier being used to convert the normalized bounding box coordinates to pixel coordinates. We use cv2.rectangle to draw a rectangle on the RGB frame as an indicator of the detected object, and cv2.imshow to display the frame.
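
If you also want to see the class name and confidence on the frame, cv2.putText can be added next to the rectangle. A sketch, assuming the label_map list from the sketch in the previous step:

        if frame is not None:
            for detection in detections:
                bbox = frame_norm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
                cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), (255, 0, 0), 2)
                # draw the class name and confidence just above the box
                text = f"{label_map[detection.label]} {detection.confidence:.2f}"
                cv2.putText(frame, text, (bbox[0], bbox[1] - 5),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1)
            cv2.imshow("preview", frame)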

Terminate the program

Use the cv2.waitKey method to terminate the program; it waits for the user to press a key. Here we break out of the loop when the user presses the q key.

        if cv2.waitKey(1) == ord('q'):
            break
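
Optionally, once the loop exits you can also close the OpenCV window (the with block already takes care of closing the device):

    cv2.destroyAllWindows()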

Step 14: Run the program

Enter the following command in the terminal to run the program

python hello_world.py

So far, our first OAK program has been successfully run.
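
For reference, here is the complete hello_world.py assembled from the steps above:

import numpy as np    # numpy - process the packet data returned by depthai
import cv2            # opencv - display the video stream
import depthai        # depthai - access the camera and its data packets for image acquisition
import blobconverter  # blobconverter - compile and download the MyriadX neural network blob

# Create an empty pipeline object
pipeline = depthai.Pipeline()

# ColorCamera node: 300x300 preview, non-interleaved frames
cam_rgb = pipeline.create(depthai.node.ColorCamera)
cam_rgb.setPreviewSize(300, 300)
cam_rgb.setInterleaved(False)

# MobileNet-SSD detection network; the blob is compiled and downloaded by blobconverter
detection_nn = pipeline.create(depthai.node.MobileNetDetectionNetwork)
detection_nn.setBlobPath(blobconverter.from_zoo(name='mobilenet-ssd', shaves=6))
detection_nn.setConfidenceThreshold(0.5)
cam_rgb.preview.link(detection_nn.input)

# XLinkOut nodes: send camera frames and detection results to the host
xout_rgb = pipeline.create(depthai.node.XLinkOut)
xout_rgb.setStreamName("rgb")
cam_rgb.preview.link(xout_rgb.input)

xout_nn = pipeline.create(depthai.node.XLinkOut)
xout_nn.setStreamName("nn")
detection_nn.out.link(xout_nn.input)

with depthai.Device(pipeline) as device:
    q_rgb = device.getOutputQueue("rgb")
    q_nn = device.getOutputQueue("nn")

    frame = None
    detections = []

    def frame_norm(frame, bbox):
        # convert normalized <0..1> bbox coordinates to pixel coordinates
        normVals = np.full(len(bbox), frame.shape[0])
        normVals[::2] = frame.shape[1]
        return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)

    while True:
        in_rgb = q_rgb.tryGet()
        in_nn = q_nn.tryGet()

        if in_rgb is not None:
            frame = in_rgb.getCvFrame()

        if in_nn is not None:
            detections = in_nn.detections

        if frame is not None:
            for detection in detections:
                bbox = frame_norm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
                cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), (255, 0, 0), 2)
            cv2.imshow("preview", frame)

        if cv2.waitKey(1) == ord('q'):
            break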

Origin blog.csdn.net/w137160164/article/details/131445323