Following the traditional introductory ritual of the programming world, we named our first project Hello World, although the project itself has little to do with printing "Hello World".
In this program, we use the DepthAI Python API to display the color video stream from an OAK camera and detect objects in it, step by step.
Table of contents
- Environmental requirements:
- create program
- Setup 1: Create the file
- Setup 2: Install dependencies
- Setup 3: Import required packages
- Setup 4: Define the pipeline
- Setup 5: Add ColorCamera node
- Setup 6: Define mobile network detection network nodes
- Setup 7: Connect the color camera preview output to the neural network input
- Setup 8: Create XLinkOut node
- Setup 9: Initialize the DepthAI device
- Setup 10: Add helper objects
- Setup 11: Define two variables for storing data
- Setup 12: Define the auxiliary function frame_norm
- Setup 13: Start the main program loop
- Setup 14: Run the program
Environmental requirements:
- Python >=3.6
- DepthAI Python API
- The Python modules cv2, blobconverter and numpy
create program
Setup 1: Create the file
- Create a new folder named 1-hello-world
- Open the folder with vscode
- Create a new hello_world.py file
Setup 2: Install dependencies
Before installing dependencies, create and activate a virtual environment. I have created a virtual environment named OAKenv here: enter cd... in the terminal to return to the root directory of OAKenv, then enter OAKenv\Scripts\activate to activate the virtual environment.
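To confirm that the activation worked before installing anything, a small check like the one below can help. It is only a sketch: the function name in_virtualenv is made up for this example, and it relies on the standard sys.prefix and sys.base_prefix attributes, which differ when Python runs inside a virtual environment.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtualenv())
```

If this prints False after activation, the terminal is probably still using the system interpreter.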
Install pip dependencies:
pip install numpy opencv-python depthai blobconverter --user
Use the pip command to install four Python packages: numpy, opencv-python, depthai, and blobconverter. The --user parameter installs these packages into the local environment of the current user instead of the system environment. If a virtual environment is active, omit --user: pip normally rejects user installs inside a virtual environment, and the packages are installed into that environment anyway.
The functions of these four packages are as follows:
- numpy : is a Python library for scientific computing and numerical manipulation. It provides high-performance multidimensional array objects and functions for manipulating these arrays.
- opencv-python : is a Python interface to OpenCV (an open source computer vision library). It provides a wealth of image processing and computer vision functions, such as image reading, processing, analysis, feature extraction, etc.
- depthai : is a library for deep learning and computer vision, especially for inference using DepthAI hardware accelerators. DepthAI is a technology that combines AI cameras and embedded neural network accelerators to provide real-time depth perception and intelligent analysis functions.
- blobconverter : is a tool for converting deep learning models into a binary format that can run on the target device. It can convert models trained by different frameworks (such as TensorFlow, PyTorch, etc.) into a format that is convenient for inference on specific devices.
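After installation, it can be useful to verify that all four packages are visible to the interpreter. The following is a minimal sketch using only the standard library; the helper name missing_modules is an assumption of this example. Note that the opencv-python package is imported under the name cv2.

```python
import importlib.util

# Import names differ from pip names: opencv-python is imported as cv2.
REQUIRED_MODULES = ("numpy", "cv2", "depthai", "blobconverter")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("missing:", missing_modules() or "none")
```

An empty result suggests the dependencies from this step are in place; any listed name should be installed again with pip.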
Setup 3: Import required packages
Import the packages required by the project
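import numpy as np # numpy - process the packet data returned by depthai
import cv2 # opencv - display the video stream
import depthai # depthai - access the camera and its data packets for image capture
import blobconverter # blobconverter - compile and download MyriadX neural network blobs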
Setup 4: Define the pipeline
Any action of DepthAI, whether neural inference or color camera output, requires defining a pipeline, including nodes and connections corresponding to our needs.
Here we want to see frames from a color camera, and a simple neural network running on them.
Create an empty pipeline object
# Create an empty pipeline object
pipeline = depthai.Pipeline()
Setup 5: Add ColorCamera node
Now, the first node we will add is the ColorCamera.
cam_rgb = pipeline.create(depthai.node.ColorCamera)
cam_rgb.setPreviewSize(300,300)
cam_rgb.setInterleaved(False)
The code above creates a ColorCamera node named cam_rgb, sets the size of the preview to 300x300 pixels, and sets the storage format of the image to non-interleaved.
- pipeline.create(depthai.node.ColorCamera): Use depthai.node.ColorCamera to create a ColorCamera node as the source of image acquisition. pipeline is a DepthAI pipeline object.
- cam_rgb.setPreviewSize(300, 300): Set the preview size of the cam_rgb node to 300x300 pixels. This means that images captured from the camera are resized to 300x300 for display or further processing.
- cam_rgb.setInterleaved(False): Set the storage format of the image to non-interleaved (planar). In the interleaved format, the color channel values of each pixel are stored next to each other; in the non-interleaved format, all values of one color channel are stored together, followed by the next channel.
The purpose of this code is to configure the cam_rgb node to capture RGB images from the camera and use them in subsequent processing.
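The difference between the two storage formats is easier to see on a tiny array. The sketch below is a host-side illustration with numpy only, not a view into DepthAI internals; the 2x2 image and the variable names are invented for the example.

```python
import numpy as np

# A tiny 2x2 "image" with 3 color channels, stored interleaved (HWC):
# the three channel values of each pixel sit next to each other in memory.
interleaved = np.arange(2 * 2 * 3, dtype=np.uint8).reshape(2, 2, 3)

# Planar / non-interleaved (CHW): all values of channel 0 first,
# then all of channel 1, then all of channel 2.
planar = np.ascontiguousarray(interleaved.transpose(2, 0, 1))

print(interleaved.ravel().tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(planar.ravel().tolist())       # [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]
```

Both arrays hold the same pixels; only the order in memory differs.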
Setup 6: Define mobile network detection network nodes
Next, define a MobileNetDetectionNetwork node that runs the mobilenet-ssd network. The blob file for this example is compiled and downloaded automatically by the blobconverter tool. The blobconverter.from_zoo() function returns the path of the model, so we can pass it directly to the detection_nn.setBlobPath() function. With this node, the output of the neural network is parsed on the device side, and we receive ready-made detection objects. For this to work, we also need to set a confidence threshold to filter out incorrect results.
detection_nn = pipeline.create(depthai.node.MobileNetDetectionNetwork)
# Set path of the blob (NN model). We will use blobconverter to convert&download the model
# detection_nn.setBlobPath("/path/to/model.blob")
detection_nn.setBlobPath(blobconverter.from_zoo(name='mobilenet-ssd', shaves=6))
detection_nn.setConfidenceThreshold(0.5)
The code above creates a MobileNetDetectionNetwork node named detection_nn and configures its settings for object detection.
- pipeline.create(depthai.node.MobileNetDetectionNetwork): Use depthai.node.MobileNetDetectionNetwork to create a MobileNetDetectionNetwork node for object detection.
- detection_nn.setBlobPath(blobconverter.from_zoo(name='mobilenet-ssd', shaves=6)): Set the model file path. The blobconverter.from_zoo() function is used here to download and convert a model from the pre-trained model library. name='mobilenet-ssd' selects the MobileNet-SSD model, and shaves=6 selects six shave cores for model inference.
- detection_nn.setConfidenceThreshold(0.5): Set the confidence threshold for object detection to 0.5. This means that only targets with confidence greater than 0.5 will be considered valid.
The purpose of this code is to configure the detection_nn node to use the MobileNet-SSD object detection model and set the confidence threshold to 0.5. In a subsequent run of the pipeline, this node will use the model to perform object detection on images acquired from the camera.
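The effect of a confidence threshold can be pictured with a few lines of plain Python. This is a host-side sketch of the idea only: the Detection class and the filter_by_confidence function are hypothetical stand-ins, and the actual filtering in this tutorial happens on the device inside the detection_nn node.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical stand-in for one detection result."""
    label: int
    confidence: float


def filter_by_confidence(detections, threshold=0.5):
    """Keep only detections whose confidence is greater than the threshold."""
    return [d for d in detections if d.confidence > threshold]


raw = [Detection(15, 0.91), Detection(7, 0.32), Detection(12, 0.55)]
kept = filter_by_confidence(raw)
print([d.label for d in kept])  # [15, 12]
```

Raising the threshold trades missed objects for fewer false positives; lowering it does the opposite.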
Setup 7: Connect the color camera preview output to the neural network input
cam_rgb.preview.link(detection_nn.input)
This line of code links the cam_rgb node's preview output to the detection_nn node's input.
- cam_rgb.preview: The preview output port of the cam_rgb node, through which the image preview data collected from the camera can be obtained.
- .link(detection_nn.input): Links cam_rgb.preview to the input port of the detection_nn node, so that the image data is passed to the detection_nn node for object detection.
The purpose of this line of code is to establish a data flow path that passes the image preview data collected from the camera to the detection_nn node for object detection, and triggers the processing of that node in subsequent pipeline runs.
Setup 8: Create XLinkOut node
Now we need to receive the color camera frames and the neural network inference results. Since these results are generated on the device, they must be transferred to our machine (the host, here my computer). Communication between the device and the host is handled by XLink; because we want to send data from the device to the host, we use XLinkOut nodes.
xout_rgb = pipeline.create(depthai.node.XLinkOut)
xout_rgb.setStreamName("rgb")
cam_rgb.preview.link(xout_rgb.input)
xout_nn = pipeline.create(depthai.node.XLinkOut)
xout_nn.setStreamName("nn")
detection_nn.out.link(xout_nn.input)
This code creates two XLinkOut nodes and connects the nodes to the corresponding inputs and outputs.
- xout_rgb = pipeline.create(depthai.node.XLinkOut): Create an XLinkOut node named xout_rgb, which is used to output the image data collected by the camera.
- xout_rgb.setStreamName("rgb"): Set the output stream name of the xout_rgb node to "rgb".
- cam_rgb.preview.link(xout_rgb.input): Link the cam_rgb node's preview output to the xout_rgb node's input, to pass image data to the xout_rgb node for output.
- xout_nn = pipeline.create(depthai.node.XLinkOut): Create another XLinkOut node, named xout_nn, to output the result data of target detection.
- xout_nn.setStreamName("nn"): Set the output stream name of the xout_nn node to "nn".
- detection_nn.out.link(xout_nn.input): Link the output of the detection_nn node to the input of the xout_nn node, to transfer the target detection result data to the xout_nn node for output.
The purpose of this code is to create two XLinkOut nodes, one of which is used to output the image data collected by the camera, and the other is used to output the result data of the target detection. By linking the input and output between nodes, the data flow can be passed to the corresponding node, and the result can be output in real time during the pipeline operation.
Setup 9: Initialize the DepthAI device
With the pipeline defined, we can now initialize the device with it and start it. The code in the following steps runs inside this with block.
with depthai.Device(pipeline) as device:
Note: by default, DepthAI is accessed as a USB3 device. To communicate over USB2 instead, initialize the device with the following code
device = depthai.Device(pipeline, usb2Mode=True)
From here, the pipeline will run on the device, producing the results we require. The next steps let us capture them.
Setup 10: Add helper objects
Since the XLinkOut nodes are already defined in the pipeline, we will now define host-side output queues to access the generated results
q_rgb = device.getOutputQueue("rgb")
q_nn = device.getOutputQueue("nn")
This code obtains two output queues from the device, corresponding to image data and object detection result data respectively.
- q_rgb = device.getOutputQueue("rgb"): Obtain the output queue named "rgb" from the device, which is used to receive the image data collected by the camera. Through this queue, the actual image data can be obtained from the device.
- q_nn = device.getOutputQueue("nn"): Obtain the output queue named "nn" from the device, which is used to receive the result data of target detection. Through this queue, the actual target detection result data can be obtained from the device.
By obtaining the output queue, the image data and target detection result data can be obtained in real time during the running of the Pipeline for further processing and use.
Setup 11: Define two variables for storing data
frame = None
detections = []
Two variables are initialized:
- frame = None: frame is a variable used to store image frame data; its initial value is None. This variable can later be assigned the actual image data for subsequent processing or display.
- detections = []: detections is an empty list used to store the object detection results. The output of an object detection algorithm is usually a set of bounding box coordinates, category labels, and confidence values for the detected objects. This empty list can be used to store these detection results for later use.
Setup 12: Define the auxiliary function frame_norm
Convert bounding box coordinates from normalized ranges to actual pixel locations
Due to an implementation detail of the neural network, bounding box coordinates in the inference results are represented as floating point numbers in the range between 0 and 1, relative to the width/height of the frame (e.g. if the image has a width of 200 pixels and the x_min coordinate returned by the neural network is 0.2, the actual x_min coordinate is 40 pixels).
So a helper function frame_norm needs to be defined, which will convert these values in the range <0…1> to actual pixel positions.
def frame_norm(frame, bbox):
    normVals = np.full(len(bbox), frame.shape[0])
    normVals[::2] = frame.shape[1]
    return (np.clip(np.array(bbox), 0, 1) * normVals).astype(int)
This code defines a function called frame_norm that takes two parameters: frame and bbox. What it does is convert bounding box coordinates from normalized ranges to actual pixel locations.
First, the function creates an array normVals with the same length as bbox, with every element initially set to the height of the frame (frame.shape[0]). In bbox, the even index positions hold x coordinates, which must be scaled by the frame width, and the odd index positions hold y coordinates, which must be scaled by the frame height.
Next, the even positions are set to the width of the frame by assigning frame.shape[1] to normVals[::2].
Then, the bbox array is clipped to the range [0, 1] and multiplied by the normVals array to scale the normalized coordinate values to actual pixel locations, and astype converts the result to an integer type.
Finally, the function returns the transformed bbox, which contains the actual pixel position coordinates.
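The conversion can be checked without a camera by calling the function on a dummy frame. The sketch below repeats the function from this step (with a snake_case local variable name) and uses an all-zero numpy array as a stand-in for a real image; the frame size and the coordinates are made-up example values.

```python
import numpy as np


def frame_norm(frame, bbox):
    """Scale normalized (0..1) box coordinates to pixel coordinates."""
    norm_vals = np.full(len(bbox), frame.shape[0])  # start with the frame height
    norm_vals[::2] = frame.shape[1]                 # x entries use the frame width
    return (np.clip(np.array(bbox), 0, 1) * norm_vals).astype(int)


# Dummy frame with 100 rows and 200 columns; no camera is needed here.
frame = np.zeros((100, 200, 3), dtype=np.uint8)

# x_min=0.2, y_min=0.5, x_max=1.2 (clipped to 1.0), y_max=0.75
print(frame_norm(frame, (0.2, 0.5, 1.2, 0.75)).tolist())  # [40, 50, 200, 75]
```

The x values are multiplied by the width of 200 and the y values by the height of 100, and the out-of-range x_max is clipped to the right edge of the frame.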
Setup 13: Start the main program loop
After preparing everything above, we can start the main program loop. The remaining code in this step runs inside this loop.
while True:
Define variables to get data from the queue
In this loop, the first thing to do is get the latest results from the nn node and the color camera
in_rgb = q_rgb.tryGet()
in_nn = q_nn.tryGet()
In this code, in_rgb = q_rgb.tryGet() and in_nn = q_nn.tryGet() try to get data from the queues q_rgb and q_nn.
q_rgb.tryGet() tries to get data from the q_rgb queue and assigns the obtained data to the in_rgb variable. It returns None if there is no data available in the queue.
Similarly, q_nn.tryGet() tries to get data from the q_nn queue and assigns the obtained data to the in_nn variable. It also returns None if no data is available in the queue.
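The "return None instead of waiting" behaviour described above can be imitated with Python's standard queue module. This is only an analogy for illustration: try_get is a hypothetical helper written for this example and is not part of the DepthAI API.

```python
import queue


def try_get(q):
    """Return the next item if one is waiting, otherwise None, without blocking."""
    try:
        return q.get_nowait()
    except queue.Empty:
        return None


frames = queue.Queue()
print(try_get(frames))  # None: nothing has arrived yet
frames.put("frame-0")
print(try_get(frames))  # frame-0
print(try_get(frames))  # None: the queue has been drained
```

Because the call never blocks, the main loop keeps running and redrawing even when one of the two streams has no new data.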
The data from the rgb camera and from the neural network nn both arrive as 1D arrays, so both need transformations before they can be displayed (we have already defined one of the transformations we need: the frame_norm function)
Receive frames from rgb camera
First, to receive frames from the rgb camera, we use the getCvFrame method
if in_rgb is not None:
    frame = in_rgb.getCvFrame()
In the code above, we first check whether in_rgb is None. If it is not, we use the getCvFrame() method to obtain an image frame transmitted from the RGB camera and assign it to the variable frame. This way we can use the frame variable for subsequent image processing or display.
Receive the results of the neural network
Second, we receive the results of the neural network. The default MobileNetSSD result has 7 fields: image_id, label, confidence, x_min, y_min, x_max, y_max. By accessing the detections array, we receive detection objects that allow us to access these fields
if in_nn is not None:
    detections = in_nn.detections
In the code above, we first check whether in_nn is None. If it is not, we take the detection results from in_nn and assign them to the variable detections.
in_nn.detections holds the results that the detection_nn node parsed from the neural network output for the input image. With the MobileNetDetectionNetwork node used here, it is a list of detection objects, each exposing the label, confidence, xmin, ymin, xmax and ymax fields used in the rest of this program.
For other models or node types, the structure and content of the results depend on the neural network used and the application scenario, and the information may have to be parsed and processed on the host to meet specific needs.
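To make the 7-field layout concrete, here is a sketch of how a flat list of raw values could be grouped into records on the host. It is not needed in this program, because the detection node already does the parsing on the device. The names RawDetection and parse_raw_detections are invented for this example, and treating a negative image_id as the end-of-results marker is an assumption that should be checked against the documentation of the model in use.

```python
from typing import List, NamedTuple, Sequence


class RawDetection(NamedTuple):
    image_id: int
    label: int
    confidence: float
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_raw_detections(flat: Sequence[float]) -> List[RawDetection]:
    """Group a flat list of values into 7-field detection records."""
    records = []
    usable = len(flat) - len(flat) % 7  # ignore an incomplete trailing group
    for i in range(0, usable, 7):
        image_id, label, conf, x0, y0, x1, y1 = flat[i:i + 7]
        if image_id < 0:  # assumed end-of-results marker
            break
        records.append(RawDetection(int(image_id), int(label), conf, x0, y0, x1, y1))
    return records


flat = [0, 15, 0.9, 0.1, 0.2, 0.5, 0.8,
        -1, 0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(parse_raw_detections(flat))
```

With the sample data above, only the first group of seven values becomes a record; the second group starts with -1 and ends the parsing.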
Show results
So far we have fetched all the results from the DepthAI device, the only thing left is to actually display them.
if frame is not None:
    for detection in detections:
        bbox = frame_norm(frame, (detection.xmin, detection.ymin, detection.xmax, detection.ymax))
        cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), (255, 0, 0), 2)
    cv2.imshow("preview", frame)
The code above checks whether a frame exists. If it does (frame is not None), the detection results are drawn on the image using the OpenCV library. For each detection result, the coordinates of the detection box (detection.xmin, detection.ymin, detection.xmax, detection.ymax) are converted from normalized values to pixel positions with the frame_norm function we defined earlier. The cv2.rectangle function then draws a rectangle on the frame as an indicator of the object, with the color (255, 0, 0), which is blue in OpenCV's BGR channel order, and a line width of 2. Finally, the cv2.imshow function displays the frame in a window named "preview".
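The rectangles alone do not say what was detected. A text caption can be built from the label and confidence fields of each detection, as sketched below. The LABELS list assumes the mobilenet-ssd model was trained on the 20 PASCAL VOC classes plus background, which should be verified against the model's own documentation, and the caption helper is a hypothetical name for this example.

```python
# Assumed class list for mobilenet-ssd (PASCAL VOC, index 0 is background).
LABELS = [
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def caption(label: int, confidence: float) -> str:
    """Build a short text such as 'person 87%' for one detection."""
    # Fall back to the numeric label if it is outside the assumed list.
    name = LABELS[label] if 0 <= label < len(LABELS) else str(label)
    return f"{name} {confidence:.0%}"


print(caption(15, 0.87))  # person 87%
```

The resulting string could then be drawn next to each box, for example with OpenCV's cv2.putText, using bbox[0] and bbox[1] as the anchor point.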
Terminate program
Use the cv2.waitKey method to terminate the program. It waits briefly for the user to press a key; here we want to break out of the loop when the user presses the q key
if cv2.waitKey(1) == ord('q'):
    break
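On some platforms the value returned by cv2.waitKey may carry extra high bits, so a common precaution is to compare only the low byte. The helper below is a sketch of that idea in plain Python; is_quit_key is a hypothetical name, and the sample key codes are made up to show the masking.

```python
def is_quit_key(key_code: int, quit_char: str = "q") -> bool:
    """Compare only the low byte of a key code with the quit character."""
    # cv2.waitKey returns -1 when no key was pressed during the wait.
    return key_code != -1 and (key_code & 0xFF) == ord(quit_char)


print(is_quit_key(ord("q")))             # True
print(is_quit_key(-1))                   # False
print(is_quit_key(0x100000 | ord("q")))  # True
```

Inside the main loop this would be used as is_quit_key(cv2.waitKey(1)) in place of the direct comparison.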
Setup 14: Run the program
Enter the following command in the terminal to run the program
python hello_world.py
At this point, our first OAK program is running successfully.