Supervising Professor: Zhang Yang of Shenzhen University of Technology (Intel Global Innovation Ambassador)

Author: Li Yipeng, Shenzhen University of Technology (Electronic Science and Technology Class of 2021)

1.1 Introduction

This article builds on the earlier article "Self-trained Pytorch model is optimized and deployed on AIxBoard™ using OpenVINO". It shows how to use the OpenVINO Python API to optimize and deploy a YOLOv5 model for a target detection task.

The Python program in this article was developed on Ubuntu 20.04 LTS with PyCharm, and the hardware platform is the AIxBoard™ developer kit.

Project background: this work targets the "Maze Treasure Hunt" optoelectronic smart car problem (Question 2) of the 2023 11th National College Student Optoelectronic Design Competition. Based on the treasure markers used in this competition, I used deep learning to train a model that recognizes dominoes with four different colors and marking shapes. The domino styles are shown in Figure 1.1.

Figure 1.1 Four types of dominoes

1.2 YOLOv5 and target detection

YOLO (You Only Look Once) is a family of target detection models. Target detection (also called object detection) is an important task in computer vision: the goal is to find specific objects in an image and to identify both the class and the location of each one. In the previous article, the PyTorch model performed image classification, which only identifies the class of the object in the image. Figure 1.2.1 illustrates the difference.

Figure 1.2.1 Image classification, target positioning, and target detection

1. Build an image dataset with labelImg

labelImg is an image annotation tool that can save annotations in YOLO, Pascal VOC, and other formats. Here we choose the YOLO format.

Install it in your Python environment:

pip install labelimg -i https://mirror.baidu.com/pypi/simple

Then launch labelImg; its interface is shown in Figure 1.2.2.

Figure 1.2.2 labelImg software interface

As shown in the figure, after choosing the image directory (Open Dir) and the directory where annotations are saved (Choose Save Dir), you can draw boxes around the objects of interest and label them manually.

2. After labeling, open the label files in the save directory and check that they are correct, as shown in Figure 1.2.3.

Figure 1.2.3 Annotated files

3. A total of 2,000 images were annotated for this experiment (500 images per class, 4 classes in total). The training process itself is documented in more detail elsewhere and is not covered here. The trained YOLOv5 model is then converted into an OpenVINO IR model with the OpenVINO Model Optimizer, which speeds up inference while largely preserving accuracy.
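The label check in step 2 can also be done with a short script rather than only by eye. The following is a minimal sketch, assuming the standard YOLO text format that labelImg writes (one line per object: class_id x_center y_center width height, with coordinates normalized to [0, 1] relative to the image size). check_yolo_labels and yolo_to_pixels are hypothetical helper names, not part of labelImg; the class count defaults to the 4 domino classes used here.

```python
def check_yolo_labels(text, num_classes=4):
    """Return a list of problems found in the contents of one YOLO-format .txt label file.

    Assumed format per non-empty line: class_id x_center y_center width height,
    with the four coordinates normalized to [0, 1].
    """
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 5:
            problems.append(f"line {line_no}: expected 5 fields, got {len(fields)}")
            continue
        try:
            class_id = int(fields[0])
            coords = [float(v) for v in fields[1:]]
        except ValueError:
            problems.append(f"line {line_no}: non-numeric value")
            continue
        if not 0 <= class_id < num_classes:
            problems.append(f"line {line_no}: class id {class_id} out of range")
        if not all(0.0 <= v <= 1.0 for v in coords):
            problems.append(f"line {line_no}: coordinates not normalized to [0, 1]")
    return problems


def yolo_to_pixels(x_c, y_c, w, h, img_w, img_h):
    """Convert one normalized YOLO box to pixel corners (xmin, ymin, xmax, ymax)."""
    return ((x_c - w / 2) * img_w, (y_c - h / 2) * img_h,
            (x_c + w / 2) * img_w, (y_c + h / 2) * img_h)
```

For example, check_yolo_labels(open(path).read()) returns an empty list for a well-formed file, and yolo_to_pixels gives pixel corners that can be drawn on the image with OpenCV to confirm visually that a box sits on the right domino.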

1.3 Use OpenVINO Runtime to perform inference on YOLOv5 model

In this section we use OpenVINO Runtime (developed in PyCharm) to run optimized inference with the YOLOv5 model we trained.

The inference process can be roughly divided into the following steps:

Initialize the inference core → preprocess the input image → feed it to the inference engine to obtain the raw results → filter the results by confidence and NMS (non-maximum suppression) → visualize the results with the OpenCV API.
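NMS is the step that reduces many overlapping candidate boxes to one box per object: the highest-scoring box is kept, and any remaining box whose IoU (intersection over union) with it exceeds a threshold is discarded. The program in Section 1.3.4 calls OpenCV's cv2.dnn.NMSBoxes for this. As a minimal sketch of the same greedy idea (not guaranteed to match OpenCV's implementation in every edge case), here it is in NumPy over (x, y, w, h) boxes whose (x, y) is the top-left corner; iou_xywh and nms are hypothetical names.

```python
import numpy as np


def iou_xywh(box, others):
    """IoU between one (x, y, w, h) box and an (N, 4) array of such boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[0] + box[2], others[:, 0] + others[:, 2])
    y2 = np.minimum(box[1] + box[3], others[:, 1] + others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = box[2] * box[3] + others[:, 2] * others[:, 3] - inter
    return inter / union


def nms(boxes, scores, score_thr=0.5, iou_thr=0.5):
    """Greedy NMS: return the indices of the kept boxes, highest score first."""
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    order = np.argsort(-scores)                # candidates sorted by descending score
    order = order[scores[order] > score_thr]   # confidence filtering
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # drop the remaining boxes that overlap the kept box too much
        order = rest[iou_xywh(boxes[best], boxes[rest]) <= iou_thr]
    return keep
```

With two heavily overlapping boxes and one separate box, only the better of the overlapping pair survives together with the separate box.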

1.3.1 Import packages

import openvino.runtime as ov
import cv2
import numpy as np
import openvino.preprocess as op

Here we import four packages: OpenVINO Runtime, OpenVINO PreProcess, NumPy, and OpenCV. The difference from the previous article is that we use OpenVINO's own preprocessing API to preprocess the model input, so that the model works correctly with OpenVINO's inference engine.

1. Introduction to the PreProcess API:

OpenVINO PreProcess is part of the OpenVINO Python API. It provides data preprocessing functions that are native to OpenVINO Runtime. Without PreProcess, developers have to preprocess the data with third-party libraries such as OpenCV. OpenCV is a general-purpose open-source library, and in the typical setup its preprocessing runs on the CPU; this consumes CPU resources, and the processed data must then be transferred to the inference device, such as the iGPU. PreProcess instead integrates preprocessing directly into the model's execution graph, so preprocessing runs on the same device as the model (for example the iGPU) without a separate CPU pass, which improves execution efficiency.

Because the data we feed in does not match the format the model expects, preprocessing is needed to convert it correctly, for example by changing the precision (element type), the color channel order, or the layout of the input data.

The overall PreProcess workflow is roughly:

Create a PPP (PrePostProcessor) object → declare the input tensor information → add the preprocessing steps → specify the model's layout → set the output tensor information → build the model from the PPP object and run inference

This makes preprocessing simple and easy to follow: you only need to check the model's input and output information before conversion, compare it with the format of the input data in your own environment, and declare the required changes. Moreover, the preprocessing can then run on the same device as the model, such as the iGPU, which reduces the load on the CPU and leaves it free for other important work.

1.3.2 Model loading

Load the model:

def Init():
    global core
    global model
    global compiled_model
    global infer_request
    # Create the OpenVINO Runtime core
    core = ov.Core()
    # Read the IR model converted from the YOLOv5 model
    model = core.read_model("best2.xml", "best2.bin")
    # Use PPP (PrePostProcessor) to embed preprocessing into the model
    Premodel = op.PrePostProcessor(model)
    # Input tensor we will actually feed: u8, NHWC, BGR (as read by OpenCV)
    Premodel.input().tensor().set_element_type(ov.Type.u8).set_layout(ov.Layout("NHWC")).set_color_format(op.ColorFormat.BGR)
    # Preprocessing steps: convert to f32, BGR -> RGB, divide by 255
    Premodel.input().preprocess().convert_element_type(ov.Type.f32).convert_color(op.ColorFormat.RGB).scale(
        [255., 255., 255.])
    # Layout that the model itself expects
    Premodel.input().model().set_layout(ov.Layout("NCHW"))
    Premodel.output(0).tensor().set_element_type(ov.Type.f32)
    model = Premodel.build()
    compiled_model = core.compile_model(model, "CPU")  # compile the model; "CPU" or "GPU" can be used
    infer_request = compiled_model.create_infer_request()  # create an inference request
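To make the PPP configuration above concrete, the sketch below performs the same conversions on the CPU with NumPy. It is only a reference for checking what the built model expects (for example, when comparing results against the original PyTorch model); it is not needed at runtime, since PPP already does this inside the model. manual_preprocess is a hypothetical helper name.

```python
import numpy as np


def manual_preprocess(batch_bgr_u8):
    """NumPy equivalent of the PPP steps configured in Init().

    Input:  uint8 tensor, layout NHWC, BGR channel order (what PPP declares for the tensor).
    Output: float32 tensor, layout NCHW, RGB channel order, values scaled to [0, 1].
    """
    x = batch_bgr_u8.astype(np.float32)   # convert_element_type(f32)
    x = x[..., ::-1]                      # convert_color(RGB): reverse channel order BGR -> RGB
    x = x / np.float32(255.0)             # scale([255., 255., 255.])
    x = np.transpose(x, (0, 3, 1, 2))     # NHWC (tensor layout) -> NCHW (model layout)
    return np.ascontiguousarray(x)
```

Feeding manual_preprocess(np.expand_dims(img_re, 0)) to a model compiled without PPP should give the same input the PPP-built model sees internally.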

1.3.3 Image size adjustment

Because the size of the input image may vary, we add a resize step here so that images of different resolutions can be handled: the image is scaled to fit 640x640 while keeping its aspect ratio, and the remaining area is padded. If the input image size is fixed, the scaling ratio only needs to be computed once.

def resizeimg(image, new_shape):
    old_size = image.shape[:2]  # (height, width)
    # Scale ratio between the target shape and the original image, preserving the aspect ratio
    ratio = min(new_shape[0] / old_size[0], new_shape[1] / old_size[1])
    new_size = tuple([int(x * ratio) for x in old_size])
    image = cv2.resize(image, (new_size[1], new_size[0]))  # cv2.resize takes (width, height)
    delta_w = new_shape[1] - new_size[1]
    delta_h = new_shape[0] - new_size[0]
    color = [100, 100, 100]
    new_im = cv2.copyMakeBorder(image, 0, delta_h, 0, delta_w, cv2.BORDER_CONSTANT, value=color)  # pad the right and bottom edges
    return new_im, delta_w, delta_h
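A worked example of the resize arithmetic helps when detections need to be drawn on the original frame rather than on the padded 640x640 image. The sketch below, using hypothetical helper names, reproduces the ratio and padding that resizeimg computes and shows how a box from the model output maps back to original-image coordinates. Because resizeimg pads only the right and bottom edges, dividing by the ratio is enough; no offset has to be subtracted.

```python
def letterbox_params(old_h, old_w, new_h=640, new_w=640):
    """Scale ratio, resized (h, w), and right/bottom padding, following resizeimg()."""
    ratio = min(new_h / old_h, new_w / old_w)
    resized_h, resized_w = int(old_h * ratio), int(old_w * ratio)
    return ratio, (resized_h, resized_w), new_w - resized_w, new_h - resized_h


def box_to_original(box, ratio):
    """Map an (xmin, ymin, w, h) box from the padded image back to the original image."""
    return tuple(v / ratio for v in box)


ratio, resized, dw, dh = letterbox_params(720, 1280)  # a 1280x720 camera frame
print(ratio, resized, dw, dh)                          # 0.5 (360, 640) 0 280
print(box_to_original((100, 50, 40, 20), ratio))       # (200.0, 100.0, 80.0, 40.0)
```

A 1280x720 frame is therefore halved to 640x360 and padded with 280 rows at the bottom, and every coordinate the model reports is doubled to land on the original frame.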

1.3.4 Inference process and result display

The previous section defined how the input image is resized. This section contains the core of the OpenVINO Runtime inference program.

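#************************************#
#      Main inference function       #
def main(img, infer_request):
    img_re, dw, dh = resizeimg(img, (640, 640))  # resize and pad to 640x640
    input_tensor = np.expand_dims(img_re, 0)  # add the batch dimension: (1, 640, 640, 3), u8 NHWC BGR as declared in PPP
    infer_request.infer({0: input_tensor})  # feed the tensor to the inference engine
    output = infer_request.get_output_tensor(0)  # get the inference result
    detections = output.data[0]  # candidate rows: [cx, cy, w, h, objectness, class scores...]
    boxes = []
    class_ids = []
    confidences = []
    for prediction in detections:
        confidence = prediction[4].item()  # objectness confidence
        if confidence >= 0.6:  # coarse filter that removes most invalid candidates
            classes_scores = prediction[5:]
            class_id = int(np.argmax(classes_scores))  # index of the highest class score
            if classes_scores[class_id] > .25:
                confidences.append(confidence)
                class_ids.append(class_id)
                x, y, w, h = prediction[0].item(), prediction[1].item(), prediction[2].item(), prediction[3].item()  # box center and size
                xmin = x - (w / 2)  # NMSBoxes expects the top-left corner, so convert from the center point
                ymin = y - (h / 2)
                box = np.array([xmin, ymin, w, h])  # record the box
                boxes.append(box)
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.5)  # NMS: score threshold 0.5, IoU threshold 0.5
    detections = []
    for i in indexes:
        j = i.item()
        detections.append({"class_index": class_ids[j], "confidence": confidences[j], "box": boxes[j]})  # store class index, confidence and box
    for detection in detections:
        box = detection["box"]
        classId = detection["class_index"]
        confidence = detection["confidence"]
        xmin, ymin, w, h = [int(v) for v in box]
        # Draw on the padded 640x640 image, whose coordinates match the model output
        cv2.rectangle(img_re, (xmin, ymin), (xmin + w, ymin + h), (0, 255, 0), 2)
        cv2.putText(img_re, f"{classId}: {confidence:.2f}", (xmin, max(ymin - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("result", img_re)
    cv2.waitKey(0)
    return detections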
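The per-row Python loop in main is easy to follow, but it visits every candidate row (typically 25,200 rows for a standard 640x640 YOLOv5 model with three output scales). As a sketch, the same two-threshold filter and center-to-corner conversion can be written with NumPy array operations; decode_yolov5 is a hypothetical name, and its thresholds default to the values used in main.

```python
import numpy as np


def decode_yolov5(detections, obj_thr=0.6, cls_thr=0.25):
    """Vectorized version of the filtering loop in main().

    detections: array of shape (num_candidates, 5 + num_classes); each row is
    [cx, cy, w, h, objectness, class_score_0, class_score_1, ...].
    Returns (boxes, confidences, class_ids), with boxes as (xmin, ymin, w, h).
    """
    detections = np.asarray(detections, dtype=np.float32)
    obj = detections[:, 4]
    cls_scores = detections[:, 5:]
    class_ids = np.argmax(cls_scores, axis=1)                    # best class per row
    best_cls = cls_scores[np.arange(len(detections)), class_ids]
    mask = (obj >= obj_thr) & (best_cls > cls_thr)               # same two thresholds as the loop
    cx, cy, w, h = detections[mask, :4].T
    boxes = np.stack([cx - w / 2, cy - h / 2, w, h], axis=1)    # center -> top-left corner
    return boxes, obj[mask], class_ids[mask]
```

Its outputs should be usable in place of the lists built by the loop, for example as cv2.dnn.NMSBoxes(boxes.tolist(), confidences.tolist(), 0.5, 0.5).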

This completes the inference function. The main program that runs it is:

#******************** Main program ***********************#
def MainToSolve(infer):
    img = cv2.imread("boundtest.jpg")  # for real-time use, replace img with frames captured from a camera
    main(img, infer)

# Initialize, then run inference
Init()
MainToSolve(infer_request)

Running this program produces the result shown in Figure 1.3.1.

Figure 1.3.1 Image results

As the figure shows, the YOLOv5 model, converted into an IR model and run through PPP preprocessing and the OpenVINO Runtime engine, runs successfully on AIxBoard, and the detection results are good.

1.4 Brief description of model application scenarios

The original PyTorch model only performed image classification. This article trains a YOLOv5 model and uses OpenVINO to accomplish the harder task of target detection. Knowing the location of each object gives the car position information about the scene, which it can use to act on objects accurately (grabbing or pushing them).

The four-wheeled car equipped with AIxBoard is shown in Figure 1.4.1.

Figure 1.4.1 AIxBoard smart car

With this car, many more application scenarios are possible. By adding OpenVINO to the car's system, we can also implement distinctive applications such as air-to-ground mapless navigation.

1.5 Conclusion

After the self-trained YOLOv5 model is converted with the OpenVINO Model Optimizer, its input is preprocessed with OpenVINO PreProcess and inference is run with OpenVINO Runtime. The inference process is simple and clear. Because PPP (PrePostProcessor) preprocessing is built into the model, preprocessing and inference can run on the iGPU, reducing CPU overhead. Models optimized with OpenVINO have clear advantages, and together with the AIxBoard developer kit we can quickly build a smart car to validate the system.

OpenVINO is simple and easy to use, and it provides complete documentation and OpenVINO Notebooks examples that help developers focus on implementing their own applications and algorithms.

Self-trained YOLOv5 model optimized using OpenVINO™ and deployed on AIxBoard™