Deploying the PaddlePaddle PP-YOLOE model based on OpenVINO (4): Python implementation

1. Environment installation

 The OpenVINO™ toolkit version 2022.1 was officially released on March 22, 2022. Compared with previous versions, it brought major changes: it provides a preprocessing API, an ONNX front end, and the AUTO device plug-in, supports reading PaddlePaddle models directly, and allows the model input shape to be changed dynamically at inference time, which greatly widens the range of networks that can be deployed. On September 23, 2022, OpenVINO™ toolkit version 2022.2 was released, refining 2022.1 with support for Intel's latest CPUs and discrete GPUs to enable even more AI innovation and opportunities.

The OpenVINO™ 2022.2 version is used here. The Python version can be installed directly with pip, and it is recommended to create a virtual environment with Anaconda for the installation. In the created virtual environment, run the following commands to install the latest version:

# Update pip
python -m pip install --upgrade pip
# Install the OpenVINO development tools with the ONNX and TensorFlow 2 extras
pip install openvino-dev[ONNX,tensorflow2]==2022.2.0

 If the package download fails or a network error occurs during installation, you can re-run the installation command and it will continue from the previous attempt.
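
After installation, a quick way to verify that the runtime is available is to list the devices OpenVINO can see; the snippet below is a minimal check, not part of the original article:

from openvino.runtime import Core

print(Core().available_devices)  # e.g. ['CPU'], plus 'GPU' if a supported GPU is present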

2. Create the inference class Predictor

from openvino.runtime import Core

class Predictor:
    """
    OpenVINO model inference wrapper
    """
    def __init__(self, model_path):
        ie_core = Core()
        model = ie_core.read_model(model=model_path)  # read the local model (IR or ONNX)
        self.compiled_model = ie_core.compile_model(model=model, device_name="CPU")  # load it onto the device

    def get_inputs_name(self, num):
        return self.compiled_model.input(num)

    def get_outputs_name(self, num):
        return self.compiled_model.output(num)

    def predict(self, input_data):
        return self.compiled_model([input_data])


 Since only PP-YOLOE inference is performed here, the Predictor class is a simple wrapper: an initialization function that reads the local model and compiles it onto the specified device, functions that return the model's input and output nodes, and a model prediction function.
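
As a small usage sketch (the model file name is a placeholder, not from the original article), the class can be used to inspect the model's input node before running inference:

predictor = Predictor(model_path="ppyoloe_plus_crn_s_80e_coco.xml")  # an .onnx file also works
input_node = predictor.get_inputs_name(0)
print(input_node.any_name, input_node.shape)  # name and shape of the first input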

3. Data processing methods

3.1 Input image preprocessing

import cv2 as cv
import numpy as np

def process_image(input_image, size):
    """Preprocess the input image as required by the PP-YOLOE model.

    Args:
        input_image (uint8): input image matrix
        size (int): model input size

    Returns:
        float32: preprocessed image matrix
    """
    max_len = max(input_image.shape)
    img = np.zeros([max_len, max_len, 3], np.uint8)
    img[0:input_image.shape[0], 0:input_image.shape[1]] = input_image  # pad the image into a square canvas
    img = cv.cvtColor(img, cv.COLOR_BGR2RGB)  # BGR to RGB
    img = cv.resize(img, (size, size), interpolation=cv.INTER_NEAREST)  # resize to the model input size
    img = np.transpose(img, [2, 0, 1])  # HWC to CHW
    img = img / 255.0  # normalize to [0, 1]
    img = np.expand_dims(img, 0)  # add the batch dimension
    return img

 The image data is processed according to the input requirements of the PP-YOLOE model, mainly channel conversion, image scaling, matrix transposition, data normalization, and adding a batch dimension. Following the PP-YOLOE input settings, normalization simply divides each pixel value by 255 so that the input lies between 0 and 1, which speeds up model computation. The ONNX export of the PP-YOLOE model only supports inference with batch_size=1, so at the end the data matrix is simply expanded by one dimension.
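
A quick sanity check of the preprocessing output (the image path is a placeholder). Note that dividing by 255.0 produces a float64 array; if the runtime complains about the input precision, it can be cast with .astype(np.float32):

image = cv.imread("test.jpg")          # placeholder image path
input_data = process_image(image, 640)
print(input_data.shape)                # (1, 3, 640, 640)
print(input_data.dtype)                # float64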

3.2 Model output processing

import numpy as np
import tensorflow as tf

def process_result(box_results, conf_results):
    """Post-process the PP-YOLOE model output: apply non-maximum suppression and extract predictions.

    Args:
        box_results (float32): predicted box results
        conf_results (float32): confidence results
    Returns:
        float: predicted boxes
        float: scores
        int: classes
    """
    conf_results = np.transpose(conf_results, [0, 2, 1])  # transpose to [1, 8400, 80]
    # reshape the outputs
    box_results = box_results.reshape(8400, 4)
    conf_results = conf_results.reshape(8400, 80)
    scores = []
    classes = []
    boxes = []
    for i in range(8400):
        conf = conf_results[i, :]  # class confidences for this prediction
        score = np.max(conf)  # highest class confidence
        # filter out low-confidence predictions
        if score > 0.5:
            classes.append(np.argmax(conf))
            scores.append(score)
            boxes.append(box_results[i, :])
    scores = np.array(scores)
    boxes = np.array(boxes)
    # non-maximum suppression to remove duplicate predictions
    indexs = tf.image.non_max_suppression(boxes, scores, len(scores), 0.25, 0.35)
    # collect the results kept by NMS
    result_box = []
    result_score = []
    result_class = []
    for i, index in enumerate(indexs):
        result_score.append(scores[index])
        result_box.append(boxes[index, :])
        result_class.append(classes[index])
    # convert the results to matrices and return
    return np.array(result_box), np.array(result_score), np.array(result_class)

 Since the PP-YOLOE model used here has had its post-processing trimmed off during export, the model outputs raw, unprocessed result data. There are two output nodes, one for the predicted boxes and one for the confidence values, so these outputs need to be processed afterwards.

 The confidence output has shape [1, 80, 8400], where 80 is the number of classes (each prediction gets one confidence value per class) and 8400 is the number of predictions. The box output has shape [1, 8400, 4], giving one predicted box for each of the 8400 predictions, where the 4 values are the x and y coordinates of the top-left and bottom-right corners of the box.

 Therefore, the result processing mainly consists of the following steps:

  • Transpose the confidence output, then extract, for each prediction above the score threshold, the class with the highest confidence, the score, and the corresponding box;
  • Apply non-maximum suppression to remove duplicate boxes and keep the final boxes, scores, and classes (a TensorFlow-free alternative is sketched below).
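
The code above calls tf.image.non_max_suppression, which pulls in TensorFlow only for the NMS step. If that dependency is unwanted, a minimal pure-NumPy NMS sketch with the same [x1, y1, x2, y2] box format and the same threshold semantics (IoU threshold 0.25, score threshold 0.35) could look like this; it is an alternative sketch, not the author's implementation:

import numpy as np

def nms_numpy(boxes, scores, iou_threshold=0.25, score_threshold=0.35):
    """Pure-NumPy non-maximum suppression for boxes in [x1, y1, x2, y2] format."""
    # keep only boxes above the score threshold
    idxs = np.where(scores > score_threshold)[0]
    # sort the remaining boxes by score, descending
    order = idxs[np.argsort(scores[idxs])[::-1]]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        # intersection of the current best box with the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        # IoU against the remaining boxes
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter + 1e-9)
        # drop boxes that overlap the kept box too much
        order = order[1:][iou <= iou_threshold]
    return np.array(keep, dtype=np.int64)

With this helper, the tf.image.non_max_suppression call in process_result could be replaced by indexs = nms_numpy(boxes, scores).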

3.3 Drawing prediction results

def draw_box(image, boxes, scores, classes, lables):
    """Draw the prediction results onto the image.

    Args:
        image (uint8): original image
        boxes (float32): predicted boxes
        scores (float32): scores
        classes (int): classes
        lables (str): labels

    Returns:
        uint8: annotated image
    """
    scale = max(image.shape) / 640.0  # scale factor from model input space back to the original image
    for i in range(len(classes)):
        box = boxes[i, :]

        x1 = int(box[0] * scale)
        y1 = int(box[1] * scale)
        x2 = int(box[2] * scale)
        y2 = int(box[3] * scale)

        lable = lables[classes[i]]
        score = scores[i]
        cv.rectangle(image, (x1, y1), (x2, y2), (0, 0, 255), 2, cv.LINE_8)  # draw the box
        cv.putText(image, lable + ":" + str(score), (x1, y1 - 10),
                   cv.FONT_HERSHEY_SIMPLEX, 0.55, (0, 0, 255), 2)  # write the label and score

    return image

 After the result processing in the previous step, the predicted boxes, scores, and classes are obtained; the predictions are then drawn onto the image with OpenCV in two steps, drawing the box and writing the class label with its score. Note that the boxes are in the 640x640 model input space, so they are scaled back to the original image using the ratio between the longest image side and 640.

4. Model inference

if __name__ == '__main__':
    '''-------------------1. Define paths and labels ----------------------'''
    # yoloe_model_path = "E:/Text_Model/pp-yoloe/ppyoloe_plus_crn_s_80e_coco.onnx"
    yoloe_model_path = "E:/Text_Model/pp-yoloe/ppyoloe_plus_crn_s_80e_coco.xml"
    image_path = "E:/Text_dataset/YOLOv5/0001.jpg"
    lable_path = "E:/Git_space/基于OpenVINO部署PP-YOLOE模型/model/lable.txt"
    '''-------------------2. Create the model predictor ----------------------'''
    predictor = Predictor(model_path=yoloe_model_path)
    '''-------------------3. Preprocess the model input ----------------------'''
    image = cv.imread(image_path)
    input_image = process_image(image, 640)
    '''-------------------4. Model inference ----------------------'''
    results = predictor.predict(input_data=input_image)
    '''-------------------5. Post-process the prediction results ----------------------'''
    boxes_name = predictor.get_outputs_name(0)
    conf_name = predictor.get_outputs_name(1)

    boxes, scores, classes = process_result(box_results=results[boxes_name], conf_results=results[conf_name])  # process the results
    lables = read_lable(lable_path=lable_path)  # read the labels
    result_image = draw_box(image=image, boxes=boxes, scores=scores, classes=classes, lables=lables)  # draw the results
    cv.imshow("result", result_image)
    cv.waitKey(0)

 Following the model inference workflow, the script finally calls the inference class and the processing methods defined above:

  • Import related information: define the model path, the path of the image to be predicted, and the class label file;
  • Create the model predictor: initialize the prediction class and read the local model; both the ONNX format and the IR format can be read here;
  • Preprocess the image: call the image preprocessing method defined above to convert the local image into input data for model inference;
  • Model inference: feed the preprocessed image data into the model and obtain the inference results;
  • Process the model results: call the result processing method; if visualization is required, the predictions can be drawn onto the image (a sketch of the read_lable helper used above follows below).
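
The read_lable helper called in the script is not defined in this section. A minimal sketch, assuming the label file stores one class name per line:

def read_lable(lable_path):
    # read one class name per line, skipping empty lines (assumed label file format)
    with open(lable_path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]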


Origin blog.csdn.net/grape_yan/article/details/127576236