Object recognition based on OpenVINO

Principle of YOLOv5

YOLOv5 is a fast and efficient object detection algorithm with excellent real-time performance and high accuracy. It uses deep learning to achieve end-to-end object detection, and it still performs well when computing resources are limited.

YOLOv5 uses an anchor-based detection method: targets are predicted relative to a set of predefined anchor sizes over the input image. Compared with the traditional sliding-window approach, this allows detection to run simultaneously on feature maps of different scales, which greatly improves both efficiency and accuracy.
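To make the anchor-based idea concrete, the following sketch shows how a YOLOv5-style head decodes one raw prediction into a box relative to its anchor and grid cell. It is an illustrative assumption about the decoding formula, not the exact implementation of any particular YOLOv5 release.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(t, anchor_wh, grid_xy, stride):
    """Decode one raw prediction (tx, ty, tw, th) into an absolute box, YOLOv5-style."""
    tx, ty, tw, th = t
    # Box center: an offset within the grid cell, scaled back to image pixels
    bx = (2.0 * sigmoid(tx) - 0.5 + grid_xy[0]) * stride
    by = (2.0 * sigmoid(ty) - 0.5 + grid_xy[1]) * stride
    # Box size: a scaled version of the predefined anchor size
    bw = (2.0 * sigmoid(tw)) ** 2 * anchor_wh[0]
    bh = (2.0 * sigmoid(th)) ** 2 * anchor_wh[1]
    return bx, by, bw, bh

# One prediction on the stride-16 feature map, anchor of 30x61 pixels, grid cell (12, 7)
print(decode_box([0.2, -0.1, 0.3, 0.4], anchor_wh=(30, 61), grid_xy=(12, 7), stride=16))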

In terms of implementation, YOLOv5 adopts a lightweight network structure and a multi-scale training strategy. Specifically, it uses CSPDarknet53 as the backbone network, and employs techniques such as Bottleneck residual blocks and an SPP module to further strengthen the network's representational power and receptive field. It also introduces multi-scale training: by training on images of different sizes, the model adapts better to detection tasks in different scenarios.
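A minimal sketch of the multi-scale idea, assuming input sizes are drawn at random from multiples of the network stride (illustrative only, not the training loop used later in this article):

import random
import cv2 as cv

def multiscale_resize(image, base=640, stride=32, low=0.5, high=1.5):
    """Resize a training image to a randomly chosen size that is a multiple of the stride."""
    size = random.randrange(int(base * low), int(base * high) + 1, stride)
    return cv.resize(image, (size, size), interpolation=cv.INTER_LINEAR), size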

Beyond the architecture itself, YOLOv5 also applies a series of techniques to improve performance and robustness. For example, using the Mish activation function instead of the traditional ReLU helps avoid the vanishing-gradient problem; the DropBlock regularization method improves generalization and resistance to overfitting; and AutoAugment data augmentation increases the diversity of the data, further improving the model's accuracy and robustness.
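As a quick illustration of the Mish activation mentioned above (a standalone sketch, not taken from the YOLOv5 code base):

import numpy as np

def mish(x):
    """Mish activation: x * tanh(softplus(x)), a smooth alternative to ReLU."""
    return x * np.tanh(np.log1p(np.exp(x)))

print(mish(np.array([-2.0, 0.0, 2.0])))  # smooth everywhere, close to x for large positive inputs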

In short, YOLOv5 is a fast, efficient, and highly accurate object detection algorithm. Through end-to-end detection, anchor-based prediction, a lightweight network structure, multi-scale training, and other optimizations, it completes object detection tasks efficiently even with limited computing resources.

Environment installation

pip install labelimg
pip install openvino-dev[onnx,tensorflow]==2022.2.0
pip install paddle2onnx==1.0.5 -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install tensorflow-gpu==2.7.0
pip install paddlepaddle
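
After installation, a quick sanity check can confirm that the main packages import correctly and that OpenVINO sees a usable device (a minimal sketch, assuming the versions above installed without conflicts):

from openvino.runtime import Core
import tensorflow as tf
import paddle

core = Core()
print("OpenVINO devices:", core.available_devices)  # e.g. ['CPU']
print("TensorFlow:", tf.__version__)
print("PaddlePaddle:", paddle.__version__)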

Data processing

The classification data we use here are the seven characters from "Pleasant Goat and Big Big Wolf", with the label names set as follows (a sample labels.txt built from these names is sketched after the list):

meiyangyang
xiyangyang
feiyangyang
lanyangyang
huitailang
manyangyang
hongtailang
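
The read_label helper used later simply splits labels.txt on whitespace, so the file just needs these class names, one per line. A minimal way to generate it; the ordering here is an assumption and must match the class indices used in training:

with open("labels.txt", "w") as f:
    f.write("\n".join([
        "meiyangyang", "xiyangyang", "feiyangyang", "lanyangyang",
        "huitailang", "manyangyang", "hongtailang",
    ]))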

Here we use 1432 images for training and label them with LabelImg:

image

After all the labels are done, we can convert the data format:

First we enter the yolo folder:

image

The images and annotation information we just produced are stored in the mask folder:

image

Once this step is complete, we can start the data processing: open CMD in the current directory and run gen.py directly.

Then enter the following path and run the following command:

image

python yolov5_2_coco.py --dir_path dataset/YOLOV5
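
yolov5_2_coco.py converts the YOLOv5-format annotations into COCO format. For reference, a COCO-style annotation file generally looks like the sketch below (field names follow the standard COCO spec; the exact files and values the script writes may differ):

# Minimal COCO-style annotation structure (illustrative values only)
coco = {
    "images": [{"id": 1, "file_name": "0001.jpg", "width": 640, "height": 480}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1,
                     "bbox": [100, 120, 80, 60],  # [x, y, width, height] of the box
                     "area": 80 * 60, "iscrowd": 0}],
    "categories": [{"id": 1, "name": "xiyangyang"}],
}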

Environment setup

We create a new environment on paddle, package the generated data and upload it to the paddle cloud, and at the same time download PaddleYOLO from github and upload it as well.

We need to put the data under the dataset path:

image

Then we need to modify our configuration file:

image

image

Model training

Here we modify the number of training epochs and the number of classes, then create a new notebook file in the top-level path and run the following code:

image
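
The exact commands are in the screenshot above. For reference, a typical PaddleYOLO training and export invocation looks roughly like the following; the config file name, epoch count, and output paths are assumptions based on the PP-YOLOE-s model used later, not necessarily the author's exact cell:

cd PaddleYOLO
pip install -r requirements.txt
python tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml --eval
python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o weights=output/ppyoloe_crn_s_300e_coco/best_model.pdparams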

When everything has finished running, these two folders will be generated:

image

Here we only need to download the following folder:

image

After downloading, simply unzip it.

Model processing

Here we need to put the downloaded files into the following directory:

image

Since the trained model cannot be used directly, we run the following commands in order:

python prune_paddle_model.py --model_dir ppyoloe_crn_s_80 --model_filename model.pdmodel --params_filename model.pdiparams --output_names tmp_16 concat_14.tmp_0 --save_dir export_model

paddle2onnx --model_dir export_model --model_filename model.pdmodel --params_filename model.pdiparams --input_shape_dict "{'image':[1,3,640,640]}" --opset_version 11 --save_file ppyoloe_crn_s_80.onnx

mo --input_model ppyoloe_crn_s_80.onnx

This gives us the model files we need:

image
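
Before wiring the model into the demo, a quick check that the converted IR loads and exposes the expected input and output shapes can save debugging time (a small sketch, assuming the file names produced above):

from openvino.runtime import Core

core = Core()
model = core.read_model("ppyoloe_crn_s_80.xml")
for inp in model.inputs:
    print("input:", inp.any_name, inp.shape)    # expect the image input with shape [1,3,640,640]
for out in model.outputs:
    print("output:", out.any_name, out.shape)   # the box and confidence tensors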

We place these two model files together with our code file under the jupyter notebook path and import them:

image

Then we can run the code directly. The code is as follows:

from openvino.runtime import Core
import openvino.runtime as ov
import cv2 as cv
import numpy as np
import tensorflow as tf
# OpenVINO model predictor
class Predictor:
    """
    OpenVINO model inference wrapper
    """
    def __init__(self, model_path):
        ie_core = Core()
        model = ie_core.read_model(model=model_path)
        self.compiled_model = ie_core.compile_model(model=model, device_name="CPU")
    def get_inputs_name(self, num):
        return self.compiled_model.input(num)
    
    def get_outputs_name(self, num):
        return self.compiled_model.output(num)
    
    def predict(self, input_data):
        return self.compiled_model([input_data])
    
    def get_request(self):
        return self.compiled_model.create_infer_request()
# Image preprocessing
def process_image(input_image, size):
    """Preprocess the input image according to the PP-YOLOE model's input requirements

    Args:
        input_image (uint8): input image matrix
        size (int): model input size

    Returns:
        float32: the preprocessed image data
    """
    max_len = max(input_image.shape)
    img = np.zeros([max_len, max_len, 3], np.uint8)
    img[0:input_image.shape[0], 0:input_image.shape[1]] = input_image # place the image on a square background
    img = cv.cvtColor(img, cv.COLOR_BGR2RGB)  # convert BGR to RGB
    img = cv.resize(img, (size, size), interpolation=cv.INTER_NEAREST) # resize the image
    img = np.transpose(img, [2, 0, 1]) # HWC to CHW
    img = img / 255.0 # normalize to [0, 1]
    img = np.expand_dims(img, 0) # add the batch dimension
    return img.astype(np.float32)
# Result post-processing
def process_result(box_results, conf_results):
    """Post-process the PP-YOLOE model outputs: apply non-maximum suppression and extract the predictions

    Args:
        box_results (float32): predicted box results
        conf_results (float32): confidence results
    Returns:
        float: boxes
        float: scores
        int: classes
    """
    conf_results = np.transpose(conf_results, [0, 2, 1]) # transpose
    # Reshape the outputs
    box_results = box_results.reshape(8400, 4)
    conf_results = conf_results.reshape(8400, 2)
    scores = []
    classes = []
    boxes = []
    for i in range(8400):
        conf = conf_results[i, :] # per-class confidences
        score = np.max(conf) # highest confidence
        # Filter out low-confidence predictions
        if score > 0.5:
            classes.append(np.argmax(conf))
            scores.append(score)
            boxes.append(box_results[i, :])
    scores = np.array(scores)
    boxes = np.array(boxes)

    result_box = []
    result_score = []
    result_class = []
    # Non-maximum suppression to remove duplicate predictions
    if len(boxes) != 0:
        # NMS indices
        indexs = tf.image.non_max_suppression(boxes, scores, len(scores), 0.25, 0.35)
        for i, index in enumerate(indexs):
            result_score.append(scores[index])
            result_box.append(boxes[index, :])
            result_class.append(classes[index])
    # Return the results
    return np.array(result_box), np.array(result_score), np.array(result_class)
# Draw the predicted boxes
def draw_box(image, boxes, scores, classes, labels):
    """Draw the prediction results on the image

    Args:
        image (uint8): original image
        boxes (float32): predicted boxes
        scores (float32): scores
        classes (int): classes
        labels (str): labels

    Returns:
        uint8: the annotated image
    """
    colors = [(0, 0, 255), (0, 255, 0)]
    scale = max(image.shape) / 640.0 # scale factor back to the original image size
    if len(classes) != 0:
        for i in range(len(classes)):
            box = boxes[i,:]
            x1 = int(box[0] * scale)
            y1 = int(box[1] * scale)
            x2 = int(box[2] * scale)
            y2 = int(box[3] * scale)
            label = labels[classes[i]]
            score = scores[i]
            cv.rectangle(image, (x1, y1), (x2, y2), colors[classes[i]], 2, cv.LINE_8)
            cv.putText(image,label+":"+str(score),(x1,y1-10),cv.FONT_HERSHEY_SIMPLEX, 0.55, colors[classes[i]], 2)
        
    return image
# Read the label file
def read_label(label_path):
    with open(label_path, 'r') as f:
        labels = f.read().split()
    return labels
# Synchronous inference
label_path = "labels.txt"
yoloe_model_path = "ppyoloe_crn_s_80.xml"
predictor = Predictor(model_path = yoloe_model_path)
boxes_name = predictor.get_outputs_name(0)
conf_name = predictor.get_outputs_name(1)
labels = read_label(label_path=label_path)
cap = cv.VideoCapture(0)
while cap.isOpened():
    ret, frame = cap.read()
    frame = cv.flip(frame, 180)
    cv.namedWindow("MaskDetection", 0)  # 0 makes the window resizable; note: the window name must match the one passed to imshow
    cv.resizeWindow("MaskDetection", 640, 480)    # set the window width and height
    input_frame = process_image(frame, 640)
    results = predictor.predict(input_data=input_frame)
    boxes, scores, classes = process_result(box_results=results[boxes_name], conf_results=results[conf_name])
    result_frame = draw_box(image=frame, boxes=boxes, scores=scores, classes=classes, labels=labels)
    cv.imshow('MaskDetection', result_frame)
    key = cv.waitKey(1)
    if key == 27: # press Esc to exit
        break
cap.release()
cv.destroyAllWindows()
# Asynchronous inference
label_path = "labels.txt"
yoloe_model_path = "ppyoloe_crn_s_80.xml"
predictor = Predictor(model_path = yoloe_model_path)
input_layer = predictor.get_inputs_name(0)
labels = read_label(label_path=label_path)
cap = cv.VideoCapture(0)
curr_request = predictor.get_request()
next_request = predictor.get_request()
ret, frame = cap.read()
curr_frame = process_image(frame, 640)
curr_request.set_tensor(input_layer, ov.Tensor(curr_frame))
curr_request.start_async()
while cap.isOpened():
    ret, next_frame = cap.read()
    next_frame = cv.flip(next_frame, 180)
    cv.namedWindow("MaskDetection", 0)  # 0 makes the window resizable; note: the window name must match the one passed to imshow
    cv.resizeWindow("MaskDetection", 640, 480)    # set the window width and height
    in_frame = process_image(next_frame, 640)
    next_request.set_tensor(input_layer, ov.Tensor(in_frame))
    next_request.start_async()
    if curr_request.wait_for(-1) == 1:
        boxes_data = curr_request.get_output_tensor(0).data
        conf_data = curr_request.get_output_tensor(1).data
        boxes, scores, classes = process_result(box_results=boxes_data, conf_results=conf_data)
        frame = draw_box(image=frame, boxes=boxes, scores=scores, classes=classes, labels=labels)
        cv.imshow('MaskDetection', frame)
    frame = next_frame
    curr_request, next_request = next_request, curr_request
    key = cv.waitKey(1)
    if key == 27: # press Esc to exit
        break
cap.release()
cv.destroyAllWindows()

Only one of the synchronous and asynchronous inference versions needs to be used.
Finally, the running effect is as follows:

Source: blog.csdn.net/m0_59161987/article/details/131758196