Rockchip RK3588 development board: converting a yolov5 model in a virtual machine and calling the NPU from a Python script on the board (the whole deployment process)

0. Background

The goal is full localization: the Rockchip RK3588 development board replaces the Jetson Nano development board.

1. Model conversion

Model conversion is done in an Ubuntu 20.04 desktop virtual machine on a laptop and has two main steps: converting yolov5s.pt to yolov5s.onnx, and converting yolov5s.onnx to yolov5s.rknn.

This section mainly follows the blog post "yolov5 article - yolov5 trains pt model and converts it to rknn model, deploys it on RK3588 development board - the whole process from training to deployment".

1.1 Basic environment

Basic environment: an Ubuntu 20.04 desktop virtual machine on an x86 computer (the x86 platform is required; an ordinary laptop will suffice).

1.2 Create python environment
  • Install miniconda in the virtual machine, then activate the base environment
  • Create a conda environment with Python 3.8 (the Python version must be 3.8), using the following commands
conda create -n rk3588 python=3.8
conda activate rk3588
pip install numpy -i https://mirror.baidu.com/pypi/simple
cd ~/Desktop
git clone https://gitcode.net/mirrors/rockchip-linux/rknn-toolkit2.git
pip install -r rknn-toolkit2/doc/requirements_cp38-1.4.0.txt -i https://mirror.baidu.com/pypi/simple
pip install pandas==1.4.* pyyaml matplotlib==3.3.* seaborn -i https://mirror.baidu.com/pypi/simple
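
The Python version matters because the RKNN wheels installed in this walkthrough are each built for one CPython version: their file names carry a tag such as cp38 or cp39. As a minimal sketch (the helper names wheel_python_tag and matches_interpreter are illustrative and not part of any RKNN tooling), the tag can be compared with the running interpreter before calling pip:

```python
import sys


def wheel_python_tag(wheel_name):
    """Return the Python tag (e.g. 'cp38') from a wheel file name.

    Wheel names follow name-version(-build)-python-abi-platform.whl,
    so the Python tag is the third field from the end.
    """
    stem = wheel_name[:-len(".whl")] if wheel_name.endswith(".whl") else wheel_name
    return stem.split("-")[-3]


def matches_interpreter(wheel_name, version_info=None):
    """Check whether a CPython wheel matches the given (major, minor) version."""
    major, minor = (version_info or sys.version_info)[:2]
    return wheel_python_tag(wheel_name) == "cp{}{}".format(major, minor)


if __name__ == "__main__":
    # Wheel names taken from the install commands in this article.
    for name in ("rknn_toolkit2-1.4.0_22dcfef4-cp38-cp38-linux_x86_64.whl",
                 "rknn_toolkit_lite2-1.4.0-cp39-cp39-linux_aarch64.whl"):
        print(name, "->", wheel_python_tag(name), matches_interpreter(name))
```

Run inside the activated conda environment, this prints True only for the wheel that fits that environment's interpreter.
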
1.3 Convert yolov5s.pt to yolov5s.onnx

First, download the yolov5 project code to the desktop (the commit checked out here is yolov5 version v5.0), as follows

cd ~/Desktop
git clone https://gitcode.net/mirrors/ultralytics/yolov5.git
cd yolov5
git reset --hard c5360f6e7009eb4d05f14d1cc9dae0963e949213

Second, find the download link for yolov5s.pt on the yolov5 project page, download it (for example with the Thunder download manager), and upload yolov5s.pt to the ~/Desktop/yolov5/weights directory in the virtual machine.

Next, modify the Detect function in ~/Desktop/yolov5/models/yolo.py, as shown in the figure below (this change is only for conversion and must not be applied during training)
Insert image description here

Then, modify the export_onnx() function in ~/Desktop/yolov5/export.py, as shown below

Insert image description here

Finally, run the following command in the yolov5 directory. When it finishes, yolov5s.onnx is generated in the weights directory:

python export.py --weights weights/yolov5s.pt --img 640 --batch 1 --include onnx
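
The post-processing script later in this article reshapes three output tensors, one per detection head. Assuming the modified export keeps the three raw heads of the standard yolov5s (strides 8, 16 and 32, three anchors per cell, 80 COCO classes), their shapes for a 640x640 input can be worked out as below. The function name head_shapes is illustrative.

```python
def head_shapes(img_size=640, num_classes=80, anchors_per_cell=3,
                strides=(8, 16, 32)):
    """Return the (batch, channels, grid_h, grid_w) shape of each raw head."""
    # Each anchor predicts x, y, w, h, objectness plus one score per class.
    channels = anchors_per_cell * (5 + num_classes)
    return [(1, channels, img_size // s, img_size // s) for s in strides]


if __name__ == "__main__":
    for shape in head_shapes():
        print(shape)
```

For a custom model, passing its own class count to num_classes gives the channel count to expect when inspecting the exported file.
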
1.4 Convert yolov5s.onnx to yolov5s.rknn

First, download the rknn-toolkit2 project. This step has actually been done in environment preparation.

cd ~/Desktop
git clone https://gitcode.net/mirrors/rockchip-linux/rknn-toolkit2.git

Secondly, install the rknn-toolkit2 environment. This step has actually been done in environment preparation.

cd ~/Desktop/rknn-toolkit2
cd doc && pip install -r requirements_cp38-1.4.0.txt -i https://mirror.baidu.com/pypi/simple

Next, install the rknn-toolkit2 wheel.

cd ~/Desktop/rknn-toolkit2
cd packages && pip install rknn_toolkit2-1.4.0_22dcfef4-cp38-cp38-linux_x86_64.whl -i https://mirror.baidu.com/pypi/simple

Test whether the installation succeeded: start Python in the terminal and enter

from rknn.api import RKNN
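
The same check can be scripted so that it reports a missing package instead of stopping with a traceback. This is a small sketch using only the standard library; module_available is an illustrative name, and the module list mixes the VM-side package (rknn.api) with the board-side one (rknnlite.api) used later in this article.

```python
import importlib.util


def module_available(name):
    """Return True if the module can be located (parent packages get imported)."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (e.g. 'rknn' for 'rknn.api') is missing.
        return False


if __name__ == "__main__":
    for name in ("numpy", "rknn.api", "rknnlite.api"):
        print("{:<14} {}".format(name, "ok" if module_available(name) else "missing"))
```
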

Next, copy yolov5s.onnx to the ~/Desktop/rknn-toolkit2/examples/onnx/yolov5 directory, and make some modifications to test.py in this directory, as shown below
Insert image description here
Insert image description here

Finally, run python test.py to get yolov5s.rknn in the same directory.
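
The screenshots of the modified test.py are not reproduced here. As a rough sketch only, and assuming the rknn-toolkit2 1.4.0 Python API used by that example, the conversion usually comes down to the calls below. The normalization values, the dataset.txt calibration list and the helper names conversion_settings and convert are assumptions for illustration, not the author's exact changes.

```python
def conversion_settings(target="rk3588"):
    """Settings assumed for a yolov5 model fed with 0-255 RGB input."""
    return {
        "mean_values": [[0, 0, 0]],
        "std_values": [[255, 255, 255]],
        "target_platform": target,
    }


def convert(onnx_path="yolov5s.onnx", rknn_path="yolov5s.rknn",
            dataset="./dataset.txt", quantize=True):
    """Convert an ONNX model to RKNN; returns 0 on success."""
    # Imported lazily: rknn-toolkit2 is only installed in the x86 VM environment.
    from rknn.api import RKNN

    rknn = RKNN(verbose=True)
    try:
        rknn.config(**conversion_settings())
        ret = rknn.load_onnx(model=onnx_path)
        if ret != 0:
            return ret
        # Quantization needs a text file listing calibration images (assumed path).
        ret = rknn.build(do_quantization=quantize, dataset=dataset)
        if ret != 0:
            return ret
        return rknn.export_rknn(rknn_path)
    finally:
        rknn.release()
```

Calling convert() from the examples/onnx/yolov5 directory in the rk3588 conda environment would then write yolov5s.rknn next to the ONNX file, provided the calibration list exists.
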

2. Development board deployment

Using the converted yolov5s.rknn, we run yolov5 on the board. There is a C version and a Python version. The following operations are all performed on the development board.

2.1. C version
  • Download the official demo on the rk3588 development board
cd ~/Desktop
git clone https://gitcode.net/mirrors/rockchip-linux/rknpu2.git
  • Modify the files. First enter the rknpu2/examples/rknn_yolov5_demo directory, then modify the header file postprocess.h in the include directory, as shown below
    Insert image description here

Second, modify the coco_80_labels_list.txt file in the model directory: replace its contents with your own classes and save it, as shown below
Insert image description here

Finally, place the converted rknn file in the model/RK3588 directory and run the build script. After it completes successfully, the install directory is generated.

bash ./build-linux_RK3588.sh

  • Run the demo. Upload yolov5s.rknn to the model/RK3588 directory, put the images to run inference on in the model directory, and run

cd install/rknn_yolov5_demo_linux
./rknn_yolov5_demo ./model/RK3588/yolov5s.rknn ./model/bus.jpg
2.2. Python version (must be Python 3.9)

This version mainly follows the API described in the "RKNN Toolkit Lite2 User Guide".

  • Update the apt sources
# Source mirrors are commented out by default to speed up apt update; uncomment them if needed
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal-updates main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal-updates main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal-backports main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal-backports main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal-security main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal-security main restricted universe multiverse

Refresh the package index:

sudo apt-get update 
  • Miniconda installation

Online installation:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh
Offline installation is recommended. Go to the official miniconda website, select the py3.8 version, and download it (for example with Thunder), as follows
Insert image description here

Then upload Miniconda3-py38_23.1.0-1-Linux-aarch64.sh to the ~/Downloads directory of the rk3588 board and perform the installation operation.

bash ./Miniconda3-py38_23.1.0-1-Linux-aarch64.sh
  • Create a Python environment, mainly including numpy, opencv, psutil, etc.
conda create -n rk3588 python=3.9
conda activate rk3588
pip install numpy opencv-python -i https://mirror.baidu.com/pypi/simple
  • Download the RKNN Toolkit2 project to the desktop
cd ~/Desktop && git clone https://gitcode.net/mirrors/rockchip-linux/rknn-toolkit2.git
  • Install RKNN Toolkit Lite2 environment
cd rknn-toolkit2/rknn_toolkit_lite2/packages
pip install rknn_toolkit_lite2-1.4.0-cp39-cp39-linux_aarch64.whl -i https://mirror.baidu.com/pypi/simple
  • Add the .so files. The purpose is to make sure the Python script can call the NPU's C runtime library.
cd ~/Downloads && git clone https://gitcode.net/mirrors/rockchip-linux/rknpu2.git
sudo cp rknpu2/runtime/RK3588/Linux/librknn_api/aarch64/librknn* /usr/lib
  • Test the environment. The test cases are in the examples/inference_with_lite directory.
cd ~/Desktop/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite
python test.py

The running results are as follows
Insert image description here
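
If test.py fails while initializing the runtime, one thing worth checking is whether the runtime libraries copied to /usr/lib in the earlier step are actually there. A minimal sketch follows; find_rknn_libs is an illustrative helper, and the librknn* pattern simply mirrors the cp command used above.

```python
import glob
import os


def find_rknn_libs(lib_dir="/usr/lib"):
    """List the librknn* files present in lib_dir, sorted by name."""
    return sorted(os.path.basename(p)
                  for p in glob.glob(os.path.join(lib_dir, "librknn*")))


if __name__ == "__main__":
    libs = find_rknn_libs()
    if libs:
        print("found:", *libs)
    else:
        print("no librknn* files in /usr/lib")
```
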

  • Python script to test yolov5. Create a data directory inside inference_with_lite and put the test image in it; upload yolov5s.rknn to the inference_with_lite directory; create yolov5.py, which runs inference on the test image and draws the detections (the cv2.imwrite call that saves the result to res.jpg in the same directory is commented out in the version below). (Reference link https://github.com/ChuanSe/yolov5-PT-to-RKNN/blob/main/detect.py) The code is as follows
import os
import urllib
import traceback
import time
import sys
import numpy as np
import cv2
#from rknn.api import RKNN
import platform
from rknnlite.api import RKNNLite
import multiprocessing

ONNX_MODEL = 'yolov5s.onnx'
RKNN_MODEL = 'yolov5s.rknn'
IMG_PATH = './data/car.png'
DATASET = './dataset.txt'

QUANTIZE_ON = True

OBJ_THRESH = 0.25
NMS_THRESH = 0.45
IMG_SIZE = 640

CLASSES = ("person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck", "boat", "traffic light",
           "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", "elephant",
           "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite",
           "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork", "knife",
           "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "sofa",
           "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave",
           "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush")


# device tree node for rk356x/rk3588
DEVICE_COMPATIBLE_NODE = '/proc/device-tree/compatible'

def get_host():
    # get platform and device type
    system = platform.system()
    machine = platform.machine()
    os_machine = system + '-' + machine
    if os_machine == 'Linux-aarch64':
        try:
            with open(DEVICE_COMPATIBLE_NODE) as f:
                device_compatible_str = f.read()
                if 'rk3588' in device_compatible_str:
                    host = 'RK3588'
                else:
                    host = 'RK356x'
        except IOError:
            print('Read device node {} failed.'.format(DEVICE_COMPATIBLE_NODE))
            exit(-1)
    else:
        host = os_machine
    return host

INPUT_SIZE = 224
RK3588_RKNN_MODEL = 'resnet18_for_rk3588.rknn'


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def xywh2xyxy(x):
    # Convert [x, y, w, h] to [x1, y1, x2, y2]
    y = np.copy(x)
    y[:, 0] = x[:, 0] - x[:, 2] / 2  # top left x
    y[:, 1] = x[:, 1] - x[:, 3] / 2  # top left y
    y[:, 2] = x[:, 0] + x[:, 2] / 2  # bottom right x
    y[:, 3] = x[:, 1] + x[:, 3] / 2  # bottom right y
    return y


def process(input, mask, anchors):

    anchors = [anchors[i] for i in mask]
    grid_h, grid_w = map(int, input.shape[0:2])

    box_confidence = sigmoid(input[..., 4])
    box_confidence = np.expand_dims(box_confidence, axis=-1)

    box_class_probs = sigmoid(input[..., 5:])

    box_xy = sigmoid(input[..., :2])*2 - 0.5

    col = np.tile(np.arange(0, grid_w), grid_w).reshape(-1, grid_w)
    row = np.tile(np.arange(0, grid_h).reshape(-1, 1), grid_h)
    col = col.reshape(grid_h, grid_w, 1, 1).repeat(3, axis=-2)
    row = row.reshape(grid_h, grid_w, 1, 1).repeat(3, axis=-2)
    grid = np.concatenate((col, row), axis=-1)
    box_xy += grid
    box_xy *= int(IMG_SIZE/grid_h)

    box_wh = pow(sigmoid(input[..., 2:4])*2, 2)
    box_wh = box_wh * anchors

    box = np.concatenate((box_xy, box_wh), axis=-1)

    return box, box_confidence, box_class_probs


def filter_boxes(boxes, box_confidences, box_class_probs):
    """Filter boxes with box threshold. It's a bit different with origin yolov5 post process!

    # Arguments
        boxes: ndarray, boxes of objects.
        box_confidences: ndarray, confidences of objects.
        box_class_probs: ndarray, class_probs of objects.

    # Returns
        boxes: ndarray, filtered boxes.
        classes: ndarray, classes for boxes.
        scores: ndarray, scores for boxes.
    """
    boxes = boxes.reshape(-1, 4)
    box_confidences = box_confidences.reshape(-1)
    box_class_probs = box_class_probs.reshape(-1, box_class_probs.shape[-1])

    _box_pos = np.where(box_confidences >= OBJ_THRESH)
    boxes = boxes[_box_pos]
    box_confidences = box_confidences[_box_pos]
    box_class_probs = box_class_probs[_box_pos]

    class_max_score = np.max(box_class_probs, axis=-1)
    classes = np.argmax(box_class_probs, axis=-1)
    _class_pos = np.where(class_max_score >= OBJ_THRESH)

    boxes = boxes[_class_pos]
    classes = classes[_class_pos]
    scores = (class_max_score* box_confidences)[_class_pos]

    return boxes, classes, scores


def nms_boxes(boxes, scores):
    """Suppress non-maximal boxes.

    # Arguments
        boxes: ndarray, boxes of objects.
        scores: ndarray, scores of objects.

    # Returns
        keep: ndarray, index of effective boxes.
    """
    x = boxes[:, 0]
    y = boxes[:, 1]
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]

    areas = w * h
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)

        xx1 = np.maximum(x[i], x[order[1:]])
        yy1 = np.maximum(y[i], y[order[1:]])
        xx2 = np.minimum(x[i] + w[i], x[order[1:]] + w[order[1:]])
        yy2 = np.minimum(y[i] + h[i], y[order[1:]] + h[order[1:]])

        w1 = np.maximum(0.0, xx2 - xx1 + 0.00001)
        h1 = np.maximum(0.0, yy2 - yy1 + 0.00001)
        inter = w1 * h1

        ovr = inter / (areas[i] + areas[order[1:]] - inter)
        inds = np.where(ovr <= NMS_THRESH)[0]
        order = order[inds + 1]
    keep = np.array(keep)
    return keep


def yolov5_post_process(input_data):
    masks = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
               [59, 119], [116, 90], [156, 198], [373, 326]]

    boxes, classes, scores = [], [], []
    for input, mask in zip(input_data, masks):
        b, c, s = process(input, mask, anchors)
        b, c, s = filter_boxes(b, c, s)
        boxes.append(b)
        classes.append(c)
        scores.append(s)

    boxes = np.concatenate(boxes)
    boxes = xywh2xyxy(boxes)
    classes = np.concatenate(classes)
    scores = np.concatenate(scores)

    nboxes, nclasses, nscores = [], [], []
    for c in set(classes):
        inds = np.where(classes == c)
        b = boxes[inds]
        c = classes[inds]
        s = scores[inds]

        keep = nms_boxes(b, s)

        nboxes.append(b[keep])
        nclasses.append(c[keep])
        nscores.append(s[keep])

    if not nclasses and not nscores:
        return None, None, None

    boxes = np.concatenate(nboxes)
    classes = np.concatenate(nclasses)
    scores = np.concatenate(nscores)

    return boxes, classes, scores


def draw(image, boxes, scores, classes):
    """Draw the boxes on the image.

    # Argument:
        image: original image.
        boxes: ndarray, boxes of objects.
        classes: ndarray, classes of objects.
        scores: ndarray, scores of objects.
        all_classes: all classes name.
    """
    for box, score, cl in zip(boxes, scores, classes):
        top, left, right, bottom = box
        print('class: {}, score: {}'.format(CLASSES[cl], score))
        print('box coordinate left,top,right,down: [{}, {}, {}, {}]'.format(top, left, right, bottom))
        top = int(top)
        left = int(left)
        right = int(right)
        bottom = int(bottom)

        cv2.rectangle(image, (top, left), (right, bottom), (255, 0, 0), 2)
        cv2.putText(image, '{0} {1:.2f}'.format(CLASSES[cl], score),
                    (top, left - 6),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.6, (0, 0, 255), 2)


def letterbox(im, new_shape=(640, 640), color=(0, 0, 0)):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)

def scale_coords(img1_shape, coords, img0_shape, ratio_pad=None):
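    # Convert the predicted coords (relative to img1_shape) back to the original image scale (img0_shape)
    #:param img1_shape: size of the resized image  [H, W]=[384, 512]
    #:param coords: predicted boxes [7,4]  [anchor_nums, x1y1x2y2], relative to the resized image (img1_shape)
    #:param img0_shape: size of the original image  [H, W, C]=[375, 500, 3]
    #:param ratio_pad: scale ratio and padding used during resizing; usually not passed
    #:return: coords: predictions relative to the original image size (img0_shape)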

    # Rescale coords (xyxy) from img1_shape to img0_shape
    if ratio_pad is None:  # calculate from img0_shape
        # gain = old/new = 1.024  max(img1_shape): the longer side of img1; this mirrors the earlier letterbox step
        gain = max(img1_shape) / max(img0_shape)
        # wh padding: whether this has any effect depends entirely on how letterbox was done
        # with letter_pad_img letterboxing, pad=(0.0, 64.0); with leeter_img letterboxing, pad=(0.0, 0.0)
        pad = (img1_shape[1] - img0_shape[1] * gain) / 2, (img1_shape[0] - img0_shape[0] * gain) / 2
    else:
        gain = ratio_pad[0][0]
        pad = ratio_pad[1]

    # Scale the predictions from img1 coordinates back to the original image img0
    coords[:, [0, 2]] -= pad[0]  # x padding
    coords[:, [1, 3]] -= pad[1]  # y padding
    coords[:, :4] /= gain  # rescale
    # Clip the rescaled predictions so they do not fall outside the image
    clip_coords(coords, img0_shape)
    return coords

def clip_coords(boxes, img_shape):
    # Clip bounding xyxy bounding boxes to image shape (height, width)
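    # np.clip(c, a, b): constrain every element of array c to the range [a, b]
    # elements smaller than a become a; elements larger than b become b
    # the predicted xyxy values are clipped because objects at the image edge can yield predictions beyond the image size
    #:param boxes: on entry => predictions rescaled to the original image [7, 4]
    # on exit => the same predictions, clipped so they do not exceed the image size
    #:param img_shape: shape of the original image [H, W, C]=[375, 500, 3]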

    boxes[:, 0] = np.clip(boxes[:, 0], 0, img_shape[1])  # x1
    boxes[:, 1] = np.clip(boxes[:, 1], 0, img_shape[0])  # y1
    boxes[:, 2] = np.clip(boxes[:, 2], 0, img_shape[1])  # x2
    boxes[:, 3] = np.clip(boxes[:, 3], 0, img_shape[0])  # y2

def yolov5Detection(roundNum):
    print('Current process ID: {}'.format(os.getpid()))
    #host_name = get_host()
    rknn_model = 'yolov5s.rknn'

    # Create RKNN object
    #rknn = RKNN(verbose=True)
    #rknn_lite = RKNNLite(verbose=True) # print detailed logs to the terminal
    rknn_lite = RKNNLite()
    
    # load RKNN model
    print('--> Load RKNN model')
    ret = rknn_lite.load_rknn(rknn_model)
    if ret != 0:
        print('Load RKNN model failed')
        exit(ret)
    print('done')

    # Init runtime environment
    print('--> Init runtime environment')
    #ret = rknn.init_runtime()
    ret = rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_AUTO)
    # ret = rknn.init_runtime('rk3566')
    if ret != 0:
        print('Init runtime environment failed!')
        exit(ret)
    print('done')



    starttime = time.time()
    for ii in range(roundNum):
        print("Process {}: running inference round {}".format(os.getpid(), ii+1))
        # Set inputs
        img0 = cv2.imread(IMG_PATH)
        img = img0.copy()
        img, ratio, (dw, dh) = letterbox(img, new_shape=(IMG_SIZE, IMG_SIZE))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
    
        # Inference
        print('--> Running model')
        outputs = rknn_lite.inference(inputs=[img])
        #np.save('./onnx_yolov5_0.npy', outputs[0])
        #np.save('./onnx_yolov5_1.npy', outputs[1])
        #np.save('./onnx_yolov5_2.npy', outputs[2])
        print('done')
        
    
        # post process
        input0_data = outputs[0]
        input1_data = outputs[1]
        input2_data = outputs[2]
    
        input0_data = input0_data.reshape([3, -1]+list(input0_data.shape[-2:]))
        input1_data = input1_data.reshape([3, -1]+list(input1_data.shape[-2:]))
        input2_data = input2_data.reshape([3, -1]+list(input2_data.shape[-2:]))
    
        input_data = list()
        input_data.append(np.transpose(input0_data, (2, 3, 0, 1)))
        input_data.append(np.transpose(input1_data, (2, 3, 0, 1)))
        input_data.append(np.transpose(input2_data, (2, 3, 0, 1)))
    
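        boxes, classes, scores = yolov5_post_process(input_data)  # boxes are in letterboxed-image coordinates here
        img1_shape = img.shape  # image size after letterbox
        img0_shape = img0.shape  # original image size
        if boxes is not None:
            # scale_coords is a module-level function, so it is called without self
            boxes = scale_coords(img1_shape, boxes, img0_shape)  # map the predictions back to the original image size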

    
        #img_1 = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
        img_1 = img0.copy()
        if boxes is not None:
            draw(img_1, boxes, scores, classes) # draw the detection boxes on the original image
            #cv2.imwrite('res.jpg', img_1)
        # show output
        # cv2.imshow("post process result", img_1)
        # cv2.waitKey(0)
        # cv2.destroyAllWindows()
        #time.sleep(0.001)
    
    endtime = time.time()
    print("Process PID: {}, total time {} s, average time per round {} s".format(os.getpid(), endtime-starttime, (endtime-starttime) / float(roundNum)))

    rknn_lite.release()


    

if __name__ == '__main__':
    roundNum = 1000
    total = 9
    processes = []
    for i in range(total):
        myprocess = multiprocessing.Process(target=yolov5Detection,args=(roundNum,))
        processes.append(myprocess)
    for i in range(total):
        processes[i].daemon = True
        processes[i].start()
    
    for _ in range(roundNum):
        print('Main process PID: {}, {} child processes in total'.format(os.getpid(), total))
        time.sleep(1)
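
The coordinate handling in the script is easier to follow with numbers. The sketch below mirrors the default branch of scale_coords (no ratio_pad) for a single box in plain Python; the 375x500 image size is just an example, and letterbox_params and to_original are illustrative names.

```python
def letterbox_params(img0_shape, img1_shape=(640, 640)):
    """Return (gain, (pad_x, pad_y)) for an (H, W) image letterboxed into img1_shape."""
    gain = max(img1_shape[:2]) / max(img0_shape[:2])
    pad_x = (img1_shape[1] - img0_shape[1] * gain) / 2
    pad_y = (img1_shape[0] - img0_shape[0] * gain) / 2
    return gain, (pad_x, pad_y)


def to_original(box, img0_shape, img1_shape=(640, 640)):
    """Map an (x1, y1, x2, y2) box from letterboxed to original pixel coordinates."""
    gain, (pad_x, pad_y) = letterbox_params(img0_shape, img1_shape)
    x1, y1, x2, y2 = box
    h, w = img0_shape[:2]
    # Remove the padding, undo the scaling, then clip to the image bounds.
    xs = [min(max((x - pad_x) / gain, 0), w) for x in (x1, x2)]
    ys = [min(max((y - pad_y) / gain, 0), h) for y in (y1, y2)]
    return xs[0], ys[0], xs[1], ys[1]


if __name__ == "__main__":
    # A 375x500 (H x W) image: gain = 640 / 500 = 1.28, vertical padding = 80.
    print(letterbox_params((375, 500)))
    print(to_original((64, 144, 320, 400), (375, 500)))
```

The box (64, 144, 320, 400) in the 640x640 input lands at roughly (50, 50, 250, 250) in the original image, because the 80 pixels of top padding are removed before dividing by the gain.
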

3. Performance testing

The test below runs 1000 rounds of the yolov5 pipeline (image reading, inference, post-processing and other steps). The inference time reported is the total time for one complete pass of reading, inference and post-processing.
Insert image description here
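
To compare runs, the per-round time printed by each process can be turned into an aggregate throughput. This is a back-of-the-envelope sketch that assumes all processes run concurrently for about the same time; the function name throughput is illustrative and the numbers in the example are placeholders, not measurements from this article.

```python
def throughput(num_processes, rounds, elapsed_seconds):
    """Return (seconds per round per process, total images per second)."""
    per_round = elapsed_seconds / rounds
    return per_round, num_processes / per_round


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    per_round, fps = throughput(num_processes=4, rounds=100, elapsed_seconds=10.0)
    print("{:.3f} s per round, {:.1f} images/s overall".format(per_round, fps))
```
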

Origin blog.csdn.net/qq_30841655/article/details/129836860