[Project Deployment] Teach you step by step how to deploy OCR services on RKNN (Part 1)

Preface

I spent some time a while ago exploring how to deploy OCR models on RKNN, and it took me quite a while to finally get the model running on the rv1126 NPU. The process was not particularly difficult, but I had answered a few OCR-deployment questions in the RKNN group, and people who saw the chat history kept adding me on QQ to ask about them. So, taking advantage of a recent weekend, I decided to write up the whole deployment process, both to share it and to have a record I can review later.

1. Preparation work

Hardware requirements:

  • A PC host running Ubuntu
  • An rv1126 board
  • A double-ended USB cable (for adb debugging between the PC and the board)

Software requirements:

  • The rknn-toolkit environment set up on the PC
  • The system image flashed onto the rv1126 board
  • The rv1126 SDK package

Official document reference: rv1126 Development Guide

First, you need to set up the rknn-toolkit environment on the Ubuntu system. The setup is fairly simple. The rknn-toolkit version I use here is 1.6.0, which is provided in the /external/rknn-toolkit folder of the rv1126 SDK package. You can also download it from the official repository, but the downloaded whl package will be the latest version. Pay attention to the package name: it only supports Linux with Python 3.5 or 3.6, so it cannot be used under Windows.
Create a conda virtual environment and execute the following installation commands:

# Create a virtual environment named rknn with Python 3.6
conda create -n rknn python=3.6

# Install whatever other packages you need (omitted here)
pip install  ...

# Install RKNN-Toolkit
pip install rknn_toolkit-1.6.0-cp36-cp36m-linux_x86_64.whl

# Check that the installation succeeded by importing the rknn library
(rknn) rk@rk:~/rknn-toolkit-v1.6.0/package$  python
>>> from rknn.api import RKNN
>>>

For the rv1126 SDK package and flashing the system onto the board, please refer to my previous article: SDK environment preparation and system burning. For other boards you will have to work this out yourself.

2. Model conversion and quantization

Here I have prepared three ONNX models, all based on PytorchOCR. The first is a lightweight DBNet text-detection model, converted from PaddleOCR weights to PytorchOCR and then to ONNX. The second is a CRNN recognition model, also converted from PaddleOCR via PytorchOCR. The last is a recognition model I trained myself: as mentioned before, RKNN currently does not support converting the LSTM operator. I tried converting the model to RKNN through a TorchScript (torch.jit) pt file, but the output after conversion was wrong and the model could not be quantized, so I designed a recognition network specifically for this deployment platform. For the design process, see my previous article: Pitfalls and optimizations in deploying a CRNN model with RKNN. The version released here is fairly small; its general-purpose recognition is not as good as PaddleOCR's CRNN, mainly because my training data is not as good. In some specific scenarios, however, it works better than PaddleOCR, because the training data was built for those scenes and many common fonts were added, so general recognition is still not too bad. If it meets the needs of your own scene, feel free to use it. The important thing is that it is very fast after quantization: with pre-compilation enabled, the quantized model is only 1.5 MB and inference takes only a few milliseconds. What more could you ask for?
For the weight file, see the Baidu Cloud link at the end of the article.

Related reference:
https://github.com/PaddlePaddle/Paddle2ONNX

https://github.com/PaddlePaddle/PaddleOCR

https://github.com/WenmuZhou/PytorchOCR

RKNN quantization happens as part of model conversion, so you need to prepare one batch of images for quantizing the detection model and another batch for quantizing the recognition model. I prepared a few hundred images for the detection model and around two thousand for the recognition model, then wrote the image paths into a separate txt file for each, which is loaded during quantization. A minimal sketch of generating these list files follows.
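The folder names below are my own assumptions; the only requirement is one image path per line, and the conversion script later in this article expects a file named dataset_448.txt for the recognition model:

import glob

# Collect the calibration images and write one path per line, the format
# rknn.build() expects for its `dataset` argument.
det_images = glob.glob('./quant_det/*.jpg')   # a few hundred images for the detection model
rec_images = glob.glob('./quant_rec/*.jpg')   # around two thousand crops for the recognition model

with open('dataset_640.txt', 'w') as f:
    f.write('\n'.join(det_images))

with open('dataset_448.txt', 'w') as f:
    f.write('\n'.join(rec_images))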
RKNN provides two ways to convert a model: one is conversion through Python code, the other is through the visual conversion tool. To use the visual tool, after setting up the rknn environment, run the following in a terminal: python3 -m rknn.bin.visualization, and the conversion interface will open.
Choose the format of your original model in the interface; here the original model is ONNX, so select onnx. The meaning of each option is easy enough to work out if you translate it and compare (the interface of rknn-toolkit 1.7.1 seems to already be in Chinese). The one to pay attention to is the pre-compilation option (Whether To Enable Pre-Compile): pre-compiling the RKNN model reduces model initialization time, but a pre-compiled model cannot be run or profiled on the simulator.
The second way is conversion through Python code. The code is as follows; modify it as needed:

import os
from rknn.api import RKNN
import numpy as np
import onnxruntime as ort

onnx_model = 'model/repvgg_s.onnx'    # path to the ONNX model
save_rknn_dir = 'model/repvgg_s.rknn' # path to save the converted RKNN model

def norm(img):
    mean = 0.5
    std = 0.5
    img_data = (img.astype(np.float32)/255 - mean) / std
    return img_data

if __name__ == '__main__':

    # Create RKNN object
    rknn = RKNN(verbose=True)

    image = np.random.randn(1,3,32,448).astype(np.float32)             # create a random array and run it through both ONNX and RKNN to compare the outputs after conversion; the detection model input is 1,3,640,640, the recognition model input is 1,3,32,448
    onnx_net = ort.InferenceSession(onnx_model)                         # ONNX inference session

    onnx_infer = onnx_net.run(None, {'input': norm(image)})             # for a model exported by paddle2onnx, the default input name is "x"

    # pre-process config
    print('--> Config model')
    rknn.config(mean_values=[[127.5, 127.5, 127.5]], std_values=[[127.5, 127.5, 127.5]], reorder_channel='2 1 0', target_platform=['rv1126'], batch_size=4,quantized_dtype='asymmetric_quantized-u8')  # input needs to be RGB; the training-time mean/normalization values must be re-expressed on 0-255 pixels
    # rknn.config(mean_values=[[0.0, 0.0, 0.0]], std_values=[[255, 255, 255]], reorder_channel='2 1 0', target_platform=['rv1126'], batch_size=1)  # input needs to be RGB
    print('done')

    # model_name = onnx_model[onnx_model.rfind('/') + 1:]
    # Load ONNX model
    print('--> Loading model %s' % onnx_model)
    ret = rknn.load_onnx(model=onnx_model)
    if ret != 0:
        print('Load %s failed!' % onnx_model)
        exit(ret)
    print('done')
    # Build model
    print('--> Building model')
    # rknn.build(do_quantization=False)
    ret = rknn.build(do_quantization=True, dataset='dataset_448.txt', pre_compile=False)
    # do_quantization: whether to quantize the model; dataset: the calibration image list; pre_compile: pre-compilation switch. A pre-compiled RKNN model initializes faster but cannot be run or profiled on the simulator
    if ret != 0:
        print('Build net failed!')
        exit(ret)
    print('done')

    # Export RKNN model
    print('--> Export RKNN model')
    ret = rknn.export_rknn(save_rknn_dir)
    if ret != 0:
        print('Export rknn failed!')
        exit(ret)

    ret = rknn.init_runtime(target='rv1126',device_id="a0c4f1cae341b3df")            # the two arguments are the board platform and the device_id; after connecting the USB cable, the device_id can be found with `adb devices`
    if ret != 0:
        print('init runtime failed.')
        exit(ret)
    print('done')

    # Inference
    print('--> Running model')
    outputs = rknn.inference(inputs=[image])

    # perf
    print('--> Begin evaluate model performance')
    perf_results = rknn.eval_perf(inputs=[image])              # evaluate model performance
    print('done')
    print()

    print("->>模型前向对比!")
    print("--------------rknn outputs--------------------")
    print(outputs[0])
    print()

    print("--------------onnx outputs--------------------")
    print(onnx_infer[0])
    print()

    std = np.std(outputs[0]-onnx_infer[0])
    print(std)                                # if this value is large, the conversion result is not ideal

    rknn.release()
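
A side note on the mean_values/std_values passed to rknn.config above: the ONNX model was trained with inputs normalized as (x/255 - 0.5)/0.5, which on raw 0-255 pixels is exactly (x - 127.5)/127.5, hence 127.5 for both mean and std in the config. A quick illustrative check of that equivalence (not part of the conversion script):

import numpy as np

x = np.random.randint(0, 256, size=(32, 448, 3)).astype(np.float32)

onnx_style = (x / 255.0 - 0.5) / 0.5        # normalization used on the ONNX/training side
rknn_style = (x - 127.5) / 127.5            # what rknn.config applies with mean=std=127.5

print(np.allclose(onnx_style, rknn_style))  # True: the two normalizations are identical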

The det_new.onnx and repvgg_s.onnx I provide should convert and quantize without problems. rec_mbv3.onnx cannot be converted; if you are interested, you can try it yourself. Note that when I converted the detection model with pre-compilation enabled, the results were very poor, so the detection results below come from a model converted without pre-compilation, at the cost of a longer model loading time.

Conversion takes noticeably longer when quantization is enabled. After quantization succeeds, the script runs a test inference and finally prints the standard deviation between the ONNX output and the RKNN output.

The quantized detection model uses a fixed 640 x 640 input. The RKNN test code for the detection model:

import numpy as np
import cv2
import torch
import pyclipper
from shapely.geometry import Polygon
from rknn.api import RKNN
# from label_convert import CTCLabelConverter

class DBPostProcess():
    def __init__(self, thresh=0.3, box_thresh=0.7, max_candidates=1000, unclip_ratio=2):
        self.min_size = 3
        self.thresh = thresh
        self.box_thresh = box_thresh
        self.max_candidates = max_candidates
        self.unclip_ratio = unclip_ratio

    def __call__(self, pred, h_w_list, is_output_polygon=False):
        '''
        h_w_list: list of [h, w] pairs, one per image in the batch
        pred:
            binary: text region segmentation map, with shape (N, 1, H, W)
        '''
        pred = pred[:, 0, :, :]
        segmentation = self.binarize(pred)
        boxes_batch = []
        scores_batch = []
        for batch_index in range(pred.shape[0]):
            height, width = h_w_list[batch_index]
            boxes, scores = self.post_p(pred[batch_index], segmentation[batch_index], width, height,
                                        is_output_polygon=is_output_polygon)
            boxes_batch.append(boxes)
            scores_batch.append(scores)
        return boxes_batch, scores_batch

    def binarize(self, pred):
        return pred > self.thresh

    def post_p(self, pred, bitmap, dest_width, dest_height, is_output_polygon=False):
        '''
        _bitmap: single map with shape (H, W),
            whose values are binarized as {0, 1}
        '''
        height, width = pred.shape
        boxes = []
        new_scores = []
        # bitmap = bitmap.cpu().numpy()
        if cv2.__version__.startswith('3'):
            _, contours, _ = cv2.findContours((bitmap * 255).astype(np.uint8), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
        if cv2.__version__.startswith('4'):
            contours, _ = cv2.findContours((bitmap * 255).astype(np.uint8), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
        for contour in contours[:self.max_candidates]:
            epsilon = 0.005 * cv2.arcLength(contour, True)
            approx = cv2.approxPolyDP(contour, epsilon, True)
            points = approx.reshape((-1, 2))
            if points.shape[0] < 4:
                continue
            score = self.box_score_fast(pred, contour.squeeze(1))
            if self.box_thresh > score:
                continue
            if points.shape[0] > 2:
                box = self.unclip(points, unclip_ratio=self.unclip_ratio)
                if len(box) > 1:
                    continue
            else:
                continue
            four_point_box, sside = self.get_mini_boxes(box.reshape((-1, 1, 2)))
            if sside < self.min_size + 2:
                continue
            if not isinstance(dest_width, int):
                dest_width = dest_width.item()
                dest_height = dest_height.item()
            if not is_output_polygon:
                box = np.array(four_point_box)
            else:
                box = box.reshape(-1, 2)
            box[:, 0] = np.clip(np.round(box[:, 0] / width * dest_width), 0, dest_width)
            box[:, 1] = np.clip(np.round(box[:, 1] / height * dest_height), 0, dest_height)
            boxes.append(box)
            new_scores.append(score)
        return boxes, new_scores

    def unclip(self, box, unclip_ratio=1.5):
        poly = Polygon(box)
        distance = poly.area * unclip_ratio / poly.length
        offset = pyclipper.PyclipperOffset()
        offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
        expanded = np.array(offset.Execute(distance))
        return expanded

    def get_mini_boxes(self, contour):
        bounding_box = cv2.minAreaRect(contour)
        points = sorted(list(cv2.boxPoints(bounding_box)), key=lambda x: x[0])

        index_1, index_2, index_3, index_4 = 0, 1, 2, 3
        if points[1][1] > points[0][1]:
            index_1 = 0
            index_4 = 1
        else:
            index_1 = 1
            index_4 = 0
        if points[3][1] > points[2][1]:
            index_2 = 2
            index_3 = 3
        else:
            index_2 = 3
            index_3 = 2

        box = [points[index_1], points[index_2], points[index_3], points[index_4]]
        return box, min(bounding_box[1])

    def box_score_fast(self, bitmap, _box):
        # bitmap = bitmap.detach().cpu().numpy()
        h, w = bitmap.shape[:2]
        box = _box.copy()
        xmin = np.clip(np.floor(box[:, 0].min()).astype(np.int), 0, w - 1)
        xmax = np.clip(np.ceil(box[:, 0].max()).astype(np.int), 0, w - 1)
        ymin = np.clip(np.floor(box[:, 1].min()).astype(np.int), 0, h - 1)
        ymax = np.clip(np.ceil(box[:, 1].max()).astype(np.int), 0, h - 1)

        mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8)
        box[:, 0] = box[:, 0] - xmin
        box[:, 1] = box[:, 1] - ymin
        cv2.fillPoly(mask, box.reshape(1, -1, 2).astype(np.int32), 1)
        return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0]


def narrow_224_32(image, expected_size=(224,32)):
    ih, iw = image.shape[0:2]
    ew, eh = expected_size
    # scale = eh / ih
    scale = min((eh/ih),(ew/iw))
    # scale = eh / max(iw,ih)
    nh = int(ih * scale)
    nw = int(iw * scale)
    image = cv2.resize(image, (nw, nh), interpolation=cv2.INTER_CUBIC)

    top = 0
    bottom = eh - nh
    left = 0
    right = ew - nw

    new_img = cv2.copyMakeBorder(image, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(114, 114, 114))
    return image,new_img

def draw_bbox(img_path, result, color=(0, 0, 255), thickness=2):
    if isinstance(img_path, str):
        img_path = cv2.imread(img_path)
        # img_path = cv2.cvtColor(img_path, cv2.COLOR_BGR2RGB)
    img_path = img_path.copy()
    for point in result:
        point = point.astype(int)
        cv2.polylines(img_path, [point], True, color, thickness)
    return img_path

if __name__ == '__main__':

    post_process = DBPostProcess()
    is_output_polygon = False
    # Create RKNN object
    rknn = RKNN()

    ret = rknn.load_rknn('./model/det_new.rknn')

    # Set inputs
    img = cv2.imread('./idcard/2.jpg')
    origin_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img0 ,image = narrow_224_32(img,expected_size=(640,640))

    # init runtime environment
    print('--> Init runtime environment')
    ret = rknn.init_runtime(target='rv1126',device_id="a0c4f1cae341b3df")
    if ret != 0:
        print('Init runtime environment failed')
        exit(ret)
    print('done')

    # Inference
    print('--> Running model')
    outputs = rknn.inference(inputs=[image])

    # perf
    print('--> Begin evaluate model performance')
    perf_results = rknn.eval_perf(inputs=[image])
    print('done')

    feat_2 = torch.from_numpy(outputs[0])
    print(feat_2.size())
    box_list, score_list = post_process(outputs[0], [image.shape[:2]], is_output_polygon=is_output_polygon)
    box_list, score_list = box_list[0], score_list[0]
    if len(box_list) > 0:
        idx = [x.sum() > 0 for x in box_list]
        box_list = [box_list[i] for i, v in enumerate(idx) if v]
        score_list = [score_list[i] for i, v in enumerate(idx) if v]
    else:
        box_list, score_list = [], []

    img = draw_bbox(image, box_list)
    img = img[0:img0.shape[0],0:img0.shape[1]]
    cv2.imshow("img",img)
    cv2.waitKey()
    rknn.release()
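
For completeness, here is a minimal sketch of how the detection output could feed the recognition model: each 4-point box returned by DBPostProcess (get_mini_boxes orders the points top-left, top-right, bottom-right, bottom-left) is warped into an upright crop, which can then be padded with narrow_224_32 and run through the recognition RKNN model. The helper name crop_text_line is my own; the full pipeline is the subject of Part 2:

import cv2
import numpy as np

def crop_text_line(image, box):
    # box: 4 points (x, y) ordered top-left, top-right, bottom-right, bottom-left
    box = np.array(box, dtype=np.float32)
    w = int(max(np.linalg.norm(box[0] - box[1]), np.linalg.norm(box[3] - box[2])))
    h = int(max(np.linalg.norm(box[0] - box[3]), np.linalg.norm(box[1] - box[2])))
    dst = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=np.float32)
    # perspective-warp the quadrilateral to an axis-aligned w x h text-line image
    M = cv2.getPerspectiveTransform(box, dst)
    return cv2.warpPerspective(image, M, (w, h))

# usage after the detection test above:
# for box in box_list:
#     line_img = crop_text_line(image, box)
#     line_img = cv2.cvtColor(line_img, cv2.COLOR_BGR2GRAY)
#     # pad to 32 x 448 with narrow_224_32 and feed it to the recognition model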

The quantized recognition model uses a fixed 32 x 448 input. The RKNN test code for the recognition model:

import numpy as np
import cv2
from rknn.api import RKNN
import torch
# from label_convert import CTCLabelConverter

class CTCLabelConverter(object):
    """ Convert between text-label and text-index """

    def __init__(self, character):
        # character (str): set of the possible characters.
        dict_character = []
        with open(character, "rb") as fin:
            lines = fin.readlines()
            for line in lines:
                line = line.decode('utf-8').strip("\n").strip("\r\n")
                dict_character += list(line)
        # dict_character = list(character)

        self.dict = {}
        for i, char in enumerate(dict_character):
            # NOTE: 0 is reserved for 'blank' token required by CTCLoss
            self.dict[char] = i + 1
        #TODO replace ‘ ’ with special symbol
        self.character = ['[blank]'] + dict_character+[' ']  # dummy '[blank]' token for CTCLoss (index 0)

    def encode(self, text, batch_max_length=None):
        """convert text-label into text-index.
        input:
            text: text labels of each image. [batch_size]
        output:
            text: concatenated text index for CTCLoss.
                    [sum(text_lengths)] = [text_index_0 + text_index_1 + ... + text_index_(n - 1)]
            length: length of each text. [batch_size]
        """
        length = [len(s) for s in text]
        # text = ''.join(text)
        # text = [self.dict[char] for char in text]
        d = []
        batch_max_length = max(length)
        for s in text:
            t = [self.dict[char] for char in s]
            t.extend([0] * (batch_max_length - len(s)))
            d.append(t)
        return (torch.tensor(d, dtype=torch.long), torch.tensor(length, dtype=torch.long))

    def decode(self, preds, raw=False):
        """ convert text-index into text-label. """
        preds_idx = preds.argmax(axis=2)
        preds_prob = preds.max(axis=2)
        result_list = []
        for word, prob in zip(preds_idx, preds_prob):
            if raw:
                result_list.append((''.join([self.character[int(i)] for i in word]), prob))
            else:
                result = []
                conf = []
                for i, index in enumerate(word):
                    if word[i] != 0 and (not (i > 0 and word[i - 1] == word[i])):
                        result.append(self.character[int(index)])
                        conf.append(prob[i])
                result_list.append((''.join(result), conf))
        return result_list


def narrow_224_32(image, expected_size=(224,32)):
    ih, iw = image.shape[0:2]
    ew, eh = expected_size
    scale = eh / ih
    # scale = eh / max(iw,ih)
    nh = int(ih * scale)
    nw = int(iw * scale)
    image = cv2.resize(image, (nw, nh), interpolation=cv2.INTER_CUBIC)
    top = 0
    bottom = eh - nh - top
    left = 0
    right = ew - nw - left

    new_img = cv2.copyMakeBorder(image, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(114, 114, 114))
    return new_img


if __name__ == '__main__':

    dict_path = r"./dict/dict_text.txt"
    converter = CTCLabelConverter(dict_path)
    # Create RKNN object
    rknn = RKNN()

    ret = rknn.load_rknn('./model/repvgg_s.rknn')

    # Set inputs
    img = cv2.imread('crnn_img/33925.jpg')
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    origin_img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
    image = narrow_224_32(img,expected_size=(448,32))


    # init runtime environment
    print('--> Init runtime environment')
    ret = rknn.init_runtime(target='rv1126',device_id="a0c4f1cae341b3df")
    if ret != 0:
        print('Init runtime environment failed')
        exit(ret)
    print('done')

    # Inference
    print('--> Running model')
    outputs = rknn.inference(inputs=[image])

    # perf
    print('--> Begin evaluate model performance')
    perf_results = rknn.eval_perf(inputs=[image])
    print('done')

    feat_2 = torch.from_numpy(outputs[0])
    # print(feat_2.size())
    #
    txt = converter.decode(feat_2.detach().cpu().numpy())
    print(txt)
    cv2.imshow("img",img)
    cv2.waitKey()
    rknn.release()

At this point, the model conversion is done. This article is already long, so the rest continues in the next one: [Project Deployment] Teach you step by step how to deploy OCR services on RKNN (Part 2). That article mainly covers the code implementation; it is fairly long and not finished yet.

Baidu Cloud link: https://pan.baidu.com/s/1jSirZT2LBOWQxohCEORp5g Password: vrjk. The files provided are described below.
The first of the rknn models in the share was converted with pre-compilation enabled, so the file is smaller and the model loads faster at initialization. The pre-compiled conversion of the detection model had accuracy that was too low, so it is not provided; you can try it yourself. rec_mbv3.onnx cannot be converted, so there is no corresponding rknn file. ppocr_keys_v1.txt is the keys file of the ppocr recognition model rec_mbv3.onnx, and dict_text.txt is the keys file for the repvgg_s.onnx model I trained.

Source: blog.csdn.net/qq_39056987/article/details/123574943