Environment installation

First of all, install the Rockchip Rv1126 SDK on the virtual machine. It is important to have rknn_toolchain
generally in the following path:

sdk/external/rknn-toolkit

Follow the steps in the doc to install.
The project code is at: https://github.com/liuyuan000/rv1126_mnist/tree/main/RKNN_Cpp/rknn_mnist_demo

model training

First, take the classification of MNIST handwritten digits as an example to illustrate the full-process deployment of the classification model. First, use torch to train the model, and run the following code:

import torch
import numpy as np
from matplotlib import pyplot as plt
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision import datasets
import torch.nn.functional as F

"""
卷积运算 使用mnist数据集，和10-4，11类似的，只是这里：1.输出训练轮的acc 2.模型上使用torch.nn.Sequential
"""
# Super parameter ------------------------------------------------------------------------------------
batch_size = 64
learning_rate = 0.01
momentum = 0.5
EPOCH = 10

# Prepare dataset ------------------------------------------------------------------------------------
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
# softmax归一化指数函数(https://blog.csdn.net/lz_peter/article/details/84574716),其中0.1307是mean均值和0.3081是std标准差

train_dataset = datasets.MNIST(root='./data/mnist', train=True, transform=transform,download=True)  # 本地没有就加上download=True
test_dataset = datasets.MNIST(root='./data/mnist', train=False, transform=transform)  # train=True训练集，=False测试集
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

fig = plt.figure()
for i in range(12):
    plt.subplot(3, 4, i+1)
    plt.tight_layout()
    plt.imshow(train_dataset.train_data[i], cmap='gray', interpolation='none')
    plt.title("Labels: {}".format(train_dataset.train_labels[i]))
    plt.xticks([])
    plt.yticks([])
plt.show()


# 训练集乱序，测试集有序
# Design model using class ------------------------------------------------------------------------------
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 10, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.conv2 = torch.nn.Sequential(
            torch.nn.Conv2d(10, 20, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(320, 50),
            torch.nn.Linear(50, 10),
        )

    def forward(self, x):
        batch_size = x.size(0)
        x = self.conv1(x)  # 一层卷积层,一层池化层,一层激活层(图是先卷积后激活再池化，差别不大)
        x = self.conv2(x)  # 再来一次
        x = x.view(batch_size, -1)  # flatten 变成全连接网络需要的输入 (batch, 20,4,4) ==> (batch,320), -1 此处自动算出的是320
        x = self.fc(x)
        x = torch.sigmoid(x)
        return x  # 最后输出的是维度为10的，也就是（对应数学符号的0~9）


model = Net()


# Construct loss and optimizer ------------------------------------------------------------------------------
criterion = torch.nn.CrossEntropyLoss()  # 交叉熵损失
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)  # lr学习率，momentum冲量


# Train and Test CLASS --------------------------------------------------------------------------------------
# 把单独的一轮一环封装在函数类里
def train(epoch):
    running_loss = 0.0  # 这整个epoch的loss清零
    running_total = 0
    running_correct = 0
    for batch_idx, data in enumerate(train_loader, 0):
        inputs, target = data
        optimizer.zero_grad()

        # forward + backward + update
        outputs = model(inputs)
        loss = criterion(outputs, target)

        loss.backward()
        optimizer.step()

        # 把运行中的loss累加起来，为了下面300次一除
        running_loss += loss.item()
        # 把运行中的准确率acc算出来
        _, predicted = torch.max(outputs.data, dim=1)
        running_total += inputs.shape[0]
        running_correct += (predicted == target).sum().item()

        if batch_idx % 300 == 299:  # 不想要每一次都出loss，浪费时间，选择每300次出一个平均损失,和准确率
            print('[%d, %5d]: loss: %.3f , acc: %.2f %%'
                  % (epoch + 1, batch_idx + 1, running_loss / 300, 100 * running_correct / running_total))
            running_loss = 0.0  # 这小批300的loss清零
            running_total = 0
            running_correct = 0  # 这小批300的acc清零

        torch.save(model.state_dict(), './model_Mnist.pth')
        # torch.save(optimizer.state_dict(), './optimizer_Mnist.pth')


def test():
    correct = 0
    total = 0
    with torch.no_grad():  # 测试集不用算梯度
        for data in test_loader:
            images, labels = data
            outputs = model(images)
            _, predicted = torch.max(outputs.data, dim=1)  # dim = 1 列是第0个维度，行是第1个维度，沿着行(第1个维度)去找1.最大值和2.最大值的下标
            total += labels.size(0)  # 张量之间的比较运算
            correct += (predicted == labels).sum().item()
    acc = correct / total
    print('[%d / %d]: Accuracy on test set: %.1f %% ' % (epoch+1, EPOCH, 100 * acc))  # 求测试的准确率，正确数/总数
    return acc


# Start train and Test --------------------------------------------------------------------------------------
if __name__ == '__main__':
    acc_list_test = []
    for epoch in range(EPOCH):
        train(epoch)
        # if epoch % 10 == 9:  #每训练10轮 测试1次
        acc_test = test()
        acc_list_test.append(acc_test)

    plt.plot(acc_list_test)
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy On TestSet')
    plt.show()

The program will automatically download the data set and perform training.
After running, we get: model_Mnist.pth model

ONNX model conversion

Still use torch to load the trained model and convert it to the onnx model format. The conversion code is as follows:

import torch
import numpy as np

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 10, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.conv2 = torch.nn.Sequential(
            torch.nn.Conv2d(10, 20, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(320, 50),
            torch.nn.Linear(50, 10),
        )

    def forward(self, x):
        batch_size = x.size(0)
        x = self.conv1(x)  # 一层卷积层,一层池化层,一层激活层(图是先卷积后激活再池化，差别不大)
        x = self.conv2(x)  # 再来一次
        x = x.view(batch_size, -1)  # flatten 变成全连接网络需要的输入 (batch, 20,4,4) ==> (batch,320), -1 此处自动算出的是320
        x = self.fc(x)
        x = torch.sigmoid(x)
        return x  # 最后输出的是维度为10的，也就是（对应数学符号的0~9）


model = Net()

model.load_state_dict(torch.load('./model_Mnist.pth'))
model.eval()

x=torch.randn((1,1,28,28))

torch.onnx.export(model, # 搭建的网络
    x, # 输入张量
    'mnist.onnx', # 输出模型名称
    input_names=["input"], # 输入命名
    output_names=["output"], # 输出命名
    opset_version=10,
    dynamic_axes={
    
    'input':{
    
    0:'batch'}, 'output':{
    
    0:'batch'}}  # 动态轴
)

What needs to be paid attention to is the opset_version, the operator version should not be selected too high, otherwise problems may occur during subsequent model conversion.
In this way, we can get the mnist.onnx file, which can be opened with Netron:
insert image description here
it can be seen that the structure of the entire network is still very simple.

RKNN model conversion

After getting the onnx model, we can userknn. apineutralRKNNConvert it to RKNN format for easy reasoning on the board.
At this time, a virtual machine should be used to connect to the development board. After conversion, the development board will be automatically used for model inference and other operations.
Before the conversion, you need to prepare several dataset pictures, and put the picture path in the dataset.txt file, so that you can fine-tune the network parameters during quantization.
The following programs can be used to extract pictures from the MNIST dataset:

import os
import shutil
from tqdm import tqdm
from torchvision import datasets
from concurrent.futures import ThreadPoolExecutor


def mnist_export(root: str = './data/minst'):
    """Export MNIST data to a local folder using multi-threading.

    Args:
        root (str, optional): Path to local folder. Defaults to './data/minst'.
    """
    for i in range(10):
        os.makedirs(os.path.join(root, f'./{i}'), exist_ok=True)
    split_list = ['train', 'test']
    data = {
    
    
        split: datasets.MNIST(
            root='./tmp',
            train=split == 'train',
            download=True
        ) for split in split_list
    }
    total = sum([len(data[split]) for split in split_list])
    with tqdm(total=total) as pbar:
        with ThreadPoolExecutor() as tp:
            for split in split_list:
                for index, (image, label) in enumerate(data[split]):
                    tmp = os.path.join(root, f'{label}/{split}_{index}.png')
                    tp.submit(image.save, tmp).add_done_callback(
                        lambda func: pbar.update()
                    )
    shutil.rmtree('./tmp')


if __name__ == '__main__':
    mnist_export('./data/minst')

dataset.txt is enough to store the image path, the example is as follows:

/home/alientek/atk/mnist/data/minst/0/test_3.png
/home/alientek/atk/mnist/data/minst/0/test_10.png
/home/alientek/atk/mnist/data/minst/0/test_13.png
/home/alientek/atk/mnist/data/minst/0/test_25.png
/home/alientek/atk/mnist/data/minst/0/test_28.png
/home/alientek/atk/mnist/data/minst/0/test_55.png
/home/alientek/atk/mnist/data/minst/0/test_69.png
/home/alientek/atk/mnist/data/minst/0/test_7.png
/home/alientek/atk/mnist/data/minst/0/test_101.png
/home/alientek/atk/mnist/data/minst/0/test_126.png

At this time, use python to perform quantization operations, and specify the platform as RV1126:

import os
import urllib
import traceback
import time
import sys
import numpy as np
import cv2
from rknn.api import RKNN

ONNX_MODEL = 'mnist.onnx'
RKNN_MODEL = 'mnist.rknn'


def show_outputs(outputs):
    output = outputs[0][0]
    output_sorted = sorted(output, reverse=True)
    top5_str = 'resnet50v2\n-----TOP 5-----\n'
    for i in range(5):
        value = output_sorted[i]
        index = np.where(output == value)
        for j in range(len(index)):
            if (i + j) >= 5:
                break
            if value > 0:
                topi = '{}: {}\n'.format(index[j], value)
            else:
                topi = '-1: 0.0\n'
            top5_str += topi
    print(top5_str)


def readable_speed(speed):
    speed_bytes = float(speed)
    speed_kbytes = speed_bytes / 1024
    if speed_kbytes > 1024:
        speed_mbytes = speed_kbytes / 1024
        if speed_mbytes > 1024:
            speed_gbytes = speed_mbytes / 1024
            return "{:.2f} GB/s".format(speed_gbytes)
        else:
            return "{:.2f} MB/s".format(speed_mbytes)
    else:
        return "{:.2f} KB/s".format(speed_kbytes)


def show_progress(blocknum, blocksize, totalsize):
    speed = (blocknum * blocksize) / (time.time() - start_time)
    speed_str = " Speed: {}".format(readable_speed(speed))
    recv_size = blocknum * blocksize

    f = sys.stdout
    progress = (recv_size / totalsize)
    progress_str = "{:.2f}%".format(progress * 100)
    n = round(progress * 50)
    s = ('#' * n).ljust(50, '-')
    f.write(progress_str.ljust(8, ' ') + '[' + s + ']' + speed_str)
    f.flush()
    f.write('\r\n')


if __name__ == '__main__':

    # Create RKNN object
    rknn = RKNN()

    # If resnet50v2 does not exist, download it.
    # Download address:
    # https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet50v2/resnet50v2.onnx

    
    # pre-process config
    print('--> config model')
    rknn.config(mean_values=[[0]], std_values=[[1]], reorder_channel='0 1 2',target_platform='rv1126')# 灰度图，均值方差单通道
    print('done')

    # Load tensorflow model
    print('--> Loading model')
    ret = rknn.load_onnx(model=ONNX_MODEL)
    if ret != 0:
        print('Load mnist failed!')
        exit(ret)
    print('done')

    # Build model
    print('--> Building model')
    ret = rknn.build(do_quantization=True, dataset='./dataset.txt')
    if ret != 0:
        print('Build resnet50 failed!')
        exit(ret)
    print('done')

    # Export rknn model
    print('--> Export RKNN model')
    ret = rknn.export_rknn(RKNN_MODEL)
    if ret != 0:
        print('Export resnet50v2.rknn failed!')
        exit(ret)
    print('done')

    # Set inputs
    #img = cv2.imread('/home/alientek/atk/mnist/data/minst/0/test_3.png',0)
    img = cv2.imread('/home/alientek/atk/mnist/data/minst/1/test_2.png',0)# 推理图片路径，灰度图
    #img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    
    # init runtime environment
    print('--> Init runtime environment')
    ret = rknn.init_runtime(target='rv1126')
    #ret = rknn.init_runtime(target='rv1126',perf_debug=True,eval_mem=True)
    #ret = rknn.eval_perf(inputs=[img], is_print=True)
    #memory_detail = rknn.eval_memory()
    if ret != 0:
        print('Init runtime environment failed')
        exit(ret)
    print('done')

    # Inference
    print('--> Running model')
    outputs = rknn.inference(inputs=[img])
    show_outputs(outputs)
    print('done')

    rknn.release()

Run the above program to get:

--> Export RKNN model
done
--> Init runtime environment
I NPUTransfer: Starting NPU Transfer Client, Transfer version 2.1.0 (b5861e7@2020-11-23T11:50:36)
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI:   API: 1.7.0 (f75fb8e build: 2021-07-20 16:23:11)
D RKNNAPI:   DRV: 1.7.0 (7880361 build: 2021-08-16 14:05:08)
D RKNNAPI: ==============================================
done
--> Running model
resnet50v2
-----TOP 5-----
[1]: 1.0
-1: 0.0
-1: 0.0
-1: 0.0
-1: 0.0

done

The result of the final output is 1, and the network has 100% confidence.

Executable on-board inference

In the end, for the sake of efficiency and other considerations, the executable file written by cpp is generally run on the development board.
Here I upload the whole project to the github
website as follows: https://github.com/liuyuan000/rv1126_mnist/
After downloading, open it to: RKNN_Cpp/rknn_mnist_demo path
Modify the cross-compilation toolchain and change it to your path:

RV1109_TOOL_CHAIN=/opt/atk-dlrv1126-toolchain/usr
GCC_COMPILER=${RV1109_TOOL_CHAIN}/bin/arm-linux-gnueabihf

Then execute the following command directly

./build.sh

After execution, it is as follows:
Please add a picture description
We get the build and install folders.

Copy the mnist/RKNN_Cpp/rknn_mnist_demo/install/rknn_mnist_demo folder under install to the development board:

adb push ./rknn_mnist_demo/ /demo/MY

And copy a picture in the mnist dataset to the development board
and execute:

adb shell

Enter the command line of the development board:
Please add a picture description
execute the following command to load the model and test image:

./rknn_mnist_demo ./model/mnist.rknn ./model/test_871.png

Program output:
Please add a picture description
the final 871 pictures are classified as 0, check it:

the classification ends successfully.

Full-process deployment of the classification model based on the RV1126 platform (with engineering)

The whole process deployment of classification model based on RV1126 platform

Environment installation

model training

ONNX model conversion

RKNN model conversion

Executable on-board inference

Guess you like