End-to-end deployment of a classification model on the RV1126 platform
Environment installation
First, install the Rockchip RV1126 SDK on a virtual machine. The important component is the RKNN toolkit (rknn-toolkit), which is generally found at the following path:
sdk/external/rknn-toolkit
Follow the steps in its documentation to install it.
The project code is at: https://github.com/liuyuan000/rv1126_mnist/tree/main/RKNN_Cpp/rknn_mnist_demo
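Once the installation is finished, a quick check that the toolkit's Python package is visible can save time later. The following is a minimal sketch: the helper name `toolkit_available` is hypothetical, and it only assumes that the toolkit installs a top-level `rknn` package, which is what the `from rknn.api import RKNN` import used later in this article relies on.

```python
import importlib.util


def toolkit_available(package: str = "rknn") -> bool:
    """Return True if a top-level package with this name can be found."""
    # find_spec returns None for a missing top-level package instead of raising.
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    if toolkit_available():
        print("rknn package found")
    else:
        print("rknn package not found; re-check the rknn-toolkit installation steps")
```

Finding the package does not prove that the toolkit works with the board; it only confirms that the Python environment in use is the one the toolkit was installed into.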
Model training
We take MNIST handwritten digit classification as an example to illustrate the full deployment process of a classification model. First, train the model with torch by running the following code:
import torch
import numpy as np
from matplotlib import pyplot as plt
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision import datasets
import torch.nn.functional as F

"""
Convolution on the MNIST dataset, similar to examples 10-4 and 11, except that here:
1. the training accuracy is printed for each round; 2. the model is built with torch.nn.Sequential
"""

# Hyperparameters ------------------------------------------------------------------------------------
batch_size = 64
learning_rate = 0.01
momentum = 0.5
EPOCH = 10

# Prepare dataset ------------------------------------------------------------------------------------
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
# Reference on softmax (normalized exponential function): https://blog.csdn.net/lz_peter/article/details/84574716
# In Normalize, 0.1307 is the mean and 0.3081 is the standard deviation
train_dataset = datasets.MNIST(root='./data/mnist', train=True, transform=transform, download=True)  # download=True fetches the data if it is not available locally
test_dataset = datasets.MNIST(root='./data/mnist', train=False, transform=transform)  # train=True selects the training set, False the test set
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# Preview the first 12 training images with their labels
fig = plt.figure()
for i in range(12):
    plt.subplot(3, 4, i+1)
    plt.tight_layout()
    # .data and .targets replace the deprecated .train_data and .train_labels attributes
    plt.imshow(train_dataset.data[i], cmap='gray', interpolation='none')
    plt.title("Labels: {}".format(train_dataset.targets[i]))
    plt.xticks([])
    plt.yticks([])
plt.show()
# The training set is shuffled; the test set keeps its order
# Design model using class ------------------------------------------------------------------------------
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 10, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.conv2 = torch.nn.Sequential(
            torch.nn.Conv2d(10, 20, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(320, 50),
            torch.nn.Linear(50, 10),
        )

    def forward(self, x):
        batch_size = x.size(0)
        x = self.conv1(x)  # one convolution, one activation and one pooling layer
        x = self.conv2(x)  # the same again
        x = x.view(batch_size, -1)  # flatten for the fully connected layers: (batch, 20, 4, 4) ==> (batch, 320); -1 is inferred as 320 here
        x = self.fc(x)
        x = torch.sigmoid(x)
        return x  # the output has 10 dimensions, one per digit 0~9


model = Net()
# Construct loss and optimizer ------------------------------------------------------------------------------
criterion = torch.nn.CrossEntropyLoss()  # cross-entropy loss
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)  # lr: learning rate, momentum: momentum factor

# Train and Test CLASS --------------------------------------------------------------------------------------
# Wrap a single training round in a function
def train(epoch):
    running_loss = 0.0  # reset the loss at the start of the epoch
    running_total = 0
    running_correct = 0
    for batch_idx, data in enumerate(train_loader, 0):
        inputs, target = data
        optimizer.zero_grad()

        # forward + backward + update
        outputs = model(inputs)
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step()

        # accumulate the loss so it can be averaged over 300 batches below
        running_loss += loss.item()
        # accumulate the counts needed for the running accuracy
        _, predicted = torch.max(outputs.data, dim=1)
        running_total += inputs.shape[0]
        running_correct += (predicted == target).sum().item()

        if batch_idx % 300 == 299:  # printing every batch wastes time, so report the average loss and accuracy every 300 batches
            print('[%d, %5d]: loss: %.3f , acc: %.2f %%'
                  % (epoch + 1, batch_idx + 1, running_loss / 300, 100 * running_correct / running_total))
            running_loss = 0.0  # reset the loss for the next 300 batches
            running_total = 0
            running_correct = 0  # reset the accuracy counters for the next 300 batches

    # save the weights at the end of each epoch
    torch.save(model.state_dict(), './model_Mnist.pth')
    # torch.save(optimizer.state_dict(), './optimizer_Mnist.pth')
def test():
    correct = 0
    total = 0
    with torch.no_grad():  # no gradients are needed on the test set
        for data in test_loader:
            images, labels = data
            outputs = model(images)
            _, predicted = torch.max(outputs.data, dim=1)  # dim=1: search along each row for the maximum value and its index
            total += labels.size(0)
            correct += (predicted == labels).sum().item()  # element-wise comparison between tensors
    acc = correct / total
    # epoch is the global loop variable set in the main block below
    print('[%d / %d]: Accuracy on test set: %.1f %% ' % (epoch+1, EPOCH, 100 * acc))  # test accuracy = correct / total
    return acc


# Start train and Test --------------------------------------------------------------------------------------
if __name__ == '__main__':
    acc_list_test = []
    for epoch in range(EPOCH):
        train(epoch)
        # if epoch % 10 == 9:  # test once every 10 training rounds
        acc_test = test()
        acc_list_test.append(acc_test)

    plt.plot(acc_list_test)
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy On TestSet')
    plt.show()
The program automatically downloads the dataset and trains the model.
When it finishes, we get the weights file model_Mnist.pth.
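The input size of `Linear(320, 50)` follows from the layer arithmetic: each 5x5 convolution without padding shrinks the side length by 4, and each 2x2 max pooling halves it. The sketch below traces a 28x28 MNIST image through the network in plain Python; the helper names `conv_out` and `flattened_features` are hypothetical and only illustrate the calculation.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(input_size: int = 28) -> int:
    """Trace a square input through conv(5) -> pool(2) -> conv(5) -> pool(2)."""
    size = conv_out(input_size, kernel=5)      # Conv2d(1, 10, kernel_size=5): 28 -> 24
    size = conv_out(size, kernel=2, stride=2)  # MaxPool2d(kernel_size=2): 24 -> 12
    size = conv_out(size, kernel=5)            # Conv2d(10, 20, kernel_size=5): 12 -> 8
    size = conv_out(size, kernel=2, stride=2)  # MaxPool2d(kernel_size=2): 8 -> 4
    channels = 20                              # output channels of the second convolution
    return channels * size * size


if __name__ == "__main__":
    print(flattened_features())  # 20 * 4 * 4 = 320
```

If the input resolution or a kernel size changes, the same calculation gives the new value for the first `Linear` layer.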
ONNX model conversion
We again use torch to load the trained model and convert it to the ONNX format. The conversion code is as follows:
import torch
import numpy as np


class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 10, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.conv2 = torch.nn.Sequential(
            torch.nn.Conv2d(10, 20, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(320, 50),
            torch.nn.Linear(50, 10),
        )

    def forward(self, x):
        batch_size = x.size(0)
        x = self.conv1(x)  # one convolution, one activation and one pooling layer
        x = self.conv2(x)  # the same again
        x = x.view(batch_size, -1)  # flatten for the fully connected layers: (batch, 20, 4, 4) ==> (batch, 320); -1 is inferred as 320 here
        x = self.fc(x)
        x = torch.sigmoid(x)
        return x  # the output has 10 dimensions, one per digit 0~9


model = Net()
model.load_state_dict(torch.load('./model_Mnist.pth'))
model.eval()  # switch to inference mode before exporting

x = torch.randn((1, 1, 28, 28))  # dummy input: batch 1, 1 channel, 28x28 pixels
torch.onnx.export(model,                      # the network
                  x,                          # input tensor
                  'mnist.onnx',               # output model file name
                  input_names=["input"],      # input name
                  output_names=["output"],    # output name
                  opset_version=10,
                  dynamic_axes={'input': {0: 'batch'}, 'output': {0: 'batch'}}  # dynamic axes
                  )
Pay attention to opset_version: the operator set version should not be too high, otherwise problems may occur during the subsequent model conversion.
This gives us the mnist.onnx file, which can be opened with Netron. The structure of the entire network is quite simple.
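Before moving on, it can be worth confirming that the exported file is a valid ONNX graph and really carries the intended operator set version. The following is a hedged sketch that assumes the `onnx` Python package is installed (it is not used elsewhere in this article); the function names `default_opset` and `check_exported_model` are hypothetical.

```python
def default_opset(opset_imports) -> int:
    """Pick the version of the default ONNX domain from (domain, version) pairs."""
    for domain, version in opset_imports:
        if domain in ("", "ai.onnx"):  # both spellings denote the default domain
            return version
    raise ValueError("no default-domain opset found")


def check_exported_model(path: str = "mnist.onnx", expected_opset: int = 10) -> int:
    """Validate the ONNX file and return its default-domain opset version."""
    import onnx  # imported lazily so default_opset works without onnx installed

    model = onnx.load(path)
    onnx.checker.check_model(model)  # raises if the graph is structurally invalid
    version = default_opset((imp.domain, imp.version) for imp in model.opset_import)
    if version != expected_opset:
        raise RuntimeError(f"expected opset {expected_opset}, found {version}")
    return version
```

Calling `check_exported_model('mnist.onnx')` after the export should return 10 for the script above; a different value would suggest that the file on disk came from another export run.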
RKNN model conversion
After getting the ONNX model, we can use the RKNN class from rknn.api to convert it to the RKNN format for inference on the board.
At this point the virtual machine should be connected to the development board. After the conversion, the script automatically uses the development board for model inference and other operations.
Before the conversion, prepare several pictures from the dataset and put their paths in the dataset.txt file, so that the network parameters can be calibrated during quantization.
The following program can be used to extract pictures from the MNIST dataset:
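Quantization needs sample pictures because an 8-bit representation has to pick a value range for each tensor, and that range is estimated from real data. The following NumPy sketch shows generic asymmetric (affine) uint8 quantization as a textbook illustration; it is not RKNN-Toolkit's actual algorithm, and all function names are hypothetical.

```python
import numpy as np


def affine_params(samples: np.ndarray, num_bits: int = 8):
    """Derive (scale, zero_point) from the observed min/max of calibration data."""
    qmax = 2 ** num_bits - 1
    lo = min(float(samples.min()), 0.0)  # widen the range so 0.0 is representable
    hi = max(float(samples.max()), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    return scale, zero_point


def quantize(x: np.ndarray, scale: float, zero_point: int, num_bits: int = 8) -> np.ndarray:
    """Map float values to unsigned integers, clipping to the representable range."""
    qmax = 2 ** num_bits - 1
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, qmax).astype(np.uint8)


def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Map quantized integers back to approximate float values."""
    return (q.astype(np.float32) - zero_point) * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    activations = rng.normal(0.0, 1.0, size=1000).astype(np.float32)
    scale, zp = affine_params(activations)
    restored = dequantize(quantize(activations, scale, zp), scale, zp)
    print("max round-trip error:", float(np.abs(restored - activations).max()))
```

If the calibration samples do not cover the values seen at inference time, real inputs fall outside the estimated range and get clipped, which is why representative pictures matter.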
import os
import shutil
from tqdm import tqdm
from torchvision import datasets
from concurrent.futures import ThreadPoolExecutor


def mnist_export(root: str = './data/minst'):
    """Export MNIST data to a local folder using multi-threading.

    Args:
        root (str, optional): Path to local folder. Defaults to './data/minst'.
    """
    # one sub-folder per digit class
    for i in range(10):
        os.makedirs(os.path.join(root, f'./{i}'), exist_ok=True)

    split_list = ['train', 'test']
    data = {
        split: datasets.MNIST(
            root='./tmp',
            train=split == 'train',
            download=True
        ) for split in split_list
    }

    total = sum([len(data[split]) for split in split_list])

    with tqdm(total=total) as pbar:
        with ThreadPoolExecutor() as tp:
            for split in split_list:
                for index, (image, label) in enumerate(data[split]):
                    # images are saved as <root>/<label>/<split>_<index>.png
                    tmp = os.path.join(root, f'{label}/{split}_{index}.png')
                    tp.submit(image.save, tmp).add_done_callback(
                        lambda func: pbar.update()
                    )
    # remove the temporary download folder
    shutil.rmtree('./tmp')


if __name__ == '__main__':
    mnist_export('./data/minst')
dataset.txt only needs to list the image paths, one per line. For example:
/home/alientek/atk/mnist/data/minst/0/test_3.png
/home/alientek/atk/mnist/data/minst/0/test_10.png
/home/alientek/atk/mnist/data/minst/0/test_13.png
/home/alientek/atk/mnist/data/minst/0/test_25.png
/home/alientek/atk/mnist/data/minst/0/test_28.png
/home/alientek/atk/mnist/data/minst/0/test_55.png
/home/alientek/atk/mnist/data/minst/0/test_69.png
/home/alientek/atk/mnist/data/minst/0/test_7.png
/home/alientek/atk/mnist/data/minst/0/test_101.png
/home/alientek/atk/mnist/data/minst/0/test_126.png
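Writing this list by hand is tedious, so it can be generated from the exported folder. The sketch below assumes the `<root>/<label>/<split>_<index>.png` layout produced by the export script; the function name `write_dataset_list` and the choice of ten pictures per class are hypothetical, not something the toolkit requires.

```python
import random
from pathlib import Path


def write_dataset_list(image_root, output="dataset.txt", per_class=10, seed=0):
    """Write up to `per_class` absolute PNG paths from each sub-folder of image_root."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    lines = []
    for class_dir in sorted(p for p in Path(image_root).iterdir() if p.is_dir()):
        images = sorted(class_dir.glob("*.png"))
        rng.shuffle(images)
        lines.extend(str(p.resolve()) for p in images[:per_class])
    Path(output).write_text("\n".join(lines) + "\n")
    return lines


if __name__ == "__main__":
    root = Path("./data/minst")  # folder produced by the export script
    if root.is_dir():
        paths = write_dataset_list(root)
        print(f"wrote {len(paths)} paths to dataset.txt")
    else:
        print(f"{root} not found; run the export script first")
```

Sampling from every class, rather than only from class 0 as in the listing above, is a design choice made here so that the calibration pictures cover all ten digits.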
Now use Python to perform the quantization, specifying the target platform as RV1126:
import os
import urllib
import traceback
import time
import sys
import numpy as np
import cv2
from rknn.api import RKNN

ONNX_MODEL = 'mnist.onnx'
RKNN_MODEL = 'mnist.rknn'


def show_outputs(outputs):
    # Print the five highest scores; the 'resnet50v2' title is left over from the example this script is based on
    output = outputs[0][0]
    output_sorted = sorted(output, reverse=True)
    top5_str = 'resnet50v2\n-----TOP 5-----\n'
    for i in range(5):
        value = output_sorted[i]
        index = np.where(output == value)
        for j in range(len(index)):
            if (i + j) >= 5:
                break
            if value > 0:
                topi = '{}: {}\n'.format(index[j], value)
            else:
                topi = '-1: 0.0\n'
            top5_str += topi
    print(top5_str)


def readable_speed(speed):
    # Format a speed in bytes per second as KB/s, MB/s or GB/s
    speed_bytes = float(speed)
    speed_kbytes = speed_bytes / 1024
    if speed_kbytes > 1024:
        speed_mbytes = speed_kbytes / 1024
        if speed_mbytes > 1024:
            speed_gbytes = speed_mbytes / 1024
            return "{:.2f} GB/s".format(speed_gbytes)
        else:
            return "{:.2f} MB/s".format(speed_mbytes)
    else:
        return "{:.2f} KB/s".format(speed_kbytes)


def show_progress(blocknum, blocksize, totalsize):
    # Download progress callback; not called in this script, and start_time must be defined before it is used
    speed = (blocknum * blocksize) / (time.time() - start_time)
    speed_str = " Speed: {}".format(readable_speed(speed))
    recv_size = blocknum * blocksize

    f = sys.stdout
    progress = (recv_size / totalsize)
    progress_str = "{:.2f}%".format(progress * 100)
    n = round(progress * 50)
    s = ('#' * n).ljust(50, '-')
    f.write(progress_str.ljust(8, ' ') + '[' + s + ']' + speed_str)
    f.flush()
    f.write('\r\n')
if __name__ == '__main__':

    # Create RKNN object
    rknn = RKNN()

    # pre-process config
    print('--> config model')
    rknn.config(mean_values=[[0]], std_values=[[1]], reorder_channel='0 1 2', target_platform='rv1126')  # grayscale image: single-channel mean and std
    print('done')

    # Load ONNX model
    print('--> Loading model')
    ret = rknn.load_onnx(model=ONNX_MODEL)
    if ret != 0:
        print('Load mnist failed!')
        exit(ret)
    print('done')

    # Build model
    print('--> Building model')
    ret = rknn.build(do_quantization=True, dataset='./dataset.txt')
    if ret != 0:
        print('Build mnist failed!')
        exit(ret)
    print('done')

    # Export rknn model
    print('--> Export RKNN model')
    ret = rknn.export_rknn(RKNN_MODEL)
    if ret != 0:
        print('Export mnist.rknn failed!')
        exit(ret)
    print('done')

    # Set inputs
    #img = cv2.imread('/home/alientek/atk/mnist/data/minst/0/test_3.png',0)
    img = cv2.imread('/home/alientek/atk/mnist/data/minst/1/test_2.png', 0)  # path of the picture to run inference on, read as grayscale
    #img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # init runtime environment
    print('--> Init runtime environment')
    ret = rknn.init_runtime(target='rv1126')
    #ret = rknn.init_runtime(target='rv1126',perf_debug=True,eval_mem=True)
    #ret = rknn.eval_perf(inputs=[img], is_print=True)
    #memory_detail = rknn.eval_memory()
    if ret != 0:
        print('Init runtime environment failed')
        exit(ret)
    print('done')

    # Inference
    print('--> Running model')
    outputs = rknn.inference(inputs=[img])
    show_outputs(outputs)
    print('done')

    rknn.release()
Run the above program to get:
--> Export RKNN model
done
--> Init runtime environment
I NPUTransfer: Starting NPU Transfer Client, Transfer version 2.1.0 (b5861e7@2020-11-23T11:50:36)
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI: API: 1.7.0 (f75fb8e build: 2021-07-20 16:23:11)
D RKNNAPI: DRV: 1.7.0 (7880361 build: 2021-08-16 14:05:08)
D RKNNAPI: ==============================================
done
--> Running model
resnet50v2
-----TOP 5-----
[1]: 1.0
-1: 0.0
-1: 0.0
-1: 0.0
-1: 0.0
done
The final output is class 1 with a score of 1.0, so the network is fully confident in this prediction.
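One detail of the conversion script is worth checking against the training script. Training normalizes inputs with ToTensor, which scales pixels to [0, 1], followed by Normalize((0.1307,), (0.3081,)), while the conversion script passes mean_values=[[0]] and std_values=[[1]]. Assuming the toolkit applies (pixel - mean) / std to the raw 0 to 255 input, which should be confirmed in the RKNN-Toolkit documentation, the equivalent values on the pixel scale can be derived as in the sketch below. This shows the arithmetic only; `to_pixel_scale` is a hypothetical helper.

```python
import numpy as np

# Values passed to transforms.Normalize in the training script
TRAIN_MEAN, TRAIN_STD = 0.1307, 0.3081


def to_pixel_scale(mean: float, std: float, max_pixel: float = 255.0):
    """Convert mean/std defined on [0, 1] inputs to the 0-255 pixel scale."""
    return mean * max_pixel, std * max_pixel


if __name__ == "__main__":
    pixel_mean, pixel_std = to_pixel_scale(TRAIN_MEAN, TRAIN_STD)
    pixels = np.array([0.0, 128.0, 255.0])
    training_style = (pixels / 255.0 - TRAIN_MEAN) / TRAIN_STD  # ToTensor, then Normalize
    pixel_style = (pixels - pixel_mean) / pixel_std             # one step on raw pixels
    print("mean:", pixel_mean, "std:", pixel_std)
    print("same result:", bool(np.allclose(training_style, pixel_style)))
```

Whether switching to these values changes the prediction for a given picture is something to measure on the board rather than assume.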
Executable on-board inference
Finally, for efficiency and other reasons, an executable written in C++ is generally run on the development board.
I have uploaded the whole project to GitHub: https://github.com/liuyuan000/rv1126_mnist/
After downloading it, go to the RKNN_Cpp/rknn_mnist_demo path.
Modify the cross-compilation toolchain setting so that it points to your own path:
RV1109_TOOL_CHAIN=/opt/atk-dlrv1126-toolchain/usr
GCC_COMPILER=${RV1109_TOOL_CHAIN}/bin/arm-linux-gnueabihf
Then run the following command directly:
./build.sh
After it finishes, we get the build and install folders.
Copy the rknn_mnist_demo folder under mnist/RKNN_Cpp/rknn_mnist_demo/install to the development board:
adb push ./rknn_mnist_demo/ /demo/MY
Also copy a picture from the MNIST dataset to the development board, then run:
adb shell
This opens the command line of the development board. Run the following command to load the model and the test image:
./rknn_mnist_demo ./model/mnist.rknn ./model/test_871.png
In the program output, picture test_871 is classified as 0. Checking the picture confirms this, so the classification ends successfully.