1. Development environment configuration
- Linux development environment
- An x86 host running Ubuntu 16.04/18.04/20.04; more than 12 GB of RAM is recommended
- Download the SophonSDK development kit (v23.03.01)
(1) Unzip the SDK package
sudo apt-get install p7zip
sudo apt-get install p7zip-full
7z x Release_<date>-public.zip
cd Release_<date>-public
(2) Docker installation – TPU-MLIR environment initialization
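# Install Docker
sudo apt-get install docker.io
# Allow docker commands to run without root privileges
# Create the docker group; if it already exists an error is reported, which can be ignored
sudo groupadd docker
# Add the current user to the docker group
sudo gpasswd -a ${USER} docker
# Restart the docker service
sudo service docker restart
# Switch the current session to the new group, or log out and back in to restart the X session
newgrp docker
Tip: after logging out and logging back in, docker can be used without sudo.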
(3) Create a docker container and enter Docker
docker run -v $PWD/:/workspace -p 8001:8001 -it sophgo/tpuc_dev:latest
(4) Load tpu-mlir – activate environment variables
The following operations must be performed inside the Docker container. For how to start the container, refer to Start Docker Container.
$ tar zxf tpu-mlir_xxxx.tar.gz
$ source tpu-mlir_xxxx/envsetup.sh
_xxxx represents the version number of tpu-mlir. envsetup.sh adds the following environment variables:
Variable name | Value | Description |
---|---|---|
TPUC_ROOT | tpu-mlir_xxx | The location of the SDK package after decompression |
MODEL_ZOO_PATH | ${TPUC_ROOT}/…/model-zoo | The location of the model-zoo folder, which is at the same level as the SDK |
envsetup.sh modifies the environment variables as follows:
export PATH=${TPUC_ROOT}/bin:$PATH
export PATH=${TPUC_ROOT}/python/tools:$PATH
export PATH=${TPUC_ROOT}/python/utils:$PATH
export PATH=${TPUC_ROOT}/python/test:$PATH
export PATH=${TPUC_ROOT}/python/samples:$PATH
export LD_LIBRARY_PATH=$TPUC_ROOT/lib:$LD_LIBRARY_PATH
export PYTHONPATH=${TPUC_ROOT}/python:$PYTHONPATH
export MODEL_ZOO_PATH=${TPUC_ROOT}/../model-zoo
2. Compile the ONNX format model
This chapter takes yolov5s.onnx as an example to introduce how to compile and migrate an ONNX model to run on the BM1684X TPU platform.
The model comes from yolov5's official website: https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5s.onnx
(1) Prepare the working directory, model files and data
Create a model_yolov5s directory at the same level as the tpu-mlir directory, and put the model file and the image files into it.
The operation command is as follows:
$ mkdir model_yolov5s && cd model_yolov5s
$ cp $TPUC_ROOT/regression/model/yolov5s.onnx .
$ cp -rf $TPUC_ROOT/regression/dataset/COCO2017 .
$ cp -rf $TPUC_ROOT/regression/image .
$ mkdir workspace && cd workspace
Here $TPUC_ROOT is an environment variable that points to the tpu-mlir_xxxx directory.
Model conversion consists of two main steps, both executed inside Docker.
(2) ONNX to MLIR
The first step uses model_transform.py to convert the original model into an mlir file.
If the model takes images as input, you need to understand its preprocessing before converting it. If the model takes preprocessed npz files as input, preprocessing does not need to be considered. The preprocessing is expressed as follows (x represents the input):
y = (x - mean) × scale
The official yolov5 model takes RGB images and multiplies each value by 1/255, which corresponds to a mean of 0.0,0.0,0.0 and a scale of 0.0039216,0.0039216,0.0039216.
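As a rough illustration of how the mean and scale values relate to the 1/255 multiplication, the sketch below applies the usual per-channel form y = (x - mean) × scale in plain Python. The helper name `normalize_pixel` is hypothetical and is not part of TPU-MLIR.

```python
def normalize_pixel(pixel, mean, scale):
    """Apply y = (x - mean) * scale to each channel of one pixel."""
    return [(x - m) * s for x, m, s in zip(pixel, mean, scale)]

mean = [0.0, 0.0, 0.0]
scale = [0.0039216, 0.0039216, 0.0039216]  # approximately 1/255

# A pure white RGB pixel maps to roughly 1.0 in every channel.
white = normalize_pixel([255, 255, 255], mean, scale)
print([round(v, 4) for v in white])  # [1.0, 1.0, 1.0]
```

With a mean of zero, the scale alone maps the 0 to 255 pixel range onto roughly 0 to 1, which is what the --mean and --scale options in the command express.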
The model conversion command is as follows:
$ model_transform.py \
--model_name yolov5s \
--model_def ../yolov5s.onnx \
--input_shapes [[1,3,640,640]] \
--mean 0.0,0.0,0.0 \
--scale 0.0039216,0.0039216,0.0039216 \
--keep_aspect_ratio \
--pixel_format rgb \
--output_names 350,498,646 \
--test_input ../image/dog.jpg \
--test_result yolov5s_top_outputs.npz \
--mlir yolov5s.mlir \
--post_handle_type yolo
After the conversion, ${model_name}_in_f32.npz and other related files are generated.
(3) MLIR to FP32
The second step uses model_deploy.py to convert the mlir file into a bmodel.
To convert the mlir file to an F32 bmodel, run the following command:
$ model_deploy.py \
--mlir yolov5s.mlir \
--quantize F32 \
--chip bm1684x \
--test_input yolov5s_in_f32.npz \
--test_reference yolov5s_top_outputs.npz \
--tolerance 0.99,0.99 \
--model yolov5s_1684x_f32.bmodel
This finally generates ${model_name}_1684x_f32.bmodel and other related files.
(4) MLIR to INT8 model
Generate Calibration Table
Before converting to an INT8 model, you need to run calibration to obtain a calibration table. Prepare the input data according to the situation, typically about 100 to 1000 images.
With the calibration table, a symmetric or asymmetric bmodel can then be generated. If the symmetric model meets the accuracy requirements, the asymmetric model is generally not recommended, because its performance is slightly worse than that of the symmetric model.
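The difference between the two schemes can be sketched in a few lines. This is the generic textbook form of int8 quantization, shown only for illustration; it is an assumption that TPU-MLIR follows the same outline, and all function names here are hypothetical.

```python
def symmetric_params(values):
    """Symmetric int8: zero point fixed at 0, range [-127, 127].
    Assumes at least one non-zero value."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0, 0

def asymmetric_params(values):
    """Asymmetric int8: scale plus zero point over [-128, 127]."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / 255.0
    zero_point = round(-128 - lo / scale)
    return scale, zero_point

def quantize(values, scale, zero_point, qmin, qmax):
    """Map floats to clamped integers."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize(quantized, scale, zero_point):
    """Map integers back to approximate floats."""
    return [(q - zero_point) * scale for q in quantized]

# Non-negative activations (for example after ReLU) use only half of the
# symmetric range, so the asymmetric scheme gets a finer step size.
acts = [0.0, 0.5, 1.0, 2.0, 4.0]
s_scale, s_zp = symmetric_params(acts)
a_scale, a_zp = asymmetric_params(acts)
print("symmetric step:", s_scale, "asymmetric step:", a_scale)
```

The extra zero point has to be carried through every quantized operation, which is one plausible reason for the slightly lower performance of asymmetric models mentioned above.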
Here we use the existing 100 pictures from COCO2017 as an example to perform calibration:
$ run_calibration.py yolov5s.mlir \
--dataset ../COCO2017 \
--input_num 100 \
-o yolov5s_cali_table
After the run completes, a file named ${model_name}_cali_table is generated, which serves as the input file for the subsequent compilation of the INT8 model.
Compile to INT8 symmetric quantization model
To convert to INT8 symmetric quantization model, execute the following command:
$ model_deploy.py \
--mlir yolov5s.mlir \
--quantize INT8 \
--calibration_table yolov5s_cali_table \
--chip bm1684x \
--test_input yolov5s_in_f32.npz \
--test_reference yolov5s_top_outputs.npz \
--tolerance 0.85,0.45 \
--model yolov5s_1684x_int8_sym.bmodel
This finally generates ${model_name}_1684x_int8_sym.bmodel and other related files.
Compile to INT8 asymmetric quantization model
To convert to INT8 asymmetric quantization model, execute the following command:
$ model_deploy.py \
--mlir yolov5s.mlir \
--quantize INT8 \
--asymmetric \
--calibration_table yolov5s_cali_table \
--chip bm1684x \
--test_input yolov5s_in_f32.npz \
--test_reference yolov5s_top_outputs.npz \
--tolerance 0.90,0.55 \
--model yolov5s_1684x_int8_asym.bmodel
After the compilation is complete, a file named ${model_name}_1684x_int8_asym.bmodel is generated.
Effect comparison
The release package includes a yolov5 example written in Python, with its source at $TPUC_ROOT/python/samples/detect_yolov5.py, which performs object detection on images. Reading the code shows how the model is used: preprocess to obtain the model input, run inference to obtain the output, and finally post-process. Use the following commands to verify the results of the onnx, f32, and int8 models respectively.
The onnx model is run as follows, producing dog_onnx.jpg:
$ detect_yolov5.py \
--input ../image/dog.jpg \
--model ../yolov5s.onnx \
--output dog_onnx.jpg
The f32 bmodel is run as follows, producing dog_f32.jpg:
$ detect_yolov5.py \
--input ../image/dog.jpg \
--model yolov5s_1684x_f32.bmodel \
--output dog_f32.jpg
The int8 symmetric bmodel is run as follows, producing dog_int8_sym.jpg:
$ detect_yolov5.py \
--input ../image/dog.jpg \
--model yolov5s_1684x_int8_sym.bmodel \
--output dog_int8_sym.jpg
The int8 asymmetric bmodel is run as follows, producing dog_int8_asym.jpg:
$ detect_yolov5.py \
--input ../image/dog.jpg \
--model yolov5s_1684x_int8_asym.bmodel \
--output dog_int8_asym.jpg
[Figure: the generated detection image files]
[Figure: comparison of the detection accuracy of the four images]
Because runtime environments differ, the final results and accuracy will differ somewhat from the figure above.
Problem and solution
An error was reported in the output when converting mlir to F32.
Resolution process
Error 1 solution:
[Cause analysis] Because model_transform.py applies yolo post-processing, the two generated npz files have different shapes, which causes the run to fail.
[Solution] Modify the model_transform.py command and rerun the steps.
(Delete the --post_handle_type yolo option. With the default post-processing, the shapes of the generated npz files are consistent and the FP32 model is generated successfully.)
[Solution process] Analyze the error and trace its cause step by step:
The runtime error reported by mlir_shell.py
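To check for a shape mismatch like the one described above, one option is to list the array shapes stored in each npz file. The sketch below is deliberately dependency-free and reads only the .npy headers inside the zip archive; the helper names `npz_shapes` and `shape_mismatches` are hypothetical. Where NumPy is installed, `numpy.load(path)` exposes the same information.

```python
import ast
import struct
import zipfile

def npz_shapes(path):
    """Return {array_name: shape} for every .npy member of an npz file."""
    shapes = {}
    with zipfile.ZipFile(path) as archive:
        for member in archive.namelist():
            if not member.endswith(".npy"):
                continue
            with archive.open(member) as f:
                if f.read(6) != b"\x93NUMPY":
                    continue  # not an npy payload
                major, _minor = f.read(2)
                # Format 1.0 stores the header length in 2 bytes, later versions in 4.
                if major == 1:
                    (header_len,) = struct.unpack("<H", f.read(2))
                else:
                    (header_len,) = struct.unpack("<I", f.read(4))
                header = ast.literal_eval(f.read(header_len).decode("latin1"))
                shapes[member[:-4]] = tuple(header["shape"])
    return shapes

def shape_mismatches(path_a, path_b):
    """Arrays present in both files whose shapes differ."""
    a, b = npz_shapes(path_a), npz_shapes(path_b)
    return {name: (a[name], b[name]) for name in a.keys() & b.keys() if a[name] != b[name]}
```

For example, `npz_shapes("yolov5s_top_outputs.npz")` lists the output shapes produced by model_transform.py, which can then be compared against the other npz file named in the error message.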
3. Compile the TFlite format model
First, set up the environment as described in Chapter 1.
This chapter takes the resnet50_int8.tflite model as an example to introduce how to compile and migrate a TFLite model to run on the BM1684X TPU platform.
(1) Prepare the working directory, model files and data
Create a model_resnet50_tf directory at the same level as the tpu-mlir directory, and put the test image files into it.
The operation is as follows:
$ mkdir model_resnet50_tf && cd model_resnet50_tf
$ cp $TPUC_ROOT/regression/model/resnet50_int8.tflite .
$ cp -rf $TPUC_ROOT/regression/image .
$ mkdir workspace && cd workspace
Here $TPUC_ROOT is an environment variable that points to the tpu-mlir_xxxx directory.
(2) TFLite to MLIR
The model in this example takes BGR input, with a mean of 103.939,116.779,123.68 and a scale of 1.0,1.0,1.0.
The model conversion command is as follows:
$ model_transform.py \
--model_name resnet50_tf \
--model_def ../resnet50_int8.tflite \
--input_shapes [[1,3,224,224]] \
--mean 103.939,116.779,123.68 \
--scale 1.0,1.0,1.0 \
--pixel_format bgr \
--test_input ../image/cat.jpg \
--test_result resnet50_tf_top_outputs.npz \
--mlir resnet50_tf.mlir
After conversion to the mlir file, a resnet50_tf_in_f32.npz file is generated, which is the input file of the model.
(3) MLIR to model
This model is a TFLite asymmetric quantized model, which can be converted into a bmodel with the following parameters:
$ model_deploy.py \
--mlir resnet50_tf.mlir \
--quantize INT8 \
--asymmetric \
--chip bm1684x \
--test_input resnet50_tf_in_f32.npz \
--test_reference resnet50_tf_top_outputs.npz \
--model resnet50_tf_1684x.bmodel
After the compilation is complete, a file named resnet50_tf_1684x.bmodel is generated.
4. Compile the Caffe format model
First, set up the environment as described in Chapter 1.
This chapter takes mobilenet_v2_deploy.prototxt and mobilenet_v2.caffemodel as examples to introduce how to compile and migrate a Caffe model to run on the BM1684X TPU platform.
(1) Prepare the working directory, model files and data
Create a mobilenet_v2 directory at the same level as the tpu-mlir directory, and put the model files and the image files into it.
The operation is as follows:
$ mkdir mobilenet_v2 && cd mobilenet_v2
$ cp $TPUC_ROOT/regression/model/mobilenet_v2_deploy.prototxt .
$ cp $TPUC_ROOT/regression/model/mobilenet_v2.caffemodel .
$ cp -rf $TPUC_ROOT/regression/dataset/ILSVRC2012 .
$ cp -rf $TPUC_ROOT/regression/image .
$ mkdir workspace && cd workspace
Here $TPUC_ROOT is an environment variable that points to the tpu-mlir_xxxx directory.
(2) Caffe to MLIR
The model in this example takes BGR input; the mean and scale are 103.94,116.78,123.68 and 0.017,0.017,0.017, respectively.
The model conversion command is as follows:
$ model_transform.py \
--model_name mobilenet_v2 \
--model_def ../mobilenet_v2_deploy.prototxt \
--model_data ../mobilenet_v2.caffemodel \
--input_shapes [[1,3,224,224]] \
--resize_dims=256,256 \
--mean 103.94,116.78,123.68 \
--scale 0.017,0.017,0.017 \
--pixel_format bgr \
--test_input ../image/cat.jpg \
--test_result mobilenet_v2_top_outputs.npz \
--mlir mobilenet_v2.mlir
After conversion to the mlir file, a ${model_name}_in_f32.npz file is generated, which is the input file of the model.
(3) MLIR to F32 model
To convert the mlir file to an F32 bmodel, run the following command:
$ model_deploy.py \
--mlir mobilenet_v2.mlir \
--quantize F32 \
--chip bm1684x \
--test_input mobilenet_v2_in_f32.npz \
--test_reference mobilenet_v2_top_outputs.npz \
--tolerance 0.99,0.99 \
--model mobilenet_v2_1684x_f32.bmodel
After the compilation is complete, a file named ${model_name}_1684x_f32.bmodel is generated.
(4) MLIR to INT8 model
Generate Calibration Table
Before converting to an INT8 model, you need to run calibration to obtain a calibration table. Prepare the input data according to the situation, typically about 100 to 1000 images.
With the calibration table, a symmetric or asymmetric bmodel can then be generated. If the symmetric model meets the accuracy requirements, the asymmetric model is generally not recommended, because its performance is slightly worse than that of the symmetric model.
Here we use the existing 100 pictures from ILSVRC2012 as an example to perform calibration:
$ run_calibration.py mobilenet_v2.mlir \
--dataset ../ILSVRC2012 \
--input_num 100 \
-o mobilenet_v2_cali_table
After the run completes, a file named ${model_name}_cali_table is generated, which serves as the input file for the subsequent compilation of the INT8 model.
Compile to INT8 symmetric quantization model
To convert to INT8 symmetric quantization model, execute the following command:
$ model_deploy.py \
--mlir mobilenet_v2.mlir \
--quantize INT8 \
--calibration_table mobilenet_v2_cali_table \
--chip bm1684x \
--test_input mobilenet_v2_in_f32.npz \
--test_reference mobilenet_v2_top_outputs.npz \
--tolerance 0.96,0.70 \
--model mobilenet_v2_1684x_int8_sym.bmodel
After the compilation is complete, a file named ${model_name}_1684x_int8_sym.bmodel is generated.
Compile to INT8 asymmetric quantization model
To convert to INT8 asymmetric quantization model, execute the following command:
$ model_deploy.py \
--mlir mobilenet_v2.mlir \
--quantize INT8 \
--asymmetric \
--calibration_table mobilenet_v2_cali_table \
--chip bm1684x \
--test_input mobilenet_v2_in_f32.npz \
--test_reference mobilenet_v2_top_outputs.npz \
--tolerance 0.95,0.69 \
--model mobilenet_v2_1684x_int8_asym.bmodel
After the compilation is complete, a file named ${model_name}_1684x_int8_asym.bmodel is generated.
5. How to use mixed precision
First, set up the environment as described in Chapter 1.
This chapter takes the detection network yolov3 tiny as an example to introduce how to use mixed precision. The model is from https://github.com/onnx/models/tree/main/vision/object_detection_segmentation/tiny-yolov3.
(1) Prepare the working directory, model files and data
Create a yolov3_tiny directory at the same level as the tpu-mlir directory, and put the model file and the image files into it.
The operation is as follows:
$ mkdir yolov3_tiny && cd yolov3_tiny
$ wget https://github.com/onnx/models/raw/main/vision/object_detection_segmentation/tiny-yolov3/model/tiny-yolov3-11.onnx
$ cp -rf $TPUC_ROOT/regression/dataset/COCO2017 .
$ mkdir workspace && cd workspace
Here $TPUC_ROOT is an environment variable that points to the tpu-mlir_xxxx directory. Note that if downloading tiny-yolov3-11.onnx with wget fails, you can download the file manually from GitHub and put it in the yolov3_tiny folder.
(2) Verify the original model
detect_yolov3.py is a ready-made verification program that can be used to verify the yolov3_tiny network. Run it as follows:
$ detect_yolov3.py \
--model ../tiny-yolov3-11.onnx \
--input ../COCO2017/000000366711.jpg \
--output yolov3_onnx.jpg
After execution, the detection results are printed as follows:
person:60.7%
orange:77.5%
The output image yolov3_onnx.jpg is also produced.
(3) Convert to INT8 symmetric quantization model
The conversion method is the same as in the previous chapters, so the parameters are not described again here; only the steps are shown.
Step 1: Convert to F32 mlir
$ model_transform.py \
--model_name yolov3_tiny \
--model_def ../tiny-yolov3-11.onnx \
--input_shapes [[1,3,416,416]] \
--scale 0.0039216,0.0039216,0.0039216 \
--pixel_format rgb \
--keep_aspect_ratio \
--pad_value 128 \
--output_names=transpose_output1,transpose_output \
--mlir yolov3_tiny.mlir
The final generated file is as follows:
Step 2: Generate calibration table
$ run_calibration.py yolov3_tiny.mlir \
--dataset ../COCO2017 \
--input_num 100 \
-o yolov3_cali_table
Generate the calibration table file as follows:
Step 3: Convert to a symmetric quantized model
$ model_deploy.py \
--mlir yolov3_tiny.mlir \
--quantize INT8 \
--calibration_table yolov3_cali_table \
--chip bm1684x \
--model yolov3_int8.bmodel
The final output file is as follows:
Step 4: Validate the model
$ detect_yolov3.py \
--model yolov3_int8.bmodel \
--input ../COCO2017/000000366711.jpg \
--output yolov3_int8.jpg
After execution, the following print information is displayed:
person:64.0%
orange:73.0%
The output image yolov3_int8.jpg is also produced.
Compared with the original model, the int8 symmetric quantized model detects the individual oranges in this image less well.
(4) Convert to a mixed-precision quantization model
On the basis of converting to int8 symmetric quantization model, perform the following steps.
Step 1: Generate a mixed-precision quantization table
Use run_qtable.py to generate a mixed-precision quantization table. The relevant parameters are described as follows:
Parameter name | Required? | Description |
---|---|---|
(none) | yes | Specify the mlir file |
dataset | no | Specify the directory of input samples, which contains the corresponding images, npz files, or npy files |
data_list | no | Specify the sample list; either data_list or dataset must be given |
calibration_table | yes | Input calibration table |
chip | yes | Specify the platform that the model will use, support bm1684x/bm1684/cv183x/cv182x/cv181x/cv180x |
fp_type | no | Specifies the float type used in mixed precision, supports auto, F16, F32, BF16, the default is auto, which means it is automatically selected by the program |
input_num | no | Specify the number of input samples, the default is 10 |
expected_cos | no | Specify the minimum cos value expected at the final output layer of the network; the default is generally 0.99. The larger the value, the more layers may be set to floating-point calculation |
min_layer_cos | no | Specify the minimum value of expected output cos of each layer. Below this value, it will try to set floating-point calculation. Generally, the default is 0.99. |
debug_cmd | no | Specify the debug command string, for development use, the default is empty |
o | yes | Output mixed precision quantization table |
In this example, the default 10 pictures are used for calibration, and the execution command is as follows (for CV18xx series chips, set chip to the corresponding chip name):
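# If the default min_layer_cos of 0.99 were used here, the program would detect that the
# original int8 model already reaches a cos of 0.99 and would skip the search entirely
$ run_qtable.py yolov3_tiny.mlir \
--dataset ../COCO2017 \
--calibration_table yolov3_cali_table \
--chip bm1684x \
--min_layer_cos 0.999 \
--expected_cos 0.9999 \
-o yolov3_qtable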
After execution, the final output is printed as follows: (The output information of different compilation hosts may be slightly different)
In the output above, int8 outputs_cos is the cos similarity between the network outputs of the int8 model and the fp32 model, mix model outputs_cos is the cos similarity of the network outputs after some layers use mixed precision, and total time shows that the search took 11.2 seconds. In addition, the mixed-precision quantization table yolov3_qtable is generated, with the following content:
In this table, the first column is the layer and the second column is its type; the supported types are F32/F16/BF16/INT8. A loss table file, full_loss_table.txt, is also generated at the same time, with the following content:
The table is sorted by cos in ascending order. Each entry gives the cos computed for a layer after its predecessor layers have been switched to the corresponding floating-point mode according to their own cos values. If that cos is still smaller than the min_layer_cos parameter, the layer and its immediate successor layer are set to floating-point calculation.
Each time run_qtable.py sets such a pair of adjacent layers to floating-point calculation, it recomputes the output cos of the entire network. Once that cos is greater than the specified expected_cos, the search exits. Therefore, a larger expected_cos causes more layers to be tried as floating-point calculations.
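The role of the two thresholds can be illustrated with a small sketch. This is a simplified picture and not the actual run_qtable.py algorithm; the function names and the per-layer values are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def float_candidates(layer_cos, min_layer_cos):
    """Layers whose cos falls below min_layer_cos, worst first."""
    below = [(cos, name) for name, cos in layer_cos.items() if cos < min_layer_cos]
    return [name for cos, name in sorted(below)]

# Hypothetical per-layer similarities between int8 and fp32 outputs.
layer_cos = {"conv1": 0.9995, "conv2": 0.9971, "conv3": 0.9988}
print(float_candidates(layer_cos, min_layer_cos=0.999))  # ['conv2', 'conv3']
print(float_candidates(layer_cos, min_layer_cos=0.99))   # []
```

With the looser threshold no layer qualifies, which mirrors why the command uses 0.999 rather than the default 0.99.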
Step 2: Generate a mixed-precision quantization model
$ model_deploy.py \
--mlir yolov3_tiny.mlir \
--quantize INT8 \
--quantize_table yolov3_qtable \
--calibration_table yolov3_cali_table \
--chip bm1684x \
--model yolov3_mix.bmodel
The final generated file is as follows:
Step 3: Validate the mixed precision model
$ detect_yolov3.py \
--model yolov3_mix.bmodel \
--input ../COCO2017/000000366711.jpg \
--output yolov3_mix.jpg
After execution, the printed result is:
person:64.0%
orange:72.9%
The output image yolov3_mix.jpg is also produced.
Note that besides generating the quantization table with run_qtable.py, you can also write the table yourself, setting the name and quantization type of each OP that should use mixed precision according to the per-layer similarity comparison results.
6. Model deployment example
(1) Use TPU for pre-processing
At present, both main chip series supported by TPU-MLIR, BM168x and CV18xx, support adding common image preprocessing into the model. Developers can pass the preprocessing parameters through compile options during model compilation. The compiler then inserts the corresponding preprocessing operators before the model's computation, so the generated bmodel or cvimodel can take the unpreprocessed image directly as input, and the TPU performs the preprocessing as part of model inference.
preprocessing type | BM168x | CV18xx |
---|---|---|
image cropping | True | True |
Normalized calculation | True | True |
NHWC to NCHW | True | True |
BGR/RGB conversion | True | True |
Image cropping first resizes the image to the size given by the --resize_dims parameter of the model_transform tool, and then crops it to the model's input size. Normalization can be applied directly to image data that has not been preprocessed (that is, data in unsigned int8 format).
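The resize-then-crop step can be pictured with a small calculation. The sketch below assumes the crop is centered, which the text above does not state explicitly; `center_crop_box` is a hypothetical helper, and the sizes are taken from the mobilenet_v2 example (resize_dims 256,256 and input 224x224).

```python
def center_crop_box(resize_dims, input_size):
    """Return (top, left, bottom, right) of a centered crop."""
    resized_h, resized_w = resize_dims
    input_h, input_w = input_size
    if input_h > resized_h or input_w > resized_w:
        raise ValueError("crop size must not exceed the resized image")
    top = (resized_h - input_h) // 2
    left = (resized_w - input_w) // 2
    return top, left, top + input_h, left + input_w

print(center_crop_box((256, 256), (224, 224)))  # (16, 16, 240, 240)
```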
To fuse preprocessing into the model, pass the --fuse_preprocess parameter when deploying with the model_deploy tool. For verification, test_input must be given in an original image format (i.e., jpg, jpeg, or png). An npz file corresponding to the original image input is then generated, named ${model_name}_in_ori.npz.
In addition, when the actual external input format differs from the format of the model, use --customization_format to specify the actual external input format. The supported formats are described as follows:
customization_format | Description | BM1684X | CV18xx |
---|---|---|---|
None | Consistent with the original model input; no processing is done. This is the default | True | True |
RGB_PLANAR | rgb order, placed according to nchw | True | True |
RGB_PACKED | rgb order, placed according to nhwc | True | True |
BGR_PLANAR | bgr order, placed according to nchw | True | True |
BGR_PACKED | bgr order, placed according to nhwc | True | True |
GRAYSCALE | Only one gray channel, placed according to nchw | True | True |
YUV420_PLANAR | yuv420 planar format, input from vpss | False | True |
YUV_NV21 | NV21 format of yuv420, input from vpss | False | True |
YUV_NV12 | NV12 format of yuv420, input from vpss | False | True |
RGBA_PLANAR | rgba format, placed according to nchw | False | True |
The YUV* formats are input formats specific to the CV18xx series chips. When the order of color channels in customization_format differs from the model input, a channel conversion is performed. If customization_format is not set in the command, it is derived automatically from the pixel_format and channel_format parameters given to the model_transform tool.
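The difference between the PACKED (nhwc) and PLANAR (nchw) layouts, and the channel conversion between RGB and BGR, can be shown on a tiny image using nested lists. This is only a layout illustration with hypothetical helper names; it does not reflect how the TPU operators are implemented.

```python
def packed_to_planar(image_hwc):
    """Convert an H x W x C nested list (packed) to C x H x W (planar)."""
    height, width = len(image_hwc), len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [[[image_hwc[h][w][c] for w in range(width)] for h in range(height)]
            for c in range(channels)]

def swap_rb(image_chw):
    """Reverse the channel order of a 3-channel planar image (RGB <-> BGR)."""
    return image_chw[::-1]

# A 1x2 image with two RGB pixels: red and blue.
rgb_packed = [[[255, 0, 0], [0, 0, 255]]]
rgb_planar = packed_to_planar(rgb_packed)
print(rgb_planar)           # [[[255, 0]], [[0, 0]], [[0, 255]]]
print(swap_rb(rgb_planar))  # [[[0, 255]], [[0, 0]], [[255, 0]]]
```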
Taking the mobilenet_v2 model as an example, follow the chapter "Compile the Caffe format model": use the model_transform tool in the tpu-mlir/regression/regression_out/ directory to generate the original mlir, and use the run_calibration tool to generate the calibration table.
(2) BM1684X deployment
The command to generate an INT8 symmetric quantized bmodel with fused preprocessing is as follows:
$ model_deploy.py \
--mlir mobilenet_v2.mlir \
--quantize INT8 \
--calibration_table mobilenet_v2_cali_table \
--chip bm1684x \
--test_input ../image/cat.jpg \
--test_reference mobilenet_v2_top_outputs.npz \
--tolerance 0.96,0.70 \
--fuse_preprocess \
--model mobilenet_v2_bm1684x_int8_sym_fuse_preprocess.bmodel
The process output diagram is as follows:
The final generated file is as follows:
7. Reference for converting models from other frameworks to ONNX
This chapter describes how to convert PyTorch, TensorFlow, and PaddlePaddle models to ONNX models. Readers can also refer to the model conversion tutorials in the official ONNX repository: https://github.com/onnx/tutorials.
All the operations in this chapter are performed in the Docker container. For the specific environment configuration method, please refer to Chapter 1.
(1) PyTorch model to ONNX
This section takes a self-built simple PyTorch model as an example to perform onnx conversion, and the environment configuration and directory are consistent with Section 1.
Step 0: Create a working directory
Create and enter the torch_model directory on the command line.
$ mkdir torch_model
$ cd torch_model
Step 1: Build and save the model
Create a script named simple_net.py in this directory and run it. The specific content of the script is as follows:
#!/usr/bin/env python3
import torch

# Build a simple nn model
class SimpleModel(torch.nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.m1 = torch.nn.Conv2d(3, 8, 3, 1, 0)
        self.m2 = torch.nn.Conv2d(8, 8, 3, 1, 1)

    def forward(self, x):
        y0 = self.m1(x)
        y1 = self.m2(y0)
        y2 = y0 + y1
        return y2

# Create a SimpleModel and save its weights in the current directory.
# The guard keeps "from simple_net import SimpleModel" from overwriting weight.pth.
if __name__ == "__main__":
    model = SimpleModel()
    torch.save(model.state_dict(), "weight.pth")
After running, we will get a weight.pth weight file in the current directory.
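The addition y0 + y1 in SimpleModel works only because both convolutions produce outputs of the same shape. The sketch below checks this with the standard convolution output-size formula and does not require PyTorch; `conv2d_out` is a hypothetical helper, and the 16x16 input size is the one used in the export script.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1

size = 16                                              # input is 1 x 3 x 16 x 16
y0 = conv2d_out(size, kernel=3, stride=1, padding=0)   # m1: Conv2d(3, 8, 3, 1, 0)
y1 = conv2d_out(y0, kernel=3, stride=1, padding=1)     # m2: Conv2d(8, 8, 3, 1, 1)
print((1, 8, y0, y0), (1, 8, y1, y1))  # (1, 8, 14, 14) (1, 8, 14, 14)
```

The first convolution has no padding and shrinks the input from 16 to 14, while the second uses padding 1 and preserves the size, so the two tensors can be added element-wise.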
Step 2: Export ONNX model
Create another script named export_onnx.py in this directory and run it. The specific content of the script is as follows:
#!/usr/bin/env python3
import torch
from simple_net import SimpleModel

# Load the pretrained model and export it as onnx
model = SimpleModel()
model.eval()
checkpoint = torch.load("weight.pth", map_location="cpu")
model.load_state_dict(checkpoint)

# Prepare input tensor
dummy_input = torch.randn(1, 3, 16, 16, requires_grad=True)

# Export the torch model as onnx
torch.onnx.export(model,
                  dummy_input,
                  'model.onnx',  # name of the exported onnx model
                  opset_version=13,
                  export_params=True,
                  do_constant_folding=True)
After running the script, we can get the onnx model named model.onnx in the current directory.
(2) TensorFlow model to ONNX
This section uses the mobilenet_v1_0.25_224 model provided in the TensorFlow official repository as a conversion example.
Step 0: Create a working directory
Create and enter the tf_model directory on the command line.
$ mkdir tf_model
$ cd tf_model
Step 1: Prepare and convert the model
Download the model with the following command on the command line and use the tf2onnx tool to export it as an ONNX model:
$ wget -nc http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.25_224.tgz
# Extract the archive to get the "*.pb" model definition file
$ tar xzf mobilenet_v1_0.25_224.tgz
$ python -m tf2onnx.convert --graphdef mobilenet_v1_0.25_224_frozen.pb \
--output mnet_25.onnx --inputs input:0 \
--inputs-as-nchw input:0 \
--outputs MobilenetV1/Predictions/Reshape_1:0
After running all the above commands, we can get the onnx model named mnet_25.onnx in the current directory.
(3) PaddlePaddle model to ONNX
This section uses the SqueezeNet1_1 model provided in the official PaddlePaddle repository as a conversion example.
Step 0: Create a working directory
Create and enter the pp_model directory on the command line.
$ mkdir pp_model
$ cd pp_model
Step 1: Prepare the model
Download the model with the following command on the command line:
$ wget https://bj.bcebos.com/paddlehub/fastdeploy/SqueezeNet1_1_infer.tgz
$ tar xzf SqueezeNet1_1_infer.tgz
$ cd SqueezeNet1_1_infer
Then use the paddle_infer_shape.py script from the PaddlePaddle project to perform shape inference on the model. Here, the input shape is set to [1,3,224,224] in NCHW format:
$ wget https://raw.githubusercontent.com/PaddlePaddle/Paddle2ONNX/develop/tools/paddle/paddle_infer_shape.py
$ python paddle_infer_shape.py --model_dir . \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--save_dir new_model \
--input_shape_dict="{'inputs':[1,3,224,224]}"
After running all the above commands, we will be in the SqueezeNet1_1_infer directory, and there will be a new_model directory under this directory.
Step 2: Convert the model
Install the paddle2onnx tool through the following command on the command line, and use this tool to convert the PaddlePaddle model to the ONNX model:
$ pip install paddle2onnx
$ paddle2onnx --model_dir new_model \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--opset_version 13 \
--save_file squeezenet1_1.onnx
After running all the above commands we will get an onnx model named squeezenet1_1.onnx.
8. BM168x Test Guide
(1) Configure the system environment
If you are using Docker for the first time, install and configure it as described in the development environment configuration chapter.
If you are using git-lfs for the first time, run the following commands to install and configure it.
[This is needed only the first time, and is done on the user's own system, not inside the Docker container.]
$ sudo apt install curl
$ curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
$ sudo apt-get install git-lfs
BM1684X cloud platform PCIE mode:
(2) Get the model-zoo model
In the same directory as tpu-mlir_xxxx.tar.gz (the tpu-mlir release package), use the following commands to clone the model-zoo project:
$ git clone --depth=1 https://github.com/sophgo/model-zoo
$ cd model-zoo
$ git lfs pull --include "*.onnx,*.jpg,*.JPEG" --exclude=""
$ cd ../
The pull process of lfs in BM1684X cloud platform PCIE mode:
If model-zoo has already been cloned, run the following commands to synchronize the models to the latest state:
$ cd model-zoo
$ git pull
$ git lfs pull --include "*.onnx,*.jpg,*.JPEG" --exclude=""
$ cd ../
This process downloads a large amount of data from GitHub. Depending on the network environment, it may take a long time.
(3) Obtain the tpu-perf tool
Download the latest tpu-perf wheel installation package from https://github.com/sophgo/tpu-perf/releases, for example tpu_perf-xxx-py3-none-manylinux2014_x86_64.whl. Place the tpu-perf package in the same directory as model-zoo. The directory structure at this point should be as follows:
├── tpu_perf-x.x.x-py3-none-manylinux2014_x86_64.whl
├── tpu-mlir_xxxx
└── model-zoo
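Before continuing, it can be convenient to confirm that the working directory matches the layout above. The following is a minimal sketch using only the standard library; the function name `missing_layout_items` is hypothetical, and the glob patterns are assumptions derived from the file names shown in the tree.

```python
from pathlib import Path

def missing_layout_items(root):
    """Return the expected entries that are absent from the working directory."""
    root = Path(root)
    expected = [
        ("tpu-perf wheel", "tpu_perf-*.whl", False),
        ("tpu-mlir release directory", "tpu-mlir_*", True),
        ("model-zoo checkout", "model-zoo", True),
    ]
    missing = []
    for label, pattern, want_dir in expected:
        # Keep only matches of the expected kind (file or directory).
        matches = [p for p in root.glob(pattern) if p.is_dir() == want_dir]
        if not matches:
            missing.append(label)
    return missing

print(missing_layout_items("."))
```

An empty list means all three entries were found; otherwise the labels of the missing entries are printed.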
(4) Deployment test
Enter the docker container and activate the tpu-mlir environment variables, where XXXX stands for the directory where tpu-mlir is stored.
$ docker exec -it <container_id> /bin/bash
$ source XXXX/XXXX/XXXX/envsetup.sh
Install tpu-perf
$ pip3 install ../tpu_perf-x.x.x-py3-none-manylinux2014_x86_64.whl
(5) Compile the model
The config.yaml files in model-zoo configure the SDK test content. For example, the configuration file for resnet18 is model-zoo/vision/classification/resnet18-v2/config.yaml.
Execute the following command to run all test samples:
$ cd ../model-zoo
$ python3 -m tpu_perf.build --mlir -l full_cases.txt
At this point the following models are compiled:
* efficientnet-lite4
* mobilenet_v2
* resnet18
* resnet50_v2
* shufflenet_v2
* squeezenet1.0
* vgg16
* yolov5s
After the command ends normally, you will see a newly generated output folder (the test output is in this folder). Change the permissions of the output folder so that it can be accessed by systems outside Docker.
$ chmod -R a+rw output
(6) PCIE mode running test
The test must be run outside Docker (it is assumed here that the 1684X device and driver are already installed and configured), so exit the Docker environment:
$ exit
Run the following commands on the host with the PCIE board to test the performance of the generated bmodel.
$ pip3 install ./tpu_perf-*-py3-none-manylinux2014_x86_64.whl
$ cd model-zoo
$ python3 -m tpu_perf.run --mlir -l full_cases.txt
Note: If multiple SOPHGO accelerator cards are installed on the host, you can specify the device that tpu_perf runs on by adding --devices id, for example:
$ python3 -m tpu_perf.run --devices 2 --mlir -l full_cases.txt