TPU-MLIR environment setup and usage

1. Development environment configuration

  • Linux development environment
    1. An x86 host running Ubuntu 16.04/18.04/20.04; at least 12 GB of RAM is recommended
    2. Download the SophonSDK development kit (v23.03.01)


(1) Unzip the SDK package

sudo apt-get install p7zip
sudo apt-get install p7zip-full
7z x Release_<date>-public.zip
cd Release_<date>-public

(2) Docker installation – TPU-MLIR environment initialization

# Install docker
sudo apt-get install docker.io
# Allow docker commands to run without root
# Create the docker group (if it already exists, the command reports an error that can be ignored)
sudo groupadd docker
# Add the current user to the docker group
sudo gpasswd -a ${USER} docker
# Restart the docker service
sudo service docker restart
# Switch the current session to the new group (or log out and back in / restart the X session)
newgrp docker

Note: log out of the system and log back in; after that, docker commands no longer require sudo.
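
To confirm that docker now runs without root, an optional quick check is:

docker run hello-world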

(3) Create a docker container and enter Docker

docker run -v $PWD/:/workspace -p 8001:8001 -it sophgo/tpuc_dev:latest


(4) Load tpu-mlir – activate environment variables

The following operations must be performed inside the Docker container. For how to use Docker, refer to the "Start Docker Container" section.

$ tar zxf tpu-mlir_xxxx.tar.gz
$ source tpu-mlir_xxxx/envsetup.sh

Here _xxxx represents the tpu-mlir version number; sourcing envsetup.sh adds the following environment variables:

Variable name    Value                       Description
TPUC_ROOT        tpu-mlir_xxxx               Location of the SDK package after decompression
MODEL_ZOO_PATH   ${TPUC_ROOT}/../model-zoo   Location of the model-zoo folder, at the same level as the SDK

The environment variables set by envsetup.sh are:

export PATH=${TPUC_ROOT}/bin:$PATH
export PATH=${TPUC_ROOT}/python/tools:$PATH
export PATH=${TPUC_ROOT}/python/utils:$PATH
export PATH=${TPUC_ROOT}/python/test:$PATH
export PATH=${TPUC_ROOT}/python/samples:$PATH
export LD_LIBRARY_PATH=$TPUC_ROOT/lib:$LD_LIBRARY_PATH
export PYTHONPATH=${TPUC_ROOT}/python:$PYTHONPATH
export MODEL_ZOO_PATH=${TPUC_ROOT}/../model-zoo
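
An optional quick check that the variables took effect (run in the same shell):

$ echo $TPUC_ROOT
$ which model_transform.py model_deploy.py run_calibration.py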


2. Compile the ONNX format model

This chapter uses yolov5s.onnx as an example to show how to compile and deploy an ONNX model to run on the BM1684X TPU platform.

The model comes from yolov5's official website: https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5s.onnx

(1) Prepare the working directory, model files and data

Create a model_yolov5s directory at the same level as tpu-mlir, and put both the model file and the image files into it.

The operation command is as follows:

$ mkdir model_yolov5s && cd model_yolov5s
$ cp $TPUC_ROOT/regression/model/yolov5s.onnx .
$ cp -rf $TPUC_ROOT/regression/dataset/COCO2017 .
$ cp -rf $TPUC_ROOT/regression/image .
$ mkdir workspace && cd workspace

Here $TPUC_ROOT is the environment variable set earlier, pointing to the tpu-mlir_xxxx directory.

Model conversion is done in two main steps (both executed inside Docker):

(2) ONNX to MLIR

First, use model_transform.py to convert the original model into an MLIR file.

If the model takes images as input, its preprocessing must be understood before conversion. If the model takes preprocessed npz files as input, preprocessing does not need to be considered. The preprocessing can be expressed as follows (x represents the input):

y = (x - mean) × scale

The official yolov5 model takes RGB images and multiplies each pixel value by 1/255, which corresponds to mean 0.0,0.0,0.0 and scale 0.0039216,0.0039216,0.0039216.
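
In general, --mean and --scale can be derived from a framework's usual per-channel normalization x_norm = (x/255 - m)/s, which equals (x - 255*m) * 1/(255*s), so mean_i = 255*m_i and scale_i = 1/(255*s_i). A small illustrative sketch (yolov5 uses m = 0 and s = 1, i.e. only the 1/255 step):

#!/usr/bin/env python3
# Derive --mean / --scale from a framework-style normalization (illustrative only).
m = [0.0, 0.0, 0.0]   # per-channel mean used by the framework (yolov5: none)
s = [1.0, 1.0, 1.0]   # per-channel std used by the framework (yolov5: none)
mean = [255.0 * mi for mi in m]             # -> [0.0, 0.0, 0.0]
scale = [1.0 / (255.0 * si) for si in s]    # -> [0.0039216, 0.0039216, 0.0039216]
print(mean, [round(v, 7) for v in scale])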

The model conversion command is as follows:

$ model_transform.py \
    --model_name yolov5s \
    --model_def ../yolov5s.onnx \
    --input_shapes [[1,3,640,640]] \
    --mean 0.0,0.0,0.0 \
    --scale 0.0039216,0.0039216,0.0039216 \
    --keep_aspect_ratio \
    --pixel_format rgb \
    --output_names 350,498,646 \
    --test_input ../image/dog.jpg \
    --test_result yolov5s_top_outputs.npz \
    --mlir yolov5s.mlir \
    --post_handle_type yolo

Output of the conversion process:

(screenshots omitted)

The conversion also generates ${model_name}_in_f32.npz and related files (screenshot omitted).

(3) MLIR to F32 bmodel

Second, use model_deploy.py to convert the MLIR file into a bmodel.

To convert the mlir file to f32 bmodel, the command is as follows:

$ model_deploy.py \
    --mlir yolov5s.mlir \
    --quantize F32 \
    --chip bm1684x \
    --test_input yolov5s_in_f32.npz \
    --test_reference yolov5s_top_outputs.npz \
    --tolerance 0.99,0.99 \
    --model yolov5s_1684x_f32.bmodel

The conversion generates ${model_name}_1684x_f32.bmodel and other related files (screenshot omitted).

(4) MLIR to INT8 model

Generate Calibration Table

Before converting to an INT8 model, you need to run calibration to obtain a calibration table. The number of input samples depends on the situation, roughly 100 to 1000 images.

With the calibration table, either a symmetric or an asymmetric bmodel can be generated. If the symmetric model meets the accuracy requirements, the asymmetric model is generally not recommended, because its performance is slightly worse than the symmetric model's; the sketch below illustrates the difference between the two schemes.
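
For reference, this is a minimal sketch of the two INT8 schemes using the generic formulas, not TPU-MLIR's internal implementation:

#!/usr/bin/env python3
# Generic INT8 quantization formulas (illustrative, not TPU-MLIR internals).
import numpy as np

def quant_symmetric(x, threshold):
    # One scale per tensor, zero-point fixed at 0; range commonly clipped to +/-127.
    s = threshold / 127.0
    return np.clip(np.round(x / s), -127, 127).astype(np.int8)

def quant_asymmetric(x, lo, hi):
    # Scale plus a zero-point; the extra zero-point arithmetic is why asymmetric
    # models usually run slightly slower than symmetric ones.
    s = (hi - lo) / 255.0
    zp = int(round(-128 - lo / s))   # maps lo -> -128 and hi -> 127
    return np.clip(np.round(x / s) + zp, -128, 127).astype(np.int8)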

Here we use the existing 100 pictures from COCO2017 as an example to perform calibration:

$ run_calibration.py yolov5s.mlir \
    --dataset ../COCO2017 \
    --input_num 100 \
    -o yolov5s_cali_table

After the run completes, a file named ${model_name}_cali_table is generated; it serves as the input for the subsequent INT8 model compilation (screenshot omitted).
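
For reference, the calibration table is a plain-text file. As an assumption (based on typical TPU-MLIR calibration tables, not on this particular run), each non-comment line lists an operator name followed by its threshold and observed min/max, roughly like:

# op_name    threshold    min    max    (layout and values below are illustrative)
images       1.0000000    0.0    1.0000000
...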

Compile to INT8 symmetric quantization model

To convert to INT8 symmetric quantization model, execute the following command:

$ model_deploy.py \
    --mlir yolov5s.mlir \
    --quantize INT8 \
    --calibration_table yolov5s_cali_table \
    --chip bm1684x \
    --test_input yolov5s_in_f32.npz \
    --test_reference yolov5s_top_outputs.npz \
    --tolerance 0.85,0.45 \
    --model yolov5s_1684x_int8_sym.bmodel

The output of the conversion process is as follows:
(screenshots omitted)

The conversion generates ${model_name}_1684x_int8_sym.bmodel and other related files (screenshot omitted).

Compile to INT8 asymmetric quantization model

To convert to INT8 asymmetric quantization model, execute the following command:

$ model_deploy.py \
    --mlir yolov5s.mlir \
    --quantize INT8 \
    --asymmetric \
    --calibration_table yolov5s_cali_table \
    --chip bm1684x \
    --test_input yolov5s_in_f32.npz \
    --test_reference yolov5s_top_outputs.npz \
    --tolerance 0.90,0.55 \
    --model yolov5s_1684x_int8_asym.bmodel

After compilation completes, a file named ${model_name}_1684x_int8_asym.bmodel is generated (screenshots omitted).

Effect comparison

This release package provides yolov5 sample code written in Python at $TPUC_ROOT/python/samples/detect_yolov5.py, which performs object detection on images. Reading the code shows how the model is used: first preprocess the image to obtain the model input, then run inference to obtain the output, and finally post-process the results (a hedged sketch of this flow is given below). The following commands verify the execution results of the onnx/f32/int8 models respectively.
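
For orientation, here is a minimal, hedged sketch of those three stages for the ONNX model only (it assumes opencv-python, numpy and onnxruntime are available; the real detect_yolov5.py additionally does letterboxing, anchor decoding, NMS and result drawing):

#!/usr/bin/env python3
# Minimal preprocess -> infer -> postprocess outline; illustrative only.
import cv2
import numpy as np
import onnxruntime as ort

img = cv2.imread("../image/dog.jpg")                    # BGR, HWC, uint8
inp = cv2.resize(img, (640, 640))[:, :, ::-1]           # to RGB (no letterboxing here)
inp = inp.transpose(2, 0, 1)[None].astype(np.float32) / 255.0   # NCHW, scaled by 1/255

sess = ort.InferenceSession("../yolov5s.onnx")
outputs = sess.run(None, {sess.get_inputs()[0].name: inp})

# box decoding + NMS would be applied to `outputs` here
print([o.shape for o in outputs])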

Run the ONNX model as follows to get dog_onnx.jpg:

$ detect_yolov5.py \
    --input ../image/dog.jpg \
    --model ../yolov5s.onnx \
    --output dog_onnx.jpg

Run the F32 bmodel as follows to get dog_f32.jpg:

$ detect_yolov5.py \
    --input ../image/dog.jpg \
    --model yolov5s_1684x_f32.bmodel \
    --output dog_f32.jpg

Run the INT8 symmetric bmodel as follows to get dog_int8_sym.jpg:

$ detect_yolov5.py \
    --input ../image/dog.jpg \
    --model yolov5s_1684x_int8_sym.bmodel \
    --output dog_int8_sym.jpg

Run the INT8 asymmetric bmodel as follows to get dog_int8_asym.jpg:

$ detect_yolov5.py \
    --input ../image/dog.jpg \
    --model yolov5s_1684x_int8_asym.bmodel \
    --output dog_int8_asym.jpg

The detection result images are generated accordingly (screenshots omitted).

The detection accuracy of the four images can then be compared (screenshots omitted).

Because of differences in the runtime environment, the final results and accuracy may differ slightly from those shown above.

Problem and solution

The following error appeared in the output when converting MLIR to F32 (error screenshot omitted).

Resolution process

Error 1 solution:

[Cause analysis] Because model_transform.py applies yolo post-processing to the file, the shapes of the two generated npz files differ, which causes the error.

[Solution] Modify the model_transform.py command and adjust the processing steps.

(Remove the --post_handle_type yolo option; with the default post-processing, the generated npz file shapes are consistent and the FP32 model is generated successfully.)

[Solution process] Analyze the error and trace its cause step by step.

The runtime error occurs in mlir_shell.py (screenshots omitted).

3. Compile the TFlite format model

First, set up the environment as described in Section 1.

Taking the resnet50_int8.tflite model as an example, this chapter shows how to compile and deploy a TFLite model to run on the BM1684X TPU platform.

(1) Prepare the working directory, model files and data

Create a model_resnet50_tf directory at the same level as tpu-mlir, and put the test image files into it.

The operation is as follows:

$ mkdir model_resnet50_tf && cd model_resnet50_tf
$ cp $TPUC_ROOT/regression/model/resnet50_int8.tflite .
$ cp -rf $TPUC_ROOT/regression/image .
$ mkdir workspace && cd workspace

Here $TPUC_ROOT is the environment variable pointing to the tpu-mlir_xxxx directory.

(2) TFLite to MLIR

The model in this example takes BGR input, with mean 103.939,116.779,123.68 and scale 1.0,1.0,1.0.

The model conversion command is as follows:

$ model_transform.py \
    --model_name resnet50_tf \
    --model_def  ../resnet50_int8.tflite \
    --input_shapes [[1,3,224,224]] \
    --mean 103.939,116.779,123.68 \
    --scale 1.0,1.0,1.0 \
    --pixel_format bgr \
    --test_input ../image/cat.jpg \
    --test_result resnet50_tf_top_outputs.npz \
    --mlir resnet50_tf.mlir

After conversion to an MLIR file, a resnet50_tf_in_f32.npz file is also generated; this is the model's input file (screenshot omitted).

(3) MLIR to bmodel

This model is an asymmetrically quantized TFLite model, so it can be converted to a bmodel with the following parameters:

$ model_deploy.py \
    --mlir resnet50_tf.mlir \
    --quantize INT8 \
    --asymmetric \
    --chip bm1684x \
    --test_input resnet50_tf_in_f32.npz \
    --test_reference resnet50_tf_top_outputs.npz \
    --model resnet50_tf_1684x.bmodel

After compilation completes, a file named resnet50_tf_1684x.bmodel is generated (screenshot omitted).

4. Compile the Caffe format model

First, set up the environment as described in Section 1.

This chapter takes mobilenet_v2_deploy.prototxt and mobilenet_v2.caffemodel as an example to show how to compile and deploy a Caffe model to run on the BM1684X TPU platform.

(1) Prepare the working directory, model files and data

Create a mobilenet_v2 directory at the same level as tpu-mlir, and put both the model files and the image files into it.

The operation is as follows:

$ mkdir mobilenet_v2 && cd mobilenet_v2
$ cp $TPUC_ROOT/regression/model/mobilenet_v2_deploy.prototxt .
$ cp $TPUC_ROOT/regression/model/mobilenet_v2.caffemodel .
$ cp -rf $TPUC_ROOT/regression/dataset/ILSVRC2012 .
$ cp -rf $TPUC_ROOT/regression/image .
$ mkdir workspace && cd workspace

Here $TPUC_ROOT is the environment variable pointing to the tpu-mlir_xxxx directory.

(2) Caffe to MLIR

The model in this example takes BGR input; the mean and scale are 103.94,116.78,123.68 and 0.017,0.017,0.017 respectively.

The model conversion command is as follows:

$ model_transform.py \
    --model_name mobilenet_v2 \
    --model_def ../mobilenet_v2_deploy.prototxt \
    --model_data ../mobilenet_v2.caffemodel \
    --input_shapes [[1,3,224,224]] \
    --resize_dims=256,256 \
    --mean 103.94,116.78,123.68 \
    --scale 0.017,0.017,0.017 \
    --pixel_format bgr \
    --test_input ../image/cat.jpg \
    --test_result mobilenet_v2_top_outputs.npz \
    --mlir mobilenet_v2.mlir

After conversion to an MLIR file, a ${model_name}_in_f32.npz file is also generated; this is the model's input file (screenshot omitted).

(3) MLIR to F32 model

Convert the mlir file to f32 bmodel, the operation method is as follows:

$ model_deploy.py \
    --mlir mobilenet_v2.mlir \
    --quantize F32 \
    --chip bm1684x \
    --test_input mobilenet_v2_in_f32.npz \
    --test_reference mobilenet_v2_top_outputs.npz \
    --tolerance 0.99,0.99 \
    --model mobilenet_v2_1684x_f32.bmodel

After compilation completes, a file named ${model_name}_1684x_f32.bmodel is generated (screenshot omitted).

(4) MLIR to INT8 model

Generate Calibration Table

Before converting to an INT8 model, you need to run calibration to obtain a calibration table. The number of input samples depends on the situation, roughly 100 to 1000 images.

With the calibration table, either a symmetric or an asymmetric bmodel can be generated. If the symmetric model meets the accuracy requirements, the asymmetric model is generally not recommended, because its performance is slightly worse than the symmetric model's.

Here we use the existing 100 pictures from ILSVRC2012 as an example to perform calibration:

$ run_calibration.py mobilenet_v2.mlir \
    --dataset ../ILSVRC2012 \
    --input_num 100 \
    -o mobilenet_v2_cali_table

After the calibration run completes, a file named ${model_name}_cali_table is generated; it is used as the input for the subsequent INT8 model compilation (screenshots omitted).

Compile to INT8 symmetric quantization model

To convert to INT8 symmetric quantization model, execute the following command:

$ model_deploy.py \
    --mlir mobilenet_v2.mlir \
    --quantize INT8 \
    --calibration_table mobilenet_v2_cali_table \
    --chip bm1684x \
    --test_input mobilenet_v2_in_f32.npz \
    --test_reference mobilenet_v2_top_outputs.npz \
    --tolerance 0.96,0.70 \
    --model mobilenet_v2_1684x_int8_sym.bmodel

After compilation completes, a file named ${model_name}_1684x_int8_sym.bmodel is generated.

Compile to INT8 asymmetric quantization model

To convert to INT8 asymmetric quantization model, execute the following command:

$ model_deploy.py \
    --mlir mobilenet_v2.mlir \
    --quantize INT8 \
    --asymmetric \
    --calibration_table mobilenet_v2_cali_table \
    --chip bm1684x \
    --test_input mobilenet_v2_in_f32.npz \
    --test_reference mobilenet_v2_top_outputs.npz \
    --tolerance 0.95,0.69 \
    --model mobilenet_v2_1684x_int8_asym.bmodel

After compilation completes, a file named ${model_name}_1684x_int8_asym.bmodel is generated.

5. How to use mixed precision

First, set up the environment as described in Section 1.

This chapter takes the detection network yolov3_tiny as an example to show how to use mixed precision. The model comes from https://github.com/onnx/models/tree/main/vision/object_detection_segmentation/tiny-yolov3.

(1) Prepare the working directory, model files and data

Create a yolov3_tiny directory at the same level as tpu-mlir, and put both the model file and the image files into it.

The operation is as follows:

$ mkdir yolov3_tiny && cd yolov3_tiny
$ wget https://github.com/onnx/models/raw/main/vision/object_detection_segmentation/tiny-yolov3/model/tiny-yolov3-11.onnx
$ cp -rf $TPUC_ROOT/regression/dataset/COCO2017 .
$ mkdir workspace && cd workspace

Here $TPUC_ROOT is the environment variable pointing to the tpu-mlir_xxxx directory. Note that if the wget download of tiny-yolov3-11.onnx fails, you can download the file from GitHub in a browser and put it into the yolov3_tiny folder.

(2) Verify the original model

detect_yolov3.py is a ready-made verification program that can be used to verify the yolov3_tiny network. Run it as follows:

$ detect_yolov3.py \
     --model ../tiny-yolov3-11.onnx \
     --input ../COCO2017/000000366711.jpg \
     --output yolov3_onnx.jpg

After execution, the detection results are printed as follows:

person:60.7%
orange:77.5%

The image yolov3_onnx.jpg is also produced (screenshot omitted).

(3) Convert to INT8 symmetric quantization model

The conversion follows the method introduced in the previous chapters; the parameters are not described again here, only the steps.

Step 1: Convert to F32 mlir
$ model_transform.py \
    --model_name yolov3_tiny \
    --model_def ../tiny-yolov3-11.onnx \
    --input_shapes [[1,3,416,416]] \
    --scale 0.0039216,0.0039216,0.0039216 \
    --pixel_format rgb \
    --keep_aspect_ratio \
    --pad_value 128 \
    --output_names=transpose_output1,transpose_output \
    --mlir yolov3_tiny.mlir

The yolov3_tiny.mlir file and related outputs are generated (screenshots omitted).

Step 2: Generate calibration table
$ run_calibration.py yolov3_tiny.mlir \
    --dataset ../COCO2017 \
    --input_num 100 \
    -o yolov3_cali_table

The calibration table file yolov3_cali_table is generated (screenshots omitted).

Step 3: Convert to the symmetric quantized model
$ model_deploy.py \
    --mlir yolov3_tiny.mlir \
    --quantize INT8 \
    --calibration_table yolov3_cali_table \
    --chip bm1684x \
    --model yolov3_int8.bmodel

The output file yolov3_int8.bmodel is generated (screenshot omitted).

Step 4: Validate the model
$ detect_yolov3.py \
     --model yolov3_int8.bmodel \
     --input ../COCO2017/000000366711.jpg \
     --output yolov3_int8.jpg

After execution, the following print information is displayed:

person:64.0%
orange:73.0%

The image yolov3_int8.jpg is produced (screenshot omitted).

Compared with the original model, the INT8 symmetric quantization model detects the orange in this image noticeably worse.

(4) Convert to a mixed-precision quantization model

Building on the INT8 symmetric quantization model above, perform the following steps.

Step 1: Generate a mixed-precision quantization table

Use run_qtable.py to generate a mixed-precision quantization table; its parameters are described below:

Parameter          Required?  Description
(positional)       yes        The mlir file to process
dataset            no         Directory of input samples (images, npz, or npy files)
data_list          no         Sample list file; exactly one of dataset or data_list must be given
calibration_table  yes        Input calibration table
chip               yes        Target platform; supports bm1684x/bm1684/cv183x/cv182x/cv181x/cv180x
fp_type            no         Float type used for mixed precision; supports auto, F16, F32, BF16; default auto (chosen automatically by the program)
input_num          no         Number of input samples; default 10
expected_cos       no         Minimum expected cos of the final network output; default 0.99; the smaller the value, the more layers may be set to floating point
min_layer_cos      no         Minimum expected cos of each layer's output; below this value the layer is tried in floating point; default 0.99
debug_cmd          no         Debug command string, for development use; empty by default
o                  yes        Output mixed-precision quantization table

In this example, the default 10 pictures are used for calibration, and the execution command is as follows (for CV18xx series chips, set chip to the corresponding chip name):

$ run_qtable.py yolov3_tiny.mlir \
    --dataset ../COCO2017 \
    --calibration_table yolov3_cali_table \
    --chip bm1684x \
    --min_layer_cos 0.999 \
    --expected_cos 0.9999 \
    -o yolov3_qtable

(Note: min_layer_cos is set to 0.999 here; with the default 0.99, the tool would detect that the original int8 model already satisfies a cos of 0.99 and would stop searching immediately.)

After execution, the final output is printed (it may differ slightly between compilation hosts; screenshot omitted).

In that output, int8 outputs_cos is the cos similarity between the original INT8 network output and the FP32 output; mix model outputs_cos is the cos similarity of the network output after some layers switch to mixed precision; and total time shows that the search took 11.2 seconds. The mixed-precision quantization table yolov3_qtable is also generated (screenshot omitted).

The content of yolov3_qtable looks as follows (screenshot omitted).

In this table, the first column indicates the corresponding layer and the second column indicates the type; the supported types are F32/F16/BF16/INT8. In addition, a loss table file named full_loss_table.txt is generated at the same time.

The content of full_loss_table.txt is shown in the screenshot (omitted).

The table is sorted by cos in ascending order. The cos listed for each layer is computed after its predecessor layers have already been switched to their respective floating-point modes based on their own cos values; if this cos is still smaller than the min_layer_cos parameter above, the layer and its immediate successor are set to floating-point calculation. After setting each such pair of adjacent layers to floating point, run_qtable.py recomputes the output cos of the entire network; if that cos exceeds the specified expected_cos, the search exits. Therefore, setting a larger expected_cos will try to put more layers into floating point. A hedged sketch of this search loop is given below.
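
The following pseudocode sketch illustrates that search; it is not the actual run_qtable.py source, and the callback parameters stand in for the tool's internal per-layer and whole-network similarity computations:

from typing import Callable, Dict, List

def search_mixed_precision(layers: List[str],
                           layer_cos: Callable[[str], float],
                           successors: Callable[[str], List[str]],
                           net_output_cos: Callable[[Dict[str, str]], float],
                           min_layer_cos: float = 0.99,
                           expected_cos: float = 0.99,
                           fp_type: str = "F16") -> Dict[str, str]:
    """Sketch of the mixed-precision search described above."""
    qtable: Dict[str, str] = {}              # layer name -> float type
    for layer in layers:                     # visited in network order
        if layer_cos(layer) >= min_layer_cos:
            continue                         # this layer is acceptable in INT8
        qtable[layer] = fp_type              # switch the layer ...
        for nxt in successors(layer):
            qtable[nxt] = fp_type            # ... and its immediate successors to float
        if net_output_cos(qtable) > expected_cos:
            break                            # whole-network similarity is good enough
    return qtable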

Step 2: Generate a mixed-precision quantization model
$ model_deploy.py \
    --mlir yolov3_tiny.mlir \
    --quantize INT8 \
    --quantize_table yolov3_qtable \
    --calibration_table yolov3_cali_table \
    --chip bm1684x \
    --model yolov3_mix.bmodel

The file yolov3_mix.bmodel and related outputs are generated (screenshot omitted).

Step 3: Validate the mixed precision model
$ detect_yolov3.py \
     --model yolov3_mix.bmodel \
     --input ../COCO2017/000000366711.jpg \
     --output yolov3_mix.jpg

After execution, the printed result is:

person:64.0%
orange:72.9%

The image yolov3_mix.jpg is produced (screenshot omitted).

Note that, besides generating the quantization table with run_qtable, you can also write the quantization table by hand, setting the name and quantization type of each OP that needs mixed precision according to the per-layer similarity comparison results in the model. A hypothetical hand-written entry is shown below.
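
For instance, following the column layout described above (layer name, then quantization type), a hand-written table entry might look like the following; the layer names are purely illustrative:

# op_name        quantize_mode
convolution_10   F16
concat_2         F16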

6. Model deployment example

(1) Use TPU for pre-processing

At present, the two main chip series supported by TPU-MLIR, BM168x and CV18xx, both allow common image preprocessing to be folded into the model. The developer passes the preprocessing parameters through compile options at the model compilation stage, the compiler inserts the corresponding preprocessing operators before the model's operations, and the generated bmodel or cvimodel can then take the raw, un-preprocessed image as input, letting the TPU handle preprocessing together with model inference.

Preprocessing type    BM168x  CV18xx
Image cropping        True    True
Normalization         True    True
NHWC to NCHW          True    True
BGR/RGB conversion    True    True

Among them, image cropping first resizes the image to the size given by the "--resize_dims" parameter of the model_transform tool, and then crops it to the model's input size. The normalization calculation supports directly normalizing image data that has not been preprocessed (that is, data in unsigned int8 format).

To fold preprocessing into the model, add the "--fuse_preprocess" parameter when deploying with the model_deploy tool. If verification is needed, test_input must be given in the original image format (i.e. jpg, jpeg or png); correspondingly, an npz file for the original image input will be generated, named ${model_name}_in_ori.npz.

In addition, when the actual external input format differs from the model's format, use "--customization_format" to specify the actual external input format. The supported formats are:

customization_format  Description                                                 BM1684X  CV18xx
None                  Same as the original model input, no processing (default)  True     True
RGB_PLANAR            rgb order, laid out as nchw                                 True     True
RGB_PACKED            rgb order, laid out as nhwc                                 True     True
BGR_PLANAR            bgr order, laid out as nchw                                 True     True
BGR_PACKED            bgr order, laid out as nhwc                                 True     True
GRAYSCALE             Single gray channel, laid out as nchw                       True     True
YUV420_PLANAR         yuv420 planar format, input from vpss                       False    True
YUV_NV21              NV21 variant of yuv420, input from vpss                     False    True
YUV_NV12              NV12 variant of yuv420, input from vpss                     False    True
RGBA_PLANAR           rgba format, laid out as nchw                               False    True

Among them, the "YUV*" formats are input formats specific to the CV18xx series chips. When the color channel order in customization_format differs from the model input, a channel conversion operation is performed. If the customization_format parameter is not set on the command line, it is derived automatically from the pixel_format and channel_format parameters defined when using the model_transform tool. A hypothetical example that sets customization_format explicitly is shown below.
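
As a hypothetical example (the output file name is chosen for illustration), an input that arrives as packed RGB data while the model was defined with bgr pixel_format could be declared explicitly at deployment time:

$ model_deploy.py \
    --mlir mobilenet_v2.mlir \
    --quantize INT8 \
    --calibration_table mobilenet_v2_cali_table \
    --chip bm1684x \
    --fuse_preprocess \
    --customization_format RGB_PACKED \
    --model mobilenet_v2_bm1684x_int8_sym_rgb_packed.bmodel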

Taking the mobilenet_v2 model as an example (see the chapter "4. Compile the Caffe format model"), use the model_transform tool in the tpu-mlir/regression/regression_out/ directory to generate the original MLIR file, and use the run_calibration tool to generate the calibration table.

(2) BM1684X deployment

The command to generate an INT8 symmetric quantized bmodel with fused preprocessing is as follows:

$ model_deploy.py \
    --mlir mobilenet_v2.mlir \
    --quantize INT8 \
    --calibration_table mobilenet_v2_cali_table \
    --chip bm1684x \
    --test_input ../image/cat.jpg \
    --test_reference mobilenet_v2_top_outputs.npz \
    --tolerance 0.96,0.70 \
    --fuse_preprocess \
    --model mobilenet_v2_bm1684x_int8_sym_fuse_preprocess.bmodel

After the command completes, mobilenet_v2_bm1684x_int8_sym_fuse_preprocess.bmodel and related files are generated (screenshots omitted).

7. Converting models from other frameworks to ONNX (reference)

This chapter describes how to convert PyTorch, TensorFlow and PaddlePaddle models to ONNX. Readers can also refer to the model conversion tutorials provided in the official ONNX repository: https://github.com/onnx/tutorials.

All the operations in this chapter are performed in the Docker container. For the specific environment configuration method, please refer to Chapter 1.

(1) PyTorch model to ONNX

This section takes a self-built simple PyTorch model as an example to perform onnx conversion, and the environment configuration and directory are consistent with Section 1.

Step 0: Create a working directory

Create and enter the torch_model directory on the command line.

$ mkdir torch_model
$ cd torch_model

Step 1: Build and save the model

Create a script named simple_net.py in this directory and run it. The specific content of the script is as follows:

#!/usr/bin/env python3
import torch
 
# Build a simple nn model
class SimpleModel(torch.nn.Module):

    def __init__(self):
        super(SimpleModel, self).__init__()
        self.m1 = torch.nn.Conv2d(3, 8, 3, 1, 0)
        self.m2 = torch.nn.Conv2d(8, 8, 3, 1, 1)

    def forward(self, x):
        y0 = self.m1(x)
        y1 = self.m2(y0)
        y2 = y0 + y1
        return y2

# Create a SimpleModel and save its weight in the current directory
model = SimpleModel()
torch.save(model.state_dict(), "weight.pth")

After running, we will get a weight.pth weight file in the current directory.

Step 2: Export ONNX model

Create another script named export_onnx.py in this directory and run it. The specific content of the script is as follows:

#!/usr/bin/env python3
import torch
from simple_net import SimpleModel

# Load the pretrained model and export it as onnx
model = SimpleModel()
model.eval()
checkpoint = torch.load("weight.pth", map_location="cpu")
model.load_state_dict(checkpoint)

# Prepare input tensor
input = torch.randn(1, 3, 16, 16, requires_grad=True)

# Export the torch model as onnx
torch.onnx.export(model,
                  input,
                  'model.onnx', # name of the exported onnx model
                  opset_version=13,
                  export_params=True,
                  do_constant_folding=True)

After running the script, we can get the onnx model named model.onnx in the current directory.
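
Optionally, the exported model can be sanity-checked with onnxruntime (assuming it is installed in the container). Given the two convolutions above, the expected output shape is (1, 8, 14, 14):

#!/usr/bin/env python3
# Quick optional sanity check of model.onnx with onnxruntime.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")
dummy = np.random.randn(1, 3, 16, 16).astype(np.float32)
out = sess.run(None, {sess.get_inputs()[0].name: dummy})
print(out[0].shape)   # (1, 8, 14, 14): the first 3x3 conv is unpadded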

(2) TensorFlow model to ONNX

This section uses the mobilenet_v1_0.25_224 model provided in the TensorFlow official repository as a conversion example.

Step 0: Create a working directory

Create and enter the tf_model directory on the command line.

$ mkdir tf_model
$ cd tf_model

Step 1: Prepare and convert the model

Download the model with the following command on the command line and use the tf2onnx tool to export it as an ONNX model:

$ wget -nc http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.25_224.tgz
# tar to get "*.pb" model def file
$ tar xzf mobilenet_v1_0.25_224.tgz

$ python -m tf2onnx.convert --graphdef mobilenet_v1_0.25_224_frozen.pb \
    --output mnet_25.onnx --inputs input:0 \
    --inputs-as-nchw input:0 \
    --outputs MobilenetV1/Predictions/Reshape_1:0

After running all the above commands, we can get the onnx model named mnet_25.onnx in the current directory.

(3) PaddlePaddle model to ONNX

This section uses the SqueezeNet1_1 model provided in the official PaddlePaddle repository as a conversion example.

Step 0: Create a working directory

Create and enter the pp_model directory on the command line.

$ mkdir pp_model
$ cd pp_model

Step 1: Prepare the model

Download the model with the following command on the command line:

$ wget https://bj.bcebos.com/paddlehub/fastdeploy/SqueezeNet1_1_infer.tgz
$ tar xzf SqueezeNet1_1_infer.tgz
$ cd SqueezeNet1_1_infer

And use the paddle_infer_shape.py script in the PaddlePaddle project to perform shape inference on the model. Here, the input shape is set to [1,3,224,224] in NCHW format:

$ wget https://raw.githubusercontent.com/PaddlePaddle/Paddle2ONNX/develop/tools/paddle/paddle_infer_shape.py
$ python paddle_infer_shape.py  --model_dir . \
                          --model_filename inference.pdmodel \
                          --params_filename inference.pdiparams \
                          --save_dir new_model \
                          --input_shape_dict="{'inputs':[1,3,224,224]}"

After running all the above commands, we will be in the SqueezeNet1_1_infer directory, and there will be a new_model directory under this directory.

Step 2: Convert the model

Install the paddle2onnx tool through the following command on the command line, and use this tool to convert the PaddlePaddle model to the ONNX model:

$ pip install paddle2onnx
$ paddle2onnx  --model_dir new_model \
          --model_filename inference.pdmodel \
          --params_filename inference.pdiparams \
          --opset_version 13 \
          --save_file squeezenet1_1.onnx

After running all the above commands we will get an onnx model named squeezenet1_1.onnx.

8. BM168x Test Guide

(1) Configure the system environment

If you are using Docker for the first time, please use the method in the development environment configuration to install and configure Docker.

If you are using git-lfs for the first time, execute the following commands to install and configure it.

[Only needed the first time; this is configured on the user's own system, not inside the Docker container.]

$ sudo apt install curl
$ curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
$ sudo apt-get install git-lfs


The test below is run on the BM1684X cloud platform in PCIE mode (screenshot omitted).

(2) Get the model-zoo model

In the directory at the same level as tpu-mlir_xxxx.tar.gz (the tpu-mlir release package), use the following commands to clone the model-zoo project:

$ git clone --depth=1 https://github.com/sophgo/model-zoo
$ cd model-zoo
$ git lfs pull --include "*.onnx,*.jpg,*.JPEG" --exclude=""
$ cd ../


If model-zoo has already been cloned, execute the following commands to sync the models to the latest state:

$ cd model-zoo
$ git pull
$ git lfs pull --include "*.onnx,*.jpg,*.JPEG" --exclude=""
$ cd ../

This process downloads a large amount of data from GitHub. Depending on the network environment, it may take a long time.

(3) Obtain the tpu-perf tool

Download the latest tpu-perf wheel package from https://github.com/sophgo/tpu-perf/releases, for example tpu_perf-x.x.x-py3-none-manylinux2014_x86_64.whl, and place it in the same directory as model-zoo. The directory structure at this point should be as follows:

├── tpu_perf-x.x.x-py3-none-manylinux2014_x86_64.whl
├── tpu-mlir_xxxx
└── model-zoo


(4) Deployment test

Enter the docker container and activate the tpu-mlir environment variables, where XXXX indicates the directory where tpu-mlir is stored.

$ docker exec -it <container_id> /bin/bash
$ source XXXX/XXXX/XXXX/envsetup.sh


Install tpu-perf:

$ pip3 install ../tpu_perf-x.x.x-py3-none-manylinux2014_x86_64.whl

(5) Compile the model

The SDK test content is configured through the config.yaml files in model-zoo. For example, the configuration file for resnet18 is model-zoo/vision/classification/resnet18-v2/config.yaml.

Execute the following command to run all test samples:

$ cd ../model-zoo
$ python3 -m tpu_perf.build --mlir -l full_cases.txt

At this point the following models are compiled:

* efficientnet-lite4
* mobilenet_v2
* resnet18
* resnet50_v2
* shufflenet_v2
* squeezenet1.0
* vgg16
* yolov5s

After the command finishes normally, you will see a newly generated output folder (the test output is in this folder). Change the permissions of the output folder so that it can be accessed from outside Docker:

$ chmod -R a+rw output

(6) PCIE mode running test

Running the test needs to be performed in an environment outside of Docker (here, it is assumed that you have installed and configured the 1684X device and driver), and you can exit the Docker environment:

$ exit

Run the following commands on the PCIE host to test the performance of the generated bmodels.

$ pip3 install ./tpu_perf-*-py3-none-manylinux2014_x86_64.whl
$ cd model-zoo
$ python3 -m tpu_perf.run --mlir -l full_cases.txt

Note: if multiple SOPHGO accelerator cards are installed on the host, you can specify which device to run on by adding --devices id when invoking tpu_perf. For example:

$ python3 -m tpu_perf.run --devices 2 --mlir -l full_cases.txt



Origin blog.csdn.net/lily_19861986/article/details/131213536