reference link

mmdeploy/docs/en/01-how-to-build/linux-x86_64.md at main · open-mmlab/mmdeploy

Toolchains installation

OS: Ubuntu 18.04
CUDA: 11.4

Before installing CUDA, confirm that the driver on your machine supports this CUDA version; NVIDIA publishes a table of the minimum driver version required by each CUDA release. Drivers can be downloaded from https://www.nvidia.com/download/index.aspx.
After confirming this or updating your driver, open the CUDA download page, choose the configuration that matches your machine, and use the download command it gives you to fetch and install the package.

Add the CUDA environment variables to the system:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
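
As a quick sanity check after exporting these variables, a small script can confirm that the CUDA directories really are on the search paths. This is only a minimal sketch and not part of any NVIDIA or MMDeploy tooling: the helper name `missing_cuda_entries` is hypothetical, and the two directories are simply the ones used in the exports above.

```python
import os

# Directories exported above; adjust them if CUDA lives elsewhere on your machine.
EXPECTED = {
    "PATH": "/usr/local/cuda/bin",
    "LD_LIBRARY_PATH": "/usr/local/cuda/lib64",
}


def missing_cuda_entries(env):
    """Return the variable names whose value lacks the expected CUDA directory."""
    missing = []
    for name, directory in EXPECTED.items():
        # Split the colon-separated list and ignore trailing slashes.
        entries = [e.rstrip("/") for e in env.get(name, "").split(":") if e]
        if directory not in entries:
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_cuda_entries(os.environ)
    if problems:
        print("CUDA directory not found in:", ", ".join(problems))
    else:
        print("CUDA paths look fine")
```

If a variable is reported as missing, the exports were probably made in a different shell session; adding them to ~/.bashrc makes them persistent.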

cuDNN: 8.2

cuDNN can be downloaded from https://developer.nvidia.com/rdp/cudnn-archive. This article uses cuDNN 8.2.1, which supports CUDA 11.x. The installation steps are not covered here; follow NVIDIA's cuDNN installation guide.

cmake: version 3.14.0 or later is required. It can be installed as follows:

wget https://github.com/Kitware/CMake/releases/download/v3.20.0/cmake-3.20.0-linux-x86_64.tar.gz
tar -xzvf cmake-3.20.0-linux-x86_64.tar.gz
sudo ln -sf $(pwd)/cmake-3.20.0-linux-x86_64/bin/* /usr/bin/
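
To confirm that the cmake now found on PATH satisfies the 3.14.0 requirement, the version banner can be parsed and compared. The following is a minimal sketch; the function names are hypothetical, and it assumes the banner starts with the usual `cmake version X.Y.Z` line.

```python
import re
import subprocess

MINIMUM = (3, 14, 0)  # requirement stated above


def parse_cmake_version(text):
    """Extract (major, minor, patch) from the output of `cmake --version`."""
    match = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognized cmake version output")
    return tuple(int(part) for part in match.groups())


def cmake_is_new_enough(text, minimum=MINIMUM):
    """Compare the parsed version against the minimum, component by component."""
    return parse_cmake_version(text) >= minimum


def installed_cmake_output():
    """Run the cmake found on PATH and return its version banner."""
    result = subprocess.run(
        ["cmake", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `cmake_is_new_enough(installed_cmake_output())` checks the installed binary. Comparing tuples of integers avoids the pitfall of comparing version strings, where "3.9" would sort after "3.14".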

Dependencies

MMDeploy consists of two parts: the Model Converter and the SDK. The Model Converter converts a model into the file format required by the target inference engine; for TensorRT, for example, it generates a TensorRT engine file. The SDK provides APIs in multiple programming languages for deploying the converted model with the target inference engine in production.

Install dependencies for Model Converter

miniconda

Install miniconda

Download miniconda: open the miniconda download page and find the installer that matches your target machine.
Install miniconda:

Run the installer with the following command and follow the prompts, accepting the defaults. By default, the installer writes the conda initialization into ~/.bashrc.

sudo bash Miniconda3-latest-Linux-x86_64.sh

Activate the conda environment

Reload ~/.bashrc with the following command to activate conda and enter the base environment:

source ~/.bashrc

Switch conda to a domestic mirror (Tsinghua TUNA)

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes

To use the mirror for a single installation only, pass it with -c:

conda install -y -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/linux-64 opencv tensorboardX

Create a virtual environment for mmdeploy

conda create -n mmdeploy python=3.8 -y
conda activate mmdeploy

pytorch: 1.10.0

Install PyTorch 1.10.0 with the following command. The cudatoolkit it pulls in is 11.3 while the CUDA installed on the system is 11.4; this mismatch did not affect the subsequent tests.

conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
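
After the install finishes, it can be useful to check which CUDA version PyTorch was built against and whether it can see the GPU. The snippet below is a small sketch; `describe_torch` is a hypothetical helper name, and it relies only on `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()`.

```python
def describe_torch():
    """Return basic PyTorch/CUDA facts, or None if PyTorch is not importable."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    info = describe_torch()
    if info is None:
        print("PyTorch is not installed in this environment")
    else:
        print(info)
```

With the command above, `built_for_cuda` should report the cudatoolkit version (11.3) rather than the system CUDA version (11.4), which matches the mismatch noted earlier.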

mmcv

export cu_version=cu114 # cuda 11.4
export torch_version=torch1.10
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0rc2"

Install dependencies for SDK

OpenCV

sudo apt-get install libopencv-dev

pplcv

git clone https://github.com/openppl-public/ppl.cv.git
cd ppl.cv
export PPLCV_DIR=$(pwd)
git checkout tags/v0.7.0 -b v0.7.0
./build.sh cuda

Install the inference engine

MMDeploy supports running inference on different backends. This article tests the TensorRT backend and therefore uses the installation of the TensorRT inference engine as its example. For other inference backends, refer to the MMDeploy documentation.

TensorRT

Download the TensorRT tar file and extract it:

tar -xvf TensorRT-8.2.4.2.Linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz

Add environment variables

# <<< TensorRT 8 <<<

export PATH=/home/xxxx/local/TensorRT-8.2.4.2/bin:$PATH
export LD_LIBRARY_PATH=/home/xxxx/local/TensorRT-8.2.4.2/lib:$LD_LIBRARY_PATH
export C_INCLUDE_PATH=/home/xxxx/local/TensorRT-8.2.4.2/include:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=/home/xxxx/local/TensorRT-8.2.4.2/include:$CPLUS_INCLUDE_PATH

Install the TensorRT python package

cd python
# Install the wheel that matches your Python version
pip install tensorrt-8.2.4.2-cp38-none-linux_x86_64.whl
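
The `cp38` part of the wheel name is the CPython tag, so it has to match the interpreter of the active conda environment (Python 3.8 here). As a rough sketch, the expected file name can be derived from the running interpreter. The naming pattern below is inferred only from the single file name shown above, and `tensorrt_wheel_name` is a hypothetical helper; check the actual contents of the python directory.

```python
import sys


def tensorrt_wheel_name(trt_version="8.2.4.2", version_info=sys.version_info):
    """Build the wheel file name expected for a given Python version."""
    # CPython tag, e.g. (3, 8) -> "cp38"
    tag = f"cp{version_info[0]}{version_info[1]}"
    return f"tensorrt-{trt_version}-{tag}-none-linux_x86_64.whl"


if __name__ == "__main__":
    print("Wheel for this interpreter:", tensorrt_wheel_name())
```

If the printed name does not exist in the python directory, the TensorRT package probably does not ship a wheel for that Python version, and the conda environment should be created with a supported one.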

Install pycuda

conda install -c conda-forge pycuda

Compile mmdeploy

Compiling MMDeploy involves two parts, the Model Converter and the SDK. The converter turns a PyTorch model into the file required by the chosen backend inference engine and can test its performance on different datasets; the SDK provides APIs in several programming languages for deploying the backend engine in production.

Model Converter

Compiling the Model Converter takes two steps: compiling the backend custom operators, and installing the mmdeploy-python library.

Compile tensorrt operator

cd ${MMDEPLOY_DIR}
mkdir -p build && cd build
cmake -DCMAKE_CXX_COMPILER=g++-7 -DMMDEPLOY_TARGET_BACKENDS=trt -DTENSORRT_DIR=${TENSORRT_DIR} -DCUDNN_DIR=${CUDNN_DIR} ..
make -j$(nproc) && make install

After a successful build, the dynamic library libmmdeploy_tensorrt_ops.so is generated. During model conversion, mmdeploy-python rewrites the model's functions, modules, and symbolic functions, and the converted model needs the TensorRT custom operators contained in this library.
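
Before moving on, one can verify that the build really produced the operator library. The sketch below searches the MMDeploy source tree for it; `find_trt_ops` is a hypothetical helper, and the only assumption is the library file name mentioned above.

```python
from pathlib import Path

OPS_LIBRARY = "libmmdeploy_tensorrt_ops.so"


def find_trt_ops(mmdeploy_dir):
    """Return every copy of the TensorRT custom-op library under mmdeploy_dir."""
    return sorted(Path(mmdeploy_dir).rglob(OPS_LIBRARY))
```

For example, `find_trt_ops(os.environ["MMDEPLOY_DIR"])` (after `import os`) lists the locations found; an empty list suggests that `make install` did not run or that the TensorRT backend was not enabled in the cmake step.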

Install mmdeploy-python

cd ${MMDEPLOY_DIR}
mim install -e .

After the installation is complete, the models supported by MMDeploy can be converted. The SDK described next is not required for model conversion.

Build SDK and Demo

This step builds the deployment SDK provided by MMDeploy. Once a TensorRT engine has been produced with mmdeploy-python, the model can be deployed directly through this SDK.

cmake -DCMAKE_CXX_COMPILER=g++-7 \
-DMMDEPLOY_TARGET_BACKENDS=trt \
-DTENSORRT_DIR=/home/xxxx/local/TensorRT-8.2.4.2 \
-DCUDNN_DIR=/home/xxxx/local/cuda/ \
-DMMDEPLOY_BUILD_SDK=ON -DMMDEPLOY_BUILD_SDK_PYTHON_API=ON \
-DMMDEPLOY_BUILD_EXAMPLES=ON -DMMDEPLOY_TARGET_DEVICES="cuda;cpu" ..

Model conversion

This article compares the performance of several common models, Faster R-CNN, Cascade R-CNN, Mask R-CNN and DeepLabv3, before and after conversion. For how to convert models and test their performance, refer to the corresponding MMDeploy documentation.

benchmark

MMDeploy supports converting many types of models. For their running time and accuracy, refer to the official MMDeploy benchmark.

mmdet

For mmdetection, YOLOv3, Faster R-CNN, Cascade R-CNN and Mask R-CNN were tested.
mmdetection/docs/en/model_zoo.md at main · open-mmlab/mmdetection
A numpy version problem came up when installing MMDetection; the solution was to compile and install cocoapi from source.

git clone https://github.com/pdollar/coco.git

Yolov3

The engine conversion script is as follows (conversion with a static shape works, but conversion with a dynamic shape runs into problems):

#!/bin/bash

export MMDET_ROOT=/home/xxxx/workspace/mmlab/mmdetection
python ./tools/deploy.py \
    configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py \
    $MMDET_ROOT/configs/yolo/yolov3_d53_8xb8-ms-608-273e_coco.py \
    $MMDET_ROOT/exp/yolov3_d53_mstrain-608_273e_coco_20210518_115020-a2c3acb8.pth \
    $MMDET_ROOT/demo/demo.jpg \
    --work-dir work_dir \
    --device cuda:0

YOLOv3 was only converted to a TensorRT engine; no performance comparison was run for it.

Faster R-CNN

The engine conversion script is similar to the one for YOLOv3; only the deploy_config and model_config file paths need to be replaced.

#!/bin/bash

export MMDET_ROOT=/home/xxxx/workspace/mmlab/mmdetection
python ./tools/deploy.py \
    configs/mmdet/detection/faster_rcnn_tensorrt_static-800x1344.py \
    $MMDET_ROOT/configs/faster_rcnn/faster-rcnn_r50_fpn_2x_coco.py \
    $MMDET_ROOT/exp/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
    $MMDET_ROOT/demo/demo.jpg \
    --work-dir work_dir \
    --device cuda:0

Test its performance on the COCO2017-val dataset.

pytorch

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.384
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.590
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.420
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.215
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.421
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.503
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.520
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.520
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.520
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.326
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.557
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.661
07/05 14:00:14 - mmengine - INFO - bbox_mAP_copypaste: 0.384 0.590 0.420 0.215 0.421 0.503
07/05 14:00:16 - mmengine - INFO - Results has been saved to exp/results.pkl.
07/05 14:00:16 - mmengine - INFO - Epoch(test) [5000/5000]    
coco/bbox_mAP: 0.3840  
coco/bbox_mAP_50: 0.5900  
coco/bbox_mAP_75: 0.4200  
coco/bbox_mAP_s: 0.2150  
coco/bbox_mAP_m: 0.4210  
coco/bbox_mAP_l: 0.5030  
data_time: 0.0028  time: 0.0402

tensorrt-fp32

After the ONNX model has been converted to the corresponding engine file, its accuracy and speed can be tested with tools/test.py.

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.384
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.590
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.419
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.215
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.421
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.502
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.519
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.519
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.519
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.325
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.556
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.661
07/05 14:38:00 - mmengine - INFO - bbox_mAP_copypaste: 0.384 0.590 0.419 0.215 0.421 0.502
07/05 14:38:01 - mmengine - INFO - Epoch(test) [5000/5000]    
coco/bbox_mAP: 0.3840  coco/bbox_mAP_50: 0.5900  
coco/bbox_mAP_75: 0.4190  coco/bbox_mAP_s: 0.2150  
coco/bbox_mAP_m: 0.4210  coco/bbox_mAP_l: 0.5020  
data_time: 0.0030  time: 0.0297

tensorrt-fp16

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.384
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.590
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.418
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.215
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.420
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.501
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.519
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.519
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.519
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.325
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.556
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.659
07/05 14:58:28 - mmengine - INFO - bbox_mAP_copypaste: 0.384 0.590 0.418 0.215 0.420 0.501
07/05 14:58:29 - mmengine - INFO - Epoch(test) [5000/5000]    
coco/bbox_mAP: 0.3840  coco/bbox_mAP_50: 0.5900  
coco/bbox_mAP_75: 0.4180  coco/bbox_mAP_s: 0.2150  
coco/bbox_mAP_m: 0.4200  coco/bbox_mAP_l: 0.5010  
data_time: 0.0028  time: 0.0172

Runtime test

tensorrt-fp16 (engine inference time only)

07/05 15:20:09 - mmengine - INFO - [tensorrt]-4750 times per count: 10.58 ms, 94.51 FPS
07/05 15:20:09 - mmengine - INFO - Epoch(test) [4750/5000]    eta: 0:00:04  time: 0.0146  data_time: 0.0017  memory: 38  
07/05 15:20:10 - mmengine - INFO - Epoch(test) [4800/5000]    eta: 0:00:03  time: 0.0208  data_time: 0.0048  memory: 39  
07/05 15:20:11 - mmengine - INFO - [tensorrt]-4850 times per count: 10.58 ms, 94.54 FPS
07/05 15:20:11 - mmengine - INFO - Epoch(test) [4850/5000]    eta: 0:00:02  time: 0.0173  data_time: 0.0031  memory: 39  
07/05 15:20:12 - mmengine - INFO - Epoch(test) [4900/5000]    eta: 0:00:01  time: 0.0142  data_time: 0.0016  memory: 38  
07/05 15:20:13 - mmengine - INFO - [tensorrt]-4950 times per count: 10.58 ms, 94.55 FPS
07/05 15:20:13 - mmengine - INFO - Epoch(test) [4950/5000]    eta: 0:00:00  time: 0.0197  data_time: 0.0024  memory: 39  
07/05 15:20:13 - mmengine - INFO - Epoch(test) [5000/5000]    eta: 0:00:00  time: 0.0148  data_time: 0.0016  memory: 39
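
The latency figures used later in the statistics tables come from the `[tensorrt]-N times per count` lines in logs like the one above. Rather than reading them off by eye, they can be extracted with a regular expression. This is a minimal sketch whose pattern is based only on the log lines shown in this article; `parse_timings` is a hypothetical helper, not an MMDeploy function.

```python
import re

# Matches lines such as:
# "... [tensorrt]-4750 times per count: 10.58 ms, 94.51 FPS"
TIMING = re.compile(
    r"\[(?P<backend>\w+)\]-(?P<count>\d+) times per count: "
    r"(?P<ms>[\d.]+) ms, (?P<fps>[\d.]+) FPS"
)


def parse_timings(log_text):
    """Return (count, latency_ms, fps) tuples for every timing line found."""
    return [
        (int(m["count"]), float(m["ms"]), float(m["fps"]))
        for m in TIMING.finditer(log_text)
    ]
```

Feeding the saved test log into `parse_timings` gives one tuple per timing line, and the progress lines (`Epoch(test) [...]`) are ignored because they do not match the pattern.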

Similarly, once the TensorRT engine is available, the trtexec tool can load it and measure its inference time. The trtexec command is:

trtexec --loadEngine=work_dir/faster_rcnn_dynamic_fp16/end2end.engine --iterations=5000 --plugins=mmdeploy/lib/libmmdeploy_tensorrt_ops.so --minShapes=input:1x3x320x320 --optShapes=input:1x3x800x1344 --maxShapes=input:1x3x1344x1344 --shapes=input:1x3x800x1344 --fp16 --workspace=8000

tensorrt-fp32

07/05 15:35:09 - mmengine - INFO - [tensorrt]-4750 times per count: 21.97 ms, 45.51 FPS
07/05 15:35:09 - mmengine - INFO - Epoch(test) [4750/5000]    eta: 0:00:07  time: 0.0252  data_time: 0.0016  memory: 38  
07/05 15:35:10 - mmengine - INFO - Epoch(test) [4800/5000]    eta: 0:00:06  time: 0.0262  data_time: 0.0016  memory: 39  
07/05 15:35:11 - mmengine - INFO - [tensorrt]-4850 times per count: 21.96 ms, 45.53 FPS
07/05 15:35:11 - mmengine - INFO - Epoch(test) [4850/5000]    eta: 0:00:04  time: 0.0283  data_time: 0.0024  memory: 39  
07/05 15:35:13 - mmengine - INFO - Epoch(test) [4900/5000]    eta: 0:00:03  time: 0.0262  data_time: 0.0018  memory: 38  
07/05 15:35:14 - mmengine - INFO - [tensorrt]-4950 times per count: 21.97 ms, 45.52 FPS
07/05 15:35:14 - mmengine - INFO - Epoch(test) [4950/5000]    eta: 0:00:01  time: 0.0304  data_time: 0.0029  memory: 39  
07/05 15:35:16 - mmengine - INFO - Epoch(test) [5000/5000]    eta: 0:00:00  time: 0.0363  data_time: 0.0033  memory: 39

pytorch

07/06 14:49:56 - mmengine - INFO - (GB) mem_used: 81.76 | uss: 3.89 | pss: 3.96 | total_proc: 1
07/06 14:50:45 - mmengine - INFO - ==================================
07/06 14:50:45 - mmengine - INFO - Done image [1000/5000], fps: 27.6 img/s, times per image: 36.2 ms/img, cuda memory: 520 MB
07/06 14:50:45 - mmengine - INFO - (GB) mem_used: 78.25 | uss: 4.74 | pss: 4.81 | total_proc: 1
07/06 14:51:32 - mmengine - INFO - ==================================
07/06 14:51:32 - mmengine - INFO - Done image [2000/5000], fps: 27.8 img/s, times per image: 36.0 ms/img, cuda memory: 520 MB
07/06 14:51:33 - mmengine - INFO - (GB) mem_used: 84.05 | uss: 4.74 | pss: 4.81 | total_proc: 1
07/06 14:52:21 - mmengine - INFO - ==================================
07/06 14:52:21 - mmengine - INFO - Done image [3000/5000], fps: 27.8 img/s, times per image: 36.0 ms/img, cuda memory: 534 MB
07/06 14:52:22 - mmengine - INFO - (GB) mem_used: 83.13 | uss: 4.74 | pss: 4.81 | total_proc: 1
07/06 14:53:09 - mmengine - INFO - ==================================
07/06 14:53:09 - mmengine - INFO - Done image [4000/5000], fps: 27.8 img/s, times per image: 36.0 ms/img, cuda memory: 520 MB
07/06 14:53:09 - mmengine - INFO - (GB) mem_used: 82.96 | uss: 4.74 | pss: 4.81 | total_proc: 1
07/06 14:53:56 - mmengine - INFO - ==================================
07/06 14:53:56 - mmengine - INFO - Done image [5000/5000], fps: 27.7 img/s, times per image: 36.0 ms/img, cuda memory: 534 MB
07/06 14:53:56 - mmengine - INFO - (GB) mem_used: 83.42 | uss: 4.74 | pss: 4.81 | total_proc: 1
07/06 14:53:56 - mmengine - INFO - ============== Done ==================
07/06 14:53:56 - mmengine - INFO - Overall fps: 27.7 img/s, times per image: 36.1 ms/img
07/06 14:53:56 - mmengine - INFO - cuda memory: 163 MB
07/06 14:53:57 - mmengine - INFO - (GB) mem_used: 83.29 | uss: 4.74 | pss: 4.81 | total_proc: 1

statistics

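+------------------+-------+--------------+-------------+
| Faster R-CNN R50 | mAP   | latency (ms) | memory (GB) |
+------------------+-------+--------------+-------------+
| pytorch          | 0.384 | 36.1         | 3.903       |
| tensorrt-fp32    | 0.384 | 21.96        | 4.243       |
| tensorrt-fp16    | 0.384 | 10.58        | 3.634       |
+------------------+-------+--------------+-------------+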

Cascade R-CNN

performance comparison

pytorch

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.403
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.586
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.440
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.225
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.438
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.529
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.543
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.543
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.543
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.333
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.582
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.689
07/07 09:41:21 - mmengine - INFO - bbox_mAP_copypaste: 0.403 0.586 0.440 0.225 0.438 0.529
07/07 09:41:23 - mmengine - INFO - Results has been saved to exp/results.pkl.
07/07 09:41:23 - mmengine - INFO - Epoch(test) [5000/5000]    
coco/bbox_mAP: 0.4030  coco/bbox_mAP_50: 0.5860  
coco/bbox_mAP_75: 0.4400  coco/bbox_mAP_s: 0.2250  
coco/bbox_mAP_m: 0.4380  coco/bbox_mAP_l: 0.5290  
data_time: 0.0028  time: 0.0474

tensorrt-fp32

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.403
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.586
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.439
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.225
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.437
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.529
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.543
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.543
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.543
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.333
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.581
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.689
07/07 10:06:05 - mmengine - INFO - bbox_mAP_copypaste: 0.403 0.586 0.439 0.225 0.437 0.529
07/07 10:06:05 - mmengine - INFO - Epoch(test) [5000/5000]    
coco/bbox_mAP: 0.4030  coco/bbox_mAP_50: 0.5860  
coco/bbox_mAP_75: 0.4390  coco/bbox_mAP_s: 0.2250  
coco/bbox_mAP_m: 0.4370  coco/bbox_mAP_l: 0.5290  
data_time: 0.0022  time: 0.0310

07/07 10:09:36 - mmengine - INFO - [tensorrt]-3050 times per count: 25.29 ms, 39.54 FPS
07/07 10:10:08 - mmengine - INFO - [tensorrt]-4050 times per count: 25.33 ms, 39.48 FPS

tensorrt-fp16

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.403
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.586
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.439
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.225
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.438
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.531
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.542
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.542
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.542
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.333
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.581
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.691
07/07 10:23:56 - mmengine - INFO - bbox_mAP_copypaste: 0.403 0.586 0.439 0.225 0.438 0.531
07/07 10:23:56 - mmengine - INFO - Epoch(test) [5000/5000]    
coco/bbox_mAP: 0.4030  coco/bbox_mAP_50: 0.5860  
coco/bbox_mAP_75: 0.4390  coco/bbox_mAP_s: 0.2250  
coco/bbox_mAP_m: 0.4380  coco/bbox_mAP_l: 0.5310  
data_time: 0.0026  time: 0.0192

07/07 10:22:42 - mmengine - INFO - [tensorrt]-3050 times per count: 13.16 ms, 75.96 FPS
07/07 10:23:01 - mmengine - INFO - [tensorrt]-4050 times per count: 13.15 ms, 76.03 FPS

statistics

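+-------------------+--------+--------------+-------------+
| Cascade R-CNN R50 | mAP    | latency (ms) | memory (GB) |
+-------------------+--------+--------------+-------------+
| pytorch           | 0.4030 | 40.3         | 4.124       |
| tensorrt-fp32     | 0.4030 | 25.33        | 4.347       |
| tensorrt-fp16     | 0.4030 | 13.16        | 3.757       |
+-------------------+--------+--------------+-------------+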

Mask R-CNN

Important: when converting an instance segmentation model, use a deploy config from mmdeploy/configs/mmdet/instance-seg.

performance comparison

pytorch

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.354
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.564
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.380
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.166
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.382
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.525
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.481
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.481
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.481
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.283
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.515
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.646
07/07 16:21:43 - mmengine - INFO - segm_mAP_copypaste: 0.354 0.564 0.380 0.166 0.382 0.525
07/07 16:21:45 - mmengine - INFO - Results has been saved to exp/results.pkl.
07/07 16:21:45 - mmengine - INFO - Epoch(test) [5000/5000]    
coco/bbox_mAP: 0.3920  coco/bbox_mAP_50: 0.5960  
coco/bbox_mAP_75: 0.4280  coco/bbox_mAP_s: 0.2290  
coco/bbox_mAP_m: 0.4260  coco/bbox_mAP_l: 0.5120  
coco/segm_mAP: 0.3540  coco/segm_mAP_50: 0.5640  
coco/segm_mAP_75: 0.3800  coco/segm_mAP_s: 0.1660  
coco/segm_mAP_m: 0.3820  coco/segm_mAP_l: 0.5250  
data_time: 0.0225  time: 0.1063

Overall fps: 26.4 img/s, times per image: 37.9 ms/img

tensorrt-fp32

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.354
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.564
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.380
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.166
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.381
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.525
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.481
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.481
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.481
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.283
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.514
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.646
07/08 16:01:46 - mmengine - INFO - segm_mAP_copypaste: 0.354 0.564 0.380 0.166 0.381 0.525
07/08 16:01:47 - mmengine - INFO - Epoch(test) [5000/5000]    
coco/bbox_mAP: 0.3910  coco/bbox_mAP_50: 0.5960  
coco/bbox_mAP_75: 0.4280  coco/bbox_mAP_s: 0.2290  
coco/bbox_mAP_m: 0.4250  coco/bbox_mAP_l: 0.5110  
coco/segm_mAP: 0.3540  coco/segm_mAP_50: 0.5640  
coco/segm_mAP_75: 0.3800  coco/segm_mAP_s: 0.1660  
coco/segm_mAP_m: 0.3810  coco/segm_mAP_l: 0.5250  data_time: 0.0206  time: 0.0796

07/08 15:55:20 - mmengine - INFO - [tensorrt]-1050 times per count: 25.77 ms, 38.80 FPS
07/08 15:56:37 - mmengine - INFO - [tensorrt]-2050 times per count: 25.80 ms, 38.76 FPS
07/08 15:59:15 - mmengine - INFO - [tensorrt]-4050 times per count: 25.85 ms, 38.69 FPS

tensorrt-fp16

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.354
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.564
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.380
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.166
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.381
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.524
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.481
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.481
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.481
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.283
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.514
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.646
07/08 16:29:03 - mmengine - INFO - segm_mAP_copypaste: 0.354 0.564 0.380 0.166 0.381 0.524
07/08 16:29:04 - mmengine - INFO - Epoch(test) [5000/5000]    
coco/bbox_mAP: 0.3920  coco/bbox_mAP_50: 0.5950  
coco/bbox_mAP_75: 0.4290  coco/bbox_mAP_s: 0.2280  
coco/bbox_mAP_m: 0.4270  coco/bbox_mAP_l: 0.5100  
coco/segm_mAP: 0.3540  coco/segm_mAP_50: 0.5640  
coco/segm_mAP_75: 0.3800  coco/segm_mAP_s: 0.1660  
coco/segm_mAP_m: 0.3810  coco/segm_mAP_l: 0.5240  data_time: 0.0200  time: 0.0663

07/08 16:25:37 - mmengine - INFO - [tensorrt]-3050 times per count: 12.55 ms, 79.70 FPS
07/08 16:26:43 - mmengine - INFO - [tensorrt]-4050 times per count: 12.55 ms, 79.69 FPS

statistics

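+----------------+----------+----------+--------------+-------------+
| Mask R-CNN R50 | bbox_mAP | segm_mAP | latency (ms) | memory (GB) |
+----------------+----------+----------+--------------+-------------+
| pytorch        | 0.3920   | 0.3540   | 37.9         | 4.355       |
| tensorrt-fp32  | 0.3910   | 0.3540   | 25.85        | 4.636       |
| tensorrt-fp16  | 0.3920   | 0.3540   | 12.55        | 4.325       |
+----------------+----------+----------+--------------+-------------+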

mmseg

cityscapes dataset preparation

Refer to the link below to prepare the cityscapes dataset.
mmsegmentation/docs/en/user_guides/2_dataset_prepare.md at main open-mmlab/mmsegmentation
In addition, for the cityscapes dataset, the following operations are required:
Screenshot 2023-07-10 10.33.50.png

deeplabv3

Test the results on the cityscapes dataset
Tutorial 4: Train and test with existing models — MMSegmentation 1.1.0 documentation
The official deeplabv3 results under different configurations are listed at:
mmsegmentation/configs/deeplabv3 at main open-mmlab/mmsegmentation

pytorch

+---------------+-------+-------+
|     Class     |  IoU  |  Acc  |
+---------------+-------+-------+
|      road     | 98.18 | 98.88 |
|    sidewalk   |  85.3 | 93.17 |
|    building   | 92.71 | 96.52 |
|      wall     | 53.12 | 58.34 |
|     fence     | 61.57 | 71.94 |
|      pole     | 65.34 | 77.46 |
| traffic light | 71.21 | 82.56 |
|  traffic sign | 79.58 | 87.34 |
|   vegetation  | 92.47 | 96.84 |
|    terrain    | 64.42 | 72.11 |
|      sky      | 94.46 | 98.12 |
|     person    | 82.53 | 92.13 |
|     rider     | 62.77 | 75.76 |
|      car      | 95.37 | 97.82 |
|     truck     | 82.07 |  92.3 |
|      bus      | 89.74 | 93.15 |
|     train     | 84.59 | 89.49 |
|   motorcycle  | 69.78 |  81.5 |
|    bicycle    | 78.04 | 89.03 |
+---------------+-------+-------+
07/07 11:53:43 - mmengine - INFO - Iter(test) [500/500]    
aAcc: 96.1700  mIoU: 79.1200  mAcc: 86.5500  data_time: 0.0068  time: 0.1914

Time: fps: 6.01 img/s
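
The summary line of the log can be cross-checked against the per-class table: mIoU is the mean of the IoU column and mAcc the mean of the Acc column. The sketch below parses a table in the format printed above and recomputes both; the function names are hypothetical and the parser assumes exactly three columns (Class, IoU, Acc).

```python
import re

# One data row looks like: "|      road     | 98.18 | 98.88 |"
ROW = re.compile(
    r"^\|[ \t]*(.+?)[ \t]*\|[ \t]*([\d.]+)[ \t]*\|[ \t]*([\d.]+)[ \t]*\|[ \t]*$",
    re.MULTILINE,
)


def parse_class_table(text):
    """Map each class name to its (IoU, Acc) pair; header and border rows are skipped."""
    return {name: (float(iou), float(acc)) for name, iou, acc in ROW.findall(text)}


def mean_scores(text):
    """Return (mIoU, mAcc) as plain averages over all parsed classes."""
    rows = parse_class_table(text)
    ious = [iou for iou, _ in rows.values()]
    accs = [acc for _, acc in rows.values()]
    return sum(ious) / len(ious), sum(accs) / len(accs)
```

Applied to the 19 classes of the PyTorch table above, the averages come out at about 79.12 and 86.55, which agrees with the reported mIoU and mAcc.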

tensorrt-fp32

+---------------+-------+-------+
|     Class     |  IoU  |  Acc  |
+---------------+-------+-------+
|      road     | 98.18 | 98.88 |
|    sidewalk   |  85.3 | 93.17 |
|    building   | 92.71 | 96.52 |
|      wall     | 53.12 | 58.34 |
|     fence     | 61.57 | 71.94 |
|      pole     | 65.34 | 77.46 |
| traffic light | 71.21 | 82.56 |
|  traffic sign | 79.58 | 87.34 |
|   vegetation  | 92.47 | 96.84 |
|    terrain    | 64.42 |  72.1 |
|      sky      | 94.46 | 98.12 |
|     person    | 82.53 | 92.13 |
|     rider     | 62.76 | 75.76 |
|      car      | 95.37 | 97.82 |
|     truck     | 82.07 |  92.3 |
|      bus      | 89.74 | 93.15 |
|     train     | 84.59 | 89.49 |
|   motorcycle  | 69.78 |  81.5 |
|    bicycle    | 78.04 | 89.03 |
+---------------+-------+-------+
07/07 14:28:39 - mmengine - INFO - Epoch(test) [500/500]    
aAcc: 96.1700  mIoU: 79.1200  mAcc: 86.5500  data_time: 0.0069  time: 0.1515

07/07 14:27:57 - mmengine - INFO - [tensorrt]-205 times per count: 131.08 ms, 7.63 FPS
07/07 14:28:11 - mmengine - INFO - [tensorrt]-305 times per count: 131.18 ms, 7.62 FPS
07/07 14:28:25 - mmengine - INFO - [tensorrt]-405 times per count: 131.46 ms, 7.61 FPS

tensorrt-fp16

+---------------+-------+-------+
|     Class     |  IoU  |  Acc  |
+---------------+-------+-------+
|      road     | 98.18 | 98.88 |
|    sidewalk   |  85.3 | 93.17 |
|    building   | 92.71 | 96.53 |
|      wall     | 53.15 | 58.38 |
|     fence     | 61.58 | 71.95 |
|      pole     | 65.34 | 77.46 |
| traffic light | 71.21 | 82.55 |
|  traffic sign | 79.58 | 87.33 |
|   vegetation  | 92.47 | 96.84 |
|    terrain    | 64.41 | 72.07 |
|      sky      | 94.47 | 98.12 |
|     person    | 82.53 | 92.12 |
|     rider     | 62.76 | 75.77 |
|      car      | 95.37 | 97.82 |
|     truck     |  82.1 | 92.31 |
|      bus      | 89.74 | 93.14 |
|     train     | 84.58 | 89.49 |
|   motorcycle  | 69.76 | 81.49 |
|    bicycle    | 78.04 | 89.01 |
+---------------+-------+-------+
07/07 14:45:04 - mmengine - INFO - Epoch(test) [500/500]    
aAcc: 96.1700  mIoU: 79.1200  mAcc: 86.5500  
data_time: 0.0361  time: 0.0830

07/07 14:44:40 - mmengine - INFO - [tensorrt]-205 times per count: 36.42 ms, 27.46 FPS
07/07 14:44:48 - mmengine - INFO - [tensorrt]-305 times per count: 36.49 ms, 27.40 FPS
07/07 14:44:56 - mmengine - INFO - [tensorrt]-405 times per count: 36.41 ms, 27.47 FPS

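+---------------+-------+--------------+-------------+
| deeplabv3     | mIoU  | latency (ms) | memory (GB) |
+---------------+-------+--------------+-------------+
| pytorch       | 79.12 | 166.39       | 12.587      |
| tensorrt-fp32 | 79.12 | 131.46       | 5.308       |
| tensorrt-fp16 | 79.12 | 36.49        | 4.032       |
+---------------+-------+--------------+-------------+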

Summary

This article describes in detail how to install MMDeploy, including the version of each component. It then compares the performance of several types of models before and after conversion to a TensorRT engine. The evaluation metrics are essentially unchanged after conversion, and with FP16 inference runs about 3 to 4.6 times faster than in PyTorch. In my experience, deploying MMDetection and MMSegmentation models with MMDeploy is efficient and compatible, and I recommend it.
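
The speedup figures can be recomputed from the latency columns of the statistics tables. The sketch below does only that arithmetic; the numbers are copied from the tables in this article (PyTorch latency and TensorRT FP16 latency, in milliseconds), and the function names are hypothetical.

```python
# (PyTorch latency, TensorRT FP16 latency) in milliseconds, from the tables above.
LATENCY_MS = {
    "Faster R-CNN": (36.1, 10.58),
    "Cascade R-CNN": (40.3, 13.16),
    "Mask R-CNN": (37.9, 12.55),
    "DeepLabv3": (166.39, 36.49),
}


def speedup(baseline_ms, optimized_ms):
    """How many times faster the optimized run is than the baseline."""
    return baseline_ms / optimized_ms


def fp16_speedups(table=LATENCY_MS):
    """Speedup of TensorRT FP16 over PyTorch for each model, rounded to 2 decimals."""
    return {name: round(speedup(pt, fp16), 2) for name, (pt, fp16) in table.items()}


if __name__ == "__main__":
    for name, factor in fp16_speedups().items():
        print(f"{name}: {factor:.2f}x")
```

The detection models come out at roughly 3.0x to 3.4x and DeepLabv3 at roughly 4.6x. Keep in mind that the TensorRT numbers are engine-only inference times, so the ratios are indicative rather than a strict end-to-end comparison.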
