reference link
mmdeploy/docs/en/01-how-to-build/linux-x86_64.md at main · open-mmlab/mmdeploy
Toolchains installation
- OS: Ubuntu 18.04
- CUDA: 11.4
Before installing CUDA, confirm that your machine's driver supports this CUDA version. You can open the link to view the relationship between the CUDA version and the driver. Drivers can be downloaded from https://www.nvidia.com/download/index.aspx
After confirming this (or updating your driver), open the link, choose the configuration that matches your machine, get the download command, then download the file and install it.
Add the CUDA environment variables to the system:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
- cudnn: 8.2
cudnn is available from https://developer.nvidia.com/rdp/cudnn-archive (this article uses version 8.2.1, which is built for CUDA 11.x). The installation process is not covered here; please look it up separately.
- cmake: version 3.14.0 or later is required. It can be installed as follows:
wget https://github.com/Kitware/CMake/releases/download/v3.20.0/cmake-3.20.0-linux-x86_64.tar.gz
tar -xzvf cmake-3.20.0-linux-x86_64.tar.gz
sudo ln -sf $(pwd)/cmake-3.20.0-linux-x86_64/bin/* /usr/bin/
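To confirm that the cmake found on the PATH meets the 3.14.0 minimum, the first line of `cmake --version` can be parsed and compared. The sketch below is only an illustration; the helper names `parse_cmake_version` and `cmake_is_new_enough` are made up for this example.

```python
import re

MIN_CMAKE = (3, 14, 0)  # minimum version stated in this article


def parse_cmake_version(version_output: str) -> tuple:
    """Extract (major, minor, patch) from the text printed by `cmake --version`."""
    match = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find a cmake version string")
    return tuple(int(part) for part in match.groups())


def cmake_is_new_enough(version_output: str, minimum: tuple = MIN_CMAKE) -> bool:
    """Tuples compare element by element, so (3, 20, 0) >= (3, 14, 0) holds."""
    return parse_cmake_version(version_output) >= minimum


# Example with the version installed above.
sample = "cmake version 3.20.0\n"
print(parse_cmake_version(sample), cmake_is_new_enough(sample))
```

On a real machine, the string would come from running `cmake --version`, for example through `subprocess.run(["cmake", "--version"], capture_output=True, text=True).stdout`.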
Dependencies
MMDeploy can be divided into two parts: the Model Converter and the SDK. The Model Converter converts a model into the file format required by the target inference engine; for TensorRT, for example, it generates a TensorRT engine file. The SDK provides interfaces in multiple programming languages for deploying the converted model in production.
Install dependencies for Model Converter
miniconda
- Install miniconda
- Download the miniconda installer: open the link and find the file suitable for your target machine
- Run the installer
Install it with the following command and press Enter at each prompt. By default, the installer writes conda's environment variables into the bashrc file.
sudo bash Miniconda3-latest-Linux-x86_64.sh
- Activate the conda environment
The following command can activate the conda environment and enter the base virtual environment.
source ~/.bashrc
- Switch conda to a domestic (China) mirror
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes
If you only want to use the mirror for a single installation, pass it on the command line instead:
conda install -y -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/linux-64 opencv tensorboardX
- Create a virtual environment for mmdeploy
conda create -n mmdeploy python=3.8 -y
conda activate mmdeploy
pytorch: 1.10.0
Install pytorch 1.10.0 with the following command. It pulls in cudatoolkit 11.3 while the system CUDA is 11.4, but this mismatch had no effect on the subsequent tests.
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
mmcv
export cu_version=cu114 # cuda 11.4
export torch_version=torch1.10
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0rc2"
Install dependencies for SDK
- OpenCV
sudo apt-get install libopencv-dev
- ppl.cv
git clone https://github.com/openppl-public/ppl.cv.git
cd ppl.cv
export PPLCV_DIR=$(pwd)
git checkout tags/v0.7.0 -b v0.7.0
./build.sh cuda
Install the inference engine
MMDeploy supports model inference with different backends. This article mainly tests the TensorRT backend, so it takes installing the TensorRT inference engine as the example. For other inference platforms, please refer to the documentation.
TensorRT
- Download the TensorRT tar file
tar -xvf TensorRT-8.2.4.2.Linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz
- Add environment variables
# <<< TensorRT 8 <<<
export PATH=/home/xxxx/local/TensorRT-8.2.4.2/bin:$PATH
export LD_LIBRARY_PATH=/home/xxxx/local/TensorRT-8.2.4.2/lib:$LD_LIBRARY_PATH
export C_INCLUDE_PATH=/home/xxxx/local/TensorRT-8.2.4.2/include:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=/home/xxxx/local/TensorRT-8.2.4.2/include:$CPLUS_INCLUDE_PATH
- Install the TensorRT python package
cd python
# Install the wheel that matches your Python version
pip install tensorrt-8.2.4.2-cp38-none-linux_x86_64.whl
- Install pycuda
conda install -c conda-forge pycuda
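The four TensorRT `export` lines above repeat the same installation path. As a minimal sketch, they can be generated from a single root directory so that only one value changes between machines. The function name `tensorrt_exports` is hypothetical, and the directory layout (`bin`, `lib`, `include`) is taken from the exports shown in this article.

```python
from pathlib import PurePosixPath


def tensorrt_exports(tensorrt_dir: str) -> list:
    """Return the shell export lines for a TensorRT tar installation."""
    root = PurePosixPath(tensorrt_dir)
    mapping = [
        ("PATH", root / "bin"),
        ("LD_LIBRARY_PATH", root / "lib"),
        ("C_INCLUDE_PATH", root / "include"),
        ("CPLUS_INCLUDE_PATH", root / "include"),
    ]
    # Prepend the new directory and keep whatever the variable held before.
    return [f"export {name}={path}:${name}" for name, path in mapping]


for line in tensorrt_exports("/home/xxxx/local/TensorRT-8.2.4.2"):
    print(line)
```

The printed lines can be appended to `~/.bashrc` by hand.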
Compile MMDeploy
Compiling MMDeploy is divided into two parts, the Model Converter and the SDK. The converter turns a PyTorch model into the file required by the chosen backend inference engine and tests its performance on different datasets; the SDK provides interfaces in different programming languages for deploying the backend inference engine in production.
Model Converter
Compiling the Model Converter involves two steps: compiling the backend custom operators, and installing the mmdeploy-python library.
Compile the TensorRT custom operators
cd ${MMDEPLOY_DIR}
mkdir -p build && cd build
cmake -DCMAKE_CXX_COMPILER=g++-7 -DMMDEPLOY_TARGET_BACKENDS=trt -DTENSORRT_DIR=${TENSORRT_DIR} -DCUDNN_DIR=${CUDNN_DIR} ..
make -j$(nproc) && make install
After a successful build, the libmmdeploy_tensorrt_ops.so dynamic library is generated. During model conversion, mmdeploy-python rewrites the model's functions, modules, and symbolic functions, so the corresponding TensorRT custom operators need to be linked.
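A quick way to confirm that the build produced the operator library is to search the source tree for it. The sketch below assumes nothing about the exact output directory (the trtexec command later in this article loads it from `mmdeploy/lib/`); the function name `find_ops_library` is hypothetical.

```python
from pathlib import Path

OPS_LIB_NAME = "libmmdeploy_tensorrt_ops.so"


def find_ops_library(mmdeploy_dir) -> list:
    """Return every copy of the TensorRT ops library found below mmdeploy_dir."""
    return sorted(Path(mmdeploy_dir).rglob(OPS_LIB_NAME))


if __name__ == "__main__":
    # "." stands for ${MMDEPLOY_DIR}; an empty list means the build did not produce the library.
    for path in find_ops_library("."):
        print(path)
```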
Install mmdeploy-python
cd ${MMDEPLOY_DIR}
mim install -e .
After the installation is complete, the models supported by MMDeploy can be converted. In fact, the SDK build that follows is not required.
Build SDK and Demo
As far as I can tell, this is the deployment SDK built on MMDeploy. Once the TensorRT engine has been obtained through mmdeploy-python, the model can be deployed directly through this SDK.
cmake -DCMAKE_CXX_COMPILER=g++-7 \
-DMMDEPLOY_TARGET_BACKENDS=trt \
-DTENSORRT_DIR=/home/xxxx/local/TensorRT-8.2.4.2 \
-DCUDNN_DIR=/home/xxxx/local/cuda/ \
-DMMDEPLOY_BUILD_SDK=ON -DMMDEPLOY_BUILD_SDK_PYTHON_API=ON \
-DMMDEPLOY_BUILD_EXAMPLES=ON -DMMDEPLOY_TARGET_DEVICES="cuda;cpu" ..
Model conversion
This article compares the performance of several common models (Faster R-CNN, Cascade R-CNN, Mask R-CNN and DeepLabv3) before and after conversion. For how to convert a model and test its performance, refer to the following two links:
- https://github.com/open-mmlab/mmdeploy/blob/main/docs/en/02-how-to-run/convert_model.md
- https://github.com/open-mmlab/mmdeploy/blob/main/docs/en/02-how-to-run/profile_model.md
Benchmark
MMDeploy supports converting many types of models. For their specific running times and accuracy, please refer to the following link.
mmdet
For mmdetection, YOLOv3, Faster R-CNN, Cascade R-CNN and Mask R-CNN were tested.
mmdetection/docs/en/model_zoo.md at main · open-mmlab/mmdetection
A numpy version problem came up when installing MMDetection; the solution was to compile and install cocoapi from source:
git clone https://github.com/pdollar/coco.git
Yolov3
The conversion engine script is as follows (static shape conversion is normal, but dynamic shape has problems):
#!/bin/bash
export MMDET_ROOT=/home/xxxx/workspace/mmlab/mmdetection
python ./tools/deploy.py \
configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py \
$MMDET_ROOT/configs/yolo/yolov3_d53_8xb8-ms-608-273e_coco.py \
$MMDET_ROOT/exp/yolov3_d53_mstrain-608_273e_coco_20210518_115020-a2c3acb8.pth \
$MMDET_ROOT/demo/demo.jpg \
--work-dir work_dir \
--device cuda:0
YOLOv3 was only converted to a TensorRT engine; no performance comparison was run for it.
Faster R-CNN
The script for converting to an engine file is similar to the one for YOLOv3; only the deploy_config and model_config file paths (and the checkpoint) need to be replaced.
#!/bin/bash
export MMDET_ROOT=/home/xxxx/workspace/mmlab/mmdetection
python ./tools/deploy.py \
configs/mmdet/detection/faster_rcnn_tensorrt_static-800x1344.py \
$MMDET_ROOT/configs/faster_rcnn/faster-rcnn_r50_fpn_2x_coco.py \
$MMDET_ROOT/exp/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
$MMDET_ROOT/demo/demo.jpg \
--work-dir work_dir \
--device cuda:0
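Since the conversion scripts differ only in the deploy config, model config and checkpoint, the command can be assembled by a small helper. This is a sketch under that assumption; `build_deploy_cmd` is a hypothetical name, and the argument order simply mirrors the `tools/deploy.py` calls shown in this article.

```python
import shlex


def build_deploy_cmd(deploy_cfg, model_cfg, checkpoint, image,
                     work_dir="work_dir", device="cuda:0"):
    """Assemble the tools/deploy.py call used in this article as an argument list."""
    return [
        "python", "./tools/deploy.py",
        deploy_cfg, model_cfg, checkpoint, image,
        "--work-dir", work_dir,
        "--device", device,
    ]


MMDET_ROOT = "/home/xxxx/workspace/mmlab/mmdetection"
cmd = build_deploy_cmd(
    "configs/mmdet/detection/faster_rcnn_tensorrt_static-800x1344.py",
    f"{MMDET_ROOT}/configs/faster_rcnn/faster-rcnn_r50_fpn_2x_coco.py",
    f"{MMDET_ROOT}/exp/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth",
    f"{MMDET_ROOT}/demo/demo.jpg",
)
# Print a copy-pasteable shell command; run it from the MMDeploy root directory.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run the conversion directly instead of printing it.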
Test its performance on the COCO2017-val dataset.
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.384
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.590
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.420
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.215
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.421
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.503
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.520
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.520
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.520
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.326
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.557
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.661
07/05 14:00:14 - mmengine - INFO - bbox_mAP_copypaste: 0.384 0.590 0.420 0.215 0.421 0.503
07/05 14:00:16 - mmengine - INFO - Results has been saved to exp/results.pkl.
07/05 14:00:16 - mmengine - INFO - Epoch(test) [5000/5000]
coco/bbox_mAP: 0.3840
coco/bbox_mAP_50: 0.5900
coco/bbox_mAP_75: 0.4200
coco/bbox_mAP_s: 0.2150
coco/bbox_mAP_m: 0.4210
coco/bbox_mAP_l: 0.5030
data_time: 0.0028 time: 0.0402
onnx
After the engine file is obtained through conversion, its accuracy and speed can be tested with tools/test.py.
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.384
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.590
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.419
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.215
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.421
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.502
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.519
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.519
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.519
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.325
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.556
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.661
07/05 14:38:00 - mmengine - INFO - bbox_mAP_copypaste: 0.384 0.590 0.419 0.215 0.421 0.502
07/05 14:38:01 - mmengine - INFO - Epoch(test) [5000/5000]
coco/bbox_mAP: 0.3840 coco/bbox_mAP_50: 0.5900
coco/bbox_mAP_75: 0.4190 coco/bbox_mAP_s: 0.2150
coco/bbox_mAP_m: 0.4210 coco/bbox_mAP_l: 0.5020
data_time: 0.0030 time: 0.0297
- tensorrt-fp16
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.384
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.590
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.418
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.215
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.420
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.501
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.519
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.519
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.519
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.325
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.556
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.659
07/05 14:58:28 - mmengine - INFO - bbox_mAP_copypaste: 0.384 0.590 0.418 0.215 0.420 0.501
07/05 14:58:29 - mmengine - INFO - Epoch(test) [5000/5000]
coco/bbox_mAP: 0.3840 coco/bbox_mAP_50: 0.5900
coco/bbox_mAP_75: 0.4180 coco/bbox_mAP_s: 0.2150
coco/bbox_mAP_m: 0.4200 coco/bbox_mAP_l: 0.5010
data_time: 0.0028 time: 0.0172
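To quantify how much accuracy moves between backends, the `bbox_mAP_copypaste` lines can be compared directly. The sketch below takes the first Faster R-CNN result block as the PyTorch baseline (an assumption based on its position and log format) and compares it with the tensorrt-fp16 block; the helper names are hypothetical.

```python
def parse_copypaste(log_line: str) -> list:
    """Return the floats that follow '_copypaste:' in an mmengine log line."""
    _, _, values = log_line.partition("_copypaste:")
    if not values.strip():
        raise ValueError("not a copypaste line")
    return [float(v) for v in values.split()]


def max_abs_delta(reference: str, candidate: str) -> float:
    """Largest absolute difference between two copypaste lines, metric by metric."""
    ref, cand = parse_copypaste(reference), parse_copypaste(candidate)
    return max(abs(r - c) for r, c in zip(ref, cand))


baseline = "07/05 14:00:14 - mmengine - INFO - bbox_mAP_copypaste: 0.384 0.590 0.420 0.215 0.421 0.503"
trt_fp16 = "07/05 14:58:28 - mmengine - INFO - bbox_mAP_copypaste: 0.384 0.590 0.418 0.215 0.420 0.501"
print(f"max metric change: {max_abs_delta(baseline, trt_fp16):.3f}")
```

For these two lines the largest change is 0.002 (in mAP_75 and mAP_l), while the headline mAP stays at 0.384.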
Run time test
- tensorrt-fp16 (engine inference time only)
07/05 15:20:09 - mmengine - INFO - [tensorrt]-4750 times per count: 10.58 ms, 94.51 FPS
07/05 15:20:09 - mmengine - INFO - Epoch(test) [4750/5000] eta: 0:00:04 time: 0.0146 data_time: 0.0017 memory: 38
07/05 15:20:10 - mmengine - INFO - Epoch(test) [4800/5000] eta: 0:00:03 time: 0.0208 data_time: 0.0048 memory: 39
07/05 15:20:11 - mmengine - INFO - [tensorrt]-4850 times per count: 10.58 ms, 94.54 FPS
07/05 15:20:11 - mmengine - INFO - Epoch(test) [4850/5000] eta: 0:00:02 time: 0.0173 data_time: 0.0031 memory: 39
07/05 15:20:12 - mmengine - INFO - Epoch(test) [4900/5000] eta: 0:00:01 time: 0.0142 data_time: 0.0016 memory: 38
07/05 15:20:13 - mmengine - INFO - [tensorrt]-4950 times per count: 10.58 ms, 94.55 FPS
07/05 15:20:13 - mmengine - INFO - Epoch(test) [4950/5000] eta: 0:00:00 time: 0.0197 data_time: 0.0024 memory: 39
07/05 15:20:13 - mmengine - INFO - Epoch(test) [5000/5000] eta: 0:00:00 time: 0.0148 data_time: 0.0016 memory: 39
Similarly, once the TensorRT engine is available, the trtexec tool can load it and measure inference time. The trtexec command is:
trtexec --loadEngine=work_dir/faster_rcnn_dynamic_fp16/end2end.engine --iterations=5000 --plugins=mmdeploy/lib/libmmdeploy_tensorrt_ops.so --minShapes=input:1x3x320x320 --optShapes=input:1x3x800x1344 --maxShapes=input:1x3x1344x1344 --shapes=input:1x3x800x1344 --fp16 --workspace=8000
- tensorrt-fp32
07/05 15:35:09 - mmengine - INFO - [tensorrt]-4750 times per count: 21.97 ms, 45.51 FPS
07/05 15:35:09 - mmengine - INFO - Epoch(test) [4750/5000] eta: 0:00:07 time: 0.0252 data_time: 0.0016 memory: 38
07/05 15:35:10 - mmengine - INFO - Epoch(test) [4800/5000] eta: 0:00:06 time: 0.0262 data_time: 0.0016 memory: 39
07/05 15:35:11 - mmengine - INFO - [tensorrt]-4850 times per count: 21.96 ms, 45.53 FPS
07/05 15:35:11 - mmengine - INFO - Epoch(test) [4850/5000] eta: 0:00:04 time: 0.0283 data_time: 0.0024 memory: 39
07/05 15:35:13 - mmengine - INFO - Epoch(test) [4900/5000] eta: 0:00:03 time: 0.0262 data_time: 0.0018 memory: 38
07/05 15:35:14 - mmengine - INFO - [tensorrt]-4950 times per count: 21.97 ms, 45.52 FPS
07/05 15:35:14 - mmengine - INFO - Epoch(test) [4950/5000] eta: 0:00:01 time: 0.0304 data_time: 0.0029 memory: 39
07/05 15:35:16 - mmengine - INFO - Epoch(test) [5000/5000] eta: 0:00:00 time: 0.0363 data_time: 0.0033 memory: 39
- pytorch
07/06 14:49:56 - mmengine - INFO - (GB) mem_used: 81.76 | uss: 3.89 | pss: 3.96 | total_proc: 1
07/06 14:50:45 - mmengine - INFO - ==================================
07/06 14:50:45 - mmengine - INFO - Done image [1000/5000], fps: 27.6 img/s, times per image: 36.2 ms/img, cuda memory: 520 MB
07/06 14:50:45 - mmengine - INFO - (GB) mem_used: 78.25 | uss: 4.74 | pss: 4.81 | total_proc: 1
07/06 14:51:32 - mmengine - INFO - ==================================
07/06 14:51:32 - mmengine - INFO - Done image [2000/5000], fps: 27.8 img/s, times per image: 36.0 ms/img, cuda memory: 520 MB
07/06 14:51:33 - mmengine - INFO - (GB) mem_used: 84.05 | uss: 4.74 | pss: 4.81 | total_proc: 1
07/06 14:52:21 - mmengine - INFO - ==================================
07/06 14:52:21 - mmengine - INFO - Done image [3000/5000], fps: 27.8 img/s, times per image: 36.0 ms/img, cuda memory: 534 MB
07/06 14:52:22 - mmengine - INFO - (GB) mem_used: 83.13 | uss: 4.74 | pss: 4.81 | total_proc: 1
07/06 14:53:09 - mmengine - INFO - ==================================
07/06 14:53:09 - mmengine - INFO - Done image [4000/5000], fps: 27.8 img/s, times per image: 36.0 ms/img, cuda memory: 520 MB
07/06 14:53:09 - mmengine - INFO - (GB) mem_used: 82.96 | uss: 4.74 | pss: 4.81 | total_proc: 1
07/06 14:53:56 - mmengine - INFO - ==================================
07/06 14:53:56 - mmengine - INFO - Done image [5000/5000], fps: 27.7 img/s, times per image: 36.0 ms/img, cuda memory: 534 MB
07/06 14:53:56 - mmengine - INFO - (GB) mem_used: 83.42 | uss: 4.74 | pss: 4.81 | total_proc: 1
07/06 14:53:56 - mmengine - INFO - ============== Done ==================
07/06 14:53:56 - mmengine - INFO - Overall fps: 27.7 img/s, times per image: 36.1 ms/img
07/06 14:53:56 - mmengine - INFO - cuda memory: 163 MB
07/06 14:53:57 - mmengine - INFO - (GB) mem_used: 83.29 | uss: 4.74 | pss: 4.81 | total_proc: 1
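The `[tensorrt]-N times per count` lines in the logs above carry the engine latency. A small parser makes it easy to pull those numbers out of a saved log and average them. This is a sketch; the regular expression is written against the log format shown in this article, and the names are hypothetical.

```python
import re
from statistics import mean

TRT_LINE = re.compile(r"\[tensorrt\]-(\d+) times per count: ([\d.]+) ms, ([\d.]+) FPS")


def parse_trt_timings(log_text: str) -> list:
    """Return (count, latency_ms, fps) for every [tensorrt] timing line."""
    return [(int(n), float(ms), float(fps)) for n, ms, fps in TRT_LINE.findall(log_text)]


fp16_log = """
07/05 15:20:09 - mmengine - INFO - [tensorrt]-4750 times per count: 10.58 ms, 94.51 FPS
07/05 15:20:11 - mmengine - INFO - [tensorrt]-4850 times per count: 10.58 ms, 94.54 FPS
07/05 15:20:13 - mmengine - INFO - [tensorrt]-4950 times per count: 10.58 ms, 94.55 FPS
"""

timings = parse_trt_timings(fp16_log)
mean_latency_ms = mean(ms for _, ms, _ in timings)
print(f"{len(timings)} samples, mean latency {mean_latency_ms:.2f} ms")
```

Lines that are not timing lines (for example the `Epoch(test)` progress lines) are ignored by the pattern.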
Statistics
Faster R-CNN R50 | mAP | latency (ms) | memory (GB) |
---|---|---|---|
pytorch | 0.384 | 36.1 | 3.903 |
tensorrt-fp32 | 0.384 | 21.96 | 4.243 |
tensorrt-fp16 | 0.384 | 10.58 | 3.634 |
Cascade R-CNN
Performance comparison
- pytorch
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.403
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.586
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.440
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.225
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.438
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.529
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.543
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.543
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.543
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.333
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.582
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.689
07/07 09:41:21 - mmengine - INFO - bbox_mAP_copypaste: 0.403 0.586 0.440 0.225 0.438 0.529
07/07 09:41:23 - mmengine - INFO - Results has been saved to exp/results.pkl.
07/07 09:41:23 - mmengine - INFO - Epoch(test) [5000/5000]
coco/bbox_mAP: 0.4030 coco/bbox_mAP_50: 0.5860
coco/bbox_mAP_75: 0.4400 coco/bbox_mAP_s: 0.2250
coco/bbox_mAP_m: 0.4380 coco/bbox_mAP_l: 0.5290
data_time: 0.0028 time: 0.0474
- tensorrt-fp32
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.403
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.586
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.439
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.225
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.437
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.529
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.543
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.543
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.543
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.333
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.581
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.689
07/07 10:06:05 - mmengine - INFO - bbox_mAP_copypaste: 0.403 0.586 0.439 0.225 0.437 0.529
07/07 10:06:05 - mmengine - INFO - Epoch(test) [5000/5000]
coco/bbox_mAP: 0.4030 coco/bbox_mAP_50: 0.5860
coco/bbox_mAP_75: 0.4390 coco/bbox_mAP_s: 0.2250
coco/bbox_mAP_m: 0.4370 coco/bbox_mAP_l: 0.5290
data_time: 0.0022 time: 0.0310
07/07 10:09:36 - mmengine - INFO - [tensorrt]-3050 times per count: 25.29 ms, 39.54 FPS
07/07 10:10:08 - mmengine - INFO - [tensorrt]-4050 times per count: 25.33 ms, 39.48 FPS
- tensorrt-fp16
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.403
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.586
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.439
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.225
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.438
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.531
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.542
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.542
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.542
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.333
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.581
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.691
07/07 10:23:56 - mmengine - INFO - bbox_mAP_copypaste: 0.403 0.586 0.439 0.225 0.438 0.531
07/07 10:23:56 - mmengine - INFO - Epoch(test) [5000/5000]
coco/bbox_mAP: 0.4030 coco/bbox_mAP_50: 0.5860
coco/bbox_mAP_75: 0.4390 coco/bbox_mAP_s: 0.2250
coco/bbox_mAP_m: 0.4380 coco/bbox_mAP_l: 0.5310
data_time: 0.0026 time: 0.0192
07/07 10:22:42 - mmengine - INFO - [tensorrt]-3050 times per count: 13.16 ms, 75.96 FPS
07/07 10:23:01 - mmengine - INFO - [tensorrt]-4050 times per count: 13.15 ms, 76.03 FPS
Statistics
Cascade R-CNN R50 | mAP | latency (ms) | memory (GB) |
---|---|---|---|
pytorch | 0.4030 | 40.3 | 4.124 |
tensorrt-fp32 | 0.4030 | 25.33 | 4.347 |
tensorrt-fp16 | 0.4030 | 13.16 | 3.757 |
Mask R-CNN
When converting an instance segmentation model, it is essential to use a configuration file from MMdeploy/configs/mmdet/instance-seg.
Performance comparison
- pytorch
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.354
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.564
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.380
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.166
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.382
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.525
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.481
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.481
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.481
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.283
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.515
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.646
07/07 16:21:43 - mmengine - INFO - segm_mAP_copypaste: 0.354 0.564 0.380 0.166 0.382 0.525
07/07 16:21:45 - mmengine - INFO - Results has been saved to exp/results.pkl.
07/07 16:21:45 - mmengine - INFO - Epoch(test) [5000/5000]
coco/bbox_mAP: 0.3920 coco/bbox_mAP_50: 0.5960
coco/bbox_mAP_75: 0.4280 coco/bbox_mAP_s: 0.2290
coco/bbox_mAP_m: 0.4260 coco/bbox_mAP_l: 0.5120
coco/segm_mAP: 0.3540 coco/segm_mAP_50: 0.5640
coco/segm_mAP_75: 0.3800 coco/segm_mAP_s: 0.1660
coco/segm_mAP_m: 0.3820 coco/segm_mAP_l: 0.5250
data_time: 0.0225 time: 0.1063
Overall fps: 26.4 img/s, times per image: 37.9 ms/img
- tensorrt-fp32
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.354
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.564
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.380
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.166
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.381
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.525
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.481
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.481
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.481
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.283
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.514
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.646
07/08 16:01:46 - mmengine - INFO - segm_mAP_copypaste: 0.354 0.564 0.380 0.166 0.381 0.525
07/08 16:01:47 - mmengine - INFO - Epoch(test) [5000/5000]
coco/bbox_mAP: 0.3910 coco/bbox_mAP_50: 0.5960
coco/bbox_mAP_75: 0.4280 coco/bbox_mAP_s: 0.2290
coco/bbox_mAP_m: 0.4250 coco/bbox_mAP_l: 0.5110
coco/segm_mAP: 0.3540 coco/segm_mAP_50: 0.5640
coco/segm_mAP_75: 0.3800 coco/segm_mAP_s: 0.1660
coco/segm_mAP_m: 0.3810 coco/segm_mAP_l: 0.5250 data_time: 0.0206 time: 0.0796
07/08 15:55:20 - mmengine - INFO - [tensorrt]-1050 times per count: 25.77 ms, 38.80 FPS
07/08 15:56:37 - mmengine - INFO - [tensorrt]-2050 times per count: 25.80 ms, 38.76 FPS
07/08 15:59:15 - mmengine - INFO - [tensorrt]-4050 times per count: 25.85 ms, 38.69 FPS
- tensorrt-fp16
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.354
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.564
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.380
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.166
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.381
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.524
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.481
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.481
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.481
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.283
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.514
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.646
07/08 16:29:03 - mmengine - INFO - segm_mAP_copypaste: 0.354 0.564 0.380 0.166 0.381 0.524
07/08 16:29:04 - mmengine - INFO - Epoch(test) [5000/5000]
coco/bbox_mAP: 0.3920 coco/bbox_mAP_50: 0.5950
coco/bbox_mAP_75: 0.4290 coco/bbox_mAP_s: 0.2280
coco/bbox_mAP_m: 0.4270 coco/bbox_mAP_l: 0.5100
coco/segm_mAP: 0.3540 coco/segm_mAP_50: 0.5640
coco/segm_mAP_75: 0.3800 coco/segm_mAP_s: 0.1660
coco/segm_mAP_m: 0.3810 coco/segm_mAP_l: 0.5240 data_time: 0.0200 time: 0.0663
07/08 16:25:37 - mmengine - INFO - [tensorrt]-3050 times per count: 12.55 ms, 79.70 FPS
07/08 16:26:43 - mmengine - INFO - [tensorrt]-4050 times per count: 12.55 ms, 79.69 FPS
Statistics
Mask R-CNN R50 | bbox_mAP | segm_mAP | latency (ms) | memory (GB) |
---|---|---|---|---|
pytorch | 0.3920 | 0.3540 | 37.9 | 4.355 |
tensorrt-fp32 | 0.3910 | 0.3540 | 25.85 | 4.636 |
tensorrt-fp16 | 0.3920 | 0.3540 | 12.55 | 4.325 |
mmseg
cityscapes dataset preparation
Refer to the link below to prepare the cityscapes dataset.
mmsegmentation/docs/en/user_guides/2_dataset_prepare.md at main open-mmlab/mmsegmentation
In addition, for the cityscapes dataset, the following operations are required:
deeplabv3
Test the results on the cityscapes dataset
Tutorial 4: Train and test with existing models — MMSegmentation 1.1.0 documentation
The official performance figures for deeplabv3 under different configurations are listed at:
mmsegmentation/configs/deeplabv3 at main open-mmlab/mmsegmentation
- pytorch
+---------------+-------+-------+
| Class | IoU | Acc |
+---------------+-------+-------+
| road | 98.18 | 98.88 |
| sidewalk | 85.3 | 93.17 |
| building | 92.71 | 96.52 |
| wall | 53.12 | 58.34 |
| fence | 61.57 | 71.94 |
| pole | 65.34 | 77.46 |
| traffic light | 71.21 | 82.56 |
| traffic sign | 79.58 | 87.34 |
| vegetation | 92.47 | 96.84 |
| terrain | 64.42 | 72.11 |
| sky | 94.46 | 98.12 |
| person | 82.53 | 92.13 |
| rider | 62.77 | 75.76 |
| car | 95.37 | 97.82 |
| truck | 82.07 | 92.3 |
| bus | 89.74 | 93.15 |
| train | 84.59 | 89.49 |
| motorcycle | 69.78 | 81.5 |
| bicycle | 78.04 | 89.03 |
+---------------+-------+-------+
07/07 11:53:43 - mmengine - INFO - Iter(test) [500/500]
aAcc: 96.1700 mIoU: 79.1200 mAcc: 86.5500 data_time: 0.0068 time: 0.1914
Time: fps: 6.01 img/s
- tensorrt-fp32
+---------------+-------+-------+
| Class | IoU | Acc |
+---------------+-------+-------+
| road | 98.18 | 98.88 |
| sidewalk | 85.3 | 93.17 |
| building | 92.71 | 96.52 |
| wall | 53.12 | 58.34 |
| fence | 61.57 | 71.94 |
| pole | 65.34 | 77.46 |
| traffic light | 71.21 | 82.56 |
| traffic sign | 79.58 | 87.34 |
| vegetation | 92.47 | 96.84 |
| terrain | 64.42 | 72.1 |
| sky | 94.46 | 98.12 |
| person | 82.53 | 92.13 |
| rider | 62.76 | 75.76 |
| car | 95.37 | 97.82 |
| truck | 82.07 | 92.3 |
| bus | 89.74 | 93.15 |
| train | 84.59 | 89.49 |
| motorcycle | 69.78 | 81.5 |
| bicycle | 78.04 | 89.03 |
+---------------+-------+-------+
07/07 14:28:39 - mmengine - INFO - Epoch(test) [500/500]
aAcc: 96.1700 mIoU: 79.1200 mAcc: 86.5500 data_time: 0.0069 time: 0.1515
07/07 14:27:57 - mmengine - INFO - [tensorrt]-205 times per count: 131.08 ms, 7.63 FPS
07/07 14:28:11 - mmengine - INFO - [tensorrt]-305 times per count: 131.18 ms, 7.62 FPS
07/07 14:28:25 - mmengine - INFO - [tensorrt]-405 times per count: 131.46 ms, 7.61 FPS
- tensorrt-fp16
+---------------+-------+-------+
| Class | IoU | Acc |
+---------------+-------+-------+
| road | 98.18 | 98.88 |
| sidewalk | 85.3 | 93.17 |
| building | 92.71 | 96.53 |
| wall | 53.15 | 58.38 |
| fence | 61.58 | 71.95 |
| pole | 65.34 | 77.46 |
| traffic light | 71.21 | 82.55 |
| traffic sign | 79.58 | 87.33 |
| vegetation | 92.47 | 96.84 |
| terrain | 64.41 | 72.07 |
| sky | 94.47 | 98.12 |
| person | 82.53 | 92.12 |
| rider | 62.76 | 75.77 |
| car | 95.37 | 97.82 |
| truck | 82.1 | 92.31 |
| bus | 89.74 | 93.14 |
| train | 84.58 | 89.49 |
| motorcycle | 69.76 | 81.49 |
| bicycle | 78.04 | 89.01 |
+---------------+-------+-------+
07/07 14:45:04 - mmengine - INFO - Epoch(test) [500/500]
aAcc: 96.1700 mIoU: 79.1200 mAcc: 86.5500
data_time: 0.0361 time: 0.0830
07/07 14:44:40 - mmengine - INFO - [tensorrt]-205 times per count: 36.42 ms, 27.46 FPS
07/07 14:44:48 - mmengine - INFO - [tensorrt]-305 times per count: 36.49 ms, 27.40 FPS
07/07 14:44:56 - mmengine - INFO - [tensorrt]-405 times per count: 36.41 ms, 27.47 FPS
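The summary line under each per-class table (mIoU, mAcc) can be reproduced from the table itself as the mean of the per-class columns. The sketch below parses the PyTorch table above and recomputes both values; `parse_class_table` is a hypothetical helper written for this table layout.

```python
from statistics import mean


def parse_class_table(table_text: str) -> dict:
    """Map class name -> (IoU, Acc) from an mmseg-style ASCII table."""
    rows = {}
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue  # border lines such as +-----+-----+
        try:
            rows[cells[0]] = (float(cells[1]), float(cells[2]))
        except ValueError:
            continue  # header row: | Class | IoU | Acc |
    return rows


pytorch_table = """
| Class | IoU | Acc |
| road | 98.18 | 98.88 |
| sidewalk | 85.3 | 93.17 |
| building | 92.71 | 96.52 |
| wall | 53.12 | 58.34 |
| fence | 61.57 | 71.94 |
| pole | 65.34 | 77.46 |
| traffic light | 71.21 | 82.56 |
| traffic sign | 79.58 | 87.34 |
| vegetation | 92.47 | 96.84 |
| terrain | 64.42 | 72.11 |
| sky | 94.46 | 98.12 |
| person | 82.53 | 92.13 |
| rider | 62.77 | 75.76 |
| car | 95.37 | 97.82 |
| truck | 82.07 | 92.3 |
| bus | 89.74 | 93.15 |
| train | 84.59 | 89.49 |
| motorcycle | 69.78 | 81.5 |
| bicycle | 78.04 | 89.03 |
"""

rows = parse_class_table(pytorch_table)
miou = mean(iou for iou, _ in rows.values())
macc = mean(acc for _, acc in rows.values())
print(f"classes: {len(rows)}, mIoU: {miou:.2f}, mAcc: {macc:.2f}")
```

Rounded to two decimals, the results match the logged mIoU of 79.12 and mAcc of 86.55.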
deeplabv3 | mIoU | latency (ms) | memory (GB) |
---|---|---|---|
pytorch | 79.12 | 166.39 | 12.587 |
tensorrt-fp32 | 79.12 | 131.46 | 5.308 |
tensorrt-fp16 | 79.12 | 36.49 | 4.032 |
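The PyTorch latency in this table is consistent with converting the logged throughput (6.01 img/s) into milliseconds per image. A minimal sketch of that conversion, with a hypothetical helper name:

```python
def fps_to_latency_ms(fps: float) -> float:
    """Convert throughput in images per second to milliseconds per image."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


print(f"{fps_to_latency_ms(6.01):.2f} ms")  # deeplabv3 pytorch throughput from the log
```

The same conversion relates the FPS and millisecond figures in the `[tensorrt]` timing lines.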
Summary
This article described how to install MMDeploy step by step, with the version numbers used. It then compared the performance of several types of models after conversion to a TensorRT engine. The evaluation metrics are essentially unchanged, and FP16 inference runs roughly 3 to 4.6 times faster than PyTorch. In my experience, using MMDeploy to deploy MMDetection and MMSegmentation models is efficient and has good compatibility, and I recommend it.
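The FP16 speedup can be checked against the statistics tables by dividing the PyTorch latency by the tensorrt-fp16 latency for each model. The sketch below uses only the values from those tables; the dictionary layout is just one convenient way to hold them.

```python
# (pytorch latency ms, tensorrt-fp16 latency ms) taken from the statistics tables
LATENCY_MS = {
    "Faster R-CNN": (36.1, 10.58),
    "Cascade R-CNN": (40.3, 13.16),
    "Mask R-CNN": (37.9, 12.55),
    "deeplabv3": (166.39, 36.49),
}


def fp16_speedups(latencies: dict) -> dict:
    """Speedup = PyTorch latency / TensorRT FP16 latency, per model."""
    return {name: pt / fp16 for name, (pt, fp16) in latencies.items()}


speedups = fp16_speedups(LATENCY_MS)
for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

The detection models land at about 3.0x to 3.4x, and deeplabv3 at about 4.6x.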