TensorRT yolov4 yolov4-tiny yolov5
Software and hardware environment
Jetson Nano 4G
JetPack (JP) 4.4.1
CUDA 10.2
TensorRT 7.1.3.0
PyTorch 1.6.0
Reference projects
https://github.com/enazoe/yolo-tensorrt (the author has wrapped the model conversion)
https://github.com/wang-xinyu/tensorrtx
https://github.com/jkjung-avt/tensorrt_demos
Process overview
The models come mainly from two frameworks, PyTorch and Darknet, and each has its own conversion path:
Darknet → ONNX → TensorRT
PyTorch → ONNX/WTS → TensorRT
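To make the WTS step of the PyTorch path more concrete, here is a minimal sketch of a weight dump in the plain-text layout that, as far as I understand it, the gen_wts.py scripts in tensorrtx produce: the tensor count on the first line, then one line per tensor with its name, element count, and each float32 value as big-endian hex. The names `write_wts` and `read_wts` and the plain-dict input are illustrative assumptions; a real export would iterate over the model's `state_dict()` instead.

```python
import struct


def write_wts(weights, path):
    """Write {name: [float, ...]} to a text file in a WTS-style layout.

    Assumed layout (modelled on tensorrtx's gen_wts.py, not checked against it):
      line 1: number of tensors
      then:   <name> <count> <hex> <hex> ...   (one line per tensor)
    """
    with open(path, "w") as f:
        f.write(f"{len(weights)}\n")
        for name, values in weights.items():
            # Each value becomes 8 hex characters: a big-endian float32.
            hex_values = [struct.pack(">f", float(v)).hex() for v in values]
            f.write(" ".join([name, str(len(values))] + hex_values) + "\n")


def read_wts(path):
    """Read the file back into {name: [float, ...]} to check the round trip."""
    weights = {}
    with open(path) as f:
        count = int(f.readline())
        for _ in range(count):
            name, n, *hex_values = f.readline().split()
            if int(n) != len(hex_values):
                raise ValueError(f"element count mismatch for {name}")
            weights[name] = [
                struct.unpack(">f", bytes.fromhex(h))[0] for h in hex_values
            ]
    return weights
```

Keeping the weights as text makes the file easy to inspect and to parse from C++ without extra dependencies, at the cost of a larger file than a binary dump.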
Points to note
1. The TensorRT link path in CMakeLists.txt.
2. The CUDA link path in CMakeLists.txt.
3. The CUDA and TensorRT environment variables.
4. The batch setting in the cfg file for yolov4/yolov4-tiny (batch 1 or 4).
5. The yolov5 version must match the PyTorch version (yolov5 v3.1 with PyTorch 1.6).
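Point 4 comes down to editing the `batch=` line in the `[net]` section of the Darknet cfg. A minimal sketch of that edit, working on the cfg text; the helper name `set_batch` is hypothetical, and it assumes the cfg has a `[net]` section with an uncommented `batch=` line, as the stock yolov4 cfg files do.

```python
import re


def set_batch(cfg_text, batch):
    """Return cfg_text with batch=<batch> in the [net] section."""
    out, section, replaced = [], None, False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            section = stripped
        # Only touch an uncommented batch= line inside [net];
        # keys such as batch_normalize= do not match this pattern.
        if section == "[net]" and re.match(r"batch\s*=", stripped):
            line = f"batch={batch}"
            replaced = True
        out.append(line)
    if not replaced:
        raise ValueError("no batch= line found in the [net] section")
    return "\n".join(out) + "\n"
```

Usage would be along the lines of reading the cfg, calling `set_batch(text, 1)` or `set_batch(text, 4)`, and writing the result to a new file so the original cfg stays untouched.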
Procedure
As long as the environment is set up correctly, the procedure is straightforward. I used the wrapped approach from the yolo-tensorrt project:
git clone https://github.com/enazoe/yolo-tensorrt.git
cd yolo-tensorrt/
mkdir build
cd build/
cmake ..
make
./yolo-trt
Note: the main branch currently supports yolov5 version 3.0. Before compiling, set the model to be used for inference in samples/sample_detector.cpp.
In the yolov4-tiny run (screenshot), inference takes about 36-40 ms.
Pitfalls encountered
I added the following nvcc path to the /etc/environment file, and that fixed the problem.
CUDACXX=/usr/local/cuda-10.2/bin/nvcc
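CUDACXX is the environment variable CMake consults to locate the CUDA compiler. A small sketch for checking that the entry is in place before running cmake; the function names are illustrative, and the parser only handles simple KEY=value lines like the one above.

```python
import os


def find_cudacxx(env_text):
    """Return the CUDACXX value from /etc/environment-style text, or None."""
    for line in env_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "CUDACXX":
            # Values in /etc/environment may be wrapped in double quotes.
            return value.strip().strip('"')
    return None


def check_cudacxx(path="/etc/environment"):
    """Return a one-line status message for the CUDACXX entry in `path`."""
    with open(path) as f:
        nvcc = find_cudacxx(f.read())
    if nvcc is None:
        return f"CUDACXX is not set in {path}"
    if not os.access(nvcc, os.X_OK):
        return f"CUDACXX points to {nvcc}, which is not executable"
    return f"CUDACXX OK: {nvcc}"
```

Calling `print(check_cudacxx())` on the Nano reports whether the entry exists and points at an executable. Changes to /etc/environment only take effect in a new login session.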
Inference speed results
System | Model | Framework | Input size | Precision | Time (ms) |
---|---|---|---|---|---|
JP4.3 | yolov5s v1.0 | PyTorch 1.4 | 416*416 | — | 60~80 |
JP4.4.1 | yolov4 | TensorRT 7.1.3.0 | 416*416 | FP16 | 280 |
JP4.4.1 | yolov4 | TensorRT 7.1.3.0 | 416*416 | FP32 | 380 |
JP4.4.1 | yolov4-tiny (batch 1) | TensorRT 7.1.3.0 | 416*416 | FP16 | 45 |
JP4.4.1 | yolov4-tiny (batch 4) | TensorRT 7.1.3.0 | 416*416 | FP16 | 38 |
JP4.4.1 | yolov5s v3.1 | TensorRT 7.1.3.0 | 608*608 | FP16 | 110~130 |
JP4.4.1 | yolov5s v3.1 | PyTorch 1.6 | 608*608 | — | 170~210 |
JP4.4.1 | yolov5s v3.1 | PyTorch 1.6 | 416*416 | — | 120~140 |
I don't know why yolov5s v1.0 runs in under 100 ms directly in PyTorch on the Nano with JP4.3, while yolov5s v3.1 on JP4.4.1 takes 120~140 ms at the same input size. If you know the reason, please let me know.
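To read the table as throughput, latency converts to frame rate as FPS = 1000 / time in ms. A short sketch using a few of the TensorRT figures at 416*416 from the table; it assumes each recorded time is per image, which the measurements above do not state explicitly, and the helper name `fps` is illustrative.

```python
def fps(latency_ms):
    """Frames per second for a given per-image latency in milliseconds."""
    return 1000.0 / latency_ms


# Latencies (ms) copied from the table above; assumed to be per image.
yolov4_fp32 = 380
yolov4_fp16 = 280
yolov4_tiny_fp16 = 45  # batch 1

print(f"yolov4 FP32:      {fps(yolov4_fp32):.1f} FPS")
print(f"yolov4 FP16:      {fps(yolov4_fp16):.1f} FPS")
print(f"yolov4-tiny FP16: {fps(yolov4_tiny_fp16):.1f} FPS")

# Relative gain from switching yolov4 from FP32 to FP16.
print(f"yolov4 FP16 vs FP32: {yolov4_fp32 / yolov4_fp16:.2f}x faster")
```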