Jetson Nano (5): TensorRT yolov4, yolov4-tiny, and yolov5 benchmark measurements

TensorRT yolov4 yolov4-tiny yolov5

Software and hardware environment

Jetson Nano 4GB
JetPack (JP) 4.4.1
CUDA 10.2
TensorRT 7.1.3.0
PyTorch 1.6.0

Reference projects

https://github.com/enazoe/yolo-tensorrt (the author has wrapped the model conversion)
https://github.com/wang-xinyu/tensorrtx
https://github.com/jkjung-avt/tensorrt_demos

Brief process

The models come from two frameworks, PyTorch and Darknet, and each takes its own conversion route to TensorRT:

Darknet → ONNX → TensorRT
PyTorch → ONNX/WTS → TensorRT
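
To make the WTS step less opaque, here is a minimal sketch of the plain-text weight layout that, as far as I understand it, the gen_wts.py scripts in tensorrtx write: a first line with the number of tensors, then one line per tensor holding its name, its element count, and every value as the hex of a big-endian float32. The name dump_wts and the use of a plain dict instead of a PyTorch state_dict are assumptions made for illustration; check the script in the repository before relying on this layout.

```python
import io
import struct


def dump_wts(weights, stream):
    """Write {name: [float, ...]} to stream in a .wts-style text layout.

    Hypothetical helper: a real exporter would flatten the tensors of
    model.state_dict() instead of taking Python lists.
    """
    stream.write(f"{len(weights)}\n")
    for name, values in weights.items():
        stream.write(f"{name} {len(values)}")
        for value in values:
            # Each value is stored as 8 hex digits of a big-endian float32.
            stream.write(" " + struct.pack(">f", float(value)).hex())
        stream.write("\n")


buf = io.StringIO()
dump_wts({"conv1.weight": [1.0, 0.5], "conv1.bias": [-2.0]}, buf)
print(buf.getvalue(), end="")
```

The C++ side then reads this file back and feeds the values to the TensorRT network definition, which is why no ONNX parser is needed on that route.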

Points to watch

1. The TensorRT library path in CMakeLists.txt.
2. The CUDA library path in CMakeLists.txt.
3. The CUDA and TensorRT environment variables.
4. The batch setting in the cfg file for yolov4/yolov4-tiny (batch 1 or 4).
5. Matching the yolov5 version to the PyTorch version (yolov5 v3.1 with PyTorch 1.6).
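
The batch setting in point 4 comes down to editing the batch= line in the [net] section of the Darknet cfg file. Below is a minimal sketch of doing that programmatically, assuming the usual Darknet cfg layout; set_net_batch is a hypothetical helper and the sample cfg text is made up for illustration.

```python
import re


def set_net_batch(cfg_text, batch):
    """Return cfg_text with batch=<batch> set in the [net] section only."""
    out = []
    section = None
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped[1:-1]
        # Only touch uncommented batch= lines inside [net]; commented
        # lines and batch_normalize= in other sections are left alone.
        if section == "net" and re.match(r"batch\s*=", stripped):
            line = f"batch={batch}"
        out.append(line)
    return "\n".join(out)


sample_cfg = """[net]
#batch=1
batch=64
subdivisions=16
width=416
height=416

[convolutional]
batch_normalize=1
filters=32
"""

print(set_net_batch(sample_cfg, 4))
```

Write the result back to the cfg file before building the engine, since the batch size is read when the model is converted.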

Procedure

As long as the environment is set up correctly, the procedure is not complicated: just use the wrapped project from the yolo-tensorrt author.

git clone https://github.com/enazoe/yolo-tensorrt.git
cd yolo-tensorrt/
mkdir build
cd build/
cmake ..
make
./yolo-trt

Note: the master branch currently supports yolov5 version 3.0. Before compiling, edit samples/sample_detector.cpp to select the model you want to run inference with, as follows:
[Screenshot: model selection in samples/sample_detector.cpp]

Here is yolov4-tiny running; inference takes about 36-40 ms:
[Screenshots: yolov4-tiny running]

Pitfalls encountered

[Screenshot: the error encountered]
Adding the following nvcc path to the /etc/environment file fixed it.

CUDACXX=/usr/local/cuda-10.2/bin/nvcc
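
This setting helps because CMake reads the CUDACXX environment variable to choose the CUDA compiler, and otherwise has to find nvcc on PATH. The sketch below only approximates that lookup order so the environment can be checked before running cmake; resolve_nvcc is a hypothetical helper, not part of CMake.

```python
import os
import shutil


def resolve_nvcc(env=None):
    """Return the nvcc path a build would likely pick up, or None.

    Rough approximation: CUDACXX wins if set, otherwise search PATH.
    """
    env = os.environ if env is None else env
    cudacxx = env.get("CUDACXX")
    if cudacxx:
        return cudacxx
    return shutil.which("nvcc", path=env.get("PATH", ""))


nvcc = resolve_nvcc()
if nvcc is None:
    print("nvcc not found: set CUDACXX or add the CUDA bin directory to PATH")
else:
    print("nvcc resolved to", nvcc)
```

Note that /etc/environment is only read at login, so a new session is needed before the variable becomes visible.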

Inference speed record

System   Model                  Framework         Input size  Precision  Time (ms)
JP4.3    yolov5s v1.0           PyTorch 1.4       416*416     -          60~80
JP4.4.1  yolov4                 TensorRT 7.1.3.0  416*416     FP16       280
JP4.4.1  yolov4                 TensorRT 7.1.3.0  416*416     FP32       380
JP4.4.1  yolov4-tiny (batch 1)  TensorRT 7.1.3.0  416*416     FP16       45
JP4.4.1  yolov4-tiny (batch 4)  TensorRT 7.1.3.0  416*416     FP16       38
JP4.4.1  yolov5s v3.1           TensorRT 7.1.3.0  608*608     FP16       110~130
JP4.4.1  yolov5s v3.1           PyTorch 1.6       608*608     -          170~210
JP4.4.1  yolov5s v3.1           PyTorch 1.6       416*416     -          120~140
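
To compare these rows more easily, the latencies can be converted to frames per second and speedup ratios. This is a small arithmetic sketch using only the numbers recorded above; the helper names are made up, and taking the midpoint of a range such as 110~130 is my own simplification.

```python
def fps(latency_ms):
    """Frames per second for a given per-inference latency in ms."""
    return 1000.0 / latency_ms


def speedup(baseline_ms, optimized_ms):
    """How many times faster the optimized run is than the baseline."""
    return baseline_ms / optimized_ms


def midpoint(low_ms, high_ms):
    """Collapse a recorded range such as 110~130 to a single value."""
    return (low_ms + high_ms) / 2.0


# yolov4 at 416*416: FP32 (380 ms) versus FP16 (280 ms)
print(f"yolov4 FP16 over FP32: {speedup(380, 280):.2f}x")

# yolov4-tiny, batch 1, FP16 (45 ms)
print(f"yolov4-tiny FP16: {fps(45):.1f} FPS")

# yolov5s v3.1 at 608*608: PyTorch (170~210 ms) versus TensorRT FP16 (110~130 ms)
ratio = speedup(midpoint(170, 210), midpoint(110, 130))
print(f"yolov5s TensorRT FP16 over PyTorch: {ratio:.2f}x")
```

On these figures FP16 gives yolov4 roughly a 1.36x gain over FP32, and yolov4-tiny at batch 1 reaches roughly 22 FPS.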

I don't know why yolov5s v1.0 runs in under 100 ms directly in PyTorch on the Nano with JP4.3. If you know the reason, please let me know.


Origin blog.csdn.net/djj199301111/article/details/110173275