# Accelerating ScaledYOLOv4 with TensorRT

Many people have written TensorRT versions of YOLO, and this is mine. The full code is in my GitHub repository.
## Test environment

- Ubuntu 18.04
- PyTorch 1.7.1
- JetPack 4.4
- CUDA 11.0
- TensorRT 7.1
## Quick start

### 1. Generate the ONNX model

```shell
git clone --branch yolov4-csp https://github.com/WongKinYiu/ScaledYOLOv4
git clone https://github.com/talebolano/TensorRT-Scaled-YOLOv4
cp TensorRT-Scaled-YOLOv4/script/ScaledYOLOv4-csp/* ScaledYOLOv4/
# Download yolov4-csp.weights into ScaledYOLOv4/
cd ScaledYOLOv4
python3 export.py
```
### 2. Compile

```shell
cd ../TensorRT-Scaled-YOLOv4
mkdir build
cd build
cmake ..
make -j8
```
### 3. Convert the ONNX model to a TensorRT engine

```shell
./makeCudaEngine -i ../../ScaledYOLOv4/yolov4-csp.onnx -o yolov4-csp.trt
```
### 4. Test

```shell
./inferYoloCuda -e yolov4-csp.trt -i <your image> -show -save
```
## Speed

| Mode | GPU | Inference time | AP |
|------|-----|----------------|----|
| FP16 | V100 | 10 ms | - |
| FP16 | Xavier | 35 ms | - |
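For reference, the per-image latencies above translate into throughput as follows (plain arithmetic on the measurements, not an additional benchmark):

```python
def latency_to_fps(ms: float) -> float:
    """Convert a per-image inference time in milliseconds to frames per second."""
    return 1000.0 / ms

print(latency_to_fps(10.0))            # V100:   100.0 FPS
print(round(latency_to_fps(35.0), 1))  # Xavier: 28.6 FPS
```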
## Using the Mish plugin layer

### 1. Register the Mish plugin

Download the open-source version of TensorRT and register the Mish plugin in `builtin_op_importers.cpp` by pasting the following code at the bottom of that file:
```cpp
DEFINE_BUILTIN_OP_IMPORTER(Mish)
{
    ASSERT(inputs.at(0).is_tensor(), nvonnxparser::ErrorCode::kUNSUPPORTED_NODE); // input
    std::vector<nvinfer1::ITensor*> tensors;
    nvinfer1::ITensor* input = &convertToTensor(inputs.at(0), ctx);
    tensors.push_back(input);

    const std::string pluginName = "Mish_TRT";
    const std::string pluginVersion = "001";
    std::vector<nvinfer1::PluginField> f;

    const auto mPluginRegistry = getPluginRegistry();
    const auto pluginCreator
        = mPluginRegistry->getPluginCreator(pluginName.c_str(), pluginVersion.c_str(), "");

    nvinfer1::PluginFieldCollection fc;
    fc.nbFields = f.size();
    fc.fields = f.data();

    nvinfer1::IPluginV2* plugin = pluginCreator->createPlugin(node.name().c_str(), &fc);
    ASSERT(plugin != nullptr && "Mish plugin was not found in the plugin registry!",
        ErrorCode::kUNSUPPORTED_NODE);

    nvinfer1::IPluginV2Layer* layer = ctx->network()->addPluginV2(tensors.data(), tensors.size(), *plugin);
    RETURN_ALL_OUTPUTS(layer);
}
```
### 2. Switch models.py to MishImplementation

In `ScaledYOLOv4/models/models.py`, use `MishImplementation()` instead of `Mish()`, i.e. replace

```python
modules.add_module('activation', Mish())
```

with

```python
modules.add_module('activation', MishImplementation())
```
### 3. Generate the ONNX model and convert it to a TensorRT engine

```shell
python3 export.py
./makeCudaEngine -i ../../ScaledYOLOv4/yolov4-csp.onnx -o yolov4-csp.trt
```
### 4. Test

```shell
./inferYoloCuda -e yolov4-csp.trt -i <your image> -show -save
```

Note: on Xavier, inference with the Mish plugin is more than 10 ms slower than without it.
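For reference, the activation that the `Mish_TRT` plugin computes is mish(x) = x · tanh(softplus(x)). A minimal pure-Python reference implementation (a sketch for checking values, not the repository's CUDA kernel):

```python
import math

def softplus(x: float) -> float:
    # Numerically stable softplus: ln(1 + e^x) = max(x, 0) + log1p(e^-|x|)
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))

print(mish(0.0))            # 0.0
print(round(mish(1.0), 4))  # 0.8651
```

For large positive inputs mish(x) approaches x, and for large negative inputs it approaches 0, which is why it behaves like a smooth variant of ReLU.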