4.1 tensorRT Basics (1): Overview

Foreword

Teacher Du has launched a from-scratch tensorRT high-performance deployment course. I went through it once before without taking notes and have since forgotten a lot, so this time I am going through it again and taking notes along the way.

This lesson covers the tensorRT basics overview.

The course outline can be seen in the mind map below

[Figure: course outline mind map]

1. Basic overview of tensorRT

What you need to know about tensorRT:

  1. The core of tensorRT lies in optimizing the model's operators (fusing operators, using GPU characteristics to select specific kernel functions, etc.); through tensorRT, the best performance can be obtained on NVIDIA GPUs
  2. Therefore, tensorRT has to select the optimal algorithms and configurations by actually running the model on the target GPU
  3. Therefore, the model generated by tensorRT can only run under the conditions it was built with (the trt version, cuda version, and GPU model at compile time)
  4. The main knowledge points are: how the model structure is defined, how the compilation process is configured, how the inference process is implemented, how plugins are implemented, and an understanding of onnx

Reference article: https://www.cnblogs.com/qccz123456/p/11767858.html

Reference video: https://www.bilibili.com/video/BV1Xw411f7FW/

2. Supplementary knowledge of tensorRT

2.1 What is tensorRT?

tensorRT is an SDK (Software Development Kit) for optimizing trained deep learning models to achieve high-performance inference.

2.2 tensorRT features

Why can tensorRT speed up the inference process, and how does it optimize? The optimizations mainly fall into the following aspects:

  • Operator fusion: Conv + Bias + ReLU -> CBR
  • Quantization (a minimal builder-config sketch enabling FP16 is given after this list)
    • INT8, FP16, and TF32
    • advantages in storage, computation, and communication
  • Kernel auto-tuning
    • Different optimization strategies and computation methods are chosen according to the graphics card architecture, number of SMs, core frequency, etc., to find the computation method best suited to the current architecture
    • For different batch sizes and problem complexities, the most suitable kernel algorithm is chosen; tensorRT has many pre-written GPU implementations and selects among them automatically
  • Dynamic tensor memory (workspace)
  • Multi-stream execution
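
As a concrete illustration of the quantization and workspace points above, here is a minimal sketch of my own (assuming the TensorRT 8.x C++ API; older releases use setMaxWorkspaceSize instead of setMemoryPoolLimit): the builder config is told that FP16 kernels are allowed and how much scratch memory kernel auto-tuning and layer execution may use.

```cpp
// Minimal sketch (assumed TensorRT 8.x C++ API): enable FP16 kernels and cap the workspace.
#include <NvInfer.h>
#include <cstdio>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) printf("[TRT] %s\n", msg);
    }
};

int main() {
    Logger logger;
    auto builder = nvinfer1::createInferBuilder(logger);
    auto config  = builder->createBuilderConfig();

    // Let tensorRT pick FP16 kernels where they are faster (storage / compute / bandwidth wins).
    config->setFlag(nvinfer1::BuilderFlag::kFP16);

    // Scratch memory that kernel auto-tuning and layer execution may use (256 MB here).
    config->setMemoryPoolLimit(nvinfer1::MemoryPoolType::kWORKSPACE, 1 << 28);

    // ... the network definition and builder->buildSerializedNetwork(...) would follow ...
    return 0;
}
```

Note that INT8 additionally requires calibration data or explicit per-tensor dynamic ranges, which goes beyond this overview.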

The figure below shows tensorRT's internal optimization: by fusing operators it reduces data movement between layers, which speeds up the inference process.


Figure 2-1 internal optimization of tensorRT

2.3 tensorRT workflow

How does tensorRT build models? There are mainly two ways:

1. Build the model layer by layer through the TRT API (a minimal C++ sketch follows the two interface figures below)


Figure 2-2 C++ interface of tensorRT


Figure 2-3 Python interface of tensorRT
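
To make the layer-by-layer route concrete, here is a minimal sketch of my own (assuming the TensorRT 8.x C++ API, with placeholder weights instead of a real checkpoint): an input tensor is declared, a convolution and a ReLU are added by hand, and the result is marked as the network output.

```cpp
// Minimal sketch (assumed TensorRT 8.x C++ API, placeholder weights): define input -> conv -> relu by hand.
#include <NvInfer.h>
#include <vector>
#include <cstdio>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) printf("[TRT] %s\n", msg);
    }
};

int main() {
    Logger logger;
    auto builder = nvinfer1::createInferBuilder(logger);
    auto network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));

    // Input tensor: 1x3x224x224, FP32.
    auto input = network->addInput("input", nvinfer1::DataType::kFLOAT, nvinfer1::Dims4{1, 3, 224, 224});

    // Placeholder weights: in a real model these would come from the trained checkpoint.
    std::vector<float> wdata(16 * 3 * 3 * 3, 0.f), bdata(16, 0.f);
    nvinfer1::Weights w{nvinfer1::DataType::kFLOAT, wdata.data(), (int64_t)wdata.size()};
    nvinfer1::Weights b{nvinfer1::DataType::kFLOAT, bdata.data(), (int64_t)bdata.size()};

    // Conv(3x3, 16 output channels) + ReLU, i.e. the layers that operator fusion would merge into CBR.
    auto conv = network->addConvolutionNd(*input, 16, nvinfer1::DimsHW{3, 3}, w, b);
    conv->setPaddingNd(nvinfer1::DimsHW{1, 1});
    auto relu = network->addActivation(*conv->getOutput(0), nvinfer1::ActivationType::kRELU);
    network->markOutput(*relu->getOutput(0));
    return 0;
}
```

Every layer and every weight tensor has to be spelled out like this, which is exactly why the hard-code route becomes tedious for large models.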

2. NVIDIA officially provides three other, more conveniently packaged ways, as shown in the figure below

  • Calling the API directly is very cumbersome: it is hard to debug when problems occur, and describing the weights is also tedious, so more advanced methods are provided as well

  • For files in UFF format, you can use libnvparsers.so to call the TRT API, parse the UFF file, and build the model (the scheme adopted by TensorFlow)

  • For files in ONNX format, you can use libnvonnxparser.so to call the TRT API, parse the ONNX file, and build the model (the scheme adopted by PyTorch); a minimal parser sketch is shown after Figure 2-4

  • For files in Caffe format, you can use libnvcaffe_parser.so to call the TRT API, parse the Caffe files, and build the model (rarely used)


Figure 2-4 High-level workflow of tensorRT
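
For the ONNX branch in particular, a minimal sketch of my own (assuming TensorRT 8.x with the nvonnxparser headers; the file name model.onnx is hypothetical) looks roughly like this: the parser populates the network definition from the ONNX file, and the builder then produces a serialized engine.

```cpp
// Minimal sketch (assumed TensorRT 8.x + libnvonnxparser): parse an ONNX file and build an engine.
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <fstream>
#include <cstdio>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) printf("[TRT] %s\n", msg);
    }
};

int main() {
    Logger logger;
    auto builder = nvinfer1::createInferBuilder(logger);
    auto network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));

    // The parser fills the network definition directly from the ONNX graph ("model.onnx" is hypothetical).
    auto parser = nvonnxparser::createParser(*network, logger);
    if (!parser->parseFromFile("model.onnx", (int)nvinfer1::ILogger::Severity::kWARNING)) {
        printf("failed to parse onnx model\n");
        return -1;
    }

    auto config = builder->createBuilderConfig();
    auto serialized = builder->buildSerializedNetwork(*network, *config);  // the compiled engine

    // Save the engine so it can be deserialized later on the same GPU / trt / cuda setup.
    std::ofstream out("model.engine", std::ios::binary);
    out.write((const char*)serialized->data(), serialized->size());
    return 0;
}
```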

2.4 Common schemes

Building on tensorRT releases, there is community work such as https://github.com/wang-xinyu/tensorrtx. This repo hard-codes each model with the TRT API and already provides code for many common models.


Figure 2-5 Workflow of common solutions

The common scheme builds models with hard code, which has poor flexibility: a new model needs its own layer-by-layer C++ construction, so it is not general and ports poorly. Moreover, the hard-code approach exposes too many details that must be controlled by hand, and it offers no way to view the network structure for analysis and troubleshooting during deployment.

This course mainly studies model compilation, inference, and deployment via the onnx route, for the following main reasons:

With onnx, an exported or modified onnx model can easily be ported to other engines such as ncnn and rknn, which hard code cannot do. It is also very convenient for troubleshooting errors and for modifying and adjusting the model.

The workflow for this course is shown in the diagram below:


Figure 2-6 The workflow used in this course
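
To make the inference half of this workflow concrete as well, here is a minimal sketch of my own (assuming the TensorRT 8.x C++ API and the CUDA runtime; the engine file name and tensor sizes are hypothetical): the serialized engine is deserialized, an execution context is created, and inference is enqueued on a CUDA stream.

```cpp
// Minimal sketch (assumed TensorRT 8.x + CUDA runtime, hypothetical file/tensor sizes): run a compiled engine.
#include <NvInfer.h>
#include <cuda_runtime.h>
#include <fstream>
#include <iterator>
#include <vector>
#include <cstdio>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) printf("[TRT] %s\n", msg);
    }
};

int main() {
    // Load the serialized engine produced at compile time ("model.engine" is hypothetical).
    std::ifstream f("model.engine", std::ios::binary);
    std::vector<char> data((std::istreambuf_iterator<char>(f)), std::istreambuf_iterator<char>());

    Logger logger;
    auto runtime = nvinfer1::createInferRuntime(logger);
    auto engine  = runtime->deserializeCudaEngine(data.data(), data.size());
    auto context = engine->createExecutionContext();

    // Device buffers for the engine's bindings (sizes assume a 1x3x224x224 input and 1x1000 output).
    void* buffers[2];
    cudaMalloc(&buffers[0], 1 * 3 * 224 * 224 * sizeof(float));
    cudaMalloc(&buffers[1], 1 * 1000 * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    context->enqueueV2(buffers, stream, nullptr);   // asynchronous inference on the stream
    cudaStreamSynchronize(stream);
    return 0;
}
```

In practice the input would be copied into buffers[0] with cudaMemcpyAsync before enqueueV2, and the result copied back from buffers[1] afterwards.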

2.5 tensorRT library files

Finally, let's take a look at tensorRT's library files


Figure 2-7 tensorRT library file

Summary

This lesson is an overview of tensorRT basics. It mainly explains the characteristics of tensorRT: it is a software toolkit launched by NVIDIA to optimize trained deep learning models for high-performance inference. It is worth noting that because tensorRT must select the optimal algorithms and configurations by actually running the model on the target GPU, the model it generates is strongly bound to its device, i.e. to the trt version, cuda version, and GPU model at compile time.

We also learned about tensorRT's workflow, which mainly comes in two forms. One is to write hard code through the C++ or Python API; this approach exposes too many details, is inconvenient to port, and makes deployment errors hard to troubleshoot. The other is to parse caffe, uff, or onnx models through higher-level packaging and the corresponding library files; this approach is easy to modify, adjust, and troubleshoot, and it ports well. This course therefore adopts the second approach.


Source: blog.csdn.net/qq_40672115/article/details/131751396