TensorRT Terminology Explained

Batch
A batch is a collection of inputs that can all be processed uniformly. Each instance in the
batch has the same shape and flows through the network in exactly the same way. All
instances can therefore be computed in parallel.

Builder
TensorRT’s model optimizer. The builder takes as input a network definition, performs
device-independent and device-specific optimizations, and creates an engine. For more
information about the builder, see Builder API.
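
As a minimal sketch of how the builder is driven (assuming the TensorRT 8.x C++ API; error handling and resource cleanup are omitted):

```cpp
#include <NvInfer.h>
#include <iostream>

// Minimal logger implementation that the TensorRT API requires.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} gLogger;

int main() {
    // Create the builder and an (explicit-batch) network definition.
    auto builder = nvinfer1::createInferBuilder(gLogger);
    auto network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(
            nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));

    // ... populate `network` here, by hand or with the ONNX parser ...

    // The config carries optimization options (precision flags, profiles, ...).
    auto config = builder->createBuilderConfig();

    // Device-independent and device-specific optimizations happen here; the
    // result is a serialized engine ("plan") that can be saved or deserialized.
    auto plan = builder->buildSerializedNetwork(*network, *config);
    std::cout << "plan size: " << (plan ? plan->size() : 0) << " bytes\n";
    return 0;
}
```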

Dynamic batch
A mode of inference deployment where the batch size is not known until runtime.
Historically, TensorRT treated batch size as a special dimension, and the only dimension
which was configurable at runtime. TensorRT 6 and later allow engines to be built such that
all dimensions of inputs can be adjusted at runtime.
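
For example (a sketch reusing the `builder`, `network`, and `config` from the Builder sketch above, with a hypothetical input tensor named "input" whose batch dimension was declared as -1), the supported batch range is described with an optimization profile:

```cpp
using namespace nvinfer1;

// Declare the range of batch sizes the engine must support. The network
// input "input" is assumed to have shape (-1, 3, 224, 224), i.e. a dynamic
// batch dimension.
auto profile = builder->createOptimizationProfile();
profile->setDimensions("input", OptProfileSelector::kMIN, Dims4{1, 3, 224, 224});
profile->setDimensions("input", OptProfileSelector::kOPT, Dims4{8, 3, 224, 224});
profile->setDimensions("input", OptProfileSelector::kMAX, Dims4{32, 3, 224, 224});
config->addOptimizationProfile(profile);

// At runtime, the actual batch size is chosen before executing:
// context->setBindingDimensions(0, Dims4{4, 3, 224, 224});
```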

Engine
A representation of a model that has been optimized by the TensorRT builder. For more
information about the engine, see Execution API.

Explicit batch
An indication to the TensorRT builder that the model includes the batch size as one of the
dimensions of the input tensor(s). TensorRT’s implicit batch mode allows the batch size to
be omitted from the network definition and provided by the user at runtime, but this mode
is not supported by the ONNX parser.
In other words, the question is whether the batch size is part of the network definition itself. In **implicit batch** mode the network omits the batch dimension, and the batch size is passed directly to each forward call (enqueue or execute). In **explicit batch** mode the batch is an ordinary input dimension: it is enabled with a flag when the network is created, before the engine is built and serialized, and if the batch dimension is dynamic, its concrete value is set on the execution context before each forward call. ONNX models must use **explicit batch**, as sketched below.
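
A small sketch of the difference (assuming the `builder` from the Builder sketch; `bindings` and `stream` are hypothetical, pre-allocated handles):

```cpp
// Implicit batch (legacy): the network omits the batch dimension, and the
// batch size is passed to the engine on every forward call:
//   context->enqueue(batchSize, bindings, stream, nullptr);

// Explicit batch (required by the ONNX parser): the batch is an ordinary
// tensor dimension, declared with a flag when the network is created:
const auto explicitBatch = 1U << static_cast<uint32_t>(
    nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
auto network = builder->createNetworkV2(explicitBatch);

// The forward call then takes no batch argument; the batch size is part of
// (or set on) the input dimensions:
//   context->enqueueV2(bindings, stream, nullptr);
```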

Framework integration
An integration of TensorRT into a framework such as TensorFlow, which allows model
optimization and inference to be performed within the framework.

Network definition
A representation of a model in TensorRT. A network definition is a graph of tensors and
operators.
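
For instance (a sketch reusing the explicit-batch `network` from the Builder sketch; the layer choice is arbitrary), tensors and operators are added through the network definition API:

```cpp
using namespace nvinfer1;

// A tensor: the network input, given a fixed shape here for simplicity.
ITensor* input = network->addInput("input", DataType::kFLOAT,
                                   Dims4{1, 3, 224, 224});

// An operator: a ReLU activation layer consuming that tensor.
IActivationLayer* relu = network->addActivation(*input, ActivationType::kRELU);

// Mark the layer's output tensor as a network output.
relu->getOutput(0)->setName("output");
network->markOutput(*relu->getOutput(0));
```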

ONNX
Open Neural Network eXchange. A framework-independent standard for representing
machine learning models. For more information about ONNX, see onnx.ai.

ONNX parser
A parser for creating a TensorRT network definition from an ONNX model. For more details
on the C++ ONNX Parser, see NvONNXParser or the Python ONNX Parser.
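
A sketch of the C++ parser (assuming the explicit-batch `network` and `gLogger` from the Builder sketch; the model path is hypothetical):

```cpp
#include <NvOnnxParser.h>
#include <iostream>

// Populate the TensorRT network definition from an ONNX file.
auto parser = nvonnxparser::createParser(*network, gLogger);
bool ok = parser->parseFromFile(
    "model.onnx",  // hypothetical path to an exported ONNX model
    static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));
if (!ok) {
    for (int i = 0; i < parser->getNbErrors(); ++i) {
        std::cout << parser->getError(i)->desc() << std::endl;
    }
}
```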

Plan
An optimized inference engine in a serialized format. To initialize the inference engine, the
application will first deserialize the model from the plan file. A typical application will build
an engine once, and then serialize it as a plan file for later use.
In short: the plan is simply the serialized model saved to disk, as in the sketch below.
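
A sketch of that round trip (assuming `plan` is the nvinfer1::IHostMemory* produced by the Builder sketch, and `gLogger` from the same sketch):

```cpp
#include <NvInfer.h>
#include <fstream>
#include <iterator>
#include <vector>

// Build once, then serialize the engine to a plan file on disk.
std::ofstream out("model.plan", std::ios::binary);
out.write(static_cast<const char*>(plan->data()), plan->size());
out.close();

// Later (possibly in another process): read the plan back and deserialize it,
// skipping the expensive build step entirely.
std::ifstream in("model.plan", std::ios::binary);
std::vector<char> blob((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
auto runtime = nvinfer1::createInferRuntime(gLogger);
auto engine = runtime->deserializeCudaEngine(blob.data(), blob.size());
```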

Precision
Refers to the numerical format used to represent values in a computational method. This
option is specified as part of the TensorRT build step. TensorRT supports Mixed Precision
Inference with FP32, FP16, or INT8 precisions. Devices prior to Ampere default to FP32. Ampere and later devices default to TF32, a fast format that uses FP32 storage with lower-precision math.
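
For example (a sketch assuming the `builder` and `config` from the Builder sketch), reduced precision is requested through builder flags:

```cpp
using namespace nvinfer1;

// Allow FP16 kernels where the hardware supports them.
if (builder->platformHasFastFp16()) {
    config->setFlag(BuilderFlag::kFP16);
}
// Allow INT8 kernels; INT8 additionally needs calibration data or
// explicitly set dynamic ranges.
if (builder->platformHasFastInt8()) {
    config->setFlag(BuilderFlag::kINT8);
}
// On Ampere and later, TF32 is on by default and can be disabled with:
// config->clearFlag(BuilderFlag::kTF32);
```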

Runtime
The component of TensorRT which performs inference on a TensorRT engine. The runtime
API supports synchronous and asynchronous execution, profiling, and enumeration and
querying of the bindings for an engine's inputs and outputs.
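
A sketch of runtime usage (assuming the `engine` deserialized in the Plan sketch; device buffer allocation is elided):

```cpp
#include <cuda_runtime_api.h>
#include <iostream>
#include <vector>

auto context = engine->createExecutionContext();

// Enumerate and query the engine's input/output bindings.
std::vector<void*> bindings(engine->getNbBindings(), nullptr);
for (int i = 0; i < engine->getNbBindings(); ++i) {
    std::cout << (engine->bindingIsInput(i) ? "input : " : "output: ")
              << engine->getBindingName(i) << std::endl;
    // bindings[i] = ...;  // cudaMalloc'd device pointer of the right size
}

// Asynchronous execution on a CUDA stream.
cudaStream_t stream;
cudaStreamCreate(&stream);
context->enqueueV2(bindings.data(), stream, nullptr);
cudaStreamSynchronize(stream);
```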

TF-TRT
TensorFlow integration with TensorRT. Optimizes and executes compatible subgraphs,
allowing TensorFlow to execute the remaining graph.

Reference

https://docs.nvidia.com/deeplearning/tensorrt/quick-start-guide/index.html#onnx-export

Reposted from blog.csdn.net/qq_29007291/article/details/119716171