TensorRT acceleration principle

TensorRT speeds up inference in two main ways. On the one hand, it supports reduced-precision computation in INT8 and FP16; on the other hand, it reconstructs and optimizes the network structure for the target GPU.

  • TensorRT supports INT8 and FP16 computation

Deep learning networks are usually trained with 32-bit (or sometimes 16-bit) floating-point data. TensorRT supports three precisions, kFLOAT (float32), kHALF (float16), and kINT8 (int8), and accelerates inference by running the network at lower precision.
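For example, here is a minimal sketch of requesting reduced precision through the TensorRT Python API (TensorRT 8.x names; `model.onnx` is a placeholder path, and INT8 would additionally require calibration data not shown here):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# Parse a trained model; "model.onnx" is a placeholder path.
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)   # allow kHALF kernels
# config.set_flag(trt.BuilderFlag.INT8)     # kINT8 also needs a calibrator

engine_bytes = builder.build_serialized_network(network, config)
```

TensorRT treats these flags as permissions rather than commands: it picks the fastest available kernel per layer, falling back to float32 where a low-precision kernel is unavailable.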

  • TensorRT restructures and optimizes the network

TensorRT rebuilds the network graph, fusing operations that can be merged and optimizing the result according to the characteristics of the GPU.

  1. TensorRT analyzes the network model and eliminates output layers that are never used, reducing computation.
  2. Vertical fusion of the network structure: the conv, BN, and ReLU layers that dominate current mainstream networks are merged into a single layer. For example, the common Inception structure shown in Figure 1 below is reconstructed into the network structure shown in Figure 2 (a BatchNorm-folding sketch follows this list).

Figure 1: the common Inception structure before reconstruction (Figure 2 shows it after vertical fusion).

  3. Horizontal fusion of the network: layers that take the same input tensor and perform the same operation are fused into a single layer, transforming the structure of Figure 2 into that of Figure 3 (a second sketch below illustrates this).
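To make the vertical fusion concrete, here is a minimal PyTorch sketch (not TensorRT's internal implementation) that folds a BatchNorm layer into the preceding convolution; the helper name `fuse_conv_bn` is just for illustration:

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold y = BN(conv(x)) into a single convolution.

    In inference mode, BN is a per-channel affine map
        y = gamma * (z - mean) / sqrt(var + eps) + beta
    so its scale and shift can be absorbed into the conv weights and bias.
    """
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      conv.kernel_size, conv.stride,
                      conv.padding, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = (conv.bias.data if conv.bias is not None
                 else torch.zeros(conv.out_channels))
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias
    return fused

conv = nn.Conv2d(3, 16, 3, padding=1).eval()
bn = nn.BatchNorm2d(16).eval()
bn.running_mean.uniform_(-1, 1)   # give BN non-trivial statistics
bn.running_var.uniform_(0.5, 1.5)

x = torch.randn(1, 3, 8, 8)
with torch.no_grad():
    fused = fuse_conv_bn(conv, bn)
    assert torch.allclose(bn(conv(x)), fused(x), atol=1e-5)
```

The ReLU then fuses trivially on top of this, since it is a pointwise operation that can be applied inside the same kernel as the fused convolution.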
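Similarly, here is a sketch of the horizontal fusion idea (again in PyTorch, not TensorRT's internals): parallel 1×1 convolutions that read the same input, roughly as in the branches of Figure 2, can be stacked into one wider convolution whose output is then split by channel:

```python
import torch
import torch.nn as nn

# Three parallel 1x1 convolutions that all read the same 8-channel input.
branches = [nn.Conv2d(8, c, kernel_size=1).eval() for c in (4, 6, 2)]

# Horizontal fusion: concatenate their filters into one wider convolution,
# so the GPU launches one kernel instead of three.
merged = nn.Conv2d(8, 4 + 6 + 2, kernel_size=1).eval()
merged.weight.data = torch.cat([b.weight.data for b in branches], dim=0)
merged.bias.data = torch.cat([b.bias.data for b in branches], dim=0)

x = torch.randn(1, 8, 16, 16)
with torch.no_grad():
    separate = torch.cat([b(x) for b in branches], dim=1)
    assert torch.allclose(separate, merged(x), atol=1e-5)
```

Running one wide kernel instead of several narrow ones reduces kernel-launch overhead and improves GPU utilization, which is exactly what this optimization targets.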

