TensorRT INT8: a summary of interesting discussions on the official forum

  I recently studied NVIDIA GPU acceleration with TensorRT and INT8 calibration and ran into quite a few problems, so I set out to study the topic carefully. The theory behind INT8 calibration has already been covered on many forums; here I record the interesting discussions I found on the official forum, for easy reference later.

1, INT8 optimization of custom (plugin) layers

  INT8 plugin layer in TensorRT
  About INT8 model use
  An official developer said that plugin layers currently support only four data formats and that more features will be added in the future; nothing was said about how INT8 optimization of custom layers could be achieved.

2, detection accuracy drops after INT8 optimization

  Int8 calibration is not accurate... see image diff with and without
  Converting a detection model to INT8, the performance drops a lot
  Users found that detection networks produce different outputs, and even lower accuracy, after INT8 optimization. The official developer could not reproduce the problem in tests with YOLO and suggested calibrating the model with the legacy calibrator instead of the entropy calibrator to help improve accuracy. According to the official documentation, however, the legacy calibrator is a deprecated method. A minimal calibrator sketch follows.
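
  For reference, this is a minimal sketch of the TensorRT Python calibrator interface; the class name MyCalibrator, the cache-file handling, and the assumption that calibration data arrives as a list of float32 NumPy arrays are all illustrative, not from the thread. Swapping the base class to trt.IInt8LegacyCalibrator is one way to try the developer's suggestion.

import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

class MyCalibrator(trt.IInt8EntropyCalibrator):  # or trt.IInt8LegacyCalibrator
    def __init__(self, batches, cache_file):
        trt.IInt8EntropyCalibrator.__init__(self)
        self.batches = iter(batches)          # list of float32 NumPy arrays, one per batch
        self.cache_file = cache_file
        self.device_input = cuda.mem_alloc(batches[0].nbytes)
        self.batch_size = batches[0].shape[0]

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            data = np.ascontiguousarray(next(self.batches), dtype=np.float32)
        except StopIteration:
            return None                       # no more data: calibration ends
        cuda.memcpy_htod(self.device_input, data)
        return [int(self.device_input)]

    def read_calibration_cache(self):         # reuse a cache file if one exists
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)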

3, accuracy improves after INT8 optimization

  Analyzing sampleINT8 accuracy
  An experiment found that the recognition accuracy of a model improved after INT8 optimization. The analysis was that the model itself may have overfit during training; INT8 optimization reduces the degree of overfitting, so the model shows improved recognition accuracy on the test set.

4, using an INT8 calibration table on different devices

  Could a TensorRT INT8 CalibrationTable be used on a different hardware platform?
  An official developer answered that a calibration table can be reused as long as the same TensorRT version and the same calibration method are used; it was specially mentioned that EntropyCalibrator2, available since 5.1 (the official documentation notes it is the calibrator required when using DLA), can be migrated across platforms.
  Are TensorRT plan files portable across different GPUs of the same type?
  Note that TensorRT plan (engine) files should still be used with caution across platforms; a warning may appear. A small sketch of saving and loading a plan file follows.
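
  For context, a minimal sketch of serializing and deserializing an engine with the TensorRT Python API; the file name model.plan and the helper names are illustrative, and reloading is only expected to work on the same GPU type with the same TensorRT version.

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

def save_plan(engine, path):
    # engine is an ICudaEngine returned by the builder
    with open(path, "wb") as f:
        f.write(engine.serialize())

def load_plan(path):
    # only expected to work with the same GPU type and TensorRT version
    with open(path, "rb") as f:
        return trt.Runtime(logger).deserialize_cuda_engine(f.read())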

5, INT8 optimization of Faster R-CNN

  "Engine IS Buffer Full"
  the Sample official installation package has fastrcnn, but need to add a plugin, this discussion is August 2018, and referred to fastrcnn of int8 conversion seems a bit difficult to look at Mark.

6, checking the precision of each layer after TensorRT optimization

  How to check layer precision?
  Use nvprof: profile the running engine, and the names of the kernels it reports typically reveal which precision each layer actually executes in.

7, a layer with all-zero output causes INT8 calibration to fail

  INT8 calibration failing when one layer's output is uniformly zero
  If the weights of some layer are all zero, that layer's outputs are all zero as well, which makes INT8 calibration report a failure. The official developers considered this uncommon in real inference models and proposed modifying the model to clip the offending branch. A sketch for detecting such layers follows.
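
  As a sanity check before calibration, one could scan the model's weight tensors for all-zero values. A minimal sketch for an ONNX model; the file name model.onnx is hypothetical.

import numpy as np
import onnx
from onnx import numpy_helper

model = onnx.load("model.onnx")  # hypothetical model path
for init in model.graph.initializer:
    w = numpy_helper.to_array(init)
    if w.size > 0 and not np.any(w):
        print("all-zero weight tensor:", init.name, w.shape)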

8, memory use of TensorRT models

  TensorFlow / TRT with multiple TF sessions - dynamic INT8 engine memory allocation errors
  The official developers say that at build time TensorRT needs all available GPU memory in order to find the best inference configuration, so a memory limit set in TensorFlow does not take effect inside TensorRT. The setMaxWorkspaceSize(X) API only bounds the scratch workspace of the engine being produced. The Python equivalent is sketched below.
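
  A minimal sketch of the equivalent setting in the TensorRT 5/6-era Python API (the attribute later moved to the builder config in newer releases):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
# bounds only the scratch workspace of the engine being built; it does not
# limit the memory TensorRT uses while timing kernels at build time
builder.max_workspace_size = 1 << 30  # 1 GiB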

9, using DLA on a Tesla T4

  How to use DLA with a Tesla T4
  DLA is a configurable option in TensorRT, but the official documentation only indicates DLA support for Jetson AGX Xavier. It was stated here that DLA is only used in mobile-related products; desktop GPU products do not have this unit.

10, larger batch sizes increase TensorRT inference speedup

  Why does the inference speedup increase with batch size in TensorRT INT8?
  An official developer said that large batch sizes use the GPU more efficiently; in particular, batch sizes that are multiples of 32 allow GPUs such as the V100 and T4 to accelerate the matrix multiplications in fully connected layers with their dedicated (Tensor) cores.

11, TensorRT gives no acceleration

  Don't see any speedups using TensorRT
  How much TensorRT accelerates a network depends on how many of the original operations are replaced by TensorRT-optimized operations. For Python + TensorFlow, this can be checked with the following code:

# count how many nodes in the converted graph became TensorRT engine ops
trt_engine_ops = len([1 for n in trt_graph.node if str(n.op) == 'TRTEngineOp'])
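
  For context, trt_graph above comes from a TF-TRT conversion. A minimal sketch using the TensorFlow 1.x contrib API that was current at the time of these threads; frozen_graph and the output node name are assumptions:

import tensorflow.contrib.tensorrt as trt

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,       # a frozen tf.GraphDef, assumed already built
    outputs=["logits"],                 # hypothetical output node names
    max_batch_size=8,
    max_workspace_size_bytes=1 << 30,
    precision_mode="INT8")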


Origin: blog.csdn.net/yangjf91/article/details/92794182