First Sight: Introduction to TensorRT (repost)

    The following points about TensorRT should be clear from the start:

1. TensorRT is a deep learning inference tool developed by NVIDIA. It only supports inference, not training. Currently, TensorRT 3 already supports mainstream deep learning frameworks such as Caffe, Caffe2, TensorFlow, MXNet, and PyTorch;

2. Under the hood, TensorRT has been optimized in many ways for NVIDIA GPUs, not only through quantization; it can also be used in conjunction with the CUDA CODEC SDK, which together make up another development kit, DeepStream;

3. TensorRT is independent of the deep learning framework; it works by parsing the framework's model files, so there is no need to install additional DL libraries;

 
NVIDIA TensorRT is a high-performance neural network inference engine for deploying deep learning applications in production environments, such as image classification, segmentation, and object detection, providing maximum inference throughput and efficiency. TensorRT is the first programmable inference accelerator, able to accelerate both existing and future network architectures.
TensorRT requires CUDA support. TensorRT includes a library created to optimize deep learning models deployed in production environments: it takes trained neural networks (usually using 32-bit or 16-bit data) and optimizes those networks for reduced-precision INT8 operations. With CUDA programmability, TensorRT will be able to accelerate the growing trend of increasingly diverse and complex deep neural networks. With the massive acceleration provided by TensorRT, service providers can deploy these compute-intensive AI workloads at an affordable cost.

    Companies across many industries are already adopting the NVIDIA inference platform to gain new insights from data and deploy intelligent services to businesses and consumers.

    TensorRT is released by NVIDIA and currently comprises TensorRT 1, TensorRT 2, and TensorRT 3. It is a deep learning software package that supports FP16, and it supports Caffe models. TensorRT is relatively simple and easy to use, and it can more fully exploit the compute power of the GPU during the inference stage of deep learning algorithms. As it continues to improve, TensorRT keeps increasing speed while preserving accuracy. TensorRT automatically optimizes trained neural networks for runtime performance.

    TensorRT is a C++ library. TensorRT can only be used for inference, not for training.

    The basic TensorRT processing flow is: (1) convert a Caffe model into a GIE (GPU Inference Engine, the earlier name for a TensorRT engine) model, or load a GIE-ready model from disk or over the network; (2) run the GIE engine (input data is copied to the GPU in advance); (3) extract the results.

    There are two ways to create the GIE model: (1) use caffeToGIEModel; (2) build the GIE model yourself by referring to the sampleMNIST API. A minimal sketch of the first approach is shown below.
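    The following sketch is modeled on the caffeToGIEModel helper in the TensorRT samples. It assumes the TensorRT 2.x/3 C++ (GIE) API; header names and exact signatures may differ between versions, and the file names, output blob name, and workspace size are placeholders.

// Build a GIE engine from a Caffe model and serialize it to a plan
// (sketch based on the caffeToGIEModel helper in the TensorRT samples,
// TensorRT 2.x/3 C++ API; exact signatures may vary by version).
#include "NvInfer.h"
#include "NvCaffeParser.h"
#include <iostream>

using namespace nvinfer1;
using namespace nvcaffeparser1;

// Minimal logger required by the builder/runtime interfaces.
class Logger : public ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity != Severity::kINFO)
            std::cout << msg << std::endl;
    }
} gLogger;

// Parse the Caffe network description and weights, mark the output blob,
// build the optimized engine, and serialize it into a "plan".
IHostMemory* caffeToGIEModel(const char* deployFile,   // e.g. deploy.prototxt
                             const char* modelFile,    // e.g. net.caffemodel
                             const char* outputBlob,   // e.g. "prob"
                             int maxBatchSize)
{
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();

    ICaffeParser* parser = createCaffeParser();
    const IBlobNameToTensor* blobNameToTensor =
        parser->parse(deployFile, modelFile, *network, DataType::kFLOAT);

    // Tell TensorRT which blob is the network output.
    network->markOutput(*blobNameToTensor->find(outputBlob));

    builder->setMaxBatchSize(maxBatchSize);
    builder->setMaxWorkspaceSize(16 << 20);   // 16 MB scratch space (placeholder)

    ICudaEngine* engine = builder->buildCudaEngine(*network);

    // Serialize the engine to a plan that can be kept in memory or on disk.
    IHostMemory* plan = engine->serialize();

    network->destroy();
    parser->destroy();
    engine->destroy();
    builder->destroy();
    return plan;
}

    The returned plan corresponds to the serialized, optimized object code described in the build phase below.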

    Solving supervised machine learning problems with deep neural networks involves two steps. The first step is to use GPUs to train a deep neural network on massive amounts of labeled data, which requires iterative forward propagation and back propagation through the network and finally produces a trained model file. The second step is inference, that is, using the trained model to make predictions on new data, which only requires forward propagation through the network. TensorRT is a high-performance inference engine designed to provide maximum inference throughput and efficiency for common deep learning applications such as image classification, segmentation, and object detection. TensorRT optimizes trained neural networks for runtime performance.

    Using TensorRT involves two phases: build and deployment. During the build phase, TensorRT optimizes the network configuration and generates an optimized plan for computing the forward pass of the deep neural network. The plan is optimized object code that can be serialized and stored in memory or on disk.
The deployment phase typically takes the form of a long-running service or user application that accepts batches of input data, performs inference by executing the plan on that data, and returns batches of output data. With TensorRT, you don't need to install and run a deep learning framework on your deployment hardware. A sketch of this deployment flow is given below.
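    As a sketch of the deployment flow, assuming the serialized plan produced by the build sketch above and the placeholder blob names "data" and "prob", the engine can be deserialized and executed roughly as follows (TensorRT 2.x/3 C++ API; buffer sizes and binding names are illustrative only):

// Deployment phase sketch: deserialize the plan, copy input data to the GPU,
// run inference, and copy the results back.
#include "NvInfer.h"
#include <cuda_runtime_api.h>
#include <vector>

using namespace nvinfer1;

void runInference(IHostMemory* plan, const std::vector<float>& input,
                  std::vector<float>& output, int batchSize, ILogger& logger)
{
    IRuntime* runtime = createInferRuntime(logger);
    ICudaEngine* engine =
        runtime->deserializeCudaEngine(plan->data(), plan->size(), nullptr);
    IExecutionContext* context = engine->createExecutionContext();

    // Look up the binding slots for the input and output blobs
    // ("data" and "prob" are placeholder blob names).
    int inputIndex = engine->getBindingIndex("data");
    int outputIndex = engine->getBindingIndex("prob");

    // Allocate device buffers and copy the input to the GPU in advance.
    void* buffers[2];
    cudaMalloc(&buffers[inputIndex], input.size() * sizeof(float));
    cudaMalloc(&buffers[outputIndex], output.size() * sizeof(float));
    cudaMemcpy(buffers[inputIndex], input.data(),
               input.size() * sizeof(float), cudaMemcpyHostToDevice);

    // Execute the optimized plan (synchronous execute; enqueue() is the async form).
    context->execute(batchSize, buffers);

    // Extract the results.
    cudaMemcpy(output.data(), buffers[outputIndex],
               output.size() * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(buffers[inputIndex]);
    cudaFree(buffers[outputIndex]);
    context->destroy();
    engine->destroy();
    runtime->destroy();
}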

    TensorRT build phase: the TensorRT runtime requires three files to deploy a classification neural network: a network architecture file (deploy.prototxt), trained weights (net.caffemodel), and a label file that gives a name for each output class. You must also specify the batch size and the output layer.

    TensorRT performs several important transformations and optimizations on the neural network graph: layers with unused outputs are eliminated to avoid unnecessary computation; where possible, convolution, bias, and ReLU layers are fused into a single layer, using both vertical and horizontal layer fusion.
After the TensorRT parser reads in the trained network and configuration files, TensorRT performs these transformations transparently to the API user during the build phase.

    During the build phase, TensorRT optimizes the network, and during the deployment phase, TensorRT runs the optimized network to minimize latency and maximize throughput.

    TensorRT 2.1 key features: (1) support for custom layers; (2) INT8 support for performance improvements; (3) recurrent neural network implementations (LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit)); (4) a "vanilla" RNN layer implementation. A rough sketch of enabling the reduced-precision modes is shown below.
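    As a rough sketch of how the FP16 and INT8 modes are typically enabled on the builder in this generation of the API (the calibrator implementation is application-specific and omitted here; these builder methods changed in later TensorRT versions):

// Enable reduced-precision inference when building the engine
// (TensorRT 2.x/3 C++ API sketch; flags differ in newer versions).
// 'builder' is an IBuilder* as in the build sketch above; 'calibrator' is an
// application-specific IInt8Calibrator implementation, not shown.
#include "NvInfer.h"

void setReducedPrecision(nvinfer1::IBuilder* builder,
                         nvinfer1::IInt8Calibrator* calibrator)
{
    if (builder->platformHasFastFp16())
        builder->setHalf2Mode(true);            // FP16 ("half2") execution

    if (builder->platformHasFastInt8() && calibrator != nullptr)
    {
        builder->setInt8Mode(true);             // INT8 execution
        builder->setInt8Calibrator(calibrator); // provides the dynamic ranges
    }
}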

    In September 2017, NVIDIA released the neural network inference accelerator TensorRT 3. TensorRT 3 is a high-performance optimized compiler and runtime engine for production deployment of artificial intelligence applications. It is used to deploy deep learning programs in production environments and can quickly optimize, validate, and deploy trained neural networks, enabling inference on hyperscale data centers, embedded GPUs, or automotive GPU platforms. It ensures highly accurate INT8 and FP16 network execution.

    TensorRT 3 supports all major deep learning frameworks, such as Caffe2, MXNet, PyTorch, and TensorFlow. Combining TensorRT 3 with NVIDIA GPUs enables ultra-fast and efficient inference across all of these frameworks, supporting AI services such as image and speech recognition, natural language processing, visual search, and personalized recommendations.
With this inference engine, inference performance in the cloud and on end devices, including robots and self-driving cars, can be greatly improved while costs are effectively reduced.
 

TensorRT 1.0 and TensorRT 2.1 can be downloaded from https://developer.nvidia.com/nvidia-tensorrt-download.

Installing TensorRT 2.1 requires:

(1) the operating system must be Ubuntu 14.04 or Ubuntu 16.04; Windows and Mac are currently not supported;

(2) CUDA 7.5 or 8.0 must be installed;

(3) TensorRT 2.1 can be installed in two ways: from the deb package or from the tar file;

(4) for users with GTX 750 or K1200 graphics cards, CUDA needs to be upgraded to 8.0.

The TensorRT 2.1 user guide can be found at: http://docs.nvidia.com/deeplearning/sdk/tensorrt-user-guide/index.html

Part of the above content is translated from:  https://devblogs.nvidia.com/parallelforall/deploying-deep-learning-nvidia-tensorrt/

GitHub: https://github.com/fengbingchun/CUDA_Test

 

 
