NVIDIA releases TensorRT-LLM library for Windows to speed up running large models locally

NVIDIA has released a Windows version of its TensorRT-LLM library, saying it can make large language models run up to 4x faster on RTX GPUs.

GeForce RTX and NVIDIA RTX GPUs, equipped with dedicated AI accelerators called Tensor Cores, bring the power of local generative AI to more than 100 million Windows PCs and workstations.

TensorRT-LLM is an open-source library that improves inference performance when these GPUs run the latest large language models, such as Llama 2 and Code Llama. Last month, NVIDIA released TensorRT-LLM for data centers; the new Windows release targets consumer PCs, where it can run LLMs up to 4x faster.
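
For developers curious what using the library looks like in practice, here is a minimal sketch based on TensorRT-LLM's high-level Python API as documented in recent releases; the model name and sampling settings are illustrative, not taken from NVIDIA's announcement:

```python
# Minimal sketch: running a TensorRT-LLM-optimized model from Python.
# Assumes the high-level LLM API shipped in recent TensorRT-LLM releases;
# the checkpoint and parameters below are examples, not from the article.
from tensorrt_llm import LLM, SamplingParams

# Builds (or loads) a TensorRT engine for a Hugging Face checkpoint.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")

params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["What does TensorRT-LLM do?"], params)

for out in outputs:
    print(out.outputs[0].text)
```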

NVIDIA also released tools to help developers accelerate LLMs, including scripts for optimizing custom models with TensorRT-LLM, TensorRT-optimized open-source models, and developer reference projects that showcase the speed and quality of LLM responses.
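
In the spirit of those reference projects, a response-speed check can be as simple as timing a generation call. The sketch below assumes the same high-level Python API as above; the prompt and token budget are illustrative:

```python
# Hedged sketch: measuring end-to-end response latency and throughput.
import time

from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")
params = SamplingParams(max_tokens=128)

start = time.perf_counter()
outputs = llm.generate(["Summarize what Tensor Cores are."], params)
elapsed = time.perf_counter() - start

completion = outputs[0].outputs[0]
tokens = len(completion.token_ids)
print(f"Generated {tokens} tokens in {elapsed:.2f}s "
      f"({tokens / elapsed:.1f} tokens/s)")
```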

Source: www.oschina.net/news/262298/tensorrt-llm-windows-stable-diffusion-rtx