October 18, 2023 – NVIDIA, the dominant hardware vendor in generative artificial intelligence, continues to push the boundaries of AI innovation. Its GPUs power data centers for industry giants like Microsoft and OpenAI, running services such as Bing Chat and ChatGPT. Today, NVIDIA unveiled a new software tool aimed at improving the performance of Large Language Models (LLMs) on local Windows PCs.
In a recent blog post, NVIDIA introduced the TensorRT-LLM open-source library, originally designed for data centers and now coming to Windows PCs. The standout claim: NVIDIA says TensorRT-LLM can deliver up to a fourfold boost in LLM inference performance when paired with a GeForce RTX GPU on a Windows PC.
NVIDIA outlined the advantages of TensorRT-LLM for both developers and end-users:
With larger batch sizes, this acceleration can notably enhance the experience of using more complex LLMs, such as writing and coding assistants. It allows for the simultaneous generation of multiple unique auto-complete results, thereby accelerating performance and improving quality, ensuring users have the best choices available.
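The batching advantage described above can be illustrated with a toy sketch. This is not NVIDIA's code; `sample_next_token` is a hypothetical stand-in for a real model's decoding step, and the loop structure simply shows how several auto-complete candidates advance in lockstep, which is what batched GPU inference exploits:

```python
import random

random.seed(0)

# Tiny stand-in vocabulary; a real model samples from its full token vocabulary.
VOCAB = ["return", "result", "value", "None", "total"]

def sample_next_token(prefix):
    # Hypothetical stand-in for one LLM decoding step on a given prefix.
    return random.choice(VOCAB)

def complete(prefix, n_candidates=4, n_tokens=3):
    # With batched inference, all candidates would advance in one GPU pass
    # per decoding step; here we loop sequentially, but the shape is the same.
    candidates = [[] for _ in range(n_candidates)]
    for _ in range(n_tokens):
        for cand in candidates:
            cand.append(sample_next_token(prefix + " " + " ".join(cand)))
    return [" ".join(c) for c in candidates]

suggestions = complete("def add(a, b):")
print(suggestions)
```

Each decoding step costs roughly one forward pass regardless of batch size (up to hardware limits), which is why generating several unique suggestions at once is nearly as fast as generating one.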
The blog post also showcased TensorRT-LLM in action. Asked "What NVIDIA technologies are integrated into Alan Wake 2?", the base Llama 2 model produced the unhelpful response "The game has not been announced." With Retrieval-Augmented Generation (RAG), adding GeForce news articles to a vector database and connecting it to the same Llama 2 model, the pipeline not only returned the correct answer (NVIDIA DLSS 3.5, NVIDIA Reflex, and ray tracing) but did so faster thanks to TensorRT-LLM acceleration. This blend of speed and capability delivers more useful answers for users.
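The RAG workflow in the demo above can be sketched in a few lines. This is a minimal toy, not NVIDIA's pipeline: the bag-of-words "embedding" stands in for a real sentence-embedding model, and the document texts are illustrative. The structure, embed documents, retrieve the closest one to the question, and prepend it to the prompt, is the core of the technique:

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding"; a real RAG stack would use a
    # neural sentence-embedding model here.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    def __init__(self):
        self.docs = []

    def add(self, text):
        self.docs.append((embed(text), text))

    def top_k(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[0]), reverse=True)
        return [text for _, text in ranked[:k]]

# Illustrative documents standing in for the GeForce news articles.
store = VectorStore()
store.add("GeForce news: the game integrates NVIDIA DLSS 3.5, NVIDIA Reflex, and ray tracing.")
store.add("Driver update adds RTX Video Super Resolution 1.5 for streamed video.")

question = "What NVIDIA technologies are integrated into the game?"
context = store.top_k(question, k=1)[0]
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# In the demo, `prompt` would then go to the Llama 2 model served by TensorRT-LLM.
print(context)
```

Because the model answers from retrieved, up-to-date context rather than from its frozen training data, it can answer questions about content published after the model was trained.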
TensorRT-LLM will soon be available on NVIDIA’s developer website.
NVIDIA also introduced several AI-based features in its latest GeForce driver update. These include version 1.5 of RTX Video Super Resolution, which offers better upscaling and reduced compression artifacts when streaming online video. In addition, NVIDIA added TensorRT AI acceleration to the Stable Diffusion Web UI, letting users with GeForce RTX GPUs generate images from AI image generators faster than before.