Logo for NVIDIA NIM for LLMs
NVIDIA NIM for large language models (LLMs) brings the power of state-of-the-art AI models to your applications, providing unmatched natural language processing and understanding capabilities.
Latest Tag
May 7, 2024
Compressed Size
6.77 GB
Multinode Support
Multi-Arch Support
24.02-day0 (Latest) Security Scan Results

Linux / amd64


NVIDIA NIM for GPU-accelerated LLM inference through OpenAI-compatible APIs


NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed to accelerate deployment of generative AI across your enterprise. This versatile runtime supports a broad spectrum of AI models—from open-source community models to NVIDIA AI Foundation models, as well as custom AI models. Leveraging industry standard APIs, developers can quickly build enterprise-grade AI applications with just a few lines of code.

What is NVIDIA NIM for Large Language Models (LLMs)?

NVIDIA NIM for large language models (LLMs) brings the power of state-of-the-art AI models to your applications, providing unmatched natural language processing and understanding capabilities. Whether you're developing chatbots, content analyzers, or any application that needs to understand and generate human language, NIM has you covered. Built on the NVIDIA software platform, incorporating CUDA, TensorRT (TRT), TensorRT-LLM (TRT-LLM), and Triton, NIM delivers state-of-the-art, GPU-accelerated LLM serving.
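Because NIM exposes OpenAI-compatible endpoints, any standard HTTP client can talk to it. The sketch below builds an OpenAI-style chat-completion payload; the endpoint URL, port, and model name are illustrative assumptions, not values from this page — substitute those of your own deployment.

```python
import json

# Hypothetical endpoint and model name -- adjust for your deployment.
NIM_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "llama-2-7b-chat"

def build_chat_request(prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat-completion payload for a NIM endpoint."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Summarize this article in one sentence.")
print(json.dumps(payload, indent=2))

# To send it against a running NIM (illustrative, stdlib only):
#   import urllib.request
#   req = urllib.request.Request(NIM_URL, data=json.dumps(payload).encode(),
#                                headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```

Because the payload follows the OpenAI schema, existing OpenAI client code can usually be pointed at a NIM deployment by changing only the base URL and model name.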

High Performance Features

  • Optimized Scheduling: NVIDIA NIM for LLMs enables an optimized scheduling technique called in-flight batching. This technique takes advantage of the fact that the overall text generation process for an LLM can be broken down into multiple execution iterations on the model. With in-flight batching, rather than waiting for the whole batch to finish before moving on to the next set of requests, the Inference Microservice runtime immediately evicts finished sequences from the batch and begins executing new requests while others are still in flight.

  • Scalable Deployment: Whether you're catering to a few users or millions, the microservice scales seamlessly to meet your demands.

  • Advanced Language Models: Built on cutting-edge LLM architectures, Inference Microservice provides optimized and pre-generated engines for a variety of popular models. The tooling to create GPU optimized models is also included.

  • Flexible Integration: Easily incorporate the microservice into existing workflows and applications, thanks to multiple API endpoints.

  • Secure Processing: Your data's privacy is paramount. Inference Microservice ensures that all inferences are processed securely, with rigorous data protection measures in place.
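The in-flight batching idea described above can be sketched as a toy simulation: each iteration generates one token per active sequence, finished sequences are evicted immediately, and waiting requests fill the freed slots. The names, slot count, and token counts here are illustrative, not the runtime's actual internals.

```python
from collections import deque

def in_flight_batching(requests, max_batch=4):
    """Toy in-flight batching loop.

    `requests` is a list of (request_id, tokens_needed) pairs.
    Returns the iteration at which each request completed.
    """
    waiting = deque(requests)
    active = {}            # request_id -> tokens still to generate
    finished_at = {}
    step = 0
    while waiting or active:
        # Admit new requests into any free batch slots -- no waiting
        # for the whole batch to drain, unlike static batching.
        while waiting and len(active) < max_batch:
            rid, need = waiting.popleft()
            active[rid] = need
        step += 1
        # One model iteration: every active sequence emits one token.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:   # sequence done -> evict immediately
                finished_at[rid] = step
                del active[rid]
    return finished_at

done = in_flight_batching(
    [("a", 2), ("b", 5), ("c", 1), ("d", 3), ("e", 2)], max_batch=2)
print(done)  # {'a': 2, 'c': 3, 'b': 5, 'd': 6, 'e': 7}
```

Note how short request "c" completes at iteration 3 instead of waiting for long-running "b" to finish — that early eviction and backfill is what improves GPU utilization and latency relative to static batching.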


Use Cases

  • Chatbots & Virtual Assistants: Empower your bots with human-like language understanding and responsiveness.

  • Content Generation & Summarization: Generate high-quality content or distill lengthy articles into concise summaries with ease.

  • Sentiment Analysis: Understand user sentiments in real-time, driving better business decisions.

  • Language Translation: Break language barriers with efficient and accurate translation services.

  • And many more... The potential applications of Inference Microservice are vast, spanning various industries and use cases.

Compatible Infrastructure Software Versions

For optimal performance, deploy the supported NVIDIA AI Enterprise Infrastructure software with this NIM.

nim_llm: 24.02 Use this tag for models supported by the TRT-LLM backend NIM. Pre-built engines are available for select GPUs, and MRG is available for generating engines for other GPUs:

  • Mixtral-8x7B-v0.1
  • Mixtral-8x7B-Instruct-v0.1
  • Llama-2-13b-chat
  • Llama-2-13b
  • StarCoder
  • Nemotron-3-8B-QA-4k
  • Nemotron-3-8B-Chat-4k-SteerLM
  • Nemotron-3-8B-Base-4k

nim_llm: 24.02-day0 Use this tag for the following models:

  • Llama-2-70b-chat
  • Llama-2-70b
  • Llama2-70B-SteerLM-Chat
  • CodeLlama-70b-Instruct-hf
  • CodeLlama-13b-Instruct-hf
  • CodeLlama-34b-Instruct-hf
  • Llama-2-7b-chat
  • Llama-2-7b
  • StarCoder2-15B
  • StarCoderPlus
  • Gemma 7B Instruct
  • Gemma 2B Instruct
  • Falcon-40B-Instruct
  • Phi-2
  • Mistral-7B-Instruct-v0.2
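When scripting deployments, the two compatibility lists above can be encoded as a simple lookup from model name to container tag. The helper below is an illustrative sketch (with only a subset of the models transcribed), not an NVIDIA-provided tool.

```python
# Model -> nim_llm container tag, transcribed from the lists above
# (subset shown; the remaining models follow the same pattern).
NIM_TAG_FOR_MODEL = {
    "Mixtral-8x7B-Instruct-v0.1": "24.02",
    "Llama-2-13b-chat": "24.02",
    "StarCoder": "24.02",
    "Llama-2-70b-chat": "24.02-day0",
    "CodeLlama-34b-Instruct-hf": "24.02-day0",
    "Mistral-7B-Instruct-v0.2": "24.02-day0",
}

def nim_tag(model: str) -> str:
    """Return the nim_llm tag that supports `model` (KeyError if unknown)."""
    return NIM_TAG_FOR_MODEL[model]

print(nim_tag("Llama-2-70b-chat"))  # 24.02-day0
```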

Getting Started with NVIDIA NIM

Please visit the NVIDIA NIM Collection page for instructions on getting started: NVIDIA NIM for LLM

Get Help

Enterprise Support

Get access to knowledge base articles and support cases or submit a ticket.

NVIDIA AI Enterprise Documentation

Visit the NVIDIA AI Enterprise Documentation Hub for release documentation, deployment guides and more.

NVIDIA Licensing Portal

Go to the NVIDIA Licensing Portal to manage the software licenses for your products.


This NIM is licensed under the NVIDIA AI Product Agreement. By downloading and using the artifacts in this collection, you accept the terms and conditions of this license.