Logo for NVIDIA NIM for LLMs
NVIDIA NIM for large language models (LLMs) brings the power of state-of-the-art AI models to your applications, providing unmatched natural language processing and understanding capabilities.
Latest Tag
May 7, 2024
Compressed Size
6.77 GB
Multinode Support
Multi-Arch Support
24.02-day0 (Latest) Security Scan Results

Linux / amd64


NVIDIA NIM for GPU-accelerated LLM inference through OpenAI-compatible APIs


NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed to accelerate deployment of generative AI across your enterprise. This versatile runtime supports a broad spectrum of AI models—from open-source community models to NVIDIA AI Foundation models, as well as custom AI models. Leveraging industry standard APIs, developers can quickly build enterprise-grade AI applications with just a few lines of code.

What is NVIDIA NIM for Large Language Models (LLMs)?

NVIDIA NIM for large language models (LLMs) brings the power of state-of-the-art AI models to your applications, providing unmatched natural language processing and understanding capabilities. Whether you're developing chatbots, content analyzers, or any application that needs to understand and generate human language, NIM has you covered. Built on the NVIDIA software platform, incorporating CUDA, TensorRT (TRT), TensorRT-LLM (TRT-LLM), and Triton, NIM delivers state-of-the-art, GPU-accelerated LLM serving.
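Because NIM exposes OpenAI-compatible endpoints, any standard HTTP client can talk to it. The sketch below builds an OpenAI-style chat-completion payload; the endpoint URL, port, and model name are illustrative assumptions, not values from this page — substitute those of your own deployment.

```python
import json

# Hypothetical endpoint and model name -- adjust for your deployment.
NIM_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "llama-2-7b-chat"

def build_chat_request(prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat-completion payload for a NIM endpoint."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Summarize this article in one sentence.")
print(json.dumps(payload, indent=2))

# To send it against a running NIM (illustrative, stdlib only):
#   import urllib.request
#   req = urllib.request.Request(NIM_URL, data=json.dumps(payload).encode(),
#                                headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```

Because the payload follows the OpenAI schema, existing OpenAI client code can usually be pointed at a NIM deployment by changing only the base URL and model name.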

High Performance Features

  • Optimized Scheduling: NVIDIA NIM for LLMs enables an optimized scheduling technique called in-flight batching. This technique takes advantage of the fact that the overall text generation process for an LLM can be broken down into multiple execution iterations on the model. With in-flight batching, rather than waiting for the whole batch to finish before moving on to the next set of requests, the Inference Microservice runtime immediately evicts finished sequences from the batch and begins executing new requests while others are still in flight.

  • Scalable Deployment: Whether you're catering to a few users or millions, the microservice scales seamlessly to meet your demands.

  • Advanced Language Models: Built on cutting-edge LLM architectures, Inference Microservice provides optimized and pre-generated engines for a variety of popular models. The tooling to create GPU optimized models is also included.

  • Flexible Integration: Easily incorporate the microservice into existing workflows and applications, thanks to multiple API endpoints.

  • Secure Processing: Your data's privacy is paramount. Inference Microservice ensures that all inferences are processed securely, with rigorous data protection measures in place.
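The in-flight batching idea described above can be sketched as a toy simulation: each iteration generates one token per active sequence, finished sequences are evicted immediately, and waiting requests fill the freed slots. The names, slot count, and token counts here are illustrative, not the runtime's actual internals.

```python
from collections import deque

def in_flight_batching(requests, max_batch=4):
    """Toy in-flight batching loop.

    `requests` is a list of (request_id, tokens_needed) pairs.
    Returns the iteration at which each request completed.
    """
    waiting = deque(requests)
    active = {}            # request_id -> tokens still to generate
    finished_at = {}
    step = 0
    while waiting or active:
        # Admit new requests into any free batch slots -- no waiting
        # for the whole batch to drain, unlike static batching.
        while waiting and len(active) < max_batch:
            rid, need = waiting.popleft()
            active[rid] = need
        step += 1
        # One model iteration: every active sequence emits one token.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:   # sequence done -> evict immediately
                finished_at[rid] = step
                del active[rid]
    return finished_at

done = in_flight_batching(
    [("a", 2), ("b", 5), ("c", 1), ("d", 3), ("e", 2)], max_batch=2)
print(done)  # {'a': 2, 'c': 3, 'b': 5, 'd': 6, 'e': 7}
```

Note how short request "c" completes at iteration 3 instead of waiting for long-running "b" to finish — that early eviction and backfill is what improves GPU utilization and latency relative to static batching.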


Use Cases

  • Chatbots & Virtual Assistants: Empower your bots with human-like language understanding and responsiveness.

  • Content Generation & Summarization: Generate high-quality content or distill lengthy articles into concise summaries with ease.

  • Sentiment Analysis: Understand user sentiments in real-time, driving better business decisions.

  • Language Translation: Break language barriers with efficient and accurate translation services.

  • And many more... The potential applications of Inference Microservice are vast, spanning various industries and use cases.

Compatible Infrastructure Software Versions

For optimal performance, deploy the supported NVIDIA AI Enterprise Infrastructure software with this NIM.

nim_llm: 24.02 Use this tag for models supported by the TRT-LLM backend NIM. Pre-built engines are available for select GPUs, and MRG is available for generating engines for other GPUs:

  • Mixtral-8x7B-v0.1
  • Mixtral-8x7B-Instruct-v0.1
  • Llama-2-13b-chat
  • Llama-2-13b
  • StarCoder
  • Nemotron-3-8B-QA-4k
  • Nemotron-3-8B-Chat-4k-SteerLM
  • Nemotron-3-8B-Base-4k

nim_llm: 24.02-day0 Use this tag for the following models:

  • Llama-2-70b-chat
  • Llama-2-70b
  • Llama2-70B-SteerLM-Chat
  • CodeLlama-70b-Instruct-hf
  • CodeLlama-13b-Instruct-hf
  • CodeLlama-34b-Instruct-hf
  • Llama-2-7b-chat
  • Llama-2-7b
  • StarCoder2-15B
  • StarCoderPlus
  • Gemma 7B Instruct
  • Gemma 2B Instruct
  • Falcon-40B-Instruct
  • Phi-2
  • Mistral-7B-Instruct-v0.2
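When scripting deployments, the two compatibility lists above can be encoded as a simple lookup from model name to container tag. The helper below is an illustrative sketch (with only a subset of the models transcribed), not an NVIDIA-provided tool.

```python
# Model -> nim_llm container tag, transcribed from the lists above
# (subset shown; the remaining models follow the same pattern).
NIM_TAG_FOR_MODEL = {
    "Mixtral-8x7B-Instruct-v0.1": "24.02",
    "Llama-2-13b-chat": "24.02",
    "StarCoder": "24.02",
    "Llama-2-70b-chat": "24.02-day0",
    "CodeLlama-34b-Instruct-hf": "24.02-day0",
    "Mistral-7B-Instruct-v0.2": "24.02-day0",
}

def nim_tag(model: str) -> str:
    """Return the nim_llm tag that supports `model` (KeyError if unknown)."""
    return NIM_TAG_FOR_MODEL[model]

print(nim_tag("Llama-2-70b-chat"))  # 24.02-day0
```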

Getting Started with NVIDIA NIM

Please visit the NVIDIA NIM Collection page for instructions on getting started: NVIDIA NIM for LLM

Get Help

Enterprise Support

Get access to knowledge base articles and support cases or submit a ticket.

NVIDIA AI Enterprise Documentation

Visit the NVIDIA AI Enterprise Documentation Hub for release documentation, deployment guides and more.

NVIDIA Licensing Portal

Go to the NVIDIA Licensing Portal to manage the software licenses for your products.


This NIM is licensed under the NVIDIA AI Product Agreement. By downloading and using the artifacts in this collection, you accept the terms and conditions of this license.