GPU-optimized AI, Machine Learning, & HPC Software

NVIDIA

Triton Inference Server is open-source software that lets teams deploy trained AI models from any framework, from local or cloud storage, on any GPU- or CPU-based infrastructure in the cloud, data center, or embedded devices.

Container

NVIDIA

TensorRT

NVIDIA TensorRT is a C++ library that facilitates high-performance inference on NVIDIA graphics processing units (GPUs). TensorRT takes a trained network and produces a highly optimized runtime engine that performs inference for that network.

Container

NVIDIA Developer Program

NVIDIA

NVIDIA NIM for GenMol

GenMol is a masked diffusion model trained on molecular SAFE representations for fragment-based molecule generation, which can serve as a generalist model for various drug discovery tasks.

Container

NVIDIA

NVIDIA NIM Operator

An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment

Container

NVIDIA

vLLM

vLLM is a fast and easy-to-use library for LLM inference and serving. The NVIDIA vLLM NGC Container is optimized for GPU acceleration, and contains a validated set of libraries that enable and optimize GPU performance.

Container

NVIDIA Developer Program

MIT

DiffDock

DiffDock predicts the 3D structure of the interaction between a molecule and a protein.

Container

NVIDIA

Dynamo vLLM Runtime

The Dynamo vLLM runtime image is a containerized build of Dynamo + vLLM that serves as the base runtime environment for vLLM-based inference with Dynamo's distributed inference framework.

Container

NVIDIA

Dynamo TensorRT-LLM Runtime

The Dynamo TensorRT-LLM runtime image is a containerized build of Dynamo + TensorRT-LLM that serves as the base runtime environment for TensorRT-LLM-based inference with Dynamo's distributed inference framework.

Container

NVIDIA

CUDA GL

CUDA is a parallel computing platform and programming model that enables dramatic increases in computing performance by harnessing the power of NVIDIA GPUs. These images extend the CUDA images to include OpenGL support through libglvnd.

Container

NVIDIA

NVIDIA NIM Operator

Helm chart for NIM Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment

Helm Chart

NVIDIA

Dynamo Platform

A comprehensive Helm chart for deploying the NVIDIA Dynamo operator and its dependencies

Helm Chart

NVIDIA

Dynamo kubernetes-operator

kubernetes-operator is a container that runs as part of the Dynamo cloud platform, a Kubernetes platform for deploying and managing inference services. It manages the lifecycle of Dynamo inference deployments in Kubernetes.

Container

NVIDIA

Riva Skills Quick Start

Scripts and utilities for getting started with Riva Speech Skills

Resource

NVIDIA

Dynamo SGLang Runtime

The Dynamo SGLang runtime image is a containerized build of Dynamo + SGLang that serves as the base runtime environment for SGLang-based inference with Dynamo's distributed inference framework.

Container

NVIDIA AI Enterprise

NVIDIA

Triton Inference Server PB October 2024 (PB 24h2)

Triton Inference Server Production Branch October 2024 (PB 24h2) offers a 9-month lifecycle for API stability, with monthly patches for high and critical software vulnerabilities.

Container

NVIDIA AI Enterprise

NVIDIA

Triton Inference Server PB March 2025 (PB 25h1)

Triton Inference Server PB May 2025 (PB 25h1) offers a 9-month lifecycle for API stability, with monthly patches for high and critical software vulnerabilities.

Container

NVIDIA

Merlin PyTorch

The Merlin PyTorch container lets users preprocess data and engineer features with NVTabular, train a deep-learning-based recommender system model with PyTorch, and serve the trained model on Triton Inference Server.

Container

NVIDIA

NVCF ClusterAgent

NVCF ClusterAgent (NVCA) is a self-installable microservice that enables a compute backend, whether DGX Cloud or another NVIDIA compute backend, to be used as an NVCF target that hosts NVIDIA Cloud Functions instances.

Container

NVIDIA AI Enterprise

NVIDIA

Triton Inference Server PB October 2025 (PB 25h2)

Triton Inference Server PB October 2025 (PB 25h2) offers a 9-month lifecycle for API stability, with monthly patches for high and critical software vulnerabilities.

Container

NVIDIA

NVCA Webhook Server

NVCF ClusterAgent (NVCA) is a self-installable microservice that enables a compute backend to host NVIDIA Cloud Functions instances. NVCA uses the NVCA Webhook Server to enforce restrictions for NVCF Mini Service.

Container

NVIDIA

Dynamo CRDs

A Helm chart that manages Custom Resource Definitions (CRDs) for the NVIDIA Dynamo ecosystem in Kubernetes

Helm Chart

NVIDIA

Morpheus

NVIDIA Morpheus is an open AI application framework for cybersecurity developers.

Container

NVIDIA AI Enterprise

NVIDIA

Triton Inference Server Long-Term Support Branch 2 (LTSB 2)

Triton Inference Server is open-source software that lets teams deploy trained AI models from any framework, from local or cloud storage, on any GPU- or CPU-based infrastructure in the cloud, data center, or embedded devices.

Container

NVIDIA

Merlin TensorFlow

The Merlin TensorFlow container lets users preprocess data and engineer features with NVTabular, train a deep-learning-based recommender system model with TensorFlow, and serve the trained model on Triton Inference Server.

Container