Monitor GPUs in Kubernetes using NVIDIA DCGM. This is a Prometheus exporter for GPU monitoring in Kubernetes.

dcgm-exporter

The NVIDIA Kubernetes Device Plugin registers GPUs as compute resources in the Kubernetes cluster.

k8s-device-plugin

gpu-operator-validator

Docker containers distributed as part of the TAO Toolkit package

tao-toolkit

Build and run GPU-accelerated Docker containers.

container-toolkit

Plugin for the Kubernetes Node Feature Discovery for adding GPU node labels.

gpu-feature-discovery

Provision the NVIDIA GPU driver as a container.

driver

Triton Inference Server is open-source software that lets teams deploy trained AI models from any framework, from local or cloud storage, on any GPU- or CPU-based infrastructure in the cloud, the data center, or embedded devices.

tritonserver

Manages NVIDIA driver upgrades in a Kubernetes cluster.

k8s-driver-manager

CUDA is a parallel computing platform and programming model that enables dramatic increases in computing performance by harnessing the power of NVIDIA GPUs.

cuda

Manage and monitor GPUs in cluster environments.

dcgm

Deploy and manage NVIDIA GPU resources in Kubernetes.

gpu-operator

PyTorch is a GPU-accelerated tensor computational framework. Functionality can be extended with common Python libraries such as NumPy and SciPy. Automatic differentiation is done with a tape-based system at the functional and neural network layer levels.

pytorch

TensorFlow is an open source platform for machine learning. It provides comprehensive tools and libraries in a flexible architecture allowing easy deployment across a variety of platforms and devices.

tensorflow

Manage MIG partitions in Kubernetes with a simple label change to a node.

k8s-mig-manager

Llama 3.1 70B-Instruct NIM Production Branch October 2024 (PB 24h2) offers a 9-month lifecycle for API stability, with monthly patches for high and critical software vulnerabilities.

llama-3.1-70b-instruct-pb24h2

NVIDIA TensorRT is a C++ library that facilitates high-performance inference on NVIDIA graphics processing units (GPUs). TensorRT takes a trained network and produces a highly optimized runtime engine that performs inference for that network.

tensorrt

NVIDIA NIM for GPU-accelerated Snowflake Arctic Embed Large embedding inference.

arctic-embed-l

The NVIDIA Retrieval QA Llama 1B Reranking NIM is optimized to provide a logit score representing how relevant a document is to a given query. It is fine-tuned for multilingual and cross-lingual text question-answering retrieval.

llama-3.2-nv-rerankqa-1b-v2

NVIDIA NIM for GPU-accelerated Gemma-2-2B-IT inference through OpenAI-compatible APIs.

gemma-2-2b-instruct

NVIDIA NIM for GPU-accelerated Llama-3-Swallow-70B-Instruct-v0.1 inference through OpenAI-compatible APIs.

llama-3-swallow-70b-instruct-v0.1

NVIDIA NIM for GPU-accelerated Llama 2 70B inference through OpenAI-compatible APIs.

llama-2-70b-chat

NVIDIA NIM for GPU-accelerated Llama 3.1 Swallow 8B inference through OpenAI-compatible APIs.

llama-3.1-swallow-8b-instruct-v0.1

NVIDIA NIM for GPU-accelerated Mistral-NeMo-12B-Instruct inference through OpenAI-compatible APIs.

mistral-nemo-12b-instruct

Hive’s Deepfake Image Detection model analyzes images and returns a confidence score indicating how likely it is that an image contains a deepfake.