The NeMo DGXC Admission Controller microservice enables multi-node GPU training environments by managing Kubernetes resources for high-performance computing. It handles networking configuration for high-performance interconnects such as Elastic Fabric Adapter (EFA) on AWS, InfiniBand on Azure, and RDMA on OCI.
You can use this service to optimize GPU workloads across multiple nodes, configure high-performance networking for distributed training, and ensure proper resource allocation for AI training jobs in Kubernetes clusters.
You can install the DGXC Admission Controller as part of the NeMo microservices platform by using the NeMo Microservices Helm Chart. For details, refer to the chart and its documentation, the container listing, and the Helm Installation Guide.
Note: Use, distribution, or deployment of this microservice in production requires an NVIDIA AI Enterprise License.
The software and materials are governed by the NVIDIA Software License Agreement and the Product-Specific Terms for NVIDIA AI Products.