NVIDIA

Nemotron-3-Super-120B-A12B (Turbo)

Container

NVIDIA

Nemotron-3-Super-120B-A12B (Turbo)

Nemotron-3-Super-120B-A12B is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks.

NVIDIA NIM

Nemotron-3-Super-120B-A12B(Turbo) Overview

Description:

This turbo NIM container houses Nemotron-3-Super-120B-A12B, which is a large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, reasoning, and conversational capabilities. It is optimized for collaborative agents and high-volume workloads such as IT ticket automation. Like other models in the family, it responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be configured through a flag in the chat template.

The model employs a hybrid Latent Mixture-of-Experts (LatentMoE) architecture, utilizing interleaved Mamba-2 and MoE layers, along with select Attention layers. Distinct from the Nano model, the Super model incorporates Multi-Token Prediction (MTP) layers for faster text generation and improved quality, and it is trained using NVFP4 quantization to maximize compute efficiency. The model has 12B active parameters and 120B parameters in total.

The supported languages include: English, French, German, Italian, Japanese, Spanish, and Chinese.

This turbo NIM container has following optimizations added -

Newer OSS vLLM container
KV cache and Mamba SSM cache settings
FlashInfer attention backend
Expert parallelism
Kernel fusion for attn+quant

The container components are ready for commercial/non-commercial use.

License/Terms of Use:

GOVERNING TERMS: The NIM container is governed by the NVIDIA Software License Agreement and Product-Specific Terms for NVIDIA AI Products. Use of this model is governed by the NVIDIA Nemotron Open Model License.

Get Help

NVIDIA Developer Community Forum

Get access to community knowledge base articles and support cases (https://forums.developer.nvidia.com/)

Deployment Geography:

Global

Release Date:

Huggingface 3/11/2026 via https://huggingface.co/

NGC 3/11/2026 via
https://catalog.ngc.nvidia.com/models

Nemotron-3-Super-120B-A12B-turbo

NVIDIA-Nemotron-3-Super-120B-A12B-turbo Container includes the following model:

Model Name & Link	Use Case	How to Pull the Model
NVIDIA-Nemotron-3-Super-120B-A12B-turbo	Nemotron-3-Super-120B-A12B is a general purpose reasoning and chat model that employs a hybrid Latent Mixture-of-Experts (LatentMoE) architecture, utilizing interleaved Mamba-2 and MoE layers, along with select Attention layers. The model has 12B active parameters and 120B parameters in total.	Manual

Deployment Details:

Visit the NIM Container LLM page for release documentation, deployment guides, and more.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Container Version(s):

/nim/nvidia/nemotron-3-super-120b-a12b--turbo:1.0.0

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns here.

You are responsible for ensuring that your use of NVIDIA provided models complies with all applicable laws.

Get Help

Getting started with the NIM

Deploying and integrating the NIM is straightforward thanks to our industry standard APIs. Visit the NVIDIA NIM documentation for release documentation, deployment guides and more.

NVIDIA Developer Community Forum

For support, visit the NVIDIA Developer Community Forum.

Publisher

NVIDIA

Latest Tag1.0.0

UpdatedJune 25, 2026 UTC

Compressed Size9.16 GB

Multinode SupportNo

Multi-Arch SupportYes

System

signed images

Labels

Experimental NSPECT-YN18-TKF1