Nemotron-3-Super-120B-A12B is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks.
Nemotron-3-Super-120B-A12B(Turbo) Overview
Description:
This turbo NIM container houses Nemotron-3-Super-120B-A12B, which is a large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, reasoning, and conversational capabilities. It is optimized for collaborative agents and high-volume workloads such as IT ticket automation. Like other models in the family, it responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be configured through a flag in the chat template.
The model employs a hybrid Latent Mixture-of-Experts (LatentMoE) architecture, utilizing interleaved Mamba-2 and MoE layers, along with select Attention layers. Distinct from the Nano model, the Super model incorporates Multi-Token Prediction (MTP) layers for faster text generation and improved quality, and it is trained using NVFP4 quantization to maximize compute efficiency. The model has 12B active parameters and 120B parameters in total.
The supported languages include: English, French, German, Italian, Japanese, Spanish, and Chinese.
This turbo NIM container has following optimizations added -
- Newer OSS vLLM container
- KV cache and Mamba SSM cache settings
- FlashInfer attention backend
- Expert parallelism
- Kernel fusion for attn+quant
The container components are ready for commercial/non-commercial use.
License/Terms of Use:
GOVERNING TERMS: The NIM container is governed by the NVIDIA Software License Agreement and Product-Specific Terms for NVIDIA AI Products. Use of this model is governed by the NVIDIA Nemotron Open Model License.
Get Help
NVIDIA Developer Community Forum
Get access to community knowledge base articles and support cases (https://forums.developer.nvidia.com/)
Deployment Geography:
Global
Release Date:
Huggingface 3/11/2026 via https://huggingface.co/
NGC 3/11/2026 via
https://catalog.ngc.nvidia.com/models
Nemotron-3-Super-120B-A12B-turbo
NVIDIA-Nemotron-3-Super-120B-A12B-turbo Container includes the following model:
| Model Name & Link | Use Case | How to Pull the Model |
|---|---|---|
| NVIDIA-Nemotron-3-Super-120B-A12B-turbo | Nemotron-3-Super-120B-A12B is a general purpose reasoning and chat model that employs a hybrid Latent Mixture-of-Experts (LatentMoE) architecture, utilizing interleaved Mamba-2 and MoE layers, along with select Attention layers. The model has 12B active parameters and 120B parameters in total. | Manual |
Deployment Details:
Visit the NIM Container LLM page for release documentation, deployment guides, and more.
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Container Version(s):
/nim/nvidia/nemotron-3-super-120b-a12b--turbo:1.0.0
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.
You are responsible for ensuring that your use of NVIDIA provided models complies with all applicable laws.
Get Help
Getting started with the NIM
Deploying and integrating the NIM is straightforward thanks to our industry standard APIs. Visit the NVIDIA NIM documentation for release documentation, deployment guides and more.
NVIDIA Developer Community Forum
For support, visit the NVIDIA Developer Community Forum.