NVIDIA
NVIDIA
Nemotron-3-Super-120B-A12B (Turbo)
Container
NVIDIA
NVIDIA
Nemotron-3-Super-120B-A12B (Turbo)

Nemotron-3-Super-120B-A12B is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks.

Nemotron-3-Super-120B-A12B(Turbo) Overview

Description:

This turbo NIM container houses Nemotron-3-Super-120B-A12B, which is a large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, reasoning, and conversational capabilities. It is optimized for collaborative agents and high-volume workloads such as IT ticket automation. Like other models in the family, it responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be configured through a flag in the chat template.

The model employs a hybrid Latent Mixture-of-Experts (LatentMoE) architecture, utilizing interleaved Mamba-2 and MoE layers, along with select Attention layers. Distinct from the Nano model, the Super model incorporates Multi-Token Prediction (MTP) layers for faster text generation and improved quality, and it is trained using NVFP4 quantization to maximize compute efficiency. The model has 12B active parameters and 120B parameters in total.

The supported languages include: English, French, German, Italian, Japanese, Spanish, and Chinese.

This turbo NIM container has following optimizations added -

  • Newer OSS vLLM container
  • KV cache and Mamba SSM cache settings
  • FlashInfer attention backend
  • Expert parallelism
  • Kernel fusion for attn+quant

The container components are ready for commercial/non-commercial use.

License/Terms of Use:

GOVERNING TERMS: The NIM container is governed by the NVIDIA Software License Agreement and Product-Specific Terms for NVIDIA AI Products. Use of this model is governed by the NVIDIA Nemotron Open Model License.

Get Help

NVIDIA Developer Community Forum

Get access to community knowledge base articles and support cases (https://forums.developer.nvidia.com/)

Deployment Geography:

Global

Release Date:

Huggingface 3/11/2026 via https://huggingface.co/

NGC 3/11/2026 via
https://catalog.ngc.nvidia.com/models

Nemotron-3-Super-120B-A12B-turbo

NVIDIA-Nemotron-3-Super-120B-A12B-turbo Container includes the following model:

Model Name & LinkUse CaseHow to Pull the Model
NVIDIA-Nemotron-3-Super-120B-A12B-turboNemotron-3-Super-120B-A12B is a general purpose reasoning and chat model that employs a hybrid Latent Mixture-of-Experts (LatentMoE) architecture, utilizing interleaved Mamba-2 and MoE layers, along with select Attention layers. The model has 12B active parameters and 120B parameters in total.Manual

Deployment Details:

Visit the NIM Container LLM page for release documentation, deployment guides, and more.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Container Version(s):

/nim/nvidia/nemotron-3-super-120b-a12b--turbo:1.0.0

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns here.

You are responsible for ensuring that your use of NVIDIA provided models complies with all applicable laws.

Get Help

Getting started with the NIM

Deploying and integrating the NIM is straightforward thanks to our industry standard APIs. Visit the NVIDIA NIM documentation for release documentation, deployment guides and more.

NVIDIA Developer Community Forum

For support, visit the NVIDIA Developer Community Forum.

Publisher
NVIDIA
NVIDIA
Latest Tag1.0.0
UpdatedJune 25, 2026 UTC
Compressed Size9.16 GB
Multinode SupportNo
Multi-Arch SupportYes

NVIDIA uses cookies to improve your experience on our web site. We and our third-party partners also use cookies and other tools to collect and record information you provide as well as information about your interactions with our websites for performance improvement, analytics, and to assist in marketing efforts. By clicking "Accept All", you consent to our use of cookies and other tools as described in our Cookie Policy. You can manage your cookie settings by clicking on "Manage Settings." By continuing to use this site or by clicking one of the buttons below, you agree to our Terms of Service (which contains important waivers). Please see our Privacy Policy for more information on our privacy practices.