Linux / amd64
NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed to speed up generative AI deployment in enterprises. Supporting a wide range of AI models, including NVIDIA AI Foundation models and custom models, it ensures seamless, scalable AI inference, on-premises or in the cloud, leveraging industry-standard APIs.
Qwen3-235B-A22B-FP8 is the FP8 version of Qwen3-235B-A22B. Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support. It uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model, ensuring optimal performance across various scenarios.

The model shows significantly enhanced reasoning capabilities, surpassing previous versions on mathematics, code generation, and commonsense logical reasoning. It also demonstrates superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogue, and instruction following, delivering a more natural, engaging, and immersive conversational experience. It further excels at agent tasks, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models on complex agent-based tasks. Qwen3-235B-A22B supports over 100 languages and dialects, with strong capabilities for multilingual instruction following and translation.
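As a rough illustration of switching between the two modes, the sketch below sends the same style of request twice against a locally running, OpenAI-compatible endpoint, toggling reasoning via the `chat_template_kwargs` field with the `enable_thinking` flag that vLLM-style servers expose for Qwen3. The endpoint URL, port, model identifier, and the availability of this flag are all assumptions; verify them against your deployment's documentation.

```python
import requests

# Assumed local NIM endpoint exposing an OpenAI-compatible API; adjust
# the host, port, and model name to match your deployment.
URL = "http://localhost:8000/v1/chat/completions"
MODEL = "qwen/qwen3-235b-a22b"  # placeholder model id

def ask(question: str, thinking: bool) -> str:
    """Send one chat turn, toggling Qwen3's thinking mode.

    chat_template_kwargs={"enable_thinking": ...} is the switch used by
    vLLM-style servers for Qwen3; confirm your NIM build supports it.
    """
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "max_tokens": 1024,
        "chat_template_kwargs": {"enable_thinking": thinking},
    }
    resp = requests.post(URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Thinking mode for a math problem, non-thinking mode for casual chat.
print(ask("If 3x + 7 = 25, what is x?", thinking=True))
print(ask("Say hello in French.", thinking=False))
```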
NVIDIA NIM offers prebuilt containers for large language models (LLMs) that can be used to develop chatbots, content analyzers, or any application that needs to understand and generate human language. Each NIM consists of a container and a model and uses a CUDA-accelerated runtime for all NVIDIA GPUs, with special optimizations available for many configurations. Whether on-premises or in the cloud, NIM is the fastest way to achieve accelerated generative AI inference at scale.
NVIDIA NIM for LLMs abstracts away model inference internals such as the execution engine and runtime operations. It provides the most performant option available for each model and hardware configuration, whether that is TensorRT-LLM (TRT-LLM), vLLM, or another backend.
Deploying and integrating NVIDIA NIM is straightforward thanks to our industry-standard APIs. Visit the NIM Container LLM page for release documentation, deployment guides, and more.
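For illustration, the following minimal sketch assumes a NIM container is already running locally and serving the OpenAI-compatible API on port 8000, and queries the model with the `openai` Python client. The base URL, API key value, and model identifier are placeholders to replace with the values for your deployment.

```python
from openai import OpenAI

# Hypothetical local deployment: a NIM container serving the model on
# localhost:8000 with the OpenAI-compatible API. Adjust the base URL,
# API key, and model identifier to match your environment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

completion = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b",  # assumed model id; check client.models.list()
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    max_tokens=256,
)
print(completion.choices[0].message.content)
```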
Please review the Security Scanning tab to view the latest security scan results.
For certain open-source vulnerabilities listed in the scan results, NVIDIA provides a response in the form of a Vulnerability Exploitability eXchange (VEX) document. The VEX information can be reviewed and downloaded from the Security Scanning tab.
Get access to knowledge base articles and support cases or submit a ticket.
The NIM container is governed by the NVIDIA Software License Agreement and the Product-Specific Terms for AI Products; use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement.
You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.