Llama-3.2-11B-Vision-Instruct

NVIDIA

Container

NVIDIA

Llama-3.2-11B-Vision-Instruct

The Llama 3.2 Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.

NVIDIA Developer Program NVIDIA AI Enterprise

NVIDIA AI Enterprise Supported NVIDIA NIM

Join or Subscribe to get accessSubscribe to the product below to access this premium content:

NVIDIA Developer ProgramJoin the Developer Program for access to free tools, support, and tech resources.

Get Access

NVIDIA AI EnterpriseAccelerate your AI agent development

Subscribe Now

Note: You can gain access to hundreds more GPU-optimized artifacts by creating a free NGC account.

Already Subscribed?Log in

What Is NVIDIA NIM?

NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed to speed up generative AI deployment in enterprises. Supporting a wide range of AI models, including NVIDIA AI foundation and custom models, it ensures seamless, scalable AI inferencing, on-premises or in the cloud, leveraging industry standard APIs.

NVIDIA NIM for Vision Language Models (VLMs) (NVIDIA NIM for VLMs) brings the power of state-of-the-art vision language models (VLMs) to enterprise applications, providing unmatched natural language and multimodal understanding capabilities.

NIM makes it easy for IT and DevOps teams to self-host vision language models (VLMs) in their own managed environments while still providing developers with industry-standard APIs that allow them to build powerful copilots, chatbots, and AI assistants that can transform their business. Leveraging NVIDIA’s cutting-edge GPU acceleration and scalable deployment, NIM offers the fastest path to inference with unparalleled performance.

High Performance Features

NVIDIA NIM for VLMs abstracts away model inference internals such as execution engine and runtime operations. NVIDIA NIM for VLMs provides the most performant option available whether it be with TRT-LLM, vLLM or others. NIM offers the following high-performance features:

Scalable Deployment that is performant and can quickly and seamlessly scale from a few users to millions.
Advanced Vision Language Model support with pre-generated optimized engines for a diverse range of cutting-edge VLM architectures.
Flexible Integration to easily incorporate the microservice into existing workflows and applications. Developers are provided an OpenAI API-compatible programming model and custom NVIDIA extensions for additional functionality.
Enterprise-Grade Security emphasizes security by using safetensors, constantly monitoring and patching CVEs in our stack and conducting internal penetration tests.

Applications

Image Q&A: Empower bots with visual understanding besides human-like language understanding and responsiveness
Image summarization: Generate summaries based on image understanding
Image description: Empower bots to describe the content of an image and engage in multi-turn conversations
Charts and diagram understanding: Generate descriptions of charts, tables, and diagrams present in an image

And many more… The potential applications of NIM are vast, spanning across various industries and use cases.

Getting started with NVIDIA NIM

Deploying and integrating NVIDIA NIM is straightforward thanks to our industry standard APIs. Visit the NIM Container VLM page for release documentation, deployment guides and more.

Security Vulnerabilities in Open Source Packages

Please review the Security Scanning (LINK) tab to view the latest security scan results.

For certain open-source vulnerabilities listed in the scan results, NVIDIA provides a response in the form of a Vulnerability Exploitability eXchange (VEX) document. The VEX information can be reviewed and downloaded from the Security Scanning (LINK) tab.

Get Help

Enterprise Support

Get access to knowledge base articles and support cases or submit a ticket.

NVIDIA NIM Documentation

Visit the NIM Container LLM page for release documentation, deployment guides and more.

Governing Terms

The NIM container is governed by the NVIDIA Software License Agreement; and the Product Specific Terms for AI Products; and the use of this model is governed by the [NVIDIA AI Foundation Models Community License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-ai-foundation-models-community-license-agreement/#:~:text=This%20license%20agreement%20(%E2%80%9CAgreement%E2%80%9D,algorithms%2C%20parameters%2C%20configuration%20files%2C). ADDITIONAL INFORMATION: Llama 3.2 Community License Agreement, Built with Llama.

You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.

Publisher

NVIDIA

Latest Tag1.1.1

UpdatedJanuary 17, 2025 UTC

Compressed Size7.66 GB

Multinode SupportNo

Multi-Arch SupportNo

System

signed images

Labels

Automotive / Transportation Computer Vision Image Segmentation NSPECT-548C-BVKF Question Answering Vision AI