NVIDIA NIM for Vision Language Models
NVIDIA NIM for Vision Language Models (VLMs) (NVIDIA NIM for VLMs) brings the power of state-of-the-art vision language models (VLMs) to enterprise applications, providing unmatched natural language and multimodal understanding capabilities.
NIM makes it easy for IT and DevOps teams to self-host vision language models (VLMs) in their own managed environments while still providing developers with industry standard APIs that allow them to build powerful copilots, chatbots, and AI assistants that can transform their business. Leveraging NVIDIA's cutting-edge GPU acceleration and scalable deployment, NIM offers the fastest path to inference with unparalleled performance.
High Performance Features
NIM abstracts away model inference internals such as execution engine and runtime operations. They are also the most performant option available whether it be with TRT-LLM, vLLM or others. NIM offers the following high performance features:
Scalable Deployment that is performant and can easily and seamlessly scale from a few users to millions.
Advanced Vision Language Model support with pre-generated optimized engines for a diverse range of cutting edge VLM architectures.
Flexible Integration to easily incorporate the microservice into existing workflows and applications. Developers are provided with an OpenAI API compatible programming model and custom NVIDIA extensions for additional functionality.
Enterprise-Grade Security emphasizes security by using safetensors, constantly monitoring and patching CVEs in our stack and conducting internal penetration tests.
Model Overview
Description:
The Google DePlot model is a one-shot visual language understanding solution that translates images of plots or charts into linearized tables.
Terms of use
By using this model, you are agreeing to the terms and conditions of the license, acceptable use policy and Google Research privacy policy.
References(s):
DePlot paper DePlot on HuggingFace Model Architecture:
Architecture Type: Transformer Network Architecture: Pix2Struct
Input:
Input Format: Red, Green, Blue (RGB) Image + Text Input Parameters: None Other Properties Related to Input: None
Output:
Output Format: Text Output Parameters: temperature, top_p, max_tokens Other Properties Related to Output: stream
Supported Operating System(s):
Linux
Getting started with NVIDIA NIM
Deploying and integrating NVIDIA NIM is straightforward thanks to our industry standard APIs. Download the Getting Started Documentation
Evaluation Only
This is an Early Access version of the NIM and is only for testing and evaluation purposes only. This is NOT for Production Deployment.
Governing Terms
The Early Access NIM container is governed by the NVIDIA Evaluation License. ADDITIONAL INFORMATION: Deplot Apache 2.0 model license.