Search thousands of GPU-optimized Containers, pretrained Models, SDKs, and Helm charts, ready to accelerate AI, digital twins, and HPC from cloud to edge.
Displaying 8 results
The Meta Llama 3.2 3B Instruct INT4 ONNX model is the quantized version of the Meta Llama-3.2-3B-Instruct model, an auto-regressive language model that uses an optimized transformer architecture.
Built with Llama. The Meta Llama 3.2 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned text-to-text generative models in 1B and 3B sizes.
Mistral-NeMo is a 12B-parameter large language model (LLM). It achieves leading accuracy on popular benchmarks for commonsense reasoning, coding, math, multilingual, and multi-turn chat tasks.
Built with Llama. The Meta-Llama 3.1 8B Instruct INT4 ONNX model is the AWQ-quantized version of Meta-Llama 3.1 8B Instruct, an auto-regressive language model that uses an optimized transformer architecture for multilingual dialogue use cases.
The Nemotron-Mini-4B Instruct model generates responses for roleplaying, retrieval-augmented generation (RAG), and function calling. It is a small language model optimized through distillation, pruning, and quantization for speed and on-device deployment.
The Mistral-7B-Instruct-v0.3 INT4 ONNX model is the quantized version of the Mistral-7B-Instruct-v0.3 model, an instruction-fine-tuned version of Mistral-7B-v0.3 used for text generation and question answering.
The NVIDIA Phi-3.5-mini-Instruct INT4 ONNX model is the quantized version of the Microsoft Phi-3.5-mini-Instruct model, a 3.8B-parameter dense decoder-only Transformer that uses the same tokenizer as Phi-3 Mini.
The NVIDIA Gemma-2b-it INT4 ONNX model is the quantized version of the Google Gemma-2b-it model. Gemma models are text-to-text, decoder-only large language models, available in English, with open weights and both pre-trained and instruction-tuned variants.
