Built with Llama. The Meta Llama 3.2 collection of multilingual large language models (LLMs) consists of pretrained and instruction-tuned text-to-text generative models in 1B and 3B sizes. The Llama 3.2 instruction-tuned, text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open-source and closed chat models on common industry benchmarks.
This model is ready for commercial use.
The Llama 3.2 3B Instruct model is quantized to INT4 with AWQ (Activation-aware Weight Quantization) using AutoAWQ, then converted to ONNX using Onnxruntime-GenAI.
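As a rough illustration of what "INT4" means for the stored weights, the sketch below shows group-wise 4-bit quantization with a per-group scale and zero point. It does not reproduce AWQ's activation-aware scaling search, only the storage format that such a search feeds into. The group size of 128 with a zero point is an assumption (it is a commonly used AutoAWQ setting); the exact settings used for this model are not stated here.

```python
import numpy as np

def quantize_int4_groupwise(w: np.ndarray, group_size: int = 128):
    """Asymmetric 4-bit quantization of a 2-D weight matrix, one scale and one
    zero point per group of `group_size` consecutive weights in each row.
    Assumed settings for illustration, not this model's actual config."""
    rows, cols = w.shape
    assert cols % group_size == 0, "columns must divide evenly into groups"
    g = w.reshape(rows, cols // group_size, group_size)
    w_min = g.min(axis=-1, keepdims=True)
    w_max = g.max(axis=-1, keepdims=True)
    qmax = 2**4 - 1  # 4-bit unsigned codes span 0..15
    scale = np.maximum(w_max - w_min, 1e-8) / qmax
    zero = np.clip(np.round(-w_min / scale), 0, qmax)
    q = np.clip(np.round(g / scale) + zero, 0, qmax).astype(np.uint8)
    return q, scale, zero

def dequantize(q: np.ndarray, scale: np.ndarray, zero: np.ndarray, shape):
    """Map 4-bit codes back to approximate floating-point weights."""
    return ((q.astype(np.float32) - zero) * scale).reshape(shape)

# Example: quantize a random 4 x 256 matrix and reconstruct it
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 256)).astype(np.float32)
q, scale, zero = quantize_int4_groupwise(w)
w_hat = dequantize(q, scale, zero, w.shape)
print("max abs error:", float(np.abs(w_hat - w).max()))
```

Each reconstructed weight lands on one of 16 evenly spaced levels per group, so the error per weight is bounded by roughly half a quantization step (`scale / 2`).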
This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see the Llama 3.2 Model Card.
This model is governed by the NVIDIA AI Foundation Models Community License Agreement. Additional Information: Llama 3.2 Community License Agreement, Built with Llama, Acceptable Use Policy.
Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety.
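"Auto-regressive" means the model emits one token at a time, each conditioned on the prompt plus every token it has already generated. The sketch below shows that loop with greedy selection. The `toy_model` function is a hypothetical stand-in for the real network, which would return logits over the full Llama vocabulary.

```python
from typing import Callable, List, Sequence

def generate_greedy(next_token_logits: Callable[[Sequence[int]], List[float]],
                    prompt: List[int], max_new_tokens: int, eos_id: int) -> List[int]:
    """Auto-regressive decoding: feed the growing sequence back in each step."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)  # conditions on all tokens so far
        nxt = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
        tokens.append(nxt)
        if nxt == eos_id:  # stop once the end-of-sequence token is produced
            break
    return tokens

def toy_model(tokens: Sequence[int]) -> List[float]:
    """Hypothetical 5-token 'model' that always prefers (last token + 1) mod 5."""
    vocab = 5
    logits = [0.0] * vocab
    logits[(tokens[-1] + 1) % vocab] = 1.0
    return logits

print(generate_greedy(toy_model, [0], max_new_tokens=10, eos_id=4))  # → [0, 1, 2, 3, 4]
```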
Architecture Type: Large Language Model (LLM)
Network Architecture: Llama
Refer to Llama 3.2 Model Card for the details.
Input Type: Text and Code
Input Format: String
Input Parameters: Temperature, TopP
Other Properties Related to Input: Supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai
Output Type(s): Text and Code
Output Format: String
Output Parameters: Max output tokens
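The Temperature and TopP input parameters usually control temperature scaling followed by nucleus (top-p) sampling, and max output tokens caps how many times that sampling step runs. The sketch below shows the standard technique under that assumption; Onnxruntime-GenAI's own implementation may differ in details such as how it handles a temperature of zero.

```python
import math
import random
from typing import List, Optional

def sample_top_p(logits: List[float], temperature: float = 1.0, top_p: float = 1.0,
                 rng: Optional[random.Random] = None) -> int:
    """Temperature + nucleus sampling over one step's logits (illustrative sketch)."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Assumed convention: non-positive temperature falls back to greedy decoding
        return max(range(len(logits)), key=logits.__getitem__)
    # Temperature < 1 sharpens the distribution, > 1 flattens it
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of most-likely tokens whose total probability reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # random.choices renormalizes the kept weights before drawing
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

# With top_p=0.6 only the most likely token (id 2, p ≈ 0.67) survives the cut
print(sample_top_p([1.0, 2.0, 3.0], temperature=1.0, top_p=0.6, rng=random.Random(0)))
```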
Runtime(s): Not Applicable
Supported Hardware Platform(s): RTX 4090. GPUs with 6 GB or more of VRAM are recommended; more VRAM may be required for use cases with longer context lengths.
Supported Operating System(s): Windows
Model Version(s): 1.0
Inference Backend: Onnxruntime-GenAI-DirectML
(Note: Refer to ReadMe.txt for detailed instructions.)
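Longer contexts need more VRAM mainly because the key/value (KV) cache grows linearly with the number of tokens. The estimate below is a back-of-the-envelope sketch. Its defaults (28 layers, 8 key/value heads from grouped-query attention, head dimension 128, FP16 cache) are assumptions taken from the published Llama 3.2 3B configuration and are not stated in this card. The KV cache data type used by this ONNX export is also assumed.

```python
def kv_cache_bytes(context_len: int, num_layers: int = 28, num_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2, batch: int = 1) -> int:
    """Approximate KV cache size. Every layer stores two tensors (K and V),
    each shaped [batch, num_kv_heads, context_len, head_dim].
    Default values are assumptions, not read from this model's files."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * context_len * batch

def int4_weight_bytes(num_params: float) -> float:
    """Rough INT4 weight footprint: 4 bits (0.5 bytes) per parameter.
    Ignores per-group scales/zero points and any layers left unquantized."""
    return num_params * 0.5

# Nominal 3B parameters; the real count and runtime overheads will differ
print(f"weights ~{int4_weight_bytes(3e9) / 2**30:.2f} GiB")
for ctx in (4096, 8192, 32768):
    print(f"context {ctx}: KV cache ~{kv_cache_bytes(ctx) / 2**30:.2f} GiB")
```

Under these assumptions each token adds 112 KiB of cache, so the cache alone grows from well under 1 GiB at a few thousand tokens to several GiB at tens of thousands of tokens.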
MMLU (5-shot): 58.58
Link: https://people.eecs.berkeley.edu/~hendrycks/data.tar
Data Collection Method by dataset = Unknown
Labeling Method by dataset = Not Applicable
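"5-shot" means each test question is preceded by five solved example questions from the same subject, and the score is the percentage of questions answered correctly. The sketch below follows the prompt layout used in the original MMLU evaluation code. The harness and prompt format NVIDIA used to obtain 58.58 are not stated, so treat this layout as an assumption. The helper names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

CHOICES = ["A", "B", "C", "D"]
Example = Tuple[str, List[str], str]  # (question, four options, answer letter)

def format_example(question: str, options: Sequence[str], answer: Optional[str] = None) -> str:
    """One multiple-choice item; solved examples include the answer letter."""
    text = question
    for letter, option in zip(CHOICES, options):
        text += f"\n{letter}. {option}"
    text += "\nAnswer:"
    if answer is not None:
        text += f" {answer}\n\n"
    return text

def build_k_shot_prompt(subject: str, dev: Sequence[Example], test_question: str,
                        test_options: Sequence[str], k: int = 5) -> str:
    """Header, then k solved dev examples, then the unanswered test item."""
    prompt = f"The following are multiple choice questions (with answers) about {subject}.\n\n"
    for question, options, answer in dev[:k]:
        prompt += format_example(question, options, answer)
    return prompt + format_example(test_question, test_options)

def accuracy_pct(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """MMLU-style score: percentage of predicted letters that match the labels."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return 100.0 * correct / len(labels)

print(accuracy_pct(["A", "B", "C", "D"], ["A", "B", "D", "D"]))  # → 75.0
```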
Test configuration:
GPU: RTX 4090
Windows 11: 23H2
NVIDIA Graphics driver: R560 or higher