GPU-optimized AI, Machine Learning, & HPC Software

Meta

Meta Llama 3.2 3B Instruct ONNX INT4 RTX

Built with Llama - The Meta Llama 3.2 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned text-to-text generative models in 1B and 3B sizes.

Model

Mistral AI

Mistral-Nemo-12b-Instruct-Onnx-INT4-RTX

Mistral-NeMo is a large language model (LLM) with 12B parameters. It leads in accuracy on popular benchmarks spanning common-sense reasoning, coding, math, multilingual, and multi-turn chat tasks.

Model
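As a rough illustration of why the INT4 format matters for local deployment on RTX GPUs, the sketch below estimates weight storage for a nominal 12B-parameter model at 16-bit and 4-bit precision. The function name is hypothetical. The estimate deliberately ignores quantization scales and zero points, the KV cache, activations, and runtime overhead, so actual memory use will be higher.

```python
def weight_memory_bytes(num_params: int, bits_per_weight: int) -> int:
    """Approximate storage for the weights alone (no scales, KV cache, or activations)."""
    return num_params * bits_per_weight // 8


# Nominal "12B"; the exact parameter count of the model differs slightly.
NUM_PARAMS = 12_000_000_000

for label, bits in [("FP16", 16), ("INT4", 4)]:
    gb = weight_memory_bytes(NUM_PARAMS, bits) / 1e9
    print(f"{label}: ~{gb:.0f} GB")
# prints:
# FP16: ~24 GB
# INT4: ~6 GB
```

Going from 16 to 4 bits per weight cuts weight storage by a factor of four, which is what brings a 12B model within reach of consumer GPU memory.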

Meta

Meta-Llama3.1-8B-Instruct-ONNX-INT4-RTX

Built with Llama - The Meta-Llama 3.1 8B Instruct INT4 ONNX model is the AWQ (Activation-aware Weight Quantization) quantized version of Meta-Llama 3.1 8B Instruct, an auto-regressive language model that uses an optimized transformer architecture and is intended for multilingual dialogue use cases.

Model
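The INT4 models in this list store weights as 4-bit integers. The minimal sketch below shows plain round-to-nearest, group-wise symmetric INT4 quantization, assuming one floating-point scale per group of weights; the function names and the small group size of 4 are illustrative only. This is the baseline that AWQ builds on, not AWQ itself: AWQ additionally rescales salient weight channels using activation statistics before quantizing, and common configurations use larger groups (such as 128) and pack two 4-bit values per byte, none of which is modeled here.

```python
def quantize_int4_groupwise(weights, group_size=4):
    """Symmetric round-to-nearest INT4 quantization with one scale per group."""
    q_values, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7, the largest positive INT4 value.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            # Round to the nearest integer and clamp to the signed 4-bit range [-8, 7].
            q_values.append(max(-8, min(7, round(w / scale))))
    return q_values, scales


def dequantize_int4_groupwise(q_values, scales, group_size=4):
    """Recover approximate weights by multiplying each integer by its group's scale."""
    return [q * scales[i // group_size] for i, q in enumerate(q_values)]


if __name__ == "__main__":
    weights = [0.1, -0.35, 0.7, 0.02, 1.4, -1.4, 0.5, 0.0]
    q, s = quantize_int4_groupwise(weights)
    restored = dequantize_int4_groupwise(q, s)
    for w, r in zip(weights, restored):
        print(f"{w:+.3f} -> {r:+.3f}")
```

Each reconstructed weight differs from the original by at most half of its group's scale; AWQ's activation-aware scaling aims to reduce this error for the channels that matter most to the model's outputs.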

NVIDIA

Nemotron-Mini-4B-Instruct-ONNX-INT4-RTX

The Nemotron-Mini-4B-Instruct model is designed to generate responses for roleplaying, retrieval-augmented generation (RAG), and function calling. It is a small language model optimized through distillation, pruning, and quantization for speed and on-device deployment.

Model

Mistral AI

Mistral-7b-Instruct-v0.3-ONNX-INT4-RTX

The Mistral-7B-Instruct-v0.3 INT4 ONNX model is the quantized version of the Mistral-7B-Instruct-v0.3 model, an instruction-fine-tuned version of Mistral-7B-v0.3 intended for text generation and question answering.

Model

Microsoft

Phi-3.5-Mini-Instruct-ONNX-INT4-RTX

The NVIDIA Phi-3.5-mini-Instruct INT4 ONNX model is the quantized version of the Microsoft Phi-3.5-mini-Instruct model, a dense decoder-only Transformer with 3.8B parameters that uses the same tokenizer as Phi-3 Mini.

Model

Google

Gemma-2b-Instruct-ONNX-INT4-RTX

The NVIDIA Gemma-2b-it INT4 ONNX model is the quantized version of the Google Gemma-2b-it model. Gemma is a family of text-to-text, decoder-only large language models available in English, with open weights and both pre-trained and instruction-tuned variants.

Model