GPU-optimized AI, Machine Learning, & HPC Software

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University

ChatGLM3-6B is the latest open-source model in the ChatGLM series. ChatGLM3-6B introduces the following features: (1) a more powerful base model, (2) more comprehensive function support, and (3) a more comprehensive open-source series.

Model

Meta

Llama2-7B Chat Int4

Llama 2 is a large language model capable of generating text and code in response to prompts.

Model

NVIDIA

RT-DETR 2D Warehouse

An RT-DETR object detection model for 2D warehouse applications.

Model

Meta

Llama2-13b Chat Int4

Llama 2 is a large language model capable of generating text and code in response to prompts.

Model

Mistral AI

Mistral-7B Chat Int4

The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.1 generative text model, trained on a variety of publicly available conversation datasets.

Model

Meta

Code Llama 13B

Code Llama is a code-specialized version of Llama 2. It can generate code, and natural language about code, from both code and natural language prompts.

Model

Meta

Llama3-8B Instruct Int4

Built with Meta Llama 3. The Meta Llama 3 family of large language models (LLMs) is a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes.

Model

—

Whisper ASR GGUF for Nv IGI SDK

Whisper automatic speech recognition (ASR) model in GGUF format for the NVIDIA In-Game Inferencing (NVIGI) SDK ASR plugin.

Model

—

OpenVoice

A collection of models to enable OpenVoice support for the NVIDIA In-Game Inferencing (NVIGI) SDK.

Model

Meta

Meta Llama3.2 3B Instruct ONNX INT4 RTX TensorRT Model Optimizer

Meta Llama 3.2 3B Instruct INT4 ONNX model is the quantized version of the Meta Llama-3.2-3B-Instruct model, which is an auto-regressive language model that uses an optimized transformer architecture.

Model

Meta

Meta Llama 3.2 3B Instruct ONNX INT4 RTX

Built with Llama. The Meta Llama 3.2 collection of multilingual large language models (LLMs) consists of pretrained and instruction-tuned text-to-text generative models in 1B and 3B sizes.

Model

Microsoft

Phi-3-mini-128k Instruct Int4 RTX

Phi-3-Mini-128K-Instruct is a 3.8-billion-parameter, lightweight, state-of-the-art open model trained on the Phi-3 datasets. These datasets include both synthetic data and filtered publicly available website data, with a focus on high-quality properties.

Model

Microsoft

Phi-3-mini-4k Instruct Int4 RTX

Phi-3-Mini-4K is a 3.8B-parameter, lightweight, state-of-the-art open model trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality properties.

Model

Mistral AI

Mistral-Nemo-12b-Instruct-Onnx-INT4-RTX

Mistral-NeMo is a Large Language Model (LLM) with 12B parameters. It leads in accuracy on popular benchmarks across commonsense reasoning, coding, math, multilingual, and multi-turn chat tasks.

Model

Meta

Meta-Llama3.1-8B-Instruct-ONNX-INT4-RTX

Built with Llama. The Meta-Llama 3.1 8B Instruct INT4 ONNX model is the AWQ-quantized version of the Meta-Llama 3.1 8B Instruct model, an auto-regressive language model that uses an optimized transformer architecture for multilingual dialogue use cases.

Model

Microsoft

Phi-3-Medium-128k Instruct Int4 RTX

Phi-3-Medium-128K-Instruct is a 14B-parameter, lightweight, open model trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality, reasoning-dense properties.

Model

NVIDIA

Nemotron-Mini-4B-Instruct-ONNX-INT4-RTX

The Nemotron-Mini-4B-Instruct model generates responses for roleplaying, retrieval-augmented generation (RAG), and function calling. It is a small language model optimized through distillation, pruning, and quantization for speed and on-device deployment.

Model

—

e5-large-unsupervised GGUF for Nv IGI SDK

e5-large-unsupervised embedding model in GGUF format for the NVIDIA In-Game Inferencing (NVIGI) SDK Embed plugin.

Model

Google

Gemma-7B-FP16-RTX

Gemma-7B is a 7B-parameter model from Google's Gemma family of models. It has been instruction-tuned so it can respond to prompts in a conversational manner.

Model

Google

Gemma-2B-INT4-RTX

Gemma-2B is a 2.5B-parameter model from Google's Gemma family of models. It has been instruction-tuned so it can respond to prompts in a conversational manner.

Model

Meta

Meta Llama 3.1 8b Instruct ONNX INT4 RTX

Built with Meta Llama 3.1. The Meta Llama 3.1 collection of multilingual large language models (LLMs) consists of pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes.

Model

Mistral AI

Mistral-7b-Instruct-v0.3-ONNX-INT4-RTX

The Mistral-7B-Instruct-v0.3 INT4 ONNX model is the quantized version of the Mistral-7B-Instruct-v0.3 model, which is an instruct fine-tuned version of the Mistral-7B-v0.3 model used for text generation and question answering.

Model

Microsoft

Phi-3.5-Mini-Instruct-ONNX-INT4-RTX

The NVIDIA Phi-3.5-mini-Instruct INT4 ONNX model is the quantized version of the Microsoft Phi-3.5-mini-Instruct model, which has 3.8B parameters and is a dense decoder-only Transformer model that uses the same tokenizer as Phi-3 Mini.

Model

Google

CodeGemma-7B-IT-INT4-RTX

CodeGemma is a collection of lightweight open code models built on top of Gemma. CodeGemma models are text-to-text, decoder-only models. This is a 7-billion-parameter instruction-tuned variant for code chat and instruction following.

Model