Gemma-2B-FP16-RTX

Description
Gemma-2B is a 2.5B parameter model from the Gemma family of models from Google. It has been instruction-tuned so that it can respond to prompts in a conversational manner.
Publisher
Google
Latest Version
latest
Modified
March 7, 2024
Compressed Size
5.14 GB

Model Overview

Description:

Gemma-2B is a 2.5B parameter model from the Gemma family of models from Google. It has been instruction-tuned so that it can respond to prompts in a conversational manner. NVIDIA has converted the original Gemma weights and checkpoint format into an engine format that can be consumed by TensorRT-LLM.
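
Because the model is instruction-tuned, prompts are expected in Gemma's conversational turn format. The snippet below is a minimal, illustrative sketch, assuming the Hugging Face google/gemma-2b-it tokenizer (not part of this package) is used to build the prompt:

    # Illustrative only: build a Gemma-style conversational prompt with the
    # Hugging Face tokenizer (assumed source: google/gemma-2b-it).
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")

    messages = [{"role": "user", "content": "Summarize what TensorRT-LLM does in one sentence."}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

    # The resulting string wraps the request in Gemma's <start_of_turn>/<end_of_turn>
    # turn markers and ends with the model turn, ready to be sent to the engine.
    print(prompt)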

Terms of use:

By accessing this model, you are agreeing to the Gemma Terms of Use and the Gemma Prohibited Use Policy.

Input:

Input Format: Text

Input Parameters: None

Output:

Output Format: Text

Output Parameters: None

Software Integration:

Supported Hardware Platform(s): RTX 4090

Supported Operating System(s): Windows

Inference:

Windows Setup with TRT-LLM

TRT-LLM Inference Engine
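
Once the Windows TRT-LLM setup above is complete, a generation call against the packaged engine can be scripted with the TensorRT-LLM Python runtime. The sketch below is illustrative, not part of this package: the engine directory name and the tokenizer source are assumptions, and the pattern follows the TensorRT-LLM example runner.

    # Illustrative sketch only: run the packaged FP16 engine through the
    # TensorRT-LLM Python runtime. The engine directory name and the tokenizer
    # source below are assumptions, not taken from this model card.
    import torch
    from transformers import AutoTokenizer
    from tensorrt_llm.runtime import ModelRunner

    engine_dir = "gemma-2b-fp16-rtx"   # assumed: directory the download unpacks to
    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")  # assumed tokenizer

    runner = ModelRunner.from_dir(engine_dir=engine_dir)

    prompt = "Write a short poem about the RTX 4090."
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids[0].to(torch.int32)
    pad_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id

    with torch.inference_mode():
        outputs = runner.generate([input_ids],
                                  max_new_tokens=128,
                                  end_id=tokenizer.eos_token_id,
                                  pad_id=pad_id,
                                  return_dict=True)

    # output_ids has shape (batch, beams, sequence length); the decoded text
    # contains the prompt tokens followed by the generated continuation.
    print(tokenizer.decode(outputs["output_ids"][0][0], skip_special_tokens=True))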

Test Hardware:

RTX 4090