Terms of Use: By using this model, you are agreeing to the terms and conditions of the license.
Mistral-7B-v0.1 is a pretrained base generative text model developed by Mistral AI. Model details can be found here. This model is optimized with the NVIDIA NeMo Framework and is provided as a .nemo checkpoint.
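As a starting point, the .nemo checkpoint can be restored in Python. The sketch below follows the NeMo 1.x API pattern; the path is a placeholder and the exact class or strategy arguments may differ between NeMo releases.

```python
# Minimal sketch: restore the .nemo checkpoint with NeMo (path is a placeholder).
from pytorch_lightning import Trainer
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy

# Megatron-based NeMo models expect a Lightning trainer with the NLP DDP strategy.
trainer = Trainer(devices=1, accelerator="gpu", strategy=NLPDDPStrategy())
model = MegatronGPTModel.restore_from(
    restore_path="mistral-7b-v0.1.nemo",  # placeholder path to the downloaded checkpoint
    trainer=trainer,
)
```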
NeMo Framework offers support for various parameter-efficient fine-tuning (PEFT) methods for Mistral-7B-v0.1.
PEFT techniques allow customizing foundation models to improve performance on specific tasks.
Two of these, P-Tuning and Low-Rank Adaptation (LoRA), are supported out of the box for Mistral-7B-v0.1 and are described in detail in the NeMo Framework user guide, which shows how to tune Mistral-7B-v0.1 to answer biomedical questions using the PubMedQA dataset.
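To make the LoRA idea concrete, here is a minimal PyTorch sketch of the technique itself, not NeMo's implementation; the class name and hyperparameters (r, alpha) are illustrative. LoRA freezes the pretrained weight and learns a small low-rank update on top of it.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weight stays frozen
        self.lora_a = nn.Linear(base.in_features, r, bias=False)   # A: down-projection
        self.lora_b = nn.Linear(r, base.out_features, bias=False)  # B: up-projection
        nn.init.zeros_(self.lora_b.weight)  # B starts at zero, so the update is a no-op at first
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Only the two small LoRA matrices are trainable, which is why PEFT is cheap.
layer = LoRALinear(nn.Linear(4096, 4096))
out = layer(torch.randn(2, 4096))
```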
NeMo Framework offers Supervised fine-tuning (SFT) support for Mistral-7B-v0.1.
Fine-tuning refers to modifying the weights of a pretrained foundation model with additional custom data. Supervised fine-tuning (SFT) refers to unfreezing all the weights and layers in the model and training on a newly labeled set of examples. Fine-tuning can incorporate new, domain-specific knowledge or teach the foundation model what type of response to provide. One specific type of SFT, known as instruction tuning, uses SFT to teach a model to follow instructions better.
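The following toy PyTorch sketch illustrates what "unfreezing all the weights" means in practice; it uses a stand-in two-layer model rather than Mistral-7B-v0.1 or NeMo's training loop, and the optimizer settings are illustrative.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained LM: embedding -> linear head over a tiny vocab.
vocab, dim = 100, 32
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))

for p in model.parameters():
    p.requires_grad = True  # SFT: every weight is trainable, unlike PEFT

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6)
loss_fn = nn.CrossEntropyLoss()

# One "labeled example": input token ids and the target next tokens.
inputs = torch.randint(0, vocab, (4, 16))
targets = torch.randint(0, vocab, (4, 16))

logits = model(inputs)                                    # (batch, seq, vocab)
loss = loss_fn(logits.reshape(-1, vocab), targets.reshape(-1))
loss.backward()
optimizer.step()
```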
NeMo Framework offers out-of-the-box SFT support for Mistral-7B-v0.1, which is described in detail in the NeMo Framework user guide, showing how to tune Mistral-7B-v0.1 to follow instructions using the databricks-dolly-15k dataset.
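As a hypothetical preprocessing sketch, instruction-tuning records like those in databricks-dolly-15k (instruction / context / response fields) are commonly flattened into a single-turn prompt/completion JSONL file before SFT. The output field names ("input"/"output") are an assumption about the data layout, not a documented NeMo requirement.

```python
import json

# One dolly-style record (field names match databricks-dolly-15k).
record = {
    "instruction": "Name the planets in the solar system.",
    "context": "",
    "response": "Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune.",
}

# Fold optional context into the prompt, then emit one JSONL line per example.
prompt = record["instruction"]
if record["context"]:
    prompt += "\n\n" + record["context"]

with open("dolly_sft.jsonl", "a") as f:
    # "input"/"output" keys are an assumed layout for the SFT data loader.
    f.write(json.dumps({"input": prompt, "output": record["response"]}) + "\n")
```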
Using TensorRT-LLM, NeMo Framework allows exporting Mistral-7B-v0.1 checkpoints to formats that are optimized for deployment on NVIDIA GPUs.
TensorRT-LLM is a library for building TensorRT engines that contain state-of-the-art optimizations for efficient inference on NVIDIA GPUs. These export and deployment methods, which NVIDIA built for Mistral-7B-v0.1, make it possible to reach state-of-the-art inference performance. The process is described in detail in the NeMo Framework user guide.
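A sketch of the export flow using NeMo's TensorRT-LLM exporter is shown below, following the pattern in the NeMo Framework user guide. Paths are placeholders, the model_type value is an assumption, and argument names may differ between releases.

```python
from nemo.export import TensorRTLLM

# Build a TensorRT engine from the .nemo checkpoint (paths are placeholders).
exporter = TensorRTLLM(model_dir="/path/to/trt_llm_engine_dir")
exporter.export(
    nemo_checkpoint_path="/path/to/mistral-7b-v0.1.nemo",
    model_type="llama",  # assumption: Mistral-7B follows the Llama-style exporter path
    n_gpus=1,
)

# Run inference with the freshly built engine.
print(exporter.forward(["What is machine learning?"]))
```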
Detailed information, including performance results, is available here.