Supported Runtime(s): TensorRT-LLM
Supported Hardware(s): Ampere, Hopper
Supported OS(s): Linux
Terms of Use: By using this model, you are agreeing to the terms and conditions of the license.
Mistral-7B Instruct v0.3 is an instruction-tuned generative text model developed by Mistral AI. Model details can be found here. This model is optimized through the NVIDIA NeMo Framework and is provided as a .nemo checkpoint.
While the examples below refer to Mistral-7B v0.1, they are directly compatible with Mistral-7B Instruct v0.3.
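Because the model is instruction-tuned, prompts are typically wrapped in Mistral's instruct template before inference. The helper below is a hypothetical illustration of that convention, not part of the NeMo API; the exact template (special tokens, spacing) can vary by model version and tokenizer, so verify it against the model's chat template.

```python
# Illustrative only: Mistral instruct-tuned checkpoints generally expect user
# messages wrapped in [INST] ... [/INST] tags. Exact templating (BOS token,
# whitespace) may differ across versions; check the model's tokenizer config.
def build_prompt(user_message: str) -> str:
    """Wrap a user message in a Mistral-style instruct template."""
    return f"<s>[INST] {user_message} [/INST]"

prompt = build_prompt("Summarize the PubMedQA task in one sentence.")
print(prompt)
```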
NeMo Framework offers support for various parameter-efficient fine-tuning (PEFT) methods for Mistral-7B Instruct v0.3.
PEFT techniques allow customizing foundation models to improve performance on specific tasks.
Two of these methods, P-Tuning and Low-Rank Adaptation (LoRA), are supported out of the box for Mistral-7B Instruct v0.3 and are described in detail in the NeMo Framework user guide, which shows how to tune Mistral-7B-v0.1 to answer biomedical questions based on PubMedQA.
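To make the parameter-efficiency idea concrete, here is a minimal NumPy sketch of what LoRA does mathematically. It is not the NeMo implementation: LoRA freezes a pretrained weight matrix W and learns two small low-rank factors A and B, so only A and B are trained while the effective weight becomes W + (alpha / r) * B @ A.

```python
import numpy as np

# Hypothetical illustration of the LoRA idea (not the NeMo API). A frozen
# weight W (d_out x d_in) is adapted via trainable low-rank factors
# B (d_out x r) and A (r x d_in), scaled by alpha / r.
d_out, d_in, rank, alpha = 4096, 4096, 16, 32

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((rank, d_in))    # trainable low-rank factor
B = np.zeros((d_out, rank))              # zero-initialized: W is unchanged at start

# Effective weight used during fine-tuning and inference:
W_eff = W + (alpha / rank) * (B @ A)

full_params = W.size                     # parameters touched by full fine-tuning
lora_params = A.size + B.size            # parameters LoRA actually trains
print(f"trainable: {lora_params} vs full: {full_params} "
      f"({full_params / lora_params:.0f}x fewer)")
```

With rank 16 on a 4096x4096 layer, LoRA trains roughly 128x fewer parameters than full fine-tuning of that layer, which is why the adapter checkpoints are small and cheap to swap.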
NeMo Framework offers supervised fine-tuning (SFT) support for Mistral-7B Instruct v0.3.
Fine-tuning modifies the weights of a pre-trained foundation model with additional custom data. Supervised fine-tuning (SFT) unfreezes all the weights and layers in the model and trains on a newly labeled set of examples. Fine-tuning can incorporate new, domain-specific knowledge or teach the foundation model what type of response to provide.
NeMo Framework offers out-of-the-box SFT support for Mistral-7B Instruct v0.3. The process is described in detail for Mistral-7B v0.1 in the NeMo Framework user guide and is directly compatible with Mistral-7B Instruct v0.3.
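As a sketch of what a labeled SFT dataset looks like, the snippet below writes a JSONL file with one example per line. NeMo's SFT recipes commonly read records with "input" and "output" fields, but the field names and prompt template are configurable, so treat these keys as an assumption and check the user guide for your recipe.

```python
import json

# Assumed record schema ("input"/"output"); NeMo SFT data configs let you
# map other field names via the prompt template. Examples are illustrative.
examples = [
    {
        "input": "Does metformin reduce cardiovascular risk in type 2 diabetes?",
        "output": "Yes, long-term trial follow-up suggests a reduction in events.",
    },
    {
        "input": "Is aspirin recommended for primary prevention in low-risk adults?",
        "output": "No, current guidance generally advises against routine use.",
    },
]

# One JSON object per line is the usual JSONL convention for SFT corpora.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

with open("train.jsonl") as f:
    records = [json.loads(line) for line in f]
print(len(records), "labeled records")
```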
Using TensorRT-LLM, NeMo Framework allows exporting Mistral-7B Instruct v0.3 checkpoints to formats that are optimized for deployment on NVIDIA GPUs.
TensorRT-LLM is a library that allows building TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
These optimizations make it possible to reach state-of-the-art inference performance using the export and deployment methods that NVIDIA built for Mistral-7B Instruct v0.3. This process has been described in detail in the NeMo Framework user guide.
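The export path can be sketched roughly as below. This is a hedged sketch, not a verified script: it requires a GPU machine with the NeMo Framework container, the checkpoint and engine paths are placeholders, and parameter names (for example the maximum-output-token argument) have changed between NeMo releases, so consult the user guide for the exact signature on your version.

```python
# Sketch only: requires NeMo Framework with TensorRT-LLM support and a GPU.
# Paths are placeholders; argument names vary across NeMo versions.
from nemo.export import TensorRTLLM

exporter = TensorRTLLM(model_dir="/opt/checkpoints/trt_llm_engine")

# Build a TensorRT engine from the .nemo checkpoint.
exporter.export(
    nemo_checkpoint_path="/opt/checkpoints/mistral-7b-instruct-v0.3.nemo",
    model_type="mistral",
    n_gpus=1,
)

# Run a quick smoke-test generation against the built engine.
output = exporter.forward(["What is the capital of France?"])
print(output)
```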
Detailed information, including performance results, is available here.