Terms of Use: By using this model, you are agreeing to the terms and conditions of the license.
Mistral-7B-v0.1 is a pretrained base generative text model developed by Mistral AI. Model details can be found here. This model is optimized with the NVIDIA NeMo Framework and is provided as a .nemo checkpoint.
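As a starting point, the .nemo checkpoint can be restored in Python. The sketch below follows the NeMo 1.x API pattern; the path is a placeholder and the exact class or strategy arguments may differ between NeMo releases.

```python
# Minimal sketch: restore the .nemo checkpoint with NeMo (path is a placeholder).
from pytorch_lightning import Trainer
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy

# Megatron-based NeMo models expect a Lightning trainer with the NLP DDP strategy.
trainer = Trainer(devices=1, accelerator="gpu", strategy=NLPDDPStrategy())
model = MegatronGPTModel.restore_from(
    restore_path="mistral-7b-v0.1.nemo",  # placeholder path to the downloaded checkpoint
    trainer=trainer,
)
```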
NeMo Framework offers support for various parameter-efficient fine-tuning (PEFT) methods for Mistral-7B-v0.1.
PEFT techniques allow customizing foundation models to improve performance on specific tasks.
Two of these, P-Tuning and Low-Rank Adaptation (LoRA), are supported out of the box for Mistral-7B-v0.1 and are described in detail in the NeMo Framework user guide, which shows how to tune Mistral-7B-v0.1 to answer biomedical questions using the PubMedQA dataset.
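To make the LoRA idea concrete, here is a minimal PyTorch sketch of the technique itself, not NeMo's implementation; the class name and hyperparameters (r, alpha) are illustrative. LoRA freezes the pretrained weight and learns a small low-rank update on top of it.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weight stays frozen
        self.lora_a = nn.Linear(base.in_features, r, bias=False)   # A: down-projection
        self.lora_b = nn.Linear(r, base.out_features, bias=False)  # B: up-projection
        nn.init.zeros_(self.lora_b.weight)  # B starts at zero, so the update is a no-op at first
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Only the two small LoRA matrices are trainable, which is why PEFT is cheap.
layer = LoRALinear(nn.Linear(4096, 4096))
out = layer(torch.randn(2, 4096))
```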
NeMo Framework offers Supervised fine-tuning (SFT) support for Mistral-7B-v0.1.
Fine-tuning refers to modifying the weights of a pretrained foundation model with additional custom data. Supervised fine-tuning (SFT) refers to unfreezing all the weights and layers in the model and training on a newly labeled set of examples. Fine-tuning can incorporate new, domain-specific knowledge or teach the foundation model what type of response to provide. One specific type of SFT, known as instruction tuning, uses SFT to teach a model to follow instructions better.
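The following toy PyTorch sketch illustrates what "unfreezing all the weights" means in practice; it uses a stand-in two-layer model rather than Mistral-7B-v0.1 or NeMo's training loop, and the optimizer settings are illustrative.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained LM: embedding -> linear head over a tiny vocab.
vocab, dim = 100, 32
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))

for p in model.parameters():
    p.requires_grad = True  # SFT: every weight is trainable, unlike PEFT

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6)
loss_fn = nn.CrossEntropyLoss()

# One "labeled example": input token ids and the target next tokens.
inputs = torch.randint(0, vocab, (4, 16))
targets = torch.randint(0, vocab, (4, 16))

logits = model(inputs)                                    # (batch, seq, vocab)
loss = loss_fn(logits.reshape(-1, vocab), targets.reshape(-1))
loss.backward()
optimizer.step()
```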
NeMo Framework offers out-of-the-box SFT support for Mistral-7B-v0.1, which is described in detail in the NeMo Framework user guide, showing how to tune Mistral-7B-v0.1 to follow instructions using the databricks-dolly-15k dataset.
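As a hypothetical preprocessing sketch, instruction-tuning records like those in databricks-dolly-15k (instruction / context / response fields) are commonly flattened into a single-turn prompt/completion JSONL file before SFT. The output field names ("input"/"output") are an assumption about the data layout, not a documented NeMo requirement.

```python
import json

# One dolly-style record (field names match databricks-dolly-15k).
record = {
    "instruction": "Name the planets in the solar system.",
    "context": "",
    "response": "Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune.",
}

# Fold optional context into the prompt, then emit one JSONL line per example.
prompt = record["instruction"]
if record["context"]:
    prompt += "\n\n" + record["context"]

with open("dolly_sft.jsonl", "a") as f:
    # "input"/"output" keys are an assumed layout for the SFT data loader.
    f.write(json.dumps({"input": prompt, "output": record["response"]}) + "\n")
```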
Using TensorRT-LLM, NeMo Framework allows exporting Mistral-7B-v0.1 checkpoints to formats that are optimized for deployment on NVIDIA GPUs.
TensorRT-LLM is a library for building TensorRT engines that contain state-of-the-art optimizations for efficient inference on NVIDIA GPUs. These export and deployment methods, which NVIDIA built for Mistral-7B-v0.1, make it possible to reach state-of-the-art inference performance. The process is described in detail in the NeMo Framework user guide.
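A sketch of the export flow using NeMo's TensorRT-LLM exporter is shown below, following the pattern in the NeMo Framework user guide. Paths are placeholders, the model_type value is an assumption, and argument names may differ between releases.

```python
from nemo.export import TensorRTLLM

# Build a TensorRT engine from the .nemo checkpoint (paths are placeholders).
exporter = TensorRTLLM(model_dir="/path/to/trt_llm_engine_dir")
exporter.export(
    nemo_checkpoint_path="/path/to/mistral-7b-v0.1.nemo",
    model_type="llama",  # assumption: Mistral-7B follows the Llama-style exporter path
    n_gpus=1,
)

# Run inference with the freshly built engine.
print(exporter.forward(["What is machine learning?"]))
```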
Detailed information, including performance results, is available here.