Terms of Use: By using this model, you are agreeing to the terms and conditions of the license.
Mixtral-8x7B-v0.1 is a pretrained Mixture of Experts (MoE) generative text model developed by Mistral AI. Model details can be found here. This model is optimized with the NVIDIA NeMo Framework and is provided as a .nemo checkpoint.
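As a quick sanity check, the model configuration embedded in the .nemo archive can be read without instantiating the full model. The sketch below is illustrative rather than part of the official workflow; it assumes a NeMo Framework environment and uses a placeholder file name for the downloaded checkpoint.

```python
from omegaconf import OmegaConf
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel

# Read only the model config stored inside the .nemo archive; this verifies the download
# without loading the (very large) Mixture of Experts weights or requiring a GPU.
cfg = MegatronGPTModel.restore_from(
    restore_path="mixtral_8x7b_v0.1.nemo",  # placeholder path to the downloaded checkpoint
    return_config=True,
)
print(OmegaConf.to_yaml(cfg))
```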
NeMo Framework supports several parameter-efficient fine-tuning (PEFT) methods for Mixtral 8x7B v0.1.
PEFT techniques customize a foundation model to improve its performance on specific tasks.
Two of them, P-Tuning and Low-Rank Adaptation (LoRA), are supported out of the box for Mixtral 8x7B v0.1 and are described in detail in the NeMo Framework user guide, which shows how to tune Mixtral 8x7B v0.1 to answer biomedical questions based on PubMedQA.
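The user guide is the authoritative, step-by-step walkthrough; the sketch below only condenses the general NeMo PEFT pattern into Python. It assumes the example Hydra config shipped with NeMo at examples/nlp/language_modeling/tuning/conf/megatron_gpt_finetuning_config.yaml, placeholder paths for the checkpoint and preprocessed PubMedQA JSONL splits, and a recent NeMo Framework container; exact config fields can differ between NeMo versions.

```python
from omegaconf import OmegaConf
from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel
from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronLMPPTrainerBuilder
from nemo.collections.nlp.parts.peft_config import PEFT_CONFIG_MAP

cfg = OmegaConf.load("megatron_gpt_finetuning_config.yaml")    # example config from the NeMo repo
cfg.model.restore_from_path = "mixtral_8x7b_v0.1.nemo"         # downloaded .nemo checkpoint (placeholder path)
cfg.model.peft.peft_scheme = "lora"                            # or "ptuning" for P-Tuning
cfg.model.data.train_ds.file_names = ["pubmedqa_train.jsonl"]  # placeholder preprocessed PubMedQA splits
cfg.model.data.train_ds.concat_sampling_probabilities = [1.0]
cfg.model.data.validation_ds.file_names = ["pubmedqa_val.jsonl"]
cfg.trainer.devices = 8                                        # Mixtral 8x7B requires multiple GPUs
cfg.trainer.precision = "bf16"

trainer = MegatronLMPPTrainerBuilder(cfg).create_trainer()
model_cfg = MegatronGPTSFTModel.merge_cfg_with(cfg.model.restore_from_path, cfg)
model = MegatronGPTSFTModel.restore_from(cfg.model.restore_from_path, model_cfg, trainer=trainer)
model.add_adapter(PEFT_CONFIG_MAP[cfg.model.peft.peft_scheme](model_cfg))  # attach the PEFT adapter
trainer.fit(model)                                             # base Mixtral weights stay frozen
```

In both schemes the base Mixtral weights remain frozen; only the LoRA adapter or P-Tuning prompt parameters are trained and saved.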
NeMo Framework also offers supervised fine-tuning (SFT) support for Mixtral 8x7B v0.1.
Fine-tuning modifies the weights of a pretrained foundation model with additional custom data. Supervised fine-tuning (SFT) unfreezes all of the weights and layers in the model being tuned and trains them on a newly labeled set of examples. You can fine-tune to incorporate new, domain-specific knowledge or to teach the foundation model what type of response to provide. One specific type of SFT, known as instruction tuning, uses SFT to teach a model to follow instructions better.
NeMo Framework offers out-of-the-box SFT support for Mixtral 8x7B v0.1, which is described in detail in the NeMo Framework user guide, showing how to tune Mixtral 8x7B v0.1 to follow instructions based on databricks-dolly-15k.
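The user guide covers the end-to-end SFT recipe; the snippet below is only an illustrative preprocessing step, not the official script. It converts databricks-dolly-15k records (instruction, context, response fields) into the {"input": ..., "output": ...} JSONL layout that NeMo's SFT data loader reads with its default prompt template; file names are placeholders.

```python
import json

def to_sft_record(example: dict) -> dict:
    """Map a dolly record (instruction/context/response) to NeMo SFT input/output fields."""
    instruction = example["instruction"]
    context = example.get("context", "").strip()
    prompt = f"{instruction}\n\n{context}" if context else instruction
    return {"input": prompt, "output": example["response"]}

# Placeholder file names: the raw dolly dump in, the SFT-ready training file out.
with open("databricks-dolly-15k.jsonl") as src, open("dolly_sft_train.jsonl", "w") as dst:
    for line in src:
        dst.write(json.dumps(to_sft_record(json.loads(line))) + "\n")
```

The resulting JSONL files are then passed to the SFT run via the train and validation dataset settings, as in the PEFT sketch above but without attaching an adapter, so that all model weights are updated.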