Mixtral 8x7B v0.1

Description: Mixtral-8x7B
Publisher: Mistral AI
Latest Version: 1.0
Modified: November 12, 2024
Size: 86.99 GB

Redistribution Information

NVIDIA Validated

  • Supported Runtime(s): TensorRT-LLM
  • Supported Hardware(s): Ampere, Hopper
  • Supported OS(s): Linux

Terms of Use: By using this model, you are agreeing to the terms and conditions of the license.

Mixtral 8x7B v0.1

Mixtral-8x7B-v0.1 is a pretrained Mixture of Experts (MoE) generative text model developed by Mistral AI. Model details can be found here. This model is optimized with the NVIDIA NeMo Framework and is provided as a .nemo checkpoint.
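
As a rough illustration, the .nemo checkpoint can be restored in Python with the NeMo toolkit. This is a minimal, untested sketch assuming a NeMo Framework 1.x environment; the checkpoint filename and device settings are placeholders that must match your download and hardware.

```python
# Minimal sketch: restoring the Mixtral 8x7B .nemo checkpoint with the NeMo toolkit.
# Assumes a NeMo Framework 1.x environment; the checkpoint path is a placeholder.
from pytorch_lightning import Trainer
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy

# Mixtral 8x7B (~87 GB) does not fit on a single GPU; in practice the trainer is
# launched across multiple devices with tensor/expert parallelism configured.
trainer = Trainer(
    accelerator="gpu",
    devices=8,              # adjust to your hardware
    strategy=NLPDDPStrategy(),
)

# Hypothetical local path to the downloaded checkpoint.
model = MegatronGPTModel.restore_from(
    restore_path="mixtral_8x7b_v0.1.nemo",
    trainer=trainer,
)

# Inspect the restored model configuration (including the MoE settings).
print(model.cfg)
```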

Benefits of using Mixtral-8x7B-v0.1 checkpoints in NeMo Framework

P-Tuning and LoRA

The NeMo Framework supports several parameter-efficient fine-tuning (PEFT) methods for Mixtral 8x7B v0.1.

PEFT techniques customize a foundation model for a specific task by training only a small number of additional parameters while keeping the base weights frozen, improving task performance at a fraction of the cost of full fine-tuning.

Two of these, P-Tuning and Low-Rank Adaptation (LoRA), are supported out of the box for Mixtral 8x7B v0.1 and are described in detail in the NeMo Framework user guide, which shows how to tune Mixtral 8x7B v0.1 to answer biomedical questions from PubMedQA.
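
The authoritative recipe is in the NeMo Framework user guide; the following is only a rough sketch of what the LoRA path looks like with the NeMo 1.x Python API. The module paths, example config location, checkpoint path, and data files are assumptions based on the NeMo repository layout and may differ between releases.

```python
# Rough sketch of LoRA PEFT for Mixtral 8x7B with the NeMo 1.x toolkit.
# Module paths and the example config location are assumptions; consult the
# NeMo Framework user guide for the supported recipe.
from omegaconf import OmegaConf
from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel
from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronLMPPTrainerBuilder
from nemo.collections.nlp.parts.peft_config import LoraPEFTConfig  # PtuningPEFTConfig for P-Tuning

# Hydra config shipped with the NeMo examples (path is an assumption).
cfg = OmegaConf.load(
    "/opt/NeMo/examples/nlp/language_modeling/tuning/conf/megatron_gpt_finetuning_config.yaml"
)
cfg.model.restore_from_path = "mixtral_8x7b_v0.1.nemo"          # placeholder checkpoint path
cfg.model.peft.peft_scheme = "lora"
cfg.model.data.train_ds.file_names = ["pubmedqa_train.jsonl"]    # hypothetical preprocessed PubMedQA split
cfg.model.data.validation_ds.file_names = ["pubmedqa_val.jsonl"]

trainer = MegatronLMPPTrainerBuilder(cfg).create_trainer()
model_cfg = MegatronGPTSFTModel.merge_cfg_with(cfg.model.restore_from_path, cfg)
model = MegatronGPTSFTModel.restore_from(cfg.model.restore_from_path, model_cfg, trainer=trainer)

# Attach LoRA adapters; only the adapter weights are trained.
model.add_adapter(LoraPEFTConfig(model_cfg))
trainer.fit(model)
```

Swapping `LoraPEFTConfig` for `PtuningPEFTConfig` (and `peft_scheme = "ptuning"`) would follow the P-Tuning path instead.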

Supervised Fine-tuning

The NeMo Framework offers supervised fine-tuning (SFT) support for Mixtral 8x7B v0.1.

Fine-tuning modifies the weights of a pre-trained foundation model using additional custom data. Supervised fine-tuning (SFT) unfreezes all of the weights and layers in the model and trains it on a newly labeled set of examples. One can fine-tune to incorporate new, domain-specific knowledge or to teach the foundation model what type of response to provide. One specific type of SFT, known as instruction tuning, uses SFT to teach a model to follow instructions better.

The NeMo Framework offers out-of-the-box SFT support for Mixtral 8x7B v0.1, which is described in detail in the NeMo Framework user guide, showing how to tune Mixtral 8x7B v0.1 to follow instructions using databricks-dolly-15k.
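
As a sketch, full-parameter SFT uses the same fine-tuning entry point with PEFT disabled. The paths, data files, and config location below are placeholders, and the documented steps (including how to preprocess dolly into the expected JSONL format) are in the user guide.

```python
# Rough sketch of full-parameter SFT on databricks-dolly-15k with the NeMo 1.x toolkit.
# Paths and data files are placeholders; see the NeMo Framework user guide for the
# documented recipe.
from omegaconf import OmegaConf
from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel
from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronLMPPTrainerBuilder

cfg = OmegaConf.load(
    "/opt/NeMo/examples/nlp/language_modeling/tuning/conf/megatron_gpt_finetuning_config.yaml"
)
cfg.model.restore_from_path = "mixtral_8x7b_v0.1.nemo"            # placeholder checkpoint path
cfg.model.peft.peft_scheme = "none"                               # no adapters: all weights are unfrozen (SFT)
cfg.model.data.train_ds.file_names = ["dolly_15k_train.jsonl"]     # hypothetical preprocessed dolly split
cfg.model.data.validation_ds.file_names = ["dolly_15k_val.jsonl"]

trainer = MegatronLMPPTrainerBuilder(cfg).create_trainer()
model_cfg = MegatronGPTSFTModel.merge_cfg_with(cfg.model.restore_from_path, cfg)
model = MegatronGPTSFTModel.restore_from(cfg.model.restore_from_path, model_cfg, trainer=trainer)

# No adapters are attached, so trainer.fit updates every parameter of the model.
trainer.fit(model)
```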