Finetune Mistral 7B using NVIDIA NeMo and PEFT
Welcome!
In this notebook, we will use NVIDIA's NeMo Framework to finetune the Mistral 7B LLM. Finetuning is the process of adjusting the weights of a pre-trained foundation model with custom data. Because foundation models can be very large, a variant of finetuning known as parameter-efficient fine-tuning (PEFT) has gained traction recently. Instead of updating every weight, PEFT methods train only a small number of added or selected parameters. PEFT encompasses several methods, including P-Tuning, LoRA, Adapters, and IA3.
For those interested in a deeper understanding of these methods, we have included a list of additional resources at the end of this document.
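To make the "parameter-efficient" part concrete, here is a rough sketch of how LoRA shrinks the number of trainable weights for a single linear layer. LoRA keeps the original weight matrix W (d_out x d_in) frozen and trains two small matrices, B (d_out x r) and A (r x d_in), whose product is added to W. The layer size and rank below are illustrative assumptions, not values read from this notebook's config, and the function names are made up for this example.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for LoRA: A is rank x d_in, B is d_out x rank."""
    return rank * d_in + d_out * rank


# Illustrative sizes (assumed, not read from the model config).
d_in, d_out, rank = 4096, 4096, 32

full = full_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
print(f"full fine-tuning: {full:,} trainable weights")
print(f"LoRA (rank {rank}): {lora:,} trainable weights")
print(f"ratio: {lora / full:.2%}")
```

The saving grows with layer size: the full count scales with d_in * d_out, while the LoRA count scales only with r * (d_in + d_out).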
A note about running Jupyter Notebooks: Press Shift + Enter to run a cell. A * in the left-hand cell box means the cell is running. A number means it has completed. If your Notebook is acting weird, you can stop a long-running process by interrupting the kernel (Kernel tab -> Interrupt Kernel) or even restarting the kernel (Kernel tab -> Restart Kernel). Note that restarting the kernel will require you to run everything from the beginning.
Deploy now
To streamline your experience and jump directly into a GPU-accelerated environment with this notebook and NeMo pre-installed, click the badge below. Our 1-click deploys are powered by Brev.dev.
NeMo Tools and Resources:
- NVIDIA/NeMo-Megatron-Launcher: NeMo Megatron launcher and tools (github.com)
- NeMo/examples/nlp/language_modeling/tuning at main · NVIDIA/NeMo · GitHub
Requirements:
Software:
- NeMo Framework Container, version 23.05 or later
- Docker
- NVIDIA AI Enterprise Product Support Matrix
Hardware:
- 1X A100 GPU, preferably 80GB
Prepare the base model
If you already have a .nemo file for the Mistral model in your directory, you can skip this step.
Otherwise, run the following cells to download the model and convert it to NeMo format.
Prepare Data
Next, we'll need to prepare the data that we're going to use for our LoRA fine tuning. Here we're going to be using the PubMedQA dataset, and we'll be training our model to respond with simple "yes" or "no" answers.
First, let's download the data and divide it into train/validation/test splits.
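As a rough illustration of what a reproducible split involves, the sketch below shuffles a list of example IDs with a fixed seed and cuts it into three parts. The 80/10/10 ratio, the seed, the function name `split_ids`, and the toy IDs are all assumptions made for this example; the notebook's own split procedure may differ.

```python
import random


def split_ids(ids, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle IDs deterministically and cut them into train/val/test lists."""
    ids = sorted(ids)                 # fixed starting order before shuffling
    random.Random(seed).shuffle(ids)  # local RNG so global random state is untouched
    n_val = int(len(ids) * val_frac)
    n_test = int(len(ids) * test_frac)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


# Toy stand-in for the downloaded dataset: 1000 fake example IDs.
example_ids = [f"id_{i}" for i in range(1000)]
train_ids, val_ids, test_ids = split_ids(example_ids)
print(len(train_ids), len(val_ids), len(test_ids))
```

Fixing the seed means the same examples land in the same split on every run, which keeps later evaluation numbers comparable.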
Now we can convert the PubMedQA data into the JSONL format that NeMo needs for Parameter Efficient Fine Tuning. We'll also reformat the data into prompts that our model can appropriately handle.
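The conversion step can be sketched roughly as follows: each record becomes one JSON object per line, with the prompt in one field and the expected answer in another. The `input` and `output` field names, the prompt template, and the raw record keys (`question`, `contexts`, `answer`) are assumptions for illustration; check them against the config file and the actual PubMedQA JSON before relying on them.

```python
import json


def form_prompt(question: str, contexts: list[str]) -> str:
    """Build a single prompt string from a question and its context passages."""
    context = "\n".join(contexts)
    return (
        f"QUESTION: {question}\n"
        f"CONTEXT: {context}\n"
        "TARGET: the answer to the question given the context is (yes|no): "
    )


def to_jsonl_lines(records: list[dict]) -> list[str]:
    """Turn raw records into JSONL lines with 'input' and 'output' fields."""
    lines = []
    for rec in records:
        row = {
            "input": form_prompt(rec["question"], rec["contexts"]),
            "output": rec["answer"],
        }
        lines.append(json.dumps(row))
    return lines


# Made-up record for illustration only; not taken from PubMedQA.
toy = [{
    "question": "Does the treatment reduce symptoms?",
    "contexts": ["Group A improved.", "Group B did not."],
    "answer": "yes",
}]

for line in to_jsonl_lines(toy):
    print(line)
```

Writing these lines to a file, one per row, gives a JSONL file in the shape described above.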
Here's an example of what the data looks like
Run Training
NeMo Framework uses config objects to control many of its operations, which allows you to quickly see what options you can change and carry out different experiments. We can start by downloading an example config file from GitHub.
Now we'll read in this default config file with Hydra, and apply an override that enables the use of Megatron core.
To see all of the different configuration options available, you can take a look at the file we downloaded. For this example, we're going to update a couple of settings to point to our datasets and run LoRA tuning on our A100. Feel free to experiment with these different options!
For our data configuration, we'll point to the JSONL files we wrote out earlier. concat_sampling_probabilities determines what fraction of the finetuning data should come from each file, with one entry per file. In our example we have only one train file, so we choose [1.0].
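To illustrate what the sampling probabilities mean, the sketch below draws the source file for each training example according to a list of weights. This is a conceptual stand-in written in plain Python, not NeMo's implementation, and the file names and helper functions are hypothetical.

```python
import random


def check_sampling(file_names, probs):
    """Validate that there is one probability per file and that they sum to 1."""
    if len(file_names) != len(probs):
        raise ValueError("need exactly one probability per file")
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1.0")


def sample_sources(file_names, probs, n, seed=0):
    """Pick which file each of n training examples is drawn from."""
    check_sampling(file_names, probs)
    rng = random.Random(seed)
    return rng.choices(file_names, weights=probs, k=n)


# Single train file, as in this notebook: every example comes from it.
print(sample_sources(["train.jsonl"], [1.0], n=3))

# Hypothetical two-file mix: roughly 70% from one file, 30% from the other.
picks = sample_sources(["a.jsonl", "b.jsonl"], [0.7, 0.3], n=1000)
print(picks.count("a.jsonl") / len(picks))
```

With a single file the list has one entry and it must be 1.0, which is why [1.0] is the only valid choice here.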
For our model settings, we don't have much to change since we're reading in a pretrained model. We need to point to our existing/converted .nemo file, specify that we want to use LoRA as our scheme for finetuning, and choose our parallelism and batch size values. The values below should be appropriate for a single A100 GPU.
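The parallelism and batch size values are linked. In Megatron-style training, GPUs not used for tensor or pipeline parallelism act as data-parallel replicas, and the global batch is reached by accumulating micro-batches on each replica. The helper below is a sketch of that arithmetic; the function name and the example numbers are assumptions, not the values used in this notebook.

```python
def grad_accum_steps(global_batch, micro_batch, n_gpus, tp=1, pp=1):
    """Micro-batches accumulated per optimizer step on each data-parallel rank."""
    model_parallel = tp * pp
    if n_gpus % model_parallel != 0:
        raise ValueError("n_gpus must be divisible by tp * pp")
    data_parallel = n_gpus // model_parallel
    per_step = micro_batch * data_parallel
    if global_batch % per_step != 0:
        raise ValueError("global_batch must be divisible by micro_batch * data_parallel")
    return global_batch // per_step


# Hypothetical values for one GPU with no model parallelism.
print(grad_accum_steps(global_batch=4, micro_batch=1, n_gpus=1))
```

On a single GPU with tensor and pipeline parallelism both set to 1, the global batch size simply needs to be a multiple of the micro batch size.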
Finally, we set some training specific options. We're training on 1 GPU on a single node at bfloat16 precision. For this example we'll also only train for 50 steps.
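The sketch below shows the idea of overriding a few defaults with dotted keys, using a plain dictionary as a stand-in for the Hydra/OmegaConf config object. The `apply_overrides` helper is hypothetical, the default values are placeholders, and the key names are assumed to mirror the usual PyTorch Lightning trainer arguments; confirm them against the downloaded config file.

```python
import copy


def apply_overrides(config: dict, overrides: dict) -> dict:
    """Return a deep copy of config with dotted-key overrides applied."""
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})  # create missing sections
        node[leaf] = value
    return result


# Stand-in for the downloaded defaults (values here are placeholders).
defaults = {"trainer": {"devices": 8, "num_nodes": 1, "precision": 16, "max_steps": 1000}}

# Settings described above: 1 GPU, 1 node, bfloat16, 50 steps.
cfg = apply_overrides(defaults, {
    "trainer.devices": 1,
    "trainer.num_nodes": 1,
    "trainer.precision": "bf16",
    "trainer.max_steps": 50,
})
print(cfg["trainer"])
```

Working on a copy leaves the defaults untouched, which makes it easy to try several override sets side by side.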
With our configurations set, we are ready to initialize our Trainer object to handle our training loop, and an experiment manager to handle checkpointing and logging. After initializing the Trainer object we can load our model from disk into memory.
Before training our adapter, let's see how the base model performs on the dataset
Now, let's add the LoRA Adapter and train it:
Finally, we can see how the newly finetuned model performs on the test data:
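Since the task is to answer with a simple "yes" or "no", one straightforward way to compare the base and finetuned models is exact-match accuracy after light normalization. The sketch below assumes you have collected the model outputs and reference labels as two lists of strings; the helper names and the sample values are made up for illustration and are not results from this notebook.

```python
def normalize(text: str) -> str:
    """Lowercase and strip surrounding whitespace and trailing punctuation."""
    return text.strip().lower().rstrip(".!")


def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the label after normalizing."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(normalize(p) == normalize(l) for p, l in zip(predictions, labels))
    return correct / len(labels)


# Made-up outputs for illustration; not results from this notebook.
preds = ["Yes", "no.", "yes", " no "]
golds = ["yes", "no", "no", "no"]
print(accuracy(preds, golds))
```

Running the same function on the base model's outputs and on the finetuned model's outputs gives two numbers that can be compared directly.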