NVIDIA

Finetune Mistral 7B Using Brev.dev Quick Deploy

In this notebook, we will use NVIDIA's NeMo Framework to finetune the Mistral 7B LLM. The notebook can be launched with the Brev quick deploy option.

mistral_nemo_finetune.ipynb

Finetune Mistral 7B using NVIDIA NeMo and PEFT

Welcome!

In this notebook, we will use NVIDIA's NeMo Framework to finetune the Mistral 7B LLM. Finetuning is the process of adjusting the weights of a pre-trained foundation model with custom data. Because foundation models can be very large, a variant of finetuning known as parameter-efficient fine-tuning (PEFT) has gained traction recently. PEFT trains only a small number of additional or selected parameters while keeping the base model's weights frozen, and it encompasses several methods, including P-Tuning, LoRA, Adapters, and IA3.

For those interested in a deeper understanding of these methods, we have included a list of additional resources at the end of this document.
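To make the "parameter-efficient" part concrete, here is a rough back-of-the-envelope sketch of the LoRA idea: instead of updating a full weight matrix of shape (d_out, d_in), LoRA trains two small matrices of shapes (d_out, r) and (r, d_in). The 4096 x 4096 matrix size and the rank of 8 below are illustrative assumptions, not values read from this notebook's config.

```python
# Illustrative arithmetic only: trainable parameters for full finetuning of
# one weight matrix versus a LoRA adapter attached to the same matrix.

def full_params(d_out: int, d_in: int) -> int:
    """Parameters updated when the whole (d_out x d_in) matrix is trained."""
    return d_out * d_in

def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two low-rank factors: (d_out x rank) and (rank x d_in)."""
    return rank * (d_out + d_in)

d = 4096   # assumed size of a square projection matrix
rank = 8   # assumed LoRA rank

full = full_params(d, d)
lora = lora_params(d, d, rank)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

The adapter is a small fraction of the matrix it modifies, which is why a 7B model can be tuned on a single GPU.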

A note about running Jupyter Notebooks: Press Shift + Enter to run a cell. A * in the left-hand cell box means the cell is running. A number means it has completed. If your notebook is acting weird, you can stop a long-running process by interrupting the kernel (Kernel tab -> Interrupt Kernel) or even restarting the kernel (Kernel tab -> Restart Kernel). Note that restarting the kernel will require you to run everything from the beginning.

Deploy now

To streamline your experience and jump directly into a GPU-accelerated environment with this notebook and NeMo pre-installed, click the badge below. Our 1-click deploys are powered by Brev.dev.

NeMo Tools and Resources:

Requirements:

Software:

NeMo Framework Container, version 23.05 or later
Docker
NVIDIA AI Enterprise Product Support Matrix

Hardware:

1x A100 GPU, preferably 80GB

"https://console.brev.dev/notebook/trtmistral1?cuda=undefined&python=undefined&diskStorage=256&name=nvidia-tensorrt-mistral&instance=A10G:g5.xlarge&baseImage=nvcr.io/nvidia/tensorrt:24.01-py3"

Prepare the base model

If you already have a .nemo file for the Mistral 7B model in your directory, you can skip this step.

Otherwise, run the following cells to download the model and convert it to NeMo format.
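If you are unsure whether the step can be skipped, a quick check along these lines may help. This is a minimal sketch; the path matches the output file used by the conversion cell in this notebook, and the helper name `needs_conversion` is made up for illustration.

```python
from pathlib import Path

def needs_conversion(nemo_path: str) -> bool:
    """Return True when no .nemo checkpoint file exists at nemo_path."""
    return not Path(nemo_path).is_file()

# Path used by the conversion cell in this notebook.
if needs_conversion("models/mistral7b.nemo"):
    print("No .nemo file found: run the download and conversion cells.")
else:
    print("Found an existing .nemo file: the conversion cells can be skipped.")
```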

In [ ]:

!pip install ipywidgets
!jupyter nbextension enable --py widgetsnbextension

In [ ]:

!mkdir -p models/mistral7b

In [ ]:

import huggingface_hub

huggingface_hub.snapshot_download(repo_id="mistralai/Mistral-7B-v0.1", local_dir="models/mistral7b", local_dir_use_symlinks=False)

In [ ]:

!python /opt/NeMo/scripts/nlp_language_modeling/convert_hf_mistral_7b_to_nemo.py --in-file=models/mistral7b --out-file=models/mistral7b.nemo

Prepare Data

Next, we'll need to prepare the data that we're going to use for our LoRA finetuning. Here we're going to be using the PubMedQA dataset, and we'll be training our model to respond with a simple "yes", "no", or "maybe" answer.

First, let's download the data and divide it into train/validation/test splits:

In [ ]:

!git clone https://github.com/pubmedqa/pubmedqa.git
!cd pubmedqa/preprocess && python split_dataset.py pqal

Now we can convert the PubMedQA data into the JSONL format that NeMo needs for Parameter Efficient Fine Tuning. We'll also reformat the data into prompts that our model can appropriately handle.

In [ ]:

import json

def write_jsonl(fname, json_objs):
    with open(fname, 'wt') as f:
        for o in json_objs:
            f.write(json.dumps(o)+"\n")

def form_question(obj):
    st = ""
    st += f"QUESTION:{obj['QUESTION']}\n"
    st += "CONTEXT: "
    for i, label in enumerate(obj['LABELS']):
        st += f"{obj['CONTEXTS'][i]}\n"
    st += "TARGET: the answer to the question given the context is (yes|no|maybe): "
    return st

def convert_to_jsonl(data_path, output_path):
    # Use a context manager so the input file is closed after loading.
    with open(data_path, 'rt') as f:
        data = json.load(f)
    json_objs = []
    for k in data.keys():
        obj = data[k]
        prompt = form_question(obj)
        completion = obj['reasoning_required_pred']
        json_objs.append({"input": prompt, "output": completion})
    write_jsonl(output_path, json_objs)
    return json_objs

In [ ]:

test_json_objs = convert_to_jsonl("pubmedqa/data/test_set.json", "pubmedqa_test.jsonl")
train_json_objs = convert_to_jsonl("pubmedqa/data/pqal_fold0/train_set.json", "pubmedqa_train.jsonl")
dev_json_objs = convert_to_jsonl("pubmedqa/data/pqal_fold0/dev_set.json", "pubmedqa_val.jsonl")

Here's an example of what the data looks like:

In [ ]:

test_json_objs[0]
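Before moving on to training, it can be worth confirming that each line of the generated files is a standalone JSON object with the "input" and "output" keys written by convert_to_jsonl. The helper below is a minimal sketch; the name `check_jsonl` is made up for illustration and is not part of NeMo.

```python
import json

REQUIRED_KEYS = {"input", "output"}

def check_jsonl(path):
    """Return the number of records in a JSONL file.

    Raises ValueError if any line lacks one of the required keys.
    """
    count = 0
    with open(path, "rt") as f:
        for line_no, line in enumerate(f, start=1):
            record = json.loads(line)  # each line must parse on its own
            missing = REQUIRED_KEYS.difference(record)
            if missing:
                raise ValueError(f"{path}:{line_no} is missing {sorted(missing)}")
            count += 1
    return count
```

For example, `check_jsonl("pubmedqa_train.jsonl")` should return the number of training records once the cells above have run.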

Run Training

NeMo Framework uses config objects to control many of its operations, which allows you to quickly see what options you can change and carry out different experiments. We can start by downloading an example config file from GitHub.

In [ ]:

!wget https://raw.githubusercontent.com/NVIDIA/NeMo/main/examples/nlp/language_modeling/tuning/conf/megatron_gpt_finetuning_config.yaml

Now we'll read in this default config file with Hydra, and apply an override that enables the use of Megatron core.

In [ ]:

import hydra
from omegaconf.omegaconf import OmegaConf

hydra.initialize(version_base=None, config_path=".")

In [ ]:

cfg = hydra.compose(config_name="megatron_gpt_finetuning_config", overrides=['++model.mcore_gpt=True'])
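The override above, and the `OmegaConf.update(..., merge=True)` calls in the following cells, both address a nested config value by a dotted key and change only what is named, leaving sibling settings intact. The sketch below imitates that behaviour with plain dictionaries. It is an analogy rather than OmegaConf's implementation, and the starting values in `cfg` are made up for illustration.

```python
def _deep_merge(dst: dict, src: dict) -> None:
    """Recursively copy src into dst, keeping dst keys that src does not mention."""
    for key, val in src.items():
        if isinstance(val, dict) and isinstance(dst.get(key), dict):
            _deep_merge(dst[key], val)
        else:
            dst[key] = val

def merged_update(cfg: dict, dotted_key: str, value: dict) -> None:
    """Merge value into the nested dict addressed by dotted_key, in place."""
    node = cfg
    for part in dotted_key.split("."):
        node = node.setdefault(part, {})
    _deep_merge(node, value)

# Hypothetical starting config, not the contents of the downloaded YAML.
cfg = {"model": {"mcore_gpt": False,
                 "data": {"train_ds": {"num_workers": 4, "shuffle": True}}}}

merged_update(cfg, "model.data", {"train_ds": {"num_workers": 0}})
print(cfg["model"]["data"]["train_ds"])  # num_workers changed, shuffle kept
```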

To see all of the different configuration options available, you can take a look at the file we downloaded. For this example, we're going to update a couple of settings to point to our datasets and run LoRA tuning on our A100. Feel free to experiment with these different options!

For our data configuration, we'll point to the JSONL files we wrote out earlier. concat_sampling_probabilities determines what fraction of the finetuning data should come from each file. In our example we only have one train file, so we choose [1.0].

In [ ]:

OmegaConf.update(cfg, "model.data", {
  "train_ds": {
      "num_workers": 0,
      "file_names": ["pubmedqa_train.jsonl"],
      "concat_sampling_probabilities": [1.0]
  },
  "validation_ds": {
      "num_workers": 0,
      "file_names": ["pubmedqa_val.jsonl"]
  },
  "test_ds": {
    "file_names": ["pubmedqa_test.jsonl"],
    "names": ["pubmedqa"]
  }
}, merge=True)
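If you later train on more than one file, each entry in `file_names` needs a matching entry in `concat_sampling_probabilities`. The check below is a small sketch that assumes the probabilities pair one-to-one with the files and sum to 1. The second file name, `other_train.jsonl`, is hypothetical.

```python
import math

def check_sampling(file_names, probabilities):
    """Pair each file with its sampling probability after basic sanity checks."""
    if len(file_names) != len(probabilities):
        raise ValueError("need exactly one probability per file")
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("sampling probabilities should sum to 1.0")
    return dict(zip(file_names, probabilities))

# The single-file setup used in this notebook.
print(check_sampling(["pubmedqa_train.jsonl"], [1.0]))

# A hypothetical two-file mix: roughly 70% of samples from the first file.
print(check_sampling(["pubmedqa_train.jsonl", "other_train.jsonl"], [0.7, 0.3]))
```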

For our model settings, we don't have much to change since we're reading in a pretrained model. We need to point to our existing/converted .nemo file, specify that we want to use LoRA as our scheme for finetuning, and choose our parallelism and batch size values. The values below should be appropriate for a single A100 GPU.

In [ ]:

OmegaConf.update(cfg, "model", {
    "restore_from_path": "models/mistral7b.nemo",
    "peft": {
        "peft_scheme": "lora"
    },
    "tensor_model_parallel_size": 1,
    "pipeline_model_parallel_size": 1,
    "micro_batch_size": 1,
    "global_batch_size": 8,
}, merge=True)
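The batch size and parallelism values are related. In Megatron-style training, the global batch is usually reached by accumulating gradients over several micro-batches on each data-parallel replica. The sketch below works through that arithmetic for the values above; it assumes a single GPU and is not a call into NeMo itself.

```python
def data_parallel_size(num_gpus: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """GPUs left for data parallelism after model parallelism is accounted for."""
    model_parallel = tensor_parallel * pipeline_parallel
    if num_gpus % model_parallel != 0:
        raise ValueError("num_gpus must be divisible by TP size * PP size")
    return num_gpus // model_parallel

def accumulation_steps(global_batch: int, micro_batch: int, dp_size: int) -> int:
    """Micro-batches each replica processes before one optimizer step."""
    per_pass = micro_batch * dp_size
    if global_batch % per_pass != 0:
        raise ValueError("global batch must be divisible by micro batch * DP size")
    return global_batch // per_pass

dp = data_parallel_size(num_gpus=1, tensor_parallel=1, pipeline_parallel=1)
steps = accumulation_steps(global_batch=8, micro_batch=1, dp_size=dp)
print(f"data-parallel size: {dp}, micro-batches per optimizer step: {steps}")
```

With these settings, each optimizer step accumulates gradients over 8 micro-batches of one sample each, which keeps peak memory low at the cost of more forward and backward passes per step.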

Finally, we set some training-specific options. We're training on 1 GPU on a single node at bfloat16 precision. For this example we'll also only train for 20 steps, running validation every 10 steps.

In [ ]:

OmegaConf.update(cfg, "trainer", {
    "devices": 1,
    "num_nodes": 1,
    "precision": "bf16-mixed",
    "val_check_interval": 10,
    "max_steps": 20
}, merge=True)  # merge explicitly so other trainer settings in the config are kept
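To get a feel for how short this run is, the sketch below estimates how many training samples are consumed and at which steps validation would run. It assumes the global batch size of 8 set in the model cell and that validation is triggered every `val_check_interval` optimizer steps; the exact scheduling is handled by the trainer.

```python
def samples_seen(max_steps: int, global_batch_size: int) -> int:
    """Training samples consumed over the whole run."""
    return max_steps * global_batch_size

def validation_steps(max_steps: int, val_check_interval: int) -> list:
    """Steps at which validation would run, assuming one check per interval."""
    return list(range(val_check_interval, max_steps + 1, val_check_interval))

print("samples seen:", samples_seen(max_steps=20, global_batch_size=8))
print("validation at steps:", validation_steps(max_steps=20, val_check_interval=10))
```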

With our configurations set, we are ready to initialize our Trainer object to handle our training loop, and an experiment manager to handle checkpointing and logging. After initializing the Trainer object we can load our model from disk into memory.

In [ ]:

from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel
from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronLMPPTrainerBuilder
from nemo.collections.nlp.parts.peft_config import LoraPEFTConfig
from nemo.utils.exp_manager import exp_manager

trainer = MegatronLMPPTrainerBuilder(cfg).create_trainer()
exp_manager(trainer, cfg.exp_manager)

model_cfg = MegatronGPTSFTModel.merge_cfg_with(cfg.model.restore_from_path, cfg)
model = MegatronGPTSFTModel.restore_from(cfg.model.restore_from_path, model_cfg, trainer=trainer)

Before training our adapter, let's see how the base model performs on the dataset:

In [ ]:

trainer.test(model)

Now, let's add the LoRA Adapter and train it:

In [ ]:

model.add_adapter(LoraPEFTConfig(model_cfg))
trainer.fit(model)

Finally, we can see how the newly finetuned model performs on the test data:

In [ ]:

trainer.test(model)