This recipe contains information and scripts to produce performance results for the Mistral Hugging Face fine-tuning training workload using PEFT and FSDP. The scripts handle environment setup and dataset setup, and launch the benchmark jobs. This variant of the workload is best-suited for GPU clusters with:
Performance for HF Mistral fine-tuning is measured in train samples per second, which is logged in the .out file associated with the job. For example:
grep train_samples_per_second log-hf_mistral_7b_32_656947.out
{'train_runtime': 2950.1412, 'train_samples_per_second': 555.363, 'train_steps_per_second': 0.034, 'train_loss': 1.0721950674057006, 'epoch': 6.25}
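The .out files for each job are written under the results directory described later in this recipe, so the same grep can be applied across runs. The log-*.out file name pattern below is an assumption based on the example above:
# Collect train_samples_per_second from every job log under the results tree
grep train_samples_per_second ${STAGE_PATH}/results/*/bf16/7b/*/log-*.out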
Mistral 7B 24.02 BF16 | 8x H100 GPUs | 16x H100 GPUs | 32x H100 GPUs | 64x H100 GPUs | 128x H100 GPUs | 256x H100 GPUs |
---|---|---|---|---|---|---|
Train samples per second | 16.287 | 37.895 | 81.626 | 161.273 | 308.95 | 555.363 |
This recipe requires access to the Mistral 7B model on Hugging Face. Instructions for requesting access are below if needed.
Create a staging area by running the setup.sh script. The script converts the Docker image nvcr.io/nvidia/pytorch:24.02-py3 into the nvidia+pytorch+24.02.sqsh squashfs file under the $STAGE_PATH folder and downloads the DHS-LLM workshop source code.
# Set the path where all artifacts will be downloaded (e.g. /lustre/myproject/<userid>)
export STAGE_PATH=<path to your shared file system folder>
# Set the Slurm partition to use
export SLURM_PARTITION="batch"
# Set the Slurm account to use
export SLURM_ACCOUNT="account_name"
# Run the setup
bash ./setup.sh
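After setup.sh finishes, a quick optional sanity check, assuming the file name given above, is to confirm the converted container image exists in the staging area:
# The converted squashfs container image should now be present
ls -lh $STAGE_PATH/nvidia+pytorch+24.02.sqsh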
Access to Mistral 7B must be requested on the Hugging Face Mistral 7B model page (https://huggingface.co/mistralai/Mistral-7B-v0.1).
To download the model and dataset you will need to create a Hugging Face access token with READ privileges. You will use your HF user name and access token as the user/password for the git clones. For more information see: https://huggingface.co/docs/hub/en/security-tokens
Note: Cloning the model can take well over an hour, and you will be prompted twice for user/password. After the second prompt the clone may appear to hang while the large weight files are downloaded.
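Optionally, to avoid retyping the username/token at each prompt, git's credential store can cache them. This is not part of the recipe, and note that it saves the token in plain text under ~/.git-credentials:
# Optional: cache HF credentials so git prompts only once (token stored in plain text)
git config --global credential.helper store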
cd $STAGE_PATH
# Only needs to be performed once
git lfs install
git clone https://huggingface.co/mistralai/Mistral-7B-v0.1
If the model download step was successful, the following files should be present in the $STAGE_PATH/Mistral-7B-v0.1 folder:
README.md config.json generation_config.json model-00001-of-00002.safetensors model-00002-of-00002.safetensors model.safetensors.index.json pytorch_model-00001-of-00002.bin pytorch_model-00002-of-00002.bin pytorch_model.bin.index.json special_tokens_map.json tokenizer.json tokenizer.model tokenizer_config.json
git clone https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k
If the dataset clone step was successful, the following files should be present in the $STAGE_PATH/ultrachat_200k/data folder:
test_gen-00000-of-00001-3d4cd8309148a71f.parquet test_sft-00000-of-00001-f7dfac4afe5b93f4.parquet train_gen-00000-of-00003-a6c9fb894be3e50b.parquet train_gen-00001-of-00003-d6a0402e417f35ca.parquet train_gen-00002-of-00003-c0db75b92a2f48fd.parquet train_sft-00000-of-00003-a3ecf92756993583.parquet train_sft-00001-of-00003-0a1804bcb6ae68c6.parquet train_sft-00002-of-00003-ee46ed25cfae92c6.parquet
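A quick optional check, based on the file lists above, confirms that both downloads landed where expected:
# The model weight shards and dataset parquet shards should both be present
ls $STAGE_PATH/Mistral-7B-v0.1/*.safetensors
ls $STAGE_PATH/ultrachat_200k/data/*.parquet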
Once the environment has been prepared, it is time to train a model. Run the launch_7b.sh script with sbatch to launch Hugging Face Mistral 7B model training on 1 to 32 nodes with BF16 precision.
Log files will be located under ${STAGE_PATH}/results/$GSW_VERSION/bf16/7b/$JOB_TOTAL_GPUS.
sbatch -A ${SLURM_ACCOUNT} -p ${SLURM_PARTITION} -N ${NUM_NODES} ./launch_7b.sh
Where NUM_NODES can be calculated as N_GPUS / N_GPUS_PER_NODE. N_GPUS_PER_NODE is 8 for DGX H100, so for the 256 GPU scale NUM_NODES should be 256 / 8 = 32.
Note: it might be necessary to pass --gres=gpu:8 to sbatch on certain clusters if you encounter errors such as "GPU not found". See https://slurm.schedmd.com/gres.html for details.
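For example, to run at the 256 GPU scale on DGX H100 nodes:
# 256 GPUs / 8 GPUs per DGX H100 node = 32 nodes
export NUM_NODES=32
# Append --gres=gpu:8 only if your cluster requires it (see note above)
sbatch -A ${SLURM_ACCOUNT} -p ${SLURM_PARTITION} -N ${NUM_NODES} ./launch_7b.sh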
accelerate is launched on every node, and pip install -r requirements.txt is run as part of the srun command to ensure all compute nodes have the same environment; PYTHONPATH is set accordingly.
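The sketch below illustrates that per-node pattern only; the actual command, PYTHONPATH value, script name, and accelerate arguments are defined in launch_7b.sh, and <training_script.py> is a placeholder rather than a real file name from this recipe:
# Illustrative pattern: install requirements and start accelerate on each node via srun
srun bash -c "pip install -r requirements.txt && PYTHONPATH=\$PWD accelerate launch <training_script.py>"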