The LLM Benchmarking Collection provides an easy path to reproduce the latest performance results for deep learning workloads.
The Resource for each workload contains software and hardware configuration information plus the scripts necessary to obtain performance results. On the overview page for each workload, you can also find target performance results for the given configuration.
Currently, the LLM Benchmarking Collection focuses on measuring speed metrics like time per training step and tokens per second.
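As a concrete illustration of how these two metrics relate, the short sketch below derives tokens per second from time per training step. The batch size, sequence length, and step time are hypothetical values for illustration, not measured results from the collection.

```bash
# Hypothetical example: deriving tokens/s from time per training step.
# All three values below are illustrative, not measured results.
GLOBAL_BATCH_SIZE=2048   # sequences processed per training step
SEQ_LEN=4096             # tokens per sequence
STEP_TIME_S=10.0         # measured seconds per training step

awk -v b="$GLOBAL_BATCH_SIZE" -v s="$SEQ_LEN" -v t="$STEP_TIME_S" \
    'BEGIN { printf "%.0f tokens/s\n", b * s / t }'
```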
Workload | Type | Description | Version | Dataset | Max Scale (GPUs) | DTYPE |
---|---|---|---|---|---|---|
Nemotron4 | Training | 15B and 340B benchmarks | 24.05 | Synthetic | 2048 | FP8, BF16 |
NeMo Megatron | Training | 175B benchmarks | 24.05 | Pile | 2048 | FP8, BF16 |
Llama 2 | Training | 7B and 70B benchmarks | 24.03.01 | Pile | 2048 | FP8, BF16 |
Llama 3 | Training | 8B and 70B benchmarks | 24.05 | Pile | 2048 | FP8, BF16 |
Llama 2 | Fine Tuning | Hugging Face 70B benchmarks | 24.02 | HF Llama2 | 512 | BF16 |
Mistral | Fine Tuning | Hugging Face 7B benchmarks | 24.02 | HF Mistral | 256 | BF16 |
Baseline performance was obtained by running these workloads on EOS, an NVIDIA DGX H100 Reference Architecture cluster.
The benchmarks were developed on the NVIDIA Reference Architecture; the following changes are needed for optimal performance on each CSP.
AWS uses EFA (Elastic Fabric Adapter) for the compute fabric. Instructions on adding EFA support:
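The EFA setup itself is documented by AWS rather than here. Purely as a hedged sketch of what such an environment often looks like (all values are assumptions drawn from general aws-ofi-nccl guidance, and the interface name is hypothetical):

```bash
# Hedged sketch of typical EFA-related NCCL environment settings on AWS.
# These are illustrative assumptions; follow the EFA instructions for the
# values that match your instance type and driver stack.
export FI_PROVIDER=efa            # use the EFA libfabric provider
export FI_EFA_USE_DEVICE_RDMA=1   # enable GPUDirect RDMA where supported
export FI_EFA_FORK_SAFE=1         # needed when the launcher forks workers
export NCCL_SOCKET_IFNAME=ens5    # hypothetical control-plane interface name
```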
GCP uses TCPX for the compute fabric. Ensure the following variables are present and correct for your environment:
```bash
NCCL_LIB_DIR='/var/lib/tcpxo/lib64' source /var/lib/tcpxo/lib64/nccl-env-profile.sh
export NCCL_FASTRAK_CTRL_DEV=enp0s12
export NCCL_FASTRAK_IFNAME=enp6s0,enp7s0,enp13s0,enp14s0,enp134s0,enp135s0,enp141s0,enp142s0
export NCCL_SOCKET_IFNAME=enp0s12
export NCCL_FASTRAK_LLCM_DEVICE_DIRECTORY=/dev/aperture_devices
export NCCL_NET=FasTrak
ls /var/lib/tcpxo/lib64
```
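For reference, a minimal sketch of applying these settings inside a containerized Slurm step follows. A pyxis-enabled cluster is assumed; `IMAGE` and the mount list are placeholders, not values from the collection's scripts.

```bash
# Minimal sketch, assuming a pyxis-enabled Slurm cluster; IMAGE and the
# mount list are placeholders, not values from the collection's scripts.
srun --container-image "${IMAGE}" \
     --container-mounts /var/lib/tcpxo/lib64:/var/lib/tcpxo/lib64,/dev/aperture_devices:/dev/aperture_devices \
     bash -c 'NCCL_LIB_DIR=/var/lib/tcpxo/lib64 source /var/lib/tcpxo/lib64/nccl-env-profile.sh; env | grep -E "^NCCL_" | sort'
```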
NeMo Megatron, Paxml, HF Llama2, and HF Mistral are the only workloads that still use older containers without an Azure topology file, so they will fail to run on Azure. To fix this, apply the following steps:
1. Set the topology file path in the environment: `NCCL_TOPO_FILE=<full path to topo>`
2. Pass the `--container-env=NCCL_TOPO_FILE` flag to `srun` to override any container settings.

Example from the NeMo Megatron launcher:
```bash
export NCCL_TOPO_FILE=/opt/microsoft/topo.xml  # Exact location varies by cluster
srun --container-image ${IMAGE} \
    --container-writable \
    --container-mounts ${NCCL_TOPO_FILE},${DATA_DIR}:/datasets/,${RESULT_DIR},$INDEX_MAPPING_DIR,${STAGE_PATH}/cfg:/cfg/ \
    --container-env=NCCL_TOPO_FILE \
    --no-container-mount-home
<snip> ...
```
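To confirm the override actually reached the container, a quick check along these lines can help. This is a sketch; `${IMAGE}` is the same placeholder as above.

```bash
# Sanity-check sketch: verify the topology file is visible inside the container.
srun --container-image ${IMAGE} \
    --container-mounts ${NCCL_TOPO_FILE} \
    --container-env=NCCL_TOPO_FILE \
    bash -c 'echo "NCCL_TOPO_FILE=${NCCL_TOPO_FILE}"; ls -l "${NCCL_TOPO_FILE}"'
```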
For questions or to provide feedback, please contact LLMBenchmarks@nvidia.com.