Supported platform: Linux / amd64.
The NVIDIA HPC-Benchmarks collection provides three benchmarks (HPL, HPL-AI, and HPCG) widely used in the HPC community, optimized for performance on NVIDIA accelerated HPC systems.

NVIDIA's HPL and HPL-AI benchmarks are software packages that solve a (random) dense linear system on distributed-memory computers equipped with NVIDIA GPUs, in double-precision (64-bit) arithmetic and in mixed-precision arithmetic using Tensor Cores, respectively. Both are based on the Netlib HPL benchmark.

NVIDIA's HPCG benchmark accelerates the High Performance Conjugate Gradients (HPCG) Benchmark. HPCG is a software package that performs a fixed number of multigrid-preconditioned (using a symmetric Gauss-Seidel smoother) conjugate gradient (PCG) iterations using double-precision (64-bit) floating-point values.
The NVIDIA HPC-Benchmarks NGC collection provides two container images: 21.4-hpl and 21.4-hpcg.

The 21.4-hpl container image is provided with the following packages embedded:
- HPL-NVIDIA v1.0.0
- HPL-AI-NVIDIA v2.0.0
- cuBLAS v11.4.1
- OpenMPI v4.0.5
- UCX v1.10.0
- MKL v2020.4-912

The 21.4-hpcg container image is provided with the following packages embedded:
- HPCG-NVIDIA v1.0.0
- OpenMPI v4.0.5
- UCX v1.10.0

Before running the NVIDIA HPC-Benchmarks NGC containers, please ensure that your system meets the following requirements:
- a container runtime with GPU support: Docker launched with the --gpus option, Singularity 3.4.1 or later, or enroot/Pyxis (the runtimes used in the examples below).

The 21.4-hpl container provides the HPL-NVIDIA and HPL-AI-NVIDIA benchmarks in the following folder structure:
- hpl.sh script in folder /workspace to invoke the xhpl or xhpl_ai executables.
- HPL-NVIDIA in folder /workspace/hpl-linux-x86_64 contains:
  - xhpl executable.
- HPL-AI-NVIDIA in folder /workspace/hpl-ai-linux-x86_64 contains:
  - xhpl_ai executable.

The 21.4-hpcg container provides the HPCG-NVIDIA benchmark in the following folder structure:
- HPCG-NVIDIA in folder /workspace/hpcg-linux-x86_64 contains:
  - xhpcg executable.
  - hpcg.sh script to invoke the xhpcg executable.

The HPL-NVIDIA and HPL-AI-NVIDIA benchmarks use the same input format as the standard Netlib HPL benchmark. Please see the Netlib HPL benchmark for getting started with the HPL software concepts and best practices.
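For reference, HPL.dat follows the fixed Netlib layout shown below. The values here are illustrative placeholders only (problem size N, block size NB, and a 4x2 process grid for an 8-GPU node; P x Q must equal the number of MPI ranks); the sample files shipped in the container's sample-dat folders are the recommended starting points.

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
90112        Ns
1            # of NBs
288          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
4            Ps
2            Qs
16.0         threshold
1            # of panel fact
1            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)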
The HPCG-NVIDIA benchmark uses the same input format as the standard HPCG-Benchmark. Please see the HPCG-Benchmark for getting started with the HPCG software concepts and best practices.
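For reference, hpcg.dat follows the standard four-line HPCG layout: two header lines, the local grid dimensions (nx ny nz), and the target run time in seconds. The values below are illustrative placeholders; the sample files in the container are the recommended starting points.

HPCG benchmark input file
Sandia National Laboratories; University of Tennessee, Knoxville
256 256 256
1800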
The HPL-NVIDIA, HPL-AI-NVIDIA, and HPCG-NVIDIA benchmarks expect one GPU per MPI process. As such, set the number of MPI processes to match the number of available GPUs in the cluster.
The scripts hpl.sh and hpcg.sh can be invoked on a command line or through a Slurm batch script to launch the HPL-NVIDIA and HPL-AI-NVIDIA benchmarks, or the HPCG-NVIDIA benchmark, respectively.

The scripts hpl.sh and hpcg.sh accept the following parameters:
- --config: name of a configuration with preset options (dgx-a100), or path to a shell file
- --cpu-affinity: colon-separated list of CPU index ranges
- --cpu-cores-per-rank: number of threads per rank
- --cuda-compat: manually enable CUDA forward compatibility
- --dat: path to HPL.dat/hpcg.dat
- --gpu-affinity: colon-separated list of GPU indices
- --mem-affinity: colon-separated list of memory indices
- --ucx-affinity: colon-separated list of UCX devices
- --ucx-tls: UCX transport to use
- --xhpl-ai: use the HPL-AI-NVIDIA benchmark

A typical run would either use --config for a preset configuration, or a combination of --cpu-affinity, --cpu-cores-per-rank, --gpu-affinity, --mem-affinity, --ucx-affinity, and --ucx-tls. Options are processed from left to right, so, for instance, --config dgx-a100 --cpu-affinity ... would use the dgx-a100 preset configuration but override the preset CPU affinity with the specified value.
It is recommended to specify --cpu-cores-per-rank such that (number of MPI processes per node) * (cpu cores per rank) does not exceed the number of CPU cores on the node.
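For example, with 8 MPI processes per node on a 128-core node, --cpu-cores-per-rank should be at most 128/8 = 16. A quick sanity check on a given node (a sketch; note that nproc reports logical cores, which may include SMT threads):

# 8 ranks per node, as in the DGX A100 examples below; adjust for your system.
RANKS_PER_NODE=8
echo "max --cpu-cores-per-rank = $(( $(nproc) / RANKS_PER_NODE ))"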
It is also highly recommended to lock the GPU clocks before launching the benchmarks for best performance, either prior to loading the container, or inside the container if running in interactive mode.
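For example, with nvidia-smi (the clock values are device-dependent; the pair 1380,1410 is the one used in the DGX A100 Docker example later in this document):

# Lock the GPU clocks to a <min,max> MHz range (requires administrative privileges):
sudo nvidia-smi -lgc 1380,1410
# ... run the benchmark ...
# Reset the GPU clocks afterwards:
sudo nvidia-smi -rgc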
The next sections provide sample runs of the HPL-NVIDIA, HPL-AI-NVIDIA, and HPCG-NVIDIA benchmarks.
For a general guide on pulling and running containers, see Pulling A Container image and Running A Container in the NGC Container User Guide.
The examples below use Pyxis/enroot from NVIDIA to facilitate running the HPC-Benchmarks NGC containers. Note that an enroot .credentials file is necessary to use these NGC containers.
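The enroot credentials file (typically $HOME/.config/enroot/.credentials, in netrc format) needs an entry for nvcr.io; a sketch, where <NGC_API_KEY> is a placeholder for your NGC API key:

machine nvcr.io login $oauthtoken password <NGC_API_KEY>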
To copy and customize the sample slurm scripts and/or sample HPL.dat/hpcg.dat files from the containers, run the container in interactive mode, while mounting a folder outside the container, and copy the needed files, as follows:
CONT='nvcr.io#nvidia/hpc-benchmarks:21.4-hpl'
MOUNT="$PWD:/home_pwd"
srun -N 1 --cpu-bind=none --mpi=pmix \
--container-image="${CONT}" \
--container-mounts="${MOUNT}" \
--pty bash
Once inside the container, copy the needed files to /home_pwd.
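For example, to copy one of the sample HPL.dat files referenced later in this document (list the sample-dat folder first to see what is actually shipped):

# Inside the container's interactive shell:
ls /workspace/hpl-linux-x86_64/sample-dat/
cp /workspace/hpl-linux-x86_64/sample-dat/HPL-dgx-a100-16N.dat /home_pwd/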
HPL-NVIDIA run

Several sample slurm scripts and several sample HPL.dat files are available in the container at /workspace/hpl-linux-x86_64.
To run HPL-NVIDIA on a single DGX A100 node, using your custom HPL.dat file:
CONT='nvcr.io#nvidia/hpc-benchmarks:21.4-hpl'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"
srun -N 1 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
--container-image="${CONT}" \
--container-mounts="${MOUNT}" \
hpl.sh --config dgx-a100 --dat /my-dat-files/HPL.dat
To run HPL-NVIDIA on 16 DGX A100 nodes, using the provided sample HPL.dat files:
CONT='nvcr.io#nvidia/hpc-benchmarks:21.4-hpl'
srun -N 16 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
--container-image="${CONT}" \
hpl.sh --config dgx-a100 --dat /workspace/hpl-linux-x86_64/sample-dat/HPL-dgx-a100-16N.dat
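The same run can also be submitted through a Slurm batch script; a minimal sketch (site-specific directives such as partition, account, and time limits are omitted):

#!/bin/bash
#SBATCH -N 16
#SBATCH --ntasks-per-node=8
#SBATCH --job-name=hpl-dgx-a100

CONT='nvcr.io#nvidia/hpc-benchmarks:21.4-hpl'

srun --cpu-bind=none --mpi=pmix \
     --container-image="${CONT}" \
     hpl.sh --config dgx-a100 --dat /workspace/hpl-linux-x86_64/sample-dat/HPL-dgx-a100-16N.dat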
HPL-AI-NVIDIA run

Several sample slurm scripts and several sample HPL.dat files are available in the container at /workspace/hpl-ai-linux-x86_64.
To run HPL-AI-NVIDIA on a single DGX A100 node, using the provided sample HPL.dat files:
CONT='nvcr.io#nvidia/hpc-benchmarks:21.4-hpl'
srun -N 1 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
--container-image="${CONT}" \
hpl.sh --xhpl-ai --config dgx-a100 --dat /workspace/hpl-ai-linux-x86_64/sample-dat/HPL-dgx-a100-1N.dat
To run HPL-AI-NVIDIA on a 4-node cluster, each node with 4 A100 GPUs, using your custom HPL.dat file:
CONT='nvcr.io#nvidia/hpc-benchmarks:21.4-hpl'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"
srun -N 4 --ntasks-per-node=4 --cpu-bind=none --mpi=pmix \
--container-image="${CONT}" \
--container-mounts="${MOUNT}" \
hpl.sh --xhpl-ai --cpu-affinity 0:0:1:1 --cpu-cores-per-rank 4 --gpu-affinity 0:1:2:3 --dat /my-dat-files/HPL.dat
Pay special attention to CPU core affinity/binding, as it strongly affects the performance of the HPC benchmarks.
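The GPU-to-CPU/NUMA topology that these affinity lists should reflect can be inspected with nvidia-smi; the sketch below annotates the 4-rank example above (the --mem-affinity values are illustrative, not taken from a preset):

# Show how each GPU maps to CPU ranges and NUMA nodes (CPU Affinity / NUMA Affinity columns):
nvidia-smi topo -m

# Each colon-separated affinity list has one entry per MPI rank on the node.
# In the 4-rank example above:
#   --gpu-affinity 0:1:2:3   rank 0 -> GPU 0, rank 1 -> GPU 1, rank 2 -> GPU 2, rank 3 -> GPU 3
#   --cpu-affinity 0:0:1:1   ranks 0,1 -> CPU index range 0; ranks 2,3 -> CPU index range 1
#   --mem-affinity 0:0:1:1   would pin memory to the matching NUMA nodes (illustrative)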
HPCG-NVIDIA run

Several sample slurm scripts and several sample hpcg.dat files are available in the container at /workspace/hpcg-linux-x86_64.
To run HPCG-NVIDIA on a single DGX A100 node, using your custom hpcg.dat file:
CONT='nvcr.io/nvidia/hpc-benchmarks:21.4-hpcg'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"
srun -N 1 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
--container-image="${CONT}" \
--container-mounts="${MOUNT}" \
hpcg.sh --config dgx-a100 --dat /my-dat-files/hpcg.dat
To run HPCG-NVIDIA on 16 DGX A100 nodes, using the provided sample hpcg.dat files:
CONT='nvcr.io/nvidia/hpc-benchmarks:21.4-hpcg'
srun -N 16 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
--container-image="${CONT}" \
hpcg.sh --config dgx-a100 --dat /workspace/hpcg-linux-x86_64/sample-dat/hpcg-dgx-a100-16N.dat
The instructions below assume Singularity 3.4.1 or later.
Save the HPL-NVIDIA & HPL-AI-NVIDIA NGC container as a local Singularity image file:
$ singularity pull --docker-login hpc-benchmarks:21.4-hpl.sif docker://nvcr.io/nvidia/hpc-benchmarks:21.4-hpl
This command saves the container in the current directory as hpc-benchmarks:21.4-hpl.sif.
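To inspect or copy the sample slurm scripts and .dat files out of the image, an interactive shell can be opened first (a sketch; Singularity binds $HOME by default):

singularity shell --nv /path/to/hpc-benchmarks:21.4-hpl.sif
# Then, inside the container:
ls /workspace/hpl-linux-x86_64/sample-dat/
cp /workspace/hpl-linux-x86_64/sample-dat/HPL-dgx-a100-16N.dat ~/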
HPL-NVIDIA run

Several sample slurm scripts and several sample HPL.dat files are available in the container at /workspace/hpl-linux-x86_64.
To run HPL-NVIDIA on a single DGX A100 node, using your custom HPL.dat file:
CONT='/path/to/hpc-benchmarks:21.4-hpl.sif'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"
srun -N 1 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
singularity run --nv \
-B "${MOUNT}" "${CONT}" \
hpl.sh --config dgx-a100 --dat /my-dat-files/HPL.dat
To run HPL-NVIDIA on 16 DGX A100 nodes, using the provided sample HPL.dat files:
CONT='/path/to/hpc-benchmarks:21.4-hpl.sif'
srun -N 16 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
singularity run --nv \
"${CONT}" \
hpl.sh --config dgx-a100 --dat /workspace/hpl-linux-x86_64/sample-dat/HPL-dgx-a100-16N.dat
HPL-AI-NVIDIA run

Several sample slurm scripts and several sample HPL.dat files are available in the container at /workspace/hpl-ai-linux-x86_64.
To run HPL-AI-NVIDIA on a single DGX A100 node, using the provided sample HPL.dat files:
CONT='/path/to/hpc-benchmarks:21.4-hpl.sif'
srun -N 1 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
singularity run --nv \
"${CONT}" \
hpl.sh --xhpl-ai --config dgx-a100 --dat /workspace/hpl-ai-linux-x86_64/sample-dat/HPL-dgx-a100-1N.dat
To run HPL-AI-NVIDIA on a 4-node cluster, each node with 4 A100 GPUs, using your custom HPL.dat file:
CONT='/path/to/hpc-benchmarks:21.4-hpl.sif'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"
srun -N 4 --ntasks-per-node=4 --cpu-bind=none --mpi=pmix \
singularity run --nv \
-B "${MOUNT}" "${CONT}" \
hpl.sh --xhpl-ai --cpu-affinity 0:0:1:1 --cpu-cores-per-rank 4 --gpu-affinity 0:1:2:3 --dat /my-dat-files/HPL.dat
Pay special attention to CPU core affinity/binding, as it strongly affects the performance of the HPC benchmarks.
First, save the HPCG-NVIDIA NGC container as a local Singularity image file:
$ singularity pull --docker-login hpc-benchmarks:21.4-hpcg.sif docker://nvcr.io/nvidia/hpc-benchmarks:21.4-hpcg
This command saves the container in the current directory as hpc-benchmarks:21.4-hpcg.sif.
Second, customize the sample slurm scripts and sample hpcg.dat files available in the container at /workspace/hpcg-linux-x86_64, and run the benchmark as follows:
To run HPCG-NVIDIA on a single DGX A100 node, using your custom hpcg.dat file:
CONT='/path/to/hpc-benchmarks:21.4-hpcg.sif'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"
srun -N 1 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
singularity run --nv \
-B "${MOUNT}" "${CONT}" \
hpcg.sh --config dgx-a100 --dat /my-dat-files/hpcg.dat
To run HPCG-NVIDIA on 16 DGX A100 nodes, using the provided sample hpcg.dat files:
CONT='/path/to/hpc-benchmarks:21.4-hpcg.sif'
srun -N 16 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
singularity run --nv \
"${CONT}" \
hpcg.sh --config dgx-a100 --dat /workspace/hpcg-linux-x86_64/sample-dat/hpcg-dgx-a100-16N.dat
The examples below are for single-node runs with Docker; it is not recommended to use Docker for multi-node runs.
Download the HPL-NVIDIA & HPL-AI-NVIDIA NGC container as a local Docker image:
$ docker pull nvcr.io/nvidia/hpc-benchmarks:21.4-hpl
NOTE: you may want to add the --privileged flag to your docker command to avoid a "set_mempolicy" error.
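For instance, a --privileged variant of the single-node HPL command shown below (a sketch; paths are placeholders):

CONT='nvcr.io/nvidia/hpc-benchmarks:21.4-hpl'
MOUNT="/full-path/to/your/custom/dat-files:/my-dat-files"
# Same command as in the example below, with --privileged added to work around "set_mempolicy" errors:
docker run --privileged --gpus all -v ${MOUNT} \
    ${CONT} \
    mpirun --bind-to none -np 8 \
    hpl.sh --config dgx-a100 --dat /my-dat-files/HPL.dat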
HPL-NVIDIA run

Several sample HPL.dat files are available in the container at /workspace/hpl-linux-x86_64.
To run HPL-NVIDIA on a single DGX A100 node, using your custom HPL.dat file and pre-locking the device clocks:
CONT='nvcr.io/nvidia/hpc-benchmarks:21.4-hpl'
MOUNT="/full-path/to/your/custom/dat-files:/my-dat-files"
(sudo) nvidia-smi -lgc 1380,1410
docker run --gpus all -v ${MOUNT} \
${CONT} \
mpirun --bind-to none -np 8 \
hpl.sh --config dgx-a100 --dat /my-dat-files/HPL.dat
HPL-AI-NVIDIA run

Several sample HPL.dat files are available in the container at /workspace/hpl-ai-linux-x86_64.
To run HPL-AI-NVIDIA on a single DGX A100 node, using the provided sample HPL.dat files:
CONT='nvcr.io/nvidia/hpc-benchmarks:21.4-hpl'
docker run --gpus all \
${CONT} \
mpirun --bind-to none -np 8 \
hpl.sh --xhpl-ai --config dgx-a100 --dat /workspace/hpl-ai-linux-x86_64/sample-dat/HPL-dgx-a100-1N.dat
HPCG-NVIDIA run

To run HPCG-NVIDIA on a single DGX A100 node, using your custom hpcg.dat file:
CONT='nvcr.io/nvidia/hpc-benchmarks:21.4-hpcg'
MOUNT="/full-path/to/your/custom/dat-files:/my-dat-files"
docker run --gpus all -v ${MOUNT} \
${CONT} \
mpirun --bind-to none -np 8 \
hpcg.sh --config dgx-a100 --dat /my-dat-files/hpcg.dat
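Before launching any of the Docker runs above, it can be useful to confirm that the container sees all GPUs; a minimal check (assumes the NVIDIA container toolkit makes nvidia-smi available inside the container):

# Expect one line per GPU (8 on a DGX A100):
docker run --gpus all nvcr.io/nvidia/hpc-benchmarks:21.4-hpcg nvidia-smi -L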