
NVIDIA HPC-Benchmarks


Description

The NVIDIA HPC-Benchmarks collection provides three NVIDIA accelerated HPC benchmarks: HPL-NVIDIA, HPL-AI-NVIDIA, and HPCG-NVIDIA.

  • Publisher: NVIDIA
  • Latest Tag: 21.4-hpl
  • Modified: December 1, 2022
  • Compressed Size: 570.4 MB
  • Multinode Support: Yes
  • Multi-Arch Support: No


NVIDIA HPC-Benchmarks 21.4

The NVIDIA HPC-Benchmarks collection provides three benchmarks (HPL, HPL-AI, and HPCG) that are widely used in the HPC community, optimized for performance on NVIDIA-accelerated HPC systems.

NVIDIA's HPL and HPL-AI benchmarks are based on the Netlib HPL benchmark. They provide software packages to solve a (random) dense linear system on distributed-memory computers equipped with NVIDIA GPUs: HPL in double-precision (64-bit) arithmetic, and HPL-AI in mixed-precision arithmetic using Tensor Cores.

NVIDIA's HPCG benchmark accelerates the High Performance Conjugate Gradients (HPCG) Benchmark. HPCG is a software package that performs a fixed number of multigrid-preconditioned (using a symmetric Gauss-Seidel smoother) conjugate gradient (PCG) iterations using double-precision (64-bit) floating-point values.

Container packages

The NVIDIA HPC-Benchmarks NGC collection provides two container images: 21.4-hpl and 21.4-hpcg.

The 21.4-hpl container image is provided with the following packages embedded:

  • HPL-NVIDIA v1.0.0
  • HPL-AI-NVIDIA v2.0.0
  • cuBLAS v11.4.1
  • OpenMPI v4.0.5
  • UCX v1.10.0
  • MKL v2020.4-912

The 21.4-hpcg container image is provided with the following packages embedded:

  • HPCG-NVIDIA v1.0.0
  • OpenMPI v4.0.5
  • UCX v1.10.0

System Requirements

Before running the NVIDIA HPC-Benchmarks NGC containers, please ensure that your system meets the following requirements:

  • Ampere (sm80) NVIDIA GPU(s). The current container version targets clusters of DGX A100 nodes (previous GPU generations are expected to work but have not been verified or optimized with this container).
  • CUDA driver version >= 450.36
    • >= 418.39 with CUDA forward compatibility.
  • Container framework:
    • Pyxis/enroot from NVIDIA, or
    • Singularity version 3.4.1 or later, or
    • Docker 19.03 or later, which includes support for the --gpus option
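As a quick sanity check for the driver requirement above, a version comparison can be sketched with sort -V. The driver string below is illustrative; on a real system it would come from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`:

```shell
# Hypothetical driver-version check against the 450.36 minimum.
DRIVER="470.57.02"   # illustrative; query the real value with nvidia-smi
MIN="450.36"

# sort -V orders version strings numerically; if MIN sorts first,
# then DRIVER >= MIN and the requirement is met.
if [ "$(printf '%s\n' "$MIN" "$DRIVER" | sort -V | head -n1)" = "$MIN" ]; then
  DRIVER_OK=yes
else
  DRIVER_OK=no
fi
echo "driver ok: ${DRIVER_OK}"
```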

Containers folder structure

The 21.4-hpl container provides the HPL-NVIDIA and HPL-AI-NVIDIA benchmarks in the following folder structure:

  • hpl.sh script in folder /workspace to invoke the xhpl or xhpl_ai executables.

  • HPL-NVIDIA in folder /workspace/hpl-linux-x86_64 contains:

    • xhpl executable.
    • Sample slurm batch-job scripts.
    • Sample HPL.dat files.
    • README, RUNNING, and TUNING guides.
  • HPL-AI-NVIDIA in folder /workspace/hpl-ai-linux-x86_64 contains:

    • xhpl_ai executable.
    • Sample slurm batch-job scripts.
    • Sample HPL.dat files.
    • README, RUNNING, and TUNING guides.

The 21.4-hpcg container provides the HPCG-NVIDIA benchmark in the following folder structure:

  • HPCG-NVIDIA in folder /workspace/hpcg-linux-x86_64 contains:
    • xhpcg executable.
    • hpcg.sh script to invoke the xhpcg executable.
    • Sample slurm batch-job scripts.
    • Sample HPCG.dat files.
    • README, RUNNING, and TUNING guides.

Running the HPL-NVIDIA, HPL-AI-NVIDIA, and HPCG-NVIDIA Benchmarks

The HPL-NVIDIA and HPL-AI-NVIDIA benchmarks use the same input format as the standard Netlib HPL benchmark. Please see the Netlib HPL documentation to get started with the HPL software concepts and best practices.

The HPCG-NVIDIA benchmark uses the same input format as the standard HPCG-Benchmark. Please see the HPCG-Benchmark documentation to get started with the HPCG software concepts and best practices.

The HPL-NVIDIA, HPL-AI-NVIDIA, and HPCG-NVIDIA benchmarks expect one GPU per MPI process. As such, set the number of MPI processes to match the number of available GPUs in the cluster.
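With one rank per GPU, the srun task counts follow directly from the GPU layout; the numbers below are illustrative and match the 16-node DGX A100 examples later on this page:

```shell
# One MPI rank per GPU: derive srun task counts from the GPU counts.
NODES=16
GPUS_PER_NODE=8                          # e.g. a DGX A100 node
NTASKS_PER_NODE=${GPUS_PER_NODE}         # one rank per GPU
TOTAL_RANKS=$(( NODES * NTASKS_PER_NODE ))
echo "srun -N ${NODES} --ntasks-per-node=${NTASKS_PER_NODE}   # ${TOTAL_RANKS} ranks"
```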

The scripts hpl.sh and hpcg.sh can be invoked on a command line or through a slurm batch script to launch the HPL-NVIDIA and HPL-AI-NVIDIA benchmarks, or the HPCG-NVIDIA benchmark, respectively. Both scripts accept the following parameters:

  • --config name of config with preset options (dgx-a100), or path to a shell file
  • --cpu-affinity colon separated list of cpu index ranges
  • --cpu-cores-per-rank number of threads per rank
  • --cuda-compat manually enable CUDA forward compatibility
  • --dat path to HPL.dat/hpcg.dat
  • --gpu-affinity colon separated list of gpu indices
  • --mem-affinity colon separated list of memory indices
  • --ucx-affinity colon separated list of UCX devices
  • --ucx-tls UCX transport to use
  • --xhpl-ai use the HPL-AI-NVIDIA benchmark
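To illustrate the colon-separated affinity lists above: each MPI rank picks the field at its own index. The sketch below is a hypothetical stand-in for what the launch scripts do internally; at run time the rank index would come from e.g. $SLURM_PROCID or $OMPI_COMM_WORLD_RANK:

```shell
# Hypothetical sketch: rank i reads field i of a colon-separated affinity list.
GPU_AFFINITY="0:1:2:3"      # as passed via --gpu-affinity
RANK=2                      # illustrative; normally $SLURM_PROCID at run time

# cut fields are 1-based while ranks are 0-based, hence RANK + 1.
MY_GPU=$(echo "${GPU_AFFINITY}" | cut -d: -f$(( RANK + 1 )))
echo "rank ${RANK} -> GPU ${MY_GPU}"
```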

A typical run would either use --config for a preset configuration, or a combination of --cpu-affinity, --cpu-cores-per-rank, --gpu-affinity, --mem-affinity, --ucx-affinity, and --ucx-tls. Options are processed from left to right, so, for instance, --config dgx-a100 --cpu-affinity ... would use the dgx-a100 preset configuration but override the preset CPU affinity with the specified value.
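The left-to-right rule can be pictured as a simple argument loop in which a later flag overwrites a value an earlier preset chose. This is a minimal sketch of the override semantics, not the scripts' actual implementation:

```shell
# Minimal sketch of left-to-right option processing with override semantics.
set -- --config dgx-a100 --cpu-affinity 0-31:32-63   # example command line

CPU_AFFINITY=""
while [ $# -gt 0 ]; do
  case "$1" in
    --config)
      # A preset such as dgx-a100 would set its own defaults first...
      CPU_AFFINITY="preset-cpu-affinity"
      shift 2 ;;
    --cpu-affinity)
      # ...and a later explicit flag simply overwrites them.
      CPU_AFFINITY="$2"
      shift 2 ;;
    *) shift ;;
  esac
done
echo "effective CPU affinity: ${CPU_AFFINITY}"
```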

It is recommended to specify --cpu-cores-per-rank such that (number of MPI processes per node) * (CPU cores per rank) does not exceed the number of CPU cores on the node.
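That constraint gives a simple upper bound for --cpu-cores-per-rank; the numbers below are illustrative DGX A100 values (in practice the core count would come from nproc):

```shell
# Upper bound for --cpu-cores-per-rank: ranks-per-node * cores-per-rank
# must not exceed the CPU core count on the node.
TOTAL_CORES=128          # illustrative; e.g. a DGX A100 node (use nproc in practice)
RANKS_PER_NODE=8         # one rank per GPU
CORES_PER_RANK=$(( TOTAL_CORES / RANKS_PER_NODE ))
echo "--cpu-cores-per-rank ${CORES_PER_RANK}"
```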

It is also highly recommended to lock the GPU clocks prior to launching the benchmarks for best performance, either prior to loading the container, or inside the container if running in interactive mode.

The next sections provide sample runs of HPL-NVIDIA, HPL-AI-NVIDIA, and HPCG-NVIDIA benchmarks.

For a general guide on pulling and running containers, see Pulling A Container image and Running A Container in the NGC Container User Guide.

Running with Pyxis/Enroot

The examples below use Pyxis/enroot from NVIDIA to facilitate running HPC-Benchmarks NGC containers. Note that an enroot .credentials file is necessary to use these NGC containers.

To copy and customize the sample slurm scripts and/or the sample HPL.dat/hpcg.dat files from the containers, run the container in interactive mode while mounting a folder outside the container, then copy the needed files as follows:

CONT='nvcr.io#nvidia/hpc-benchmarks:21.4-hpl'
MOUNT="$PWD:/home_pwd"

srun -N 1 --cpu-bind=none --mpi=pmix \
     --container-image="${CONT}" \
     --container-mounts="${MOUNT}" \
     --pty bash

Once inside the container, copy the needed files to /home_pwd.

Examples of HPL-NVIDIA run

Several sample slurm scripts and sample HPL.dat files are available in the container at /workspace/hpl-linux-x86_64.

To run HPL-NVIDIA on a single DGX A100 node, using your custom HPL.dat file:

CONT='nvcr.io#nvidia/hpc-benchmarks:21.4-hpl'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"

srun -N 1 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
     --container-image="${CONT}" \
     --container-mounts="${MOUNT}" \
     hpl.sh --config dgx-a100 --dat /my-dat-files/HPL.dat

To run HPL-NVIDIA on 16 DGX A100 nodes, using provided sample HPL.dat files:

CONT='nvcr.io#nvidia/hpc-benchmarks:21.4-hpl'

srun -N 16 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
     --container-image="${CONT}" \
     hpl.sh --config dgx-a100 --dat /workspace/hpl-linux-x86_64/sample-dat/HPL-dgx-a100-16N.dat

Examples of HPL-AI-NVIDIA run

Several sample slurm scripts and sample HPL.dat files are available in the container at /workspace/hpl-ai-linux-x86_64.

To run HPL-AI-NVIDIA on a single DGX A100 node, using provided sample HPL.dat files:

CONT='nvcr.io#nvidia/hpc-benchmarks:21.4-hpl'

srun -N 1 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
     --container-image="${CONT}" \
     hpl.sh --xhpl-ai --config dgx-a100 --dat /workspace/hpl-ai-linux-x86_64/sample-dat/HPL-dgx-a100-1N.dat

To run HPL-AI-NVIDIA on a 4 node cluster, each node with 4 A100 GPUs, using your custom HPL.dat file:

CONT='nvcr.io#nvidia/hpc-benchmarks:21.4-hpl'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"

srun -N 4 --ntasks-per-node=4 --cpu-bind=none --mpi=pmix \
     --container-image="${CONT}" \
     --container-mounts="${MOUNT}" \
     hpl.sh --xhpl-ai --cpu-affinity 0:0:1:1 --cpu-cores-per-rank 4 --gpu-affinity 0:1:2:3 --dat /my-dat-files/HPL.dat

Pay special attention to CPU core affinity/binding, as it strongly affects the performance of the HPC benchmarks.

Examples of HPCG-NVIDIA run

Several sample slurm scripts and sample hpcg.dat files are available in the container at /workspace/hpcg-linux-x86_64.

To run HPCG-NVIDIA on a single DGX A100 node, using your custom hpcg.dat file:

CONT='nvcr.io/nvidia/hpc-benchmarks:21.4-hpcg'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"

srun -N 1 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
     --container-image="${CONT}" \
     --container-mounts="${MOUNT}" \
     hpcg.sh --config dgx-a100 --dat /my-dat-files/hpcg.dat

To run HPCG-NVIDIA on 16 DGX A100 nodes, using provided sample hpcg.dat files:

CONT='nvcr.io/nvidia/hpc-benchmarks:21.4-hpcg'

srun -N 16 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
     --container-image="${CONT}" \
     hpcg.sh --config dgx-a100 --dat /workspace/hpcg-linux-x86_64/sample-dat/hpcg-dgx-a100-16N.dat

Running with Singularity

The instructions below assume Singularity 3.4.1 or later.

Pull the image

Save the HPL-NVIDIA & HPL-AI-NVIDIA NGC container as a local Singularity image file:

$ singularity pull --docker-login hpc-benchmarks:21.4-hpl.sif docker://nvcr.io/nvidia/hpc-benchmarks:21.4-hpl

This command saves the container in the current directory as hpc-benchmarks:21.4-hpl.sif.

Examples of HPL-NVIDIA run

Several sample slurm scripts and sample HPL.dat files are available in the container at /workspace/hpl-linux-x86_64.

To run HPL-NVIDIA on a single DGX A100 node, using your custom HPL.dat file:

CONT='/path/to/hpc-benchmarks:21.4-hpl.sif'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"

srun -N 1 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
   singularity run --nv \
     -B "${MOUNT}" "${CONT}" \
     hpl.sh --config dgx-a100 --dat /my-dat-files/HPL.dat

To run HPL-NVIDIA on 16 DGX A100 nodes, using provided sample HPL.dat files:

CONT='/path/to/hpc-benchmarks:21.4-hpl.sif'

srun -N 16 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
   singularity run --nv \
     "${CONT}" \
     hpl.sh --config dgx-a100 --dat /workspace/hpl-linux-x86_64/sample-dat/HPL-dgx-a100-16N.dat

Examples of HPL-AI-NVIDIA run

Several sample slurm scripts and sample HPL.dat files are available in the container at /workspace/hpl-ai-linux-x86_64.

To run HPL-AI-NVIDIA on a single DGX A100 node, using provided sample HPL.dat files:

CONT='/path/to/hpc-benchmarks:21.4-hpl.sif'

srun -N 1 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
  singularity run --nv \
     "${CONT}" \
     hpl.sh --xhpl-ai --config dgx-a100 --dat /workspace/hpl-ai-linux-x86_64/sample-dat/HPL-dgx-a100-1N.dat

To run HPL-AI-NVIDIA on a 4 node cluster, each node with 4 A100 GPUs, using your custom HPL.dat file:

CONT='/path/to/hpc-benchmarks:21.4-hpl.sif'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"

srun -N 4 --ntasks-per-node=4 --cpu-bind=none --mpi=pmix \
  singularity run --nv \
     -B "${MOUNT}" "${CONT}" \
     hpl.sh --xhpl-ai --cpu-affinity 0:0:1:1 --cpu-cores-per-rank 4 --gpu-affinity 0:1:2:3 --dat /my-dat-files/HPL.dat

Pay special attention to CPU core affinity/binding, as it strongly affects the performance of the HPC benchmarks.

Examples of HPCG-NVIDIA run

First, save the HPCG-NVIDIA NGC container as a local Singularity image file:

$ singularity pull --docker-login hpc-benchmarks:21.4-hpcg.sif docker://nvcr.io/nvidia/hpc-benchmarks:21.4-hpcg

This command saves the container in the current directory as hpc-benchmarks:21.4-hpcg.sif.

Second, customize the sample slurm scripts and sample hpcg.dat files available in the container at /workspace/hpcg-linux-x86_64, then run the benchmark as follows.

To run HPCG-NVIDIA on a single DGX A100 node, using your custom hpcg.dat file:

CONT='/path/to/hpc-benchmarks:21.4-hpcg.sif'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"

srun -N 1 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
  singularity run --nv \
     -B "${MOUNT}" "${CONT}" \
     hpcg.sh --config dgx-a100 --dat /my-dat-files/hpcg.dat

To run HPCG-NVIDIA on 16 DGX A100 nodes, using provided sample hpcg.dat files:

CONT='/path/to/hpc-benchmarks:21.4-hpcg.sif'

srun -N 16 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
  singularity run --nv \
     "${CONT}" \
     hpcg.sh --config dgx-a100 --dat /workspace/hpcg-linux-x86_64/sample-dat/hpcg-dgx-a100-16N.dat

Running with Docker

The examples below are for single-node runs with Docker; using Docker for multi-node runs is not recommended.

Pull the image

Pull the HPL-NVIDIA & HPL-AI-NVIDIA NGC container image locally:

$ docker pull nvcr.io/nvidia/hpc-benchmarks:21.4-hpl

NOTE: you may want to add the --privileged flag to your docker command to avoid a "set_mempolicy" error.

Examples of HPL-NVIDIA run

Several sample HPL.dat files are available in the container at /workspace/hpl-linux-x86_64.

To run HPL-NVIDIA on a single DGX A100 node, using your custom HPL.dat file and pre-locking the device clocks:

CONT='nvcr.io/nvidia/hpc-benchmarks:21.4-hpl'
MOUNT="/full-path/to/your/custom/dat-files:/my-dat-files"

sudo nvidia-smi -lgc 1380,1410

docker run --gpus all -v ${MOUNT} \
     ${CONT} \
     mpirun --bind-to none -np 8 \
     hpl.sh --config dgx-a100 --dat /my-dat-files/HPL.dat

Examples of HPL-AI-NVIDIA run

Several sample HPL.dat files are available in the container at /workspace/hpl-ai-linux-x86_64.

To run HPL-AI-NVIDIA on a single DGX A100 node, using provided sample HPL.dat files:

CONT='nvcr.io/nvidia/hpc-benchmarks:21.4-hpl'

docker run --gpus all \
     ${CONT} \
     mpirun --bind-to none -np 8 \
     hpl.sh --xhpl-ai --config dgx-a100 --dat /workspace/hpl-ai-linux-x86_64/sample-dat/HPL-dgx-a100-1N.dat

Examples of HPCG-NVIDIA run

To run HPCG-NVIDIA on a single DGX A100 node, using your custom hpcg.dat file:

CONT='nvcr.io/nvidia/hpc-benchmarks:21.4-hpcg'
MOUNT="/full-path/to/your/custom/dat-files:/my-dat-files"

docker run --gpus all -v ${MOUNT} \
     ${CONT} \
     mpirun --bind-to none -np 8 \
     hpcg.sh --config dgx-a100 --dat /my-dat-files/hpcg.dat
