NVIDIA HPC-Benchmarks

Description: The NVIDIA HPC-Benchmarks collection provides four NVIDIA accelerated HPC benchmarks: HPL-NVIDIA, HPL-MxP-NVIDIA, HPCG-NVIDIA, and STREAM-NVIDIA.

Publisher: NVIDIA
Latest Tag: 23.10
Modified: December 2, 2023
Compressed Size: 4.12 GB
Multinode Support: Yes
Multi-Arch Support: Yes (Linux/amd64, Linux/arm64)

NVIDIA HPC-Benchmarks 23.10

The NVIDIA HPC-Benchmarks collection provides four benchmarks (HPL, HPL-MxP, HPCG and STREAM) widely used in the HPC community optimized for performance on NVIDIA accelerated HPC systems.

NVIDIA's HPL and HPL-MxP benchmarks provide software packages to solve a (random) dense linear system in double-precision (64-bit) arithmetic and in mixed-precision arithmetic using Tensor Cores, respectively, on distributed-memory computers equipped with NVIDIA GPUs. They are based on the Netlib HPL and HPL-MxP benchmarks.

NVIDIA's HPCG benchmark accelerates the High Performance Conjugate Gradients (HPCG) Benchmark. HPCG is a software package that performs a fixed number of multigrid preconditioned (using a symmetric Gauss-Seidel smoother) conjugate gradient (PCG) iterations using double precision (64 bit) floating point values.

NVIDIA's STREAM benchmark is a simple synthetic benchmark program that measures sustainable memory bandwidth. The NVIDIA HPC-Benchmarks container includes STREAM benchmarks optimized for the NVIDIA Ampere GPU architecture (sm80), the NVIDIA Hopper GPU architecture (sm90), and the NVIDIA Grace CPU.

Container packages

The NVIDIA HPC-Benchmarks collection provides the multi-platform (x86 and aarch64) container image hpc-benchmarks:23.10, which is based on the NVIDIA Optimized Frameworks 23.09 container images.

In addition to the contents of the NVIDIA Optimized Frameworks 23.09 container images, the hpc-benchmarks:23.10 container image includes the following embedded packages:

  • HPL-NVIDIA 23.10.0
  • HPL-MxP-NVIDIA 23.10.0
  • HPCG-NVIDIA 23.10.0
  • STREAM-NVIDIA 23.10.0
  • NVIDIA NVSHMEM 2.10.1
  • NVIDIA NCCL 2.16.5
  • Intel MKL 2020.4-912

The aarch64 container image contains a BETA release of the HPC benchmarks.

Prerequisites

Using the NVIDIA HPC-Benchmarks container requires the host system to have a supported NVIDIA GPU driver and container runtime installed. For supported versions, see the Framework Containers Support Matrix and the NVIDIA Container Toolkit documentation.

NVIDIA's HPL benchmark requires GDRCopy to be installed on the system. Please visit https://developer.nvidia.com/gdrcopy and https://github.com/NVIDIA/gdrcopy#build-and-installation for more information. In addition, be aware that GDRCopy requires an extra kernel-mode driver to be installed and loaded on the target machine.
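
As an illustrative sanity check (not an official verification procedure), you can confirm on the host that the GDRCopy kernel-mode driver is loaded before launching the benchmark:

# Check that the gdrdrv kernel module provided by GDRCopy is loaded on the host
lsmod | grep gdrdrv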

The NVIDIA HPC-Benchmarks Container supports NVIDIA Ampere GPU architecture (sm80) or NVIDIA Hopper GPU architecture (sm90). The current container version is aimed at clusters of DGX A100, DGX H100, NVIDIA Grace Hopper, and NVIDIA Grace CPU nodes (Previous GPU generations are not expected to work).

Containers folder structure

The hpc-benchmarks:23.10 container provides the HPL-NVIDIA, HPL-MxP-NVIDIA, HPCG-NVIDIA and STREAM-NVIDIA benchmarks in the following folder structure:

x86 container image:

  • hpl.sh script in the folder /workspace to invoke the xhpl executable.

  • hpl-mxp.sh script in the folder /workspace to invoke the xhpl-mxp executable.

  • hpcg.sh script in the folder /workspace to invoke the xhpcg executable.

  • stream-gpu-test.sh script in the folder /workspace to invoke the stream_test executable for NVIDIA H100 or A100 GPU.

  • HPL-NVIDIA in the folder /workspace/hpl-linux-x86_64 contains:

    • xhpl executable.
    • Samples of slurm batch-job scripts in sample-slurm directory.
    • Samples of input files in sample-dat directory.
    • README, RUNNING, and TUNING guides.
  • HPL-MxP-NVIDIA in the folder /workspace/hpl-mxp-linux-x86_64 contains:

    • xhpl_mxp executable.
    • Samples of slurm batch-job scripts in sample-slurm directory.
    • README, RUNNING, and TUNING guides.
  • HPCG-NVIDIA in the folder /workspace/hpcg-linux-x86_64 contains:

    • xhpcg executable.
    • Samples of slurm batch-job scripts in sample-slurm directory.
    • Sample input file in sample-dat directory.
    • README, RUNNING, and TUNING guides.
  • STREAM-NVIDIA in the folder /workspace/stream-gpu-linux-x86_64 contains:

    • stream_test executable. GPU STREAM benchmark with double precision elements.
    • stream_test_fp32 executable. GPU STREAM benchmark with single precision elements.

Running the HPL-NVIDIA, HPL-MxP-NVIDIA, HPCG-NVIDIA and STREAM-NVIDIA Benchmarks

The HPL-NVIDIA benchmark uses the same input format as the standard Netlib HPL benchmark. Please see the Netlib HPL benchmark for getting started with the HPL software concepts and best practices.

The HPCG-NVIDIA benchmark uses the same input format as the standard HPCG-Benchmark. Please see the HPCG-Benchmark for getting started with the HPCG software concepts and best practices.

The HPL-MxP-NVIDIA benchmark accepts the list of parameters to describe input tasks and set additional tuning settings. The description of parameters can be found in README and TUNING files.

The HPL-NVIDIA, HPL-MxP-NVIDIA, and HPCG-NVIDIA benchmarks with GPU support expect one GPU per MPI process. As such, set the number of MPI processes to match the number of available GPUs in the cluster.

x86 container image

The scripts hpl.sh and hpcg.sh can be invoked on a command line or through a slurm batch-script to launch the HPL-NVIDIA and HPCG-NVIDIA benchmarks, respectively. The scripts hpl.sh and hpcg.sh accept the following parameters:

  • --dat path to the HPL.dat input file

Optional parameters:

  • --gpu-affinity <string> colon-separated list of GPU indices
  • --cpu-affinity <string> colon-separated list of CPU index ranges
  • --mem-affinity <string> colon-separated list of memory indices
  • --ucx-affinity <string> colon-separated list of UCX devices
  • --ucx-tls <string> UCX transport to use
  • --exec-name <string> HPL executable file
  • --no-multinode enable flags for no-multinode (no-network) execution
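
As an illustration only, a single-node invocation of hpl.sh that combines these flags might look like the following sketch; the affinity strings are assumptions (DGX-H100-style values) and must be adapted to your node topology:

# Hypothetical single-node HPL-NVIDIA run on 8 GPUs with explicit affinities
./hpl.sh --dat /my-dat-files/HPL.dat \
         --gpu-affinity 0:1:2:3:4:5:6:7 \
         --cpu-affinity 0-13:14-27:28-41:42-55:56-69:70-83:84-97:98-111 \
         --mem-affinity 0:0:0:0:1:1:1:1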

In addition, as an alternative to an input file, the script hpcg.sh accepts the following parameters:

  • --nx specifies the local (to an MPI process) X dimensions of the problem
  • --ny specifies the local (to an MPI process) Y dimensions of the problem
  • --nz specifies the local (to an MPI process) Z dimensions of the problem
  • --rt specifies how long, in seconds, the timed portion of the benchmark should run
  • --b activates benchmarking mode to bypass CPU reference execution when set to one (--b 1)
  • --l2cmp activates compression in GPU L2 cache when set to one (--l2cmp 1)
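
For example, a parameter-driven invocation (with no .dat file) might look like the following sketch; the problem dimensions and run time mirror the sample values used later on this page:

# 256^3 local problem per MPI process, 2-second timed portion
./hpcg.sh --nx 256 --ny 256 --nz 256 --rt 2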

The script hpl-mxp.sh can be invoked on a command line or through a slurm batch script to launch the HPL-MxP-NVIDIA benchmark. The script hpl-mxp.sh requires the following parameters:

  • --gpu-affinity <string> colon-separated list of GPU indices
  • --nprow <int> number of rows in the processor grid
  • --npcol <int> number of columns in the processor grid
  • --nporder <string> "row" or "column" major layout of the processor grid
  • --n <int> size of the N-by-N matrix
  • --nb <int> the blocking constant (panel size)

The full list of accepted parameters can be found in the README and TUNING files.

Note:

  • CPU and memory affinities can improve the performance of the HPL-MxP-NVIDIA benchmark. Below are examples for DGX H100 and DGX A100 (see the combined sketch after this note):
    • DGX-H100: --mem-affinity 0:0:0:0:1:1:1:1 --cpu-affinity 0-13:14-27:28-41:42-55:56-69:70-83:84-97:98-111
    • DGX-A100: --mem-affinity 2:3:0:1:6:7:4:5 --cpu-affinity 32-47:48-63:0-15:16-31:96-111:112-127:64-79:80-95
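
Combining the required parameters with the affinity flags above, a hypothetical single-node DGX-H100 run of hpl-mxp.sh could look like the sketch below; the matrix size and blocking constant are illustrative and should be tuned per the TUNING guide:

# Hypothetical HPL-MxP-NVIDIA run on one DGX H100 (8 GPUs, 2x4 process grid)
./hpl-mxp.sh --n 380000 --nb 2048 --nprow 2 --npcol 4 --nporder row \
             --gpu-affinity 0:1:2:3:4:5:6:7 \
             --cpu-affinity 0-13:14-27:28-41:42-55:56-69:70-83:84-97:98-111 \
             --mem-affinity 0:0:0:0:1:1:1:1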

The script stream-gpu-test.sh can be invoked on a command line or through a slurm batch script to launch the STREAM-NVIDIA GPU benchmark. The script stream-gpu-test.sh accepts the following optional parameters:

  • --d <int> device number
  • --n <int> number of elements in the arrays
  • --dt fp32 enable fp32 stream test
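
For example, a minimal invocation might look like the sketch below; the device index and array length are illustrative assumptions:

# GPU STREAM on device 0 with an assumed array length; the second line runs the fp32 variant
./stream-gpu-test.sh --d 0 --n 100000000
./stream-gpu-test.sh --d 0 --n 100000000 --dt fp32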

aarch64 container image

The HPL-NVIDIA, HPCG-NVIDIA, HPL-MxP-NVIDIA, and STREAM-NVIDIA benchmarks for GPU can be run in the same way as their counterparts from the x86_64 container image (see the x86 container image section for details).

This section provides sample runs of the HPL-NVIDIA, HPL-MxP-NVIDIA, and HPCG-NVIDIA benchmarks for the NVIDIA Grace CPU.

The scripts hpl-aarch64.sh and hpcg-aarch64.sh can be invoked on a command line or through a slurm batch-script to launch the HPL-NVIDIA and HPCG-NVIDIA benchmarks for NVIDIA Grace CPU, respectively.

The scripts hpl-aarch64.sh and hpcg-aarch64.sh accept the following parameters:

  • --dat path to the HPL.dat input file

Optional parameters:

  • --cpu-affinity <string> colon-separated list of CPU index ranges
  • --mem-affinity <string> colon-separated list of memory indices
  • --ucx-affinity <string> colon-separated list of UCX devices
  • --ucx-tls <string> UCX transport to use
  • --exec-name <string> HPL executable file

In addition, as an alternative to an input file, the script hpcg-aarch64.sh accepts the following parameters:

  • --nx specifies the local (to an MPI process) X dimensions of the problem
  • --ny specifies the local (to an MPI process) Y dimensions of the problem
  • --nz specifies the local (to an MPI process) Z dimensions of the problem
  • --rt specifies how long, in seconds, the timed portion of the benchmark should run
  • --b activates benchmarking mode to bypass CPU reference execution when set to one (--b=1)
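
As a sketch, a parameter-driven HPCG run on a two-socket Grace node might look like the following; the problem dimensions, run time, and affinity values are illustrative assumptions borrowed from the Grace examples later on this page:

# 256^3 local problem per MPI process, 2-second timed portion, one rank pinned per 72-core socket
./hpcg-aarch64.sh --nx 256 --ny 256 --nz 256 --rt 2 --cpu-affinity 0-71:72-143 --mem-affinity 0:1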

The script hpl-mxp-aarch64.sh can be invoked on a command line or through a slurm batch script to launch the HPL-MxP-NVIDIA benchmark for NVIDIA Grace CPU. The script hpl-mxp-aarch64.sh requires the following parameters:

  • --nprow <int> number of rows in the processor grid
  • --npcol <int> number of columns in the processor grid
  • --nporder <string> "row" or "column" major layout of the processor grid
  • --n <int> size of the N-by-N matrix
  • --nb <int> the blocking constant (panel size)

The full list of accepted parameters can be found in the README and TUNING files.

The script stream-cpu-test.sh can be invoked on a command line or through a slurm batch script to launch the STREAM-NVIDIA benchmark on the NVIDIA Grace CPU. The script stream-cpu-test.sh accepts the following optional parameters:

  • --n <int> number of elements in the arrays
  • --t <int> number of threads
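
For example (the array length and thread count below are illustrative assumptions; choose a thread count that matches your Grace CPU core count):

# CPU STREAM run with an assumed array length, using 72 threads
./stream-cpu-test.sh --n 100000000 --t 72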

For a general guide on pulling and running containers, see Running A Container chapter in the NVIDIA Containers For Deep Learning Frameworks User’s Guide. For more information about using NGC, refer to the NGC Container User Guide.

Running with Pyxis/Enroot

The examples below use Pyxis/enroot from NVIDIA to facilitate running HPC-Benchmarks Containers. Note that an enroot .credentials file is necessary to use these NGC containers.

To copy and customize the sample slurm scripts and/or the sample HPL.dat/hpcg.dat files from the container, run the container in interactive mode while mounting a folder outside the container, then copy the needed files, as follows:

CONT='nvcr.io#nvidia/hpc-benchmarks:23.10'
MOUNT="$PWD:/home_pwd"

srun -N 1 --cpu-bind=none --mpi=pmix \
     --container-image="${CONT}" \
     --container-mounts="${MOUNT}" \
     --pty bash

Once inside the container, copy the needed files to /home_pwd.
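
For example, from the interactive shell inside the container you might copy a sample input file and the HPL sample slurm scripts into the mounted folder (paths follow the container layout described above):

# Copy a sample HPL input file and the HPL sample slurm scripts to the mounted host folder
cp /workspace/hpl-linux-x86_64/sample-dat/HPL-64GPUs.dat /home_pwd/
cp -r /workspace/hpl-linux-x86_64/sample-slurm /home_pwd/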

HPL-NVIDIA, HPL-MxP-NVIDIA, HPCG-NVIDIA and STREAM-NVIDIA Benchmarks with support of GPU

Examples of HPL-NVIDIA run

Several sample slurm scripts and several sample input files are available in the container at /workspace/hpl-linux-x86_64 or /workspace/hpl-linux-aarch64-gpu.

To run HPL-NVIDIA on a single node with 4 GPUs using your custom HPL.dat file:

CONT='nvcr.io#nvidia/hpc-benchmarks:23.10'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"

srun -N 1 --ntasks-per-node=4 --cpu-bind=none --mpi=pmix \
     --container-image="${CONT}" \
     --container-mounts="${MOUNT}" \
     ./hpl.sh --dat /my-dat-files/HPL.dat

To run HPL-NVIDIA on 16 nodes with 4 GPUs each (or 8 nodes with 8 GPUs each) using the provided sample HPL-64GPUs.dat file:

CONT='nvcr.io#nvidia/hpc-benchmarks:23.10'

srun -N 16 --ntasks-per-node=4 --cpu-bind=none --mpi=pmix \
     --container-image="${CONT}" \
     ./hpl.sh --dat /workspace/hpl-linux-x86_64/sample-dat/HPL-64GPUs.dat

CONT='nvcr.io#nvidia/hpc-benchmarks:23.10'

srun -N 8 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
     --container-image="${CONT}" \
     ./hpl.sh --dat /workspace/hpl-linux-x86_64/sample-dat/HPL-64GPUs.dat

Examples of HPL-MxP-NVIDIA run

Several sample slurm scripts are available in the container at /workspace/hpl-mxp-linux-x86_64 or /workspace/hpl-mxp-linux-aarch64-gpu.

To run HPL-MxP-NVIDIA on a single node with 8 GPUs:

CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'

srun -N 1 --ntasks-per-node=8 \
     --container-image="${CONT}" \
     ./hpl-mxp.sh --n 380000 --nb 2048 --nprow 2 --npcol 4 --nporder row --gpu-affinity 0:1:2:3:4:5:6:7

To run HPL-MxP-NVIDIA on 4 nodes, each with 4 GPUs:

CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'

srun -N 4 --ntasks-per-node=4 \
     --container-image="${CONT}" \
     ./hpl-mxp.sh --n 280000 --nb 2048 --nprow 4 --npcol 4 --nporder row --gpu-affinity 0:1:2:3

Pay special attention to CPU cores affinity/binding, as it greatly affects the performance of the HPL benchmarks.

Examples of HPCG-NVIDIA run

Several sample slurm scripts and a sample input file are available in the container at /workspace/hpcg-linux-x86_64 or /workspace/hpcg-linux-aarch64-gpu.

To run HPCG-NVIDIA on a single node with 8 GPUs using your custom hpcg.dat file:

CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"

srun -N 1 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
     --container-image="${CONT}" \
     --container-mounts="${MOUNT}" \
     ./hpcg.sh --dat /my-dat-files/hpcg.dat

To run HPCG-NVIDIA on 16 nodes with 4 GPUs each (or 8 nodes with 8 GPUs each) using script parameters:

CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"

srun -N 16 --ntasks-per-node=4 --cpu-bind=none --mpi=pmix \
     --container-image="${CONT}" \
     --container-mounts="${MOUNT}" \
     ./hpcg.sh --nx 256 --ny 256 --nz 256 --rt 2

HPL-NVIDIA, HPL-MxP-NVIDIA, HPCG-NVIDIA and STREAM-NVIDIA Benchmarks for NVIDIA Grace CPU

Examples of HPL-NVIDIA run

Several sample input files are available in the container at /workspace/hpl-linux-aarch64.

To run HPL-NVIDIA on two NVIDIA Grace CPU nodes using your custom HPL.dat file:

CONT='nvcr.io#nvidia/hpc-benchmarks:23.10'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"

srun -N 2 --ntasks-per-node=2 --cpu-bind=none --mpi=pmix \
     --container-image="${CONT}" \
     --container-mounts="${MOUNT}" \
     ./hpl-aarch64.sh --dat /my-dat-files/HPL.dat --cpu-affinity 0-71:72-143 --mem-affinity 0:1

where --cpu-affinity maps MPI ranks to cores on the local node and --mem-affinity maps MPI ranks to NUMA nodes on the local node.

Examples of HPL-MxP-NVIDIA run

To run HPL-MxP-NVIDIA on a single NVIDIA Grace Hopper x4 node:

CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'

srun -N 1 --ntasks-per-node=16 \
     --container-image="${CONT}" \
     ./hpl-mxp-aarch64.sh --n 380000 --nb 2048 --nprow 2 --npcol 4 --nporder row \
     --cpu-affinity 0-71:72-143:144-215:216-287 \
     --mem-affinity 0:1:2:3

where --cpu-affinity maps MPI ranks to cores on the local node and --mem-affinity maps MPI ranks to NUMA nodes on the local node.

Examples of HPCG-NVIDIA run

A sample input file is available in the container at /workspace/hpcg-linux-aarch64.

To run HPCG-NVIDIA on two NVIDIA Grace CPU nodes using your custom hpcg.dat file:

CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"

srun -N 2 --ntasks-per-node=4 --cpu-bind=none --mpi=pmix \
     --container-image="${CONT}" \
     --container-mounts="${MOUNT}" \
     ./hpcg.sh --dat /my-dat-files/hpcg.dat --cpu-affinity 0-35:36-71:72-107:108-143 --mem-affinity 0:0:1:1

Running with Singularity

The instructions below assume Singularity 3.4.1 or later.

Pull the image

Save the HPC-Benchmarks container as a local Singularity image file:

$ singularity pull --docker-login hpc-benchmarks:23.10.sif docker://nvcr.io/nvidia/hpc-benchmarks:23.10

This command saves the container in the current directory as hpc-benchmarks:23.10.sif.

HPL-NVIDIA, HPL-MxP-NVIDIA, HPCG-NVIDIA and STREAM-NVIDIA Benchmarks with support of GPU

Examples of HPL-NVIDIA run

Several sample slurm scripts and several sample input files are available in the container at /workspace/hpl-linux-x86_64 or /workspace/hpl-linux-aarch64-gpu.

To run HPL-NVIDIA on a single node with 4 GPUs using your custom HPL.dat file:

CONT='/path/to/hpc-benchmarks:23.10.sif'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"

srun -N 1 --ntasks-per-node=4 singularity run --nv \
     -B "${MOUNT}" "${CONT}" \
     ./hpl.sh --dat /my-dat-files/HPL.dat

To run HPL-NVIDIA on 16 nodes with 4 GPUs each (or 8 nodes with 8 GPUs each) using the provided sample HPL-64GPUs.dat file:

CONT='/path/to/hpc-benchmarks:23.10.sif'

srun -N 16 --ntasks-per-node=4 singularity run --nv \
     "${CONT}" \
     ./hpl.sh --dat /workspace/hpl-linux-x86_64/sample-dat/HPL-64GPUs.dat

CONT='/path/to/hpc-benchmarks:23.10.sif'

srun -N 8 --ntasks-per-node=8 singularity run --nv \
     "${CONT}" \
     ./hpl.sh --dat /workspace/hpl-linux-x86_64/sample-dat/HPL-64GPUs.dat

Examples of HPL-MxP-NVIDIA run

Several sample slurm scripts are available in the container at /workspace/hpl-mxp-linux-x86_64 or /workspace/hpl-mxp-linux-aarch64-gpu.

To run HPL-MxP-NVIDIA on a single node with 8 GPUs:

CONT='/path/to/hpc-benchmarks:23.10.sif'

srun -N 1 --ntasks-per-node=8 singularity run --nv \
     "${CONT}" \
     ./hpl-mxp.sh --n 380000 --nb 2048 --nprow 2 --npcol 4 --nporder row --gpu-affinity 0:1:2:3:4:5:6:7

To run HPL-MxP-NVIDIA on 4 nodes, each with 4 GPUs:

CONT='/path/to/hpc-benchmarks:23.10.sif'

srun -N 4 --ntasks-per-node=4 singularity run --nv \
     "${CONT}" \
     ./hpl-mxp.sh --n 280000 --nb 2048 --nprow 4 --npcol 4 --nporder row --gpu-affinity 0:1:2:3

Pay special attention to CPU cores affinity/binding, as it greatly affects the performance of the HPL benchmarks.

Examples of HPCG-NVIDIA run

Several sample slurm scripts and a sample input file are available in the container at /workspace/hpcg-linux-x86_64 or /workspace/hpcg-linux-aarch64-gpu.

To run HPCG-NVIDIA on a single node with 8 GPUs using your custom hpcg.dat file:

CONT='/path/to/hpc-benchmarks:23.10.sif'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"

srun -N 1 --ntasks-per-node=8 singularity run --nv \
     -B "${MOUNT}" "${CONT}" \
     ./hpcg.sh --dat /my-dat-files/hpcg.dat

To run HPCG-NVIDIA on 16 nodes with 4 GPUs each (or 8 nodes with 8 GPUs each) using script parameters:

CONT='/path/to/hpc-benchmarks:23.10.sif'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"

srun -N 16 --ntasks-per-node=4 singularity run --nv \
     -B "${MOUNT}" "${CONT}" \
     ./hpcg.sh --nx 256 --ny 256 --nz 256 --rt 2

HPL-NVIDIA, HPL-MxP-NVIDIA, HPCG-NVIDIA and STREAM-NVIDIA Benchmarks for NVIDIA Grace CPU

Examples of HPL-NVIDIA run

Several sample input files are available in the container at /workspace/hpl-linux-aarch64.

To run HPL-NVIDIA on two NVIDIA Grace CPU nodes using your custom HPL.dat file:

CONT='/path/to/hpc-benchmarks:23.10.sif'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"

srun -N 2 --ntasks-per-node=2 singularity run \
     -B "${MOUNT}" "${CONT}" \
     ./hpl-aarch64.sh --dat /my-dat-files/HPL.dat --cpu-affinity 0-71:72-143 --mem-affinity 0:1

where --cpu-affinity maps MPI ranks to cores on the local node and --mem-affinity maps MPI ranks to NUMA nodes on the local node.

Examples of HPL-MxP-NVIDIA run

To run HPL-MxP-NVIDIA on a single NVIDIA Grace Hopper x4 node:

CONT='/path/to/hpc-benchmarks:23.10.sif'

srun -N 1 --ntasks-per-node=16 singularity run \
     "${CONT}" \
     ./hpl-mxp-aarch64.sh --n 380000 --nb 2048 --nprow 2 --npcol 4 --nporder row \
     --cpu-affinity 0-71:72-143:144-215:216-287 \
     --mem-affinity 0:1:2:3

where --cpu-affinity maps MPI ranks to cores on the local node and --mem-affinity maps MPI ranks to NUMA nodes on the local node.

Examples of HPCG-NVIDIA run

A sample input file is available in the container at /workspace/hpcg-linux-aarch64.

To run HPCG-NVIDIA on two NVIDIA Grace CPU nodes using your custom hpcg.dat file:

CONT='/path/to/hpc-benchmarks:23.10.sif'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"

srun -N 2 --ntasks-per-node=4 singularity run \
     -B "${MOUNT}" "${CONT}" \
     ./hpcg.sh --dat /my-dat-files/hpcg.dat --cpu-affinity 0-35:36-71:72-107:108-143 --mem-affinity 0:0:1:1

Running with Docker

The examples below are for single-node runs with Docker. Docker is not recommended for multi-node runs.

Pull the image

Download the HPC-Benchmarks container as a local Docker image:

$ docker pull nvcr.io/nvidia/hpc-benchmarks:23.10

NOTE: you may need to add the --privileged flag to your docker command to avoid a "set_mempolicy" error.
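
For instance, the single-node HPL example below would become the following sketch if the flag is needed (only add --privileged when you actually hit the error):

# Same HPL-NVIDIA run as below, with --privileged added to avoid the set_mempolicy error
docker run --privileged --gpus all --shm-size=1g -v ${MOUNT} \
     ${CONT} \
     mpirun --bind-to none -np 4 \
     ./hpl.sh --dat /my-dat-files/HPL.dat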

HPL-NVIDIA, HPL-MxP-NVIDIA, HPCG-NVIDIA and STREAM-NVIDIA Benchmarks with support of GPU

Examples of HPL-NVIDIA run

To run HPL-NVIDIA on a single node with 4 GPUs using your custom HPL.dat file:

CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'
MOUNT="/full-path/to/your/custom/dat-files:/my-dat-files"

docker run --gpus all --shm-size=1g -v ${MOUNT} \
     ${CONT} \
     mpirun --bind-to none -np 4 \
     ./hpl.sh --dat /my-dat-files/HPL.dat
Examples of HPL-MxP-NVIDIA run

To run HPL-MxP-NVIDIA on a single node with 8 GPUs:

CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'

docker run --gpus all --shm-size=1g \
     ${CONT} \
     mpirun --bind-to none -np 8 \
     ./hpl-mxp.sh --n 380000 --nb 2048 --nprow 2 --npcol 4 --nporder row --gpu-affinity 0:1:2:3:4:5:6:7
Examples of HPCG-NVIDIA run

To run HPCG-NVIDIA on a single node with 8 GPUs using your custom hpcg.dat file:

CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'
MOUNT="/full-path/to/your/custom/dat-files:/my-dat-files"

docker run --gpus all --shm-size=1g -v ${MOUNT} \
     ${CONT} \
     mpirun --bind-to none -np 8 \
     ./hpcg.sh --dat /my-dat-files/hpcg.dat

HPL-NVIDIA, HPL-MxP-NVIDIA, HPCG-NVIDIA and STREAM-NVIDIA Benchmarks for NVIDIA Grace CPU

Examples of HPL-NVIDIA run

Several sample docker run scripts are available in the container at /workspace/hpl-linux-aarch64.

To run HPL-NVIDIA on a single NVIDIA Grace CPU node using your custom HPL.dat file:

CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"

docker run -v ${MOUNT} \
     "${CONT}" \
     mpirun --bind-to none -np 2 \
     ./hpl-aarch64.sh --dat /my-dat-files/HPL.dat --cpu-affinity 0-71:72-143 --mem-affinity 0:1

where --cpu-affinity maps MPI ranks to cores on the local node and --mem-affinity maps MPI ranks to NUMA nodes on the local node.

Examples of HPL-MxP-NVIDIA run

Several sample docker run scripts are available in the container at /workspace/hpl-mxp-linux-aarch64.

To run HPL-MxP-NVIDIA on a single NVIDIA Grace Hopper x4 node:

CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'

docker run \
     "${CONT}" \
     mpirun --bind-to none -np 4 \
     ./hpl-mxp-aarch64.sh --n 380000 --nb 2048 --nprow 2 --npcol 4 --nporder row \
     --cpu-affinity 0-71:72-143:144-215:216-287 --mem-affinity 0:1:2:3

where --cpu-affinity maps MPI ranks to cores on the local node and --mem-affinity maps MPI ranks to NUMA nodes on the local node.

Examples of HPCG-NVIDIA run

Several sample docker run scripts are available in the container at /workspace/hpcg-linux-aarch64.

To run HPCG-NVIDIA on a single NVIDIA Grace CPU node using your custom hpcg.dat file:

CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"

docker run -v ${MOUNT} \
     "${CONT}" \
     mpirun --bind-to none -np 4 \
     ./hpcg.sh --dat /my-dat-files/hpcg.dat --cpu-affinity 0-35:36-71:72-107:108-143 --mem-affinity 0:0:1:1
