The NVIDIA HPC-Benchmarks collection provides four benchmarks (HPL, HPL-MxP, HPCG, and STREAM) that are widely used in the HPC community, optimized for performance on NVIDIA accelerated HPC systems.
NVIDIA's HPL and HPL-MxP benchmarks are software packages that solve a (random) dense linear system on distributed-memory computers equipped with NVIDIA GPUs, in double-precision (64-bit) arithmetic and in mixed-precision arithmetic using Tensor Cores, respectively. They are based on the Netlib HPL benchmark and HPL-MxP benchmark.
NVIDIA's HPCG benchmark accelerates the High Performance Conjugate Gradients (HPCG) Benchmark. HPCG is a software package that performs a fixed number of multigrid-preconditioned (using a symmetric Gauss-Seidel smoother) conjugate gradient (PCG) iterations using double-precision (64-bit) floating-point values.
NVIDIA's STREAM benchmark is a simple synthetic benchmark program that measures sustainable memory bandwidth. The NVIDIA HPC-Benchmarks container includes STREAM benchmarks optimized for the NVIDIA Ampere GPU architecture (sm80), the NVIDIA Hopper GPU architecture (sm90), and the NVIDIA Grace CPU.
The NVIDIA HPC-Benchmarks collection provides the multi-platform (x86_64 and aarch64) container image hpc-benchmarks:23.10, which is based on the NVIDIA Optimized Frameworks 23.09 container images. In addition to the contents of the NVIDIA Optimized Frameworks 23.09 container images, the hpc-benchmarks:23.10 container image ships with the benchmark packages described below embedded.
The aarch64 container image contains a BETA release of the HPC benchmarks.
Using the NVIDIA HPC-Benchmarks container requires the host system to have the following installed:
For supported versions, see the Framework Containers Support Matrix and the NVIDIA Container Toolkit documentation.
NVIDIA's HPL benchmark requires GDRCopy to be installed on the system. Please visit https://developer.nvidia.com/gdrcopy and https://github.com/NVIDIA/gdrcopy#build-and-installation for more information. In addition, please be aware that GDRCopy requires an extra kernel-mode driver to be installed and loaded on the target machine.
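As a quick sanity check (a sketch, assuming GDRCopy was installed through its standard packages), you can verify on each compute node that the GDRCopy kernel-mode driver is loaded before launching HPL:

# Check whether the GDRCopy kernel-mode driver is loaded (no output means it is not loaded)
lsmod | grep gdrdrv
# If it is missing and the gdrcopy driver package is installed, load it (requires root)
sudo modprobe gdrdrv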
The NVIDIA HPC-Benchmarks Container supports NVIDIA Ampere GPU architecture (sm80) or NVIDIA Hopper GPU architecture (sm90). The current container version is aimed at clusters of DGX A100, DGX H100, NVIDIA Grace Hopper, and NVIDIA Grace CPU nodes (Previous GPU generations are not expected to work).
The hpc-benchmarks:23.10 container provides the HPL-NVIDIA, HPL-MxP-NVIDIA, HPCG-NVIDIA, and STREAM-NVIDIA benchmarks in the following folder structure:
- hpl.sh script in the folder /workspace to invoke the xhpl executable.
- hpl-mxp.sh script in the folder /workspace to invoke the xhpl-mxp executable.
- hpcg.sh script in the folder /workspace to invoke the xhpcg executable.
- stream-gpu-test.sh script in the folder /workspace to invoke the stream_test executable for NVIDIA H100 or A100 GPUs.
- HPL-NVIDIA, in the folder /workspace/hpl-linux-x86_64, contains:
  - xhpl executable
  - sample-slurm directory
  - sample-dat directory
- HPL-MxP-NVIDIA, in the folder /workspace/hpl-mxp-linux-x86_64, contains:
  - xhpl_mxp executable
  - sample-slurm directory
- HPCG-NVIDIA, in the folder /workspace/hpcg-linux-x86_64, contains:
  - xhpcg executable
  - sample-slurm directory
  - sample-dat directory
- STREAM-NVIDIA, in the folder /workspace/stream-gpu-linux-x86_64, contains:
  - stream_test executable: GPU STREAM benchmark with double-precision elements
  - stream_test_fp32 executable: GPU STREAM benchmark with single-precision elements
The HPL-NVIDIA benchmark uses the same input format as the standard Netlib HPL benchmark. Please see the Netlib HPL benchmark for getting started with the HPL software concepts and best practices.
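For reference, below is a minimal HPL.dat sketch in the standard Netlib layout; the problem size N, block size NB, and process grid P x Q are illustrative only (P x Q must equal the number of MPI processes), and the sample-dat files shipped in the container are the recommended starting point:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
92160        Ns
1            # of NBs
384          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2            Ps
2            Qs
16.0         threshold
1            # of panel fact
0            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
2            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
0            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)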
The HPCG-NVIDIA benchmark uses the same input format as the standard HPCG-Benchmark. Please see the HPCG-Benchmark for getting started with the HPCG software concepts and best practices.
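Similarly, a minimal hpcg.dat sketch in the standard HPCG-Benchmark layout is shown below; the local grid dimensions (nx ny nz) on the third line and the run time in seconds on the fourth line are illustrative only:

HPCG benchmark input file
Sandia National Laboratories; University of Tennessee, Knoxville
256 256 256
1800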
The HPL-MxP-NVIDIA benchmark accepts a list of parameters to describe the input task and to set additional tuning options. The description of the parameters can be found in the README and TUNING files.
The HPL-NVIDIA, HPL-MxP-NVIDIA, and HPCG-NVIDIA benchmarks with GPU support expect one GPU per MPI process. As such, set the number of MPI processes to match the number of available GPUs in the cluster.
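For example (an illustrative sketch, assuming a custom HPL.dat is mounted at /my-dat-files as in the examples further below), a run on two nodes with four GPUs each should launch 2 x 4 = 8 MPI processes:

# 2 nodes x 4 tasks per node = 8 MPI processes, one per GPU.
# For HPL, P x Q in HPL.dat must also equal 8 (e.g., P=2, Q=4).
CONT='nvcr.io#nvidia/hpc-benchmarks:23.10'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"
srun -N 2 --ntasks-per-node=4 --cpu-bind=none --mpi=pmix \
--container-image="${CONT}" \
--container-mounts="${MOUNT}" \
./hpl.sh --dat /my-dat-files/HPL.dat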
The scripts hpl.sh and hpcg.sh can be invoked on a command line or through a slurm batch script to launch the HPL-NVIDIA and HPCG-NVIDIA benchmarks, respectively. The scripts hpl.sh and hpcg.sh accept the following parameters:
- --dat : path to HPL.dat
Optional parameters:
- --gpu-affinity <string> : colon-separated list of GPU indices
- --cpu-affinity <string> : colon-separated list of CPU index ranges
- --mem-affinity <string> : colon-separated list of memory indices
- --ucx-affinity <string> : colon-separated list of UCX devices
- --ucx-tls <string> : UCX transport to use
- --exec-name <string> : HPL executable file
- --no-multinode : enable flags for no-multinode (no-network) execution
In addition, as an alternative to an input file, the script hpcg.sh accepts the following parameters:
- --nx : specifies the local (to an MPI process) X dimension of the problem
- --ny : specifies the local (to an MPI process) Y dimension of the problem
- --nz : specifies the local (to an MPI process) Z dimension of the problem
- --rt : specifies how many seconds the timed portion of the benchmark should run
- --b : activates benchmarking mode to bypass the CPU reference execution when set to one (--b 1)
- --l2cmp : activates compression in the GPU L2 cache when set to one (--l2cmp 1)
The script hpl-mxp.sh can be invoked on a command line or through a slurm batch script to launch the HPL-MxP-NVIDIA benchmark. The script hpl-mxp.sh requires the following parameters:
- --gpu-affinity <string> : colon-separated list of GPU indices
- --nprow <int> : number of rows in the processor grid
- --npcol <int> : number of columns in the processor grid
- --nporder <string> : "row" or "column" major layout of the processor grid
- --n <int> : size of the N-by-N matrix
- --nb <int> : the blocking constant (panel size)
The full list of accepted parameters can be found in the README and TUNING files.
Note: CPU and memory affinity settings greatly affect the performance of the HPL-MxP-NVIDIA benchmark. Example affinity settings for DGX H100 and DGX A100 are shown below.
DGX H100 (112 cores, 2 NUMA nodes):
--mem-affinity 0:0:0:0:1:1:1:1
--cpu-affinity 0-13:14-27:28-41:42-55:56-69:70-83:84-97:98-111
DGX A100 (128 cores, 8 NUMA nodes):
--mem-affinity 2:3:0:1:6:7:4:5
--cpu-affinity 32-47:48-63:0-15:16-31:96-111:112-127:64-79:80-95
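For example (an illustrative sketch for a single DGX H100 node with 8 GPUs; the matrix size and block size are illustrative), the affinity flags above combine with the required parameters as follows:

./hpl-mxp.sh --n 380000 --nb 2048 --nprow 2 --npcol 4 --nporder row \
--gpu-affinity 0:1:2:3:4:5:6:7 \
--cpu-affinity 0-13:14-27:28-41:42-55:56-69:70-83:84-97:98-111 \
--mem-affinity 0:0:0:0:1:1:1:1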
The script stream-gpu-test.sh can be invoked on a command line or through a slurm batch script to launch the STREAM-NVIDIA benchmark. The script stream-gpu-test.sh accepts the following optional parameters:
- --d <int> : device number
- --n <int> : number of elements in the arrays
- --dt fp32 : enable the FP32 STREAM test
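As an illustration (a sketch; the array size is arbitrary), the script can be invoked inside the container as follows:

# GPU STREAM with double-precision elements on device 0
./stream-gpu-test.sh --d 0 --n 134217728
# FP32 variant on the same device
./stream-gpu-test.sh --d 0 --n 134217728 --dt fp32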
The HPL-NVIDIA, HPCG-NVIDIA, HPL-MxP-NVIDIA, and STREAM-NVIDIA benchmarks for GPU can be run in the same way as their counterparts from the x86_64 container image (see details in the x86_64 container image section).
This section provides sample runs of the HPL-NVIDIA, HPL-MxP-NVIDIA, and HPCG-NVIDIA benchmarks for the NVIDIA Grace CPU.
The scripts hpl-aarch64.sh and hpcg-aarch64.sh can be invoked on a command line or through a slurm batch script to launch the HPL-NVIDIA and HPCG-NVIDIA benchmarks for the NVIDIA Grace CPU, respectively. The scripts hpl-aarch64.sh and hpcg-aarch64.sh accept the following parameters:
- --dat : path to HPL.dat
Optional parameters:
- --cpu-affinity <string> : colon-separated list of CPU index ranges
- --mem-affinity <string> : colon-separated list of memory indices
- --ucx-affinity <string> : colon-separated list of UCX devices
- --ucx-tls <string> : UCX transport to use
- --exec-name <string> : HPL executable file
In addition, as an alternative to an input file, the script hpcg-aarch64.sh accepts the following parameters:
- --nx : specifies the local (to an MPI process) X dimension of the problem
- --ny : specifies the local (to an MPI process) Y dimension of the problem
- --nz : specifies the local (to an MPI process) Z dimension of the problem
- --rt : specifies how many seconds the timed portion of the benchmark should run
- --b : activates benchmarking mode to bypass the CPU reference execution when set to one (--b=1)
The script hpl-mxp-aarch64.sh can be invoked on a command line or through a slurm batch script to launch the HPL-MxP-NVIDIA benchmark for the NVIDIA Grace CPU. The script hpl-mxp-aarch64.sh requires the following parameters:
- --nprow <int> : number of rows in the processor grid
- --npcol <int> : number of columns in the processor grid
- --nporder <string> : "row" or "column" major layout of the processor grid
- --n <int> : size of the N-by-N matrix
- --nb <int> : the blocking constant (panel size)
The full list of accepted parameters can be found in the README and TUNING files.
The script stream-cpu-test.sh can be invoked on a command line or through a slurm batch script to launch the STREAM-NVIDIA benchmark. The script stream-cpu-test.sh accepts the following optional parameters:
- --n <int> : number of elements in the arrays
- --t <int> : number of threads
For a general guide on pulling and running containers, see the Running A Container chapter in the NVIDIA Containers For Deep Learning Frameworks User’s Guide. For more information about using NGC, refer to the NGC Container User Guide.
The examples below use Pyxis/enroot from NVIDIA to facilitate running HPC-Benchmarks containers. Note that an enroot .credentials file is necessary to use these NGC containers.
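A minimal sketch of such a credentials file (by default ${ENROOT_CONFIG_PATH:-$HOME/.config/enroot}/.credentials, in netrc format) is shown below; <NGC_API_KEY> is a placeholder for your own NGC API key:

machine nvcr.io login $oauthtoken password <NGC_API_KEY>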
To copy and customize the sample slurm scripts and/or sample HPL.dat/hpcg.dat files from the containers, run the container in interactive mode, while mounting a folder outside the container, and copy the needed files, as follows:
CONT='nvcr.io#nvidia/hpc-benchmarks:23.10'
MOUNT="$PWD:/home_pwd"
srun -N 1 --cpu-bind=none --mpi=pmix \
--container-image="${CONT}" \
--container-mounts="${MOUNT}" \
--pty bash
Once inside the container, copy the needed files to /home_pwd.
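For example, from the shell inside the container:

# Copy the HPL sample .dat files and slurm scripts to the mounted host folder
cp -r /workspace/hpl-linux-x86_64/sample-dat /home_pwd/
cp -r /workspace/hpl-linux-x86_64/sample-slurm /home_pwd/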
HPL-NVIDIA run
Several sample slurm scripts and several sample input files are available in the container at /workspace/hpl-linux-x86_64 or /workspace/hpl-linux-aarch64-gpu.
To run HPL-NVIDIA on a single node with 4 GPUs using your custom HPL.dat file:
CONT='nvcr.io#nvidia/hpc-benchmarks:23.10'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"
srun -N 1 --ntasks-per-node=4 --cpu-bind=none --mpi=pmix \
--container-image="${CONT}" \
--container-mounts="${MOUNT}" \
./hpl.sh --dat /my-dat-files/HPL.dat
To run HPL-NVIDIA on 16 nodes with 4 GPUs each (or 8 nodes with 8 GPUs each) using the provided sample HPL-64GPUs.dat file:
CONT='nvcr.io#nvidia/hpc-benchmarks:23.10'
srun -N 16 --ntasks-per-node=4 --cpu-bind=none --mpi=pmix \
--container-image="${CONT}" \
./hpl.sh --dat /workspace/hpl-linux-x86_64/sample-dat/HPL-64GPUs.dat
CONT='nvcr.io#nvidia/hpc-benchmarks:23.10'
srun -N 8 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
--container-image="${CONT}" \
./hpl.sh --dat /workspace/hpl-linux-x86_64/sample-dat/HPL-64GPUs.dat
HPL-MxP-NVIDIA run
Several sample slurm scripts are available in the container at /workspace/hpl-mxp-linux-x86_64 or /workspace/hpl-mxp-linux-aarch64-gpu.
To run HPL-MxP-NVIDIA on a single node with 8 GPUs:
CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'
srun -N 1 --ntasks-per-node=8 \
--container-image="${CONT}" \
./hpl-mxp.sh --n 380000 --nb 2048 --nprow 2 --npcol 4 --nporder row --gpu-affinity 0:1:2:3:4:5:6:7
To run HPL-MxP-NVIDIA on 4 nodes, each with 4 GPUs:
CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'
srun -N 4 --ntasks-per-node=4 \
--container-image="${CONT}" \
./hpl-mxp.sh --n 280000 --nb 2048 --nprow 4 --npcol 4 --nporder row --gpu-affinity 0:1:2:3
Pay special attention to CPU cores affinity/binding, as it greatly affects the performance of the HPL benchmarks.
HPCG-NVIDIA run
Several sample slurm scripts and a sample input file are available in the container at /workspace/hpcg-linux-x86_64 or /workspace/hpcg-linux-aarch64-gpu.
To run HPCG-NVIDIA on a single node with 8 GPUs (one MPI process per GPU) using your custom hpcg.dat file:
CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"
srun -N 1 --ntasks-per-node=8 --cpu-bind=none --mpi=pmix \
--container-image="${CONT}" \
--container-mounts="${MOUNT}" \
./hpcg.sh --dat /my-dat-files/hpcg.dat
To run HPCG-NVIDIA on 16 nodes with 4 GPUs each (or 8 nodes with 8 GPUs each) using script parameters:
CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"
srun -N 16 --ntasks-per-node=4 --cpu-bind=none --mpi=pmix \
--container-image="${CONT}" \
--container-mounts="${MOUNT}" \
./hpcg.sh --nx 256 --ny 256 --nz 256 --rt 2
HPL-NVIDIA run
Several sample input files are available in the container at /workspace/hpl-linux-aarch64.
To run HPL-NVIDIA on two nodes of NVIDIA Grace CPU using your custom HPL.dat file:
CONT='nvcr.io#nvidia/hpc-benchmarks:23.10'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"
srun -N 2 --ntasks-per-node=2 --cpu-bind=none --mpi=pmix \
--container-image="${CONT}" \
--container-mounts="${MOUNT}" \
./hpl-aarch64.sh --dat /my-dat-files/HPL.dat --cpu-affinity 0-71:72-143 --mem-affinity 0:1
where --cpu-affinity maps processes to cores on the local node and --mem-affinity maps processes to NUMA nodes on the local node.
HPL-MxP-NVIDIA run
To run HPL-MxP-NVIDIA on a single node of NVIDIA Grace Hopper x4:
CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'
srun -N 1 --ntasks-per-node=16 \
--container-image="${CONT}" \
./hpl-mxp-aarch64.sh --n 380000 --nb 2048 --nprow 2 --npcol 4 --nporder row \
--cpu-affinity 0-71:72-143:144-215:216-287 \
--mem-affinity 0:1:2:3
where --cpu-affinity maps processes to cores on the local node and --mem-affinity maps processes to NUMA nodes on the local node.
HPCG-NVIDIA run
A sample input file is available in the container at /workspace/hpcg-linux-aarch64.
To run HPCG-NVIDIA on two nodes of NVIDIA Grace CPU using your custom hpcg.dat file:
CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"
srun -N 2 --ntasks-per-node=4 --cpu-bind=none --mpi=pmix \
--container-image="${CONT}" \
--container-mounts="${MOUNT}" \
./hpcg.sh --dat /my-dat-files/hpcg.dat --cpu-affinity 0-35:36-71:72-107:108-143 --mem-affinity 0:0:1:1
The instructions below assume Singularity 3.4.1 or later.
Save the HPC-Benchmark container as a local Singularity image file:
$ singularity pull --docker-login hpc-benchmarks:23.10.sif docker://nvcr.io/nvidia/hpc-benchmarks:23.10
This command saves the container in the current directory as hpc-benchmarks:23.10.sif.
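To customize the sample .dat files when using Singularity, one option (a sketch; Singularity binds the current working directory by default) is to copy them out of the image first:

# Copy the HPL sample .dat files from the image into the current host directory
CONT='/path/to/hpc-benchmarks:23.10.sif'
singularity exec "${CONT}" cp -r /workspace/hpl-linux-x86_64/sample-dat ./sample-dat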
HPL-NVIDIA run
Several sample slurm scripts and several sample input files are available in the container at /workspace/hpl-linux-x86_64 or /workspace/hpl-linux-aarch64-gpu.
To run HPL-NVIDIA on a single node with 4 GPUs using your custom HPL.dat file:
CONT='/path/to/hpc-benchmarks:23.10.sif'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"
srun -N 1 --ntasks-per-node=4 singularity run --nv \
-B "${MOUNT}" "${CONT}" \
./hpl.sh --dat /my-dat-files/HPL.dat
To run HPL-NVIDIA on 16 nodes with 4 GPUs each (or 8 nodes with 8 GPUs each) using the provided sample HPL-64GPUs.dat file:
CONT='/path/to/hpc-benchmarks:23.10.sif'
srun -N 16 --ntasks-per-node=4 singularity run --nv \
"${CONT}" \
./hpl.sh --dat /workspace/hpl-linux-x86_64/sample-dat/HPL-64GPUs.dat
CONT='/path/to/hpc-benchmarks:23.10.sif'
srun -N 8 --ntasks-per-node=8 singularity run --nv \
"${CONT}" \
./hpl.sh --dat /workspace/hpl-linux-x86_64/sample-dat/HPL-64GPUs.dat
HPL-MxP-NVIDIA run
Several sample slurm scripts are available in the container at /workspace/hpl-mxp-linux-x86_64 or /workspace/hpl-mxp-linux-aarch64-gpu.
To run HPL-MxP-NVIDIA on a single node with 8 GPUs:
CONT='/path/to/hpc-benchmarks:23.10.sif'
srun -N 1 --ntasks-per-node=8 singularity run --nv \
"${CONT}" \
./hpl-mxp.sh --n 380000 --nb 2048 --nprow 2 --npcol 4 --nporder row --gpu-affinity 0:1:2:3:4:5:6:7
To run HPL-MxP-NVIDIA on 4 nodes, each with 4 GPUs:
CONT='/path/to/hpc-benchmarks:23.10.sif'
srun -N 4 --ntasks-per-node=4 singularity run --nv \
"${CONT}" \
./hpl-mxp.sh --n 280000 --nb 2048 --nprow 4 --npcol 4 --nporder row --gpu-affinity 0:1:2:3
Pay special attention to CPU cores affinity/binding, as it greatly affects the performance of the HPL benchmarks.
HPCG-NVIDIA run
Several sample slurm scripts and a sample input file are available in the container at /workspace/hpcg-linux-x86_64 or /workspace/hpcg-linux-aarch64-gpu.
To run HPCG-NVIDIA on a single node with 8 GPUs (one MPI process per GPU) using your custom hpcg.dat file:
CONT='/path/to/hpc-benchmarks:23.10.sif'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"
srun -N 1 --ntasks-per-node=8 singularity run --nv \
-B "${MOUNT}" "${CONT}" \
./hpcg.sh --dat /my-dat-files/hpcg.dat
To run HPCG-NVIDIA on 16 nodes with 4 GPUs each (or 8 nodes with 8 GPUs each) using script parameters:
CONT='/path/to/hpc-benchmarks:23.10.sif'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"
srun -N 16 --ntasks-per-node=4 singularity run --nv \
-B "${MOUNT}" "${CONT}" \
./hpcg.sh --nx 256 --ny 256 --nz 256 --rt 2
HPL-NVIDIA run
Several sample input files are available in the container at /workspace/hpl-linux-aarch64.
To run HPL-NVIDIA on two nodes of NVIDIA Grace CPU using your custom HPL.dat file:
CONT='/path/to/hpc-benchmarks:23.10.sif'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"
srun -N 2 --ntasks-per-node=2 singularity run \
-B "${MOUNT}" "${CONT}" \
./hpl-aarch64.sh --dat /my-dat-files/HPL.dat --cpu-affinity 0-71:72-143 --mem-affinity 0:1
where --cpu-affinity maps processes to cores on the local node and --mem-affinity maps processes to NUMA nodes on the local node.
HPL-MxP-NVIDIA run
To run HPL-MxP-NVIDIA on a single node of NVIDIA Grace Hopper x4:
CONT='/path/to/hpc-benchmarks:23.10.sif'
srun -N 1 --ntasks-per-node=16 singularity run \
"${CONT}" \
./hpl-mxp-aarch64.sh --n 380000 --nb 2048 --nprow 2 --npcol 4 --nporder row \
--cpu-affinity 0-71:72-143:144-215:216-287 \
--mem-affinity 0:1:2:3
where --cpu-affinity maps processes to cores on the local node and --mem-affinity maps processes to NUMA nodes on the local node.
HPCG-NVIDIA run
A sample input file is available in the container at /workspace/hpcg-linux-aarch64.
To run HPCG-NVIDIA on two nodes of NVIDIA Grace CPU using your custom hpcg.dat file:
CONT='/path/to/hpc-benchmarks:23.10.sif'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"
srun -N 2 --ntasks-per-node=4 singularity run \
-B "${MOUNT}" "${CONT}" \
./hpcg.sh --dat /my-dat-files/hpcg.dat --cpu-affinity 0-35:36-71:72-107:108-143 --mem-affinity 0:0:1:1
The examples below are for single-node runs with Docker. Using Docker for multi-node runs is not recommended.
Download the HPC-Benchmarks container as a local Docker image:
$ docker pull nvcr.io/nvidia/hpc-benchmarks:23.10
NOTE: you may want to add the --privileged flag to your docker command to avoid a “set_mempolicy” error.
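For example (a sketch; CONT and MOUNT as defined in the HPL example below, with only the --privileged flag added):

docker run --privileged --gpus all --shm-size=1g -v ${MOUNT} \
${CONT} \
mpirun --bind-to none -np 4 \
./hpl.sh --dat /my-dat-files/HPL.dat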
HPL-NVIDIA run
To run HPL-NVIDIA on a single node with 4 GPUs using your custom HPL.dat file:
CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'
MOUNT="/full-path/to/your/custom/dat-files:/my-dat-files"
docker run --gpus all --shm-size=1g -v ${MOUNT} \
${CONT} \
mpirun --bind-to none -np 4 \
./hpl.sh --dat /my-dat-files/HPL.dat
HPL-MxP-NVIDIA run
To run HPL-MxP-NVIDIA on a single node with 8 GPUs:
CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'
docker run --gpus all --shm-size=1g \
${CONT} \
mpirun --bind-to none -np 8 \
./hpl-mxp.sh --n 380000 --nb 2048 --nprow 2 --npcol 4 --nporder row --gpu-affinity 0:1:2:3:4:5:6:7
HPCG-NVIDIA run
To run HPCG-NVIDIA on a single node with 8 GPUs (one MPI process per GPU) using your custom hpcg.dat file:
CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'
MOUNT="/full-path/to/your/custom/dat-files:/my-dat-files"
docker run --gpus all --shm-size=1g -v ${MOUNT} \
${CONT} \
mpirun --bind-to none -np 8 \
./hpcg.sh --dat /my-dat-files/hpcg.dat
HPL-NVIDIA run
Several sample docker run scripts are available in the container at /workspace/hpl-linux-aarch64.
To run HPL-NVIDIA on a single NVIDIA Grace CPU node using your custom HPL.dat file:
CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"
docker run -v ${MOUNT} \
"${CONT}" \
mpirun --bind-to none -np 2 \
./hpl-aarch64.sh --dat /my-dat-files/HPL.dat --cpu-affinity 0-71:72-143 --mem-affinity 0:1
where --cpu-affinity maps processes to cores on the local node and --mem-affinity maps processes to NUMA nodes on the local node.
HPL-MxP-NVIDIA run
Several sample docker run scripts are available in the container at /workspace/hpl-mxp-linux-aarch64.
To run HPL-MxP-NVIDIA on a single node of NVIDIA Grace Hopper x4:
CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'
docker run \
"${CONT}" \
mpirun --bind-to none -np 4 \
./hpl-mxp-aarch64.sh --n 380000 --nb 2048 --nprow 2 --npcol 4 --nporder row \
--cpu-affinity 0-71:72-143:144-215:216-287 --mem-affinity 0:1:2:3
where --cpu-affinity maps processes to cores on the local node and --mem-affinity maps processes to NUMA nodes on the local node.
HPCG-NVIDIA run
Several sample docker run scripts are available in the container at /workspace/hpcg-linux-aarch64.
To run HPCG-NVIDIA on a single node of NVIDIA Grace CPU using your custom hpcg.dat file:
CONT='nvcr.io/nvidia/hpc-benchmarks:23.10'
MOUNT="/path/to/your/custom/dat-files:/my-dat-files"
docker run -v ${MOUNT} \
"${CONT}" \
mpirun --bind-to none -np 4 \
./hpcg.sh --dat /my-dat-files/hpcg.dat --cpu-affinity 0-35:36-71:72-107:108-143 --mem-affinity 0:0:1:1