The NGC Pre-flight Check container is a lightweight tool that verifies that the container runtime is set up correctly for GPUs and InfiniBand. Run it before running your HPC or deep learning workloads on a system. Its output messages can guide troubleshooting before you run containers from the NGC catalog.
$ docker run --rm -it --gpus all -v /dev/infiniband --cap-add IPC_LOCK nvcr.io/hpc/preflightcheck:20.11
INFO: The NVIDIA Driver was detected.
INFO: NVRM version: NVIDIA UNIX x86_64 Kernel Module 450.51.06 Sun Jul 19 20:02:54 UTC 2020
INFO: Found CUDA driver library: /usr/lib64/libcuda.so.1
INFO: Latest CUDA supported version: 11000
INFO: Number of GPUs detected: 8
INFO: Detected Mellanox OFED version 4.6-1.0.1
INFO: Detected nv_peer_mem version 1.0-7
INFO: Number of InfiniBand devices detected: 4
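Because every message carries an INFO: or WARNING: prefix, the output can be turned into a simple pass/fail gate. The following is a minimal sketch, not part of the container itself: count_warnings and preflight_ok are hypothetical helper names, and the sketch assumes the messages are written to standard output (append 2>&1 to the docker command if they go to standard error instead).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count WARNING lines in pre-flight output read from stdin.
count_warnings() {
    # grep -c prints the number of matching lines. It exits with status 1 when
    # nothing matches, so "|| true" keeps the function usable under "set -e".
    grep -c '^WARNING:' || true
}

# Hypothetical gate: succeed only if the output on stdin contains no warnings.
preflight_ok() {
    local n
    n=$(count_warnings)
    [ "$n" -eq 0 ]
}

# Example use (-it is dropped because the output is piped, not shown on a terminal):
#   docker run --rm --gpus all -v /dev/infiniband --cap-add IPC_LOCK \
#       nvcr.io/hpc/preflightcheck:20.11 | preflight_ok && echo "ready"
```

A gate like this treats any warning as a failure, which is stricter than necessary on systems that have no InfiniBand hardware by design.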
The InfiniBand devices are not mounted in the container (-v /dev/infiniband is omitted):
$ docker run --rm -it --gpus all nvcr.io/hpc/preflightcheck:20.11
INFO: The NVIDIA Driver was detected.
INFO: NVRM version: NVIDIA UNIX x86_64 Kernel Module 450.51.06 Sun Jul 19 20:02:54 UTC 2020
INFO: Found CUDA driver library: /usr/lib64/libcuda.so.1
INFO: Latest CUDA supported version: 11000
INFO: Number of GPUs detected: 8
WARNING: No InfiniBand devices detected
Disable GPU support (-e NVIDIA_VISIBLE_DEVICES=""):
$ docker run --rm -it -e NVIDIA_VISIBLE_DEVICES="" nvcr.io/hpc/preflightcheck:20.11
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
INFO: Use 'docker run --gpus all' to start this container; see
INFO: https://github.com/NVIDIA/nvidia-docker/wiki/Installation-(Native-GPU-Support)
WARNING: No InfiniBand devices detected
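The device counts reported in the Docker examples above can also be extracted for use in a job script. This is a hedged sketch: detected_count is a hypothetical helper, and the label strings are taken from the sample output shown here, so they may differ in other versions of the container.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print N from the first "INFO: <label>: N" line on stdin.
# Prints nothing when no such line exists (for example, when only a WARNING
# about missing devices was emitted).
detected_count() {
    local label=$1
    # Strip carriage returns (present when the output came from a TTY), then
    # print only the captured digits of lines that match the label.
    tr -d '\r' \
        | sed -n "s/^INFO: ${label}: \([0-9][0-9]*\)\$/\1/p" \
        | head -n 1
}

# Example use with saved output:
#   gpus=$(detected_count 'Number of GPUs detected' < preflight.log)
#   ib=$(detected_count 'Number of InfiniBand devices detected' < preflight.log)
#   echo "GPUs: ${gpus:-0}, InfiniBand devices: ${ib:-0}"
```

Here preflight.log is an assumed file name for captured output; the ${var:-0} expansion substitutes 0 when a count line is absent.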
$ singularity run --nv docker://nvcr.io/hpc/preflightcheck:20.11
INFO: The NVIDIA Driver was detected.
INFO: NVRM version: NVIDIA UNIX x86_64 Kernel Module 450.51.06 Sun Jul 19 20:02:54 UTC 2020
INFO: Found CUDA driver library: /.singularity.d/libs/libcuda.so.1
INFO: Latest CUDA supported version: 11000
INFO: Number of GPUs detected: 8
INFO: Detected Mellanox OFED version 4.6-1.0.1
INFO: Detected nv_peer_mem version 1.0-7
INFO: Number of InfiniBand devices detected: 4
The --nv Singularity option is omitted:
$ singularity run docker://nvcr.io/hpc/preflightcheck:20.11
INFO: The NVIDIA Driver was detected.
INFO: NVRM version: NVIDIA UNIX x86_64 Kernel Module 450.51.06 Sun Jul 19 20:02:54 UTC 2020
WARNING: Unable to find CUDA driver library
WARNING: Unable to detect the latst CUDA version supported by the driver
WARNING: Unable to get list of GPUs
INFO: Detected Mellanox OFED version 4.6-1.0.1
INFO: Detected nv_peer_mem version 1.0-7
INFO: Number of InfiniBand devices detected: 4
The --contain Singularity option is used to isolate the container, and --nv is omitted:
$ singularity run --contain docker://nvcr.io/hpc/preflightcheck:20.11
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
INFO: Use 'singularity run --nv' to start this container; see
INFO: https://sylabs.io/guides/3.5/user-guide/gpu.html
WARNING: No InfiniBand devices detected
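The warnings in the examples above each correspond to a specific runtime option, so they can be mapped to a troubleshooting hint. The sketch below is illustrative only: preflight_hints is a hypothetical helper, and the mapping covers just the warning strings that appear in the sample output here, not every message the container may emit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read pre-flight output on stdin and print one hint per
# recognized WARNING line. Unrecognized lines are ignored.
preflight_hints() {
    local line
    while IFS= read -r line; do
        case $line in
            "WARNING: The NVIDIA Driver was not detected."*)
                echo "hint: start with 'docker run --gpus all' or 'singularity run --nv'" ;;
            "WARNING: Unable to find CUDA driver library"*)
                echo "hint: with Singularity, add the --nv option" ;;
            "WARNING: No InfiniBand devices detected"*)
                echo "hint: with Docker, add -v /dev/infiniband; with Singularity, check whether --contain hides /dev/infiniband" ;;
        esac
    done
}

# Example use:
#   singularity run --contain docker://nvcr.io/hpc/preflightcheck:20.11 | preflight_hints
```

On a system without InfiniBand hardware the last hint does not apply, so the output is best read as a list of things to check rather than a diagnosis.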