NVSM

Description

NVIDIA System Management (NVSM) is a software framework for monitoring NVIDIA DGX nodes

Publisher

NVIDIA

Latest Tag

v1.0.1-21.07.16-ubi8

Modified

November 1, 2023

Compressed Size

648.6 MB

Multinode Support

No

Multi-Arch Support

No

Platform

Linux / amd64

NVIDIA System Management (NVSM)

NVIDIA System Management (NVSM) is a software framework for monitoring NVIDIA DGX nodes in a data center.

  • For DGX Servers, it includes active health monitoring, system alerts, and log generation.

  • For DGX Station, it is limited to using the CLI to check the health of the system and obtain diagnostic information.

The v1.0.0-21.07.x release is the first release of the NVSM containers. It supports only three operations: show health, dump health, and show versions.

Usage

nvsm show health

The "show health" command can be used to quickly assess overall system health.

To run show health, first initialize nvsm from inside the pod. Once the nvsm> prompt appears, run the 'show health' command directly.

nvsm> show health

This command will print a summary of the system state.
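Getting to the nvsm> prompt first requires a shell inside the pod. The following is a minimal sketch of that step, not an official procedure. It assumes the container runs in a Kubernetes pod reachable with kubectl and that the image provides /bin/bash. The function name nvsm_pod_shell and the pod and namespace names are hypothetical placeholders.

```shell
# Sketch only: assumes the NVSM container runs in a Kubernetes pod reachable
# with kubectl, and that the image provides /bin/bash.
# The function name and all pod/namespace names are hypothetical.

# Open an interactive shell inside the pod.
#   $1 = pod name
#   $2 = namespace (optional, falls back to "default")
nvsm_pod_shell() {
  kubectl exec -it -n "${2:-default}" "$1" -- /bin/bash
}

# Example (not executed here):
#   nvsm_pod_shell nvsm-pod my-namespace
# Then, inside the pod, start nvsm and type the command at its prompt:
#   nvsm> show health
```

Wrapping the call in a function keeps the pod and namespace as explicit arguments, so the same helper can be reused before each of the three supported operations.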

nvsm dump health

The "dump health" command produces a health report file suitable for attaching to support tickets.

To run dump health, first initialize nvsm from inside the pod. Once the nvsm> prompt appears, run the 'dump health' command directly.

nvsm> dump health

This command will create a .tar.xz file, which can be copied out of the pod and then analyzed or attached to support tickets.
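Copying the report out of the pod and checking what it contains can be sketched as follows. This is an illustration under assumptions: it assumes kubectl access to the pod, a tar binary inside the container (kubectl cp relies on one), and a tar with xz support on the workstation. The function names are hypothetical. The path of the report inside the pod is deliberately left as an argument, since it depends on where dump health wrote the file.

```shell
# Sketch only: assumes kubectl access to the pod and tar with xz support
# on the local machine. Function names, pod names, and paths are hypothetical.

# Copy the health report out of the pod.
#   $1 = pod name
#   $2 = path of the .tar.xz file inside the pod
#   $3 = local destination path
#   $4 = namespace (optional, falls back to "default")
fetch_health_dump() {
  kubectl cp "${4:-default}/$1:$2" "$3"
}

# List the files in the report without extracting it.
#   $1 = local path of the .tar.xz file
list_health_dump() {
  tar -tJf "$1"
}

# Example (not executed here; substitute the real in-pod path):
#   fetch_health_dump nvsm-pod <path-inside-pod> ./nvsm-health.tar.xz
#   list_health_dump ./nvsm-health.tar.xz
```

Listing the archive before attaching it is a quick way to confirm that the copy completed and the file is not truncated.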

nvsm show versions

The "show versions" command lists the versions of the packages and firmware installed on the system.

To run show versions, first initialize nvsm from inside the pod. Once the nvsm> prompt appears, run the 'show versions' command directly.

nvsm> show versions

This command prints the versions of the software and hardware components on the system.

Documentation

Complete NVSM Documentation is available here.

License

License here.