DOCA Telemetry Service

Description

DOCA Telemetry Service (DTS) is a DOCA service that collects and exports telemetry data. Both predefined and user-defined telemetry counters are available, and data is exported through Prometheus or Fluent Bit.

  • Publisher: NVIDIA
  • Latest Tag: 1.16.5-doca2.6.0-host
  • Modified: April 1, 2024
  • Compressed Size: 472.69 MB
  • Multinode Support: No
  • Multi-Arch Support: No

Introduction

DOCA Telemetry Service (DTS) runs in its own Kubernetes pod on the BlueField and collects data from built-in providers and external telemetry applications. The following built-in providers are available (sysfs is enabled by default):

  • sysfs
  • ethtool
  • ifconfig
  • tc (Traffic Control)

Aggregation providers collect data from other applications over TCP:

  • fluent_aggr
  • prometheus_aggr

Additional telemetry applications written on top of the DOCA API can send data to the service over IPC (shared memory).

Providers that require additional permissions, which DTS cannot grant, can be executed through the DOCA Privileged Executer (DPE) server. Data collected by the DPE server is consumed by DTS in the same way as data from standard providers. For details, refer to the documentation.

Collected data is written or exported according to the configuration. The following options are available:

  • Write data as binary files to storage (disabled by default).
  • Fluent Bit - Export data through Fluent Bit forwarding.
  • Prometheus - Create a Prometheus endpoint that keeps the most recent data available for Prometheus to scrape.

The telemetry agent can export data through Prometheus (pull) or Fluent Bit (push). The Prometheus endpoint is bound to port 9100 and can be enabled in the config file. Data collected by several telemetry agents on their BlueFields can be aggregated by a telemetry agent that runs on a separate host. This setup requires InfiniBand port configuration.
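
As a rough illustration of the pull model, the sketch below builds the scrape URL for a DTS Prometheus endpoint and fetches it with curl. Port 9100 comes from the text above. The /metrics path is the usual Prometheus convention and is an assumption here, as are the helper names dts_metrics_url and dts_scrape.

```shell
#!/usr/bin/env bash
# Sketch only: the /metrics path and both helper names are assumptions,
# not taken from the DTS documentation.

# Print the scrape URL for a host; the port defaults to 9100.
dts_metrics_url() {
    local host="$1"
    local port="${2:-9100}"
    printf 'http://%s:%s/metrics\n' "$host" "$port"
}

# Fetch the current metrics snapshot; returns non-zero if the
# endpoint is unreachable or answers with an HTTP error.
dts_scrape() {
    curl --silent --show-error --fail --max-time 5 "$(dts_metrics_url "$@")"
}
```

For example, dts_scrape 192.0.2.10 would request the snapshot from a BlueField reachable at that placeholder address, provided the Prometheus endpoint has been enabled in the config file.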

Note: Starting from DTS version 1.9.0-doca1.3.0, shared memory is mounted under "/dev/shm/telemetry/". Telemetry applications should use the same folder for their shared memory mount.

Note: DTS version 1.9.0-doca1.3.0 is not backward compatible with previous DOCA API versions.

Note: DTS version 1.11.0-doca1.5.0 is not backward compatible with previous DOCA API versions.

Installation and Getting Started

All preparation steps are listed in the DOCA Container Deployment User Guide.

Note: The DOCA Service container is configured for Kubernetes-based deployment, so using the docker pull command is discouraged.

Preparation steps for the DOCA Service

None needed.

Adjusting the .yaml configuration

The .yaml configuration for our container is doca_telemetry.yaml:

wget https://api.ngc.nvidia.com/v2/resources/nvidia/doca/doca_container_configs/versions/2.6.0v2/files/configs/2.6.0/doca_telemetry.yaml

The YAML file allows generating the DTS configuration from scratch and overwriting the Fluent Bit export configuration.

Note: The file is also stored with the rest of the .yaml configurations pulled from NGC in the previous steps (see "Installation and Getting Started").

Enable Fluent Bit forwarding

To enable Fluent Bit forwarding, add the destination host and port to the "command" entry in the initContainers section:

command: ["/bin/bash", "-c", "/usr/bin/telemetry-init.sh && /usr/bin/enable-forward-to-morpheus.sh -i=127.0.0.1 -p=24224"]

The host and port shown above are only examples; replace them with the address of your own Fluent Bit destination.
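
To avoid hand-editing mistakes, the "command" entry can be generated for a given destination. The sketch below is a hypothetical helper (dts_forward_command is not part of DTS) that rejects a non-numeric port and prints the line to paste into the initContainers section. The script paths are copied from the example above.

```shell
#!/usr/bin/env bash
# Sketch only: dts_forward_command is a hypothetical helper, not a DTS tool.

# Print the initContainers "command" line for a destination host and port.
dts_forward_command() {
    local host="$1"
    local port="$2"
    # Reject an empty or non-numeric port before emitting any YAML.
    case "$port" in
        ''|*[!0-9]*) echo "port must be numeric" >&2; return 1 ;;
    esac
    printf 'command: ["/bin/bash", "-c", "/usr/bin/telemetry-init.sh && /usr/bin/enable-forward-to-morpheus.sh -i=%s -p=%s"]\n' "$host" "$port"
}
```

Calling dts_forward_command 192.0.2.20 24224 prints the entry for a collector at that placeholder address.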

Spawning the container

Copy the updated doca_telemetry.yaml file to the /etc/kubelet.d directory. Kubelet automatically pulls the container image from NGC and spawns a pod that runs the container. The DOCA Telemetry Service application starts executing right away.
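
The copy step can be sketched as a small function. The helper name dts_deploy and its argument defaults are assumptions made for illustration. The destination is a parameter so the function can be tried against a scratch directory first, and copying into /etc/kubelet.d normally requires root privileges.

```shell
#!/usr/bin/env bash
# Sketch only: dts_deploy is a hypothetical helper, not a DOCA tool.

# Copy the pod YAML into the kubelet static pod directory.
#   $1: YAML file (default: doca_telemetry.yaml in the current directory)
#   $2: destination directory (default: /etc/kubelet.d)
dts_deploy() {
    local yaml="${1:-doca_telemetry.yaml}"
    local dest="${2:-/etc/kubelet.d}"
    [ -f "$yaml" ] || { echo "missing file: $yaml" >&2; return 1; }
    [ -d "$dest" ] || { echo "missing directory: $dest" >&2; return 1; }
    cp -- "$yaml" "$dest"/
}
```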

# View currently active pods, and their IDs (it might take up to 20 seconds for the pod to start)
crictl pods

# View currently active containers, and their IDs
crictl ps

# Examine logs of a given container
crictl logs <container-id>

# Examine kubelet logs, in case something didn't work as expected
journalctl -u kubelet
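
The lookup of a container ID and the log command above can be combined. The sketch below relies on the --name filter and --quiet flag of crictl ps. The default name pattern "telemetry" is an assumption about how the container is named in doca_telemetry.yaml, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the default name pattern is an assumption; check the
# output of "crictl ps" for the actual container name.

# Print the ID of the first running container whose name matches the pattern.
dts_container_id() {
    local pattern="${1:-telemetry}"
    crictl ps --name "$pattern" --quiet | head -n 1
}

# Show the logs of the matching container, or fail if none is running.
dts_logs() {
    local id
    id="$(dts_container_id "$@")"
    [ -n "$id" ] || { echo "no matching container found" >&2; return 1; }
    crictl logs "$id"
}
```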

Please refer to the documentation for more information.

Host deployment

Note: Host deployment is available starting from DTS version 1.13.0-doca2.0.2.

The x86_64 host version runs as a Docker container. Refer to the Host Deployment section of the documentation for configuration and run commands.

Documentation

The DOCA Telemetry Service guide is available here.

License & EULA

DOCA is licensed under the NVIDIA DOCA License. By pulling and using the container, you accept the terms and conditions of this license.

Technical Support

Use the NVIDIA Developers forum for questions regarding this Software.