The NVIDIA DevTools Sidecar Injector enables your containerized applications to be profiled by NVIDIA DevTools applications (currently, only using Nsight Systems). This solution leverages a Kubernetes dynamic admission controller to inject an init container, volumes with the NVIDIA DevTools application and its configurations, environment variables, and a security context upon the creation or update of your Pod.
Prerequisite: your Kubernetes cluster must have the admissionregistration.k8s.io/v1 API enabled. Verify this by running the following command:
kubectl api-versions | grep admissionregistration.k8s.io/v1
The result should be:
admissionregistration.k8s.io/v1
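The grep check above is a substring match, so a cluster that only lists admissionregistration.k8s.io/v1beta1 would also produce output. A minimal sketch of a stricter check, assuming you have captured the output of kubectl api-versions as text (the helper name has_admission_api and the sample output are hypothetical):

```python
def has_admission_api(api_versions_output: str,
                      wanted: str = "admissionregistration.k8s.io/v1") -> bool:
    """Return True only if `wanted` appears as a whole line of the output."""
    lines = [line.strip() for line in api_versions_output.splitlines()]
    return wanted in lines


# Hypothetical captured output, not taken from a real cluster.
sample_output = "apps/v1\nadmissionregistration.k8s.io/v1\nbatch/v1\n"
print(has_admission_api(sample_output))
```

Comparing whole lines avoids treating v1beta1 as if it were v1.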
Note: Additionally, the MutatingAdmissionWebhook and ValidatingAdmissionWebhook admission controllers should be added and listed in the correct order in the admission-control flag of kube-apiserver. Please refer to the Kubernetes documentation. This is likely set by default if your cluster is running on EKS, AKS, OKE, or GKE.
Install the NVIDIA DevTools Sidecar Injector (in this example, the configuration values were saved in custom_values.yaml):
helm install -f custom_values.yaml \
devtools-sidecar-injector https://helm.ngc.nvidia.com/nvidia/devtools/charts/devtools-sidecar-injector-1.0.0.tgz
The NVIDIA DevTools Sidecar can be customized to suit particular needs. Most likely, you will need to configure the profile.devtoolArgs, profile.injectionMatch, profile.volumes, and profile.volumeMounts values. A values file can be used for setting these parameters.
Sample custom_values.yaml. This configuration will enable profiling for any instance of yourawesomeapp found in injected Pods.
# Nsight Systems profiling configuration
profile:
  # The arguments for Nsight Systems. The placeholders will be replaced with the actual values.
  devtoolArgs: "profile -f true --start-later true --trace nvtx,cuda -o /home/auto_{PROCESS_NAME}_%{POD_FULLNAME}_%{CONTAINER_NAME}_{TIMESTAMP}_{UID}.nsys-rep"
  # The regex to match applications to profile.
  injectionMatch: "^(?!.*nsys( |$)).*\\byourawesomeapp.*$"
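It can help to try an injectionMatch pattern against a few command lines before deploying it. The sketch below uses Python's re module for that. Two things are assumptions here and are not stated in this document: that the pattern is tested against the full command line of a process, and that the injector's regex engine treats this pattern the same way Python does. The sample command lines are hypothetical.

```python
import re

# The pattern from custom_values.yaml after YAML unescaping ("\\b" becomes "\b").
INJECTION_MATCH = r"^(?!.*nsys( |$)).*\byourawesomeapp.*$"


def would_match(command_line: str, pattern: str = INJECTION_MATCH) -> bool:
    """Return True if the command line would be selected by the pattern."""
    return re.search(pattern, command_line) is not None


# Hypothetical command lines, for illustration only.
candidates = [
    "/opt/bin/yourawesomeapp --port 8080",   # selected: contains the app name
    "nsys profile /opt/bin/yourawesomeapp",  # rejected: already wrapped by nsys
    "/usr/bin/cat notes.txt",                # rejected: app name is absent
]
for command_line in candidates:
    print(f"{would_match(command_line)!s:5}  {command_line}")
```

The negative lookahead (?!.*nsys( |$)) is what keeps the nsys process itself from being selected for profiling.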
Sample custom_values_launch.yaml. This configuration will inject Nsight Systems for later profiling for any instance of yourawesomeapp found in injected Pods. nsys_k8s.py can then be used to start and stop collection.
# Nsight Systems profiling configuration
profile:
  # The arguments for Nsight Systems. The placeholders will be replaced with the actual values.
  devtoolArgs: "launch --trace nvtx,cuda"
  # The regex to match applications to profile.
  injectionMatch: "^(?!.*nsys( |$)).*\\byourawesomeapp.*$"
Sample custom_values_extended.yaml:
# Nsight Systems profiling configuration
profile:
  # A volume to store profiling results. It can be omitted, but in that case the results will be lost after the Pod
  # is deleted and they will not be stored in a common location.
  # You may skip this section if you already have a shared volume for all the profiling Pods.
  volumes:
    [
      {
        "name": "nsys-output-volume",
        "persistentVolumeClaim": { "claimName": "CSP-managed-disk" },
      },
    ]
  volumeMounts:
    [{ "name": "nsys-output-volume", "mountPath": "/mnt/nsys/output" }]
  # The arguments for Nsight Systems. The placeholders will be replaced with the actual values.
  devtoolArgs: "profile -f true --start-later false --duration 20 --kill none --backtrace dwarf --trace nvtx,cuda -o /mnt/nsys/output/auto_{PROCESS_NAME}_%{POD_FULLNAME}_%{CONTAINER_NAME}_{TIMESTAMP}_{UID}.nsys-rep"
  # The regex to match applications to profile.
  injectionMatch: "^(?!.*nsys( |$)).*\\byourawesomeapp.*$"
Variable | Description | Default value |
---|---|---|
profile.devtoolArgs | The Nsight Systems parameters used during profiling. A comprehensive list of available parameters is provided in the Nsight Systems User Guide. Placeholders within these parameters are substituted with their actual values during execution. It is recommended to include the {TIMESTAMP} and {UID} placeholders in the output file name to keep filenames unique; otherwise, the report may be overwritten or not generated at all. Example: profile -f true --trace nvtx,cuda -o /mnt/nsys/output/auto_{PROCESS_NAME}_%{POD_FULLNAME}_%{CONTAINER_NAME}_{TIMESTAMP}_{UID}.nsys-rep | |
profile.injectionMatch | The regex used to match the application that is to be profiled. | ^(?!/bin/)(?!/sbin/)(?!/usr/bin/)(?!/usr/sbin/)(?!.*nsys( \|$))(?!.*cat( \|$)).*$ |
profile.volumes | Additional volumes that will be injected into profiled containers. May be useful for storing profiling results. | |
profile.volumeMounts | Volume mounts that will be injected into profiled containers. May be useful for storing profiling results. | |
sidecarImage.image | NVIDIA DevTools Sidecar image URL can be specified in case of custom registry usage (if the NVIDIA registry is not available). In the case of a private registry, the imagePullSecrets should also be specified. | The default Sidecar nvcr.io URL |
devtoolBinariesImage.image | NVIDIA DevTools Binaries image URL can be specified in the case of custom registry usage (if the NVIDIA registry is not available). In the case of a private registry, the imagePullSecrets should also be specified. | The default Nsight Systems nvcr.io URL |
imagePullSecrets | List of references to secrets within the same namespace for pulling Sidecar and DevTools binaries images. These secrets must be available in all namespaces containing pods that require profiling, as well as in the "nvidia-devtools-sidecar-injector" namespace. | None |
privileged | Enables profiled containers to be run in privileged mode (can be used to collect GPU metrics). | None |
capabilities | Enables profiled containers to be run with specific capabilities (for instance, SYS_ADMIN can be used to collect GPU metrics). | None |
Placeholder | Replacement |
---|---|
{UID} | A random alphanumeric string (8 symbols). |
{PROCESS_NAME} | The profiled process name. |
{PROCESS_ID} | The profiled process ID. |
{TIMESTAMP} | The UNIX timestamp (in ms). |
%{ANY ENVIRONMENT VARIABLE} | The value of the ANY ENVIRONMENT VARIABLE environment variable inside a container. The POD_FULLNAME and CONTAINER_NAME environment variables are set by the NVIDIA DevTools Sidecar injection. |
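To make the placeholder rules concrete, here is a minimal sketch of how such a substitution could be written. It is an illustration only, not the injector's actual implementation. The function name expand_placeholders is hypothetical, and replacing an unset environment variable with an empty string is an assumption.

```python
import os
import re
import secrets
import string
import time


def expand_placeholders(template, process_name, process_id, env=None):
    """Expand {NAME} and %{ENV_VAR} placeholders in an output path template."""
    env = os.environ if env is None else env
    alphabet = string.ascii_letters + string.digits
    builtin = {
        "UID": "".join(secrets.choice(alphabet) for _ in range(8)),
        "PROCESS_NAME": process_name,
        "PROCESS_ID": str(process_id),
        "TIMESTAMP": str(int(time.time() * 1000)),  # UNIX time in milliseconds
    }

    def replace(match):
        is_env_var, name = match.group(1) == "%", match.group(2)
        if is_env_var:
            # Assumption: an unset variable expands to an empty string.
            return env.get(name, "")
        # Unknown {NAME} placeholders are left untouched.
        return builtin.get(name, match.group(0))

    return re.sub(r"(%?)\{(\w+)\}", replace, template)


template = "/mnt/nsys/output/auto_{PROCESS_NAME}_%{POD_FULLNAME}_{TIMESTAMP}_{UID}.nsys-rep"
print(expand_placeholders(template, "yourawesomeapp", 1234,
                          env={"POD_FULLNAME": "default.yourawesomeapp-0"}))
```

Because {TIMESTAMP} and {UID} differ on every call, two processes with the same name in the same Pod still get distinct report names.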
To enable automatic Sidecar injection for all Pods in a namespace, add the nvidia-devtools-sidecar-injector=enabled label to the namespace.
kubectl label namespaces <namespace name> nvidia-devtools-sidecar-injector=enabled
To enable automatic Sidecar injection for a specific resource in a namespace, add the nvidia-devtools-sidecar-injector=enabled label to the resource.
kubectl label <resource type> <resource name> nvidia-devtools-sidecar-injector=enabled
At this point, any new Pod will be considered for injection based on its labels and injectionMatch.
A Pod that has already started cannot be injected; you must restart it to enable profiling.
By the same token, if you remove the label or set the Pod label to disabled, you will need to restart the affected Pods to remove the Sidecar injection.
####### Resource with more than one replica
kubectl rollout restart <resource type>/<resource name>
For example:
kubectl rollout restart deployment/amazing-service
Alternatively, scale the resource down to zero replicas and then back up:
kubectl scale <resource type>/<resource name> --replicas=0
kubectl scale <resource type>/<resource name> --replicas=1
For example:
kubectl scale deployment/amazing-service --replicas=0
kubectl scale deployment/amazing-service --replicas=1
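When several resources need restarting, the commands above can be generated instead of typed by hand. The sketch below only builds the kubectl argument lists and does not run anything. The function name restart_argv and the my-service resource name are hypothetical.

```python
def restart_argv(resource_type, resource_name, use_scale=False, replicas=1):
    """Build the kubectl argument lists for restarting a resource."""
    target = f"{resource_type}/{resource_name}"
    if not use_scale:
        return [["kubectl", "rollout", "restart", target]]
    # Scale to zero first, then back to the desired replica count.
    return [
        ["kubectl", "scale", target, "--replicas=0"],
        ["kubectl", "scale", target, f"--replicas={replicas}"],
    ]


for argv in restart_argv("deployment", "my-service", use_scale=True):
    print(" ".join(argv))
```

Each returned list can be passed to subprocess.run(argv, check=True) once you are satisfied with the printed commands.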
Profiling can be controlled using the nsys_k8s.py script. The script can be found in NVIDIA DevTools Sidecar Injector Resources.
This script facilitates the execution of Nsight Systems commands within profiled containers of Kubernetes Pods. Additionally, it provides a convenient method for downloading profiling results.
nsys_k8s.py searches for Pods that are labeled for profiling and looks for active Nsight Systems sessions launched by the Sidecar in them. The script supports Pod filtering using Kubernetes field selectors.
Install the script's dependencies before use (pip install -r requirements.txt).
The script supports executing Nsight Systems commands within containers of Kubernetes Pods, with optional filters for targeting specific namespaces, containers, and Pods. Nsight Systems commands are executed only on Pods that have active Nsight Systems sessions. The general command structure is as follows:
./nsys_k8s.py [--field-selector SELECTOR] nsys [nsys_arguments...]
Argument | Description |
---|---|
--field-selector | (Optional) Filter Kubernetes objects to identify those on which an Nsight Systems command will be executed, based on the value(s) of one or more resource fields. See the Kubernetes documentation on field selectors. |
nsys_arguments... | The Nsight Systems command and arguments you wish to execute, for example start --sampling-frequency=5000. For commands that support the --output argument, if it is not present, an --output value is generated based on the profile.devtoolArgs Helm option value. |
Do not specify the session name in nsys_arguments; it will be obtained automatically.
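If you call the script from your own tooling, the command structure above can be assembled as an argument list. This is a minimal sketch that builds the list without running it. The helper name build_nsys_k8s_argv is hypothetical, and the guard assumes a session name would be passed with the --session option of Nsight Systems.

```python
import shlex


def build_nsys_k8s_argv(nsys_command, field_selector=None, script="./nsys_k8s.py"):
    """Build the argument list: script [--field-selector SELECTOR] nsys [args...]."""
    nsys_args = shlex.split(nsys_command)
    # Assumption: a session name would be given as --session or --session=NAME.
    if any(arg == "--session" or arg.startswith("--session=") for arg in nsys_args):
        raise ValueError("do not pass a session name; the script resolves it")
    argv = [script]
    if field_selector:
        argv += ["--field-selector", field_selector]
    return argv + ["nsys"] + nsys_args


print(build_nsys_k8s_argv("start --sampling-frequency=5000",
                          field_selector="metadata.namespace=default"))
```

The metadata.namespace=default selector is only an example value; any selector accepted by Kubernetes for Pods can be used in its place.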
download command
The script supports the download command to provide a convenient way of downloading profiling results from profiled Pods.
./nsys_k8s.py [--field-selector SELECTOR] download [destination]
Argument | Description |
---|---|
--field-selector | (Optional) Filter Kubernetes objects to identify those on which an Nsight Systems command will be executed, based on the value(s) of one or more resource fields. See the Kubernetes documentation on field selectors. |
destination | The path for the directory into which the profiling results will be downloaded. |
--remove-source | (Optional) Delete source files from Pods after downloading them. |
check command
The script supports the check command to provide a convenient way to check whether the NVIDIA DevTools Sidecar Injector is injected into a specific Pod.
./nsys_k8s.py check [-n namespace] [pod]
Argument | Description |
---|---|
-n | (Optional) The namespace of the Pod to check. |
pod | The name of the Pod to check. |
Sidecar Injector configurations can be modified after the installation. Please note, however, that the configuration of already injected Pods will not be updated until the ConfigMaps are deleted from their namespaces (kubectl delete cm -n <namespace name> nvidia-devtools-sidecar-injector-custom) and the Pods are restarted.
helm upgrade -f custom_values.yaml \
devtools-sidecar-injector https://helm.ngc.nvidia.com/nvidia/devtools/charts/devtools-sidecar-injector-1.0.0.tgz
Sidecar Injector configurations can be customized for an individual namespace or its Pods. To do so, use a ConfigMap named nvidia-devtools-sidecar-injector-custom.
Sample separate_configmap.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-devtools-sidecar-injector-custom
  labels:
    app: nvidia-devtools-sidecar-injector
data:
  # The regex backslash is doubled (\\b) because a single \b inside a quoted JSON string is a backspace escape.
  injectionconfig.yaml: |
    {
      "devtoolArgs": "profile -f true --trace cuda -o /mnt/nsys/output/auto_{PROCESS_NAME}_%{POD_FULLNAME}_%{CONTAINER_NAME}_{TIMESTAMP}_{UID}.nsys-rep",
      "injectionMatch": "^(?!.*nsys( |$)).*\\byourotherawesomeapp.*$"
    }
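Hand-writing the embedded JSON is error-prone, mainly because regex backslashes have to be escaped: inside a quoted JSON string, \b means a backspace character, so a word boundary must be written as \\b. One way to sidestep this is to generate the manifest. The sketch below is an illustration under assumptions: the helper name build_custom_configmap and the my-namespace namespace are hypothetical, and it emits the manifest as JSON, which kubectl apply -f accepts.

```python
import json


def build_custom_configmap(namespace, devtool_args, injection_match):
    """Return a ConfigMap manifest (as a dict) with a JSON-encoded injection config."""
    injection_config = json.dumps(
        {"devtoolArgs": devtool_args, "injectionMatch": injection_match},
        indent=2,
    )  # json.dumps escapes the regex backslashes for us
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": "nvidia-devtools-sidecar-injector-custom",
            "namespace": namespace,
            "labels": {"app": "nvidia-devtools-sidecar-injector"},
        },
        "data": {"injectionconfig.yaml": injection_config},
    }


manifest = build_custom_configmap(
    "my-namespace",  # hypothetical namespace
    "profile -f true --trace cuda -o /mnt/nsys/output/auto_{PROCESS_NAME}_{TIMESTAMP}_{UID}.nsys-rep",
    r"^(?!.*nsys( |$)).*\byourotherawesomeapp.*$",  # raw string: \b is a word boundary
)
print(json.dumps(manifest, indent=2))
```

Saving the printed output to a file gives you a manifest you can inspect before applying it.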
GPU metrics samples can be collected by only one process per GPU. The most straightforward way to avoid collisions is to collect GPU metrics from a single custom DaemonSet Pod per node. The following resource configuration can be used to achieve that:
kubectl apply -f ./gpu_metrics_resources.yaml
gpu_metrics_daemonset.yaml:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gpu-metrics-collector
  namespace: example-gpu-metrics-ns
  labels:
    nvidia-devtools-sidecar-injector: enabled
spec:
  template:
    spec:
      containers:
        - name: gpu-metrics-ubuntu-container
          image: ubuntu:22.04
          command: ["sleep", "infinity"]
          securityContext:
            privileged: true
      tolerations:
        - effect: NoSchedule
          key: nvidia.com/gpu
          operator: Exists
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-devtools-sidecar-injector-custom
  namespace: example-gpu-metrics-ns
  labels:
    app: nvidia-devtools-sidecar-injector
data:
  injectionconfig.yaml: |
    {
      "devtoolArgs": "profile -f true --start-later false --gpu-metrics-device=all -s system-wide -o /mnt/nsys/output/auto_gpu_metrics_%{POD_FULLNAME}_{TIMESTAMP}_{UID}.nsys-rep",
      "injectionMatch": "^sleep infinity$"
    }
The ConfigMap customizes the profiling parameters for the DaemonSet so that GPU metrics are collected. Pods started by this DaemonSet can be controlled with the nsys_k8s.py script.
Perform the following steps to uninstall the NVIDIA DevTools Sidecar Injector:
helm uninstall devtools-sidecar-injector
This will not automatically delete some resources, so they should be deleted manually. Replace <namespace name> with the namespace where profiled Pods are running:
kubectl delete mutatingwebhookconfiguration nvidia-devtools-sidecar-injector-webhook
kubectl delete cm -n <namespace name> nvidia-devtools-sidecar-injector
kubectl delete cm -n <namespace name> nvidia-devtools-sidecar-injector-custom
Additionally, you can remove the label from all resources labeled with nvidia-devtools-sidecar-injector=enabled:
kubectl get all --all-namespaces -l nvidia-devtools-sidecar-injector=enabled -o custom-columns=:.metadata.name,NS:.metadata.namespace,KIND:.kind --no-headers | while read name namespace kind; do kubectl label $kind $name -n $namespace nvidia-devtools-sidecar-injector-; done
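The one-liner above reads name, namespace, and kind columns and issues one kubectl label command per row. A sketch of the same parsing step in Python may be easier to review before anything is changed. It only builds the commands; the helper name label_removal_commands and the sample rows are hypothetical.

```python
def label_removal_commands(kubectl_output, label="nvidia-devtools-sidecar-injector"):
    """Turn 'name namespace kind' rows into kubectl label-removal argument lists."""
    commands = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        name, namespace, kind = fields
        # A trailing "-" after the label key tells kubectl to remove the label.
        commands.append(["kubectl", "label", kind, name, "-n", namespace, f"{label}-"])
    return commands


# Hypothetical rows in the custom-columns format used above.
sample_rows = "amazing-pod default Pod\namazing-service default Deployment\n"
for argv in label_removal_commands(sample_rows):
    print(" ".join(argv))
```

Note that kubectl get all does not list Namespace objects, so a label added with kubectl label namespaces has to be removed separately, for example with kubectl label namespaces <namespace name> nvidia-devtools-sidecar-injector-.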
Sometimes you may find that a Pod is not injected with the Sidecar container as expected. In that case, check the following items:
- The nvidia-devtools-sidecar-injector Pod in the nvidia-devtools-sidecar-injector namespace is in the running state and no error logs have been produced.
- The Sidecar is actually injected into the Pod. You can verify this with: ./nsys_k8s.py check [-n namespace] [pod]
- Another injection is already running with the --gpu-metrics-device option. In that case, you can use a report from that injection or modify the configurations to ensure only one Pod is running with the GPU metrics option.
- Your cluster runs the nvidia-dcgm-exporter DaemonSet, which collects GPU metrics. If you are not using it, you can temporarily disable it:
kubectl -n gpu-operator patch daemonset nvidia-dcgm-exporter -p '{"spec": {"template": {"spec": {"nodeSelector": {"non-existing": "true"}}}}}'
To re-enable it, run the following command:
kubectl -n gpu-operator patch daemonset nvidia-dcgm-exporter --type json -p='[{"op": "remove", "path": "/spec/template/spec/nodeSelector/non-existing"}]'
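The two patches above work as a pair: the first adds a node selector that no node satisfies, so the DaemonSet schedules no Pods, and the second removes that selector again. Quoting nested JSON in a shell is easy to get wrong, so the sketch below builds both argument lists with json.dumps. It does not run kubectl, and the helper name dcgm_exporter_patch_argv is hypothetical.

```python
import json

# Strategic merge patch: add a node selector that matches no node.
DISABLE_PATCH = {"spec": {"template": {"spec": {"nodeSelector": {"non-existing": "true"}}}}}
# JSON patch: remove that node selector key again.
ENABLE_PATCH = [{"op": "remove", "path": "/spec/template/spec/nodeSelector/non-existing"}]


def dcgm_exporter_patch_argv(enable, namespace="gpu-operator"):
    """Build the kubectl patch argument list that enables or disables the exporter."""
    base = ["kubectl", "-n", namespace, "patch", "daemonset", "nvidia-dcgm-exporter"]
    if enable:
        return base + ["--type", "json", "-p", json.dumps(ENABLE_PATCH)]
    return base + ["-p", json.dumps(DISABLE_PATCH)]


print(dcgm_exporter_patch_argv(enable=False))
print(dcgm_exporter_patch_argv(enable=True))
```

Passing a list like this to subprocess.run avoids shell quoting entirely, since each element is delivered to kubectl as one argument.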