dpfctl
The NVIDIA® DOCA™ Platform Framework enables the building of cloud platforms on top of NVIDIA BlueField® data processing units (DPUs), leveraging industry-standard APIs. With DPF, operators can provision and manage DPUs and the services running on them. For information on platform support and getting started, visit the official repository.
`dpfctl` is a command-line tool designed to visualize, debug, and troubleshoot DPU resources in Kubernetes.
It simplifies debugging by extracting resource conditions and presenting them in a structured, human-readable format.
There are 2 ways to run `dpfctl`:

- Download `dpfctl` and run it locally on your machine.
- Execute the `dpfctl` binary inside the running DPF Operator container, which gives you a version that is compatible with the DPF Operator.

The latest version of `dpfctl` is available on NGC at doca/dpfctl.
Builds are available for three platforms:

- `dpfctl-linux-amd64`
- `dpfctl-linux-arm64`
- `dpfctl-darwin-arm64`
You can also download it directly from the command line. This example uses `dpfctl-linux-amd64` at version `v25.4.0` (writing to `/usr/local/bin` may require `sudo`):

```bash
curl -L -o /usr/local/bin/dpfctl https://api.ngc.nvidia.com/v2/resources/nvidia/doca/dpfctl/versions/v25.4.0/files/dpfctl-linux-amd64
chmod +x /usr/local/bin/dpfctl
```
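To avoid picking the wrong build by hand, the download can be scripted. The following is a minimal sketch, not an official installer. It assumes that the NGC URL pattern from the example above holds for every version and asset name, that `curl` is installed, and that the install path is writable. The script name `get-dpfctl.sh` and the functions `dpfctl_asset`, `dpfctl_url`, and `install_dpfctl` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper script (get-dpfctl.sh): download the dpfctl build
# matching this machine. Assumes the NGC URL pattern shown above applies to
# every version/asset pair, curl is installed, and INSTALL_PATH is writable.
set -euo pipefail

DPFCTL_VERSION="${DPFCTL_VERSION:-v25.4.0}"   # example version; override as needed
INSTALL_PATH="${INSTALL_PATH:-/usr/local/bin/dpfctl}"

# Map an OS/arch pair (defaults: output of uname) to a published asset name.
dpfctl_asset() {
  local os="${1:-$(uname -s)}" arch="${2:-$(uname -m)}"
  case "${os}/${arch}" in
    Linux/x86_64)              echo "dpfctl-linux-amd64" ;;
    Linux/aarch64|Linux/arm64) echo "dpfctl-linux-arm64" ;;
    Darwin/arm64)              echo "dpfctl-darwin-arm64" ;;
    *) echo "no dpfctl build for ${os}/${arch}" >&2; return 1 ;;
  esac
}

# Build the NGC download URL for a version ($1) and an asset name ($2).
dpfctl_url() {
  echo "https://api.ngc.nvidia.com/v2/resources/nvidia/doca/dpfctl/versions/${1}/files/${2}"
}

# Download the matching binary and make it executable.
install_dpfctl() {
  local asset
  asset="$(dpfctl_asset)"
  curl -fL -o "${INSTALL_PATH}" "$(dpfctl_url "${DPFCTL_VERSION}" "${asset}")"
  chmod +x "${INSTALL_PATH}"
}

# Only download when explicitly asked to: bash get-dpfctl.sh install
if [[ "${1:-}" == "install" ]]; then
  install_dpfctl
fi
```

Run it as `DPFCTL_VERSION=v25.4.0 bash get-dpfctl.sh install`. Keeping the download behind an explicit `install` argument lets you source the file and check the detected asset name first.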
To execute `dpfctl` from the running container:

```bash
kubectl -n dpf-operator-system exec deploy/dpf-operator-controller-manager -- /dpfctl describe all
```
For convenience, you can create a shell alias to simplify your commands:

```bash
alias dpfctl="kubectl -n dpf-operator-system exec deploy/dpf-operator-controller-manager -- /dpfctl "
```
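Bash does not expand aliases in non-interactive shells unless `shopt -s expand_aliases` is set, so the alias is not available inside scripts by default. A shell function that forwards its arguments is a common alternative. This is a sketch that reuses the namespace and deployment names from the command above; define either the function or the alias in a given shell, not both. It does not replace the alias for `watch`, which runs its command through `sh -c` and therefore does not see shell functions.

```bash
# Forward all arguments to the dpfctl binary inside the DPF operator pod.
# Namespace and deployment names are taken from the kubectl command above;
# adjust them if your installation differs.
dpfctl() {
  kubectl -n dpf-operator-system exec deploy/dpf-operator-controller-manager -- /dpfctl "$@"
}

# Usage:
#   dpfctl describe all
#   dpfctl describe dpuservices
```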
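If you want to use `watch`, note the following from `man watch`:

> If the last character of the alias value is a blank, then the next command word following the alias is also checked for alias expansion.

Because only the word directly after the alias is checked, put any custom `watch` arguments (such as `-c` to interpret ANSI color sequences, or `-n1` for a one-second interval) in the `watch` alias itself and keep its trailing space. Add both aliases to your shell's configuration file (e.g., `.bashrc`, `.zshrc`, or `.profile`):

```bash
# Add to your shell's configuration file
alias dpfctl="kubectl -n dpf-operator-system exec deploy/dpf-operator-controller-manager -- /dpfctl "
alias watch="watch -c "
```

With both aliases loaded, `watch dpfctl describe all` refreshes the description periodically.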
```
dpfctl describe [command] [flags]
```
Available Commands:
Command | Description |
---|---|
all | Describe all DPF resources |
dpuclusters | Describe DPF DPUClusters |
dpudeployments | Describe DPF DPUDeployments |
dpuservices | Describe DPF DPUServices |
dpusets | Describe DPF DPUSets and DPU related resources |
[!NOTE]
Available flags can be listed with `dpfctl describe --help`.
By default, `dpfctl describe` provides an overview of key DPU-related resources and their conditions.
Example output of `dpfctl describe all`:
```
> dpfctl describe all
NAME NAMESPACE READY REASON SINCE MESSAGE
DPFOperatorConfig/dpfoperatorconfig dpf-operator-system True Success 27h
├─DPUClusters
│ └─DPUCluster/dpu-cplane-tenant1 dpu-cplane-tenant1 True HealthCheckPassed 29h
├─DPUServiceChains
│ └─2 DPUServiceChains... dpf-operator-system True Success 27h See hbn-to-fabric, ovn-to-hbn
├─DPUServiceCredentialRequests
│ └─DPUServiceCredentialRequest/ovn-dpu dpf-operator-system True Success 27h
├─DPUServiceIPAMs
│ └─2 DPUServiceIPAMs... dpf-operator-system True Success 27h See loopback, pool1
├─DPUServiceInterfaces
│ └─6 DPUServiceInterfaces... dpf-operator-system True Success 27h See app-sf, ovn, p0, p0-sf, p1, p1-sf
├─DPUServices
│ └─12 DPUServices... dpf-operator-system True Success 27h See doca-blueman-service, doca-hbn, doca-telemetry-service, flannel, multus, nvidia-k8s-ipam,
│ ovn-dpu, ovs-cni, ovs-helper, servicechainset-controller, sfc-controller, sriov-device-plugin
└─DPUSets
 └─DPUSet/dpuset dpf-operator-system
 ├─DPU/worker1-0000-08-00 dpf-operator-system True DPUNodeReady 27h
 └─DPU/worker2-0000-08-00 dpf-operator-system True DPUNodeReady 27h
```
By default, `dpfctl` uses the kubeconfig file at `~/.kube/config`. To use a different kubeconfig file, specify it with the global `--kubeconfig` flag:

```bash
dpfctl describe all --kubeconfig /path/to/kubeconfig
```

Alternatively, set the `KUBECONFIG` environment variable:

```bash
export KUBECONFIG=/path/to/kubeconfig
dpfctl describe all
```

or set it for a single command only:

```bash
KUBECONFIG=/path/to/kubeconfig dpfctl describe all
```
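Setting `KUBECONFIG` per command also makes it easy to check several clusters in one pass. The sketch below assumes a `dpfctl` binary on your `PATH`; the function name `describe_clusters` and the kubeconfig paths in the usage line are hypothetical.

```bash
# Hypothetical helper: run `dpfctl describe all` once per kubeconfig file,
# setting KUBECONFIG only for that single invocation.
describe_clusters() {
  local kubeconfig
  for kubeconfig in "$@"; do
    echo "=== ${kubeconfig} ==="
    KUBECONFIG="${kubeconfig}" dpfctl describe all
  done
}

# Usage (hypothetical paths):
#   describe_clusters ~/.kube/site-a.yaml ~/.kube/site-b.yaml
```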
You can customize the output with flags.
For example, `dpfctl describe all --show-resources=dpuservice` shows only DPUService resources:
```
> dpfctl describe all --show-resources dpuservice
NAME NAMESPACE READY REASON SINCE MESSAGE
DPFOperatorConfig/dpfoperatorconfig dpf-operator-system True Success 27h
└─DPUServices
 └─12 DPUServices... dpf-operator-system True Success 27h See doca-blueman-service, doca-hbn, doca-telemetry-service, flannel, multus, nvidia-k8s-ipam,
 ovn-dpu, ovs-cni, ovs-helper, servicechainset-controller, sfc-controller, sriov-device-plugin
```
To show multiple resource types, pass a comma-separated list to the `--show-resources` flag:
```
> dpfctl describe all --show-resources dpuservice,dpuset
NAME NAMESPACE READY REASON SINCE MESSAGE
DPFOperatorConfig/dpfoperatorconfig dpf-operator-system True Success 27h
├─DPUServices
│ └─12 DPUServices... dpf-operator-system True Success 27h See doca-blueman-service, doca-hbn, doca-telemetry-service, flannel, multus, nvidia-k8s-ipam,
│ ovn-dpu, ovs-cni, ovs-helper, servicechainset-controller, sfc-controller, sriov-device-plugin
└─DPUSets
 └─DPUSet/dpuset dpf-operator-system
 ├─DPU/worker1-0000-08-00 dpf-operator-system True DPUNodeReady 27h
 └─DPU/worker2-0000-08-00 dpf-operator-system True DPUNodeReady 27h
```
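If you switch between resource selections often, a small wrapper can build the comma-separated list for you. This is a sketch only: `describe_only` is a hypothetical name, and it assumes `dpfctl` is available on your `PATH`. Setting `IFS` to a comma locally makes `"$*"` join the arguments with commas.

```bash
# Hypothetical wrapper: take resource kinds as separate arguments and pass
# them to --show-resources as one comma-separated value.
describe_only() {
  local IFS=,   # "$*" joins positional parameters with the first IFS char
  dpfctl describe all --show-resources "$*"
}

# Usage:
#   describe_only dpuservice dpuset
#   # runs: dpfctl describe all --show-resources dpuservice,dpuset
```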
To expand the child objects of a resource, use `--expand-resources`:
```
> dpfctl describe all --show-resources dpuservice --expand-resources dpuservice
NAME NAMESPACE READY REASON SINCE MESSAGE
[...]
├─DPUServices
│ ├─DPUService/doca-blueman-service dpf-operator-system True Success 27h
│ │ └─Application/dpu-cplane-tenant1-doca-blueman-service dpf-operator-system True Success 108s
│ ├─DPUService/doca-hbn dpf-operator-system True Success 27h
│ │ └─Application/dpu-cplane-tenant1-doca-hbn dpf-operator-system True Success 108s
```
[!NOTE]
The `--expand-resources` flag is currently supported only for DPUServices. Support for other resources will be added in future releases.
Grouping combines resources of the same kind into a single row. To disable grouping, set `--grouping=false`:
```
> dpfctl describe all --show-resources dpuservice --grouping=false
NAME NAMESPACE READY REASON SINCE MESSAGE
DPFOperatorConfig/dpfoperatorconfig dpf-operator-system True Success 27h
└─DPUServices
 ├─DPUService/doca-blueman-service dpf-operator-system True Success 27h
 ├─DPUService/doca-hbn dpf-operator-system True Success 27h
 ├─DPUService/doca-telemetry-service dpf-operator-system True Success 27h
 ├─DPUService/flannel dpf-operator-system True Success 27h
 ├─DPUService/multus dpf-operator-system True Success 27h
 ├─DPUService/nvidia-k8s-ipam dpf-operator-system True Success 27h
 ├─DPUService/ovn-dpu dpf-operator-system True Success 27h
 ├─DPUService/ovs-cni dpf-operator-system True Success 27h
 ├─DPUService/ovs-helper dpf-operator-system True Success 27h
 ├─DPUService/servicechainset-controller dpf-operator-system True Success 27h
 ├─DPUService/sfc-controller dpf-operator-system True Success 27h
 └─DPUService/sriov-device-plugin dpf-operator-system True Success 27h
```
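With grouping disabled, every resource sits on its own line, so standard text tools can post-process the tree. As a minimal sketch, the hypothetical function below counts DPUService entries, assuming each one appears as `DPUService/<name>` as in the output above and that `dpfctl` is on your `PATH`.

```bash
# Hypothetical helper: count DPUService rows in the ungrouped output.
# Assumes each DPUService is printed as "DPUService/<name>".
count_dpuservices() {
  dpfctl describe all --show-resources dpuservice --grouping=false \
    | grep -c 'DPUService/'
}

# Usage:
#   count_dpuservices
```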
To show resource conditions, use `--show-conditions`:

```bash
dpfctl describe all --show-conditions dpuservice
```

Example output:

```
NAME NAMESPACE READY REASON SINCE MESSAGE
├─DPUServices
│ ├─DPUService/doca-blueman-service dpf-operator-system True Success 27h
│ │ ├─ApplicationPrereqsReconciled True Success 28h
│ │ ├─ApplicationsReady True Success 27h
│ │ ├─ApplicationsReconciled True Success 28h
│ │ └─DPUServiceInterfaceReconciled True Success 28h
```
Use `all` or `failed` to show all conditions or only failed conditions, respectively.
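When troubleshooting from a script, it can help to reduce the tree to the rows that are not healthy. The sketch below is hypothetical and based only on the output format shown above. It assumes a `Kind/name` row is followed by NAMESPACE and then READY, a condition row is followed directly by its status, and statuses use the Kubernetes values `True`, `False`, and `Unknown`. Grouped rows such as `12 DPUServices...` are not evaluated, so combine it with `--grouping=false`.

```bash
# Hypothetical filter: print each resource or condition whose status is
# False or Unknown, reading dpfctl tree output on stdin.
not_ready() {
  awk '
    {
      name = ""
      # Find the first field that still has text after stripping the
      # tree-drawing prefix characters.
      for (i = 1; i <= NF; i++) {
        name = $i
        sub(/^[^A-Za-z0-9]+/, "", name)
        if (name != "") break
      }
      if (name == "") next
      # Kind/name rows: NAME NAMESPACE READY; condition rows: NAME STATUS.
      status = (index(name, "/") > 0) ? $(i + 2) : $(i + 1)
      if (status == "False" || status == "Unknown") print name, status
    }
  '
}

# Usage:
#   dpfctl describe all --grouping=false --show-conditions dpuservice | not_ready
```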
The NVIDIA® DOCA™ Platform Framework is licensed under Apache 2.0.