Skyhook Operator

Description: Deploy Skyhook Operator, a Kubernetes Operator to manage Node OS customizations.
Publisher: NVIDIA
Latest Version: v0.7.6
Compressed Size: 72.15 KB
Modified: April 11, 2025

skyhook

Skyhook is a Kubernetes-aware package manager that lets cluster administrators safely modify and maintain the underlying hosts declaratively and at scale.

Why Skyhook?

Managing and updating Kubernetes clusters is challenging. Kubernetes advocates treating compute as disposable, but certain scenarios make this difficult:

  • Updating hosts without re-imaging:
    • Limited excess hardware/capacity for rolling replacements
    • Long node replacement times (for example, hours with some cloud providers)
  • OS image management:
    • Maintain a common base image with workload-specific overlays instead of multiple OS images
  • Workload sensitivity:
    • Some workloads can't be moved, are difficult to move, or take a long time to migrate

What is Skyhook?

Skyhook functions like a package manager but for your entire Kubernetes cluster, with three main components:

  1. Skyhook Operator - Manages installing, updating, and removing packages
  2. Skyhook Custom Resource (SCR) - Declarative definitions of changes to apply
  3. Packages - The actual modifications you want to implement

Where and When to use Skyhook

Skyhook works in any Kubernetes environment (self-managed, on-prem, cloud) and shines when you need:

  • Kubernetes-aware scheduling that protects important workloads
  • Rolling or simultaneous updates across your cluster
  • Declarative configuration management for host-level changes

Benefits

  • Native Kubernetes integration - Packages are standard Kubernetes resources compatible with GitOps tools like ArgoCD, Helm, and Flux
  • Autoscaling support - Ensures newly created nodes are properly configured before they become schedulable
  • First-class upgrades - Deploys changes with minimal disruption, waiting for running workloads to complete when needed

Key Features

  • Interruption Budget: a limit on interrupted nodes, expressed as a percentage of nodes or as a count
  • Node Selectors: node labels that select which nodes to apply to
  • Pod Non Interrupt Labels: labels identifying pods that must never be interrupted
  • Package Interrupt: a service restart (containerd, cron, or anything managed by systemd) or a reboot
  • Additional Tolerations: tolerations added to the packages
  • Runtime Required: requires nodes to join the cluster with a custom taint; Skyhook does its work before removing that taint
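
To make the first three features concrete, here is a minimal Python sketch of how a label-based node selector, a percent-or-count interruption budget, and non-interrupt pod labels could interact. It is an illustration only and not Skyhook's implementation: the names Node, select_nodes, max_interruptions, and is_protected are hypothetical, and the rounding rule (round down, but allow at least one node) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node: a name, labels, and pod label sets."""
    name: str
    labels: Dict[str, str] = field(default_factory=dict)
    pod_labels: List[Dict[str, str]] = field(default_factory=list)


def select_nodes(nodes: List[Node], selector: Dict[str, str]) -> List[Node]:
    """Return the nodes whose labels contain every key/value pair in the selector."""
    return [n for n in nodes
            if all(n.labels.get(k) == v for k, v in selector.items())]


def max_interruptions(total_nodes: int,
                      percent: Optional[int] = None,
                      count: Optional[int] = None) -> int:
    """Translate a budget given as a percent OR a count into a node limit."""
    if (percent is None) == (count is None):
        raise ValueError("set exactly one of percent or count")
    if total_nodes <= 0:
        return 0
    if count is not None:
        return max(0, min(count, total_nodes))
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Assumption: round down, but allow at least one node so work can progress.
    return max(1, (total_nodes * percent) // 100)


def is_protected(node: Node, non_interrupt: Dict[str, str]) -> bool:
    """True if any pod on the node carries all of the non-interrupt labels."""
    return any(all(pod.get(k) == v for k, v in non_interrupt.items())
               for pod in node.pod_labels)
```

For example, with ten selected nodes and a budget of 33 percent, max_interruptions(10, percent=33) allows three nodes to be interrupted at a time under the rounding assumption above, while a node running a pod that matches the non-interrupt labels would be left alone.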

Pre-built Packages

A few pre-built, general-purpose packages are available at NVIDIA/skyhook-packages.