Architecture

k8s-sustain is split into three independent components that run as separate processes (different container args in the same image):

┌─────────────────────────────────────────────────────────────────┐
│                        Kubernetes Cluster                       │
│                                                                 │
│  ┌──────────────────┐        ┌──────────────────────────────┐   │
│  │  k8s-sustain     │        │  k8s-sustain-webhook         │   │
│  │  (controller)    │        │  (admission server)          │   │
│  │                  │        │                              │   │
│  │  Watches Policy  │        │  Intercepts Pod CREATE       │   │
│  │  objects and     │        │  requests, injects           │   │
│  │  reconciles      │        │  resources from OnCreate     │   │
│  │  Ongoing-mode    │        │  policies                    │   │
│  │  workloads       │        │                              │   │
│  └────────┬─────────┘        └──────────────┬───────────────┘   │
│           │                                 │                   │
│           │ list / patch                    │ Get Policy        │
│           │                                 │ Get Job/RS        │
│           ▼                                 ▼                   │
│  ┌────────────────────────────────────────────────────────┐     │
│  │                   Kubernetes API Server                │     │
│  └─────────────────────────┬──────────────────────────────┘     │
│                            │                                    │
│           ┌────────────────┼────────────────┐                   │
│           ▼                ▼                ▼                   │
│    Deployments      StatefulSets        CronJobs                │
│    DaemonSets                                                   │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                        Prometheus                        │   │
│  │  k8s_sustain:container_cpu_usage_by_workload:rate5m      │   │
│  │  k8s_sustain:container_memory_by_workload:bytes          │   │
│  └────────────────────────────┬─────────────────────────────┘   │
│                               │                                 │
│  ┌────────────────────────────┴─────────────────────────────┐   │
│  │  k8s-sustain-dashboard (optional)                        │   │
│  │  Web UI: policy exploration, metrics, simulator          │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
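
As a rough illustration of "different container args in the same image", the sketch below maps the first argument to a component. The subcommand names (start, webhook, dashboard) come from the section headings in this document; the componentFor function and the dispatch logic are assumptions made for illustration, not the project's actual entry point.

```go
package main

import "fmt"

// componentFor maps a subcommand to the component it would start.
// The subcommand names follow the section headings in this document;
// the dispatch itself is an illustrative assumption.
func componentFor(args []string) (string, error) {
	if len(args) == 0 {
		return "", fmt.Errorf("missing subcommand")
	}
	switch args[0] {
	case "start":
		return "controller", nil
	case "webhook":
		return "admission webhook", nil
	case "dashboard":
		return "dashboard", nil
	default:
		return "", fmt.Errorf("unknown subcommand %q", args[0])
	}
}

func main() {
	// Each container would pass a different first argument to the same binary.
	for _, args := range [][]string{{"start"}, {"webhook"}, {"dashboard"}} {
		name, err := componentFor(args)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("k8s-sustain %s -> %s\n", args[0], name)
	}
}
```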

Controller (k8s-sustain start)

The controller is a standard controller-runtime reconciler that watches Policy objects.

Reconcile loop:

  1. A Policy event is received (create / update / periodic requeue)
  2. For each workload kind enabled in the policy (deployment, statefulSet, daemonSet, cronJob):
       1. List all objects of that kind cluster-wide
       2. Filter by the k8s.sustain.io/policy annotation in the pod template
       3. Skip workloads with OnCreate mode (handled by the webhook)
  3. For each matching workload:
       1. Query Prometheus for the pN of CPU and memory over the configured window
       2. Compute per-container recommendations (request + limit)
       3. If --recommend-only is set, log the recommendation and skip patching
       4. Recycle stale running pods: on k8s ≥ 1.31 via in-place resource patching (using the /resize subresource on k8s ≥ 1.33); on k8s < 1.31 via the Eviction API (PDB-respecting). The webhook injects the latest resources into replacement pods at creation time

The controller requeues after --reconcile-interval (default 10m).
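
The selection steps of the reconcile loop, and the version-dependent choice made in its last step, can be sketched with plain Go types. This is a minimal sketch, not the controller's code: Workload, Policy, the Modes map, and the returned strategy strings are all hypothetical stand-ins, and a real reconciler would use controller-runtime and the Kubernetes API types instead.

```go
package main

import "fmt"

// PolicyAnnotation is the annotation named in the text above.
const PolicyAnnotation = "k8s.sustain.io/policy"

// Workload is a simplified, hypothetical stand-in for a Deployment,
// StatefulSet, DaemonSet, or CronJob.
type Workload struct {
	Kind                string
	Name                string
	TemplateAnnotations map[string]string // annotations on the pod template
}

// Policy is a hypothetical, flattened view of a Policy object.
// Modes maps an enabled workload kind to "Ongoing" or "OnCreate".
type Policy struct {
	Name  string
	Modes map[string]string
}

// selectWorkloads keeps the workloads the controller should reconcile:
// the kind is enabled, is not in OnCreate mode, and the pod template
// names this policy.
func selectWorkloads(p Policy, all []Workload) []Workload {
	var out []Workload
	for _, w := range all {
		mode, enabled := p.Modes[w.Kind]
		if !enabled || mode == "OnCreate" {
			continue // kind not enabled, or left to the webhook
		}
		if w.TemplateAnnotations[PolicyAnnotation] != p.Name {
			continue // governed by another policy, or by none
		}
		out = append(out, w)
	}
	return out
}

// atLeast reports whether major.minor is at or above wantMajor.wantMinor.
func atLeast(major, minor, wantMajor, wantMinor int) bool {
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

// recycleStrategy follows the version thresholds stated in the list above.
// The returned labels are illustrative only.
func recycleStrategy(major, minor int) string {
	switch {
	case atLeast(major, minor, 1, 33):
		return "in-place via /resize subresource"
	case atLeast(major, minor, 1, 31):
		return "in-place patch"
	default:
		return "eviction"
	}
}

func main() {
	policy := Policy{Name: "web", Modes: map[string]string{"deployment": "Ongoing", "cronJob": "OnCreate"}}
	workloads := []Workload{
		{Kind: "deployment", Name: "api", TemplateAnnotations: map[string]string{PolicyAnnotation: "web"}},
		{Kind: "deployment", Name: "other", TemplateAnnotations: map[string]string{PolicyAnnotation: "batch"}},
		{Kind: "cronJob", Name: "report", TemplateAnnotations: map[string]string{PolicyAnnotation: "web"}},
	}
	for _, w := range selectWorkloads(policy, workloads) {
		fmt.Printf("reconcile %s/%s using %s\n", w.Kind, w.Name, recycleStrategy(1, 32))
	}
}
```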

Admission Webhook (k8s-sustain webhook)

The webhook is a mutating admission webhook that intercepts pods/CREATE requests.

Admission flow:

  1. Pod creation request arrives at the API server
  2. API server calls POST /mutate on the webhook service
  3. Webhook reads k8s.sustain.io/policy from the pod's annotations
  4. Resolves the pod's owner chain to determine the workload kind:
       • Pod → ReplicaSet → Deployment
       • Pod → Job → CronJob
       • Pod → StatefulSet / DaemonSet
  5. Fetches the named Policy from the API server
  6. Checks that the policy has OnCreate mode for that workload kind
  7. Queries Prometheus for current recommendations
  8. Skips containers that already have a CPU request set
  9. If --recommend-only is set, logs the recommendation and allows the pod through unchanged
  10. Returns an RFC 6902 JSON Patch with the recommended resources
  11. The API server applies the patch before persisting the pod
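
Two parts of this flow lend themselves to a short sketch: walking the owner chain and building the JSON Patch. The code below is a minimal illustration under stated assumptions. Owner, Container, Resources, workloadOf, and buildPatch are hypothetical names, the parent lookup stands in for an API call, and the patch path /spec/containers/N/resources is an assumed layout rather than something confirmed by this document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Owner is a simplified, hypothetical owner reference.
type Owner struct{ Kind, Name string }

// workloadOf walks the owner chain described above. parent looks up the
// owner of an intermediate object (ReplicaSet or Job); in a real webhook
// that lookup would be an API call.
func workloadOf(podOwner Owner, parent func(Owner) (Owner, bool)) Owner {
	switch podOwner.Kind {
	case "ReplicaSet", "Job":
		if p, ok := parent(podOwner); ok {
			return p // Deployment or CronJob
		}
	}
	return podOwner // StatefulSet, DaemonSet, or an unowned ReplicaSet/Job
}

// Container is a simplified, hypothetical view of a pod container.
type Container struct {
	Name          string
	HasCPURequest bool
}

// Resources holds a recommendation as Kubernetes quantity strings.
type Resources struct {
	Requests map[string]string `json:"requests"`
	Limits   map[string]string `json:"limits,omitempty"`
}

// PatchOp is one RFC 6902 operation.
type PatchOp struct {
	Op    string    `json:"op"`
	Path  string    `json:"path"`
	Value Resources `json:"value"`
}

// buildPatch emits one "add" operation per container that has a
// recommendation and no CPU request yet. Note that "add" replaces the
// whole resources object; finer-grained paths may be preferable in practice.
func buildPatch(containers []Container, recs map[string]Resources) []PatchOp {
	ops := []PatchOp{}
	for i, c := range containers {
		rec, ok := recs[c.Name]
		if !ok || c.HasCPURequest {
			continue
		}
		ops = append(ops, PatchOp{
			Op:    "add",
			Path:  fmt.Sprintf("/spec/containers/%d/resources", i),
			Value: rec,
		})
	}
	return ops
}

func main() {
	parent := func(o Owner) (Owner, bool) {
		if o.Kind == "ReplicaSet" {
			return Owner{Kind: "Deployment", Name: "api"}, true
		}
		return Owner{}, false
	}
	fmt.Println("workload:", workloadOf(Owner{Kind: "ReplicaSet", Name: "api-5d9f"}, parent))

	containers := []Container{{Name: "app"}, {Name: "sidecar", HasCPURequest: true}}
	recs := map[string]Resources{
		"app":     {Requests: map[string]string{"cpu": "250m", "memory": "256Mi"}},
		"sidecar": {Requests: map[string]string{"cpu": "50m", "memory": "64Mi"}},
	}
	out, err := json.Marshal(buildPatch(containers, recs))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```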

The webhook fails open (failurePolicy: Ignore by default): if it is unreachable or returns an error, the pod is admitted unchanged. The controller continues to reconcile Ongoing-mode workloads regardless.

Dashboard (k8s-sustain dashboard)

The dashboard is an optional web UI that provides:

  • Policy overview — list all policies with status, namespaces, workload types
  • Workload metrics — interactive CPU and memory time-series charts
  • Policy simulator — test "what-if" scenarios with different percentiles, headroom, and min/max values

It is read-only: it queries the Kubernetes API and Prometheus but never modifies any resources. See the Dashboard guide for details.
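
To give a feel for the kind of "what-if" arithmetic a simulator like this might run, the sketch below applies headroom and min/max bounds to an observed percentile value. The formula, the Scenario type, and the convention that Max == 0 means "no upper bound" are assumptions for illustration; this document does not specify how k8s-sustain combines these inputs.

```go
package main

import "fmt"

// Scenario is a hypothetical set of simulator inputs.
type Scenario struct {
	HeadroomPercent float64 // extra margin added on top of the observed value
	Min, Max        float64 // bounds in the same unit as the observed value; Max == 0 means unbounded
}

// recommend applies headroom to an observed percentile value and clamps
// the result to [Min, Max]. This is an assumed formula, not the project's.
func recommend(observed float64, s Scenario) float64 {
	v := observed * (1 + s.HeadroomPercent/100)
	if v < s.Min {
		v = s.Min
	}
	if s.Max > 0 && v > s.Max {
		v = s.Max
	}
	return v
}

func main() {
	// Observed p95 CPU of 200 millicores under three different scenarios.
	scenarios := []Scenario{
		{HeadroomPercent: 25},
		{HeadroomPercent: 25, Max: 220},
		{HeadroomPercent: 0, Min: 300},
	}
	for _, s := range scenarios {
		fmt.Printf("%+v -> %.0fm\n", s, recommend(200, s))
	}
}
```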

Recommend-only mode

When --recommend-only is passed (or recommendOnly: true in the Helm values), all three components continue to operate normally — querying Prometheus, computing recommendations, resolving workloads — but no mutations are applied. Recommendations are emitted as structured JSON log lines at info level.

This is useful for:

  • Validating that the operator produces sensible recommendations before enabling active mode
  • Auditing what changes would be made without risk
  • Running the operator in a staging environment alongside existing resource settings

Prometheus recording rules

k8s-sustain relies on pre-computed recording rules to avoid expensive fan-out queries at reconcile time. The Helm chart installs three rule groups:

Group                          Records
k8s_sustain.workload_mapping   Maps pods to their workload owner (resolves RS→Deployment)
k8s_sustain.cpu_rates          Per-container CPU rate, with and without workload labels
k8s_sustain.memory_rates       Per-container memory working set, with and without workload labels

Percentile computation is not pre-recorded. At reconcile time (controller) or admission time (webhook), the components query k8s_sustain:container_cpu_usage_by_workload:rate5m and k8s_sustain:container_memory_by_workload:bytes using quantile_over_time with the exact quantile and window from the policy. This avoids maintaining fixed-window pre-aggregations that would not match per-policy configuration.

Policy selection

Each workload explicitly declares its policy via the k8s.sustain.io/policy annotation on its pod template. This is a deliberate design choice:

  • No implicit matching — a workload is never accidentally governed by a policy
  • No ambiguity — one annotation, one policy, deterministic behavior
  • Same annotation, two consumers — the webhook reads it from the pod (inherited from the template); the controller reads it from the workload's pod template directly