Architecture
k8s-sustain is split into three independent components that run as separate processes (different container args in the same image):
┌─────────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster                                              │
│                                                                 │
│  ┌──────────────────┐        ┌──────────────────────────────┐   │
│  │ k8s-sustain      │        │ k8s-sustain-webhook          │   │
│  │ (controller)     │        │ (admission server)           │   │
│  │                  │        │                              │   │
│  │ Watches Policy   │        │ Intercepts Pod CREATE        │   │
│  │ objects and      │        │ requests, injects            │   │
│  │ reconciles       │        │ resources from OnCreate      │   │
│  │ Ongoing-mode     │        │ policies                     │   │
│  │ workloads        │        │                              │   │
│  └────────┬─────────┘        └──────────────┬───────────────┘   │
│           │                                 │                   │
│           │ list / patch                    │ Get Policy        │
│           │                                 │ Get Job/RS        │
│           ▼                                 ▼                   │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                  Kubernetes API Server                   │   │
│  └────────────────────────────┬─────────────────────────────┘   │
│                               │                                 │
│              ┌────────────────┼────────────────┐                │
│              ▼                ▼                ▼                │
│         Deployments      StatefulSets      CronJobs             │
│                           DaemonSets                            │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  Prometheus                                              │   │
│  │  k8s_sustain:container_cpu_usage_by_workload:rate5m      │   │
│  │  k8s_sustain:container_memory_by_workload:bytes          │   │
│  └────────────────────────────┬─────────────────────────────┘   │
│                               │                                 │
│  ┌────────────────────────────┴─────────────────────────────┐   │
│  │  k8s-sustain-dashboard (optional)                        │   │
│  │  Web UI: policy exploration, metrics, simulator          │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Controller (k8s-sustain start)
The controller is a standard controller-runtime reconciler that watches Policy objects.
Reconcile loop:
- A Policy event is received (create / update / periodic requeue)
- For each workload kind enabled in the policy (deployment, statefulSet, daemonSet, cronJob):
  - List all objects of that kind cluster-wide
  - Filter by the k8s.sustain.io/policy annotation in the pod template
  - Skip workloads with OnCreate mode (handled by the webhook)
- For each matching workload:
  - Query Prometheus for the configured percentile (pN) of CPU and memory usage over the configured window
  - Compute per-container recommendations (request + limit)
  - If --recommend-only is set, log the recommendation and skip patching
  - Recycle stale running pods: on k8s ≥ 1.31 via in-place resource patching (using the /resize subresource on k8s ≥ 1.33); on k8s < 1.31 via the Eviction API, which respects PodDisruptionBudgets. The webhook injects the latest resources into replacement pods at creation time.
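The per-kind filtering steps above can be sketched as a pure function over workloads that have already been listed. This is a hypothetical Python illustration, not the controller's code (which is a controller-runtime reconciler): the Workload type, the ongoing_targets name, and the {kind: mode} shape used for a policy's per-kind modes are assumptions. Only the annotation key, the kind names, and the OnCreate/Ongoing split come from this page.

```python
from dataclasses import dataclass, field
from typing import Dict, List

POLICY_ANNOTATION = "k8s.sustain.io/policy"


@dataclass
class Workload:
    kind: str  # "deployment", "statefulSet", "daemonSet" or "cronJob"
    name: str
    template_annotations: Dict[str, str] = field(default_factory=dict)


def ongoing_targets(policy_name: str, modes: Dict[str, str],
                    workloads: List[Workload]) -> List[Workload]:
    """Return the workloads one Policy should reconcile in Ongoing mode.

    `modes` is a hypothetical {kind: mode} map, e.g. {"deployment": "Ongoing"};
    kinds absent from it are treated as not enabled in the policy.
    """
    targets = []
    for w in workloads:
        mode = modes.get(w.kind)
        if mode is None:
            continue  # kind not enabled in this policy
        if w.template_annotations.get(POLICY_ANNOTATION) != policy_name:
            continue  # template names another policy, or none at all
        if mode == "OnCreate":
            continue  # left to the webhook at pod creation time
        targets.append(w)
    return targets
```

Keeping the filter free of API calls makes the selection rule easy to test in isolation. Listing and patching stay in the reconciler.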
The controller requeues after --reconcile-interval (default 10m).
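The version-dependent recycle step in the reconcile loop can be sketched as a small decision function. This is a hypothetical Python illustration: Recycle, parse_server_version, and recycle_strategy are invented names. Only the 1.31 and 1.33 cut-offs come from the reconcile loop described above.

```python
from enum import Enum
from typing import Tuple


class Recycle(Enum):
    RESIZE_SUBRESOURCE = "patch resources through the pods/resize subresource"
    IN_PLACE_PATCH = "patch pod resources in place"
    EVICT = "evict through the Eviction API (honors PodDisruptionBudgets)"


def parse_server_version(major: str, minor: str) -> Tuple[int, int]:
    # Some managed distributions report minor as e.g. "31+"; drop the suffix.
    return int(major), int(minor.rstrip("+"))


def recycle_strategy(version: Tuple[int, int]) -> Recycle:
    """Choose how a stale pod is recycled, using the documented cut-offs."""
    if version >= (1, 33):
        return Recycle.RESIZE_SUBRESOURCE
    if version >= (1, 31):
        return Recycle.IN_PLACE_PATCH
    return Recycle.EVICT
```

The inputs would typically come from the API server's /version endpoint, which reports major and minor as strings.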
Admission Webhook (k8s-sustain webhook)
The webhook is a mutating admission webhook that intercepts pods/CREATE requests.
Admission flow:
- Pod creation request arrives at the API server
- API server calls POST /mutate on the webhook service
- Webhook reads k8s.sustain.io/policy from the pod's annotations
- Resolves the pod's owner chain to determine the workload kind:
  - Pod → ReplicaSet → Deployment
  - Pod → Job → CronJob
  - Pod → StatefulSet / DaemonSet
- Fetches the named Policy from the API server
- Checks that the policy has OnCreate mode for that workload kind
- Queries Prometheus for current recommendations
- Skips containers that already have a CPU request set
- If --recommend-only is set, logs the recommendation and allows the pod through unchanged
- Returns an RFC 6902 JSON Patch with the recommended resources
- The API server applies the patch before persisting the pod
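The owner-chain step can be sketched by following the ownerReference marked controller: true, with one extra hop for ReplicaSets and Jobs. This is a hypothetical Python illustration: resolve_workload, controller_ref, and the fetch callback (standing in for a GET of the owner object in the pod's namespace) are invented names. The ownerReferences layout is the standard Kubernetes object metadata.

```python
from typing import Callable, Dict, Optional, Tuple

Ref = Tuple[str, str]  # (kind, name)

# Intermediate owners and the workload kind expected one hop further up.
PARENT_KIND = {"ReplicaSet": "Deployment", "Job": "CronJob"}
DIRECT_KINDS = {"StatefulSet", "DaemonSet"}


def controller_ref(obj: Dict) -> Optional[Ref]:
    """Return the ownerReference flagged controller: true, if there is one."""
    for ref in obj.get("metadata", {}).get("ownerReferences", []):
        if ref.get("controller"):
            return ref["kind"], ref["name"]
    return None


def resolve_workload(pod: Dict, fetch: Callable[[str, str], Dict]) -> Optional[Ref]:
    """Walk the pod's owner chain up to its workload.

    `fetch(kind, name)` is a hypothetical stand-in for a GET of an owner object
    in the pod's namespace.
    """
    owner = controller_ref(pod)
    if owner is None:
        return None  # bare pod, nothing to attribute it to
    kind, name = owner
    if kind in DIRECT_KINDS:
        return owner  # Pod -> StatefulSet / DaemonSet
    if kind in PARENT_KIND:
        parent = controller_ref(fetch(kind, name))  # Pod -> ReplicaSet/Job -> ?
        if parent is not None and parent[0] == PARENT_KIND[kind]:
            return parent
    return None  # in this sketch: standalone ReplicaSet/Job or unsupported owner
```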
The webhook fails open (failurePolicy: Ignore by default) — if it is unreachable or returns an error, the pod is admitted unchanged. The controller will handle ongoing reconciliation regardless.
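The last admission steps (skip containers that already set a CPU request, honor --recommend-only, return an RFC 6902 JSON Patch) can be sketched as follows. The AdmissionReview envelope (apiVersion admission.k8s.io/v1, uid, allowed, patchType JSONPatch, base64-encoded patch) is the standard Kubernetes admission response. The function names and the shape of the recs map are hypothetical, and this sketch replaces a container's whole resources object, which the real webhook may not do.

```python
import base64
import json
from typing import Dict, List


def build_patch(pod: Dict, recs: Dict[str, Dict]) -> List[Dict]:
    """RFC 6902 ops that set resources on containers lacking a CPU request.

    `recs` is a hypothetical {container_name: resources} map, e.g.
    {"app": {"requests": {"cpu": "120m"}, "limits": {"memory": "512Mi"}}}.
    """
    ops = []
    for i, container in enumerate(pod["spec"]["containers"]):
        requests = (container.get("resources") or {}).get("requests") or {}
        if "cpu" in requests or container["name"] not in recs:
            continue  # CPU request already chosen, or nothing to recommend
        # "add" replaces an existing member (RFC 6902, section 4.1), so this
        # sketch overwrites the container's whole resources object.
        ops.append({"op": "add",
                    "path": f"/spec/containers/{i}/resources",
                    "value": recs[container["name"]]})
    return ops


def admission_review(uid: str, pod: Dict, recs: Dict[str, Dict],
                     recommend_only: bool) -> Dict:
    """Wrap the patch in an admission.k8s.io/v1 AdmissionReview response."""
    response = {"uid": uid, "allowed": True}
    ops = build_patch(pod, recs)
    # In recommend-only mode the pod is allowed through unchanged; the real
    # component logs the recommendation instead of patching.
    if ops and not recommend_only:
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(ops).encode()).decode()
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview",
            "response": response}
```

Because the response always carries allowed: true, a missing recommendation never blocks pod creation. This matches the fail-open intent on the webhook side as well.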
Dashboard (k8s-sustain dashboard)
The dashboard is an optional web UI that provides:
- Policy overview — list all policies with status, namespaces, workload types
- Workload metrics — interactive CPU and memory time-series charts
- Policy simulator — test "what-if" scenarios with different percentiles, headroom, and min/max values
It is read-only: it queries the Kubernetes API and Prometheus but never modifies any resources. See the Dashboard guide for details.
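A what-if calculation over the simulator's knobs (percentile, headroom, min/max) can be sketched offline. The way the knobs combine here (pN of usage, scaled by a headroom fraction, clamped to [min, max]) is an assumption for illustration, and what_if and quantile are hypothetical names. The percentile uses linear interpolation at rank q·(n−1), the same scheme PromQL's quantile functions use.

```python
import math
from typing import Sequence


def quantile(q: float, samples: Sequence[float]) -> float:
    """Linear interpolation between neighboring sorted samples (rank = q*(n-1))."""
    if not samples:
        return math.nan
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be within [0, 1]")
    ordered = sorted(samples)
    rank = q * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def what_if(samples: Sequence[float], percentile: float, headroom: float,
            minimum: float, maximum: float) -> float:
    """Hypothetical recommendation: pN of usage, plus headroom, clamped."""
    base = quantile(percentile / 100, samples)
    return min(max(base * (1 + headroom), minimum), maximum)
```

For usage samples 1 to 5, p50 is 3. A 20% headroom gives about 3.6, and a minimum of 4 lifts the result to 4.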
Recommend-only mode
When --recommend-only is passed (or recommendOnly: true in the Helm values), all three components continue to operate normally — querying Prometheus, computing recommendations, resolving workloads — but no mutations are applied. Recommendations are emitted as structured JSON log lines at info level.
This is useful for:
- Validating that the operator produces sensible recommendations before enabling active mode
- Auditing what changes would be made without risk
- Running the operator in a staging environment alongside existing resource settings
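For auditing, the one-object-per-line JSON output can be filtered with a few lines of code. This Python sketch assumes hypothetical key names (level, msg, workload, container, cpu_request, memory_request). The keys k8s-sustain actually emits may differ, so check a real log line before relying on them.

```python
import json
from typing import Dict, Iterable, List


def recommendation_line(workload: str, container: str,
                        cpu_request: str, memory_request: str) -> str:
    """One recommendation as a single-line JSON object (keys are hypothetical)."""
    return json.dumps({
        "level": "info",
        "msg": "recommendation",
        "workload": workload,
        "container": container,
        "cpu_request": cpu_request,
        "memory_request": memory_request,
    }, sort_keys=True)


def collect_recommendations(lines: Iterable[str]) -> List[Dict]:
    """Audit helper: keep recommendation records from a mixed log stream."""
    found = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # plain-text line, not structured output
        if isinstance(record, dict) and record.get("msg") == "recommendation":
            found.append(record)
    return found
```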
Prometheus recording rules
k8s-sustain relies on pre-computed recording rules to avoid expensive fan-out queries at reconcile time. The chart installs three rule groups:
| Group | Records |
|---|---|
| k8s_sustain.workload_mapping | Maps pods to their workload owner (resolves RS→Deployment) |
| k8s_sustain.cpu_rates | Per-container CPU rate, with and without workload labels |
| k8s_sustain.memory_rates | Per-container memory working set, with and without workload labels |
Percentile computation is not pre-recorded. At query time (each reconcile for the controller, each pod admission for the webhook), both components query k8s_sustain:container_cpu_usage_by_workload:rate5m and k8s_sustain:container_memory_by_workload:bytes using quantile_over_time with the exact quantile and window from the policy. This avoids maintaining fixed-window pre-aggregations that would not match per-policy configuration.
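Building such a query from a policy's quantile and window can be sketched as string assembly plus a call to Prometheus' /api/v1/query instant-query endpoint. The series names come from this page. The label names (namespace, workload, container) and the example values 0.95 and 7d are assumptions, and label values are assumed not to contain double quotes.

```python
from urllib.parse import urlencode

CPU_SERIES = "k8s_sustain:container_cpu_usage_by_workload:rate5m"
MEMORY_SERIES = "k8s_sustain:container_memory_by_workload:bytes"


def percentile_query(series: str, quantile: float, window: str,
                     labels: dict) -> str:
    """quantile_over_time over a recorded series, using the policy's q and window.

    The label names passed in `labels` are assumptions about the recording
    rules' output; values are assumed not to contain double quotes.
    """
    if not 0 <= quantile <= 1:
        raise ValueError("quantile must be within [0, 1]")
    matchers = ",".join(f'{k}="{v}"' for k, v in labels.items())
    return f"quantile_over_time({quantile}, {series}{{{matchers}}}[{window}])"


def instant_query_url(base_url: str, promql: str) -> str:
    """URL for Prometheus' HTTP instant-query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"
```

For example, a p95 over 7d for one container yields quantile_over_time(0.95, k8s_sustain:container_cpu_usage_by_workload:rate5m{namespace="default",workload="api",container="app"}[7d]). Changing the policy's percentile or window changes only the query string, so no extra recording rules are needed.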
Policy selection
Each workload explicitly declares its policy via the k8s.sustain.io/policy annotation on its pod template. This is a deliberate design choice:
- No implicit matching — a workload is never accidentally governed by a policy
- No ambiguity — one annotation, one policy, deterministic behavior
- Same annotation, two consumers — the webhook reads it from the pod (inherited from the template); the controller reads it from the workload's pod template directly
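The two lookup locations can be sketched as a single helper. The object layouts are the standard Kubernetes ones: workload controllers keep the pod template under spec.template, and a CronJob nests it under spec.jobTemplate.spec.template. The METADATA_PATH table and the policy_name function are hypothetical names for illustration.

```python
from typing import Dict, Optional, Tuple

POLICY_ANNOTATION = "k8s.sustain.io/policy"

# Where each consumer finds the annotation: pod template metadata for the
# controller's workloads, the pod's own metadata for the webhook.
METADATA_PATH: Dict[str, Tuple[str, ...]] = {
    "Deployment": ("spec", "template", "metadata"),
    "StatefulSet": ("spec", "template", "metadata"),
    "DaemonSet": ("spec", "template", "metadata"),
    "CronJob": ("spec", "jobTemplate", "spec", "template", "metadata"),
    "Pod": ("metadata",),
}


def policy_name(obj: Dict) -> Optional[str]:
    """Return the policy named by an object's annotation, or None."""
    path = METADATA_PATH.get(obj.get("kind", ""))
    if path is None:
        return None  # kind not covered by this sketch
    node = obj
    for key in path:
        node = node.get(key) or {}
    return (node.get("annotations") or {}).get(POLICY_ANNOTATION)
```

Because pods inherit template annotations at creation, a Deployment and each of its pods resolve to the same policy name.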