Architecture¶

k8s-sustain is split into three independent components that run as separate processes (different container args in the same image):

┌─────────────────────────────────────────────────────────────────┐
│                        Kubernetes Cluster                       │
│                                                                 │
│  ┌──────────────────┐        ┌──────────────────────────────┐   │
│  │  k8s-sustain     │        │  k8s-sustain-webhook         │   │
│  │  (controller)    │        │  (admission server)          │   │
│  │                  │        │                              │   │
│  │  Watches Policy  │        │  Intercepts Pod CREATE       │   │
│  │  objects and     │        │  requests, injects           │   │
│  │  reconciles      │        │  resources from OnCreate     │   │
│  │  Ongoing-mode    │        │  policies                    │   │
│  │  workloads       │        │                              │   │
│  └────────┬─────────┘        └──────────────┬───────────────┘   │
│           │                                 │                   │
│           │ list / patch                    │ Get Policy        │
│           │                                 │ Get Job/RS        │
│           ▼                                 ▼                   │
│  ┌────────────────────────────────────────────────────────┐     │
│  │                   Kubernetes API Server                │     │
│  └─────────────────────────┬──────────────────────────────┘     │
│                            │                                    │
│           ┌────────────────┼────────────────┐                   │
│           ▼                ▼                ▼                   │
│    Deployments      StatefulSets        CronJobs                │
│    DaemonSets                                                   │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                        Prometheus                        │   │
│  │  k8s_sustain:container_cpu_usage_by_workload:rate5m      │   │
│  │  k8s_sustain:container_memory_by_workload:bytes          │   │
│  └────────────────────────────┬─────────────────────────────┘   │
│                               │                                 │
│  ┌────────────────────────────┴─────────────────────────────┐   │
│  │  k8s-sustain-dashboard (optional)                        │   │
│  │  Web UI: policy exploration, metrics, simulator          │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

Controller (`k8s-sustain start`)¶

The controller is a standard controller-runtime reconciler that watches Policy objects.

Reconcile loop:

1. A Policy event is received (create, update, or periodic requeue).
2. For each workload kind enabled in the policy (deployment, statefulSet, daemonSet, cronJob):
   - List all objects of that kind cluster-wide.
   - Filter by the k8s.sustain.io/policy annotation in the pod template.
   - Skip workloads with OnCreate mode (handled by the webhook).
3. For each matching workload:
   - Query Prometheus for the configured percentile (pN) of CPU and memory usage over the configured window.
   - Compute per-container recommendations (request + limit).
   - If --recommend-only is set, log the recommendation and skip patching.
   - Update stale running pods. On k8s ≥ 1.31 this uses in-place resource patching (through the /resize subresource on k8s ≥ 1.33). On k8s < 1.31 it uses the Eviction API, which respects PodDisruptionBudgets, and the webhook injects the latest resources into the replacement pods at creation time.
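
The selection and pod-update steps above can be sketched in a few lines. This is an illustrative model only, not the controller's code: the `Workload` class, the string values `"Ongoing"` and `"OnCreate"`, and the strategy names returned by `pod_update_strategy` are assumptions made for the example.

```python
from dataclasses import dataclass, field

POLICY_ANNOTATION = "k8s.sustain.io/policy"


@dataclass
class Workload:
    """Hypothetical stand-in for a Deployment, StatefulSet, DaemonSet or CronJob."""
    kind: str
    name: str
    template_annotations: dict = field(default_factory=dict)


def select_workloads(workloads, policy_name, modes):
    """Return the workloads this policy should reconcile.

    modes maps a workload kind to "Ongoing" or "OnCreate"; a kind that is
    missing from the map is treated as not enabled in the policy.
    """
    selected = []
    for w in workloads:
        if w.template_annotations.get(POLICY_ANNOTATION) != policy_name:
            continue  # governed by another policy, or by none
        if modes.get(w.kind) != "Ongoing":
            continue  # OnCreate kinds are left to the webhook
        selected.append(w)
    return selected


def pod_update_strategy(minor):
    """Pick how stale pods are brought up to date on Kubernetes 1.<minor>."""
    if minor >= 33:
        return "in-place-resize-subresource"
    if minor >= 31:
        return "in-place-patch"
    return "eviction"
```

With a policy named `default` that sets deployments to Ongoing and cronJobs to OnCreate, only annotated Deployments are returned; annotated CronJobs are left to the webhook.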

The controller requeues after --reconcile-interval (default 10m).

Admission Webhook (`k8s-sustain webhook`)¶

The webhook is a mutating admission webhook that intercepts pods/CREATE requests.

Admission flow:

1. Pod creation request arrives at the API server
2. API server calls POST /mutate on the webhook service
3. Webhook reads k8s.sustain.io/policy from the pod's annotations
4. Resolves the pod's owner chain to determine the workload kind:
   - Pod → ReplicaSet → Deployment
   - Pod → Job → CronJob
   - Pod → StatefulSet / DaemonSet
5. Fetches the named Policy from the API server
6. Checks that the policy has OnCreate mode for that workload kind
7. Queries Prometheus for current recommendations
8. Skips containers that already have a CPU request set
9. If --recommend-only is set, logs the recommendation and allows the pod through unchanged
10. Returns an RFC 6902 JSON Patch with the recommended resources
11. The API server applies the patch before persisting the pod
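
As a rough sketch of the patch-building step, the function below produces RFC 6902 operations for the containers of a pod and skips any container that already sets a CPU request. The function names and the shape of `recommendations` are hypothetical. The sketch also replaces the whole `resources` object and ignores init containers, and the source does not say whether the real webhook behaves that way.

```python
import base64
import json


def build_resource_patch(pod, recommendations):
    """Build RFC 6902 operations that set resources on a pod's containers.

    recommendations maps a container name to a dict such as
    {"requests": {...}, "limits": {...}} (an assumed shape).
    """
    ops = []
    for index, container in enumerate(pod["spec"]["containers"]):
        resources = container.get("resources") or {}
        if "cpu" in (resources.get("requests") or {}):
            continue  # an explicit CPU request is left untouched
        rec = recommendations.get(container["name"])
        if rec is None:
            continue  # no recommendation available for this container
        # "add" on an existing object member replaces its value (RFC 6902, 4.1).
        ops.append({
            "op": "add",
            "path": f"/spec/containers/{index}/resources",
            "value": rec,
        })
    return ops


def encode_patch(ops):
    """Base64-encode the patch, as AdmissionReview responses carry it."""
    return base64.b64encode(json.dumps(ops).encode("utf-8")).decode("ascii")
```

In an `admission.k8s.io/v1` response, the encoded string goes in the `patch` field alongside `patchType: JSONPatch`. An empty operation list corresponds to admitting the pod unchanged.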

The webhook fails open (failurePolicy: Ignore by default): if it is unreachable or returns an error, the pod is admitted unchanged. The controller continues to reconcile Ongoing-mode workloads regardless.

Dashboard (`k8s-sustain dashboard`)¶

The dashboard is an optional web UI that provides:

Policy overview — list all policies with status, namespaces, workload types
Workload metrics — interactive CPU and memory time-series charts
Policy simulator — test "what-if" scenarios with different percentiles, headroom, and min/max values

It is read-only: it queries the Kubernetes API and Prometheus but never modifies any resources. See the Dashboard guide for details.
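
To make the "what-if" idea concrete, here is a minimal sketch of how a simulated request could be derived from usage samples. The formula (percentile of usage, plus a headroom percentage, clamped to a minimum and maximum) is an assumption for illustration, and both function names are hypothetical. The percentile uses the nearest-rank method for simplicity, whereas Prometheus' `quantile_over_time` interpolates linearly, so results can differ slightly from what the operator computes.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile of samples, for 0 < q <= 1."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def simulate_request(samples, q, headroom_pct, minimum, maximum):
    """Percentile of usage plus headroom, clamped to [minimum, maximum]."""
    base = percentile(samples, q)
    with_headroom = base * (1 + headroom_pct / 100)
    return min(max(with_headroom, minimum), maximum)
```

Running the same samples through different values of `q` and `headroom_pct` shows how sensitive a recommendation is to a single usage spike.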

Recommend-only mode¶

When --recommend-only is passed (or recommendOnly: true in the Helm values), all three components continue to operate normally (querying Prometheus, computing recommendations, resolving workloads), but no mutations are applied. Recommendations are emitted as structured JSON log lines at info level.

This is useful for:

Validating that the operator produces sensible recommendations before enabling active mode
Auditing what changes would be made without risk
Running the operator in a staging environment alongside existing resource settings
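
The gate that recommend-only mode implies can be sketched as follows. This is a simplified illustration: `apply_or_log` is a hypothetical helper, and the field names in the log record are invented for the example rather than taken from the operator's actual log schema.

```python
import json
import logging


def apply_or_log(recommend_only, recommendation, apply, logger):
    """Apply a recommendation, or only log it when recommend_only is set.

    Returns True if apply() was called, False if the change was only logged.
    """
    if recommend_only:
        # Illustrative record layout, not the operator's real log format.
        record = {"msg": "recommendation", "recommendOnly": True, **recommendation}
        logger.info(json.dumps(record, sort_keys=True))
        return False
    apply(recommendation)
    return True
```

Everything up to the call (queries, owner resolution, computation) runs in both modes. Only the final mutation is conditional, which is what makes the logged values representative of what active mode would do.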

Prometheus recording rules¶

k8s-sustain relies on pre-computed recording rules to avoid expensive fan-out queries at reconcile time. The chart installs three rule groups:

| Group | Records |
| --- | --- |
| `k8s_sustain.workload_mapping` | Maps pods to their workload owner (resolves ReplicaSet → Deployment) |
| `k8s_sustain.cpu_rates` | Per-container CPU rate, with and without workload labels |
| `k8s_sustain.memory_rates` | Per-container memory working set, with and without workload labels |

Percentile computation is not pre-recorded. At reconcile time the controller and webhook query k8s_sustain:container_cpu_usage_by_workload:rate5m and k8s_sustain:container_memory_by_workload:bytes using quantile_over_time with the exact quantile and window from the policy. This avoids maintaining fixed-window pre-aggregations that would not match per-policy configuration.
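
A query of this kind could be assembled as shown below. The recording-rule names come from this page, but the label names (`namespace`, `workload`, `container`) and the helper itself are assumptions, since the actual label set of the rules is not documented here.

```python
def quantile_query(metric, quantile, window, namespace, workload, container):
    """Build a quantile_over_time expression over a recorded series.

    quantile is a fraction between 0 and 1; window is a Prometheus
    duration such as "7d". Label names are assumed for illustration.
    """
    if not 0 <= quantile <= 1:
        raise ValueError("quantile must be between 0 and 1")
    selector = (
        f'{metric}{{namespace="{namespace}",'
        f'workload="{workload}",container="{container}"}}'
    )
    return f"quantile_over_time({quantile}, {selector}[{window}])"
```

For example, a policy asking for the 95th percentile over seven days would yield `quantile_over_time(0.95, k8s_sustain:container_cpu_usage_by_workload:rate5m{...}[7d])`. Because the quantile and window are filled in per query, two policies with different settings can share the same recorded series.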

Policy selection¶

Each workload explicitly declares its policy via the k8s.sustain.io/policy annotation on its pod template. This is a deliberate design choice:

No implicit matching — a workload is never accidentally governed by a policy
No ambiguity — one annotation, one policy, deterministic behavior
Same annotation, two consumers — the webhook reads it from the pod (inherited from the template); the controller reads it from the workload's pod template directly
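
The two lookup paths can be illustrated with plain dictionaries standing in for Kubernetes objects. The helper names are hypothetical. The field paths follow the standard Kubernetes object layout, in which a CronJob nests its pod template under `spec.jobTemplate.spec.template`.

```python
POLICY_ANNOTATION = "k8s.sustain.io/policy"


def _policy_from_metadata(obj):
    """Read the policy annotation from an object's metadata, if present."""
    annotations = (obj.get("metadata") or {}).get("annotations") or {}
    return annotations.get(POLICY_ANNOTATION)


def policy_for_pod(pod):
    """Webhook side: the pod carries the annotation copied from its template."""
    return _policy_from_metadata(pod)


def policy_for_workload(workload):
    """Controller side: read the annotation from the workload's pod template."""
    spec = workload["spec"]
    if workload["kind"] == "CronJob":
        template = spec["jobTemplate"]["spec"]["template"]
    else:
        template = spec["template"]
    return _policy_from_metadata(template)
```

Both functions return `None` for an object without the annotation, which matches the "no implicit matching" rule: an unannotated workload is simply not governed by any policy.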
