Annotations

Pod annotations control per-workload behavior. Set them in the pod template metadata of a workload whose pod spec has runtimeClassName: runc-architect. Each entry lists its default and what it requires.

managed-containers

architect.loopholelabs.io/managed-containers: '["container-1", "container-2"]'

Which containers Architect manages. Unlisted containers run normally.

Default: none. · Requires: runtimeClassName: runc-architect.
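A minimal sketch of where the annotation sits, assuming a Deployment with two containers. The names example-app, app, and log-shipper and both images are hypothetical placeholders, not part of Architect:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                 # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
      annotations:
        # Only "app" is managed by Architect; "log-shipper" is unlisted and runs normally.
        architect.loopholelabs.io/managed-containers: '["app"]'
    spec:
      runtimeClassName: runc-architect
      containers:
        - name: app
          image: registry.example.com/app:1.0           # placeholder image
        - name: log-shipper
          image: registry.example.com/log-shipper:1.0   # placeholder image
```

The annotation value is a JSON array inside a YAML string, so the single quotes around it are needed.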

scaledown-durations

architect.loopholelabs.io/scaledown-durations: '{"container-1":"30s", "container-2":"60s"}'

Idle time before a container hibernates.

Default: 60s. · Requires: managed-containers.

initial-scaledown-delays

architect.loopholelabs.io/initial-scaledown-delays: '{"container-1":"90s"}'

Grace period (a Go duration string) that suppresses hibernation for the configured duration after the container's first scale-up. Useful for slow-starting workloads (for example JVMs whose readiness probes take longer than scaledown-durations) so they are not hibernated mid-startup. Normal activity-based scale-down resumes after the window elapses. The window is not re-armed after a migration or post-scale-down restart, since the workload is already past its slow startup by then.

Default: 0 (disabled). Values are clamped to 24h. · Requires: managed-containers.
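As an illustration of how the grace period combines with scaledown-durations, here is a sketch of pod template metadata for a slow-starting workload. The container name jvm-app and the specific durations are assumptions chosen for the example:

```yaml
metadata:
  annotations:
    architect.loopholelabs.io/managed-containers: '["jvm-app"]'
    # Hibernate after 30s of idleness once the grace period below has elapsed.
    architect.loopholelabs.io/scaledown-durations: '{"jvm-app":"30s"}'
    # Suppress hibernation for 2 minutes after the container's first scale-up.
    architect.loopholelabs.io/initial-scaledown-delays: '{"jvm-app":"2m"}'
```

With these values the container would not be hibernated during its first two minutes, even if it sees no traffic while it starts.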

network-monitor

architect.loopholelabs.io/network-monitor: '{"container-1":"packets", "container-2":"connections"}'

Enables network-based wake: a scaled-down container wakes when it receives network traffic. An eBPF program in the pod's network namespace watches the container's declared ports and triggers a scale-up. Without this annotation, the only way to wake a scaled-down container is kubectl exec.

Modes:

  • packets: wake on any incoming TCP/UDP packet on a tracked port. Suits sporadic request/response workloads such as HTTP APIs and webhook receivers.
  • connections: TCP only. Wake on connection establishment and stay awake while any TCP connection is open. Suits long-lived connection patterns such as databases, message brokers, and gRPC servers. A client that holds a pooled connection open indefinitely keeps the container awake.

Activity is tracked per port. Architect monitors only the ports the container declares in its ports array. Shadow ports injected by health-check-proxy and shadow-ports are added to that array so Kubernetes Services can target them, but Architect ignores traffic on them when assessing activity. The traffic still reaches the application; it just does not keep the container running.

Activity is also scoped per container. Sidecars sharing the pod's network namespace (Istio sidecars, fluentd, and the like) do not keep the managed container awake, and outbound traffic from an ephemeral source port does not count. A workload whose only traffic is outbound from ephemeral ports should use disable-autoscaledown-containers.

Default: off. · Requires: managed-containers.
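Because only declared ports are monitored, the annotation and the ports array work together. The fragment below is a sketch of a pod template with one container per mode; the container names, images, and port numbers are hypothetical:

```yaml
metadata:
  annotations:
    architect.loopholelabs.io/managed-containers: '["api", "db"]'
    # "api" wakes on any packet to a declared port;
    # "db" wakes on TCP connection establishment and stays awake while a connection is open.
    architect.loopholelabs.io/network-monitor: '{"api":"packets", "db":"connections"}'
spec:
  runtimeClassName: runc-architect
  containers:
    - name: api
      image: registry.example.com/api:1.0   # placeholder image
      ports:
        - containerPort: 8080               # monitored because it is declared
    - name: db
      image: registry.example.com/db:1.0    # placeholder image
      ports:
        - containerPort: 5432               # monitored because it is declared
```

A port the application listens on but does not declare here would not be monitored.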

health-check-proxy

architect.loopholelabs.io/health-check-proxy: '{"mappings":[{"containerName":"app","appPort":8080,"shadowPort":9080}]}'

Lets kubelet liveness, readiness, and startup probes pass while the container is scaled down, without waking it. Probes are pointed at the shadowPort; Architect injects an architect-health-check-proxy sidecar that forwards probes to the application while it runs and answers them itself while it is scaled down, so kubelet keeps seeing a healthy response. Without this, every probe hits the application port and counts as activity, so a probed container never scales down.

Mapping fields:

  • containerName (required): a container in managed-containers.
  • appPort (required, 1 to 65535): the application's real probe port.
  • shadowPort (required, 1 to 65535): the port to point probes at.

Duplicate shadowPort values across mappings are dropped with a warning. The sidecar is not added (and a warning is logged) if managed-containers or network-monitor is missing.

See Examples for a worked example, and Troubleshooting if probes still wake the container.

Default: none. · Requires: managed-containers, network-monitor.
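A sketch of how the pieces might fit together, assuming the probe is pointed at the shadowPort by hand. The image, the /healthz path, and the probe period are hypothetical:

```yaml
metadata:
  annotations:
    architect.loopholelabs.io/managed-containers: '["app"]'
    architect.loopholelabs.io/network-monitor: '{"app":"packets"}'
    architect.loopholelabs.io/health-check-proxy: '{"mappings":[{"containerName":"app","appPort":8080,"shadowPort":9080}]}'
spec:
  runtimeClassName: runc-architect
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      ports:
        - containerPort: 8080               # real application port (appPort)
      readinessProbe:
        httpGet:
          path: /healthz                    # hypothetical probe path
          port: 9080                        # shadowPort, not appPort
        periodSeconds: 10
```

Leaving the probe on port 8080 would make every probe count as activity, which is the situation this annotation is meant to avoid.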

shadow-ports

architect.loopholelabs.io/shadow-ports: '{"mappings":[{"containerName":"app","appPort":9090,"shadowPort":29090}]}'

Lets a scraper (Prometheus, an external health check, a debug tool) reach an application port without counting as activity, so regular scrapes do not keep the container awake. The scraper is pointed at the shadowPort; traffic still reaches the application on the real port, and the application is unaware of the redirect. Without this, a recurring scrape looks like continuous traffic and the container never scales down.

Mapping fields:

  • containerName (required): a container in managed-containers.
  • appPort (required, 1 to 65535): the real port the application listens on.
  • shadowPort (required, 1 to 65535): the port to point the scraper at.

Duplicate shadowPort values are dropped with a warning. The shadow ports are not added (and a warning is logged) if managed-containers or network-monitor is missing. When the scraper cannot be moved to a different port (for example it is hard-coded in Prometheus discovery), use ignore-activity-ports instead.

See Examples for a worked example, and Troubleshooting if scrapes still wake the container.

Default: none. · Requires: managed-containers, network-monitor.
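One way to expose a shadow port to a scraper is through a Service that targets it. The sketch below pairs the annotation with such a Service; the Service name, the selector label, and the choice of "packets" mode are assumptions for illustration:

```yaml
# Pod template fragment
metadata:
  labels:
    app: example-app                        # hypothetical label
  annotations:
    architect.loopholelabs.io/managed-containers: '["app"]'
    architect.loopholelabs.io/network-monitor: '{"app":"packets"}'
    architect.loopholelabs.io/shadow-ports: '{"mappings":[{"containerName":"app","appPort":9090,"shadowPort":29090}]}'
---
# Service the scraper would use instead of hitting port 9090 directly
apiVersion: v1
kind: Service
metadata:
  name: app-metrics                         # hypothetical name
spec:
  selector:
    app: example-app
  ports:
    - name: metrics
      port: 9090
      targetPort: 29090                     # shadowPort: traffic here does not count as activity
```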

ignore-activity-ports

architect.loopholelabs.io/ignore-activity-ports: '{"container-1":[9091, 9100]}'

Marks specific ports on the container's existing port spec as conntrack-bypassed, so traffic to them does not count as activity. Unlike shadow-ports there is no DNAT and no new port is injected; setting the annotation asserts that the listed ports are already declared on the container and that the application already listens on them. Use this when a metrics scraper hits the real application port directly and should not keep the workload awake.

Default: none. · Requires: managed-containers, network-monitor.
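Since no port is injected, the listed ports have to appear in the container's own ports array. A sketch under that assumption, with a hypothetical container that serves requests on 8080 and metrics on 9091:

```yaml
metadata:
  annotations:
    architect.loopholelabs.io/managed-containers: '["app"]'
    architect.loopholelabs.io/network-monitor: '{"app":"packets"}'
    # Traffic to 9091 reaches the application but does not count as activity.
    architect.loopholelabs.io/ignore-activity-ports: '{"app":[9091]}'
spec:
  runtimeClassName: runc-architect
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      ports:
        - containerPort: 8080               # request traffic, still counts as activity
        - containerPort: 9091               # metrics port, already declared and listened on
```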

postmigration-autoscaleup-containers

architect.loopholelabs.io/postmigration-autoscaleup-containers: '["container-1"]'

Containers that automatically scale up after migration. By default they stay hibernated to avoid a thundering herd.

Default: off (containers stay hibernated after migration). · Requires: managed-containers.

disable-autoscaledown-containers

architect.loopholelabs.io/disable-autoscaledown-containers: '["container-1"]'

Prevents automatic hibernation. Useful for background jobs that should migrate but not hibernate on idle.

Default: off (containers hibernate on idle). · Requires: managed-containers.

scaleup-timeout-containers

architect.loopholelabs.io/scaleup-timeout-containers: '{"container-1": "60s"}'

How long to wait for a checkpoint during startup.

Default: 30s.

migrate-emptydir-containers

architect.loopholelabs.io/migrate-emptydir-containers: '["container-1"]'

Preserves emptyDir volume data during migration. By default, emptyDir volumes are not migrated.

Default: off (emptyDir not migrated). · Requires: managed-containers.
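For illustration, a pod template fragment that opts a container's emptyDir data into migration. The container name, image, volume name, and mount path are hypothetical:

```yaml
metadata:
  annotations:
    architect.loopholelabs.io/managed-containers: '["worker"]'
    # Carry the emptyDir contents mounted by "worker" across a migration.
    architect.loopholelabs.io/migrate-emptydir-containers: '["worker"]'
spec:
  runtimeClassName: runc-architect
  containers:
    - name: worker
      image: registry.example.com/worker:1.0   # placeholder image
      volumeMounts:
        - name: scratch
          mountPath: /var/scratch
  volumes:
    - name: scratch
      emptyDir: {}
```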

sparse-files-containers

architect.loopholelabs.io/sparse-files-containers: '{"container-1": ["/var/cache/app.db"]}'

Recreates the listed files as sparse files (same size and mode, contents zeroed) at the destination instead of copying their bytes through the upper-layer snapshot, and skips them on the source so the migration avoids the per-byte snapshot cost. Use for workloads that re-scan or rewrite the file post-restore (caches, generated artifacts, scratch space). Workloads that read the original contents after migration see zeros.

Default: none.

lazy-pages-migration-containers

Experimental. Only enable when advised by Loophole Labs.

architect.loopholelabs.io/lazy-pages-migration-containers: '["container-1"]'

Enables CRIU lazy-pages migration, fetching memory pages on demand from the source pod during restore instead of copying everything upfront. Helps with memory-heavy containers. Falls back to eager migration if lazy-pages migration fails.

Default: off.

lazy-pages-restore-timeout-containers

Experimental. Only enable when advised by Loophole Labs.

architect.loopholelabs.io/lazy-pages-restore-timeout-containers: '{"container-1":"30s"}'

Bounds how long a lazy-pages restore waits for memory pages from the source before falling back to a fresh start. Useful when the source page-server is unreachable but the underlying TCP connection appears healthy. Values are Go duration strings.

Default: 0 (disabled). Values are clamped to 24h.

rewrite-listener-addresses-containers

architect.loopholelabs.io/rewrite-listener-addresses-containers: '["container-1"]'

Rewrites listener socket addresses in CRIU checkpoints during migration. When an application binds to the pod IP (rather than 0.0.0.0), the listener address becomes invalid on the destination pod. This rewrites those addresses to INADDR_ANY (0.0.0.0) or in6addr_any (::) so the restore succeeds.

Default: off.

rewrite-established-addresses-containers

architect.loopholelabs.io/rewrite-established-addresses-containers: '["container-1"]'

Rewrites the source IP of established TCP connections in CRIU checkpoints during migration. The source pod's IP no longer exists on the destination pod, which causes CRIU's socket restore to fail. This rewrites the source address to the new pod's IP (read from /etc/hosts). Supports IPv4 and IPv6.

Default: off.

start-from-persistent-checkpoint

# Same namespace (name only):
architect.loopholelabs.io/start-from-persistent-checkpoint: "persistent-checkpoint-name"
# Cross-namespace (namespace/name):
architect.loopholelabs.io/start-from-persistent-checkpoint: "namespace/persistent-checkpoint-name"

Restores from a PersistentCheckpoint CRD on startup. A bare name is looked up in the pod's namespace; use namespace/name to reference a PersistentCheckpoint in a different namespace. When set, this takes priority over pod-template-hash-based Checkpoint CRDs. On any failure (not found, empty, download error, registry storage) the pod starts fresh rather than falling back to the migration path.

Default: none.

checkpoint-engine

Experimental. Only enable when advised by Loophole Labs.

architect.loopholelabs.io/checkpoint-engine: "cruise"

Selects the checkpoint/restore engine for the pod's managed containers. Set it to cruise to route runc checkpoint/restore to the in-tree cruise engine instead of CRIU. This is a pod-global setting; unmanaged containers in the pod are never checkpointed.

Default: criu.