Architect for Kubernetes revolutionizes Kubernetes cost optimization by enabling pods to hibernate in place when idle and wake instantly (<50ms) when needed.
Unlike traditional autoscaling solutions that delete pods and cause cold starts, Architect keeps pods scheduled while reducing their resource consumption to zero during idle periods.
Key Benefits
- Zero idle costs: Hibernated pods consume no CPU or memory
- Instant wake times: Pods restore in <50ms vs 30-60+ seconds for cold starts
- No application changes: Works with existing workloads
- Pods stay scheduled: No delays from rescheduling, PVC mounting, or service registration
Quick Start
Want to see Architect in action immediately? Here's the fastest way to get started:
# 1. Label nodes
kubectl label nodes <node-name> architect.loopholelabs.io/node=true
kubectl label nodes <node-name> architect.loopholelabs.io/critical-node=true
# 2. Install Architect from https://console.architect.io/ (requires GitHub account)
helm uninstall -n architect architect || true
helm install architect oci://ghcr.io/loopholelabs/architect-chart \
--namespace architect --create-namespace \
--set kubernetesDistro="eks" \
--set apiUrl="https://api.architect.io" \
--set machineToken="mymachinetoken" \
--set clusterName="myclustername" --wait
# 3. Deploy the example Go application
helm uninstall example-go || true
helm install example-go oci://ghcr.io/loopholelabs/example-go-chart --wait
# 4. Watch the pod hibernate after 10 seconds of inactivity
kubectl get pods -w
# 5. Wake it up instantly with a request
kubectl exec -it <example-go-pod> -- curl localhost:8080
# 6. Observe the resource savings
kubectl top pods
We have a number of example applications that already have Architect annotations applied and can be deployed with a single command:
# Go
helm upgrade example-go \
oci://ghcr.io/loopholelabs/example-go-chart \
--install --wait
# Kafka
helm upgrade example-kafka \
oci://ghcr.io/loopholelabs/example-kafka-chart \
--install --wait
# PHP Wordpress
helm upgrade example-php-wordpress \
oci://ghcr.io/loopholelabs/example-php-wordpress-chart \
--install --wait
# Postgres
helm upgrade example-postgres \
oci://ghcr.io/loopholelabs/example-postgres-chart \
--install --wait
# Python
helm upgrade example-python \
oci://ghcr.io/loopholelabs/example-python-chart \
--install --wait
# Ruby
helm upgrade example-ruby \
oci://ghcr.io/loopholelabs/example-ruby-chart \
--install --wait
# Rust
helm upgrade example-rust-miniserve \
oci://ghcr.io/loopholelabs/example-rust-miniserve-chart \
--install --wait
# Spring Boot
helm upgrade example-spring-boot \
oci://ghcr.io/loopholelabs/example-spring-boot-chart \
--install --wait
# Valkey
helm upgrade example-valkey \
oci://ghcr.io/loopholelabs/example-valkey-chart \
--install --wait
How Does It Work?
Architect continuously monitors your pods for activity. When a pod becomes idle (no network traffic for a configured duration), Architect:
- Creates a checkpoint of the complete pod state (memory, file descriptors, network connections)
- Hibernates the pod in place, reducing resource requests to zero
- Keeps the pod scheduled and registered with services
- Instantly restores the pod when traffic arrives or when it is accessed via kubectl exec
Wake Triggers
Pods automatically wake from hibernation when:
- kubectl exec commands - Running commands in the container wakes it instantly
- Network traffic arrives (coming soon) - Any incoming network packet triggers immediate restoration
- API calls (coming soon) - Programmatic wake/sleep control via Architect API
Installation
Prerequisites
- Kubernetes cluster version 1.32 or higher (1.33 is required for pod sleeping)
- Helm 3 or higher
- Nodes where Architect workloads will run must be labeled
- On Amazon EKS, must use AL2023 AMI (AL2 is not supported)
Step 1: Install Architect
Sign into https://console.architect.io/ and click on the + Add Cluster button, then follow the instructions.
If you want to manage your Architect installation via GitOps, you may want to create the following secret:
kubectl create secret generic architect-secrets \
--from-literal=machineToken="YOUR_MACHINE_TOKEN" \
--namespace=architect \
--output=yaml \
--dry-run=client \
| kubectl apply --filename -
And then install Architect by referencing this secret via the secretRef Helm chart property (e.g. --set secretRef=architect-secrets).
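Assuming the secret above was created as architect-secrets, the install command from the Quick Start can then pass secretRef in place of the inline token (the other values shown are the same placeholders used earlier):

```shell
helm install architect oci://ghcr.io/loopholelabs/architect-chart \
  --namespace architect --create-namespace \
  --set kubernetesDistro="eks" \
  --set apiUrl="https://api.architect.io" \
  --set secretRef=architect-secrets \
  --set clusterName="myclustername" --wait
```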
Step 2: Verify Installation
# Check that all Architect components are running
kubectl get pods -n architect
# You should see:
# - architect-manager (admission controller)
# - architect-control-plane
# - architectd pods on each labeled node
Configuration
To enable Architect for your workloads, you need to:
- Set the runtime class to runc-architect
- Specify which containers to manage
- Configure idle timeouts
Basic Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  template:
    metadata:
      annotations:
        # Specify which containers Architect should manage
        architect.loopholelabs.io/managed-containers: '["my-app-container"]'
        # Set idle timeout (optional, default is 10s)
        architect.loopholelabs.io/scaledown-durations: '{"my-app-container":"30s"}'
    spec:
      runtimeClassName: runc-architect # Required
      containers:
      - name: my-app-container
        image: my-app:latest
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
Configuration Options
Runtime Class (Required)
spec:
runtimeClassName: runc-architect
This tells Kubernetes to use Architect's custom runtime for this pod. Two runtime classes are available:
- runc-architect: Standard runtime based on runc. Containers automatically hibernate, reducing their resource usage after a period of inactivity, and wake on activity. Uses the managed-containers annotation.
- runsc-architect: gVisor-based runtime with enhanced security isolation. Uses the managed-pod annotation and PersistentCheckpoint CRDs for checkpoint creation. See Example 4 for details.
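A quick way to confirm that both runtime classes were registered on your cluster after installation:

```shell
kubectl get runtimeclass
# The list should include runc-architect and runsc-architect
```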
Managed Containers Annotation
architect.loopholelabs.io/managed-containers: '["container-1", "container-2"]'
- Lists which containers in the pod should be managed by Architect
- Containers not in this list run normally without hibernation
- Useful for excluding sidecar containers (e.g., logging agents)
Scale-down Durations Annotation
architect.loopholelabs.io/scaledown-durations: '{"container-1":"30s", "container-2":"60s"}'
- Sets how long a container must be idle before hibernating
- Format: JSON object with container names as keys and durations as values
- Default: 10 seconds if not specified
- Minimum: 1s, Maximum: unlimited
Post-Migration Auto Scale Up Containers Annotation
architect.loopholelabs.io/postmigration-autoscaleup-containers: '["container-1", "container-2"]'
- Lists which containers should automatically scale back up after a migration (by default, containers stay scaled down so as not to cause a thundering herd on migrations)
Disable Auto Scale Down Containers Annotation
architect.loopholelabs.io/disable-autoscaledown-containers: '["container-1", "container-2"]'
- Lists which containers should not automatically scale down. By default, containers scale down after the duration set in the scale-down durations annotation. If a container is listed in this annotation, it will never scale down automatically
- Mostly useful for long-running background jobs that should still be migrated by default, but not scale down when there is no traffic
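As a sketch, a long-running worker that should stay managed (and therefore migratable) but never hibernate could combine the annotations like this; the deployment and container names are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker            # hypothetical workload
spec:
  replicas: 1
  template:
    metadata:
      annotations:
        # Worker stays managed by Architect, so it can still be migrated
        architect.loopholelabs.io/managed-containers: '["worker"]'
        # ...but it never hibernates on inactivity
        architect.loopholelabs.io/disable-autoscaledown-containers: '["worker"]'
    spec:
      runtimeClassName: runc-architect
      containers:
      - name: worker
        image: mycompany/batch-worker:latest
```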
Scale-Up Timeout Containers Annotation
architect.loopholelabs.io/scaleup-timeout-containers: '{"container-1": "60s", "container-2": "2m"}'
- Sets how long a container should wait for a checkpoint to become available during scale-up
- Format: JSON object with container names as keys and durations as values
- Default: 30 seconds if not specified
- When a new pod starts and other pods with the same template hash exist, the container waits up to this timeout for a checkpoint CRD to be advertised before aborting the checkpoint download and starting a fresh container
- Only applies to runc-architect. This annotation is ignored for runsc-architect pods, since gVisor pods do not use pod-template-hash-based checkpoint lookup
Managed Pod Annotation (gVisor only)
architect.loopholelabs.io/managed-pod: "true"
- Used with the runsc-architect runtime class (gVisor) instead of managed-containers
- When set to "true", the entire pod is managed by Architect
- Checkpoints for the runsc-architect runtime class are pod-scoped (not container-scoped), so all containers in the pod are checkpointed together
- Required for using PersistentCheckpoint CRDs
Migrate EmptyDir Containers Annotation
architect.loopholelabs.io/migrate-emptydir-containers: '["container-1", "container-2"]'
- Lists which containers should have their emptyDir volumes preserved during migrations
- By default, emptyDir volumes are ephemeral and not migrated
- When a container is listed in this annotation, its emptyDir volume data is snapshotted during checkpoint and restored after migration
- Useful for applications that store important temporary data in emptyDir volumes
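For example, a container writing scratch data to an emptyDir volume could opt into volume migration like this (the names and mount path are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cache-app               # hypothetical workload
spec:
  replicas: 1
  template:
    metadata:
      annotations:
        architect.loopholelabs.io/managed-containers: '["cache"]'
        # Snapshot and restore the emptyDir contents across migrations
        architect.loopholelabs.io/migrate-emptydir-containers: '["cache"]'
    spec:
      runtimeClassName: runc-architect
      containers:
      - name: cache
        image: mycompany/cache:latest
        volumeMounts:
        - name: scratch
          mountPath: /tmp/scratch
      volumes:
      - name: scratch
        emptyDir: {}
```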
Start From Persistent Checkpoint Annotation (gVisor only)
architect.loopholelabs.io/start-from-persistent-checkpoint: "namespace/persistent-checkpoint-name"
- Specifies a PersistentCheckpoint CRD to restore from
- Used with the runsc-architect runtime class (gVisor)
- Enables creating new pods from a "golden image" checkpoint
- Allows restoring from a checkpoint in a different namespace
- If the referenced PersistentCheckpoint doesn't exist or has no checkpoint data, the pod starts fresh
Examples
Example 1: Web API Service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 10 # Can now overprovision without cost penalty
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        architect.loopholelabs.io/managed-containers: '["api"]'
        architect.loopholelabs.io/scaledown-durations: '{"api":"30s"}'
    spec:
      runtimeClassName: runc-architect
      containers:
      - name: api
        image: mycompany/api:v2.1
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
Example 2: Microservices with Sidecar
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 15
  template:
    metadata:
      annotations:
        # Only manage the main container, not the sidecar
        architect.loopholelabs.io/managed-containers: '["order-service"]'
        architect.loopholelabs.io/scaledown-durations: '{"order-service":"60s"}'
    spec:
      runtimeClassName: runc-architect
      containers:
      - name: order-service
        image: mycompany/order-service:v1.5
        ports:
        - containerPort: 8080
      - name: logging-agent
        image: fluentd:latest
        # This container is not managed by Architect
Example 3: Development Environment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dev-environment
  namespace: development
spec:
  replicas: 50 # One per developer, most idle
  template:
    metadata:
      annotations:
        architect.loopholelabs.io/managed-containers: '["dev-container"]'
        # Aggressive hibernation for dev environments
        architect.loopholelabs.io/scaledown-durations: '{"dev-container":"5s"}'
    spec:
      runtimeClassName: runc-architect
      containers:
      - name: dev-container
        image: mycompany/dev-env:latest
        resources:
          requests:
            memory: "4Gi"
            cpu: "2000m"
Example 4: gVisor with PersistentCheckpoint (Pre-Migration Checkpoints)
The runsc-architect runtime class uses gVisor for enhanced security
isolation. Unlike runc-architect, gVisor pods use a
PersistentCheckpoint CRD to trigger checkpoint creation while keeping the pod
running. This is useful for:
- Creating checkpoints of running applications without stopping them (for backups)
- Pre-creating checkpoints before planned migrations for faster pod recreation
- Creating "golden images" that can be used to quickly spin up new pods with pre-loaded state
apiVersion: apps/v1
kind: Deployment
metadata:
  name: valkey-cache
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: valkey
  template:
    metadata:
      labels:
        app: valkey
      annotations:
        # Mark entire pod as managed (required for gVisor)
        architect.loopholelabs.io/managed-pod: "true"
        # Tells the pod which PersistentCheckpoint to restore from
        architect.loopholelabs.io/start-from-persistent-checkpoint: "default/persistent-checkpoint-demo"
    spec:
      runtimeClassName: runsc-architect # gVisor runtime
      containers:
      - name: valkey
        image: valkey/valkey:latest
        ports:
        - containerPort: 6379
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
To create a checkpoint while the pod continues running, create a PersistentCheckpoint CRD:
apiVersion: architect.loopholelabs.io/v1
kind: PersistentCheckpoint
metadata:
  name: persistent-checkpoint-demo
  namespace: default
spec:
  podName: example-valkey-valkey-7d7f78c4f7-5f6ss
Apply it with:
kubectl apply -f persistentcheckpoint.yaml
The PersistentCheckpoint CRD will persist after the checkpoint is created and
will contain the embedded checkpoint data. You can verify the checkpoint was
created by checking that the CRD has checkpoint data:
kubectl get persistentcheckpoint persistent-checkpoint-demo -o jsonpath='{.spec.checkpoint}'
# Should return checkpoint data with podTemplateHash and replicas
In addition to this, an event is emitted on the pod itself after a checkpoint is created.
When the pod is later deleted (e.g., during node drain or manual deletion), a
new pod with the start-from-persistent-checkpoint annotation pointing to this
PersistentCheckpoint will restore from the checkpoint. Without the
annotation, gVisor pods start fresh; they do not restore from a checkpoint via
pod-template-hash matching like runc-architect pods do.
Restoring from a PersistentCheckpoint with a Different Pod Template
You can also restore a completely new deployment from an existing
PersistentCheckpoint using the start-from-persistent-checkpoint annotation.
This is useful for:
- Creating new deployments from a "golden image" checkpoint
- Restoring state to a different deployment with a different configuration
- Cross-namespace restore (restoring from a checkpoint in a different namespace)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: valkey-from-checkpoint
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: valkey-restored
  template:
    metadata:
      labels:
        app: valkey-restored
      annotations:
        architect.loopholelabs.io/managed-pod: "true"
        # Restore from an existing PersistentCheckpoint (format: namespace/name)
        architect.loopholelabs.io/start-from-persistent-checkpoint: "default/persistent-checkpoint-demo"
    spec:
      runtimeClassName: runsc-architect
      containers:
      - name: valkey
        image: valkey/valkey:latest
        ports:
        - containerPort: 6379
Important notes about checkpoint behavior:
- For runsc-architect pods, the start-from-persistent-checkpoint annotation is the only way to restore from a checkpoint; restoring via the pod-template-hash (as is the case for runc-architect) is not supported
- If the referenced PersistentCheckpoint doesn't exist or has no checkpoint data, the pod starts fresh
Deleting a PersistentCheckpoint
When you delete a PersistentCheckpoint CRD, the checkpoint files are
automatically cleaned up from disk:
kubectl delete persistentcheckpoint persistent-checkpoint-demo
After deletion, any new pods referencing this PersistentCheckpoint via the
start-from-persistent-checkpoint annotation will start fresh (no data
restored).
This is different from regular Checkpoint CRDs, which are automatically
deleted after being consumed by a new pod. PersistentCheckpoint CRDs persist
until explicitly deleted, allowing multiple pods to restore from the same
checkpoint.
Introspection
Pod Status Labels
Architect adds specific labels to track container hibernation state:
# Check hibernation status for a specific container
kubectl get pods -l status.architect.loopholelabs.io/<container-name>=SCALED_DOWN
# Example: Check if the 'api' container is hibernated
kubectl get pods -l status.architect.loopholelabs.io/api=SCALED_DOWN
# List all pods with any hibernated containers
kubectl get pods -o json | jq '.items[] | select(.metadata.labels | to_entries[] | select(.key | startswith("status.architect.loopholelabs.io/")) | .value == "SCALED_DOWN") | .metadata.name'
Resource Tracking Annotations
When a pod hibernates, Architect preserves the original resource requests in annotations:
# View original CPU requests for hibernated containers
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations.architect\.loopholelabs\.io/cpu-requests}'
# Output: {"container-name":"250m"}
# View original memory requests
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations.architect\.loopholelabs\.io/memory-requests}'
# Output: {"container-name":"6Gi"}
Resource Consumption
Monitor actual resource usage to see savings (requires Kubernetes 1.33 or higher):
# View resource consumption of pods
kubectl top pods
# Hibernated pods will show zero CPU and memory usage
# Compare with original requests stored in annotations to calculate savings
Logs
Architect components log important events:
# View architectd logs on a specific node
kubectl logs -n architect -l app=architectd --tail=100
# View admission controller logs
kubectl logs -n architect -l app=architect-manager --tail=100
# Filter logs for specific pod events
kubectl logs -n architect -l app=architectd | grep <pod-name>
Testing Your Application
Before deploying to production, test your application's compatibility:
# 1. Deploy with Architect in staging
# 2. Generate typical load
# 3. Let it hibernate (check status label)
kubectl get pod <pod> -o jsonpath='{.metadata.labels.status\.architect\.loopholelabs\.io/<container>}'
# 4. Wake it with traffic
kubectl exec <pod> -- curl localhost:<port>/health
# 5. Verify functionality and state preservation
# 6. Check logs if there are errors
kubectl logs -n architect -l app=architectd | grep <pod>
Best Practices
1. Node Configuration
- Label nodes appropriately: Only label nodes where you want Architect workloads to run
- Avoid preemptible nodes for Architect components: The architect-manager and architect-control-plane should run on stable nodes
- Separate control plane from workloads: Run Architect control components on different nodes than your workloads when possible
2. Application Suitability
Well-suited applications:
- Stateless web services and APIs
- Microservices with intermittent traffic
- Development and staging environments
- Batch processing jobs with idle periods
- Services with predictable traffic patterns
Applications requiring careful consideration:
- GPU workloads requiring CUDA state preservation (under development)
3. Configuration Guidelines
- Start with conservative timeouts: Begin with 30-60 second idle timeouts and decrease gradually
- Test in staging first: Always validate hibernation behavior in non-production environments
- Monitor wake times: Ensure your SLOs are met with the hibernation/wake cycle
4. Capacity Planning
With Architect, you can:
- Overprovision without cost penalty: Run more replicas for better availability
- Eliminate scaling buffers: No need for extra replicas to handle scale-up delays
- Simplify HPA configuration: Focus on actual capacity needs, not scaling delays
Troubleshooting
Pod Not Hibernating
Check idle timeout configuration:
# View configured timeout (default is 10s if not set)
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations.architect\.loopholelabs\.io/scaledown-durations}'
Verify container is managed:
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations.architect\.loopholelabs\.io/managed-containers}'
Check container status label:
# Check if container shows as scaled down
kubectl get pod <pod-name> -o jsonpath='{.metadata.labels.status\.architect\.loopholelabs\.io/<container-name>}'
Review architectd logs for hibernation events:
kubectl logs -n architect -l app=architectd | grep <pod-name>
Pod Not Waking
Test wake triggers:
# Wake via kubectl exec
kubectl exec -it <pod-name> -- /bin/sh -c "echo test"
# Wake via network traffic (if service exposed)
kubectl port-forward <pod-name> <port>:<port>
curl localhost:<port>
Check pod events:
kubectl describe pod <pod-name>
Verify architectd is running on the node:
# Find which node the pod is on
kubectl get pod <pod-name> -o wide
# Check architectd on that node
kubectl get pods -n architect -o wide | grep <node-name>
High Wake Times
If wake times exceed 50ms:
- Check node CPU and memory availability
- Verify no resource contention on the node
- Check checkpoint size (larger applications take longer)
- Review architectd logs for restore errors
Checkpoint Failures
Common causes and solutions:
- Application incompatibility:
  - Applications using GPUs are not currently supported
- Disk space issues:
  # Check disk space on nodes
  kubectl get nodes -o custom-columns=NAME:.metadata.name,DISK:.status.allocatable.ephemeral-storage
- Permission issues:
  - Ensure the runtime class is properly set
  - Verify node labels are correct
- Review detailed logs:
  # Get detailed architectd logs
  kubectl logs -n architect -l app=architectd --tail=500 | grep -E "checkpoint|restore|error"
Helm Chart
The Helm chart supports additional configuration options for customizing its components. For example, you can choose to install a development version:
--devel --version 0.0.0-pojntfx-arch-394-implement-p2p-evac-for-new-architect.1.9b433b9
Or add custom node selectors for components to further restrict pod placement:
--set 'architectdNodeSelector.custom-label=value' \
--set 'architectAdmissionControllerNodeSelector.zone=us-east-1a' \
--set 'architectControlPlaneNodeSelector.tier=critical'
Add tolerations for components to allow scheduling pods to tainted nodes:
--set 'architectdTolerations[0].key=dedicated' \
--set 'architectdTolerations[0].operator=Equal' \
--set 'architectdTolerations[0].value=architect' \
--set 'architectdTolerations[0].effect=NoSchedule'
And set resource requests and limits for the different components:
--set 'architectAdmissionControllerResources.requests.cpu=100m' \
--set 'architectAdmissionControllerResources.requests.memory=128Mi' \
--set 'architectAdmissionControllerResources.limits.cpu=500m' \
--set 'architectAdmissionControllerResources.limits.memory=512Mi' \
--set 'architectControlPlaneResources.requests.cpu=200m' \
--set 'architectControlPlaneResources.requests.memory=256Mi'
FAQ
Q: How is this different from scale-to-zero solutions like KEDA or Knative?
A: Scale-to-zero solutions delete pods entirely, causing 30-60+ second cold starts when they're needed again. Architect keeps pods scheduled but hibernates them in place, enabling <50ms wake times. Your pods stay registered with services, keep their PVCs mounted, and maintain their network configuration.
Q: What triggers a pod to wake from hibernation?
A: Pods wake instantly (<50ms) when:
- You run kubectl exec commands on the container
- (Coming soon) Network traffic arrives at the pod
- (Coming soon) API calls to programmatically wake pods
The wake process is automatic and transparent - your application doesn't need any modifications.
Q: Can I use Architect with HPA (Horizontal Pod Autoscaler)?
A: Yes! Architect complements HPA perfectly. HPA handles adding/removing replicas based on metrics, while Architect ensures idle replicas don't consume resources. You can now set more aggressive HPA policies without cost concerns.
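A minimal sketch of an HPA over an Architect-managed deployment; the target name and thresholds are illustrative, and nothing Architect-specific is needed in the HPA itself:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service        # hypothetical Architect-managed deployment
  minReplicas: 3             # idle replicas cost little while hibernated
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```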
Q: What applications are not compatible?
A: Currently, applications using GPUs are not compatible. We recommend thorough testing in staging environments.
Q: How much overhead does Architect add?
A: Architect adds minimal overhead - typically <1% CPU and <50MB memory per node for the architectd daemon. The checkpoint/restore process itself is highly optimized with near-zero impact on running workloads.
Q: Can I migrate hibernated pods between nodes?
A: Yes - pods that are deleted on one node will have their checkpoints moved to whichever node the replacement pod is scheduled to.
Q: What happens during Kubernetes upgrades?
A: Architect components should be upgraded first, followed by your workloads. Hibernated pods will be woken during node drains and can be safely rescheduled.
Q: Is there a limit to how many pods can be hibernated?
A: There's no hard limit. The practical limit depends on your node's disk space for storing checkpoints (typically 50-200MB per pod) and the architectd daemon's capacity.
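As a rough capacity check, the 50-200MB-per-pod range above gives an upper bound on hibernated pods per node; both inputs below are made-up assumptions:

```shell
# Rough capacity estimate for checkpoints on a single node.
disk_gb=100        # disk space reserved for checkpoints (assumption)
checkpoint_mb=200  # worst-case checkpoint size from the range above

# Upper bound on hibernated pods this node can hold
max_pods=$(( disk_gb * 1024 / checkpoint_mb ))
echo "$max_pods"
```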
Q: How do I know how much I'm saving?
A: Monitor the difference between provisioned resources and actual usage:
# Provisioned resources
kubectl get pods -o custom-columns=NAME:.metadata.name,CPU:.spec.containers[0].resources.requests.cpu,MEMORY:.spec.containers[0].resources.requests.memory
# Actual usage (hibernated pods show ~0)
kubectl top pods
A more concise breakdown will be available soon at https://console.architect.io/
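To turn the provisioned-versus-used gap into a dollar figure, a small sketch with made-up inputs (the quantities and the hourly rate are example values, not real prices):

```shell
# Hypothetical inputs: provisioned vs. actually-used CPU in millicores,
# and an assumed cloud price per vCPU-hour.
provisioned_millicores=8000
used_millicores=600
rate_per_core_hour=0.04

# Hourly savings = (provisioned - used) cores * hourly rate
savings=$(awk -v p="$provisioned_millicores" -v u="$used_millicores" -v r="$rate_per_core_hour" \
  'BEGIN { printf "%.4f", (p - u) / 1000 * r }')
echo "$savings"
```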
Q: What happens to in-flight requests?
A: Architect monitors network traffic and only hibernates pods that have been truly idle (no traffic) for the configured duration. If a request arrives while a pod is transitioning to hibernation or while it's hibernated, it's buffered and delivered once the pod wakes (typically within 50ms). No packets are dropped.
Q: Can I change the managed containers list without restarting pods?
A: While you can update the managed-containers annotation without
restarting, it's not recommended. When you remove a container from the managed
list, its checkpoint is deleted and it becomes unmanaged. For predictable
behavior, use the Recreate deployment strategy or restart pods after changing
the annotation.
Q: How much disk space do checkpoints require?
A: Checkpoint size varies by application but typically ranges from 50-200MB per pod. The size depends on the application's memory footprint and state. Monitor disk usage on nodes with:
kubectl exec -n architect <architectd-pod> -- du -sh /var/lib/architect/checkpoints/
Q: Does Architect work with StatefulSets?
A: Yes, Architect works with StatefulSets. Each pod maintains its own checkpoint and persistent volume claims remain mounted during hibernation. Use the same annotations and runtime class as with Deployments.
Q: What happens if the architectd daemon crashes?
A: If architectd crashes on a node, pods on that node continue running normally but won't hibernate or wake. The daemon automatically restarts via the DaemonSet controller. Existing checkpoints are preserved and operations resume once architectd is back online.
Q: When should I use gVisor (runsc-architect) vs runc (runc-architect)?
A: Choose based on your needs:
- Use runc-architect for most workloads. It provides automatic hibernation on inactivity and wakes the container again on activity. Best for web services, APIs, and general (trusted) workloads. Checkpoints here are container-scoped, meaning you can checkpoint containers individually and opt others out.
- Use runsc-architect when you need enhanced security isolation (gVisor's sandbox), e.g. for untrusted workloads, or want explicit control over checkpoint timing. With gVisor, you create checkpoints manually using PersistentCheckpoint CRDs while the pod continues running. This is useful for creating backup snapshots or pre-migration checkpoints without stopping your workload. Note that runsc-architect does not support init containers, except when they are used as sidecar containers. Checkpoints here are pod-scoped, meaning that all containers get checkpointed together.
Q: Can I exclude certain pods from hibernation temporarily?
A: Yes, you can:
- Remove the container from the managed-containers annotation
- Set a very long timeout (e.g., "24h")
- Remove the runc-architect runtime class (requires pod restart)
Q: How do I calculate my actual cost savings?
A: Track these metrics:
# Provisioned resources per container (quantities like "512Mi" and "250m" must be unit-normalized before summing)
kubectl get pods -o json | jq -r '.items[] | .spec.containers[] | "\(.resources.requests.cpu // "0") \(.resources.requests.memory // "0")"'
# Resources actually being used
kubectl top pods --no-headers | awk '{sum+=$2} END {print sum}'
# Savings = (Provisioned - Actual) * Cloud provider rates
Q: Where are checkpoints cached?
A: Checkpoints are cached on each node that architectd runs on, at
/root/.local/state/architect/.
Q: How do I uninstall Architect?
A: To uninstall Architect without disrupting your running workloads, you must follow a specific sequence. Failure to update your workloads before uninstallation will result in pod errors.
- Prepare Your Workloads (Required): before running the uninstall command, you must update your workload configurations to ensure they no longer depend on the Architect runtime.
- Remove runtimeClassName: Remove this configuration from all active workloads. If you uninstall Architect while runtimeClassName is still active, all managed pods will immediately enter an Error state.
- Remove Annotations (optional): Delete any architect.loopholelabs.io/* annotations to keep your manifests clean.
- Execute Uninstallation: once your workloads are updated, use Helm to remove the Architect release:
helm uninstall -n architect architect
- Post-Uninstall Cleanup: to fully clean your environment, manually remove any remaining Architect-specific labels from your cluster nodes.
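Removing the runtime class from a workload can be scripted with a JSON patch; a sketch assuming a Deployment named my-app (the name is illustrative):

```shell
# Drop runtimeClassName from the pod template; pods roll without the Architect runtime
kubectl patch deployment my-app --type=json \
  -p '[{"op":"remove","path":"/spec/template/spec/runtimeClassName"}]'
```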
Support and Resources
- Documentation: This guide and architecture documentation
- Support: Contact Loophole Labs support team
- License: Contact admin@loopholelabs.io for enterprise installations
Conclusion
Architect for Kubernetes fundamentally changes the economics of running Kubernetes workloads. By eliminating idle resource consumption while maintaining instant availability, you can:
- Overprovision for peak capacity without cost penalties
- Reduce infrastructure spend by 30-80%
- Maintain or improve application performance
- Simplify capacity planning and autoscaling
Next Steps
- Start Small: Deploy Architect in a development or staging environment first
- Test Compatibility: Verify your applications checkpoint and restore correctly
- Monitor Savings: Track resource consumption before and after enabling Architect
- Optimize Timeouts: Fine-tune idle timeouts based on your traffic patterns
Getting Help
- Review the architecture documentation for deep technical details
- Contact support for assistance with specific use cases
- Join our community discussions for tips and best practices
Start with a small subset of workloads, measure the benefits, and gradually expand your Architect deployment for maximum savings.