On this page:
Architect for Kubernetes revolutionizes Kubernetes cost optimization by enabling pods to hibernate in place when idle and wake instantly (<50ms) when needed. Unlike traditional autoscaling solutions that delete pods and cause cold starts, Architect keeps pods scheduled while reducing their resource consumption to zero during idle periods.
Key Benefits
- Zero idle costs: Hibernated pods consume no CPU or memory
- Instant wake times: Pods restore in <50ms vs 30-60+ seconds for cold starts
- No application changes: Works with existing workloads
- Pods stay scheduled: No delays from rescheduling, PVC mounting, or service registration
Quick Start
Want to see Architect in action immediately? Here's the fastest way to get started:
# 1. Label nodes and install Architect (get your command from https://console.architect.io/)
kubectl label nodes <node-name> architect.loopholelabs.io/node=true
kubectl label nodes <node-name> architect.loopholelabs.io/critical-node=true
helm uninstall -n architect architect || true
helm install architect oci://ghcr.io/loopholelabs/architect-chart \
--namespace architect --create-namespace \
--set kubernetesDistro="eks" \
--set machineToken="mymachinetoken" \
--set clusterName="myclustername" --wait
# 2. Deploy the example Go application
helm uninstall example-go || true
helm install example-go oci://ghcr.io/loopholelabs/example-go-chart --wait
# 3. Watch the pod hibernate after 10 seconds of inactivity
kubectl get pods -w
# 4. Wake it up instantly with a request
kubectl exec -it <example-go-pod> -- curl localhost:8080
# 5. Observe the resource savings
kubectl top pods
Other example applications that you can deploy for testing Architect behaviour (these are already pre-configured to be managed by Architect):
helm upgrade example-valkey oci://ghcr.io/loopholelabs/example-valkey-chart --install --wait
helm upgrade example-python oci://ghcr.io/loopholelabs/example-python-chart --install --wait
helm upgrade example-ruby oci://ghcr.io/loopholelabs/example-ruby-chart --install --wait
helm upgrade example-rust-miniserve oci://ghcr.io/loopholelabs/example-rust-miniserve-chart --install --wait
helm upgrade example-kafka oci://ghcr.io/loopholelabs/example-kafka-chart --install --wait
helm upgrade example-spring-boot oci://ghcr.io/loopholelabs/example-spring-boot-chart --install --wait
helm upgrade example-php-wordpress oci://ghcr.io/loopholelabs/example-php-wordpress-chart --install --wait
helm upgrade example-postgres oci://ghcr.io/loopholelabs/example-postgres-chart --install --wait
How It Works
Architect continuously monitors your pods for activity. When a pod becomes idle (no network traffic for a configured duration), Architect:
- Creates a checkpoint of the complete pod state (memory, file descriptors, network connections)
- Hibernates the pod in place, reducing resource requests to zero
- Keeps the pod scheduled and registered with services
- Instantly restores the pod when traffic arrives or when accessed via kubectl exec
Wake Triggers
Pods automatically wake from hibernation when:
- Network traffic arrives - Any incoming network packet triggers immediate restoration
- kubectl exec commands - Running commands in the container wakes it instantly
- API calls (coming soon) - Programmatic wake/sleep control via Architect API
Installation
Prerequisites
- Kubernetes cluster version 1.32 or higher (1.33 is required for pod sleeping)
- Helm 3 or higher
- Nodes where Architect workloads will run must be labeled
- For Amazon EKS: must use AL2023 AMI (AL2 is not supported)
Step 1: Install Architect
Sign into https://console.architect.io/ and click on the + Add Cluster button, then follow the instructions.
If you want to manage your Architect installation via GitOps, you may want to create the following secret:
kubectl create secret generic architect-secrets \
--from-literal=machineToken="YOUR_MACHINE_TOKEN" \
--namespace=architect \
--output=yaml \
--dry-run=client \
| kubectl apply --filename -
And then install Architect by referencing this secret via the secretRef Helm chart property (e.g. --set secretRef=architect-secrets).
Step 2: Verify Installation
# Check that all Architect components are running
kubectl get pods -n architect
# You should see:
# - architect-manager (admission controller)
# - architect-control-plane
# - architectd pods on each labeled node
Configuring Workloads for Architect
To enable Architect for your workloads, you need to:
- Set the runtime class to runc-architect
- Specify which containers to manage
- Configure idle timeouts
Basic Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-app
spec:
replicas: 3
template:
metadata:
annotations:
# Specify which containers Architect should manage
architect.loopholelabs.io/managed-containers: '["my-app-container"]'
# Set idle timeout (optional, default is 10s)
architect.loopholelabs.io/scaledown-durations: '{"my-app-container":"30s"}'
spec:
runtimeClassName: runc-architect # Required
containers:
- name: my-app-container
image: my-app:latest
resources:
requests:
memory: "512Mi"
cpu: "250m"
Configuration Options
Runtime Class (Required)
spec:
runtimeClassName: runc-architect
This tells Kubernetes to use Architect's custom runtime for this pod. Two runtime classes are available:
- runc-architect: Standard runtime based on runc. Containers automatically hibernate and release their resources after a period of inactivity, then wake on activity. Uses the managed-containers annotation.
- runsc-architect: gVisor-based runtime with enhanced security isolation. Uses the managed-pod annotation and PersistentCheckpoint CRDs for checkpoint creation. See Example 4 for details.
Managed Containers Annotation
architect.loopholelabs.io/managed-containers: '["container-1", "container-2"]'
- Lists which containers in the pod should be managed by Architect
- Containers not in this list run normally without hibernation
- Useful for excluding sidecar containers (e.g., logging agents)
Scale-down Durations Annotation
architect.loopholelabs.io/scaledown-durations: '{"container-1":"30s", "container-2":"60s"}'
- Sets how long a container must be idle before hibernating
- Format: JSON object with container names as keys and durations as values
- Default: 10 seconds if not specified
- Minimum: 1s, Maximum: unlimited
Post-Migration Auto Scale Up Containers Annotation
architect.loopholelabs.io/postmigration-autoscaleup-containers: '["container-1", "container-2"]'
- Lists which containers should automatically scale back up after a migration (by default, containers stay scaled down to avoid a thundering herd after migrations)
Disable Auto Scale Down Containers Annotation
architect.loopholelabs.io/disable-autoscaledown-containers: '["container-1", "container-2"]'
- Lists which containers should not automatically scale down. By default, containers scale down after the duration set in the scale-down durations annotation; containers listed in this annotation never scale down automatically
- Mostly useful for long-running background jobs that should still be migrated by default, but not scale down when there is no traffic
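As an illustration, a background worker that should stay running (but remain migratable) might be configured like this; the deployment and image names are hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: report-worker        # hypothetical long-running background job
spec:
  replicas: 1
  template:
    metadata:
      annotations:
        architect.loopholelabs.io/managed-containers: '["worker"]'
        # Keep the worker managed (so it can still be migrated/checkpointed),
        # but never hibernate it automatically when there is no traffic
        architect.loopholelabs.io/disable-autoscaledown-containers: '["worker"]'
    spec:
      runtimeClassName: runc-architect
      containers:
        - name: worker
          image: mycompany/report-worker:latest
```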
Scale-Up Timeout Containers Annotation
architect.loopholelabs.io/scaleup-timeout-containers: '{"container-1": "60s", "container-2": "2m"}'
- Sets how long a container should wait for a checkpoint to become available during scale-up
- Format: JSON object with container names as keys and durations as values
- Default: 30 seconds if not specified
- When a new pod starts and other pods with the same template hash exist, the container waits up to this timeout for a checkpoint CRD to be advertised before aborting the checkpoint download and starting a fresh container
Managed Pod Annotation (gVisor only)
architect.loopholelabs.io/managed-pod: "true"
- Used with the runsc-architect runtime class (gVisor) instead of managed-containers
- When set to "true", the entire pod is managed by Architect
- Checkpoints for the runsc-architect runtime class are pod-scoped (not container-scoped), so all containers in the pod are checkpointed together
- Required for using PersistentCheckpoint CRDs
Migrate EmptyDir Containers Annotation
architect.loopholelabs.io/migrate-emptydir-containers: '["container-1", "container-2"]'
- Lists which containers should have their emptyDir volumes preserved during migrations
- By default, emptyDir volumes are ephemeral and not migrated
- When a container is listed in this annotation, its emptyDir volume data is snapshotted during checkpoint and restored after migration
- Useful for applications that store important temporary data in emptyDir volumes
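For example, a pod whose scratch data should survive migration could be annotated like this (pod and image names are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache-worker           # hypothetical
  annotations:
    architect.loopholelabs.io/managed-containers: '["app"]'
    # Snapshot this container's emptyDir data during checkpoint
    # and restore it after migration
    architect.loopholelabs.io/migrate-emptydir-containers: '["app"]'
spec:
  runtimeClassName: runc-architect
  containers:
    - name: app
      image: mycompany/cache-worker:latest
      volumeMounts:
        - name: scratch
          mountPath: /tmp/scratch
  volumes:
    - name: scratch
      emptyDir: {}
```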
Start From Persistent Checkpoint Annotation (gVisor only)
architect.loopholelabs.io/start-from-persistent-checkpoint: "namespace/persistent-checkpoint-name"
- Specifies a PersistentCheckpoint CRD to restore from when no regular checkpoint exists
- Used with the runsc-architect runtime class (gVisor)
- Enables creating new pods from a "golden image" checkpoint
- Allows restoring from a checkpoint in a different namespace
- If a regular Checkpoint CRD exists (same pod template hash), it takes precedence over this annotation
- If the referenced PersistentCheckpoint doesn't exist or has no checkpoint data, the pod starts fresh
Usage Examples
Example 1: Web API Service
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-service
spec:
replicas: 10 # Can now overprovision without cost penalty
strategy:
type: Recreate
template:
metadata:
annotations:
architect.loopholelabs.io/managed-containers: '["api"]'
architect.loopholelabs.io/scaledown-durations: '{"api":"30s"}'
spec:
runtimeClassName: runc-architect
containers:
- name: api
image: mycompany/api:v2.1
ports:
- containerPort: 8080
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
Example 2: Microservices with Sidecar
apiVersion: apps/v1
kind: Deployment
metadata:
name: order-service
spec:
replicas: 15
template:
metadata:
annotations:
# Only manage the main container, not the sidecar
architect.loopholelabs.io/managed-containers: '["order-service"]'
architect.loopholelabs.io/scaledown-durations: '{"order-service":"60s"}'
spec:
runtimeClassName: runc-architect
containers:
- name: order-service
image: mycompany/order-service:v1.5
ports:
- containerPort: 8080
- name: logging-agent
image: fluentd:latest
# This container is not managed by Architect
Example 3: Development Environment
apiVersion: apps/v1
kind: Deployment
metadata:
name: dev-environment
namespace: development
spec:
replicas: 50 # One per developer, most idle
template:
metadata:
annotations:
architect.loopholelabs.io/managed-containers: '["dev-container"]'
# Aggressive hibernation for dev environments
architect.loopholelabs.io/scaledown-durations: '{"dev-container":"5s"}'
spec:
runtimeClassName: runc-architect
containers:
- name: dev-container
image: mycompany/dev-env:latest
resources:
requests:
memory: "4Gi"
cpu: "2000m"
Example 4: gVisor with PersistentCheckpoint (Pre-Migration Checkpoints)
The runsc-architect runtime class uses gVisor for enhanced security isolation. Unlike runc-architect, gVisor pods use a PersistentCheckpoint CRD to trigger checkpoint creation while keeping the pod running. This is useful for:
- Creating checkpoints of running applications without stopping them (for backups)
- Pre-creating checkpoints before planned migrations for faster pod recreation
- Creating "golden images" that can be used to quickly spin up new pods with pre-loaded state
apiVersion: apps/v1
kind: Deployment
metadata:
name: valkey-cache
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: valkey
template:
metadata:
labels:
app: valkey
annotations:
# Mark entire pod as managed (required for gVisor)
architect.loopholelabs.io/managed-pod: "true"
spec:
runtimeClassName: runsc-architect # gVisor runtime
containers:
- name: valkey
image: valkey/valkey:latest
ports:
- containerPort: 6379
resources:
requests:
memory: "256Mi"
cpu: "100m"
To create a checkpoint while the pod continues running, create a PersistentCheckpoint CRD:
apiVersion: architect.loopholelabs.io/v1
kind: PersistentCheckpoint
metadata:
name: persistent-checkpoint-demo
namespace: default
spec:
podName: example-valkey-valkey-7d7f78c4f7-5f6ss
Apply it with:
kubectl apply -f persistentcheckpoint.yaml
The PersistentCheckpoint CRD will persist after the checkpoint is created and will contain the embedded checkpoint data. You can verify the checkpoint was created by checking that the CRD has checkpoint data:
kubectl get persistentcheckpoint persistent-checkpoint-demo -o jsonpath='{.spec.checkpoint}'
# Should return checkpoint data with podTemplateHash and replicas
In addition, an event is emitted on the pod itself after a checkpoint is created.
When the pod is later deleted (e.g., during node drain or manual deletion), a new pod with the same pod template hash (e.g. one that's created by a deployment controller) will restore from the checkpoint created by the PersistentCheckpoint CRD.
Restoring from a PersistentCheckpoint with a Different Pod Template
You can also restore a completely new deployment from an existing PersistentCheckpoint using the start-from-persistent-checkpoint annotation. This is useful for:
- Creating new deployments from a "golden image" checkpoint
- Restoring state to a different deployment with a different configuration
- Cross-namespace restore (restoring from a checkpoint in a different namespace)
apiVersion: apps/v1
kind: Deployment
metadata:
name: valkey-from-checkpoint
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: valkey-restored
template:
metadata:
labels:
app: valkey-restored
annotations:
architect.loopholelabs.io/managed-pod: "true"
# Restore from an existing PersistentCheckpoint (format: namespace/name)
architect.loopholelabs.io/start-from-persistent-checkpoint: "default/persistent-checkpoint-demo"
spec:
runtimeClassName: runsc-architect
containers:
- name: valkey
image: valkey/valkey:latest
ports:
- containerPort: 6379
Important notes about checkpoint precedence:
- If a regular Checkpoint CRD exists (from a previous pod with the same template hash), it takes precedence over the start-from-persistent-checkpoint annotation
- The start-from-persistent-checkpoint annotation is only used when no regular checkpoint exists
- If the referenced PersistentCheckpoint doesn't exist or has no checkpoint data, the pod starts fresh
Deleting a PersistentCheckpoint
When you delete a PersistentCheckpoint CRD, the checkpoint files are automatically cleaned up from disk:
kubectl delete persistentcheckpoint persistent-checkpoint-demo
After deletion, any new pods referencing this PersistentCheckpoint via the start-from-persistent-checkpoint annotation will start fresh (no data restored).
This is different from regular Checkpoint CRDs, which are automatically deleted after being consumed by a new pod. PersistentCheckpoint CRDs persist until explicitly deleted, allowing multiple pods to restore from the same checkpoint.
Monitoring and Observability
Pod Status Labels
Architect adds specific labels to track container hibernation state:
# Check hibernation status for a specific container
kubectl get pods -l status.architect.loopholelabs.io/<container-name>=SCALED_DOWN
# Example: Check if the 'api' container is hibernated
kubectl get pods -l status.architect.loopholelabs.io/api=SCALED_DOWN
# List all pods with any hibernated containers
kubectl get pods -o json | jq '.items[] | select(.metadata.labels | to_entries[] | select(.key | startswith("status.architect.loopholelabs.io/")) | .value == "SCALED_DOWN") | .metadata.name'
Resource Tracking Annotations
When a pod hibernates, Architect preserves the original resource requests in annotations:
# View original CPU requests for hibernated containers
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations.architect\.loopholelabs\.io/cpu-requests}'
# Output: {"container-name":"250m"}
# View original memory requests
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations.architect\.loopholelabs\.io/memory-requests}'
# Output: {"container-name":"6Gi"}
Resource Consumption
Monitor actual resource usage to see savings (requires Kubernetes 1.33 or higher):
# View resource consumption of pods
kubectl top pods
# Hibernated pods will show zero CPU and memory usage
# Compare with original requests stored in annotations to calculate savings
Logs
Architect components log important events:
# View architectd logs on a specific node
kubectl logs -n architect -l app=architectd --tail=100
# View admission controller logs
kubectl logs -n architect -l app=architect-manager --tail=100
# Filter logs for specific pod events
kubectl logs -n architect -l app=architectd | grep <pod-name>
Known Limitations
- GPU Workloads: GPU state preservation is under development
Testing Your Application
Before deploying to production, test your application's compatibility:
# 1. Deploy with Architect in staging
# 2. Generate typical load
# 3. Let it hibernate (check status label)
kubectl get pod <pod> -o jsonpath='{.metadata.labels.status\.architect\.loopholelabs\.io/<container>}'
# 4. Wake it with traffic
kubectl exec <pod> -- curl localhost:<port>/health
# 5. Verify functionality and state preservation
# 6. Check logs if there are errors
kubectl logs -n architect -l app=architectd | grep <pod>
Best Practices
1. Node Configuration
- Label nodes appropriately: Only label nodes where you want Architect workloads to run
- Avoid preemptible nodes for Architect components: The architect-manager and architect-control-plane should run on stable nodes
- Separate control plane from workloads: Run Architect control components on different nodes than your workloads when possible
2. Application Suitability
Well-suited applications:
- Stateless web services and APIs
- Microservices with intermittent traffic
- Development and staging environments
- Batch processing jobs with idle periods
- Services with predictable traffic patterns
Applications requiring careful consideration:
- GPU workloads requiring CUDA state preservation (under development)
3. Configuration Guidelines
- Start with conservative timeouts: Begin with 30-60 second idle timeouts and decrease gradually
- Test in staging first: Always validate hibernation behavior in non-production environments
- Monitor wake times: Ensure your SLOs are met with the hibernation/wake cycle
4. Capacity Planning
With Architect, you can:
- Overprovision without cost penalty: Run more replicas for better availability
- Eliminate scaling buffers: No need for extra replicas to handle scale-up delays
- Simplify HPA configuration: Focus on actual capacity needs, not scaling delays
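For instance, an HPA for the Example 1 api-service can target utilization directly instead of padding for scale-up delays. This is a sketch; the replica bounds and threshold are illustrative, not recommendations:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 20              # idle replicas hibernate, so overprovisioning is cheap
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # can be aggressive: wakes are <50ms, not cold starts
```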
Updating and Managing Workloads
Adding Architect to Existing Workloads
- Add the runtime class:
spec:
runtimeClassName: runc-architect
- Add the managed containers annotation:
annotations:
architect.loopholelabs.io/managed-containers: '["your-container"]'
- Apply the changes:
kubectl apply -f your-deployment.yaml
Removing Architect from Workloads
To disable Architect for a workload:
- Remove the container from the managed containers list:
annotations:
architect.loopholelabs.io/managed-containers: "[]"
- Or remove the runtime class:
# Remove or comment out:
# runtimeClassName: runc-architect
- Apply changes (no need to delete the pod):
kubectl apply -f your-deployment.yaml
Troubleshooting
Pod Not Hibernating
Check idle timeout configuration:
# View configured timeout (default is 10s if not set)
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations.architect\.loopholelabs\.io/scaledown-durations}'
Verify container is managed:
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations.architect\.loopholelabs\.io/managed-containers}'
Check container status label:
# Check if container shows as scaled down
kubectl get pod <pod-name> -o jsonpath='{.metadata.labels.status\.architect\.loopholelabs\.io/<container-name>}'
Review architectd logs for hibernation events:
kubectl logs -n architect -l app=architectd | grep <pod-name>
Pod Not Waking
Test wake triggers:
# Wake via kubectl exec
kubectl exec -it <pod-name> -- /bin/sh -c "echo test"
# Wake via network traffic (if service exposed)
kubectl port-forward <pod-name> <port>:<port>
curl localhost:<port>
Check pod events:
kubectl describe pod <pod-name>
Verify architectd is running on the node:
# Find which node the pod is on
kubectl get pod <pod-name> -o wide
# Check architectd on that node
kubectl get pods -n architect -o wide | grep <node-name>
High Wake Times
If wake times exceed 50ms:
- Check node CPU and memory availability
- Verify no resource contention on the node
- Check checkpoint size (larger applications take longer)
- Review architectd logs for restore errors
Checkpoint Failures
Common causes and solutions:
- Application incompatibility:
  - Applications using GPUs are not currently supported
- Disk space issues:
  # Check disk space on nodes
  kubectl get nodes -o custom-columns=NAME:.metadata.name,DISK:.status.allocatable.ephemeral-storage
- Permission issues:
  - Ensure the runtime class is properly set
  - Verify node labels are correct
- Review detailed logs:
  # Get detailed architectd logs
  kubectl logs -n architect -l app=architectd --tail=500 | grep -E "checkpoint|restore|error"
Customizing the Helm Chart
The Helm chart supports additional configuration options for customizing components. For example, you can choose to install a development version:
--devel --version 0.0.0-pojntfx-arch-394-implement-p2p-evac-for-new-architect.1.9b433b9
Or add custom node selectors for components to further restrict pod placement:
--set 'architectdNodeSelector.custom-label=value' \
--set 'architectAdmissionControllerNodeSelector.zone=us-east-1a' \
--set 'architectControlPlaneNodeSelector.tier=critical'
Add tolerations for components to allow scheduling pods to tainted nodes:
--set 'architectdTolerations[0].key=dedicated' \
--set 'architectdTolerations[0].operator=Equal' \
--set 'architectdTolerations[0].value=architect' \
--set 'architectdTolerations[0].effect=NoSchedule'
And set resource requests and limits for the different components:
--set 'architectAdmissionControllerResources.requests.cpu=100m' \
--set 'architectAdmissionControllerResources.requests.memory=128Mi' \
--set 'architectAdmissionControllerResources.limits.cpu=500m' \
--set 'architectAdmissionControllerResources.limits.memory=512Mi' \
--set 'architectControlPlaneResources.requests.cpu=200m' \
--set 'architectControlPlaneResources.requests.memory=256Mi'
FAQ
Q: How is this different from scale-to-zero solutions like KEDA or Knative?
A: Scale-to-zero solutions delete pods entirely, causing 30-60+ second cold starts when they're needed again. Architect keeps pods scheduled but hibernates them in place, enabling <50ms wake times. Your pods stay registered with services, keep their PVCs mounted, and maintain their network configuration.
Q: What triggers a pod to wake from hibernation?
A: Pods wake instantly (<50ms) when:
- Network traffic arrives at the pod
- You run kubectl exec commands on the container
- (Coming soon) API calls to programmatically wake pods
The wake process is automatic and transparent - your application doesn't need any modifications.
Q: Can I use Architect with HPA (Horizontal Pod Autoscaler)?
A: Yes! Architect complements HPA perfectly. HPA handles adding/removing replicas based on metrics, while Architect ensures idle replicas don't consume resources. You can now set more aggressive HPA policies without cost concerns.
Q: What applications are not compatible?
A: Currently, applications using GPUs are not compatible. We recommend thorough testing in staging environments.
Q: How much overhead does Architect add?
A: Architect adds minimal overhead - typically <1% CPU and <50MB memory per node for the architectd daemon. The checkpoint/restore process itself is highly optimized with near-zero impact on running workloads.
Q: Can I migrate hibernated pods between nodes?
A: Yes - pods that are deleted on one node will have their checkpoints moved to whichever node the replacement pod is scheduled to.
Q: What happens during Kubernetes upgrades?
A: Architect components should be upgraded first, followed by your workloads. Hibernated pods will be woken during node drains and can be safely rescheduled.
Q: Is there a limit to how many pods can be hibernated?
A: There's no hard limit. The practical limit depends on your node's disk space for storing checkpoints (typically 50-200MB per pod) and the architectd daemon's capacity.
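As a back-of-the-envelope sizing check (hypothetical numbers, using the 50-200MB range above), you can estimate the worst-case checkpoint storage a node would need:

```shell
# Hypothetical sizing estimate; replace with your own pod count and observed checkpoint size.
PODS_PER_NODE=500
CHECKPOINT_MB=200                                  # upper end of the typical 50-200MB range
NEEDED_GB=$((PODS_PER_NODE * CHECKPOINT_MB / 1024))
echo "worst-case checkpoint storage: ${NEEDED_GB}GB per node"
```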
Q: How do I know how much I'm saving?
A: Monitor the difference between provisioned resources and actual usage:
# Provisioned resources
kubectl get pods -o custom-columns=NAME:.metadata.name,CPU:.spec.containers[0].resources.requests.cpu,MEMORY:.spec.containers[0].resources.requests.memory
# Actual usage (hibernated pods show ~0)
kubectl top pods
A more concise breakdown will be available soon at https://console.architect.io/
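To turn the provisioned-vs-used gap into a rough dollar figure, apply your provider's rates. The script below is an illustrative sketch only: the prices are made up, and the request values stand in for what you would read from the resource-tracking annotations.

```shell
# Illustrative only: every number below is hypothetical.
CPU_MILLICORES=500        # original request, e.g. "500m" from the cpu-requests annotation
MEMORY_MI=1024            # original request, e.g. "1Gi" from the memory-requests annotation
IDLE_PERCENT=70           # share of the month the pod spends hibernated
CPU_PRICE=25              # USD per core-month (made up)
MEM_PRICE=4               # USD per GiB-month (made up)

# Hibernated pods consume ~0, so savings scale with idle time.
CPU_SAVED=$((CPU_MILLICORES * CPU_PRICE * IDLE_PERCENT / 100 / 1000))
MEM_SAVED=$((MEMORY_MI * MEM_PRICE * IDLE_PERCENT / 100 / 1024))
echo "estimated savings: \$$((CPU_SAVED + MEM_SAVED)) per pod per month"
```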
Q: What happens to in-flight requests?
A: Architect monitors network traffic and only hibernates pods that have been truly idle (no traffic) for the configured duration. If a request arrives while a pod is transitioning to hibernation or while it's hibernated, it's buffered and delivered once the pod wakes (typically within 50ms). No packets are dropped.
Q: Can I change the managed containers list without restarting pods?
A: While you can update the managed-containers annotation without restarting, it's not recommended. When you remove a container from the managed list, its checkpoint is deleted and it becomes unmanaged. For predictable behavior, use the Recreate deployment strategy or restart pods after changing the annotation.
Q: How much disk space do checkpoints require?
A: Checkpoint size varies by application but typically ranges from 50-200MB per pod. The size depends on the application's memory footprint and state. Monitor disk usage on nodes with:
kubectl exec -n architect <architectd-pod> -- du -sh /var/lib/architect/checkpoints/
Q: Does Architect work with StatefulSets?
A: Yes, Architect works with StatefulSets. Each pod maintains its own checkpoint and persistent volume claims remain mounted during hibernation. Use the same annotations and runtime class as with Deployments.
Q: What happens if the architectd daemon crashes?
A: If architectd crashes on a node, pods on that node continue running normally but won't hibernate or wake. The daemon automatically restarts via the DaemonSet controller. Existing checkpoints are preserved and operations resume once architectd is back online.
Q: When should I use gVisor (runsc-architect) vs runc (runc-architect)?
A: Choose based on your needs:
- Use runc-architect for most workloads. It provides automatic hibernation on inactivity and wakes pods again on activity. Best for web services, APIs, and general (trusted) workloads. Checkpoints here are container-scoped, meaning you can checkpoint individual containers and opt others out.
- Use runsc-architect when you need enhanced security isolation (gVisor's sandbox), e.g. for untrusted workloads, or want explicit control over checkpoint timing. With gVisor, you create checkpoints manually using PersistentCheckpoint CRDs while the pod continues running. This is useful for creating backup snapshots or pre-migration checkpoints without stopping your workload. Note that runsc-architect does not support init containers, except when they are used as sidecar containers. Checkpoints here are pod-scoped, meaning all containers get checkpointed together.
Q: Can I exclude certain pods from hibernation temporarily?
A: Yes, you can:
- Remove the container from the managed-containers annotation
- Set a very long timeout (e.g., "24h")
- Remove the runc-architect runtime class (requires pod restart)
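For example, the long-timeout option is just a pod-template annotation change (container name hypothetical):

```yaml
metadata:
  annotations:
    # Effectively pauses hibernation while keeping the pod managed by Architect
    architect.loopholelabs.io/scaledown-durations: '{"my-app-container":"24h"}'
```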
Q: How do I calculate my actual cost savings?
A: Track these metrics:
# Total provisioned resources (requests are strings like "1Gi"/"500m";
# jq cannot sum mixed units, so list them per container instead)
kubectl get pods -o json | jq '[.items[].spec.containers[].resources.requests]'
# CPU actually being used, in millicores (awk reads the leading number of values like "250m")
kubectl top pods --no-headers | awk '{sum+=$2} END {print sum "m"}'
# Savings = (Provisioned - Actual) * Cloud provider rates
Q: Where are checkpoints cached?
A: Checkpoints are cached on each node that architectd runs on, at /root/.local/state/architect/.
Q: How do I uninstall Architect?
A: To uninstall Architect without disrupting your running workloads, you must follow a specific sequence. Failure to update your workloads before uninstallation will result in pod errors.
- Prepare Your Workloads (Required): before running the uninstall command, you must update your workload configurations to ensure they no longer depend on the Architect runtime.
- Remove runtimeClassName: Remove this configuration from all active workloads. If you uninstall Architect while runtimeClassName is still active, all managed pods will immediately enter an Error state.
- Remove Annotations (optional): Delete any architect.loopholelabs.io/* annotations to keep your manifests clean.
- Execute Uninstallation: once your workloads are updated, use Helm to remove the Architect release:
helm uninstall -n architect architect
- Post-Uninstall Cleanup: to fully clean your environment, manually remove any remaining Architect-specific labels from your cluster nodes.
Support and Resources
- Documentation: This guide and architecture documentation
- Support: Contact Loophole Labs support team
- License: Contact admin@loopholelabs.io for enterprise installations
Conclusion
Architect for Kubernetes fundamentally changes the economics of running Kubernetes workloads. By eliminating idle resource consumption while maintaining instant availability, you can:
- Overprovision for peak capacity without cost penalties
- Reduce infrastructure spend by 30-80%
- Maintain or improve application performance
- Simplify capacity planning and autoscaling
Next Steps
- Start Small: Deploy Architect in a development or staging environment first
- Test Compatibility: Verify your applications checkpoint and restore correctly
- Monitor Savings: Track resource consumption before and after enabling Architect
- Optimize Timeouts: Fine-tune idle timeouts based on your traffic patterns
Getting Help
- Review the architecture documentation for deep technical details
- Contact support for assistance with specific use cases
- Join our community discussions for tips and best practices
Start with a small subset of workloads, measure the benefits, and gradually expand your Architect deployment for maximum savings.