Architect for Kubernetes Documentation

Architect for Kubernetes revolutionizes Kubernetes cost optimization by enabling pods to hibernate in place when idle and wake instantly (<50ms) when needed. Unlike traditional autoscaling solutions that delete pods and cause cold starts, Architect keeps pods scheduled while reducing their resource consumption to zero during idle periods.

Key Benefits

  • Zero idle costs: Hibernated pods consume no CPU or memory
  • Instant wake times: Pods restore in <50ms vs 30-60+ seconds for cold starts
  • No application changes: Works with existing workloads
  • Pods stay scheduled: No delays from rescheduling, PVC mounting, or service registration

Quick Start

Want to see Architect in action immediately? Here's the fastest way to get started:

# 1. Label nodes and install Architect (get your command from https://console.preview.architect.io/)
kubectl label nodes <node-name> architect.loopholelabs.io/node=true
kubectl label nodes <node-name> architect.loopholelabs.io/critical-node=true

helm uninstall -n architect architect || true
helm install architect oci://ghcr.io/loopholelabs/architect-chart \
  --namespace architect --create-namespace \
  --set kubernetesDistro="eks" \
  --set machineToken="mymachinetoken" \
  --set clusterName="myclustername" --wait

# 2. Deploy the example Go application
helm uninstall example-go || true
helm install example-go oci://ghcr.io/loopholelabs/example-go-chart --wait

# 3. Watch the pod hibernate after 10 seconds of inactivity
kubectl get pods -w

# 4. Wake it up instantly with a request
kubectl exec -it <example-go-pod> -- curl localhost:8080

# 5. Observe the resource savings
kubectl top pods

Other example applications you can deploy to test Architect behavior (all pre-configured to be managed by Architect):

helm upgrade example-valkey oci://ghcr.io/loopholelabs/example-valkey-chart --install --wait
helm upgrade example-python oci://ghcr.io/loopholelabs/example-python-chart --install --wait
helm upgrade example-ruby oci://ghcr.io/loopholelabs/example-ruby-chart --install --wait
helm upgrade example-rust-miniserve oci://ghcr.io/loopholelabs/example-rust-miniserve-chart --install --wait
helm upgrade example-kafka oci://ghcr.io/loopholelabs/example-kafka-chart --install --wait
helm upgrade example-spring-boot oci://ghcr.io/loopholelabs/example-spring-boot-chart --install --wait
helm upgrade example-php-wordpress oci://ghcr.io/loopholelabs/example-php-wordpress-chart --install --wait
helm upgrade example-postgres oci://ghcr.io/loopholelabs/example-postgres-chart --install --wait

How It Works

Architect continuously monitors your pods for activity. When a pod becomes idle (no network traffic for a configured duration), Architect:

  1. Creates a checkpoint of the complete pod state (memory, file descriptors, network connections)
  2. Hibernates the pod in place, reducing resource requests to zero
  3. Keeps the pod scheduled and registered with services
  4. Instantly restores the pod when traffic arrives or when accessed via kubectl exec
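
You can watch this lifecycle from the outside by displaying the status label Architect sets on the pod (see Monitoring and Observability below; the container name is a placeholder):

# Show the hibernation status label as a column and watch it change
# as the pod idles and wakes (replace <container-name> with a managed container)
kubectl get pods -w -L status.architect.loopholelabs.io/<container-name>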

Wake Triggers

Pods automatically wake from hibernation when:

  • Network traffic arrives - Any incoming network packet triggers immediate restoration
  • kubectl exec commands - Running commands in the container wakes it instantly
  • API calls (coming soon) - Programmatic wake/sleep control via Architect API
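
For example, either trigger can be exercised by hand (pod name and port are placeholders):

# Wake via network traffic: forward a local port and send a request;
# the first incoming packet restores the pod
kubectl port-forward <pod-name> 8080:8080 &
curl localhost:8080

# Wake via kubectl exec: running any command in the container restores it
kubectl exec -it <pod-name> -- /bin/sh -c "echo awake"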

Installation

Prerequisites

  • Kubernetes cluster version 1.32 or higher (1.33 or higher is required for pod sleeping)
  • Helm 3 or higher
  • Nodes where Architect workloads will run must be labeled (see the pre-flight check below)
  • For Amazon EKS: must use AL2023 AMI (AL2 is not supported)
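
A quick pre-flight check, using the node label from the Quick Start:

# Confirm client/server Kubernetes versions and Helm version
kubectl version
helm version

# Confirm the nodes intended for Architect workloads are labeled
kubectl get nodes -l architect.loopholelabs.io/node=true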

Step 1: Install Architect

Sign into https://console.preview.architect.io/ and click on the + Add Cluster button, then follow the instructions.

Step 2: Verify Installation

# Check that all Architect components are running
kubectl get pods -n architect

# You should see:
# - architect-manager (admission controller)
# - architect-control-plane
# - architectd pods on each labeled node
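
You can also confirm that the runtime classes referenced by workloads exist (assuming the chart registers them during installation):

# Both runtime classes should be listed after installation
kubectl get runtimeclass runc-architect runsc-architect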

Configuring Workloads for Architect

To enable Architect for your workloads, you need to:

  1. Set the runtime class to runc-architect
  2. Specify which containers to manage
  3. Configure idle timeouts

Basic Configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        # Specify which containers Architect should manage
        architect.loopholelabs.io/managed-containers: '["my-app-container"]'
        # Set idle timeout (optional, default is 10s)
        architect.loopholelabs.io/scaledown-durations: '{"my-app-container":"30s"}'
    spec:
      runtimeClassName: runc-architect # Required
      containers:
        - name: my-app-container
          image: my-app:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"

Configuration Options

Runtime Class (Required)

spec:
  runtimeClassName: runc-architect

This tells Kubernetes to use Architect's custom runtime for this pod. An additional runtime class powered by gVisor, runsc-architect, is also available.

Managed Containers Annotation

architect.loopholelabs.io/managed-containers: '["container-1", "container-2"]'
  • Lists which containers in the pod should be managed by Architect
  • Containers not in this list run normally without hibernation
  • Useful for excluding sidecar containers (e.g., logging agents)

Scale-down Durations Annotation

architect.loopholelabs.io/scaledown-durations: '{"container-1":"30s", "container-2":"60s"}'
  • Sets how long a container must be idle before hibernating
  • Format: JSON object with container names as keys and durations as values
  • Default: 10 seconds if not specified
  • Minimum: 1s, Maximum: unlimited

Post-Migration Auto Scale Up Containers Annotation

architect.loopholelabs.io/postmigration-autoscaleup-containers: '["container-1", "container-2"]'
  • Lists which containers should automatically scale back up after a migration (by default, containers stay scaled down so as not to cause a thundering herd on migrations)

Disable Auto Scale Down Containers Annotation

architect.loopholelabs.io/disable-autoscaledown-containers: '["container-1", "container-2"]'
  • Lists which containers should never scale down automatically. By default, containers scale down after the duration set in the scale-down durations annotation; containers listed in this annotation are never scaled down automatically
  • Mostly useful for long-running background jobs that should still be migrated by default but should not scale down when there is no traffic

Scale-Up Timeout Containers Annotation

architect.loopholelabs.io/scaleup-timeout-containers: '{"container-1": "60s", "container-2": "2m"}'
  • Sets how long a container should wait for a checkpoint to become available during scale-up
  • Format: JSON object with container names as keys and durations as values
  • Default: 30 seconds if not specified
  • When a new pod starts and other pods with the same template hash exist, the container waits up to this timeout for a checkpoint CRD to be advertised before aborting the checkpoint download and starting a fresh container
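
Putting these together, a pod template that combines several annotations might look like this (a sketch; container names are placeholders):

metadata:
  annotations:
    architect.loopholelabs.io/managed-containers: '["api", "worker"]'
    architect.loopholelabs.io/scaledown-durations: '{"api":"30s", "worker":"60s"}'
    architect.loopholelabs.io/postmigration-autoscaleup-containers: '["api"]'
    architect.loopholelabs.io/disable-autoscaledown-containers: '["worker"]'
    architect.loopholelabs.io/scaleup-timeout-containers: '{"api":"60s"}'
spec:
  runtimeClassName: runc-architect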

Usage Examples

Example 1: Web API Service

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 10 # Can now overprovision without cost penalty
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
      annotations:
        architect.loopholelabs.io/managed-containers: '["api"]'
        architect.loopholelabs.io/scaledown-durations: '{"api":"30s"}'
    spec:
      runtimeClassName: runc-architect
      containers:
        - name: api
          image: mycompany/api:v2.1
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1000m"

Example 2: Microservices with Sidecar

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 15
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
      annotations:
        # Only manage the main container, not the sidecar
        architect.loopholelabs.io/managed-containers: '["order-service"]'
        architect.loopholelabs.io/scaledown-durations: '{"order-service":"60s"}'
    spec:
      runtimeClassName: runc-architect
      containers:
        - name: order-service
          image: mycompany/order-service:v1.5
          ports:
            - containerPort: 8080
        - name: logging-agent
          image: fluentd:latest
          # This container is not managed by Architect
Example 3: Development Environment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dev-environment
  namespace: development
spec:
  replicas: 50 # One per developer; most sit idle
  selector:
    matchLabels:
      app: dev-environment
  template:
    metadata:
      labels:
        app: dev-environment
      annotations:
        architect.loopholelabs.io/managed-containers: '["dev-container"]'
        # Aggressive hibernation for dev environments
        architect.loopholelabs.io/scaledown-durations: '{"dev-container":"5s"}'
    spec:
      runtimeClassName: runc-architect
      containers:
        - name: dev-container
          image: mycompany/dev-env:latest
          resources:
            requests:
              memory: "4Gi"
              cpu: "2000m"

Monitoring and Observability

Pod Status Labels

Architect adds specific labels to track container hibernation state:

# Check hibernation status for a specific container
kubectl get pods -l status.architect.loopholelabs.io/<container-name>=SCALED_DOWN

# Example: Check if the 'api' container is hibernated
kubectl get pods -l status.architect.loopholelabs.io/api=SCALED_DOWN

# List all pods with any hibernated containers
kubectl get pods -o json | jq -r '.items[] | select(any((.metadata.labels // {}) | to_entries[]; (.key | startswith("status.architect.loopholelabs.io/")) and .value == "SCALED_DOWN")) | .metadata.name'

Resource Tracking Annotations

When a pod hibernates, Architect preserves the original resource requests in annotations:

# View original CPU requests for hibernated containers
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations.architect\.loopholelabs\.io/cpu-requests}'
# Output: {"container-name":"250m"}

# View original memory requests
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations.architect\.loopholelabs\.io/memory-requests}'
# Output: {"container-name":"6Gi"}

Resource Consumption

Monitor actual resource usage to see savings (requires Kubernetes 1.33 or higher):

# View resource consumption of pods
kubectl top pods

# Hibernated pods will show zero CPU and memory usage
# Compare with original requests stored in annotations to calculate savings

Logs

Architect components log important events:

# View architectd logs on a specific node
kubectl logs -n architect -l app=architectd --tail=100

# View admission controller logs
kubectl logs -n architect -l app=architect-manager --tail=100

# Filter logs for specific pod events
kubectl logs -n architect -l app=architectd | grep <pod-name>

Known Limitations

  1. GPU Workloads: GPU state preservation is under development

Testing Your Application

Before deploying to production, test your application's compatibility:

# 1. Deploy with Architect in staging
# 2. Generate typical load
# 3. Let it hibernate (check status label)
kubectl get pod <pod> -o jsonpath='{.metadata.labels.status\.architect\.loopholelabs\.io/<container>}'

# 4. Wake it with traffic
kubectl exec <pod> -- curl localhost:<port>/health

# 5. Verify functionality and state preservation
# 6. Check logs if there are errors
kubectl logs -n architect -l app=architectd | grep <pod>

Best Practices

1. Node Configuration

  • Label nodes appropriately: Only label nodes where you want Architect workloads to run
  • Avoid preemptable nodes for Architect components: The architect-manager and architect-control-plane should run on stable nodes
  • Separate control plane from workloads: Run Architect control components on different nodes than your workloads when possible
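
For example, using the labels from the Quick Start (assuming the critical-node label is what steers Architect's own components; node names are placeholders):

# Nodes that should run Architect-managed workloads
kubectl label nodes <workload-node> architect.loopholelabs.io/node=true

# Stable nodes for Architect's control components
kubectl label nodes <stable-node> architect.loopholelabs.io/critical-node=true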

2. Application Suitability

Well-suited applications:

  • Stateless web services and APIs
  • Microservices with intermittent traffic
  • Development and staging environments
  • Batch processing jobs with idle periods
  • Services with predictable traffic patterns

Applications requiring careful consideration:

  • GPU workloads requiring CUDA state preservation (under development)

3. Configuration Guidelines

  • Start with conservative timeouts: Begin with 30-60 second idle timeouts and decrease gradually
  • Test in staging first: Always validate hibernation behavior in non-production environments
  • Monitor wake times: Ensure your SLOs are met with the hibernation/wake cycle

4. Capacity Planning

With Architect, you can:

  • Overprovision without cost penalty: Run more replicas for better availability
  • Eliminate scaling buffers: No need for extra replicas to handle scale-up delays
  • Simplify HPA configuration: Focus on actual capacity needs, not scaling delays

Updating and Managing Workloads

Adding Architect to Existing Workloads

  1. Add the runtime class:
spec:
  runtimeClassName: runc-architect
  2. Add the managed containers annotation:
annotations:
  architect.loopholelabs.io/managed-containers: '["your-container"]'
  3. Apply the changes:
kubectl apply -f your-deployment.yaml
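
Alternatively, the same change can be made in place with kubectl patch (a sketch; the deployment and container names are placeholders, and note that patching the pod template triggers a rollout):

kubectl patch deployment <deployment-name> --type merge -p '{
  "spec": {
    "template": {
      "metadata": {
        "annotations": {
          "architect.loopholelabs.io/managed-containers": "[\"your-container\"]"
        }
      },
      "spec": {
        "runtimeClassName": "runc-architect"
      }
    }
  }
}'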

Removing Architect from Workloads

To disable Architect for a workload:

  1. Remove the container from the managed containers list:
annotations:
  architect.loopholelabs.io/managed-containers: "[]"
  2. Or remove the runtime class:
# Remove or comment out:
# runtimeClassName: runc-architect
  3. Apply changes (no need to delete the pod):
kubectl apply -f your-deployment.yaml

Troubleshooting

Pod Not Hibernating

Check idle timeout configuration:

# View configured timeout (default is 10s if not set)
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations.architect\.loopholelabs\.io/scaledown-durations}'

Verify container is managed:

kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations.architect\.loopholelabs\.io/managed-containers}'

Check container status label:

# Check if container shows as scaled down
kubectl get pod <pod-name> -o jsonpath='{.metadata.labels.status\.architect\.loopholelabs\.io/<container-name>}'

Review architectd logs for hibernation events:

kubectl logs -n architect -l app=architectd | grep <pod-name>

Pod Not Waking

Test wake triggers:

# Wake via kubectl exec
kubectl exec -it <pod-name> -- /bin/sh -c "echo test"

# Wake via network traffic (if service exposed)
kubectl port-forward <pod-name> <port>:<port>
curl localhost:<port>

Check pod events:

kubectl describe pod <pod-name>

Verify architectd is running on the node:

# Find which node the pod is on
kubectl get pod <pod-name> -o wide

# Check architectd on that node
kubectl get pods -n architect -o wide | grep <node-name>

High Wake Times

If wake times exceed 50ms, check the following (example commands are shown after the list):

  • Check node CPU and memory availability
  • Verify no resource contention on the node
  • Check checkpoint size (larger applications take longer)
  • Review architectd logs for restore errors
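
A few standard commands that help with these checks:

# Node-level CPU and memory headroom
kubectl top nodes

# Allocation and pressure details for the node running the pod
kubectl describe node <node-name>

# Restore-related errors from architectd
kubectl logs -n architect -l app=architectd --tail=200 | grep -iE "restore|error"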

Checkpoint Failures

Common causes and solutions:

  1. Application incompatibility:

    • Applications using GPUs are not currently supported
  2. Disk space issues:

    # Check disk space on nodes
    kubectl get nodes -o custom-columns=NAME:.metadata.name,DISK:.status.allocatable.ephemeral-storage
  3. Permission issues:

    • Ensure the runtime class is properly set
    • Verify node labels are correct
  4. Review detailed logs:

    # Get detailed architectd logs
    kubectl logs -n architect -l app=architectd --tail=500 | grep -E "checkpoint|restore|error"

Customizing the Helm Chart

The Helm chart supports additional configuration options for customizing components. For example, you can choose to install a pre-release version:

--devel --version 0.0.0-pojntfx-arch-394-implement-p2p-evac-for-new-architect.1.9b433b9

Or add custom node selectors for components to further restrict pod placement:

--set 'architectdNodeSelector.custom-label=value' \
--set 'architectAdmissionControllerNodeSelector.zone=us-east-1a' \
--set 'architectControlPlaneNodeSelector.tier=critical'

Add tolerations for components to allow scheduling pods to tainted nodes:

--set 'architectdTolerations[0].key=dedicated' \
--set 'architectdTolerations[0].operator=Equal' \
--set 'architectdTolerations[0].value=architect' \
--set 'architectdTolerations[0].effect=NoSchedule'

And set resource requests and limits for the different components:

--set 'architectAdmissionControllerResources.requests.cpu=100m' \
--set 'architectAdmissionControllerResources.requests.memory=128Mi' \
--set 'architectAdmissionControllerResources.limits.cpu=500m' \
--set 'architectAdmissionControllerResources.limits.memory=512Mi' \
--set 'architectControlPlaneResources.requests.cpu=200m' \
--set 'architectControlPlaneResources.requests.memory=256Mi'
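
These flags are appended to the same helm command used during installation, for example (values are illustrative):

helm upgrade architect oci://ghcr.io/loopholelabs/architect-chart \
  --install --namespace architect --create-namespace \
  --set kubernetesDistro="eks" \
  --set machineToken="mymachinetoken" \
  --set clusterName="myclustername" \
  --set 'architectdNodeSelector.custom-label=value' \
  --set 'architectControlPlaneResources.requests.cpu=200m' \
  --wait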

FAQ

Q: How is this different from scale-to-zero solutions like KEDA or Knative?

A: Scale-to-zero solutions delete pods entirely, causing 30-60+ second cold starts when they're needed again. Architect keeps pods scheduled but hibernates them in place, enabling <50ms wake times. Your pods stay registered with services, keep their PVCs mounted, and maintain their network configuration.

Q: What triggers a pod to wake from hibernation?

A: Pods wake instantly (<50ms) when:

  • Network traffic arrives at the pod
  • You run kubectl exec commands on the container
  • (Coming soon) API calls to programmatically wake pods

The wake process is automatic and transparent - your application doesn't need any modifications.

Q: Can I use Architect with HPA (Horizontal Pod Autoscaler)?

A: Yes! Architect complements HPA perfectly. HPA handles adding/removing replicas based on metrics, while Architect ensures idle replicas don't consume resources. You can now set more aggressive HPA policies without cost concerns.
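
For reference, a minimal HPA for the api-service example above could look like this (a sketch using the standard autoscaling/v2 API; thresholds are illustrative):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70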

Q: What applications are not compatible?

A: Currently, applications using GPUs are not compatible. We recommend thorough testing in staging environments.

Q: How much overhead does Architect add?

A: Architect adds minimal overhead - typically <1% CPU and <50MB memory per node for the architectd daemon. The checkpoint/restore process itself is highly optimized with near-zero impact on running workloads.

Q: Can I migrate hibernated pods between nodes?

A: Yes - pods that are deleted on one node will have their checkpoints moved to whichever node the replacement pod is scheduled to.

Q: What happens during Kubernetes upgrades?

A: Architect components should be upgraded first, followed by your workloads. Hibernated pods will be woken during node drains and can be safely rescheduled.

Q: Is there a limit to how many pods can be hibernated?

A: There's no hard limit. The practical limit depends on your node's disk space for storing checkpoints (typically 50-200MB per pod) and the architectd daemon's capacity.

Q: How do I know how much I'm saving?

A: Monitor the difference between provisioned resources and actual usage:

# Provisioned resources
kubectl get pods -o custom-columns=NAME:.metadata.name,CPU:.spec.containers[0].resources.requests.cpu,MEMORY:.spec.containers[0].resources.requests.memory

# Actual usage (hibernated pods show ~0)
kubectl top pods

A more concise breakdown will be available soon at https://console.preview.architect.io/

Q: What happens to in-flight requests?

A: Architect monitors network traffic and only hibernates pods that have been truly idle (no traffic) for the configured duration. If a request arrives while a pod is transitioning to hibernation or while it's hibernated, it's buffered and delivered once the pod wakes (typically within 50ms). No packets are dropped.

Q: Can I change the managed containers list without restarting pods?

A: While you can update the managed-containers annotation without restarting, it's not recommended. When you remove a container from the managed list, its checkpoint is deleted and it becomes unmanaged. For predictable behavior, use the Recreate deployment strategy or restart pods after changing the annotation.

Q: How much disk space do checkpoints require?

A: Checkpoint size varies by application but typically ranges from 50-200MB per pod. The size depends on the application's memory footprint and state. Monitor disk usage on nodes with:

kubectl exec -n architect <architectd-pod> -- du -sh /var/lib/architect/checkpoints/

Q: Does Architect work with StatefulSets?

A: Yes, Architect works with StatefulSets. Each pod maintains its own checkpoint and persistent volume claims remain mounted during hibernation. Use the same annotations and runtime class as with Deployments.
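
For reference, the relevant parts of a StatefulSet look the same as in the Deployment examples (a sketch; names and image are placeholders):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-database
spec:
  serviceName: my-database
  replicas: 3
  selector:
    matchLabels:
      app: my-database
  template:
    metadata:
      labels:
        app: my-database
      annotations:
        architect.loopholelabs.io/managed-containers: '["db"]'
        architect.loopholelabs.io/scaledown-durations: '{"db":"60s"}'
    spec:
      runtimeClassName: runc-architect
      containers:
        - name: db
          image: my-database:latest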

Q: What happens if the architectd daemon crashes?

A: If architectd crashes on a node, pods on that node continue running normally but won't hibernate or wake. The daemon automatically restarts via the DaemonSet controller. Existing checkpoints are preserved and operations resume once architectd is back online.

Q: Can I exclude certain pods from hibernation temporarily?

A: Yes, you can:

  1. Remove the container from the managed-containers annotation
  2. Set a very long timeout (e.g., "24h")
  3. Remove the runc-architect runtime class (requires pod restart)

Q: How do I calculate my actual cost savings?

A: Track these metrics:

# Provisioned resources per container
kubectl get pods -o json | jq -r '.items[] | .metadata.name as $pod | .spec.containers[] | "\($pod)/\(.name): cpu=\(.resources.requests.cpu // "0") memory=\(.resources.requests.memory // "0")"'

# CPU actually being used (hibernated pods report ~0; sums the millicore column)
kubectl top pods --no-headers | awk '{sum+=$2} END {print sum "m"}'

# Savings = (Provisioned - Actual) * Cloud provider rates

Q: Where are checkpoints cached?

A: Checkpoints are cached on each node that architectd runs on, at /root/.local/state/architect/.

Support and Resources

  • Documentation: This guide and architecture documentation
  • Support: Contact Loophole Labs support team
  • License: Contact admin@loopholelabs.io for enterprise installations

Conclusion

Architect for Kubernetes fundamentally changes the economics of running Kubernetes workloads. By eliminating idle resource consumption while maintaining instant availability, you can:

  • Overprovision for peak capacity without cost penalties
  • Reduce infrastructure spend by 30-80%
  • Maintain or improve application performance
  • Simplify capacity planning and autoscaling

Next Steps

  1. Start Small: Deploy Architect in a development or staging environment first
  2. Test Compatibility: Verify your applications checkpoint and restore correctly
  3. Monitor Savings: Track resource consumption before and after enabling Architect
  4. Optimize Timeouts: Fine-tune idle timeouts based on your traffic patterns

Getting Help

  • Review the architecture documentation for deep technical details
  • Contact support for assistance with specific use cases
  • Join our community discussions for tips and best practices

Start with a small subset of workloads, measure the benefits, and gradually expand your Architect deployment for maximum savings.