TL;DR:
We've made spot instances practical for all Kubernetes workloads. Architect migrates pods to new nodes before spot instances are reclaimed - with zero downtime, zero client interruptions, and zero code changes. Get 75%+ cost savings on your entire Kubernetes infrastructure while maintaining production reliability.
Join our waitlist to run your production workloads on spot instances without any compromises.
The Spot Instance Problem
Spot instances are 75%+ cheaper than on-demand instances, but they can be reclaimed with as little as 30 seconds to 2 minutes of notice. This makes them unusable for production Kubernetes workloads.
When a spot instance is reclaimed, every pod on that node experiences:
- Forced termination
- Rescheduling to a new node (if capacity exists)
- Full application restart and initialization
- Load balancer re-registration
- Client request failures during the transition
Even "stateless" workloads aren't immune. Your users experience errors, timeouts, and degraded performance every time a node is preempted. The unpredictability makes it worse - preemption rates can swing from 3% to over 60% based on time, region, and instance type.
The result? Teams keep paying full price for on-demand instances because reliability matters more than cost savings.
Why Traditional Approaches Fail
Kubernetes provides mechanisms to handle node failures - pod disruption budgets, graceful termination, and cluster autoscaling. But none of these prevent service interruption during spot instance preemption.
Common workarounds all have critical flaws:
- Keep extra replicas: Defeats the cost savings
- Fast autoscaling: Still causes client interruptions during pod startup
- Graceful shutdown handlers: Can't prevent the pod from being terminated
- Mixed node pools: Reduces but doesn't eliminate the problem
You're still forced to choose between cost and reliability.
Enter Architect: Seamless Pod Migrations
We took a different approach. Instead of managing the chaos of preemption, we eliminate it entirely.
Architect continuously snapshots your running pods without interrupting them. As a result, Architect always has the complete state of your application on hand before anything happens. When a spot instance preemption notice arrives, we migrate your pods to new nodes before the original instance terminates. Your applications keep running, connections stay alive, and clients never notice.
This isn't a restart or a reschedule - it's true live migration for any Kubernetes workload.
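To see why snapshotting ahead of time matters, it helps to compare how long it takes to copy a pod's state against the notice window. The sketch below is a back-of-the-envelope calculation, not Architect's implementation: the state size, the amount of data changed since the last sync, and the network throughput are hypothetical numbers chosen for illustration. Only the 30-second window comes from the text above.

```python
GIB = 1024 ** 3


def transfer_seconds(state_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time needed to copy `state_bytes` over a link with the given throughput."""
    return state_bytes / bandwidth_bytes_per_s


# Hypothetical numbers, for illustration only.
pod_state = 64 * GIB                # full in-memory state of one large pod
dirty_since_last_sync = GIB // 4    # 256 MiB changed since the last snapshot
bandwidth = 1 * GIB                 # assumed 1 GiB/s effective node-to-node throughput
notice_window = 30.0                # shortest notice mentioned above, in seconds

# Copying everything only after the notice arrives.
cold = transfer_seconds(pod_state, bandwidth)
# Copying only what changed since the last continuous snapshot.
warm = transfer_seconds(dirty_since_last_sync, bandwidth)

print(f"cold copy: {cold:.2f}s, fits in window: {cold <= notice_window}")
print(f"warm copy: {warm:.2f}s, fits in window: {warm <= notice_window}")
# cold copy: 64.00s, fits in window: False
# warm copy: 0.25s, fits in window: True
```

Under these assumed numbers, a copy that starts only when the notice arrives misses the window, while a copy that only has to ship the most recent changes finishes with time to spare.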
How It Works
Under the hood, Architect is a relatively simple state machine whose only job is to keep your application running under any circumstances. Architect moves through four possible states:
- Normal Operation: In this state Architect is continuously capturing pod state and syncing it across nodes. This is why our migrations are instantaneous - the cost of migrating is amortized ahead of time. During this time we're also carefully watching resource utilization across the cluster to make sure we have enough resources to handle any sudden migrations that come up.
- Preemption: When a preemption notice from the cloud provider arrives (between 30 seconds and 2 minutes ahead of time, depending on the provider), Architect goes to work moving pods off the affected node. We mostly rely on the Kubernetes scheduler for this part because we've already been coordinating with the scheduler in advance by way of ballast pods and DaemonSets. This lets us be confident that there are enough resources in the correct places to avoid a node scale-up.
- Migrations: This is where Architect earns its keep. Once the pods themselves have been rescheduled and all ancillary resources have been recreated (file descriptors, PVC attachments, GPU resources, etc.), it's time to move the state. This is relatively instantaneous thanks to all the work we did ahead of time.
- Rerouting: Migrations aren't just supposed to be seamless for the application - they should be seamless for clients as well. To achieve this, Architect updates our XDP-based routing layer to transparently reroute connections. This is done so that no packets are dropped and the client can't even tell that the workload has migrated.
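The four states above can be sketched as a small transition table. This is a minimal illustration, not Architect's actual code: the event names are hypothetical, and the return to Normal Operation once rerouting completes is an assumption that the description above implies but does not state.

```python
from enum import Enum, auto


class State(Enum):
    NORMAL = auto()       # continuously snapshotting and syncing pod state
    PREEMPTION = auto()   # notice received, replacement pods being scheduled
    MIGRATION = auto()    # moving state into the rescheduled pods
    REROUTING = auto()    # redirecting client connections to the new node


class Event(Enum):
    # Hypothetical event names, introduced only for this sketch.
    PREEMPTION_NOTICE = auto()
    PODS_RESCHEDULED = auto()
    STATE_MOVED = auto()
    CONNECTIONS_REROUTED = auto()


# Each (state, event) pair maps to exactly one next state.
TRANSITIONS = {
    (State.NORMAL, Event.PREEMPTION_NOTICE): State.PREEMPTION,
    (State.PREEMPTION, Event.PODS_RESCHEDULED): State.MIGRATION,
    (State.MIGRATION, Event.STATE_MOVED): State.REROUTING,
    # Assumption: after rerouting, the workload is back in normal operation.
    (State.REROUTING, Event.CONNECTIONS_REROUTED): State.NORMAL,
}


def step(state: State, event: Event) -> State:
    """Return the next state, rejecting events that are invalid in `state`."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event.name} is not valid in state {state.name}"
        ) from None


def run(events, start: State = State.NORMAL):
    """Apply events in order and return every state visited, including `start`."""
    path = [start]
    for event in events:
        path.append(step(path[-1], event))
    return path


full_cycle = run([
    Event.PREEMPTION_NOTICE,
    Event.PODS_RESCHEDULED,
    Event.STATE_MOVED,
    Event.CONNECTIONS_REROUTED,
])
print(" -> ".join(s.name for s in full_cycle))
# NORMAL -> PREEMPTION -> MIGRATION -> REROUTING -> NORMAL
```

Keeping the transitions in a single table makes out-of-order events fail loudly instead of silently leaving a migration half-finished.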
The end result is a completely seamless migration experience - neither your clients nor your application can tell that it's now running on a completely new node. The best part is that this entire process requires no code changes, just helm install and your infrastructure is spot-ready.
See It In Action: Live Demo at KubeCon NA 2024
This isn't theoretical. At KubeCon North America 2024, we demonstrated live pod migration across multiple cloud providers (AWS, GCP, and Azure), all while maintaining active client connections.
Inherently Compatible
Architect is inherently compatible with any type of workload and fixes how your infrastructure handles spot preemptions and pod migrations:
- Web Services & APIs: No 502 errors, no request timeouts, no user impact during preemptions
- Databases & Caches: Maintain connections, preserve buffers, no rebalancing overhead
- ML/AI Workloads: Models stay loaded, GPU state is preserved, training continues uninterrupted
- Streaming Systems: Kafka consumers keep their positions, WebSocket connections survive
- Batch Jobs: Complete without restart, no lost progress, no wasted compute
With Architect, every pod in your cluster can now safely run on spot instances.
The Power of Workload Mobility
The same technology that enables spot instance migration also powers:
- Scale-to-Zero Without Cold Starts: Hibernate idle pods, wake them in under 50 ms
- Node Maintenance Without Downtime: Evacuate nodes anytime without service impact
- Dynamic Cost Optimization: Move workloads to the cheapest compute continuously and automatically
Only when your pods can move without interruption does your infrastructure become truly elastic.
Getting Started
Architect is currently in early access. We're working with select teams to dramatically reduce their Kubernetes costs without compromising reliability.
Join our waitlist to be among the first to deploy Architect in your cluster. We'll help you identify which workloads can benefit most and guide you through maximizing your spot instance savings.
Spot instances have always been a compromise - great prices but unreliable for production. With Architect, that compromise finally disappears.