
The Math Changed: Rewriting Stale Open Source In The AI Era

TL;DR:

Until recently, rewriting an established open source project could take years. With LLMs, that's changed. We're rewriting CRIU in Zig, and expect it to be complete in months, not years.

Reaching For The Obvious Tool

Anyone who's done home renovations knows there's a tool for every job. Over the years you end up with a shed full of specialized things that each do one job well. Tools you might only need a handful of times, but when the moment comes, nothing else will do.

 My house              Tools to fix up my house
 ════════              ════════════════════════
                       ┌───────────────────────────┐
                       │  hammer   saw    drill    │
                       │  chisel   awl    pliers   │
                       │  level   square   axe     │
    ╱╲                 │  jigsaw   sander  oil     │
   ╱  ╲                │  crowbar  plane    vise   │
  ┌────┐               │  tape   wrench   ratchet  │
  │ ▢▢ │               │  spanner  nailer   awl    │
  └────┘               └───────────────────────────┘

Software is the same story. As we build, we reach for the specialized tools that let us move faster and ship better. And both kinds of tools need maintenance. A rusty saw is more pain than it's worth when you actually need to cut wood, and a stale software tool can cost you more time than it ever saved. The catch with software is that you usually aren't the one keeping it sharp. Someone else is, and you're trusting that they still are.

The standard advice, when you notice your tool getting dull, is to be a good open-source citizen: file the bug, write the patch, send the pull request. Most of us have done it, and the reciprocity is what's kept the ecosystem running. But that model is starting to show strain. Maintainers are quietly drowning in AI-generated patches and bug reports that look right at a glance and burn hours under review.

The contribute-upstream economics worked when humans wrote thoughtful patches for humans to review, at roughly comparable speed. When one side of that equation runs at machine speed and the other doesn't, the whole loop breaks down. It's worth asking whether the model still works at all.

Checkpoint/restore is what Architect does for a living. We take our customers' running Linux processes - their memory, their threads, their open files and sockets - freeze them, pick them up, and bring them back to life on a different machine. There is one de facto standard tool on Linux for this: CRIU. It's been around for more than a decade, it's wired into runc, Podman, and the wider Kubernetes ecosystem, and it's the obvious thing every team reaches for first.

So we reached for it.

How Open Source Projects Go Stale

The issues we kept running into - and the workarounds we kept piling on top of each other - weren't really CRIU problems. They were structural ones, the kind every long-lived open source project eventually runs into, and no amount of patching around the edges was going to fix them.

People have passions - almost obsessions. We've all had things in our lives we were utterly obsessed with for a while, and then just grew away from. Maintaining that intensity over years, let alone decades, is genuinely hard.

At the start of a project, everything is new. Every commit moves the needle, the mental model fits in your head, and progress is fast and fun. As the project matures, the work shifts. More of your time goes to maintenance, regression hunts, and keeping the existing code from rotting under you. Back to the renovation analogy: we've all got the project on the shelf that died at 90%, because the last 10% was the unfun part.

When code is written by hand, people only have finite energy to push things forward. The original driving forces eventually burn out and move on, and the project keeps running on whoever's left.

The World Doesn't Stop

A long-lived project also exists in a world that keeps changing around it. When CRIU shipped its first release in 2012, Go had just hit 1.0. Rust was three years from its first stable release. Zig wouldn't exist for another four. The Linux kernel didn't have io_uring. eBPF as we know it didn't exist. Containers were a niche curiosity rather than the substrate of half the world's compute. Build systems, CI, package management, testing conventions, documentation tooling - all of it has moved, often more than once.

When a project doesn't keep refactoring toward its environment, the world around it slowly drifts out of fit. The tells are subtle. CRIU's homepage is still a MediaWiki install in 2026. It's the kind of detail that doesn't matter on its own, but reads like wallpaper from the wrong decade. It quietly marks when the project stopped renovating.

┌──────────────────────────────────────────────────────────────────────────┐
│ File  Edit  View  Help                                                   │
├──────────────────────────────────────────────────────────────────────────┤
│ ◄ ► ⟳ ⌂   http://geocities.com/~user/website-i-have-not-updated-lately   │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   ★ WELCOME TO MY HOMEPAGE ★             visitors: [ 0 0 0 0 4 2 7 ]     │
│   ⚠ UNDER CONSTRUCTION ⚠                 [<animated worker.gif>]         │
│                                                                          │
│   webring:  << prev | random | next >>   last updated: April 1998        │
│   sign my guestbook!                     best viewed in Netscape         │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

Some of that drift can be patched over. CRIU did get bolted into runc, podman, and Kubernetes after they emerged. If the existing core exposes a reasonable interface, new orchestration layers can sit on top of it without surgery. Other kinds of drift can't be patched at all. You don't migrate an established C codebase to Rust or Zig with a few pull requests. The implementation language, the memory model, the on-disk formats, and the core architecture all freeze the moment they're chosen. By the time the world has moved on, the cost of moving with it approaches the cost of starting over.

Big Things Don't Innovate

There's a third force at work, and it's structural. Big companies don't innovate. Not because the people inside them lack ideas, but because the company has been optimized, often deliberately, to reduce risk and protect existing revenue. Innovation is by definition risky and disruptive, and a system tuned for stability rejects it the way an immune system rejects a foreign cell.

Mature projects work the same way. Once there are real users, a downstream ecosystem, and a core group that's been around a long time, the dominant force becomes "don't break anything." Reviewers get cautious. Scope tightens. Ambitious patches die in discussion. The project quietly converges on a position: keep it alive, keep it stable, don't touch the bottom layer. Big innovation stops being something anyone can push through, however much they want to.

In the corporate world there's at least an escape valve. Big companies let startups take on the risk, wait until the idea proves out, and acquire the result. The innovation happens outside, and the incumbent buys it once it's a known quantity. Open source has no equivalent. Large mature projects rarely absorb small innovative ones. A successful alternative implementation has to stand on its own; it doesn't get merged into the thing it was built to replace. Meaningful innovation in mature OSS isn't an internal process. It's an external one.

The pattern is everywhere once you look for it. Cassandra didn't get built inside the MySQL codebase. CockroachDB didn't get built inside Postgres, even though it speaks the Postgres wire protocol, uses Postgres-compatible SQL, and targets users who outgrew single-node Postgres. It's the thing Postgres would have become if Postgres could have evolved in that direction internally, and instead it had to be built outside. The modern database landscape (Cassandra, Redis, ClickHouse, DuckDB, TigerBeetle, and dozens of others) is what happened when people wanted something different and the incumbents weren't going to absorb it. None of it got folded back into MySQL or Postgres. The incumbents keep running on whoever's left.

Just Write Your Own

Software development has always had a tension between reaching for an existing tool and rolling your own version. Both extremes have their famous failure modes. On one end you pull in hundreds of dependencies for trivial things, and find out six months later that one of them has been hijacked or unpublished. On the other you refuse to use ready-made solutions for anything, and spend months rewriting parsers and serialization libraries that the world has had for a decade.

What's kept the second extreme in check, at least until recently, is speed. Rolling your own version of an established tool is expensive. The problem space is large, the edge cases are hostile, the long tail of platform quirks goes on forever. Even if you knew everything you needed to know to rebuild a tool, the months or years of typing got in the way. Most of the time it was cheaper to live with the tool you had than to build the one you wanted. Building from scratch was reserved for the desperate or the well-funded.

How AI Changes The Math

 ✻ Welcome to Claude Code

 cwd: ~/linux

 ╭─────────────────────────────────────────────────────────────╮
 │ > Rewrite the linux kernel in rust on pain of death.        │
 │   I'll be back in 10 mins and expect it to boot. Only use   │
 │   unsafe if you really have to, or if you're just feeling   │
 │   adventurous.                                              │
 ╰─────────────────────────────────────────────────────────────╯

 [10 mins later...]

 ╭─────────────────────────────────────────────────────────────╮
 │ > Sorry I forgot about rust compile time. I'll be back in   │
 │   2 days.                                                   │
 ╰─────────────────────────────────────────────────────────────╯

The barriers to rewriting an established project (in a different language, with different priorities, for a more specific purpose) have shrunk dramatically.

The most interesting thing isn't that AI can write code; it's that the existing project becomes a usable reference. Decades of accumulated decisions, edge cases, and platform quirks used to be the wall blocking any rewrite. Now they're a specification you can read against your own version. The incumbent stops being a prison and starts being documentation.

What you build instead can be optimized for your particular usage rather than every possible usage. Off-the-shelf tools solve a wide range of problems fairly well. A purpose-built tool can solve your problem really well, free of the cruft that accumulates over years of being everything to everyone. Maybe you want it in a different language. Maybe you want to integrate it with the rest of your stack in ways the original architecture forbids. Maybe you want a different memory model, on-disk format, or concurrency story. All of these are now within reach.

What AI doesn't do is the architecture or the judgment. The hard parts of building infrastructure still come from humans: what to expose, what to hide, where the abstractions break, which tradeoffs are real and which are folklore. But the months of typing, the long tail of platform quirks, the exhaustive test matrices, the boilerplate around the actual ideas: that's what's gotten cheap. Removing it doesn't make rewriting trivial. It makes rewriting finite.

Rewriting CRIU In Zig

Live migration is core to what we build. Architect's whole value proposition (moving running workloads across clouds, across nodes, across hardware) depends on checkpoint/restore being fast, predictable, and ours to fix when it breaks. We need to optimize every step of the process, identify and eliminate bottlenecks (almost always disk or network at our scale), and ship a fix the same day we find the bug.
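
As a rough illustration of what "identify the bottleneck" can look like in practice, here is a minimal Python sketch of per-stage timing. It is not Architect's tooling: the stage names (freeze, dump_memory, transfer) are hypothetical, and time.sleep stands in for real work. The clock is injectable so the measurement logic can be tested without waiting on real time.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock   # injectable so tests can supply a fake clock
        self.totals = {}      # stage name -> total seconds spent

    @contextmanager
    def stage(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record the time even if the stage raised.
            elapsed = self._clock() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def bottleneck(self):
        """Return (name, seconds) for the slowest stage, or None if empty."""
        if not self.totals:
            return None
        name = max(self.totals, key=self.totals.get)
        return name, self.totals[name]


if __name__ == "__main__":
    timer = StageTimer()
    # Hypothetical stages; the sleeps are placeholders for real work.
    for name, seconds in [("freeze", 0.01), ("dump_memory", 0.05), ("transfer", 0.03)]:
        with timer.stage(name):
            time.sleep(seconds)
    print(timer.bottleneck())
```

Wrapping each phase in a named stage keeps the measurement next to the code it measures, so the slowest phase shows up in the totals instead of being guessed at.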

CRIU gets us most of the way there. To be clear: it's a serious piece of software, with more than a decade of careful work, deep kernel integration, and a real engineering effort we have a lot of respect for. But "most of the way" isn't enough when checkpoint/restore is the foundation of your product. And working downstream of a large project with hundreds of open issues unrelated to your needs (issues you can't ignore, because they touch the same code paths your product depends on) is a tax that doesn't go away.

So we're rewriting CRIU in Zig - not as a fork, not as an upstream contribution, but as a different tool aimed squarely at our customers' problems. Why Zig (and not Rust, or C, or Go) is a post of its own; stay tuned.

How The Rewrite Works

   STEP 1   Build a robust test harness against the target
            project, covering ONLY the behavior we want to
            replicate.

   STEP 2   Start the rewrite with the language and tooling
            that make the most sense today, using the
            target project as the specification.

   STEP 3   Iterate until every test passes.
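
The three steps above amount to differential testing: run the same inputs through the reference and the rewrite, and treat any disagreement as a failing test. Below is a minimal Python sketch of that loop, not our actual harness. The two parse_size functions are hypothetical toy stand-ins for "the target project" and "the rewrite", and CASES plays the role of the in-scope behavior from step 1.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass
class Mismatch:
    case: Any
    expected: Any  # what the reference produced
    actual: Any    # what the rewrite produced


def _observe(impl: Callable[[Any], Any], case: Any):
    """Run impl on case; capture the result or the exception type."""
    try:
        return ("ok", impl(case))
    except Exception as exc:
        return ("error", type(exc).__name__)


def differential_test(reference, candidate, cases: Iterable[Any]) -> List[Mismatch]:
    """Return every case where the candidate disagrees with the reference."""
    mismatches = []
    for case in cases:
        expected = _observe(reference, case)
        actual = _observe(candidate, case)
        if expected != actual:
            mismatches.append(Mismatch(case, expected, actual))
    return mismatches


# Hypothetical stand-in for the established project (the specification).
def reference_parse_size(text: str) -> int:
    units = {"k": 1024, "m": 1024 ** 2}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


# Hypothetical stand-in for the rewrite, mid-iteration.
def rewrite_parse_size(text: str) -> int:
    text = text.strip().lower()
    if text.endswith("k"):
        return int(text[:-1]) * 1024
    return int(text)  # the "m" suffix has not been ported yet


# Step 1: only the behavior we care about goes in this list.
CASES = ["4096", "4k", " 8K ", "2m", "bogus"]

if __name__ == "__main__":
    for m in differential_test(reference_parse_size, rewrite_parse_size, CASES):
        print(f"MISMATCH on {m.case!r}: expected {m.expected}, got {m.actual}")
```

Errors are compared as well as return values, so the rewrite has to fail the same way the reference does. Step 3 is then a matter of iterating until the mismatch list is empty for every case in scope.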

How's It Going?

It's early days, but the initial signs are good. We can verify behavioral parity, test by test, against the slice of CRIU we care about, ignore everything we don't, and move fast. We already have a development spike that checkpoints and restores things CRIU can't currently handle (including processes using io_uring), and we expect to drop in our replacement by summer.

What we'll have is our own easy-to-maintain checkpoint/restore codebase, in Zig. Explicit allocators give us predictable memory behavior, comptime replaces brittle macros, and zero-cost C interop lets us reach into existing systems libraries directly. We'll be able to iterate fast on issues, and reach areas of the problem space that CRIU's slower issue process hasn't gotten to yet. And we'll be able to integrate tightly with our specific use case, optimizing the things our customers care about most.

An LLM rewrite of open source tailored to your use case isn't going to be the right answer for everyone. But the math has changed. For the first time, the cost of building exactly what you need might be lower than the cost of adapting to what already exists.

Author: Jimmy Moore, Founding Engineer
