Building my own implementation of Dark Factories in Rust and Jujutsu

I read Steve Yegge's Gas Town article, and honestly, I was inspired. What he describes, a functional implementation of a fully autonomous software factory, defines in my mind the future of software engineering.

What if we could go beyond just having LLMs write our code? What if we could have a fully functioning version of the dark factory pattern that went beyond just coding?

Frequent readers will know where this is going:

How hard could it be?

On the way to Seattle last week, amid a flurry of messages to my colleagues, I decided it was time to start forging this particular piece of software. I launched Ona, my trusty partner in crime for airplane coding (huge shoutout, by the way), on my iPad and started clicking away. I spun up a repo in my Forge and had Ona clone it and start coding.

The idea was clear: fully autonomous from requirements to committed code, with an eventual pipeline to get it deployed. For now, I decided the goal would be a fully asynchronous system that relied only on the disk for long-term persistence. No complex state to manage in the frontend, no state to manage in the backend. The factory simply keeps running: requirements come in, get decomposed and coded up, the results are evaluated against the requirements, get a quick code review, go through merge conflict management, and are finally merged back in.

Currently, the vast majority of coding tools focus on context management, latency, and the human developer experience, rather than on turning the craft of software development into a truly factory-like process. The team at Ona and Lou Bichard have done an incredible job of articulating and reviewing material on this topic.

This is an expansion and implementation of their work.

The Factory (tm)

Not an actual trademark, btw. It's just funny to say "The Factory (tm)".

I love Factorio. The people around me know I probably spend too much time in that game. The entire game boils down to one rule: "The Factory Must Grow." The best way for the factory to grow is to never have bottlenecks. The factory must be able to consume all the "raw" material we can throw at it.

One of the key components of a factory in Factorio is throughput: how much material can move through the factory on any given tick. In Factorio, this is entirely controlled by the logistics network. Belts, pipes, robots, and trains all contribute to the success of the factory. You also need the right kind of assemblers, furnaces, and everything else required to make the rocket.

In the same way, the software dark factory goes beyond the limits of a single coding agent or a handful of git worktrees running at the same time. Using Jujutsu workspaces, we can build isolated environments with a clean merge protocol, on top of a version control system that treats conflicts as first-class citizens. In the dark factory, the logistics network (what we call the Substrate) is entirely controlled and managed by Jujutsu.
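To make the Substrate a bit more concrete, here is a minimal sketch of how each agent could be handed its own workspace by shelling out to `jj workspace add --name <NAME> <DESTINATION>`. It only builds the `std::process::Command` and never runs it. The helper name `workspace_command`, the `agent-<id>` naming scheme, and the separate workspaces directory are assumptions for illustration, not the project's actual layout.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: builds (but does not run) the `jj workspace add`
/// command that gives one agent its own working copy of the shared repo.
fn workspace_command(repo_root: &Path, workspaces_root: &Path, agent_id: u32) -> Command {
    let name = format!("agent-{agent_id}");
    // Assumed layout: one directory per agent under a separate workspaces root.
    let destination = workspaces_root.join(&name);

    let mut cmd = Command::new("jj");
    cmd.current_dir(repo_root)
        .args(["workspace", "add", "--name"])
        .arg(&name)
        .arg(&destination);
    cmd
}

fn main() {
    let cmd = workspace_command(Path::new("/srv/factory"), Path::new("/srv/workspaces"), 7);
    let args: Vec<String> = cmd
        .get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect();
    // Print the command line instead of executing it.
    println!("jj {}", args.join(" "));
}
```

Giving each agent its own named workspace means agents never share a working copy, while all of their changes still land in the same underlying repository.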

Why Not Git?

This question comes up every time I explain the project. Git is ubiquitous, everyone knows it, it works.

The problem is that Git was built for humans. It expects one person (or a small team) to produce changes, stage them, commit them, push them, and resolve merge conflicts interactively. When you throw 20 concurrent agents at a Git repository, things start jamming fast. Merge conflicts become blocking events. Rebases require human intervention. The whole system grinds to a halt waiting for someone to sit down and type git rebase --continue.

The conveyor belt jams. The factory stops.

Jujutsu (or jj) was built differently. A few properties make it uniquely suited for autonomous agents:

Stable Change IDs. In Git, a commit's hash changes every time you rebase it. In jj, every change gets a permanent identifier that survives rewrites, rebases, and conflict resolution. An agent that started working on change tvptzzuk can find that change again after twenty other agents have rebased around it. It sounds like a small thing, but it is absolutely foundational.
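Stable IDs are also a natural home for the validated constructors mentioned later in the post. The sketch below is a hypothetical `ChangeId` newtype whose only constructor rejects malformed input. It assumes change IDs are written in jj's "reverse hex" alphabet (the sixteen letters `k` through `z`, as in `tvptzzuk`); the type and error names are made up for illustration.

```rust
use std::fmt;

/// Hypothetical validated wrapper around a jj change ID.
/// Assumption: change IDs only use the letters `k` through `z`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ChangeId(String);

#[derive(Debug, PartialEq, Eq)]
pub enum ChangeIdError {
    Empty,
    InvalidChar(char),
}

impl ChangeId {
    /// The only way to build a ChangeId, so an invalid one can never exist.
    pub fn parse(raw: &str) -> Result<Self, ChangeIdError> {
        let trimmed = raw.trim();
        if trimmed.is_empty() {
            return Err(ChangeIdError::Empty);
        }
        // Reject the first character outside the assumed alphabet.
        if let Some(bad) = trimmed.chars().find(|c| !('k'..='z').contains(c)) {
            return Err(ChangeIdError::InvalidChar(bad));
        }
        Ok(ChangeId(trimmed.to_string()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

impl fmt::Display for ChangeId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

fn main() {
    let id = ChangeId::parse("tvptzzuk").expect("valid change id");
    println!("tracking change {id}");
    println!("{:?}", ChangeId::parse("abc123"));
}
```

Because the field is private and `parse` is the only constructor, any function that accepts a `ChangeId` can skip re-validating it.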

First-class conflicts. In Git, a conflict is a failure state. jj treats conflicts as data stored inside the commit itself. A jj rebase never fails — it simply records the conflict in the resulting commit and moves on. Agents can keep working. The Refinery handles resolution later. The belt keeps moving.
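Here is a hedged sketch of how the Refinery might find its work: jj's `conflicts()` revset selects commits that contain conflicts, and the template `change_id ++ "\n"` prints one change ID per line. The helper names are hypothetical, and like the current Substrate this shells out to the jj CLI rather than using a library.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical Refinery helper: builds a `jj log` call that prints the change
/// ID of every conflicted commit, one per line, with no graph decoration.
fn conflicted_changes_command(repo_root: &Path) -> Command {
    let mut cmd = Command::new("jj");
    cmd.current_dir(repo_root).args([
        "log",
        "--no-graph",
        "-r",
        "conflicts()",          // revset: commits that carry conflicts
        "-T",
        r#"change_id ++ "\n""#, // template: change ID followed by a newline
    ]);
    cmd
}

/// Turns the command's stdout into a list of change IDs to queue for resolution.
fn parse_change_ids(stdout: &str) -> Vec<String> {
    stdout
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

fn main() {
    let cmd = conflicted_changes_command(Path::new("/srv/factory"));
    println!("{:?}", cmd);
    // In a real loop this would be the stdout of `cmd.output()`.
    let queue = parse_change_ids("tvptzzuk\nqpvuntsm\n");
    println!("{} conflicted changes queued for the Refinery", queue.len());
}
```

Because the conflict lives in the commit rather than in a half-finished rebase, this query can run at any time without interrupting the agents that produced those commits.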

The operation log. Every mutation to a jj repository is recorded in an append-only operation log. This gives you a complete audit trail and instant rollback via jj undo or jj op restore. When an agent crashes mid-operation and restarts, it can inspect exactly what happened and recover cleanly.

Lock-free concurrency. Multiple agents can safely read and write to the same jj repository without explicit locking. Git's index is a single file with a lock; jj doesn't have that constraint.

Put it together and you have a VCS that was, almost accidentally, designed for exactly this use case. The logistics network doesn't jam.

The Blueprint

Since we're doing Factorio, let's talk about what actually runs in the factory. I structured this as a 10-crate Rust workspace, each crate responsible for one layer of the system.

Here's the map:

| Factory Primitive | Crate | What it does |
|---|---|---|
| Conveyor Belt | `factory-substrate` | The jj repository. Items flow through here. |
| Items | `factory-work-graph` | Blueprints, production lines, work items with a full state machine |
| Power Grid | `factory-bedrock` | AWS Bedrock. LLM compute: 19+ models, streaming, tool use, thinking |
| Assembler | `factory-floor` | The AI agent worker. Claude, running tools, producing code |
| Smelter | `factory-floor` | Takes raw requirements, outputs structured specifications |
| Splitter | `factory-floor` | Routes work items to available assemblers |
| Refinery | `factory-floor` | Merges completed work, resolves conflicts |
| Quality Inspector | `factory-floor` | Runs tests, validates output, flags rework |
| Control Room | `factory-control` | Circuit network (metrics, alerts), Overseer (orchestration) |
| Operator Console | `factory-cli` + `factory-web` | How humans interact with the running factory |

The factory-tools crate deserves a special mention. It contains 18 tools the agents actually use to interact with code: filesystem reads and writes, grep, terminal execution, HTTP calls, and more. This is how an Assembler actually does something. It's the difference between an agent that talks about code and one that changes it.
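The post doesn't show the tool interface, so the following is only a guess at its general shape: a hypothetical `Tool` trait plus a registry that dispatches the model's tool calls by name. The `Grep` tool here searches an in-memory string rather than real files, purely to keep the sketch self-contained.

```rust
use std::collections::HashMap;

/// Hypothetical tool interface: the model names a tool and passes a string input.
trait Tool {
    fn name(&self) -> &'static str;
    fn run(&self, input: &str) -> Result<String, String>;
}

/// Stand-in for a grep-like tool: returns matching lines from an in-memory file.
struct Grep {
    contents: String,
}

impl Tool for Grep {
    fn name(&self) -> &'static str {
        "grep"
    }

    fn run(&self, pattern: &str) -> Result<String, String> {
        let hits: Vec<&str> = self.contents.lines().filter(|l| l.contains(pattern)).collect();
        Ok(hits.join("\n"))
    }
}

/// Dispatches a tool call from the model to the matching implementation.
struct ToolRegistry {
    tools: HashMap<&'static str, Box<dyn Tool>>,
}

impl ToolRegistry {
    fn new() -> Self {
        Self { tools: HashMap::new() }
    }

    fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.insert(tool.name(), tool);
    }

    fn call(&self, name: &str, input: &str) -> Result<String, String> {
        match self.tools.get(name) {
            Some(tool) => tool.run(input),
            // Unknown tools become an error message the model can read and correct.
            None => Err(format!("unknown tool: {name}")),
        }
    }
}

fn main() {
    let mut registry = ToolRegistry::new();
    registry.register(Box::new(Grep { contents: "fn main() {}\nfn helper() {}".into() }));
    println!("{:?}", registry.call("grep", "helper"));
    println!("{:?}", registry.call("deploy", ""));
}
```

Returning errors as data instead of panicking keeps a bad tool call inside the agent's loop, where the model can see it and try again.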

Everything sits on factory-core, which defines all the traits and the single error type (FactoryError) that propagates through the whole system. Every major component is a trait with its implementation in a separate crate — this makes testing straightforward and means you can swap out the LLM provider, the VCS layer, or the storage backend without touching the production floor logic.
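A minimal sketch of that pattern, assuming a provider trait called `LlmProvider` and a handful of illustrative `FactoryError` variants (neither is taken from the actual `factory-core`). Production-floor code like the hypothetical `smelt` function depends only on the trait, so a test can swap in a canned provider with no network access.

```rust
use std::fmt;

/// Sketch of a single crate-wide error type in the spirit of `FactoryError`.
/// The variants are illustrative guesses, not the project's actual list.
#[derive(Debug)]
pub enum FactoryError {
    Llm(String),
    Vcs(String),
    Storage(String),
}

impl fmt::Display for FactoryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FactoryError::Llm(msg) => write!(f, "llm error: {msg}"),
            FactoryError::Vcs(msg) => write!(f, "vcs error: {msg}"),
            FactoryError::Storage(msg) => write!(f, "storage error: {msg}"),
        }
    }
}

impl std::error::Error for FactoryError {}

/// Hypothetical provider trait; a Bedrock-backed type would implement it in its own crate.
pub trait LlmProvider {
    fn complete(&self, prompt: &str) -> Result<String, FactoryError>;
}

/// Floor logic depends only on the trait, never on a concrete provider.
pub fn smelt<P: LlmProvider>(provider: &P, requirement: &str) -> Result<String, FactoryError> {
    let spec = provider.complete(&format!("Turn this into a spec: {requirement}"))?;
    if spec.trim().is_empty() {
        return Err(FactoryError::Llm("empty specification".into()));
    }
    Ok(spec)
}

/// Test double: no network, deterministic output.
struct CannedProvider(&'static str);

impl LlmProvider for CannedProvider {
    fn complete(&self, _prompt: &str) -> Result<String, FactoryError> {
        Ok(self.0.to_string())
    }
}

fn main() {
    let spec = smelt(&CannedProvider("1. Add a /health endpoint"), "we need health checks");
    println!("{spec:?}");
}
```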

Durability as a First-Class Requirement

One thing I was deliberate about from the start: anything that matters must live on disk. The only durable layers in the system are the jj repository and the .factory/ JSON files that represent the work graph.

Everything else — the Assembler running an LLM session, the Splitter deciding which agent gets which item, the Engineer tick loop that orchestrates it all — these are all ephemeral. They can crash at any time. When they restart, they read state from disk and continue.

This is what "fully asynchronous, disk-only persistence" actually means in practice. No database. No message queue. No in-memory state that has to be checkpointed. The factory floor is stateless between ticks; the jj repository and the work graph are the only truth.

The consequence is that the system is resilient by construction, not by instrumentation. You don't need to add crash recovery logic everywhere because there's nothing to recover — just read the disk and figure out what's in progress.
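One way to get "nothing to recover" in practice is to make every write atomic. The sketch below assumes one JSON document per work item in a `.factory/`-style directory; the `DiskStore` type and file naming are hypothetical, and serialization is left to the caller (a real implementation would likely use something like `serde_json`). Each save writes a temp file, syncs it, and renames it over the target, so on a POSIX filesystem a crash mid-write leaves either the old or the new version on disk.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::PathBuf;

/// Hypothetical on-disk store: one `<id>.json` file per work item.
struct DiskStore {
    dir: PathBuf,
}

impl DiskStore {
    fn open(dir: impl Into<PathBuf>) -> io::Result<Self> {
        let dir = dir.into();
        fs::create_dir_all(&dir)?;
        Ok(Self { dir })
    }

    /// Write to a temp file, flush it to disk, then rename it over the target.
    fn save(&self, id: &str, json: &str) -> io::Result<()> {
        let tmp = self.dir.join(format!("{id}.json.tmp"));
        let mut file = fs::File::create(&tmp)?;
        file.write_all(json.as_bytes())?;
        file.sync_all()?; // make sure the bytes are durable before the rename
        fs::rename(&tmp, self.dir.join(format!("{id}.json")))
    }

    /// What a restarted Engineer would call: rebuild its view purely from disk.
    fn load_all(&self) -> io::Result<Vec<(String, String)>> {
        let mut items = Vec::new();
        for entry in fs::read_dir(&self.dir)? {
            let path = entry?.path();
            // Skip leftover `.tmp` files from a write that crashed midway.
            if path.extension().and_then(|e| e.to_str()) != Some("json") {
                continue;
            }
            let Some(stem) = path.file_stem() else { continue };
            let id = stem.to_string_lossy().into_owned();
            items.push((id, fs::read_to_string(&path)?));
        }
        items.sort();
        Ok(items)
    }
}

fn main() -> io::Result<()> {
    let store = DiskStore::open(std::env::temp_dir().join("factory-sketch"))?;
    store.save("item-1", r#"{"state":"in_progress"}"#)?;
    for (id, json) in store.load_all()? {
        println!("{id}: {json}");
    }
    Ok(())
}
```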

Throughput Over Perfection

Here's a design decision that felt weird at first and now feels obviously correct: fish fall out of the barrel.

Not every work item gets completed. Some requirements are ambiguous and can't be resolved. Some code changes produce conflicts that no amount of LLM-assisted resolution can untangle. Some items hit their rework limit (currently 5 attempts) and get dropped with a reason logged.

The temptation in a system like this is to add infinite retries, escalation paths, human-in-the-loop fallbacks for every failure mode. That thinking is wrong. It prioritizes theoretical completeness over actual throughput, and in a factory, throughput is the metric that matters.

The right response to a jammed item is to log it, drop it, and keep the belt moving. The operator can inspect the dropped items later and decide if any of them are worth picking back up. But the factory doesn't stop for them.

This is the "bounded failures" principle: every failure mode has a fixed upper bound on how many times the system will retry before giving up. It keeps the factory predictable. It keeps it moving.
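The bounded-failure rule fits naturally into a work item's state machine. In this minimal sketch the limit of 5 comes from the text above, but the `WorkItem` and `ItemState` shapes are assumptions. A dropped item keeps its reason so the operator can inspect it later, and further failures are ignored rather than retried.

```rust
/// Assumed to mirror the "5 attempts" rework limit described above.
const MAX_REWORK_ATTEMPTS: u32 = 5;

#[derive(Debug, Clone, PartialEq)]
enum ItemState {
    Queued,
    Rework,
    Dropped { reason: String },
}

#[derive(Debug)]
struct WorkItem {
    id: String,
    attempts: u32,
    state: ItemState,
}

impl WorkItem {
    fn new(id: &str) -> Self {
        Self { id: id.to_string(), attempts: 0, state: ItemState::Queued }
    }

    /// Record a failed inspection. Every path is bounded: the item either
    /// goes back for rework or is dropped with a logged reason.
    fn record_failure(&mut self, why: &str) {
        if matches!(self.state, ItemState::Dropped { .. }) {
            return; // already off the belt; never retried again
        }
        self.attempts += 1;
        let attempts = self.attempts;
        self.state = if attempts >= MAX_REWORK_ATTEMPTS {
            ItemState::Dropped { reason: format!("dropped after {attempts} attempts: {why}") }
        } else {
            ItemState::Rework
        };
    }
}

fn main() {
    let mut item = WorkItem::new("req-42");
    for _ in 0..MAX_REWORK_ATTEMPTS {
        item.record_failure("acceptance tests failed");
    }
    println!("{} -> {:?}", item.id, item.state);
}
```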

The Human as Operator

The role shift here is real and worth sitting with for a moment.

In the current paradigm — even with powerful coding assistants — the human is the coder. You write code, the AI helps. You review, the AI suggests. You merge, the AI checks. The human is on the production floor.

In the dark factory, the human is the operator. You write blueprints — templates that describe what a class of work looks like, what tools are available, what the acceptance criteria are. You stamp blueprints onto requirements to create production lines. You monitor throughput on the Overseer dashboard, watch for bottlenecks, inspect dropped items, and decide whether to scale up the assembler pool.
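As a rough illustration of "stamping," here is a hypothetical `Blueprint` type whose `stamp` method binds the template to a single requirement and produces a `ProductionLine`. The field names and example tool names are invented for the sketch; the real work-graph types may look quite different.

```rust
/// Hypothetical blueprint: a reusable template for one class of work.
#[derive(Debug, Clone)]
struct Blueprint {
    name: String,
    allowed_tools: Vec<String>,
    acceptance_criteria: Vec<String>,
}

/// A blueprint stamped onto one concrete requirement.
#[derive(Debug)]
struct ProductionLine {
    blueprint: String,
    requirement: String,
    allowed_tools: Vec<String>,
    acceptance_criteria: Vec<String>,
}

impl Blueprint {
    /// Stamping copies the template and binds it to a requirement; the blueprint
    /// itself is untouched, so the operator can stamp it again and again.
    fn stamp(&self, requirement: &str) -> ProductionLine {
        ProductionLine {
            blueprint: self.name.clone(),
            requirement: requirement.to_string(),
            allowed_tools: self.allowed_tools.clone(),
            acceptance_criteria: self.acceptance_criteria.clone(),
        }
    }
}

fn main() {
    let bp = Blueprint {
        name: "http-endpoint".into(),
        allowed_tools: vec!["read_file".into(), "write_file".into()],
        acceptance_criteria: vec!["cargo test passes".into()],
    };
    let line = bp.stamp("Add a /health endpoint");
    println!("{line:#?}");
}
```

Fixing a recurring failure then means editing one blueprint, and every line stamped from it afterwards picks up the change.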

You are not in the code. You are in the control room.

This is not "AI replaces developers." It's a different mode of working that requires a different set of skills. The operator still needs to understand code deeply — more so, in some ways, because you're designing the process that produces it rather than just writing the output. You need to be able to read what the factory is doing, identify why an item keeps getting dropped, and fix the blueprint so it doesn't happen again.

It's the difference between a chef and a restaurant owner. The owner needs to understand cooking. But they spend their time on the system, not the stove.

Where The Factory Stands

As of today, the foundation is complete. All 10 crates compile, all 150+ tests pass, the full tool-use loop is wired into the Assembler, the CLI covers every management operation, and the web API scaffold is in place.

The jj substrate currently shells out to the jj CLI — a pragmatic choice to get something working quickly that I'll replace with a proper library integration soon. The web API returns 501s on most endpoints (the routes exist, the handlers are stubbed). Agentic conflict resolution in the Refinery is designed but not yet implemented — the mechanical, deterministic phase works, the semantic LLM-assisted phase is next.

The recent commit history tells its own story. After the foundation sprint, the work shifted to hardening: validated constructors that prevent invalid change IDs from ever being created, bounded iteration limits on the smelter tool-use loop so a confused LLM can't spin forever, proper signal handling for emergency stops, security fixes in the web layer. This is what a system moving from "prototype" to "production" looks like.
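The bounded smelter loop can be sketched as a plain `for` over a fixed iteration budget. The `ModelStep` enum, the `run_bounded` function, and the cap of 16 are all assumptions (the post doesn't state the actual limit); the model and tool executor are passed in as closures so the loop can be exercised without an LLM.

```rust
/// One turn of the model's output (hypothetical shape).
#[derive(Debug)]
enum ModelStep {
    CallTool { name: String, input: String },
    Finish(String),
}

/// Assumed iteration budget; the post does not state the real value.
const MAX_ITERATIONS: usize = 16;

/// Runs a tool-use loop until the model finishes or the budget runs out.
/// `model` sees the transcript so far; `run_tool` executes one tool call.
fn run_bounded<M, T>(mut model: M, mut run_tool: T) -> Result<String, String>
where
    M: FnMut(&[String]) -> ModelStep,
    T: FnMut(&str, &str) -> String,
{
    let mut transcript: Vec<String> = Vec::new();
    for _ in 0..MAX_ITERATIONS {
        match model(transcript.as_slice()) {
            ModelStep::Finish(answer) => return Ok(answer),
            ModelStep::CallTool { name, input } => {
                let output = run_tool(name.as_str(), input.as_str());
                transcript.push(format!("{name}({input}) -> {output}"));
            }
        }
    }
    // A confused model ends up here instead of spinning forever.
    Err(format!("no final answer after {} iterations", MAX_ITERATIONS))
}

fn main() {
    // A "confused" model that never stops asking for tools.
    let result = run_bounded(
        |_| ModelStep::CallTool { name: "grep".into(), input: "TODO".into() },
        |_, _| String::from("no matches"),
    );
    println!("{result:?}");
}
```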

The architecture is already designed to run 20-30 concurrent agents. The Engineer tick loop is single-threaded today, but multiple Engineer instances can coordinate through the shared jj repository and work graph. The scaling story is horizontal and it's already baked into the design.
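The post doesn't describe how multiple Engineers avoid picking up the same item, so this is only one possible approach: a hypothetical claim file per work item, created with `OpenOptions::create_new`, which fails with `AlreadyExists` if another Engineer got there first. On a local filesystem that check-and-create happens as a single operation, so at most one claim succeeds. The `try_claim` name and the `<id>.claim` convention are assumptions.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical claim protocol: an Engineer owns an item if it is the one
/// that creates `<id>.claim`. Returns Ok(false) if someone else already did.
fn try_claim(claims_dir: &Path, item_id: &str, engineer: &str) -> io::Result<bool> {
    let path = claims_dir.join(format!("{item_id}.claim"));
    match OpenOptions::new().write(true).create_new(true).open(&path) {
        Ok(mut file) => {
            // Record the owner so an operator can see who holds the item.
            writeln!(file, "{engineer}")?;
            Ok(true)
        }
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(false),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("factory-claims-demo");
    fs::create_dir_all(&dir)?;
    let _ = fs::remove_file(dir.join("req-7.claim")); // start the demo clean
    let first = try_claim(&dir, "req-7", "engineer-a")?;
    let second = try_claim(&dir, "req-7", "engineer-b")?;
    println!("engineer-a won: {first}, engineer-b won: {second}");
    Ok(())
}
```

A real protocol would also need to expire claims left behind by a crashed Engineer, for example by recording a timestamp in the file; that part is omitted here.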

The Factory Must Grow

I think we're at an inflection point in software development that most people haven't fully registered yet. The dominant mental model is still "AI as a very good autocomplete." Even with tools like Claude Code, Cursor, and the rest, the mode is fundamentally human-driven: the human decides what to do, asks the AI to help, reviews the result.

What Yegge showed with Gas Town, and what I'm trying to push further with The Factory, is that there's a different mode available: one where, instead of asking an AI for help, you're running a production system that outputs software. The humans who figure out how to operate that system, rather than just use AI tools, are going to build things at a scale and speed that assisted coding can't match.

The logistics network doesn't jam. The factory keeps running. Requirements come in, code ships, the operator watches the meters.

That's the idea, anyway. Ona and I are still building it.

More updates as they come.

See you next time, reader.

Shardul

Published: March 2, 2026