Watching AI Write Code (And Catch Its Own Mistakes)

4 minute read

I was half watching a perfectly normal build when Claude Code announced “Hello, I will be your React developer for today.” as usual, then almost immediately caught itself: “Actually, per the spec, this is a vanilla HTML/CSS/JS app.”

It read its own instructions and corrected itself. No prompting from me. That’s what this project is about.

The Problem With Just Vibing With an AI

If you’ve played with AI coding tools, you’ve probably tried the obvious approach: open a chat, describe what you want, and let it rip. And it works! Until it doesn’t.

The issue is context. A detailed brief can easily overflow what the AI can hold in its head at once. Give it too much to think about and it starts forgetting earlier decisions, repeating work, or quietly heading off in its own direction. The longer the session, the more likely things drift.

Spec-driven development is one answer to this. Instead of figuring things out interactively as you go, you write a detailed specification upfront - a proper plan, saved to a file - and the agent works from that. The plan is always there to check back against, even when the context window has moved on.

What I Built

claude-code-workflow is a structured workflow for Claude Code that puts three specialised roles to work in sequence: an Analyst, an Architect, and a Developer. Each has its own role file and its own job to do.

Run /demo and it kicks off the whole process end-to-end. The Analyst gathers requirements, the Architect turns those into a technical spec, and then - crucially - that spec gets automatically broken into small, numbered sprint files. The Developer works through them one at a time, checking completed tasks off as it goes, and commits after each increment. If something goes wrong halfway through, it can quickly work out which file it was on and pick up where it left off without immediately filling the context window with the project’s life story.
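To make the idea concrete, here’s a rough sketch of what one of those small, numbered sprint files could look like - the filenames, headings, and tasks here are invented for illustration, and the actual format in the repo may differ:

```markdown
<!-- sprints/03-game-logic.md (hypothetical example) -->
# Sprint 03: Game logic

Depends on: Sprint 02 (board rendering)

## Tasks
- [x] Implement turn alternation between X and O
- [x] Detect a winning line (rows, columns, diagonals)
- [ ] Detect a draw when the board is full
- [ ] Commit: "Sprint 03: core game logic"
```

Because each file is self-contained and the checkboxes record progress, a restarted agent can scan for the first unchecked task instead of re-reading the entire spec.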

(When using it for real with the individual /analyse and /build commands, the process pauses so you can check and shape the requirements rather than jumping straight into the build. Then, when the build takes over, it waits after each increment completes to give the human-in-the-loop a chance to examine the resulting git commit and adjust as needed. That wouldn’t be much fun to watch, though, so /demo just wraps both commands together with some extra prompting to skip the usual pauses and play straight through.)
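Paraphrasing the flow above, the three entry points behave roughly like this (my summary, not the repo’s own documentation):

```text
/analyse   gather requirements, then pause for human review
/build     implement sprint by sprint, pausing after each git commit
/demo      run both back to back with the pauses skipped
```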

The example project it builds is Tic-Tac-Toe - deliberately simple, instantly recognisable, and cheap enough to run on a standard Claude Pro subscription (the demo video took 9% of a session).

The Part I Find Most Interesting

Chunking the work isn’t just a technical trick. It changes what the agent is able to do. A developer role handed one small sprint file with a handful of tasks is in a completely different position to one handed a thousand-line spec and told to get on with it. The first can focus. The second can only hope.

The self-correction? That was it checking against the spec before making any code changes and seeing that the spec said otherwise. It spotted the contradiction and fixed it without being asked. That’s what good constraints can do.

Wanna Try?

The repo has a video of the full run, and the PRs show exactly what code Claude produced, kept separate from main so nothing gets polluted and everything can be picked apart at leisure.

If you want to experiment yourself: fork it, swap the Tic-Tac-Toe brief in .roles/ANALYST.md for something you actually want to build, and see what happens. Or start smaller - try giving the Analyst more specific instructions, a colour scheme, a visual style, and watch how that flows through into the finished result. Or just run the demo as-is first and get a feel for it.
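As an example of “more specific instructions”, you might append a few constraints to the brief in .roles/ANALYST.md - the wording and structure below are purely illustrative, since the real role file will have its own layout:

```markdown
<!-- Hypothetical additions to the brief in .roles/ANALYST.md -->
## Extra constraints
- Colour scheme: dark background (#1e1e2e) with amber accents
- Visual style: minimal, no external CSS frameworks
- The board must be keyboard-navigable
```

Small nudges like these are a cheap way to watch a requirement propagate from the Analyst’s notes through the Architect’s spec and into the final code.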

Although AI coding tools are changing rapidly, it’s not quite as chaotic as the pile-up of posts might have you think. Come and play!