Snippet from January 28, 2026

While working at Thunk.AI, I became pretty comfortable pair programming with Claude Code, sometimes with up to three agents working on completely separate features at the same time. My style is very similar to the scoped SDLC that Steve Jones wrote about. Most of my interaction with Claude happened through a markdown design document that we edited together, alongside a back-and-forth conversation about the design. Most of the features were large enough that even a human would have a hard time doing the work in a single PR. So, a key result of the design stage was a list of the major phases of the work, each of which resulted in a digestible PR. At the end of each phase, the design document was updated based on the actual changes that were made and the lessons from mistakes the agent made. This was helpful for some of the larger features that would easily go sideways if Claude didn’t have a document to reference. It also made clearing the context easier, because I could just ask it to re-read the document.

I found that you need a good CLAUDE.md, well-crafted agent skills, and a subagent for writing tests to be effective. I even found it worked well just driving Claude with GitHub issues when I created a CLI for an OCR scanner last week. One really important part of making this work is having some feedback mechanism, such as comprehensive tests or the Chrome DevTools MCP for browser interaction, so that the agent can self-correct. That eliminated a great deal of the manual review I was doing when I first started using Claude Code.
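To make the feedback idea concrete, here is a minimal sketch of that kind of self-correction loop. It is not how Claude Code works internally; `propose` and `check` are hypothetical stand-ins for "ask the agent for a change" and "run the tests", and the toy functions exist only so the sketch runs on its own.

```python
from typing import Callable, Optional


def self_correct(
    propose: Callable[[Optional[str]], str],
    check: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask for a candidate, check it, and feed any failure text back in."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = propose(feedback)  # hypothetical agent call
        feedback = check(candidate)    # None means every check passed
        if feedback is None:
            return candidate
    return None  # out of rounds: hand the problem back to a human


# Toy stand-ins (assumptions for illustration, not a real agent or test suite).
def toy_propose(feedback: Optional[str]) -> str:
    # The first attempt is wrong; the retry "reads" the failure and fixes it.
    return "return a + b" if feedback else "return a - b"


def toy_check(candidate: str) -> Optional[str]:
    if candidate == "return a + b":
        return None
    return "add(2, 2) returned 0, expected 4"


result = self_correct(toy_propose, toy_check)
```

In practice `check` would be something like a test run or a browser check, and the failure text is what lets the agent fix its own mistake instead of waiting for me to review it.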

But I really want to see if we can build software “teams” with coding agents where I’m no longer reviewing the code (well, maybe sometimes). So, I’m going to start diving into AI meta-programming (as opposed to the metaprogramming we did in Haskell, Scala, Ruby, C++ templates, etc.).

I plan to look at:

But while I was coming up with this list, I came across embedding-shapes’ 3-day journey to building a browser, which seems to be the result of the negative reaction to Cursor’s post about its attempt at multi-agent coding. Cursor’s post is pretty vague. Although there are some good pointers, like separating the planner agents from the implementors and not overcomplicating the problem with distributed locking protocols, I found some of the other points rather disappointing. First, I’m surprised they started with “equal” workers and that they “learned” these agents would not take on larger pieces of work. I’m also really surprised they would have only one PR for a massive refactoring with +266K/-193K edits. That’s clearly just vibe coding. Finally, I’m really disappointed that they didn’t share any of their prompts and that the code doesn’t even compile.

We’re obviously all learning here, and the experimentation is great. I wonder what the “bitter lesson” will be in this space.