2025 Retrospective
January 3, 2026
Happy New Year!
I've now been pursuing my goals around AI for a year, and I've learned a great deal. I've been focused on implementation for the last six months, so I haven't written... any posts in that time. I have written a great deal, just not here.
So for this post, I will:
- Review how my opinions and findings have changed since the first six months.
- Share some highlights of the writings I've done more recently.
The Role of Documentation
The largest shift in my thinking about working with AI is away from Doc Driven AI. There I talked about the importance of writing and maintaining documentation within the codebase, incorporating it into prompts, and updating it during development in a rapid feedback loop. It's still important to me to document the stack, but I don't think documentation works as the core driver of correct implementation.
The main problem is lack of reliability. Even if you instruct an agent to read a doc, it may not do so (depending on the delivery method), and even if it does, it won't necessarily adhere to the instructions and examples in the doc. It's too much context, too divorced from where the instructions need to be applied. I tried various ways of structuring and providing documentation, but it didn't take long for me to conclude that this approach is fundamentally insufficient.
What bothers me about this takeaway is the industry seems to be digging (have dug?) itself into a dead end with documentation. When I look at the latest or most-talked-about features or custom tools, they are variations on documentation. Cursor rules, Claude skills, and various MCP and code-indexing tools are all methods for organizing and providing the right documentation at the right time, and though I have no doubt they can help, these are optimizations, not solutions.
In the workflow tool I've built, I've isolated the features I think are necessary to achieve reliable agentic development. They're mainly code templating, to create boilerplate agents can build from, and steps: linear processes that include generating code from templates, prompting for changes, and running validation checks. The steps often refer to documentation, but most of the instructions are embedded in the templates or the prompts.
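To make that concrete, here's a rough, hypothetical sketch of what a linear workflow might look like. The names and shape are purely illustrative, not my tool's actual format.

```ts
// Hypothetical sketch only; my tool's real step format differs.
type Step =
  | { kind: "template"; template: string; output: string } // generate boilerplate from a template
  | { kind: "prompt"; instructions: string }                // ask the agent for a focused change
  | { kind: "check"; command: string };                     // run validation the agent can't skip

const addApiRoute: Step[] = [
  { kind: "template", template: "templates/api-route.ts.tpl", output: "src/routes/{{name}}.ts" },
  {
    kind: "prompt",
    instructions:
      "Fill in the TODOs in src/routes/{{name}}.ts, following the error-handling pattern in the generated file.",
  },
  { kind: "check", command: "npm run typecheck && npm test" },
];
```

Most of the guidance lives in the template and the prompt text, right where it's applied, with the checks acting as a backstop.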
I suspect these features are not common because they don't fit the narrative. Agents and LLM-driven tools are sold as magical black boxes that work on anything and will free you from having to engineer things. But writing templates requires coding, steps are algorithms, and putting it all together into a reliable system requires more engineering. Even once a workflow is written and works well enough, you still need to architect the system and decide which workflows to use to get there. And any given set of templates or steps is going to be specific to one codebase; not exactly the cure-all that agentic products aim to be.
There is a chance this can be automated, though. In the same way tools will index a codebase to generate documentation, a workflow tool could generate workflows for a codebase. I'm cautiously optimistic that this is possible, but there's still a fair amount of work to do to get there. It's difficult enough to create workflows that consistently build things like API routes or web pages and to improve them over time; it's all the more difficult to create a workflow that creates good workflows, and to make sure the codebase supports good workflows with good DX. Given that automating the creation of workflows has diminishing returns, the best approach might just be to provide people with good tooling for creating and improving workflows. At the very least, that's a step on the way to automation.
Agents and DX
I decided early on to build my own stack as part of my exploration of AI in software development, because one of the greatest levers for getting more out of AI is a better Developer Experience, and that's largely determined by the codebase. If a codebase is poorly or inconsistently structured, messy, and just generally difficult for a person to work with, then it's also going to be difficult to get an agent to work with it. I laid out this hypothesis in my Theory of DX post.
At this point, I'm thoroughly convinced this is the case, based on working on my own codebase and on some others'. Agents will move fast but compound existing tech debt until neither human nor agent can make forward progress, so the trick is getting the codebase into a good-enough state and then keeping it there despite agentic contributions. Ironically, a codebase that's thoughtfully constructed without AI will be better set up for AI than one that's been generated by AI. That's not to say you can't set up a codebase using AI tools, but they should really take a backseat during the crucial beginning stages.
The Theory of DX post was written early to guide my explorations, but since then I've come up with more specific guidance. Basically, as I've been building SAF (my web framework) and changing things to work better with agents, I've pulled out the things all stacks should do and used SAF to demonstrate these best practices: things like type safety everywhere, modular code, and fake implementations at service boundaries for testing. I haven't needed to add to or change that document lately, so I consider it pretty much done. I highly recommend reading it to help set up coding agents, and human developers, for success.
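To illustrate the last of those, here's a hypothetical sketch of a fake implementation at a service boundary. The interface and names are made up for illustration; they're not taken from SAF.

```ts
// Illustrative only; not SAF's actual interfaces.
interface EmailService {
  send(to: string, subject: string, body: string): Promise<void>;
}

// Real implementation: calls out to an actual provider in production.
class SmtpEmailService implements EmailService {
  async send(to: string, subject: string, body: string): Promise<void> {
    // ...talk to the real SMTP server or provider API here
  }
}

// Fake implementation: records calls in memory for tests and local dev,
// so agents (and people) can verify behavior with no network access or keys.
class FakeEmailService implements EmailService {
  sent: Array<{ to: string; subject: string; body: string }> = [];
  async send(to: string, subject: string, body: string): Promise<void> {
    this.sent.push({ to, subject, body });
  }
}
```

Because both implementations share a typed interface, the type checker keeps the fake honest, which is exactly the kind of automatic feedback agents benefit from.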
These findings also reaffirm the importance of the codebase itself as part of working with AI. Whenever I have a reliability problem with one of my workflows, the solution is either to update the workflow or to update the framework. If I didn't have full control over the structure and feature set of my codebase, I'd be limited by the quality of that codebase. This leads me to two conclusions:
- If you want to learn or do as much as possible about getting agents to work better at writing code, build in a stack which you have a great deal of control over.
- If you want to get the most out of agents at your company, have DX improvement (e.g. these recommendations) be a priority for engineering.
At this point, I have only anecdotal evidence of the impact of the best practices I've identified. Nine months ago I laid out a roadmap that includes gathering evidence through evals, but I haven't done that yet, because it hasn't been particularly necessary. The things I could test right now are uncontroversial and, to me, self-evident, and I'm more interested in experimenting with and maturing workflows. At some point down that path I expect I'll want workflow evals anyway, to measure and optimize reliability, but for now it's more pressing to make sure the tool itself runs reliably, has the right features, and can do what I want it to do at all.
Workflow Best Practices
Instead of writing here in my blog, I've been writing up my framework documentation and my workflow documentation. Much of what I want to say is more suited there than here, but on the other hand the learnings are more fragmented this way. So I'll collect some key, broadly helpful sections here.
Unless you plan on using my framework, the Best Practices document is the only one worth reading in full. The rest of my framework docs serve as examples of how to apply those best practices, including how to document a codebase.
The workflow tool docs have more useful reading spread throughout, particularly:
- Versus Other Agentic Workflows - the relative strengths and weaknesses of different approaches to agentic coding.
- Template Best Practices - how boilerplate code is best written and organized.
- Steps Best Practices - how to structure a series of prompts and verification steps.
- Documentation Best Practices - how to write and use documentation in agentic workflows.
- Improving Your Workflow - specific approaches to making a workflow more reliable.
I pull these out because they are generally useful like my Framework Best Practices are; you don't need to use my framework or my workflow tool to get value out of these links. Whatever your agentic process is, it should include code samples, checklists, and documentation; my workflow tool provides a way to consistently structure and use those things, but these best practices are applicable regardless of tooling. I hope you find them useful.
What's Next
I like the trajectory I'm on, so my plan is to continue developing my workflow tool and building workflows with it.
Currently, I'm working on new kinds of workflows, and more complex workflows, for example:
- Initializing a new product, including several SPAs, an API server, database, CI tests, and deployment (src, WIP).
- Adding and building out a third-party integration, including testing with a key and building a fake implementation.
- Writing Playwright tests as part of any changes to user experience.
Each new kind of workflow pushes the envelope, and guides me on how to extend the tool further. New features I've been thinking about include:
- Automatically appending to files with a template, not just creating new files from templates.
- Better interfaces for following, guiding, and fixing workflows (perhaps a web or terminal UI?).
- New core capabilities like prompting the user, not just the agent.
While my codebase best practices are basically done, there's still more to learn about how to do workflows, especially more complex, abstract, or risky ones. Building more products and features with workflows, and extending the workflows and tooling to support them, will help me suss out those learnings. And there's plenty more around workflows beyond that: long-term maintenance, measuring and optimizing reliability and cost, and alternative use cases like onboarding, teaching, or even workflows beyond coding.
Using LLMs outside of software development is something I particularly want to focus on in the new year, in products and processes both professional and personal. Honestly, aside from chatting with Claude from time to time about this or that, or deferring to DuckDuckGo's AI summary for searches, LLMs haven't particularly permeated my personal day-to-day. Companies have certainly thrust a bunch of solutions my way, but most of the time when AI comes up in products, such as Google Sheets suggesting formulas or emails being automatically summarized or generated, I ignore those features as unhelpful and distracting. They're trying to solve a problem I don't have. The best product I've seen incorporating AI is PostHog, where I appreciate having the AI help build queries and dashboards, but it's the exception.
It does make me think, though... Given that agents are more useful when they have workflows specific to a codebase, perhaps agentic tools also need to be custom-built for a given individual or organization and their habits, interests, and processes. It might be easier to describe what you want a product to do for you than to find a product that happens to do that for you and for others. This is something I'll explore more in the coming year.