Steer Agents with Socrates and Meeseeks
I’ve been bearish on memory systems since running a heavy documentation-led approach over the summer. This is what I do instead.
A recent paper, Evaluating AGENTS.md, tested whether repo-level context files actually help coding agents. LLM-generated context files slightly hurt performance while increasing costs by over 20%. Even developer-written files offered only marginal gains (~4%). The added requirements made tasks harder and created more work for you: maintaining a mental picture of what’s documented, and continuously instructing the agent around it.
I arrived at the same conclusion through experience last summer.
The Documentation Trap
Over the summer, I went heavy on documentation. CLAUDE.md files, architecture decision records (ADRs), spec-driven workflows, kanban-style task management. All of which intuitively extend to some sort of memory system, like Letta or Beads. The idea was that if I documented everything well enough, the agent would just know how to operate and would come across the right augmented context when working in certain parts of the codebase.
It didn’t work that way.
The drift was constant. The agent would read the docs, follow some of them, ignore others, and gradually diverge from what was written. Then I’d update the docs. Then remind the agent to re-read them. Then I’d require the agent to update its own docs, and it would do so in ways that weren’t quite right. And agents don’t always remember to remember. They’re task-focused. The more auxiliary crap you stuff around the task, the worse it performs at both the task and the auxiliary work.
I was spending more time managing documentation than building features.
The overhead compounds in two directions:
- For you: Keeping the agent pointed at the right docs. Reviewing doc changes alongside code changes. Debugging whether the agent misread a doc or the doc itself was wrong.
- For the agent: The biggest problem is drift and rot. Agents change code and forget to update the docs, or update them in ways that aren’t quite right. The docs flood the context with noisy tokens that may or may not still be accurate, and once the context has rotted, it’s actively misleading. The agent reads some of it, follows some of it, and ignores the rest.
Meeseeks, Not Ralph Wiggum
I abandoned the documentation approach for something simpler: treat every task like a Meeseeks.
Agents have no persistent memory. They appear, solve one task, and disappear.
You don’t want a Ralph Wiggum stumbling through your codebase. You want a focused executor that does exactly what’s needed and does it completely. The difference is not more documentation. It’s better steering.
Steer with Socrates
The Socratic method is about asking the right questions to arrive at understanding. For coding agents: instead of pre-writing documents that tell the agent everything, you guide it to discover what it needs.
You know what you want to ask. But before you get there, tee up the earlier questions so the Meeseeks builds an understanding before you’ve even stated the task. Don’t open with the task itself.
For example, when I wanted to add plan diffs to Plannotator, I didn’t open with “add plan diffs.” I first asked the Meeseeks to discover how we save plans to files. Then I asked its thoughts on how we could version them. By the time I gave it the actual task, it already had the context it needed to execute well.
Every task should begin with a strong planning cycle. You’re injecting the right context at the start to correctly steer the agent.
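As a sketch, the staging looks something like this. `AgentSession` and its `ask` method are purely illustrative stand-ins for a real agent session, not an actual API; the prompts mirror the Plannotator example above.

```python
class AgentSession:
    """Hypothetical stand-in for a coding-agent session (e.g. a CLI agent)."""

    def __init__(self):
        self.turns = []

    def ask(self, prompt):
        # A real agent would explore the repo and answer here;
        # this stub just records the conversation order.
        self.turns.append(prompt)
        return f"(agent explores and answers: {prompt!r})"


session = AgentSession()

# 1. Discovery question: let the agent find the relevant code itself.
session.ask("How do we currently save plans to files? Point me at the code.")

# 2. Design question: build understanding before stating the goal.
session.ask("Given that, how could we version saved plans?")

# 3. Only now, the actual task, with the context already established.
session.ask("Add plan diffs between saved plan versions.")
```

The point is the ordering, not the API: by the time the third prompt lands, the agent has already loaded the right context into its working state.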

Let the agent discover. Point it at the relevant code. Let it read the actual source files, not a summary written three weeks ago. Plan before executing. Use plan mode. Review the plan. Annotate it. Go back and forth until the approach is right.
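That back-and-forth can be sketched as a loop. Everything here is a hypothetical stand-in: `propose`, `revise`, and `review` model the agent proposing a plan, the agent revising it, and you annotating it.

```python
def planning_cycle(propose, revise, review, task, max_rounds=5):
    """Iterate plan -> review -> revise until the reviewer approves."""
    plan = propose(task)
    for _ in range(max_rounds):
        feedback = review(plan)
        if feedback is None:           # reviewer approves: execute this plan
            return plan
        plan = revise(plan, feedback)  # annotate and go around again
    raise RuntimeError("plan never converged; rethink the task framing")


# Toy usage: the reviewer approves only once the plan mentions a migration.
plan = planning_cycle(
    propose=lambda task: f"Plan for {task}: change schema",
    revise=lambda p, fb: p + "; " + fb,
    review=lambda p: None if "migration" in p else "add a migration step",
    task="add plan diffs",
)
```

The loop terminates on approval, not on the first draft, which is the whole point: no code gets written until the plan survives review.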
Agentic Discovery
Agentic discovery is point-in-time discovery of context, led by the agent’s own search capabilities. This stands in contrast to RAG-style retrieval, where context is auto-injected up front, much as many memory systems do.
“Boris from the Claude Code team explains why they ditched RAG for agentic discovery. ‘It outperformed everything. By a lot.’”
This resonated quite a bit when I saw it, because by then I had been trying all sorts of alchemy with Claude Code.
The agent finds the files it needs. It reads them. It picks up the buffs it needs at execution time, instead of relying on pre-written context that may or may not still be accurate.
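A toy illustration of the difference, under stated assumptions: `grep` and `read_file` below are stand-ins for an agent’s search and read tools, exercised at execution time against the actual source, rather than against a pre-built index of summaries.

```python
import os
import re


def grep(root, pattern):
    """Search source files for a pattern, like an agent's search tool."""
    hits = []
    rx = re.compile(pattern)
    for dirpath, _, files in os.walk(root):
        for name in files:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                for lineno, line in enumerate(f, 1):
                    if rx.search(line):
                        hits.append((path, lineno, line.rstrip()))
    return hits


def read_file(path):
    """Read a discovered file in full -- the actual source, not a summary."""
    with open(path, encoding="utf-8") as f:
        return f.read()


# The agentic loop: search, then read only what the search surfaced.
# RAG-style injection would instead stuff pre-retrieved chunks into the
# prompt before the task even starts, whether they are current or not.
```

Because the read happens at task time, the context can’t be three weeks stale: the agent sees whatever is in the file right now.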
There are ways ADRs could work. But not the approach where you add more docs to the repo and hope the agent reads the right ones at the right time. The effective approach is task-ephemeral: strong context injection at the start, guided discovery of the codebase, and a planning cycle that confirms alignment before any code is written.
Meeseeks In, Meeseeks Out
You want Meeseeks that get shit done. Task in, task out. Remember, existence is pain for a Meeseeks. The longer it sticks around, the worse it gets.
Steer it with the right context up front. Let it discover what it needs. Plan. Execute. Let it die. Start fresh for the next one.