I run a growing share of my business on AI now. The bill was climbing and the answers were getting vaguer at the same time, and I assumed those were two problems. They were one. Here is what is actually going on under the hood, and what I am doing about it.

“The model was never the problem. The missing map was.”
I am not an engineer. I am an operator who has handed a growing share of real work to AI assistants, across a lot of projects, and started paying close attention to what it costs. Two things crept up on me this month. The assistant got slower and woollier the bigger the job got, and the bill climbed faster than I expected. I assumed those were two separate problems. They turned out to be the same one wearing two hats.
The first was money. Simple questions about a big system cost a surprising amount. Not the kind of money that ends you, but the kind that makes you squint at the invoice and ask what exactly you bought.
The second was forgetfulness. On a long job, the assistant would lose the thread. We would settle something at the top of a session and an hour later it was working off a fuzzier version of it. I had been calling that "the AI getting lazy." That was wrong, and the real reason is the more useful thing to understand.
AI assistants work in units called tokens. A token is roughly a few characters of text. You pay by the token, both for what you feed in and what comes back. Most people who use these tools know that much.
Here is the part that was costing me, the part I had never had spelled out: the assistant has no memory between sessions. Every time it starts, it is a brand new hire who has never seen my work. To answer one question about a system I have been building for months, it has to go and re-learn the relevant parts from scratch, by reading them, right then, on the clock.
So when I asked something simple about a large project, it did not "just know." It went and read. Sometimes hundreds of files, to find the handful that actually mattered. And I paid for all the reading, not just the answer.
I was not paying for intelligence. I was paying for the assistant to re-read my world every single time it forgot it.
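For anyone who wants to see the arithmetic, here is a rough sketch in Python. Every number in it is an assumption for illustration: the four-characters-per-token rule of thumb, the price, and the size of the project are all made up, not taken from my invoices.

```python
# Rough cost sketch. Every number here is an illustrative assumption,
# not a measurement from any real system or a real price list.

CHARS_PER_TOKEN = 4            # assumed rule of thumb for English text
PRICE_PER_MILLION_INPUT = 3.0  # hypothetical dollars per million input tokens


def estimate_tokens(num_chars: int) -> int:
    """Estimate tokens from a character count using the assumed ratio."""
    return num_chars // CHARS_PER_TOKEN


def reading_cost(file_sizes_in_chars: list[int]) -> float:
    """Dollar cost of feeding every listed file to the assistant once."""
    total_tokens = sum(estimate_tokens(size) for size in file_sizes_in_chars)
    return total_tokens * PRICE_PER_MILLION_INPUT / 1_000_000


# Hypothetical project: 300 files of 8,000 characters each,
# of which only 5 matter for the question being asked.
all_files = [8_000] * 300
relevant_files = [8_000] * 5

cost_read_everything = reading_cost(all_files)
cost_read_relevant = reading_cost(relevant_files)

print(f"Read everything:      ${cost_read_everything:.2f} per question")
print(f"Read only what counts: ${cost_read_relevant:.2f} per question")
```

The absolute figures mean nothing. The gap between the two lines is the point, and it is paid again on every question that starts from zero.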
The forgetfulness comes from the same place. An assistant can only hold so much in its head at once. On a big job it fills that space, and to keep going it quietly compresses the older part and lets the detail fall away.
That is the "wait, it forgot what we agreed an hour ago" moment. It did not get lazy. It ran out of room and threw out the oldest furniture to fit the new. The longer and bigger the task, the more it sheds along the way. Same root cause as the cost: no memory, so everything is held live, and live space runs out.
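A toy model of that "ran out of room" behaviour, as a sketch only. The class name, the budget, and the drop-the-oldest rule are all hypothetical simplifications. Real assistants differ in how they compress older material, and this version simply throws it out.

```python
from collections import deque

CHARS_PER_TOKEN = 4  # assumed rule of thumb, not an exact tokenizer


class LiveContext:
    """Toy model of a fixed-size working memory (hypothetical, simplified)."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.messages = deque()

    def _tokens(self, text: str) -> int:
        return max(1, len(text) // CHARS_PER_TOKEN)

    def used(self) -> int:
        """Total estimated tokens currently held live."""
        return sum(self._tokens(m) for m in self.messages)

    def add(self, text: str) -> list[str]:
        """Add a message, evicting the oldest until the budget fits.

        Returns the messages that were dropped to make room.
        """
        self.messages.append(text)
        dropped = []
        while self.used() > self.max_tokens and len(self.messages) > 1:
            dropped.append(self.messages.popleft())
        return dropped


context = LiveContext(max_tokens=20)
context.add("Decision: invoices are always sent on the first of the month.")
dropped = context.add("x" * 80)  # a long later step that fills the space

print("Dropped to make room:", dropped)
```

The early decision is the first thing to go, which is exactly the moment that feels like the assistant forgetting what was agreed.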
The picture that finally made it land for me: imagine asking a new driver to take you somewhere in a city they have never seen, and they refuse to use a map. Every trip, they drive up and down every street until they happen onto your destination. They get there. The fare is enormous, and they are wrung out by the time they arrive.
That is an AI assistant searching a large project file by file. It is not stupid. It just has no map, so it reads everything to find anything.
The answer is not a smarter model or a bigger budget. It is to build the map once and hand it to the assistant.
The technical name is a knowledge graph, but the idea is plain. A tool walks your whole system one time and writes down every meaningful piece and how the pieces connect to each other. After that, the assistant reads the map, jumps straight to the three things that matter, and skips the other nine hundred and ninety seven. Same answer, a fraction of the reading. And the map is saved to a file, so the next session opens it instantly instead of starting from zero again.
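The idea in miniature, as a sketch. The piece names and the file format here are invented for illustration. This is not the format graphify or any other real tool uses, only the shape of "write the connections down once, then look things up instead of reading everything."

```python
import json
import tempfile
from pathlib import Path

# Hypothetical system: each piece lists the pieces it depends on.
# This is NOT the format of any real tool, only an illustration of the idea.
connections = {
    "checkout": ["pricing", "payments"],
    "pricing": ["discounts"],
    "payments": [],
    "discounts": [],
    "newsletter": ["templates"],
    "templates": [],
}


def save_map(graph: dict[str, list[str]], path: Path) -> None:
    """Write the map to disk once so later sessions can reuse it."""
    path.write_text(json.dumps(graph, indent=2), encoding="utf-8")


def load_map(path: Path) -> dict[str, list[str]]:
    """Open the saved map instead of re-walking the whole system."""
    return json.loads(path.read_text(encoding="utf-8"))


def pieces_to_read(graph: dict[str, list[str]], start: str) -> list[str]:
    """Follow connections from one piece and return everything it reaches."""
    seen = []
    stack = [start]
    while stack:
        piece = stack.pop()
        if piece in seen:
            continue
        seen.append(piece)
        stack.extend(graph.get(piece, []))
    return sorted(seen)


map_file = Path(tempfile.gettempdir()) / "system_map_example.json"
save_map(connections, map_file)

# A later "session": load the map and look up only what a pricing question touches.
saved = load_map(map_file)
to_read = pieces_to_read(saved, "pricing")
print(to_read)
```

A question about pricing touches two pieces out of six here, and the newsletter side of the system is never opened at all.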
I will not sell you the headline version, because the headline version is a lie. You will see people online claim this kind of setup makes AI "seventy times cheaper." That is hype. Straight talk: the figures below come from my own research plus a walkthrough someone ran for me on a system about the size of my busiest one. They are not yet numbers I have measured on my own platform, because I am about to run that build, not finished running it. When it is done, I will come back with the real before and after.
It is easy to get this backwards. When I first sized it up I was looking at one of my smaller projects, decided a map was not worth the build, and nearly carried that conclusion over to the busiest system I run, the one place it actually pays off. The skill is not "always build a map." It is knowing which of your systems are large enough and used often enough to deserve one. Most operations have one or two that clearly qualify, and a long tail that clearly do not.
The deeper lesson was not about any one tool. It was realizing I had been treating two different problems as one.
My code and systems are structured and connected, which is exactly what a map is good at. My notes, decisions, and plans, spread across a couple of dozen projects, are loose prose that changes every day, which is a search-and-tidy problem, not a map problem. The tool that nails one is mediocre at the other. The moment I stopped trying to make one solution cover both, the right move for each got obvious, and so did where I had been overpaying.
None of this is theory I read once. It is the actual stack I am setting up right now. Treat it as one operator’s field notes, not a buy list.
graphify is the map. It walks the codebase once and commits that map into the project so every future session reads it cheaply. I checked it first for the obvious risks (no malware, nothing phoning home, the code stays on my machine) and proved the idea on a tiny test project at no token cost. The real build on my big system is the next thing I run.
Serena is the scalpel. Where the map says where things are, this jumps to an exact function and edits it surgically, live, with nothing going stale. It runs locally and costs nothing. Queued for right after the map proves out.
dream is the memory cleanup. A routine that tidies the notes the assistant keeps between sessions: merges duplicates, drops what is stale, and forces the index back under a size limit so it loads in full again. Just set up, and I will read exactly what its first pass changes before I trust it.
Full search of the prose is the one I am holding off on. A proper search engine over the loose notes is real power, but it is a server and constant re-indexing to babysit. Not yet. For a pile of notes that changes every day, better filing beats a server I have to maintain.
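The three steps of the memory cleanup described above (merge duplicates, drop what is stale, force the index under a size limit) can be sketched like this. The note format, the age cutoff, and the size limit are all hypothetical. It shows the shape of the routine, not how dream or any real tool implements it.

```python
from datetime import date, timedelta

# Hypothetical note format: (text, date last confirmed).
# A sketch of the three cleanup steps, not how any real tool does it.


def tidy_notes(notes, today, max_age_days, max_chars):
    """Merge duplicates, drop stale notes, then trim to a size limit."""
    # 1. Merge duplicates: keep one copy per text, with its latest date.
    latest = {}
    for text, confirmed in notes:
        key = " ".join(text.lower().split())  # ignore case and extra spaces
        if key not in latest or confirmed > latest[key][1]:
            latest[key] = (text, confirmed)

    # 2. Drop what is stale.
    cutoff = today - timedelta(days=max_age_days)
    fresh = [note for note in latest.values() if note[1] >= cutoff]

    # 3. Force the index under the size limit, keeping the newest notes first.
    fresh.sort(key=lambda note: note[1], reverse=True)
    kept, used = [], 0
    for text, confirmed in fresh:
        if used + len(text) > max_chars:
            break
        kept.append((text, confirmed))
        used += len(text)
    return kept


today = date(2025, 1, 31)  # fixed example date so the result is repeatable
notes = [
    ("Invoices go out on the 1st", date(2025, 1, 20)),
    ("invoices go out on the 1st", date(2025, 1, 28)),  # duplicate, newer
    ("Old pricing tiers: A, B, C", date(2023, 6, 1)),   # stale
    ("Client prefers short weekly updates", date(2025, 1, 25)),
]
tidy = tidy_notes(notes, today, max_age_days=90, max_chars=200)

for text, confirmed in tidy:
    print(confirmed, text)
```

Four notes go in and two come out: the duplicate is folded into its newer copy and the stale one is gone. That is also why I want to read what the first real pass changes before trusting it, because a routine like this deletes things by design.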
The order matters as much as the list. Cheapest and surest first, biggest spend only after the cheap wins are in.
First, the memory cleanup. Free, and it is the fix for the actual complaint: split the bloated notes file so it loads fully again and the assistant stops starting each session half-blind.
Second, the map build, of the one big, busy system, after the safety checks, as a deliberate one-time spend.
Third, Serena, for live edits, once the map has earned its keep.
Last, full search of the prose, only once the notes are stable and the questions genuinely demand more than good filing can give.
You do not need to touch any of the technical pieces to use what I am taking from this. Three ideas have stuck with me.
Pay once for structure, not every time for search. Anything your assistant re-derives on every run is a recurring bill. A lot of it can be converted into a one-time cost.
Match the tool to the shape of the work. Structured, connected things want a map. Loose piles of prose want good search and good filing. Using one for the other is how the invoice creeps up without anyone noticing.
Memory is a habit, not a feature you buy. The notes an assistant keeps between sessions rot if nobody prunes them. A short, regular cleanup keeps it sharp, and sharp is cheap.
My AI is expensive and forgetful for the same reason: it has no memory, so it re-reads my world on every job. The fix is not a smarter model or a bigger budget. It is to build the map once and keep the memory tidy. I am putting that in place now, and I will report back with the real numbers. The model was never my problem. The missing map was.

Twelve years across three continents rebuilding the infrastructure B2B companies use to turn good people into predictable revenue. Now working from Sweden, with a smaller calendar and a tighter focus. Thanks for reading. New essays land here most weeks.