Simon Corry

Science & Creativity

Product Designer
Front-End Developer
Team Leader

I'm Simon, a multi-award winning product designer based in New York. During my career I've had the privilege of creating experiences for clients including the New York Times and Google. I've also helped guide teams at Facebook and startups like Highfive and WeTransfer.

  • Visual Design
  • Design Systems
  • IA/UI/UX
  • Strategy
  • Prototyping
  • HTML/CSS
The Rule Nobody Read

I finally measured whether my agent reads the rules

Simon Corry

If you've read these before you know the setup, but for the uninitiated let me break it down. Visiblemiles (formerly Vetuu) is the massively multiplayer online roleplaying game (an MMORPG) I build on nights and weekends and my collaborator is an AI coding agent (currently Claude), running inside an editor called Cursor. Over the past year I've wrapped that agent in a frankly embarrassing amount of written guidance. Rules about what it must never do. Rules about how to write things. And a whole library of random guides, each one a careful little document explaining how some corner of the game works so the agent doesn't have to guess.

Everything you build, and the sliver of it that actually reaches through.↵ Generated with Nano Banana Pro
Everything you build, and the sliver of it that actually reaches through.

If you've read these before you know the setup, but for the uninitiated let me break it down. Visiblemiles (formerly Vetuu) is the massively multiplayer online roleplaying game (an MMORPG) I build on nights and weekends and my collaborator is an AI coding agent (currently Claude), running inside an editor called Cursor. Over the past year I've wrapped that agent in a frankly embarrassing amount of written guidance. Rules about what it must never do. Rules about how to write things. And a whole library of random guides, each one a careful little document explaining how some corner of the game works so the agent doesn't have to guess.

I was proud of that library. I kept adding to it. However… not once did I stop to ask the only question that actually mattered: does it read any of them?

A library that keeps growing and never gets opened. ↵ Generated with Nano Banana Pro
A library that keeps growing and never gets opened.

The library I never checked

There are primarily two kinds of guidance that sit around the agent. The first gets read at the start of every session (no choice in the matter). That's where the don't-burn-the-house-down stuff lives. The second kind is the library, and the agent only opens a guide from it when it decides the job in front of it actually calls for one. Working on combat? Maybe go read the combat guide. Changing how something moves on screen? There's one for that too.

You probably see the issue already, it’s that word "maybe”. I had no idea how often the agent actually reached for any of them. I'd written the guides, filed them away and then run for months on pure faith that they were doing something. The crazy thing is every time I added one I felt more in control, like I'd handed my collaborator another piece of hard-won knowledge it would surely use. In reality I'd quietly swapped two different things for each other. I'd started believing that because I had written the guidance, the guidance was being used. Which again is crazy because I'd never let myself get away with that on anything I'd shipped for an actual person. This is the brain rot AI begins to create if you’re not challenging yourself to stay honest.

Feeling in control is a poor substitute for empirical evidence
Reaching hard for an answer that was never on that side of the gap. ↵ Generated with Nano Banana Pro
Reaching hard for an answer that was never on that side of the gap.

The tool that couldn't afford the answer

So I set out to measure it, and because I'd been drinking my own automate-everything Kool-Aid, my first instinct was to build the fancy version: a job that runs itself up in the cloud while I sleep and reports back with a number.

It cost me about twelve dollars and told me precisely nothing. Twelve dollars against a six-dollar ceiling I'd set it, which it sailed straight past before giving up. And the joke is it could never have worked, because the thing it needed to read (the logs of my actual sessions with the agent) only exist on my laptop. So I'd done the equivalent of sending an intern on a wild goose chase. All because I didn’t do any due diligence on the shape of the problem I was trying to solve. Again AI brain rot.

So I did the dumb thing instead and ran the count on my own machine, straight against the logs sitting right there. A couple of minutes. Free. I have a soft spot for Occam’s Razor and this was one of those moments.

The little you can count, and the weight of everything it can't see.↵ Generated with Nano Banana Pro
The little you can count, and the weight of everything it can't see.

The count, and why I stopped trusting it

The number came back at about one session in twenty-five. Call it four percent. The guide I'd fussed over most was getting pulled into the actual work roughly four percent of the time.

That stings tbh. I hadn't written a guide nobody needed. I'd written a good one for a reader who mostly wasn't showing up.

I was pretty miffed about it for a couple of days until I realized my initial count was naive. It only caught the moments the agent said a guide's name out loud in its notes. But that isn't the only way it uses one. A lot of the time it just quietly opens the file and reads it without announcing anything. Or if the job is too big it hands off to sub agents who also read the guides in silence. BUT my counter wasn't tuned to catch either of those.

So I taught it to see in the dark, and of course the number moved. From four to six percent (not much but clear signal I was on to something). The number didn't climb because the agent started using the guide more. It climbed because I started thinking critically again. And no, I hadn't stumbled onto something new here. People far cleverer than me have written whole papers on whether these agents pay any attention to the files we write them. Often, it turns out, not as much as you'd hope.

The specifics drift off. The habit holds its line. ↵ Generated with Nano Banana Pro
The specifics drift off. The habit holds its line.

What I'd actually leave you with

None of this is a new idea, and I'd rather say that plainly than pretend otherwise. Anyone who's shipped software knows the gulf between a thing being built and a thing being used. We've got a whole tired vocabulary for tools that get made and then quietly ignored. I just got to watch it happen inside my own setup, to guidance I'd written for a machine, where the machine was the only user I had.

Two things I'm taking out of it. The first is that everything you build to make your agent smarter is invisible until you measure it. You can write the most beautiful guide in the world and have no idea whether it's load-bearing or decoration, and you'll keep happily writing more, feeling more in control the entire time, until you actually go and count. And here's the part that unsettles me most, more than any dead weight: I'd built all of it and never once checked, and I'd have carried on not checking. I still can't tell you which guides earn their place and which the model already knew cold without me. The not-knowing is the thing.

The second is smaller but no less valid. Your first measurement will lie to you. Mine did, because I'd built it with the same blind spot as the thing it was supposed to be watching. Measuring isn't a box you tick once and move on from. It's a craft and one you need to actively observe. It’s okay to get stuff wrong so long as you use the learnings to sharpen your own critical thinking and not just the model.

As I say every time, the specifics in this essay have a shelf life. The model will be cleverer by the time you read it, the percentage will have wandered, the little counter will have rotted. None of that is the point. The point is the habit underneath.

So measure the thing. Then turn around and measure your measurement.