• Home
    Hut
  • About
    Beard
  • Blog
    Book
  • Contact
    Chat
  • Projects
    Rocket
  • Illustrations
    Palette
  • Photography
    Camera
  • Games
    Joystick

    Science & Creativity

    Product Designer
    Front-End Developer
    Team Leader

    I'm Simon, a multi-award winning product designer based in New York. During my career I've had the privilege of creating experiences for clients including the New York Times and Google. I've also helped guide teams at Facebook and startups like Highfive and WeTransfer.

    • Visual Design
    • Design Systems
    • IA/UI/UX
    • Strategy
    • Prototyping
    • HTML/CSS
    A Researcher Out of Time

    On building a researcher that asks before it acts

    Simon Corry

    Six months ago I didn't know what a harness was, at least not in the sense the engineers around me kept using the word. It's the kind of term you're either expected to recognize or quietly look up later. I've been looking up a lot of words lately.

    Halftone study for A Researcher Out of Time — overlapping forms and a fractured rhythm, no figures, just dot density.↵ Claude, My Friend
    Halftone study for A Researcher Out of Time — overlapping forms and a fractured rhythm, no figures, just dot density.

    The reason any of this came up at all is that I've been quietly building Vetuu in earnest. Vetuu is the game I've been dreaming of making since I started building websites in the 90s, back when carving out a small space on the web was the whole point of being online rather than an act of mild rebellion, and the result is that I now have 25+ years of pre-alpha guilt to account for whenever someone asks me what I'm working on. It's a server-authoritative 8-bit MMORPG that I work on at night and on weekends. If I happen to ship a bug (this happens a lot) on Tuesday I will almost certainly not notice until Saturday, by which point I've forgotten what I was thinking on Tuesday and am also somewhat suspicious of the person who wrote it.

    That gap between writing the bug and noticing the bug is the gap a harness fills, even if nobody used the word harness when it came up. The model is the engine. The harness is everything around the engine, what it can read, what it can touch, how it asks for permission, what it does when it's wrong. At least that's how I've learnt to interpret it.

    The engine metaphor, abstracted — a warm core held by curving bands, not a machine you can name.↵ Claude, My Friend
    The engine metaphor, abstracted — a warm core held by curving bands, not a machine you can name.

    So I built one because I cannot audit my own work fast enough to keep up with the speed at which I can now produce it, which is a strange new problem to have, and because I wanted a machine to do the tedious part of due diligence so I could keep doing the part that I actually enjoy. Now, most Saturdays, the thing emails me, which still feels mildly absurd whenever I think about it too long.

    Most of the frame is empty paper. One small cluster of warmth. That’s the inbox before the inbox.↵ Claude, My Friend
    Most of the frame is empty paper. One small cluster of warmth. That’s the inbox before the inbox.

    The schedule is deliberate, because if it were up to me I would absolutely not run a sober second pass on anything I've just shipped. A GitHub Actions cron fires Saturday morning, which lands in my inbox around 7am Eastern in summer and 6am in winter (GitHub cron doesn't bother with daylight saving, so the whole ritual drifts a little with the calendar but who cares). The loop is roughly this: cron wakes a researcher on whatever topic is overdue on a status dashboard I keep, the agent reads the relevant code and pulls in outside context where it needs to, a mandatory second pass goes back through every claim and demotes anything it can't actually prove, and a validator throws out the polite non-findings — things like "all clear" or "working as expected," which are the phrases an agent reaches for when it has nothing real to say but would still like to be helpful (bugs the shit out of me). If nothing actionable turned up there's no email at all, which I'm coming to enjoy more than I expected. If something did, Resend delivers a briefing which I've trained the agent to deliver as a narrative not a verbose set of nonsense tech babble.

    The second pass is not optional and I am willing to die on that hill, because an earlier and rather sobering audit taught me the hard way that a single pass through any work is how you end up shipping twenty-two bugs you were quite proud of five minutes earlier.

    Some Saturdays are quiet, which means zero actionable findings means zero email, which as I said is a design choice I stand behind because silence beats noise in my world. The brief still lands in the repository for the audit trail, but I'm spared being woken up to applaud the absence of bad news, which feels like the right trade to me. Other Saturdays are not quiet, and on those mornings there's a plain-language summary up top, then sections, and some of the items already say in past tense that the harness shipped a few small safe fixes while I slept. This is really freaking cool. Doc updates, mechanical tidy-ups, the kind of change where the risk is low and the pattern is obvious enough that I would have gotten to it eventually anyway. The urgent stuff never auto-ships though. Severity is part of the interface contract, which is a designerly way of saying the things that scared the researcher are also the things that should wait for me.

    The rest land in a section that's basically your call on these, and each finding has three buttons (Yes, No, Other), the last of which opens a GitHub issue when the answer isn't binary because not every answer is. No genuinely skips, and the harness remembers the rejection so the same finding doesn't surface again next week, which is the sort of small consideration I wish more software extended to its users. Yes is the bit people ask about when this comes up at dinner, though it is not the only way work lands in the tree and I sometimes feel the quiet Saturdays don't get the attention they deserve.

    When I click Yes the link is signed, and I didn't know what an HMAC was until I needed one, so if you don't know either, welcome to the club. The plain version is that the URL carries a tamper-proof seal that binds the finding, the action, and an expiry together, so nothing can be rewritten between the email and the server. A small Vercel function checks the seal and shows me a confirmation page first, because email scanners love to pre-fetch links and I would prefer not to merge code on the basis that Outlook got curious. I click Confirm, a GitHub workflow fires, and an agent runs through eight phases using prompts I lifted almost verbatim from what I actually say to people I trust ("are you sure you've done the research," "trace this from a different angle," "are you two hundred percent sure"), and it plans, audits the plan three times from different angles, implements, audits the build three times, commits with a second signed seal, and finally hits an automated merge check that has to agree before anything lands. If it agrees the pull request merges. If it doesn't, the PR sits there and waits for me, which is the correct disposition.

    The model today is Opus 4.7 with the thinking dial turned up, but next year it'll have a different name and we'll all carry on. The harness is the part that matters.

    Two masses, a strip of bare paper between them. Permission, not penetration.↵ Claude, My Friend
    Two masses, a strip of bare paper between them. Permission, not penetration.
    Claude, My Friend

    The part of this that took longest wasn't the researcher, it was writing down everything the researcher is not allowed to do, which turned out to be most of the actual work. No secrets. No rewriting workflow files or database migrations. The linter and a diff summary, and nothing else from the shell. Ten files, five hundred lines, twenty tool turns, eight hundred dollars on a given run, and once any of those caps is hit it has to either ask for mercy or hand back a plan rather than press on, because the alternative — agents pressing on regardless — is how you end up reading post-mortems on the weekend. Every layer also has a kill switch I can flip from my phone without opening the editor, which I have actually used, which is why the kill switch exists in the first place.

    Trust isn't a feeling; it's a set of noes you write down while you're not feeling like an asshole.

    The reason I'm telling you (and ideally designers) any of this, is because the interesting work here isn't engineering, even though it looks like engineering on the surface. Deciding what the model is allowed to see before it acts, what it can change without asking, what the approval surface should look like at seven in the morning when you've had no coffee, what failure looks like when the right answer is wait rather than merge. These are all the same kinds of questions we have been answering for decades with buttons and copy and error states, just pointed at a different kind of user. The medium changed. The discipline did not, and that's worth saying because half of X would have you believe otherwise.

    I've written before about on-ramps and off-ramps when it comes to humanizing AI, and this is the same instinct applied to actual machinery rather than rhetoric. The Saturday email is an on-ramp. The kill switches are off-ramps. The machine doesn't replace me. It knocks before entering, and that small distinction is most of what I'd want anyone building with these tools to understand.

    If you want to try any of this without boiling the ocean in week one, start smaller than I did. Begin with a topic spec, what you want looked at, and why. Put a validator in front of it that refuses slop, especially the polite slop. Schedule a researcher to email you plain-language findings. Add approval buttons before anything touches production code. Only after all of that, and only when you actually trust the loop, wire in an implementation agent with caps and kill switches you've thought through. Vetuu sits at the far end of that ladder, but the first rung is honestly an afternoon's work.

    A few honest footnotes, because this kind of thing that ages like milk. This stack lives on a side project, not my day job, and I'd like that to stay obvious. It also costs real money, a weekly scout pass is often somewhere between thirty and eighty dollars, and a Yes-click build will cost more than that. Tools will change. What I'm describing is what I built in Q1/Q2 of 2026, this won't last.

    See, I still know how to bring things full circle. Now don't forget to like, subscribe and hit that notification bell.