<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Design on Prefect Dev Log</title><link>https://dev-log.prefect.io/blog/design/</link><description>Recent content in Design on Prefect Dev Log</description><generator>Hugo -- gohugo.io</generator><language>en-US</language><copyright>Copyright © 2025, Prefect Technologies Inc.</copyright><lastBuildDate>Sun, 19 Apr 2026 12:00:00 -0500</lastBuildDate><atom:link href="https://dev-log.prefect.io/blog/design/index.xml" rel="self" type="application/rss+xml"/><item><title>Break This Design</title><link>https://dev-log.prefect.io/break-this-design/</link><pubDate>Sun, 19 Apr 2026 12:00:00 -0500</pubDate><guid>https://dev-log.prefect.io/break-this-design/</guid><description>&lt;p>I recently spent a long session working through a design for a reliability issue in Prefect. Flow runs can get stuck in &lt;code>PENDING&lt;/code> when a worker claims a run but the execution environment never transitions it to &lt;code>RUNNING&lt;/code>, which is almost always caused by a worker crashing between claim and launch. Users have been burned by this in ways that quietly erode trust, and we needed a server-side story for detection and recovery. The design itself is kind of interesting, but that&amp;rsquo;s not what I want to write about.&lt;/p>
&lt;p>The interesting part was the process. The design got sharper once I started using a coding agent as a design partner.&lt;/p>
&lt;p>The obvious temptation is to ask an AI to propose the architecture and then polish what it hands back. I tried a little of that early on, and it produced a plausible first draft, but that plausibility was deceptive. The hard part of technical design isn&amp;rsquo;t writing something that sounds coherent and looks handsome in a Notion Doc, but instead writing something that stays correct under every edge case a production system will actually throw at it. A plausible-but-shallow draft doesn&amp;rsquo;t survive first contact with a reviewer who cares about invariants.&lt;/p>
&lt;h2 id="break-this-design">&amp;ldquo;Break this design&amp;rdquo;&lt;/h2>
&lt;p>So I flipped the framing. Here is a design. Try to break it. Find the edge cases, find the interactions between this rule and that one, and tell me which sentences imply a stronger guarantee than the system can actually offer.&lt;/p>
&lt;p>The conversation stopped looking like prose polishing and started looking like an adversarial review. That turned out to be worth a lot.&lt;/p>
&lt;h2 id="what-the-agent-was-good-at">What the agent was good at&lt;/h2>
&lt;p>The agent was genuinely useful for the kind of scrutiny that&amp;rsquo;s easy to hand-wave through when you&amp;rsquo;ve been staring at the same design doc for three hours:&lt;/p>
&lt;ul>
&lt;li>surfacing edge cases I had half-considered and quietly filed away&lt;/li>
&lt;li>catching places where two sections contradicted each other because each made sense on its own&lt;/li>
&lt;li>flagging sentences that sounded like invariants but weren&amp;rsquo;t enforceable&lt;/li>
&lt;li>asking the uncomfortable follow-up question every time I tried to paper something over&lt;/li>
&lt;/ul>
&lt;p>One concrete example: the design needs a way for the server to tell workers &amp;ldquo;this claim is dead, please tear down whatever infrastructure is still attached to it.&amp;rdquo; In the first draft, I specced a generic work-pool queue that could carry &lt;em>any&lt;/em> worker command. The agent kept poking. What does &amp;ldquo;any command&amp;rdquo; mean? Who guarantees delivery? What are the retry semantics? Is this a command bus or a cleanup channel? Eventually it was clear that I was over-engineering for v1 and needed to narrow the design to a cleanup-oriented transport only. This made the semantics more explicit and the design had less surface area to defend.&lt;/p>
&lt;p>That pattern repeated all session. Broad concepts got narrowed. Fuzzy semantics got pinned down. Several abstractions that looked elegant in isolation turned out to be promising future maintenance burdens dressed in flowery language. The right move was usually to cut them back to what the system actually had to do.&lt;/p>
&lt;h2 id="what-i-still-had-to-do-myself">What I still had to do myself&lt;/h2>
&lt;p>The agent was strong at sustained scrutiny. It was not good at owning the product boundary.&lt;/p>
&lt;p>It could tell me that a rule was ambiguous. It couldn&amp;rsquo;t tell me whether the resulting tradeoff of simpler semantics versus one more edge case was the right call for Prefect and its users. It could flag that mixed-version rollout had failure modes. It couldn&amp;rsquo;t tell me which of those modes were tolerable during the rollout window and which were hard blockers. Every time a critique landed, I still had to decide: accept, reject, narrow, or refine.&lt;/p>
&lt;p>That four-option loop ended up being the most useful interaction pattern of the whole session. We worked one issue at a time. When the conversation started sprawling, the quality dropped fast. Constraining each thread to a single explicit decision kept the design moving instead of drifting.&lt;/p>
&lt;h2 id="revelling-in-the-pushback">Revelling in the pushback&lt;/h2>
&lt;p>The most valuable moments were the ones where I thought the agent was wrong. A suggestion I disagreed with still forced me to articulate &lt;em>why&lt;/em> the design could hold under the pressure it was trying to apply, and that exercise uncovered real gaps about as often as the suggestions I accepted. You don&amp;rsquo;t get better at defending a design by only hearing from people (or clankers) who already agree with you.&lt;/p>
&lt;h2 id="hire-a-heckler">Hire a heckler&lt;/h2>
&lt;p>I&amp;rsquo;m less interested in AI-generated architecture as a starting point than I was a month ago. I&amp;rsquo;m more interested in AI as a pressure-testing collaborator that can keep pushing on a design long after I&amp;rsquo;ve run out of energy to steelman it myself. The prompt that mattered was not &amp;ldquo;write this for me&amp;rdquo; but &amp;ldquo;find the place where this is weaker than it looks.&amp;rdquo;&lt;/p>
&lt;p>None of this is really about AI, though. Reliability work benefits from someone trying to break the design before you ship it. A tool that makes that review loop faster and more patient is useful. A tool that writes the design for you is just a faster way to produce something that hasn&amp;rsquo;t been broken yet.&lt;/p></description></item></channel></rss>