Design Quality Needs a Veto

i ran a small test tonight that made one thing painfully obvious: if you want better interface design from coding agents, you cannot let the implementation brain also act as the design authority.

that arrangement fails for the same reason committees fail. the system that has to ship the code will always find a way to rationalize compromise. it will tell itself the spacing is “clean,” the cards are “organized,” the typography is “clear,” and before long you are staring at the same dead dark dashboard every agent seems to cough up by instinct.

that is not really a taste problem. it is a control problem.

the failure mode is structural

generalist coding agents are biased toward what is easy to render into code and easy to defend after the fact. that usually means:

equal-weight cards
generous gutters
clean little metric clusters
safe rounded controls
tidy hierarchy that scans like product UI, not like an instrument
explanatory copy that turns every control into a tutorial

in other words, the agent builds something readable, competent, and mostly lifeless.

this is why yelling “make it more moody” into the system prompt does almost nothing. the implementation still runs through the same instincts, and under pressure it collapses to the defaults it can safely produce.

what i changed

i split the work into four roles instead of one.

design director: owns visual language, hierarchy, spacing rhythm, component rules, anti-patterns.
ux architect: owns flows, disclosure, state inventory, mobile sequencing, density logic.
implementation agent: only translates the approved artifacts into code. no freestyle design invention.
visual reviewer: acts like an asshole on purpose. its job is to detect drift and fail the build if the implementation softened into dashboard sludge.

that last role is the whole trick.

if the reviewer has no authority, the rest becomes theater.
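one way to make that authority explicit is to write the roles down as data. this is a minimal sketch, not the packaged workflow: the `Role` class, the `can_veto` flag, and both helper functions are hypothetical names, and the ownership lists are copied from the role descriptions above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """One seat in the workflow: what it owns and whether it can fail a build."""
    name: str
    owns: tuple[str, ...]
    can_veto: bool = False  # assumption: veto modeled as a simple flag


ROLES = (
    Role("design director", ("visual language", "hierarchy", "spacing rhythm",
                             "component rules", "anti-patterns")),
    Role("ux architect", ("flows", "disclosure", "state inventory",
                          "mobile sequencing", "density logic")),
    Role("implementation agent", ("code",)),
    Role("visual reviewer", ("drift detection",), can_veto=True),
)


def owner_of(concern: str) -> str:
    """Return the single role that owns a concern, or raise if nobody does."""
    for role in ROLES:
        if concern in role.owns:
            return role.name
    raise KeyError(f"no role owns {concern!r}")


def veto_holders() -> list[str]:
    """Names of the roles allowed to fail the build."""
    return [role.name for role in ROLES if role.can_veto]
```

the point of the table is that the implementation agent owns nothing on the design side, and exactly one role holds the veto.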

the first pass failed exactly how you’d expect

i used the system on a vanilla static single-page app: a fictional signal-room monitoring console with three live controls, a channel list, an event log, and a hero status strip.

the first implementation was not terrible. worse, it was tasteful. and that is exactly why it was wrong.

the reviewer called it out for what it was:

too cardized
too spacious
too polite
too KPI-shaped in the hero
too feed-like in the event log
too stacked-desktop on mobile

the build had operational language, dark colors, and decent structure, but it still read like a polished SaaS dashboard cosplaying as a console.

that fail mattered more than a fake success would have.

it proved the documents were doing their job. the system surfaced the drift instead of quietly blessing it.

the second pass is where the workflow earned its keep

instead of rewriting the critique into something nicer, i fed the fail report straight back into implementation as the next brief.
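a rough sketch of what "straight back" can mean in practice: the findings become the brief with no paraphrasing step in between. the function name and the brief wording are assumptions for illustration; the findings are the ones the reviewer raised on the first pass.

```python
def next_brief(findings: list[str], attempt: int) -> str:
    """Turn reviewer findings into the next implementation brief, unedited."""
    if not findings:
        raise ValueError("a failed review must list at least one finding")
    lines = [f"review failed on pass {attempt}. fix every item below:"]
    # Each finding is passed through verbatim: no softening, no summarizing.
    lines += [f"- {finding}" for finding in findings]
    return "\n".join(lines)


first_pass_findings = [
    "too cardized",
    "too spacious",
    "too polite",
    "too KPI-shaped in the hero",
    "too feed-like in the event log",
    "too stacked-desktop on mobile",
]

brief = next_brief(first_pass_findings, attempt=1)
```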

the second pass did what the first one should have done:

flattened the surface system
cut the rounded rectangle addiction
compressed spacing and typography
turned channels into real telemetry lanes
turned the event log into compact chronology rows
hardened the mobile jumpbar into a tactical strip instead of soft pills
shortened control copy so the controls stopped sounding like onboarding

that version passed review.

not because it became perfect. because it crossed the line from generic dark dashboard into a believable night-operations console.

that distinction matters.

what actually improves design quality

three things, none of them glamorous.

1. design needs artifacts

if the only design input is a paragraph in a prompt, the implementation agent will reinterpret it however it likes the second it encounters friction.

you need files.

design-brief.md
DESIGN.md
component-rules.md
anti-patterns.md
ux-flows.md
state-inventory.md
page-structure.md

once the system has to answer to concrete rules instead of a hazy aspiration, the excuses get thinner.
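a minimal sketch of enforcing that contract before any implementation runs, assuming the artifacts sit together in one directory. the function name and the "missing or empty counts as absent" rule are assumptions; the file names are the ones listed above.

```python
from pathlib import Path

REQUIRED_ARTIFACTS = (
    "design-brief.md",
    "DESIGN.md",
    "component-rules.md",
    "anti-patterns.md",
    "ux-flows.md",
    "state-inventory.md",
    "page-structure.md",
)


def missing_artifacts(root: Path) -> list[str]:
    """List required design files that are absent or contain only whitespace."""
    missing = []
    for name in REQUIRED_ARTIFACTS:
        path = root / name
        # An empty file is treated the same as no file: it constrains nothing.
        if not path.is_file() or not path.read_text(encoding="utf-8").strip():
            missing.append(name)
    return missing
```

a caller could refuse to start the implementation agent while `missing_artifacts(Path("design"))` returns anything, where `design` is a placeholder directory name.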

2. design needs separation

the same brain cannot invent the aesthetic and excuse its erosion in code. that is how mediocrity launders itself into “good enough.”

design and implementation are different kinds of judgment.

engineers hate hearing this because they prefer one-loop control. but if the output keeps coming back as tasteful sludge, maybe one-loop control is the problem.

3. design needs a veto

this is the one most people skip because they want harmony. harmony is how you ship the wrong thing faster.

if a reviewer cannot say fail, the review is decorative.

and if the implementation loop does not restart from that failure, the whole workflow is just premium-flavored self-delusion.
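the veto plus restart can be sketched as a small loop. this is an illustration under assumptions, not the packaged skill: `Review`, `run_with_veto`, and the `max_passes` cap are hypothetical, and `implement` and `review` stand in for whatever actually calls the agents.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Review:
    """Reviewer verdict: a hard pass/fail plus the reasons for a fail."""
    passed: bool
    findings: list[str] = field(default_factory=list)


def run_with_veto(
    implement: Callable[[str], str],
    review: Callable[[str], Review],
    brief: str,
    max_passes: int = 3,
) -> tuple[str, int]:
    """Implement, review, and restart from the fail report until review passes."""
    for attempt in range(1, max_passes + 1):
        build = implement(brief)
        verdict = review(build)
        if verdict.passed:
            return build, attempt
        # The fail report replaces the brief; the next pass starts from it.
        brief = "review failed. fix:\n" + "\n".join(verdict.findings)
    # Nothing ships without a pass: running out of attempts is an error.
    raise RuntimeError(f"review still failing after {max_passes} passes")
```

the design choice that matters is the last line: when the passes run out, the loop raises instead of returning the least-bad build.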

the bigger point

people keep asking how to make agents “have better taste.” i think that question is mostly wrong.

you do not get durable quality by hoping the machine becomes an art director through exposure therapy. you get it by building a process where weak design cannot sneak through unchecked.

that means:

explicit design authority
explicit UX authority
explicit anti-patterns
explicit review criteria
explicit fail states

in other words, governance.

this sounds less romantic than “give the model a cooler system prompt,” but romance is how you end up with another interface made of rounded black cards and vague cyber-language.

the uncomfortable truth

most bad agent UI is not bad because the model lacks references.

it is bad because the workflow lets engineering eat the room.

engineering is excellent at making the thing exist. it is terrible at noticing when the thing feels dead while still technically working. if you do not separate those powers, the implementation instinct wins every time.

and the implementation instinct is conservative as hell.

where this goes next

i packaged the design-subagent workflow into a reusable skill so the roles, prompt generation, artifact contract, and review loop carry over to future projects. the real test is not a one-off pass. the real test is whether the workflow keeps forcing drift into the open across very different products.

that is the only benchmark i care about.

not whether the agent can occasionally luck into a sharp interface.

whether the process can reliably prevent the dumbest design regressions before they ship.

because that is the whole game.

if you want design quality from agents, stop treating design like seasoning sprinkled on top of implementation. give it teeth.