Designing AI Flows: Where Mediation Meets Product

AI doesn't remove mediation from a workflow. It relocates it. The design challenge is deciding where it goes — and whether the new location is better than the old one.

The Problem

Most AI product decisions are made at the feature level: what can AI do here? The harder question — and the more productive one — is where human cognitive effort currently breaks down, and whether AI can absorb that cost without creating a new one upstream.

My Role

VP, Product Design at Collibra, where I'm overseeing AI-assisted design-to-code workflows at the design system level. Independently, I've built three AI-powered consumer products that each test a different aspect of LLM interface design.

Approach

Identified the interpretation gap — the recurring distance between what a system produces and what a user needs to do with it
Applied that lens across enterprise tooling (design systems, handoff pipelines) and consumer product (prediction, decision support, expert review)
Derived a working set of principles for when and how to introduce AI as a mediating layer

Emerging Signal

Designer capacity shifting left into ideation and discovery as delivery automates
Engineer capacity shifting right into testing, hardening, and compliance
Centralization emerging as the governance response to AI-enabled fluidity at the design-code boundary

1. The Problem Worth Naming

The standard framing for AI in product design is capability: what can this model do, and where can we put it? That framing produces features. It rarely produces experiences.

The more useful question is about friction topology. Every workflow has points where human effort is concentrated not because the task is intellectually demanding, but because translation is required. Data has to be interpreted before it can be communicated. An AI output has to be evaluated before it can be trusted. A design specification has to be mapped to code before it can ship. These are mediation costs — and they're often invisible because they've been absorbed into the definition of the job.

AI enters a workflow most effectively not as a new capability, but as a substitute for mediation effort that was always expensive and never strategic. That's the lens I've been working from — across an enterprise design system pipeline at Collibra, and across three consumer products I've built independently.

The interpretation gap between system output and user action

2. Enterprise Context: AI in the Design-to-Code Pipeline

The Setup

At Collibra, the Central Design Team owns Arbor — our React-based design system. The Central Frontend team supports the implementation layer. The handoff between them has always been the source of quiet, compounding drift: design tokens diverging from code, components re-implemented from scratch instead of reused, Figma specs updated without engineering awareness.

We're running a pipeline that connects Cursor and Claude to Figma via MCP, using Code Connect to give Claude genuine awareness of Arbor's component library when generating code. Figma is the declared source of truth. Bi-directional sync is meant to keep design and implementation aligned as either side evolves — not aligned at launch, then diverging.

The pipeline is running. We're actively fine-tuning it. And the friction, as always, isn't the tools.

Figma MCP and Code Connect pipeline diagram

What's Actually Hard

Once the pipeline is live, the most visible change isn't in code quality — it's in team behavior. Designers are tweaking code. Engineers are adjusting in Figma. The boundary that used to be a formal handoff is becoming a fluid collaboration zone. That's genuinely good for speed and creative fidelity. It's also a faster path to drift than anything a traditional process ever produced.

Our working model is that all frontend work will need to pass through a centralized review point — supported jointly by design system and frontend teams — to keep Arbor as a real standard rather than a polite suggestion. That funnel is a governance design problem, not a tooling one. Someone has to own it. The structure has to be intentional.

The governance questions that surface quickly in a pipeline like this aren't technical. When five Arbor components could plausibly handle a given pattern, which one should Claude reach for? When a designer makes an update in Figma that isn't yet reflected in Arbor, does Claude generate compliant code or accommodating code? When bi-directional sync produces a conflict, who reviews the diff and what's the resolution rule? These are judgment questions, and AI doesn't resolve them — it just makes them arrive faster.

The Capacity Shift

The more durable signal emerging from this work is what happens to where people spend their time as the pipeline matures.

Designer capacity is moving left. With delivery more automated, the high-value design work shifts toward ideation, discovery, problem framing, and the kinds of judgment calls that AI still cannot make — what's worth building, who it's really for, what the experience should feel like before it exists. The Figma blog's analysis of design systems in the AI era describes this as design systems becoming governance infrastructure rather than component libraries — the constraint set that makes AI-generated code trustworthy rather than just fast.

On the engineering side, the same dynamic appears to be emerging in the opposite direction. Less time on frontend implementation, more time on testing, hardening, performance, and compliance. The Daine Mawer analysis of frontend's evolving role makes this prediction explicitly: the most valuable engineers become connectors and compliance owners, not implementers. The middle of the workflow compresses. The edges are where the human judgment lives.

What this pipeline clarifies — and what generalizes beyond design systems — is that AI in a workflow doesn't eliminate mediation, it relocates it. The effort that used to live in the handoff now lives in the prompt, the review gate, and the upstream system design decisions. That's usually a good trade. But it requires designing the workflow, not just enabling the tooling.

3. Personal Projects: AI as Product Experience

The enterprise context establishes the principle at the workflow level. The following three projects test it at the product experience level — each one a different answer to the question: what does it feel like when AI does the interpretation work?

LLens — UX Audit Agent

The mediation cost being replaced: Senior designer judgment as a first-pass filter.

A designer auditing a UI against Nielsen's heuristics and WCAG accessibility guidelines does interpretive work — they translate visual observations into structured findings, ranked by severity, grounded in principles. That's expensive expertise. It's also, in large part, the kind of structured pattern recognition that a vision model can do well.

LLens accepts a URL or Figma frame link, captures a screenshot, and sends it to Claude's vision model with a system prompt structured around the full heuristic framework. The output is a scored audit: findings organized by category and severity, each with a specific observation and a concrete recommendation. The confidence level on each finding is surfaced explicitly — Claude doesn't claim certainty it doesn't have.

The design challenge wasn't making Claude produce findings. It was making the output trustworthy. Enterprise teams and practitioners don't want a list — they want a list they can act on. That required: grounding every finding in something specifically visible in the UI (no generic observations), separating what Claude observed from what it recommends, and making the confidence calibration visible rather than hidden. A high-confidence finding on contrast ratio reads differently than a medium-confidence finding on information hierarchy. Both are useful. They're useful differently.

LLens audit results showing finding cards by heuristic and severity

The project is live at ux-audit-agent.vercel.app. It's a genuine first-pass audit tool — not a replacement for a senior designer's review, but a reduction in the effort required to get to a structured starting point.

What it demonstrates: AI earns trust through specificity, not confidence. Grounding outputs in observable evidence — naming what Claude actually saw — is what separates a useful AI finding from an authoritative-sounding generalization.

Set List Oracle

The mediation cost being replaced: The effort of probabilistic reasoning across fragmented, informal data.

Predicting a concert setlist requires synthesizing tour history, recent releases, venue type, market, and the informal signals that live in fan communities — a genuinely multi-variable reasoning task that most people can't do well quickly. Set List Oracle accepts an artist name and show date and returns a predicted setlist, with reasoning about why specific songs are likely or unlikely.

The interesting design problem here wasn't accuracy — it was the UX of informed uncertainty. The model doesn't know what the setlist will be. Neither does anyone. But a well-reasoned prediction, clearly framed as probabilistic, is more useful than no answer. The design had to signal "confident reasoning" without implying "certain outcome." Those are different things, and the difference matters to users who will carry the prediction into a real decision.

Set List Oracle prediction interface showing setlist with probability indicators

This is a small-stakes version of a pattern that appears everywhere in AI product design: the model produces an output that can't be verified in advance, and the user has to decide how much to act on it. The honest design response is to make the reasoning visible and the uncertainty named. False precision is worse than calibrated uncertainty — it just feels better in the moment.

What it demonstrates: Uncertainty, named explicitly and grounded in reasoning, is more useful than false precision. The UX challenge is making "probably" feel like a feature, not a limitation.

Should I Go?

The mediation cost being replaced: The internal cost-benefit reasoning most people do poorly under ticket-purchase pressure.

This is not an events app. It's a decision engine. Enter the artist, ticket price, travel time, and your current energy level — and receive a Regret Score, a Future Story Value assessment, and a cost framing that contextualizes the price against your realistic alternatives.

The design challenge was making AI-generated emotional reasoning feel grounded rather than gimmicky. "Future Story Value" is a real concept — the asymmetric value of experiences you'll retell versus expenses you'll forget — but if the output reads like a self-help listicle, it loses the user immediately. The framing had to feel like a sharp, slightly irreverent friend who's thought about this more carefully than you have in the last 30 seconds.

Should I Go decision output showing Regret Score and Future Story Value framing

Regret Score is deliberately reductive — it collapses a complex emotional calculation into a number — but it works because it's explicit about what it's doing. It's not claiming to be your feelings. It's offering a structured way to look at them. The model doesn't make the decision. It makes the decision easier to make yourself.

What it demonstrates: AI can reason about emotional and subjective inputs when the output is framed as a perspective, not a verdict. The interface design determines whether that distinction lands.

4. Principles Distilled

These three products, alongside the Collibra pipeline work, have produced a working set of principles I apply when designing AI flows:

AI earns trust through specificity, not confidence. Outputs grounded in observable, nameable evidence are trusted. Outputs that sound authoritative but aren't traceable to anything specific are not — and users feel that gap even when they can't articulate it.

Uncertainty named honestly is more useful than false precision. The instinct is to soften uncertainty to make AI outputs feel stronger. That's backwards. Users who understand that a prediction is probabilistic use it better than users who were led to believe it was certain and found out otherwise.

AI in a workflow relocates mediation — it doesn't remove it. Every pipeline or product that introduces AI shifts where human judgment is required. The design question is whether the new location is better than the old one. Often it is. That's only clear if you've mapped the original friction topology before you start.

The governance layer is always a design problem. When AI introduces fluidity at a process boundary — design and code, prediction and action, recommendation and decision — the human system that governs that boundary has to be redesigned too. Tooling without governance produces drift faster than the system it replaced.

The edges of the workflow are where human judgment lives. As AI compresses the implementation middle, the strategic value shifts toward the ends: clearer problem framing, sharper discovery, higher-quality constraints on one side; more rigorous testing, hardening, and compliance on the other. Designers and engineers who see this coming are already repositioning toward those edges.

5. What I'm Building Toward

The through-line across this work isn't a technology preference — it's a design conviction. AI earns its place in a product or workflow by reducing the human effort that was always expensive and never strategic. When it does that well, it shifts capacity toward the work that's actually interesting: the problem framing, the judgment calls, the accountability for what gets built.

The pipeline work at Collibra is about making that case at an enterprise infrastructure level. The personal projects are about testing it in consumer contexts where the stakes are lower and the feedback is faster. The principles transfer. The calibration is always specific to the workflow.

All case studies