The UI that travels with the tool

Every application that consumes another application's data has to solve the same quiet problem: who builds the display layer?

A personal organiser wants to show the user a flight option. Someone has to build a flight card — the layout, the fields, the hierarchy of information, the way baggage fees sit alongside the price. That someone is the personal organiser team, even though everything they know about flights comes from a travel application they are calling as a tool. The travel application knows exactly how a flight result should be displayed. The personal organiser team is guessing.

When the travel application adds a new field — a carbon offset figure, an updated baggage policy — the personal organiser team finds out the hard way: a field appears in the data and nothing in the UI shows it. Someone files a ticket. Someone updates the flight card. The display logic and the data source are permanently out of step, maintained by different people, with no automatic way to know when one has changed without the other.

This is the standard model and almost nobody questions it. The tool provides the data. The consumer provides the display. The cost is absorbed so routinely that it does not appear on any estimate.

The question worth asking

Who actually knows how a flight result should be displayed?

The travel application does. It built the data model. It knows which fields are primary and which are secondary. It knows that the departure time matters more than the airline's internal booking reference. It knows how to handle the case where baggage is included versus where it costs extra. When the travel application decides to restructure how it presents connection times, the personal organiser should not have to notice.

The question this points to is: what if the tool response included not just the data but a description of how to display it? The travel application returns a flight result and a flight card. The personal organiser renders the card. The travel application owns the display logic. The personal organiser never needed to know anything about flights.

This was tried before

Microsoft introduced Adaptive Cards in 2017. The idea was close to this: structured JSON descriptions of UI surfaces that any host application could render. Write the card once; render it everywhere a renderer exists: Teams, Outlook, Windows Notifications.

It did not become widespread. Two failures explain most of it.

The first was a distribution problem in the wrong direction. Adaptive Cards required every host application to implement a renderer so that bots could send rich content to them. Microsoft needed to convince every chat application, every email client, every notification surface to build a renderer for a format Microsoft controlled. Slack had Block Kit. Google had its own format. Neither had any reason to implement a Microsoft spec. The renderer ecosystem never grew beyond Microsoft's own products.

The second was governance. Adaptive Cards was transparently a Microsoft project in service of Microsoft's platform. Even the organisations that might have benefited from a common card format declined to build infrastructure that would make their users more dependent on Microsoft's product decisions. The format was open in the legal sense and closed in the trust sense.

The renderer inconsistency that followed — the same card rendering differently in Teams versus Outlook versus Windows — was a symptom of both problems. Maintaining the promise across a fragmented, under-incentivised renderer ecosystem turned out to be harder than producing the spec.

Why the AI agent stack is different

When an AI agent calls a tool, the dependency runs differently from the Adaptive Cards model.

In the Adaptive Cards model, the work multiplied: every one of M host applications needed to implement a renderer, and every one of N bots needed cards that displayed correctly in each host's renderer, so in practice there were M × N pairings to build and verify. Microsoft controlled neither side outside its own products, so neither side had sufficient reason to act.

In the agent tool model, the problem is M + N: N tool providers include a UI surface in their tool response — once, and they own it — and M consuming applications build one generic renderer — once, forever. After that, every new tool a consuming application calls arrives with its own display logic. The consuming application's renderer gets richer with every tool it connects to, without any code change.

The network effect runs in the right direction. More tools that return surfaces make the generic renderer more valuable. A generic renderer that exists makes it more worthwhile for tool providers to include surfaces. Each side creates incentive for the other. Adaptive Cards had the opposite problem: each side waited for the other.

The governance question is also different. The tool provider has a direct incentive to include a surface in the response, because it means the domain is displayed correctly wherever the tool is called. This is not a commitment to a platform. It is an expression of what the tool knows about its own data.

What this looks like in practice

The personal organiser calls the travel application's flight search tool. The response contains three things: the structured flight data for the agent to reason about, context that explains the data to the consuming LLM, and a surface description — a flight card — that any A2UI-capable renderer can display. The personal organiser passes the surface to its renderer. The user sees a flight card designed by the travel application, consistent with how flights are displayed everywhere the travel application is used.
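The three-part response described above can be sketched in TypeScript. The field and component names here are illustrative assumptions, not the actual A2UI schema:

```typescript
// Illustrative shape of a tool response that carries its own display.
// All names (FlightData, SurfaceNode, ToolResponse) are hypothetical.

interface FlightData {
  departure: string;      // ISO 8601 departure time
  arrival: string;
  airline: string;
  price: number;
  baggageIncluded: boolean;
}

interface SurfaceNode {
  component: string;               // e.g. "card", "row", "text"
  props?: Record<string, unknown>;
  children?: SurfaceNode[];
}

interface ToolResponse {
  data: FlightData[];   // structured data for the agent to reason about
  context: string;      // explanation of the data for the consuming LLM
  surface: SurfaceNode; // display description for any capable renderer
}

const response: ToolResponse = {
  data: [{
    departure: "2025-06-01T09:15:00Z",
    arrival: "2025-06-01T12:40:00Z",
    airline: "Example Air",
    price: 184,
    baggageIncluded: false,
  }],
  context: "One direct flight found. Baggage costs extra.",
  surface: {
    component: "card",
    children: [
      { component: "text", props: { role: "primary", value: "09:15 → 12:40" } },
      { component: "text", props: { role: "secondary", value: "Example Air · 184 · baggage extra" } },
    ],
  },
};
```

A consumer without a renderer reads only `data` and `context`; a consumer with one also passes `surface` to its renderer. Neither path requires flight-specific code.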

When the travel application updates its card to add a carbon offset figure, the personal organiser renders the new card automatically. No ticket. No cross-team coordination. No display code in the personal organiser that knows what a carbon offset is.

The same pattern extends in every direction. A legal application exposes a case summary tool; any consuming application renders case summaries correctly without knowing anything about legal domain structure. An accounting application exposes an invoice tool; every application that calls it displays invoices the way the accounting application intends. The tool provider's domain expertise travels with the data.

The deterministic angle

Complex surfaces do not need an LLM to generate them. A deterministic handler — a participant that always produces the same output for a given input, without any model call — can produce a correct, validated surface description for any domain state it knows. A flight with three connections always produces the same card structure. An invoice with five line items always produces the same display layout.

This matters because the one genuine risk in agent-generated UI is unreliability: a model that occasionally produces a malformed surface description, an invalid component reference, a field that does not exist. Deterministic generation eliminates that risk for the surfaces that do not require inference. The LLM in the tool stack handles the parts that require reasoning; the deterministic participant handles the parts that require consistency.
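A deterministic surface builder is easy to sketch: the same input always yields byte-identical output, with no model call anywhere in the path. The types and component names below are illustrative assumptions:

```typescript
// A deterministic card builder: same flight in, same card out, no model call.
// Flight, SurfaceNode, and the component vocabulary are hypothetical names.

interface Flight {
  departure: string;
  arrival: string;
  price: number;
  connections: number;
}

interface SurfaceNode {
  component: string;
  props?: Record<string, unknown>;
  children?: SurfaceNode[];
}

function flightCard(f: Flight): SurfaceNode {
  return {
    component: "card",
    children: [
      { component: "text", props: { role: "primary", value: `${f.departure} → ${f.arrival}` } },
      { component: "text", props: { role: "secondary", value: `${f.connections} connection(s) · ${f.price}` } },
    ],
  };
}

const f: Flight = { departure: "09:15", arrival: "12:40", price: 184, connections: 3 };

// Determinism: two calls with the same input produce identical structures,
// so the output can be validated against the protocol schema once, ahead of time.
const same = JSON.stringify(flightCard(f)) === JSON.stringify(flightCard(f));
console.log(same); // → true
```

Because the output is a pure function of the input, a malformed surface is a bug you catch in tests, not a runtime roll of the dice.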

The shape this requires

The model has two sides, and neither is expensive.

On the tool-provider side, the surface description travels alongside the data — not replacing it. Consuming agents without a renderer ignore the surface and use the data as before. Consuming agents with a renderer pick up the display for free. The tool provider pays a one-time cost to include the surface; every consumer benefits.

On the consuming side, one generic renderer — built once — handles every tool that speaks the protocol. The renderer knows nothing about any domain. It knows how to render the component vocabulary the protocol defines. Every tool that speaks the protocol works, including tools the consuming application has never encountered before.
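A minimal sketch of such a renderer, assuming a small illustrative vocabulary ("card", "row", "text") rather than the real protocol's component set. The point is what it does not contain: any domain knowledge.

```typescript
// A generic renderer: it knows only the component vocabulary, nothing about
// flights, invoices, or any other domain. Vocabulary here is hypothetical.

interface SurfaceNode {
  component: string;
  props?: Record<string, unknown>;
  children?: SurfaceNode[];
}

function renderToText(node: SurfaceNode, depth = 0): string {
  const indent = "  ".repeat(depth);
  switch (node.component) {
    case "text":
      return `${indent}${String(node.props?.value ?? "")}`;
    case "card":
    case "row":
      return [
        `${indent}[${node.component}]`,
        ...(node.children ?? []).map(c => renderToText(c, depth + 1)),
      ].join("\n");
    default:
      // Unknown components degrade gracefully instead of failing the surface.
      return `${indent}(unsupported: ${node.component})`;
  }
}

// A surface from a tool this renderer has never seen still renders,
// because the renderer only matches on the vocabulary.
const surface: SurfaceNode = {
  component: "card",
  children: [{ component: "text", props: { value: "09:15 → 12:40" } }],
};
console.log(renderToText(surface));
```

Swap the text output for React, SwiftUI, or anything else; the structure is the same: one switch over the vocabulary, recursion over children, and a graceful fallback for components from a newer protocol version.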

What this adds up to

The failure of Adaptive Cards was not a failure of the idea. It was a failure of the adoption model and the governance. The idea — that the data source knows better than the consumer how its domain should be displayed, and that this knowledge should travel with the data — is correct. It was simply applied in an environment where the incentives ran the wrong way.

The AI agent stack is the first environment where the incentives run correctly. Tool providers want their domain displayed right. Consuming applications want richness without rebuilding every domain from scratch. The generic renderer is built once and benefits from every new tool. The flight card designed by the travel application is the flight card that gets used everywhere, maintained by the people who know what flights are.

That is a different problem from the one Adaptive Cards tried and failed to solve. The shape is the same. The direction is opposite.


Beach is an open-source project from Cool AI. Packages are published at npmjs.com/~cool-ai.