The LLM is not the centre of your application

Open any guide to building an AI application in 2026 and the picture is the same: an LLM in the middle, tools on one side, users on another, memory somewhere, a prompt assembly layer on top. Everything revolves around the model call. The code follows the picture: the main loop is a conversation loop, the state is a conversation history, the integrations are presented as tools the model can choose to call.

This shape is natural enough that most teams never question it. It is also the reason a large fraction of AI-enabled applications built in the last two years will be rebuilt in the next two. The LLM is not the right thing to put at the centre. It is a participant. Treating it as the centre forces you to restructure every time the right participant mix changes — which, given how quickly the economics and capability landscape of AI is moving, is often.

What changes when the LLM is the centre

Imagine an application that classifies incoming support tickets and drafts a first-pass response. Twelve months ago, the sensible implementation was a single frontier-model call: prompt in, classification and draft out. Fine.

Six months ago, the costs of frontier-model calls at volume had grown large enough that a pipeline made more sense: a small classifier handles the routing, and only the tickets that need nuance go to the frontier model. Two participants. Different shape.

Today, a deterministic workflow handles ninety percent of cases at near-zero cost, with a small fast model only for novel wording and the frontier model reserved for escalations the deterministic path explicitly flags. Three participants, plus a routing rule. Different shape again.
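The routing rule described above can be sketched in a few lines. Everything here is illustrative: the template set, the `escalate` flag, and the participant names are hypothetical stand-ins, not a real API.

```python
# A sketch of the three-participant routing rule for the support-ticket
# example. The deterministic path handles known templates at near-zero cost,
# novel wording goes to a small fast model, and only tickets the
# deterministic path explicitly flags reach the frontier model.
# All names are hypothetical.

KNOWN_TEMPLATES = {"password reset", "invoice copy", "cancel subscription"}

def route_ticket(ticket: dict) -> str:
    """Return the name of the participant that should handle this ticket."""
    if ticket.get("escalate"):           # deterministic path flagged it
        return "frontier_model"
    intent = ticket["text"].strip().lower()
    if intent in KNOWN_TEMPLATES:        # the ~ninety-percent common case
        return "deterministic_workflow"
    return "small_fast_model"            # novel wording, cheap model

common = route_ticket({"text": "Password reset "})
novel = route_ticket({"text": "my cat sat on the keyboard"})
flagged = route_ticket({"text": "refund dispute", "escalate": True})
```

The point is not the rule itself but that it lives in one place: when the participant mix changes again, this function changes and nothing else does.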

If the application was built with an LLM at the centre, each of these transitions was a restructuring. The conversation loop had to become a pipeline. The pipeline had to learn to skip the model entirely for common cases. The state model, the logging, the error handling — all of it had to follow the reorganisation.

If the application was built with a routing core and a set of participants, none of the transitions required restructuring. You swap a participant. You add one. You change a routing rule. The rest of the application doesn't notice.

The participant model, stated plainly

The alternative is boringly familiar once you name it. An application has a routing core — a thing that receives events and decides which participant should handle them. A participant is anything that can receive an event and produce a result: an LLM call, a small classifier, a deterministic function, a long-running batch process, a human approval step, a peer agent over the network. Each participant implements the same interface. The routing core does not know what kind of participant it is sending an event to, only that it is a participant.
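A minimal sketch of that interface, assuming nothing beyond the description above. The names (`Participant`, `RoutingCore`, `Event`, `Result`) and the predicate-based registration are invented for illustration, not drawn from any framework.

```python
from dataclasses import dataclass
from typing import Protocol, Callable

@dataclass
class Event:
    kind: str        # e.g. "ticket.received"
    payload: dict

@dataclass
class Result:
    payload: dict

class Participant(Protocol):
    """Anything that can receive an event and produce a result:
    an LLM call, a classifier, a deterministic function, a human step."""
    def handle(self, event: Event) -> Result: ...

class RoutingCore:
    """Receives events and decides which participant handles them.
    It never inspects what kind of participant it is dispatching to."""
    def __init__(self) -> None:
        self._routes: list[tuple[Callable[[Event], bool], Participant]] = []

    def register(self, predicate: Callable[[Event], bool],
                 participant: Participant) -> None:
        self._routes.append((predicate, participant))

    def dispatch(self, event: Event) -> Result:
        for predicate, participant in self._routes:
            if predicate(event):
                return participant.handle(event)
        raise LookupError(f"no participant for event kind {event.kind!r}")

# Usage: the core only ever sees the Participant interface.
class EchoParticipant:
    def handle(self, event: Event) -> Result:
        return Result({"echo": event.payload})

core = RoutingCore()
core.register(lambda e: e.kind == "ping", EchoParticipant())
result = core.dispatch(Event(kind="ping", payload={"msg": "hi"}))
```

Swapping an LLM participant for a deterministic one is, under this sketch, a change to one `register` call.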

Results come back through the same interface. Where they come from, and how long they take, is a matter of participant implementation. A deterministic participant returns in microseconds. An LLM participant returns in seconds. A human participant returns in hours. A long-running batch participant returns tomorrow. The routing core treats all of these uniformly; it has some mechanism for holding calls open until results arrive, but that mechanism is indifferent to who is being waited on. (That mechanism is worth a piece of its own; I have written about it in The asynchronous problem nobody names.)
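One way to make "the core waits the same way regardless of who it is waiting on" concrete is with coroutines: every participant is awaited identically, and only the participant's implementation knows its own timescale. This is a sketch with invented participants and sleeps standing in for real latencies, not a statement about how the mechanism must be built.

```python
import asyncio

# Three participants at three timescales, compressed for the demo.
# In reality the deterministic one returns in microseconds, the LLM
# in seconds, and the human in hours; the dispatch code is identical.

async def deterministic(event: dict) -> dict:
    return {"source": "deterministic"}

async def llm_call(event: dict) -> dict:
    await asyncio.sleep(0.01)            # stands in for a model call
    return {"source": "llm"}

async def human_approval(event: dict) -> dict:
    await asyncio.sleep(0.05)            # stands in for an approval queue
    return {"source": "human"}

async def dispatch(participant, event: dict) -> dict:
    # One mechanism for holding the call open, indifferent to the waitee.
    return await participant(event)

async def main() -> list[dict]:
    return await asyncio.gather(
        dispatch(deterministic, {}),
        dispatch(llm_call, {}),
        dispatch(human_approval, {}),
    )

results = asyncio.run(main())
```

The asymmetry between a microsecond function and an hours-long human step lives entirely inside the participants; `dispatch` is one line either way.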

This is not a novel architecture. It is what any experienced integration architect would draw on a whiteboard when asked to design a system that composes heterogeneous capabilities. What is novel is noticing that AI-enabled applications are integration architectures and should be drawn the same way.

Why the LLM-centric shape emerged

It emerged because the LLM used to be the special thing. In 2023, having an LLM in your application at all was the innovation. Building the application around it — making the LLM the protagonist, the tools its accessories — made sense because the LLM was what you were demonstrating.

That moment has passed. An LLM is one component among many. Some applications still revolve around a single LLM call, and that is fine for those applications — it corresponds to the "single participant" case in the participant model, which is the simplest possible instance, not a different architecture. But applications that compose multiple capabilities, handle different tasks with different economics, and route different cases to different handlers — applications, in other words, that resemble the production AI systems now being built — no longer match the LLM-centric shape. They match the participant model. Most of them are being built with the old shape anyway, because the old shape is what the tutorials teach.

What to look for in your own code

A simple test: find the main control flow in your AI application. Does it read like a conversation between a user and a model, with side trips to tools? If so, the LLM is the centre.

Now imagine you need to replace the LLM with a deterministic workflow for the common case. How many files does that touch? How much of the state model survives? How many of your tests still apply?

In an LLM-centric application, the answers are "many files, little of the state model, few of the tests" — and it is a project. In an application with a participant model, they are "one file, most of it, most of them" — and an afternoon. If your answers are closer to the first, your application has the wrong shape — not because it does not work today, but because the work needed to adapt it is disproportionate to the change being requested.
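What the one-file version of that migration looks like, in the smallest possible sketch. The participant names and the `routes` table are hypothetical; the point is only that both participants satisfy the same interface, so the swap is a single reassignment.

```python
# Two participants with the same call shape. Which one handles the common
# case is a routing decision, not an architectural one. Names are invented.

def llm_participant(event: dict) -> dict:
    return {"draft": "(model-generated reply)", "source": "llm"}

def deterministic_participant(event: dict) -> dict:
    return {"draft": "Please follow the reset link we just sent.",
            "source": "workflow"}

routes = {
    "common_case": llm_participant,      # the original wiring
    "escalation": llm_participant,
}

# The entire migration: one line, one file.
routes["common_case"] = deterministic_participant

def handle(kind: str, event: dict) -> dict:
    return routes[kind](event)
```

Everything downstream of `handle` — state, logging, tests against the result shape — survives the swap untouched, because nothing downstream ever knew which participant it was talking to.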

The harder version of the same point

The deeper reason this matters is not that LLMs are fashion-sensitive. It is that the industry has not yet converged on what the most useful AI primitive is. The frontier model is one candidate. A small specialised model is another. A composition of models with a router is another. A deterministic workflow with a single classifier at one step is another. A peer-agent network is another. All of these are being explored. None of them has definitively won. And the winners, when they emerge, will not be the same for every task.

An application that picks one of these and builds everything around it is betting on the winner, now, in a market that has not decided. An application that treats all of them as participants and composes what it needs has made no such bet. It is not that the second application is cleverer. It is that it has not taken a position on a question it does not need to answer.

The LLM is a participant. It is, today, a remarkably useful one. It may, tomorrow, be one of several equally useful ones, or less useful than it is today, or more. The shape of your application should be indifferent to which of these turns out to be the case.