The asynchronous problem nobody names
A user asks your AI-enabled application a question. The application decides that answering well requires three things: a record from a local database, a peer agent's opinion, and — because the answer will commit the user to spending money — a human's approval before the answer is returned.
The database record comes back in ten milliseconds. The peer agent responds in nine seconds. The human approves in forty minutes. The application's job is to hold the user's original question open, in some coherent state, for those forty minutes; to stitch the three results into one answer when they are all in; and to deliver that answer on whatever channel the user was speaking on, which may or may not be the channel that is still connected.
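The naive way to hold the call open is to await all three results inside one process. A minimal asyncio sketch makes the fragility concrete (the three coroutines and their timings are illustrative stand-ins): the pattern works only while the process and the user's connection both survive, which a forty-minute human approval all but guarantees they will not.

```python
import asyncio

# Illustrative stand-ins for the three heterogeneous results.
# The sleeps compress the real timings (10 ms / 9 s / 40 min).

async def fetch_record():          # fast local database read
    await asyncio.sleep(0.01)
    return {"record": "row-42"}

async def ask_peer_agent():        # slow remote agent call
    await asyncio.sleep(0.05)
    return {"opinion": "approve"}

async def await_human_approval():  # stands in for the 40-minute wait
    await asyncio.sleep(0.1)
    return {"approved": True}

async def answer_question():
    # Hold the call open until all three results are in, then stitch.
    record, opinion, approval = await asyncio.gather(
        fetch_record(), ask_peer_agent(), await_human_approval()
    )
    return {**record, **opinion, **approval}

result = asyncio.run(answer_question())
```

The in-process await is exactly the thing a real application cannot rely on, which is what makes the rest of this article necessary.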
This pattern — a single call whose completion requires several heterogeneous results that arrive asynchronously, from different places, at different times — is the common case in any AI application that does real work. It is not a special case. It is what happens the moment the application stops being a single-model prompt-response toy.
And yet, in almost every codebase where this pattern appears, there is no named abstraction for it. The work of holding the call open, collecting the parts, and delivering the completed whole is distributed across dozens of files, implemented slightly differently in each, and almost impossible to reason about as a single concern. The missing primitive has a shape, once you look for it, but most teams discover the shape only after three failed attempts to handle a specific case.
This article is about the shape of the missing primitive and the symptoms you can use to tell whether you are hitting the wall that requires it.
What real AI applications look like
A toy AI application has a simple structure. A user sends a request. The application calls an LLM. The LLM returns a result. The application returns the result to the user. The entire flow is synchronous and linear. The total time is roughly the model's time. Errors are exceptions thrown up the call stack.
A real AI application looks nothing like this. Most real calls fan out. The LLM's tool calls trigger downstream work. Some of the tool calls are fast and local. Some are slow and remote. Some return structured data. Some return streaming text that needs to be accumulated and processed before it is useful. Some return nothing at all until a human has intervened. Some return results that trigger further LLM calls, which trigger further tool calls, several turns deep.
Meanwhile, the original user request is still open. Somewhere, something is holding the expectation that a result will eventually be delivered, on a specific channel, tied to a specific thread, in a form the channel can consume. If the user was interacting via an SSE stream, the stream is waiting. If the user was interacting via email, a draft reply needs to be assembled and sent. If the user disconnected, the reply needs to go somewhere the user can find it later — an inbox, a notification, a persistent chat history.
None of this is exotic. This is how real applications work. What is exotic — in the pejorative sense — is that most codebases implement all of this ad-hoc, per-case, with deep coupling between the handler of a specific request and the mechanism of delivering the specific reply.
The symptoms of the missing primitive
If your AI codebase is hitting the wall that the missing primitive would solve, you are seeing some combination of these symptoms:
The handler knows about the channel. The function that processes a user's request ends up aware of whether the request came in over HTTP, over a message queue, or over a chat integration, because it needs to know how to send the result back. This compounds quickly: each new channel adds conditional branches to handlers that should not care about channels.
Partial results are hard to handle. Your code is structured as "do all the work, then return the result." But for real applications, the first part of the answer should often be delivered to the user as soon as it is known, with further parts streaming in as they arrive. Retrofitting this into a handler-returns-result shape is painful; handlers end up writing directly to the channel, and the separation between "compute the result" and "deliver the result" collapses.
Human-in-the-loop is a special case. A call that requires a human approval is handled entirely differently from a call that requires a peer agent's response, even though structurally they are the same thing: an external party has to return something before the call can complete. The human-approval path is full of custom state machines, custom persistence, custom retry logic. The peer-agent path is too, and the two paths are slightly different for reasons nobody can explain any more.
Out-of-order results crash the code. A downstream result arrives a few hundred milliseconds after the fallback logic fired and the call already completed. The late result now has nowhere to go. Either it is silently discarded (and the code is subtly wrong) or it is handled by some opportunistic branch that is tested poorly because it rarely runs.
You can't tell, from the log, why a call completed when it did. A call returns to the user. Why then? Because all the results were in? Because a timeout fired? Because one participant produced a definitive answer that obsoleted the others? The log has the events but does not have the answer to "what closed this call."
If you are seeing one or two of these, you are at the boundary of needing the missing primitive. If you are seeing all five, you are well past it, and you are probably patching the consequences in fifty different files.
The shape of the primitive
The primitive that closes this gap has a shape that is easy to describe in words and, in practice, slightly harder to build than it looks.
It is a registry. When a call is made whose completion may involve several asynchronous results from heterogeneous sources, an entry is created. The entry records: which request this call belongs to, which results are expected, which have arrived, what the completion criterion is (all results, any one definitive result, a quorum, a timeout), and what to do when the entry is complete. Individual results arriving from wherever — downstream LLMs, tool responses, human approvals, peer agents, timeouts — register against the entry. The entry evaluates its completion criterion each time a result lands. When the criterion is met, the registered completion action fires: assemble the results, deliver them to whichever channel the original request indicated, emit whatever events the rest of the application needs to know it has happened.
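A minimal sketch of such a registry, with illustrative names (`PendingCall`, `submit`, the `"all"`/`"any"` criteria) standing in for whatever a real codebase would call them, and with only two of the possible completion criteria implemented:

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class PendingCall:
    request_id: str
    expected: set[str]                         # result sources this call waits for
    on_complete: Callable[[dict, str], None]   # (results, reason) -> deliver
    criterion: str = "all"                     # "all" or "any one definitive result"
    results: dict[str, Any] = field(default_factory=dict)
    closed: bool = False

    def submit(self, source: str, value: Any) -> None:
        """Register one result and re-evaluate the completion criterion."""
        if self.closed:
            return  # late arrival against a completed entry: handled uniformly
        self.results[source] = value
        if self._satisfied():
            self.closed = True
            self.on_complete(dict(self.results), self._reason())

    def _satisfied(self) -> bool:
        if self.criterion == "any":
            return bool(self.results)
        return self.expected <= self.results.keys()

    def _reason(self) -> str:
        # the entry itself can answer "what closed this call"
        return f"criterion {self.criterion!r} met with {sorted(self.results)}"

class Registry:
    def __init__(self) -> None:
        self.entries: dict[str, PendingCall] = {}

    def open(self, entry: PendingCall) -> None:
        self.entries[entry.request_id] = entry

    def submit(self, request_id: str, source: str, value: Any) -> None:
        self.entries[request_id].submit(source, value)
```

A production version would persist entries rather than hold them in memory, since a forty-minute human approval must survive a process restart; the in-memory dict is purely for showing the shape.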
This sounds like a to-do list. The useful thing about it is what it does for the rest of the code. The handler for the original request no longer has to know about the channel — it produces a result and lets the registry's completion action deliver it. The handlers for individual downstream results no longer need to know about the original request — they know only that their result is a fragment against an entry. Partial results are natural: each arrival is a state transition on the entry, not a new branch in handler code. Human-in-the-loop is not a special case: a human approval is just another kind of result against an entry, identical in structure to a peer agent's response. Out-of-order arrivals are not crashes: they are, at worst, late arrivals against an entry that has already completed, which the registry can handle uniformly. And the log becomes answerable to "why did this call complete when it did": the entry itself has the answer.
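The channel decoupling in particular needs nothing more than a lookup table of delivery callables, keyed by whatever channel name the original request recorded. A sketch, with hypothetical channel names and handlers (a real system would write to an SSE stream, send an email, or append to a persistent chat history):

```python
# Hypothetical per-channel delivery functions; appending to a list
# here stands in for the real side effects.
deliveries = []

CHANNELS = {
    "sse":   lambda thread, body: deliveries.append(("sse", thread, body)),
    "email": lambda thread, body: deliveries.append(("email", thread, body)),
    "inbox": lambda thread, body: deliveries.append(("inbox", thread, body)),
}

def completion_action(request: dict, assembled: str) -> None:
    # The completion action, not the handler, knows about channels.
    # It routes by whatever the original request recorded, so adding a
    # channel means adding one entry here, not branches in every handler.
    CHANNELS[request["channel"]](request["thread"], assembled)

completion_action({"thread": "t-7", "channel": "email"}, "Here is your answer.")
```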
The shape is not complicated. What is uncommon is having an explicit, named abstraction for it rather than implementing the same logic seven times in seven different handlers.
Why this matters more in AI applications than in classical distributed systems
Distributed systems have been handling asynchronous fan-out for decades. Workflow engines, saga patterns, event-sourced systems — all of these solve parts of this problem. So why is the AI case different?
It is different in one specific way: the heterogeneity of participants. A classical distributed system has a few kinds of participant: services, queues, databases. An AI application has far more: LLMs, tools, small specialised models, humans, peer agents, batch processes, classifiers, policy engines, retrieval systems. Each has a different latency profile, a different reliability profile, a different output shape, a different failure mode. The workflow engines and saga patterns built for classical distributed systems make assumptions about these profiles that do not hold in the AI case.
A specifically AI-shaped version of the primitive is needed. It has to handle results that are streamed, not just single-valued. It has to handle results that come from humans and therefore take minutes or hours. It has to handle participants that can be asked to do the same work twice and may produce different results. It has to handle participants that want to emit partial structured output (a plan step, a tool call) as intermediate states rather than final results. These requirements extend the basic shape; they do not replace it.
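The streaming and partial-output requirements can be sketched as a small extension of the same shape, again under illustrative names: a source may emit intermediate fragments (a plan step, a token chunk) before its final value, and only finals count toward completion.

```python
from typing import Any, Callable

class StreamingEntry:
    """A pending call whose participants may emit intermediate states."""

    def __init__(self, expected: set[str],
                 emit_partial: Callable[[str, Any], None],
                 on_complete: Callable[[dict], None]) -> None:
        self.expected = expected
        self.emit_partial = emit_partial      # forwards fragments immediately
        self.on_complete = on_complete
        self.finals: dict[str, Any] = {}

    def submit(self, source: str, value: Any, final: bool) -> None:
        if not final:
            # An intermediate state, not a result: surface it to the user
            # now, but do not count it toward the completion criterion.
            self.emit_partial(source, value)
            return
        self.finals[source] = value
        if self.expected <= self.finals.keys():
            self.on_complete(dict(self.finals))
```

The human-approval case falls out for free: a human is simply a source that never emits partials and takes hours to emit its final.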
What to do with this
If you are building an AI application and you recognise some of the symptoms above, two things are worth doing.
The first is naming the primitive in your own codebase, even informally, and starting to refactor the ad-hoc logic into it. Call it a ledger, a manifest, a registry, a pending-calls map: the name matters less than the fact that the abstraction exists. Until it has a name, it cannot be reasoned about as a single concern.
The second is asking, when you build a new feature that involves any asynchronous participant, whether the feature fits naturally as an entry against the registry or whether it requires you to write yet another bespoke state machine. If it requires the latter, that is a signal that your primitive is either wrong or incomplete, and the fix is in the primitive, not in the feature.
Applications that get this right have a readable control flow, a debuggable log, and an architecture that survives the arrival of a new kind of participant without a rewrite. Applications that do not get it right grow increasingly baroque, with each new participant type making the others harder to reason about. The difference is not about which languages or frameworks are used. It is about whether a specific, unglamorous abstraction has been given a name and enforced.