Routing Decisions Made by an LLM
A question that arises naturally once a Beach application is up and running is whether the routing decisions themselves — the dispatching of an event to one handler rather than another — could be made by an LLM rather than by a when predicate over typed payload fields. The answer, which this article comes back to repeatedly, is yes, of course; but it is not always wise, and it is sometimes precisely the wrong tool.
The trouble with the question is that it elides two genuinely different sorts of routing decision. One is structural routing, where the routing key is a deterministic property of the event — a priority field set upstream, a channelId, an authenticated user's role. The other is semantic routing, where the routing key is a judgement about the meaning of the event — what the user actually wants, whether the message is a complaint or a question, which of three product domains a free-text request belongs to.
Structural routing is what when predicates were built for. Semantic routing is where LLM-driven routing earns its place.
Where structural routing is sufficient
Consider an internal HR support application. An employee submits a query through a web form that includes a structured category field (a dropdown: "leave", "benefits", "policy", "other"). The application routes leave queries to one orchestrator (which has the leave-balance and calendar tools), benefits queries to a second (with the benefits-database tools), policy queries to a third, and the residual "other" category to a generalist orchestrator that escalates to a human if needed.
This is straightforward structural routing. The category field was set by the form, by a human who knew what they wanted to ask about; the routing rule simply matches it.
router.loadRoutingConfig({
rules: [
{ source: 'hr-form', eventType: 'submitted',
handler: 'leave-orchestrator',
when: { payload: { category: { equals: 'leave' } } } },
{ source: 'hr-form', eventType: 'submitted',
handler: 'benefits-orchestrator',
when: { payload: { category: { equals: 'benefits' } } } },
{ source: 'hr-form', eventType: 'submitted',
handler: 'policy-orchestrator',
when: { payload: { category: { equals: 'policy' } } } },
{ source: 'hr-form', eventType: 'submitted', handler: 'generalist-orchestrator' },
],
});
There is no benefit to involving an LLM here. The structural data already encodes the routing decision. Putting an LLM in the path adds latency, cost, and a new failure mode (the LLM mis-classifies a message that the dropdown had already correctly categorised) for no offsetting gain.
The temptation to over-engineer routing — to assume that "an LLM that reads the message body" will always be more accurate than "the field the user filled in on the form" — is one of the more common architectural missteps in early Beach applications. The structural data, where it exists, is almost always more reliable than what an LLM would infer about the same content.
Where semantic routing earns its place
Consider, instead, a corporate customer-success function with a single email address that receives all incoming customer correspondence: complaints, questions, feature requests, account changes, occasionally spam, occasionally messages intended for the sales address that arrived here by mistake. The structural data is useless — every email arrives over the same channel, from the same domain of senders, with no machine-readable categorisation. The decision of where to route each message is irreducibly a judgement about its content.
This is where an LLM, used carefully, is the right tool. A small, fast model — Claude Haiku, or any of its peers in that performance tier — acts as a classifier. It reads the email, decides what kind of message it is, and emits a structured decision that deterministic routing rules then act on.
const triageActor = {
id: 'inbound-triage',
model: 'claude-haiku-4-5',
systemPrompt: [
respondToolSnippet,
turnStatesSnippet,
`You triage inbound customer email. Read the message and classify it as one of:
- 'complaint' — the customer is unhappy about something
- 'question' — the customer wants to know something
- 'feature-request' — the customer is suggesting an improvement
- 'account-change' — the customer wants to update their account
- 'misrouted-sales' — the message is clearly intended for sales
- 'spam' — the message is unsolicited marketing or junk
Reply only with respond({ parts: [{ partType: 'response', data: { class: '<one of above>' } }], turnState: 'complete' }).`,
].join('\n\n'),
tools: [],
domainDataSchema: {
type: 'object',
properties: {
class: { enum: ['complaint', 'question', 'feature-request', 'account-change', 'misrouted-sales', 'spam'] },
},
required: ['class'],
},
maxTokens: 64,
temperature: 0,
};
The triage actor's job is narrow: read the message, emit a single classification. The output is a structured decision the deterministic routing rules can act on:
router.loadRoutingConfig({
rules: [
{ source: 'email', eventType: 'received', handler: 'triage-handler' },
{ source: 'assistant', eventType: 'triage_complete',
handler: 'complaints-orchestrator',
when: { payload: { class: { equals: 'complaint' } } } },
{ source: 'assistant', eventType: 'triage_complete',
handler: 'questions-orchestrator',
when: { payload: { class: { equals: 'question' } } } },
{ source: 'assistant', eventType: 'triage_complete',
handler: 'feature-request-archivist',
when: { payload: { class: { equals: 'feature-request' } } } },
{ source: 'assistant', eventType: 'triage_complete',
handler: 'account-change-orchestrator',
when: { payload: { class: { equals: 'account-change' } } } },
{ source: 'assistant', eventType: 'triage_complete',
handler: 'forward-to-sales',
when: { payload: { class: { equals: 'misrouted-sales' } } } },
{ source: 'assistant', eventType: 'triage_complete',
handler: 'spam-archive',
when: { payload: { class: { equals: 'spam' } } } },
],
});
The architecture is therefore a small LLM doing the irreducibly judgement-shaped part of the work, and a sequence of deterministic routing rules acting on its structured output. The LLM is in the path, but only briefly, and only for the bit of work that genuinely requires it.
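To make the wiring concrete, here is a minimal sketch of the glue between the triage actor's output and the `triage_complete` event that the routing rules above match on. The handler shape and helper names here are illustrative assumptions, not Beach's actual API; only the event envelope (`source`, `eventType`, `payload.class`) follows the routing rules shown.

```typescript
// Hypothetical sketch: the structured event the 'triage-handler' would emit
// after the triage actor classifies a message. The envelope fields mirror the
// routing rules above; everything else is an assumption for illustration.

type TriageClass =
  | 'complaint' | 'question' | 'feature-request'
  | 'account-change' | 'misrouted-sales' | 'spam';

interface TriageEvent {
  source: 'assistant';
  eventType: 'triage_complete';
  payload: { class: TriageClass; originalMessageId: string };
}

// Wrap the actor's typed classification in the event shape the deterministic
// routing rules dispatch on.
function toTriageEvent(cls: TriageClass, originalMessageId: string): TriageEvent {
  return {
    source: 'assistant',
    eventType: 'triage_complete',
    payload: { class: cls, originalMessageId },
  };
}
```

The point of the envelope is that everything downstream of `toTriageEvent` is ordinary structural routing: the `when` predicates never see the email text, only the typed `class` field.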
Examples of applications that benefit from LLM-driven routing
A handful of canonical cases, drawn from real Beach deployments and adjacent agentic applications:
Inbound triage on unstructured channels. Email, voicemail (transcribed), social-media DMs, public web-form submissions where the user did not select a category. The LLM reads the message and classifies it; everything downstream is deterministic.
Intent disambiguation in conversational interfaces. A chat panel where users type free text. "Where is my booking?" should route to the booking-status orchestrator; "I want to change my booking" should route to the amendment orchestrator; "I'd like a refund" should route to a complaint-and-refund orchestrator that can summon a human if the policy requires. The structural data (just the message text) is insufficient; an LLM disambiguates intent, emits a structured intent field, and the routing rules take it from there.
Severity assessment for incident reports. An internal IT-support application receives reports of varying severity in free text. "My printer is jammed" is a P3; "the company VPN is down for the entire London office" is a P1. The LLM reads the report, classifies severity, and routes — to an automated knowledge-base lookup for low severity, to a paging system for high severity.
Language-and-region routing. A multi-region application receives messages in many languages. The LLM identifies the language and region, dispatches to the appropriate region's orchestrator (which has region-specific compliance rules and locale-aware tools).
Compliance pre-screening. A financial-services application receives messages and screens for content that requires specific regulatory handling — anything mentioning insider information, anything that could constitute a complaint under Consumer Duty rules, anything that triggers a Subject Access Request workflow. The LLM reads, flags, and routes; downstream handlers ensure regulatory obligations are met without relying on the customer to identify the regulatory shape themselves.
In every one of these cases, the LLM is doing what an LLM is uniquely good at: reading unstructured natural-language input and producing a typed, machine-readable judgement about its meaning. Once that judgement is in hand, the rest of the application is deterministic.
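The intent-disambiguation case above reduces, on the deterministic side, to a plain lookup from the classifier's emitted `intent` field to an orchestrator. A minimal sketch, with all orchestrator names purely illustrative:

```typescript
// The deterministic half of intent disambiguation: once the LLM has emitted a
// typed intent, routing is a dictionary lookup. Handler ids are illustrative.

type Intent = 'booking-status' | 'booking-amendment' | 'refund';

const intentRoutes: Record<Intent, string> = {
  'booking-status': 'booking-status-orchestrator',
  'booking-amendment': 'amendment-orchestrator',
  'refund': 'complaint-and-refund-orchestrator',
};

// Anything outside the known intents falls through to a generalist,
// never to a silently dropped message.
function routeIntent(intent: string): string {
  return intentRoutes[intent as Intent] ?? 'generalist-orchestrator';
}
```

Note how small the LLM's contribution is: one string, constrained to an enum. Everything else is a table.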
When LLM-driven routing is the wrong tool
An LLM is the wrong choice for routing decisions in any of the following circumstances:
The routing key is already structurally available. The HR-form example above. When the user has already told the form their query is about leave, the LLM should not second-guess them. There are edge cases where the form's category is misleading — a user filed a "policy" query that is in fact a complaint — but those belong with the destination orchestrator (which can recognise "this looks like a complaint" and either re-route or escalate), not with an LLM that overrides the structural routing.
The decision must be deterministic for compliance or audit reasons. A trade-routing system in a financial firm cannot route trades through an LLM whose decisions are not reproducible byte-for-byte across runs. Some industries require deterministic dispatch even where an LLM would, in practice, be more accurate; the regulatory burden of explaining and auditing an LLM-driven decision outweighs the loss in precision.
The latency budget will not accommodate an extra LLM hop. Most LLM classifications are sub-second on Haiku-class models, but a routing decision that sits inside a real-time control loop, where every millisecond matters, cannot afford even that.

The decision is high-stakes and the model is not. A small fast model classifying a message that triggers a £10,000 spend is asking the wrong question of the wrong model. Either escalate the routing decision to the larger orchestrator (which has more context and a richer prompt) or interpose a human review. Do not try to extract production-grade decisions from a Haiku-class triage actor on inputs that genuinely matter.
The classification will drift on prompt changes. An LLM-driven routing decision is, ultimately, an artefact of the prompt that produced it. A prompt change tomorrow will route some messages differently than today. When the application has consumers downstream of the routing decision who have come to depend on it being stable — "we always route X to handler Y" — an LLM-driven router will betray that expectation. Consider whether the routing key needs to be a feature contract; if it does, it should be deterministic.
How to do it well, when you do
Once a team has decided that LLM-driven routing belongs in some part of the application, a handful of practices apply:
Use a small, fast, low-temperature model. Haiku-class. The decision is narrow; larger models add cost and latency without meaningful accuracy gain on this kind of classification.
Constrain the output with domainDataSchema. The triage actor should emit a typed decision against an enum. Beach's domainDataSchema enforces the shape at the respond() boundary, and downstream handlers can branch on event.data.class === 'complaint' deterministically, without parsing free text.
Keep the prompt narrow and the tool list empty. A triage actor that has tools available to it is no longer a triage actor; it is a small orchestrator. When the decision genuinely needs to look up data, that is a different kind of work and should not be conflated with routing.
Log every routing decision. The LLM's classification, its reasoning (if it has been asked for), and the triggering event. When a routing decision turns out to have been wrong, the log is what tells the team whether the issue was the model, the prompt, or the input.
Have a fallback. When the LLM's output cannot be parsed against the domainDataSchema — the model emitted something outside the enum, or respond() failed to validate — do not silently drop the message. Route it to a default orchestrator (usually the generalist) and surface the parse failure to observability.
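The fallback practice can be sketched as an ordinary validation step in front of the routing rules. This is not Beach's built-in behaviour; it is one way a team might implement the practice, with the reporting hook left to the application's observability stack:

```typescript
// A minimal sketch of the fallback: check the model's emitted class against
// the enum before routing, and flag anything outside it rather than dropping it.

const VALID_CLASSES = new Set([
  'complaint', 'question', 'feature-request',
  'account-change', 'misrouted-sales', 'spam',
]);

interface Resolved {
  cls: string;          // the class to route on, or 'unclassified'
  parseFailure: boolean; // true when the model's output was outside the enum
}

function resolveClassification(raw: unknown): Resolved {
  if (typeof raw === 'string' && VALID_CLASSES.has(raw)) {
    return { cls: raw, parseFailure: false };
  }
  // Not a string, or a string outside the enum: route to the generalist
  // and surface the failure instead of silently discarding the message.
  return { cls: 'unclassified', parseFailure: true };
}
```

A routing rule with no `when` clause (like the generalist rule in the HR example) then catches the `'unclassified'` case by construction.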
Consider whether the routing decision belongs in the orchestrator instead. The right architecture is sometimes not "triage actor classifies, deterministic rules dispatch" but "single orchestrator handles everything, with the routing implicit in which tools it calls". Triage-then-route is appropriate when there are genuinely distinct downstream orchestrators with different prompts, tools, and concerns; the single-orchestrator pattern is appropriate when the work is similar enough that a single agent's tool selection is sufficient.
Cost considerations
A small, fast classifier model called once per inbound message costs, in 2026 prices, on the order of fractions of a penny per call — perhaps £0.001 for a Haiku-tier classification of a typical email. Scaled to a busy customer-success function processing thousands of messages a day, this is a few pounds a day in classification cost. Most teams can absorb this readily.
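The arithmetic behind those figures, as a back-of-envelope model. The £0.001-per-call rate is this article's illustrative assumption, not a published price:

```typescript
// Back-of-envelope classification cost. The default per-call rate is the
// article's illustrative £0.001 figure, not a quoted vendor price.
function dailyClassificationCost(messagesPerDay: number, costPerCallGBP = 0.001): number {
  return messagesPerDay * costPerCallGBP;
}

// 5,000 messages a day at that rate is about £5 a day;
// a million a day is about £1,000 a day, which changes the calculus.
```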
The cost calculus changes if the application's volume is much higher (millions of messages a day) or if the classification step is performed multiple times per message. In high-volume cases, conventional non-LLM classification (regex, simple ML, dedicated classifier services) becomes more attractive, even at a small cost in accuracy. Beach does not impose an LLM-shaped solution; the triage actor can be replaced by a deterministic handler that uses any classifier the engineering team prefers.
Related
- Actors versus handlers — the broader question of when an LLM is the right tool.
- Creating routing rules — the deterministic routing layer that follows an LLM-driven classifier.
- Your first actor — the minimum LLM actor; suitable as a triage classifier with domainDataSchema constraining the output.
- Anti-patterns — including the common mistake of putting an LLM in the path of a routing decision the structural data already encodes.