Setting Up WhatsApp
This guide walks the reader from a fresh Meta Business Account to a Beach-driven WhatsApp number where the orchestrator reads each incoming message, runs through its tool loop, and replies — quoting the parent message when appropriate.
The wiring uses three packages:
- @cool-ai/beach-channel-whatsapp: webhook inbound and Cloud-API outbound, plus the WhatsAppTextFormatter baseline.
- @cool-ai/beach-transport-whatsapp: wire layer underneath the channel; the channel package depends on it transitively.
- @cool-ai/beach-core: the Manifest that gates "send only when ready".
A Beach orchestrator is assumed to be wired already. See Getting Started for the basics; this guide concerns the WhatsApp-specific pieces.
Install
npm install \
  @cool-ai/beach-channel-whatsapp \
  @cool-ai/beach-core
@cool-ai/beach-transport-whatsapp arrives as a transitive dependency. WhatsApp does not need any IMAP / SMTP libraries — just the Meta HTTPS endpoint.
Pre-flight — what you need
A real WhatsApp Business deployment requires a Meta Business Account, a verified business phone number, an App with the WhatsApp Business product enabled, and a webhook URL Meta can reach over HTTPS. Follow Meta's setup once; Beach has no role in any of that. By the time you start this guide you should have:
- phoneNumberId: Meta-issued id for the WhatsApp Business phone number. Used as the path segment on outbound /messages POSTs.
- App Secret: used to verify Meta's X-Hub-Signature-256 on inbound webhooks. Treat as a credential.
- Verify Token: a string you chose when registering the webhook. Meta echoes it during the GET handshake.
- Bearer Token: the access token that authorises outbound /messages POSTs. System-user tokens are long-lived; user tokens expire. The tokenProvider callback is responsible for refresh.
- A reachable webhook URL: the host your Beach app runs on, with TLS, registered with Meta. For local development, use a tunnel (ngrok, cloudflared).
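Because all four credentials arrive through the environment, it can help to fail at startup rather than on the first webhook. The following is a minimal sketch, not part of any Beach package: `requireEnv` and `loadWhatsAppEnv` are hypothetical helpers, and the variable names simply mirror the ones used in Step 1.

```typescript
// Hypothetical helpers; not part of any Beach package.
type Env = Record<string, string | undefined>;

interface WhatsAppEnv {
  phoneNumberId: string;
  appSecret: string;
  verifyToken: string;
  bearerToken: string;
}

// Returns the named variable, or throws when it is missing or blank.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Collects every credential listed above in one place.
function loadWhatsAppEnv(env: Env): WhatsAppEnv {
  return {
    phoneNumberId: requireEnv(env, 'META_PHONE_NUMBER_ID'),
    appSecret: requireEnv(env, 'META_APP_SECRET'),
    verifyToken: requireEnv(env, 'META_VERIFY_TOKEN'),
    bearerToken: requireEnv(env, 'META_BEARER_TOKEN'),
  };
}
```

Calling `loadWhatsAppEnv(process.env)` once at boot surfaces a missing secret as a clear startup error instead of a failed signature check later.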
Step 1 — Configure the WhatsAppChannel
import { WhatsAppChannel } from '@cool-ai/beach-channel-whatsapp';

const whatsappChannel = new WhatsAppChannel({
  id: 'whatsapp',
  transport: {
    phoneNumberId: process.env.META_PHONE_NUMBER_ID!,
    appSecret: process.env.META_APP_SECRET!,
    verifyToken: process.env.META_VERIFY_TOKEN!,
    tokenProvider: async () => process.env.META_BEARER_TOKEN!,
  },
  onInbound: async (missive) => {
    // We will fill this in below.
  },
});
The tokenProvider callback runs on every send. For a system-user token this can be a closure over a static value. For a user token, the provider owns refresh — Beach calls; the provider returns a current access token. Same shape as the SMTP OAuth2 path in @cool-ai/beach-channel-email.
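For the user-token case, a provider that owns refresh might look roughly like the sketch below. It is an illustration under assumptions: `createRefreshingTokenProvider`, `IssuedToken`, and the `fetchToken` callback are hypothetical names, and how a fresh token is actually obtained from Meta is left to the caller. The only contract taken from this guide is that the provider is an async function returning a current access token.

```typescript
// Hypothetical shape for whatever the refresh call returns.
interface IssuedToken {
  accessToken: string;
  expiresAtMs: number; // absolute expiry, epoch milliseconds
}

type TokenProvider = () => Promise<string>;

// Caches the token and refreshes it shortly before expiry.
// `fetchToken` is supplied by the caller; `now` is injectable for tests.
function createRefreshingTokenProvider(
  fetchToken: () => Promise<IssuedToken>,
  refreshSkewMs: number = 60000,
  now: () => number = Date.now,
): TokenProvider {
  let cached: IssuedToken | undefined;
  let inFlight: Promise<IssuedToken> | undefined;

  return async () => {
    if (cached !== undefined && now() < cached.expiresAtMs - refreshSkewMs) {
      return cached.accessToken;
    }
    // Share a single refresh between concurrent sends.
    if (inFlight === undefined) {
      inFlight = fetchToken().finally(() => {
        inFlight = undefined;
      });
    }
    cached = await inFlight;
    return cached.accessToken;
  };
}
```

Refreshing inside the skew window, rather than after a failed send, keeps an expiring token from ever reaching the Cloud API.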
Step 2 — Mount the webhook
import express from 'express';

const app = express();

// IMPORTANT: do NOT mount express.json() ahead of the WhatsApp handler.
// Meta signs the raw bytes; a re-serialised JSON body will not verify.
app.all('/whatsapp/webhook', whatsappChannel.webhookHandler());

// Other routes (with their own JSON parsing if needed) are fine to mount
// alongside; the parser-bypass only matters for the webhook path.
app.use('/api', express.json(), apiRouter);

app.listen(process.env.PORT);
The handler answers two flavours of request:
- GET: Meta's subscription handshake. Responds with hub.challenge when hub.mode === 'subscribe' and the verify-token matches; 403 otherwise. Meta sends this once when you register the webhook URL.
- POST: webhook delivery. Verifies X-Hub-Signature-256 in constant time, parses the payload, and invokes onInbound for each contained user message.
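The two checks described above can be sketched with Node's built-in crypto module. This is an illustration of the technique, not the channel package's actual code: `verifyMetaSignature` and `answerHandshake` are hypothetical names. The header format (`sha256=` followed by a hex HMAC-SHA256 of the raw body, keyed with the App Secret) and the `hub.*` query parameters follow Meta's webhook conventions.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// True when the header matches the HMAC-SHA256 of the raw request bytes.
function verifyMetaSignature(
  rawBody: Buffer,
  signatureHeader: string | undefined,
  appSecret: string,
): boolean {
  const prefix = 'sha256=';
  if (signatureHeader === undefined || !signatureHeader.startsWith(prefix)) {
    return false;
  }
  const received = Buffer.from(signatureHeader.slice(prefix.length), 'hex');
  const expected = createHmac('sha256', appSecret).update(rawBody).digest();
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// Returns the challenge to echo back, or undefined to signal a 403.
function answerHandshake(
  query: Record<string, string | undefined>,
  verifyToken: string,
): string | undefined {
  const accepted =
    query['hub.mode'] === 'subscribe' && query['hub.verify_token'] === verifyToken;
  return accepted ? query['hub.challenge'] : undefined;
}
```

The comparison runs over the decoded digest bytes, so a single changed byte in the body or the header fails verification without leaking timing information.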
The handler always responds 200 once the signature has verified. Errors thrown from onInbound are caught and logged rather than surfaced as a 5xx, because Meta retries 5xx responses aggressively. Idempotent processing keyed on messageId is the consumer's job.
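One way to meet the idempotency requirement is to remember recently seen message ids and skip repeats. The sketch below is deliberately simple and in-memory; `createDeduplicator` is a hypothetical helper, and a multi-process deployment would more likely keep this state in a shared store with a TTL.

```typescript
// Hypothetical in-memory de-duplicator keyed on message id.
// Returns a function that reports whether an id is new within the TTL window.
function createDeduplicator(ttlMs: number, now: () => number = Date.now) {
  const seenAt = new Map<string, number>();

  return function firstSighting(messageId: string): boolean {
    const t = now();
    // Drop entries older than the TTL so the map does not grow without bound.
    seenAt.forEach((at, id) => {
      if (t - at >= ttlMs) seenAt.delete(id);
    });
    if (seenAt.has(messageId)) return false;
    seenAt.set(messageId, t);
    return true;
  };
}
```

Inside onInbound, an early `if (!firstSighting(missive.id)) return;` would then drop a redelivered webhook before any turn runs, assuming the Missive id carries Meta's message id.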
Step 3 — Open a Delivery Manifest per inbound
When a message arrives, the channel calls onInbound with a parsed Missive. Three things happen in order: persist the inbound for audit; open a Delivery Manifest keyed to the inbound message id; and run a turn whose settled parts fill the manifest, at which point the manifest's onComplete formats and sends the reply.
import { Manifest, ManifestRegistry } from '@cool-ai/beach-core';
import { randomUUID } from 'node:crypto';

const manifestRegistry = new ManifestRegistry();

whatsappChannel.onInbound = async (inboundMissive) => {
  // 1. Persist the inbound for audit. The store is consumer-owned.
  await missiveStore.write(inboundMissive);

  // 2. Open a Delivery Manifest keyed to the inbound message id. The
  //    outbound message will be held until `main_reply` is filled.
  const manifestId = `whatsapp-delivery:${inboundMissive.id}`;
  const manifest = new Manifest({
    id: manifestId,
    expected: ['main_reply'],
    onComplete: async (filled) => {
      await whatsappChannel.sendFormatted({
        inbound: inboundMissive,
        filledSlots: filled,
      });
    },
  });
  manifestRegistry.register(manifest);

  // 3. Resolve a session. WhatsApp clusters by phone number unless the
  //    user used the quote-reply UI, in which case `threadId` is the
  //    quoted message id. Resolve accordingly.
  const sessionId = await resolveSessionForThread(inboundMissive.threadId!);
  const turnId = randomUUID();

  // 4. Run a turn. The actor's final respond() emits the parts that fill
  //    `main_reply`. Interim respond() parts are ignored by the batched
  //    edge; they do not reach the manifest.
  const settled = await sessionManager.runTurn({
    sessionId, turnId,
    actorId: 'concierge',
    actorConfig: { /* ... */ },
    registry: tools,
    provider,
    inboundMessage: { role: 'user', content: inboundMissive.parts[0]?.text ?? '' },
    slotKey: 'concierge.reply',
  });

  // 5. Fill the manifest. The default formatter (WhatsAppTextFormatter)
  //    walks the parts, joins text-bearing ones, and sends a text message.
  manifestRegistry.deliver(manifestId, 'main_reply', { parts: settled.parts });
};
The manifest lives above the turn. The session manager, router, and actor know nothing about WhatsApp; only the inbound/outbound edges do.
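The handler above calls `resolveSessionForThread` without defining it; that function is consumer-owned. A minimal in-memory sketch follows, assuming a session is identified by a string id and that `threadId` is either the sender's phone number or a quoted message id, as described in step 3. `createSessionResolver` and its `alias` method are hypothetical names.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical in-memory resolver; a real deployment would persist this map.
function createSessionResolver() {
  const sessionByThread = new Map<string, string>();

  return {
    // Returns the session for a thread key, creating one on first contact.
    resolve(threadId: string): string {
      let sessionId = sessionByThread.get(threadId);
      if (sessionId === undefined) {
        sessionId = randomUUID();
        sessionByThread.set(threadId, sessionId);
      }
      return sessionId;
    },
    // Registers an extra key (for example an outbound message id) so that a
    // later quote-reply to that message resolves to the same session.
    alias(threadId: string, sessionId: string): void {
      sessionByThread.set(threadId, sessionId);
    },
  };
}
```

The sketch is synchronous, but awaiting its result works unchanged. Recording each outbound message id with `alias` after a send is what lets a quote-reply find its way back to the conversation; whether the send call exposes that id is not stated in this guide.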
Step 4 — Customise the formatter (optional)
sendFormatted defaults to WhatsAppTextFormatter. Pass a custom formatter to render a2ui-surface parts as interactive buttons, artifact parts as media, or anything else WhatsApp's wire format supports:
import type { ChannelFormatter } from '@cool-ai/beach-format';
import type { OutboundWhatsApp } from '@cool-ai/beach-channel-whatsapp';

const buttonsOrText: ChannelFormatter<OutboundWhatsApp> = {
  channelClass: 'whatsapp',
  async format({ inbound, filledSlots }) {
    const parts = filledSlots.get('main_reply') as Array<{ partType: string; text?: string; data?: unknown }>;
    const surface = parts.find(p => p.partType === 'a2ui-surface');
    if (surface !== undefined && hasButtonChoices(surface.data)) {
      return {
        messageType: 'interactive-buttons',
        to: inbound.origin.address!,
        body: extractText(parts),
        buttons: extractButtonChoices(surface.data).slice(0, 3),
      };
    }
    return {
      messageType: 'text',
      to: inbound.origin.address!,
      body: extractText(parts),
    };
  },
};

await whatsappChannel.sendFormatted({ inbound, filledSlots }, buttonsOrText);
WhatsApp interactive buttons cap at 3; lists cap at 10 rows total. The wire-layer adapter validates these limits and throws before posting — the formatter does not need to recheck.
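The formatter above relies on three helpers it does not define: `hasButtonChoices`, `extractButtonChoices`, and `extractText`. The sketch below shows one plausible shape for them. The surface data layout (`{ choices: [{ id, title }] }`) and the newline join are assumptions made for illustration; the real a2ui-surface payload may differ.

```typescript
interface Part {
  partType: string;
  text?: string;
  data?: unknown;
}

interface ButtonChoice {
  id: string;
  title: string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null;
}

function isButtonChoice(value: unknown): value is ButtonChoice {
  return isRecord(value) && typeof value['id'] === 'string' && typeof value['title'] === 'string';
}

// Assumed surface shape: { choices: [{ id: string, title: string }, ...] }.
function extractButtonChoices(data: unknown): ButtonChoice[] {
  if (!isRecord(data)) return [];
  const choices = data['choices'];
  if (!Array.isArray(choices)) return [];
  return choices.filter(isButtonChoice);
}

function hasButtonChoices(data: unknown): boolean {
  return extractButtonChoices(data).length > 0;
}

// Joins the text of every text-bearing part, skipping parts without text.
function extractText(parts: Part[]): string {
  return parts
    .map((p) => p.text)
    .filter((t): t is string => typeof t === 'string' && t.length > 0)
    .join('\n');
}
```

Malformed entries are dropped rather than thrown on, so a surface with no usable choices falls through to the plain text branch.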
Step 5 — Test the round trip
Send a message to the configured phone number from a real WhatsApp client. Watch the logs:
- Meta delivers the webhook; signature verifies; the parsed inbound shows messageId, from, content.body.
- onInbound runs; the inbound Missive lands in the store; the manifest opens.
- The turn runs; the orchestrator emits its parts; main_reply fills.
- The manifest's onComplete fires; the formatter produces an OutboundWhatsApp; the Cloud API POST returns a wamid.
- The reply arrives on the WhatsApp client.
For continuity, reply to the agent's reply and check that:
- The follow-up lands in the same session: threadId resolution is by phone number when there's no quote, or by quoted message id when the user used quote-reply.
- The orchestrator sees the new message as a follow-up to the conversation, not as a new query.
Common pitfalls
express.json() mounted ahead of the webhook. Meta signs the raw bytes; once Express has parsed-and-re-serialised the body, the signature verification fails with a 401 and the inbound is dropped. Mount the webhook on a path that bypasses the JSON body parser (the example above does this by mounting whatsappChannel.webhookHandler() first, then app.use('/api', express.json(), ...) for everything else).
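A short demonstration of why the raw bytes matter: parsing and re-serialising a body can change its bytes even though the parsed value is identical, and the HMAC changes with them. The secret and payload here are placeholders chosen for illustration.

```typescript
import { createHmac } from 'node:crypto';

const appSecret = 'example-secret'; // placeholder, not a real credential
const sign = (bytes: string): string =>
  createHmac('sha256', appSecret).update(bytes).digest('hex');

// Bytes as a sender might emit them: escaped slashes and a space after the colon.
const rawBody = '{"text": "see https:\\/\\/example.com"}';

// What a parse-then-stringify round trip produces: same value, different bytes.
const reserialised = JSON.stringify(JSON.parse(rawBody));

const signaturesMatch = sign(rawBody) === sign(reserialised);
console.log(reserialised === rawBody); // false
console.log(signaturesMatch); // false
```

The parsed objects are equal, yet the signature computed over the re-serialised string no longer matches the one computed over the original bytes.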
Bearer-token expiry under load. A user-context token expires within hours; if your tokenProvider returns a stale value, every send fails with 401 until the next cache refresh. Make the provider refresh-aware (refresh proactively shortly before expiry, not on the failed request). System-user tokens are long-lived but can still be revoked.
Replies to outbound messages don't quote. WhatsAppTextFormatter does not quote-reply by default. Construct it with { quoteParent: true } to copy inbound.origin.messageId into the outbound contextMessageId, which renders Meta's quote-reply UI.
"No text content survived rule filtering." The default formatter throws when the main_reply slot has no text-bearing parts. This usually means the delivery rules dropped everything — the orchestrator only emitted parts the rules consider non-text (thinking, progress, domain-data). Check the actor's tool loop; verify it eventually emits a response part. If you intentionally want a non-text reply (a media artefact only), implement a custom formatter and skip the text-body path.
Webhook delivery outside the verified phone number. Meta sends webhooks for every message your business account receives across all configured numbers. The wire layer's parser includes toPhoneNumber on each ParsedInboundWhatsApp so multi-number deployments can dispatch — but the channel package supports one number per WhatsAppChannel instance. Run one channel per number; route at the webhook layer (one endpoint per channel) or use a single endpoint that picks the channel by metadata.phone_number_id.
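For the single-endpoint option, the dispatch step amounts to reading metadata.phone_number_id out of the delivery and looking up the matching channel. The sketch below assumes Meta's usual webhook layout (entry[].changes[].value.metadata.phone_number_id). `phoneNumberIdsIn` and `pickChannel` are hypothetical helpers, and how the raw bytes are then handed to the chosen channel is left open because this guide does not specify it.

```typescript
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null;
}

// Collects the distinct phone_number_id values mentioned in one delivery.
function phoneNumberIdsIn(rawBody: string): string[] {
  const ids: string[] = [];
  const payload: unknown = JSON.parse(rawBody);
  if (!isRecord(payload)) return ids;
  const entries = payload['entry'];
  if (!Array.isArray(entries)) return ids;
  for (const entry of entries) {
    const changes = isRecord(entry) ? entry['changes'] : undefined;
    if (!Array.isArray(changes)) continue;
    for (const change of changes) {
      const value = isRecord(change) ? change['value'] : undefined;
      const metadata = isRecord(value) ? value['metadata'] : undefined;
      const id = isRecord(metadata) ? metadata['phone_number_id'] : undefined;
      if (typeof id === 'string' && ids.indexOf(id) === -1) ids.push(id);
    }
  }
  return ids;
}

// Returns the first registered channel whose number appears in the delivery.
function pickChannel<T>(rawBody: string, channels: Map<string, T>): T | undefined {
  for (const id of phoneNumberIdsIn(rawBody)) {
    const channel = channels.get(id);
    if (channel !== undefined) return channel;
  }
  return undefined;
}
```

The lookup only peeks at a parsed copy; signature verification still has to run over the untouched raw bytes, so the dispatcher should not replace the request body.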
Message rate limiting. Meta enforces per-phone-number message rates. The Cloud API documents the current limits; bursts above them return 429. Beach does not rate-limit outbound sends; consumers wrap send themselves until a shared rate-limit primitive lands.
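Until a shared primitive exists, a consumer-side wrapper can space sends out. The following sketch serialises tasks and enforces a minimum interval between them. `createPacer` is a hypothetical helper, the interval is a value you would pick from Meta's published limits, and handling a 429 that still slips through (for example with a retry and backoff) is not covered here.

```typescript
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs tasks one at a time, at least `minIntervalMs` apart.
// `now` and `sleep` are injectable so the pacing can be tested without waiting.
function createPacer(
  minIntervalMs: number,
  now: () => number = Date.now,
  sleep: Sleep = realSleep,
) {
  let nextSlot = 0;
  let tail: Promise<unknown> = Promise.resolve();

  return function paced<T>(task: () => Promise<T>): Promise<T> {
    const run = tail.then(async () => {
      const wait = nextSlot - now();
      if (wait > 0) await sleep(wait);
      nextSlot = now() + minIntervalMs;
      return task();
    });
    // Keep the queue moving even when a task rejects.
    tail = run.catch(() => undefined);
    return run;
  };
}
```

Wrapping the send call, as in `paced(() => whatsappChannel.sendFormatted(args))`, keeps bursts from several concurrent turns under the chosen rate while each caller still receives its own result or error.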
Media is identified, not uploaded. Outbound media messages take a mediaId from a prior /media POST. The wire-layer adapter does not upload: consumers either pass a public link in the outbound message, or call Meta's /media endpoint themselves and pass the resulting id.
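At the Cloud API level the two options differ only in how the media object is referenced. The sketch below builds the JSON body for an image message either way. It shows Meta's wire shape, not the `OutboundWhatsApp` type, whose media field names are not documented in this guide; `buildImageMessage` is a hypothetical helper.

```typescript
// Either an id returned by a prior /media upload, or a publicly reachable link.
type MediaRef = { id: string } | { link: string };

// Builds the Cloud API request body for an image message.
function buildImageMessage(to: string, media: MediaRef, caption?: string) {
  return {
    messaging_product: 'whatsapp' as const,
    to,
    type: 'image' as const,
    image: caption === undefined ? media : { ...media, caption },
  };
}

const byId = buildImageMessage('15551234567', { id: 'MEDIA_ID' });
const byLink = buildImageMessage(
  '15551234567',
  { link: 'https://example.com/pass.png' },
  'Boarding pass',
);
```

Referencing by id avoids making the file publicly reachable, at the cost of the separate upload step that the consumer has to perform.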
What this leaves you with
A Beach application that reads its own WhatsApp number and replies. Same orchestrator, same tool loop, same audit trail as the chat path; only the inbound and outbound shapes differ. Switch back to chat and the orchestrator does not change.
Related
- Setting up Email — sibling guide; same shape, different transport.
- Manifests — Delivery Manifests in detail.
- Streaming vs batched edges — why WhatsApp needs a Delivery Manifest.
- Reference: envelope — the part shape the formatter consumes.