[TUTORIAL] 2026-04-28 · 5 min read

LLM TOKEN FAN-OUT: STREAM TO MULTIPLE CLIENTS AT ONCE

[TOKENS]

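OpenAI streams tokens one at a time. You want them in your web UI as they arrive. You also want them in your mobile app. And your analytics pipeline. And your supervisor dashboard. Opening four separate OpenAI streams is expensive and slow: you pay for the same completion four times, and because sampling is not deterministic, the four consumers may not even receive the same text. Fan-out is the answer.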

THE PATTERN

One source session streams from the LLM. HZRelay duplicates each token frame to N registered subscribers. Backpressure is handled per subscriber — a slow client doesn't block the others.

fanout.ts
const session = createSession({
  inbound: { type: 'websocket' },
  llm: { provider: 'openai', apiKey: env.OAI },
  outbound: {
    type: 'fanout',
    subscribers: [
      { type: 'websocket', id: 'web_client' },
      { type: 'websocket', id: 'mobile_client' },
      { type: 'webhook', url: 'https://analytics.internal/ingest' },
    ],
  },
});

// each subscriber gets every token as it arrives
session.on('llm.token', (e) => console.log(e.token));
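
To make "backpressure per subscriber" concrete, here is a minimal sketch of one way a fan-out hub could isolate slow clients. It is not HZRelay's implementation: the names Subscriber, FanOut, deliver and maxQueue are hypothetical, and the overflow policy (drop the incoming token and count it) is an assumption. A real relay might disconnect or pause the slow subscriber instead.

```typescript
// Hypothetical sketch: none of these names come from HZRelay.
type Deliver = (token: string) => Promise<void>;

class Subscriber {
  private queue: string[] = [];
  private draining = false;
  dropped = 0;

  constructor(
    readonly id: string,
    private readonly deliver: Deliver,
    private readonly maxQueue = 256, // assumed bound, tune per client type
  ) {}

  // Never awaits: the publisher is not slowed down by this subscriber.
  push(token: string): void {
    if (this.queue.length >= this.maxQueue) {
      this.dropped++; // assumed policy: drop the incoming token and count it
      return;
    }
    this.queue.push(token);
    if (!this.draining) void this.drain();
  }

  // Delivers queued tokens in order, one at a time, at this subscriber's pace.
  private async drain(): Promise<void> {
    this.draining = true;
    try {
      while (this.queue.length > 0) {
        const token = this.queue.shift()!;
        try {
          await this.deliver(token);
        } catch {
          this.dropped++; // a failed delivery is counted, not retried
        }
      }
    } finally {
      this.draining = false;
    }
  }
}

class FanOut {
  private subscribers = new Map<string, Subscriber>();

  add(sub: Subscriber): void {
    this.subscribers.set(sub.id, sub);
  }

  // Hands one token to every subscriber's own queue and returns immediately.
  publish(token: string): void {
    for (const sub of this.subscribers.values()) sub.push(token);
  }
}
```

Because publish only enqueues, a subscriber whose deliver promise is slow to resolve fills its own queue and nothing else. The dropped counter is the kind of number a status line such as "0 drops" would report.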

MULTI-AGENT PIPELINES

Fan-out also enables multi-agent chains: Agent A's output becomes Agent B's input. HZRelay routes the token stream between sessions, with no manual piping and no shared state.
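
For comparison, this is roughly the manual piping that such routing would replace. It is a sketch under stated assumptions, not HZRelay code: Agent, chain, triage and specialist are hypothetical names, the agents are stubs rather than real LLM calls, and it assumes Agent B needs Agent A's complete output as its prompt.

```typescript
// Hypothetical sketch: an agent is modeled as a function from a prompt to a token stream.
type Agent = (input: string) => AsyncIterable<string>;

// Streams `first`, reports each of its tokens to `onToken` as it arrives,
// then feeds the complete output to `second` and yields its tokens.
async function* chain(
  first: Agent,
  second: Agent,
  input: string,
  onToken: (token: string) => void = () => {},
): AsyncIterable<string> {
  let buffered = '';
  for await (const token of first(input)) {
    onToken(token); // e.g. forward to a UI while the first agent is still running
    buffered += token;
  }
  yield* second(buffered);
}

// Stub agents standing in for real LLM sessions.
const triage: Agent = async function* (input) {
  for (const token of ['route', ' to', ' billing: ']) yield token;
  yield input;
};

const specialist: Agent = async function* (input) {
  yield `[specialist] ${input}`;
};
```

Consuming chain(triage, specialist, 'refund') with for await yields the specialist's tokens, while the optional onToken callback observes the triage tokens live. The buffering and callback wiring here is the glue code that grows with every extra agent or observer.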

[ SYS_LOG ] LIVE
LLM  stream open: agent_a → gpt-4o [triage]
OUT  → web_client (22ms latency)
OUT  → agent_b session [specialist]
OUT  → analytics webhook
OK   3 subscribers, 0 drops, backpressure managed