PHONE + WEB VOICE UNDER ONE SESSION MODEL
Most teams end up maintaining two completely separate voice pipelines. The Twilio pipeline handles inbound phone calls — mulaw codec, Media Streams WebSocket, telephony events. The WebRTC pipeline handles browser voice — Opus codec, ICE negotiation, browser APIs. Two codebases. Two sets of reconnect logic. Two dashboards. Same AI underneath.
THE PROBLEM
The divergence happens at the transport layer. Twilio Media Streams delivers 8 kHz mulaw audio as base64 payloads inside JSON messages on a WebSocket; WebRTC delivers Opus over SRTP once ICE negotiation completes. To unify them, you need to abstract both behind a common interface: handle codec differences at the boundary, normalize events, and expose the same API to your agent code regardless of which transport is active.
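To make "handle codec differences at the boundary" concrete, here is a minimal sketch of the Twilio side of such an abstraction. It assumes a hypothetical transport-neutral `AudioFrame` type (not part of any library) and converts a Twilio Media Streams `media` message into 16-bit linear PCM using the standard G.711 mu-law expansion. A WebRTC adapter would produce the same `AudioFrame` after running packets through a real Opus decoder, which is omitted here.

```python
import base64
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioFrame:
    """Hypothetical transport-neutral frame holding 16-bit linear PCM samples."""
    source: str        # which adapter produced the frame, e.g. "twilio"
    sample_rate: int   # samples per second
    samples: list[int]


def mulaw_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    byte = ~byte & 0xFF                          # mu-law bytes are stored inverted
    magnitude = ((byte & 0x0F) << 3) + 0x84      # mantissa plus bias
    magnitude <<= (byte & 0x70) >> 4             # apply the 3-bit exponent
    return (0x84 - magnitude) if (byte & 0x80) else (magnitude - 0x84)


def frame_from_twilio(message: str) -> Optional[AudioFrame]:
    """Turn a Twilio Media Streams JSON message into an AudioFrame.

    Non-audio messages (connected, start, stop, ...) return None.
    """
    data = json.loads(message)
    if data.get("event") != "media":
        return None
    raw = base64.b64decode(data["media"]["payload"])
    return AudioFrame("twilio", 8000, [mulaw_to_pcm16(b) for b in raw])
```

Once both adapters emit the same frame type, nothing past this boundary needs to know which codec arrived on the wire.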
[WITHOUT HZRELAY]
Two separate WebSocket handlers. Two codec paths. Two reconnect strategies. Two sets of events your agent must handle. Double the code for a single product feature.
THE SOLUTION: ONE SESSION CONFIG
HZRelay exposes a single session model. The inbound: field determines the transport. Everything downstream — STT, LLM, TTS, events — is identical regardless of whether the user is on a phone or in a browser.
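As an illustration of the idea, the sketch below models a session config in which only the `inbound` field changes between phone and browser. Every field name other than `inbound`, and every value shown, is an assumption made for this example; it is not HZRelay's documented schema.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class SessionConfig:
    # Illustrative only: the real field names and accepted values may differ.
    inbound: str               # transport selector, e.g. "twilio" or "webrtc"
    stt: str = "example-stt"   # placeholder speech-to-text provider
    llm: str = "example-llm"   # placeholder language model
    tts: str = "example-tts"   # placeholder text-to-speech voice


def downstream(config: SessionConfig) -> dict:
    """Return every setting except the transport selector."""
    settings = asdict(config)
    settings.pop("inbound")
    return settings


phone = SessionConfig(inbound="twilio")
browser = replace(phone, inbound="webrtc")  # same session, different transport
```

Here `downstream(phone)` and `downstream(browser)` are equal, which is the property the single session model is meant to guarantee: switching transport touches one field and nothing else.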
SAME EVENTS, ALWAYS
Whether the call comes from Twilio or WebRTC, your agent sees identical events: speech.start, transcript.final, llm.response, tts.audio_start. The codec differences (mulaw vs Opus) are absorbed by the adapter layer — your code is transport-agnostic.
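The following sketch shows what transport-agnostic agent code could look like. The `Agent` class and the event payload fields (`type`, `text`, `transport`) are hypothetical and exist only for this example; only the event names come from the list above.

```python
from typing import Callable

Handler = Callable[[dict], None]


class Agent:
    """Hypothetical agent that registers one handler per event name."""

    def __init__(self) -> None:
        self._handlers: dict[str, Handler] = {}
        self.log: list[str] = []

    def on(self, name: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for the given event name."""
        def register(fn: Handler) -> Handler:
            self._handlers[name] = fn
            return fn
        return register

    def dispatch(self, event: dict) -> None:
        """Route an event to its handler; unhandled events are ignored."""
        handler = self._handlers.get(event["type"])
        if handler is not None:
            handler(event)


agent = Agent()


@agent.on("transcript.final")
def handle_transcript(event: dict) -> None:
    # No branch on transport: the handler never checks where the audio came from.
    agent.log.append(event["text"])


# The same event shape, whichever adapter produced it.
agent.dispatch({"type": "transcript.final", "text": "hello", "transport": "twilio"})
agent.dispatch({"type": "transcript.final", "text": "hello", "transport": "webrtc"})
agent.dispatch({"type": "speech.start", "transport": "webrtc"})  # no handler registered
```

Both transcript events reach the same handler and leave the same entry in `agent.log`, so a phone call and a browser call are indistinguishable to the agent logic.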