[ GETTING_STARTED ]

Quickstart

SDK v0.1 — STABLE

HZRelay routes real-time streams between providers. Voice, tokens, events: one SDK, one session model, zero codec code.

[WHAT YOU GET]

Working AI phone call (Twilio → Deepgram → OpenAI → ElevenLabs → caller) in under 30 minutes. No sample rate config. No reconnect logic. No separate web/phone pipelines.

1.0 HOW IT WORKS

[ PIPELINE_DIAGRAM ]

INBOUND (Twilio) → NORMALIZE (codec) → STT (Deepgram) → LLM (OpenAI) → TTS (ElevenLabs) → OUTBOUND (Twilio)

HZRelay owns every arrow. You configure; we route.

HZRelay sits between your providers. Twilio sends mulaw 8kHz — we transcode to PCM 16kHz before Deepgram sees it. ElevenLabs returns audio — we transcode back to mulaw before Twilio plays it. You never touch codecs.
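To make the transcoding step above concrete, here is a minimal sketch of the inbound direction: standard G.711 mu-law decoding followed by a naive 2x linear-interpolation upsample from 8 kHz to 16 kHz. The function names are illustrative and are not part of the HZRelay SDK, and HZRelay's actual resampler is not documented here; a production resampler would normally use a proper low-pass filter rather than linear interpolation.

```typescript
// Decode one G.711 mu-law byte into a signed 16-bit linear PCM sample.
function mulawDecode(byte: number): number {
  const u = ~byte & 0xff; // mu-law bytes are transmitted bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign !== 0 ? -magnitude : magnitude;
}

// Naive 2x upsampler (8 kHz -> 16 kHz): keep each sample and insert the
// midpoint between it and the next one. The last sample is repeated.
function upsample2x(input: Int16Array): Int16Array {
  const out = new Int16Array(input.length * 2);
  for (let i = 0; i < input.length; i++) {
    const current = input[i];
    const next = i + 1 < input.length ? input[i + 1] : current;
    out[2 * i] = current;
    out[2 * i + 1] = (current + next) >> 1;
  }
  return out;
}

// Hypothetical helper: one mu-law 8 kHz frame in, PCM 16 kHz samples out.
function mulaw8kToPcm16k(frame: Uint8Array): Int16Array {
  const pcm8k = Int16Array.from(frame, (b) => mulawDecode(b));
  return upsample2x(pcm8k);
}
```

The outbound direction (TTS audio back to mu-law for Twilio) would be the mirror image: downsample, then mu-law encode.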

2.0 PREREQUISITES

Twilio: PSTN
Deepgram: STT
OpenAI: LLM
ElevenLabs: TTS
HZRelay: router

3.0 INSTALL SDK

terminal
$ npm install @hzrelay/sdk
# or
$ pip install hzrelay

4.0 CREATE SESSION

Describe what you want. HZRelay handles how.

agent.ts — DEPTH 1 (you drive the loop)
import { createSession } from '@hzrelay/sdk'

const session = createSession({
  apiKey: 'hz_...',
  // ── inbound ──────────────────────────────
  inbound: { type: 'twilio' },
  // ── providers (your keys, never stored) ──
  stt: { provider: 'deepgram', apiKey: process.env.DG_KEY },
  llm: { provider: 'openai', apiKey: process.env.OAI_KEY },
  tts: { provider: 'elevenlabs', apiKey: process.env.EL_KEY },
  outbound: { type: 'twilio' },
});

// pipe events into your own agent logic
session.on('transcript', (e) => myAgent(e.text))
session.on('llm_response', (e) => session.speak(e.text))

[CODEC_NOTE]

Twilio sends mulaw 8kHz. Deepgram expects PCM 16kHz. HZRelay transcodes at every adapter boundary — you never specify encoding, sample rate, or chunk size.

5.0 CONFIGURE TWILIO

Point your Twilio phone number's Voice webhook at the URL below. When a call arrives, Twilio opens a Media Stream WebSocket to HZRelay. The session_id links it to your SDK session.

Twilio console → Phone Numbers → Voice webhook URL
// A call comes in → set webhook to:
https://your-server.com/twilio/stream?session_id={session.id}
// TwiML response (tells Twilio to open Media Stream)
<Response>
  <Connect>
    <Stream url="wss://your-server.com/twilio/stream?session_id={id}"/>
  </Connect>
</Response>
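If your webhook handler generates this TwiML dynamically, a small builder like the one below may help. It is a sketch only: `buildStreamTwiml` and `escapeXmlAttr` are hypothetical helpers, not SDK functions, and the host and path simply mirror the placeholder URL above. Twilio's `<Stream>` reference also describes nested `<Parameter>` elements for passing custom values, so it is worth verifying how the session id actually reaches the WebSocket handler in your setup.

```typescript
// Escape characters that are unsafe inside an XML attribute value.
function escapeXmlAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the TwiML document that asks Twilio to open a Media Stream.
// Assumption: the stream endpoint lives at /twilio/stream on `host`.
function buildStreamTwiml(host: string, sessionId: string): string {
  const url =
    "wss://" + host + "/twilio/stream?session_id=" + encodeURIComponent(sessionId);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    '    <Stream url="' + escapeXmlAttr(url) + '"/>',
    "  </Connect>",
    "</Response>",
  ].join("\n");
}
```

Your webhook route would return this string with a `text/xml` content type.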

6.0 OPTIONAL: AGENT MODE

Add an agent: block — HZRelay runs transcript → LLM → TTS automatically (Depth 2). No event handlers needed.

agent.ts — DEPTH 2 (we drive the loop)
const session = createSession({
  // ... same provider config ...
  // add this block ↓
  agent: {
    systemPrompt: 'You are a helpful scheduling assistant...',
    memory: 'sliding-window', // context trimming is automatic
    turnDetection: 'silence', // 'semantic' in Phase 3
    bargeIn: true, // caller interrupts → TTS flushes
  },
});
// that's it: caller hears AI respond in ~762ms
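For intuition about what a 'sliding-window' memory setting typically means, here is a minimal sketch of one common approach: keep the system prompt and only the most recent messages. This illustrates the general technique under assumed types; HZRelay's actual trimming strategy (message count, token budget, or something else) is not described here, and `trimToWindow` is a hypothetical name.

```typescript
type Role = "system" | "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

// Keep every system message plus the last `maxMessages` non-system messages.
function trimToWindow(history: Message[], maxMessages: number): Message[] {
  const system = history.filter((m) => m.role === "system");
  const rest = history.filter((m) => m.role !== "system");
  // slice(-0) would return the whole array, so handle zero explicitly.
  const recent = maxMessages > 0 ? rest.slice(-maxMessages) : [];
  return system.concat(recent);
}
```

A token-based window would follow the same shape, dropping the oldest messages until an estimated token count fits the model's context budget.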

7.0 EVENTS REFERENCE

EVENT              | PAYLOAD                  | WHEN
session.created    | { session_id }           | Session ready, pipeline active
speech.start       | { ts }                   | VAD detects speech began
speech.end         | { ts, duration_ms }      | VAD detects silence threshold hit
transcript.interim | { text, confidence }     | Streaming partial STT result
transcript.final   | { text, latency_ms }     | Final STT result; triggers LLM
llm.token_start    | { latency_ms }           | First LLM token received
llm.response       | { text }                 | Full LLM response complete
tts.audio_start    | { latency_ms }           | First TTS audio frame ready
tts.interrupted    | null                     | Barge-in flushed TTS buffer
call.started       | { stream_sid, call_sid } | Twilio call connected
call.ended         | { stream_sid }           | Call terminated
error              | { code, provider, msg }  | Adapter error; check retryable
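If you want compile-time checking in your handlers, the events above can be modeled as a discriminated union. This is a sketch under assumptions: the `type` discriminator is a modeling convenience rather than a field the SDK is known to send, and the field types (numbers for timestamps and latencies, strings for ids and codes) are guesses from the payload names.

```typescript
// Assumed shapes; field names come from the table, field types are guesses.
type RelayEvent =
  | { type: "session.created"; session_id: string }
  | { type: "speech.start"; ts: number }
  | { type: "speech.end"; ts: number; duration_ms: number }
  | { type: "transcript.interim"; text: string; confidence: number }
  | { type: "transcript.final"; text: string; latency_ms: number }
  | { type: "llm.token_start"; latency_ms: number }
  | { type: "llm.response"; text: string }
  | { type: "tts.audio_start"; latency_ms: number }
  | { type: "tts.interrupted" }
  | { type: "call.started"; stream_sid: string; call_sid: string }
  | { type: "call.ended"; stream_sid: string }
  | { type: "error"; code: string; provider: string; msg: string };

// Narrowing on `type` gives each branch access to the right payload fields.
function describeEvent(e: RelayEvent): string {
  switch (e.type) {
    case "transcript.final":
      return "final transcript after " + e.latency_ms + " ms: " + e.text;
    case "tts.interrupted":
      return "barge-in: TTS buffer flushed";
    case "error":
      return "error " + e.code + " from " + e.provider + ": " + e.msg;
    default:
      return e.type;
  }
}
```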

8.0 LATENCY METRICS

Every session records millisecond timestamps per stage. Call session.getMetrics() anytime — or hit the REST endpoint.

metrics response — GET /voice/metrics?session_id=a3f9b2c1
{
  "audio_received_ms": 0,
  "stt_start_ms": 12,
  "stt_final_ms": 310,
  "llm_start_ms": 312,
  "llm_first_token_ms": 680,
  "tts_start_ms": 685,
  "tts_first_audio_ms": 760,
  "audio_sent_ms": 762
}
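The timestamps are cumulative offsets from the first audio frame, so per-stage latency is the difference between a stage's end and start. The sketch below assumes the field names shown in the sample response; `stageLatencies` and the output field names are illustrative, not part of the SDK.

```typescript
interface StageMetrics {
  audio_received_ms: number;
  stt_start_ms: number;
  stt_final_ms: number;
  llm_start_ms: number;
  llm_first_token_ms: number;
  tts_start_ms: number;
  tts_first_audio_ms: number;
  audio_sent_ms: number;
}

interface StageLatencies {
  stt_ms: number;
  llm_first_token_ms: number;
  tts_first_audio_ms: number;
  total_ms: number;
}

// Convert cumulative timestamps into per-stage durations.
function stageLatencies(m: StageMetrics): StageLatencies {
  return {
    stt_ms: m.stt_final_ms - m.stt_start_ms,
    llm_first_token_ms: m.llm_first_token_ms - m.llm_start_ms,
    tts_first_audio_ms: m.tts_first_audio_ms - m.tts_start_ms,
    total_ms: m.audio_sent_ms - m.audio_received_ms,
  };
}
```

For the sample response above this gives 298 ms for STT, 368 ms to the first LLM token, 75 ms to the first TTS audio frame, and 762 ms end to end.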

NEXT STEPS