# ZuckZapGo - Calls API: complete LLM build guide

> Paste this whole file into any LLM (Claude, GPT, Gemini, …) and ask it to build a
> WhatsApp **voice and video call** integration against ZuckZapGo. It is self-contained: every
> endpoint, the exact audio/video wire-formats, the enable sequence, full end-to-end flows, a
> drop-in browser client, a server-side (Node/Python/Go) client, the AI-voice-agent mode,
> passkey pairing, and the common pitfalls are all here. Nothing else is required.
>
> Native two-way call audio powered by **meowcaller** (https://github.com/purpshell/meowcaller). 🙏 Thank you, Rajeh (@purpshell).

- **Format**: this document is the source of truth. Field names, status codes, and the PCM framing below are exact; copy them verbatim.
- **Two placeholders** you must fill in: `{BASE_URL}` (e.g. `https://your-host` or `http://localhost:8080`) and `{TOKEN}` (the per-instance user token).

---

## 0. The one-paragraph mental model

ZuckZapGo exposes a **native, pure-Go VoIP engine**. You drive calls with plain **REST** (`/call/*`, JSON, token header) and you carry **audio** over a single **bidirectional WebSocket** at `GET /call/{call_id}/stream` and **video** over a second WebSocket at `GET /call/{call_id}/video/stream`. The audio socket speaks **raw PCM**: signed 16-bit little-endian, **16 000 Hz, mono**, in **960-sample (1920-byte, 60 ms) frames**. The video socket speaks raw **H.264 Annex-B access units** (one frame per binary message). The engine does not encode or decode pixels; your client must produce and consume H.264. There is **no WebRTC, no SDP, no ICE** to deal with: the engine already terminates the WhatsApp media relay (SRTP) for you and hands you decoded PCM / H.264. Your only job is: enable the engine, place/answer a call, open the WebSocket(s), and play/capture PCM/H.264.

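
The audio frame numbers above follow from simple arithmetic: 960 samples × 2 bytes = 1920 bytes, and 960 / 16 000 Hz = 60 ms. A minimal Python sketch of that framing, assuming your capture code hands you an arbitrary-length buffer of s16le mono PCM; the constant names and `split_frames` are illustrative helpers, not part of the API:

```python
from __future__ import annotations

SAMPLE_RATE_HZ = 16_000      # from the guide: 16 000 Hz, mono
SAMPLES_PER_FRAME = 960      # from the guide: 960-sample frames
BYTES_PER_SAMPLE = 2         # signed 16-bit little-endian

FRAME_BYTES = SAMPLES_PER_FRAME * BYTES_PER_SAMPLE      # 1920 bytes
FRAME_MS = SAMPLES_PER_FRAME * 1000 // SAMPLE_RATE_HZ   # 60 ms


def split_frames(pcm: bytes) -> tuple[list[bytes], bytes]:
    """Split raw PCM into whole 1920-byte frames plus the unsent remainder.

    Hypothetical helper: keep the remainder and prepend it to the next
    capture buffer so that no samples are dropped.
    """
    whole = len(pcm) - len(pcm) % FRAME_BYTES
    frames = [pcm[i:i + FRAME_BYTES] for i in range(0, whole, FRAME_BYTES)]
    return frames, pcm[whole:]
```

Each returned frame is what one 60 ms slice of microphone audio looks like on the wire; the exact WebSocket message rules are defined in §6, not by this sketch.
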
```
                     control plane (REST, token header)
 your app ───────────────────────────────────────────────► ZuckZapGo ──► WhatsApp
          POST /call/dial · /call/answer · /call/hangup · … (VoIP engine)  relay
          ◄───────────────────────────────────────────────

                     media plane (WebSocket, ?token=)
 your app ◄══════ s16le 16kHz mono PCM (peer audio) ════════ ZuckZapGo ◄═ SRTP/16kHz
          ═══════ s16le 16kHz mono PCM (your mic)  ═══════►  /call/{id}/stream
 your app ◄══════ H.264 Annex-B (peer video) ══════════════ ZuckZapGo ◄═ SRTP/H.264
          ═══════ H.264 Annex-B (your camera) ═══════════►  /call/{id}/video/stream
```

**Golden rules**

1. Audio/video only flows **after the call is answered** (callee picks up an outbound call, or you answer an inbound call). Before that the relay is silent; that is normal, not a bug.
2. For app/browser-driven two-way audio/video, the instance **must** be in `call_inbound_mode: "manual"`. Other modes make the engine answer and handle audio server-side.
3. The audio WebSocket and server-side **recording are mutually exclusive per call**: they share the call's single inbound sink. Use one or the other.
4. Video is **optional**: a call can be audio-only (`video: false`) or video (`video: true`). You may open the audio socket, the video socket, or both.

---

## 1. Authentication

| Surface | How to authenticate |
|---|---|
| REST `/call/*` (and all standard endpoints) | HTTP header `token: {TOKEN}` |
| WebSocket `/call/{call_id}/stream` | Query param `?token={TOKEN}` (browsers cannot set WS headers). The `token:` header also works for non-browser clients. |
| Admin endpoints (not needed for calls) | HTTP header `Authorization: {ADMIN_TOKEN}` |

The token identifies the **instance** (one WhatsApp number). All `/call/*` calls operate on that instance's engine.

---

## 2. Response envelope

Every JSON response is wrapped:

```jsonc
// success
{ "code": 200, "data": { /* payload */ }, "success": true }

// error
{ "code": 409, "error": "calls engine not enabled for this instance ...", "success": false }
```

Always read the payload from `.data` and check `.success` / HTTP status. Examples below show the `data` payload.

---

## 3. Two call APIs: use the native engine one for audio

ZuckZapGo has **two** families under `/call/*`. Do not mix them up:

| Family | Endpoints | Purpose | Audio? |
|---|---|---|---|
| **Native VoIP engine** ✅ | `/call/config`, `/call/status`, `/call/dial`, `/call/answer`, `/call/hangup`, `/call/play`, `/call/record/start`, `/call/record/stop`, `/call/{id}/stream`, `/call/{id}/video/stream`, `/call/{id}/video/state` | Real media: dial, answer, two-way PCM audio, two-way H.264 video, play files, record, AI agents | **Yes**: this is the one you want |
| Legacy signaling (Spec 004) | `/call/reject`, `/call/accept`, `/call/preaccept`, `/call/terminate`, `/call/initiate`, `/call/reject/send` | Raw WhatsApp call-signaling control only | **No.** `/call/initiate` returns **501** (needs a WebRTC stack not in this build). Ignore these for audio. |

**To make calls with audio, only use the Native VoIP engine endpoints.** Outbound = `POST /call/dial` (never `/call/initiate`).

---

## 4. Endpoint reference (Native VoIP engine)

All paths are relative to `{BASE_URL}`. All REST calls send `token: {TOKEN}`.

### 4.1 `GET /call/config`: read engine configuration

Response `data`:

```jsonc
{
  "callsEnabled": true,
  "callInboundMode": "manual",   // manual | bot | ivr | ai | webhook | reject
  "callRecord": false,
  "callSttUrl": "",              // AI modes only
  "callLlmUrl": "",
  "callTtsUrl": "",
  "callSystemPrompt": "",
  "callGreeting": ""
}
```

> `callProviderToken` is write-only and never returned.

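
Applying the envelope rule from §2 to a response such as the `GET /call/config` one above might look like the following Python sketch. It works on an already-parsed JSON object, so no HTTP library is assumed; `ApiError` and `unwrap` are hypothetical names introduced for illustration only:

```python
from __future__ import annotations

from typing import Any


class ApiError(Exception):
    """Raised when an envelope reports success: false (hypothetical helper)."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def unwrap(envelope: dict[str, Any]) -> Any:
    """Return the `data` payload, or raise ApiError using `code` and `error`."""
    if envelope.get("success") is True:
        return envelope.get("data")
    raise ApiError(
        int(envelope.get("code", 0)),
        str(envelope.get("error", "unknown error")),
    )


# Usage with the config payload shape shown in §4.1.
config = unwrap({
    "code": 200,
    "data": {"callsEnabled": True, "callInboundMode": "manual"},
    "success": True,
})
manual_mode = config["callInboundMode"] == "manual"
```

Centralizing the check in one function keeps every caller from reading fields off an error envelope by mistake; a real client would also check the HTTP status, as §2 advises.
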
### 4.2 `PUT /call/config`: update engine configuration (applies on next reconnect)

Body:

```jsonc
{
  "callsEnabled": true,
  "callInboundMode": "manual",   // empty defaults to "webhook"; invalid value → 400
  "callRecord": false,
  "callSttUrl": null,            // AI modes (ai/ivr/bot); see §9
  "callLlmUrl": null,
  "callTtsUrl": null,
  "callProviderToken": null,     // bearer token for the STT/LLM/TTS provider(s)
  "callSystemPrompt": null,
  "callGreeting": null
}
```

Response `data`: `{ "ok": true }`. **The new config takes effect on the instance's next reconnect** (see §5).

### 4.3 `GET /call/status`: list live calls

Response `data`:

```jsonc
{
  "calls": [
    {
      "callId": "A1B2C3...",                      // engine's own id; use this everywhere
      "peer": "5521999999999@s.whatsapp.net",
      "state": "active",                          // idle|calling|ringing|connecting|active|ended|unknown
      "direction": "inbound",                     // inbound | outbound
      "video": false,
      "recording": false,
      "startedAt": "2026-06-28T12:00:00Z"
    }
  ]
}
```

Poll this (~2.5 s) to detect **inbound ringing calls** and remote hangups; the media socket alone does not signal a remote hangup.

**Detect "call ended" by the call's _disappearance_ from the `calls[]` array, not by a `state:"ended"` value**: when a call ends the engine stops tracking it, so it vanishes from the list (you will rarely, if ever, observe `state:"ended"` here). Once a call you were in is gone from `calls[]`, tear your UI/socket down.

### 4.4 `POST /call/dial`: place an outbound call

Body: `{ "phone": "5521999999999", "video": false }`

- `phone` = **E.164 digits only, no `+`** (country code + number).
- `video` = `true` starts an H.264 video call (the offer advertises video capability).
- Response `data`: `{ "callId": "A1B2C3..." }`.
- **409** if the engine is not enabled (enable + reconnect first, §5).
- **502** if the peer is unreachable.

After dialing, open the media WebSocket(s) (§6, §7) immediately; audio/video starts when the callee answers.

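
Once a dialed or answered `callId` is being tracked, the disappearance rule from §4.3 reduces to a set difference between two consecutive `calls[]` snapshots. A minimal Python sketch, assuming you already fetch and unwrap `GET /call/status` elsewhere; `live_ids` and `ended_calls` are hypothetical helper names:

```python
from __future__ import annotations

from typing import Any


def live_ids(status_data: dict[str, Any]) -> set[str]:
    """Collect the callId of every call listed in a /call/status `data` payload."""
    return {call["callId"] for call in status_data.get("calls", [])}


def ended_calls(
    previous: set[str], status_data: dict[str, Any]
) -> tuple[set[str], set[str]]:
    """Return (ids that vanished since the last poll, ids live now).

    A call counts as ended when it is no longer listed, regardless of any
    `state` value, which mirrors the rule stated in §4.3.
    """
    current = live_ids(status_data)
    return previous - current, current


# Usage: two consecutive poll results (payload shape from §4.3, ids invented).
first = {"calls": [{"callId": "A1", "state": "active"},
                   {"callId": "B2", "state": "ringing"}]}
second = {"calls": [{"callId": "B2", "state": "active"}]}

_, tracked = ended_calls(set(), first)
gone, tracked = ended_calls(tracked, second)
```

In a real client, each id in `gone` would trigger closing that call's WebSocket(s) and tearing down its UI; the polling loop itself is left out here.
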
### 4.5 `POST /call/answer`: answer a ringing inbound call (manual mode)

Body: `{ "callId": "A1B2C3..." }` → `data: { "ok": true }`. **404** if the call is not found. Then open the media WebSocket (§6).

### 4.6 `POST /call/hangup`: end a live call

Body: `{ "callId": "A1B2C3..." }` → `data: { "ok": true }`. **404** if not found. Also close the WebSocket. Use this to **decline** an inbound call too.

### 4.7 `POST /call/play`: play an audio file/clip into the call

Body: `{ "callId": "...", "audioUrl": "https://…/clip.mp3" }` or `{ "callId": "...", "audioBase64": "" }`.

- Format auto-detected (WAV / MP3 / Ogg-Opus) and resampled to 16 kHz mono.
- Plays to the peer. → `data: { "ok": true }`. Works alongside or instead of the mic stream.

### 4.8 `POST /call/record/start`: start server-side recording

Body: `{ "callId": "..." }` → `data: { "ok": true }`. Records the **peer's** audio to a WAV. Mutually exclusive with the media WebSocket on the same call. **`start` succeeds regardless of storage config**: a missing/misconfigured S3 does not fail here; it surfaces later as an empty `mediaKey` on `stop` (see §4.9).

### 4.9 `POST /call/record/stop`: stop + upload recording

Body: `{ "callId": "..." }` → `data: { "ok": true, "mediaKey": "" }`. The WAV is uploaded to object storage (S3) and `mediaKey` is its key. **Treat an empty `mediaKey` (`""`) as failure**: `ok` is still `true` even when the upload had no storage configured or errored, so check `mediaKey` is non-empty, not just `ok`.

### 4.10 `GET /call/{call_id}/stream`: bidirectional PCM media WebSocket

The audio plane. Full protocol in §6. Auth via `?token={TOKEN}`.

### 4.11 `GET /call/{call_id}/video/stream`: bidirectional H.264 video WebSocket

The video plane. Open this **in addition to** the audio socket for a video call. Auth via `?token={TOKEN}`.

```
GET {BASE_URL_AS_WS}/call/{call_id}/video/stream?token={TOKEN}
```

- Use **binary** frames only; text frames are ignored.
- **Server → client** (binary): peer video as **H.264 Annex-B access units**. Each message is one access unit (NALUs prefixed with start code `0x00 0x00 0x00 0x01`). Feed directly to a hardware or software H.264 decoder (e.g. `