ttsd is a thin reverse proxy in front of a text-to-speech backend. It accepts POST /v1/audio/speech in the OpenAI shape ({model, voice, input, response_format}), forwards the request verbatim to the configured backend, and streams audio bytes back to the caller.
It also exposes GET /v1/voices (a passthrough for backends that list voices) and a GET /health that returns 503 when the upstream is unreachable.
arizuko speaks one TTS protocol — OpenAI’s. ttsd pins that contract at the daemon boundary so the gateway and the send_voice MCP tool never see the choice of backend. Default is the bundled Kokoro-FastAPI container; flip TTS_BACKEND_URL to Piper, Coqui, OpenAI cloud, or anything else that speaks /v1/audio/speech, and no caller code changes.
It also normalises the health signal. Kokoro’s readiness probe and OpenAI’s 401 on an unauthenticated HEAD would each need a custom check in gated; ttsd hides that and reports one {status:"ok"|"disconnected"} shape matching every other arizuko adapter.
One request, one upstream call. ttsd runs an httputil.NewSingleHostReverseProxy against TTS_BACKEND_URL:
1. The caller sends POST /v1/audio/speech with the OpenAI body.
2. ttsd copies the request's URL.Path verbatim onto the backend URL.
3. The backend's response streams back to the caller with its headers (including Content-Type).
4. If the backend cannot be reached, the caller gets 502 tts backend unreachable.

The /health handler probes the backend's /health with a 3-second timeout, falling back to a HEAD / for backends that don't expose one. gated uses this endpoint to gate send_voice: when the probe returns 503, the tool returns chanlib.ErrUnsupported and the agent falls back to a plain text reply.
agent
  | send_voice(chat_jid, text, voice)
  v
gated ---> POST TTS_BASE_URL/v1/audio/speech
             |
             v
           ttsd ---> POST TTS_BACKEND_URL/v1/audio/speech
                       |
                       v
                     kokoro / piper / openai cloud
                       |
                       v audio bytes (ogg, mp3, …)
gated caches at <data_dir>/tts/<hash>.ogg, hands to adapter
Inputs: HTTP from gated (or any other OpenAI-shaped TTS caller). Outputs: audio bytes from the upstream backend. Hard deps: a reachable backend at TTS_BACKEND_URL.
Concepts: concepts/voice covers the full voice-in / voice-out flow including transcription, voice selection, caching, and per-platform delivery.
ttsd can run standalone. It has no DB, no auth, and no admin UI; it is just an env-configured reverse proxy. Front it with proxyd or any other auth layer when exposing it.
# bundled Kokoro backend, on a shared network so ttsd can resolve it by name
docker network create tts
docker run -d --name kokoro --network tts -p 8881:8880 \
  ghcr.io/remsky/kokoro-fastapi-cpu:latest
# localhost inside the ttsd container is the container itself, so address kokoro by name
docker run -d --name ttsd --network tts -p 8880:8880 \
  -e TTS_BACKEND_URL=http://kokoro:8880 \
  arizuko-ttsd:latest
# probe
curl -s localhost:8880/health
curl -s -X POST localhost:8880/v1/audio/speech \
  -H 'content-type: application/json' \
  -d '{"model":"kokoro","voice":"af_sky","input":"hello","response_format":"mp3"}' \
  --output hello.mp3
TTSD_ADDR: listen address (default :8880).
TTS_BACKEND_URL: OpenAI-compatible TTS backend (default http://kokoro:8880).
LOG_LEVEL: debug / info / warn / error.
Full list and defaults in reference/env.
ttsd/README.md: endpoint contract and backend options.
specs/5/T-voice-synthesis.md: canonical voice spec.