Text-to-speech
Generate MP3, WAV, or Opus audio from text with Kokoro-82M via POST /v1/audio/speech. The endpoint is OpenAI-compatible.
Basic request
curl https://api.ecohash.com/v1/audio/speech \
  -H "Authorization: Bearer eco_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro-82m",
    "voice": "af_bella",
    "input": "Hello from EcoLink. The weather today is beautiful."
  }' \
  --output out.mp3
The response body is the raw audio bytes, so write it straight to a file (here with --output).
Parameters
| Parameter | Type | Default | Notes |
|---|---|---|---|
| model | string | (none) | Required. "kokoro-82m" |
| voice | string | "af_bella" | Voice identifier; see the list below |
| input | string | (none) | Text to synthesize, up to 4000 characters |
| response_format | string | "mp3" | "mp3", "wav", or "opus" |
| speed | number | 1.0 | 0.5–2.0, playback speed multiplier |
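As a rough illustration of the table above, the following sketch validates arguments against the documented limits before building the JSON request body. The helper name `build_speech_payload` and the client-side checks are assumptions made for this example; the API may validate differently or return its own errors.

```python
import json

# Limits copied from the parameter table above.
ALLOWED_FORMATS = {"mp3", "wav", "opus"}
MAX_INPUT_CHARS = 4000
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def build_speech_payload(text, voice="af_bella", response_format="mp3", speed=1.0):
    """Hypothetical helper: check arguments locally and return the request body."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(
            f"input is {len(text)} characters; the documented limit is {MAX_INPUT_CHARS}"
        )
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"response_format must be one of {sorted(ALLOWED_FORMATS)}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "model": "kokoro-82m",
        "voice": voice,
        "input": text,
        "response_format": response_format,
        "speed": speed,
    }


payload = build_speech_payload("Hello from EcoLink.", response_format="wav", speed=1.25)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be sent as the JSON body of the POST request; checking limits locally avoids a round trip for requests that would be rejected anyway.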
Voices
Kokoro ships about a dozen voices. The prefix encodes accent and gender: af_ = American female, am_ = American male, bf_ = British female, bm_ = British male.
| Voice | Style |
|---|---|
| af_bella | American female, warm and conversational |
| af_nicole | American female, professional |
| af_sarah | American female, friendly |
| af_sky | American female, expressive, higher pitch |
| am_adam | American male, casual |
| am_michael | American male, deep and authoritative |
| bf_emma | British female, refined |
| bf_isabella | British female, animated |
| bm_george | British male, resonant |
| bm_lewis | British male, energetic |
More voices may be added — check the TTS playground for the current list.
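The prefix convention can be decoded mechanically, which may help when filtering voices in a UI. The sketch below is a minimal example; `describe_voice` is a hypothetical helper, and it assumes every voice identifier follows the two-letter prefix scheme described above.

```python
# Prefix letters as described above: first letter = accent, second = gender.
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice_id):
    """Hypothetical helper: split a voice identifier into accent, gender, and name."""
    prefix, sep, name = voice_id.partition("_")
    if (
        not sep
        or len(prefix) != 2
        or prefix[0] not in ACCENTS
        or prefix[1] not in GENDERS
    ):
        raise ValueError(f"unrecognized voice prefix in {voice_id!r}")
    return {"accent": ACCENTS[prefix[0]], "gender": GENDERS[prefix[1]], "name": name}


print(describe_voice("bm_george"))
# {'accent': 'British', 'gender': 'male', 'name': 'george'}
```

Voices added later with other prefixes would raise `ValueError` here, so treat the mapping as something to extend rather than a fixed list.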
Python
from openai import OpenAI

# Point the OpenAI client at the EcoLink endpoint.
client = OpenAI(api_key="eco_...", base_url="https://api.ecohash.com/v1")

# Stream the audio bytes to disk as they arrive.
with client.audio.speech.with_streaming_response.create(
    model="kokoro-82m",
    voice="af_bella",
    input="The quick brown fox jumps over the lazy dog.",
) as resp:
    resp.stream_to_file("hello.mp3")
Tips
- Keep sentences to a reasonable length. Kokoro handles long text well, but prosody sounds most natural with paragraph-sized chunks.
- Punctuation matters. Periods, commas, and question marks control pacing and intonation. "Hello! How are you?" sounds better than "Hello how are you".
- SSML-style control is limited. Kokoro reads plain text and does not interpret SSML tags. If you need precise pauses, add commas or ellipses.
- Match the voice to your use case. Different voices project different personalities, so try a few in the Playground before committing.
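One way to act on the first tip is to split long text at sentence boundaries before sending it. The sketch below is a simple approach under stated assumptions: `chunk_text` is a hypothetical helper, the 1000-character default is an arbitrary choice for illustration (the documented hard limit is 4000), and the regular expression only recognizes `.`, `!`, and `?` as sentence endings.

```python
import re


def chunk_text(text, max_chars=1000):
    """Hypothetical helper: group whole sentences into chunks of at most max_chars."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single sentence longer than max_chars is kept whole, not cut mid-word.
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two! Three? Four.", max_chars=10))
# ['One. Two!', 'Three?', 'Four.']
```

Each chunk can then be sent as a separate request and the resulting audio files played or concatenated in order.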
Billing
Text-to-speech is billed per second of generated audio, which is roughly proportional to text length.
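To get a feel for what a request costs, you can measure the duration of the returned audio yourself. The sketch below uses Python's standard `wave` module and assumes the response was requested with `response_format="wav"` and is a well-formed PCM WAV file. The `rate_per_second` argument is a placeholder you must supply from your own pricing information; this page does not state a rate.

```python
import io
import wave


def wav_duration_seconds(wav_bytes):
    """Return the duration of PCM WAV data in seconds (frames / sample rate)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def estimate_cost(seconds, rate_per_second):
    """Hypothetical estimate: rate_per_second is caller-supplied, not a documented price."""
    return seconds * rate_per_second


# Demo with locally generated audio: one second of 16-bit mono silence at 24 kHz.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(24000)
    w.writeframes(b"\x00\x00" * 24000)

print(wav_duration_seconds(buf.getvalue()))  # 1.0
```

In practice you would pass the bytes of the API response instead of the generated silence. Treat the result as an estimate, since the billed duration is determined by the service.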