Text-to-speech

Generate speech audio (MP3, WAV, or Opus) from text with Kokoro-82M via POST /v1/audio/speech. The endpoint is OpenAI-compatible.

Basic request

curl https://api.ecohash.com/v1/audio/speech \
  -H "Authorization: Bearer eco_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro-82m",
    "voice": "af_bella",
    "input": "Hello from EcoLink. The weather today is beautiful."
  }' \
  --output out.mp3

The response body is the raw audio bytes, so you can write it straight to a file.

Parameters

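Parameter       | Type   | Default    | Notes
----------------|--------|------------|------------------------------------------
model           | string |            | Required. "kokoro-82m"
voice           | string | "af_bella" | Voice identifier (see the list below)
input           | string |            | Text to synthesize, up to 4000 characters
response_format | string | "mp3"      | "mp3", "wav", or "opus"
speed           | number | 1.0        | Playback speed multiplier, 0.5–2.0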
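The limits in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name build_speech_request is hypothetical, rejecting empty input is an assumption, and how the server itself responds to out-of-range values is not documented here.

```python
from typing import Any, Dict

# Values taken from the parameter table above.
ALLOWED_FORMATS = {"mp3", "wav", "opus"}
MAX_INPUT_CHARS = 4000
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def build_speech_request(
    text: str,
    voice: str = "af_bella",
    response_format: str = "mp3",
    speed: float = 1.0,
) -> Dict[str, Any]:
    """Return a JSON-serializable body for POST /v1/audio/speech.

    Hypothetical helper: it only mirrors the documented limits locally.
    """
    if not text:
        # Assumption: empty input is treated as an error on the client side.
        raise ValueError("input must not be empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"response_format must be one of {sorted(ALLOWED_FORMATS)}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "model": "kokoro-82m",
        "voice": voice,
        "input": text,
        "response_format": response_format,
        "speed": speed,
    }
```

The returned dictionary can be serialized with json.dumps and sent as the request body in place of the inline JSON in the curl example.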

Voices

Kokoro ships about a dozen voices. The two-letter prefix encodes accent and gender: af_ = American female, am_ = American male, bf_ = British female, bm_ = British male.

Voice       | Style
------------|------------------------------------------
af_bella    | American female, warm and conversational
af_nicole   | American female, professional
af_sarah    | American female, friendly
af_sky      | American female, expressive, higher pitch
am_adam     | American male, casual
am_michael  | American male, deep and authoritative
bf_emma     | British female, refined
bf_isabella | British female, animated
bm_george   | British male, resonant
bm_lewis    | British male, energetic

More voices may be added — check the TTS playground for the current list.
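Because the prefix follows a fixed pattern, a voice identifier can be decoded without a lookup table. This is a small sketch under that assumption: describe_voice is a hypothetical helper, and it covers only the four prefixes documented above, so a voice added later with a different prefix would raise an error.

```python
from typing import Tuple

# Prefix letters as documented: first letter = accent, second = gender.
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice_id: str) -> Tuple[str, str, str]:
    """Split a voice ID such as "af_bella" into (accent, gender, name)."""
    prefix, separator, name = voice_id.partition("_")
    if not separator or len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    accent_code, gender_code = prefix
    if accent_code not in ACCENTS or gender_code not in GENDERS:
        raise ValueError(f"unknown voice prefix: {prefix!r}")
    return ACCENTS[accent_code], GENDERS[gender_code], name


if __name__ == "__main__":
    for voice in ("af_bella", "bm_george"):
        print(voice, describe_voice(voice))
```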

Python

from openai import OpenAI

# Point the OpenAI SDK at the OpenAI-compatible endpoint.
client = OpenAI(api_key="eco_...", base_url="https://api.ecohash.com/v1")

# Stream the audio to disk as it arrives instead of buffering it in memory.
with client.audio.speech.with_streaming_response.create(
    model="kokoro-82m",
    voice="af_bella",
    input="The quick brown fox jumps over the lazy dog.",
) as resp:
    resp.stream_to_file("hello.mp3")

Tips

  • Keep sentences a reasonable length. Kokoro handles long text well, but prosody sounds most natural with paragraph-sized chunks.
  • Punctuation matters. Periods, commas, and question marks control pacing and intonation. "Hello! How are you?" sounds better than "Hello how are you".
  • SSML-style control is limited. Kokoro reads plain text and does not interpret SSML tags. If you need precise pauses, add commas or ellipses.
  • Match the voice to your use case. Different voices project different personalities, so try a few in the Playground before committing.
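One way to follow the chunking advice is to split long text at paragraph and sentence boundaries before sending it, one request per chunk. The sketch below is an illustration only: chunk_text is a hypothetical helper, its sentence detection is a simple punctuation heuristic, and a single sentence longer than the limit is cut at a fixed width, which may land mid-word.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 4000) -> List[str]:
    """Split text into chunks of at most max_chars characters.

    Paragraphs (separated by blank lines) are never merged; within a
    paragraph, whole sentences are packed together up to the limit.
    """
    chunks: List[str] = []
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = " ".join(paragraph.split())  # collapse internal whitespace
        if not paragraph:
            continue
        # Heuristic: a sentence ends at ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph)
        current = ""
        for sentence in sentences:
            # Hard-split a sentence that cannot fit in one chunk by itself.
            while len(sentence) > max_chars:
                if current:
                    chunks.append(current)
                    current = ""
                chunks.append(sentence[:max_chars])
                sentence = sentence[max_chars:]
            candidate = f"{current} {sentence}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = sentence
        if current:
            chunks.append(current)
    return chunks
```

The default of 4000 matches the documented input limit; a smaller max_chars keeps each request closer to paragraph size.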

Billing

Text-to-speech bills per second of audio generated — roughly proportional to text length.
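Since billing is based on audio duration, it can be useful to measure how long a generated clip is. The sketch below reads the duration of a WAV response with Python's standard wave module. It assumes the returned WAV header carries an accurate frame count, which is not confirmed here, and it says nothing about the price per second or how the service rounds durations; wav_duration_seconds is a hypothetical helper name.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of in-memory WAV data in seconds.

    Assumption: the header's frame count is accurate (some streamed
    WAV files carry placeholder sizes instead).
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

To use it, request response_format "wav" and pass the response bytes to the function; compressed formats such as MP3 need a decoder and are not covered by this sketch.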