Text-to-speech

Generate MP3, WAV, or Opus audio from text with Kokoro-82M via POST /v1/audio/speech. The endpoint is OpenAI-compatible.

Basic request

curl https://api.ecohash.com/v1/audio/speech \
  -H "Authorization: Bearer eco_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro-82m",
    "voice": "af_bella",
    "input": "Hello from EcoLink. The weather today is beautiful."
  }' \
  --output out.mp3

The response body is the raw audio bytes, so write it straight to a file, as `--output` does in the request above.

Parameters

Parameter	Type	Default	Notes
`model`	string	—	Required. `"kokoro-82m"`
`voice`	string	`"af_bella"`	Voice identifier — see list below
`input`	string	—	Text to synthesize, up to 4000 characters
`response_format`	string	`"mp3"`	`"mp3"`, `"wav"`, or `"opus"`
`speed`	number	1.0	0.5–2.0, playback speed multiplier
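
The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch: `build_speech_request` is a hypothetical helper, not part of any SDK, and it assumes the documented limits (4000 characters, speed 0.5–2.0) are inclusive. The server may validate differently.

```python
ALLOWED_FORMATS = {"mp3", "wav", "opus"}  # values listed in the parameter table
MAX_INPUT_CHARS = 4000                    # documented limit for `input`
MIN_SPEED, MAX_SPEED = 0.5, 2.0           # documented range, assumed inclusive


def build_speech_request(text, voice="af_bella", response_format="mp3", speed=1.0):
    """Return a JSON-serialisable body for POST /v1/audio/speech.

    Hypothetical helper: raises ValueError when a value falls outside
    the ranges given in the parameter table.
    """
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input is {len(text)} characters; limit is {MAX_INPUT_CHARS}")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "model": "kokoro-82m",
        "voice": voice,
        "input": text,
        "response_format": response_format,
        "speed": speed,
    }
```

The returned dict can be serialised as the JSON body of the request, or unpacked as keyword arguments into the OpenAI SDK's speech call.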

Voices

Kokoro ships about a dozen voices. The prefix encodes accent and gender: `af_` = American female, `am_` = American male, `bf_` = British female, `bm_` = British male.

Voice	Style
`af_bella`	American female, warm and conversational
`af_nicole`	American female, professional
`af_sarah`	American female, friendly
`af_sky`	American female, expressive, higher pitch
`am_adam`	American male, casual
`am_michael`	American male, deep and authoritative
`bf_emma`	British female, refined
`bf_isabella`	British female, animated
`bm_george`	British male, resonant
`bm_lewis`	British male, energetic

More voices may be added; check the TTS Playground for the current list.
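
The prefix convention makes it possible to filter or label voices in code. This is a small sketch with a hypothetical `describe_voice` helper. It only knows the four prefixes documented here, so a voice added later with a different prefix would be rejected.

```python
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice_id):
    """Split a voice id such as 'bm_george' into accent, gender, and name."""
    prefix, separator, name = voice_id.partition("_")
    valid = (
        separator == "_"
        and len(prefix) == 2
        and prefix[0] in ACCENTS
        and prefix[1] in GENDERS
        and name != ""
    )
    if not valid:
        raise ValueError(f"unrecognised voice id: {voice_id!r}")
    return {"accent": ACCENTS[prefix[0]], "gender": GENDERS[prefix[1]], "name": name}
```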

Python

from openai import OpenAI

# Point the OpenAI SDK at this API's base URL.
client = OpenAI(api_key="eco_...", base_url="https://api.ecohash.com/v1")

# Stream the audio to disk as it arrives instead of buffering it in memory.
with client.audio.speech.with_streaming_response.create(
    model="kokoro-82m",
    voice="af_bella",
    input="The quick brown fox jumps over the lazy dog.",
) as resp:
    resp.stream_to_file("hello.mp3")

Tips

Keep sentences a reasonable length. Kokoro handles long text well, but prosody sounds most natural with paragraph-sized chunks.
Punctuation matters. Periods, commas, and question marks control pacing and intonation. "Hello! How are you?" sounds better than "Hello how are you".
SSML-style control is limited. Kokoro reads plain text and does not interpret SSML tags. If you need precise pauses, add commas or ellipses.
Match the voice to your use case. Different voices project different personalities, so try a few in the Playground before committing.
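
One way to follow the first tip is to split long text on sentence boundaries and synthesize each chunk separately. The sketch below is an assumption-laden illustration: `split_into_chunks` is a hypothetical helper, the 1000-character default is an arbitrary stand-in for "paragraph-sized", and the sentence splitter is a simple regular expression that will mis-split abbreviations such as "Dr. Smith".

```python
import re


def split_into_chunks(text, max_chars=1000):
    """Group whole sentences into chunks of up to max_chars characters.

    A single sentence longer than max_chars is kept whole, so a chunk
    can exceed the limit in that case.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own `input`, keeping every request well under the 4000-character limit.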

Billing

Text-to-speech is billed per second of audio generated, which is roughly proportional to text length.
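
Because billing follows audio duration, it can be useful to measure how long a generated clip is. The sketch below uses Python's standard `wave` module on a response requested with `response_format` set to `"wav"`. It assumes the response is a standard PCM WAV file with a correct header, and `wav_duration_seconds` is a hypothetical helper. How the service rounds durations for billing is not stated here, so treat the result as an estimate.

```python
import io
import wave


def wav_duration_seconds(wav_bytes):
    """Return the playback length of an in-memory PCM WAV file, in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        # Duration is the number of audio frames divided by frames per second.
        return wav.getnframes() / wav.getframerate()
```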
