Speech-to-text
Transcribe audio to text with Whisper Large v3 via POST /v1/audio/transcriptions — OpenAI-compatible.
Basic request
curl https://api.ecohash.com/v1/audio/transcriptions \
-H "Authorization: Bearer eco_YOUR_KEY" \
-F model=large-v3 \
-F file=@recording.mp3
Response:
{ "text": "Hello, this is a test of the EcoLink transcription API." }
Parameters
This endpoint takes multipart/form-data (not JSON):
| Field | Type | Default | Notes |
|---|---|---|---|
| model | string | (none) | Required. "large-v3" |
| file | file | (none) | Required. Audio file: mp3, wav, m4a, webm, ogg, or flac |
| language | string | auto-detect | ISO-639-1 code: en, es, fr, zh, ja, etc. Specify to skip auto-detect |
| prompt | string | empty | Context prompt to bias the transcription |
| response_format | string | "json" | "json", "text", "srt", "vtt", "verbose_json" |
| temperature | number | 0 | Sampling temperature; raise if you get hallucinations on silent audio |
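The fields in the table can be assembled as in the following minimal sketch. It assumes the third-party `requests` library for the upload; the helper names `build_form_fields` and `transcribe` are hypothetical and not part of any SDK. Optional fields are left out of the form when unset so the server-side defaults apply.

```python
from typing import Optional

API_URL = "https://api.ecohash.com/v1/audio/transcriptions"
RESPONSE_FORMATS = {"json", "text", "srt", "vtt", "verbose_json"}


def build_form_fields(
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: str = "json",
    temperature: float = 0.0,
) -> dict:
    """Build the non-file form fields, omitting optional ones left unset."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    fields = {
        "model": "large-v3",
        "response_format": response_format,
        "temperature": str(temperature),
    }
    if language:
        fields["language"] = language
    if prompt:
        fields["prompt"] = prompt
    return fields


def transcribe(path: str, api_key: str, **options) -> str:
    """POST the audio as multipart/form-data and return the raw response body."""
    # Assumption: `requests` is installed. It is imported lazily so that
    # build_form_fields stays usable without it.
    import requests

    with open(path, "rb") as audio:
        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            data=build_form_fields(**options),  # plain form fields
            files={"file": audio},  # makes requests encode the body as multipart
            timeout=300,
        )
    resp.raise_for_status()
    return resp.text
```

For example, `transcribe("recording.mp3", "eco_YOUR_KEY", language="en", response_format="srt")` would return the subtitle text as a string. Do not set a Content-Type header by hand; `requests` generates the multipart boundary itself.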
Response formats
json (default)
{ "text": "Full transcript goes here." }
text
Plain text, no JSON wrapping:
Full transcript goes here.
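srt / vtt (subtitles with timestamps)
The example below is srt. The WebVTT format starts with a WEBVTT header line and uses a period instead of a comma before the milliseconds.
1
00:00:00,000 --> 00:00:04,000
Hello, this is a test.

2
00:00:04,000 --> 00:00:08,500
Of the EcoLink transcription API.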
verbose_json (segments + timestamps)
{
  "task": "transcribe",
  "language": "en",
  "duration": 8.5,
  "text": "Hello, this is a test. Of the EcoLink transcription API.",
  "segments": [
    { "start": 0.0, "end": 4.0, "text": "Hello, this is a test." },
    { "start": 4.0, "end": 8.5, "text": "Of the EcoLink transcription API." }
  ]
}
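Because verbose_json carries start and end times for each segment, subtitle text can also be derived client-side from a single verbose_json response. The sketch below is one way to do that; the function names `srt_timestamp` and `segments_to_srt` are hypothetical, and the segments are assumed to be plain dicts shaped like the example above.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm (the SRT timestamp layout)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Turn verbose_json segments into SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)


segments = [
    {"start": 0.0, "end": 4.0, "text": "Hello, this is a test."},
    {"start": 4.0, "end": 8.5, "text": "Of the EcoLink transcription API."},
]
print(segments_to_srt(segments))
```

Requesting response_format "srt" directly is simpler when subtitles are all you need; converting locally is useful when you want both the structured segments and a subtitle file from one request.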
Python
from openai import OpenAI

# Point the OpenAI SDK at the EcoLink endpoint.
client = OpenAI(api_key="eco_...", base_url="https://api.ecohash.com/v1")

with open("recording.mp3", "rb") as f:
    resp = client.audio.transcriptions.create(
        model="large-v3",
        file=f,
        response_format="verbose_json",  # includes per-segment timestamps
    )

print(resp.text)
for seg in resp.segments:
    print(f"[{seg.start:.2f}-{seg.end:.2f}] {seg.text}")
Limits
- File size: up to 25 MB
- Duration: up to 30 minutes per request
- For longer audio: split client-side into 5–10 minute chunks, transcribe each, concatenate
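The split, transcribe, and concatenate approach can be sketched as follows. This is an illustration under assumptions, not a prescribed method: `plan_chunks` and `merge_chunk_results` are hypothetical helpers, the actual audio cutting (for example with ffmpeg) is left out, and each chunk result is assumed to be a dict shaped like the verbose_json response.

```python
def plan_chunks(duration_s: float, chunk_s: float = 600.0):
    """Return (start, end) pairs in seconds that cover the whole recording."""
    duration_s = float(duration_s)
    if duration_s <= 0 or chunk_s <= 0:
        raise ValueError("duration_s and chunk_s must be positive")
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def merge_chunk_results(chunks, results):
    """Join chunk transcripts and shift segment times by each chunk's start."""
    texts, segments = [], []
    for (offset, _end), result in zip(chunks, results):
        texts.append(result["text"].strip())
        for seg in result.get("segments", []):
            segments.append(
                {
                    "start": seg["start"] + offset,
                    "end": seg["end"] + offset,
                    "text": seg["text"],
                }
            )
    return {"text": " ".join(texts), "segments": segments}


# A 25-minute file split into 10-minute chunks.
print(plan_chunks(1500, chunk_s=600))
```

Fixed boundaries can cut through a word, so in practice you may want to place the cuts at pauses or overlap the chunks slightly and deduplicate the seams.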
Tips
- Specify language when you know it; this avoids auto-detect mistakes on short clips.
- Use prompt to bias toward domain terms: "This is a discussion about Kubernetes, nginx, and CEL expressions." helps the model spell technical terms.
- Strip silence before upload if your audio has long silent gaps; Whisper can hallucinate on silence.
- Prefer mono, 16-kHz audio. Stereo gets downmixed; higher sample rates get resampled.
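The mono, 16 kHz recommendation can be applied before upload with a conversion step. The sketch below assumes the ffmpeg command-line tool is installed and on PATH; the helper names `mono_16k_command` and `convert_to_mono_16k` are hypothetical. It uses ffmpeg's `-ac 1` (one audio channel) and `-ar 16000` (16 kHz sample rate) options.

```python
import subprocess


def mono_16k_command(src: str, dst: str) -> list:
    """Build an ffmpeg command that downmixes to mono and resamples to 16 kHz."""
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000", dst]


def convert_to_mono_16k(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(mono_16k_command(src, dst), check=True)


print(" ".join(mono_16k_command("recording.mp3", "recording_16k.wav")))
```

The `-y` flag overwrites the destination file without prompting, so drop it if that is not what you want.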
Billing
Speech-to-text bills per second of audio duration (not file size, not processing time).
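Since billing follows audio duration, a rough estimate is duration multiplied by the per-second rate. The sketch below is illustrative only: `estimate_cost` is a hypothetical helper, the rate passed in is a placeholder rather than a published price, and any rounding the billing system applies (for example to whole seconds) is not modeled.

```python
from decimal import Decimal


def estimate_cost(duration_s: float, rate_per_second: Decimal) -> Decimal:
    """Estimate cost as audio duration times a caller-supplied per-second rate."""
    if duration_s < 0:
        raise ValueError("duration_s must be non-negative")
    # Decimal avoids binary floating-point drift in currency arithmetic.
    return Decimal(str(duration_s)) * rate_per_second


# Placeholder rate for illustration; substitute the rate from your plan.
placeholder_rate = Decimal("0.0001")
print(estimate_cost(8.5, placeholder_rate))
```

The duration field of a verbose_json response is a convenient input for this calculation.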