Speech-to-text

Transcribe audio to text with Whisper Large v3 via POST /v1/audio/transcriptions. The endpoint is OpenAI-compatible.

Basic request

curl https://api.ecohash.com/v1/audio/transcriptions \
  -H "Authorization: Bearer eco_YOUR_KEY" \
  -F model=large-v3 \
  -F file=@recording.mp3

Response:

{ "text": "Hello, this is a test of the EcoLink transcription API." }

Parameters

This endpoint takes multipart/form-data (not JSON):

Field	Type	Default	Notes
`model`	string	—	Required. `"large-v3"`
`file`	file	—	Required. Audio file — `mp3`, `wav`, `m4a`, `webm`, `ogg`, `flac`
`language`	string	auto-detect	ISO-639-1 code: `en`, `es`, `fr`, `zh`, `ja`, etc. Specify to skip auto-detect
`prompt`	string	empty	Context prompt to bias the transcription
`response_format`	string	`"json"`	`"json"`, `"text"`, `"srt"`, `"vtt"`, `"verbose_json"`
`temperature`	number	0	Sampling temperature; raise if you get hallucinations on silent audio

Response formats

`json` (default)

{ "text": "Full transcript goes here." }

`text`

Plain text, no JSON wrapping:

Full transcript goes here.

`srt` / `vtt` (subtitles with timestamps)

1
00:00:00,000 --> 00:00:04,000
Hello, this is a test.

2
00:00:04,000 --> 00:00:08,500
Of the EcoLink transcription API.

The sample above is `srt`. `vtt` output carries the same cues but begins with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds (`00:00:04.000`).

`verbose_json` (segments + timestamps)

{
  "task": "transcribe",
  "language": "en",
  "duration": 8.5,
  "text": "Hello, this is a test. Of the EcoLink transcription API.",
  "segments": [
    { "start": 0.0, "end": 4.0, "text": "Hello, this is a test." },
    { "start": 4.0, "end": 8.5, "text": "Of the EcoLink transcription API." }
  ]
}
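
If you need subtitles after requesting `verbose_json`, you can derive them from the segment list yourself. This is a minimal sketch. The helper names `srt_timestamp` and `segments_to_srt` are made up for illustration and are not part of the API. It assumes each segment is a dict with the `start`, `end`, and `text` keys shown above.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render verbose_json segments as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}")
    return "\n\n".join(cues) + "\n"


# Segments copied from the verbose_json sample response.
segments = [
    {"start": 0.0, "end": 4.0, "text": "Hello, this is a test."},
    {"start": 4.0, "end": 8.5, "text": "Of the EcoLink transcription API."},
]
print(segments_to_srt(segments))
```

Requesting `response_format=srt` directly is simpler when subtitles are all you need. Converting locally is only worthwhile when you want both the structured segments and a subtitle file from a single request.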

Python

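from openai import OpenAI

# Point the OpenAI SDK at this API by overriding base_url.
client = OpenAI(api_key="eco_...", base_url="https://api.ecohash.com/v1")

# verbose_json returns per-segment timestamps alongside the full text.
with open("recording.mp3", "rb") as f: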
    resp = client.audio.transcriptions.create(
        model="large-v3",
        file=f,
        response_format="verbose_json",
    )

print(resp.text)
for seg in resp.segments:
    print(f"[{seg.start:.2f}-{seg.end:.2f}] {seg.text}")

Limits

File size: up to 25 MB
Duration: up to 30 minutes per request
For longer audio: split client-side into 5–10 minute chunks, transcribe each, and concatenate the transcripts in order
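
The chunking step can be planned before any audio is cut. The sketch below only computes the chunk boundaries and joins the resulting transcripts. It does not cut audio or call the API. `plan_chunks` and `join_transcripts` are hypothetical helper names, and the 600-second default is simply the upper end of the 5–10 minute range suggested above.

```python
def plan_chunks(duration_s: float, chunk_s: float = 600.0) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds covering the whole recording."""
    if duration_s <= 0 or chunk_s <= 0:
        raise ValueError("duration_s and chunk_s must be positive")
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def join_transcripts(parts: list[str]) -> str:
    """Concatenate per-chunk transcripts in order, separated by single spaces."""
    return " ".join(p.strip() for p in parts if p.strip())


# A 45-minute recording split into 10-minute chunks.
for start, end in plan_chunks(45 * 60):
    print(f"{start:7.1f} -> {end:7.1f}")
```

Fixed offsets can land in the middle of a word, so cutting at a nearby pause usually gives cleaner joins. If you request timestamps per chunk, add each chunk's start offset to its segment times before merging.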

Tips

Specify language when you know it — avoids auto-detect mistakes on short clips.
Use prompt to bias toward domain terms: "This is a discussion about Kubernetes, nginx, and CEL expressions." helps the model spell technical terms.
Strip silence before upload if your audio has long silent gaps — Whisper can hallucinate on silence.
Prefer mono, 16-kHz audio. Stereo gets downmixed; higher sample rates get resampled.
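
One way to produce mono, 16 kHz audio before upload is to shell out to ffmpeg. This sketch assumes ffmpeg is installed and on your PATH, which this page does not require or mention. `mono_16k_command` and `convert` are hypothetical helper names. `-ac` and `-ar` are ffmpeg's standard options for channel count and sample rate.

```python
import subprocess


def mono_16k_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that downmixes to mono and resamples to 16 kHz."""
    return [
        "ffmpeg",
        "-y",            # overwrite dst if it already exists
        "-i", src,
        "-ac", "1",      # one audio channel (mono)
        "-ar", "16000",  # 16 kHz sample rate
        dst,
    ]


def convert(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(mono_16k_command(src, dst), check=True)


# Print the command instead of running it, so the example has no side effects.
print(" ".join(mono_16k_command("recording.mp3", "recording_16k.wav")))
```

Call `convert("recording.mp3", "recording_16k.wav")` to actually run it, then upload the converted file.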

Billing

Speech-to-text bills per second of audio duration (not file size, not processing time).
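
Because the billed quantity is audio duration, it can help to measure duration locally before uploading. The sketch below does this for uncompressed WAV files using only the standard library. Other formats would need a different tool. How the service rounds partial seconds is not stated here, so the function returns the raw duration. `wav_duration_seconds` and `write_silence` are hypothetical helper names, and `write_silence` exists only to create a demo input.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the playback duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_silence(path: str, seconds: float, rate: int = 16000) -> None:
    """Write a mono 16-bit WAV of silence, used here only as a demo input."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))


# Create a 2.5-second demo file and measure it.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.wav")
    write_silence(demo, 2.5)
    print(wav_duration_seconds(demo))
```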
