Platform Models
Platform models are inference services EcoLink runs for you — always on, OpenAI-compatible, billed per-request. Nothing to deploy, no GPU to manage; just hit the API with your key.
What's available
- Text LLMs — Llama 3.1 8B Instruct, Gemma 4 31B IT
- Vision LLMs — Qwen2.5-VL-7B-Instruct
- Embeddings — text-embedding models for search, RAG, clustering
- Reranker — cross-encoder reranker for search pipelines
- Image generation — FLUX.1 Schnell (text → image)
- Text-to-speech — Kokoro-82M (a dozen natural voices)
- Speech-to-text — Whisper Large v3
- Video generation — Wan2.1 T2V 1.3B (text → short video, async)
See the full list in the Model catalog with context windows, pricing, and region availability.
How to call
Every platform model uses an OpenAI-compatible endpoint at https://api.ecohash.com/v1/...:
| Category | Endpoint |
|---|---|
| Chat / vision LLM | POST /v1/chat/completions |
| Embeddings | POST /v1/embeddings |
| Reranker | POST /v1/rerank |
| Image generation | POST /v1/images/generations |
| Text-to-speech | POST /v1/audio/speech |
| Speech-to-text | POST /v1/audio/transcriptions |
| Video generation | POST /v1/video/generations (async — returns a job, poll for result) |
Authentication is a standard Authorization: Bearer eco_... header. See API Keys.
How routing works
When you call /v1/chat/completions with model: "meta-llama/Llama-3.1-8B-Instruct", EcoLink routes your request to the healthiest region serving that model. You don't pick a region — the platform does it for you based on current load and region health.
Regional routing is transparent; the only thing you'll see is the x-ecolink-region header on the response indicating where it was served.
Unpriced models
If you try to call a model that's registered but doesn't yet have pricing set, you'll get:
HTTP 402 Payment Required
{ "error": "model_not_priced: <model_id> is registered but not yet priced — contact admin to set pricing before use" }
Ping the #ecolink-support Slack channel and we'll set pricing.
What about my own models?
If you want to deploy your own model (HuggingFace checkpoint, fine-tune, custom container), that's user inference — a separate feature where you launch your own inference endpoint, get a unique URL, and call it via the same API key.