Skip to main content

Platform Models

Platform models are inference services EcoLink runs for you — always on, OpenAI-compatible, billed per-request. Nothing to deploy, no GPU to manage; just hit the API with your key.

What's available

  • Text LLMs — Llama 3.1 8B Instruct, Gemma 4 31B IT
  • Vision LLMs — Qwen2.5-VL-7B-Instruct
  • Embeddings — text-embedding models for search, RAG, clustering
  • Reranker — cross-encoder reranker for search pipelines
  • Image generation — FLUX.1 Schnell (text → image)
  • Text-to-speech — Kokoro-82M (a dozen natural voices)
  • Speech-to-text — Whisper Large v3
  • Video generation — Wan2.1 T2V 1.3B (text → short video, async)

See the full list in the Model catalog with context windows, pricing, and region availability.

How to call

Every platform model uses an OpenAI-compatible endpoint at https://api.ecohash.com/v1/...:

CategoryEndpoint
Chat / vision LLMPOST /v1/chat/completions
EmbeddingsPOST /v1/embeddings
RerankerPOST /v1/rerank
Image generationPOST /v1/images/generations
Text-to-speechPOST /v1/audio/speech
Speech-to-textPOST /v1/audio/transcriptions
Video generationPOST /v1/video/generations (async — returns a job, poll for result)

Authentication is a standard Authorization: Bearer eco_... header. See API Keys.

How routing works

When you call /v1/chat/completions with model: "meta-llama/Llama-3.1-8B-Instruct", EcoLink routes your request to the healthiest region serving that model. You don't pick a region — the platform does it for you based on current load and region health.

Regional routing is transparent; the only thing you'll see is the x-ecolink-region header on the response indicating where it was served.

Unpriced models

If you try to call a model that's registered but doesn't yet have pricing set, you'll get:

HTTP 402 Payment Required
{ "error": "model_not_priced: <model_id> is registered but not yet priced — contact admin to set pricing before use" }

Ping the #ecolink-support Slack channel and we'll set pricing.

What about my own models?

If you want to deploy your own model (HuggingFace checkpoint, fine-tune, custom container), that's user inference — a separate feature where you launch your own inference endpoint, get a unique URL, and call it via the same API key.

Next steps