OpenAI-compatible vs custom
EcoLink supports two container shapes for user inference instances:
- OpenAI-compatible (default, recommended) — your container exposes standard
/v1/...endpoints - Custom API path — your container exposes a different HTTP surface; EcoLink proxies through
Pick based on what your inference server speaks.
OpenAI-compatible mode
Most modern inference servers speak the OpenAI schema:
- vLLM:
vllm/vllm-openai:latest— OpenAI-compatible out of the box - Text Generation Inference (TGI):
ghcr.io/huggingface/text-generation-inference:3.0.0— has/v1/chat/completionsand/v1/completions - SGLang:
lmsysorg/sglang:latest— has OpenAI-compat mode - LMDeploy, Ollama, llama.cpp (
--api): all support OpenAI-compatible endpoints - Embedding servers:
infinity,text-embeddings-inference
How it works
When you call:
curl https://api.ecohash.com/v1/chat/completions \
-H "Authorization: Bearer eco_YOUR_KEY" \
-d '{ "model": "my-llama:142", "messages": [...] }'
EcoLink:
- Parses
my-llama:142→ instance ID = 142 - Validates your API key owns that instance
- Picks a healthy region (routes to the best one)
- Forwards the request to that region's pod
- Strips the
:142suffix from the model field before forwarding (so the container seesmodel: "my-llama") - Streams the response back
The container never sees the instance ID or region routing logic — it just sees a standard OpenAI request.
Advantages
- Zero code in your app — use the OpenAI SDK, point base URL at
api.ecohash.com/v1, done - Works in the console Playground — your instance appears as a selectable model
- Regional failover — if one region's pod is unhealthy, EcoLink routes to another
- Unified billing — per-request costs surface in Account → Billing just like platform models
Endpoints supported
All six OpenAI-compatible endpoints route the same way:
| Category | Endpoint |
|---|---|
| Chat / vision LLM | POST /v1/chat/completions |
| Embeddings | POST /v1/embeddings |
| Image generation | POST /v1/images/generations |
| Text-to-speech | POST /v1/audio/speech |
| Speech-to-text | POST /v1/audio/transcriptions |
| Video generation | POST /v1/video/generations |
The one your instance handles is determined by its registered category (chat / embedding / image / audio / video).
Custom API path mode
Use when your container speaks a non-OpenAI protocol — a bespoke API, gRPC, an older HF interface, or anything custom.
Setup
- At launch time, uncheck "Container is OpenAI-compatible".
- Set Custom API path, e.g.,
/generateor/api/v1/predict.
EcoLink exposes a proxy URL instead of the unified API:
https://api.ecohash.com/inference-instances/<id>/proxy/<your_custom_path>
Calling it
curl https://api.ecohash.com/inference-instances/142/proxy/generate \
-H "Authorization: Bearer eco_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{ "prompt": "Hello", "max_tokens": 100 }'
Request body is forwarded as-is. Response is streamed back as-is. Authentication is still your EcoLink API key.
Tradeoffs
| Feature | OpenAI-compatible | Custom |
|---|---|---|
| SDK support | Every OpenAI SDK works | Need raw HTTP |
| Playground visibility | Yes | No |
| Unified API URL | Yes (/v1/...) | No (/inference-instances/<id>/proxy/...) |
| Regional failover | Yes | Yes (same routing logic) |
| Billing | Per-request tokens / images / seconds | Flat GPU-hour only (no per-request metering) |
| Setup simplicity | Zero — use existing server images | Need to know your container's custom schema |
When to use custom mode
- You have a fine-tuned non-LLM model (e.g., a custom diffusion model or retrieval pipeline)
- Your container speaks a legacy API you don't want to change
- You want raw HTTP passthrough for a custom use case
For new deployments, strongly prefer OpenAI-compatible — it's simpler for your users and integrates with the rest of the platform.
How to make your container OpenAI-compatible
If you're writing a custom inference server, here are the three endpoints you'd implement for LLM use:
POST /v1/chat/completions— standard OpenAI schemaGET /v1/models— return your served model (used by clients to enumerate)GET /health— simple200 OKfor the pod readiness probe
Use one of the pre-built servers (vLLM, TGI, SGLang) unless you have a specific reason to roll your own. They handle streaming, batching, tokenization, error cases — lots of subtle behavior.