OpenAI-compatible vs custom

EcoLink supports two container shapes for user inference instances:

OpenAI-compatible (default, recommended) — your container exposes standard /v1/... endpoints
Custom API path — your container exposes a different HTTP surface; EcoLink proxies through

Pick based on what your inference server speaks.

OpenAI-compatible mode

Most modern inference servers speak the OpenAI schema:

vLLM: vllm/vllm-openai:latest — OpenAI-compatible out of the box
Text Generation Inference (TGI): ghcr.io/huggingface/text-generation-inference:3.0.0 — has /v1/chat/completions and /v1/completions
SGLang: lmsysorg/sglang:latest — has OpenAI-compat mode
LMDeploy, Ollama, llama.cpp (--api): all support OpenAI-compatible endpoints
Embedding servers: infinity, text-embeddings-inference

How it works

When you call:

curl https://api.ecohash.com/v1/chat/completions \
  -H "Authorization: Bearer eco_YOUR_KEY" \
  -d '{ "model": "my-llama:142", "messages": [...] }'

EcoLink:

Parses my-llama:142 → instance ID = 142
Validates your API key owns that instance
Picks a healthy region (routes to the best one)
Forwards the request to that region's pod
Strips the :142 suffix from the model field before forwarding (so the container sees model: "my-llama")
Streams the response back

The container never sees the instance ID or region routing logic — it just sees a standard OpenAI request.

Advantages

Zero code in your app — use the OpenAI SDK, point base URL at api.ecohash.com/v1, done
Works in the console Playground — your instance appears as a selectable model
Regional failover — if one region's pod is unhealthy, EcoLink routes to another
Unified billing — per-request costs surface in Account → Billing just like platform models

Endpoints supported

All six OpenAI-compatible endpoints route the same way:

Category	Endpoint
Chat / vision LLM	`POST /v1/chat/completions`
Embeddings	`POST /v1/embeddings`
Image generation	`POST /v1/images/generations`
Text-to-speech	`POST /v1/audio/speech`
Speech-to-text	`POST /v1/audio/transcriptions`
Video generation	`POST /v1/video/generations`

The one your instance handles is determined by its registered category (chat / embedding / image / audio / video).

Custom API path mode

Use when your container speaks a non-OpenAI protocol — a bespoke API, gRPC, an older HF interface, or anything custom.

Setup

At launch time, uncheck "Container is OpenAI-compatible".
Set Custom API path, e.g., /generate or /api/v1/predict.

EcoLink exposes a proxy URL instead of the unified API:

https://api.ecohash.com/inference-instances/<id>/proxy/<your_custom_path>

Calling it

curl https://api.ecohash.com/inference-instances/142/proxy/generate \
  -H "Authorization: Bearer eco_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "prompt": "Hello", "max_tokens": 100 }'

Request body is forwarded as-is. Response is streamed back as-is. Authentication is still your EcoLink API key.

Tradeoffs

Feature	OpenAI-compatible	Custom
SDK support	Every OpenAI SDK works	Need raw HTTP
Playground visibility	Yes	No
Unified API URL	Yes (`/v1/...`)	No (`/inference-instances/<id>/proxy/...`)
Regional failover	Yes	Yes (same routing logic)
Billing	Per-request tokens / images / seconds	Flat GPU-hour only (no per-request metering)
Setup simplicity	Zero — use existing server images	Need to know your container's custom schema

When to use custom mode

You have a fine-tuned non-LLM model (e.g., a custom diffusion model or retrieval pipeline)
Your container speaks a legacy API you don't want to change
You want raw HTTP passthrough for a custom use case

For new deployments, strongly prefer OpenAI-compatible — it's simpler for your users and integrates with the rest of the platform.

How to make your container OpenAI-compatible

If you're writing a custom inference server, here are the three endpoints you'd implement for LLM use:

POST /v1/chat/completions — standard OpenAI schema
GET /v1/models — return your served model (used by clients to enumerate)
GET /health — simple 200 OK for the pod readiness probe

Use one of the pre-built servers (vLLM, TGI, SGLang) unless you have a specific reason to roll your own. They handle streaming, batching, tokenization, error cases — lots of subtle behavior.

OpenAI-compatible mode​

How it works​

Advantages​

Endpoints supported​

Custom API path mode​

Setup​

Calling it​

Tradeoffs​

When to use custom mode​

How to make your container OpenAI-compatible​

Related​