Calling your endpoint

Once your inference instance is running, call it the same way you call platform models: set the model field to <model_name>:<instance_id> (for example, my-llama:142).

OpenAI-compatible instances

curl

curl https://api.ecohash.com/v1/chat/completions \
  -H "Authorization: Bearer eco_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-llama:142",
    "messages": [
      {"role": "user", "content": "Hello from my own model"}
    ]
  }'

Python (OpenAI SDK)

from openai import OpenAI
client = OpenAI(
    api_key="eco_YOUR_KEY",
    base_url="https://api.ecohash.com/v1",
)
resp = client.chat.completions.create(
    model="my-llama:142",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

TypeScript

import OpenAI from "openai";
const client = new OpenAI({
  apiKey: "eco_YOUR_KEY",
  baseURL: "https://api.ecohash.com/v1",
});
const resp = await client.chat.completions.create({
  model: "my-llama:142",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.choices[0].message.content);

Streaming

Streaming works the same as for platform models: set "stream": true in the request body (stream=True in the Python SDK):

stream = client.chat.completions.create(
    model="my-llama:142",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True,
)
for chunk in stream:
    # Some chunks (for example a final usage chunk) may carry no choices.
    if chunk.choices and (delta := chunk.choices[0].delta.content):
        print(delta, end="", flush=True)

Custom (non-OpenAI) instances

For instances launched with OpenAI-compatible unchecked:

curl https://api.ecohash.com/inference-instances/142/proxy/generate \
  -H "Authorization: Bearer eco_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "prompt": "Hello", "max_tokens": 100 }'

Replace 142 with your instance ID, and replace the trailing path segment generate with your custom_api_path. The request body and the response are passed through verbatim.
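The same call can be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_proxy_url` and `call_custom_instance` are hypothetical, and the payload shape is whatever your own server expects; the platform does not define it.

```python
import json
import urllib.request

BASE_URL = "https://api.ecohash.com"


def build_proxy_url(instance_id: int, custom_api_path: str) -> str:
    """Build the proxy URL for a custom instance (hypothetical helper)."""
    # Strip leading slashes so "generate" and "/generate" give the same URL.
    path = custom_api_path.lstrip("/")
    return f"{BASE_URL}/inference-instances/{instance_id}/proxy/{path}"


def call_custom_instance(api_key: str, instance_id: int,
                         custom_api_path: str, payload: dict) -> bytes:
    """POST a JSON payload to the instance and return the raw response body."""
    req = urllib.request.Request(
        build_proxy_url(instance_id, custom_api_path),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # The body is returned as bytes because the response is passed through
    # verbatim; decoding it depends on your server's own format.
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

For the curl example above, the equivalent call would be `call_custom_instance("eco_YOUR_KEY", 142, "generate", {"prompt": "Hello", "max_tokens": 100})`.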

Sharing across your team

Anyone on your account with an API key can call your instance: access is account-scoped, not user-scoped. Teammates use the same API URL and the same <model_name>:<instance_id> value. Invite team members from the Users page.

Using in the playground

OpenAI-compatible instances appear in the console's Playground dropdown (Chat / Image / etc., matching the instance's category):

Chat playground — your instance appears alongside meta-llama/Llama-3.1-8B-Instruct etc.
Image playground — if it's an image model
Embeddings / Reranker — if it's one of those

Select your instance from the dropdown, then chat or prompt as usual. Every call is billed the same way as an API call.

Custom (non-OpenAI) instances do not appear in the Playground — use curl or your own client to test them.

How routing works

When you call /v1/chat/completions with model: "my-llama:142":

1. EcoLink parses the :142 suffix and extracts instance ID 142.
2. It validates that your API key belongs to the account that owns instance 142.
3. It looks up instance 142's regional deployments in the instance cache, which is refreshed every 10 s.
4. It picks the best region (round-robin across healthy pods, failing over on error).
5. It forwards the request to <region-gateway>/<instance_id>/v1/chat/completions.
6. It streams the response back.
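To make the parsing and selection steps concrete, here is an illustrative sketch. It is not EcoLink's actual implementation: `parse_model_field` and `RoundRobin` are hypothetical names, splitting on the last colon is an assumption, and the region name "region-b" in the usage note is made up.

```python
from itertools import count
from typing import List, Tuple


def parse_model_field(model: str) -> Tuple[str, int]:
    """Split "<model_name>:<instance_id>" into (name, instance_id)."""
    # rpartition splits on the LAST colon, so a model name that itself
    # contains a colon still parses (an assumption of this sketch).
    name, sep, suffix = model.rpartition(":")
    if not sep or not name or not (suffix.isascii() and suffix.isdigit()):
        raise ValueError(f"expected <model_name>:<instance_id>, got {model!r}")
    return name, int(suffix)


class RoundRobin:
    """Cycle through whichever targets are currently healthy."""

    def __init__(self) -> None:
        self._counter = count()

    def pick(self, healthy: List[str]) -> str:
        if not healthy:
            # In the routing flow above, this case corresponds to a 503.
            raise LookupError("no healthy target available")
        return healthy[next(self._counter) % len(healthy)]
```

For example, `parse_model_field("my-llama:142")` yields the name `my-llama` and the instance ID `142`, and successive `pick(["mv", "region-b"])` calls alternate between the two entries.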

The response carries an x-ecolink-region header, for example x-ecolink-region: mv, naming the region that served the request. This is useful for debugging latency.
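When debugging latency, it can help to tally which region served each request. The sketch below works on any header mapping; `region_from_headers` and `tally_regions` are hypothetical helper names, and the case-insensitive lookup is a precaution, not documented platform behavior.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional

REGION_HEADER = "x-ecolink-region"


def region_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Return the serving region, matching the header name case-insensitively."""
    for key, value in headers.items():
        if key.lower() == REGION_HEADER:
            return value
    return None


def tally_regions(header_sets: Iterable[Mapping[str, str]]) -> Counter:
    """Count how many responses each region served."""
    # Responses without the header are grouped under "unknown".
    return Counter(region_from_headers(h) or "unknown" for h in header_sets)
```

With the OpenAI Python SDK, response headers should be reachable through `client.chat.completions.with_raw_response.create(...)`, whose result exposes `.headers` alongside `.parse()` for the usual completion object.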

Errors

HTTP	Meaning
401	API key invalid or doesn't own the instance
402	Account balance ≤ $0 OR the instance was terminated for credit depletion
404	Unknown model / instance ID — check spelling, verify the instance is still `running`
429	Rate limited (depends on your plan)
503	No healthy pod available in any target region — retry
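Since a 503 is marked as retryable, a client can wrap its calls in a small retry loop. This is a sketch under stated assumptions: `call_with_retry` is a hypothetical helper, the exponential backoff with jitter is a common client-side convention and not something the platform prescribes, and only 503 is retried by default (whether to also retry 429 depends on your plan's limits).

```python
import random
import time
from typing import Callable, Tuple

# Only 503 is documented as retryable; extend this set at your own discretion.
RETRYABLE = frozenset({503})


def call_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        # Stop on success, on a non-retryable error, or on the last attempt.
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff (base_delay, 2x, 4x, ...) scaled by random jitter.
        sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

Errors such as 401, 402, and 404 are returned immediately, because repeating the same request would not change the outcome.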

Latency

Running instance: latency is network time plus inference time. Typical time to first token (TTFT) for a 7B LLM is 100–300 ms; time per output token (TPOT) depends on GPU count and model size.
Multi-region instance, one region unhealthy: requests automatically fail over to the healthy region; you may see a brief latency spike until the router's health check converges.
Single-region instance, that region unhealthy: no failover is possible. This can happen with filesystem-backed models, which are pinned to one region. Redeploy or wait for the region to recover.
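To check these figures against your own instance, you can time a streamed response. The sketch below is one way to do it; `measure_stream` is a hypothetical helper that accepts any iterable of text chunks. It approximates TPOT as the mean gap between chunks, which matches per-token time only if each chunk carries one token.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], Optional[float], str]:
    """Return (ttft_seconds, tpot_seconds, text) for a stream of text chunks."""
    start = clock()
    first = last = None
    pieces = []
    for chunk in chunks:
        if not chunk:
            continue  # skip empty deltas such as role-only chunks
        now = clock()
        if first is None:
            first = now  # arrival time of the first non-empty chunk
        last = now
        pieces.append(chunk)
    if first is None:
        return None, None, ""
    ttft = first - start
    # Mean gap between consecutive chunks after the first one.
    tpot = (last - first) / (len(pieces) - 1) if len(pieces) > 1 else None
    return ttft, tpot, "".join(pieces)
```

The timer starts when `measure_stream` is called, so pass a generator that issues the request lazily and yields each chunk's text; otherwise connection time is left out of the TTFT figure.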
