Rate limits

EcoLink enforces per-API-key rate limits to protect the platform from runaway loops and to share capacity fairly across tenants.

How to tell

When you hit a limit, the response is:

HTTP 429 Too Many Requests
{ "error": "rate_limit_exceeded", "message": "Too many requests — please slow down" }

The response also carries headers describing the current state:

| Header | Meaning |
| --- | --- |
| `x-ratelimit-limit` | Requests allowed in the current window |
| `x-ratelimit-remaining` | Requests left in the current window |
| `x-ratelimit-reset` | Seconds until the window resets |
| `retry-after` | Seconds to wait before retrying |
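
These headers also make it possible to slow down before a 429 arrives. The following is a minimal sketch, assuming the header values are plain numeric strings as the table suggests. The function name `seconds_to_pause` and the `min_remaining` threshold are hypothetical helpers for illustration, not part of the EcoLink API.

```python
def seconds_to_pause(headers, min_remaining=1):
    """Return how many seconds to wait before the next request.

    Assumes `headers` is a mapping with lowercase header names and
    numeric string values (an assumption, not a documented guarantee).
    """
    # An explicit retry-after takes priority over the window headers.
    retry_after = headers.get("retry-after")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # non-numeric value: fall through to the window headers
    try:
        remaining = int(headers["x-ratelimit-remaining"])
        reset = float(headers["x-ratelimit-reset"])
    except (KeyError, ValueError):
        return 0.0  # headers missing or malformed: no basis for pausing
    # Budget nearly exhausted: wait for the window to reset.
    return max(0.0, reset) if remaining < min_remaining else 0.0
```

A caller would sleep for the returned number of seconds after each response; a return value of 0.0 means there is no reason to pause.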

Strategy: exponential backoff

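On the client, retry 429 responses with exponential backoff plus jitter, and prefer the server's retry-after value when it is present: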

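import random
import time

import requests

def call_with_backoff(url, headers, json_body, max_retries=5):
    """POST with retries on HTTP 429, using exponential backoff plus jitter."""
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=json_body)
        if resp.status_code != 429 or attempt == max_retries - 1:
            return resp  # success, non-429 error, or retries exhausted
        # 429: back off. Prefer the server's retry-after (in seconds) if present.
        try:
            delay = float(resp.headers["retry-after"])
        except (KeyError, ValueError):
            delay = 2 ** attempt  # header missing or not numeric: 1s, 2s, 4s, ...
        time.sleep(delay + random.random())  # jitter spreads out simultaneous retries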

Limits for pay-as-you-go accounts

Pay-as-you-go accounts are gated by a single per-API-key cap that prevents runaway loops or leaked keys from saturating the platform:

| Dimension | Limit |
| --- | --- |
| Requests per minute, per API key | 200 |
| Tokens per minute | Not enforced as a separate cap; per-request token limits are governed by the model's context window |
| Concurrent in-flight requests | Not explicitly capped |

This is generous for interactive development and most small-to-mid production workloads. If you're saturating 200 RPM on a single API key, the simplest fixes are:

Use multiple API keys for separately-tracked services (e.g., production app vs analytics batch job). Each gets its own 200 RPM budget.
Batch your requests where possible (embeddings batches of 32–256 strings, multi-message chat).
Reach out in the #ecolink-support Slack channel — for sustained high-throughput use cases, ops can raise the per-key cap on your account.
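
Another option is to throttle on the client so a single key never exceeds its budget in the first place. Below is a minimal sketch of a sliding-window limiter. The class name `SlidingWindowLimiter` is hypothetical, the 200-per-60-seconds defaults mirror the table above, and how the server actually measures its window (fixed or sliding) is not stated here, so treat this as a conservative approximation. It is not thread-safe as written.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` per `period` seconds (client-side only)."""

    def __init__(self, max_calls=200, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._sent = deque()   # timestamps of calls still inside the window

    def acquire(self):
        """Block until another call fits in the window, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the window.
            while self._sent and now - self._sent[0] >= self.period:
                self._sent.popleft()
            if len(self._sent) < self.max_calls:
                self._sent.append(now)
                return
            # Window is full: wait until the oldest call expires.
            self._sleep(self.period - (now - self._sent[0]))
```

Calling `limiter.acquire()` immediately before each request keeps the request rate at or below the configured cap; backoff on 429 is still worth keeping as a safety net.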

Enterprise rate limits

Enterprise contracts include negotiated rate-limit caps tied to your committed usage. Reach out to sales if you need predictable high-throughput limits as part of a production SLA.

Avoiding 429s

Batch your requests where possible. Embeddings in batches of 32–256 strings is much better than one-at-a-time.
Stream long completions so you don't hold open many sockets unnecessarily.
Respect retry-after — don't spam-retry through the limit window; it'll only extend the throttle.
Use dedicated keys per service so one runaway script doesn't affect production traffic on the same key.
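
The batching advice can be sketched as a small helper that splits a list of input strings into fixed-size groups, each of which would then be sent as one request. The helper name `chunked` and the batch size of 64 are illustrative assumptions; the request format itself is not covered in this section, so it is omitted.

```python
def chunked(items, batch_size=64):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Example: 150 strings become 3 batches instead of 150 single-item requests.
texts = [f"doc {i}" for i in range(150)]
batches = list(chunked(texts, batch_size=64))
```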

Other status codes that look like rate limits

Don't confuse a 429 with the following:

503 Service Unavailable — the target model has no healthy pod right now. Retry after a few seconds; not a tenant-specific rate limit.
402 Payment Required — your balance is insufficient. Not a rate limit; no amount of waiting fixes this without crediting the account.
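
In client code, the distinction can be made explicit so that only genuinely retryable responses are retried. This is a minimal sketch; the function name `retry_decision` and the returned labels are hypothetical, chosen only to mirror the descriptions above.

```python
def retry_decision(status_code):
    """Map an HTTP status code to a coarse retry policy label."""
    if status_code == 429:
        return "backoff"        # rate limited: honor retry-after, back off exponentially
    if status_code == 503:
        return "retry_soon"     # no healthy pod: retry after a few seconds
    if status_code == 402:
        return "do_not_retry"   # insufficient balance: waiting will not help
    return "other"              # outside the scope of this section
```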
