Rate limits

EcoLink enforces per-API-key rate limits to protect the platform from runaway loops and to share capacity fairly across tenants.

How to tell

When you hit a limit, the response is:

HTTP 429 Too Many Requests
{ "error": "rate_limit_exceeded", "message": "Too many requests — please slow down" }

The response also includes headers describing the current state:

| Header | Meaning |
| --- | --- |
| x-ratelimit-limit | Requests allowed in the current window |
| x-ratelimit-remaining | Requests left in the current window |
| x-ratelimit-reset | Seconds until the window resets |
| retry-after | Seconds to wait before retrying |
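As a rough illustration, these headers can be read to pause before the budget runs out rather than after a 429. The sketch below assumes the header values are plain numbers of seconds or requests, as the table suggests. `seconds_to_pause` and its `min_remaining` threshold are hypothetical names, not part of the EcoLink API.

```python
def seconds_to_pause(headers, min_remaining=1):
    """Return how long to wait before the next request, in seconds.

    `headers` is any mapping of lowercase header names to string values.
    Returns 0.0 when there is budget left or the headers are absent.
    """
    def as_number(name):
        # Missing or non-numeric headers are treated as "unknown".
        try:
            return float(headers[name])
        except (KeyError, ValueError):
            return None

    retry_after = as_number("retry-after")
    if retry_after is not None:
        return max(retry_after, 0.0)  # the server said how long to wait

    remaining = as_number("x-ratelimit-remaining")
    reset = as_number("x-ratelimit-reset")
    if remaining is not None and reset is not None and remaining < min_remaining:
        return max(reset, 0.0)  # budget exhausted; wait for the window to reset
    return 0.0
```

With the `requests` library, `resp.headers` is case-insensitive and can be passed in directly. A plain `dict` needs lowercase keys.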

Strategy: exponential backoff

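On the client side, retry 429 responses with exponential backoff plus jitter, preferring the server's retry-after value when it is present: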

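import time
from random import random

import requests


def call_with_backoff(url, headers, json_body, max_retries=5):
    """POST with retries on HTTP 429, honoring retry-after when present."""
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=json_body)
        if resp.status_code != 429:
            return resp
        if attempt == max_retries - 1:
            break  # out of retries; don't sleep before giving up
        # 429: back off. Use retry-after if present, else 2 ** attempt seconds.
        try:
            delay = float(resp.headers["retry-after"])
        except (KeyError, ValueError):
            delay = 2 ** attempt
        time.sleep(delay + random())  # jitter keeps clients from retrying in lockstep
    return resp  # final attempt's response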

Limits for pay-as-you-go accounts

Pay-as-you-go accounts are gated by a single per-API-key cap that prevents runaway loops or leaked keys from saturating the platform:

| Dimension | Limit |
| --- | --- |
| Requests per minute, per API key | 200 |
| Tokens per minute | Not enforced as a separate cap; per-request token limits are governed by the model's context window |
| Concurrent in-flight requests | Not explicitly capped |

This is generous for interactive development and most small-to-mid production workloads. If you're saturating 200 RPM on a single API key, the simplest fixes are:

  • Use multiple API keys for separately-tracked services (e.g., production app vs analytics batch job). Each gets its own 200 RPM budget.
  • Batch your requests where possible (embeddings batches of 32–256 strings, multi-message chat).
  • Reach out in the #ecolink-support Slack channel — for sustained high-throughput use cases, ops can raise the per-key cap on your account.
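Before reaching for any of these, a client can also pace itself so that it never exceeds the cap. The following is a minimal sketch of a client-side sliding-window limiter. It assumes the 200 RPM figure above; how the server actually measures its window is not documented here, so a sliding window is simply a conservative choice. `SlidingWindowLimiter` is a hypothetical helper, not part of any EcoLink SDK, and it is not thread-safe as written.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window."""

    def __init__(self, max_calls=200, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock   # injectable for testing
        self._sleep = sleep   # injectable for testing
        self._sent = deque()  # timestamps of recent calls, oldest first

    def acquire(self):
        """Block until a call is allowed, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the window.
            while self._sent and now - self._sent[0] >= self.period:
                self._sent.popleft()
            if len(self._sent) < self.max_calls:
                self._sent.append(now)
                return
            # Window is full: wait until the oldest call leaves it.
            self._sleep(self.period - (now - self._sent[0]))
```

Call `limiter.acquire()` immediately before each request. Keeping one limiter per API key mirrors the per-key budget described above.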

Enterprise rate limits

Enterprise contracts include negotiated rate-limit caps tied to your committed usage. Reach out to sales if you need predictable high-throughput limits as part of a production SLA.

Avoiding 429s

  • Batch your requests where possible. Embeddings in batches of 32–256 strings is much better than one-at-a-time.
  • Stream long completions so you don't hold open many sockets unnecessarily.
  • Respect retry-after — don't spam-retry through the limit window; it'll only extend the throttle.
  • Use dedicated keys per service so one runaway script doesn't affect production traffic on the same key.
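The batching advice can be sketched as a small chunking helper. `batched` and its default batch size of 64 are illustrative choices within the 32–256 range mentioned above. How a batch is actually sent depends on the endpoint and is not shown.

```python
def batched(items, batch_size=64):
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

For example, 1,000 strings sent in batches of 100 cost 10 requests against the per-key budget instead of 1,000.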

Other status codes that look like rate limits

Don't confuse a 429 with:

  • 503 Service Unavailable — the target model has no healthy pod right now. Retry after a few seconds; not a tenant-specific rate limit.
  • 402 Payment Required — your balance is insufficient. Not a rate limit; no amount of waiting fixes this without crediting the account.
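One way to keep these cases apart in client code is a small dispatch on the status code. This is only a sketch: `retry_decision` and the labels it returns are hypothetical, and any status not discussed on this page is passed through untouched.

```python
def retry_decision(status_code):
    """Map a status code to a coarse retry action (labels are illustrative)."""
    if status_code == 429:
        return "backoff"       # rate limited: honor retry-after, back off exponentially
    if status_code == 503:
        return "retry_soon"    # no healthy pod: retry after a few seconds
    if status_code == 402:
        return "stop"          # insufficient balance: waiting will not help
    return "pass_through"      # not covered here; let the caller handle it
```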